Solution:
If we know the length of the stream, then problem is trivial. Simply generate a random number r between 1 and N. Declare the number at index r as the desired random number.
For an unknown but finite N, we need that elements are selected with equal probability. Let us first build the loop invariant property for the solution. Say we have seen k elements so far, the selected number (say n) must be selected with probability 1/k. As we see a new element (say n1), we should set n=n1 with probability 1/(k+1). This ensures that n1 is selected with probability 1/(k+1). In the other case, n (the one NOT replaced by n1) is selected with probability = 1/k * (1-1/(k+1)) = 1/(k+1). Hence this strategy ensures that a number a selected with probability 1/N for N>=k. This loop invariant is adhered by the following code:
n=0, N=0 while Stream.hasElement() ++N if rand <= 1/N n = Stream.nextElement() return nAfter the algorithm processes all elements, n would contain the randomly selected element and N would contain the number of elements in the stream.
Related post: http://blog.smola.org/post/1077104724/random-elements-from-a-stream#disqus_thread
ReplyDeleteYou made numerous nice ideas there. I done a search on the issue and learnt nearly all peoples will agree with your blog.
ReplyDelete