Problem: Given a finite stream of data elements, develop an algorithm to pick an element from the stream, randomly. Algorithm should guarantee that, every element of the stream has equal probability of getting picked. Additionally, we don't know the size of the stream until the stream actually finishes. We also have constant space, so all the elements of the stream can not be stored.
Hint: Once we see an element from the stream, we either store it or discard it with some probability, p.
Solution: show
This problem sounds hard due to lack of crucial data. Assume that we know the length of the stream, then the solution is trivial. Simply generate a random number between 1 and stream.length (say r). Then pick the rth element and declare it as the randomly chosen element. Here is the code for the same.
r = random(1, stream.length)
for i <- 1 to r-1
stream.nextElement()
return stream.nextElement() // this is the randomly picked element
Now, since we don't know the length of the stream, this problem looks difficult. But surprisingly the solution is quite simple. Let us first build the loop invariant property for the solution.
Assuming we have seen n-1 elements and picked one element e, randomly so far with probability 1/(n-1). Now we see another element (say e1) , if we keep e with probability = (n-1)/n and replace it with e1 with probability 1/n, then e contains e1 with probability 1/n or an element from previously seen n-1 elements with probability = 1/(n-1) * (n-1)/n = 1/n.
Bingo, so we made sure that all elements are picked with uniform randomness. This loop invariant is adhered by the following code:
n = 0
e = Nil
while stream.hasElement()
n += 1
r = random(1, n) // including 1 and n
if r == 1
e = stream.nextElement()
return e
2 comments:
Related post: http://blog.smola.org/post/1077104724/random-elements-from-a-stream#disqus_thread
You made numerous nice ideas there. I done a search on the issue and learnt nearly all peoples will agree with your blog.
Post a Comment