Jargon: Order Statistics problem is to find the kth smallest element in an unsorted array A[1..n]. This problem can be solved in O(n) time in average case using randomized select algorithm or in O(n) time in worst case time using an algorithm that chooses the pivot element carefully.
Problem: How to find second smallest element in an array A[1..n] of size n. We would like to do in much less than 2n comparisons.
Answer: Time Complexity = n + ceil(log n) - 2
Solution: show
Finding the smallest element is trivial. The following code would find the smallest element using n-1 comparisons.
min = A[1]
for i = 2 to n
if A[i] < min
min = A[i]
return min
The trivial algorithm to find the second minimum is to keep another variable (smin) along with min and if an element knocks out smin then check if it knocks out min and accordingly do the book-keeping. It can be done using 2n-3 comparisons and takes constant space.
A point to observe is that the second smallest element is knocked out by the smallest one at some stage. If we preserve which element is knocked out by whom then finding the second smallest element would just require us to find the smallest amongst the elements knocked out by the smallest element.
But in order to do that we need to build the algorithm in a way that we are playing a pair wise tournament. Lets say we play a tournament and knockout n/2 elements after n/2 comparisons. The recurrence relation looks like this:
T(n) = T(n/2) + n/2
Solving this recurrence gives that T(n) = n - 1. The height of the recurrence tree is lg n. So if we just check these elements knocked by root (min) we get the second minimum.For. E.x.
1 5 2 6
\ / \ /
1 2
\ /
1
As soon as we build the tree using n-1 comparisons, we can check the second largest amongst the number knocked out by 1. i.e. 2, 5. so the second minimum is 2. The number of comparison is n + ceil(log n) - 2, where log is taken to the base 2. We implement this algorithm by maintaining a list of indices knocked by an element. The python code is towards the end of this post.
Another algorithm that is very easy to implement but sadly takes 2n - 2logn - 1 in terms of array element comparisons. Simply build a min heap (rooted at 1). Return the smaller of numbers at index 2 or 3 in the array. The cost of min heap building is = 2 * [n/4 + 2n/8 + ...... + (logn - 1) * n / 2^logn]. The solution to this arithmetic - geometric series is 2n - 2logn - 2. Comparing elements at 2nd and 3rd index adds one to the cost. Its very easy to implement though.
Code for n + log n - 2 algorithm: second_smallest.py
Alternate implementation using linked list: second_smallest_linkedlist.py
Great article! And it becomes so easy to code it with Python. In fact, a certain company gave reference to your blog and asked to solve the problem.
ReplyDeleteI encountered this problem as an exercise in CLRS and failed to figure out where the lg(n) come from. The phrase "knocked out by 1" explains everything.
ReplyDeleteThank you.
I am sorry, I am not sure if I understand this. What happens when the input array is sorted?
ReplyDelete{1, 2, 3, 4}
Won't the algorithm fail?
Ignore my comment. I was wrong.
ReplyDeleteThe tournament approach sounds like a great idea because the number of comparisons made is very small, but the other half of the story is that you are doing about N memory writes as opposed to no memory writes (if you kept just the top 2 values, those would likely end up being in registers), and you are using extra space. In practice, I would be surprised if this approach could outperform a naive implementation that just maintains the top 2 values at all times.
ReplyDeleteHi Anon. A naive approach would be fine if cost of comparison is small.
ReplyDeleteImagine that the N items are database transactions and a comparison cost is large (say date comparison + string matching + expression evaluations). Then tournament approach makes more sense.
Again I am making cases but the solution is interesting mostly from academic point-of-view.