Jargon: A majority element of an array is an element that occurs more than half the number of times in the array (more than n/2 times in an array of n elements). Also, [x] denotes the ceiling of x, the smallest integer greater than or equal to x.
Problem 1: Assume that an integer array A[1..n] has a majority element and that all elements other than the majority element are distinct. How do we find the majority element? What are the time and space complexities of the algorithm?
Complexity: Time Complexity = ~n/2 comparisons, Space Complexity = O(1)
Solution:
We will use the pigeonhole principle to make some clever observations. Since a majority element exists, at most [n/2]-1 of the array values are distinct non-majority elements, and the remaining positions (more than half of them) all hold the majority element. Lay out the [n/2]-1 distinct elements and treat each gap between consecutive ones, plus the two ends, as a basket of unbounded capacity; that gives [n/2] baskets. Now distribute the copies of the majority element into these baskets. In only one arrangement does every basket receive exactly one copy; in every other arrangement some basket receives two or more copies while some basket stays empty.
Using this observation, we can claim that to find the majority element it suffices to compare each element for equality with its neighbor and with its two-hop neighbor (the neighbor of its neighbor): if some basket got two or more copies, two copies are adjacent; if every basket got exactly one copy, two copies are separated by a single distinct element. That requires two comparisons per element, and the algorithm looks like this:
for i <- 1 to n-2
    if A[i] == A[i+1] or A[i] == A[i+2]
        return A[i]
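The scan above can be written as a runnable Python sketch (the function name is mine; it assumes, as the problem states, that a majority element exists and all other elements are distinct):

```python
def find_majority_distinct(a):
    """Find the majority element when all non-majority elements are distinct.

    By the pigeonhole argument above, two copies of the majority element
    must sit within distance 2 of each other, so comparing each element
    with its neighbor and two-hop neighbor is enough.
    """
    n = len(a)
    if n <= 2:
        # with a guaranteed majority, the array is all one value here
        return a[0]
    for i in range(n - 2):
        if a[i] == a[i + 1] or a[i] == a[i + 2]:
            return a[i]
    # unreachable when a majority element is guaranteed
    return a[0]
```

For example, `find_majority_distinct([7, 1, 7, 2, 7])` returns 7 on the first iteration, since A[1] and its two-hop neighbor A[3] match.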
A quick observation shows that this for loop stops by i = [3n/4]+1 even in the worst case, and the number of comparisons done is ~3n/2. Now we can make another clever observation: if a complete scan of the array finds no two adjacent elements equal, then the majority copies must occupy alternating positions, so by the pigeonhole principle the first element has to be the majority element.
for i <- 1 to n-1
    if A[i] == A[i+1]
        return A[i]
return A[1]
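The same idea as a Python sketch (again my naming; it assumes a majority element exists and the other elements are distinct):

```python
def find_majority_adjacent(a):
    """Scan adjacent pairs for equality. If no two neighbors match,
    the majority copies must alternate with the distinct fillers,
    which forces the first element to be the majority element."""
    for i in range(len(a) - 1):
        if a[i] == a[i + 1]:
            return a[i]
    return a[0]
```

On `[7, 1, 7, 2, 7]` no adjacent pair matches, so the fallback returns A[1] = 7, which is indeed the majority element.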
In the worst case this algorithm scans the whole array and does n-1 comparisons. Even though this looks better than the first solution, it could be worse in practice: on a highly pipelined architecture with branch prediction, the first algorithm can come out ahead even though it does more comparisons per loop iteration. Testing on large arrays with randomly shuffled elements would give clearer insight into which algorithm is faster.
Now for an extremely clever solution, based on this observation: pair up the array elements and strike out each pair whose two elements are distinct; if a pair's two elements are equal, return that value as the majority element. If all pairs are struck out, whatever element is left unpaired has to be the majority element.
i <- 1
while i <= n-1
    if A[i] == A[i+1]
        return A[i]
    i <- i + 2
return A[n]
Worst-case analysis tells us that the algorithm stops by i = n-1, and the number of comparisons done is ~n/2. This is indeed the best we can do in the worst case.
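The pairing scan can be sketched in Python as follows (my naming; same assumptions as before):

```python
def find_majority_pairing(a):
    """Compare elements pairwise: (a[0], a[1]), (a[2], a[3]), ...
    A matching pair must be two copies of the majority element, since
    non-majority values are distinct. If every pair is mixed, the
    unpaired last element of an odd-length array is the majority."""
    n = len(a)
    for i in range(0, n - 1, 2):
        if a[i] == a[i + 1]:
            return a[i]
    return a[-1]
```

For even n, more than n/2 majority copies in n/2 pairs force some pair to match, so the fallback is only reached for odd n, where the leftover element must be the majority.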
Problem 2: Assume that an integer array A[1..n] has a majority element, but elements other than the majority element need NOT be distinct. How do we find the majority element? What are the time and space complexities of the algorithm?
Complexity: Time Complexity = ~n comparisons, Space Complexity = O(1)
Solution:
This algorithm uses the observation that if we view the array as a list of voters, each array value being the candidate voted for, then the majority element is the winning candidate. Moreover, even if all the other candidates merged into a single candidate, they would still fail to defeat the majority candidate (hence the name majority element). So think of it this way: pair every vote for another candidate against one vote for the majority element; since the majority element has strictly more than half the votes, its net count stays at least 1. The following algorithm implements this idea.
element <- A[1]
votes <- 1
for i <- 2 to n
    if A[i] == element
        votes += 1
    else if votes > 0
        votes -= 1
    else
        element <- A[i]
        votes <- 1
return element
This algorithm does n-1 element comparisons. Comparisons such as votes > 0 are not counted, as they are cheap integer-variable comparisons.
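This is the Boyer-Moore majority vote algorithm; a Python sketch (assuming, as the problem states, that a majority element exists, otherwise a verification pass would be needed):

```python
def boyer_moore_majority(a):
    """Boyer-Moore voting: each vote for another candidate cancels one
    vote for the current candidate; the true majority element has more
    votes than all others combined, so it survives as the candidate."""
    element, votes = a[0], 1
    for x in a[1:]:
        if x == element:
            votes += 1
        elif votes > 0:
            votes -= 1
        else:
            element, votes = x, 1
    return element
```

For example, `boyer_moore_majority([2, 2, 1, 2, 3, 2, 2])` returns 2.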
Problem 3: Now consider that there are k majority elements, i.e., each of the k majority elements appears in the array more than ceil(n/(k+1)) times. How do we find the k majority elements? (Problem 2 is a special case of this problem with k=1.)
Complexity: Time Complexity = O(nk) comparisons, Space Complexity = O(k)
Solution:
The algorithm uses the same intuition as that presented for Problem 2, but instead of one counter we keep k counters (this is the Misra-Gries algorithm). Each counter slot holds a candidate element and a count of how many unmatched times it has been seen so far. For a new element x: if x matches a tracked element, that element's count is incremented by 1; otherwise, if some slot has count 0, x takes over that slot with count 1; otherwise all k counts are decremented by 1 and x is discarded. Then we move to the next element. The following code implements this logic.
for i = 1:n
    matched = false
    % if A[i] is already tracked, bump its count
    for j = 1:k
        if Maj[j] == A[i]
            Count[j] += 1
            matched = true
            break
        end
    end
    if matched
        continue
    end
    % A[i] is not tracked: put it in an empty slot, if any
    for j = 1:k
        if Count[j] == 0
            Maj[j] = A[i]
            Count[j] = 1
            matched = true
            break
        end
    end
    if matched
        continue
    end
    % no empty slot: decrement the counts of all candidates
    for j = 1:k
        Count[j] -= 1
    end
end
The algorithm runs in O(nk) time and takes O(k) extra space. It is easy to see that the algorithm finds all k majority elements. Each decrement step consumes k+1 elements' worth of occurrences (the k counter decrements plus the discarded incoming element), so there can be at most n/(k+1) decrement steps over the whole scan. An element that appears more than ceil(n/(k+1)) times therefore finishes with a positive count, so all k majority elements survive in the Maj array.
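A Python sketch of the same scheme, using a dict of candidate-to-count instead of the parallel Maj/Count arrays (my naming; still O(k) extra space, and slots with count 0 are simply removed):

```python
def misra_gries(a, k):
    """Misra-Gries heavy hitters: track at most k candidates with counts.
    Every element occurring more than n/(k+1) times is guaranteed to be
    among the returned candidates; if such elements are not guaranteed to
    exist, a second counting pass is needed to discard false positives."""
    counts = {}
    for x in a:
        if x in counts:
            counts[x] += 1
        elif len(counts) < k:
            counts[x] = 1
        else:
            # no free slot: decrement everyone, dropping exhausted slots
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    return list(counts)
```

For example, `misra_gries([1, 1, 2, 2, 3], 2)` returns the two elements (1 and 2) that each appear more than 5/3 times.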
Super solution, I like it :)
beautiful soln....
If there's a majority element and non-majority elements are distinct, you can solve this in O(1) expected time. Just pick a random pair of distinct indexes and see if the values at those indexes are the same. If so, you've found your majority element. The probability of making such a pick is > 1/4, so the expected number of attempts needed is < 4.
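This randomized idea can be sketched in Python as follows (my naming; it terminates with probability 1 under the stated assumptions of a guaranteed majority element with distinct non-majority elements):

```python
import random

def find_majority_random(a):
    """Repeatedly sample two distinct indexes; equal values must both be
    the majority element, since all other elements are distinct. The
    expected number of trials is constant (< 4)."""
    n = len(a)
    if n == 1:
        return a[0]
    while True:
        i, j = random.sample(range(n), 2)
        if a[i] == a[j]:
            return a[i]
```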
Interesting solution Eugene, but we are looking for the worst-case complexity here.
ReplyDeleteIn general probabilistic methods are faster (for e.g. for primality testing) but its hard to put tight guarantees on them.
Strictly speaking, your space complexity is actually O(log n) (or O(k log n)), since maintaining a counter isn't O(1) space but actually O(log n) space.
Zorba P.
Zorba,
Please be specific about which problem you are talking about.
For all three problems, look at the pseudocode and consider whether you see the memory used as O(1) or O(k log n).
I guess you are confusing space and time complexities, or maybe iterative vs. recursive implementations?
Please elaborate.
Thanks.
hi,
in the third part, should the condition be if (j < k)? Every time the control breaks out of the first nested for loop, j will be less than k, and only then should it move on to the next for loop.
Also, what are the initial values of Maj[] and Count[] in the third part? Thanks!
Thanks Anon, you are right about the comparison. It has been corrected.
For initialization, Maj[i] = NaN and Count[i] = 0 should work.