Puzzles, Maths and Algorithms: January 2009

January 31, 2009

Loops in Linked List

Problems with linked lists occur mainly while building or using a bad memory manager. Let us discuss two well known linked list problems.

Problem: Find if two linked lists short at some node. Find the node at which the two lists short.

Solution: show

Subtract the length of smaller list from the larger one. Move a pointer ahead on larger list by the difference of lengths. Now put a pointer on head of smaller list. More two pointers together. The point at which they meet is the point where they short. The following code gives the idea.

d = length(head1) - length(head2)
l1, l2 = head1, head2
if d < 0
  d *= -1 
  l1, l2 = head2, head1

ptr1 = l1
for i <- 1 to d
  ptr1 = ptr1->next

ptr2 = l2
while ptr1 != null
  if ptr2 == ptr1
    return ptr1

ptr1 = ptr1->next
ptr2 = ptr2->next

We can use this idea to find the shorted node to find the solution to next problem.

Problem: A linked list might contain a loop. How do we detect existence of the loop and find the node from which loop starts. Propose an O(n) time algorithm that doesn't take extra space and doesn't modify the linked list.

Solution: show

To find if the linked list contains a loop, run two pointers over the linked list, one that move one node at a time and other that moves two nodes at a time. Both of these pointers meet if the linked list contains a loop, otherwise the pointer moving with twice speed reaches the end while the slower pointer reaches midway.

fastptr = head
slowptr = head

while fastptr != null and fastptr->next != null
  fastptr = fastptr->next->next
  slowptr = slowptr->next
  if fastptr == slowptr
    return "loop exists"
return "loop doesnt exist"

If loop exists, then fast pointer meets the slow pointer before slow pointer completes one rotation of the loop. This fact can be easily proved. Once the pointers meet, we can count the number of nodes in the loop.

loopNodeCount = 0
do 
  loopNodeCount += 1
  slowptr = slowptr->next
while slowptr != fastptr
return loopNodeCount

Now using the trick used in the two linked list shorting problem above, we start two pointers, one from the head and other after advancing "loopNodeCount" nodes ahead. The node at which both the pointers meet is the loop node.

advptr = head
for i in range(1, loopNodeCount)
  advptr = advptr->next

normalptr = head
while normalptr != advptr
  normalptr = normalptr->next
  advptr = advptr->next
return advptr

Monty Hall Problem

This is a very famous probability problem. This problem illustrates that probability is sometimes more convincing than "what we perceive as obvious". I'll post another problem that defeats "obvious reasoning".

Monty Hall problem: Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

Solution: show

Red Green Cards: A box has three cards. First card has both sides green, second card has both sides red, third card has one side green and one side red. A card is picked at random and its one side is observed to be green. What is the probability that the other side of the card is also green.

Solution: show

January 30, 2009

Finding Second Smallest Element

Jargon: Order Statistics problem is to find the kth smallest element in an unsorted array A[1..n]. This problem can be solved in O(n) time in average case using randomized select algorithm or in O(n) time in worst case time using an algorithm that chooses the pivot element carefully.

Problem: How to find second smallest element in an array A[1..n] of size n. We would like to do in much less than 2n comparisons.

Answer: Time Complexity = n + ceil(log n) - 2

Solution: show

Finding the smallest element is trivial. The following code would find the smallest element using n-1 comparisons.

min = A[1]
for i = 2 to n
  if A[i] < min 
    min = A[i]
return min

The trivial algorithm to find the second minimum is to keep another variable (smin) along with min and if an element knocks out smin then check if it knocks out min and accordingly do the book-keeping. It can be done using 2n-3 comparisons and takes constant space. A point to observe is that the second smallest element is knocked out by the smallest one at some stage. If we preserve which element is knocked out by whom then finding the second smallest element would just require us to find the smallest amongst the elements knocked out by the smallest element. But in order to do that we need to build the algorithm in a way that we are playing a pair wise tournament. Lets say we play a tournament and knockout n/2 elements after n/2 comparisons. The recurrence relation looks like this:

T(n) = T(n/2) + n/2

Solving this recurrence gives that T(n) = n - 1. The height of the recurrence tree is lg n. So if we just check these elements knocked by root (min) we get the second minimum.For. E.x.

1   5 2   6  
\ /   \ /  
 1     2
  \   /
    1

As soon as we build the tree using n-1 comparisons, we can check the second largest amongst the number knocked out by 1. i.e. 2, 5. so the second minimum is 2. The number of comparison is n + ceil(log n) - 2, where log is taken to the base 2. We implement this algorithm by maintaining a list of indices knocked by an element. The python code is towards the end of this post.

Another algorithm that is very easy to implement but sadly takes 2n - 2logn - 1 in terms of array element comparisons. Simply build a min heap (rooted at 1). Return the smaller of numbers at index 2 or 3 in the array. The cost of min heap building is = 2 * [n/4 + 2n/8 + ...... + (logn - 1) * n / 2^logn]. The solution to this arithmetic - geometric series is 2n - 2logn - 2. Comparing elements at 2nd and 3rd index adds one to the cost. Its very easy to implement though.

Code for n + log n - 2 algorithm: second_smallest.py
Alternate implementation using linked list: second_smallest_linkedlist.py

January 23, 2009

String Permutations

Printing all permutations of a string is a very common interview question. We'll discuss this problem and some interesting variations of it. The original problem of string permutation says, "print all permutations of a string". As an example, if the string is "abc" there are 6 permutations {abc, acb, bac, bca, cab, cba}. Assume that string contains no duplicate elements.

Solution: show

There are many ways in which this problem can be solved. The most easiest of the ways is to code it using simple yet powerful recursion technique.

function permute(str, d)
  if d == length(str)
    print str
  else
    for i <- d to length(str)
      swap(str[d] <-> str[i]) // swap character for permutation
      permute(str, d + 1)
      swap(str[d] <-> str[i]) // undo swap for parent call

Above code performs the swapping operation twice. First time to generate all possible permutations of character at that level and second time to restore to the original string. It is important to restore to the original string (as passed to this call), otherwise the algorithm ends up printing redundant permutations. Additionally, this code assumes that there are no duplicates in the string, else it fails.

There are other ways in which string permutation can be printed that require circular shifting, but it cant get any simpler than above.

Source Code: permute_unique_chars.py

Lets us assume that string can contain duplicates. How do we print all non-redundant permutations of the string. For ex. If string is "abab" the permutations are {baab, abba, abab, baba, bbaa, aabb}

Solution: show

Clearly the solution to first problem fails here. This is due to fact that we generate all permutations of the string after the swapping step. Same character could get swapped to dth spot again, leading to redundant permutations. A very simple trick solves the game. The trick is to pre-sort the string on alphabetical order. Now we need to ensure that while we are swapping, to generate permutation for that particular character, if that matches the last swapped character we ignore generating permutations. Effectively, we are doing a cyclic shift and before the end of the call we reset the cyclic shift. The following code implements the same:

function permute(str, d)
  if d == length(str)
    print str
  else
    lastSwap = Nil
    for i <- d to length(str)
      if lastSwap == str[i]
        continue
      else
        lastSwap = str[i]

      swap(str[d] <-> str[i]) // swap character for permutation
      permute(str, d + 1)

    for i <- d to length(str)-1
       str[i] = str[i+1]
    str[length(str)] = last

If the string is pre-sorted alphabetically then the above algorithm works magically. Note that in this case, after permute is called, we do not undo the swap. This ensures that we are cyclic shifting and hence the intermediate str from d to l is also sorted. After the for loop call, we reverse the cyclic shift. The reason is that sorting ensures that if there are duplicates then the second occurrence of the duplicate element is picked immediately after we finish generating permutations for the first occurrence of the duplicate element. If sorting is not allowed, then you need to maintain a local hashtable (local to each function call), that maintains characters swapped to dth spot and ensures that duplicate candidates are discarded. Presort and make life simple.

Source Code: permute_chars.py

Some other constraints could be added to make the problem more interesting. One such constrain could be of partial ordering of characters. For Ex. 'a' should appear before 'c', and 'b' should appear before 'c' in all permutations. There can exist ordering that leads to no permutation of the string such as 'a' precedes 'b', 'b' precedes 'c', 'c' precedes 'a'. Additionally implementing the permutation algorithm in these cases require checking for ordering violation at each permutation level or just at the leaf of recursion, depending on whether checking for ordering violations is more costly or generating all permutations.

An interesting problem, that talks of special kind of ordering and can be very efficiently computed is as follows: Assume that the string is made up of equal number of '{' and '}'. You need to print all permutations of this string which could satisfy a C compiler i.e. an ending bracket is balanced by an opening bracket. For ex. for '{{}}' the permutations are [{{}}, {}{}].

Solution: show

The hard question is how many such permutations exist. For N open and N close brackets, the number of such permutations is Nth Catalan number = 1/(n+1) * factorial(2*N)/factorial(N)^2. The solution is derived from the problem: Number of paths in rectangular grid.

Permutation problems can be easily solved using recursion. The trick is to formulate all the possible cases at a particular level and code according to the formulation. The steps used below illustrate this approach. Let's formulate the problem in general sense. Assume that at a given step of recursion, we have n open brackets and m closed brackets that are not used so far. Now we can formuate each and every case for a given (n,m) and code the possible substeps.

Problem Formulation:
(n,m) ---> NOT POSSIBLE,   if m < n
(n,m) ---> PRINT AND DONE, if m = 0 and n = 0
(n,m) ---> (n-1, m)        if n > 0   
              +
           (n, m-1)        if m > 0 and m > n

The above formulation is generic and takes into account all possible input values of (n,m). If we put following constraint on the initial value of n and m such as following, then actual implementation can omit some of the above cases:

Constraints: 
n > 0
m == n

The actual implementation that follows above constraints and problem formuation:

function permute(str, n, m)
  if m == 0
    print str
  else
    if n > 0
      permute (str + '{', n - 1, m);

    if m > n
      permute (str + '}', n, m - 1);

To enforce the above mentioned constraints, permute function should be kept as inner/private function and the actual function that should be exposed is as follows:

function GenerateCStyleBrackets(N)
  if N <= 0
    return

  str = new String[2 * N]
  permute(str, N, N) // str is passed as pointer.

The memory cost is O(N). The maximum stack depth is also O(N). So total space cost is O(N).

Source Code: permute_brackets.py

January 22, 2009

Party Friends

Problem: In any party, there exists two people with same number of friends. Is is true or false. Assume that party has more than one people.

Solution: show

January 18, 2009

Innocents and Criminals: Finding Minority Entity

Problem: There are two types of people in a particular city, innocents and criminals. All you know is that innocents are in majority and would like to get rid of criminals and criminals would like to protect themselves from persecution. You can ask any number of questions, with yes/no type answer, from any person in the city. Propose an algorithm which requires asking the minimum number of questions.

Hint: Trivial solution requires asking O(n^2) questions. You can do it in less than 2n questions.

Solution: show

The trivial solution is very simple and is in-fact O(n^2) time algorithm. Round up every person and ask him about the status of everyone else. People with majority vote of criminal are criminals and people with minority vote of criminal are innocents. Following code solves it:

for i = 1 to N
   for j = 1 to N
      if person[i].isCriminal(j)  // doesnt matter if i == j
         vote[j] += 1
      else
         vote[j] -= 1

for j <- 1 to n 
   if vote[j] > 0
      person[j].persecute()

The most efficient algorithm requires asking only 2N-2 questions. The key to solution of this problem lies in the solution to the problem of Finding Majority Element. If we can find one innocent person in the city, then he can label everyone else truthfully. We know for sure that an innocent will tell the truth. Criminals can lie or say truth depending on circumstances.

Assume that we formulate the problem this way. Let us consider a pool of people, who claim to be innocents. Also let us say that we declare that we will pick the first person in the pool to be our innocent man and he will reveal the identity of everyone else. Then definitely both innocents and criminals would try their best to capture the first spot in the pool.

Now, we play a game like this. We choose a person randomly to represent the first person in the pool. Now we select a new person randomly and ask him should the last person inducted in the pool is Innocent. If he says Yes, then we add him to the pool. If he says No then we remove him and the last person from the pool and discard them from further selection. If the pool is empty we select a person not selected previously to get the first spot in the pool.

We repeat the process until we don't have anymore people to consider. The intuition is that even if all Criminals join the pool initially (by lying), then innocents would boot them out as they are in majority. If more innocents join the pool initially then criminals will not be able to boot all innocents out. Hence the first person remaining in the pool is indeed an innocent. We can see this argument working inductively as well.

The following code implements above logic with the help of a stack:

stack.push(person[1])
for i <- 2 to N
   if !stack.isEmpty() && person[i].isCriminal(stack.top())
      stack.pop()
   else
      stack.push(person[i])
return stack.bottom()

January 16, 2009

Finding First Non-Zero Digit of N Factorial

Find the first non-zero digit of N!. Assume N! to be extremely large.

Solution:
If we assume that the product doesnt contain either of 2's and 5's, then finding last digit is trivial. Simple multiply the result with next number and take its mod 10. The problem is complicated as 2's and 5's lead to trailing 0's. So the first step is to remove 2's and 5's. Then solve the problem. The cancel 5's with 2's and multiply back the left over 2's.

For Ex. 7! = 1 * 2 * 3 * 4 * 5 * 6 * 7. We extract 2's and 5's so the product is 1 * 3 * 3 * 7. The trailing digit is 3. We have extracted four 2's and one 5. So we get 3 extra 2's. So multiply to 3 we get 4 as the answer. To check 7! = 5040

d = 1 
n2 = 1

for i = 3 to n
  j = i

  while j%10 == 0 //remove trailing zeros
    j /= 10

  while j%2 == 0 //count of 2's
    j /= 2
    n2 += 1

  while j%5 == 0 //cancel a 5 with a 2
    j /= 5
    n2 -= 1    

  d = (d * j) % 10

d = (d * 2^n2) % 10 //multiply remaining 2's to d
return d

January 13, 2009

Count Trailing Zeros in N Factorial

How many trailing zeros does N factorial contain, for ex. 5! = 120 so ans is 1. Propose an algorithm to count trailing zeros. Assume that N! is very large and does not fit into word size of the computer.

Solution:
This problem is straight forward. N! = 1*2*3*4*5* .... *N. If we count all 5's in this product, then that many trailing zeros are there. This is because a 5 is countered by 2 to produce a zero. Additionally, there are more 2's than 5's so counting 5's gives the right answer. The following pseudo-code does that:

count = 0
i = 5
while i <= n
  j = i
  while j%5 == 0
    j = j/5
    count += 1
  i += 5
return count

This algorithm takes n/5 outer loop iterations and is bounded by log n [base=5] inner loop iterations, hence it takes O(n log n) time. We can clearly do better than this.

Actually, we can solve this problem in O(log n) time. For a given i the term i/5 gives total number of 5's that come before it. For ex. 17/5 is 3, there are 3 numbers that contain fives which are less than 17 i.e. 5, 10, 15. So we count number of 5, then number of 25, then so on.

count = 0
d = r = 1
while r > 0
  d *= 5
  r = [n/d] // [x] is greatest integer less than x
  count += r
return count

This algorithm runs in O(log n) time and an asymptotic improvement on the previous algorithm.

January 12, 2009

Picking k Elements Randomly from Stream

Consider the problem of picking K elements randomly from a stream of N elements. Solve the problem for known and unknown (but finite N).

Solution:
For K=1, the solution is posted in my previous blog post Picking an Element Randomly from Stream.

Case 1: N is known
In this case problem can be solved in O(N) time using O(K) space. The idea is that pick the first element with probability K/N. If the element is picked then pick the next element with probability (K-1)/(N-1) otherwise pick it with probability K/(N-1). To see that the second element is still picked with probability K/N. There are two cases.

First element is picked. So second element is picked with probability = K/N * (K-1)/(N-1).
First element is not picked. So second element is picked with probability = (1-K/N) * K/(N-1).

So the selection probability of second element is = K/N*(K-1)/(N-1) + (1-K/N)*K/(N-1) = K/N. Similar logic extends further. The following code implements this approach.

for i = 1 to N
   if rand <= K/(N+1-i)
      print A[i]
      K -= 1
      if K == 0
         break
      end
   end
end
if K != 0
  print A[N-K:N]
end

Case 2: N is unknown.
If we do not know N beforehand, and yet we want to guarantee that all elements are selected with probability K/N, we can use Reservoir Sampling algorithm.

for i = 1:K
   S[i] = A[i]
end

for i = K+1:length(A)
   j = random(1,i)
   if j <= K
      S[j] = A[i]
   end 
end

The first K elements of the stream are automatically selected. So for N = K this solves the problem. Now consider N = K+1. For i <= N-1 all elements are selected with probability 1 so far. The last element should be selected with probability K/N and so should others be. The index j is less than or equal to K with probability K/(K+1), so last element is rejected with probability 1/(K+1). For the elements in the array at index j they can be rejected if last element is accepted and its index is j. The chances of that is K/(K+1) * 1/K = 1/(K+1). So all elements are rejected with 1/(K+1) probability. Now consider N = K+2. Until index i=K+1 all elements are rejected with probability 1/(K+1). Using same logic as we applied above, we can see that an element gets rejected with probability 2/(K+2).

January 7, 2009

Optimal Coin Change Problem

Given an unlimited supply of coins of denominations C1, C2, ..., CN we wish to make change for a value V. Give an algorithm for producing change with minimum number of coins.

Complexity: Depending on denomination of coins, we can either use fast greedy algorithm or dynamic programming.

Solution:
Consider two specific types of denominations of coins
a) 1, C, C^2, C^3, ..., C^(N-1)
b) 1, C, 2C, 3C, ...., (N-1)C

For both of these arrangements, the problem can be solved using a greedy approach. We see that if we first pay with the highest denomination coin, the value (for which change is required) reduces greatest (as compared to any lower denomination). Additionally we see that optimal solution would require us to use as many highest denomination coin as possible. We can prove this claim using contradiction. The greedy algorithm that is pretty fast and runs in O(N), is as follows:

change = 0
for j = N to 1
  change += int(value/C[j])
  value = value % C[j]
return change

The generic problem of coin change cannot be solved using the greedy approach, because the claim that we have to use highest denomination coin as much as possible is wrong here and it could lead to suboptimal or no solutions in some cases. For ex, if coin denomination are 1, 4, 6 and we want change for 8. Using one coin of denomination 6 forces us to use 2 coins of denomination 1, but clearly the optimal solution is to use 2 coin of denomination 4 each. The key observation to make is the optimal structure of the sub-problem. If we assume for the time being that we have optimal change for all values < i. To produce change for value i, we need to add a coin to some previous solution with value j < i. We need to choose j that minimizes the value of change[i], the O(V*N) algorithm looks like the following:

change[0] = 0 // no coin required.
for i = 1 to V
  change[i] = INF
  for j = 1 to N
    if i >= C[j] and change[i] > change[i - C[j]]
      change[i] = 1 + change[i - C[j]]
return change[n]

To print the choice of coins, we can keep a separate array to track which coin, cj was chosen for a given i.

Source Code: coin_change.py

January 5, 2009

Picking an Element Randomly from Stream

How to pick a number randomly from a stream (or a linked list) with a finite but unknown length N? Solution must use constant space and guarantee that number is picked with probability = 1/N.

Solution:
If we know the length of the stream, then problem is trivial. Simply generate a random number r between 1 and N. Declare the number at index r as the desired random number.

For an unknown but finite N, we need that elements are selected with equal probability. Let us first build the loop invariant property for the solution. Say we have seen k elements so far, the selected number (say n) must be selected with probability 1/k. As we see a new element (say n1), we should set n=n1 with probability 1/(k+1). This ensures that n1 is selected with probability 1/(k+1). In the other case, n (the one NOT replaced by n1) is selected with probability = 1/k * (1-1/(k+1)) = 1/(k+1). Hence this strategy ensures that a number a selected with probability 1/N for N>=k. This loop invariant is adhered by the following code:

n=0, N=0
while Stream.hasElement()
  ++N
  if rand <= 1/N 
    n = Stream.nextElement()
return n

After the algorithm processes all elements, n would contain the randomly selected element and N would contain the number of elements in the stream.

January 1, 2009

Generating Random Numbers

How to generate random integer between A and B, using an unbiased coin. For ex. If A = 3 and B = 6 then your algorithm should generate 3, 4, 5, 6 with equal probabilities.

Solution:
Without loss of generality, we consider the problem of generating random number r in range [0,n] with n=B-A. We can return r+A as the solution to input problem. The number n can be represented using k bits where k = 1 + [log n] ([x] indicates the largest integer smaller than x). Now the k-bits can be generated using the unbiased coin trivially in O(k). The generated number r can be greater than n with probability = 1-(n+1)/2^k. If that happens then repeat the experiment. Pseudo code for the algorithm looks like the following:-

function random(A, B)
  n = B - A
  k = 1 + int(log n)
  r = 0
  for i = 1 to k
    bit = coin_toss(); // head means 1, tail means 0
    r += bit
    r <<= 1
    if r > n
      return random(A, B)
  return r

This function can go into infinite loop. Let us calculate the expected number of times random(A,B) is called. It is called once by the the actual caller and called again if r > n. Let us assume that the expected number of calls be e. The probability that r <= n is given as Pr(r <= n) = (n+1)/2^k.

=> e = 1 * Pr(r <= n) + (1 + e) * Pr(r > n) => e = 2^k/(n+1)

In the worst case, n=2^m+1. In this case, k=1+m. We get

==> e = 2 * 2^m/(2^m + 2) < 2

So expected number of internal calls is at-most 2. This gives a good assurance against going into infinite loops. Note that e can be estimated by using the observation that random variable e - 1 is a geometric distribution. Another way to reach above conclusion is by observing probability of failure, p = 1-(n+1)/2^k < 1/2. Since the probability of failure is less than 1/2, the chances that we run into infinite loop is quite low.

The above solution can be explained in simpler terms as well. Assume that we have people numbered A to B and we are playing a tournament. In each round of the tournament, two players are paired up, they call on the coin toss, player who wins goes to next level. In case if pairs can not be formed we add dummies to form pairs. If a normal player wins we are done, but if dummy wins we repeat the experiment again. So for example if we have 3, 4, 5, 6, 7 as initial number, and we add 'd' as dummy. Then a possible tournament could be as follows:

round 1: (3 vs 4) , (5 vs 6) , (7 vs d) --- winner --> 3, 5, d
round 2: (3 vs 5), (d vs d) -- winner --> 5, d
round 3: (5 vs d) -- winner --> 5

If d wins the last round, we repeat the experiment. This algorithm is just the same as earlier algorithm.