Puzzles, Maths and Algorithms
February 16, 2015
Problem 1: There are three light bulbs in a room. Outside the room are three switches which operate those bulbs. You know that the bulbs are initially turned off, and you can toggle the switches as much as you want. While you toggle the switches, you can't peek inside the room. Finally, when you are done with the switches, you enter the room once and have to tell which switch is associated with which bulb.
Problem 2: Same problem as 1, but now we don't know whether the bulbs are initially on or off. So, in a way, we do not know whether flipping a switch up turns its bulb on or off. All we know is that all the bulbs start in the same state, i.e., all on or all off. Again, we may toggle the switches, enter the room once, and tell which switch operates which bulb.
July 2, 2014
Solving Sudoku Puzzles
Recently, I got hooked on Sudoku (thanks to a wonderful iPhone app). I found it to be a slightly more constructive time sink and plan to teach it to my daughter some day (she is one year old as of now). For this, I thought of writing assistive software to teach kids how to simplify a Sudoku. This motivated me to first build a Sudoku solver, and this blog post is about it.
A Sudoku is a matrix of cells, typically 9x9, but it can be any size. In this post, we consider N x N Sudokus, where N is a perfect square (1x1, 4x4, 9x9, 16x16, ...). For such a Sudoku, we first construct groups of cells called blocks. We get N row blocks, N column blocks, and N sub-grid blocks, each containing N cells. A cell can have a value from 1 to N such that, if a cell is assigned a value, then no other cell in the same block can be assigned that value. One check that a Sudoku is solved is that every block sums to N(N+1)/2. A stricter check is that the uniqueness constraint is satisfied in every block and each cell is assigned a value from 1 to N.
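As a quick illustration of the stricter check, here is a minimal Python sketch. It is my own, not the blog's Java program; the grid representation (a list of N lists of ints) is an assumption for illustration.

```python
def blocks(n):
    """Yield the 3n blocks (rows, columns, sub-grids) of an n x n grid
    as lists of (row, col) coordinates; n must be a perfect square."""
    s = int(n ** 0.5)
    for i in range(n):
        yield [(i, j) for j in range(n)]                            # row block
        yield [(j, i) for j in range(n)]                            # column block
        r, c = (i // s) * s, (i % s) * s
        yield [(r + a, c + b) for a in range(s) for b in range(s)]  # sub-grid block

def is_solved(grid):
    """Every block must contain exactly the values 1..n."""
    n = len(grid)
    return all(sorted(grid[r][c] for r, c in blk) == list(range(1, n + 1))
               for blk in blocks(n))
```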
To simplify a Sudoku by hand, I typically use the following techniques:
1. A number is assigned to a cell if:
   - (1a) the cell can hold only that number, or
   - (1b) that number can be held by only that cell in at least one of its blocks.
2. Two numbers are assigned to two cells if those two cells share at least one block (note: two cells can share at most two blocks) and:
   - (2a) the two numbers appear as hints only in these two cells in a given block; in this case, the other hints of these two cells are pruned, or
   - (2b) the two cells hint at those two numbers only; in this case, the two numbers are removed from the hints of the other cells in their shared block.
3. To finally solve it, simplify the Sudoku by applying (1) and (2) until no further simplification is possible. Then pick a cell with the fewest hints, assign it one of those hints, and repeat this brute-force method recursively (see the sketch after this list). If that fails, try another hint; if all hints fail, the Sudoku is not solvable.
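Here is a minimal sketch of the recursive step (3) in Python. It is my own illustration, not the blog's Java program: the hints mapping and the simplify() routine (assumed to apply rules (1) and (2) to a fixed point and return False on a contradiction) are hypothetical.

```python
def solve(hints):
    """Brute-force step (3) on top of simplification.

    hints: dict mapping each cell (row, col) to its set of candidate values.
    simplify() is an assumed routine applying rules (1) and (2) until no
    further change, returning False when some cell runs out of candidates."""
    if not simplify(hints):
        return None                                     # fail fast: contradiction
    unsolved = [c for c in hints if len(hints[c]) > 1]
    if not unsolved:
        return hints                                    # every cell fixed: solved
    cell = min(unsolved, key=lambda c: len(hints[c]))   # fewest hints first
    for value in sorted(hints[cell]):
        trial = {c: set(v) for c, v in hints.items()}   # copy before guessing
        trial[cell] = {value}
        result = solve(trial)
        if result is not None:
            return result
    return None                                         # all guesses failed
```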
The program is event-driven: a cell is asked to remove a hint based on the logic of (1), (2), and (3). If, in this process, a cell has only one hint remaining, it triggers a fix event. When the fix event fires, the cell is assigned that value, and the value is removed from the hints of all cells that share a block with it. If at any time an inconsistency is found, the events return false, indicating that the Sudoku is not solvable. This fail-fast technique quickly determines whether a Sudoku is solvable; otherwise, we would wait for all cells to get filled and only then realize that it failed (wasting precious time). A simple brute-force strategy is very easy to code, but the number of possibilities to check is huge; e.g., for a 9x9 grid we might end up checking O(9^81) possibilities, which is prohibitive. The simplification algorithms (1) and (2) come to the rescue by drastically simplifying the Sudoku. For most easy problems, the recursive step (3) is not even triggered.
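The fix event cascades: fixing one cell can reduce a neighbor to a single hint, which triggers another fix. A minimal sketch of this propagation (my own Python illustration; the peers map, from each cell to the cells sharing a block with it, is assumed):

```python
def fix(hints, cell, value, peers):
    """Assign value to cell and remove it from the hints of all peers.
    Returns False as soon as an inconsistency is found (fail fast)."""
    hints[cell] = {value}
    for p in peers[cell]:
        if value in hints[p]:
            hints[p].remove(value)
            if len(hints[p]) == 0:
                return False              # a peer has no candidate left
            if len(hints[p]) == 1:        # one hint left: trigger a fix event
                if not fix(hints, p, next(iter(hints[p])), peers):
                    return False
    return True
```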
I searched online for Sudoku problems and found that Peter Norvig has a blog post about it. He used algorithms (1) and (3) but not (2), and has shared sets of easy, hard, and hardest Sudoku puzzles. It amazes me that for most hard problems I take at least 8-10 minutes (and sometimes a hint), while a computer can solve them on the order of 0.1 ms. If you are interested in the mathematics of Sudoku (e.g., the number of possible solutions), please check out the Wikipedia page on the maths of Sudoku.
In the first analysis below, we see what fraction of the cells different algorithms are able to solve.
| | Input | (1) | (2a) | (2b) | (2) | (1+2a) | (1+2b) | (*) |
|---|---|---|---|---|---|---|---|---|
| Easy | 35% | 89% | 54% | 68% | 70% | 96% | 94% | 96% |
| Hard | 25% | 31% | 26% | 26% | 26% | 41% | 31% | 41% |
| Hardest | 30% | 45% | 35% | 35% | 36% | 50% | 45% | 50% |
Here we see that algorithm (1) by itself can solve 89% of the cells for easy problems. The joint application (1+2), however, solves 96% of the easy cells and 41% of the hard cells (roughly a 10-15% improvement). This is a great simplification, especially for easy Sudoku problems. Between (2a) and (2b), I find (2a) the better strategy to run in conjunction with (1), as (1+2a) gives the same result as (*).
In the next analysis, we see how many hints are left after applying the different algorithms.
| | Input | (1) | (2a) | (2b) | (2) | (1+2a) | (1+2b) | (*) |
|---|---|---|---|---|---|---|---|---|
| Easy | 474 | 29 | 126 | 89 | 82 | 10 | 16 | 10 |
| Hard | 544 | 230 | 252 | 260 | 250 | 184 | 228 | 181 |
| Hardest | 508 | 161 | 191 | 193 | 189 | 141 | 158 | 141 |
A similar analysis as above: (2a) appears better than (2b), but (*) is better than (1) or (2) alone, so combining these strategies is much more effective. We see that (1+2a) is nearly as good as (*), so we could simply drop algorithm (2b). To see how much (if at all) of a penalty running the (2b) algorithm incurs, we turn to the time taken.
For measuring time, I use Java's System.nanoTime() function, which gives very precise timings. Experiments are run on my MacBook Pro with a 2.6 GHz Intel i7 processor. To get the timings, I solve a puzzle 500 times and take the average solving time. We get the following timings.
| | Brute Force | (1) | (1+2a) | (1+2b) | (*) |
|---|---|---|---|---|---|
| Easy | 0.14 ms | 0.04 ms | 0.03 ms | 0.03 ms | 0.03 ms |
| Hard | 21 ms | 1.16 ms | 0.44 ms | 0.55 ms | 0.42 ms |
| Hardest | 0.61 ms | 0.17 ms | 0.20 ms | 0.19 ms | 0.22 ms |
Clearly, brute force without simplification is a bad idea. (*) improves the timing over (1), at least for hard problems; for the other problem sets the numbers are too close to draw conclusions. My experience with other random problems suggests that (*) leads to the fastest simplification (i.e., fewer iterations) and is hence better.
I also tried the problem that Peter mentions as taking the most time and noted its timing in my Java implementation: 6134 ms for algorithm (1), but only 10 ms for (*). This is clear evidence of the joint algorithms' usefulness. The improved timing comes from the fact that the Sudoku gets drastically simplified before entering the brute-force stage, and each brute-force choice further simplifies it by eliminating hints.
The Sudoku program is generic in the sense that it can solve 1x1, 4x4, 9x9, 16x16, … problems. A quick run of the program is as follows:
Solving a Sudoku puzzle of size 1 ...
===
| * |
===
[status] solved=0 hints=1
===
| 1 |
===
[status] SOLVED
Time taken=0.022 ms
Solving a Sudoku puzzle of size 4 ...
===== =====
| 4 * | 2 * |
| * * | 4 1 |
===== =====
| 1 * | * * |
| * * | 1 4 |
===== =====
[status] solved=7 hints=36
===== =====
| 4 1 | 2 3 |
| 2 3 | 4 1 |
===== =====
| 1 4 | 3 2 |
| 3 2 | 1 4 |
===== =====
[status] SOLVED
Time taken=0.132 ms
Solving a Sudoku puzzle of size 9 ...
======= ======= =======
| * * * | 7 * * | 6 * 3 |
| 7 * 3 | 9 * * | * 1 2 |
| * * * | 5 3 * | * * * |
======= ======= =======
| 1 * * | * * * | * * 7 |
| 9 * * | * * * | * * * |
| * 4 5 | * * * | * 9 6 |
======= ======= =======
| 3 * * | * * * | 9 * * |
| * * * | * * 1 | * 6 5 |
| * 1 * | 3 6 * | * 7 4 |
======= ======= =======
[status] solved=27 hints=486
======= ======= =======
| 5 9 4 | 7 1 2 | 6 8 3 |
| 7 8 3 | 9 4 6 | 5 1 2 |
| 6 2 1 | 5 3 8 | 7 4 9 |
======= ======= =======
| 1 6 2 | 8 5 9 | 4 3 7 |
| 9 3 7 | 6 2 4 | 1 5 8 |
| 8 4 5 | 1 7 3 | 2 9 6 |
======= ======= =======
| 3 5 6 | 4 8 7 | 9 2 1 |
| 4 7 8 | 2 9 1 | 3 6 5 |
| 2 1 9 | 3 6 5 | 8 7 4 |
======= ======= =======
[status] SOLVED
Time taken=2.198 ms
Solving a Sudoku puzzle of size 16 ...
============= ============= ============= =============
| 8 7 * * | * * * * | * 3 * * | 13 * 4 * |
| * 5 14 * | * * 3 10 | 15 9 1 * | * 6 * * |
| 16 * * * | 5 8 7 * | * 14 * * | 9 * 11 12 |
| * * 4 * | * 14 6 13 | * 11 10 12 | * 7 * 3 |
============= ============= ============= =============
| 14 * * 8 | * * 1 * | * * * 3 | 7 4 12 * |
| 9 * * * | * 6 15 12 | * * 13 14 | * 3 1 * |
| 11 * 10 3 | * * 13 * | * 8 * 1 | * * 6 * |
| 6 * * 1 | 14 * 4 * | * 5 * 9 | 11 * * 13 |
============= ============= ============= =============
| * * * * | 15 * * * | * * 9 * | 5 * 2 10 |
| 10 1 * * | 6 * 5 * | 13 15 7 16 | * * * * |
| * * 16 11 | * 4 * 8 | 2 * * * | * 13 * 7 |
| * 9 * 7 | 1 3 * 2 | 6 * 8 10 | 16 15 14 4 |
============= ============= ============= =============
| 7 * 13 * | 9 16 * 5 | * * 14 4 | 3 8 * 2 |
| * * 3 * | 10 * * * | * * * * | * 16 15 * |
| 1 * 9 * | * * 14 4 | * * * * | * * 7 * |
| * 6 8 * | 3 * * * | 10 7 * * | * * * * |
============= ============= ============= =============
[status] solved=116 hints=2240
============= ============= ============= =============
| 8 7 11 10 | 2 12 9 1 | 5 3 16 6 | 13 14 4 15 |
| 12 5 14 13 | 4 11 3 10 | 15 9 1 7 | 2 6 16 8 |
| 16 3 1 6 | 5 8 7 15 | 4 14 2 13 | 9 10 11 12 |
| 2 15 4 9 | 16 14 6 13 | 8 11 10 12 | 1 7 5 3 |
============= ============= ============= =============
| 14 13 15 8 | 11 2 1 9 | 16 10 6 3 | 7 4 12 5 |
| 9 4 7 5 | 8 6 15 12 | 11 2 13 14 | 10 3 1 16 |
| 11 2 10 3 | 7 5 13 16 | 12 8 4 1 | 15 9 6 14 |
| 6 16 12 1 | 14 10 4 3 | 7 5 15 9 | 11 2 8 13 |
============= ============= ============= =============
| 3 8 6 12 | 15 13 16 7 | 14 4 9 11 | 5 1 2 10 |
| 10 1 2 4 | 6 9 5 14 | 13 15 7 16 | 8 12 3 11 |
| 15 14 16 11 | 12 4 10 8 | 2 1 3 5 | 6 13 9 7 |
| 13 9 5 7 | 1 3 11 2 | 6 12 8 10 | 16 15 14 4 |
============= ============= ============= =============
| 7 11 13 15 | 9 16 12 5 | 1 6 14 4 | 3 8 10 2 |
| 5 12 3 14 | 10 7 8 6 | 9 13 11 2 | 4 16 15 1 |
| 1 10 9 2 | 13 15 14 4 | 3 16 5 8 | 12 11 7 6 |
| 4 6 8 16 | 3 1 2 11 | 10 7 12 15 | 14 5 13 9 |
============= ============= ============= =============
[status] SOLVED
Time taken=1.346 ms
Solving all puzzles from: ./src/sudoku/hardest.txt
Solved : 11/11
Solved Cells : 270(input) 891(finally) 891(expected)
Cell Hints : 5589(input) 0(finally)
Time Taken : 2.59014144 milli-seconds
Avg time : 0.23546740363636365 milli-seconds
Solving all puzzles from: ./src/sudoku/hard.txt
Solved : 95/95
Solved Cells : 1953(input) 7695(finally) 7695(expected)
Cell Hints : 51678(input) 0(finally)
Time Taken : 41.772783616 milli-seconds
Avg time : 0.4397135117473684 milli-seconds
Solving all puzzles from: ./src/sudoku/easy.txt
Solved : 50/50
Solved Cells : 1418(input) 4050(finally) 4050(expected)
Cell Hints : 23688(input) 0(finally)
Time Taken : 1.868110848 milli-seconds
Avg time : 0.03736221696 milli-seconds
A word of advice to readers: while building the Sudoku solver (or similar programs), you might be tempted to optimize it. I suggest it is more important for the code to be readable, debuggable, and concise. I tried to optimize my code but realized it is not worth it: using some tricks I got a speedup of about 1 ms for a specific algorithm, but that meant a highly cryptic data structure, which increased the timing of another algorithm and was a nightmare to debug. Just do the basic optimizations. As a side effect of my attempt, I learnt a few useful things to speed up Java code:
- byte is more expensive than int, so use int.
- An int check (e.g., i == 0) is very slightly faster than a boolean check, but don't use it if it compromises readability.
- Exception handling (even if the exception is never thrown) is expensive.
- Java doesn't do tail-call optimization, so don't worry too much about reformatting your recursive routines.
- The final keyword has no effect on performance in Java; if you think your one-liner function gets inlined because of it, it does not.
- Avoid ArrayList in favor of fixed arrays. Even if the ArrayList has size k and the array size n with k < n, the array is faster, unless k is much smaller than n.
- HashMap and HashSet can easily hog performance; they can easily double the running time. If you can avoid them, do.
- A 1-d array is cheaper than a 2-d array.
- Using objects is extremely cheap; I found next to no penalty in using them. So use them to simplify code.
- Caching can be useful. For example, instead of referring to A[i] several times, it is faster to set x = A[i] and refer to x.
- private, public, and other qualifiers only increase the size of the bytecode but have no performance impact, so use them for good design principles, not for performance.
May 2, 2014
Stick Breaking
Problem Set 1: Given a stick of length 1 meter, it is cut into two parts randomly.
- Problem 1a: What is the expected length of the smaller stick?
- Problem 1b: What is the expected length of the bigger stick?
Problem Set 2: Given a stick of length 1 meter, it is cut into three parts randomly.
- Problem 2a: What is the expected length of the smallest stick?
- Problem 2b: What is the expected length of the largest stick?
- Problem 2c: What is the expected length of the stick with second smallest (or second largest) length?
- Problem 2d: What is the expected length of the middle stick (stick that is between the two cuts)?
- Problem 2e: What is the probability that the three sticks would form a triangle?
- Problem 2f: What is the probability that the three sticks would form a right angle triangle?
- Problem 2g: What is the probability that the three sticks would form an equilateral triangle?
Problem Set 3: Given a stick of length 1 meter, it is cut into two parts randomly. Then the bigger part is picked and again cut into two parts randomly.
- Problem 3a: What is the expected length of the smallest stick?
- Problem 3b: What is the expected length of the largest stick?
- Problem 3c: What is the expected length of the stick with second smallest (or second largest) length?
- Problem 3d: What is the expected length of the middle stick (stick that is between the two cuts)?
- Problem 3e: What is the probability that the three sticks would form a triangle?
- Problem 3f: What is the probability that the three sticks would form a right angle triangle?
- Problem 3g: What is the probability that the three sticks would form an equilateral triangle?
Generalized Cut Problem: Find the solutions to problems 2a-2c and 3a-3c for n-1 cuts.
Solution:
Solution for problem 1: 1/4 (1a), 3/4 (1b)
Solution for problem 2: 1/9 (2a), 11/18 (2b), 5/18 (2c), 1/3 (2d), 1/4 (2e), 0 (2f), 0 (2g)
Solution for problem 3: 15/16 - 2log(3/2) (3a), …
Let us work out the solution for 1a. The smaller stick has length x, which is uniformly distributed in the range [0, 0.5]. The probability density of x is p(x) = 1/(1/2 - 0) = 2. Indeed it is a pdf: ∫_0^0.5 p(x) dx = 1. The expected length of the smaller stick is E[x] = ∫ x p(x) dx = ∫_0^0.5 2x dx = 1/4.
2a) Let us work out the solution for 2a. Let the two cuts be at distances x and y from one end, and let x < y. So the three sticks have lengths x, y-x, and 1-y. Let z = min(x, y-x, 1-y) be the length of the smallest stick, and let Pr(z > a) be the probability that the smallest stick is longer than a.
=> Pr(z > a) = 2 Pr(x > a, y-x > a, 1-y > a) = 2 Pr(x > a, a+x < y < 1-a)
The multiplication by 2 covers the other case (x > y). Since a+x < y < 1-a, we get x < 1-2a. Also, setting x = a, we get a < 1/3 (which is obvious).
=> Pr(z > a) = 2 ∫_a^(1-2a) ∫_(a+x)^(1-a) dy dx = (1-3a)^2
So Pr(z <= a) = 1 - Pr(z > a), and the density of z is the derivative of Pr(z <= a), which is 6(1-3a).
=> E[z] = ∫_0^(1/3) 6a(1-3a) da = 1/9
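A quick symbolic check of this last integral (my own sketch using sympy):

```python
from sympy import symbols, integrate, Rational

a = symbols('a')
print(integrate(6 * a * (1 - 3 * a), (a, 0, Rational(1, 3))))  # prints 1/9
```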
2b) We can do a similar derivation for z = max(x, y-x, 1-y) to get the expected maximum length. It is slightly more complicated; unlike above, we compute Pr(z < a) here.
=> Pr(z < a) = Pr(x < a, y-x < a, 1-y < a) = Pr(1-2a < x < a, 1-a < y < x+a)
We need to be careful here to respect constraints like x > 0, x < y, y < 1. To ensure they are respected, Pr(z < a) is broken into three cases. Case 1: a < 1/2. Case 2: a >= 1/2, 0 < x < 1-a. Case 3: a >= 1/2, 1-a < x < a. Then we differentiate the Pr(z < a) obtained in each case to get the density (don't mix terms from the three cases).
Integrate the first case over [1/3, 1/2] and the second and third cases over [1/2, 1]. We get the desired answer, 1/2 + 1/9 = 11/18. (Don't forget to multiply by 2 to cover the x > y case.)
2c) This is now easy to obtain: 1 - (2a) - (2b) = 1 - 1/9 - 11/18 = 5/18.
2d) By symmetry, the three pieces x, y-x, and 1-y have the same expected length, so the expected length of the middle stick (the one between the two cuts) is 1/3.
2e) To form a triangle, three inequalities must be satisfied (apart from the implicit x < y):
x + (y-x) > 1-y
x + (1-y) > y-x
(y-x) + (1-y) > x
=> 1/2 < y < x + 1/2 and x < 1/2
So x follows the pdf p(x) = 2 on [0, 1/2], and y satisfies its range (an interval of length x) with probability x. So the probability of triangle formation is ∫_0^(1/2) 2x dx = 1/4.
2f) There are discrete stick-length triples, e.g., (3/12, 4/12, 5/12), that satisfy this property. They are countably infinite, yet the area they occupy is 0, so the probability is 0 (the same argument gives 0 for 2g).
Problem set 3 is slightly more complicated. Let us work out 3a).
Let the first cut be at distance x from the shorter side (x < 1/2). Now, from the stick of length 1-x, we make a random cut; let y be the length of its shorter piece, so y < (1-x)/2. If x is the shortest stick then x < y, which implies x < 1/3.
Case 1: 1/3 < x < 1/2 and 0 < y < (1-x)/2 => y is the shortest stick.
Case 2: 0 < x < 1/3.
Case 2a: 0 < y < x => y is the shortest stick.
Case 2b: x < y < (1-x)/2 => x is the shortest stick.
We also note that p(x) = 2 (uniform on [0, 1/2]) and p(y) = 2/(1-x) (uniform on [0, (1-x)/2]).
So the expected minimum length is
∫_(1/3)^(1/2) ∫_0^((1-x)/2) 4y/(1-x) dy dx + ∫_0^(1/3) ∫_0^x 4y/(1-x) dy dx + ∫_0^(1/3) ∫_x^((1-x)/2) 4x/(1-x) dy dx = 15/16 - 2 log(3/2)
For the generalized problem, see David and Nagaraja's Order Statistics (pp. 133-135 and p. 153).
The solutions can also be verified through Monte Carlo simulation. The following Python code solves problem sets 2 and 3.
    from random import random

    def prob2():
        niter = 100000
        n, ntri, nrtri = 0.0, 0, 0
        lsmall, llarge, lmid, lcenter = 0, 0, 0, 0
        while n < niter:
            n += 1
            p = random()
            q = random()
            x = min(p, q)
            y = max(p, q)
            l1 = x
            l2 = y - x
            l3 = 1 - y
            mn = min(l1, l2, l3)
            mx = max(l1, l2, l3)
            md = 1 - mn - mx
            lsmall += mn
            llarge += mx
            lmid += md
            lcenter += l2  # middle stick (between the two cuts), l2 = y - x
            if l1 + l2 > l3 and l1 + l3 > l2 and l2 + l3 > l1:
                ntri += 1
            if mn**2 + md**2 == mx**2:  # exact float equality: effectively never
                nrtri += 1
        print 'min len:', lsmall/n
        print 'max len:', llarge/n
        print 'mid len:', lmid/n
        print 'center stick len', lcenter/n
        print 'triangle prob:', ntri/float(n)
        print 'right triangle prob:', nrtri/float(n)

    def prob3():
        niter = 100000
        n, ntri, nrtri = 0.0, 0, 0
        lsmall, llarge, lmid, lcenter = 0, 0, 0, 0
        while n < niter:
            n += 1
            p = random()
            q = random()
            l1 = 0.5 * (1 - p)
            l2 = 0.25 * (1 + p) * (1 - q)
            l3 = 0.25 * (1 + p) * (1 + q)
            mn = min(l1, l2, l3)
            mx = max(l1, l2, l3)
            md = 1 - mn - mx
            lsmall += mn
            llarge += mx
            lmid += md
            lcenter += (l2 + l3) / 2
            if l1 + l2 > l3 and l1 + l3 > l2 and l2 + l3 > l1:
                ntri += 1
            if mn**2 + md**2 == mx**2:
                nrtri += 1
        print 'min len:', lsmall/n
        print 'max len:', llarge/n
        print 'mid len:', lmid/n
        print 'center stick len', lcenter/n
        print 'triangle prob:', ntri/float(n)
        print 'right triangle prob:', nrtri/float(n)
May 1, 2014
Defective Balls
For problems 1-3, consider a weighing scale (with two sides) that only tells which side is heavier (it doesn't tell by how much).
Problem 1: You have 9 balls of the same weight, except that 1 ball is defective and is known to be lighter. Find the defective ball in two weighings.
Problem 2: You have 9 balls of the same weight, except that 1 ball is defective (it can be heavier or lighter; we don't know). Find the defective ball in three weighings.
Problem 3: You have 9 balls of the same weight, except that 2 balls are defective and are known to be lighter. Find the defective balls in four weighings.
Problem 4: There are 9 machines that produce balls. You get 9 balls from each machine during a production cycle. You are given a weighing scale that gives the exact weight. You are told that one machine produced defective balls in a given production cycle. Find the defective machine through a single weight measurement. Assume that a perfect ball weighs 1 lb and a defective ball weighs 1.1 lbs.
Problem 5: Consider problem 4, but now assume that the weight of a defective ball is unknown. Find the defective machine in two weighings.
Problem 6: There are two jars A and B. A contains 50 perfect balls and B contains 50 defective balls. Arrange the 100 balls in the two jars in a way that maximizes the probability of picking a perfect ball (by first picking a jar at random, then a ball from it).
Solution:
For problems 1-3, divide the 9 balls into 3 groups of 3 balls each.
Problem 1: Weigh group 1 against group 2. If they weigh the same, the defective ball is in group 3. Pick two balls from group 3 and weigh them. If they weigh the same, then the ball left out (from group 3) is defective. All other cases are also easy to see.
Problem 2: Weigh group 1 against group 2, then group 2 against group 3. After these two weighings you know which group contains the defective ball as well as whether that ball is heavy or light. Then simply pick two balls from the identified group and weigh them.
Problem 3: This problem has two cases. Case 1: one group contains both defective balls. Case 2: two groups contain one defective ball each. Weigh group 1 against group 2, then group 2 against group 3. After these two weighings you know which case holds and which groups contain the defective balls. Case 1 is now easy to solve. Consider case 2, say group 1 and group 2 each contain one defective ball. We can find the defective ball in each group INDEPENDENTLY in one weighing each (hence four weighings in total).
Problem 4: Let the defective machine be x. Pick 1 ball from machine 1, 2 from machine 2, and so on. The total weight would be n(n+1)/2 + 0.1x, where n = 9. Solve for x.
Problem 5: Form two equations. Let a defective ball weigh y. First pick 1 ball from machine 1, 2 from machine 2, and so on. We get the equation:
n(n+1)/2 - x(1-y) = A
Next pick 1 ball from machine n, 2 from machine n-1, and so on (machine x contributes n-x+1 balls). We get the equation:
n(n+1)/2 - (n-x+1)(1-y) = B
We have two equations and two unknowns. We get x = a(n+1)/(a+b), where a = n(n+1)/2 - A and b = n(n+1)/2 - B.
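A quick numeric check of problem 5 (my own sketch; the machine index and defective weight below are made up for illustration):

```python
def recover_defective_machine(n, true_x, y):
    """Simulate the two weighings for machine true_x with defective weight y
    (a perfect ball weighs 1), then recover x from A and B as derived above."""
    A = n * (n + 1) / 2 - true_x * (1 - y)            # 1, 2, ..., n balls
    B = n * (n + 1) / 2 - (n - true_x + 1) * (1 - y)  # n, n-1, ..., 1 balls
    a = n * (n + 1) / 2 - A
    b = n * (n + 1) / 2 - B
    return a * (n + 1) / (a + b)

print(recover_defective_machine(9, true_x=4, y=1.1))  # 4.0
```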
Problem 6: Leave one perfect ball in jar A and put the rest in jar B. The probability of picking a perfect ball is now (1/2)(1) + (1/2)(49/99) = 74/99, roughly 0.75. This is clearly the best we can do.
December 9, 2013
Palindrome Numbers
Problem 1: Given a number n, check whether it is a palindrome. For example, n = 303 is a palindrome, whereas n = 302 is not. Give an O(log n) algorithm.
Problem 2: Given a number n, find the largest palindrome that is less than or equal to n. For example, for n = 36, the answer is 33. Give an O(log n) algorithm.
Problem 3: Given a number n, find all the palindromes less than or equal to n. Give an efficient O(k) algorithm, where k is the number of such palindromes.
Problem 4: Prove that every palindrome with an even number of digits is divisible by 11. For example, 6996 is divisible by 11.
Problem 5: Consider a palindrome number generator as follows.
Step 1: n = n + reverse(n)
Step 2: If !is_palindrome(n): Goto Step 1
Given a palindrome number m, find the smallest n which can generate m using this generator. Is it possible that no such n exists?
For example, for m = 121 we get n = 19. To see this: n1 = 19 + 91 = 110, and in the next step n2 = 110 + 011 = 121.
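No solutions were posted for this one, but here is a minimal Python sketch of the Problem 5 generator, using the O(log n) digit-reversal check from Problem 1. The step cap is my own safeguard, since it is not obvious that the iteration always terminates.

```python
def reverse_num(n):
    r = 0
    while n > 0:
        r, n = r * 10 + n % 10, n // 10
    return r

def is_palindrome(n):
    return n == reverse_num(n)   # O(log n): one pass over the digits

def generated_palindrome(n, max_steps=1000):
    """Run the generator from n and return the first palindrome reached."""
    for _ in range(max_steps):
        n += reverse_num(n)
        if is_palindrome(n):
            return n
    return None  # did not converge within the step budget

def smallest_generator(m):
    """Smallest n whose generator run ends exactly at the palindrome m."""
    for n in range(1, m):
        if generated_palindrome(n) == m:
            return n
    return None

print(smallest_generator(121))  # 19, via 19 -> 110 -> 121
```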
December 1, 2013
Finding Relevant Keywords from Large Corpus of Text
Problem 1: Given a large paragraph containing n words (separated by spaces) and k keywords, find the smallest distance between these keywords in the paragraph.
To illustrate the problem, consider the paragraph A B A C E D O A B, and say we are interested in the two keywords A and D. The closest they appear is at positions 8 and 6 (D O A), and the minimum distance is 1 (i.e., the number of non-keyword words between them).
Build an efficient algorithm with O(k) space and O(n log k) running cost.
Solution:
To solve this problem, we register the last seen position of each of the k keywords as we read the paragraph word by word. If the next word matches a keyword, we update that keyword's last seen position and compute the minimum over all registered positions; that minimum gives the current span and helps us track the smallest distance.
To implement this efficiently, we need a hash map and a min-heap (a tree-based implementation, not array-based). Following is the code to solve this problem.
Let K[i] indicate the ith keyword and W[i] the ith word in the paragraph.

    // init map and heap
    map = {}
    heap = []
    for i in 1 to k
        // init heap with -Inf as position of the keyword
        node = heap_node()
        node.val = -Inf
        // map stores the keyword and a pointer to its node in the heap
        heap.put(node)
        map.put(K[i], node)
    end

    idx_start = 0
    idx_end = 0
    min_dist = Inf
    for i in 1 to n
        w = W[i]
        // check if w is one of the keywords; if not, ignore
        if !map.contains_key(w)
            continue
        end
        // get the heap node corresponding to the keyword and update its position
        node = map.get(w)
        node.val = i
        // re-heapify the min heap
        heap.heapify()
        // min of all positions
        idx = heap.min()
        // if idx == -Inf then some keyword has not been seen yet
        if idx == -Inf
            continue
        end
        // compute distance between keywords
        dist = i - idx - k + 1
        if dist < min_dist
            idx_start = idx
            idx_end = i
            min_dist = dist
        end
    end

The above algorithm requires O(k) memory for the map and the heap. For each word, it requires O(log k) to re-heapify the min heap.
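For reference, here is a simple Python version of the same idea (my own sketch). Instead of a tree-based heap it recomputes the minimum by scanning the k last positions, so it runs in O(nk) rather than O(n log k); that is enough to validate the design for small k.

```python
def min_keyword_distance(words, keywords):
    """Return (distance, start index, end index) of the closest span
    containing all keywords, or None if some keyword never appears."""
    last = {kw: None for kw in keywords}   # last seen position per keyword
    best = None
    for i, w in enumerate(words):
        if w not in last:
            continue
        last[w] = i
        if any(p is None for p in last.values()):
            continue                       # some keyword not seen yet
        idx = min(last.values())           # O(k) scan instead of a heap
        dist = i - idx - len(keywords) + 1
        if best is None or dist < best[0]:
            best = (dist, idx, i)
    return best

print(min_keyword_distance("A B A C E D O A B".split(), ["A", "D"]))
# (1, 5, 7) with 0-indexed positions
```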
Problem 2: Given a large paragraph of words (separated by spaces), consider N tuples of names and interests. If a person and their interest appear within K words of each other, it is a match; otherwise it is not. Find all matches in the paragraph.
To illustrate the problem, consider the paragraph "John likes swimming and Jack likes Tennis" and the interest map <john swimming>, <john tennis>, <jack swimming>, <jack tennis>. For K = 3, 4, 5, 6, the matches found in the paragraph are <john swimming>, <jack swimming>, <jack tennis>.
Propose an efficient algorithm for two cases: K is small, and K is very large.
August 12, 2012
Bulls and Cows
Bulls and Cows is a two-player game. The first player thinks of an N-digit secret number and the second player guesses it: the second player makes a guess, and the first player responds with the number of bulls and cows in that guess. A digit of the guess that matches the secret in the right position is a bull; a digit that matches but is in a different position is a cow. For example, let the secret be 0123 and the guess be 0012; the guess has 1 bull and 2 cows. The game continues until the second player gets a response of N bulls.
The problem is to write a computer solver for bulls and cows that would try to guess the secret making only a small number of attempts.
Solution:
Let D be the number of distinct digits that can appear in the secret, and let N be the length of the secret number. The total number of possible secrets is D^N.
A naive solution is to try all D^N guesses until N bulls is the response. This approach makes D^N attempts in the worst case. Not good.
We can do better. Let us make a random first guess and get a response r. We create a pruned set by selecting all numbers (out of the D^N possible) whose response against our guess is r. The secret must be in the pruned set. We pick a number from this set as our next guess and repeat the logic until we hear N bulls.
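A minimal Python sketch of the response computation and the pruning step (my own illustration; the names are hypothetical):

```python
from itertools import permutations

def response(secret, guess):
    """Return (bulls, cows) for a guess against a secret, both as strings."""
    bulls = sum(s == g for s, g in zip(secret, guess))
    matched = sum(min(secret.count(d), guess.count(d)) for d in set(guess))
    return bulls, matched - bulls

def prune(candidates, guess, resp):
    """Keep only the candidates consistent with the observed response."""
    return [c for c in candidates if response(c, guess) == resp]

# all N = 4, D = 10 secrets with distinct digits: 5040 candidates
candidates = [''.join(p) for p in permutations('0123456789', 4)]
print(response('0123', '0012'))  # (1, 2): 1 bull, 2 cows
```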
There are three ways to pick the next number:
a) Pick the first number from the pruned set.
b) Pick a number randomly from the pruned set.
c) Use a heuristic to pick a number (note: this number can be outside the pruned set).
Our experiments show that b) is better than a). For N = 4, D = 10 with distinct digits in the secret (5040 possible secrets) and an initial guess of 0123, the pick-first strategy makes 6.0051 attempts on average and 9 attempts in the worst case. Randomly picking from the pruned set makes 5.9136 attempts on average and 9 attempts in the worst case.
Note that the optimal strategy for N = 4, D = 10 with distinct digits in the secret takes 5.2131 attempts on average, with 7 attempts in the worst case (technical paper). It is also known that 6 attempts is a lower bound on the worst case for any algorithm. Since 5.2131 is the theoretical best, the 5.9136 achieved by random picking is not bad.
Let us now explore option c). There are (N+1)*(N+2)/2 - 1 response classes, ranging from (0 bulls, 0 cows) to (N bulls, 0 cows); the -1 is there because a response of N-1 bulls cannot come with 1 cow. The response to a guess partitions the D^N numbers (as shown in Fig 1).
Fig 1. Response partition of 10^4 numbers for the secret 0123
Now we know that the secret lies in the pruned set. We need to pick our next guess so that it partitions the pruned set in the best possible manner. One way to judge the goodness of a partition is information gain, sum_i n_i * log(n_i), where n_i is the size of the ith response class. The issue is that to compute the information gain, we need to compute responses for D^N * length(pruned set) pairs.
Note that the best guess can lie outside the pruned set; that is why we need to compute D^N * length(pruned set) responses and not length(pruned set)^2. To see this, consider the pruned set [089, 189, 289, 389, 489, 589, 689, 789]. If our secret is 789, guessing from the pruned set in order may need 8 guesses. But if we try 012 as guess 1, we get the pruned set [389, 489, 589, 689, 789]; then we try 345 as guess 2 to get the pruned set [689, 789], and two more guesses finish the game. So we solved it in 4 attempts instead of 8.
But the bigger issue is that this technique can only be applied when the pruned set is small; otherwise it is computationally prohibitive and memory intensive. Our strategy is as follows:
Step 1: Pick distinct digits to make the first guess.
Step 2: Select a random guess from the pruned set.
Step 3: Perform step 2 three or four times.
Step 4: Pick a guess using the information gain measure.
Step 5: Perform step 4 until the game is solved.

For N = 4, D = 10 (5040 possible secrets), this strategy makes 5.7201 attempts on average with 8 attempts in the worst case. A summary of the algorithms is as follows:
D=10, N=4, 5040 secrets

| algo | average #attempts | worst #attempts |
|---|---|---|
| pick first | 6.0051 | 9 |
| pick random | 5.9136 | 9 |
| heuristic pick | 5.7201 | 8 |
| optimal | 5.2131 | 7 |
D=10, N=4, 10000 secrets

| algo | average #attempts | worst #attempts |
|---|---|---|
| pick first | 8.0291 | 13 |
| pick random | 6.6323 | 11 |
| heuristic pick | 6.2001 | 8 |
Source code: bullscows.py
Technical resources: optimal strategy, bullscows.pdf
July 21, 2012
Number of Paths in a Rectangular Grid
Problem 1: Consider a rectangular grid of n x m cells. The problem is to find the number of shortest (monotonic, in this case) paths along the edges of the cells that start at (0,0) and end at (n,m). How do we print all such paths?
A monotonic path consists of moving up or right but not down or left. Figure 1 illustrates such a path for a grid of size 5 x 3.
Solution:
Figure 1
To go from (0,0) to (n,m), we need to make n moves to the right and m moves up. These moves can be performed in any order and we still reach (n,m). So the total number of paths is simply the number of ways n red balls and m green balls can be arranged in a line, which is C(n+m, m).
We can print all such paths using recursion. The following code does this.
    Function printPath(n, m, path)
        if n == 0 and m == 0
            print path
            return
        end
        if n > 0
            printPath(n-1, m, path + "right")
        end
        if m > 0
            printPath(n, m-1, path + "up")
        end
    end
Problem 2: Now we need to print all monotonic paths that do not go above the diagonal y = x. Figure 2 shows such a path. Note that the path in figure 1 goes above the diagonal, hence not desirable in this case. How many such paths exist for n x m grid (n >= m)?
Figure 2
Let us first discuss how to print all monotonic paths that do not go above the diagonal. For a path to be valid, at every point (x,y) on the path we must have y <= x. We can simply enforce this constraint while printing paths; the code appears after the counting argument below.
Now, in order to obtain the number of paths, we subtract the number of paths that cross the diagonal from the total number of paths, C(n+m, m).
Consider a path that crosses the diagonal, and let it cross the diagonal for the first time at point (x,y). Since this is the first point after crossing, y = x+1. The path has taken x right and y up moves so far, and must still take n-x right and m-y up moves to reach (n,m). If we flip the remaining moves, turning the n-x rights into ups and the m-y ups into rights, then starting from (x,y) we reach the point (x+m-y, y+n-x). Setting y = x+1, the resulting end point is (m-1, n+1). Since m <= n, this end point is above the diagonal. We can perform this surgery on every path that crosses the diagonal, putting them in one-to-one correspondence with the paths from (0,0) to (m-1, n+1). So the number of paths that cross the diagonal is C(n+m, m-1).
So the desired number is C(n+m, m) - C(n+m, m-1) = [(n-m+1)/(n+1)] * C(n+m, m). This is a proof based on André's reflection method.
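A quick numeric check of this identity (my own sketch):

```python
from math import comb

n, m = 5, 3
total = comb(n + m, m)                  # all monotonic paths: C(8,3) = 56
crossing = comb(n + m, m - 1)           # paths crossing the diagonal: C(8,2) = 28
print(total - crossing)                 # 28 paths on or below the diagonal
print((n - m + 1) * total // (n + 1))   # 28, matching [(n-m+1)/(n+1)] * C(n+m,m)
```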
Alternately:
André's reflection method was originally used to solve a very interesting problem called the ballot problem. The ballot problem states that in a two-candidate election, candidate A polled p votes and candidate B polled q votes (p > q). What is the probability that candidate A stayed ahead of candidate B throughout the counting?
Well the probability is simply (p-q)/(p+q).
We can use the ballot theorem to count the monotonic paths strictly below the diagonal (those that do not even touch it). Treat a right move as a vote for candidate A and an up move as a vote for candidate B, with n and m votes respectively. The probability that right moves stay ahead of up moves throughout is (n-m)/(n+m), so the number of monotonic paths strictly below the diagonal is [(n-m)/(n+m)] * C(n+m, m).
Now, since we are allowed to touch the diagonal, count the strictly-below paths to (n+1, m) instead. For such a path the first two moves must be right moves (otherwise the path would touch the diagonal); eliminating the first right move yields a path to (n, m) that may touch the diagonal but never cross it. So the total number of ways is [(n+1-m)/(n+1+m)] * C(n+m+1, m) = [(n-m+1)/(n+1)] * C(n+m, m).
    Function printPath(x, y, n, m, path)
        if x == n and y == m
            print path
            return
        end
        if x < n
            printPath(x+1, y, n, m, path + "right")
        end
        if y < m and y < x
            printPath(x, y+1, n, m, path + "up")
        end
    end

The total number of such paths is [(n-m+1)/(n+1)] * C(n+m, m). For n = m, this simplifies to [1/(n+1)] * C(2n, n), which is known as the Catalan number. The wiki page also contains other interesting problems that have the Catalan number as their solution.
Now, In order to obtain the number of paths, we need to subtract the number of paths that cross the diagonal from the total number of paths n+mCm.
Consider a path that crosses the diagonal. Let that path cross the diagonal for the first time at point (x,y). Since this is the first point after crossing the diagonal, so y should be equal to x+1. So the path has taken x right and y up moves. It is yet to take n-x right and m-y up moves to reach (n,m). If we flip the n-x right to n-x up and m-y up to right, then starting from point (x,y), we will reach the point (x+m-y,y+n-x). Setting y = x+1, we get that the resulting end point is (m-1,n+1). Since m <= n, the resulting end point is above the diagonal. So we can do surgery on all paths that cross the diagonal to make them reach (m-1,n+1). So the total number of paths that cross the diagonal must be the total number of paths from (0,0) to (m-1,n+1). This is n+mCm-1.
So the desired number is = n+mCm - n+mCm-1 = [(n-m+1)/(n+1)] * n+mCm. This is a proof based on Andre's reflection method.
Alternatively:
André's reflection method was originally used to solve a very interesting problem called the Ballot problem: in a two-candidate election, candidate A polled p votes and candidate B polled q votes (p > q). What is the probability that candidate A was leading candidate B throughout the counting?
The probability is simply (p-q)/(p+q).
We can use the Ballot theorem to count the monotonic paths that stay strictly below the diagonal (never even touching it). Treat a right move as a vote for candidate A and an up move as a vote for candidate B, with n and m votes polled respectively. The probability that the count of right moves stays strictly ahead of the count of up moves throughout is (n-m)/(n+m), so the number of monotonic paths strictly below the diagonal is [(n-m)/(n+m)] * C(n+m, m).
Since we are allowed to touch the diagonal, consider instead paths to (n+1, m). A strictly-below path to (n+1, m) must start with two right moves (an earlier up move would touch the diagonal); eliminating the first right move gives exactly a path to (n, m) that may touch, but never cross, the diagonal. So the total number of ways is [(n+1-m)/(n+1+m)] * C(n+m+1, m) = [(n-m+1)/(n+1)] * C(n+m, m).
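As a sanity check on this closed form, here is a small brute-force script, not from the original post but written in the spirit of the attached Python sources: it enumerates all monotonic paths that never rise above the diagonal and compares the count with [(n-m+1)/(n+1)] * C(n+m, m) for small n and m.

from itertools import combinations
from math import comb

def count_paths_at_or_below_diagonal(n, m):
    # A path is a choice of which of the n+m steps are "up" moves; keep
    # the paths that never rise above the diagonal (y <= x throughout).
    total = 0
    for ups in combinations(range(n + m), m):
        x = y = 0
        valid = True
        for step in range(n + m):
            if step in ups:
                y += 1
            else:
                x += 1
            if y > x:
                valid = False
                break
        if valid:
            total += 1
    return total

for n in range(1, 6):
    for m in range(n + 1):
        # (n-m+1) * C(n+m, m) is always divisible by (n+1)
        assert count_paths_at_or_below_diagonal(n, m) == (n - m + 1) * comb(n + m, m) // (n + 1)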
Problem 3: Now let us consider that apart from up and right moves we can also take up-right (diagonal) moves. Figure 3 shows such a path. How many such paths exist for an n x m grid?
Figure 3
This problem is similar to problem 1, except that paths can contain diagonal moves. One diagonal move consumes 1 right and 1 up move.
So let us assume that we make r diagonal moves, where r <= min(n,m). In this case, we need to make n-r right moves and m-r up moves apart from the r diagonal moves to reach (n,m). The number of ways to do this is (n+m-r)! / [(n-r)! * (m-r)! * r!]. Setting r = 0 recovers the solution to problem 1. Since r can range from 0 to min(m,n), the total number of ways is:
Sum over r in [0, min(m,n)] of (n+m-r)! / [(n-r)! * (m-r)! * r!]

Alternately, we can derive a recurrence relation for the total number of paths. Let D(n,m) be the total number of paths from (0,0) to (n,m) with up, right and diagonal moves. Clearly, we can reach (n,m) from (n-1,m) by a right move, from (n,m-1) by an up move, and from (n-1,m-1) by a diagonal move. These are the only three atomic steps. So D(n,m) = D(n-1,m) + D(n,m-1) + D(n-1,m-1) for n,m > 0. D is known as the Delannoy number.
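The sum and the recurrence can be cross-checked against each other with a few lines of Python (an illustrative sketch; the function names are mine):

from functools import lru_cache
from math import factorial

def delannoy_sum(n, m):
    # Closed form: sum over r, the number of diagonal moves.
    return sum(factorial(n + m - r) // (factorial(n - r) * factorial(m - r) * factorial(r))
               for r in range(min(n, m) + 1))

@lru_cache(maxsize=None)
def delannoy_rec(n, m):
    # D(n,m) = D(n-1,m) + D(n,m-1) + D(n-1,m-1), with a single path along the axes.
    if n == 0 or m == 0:
        return 1
    return delannoy_rec(n - 1, m) + delannoy_rec(n, m - 1) + delannoy_rec(n - 1, m - 1)

assert all(delannoy_sum(n, m) == delannoy_rec(n, m) for n in range(8) for m in range(8))
print(delannoy_sum(3, 3))  # 63, the central Delannoy number D(3,3)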
The code to print all paths has a small addition to the code for problem 1.
Function printPath(n, m, path)
    if n == 0 and m == 0
        print path
        return
    end
    if n > 0
        printPath(n-1, m, path + "right")
    end
    if m > 0
        printPath(n, m-1, path + "up")
    end
    if n > 0 and m > 0
        printPath(n-1, m-1, path + "diagonal")
    end
end

We can extend this problem further by asking for the number of paths that do not cross the diagonal (the same constraint as in problem 2). The number of paths in this case is called the Schroder number.
Source Code: grid_paths.py
July 9, 2012
Finding Top K Elements in Large Dataset
Problem: Consider the problem of finding top K frequent numbers from a file with N numbers. Now consider that N is very large such that it cannot fit in the memory M available to the program. How do we find the top K frequent elements now with the assumption that K < M < N.
Deterministic solution: show
Single pass probabilistic solution: show
The idea is similar to solving the problem for small N (N < M) (see here). For large N, divide the problem into chunks of size <= M, then solve it.
In order to divide the problem, consider a uniform hashing function H that takes a key and returns an integer from the set {1, 2, ..., ceil(N/M)}. This gives about N/M chunks with roughly M numbers per chunk; since the number of unique numbers (U) is less than N, we expect each chunk to contain fewer than M unique numbers.
Now, for each chunk, we compute the frequency of the numbers contained in that chunk. This frequency computation can be done in memory, and we can maintain a MIN-HEAP of size K which can be updated directly (following the steps presented here). As a result, only two reads of the dataset and one disk write are required to get the top K frequent items. The complexity of the algorithm is O(N log K).
For a small K (K < M^2/N), we can instead compute the top K per chunk in O(K log M) and combine the top K lists of all the chunks (N*K/M < M numbers in total) to get the overall top K. The total complexity of this variant is O(N + M log M).
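The chunking scheme might be sketched as follows in Python. This is a rough illustration under the assumption that the input is a plain text file with one number per line; the chunk file names and all identifiers are mine, not from the original post.

import heapq
from collections import Counter

def top_k_large(filename, K, num_chunks):
    # Pass 1: partition the numbers into chunk files by a hash of the key,
    # so all occurrences of a number land in the same chunk.
    chunks = [open("chunk_%d.tmp" % i, "w") for i in range(num_chunks)]
    with open(filename) as f:
        for line in f:
            x = line.strip()
            chunks[hash(x) % num_chunks].write(x + "\n")
    for c in chunks:
        c.close()

    # Pass 2: count each chunk in memory and fold all counts into one
    # global min-heap of size K, exactly as in the small-N version.
    heap = []  # entries are (count, number)
    for i in range(num_chunks):
        with open("chunk_%d.tmp" % i) as c:
            counts = Counter(line.strip() for line in c)
        for x, cnt in counts.items():
            if len(heap) < K:
                heapq.heappush(heap, (cnt, x))
            elif cnt > heap[0][0]:
                heapq.heapreplace(heap, (cnt, x))
    return sorted(heap, reverse=True)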
Alternate method:
We assume that the disk has a huge capacity, which allows us to build a disk-based hash table. Imagine that we create a very large file with holes and divide it into B blocks, each with the capacity to hold more than N/B numbers and their integer counts. Using a uniform hash function H, which takes an arbitrary number x and returns H(x), the block number from the set {0, 1, ..., B-1}, we can store the numbers and their frequencies in this disk map. H gives us the offset to seek to within the file, and we write each number (and its frequency) sequentially within the correct block (or place it via another hash function H1).
Now we proceed as usual: start counting the numbers in memory using a hash table (as in the former approach). When we encounter a new number that cannot be put in memory, we purge some entries from the hash table, writing the purged entries to the disk map. To decide what to purge, we maintain an array that counts the frequency of the keys that lie within each block. Keys that belong to the most infrequent blocks can be purged first (or blocks that are least recently used, or that lead to the least disk access, etc.).
The following code gives a very rough sketch of this approach:
BlockFreq = array(B)
NumberFreq = hashtable(M)
diskwrite = 0
for i = 1:N
    x = A[i]
    BlockFreq[H[x]] += 1
    if NumberFreq.haskey(x)
        NumberFreq[x] += 1
        continue
    end
    if NumberFreq.hasspace()
        NumberFreq[x] = 1
        continue
    end
    if DiskMap.haskey(x)
        DiskMap[x] += 1
    else
        DiskMap[x] = 1
    end
    if diskwrite == 10
        purge(NumberFreq, BlockFreq)
        diskwrite = 0
    else
        diskwrite += 1
    end
end

Here purge is a procedure to purge some set of keys from NumberFreq based on the BlockFreq. Note that this code omits several key details of this process, so the idea presented here is quite crude.
Single pass probabilistic solution: show
Solution 1 is quite efficient as it requires only two disk reads of the dataset, but the bottleneck can be the disk writes during the initial chunk formation. We can reduce that bottleneck by considering a data structure called the Bloom filter.
So consider that we have B uniform hash functions H1, H2, ..., HB, where each hash function maps a key into the range {1, 2, ..., R}. Now imagine an array C of size B x R (< M) that represents how many times each key has been seen. For each number x read from the dataset, compute Hi[x] for every i and increment C[i, Hi[x]] by 1, so that we maintain B counts of x in different R-sized rows of buckets. The true count of x is then at most min(C[1, H1[x]], ..., C[B, HB[x]]).
Now if the query is to get all the elements with frequency greater than some threshold then we can use bloom filters to get all such numbers (with some false positives though, which can be filtered using another pass on the dataset). This can save a complete write of the data to the disk. (see the paper: Computing Iceberg Queries Efficiently).
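The B x R array of counters described above is essentially what is now called a count-min sketch. A minimal sketch of it in Python (all names are mine; seeded hashing stands in for the B uniform hash functions):

import random

class CountMin:
    # B rows of R counters; the estimate is the minimum over the rows,
    # which can over-count because of collisions but never under-counts.
    def __init__(self, B, R):
        self.R = R
        self.seeds = [random.randrange(1 << 30) for _ in range(B)]
        self.C = [[0] * R for _ in range(B)]

    def _bucket(self, seed, x):
        return hash((seed, x)) % self.R

    def add(self, x):
        for i, s in enumerate(self.seeds):
            self.C[i][self._bucket(s, x)] += 1

    def estimate(self, x):
        return min(self.C[i][self._bucket(s, x)] for i, s in enumerate(self.seeds))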
But in our case, we are interested in finding the top K frequent numbers. The following modification can be used to estimate the frequency of each number:
MH = MIN-HEAP(K)
for i = 1:N
    x = A[i]
    for b = 1:B
        C[b,Hb(x)] += Sb(x)
    end
    if contains(MH, x)
        increment count of x in the heap
    else
        f = median(C[i,Hi(x)] * Si(x), for all i)
        if f > min(MH)
            remove-min(MH)
            insert(MH, (x,f))
        end
    end
end

The Sb functions are {-1,+1} hash functions, and this data structure is called a CountSketch. More details of the method are available in the paper: Finding Frequent Items in Data Streams.
Note that the above algorithm takes a single pass over the dataset (and needs no disk writes), but it is not guaranteed to give the top K frequent items; it can make mistakes for some of the less frequent items. In practice, the choice of the hash functions is critical to the performance of the algorithm.
July 3, 2012
Finding Top K Frequent Items
Problem: Consider a file containing N numbers. For ex. {2,3,4,3,1,78,78,3} is a file containing N=8 numbers. Assume that N is small enough to fit in the computer's memory M, so N << M. Now we need to find the top K frequent numbers in the file. For the above example, the output should be {3,78} for K = 2.
Running Time: The time complexity should be O(N), O(N log K), or O(N + K log N).
Solution: show
Finding top K frequent elements is a classical database and data streaming problem and there are several solutions to it ranging from deterministic to probabilistic. I'll illustrate the three obvious ones here.
The first step is to count how many times each number appears in the file. If the file is pre-sorted then we need a single scan over the file.
Function COUNTS:
    A = load_file_in_array
    last = A[1]
    ctr = 1
    for i = 2:N
        if last == A[i]
            ctr += 1
        else
            CountMap{last} = ctr
            last = A[i]
            ctr = 1
        end
    end
    // required for the last element of the array
    CountMap{last} = ctr
end

Note that CountMap is a hashmap that stores the counts of all the elements. The procedure COUNTS is quite efficient if the file is pre-sorted. If the file is not pre-sorted, then sorting increases the time complexity to O(N log N). In that case, we can do better by directly using a hashmap to maintain the current count of each number, without sorting the file, as follows:
Function EFFICIENT_COUNTS:
    A = load_file_in_array
    for i = 1:N
        CountMap{A[i]} += 1
    end
end

The above procedure obtains the counts of all elements in a single scan of the file, hence it runs in O(N) time. So now we have all the numbers along with their frequencies in an array (extracted by enumerating all keys of the CountMap, or by another scan of the file!!). For the example in the problem statement, we get the tuples: {(2,1), (3,3), (4,1), (1,1), (78,2)}.
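For reference, Python's collections.Counter performs the same single-scan counting as EFFICIENT_COUNTS:

from collections import Counter

A = [2, 3, 4, 3, 1, 78, 78, 3]
CountMap = Counter(A)          # one O(N) scan, no sorting required
print(list(CountMap.items()))  # [(2, 1), (3, 3), (4, 1), (1, 1), (78, 2)]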
The problem that remains is to find the most frequent K numbers from the array. The naive approach would be to sort numbers on their frequencies and pick the top K. This would take O(U log U) time where U (=5) is the number of unique elements in the array. If we consider U = O(N) then the time complexity of this sorting is O(N log N). We can do better than that as follows:
Approach 1: O(N) time
Use selection algorithm to find the Kth most frequent number (on the second element of the tuple) using the Selection Algorithm in O(U) time. The Kth most frequent element partitions the array in two parts: first part containing top K most frequent elements and second part containing bottom U-K-1 frequent elements. So we get the top K most frequent elements in no particular order in O(N) time (assuming U = O(N)). They can be sorted in O(K log K) if needed. Note that although this approach runs in O(N) time, the constants hidden in the O-notation can be large. So in practice this approach can be slower than the two approaches described below.
Approach 2: O(N log K) time
Pick first K tuples and put them on MIN-HEAP, where a tuple (x,y) is less than a tuple (a,b) if y is less than b. The time complexity to make the min heap of size K is O(K).
Then pick the remaining U - K elements one by one. If the picked element is smaller than the minimum on the heap, discard it. Otherwise, remove the min element from the heap and insert the picked element. This keeps the heap at exactly K elements; the delete-insert operation is O(log K) per element.
Once we are done picking all the elements, the elements that finally remain in the min-heap are the top K frequent items which can be popped in O(K log K) time. The overall cost of this approach is O(K + (U-K) log K + K log K) = O(K + U log K). Since K < U and U = O(N), we get the time complexity of O(N log K).
Approach 3: O(N + K log N) time
This approach is similar to approach 2 but the main difference is that we make a MAX-HEAP of all the U elements. So the first step is to make the max heap of all the elements in O(U). Then remove the maximum element from the heap K times in O(K log U) time. The K removed elements are the desired most frequent elements. The time complexity of this method is O(U + K log U) and by setting U = O(N) we get O(N + K log N).
Let us stop for a moment and contrast approach 2 with approach 3. For simplicity, let T2 = K + N log K be the time complexity of approach 2 and T3 = N + K log N the time complexity of approach 3. The figure below plots the ratio T2/T3 for N=100 and different values of K. We observe that approach 3 is considerably better for small values of K, whereas approach 2 is better for large values of K. Though the actual difference depends on the constants involved, we can still see the merit of one approach over the other.
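To make approaches 2 and 3 concrete, here is a small Python sketch (my own illustration; heapq.nlargest internally uses essentially the size-K min-heap strategy of approach 2):

import heapq
from collections import Counter

def top_k_minheap(A, K):
    # Approach 2: a size-K min-heap over (number, frequency) tuples.
    return heapq.nlargest(K, Counter(A).items(), key=lambda t: t[1])

def top_k_maxheap(A, K):
    # Approach 3: heapify all U tuples (negated counts simulate a max-heap)
    # in O(U), then pop the maximum K times in O(K log U).
    heap = [(-cnt, x) for x, cnt in Counter(A).items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(K)]

A = [2, 3, 4, 3, 1, 78, 78, 3]
print(top_k_minheap(A, 2))  # [(3, 3), (78, 2)]
print(top_k_maxheap(A, 2))  # [3, 78]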
March 26, 2010
Problems solvable using Hashtable
Hashtables are an extremely useful data structure, as they provide storage and retrieval in O(1) amortized time. Several algorithmic problems that would otherwise be quite expensive can be solved very efficiently using hashtables. In this post, we consider some of these problems:
Problem 1: Remove duplicate elements from an unsorted array of size N.
Solution: show
Let's first see the solution without a hashtable. The naive solution is to compare each pair of elements; if they are the same, drop one of them. This takes O(N^2) time in the average case and O(N) time in the best case, when all the elements are the same. The following code does that:
for i = 1 to N
    if a[i] == NaN: continue
    print a[i]
    for j = i+1 to N
        if a[i] == a[j]: a[j] = NaN

The above code prints all non-duplicates from the array. The naive solution is O(N^2) in the average case. In practice, if you believe that the array consists of a few numbers duplicated many times, say K distinct numbers each duplicated N/K times, then the running time of the above algorithm is O(KN).
We can do better than worst-case O(N^2) time: if we sort the elements in O(N log N) time and then compare adjacent elements in O(N), we solve the above problem in O(N log N) time.
sort(a)   // O(N log N) time, O(1) space using Heap Sort
k = a[1]
for i = 2 to N
    if a[i] == k: a[i] = NaN
    else: k = a[i]

Using heap sort, the above algorithm runs in O(N log N) time in the worst case and takes O(1) extra space (apart from the N for the array). Using a hashtable, we can solve this problem in O(N) amortized time and O(N) space (for the hash table). The algorithm looks like the following:
for i = 1 to N
    if hash.has_key(a[i]) == false:
        print a[i]
        hash.put(a[i])

The above algorithm stores each unseen element in the hashtable and ignores elements already seen. The hashtable provides storage and retrieval in O(1) time, so checking whether an element has been seen before is constant time.
Problem 2: Find intersection of K unsorted array of N elements each. Intersection consists of elements that appear in all the K arrays.
Problem 3: How to make a linked list support operations in O(1) time. The operations on linked list can be insertion after any arbitrary valued node, deletion of any arbitrary valued node.
Problem 4: Find all unique pairs of elements in an array that sum to S. For ex. if array = {2,4,6,4,6} and S = 8 then the answer is {(2,6), (4,4)} (a hashtable sketch for this one is given after the problem list).
Problem 5: Consider an array containing unique elements. Find a triplet of elements in the array that sum to S (extension of problem 4). Can hash-tables improve the running time of your algorithm.
Problem 6: Consider two strings of size M, N. Perform string matching in size O(M+N).
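For Problem 4, a one-pass hashtable solution might look like the following Python sketch (my own illustration, not part of the original post):

def pairs_with_sum(A, S):
    # One pass with a hash set of values seen so far; a second set
    # de-duplicates the reported pairs.
    seen, result = set(), set()
    for x in A:
        if S - x in seen:
            result.add((min(x, S - x), max(x, S - x)))
        seen.add(x)
    return result

print(pairs_with_sum([2, 4, 6, 4, 6], 8))  # {(2, 6), (4, 4)}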
March 19, 2010
Dragon and Knight
Problem: A dragon and knight live on an island. This island has seven poisoned wells, numbered 1 to 7. If you drink from a well, you can only save yourself by drinking from a higher numbered well. Well 7 is located at the top of a high mountain, so only the dragon can reach it.
One day they decide that the island isn't big enough for the two of them, and they have a duel. Each of them brings a glass of water to the duel, they exchange glasses, and drink. After the duel, the knight lives and the dragon dies.
Why did the knight live? Why did the dragon die?
Solution: show
The dragon knows that the knight can't reach well 7. So he reasons that if he gives the knight water from well 6 or well 7, the knight will die for sure, and he fetches the knight's glass from well 6 or 7.
The knight knew that no matter which well his glass came from, the dragon would go and drink from well 7 afterwards and live. So he brought plain water for the dragon and, before the duel, drank from well 1 himself.
When they exchanged glasses and drank, the dragon rushed to well 7 and drank from it, thinking it would cure the poison he had just been given; but he had been given plain water, so the poison of well 7 killed him. The knight, on the other hand, was already poisoned by well 1, so what the dragon gave him effectively cured him. So he lived.
Problem: Now consider that the dragon and the knight are equally intelligent. Who is expected to die?
Answer: Both survive the battle.
Solution: show
After the battle, the dragon drinks from well 1 to guarantee that he is poisoned, and then drinks from well 7 to cure himself. Whether the knight gave him plain water or poison from wells 1-6, the dragon is now cured and survives.
As for the knight, he drinks from well 1 before the duel. If the dragon's glass contains poison, it cures him. But if the dragon's glass was plain water, he must still make sure to cure himself with poison from a well higher than 1. So after the battle, the knight drinks from well 1 again, which guarantees that he is poisoned by well 1, and then drinks from well 2 to cure himself.
So both the dragon and the knight survive the battle.
March 18, 2010
Number and Age of David's Kids
Problem: The product of the ages of David's children is the square of the sum of their ages. David has fewer than eight children. No two of his children have the same age. None of his children is more than 14 years old, and all of his children are at least two years old. How many children does David have, and what are their ages?
Answer: (12,6,4,2)
Solution: show
One way to analyze this problem mathematically is to write the general form of the equations and maximize with respect to the largest age; that leads to a Fibonacci-like number series. But after that, complex heuristics have to be employed to arrive at the solution (and we cannot guarantee that it is unique).
This is a constraint satisfaction problem, and the best way to solve it is to write code. The following code solves the problem in a general way:
Function find_ages(kidAges)
    s = sum(kidAges)
    p = product(kidAges)
    if s*s == p
        print kidAges
    l = length(kidAges)
    if l >= 7
        return
    else if l == 0
        x = 14
    else
        x = kidAges[l]
    end
    for i = 2 to x-1
        kidAges1 = clone(kidAges)
        kidAges1.add(i)
        find_ages(kidAges1)
    end
end

// call
find_ages([])
After running the code, we see the following solution
[12, 6, 4, 2]

Note that in the above pseudo code we cloned the kidAges array. If we don't clone, the code won't work properly; the downside of cloning is that it uses a huge amount of memory. How do we write the code without cloning? Check the python script attached below.
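One possible no-clone version (a sketch of the idea only, not the attached script itself) reuses a single list and backtracks with append/pop:

def find_ages(max_kids=7, max_age=14, min_age=2):
    ages = []

    def recurse(bound):
        s, p = sum(ages), 1
        for a in ages:
            p *= a
        if ages and s * s == p:
            print(ages)
        if len(ages) >= max_kids:
            return
        for a in range(min_age, bound):
            ages.append(a)
            recurse(a)   # ages stay strictly decreasing, avoiding permutations
            ages.pop()   # undo, so the same list is reused across calls

    recurse(max_age + 1)

find_ages()  # prints [12, 6, 4, 2]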
Source Code: david_kids_ages.py
March 11, 2010
Largest Sum of Consecutive Numbers
Problem: Given an array of N integers (both positive and negative), find the contiguous sub-sequence with the largest sum.
For ex: Let A = {1 2 -5 4 5 -1 2 -11} then largest sum is 10 (start = 4, end = 7)
Solution: show
The naive solution to this problem can be coded in O(N^3) time: consider every possible pair of start and end indices, calculate the sum of the elements between them, and keep track of the max sum. The following code does that:
max = -Inf
imax = jmax = -1
for i = 1 to N
    for j = i to N
        // get sum of elements enclosed in index (i,j)
        s = 0
        for k = i to j
            s += A[k]
        if s > max
            max = s
            imax = i
            jmax = j

Clearly the above algorithm is O(N^3). We can definitely improve its running time. Observing carefully, the operation that runs many times (O(N^2) times) is computing the sum of the elements between indices i and j. Can we do it quickly? Indeed we can.
If we save the cumulative sum of elements in an array CSUM, then CSUM[i] indicates the sum of elements enclosed in indices (1,i), CSUM[j]-CSUM[i-1] would then indicate the sum of elements enclosed in indices (i,j). The following algorithm captures this intuition.
csum[0] = 0
for i = 1 to N
    csum[i] = csum[i-1] + A[i]

max = -Inf
imax = jmax = -1
for i = 1 to N
    for j = i to N
        s = csum[j] - csum[i-1]
        if s > max
            max = s
            imax = i
            jmax = j

This O(N^2) algorithm is the best we can do with this approach. The problem can, however, be solved in O(N) time using a clever observation. Consider a running sub-sequence with sum s > 0, and let the next element n be such that s + n < 0. This prefix cannot be part of the largest sub-sequence containing s and n, since removing the prefix (whose sum is negative) from any such sub-sequence yields a larger sum; this is easy to prove by contradiction. So whenever the running sum drops below zero, we can abort the current sub-sequence and start afresh. This idea is captured in the following O(N) algorithm:
max = -Inf
imax = jmax = -1
i_temp = 1
s_temp = 0
for i = 1 to N
    s_temp += A[i]
    if s_temp > max       // keep track of max so far
        max = s_temp
        imax = i_temp
        jmax = i
    if s_temp < 0         // abort this sub-sequence
        s_temp = 0
        i_temp = i + 1
Problem: Given an array of N integers (both positive and negative), find the sub-sequence with largest absolute sum. This problem differs from the one at the top as we want the absolute sum (taking mod)
For ex: Let A = {1 2 -5 4 5 -1 2 -11} then largest absolute sum is 11.
Solution: show
Simply keep track of both the max and the min (the most negative sum) at the same time in the above O(N) solution.
If we must use the above algorithm without modification, we can take these steps to get the max absolute sum in O(N) time:
1. Run the O(N) algorithm on array A to get the max sum
2. Create array B by multiplying all numbers in A by -1
3. Run the O(N) algorithm on array B to get the max sum
4. Pick the larger of the two outputs
Instead of trying the above two methods, we can try a very nice and simple approach
// build csum as before
csum[0] = 0
for i = 1 to N
    csum[i] = csum[i-1] + A[i]

answer = max(csum) - min(csum)

The answer is the maximum element of csum minus the minimum element of csum.
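Both methods can be checked against the example in a few lines of Python (illustrative only):

A = [1, 2, -5, 4, 5, -1, 2, -11]

def max_abs_sum_csum(A):
    # max absolute sum = max(csum) - min(csum), with the empty prefix 0 included
    csum, s = [0], 0
    for x in A:
        s += x
        csum.append(s)
    return max(csum) - min(csum)

def max_abs_sum_two_runs(A):
    # run the O(N) max-sum scan on A and on -A, then take the larger
    def max_sum(B):
        best = s = 0
        for x in B:
            s = max(0, s + x)
            best = max(best, s)
        return best
    return max(max_sum(A), max_sum([-x for x in A]))

print(max_abs_sum_csum(A), max_abs_sum_two_runs(A))  # 11 11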
February 3, 2010
Polynomial Evaluation
Problem: Given a polynomial with degree bound of n=m^r. How do we evaluate the polynomial at n different points such that the polynomial can be recovered from these n points (i.e. the coefficients can be calculated back using the points).
Note: We do not actually want to recover the coefficients; the task is to choose the evaluation points cleverly, as that choice impacts the running time of the algorithm.
Solution: show
Let the polynomial be A(x) = a0 + a1 x + a2 x^2 + ... + a(n-1) x^(n-1).
A(x) has a degree bound of n. Its actual degree can be lower than n-1 if a(n-1) equals zero but that doesn't matter here.
Using Horner's rule, we can compute a polynomial in O(n) time by arranging the terms as follows:
A(x) = a0 + x (a1 + x (a2 + ... + x (a(n-2) + x a(n-1))))
So it would take O(n^2) time to compute the polynomial at n different points. But we can use divide and conquer to do better, i.e., O(n lg n). We split A into m different polynomials such that every term of the ith polynomial has degree mod m = i.
=> A(x) = [ a0 + a(m) x^m + a(2m) x^(2m) + ... ] +
          [ a1 x + a(m+1) x^(m+1) + a(2m+1) x^(2m+1) + ... ] +
          ...
          [ a(m-1) x^(m-1) + a(2m-1) x^(2m-1) + ... ]
=> A(x) = A[0](y) + x A[1](y) + ... + x^i A[i](y) + ... + x^(m-1) A[m-1](y)
where y = x^m, and
A[i](x) = a(i) + a(m+i) . x + ... + a(n-m+i) . x^(n/m-1)
Still, the recurrence for evaluating the polynomial at a single point is T(n) = m T(n/m) + O(m), which is O(n). So evaluating the polynomial at n points still leads to O(n^2), which is not great.
Now we can choose the n points cleverly. Let w be a principal nth root of unity.
=> w^n = 1
=> w = e^(2 . pi . i / n)
= cos (2 . pi /n) + i . sin(2 . pi / n)
The two interesting properties of w are
1. w^(n/2) = -1
2. [w^(kn/m + p)]^m = w^(kn + mp) = w^(mp)
So the n points at which the polynomial is evaluated are w^0, w^1, ..., w^(n-1). The lower-order polynomials of size n/m then need to be evaluated at only the first n/m powers of w (raised to the mth power), because after that the values cycle and we can reuse the computation. This comes from the observation that
A(w^p) = A[0](w^(mp)) + w^p A[1](w^(mp)) + ... + w^(p(m-1)) A[m-1](w^(mp))
Also,
A(w^(kn/m+p)) = A[0](w^(mp)) + w^(kn/m+p) A[1](w^(mp)) + ... + [w^(kn/m+p)]^(m-1) A[m-1](w^(mp))
So A[0], ..., A[m-1] are evaluated at the points w^(mp) for p = {0, 1, ..., n/m - 1}, and that computation is reused for all the other powers of w.
The general recurrence of this algorithm is:
T(n) = m T(n/m) + O(n)
=> T(n) = O(n lg n)
m = 2 works well in practice and is employed by the Fast Fourier Transform in order to interpolate the coefficients of a polynomial. This is generally used to multiply two polynomials into a higher-order polynomial: first evaluate the two lower-order polynomials at n points, then multiply the point values, and finally use the products to generate the coefficients of the higher-order polynomial.
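For m = 2 this scheme is exactly the recursive FFT. A minimal Python sketch (my own illustration, assuming the number of coefficients is a power of two):

import cmath

def fft(a):
    # Evaluate the polynomial with coefficient list a at the n-th roots of
    # unity, using the m = 2 split described above.
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2])   # A[0](y): terms with degree mod 2 == 0
    odd = fft(a[1::2])    # A[1](y): terms with degree mod 2 == 1
    out = [0] * n
    for p in range(n // 2):
        w = cmath.exp(2j * cmath.pi * p / n)
        out[p] = even[p] + w * odd[p]             # A(w^p)
        out[p + n // 2] = even[p] - w * odd[p]    # uses w^(n/2) = -1
    return out

# A(x) = 1 + 2x + 3x^2, padded to degree bound 4, evaluated at 1, i, -1, -i:
print(fft([1, 2, 3, 0]))  # [(6+0j), (-2+2j), (2+0j), (-2-2j)]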
February 15, 2009
Longest Monotone Sequence and Palindromes
Problem: Given an array of integers, find the longest subsequence of elements which monotonically increases. for ex. array = {1 4 8 2 5 7 3 4 6}, the longest subsequence = {1 2 3 4 6}
Solution: show
Let us construct the solution using the LCS problem. LCS is the longest common subsequence: given two strings X and Y, find the longest subsequence common to both. For ex. if X = {a b c a b c} and Y = {b a c b a c}, then a possible LCS(X, Y) = {b c b c}. LCS can be solved using dynamic programming in O(n^2) time.
We can solve the longest monotonically increasing sequence problem using the solution to LCS. Let X = original string. Let Y = sort X (increasing order). LCS(X, Y) gives the desired answer (easy to observe).
There is a faster O(n log n) algorithm (for details check here)
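For reference, one way to code the O(n log n) algorithm in Python using bisect (the patience-sorting idea; my own sketch, not the linked write-up):

from bisect import bisect_left

def longest_increasing_subsequence(A):
    # tails[k] is the smallest possible tail of an increasing subsequence
    # of length k+1; parent links let us rebuild one longest subsequence.
    if not A:
        return []
    tails, tails_idx, parent = [], [], [None] * len(A)
    for i, x in enumerate(A):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
        parent[i] = tails_idx[k - 1] if k > 0 else None
    out, i = [], tails_idx[-1]
    while i is not None:
        out.append(A[i])
        i = parent[i]
    return out[::-1]

print(longest_increasing_subsequence([1, 4, 8, 2, 5, 7, 3, 4, 6]))  # [1, 2, 3, 4, 6]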
Problem: Given a string, find the longest size palindrome in the string. for ex. string = "aaabbccaccbaaa", solution = "bccaccb"
Solution: show
The trivial solution is to check whether a palindrome of each possible size starts at each index: for an index i in the array, the possible end positions to test are i < j <= n, and each check is linear, giving O(n^3) total time. Using the Longest Common Substring (of the string and its reverse) with suffix trees, this problem can be solved in O(n) time.
January 31, 2009
Loops in Linked List
Problems with linked lists mainly occur while building, or when using, a bad memory manager. Let us discuss two well-known linked list problems.
Problem: Find if two linked lists short at some node. Find the node at which the two lists short.
Solution: show
Subtract the length of the smaller list from the larger one. Move a pointer ahead on the larger list by the difference in lengths, and put another pointer at the head of the smaller list. Now move the two pointers together; the node at which they meet is the node where the lists short. The following code gives the idea:
d = length(head1) - length(head2)
l1, l2 = head1, head2
if d < 0
    d *= -1
    l1, l2 = head2, head1
ptr1 = l1
for i <- 1 to d
    ptr1 = ptr1->next
ptr2 = l2
while ptr1 != null
    if ptr2 == ptr1
        return ptr1
    ptr1 = ptr1->next
    ptr2 = ptr2->next

We can use this idea of finding the shorted node in the solution to the next problem.
Problem: A linked list might contain a loop. How do we detect existence of the loop and find the node from which loop starts. Propose an O(n) time algorithm that doesn't take extra space and doesn't modify the linked list.
Solution: show
To find if the linked list contains a loop, run two pointers over the linked list: one that moves one node at a time and another that moves two nodes at a time. The two pointers meet if the linked list contains a loop; otherwise the pointer moving at twice the speed reaches the end while the slower pointer is midway.
fastptr = head
slowptr = head
while fastptr != null and fastptr->next != null
    fastptr = fastptr->next->next
    slowptr = slowptr->next
    if fastptr == slowptr
        return "loop exists"
return "loop doesnt exist"

If a loop exists, the fast pointer meets the slow pointer before the slow pointer completes one rotation of the loop (this fact can be easily proved). Once the pointers meet, we can count the number of nodes in the loop:
loopNodeCount = 0
do
    loopNodeCount += 1
    slowptr = slowptr->next
while slowptr != fastptr
return loopNodeCount

Now, using the trick from the shorted-lists problem above, we start two pointers, one from the head and the other advanced "loopNodeCount" nodes ahead. The node at which both pointers meet is the node where the loop starts.
advptr = head
for i <- 1 to loopNodeCount
    advptr = advptr->next
normalptr = head
while normalptr != advptr
    normalptr = normalptr->next
    advptr = advptr->next
return advptr
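Putting the three fragments together, a runnable Python version might look like this (my own sketch; the Node class is illustrative):

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

def find_loop_start(head):
    # Phase 1: fast/slow pointers detect the loop.
    fast = slow = head
    while fast is not None and fast.next is not None:
        fast = fast.next.next
        slow = slow.next
        if fast is slow:
            break
    else:
        return None  # no loop
    # Phase 2: count the nodes in the loop.
    count, slow = 1, slow.next
    while slow is not fast:
        count += 1
        slow = slow.next
    # Phase 3: advance one pointer `count` nodes, then move both together.
    adv = head
    for _ in range(count):
        adv = adv.next
    cur = head
    while cur is not adv:
        cur = cur.next
        adv = adv.next
    return adv

# Build 1 -> 2 -> 3 -> 4 -> 5 -> back to 3
nodes = [Node(i) for i in range(1, 6)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
nodes[-1].next = nodes[2]
print(find_loop_start(nodes[0]).val)  # 3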
Monty Hall Problem
This is a very famous probability problem. This problem illustrates that probability is sometimes more convincing than "what we perceive as obvious". I'll post another problem that defeats "obvious reasoning".
Monty Hall problem: Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
Solution: show
For a mathematician stuck on the game show, I would suggest tossing an unbiased coin. The reason: the game show host might be clever and, seeing that the mathematician picked the door with the prize, might offer the switch precisely to entice him away from it. The mathematician, well aware of this trick, might refuse to switch. But the host anticipated that too... errr. This recursive logic goes on to infinity.
Under the assumption that the game show host has no hidden motive and always performs this step, the best choice is to switch: it raises the chance of winning from 1/3 to 2/3. There is a very simple explanation. The prize could be behind any of the doors, and you picked door 1. Suppose the host instead offered: either keep door 1, or take both doors 2 and 3. You would take the two doors, doubling your winning chances to 2/3. That is exactly what switching amounts to: the host opens door 3 for you, and you open door 2 when you switch.
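A quick simulation (my own sketch) confirms the 1/3 versus 2/3 split:

import random

def monty_hall(trials=100000):
    stay_wins = switch_wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # The host opens a goat door that is neither the pick nor the prize.
        opened = next(d for d in range(3) if d != pick and d != prize)
        switched = next(d for d in range(3) if d != pick and d != opened)
        stay_wins += (pick == prize)
        switch_wins += (switched == prize)
    print("stay: %.3f  switch: %.3f" % (stay_wins / trials, switch_wins / trials))

monty_hall()  # roughly: stay 0.333, switch 0.667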
Red Green Cards: A box has three cards. First card has both sides green, second card has both sides red, third card has one side green and one side red. A card is picked at random and its one side is observed to be green. What is the probability that the other side of the card is also green.
Solution: show
It seems the answer should be 1/2, but it is 2/3. For the explanation, label the first card g1|g2, the second card r1|r2, and the third card g3|r3. Since one side of the picked card is green, the observed side is one of g1, g2, or g3, each equally likely. Two of these three events point to the first card, whose other side is also green, so the probability is 2/3.
January 30, 2009
Finding Second Smallest Element
Jargon: The order statistics problem is to find the kth smallest element in an unsorted array A[1..n]. This problem can be solved in O(n) average time using the randomized select algorithm, or in O(n) worst-case time using an algorithm that chooses the pivot element carefully.
Problem: How do we find the second smallest element in an array A[1..n] of size n? We would like to do it in much less than 2n comparisons.
Answer: Time Complexity = n + ceil(log n) - 2
Solution: show
Finding the smallest element is trivial. The following code would find the smallest element using n-1 comparisons.
min = A[1]
for i = 2 to n
    if A[i] < min
        min = A[i]
return min

The trivial algorithm for the second minimum is to keep another variable (smin) along with min; whenever an element knocks out smin, check whether it also knocks out min and do the book-keeping accordingly. This takes 2n-3 comparisons and constant space. A key observation is that the second smallest element is knocked out by the smallest one at some stage. If we preserve which element was knocked out by whom, then finding the second smallest element only requires finding the smallest among the elements knocked out by the minimum. But to build that record, we need to run the algorithm as a pair-wise tournament: each round plays matches between pairs, knocking out half the elements with n/2 comparisons. The recurrence relation looks like this:
T(n) = T(n/2) + n/2

Solving this recurrence gives T(n) = n - 1. The height of the recurrence tree is lg n, so the minimum plays at most ceil(lg n) matches; checking just the elements knocked out by the root (the min) gives the second minimum. For example:
1   5   2   6
 \ /     \ /
  1       2
   \     /
    \   /
     \ /
      1

Once we have built the tree using n-1 comparisons, we find the smallest among the numbers knocked out by 1, i.e., {5, 2}; so the second minimum is 2. The total number of comparisons is n + ceil(log n) - 2, where log is taken to base 2. We implement this algorithm by maintaining, for each element, a list of the indices it has knocked out. The python code is towards the end of this post.
Another algorithm is very easy to implement but sadly takes 2n - 2 lg n - 1 array-element comparisons: simply build a min-heap (rooted at index 1) and return the smaller of the numbers at indices 2 and 3. The cost of building the min-heap is 2 * [n/4 + 2n/8 + ... + (lg n - 1) * n/2^(lg n)]; this arithmetic-geometric series sums to 2n - 2 lg n - 2, and comparing the elements at indices 2 and 3 adds one more comparison.
Code for n + log n - 2 algorithm: second_smallest.py
Alternate implementation using linked list: second_smallest_linkedlist.py
January 23, 2009
String Permutations
Printing all permutations of a string is a very common interview question. We'll discuss this problem and some interesting variations of it. The original problem says: "print all permutations of a string". As an example, if the string is "abc" there are 6 permutations {abc, acb, bac, bca, cab, cba}. Assume for now that the string contains no duplicate characters.
Solution: show
There are many ways in which this problem can be solved. The easiest is to code it using the simple yet powerful technique of recursion:
function permute(str, d)
    if d == length(str)
        print str
    else
        for i <- d to length(str)
            swap(str[d] <-> str[i])   // swap character for permutation
            permute(str, d + 1)
            swap(str[d] <-> str[i])   // undo swap for parent call

The above code performs the swapping operation twice: first to generate the permutations with each candidate character at position d, and a second time to restore the original string. It is important to restore the string (as it was passed to this call), otherwise the algorithm ends up printing redundant permutations. Additionally, this code assumes that there are no duplicates in the string; it fails otherwise.
There are other ways to print string permutations that use circular shifting, but it can't get any simpler than the above.
Source Code: permute_unique_chars.py
Let us now assume that the string can contain duplicates. How do we print all non-redundant permutations of the string? For ex. if the string is "abab", the permutations are {baab, abba, abab, baba, bbaa, aabb}.
Solution: show
Clearly the solution to the first problem fails here. This is due to the fact that we generate all permutations of the string after the swapping step, so the same character can get swapped to the dth spot again, leading to redundant permutations. A very simple trick solves the game: pre-sort the string alphabetically. Now, while swapping to generate the permutations for a particular character, we skip any character that matches the last one swapped in. Effectively, we are doing a cyclic shift, and before the end of the call we reset the cyclic shift. The following code implements this:
function permute(str, d)
    if d == length(str)
        print str
    else
        lastSwap = Nil
        for i <- d to length(str)
            if lastSwap == str[i]
                continue
            else
                lastSwap = str[i]
            swap(str[d] <-> str[i])   // swap character for permutation
            permute(str, d + 1)
        for i <- d to length(str)-1
            str[i] = str[i+1]
        str[length(str)] = lastSwap

If the string is pre-sorted alphabetically, the above algorithm works magically. Note that in this case we do not undo the swap after permute is called. This makes the sequence of swaps a cyclic shift, so the intermediate str from d onwards also stays sorted; after the for loop, we reverse the cyclic shift. Sorting ensures that if there are duplicates, the second occurrence of a duplicate element is picked immediately after we finish generating permutations for its first occurrence. If sorting is not allowed, you need to maintain a local hashtable (local to each function call) that records the characters already swapped to the dth spot and discards duplicate candidates. Pre-sort and make life simple.
Source Code: permute_chars.py
Some other constraints could be added to make the problem more interesting. One such constraint is a partial ordering of characters: for ex. 'a' should appear before 'c', and 'b' should appear before 'c' in all permutations. There can also exist orderings that admit no permutation of the string, such as 'a' precedes 'b', 'b' precedes 'c', and 'c' precedes 'a'. Implementing the permutation algorithm in these cases requires checking for ordering violations either at each level of the recursion or only at the leaves, depending on whether checking for violations is more costly than generating all permutations.
An interesting problem, that talks of special kind of ordering and can be very efficiently computed is as follows: Assume that the string is made up of equal number of '{' and '}'. You need to print all permutations of this string which could satisfy a C compiler i.e. an ending bracket is balanced by an opening bracket. For ex. for '{{}}' the permutations are [{{}}, {}{}].
Solution: show
The hard question is how many such permutations exist. For N open and N close brackets, the number of such permutations is the Nth Catalan number = 1/(N+1) * factorial(2N)/factorial(N)^2. This result is derived in the problem: Number of paths in rectangular grid.
Permutation problems can be easily solved using recursion. The trick is to formulate all the possible cases at a particular level and code according to that formulation. Let's formulate the problem in a general sense: assume that at a given step of the recursion, we have n open brackets and m closed brackets not yet used. We can now formulate each and every case for a given (n,m) and code the possible sub-steps.
Problem Formulation:
(n,m) ---> NOT POSSIBLE,    if m < n
(n,m) ---> PRINT AND DONE,  if m = 0 and n = 0
(n,m) ---> (n-1, m)         if n > 0
         + (n, m-1)         if m > 0 and m > n

The above formulation is generic and takes into account all possible input values of (n,m). If we put the following constraints on the initial values of n and m, the actual implementation can omit some of the above cases:
Constraints:
n > 0
m == n

The actual implementation that follows the above constraints and problem formulation:
function permute(str, n, m)
    if m == 0
        print str
    else
        if n > 0
            permute(str + '{', n - 1, m)
        if m > n
            permute(str + '}', n, m - 1)

To enforce the above constraints, permute should be kept as an inner/private function; the function actually exposed is the following:
function GenerateCStyleBrackets(N)
    if N <= 0
        return
    str = new String[2 * N]
    permute(str, N, N)   // str is passed as pointer

The memory cost is O(N). The maximum stack depth is also O(N), so the total space cost is O(N).
Source Code: permute_brackets.py
January 22, 2009
Party Friends
Problem: In any party, there exist two people with the same number of friends. Is this true or false? Assume that the party has more than one person.
Solution: show
Yes, the statement is true. We can prove it by contradiction.
Let there be N people at the party.
Suppose all N people have a different number of friends.
A person can have a maximum of N-1 friends and a minimum of 0, so the possible numbers of friends are 0, 1, 2, ..., N-1. Since we assume no two people have the same number of friends, every value is taken: someone has 0 friends, someone has 1 friend, and so on up to someone with N-1 friends. But 0 and N-1 cannot coexist: a person with N-1 friends is friends with everyone, including the supposed person with 0 friends, which is a contradiction. So only N-1 distinct values are actually available for the N people, and by the pigeonhole principle at least two people must have the same number of friends.