**Problem 1**: Given a large paragraph of words containing n words (separated by space), and k keywords. Find the smallest distance between these keywords in the paragraph.

To illustrate the problem, consider that the paragraph is A B A C E D O A B. Let us say that we are interested in two keywords A and D. Then the closest they appear is at position 8 and 6 (D O A) and the minimum distance is 1 (i.e. number of non-keyword words between them).

Build an efficient algorithm with O(k) space and O(n log k) running cost.

**Solution**:

show
To solve this problem, we need to register the last seen position of each of k keywords, as we read the paragraph word-by-word. If the next word, and if that matches a keyword, we need to update the last seen position of the matching keyword and also compute the minimum of the position register. That helps us find the smallest distance.

In order to implement this efficiently, we need a hash map and min heap (tree based implementation, not array based). Following is the code to solve this problem.

Let K[i] indicate ith keyword.
Let W[i] indicate ith word in the paragraph.
// init map and heap
map = {}
heap = []
for i in 1 to k
// init heap with -Inf as position of the keyword
node = heap_node()
node.val = -Inf
// map stores the keyword and its pointer to node in the heap
heap.put(node)
map.put(K[i], node)
end
idx_start = 0
idx_end = 0
min_dist = Inf
for i in 1 to n
w = W[i]
// check if w is one of the keywords, if not ignore
if !map.contains_key(w)
continue
end
// get the heap node corresponding to the
// key W[i] and update its position
node = map.get(w)
node.val = i
// re-heapify the min heap
heap.heapify()
// min of all positions
idx = heap.min()
// if idx = -Inf then some keywords are not seen yet.
if idx == -Inf
continue
end
// compute distance between keywords
dist = i - idx - k + 1
if dist < min_dist
idx_start = idx
idx_end = i
min_dist = dist
end
end

The above algorithm requires O(k) memory for the map and the heap. For each word, it requires O(log k) to re-heapify the min heap.

**Problem 2**: Given a large paragraph of words (separated by space). Consider N tuples of names and interests. If a given person and their interest appears within K words then its a match else not. Find all matches from the paragraph.

To illustrate the problem, consider that the paragraph is "John likes swimming and Jack likes Tennis". Let the interest map is <john swimming>, <john tennis>, <jack swimming>, <jack tennis>. For k = 3, 4, 5, 6 the matches found in the paragraph are <john swimming>, <jack swimming>, <jack tennis>.

Propose an efficient algorithm for two cases: k is small, k is very large.

Excellent website. Plenty of helpful info here. I am sending it to several pals ans also sharing in delicious.

ReplyDeleteAnd obviously, thank you for your sweat!

Stop by my page ... anchor