December 1, 2013

Finding Relevant Keywords from Large Corpus of Text

Problem 1: Given a large paragraph of words containing n words (separated by space), and k keywords. Find the smallest distance between these keywords in the paragraph.

To illustrate the problem, consider that the paragraph is A B A C E D O A B. Let us say that we are interested in two keywords A and D. Then the closest they appear is at position 8 and 6 (D O A) and the minimum distance is 1 (i.e. number of non-keyword words between them).

Build an efficient algorithm with O(k) space and O(n log k) running cost.

Solution: show

Problem 2: Given a large paragraph of words (separated by space). Consider N tuples of names and interests. If a given person and their interest appears within K words then its a match else not. Find all matches from the paragraph.

To illustrate the problem, consider that the paragraph is "John likes swimming and Jack likes Tennis". Let the interest map is <john swimming>, <john tennis>, <jack swimming>, <jack tennis>. For k = 3, 4, 5, 6 the matches found in the paragraph are <john swimming>, <jack swimming>, <jack tennis>.

Propose an efficient algorithm for two cases: k is small, k is very large.

1 comment:

  1. Excellent website. Plenty of helpful info here. I am sending it to several pals ans also sharing in delicious.

    And obviously, thank you for your sweat!

    Stop by my page ... anchor

    ReplyDelete