next up previous contents index
Next: Primes Up: Examples Previous: Examples

String Searching

   

The first example is a function that finds all occurrences of a word in a string (a sequence of characters). The function string_search(w,s) (see Figure 6) takes a search word w and a string s, and returns the starting position of all substrings in s that match w. For example,

   figure3977
Figure: Finding all occurrences of the word w in the string s.

The algorithm starts by considering all positions between 0 and #s-#w as candidates for a match (no candidate could be greater than this since it would have to match past the end of the string). The candidates are stored as pointers (indices) into s of the beginning of each match. The algorithm then progresses through the search string, using recursive calls to next_char, narrowing the set of candidate matches on each step.

Based on the current candidates, next_char narrows the set of candidates by only keeping the candidates that match on the next character of w. To do this, each candidate checks whether the i tex2html_wrap_inline9591 character in w matches the i tex2html_wrap_inline9593 position past the candidate index. All candidates that do match are packed and passed into the recursive call of next_char. The recursion completes when the algorithm reaches the end of w. The progression of candidates in the "foo" example would be:

20pt

tabular3999

Lets consider the complexity of the algorithm. We assume #w = m and #s = n. The depth complexity of the algorithm is some constant times the number of recursive calls, which is simply tex2html_wrap_inline9599 . The work complexity of the algorithm is the sum over the recursive calls, of the number of candidates in each recursive call. In practice, this is usually tex2html_wrap_inline9601 , but in the worst case this can be the product of the two lengths tex2html_wrap_inline9603 (the worst case can only happen if most of the characters in w are repeated). There are parallel string-searching algorithms that give better bounds on the parallel time (depth complexity), and that bound the worst case work complexity to be linear in the length of the search string [15, 44], but these algorithms are somewhat more complicated.



next up previous contents index
Next: Primes Up: Examples Previous: Examples



Jonathan Hardwick
Tue Nov 28 13:57:00 EST 1995