Given s[1..m] and t[1..n], α(s',t') is an alignment
if
s', t' ∈ (Σ')*, where Σ' is the alphabet Σ extended with the gap symbol "_"
|s'| = |t'| = l ≥ max{m,n}
s is the subsequence obtained by removing "_" from s'
(ditto for t and t')
There is no value of i for which s'[i] = t'[i] = "_".
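The conditions above can be checked mechanically. The following is a minimal sketch, assuming sequences are Python strings and "_" is the gap symbol; the function name is_alignment is hypothetical and does not come from these notes.

```python
GAP = "_"  # gap symbol, as in the notes

def is_alignment(s, t, s_aligned, t_aligned):
    """Return True if (s_aligned, t_aligned) meets the alignment conditions for s and t."""
    # Both padded strings must have the same length l.
    # l >= max(m, n) then follows from the gap-removal checks below.
    if len(s_aligned) != len(t_aligned):
        return False
    # Removing every gap must give back the original sequences.
    if s_aligned.replace(GAP, "") != s or t_aligned.replace(GAP, "") != t:
        return False
    # No position may pair a gap with a gap.
    return all(not (a == GAP and b == GAP) for a, b in zip(s_aligned, t_aligned))
```

For example, ("ACGT", "A_GT") is an alignment of ACGT and AGT, while ("ACGT_", "A_GT_") is not, because its last position pairs two gaps.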
Goal: Find the optimal alignment w.r.t. a given scoring scheme
Distance Based Scoring
D[s,t] = ∑ d(s'[i], t'[i]), summed over i = 1..l
d(x,x) = 0
d(x,y) ≥ 0
d(x,"_") ≥ 0
d(x,z) ≤ d(x,y) + d(y,z)
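As an illustration of the distance of a single alignment, here is a minimal sketch that sums d over the positions of s' and t'. The cost function unit_cost (0 for a match, 1 for a mismatch or a gap) is an assumed example choice, and both function names are hypothetical.

```python
GAP = "_"  # gap symbol, as in the notes

def unit_cost(x, y):
    """Assumed example cost d: 0 if the symbols match, 1 otherwise (including gaps)."""
    return 0 if x == y else 1

def alignment_distance(s_aligned, t_aligned, d=unit_cost):
    """Sum d(s'[i], t'[i]) over all positions i of the alignment."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned strings must have equal length")
    return sum(d(a, b) for a, b in zip(s_aligned, t_aligned))
```

Under this assumed cost, the alignment ("ACGT", "A_GT") has distance 1: three matches and one position that pairs C with a gap.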
NOTE:
If d(x,y) = 1 for x ≠ y and d(x,"_") = 1, then the minimum of
D[s,t] over all alignments is the minimum number of operations required to transform
s into t, where the operations are substitution, insertion and
deletion. This is called the "edit distance".
If d(x,y) ≥ 1 for x ≠ y and d(x,"_") ≥ 1, then it is called the "weighted edit distance".
D[s,t] is a metric. It satisfies the triangle inequality.
D[s,t] is the sum of the distances for positions in
the alignment. This implies that we assume positional
independence.
Dynamic Programming Algorithm for Global Alignment
Initialization
D[0,0] = 0
D[0,j] = D[0,j-1] + d("_", t[j])
D[i,0] = D[i-1,0] + d(s[i], "_")
Recurrence
D[i,j] = min {
D[i-1,j] + d(s[i], "_"),
D[i-1,j-1] + d(s[i], t[j]),
D[i,j-1] + d("_", t[j])
}
Compute the scores of all pairs of prefixes in O(m · n) time.
D[m,n] gives the score of the optimal alignment.
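The initialization and recurrence can be sketched as a table fill. This is a minimal sketch, not a definitive implementation: it assumes sequences are Python strings with 0-based indexing (so s[i] in the notes is s[i - 1] in the code), and it uses an assumed unit cost as the default d. The names fill_matrix and unit_cost are hypothetical.

```python
GAP = "_"  # gap symbol, as in the notes

def unit_cost(x, y):
    """Assumed example cost d: 0 if the symbols match, 1 otherwise (including gaps)."""
    return 0 if x == y else 1

def fill_matrix(s, t, d=unit_cost):
    """Return the (m+1) x (n+1) table D, where D[i][j] scores s[1..i] against t[1..j]."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # Initialization: an empty prefix aligned against a prefix costs only gaps.
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + d(GAP, t[j - 1])
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + d(s[i - 1], GAP)
    # Recurrence: each cell is the cheapest of the three possible last columns.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + d(s[i - 1], GAP),           # s[i] against a gap
                D[i - 1][j - 1] + d(s[i - 1], t[j - 1]),  # s[i] against t[j]
                D[i][j - 1] + d(GAP, t[j - 1]),           # gap against t[j]
            )
    return D
```

The two nested loops visit each of the (m+1)(n+1) cells once and do constant work per cell, which is where the O(m · n) bound comes from.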
Trace back through the alignment matrix in O(m+n) time to obtain
the optimal alignment.
There may be more than one optimal alignment
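A traceback can be sketched as follows. This is a minimal sketch under stated assumptions: sequences are Python strings, costs are integers (so equality tests on table entries are exact), and the default d is an assumed unit cost. It returns one optimal alignment; when several moves are tied, it prefers the diagonal, which is an arbitrary choice. The name global_align is hypothetical.

```python
GAP = "_"  # gap symbol, as in the notes

def unit_cost(x, y):
    """Assumed example cost d: 0 if the symbols match, 1 otherwise (including gaps)."""
    return 0 if x == y else 1

def global_align(s, t, d=unit_cost):
    """Return (score, s_aligned, t_aligned) for one optimal global alignment."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + d(GAP, t[j - 1])
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + d(s[i - 1], GAP)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + d(s[i - 1], GAP),
                D[i - 1][j - 1] + d(s[i - 1], t[j - 1]),
                D[i][j - 1] + d(GAP, t[j - 1]),
            )
    # Walk from D[m][n] back to D[0][0]; each step emits one alignment column,
    # so the walk takes at most m + n steps.
    i, j = m, n
    s_cols, t_cols = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + d(s[i - 1], t[j - 1]):
            s_cols.append(s[i - 1]); t_cols.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + d(s[i - 1], GAP):
            s_cols.append(s[i - 1]); t_cols.append(GAP)
            i -= 1
        else:
            s_cols.append(GAP); t_cols.append(t[j - 1])
            j -= 1
    # Columns were collected from right to left, so reverse them.
    return D[m][n], "".join(reversed(s_cols)), "".join(reversed(t_cols))
```

Changing the order in which the three moves are tested can yield a different alignment with the same score, which reflects the fact that the optimal alignment need not be unique.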
Last modified: September 1st, 2011
Maintained by Dannie Durand (durand@cs.cmu.edu) and Annette McLeod.