Goal: Amino acid similarity matrices that take into account
Markov models of sequence evolution require
Use data to infer transition probabilities for amino acids.
Two commonly used families of amino acid substitution matrices
$$A_{j,k} = \frac{1}{|T|} \sum_{T} A^{T}_{j,k}$$
$$P^{1}[j,k] = m_j \, \frac{A_{j,k}}{\sum_{i \neq j} A_{j,i}} \quad (k \neq j), \qquad P^{1}[j,j] = 1 - m_j$$
$$m_j = \frac{1}{n \, p_j \, z} \cdot \frac{\sum_{i \neq j} A_{j,i}}{\sum_{h} \sum_{i \neq h} A_{h,i}}$$
where $p_j$ is the background frequency of j and $n$ is the length of the MSA. Select the normalization factor, $z$, so that
$$\sum_{j=1}^{20} p_j \, m_j = 0.01$$
yielding
$$m_j = \frac{0.01}{p_j} \cdot \frac{\sum_{i \neq j} A_{j,i}}{\sum_{h} \sum_{i \neq h} A_{h,i}}$$
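The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pam1_from_counts`, the 3-letter toy alphabet, and the counts and frequencies are all made up for the example, and it assumes every residue has at least one observed substitution (otherwise the row normalization divides by zero).

```python
def pam1_from_counts(A, p):
    """Build a PAM1-style transition matrix from counts A and frequencies p.

    A[j][k] is the (averaged) count of j -> k substitutions; p[j] is the
    background frequency of residue j. Assumes every off-diagonal row sum
    of A is nonzero.
    """
    size = len(A)
    # Off-diagonal row sums: substitutions observed away from residue j.
    off_diag = [sum(A[j][i] for i in range(size) if i != j) for j in range(size)]
    total = sum(off_diag)
    # Mutability m_j, scaled so that sum_j p_j * m_j = 0.01.
    m = [0.01 * off_diag[j] / (p[j] * total) for j in range(size)]
    P1 = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for k in range(size):
            if j == k:
                P1[j][k] = 1.0 - m[j]
            else:
                P1[j][k] = m[j] * A[j][k] / off_diag[j]
    return P1, m


# Hypothetical toy data on a 3-letter alphabet (not real amino acid counts).
A = [[0, 30, 10],
     [30, 0, 20],
     [10, 20, 0]]
p = [0.5, 0.3, 0.2]
P1, m = pam1_from_counts(A, p)
expected_change = sum(p[j] * m[j] for j in range(len(p)))
print(expected_change)  # approximately 0.01, i.e. 1 substitution per 100 sites
```

Each row of `P1` sums to 1 by construction, since the off-diagonal entries of row j add up to `m[j]` and the diagonal entry is `1 - m[j]`.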
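Note: $P^{1}[j,k]$ is the transition matrix of a Markov chain, so multi-step substitution probabilities are obtained by matrix multiplication (a matrix power, not an element-wise power):
$$P^{2}[j,k] = \sum_{l} P^{1}[j,l] \, P^{1}[l,k] = \left(P^{1}\right)^{2}[j,k]$$
$$P^{n}[j,k] = \left(P^{1}\right)^{n}[j,k]$$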
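To make the distinction between a matrix power and an element-wise power concrete, here is a small sketch in plain Python. The helper names `mat_mul` and `mat_pow` and the 2-state matrix are hypothetical, chosen only for illustration. Repeated squaring is used so that a large n, such as 250, needs only a handful of multiplications.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    size = len(X)
    return [[sum(X[j][l] * Y[l][k] for l in range(size)) for k in range(size)]
            for j in range(size)]


def mat_pow(P, n):
    """Raise a square matrix to the n-th power by repeated squaring."""
    size = len(P)
    # Start from the identity matrix (the 0-step transition matrix).
    result = [[1.0 if j == k else 0.0 for k in range(size)] for j in range(size)]
    base = [row[:] for row in P]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Hypothetical 2-state transition matrix (each row sums to 1).
P1 = [[0.99, 0.01],
      [0.02, 0.98]]
P2 = mat_pow(P1, 2)
# P2[0][1] = 0.99*0.01 + 0.01*0.98 = 0.0197, whereas squaring the single
# entry P1[0][1] would give 0.0001.
P250 = mat_pow(P1, 250)
```

Every power of a row-stochastic matrix is again row-stochastic, so the rows of `P250` still sum to 1 up to floating-point error.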
Let $q_n(j,k) = p_j \, P^{n}[j,k]$ be the probability that, at a given position, we see amino acid j aligned with amino acid k; i.e., that amino acid j is replaced by amino acid k after n PAMs of mutational change. Then the PAM n scoring matrix is
$$S[j,k] = \lambda \log \frac{q_n(j,k)}{p_j \, p_k} = \lambda \log \frac{P^{n}[j,k]}{p_k}$$
where λ is a constant. Typically λ = 10 and the entries of S[] are
rounded to the nearest integer.
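The scoring step can be sketched as follows. The notes do not state the base of the logarithm, so this example assumes base 10 to go with λ = 10; the function name `pam_scores` and the 2-state values are hypothetical. It also assumes every entry of the n-step matrix is strictly positive, since the logarithm of zero is undefined.

```python
import math


def pam_scores(Pn, p, lam=10.0):
    """Log-odds scores S[j][k] = round(lam * log10(Pn[j][k] / p[k])).

    Pn is an n-step transition matrix and p the background frequencies.
    Base-10 logarithm is an assumption; all entries of Pn must be > 0.
    """
    size = len(p)
    return [[round(lam * math.log10(Pn[j][k] / p[k])) for k in range(size)]
            for j in range(size)]


# Hypothetical 2-state example satisfying p[0]*Pn[0][1] == p[1]*Pn[1][0],
# so q_n(j,k) is symmetric and the score matrix is symmetric too.
p = [0.6, 0.4]
Pn = [[0.8, 0.2],
      [0.3, 0.7]]
S = pam_scores(Pn, p)
print(S)  # [[1, -3], [-3, 2]]
```

Positive entries mark pairs that are aligned more often than expected by chance under the background frequencies, and negative entries mark pairs aligned less often.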