03-511/711, 15-495/856 Course Notes - Oct. 27th, 2011


Amino Acid Substitution Matrices


Overview

Goal: Amino acid similarity matrices that take into account

Markov models of sequence evolution require

Two commonly used families of amino acid substitution matrices: PAM and BLOSUM

Each family is parameterized by evolutionary distance. Both use the following approach:
  1. Start from "trusted" ungapped multiple sequence alignments (MSAs)
  2. Count substitutions, correcting for sample bias in the choice of sequences
  3. Estimate substitution frequencies
      PAM - from an evolutionary model
      BLOSUM - directly from the data
  4. Construct a log-odds scoring matrix
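
As a rough illustration of steps 2 and 3 in the "directly from data" style, the sketch below counts unordered residue pairs in each column of a toy ungapped alignment and normalizes the counts to frequencies. The function names, the three-sequence alignment, and the two-letter alphabet are all hypothetical. The sketch deliberately omits the sample-bias correction of step 2 and does not model PAM's evolutionary approach.

```python
from collections import Counter
from itertools import combinations


def count_aligned_pairs(msa):
    """Count unordered residue pairs over all columns of an ungapped MSA.

    msa is a list of equal-length strings (hypothetical input format).
    No correction for sample bias is applied here.
    """
    counts = Counter()
    for column in zip(*msa):  # iterate over alignment columns
        for a, b in combinations(column, 2):  # every pair of rows
            counts[tuple(sorted((a, b)))] += 1
    return counts


def pair_frequencies(counts):
    """Normalize raw pair counts so that they sum to 1."""
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


# Toy alignment, for illustration only.
msa = ["ACA", "ACC", "AAC"]
counts = count_aligned_pairs(msa)
freqs = pair_frequencies(counts)
```

Real matrix construction additionally has to down-weight closely related sequences, which is the bias correction mentioned in step 2.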

PAM matrices


PAM2 matrix:

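$$P^{2}[j,k] = \sum_{l} P^{1}[j,l]\, P^{1}[l,k] = (P^{1})^{2}[j,k]$$

PAMn matrix:

$$P^{n}[j,k] = (P^{1})^{n}[j,k]$$

Here $(P^{1})^{n}$ denotes the n-th matrix power of $P^{1}$; it is not an entrywise power of $P^{1}[j,k]$.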
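
The matrix-power relation above can be checked numerically. The following is a minimal sketch in plain Python, assuming a hypothetical 3-letter alphabet with a made-up one-step transition matrix `P1` whose rows sum to 1. The real PAM1 matrix is 20 x 20 and is estimated from data. The helper names `mat_mul` and `mat_pow` are illustrative only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[j][l] * b[l][k] for l in range(size)) for k in range(size)]
            for j in range(size)]


def mat_pow(p1, n):
    """Return the n-th matrix power of p1 (n >= 1) by repeated multiplication."""
    result = p1
    for _ in range(n - 1):
        result = mat_mul(result, p1)
    return result


# Hypothetical one-step transition matrix; each row sums to 1.
P1 = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.02, 0.97],
]

P2 = mat_pow(P1, 2)      # two steps: sum over the intermediate state l
P250 = mat_pow(P1, 250)  # many steps; rows still sum to 1
```

Note that `P2[0][1]` is the sum over intermediate states `l` of `P1[0][l] * P1[l][1]`, which differs from the entrywise square `P1[0][1] ** 2`.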

  • Obtain log odds scoring matrix

    Let $q^{n}(j,k) = p_j P^{n}[j,k]$ be the probability that, at a given position, we see amino acid j aligned with amino acid k;
    i.e., that amino acid j is replaced by amino acid k after n PAMs of mutational change. Then the PAMn scoring matrix is

    $$S[j,k] = \lambda \log \frac{q^{n}(j,k)}{p_j\, p_k} = \lambda \log \frac{P^{n}[j,k]}{p_k}$$

    where λ is a constant. Typically λ = 10 and the entries of S[] are rounded to the nearest integer.
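
The log-odds formula above can be sketched in a few lines of Python. The example assumes a hypothetical 2-letter alphabet with made-up background frequencies `p` and a made-up n-step transition matrix `Pn`. It also assumes the logarithm is base 10, which the notes do not state explicitly. The function name `log_odds_matrix` is illustrative.

```python
import math


def log_odds_matrix(pn, p, lam=10.0):
    """Return S[j][k] = round(lam * log10(pn[j][k] / p[k])).

    pn  : n-step transition matrix as a list of rows (hypothetical values)
    p   : background frequencies, one per letter
    lam : scaling constant (lambda in the text)
    The base-10 logarithm is an assumption made for this sketch.
    """
    size = len(p)
    return [[round(lam * math.log10(pn[j][k] / p[k])) for k in range(size)]
            for j in range(size)]


# Toy values for illustration only; each row of Pn sums to 1.
p = [0.6, 0.4]
Pn = [
    [0.8, 0.2],
    [0.3, 0.7],
]

S = log_odds_matrix(Pn, p)
print(S)  # [[1, -3], [-3, 2]]
```

Positive entries mark pairs that occur more often than expected from the background frequencies alone, and negative entries mark pairs that occur less often.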

  • Are PAM matrices symmetric?

    Last modified: October 31, 2011
    Maintained by Dannie Durand (durand@cs.cmu.edu) and Annette McLeod.