03-511/711, 15-495/856 Course Notes - Nov. 7th, 2006


Amino Acid Substitution Matrices


Scoring overview

Scoring in pairwise alignment

Jukes-Cantor

Markov model of point mutations in nucleic acid sequence

Kimura 2-parameter model

Amino Acid Substitution Matrices

Overview

Goal: Amino acid similarity matrices that take into account

Markov models of sequence evolution require

Two commonly used families of amino acid substitution matrices

Each family is parameterized by evolutionary distance. Both use the following approach:
  1. "Trusted" MSAs (ungapped)
  2. Count substitutions, correcting for sample bias in choice of sequences
  3. Estimate substitution frequencies
      PAM - evolutionary model
      BLOSUM - directly from data
  4. Construct log-odds scoring matrix
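
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the "directly from data" route (steps 1, 3, and 4), not the procedure used to build the published PAM or BLOSUM matrices. It assumes a toy ungapped alignment, omits the sample-bias correction of step 2, and uses hypothetical function names (pair_counts, log_odds_scores) and an assumed scaling (λ = 2 with base-2 logarithms).

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(msa):
    """Count unordered residue pairs in every column of an ungapped MSA."""
    counts = Counter()
    for column in zip(*msa):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_scores(msa, lam=2.0):
    """Unrounded log-odds scores for each residue pair observed in the MSA."""
    counts = pair_counts(msa)
    total = sum(counts.values())
    # q[(a, b)]: observed frequency of the unordered pair {a, b}
    q = {pair: c / total for pair, c in counts.items()}
    # p[a]: background frequency of residue a implied by the pair frequencies
    p = Counter()
    for (a, b), f in q.items():
        p[a] += f / 2
        p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        # Expected frequency of the unordered pair if residues were independent
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = lam * math.log2(f / expected)
    return scores


# Toy alignment (an assumption for illustration): three sequences, two columns
toy_msa = ["AB", "AB", "AB"]
toy_scores = log_odds_scores(toy_msa)
```

In this toy alignment every column is perfectly conserved, so only the pairs A/A and B/B are observed and both receive a positive score. Pairs that never occur in the data get no score at all here; a real construction needs enough data, or pseudocounts, to cover all amino acid pairs.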

PAM matrices


PAM2 matrix:

$P^2[j,k] = \sum_l P^1[j,l]\, P^1[l,k] = (P^1)^2[j,k]$

PAMn matrix:

$P^n[j,k] = (P^1)^n[j,k]$
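
The exponent in the PAM n formula is a matrix power, not an entry-by-entry power. A minimal sketch in plain Python makes the difference concrete. The 3-letter alphabet and the numbers in P1 are invented for illustration (a real PAM1 matrix is 20 x 20), and mat_mul and mat_pow are hypothetical helper names.

```python
def mat_mul(a, b):
    """Matrix product: result[j][k] = sum over l of a[j][l] * b[l][k]."""
    n = len(a)
    return [[sum(a[j][l] * b[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]


def mat_pow(p1, n):
    """Multiply p1 by itself n times, starting from the identity matrix."""
    size = len(p1)
    result = [[1.0 if j == k else 0.0 for k in range(size)] for j in range(size)]
    for _ in range(n):
        result = mat_mul(result, p1)
    return result


# Toy PAM1-like transition matrix: each row sums to 1 and each residue
# has a 1% chance of changing (values are assumptions, not real data).
P1 = [
    [0.990, 0.006, 0.004],
    [0.003, 0.990, 0.007],
    [0.002, 0.008, 0.990],
]

P2 = mat_pow(P1, 2)      # two PAMs of change
P250 = mat_pow(P1, 250)  # 250 PAMs of change
```

Here P2[0][0] is 0.99 * 0.99 plus the small probability of changing away and back again, so it is slightly larger than the squared entry 0.9801. Every row of P2 and P250 still sums to 1.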

  • Obtain log odds scoring matrix

    Let q[j,k] be the probability that, at a given position, we see amino acid j aligned with amino acid k;
    i.e., that amino acid j is replaced by amino acid k after n PAMs of mutational change. Then the PAM n scoring matrix is

    $S[j,k] = \lambda \log \frac{q[j,k]}{p_j\, p_k} = \lambda \log \frac{P^n[j,k]}{p_k}$

    where p_j is the frequency of amino acid j, the second equality uses q[j,k] = p_j P^n[j,k], and λ is a constant. Typically λ = 10 and the entries of S[] are rounded to the nearest integer.

  • Are PAM matrices symmetric?
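
The log-odds construction above, and the symmetry question, can be explored numerically with a small sketch. It rests on several assumptions that are not from the notes: a toy 3-letter alphabet, invented frequencies p, base-10 logarithms, hypothetical helper names, and a toy one-step matrix built so that p_j P1[j,k] = p_k P1[k,j] (detailed balance). Under that last assumption q[j,k] = p_j P^n[j,k] is symmetric, so S is symmetric even though P1 is not. Whether a real PAM matrix behaves this way depends on how its one-step matrix was estimated.

```python
import math


def mat_mul(a, b):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[j][l] * b[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]


def mat_pow(p1, n):
    """n-th matrix power of p1 by repeated multiplication."""
    size = len(p1)
    result = [[1.0 if j == k else 0.0 for k in range(size)] for j in range(size)]
    for _ in range(n):
        result = mat_mul(result, p1)
    return result


def log_odds_matrix(pn, p, lam=10.0):
    """S[j][k] = lam * log10(pn[j][k] / p[k]); log base 10 is an assumption."""
    size = len(p)
    return [[lam * math.log10(pn[j][k] / p[k]) for k in range(size)]
            for j in range(size)]


# Toy background frequencies (assumed values that sum to 1)
p = [0.5, 0.3, 0.2]

# Toy one-step matrix: P1[j][k] = rate * p[k] for j != k, which satisfies
# detailed balance p[j] * P1[j][k] == p[k] * P1[k][j] by construction.
rate = 0.01
P1 = []
for j in range(3):
    row = [rate * p[k] if k != j else 0.0 for k in range(3)]
    row[j] = 1.0 - sum(row)  # diagonal entry makes the row sum to 1
    P1.append(row)

S = log_odds_matrix(mat_pow(P1, 250), p)
S_rounded = [[round(s) for s in row] for row in S]
s_is_symmetric = all(abs(S[j][k] - S[k][j]) < 1e-9
                     for j in range(3) for k in range(3))
p1_is_symmetric = all(abs(P1[j][k] - P1[k][j]) < 1e-12
                      for j in range(3) for k in range(3))
```

In this toy case p1_is_symmetric is False while s_is_symmetric is True: the transition probabilities depend on the direction of the substitution, but the scores do not.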

    Last modified: November 7, 2006.
    Maintained by Dannie Durand (durand@cs.cmu.edu) and Annette McLeod.