Sequence Alignment Algorithms
Algorithms for the pairwise alignment of sequences are the basis for the
methods used by modern DNA or protein database search websites. By alignment,
we mean pairing up the residues of one sequence with the residues of the
other sequence, introducing gaps if necessary, so that the sequences match
as closely as possible. For example, the sequences {CATTAGTAG} and {CTAACTTAAG}
can be aligned as shown below. This alignment contains 5 matches and 2
mismatches.
C
|
A |
T |
T |
- |
A |
G |
- |
- |
T |
A |
G |
C |
- |
- |
T |
A |
A |
C |
T |
T |
A |
A |
G |
This website demonstrates two of the most well-known algorithms for
pairwise alignment of two sequences:
-
Needleman-Wunsch (1970) aligns the two sequences, with gaps, to
achieve the greatest number of matches.
-
Smith-Waterman (1981) uses a scoring function involving mismatch
and gap penalties, and finds the highest-scoring local alignment of subsequences
of the two sequences.
The algorithms rely on a mathematical technique called dynamic programming.
Essentially, this is based on the principle that to find the best alignment of two sequences
containing 10 letters each,
we find the best alignment of the first 9 letters of each sequence, and then work out the
best way of adding the extra letters. We can thus build up the optimal
alignment of two long sequences, by first aligning just the first letters of each (which is
an easy problem), then adding more letters one at a time.
Please select the algorithm you want to use, from the list on the left-hand
menubar.
Comments and feedback welcome: please email me at
M.B.Reed@bath.ac.uk.