Sequence Alignment Algorithms

Algorithms for the pairwise alignment of sequences are the basis for the methods used by modern DNA or protein database search websites. By alignment, we mean pairing up the residues of one sequence with the residues of the other sequence, introducing gaps if necessary, so that the sequences match as closely as possible. For example, the sequences {CATTAGTAG} and {CTAACTTAAG} can be aligned as shown below. This alignment contains 5 matches and 2 mismatches.

C	A	T	T	-	A	G	-	-	T	A	G
C	-	-	T	A	A	C	T	T	A	A	G

This website demonstrates two of the most well-known algorithms for pairwise alignment of two sequences:

Needleman-Wunsch (1970) aligns the two sequences, with gaps, to achieve the greatest number of matches.
Smith-Waterman (1981) uses a scoring function involving mismatch and gap penalties, and finds the highest-scoring local alignment of subsequences of the two sequences.

The algorithms rely on a mathematical technique called dynamic programming. Essentially, this is based on the principle that to find the best alignment of two sequences containing 10 letters each, we find the best alignment of the first 9 letters of each sequence, and then work out the best way of adding the extra letters. We can thus build up the optimal alignment of two long sequences, by first aligning just the first letters of each (which is an easy problem), then adding more letters one at a time.

Please select the algorithm you want to use, from the list on the left-hand menubar.

Comments and feedback welcome: please email me at M.B.Reed@bath.ac.uk.