The bioinformatics database: www.ncbi.nlm.nih.gov/
Here's a good example of what you can get out of it: E. coli K-12 MG1655
Sequence alignment tries to match DNA or amino acid sequences even when they are not exactly the same. If they had to match exactly, it would be a straight up string-search algorithm.
This is fundamental in bioinformatics for two reasons:
- when you sequence the DNA of a new species, you can guess what each protein does by comparing it with similar proteins in other species that you have already studied
- when doing DNA sequencing, and especially short-read DNA sequencing, you generally need to align the reads to a reference genome to know where each read sits inside the entire genome. Only then can you spot mutations, notably single-nucleotide polymorphisms
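
To make the difference between exact string search and alignment concrete, here is a minimal Python sketch that places a short read on a reference by sliding it along and counting mismatches. It is a toy under loud assumptions: no insertions or deletions are handled, the search is brute force, and the sequences and the function name `best_ungapped_alignment` are made up for illustration. Real aligners use indexed data structures and gap-aware scoring.

```python
def best_ungapped_alignment(reference: str, read: str):
    """Slide `read` along `reference` and keep the placement with the fewest mismatches.

    Returns (position, mismatches), where each mismatch is a tuple
    (reference_index, reference_base, read_base).
    Returns (-1, None) if the read is longer than the reference.
    Hypothetical helper for illustration only: it ignores insertions and deletions.
    """
    best_pos, best_mismatches = -1, None
    for pos in range(len(reference) - len(read) + 1):
        mismatches = [
            (pos + i, reference[pos + i], base)
            for i, base in enumerate(read)
            if reference[pos + i] != base
        ]
        # Keep the first placement that has strictly fewer mismatches.
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


# Made-up toy sequences: the read differs from the reference by one base.
reference = "ACGTTGCATGCCGATAGCTA"
read = "GCATGACGAT"

print(reference.find(read))  # exact string search finds nothing: -1
position, mismatches = best_ungapped_alignment(reference, read)
print(position, mismatches)  # 5 [(10, 'C', 'A')]
```

The single mismatch reported at reference index 10 is the kind of difference that, with enough reads covering the same position, would be called as a candidate single-nucleotide polymorphism.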