Patterns in the Human Genome Nucleotide Sequence
At its simplest level, the human genome nucleotide sequence is a (long!)
sequence of four symbols: G (guanine), A (adenine), C (cytosine) and
T (thymine).
One way of analysing such sequences is to examine how much their
sub-sequences can be compressed. For example, the sequence AAAAAAAAAA
compresses easily, whereas AGGTACAGTG does not. This project will
use commonly available compression software (Lempel-Ziv compression such as
pkzip or gzip) to compare recursively the compressibility of sub-sequences
of varying lengths against one another.
It will be programmed in PHP.
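The idea of measuring compressibility can be sketched in a few lines. The
example below is only an illustration, written in Python rather than the
PHP the project will use. It relies on the standard zlib module, which
implements DEFLATE (the Lempel-Ziv based method behind gzip). The function
names compressed_size, compression_ratio and window_ratios are hypothetical,
the test sequences are made up, and splitting into non-overlapping windows
is an assumption rather than the project's prescribed method.

```python
import random
import zlib


def compressed_size(sequence: str) -> int:
    """Return the size in bytes of the zlib-compressed sequence."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


def compression_ratio(sequence: str) -> float:
    """Compressed size divided by original size (lower = more compressible)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return compressed_size(sequence) / len(sequence)


def window_ratios(sequence: str, window: int) -> list[float]:
    """Compression ratio of each non-overlapping window of the given length.

    Assumption: non-overlapping windows; a trailing partial window is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        compression_ratio(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, window)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    repetitive = "A" * 1000
    scrambled = "".join(rng.choice("GACT") for _ in range(1000))

    # Compare the two made-up sequences at several window lengths.
    for window in (100, 500, 1000):
        rep = window_ratios(repetitive, window)
        scr = window_ratios(scrambled, window)
        print(window, sum(rep) / len(rep), sum(scr) / len(scr))
```

Note that zlib adds a few bytes of fixed overhead to every output, so very
short sub-sequences (such as the ten-symbol examples above) can produce a
ratio greater than one. Comparisons are more meaningful between windows of
the same length.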
Please contact michael.mcgettrick@nuigalway.ie or drop by room 437 for more information.