Patterns in the Human Genome Nucleotide Sequence

At its simplest level, the Human Genome Nucleotide Sequence is a (long!) sequence of four symbols: G (guanine), A (adenine), C (cytosine) and T (thymine). One way of analysing such a sequence is to measure how well its sub-sequences can be compressed. A highly repetitive stretch such as AAAAAAAAAA compresses easily, whereas a more varied one such as AGGTACAGTG does not. This project will use commonly available compression software based on Lempel-Ziv compression (for example pkzip or gzip) to compare sub-sequences of recursively varying lengths against one another. It will be programmed in PHP. Please contact michael.mcgettrick@nuigalway.ie or drop by room 437 for more information.
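The sketch below illustrates the idea. It is written in Python rather than the PHP the project calls for, and it uses Python's standard zlib module as the compressor. zlib implements DEFLATE, the LZ77-based format that gzip also uses. The function names `compression_ratio` and `recursive_profile`, the halving scheme and the `min_len` cutoff are hypothetical choices for illustration. They are not the project's specified method.

```python
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by original size; lower means more compressible.

    zlib uses DEFLATE (LZ77 matching plus Huffman coding), the same
    compressed format that gzip wraps.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    data = seq.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)


def recursive_profile(seq: str, min_len: int = 8, start: int = 0):
    """Hypothetical scheme: score the whole sequence, then recurse on each half.

    Returns a list of (start, length, ratio) tuples, one per segment visited,
    in depth-first order. Recursion stops once the halves would be shorter
    than min_len.
    """
    results = [(start, len(seq), compression_ratio(seq))]
    half = len(seq) // 2
    if half >= min_len:
        results += recursive_profile(seq[:half], min_len, start)
        results += recursive_profile(seq[half:], min_len, start + half)
    return results


if __name__ == "__main__":
    # The repetitive string scores lower (more compressible) than the varied one.
    print(compression_ratio("AAAAAAAAAA"))
    print(compression_ratio("AGGTACAGTG"))

    # A repetitive block followed by a more varied block.
    seq = "A" * 64 + "AGGTACAGTGCTTAGCATGCAACGTTAGCTAGGATCCATGCATTGACGTACGATCGTAGCTAG"
    for seg_start, length, ratio in recursive_profile(seq, min_len=16):
        print(f"{seg_start:4d} {length:4d} {ratio:.2f}")
```

On inputs as short as ten symbols, the fixed header and checksum that zlib adds dominate the output, so both ratios exceed 1. Only the relative ordering is meaningful at that length. Comparing ratios across segments of different lengths, as the recursive profile does, gives a crude picture of where a sequence is repetitive.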