>>>> (STATISTICAL) PATTERNS IN THE HUMAN GENOME NUCLEOTIDE SEQUENCE
At its simplest level, the Human Genome Nucleotide Sequence is a (long!) sequence of four symbols, G (Guanine), A (Adenine), C (Cytosine) and T (Thymine). This sequence lends itself to many different kinds of analysis, not all of which have been explored:

Simple statistical features, e.g. the average occurrence of each letter, the relative frequency of occurrence of different short subsequences, etc.
Markovian distribution analyses, essentially the probabilities of a given letter being followed by another letter.
Z₄ analysis, i.e. mathematical analysis based upon identifying A, C, T, G with the digits 0, 1, 2, 3.
Calculation of the Kolmogorov complexity of different (gene) sequences, i.e. (in a loose sense) the degree of randomness in the sequence.
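
As a rough illustration of what these analyses might look like in practice, here is a minimal Python sketch. All function names are hypothetical and chosen only for this example. It assumes the sequence is available as a plain string over the letters A, C, G, T; real data files may contain other symbols (such as N) and header lines that would need to be handled first. Kolmogorov complexity is not computable in general, so the last function swaps in a standard stand-in: the compression ratio achieved by zlib, which only gives a crude upper-bound style estimate.

```python
import zlib
from collections import Counter

ALPHABET = "ACGT"

# Identification from the text: A, C, T, G -> 0, 1, 2, 3.
# Treating these digits as integers modulo 4 is an assumption of this sketch.
Z4 = {"A": 0, "C": 1, "T": 2, "G": 3}


def base_frequencies(seq):
    """Relative frequency of each of A, C, G, T (other symbols are ignored)."""
    counts = Counter(seq)
    total = sum(counts[b] for b in ALPHABET)
    return {b: counts[b] / total for b in ALPHABET}


def kmer_counts(seq, k):
    """Counts of every overlapping subsequence of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def transition_probabilities(seq):
    """First-order Markov estimate of P(next letter | current letter)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    probs = {}
    for a in ALPHABET:
        row_total = sum(pair_counts[(a, b)] for b in ALPHABET)
        probs[a] = {
            b: (pair_counts[(a, b)] / row_total if row_total else 0.0)
            for b in ALPHABET
        }
    return probs


def to_z4(seq):
    """Map a sequence to a list of digits; raises KeyError on other symbols."""
    return [Z4[b] for b in seq]


def compression_ratio(seq):
    """Compressed size divided by raw size.

    A computable proxy only: lower values suggest more regularity,
    values nearer 1 suggest the compressor found little structure.
    """
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    sample = "ACGTACGTAA"  # toy input, not real genome data
    print(base_frequencies(sample))
    print(kmer_counts(sample, 2).most_common(3))
    print(transition_probabilities(sample)["A"])
    print(to_z4(sample))
    print(compression_ratio("ACGT" * 1000))
```

On a toy string the numbers mean little; the point is only that each analysis reduces to a few lines once the sequence is loaded. For a full chromosome, the counting functions would probably need to stream over the file rather than hold slices in memory.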

Based upon the data at http://www.ebi.ac.uk/genomes/mot/index.html (see a snapshot at http://grobner.nuigalway.ie/g2.txt), the project will involve carrying out some of these (or other) analyses, depending on the student's particular interests. Please contact michael.mcgettrick@nuigalway.ie or drop by room 437 for more information.