Prof Cathal Seoighe

Professor of Bioinformatics

Research interests

  • Genomics
  • Molecular evolution

Research overview


I’m looking for a postdoctoral researcher to join the group. Deadline for application is Friday 27 March, 2020. For more information click here.


The Seoighe group carries out research into various aspects of genomics and molecular evolution, including questions that can be tackled using existing data as well as collaborative research with experimental scientists. Recent work has included the development of tools for the analysis of deep sequencing data from T cell receptors and immunoglobulins; the application of evolutionary models to infer immune epitopes in viruses; analysis of the diversity of mRNA splicing in the thymus and its potential importance for the avoidance of autoimmunity; identification of genetic variants with transcriptome-wide effects on RNA processing; and the development of computational tools that can be used to disentangle the effects of cell type composition and gene expression variation using gene expression or methylation data.


A sample of some of our current and published work is provided below.



Mutation rate variation

The rate of germline mutation is a key parameter in molecular evolution and population genetics. As the ultimate source of genetic novelty, germline mutations provide the raw material on which selection acts and the basis for genetic drift over time. Somatic mutations, on the other hand, cannot be transmitted to the next generation, but they are the basis of the development of cancer and may also be a significant factor in aging. We are interested in inter-individual and inter-specific variation in rates of both germline and somatic mutation. We previously developed a method to study genetic variation in germline mutation rates. Our method makes use of haplotype data and is based on a characteristic pattern of haplotype divergence expected to occur in the context of a mutator allele (an allele or genetic variant that increases the rate of germline mutation). This pattern consists of a number of haplotypes with a peak in the number of derived (i.e. non-ancestral) alleles against a background in which other haplotypes in the population have typical numbers of derived alleles. The results of a simulation that illustrates this pattern of haplotype divergence are shown (to the right). We found that the genomic loci at which these peaks occur in humans are enriched for genes involved in DNA replication and repair. The paper reporting these results is freely available as Seoighe and Scally, PLoS Genetics, 2017. We are currently extending our analyses of germline mutation rate variation and developing methods to investigate somatic mutation rate variation as part of a research project funded by Science Foundation Ireland.
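As a rough illustration of the haplotype pattern described above, the following toy sketch counts derived alleles per haplotype in a genomic window and flags haplotypes whose count stands out against the background. It is not the method from the paper: the function names, the 0/1 coding (0 = ancestral, 1 = derived) and the fold-over-median threshold are all assumptions made purely for illustration.

```python
from statistics import median


def derived_allele_counts(haplotypes):
    """Count derived alleles (coded 1) on each haplotype; 0 means ancestral."""
    return [sum(h) for h in haplotypes]


def flag_excess_haplotypes(haplotypes, fold=2.0):
    """Return indices of haplotypes whose derived-allele count exceeds
    `fold` times the median count in the sample.

    The fold-over-median rule is a hypothetical threshold for illustration,
    not a statistical test.
    """
    counts = derived_allele_counts(haplotypes)
    background = median(counts)
    return [i for i, c in enumerate(counts) if c > fold * background]


# Toy window: 6 haplotypes x 10 sites. The last two haplotypes carry an
# excess of derived alleles, mimicking the effect of a linked mutator allele.
haplotypes = [
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 0],
]

counts = derived_allele_counts(haplotypes)    # [2, 2, 2, 2, 7, 7]
flagged = flag_excess_haplotypes(haplotypes)  # [4, 5]
print(counts, flagged)
```

A real analysis would need to account for recombination, demography and sampling noise, none of which this toy example attempts.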


LymAnalyzer: a tool for comprehensive analysis of next generation sequencing data of T cell receptors and immunoglobulins. Nucleic Acids Research, 2015 (pdf)

The enormous diversity of T and B cell receptors is generated through recombination of diverse V, (D) and J gene segments, together with somatic mutation processes. LymAnalyzer is a free specialized tool for accurate and rapid mapping of sequencing reads to immune gene segments and alleles in a reference database. It includes extraction of the Complementarity Determining Region 3 (CDR3) and clustering of related clones. In addition to mapping to known immune gene alleles, the tool can infer novel alleles that are absent from the reference database. We are interested in hearing from research groups interested in using this tool or collaborating to customize or develop similar tools for related problems (contact information below).

Promiscuous mRNA splicing under the control of AIRE in medullary thymic epithelial cells. Bioinformatics, 2015 (pdf)

Medullary thymic epithelial cells (mTECs) play a crucial role in the development of self-tolerance (i.e. the body’s ability to recognize and not mount an immune response against its own proteins). It has been known for a long time that there are mechanisms in mTECs to ensure that certain genes that are normally only expressed in very specific tissues are expressed in mTECs. This allows T cells, which mature in the thymus, to be exposed to these ‘tissue-restricted antigens’ (TRAs) in the course of their training to distinguish self from non-self proteins. TRAs can also arise from tissue-specific alternative mRNA splicing (the processing of the RNA of a given gene to produce different proteins). Not much is known about how T cells are trained to avoid responding to these TRAs. In this paper we have shown that splice isoform diversity is higher in medullary thymic epithelial cells than in any other tissue type examined and that this diversity of mRNA splicing is dependent on the AIRE gene, which plays a key role in the expression of tissue-restricted genes. This suggests that mechanisms exist to ensure that T cells are exposed to diverse splice isoforms.
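The diversity measure used in the paper is not described on this page. One common way to quantify splice isoform diversity for a gene, shown here only as an assumed example, is the Shannon entropy of its isoform usage: a gene expressed as a single isoform scores 0 and the score rises as expression is spread more evenly over more isoforms. The function name and the read counts are hypothetical.

```python
import math


def isoform_entropy(counts):
    """Shannon entropy (in bits) of isoform usage for one gene.

    `counts` holds read counts (or abundances) per isoform. Isoforms with
    zero counts contribute nothing to the entropy.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in props)


# Hypothetical counts for one gene in two tissues.
single_isoform = isoform_entropy([100, 0, 0])      # one dominant isoform
four_isoforms = isoform_entropy([25, 25, 25, 25])  # four equally used isoforms
print(single_isoform, four_isoforms)
```

Comparing such per-gene scores between tissues gives one simple view of which tissue uses a broader range of isoforms, although it ignores differences in sequencing depth and annotation quality.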

Identification of broadly neutralizing antibody epitopes in HIV-1 env. Virology Journal, 2013 (pdf)

People infected with HIV generally produce antibodies that are capable of neutralizing the virus, yet the immune system ultimately loses the battle to control the infection. The reason is that HIV evolves within the infected individual to evade almost all immune responses that are mounted against it.

This results in a situation in which plasma from an HIV-infected patient can neutralize virus from an earlier time point of infection, but not the viruses obtained at the same time point as the plasma and, generally, not the broad diversity of viruses that are found across the whole HIV pandemic (HIV is incredibly diverse as a result of the virus’s rapid rate of evolution). However, some HIV-infected individuals produce broadly neutralizing antibodies that are capable of neutralizing most viruses. These antibodies are of great interest because, if a vaccine can be designed that causes them to be produced, it may be effective against the diversity of viruses that an at-risk individual may encounter. The graph depicted here shows data from 7 individuals who produced broadly neutralizing antibodies, with the effectiveness of their antibodies against a broad range of viruses (depicted in the phylogenetic tree) illustrated as a heatmap (graded yellow to red according to effectiveness). We developed evolutionary models to identify the sites in the virus at which the pattern of evolution over the phylogenetic tree tracks the changes in virus neutralization (i.e. tracks the heatmap data for a given patient). We also developed a model (not illustrated here) that can identify collections of sites that are close together in the three-dimensional viral protein structure and show this behaviour, and used it to identify candidate conformational epitopes that are targeted by broadly neutralizing antibodies in these patients.

Gene expression deconvolution using CellMix. (pdf)

Renaud Gaujoux, a former PhD student, developed a software package, CellMix, that provides a general computational framework for implementing, developing and testing computational methods for gene expression deconvolution. Biological samples are almost always heterogeneous, consisting of different types of cells that are mixed in varying proportions. The gene expression deconvolution problem consists of disentangling the effects of sample composition from intra-cellular variation in gene expression. Our software package, along with an earlier package (NMF), is now widely used for this. An example of the results of application of CellMix to deconvolve gene expression data from blood samples is shown.
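To make the deconvolution problem concrete, here is a minimal sketch of its simplest case: a sample that mixes two cell types whose expression signatures are known, where the only unknown is the mixing proportion. This does not use CellMix or its interface. The function name, the signature values and the linear mixing model are assumptions for illustration, and practical methods handle many cell types, unknown signatures and noise.

```python
def estimate_proportion(mixed, sig_a, sig_b):
    """Least-squares estimate of the fraction of cell type A in a sample.

    Assumes a linear mixing model for every gene g:
        mixed[g] ~ p * sig_a[g] + (1 - p) * sig_b[g]
    and returns the p in [0, 1] that minimises the squared error.
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]    # signature contrast
    resid = [m - b for m, b in zip(mixed, sig_b)]   # sample minus type-B baseline
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    p = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, p))  # clip to the valid range for a proportion


# Hypothetical expression signatures for two cell types over four genes.
sig_a = [10.0, 2.0, 5.0, 0.0]
sig_b = [1.0, 8.0, 5.0, 4.0]

# Simulate a noise-free sample containing 30% of cell type A.
true_p = 0.3
mixed = [true_p * a + (1 - true_p) * b for a, b in zip(sig_a, sig_b)]

p_hat = estimate_proportion(mixed, sig_a, sig_b)
print(round(p_hat, 6))
```

Genes with the same expression in both cell types (the third gene here) carry no information about the proportion, which is why marker genes that differ strongly between cell types matter so much in practice.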

Selected publications

  • Geeleher P, Nath A, Wang F, Zhang Z, Barbeira AN, Fessler J, Grossman RL, Seoighe C, Stephanie Huang R. Cancer expression quantitative trait loci (eQTLs) can be determined from heterogeneous tumor gene expression data by modeling variation in tumor purity. Genome Biology 2018 Sep 11;19(1):130.
  • Seoighe C, Tosh NJ, Greally JM. DNA methylation haplotypes as cancer markers (Brief Communications Arising). Nature Genetics 2018 Aug;50(8):1062-1063.
  • Yu Y, Ceredig R, Seoighe C. A Database of Human Immune Receptor Alleles Recovered from Population Sequencing Data. Journal of Immunology 2017 Mar 1;198(5):2202-2210.
  • Seoighe C, Scally A. Inference of Candidate Germline Mutator Loci in Humans from Genome-Wide Haplotype Data. PLoS Genetics 2017 Jan 17;13(1):e1006549
  • Yang H, Seoighe C. Impact of the Choice of Normalization Method on Molecular Cancer Class Discovery Using Nonnegative Matrix Factorization. PLoS One. 2016 Oct 14;11(10):e0164880.
  • Keane PA, Seoighe C. Intron Length Coevolution across Mammalian Genomes. Molecular Biology and Evolution 2016 Oct;33(10):2682-91.
  • Keane P, Ceredig R, Seoighe C. Promiscuous mRNA splicing under the control of AIRE in medullary thymic epithelial cells. Bioinformatics. 2015
  • Lacerda M, Seoighe C. Population genetics inference for longitudinally-sampled mutants under strong selection. Genetics. 2014 Nov;198(3):1237-50. doi: 10.1534/genetics.114.167957
  • Gaujoux R, Seoighe C. CellMix: a comprehensive toolbox for gene expression deconvolution. Bioinformatics. 2013
  • Seoighe C, Korir PK. Evidence for intron length conservation in a set of mammalian genes associated with embryonic development. BMC Bioinformatics. 2011
  • Lacerda M, Scheffler K, Seoighe C. Epitope discovery with phylogenetic hidden Markov models. Mol Biol Evol. 2010
  • Wood N, Bhattacharya T, Keele BF, Giorgi E, Liu M, Gaschen B, Daniels M, Ferrari G, Haynes BF, McMichael A, Shaw GM, Hahn BH, Korber B, Seoighe C. HIV evolution in early infection: selection pressures, patterns of insertion and deletion, and the impact of APOBEC. PLoS Pathog. 2009