Prof Cathal Seoighe

Professor of Bioinformatics

Research interests

  • Genomics
  • Molecular evolution

Research overview


The Seoighe group carries out research into various aspects of genomics and molecular evolution, including questions that can be tackled using existing data as well as collaborative research with experimental scientists. Recent work has included the development of tools for the analysis of deep sequencing data from T cell receptors and immunoglobulins; the application of evolutionary models to infer immune epitopes in viruses; analysis of the diversity of mRNA splicing in the thymus and its potential importance for the avoidance of autoimmunity; identification of genetic variants with transcriptome-wide effects on RNA processing; and the development of computational tools that use gene expression or methylation data to disentangle the effects of cell type composition from those of gene expression variation.

A sample of some of our published work is provided below.

LymAnalyzer: a tool for comprehensive analysis of next generation sequencing data of T cell receptors and immunoglobulins. Nucleic Acids Research, 2015

The enormous diversity of T and B cell receptors is generated through recombination of diverse V, (D) and J gene segments, together with somatic mutation processes. LymAnalyzer is a free, specialized tool for the accurate and rapid mapping of sequencing reads to immune gene segments and alleles in a reference database. It also extracts the Complementarity Determining Region 3 (CDR3) and clusters related clones. In addition to mapping reads to known immune gene alleles, the tool can infer novel alleles that are absent from the reference database. We would be glad to hear from research groups interested in using this tool, or in collaborating to customize it or to develop similar tools for related problems (contact information below).
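The profile does not describe how LymAnalyzer locates the CDR3, so the following is only a minimal, hypothetical sketch of the general idea, not LymAnalyzer's implementation. It assumes an in-frame, translated receptor sequence and uses the common convention that the CDR3 lies between the conserved cysteine at the end of the V segment and the F/W-G-X-G motif in the J segment. The function name and the 5 to 25 residue length window are illustrative choices.

```python
import re
from typing import Optional

# Assumed convention (illustrative only): the CDR3 starts after the conserved
# V-segment cysteine and ends before the Phe/Trp of the J-segment F/W-G-X-G motif.
J_MOTIF = re.compile(r"[FW]G.G")


def extract_cdr3(protein: str, min_len: int = 5, max_len: int = 25) -> Optional[str]:
    """Return the CDR3 amino-acid sequence, or None if no plausible one is found."""
    for motif in J_MOTIF.finditer(protein):
        end = motif.start()
        # Take the last cysteine upstream of this J motif as the V-segment anchor.
        cys = protein.rfind("C", 0, end)
        if cys != -1 and min_len <= end - cys - 1 <= max_len:
            return protein[cys + 1:end]
    # No motif with an upstream cysteine at a plausible distance.
    return None
```

For example, `extract_cdr3("AVYLCASSLGQAYEQYFGPGTRLTVT")` returns `"ASSLGQAYEQY"`. A real tool also has to cope with sequencing errors, frameshifts and cysteines inside the CDR3 itself, all of which this sketch ignores.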

Promiscuous mRNA splicing under the control of AIRE in medullary thymic epithelial cells. Bioinformatics, 2015

Medullary thymic epithelial cells (mTECs) play a crucial role in the development of self-tolerance (i.e. the body’s ability to recognize its own proteins and not mount an immune response against them). It has long been known that mTECs have mechanisms to ensure that certain genes that are normally expressed only in very specific tissues are also expressed in mTECs. This allows T cells, which mature in the thymus, to be exposed to these ‘tissue-restricted antigens’ (TRAs) in the course of their training to distinguish self from non-self proteins. TRAs can also arise from tissue-specific alternative mRNA splicing (the processing of the RNA of a given gene to produce different proteins). Not much is known about how T cells are trained to avoid responding to these isoform-specific TRAs. In this paper we showed that splice isoform diversity is higher in mTECs than in any other tissue type examined, and that this diversity of mRNA splicing depends on the AIRE gene, which plays a key role in the expression of tissue-restricted genes. This suggests that mechanisms exist to ensure that T cells are exposed to diverse splice isoforms.
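The summary above does not say how splice isoform diversity was measured. One common way to quantify it, assumed here purely for illustration, is the Shannon entropy of a gene's isoform relative abundances: a gene dominated by a single isoform scores near zero, while a gene whose isoforms are used evenly scores higher. The function name is hypothetical.

```python
import math


def isoform_entropy(abundances):
    """Shannon entropy (in bits) of the isoform relative abundances of one gene.

    abundances: non-negative expression estimates, one per isoform.
    """
    total = sum(abundances)
    if total <= 0:
        return 0.0  # gene not expressed: no diversity to measure
    h = 0.0
    for a in abundances:
        if a > 0:  # zero-abundance isoforms contribute nothing
            p = a / total
            h -= p * math.log2(p)
    return h
```

For instance, `isoform_entropy([25, 25, 25, 25])` gives 2.0 bits, whereas `isoform_entropy([90, 5, 5])` gives a much lower value. Comparing such per-gene scores between tissues is one way to ask whether a tissue expresses a more diverse mixture of isoforms.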

Identification of broadly neutralizing antibody epitopes in HIV-1 env. Virology Journal, 2013

People infected with HIV generally produce antibodies that are capable of neutralizing the virus, yet the immune system ultimately fails to control the infection. This is because HIV evolves within the infected individual to evade almost all of the immune responses mounted against it.

This results in a situation in which plasma from an HIV-infected patient can neutralize virus from an earlier time point in the infection, but not virus obtained at the same time point as the plasma and, generally, not the broad diversity of viruses found across the HIV pandemic (HIV is extraordinarily diverse as a result of its rapid rate of evolution). However, some HIV-infected individuals produce broadly neutralizing antibodies that are capable of neutralizing most viruses. These antibodies are of great interest because, if a vaccine can be designed that elicits them, it may be effective against the diversity of viruses that an at-risk individual may encounter. The graph depicted here shows data from 7 individuals who produced broadly neutralizing antibodies, with the effectiveness of their antibodies against a broad range of viruses (depicted in the phylogenetic tree) illustrated as a heatmap (graded from yellow to red according to effectiveness). We developed evolutionary models to identify the sites in the virus at which the pattern of evolution over the phylogenetic tree tracks the changes in virus neutralization (i.e. tracks the heatmap data for a given patient). We also developed a model (not illustrated here) that can identify collections of sites that lie close together in the three-dimensional structure of the viral protein and show this behaviour, and used it to identify candidate conformational epitopes targeted by broadly neutralizing antibodies in these patients.
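The models described above are phylogenetic: they account for the shared ancestry of the viruses in the tree. As a much simpler illustration of the underlying question (which sites track neutralization?), the hypothetical sketch below ignores the phylogeny entirely and, for each alignment column, compares the mean neutralization of viruses carrying the most common residue with that of viruses carrying any other residue. It is a naive baseline under those assumptions, not the group's method.

```python
from collections import Counter
from statistics import mean


def site_scores(alignment, neutralization):
    """Naive, non-phylogenetic association score for every alignment column.

    alignment: equal-length aligned protein sequences, one per virus.
    neutralization: one score per virus (e.g. higher = more easily neutralized).
    Returns |mean score with majority residue - mean score without| per site.
    """
    scores = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        majority = Counter(column).most_common(1)[0][0]
        with_res = [s for aa, s in zip(column, neutralization) if aa == majority]
        without = [s for aa, s in zip(column, neutralization) if aa != majority]
        # Invariant sites offer no contrast, so they score zero.
        scores.append(abs(mean(with_res) - mean(without)) if without else 0.0)
    return scores
```

Because closely related viruses tend to share both residues and neutralization profiles, a naive score like this can flag sites that merely mark a clade; correcting for that shared ancestry is what the evolutionary models add.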

Gene expression deconvolution using CellMix.

Renaud Gaujoux, a former PhD student, developed CellMix, a software package that provides a general computational framework for implementing, developing and testing methods for gene expression deconvolution. Biological samples are almost always heterogeneous, consisting of different types of cells mixed in varying proportions. The gene expression deconvolution problem consists of disentangling the effects of sample composition from intracellular variation in gene expression. CellMix, along with an earlier package (NMF), is now widely used for this. An example of the results of applying CellMix to deconvolve gene expression data from blood samples is shown.
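CellMix bundles many deconvolution methods, and the sketch below is not its API. It illustrates one simple, reference-based formulation under stated assumptions: the expression profile of each pure cell type is known (a genes by cell types matrix W), a mixed sample's expression vector x is modelled as x ≈ Wh, and the non-negative weights h are estimated with the multiplicative update rule familiar from non-negative matrix factorization, then rescaled to sum to one. The function name and iteration count are hypothetical choices.

```python
def estimate_proportions(signatures, mixture, n_iter=500):
    """Estimate cell-type proportions in one mixed sample (illustrative sketch).

    signatures: one row per gene, giving that gene's expression in each pure cell type (W).
    mixture: expression of the same genes in the mixed sample (x).
    Minimizes ||x - W h||^2 over h >= 0 with multiplicative updates,
    h <- h * (W^T x) / (W^T W h), then normalizes h to sum to one.
    """
    n_types = len(signatures[0])
    h = [1.0 / n_types] * n_types  # positive start, so updates keep h non-negative
    # W^T x does not change between iterations.
    wtx = [sum(row[k] * x for row, x in zip(signatures, mixture)) for k in range(n_types)]
    for _ in range(n_iter):
        wh = [sum(row[k] * h[k] for k in range(n_types)) for row in signatures]
        wtwh = [sum(row[k] * v for row, v in zip(signatures, wh)) for k in range(n_types)]
        h = [h[k] * wtx[k] / wtwh[k] if wtwh[k] > 0 else 0.0 for k in range(n_types)]
    total = sum(h)
    return [v / total for v in h] if total > 0 else h
```

With signatures `[[10, 0], [0, 10], [5, 5]]` and a mixture `[3, 7, 5]` built from 30% of the first cell type and 70% of the second, the estimate converges to approximately `[0.3, 0.7]`.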

Profiling the chromatin structure of the human nucleolar organizer regions.

Although the completion of the human genome sequencing project was announced with great fanfare in 2001, many parts of the genome remain unsequenced to this day. Among the more enigmatic unsequenced regions are the nucleolar organizer regions (NORs). These regions reside on the short arms of the five ‘acrocentric’ chromosomes (i.e. chromosomes in which the centromere is close to one end) and contain the ribosomal RNA genes. The products of these genes are required in very large amounts, as they are core components of the ribosomes, which translate messenger RNAs into proteins. In addition to the ribosomal genes, the NORs contain regions of DNA that are repeated across the five acrocentric chromosomes but whose function is unknown. Until now it had been assumed that these regions did not encode any functional genes, although they may have a structural role in the nucleolus, the nuclear structure that forms at the NOR when ribosomal RNA is transcribed. Prof. Brian McStay of the Chromosome Biology Group at NUI Galway specializes in the nucleolus. In collaboration with the McStay group, we analysed the chromatin modifications found in this region and provided evidence that it is actively transcribed. Subsequent experiments in the McStay group confirmed the existence of these transcripts and showed that the encoded genes may be essential for the functioning of NORs. The figure to the right shows the results of chromatin segmentation and the mRNA transcripts that were found in the distal junction region of the NOR.
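The profile does not say which segmentation method produced the figure. As a generic illustration only, the sketch below performs hidden Markov model segmentation in the spirit of HMM-based tools such as ChromHMM: each genomic bin is observed as a tuple of 0/1 calls for a set of histone marks, the marks are treated as independent Bernoulli variables given a hidden chromatin state, and the Viterbi algorithm finds the most probable state path. The two states, the two marks and all probabilities are hypothetical values chosen for the example.

```python
import math

# Hypothetical two-state model; all parameters are made up for illustration.
STATES = ("inactive", "transcribed")
START = (0.5, 0.5)
TRANS = ((0.9, 0.1), (0.1, 0.9))  # sticky transitions favour contiguous segments
EMIT = ((0.05, 0.05), (0.8, 0.7))  # P(mark present | state), one value per mark


def log_emission(state, obs):
    """Log-probability of a bin's 0/1 mark calls, marks independent given the state."""
    return sum(math.log(p if o else 1.0 - p) for p, o in zip(EMIT[state], obs))


def viterbi(observations):
    """Most probable state label for each bin (list of 0/1 mark tuples)."""
    n = len(STATES)
    score = [math.log(START[s]) + log_emission(s, observations[0]) for s in range(n)]
    back = []  # back[t][s] = best previous state for state s at bin t + 1
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n):
            prev = max(range(n), key=lambda r: score[r] + math.log(TRANS[r][s]))
            pointers.append(prev)
            new_score.append(score[prev] + math.log(TRANS[prev][s]) + log_emission(s, obs))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]
```

On the toy input `[(0, 0), (0, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 0)]` the sketch labels the three middle bins as transcribed: the sticky transitions keep the bin carrying only one mark inside the transcribed segment.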

Selected publications

  • Keane P, Ceredig R, Seoighe C. Promiscuous mRNA splicing under the control of AIRE in medullary thymic epithelial cells. Bioinformatics. 2014
  • Gaujoux R, Seoighe C. CellMix: a comprehensive toolbox for gene expression deconvolution. Bioinformatics. 2013
  • Seoighe C, Korir PK. Evidence for intron length conservation in a set of mammalian genes associated with embryonic development. BMC Bioinformatics. 2011
  • Lacerda M, Scheffler K, Seoighe C. Epitope discovery with phylogenetic hidden Markov models. Mol Biol Evol. 2010
  • Wood N, Bhattacharya T, Keele BF, Giorgi E, Liu M, Gaschen B, Daniels M, Ferrari G, Haynes BF, McMichael A, Shaw GM, Hahn BH, Korber B, Seoighe C. HIV evolution in early infection: selection pressures, patterns of insertion and deletion, and the impact of APOBEC. PLoS Pathog. 2009