
Martin Hemberg

Martin Hemberg PhD

Associate Group Leader at the Gurdon Institute and Career Development Fellow Group Leader at the Wellcome Trust Sanger Institute.

Publications listed at: Europe PMC, PubMed



Computational analyses of large genomic datasets

Although every cell in an organism contains the same DNA, there is a great variety of cell types (e.g. skin, muscle, kidney) because different genes are transcribed in each. The quantity of transcripts, or RNA, made from a specific gene can be measured in the cell and is referred to as the expression level of the gene. Understanding how, why, when and where genes are turned on and off is crucial for understanding many biological processes, ranging from development to a variety of diseases, including cancer and autism.

Recent technological advances have made it possible to analyse gene expression and other related properties in a high-throughput manner, and this has resulted in a wealth of data. However, the experimental data is typically large, high-dimensional and noisy. We are interested in developing computational methods that will make it possible to extract as much information as possible from the data.

Our ongoing research projects include:

  • Inference of gene regulatory networks from single-cell RNA-seq data.
  • Characterisation of the transcriptome of individual nematodes (with the Miska lab). 
  • Characterisation of the heterogeneity of liver organoids (with the Huch lab).
  • Identification and characterisation of non-canonical secondary structures in DNA.
  • Virtual Reality technology for visualising genomic data (collaborating with HammerheadVR).

Selected publications:

• Nguyen TA et al. (2016) High-throughput functional comparison of promoter and enhancer activities. Genome Res. Jun 16. pii: gr.204834.116. [Epub ahead of print]

• Delmans M and Hemberg M (2016) Discrete distributional differential expression (D3E)--a tool for gene expression analysis of single-cell RNA-seq data. BMC Bioinformatics. 17:110.

• Kiselev VY et al. (2016) SC3 - consensus clustering of single-cell RNA-Seq data. bioRxiv preprint published online 14 April 2016.

• Prabakaran S et al. (2014) Quantitative profiling of peptides from RNAs classified as noncoding. Nature Communications 5: 5429.

• Kim TK et al. (2010) Widespread transcription at neuronal activity-regulated enhancers. Nature 465 (7295): 182–187.


Plain English

What can sequencing data tell us about disease?

To create the different cell types in an organism, different genes from the genome are expressed at different times as RNA transcripts, which include both protein-coding and non-coding species. Understanding how, why, when and where genes are expressed is crucial for understanding not just development but also many diseases.

High-throughput sequencing of RNA from different tissues can now provide insights into gene expression and related properties, but the experimental datasets are large, high-dimensional and noisy. Computational methods are required to extract maximum information from such data.


Nicholas Keone Lee

Based at Wellcome Trust Sanger Institute: Tallulah Andrews • Ilias Georgakopoulos-Soares • Vladimir Kiselev • Guillermo Parada