Thomas Down PhD

Transcription Informatics

 

Co-workers:
Siarhei Maslau • Jing Su

We study the mechanisms by which programs of gene expression are selected during the development of multicellular organisms. Regulatory sequences contain clusters of binding sites for transcription factors, most of which interact with some specific DNA sequence motif. By discovering the repertoire of transcription factor binding sites, we can uncover an important part of the cell’s regulatory network. We are addressing this problem using a new computational motif discovery tool, NestedMICA, to find DNA sequence motifs that are over-represented in large sets of regulatory sequences from across the genomes of a panel of multicellular organisms.
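To give a feel for what "over-represented" means here, the following is a minimal sketch that ranks short DNA words (k-mers) by how much more frequent they are in a set of regulatory sequences than in a background set. It is not NestedMICA and is far simpler than a real motif-discovery tool: the function names, the fixed word length and the pseudocount are all assumptions made for illustration, and exact words ignore the degeneracy of real binding sites.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count every overlapping word of length k across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(foreground, background, k=6, pseudocount=1.0):
    """Rank k-mers by log2(frequency in foreground / frequency in background).

    The pseudocount (an illustrative choice) avoids division by zero for
    words that never occur in the background set.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_words = 4 ** k  # number of possible DNA words of length k
    scores = {}
    for word in fg:
        p_fg = (fg[word] + pseudocount) / (fg_total + pseudocount * n_words)
        p_bg = (bg[word] + pseudocount) / (bg_total + pseudocount * n_words)
        scores[word] = log2(p_fg / p_bg)
    # Most over-represented words first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `enrichment(regulatory_seqs, background_seqs, k=6)` on two lists of sequence strings returns `(word, score)` pairs, with positive scores marking words that are more common in the regulatory set. A probabilistic motif model would replace the exact words with position-specific base preferences.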



A regulatory motif discovered in the Drosophila genome, and the embryonic expression pattern of a gene regulated by this motif. (P Tomancak et al. Genome Biology 3:research0088)

We would also like to understand how particular patterns of gene expression are stably maintained over time, for instance when a cell becomes committed to a particular developmental lineage. To this end, we are involved in studies of stable epigenetic modifications, particularly DNA cytosine methylation. High-throughput sequencing technologies allow epigenetic modifications to be studied on a genome-wide basis, and we have developed a new analytical technique, which we applied to deep sequencing data to generate the first map of DNA methylation across a complete vertebrate genome. We are now exploiting this technology to study how DNA methylation interacts with other regulatory mechanisms. We are also investigating how human DNA methylation changes are associated with ageing and complex diseases.



The Methyl DNA Immunoprecipitation (MeDIP) technique can be used to quantify the methylation state of genomic DNA on a large scale. In methylated regions (coloured blue), signal correlates with the density of CpG dinucleotides.
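Because the MeDIP signal depends on CpG density, a local CpG density profile is a natural quantity to compute alongside the signal. The sketch below counts CpG dinucleotides in sliding windows along a sequence. It is an illustration only, not the group's analysis method: the function name, window size and step are assumed values, and only the forward strand is scanned.

```python
def cpg_density(seq, window=100, step=50):
    """Return (window_start, CpGs per base) for sliding windows along seq.

    Window size and step are illustrative defaults, not values taken
    from any published analysis.
    """
    seq = seq.upper()
    if not seq:
        return []
    profile = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        # "CG" cannot overlap itself, so str.count finds every CpG in the
        # window. A CpG straddling a window edge is not counted in that window.
        n_cpg = chunk.count("CG")
        profile.append((start, n_cpg / len(chunk)))
    return profile
```

For example, `cpg_density(sequence, window=100, step=50)` yields one density value per half-overlapping 100-base window, which could then be compared with the observed signal in the same windows.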

 



Visualisation of DNA methylation state using the Ensembl genome browser, with yellow indicating unmethylated sequences and blue indicating highly methylated regions.

 

Plain English:
Genome sequences are available for many organisms, but tools for finding and interpreting the sequences that regulate gene activity remain very limited. My group is developing new computational techniques for analysing gene regulators. Gene regulators consist principally of clusters of short DNA words (motifs), each of which enables the regulation of a gene by one of several hundred transcription factors. We are therefore creating comprehensive catalogues, or dictionaries, of regulatory motifs found in the human genome and in several key model organisms. We then use them to annotate complete genome sequences and understand which genes are controlled by which transcription factors.

Understanding gene regulation will help to answer many fundamental biological questions, especially in developmental biology, where little is known about how the many cell types found in complex organisms differentiate from stem cells. It has been shown that some genetic diseases, including one form of the blood disease thalassemia, are caused by defects in regulators rather than in protein-coding genes. Accurate and comprehensive identification of gene regulators is essential to fully understand genetic disease and variation between individuals.

 

Selected publications:

• MM, Marioni JC, Birney E, Hubbard TJP, Durbin R, Tavare S, Beck S (2008) A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotech 26:779-785

• Down TA, Bergman CM, Su J, Hubbard TJP (2007) Large scale discovery of promoter motifs in Drosophila melanogaster. PLoS Comput Biol 3:e7

• Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA, Haefliger C, Horton R, Howe K, Jackson DK, Kunde J, Koenig C, Liddle J, Niblett D, Otto T, Pettett R, Seemann S, Thompson C, West T, Rogers J, Olek A, Berlin K, Beck S (2006) DNA methylation profiling of human chromosomes 6, 20 and 22. Nature Genetics 38:1378-1385

• Down TA and Hubbard TJP (2005) NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequences. Nucleic Acids Res 33:1445-1453

• Down TA and Hubbard TJP (2002) Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res 12:458-461

• Prlic A, Down TA, Kulesha E, Finn RD, Kahari A, Hubbard TJP (2007) Integrating sequence and structural biology with DAS. BMC Bioinformatics 8:333

 

Resources:

BioTIFFIN regulatory motif database
NestedMICA motif-discovery toolkit

 



The BioTIFFIN interface for browsing regulatory sequence motifs.