Research interests

1 & 2. Fundamentals of gene and genome evolution with application to healthcare

Why are synonymous mutations under selection?

In mammals it has often been assumed that because population sizes are small, selection must be weak and thus selection couldn’t act on synonymous mutations as these – by definition – don’t change the amino acid content of a protein.

However, a decade ago we provided evidence that synonymous mutations are under selection [1-6]. The mechanistic basis for this includes modification of miRNA pairing [7], of mRNA structure [3] and possibly of nucleosome positioning [8].

However, the most important mechanism appears to be disturbance of RNA splicing [9,5,10-17]. In this regard, we have had a long-standing interest in the role of exonic splice enhancers (ESEs) in determining the fate of synonymous mutations and selection in non-coding RNAs [18,17].

We have shown that ESE presence at the ends of exons can explain codon usage bias [10] and amino acid bias [19], as well as explain selection on synonymous [5] and nonsynonymous [19] mutations. This selection is ancient within eukaryotes, being observed in taxa as distant as brown algae [16] and flies [12]. More generally, we can detect selection for sites in mRNAs where RNA binding proteins attach [20]. We find that disease-causing mutations are unusually common at the ends of exons and in exons with low ESE density [21]. From the end-of-exon excess we estimate that 25-45% of disease-causing mutations disrupt splicing [21].
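As a rough illustration of what "ESE density" can mean in practice, the sketch below scores an exon by the fraction of its nucleotides that overlap a motif from a supplied hexamer set. This is an assumed, simplified definition for illustration only, not the pipeline used in the cited papers; the function name and the example motif are hypothetical placeholders, and a real analysis would use a curated ESE hexamer set.

```python
def ese_density(exon, motifs, k=6):
    """Fraction of exon nucleotides covered by at least one motif hit.

    `motifs` is a set of k-mers treated as putative ESEs. The set passed
    in below is a placeholder, not a curated ESE list.
    """
    exon = exon.upper()
    if not exon:
        return 0.0
    covered = [False] * len(exon)
    # Slide a window of length k and mark every base inside a motif hit.
    for start in range(len(exon) - k + 1):
        if exon[start:start + k] in motifs:
            for pos in range(start, start + k):
                covered[pos] = True
    return sum(covered) / len(exon)


# Placeholder motif set (assumption for illustration only).
example_motifs = {"GAAGAA"}
density = ese_density("GAAGAAGAATTT", example_motifs)
```

Applying the same function separately to the terminal bases of each exon and to the exon core would give one simple way of comparing exon ends with exon interiors.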

However, not all selection on ESE binding reflects selection on splice regulation: we find that, because ESE binding partners have other roles, ESEs are also under selection in intronless genes [22,20]. In addition, selection acts to avoid the generation of binding sites for RNA binding proteins in sites where they would be non-optimal [20].

More generally, in humans at least, because our genome is so “bloated”, we have abundant problems with unwanted transcripts. We have developed the synthetic model - the unwanted transcript hypothesis - that suggests that we have many devices to differentiate wanted from unwanted transcripts and many of these employ GC content, as mutation is biased GC->AT while CDS sequences need to have a GC content above the neutral equilibrium. High expression of transgenes rich in GC at synonymous sites supports this model, as does the increase in GC of our retrogenes compared with the parental genes.

Can we use this information to improve transgenes and to aid diagnostics?

When we make transgenes for mammals we usually leave in the first intron but remove all of the others. The unwanted transcript hypothesis suggests a further simple modification: increase GC content at synonymous sites. Additionally, as we have shown how important selection of ESEs is for synonymous site nucleotide content, can we then improve transgenes by modifying synonymous sites where functioning ESEs reside, but that are no longer near intron-exon boundaries? With Greg Kudla of the University of Edinburgh we are providing a web package to design a transgene and will test this against alternatives. The first test of the idea, to generate a novel transgene for gene therapy application, shows that our transgenes outperform the commercial alternative [23]. The same logic suggests that we should be able to predict which synonymous mutations might cause disease.
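The idea of raising GC content at synonymous sites can be sketched in a few lines. The example below uses GC at third codon positions (GC3) as a simple proxy for synonymous-site GC and swaps each sense codon for a synonymous codon ending in G or C, using the standard genetic code. It is a minimal sketch under those assumptions, not the web package mentioned above: the function names are hypothetical, and it ignores ESEs, splice sites, mRNA structure and every other constraint a real design tool would need to respect.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 long")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def gc3(cds):
    """Fraction of codons with G or C at the third position."""
    codons = split_codons(cds)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def translate(cds):
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def raise_gc3(cds):
    """Replace AT-ending sense codons with a GC-ending synonym.

    Stop codons are left untouched. The synonym chosen is simply the
    alphabetically first GC-ending codon for that amino acid, so the
    first codon position may also change (e.g. TTA -> CTC for leucine).
    """
    recoded = []
    for codon in split_codons(cds):
        amino_acid = CODON_TABLE[codon]
        if amino_acid != "*" and codon[2] not in "GC":
            synonyms = sorted(
                c for c, aa in CODON_TABLE.items()
                if aa == amino_acid and c[2] in "GC"
            )
            if synonyms:
                codon = synonyms[0]
        recoded.append(codon)
    return "".join(recoded)


original = "ATGAAATTTTAA"
recoded = raise_gc3(original)
```

The recoded sequence encodes the same protein as the original while its GC3 rises, which is the property the hypothesis predicts should favour expression.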

Why do genes of similar expression cluster in genomes?

While it was often considered that, with a few strange exceptions, gene order in the human genome is random [24], we observed that genes with similar expression profiles tend to cluster. In particular we found that genes expressed in many tissues (housekeeping genes) tend to cluster in mammals [25,26] and flies [27]. In yeast, essential genes [28] and highly co-expressed genes [29,30] cluster, while in worm much co-expression is operonic [31].

The expression of multiple copies of a recent insert into our genome (the endogenous retrovirus HERVH) provides a convenient natural test of the idea that one gene’s expression might have knock-on consequences. With my collaborator Zsuzsanna Izsvák at MDC Berlin, we have been examining patterns of expression of these inserts and found that HERVH expression often generates new genes by combining with the neighbours or by recruiting neighbouring sequence [32]. This includes what looks to be a human-specific gene, ESRG. Expression of a fragment of HERVH enabled us to extract what appear to be human naïve-like stem cells [32].

There are both selectionist [29,33,34] and neutralist [35,17] explanations for the patterns of clustering. We are currently interested in the possibility that expression of one gene causes the neighbours to be expressed (probably owing to chromatin effects) and have shown that the extent of expression change of one gene predicts the extent of expression change of the neighbours (expression piggybacking) [36].

Can we use this information to define genomic safe-harbours?

One problem with transgenesis is that the transgene might alter expression of the neighbours and vice versa. Indeed, the first gene therapy trials were halted as the transgene affected expression of neighbouring oncogenes. We can then ask whether there are any safe harbours in the genome – domains where one gene’s expression is autonomous from the neighbours. In particular we are testing the possibility that domains where genes are insulated in their expression change might be domains where transgenes are insulated – this would allow a very rapid identification of strong candidate safe harbours without the need for extensive transgene experiments.

The error prone genome?

Both of the above problems suggest that a major problem for genomes is handling errors [37]: splicing is error prone so selection favours ESEs to reinforce things; when chromatin is open to allow a gene to be expressed, neighbouring genes are accidentally expressed too. We are similarly interested in the impact of frameshifting errors [38] and mistranslation errors [39].

The curiosity of the ESE case is that, as humans have larger introns, we have more ESEs than many organisms as large introns are harder to splice. Does this mean that the usual assumption that selection is weaker when population sizes are small is wrong when selection is acting to mitigate errors, as errors could be more common when population size is small [40]?

3. Estimating and understanding key parameters

What is the mutation rate? What determines the mutation rate? What is the recombination rate and how much recombination involves non-crossover mechanisms? Addressing these issues is now possible with Next Generation Sequencing. To this end, with collaborators Dacheng Tian and Sihai Yang at Nanjing University, we have provided estimates for the per generation mutation rate via parent-offspring sequencing in Arabidopsis [41], honeybee [41], bumblebee [42], rice [41] and peach [43]. Additionally, we have sought to test the hypothesis that heterozygosity [41], hybridization [43] and recombination [44,42] might be mutagenic. The comparison of the honeybee, with its very high recombination rate, and the bumblebee, with its much lower recombination rate [44,42], is especially instructive of the extent of recombination-associated mutation (which we estimate to be ~5% of all mutations) [45].

The same data that enable the above estimation allow us to examine recombination rates [45,44] and to examine the direct consequences of recombination, such as biased gene conversion [45]. We have also estimated the gene conversion rate for four species via tetrad sampling (unpublished).

Using SNP and substitution data we have also defined the extent and causes of variation between genes and genomic regions in (what we assume to be) the mutation rate [46-49].

Why be dispensable?

A further key parameter is the relationship between gene dosage and fitness. Some genes are “essential” – when dose goes to zero you die. Others seem to have remarkably little relationship between dose and fitness. We established early on that “non-essential” genes are not truly nonessential – they evolve under strong purifying selection just as essential genes do [50]. Indeed, we find that if we control for expression level – a parameter we determined to be the key predictor of rates of evolution [51] – it isn’t clear that essential and non-essential genes differ in their rates [52].

But can we predict dosage sensitivity? Collaborating with Julie Ahringer in Cambridge, we tested the idea that the presence of a paralog might help explain why some genes seem to have no effect on knockout. Surprisingly, we find this not to be the case – double knockouts typically were the same as single gene knockouts. Rather it looks like dull genes – those with no big phenotypic effects – are most easily duplicated [53]. In an alternative in silico approach, we considered the effects of knockouts via flux balance analysis and found that most non-essential genes were non-essential just because of the lab conditions [54]. Similarly, we could show that if we force a fixed environment we can predict which genes can be lost because they are effectively unnecessary [55]. Conversely, we also find that genes involved in protein complexes tend to be particularly sensitive to reduced dosage [56], a finding we termed the dosage balance hypothesis. We present this same model to unify our understanding of both dominance and gene family size. We have recently shown (with Aoife McLysaght at Trinity Dublin) that the idea that dull genes are more prone to duplicate also explains why duplicates seem to evolve faster – they were dull and were always evolving faster.

4. How best to teach evolution?

A new venture for us, in collaboration with colleagues who specialise in researching pedagogy, is the Genetics and Evolution Teaching Project (Gevo), which aims to provide tests of easy-to-implement teaching practices aimed at improving the teaching of evolution.

Our first analysis is a large controlled trial of the role of teaching order in understanding and acceptance of evolution. We hypothesised that if students know the fundamental concepts of genetics then this might help them understand evolution better. To evaluate this we performed a large trial in which pupils in UK secondary schools were taught either genetics and then evolution or evolution and then genetics. We found that the students taught genetics first had a 5-10% improvement in their understanding of evolution, above that shown in the group taught evolution first. The change was seen in both higher and foundation ability classes. Indeed, in the foundation classes the genetics-first approach was the only approach that enabled an increase in evolution understanding. Teaching genetics first comes at no cost to genetics understanding (and may even improve it). These results suggest a simple, minimally disruptive, zero-cost intervention to improve evolution understanding: teach genetics first [57].

Current projects aim to discern whether various suggested interventions at primary level are all equally efficient at imparting evolution knowledge. To enable this we start by finding methods to fairly assess understanding for this age group.

A project just about to start will look to see if irregular school transitions affect science subjects – where progressive accumulation is key – more than arts-based subjects and whether all forms of irregular transition are equally disruptive.

Our research is focused on:

1. understanding fundamental issues concerning the evolution of genes and genomes and ...

2. applying this understanding to improved diagnostics and improved healthcare.

3. determining and understanding key molecular evolutionary parameters: mutation, recombination, dispensability, dominance.

4. understanding how best to teach evolution.

References

1.Hurst LD, Pal C. Evidence for purifying selection acting on silent sites in BRCA1. Trends Genet. 2001;17:62-5.

2.Chamary JV, Hurst LD. Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: Evidence for selectively driven codon usage. Mol Biol Evol. 2004;21:1014-23.

3.Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 2005;6:R75.

4.Chamary J-V, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet. 2006;7:98-108.

5.Parmley JL, Chamary JV, Hurst LD. Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers. Mol Biol Evol. 2006;23:301-9.

6.Parmley JL, Hurst LD. How do synonymous mutations affect fitness? Bioessays. 2007;29:515-9.

7.Hurst LD. Preliminary assessment of the impact of microRNA-mediated regulation on coding sequence evolution in mammals. J Mol Evol. 2006;63:174-82.

8.Warnecke T, Batada NN, Hurst LD. The impact of the nucleosome code on protein-coding sequence evolution in yeast. PLoS Genet. 2008;4:e1000250.

9.Chamary JV, Hurst LD. Biased codon usage near intron-exon junctions: selection on splicing enhancers, splice-site recognition or something else? Trends Genet. 2005;21:256-9.

10.Parmley JL, Hurst LD. Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals. Mol Biol Evol. 2007;24:1600-3.

11.Parmley JL, Hurst LD. How common are intragene windows with KA > KS owing to purifying selection on synonymous mutations? J Mol Evol. 2007;64:646-55.

12.Warnecke T, Hurst LD. Evidence for a trade-off between translational efficiency and splicing regulation in determining synonymous codon usage in Drosophila melanogaster. Mol Biol Evol. 2007;24:2755-62.

13.Warnecke T, Parmley JL, Hurst LD. Finding exonic islands in a sea of non-coding sequence: splicing related constraints on protein composition and evolution are common in intron-rich genomes. Genome Biol. 2008;9:R29.

14.Warnecke T, Weber CC, Hurst LD. Why there is more to protein evolution than protein function: splicing, nucleosomes and dual-coding sequence. Biochem Soc Trans. 2009;37:756-61.

15.Caceres EF, Hurst LD. The evolution, impact and properties of exonic splice enhancers. Genome Biol. 2013;14.

16.Wu XM, Tronholm A, Caceres EF, Tovar-Corona JM, Chen L, Urrutia AO, et al. Evidence for Deep Phylogenetic Conservation of Exonic Splice-Related Constraints: Splice-Related Skews at Exonic Ends in the Brown Alga Ectocarpus Are Common and Resemble Those Seen in Humans. Genome Biol Evol. 2013;5:1731-45.

17.Schüler A, Ghanbarian AT, Hurst LD. Purifying Selection on Splice-Related Motifs, Not Expression Level nor RNA Folding, Explains Nearly All Constraint on Human lincRNAs. Mol Biol Evol. 2014;31:3164-83.

18.Hurst LD, Smith NGC. Molecular evolutionary evidence that H19 mRNA is functional. Trends Genet. 1999;15:134-5.

19.Parmley JL, Urrutia AO, Potrzebowski L, Kaessmann H, Hurst LD. Splicing and the evolution of proteins in mammals. PLoS Biol. 2007;5:343-53.

20.Savisaar R, Hurst LD. Both maintenance and avoidance of RNA-binding protein interactions constrain coding region evolution. Mol Biol Evol. 2017;34:1110-26.

21.Wu X, Hurst LD. Determinants of the Usage of Splice-Associated cis-Motifs Predict the Distribution of Human Pathogenic SNPs. Mol Biol Evol. 2016;33:518-29.

22.Savisaar R, Hurst LD. Purifying Selection on Exonic Splice Enhancers in Intronless Genes. Mol Biol Evol. 2016;33:1396-418.

23.Thumann G, Harmening N, Prat-Souteyrand C, Marie C, Pastor M, Sebe A, et al. Engineering of PEDF-Expressing Primary Pigment Epithelial Cells by the SB Transposon System Delivered by pFAR4 Plasmids. Mol Ther Nucleic Acids. 2017;6:302-14.

24.Hurst LD, Pal C, Lercher MJ. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 2004;5:299-310.

25.Lercher MJ, Urrutia AO, Hurst LD. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nature Genet. 2002;31:180-3.

26.Lercher MJ, Urrutia AO, Pavlicek A, Hurst LD. A unification of mosaic structures in the human genome. Hum Mol Genet. 2003;12:2411-5.

27.Weber CC, Hurst LD. Support for multiple classes of local expression clusters in Drosophila melanogaster, but no evidence for gene order conservation. Genome Biol. 2011;12.

28.Pal C, Hurst LD. Evidence for co-evolution of gene order and recombination rate. Nature Genet. 2003;33:392-5.

29.Hurst LD, Williams EJ, Pal C. Natural selection promotes the conservation of linkage of co-expressed genes. Trends Genet. 2002;18:604-6.

30.Lercher MJ, Hurst LD. Co-expressed yeast genes cluster over a long range but are not regularly spaced. J Mol Biol. 2006;359:825-31.

31.Lercher MJ, Blumenthal T, Hurst LD. Coexpression of neighboring genes in Caenorhabditis elegans is mostly due to operons and duplicate genes. Genome Res. 2003;13:238-43.

32.Wang J, Xie G, Singh M, Ghanbarian AT, Rasko T, Szvetnik A, et al. Primate-specific endogenous retrovirus-driven transcription defines naive-like stem cells. Nature. 2014;516:405-9.

33.Batada NN, Hurst LD. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nature Genet. 2007;39:945-9.

34.Wang GZ, Lercher MJ, Hurst LD. Transcriptional Coupling of Neighboring Genes and Gene Expression Noise: Evidence that Gene Orientation and Noncoding Transcripts Are Modulators of Noise. Genome Biol Evol. 2011;3:320-31.

35.Batada NN, Urrutia AO, Hurst LD. Chromatin remodelling is a major source of coexpression of linked genes in yeast. Trends Genet. 2007;23:480-4.

36.Ghanbarian AT, Hurst LD. Neighboring Genes Show Correlated Evolution in Gene Expression. Mol Biol Evol. 2015;32:1748-66.

37.Warnecke T, Hurst LD. Error prevention and mitigation as forces in the evolution of genes and genomes. Nat Rev Genet. 2011;12:875-81.

38.Warnecke T, Huang Y, Przytycka TM, Hurst LD. Unique cost dynamics elucidate the role of frame-shifting errors in promoting translational robustness. Genome Biol Evol. 2010;2:636-45.

39.Warnecke T, Hurst LD. GroEL dependency affects codon usage-support for a critical role of misfolding in gene evolution. Mol Syst Biol. 2010;6:340.

40.Wu XM, Hurst LD. Why Selection Might Be Stronger When Populations Are Small: Intron Size and Density Predict within and between-Species Usage of Exonic Splice Associated cis-Motifs. Mol Biol Evol. 2015;32:1847-61.

41.Yang S, Wang L, Huang J, Zhang X, Yuan Y, Chen J-Q, et al. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature. 2015;523:463-7.

42.Liu HX, Jia YX, Sun XG, Tian DC, Hurst LD, Yang SH. Direct Determination of the Mutation Rate in the Bumblebee Reveals Evidence for Weak Recombination-Associated Mutation and an Approximate Rate Constancy in Insects. Mol Biol Evol. 2017;34:119-30.

43.Xie ZQ, Wang L, Wang LR, Wang ZQ, Lu ZH, Tian DC, et al. Mutation rate analysis via parent-progeny sequencing of the perennial peach. I. A low rate in woody perennials and a higher mutagenicity in hybrids. Proceedings of the Royal Society B-Biological Sciences. 2016;283.

44.Wang L, Zhang YC, Qin C, Tian DC, Yang SH, Hurst LD. Mutation rate analysis via parent-progeny sequencing of the perennial peach. II. No evidence for recombination-associated mutation. Proceedings of the Royal Society B-Biological Sciences. 2016;283.

45.Liu H, Zhang X, Huang J, Chen J-Q, Tian D, Hurst LD, et al. Causes and consequences of crossing-over evidenced via a high-resolution recombinational landscape of the honey bee. Genome Biol. 2015;16.

46.Lercher MJ, Williams EJB, Hurst LD. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: Implications for understanding the mechanistic basis of the male mutation bias. Mol Biol Evol. 2001;18:2032-9.

47.Lercher MJ, Hurst LD. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 2002;18:337-40.

48.Lercher MJ, Chamary J-V, Hurst LD. Genomic regionality in rates of evolution is not explained by clustering of genes of comparable expression profile. Genome Res. 2004;14:1002-13.

49.Weber CC, Pink CJ, Hurst LD. Late-Replicating Domains Have Higher Divergence and Diversity in Drosophila melanogaster. Mol Biol Evol. 2012;29:873-82.

50.Hurst LD, Smith NGC. Do essential genes evolve slowly? Curr Biol. 1999;9:747-50.

51.Pal C, Papp B, Hurst LD. Highly expressed genes in yeast evolve slowly. Genetics. 2001;158:927-31.

52.Papp B, Pal C, Hurst LD. Rate of evolution and gene dispensability. Nature. 2003;421:496-7.

53.Woods S, Coghlan A, Rivers D, Warnecke T, Jeffries SJ, Kwon T, et al. Duplication and Retention Biases of Essential and Non-Essential Genes Revealed by Systematic Knockdown Analyses. PLoS Genet. 2013;9.

54.Papp B, Pal C, Hurst LD. Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature. 2004;429:661-4.

55.Pal C, Papp B, Lercher MJ, Csermely P, Oliver SG, Hurst LD. Chance and necessity in the evolution of minimal metabolic networks. Nature. 2006;440:667-70.

56.Papp B, Pal C, Hurst LD. Dosage sensitivity and the evolution of gene families in yeast. Nature. 2003;424:194-7.

57.Mead R, Hejmadi M, Hurst LD. Teaching genetics prior to teaching evolution improves evolution understanding but not acceptance. PLoS Biol. 2017;15:e2002255.