The variational theory of evolution has a peculiar self-defeating property. If evolution occurs by the differential reproduction of different variants, we expect the variant with the highest rate of reproduction eventually to take over the population and all other genotypes to disappear. But then there is no longer any variation for further evolution. The possibility of continued evolution therefore is critically dependent on renewed variation.
For a given population, there are three sources of variation: mutation, recombination, and immigration of genes. However, recombination by itself does not produce variation unless alleles are already segregating at different loci; otherwise there is nothing to recombine. Similarly, immigration cannot provide variation if the entire species is homozygous for the same allele. Ultimately, the source of all variation must be mutation.
Variation from mutations
Mutations are the source of variation, but the process of mutation does not itself drive evolution. The rate of change in gene frequency from the mutation process is very low because spontaneous mutation rates are low (Table 24-9). The mutation rate is defined as the probability that a copy of an allele changes to some other allelic form in one generation. Suppose that a population were completely homozygous A and mutations to a occurred at the rate of 1/100,000. Then, in the next generation, the frequency of a alleles would be only 1.0 × 1/100,000 = 0.00001 and the frequency of A alleles would be 0.99999. After yet another generation of mutation, the frequency of a would be increased by 0.99999 × 1/100,000 = 0.0000099999 to a new frequency of 0.0000199999, whereas the original allele would be reduced in frequency to 0.9999800001. It is obvious that the rate of increase of the new allele is extremely slow and that it gets slower every generation because there are fewer copies of the old allele still left to mutate. A general formula for the change in allele frequency under mutation is given in Box 24-3.
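The worked example can be reproduced numerically. The sketch below iterates the recurrence implied by the text, in which a fraction μ of the remaining A copies mutates each generation, so p(t+1) = p(t)(1 − μ). It assumes one-way mutation only (no back mutation, selection, or drift), and the function name `mutate_forward` is illustrative.

```python
def mutate_forward(p0: float, mu: float, generations: int) -> list[float]:
    """Frequency of the original allele A under one-way mutation A -> a.

    Each generation a fraction mu of the remaining A copies mutates,
    so p_{t+1} = p_t * (1 - mu). No back mutation, selection, or drift.
    """
    freqs = [p0]
    for _ in range(generations):
        freqs.append(freqs[-1] * (1.0 - mu))
    return freqs


mu = 1e-5  # mutation rate A -> a from the worked example
p = mutate_forward(1.0, mu, 2)
for t, p_t in enumerate(p):
    print(f"generation {t}: freq(A) = {p_t:.10f}, freq(a) = {1 - p_t:.10f}")

# Closed form: p_t = p0 * (1 - mu)**t, so even after 1,000 generations
# the new allele a is still below 1 percent.
print(f"freq(a) after 1000 generations: {1 - (1 - mu) ** 1000:.5f}")
```

Because each step multiplies by the same factor, the closed form p(t) = p(0)(1 − μ)^t follows directly from the loop.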
Point-Mutation Rates in Different Organisms.
Effect of Mutation on Allele Frequency. Let μ be the mutation rate from allele A to some other allele a (the probability that a gene copy A will become a in the DNA replication preceding meiosis). If pt is the frequency of the A allele in generation t, then a fraction μ of those copies mutates, so $p_{t+1} = p_t(1 - \mu)$ and, after t generations, $p_t = p_0(1 - \mu)^t$.
Mutation rates are so low that mutation alone cannot account for the rapid evolution of populations and species.
If we look at the mutation process from the standpoint of the increase of a particular new allele rather than the decrease of the old form, the process is even slower. Most mutation rates that have been determined are the sum of all mutations of A to any mutant form with a detectable effect. Any specific base substitution is likely to be at least two orders of magnitude lower in frequency than the sum of all changes. So, precise reverse mutations (“back mutations”) to the original allele A are unlikely, although many mutations may produce alleles that are phenotypically similar to the original.
It is not possible to measure locus-specific mutation rates for continuously varying characters, but the rate of accumulation of genetic variance can be determined. In a completely homozygous line of Drosophila derived from a natural population, spontaneous mutation restores 1/1000 to 1/500 of the original population's genetic variance in bristle number each generation.
Variation from recombination
The creation of genetic variation by recombination can be a much faster process than its creation by mutation. When just two chromosomes with “normal” survival, taken from a natural population of Drosophila, are allowed to recombine for a single generation, they produce an array of chromosomes with 25 to 75 percent as much genetic variation in survival as was present in the entire natural population from which the parent chromosomes were sampled. This outcome is simply a consequence of the very large number of different recombinant chromosomes that can be produced even if we take into account only single crossovers. If a pair of homologous chromosomes is heterozygous at n loci, then a crossover can take place in any one of the n − 1 intervals between them, and, because each recombination produces two recombinant products, there are 2(n − 1) new unique gametic types from a single generation of crossing-over, even considering only single crossovers. If the heterozygous loci are well spread out on the chromosomes, these new gametic types will be frequent and considerable variation will be generated. Asexual organisms or organisms, such as bacteria, that very seldom undergo sexual recombination do not have this source of variation, so new mutations are the only way in which a change in gene combinations can be achieved. As a result, asexual organisms may evolve more slowly under natural selection than sexual organisms.
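The count of 2(n − 1) single-crossover products can be checked by enumeration. In this sketch, `single_crossover_recombinants` is a hypothetical helper; each homolog is written as a string with one character per heterozygous locus in chromosome order, and, as in the text, only single crossovers are considered.

```python
def single_crossover_recombinants(hap1: str, hap2: str) -> set[str]:
    """All gametes produced by exactly one crossover between two homologs.

    hap1 and hap2 list the alleles at n heterozygous loci in chromosome
    order. A crossover in interval i (between loci i and i + 1) yields the
    two reciprocal products hap1[:i+1] + hap2[i+1:] and hap2[:i+1] + hap1[i+1:].
    """
    if len(hap1) != len(hap2):
        raise ValueError("homologs must cover the same loci")
    n = len(hap1)
    recombinants = set()
    for i in range(n - 1):  # the n - 1 intervals between adjacent loci
        recombinants.add(hap1[: i + 1] + hap2[i + 1 :])
        recombinants.add(hap2[: i + 1] + hap1[i + 1 :])
    return recombinants


# A chromosome pair heterozygous at n = 5 loci (upper/lower case = alleles)
gametes = single_crossover_recombinants("ABCDE", "abcde")
print(sorted(gametes))
print(len(gametes), "new gametic types; 2(n - 1) =", 2 * (5 - 1))
```

Because every locus is heterozygous, each interval gives two distinct products that differ from both parental chromosomes, so the set size equals 2(n − 1).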
Variation from migration
A further source of variation is migration into a population from other populations with different gene frequencies. The resulting mixed population will have an allele frequency that is somewhere intermediate between its original value and the frequency in the donor population. Suppose that migrants make up, say, 10 percent of the newly formed mixed population. Then the mixed population will have an allele frequency that is a 0.90:0.10 mixture of its original allele frequency and the allele frequency of the donor population. If its original allele frequency of A were, say, 0.70, whereas the donor population had an allele frequency of only, say, 0.40, the new mixed population would have a frequency of 0.70 × 0.90 + 0.40 × 0.10 = 0.67. Box 24-4 derives the general result: the change in gene frequency is proportional to the difference in frequency between the recipient population and the average of the donor populations. Unlike the mutation rate, the migration rate (m) can be large, so the change in frequency may be substantial.
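A minimal sketch of the one-generation mixing rule, assuming (as in the worked example) that m is the fraction of the newly formed mixed population made up of migrants; `migrate` is an illustrative name, not a function from any library.

```python
def migrate(p: float, donor_p: float, m: float) -> float:
    """Allele frequency after one generation of migration.

    m is the fraction of the mixed population made up of migrants,
    and donor_p (P) is the allele frequency among them:
    p' = (1 - m) * p + m * P, i.e. a change of m * (P - p).
    """
    return (1.0 - m) * p + m * donor_p


# Worked example from the text: p = 0.70, P = 0.40, m = 0.10
p_new = migrate(0.70, 0.40, 0.10)
print(f"{p_new:.2f}")  # 0.67

# Repeated migration from the same donor closes the gap geometrically:
# p_t - P = (1 - m)**t * (p_0 - P)
p = 0.70
for _ in range(20):
    p = migrate(p, 0.40, 0.10)
print(f"after 20 generations: {p:.3f}")
```

Because each generation removes a fraction m of the remaining difference, repeated migration from a single donor drives the recipient frequency toward P.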
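Effect of Migration on Allele Frequency. If pt is the frequency of an allele in the recipient population in generation t and P is the allelic frequency in a donor population (or the average over several donor populations) and if m is the proportion of the recipient population made up of new migrants in each generation, then $p_{t+1} = (1 - m)p_t + mP$, so the change in one generation is $\Delta p = m(P - p_t)$.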
We must understand migration as meaning any form of the introduction of genes from one population into another. So, for example, genes from Europeans have “migrated” into the population of African origin in North America steadily since the Africans were introduced as slaves. We can estimate the amount of this migration by finding an allele that occurs in Europeans but not in Africans and measuring its frequency among blacks in North America.
We can use the formula for the change in gene frequency from migration if we modify it slightly to account for the fact that several generations of admixture have taken place. If the rate of admixture has not been too great, then, to a close approximation, the sum of the single-generation migration rates over several generations (let’s call this M) will be related to the total change in the recipient population after these several generations by the same expression as the one used for changes due to migration. If, as before, P is the allelic frequency in the donor population, p0 is the original frequency among the recipients, and pt is the frequency among the recipients after t generations of admixture, then

$$M = \frac{p_t - p_0}{P - p_0}$$

For example, the Duffy blood group allele Fyᵃ is absent in Africa but has a frequency of 0.42 in whites from the state of Georgia. Among blacks from Georgia, the Fyᵃ frequency is 0.046. Therefore, the total migration of genes from whites into the black population since the introduction of slaves in the eighteenth century is

$$M = \frac{0.046 - 0}{0.42 - 0} \approx 0.11$$
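The same estimate as a small function, with `cumulative_admixture` a hypothetical name; like the text, it assumes per-generation admixture rates small enough that the single-generation expression can be applied to the total change.

```python
def cumulative_admixture(p_t: float, p0: float, donor_p: float) -> float:
    """Cumulative migration M from the total change in allele frequency:
    M = (p_t - p0) / (P - p0). Approximate; assumes small per-generation
    admixture rates."""
    if donor_p == p0:
        raise ValueError("donor and original frequencies must differ")
    return (p_t - p0) / (donor_p - p0)


# Duffy Fy^a: absent in Africa (p0 = 0), 0.42 in Georgia whites (P),
# 0.046 in Georgia blacks (p_t)
M = cumulative_admixture(p_t=0.046, p0=0.0, donor_p=0.42)
print(f"M = {M:.2f}")  # M = 0.11
```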
When the same analysis is carried out on American blacks from Oakland (California) and Detroit, M is 0.22 and 0.26, respectively, showing either greater admixture rates in these cities than in Georgia or differential movement into these cities by American blacks who have more European ancestry. In any case, the genetic variation at the Fy locus has been increased by this admixture.
Inbreeding and assortative mating
Random mating with respect to a locus is common, but it is not universal. Two kinds of deviation from random mating must be distinguished. First, individuals may mate with each other nonrandomly because of their degree of common ancestry; that is, their degree of genetic relationship. If mating between relatives occurs more commonly than would occur by pure chance, then the population is inbreeding. If mating between relatives is less common than would occur by chance, then the population is said to be undergoing enforced outbreeding, or negative inbreeding.
Second, individuals may tend to choose each other as mates, not because of their degree of genetic relationship but because of their degree of resemblance to each other at some locus. Bias toward mating of like with like is called positive assortative mating. Mating with unlike partners is called negative assortative mating. Assortative mating is never complete.
Inbreeding and assortative mating are not the same. Close relatives resemble each other more than unrelated individuals on the average but not necessarily for any particular trait in particular individuals. So inbreeding can result in the mating of quite dissimilar individuals. On the other hand, individuals who resemble each other for some trait may do so because they are relatives, but unrelated individuals also may have specific resemblances. Brothers and sisters do not all have the same eye color, and blue-eyed people are not all related to one another.
Assortative mating for some traits is common. In humans, there is a positive assortative mating bias for skin color and height, for example. An important difference between assortative mating and inbreeding is that the former is specific to a trait, whereas the latter applies to the entire genome. Individuals may mate assortatively with respect to height but at random with respect to blood group. Cousins, on the other hand, resemble each other genetically on the average to the same degree at all loci.
For both positive assortative mating and inbreeding, the consequence for population structure is the same: there is an increase in homozygosity above the level predicted by the Hardy-Weinberg equilibrium. If two individuals are related, they have at least one common ancestor. Thus, there is some chance that an allele carried by one of them and an allele carried by the other are both descended from the identical DNA molecule. The result is that there is an extra chance of homozygosity by descent, to be added to the chance of homozygosity (p² + q²) that arises from the random mating of unrelated individuals. The probability of homozygosity by descent is called the inbreeding coefficient (F). Figure 24-6 and Box 24-5 illustrate the calculation of the probability of homozygosity by descent. Individuals I and II are full sibs because they share both parents. We label each allele in the parents uniquely to keep track of them. Individuals I and II mate to produce individual III. If individual I is A1/A3 and the gamete that it contributes to III contains the allele A1, then we would like to calculate the probability that the gamete produced by II is also A1. The chance is 1/2 that II will receive A1 from its father, and, if it does, the chance is 1/2 that II will pass A1 on to the gamete in question. Thus, the probability that III will receive an A1 from II is 1/2 × 1/2 = 1/4. The same argument applies whichever allele I passes on, so 1/4 is the chance that III, the product of a full-sib mating, will be homozygous by descent.
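The 1/4 probability can be checked by simulating the pedigree just described. This Monte Carlo sketch assumes Mendelian transmission with no selection; the allele labels A1 to A4 follow the text, and the function name is illustrative.

```python
import random


def full_sib_ibd_probability(trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the inbreeding coefficient F of offspring III
    from a full-sib mating (I x II).

    Each grandparental allele has a unique label (father A1/A2,
    mother A3/A4). Sibs I and II each inherit one random allele from
    each parent; III inherits one random allele from each sib and is
    homozygous by descent when both of its alleles carry the same label.
    """
    rng = random.Random(seed)
    father, mother = ("A1", "A2"), ("A3", "A4")
    identical = 0
    for _ in range(trials):
        sib_I = (rng.choice(father), rng.choice(mother))
        sib_II = (rng.choice(father), rng.choice(mother))
        if rng.choice(sib_I) == rng.choice(sib_II):
            identical += 1
    return identical / trials


print(full_sib_ibd_probability(100_000))  # close to 1/4
```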
Calculation of homozygosity by descent for an offspring (III) of a brother–sister (I–II) mating. The probability that II will receive A1 from its father is 1/2; if it does, the probability that II will pass A1 on to the gamete producing III is also 1/2.
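Effect of the Mating of Close Relatives on Homozygosity. The probability of a homozygous a/a offspring from a brother–sister mating is the probability that one of the two grandparents is a heterozygote (2pq for each grandparent) multiplied by the probability that a is passed to I, to II, from I to III, and from II to III:

$$P(a/a) \approx 2 \times 2pq \times \left(\tfrac{1}{2}\right)^4 = \frac{pq}{4}$$

We assume that the chance that both grandparents are A/a is negligible. If p is very small, then q is nearly 1.0 and P(a/a) ≈ p/4, compared with p² under random mating.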
Such close inbreeding can have deleterious consequences. Let’s consider a rare deleterious allele a that, when homozygous, causes a metabolic disorder. If the frequency of the allele in the population is p, then the probability that a random couple will produce a homozygous offspring is only p² (from the Hardy-Weinberg equilibrium). Thus, if p is, say, 1/1000, the frequency of homozygotes will be 1 in 1,000,000. Now suppose that the couple are brother and sister. If one of their common parents is a heterozygote for the allele, they may both receive it and may both pass it on to their offspring. As the calculation in Box 24-5 shows, the rarer the allele, the greater the relative risk of a defective offspring from inbreeding. For more-distant relatives, the chance of homozygosity by descent is less but still substantial. For first cousins, for example, the relative risk is 1/16p compared with random mating.
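The effect of inbreeding on the risk of a recessive disorder can be sketched with the standard genotype-frequency expression P(a/a) = p²(1 − F) + pF, in which a fraction F of offspring carry two copies identical by descent. This is a general formula rather than the step-by-step pedigree calculation of Box 24-5; F = 1/4 for offspring of full sibs and F = 1/16 for first cousins are the standard values, and the function name is illustrative.

```python
def prob_homozygous_recessive(p: float, F: float) -> float:
    """P(a/a) for an offspring with inbreeding coefficient F, where p is the
    frequency of allele a. A fraction F of offspring carry two copies
    identical by descent (a/a with probability p); the rest are
    Hardy-Weinberg (a/a with probability p**2)."""
    return (1.0 - F) * p**2 + F * p


p = 1 / 1000
random_mating = prob_homozygous_recessive(p, 0.0)  # p**2 = 1e-6
for label, F in [("full sibs", 1 / 4), ("first cousins", 1 / 16)]:
    risk = prob_homozygous_recessive(p, F)
    print(f"{label}: P(a/a) = {risk:.2e}, relative risk = {risk / random_mating:.0f}")
```

For small p the relative risk is close to F/p, which gives the 1/16p quoted above for first cousins.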
Systematic inbreeding between close relatives eventually leads to complete homozygosity of the population but at different rates, depending on the degree of relationship. Which allele is fixed within a line is a matter of chance. If, in the original population from which the inbred lines are taken, allele A has frequency p and allele a has frequency q = 1 − p, then a proportion p of the homozygous lines established by inbreeding will be homozygous A/A and a proportion q of the lines will be a/a. Inbreeding takes the genetic variation present within the original population and converts it into variation between homozygous inbred lines sampled from the population (Figure 24-7).
Repeated generations of self-fertilization (or inbreeding) will eventually split a heterozygous population into a series of completely homozygous lines. The frequency of A/A lines among the homozygous lines will be equal to the frequency of allele A in the original population.
Suppose that a population is founded by some small number of individuals who mate at random to produce the next generation. Assume that no further immigration into the population ever occurs again. (For example, the rabbits now in Australia probably descended from a single introduction of a few animals in the nineteenth century.) In later generations, then, everyone is related to everyone else, because their family trees have common ancestors here and there in their pedigrees. Such a population is then inbred, in the sense that there is some probability of a gene’s being homozygous by descent. Because the population is, of necessity, finite in size, some of the originally introduced family lines will become extinct in every generation, just as family names disappear in a closed human population because, by chance, no male offspring are left. As original family lines disappear, the population comes to be made up of descendants of fewer and fewer of the original founder individuals, and all the members of the population become more and more likely to carry the same alleles by descent. In other words, the inbreeding coefficient F increases, and the heterozygosity decreases over time until finally F reaches 1.00 and heterozygosity reaches 0.
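The claim that a proportion p of the inbred lines end up A/A can be checked by simulating repeated self-fertilization. The sketch assumes that each line is founded by one individual drawn from a Hardy-Weinberg population and that selfing continues until the line is homozygous; function names are illustrative.

```python
import random


def self_until_homozygous(genotype: tuple[str, str], rng: random.Random) -> tuple[str, str]:
    """Self-fertilize repeatedly; each offspring draws its two alleles
    independently from its single parent. Stops once homozygous."""
    while genotype[0] != genotype[1]:
        genotype = (rng.choice(genotype), rng.choice(genotype))
    return genotype


def fraction_AA_lines(p: float, lines: int, seed: int = 7) -> float:
    """Found each line from a Hardy-Weinberg individual with freq(A) = p
    and return the fraction of lines that end up A/A."""
    rng = random.Random(seed)
    n_AA = 0
    for _ in range(lines):
        founder = tuple("A" if rng.random() < p else "a" for _ in range(2))
        if self_until_homozygous(founder, rng) == ("A", "A"):
            n_AA += 1
    return n_AA / lines


print(fraction_AA_lines(0.7, 20_000))  # close to p = 0.7
```

A heterozygous founder fixes A or a with equal probability, so the expected fraction of A/A lines is p² + 2pq × 1/2 = p.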
The rate of loss of heterozygosity per generation in such a closed, finite, randomly breeding population is inversely proportional to the total number (2N) of haploid genomes, where N is the number of diploid individuals in the population. In each generation, 1/2N of the remaining heterozygosity is lost, so

$$H_t = H_0\left(1 - \frac{1}{2N}\right)^t$$

where Ht and H0 are the proportions of heterozygotes in the tth and original generations, respectively. As the number t of generations becomes very large, Ht approaches zero.
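The decay formula can be compared with a direct simulation of drift. The sketch assumes a Wright-Fisher model (2N gene copies drawn with replacement each generation; no mutation, migration, or selection) and tracks the Hardy-Weinberg heterozygosity 2p(1 − p), whose expectation decays by exactly the factor (1 − 1/2N) per generation under this model. N = 25 and t = 40 are arbitrary illustrative values, and the function names are hypothetical.

```python
import random


def expected_heterozygosity(h0: float, n_diploid: int, t: int) -> float:
    """H_t = H_0 * (1 - 1/(2N))**t for a closed random-mating population."""
    return h0 * (1.0 - 1.0 / (2 * n_diploid)) ** t


def simulate_mean_heterozygosity(p0: float, n_diploid: int, t: int,
                                 replicates: int, seed: int = 3) -> float:
    """Wright-Fisher drift: each generation draws 2N gene copies with
    replacement from the previous generation. Returns the mean over
    replicates of 2p(1 - p) after t generations."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    total = 0.0
    for _rep in range(replicates):
        p = p0
        for _gen in range(t):
            count = sum(rng.random() < p for _ in range(two_n))
            p = count / two_n
        total += 2 * p * (1 - p)
    return total / replicates


N, t = 25, 40
print(expected_heterozygosity(0.5, N, t))             # about 0.223
print(simulate_mean_heterozygosity(0.5, N, t, 400))   # should be close
```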
Balance between inbreeding and new variation
Any population of any species is finite in size, so all populations should eventually become homozygous and differentiated from one another as a result of inbreeding. In nature, however, new variation is always being introduced into populations by mutation and by some migration between localities. Thus, the actual variation available for natural selection is a balance between the introduction of new variation and its loss through local inbreeding. The rate of loss of heterozygosity in a closed population is 1/2N, so any effective differentiation between populations will be negated if new variation is introduced at this rate or higher.
Mutations are one of the fundamental forces of evolution because they fuel the variability in populations and thus enable evolutionary change. Based on their effects on fitness, mutations can be divided into three broad categories: the ‘good’ or advantageous that increase fitness, the ‘bad’ or deleterious that decrease it and the ‘indifferent’ or neutral that are not affected by selection because their effects are too small. While this simplistic view serves well as a first rule of thumb for understanding the fate of mutations, research in recent decades has uncovered a complex web of interactions. For example, (i) the effects of mutations often depend on the presence or absence of other mutations, (ii) their effects can also depend on the environment, (iii) the fate of mutations may depend on the size and structure of the population, which can severely limit the ability of selection to discriminate among the three types (making all seem nearly ‘indifferent’), and (iv) mutations' fate can also depend on the fate of others that have more pronounced effects and are in close proximity on the same chromosome.
A major theoretical goal in the study of the population genetics of mutations is to understand how mutations change populations in the long term. To this end, we have to consider many features of evolution and extant populations at both the phenotypic and molecular level, and ask how these can be explained in terms of rates and kinds of mutations and how they are affected by the forces that influence their fates.
We have increasing amounts of information at our disposal to help us answer these questions. The continuous improvement of DNA sequencing technology is providing more detailed genotypes on more species and observations of more phenomena at the genomic level. We are also gaining more understanding of the processes that lead from changes at the level of genotypes through various intermediate molecular changes in individuals to new visible phenotypes. Use of this new knowledge presents both opportunities and challenges to our understanding, and new methods have been developed to address them.
Brian Charlesworth has been at the forefront of many of the developments in the population genetics of mutations, both in the collection and analysis of new data and in providing new models to explain the observations he and others have made. This themed issue of Phil. Trans. R. Soc. B is dedicated to him to mark his 65th birthday. The authors of the accompanying papers have individually made important contributions to the field and have been directly associated with or indirectly influenced by his work.
In this collection of papers, various aspects are considered in detail, and in this introduction, we aim to provide an overview as a basis for the in-depth treatments that follow. We outline some of the theories that serve as the quantitative basis for more applied questions and have been developed with the main aims of: (i) measuring the rates at which different types of mutations occur in nature, (ii) predicting quantitatively their subsequent fate in populations, and (iii) assessing how they affect some properties of populations and therefore could be used for inference. The subsequent papers are broadly arranged in a continuum from specific questions of basic parameter estimation (strength of mutation, selection, recombination), via those that contribute a combination of biological theories and data on these parameters, to those which mostly address broader biological theories.
There is an enormous range of mutational effects on fitness, and wide differences exist in the strength of other evolutionary forces that operate on populations. This generates an array of complex phenomena that continues to challenge our capacity to mechanistically understand evolution. To make problems tractable, theoreticians have divided the parameter space into smaller regions such that specific simplifying assumptions can be made. These typically comprise assuming the absence of particular events (e.g. no recombination) or the presence of particular equilibria (e.g. mutation-selection balance). Subsequently, new theories are often developed in which these assumptions are relaxed so as to narrow the gap to reality, typically including more interactions between various evolutionary forces, albeit at the cost of becoming less tractable to analysis.
The dynamics of mutations are dominated by chance, yet we search for general principles that are independent of particular random events. This tension is reflected in the models used. All mutations start out as single copies and most are lost again by chance, so we can at best predict probabilities of particular fates; but the stochastic models that can deal rigorously with randomness are often too complex to analyse for realistic scenarios. If we are interested only in the mean outcome of many individual random events, we may approximate the process by deterministic models that predict a precise outcome; but these approximations can break down if only few individuals or rare events are involved.
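The contrast between stochastic and deterministic descriptions is clearest for a single new neutral mutation. A deterministic model would leave it at its initial frequency 1/(2N) indefinitely, whereas the Wright-Fisher sketch below shows that it is usually lost and fixes in only about a fraction 1/(2N) of runs. The population size, replicate count, and function name are arbitrary illustrative choices.

```python
import random


def new_mutation_fixes(two_n: int, rng: random.Random) -> bool:
    """Follow one new neutral mutation, starting as a single copy, under
    Wright-Fisher drift until it is lost (count 0) or fixed (count 2N)."""
    count = 1
    while 0 < count < two_n:
        p = count / two_n
        count = sum(rng.random() < p for _ in range(two_n))
    return count == two_n


rng = random.Random(11)
two_n = 20  # 2N gene copies (N = 10 diploids), illustrative value
runs = 10_000
fixed = sum(new_mutation_fixes(two_n, rng) for _ in range(runs))
print(f"fixed in {fixed / runs:.3f} of runs; 1/(2N) = {1 / two_n:.3f}")
```

Averaged over many runs, the mean frequency still matches the deterministic prediction, but almost every individual run ends in loss.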
To facilitate concise descriptions, there is a long history in population genetics of using mathematical symbols as abbreviations for various parameters and observations, but unfortunately there is no unique nomenclature. To try to meet our two conflicting goals of conciseness and readability, we list some important evolutionary parameters and their common abbreviations in table 1. Even so, for good reasons of history or local convention, some of these symbols are defined differently in some papers in this collection.
Naturally, a review of this length cannot cover all aspects of the population genetics of mutations. For example, mutation plays a pivotal part in coalescent theory (Hein et al. 2005) and in the construction of genotype–phenotype maps that are at the core of some efforts to understand adaptive landscapes, which provide a paradigm for understanding many broader aspects of population genetics from the perspective of individual mutations (‘causes cancer or not’), as reviewed elsewhere (Loewe 2009). Here we focus almost entirely on how populations of individuals are changed by large numbers of mutations that have specified effects on fitness.
In §2 of this paper, we discuss what is known about the diversity of mutations, and here and subsequently we refer to other papers in this themed issue that provide more in-depth information. In §3, we review some of the relevant theory in population genetics, starting with (i) simple theories that treat the fate of individual mutations in isolation before turning to more complicated models that consider (ii) linkage, (iii) epistasis, (iv) quantitative genetics approaches, and (v) challenges faced when attempting to integrate all these. Subsequently, we provide an overview of several general questions that have been resolved and others that remain (§4) and finally some conclusions (§5).
Some parameters in the population genetics of mutations*.
It has often been said that mutations are random, a statement that is simultaneously true and false: true because mutations do not originate in any way or at any time that is related to whether their effects are beneficial—one of the central tenets of Neodarwinism; and false because mutations are the result of complex biochemical reactions that result in non-uniformly distributed mutation frequencies, favouring some (random) changes over others.
(a) Types and rates of mutations
Mutations are caused by physical changes to the hereditary material and, because DNA is a long sequence of base pairs organized into physically unlinked chromosomes, there are many possible ways it can change. There are (i) point mutations that change only a single letter and lead to so-called ‘single nucleotide polymorphisms’ in populations, (ii) insertions and deletions of various sizes (also called ‘indels’ if it is difficult to decide which of the two actually happened; these can also lead to ‘copy number variants’), (iii) transpositions that move a sequence from one position to another, and can thereby cause mutations at the boundaries, (iv) inversions of various sizes that change the orientation of a stretch of DNA, (v) chromosome mutations that affect long enough pieces of DNA to become visible under the microscope and might even lead to the loss or duplication of a whole chromosome (which typically arises through non-disjunction), and (vi) changes in the ploidy level, where a whole copy of the genome is either gained or lost.
A special class of mutations is caused by transposable elements. As reviewed in this themed issue by Lee & Langley (2010), there are various types of these elements that can move around in the genome and can copy, insert or excise themselves, sometimes in response to conditions such as stress. Mechanisms exist to control the frequency of transposition events to limit the damage from resulting deleterious mutations.
At each level, biochemical factors are such that some types of changes occur more frequently than others. For example, in many species there are many more transitions than transversions, the methylation of CpG sites in mammals leads to about tenfold higher mutation rates at these sites (Rosenberg et al. 2003) and the ratio of insertions to deletions can differ among species (Gregory 2004; Grover et al. 2008).
Mutation rates are difficult to measure because the events are so rare that it is like measuring the frequency of needles in haystacks. Historically, this has been accomplished mainly by finding single genes or groups of genes that lead to phenotypic changes that can easily be observed in populations with known descent (Drake et al. 1998) and extrapolating to the level of genomes. As Kondrashov & Kondrashov (2010) point out in their contribution to this issue, the recent advances in post-genomic sequencing technology have led to breakthroughs that now allow direct determination of mutation rates in species with sequenced genomes, work which Charlesworth has stimulated by his developments of theory and to which he has contributed directly (Haag-Liautard et al. 2007). Future work in this area is important because accurate estimates of mutation rates at different sites and in different species can be important for testing alternative theories.
Mutations are frequently classified as non-synonymous or synonymous according to whether or not they change the amino acid sequence, which depends entirely on the function of the mutated base pair. They are easy to recognize and so are frequently used in population genetic tests. They provide useful rules of thumb: e.g. synonymous sites often evolve neutrally or under weak selection and non-synonymous sites are often under strong purifying selection, even if its strength is difficult to quantify.
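Whether a point mutation is synonymous can be decided by translating the codon before and after the change. This sketch assumes the standard genetic code (NCBI translation table 1) and treats a change between two stop codons as synonymous; the helper names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """One-letter amino acid (or '*') encoded by a DNA codon."""
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """'synonymous' or 'non-synonymous' for a single-base change at
    position 0, 1 or 2 of a codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    if translate_codon(mutant) == translate_codon(codon):
        return "synonymous"
    return "non-synonymous"


print(classify_point_mutation("CTT", 2, "C"))  # CTT -> CTC: Leu -> Leu
print(classify_point_mutation("GAA", 0, "A"))  # GAA -> AAA: Glu -> Lys
```

Most third-position changes are synonymous under this code, which is one reason synonymous sites are often treated as nearly neutral.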
(b) The distribution of mutational effects on fitness (DME)
What is the distribution of mutational effects on fitness? Charlesworth (1996b) once described this as the first and most difficult question he would ask the fairy godmother of evolutionary genetics. It specifies the probability distribution of selection coefficients for spontaneous mutations of a given genome. Thus, it could be argued that the DME is highly dependent on the genotype; but all organisms are complex functional networks and have ‘learned’ to live with a flux of new mutations, such that powerful normalizing forces might cause the DMEs we can observe today to share important properties. For example, the total of fitness degrading and fitness increasing effects that get fixed might be in equilibrium such that there is no unbounded change in fitness in most populations.
In their paper in this issue, Keightley & Eyre-Walker (2010) show that one of the most robust findings from research on DMEs is that the effects span many orders of magnitude. It is well known that some deleterious mutations are lethal while others appear to be effectively neutral in all population genetic tests, implying that the heterozygous selection coefficient s of mutants ranges from −1 (lethal) to values closer to zero than −10⁻⁷ (effectively neutral for some Drosophila species). It is hard to see why any particular range of intermediate selection coefficients should not exist, which raises problems for many population genetic theories tailored to mutations of a particular effect. Different types of mutations typically have different DMEs. For example, conservative amino acid changes usually have much smaller effects than frame-shift mutations that disfigure the rest of a protein (Sunyaev et al. 2001).
In diploids, the selection coefficients of heterozygous mutations are modulated by their dominance h. Dominance is typically strongly negatively correlated with the magnitude of s, so that more deleterious mutations tend to be more recessive; this suggests that the properties of underlying biochemical reaction networks are usually pivotal for determining dominance (Charlesworth 1979; Kacser & Burns 1981; Phadnis & Fry 2005).
Of course not all mutations are harmful, and the occasional fitness increasing mutations drive adaptive evolution. In this issue Orr (2010) points out that some intriguing statements can be made about advantageous mutations beyond the fact that they are usually rare and difficult to observe. Such mutations include (i) back mutations that occur if a large enough number of slightly deleterious mutations were previously fixed, possibly at a time when the effective population size was smaller (Charlesworth & Eyre-Walker 2007), (ii) compensatory mutations that at least partially repair some harmful effects at the molecular level (e.g. Burch & Chao 1999; Innan & Stephan 2001; Kern & Kondrashov 2004), (iii) quantitative trait mutations that can either increase or decrease the value of a trait with an impact on fitness (e.g. Keightley & Halligan 2009), (iv) resistance mutations that are part of biological arms races between hosts and parasites (Hamilton et al. 1990), and (v) mutations that enable a species to start expanding into a new ecological niche (e.g. Elena & Lenski 2003; Bergthorsson et al. 2007). The frequencies and DMEs of these groups are probably very different and their prediction and estimation are likely to be fruitful fields for further research.
Our knowledge of DMEs has come from laboratory experiments like that of Trindade et al. (2010) and from population genetics approaches as used by Keightley & Eyre-Walker (2010), both of which are reported in this issue. Experimental approaches for inferring DMEs are based on mutation accumulation experiments pioneered by Mukai in Drosophila (Mukai et al. 1972; Keightley & Eyre-Walker 1999; Lynch et al. 1999), and one of the most extensive experiments of this type was completed by Charlesworth et al. (2004). Their strength is in the direct observation of the consequences of mutations that have relatively large effects of around 1 per cent, but they require great care to control for potential confounding factors and mutations of small effects cannot be detected. Thus, many researchers have recently used population genetics models and DNA sequence data to infer DMEs (Loewe & Charlesworth 2006; Keightley & Eyre-Walker 2010). This is done by assuming an analytic DME (e.g. lognormal or gamma), using a population genetics model to predict from the DME observable DNA sequence patterns, and adjusting the assumed DME until it predicts the data. Such DME estimates are most reliable for slightly deleterious mutational effects (|s| slightly above 1/Ne, where Ne is the effective population size). Their limitations have inspired a third approach for estimating DMEs, which builds on what are intended as realistic computational systems biology models to infer the effects of mutations in relatively well understood systems such as a circadian clock or a signal transduction pathway (Loewe 2009). While this approach cannot lead to general statements except by testing many different systems, it may provide more precise statements on the DME of advantageous mutations.
3. Theories on the population genetics of mutations
Important questions to be addressed include (i) predicting the fate of individual mutations, such as their fixation probability Pfix and times to loss Tloss or fixation Tfix in a population, (ii) how a given flux of mutations will affect properties of a population such as nucleotide diversity (πA, πS), divergence (KA, KS), survival or the rate of evolution of quantitative traits, (iii) how the fates of different mutations will affect each other, (iv) how quantitative genetic variation is maintained, and (v) the estimation of evolutionary parameters of populations and species from DNA sequence patterns (e.g. recombination rate, Ne, etc.).
Theories to investigate some of these questions can be categorized by the complexity of the models assumed and by their general approach: those restricted to single sites, in which all mutations are treated as completely independent of each other; those invoking linkage, in which changes in the frequency of mutations are no longer independent, even if their effects are independent; and those invoking epistasis in which the effects of mutations depend on which others are present.
In each case, the overall effects of mutations can be studied in two ways. The analysis may focus on individuals and their mutations explicitly, track their fate and later summarize the behaviour of many mutations in a population to compute quantities like DNA sequence diversity. Alternatively it may focus directly on quantitative traits in which mutations are not individually identified but considered more implicitly as components of the total effect on either individual or population mean phenotype.
(a) Single site theories
One of the successes of twentieth century population genetics was the development of single site analyses that describe the fate of a mutation at one site independent of all others in the genome (Kimura 1962; Kimura & Ohta 1969a,b; Crow & Kimura 1970). This site may be affected by mutation and back mutation, directional selection and balancing selection, migration and drift, but is assumed to be unaffected by linkage or changes at other loci, including those due to selection. Once the behaviour of a single site is understood, the behaviour of many such sites can be predicted, again assuming they all evolve independently. These models have been used to investigate topics such as codon bias (Bulmer 1991; McVean & Charlesworth 1999), reviewed in this themed issue by Sharp et al. (2010), who consider particularly the factors determining this bias. This approach has also helped estimate the strength of selection on amino acid changes (Loewe et al. 2006).
In the simplest models, few evolutionary forces are assumed to act in the population. For example mutation-selection balance models exclude drift (e.g. Crow & Kimura 1970) and the Neutral Theory excludes selection (Kimura 1983), as in much of the recent work on coalescent theory (Hein et al. 2005). Charlesworth and colleagues expanded Kimura's single site theory of mutation-selection-drift equilibrium to accommodate the presence of back mutations (McVean & Charlesworth 1999). This is an important step towards more realism because the infinite sites assumption of earlier work fails over very long periods of time as back mutations become increasingly probable. It is elegant because the model requires only one additional parameter. Substitution rates (KA/KS) and DNA sequence diversity (πA/πS) can then be predicted under the assumption of a given DME (Loewe et al. 2006) by computing the fixation probabilities Pfix, the resulting fluxes of mutations that take Tfix generations to get fixed or Tloss generations to get lost, and taking genome wide weighted averages over all these classes of mutations.
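The averaging procedure just described can be sketched as follows, assuming Kimura's (1962) diffusion formula for the fixation probability of a single new semidominant mutant, Ne = N, equal mutation rates at nonsynonymous and synonymous sites, neutral synonymous sites and a purely deleterious gamma DME. Under these assumptions KA/KS is the DME-weighted mean of Pfix relative to the neutral value 1/(2N). Function names are hypothetical, and back mutations, Tfix and Tloss are not modelled.

```python
import math
import random

def pfix(s, N):
    """Kimura's (1962) fixation probability of a single new semidominant mutant
    (heterozygote fitness 1 + s, initial frequency p = 1/(2N)), taking Ne = N."""
    p = 1.0 / (2 * N)
    if s == 0:
        return p                                    # neutral limit
    a = 4 * N * s
    if a > 0:
        # (1 - e^{-a p}) / (1 - e^{-a}); expm1 avoids cancellation for small a
        return math.expm1(-a * p) / math.expm1(-a)
    b = -a
    # deleterious case: (e^{b p} - 1) / (e^{b} - 1), rearranged so large b cannot overflow
    return math.exp(b * (p - 1)) * math.expm1(-b * p) / math.expm1(-b)

def predicted_ka_ks(mean_s, shape, N, n_draws=50_000, seed=0):
    """Expected KA/KS if every nonsynonymous mutation reduces fitness by s ~ Gamma(shape,
    mean mean_s) and synonymous mutations are neutral: E[pfix(-s)] / pfix(0)."""
    rng = random.Random(seed)
    scale = mean_s / shape
    total = sum(pfix(-rng.gammavariate(shape, scale), N) for _ in range(n_draws))
    return (total / n_draws) / pfix(0.0, N)
```

Adjusting `mean_s` and `shape` until `predicted_ka_ks` matches an observed ratio is a crude version of the fitting step described earlier for DME inference; real analyses fit more aspects of the data and model the population history.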
(b) Linkage theories
More complex models are needed to understand the response to selection in systems where recombination is limited between neighbouring sites.
The long-term response and limits to the selective increase of quantitative traits influenced by many loci depend on the proportion of trait-increasing alleles that are fixed by selection. Robertson's (1960) theory of limits, in which loci were assumed to be independent, was extended by including the impact of linkage between pairs of advantageous trait genes (Hill & Robertson 1966). The analysis showed that, especially if one of the loci was at low frequency and had a large effect, it interfered with selection at the other such that the latter's fixation probability was greatly reduced. The authors did not consider new mutants, but obviously should have done so, because the effects are then most extreme. This observation stimulated the development of what might be called linkage theories, as they describe the consequences of linkage between different sites on the same chromosome (also called the ‘Hill–Robertson effect’ by Felsenstein (1974), a term that stuck; see McVean & Charlesworth (2000), Keightley & Otto (2006), Roze & Barton (2006), Coméron et al. (2008) and Kaiser & Charlesworth (2009)).
The phenomena that result from linkage are complex because a great variety of processes are at work, described by evolutionary parameters that span many orders of magnitude. Thus we cannot point to a single ‘linkage theory’ for a general solution, but only to a wide range of models (table 2). Each describes the consequences of linkage and selection for a particular range of recombination rates and selection coefficients, but we hope that future research will enable these to be combined as special cases of a more comprehensive theory. A common feature is the reduction in Ne as a result of linkage and selection, which can lead to competing explanations for similar phenomena, as Stephan (2010) and Barton (2010) show in this issue. The resulting Ne can be used in some of the single site approaches above to predict observables such as the DNA sequence diversity in a population or codon bias.
The magnitudes of selection coefficients and recombination rates play pivotal roles in specifying which linkage theory can be applied. As we discuss below, some of these theories can be ordered on a continuum according to the selection coefficients of the most strongly selected mutations that they consider. More work on combining them is needed to deal with the extreme diversity of mutational effects and recombination rates that has been found. In his paper in this issue, McVean (2010) describes how recent experimental and population-wide DNA sequence analyses show that previous, rather uniform estimates of recombination rates can stem from averaging over very broad scales, and that the majority of recombination events, at least in humans, can occur in hotspots.
(i) Background selection
Most mutations for which selection is effective are strongly deleterious (e.g. Keightley & Eyre-Walker 2010; Trindade et al. 2010). Therefore, we start with the theory of background selection (BGS), developed primarily by Charlesworth et al. (1993a, 1995, 1996a) and reviewed here by Stephan (2010). It is based on the reduction in Ne at a neutral site that results from the deterministic selective removal of strongly deleterious mutations at linked sites. Kimura's single site mutation-selection-drift analysis shows that deleterious mutations with effects beyond a certain size have no realistic chance of ever getting fixed, but the process of their removal generates BGS. For removal to be deterministic, the selection coefficient s needs to be only slightly larger than 1/Ne in recombining populations (Nordborg et al. 1996). In the absence of recombination, the situation is more complicated, because it also depends on the number of deleterious mutations that simultaneously hit the non-recombining region. The critical s can be up to several orders of magnitude larger for some realistic cases; more precise statements require determining s by finding the corresponding location of the ‘wall of BGS’ using Muller's ratchet (MR) theory (Loewe 2006).
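The reduction in Ne caused by BGS is often summarised by a factor B = Ne/N at a neutral site. The sketch below assumes the approximation B = exp(-Σ u t / (t + r(1 - t))²) over independent linked selected sites, where u is the deleterious mutation rate at a site, t = sh is its heterozygous selection coefficient and r is its recombination fraction with the neutral site. We take this form from the BGS literature (cf. Nordborg et al. 1996); treat it as an assumption here, and note that the function name `bgs_B` is hypothetical.

```python
import math

def bgs_B(sites):
    """Background-selection factor B = Ne/N at a neutral site.

    `sites` is an iterable of (u, t, r) tuples for linked selected sites:
    u = deleterious mutation rate per generation, t = s*h (heterozygous effect),
    r = recombination fraction with the neutral site.
    For r = 0 each term reduces to u/t, giving B = exp(-sum(u/t)).
    """
    return math.exp(-sum(u * t / (t + r * (1 - t)) ** 2 for u, t, r in sites))
```

For a non-recombining region with total rate u = 0.1 to mutations with t = 0.02, `bgs_B([(0.1, 0.02, 0.0)])` returns e^-5 ≈ 0.0067, and raising r for the same sites quickly weakens the effect. The approximation assumes deterministic removal of the selected mutations, which is exactly the assumption that fails beyond the ‘wall of BGS’.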
(ii) Muller's ratchet
If the mutational effects are only slightly deleterious or there are too many mutations that simultaneously segregate in a population, deterministic purifying selection can be overwhelmed by random effects, leading to a ‘slow regime’ of mutation accumulation, where fixations of slightly deleterious mutations occasionally happen by bad luck. If such events are repeated over long periods of time, they can have substantial consequences for fitness and may degrade the corresponding genetic system. Examples are seen in Y-chromosomes (Charlesworth 1978; Charlesworth & Charlesworth 2000), or in the predicted genomic decay of a population (Lynch et al. 1995b; Loewe 2006). Much research on this scenario has concentrated on cases where no recombination occurs, as described by the classical theory of MR, on which there is an extensive literature (Loewe 2006). Classically, MR describes the inevitable accumulation of slightly deleterious mutations that results from solely deleterious mutation pressure in the absence of any recombination that could repair some of the damage by recombining mediocre genotypes into good and bad ones. The unexpected stochastic loss of individuals with the fewest mutations causes mutation accumulation (Muller 1964; Felsenstein 1974), and diffusion theory has been used to estimate the relevant rates (Stephan et al. 1993; Gordo & Charlesworth 2000a,b; Etheridge et al. 2009). We consider the essence of MR to be the deleterious mutation accumulation caused by the random loss from the population of the least-loaded class of individuals. Hence, it would be possible to explore models of MR with rare recombination in which deleterious mutations still accumulate, but the decay of fitness is effectively reduced by processes that occasionally produce a less loaded class (e.g. recombination, back mutations or advantageous mutations). Indeed, Charlesworth et al. (1993b) explored related scenarios with low levels of recombination under the label ‘mutation accumulation’, and Charlesworth and colleagues have also used the classical theory of MR without recombination to quantify processes that can degenerate Y-chromosomes (Charlesworth 1978; Charlesworth & Charlesworth 2000; Gordo & Charlesworth 2001).
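The classical ratchet can be illustrated with a small individual-based simulation. This is a minimal sketch assuming an asexual haploid Wright–Fisher population of constant size, Poisson-distributed new mutations with mean U per genome per generation, multiplicative fitness (1 - s)^k for k mutations, and no recombination, back mutation or advantageous mutation. It records the load of the least-loaded class each generation; every increase is one ‘click’. Parameter values and function names are illustrative assumptions.

```python
import math
import random

def poisson(rng, lam):
    """Poisson variate via Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def muller_ratchet(N=200, U=0.5, s=0.02, generations=500, seed=1):
    """Return the minimum number of deleterious mutations in the population each generation.

    Asexual haploids, multiplicative fitness (1 - s)**k, no recombination and
    no back mutation, so a lost least-loaded class can never be rebuilt.
    """
    rng = random.Random(seed)
    loads = [0] * N                                   # mutations carried by each individual
    history = []
    for _ in range(generations):
        best = min(loads)
        # fitness relative to the best class keeps the weights away from underflow
        weights = [(1 - s) ** (k - best) for k in loads]
        parents = rng.choices(loads, weights=weights, k=N)   # selection plus drift
        loads = [k + poisson(rng, U) for k in parents]       # new deleterious mutations
        history.append(min(loads))
    return history
```

Because offspring can only add mutations to those of their parents, the minimum load never decreases in this sketch; adding back mutations or recombination would allow the least-loaded class to be regenerated, which is the extension of MR discussed above.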
If selection coefficients are further reduced with all else being equal, mutations will accumulate faster, eventually switching from the slow regime to the ‘fast regime’, where mutation accumulation is the norm rather than the rare exception. This represents a qualitative change from the slow regime above in which the removal of all deleterious mutations is expected and mutation accumulation occurs as a result of rare random failures. Not so in the fast regime, where values of the evolutionary parameters (e.g. U, s, re, see table 1) are such that the expected critical number of optimal genotypes falls below one and any move of the population towards equilibrium will also bring the best existing genotype closer to extinction or push it over the edge. While the resulting mutation accumulation is still a random process, it is now deterministically unavoidable and hence it has also been called ‘quasi-deterministic’ (Gordo & Charlesworth 2001). This regime was first described explicitly in the context of the classical theory of MR (Gessler 1995) and subsequently in settings that also allow for advantageous mutations (Desai & Fisher 2007; Rouzine et al. 2008). It is also relevant for the weak-selection linkage theory developed under the label ‘weak-selection Hill-Robertson effect’ (McVean & Charlesworth 2000), which is a special case of an extended theory of MR that describes the accumulation of slightly deleterious mutations under a wide range of conditions that include the potential for recombination and beneficial mutations. Special cases of this MR theory have already been developed under various labels and are important for understanding patterns of DNA diversity and divergence in regions of low recombination (McVean & Charlesworth 2000; Coméron et al. 2008; Kaiser & Charlesworth 2009) and the evolution of sex (Keightley & Otto 2006) as also discussed by Barton (2010).
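A rough way to place parameter values on the slow/fast continuum uses the classical MR result that, at mutation–selection balance without recombination and with multiplicative fitness, mutation numbers follow a Poisson distribution with mean U/s, so the expected number of mutation-free individuals is n0 = N exp(-U/s). The sketch below labels the regime ‘fast’ when n0 < 1. This threshold is a simplification that ignores recombination (re) and back or advantageous mutations, and the function names are hypothetical.

```python
import math

def optimal_class_size(N, U, s):
    """Expected number of mutation-free individuals, n0 = N * exp(-U / s),
    for an asexual haploid population with multiplicative fitness."""
    return N * math.exp(-U / s)

def ratchet_regime(N, U, s):
    """Crude label: 'fast' if the best class is not expected to contain even one individual."""
    return "fast" if optimal_class_size(N, U, s) < 1 else "slow"
```

For example, N = 10^6, U = 0.1 and s = 0.01 give n0 = 10^6 e^-10 ≈ 45 (slow regime), whereas N = 10^4, U = 0.1 and s = 0.005 give n0 ≈ 2 × 10^-5 (fast regime).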
(iii) Effective neutrality
If the strength of purifying selection is even weaker, Ohta's nearly neutral theory (NN) (Ohta 1992) and, in the extreme, Kimura's neutral theory become increasingly relevant. The upper limit to effective neutrality is at about |Nes| ≈ ¼, above which the dynamics differ substantially (Kimura 1983). If mutations with smaller effects are linked, they will still ‘interfere’ with each other's fixation by genetic drift, as fixation only happens in bundles, but they no longer interfere with each other's selection, which is effectively absent. As it is unlikely that mutations occur where s = 0 exactly, a large class of NN mutations must have very slightly increasing effects on fitness: once a significant fraction of sites are fixed by genetic drift for mildly deleterious mutations, an appreciable rate of advantageous back mutations occurs. Eventually an equilibrium κ is expected that characterizes the mutational flux between these types of sites and that is governed by mutational biases (McVean & Charlesworth 1999). It is hard to see how these mutational fluxes could be so skewed that the ratio of possible deleterious to advantageous mutations becomes as large as in the rest of the genome, where the fixation of advantageous mutations provides ample opportunities for deleterious mutations to occur. A consequence of the ‘blind’ mutational fluxes under NN is that many deleterious mutations can be fixed if Ne decreases for any reason (Kondrashov 1995). Correspondingly, the rate of adaptive substitutions increases after a population expansion and thus strongly depends on the history of Ne (Charlesworth & Eyre-Walker 2007).
Thus, historic bottlenecks might increase the apparent rate of adaptive substitutions: if a reduction of Ne increases the fraction of sites occupied by mutations that otherwise would have been effectively deleterious, then as Ne increases again, back mutations can lead to subsequent waves of fixation of advantageous mutations, potentially affecting the interpretation of estimates of the fraction of adaptive substitutions in an evolutionary line. Such reasoning is based on assuming that the DME is not bimodal and there are enough sites for each relevant mutational effect.
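The dependence of deleterious fixations on Ne can be sketched with a two-allele flux-balance model. We assume low mutation rates (so each site is usually fixed for one allele), a fitter and a less fit allele differing by s, forward and back mutation rates whose ratio is a free parameter, and the approximation Pfix(-s)/Pfix(+s) ≈ exp(-4Ne s) that follows from Kimura's formula for large N. This is our illustrative model, not one taken from the cited studies, it describes equilibria only, and the function name is hypothetical.

```python
import math

def fraction_deleterious_fixed(Ne, s, mu_forward=1.0, mu_back=1.0):
    """Equilibrium fraction of sites fixed for the mildly deleterious allele.

    Balancing substitution fluxes between the two fixed states gives
    P(bad) / P(good) = (mu_forward / mu_back) * exp(-4 * Ne * s).
    mu_forward: mutation rate good -> bad; mu_back: rate bad -> good
    (only their ratio matters). Assumes s >= 0.
    """
    odds_bad = (mu_forward / mu_back) * math.exp(-4 * Ne * s)
    return odds_bad / (1 + odds_bad)
```

With s = 10^-4 and equal mutation rates, the fraction of sites fixed for the deleterious allele is about 0.018 at Ne = 10^4 but about 0.40 at Ne = 10^3. After a bottleneck the population therefore moves towards the higher value; when Ne recovers, the excess deleterious fixations become targets for advantageous back mutations, as argued above.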
(iv) Fixing beneficial mutations
An increase in advantageous selection coefficients leads to two important effects. First, fixation probability increases (Pfix ≈ 2s for a single site), although an excessive supply of advantageous mutations with little recombination can lead to diminishing returns as mutations compete with each other instead of with the inferior wild-type (de Visser et al. 1999). Second, if such mutations occur regularly, equilibrium DNA diversity is affected because advantageous mutations have an increased chance of fixation so more of them stay as polymorphisms in the population on their way to fixation, even if the majority is still lost eventually. As mutations with large s are presumably rare, many properties of the population will be affected only temporarily and no stable equilibrium occurs. The dynamics of these processes merit further study and indeed important insights have recently been achieved (Stephan et al. 2006). Various specific models have been developed to describe how advantageous mutations interact with each other and with other mutations. Models of clonal interference (CI) have been used to understand the evolution of bacteria as shown by Sniegowski & Gerrish (2010) in this themed issue. CI occurs when various asexual clones with different advantageous mutations compete against each other for fixation, for example in cells with different successful adaptations of metabolism to a new carbon source. To incorporate low levels of recombination, models of interference selection (IS) have been developed to explain how weak but frequent adaptive evolution might shape the features of genomes that recombine rarely, using Drosophila as an example (Coméron & Kreitman 2002; Coméron et al. 2008). Such frequent advantageous mutations might come from codon-bias adaptation, and their fixation could be eased if recombination rates increase through the presence or lengthening of introns. 
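The approximation Pfix ≈ 2s can also be reached through the classical branching-process argument, sketched below under the assumption that, while rare, each copy of a new beneficial mutant leaves a Poisson-distributed number of offspring with mean 1 + s. The survival probability x then solves 1 - x = exp(-(1 + s)x). The function name and tolerances are our choices.

```python
import math

def survival_probability(s, tol=1e-14, max_iter=1_000_000):
    """Survival probability of a single beneficial mutant (s > 0) whose lineage is a
    Galton-Watson branching process with Poisson(1 + s) offspring.

    Iterating x -> 1 - exp(-(1 + s) * x) from x = 1 decreases monotonically
    to the non-zero root of 1 - x = exp(-(1 + s) * x).
    """
    x = 1.0
    for _ in range(max_iter):
        x_new = -math.expm1(-(1.0 + s) * x)   # 1 - exp(-(1 + s) x), computed accurately
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For s = 0.01 the root is about 0.0197, within a few per cent of 2s; the approximation degrades as s grows. Competition among beneficial mutations, as in clonal interference, is not captured by this single-lineage calculation.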
Related models also help in understanding the origin of sex (Roze & Barton 2006), as also discussed by Barton (2010). Finally, models of hitchhiking (HH) deal with strong selective sweeps that occur once or repeatedly, such as could be generated by adaptations to parasites or changing environments (Hamilton et al. 1990). These usually drag to fixation a multitude of linked mutations with weak or no effects (Maynard Smith & Haigh 1974). HH causes a reduction of Ne, particularly in regions of low recombination rates, but as Stephan (2010) discusses further, it is difficult to distinguish this from the reduction of Ne due to BGS, despite the vast difference in the underlying selection. Much future work will be required to properly quantify the various contributions of the theories above to patterns of genetic diversity.
(c) Epistatic interactions between mutational effects
In the models discussed above, all mutational effects are assumed to be mutually independent, whereas a wide range of interactions between them, or epistasis, has frequently been reported. Such findings have stimulated the desire to build more realistic models of evolution which incorporate epistasis (Wolf et al. 2000; Phillips 2008). There are several types of epistasis and the literature is littered with non-intuitive adjectives intended to describe them (table 3). Ultimately, they are defined by the values of related fitness effects, so precise definitions have to be checked for each study. Table 3 provides definitions using a multiplicative model to define the absence of epistasis (i.e. WAB = WA·WB/W), which is equivalent to an additive model on the log scale (log WAB = log WA + log WB − log W); an additive model on the original scale (WAB = WA + WB − W) could have been used instead. The definitions apply to haploids, and further complexities can be introduced when considering diploid genotypes, multi-locus models or longer sequences of mutations. When thinking about epistasis, we can consider either consequences for fitness values (e.g. ‘negative’/‘positive’ epistasis) or the relative size of mutational effects (e.g. ‘synergistic’/‘antagonistic’ epistasis), but unfortunately the mapping between these two sets of terms is not the same for beneficial and deleterious mutations (table 3). Next we consider some categories of epistatic interactions and some of their potential implications.
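The two vocabularies can be made explicit in a few lines. The sketch below measures epistasis as the log-scale deviation from the multiplicative expectation, e = ln(WAB/W) − ln(WA/W) − ln(WB/W), and maps its sign to ‘synergistic’ or ‘antagonistic’ for pairs of mutations whose single effects have the same sign. This follows a widely used convention but is our assumption; it is not a reproduction of table 3, and the function names are hypothetical.

```python
import math

def epistasis(W, WA, WB, WAB):
    """Log-scale deviation from the multiplicative expectation WAB = WA*WB/W.

    e = ln(WAB/W) - ln(WA/W) - ln(WB/W); e == 0 means no epistasis under this model.
    """
    return math.log(WAB / W) - math.log(WA / W) - math.log(WB / W)

def classify(W, WA, WB, WAB, tol=1e-12):
    """Label epistasis by sign and by its effect on mutational effect sizes.

    Only defined here for pairs whose single effects have the same sign
    (both deleterious or both beneficial).
    """
    if (WA < W) != (WB < W):
        raise ValueError("A and B have effects of opposite sign")
    e = epistasis(W, WA, WB, WAB)
    if abs(e) < tol:
        return "none"
    sign = "positive" if e > 0 else "negative"
    both_deleterious = WA < W
    # Negative epistasis amplifies deleterious effects but dampens beneficial ones.
    synergistic = (e < 0) == both_deleterious
    return f"{sign}/{'synergistic' if synergistic else 'antagonistic'}"
```

For two deleterious mutations, negative epistasis means the double mutant is worse than expected, so the effects reinforce each other (synergistic); for two beneficial mutations the same negative sign means diminishing returns (antagonistic). This asymmetry is the mismatch between the two sets of terms noted above.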
Synergistic epistasis: the absolute effect of the second mutation is bigger than that of the first. At the extreme, a long series of such mutations would eventually increase the effects of slightly deleterious mutations on fitness to a degree that may stop them accumulating (Kondrashov 1994). However, synergistic epistasis cannot stop MR if some of the mutations have very small effects (Butcher 1995). Synergistic epistasis could also play a role in the evolution of sex, if genomic mutation rates are high enough (U > 1; Kondrashov 1988; Haag-Liautard et al. 2007; Barton 2010).
Antagonistic epistasis: the absolute effect of the second mutation is smaller than that of the first. In this case, at the extreme, a long series of such mutations in a population with sufficiently high fitness would eventually lead to an effectively neutral rate of mutation accumulation with no further measurable fitness changes. In more realistic settings and on a more immediate timescale, antagonistic epistasis may counter the overall increases in the strength of selection that are caused by synergistic epistasis.
Epistasis that limits the paths of evolution: if mutation A is deleterious and mutation B is beneficial, but the combination AB is even more beneficial, then the evolutionary path towards AB can only proceed in the sequence B → A (and not A → B, unless random drift overrides selection). This has been termed ‘sign epistasis’ and was found in the context of the protein evolution that leads to some antibiotic resistance (Weinreich et al. 2005, 2006; Poelwijk et al. 2007). By comparing mutations fixed in homologous proteins in different species, Kondrashov et al. (2002) estimated that sign epistasis affected about 10 per cent of amino acid substitutions.
Epistasis that generates selective valleys: this occurs when two deleterious mutations A and B are beneficial if they appear together, either because B compensates for A (e.g. by restoring base-pairing in RNA structures (Parsch et al. 1997; Innan & Stephan 2001)) or because A and B traverse a fitness valley that leads to another local optimum with a different fitness. An assumption of Wright's Shifting Balance Theory of evolution is that such epistatic interactions are common; but evidence for this theory is lacking (Coyne et al. 1997, 2000), even though compensatory evolution may play an important role in molecular evolution (Kimura 1985). Further work is required on the frequency and depth of selective valleys.
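The path constraints in these last two categories can be checked mechanically: list the two orders in which A and B can be acquired and keep those in which every step raises fitness, i.e. the orders open to selection without help from drift. A minimal sketch, assuming haploid genotypes with one fitness value each; the function name is hypothetical.

```python
def accessible_paths(W, WA, WB, WAB):
    """Return the mutational orders from wild type to AB in which every step
    increases fitness, i.e. the paths selection alone can follow."""
    paths = {"A -> B": (W, WA, WAB), "B -> A": (W, WB, WAB)}
    return [name for name, (w0, w1, w2) in paths.items() if w0 < w1 < w2]
```

Sign epistasis (WA < W < WB < WAB) leaves only B → A open, whereas a selective valley (both single mutants below W but WAB above it) leaves no path open to selection alone.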
The complexity and interdependence of biological systems are such that interactions among gene products are an essential requirement for life. Yet most population genetic analysis, including that relating to mutations, has proceeded using either multiplicative or additive non-epistatic models, which differ little unless effects are large or an additive model leads to negative values (by ‘fixing’ too many deleterious mutations). These models are attractive because they are simple to use, do not require extensive lists of (usually unknown) parameters, and offer the potential opportunity to obtain general results that do not depend on specific parameters.
The question is therefore the extent to which theories based on the simplification of non-epistatic models are relevant in nature. In this issue Crow (2010) concludes that many models of evolution do not depend on the magnitude of epistasis. Although there are potentially many epistatic terms describing many loci, these are likely to contribute very little if the functional relationship between genotype and phenotype is continuous. They are thus hard to measure except for genes of large effect. Molecular biologists may be seeing so many epistatic interactions in part due to ascertainment bias: to elucidate molecular pathways, genes may be knocked out, thereby highlighting potential interactions within the system investigated.
Analyses in various contexts have shown that multiplicative/additive models provide adequate descriptions of variance components and other properties for many complex traits, including complex heritable diseases (Keightley & Kacser 1987; Risch 1990; Hill et al. 2008; Slatkin 2008). For example, many current evolutionary rates depend on additive genetic rather than epistatic components, and Kimura's analysis of pseudo-linkage equilibrium shows that even in the presence of epistasis, rates of evolution stabilize to those described by additive variances (Kimura 1965; Nagylaki 1993; Crow 2010).
Epistasis has been included in population genetic models mainly by using simple approaches, typically as a mathematical function without knowledge of the underlying complexity. For example, synergistic epistasis has been defined by a quadratic function of numbers of deleterious alleles (Charlesworth 1990; Dolgin & Charlesworth 2006). There has been a long debate in population genetics about whether synergistic or antagonistic epistasis predominates, and hence about the shape of such functions. Unfortunately, it has been difficult to determine experimentally which dominates, because measuring enough mutational interactions with sufficient accuracy is so much work and because epistatic effects are highly variable, with about as many synergistic as antagonistic effects found. This wide distribution of epistatic effects (Elena & Lenski 1997; Segre et al. 2005; Sanjuan & Elena 2006) can lead to results qualitatively different from those of the many models that include only one type of epistasis or none: for example, a model of slightly deleterious mutation accumulation could no longer rely on synergistic epistasis to stop new mutations getting fixed. More work is needed to explore the consequences of realistic systems of epistatic interactions for long-term evolution.
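One common way to write such a quadratic function is w(n) = exp(−αn − βn²/2) for an individual carrying n deleterious mutations; we assume this parameterisation here. The sketch computes the proportional fitness loss caused by each additional mutation, which rises with n when β > 0 (synergistic), falls when β < 0 (antagonistic) and is constant when β = 0 (multiplicative). Function names are hypothetical.

```python
import math

def log_fitness(n, alpha, beta):
    """ln w(n) = -alpha*n - beta*n**2/2 for n deleterious mutations."""
    return -alpha * n - beta * n * n / 2

def marginal_effect(n, alpha, beta):
    """Proportional fitness loss caused by the n-th mutation: 1 - w(n)/w(n-1)."""
    return 1 - math.exp(log_fitness(n, alpha, beta) - log_fitness(n - 1, alpha, beta))
```

Fixing a single sign of β for all mutations is precisely the simplification that the wide spread of measured epistatic effects calls into question.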
Where next? It may be possible to explore epistatic interactions using computational systems biology models, as described in a recent framework for evolutionary systems biology, that could help construct genotype–phenotype maps (Loewe 2009). Cheap automatable simulations on a massive scale for a wide range of parameters then become possible, once a system has been properly modelled. Nevertheless, such results will apply only to the particular model being investigated; and the high levels of complexity might render intractable any such models that have too many interactions and too many unknown parameters with potentially large effects.
Even if multiplicative/additive models are not refuted by the data, work on epistasis is important for understanding the population genetics of mutations, particularly to address questions about long-term evolution. Each species presumably lives on or near some local adaptive peak; epistatic interactions determine how many ways there are to evolve between peaks. An increased understanding of the fitness consequences of prolonged mutation accumulation may help to predict risks to endangered species. Increases in mutation rates that may be caused by human use of technology might dangerously upset potentially finely balanced natural long-term equilibria between fitness-increasing and fitness-decreasing processes that are influenced by many evolutionary factors, including epistasis (Loewe 2006).
(d) Analysis at the level of quantitative traits
Parameters for quantitative traits such as means, variances and correlations among relatives can be estimated from phenotypic data, but provide little information about the underlying genotypes or the distribution of their effects on traits and on fitness. The same parameters can also estimate the impacts of mutations, but with similar limitations, although new techniques increasingly enable individual mutants to be identified, their effects on traits to be measured, and their mode of action to be determined. There is no all-embracing theory of mutation for quantitative traits; rather there are models that explain to some degree the observed phenomena.
A fundamental parameter in models of quantitative traits is the rate of increase in genetic variation due to mutation per generation. It can be estimated from mutation accumulation experiments starting from an inbred or isogenic base, in which the increase in variability among unselected lines or the response to artificial selection is measured. The result is typically scaled as the ‘mutational heritability’ VM/VE, the ratio of the increment VM in the genetic variance per generation to the environmental variance VE (table 1). Estimates of this quantity (summarized by e.g. Houle et al. 1996; Keightley & Halligan 2009) are typically in the range 0.0001–0.01, centred on VM/VE ≈ 0.001. They are similar for different traits and species, with some indication of an increase with generation time. For a typical trait with a coefficient of variation (CV) of 10 per cent, this corresponds to a mutational standard deviation √VM of about 0.3 per cent of the mean per generation (√0.001 × 10 per cent ≈ 0.3 per cent, taking VE to be close to the phenotypic variance). How the mutational variance is controlled and why it is rather homogeneous is not understood at any mechanistic level, but presumably reflects past conflicting evolutionary pressures.
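The arithmetic behind this figure can be written out as a short sketch, assuming VE is a fraction (1 − h²) of the phenotypic variance; with h² set to zero, VE is taken to equal the phenotypic variance. The function name is hypothetical.

```python
import math

def mutational_cv(vm_over_ve, phenotypic_cv, h2=0.0):
    """sqrt(VM) / mean, given the mutational heritability VM/VE and the phenotypic CV.

    Assumes VE = (1 - h2) * VP, so sqrt(VM) = sqrt(VM/VE * (1 - h2)) * sd_P;
    dividing by the mean gives sqrt(VM/VE * (1 - h2)) * CV_P.
    """
    return math.sqrt(vm_over_ve * (1 - h2)) * phenotypic_cv
```

`mutational_cv(0.001, 0.10)` gives √0.001 × 0.10 ≈ 0.0032, i.e. about 0.3 per cent of the mean; a non-zero h² makes the value slightly smaller.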
The distribution of effects of natural mutations on quantitative traits can be assessed from mutation accumulation studies, but information is limited because small effects are hard to detect. Inferences depend on the assumptions made about the distribution, which appears to be less leptokurtic than the exponential, implying that for many traits mutations of moderate effect are not rare, in contrast to the more leptokurtic distributions found for fitness in such experiments (Keightley & Halligan 2009). In selection experiments, mutants of large effect are frequently detected for quantitative traits, many with strongly deleterious effects on fitness (e.g. López & López-Fanjul 1993). Effects of mutations on the mean can be highly asymmetric, as exemplified by observed excesses of fitness-decreasing mutations.
More complete information can be obtained from insertional mutagenesis experiments which enable individual mutants to be identified and their effects on any number of traits to be estimated (Mackay et al. 1992; Mackay 2009). In this issue Mackay (2010) summarizes results and shows that the mutants have a wide range of effects, typically affect many traits at the same time (pleiotropy), often influence fitness directly (for example through reduced viability), and usually modulate the effects of other mutations (epistasis).
Mathematical models used in the analysis of the effects of mutations on quantitative traits are usually simplistic, in part so they are tractable and in part due to lack of detailed information. There is therefore some gap between the modelling and the real world but, as Crow (2010) illustrates, approximations such as an assumed lack of epistasis may not be critical. Linkage is also typically ignored, partly justified because segregating mutants relating to a trait are probably scattered across the genome.
A point of reference is an additive model where mutants are assumed to be neutral with respect to fitness. The variance within populations stabilizes at 2NeVM, and, assuming a random walk model, the rate of population divergence due to drift each generation between unselected lines approaches 2VM (Lynch & Hill 1986). Under directional selection, if mutant effects on the trait are infinitesimally small or have a symmetric distribution around zero, the rate of response of the trait is proportional to the genetic variance at equilibrium, 2NeVM, independent of the selection intensity (Hill & Keightley 1988).
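These neutral reference values can be reproduced with a deterministic recursion, assuming an ideal population of effective size Ne started from an isogenic base: each generation, drift removes a fraction 1/(2Ne) of the within-line additive variance V and mutation adds VM, while the variance among line means D grows by V/Ne. This is our simplified rendering of the neutral model, not the exact formulation of Lynch & Hill (1986), and the function name is hypothetical.

```python
def neutral_variance_trajectory(Ne, VM, generations):
    """Within-line additive variance V and among-line variance D of line means
    under neutral additive mutation, starting from an isogenic base (V = D = 0).

    Each generation: D grows by V/Ne (drift of line means), then
    V shrinks by a factor (1 - 1/(2*Ne)) and gains VM from new mutations.
    """
    V = D = 0.0
    history = []
    for _ in range(generations):
        D += V / Ne
        V = V * (1 - 1 / (2 * Ne)) + VM
        history.append((V, D))
    return history
```

With Ne = 50 and VM = 0.001, V approaches 2NeVM = 0.1 and, once it has done so, D increases by about 2VM = 0.002 per generation.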
The standing genetic variance observed within populations is not simply proportional to Ne, however, and must depend on the influence of selection. A major activity in theoretical research has therefore been to assess the roles of mutation, selection and other factors in explaining the high levels of variation maintained in quantitative traits in natural populations. One class of model is based on a balance between mutation and stabilizing selection, whereby fitness depends on the phenotype for the trait and mutants are at a selective disadvantage because they cause deflection from the optimum. If selection is assumed to act solely on the target trait, and it is determined by few loci, the predicted variance maintained is less than observed for a ‘typical’ strength of stabilizing selection (Turelli 1984). One inadequacy of this model is that it ignores stabilizing selection acting on other traits through pleiotropic effects; with pleiotropy, more loci are likely to influence the trait, but the aggregate selection is stronger.
In an alternative model, the mutant's effects on fitness are assumed to arise not at all through its effect on the target trait, but only through other pleiotropic effects. Under this purely pleiotropic model high levels of genetic variance can be maintained, but population means are unstable and there is little apparent stabilizing selection (Keightley & Hill 1990). While variants of these models still do not fully explain both the stabilizing selection and the genetic variances observed (Bürger 2000; Johnson & Barton 2005; Zhang & Hill 2005), comprehensive surveys indicate that the strength of stabilizing selection on any trait may be much weaker than has been assumed (Kingsolver et al. 2001). Thus there remains much uncertainty about the mechanisms whereby levels of genetic variation in quantitative traits are maintained, and about the extent to which forces other than a mutation-drift-selection balance are involved in maintaining heterozygosity.
Recent genome-wide association studies indicate that the variation maintained in quantitative traits is contributed by many segregating loci. For example, almost 50 loci have been shown to contribute to standing variation in height (Weedon & Frayling 2008). Yet together these contribute only about 5 per cent of the variance of this highly heritable trait. There must therefore be many more segregating loci, each contributing a very small amount of genetic variance. Similarly, multiple pleiotropic contributing factors, together explaining only part of the variation, were inferred for a disease trait, schizophrenia susceptibility (Purcell et al. 2009). Such analyses miss variants of very low frequency, however, even if they have large effects. It has been argued that rare deleterious variants contribute much of the variation in disease (Goldstein 2009). Incorporating such data into models for the maintenance of variation will itself be a challenge.
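The arithmetic behind ‘many more segregating loci’ can be sketched as follows. The heritability of height used here (0.8) is an assumption for illustration; the section says only that the trait is highly heritable. The 50 loci and 5 per cent come from the text.

```python
def equivalent_loci_needed(h2_total: float, h2_mapped: float, n_mapped: int) -> float:
    """Loci still needed to account for the unexplained heritability if each
    contributed as much as the average mapped locus. This is a lower bound,
    because the loci not yet mapped are likely to have smaller effects."""
    per_locus = h2_mapped / n_mapped
    return (h2_total - h2_mapped) / per_locus


H2_HEIGHT = 0.8    # assumed heritability of height, for illustration only
n_more = equivalent_loci_needed(H2_HEIGHT, h2_mapped=0.05, n_mapped=50)
print(f"at least ~{n_more:.0f} further loci of comparable effect")
```

Each mapped locus explains on average about 0.1 per cent of the variance, so several hundred further loci would be needed even if they were as large as those already found.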
Substantial changes in quantitative traits can be achieved solely from the input of new mutations in laboratory experiments started from isogenic base populations. Examples include Escherichia coli (Lenski & Travisano 1994), D. melanogaster (e.g. Caballero et al. 1991; Mackay et al. 1994) and mice (Keightley 1998). Because of the sampling of new mutations, however, the patterns of response can be quite erratic, particularly if some mutations have a large effect on the trait. Molecular analysis of long-term selection experiments started from an isogenic founder also provides a way to identify the mutations that contributed to the response. It can also assess their pleiotropic effects and the interactions among them (Barrick et al. 2009).
Over very long time periods, divergence among unselected lines has tended to increase more slowly than neutral predictions (Mackay et al. 1995). Similarly, in experiments spanning hundreds of generations or more, such an attenuation of response has been observed in selection lines started from an isogenic base. Examples include E. coli (Lenski & Travisano 1994; Elena & Lenski 2003) and Drosophila (Mackay et al. 2005). Such a plateau can occur for several reasons. The number of potential functional alleles may be limited, so that no new useful mutations arise. Back mutations may dominate. Fitness effects may be limiting, e.g. when segregating mutants have highly deleterious pleiotropic effects on fitness (López & López-Fanjul 1993). Continuing rapid responses in animal breeding programmes show that mutation plays an increasing role in long-continued selection. They also show that the influence of unfavourable pleiotropic effects on fitness traits can be minimized by direct selection on such traits (Hill 2010).
A topic of concern is how fast the fitness of small populations may deteriorate in the presence of recurrent mutation, and the consequent risk to their survival (Lynch et al. 1995a). Calculations depend critically on the effects and rates of mutation. Deleterious mutations increase risk, particularly those with intermediate selection coefficients s, whereas advantageous mutations reduce it. Risk also increases if population size is small, recombination rate is low and the population is fragmented (for additional factors see Loewe 2006). Most of these dependencies are nonlinear and interact with each other. Predicting the evolution of fitness in these complex systems is a major task. So is predicting how the survival of species might be affected by increases in mutation rates caused by humans.
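Why deleterious mutations of intermediate effect are the most dangerous can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutant. This sketch rests on several assumptions. Selection is genic, with fitnesses $1$, $1-s$ and $1-2s$, and $N_e = N$. The genomic deleterious mutation rate $U$ is hypothetical. Each fixation lowers fitness by about $2s$. The code computes the expected fitness loss per generation from fixations across a grid of $s$ values. Mutations with very small $s$ fix readily but cost little, while those with large $s$ almost never fix, so the loss peaks in between.

```python
import math


def log_expm1(x: float) -> float:
    """Numerically stable log(exp(x) - 1) for x > 0."""
    if x > 20.0:
        return x + math.log1p(-math.exp(-x))
    return math.log(math.expm1(x))


def fixation_prob_deleterious(s: float, n: int) -> float:
    """Kimura's approximation for a single new mutant (frequency 1/(2N)) under
    genic selection with fitnesses 1, 1 - s, 1 - 2s and Ne = N (s > 0):
        u = (exp(2s) - 1) / (exp(4Ns) - 1)
    """
    return math.exp(log_expm1(2.0 * s) - log_expm1(4.0 * n * s))


def fitness_loss_rate(s: float, n: int, u_genome: float) -> float:
    """Expected fitness loss per generation from fixations: 2N*U new mutations,
    each fixing with probability u(s) and costing roughly 2s once fixed."""
    return 2.0 * n * u_genome * fixation_prob_deleterious(s, n) * 2.0 * s


N, U = 100, 0.1                                     # hypothetical values
grid = [10 ** (-6 + 0.01 * k) for k in range(551)]  # s from 1e-6 to about 0.32
rates = [fitness_loss_rate(s, N, U) for s in grid]
s_star = grid[rates.index(max(rates))]
print(f"N = {N}: fixation load accrues fastest near s = {s_star:.4f}")
```

Under these approximations, the peak lies near $4Ns \approx 1.6$. Smaller populations therefore suffer most from mutations of larger effect. The peak rate of decline also scales roughly as $1/N$, in line with the greater risk to small populations.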
(e) The future challenge: integrating and testing theories
Many simple theories have been developed. A major challenge for future research is to integrate them so as to predict accurately the behaviour of systems that lie near the boundaries between the regimes that individual theories describe. Such systems can be biologically very interesting but poorly predicted, because their parameter combinations stretch the assumptions of models that are good only locally (e.g. the transition between regimes of MR (Loewe 2006)).
We need to know how all relevant processes interact to influence mutation accumulation and patterns of diversity. To do so, it is important to integrate the linkage theories that deal with different mutational effects (table 2), as these usually span many orders of magnitude in real organisms. Charlesworth has contributed to such integration by considering the effects of BGS in combination with MR (Gordo & Charlesworth 2001). He has also investigated a wide range of selection coefficients in simulation models with back mutation and recombination (McVean & Charlesworth 2000; Kaiser & Charlesworth 2009). In principle, general multi-locus models provide a way of integrating all the aspects of evolution discussed above (Kirkpatrick et al. 2002), but they require large numbers of parameters to be specified. This problem affects all complex models and motivates the development of simple models, but these may be of limited reliability when extrapolating beyond testable conditions. Massively parallel computer simulations could help in three ways. They can facilitate the forward simulation of complex models. They can highlight, through parameter sensitivity analyses, those parameters that are most worthwhile to determine experimentally. They can estimate parameters via Approximate Bayesian Computation, which avoids the need to know the likelihood function of complex models (Beaumont & Rannala 2004). Because estimating sensitive parameters precisely requires cumbersome experimental work, such modelling can benefit greatly from being based on a model organism. Brian Charlesworth has always maintained that Drosophila is ideal in this respect, and the large body of research on fruit flies over the last 100 years supports this view. Papers in this themed issue highlight some of the power of the Drosophila model, including those by Hughes, Lee & Langley, Mackay, McDermott & Noor, and Stephan.
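The idea behind Approximate Bayesian Computation can be sketched with simple rejection sampling. Draw a parameter from its prior and simulate data from it. Keep the draw if a summary statistic of the simulated data is close to that of the observed data. No likelihood is ever evaluated. The model here is a deliberately trivial, hypothetical one: Poisson numbers of mutations per mutation-accumulation line, with ‘observed’ data simulated from a known rate of 2.0. Real applications use far richer simulators and summary statistics.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_lines(rate: float, n_lines: int, rng: random.Random) -> list[int]:
    """Hypothetical model: each line carries Poisson(rate) new mutations."""
    return [poisson(rate, rng) for _ in range(n_lines)]


def abc_rejection(observed, prior_low, prior_high, n_draws, tol, rng):
    """Rejection ABC with a uniform prior and the sample mean as summary."""
    target = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(prior_low, prior_high)          # 1. draw from the prior
        sim = simulate_lines(rate, len(observed), rng)      # 2. simulate data
        if abs(statistics.fmean(sim) - target) <= tol:      # 3. compare summaries
            accepted.append(rate)                           #    keep close draws
    return accepted


rng = random.Random(2010)
observed = simulate_lines(2.0, n_lines=30, rng=rng)   # stand-in for real data
posterior = abc_rejection(observed, 0.0, 5.0, n_draws=10_000, tol=0.1, rng=rng)
print(f"{len(posterior)} draws accepted; posterior mean = {statistics.fmean(posterior):.2f}")
```

Shrinking `tol` makes the accepted draws a closer approximation to the true posterior, at the cost of accepting fewer of them. This trade-off is one reason massively parallel simulation helps.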
[Table: Types of epistasis, compared to a multiplicative model.]
4. General questions and applications
An insight into the population genetics of mutations can contribute to a deeper understanding of many practical and theoretical challenges that we face today. Here we highlight just some of them.
Several genomic features are shaped by deleterious, (nearly) neutral or advantageous mutations: DNA diversity (Stephan 2010), codon bias (Sharp et al. 2010) and the distribution of repetitive elements (Lee & Langley 2010). The extent of this influence is fundamental to our understanding of genomic structure. Developing this understanding beyond models of individual loci often requires knowing the extent to which mutations interact epistatically (Crow 2010; Lee & Langley 2010; Mackay 2010).
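Epistasis between two mutations is often measured as the deviation of the double mutant's fitness from the multiplicative expectation. This sketch assumes fitnesses relative to the wild type ($w = 1$) and the common convention $\varepsilon = w_{AB} - w_A w_B$. The labels ‘synergistic’ and ‘antagonistic’ indicate whether the combined effect is stronger or weaker than expected. The fitness values in the usage lines are hypothetical.

```python
def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of the double mutant from the multiplicative expectation;
    fitnesses are relative to the wild type (w = 1)."""
    return w_ab - w_a * w_b


def classify(w_a: float, w_b: float, w_ab: float, tol: float = 1e-9) -> str:
    """Describe the sign of epistasis and, where both effects point the same
    way, whether the combined effect is stronger or weaker than multiplicative."""
    eps = epistasis(w_a, w_b, w_ab)
    if abs(eps) <= tol:
        return "none (multiplicative)"
    sign = "positive" if eps > 0 else "negative"
    if w_a < 1 and w_b < 1:          # two deleterious mutations
        kind = "antagonistic" if eps > 0 else "synergistic"
    elif w_a > 1 and w_b > 1:        # two beneficial mutations
        kind = "synergistic" if eps > 0 else "antagonistic"
    else:                            # mixed or neutral effects: report the sign only
        return sign
    return f"{sign}, {kind}"


for w_ab in (0.81, 0.75, 0.85):
    print(f"w_A = w_B = 0.9, w_AB = {w_ab}: {classify(0.9, 0.9, w_ab)}")
```

For two deleterious mutations, negative $\varepsilon$ means the double mutant is worse than expected (synergistic), which is the form of epistasis often invoked in models of mutation load.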
Many mutations reduce fitness too much to accumulate in a population, but some maladaptive DNA changes have effects small enough to let them spread (Keightley & Eyre-Walker 2010). A species could be driven to extinction by their accumulation unless there are sufficiently many adaptive mutations. Important non-genetic processes (e.g. habitat fragmentation, hunting, pollution) and other genetic factors (e.g. mutation, linkage, inbreeding) can contribute to species extinction. Even so, one can argue that extinctions are always caused by a lack of mutations that enable adaptation to new or rapidly changing situations. The nature of adaptive mutations is therefore important (Orr 2010). Indeed, a finely tuned long-term balance may exist between fitness-decreasing and fitness-increasing processes, because further reducing fitness-decreasing processes (such as DNA replication errors) becomes exceedingly costly. Since a large fraction of mutations are deleterious, any anthropogenic increase in mutation rates from mutagenic pollution will have manifold debilitating effects. These include obvious ones, such as Mendelian genetic diseases and cancers. They also include many more effects with smaller impacts that escape natural selection and may have disastrous long-term consequences. A better understanding of the related long-term processes is therefore important for developing reasonable policies that limit the release of mutagenic chemicals into the environment.
Evolutionary models are important for understanding a range of problems fundamental to biology and for applications in health and welfare. For example, in this themed issue Hughes (2010) discusses the role of mutation in the evolution of ageing, a controversial subject to which Charlesworth (2000) has made significant contributions. Models of the evolution of pathogenic microbes (Sniegowski & Gerrish 2010) and of their mutations (Trindade et al. 2010) are important for medical applications. These include optimizing the use of antibiotics to minimize the evolution of resistance. They also include developing vaccines that might anticipate and neutralize simple evolutionary changes that a pathogen is expected to produce. The wealth of data that can be obtained for these systems also makes them attractive subjects for basic research on evolution. Some mutations that would be deleterious in natural populations provide opportunities for improving crops and livestock in the farm environment. Understanding their pleiotropic effects, for example, is fundamental to long-term increases in food production.
At the fundamental level, broad questions such as the origin of species and their extinction are influenced by the accumulation of mutations, and various models have accordingly been developed. In this issue, McDermott & Noor (2010) review recent advances regarding a particular mechanism of speciation, but there are many others (Coyne & Orr 2004). Answering such questions requires long-term models of evolution involving mutation, selection and chance. The same models are also required to address the evolution of sex (Barton 2010), an example of a deceptively simple problem that requires deep analysis.
We have provided an overview of the nature of mutations and of the theories that describe their fate once they have entered a population. Much work remains to be done, however, to integrate existing theories more fully and to better understand their implications. We have shown that such work is important for questions of practical interest. These include how fast species can adapt to new environments and how genetic factors can contribute to their extinction. They also include the consequences of increases in mutation rates driven by man-made technology. Such increases may unintentionally increase genetic diseases in humans and threaten the survival of endangered species. Mutations, however, also provide the raw material for the improvement of plants and animals for food production, and we need to know how best to use them. The population genetics of mutations is undoubtedly central to many theoretical and applied questions in biology.