BOTTLENECK : A program for detecting recent effective population size reductions from allele frequency data
Sylvain PIRY(1), Gordon LUIKART(2) and Jean-Marie CORNUET(1)
Principle : Populations that have experienced a recent reduction of their effective population size exhibit a correlated reduction of allele numbers and heterozygosities at polymorphic loci. However, allelic diversity is reduced faster than heterozygosity, i.e. the observed heterozygosity is larger than the heterozygosity expected from the observed allele number if the locus were at mutation-drift equilibrium. Strictly speaking, this has been demonstrated only for loci evolving under the Infinite Allele Model (IAM) by Maruyama and Fuerst (1985). If the locus evolves under the strict Stepwise Mutation Model (SMM), there can be situations where this heterozygosity excess is not observed (Cornuet and Luikart 1996). However, few loci follow the strict SMM, and as soon as they depart slightly from this mutation model towards the IAM, they will exhibit a heterozygosity excess as a consequence of a genetic bottleneck.
In a population at mutation-drift equilibrium (i.e. one whose effective size has remained constant in the past), a locus is approximately equally likely to show a heterozygosity excess or a heterozygosity deficit. To determine whether a population exhibits a significant number of loci with heterozygosity excess, we proposed three tests, namely a "sign test", a "standardized differences test" (Cornuet and Luikart 1996), and a "Wilcoxon signed-rank test" (Luikart et al., 1997a). We also proposed a descriptor of the allele frequency distribution ("mode-shift" indicator) which discriminates many bottlenecked populations from stable populations (Luikart et al., 1997b).
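The idea behind the sign test can be illustrated with a simple binomial calculation. This is only a minimal sketch: it assumes that every locus shows a heterozygosity excess with the same probability of 0.5 (taken from the "approximately equal probability" statement above), whereas the published test (Cornuet and Luikart 1996) may use a different per-locus probability. The function name and parameters are hypothetical.

```python
from math import comb

def sign_test_p_value(n_excess, n_loci, p_excess=0.5):
    """One-tailed probability of seeing at least n_excess loci with
    heterozygosity excess among n_loci, assuming each locus shows an
    excess independently with probability p_excess (an assumption)."""
    return sum(
        comb(n_loci, i) * p_excess**i * (1 - p_excess) ** (n_loci - i)
        for i in range(n_excess, n_loci + 1)
    )

# Example: 8 of 10 loci show a heterozygosity excess.
p = sign_test_p_value(8, 10)
```

A small probability suggests that more loci show an excess than would be expected at equilibrium under this simplified model.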
Description : For each population sample and each locus, the program BOTTLENECK computes the distribution of the heterozygosity expected from the observed number of alleles (k), given the sample size (n), under the assumption of mutation-drift equilibrium. This distribution is obtained by simulating the coalescent process of n genes under two possible mutation models, the IAM and the SMM. Its average (Hexp) is compared to the observed heterozygosity (Hobs, in the sense of Nei's gene diversity) to establish whether there is a heterozygosity excess or deficit at this locus. In addition, the standard deviation (SD) of the mutation-drift equilibrium distribution of the heterozygosity is used to compute the standardized difference for each locus ((Hobs-Hexp)/SD). The distribution obtained through simulation also enables the computation of a P-value for the observed heterozygosity.
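The per-locus quantities described above can be sketched as follows. The code assumes that Hobs is Nei's unbiased gene diversity, n/(n-1) * (1 - sum of squared allele frequencies), and that Hexp and SD are the mean and sample standard deviation of a list of simulated heterozygosities. Whether the program uses the sample or the population standard deviation is not stated here, so that choice is an assumption, as are the function names.

```python
from statistics import mean, stdev

def gene_diversity(allele_counts):
    """Nei's unbiased gene diversity from the copy number of each allele."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_sq = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_sq)

def standardized_difference(h_obs, simulated_h):
    """(Hobs - Hexp) / SD, with Hexp and SD taken from simulated values.
    Using the sample standard deviation is an assumption of this sketch."""
    h_exp = mean(simulated_h)
    sd = stdev(simulated_h)
    return (h_obs - h_exp) / sd

# Example with made-up numbers: a locus with two alleles, five copies each.
h_obs = gene_diversity([5, 5])
```

A positive standardized difference corresponds to a heterozygosity excess at that locus, a negative one to a deficit.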
The coalescent process is simulated in an unconventional way because it is conditioned on the observed number of alleles. The genealogy of the n genes is simulated as usual (Hudson, 1990). Under the IAM, mutations are allocated one at a time and the resulting number of alleles is computed after each one. The process is repeated until this number reaches the observed number of alleles. Under the SMM, a Bayesian approach is used as explained in Cornuet and Luikart (1996). Briefly, the likelihood distribution of the parameter theta (= 4Neµ) given the number of alleles (k) and the sample size (n) is evaluated as the proportion of iterations (in the simulation process) producing exactly k alleles for a varying set of thetas. As a second step, random values of theta are drawn according to this likelihood distribution and the coalescent process is simulated as usual. Only heterozygosities found in iterations producing exactly k alleles are retained.
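The IAM procedure described above can be sketched as follows. This is an illustrative reading, not the program's actual code: it assumes a standard coalescent genealogy with exponential waiting times, that each new mutation falls on a branch with probability proportional to its length, and that under the IAM a gene carries the allele created by the most recent mutation on its lineage. All function names are hypothetical.

```python
import random
from collections import Counter

def simulate_genealogy(n, rng):
    """Random coalescent tree of n genes: returns parent indices and branch lengths."""
    parent, height, active, t = [None] * n, [0.0] * n, list(range(n)), 0.0
    while len(active) > 1:
        j = len(active)
        t += rng.expovariate(j * (j - 1) / 2)   # waiting time to next coalescence
        a, b = rng.sample(active, 2)            # the two lineages that merge
        new = len(parent)
        parent.append(None)
        height.append(t)
        parent[a] = parent[b] = new
        active = [x for x in active if x != a and x != b]
        active.append(new)
    length = [0.0 if p is None else height[p] - height[i]
              for i, p in enumerate(parent)]
    return parent, length

def leaf_alleles(parent, n, mutated):
    """Allele of each sampled gene: the first mutated branch met on the way
    to the root (None means the ancestral allele)."""
    labels = []
    for leaf in range(n):
        node = leaf
        while node is not None and node not in mutated:
            node = parent[node]
        labels.append(node)
    return labels

def simulate_h_given_k(n, k, rng):
    """Add mutations one at a time until k alleles are present, then return
    the unbiased gene diversity of the sample. Requires 1 <= k <= n, n >= 2."""
    parent, length = simulate_genealogy(n, rng)
    branches = [i for i, p in enumerate(parent) if p is not None]
    weights = [length[i] for i in branches]
    mutated, labels = set(), [None] * n
    while len(set(labels)) < k:
        mutated.add(rng.choices(branches, weights=weights)[0])
        labels = leaf_alleles(parent, n, mutated)
    counts = Counter(labels).values()
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts))

rng = random.Random(1)
simulated = [simulate_h_given_k(20, 5, rng) for _ in range(100)]
```

Each added mutation either leaves the number of alleles unchanged or raises it by one, so the loop stops exactly when k alleles are present. Repeating the call many times gives the kind of distribution from which Hexp and SD are derived.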
Once all loci available in a population sample have been processed, the three statistical tests are performed for each mutation model as explained in Cornuet and Luikart (1996) and Luikart et al. (1997a, b). The allele frequency distribution is also established in order to see whether it is approximately L-shaped (as expected under mutation-drift equilibrium) or not (recent bottlenecks cause a mode shift).
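The mode-shift check can be illustrated with a short sketch. It assumes that allele frequencies from all loci are grouped into ten equal-width classes and that the distribution counts as L-shaped when no other class contains more alleles than the lowest-frequency class. The number of classes and this decision rule are assumptions of the sketch, not a description of the program's internals.

```python
def frequency_class_counts(freqs, n_classes=10):
    """Number of alleles falling in each equal-width frequency class."""
    counts = [0] * n_classes
    for f in freqs:
        idx = min(int(f * n_classes), n_classes - 1)  # frequency 1.0 goes in last class
        counts[idx] += 1
    return counts

def shows_mode_shift(freqs, n_classes=10):
    """True if some class holds more alleles than the lowest-frequency class."""
    counts = frequency_class_counts(freqs, n_classes)
    return max(counts[1:]) > counts[0]

# Made-up frequencies: many rare alleles versus mostly intermediate alleles.
stable_like = [0.05, 0.02, 0.08, 0.15, 0.72]
bottleneck_like = [0.05, 0.25, 0.22, 0.48]
```

With these made-up values, `shows_mode_shift(stable_like)` is False and `shows_mode_shift(bottleneck_like)` is True.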
Data file format : Five data file formats are accepted and automatically recognized by BOTTLENECK. All are text files. Two are the GENEPOP and GENETIX formats. The other three formats concern single population data. The first line is a title line. Each following line provides the necessary data for each locus. In all cases, the line starts with the name of the locus followed by the number of alleles (k). In one data file format, the line includes successively the sample size (number of gene copies = n) and the unbiased genic diversity (sensu Nei, 1987). In the second format, the line is completed with the number of copies of each allele. In the third format, the line includes the sample size (n) and the frequency of each allele. All data on the same line are separated by one or more spaces.
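As an illustration of the second single-population format (locus name, number of alleles k, then the number of copies of each allele), here is a minimal parsing sketch. The sample text, the locus names and the function name are made up for the example, and the sketch assumes that locus names contain no spaces.

```python
def parse_allele_count_file(text):
    """Parse a title line followed by lines of the form
    'locus_name k count_1 ... count_k', separated by one or more spaces."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    title = lines[0].strip()
    loci = {}
    for ln in lines[1:]:
        fields = ln.split()              # splits on any run of whitespace
        name, k = fields[0], int(fields[1])
        counts = [int(x) for x in fields[2:]]
        if len(counts) != k:
            raise ValueError(f"locus {name}: expected {k} counts, got {len(counts)}")
        loci[name] = counts
    return title, loci

# Hypothetical file content, for illustration only.
sample = """Example population
LocA 3 10 6 4
LocB 2   15 5
"""
title, loci = parse_allele_count_file(sample)
```

The sample size n for each locus is then simply the sum of its allele counts.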
Cornuet J.M. and Luikart G., 1996. Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data. Genetics 144:2001-2014. Please cite this article if you use Bottleneck.
Hudson R.R., 1990. Gene genealogies and the coalescent process, pp. 1-42 in Oxford Surveys in Evolutionary Biology, Vol. 7, edited by D. Futuyma and J. Antonovics. Oxford University Press, Oxford.
Luikart G., Allendorf F.W., Cornuet J.M. and Sherwin W.B., 1997. Distortion of allele frequency distributions provides a test for recent population bottlenecks. Journal of Heredity (Accepted July, 1997).
Luikart G. and Cornuet J.M., 1998. Empirical evaluation of a test for identifying recently bottlenecked populations from allele frequency data. Conservation Biology 12(1):228-237.
Luikart G., 1997. Usefulness of molecular markers for detecting population bottlenecks and monitoring genetic change. Ph. D. Thesis. University of Montana, Missoula, USA.
Maruyama T. and Fuerst P.A., 1985. Population bottlenecks and non equilibrium models in population genetics. II. Number of alleles in a small population that was formed by a recent bottleneck. Genetics 111:675-689.