The term linkage disequilibrium (LD) was first introduced in the late 1940s to describe the degree of non-random association between pairs of loci. In the absence of demographic effects that might confound LD patterns, summary statistics such as r² can be used to quantify the level of co-occurrence of alleles at two loci (Hill and Robertson 1968). When r² is zero, alleles at the two loci do not co-occur more frequently than would be expected under random sampling. r² approaches its maximum of 1 as alleles at the two loci co-occur more frequently within the population sample examined. Various other LD statistics can be used for this purpose (Hedrick 1987), all of which aim to estimate the predictive value of a marker locus for another locus that is in non-zero LD with it (if the LD statistic is zero, the two loci have no predictive value for each other).
Association mapping uses these properties of pairwise LD statistics to infer the predictive value of a marker locus for the association between the chromosomal region in which it resides and the phenotype. The high-LD chromosomal region around a marker locus defines the predictive range of that marker. If LD within this range is complete, any polymorphism within it will have the same predictive value for the association with the phenotype. Hence, from a significant marker-phenotype association it can be concluded that the causative polymorphism resides within the high-LD region around the marker locus.
With respect to association mapping, the most important property of LD is its predictive value across the haplotype on which a marker resides. However, the extent of LD (in base pairs) is highly variable among species and even within individual genomes, and is therefore most reliably estimated empirically (Long and Langley 1999). Theoretical estimation of the levels of LD for realistic population models that do not satisfy the assumptions of the Wright-Fisher model is complex. The difficulty is mostly due to the large number of interrelated factors that shape patterns of LD, including but not limited to genetic drift, population admixture, and natural selection (Pritchard and Przeworski 2001; Wall and Pritchard 2003).
The statistical power of associations is determined by the extent of LD with the causative polymorphism, as well as by the sample size used for the study (Long and Langley 1999; Wang and Rannala 2005). If LD decays too fast within a region, a large number of markers is required to scan the target regions of a genome. On the other hand, if LD decays too slowly, the haplotype blocks are too large to unambiguously reveal the underlying causative locus. In other words, the decay of LD over physical distance in the study population determines the marker density required and the level of resolution that may be obtained in an association study.
Several summary statistics have been proposed for estimating linkage disequilibrium (Hedrick 1987); however, the one most commonly used within the association study framework is r² (Hill and Robertson 1968; Lewontin 1988). Conceptually and mathematically, r is Pearson's product-moment correlation coefficient between the allelic state at one polymorphic locus and the allelic state at another, and it describes how well one predicts the other. r² is the square of this correlation coefficient, also called the coefficient of determination: the proportion of the sample variance of a response variable that is explained by the predictor variable when a linear regression is performed.
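The correlation view of r² can be illustrated with a minimal Python sketch. It assumes phased, biallelic data in which each locus is a list of 0/1 allele codes, one per sampled haplotype; the function name `r_squared` and this encoding are illustrative choices, not taken from the cited papers.

```python
def r_squared(locus_a, locus_b):
    """Squared Pearson correlation between two biallelic loci.

    Each argument is a list of 0/1 allele codes, one entry per sampled
    haplotype, with haplotypes in the same order in both lists
    (an assumed encoding).
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    mean_a = sum(locus_a) / n
    mean_b = sum(locus_b) / n
    # Covariance and variances of the 0/1 allelic states.
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(locus_a, locus_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in locus_a) / n
    var_b = sum((b - mean_b) ** 2 for b in locus_b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return cov ** 2 / (var_a * var_b)


# Alleles that always co-occur give r2 = 1; independent alleles give r2 = 0.
complete = r_squared([1, 1, 0, 0], [1, 1, 0, 0])
independent = r_squared([1, 1, 0, 0], [1, 0, 1, 0])
```

With 0/1 coding, the covariance term equals the disequilibrium coefficient D and each variance equals p(1 - p), so this is the same quantity as the usual frequency-based formula for r².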
Lewontin's D is another commonly used summary statistic for LD. D is the difference between the product of the coupling gamete frequencies and the product of the repulsion gamete frequencies at two loci. From D, a second, normalized measure of linkage disequilibrium, D', can also be estimated. Even in samples taken from populations at equilibrium under neutrality, the variances of LD summary statistics are typically large, but D' has the lowest variance (Hedrick 1987). However, D' may give erratic and unreliable results when low-frequency alleles or small sample sizes are used in the analysis. It is therefore advisable to collapse alleles using an allele frequency cut-off before estimating D and D'.
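As a rough illustration of how D, D' and r² relate, the following Python sketch computes all three from the four two-locus haplotype counts. The function name, argument order, and the example counts are hypothetical; the normalization of D' follows the standard textbook definition (D divided by its maximum attainable magnitude given the allele frequencies), reported here as an absolute value.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r2) for two biallelic loci from haplotype counts.

    n_AB and n_ab are the coupling haplotypes, n_Ab and n_aB the
    repulsion haplotypes (the labeling is an assumption of this sketch).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB, p_Ab, p_aB, p_ab = (n / total for n in (n_AB, n_Ab, n_aB, n_ab))
    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus

    # Product of coupling frequencies minus product of repulsion frequencies.
    D = p_AB * p_ab - p_Ab * p_aB

    # Largest magnitude D can take given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else float("nan")

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else float("nan")
    return D, d_prime, r2


# Common alleles (made-up counts): D = 0.15, |D'| = 0.6, r2 = 0.36.
D, d_prime, r2 = ld_from_haplotype_counts(40, 10, 10, 40)

# Rare alleles with one haplotype class missing (made-up counts).
D_rare, d_prime_rare, r2_rare = ld_from_haplotype_counts(95, 0, 4, 1)
```

In the second call |D'| comes out at about 1 even though r² stays below 0.2, which is one way the sensitivity of D' to low-frequency alleles shows up in practice.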
In addition to these commonly used summary statistics for LD, there are likelihood-based methods that investigate the probability of independence between pairs of sites using two-locus sampling distributions, rather than calculating a summary statistic for LD. These methods, usually referred to as model-based LD estimators, also provide a means of estimating the population recombination parameter 4Nc under a neutral equilibrium model from nucleotide sequence data (Golding 1984; Hudson 1985; Hudson 2001), or of generating other model-based estimates of LD for comparison with observed patterns (Mueller 2004) under various population structure and demographic history scenarios. Although estimation of LD through these methods is more computationally intensive than pairwise LD estimation, they are extensively used in evolutionary and population genetic studies as well as in investigations of the domestication of various crop plant species (Wright et al. 2005; Wright and Gaut 2005).
Estimating LD from empirical data is straightforward; however, interpreting the results of an LD analysis and extrapolating this information to the whole genome may be more complex. The rate of decay of LD with physical distance must be estimated in order to extrapolate information gathered from a small collection of sampled loci to the whole genome under investigation. This extrapolation is essential for the design of association mapping studies, since it can be used to determine the marker density required for scanning previously unexplored regions of the genome, as well as the maximum resolution that can be achieved for genotype-phenotype associations in the study population.
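One simple, purely empirical way to summarize decay of LD with distance is to bin pairwise r² values by physical distance and report the first bin whose mean falls below a chosen threshold. The Python sketch below assumes that approach; the function names, the bin width, the threshold, and the toy data are all hypothetical, and published studies often fit a decay curve instead of binning.

```python
import math


def mean_r2_by_distance(pairs, bin_width):
    """Average r2 within distance bins.

    pairs: iterable of (distance_bp, r2) for pairs of sites.
    Returns a list of (bin_start_bp, mean_r2), sorted by bin_start_bp.
    """
    sums = {}
    for distance, r2 in pairs:
        bin_start = int(distance // bin_width) * bin_width
        total, count = sums.get(bin_start, (0.0, 0))
        sums[bin_start] = (total + r2, count + 1)
    return [(b, total / count) for b, (total, count) in sorted(sums.items())]


def decay_distance(binned, threshold):
    """Start of the first bin whose mean r2 is below threshold, else None."""
    for bin_start, mean_r2 in binned:
        if mean_r2 < threshold:
            return bin_start
    return None


def markers_needed(region_length_bp, ld_extent_bp):
    """Crude marker count: one marker per stretch of useful LD."""
    if ld_extent_bp <= 0:
        raise ValueError("ld_extent_bp must be positive")
    return math.ceil(region_length_bp / ld_extent_bp)


# Toy (made-up) pairwise values: (distance in bp, r2).
toy_pairs = [(100, 0.8), (400, 0.6), (700, 0.3),
             (900, 0.2), (1200, 0.1), (1800, 0.05)]
binned = mean_r2_by_distance(toy_pairs, bin_width=500)
extent = decay_distance(binned, threshold=0.2)

# If LD were useful over about 2 kb, a 1 Mb region would need roughly:
n_markers = markers_needed(1_000_000, 2_000)
```

The marker count is only an order-of-magnitude guide, since it treats LD as uniform across the region.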
The levels of LD are expected to be highly variable across the genome, owing to factors such as selection and variation in recombination rate. For reliable results, this variation needs to be taken into account when designing experiments that exploit LD. Variation in the rate of recombination across the genome is a key contributor to the variance observed in patterns of LD. A number of researchers have focused on the distance at which average r² is reduced to 0.10 as a reasonable point at which to conclude that too little LD remains to detect associations with complex traits. The reasoning behind this r² cut-off is as follows: in a complex trait, a large quantitative trait locus (QTL) may explain only approximately 10% of the phenotypic variation. If a marker explains only 10% of the total QTL variation, then the marker will explain only 1% of the phenotypic variation. Detecting locus effects that account for less than 1% of the phenotypic variation requires very large population sizes, so such small effects would be considered undetectable in a study population of moderate size.
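The arithmetic behind the r² = 0.10 cut-off can be written out as a short Python sketch. The function names are hypothetical. The second function uses the common approximation that the sample size needed at a marker grows by a factor of about 1/r² relative to typing the causative site directly; that approximation is an assumption added here for illustration, not a statement from the passage above.

```python
def marker_variance_explained(qtl_fraction, r2):
    """Fraction of phenotypic variance captured by a marker.

    qtl_fraction: share of phenotypic variance explained by the QTL.
    r2: LD between the marker and the QTL.
    """
    return qtl_fraction * r2


def sample_size_inflation(r2):
    """Approximate factor by which sample size must grow when a marker
    in LD (r2) with the causative site is typed instead of the site itself.
    Assumes the 1/r2 approximation described in the lead-in.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return 1.0 / r2


# A QTL explaining 10% of the variance, tagged by a marker with r2 = 0.10,
# leaves the marker explaining about 1% of the phenotypic variance.
explained = marker_variance_explained(0.10, 0.10)

# Under the 1/r2 approximation, r2 = 0.10 implies roughly a tenfold
# larger sample, whereas r2 = 0.80 implies only about 1.25-fold.
inflation_low_ld = sample_size_inflation(0.10)
inflation_high_ld = sample_size_inflation(0.80)
```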
Sufficient power for association studies of complex traits requires both a stricter definition of LD blocks, based on higher LD, and larger population sizes. Current human genetic studies focus on genome scans that aim for much higher LD (e.g. r² > 0.80) (Barrett 2006), and are developing haplotype-based approaches that can help capture more variants (Pe'er et al. 2006).
Studies on rates of decay of linkage disequilibrium in various plant taxa (Flint-Garcia et al. 2003), such as maize (Zea mays ssp. mays) (Ching et al. 2002; Palaisa et al. 2003; Remington et al. 2001a; Tenaillon et al. 2002), barley (Hordeum vulgare) (Caldwell et al. 2004; Caldwell et al. 2006), Arabidopsis thaliana (Nordborg et al. 2002; Nordborg et al. 2005), sorghum (Sorghum bicolor) (Hamblin et al. 2005), and durum wheat (Triticum durum) (Maccaferri et al. 2005), indicate tremendous variation in the extent of linkage disequilibrium. This variation is mostly due to founder effects followed by genetic drift, which lead to unequal numbers of effective recombination events in the sub-populations of a species. Selfing also plays an important role (Nordborg 2000).
The effect of the population sample is clearly observed in maize, where LD decays within 1 kb in landraces (Tenaillon et al. 2001) and within approximately 2 kb in diverse inbred lines (Remington et al. 2001a), but can extend up to 100 kb in commercial elite inbred lines (Ching et al. 2002). In barley, in a study of four loci, Caldwell et al. (2006) show that LD may extend up to 212 kb in elite lines, while it may decay below r² = 0.2 within 0.4 kb for the same region in wild lines. In wild barley (Hordeum spontaneum), analysis of LD over 18 loci suggests that at some loci LD decay follows a pattern quite similar to that of maize, falling below significant levels within 2 kb (Morrell et al. 2005). However, a proportion of the loci show more extensive LD, which may be the result of admixture. In European aspen (Populus tremula), Ingvarsson (2005) shows that there is substantial variation not only across populations but also across loci, and estimates that LD decays to an expected r² of less than 0.05 within a few hundred base pairs. In a comparison of nine loci across two population samples of loblolly pine (Pinus taeda L.), Gonzalez-Martinez et al. (2006a) show that LD decays rapidly, falling below r² = 0.2 within 2 kb; the rate of decay is variable but not significantly different between the two independent population samples.
In predominantly selfing Arabidopsis, LD at a key flowering time locus (FRI) extends beyond 250 kb (Nordborg et al. 2002). However, in large genomic surveys, the decay of LD was reported to be much faster genome-wide, with r² falling below 0.2 within about 30 kb (Nordborg et al. 2005). In another selfing species, soybean (Glycine max), Zhu et al. (2003) studied the patterns of LD in 143 short amplicons that span approximately 12.5 cM of the genome. The study reports that significant decay of LD was detectable within approximately 2-2.5 cM, which corresponds to roughly 1-1.5 Mb. Few studies have investigated LD in rice (Oryza sativa) to date; at a disease resistance locus, substantial LD was reported to extend beyond 100 kb (Garris et al. 2003), and even further at the waxy domestication locus (Olsen et al. 2006). More comprehensive studies of the rice genome are underway.