Another critical aspect of a population's recent evolution for understanding and interpreting present-day associations between markers and phenotypes is its demographic history of founder and bottleneck events. Populations with a recent history of founder or bottleneck effects are particularly valuable in quantitative genetic studies for several reasons:
1. Founder and bottleneck effects augment the magnitude and physical extent of likely linkage disequilibria, thereby creating strong associations between markers and phenotypes.
2. The founder or bottleneck event reduces the overall levels of genetic variation, thereby resulting in a more uniform genetic background. This uniformity is particularly important for traits whose underlying genetic architecture is characterized by epistasis. A more uniform genetic background converts much of the epistasis, which can appear erratic when the actual causative genes are not being measured, into more stable marginal effects attributable to the remaining polymorphic loci (Cheverud and Routman 1996; Cheverud et al. 1999; Goodnight 1995, 2000).
3. A founder event can increase the frequencies of some genes associated with a phenotype of interest. It is therefore easier to sample individuals with a phenotype of interest in such founder populations, and the alleles underlying this phenotype are upon a more homogeneous genetic background.
The founder populations most likely to have strong associations with noncausative markers are those that have experienced little or no subsequent gene flow with other populations after the founder event. Postfounder reproductive isolation allows the persistence of the linkage disequilibrium patterns, commonness of a phenotype of interest, and the genetic background homogeneity induced by the original founder or bottleneck event. Moreover, sampling individuals from a relatively isolated population makes it less likely to inadvertently sample individuals from more than one local deme in a subdivided population. Pooling individuals from genetically differentiated demes creates linkage disequilibrium in the total sample that is a function only of the allele frequency differences between demes and is independent of actual linkage relationships (equation 6.3). Disequilibrium that is independent of true linkage undercuts the fundamental biological premise of the marker association analysis and can create spurious associations. Obviously, pooling of individuals from genetically differentiated demes should be avoided when population subdivision is known, but often the subdivision was not known or taken into account during the sampling stage of a study. There are methods for trying to detect or adjust for such cryptic subdivision after the fact (Pritchard et al. 2000; Devlin et al. 2001), but pooling is best avoided in the first place. Sampling from a relatively isolated founder population is one effective method of minimizing the dangers of cryptic population subdivision in a sample. For all these reasons, the identification and characterization of relatively genetically isolated populations that have undergone recent founder or bottleneck effects have become a high priority in quantitative genetic studies (Peltonen et al. 2000, Peltonen 2000).
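The effect of pooling demes can be illustrated numerically. The following is a minimal sketch, assuming two demes that are each in linkage equilibrium internally; the function name, the mixing fraction m, and the allele frequencies are illustrative assumptions and are not taken from the text (equation 6.3 itself is not reproduced here). Under these assumptions the disequilibrium in the pooled sample works out to m(1 - m)(p1 - p2)(q1 - q2), an expression in which no recombination rate appears.

```python
def pooled_disequilibrium(m, p1, q1, p2, q2):
    """Linkage disequilibrium between loci A and B in a pooled sample.

    m      : fraction of the pooled sample drawn from deme 1 (hypothetical name)
    p1, q1 : frequencies of allele A and allele B in deme 1
    p2, q2 : frequencies of the same alleles in deme 2
    Each deme is assumed to have D = 0 internally.
    """
    # Frequency of the AB gamete in the pooled sample.
    g_ab = m * p1 * q1 + (1 - m) * p2 * q2
    # Allele frequencies in the pooled sample.
    p = m * p1 + (1 - m) * p2
    q = m * q1 + (1 - m) * q2
    # D is the excess of AB gametes over the product of allele frequencies.
    return g_ab - p * q


# Equal pooling of two demes that differ at both loci (illustrative values).
d_pooled = pooled_disequilibrium(0.5, 0.9, 0.8, 0.1, 0.2)
# Same pooling, but the demes do not differ at locus A.
d_no_difference = pooled_disequilibrium(0.5, 0.9, 0.8, 0.9, 0.2)
print(d_pooled, d_no_difference)
```

In this sketch the first case yields a disequilibrium of about 0.12 even though neither deme has any, and the second yields essentially zero, which is the sense in which the pooled disequilibrium depends only on allele frequency differences between demes.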
The basic idea of the intrapopulational disequilibrium approach is to score the population for a large number of markers and test the marker genotypes for phenotypic associations. If a phenotypic association is detected, it is attributed not to the marker genotypes themselves but rather to linkage disequilibrium between the marker locus and a nearby quantitative trait locus, or QTL, a locus whose genetic variation significantly contributes to some of the phenotypic variation observed in the population. Since it is not known a priori how many QTLs may exist or their chromosomal locations, it is critical to have sufficient marker loci scattered throughout the genome to ensure likely linkage disequilibrium between at least one of the markers and any given QTL. Even for founder populations, sufficient coverage often requires a marker spacing of less than 5 cM (Ober et al. 2001).
This approach is most straightforward when the relevant phenotypic variation is primarily due to a single locus, as for example a Mendelian disease in humans. The value of isolated founder populations is illustrated in such a case by the first successful example of positional cloning of the gene of a human genetic disease, Huntington's chorea. Huntington's chorea is inherited as an autosomal dominant genetic disease and is associated with a late age-of-onset degeneration of the central nervous system (usually after 40 years of age) that ultimately causes death. The cloning of this disease gene depended first upon identifying a relatively isolated founder population that had a high frequency for this disease. Such a population was found in the region of Lake Maracaibo in Venezuela (Gusella et al. 1983). A restriction fragment length polymorphism (RFLP) was found to be associated with the disease in this population, which indicated that the disease locus was nearby on the tip of the short arm of chromosome 4 of the human genome. It must be emphasized that the association of Huntington's disease with this RFLP is not universally true for all human populations; it is applicable primarily to that one founder population. However, the known chromosomal location of the RFLP was a universal indicator of the approximate chromosomal location of the locus for Huntington's disease and allowed the eventual identification and cloning of that disease gene.
Linkage disequilibrium often allows the mapping of the disease gene to an interval of just 1-2 cM (Peltonen et al. 2000). As pointed out in Figure 2.5 for the sickle cell allele, when a disease allele arises by mutation it occurs on a specific haplotype background that can be stable for many generations in a small region about the original mutation. Often disease mutations are recurrent, thereby occurring on a variety of haplotype backgrounds in the species as a whole. This indeed is what Figure 2.5 shows for the sickle cell allele. However, in any one founder population, it is likely that most or even all of the copies of the disease allele trace back to a recent common ancestral DNA molecule through the coalescent process, thereby making the haplotype background associated with the mutation far more uniform in a specific founder population. Therefore, once a general region of the genome has been indicated by disequilibrium with single markers, an examination of haplotype associations within this region can often refine the location of the disease gene to 50-200 kb, greatly facilitating the targeting of physical cloning and sequencing efforts (Peltonen 2000).
Difficulties arise with this marker approach when the genetic architecture is more complex. Suppose, for example, that many genes contribute to the phenotypic variation but in a manner characterized by extensive epistasis. Many of the epistatic QTLs could be overlooked (Cheverud et al. 1996) if phenotypic associations are only examined one marker at a time. Thus, the genetic model used constrains what is discovered; we tend to see only what we look for, particularly when it comes to epistasis (Frankel and Schork 1996).
Another difficulty is in choosing the marker loci. The analysis depends upon linkage disequilibrium, so any factor that tends to reduce disequilibrium undermines the power of this approach. As pointed out in Chapter 5, not all marker pairs display significant disequilibrium, sometimes even over less than 2 kb (e.g., Figure 5.18). Therefore, it is doubtful if a single marker allele would always display disequilibrium with a QTL. Moreover, when using haplotypes in small DNA regions, the linkage disequilibrium that exists is not well correlated with physical distance, as also pointed out in Chapter 5. This means that when attempting fine-scale mapping of a QTL, a significant phenotypic association with a marker is not a reliable guide to the actual location of the QTL.
Another problem with markers is the danger of selecting a marker at a highly mutable site. Frequent mutation at a site weakens linkage disequilibrium because it places the same genetic state on a variety of chromosomal backgrounds. It is commonplace in much of the evolutionary literature to assume the infinite-sites model (which does not allow multiple hits) for nuclear DNA. At first glance, the infinite-sites model seems reasonable. For example, a genetic survey of the human gene lipoprotein lipase (LPL) discussed in Chapter 5 revealed 88 polymorphic sites out of 9734 in the sequenced region, a figure that seems to be well below saturation. Moreover, almost all these 88 polymorphic sites have only two alternative nucleotide states, which seemingly further bolsters the argument against multiple mutational hits at the same nucleotide sites. However, these arguments are based upon the premise that mutations are equally likely to occur at all sites and, given a mutation, that each of the three alternative nucleotide states is equally likely to arise. Neither of these premises is justified in this case, as pointed out in Chapter 5. For example, about a third of all mutations in human nuclear DNA are transitions from 5-methylcytosine to thymine that occur exclusively at CpG dinucleotides, a combination markedly underrepresented in human DNA. Mutational hotspots have also been reported for mononucleotide-repeat regions, DNA polymerase α arrest sites, and other rarely occurring sequence motifs in human DNA. The pattern of site polymorphism in the LPL gene parallels the results of these mutation studies, with 9.6% of the nucleotides in CpG sites being polymorphic, 3.3% of the nucleotides in mononucleotide runs of length 5 or greater, 3.0% of the nucleotides within 3 bp of the polymerase α arrest site motif of TG(A/G)(A/G)GA, and 0.5% at all other sites (Templeton et al. 2000a). Altogether, almost half of the polymorphic sites in the sequenced portion of the LPL gene were from one of these three highly mutable classes and therefore would be less than ideal choices for a genetic marker, and an analysis of the haplotype trees estimated for this region (corrected for recombination) shows that nucleotides at these highly mutable sites have indeed experienced multiple mutational changes in the evolutionary history of this DNA region. Unfortunately, the only consideration in finding markers is often that they are highly polymorphic in all populations, a procedure that biases in favor of sites with high mutability. One solution to the problem of multiple genetic backgrounds for mutable marker loci is to choose populations with extreme founder or bottleneck events in their recent evolutionary history. This greatly reduces the chances of multiple mutational hits at the marker locus in this particular population. Hence, an understanding of the recent evolutionary history of a population is critical in designing and executing this type of measured genotype study (Templeton 1999a).
Markers of Linkage. Physical linkage between two loci affects their pattern of cosegregation in a pedigree or set of controlled crosses regardless of whether or not there is linkage disequilibrium between the two loci in the population as a whole. Consequently, if one of these loci is a measured marker locus and the other is an unmeasured polymorphic QTL that contributes to phenotypic variation, then the pattern of segregation at the marker locus in the pedigree or controlled crosses will be associated with phenotypic differences among individuals within the pedigree or set of crosses. The strength of this association depends upon two factors:
• Magnitude of phenotypic impact of QTL
• Amount of recombination between marker locus and QTL
An experimenter has no control over the first of these factors, but the second can be controlled to some extent by the choice of the number and the genomic locations of the marker loci. Ideally, enough markers should be studied to cover the entire genome. Minimally, this means markers every 20 cM, which implies that any QTL will be no more than 10 cM from a marker (and therefore double crossovers will not be important).
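The arithmetic behind this spacing rule can be checked with a short sketch. It assumes Haldane's mapping function (crossovers occurring independently along the chromosome, with no interference), which is an assumption made here for illustration and is not specified in the text; the function names are likewise hypothetical.

```python
import math


def haldane_recombination(d_morgans):
    """Recombination fraction for a map distance in Morgans (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def prob_multiple_crossovers(d_morgans):
    """Probability of two or more crossovers in the interval.

    Under the no-interference assumption the number of crossovers is
    Poisson distributed with mean equal to the map distance in Morgans.
    """
    return 1.0 - math.exp(-d_morgans) * (1.0 + d_morgans)


spacing_cm = 20.0                    # marker spacing from the text
max_distance_cm = spacing_cm / 2.0   # farthest a QTL can be from its nearest marker
r_max = haldane_recombination(max_distance_cm / 100.0)
p_multi = prob_multiple_crossovers(max_distance_cm / 100.0)
print(max_distance_cm, round(r_max, 4), round(p_multi, 4))
```

With these assumptions a QTL at the worst-case distance of 10 cM has a recombination fraction of roughly 0.09 with its nearest marker, and the chance of more than one crossover in that interval is below 1%, which is why double crossovers can be neglected at this marker density.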
In humans, this linkage mapping approach is used with pedigree data. Because the associations are detected through linkage within a pedigree, there is no need for linkage disequilibrium in the population as a whole. Nevertheless, such linkage mapping studies in humans are still primarily and most powerfully executed in founder populations. The linkage disequilibrium induced by founder events is no longer directly relevant in a pedigree study, but founder populations are still the preferred objects of study because of their greater uniformity of genetic background, the greater incidence of phenotypic variants that exist in some founder populations, and the lessening of the chances for artifacts due to cryptic population subdivision. Moreover, although population-level linkage disequilibrium is not strictly required for this approach, its presence can make informative patterns of cosegregation within a pedigree much more likely. Therefore, linkage disequilibrium in the population as a whole enhances the power of linkage association studies within pedigrees (Ober et al. 2001). Finally, individuals from founder populations in humans are more likely to share many common environmental variables than individuals from most other human populations, thereby resulting in a more uniform environmental background as well.
An example of this approach is given by Ober et al. (2000, 2001), who measured 20 quantitative traits associated with asthma, diabetes mellitus, cardiovascular disease, hypertension, and autism in a population of Hutterites. The Hutterites are a recent, religiously defined founder population who practice a communal farming life-style which attenuates many sources of environmental variation. The phenotypes chosen for study have high incidences in this population, as expected from a severe founder event. In particular, 11% of the population have asthma, 28% of the Hutterites over 30 have diabetes or impaired glucose tolerance, and 34% have hypertension. Over 500 marker loci have been scored in about 700 living Hutterites that come from a single, 13-generation pedigree. This marker density yields a 9.1-cM map overall, with greater marker density in some regions of the genome. The simplest approach to detect QTLs is to model each theoretical QTL as a two-allele locus with an additive contribution to the phenotype and look for associations between such hypothetical QTLs and each marker. Because of the large number of statistical tests that are not independent because of linkage, the analysis is complicated by the need to correct for these multiple comparisons, but procedures exist for doing so (e.g., Cheverud 2001). Ober et al. (2000) found evidence for 23 QTLs influencing the phenotype of asthma susceptibility in the Hutterites.
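As a rough illustration of the single-marker additive model and of why multiple comparisons matter, the sketch below estimates the additive effect of a marker as the least-squares slope of phenotype on allele count and computes a Bonferroni per-test threshold. Bonferroni is used here only as a simple, conservative stand-in; it is not the procedure of Cheverud (2001), and it ignores both the correlation among linked markers and the pedigree structure of the sample. The data values and function names are invented for illustration.

```python
def additive_effect(allele_counts, phenotypes):
    """Least-squares slope of phenotype on marker allele count (0, 1, or 2)."""
    n = len(allele_counts)
    mean_g = sum(allele_counts) / n
    mean_y = sum(phenotypes) / n
    # Covariance and variance sums (unnormalized; the ratio is the slope).
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(allele_counts, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    return sxy / sxx


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall error rate near alpha."""
    return alpha / n_tests


# Hypothetical toy data: six individuals scored at one marker.
allele_counts = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope = additive_effect(allele_counts, phenotypes)

# About 500 markers are mentioned in the text; 500 is used as a round number.
per_test_alpha = bonferroni_threshold(0.05, 500)
print(slope, per_test_alpha)
```

Because adjacent markers are linked, the tests are not independent and a Bonferroni threshold of this kind is stricter than necessary, which is the motivation for corrections that account for the correlation among markers.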
Instead of using a single-site analysis of association, it is also possible to combine information from two or more sites into an integrated analysis of linkage association. One of the most common approaches of this type is interval mapping, in which a QTL is hypothesized to lie between two adjacent markers and the likelihood of the QTL being at various intermediate positions between the flanking markers is statistically evaluated. For example, consider the simple case of a controlled cross design in which two inbred strains are crossed to produce an F1, which in turn is then backcrossed to one of the parental strains. Let the marker alleles from one parental strain be designated by capital letters and those from the other strain by small letters, and assume that the backcross was to this latter strain. Now consider two marker loci, say locus A and locus B, that are adjacent to one another on the chromosome map with a recombination frequency of r. Now we hypothesize a QTL, say locus X, in between these two marker loci. Assume that one parental strain was originally fixed for the X allele and that the other strain is fixed for the x allele at this hypothetical QTL. Assume that the genotypic value for the phenotype being measured is G_Xx for backcross progeny with genotype Xx and G_xx for those with genotype xx. Assuming that the recombination frequency r between the markers is sufficiently small that double crossovers can be ignored (this assumption can be dropped, but then one needs a mapping function), the recombination frequency between marker A and QTL X is r_x and the recombination frequency between marker B and the QTL is r - r_x, as shown in Figure 10.3. Given these assumptions, Table 10.1 shows the expected phenotypic means for the observed marker genotypes as a function of the hypothesized phenotypic effects of the QTL at the hypothesized map position of r_x from marker locus A.
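The expected means for the four backcross marker classes can be derived directly from the assumptions just stated. The sketch below is such a derivation, not a transcription of Table 10.1, so its layout and notation may differ from the table. It additionally assumes that the capital-letter strain is the one fixed for the X allele, so that the F1 carries the chromosomes A-X-B and a-x-b; the function and variable names are hypothetical.

```python
def expected_marker_means(r, r_x, g_Xx, g_xx):
    """Expected phenotypic mean for each backcross marker genotype.

    r    : recombination frequency between markers A and B
    r_x  : recombination frequency between marker A and the hypothesized QTL
    g_Xx : genotypic value of Xx progeny
    g_xx : genotypic value of xx progeny
    Double crossovers are ignored, as in the text.
    """
    if not (0.0 <= r_x <= r) or r <= 0.0:
        raise ValueError("need 0 <= r_x <= r and r > 0")
    # Given one crossover between A and B, the chance that it fell
    # between A and X rather than between X and B.
    w = r_x / r
    return {
        # Nonrecombinant F1 gametes keep the parental QTL allele.
        "Aa Bb": g_Xx,
        "aa bb": g_xx,
        # A-b gamete: crossover between A and X gives A-x-b (xx progeny),
        # crossover between X and B gives A-X-b (Xx progeny).
        "Aa bb": w * g_xx + (1.0 - w) * g_Xx,
        # a-B gamete: crossover between A and X gives a-X-B (Xx progeny),
        # crossover between X and B gives a-x-B (xx progeny).
        "aa Bb": w * g_Xx + (1.0 - w) * g_xx,
    }


# Illustrative values: QTL one quarter of the way from A to B.
means = expected_marker_means(r=0.2, r_x=0.05, g_Xx=12.0, g_xx=10.0)
for genotype, mean in means.items():
    print(genotype, round(mean, 3))
```

In this sketch the recombinant marker classes carry the information about position: as r_x moves from 0 toward r, the expected mean of the Aa bb class slides from G_Xx toward G_xx, and that of the aa Bb class moves in the opposite direction.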
Similar models exist for F2 crosses and pedigree data, although they are generally more complex. The important point is that the fit of the observed phenotypes to the expected phenotypes depends on both the hypothesized phenotypic impact of locus X and its chromosomal position. This fit is typically measured by a likelihood ratio test (Appendix 2) of the hypothesis that the QTL