Linkage Disequilibrium Calculator
Calculate D, D’, and r² values for genetic linkage analysis
Comprehensive Guide: How to Calculate Linkage Disequilibrium
Linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population. Understanding LD is crucial for genetic mapping, association studies, and evolutionary biology. This guide explains the mathematical foundations and practical applications of LD calculations.
1. Fundamental Concepts of Linkage Disequilibrium
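Linkage disequilibrium occurs when alleles at two loci appear together on the same haplotype more or less often than expected if they were independent, that is, when the haplotype frequency differs from the product of the allele frequencies. This phenomenon provides insights into: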
- Genetic recombination rates between loci
- Population history and structure
- Selection pressures acting on specific genomic regions
- Disease gene mapping through association studies
2. Mathematical Measures of Linkage Disequilibrium
Three primary metrics quantify linkage disequilibrium:
- D (Disequilibrium Coefficient): The basic measure, the difference between the observed haplotype frequency and the frequency expected if the alleles were independent
- D’ (Standardized Disequilibrium): D divided by its maximum possible magnitude given the allele frequencies
- r² (Squared Correlation Coefficient): The square of the correlation coefficient between alleles at the two loci
2.1 Calculating D (Disequilibrium Coefficient)
The D value is calculated as:
D = pAB - (pA × pB)
Where:
- pAB = Observed frequency of haplotype AB (alleles A and B on the same chromosome)
- pA = Frequency of allele A
- pB = Frequency of allele B
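The formula above can be sketched in a few lines of Python. This is a minimal illustration that assumes phased data summarized as counts of the four haplotypes AB, Ab, aB, and ab; the function name `ld_d` and the example counts are hypothetical and not part of the calculator itself.

```python
def ld_d(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, pA, pB) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total              # observed frequency of haplotype AB
    p_A = (n_AB + n_Ab) / total      # frequency of allele A
    p_B = (n_AB + n_aB) / total      # frequency of allele B
    return p_AB - p_A * p_B, p_A, p_B


# Hypothetical sample of 100 haplotypes: pAB = 0.4, pA = pB = 0.5
d, p_a, p_b = ld_d(40, 10, 10, 40)
print(round(d, 4), p_a, p_b)
```

With these counts D is 0.4 - 0.5 × 0.5 = 0.15, a positive value indicating that A and B occur together more often than expected under independence.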
2.2 Calculating D’ (Standardized Disequilibrium)
D’ rescales D by the largest magnitude it could take given the allele frequencies, so it ranges from -1 to 1:
D’ = D / Dmax
where Dmax = min[pA(1-pB), pB(1-pA)] when D > 0
Dmax = min[pApB, (1-pA)(1-pB)] when D < 0
When D = 0, D’ = 0.
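As a rough sketch of the two-case normalization, the Python function below takes the haplotype frequency pAB and the allele frequencies pA and pB and returns a signed D’. The function name `d_prime` and the example frequencies are illustrative assumptions; the check for monomorphic loci is an added safeguard rather than part of the formula.

```python
def d_prime(p_AB, p_A, p_B):
    """Signed D' for two biallelic loci, following the two-case Dmax rule."""
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    if d > 0:
        d_max = min(p_A * (1 - p_B), p_B * (1 - p_A))
    elif d < 0:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    else:
        return 0.0
    return d / d_max


coupling = d_prime(0.4, 0.5, 0.5)    # D = 0.15, Dmax = 0.25
repulsion = d_prime(0.1, 0.5, 0.5)   # D = -0.15, Dmax = 0.25
complete = d_prime(0.3, 0.3, 0.6)    # every A chromosome also carries B
print(round(coupling, 3), round(repulsion, 3), round(complete, 3))
```

The third call illustrates that |D’| reaches 1 whenever one of the four haplotypes is absent (here Ab), even though the allele frequencies at the two loci differ.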
2.3 Calculating r² (Squared Correlation Coefficient)
r² is the squared correlation between the allelic states at the two loci and ranges from 0 to 1:
r² = D² / [pA(1-pA) × pB(1-pB)]
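A minimal Python sketch of this calculation follows. The function name `r_squared` and the input frequencies are hypothetical; the guard against a zero denominator (a monomorphic locus) is an added assumption about how such input should be handled.

```python
def r_squared(p_AB, p_A, p_B):
    """r^2 between two biallelic loci from haplotype and allele frequencies."""
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_AB - p_A * p_B
    return d * d / denom


balanced = r_squared(0.4, 0.5, 0.5)   # D = 0.15, so r^2 = 0.0225 / 0.0625
unequal = r_squared(0.3, 0.3, 0.6)    # |D'| = 1 here, yet r^2 is only 2/7
print(round(balanced, 3), round(unequal, 3))
```

The second example shows why D’ and r² should be read together: a pair of loci can have |D’| = 1 while r² stays well below 1 when their allele frequencies differ.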
3. Interpretation of LD Values
| LD Measure | Value Range (conventional cutoffs) | Interpretation | Genetic Implications |
|---|---|---|---|
| D’ | \|D’\| = 1 | Complete LD | At least one of the four possible haplotypes is absent; consistent with no historical recombination between the loci |
| D’ | 0.7 ≤ \|D’\| < 1 | Strong LD | Little historical recombination; loci likely fall within the same haplotype block |
| D’ | 0.3 ≤ \|D’\| < 0.7 | Moderate LD | Some historical recombination; association is partially retained |
| D’ | \|D’\| < 0.3 | Weak LD | Extensive historical recombination; little association |
| r² | r² = 1 | Perfect correlation | Only two of the four haplotypes are present; one allele perfectly predicts the other |
| r² | 0.8 ≤ r² < 1 | Strong correlation | High predictive value; useful for tagging SNPs |
4. Practical Applications of LD Calculations
Linkage disequilibrium analysis is used throughout genetic research:
4.1 Genome-Wide Association Studies (GWAS)
LD patterns help identify genomic regions associated with complex traits by:
- Reducing the number of tests needed through tagging SNPs
- Identifying haplotype blocks that segregate with disease
- Fine-mapping causal variants within associated regions
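To make the idea of tagging SNPs concrete, here is a simplified greedy selection sketch in Python. It is not the algorithm of any particular tool: the function name `greedy_tag_snps`, the SNP identifiers, the pairwise r² values, and the 0.8 threshold are all illustrative assumptions.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({snp_a, snp_b}) to their pairwise r^2;
          missing pairs are treated as r^2 = 0.
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs
        # (ties are broken by the order of the input list).
        best = max(snps, key=lambda s: sum(linked(s, u) for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if linked(best, u)}
    return tags


# Hypothetical pairwise r^2 values for four SNPs
pairwise = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.60,
    frozenset(("snp3", "snp4")): 0.10,
}
tags = greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], pairwise)
print(tags)
```

In this toy input, snp2 is in strong LD with both snp1 and snp3, so genotyping snp2 and snp4 captures all four SNPs at the chosen threshold.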
4.2 Evolutionary Biology
LD provides insights into:
- Population bottlenecks and expansions
- Selective sweeps and positive selection
- Gene flow between populations
- Recombination hotspots and coldspots
4.3 Agricultural Genetics
In plant and animal breeding, LD helps:
- Identify quantitative trait loci (QTLs)
- Implement genomic selection programs
- Understand domestication syndromes
- Preserve genetic diversity in conservation programs
5. Factors Affecting Linkage Disequilibrium
| Factor | Effect on LD | Biological Mechanism | Typical Timescale |
|---|---|---|---|
| Recombination | Reduces LD | Physical exchange between homologous chromosomes | Generational |
| Mutation | Can create or break LD | New alleles introduce novel haplotype combinations | Evolutionary |
| Genetic Drift | Increases LD | Random fluctuations in allele frequencies in small populations | Generational |
| Population Structure | Increases LD | Subpopulation allele frequency differences create spurious associations | Evolutionary |
| Selection | Context-dependent | Selective sweeps increase LD around the selected site; the effect of balancing selection depends on its age and the local recombination rate | Evolutionary |
| Gene Conversion | Reduces LD | Non-reciprocal transfer of genetic information | Generational |
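
The effect of recombination can be made quantitative. Under the standard simplifying assumptions of random mating, a large population, and no selection, D shrinks by a factor of (1 - c) each generation, where c is the recombination fraction between the loci, so that D after t generations equals D0 × (1 - c)^t. The Python sketch below illustrates this; the function names and the example values are assumptions chosen for illustration.

```python
import math


def d_after_generations(d0, c, t):
    """Expected D after t generations of random mating: D0 * (1 - c)**t."""
    if not 0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1 - c) ** t


def ld_half_life(c):
    """Generations needed for D to fall to half its starting value."""
    if not 0 < c <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    return math.log(0.5) / math.log(1 - c)


# Hypothetical pair of loci with c = 0.01 and a starting D of 0.15
d_100 = d_after_generations(0.15, 0.01, 100)
half_life = ld_half_life(0.01)
print(round(d_100, 4), round(half_life, 1))
```

For loci with c = 0.01, LD halves roughly every 69 generations, whereas unlinked loci (c = 0.5) lose half of their LD in a single generation.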
6. Advanced Topics in LD Analysis
6.1 Haplotype Block Structure
Many genomes, including the human genome, show haplotype blocks: regions of strong LD separated by recombination hotspots. Key characteristics:
- Typical block size in humans: 5-100 kb
- Hotspots occur approximately every 50-100 kb
- Block structure varies across populations and genomic regions
- Can be visualized using LD plots (heatmaps)
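The data behind an LD heatmap is a matrix of pairwise LD values. The sketch below computes a pairwise r² matrix in plain Python, assuming phased haplotypes coded as lists of 0/1 alleles; the function name `pairwise_r2` and the toy haplotypes are hypothetical, and plotting is left to whichever tool is preferred.

```python
def pairwise_r2(haplotypes):
    """Return a symmetric matrix of r^2 values between all pairs of sites.

    haplotypes: list of equal-length lists of 0/1 alleles (one per chromosome).
    Pairs involving a monomorphic site are left as NaN.
    """
    n = len(haplotypes)
    m = len(haplotypes[0])
    freqs = [sum(h[j] for h in haplotypes) / n for j in range(m)]
    matrix = [[float("nan")] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            # Frequency of chromosomes carrying allele 1 at both sites
            p_ij = sum(h[i] * h[j] for h in haplotypes) / n
            denom = freqs[i] * (1 - freqs[i]) * freqs[j] * (1 - freqs[j])
            if denom > 0:
                d = p_ij - freqs[i] * freqs[j]
                matrix[i][j] = matrix[j][i] = d * d / denom
    return matrix


# Toy data: sites 0 and 1 always match; site 2 is independent of both
haps = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
ld = pairwise_r2(haps)
for row in ld:
    print(row)
```

Blocks appear in such a matrix as square regions of high r² along the diagonal.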
6.2 LD Decay Analysis
LD decay measures how quickly LD diminishes with physical distance. Important for:
- Estimating historical effective population size
- Determining appropriate marker density for studies
- Comparing recombination rates across species
- Identifying regions under selection
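A decay curve is usually built by averaging r² over marker pairs grouped by distance. The Python sketch below shows that binning step, together with a commonly used approximation for the expected r² at recombination fraction c in a population of effective size Ne, E[r²] ≈ 1 / (1 + 4·Ne·c). That approximation ignores mutation and sampling error, and the function names, bin size, and input values are illustrative assumptions.

```python
from collections import defaultdict


def mean_r2_by_distance(pairs, bin_size_bp=10_000):
    """Average r^2 within distance bins.

    pairs: iterable of (distance_in_bp, r2) for marker pairs.
    Returns {bin_start_bp: mean r^2}, ordered by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, r2 in pairs:
        bin_start = (distance // bin_size_bp) * bin_size_bp
        sums[bin_start] += r2
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


def expected_r2(ne, c):
    """Approximate expected r^2: 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)


# Hypothetical (distance, r^2) observations
observed = [(1_000, 0.9), (5_000, 0.7), (12_000, 0.4),
            (18_000, 0.2), (25_000, 0.1)]
decay = mean_r2_by_distance(observed)
model = expected_r2(ne=10_000, c=0.0001)
print(decay, round(model, 3))
```

Comparing the binned means with the model curve over a range of Ne values is one simple way to see how decay rate relates to effective population size.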
6.3 Multi-locus LD Measures
Extensions beyond the basic two-locus, two-allele case include:
- Multi-allelic D’: For loci with more than two alleles
- Composite LD: Estimates LD from unphased genotype data without assuming Hardy-Weinberg equilibrium
- Extended haplotype homozygosity (EHH): Measures LD decay from a core haplotype
- Integrated haplotype score (iHS): Detects recent positive selection
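As an illustration of EHH, the sketch below computes the probability that two randomly chosen chromosomes carrying a given core allele are identical across every site from the core to a chosen marker. It assumes phased haplotypes coded as lists of 0/1 alleles and a single-site core; the function name `ehh` and the toy data are hypothetical. iHS, which integrates EHH over distance and standardizes it genome-wide, is not computed here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH from the core site out to end_index (inclusive).

    Considers only chromosomes carrying core_allele at core_index and returns
    the fraction of chromosome pairs identical over the whole interval.
    """
    lo, hi = sorted((core_index, end_index))
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes
                if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes with the core allele")
    counts = Counter(carriers)
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


# Toy haplotypes; the core is allele 1 at site 0
haps = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
profile = [ehh(haps, 0, 1, end) for end in range(4)]
print([round(x, 3) for x in profile])
```

EHH starts at 1 at the core and falls as distinct extended haplotypes accumulate with distance; an unusually slow fall for a common allele is the pattern that selection scans look for.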
7. Common Pitfalls in LD Analysis
Avoid these mistakes in your LD calculations:
- Small sample sizes: Can lead to unreliable LD estimates and false positives/negatives
- Population stratification: Undetected subpopulation structure can create spurious LD
- Ignoring phase information: Incorrect haplotype phasing distorts LD measures
- Multiple testing: Without correction, thousands of LD tests inflate false positive rates
- Assuming constant recombination: Recombination rates vary across the genome
- Neglecting missing data: Imputation or proper handling of missing genotypes is essential
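The phase pitfall arises because, for two biallelic loci, a double heterozygote can carry either AB/ab or Ab/aB. One common way to handle unphased genotypes is to estimate haplotype frequencies with an expectation-maximization (EM) procedure. The sketch below is a minimal version that assumes Hardy-Weinberg proportions, no missing data, and genotypes coded as counts of the A and B alleles (0, 1, or 2); the function name and the sample are hypothetical, and EM can settle on a local optimum for some inputs.

```python
def em_haplotype_freqs(genotypes, iterations=100, tol=1e-10):
    """Estimate AB, Ab, aB, ab frequencies from unphased two-locus genotypes.

    genotypes: list of (count_of_A, count_of_B) per individual, each 0, 1 or 2.
    """
    known = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    n_double_het = 0
    for g_a, g_b in genotypes:
        if g_a == 1 and g_b == 1:
            n_double_het += 1        # phase is ambiguous: AB/ab or Ab/aB
            continue
        # Phase is unambiguous when at least one locus is homozygous.
        a_alleles = ["A" if g_a >= 1 else "a", "A" if g_a == 2 else "a"]
        b_alleles = ["B" if g_b >= 1 else "b", "B" if g_b == 2 else "b"]
        for a, b in zip(a_alleles, b_alleles):
            known[a + b] += 1

    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in known}
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is AB/ab
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add the expected haplotype counts from double heterozygotes
        counts = dict(known)
        counts["AB"] += n_double_het * w
        counts["ab"] += n_double_het * w
        counts["Ab"] += n_double_het * (1 - w)
        counts["aB"] += n_double_het * (1 - w)
        new = {h: c / total for h, c in counts.items()}
        change = max(abs(new[h] - freqs[h]) for h in freqs)
        freqs = new
        if change < tol:
            break
    return freqs


# Hypothetical sample: 4 AABB, 4 aabb and 2 double heterozygotes
sample = [(2, 2)] * 4 + [(0, 0)] * 4 + [(1, 1)] * 2
est = em_haplotype_freqs(sample)
print({h: round(f, 4) for h, f in est.items()})
```

Because every unambiguous individual in this sample carries only AB or ab haplotypes, the procedure assigns the double heterozygotes almost entirely to the AB/ab phase. The estimated frequencies can then be passed to the D, D’, and r² formulas.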
8. Software Tools for LD Analysis
Several specialized tools facilitate LD calculation and visualization:
- PLINK: Command-line tool for whole genome association analysis
- Haploview: Java application for haplotype analysis and LD visualization
- LDlink: Web-based suite for exploring LD in human populations
- R packages: genetics, LDheatmap, and snpStats for statistical LD analysis
- TASSEL: Software for association mapping in plants and animals
9. Future Directions in LD Research
Emerging areas in linkage disequilibrium research include:
- Single-cell LD analysis: Examining LD patterns in individual cells
- Epigenetic LD: Studying co-inheritance of epigenetic marks
- 3D genome LD: Integrating chromosomal conformation with LD patterns
- Machine learning approaches: Predicting LD patterns from sequence data
- Ancient DNA LD: Analyzing LD in historical and archaeological samples