KING-robust kinship inference
How the KING-robust estimator infers pairwise relatedness from heterozygote sharing, why it does not require an explicit allele-frequency reference, and how Haeckel translates kinship coefficients into relationship classifications.
The KING-robust kinship estimator, published by Manichaikul and colleagues in 2010, infers the kinship coefficient between any two genotyped individuals using only their genotypes, with no need for a reference allele-frequency panel and with explicit robustness to population structure. That last property is what makes it the right tool for a multi-ancestry platform like Haeckel.
The estimator
For each pair of individuals i and j, count, across the SNPs genotyped in both: the number at which both are heterozygous (N_het_ij); the number at which they carry opposite homozygous genotypes, that is, one is AA and the other BB (N_homDiff_ij = N_AA_BB_ij + N_BB_AA_ij); and the number of heterozygous sites in each individual separately (N_het_i, N_het_j). The KING-robust estimate of the kinship coefficient is then phi_ij = (N_het_ij - 2 × N_homDiff_ij) / (N_het_i + N_het_j).
The intuition: relatives are heterozygous at the same sites more often than expected by chance, and they are opposite homozygotes (one AA, the other BB) less often. The numerator measures the difference between these two counts, and the denominator normalises it by the pair's total heterozygote count, so a pair with identical genotypes scores exactly 0.5. The estimate is robust to population structure because it uses only the two individuals' own genotypes, not allele frequencies estimated from a population sample.
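The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not Haeckel's implementation: the function name `king_robust`, the 0/1/2 alternate-allele-count coding, and the use of `None` for missing calls are all assumptions made for the example.

```python
def king_robust(gi, gj):
    """KING-robust kinship estimate for one pair of individuals.

    gi, gj: equal-length sequences of genotypes coded as alternate-allele
    counts (0 = AA, 1 = AB, 2 = BB), with None marking a missing call.
    The coding and the function name are assumptions for this sketch.
    """
    if len(gi) != len(gj):
        raise ValueError("genotype vectors must have the same length")

    n_het_ij = n_hom_diff = n_het_i = n_het_j = 0
    for a, b in zip(gi, gj):
        if a is None or b is None:
            continue  # use only SNPs called in both individuals
        n_het_i += a == 1
        n_het_j += b == 1
        n_het_ij += a == 1 and b == 1
        n_hom_diff += abs(a - b) == 2  # one is AA (0), the other BB (2)

    denom = n_het_i + n_het_j
    if denom == 0:
        raise ValueError("no heterozygous calls; the estimate is undefined")
    return (n_het_ij - 2 * n_hom_diff) / denom


# A sample compared with itself behaves like a duplicate or monozygotic twin.
g = [0, 1, 2, 1, 0, 1]
print(king_robust(g, g))
```

The estimate can be negative when opposite homozygotes outnumber shared heterozygotes, which is typical of unrelated pairs drawn from different populations.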
Translating phi into a relationship
- phi ≈ 0.5: monozygotic twin or duplicated sample.
- phi ≈ 0.25: parent-offspring or full sibling.
- phi ≈ 0.125: half-sibling, avuncular (aunt/uncle/niece/nephew), or grandparent-grandchild.
- phi ≈ 0.0625: first cousin.
- phi ≈ 0.03125: first cousin once removed or half first cousin.
- phi < 0.022: unrelated, or too distantly related to classify from array data. Second cousins (expected phi ≈ 0.0156) and more distant relatives fall in this range.
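Because estimates scatter around these expected values, a classifier needs ranges rather than points. The sketch below places each boundary at the geometric midpoint between adjacent expected values (0.354, 0.177, 0.0884, 0.0442), the cutoffs conventionally used with KING. The last bound, 0.0221, extends the same pattern and matches the 0.022 threshold above. Whether Haeckel applies exactly these values is an assumption, and the `classify` function and its labels are hypothetical.

```python
# Lower bounds on phi for each category. Each bound is the geometric midpoint
# between two adjacent expected kinship values: 2**-1.5, 2**-2.5, and so on.
CUTOFFS = [
    (2 ** -1.5, "monozygotic twin or duplicate"),  # ~0.354
    (2 ** -2.5, "first degree"),                   # ~0.177
    (2 ** -3.5, "second degree"),                  # ~0.0884
    (2 ** -4.5, "third degree"),                   # ~0.0442
    (2 ** -5.5, "fourth degree"),                  # ~0.0221
]


def classify(phi):
    """Map a kinship estimate to a degree-of-relatedness label."""
    for lower_bound, label in CUTOFFS:
        if phi > lower_bound:
            return label
    return "unrelated or too distant to resolve"


for phi in (0.5, 0.25, 0.125, 0.0625, 0.03125, 0.01):
    print(phi, "->", classify(phi))
```

A degree label groups several pedigree relationships: "second degree", for instance, covers half-siblings, avuncular pairs, and grandparent-grandchild pairs, which phi alone cannot tell apart.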
Parent-offspring and full-sibling pairs both have an expected phi of 0.25. To tell them apart, Haeckel computes IBD0, the proportion of the genome over which the pair shares no alleles identical by descent. A parent and child share exactly one allele by descent at every autosomal locus, so IBD0 ≈ 0. Full siblings share no alleles by descent over about 25% of the genome on average, so IBD0 ≈ 0.25. The platform reports both phi and IBD0 so that these two first-degree relationships can be separated.
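A minimal sketch of that two-step decision follows. The function name `resolve_first_degree` and the IBD0 cutoff of 0.1 are hypothetical, chosen only because 0.1 lies between the two expected values (0 and 0.25). The first-degree phi range reuses the conventional KING bounds of roughly 0.177 to 0.354, which Haeckel may or may not apply.

```python
def resolve_first_degree(phi, ibd0, ibd0_cutoff=0.1):
    """Split a first-degree pair into parent-offspring or full sibling.

    phi:  kinship estimate for the pair.
    ibd0: estimated proportion of the genome shared with zero alleles
          identical by descent.
    ibd0_cutoff: hypothetical threshold between the expected values
          for parent-offspring (0) and full siblings (0.25).
    """
    # Conventional first-degree range for phi, roughly 0.177 to 0.354.
    if not (2 ** -2.5 < phi <= 2 ** -1.5):
        return "not first degree"
    return "parent-offspring" if ibd0 < ibd0_cutoff else "full sibling"


print(resolve_first_degree(0.25, 0.00))
print(resolve_first_degree(0.25, 0.25))
```

In real data a parent-offspring IBD0 is rarely exactly zero, because genotyping errors produce a few apparent mismatches, so any practical cutoff has to leave some margin above zero.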
Practical limits
KING-robust estimates become unreliable beyond roughly third-degree relationships (first cousins and their equivalents) because the expected kinship coefficient halves with each additional degree and soon drops below the noise floor of typical array data. Detecting a genuine fifth-cousin relationship requires either many millions of variants (low-pass WGS) or a large reference cohort with phased haplotypes for IBD-segment-based methods like hap-IBD or iLASH.
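A short calculation shows how quickly the signal fades. For full n-th cousins with no inbreeding, the expected kinship coefficient is (1/4)^(n+1), because each further cousin degree adds one generation on both sides of the pedigree. The helper name `expected_cousin_kinship` is hypothetical.

```python
def expected_cousin_kinship(n):
    """Expected kinship coefficient for full n-th cousins, assuming no inbreeding."""
    if n < 1:
        raise ValueError("cousin degree must be at least 1")
    return 0.25 ** (n + 1)


for n in range(1, 6):
    print(f"cousin degree {n}: expected phi = {expected_cousin_kinship(n):.6f}")
```

First cousins sit at 1/16 (0.0625), third cousins at 1/256 (about 0.0039), and fifth cousins at 1/4096 (about 0.00024), far below the 0.022 cutoff given above.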
References
- Manichaikul A et al. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics.
- Conomos MP et al. (2016). Model-free estimation of recent genetic relatedness. American Journal of Human Genetics.