The S-star statistic for archaic introgression
A reference-free test that detects long, divergent haplotypes consistent with introgression from an archaic source population, even when no archaic genome is available for comparison.
The S-star statistic, introduced by Plagnol and Wall in 2006 and refined by Vernot and Akey in 2014, asks whether a stretch of modern human DNA contains more divergent variation than a model of pure within-species drift can plausibly produce. When it does, the most economical explanation is that the stretch was inherited from a population that diverged from modern humans long enough ago for additional mutations to accumulate. In practice, that source is almost always an archaic group such as Neanderthals or Denisovans.
Why a reference-free test matters
Direct comparison against the Vindija or Altai Neanderthal genomes is the most powerful approach when those references exist. But the introgressing population may not have a sequenced reference at all. Vernot and Akey demonstrated that S-star recovers archaic tracts in modern non-Africans purely from the modern data, with no archaic input, by exploiting the asymmetry in coalescent times: archaic-derived haplotypes coalesce hundreds of thousands of years deeper than purely modern haplotypes do.
How the score is computed
The analyser works in sliding windows along the genome. Within each window, it identifies SNPs that are private to the test individual relative to a panel of reference individuals from a population not believed to share recent introgression with the source (sub-Saharan Africans serve this role for Neanderthal detection). For each pair of private SNPs, the score adds a contribution proportional to the physical distance between them when both alleles fall on the same chromosome (in cis), and subtracts a small penalty otherwise.
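The scoring rule described above can be sketched in a few lines. This is a simplified illustration, not the analyser's actual code: the constants `DISTANCE_WEIGHT` and `TRANS_PENALTY`, the function names, and the input layout (one `(position, haplotype)` tuple per private SNP, with phased data) are all assumptions made for the example. The sketch also assumes the score is maximised over chains of private SNPs within the window, which is one common way to turn pairwise contributions into a single window score.

```python
from typing import List, Tuple

# Hypothetical constants: the article does not state the analyser's real values.
DISTANCE_WEIGHT = 1.0    # score per base pair between two in-cis private SNPs
TRANS_PENALTY = 1000.0   # cost of linking two private SNPs on different haplotypes

Snp = Tuple[int, int]    # (position in bp, haplotype index 0 or 1)


def pair_score(a: Snp, b: Snp) -> float:
    """Score one pair of private SNPs: reward distance in cis, penalise trans."""
    pos_a, hap_a = a
    pos_b, hap_b = b
    if hap_a == hap_b:
        return DISTANCE_WEIGHT * abs(pos_b - pos_a)
    return -TRANS_PENALTY


def s_star(private_snps: List[Snp]) -> float:
    """Best-scoring chain of private SNPs in one window (0.0 if none helps)."""
    snps = sorted(private_snps)          # order by position
    best = [0.0] * len(snps)             # best[j]: top chain score ending at SNP j
    for j in range(len(snps)):
        for i in range(j):
            best[j] = max(best[j], best[i] + pair_score(snps[i], snps[j]))
    return max(best, default=0.0)


window = [(100, 0), (5_100, 0), (20_100, 0)]
print(s_star(window))
```

Because each in-cis link contributes its own length, a chain's score in this sketch equals the span it covers. Adding a fixed bonus per link would additionally reward the number of private SNPs on the haplotype; whether the analyser does so is not stated here.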
Long runs of co-located private alleles produce high S-star values. Divergent mutations accumulate roughly in proportion to the time two lineages have been separated, while recombination shortens a haplotype a little more with every generation it spends in a population. A window that combines many private alleles with a long physical span is therefore strong evidence of a haplotype that diverged long ago yet has had little time to recombine with the modern genetic background, which is the pattern expected if it entered the population recently in evolutionary time.
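A back-of-envelope calculation shows why tract length carries information about timing. Under a simple model in which recombination is uniform and the tract has been breaking down for t generations, the expected tract length is roughly 1 / (r × t) base pairs, where r is the recombination rate per base pair per generation. The function name and the default rate of 1e-8 below are illustrative assumptions, not parameters taken from the analyser.

```python
def expected_tract_length_bp(generations: float,
                             recomb_rate_per_bp: float = 1e-8) -> float:
    """Approximate mean length (bp) of a haplotype after `generations` of recombination.

    Assumes a uniform recombination rate; real genomes have hotspots and
    coldspots, so this is only an order-of-magnitude guide.
    """
    if generations <= 0 or recomb_rate_per_bp <= 0:
        raise ValueError("generations and recomb_rate_per_bp must be positive")
    return 1.0 / (recomb_rate_per_bp * generations)


# Illustrative inputs: a haplotype that entered about 2,000 generations ago
# versus variation that has been segregating for about 20,000 generations.
for t in (2_000, 20_000):
    print(t, round(expected_tract_length_bp(t)))
```

With these illustrative inputs the more recent haplotype is expected to span on the order of 50 kb and the older one about 5 kb, which is why long divergent stretches point to recent entry.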
Ghost population detection
When S-star flags a tract that does not match any sequenced archaic reference, the result is a candidate "ghost" introgression: a contribution from a population that is now extinct and for which no genome has been sequenced. Published examples include West African populations, which carry tracts attributed to a ghost archaic source that diverged before the split between modern humans and the ancestors of Neanderthals and Denisovans (Durvasula and Sankararaman 2020), and Andamanese populations, which have been reported to carry candidate ghost tracts that may relate to a Southeast Asian archaic group still unidentified in the fossil record.
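One simple way to express "does not match any sequenced archaic reference" is a match rate: the fraction of a tract's private alleles that also appear in a given archaic genome. The sketch below is an assumption-laden illustration, not the analyser's method. The `MATCH_THRESHOLD` cutoff, the function names, the label strings, and the representation of alleles as `(position, base)` tuples are all hypothetical.

```python
from typing import Dict, Set, Tuple

Allele = Tuple[int, str]   # (position in bp, derived base)

MATCH_THRESHOLD = 0.5      # hypothetical cutoff, chosen only for illustration


def match_rate(tract: Set[Allele], reference: Set[Allele]) -> float:
    """Fraction of the tract's private alleles also seen in the reference."""
    if not tract:
        return 0.0
    return len(tract & reference) / len(tract)


def label_tract(tract: Set[Allele], references: Dict[str, Set[Allele]]) -> str:
    """Name the best-matching reference, or flag the tract as a candidate ghost."""
    rates = {name: match_rate(tract, alleles) for name, alleles in references.items()}
    best = max(rates, key=rates.get, default=None)
    if best is None or rates[best] < MATCH_THRESHOLD:
        return "candidate-ghost"
    return best


tract = {(100, "T"), (200, "G"), (300, "A"), (400, "C")}
references = {
    "Neanderthal": {(100, "T"), (200, "G"), (300, "A")},
    "Denisovan": {(100, "T")},
}
print(label_tract(tract, references))
```

A real comparison would also need to handle positions where the archaic genome has no confident call, since missing data lowers the match rate without implying a different source.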
What S-star alone cannot tell you
S-star is sensitive to introgression but does not, by itself, identify the source. A high S-star window in a European individual is overwhelmingly likely to be Neanderthal-derived because the regional fossil and ancient-DNA record points there, but that inference rests on external evidence, not on the statistic. Haeckel therefore pairs S-star with D-statistics whenever an archaic reference is available, and with the four-state Viterbi HMM to assign each tract to Neanderthal, Denisovan, unknown-archaic, or modern.
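For readers unfamiliar with D-statistics, the standard ABBA-BABA form can be sketched as follows. It compares two modern genomes (P1, P2) against an archaic genome (P3) and an outgroup, counting sites where the derived allele is shared by P2 and P3 (ABBA) or by P1 and P3 (BABA). This is the generic textbook calculation on one sequence per population; the function name and input layout are assumptions, and nothing here describes how Haeckel implements it.

```python
from typing import Iterable, Optional, Tuple

# Each site is (P1, P2, P3, outgroup), coded 0 = ancestral, 1 = derived.
Site = Tuple[int, int, int, int]


def d_statistic(sites: Iterable[Site]) -> Optional[float]:
    """Return D = (ABBA - BABA) / (ABBA + BABA), or None if no informative sites."""
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0 or p3 != 1:
            continue                  # only sites where P3 is derived are informative
        if p1 == 0 and p2 == 1:
            abba += 1                 # derived allele shared by P2 and P3
        elif p1 == 1 and p2 == 0:
            baba += 1                 # derived allele shared by P1 and P3
    total = abba + baba
    if total == 0:
        return None
    return (abba - baba) / total


sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)] + [(1, 1, 1, 0)]
print(d_statistic(sites))
```

A value near zero is what incomplete lineage sorting alone predicts, while a clearly positive value suggests excess allele sharing between P2 and the archaic genome. Judging significance requires a standard error, typically from a block jackknife, which this sketch omits.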
- Plagnol V, Wall JD (2006). Possible ancestral structure in human populations. PLOS Genetics.
- Vernot B, Akey JM (2014). Resurrecting surviving Neandertal lineages from modern human genomes. Science.
- Vernot B et al. (2016). Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science.
- Durvasula A, Sankararaman S (2020). Recovering signals of ghost archaic introgression in African populations. Science Advances.