Methods

Tract assignment via four-state Viterbi HMM

How a hidden Markov model with Modern, Neanderthal, Denisovan, and Unknown-Archaic states segments your genome into ancestry tracts, and how the coalescent dating step assigns each tract a TMRCA.

8 min read · updated Apr 19, 2026

The S-star statistic flags candidate introgressed windows, and D-statistics confirm that introgression occurred. Neither tells you, position by position, which segments of your genome came from which archaic source. That is the job of the tract HMM, a hidden Markov model that reads your variants left to right along each chromosome and emits a most-likely sequence of ancestry states.

Four hidden states

Each position along the chromosome is assigned to one of four hidden states.

  • Modern: standard Homo sapiens ancestry, no archaic contribution.
  • Neanderthal: a tract introgressed from Neanderthal.
  • Denisovan: a tract introgressed from Denisovan.
  • Unknown-Archaic: a tract diverged enough to look archaic but matching no sequenced reference, suggestive of a ghost source.

Emission model

For each variant, the emission probability under each state is computed from the rarity of the alleles in modern populations versus their frequencies in archaic references. A derived allele that is rare in moderns but homozygous in Vindija raises the emission probability for the Neanderthal state. A derived allele common in moderns lowers the emission probability for any archaic state. The emissions are not hardcoded but computed per variant from real allele-frequency data, which is why the HMM degrades gracefully when a SNP is poorly typed.
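A minimal sketch of how per-variant emissions of this kind might be computed. It assumes each source is summarised by a single derived-allele frequency. The function name, the flat `unknown_freq` placeholder for the Unknown-Archaic state, and the `error` smoothing term are all illustrative assumptions, not the pipeline's actual model.

```python
import math

STATES = ("Modern", "Neanderthal", "Denisovan", "Unknown-Archaic")


def emission_log_probs(carries_derived, modern_freq, neanderthal_freq,
                       denisovan_freq, unknown_freq=0.5, error=0.01):
    """Return {state: log P(observation | state)} for one variant.

    carries_derived: True/False, or None when the SNP could not be typed.
    *_freq: derived-allele frequency (0..1) in each source.
    unknown_freq and error are placeholders, not fitted values.
    """
    if carries_derived is None:
        # An untyped SNP is uninformative: every state gets the same
        # likelihood, so only the transition prior acts at this position.
        return {state: 0.0 for state in STATES}

    freqs = {
        "Modern": modern_freq,
        "Neanderthal": neanderthal_freq,
        "Denisovan": denisovan_freq,
        "Unknown-Archaic": unknown_freq,
    }
    log_probs = {}
    for state in STATES:
        # Pull frequencies away from 0 and 1 so that one genotyping
        # error cannot rule a state out completely.
        p_derived = (1 - error) * freqs[state] + error * (1 - freqs[state])
        p_obs = p_derived if carries_derived else 1 - p_derived
        log_probs[state] = math.log(p_obs)
    return log_probs


# Derived allele rare in moderns, fixed in the Neanderthal reference,
# absent from the Denisovan reference.
example = emission_log_probs(True, modern_freq=0.01,
                             neanderthal_freq=1.0, denisovan_freq=0.0)
best_state = max(example, key=example.get)
```

In this toy case the Neanderthal state receives the highest emission probability. Passing `carries_derived=None` shows one simple way a poorly typed SNP can be made to carry no weight.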

Transition probabilities

The transition matrix encodes the prior probability of switching ancestry along the chromosome. The off-diagonal entries are calibrated from the expected tract length under a coalescent model with the relevant introgression date and recombination rate. For Neanderthals, with introgression around 50,000 years ago and a typical generation time of 25 years (about 2,000 generations of recombination), the expected mean tract length in modern non-Africans is around 50-100 kilobases. The switching probabilities are set so that the tract lengths implied by the model match that distribution.
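The arithmetic behind that tract length, and one way to turn it into switching probabilities, can be sketched as follows. The recombination rate of 1e-8 per base pair per generation (roughly 1 cM per megabase), the archaic genome shares, the shared mean length for all three archaic states, and the rule that archaic tracts always return to Modern are assumptions made for illustration. The function names are hypothetical.

```python
import math

STATES = ("Modern", "Neanderthal", "Denisovan", "Unknown-Archaic")
ARCHAIC = STATES[1:]


def mean_tract_length_bp(years_since_introgression, generation_years=25,
                         recomb_per_bp=1e-8):
    """Expected tract length if recombination breaks a tract at a
    constant rate in every generation since introgression."""
    generations = years_since_introgression / generation_years
    return 1.0 / (recomb_per_bp * generations)


def transition_matrix(distance_bp, mean_len_bp, archaic_share):
    """Transition probabilities between two variants distance_bp apart.

    mean_len_bp:   {archaic_state: mean tract length in bp}
    archaic_share: {archaic_state: assumed genome-wide fraction}
    """
    modern_share = 1.0 - sum(archaic_share.values())
    matrix = {}

    # Entry rates are chosen so the chain's long-run share of each
    # archaic state equals archaic_share[state].
    modern_row = {}
    for state in ARCHAIC:
        rate = archaic_share[state] / (modern_share * mean_len_bp[state])
        modern_row[state] = 1.0 - math.exp(-rate * distance_bp)
    modern_row["Modern"] = 1.0 - sum(modern_row.values())
    matrix["Modern"] = modern_row

    for state in ARCHAIC:
        leave = 1.0 - math.exp(-distance_bp / mean_len_bp[state])
        row = {other: 0.0 for other in STATES}
        row[state] = 1.0 - leave
        row["Modern"] = leave  # assumption: archaic tracts end in Modern
        matrix[state] = row
    return matrix


# 50,000 years / 25 years = 2,000 generations -> about 50 kb.
neanderthal_len = mean_tract_length_bp(50_000)
lengths = {state: neanderthal_len for state in ARCHAIC}      # placeholder
shares = {"Neanderthal": 0.02, "Denisovan": 0.001,
          "Unknown-Archaic": 0.001}                          # placeholder
matrix = transition_matrix(1_000, lengths, shares)
```

With these inputs the mean length comes out at about 50 kb, the lower end of the quoted range. A longer generation time or a lower local recombination rate moves it toward the upper end.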

Viterbi decoding

With emission and transition matrices in hand, the Viterbi algorithm finds the single most likely state sequence over the chromosome. The output is a partition of the chromosome into contiguous tracts, each labelled with one of the four states. Tracts shorter than a minimum length are merged into the surrounding state to control false positives.
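The decoding and merging steps can be sketched as below. This is a textbook log-space Viterbi, not the production implementation. It assumes one fixed transition matrix for every step, whereas a real decoder would typically scale transitions by the distance between variants. The toy probabilities, the positions, and the rule of merging a short tract into its left-hand neighbour are illustrative assumptions.

```python
import math


def viterbi(log_emissions, log_trans, log_start):
    """Most likely state path.

    log_emissions: one {state: log P(obs | state)} dict per variant.
    log_trans:     {from_state: {to_state: log P(to | from)}}.
    log_start:     {state: log P(state at the first variant)}.
    """
    states = list(log_start)
    score = {s: log_start[s] + log_emissions[0][s] for s in states}
    backpointers = []
    for obs in log_emissions[1:]:
        new_score, pointers = {}, {}
        for t in states:
            best = max(states, key=lambda s: score[s] + log_trans[s][t])
            pointers[t] = best
            new_score[t] = score[best] + log_trans[best][t] + obs[t]
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def to_tracts(positions, path):
    """Collapse a per-variant path into (start, end, state) tracts."""
    tracts = []
    for pos, state in zip(positions, path):
        if tracts and tracts[-1][2] == state:
            tracts[-1] = (tracts[-1][0], pos, state)
        else:
            tracts.append((pos, pos, state))
    return tracts


def merge_short(tracts, min_len_bp):
    """Absorb tracts shorter than min_len_bp into the tract on their left."""
    merged = []
    for start, end, state in tracts:
        if merged and (end - start < min_len_bp or merged[-1][2] == state):
            merged[-1] = (merged[-1][0], end, merged[-1][2])
        else:
            merged.append((start, end, state))
    return merged


def log(p):
    return math.log(p) if p > 0 else float("-inf")


# Toy model: all numbers below are made up for the example.
STATES = ("Modern", "Neanderthal", "Denisovan", "Unknown-Archaic")
log_start = {s: log(0.97 if s == "Modern" else 0.01) for s in STATES}
log_trans = {"Modern": {s: log(0.9 if s == "Modern" else 0.1 / 3)
                        for s in STATES}}
for a in STATES[1:]:
    log_trans[a] = {s: log(0.9 if s == a else 0.1 if s == "Modern" else 0.0)
                    for s in STATES}

modern_like = {s: log(0.9 if s == "Modern" else 0.01) for s in STATES}
neanderthal_like = {"Modern": log(0.01), "Neanderthal": log(0.9),
                    "Denisovan": log(0.01), "Unknown-Archaic": log(0.1)}
log_emissions = [modern_like] * 2 + [neanderthal_like] * 3 + [modern_like] * 2
positions = [10_000 * (i + 1) for i in range(7)]

path = viterbi(log_emissions, log_trans, log_start)
tracts = to_tracts(positions, path)
```

On this toy input the decoded path is Modern for the first two variants, Neanderthal for the middle three, and Modern for the last two, which `to_tracts` collapses into three tracts. When the neighbours of a short tract carry different labels, "the surrounding state" is ambiguous. Merging leftward is only one possible convention.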

Coalescent dating

Once a tract is identified, its time to most recent common ancestor with the archaic reference can be estimated from the number of polymorphic sites within the tract, the published human per-base mutation rate, and a standard generation time. The output is a TMRCA with a confidence interval reflecting the per-tract Poisson variance plus the uncertainty in the mutation-rate constant itself.
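A minimal sketch of this dating step, assuming a simple pairwise molecular clock in which differences accumulate on both lineages. The function name, the mutation rate of 1.25e-8 per base pair per generation, its 10% relative uncertainty, the 25-year generation time, and the log-scale normal approximation for the interval are assumptions for illustration. The sketch also ignores the fact that an ancient reference sample stopped accumulating mutations when the individual died.

```python
import math


def tmrca_years(n_differences, tract_len_bp, mu=1.25e-8, mu_rel_sd=0.10,
                generation_years=25, z=1.96):
    """Return (lower, point, upper) TMRCA estimates in years.

    n_differences: sites in the tract that differ between the two
                   sequences being compared.
    mu:            assumed mutation rate per bp per generation.
    mu_rel_sd:     assumed relative standard deviation of mu.
    """
    if n_differences <= 0 or tract_len_bp <= 0:
        raise ValueError("need a positive count and a positive length")

    # Differences accumulate on both lineages, hence the factor of 2.
    generations = n_differences / (2 * mu * tract_len_bp)
    point = generations * generation_years

    # Relative variance: Poisson counting noise (1 / count) plus the
    # uncertainty in the mutation rate, combined by the delta method.
    rel_sd = math.sqrt(1.0 / n_differences + mu_rel_sd ** 2)
    spread = math.exp(z * rel_sd)
    return point / spread, point, point * spread


# A 75 kb tract with 60 differences.
lower, point, upper = tmrca_years(60, 75_000)
```

With these inputs the point estimate is about 800,000 years, with an interval of roughly 580,000 to 1.1 million years. Shorter tracts contain fewer sites, so the Poisson term dominates and the interval widens.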

Most Neanderthal tracts in modern Europeans coalesce around 600-800 thousand years ago, consistent with the estimated split between the Neanderthal and modern human lineages. A tract with a substantially older TMRCA suggests either a deeper archaic source (the Unknown-Archaic state) or a long, uninterrupted modern haplotype that happened to accumulate enough mutations to look archaic, a possibility the model accounts for in the confidence interval.

References
  • Sankararaman S et al. (2014). The genomic landscape of Neandertal ancestry in present-day humans. Nature.
  • Skov L et al. (2018). Detecting archaic introgression using an unadmixed outgroup. PLOS Genetics.
  • Hubisz MJ et al. (2020). Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLOS Genetics.