Methods · Validation

Every inference has a citation.

Every inference traces to a named, peer-reviewed method and the reference panel it was validated against.

Validated against 1000 Genomes + HGDP · Pipeline v7

The ledger

What runs, and what backs it

Every analyzer cites a named, peer-reviewed method and is validated against reference cohorts before it ships.

MLE-EM + spatial thinning + Bootstrap Wald + AMR deconvolution [1][2]
Ancestry inference (significance-gated)

PRS-CS · LDpred2 · lassosum · SBayesR · C+T [3][4][5][6][12]
Polygenic risk scoring (six methods + ensemble)

S* statistic · ABBA-BABA D-stats · 4-state Viterbi HMM [9][10]
Archaic introgression (Neanderthal · Denisovan)

HIrisPlex logistic regression [7]
Phenotype prediction (eye · hair · skin)

Recursive phylogenetic traversal · Bayesian confidence
Y-DNA + mtDNA haplogroups (165 nodes)

CPIC star-allele calling [11]
Pharmacogenomics (12 genes · 10 FDA Black Box)

KING-robust kinship · ROH detection [8]
Relationship inference · consanguinity

pgvector 1536-d embeddings
Networks · kinship · AI Candidate Finder

Evo 2 DNA foundation model (in integration) [13]
Variants of uncertain significance, from first principles
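
To make one ledger entry concrete: the ABBA-BABA D-statistic [9] compares the counts of two discordant site patterns across four genomes (P1, P2, P3, outgroup). The sketch below is illustrative only and is not Haeckel's implementation. It assumes biallelic sites already polarized against the outgroup and encoded as 0 (ancestral) or 1 (derived); the function name and input layout are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Each site is (P1, P2, P3, outgroup): 0 = ancestral allele, 1 = derived allele.
# This encoding is an assumption made for the example.
Site = Tuple[int, int, int, int]


def d_statistic(sites: Iterable[Site]) -> Optional[float]:
    """Return D = (ABBA - BABA) / (ABBA + BABA), or None without informative sites."""
    abba = 0
    baba = 0
    for p1, p2, p3, out in sites:
        if out != 0 or p3 != 1:
            continue  # informative sites need an ancestral outgroup and a derived P3
        if p1 == 0 and p2 == 1:
            abba += 1  # P2 shares the derived allele with P3
        elif p1 == 1 and p2 == 0:
            baba += 1  # P1 shares the derived allele with P3
    total = abba + baba
    if total == 0:
        return None  # refuse to report a value when nothing was informative
    return (abba - baba) / total


example_sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)] + [(1, 1, 1, 0)]
example_d = d_statistic(example_sites)  # 3 ABBA sites, 1 BABA site
```

A positive D indicates that P2 shares more derived alleles with P3 than P1 does. In practice a raw D value is paired with a significance estimate, commonly from a block jackknife, which this sketch leaves out.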

Rigor

Validation & confidence

A method is only as good as the honesty of its uncertainty. Haeckel gates on confidence and refuses to guess past its coverage.
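
A confidence gate of this kind can be sketched in a few lines of Python. The example assumes a Wald-style test in which the standard error comes from bootstrap replicates of an ancestry proportion, and a one-sided threshold of z = 1.96. The function name, the threshold, and the handling of edge cases are assumptions made for illustration, not Haeckel's actual parameters.

```python
import statistics
from typing import Sequence

UNASSIGNED = "Unassigned"


def wald_gate(
    estimate: float,
    bootstrap_estimates: Sequence[float],
    label: str,
    z_threshold: float = 1.96,  # assumed threshold, not taken from the source
) -> str:
    """Return `label` if the estimate is significantly above zero, else "Unassigned"."""
    if len(bootstrap_estimates) < 2:
        return UNASSIGNED  # a standard error cannot be estimated
    # Bootstrap standard error: sample standard deviation of the replicates.
    se = statistics.stdev(bootstrap_estimates)
    if se == 0.0:
        return label if estimate > 0.0 else UNASSIGNED
    z = estimate / se  # Wald statistic against a null of zero ancestry
    return label if z >= z_threshold else UNASSIGNED


strong = wald_gate(0.30, [0.28, 0.31, 0.29, 0.32, 0.30], "Component A")
weak = wald_gate(0.02, [0.00, 0.05, 0.01, 0.04, 0.00], "Component B")
```

Here `strong` keeps its label because the estimate is many standard errors from zero, while `weak` falls below the threshold and is reported as Unassigned instead of being guessed.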

Ancestry
Significance-gated
Bootstrap Wald; signal below the threshold is returned as Unassigned

AMR deconvolution
Admixed reference orthogonalized to its Native American vertex
Removes spurious cross-population signal in unadmixed individuals

Phenotype
Coverage-gated (HIrisPlex)
Below the coverage threshold it returns no prediction

Haplogroups
validateTree integrity gate at module load
An unauthored child node fails the build at startup

Pipeline
Versioned (v7)
Existing genomes are flagged for re-analysis on every version bump
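
The haplogroup integrity gate can be illustrated with a small Python sketch. It reads "unauthored child node" as a child that a parent references but that has no entry of its own, which is an assumption about the intended meaning. The tree layout, the toy nodes, and the `validate_tree` name are hypothetical stand-ins for the real `validateTree`.

```python
from typing import Dict, List

# Hypothetical layout: node id -> list of child ids (a toy subset, not the real tree).
HAPLOGROUP_TREE: Dict[str, List[str]] = {
    "R": ["R1"],
    "R1": ["R1a", "R1b"],
    "R1a": [],
    "R1b": [],
}


def validate_tree(tree: Dict[str, List[str]]) -> None:
    """Raise ValueError if any parent references a child with no entry of its own."""
    missing = sorted(
        {
            child
            for children in tree.values()
            for child in children
            if child not in tree
        }
    )
    if missing:
        raise ValueError(f"unauthored child nodes: {', '.join(missing)}")


# Running the check at import time means a broken tree stops the program at startup
# instead of surfacing later as a wrong haplogroup call.
validate_tree(HAPLOGROUP_TREE)
```

Failing fast at load time trades a harder startup for the guarantee that traversal never reaches a node that was never defined.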

Citations

References

The peer-reviewed foundation. The research vision behind Haeckel lives at herugenomics.com/research.

  [1] The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
  [2] Bergström A, et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020).
  [3] Ge T, Chen C-Y, Ni Y, et al. Polygenic prediction via Bayesian regression and continuous shrinkage priors (PRS-CS). Nat Commun 10, 1776 (2019).
  [4] Privé F, Arbel J, Vilhjálmsson BJ. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).
  [5] Mak TSH, et al. Polygenic scores via penalized regression on summary statistics (lassosum). Genet Epidemiol 41, 469–480 (2017).
  [6] Lloyd-Jones LR, et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics (SBayesR). Nat Commun 10, 5086 (2019).
  [7] Walsh S, et al. The HIrisPlex system for simultaneous prediction of hair and eye colour. Forensic Sci Int Genet 7, 98–115 (2013).
  [8] Manichaikul A, et al. Robust relationship inference in genome-wide association studies (KING). Bioinformatics 26, 2867–2873 (2010).
  [9] Green RE, et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
  [10] Vernot B, Akey JM. Resurrecting surviving Neandertal lineages from modern human genomes (S*). Science 343, 1017–1021 (2014).
  [11] Clinical Pharmacogenetics Implementation Consortium (CPIC) guidelines. cpicpgx.org.
  [12] Lambert SA, et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet 53, 420–425 (2021).
  [13] Brixi G, et al. Genome modeling and design across all domains of life with Evo 2. Arc Institute (2025).