Validation: how we know the analysers work

How Haeckel benchmarks every analyser against held-out 1000 Genomes samples, what the per-sample accuracy looks like, and where the platform still falls short.

6 min read · updated Apr 19, 2026

Every analyser ships with a validation suite that runs against a held-out set of reference samples drawn from public population-genetics cohorts. The samples span the major ancestry groups, so accuracy is measured for both single-ancestry and admixed samples. The suite runs on every release, and per-sample numbers are tracked internally across releases.

Held-out samples

The held-out set covers unadmixed African, East Asian, and South Asian donors as well as known admixed profiles combining European, African, and Native American components. Samples are drawn from public cohorts so anyone auditing the method can reproduce the comparison without access to private data.

Ancestry inference accuracy

Across the unadmixed samples, the analyser recovers the expected primary ancestry at or near ceiling accuracy. Across the admixed samples, it recovers ancestry proportions within the published accuracy bounds of the underlying methods, including the expected spillover around low-frequency components. Per-sample numbers are kept internal so that the results cannot be gamed, but the validation suite itself runs on every release.
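As an illustration of the two checks described above, the sketch below scores primary-ancestry recovery on unadmixed samples and proportion error on admixed ones. It is a minimal sketch, not Haeckel's validation suite: the dict-of-proportions input format, the ancestry labels, the use of mean absolute error as the proportion metric, and the 0.05 bound are all assumptions for illustration, since this article does not state which metric or bounds the suite applies.

```python
from statistics import mean

# Hypothetical input format: each sample maps an ancestry label to an inferred proportion.
Proportions = dict[str, float]


def primary_ancestry(props: Proportions) -> str:
    """Label of the largest inferred component."""
    return max(props, key=props.get)


def primary_recovery_rate(samples: list[tuple[Proportions, str]]) -> float:
    """Fraction of unadmixed samples whose top component matches the expected label."""
    hits = [primary_ancestry(inferred) == expected for inferred, expected in samples]
    return sum(hits) / len(hits)


def proportion_mae(inferred: Proportions, expected: Proportions) -> float:
    """Mean absolute error over the union of components.

    A component that is inferred but absent from the truth (spillover) counts as error.
    """
    labels = set(inferred) | set(expected)
    return mean(abs(inferred.get(k, 0.0) - expected.get(k, 0.0)) for k in labels)


def within_bound(inferred: Proportions, expected: Proportions, bound: float = 0.05) -> bool:
    """Hypothetical pass/fail check; 0.05 is a placeholder, not a published bound."""
    return proportion_mae(inferred, expected) <= bound


if __name__ == "__main__":
    unadmixed = [({"AFR": 0.97, "EUR": 0.03}, "AFR"), ({"EAS": 0.99, "SAS": 0.01}, "EAS")]
    print(primary_recovery_rate(unadmixed))

    inferred = {"EUR": 0.48, "AFR": 0.40, "AMR": 0.10, "EAS": 0.02}  # 0.02 EAS spillover
    expected = {"EUR": 0.50, "AFR": 0.40, "AMR": 0.10}
    print(proportion_mae(inferred, expected), within_bound(inferred, expected))
```

With these toy inputs the recovery rate is 1.0, and the admixed sample's error is about 0.01, split evenly between the EUR under-call and the EAS spillover.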

Cross-method PRS comparison

For each of nine traits with mature GWAS (height, BMI, T2D, CAD, schizophrenia, MDD, breast cancer, prostate cancer, intelligence), we compute the polygenic score with all six PRS methods on the held-out samples and report the cross-method correlation. When the methods correlate tightly across samples (Pearson r > 0.85), they agree methodologically; when they diverge (r < 0.70), the trait is flagged as harder to score reliably and the user-facing report uses a wider confidence interval for it.

What we still cannot validate

The held-out 1000 Genomes samples have known ancestry but no linked clinical outcomes, so health-risk findings cannot be validated against ground-truth disease status. Pharmacogenomic predictions are cross-validated against the star-allele calls made by PharmCAT and Stargazer, on the genes those tools also cover. Longer-term clinical validation requires partnerships with biobanks (UK Biobank, All of Us, FinnGen) where genomes, clinical outcomes, and drug-response data are available together; this work is in progress.
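The pharmacogenomic cross-check can be pictured as a per-sample concordance between Haeckel's star-allele diplotypes and a reference caller's, restricted to genes both sides call. This sketch assumes each caller's output has already been parsed into a gene-to-diplotype mapping such as {"CYP2C19": "*1/*2"}; it does not parse actual PharmCAT or Stargazer output files, and the function names and example calls are hypothetical.

```python
def normalise(diplotype: str) -> tuple[str, ...]:
    """'*2/*1' and '*1/*2' are the same diplotype, so compare alleles order-independently."""
    return tuple(sorted(allele.strip() for allele in diplotype.split("/")))


def concordance(ours: dict[str, str], reference: dict[str, str]) -> tuple[float, list[str]]:
    """Concordance over genes called by both tools; also returns the discordant genes."""
    shared = sorted(set(ours) & set(reference))
    if not shared:
        raise ValueError("no overlapping genes to compare")
    discordant = [gene for gene in shared if normalise(ours[gene]) != normalise(reference[gene])]
    return 1 - len(discordant) / len(shared), discordant


if __name__ == "__main__":
    # Hypothetical calls for one sample.
    ours = {"CYP2C19": "*1/*2", "CYP2D6": "*1/*4", "SLCO1B1": "*1/*5"}
    reference = {"CYP2C19": "*2/*1", "CYP2D6": "*1/*1", "DPYD": "*1/*1"}
    print(concordance(ours, reference))
```

Here only CYP2C19 and CYP2D6 are called by both sides; CYP2C19 agrees once allele order is normalised, so the result is `(0.5, ['CYP2D6'])`. Genes that only one side calls (SLCO1B1, DPYD) are excluded rather than counted as errors, which matches the "where they overlap" scope.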

References

1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature.
Sandhu M et al. (2024). Validation of polygenic risk scores across diverse populations. Cell Genomics.
PharmCAT and Stargazer: open-source star-allele callers used for cross-validation.
