Methods

1000 Genomes plus HGDP+TGP: the reference panel

How the platform combines the 1000 Genomes Phase 3 release with the gnomAD v3.1.2 HGDP+TGP subset to build a 62-population reference for ancestry inference, PCA centroids, and PRS calibration.

6 min read · updated Apr 19, 2026

Every population-genetic inference Haeckel makes is anchored in a reference panel. That panel is the union of two datasets: the 1000 Genomes Project Phase 3 release (2,504 samples across 26 populations) and the gnomAD v3.1.2 HGDP+TGP subset, which combines the Human Genome Diversity Project with 1000 Genomes (3,202 samples across 78 populations, of which 35 are unique to HGDP). Together they cover 62 distinct subpopulations across all major continental ancestries.

Why both, not just one

1000 Genomes is the canonical reference for diversity in well-studied populations. HGDP adds populations that 1000G underrepresents or omits, including Mbuti, Biaka, San, Mozabite, Druze, Kalash, Pathan, Karitiana, Surui, Pima, Maya, Bougainville, Papuan Highland, Papuan Sepik, and many others. Populations that appear in both datasets are deduplicated so that no sample is counted twice.
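
The deduplication step can be pictured with a small sketch. It assumes that a shared sample ID is what identifies a duplicate and that the first record seen wins; the article does not say how the platform actually matches samples, and the `Sample` type, `merge_panels` function, sample IDs, and the `POP_A`/`POP_B`/`POP_C` labels are all hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Sample(NamedTuple):
    sample_id: str   # hypothetical identifier, not a real panel ID
    population: str  # population label
    source: str      # which dataset the record came from


def merge_panels(primary: Iterable[Sample], secondary: Iterable[Sample]) -> List[Sample]:
    """Union two sample lists, keeping the first record seen for each sample ID."""
    merged: Dict[str, Sample] = {}
    for sample in list(primary) + list(secondary):
        # setdefault leaves an existing entry untouched, so records from
        # `primary` take precedence over duplicates in `secondary`.
        merged.setdefault(sample.sample_id, sample)
    return list(merged.values())


kg_samples = [
    Sample("sample_A", "POP_A", "1000G"),
    Sample("sample_B", "POP_B", "1000G"),
]
hgdp_tgp_samples = [
    Sample("sample_A", "POP_A", "HGDP+TGP"),  # same individual, second dataset
    Sample("sample_C", "POP_C", "HGDP+TGP"),  # present only in the second dataset
]

panel = merge_panels(kg_samples, hgdp_tgp_samples)
populations = sorted({s.population for s in panel})
```

Here `panel` holds three samples rather than four, because `sample_A` is kept once, and `populations` lists each label a single time.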

How allele frequencies are extracted

Allele frequencies are computed per population from the source VCFs and from the subpopulation-specific frequency fields in gnomAD. They are stored in the reference database keyed by chromosome and position, so the variant-enrichment stage of the pipeline can look them up in constant time per variant.
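
A minimal sketch of this keyed lookup, assuming an in-memory dictionary stands in for the reference database and that each frequency is the alternate-allele count divided by the number of called alleles. The record layout, the function names, and the `POP_A`/`POP_B` labels are illustrative assumptions, not the platform's schema. A real store would probably also need the reference and alternate alleles in the key to separate multi-allelic sites.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]              # (chromosome, position)
Record = Tuple[str, int, str, int, int]  # chrom, pos, population, alt count, allele number


def allele_frequency(alt_count: int, allele_number: int) -> float:
    """Alternate-allele frequency; 0.0 when no alleles were called."""
    if allele_number == 0:
        return 0.0
    return alt_count / allele_number


def build_index(records: Iterable[Record]) -> Dict[Key, Dict[str, float]]:
    """Group per-population frequencies under a (chromosome, position) key."""
    index: Dict[Key, Dict[str, float]] = {}
    for chrom, pos, population, alt_count, allele_number in records:
        per_population = index.setdefault((chrom, pos), {})
        per_population[population] = allele_frequency(alt_count, allele_number)
    return index


records = [
    ("1", 1000, "POP_A", 20, 200),
    ("1", 1000, "POP_B", 50, 200),
    ("2", 5000, "POP_A", 0, 180),
]
index = build_index(records)

# Enrichment step: one hash lookup per variant (constant time on average).
hit = index.get(("1", 1000), {})
miss = index.get(("3", 42), {})
```

A variant absent from the panel returns an empty mapping rather than raising, which lets the enrichment step treat "no reference data" as an ordinary case.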

PCA centroids

A one-time PCA on the full reference panel produces three principal-component coordinates per sample. Each subpopulation's centroid is the median of its samples' coordinates (median rather than mean, so that a few mislabelled samples cannot drag it off position). The centroids are the reference points against which every Haeckel user's projected 3D coordinates are placed.
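
The effect of choosing the median can be seen in a short sketch. It assumes the median is taken independently on each of the three components, which is one common reading of "median of coordinates"; the article does not specify this. The coordinates, sample IDs, and the `POP_A` label are invented for illustration.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (PC1, PC2, PC3)


def group_by_population(coords: Dict[str, Point], labels: Dict[str, str]) -> Dict[str, List[Point]]:
    """Collect each population's sample coordinates."""
    grouped: Dict[str, List[Point]] = {}
    for sample_id, point in coords.items():
        grouped.setdefault(labels[sample_id], []).append(point)
    return grouped


def median_centroids(coords: Dict[str, Point], labels: Dict[str, str]) -> Dict[str, Point]:
    """Component-wise median of each population's points."""
    return {
        population: tuple(median(axis) for axis in zip(*points))
        for population, points in group_by_population(coords, labels).items()
    }


coords = {
    "s1": (0.10, 0.20, 0.30),
    "s2": (0.12, 0.22, 0.28),
    "s3": (5.00, -4.00, 3.00),  # far from the others: plays the mislabelled sample
}
labels = {"s1": "POP_A", "s2": "POP_A", "s3": "POP_A"}

centroids = median_centroids(coords, labels)

# For comparison only: the mean of PC1 is pulled toward the outlier.
pc1_mean = mean(point[0] for point in coords.values())
```

With these values the median centroid stays among the two well-behaved samples at (0.12, 0.2, 0.3), while the PC1 mean moves to about 1.74.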

References
  • 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature.
  • Bergström A et al. (2020). Insights into human genetic variation and population history from 929 diverse genomes. Science.
  • Karczewski KJ et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature.