Cross-ancestry PRS calibration

Why polygenic scores trained in one ancestry transfer poorly to others, and how Haeckel applies a per-individual mu and sigma calibration plus pre-computed variance scalars to make scores comparable across users.

7 min read · updated Apr 19, 2026

Most published GWAS were conducted in European-ancestry cohorts. Polygenic scores derived from them transfer poorly to non-European users for two distinct reasons. First, linkage disequilibrium (LD) patterns differ between populations, so a SNP that tags a true causal variant in Europeans may tag it weakly or not at all in, for example, East Asians. Second, the allele frequencies of the scored variants (and of the causal variants they tag) differ across populations, so the population mean and variance of the score itself shift.
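The second point can be written down directly under two textbook simplifications: Hardy-Weinberg equilibrium and no LD between the scored variants. Using notation introduced here for illustration (w_j is the weight of scored variant j, g_j ∈ {0, 1, 2} its effect-allele count, and p_jk that allele's frequency in population k), the raw score S = Σ_j w_j g_j has

```latex
\mathbb{E}[S \mid k] = \sum_j 2\,p_{jk}\,w_j,
\qquad
\operatorname{Var}(S \mid k) = \sum_j 2\,p_{jk}\,(1 - p_{jk})\,w_j^2
```

Both moments depend on p_jk, so changing the allele frequencies moves the score's mean and variance even though the weights are fixed. Real scores contain correlated variants, which add pairwise covariance terms to the variance.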

The two failure modes

Mean shift: the average raw PRS in a non-European population differs from the European reference mean, even when that population's trait risk is the same as in Europeans. A naive z-score computed against the European reference will therefore systematically over- or under-rate non-European users.
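Variance compression: the standard deviation of the raw PRS is often smaller in non-European populations because the European GWAS missed the signal that is LD-tagged in those populations. A naive z-score using the European standard deviation will then compress the entire non-European cohort into a narrow band of z-scores, so few of those users ever land in the tails.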
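Both failure modes can be reproduced with a deliberately tiny example. This sketch models only allele-frequency differences (not LD), assumes Hardy-Weinberg equilibrium and independent variants, and uses a made-up three-variant score; the function score_moments and every number are hypothetical. It computes the exact mean and standard deviation of a naive, European-referenced z-score inside a second population.

```python
import math

def score_moments(weights, freqs):
    """Mean and variance of S = sum_j w_j * g_j, assuming g_j ~ Binomial(2, p_j)
    (Hardy-Weinberg equilibrium) and no LD between the scored variants."""
    mean = sum(2 * p * w for w, p in zip(weights, freqs))
    var = sum(2 * p * (1 - p) * w * w for w, p in zip(weights, freqs))
    return mean, var

# Hypothetical three-variant score; weights and frequencies are made up.
weights = [0.2, -0.1, 0.3]
eur_freqs = [0.5, 0.3, 0.2]     # European reference population
other_freqs = [0.8, 0.1, 0.05]  # a hypothetical non-European population

mu_eur, var_eur = score_moments(weights, eur_freqs)
mu_other, var_other = score_moments(weights, other_freqs)

# Naive z = (raw - mu_eur) / sigma_eur, summarised over the other population.
naive_z_mean = (mu_other - mu_eur) / math.sqrt(var_eur)  # mean shift: not 0
naive_z_sd = math.sqrt(var_other / var_eur)              # compression: below 1

print(f"naive z in other population: mean {naive_z_mean:.2f}, sd {naive_z_sd:.2f}")
# prints: naive z in other population: mean 0.30, sd 0.66
```

Even with identical weights, the second population's naive z-scores are centred away from zero and spread over only about two thirds of the European range.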

Per-individual mu and sigma calibration

Haeckel computes a population-specific mu (mean) and sigma (standard deviation) for each PRS, then derives a per-user mu_user and sigma_user as a weighted blend of the population values, weighted by the user's inferred ancestry components. The user's standardised z-score is then (raw_score - mu_user) / sigma_user, which is comparable across users regardless of ancestry composition.
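Written out, with notation introduced here for illustration: a_uk is user u's inferred ancestry fraction for population k (so the a_uk sum to 1), μ_k is the population mean and V_k = σ_k² the population variance. This reads the mean blend as a linear weighted average and takes the sigma blend in the variance form given under Pre-computed variance scalars:

```latex
\mu_u = \sum_k a_{uk}\,\mu_k,
\qquad
\sigma_u = \sqrt{\sum_k a_{uk}\,V_k},
\qquad
z_u = \frac{S_u - \mu_u}{\sigma_u}
```

For a user with a single ancestry k these reduce to μ_k and σ_k, so the z-score matches ordinary within-population standardisation.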

Pre-computed variance scalars

Recomputing each population's sigma from the reference cohort every time a PRS is evaluated is expensive when the cohort is large. We instead pre-compute a variance scalar V_k for each PRS-population pair, where V_k = Var(raw_score | ancestry_k). The per-user sigma_user is then the square root of the sum of (ancestry_k_fraction × V_k) across populations, which reduces the per-user calibration step to O(K) rather than O(N × K), where N is the size of the reference cohort and K the number of ancestry populations.
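A minimal sketch of this two-phase split, assuming the reference cohort has already been partitioned into populations and that a user's ancestry fractions arrive as a dict summing to 1. The names precompute_scalars and calibrate, the population labels and the numbers are hypothetical, and the mean blend is taken to be linear in the ancestry fractions.

```python
import math
from statistics import fmean, pvariance

def precompute_scalars(ref_scores_by_pop):
    """Offline step, run once per PRS: ref_scores_by_pop maps a population
    label to the raw PRS values of reference individuals in that population.
    Returns {pop: (mu_k, V_k)}."""
    return {
        pop: (fmean(scores), pvariance(scores))
        for pop, scores in ref_scores_by_pop.items()
    }

def calibrate(raw_score, ancestry, scalars):
    """Per-user step, O(K): blend the stored mu_k and V_k by ancestry fraction."""
    mu_user = sum(frac * scalars[pop][0] for pop, frac in ancestry.items())
    var_user = sum(frac * scalars[pop][1] for pop, frac in ancestry.items())
    sigma_user = math.sqrt(var_user)
    return (raw_score - mu_user) / sigma_user

# Hypothetical reference scores for two populations.
ref = {"EUR": [1.0, 1.0, 3.0, 3.0], "EAS": [0.0, 0.0, 1.0, 1.0]}
scalars = precompute_scalars(ref)  # {"EUR": (2.0, 1.0), "EAS": (0.5, 0.25)}

# Admixed user: mu_user = 1.28, sigma_user = sqrt(0.64) = 0.8
z = calibrate(2.08, {"EUR": 0.52, "EAS": 0.48}, scalars)
print(round(z, 6))  # 1.0
```

The offline step reads every reference score once; the per-user step reads only the K stored (mu_k, V_k) pairs, which is the O(K) cost described above.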

Limits of calibration

Calibration corrects the mean and variance but cannot recover information that the GWAS never had. If a true causal variant is common in the user's ancestry but absent from the European GWAS reference, the score will not capture its effect at all, no matter how cleverly we re-centre. The honest summary, surfaced in the per-trait dossier, is that PRS accuracy for non-European users remains lower than for European users until ancestry-specific GWAS at adequate sample size are available.
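A toy check of the first sentence, with made-up numbers: users who share the same ancestry fractions share the same mu_user and sigma_user, so within that group calibration is a positive affine map of the raw score and leaves its Pearson correlation with the trait unchanged. The helper pearson and all values below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up raw scores and trait values for users with identical ancestry fractions.
raw = [0.1, 0.4, 0.2, 0.9, 0.5]
trait = [1.0, 1.3, 0.8, 2.0, 1.1]

# Hypothetical mu_user and sigma_user shared by this group (sigma_user > 0).
mu_user, sigma_user = 0.3, 0.2
calibrated = [(s - mu_user) / sigma_user for s in raw]

r_raw = pearson(raw, trait)
r_cal = pearson(calibrated, trait)
print(f"r before calibration: {r_raw:.4f}, after: {r_cal:.4f}")  # identical
```

Whatever accuracy was lost to untagged causal variants is already absent from the raw score, and re-centring and re-scaling cannot put it back.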
