What is a polygenic risk score?
A weighted sum of thousands of common variants that summarises inherited disposition for a complex trait. We explain how it is built, what the number means, and where it falls short.
A polygenic risk score, often abbreviated PRS, summarises a person's inherited disposition for a complex trait by adding up the small effects of thousands of common genetic variants. Each variant contributes a tiny weight that was estimated in a separate genome-wide association study, and the weighted sum approximates how far that individual sits from the population average for the trait in question.
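The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration, not Haeckel's implementation: the variant IDs, effect sizes, and the `polygenic_score` and `standardise` helpers are hypothetical, and a real pipeline would also handle allele alignment, missing genotypes, and quality control.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: dict of variant ID -> copies of the effect allele (0, 1 or 2)
    weights: dict of variant ID -> per-allele effect size from a GWAS
    Variants absent from the genotype are skipped (a simplifying assumption).
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


def standardise(raw_score, ref_mean, ref_sd):
    """Express a raw score in standard deviations from the reference mean."""
    return (raw_score - ref_mean) / ref_sd


# Hypothetical variants and effect sizes, for illustration only
weights = {"rs_a": 0.02, "rs_b": -0.01, "rs_c": 0.03}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

raw = polygenic_score(dosages, weights)  # 2*0.02 + 1*(-0.01) + 0*0.03
print(raw)
```

The raw sum has no natural units, which is why it is usually standardised against the score distribution of a reference population before being reported.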
The output is usually expressed as a z-score (standard deviations above or below the mean) or as a percentile in the reference population. A z-score of +1.5 for height, for example, means your score is higher than that of roughly 93% of the reference population.
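Converting between the two forms is a lookup in the normal distribution. The sketch below assumes scores are approximately normally distributed in the reference population, which is the usual working assumption; the function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_to_percentile(z):
    """Percentage of the reference population expected to score below z."""
    return 100.0 * STANDARD_NORMAL.cdf(z)


def percentile_to_z(percentile):
    """Inverse conversion: the z-score at a given percentile (0 < p < 100)."""
    return STANDARD_NORMAL.inv_cdf(percentile / 100.0)


print(round(z_to_percentile(1.5), 1))  # 93.3
```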
How Haeckel computes a PRS
Haeckel does not commit to a single PRS algorithm because every method makes different statistical assumptions and each one performs best on a different type of trait. Instead, the platform runs six independent methods and combines them.
- Basic weighted sum, included as a sanity check against the historical literature.
- Clumping and Thresholding (C+T), the workhorse of GWAS-era PRS construction.
- PRS-CS, a Bayesian method that shrinks effect sizes using a continuous shrinkage prior and a linkage-disequilibrium reference panel.
- LDpred2, another Bayesian method, which places a spike-and-slab prior on effect sizes so that only a fraction of variants are assumed to have a non-zero effect; useful when the trait is highly polygenic.
- Lassosum, an L1-penalised regression solved by coordinate descent.
- SBayesR, a four-component Gaussian mixture that handles a mix of small and large effects.
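To make one of these methods concrete, here is a simplified sketch of Clumping and Thresholding. It is an illustration under stated assumptions rather than the platform's code: the `clump_and_threshold` function, the default cut-offs, and the toy data are hypothetical, and real clumping also restricts comparisons to a physical distance window, which is omitted here.

```python
def clump_and_threshold(stats, ld_r2, r2_max=0.1, p_max=1e-3):
    """Greedy clumping followed by p-value thresholding.

    stats: dict of variant ID -> (beta, p_value) from GWAS summary statistics
    ld_r2: dict of frozenset({a, b}) -> r^2 between variants a and b;
           pairs not listed are treated as uncorrelated (r^2 = 0)
    Returns a dict of retained variant ID -> beta.
    """
    kept = []
    # Visit variants from most to least significant
    for v in sorted(stats, key=lambda name: stats[name][1]):
        if stats[v][1] > p_max:
            break  # all remaining variants are even less significant
        # Keep v only if it is not in LD with an already-kept index variant
        if all(ld_r2.get(frozenset((v, k)), 0.0) < r2_max for k in kept):
            kept.append(v)
    return {v: stats[v][0] for v in kept}


# Toy summary statistics: rs2 is in strong LD with the more significant rs1
stats = {
    "rs1": (0.05, 1e-9),
    "rs2": (0.04, 1e-6),
    "rs3": (-0.02, 1e-4),
    "rs4": (0.01, 0.2),
}
ld_r2 = {frozenset(("rs1", "rs2")): 0.8}

selected = clump_and_threshold(stats, ld_r2)
print(selected)  # {'rs1': 0.05, 'rs3': -0.02}
```

The retained effect sizes then serve as the weights in the weighted sum. In practice the two cut-offs are tuned on a validation set rather than fixed in advance.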
For each trait, Haeckel reports the ensemble result alongside the per-method scores so you can see how much agreement there is. When the methods disagree sharply, either the trait has a more complex architecture than any single method assumes, or the available training data are too limited to pin down the effect sizes.
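One simple way to summarise agreement between methods is sketched below. How Haeckel actually combines the six scores is not described here, so the unweighted mean, the use of the standard deviation as a spread measure, and the `max_spread` cut-off are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarise_methods(method_z, max_spread=0.5):
    """Combine per-method z-scores and flag sharp disagreement.

    method_z: dict of method name -> z-score for one trait
    max_spread: hypothetical cut-off on the spread between methods
    """
    values = list(method_z.values())
    spread = pstdev(values)  # standard deviation across methods
    return {
        "ensemble": mean(values),  # assumed: plain unweighted average
        "spread": spread,
        "disagree": spread > max_spread,
    }


# Hypothetical per-method z-scores for a single trait
scores = {
    "weighted_sum": 0.9,
    "c_t": 1.1,
    "prs_cs": 1.3,
    "ldpred2": 1.2,
    "lassosum": 1.0,
    "sbayesr": 1.1,
}
summary = summarise_methods(scores)
print(summary)
```

A small spread relative to the ensemble value suggests the methods are telling the same story; a large one is the signal to treat the ensemble number with caution.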
How to read the number
A PRS captures only a fraction of the variation in a trait, and that fraction depends on both the trait and the population the score was trained in. For height, a well-tuned PRS explains about 25% of the trait variance; for coronary artery disease, around 8 to 11%; for most psychiatric traits, less than 5%. The remainder is environment, gene-environment interaction, genetic effects the score does not capture (including rare variants the array did not catch), and pure noise.
When you see "92nd percentile for height," it does not mean you will be tall. It means your score ranks above 92% of the reference cohort, and the score itself explains only about a quarter of why anyone ends up the height they are.
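The gap between a score percentile and the expected trait value can be worked through numerically. The sketch below assumes the score and the trait are both standardised and jointly normal, so that the correlation between them is the square root of the variance explained; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def expected_trait_z(prs_percentile, r2):
    """Expected trait z-score, and remaining uncertainty, given a PRS percentile.

    r2: fraction of trait variance explained by the score (0.25 for 25%)
    Assumes score and trait are standardised and jointly normal.
    """
    prs_z = NormalDist().inv_cdf(prs_percentile / 100.0)
    expected = sqrt(r2) * prs_z      # regression toward the mean
    residual_sd = sqrt(1.0 - r2)     # spread not accounted for by the score
    return expected, residual_sd


expected, residual_sd = expected_trait_z(92, 0.25)
print(f"expected trait z: {expected:.2f}, residual SD: {residual_sd:.2f}")
```

Under these assumptions, a 92nd-percentile height score corresponds to an expected height of only about 0.7 standard deviations above average, with most of the population spread still unaccounted for.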
What a PRS cannot do
A polygenic score is not a diagnosis. It cannot tell you whether you will develop a disease, only that your inherited disposition is shifted in one direction relative to a reference population. For diseases with strong monogenic components, such as familial hypercholesterolaemia or Huntington's disease, the PRS sits underneath a much stronger single-gene signal that the platform reports separately.
- Choi SW, Mak TS-H, O'Reilly PF (2020). Tutorial: a guide to performing polygenic risk score analyses. Nature Protocols.
- Ge T et al. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications.
- Privé F et al. (2020). LDpred2: better, faster, stronger. Bioinformatics.
- Lewis CM, Vassos E (2020). Polygenic risk scores: from research tools to clinical instruments. Genome Medicine.