LDpred2 in detail: spike-and-slab Bayesian PRS
How LDpred2's spike-and-slab prior captures both the bulk of zero-effect SNPs and the tail of true causal variants, with four operating modes including the auto mode that learns its hyperparameters from the data.
LDpred2, published by Privé and colleagues in 2020, builds on the spike-and-slab prior introduced in the original LDpred. Each SNP's effect is modelled as a mixture: with probability pi the SNP is causal and its effect is drawn from the Gaussian slab N(0, h²/(M × pi)), and with probability 1-pi the effect is exactly zero (the spike). Here M is the total number of SNPs and h² is the SNP heritability of the trait. Because about M × pi SNPs are expected to be causal, dividing h² by M × pi makes the expected total variance explained by all SNPs equal to h².
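To make the prior concrete, the following Python sketch draws effects from it and checks that the h²/(M × pi) scaling brings the total effect variance out near h². The function name, the seed, and the values M = 20000, pi = 0.05 and h² = 0.3 are illustrative assumptions, not settings taken from LDpred2.

```python
import random


def sample_prior_effects(m, pi, h2, seed=0):
    """Draw m SNP effects from a spike-and-slab prior (illustrative helper)."""
    rng = random.Random(seed)
    slab_sd = (h2 / (m * pi)) ** 0.5  # slab variance is h2 / (M * pi)
    effects = []
    for _ in range(m):
        if rng.random() < pi:
            effects.append(rng.gauss(0.0, slab_sd))  # causal: draw from the slab
        else:
            effects.append(0.0)  # non-causal: the spike at exactly zero
    return effects


effects = sample_prior_effects(m=20000, pi=0.05, h2=0.3)
n_causal = sum(1 for b in effects if b != 0.0)   # close to M * pi = 1000
total_variance = sum(b * b for b in effects)     # close to h2 = 0.3
```

Lowering pi in this sketch leaves the total variance roughly unchanged but concentrates it in fewer, larger effects, which is the behaviour the prior is designed to express.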
The four operating modes
- Infinitesimal: pi is fixed at 1, so every SNP is treated as causal with a Gaussian effect. Best suited to traits with thousands of small contributors.
- Auto: pi and h² are sampled jointly from the posterior using MCMC. The most general mode, recommended unless you have strong prior information.
- Grid: pi is searched over a grid of values (typically 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0), and the value whose score predicts best in a validation sample is selected. Also useful for diagnostic comparison against the auto mode.
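- Sparse: a variant of the grid model in which SNPs with a low posterior probability of being causal have their effect set exactly to zero, giving a truly sparse score. Suitable for traits where few SNPs carry most of the signal.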
The Gibbs sampler
Each iteration samples each SNP's effect conditional on the others, the LD reference, and the current pi and h². The conditional posterior at each SNP is itself a spike-and-slab: the posterior probability of being causal is updated based on how well a non-zero effect would fit the marginal estimate after accounting for the contributions of nearby SNPs. The sampler is initialised from the marginal effect estimates, discards an initial burn-in phase, and then averages the draws from the sampling phase to obtain posterior mean effects. Both phases must be long enough for the chain to converge within each LD block.
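The update described above can be sketched in Python for a single small LD block. This is a minimal illustration under stated assumptions, not the implementation used by any particular software: effects are on the standardised scale, the GWAS sample size n is known, pi and h² are held fixed with 0 < pi < 1, and the LD matrix is small enough to store densely. All function names and the toy numbers are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_sweep(beta_hat, ld, beta, n, pi, h2, rng):
    """Update every SNP effect once, with pi and h2 held fixed (0 < pi < 1).

    beta_hat: marginal effect estimates on the standardised scale
    ld:       LD correlation matrix as nested lists, with ld[j][j] == 1
    beta:     current effects, modified in place
    n:        GWAS sample size (assumed known)
    Returns the posterior causal probability computed for each SNP.
    """
    m = len(beta_hat)
    slab_var = h2 / (m * pi)                     # prior variance of a causal effect
    shrink = 1.0 / (1.0 + 1.0 / (n * slab_var))  # shrinkage applied by the slab
    probs = []
    for j in range(m):
        # Marginal estimate after removing what the other SNPs already explain.
        resid = beta_hat[j] - sum(ld[j][k] * beta[k] for k in range(m) if k != j)
        # Log odds of "causal" versus "zero" given the residual estimate.
        log_odds = (math.log(pi / (1.0 - pi))
                    - 0.5 * math.log(1.0 + n * slab_var)
                    + 0.5 * n * resid * resid * shrink)
        p_causal = sigmoid(log_odds)
        probs.append(p_causal)
        if rng.random() < p_causal:
            # Slab: draw from the Gaussian conditional posterior.
            beta[j] = rng.gauss(shrink * resid, math.sqrt(shrink / n))
        else:
            beta[j] = 0.0                        # spike: effect is exactly zero
    return probs


def run_gibbs(beta_hat, ld, n, pi, h2, burn_in=200, n_samples=500, seed=1):
    """Return posterior mean effects and mean causal probabilities."""
    rng = random.Random(seed)
    m = len(beta_hat)
    beta = list(beta_hat)              # initialise from the marginal estimates
    mean_beta = [0.0] * m
    mean_prob = [0.0] * m
    for it in range(burn_in + n_samples):
        probs = gibbs_sweep(beta_hat, ld, beta, n, pi, h2, rng)
        if it >= burn_in:              # keep only post-burn-in draws
            for j in range(m):
                mean_beta[j] += beta[j] / n_samples
                mean_prob[j] += probs[j] / n_samples
    return mean_beta, mean_prob


# Toy block: SNP 0 is causal (effect 0.1), SNP 1 only tags it (r = 0.8),
# SNP 2 is independent with no signal. Marginals are noise-free for clarity.
ld = [[1.0, 0.8, 0.0],
      [0.8, 1.0, 0.0],
      [0.0, 0.0, 1.0]]
beta_hat = [0.10, 0.08, 0.00]
post_mean, post_prob = run_gibbs(beta_hat, ld, n=10_000, pi=1 / 3, h2=0.01)
```

In this toy block the tagging SNP has a sizeable marginal estimate, yet once the causal SNP's contribution is subtracted its residual is close to zero. The sampler therefore assigns it a low causal probability and leaves most of the effect on SNP 0.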
Convergence and validation
Multiple independent Gibbs chains are run per block, and the Gelman-Rubin R-hat statistic is reported for each parameter (each SNP effect, plus pi and h²). R-hat compares the variance between chains with the variance within chains, so values close to 1 indicate that the chains agree. Chains that fail to converge are flagged. For the auto mode, cross-validation against a held-out cohort is also performed whenever the underlying GWAS is large enough to support it, providing an additional sanity check on the posterior.
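As a rough illustration of the diagnostic, the sketch below computes the classic (non-split) Gelman-Rubin R-hat for one scalar parameter from several chains of post-burn-in draws. The function name and the simulated chains are assumptions made for the example. The text does not state the cutoff used for flagging; a common rule of thumb treats values above roughly 1.1 as a sign of non-convergence.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic Gelman-Rubin R-hat for one scalar parameter.

    chains: list of equal-length lists of post-burn-in draws, one per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four chains sampling the same distribution: R-hat should be close to 1.
agreeing = [[rng.gauss(0.10, 0.01) for _ in range(500)] for _ in range(4)]
# Two chains stuck around a different value: R-hat should be well above 1.
disagreeing = agreeing[:2] + [[rng.gauss(0.15, 0.01) for _ in range(500)]
                              for _ in range(2)]

rhat_ok = gelman_rubin(agreeing)
rhat_bad = gelman_rubin(disagreeing)
```

Applied per parameter, the same function would be called once for each SNP effect and once each for pi and h², using the draws that the independent chains produced for that parameter.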
When LDpred2 performs best
LDpred2 outperforms PRS-CS for traits with intermediate polygenicity, where the true causal SNP set is a small fraction of the genome but each causal SNP carries a meaningful effect. Several autoimmune traits, lipid traits, and some cancer-risk scores fall in this regime. For very high or very low polygenicity, PRS-CS or SBayesR usually edges ahead.
- Privé F, Arbel J, Vilhjálmsson BJ (2020). LDpred2: better, faster, stronger. Bioinformatics.
- Vilhjálmsson BJ et al. (2015). Modeling linkage disequilibrium increases accuracy of polygenic risk scores. American Journal of Human Genetics.