SBayesR in detail: four-component Gaussian mixture for polygenic prediction
How SBayesR generalises spike-and-slab to a four-component mixture, with a Dirichlet hyperprior on the mixing proportions and an inverse-Gamma prior on the residual variance.
SBayesR, published by Lloyd-Jones and colleagues in 2019, starts from the observation that a single polygenic trait often carries effects of widely different magnitudes. A few large-effect SNPs (for example, MHC variants in autoimmune traits) coexist with many small-effect SNPs, while the vast majority of SNPs have no effect at all. A two-component spike-and-slab prior, with a point mass at zero and a single normal slab, fits this distribution poorly because one slab variance cannot describe both the large and the small effects.
The four-component mixture
Each SNP's effect is drawn from a mixture of four zero-mean normal distributions: a zero component (a point mass at zero), a small-effect component, a medium-effect component, and a large-effect component. The components differ only in their variance. The mixing proportions, denoted pi, follow a Dirichlet hyperprior dominated by the zero component, reflecting the empirical observation that the vast majority of SNPs have no detectable effect on most traits.
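To make the prior concrete, the following minimal sketch draws SNP effects from a four-component mixture of this kind. The function name, the mixing proportions, the common variance, and the variance scaling factors (0, 0.01, 0.1, 1) are assumptions chosen for illustration; none of them are values stated in this article.

```python
import numpy as np


def sample_mixture_effects(n_snps, pi, gamma, sigma2_beta, rng):
    """Draw SNP effects from a finite mixture of zero-mean normals.

    pi          : mixing proportions, one per component (must sum to 1)
    gamma       : variance scaling factor per component (0 = point mass at zero)
    sigma2_beta : common effect-size variance that gamma rescales
    """
    pi = np.asarray(pi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # Component label per SNP: 0 = zero, 1 = small, 2 = medium, 3 = large.
    component = rng.choice(len(pi), size=n_snps, p=pi)
    # Standard deviation of the assigned component (exactly 0 for the zero component).
    sd = np.sqrt(gamma[component] * sigma2_beta)
    beta = rng.normal(size=n_snps) * sd
    return beta, component


rng = np.random.default_rng(0)
pi = [0.95, 0.03, 0.015, 0.005]   # illustrative proportions, zero component dominant
gamma = [0.0, 0.01, 0.1, 1.0]     # assumed scaling factors, not taken from this article
beta, component = sample_mixture_effects(100_000, pi, gamma, 0.01, rng)
counts = np.bincount(component, minlength=4)
```

Here `counts` holds the number of SNPs assigned to each component, and every SNP in component 0 has an effect of exactly zero. A two-component spike-and-slab is the special case with only one non-zero entry in `gamma`.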
Gibbs sampling at scale
Each iteration of the sampler cycles through four updates: SNP-by-SNP component assignment, joint effect-size update within each block, mixing-proportion update via Dirichlet conjugacy, and residual-variance update via inverse-Gamma conjugacy. The chain length is chosen to give the mixing proportions enough iterations to converge, because they are the slowest-converging parameters in the model.
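The four updates can be sketched on toy data as follows. This is an illustrative simplification, not the published implementation: it updates effects one SNP at a time rather than jointly within blocks, holds the common effect variance `sigma2_beta` fixed, uses a flat Dirichlet(1, ..., 1) prior, and builds X'X and X'y from simulated genotypes instead of GWAS summary statistics and a reference LD matrix. The function name, the hyperparameters `a_e` and `b_e`, and the scaling factors in `gamma` are all assumptions.

```python
import numpy as np


def gibbs_four_component(xtx, xty, yty, n, gamma, sigma2_beta,
                         n_iter=300, burn_in=100, a_e=1.0, b_e=1.0, seed=0):
    """Toy Gibbs sampler for a zero-plus-normals mixture prior on SNP effects.

    Uses only X'X, X'y and y'y. Returns posterior means of the effects,
    the mixing proportions and the residual variance.
    """
    rng = np.random.default_rng(seed)
    gamma = np.asarray(gamma, dtype=float)
    m, k = xty.size, gamma.size
    var_k = gamma[1:] * sigma2_beta        # variances of the non-zero components
    beta = np.zeros(m)
    pi = np.full(k, 1.0 / k)
    sigma2_e = yty / n
    r_corr = xty.astype(float).copy()      # X'y - X'X beta, kept current
    beta_sum, pi_sum, s2e_sum = np.zeros(m), np.zeros(k), 0.0

    for it in range(n_iter):
        counts = np.zeros(k)
        for j in range(m):
            # 1. Component assignment: marginal likelihood of each component
            #    relative to the zero component, with beta_j integrated out.
            rhs = r_corr[j] + xtx[j, j] * beta[j]
            c = xtx[j, j] + sigma2_e / var_k
            log_p = np.log(pi)
            log_p[1:] += 0.5 * (rhs**2 / (c * sigma2_e)
                                - np.log(c * var_k / sigma2_e))
            p = np.exp(log_p - log_p.max())
            comp = rng.choice(k, p=p / p.sum())
            # 2. Effect update: zero, or a draw from the conditional normal.
            if comp == 0:
                new_beta = 0.0
            else:
                c_j = c[comp - 1]
                new_beta = rng.normal(rhs / c_j, np.sqrt(sigma2_e / c_j))
            r_corr -= xtx[:, j] * (new_beta - beta[j])
            beta[j] = new_beta
            counts[comp] += 1
        # 3. Mixing proportions: Dirichlet posterior from component counts.
        pi = rng.dirichlet(counts + 1.0)
        # 4. Residual variance: inverse-Gamma posterior from the residual sum of squares.
        sse = max(yty - beta @ (xty + r_corr), 0.0)
        sigma2_e = 1.0 / rng.gamma(a_e + 0.5 * n, 1.0 / (b_e + 0.5 * sse))
        if it >= burn_in:
            beta_sum += beta
            pi_sum += pi
            s2e_sum += sigma2_e

    kept = n_iter - burn_in
    return beta_sum / kept, pi_sum / kept, s2e_sum / kept


# Simulated example: 5 causal SNPs out of 50, residual variance 1.
sim = np.random.default_rng(1)
n, m = 400, 50
x = sim.normal(size=(n, m))
x = (x - x.mean(axis=0)) / x.std(axis=0)
true_beta = np.zeros(m)
true_beta[:5] = 0.5
y = x @ true_beta + sim.normal(size=n)
y = y - y.mean()
beta_hat, pi_hat, sigma2_e_hat = gibbs_four_component(
    x.T @ x, x.T @ y, y @ y, n, gamma=[0.0, 0.01, 0.1, 1.0], sigma2_beta=0.25)
```

The running vector `r_corr` avoids recomputing X'X times the effect vector after every single-SNP update, which is what keeps each sweep cheap. The clamp on `sse` is a guard added for this sketch; with an LD matrix estimated from a different sample than the GWAS, the summary-statistic residual sum of squares is not guaranteed to stay positive.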
When SBayesR performs best
SBayesR outperforms two-component methods on traits with a clear spread of large, medium, and small effects, which empirically includes height, BMI, type 2 diabetes (T2D), and most psychiatric traits in well-powered GWAS. For traits where the GWAS is too small to identify the large-effect tier reliably, the four-component mixture collapses toward two components, and SBayesR offers little advantage over LDpred2 or PRS-CS while converging more slowly.
- Lloyd-Jones LR et al. (2019). Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nature Communications.
- Erbe M et al. (2012). Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. Journal of Dairy Science.