Lassosum: L1-penalised PRS with coordinate descent
How Lassosum uses the LASSO with linkage-disequilibrium correction to produce sparse polygenic scores, why we cap the LD-mixing parameter at 0.4 for cross-ancestry safety, and when the method shines.
Lassosum, published by Mak and colleagues in 2017, applies the LASSO (the L1-penalised regression introduced by Tibshirani in 1996) to polygenic-score construction using only GWAS summary statistics and an LD reference panel. The objective is to find a vector of effect sizes B that minimises B'(R_s)B - 2B'r + lambda × |B|_1, where r is the marginal GWAS effect-size vector, R is the LD correlation matrix, and R_s = (1-s) × I + s × R is a mix of the LD matrix and the identity controlled by the parameter s. At s = 0 the LD matrix is ignored entirely; at s = 1 it is used unmodified.
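To make the objective concrete, here is a minimal sketch in plain Python that builds R_s and evaluates the penalised objective exactly as written above. The function names `mix_ld` and `lassosum_objective` are illustrative assumptions, not part of any published Lassosum or Haeckel API, and the list-of-lists representation is chosen for readability rather than speed.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mix_ld(R: Matrix, s: float) -> Matrix:
    """Return R_s = (1 - s) * I + s * R, using the article's convention for s."""
    p = len(R)
    return [
        [(1.0 - s) * (1.0 if i == j else 0.0) + s * R[i][j] for j in range(p)]
        for i in range(p)
    ]


def lassosum_objective(B: Vector, r: Vector, R_s: Matrix, lam: float) -> float:
    """Evaluate B' R_s B - 2 B' r + lam * |B|_1 for a candidate effect vector B."""
    p = len(B)
    quadratic = sum(B[i] * R_s[i][j] * B[j] for i in range(p) for j in range(p))
    linear = sum(B[i] * r[i] for i in range(p))
    l1_norm = sum(abs(b) for b in B)
    return quadratic - 2.0 * linear + lam * l1_norm
```

For a two-SNP toy panel with pairwise LD of 0.5, `mix_ld(R, 0.4)` leaves the diagonal at 1 and shrinks the off-diagonal entry to 0.2, which shows how lowering s pulls R_s toward the identity.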
Coordinate descent
The LASSO objective has no closed-form solution in general but is convex, so coordinate descent converges quickly. Each iteration updates one SNP's effect at a time using the soft-threshold operator: the residualised marginal effect of the SNP (after subtracting the contributions of the other SNPs) is shrunk toward zero by an amount set by the L1 penalty lambda, and is set exactly to zero when it falls below that threshold, which is what makes the resulting score sparse. Lambda is tuned on a grid, and the best lambda is selected by held-out validation accuracy.
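The update described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the objective B'(R_s)B - 2B'r + lambda × |B|_1 as stated in this article; under that scaling, the per-coordinate threshold works out to lambda / 2. The names `soft_threshold` and `coordinate_descent`, and the `max_iter` and `tol` defaults, are assumptions made for the example, not the reference implementation.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(
    r: Vector, R_s: Matrix, lam: float, max_iter: int = 1000, tol: float = 1e-10
) -> Vector:
    """Minimise B' R_s B - 2 B' r + lam * |B|_1 by cyclic coordinate descent."""
    p = len(r)
    B = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Residualised marginal effect: remove what the other SNPs already explain.
            z = r[j] - sum(R_s[j][k] * B[k] for k in range(p) if k != j)
            # Exact one-dimensional minimiser for coordinate j.
            new_bj = soft_threshold(z, lam / 2.0) / R_s[j][j]
            max_change = max(max_change, abs(new_bj - B[j]))
            B[j] = new_bj
        if max_change < tol:
            break  # a full sweep changed nothing meaningful
    return B
```

Running this over a grid of lambda values and scoring each resulting B on held-out data is one way to carry out the tuning step; the grid itself and the validation metric are left to the caller.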
Why we cap the LD mixing parameter for cross-ancestry users
The identity-LD fallback
When the LD reference panel for the user's ancestry is unavailable, Haeckel falls back to the identity matrix (s = 0). With R_s = I the objective separates across SNPs, so the fit is mathematically equivalent to soft-thresholding each marginal effect independently, with no LD correction. The result is a less calibrated score but never a wildly miscalibrated one. We surface the fallback explicitly in the per-method output so downstream consumers know not to over-interpret.
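As a worked illustration of the s = 0 case: substituting R_s = I into the objective B'(R_s)B - 2B'r + lambda × |B|_1 gives a sum of independent one-SNP problems, each solved in closed form by shrinking the marginal effect by lambda / 2 and zeroing it if it is smaller than that. The sketch below assumes that scaling; `identity_fallback` is a hypothetical name for the example, not Haeckel's actual function.

```python
import math
from typing import List


def identity_fallback(r: List[float], lam: float) -> List[float]:
    """Closed-form minimiser of B'B - 2 B'r + lam * |B|_1 (the R_s = I case).

    Each effect is its marginal estimate shrunk toward zero by lam / 2,
    and set to zero when the marginal estimate is no larger than lam / 2.
    """
    threshold = lam / 2.0
    return [math.copysign(max(abs(x) - threshold, 0.0), x) for x in r]
```

Because no LD information enters the calculation, correlated SNPs that tag the same causal signal each keep their own shrunken effect, which is why this fallback is less well calibrated than the LD-aware fit.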
When Lassosum performs best
Lassosum produces sparse scores that are easier to interpret and faster to deploy than the dense Bayesian alternatives. For traits where a small number of large-effect SNPs dominates the signal (Mendelian-leaning conditions, lipid disorders, some cancer susceptibility scores), Lassosum often matches or beats Bayesian methods at a fraction of the runtime. For highly polygenic traits, the sparsity bias hurts and Bayesian methods take over.
- Mak TS-H, Porsch RM, Choi SW, Zhou X, Sham PC (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology.
- Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B.