Methods

How Haeckel actually computes everything, from ancestry inference with spatial thinning to Bayesian polygenic scoring.

The full pipeline that turns your raw genotypes into a 62-population ancestry vector: spatial thinning, MLE-EM, bootstrap Wald testing, and AMR deconvolution.

8 min read · updated Apr 19, 2026

Bayesian polygenic scoring

How PRS-CS, LDpred2, and SBayesR turn GWAS summary statistics and an LD reference panel into a calibrated polygenic score, with convergence diagnostics that flag unreliable scores.

7 min read · updated Apr 19, 2026

Embedding similarity for Networks

How a dense vector representation captures both your genome and your interests, and how cosine similarity surfaces compatible matches without revealing identifying information.

4 min read · updated Apr 19, 2026

The S-star statistic for archaic introgression

A reference-free test that detects long, divergent haplotypes consistent with introgression from an archaic source population, even when no archaic genome is available for comparison.

7 min read · updated Apr 19, 2026

D-statistics, ABBA-BABA, and f4-ratios

How the four-population test detects introgression between specific groups, how f4-ratios estimate the admixture proportion, and why we use a block jackknife for the standard error.

8 min read · updated Apr 19, 2026

Tract assignment via four-state Viterbi HMM

How a hidden Markov model with Modern, Neanderthal, Denisovan, and Unknown-Archaic states segments your genome into ancestry tracts, and how the coalescent dating step assigns each tract a time to most recent common ancestor (TMRCA).

8 min read · updated Apr 19, 2026

PRS-CS in detail: Bayesian regression with continuous shrinkage

How PRS-CS turns marginal GWAS effect sizes into joint posterior estimates using a continuous shrinkage prior, an MCMC sampler with two Gibbs chains per LD block, and Gelman-Rubin convergence diagnostics.

9 min read · updated Apr 19, 2026

LDpred2 in detail: spike-and-slab Bayesian PRS

How LDpred2's spike-and-slab prior captures both the bulk of zero-effect SNPs and the tail of true causal variants, with four operating modes including the auto mode that learns its hyperparameters from the data.

8 min read · updated Apr 19, 2026

SBayesR in detail: four-component Gaussian mixture for polygenic prediction

How SBayesR generalises spike-and-slab to a four-component mixture, with a Dirichlet hyperprior on the mixing proportions and an inverse-Gamma prior on the residual variance.

7 min read · updated Apr 19, 2026

Lassosum: L1-penalised PRS with coordinate descent

How Lassosum uses the LASSO with linkage-disequilibrium correction to produce sparse polygenic scores, why we cap the LD-mixing parameter at 0.4 for cross-ancestry safety, and when the method shines.

6 min read · updated Apr 19, 2026

Cross-ancestry PRS calibration

Why polygenic scores trained in one ancestry transfer poorly to others, and how Haeckel applies a per-individual mu and sigma calibration plus pre-computed variance scalars to make scores comparable across users.

7 min read · updated Apr 19, 2026

KING-robust kinship inference

How the KING-robust estimator infers pairwise relatedness from heterozygote sharing, why it does not require an explicit allele-frequency reference, and how Haeckel translates kinship coefficients into relationship classifications.

7 min read · updated Apr 19, 2026

Detecting runs of homozygosity (ROH)

How sliding-window ROH detection identifies long stretches of homozygous genotypes that signal recent shared ancestry between a person's parents, and how F_ROH summarises the signal across the genome.

6 min read · updated Apr 19, 2026

Data quality control: call rate, heterozygosity, and Ti/Tv

The four metrics Haeckel computes on every uploaded file to flag genotyping errors, sample contamination, and array-platform problems before any downstream analysis runs.

6 min read · updated Apr 19, 2026

How clinical health-risk findings are derived

The 89-gene panel, the ClinVar matching strategy, sex-aware X-linked handling, and how penetrance estimates produce age-stratified lifetime risk numbers.

7 min read · updated Apr 19, 2026

How your 3D PCA coordinates are computed

A weighted projection of your ancestry composition onto pre-computed reference centroids, calibrated so that Euclidean distance approximates √Fst between any pair of users.

5 min read · updated Apr 19, 2026

The genomic pipeline, end to end (v5)

Every stage your raw DNA file passes through, in order, from upload through encryption, parsing, variant enrichment, the seventeen analyser modules, and final result persistence.

8 min read · updated Apr 23, 2026

1000 Genomes plus HGDP+TGP: the reference panel

How the platform combines the 1000 Genomes Phase 3 release with the gnomAD v3.1.2 HGDP+TGP subset to build a 62-population reference for ancestry inference, PCA centroids, and PRS calibration.

6 min read · updated Apr 19, 2026

LD reference panels: how they are computed and why size matters

How Haeckel computes and ships LD reference panels for all five 1000 Genomes superpopulations, and how capping the LD-mixing parameter limits cross-ancestry amplification.

6 min read · updated Apr 19, 2026

Validation: how we know the analysers work

How Haeckel benchmarks every analyser against held-out 1000 Genomes samples, what the per-sample accuracy looks like, and where the platform still falls short.

6 min read · updated Apr 19, 2026