Methods
How Haeckel actually computes everything, from spatial-thinning ancestry to Bayesian polygenic scoring.
The full pipeline that turns your raw genotypes into a 62-population ancestry vector: spatial thinning, MLE-EM, bootstrap Wald testing, and AMR deconvolution.
How PRS-CS, LDpred2, and SBayesR turn GWAS summary statistics and an LD reference panel into a calibrated polygenic score, with convergence diagnostics that flag unreliable scores.
How a dense vector representation captures genome plus interests, and how cosine similarity surfaces compatible matches without revealing identifying information.
A reference-free test that detects long, divergent haplotypes consistent with introgression from an archaic source population, even when no archaic genome is available for comparison.
How the four-population test detects introgression between specific groups, how f4-ratios estimate the admixture proportion, and why we use a block jackknife for the standard error.
How a hidden Markov model with Modern, Neanderthal, Denisovan, and Unknown-Archaic states segments your genome into ancestry tracts, and how the coalescent dating step assigns each tract a TMRCA.
How PRS-CS turns marginal GWAS effect sizes into joint posterior estimates using a continuous shrinkage prior, an MCMC sampler with two Gibbs chains per LD block, and Gelman-Rubin convergence diagnostics.
How LDpred2's spike-and-slab prior captures both the bulk of zero-effect SNPs and the tail of true causal variants, with four operating modes including the auto mode that learns its hyperparameters from the data.
How SBayesR generalises spike-and-slab to a four-component mixture, with a Dirichlet hyperprior on the mixing proportions and an inverse-Gamma on the residual variance.
How Lassosum uses the LASSO with linkage-disequilibrium correction to produce sparse polygenic scores, why we cap the LD-mixing parameter at 0.4 for cross-ancestry safety, and when the method shines.
Why polygenic scores trained in one ancestry transfer poorly to others, and how Haeckel applies a per-individual mu and sigma calibration plus pre-computed variance scalars to make scores comparable across users.
How the KING-robust estimator infers pairwise relatedness from heterozygote sharing, why it does not require an explicit allele-frequency reference, and how Haeckel translates kinship coefficients into relationship classifications.
How sliding-window ROH detection identifies long stretches of homozygous genotypes that signal recent shared ancestry between a person's parents, and how F_ROH summarises the signal across the genome.
The four metrics Haeckel computes on every uploaded file to flag genotyping errors, sample contamination, and array-platform problems before any downstream analysis runs.
The 89-gene panel, the ClinVar matching strategy, sex-aware X-linked handling, and how penetrance estimates produce age-stratified lifetime risk numbers.
A weighted projection of your ancestry composition onto pre-computed reference centroids, calibrated so that Euclidean distance approximates √Fst between any pair of users.
Every stage your raw DNA file passes through, in order, from upload through encryption, parsing, variant enrichment, the seventeen analyser modules, and final result persistence.
How the platform combines the 1000 Genomes Phase 3 release with the gnomAD v3.1.2 HGDP+TGP subset to build a 62-population reference for ancestry inference, PCA centroids, and PRS calibration.
How Haeckel computes and ships LD reference panels for all five 1000 Genomes superpopulations, and how the mixing parameter caps cross-ancestry amplification.
How Haeckel benchmarks every analyser against held-out 1000 Genomes samples, what the per-sample accuracy looks like, and where the platform still falls short.