Finding people genetically similar to you
How the "Genetically Similar" sidebar surfaces real matches, what the similarity score means, and what you can and cannot infer from a match.
The "Genetically Similar" card on your sidebar shows you the five users on the platform whose genomes sit closest to yours by Euclidean distance in 3D principal-component space. The PCA coordinates are computed from your ancestry composition during the genomic pipeline run, then re-projected onto the same global PCA basis as every other user.
What the score means
A "similarity" of 0.95 between you and another user means the two of you sit very close together in PCA space, which usually implies similar ancestry composition. It does not necessarily imply you are related. People can sit close together because both are predominantly Northern European, for example, without sharing any recent common ancestor at all.
For actual relatedness, the kinship inference module on the Genome page is the right tool. It uses the KING-robust method to estimate the kinship coefficient between any two individuals, which translates directly into a relationship classification (parent, sibling, first cousin, second cousin, and so on). Kinship inference requires consent from the other user.
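To make the link between a kinship coefficient and a relationship class concrete, here is a minimal sketch using the commonly quoted KING inference cut-offs. The function name, the band labels, and the idea that the platform uses exactly these cut-offs are assumptions for illustration; the kinship module may use different or finer bands.

```python
# Commonly quoted KING kinship-coefficient cut-offs (lower bound of each band).
# Assumption: illustrative values only, not confirmed for this platform.
KING_BANDS = [
    (0.354, "duplicate or identical twin"),
    (0.177, "first degree (parent, child, or full sibling)"),
    (0.0884, "second degree (for example half sibling or grandparent)"),
    (0.0442, "third degree (for example first cousin)"),
]


def classify_kinship(phi: float) -> str:
    """Map a kinship coefficient to a coarse relationship band (hypothetical helper)."""
    for lower_bound, label in KING_BANDS:
        if phi > lower_bound:
            return label
    return "more distant than third degree, or unrelated"


print(classify_kinship(0.25))    # expected value for a parent or full sibling
print(classify_kinship(0.0625))  # expected value for a first cousin
```

The coefficient alone does not separate a parent from a full sibling, since both have an expected value of 0.25; KING-style tools typically use an additional statistic (the proportion of sites with zero alleles shared) for that. More distant classes such as second cousin would need further bands below the ones sketched here.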
How the algorithm chooses the five matches
- Pre-filter to candidates of the same broad ancestry-composition class (within ±15% on each major component). Without this filter the top results would tend to cluster near the global mean rather than come from genuinely close neighbours.
- Compute Euclidean distance in 3D PCA space against every remaining candidate. The PCA basis is calibrated so distance is approximately √Fst, the standard population-genetic distance metric.
- Rank by ascending distance and keep the top 50.
- Filter out users you have blocked, users who have blocked you, users with hideFromAISearch enabled, and the system's own demo and historical accounts.
- Return the top 5 from the filtered list.
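The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: the `User` fields, the function names, and the reading of "±15%" as an absolute difference of 0.15 in each component's fraction are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical record; field names are illustrative only.
    user_id: str
    pca: tuple[float, float, float]          # coordinates on the shared 3D PCA basis
    ancestry: dict[str, float]               # component name -> fraction (0..1)
    hide_from_ai_search: bool = False
    is_system_account: bool = False          # demo and historical accounts
    blocked: set[str] = field(default_factory=set)


def same_ancestry_class(a: User, b: User, tolerance: float = 0.15) -> bool:
    """True if every ancestry component differs by at most `tolerance`."""
    components = set(a.ancestry) | set(b.ancestry)
    return all(
        abs(a.ancestry.get(c, 0.0) - b.ancestry.get(c, 0.0)) <= tolerance
        for c in components
    )


def genetically_similar(me: User, others: list[User],
                        shortlist: int = 50, top_n: int = 5) -> list[User]:
    # Step 1: pre-filter to the same broad ancestry-composition class.
    candidates = [u for u in others
                  if u.user_id != me.user_id and same_ancestry_class(me, u)]
    # Steps 2 and 3: rank by Euclidean distance in PCA space, keep the closest 50.
    ranked = sorted(candidates, key=lambda u: math.dist(me.pca, u.pca))[:shortlist]
    # Step 4: drop blocked users (either direction), hidden users, system accounts.
    visible = [u for u in ranked
               if u.user_id not in me.blocked
               and me.user_id not in u.blocked
               and not u.hide_from_ai_search
               and not u.is_system_account]
    # Step 5: return the closest five that survive the filter.
    return visible[:top_n]
```

Because the visibility filter runs after the shortlist is cut to 50, the card can show fewer than five matches when many of the closest candidates are blocked, hidden, or system accounts.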
PCA distance vs embedding distance
The Networks recommender uses two distinct distance metrics for two distinct purposes. PCA distance measures genetic similarity (how close the two genomes are in ancestry space). Embedding distance measures combined similarity (genome plus interests plus haplogroups plus a small bag of profile signals, projected into a dense vector by an embedding model). The "Genetically Similar" card uses PCA distance only; the "Networks for you" recommender uses embedding distance, which can surface someone with very different ancestry but matching interests.
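A small sketch may help show why the two metrics can disagree. The PCA side follows the article (Euclidean distance over three coordinates). The article does not say which metric the embedding side uses, so cosine distance here is an assumption, and the vectors are made-up toy values rather than real embeddings.

```python
import math


def pca_distance(a, b):
    """Euclidean distance between two 3D PCA coordinates."""
    return math.dist(a, b)


def cosine_distance(u, v):
    """1 - cosine similarity; assumed metric for embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norms


# Toy example: two users far apart in ancestry space...
alice_pca, bob_pca = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
# ...whose (hypothetical) embeddings point in nearly the same direction.
alice_emb, bob_emb = [0.9, 0.1, 0.4], [0.8, 0.2, 0.4]

print(pca_distance(alice_pca, bob_pca))     # large: different ancestry
print(cosine_distance(alice_emb, bob_emb))  # small: similar overall profile
```

In this toy case the pair would not appear on each other's "Genetically Similar" card, yet could still be surfaced by an embedding-based recommender.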
Performance characteristics
The match query returns results with interactive latency at the current cohort size. The PCA distance computation is a three-axis Euclidean sum across all candidates, which is essentially free. The embedding-similarity query uses an approximate-nearest-neighbour index over the vector column, with recall and latency tuned for the Networks experience.