Can polygenic scores predict educational outcomes?


In this blog, Dr. Emma Meaburn, our resident genetics expert, discusses the latest research in using direct measures of DNA variation to predict educational outcomes. Does this work? How can it be useful?

Individual differences in educational traits are heritable

We each carry, in every cell of our body, the complete set of genetic instructions to build a human, with the distinctly human characteristics of a highly developed brain and the capacity to reason and communicate. Despite the overarching genetic similarity between us, there are numerous – and important – differences in our DNA sequence. If you were to pick any two unrelated individuals at random and examine their DNA sequence, you would find that they differ at roughly 1 in every 1,200 DNA letters (bases). It is now beyond doubt that these genetic differences account for a portion of the differences we see between individuals in how they think, feel and behave. The proportion of the differences in a trait that can be attributed to genetic differences is termed ‘heritability’. Twin and DNA-based studies have robustly demonstrated that individual differences in educationally relevant traits such as time spent in education (Lee et al., 2018), general cognitive function (Davies et al., 2018) and even academic subjects studied (Rimfeld et al., 2016) are heritable. To illustrate the size of this genetic influence, a recent DNA-based study identified SNP-heritabilities (heritability estimated from directly measured common DNA variants) ranging from 41% to 53% for performance in National Curriculum-based Standardised Assessment Tests (SATs) of English, Maths and Science at 11 and 14 years of age (Donati et al., 2021).

Polygenic scores capture a portion of the heritability of educational traits

Let’s refer to a difference in a DNA base between individuals as a ‘genetic variant’. One key insight from recent large-scale genetic studies is that there are many thousands of common genetic variants that together contribute to the heritability of educational traits and outcomes. It transpires that even though each individual DNA variant makes a small contribution, they can be summed together into a single genetic ‘score’ that predicts a portion of the differences we observe or measure between people. This aggregate measure has been termed a ‘polygenic score’ (or polygenic index). To calculate a person’s polygenic score for a particular trait, you count the variants in their genome that are associated with higher or lower values of the trait and add them up, each weighted by the size of its effect. The polygenic score for number of ‘years of education’ completed predicts around 11% of the variance in years of schooling in adolescents and adults (Lee et al., 2018). To put the size of this explanatory power into context, as a predictor of child educational attainment the score does better than household income, although not quite as well as maternal education.
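The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration only: the variant names, effect sizes and genotype are invented for the example, and real scores combine many thousands of variants with weights taken from a large genetic study.

```python
from typing import Dict


def polygenic_score(genotype: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum each variant's allele count (0, 1 or 2) multiplied by its weight."""
    score = 0.0
    for variant, weight in weights.items():
        # Assumption for this sketch: a variant missing from the genotype
        # is counted as 0 copies.
        allele_count = genotype.get(variant, 0)
        score += weight * allele_count
    return score


# Hypothetical effect sizes: positive weights are associated with more years
# of education, negative weights with fewer.
weights = {"variant_a": 0.02, "variant_b": -0.01, "variant_c": 0.015}

# Hypothetical person: number of copies of each variant that they carry.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(person, weights)
print(f"Polygenic score: {score:.3f}")
```

On its own the number means little; it only becomes informative when compared with the scores of other people in the same population.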

Studies measuring DNA variation directly and attempting to predict educational outcomes struggled for a long time because the signal they were trying to detect was so tiny. The studies had to include many thousands of participants before statistically reliable links between DNA variation and years spent in education could be detected. Lee et al.’s (2018) study involved 1.1 million participants. In 2022, the same group of researchers pushed the number to over 3 million participants and reported that they could predict up to 16% of differences in educational attainment from direct measures of DNA variation (Okbay et al., 2022).
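It can help to translate these percentages into a more familiar statistic. If the polygenic score is treated as the only predictor in a simple linear model (an assumption made here for illustration, not a detail taken from the cited studies), the share of variance explained is the square of the correlation between the score and the outcome:

```latex
R^2 = r^2 \quad\Rightarrow\quad r = \sqrt{R^2}, \qquad
\sqrt{0.11} \approx 0.33, \qquad \sqrt{0.16} = 0.40
```

On that reading, the 11% and 16% figures correspond to correlations of roughly 0.33 and 0.40 between polygenic score and educational attainment.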

Polygenic scores for early identification of individuals at risk

Polygenic scores are normally distributed in a population: some people will have a higher score relative to everyone else and some will have a lower score, but most people will be close to the average. In an out-of-sample prediction, 75% of individuals in the top 10% of the ‘years of schooling’ polygenic distribution go to university, as compared to 25% of individuals in the bottom 10% (Plomin & von Stumm, 2018). Educational systems have limited resources, and these resources are currently targeted at interventions designed to support students who struggle. Given the same finite resources, low polygenic scores could be a mechanism for triggering in-person assessment or early (or more frequent) monitoring, before the emergence of overt problems. In principle, measures of DNA variation are available at birth.
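To make the idea of the ‘top 10%’ and ‘bottom 10%’ concrete, the sketch below simulates a normally distributed set of scores and finds the two cut-offs. The scores are randomly generated rather than real data, and the population size, seed and the choice of the bottom decile as the monitoring threshold are all assumptions made purely for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated standardised polygenic scores (mean 0, standard deviation 1).
scores = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# statistics.quantiles with n=10 returns the 9 cut points between deciles.
deciles = statistics.quantiles(scores, n=10)
low_cutoff = deciles[0]    # boundary of the bottom 10%
high_cutoff = deciles[-1]  # boundary of the top 10%

# Hypothetical rule: scores at or below the low cut-off prompt closer monitoring.
flagged = [s for s in scores if s <= low_cutoff]

print(f"Bottom-decile cut-off: {low_cutoff:.2f}")
print(f"Top-decile cut-off: {high_cutoff:.2f}")
print(f"Share flagged: {len(flagged) / len(scores):.1%}")
```

A rule of this kind flags a fixed share of the population by construction, so it says nothing by itself about how many of the flagged individuals would go on to struggle.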

The (current) challenges for polygenic prediction

Aside from the (very real) practical and ethical challenges of requiring genetic data for children, what are the key barriers for polygenic prediction of educational attainment?

Firstly, it is important to remember that polygenic scores indicate propensity, not inevitability. This is because they do not capture all genetic effects, and genetic effects will always be contingent upon the (home and school) environments in which we grow up. This means that many individuals born with a low polygenic score will still flourish academically. Conversely, individuals with very high polygenic scores may not perform well academically for other reasons, such as experiencing a large environmental risk or having genetic effects not captured by the polygenic score. Research to identify the full spectrum of genetic effects is ongoing (Ganna et al., 2016), but in parallel we need a better understanding of how polygenic effects vary as a function of the environment (Domingue et al., 2020).

Secondly, the studies from which polygenic scores are derived have been limited to populations with European genetic ancestries, and the current Educational Attainment (EA) polygenic scores are not as accurate in their predictions in non-European samples. This severely limits generalisability, and risks increasing economic and education disparities between European and non-European populations (Martin et al., 2019). To redress this imbalance, culturally and ancestrally diverse genetic studies are a research priority, but the results will take time to feed through (Peterson et al., 2019).

Thirdly, polygenic scores for educational prediction will arguably remain of limited practical value until we know what the optimal environments are that will maximise genetic potential. For this, we need a much better understanding of how polygenic influences impact molecular, biological and neural processes to cause cognitive and behavioural differences between people. Important research is addressing this question (see Deary et al., 2022; van der Meer & Kaufmann, 2022), but we are still some way off from having a good explanatory account of polygenic effects.


Finally, even if these challenges were overcome, a central question to ask is how does society want polygenic scores to be used in education? An analogy can be made with attainment-based selection and streaming in schools, the benefits of which continue to be debated (Rix & Ingham, 2021). The arguments are the same, but now we are one step removed and dealing with a marker of academic potential, rather than realised performance. For example, polygenic scores could theoretically be used to personalise educational provision and maximise every student’s educational potential. Alternatively, they could be used to focus resources and identify students deemed to have genetically endowed promise. The answer to this difficult – but important – question is not clear cut.

Future perspectives

So where does this leave us? Polygenic scores should not be ignored, but the hype (and concern) around them needs to be informed by what they can and cannot realistically deliver. Polygenic scores will never definitively predict complex educational outcomes, as heritability is not 100%. However, they do predict (statistically) meaningful differences in educational traits between individuals in a population, and this predictive power is likely to increase.

If their potential in educational settings is to be actualised, we need a clearer understanding of how they relate to, and can be integrated with, existing (non-genetic) measures of educational performance and potential. Only then can we progress in a way that ensures educational and social inequalities in the classroom are mitigated rather than exacerbated.

If you are interested in these topics, see our recent CEN seminar discussing the book “The Genetic Lottery” by behavioural geneticist Kathryn Paige Harden.


Davies, G., Lam, M., Harris, S.E. et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat Commun 9, 2098 (2018).

Deary, I.J., Cox, S.R. & Hill, W.D. Genetic variation, brain, and intelligence differences. Mol Psychiatry 27, 335–353 (2022).

Domingue, Benjamin W., Sam Trejo, Emma Armstrong Carter, and Elliot M. Tucker-Drob. 2020. “Interactions between Polygenic Scores and Environments: Methodological and Conceptual Challenges.” Sociological Science 7: 465-486.

Donati, G., Dumontheil, I., Pain, O. et al. Evidence for specificity of polygenic contributions to attainment in English, maths and science during adolescence. Sci Rep 11, 3851 (2021).

Ganna A, Genovese G, Howrigan DP, Byrnes A, et al. (2016). Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nat Neurosci. 2016 Dec;19(12):1563-1565. doi: 10.1038/nn.4404. Epub 2016 Oct 3. PMID: 27694993; PMCID: PMC5127781.

Lee, J.J., Wedow, R., Okbay, A. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet 50, 1112–1121 (2018).

Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019 Apr;51(4):584-591. doi: 10.1038/s41588-019-0379-x. Epub 2019 Mar 29. Erratum in: Nat Genet. 2021 May;53(5):763. PMID: 30926966; PMCID: PMC6563838.

Okbay A, Wu Y, Wang N, et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat Genet. 2022 Apr;54(4):437-449. doi: 10.1038/s41588-022-01016-z.

Harden, K.P. (2021). The Genetic Lottery: Why DNA Matters for Social Equality. Princeton University Press.

Peterson RE, Kuchenbaecker K, Walters RK, et al. (2019). Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell. 2019 Oct 17;179(3):589-603. doi: 10.1016/j.cell.2019.08.051. Epub 2019 Oct 10. PMID: 31607513; PMCID: PMC6939869.

Plomin R, von Stumm S. The new genetics of intelligence. Nat Rev Genet. 2018;19:148–59.

Rimfeld K, Ayorech Z, Dale PS, Kovas Y, Plomin R. Genetics affects choice of academic subjects as well as achievement. Sci Rep. 2016 Jun 16;6:26373. doi: 10.1038/srep26373. PMID: 27310577; PMCID: PMC4910524.

Rix, J., & Ingham, N. (2021). The impact of education selection according to notions of intelligence: A systematic literature review. International Journal of Educational Research Open, Volume 2, 100037.

van der Meer, D., Kaufmann, T. Mapping the genetic architecture of cortical morphology through neuroimaging: progress and perspectives. Transl Psychiatry 12, 447 (2022).
