
Bioscience Methods 2026, Vol.17, No.3, 153-168
http://bioscipublisher.com/index.php/bm

Research Article    Open Access

SNP-Based Heritability Is Not a Parameter but a Model-Defined Estimand: Evidence from UK Biobank

Xuanjun Fang
Hainan Provincial Key Laboratory of Crop Molecular Breeding, Hainan Institute of Tropical Agricultural Resources (HITAR), Sanya, 572025
Corresponding author: xuanjunfang@hitar.org

doi: 10.5376/bm.2026.17.0013
Received: 06 Apr., 2026    Accepted: 07 May, 2026    Published: 18 May, 2026

Copyright © 2026 Fang. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:
Fang X.J., 2026, SNP-based heritability is not a parameter but a model-defined estimand: evidence from UK Biobank, Bioscience Methods, 17(3): 153-168 (doi: 10.5376/bm.2026.17.0013)

Abstract SNP-based heritability is widely interpreted as a fundamental property of complex traits, yet estimates vary substantially across methods. Here we show that this variation arises because different approaches do not estimate the same quantity: SNP-based heritability is a model-defined estimand rather than a single biological parameter. Using UK Biobank height data as a representative case, we systematically compare estimates from individual-level methods (GCTA-GREML and related estimators) and summary-statistics-based approaches (LD Score Regression and SumHer). We find that GREML-based methods consistently yield higher estimates (~0.60-0.69), LDSC produces systematically lower values (~0.56), and SumHer yields intermediate or higher estimates (~0.63). These differences persist under matched samples and SNP sets, indicating that they cannot be attributed to sampling variation alone.
We demonstrate that the discrepancies arise from differences in data representation, model assumptions, and the treatment of linkage disequilibrium (LD) and allele frequency. Accordingly, each method targets a distinct estimand: GREML captures variance explained through genomic relationships, LDSC estimates LD-weighted marginal effects, and SumHer models MAF- and LD-dependent architectures. This framework resolves apparent inconsistencies in SNP heritability estimates and clarifies that cross-method comparisons are generally not statistically valid without alignment of underlying assumptions. More broadly, our results redefine SNP-based heritability as a model-dependent functional determined by SNP coverage, LD structure, and estimation framework. These findings provide a principled basis for interpreting heritability estimates and have implications for genetic studies ranging from biobank-scale analyses to genomic prediction.

Keywords SNP heritability; Estimand; Estimand mismatch; GCTA-GREML; LD Score Regression (LDSC); UK Biobank; Linkage disequilibrium; Genetic architecture

1 Introduction

Heritability is a central parameter in quantitative genetics, used to quantify the contribution of genetic factors to phenotypic variation. In classical frameworks, heritability is typically estimated using pedigree or twin-based designs, where genetic variance is inferred from known relatedness structures. However, with the advent of genome-wide association studies (GWAS) and high-throughput genotyping technologies, the paradigm of heritability estimation has undergone a fundamental shift, from pedigree-based inference to SNP-based heritability ($h^2_{\mathrm{SNP}}$) derived from molecular markers (Yang et al., 2010). The evolution of statistical genetic methods, from linkage analysis and candidate gene approaches to GWAS, has fundamentally reshaped how genetic variation is quantified and interpreted (Fang and Wu, 2026).
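
For orientation, the three model-defined quantities named in the abstract can be written in the standard forms commonly given in the method literature. This is a sketch, not this article's own derivation: the symbols ($\mathbf{Z}$ for standardized genotypes, $M$ SNPs, $N$ individuals, LD score $\ell_j$, confounding term $a$, LD weight $w_j$, allele frequency $f_j$, and scaling exponent $\alpha$) are introduced here for illustration and are not taken from the article.

```latex
% GREML (individual-level data): variance components under a linear mixed
% model with a genomic relationship matrix A built from standardized genotypes Z
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad
\mathbf{g} \sim \mathcal{N}(\mathbf{0}, \mathbf{A}\sigma_g^2),
\quad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}\sigma_e^2),
\quad
\mathbf{A} = \tfrac{1}{M}\mathbf{Z}\mathbf{Z}^{\top}
\]
\[
h^2_{\mathrm{SNP}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
\]

% LDSC (summary statistics): the slope of GWAS chi-square statistics
% regressed on LD scores identifies h^2; the intercept absorbs confounding
\[
\mathbb{E}\!\left[\chi_j^2 \mid \ell_j\right]
  = \frac{N\, h^2_{\mathrm{SNP}}}{M}\,\ell_j + N a + 1,
\qquad
\ell_j = \sum_{k} r_{jk}^2
\]

% SumHer / LDAK-type heritability model: expected per-SNP heritability
% depends on allele frequency f_j and an LD-based weight w_j
\[
\mathbb{E}\!\left[h_j^2\right] \;\propto\; w_j \,\bigl[f_j(1-f_j)\bigr]^{1+\alpha}
\]
```

Setting $w_j = 1$ and $\alpha = -1$ in the last expression gives equal expected heritability for every SNP, which is the assumption implicit in the standard GRM above. Other choices of $w_j$ and $\alpha$ define a different target quantity, which is the sense in which the estimand depends on the model.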
Within this paradigm shift, SNP-based heritability estimation, particularly under the GCTA-GREML framework, represents a transition from pedigree-based inference to genotype-driven variance decomposition (Fang, 2026). SNP-based heritability is typically defined as the proportion of phenotypic variance explained by observed or imputed SNP markers across the genome. Its estimation is commonly based on linear mixed models (LMMs) or their extensions. Among these, the GCTA-GREML framework estimates genetic variance components using individual-level genotype data by constructing a genomic relationship matrix (GRM), and is widely regarded as approximately unbiased and statistically efficient under appropriate model assumptions (Yang et al., 2016). In contrast, LD Score Regression (LDSC) and its extensions (e.g., S-LDSC) estimate heritability using GWAS summary statistics, enabling large-scale analyses when individual-level data are unavailable (Bulik-Sullivan et al.,
