
Computational Molecular Biology 2026, Vol.16, No.3, 159-180 http://bioscipublisher.com/index.php/cmb

Traditional heritability estimation has relied primarily on pedigree-based variance component models. These models infer additive genetic variance by relating the phenotypic similarity among individuals to their expected genetic relatedness as derived from the pedigree (Yang et al., 2017; Srivastava et al., 2023). However, these methods depend heavily on the completeness of pedigree records and often rest on simplified assumptions about shared environmental effects. In populations that lack detailed pedigree records or are affected by environmental confounding, both their applicability and their accuracy are limited (Zhu and Zhou, 2020). The widespread adoption of high-throughput genotyping and the rise of genome-wide association studies (GWAS) brought a methodological shift to the field. Yang et al. (2010; 2011) proposed the genome-wide complex trait analysis (GCTA) framework based on single nucleotide polymorphisms (SNPs) and developed within it the genomic-relatedness-based restricted maximum likelihood (GREML) method, which is built on the genetic relationship matrix (GRM). The approach constructs a GRM from SNP-derived genetic similarity among individuals after close relatives have been removed, so that shared family environment and untagged causal variants shared within families do not inflate the estimate. It then partitions phenotypic variance into genetic and residual components, circumventing several limitations of traditional pedigree-based methods (Yang et al., 2011; Zhu and Zhou, 2020). Unlike pedigree models, GCTA-GREML estimates heritability directly from SNP data without requiring pedigree information. The resulting quantity is SNP-based heritability (h²SNP), the proportion of phenotypic variance explained jointly by the genotyped SNPs. The framework also supports partitioning genetic variance by genomic region or functional annotation, which substantially expands the scope of heritability estimation (Zhu and Zhou, 2020; Srivastava et al., 2023).
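To make the pedigree-based starting point concrete, the expected additive relationship among relatives can be computed from a pedigree with the classical recursive (tabular) method. The following is a minimal pure-Python sketch under stated assumptions: individuals are listed so that parents precede their offspring, unknown parents are coded as None, and the function name `numerator_relationship_matrix` is hypothetical rather than part of any package. In a pedigree variance-component model, this matrix plays the role that the SNP-derived GRM plays in GREML.

```python
from typing import Dict, List, Optional, Tuple

# Each record: (individual, sire, dam); None marks an unknown parent.
Record = Tuple[str, Optional[str], Optional[str]]


def numerator_relationship_matrix(pedigree: List[Record]) -> Tuple[List[str], List[List[float]]]:
    """Additive (numerator) relationship matrix via the recursive tabular method.

    Assumes parents appear before their offspring in `pedigree`.
    """
    index: Dict[str, int] = {}
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (ind, sire, dam) in enumerate(pedigree):
        parents = []
        for parent in (sire, dam):
            if parent is not None and parent not in index:
                raise ValueError(f"parent {parent!r} of {ind!r} must be listed earlier")
            parents.append(index[parent] if parent is not None else None)
        s, d = parents
        # Off-diagonal: average of j's relationships with i's two parents.
        for j in range(i):
            a_js = a[j][s] if s is not None else 0.0
            a_jd = a[j][d] if d is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, where the inbreeding coefficient F_i = a(sire, dam) / 2.
        a[i][i] = 1.0 + (0.5 * a[s][d] if s is not None and d is not None else 0.0)
        index[ind] = i
    return [rec[0] for rec in pedigree], a


if __name__ == "__main__":
    # Hypothetical pedigree: two unrelated founders, two full sibs,
    # and one offspring of a full-sib mating.
    ped = [("S", None, None), ("D", None, None), ("K1", "S", "D"),
           ("K2", "S", "D"), ("X", "K1", "K2")]
    ids, A = numerator_relationship_matrix(ped)
    for name, row in zip(ids, A):
        print(name, [round(v, 3) for v in row])
```

In this hypothetical pedigree the full sibs K1 and K2 have an expected relationship of 0.5, each parent-offspring pair 0.5, the unrelated founders 0, and X, the offspring of a full-sib mating, has a diagonal element of 1.25 (inbreeding coefficient 0.25).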
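The variance-component logic behind GREML can be written compactly. The notation below follows the standard presentation of Yang et al. (2011) as we understand it; the symbols are introduced here for illustration and are not defined elsewhere in this section: y is the phenotype vector, X the design matrix of fixed effects (e.g., covariates), x_ij ∈ {0, 1, 2} the number of copies of the reference allele at SNP i for individual j, p_i that allele's frequency, and N the number of SNPs. The GRM formula gives off-diagonal elements only; GCTA applies a separate adjustment to the diagonal. In a pedigree model, A is replaced by the expected relationship matrix computed from the pedigree. The last line shows the multi-component form used when SNPs are partitioned into K groups (e.g., by genomic region or annotation), each with its own GRM A_k.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},\qquad
\mathbf{g} \sim N(\mathbf{0},\, \mathbf{A}\sigma_g^2),\qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0},\, \mathbf{I}\sigma_\varepsilon^2)

\mathbf{V} = \operatorname{var}(\mathbf{y}) = \mathbf{A}\sigma_g^2 + \mathbf{I}\sigma_\varepsilon^2,\qquad
h^2_{\mathrm{SNP}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_\varepsilon^2}

A_{jk} = \frac{1}{N}\sum_{i=1}^{N}\frac{(x_{ij}-2p_i)(x_{ik}-2p_i)}{2p_i(1-p_i)}\qquad (j \neq k)

\ell_{\mathrm{REML}} = -\tfrac{1}{2}\left(\log\lvert\mathbf{V}\rvert
  + \log\lvert\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\rvert
  + \mathbf{y}^{\top}\mathbf{P}\mathbf{y}\right) + \text{const},\qquad
\mathbf{P} = \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}\left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{V}^{-1}

\mathbf{V} = \sum_{k=1}^{K}\mathbf{A}_k\sigma_{g,k}^2 + \mathbf{I}\sigma_\varepsilon^2,\qquad
h^2_{\mathrm{SNP}} = \frac{\sum_{k}\sigma_{g,k}^2}{\sum_{k}\sigma_{g,k}^2 + \sigma_\varepsilon^2}
```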
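The GRM construction and variance decomposition can also be sketched numerically. The example below is a minimal NumPy illustration, not GCTA's implementation. It standardizes each SNP by its sample standard deviation, whereas GCTA scales by 2p(1-p) and adjusts the diagonal. It also estimates h²SNP for an intercept-only model by a grid search over the restricted likelihood, computed from orthonormal error contrasts and an eigendecomposition, instead of the average-information REML iterations used in practice. The function names (`standardized_grm`, `reml_h2_grid`) and the simulation settings are hypothetical.

```python
import numpy as np


def standardized_grm(genotypes: np.ndarray) -> np.ndarray:
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Each polymorphic SNP is centred and scaled by its sample standard
    deviation, giving A = Z Z^T / M. (GCTA scales by 2p(1-p) and uses a
    bias-adjusted diagonal; this simpler form is for illustration.)
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]               # drop monomorphic SNPs
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    return z @ z.T / z.shape[1]


def reml_h2_grid(y: np.ndarray, grm: np.ndarray, n_grid: int = 199) -> float:
    """Grid-search REML estimate of h2 for y = mu + g + e with var(g) = A * sg2.

    Uses error contrasts K orthogonal to the intercept, so the restricted
    likelihood is that of K^T y, with K^T V K = s2 * (h2 * K^T A K + (1 - h2) * I).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Orthonormal contrasts K (n x n-1): eigenvectors of the centring matrix I - J/n.
    c_vals, c_vecs = np.linalg.eigh(np.eye(n) - 1.0 / n)
    k = c_vecs[:, c_vals > 0.5]
    d, u = np.linalg.eigh(k.T @ grm @ k)      # K^T A K = U diag(d) U^T
    y_rot = u.T @ (k.T @ y)                   # rotated contrasts have diagonal covariance
    m = y_rot.size
    best_h2, best_ll = 0.0, -np.inf
    for h2 in np.linspace(0.0, 1.0, n_grid + 2)[1:-1]:   # interior grid points only
        w = h2 * d + (1.0 - h2)               # eigenvalues of h2*K^T A K + (1-h2)*I
        s2 = np.mean(y_rot ** 2 / w)          # profiled total variance
        ll = -0.5 * (m * np.log(s2) + np.sum(np.log(w)) + m)
        if ll > best_ll:
            best_h2, best_ll = float(h2), ll
    return best_h2


if __name__ == "__main__":
    rng = np.random.default_rng(2026)
    n_ind, n_snp, h2_true = 400, 2000, 0.5    # hypothetical simulation settings
    freqs = rng.uniform(0.05, 0.5, size=n_snp)
    geno = rng.binomial(2, freqs, size=(n_ind, n_snp)).astype(float)
    z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2_true / n_snp), size=n_snp)
    y = z @ beta + rng.normal(0.0, np.sqrt(1 - h2_true), size=n_ind)
    grm = standardized_grm(geno)
    print("mean GRM diagonal:", round(float(np.mean(np.diag(grm))), 3))
    print("REML h2_SNP estimate:", reml_h2_grid(y, grm))
```

Because the contrasts are orthogonal to the all-ones vector, maximizing this likelihood is equivalent to REML for a model whose only fixed effect is the mean. With more covariates, K would need to be orthogonal to every column of X.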
However, the introduction of the GCTA-GREML framework also triggered extensive debate about “missing heritability.” Classical twin and pedigree studies often yield relatively high heritability estimates, whereas SNP-based GREML estimates are typically substantially lower. This discrepancy has been interpreted as evidence that the SNPs assayed in GWAS cannot fully account for the genetic variation underlying complex traits (Yang et al., 2011; 2015). Proposed explanations include incomplete tagging of causal variants by genotyped SNPs, poor capture of rare-variant contributions, complex genetic mechanisms such as non-additive effects and gene-environment interactions, and limitations of the statistical models themselves (Speed et al., 2016; Evans et al., 2017; Mathew et al., 2017). Furthermore, studies have shown that GCTA-GREML estimates are highly sensitive to the method used to construct the GRM, to sample composition, to linkage disequilibrium (LD) patterns, and to phenotypic measurement error, which underscores the complexity of its application and the need for careful interpretation (Speed et al., 2012; Kumar et al., 2015; Evans et al., 2017). Missing heritability is therefore not only a statistical challenge but also a genetic and biological one, and the associated debates have driven continuous methodological and theoretical innovation. In recent years, refinements such as LD-adjusted relationship matrices and multi-component models have been proposed as potential remedies for the limitations of the original GCTA-GREML framework (Mathew et al., 2017; Zhu and Zhou, 2020). In crop breeding practice in China, DNA marker-assisted breeding was systematically summarized and promoted from the late 20th to the early 21st century; its core idea is to track quantitative trait loci (QTLs) or candidate genes with a limited number of molecular markers, thereby improving selection efficiency (Fang et al., 2001).
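The twin-versus-SNP discrepancy is often summarized by comparing a twin-based estimate with h²SNP. The sketch below assumes the classical twin design in which Falconer's formula h² = 2(r_MZ − r_DZ) applies (equal shared environments for MZ and DZ pairs, purely additive genetic effects). The function names and all numbers in the example are hypothetical placeholders, not results from any cited study.

```python
from typing import Tuple


def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Twin-based heritability from MZ and DZ phenotypic correlations (Falconer's formula)."""
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


def heritability_gap(h2_twin: float, h2_snp: float) -> Tuple[float, float]:
    """Absolute gap and the fraction of the twin estimate not captured by SNPs."""
    if h2_twin <= 0:
        raise ValueError("h2_twin must be positive to express a fraction")
    gap = h2_twin - h2_snp
    return gap, gap / h2_twin


if __name__ == "__main__":
    h2_twin = falconer_h2(r_mz=0.80, r_dz=0.50)            # hypothetical correlations
    gap, frac = heritability_gap(h2_twin, h2_snp=0.30)     # hypothetical h2_SNP
    print(f"twin h2 = {h2_twin:.2f}, gap = {gap:.2f}, fraction missing = {frac:.2f}")
```

The arithmetic is trivial, but it makes explicit that the “missing” fraction depends on the assumptions behind both estimates: violations of the twin-model assumptions inflate the numerator just as incomplete tagging deflates h²SNP.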
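The sensitivity to phenotypic measurement error follows from simple variance arithmetic. Assume classical measurement error, independent of genotype and of the true phenotype, with variance σ²_m added to a phenotype whose true variance is σ²_P. The genetic variance is then unchanged while the observed phenotypic variance grows, so the expected heritability estimate, whether SNP- or pedigree-based, is multiplied by the reliability σ²_P / (σ²_P + σ²_m). The function name and numbers below are hypothetical.

```python
def attenuated_h2(h2_true: float, var_pheno: float, var_error: float) -> float:
    """Heritability expected from an error-prone phenotype.

    Assumes measurement error is independent of genotype and of the true
    phenotype, so var_g = h2_true * var_pheno is unchanged while the
    observed variance becomes var_pheno + var_error.
    """
    if var_pheno <= 0 or var_error < 0:
        raise ValueError("var_pheno must be > 0 and var_error >= 0")
    var_g = h2_true * var_pheno
    return var_g / (var_pheno + var_error)


if __name__ == "__main__":
    # Hypothetical trait: true h2 = 0.5, error variance equal to 25% of the true variance.
    print(attenuated_h2(h2_true=0.5, var_pheno=1.0, var_error=0.25))
```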
This study systematically reviews the theoretical framework and statistical assumptions of GCTA and GREML relative to pedigree-based methods, clarifying their conceptual positioning and the boundaries of their applicability in heritability estimation. The analytical framework adopted here is consistent with our previous systematic examination of the statistical continuity among linkage analysis, candidate gene strategies, and GWAS, which emphasized both the continuity and the division of roles among these methods in terms of statistical assumptions, signal scale, and inferential objectives (Fang and Wu, 2026). We focus on the derivation logic of the GREML method within variance component modeling, compare its estimand and interpretive scope with those of traditional pedigree models, and discuss how model assumptions can affect the interpretation of results. Against this background, this study does not aim to provide a general introductory overview; rather, it focuses on the core issue of the “statistical interpretability boundaries of SNP-based heritability estimation,” with the goal of constructing an operational framework for analysis and interpretation. Specifically, the study addresses
