Plant Gene and Trait 2026, Vol.17, No.3, 156-172 http://genbreedpublisher.com/index.php/pgt 161

summary data while accounting for model uncertainty and LD structure (Evans et al., 2024). This class of methods emphasizes robustness in model selection and statistical inference. More recent developments include approaches that incorporate nonparametric or Bayesian modeling strategies (e.g., TIGAR and its extensions), as well as methods that integrate multiple priors to improve power (Parrish et al., 2022; Liang et al., 2025). These extensions broaden the applicability of TWAS across diverse data settings.

In practice, different methods reflect distinct priorities. Some emphasize the portability of expression weights across datasets, whereas others focus on model integration and uncertainty control. Regardless of the approach, TWAS results are typically interpreted in conjunction with fine-mapping and colocalization analyses, which help refine gene-level signals into more credible candidate regions (Li and Ritchie, 2021; Mai et al., 2023).

3.3 Limitations

Despite its utility in linking GWAS signals to functional interpretation, TWAS has several important limitations. First, the transferability of expression prediction models is often constrained. The estimated weights depend on the ancestry, LD structure, and tissue context of the reference dataset. When these differ from those of the target GWAS population, prediction accuracy may decline, leading to reduced statistical power and potential bias (Li and Ritchie, 2021; Mai et al., 2023). Multi-tissue approaches and expanded reference resources such as GTEx can mitigate this issue to some extent, but do not fully resolve it.

Second, TWAS results remain fundamentally associative. Because of LD, the weights used to predict expression may capture signals from variants that are correlated with, but not identical to, the true causal variant.
In addition, co-regulation and unobserved confounding can cause non-causal genes to appear significantly associated. As a result, interpreting TWAS findings as evidence of causal effects can be misleading (Wainberg et al., 2019; Evans et al., 2024). Simulation and methodological studies have shown that such interpretations may inflate false positive rates if not carefully controlled (Zhu and Zhou, 2020; De Leeuw et al., 2023).

For this reason, TWAS findings are typically evaluated alongside locus-level evidence. Colocalization analyses can be used to assess whether GWAS and expression signals are consistent with a shared underlying variant, while Mendelian randomization can provide additional support for potential causal relationships. This layered approach helps reduce misinterpretation and improves the reliability of downstream inference.

Finally, the scope of TWAS is limited by the availability and coverage of reference datasets. Current eQTL resources provide incomplete representation of rare variants, trans-regulatory effects, and noncoding RNAs, which constrains the comprehensiveness of the models. Future developments are likely to focus on more flexible modeling strategies, expanded multi-ancestry and multi-tissue datasets, and explicit modeling of genotype-by-environment interactions, particularly in plant and multi-environment studies (Parrish et al., 2022; Liang et al., 2025).

4 The Role and Limitations of Colocalization Analysis

In moving from GWAS signals toward functional interpretation, a central question is whether association signals observed in different data sources (such as GWAS and molecular QTL) reflect the same underlying genetic factors within a given genomic region. Colocalization analysis was developed to address this question by evaluating the consistency of signals across datasets and providing a basis for downstream interpretation.
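
To make the question of a shared underlying variant concrete, the sketch below shows one way it can be quantified in the style of COLOC-type methods, assuming at most one causal variant per dataset in the region. Per-variant approximate Bayes factors (Wakefield form) are combined into posterior probabilities for five hypotheses: no association (H0), association only in dataset 1 (H1) or only in dataset 2 (H2), distinct variants in the two datasets (H3), and a single shared variant (H4). The function names, the prior values `p1`, `p2`, `p12`, the effect prior `prior_sd`, and the toy effect sizes are illustrative assumptions rather than values taken from the studies cited here.

```python
import numpy as np


def log_abf(beta, se, prior_sd=0.15):
    """Per-variant log approximate Bayes factor (Wakefield form).

    prior_sd is an assumed prior standard deviation of the true effect.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    z = beta / se
    r = prior_sd**2 / (prior_sd**2 + se**2)  # shrinkage factor
    return 0.5 * (np.log(1.0 - r) + r * z**2)


def _logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one region.

    p1, p2, p12 are assumed per-variant prior probabilities of association
    with dataset 1 only, dataset 2 only, and both datasets.
    """
    labf1 = np.asarray(labf1, dtype=float)
    labf2 = np.asarray(labf2, dtype=float)
    n = labf1.size
    if n < 2 or labf2.size != n:
        raise ValueError("need two equal-length arrays with at least 2 variants")

    # H3 sums over ordered pairs of distinct variants (i != j).
    pair = labf1[:, None] + labf2[None, :]
    off_diag = pair[~np.eye(n, dtype=bool)]

    log_h = np.array([
        0.0,                                               # H0
        np.log(p1) + _logsumexp(labf1),                    # H1
        np.log(p2) + _logsumexp(labf2),                    # H2
        np.log(p1) + np.log(p2) + _logsumexp(off_diag),    # H3
        np.log(p12) + _logsumexp(labf1 + labf2),           # H4
    ])
    post = np.exp(log_h - _logsumexp(log_h))
    return dict(zip(("H0", "H1", "H2", "H3", "H4"), post))


# Toy region with five variants (made-up effect sizes, common standard error).
se = np.full(5, 0.03)
gwas_beta = np.array([0.01, 0.02, 0.30, 0.01, 0.00])
eqtl_same = np.array([0.00, 0.01, 0.25, 0.02, 0.01])   # peak at the same variant
eqtl_other = np.array([0.25, 0.01, 0.00, 0.02, 0.01])  # peak at a different variant

shared = coloc_posteriors(log_abf(gwas_beta, se), log_abf(eqtl_same, se))
distinct = coloc_posteriors(log_abf(gwas_beta, se), log_abf(eqtl_other, se))
print(shared)
print(distinct)
```

On this toy input the posterior mass should concentrate on H4 when both signals peak at the same variant and on H3 when they peak at different variants. The pairwise sum for H3 is written out explicitly for readability; it scales quadratically with the number of variants, and regions with several causal variants or strong sensitivity to the priors need additional handling that this sketch does not attempt.
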
Unlike association analysis within a single dataset, colocalization focuses on the correspondence between signals from different sources. The goal is to assess whether two signals can plausibly be explained by the same underlying variant, given the local LD structure and statistical uncertainty. In practice, this step serves to refine candidate regions, narrowing the focus from general associations to loci that are more likely to support coherent biological interpretation.

4.1 Statistical framework of colocalization

Widely used approaches such as COLOC adopt a Bayesian framework to compare a set of mutually exclusive hypotheses, including no association, association in only one dataset, independent associations in both datasets,
