Imputing gene expression data for large-scale association studies

We are used to imputing genotypic data, but can we do the same for gene expression data? Yes, according to Gusev et al., whose 2016 paper on the topic has already been cited >30 times. The motivation is practical: measuring gene expression directly in the very large samples needed to detect small effect sizes is expensive, and it is also limited by specimen availability. How does their approach work, in short? The authors start from a small reference cohort in which both genotypic and gene expression data are available. eQTLs represent the genetic effect on gene expression, so the SNP component of the eQTLs can be used as a predictor of gene expression in individuals for whom only genotypes are available. Cross-validation was used to evaluate the performance of the predictors. One limitation of the study was that the prediction accuracy was lower than the theoretically expected accuracy. Another weakness is possible confounding: the phenotype, rather than the genotype, may be what causes the gene expression changes.
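To make the workflow concrete, here is a minimal sketch in Python on simulated data. It is an illustration under stated assumptions, not the authors' implementation: ridge regression stands in for whatever predictive model the paper used, and all names (`fit_ridge`, `cv_r2`, `G_ref`, `G_gwas`), the sample sizes, and the penalty `lam` are hypothetical choices made for the example. The sketch trains SNP weights for one gene in a small reference cohort, checks them by cross-validation, and then applies them to a larger genotype-only cohort.

```python
import numpy as np

def fit_ridge(G, y, lam):
    """Closed-form ridge weights for centered genotypes G and centered expression y."""
    p = G.shape[1]
    return np.linalg.solve(G.T @ G + lam * np.eye(p), G.T @ y)

def cv_r2(G, y, lam, k=5, seed=0):
    """k-fold cross-validated squared correlation of predicted vs. observed expression."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Center using training samples only, so the held-out fold stays unseen.
        mu_g, mu_y = G[train].mean(axis=0), y[train].mean()
        w = fit_ridge(G[train] - mu_g, y[train] - mu_y, lam)
        pred[fold] = (G[fold] - mu_g) @ w + mu_y
    return np.corrcoef(pred, y)[0, 1] ** 2

# Simulated data (assumption): 50 SNPs near one gene, 3 of them true eQTLs.
rng = np.random.default_rng(42)
n_ref, n_gwas, n_snps = 300, 5000, 50
maf = rng.uniform(0.05, 0.5, n_snps)
G_ref = rng.binomial(2, maf, size=(n_ref, n_snps)).astype(float)    # genotypes + expression
G_gwas = rng.binomial(2, maf, size=(n_gwas, n_snps)).astype(float)  # genotypes only
true_w = np.zeros(n_snps)
true_w[:3] = [0.8, -0.5, 0.4]
expr_ref = G_ref @ true_w + rng.normal(0.0, 1.0, n_ref)

# Step 1: evaluate the predictor by cross-validation in the reference cohort.
r2 = cv_r2(G_ref, expr_ref, lam=10.0)

# Step 2: refit on the full reference cohort and impute expression in the large cohort.
mu_g = G_ref.mean(axis=0)
weights = fit_ridge(G_ref - mu_g, expr_ref - expr_ref.mean(), lam=10.0)
imputed_expr = (G_gwas - mu_g) @ weights + expr_ref.mean()

print(f"cross-validated R^2 in reference cohort: {r2:.3f}")
```

The imputed values in `imputed_expr` could then be tested for association with a trait measured in the large cohort. Note that the cross-validated accuracy is bounded by how much of the expression variance is genetic in the first place, which is one reason imputed expression can fall short of expectations.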