Multicollinearity in the era of statistical genomics: proposals for incorporating dependencies between molecular covariates and their application in animal breeding
Contact: Dr. Dörte Wittenburg
Duration: 2017-2020
Funding: Deutsche Forschungsgemeinschaft, DFG WI 4450/2-1
Abstract:
In animal breeding, molecular data (e.g. single nucleotide polymorphisms, SNPs) are included as predictor variables in statistical models to improve the genomic evaluation of animals. This yields more precise estimates of the breeding values of animals that have not yet been phenotyped, which is important for breeding decisions, and helps to elucidate the genetic architecture of traits. Not only the size of an effect is relevant but also its position on the genome: especially now that high-dimensional SNP data are available, a causative variant can be pinpointed to a specific base pair. However, as the number of SNPs, and with it the number of model parameters, keeps growing, multicollinearity between covariates can affect the results of whole-genome regression methods. The objective of this study is to additionally account for the dependencies between the molecular covariates, which arise from linkage and linkage disequilibrium between chromosome segments, in order to estimate SNP effects more accurately. The theoretical covariance between SNP genotypes can be used to filter the full set of SNPs down to fewer but representative predictor variables. Furthermore, a joint approach is proposed that allows the simultaneous selection and shrinkage of relevant predictors. It is hypothesised that this method meets the requirements of genomic evaluation: dependencies between SNPs are taken into account, smooth estimates are obtained within groups of highly correlated SNPs, and the solution is sparse both among and within these groups. Thus, genomic regions that affect a trait can be identified.
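The following Python sketch illustrates two of the ideas above in a simplified setting. It is not the project's method. First, it assumes Hardy-Weinberg equilibrium and independent parental gametes, so that for additive genotype codes (0/1/2) the variance of SNP j is 2 p_j (1 - p_j) and the covariance between SNPs j and k is 2 D_jk, where D is the linkage disequilibrium coefficient. This population-level formula is an assumption and may differ from the theoretical covariance derived in the project, for example for specific family structures. The hypothetical helper `filter_snps` then greedily keeps a SNP only if its absolute correlation with every SNP already kept stays below a threshold. Second, as one well-known penalty whose solutions are sparse both among and within groups, it shows the proximal step of the sparse group lasso. Note that this penalty does not by itself produce the smooth within-group estimates described above. All function names and parameters (`genotype_covariance`, `filter_snps`, `sparse_group_prox`, `threshold`, `lam`, `alpha`, `step`) are illustrative.

```python
import numpy as np


def genotype_covariance(p, D):
    """Covariance of additive genotype codes (0/1/2) under HWE (assumption).

    p: allele frequencies, shape (m,)
    D: LD coefficients D_jk, shape (m, m); only off-diagonal entries are used.
    """
    p = np.asarray(p, dtype=float)
    cov = 2.0 * np.asarray(D, dtype=float)          # Cov(x_j, x_k) = 2 D_jk
    np.fill_diagonal(cov, 2.0 * p * (1.0 - p))      # Var(x_j) = 2 p_j (1 - p_j)
    return cov


def filter_snps(cov, threshold=0.8):
    """Greedy filter: keep SNP j only if |corr| with all kept SNPs < threshold."""
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    kept = []
    for j in range(cov.shape[0]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept


def soft_threshold(v, tau):
    """Element-wise soft-thresholding (proximal operator of tau * L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def sparse_group_prox(v, groups, lam, alpha, step=1.0):
    """Proximal step of the sparse group lasso penalty
    lam * ((1 - alpha) * sum_g sqrt(p_g) * ||b_g||_2 + alpha * ||b||_1)."""
    v = np.asarray(v, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(v)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        # L1 part: zeroes out individual SNPs within the group.
        z = soft_threshold(v[idx], step * alpha * lam)
        norm = np.linalg.norm(z)
        # Group part: zeroes out the whole group if its norm is small.
        thr = step * (1.0 - alpha) * lam * np.sqrt(idx.size)
        if norm > thr:
            out[idx] = (1.0 - thr / norm) * z
    return out


if __name__ == "__main__":
    D = np.zeros((3, 3))
    D[0, 1] = D[1, 0] = 0.2                         # SNPs 0 and 1 in strong LD
    cov = genotype_covariance([0.5, 0.5, 0.3], D)
    print(filter_snps(cov, threshold=0.7))          # SNP 1 is redundant with SNP 0
    print(sparse_group_prox([3.0, -0.5, 0.2, 0.1], [0, 0, 1, 1], lam=1.0, alpha=0.5))
```

In this toy example, SNPs 0 and 1 have a genotype correlation of 0.8, so the filter keeps SNPs 0 and 2. The proximal step sets the second group entirely to zero (sparsity among groups) and also zeroes the second SNP of the first group (sparsity within a group). In a full estimation routine, such a step would be applied repeatedly inside a proximal gradient loop.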