Predictive modeling using genome-wide DNA methylation data
Lily Wang, Lanyu Zhang, Gabriel Odom, Lizhong Liu, Tiago Chedraoui Silva
University of Miami
Abstract
In the search for predictive methylation signatures for complex diseases, previous studies have identified valuable DNA methylation-based biomarkers. However, almost all of these studies built prediction models on single CpGs, without considering the methylation status of neighboring CpG sites. Compared with single CpGs, differentially methylated regions (DMRs) give higher confidence and are more likely to be biologically important. It remains unclear, however, how to build DNA methylation-based prediction models that take advantage of the regional nature of methylation data. Using datasets from several large epigenome-wide association studies of Alzheimer’s disease, we compared different strategies for identifying genomic regions and for summarizing methylation levels at multiple CpGs within those regions when building prediction models. To evaluate the performance of the different prediction models, we used five-fold cross-validation with 10 repetitions. Our results showed that identifying co-methylated clusters within genomic regions improves prediction performance, whereas representing a region by its most significant CpGs decreases prediction performance compared with alternative summary measures such as the mean or median.
Keywords: DNA methylation, prediction modeling
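
The abstract mentions two mechanical steps that a short sketch may make more concrete: collapsing the methylation levels of several CpGs in a region into one summary value (mean or median), and evaluating models with five-fold cross-validation repeated 10 times. The Python below is an illustration only, not the authors' pipeline. The function names `summarize_region` and `repeated_kfold`, the toy beta values, the sample size of 20, and the random seed are all assumptions made for this example, and no model fitting is shown.

```python
import random
import statistics


def summarize_region(beta_values, method="mean"):
    """Collapse one sample's beta values at a region's CpGs to a single number.

    Hypothetical helper: the abstract names mean and median as summary measures.
    """
    if method == "mean":
        return statistics.mean(beta_values)
    if method == "median":
        return statistics.median(beta_values)
    raise ValueError(f"unknown summary method: {method}")


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices, then splits them into k folds;
    every fold serves once as the test set.
    """
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    for repeat in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for fold_id, test_idx in enumerate(folds):
            train_idx = [
                i
                for other_id, fold in enumerate(folds)
                if other_id != fold_id
                for i in fold
            ]
            yield repeat, fold_id, train_idx, test_idx


# Toy region with three CpGs measured in one sample (made-up beta values).
region_betas = [0.25, 0.5, 1.0]
region_mean = summarize_region(region_betas, "mean")
region_median = summarize_region(region_betas, "median")

# 5 folds x 10 repeats = 50 train/test splits for 20 hypothetical samples.
splits = list(repeated_kfold(n_samples=20))
print(len(splits))
```

In a sketch like this, each split would be used to fit a model on the region summaries of the training samples and score it on the held-out samples, with performance averaged over all 50 splits.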