POPULARITY
Background: Genome wide association studies (GWAS) are applied to identify genetic loci, which are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks of genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing is applied. Results: We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluate them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard - permuting the case-control status - effectively improves specificity of GWAS pathway analysis. Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case-control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates on the required number of permutations for the investigated approaches. Conclusions: To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the Hypergeometric test, permutation tests of case control data should be carried out. The most reasonable alternative was case-control permutation, if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations. By applying the described statistical framework we were able to discover axon guidance, focal adhesion and calcium signaling as important DCM-related pathways and Intestinal immune network for IgA production as most significant UC pathway.
Background: Genome-wide association studies (GWAS) with metabolic traits and metabolome-wide association studies (MWAS) with traits of biomedical relevance are powerful tools to identify the contribution of genetic, environmental and lifestyle factors to the etiology of complex diseases. Hypothesis-free testing of ratios between all possible metabolite pairs in GWAS and MWAS has proven to be an innovative approach in the discovery of new biologically meaningful associations. The p-gain statistic was introduced as an ad-hoc measure to determine whether a ratio between two metabolite concentrations carries more information than the two corresponding metabolite concentrations alone. So far, only a rule of thumb was applied to determine the significance of the p-gain. Results: Here we explore the statistical properties of the p-gain through simulation of its density and by sampling of experimental data. We derive critical values of the p-gain for different levels of correlation between metabolite pairs and show that B/(2*alpha) is a conservative critical value for the p-gain, where a is the level of significance and B the number of tested metabolite pairs. Conclusions: We show that the p-gain is a well defined measure that can be used to identify statistically significant metabolite ratios in association studies and provide a conservative significance cut-off for the p-gain for use in future association studies with metabolic traits.
Background: Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases. Results: Here, we describe a new approach to systematically prioritize and interpret available GWA data. We focus on the analysis of joint and disjoint genetic determinants across diseases. Using network analysis, we show that variant-based approaches are superior to locus-based analyses. In addition, we provide a prioritization of disease loci based on network properties and discuss the roles of hub loci across several diseases. We demonstrate that, in general, agonistic associations appear to reflect current disease classifications, and present the potential use of effect sizes in refining and revising these agonistic signals. We further identify potential branching points in disease etiologies based on antagonistic variants and describe plausible small-scale models of the underlying molecular switches. Conclusions: The observation that a surprisingly high fraction (>15%) of the SNPs considered in our study are associated both agonistically and antagonistically with related as well as unrelated disorders indicates that the molecular mechanisms influencing causes and progress of human diseases are in part interrelated. Genetic overlaps between two diseases also suggest the importance of the affected entities in the specific pathogenic pathways and should be investigated further.
Background: Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes are constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations is still subject to complicated and inefficient in-house solutions, thus strongly limiting the performance of GWAS based on CNVs. Results: CONAN is a freely available client-server software solution which provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations via Manhattan plots in order to enable a rapid identification of genome-wide significant CNV regions. Various file formats including the information on CNVs in population samples are supported as input data. Conclusions: CONAN facilitates the performance of GWAS based on CNVs and the visual analysis of calculated results. CONAN provides a rapid, valid and straightforward software solution to identify genetic variation underlying the `missing' heritability for complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at.