Background
Independence models among variables are one of the most relevant topics in epidemiology, particularly in molecular epidemiology for the study of gene-gene and gene-environment interactions. They have been studied using three main kinds of analysis: regression analysis, data mining approaches and Bayesian model selection. Recently, methods of algebraic statistics have been used extensively in biological applications. In this paper we present a concise but complete description of independence models in algebraic statistics and a new method of analyzing interactions that is equivalent to the correction of Fisher’s exact test by Markov bases.
Methods
We identified the suitable algebraic independence model for describing the dependence of two genetic variables on the occurrence of cancer, and exploited the theory of toric varieties and Gröbner bases to develop an exact independence test based on the Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to the study of gene-gene interactions in Gen-Air, a European case-control study. We computed the p-value for each pair of genetic variables interacting with disease status and compared our results with the standard asymptotic chi-square test.
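To illustrate the kind of computation a Diaconis-Sturmfels test involves, the following is a minimal Python sketch, not the Maple routine used in the study. It assumes the simplest setting, independence in a two-way contingency table, where the basic moves that add 1 on one diagonal of a 2x2 minor and subtract 1 on the other are known to form a Markov basis. The three-variable model used here for two genes and disease status would need its own Markov basis. The function names, the step counts and the example table are hypothetical choices made for illustration.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def markov_basis_p_value(table, n_steps=20000, burn_in=2000, seed=0):
    """Estimate an exact-test p-value by a Metropolis walk over all
    tables that share the row and column sums of `table`.

    Assumes at least two rows and two columns and strictly positive
    margins. The target distribution is the hypergeometric one,
    proportional to 1 / prod(n_ij!).
    """
    rng = random.Random(seed)
    current = [list(row) for row in table]
    n_rows, n_cols = len(current), len(current[0])
    observed_stat = chi_square(table)
    hits = 0
    for step in range(n_steps + burn_in):
        # Ordered pairs of distinct rows and columns pick a basic move
        # and its sign; the proposal is symmetric.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        # The move adds 1 at (i1, j1), (i2, j2) and subtracts 1 at
        # (i1, j2), (i2, j1); it is rejected if a count would go negative.
        if current[i1][j2] > 0 and current[i2][j1] > 0:
            # Ratio of hypergeometric weights, proposed over current.
            ratio = (current[i1][j2] * current[i2][j1]) / (
                (current[i1][j1] + 1) * (current[i2][j2] + 1)
            )
            if rng.random() < min(1.0, ratio):
                current[i1][j1] += 1
                current[i2][j2] += 1
                current[i1][j2] -= 1
                current[i2][j1] -= 1
        # Count visited tables at least as extreme as the observed one
        # (small tolerance guards against floating-point ties).
        if step >= burn_in and chi_square(current) >= observed_stat - 1e-9:
            hits += 1
    return hits / n_steps


if __name__ == "__main__":
    toy_table = [[3, 1], [1, 3]]  # made-up counts, not study data
    print(markov_basis_p_value(toy_table))
```

For this toy table the exact conditional p-value can be enumerated by hand as 34/70, so the estimate should land close to 0.486. In practice the chain length and burn-in would need to be tuned to the size of the table.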
Results
We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer (p-value: 0.00035). Leukaemia was observed to interact significantly with the pairs ERCC2 Lys751Gln and RAD51 172 G > T (p-value: 0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value: 0.0095) and APE1 Asp148Glu and GSTP1 Ala114Val (p-value: 0.0036).
Conclusion
Taking advantage of results from theoretical and computational algebra, the method we propose was more selective than other methods in detecting new interactions, yet its results were consistent with previous epidemiological and functional findings. It also helped us control the multiple comparison problem. In light of our results, we believe that the epidemiologic study of interactions can benefit from algebraic methods based on properties of toric varieties and Gröbner bases.