
Integrating biological information into the statistical analysis and design of microarray experiments*

Published online by Cambridge University Press:  05 October 2009

G. J. M. Rosa*
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA
A. I. Vazquez
Affiliation:
Department of Dairy Science, University of Wisconsin, Madison, WI 53706, USA

Abstract

Microarray technology is a powerful tool for animal functional genomics studies, with applications ranging from gene identification and mapping to the function and control of gene expression. Microarray assays, however, are complex and costly, and hence are generally performed with a relatively small number of animals. Nevertheless, they generate data sets of unprecedented complexity and dimensionality. Such trials therefore require careful planning and experimental design, in addition to tailored statistical and computational tools for appropriate data mining. In this review, we discuss experimental design and data analysis strategies that incorporate prior genomic and biological knowledge, such as genotypes, gene function and pathway membership. We focus the discussion on the design of genetical genomics studies and on significance testing for detection of differential expression. It is shown that the use of prior biological information can improve the efficiency of microarray experiments.

Type
Full Paper
Copyright
Copyright © The Animal Consortium 2009

Footnotes

*

This paper was presented at the session “Genomics selection and bioinformatics” of the 59th Annual Meeting of the European Association for Animal Production, held in Vilnius (Lithuania), 24 to 27 August 2008. Dr A. Maki-Tanila acted as guest editor.
