Penalized Integrative Analysis of High-Dimensional Omics Data

doi:10.1017/CBO9781107706484.009

8 - Penalized Integrative Analysis of High-Dimensional Omics Data

from Part B - Vertical Integrative Analysis (General Methods)

Published online by Cambridge University Press: 05 September 2015

George Tseng, Debashis Ghosh and Xianghong Jasmine Zhou

Jin Liu, Xingjie Shi, Jian Huang and Shuangge Ma


George Tseng: University of Pittsburgh
Debashis Ghosh: Pennsylvania State University
Xianghong Jasmine Zhou: University of Southern California
Jin Liu: Duke-NUS Graduate Medical School
Xingjie Shi: Shanghai University of Finance and Economics, China
Jian Huang: University of Iowa
Shuangge Ma: Capital University of Economics and Business, China



Abstract

With omics data, results generated from single-dataset analysis are often unsatisfactory. Integrative analysis methods, which jointly analyze data from multiple independent studies or on multiple correlated responses, can effectively increase power and outperform single-dataset analysis and meta-analysis. In this chapter, we review penalized integrative analysis methods under both the homogeneity and heterogeneity models. Computation using the coordinate descent approach is described. We also discuss several important extensions. The analysis of a genome-wide association study demonstrates the applicability of the reviewed methods.

Introduction

In the study of complex diseases such as cancer, cardiovascular diseases, and autoimmune diseases, profiling studies are now routinely conducted, generating “large d, small n” data, where the number d of omics features profiled (genes, SNPs, methylation loci, etc.) is much larger than the sample size n. Many different types of analyses can be conducted. For example, Chapters 3 and 4 focus on identifying meaningful networks. In this chapter, our analysis goal is to identify a small subset of omics measurements that are associated with disease outcomes or phenotypes. Such measurements are also referred to as “markers” in the literature and in this chapter. Statistically, this is a variable selection problem. The development of integrative analysis methods has been partly motivated by the following examples.

8.1.1 Example 1

Consider the analysis of data generated in multiple independent studies with comparable designs. For example, in Ma et al. (2011), four pancreatic cancer data sets are collected and analyzed. The four data sets were generated in four independent studies, all having a case-control design, collecting mRNA gene expression measurements and searching for genes associated with the risk of pancreatic cancer. In high-dimensional omics studies, it has been recognized that the results generated in single-data-set analysis often have unsatisfactory properties such as low reproducibility. Among many possible contributing factors, the most important one is perhaps the small n. Multi-data-set analysis can effectively increase sample size and outperform single-data-set analysis (Guerra and Goldstein, 2009). This perspective has been explained in multiple chapters of this book. When the designs of multiple studies are “close enough”, it can be reasonable to expect that they identify the same set of markers.

Type: Chapter
Information: Integrating Omics Data, pp. 174-204

DOI: https://doi.org/10.1017/CBO9781107706484.009

Publisher: Cambridge University Press

Print publication year: 2015
