DNA methylation, one type of epigenetic modification mainly occurring at the CpG dinucleotide, is associated with the regulation of gene expression. DNA methylation does not remain constant over time (Fraga et al., Reference Fraga, Ballestar, Paz, Ropero, Setien, Ballestar and Esteller2005; Wong et al., Reference Wong, Caspi, Williams, Craig, Houts, Ambler and Mill2010), even in early infancy (Martino et al., Reference Martino, Loke, Gordon, Ollikainen, Cruickshank, Saffery and Craig2013). Previous studies suggest there are age-sensitive sites of DNA methylation (Alisch et al., Reference Alisch, Barwick, Chopra, Myrick, Satten, Conneely and Warren2012; Bell et al., Reference Bell, Tsai, Yang, Pidsley, Nisbet, Glass and Deloukas2012; Bocklandt et al., Reference Bocklandt, Lin, Sehl, Sanchez, Sinsheimer, Horvath and Vilain2011; Florath et al., Reference Florath, Butterbach, Muller, Bewerunge-Hudler and Brenner2014; Hannum et al., Reference Hannum, Guinney, Zhao, Zhang, Hughes, Sadda and Zhang2013; Johansson et al., Reference Johansson, Enroth and Gyllensten2013; Rakyan et al., Reference Rakyan, Down, Maslau, Andrew, Yang, Beyan and Valdes2010).
Six studies have developed algorithms that use DNA methylation to predict chronological age (Bocklandt et al., Reference Bocklandt, Lin, Sehl, Sanchez, Sinsheimer, Horvath and Vilain2011; Florath et al., Reference Florath, Butterbach, Muller, Bewerunge-Hudler and Brenner2014; Hannum et al., Reference Hannum, Guinney, Zhao, Zhang, Hughes, Sadda and Zhang2013; Horvath, Reference Horvath2013; Koch & Wagner, Reference Koch and Wagner2011; Weidner et al., Reference Weidner, Lin, Koch, Eisele, Beier, Ziegler and Wagner2014). The predicted value is regarded as the methylation-based biological age (mage) for the corresponding tissue. In particular, Hannum and colleagues developed an age predictor based on methylation levels at 71 probes from the Illumina 450K Methylation arrays using data from whole blood (Hannum et al., Reference Hannum, Guinney, Zhao, Zhang, Hughes, Sadda and Zhang2013). Another age predictor based on methylation levels at 353 probes common to the Illumina 450K and 27K Methylation arrays using data from multiple tissues and cell types was developed by Horvath (Reference Horvath2013).
Methylation age acceleration index, defined as Δage = mage — chronological age (Horvath, Reference Horvath2013), represents the inconsistency between methylation-based biological age and chronological age. There is now evidence that both the Δage measures from the Hannum and Horvath predictors are associated with risks of some diseases, and of all-cause mortality (Horvath et al., Reference Horvath, Erhart, Brosch, Ammerpohl, von Schonfels, Ahrens and Hampe2014, Reference Horvath, Garagnani, Bacalini, Pirazzini, Salvioli, Gentilini and Franceschi2015; Marioni et al., Reference Marioni, Shah, McRae, Chen, Colicino, Harris and Cox2015).
Both the Hannum Δage and the Horvarth Δage were found to be correlated in adolescent twin pairs and their family members (Marioni et al., Reference Marioni, Shah, McRae, Chen, Colicino, Harris and Cox2015), and Horvarth Δage was also found to be correlated in a small number of middle-aged twin pairs (Horvath, Reference Horvath2013). The heritability of Δage was estimated to be 40%, based on assuming that the variance is composed of an additive genetic variance (A) and an individual-specific variance (E) (i.e., non-shared environmental factors) only. However, this assumption was not tested.
Of the six published studies using DNA methylation to predict chronological age, two (Bocklandt et al., Reference Bocklandt, Lin, Sehl, Sanchez, Sinsheimer, Horvath and Vilain2011; Florath et al., Reference Florath, Butterbach, Muller, Bewerunge-Hudler and Brenner2014) did not provide the identifiers and coefficients of probes used in their regression models, and we did not get this information from contacting the authors so could not apply their predictors to our dataset. The mage measured by the predictors from the Koch study (Koch & Wagner, Reference Koch and Wagner2011) and the Weidner study (Weidner et al., Reference Weidner, Lin, Koch, Eisele, Beier, Ziegler and Wagner2014) had a low correlation with chronological age in our dataset (r = 0.31 for the Koch predictor; r = 0.38 for the Weidner predictor), so we excluded these two predictors to measure mage. We therefore used the two predictors from the Hannum study (Hannum et al., Reference Hannum, Guinney, Zhao, Zhang, Hughes, Sadda and Zhang2013) and the Horvath study (Horvath, Reference Horvath2013) for analysis.
In this study, we estimated the familial correlation of Δage measured by the Hannum and Horvath predictors using blood samples donated by 132 middle-aged female twin pairs and 215 of their sisters participating in a twin family study of mammographic density, a risk factor for breast cancer, to explore possible causes of variation in Δage.
Materials and Methods
Subjects
Subjects were from the Australian Mammographic Density Twins and Sisters Study (AMDTSS; Odefrey et al., Reference Odefrey, Stone, Gurrin, Byrnes, Apicella, Dite and Southey2010; Stone et al., Reference Stone, Gurrin, Byrnes, Schroen, Treloar, Padilla and Hopper2007), in which female twins and their sisters were recruited between 2004 and 2009. When recruited, the participants were breast cancer free. The study was approved by the Human Research Ethics Committee of The University of Melbourne, and all participants gave written informed consent. Participants completed questionnaire surveys through telephone-administered interviews and donated blood samples. Questionnaires collected demographic information and self-reported weight, height, and other known and putative breast cancer risk factors. Blood samples were couriered to the laboratory within 48 hours of collection, and were processed to generate dried blood spot Guthrie cards.
In this study, in which we oversampled twin families with one or more sisters, 479 women comprising 66 MZ pairs, 66 DZ pairs, and 215 sisters from 130 families were selected for DNA methylation measurement. The mean chronological age was 56 years (range 40–78 years; standard deviation 8 years). There were a total of 552 sibling pairings (including twin-sister pairs). Table 1 shows that the majority of families (87%) had three or four members, with 48% containing one twin pair and one sister, and 38% containing one twin pair and two sisters.
MZ = monozygotic twins, DZ = dizygotic twins.
DNA Methylation Measurement
DNA was extracted in batches of 192 samples from dried blood spots using a method developed in-house (Joo et al., Reference Joo, Wong, Baglietto, Jung, Tsimiklis, Park and Severi2013). Briefly, for each sample, 20 blood spot punches 3.2 mm in diameter were added to 180 μl phosphate buffered saline and 20 μl protease. After an overnight incubation at 56°C, the blood spots were homogenized twice using the TissueLyser II (Qiagen, Hilden, Germany) at 25 hertz for 30 seconds. The resulting supernatant was transferred to clean collection microtubes and DNA was extracted using the QIAamp® 96 DNA blood protocol as per manufacturers’ instructions (Qiagen, Hilden, Germany). DNA quantity was assessed using the Quant-iT™ Picogreen® dsDNA assay (Life Technologies, Grand Island, NY) measured on the EnSpire® Multimode Plate Reader (PerkinElmer, Waltham, Massachusetts).
One microgram of DNA was sodium bisulfite converted using the EZ DNA Methylation-Gold protocol as per manufacturers’ instructions (Zymo Research, Irvine, CA) and eluted in 20 μl elution buffer. The success of bisulfite conversion and the presence of DNA after bisulfite conversion were evaluated using an in-house bisulfite-specific quantitative PCR (Wong et al., Reference Wong, Joo, McLean, Baglietto, English, Severi and Southey2015). Bisulfite-specific primers (forward sequence: 5′ tAA GGT AtA ATt AGA GGA TGG GAG GGA t; reverse sequence: 5′ aaC AAA CTC Aaa TAa AAT TCT TCC TC) were designed to amplify a 134 bp region within breast cancer susceptibility gene BRCA1 (Genbank: L78833.1). Lower-case letters correspond to bisulfite converted cytosines.
Each reaction consisted of 1X SYBR Green I Master (Roche, Basel, Switzerland), 300 pM each of forward and reverse primers (Integrated DNA Technologies, Coralville, IA), and 3 μl diluted bisulfite converted DNA (diluted 1:3 in nuclease free water). The reaction was equilibrated to 10 μl with nuclease free water (Life Technologies, Carlsbad, CA). The bisulfite-specific qPCR assay was performed on the LightCycler® 480 System (Roche, Basel, Switzerland) with the following cycling conditions: initial polymerase activation for 5 minutes at 95°C followed by 40 cycles of DNA denaturation for 10 seconds at 95°C, primer annealing for 30 seconds at 60°C and extension for 90 seconds at 72°C. Subsequent melting of the amplified product was performed from 97°C to 65°C for 60 seconds with fluorescent data acquired on the green channel.
All DNA samples were assayed in duplicate. Good quality (non-degraded), non-bisulfite converted DNA extracted from the U266 multiple myeloma cell line was used as a negative control. Only DNA samples that amplified at least five quantitation cycles earlier than the negative control (Cq >5) were assayed on the Infinium HumanMethylation450 BeadChip array.
Epigenome-wide methylation was assessed using the Infinium HumanMethylation450 BeadChip arrays (Sandoval et al., Reference Sandoval, Heyn, Moran, Serra-Musach, Pujana, Bibikova and Esteller2011) in accordance with the manufacturer's instructions. Briefly, a total of 200 ng of bisulfite converted DNA was whole genome amplified and hybridized onto the BeadChips. The TECAN automated liquid handler (Tecan Group Ltd, Mannedorf, Switzerland) was used for the single-base extension and staining steps. DNA samples extracted from members of the same family were assayed on the same beadchip to minimize potential beadchip batch effects. Additionally, two randomly selected technical replicates (one plate included three replicates) and two U266 cell line DNA samples were included on each plate.
Methylation Data Processing
Raw methylation data was processed by Bioconductor minfi package (Aryee et al., Reference Aryee, Jaffe, Corrada-Bravo, Ladd-Acosta, Feinberg, Hansen and Irizarry2014), which includes normalization of data using Illumina's reference factor-based normalization methods (preprocessIllumina) and subset-quantile within array normalization (SWAN) for type I and II probe bias correction (preprocessSWAN; Maksimovic et al., Reference Maksimovic, Gordon and Oshlack2012). An empirical Bayes batch-effects removal method, ComBat (Johnson et al., Reference Johnson, Li and Rabinovic2007), was applied to minimize the technical variation across batches. A total of 65 probes corresponding to known single nucleotide polymorphisms, the identifiers of which start with ‘rs’, were excluded. Probes with detection p value higher than .01 were assigned as missing. Samples with more than 5% missing probes were excluded, as were probes having a missing value in one or more samples. After cleaning, 479,957 probes for all 479 samples remained.
Statistical Methods
Analyses were based on beta values, defined as the ratio of the methylated probe intensity to the sum of methylated and non-methylated probe intensities. Ranging from 0 to 1, beta values approximate the percentage of methylation.
We calculated Spearman correlation coefficients for 11 replicate sample pairs and 119,794 non-replicate sample pairs, and compared the coefficients using Wilcoxon rank test to test if the observed variation in methylation was due to biological causes. In the main analyses, for the 11 replicate samples, the methylation measurement we used was the average of the two measurements.
Chronological age was defined as the age when the blood was collected. The Horvath mage was calculated using the online calculator (http://labs.genetics.ucla.edu/horvath/dnamage/). The Hannum mage was calculated as the sum of beta values in our study multiplied by the corresponding regression coefficients as reported by Hannum and colleagues. Δage was estimated by the mage measures above minus the chronological age.
The two Δage measures both reflect the difference between mage and chronological age, and they were highly correlated with each other. Therefore, we combined the two measures together to get one measure for the difference between mage and chronological age. The combined measure was calculated as the average of the two Δage measures after standardizing each to have mean = 0 and standard deviation = 1.
For each Δage measure and the combined Δage measure, we estimated the familial correlation for different types of relatives (MZ, DZ, and sibling pairs) under asymptotic likelihood theory using a multivariate normal model and the software FISHER (Hopper & Mathews, Reference Hopper and Mathews1982, Reference Hopper and Mathews1994; Lange et al., Reference Lange, Boehnke and Weeks1987). The mean values were adjusted for age and estimated cellular composition (Houseman et al., Reference Houseman, Accomando, Koestler, Christensen, Marsit, Nelson and Kelsey2012; Jaffe & Irizarry, Reference Jaffe and Irizarry2014) by linear regression to remove the fixed effect of these covariates. Familial correlation for MZ pairs (r MZ), DZ pairs (r DZ), and sibling pairs (including twin-sister pairs; r Sib) were estimated simultaneously. The correlations between estimates of r MZ, r DZ, and r Sib were also estimated. In order to compare r MZ, r DZ, and r Sib, we fitted five models: (1) r MZ ≠ r DZ ≠ r Sib; (2) r MZ = r DZ ≠ r Sib; (3) r MZ ≠ r DZ = r Sib; (4) r MZ = r Sib ≠ r DZ; and (5) r MZ = r Sib = r DZ. The relative goodness of fit between nested models was assessed using the likelihood ratio test. In this analysis, four tests were performed for each measure. To control for Type I error, we took p = .013 (0.05/4) as our nominal threshold for statistical inference.
The study of Marioni et al. (Reference Marioni, Shah, McRae, Chen, Colicino, Harris and Cox2015) used a similar twin family design to ours to estimate familial correlations of the Hannum Δage and the Horvath Δage. We contacted the authors to obtain their estimates and combined with ours using the fixed-effect model in the metafor package in R (Viechtbauer, Reference Viechtbauer2010) to estimate the weighted average correlations.
We examined the aspect of familial variance shared by the Hannum Δage and the Horvath Δage. The correlations between two Δage measures were estimated (Hopper & Mathews, Reference Hopper and Mathews1994; Lange et al., Reference Lange, Boehnke and Opitz1983) and compared across different types of relatives as for the individual Δage measures.
Results
The median Spearman correlation in beta values for 11 duplicate sample pairs was 0.986 (range 0.980–0.990), larger than 0.982 (range 0.964–0.990) for non-replicate samples. The difference was significant (p = .003), consistent with the observed variation in methylation being due to biological causes.
Table 2 shows that, for both the Hannum and Horvarth measures, there was no evidence of a difference between the means of mage and chronological age (both p = .06). The correlation was 0.80 between the two mage measures, and 0.76 between the two Δage measures. For both Δage measures, there was no evidence of a difference in means between the three types of relatives (p = .4 for the Hannum Δage; p = .7 for the Horvath Δage).
SD = standard deviation, MZ = monozygotic twins, DZ = dizygotic twins.
Familial Correlation of the Individual ΔAge Measure
Table 3 shows that for both Δage measures there was familial correlation (model V), and comparing with model I shows that the familial correlation varied across different types of relatives. For both Δage measures, the correlation between the estimates of r MZ and r DZ in model I was approximately 0.04, while the correlation between the estimates of r Sib and either r MZ or r DZ was approximately 0.10. This means that the estimates of r MZ, r DZ, and r Sib were virtually independent of one another.
MZ = monozygotic twins, DZ = dizygotic twins, SE = standard error; Ref = Reference.
For both Δage measures, although the MZ pair correlation was greater than twice the DZ pair correlation, and the DZ pair correlation was approximately twice the sibling pair correlation (model I), there was no statistically significant difference between the MZ and DZ pair correlations (model II vs. model I), nor between the DZ and sibling pair correlations (model III vs. model I). The only statistically significant difference was between the MZ and the sibling pair correlations (model IV vs. model I; both p < .005). Furthermore, there was marginal evidence that the MZ pair correlation was greater than twice the sibling pair correlation (p = .08 for Hannum Δage; p = .05 for Horvath Δage).
Familial Correlation of the Combined ΔAge Measure
Table 4 shows that for the combined Δage measure there was familial correlation (model V), and the correlation varied across different types of relatives (model V vs. model I). The DZ pair correlation was approximately halfway between the MZ and the sibling pair correlations (model I). However, again there was no statistically significant difference between the MZ and DZ pair correlations (model II vs. model I), nor between the DZ pair and sibling pair correlations (model III vs. model I). The only statistically significant difference was between the MZ and the sibling pair correlations (model IV vs. model I; p = .002).
MZ = monozygotic twins, DZ = dizygotic twins, SE = standard error; Ref = Reference.
Weighted Average Correlations Across the Two Studies
For the Hannum Δage, the weighted average correlations were 0.48 (standard error [SE] = 0.07) for MZ pairs, 0.27 (SE = 0.07) for DZ pairs and 0.15 (SE = 0.03) for sibling pairs. For the Horvath Δage, the weighted average correlations were 0.51 (SE = 0.07) for MZ pairs, 0.20 (SE = 0.07) for DZ pairs and 0.13 (SE = 0.03) for sibling pairs.
Given the low correlations between the estimates of r MZ, r DZ, and r Sib observed in our study (see above), and assuming this almost independence of estimates also applies to the study of Marioni et al., the difference between the MZ and DZ pair weighted average correlations was significant for both measures (p = .03 for the Hannum Δage; p = .002 for the Horvath Δage), and the difference between the DZ and sibling pair weighted average correlations was not significant for both measures (p = .14 for the Hannum Δage; p = .40 for the Horvath Δage). Note that the sibling pair correlation was one-third and one-fourth the MZ pair correlation, respectively. The MZ pair correlation was marginally greater than twice the sibling pair correlation (p = .08 for the Hannum Δage; p = .01 for the Horvath Δage).
Correlation Between Two ΔAge Measures Across Different Types of Relatives
Table 5 shows that the cross-trait correlation between the Hannum Δage and the Horvath Δage was familial (model V), and varied across different types of relatives (model V vs. model I). For DZ pairs, the cross-trait correlation was greater than the sibling pair correlation, and similar to the MZ correlation. After statistical testing, only the difference between the MZ and the sibling pair cross-trait correlations was significant.
MZ = monozygotic twins, DZ = dizygotic twins, SE = standard error; Ref = Reference.
Discussion
By studying middle-aged twins and their sisters, we found that both the Hannum and Horvarth mortality-associated methylation acceleration indices were correlated in different types of relatives, consistent with the findings of previous studies (Horvath, Reference Horvath2013; Marioni et al., Reference Marioni, Shah, McRae, Chen, Colicino, Harris and Cox2015). Familial correlation implies there are genetic and/or shared environmental causes of variation in the methylation acceleration index.
The classical twin model assumes that for all the environmental factors that influence the trait and are shared or correlated within twins, their twin pair correlation and strength of association with the trait are both exactly the same for MZ pairs as they are for DZ pairs. Under this assumption, any and all excess in the correlation between MZ pairs compared with DZ pairs is attributable to genetic causes of variation. This means that the classic twin model gives an upper estimate of the role of genetic factors in trait variation. It also means that if the MZ pair correlation is not significantly greater than the DZ pair correlation, there is no evidence for genetic factors influencing the trait variation, a point often overlooked in many twin studies that estimate heritability directly without first testing twin pair correlations. Furthermore, if there are only additive genetic variance and individual-specific variance (i.e., non-shared environment variance) for a trait — as assumed by Marioni et al. (Reference Marioni, Shah, McRae, Chen, Colicino, Harris and Cox2015) — the MZ pair correlation is expected to be twice the DZ pair correlation, and twice the sibling pair correlation. However, with respect to the latter, we found marginal evidence that the MZ pair correlation was greater than twice the sibling correlation for individual Δage measure. Therefore, although the correlation across types of relatives differed, they did not necessarily do so in strict accordance with the expectation under the AE model.
The weighted average correlations were also not consistent with the AE model. The weighted estimates for MZ and DZ pairs were nominally statistically different, consistent with a genetic cause of variation under the equal environment assumption. However, there was marginally evidence that the MZ pair correlation was greater than twice the sibling pair correlation, which brings questions to the presumption of the former twin family study (Marioni et al., Reference Marioni, Shah, McRae, Chen, Colicino, Harris and Cox2015). The result raises the possibility that there are other shared non-genetic determinants. Therefore, the 40% proportion of variance due to additive genetic factors is likely overestimated by the former study, even without taking into account the impact of any shared environment factor.
The same issue applies to the shared determinants of the two highly correlated Δage measures. Although there was a difference in the cross-trait correlation across different types of relatives, it is not possible to definitively pinpoint the relevant differences, except that the cross-trait correlation for MZ pairs was greater than that for sibling pairs. Therefore, the shared determinants are most likely not genetic factors alone.
The strength of this study is the use of twin families. By also including sibling pairs, we can estimate the familial correlation for more types of relatives other than only for twins, which provides more information than a study including twins alone. The other strength is the use of the multivariate normal model for pedigree analysis so as to enable efficient estimate of familial correlation across different types of relatives. Although the sets of pairs of relatives are not independent, with groups having come from the same family, the statistical approach we have used takes this into account. The major weakness of our study is that the sample size is such that there is still considerable imprecision in the estimate of familial correlation. Clearly, larger sample sizes are needed.
In conclusion, our study does not find evidence that variation in methylation acceleration index is explained by additive genetic factors and individual-specific factors (i.e., non-shared environmental factors) alone. Instead, there might be substantial variance due to shared non-genetic factors. Therefore, the proportion of variance due to unmeasured genetic factors is likely less than 40% as estimated in the previous study. More twin and family studies are needed to clarify this issue.
Acknowledgments
The authors thank the twins and sisters who participated in this study. This research was facilitated through the Australian Twin Registry, a national research resource in part supported by a Centre for Research Excellence Grant from the National Health and Medical Research Council of Australia. This study was supported by the National Health and Medical Research Council of Australia (grant number 1050561 and 1079102), Cancer Australia and National Breast Cancer Foundation (grant number 509307), and Kaiser Permanente (grant number R01CA168893/115-9278/1278-01). John Hopper is a Senior Principal Research Fellow and Melissa Southey is a Senior Research Fellow of the National Health and Medical Research Council of Australia.