MZ twins develop from a single fertilized egg and thus are thought to carry an identical set of genetic information. However, early post-zygotic mutational events have been found to constitute a low but substantial proportion of de novo mutations in humans (Dal et al., 2014). These comprise genomic alterations at different scales, ranging from changes affecting only single nucleotides to larger CNVs of varying size, as well as epigenetic differences. As outlined in a comprehensive review by Van Dongen et al. (2012), twin studies represent a powerful approach to gain insight into the genetic etiology underlying complex traits and trait variation. In addition, extensive intra-individual genetic variation has been observed in human tissues (O’Huallachain et al., 2012, 2013), and CNV analyses in MZ twins have revealed somatic mosaicism in tissues arising from the same zygote (Bruder et al., 2008). Hence, the search for differences in the genetic constitution within discordant MZ twin pairs has been suggested as a promising approach to gene finding (Zwijnenburg et al., 2010).
This assumption has been supported by observations that very early post-twinning mutational or epigenetic events can cause phenotypic discordance in MZ twins (Gervin et al., 2012; Helderman-van den Enden et al., 1999; Kondo et al., 2002; Kruyer et al., 1994; Mitchell et al., 2011; Selmi et al., 2014; Taylor et al., 2008; Zwijnenburg et al., 2010).
In accordance with these observations, Breckpot et al. (2012) identified three possibly disease-causing de novo CNVs in one affected twin out of six MZ twin pairs discordant for CHD, and Kondo et al. (2002) identified a nonsense mutation in IRF6 that was present only in the affected twin of an MZ pair discordant for Van der Woude syndrome. Moreover, a post-twinning ~1.3 Mb de novo deletion was identified in a concordant-affected twin pair with attention problems (AP; Ehli et al., 2012), which might help to explain the higher AP score in the twin carrying the deletion.
To identify the cause of disease in this study, we performed WES in nine MZ twin pairs. These twin pairs were discordant for CHD, endocrine disorders, omphalocele, and CDH. We hypothesized that the affected twin would show a causative de novo variation.
Materials and Methods
Patients and DNA Isolation
In our study, we included nine MZ twin pairs with different discordances who were recruited through the Department of Neonatology, Children's Hospital, Department of Obstetrics and Prenatal Medicine, Department of Medicine I, and through the Department of Pediatric Cardiology, University of Bonn. For all twins, parental DNA could be obtained. Written informed consent was obtained from all patients and/or the parents prior to study entry. The Ethics Committee of the Medical Faculty of the University of Bonn approved the study. Isolation of genomic DNA from blood was carried out by using the Chemagic DNA Blood Kit special (Chemagen, Baesweiler, Germany), while genomic DNA from saliva samples was isolated with the Oragene DNA Kit (DNA Genotek Inc., Kanata, Canada).
Twin Pairs
The phenotypic features of each affected twin and family data are listed in Table 1. All twin pairs were born to non-consanguineous parents. There was an unremarkable family history in all cases. Because MZ twins share all their DNA sequence, while DZ twins share, on average, only one half, the zygosity of the twin pairs was confirmed by comparing the SNVs of each co-twin in a twin pair (99.8% similar).
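The zygosity check described above amounts to a genotype concordance calculation. The following is a minimal sketch, assuming each twin's calls are available as a mapping from site to genotype; the function name, data layout, and toy values are illustrative and are not taken from the study's pipeline.

```python
def genotype_concordance(twin_a, twin_b):
    """Return the fraction of jointly called sites with identical genotypes.

    Each argument maps a site key, e.g. ("chr1", 100), to a genotype given
    as a pair of alleles. Allele order is ignored, so ("A", "G") matches
    ("G", "A"). Sites called in only one twin are skipped.
    """
    shared_sites = twin_a.keys() & twin_b.keys()
    if not shared_sites:
        raise ValueError("the two samples share no called sites")
    matches = sum(
        1
        for site in shared_sites
        if sorted(twin_a[site]) == sorted(twin_b[site])
    )
    return matches / len(shared_sites)


# Toy data (hypothetical): four shared sites, one of which differs.
twin_1 = {
    ("chr1", 100): ("A", "G"),
    ("chr1", 200): ("C", "C"),
    ("chr2", 300): ("T", "C"),
    ("chr3", 400): ("G", "G"),
    ("chr4", 500): ("A", "A"),  # not called in twin_2, so ignored
}
twin_2 = {
    ("chr1", 100): ("G", "A"),
    ("chr1", 200): ("C", "C"),
    ("chr2", 300): ("T", "T"),
    ("chr3", 400): ("G", "G"),
}

print(f"concordance: {genotype_concordance(twin_1, twin_2):.2f}")
```

For MZ pairs one would expect a value close to 1, in line with the 99.8% reported here; the remaining mismatches may well reflect calling errors rather than true differences.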
Whole Exome Sequencing (WES) and Data Analysis
Exonic and adjacent intronic sequences were enriched from genomic DNA using the NimbleGen SeqCap EZ Human Exome Library v2.0 enrichment kit. WES was performed using a 100 bp paired-end read protocol according to the manufacturer's recommendations on an Illumina HiSeq2000 sequencer by the Cologne Center for Genomics (CCG), Cologne, Germany. On average, the sequence output reached 9.3 sequenced gigabases, of which 12% were marked as duplicates (Picard software), and 91% of the remaining reads could be mapped to the human hg19 reference genome (bwa-aln software). The mean coverage of targeted regions was 77–123x (on average, 95x), and the percentage of targets covered at least 30x ranged from 87 to 93% (on average, 90%). The UnifiedGenotyper (GATK) and Mpileup (Samtools) software were used to call variations, and the CCG software FUNC (unpublished) was used to annotate variations for functional consequences with respect to altered protein structure and splicing impairment. The paired sample feature of the DeNovoGear software was further used to examine potential de novo mutations in twin pairs.
Data analysis and filtering of mapped target sequences were performed with the ‘Varbank’ exome and genome analysis pipeline v.2.1 (unpublished; https://varbank.ccg.uni-koeln.de). In particular, we filtered for high-quality (coverage of more than six reads, fraction of allele-carrying reads at least 25%, a minimum genotype quality score of 10, VQSLOD greater than -8) and rare (Caucasian population allele frequency < 0.5%) variations in targeted regions plus 100 bp of flanking sequence. In order to exclude pipeline-specific artifacts, we also filtered against variations from an in-house epilepsy cohort (n = 511, AF < 2%) that had been generated with the same analysis pipeline. The filter conditions were set to be more sensitive following manual inspections of aligned reads. In particular, we looked for typical patterns of false-positive variations (e.g., more than two haplotypes, base quality or mapping quality bias, strand bias, allele read position bias, low complexity region, alignment errors).
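The numeric filter criteria listed above can be written as a single predicate. This is a minimal sketch only: the Varbank pipeline is unpublished, so the class, field names, and example values below are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record holding the per-variant values used for filtering."""
    depth: int               # number of reads covering the site
    alt_fraction: float      # fraction of reads carrying the variant allele
    genotype_quality: float  # genotype quality score
    vqslod: float            # VQSLOD score from variant recalibration
    population_af: float     # allele frequency in the reference population
    inhouse_af: float        # allele frequency in the in-house cohort


def passes_filters(v: VariantCall) -> bool:
    """Apply the quality, rarity, and in-house thresholds stated in the text."""
    high_quality = (
        v.depth > 6                    # coverage of more than six reads
        and v.alt_fraction >= 0.25     # at least 25% allele-carrying reads
        and v.genotype_quality >= 10   # minimum genotype quality of 10
        and v.vqslod > -8              # VQSLOD greater than -8
    )
    rare = v.population_af < 0.005     # population allele frequency < 0.5%
    not_artifact = v.inhouse_af < 0.02  # in-house cohort AF < 2%
    return high_quality and rare and not_artifact


# Hypothetical example calls.
good = VariantCall(depth=40, alt_fraction=0.48, genotype_quality=99,
                   vqslod=2.1, population_af=0.0, inhouse_af=0.0)
low_fraction = VariantCall(depth=40, alt_fraction=0.10, genotype_quality=99,
                           vqslod=2.1, population_af=0.0, inhouse_af=0.0)

print(passes_filters(good), passes_filters(low_fraction))
```

Note that the 25% allele-fraction threshold, as sketched here, would remove a low-level mosaic variant such as `low_fraction` regardless of its other quality values.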
Variation Analysis
Variations identified by WES were amplified from DNA by polymerase chain reaction (PCR), and automated sequence analysis was carried out using standard procedures. In brief, primers were directed to all 13 variations observed and the resultant PCR products were subjected to direct automated BigDye Terminator sequencing (3130XL Genetic Analyzer, Applied Biosystems, Foster City, California, USA). Both strands from each amplicon were sequenced and presence of the variations in each twin pair was investigated by sequencing the respective PCR product.
Results
Across all twin pairs, the only de novo variation identified with the de novo probability tool DeNovoGear (c.233A>G, p.Gln78Arg) was observed in TMPRSS13 (encoding transmembrane protease, serine 13) in one of the CHD-affected twins (twin pair 4). This variation had been detected in 0.9% of the alleles in 511 in-house epilepsy controls and was deposited in dbSNP Build 144 (rs75037497). In the Exome Aggregation Consortium (ExAC), the allele frequency was highest for the Finnish (0.2%) and East Asian (0.16%) populations. However, Sanger sequencing did not confirm this variation in the affected twin (see Supplementary figure for examples).
Analysis of all twins without DeNovoGear, using standard filter criteria (Table 2), detected 722 heterozygous variations that were present only in the affected twins and were absent in in-house controls. After assessment of read quality in Varbank and visual inspection of the aligned reads, 12 variations remained for validation by Sanger sequencing. However, none of these variations could be confirmed. Either they were found to be false-positive calls in the affected twin or they turned out to be present in both twins.
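The candidate selection described above (heterozygous in the affected twin, absent in the co-twin and in in-house controls) can be sketched as a simple set comparison. The function, variant keys, and data below are hypothetical; treating a variant that is missing from the co-twin's calls as absent is an assumption of this sketch, and it is one way in which low coverage in the co-twin could produce spurious candidates.

```python
def affected_only_heterozygous(affected, unaffected, control_variants):
    """Return variants heterozygous in the affected twin only.

    `affected` and `unaffected` map a variant key to its alternate allele
    count (0, 1, or 2). `control_variants` is a set of variant keys seen in
    controls. A variant missing from `unaffected` is treated as absent,
    which is an assumption of this sketch.
    """
    return sorted(
        variant
        for variant, alt_count in affected.items()
        if alt_count == 1
        and unaffected.get(variant, 0) == 0
        and variant not in control_variants
    )


# Toy data (hypothetical variant keys).
affected_twin = {
    "chr1:1000:A>G": 1,   # heterozygous, also in co-twin: shared
    "chr2:2000:C>T": 1,   # heterozygous, only in affected twin: candidate
    "chr3:3000:G>A": 2,   # homozygous: not a heterozygous candidate
    "chr4:4000:T>C": 1,   # heterozygous, but seen in controls
}
unaffected_twin = {
    "chr1:1000:A>G": 1,
    "chr3:3000:G>A": 2,
}
controls = {"chr4:4000:T>C"}

print(affected_only_heterozygous(affected_twin, unaffected_twin, controls))
```

Candidates produced this way still require read-level inspection and independent validation, as the Sanger results reported above illustrate.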
OMIM = Online Mendelian Inheritance in Man; ZFIN = The Zebrafish Model Organism Database; MGI = Mouse Genome Informatics; *de novo variation.
Discussion
To our knowledge, this is the first WES study to search for genomic variations in discordant twins with CHD, CDH, omphalocele, prolactinoma, or acromegaly. Stringent analysis and comparison of WES data revealed no genetic differences within any of the discordant MZ twin pairs.
Whereas a few studies have shown discordant twin phenotypes to result from dominant de novo mutations affecting coding regions, as described for Van der Woude syndrome (Kondo et al., 2002), our current findings suggest that such events are not a common cause of phenotypic discordance among MZ twins. This is in accordance with other studies in which no genetic differences in CNV profiles or coding regions were found between MZ twins discordant for several disorders, including: (1) multiple sclerosis (Baranzini et al., 2010), (2) schizophrenia (Ono et al., 2010), (3) renal agenesis (Jin et al., 2014), (4) the VACTERL association (Solomon et al., 2013), (5) urogenital malformations (Baudisch et al., 2013), (6) caudal appendage with multiple congenital anomalies (Cogulu et al., 2013), (7) Crohn's disease (Petersen et al., 2014), (8) congenital hypothyroidism (Magne et al., 2015), (9) amyotrophic lateral sclerosis (Meltz Steinberg et al., 2015), and (10) congenital cataract (Wei et al., 2015). Abdellaoui et al. (2015) reported concordance rates of CNVs of ~80% within 1,097 phenotypically unselected MZ twin pairs.
However, only 2 of 20 putative de novo CNV candidates tested could be validated, suggesting that only a very small number of true post-twinning de novo CNVs may remain among this large number of MZ twin pairs. Moreover, whole-genome sequencing revealed that the genomes of 100-year-old MZ twins differed by only eight somatic single-base substitutions, whereas no variation was found in a 40-year-old twin pair (Ye et al., 2013). Compared with the estimate of Bruder et al. (2008) that de novo post-twinning CNV frequency could be as high as 5% on a per-individual basis or 10% per twinning event, all these data rather support the assumption of an even rarer occurrence of post-twinning CNVs.
Several further explanations may account for our findings in the present study. First, WES covers only a small portion (1–2%) of the genome and provides no information on non-coding mRNA elements and/or intronic or intergenic regulatory regions that might be responsible for the discordant phenotypes of the respective twins. It is also known that (1) a small fraction of exons are refractory to WES with current technologies and that (2) potential variations might be filtered out as a result of low coverage (Jamuar et al., 2014). Second, recent studies revealed substantial genetic variation in human tissue (O’Huallachain et al., 2012, 2013), and hence one may miss tissue-specific somatic mutations, especially in those tissues affected by the particular phenotype. Here, we investigated peripheral blood, and testing of other tissue types might have established a causative variation, as has been recently shown for Proteus syndrome (Lindhurst et al., 2011). Moreover, epigenetic differences may account for the disease, as reported by Kaminsky et al. (2009), who detected epigenetic variability between MZ twin pairs in all tissues tested. Supportive of this assumption, differential DNA methylation, leading to dysregulation of gene expression, was found in MZ twins discordant for psoriasis or primary biliary cirrhosis (Gervin et al., 2012; Mitchell et al., 2011; Selmi et al., 2014).
On the other hand, disease discordance could not be attributed to such changes in three multiple sclerosis discordant MZ twin pairs (Baranzini et al., 2010).
There are also technical limits of both next-generation sequencing (NGS) and BigDye/Sanger sequencing in detecting low-frequency variants. For NGS, aside from the methods of library creation, pooling, and enrichment strategies, as well as the variant caller program, coverage is highly important. Spencer et al. (2014) demonstrated that >500x coverage is required for optimal performance in detecting low-frequency single nucleotide variants; because this strategy is very expensive, it is not performed on a regular basis. These authors reported that an analytical sensitivity of >90% can be achieved when analyzing variant allele fractions of 10% by generating this high coverage. For routine Sanger sequencing, Tsiatis et al. (2010) found a limit of correct detection of 15–20% when testing tumor samples for KRAS mutations.
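Why coverage matters for low-frequency variants can be illustrated with a simple binomial sampling model. The sketch below is an idealization and not the method of Spencer et al.: it assumes that each read independently carries the variant allele with probability equal to the allele fraction, and it ignores sequencing error, mapping bias, and variant caller behavior. The function name and the example scenario are illustrative.

```python
from math import ceil, comb


def prob_at_least(min_alt_reads, depth, allele_fraction):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction).

    X models the number of reads carrying the variant allele at a site
    covered by `depth` reads.
    """
    if min_alt_reads <= 0:
        return 1.0
    upper = min(min_alt_reads, depth + 1)  # k cannot exceed depth
    below = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(upper)
    )
    return 1.0 - below


# Illustrative scenario: 95x depth (the mean coverage reported here) and a
# filter requiring at least 25% variant reads, i.e. 24 of 95 reads.
depth = 95
required = ceil(0.25 * depth)

p_mosaic = prob_at_least(required, depth, 0.10)    # 10% allele fraction
p_germline = prob_at_least(required, depth, 0.50)  # heterozygous germline

print(f"required variant reads: {required}")
print(f"P(pass | allele fraction 10%): {p_mosaic:.2e}")
print(f"P(pass | allele fraction 50%): {p_germline:.4f}")
```

Under these assumptions, a heterozygous germline variant would nearly always reach the read threshold, whereas a variant present at a 10% allele fraction would almost never do so at this depth, which is consistent with the point that low-level mosaicism may escape detection by standard WES filtering.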
Taken together, the data of others and our study suggest that CNVs or exome DNA differences between MZ co-twins are very rare and that these might have a minor role in the pathophysiology of the complex diseases investigated. Hence, it is possible that the causes of the discordance among the presently investigated twin pairs may not be directly gene related but due to currently unidentified epigenetic or environmental factors. As outlined by Manolio et al. (2009), post-twinning de novo mutations will not contribute to family resemblance and heritability, but can explain some of the variation at present attributed to ‘environment’. Since we could not confirm post-zygotic disease-causing mutations in any of the affected twins of the analyzed MZ discordant twin pairs, we have to assume that in an ACE (A = additive genetic factors, C = shared environment effects, E = unshared environment effects) twin model (Eaves et al., 1978), E plays a much more important role in their phenotypic discordance than A. This would concur with a low heritability of the present disease/birth defect in the offspring of the affected twin.
A contemporary approach to identifying post-zygotic variation explaining at least some of the ‘missing’ genetic contribution might be the expansion of sample sizes for MZ discordant twin-pairs in these disorders.
Conclusions
Genomic mutations in coding regions do not seem to play a major role in the discordance of MZ twins, although our cohort represents only a small number of MZ discordant twins. We cannot exclude that causative mutations may reside outside the regions accessible by WES or that discordance is the result of different epigenetic modifications.
Acknowledgments
We thank all families and patients for their participation in the study.
Supplementary Material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/thg.2015.93.