Substance use (SU) and substance use disorders (SUDs), including alcohol, nicotine, and cannabis use disorders, run in families (Bierut et al., 1998; Kendler et al., 2023; Merikangas et al., 1998). Offspring of parents with SUDs have an increased risk of SU and SUDs compared to offspring of parents without SUDs (Lieb et al., 2002; McGovern et al., 2023; Mellentin et al., 2016). Both genetic and environmental influences are critical in the intergenerational transmission of SU and SUDs (Kendler et al., 2012a; Rhee et al., 2003; Verhulst et al., 2015). Twin and family studies indicate that genetic factors explain approximately 50% of the phenotypic variance in SUDs (Deak & Johnson, 2021; Kendler et al., 2008). The most salient family environmental risks for SU and SUDs include parental SU, socioeconomic status (SES), parental divorce or death, parental attitudes toward and monitoring of SU, parental psychopathology, and disrupted family functioning (Barr et al., 2022; Finan et al., 2015).
Parental genetics, both transmitted and nontransmitted to offspring, may contribute to rearing environments, which in turn affect offspring SU. The correlation between the offspring’s genome and their family environments complicates the separation of genetic and environmental influences, which have historically been studied independently. Therefore, the mechanisms through which parents contribute to their offspring’s risk of SU and SUDs remain unclear. Furthermore, despite advances in genome-wide association studies (GWAS) identifying multiple genetic variants associated with SU and SUDs (Hatoum et al., 2023; Johnson et al., 2020; Pasman et al., 2018; Polimanti et al., 2020; Saunders et al., 2022; Zhou et al., 2020), the molecular pathways linking genetic variants to SU and SUDs are still poorly understood.
Here, we provide a comprehensive methodological framework that aims to (1) disentangle directly transmitted parent-to-offspring genetic effects from nontransmitted genetic effects on SU and SUDs, (2) measure the impact of both transmitted and nontransmitted parental genetics on offspring SU and SUDs via family environments, known as ‘genetic nurture’, and (3) elucidate the molecular pathways from genes to behavior by leveraging family-based genomic data and gene expression networks. We note that the current article is primarily theoretical in nature. While we describe the epidemiological characteristics of the two longitudinal cohorts to which these methods will be applied, the Brisbane Longitudinal Twin Study (BLTS) in Australia and the Lifelines Cohort Study (Lifelines) in the Netherlands, we emphasize that this design paper does not include an application to these data. Instead, our aim is to provide a comprehensive roadmap for our intended empirical investigations.
Genes to Behaviors: Using Family-Based Genomic Data to Disentangle Intergenerational Transmission of SU and SUDs
In an intergenerational context, risk transmission pathways can be disentangled into (1) genetic transmission, where genetic variants for SUDs are directly passed from parents to children, and (2) genetic nurture, where parental genotypes associated with SUDs affect offspring outcomes indirectly via the rearing environments (Kong et al., 2018).
Several methods have been developed to disentangle direct genetic transmission and genetic nurture, including children-of-twins, half-sibling, and adoption designs, as well as genotyped parent-offspring designs (Jami et al., 2021; McAdams et al., 2023). Among these, the adoption design works because biological parents do not provide the rearing environments for adoptees, which allows the biological parents’ DNA to be separated from the environments that adoptive parents provide (Kendler et al., 2012b, 2015, 2016). Swedish adoption studies have shown that parenting styles significantly affect the risk for alcohol use disorder (AUD) in adoptees after adjusting for biological factors (Kendler et al., 2015). However, nonrandom placement, prior placements, placement timing, contact between adopted children and biological parents, and sample representativeness limit the generalizability of adoption studies (Blackwood & Muir, 2010; Kendler et al., 2012b). Although these limitations can be addressed, there is no objective means of unconfounding parent-offspring genetics from parental behaviors and environments without measured parental and offspring DNA.
Methods analyzing genotyped mother-offspring pairs resolve the above issue partially by disentangling either maternal or paternal genetic effects from offspring phenotypes, but not both simultaneously (Eaves et al., 2014; Jami et al., 2020; Qiao et al., 2020). Our method (Bates et al., 2018, 2019) (Figure 1), also developed by Kong et al. (2018), solves the above problems by relying on genotyped family trios to identify and separate parentally transmitted and nontransmitted alleles (Cordell et al., 2004; Wheeler & Cordell, 2007). The reassembled transmitted and nontransmitted parental genomes are used to construct transmitted and nontransmitted polygenic scores (PGSs; individual-level genetic liability for a trait). A nontransmitted PGS that is associated with offspring outcomes provides evidence of genetic nurture, which is unconfounded by the transmitted parent-to-offspring PGS.
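To make the transmitted/nontransmitted logic concrete, the following Python sketch shows one simple way such scores could be computed for complete, unphased trios. It relies on the identity that, at each biallelic SNP, the nontransmitted allele count summed over both parents equals the maternal plus paternal dosage minus the offspring dosage. The function name, the array layout and the toy weights are illustrative assumptions. This is not the haplotype-based procedure described in Methods, and it does not separate maternal from paternal nontransmitted alleles.

```python
import numpy as np

def transmitted_nontransmitted_pgs(mother, father, offspring, weights):
    """Return (transmitted PGS, nontransmitted PGS) for complete trios.

    mother, father, offspring: effect-allele dosages (0/1/2),
        arrays of shape (n_trios, n_snps).
    weights: per-SNP effect sizes from a GWAS, shape (n_snps,).
    """
    mother = np.asarray(mother, dtype=float)
    father = np.asarray(father, dtype=float)
    offspring = np.asarray(offspring, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # The offspring genotype is the sum of the two transmitted alleles.
    transmitted = offspring
    # Each parent carries two alleles and transmits one, so the alleles
    # left over across both parents are mother + father - offspring.
    nontransmitted = mother + father - offspring

    # Coarse sanity check only: this does not catch every Mendelian error.
    if (nontransmitted < 0).any() or (nontransmitted > 2).any():
        raise ValueError("Genotypes are inconsistent with a parent-offspring trio")

    return transmitted @ weights, nontransmitted @ weights

# Toy example: one trio, three SNPs, hypothetical GWAS weights.
pgs_t, pgs_nt = transmitted_nontransmitted_pgs(
    mother=[[1, 2, 0]], father=[[1, 0, 1]], offspring=[[1, 1, 0]],
    weights=[0.5, -0.2, 0.1],
)
```

In a subsequent regression of an offspring outcome on both scores, an association with the nontransmitted score would be interpreted as evidence of genetic nurture, as described above.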
The analysis of nontransmitted alleles has a long history in animal breeding (Walsh & Lynch, 2018) but was only recently leveraged to measure and interpret their effects on complex human traits (Bates et al., 2018, 2019; Kong et al., 2018), particularly in studies of genetic nurture of educational outcomes and psychiatric disorders (Frach et al., 2024; Martin et al., 2023; Shakeshaft et al., 2023; Tubbs & Sham, 2023; Wang et al., 2021). To our knowledge, genetic nurture studies with respect to SU and SUDs remain limited. One study found that parental PGS for smoking initiation explained unique variance in offspring frequency of tobacco and alcohol use after controlling for the offspring’s own PGS, providing evidence of genetic nurture (Saunders et al., 2021). A second study found that both transmitted and nontransmitted PGSs for AUD were associated with riskier alcohol outcomes via exposure to parental relationship discord and divorce (Thomas et al., 2023).
However, previous studies have typically required genotype data for offspring and both parents (parent-offspring trios), which can reduce the sample size and potentially introduce selection bias (Martin et al., 2023). We therefore describe a novel haplotype-based method (see Methods) to differentiate transmitted and nontransmitted genomes in both parent-offspring trios and pairs, thereby increasing the sample size and generalizability. This method has been validated and has replicated genetic nurture effects on educational attainment in the Lifelines Cohort Study (Trindade Pons et al., 2024).
Genes to Functions: Using Functional Genomics to Identify the Molecular Mechanisms Underlying SUDs
The success of our method depends on well-powered GWAS to calculate PGSs from transmitted and nontransmitted parental genomes. Despite advances in GWAS, we do not fully understand the cascade of biological changes linking genetic variants to SU and SUDs. This can be addressed with gene expression (GE), which plays a critical role in the development of human diseases (Hindorff et al., 2009; Liu, 2011; Schadt et al., 2005). Expression quantitative trait loci (eQTLs; Nica & Dermitzakis, 2013) are genetic variants (usually SNPs) that may contribute to disease susceptibility by influencing gene expression, consequently providing a direct link between GWAS and GE studies (Franke & Jansen, 2009; Liu, 2011). eQTL analyses can discern transcriptome adaptations (Bhattacharya & Mariani, 2009; Gu et al., 2002) and reveal the mechanisms by which genetic variants contribute to SU and SUDs (Lehrmann & Freed, 2008), depending on their genomic location (e.g., transcription factor binding sites, splice sites, or other regulatory regions). Since most GWAS variants reside outside protein-coding regions, eQTLs affect cell functions through subtle modification of gene transcription and translation (Shastry, 2009). Assessing eQTLs in linkage disequilibrium with SNPs associated with a trait is crucial to explain the functional significance of GWAS loci.
However, most approaches for GE and eQTL analyses are limited by reduced statistical power, higher type I error rates (Langfelder & Horvath, 2008; Langfelder et al., 2013), and an inability to capture the genetic interactions underpinning psychiatric disorders (Zhi et al., 2013).
These limitations can be addressed by multivariate network-based methods such as weighted gene co-expression network analysis (WGCNA; Langfelder & Horvath, 2008; Oldham et al., 2008). WGCNA identifies genes driving traits by classifying gene sets into ‘network modules’ based on their expression and connectivity patterns. These patterns are summarized in a single quantitative metric, the ‘module eigengene’ (ME), which can be used to test for associations with SU and SUDs. However, WGCNA requires GE data that cannot be easily obtained from brain tissues in large samples. Unlike GWAS, which now reach sufficient sample sizes to detect small effects (Levey et al., 2023; Saunders et al., 2022), available GE data from brain tissues, including those from the Genotype-Tissue Expression project (GTEx; Lonsdale et al., 2013), are based on relatively small sample sizes.
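In WGCNA, the module eigengene is defined as the first principal component of the standardized expression profiles of the genes in a module. The Python sketch below illustrates that definition with a singular value decomposition in NumPy. It is an assumption-laden illustration rather than the WGCNA R implementation: the function name and the simulated data are hypothetical, and every gene is assumed to have nonzero variance.

```python
import numpy as np

def module_eigengene(expr):
    """First principal component of a module's expression matrix.

    expr: array of shape (n_samples, n_genes) holding the expression of
        the genes assigned to one module (each gene must vary).
    Returns (eigengene, proportion of variance explained).
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene so that highly expressed genes do not dominate.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]  # unit-length vector, one value per sample
    # The sign of a singular vector is arbitrary; align it with the
    # average standardized expression for interpretability.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    return eigengene, var_explained

# Simulated module: four genes driven by one shared latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=60)
expr = latent[:, None] * np.array([1.0, 0.8, 1.2, 0.9]) + 0.1 * rng.normal(size=(60, 4))
me, var_explained = module_eigengene(expr)
```

The resulting vector has one value per individual and can be entered as a predictor or outcome in an association model with SU or SUD phenotypes.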
In the last decade, GE imputation approaches, such as PrediXcan (Gamazon et al., 2015), summary data-based Mendelian randomization (Zhu et al., 2016), FUSION (Gusev et al., 2016), and our own JPEGMIX2 (Chatzinakos et al., 2018), have been developed to circumvent these limitations and identify trait-GE associations. GE imputation bridges the gap between large GWAS data and underpowered transcriptome studies by integrating genotypes and expression data collected on the same individuals from reference data such as GTEx (Gamazon et al., 2015). This approach builds predictive models to estimate the heritable, genetically regulated components of GE, which can be stored as external weights to impute genetically regulated GE in independent samples (Gusev et al., 2016). However, most existing GE imputation methods suffer from low to moderate accuracy, often due to high genomic complexity in certain regions or to the tissue and cell-type specificity of GE (de Leeuw et al., 2023; Wainberg et al., 2019).
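The two-step logic of GE imputation (train weights in a reference panel, then apply them to genotypes in an independent sample) can be sketched as follows. Ridge regression is used here purely as a simple stand-in for the penalized or Bayesian models used by the published tools. The function names, the penalty value and the simulated data are hypothetical, and the sketch does not reproduce any specific package.

```python
import numpy as np

def fit_expression_weights(ref_dosages, ref_expression, ridge_penalty=1.0):
    """Learn per-SNP weights for one gene in a reference panel.

    ref_dosages: (n_ref, n_snps) dosages of cis-SNPs near the gene.
    ref_expression: (n_ref,) measured expression of the gene.
    Returns (weights, reference SNP means used for centering).
    """
    x = np.asarray(ref_dosages, dtype=float)
    y = np.asarray(ref_expression, dtype=float)
    x_mean = x.mean(axis=0)
    xc = x - x_mean
    yc = y - y.mean()
    # Closed-form ridge solution: (X'X + lambda * I)^-1 X'y
    gram = xc.T @ xc + ridge_penalty * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    return weights, x_mean

def impute_expression(target_dosages, weights, x_mean):
    """Genetically regulated expression in an independent genotyped sample."""
    return (np.asarray(target_dosages, dtype=float) - x_mean) @ weights

# Simulated reference panel (500 individuals, 5 cis-SNPs) and target sample.
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
ref = rng.integers(0, 3, size=(500, 5)).astype(float)
ref_expr = ref @ true_w + rng.normal(scale=0.1, size=500)
weights, x_mean = fit_expression_weights(ref, ref_expr)
target = rng.integers(0, 3, size=(200, 5)).astype(float)
imputed = impute_expression(target, weights, x_mean)
```

Because the second step needs only genotypes, the same stored weights could in principle be applied to transmitted and nontransmitted genomes alike.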
To address this, we aim to improve the accuracy of GE imputations and use them to evaluate higher-order interactions between imputed expression data at the network level (see Methods). Our approaches enable the imputation of GE in both transmitted and nontransmitted genomes, organizing them into ‘imputed gene networks’ for further analysis of associations with SU and SUDs.
Overall, this article aims to provide a comprehensive conceptual framework and methodological approach for investigating the intergenerational transmission of SU and SUDs by integrating genetic nurture analyses, GE imputation, and WGCNA. We also describe two longitudinal cohorts: the Brisbane Longitudinal Twin Study in Australia and the Lifelines Cohort Study in the Netherlands. By combining novel methodologies with unique datasets and advanced analytical techniques, we aim to provide new insights into how genetic and environmental factors shape SU, SUDs and associated disease risks across different life stages and populations.
Data Overview
To achieve the above aims, we will use phenotypic and genomic data from BLTS in Australia and Lifelines in the Netherlands. A detailed description of the BLTS SU and SUD phenotypic and genomic data is provided elsewhere (Couvy-Duchesne et al., 2018; Gillespie et al., 2013). Thus, only a brief description is provided below. A broad, general description of Lifelines is also provided elsewhere (Klijs et al., 2015; Scholtens et al., 2015; Sijtsma et al., 2022). Here, we provide a detailed report of all available Lifelines genomic, SU and SUD data.
Brisbane Longitudinal Twin Study
The BLTS was launched in 1992 to study melanocytic nevi and comprises over 7000 young adult twins, siblings and parents, with longitudinal assessments when twins were aged 12, 14, 16, 21 and 25 years (Couvy-Duchesne et al., 2018; Gillespie et al., 2013; Wright & Martin, 2004). We rely on SU, SUD and environmental risk data from the ‘19UP’ online self-report survey of N = 2876 subjects comprising n = 2142 twins and n = 734 nontwin siblings (67% female, mean age = 25.9 years, SD = 3.6) (Couvy-Duchesne et al., 2018; Gillespie et al., 2013). In addition to demographic, general and mental health items, the survey assessed lifetime alcohol, nicotine and cannabis use, as well as Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) and DSM-5 criteria for alcohol and cannabis use disorders (American Psychiatric Association, 1994, 2013), and the Fagerström Test for Nicotine Dependence (FTND; Heatherton et al., 1991). As reported in detail elsewhere, the rates of lifetime alcohol, nicotine and cannabis use were 98.7%, 60.3% and 61.3% for males, and 97%, 50.5% and 48.9% for females, respectively (Gillespie et al., 2013).
In terms of SUDs, the rates of lifetime DSM-IV cannabis abuse, cannabis dependence, alcohol abuse, and alcohol dependence were 17.0%, 9.8%, 40.2% and 35.4% for males, and 7.7%, 4.6%, 29.2% and 22.6% for females, respectively (Couvy-Duchesne et al., 2018). Such higher rates of SUDs in males have been repeatedly reported in previous studies (Brady & Randall, 1999; Grant et al., 2015; Khan et al., 2013; McHugh et al., 2018). The 19UP study also included self-report indicators of parental and family environments, such as household SES, family functioning (parental marital history, parental and sibling absences and separations, parental involvement, and parental bonding), family and peer group deviance including peer SU, and religious behaviors (Table 1).
Note (Table 1):
a Due to divorce, separation, death, placement in foster care, or leaving early to live alone.
b Includes step and adoptive parents.
Lifelines Cohort Study
Lifelines is an ongoing multidisciplinary prospective population-based cohort study examining, in a unique three-generation design, the health and health-related behaviors of 167,729 persons living in the north of the Netherlands. The study employs a broad range of investigative procedures to assess biomedical, socio-demographic, behavioral, physical and psychological factors that contribute to health and disease in the general population, with a special focus on multimorbidity and complex trait genetics. The data for the current project were collected in three waves: baseline (2006–2013), wave 2 (2014–2017) and wave 3 (2019–2023). The design and rationale for Lifelines have been described in detail elsewhere (Scholtens et al., 2015; Sijtsma et al., 2022).
Baseline data were collected from 167,729 participants aged from 6 months to 93 years. During the baseline recruitment period, individuals aged between 25 and 50 years were invited through their general practitioners to participate in the study. All persons who consented to participate were asked to provide contact details and to invite their family members, that is, partners, parents and children, resulting in a three-generation study. In addition, adults were also given the option of registering and participating online in the Lifelines study. The Lifelines website details ongoing research and data collection: https://wiki.lifelines.nl/doku.php.
Overall, among the baseline respondents, 49% (n = 81,652) were invited through their general practitioners, 38% (n = 64,489) via participating family members, and 13% (n = 21,588) self-registered via the Lifelines website (Scholtens et al., 2015; Sijtsma et al., 2022). Participants were then invited for follow-up assessments every 5 years, including laboratory and biometrical assessments and comprehensive questionnaires. Between assessments, follow-up questionnaires are completed approximately once every 1.5−2.5 years. In the current study, 143,595, 89,812 and 57,633 adult participants have SU or SUD data available from baseline, wave 2 and wave 3, respectively. The number of participants at wave 3 will increase, as the complete wave 3 data have been released since the summer of 2024. We note that some variables of interest have a considerable attrition rate due to nonresponse in (one of) the follow-up assessments, withdrawal of participation or mortality (Sijtsma et al., 2022), as is often the case in large general population cohort studies.
Substance Use and Substance Use Disorders
In Lifelines, SU and SUDs were measured by self-report questionnaires. Tobacco and alcohol use were measured across all three waves. Cannabis use was measured at waves 2 and 3. SUDs were measured at wave 3.
Substance Use
Tobacco use includes measures of smoking status (e.g., never/current/former smokers) and amount of tobacco consumed (Du et al., 2022). Cigarettes per day was defined as the average number of cigarettes smoked per day, either as a current or former smoker. Pack-years of smoking were calculated by multiplying the amount smoked per day (of the different tobacco products, including cigarettes/roll-ups, cigarillos, cigars, and grams of pipe tobacco), expressed in packs (1 pack = 20 cigarettes), by the number of years the person has smoked. Alcohol use was assessed as part of a food frequency questionnaire (FFQ) developed by Wageningen University (Brouwer-Brolsma et al., 2022). Two questions referred to the frequency and quantity of alcohol consumed in the past month: ‘How often did you drink alcoholic drinks in the past month?’ (ranging from Not in the last month to 6–7 days per week), and ‘On days that you drank alcohol, how many glasses did you drink on average?’ (from 1 to 12 or more). These questions were asked separately for different kinds of alcoholic beverages (beer, alcohol-free beer, red wine/rosé, white wine, sherry, distilled wine, other alcoholic beverages). Based on these questions, an average daily alcohol consumption score (grams per day) was calculated (Mangot-Sala et al., 2021).
Lifetime cannabis use was defined using two questions: (1) ‘Have you ever used drugs? (Yes/No/I prefer not to answer that)’, and if yes, (2) ‘Have you ever used cannabis, such as weed, marijuana, hashish? (Yes/No)’. The answer categories were recoded to ever (1) versus never (0) used cannabis. Frequency of cannabis use was assessed using the question, ‘How often did you use cannabis in your entire life or in the past 12 months?’ (from 0 to 40 times or more).
Detailed sample characteristics are presented in Table 2. At baseline, the mean age of the 143,595 adult participants was 44.5 years (SD = 12.8, range = 18−93) and 58.5% of the sample were women. The prevalence of ever smoking at baseline in Lifelines (52.1% for females, 56.6% for males) is higher than the prevalence of ever smoking (22.4% for females, 22.8% for males) in the Netherlands Twin Register (NTR) study based on young adult twins aged 18−25 years (Vink & Boomsma, 2011), but falls within the range of the general Dutch population based on the Dutch Central Bureau of Statistics (CBS, 2014). According to CBS, the prevalence of ever smoking among males was 31.6% under 23 years old, 55.4% in the 25−44 years group, 65.2% in the 45–64 years group and 79.1% in the ≥65 years group. Among females, the corresponding figures were 26.1%, 47.8%, 60.8% and 52.2%.
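As a worked illustration of the pack-year definition given above, the following Python sketch converts cigarettes per day and years smoked into pack-years using the stated convention of 20 cigarettes per pack. The function name is hypothetical, and the conversion of other tobacco products (cigarillos, cigars, pipe tobacco) into cigarette equivalents is not specified in the text and is therefore not attempted here.

```python
CIGARETTES_PER_PACK = 20  # stated in the text: 1 pack = 20 cigarettes

def pack_years(cigarettes_per_day, years_smoked):
    """Pack-years = (cigarettes per day / 20) * years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("Inputs must be non-negative")
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked

# One pack a day for one year and one cigarette a day for twenty years
# both amount to a single pack-year.
one_pack_one_year = pack_years(20, 1)
one_cigarette_twenty_years = pack_years(1, 20)
half_pack_thirty_years = pack_years(10, 30)
```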
Note (Table 2): SD, standard deviation; 1 pack-year, 20 cigarettes per day for 1 year or, equivalently, 1 cigarette per day for 20 years; daily alcohol intake, grams/day; drinking quantity, glasses/day; FTND, Fagerström Test for Nicotine Dependence; AUD, alcohol use disorder.
a The number of participants at wave 3 will increase, as the complete wave 3 data have been released since the summer of 2024.
b Cannabis use was assessed from the second assessment onward.
c Assessed at the third assessment only.
d The prevalence of DSM-5 AUD was based on 54,369 participants who reported ever drinking alcohol and had DSM-5 AUD data available.
At wave 2, the mean age of the 89,812 adult participants was 50.3 years (SD = 13.0, range = 18−96). The 11.3% prevalence of lifetime cannabis use in Lifelines (9.5% for females, 13.8% for males) is comparable to the 12.3% in a general population sample aged 18−64 years from the Netherlands Mental Health Survey and Incidence Studies (NEMESIS; 8.3% for females, 16.1% for males; Vega et al., 2002), which is a psychiatric epidemiological cohort study based on random sampling of individuals from the Dutch population register (Basisregistratie Personen). However, our estimate of lifetime cannabis use is lower than the 26.9% among subjects aged 18 to 65 years from the NTR (24.7% and 36.2% for females and males aged 21−40 years; Stringer et al., 2016; Vink et al., 2010). This may be due to differences in sample characteristics, age ranges, regional patterns of cannabis use, and the assessment of lifetime cannabis use. For example, in Lifelines, participants were asked (1) whether they had ever used drugs and, if yes, (2) whether they had ever used cannabis (weed, marijuana, hashish) in their entire life. The prevalence of lifetime cannabis use is likely underestimated in Lifelines because participants who did not classify cannabis as a drug would not have reported it. In NTR, participants were asked at what age they initiated cannabis use, with answer categories: (1) 11 years and younger, (2) 12−13, (3) 14−15, (4) 16−17, (5) 18 years or older and (6) never. The answer categories were recoded to Ever (1) versus Never (0) used cannabis.
Substance Use Disorders
Alcohol and nicotine use disorders data were collected from self-reported questionnaires at wave 3. Alcohol use disorder (AUD) was assessed using the 11 diagnostic criteria from the DSM-5 (American Psychiatric Association, 2013). Lifetime DSM-5 AUD diagnosis required endorsing a minimum of two criteria in the 12 months preceding the questionnaire or previously. Consistent with DSM-5 criteria, lifetime AUD severity levels were categorized as non-AUD (<2 criteria), mild (2−3 criteria), moderate (4−5 criteria), or severe (≥6 criteria).
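The severity categories above map directly onto a count of endorsed criteria. The Python sketch below illustrates that mapping using the thresholds stated in the text; the function name and the category labels as strings are illustrative assumptions, and the sketch does not represent the actual Lifelines scoring syntax.

```python
def aud_severity(n_criteria):
    """Map the number of endorsed DSM-5 AUD criteria (0-11) to a severity level."""
    if not 0 <= n_criteria <= 11:
        raise ValueError("DSM-5 AUD has 11 criteria; expected a count from 0 to 11")
    if n_criteria < 2:
        return "non-AUD"   # fewer than 2 criteria
    if n_criteria <= 3:
        return "mild"      # 2-3 criteria
    if n_criteria <= 5:
        return "moderate"  # 4-5 criteria
    return "severe"        # 6 or more criteria

# Severity level for every possible criterion count.
levels = {n: aud_severity(n) for n in range(12)}
```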
Nicotine dependence was assessed using the Fagerström Test for Nicotine Dependence (FTND; Heatherton et al., 1991). The FTND includes six items to evaluate the quantity of cigarette consumption, the compulsion to use and nicotine dependence; for instance, ‘How soon after you woke up did you smoke your first cigarette?’ The item scores were summed to yield a total FTND score of 0−10, with higher scores indicating stronger nicotine dependence.
Finally, the 14.7% prevalence of lifetime DSM-5 AUD in Lifelines (9.1% for females, 22.6% for males) is comparable to the 12.8% in subjects aged 18−75 years from NEMESIS (7.9% for females, 17.8% for males; Ten Have et al., 2023).
Genomic Data in Lifelines
Genotyping and Imputation
Lifelines participants were genotyped with the Illumina CytoSNP-12v2 array (n = 15,400), the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0 (n = 36,339), and the FinnGen Thermo Fisher Axiom® custom array (n = 28,249). Quality control (QC) of markers and samples was performed separately per batch. The detailed pre-imputation QC criteria are described in the Lifelines wiki (http://wiki-lifelines.web.rug.nl/).
In brief, duplicated and monomorphic markers, markers with a low call rate or low minor allele frequency, and markers that deviated significantly from Hardy-Weinberg equilibrium were removed. Samples with a low call rate, samples that were heterozygosity outliers, and samples identified as mix-ups were filtered out. After QC, data from each array were imputed through the Sanger Imputation Service using the Haplotype Reference Consortium panel. We selected a standardized set of imputed markers with imputation quality scores of 0.8 or higher in all batches. To minimize the impact of population stratification, we limited the samples to those of European ancestry, determined through principal component analysis with the 1000 Genomes reference. For samples that were genotyped in multiple batches (duplicates), data from the latest batch were used.
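The cross-batch marker selection amounts to intersecting the sets of markers that pass the quality threshold in each batch. A minimal sketch follows, assuming the per-batch quality scores are available as plain mappings. The function name, batch labels, and marker IDs are hypothetical.

```python
from typing import Dict, Mapping, Set

QUALITY_THRESHOLD = 0.8  # imputation quality cut-off stated in the text


def markers_passing_all_batches(
    quality_by_batch: Mapping[str, Mapping[str, float]],
    threshold: float = QUALITY_THRESHOLD,
) -> Set[str]:
    """Return markers whose quality score is >= threshold in every batch.

    A marker that is absent from any batch is dropped, because it cannot
    be shown to meet the threshold there.
    """
    passing_sets = [
        {marker for marker, score in scores.items() if score >= threshold}
        for scores in quality_by_batch.values()
    ]
    if not passing_sets:
        return set()
    return set.intersection(*passing_sets)


# Hypothetical example with three batches and three markers.
example: Dict[str, Dict[str, float]] = {
    "cytosnp": {"rs1": 0.95, "rs2": 0.79, "rs3": 0.90},
    "gsa": {"rs1": 0.88, "rs2": 0.92, "rs3": 0.85},
    "axiom": {"rs1": 0.80, "rs2": 0.99},
}
selected = markers_passing_all_batches(example)
```

In the example, only `rs1` is retained: `rs2` falls below 0.8 in one batch and `rs3` is missing from another.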
For this project, samples included all genotyped offspring who had at least one parent genotyped. There were a total of 19,281 offspring available for analysis. This included 3217 complete trios and 16,010 duos with one genotyped parent. Since Lifelines aims to genotype all participants, these numbers are expected to increase over time.
Methods
Nontransmitted Alleles Inference
Unlike the previously applied pseudo-control method (Bates et al., 2018), which performs a marker-by-marker comparison to create nontransmitted parental genomes using genotyping data from offspring and both parents (parent-offspring trios), we will use a haplotype-based approach to differentiate transmitted and nontransmitted genomes in parent-offspring trios and pairs (Trindade Pons et al., 2024). Briefly, offspring’s haplotypes were compared to the two (parent-offspring pairs) or four (parent-offspring trios) parental haplotypes using tiles of 150 adjacent markers along each chromosome. A tile size of 150 markers was optimal for our genotyping data, given the SNP density of the genotyping array. Each tile overlapped by 50 markers with neighboring tiles on both sides, which allowed us to account for potential crossing-over events. These tiles were used to trace the best matching parental haplotype across all the tiles. For each tile, if the overall match between the offspring’s haplotype and the best matching parental haplotype is less than 99.8%, the method assumes that a crossing-over may have occurred and checks the match with the other haplotype from the same parent. For each available parent, the best match between parent and offspring tiles was used to determine which two parental tiles were transmitted to the offspring. After determining the transmitted tiles, the remaining parental tiles were stored in a separate dataset as nontransmitted alleles. For parent-offspring pairs, the nontransmitted alleles of the ungenotyped parent were set as missing (Trindade Pons et al., 2024). Once the transmitted and nontransmitted alleles are determined, they can be used not only to calculate separate PGSs but also for GE imputation.
This allows us to generate neuronally derived gene co-expression networks from both transmitted and nontransmitted alleles, which can then be tested for their impact on SU and SUDs.
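The tiling logic can be illustrated with a deliberately simplified sketch for one offspring haplotype and the two haplotypes of a single parent. It is not the published implementation: the function names are hypothetical, phased haplotypes are assumed to be given as equal-length allele sequences, and the switching rule (move to the other haplotype only when the current one falls below 99.8% and the other matches better) is an assumption about how the crossing-over check might be applied.

```python
from typing import List, Sequence, Tuple

TILE_SIZE = 150          # markers per tile (from the text)
OVERLAP = 50             # markers shared with each neighboring tile
MATCH_THRESHOLD = 0.998  # below this, a crossing-over is suspected


def tile_bounds(n_markers: int, size: int = TILE_SIZE,
                overlap: int = OVERLAP) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges of overlapping tiles."""
    if n_markers <= 0:
        return []
    step = size - overlap
    starts = range(0, max(n_markers - overlap, 1), step)
    return [(s, min(s + size, n_markers)) for s in starts]


def match_fraction(a: Sequence[int], b: Sequence[int],
                   start: int, end: int) -> float:
    """Proportion of identical alleles between two haplotypes in one tile."""
    same = sum(1 for i in range(start, end) if a[i] == b[i])
    return same / (end - start)


def trace_parent(offspring: Sequence[int], hap_a: Sequence[int],
                 hap_b: Sequence[int]) -> List[int]:
    """Per tile, index (0 or 1) of the parental haplotype called transmitted."""
    haps = (hap_a, hap_b)
    calls: List[int] = []
    current = None
    for start, end in tile_bounds(len(offspring)):
        scores = [match_fraction(offspring, h, start, end) for h in haps]
        if current is None:
            current = 0 if scores[0] >= scores[1] else 1
        elif (scores[current] < MATCH_THRESHOLD
              and scores[1 - current] > scores[current]):
            current = 1 - current  # switch haplotype: possible crossing-over
        calls.append(current)
    return calls


# Toy example: the offspring copies haplotype A for 200 markers, then B.
hap_a = [0] * 400
hap_b = [1] * 400
child = [0] * 200 + [1] * 200
calls = trace_parent(child, hap_a, hap_b)
```

In this toy example the nontransmitted haplotype in each tile is simply the other index, `1 - call`. A fuller implementation would also handle genotyping errors, missing calls, and the second parent.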
Nontransmitted Gene Co-Expression Networks
Once the parental transmitted and nontransmitted alleles have been identified and assembled into genotypes, they can be used to impute GE. GE imputation will be performed using the PrediXcan software (Gamazon et al., 2015). To increase the accuracy of GE imputation, we developed a stringent imputation pipeline. Given the much greater variability of gene expression, we expect the success of GE imputation to vary more than that of SNP imputation. In our pipeline, the correlation check is one of several steps intended to ensure reliable GE imputation. Furthermore, because we focus on the network interactions between the imputed genes, that is, on the cumulative signal carried across all genes in a network, we inherently have a greater tolerance for retaining potentially useful information from all genes. Specifically, the pipeline consists of the following steps: (1) removing all SNPs with NAs from our own genotype data; (2) ensuring that genotype annotation is identical between the SQLite file and our own genotypes; (3) using SQLite weights from the GTEx final data release (Version 8, August 2019) to predict GE in our sample; (4) using regression models (with observed gene expression as the outcome and predicted gene expression as the predictor of interest) to filter out genes with low prediction accuracy across all subjects, that is, genes with prediction p values ≥ .05; (5) checking the agreement between imputed and actual GE via Pearson correlation (r ≥ .10); and (6) using the SQLite SNP weight files derived from specific brain regions such as the nucleus accumbens (NAc), putamen, prefrontal cortex (PFC), and amygdala.
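Two parts of this pipeline lend themselves to a small worked sketch: the prediction itself, which in PrediXcan-style models is a weighted sum of allele dosages, and the gene-level filter of steps (4) and (5). The function names and SNP IDs below are hypothetical. The regression p value is taken as an input rather than computed, and this is not the PrediXcan code.

```python
import math
from typing import Mapping, Sequence

ALPHA = 0.05   # step (4): genes with p >= .05 are filtered out
MIN_R = 0.10   # step (5): required Pearson correlation


def predict_expression(dosages: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Predicted expression of one gene: sum of SNP weight x allele dosage.

    SNPs in the weight set that are absent from the genotype data
    (for example, removed in step 1) contribute nothing.
    """
    return float(sum(w * dosages[snp]
                     for snp, w in weights.items() if snp in dosages))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted expression."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def keep_gene(p_value: float, r: float,
              alpha: float = ALPHA, min_r: float = MIN_R) -> bool:
    """Retain a gene only if it passes both accuracy checks."""
    return p_value < alpha and r >= min_r


# Hypothetical gene with two weighted SNPs; rs3 carries no weight.
expr = predict_expression({"rs1": 2, "rs2": 1, "rs3": 0},
                          {"rs1": 0.5, "rs2": -0.25})
```

Running the same filter separately for each brain-region weight file would yield region-specific sets of retained genes.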
We chose to focus our GE imputation analysis on these brain regions comprising the mesocorticolimbic system, given their putative roles in SU and addiction (Koob & Volkow, 2016; Volkow et al., 2019). This system is central to reward processing, motivation, and executive function, all key processes implicated in SUDs. The NAc and putamen are critical components of the reward circuitry (Haber & Knutson, 2010; Volkow & Morales, 2015), the PFC is involved in decision-making and impulse control (Dalley & Robbins, 2017; Dixon et al., 2017), and the amygdala plays a role in emotion processing and drug-associated memories (Everitt & Robbins, 2016; Koob & Volkow, 2016). Numerous neuroimaging and postmortem studies have demonstrated altered structure and function in these regions in individuals with SUDs (Koob & Volkow, 2016; Volkow et al., 2016). By focusing on these specific areas, our aim is to capture gene expression patterns most relevant to the neurobiology of SU and addiction. With the GTEx final data release Version 8, we will substantially increase our ability to reliably impute gene expression, that is, from 3115 genes to 10,000−14,000 genes per brain region.
To mitigate the limitations associated with GE imputation, we will implement several strategies: (1) applying stringent statistical thresholds with conservative multiple testing corrections, (2) interpreting our results conservatively and emphasizing the need for experimental validation, and (3) conducting sensitivity analyses to assess the robustness of our findings to different methodological choices. By adopting this approach, we aim to provide a balanced and reliable interpretation of our results. We emphasize that while our findings may be consistent with causal relationships, they should be considered hypothesis-generating and would require replication across independent datasets and further experimental validation. These additional studies would aim to provide stronger evidence supporting or falsifying these hypotheses, rather than providing definitive proof of causality.
Following imputation, WGCNA will be performed on both transmitted and nontransmitted GE to identify co-expression networks and their corresponding MEs significantly correlated with SU and SUDs (Langfelder & Horvath, 2008; Langfelder et al., 2013; Zhang & Horvath, 2005). As mentioned above, WGCNA identifies higher order interactions between genes by assembling the imputed transcriptomes into ‘imputed network modules’. Each network module in WGCNA is represented by its ME, which is a quantitative summary of the correlated expression and connectedness across all genes in a module. WGCNA can be applied to identify gene networks associated with any phenotype, including sex, environment, and so forth. Here, we intend to focus only on gene networks correlated with each SU and SUD outcome while controlling for covariates such as sex and age. Post-hoc analysis will be used to identify sex-specific GE networks, followed by tests of network preservation. From our previous work (Vornholt et al., 2020, 2021), we expect to generate ∼20 network modules per SU and SUD outcome with varying degrees of correlation, represented by the Pearson correlations and p values between the SU and SUD outcomes and the network modules. These module-trait associations will be further validated using the bootstrapped WGCNA approach (Gandal et al., 2018). The p values will be corrected for multiple testing using the Bonferroni method, and modules significantly correlated with covariates will be excluded (Ponomarev et al., 2012).
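The final selection step can be sketched as follows, assuming module-trait and module-covariate p values have already been computed from the MEs. The function names, the color-style module labels, and the use of an uncorrected α = .05 for the covariate check are all assumptions for illustration. The text does not specify which threshold defines a significant covariate correlation.

```python
from typing import List, Mapping

ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def select_modules(
    trait_p: Mapping[str, float],
    covariate_p: Mapping[str, Mapping[str, float]],
    alpha: float = ALPHA,
) -> List[str]:
    """Modules significant for the trait and not correlated with covariates.

    trait_p:      module -> p value of the module-trait correlation
    covariate_p:  module -> {covariate name -> p value}
    """
    if not trait_p:
        return []
    threshold = bonferroni_threshold(alpha, len(trait_p))
    kept = []
    for module, p in trait_p.items():
        if p >= threshold:
            continue  # not significant after Bonferroni correction
        # Assumption: any covariate correlation with p < alpha excludes the module.
        if any(cp < alpha for cp in covariate_p.get(module, {}).values()):
            continue
        kept.append(module)
    return sorted(kept)


# Hypothetical p values for four modules and one SU outcome.
trait = {"blue": 0.001, "brown": 0.02, "green": 0.0005, "red": 0.5}
covariates = {"blue": {"sex": 0.40, "age": 0.70}, "green": {"sex": 0.01}}
selected = select_modules(trait, covariates)
```

With the roughly 20 modules expected per outcome, the Bonferroni threshold at α = .05 would be .0025 per module.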
Discussion
Our study advances research on SU and SUDs through several key innovations. First, we provide a methodological framework that integrates genetic nurture analysis, GE imputation, and WGCNA. This comprehensive approach will help unravel the complex interplay of genetic and environmental factors in SU and SUDs, potentially leading to more targeted interventions. Second, we leverage two unique datasets: the Brisbane 19UP Study from Australia, which offers rich phenotypic data on young adults during a critical developmental period for SU and SUDs, and the Lifelines Cohort from the Netherlands, a large multigenerational study ideal for examining genetic and environmental factors across different life stages and family structures. These geographically and culturally distinct cohorts enable cross-population comparisons and enhance the generalizability of our findings. Third, our study design allows for meta-analyses, increasing statistical power to detect genetic effects. By incorporating detailed environmental data, we can examine gene-environment interactions across different life stages. Additionally, we integrate multi-omics data, including transcriptomic and epigenomic information, offering a multi-dimensional view of the biological processes underlying SU and SUDs.
Future Directions
Our future research will focus on SU and SUDs, utilizing a methodological framework that integrates genetic nurture analyses, GE imputation, and WGCNA. This approach, while tailored to SU and SUDs, has broad applicability across a wide range of complex traits and outcomes, including cardiovascular diseases, metabolic disorders, other psychiatric conditions, and non-medical traits such as educational attainment or personality.
We will apply three main methodological approaches across the Lifelines and Brisbane 19UP datasets. (1) Genetic nurture analyses: We will compare parental influence effects between the Netherlands and Australia, exploring how genetic nurture impacts SU, SUDs, and related outcomes across different cultural contexts, family structures, and socioeconomic backgrounds. (2) GE imputation: Using reference panels such as GTEx, we will explore how age-related changes in imputed gene expression patterns correlate with SU, SUDs, and associated health outcomes across the lifespan, particularly in Lifelines. (3) WGCNA: We will impute GE data from both cohorts to identify co-expressed gene modules associated with SU and SUDs. This will also enable us to explore how environmental factors influence the relationship between these modules and health outcomes across different age groups.
By integrating these methods, we can address research questions such as how genetic nurture effects interact with specific gene co-expression network modules that influence SU and SUDs in different populations, and whether or not gene expression profiles mediate the relationship between polygenic risk scores and SU-related outcomes, while considering both direct genetic effects and indirect genetic nurturing effects.
Conclusions
The scientific premise for developing and applying our method to investigate transmitted and nontransmitted GE networks of SU and SUDs is compelling. To our knowledge, no studies have used genotyped families for GE imputation to estimate the impact of transmitted and nontransmitted GE networks, via parental and family environments, on SU and SUDs in offspring. This will provide the first molecular evidence to show how GE networks can ‘genetically nurture’ the risk of SU and SUDs by helping to foster risky and protective environments. For example, through which environments do these transmitted and nontransmitted GE networks affect SU and SUDs? Furthermore, we can extend our research questions by deriving transmitted and nontransmitted maternal and paternal genotypes for GE imputation. This will enable us to determine whether, for example, the paternally transmitted and nontransmitted GE networks are more salient, or whether their impacts vary according to the biological sex of each offspring. We anticipate that our integrated approach will provide a comprehensive understanding of the interplay between genetic factors, gene expression, and environmental influences on SU and SUDs across different life stages and populations. The insights gained from our research may not only advance our understanding of the etiology of SU and SUDs but also inform studies of other complex traits and outcomes.
Data availability statement
Data may be obtained from a third party and are not publicly available. Researchers can apply to use the Lifelines data used in this study. More information about how to request Lifelines data and the conditions of use can be found on their website (https://www.lifelines.nl/researcher/how-to-apply).
Acknowledgments
This work was supported by grants from the United States National Institutes of Health, National Institute on Drug Abuse (R01DA052453, R00DA023549). The work of VIV was supported by R01MH118239 and R01MH132806. HvL was supported by a VENI grant from the Talent Program of the Netherlands Organization of Scientific Research (NWO-ZonMW 09150161810021). The Lifelines initiative has been made possible by subsidy from the Dutch Ministry of Health, Welfare and Sport, the Dutch Ministry of Economic Affairs, the University Medical Center Groningen (UMCG), Groningen University and the Provinces in the North of the Netherlands (Drenthe, Friesland, Groningen). The authors wish to acknowledge the services of the Lifelines Cohort Study, the contributing research centers delivering data to Lifelines, and all the study participants.
Competing interests
All of the authors declare no conflicts of interest.