Background
Proving the hypothesis that experiences occurring during early development can influence health and chronic disease risk across the entire lifecourse requires high-quality evidence from long-term observational human studies. This concept has evolved over the past half-century from a controversial theory to a generally accepted model, referred to as the developmental origins of health and disease (DOHaD), codified in a scientific society dedicated to the field. Reference Gillman and Rich-Edwards1 Substantial experimental data from rodents, sheep, primates, and other animals confirm that early experiences can encode lifelong risks for a range of chronic health conditions. Reference Hsu and Tain2 In addition to proving the principle, animal data are helpful for demonstrating potential biological mechanisms. However, human evidence is essential given the many differences in measures and phenotypes between animals and humans. In addition, the heterogeneity of environmental and social conditions among humans contrasts with the controlled environment of laboratory animals. Similarly, long-term observational human studies are critical to answering important research questions where randomized clinical trials are impossible or unethical.
The first wave of DOHaD research focused on early life undernutrition and relied on retrospective analyses of outcomes in adulthood that leveraged administrative data about weight and/or length at birth or in infancy Reference Barker3,Reference Hales, Barker and Clark4 or studied geographically localized famines. Reference Roseboom, de Rooij and Painter5–Reference Stanner, Bulmer and Andrès7 However, poor growth may reflect exposures beyond early life undernutrition, and retrospective studies typically provided little to no information on prenatal diet or other early life exposures that may influence biological programing mechanisms not captured by overall birth size. Additionally, studies of births that occurred up to a century ago may not remain relevant for the current era given secular shifts in food production, transportation options, healthcare, neighborhoods/access to green space, and behaviors (e.g., screen time). Further, the experience of famine, which encompasses global severe energy and nutrient deficiency and substantial impacts on fertility and pre- and postnatal mortality, may not generalize to the range of nutrition experienced by more representative populations of pregnant women, particularly as obesity becomes increasingly prevalent. Reference Rich-Edwards and Gillman8 Beyond these questions about external validity, retrospective studies may also have less internal validity, given survival and other biases (e.g., recall bias), as well as limited information about important confounders. Reference Gillman and Rich-Edwards1 Furthermore, interest has grown in examining a broader range of early life exposures beyond undernutrition, including overnutrition, psychosocial stressors, environmental chemicals, maternal substance use, and climate change.
Ultimately, evidentiary support for specific hypotheses within the DOHaD umbrella requires prospective longitudinal datasets with well-measured early life exposures of interest, information on relevant confounders, investigations into potential mechanisms of effect, and follow-up into older ages to allow for accrual of chronic disease outcomes. Many adult health outcomes sensitive to early life experiences are further affected by exposures in adolescence or young adulthood, so capturing ongoing environmental exposures is critical to a full understanding of risk. For example, chronic obstructive lung disease is not only related to fetal growth and gestational age, but also lifetime air pollution and smoking behaviors. Reference Gold, Wang, Wypij, Speizer, Ware and Dockery9,Reference Garcia, Rice and Gold10 Ideally, such longitudinal studies should be large enough to examine rare outcomes and include participants with diversity across a range of axes (e.g., geography, race, ethnicity, socioeconomic status, sexual orientation, and gender identity) to support generalizability and help identify subgroups with differential disease susceptibility.
Numerous prospective cohort studies have been established over the past few decades, with recruitment in childhood, pregnancy, and even preconception, and with plans to follow the offspring longitudinally. For example, the birthcohorts.net website lists more than 130 cohorts each with at least 300 mother–child pairs, enrollment during pregnancy or at birth, and follow-up for at least 1 year. 11 Large government-funded national initiatives such as the Danish National Birth Cohort Reference Olsen, Melbye and Olsen12 and the Norwegian Mother and Child Cohort Study (MoBa), Reference Magnus, Irgens and Haug13 each of which includes over 100,000 mother–child pairs, represent one approach to accruing sufficient sample sizes for subgroup analyses or rare outcomes. These settings have the advantage of public registries and centralized health systems that allow for efficient and relatively complete assessment of multiple outcomes, but both cohorts draw on somewhat homogeneous Scandinavian populations and rely mostly on questionnaires rather than in-person assessments. Another approach to accruing large samples has been to establish research consortia, pooling data from multiple separate cohorts using a common protocol, such as the Environmental influences on Child Health Outcomes (ECHO) program led by the US National Institutes of Health. ECHO incorporates data from over 50,000 children enrolled in 69 extant cohorts across the US, Reference LeWinn, Caretta, Davis, Anderson and Oken14 although to date only a small number have been followed into young adulthood.
However, such large-scale longitudinal studies are only as informative as the data they contain. Leading long-term studies is complex, challenging, and expensive, but ultimately essential to providing answers to pressing questions about the extent to which a range of exposures influence health outcomes. In this paper, the authors draw upon their own experiences leading cohorts with longitudinal follow-up into adulthood to describe specific challenges and lessons learned.
Cohorts whose expertise is represented in this manuscript
The authors of this manuscript include researchers and study staff with experience leading longitudinal cohorts based in the US from early life into young adulthood. Table 1 summarizes information about recruitment and follow-up of these nine cohorts. All are participating in the NIH ECHO Program and have shared their cohorts’ data on the ECHO common data platform. Reference LeWinn, Caretta, Davis, Anderson and Oken14 A major focus within the ECHO program has been to promote participant retention in the ECHO-wide cohort, with the goal of maximizing power and minimizing bias. Reference LeWinn, Caretta, Davis, Anderson and Oken14 To achieve this goal, cohort principal investigators and study staff attend various meetings and serve on committees at which retention strategies are discussed, including some meetings that have been organized by participant lifestage. In these discussions, we found that we face many of the same challenges as our cohorts transition into young adulthood, and we exchanged our various approaches to overcoming them. The objective of this paper is to share these challenges and solutions with others who might be planning or conducting long-term birth cohort studies.
Hereafter we will use the umbrella term “birth” cohort to represent cohorts with recruitment at any point in early life, from preconception into early childhood, as the operational challenges related to long-term follow-up are similar regardless of the specific timepoint of early life recruitment. For brevity, we will refer to the adult participant as the “parent,” while recognizing that some children may have legal guardians who are not parents; and will typically refer to the offspring as the “child,” though they may have reached adolescent or young adult life stages.
Challenges and strategies to address them
In this section we focus on specific challenges we have faced in retaining longitudinal birth cohorts into young adulthood and suggest approaches to mitigate or address these challenges. We summarize the overarching issues in Table 2 and expand upon them in the following sections.
Duration of funding cycles vs. duration of intended follow-up
Funding to support cohort-specific science and operations typically is distributed in increments of 5 years or fewer, with no assurance of renewal. Some studies may initially evaluate only short-term outcomes, but then subsequently develop new aims requiring longer-term follow-up. Other studies may desire longer-term follow-up from the outset but may not have funding sufficient for this purpose. Regardless, most initial recruitment and consent materials discuss the shorter-term, funded duration of the study only. Participants in our cohorts have expressed confusion because they initially consented to a study with follow-up into infancy, and were subsequently recontacted about childhood, and then adolescent visits. We recommend that investigators communicate both short- and long-term plans for follow-up as early as possible. For example, a consent form might include language such as: “We currently have funding to follow you and your child through age 5 years. We are planning to seek additional funding in the future to support your participation for many years to come, until your child is grown. If we are successful in obtaining this funding in the future, we will recontact you to request additional consent at that time.” Such an approach may theoretically result in fewer potential participants initially enrolling, as some may be daunted by the concept of signing up for a study that may last decades. However, we believe communicating these plans will minimize participant confusion and promote retention, because participants will be fully aware that researchers are, or may become, interested in following their child well beyond childhood. If and when cohorts do begin follow-up beyond the originally planned study duration, they should prepare frontline field staff to address participants’ questions about why the study is continuing longer than expected, for example by sharing the scientific questions that can only be answered in such a long-term study.
Long-term funding is critical to retaining study staff, who develop relationships and rapport with participants and hold important institutional knowledge and technical skill sets. Although it is not always possible, in part because of limited funding availability, long-term retention of study staff enhances participation and reduces participant withdrawal.
Staying in contact
Staying in contact with participants over several decades remains an enormous challenge, compounded by the need to contact the child participants themselves once they turn 18. We recommend that cohorts apply multiple strategies for tracking, including: obtaining multiple contact modes including landline and cellphone numbers, school and personal email and postal addresses; asking for multiple (up to three) close contacts who would know of the participants’ whereabouts if they moved; collecting information that might be useful for credit database searches; and using multiple online tracking databases. We recommend that cohorts obtain as much high-quality contact information as possible about the child even before their 18th birthday, from the child or their parent. Frequent contact can help maintain participant engagement and keep participant contact information up to date, reducing the burden on the tracking team. Cohorts should ask for updated contact information at all study visits and surveys, include links for contact on study websites, and query participant-preferred contact methods. They may include special outreach timepoints for the sole purpose of collecting updated contact information and may specifically incentivize provision of contact information.
Adolescents and young adults as independent participants
Longitudinal follow-up of birth cohorts confers the additional challenge that the participant of interest (the child) is not the same individual who originally enrolled in the cohort. While according to the US Department of Health and Human Services children may provide assent for participating in research after age 7, their consent is not required. 15 Cohorts must reconsent their young adult participants for ongoing participation after their 18th birthday; but even without such consent, cohorts are able to continue using extant data collected with parent/guardian consent before age 18. Regulations may differ in non-US settings. To motivate ongoing participation, cohorts must understand participant barriers and motivators and provide appropriate compensation for their participation.
Surveys, interviews, and focus group discussions about participants’ experiences with the study itself may help cohorts learn more about the factors that promote ongoing participation. Such feedback can be highly valuable, and cohorts may wish to provide multiple opportunities and venues for ongoing participant feedback, at every data collection timepoint and as desired in between. Among teen respondents to a survey administered by one cohort, primary motivators for participation included wanting “to help advance science,” “to be a part of something beneficial,” and “being reimbursed for my time.” Incentives were especially important for teen, compared with parent, participation.
As learning more about the cohort’s scientific focus and findings may be especially motivating for teenagers and young adults, cohorts should use a range of modalities to share interesting results, relying not only on postings on a website, which would require the participant to seek out the information, but also on pushes via email, newsletter, and social media. Using up-to-date modes of communication, favoring visuals over text, and prioritizing social media or text messaging over mail or email also help reach young adults. Reference Mandoh, Redfern, Mihrshahi, Cheng, Phongsavan and Partridge16,Reference Haijes and van Thiel17
Different incentive structures may be more or less motivating for the young adults, compared with their parents. In one cohort, the possibility of a larger “lottery” incentive for teens appeared to yield better survey participation rates as opposed to a guaranteed smaller incentive, which had previously worked well with parents. Cohorts may also offer a range of stipend amounts for different data or specimen types that correspond to the participant burden.
The creation of a Youth Advisory Council, similar to a Community Advisory Board but composed of young adult participants, is a helpful mechanism to allow participants to provide input about the science that is important to them and their communities. It also provides a venue for sharing study findings and for learning the best ways to communicate results to them and their peers. In one cohort, young adult participants were invited to join such a council. The group meets at varying frequency and in varying formats (e.g., in person and online) for facilitated discussion, and members are provided an incentive for their participation. While only a small proportion of all invited youth elected to participate, their participation increased their study engagement and has provided an invaluable perspective to help guide the study as it moves into a young adult life stage.
Data collection and management logistics
Flexibility is as important for data collection as it is for maintaining contact. As adolescents and young adults are busy with school, work, and socializing, we recommend remote data collection whenever possible via online or telephone/video chat interviews. We suggest that cohorts offer maximal flexibility in visit timing, length, location, frequency, and type, especially for data elements that require in-person collection such as phlebotomy, anthropometry, fitness, motor assessment, and spirometry. If funding and staffing allow, study personnel can travel to participants, conducting visits at home or at school, in community buildings, or even in a mobile data collection unit such as a van or camper. Self-report of characteristics such as weight and height can provide additional data, but validity is likely suboptimal. Reference Chan, Tarrant, Ngan, So, Lok and Nelson18,Reference Wilson, Bopp, Papalia and Bopp19 Self-collection protocols are increasingly being developed for many measures including biosample collection. Reference Richardson, Orr, Ollosson, Irving, Balfour-Lynn and Carr20,Reference Valentine-Graves, Hall and Guest21 An alternative to self-collection is to use data collected by medical providers, accessed either from review of medical records or after-visit summaries given to participants by medical providers. A single data collection timepoint may thus entail multiple modalities, for example: 1) remote consent and survey completion; 2) arranging for a phlebotomist to meet the participant in the early morning for a fasting blood collection; 3) asking the participant to come to a study center for body composition assessment; 4) requesting the participant email or text a photo of their most recent physical examination after-visit summary; and 5) a video or phone visit for interview or remote cognitive assessments.
Management of large, complex, longitudinal datasets and biosample repositories poses additional challenges. The complexity of data management grows with time, and many straightforward data storage systems such as spreadsheets or REDCap are not well suited to such multidimensional datasets. Some cohorts have developed their own data management systems whereas others have turned to commercially available packages. Biospecimens also require a flexible inventory system that can record each time samples are added to, modified in, or removed from the repository. As biospecimens are extremely precious, especially those from very early life such as pregnancy, birth, and infancy, cohorts must balance their future potential against current use, as well as the possibility that sample quality will degrade with time, especially with repeated freeze/thaw cycles. Therefore, long-term storage in small vials is recommended.
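As an illustration of the kind of inventory record-keeping described above, the following minimal Python sketch keeps an append-only log of every addition, modification, and removal, and counts freeze/thaw cycles per vial. It is not drawn from any of the cohorts' actual systems; the class names, field names, and units (Vial, Repository, InventoryEvent, volume_ul) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class InventoryEvent:
    """One immutable entry in the audit log (hypothetical structure)."""
    vial_id: str
    action: str  # "added", "modified", or "removed"
    note: str
    timestamp: datetime


@dataclass
class Vial:
    """Current state of a stored vial (hypothetical fields)."""
    vial_id: str
    participant_id: str
    sample_type: str
    volume_ul: float
    freeze_thaw_cycles: int = 0


class Repository:
    """Tracks vials currently in storage plus an append-only event log."""

    def __init__(self) -> None:
        self.vials: Dict[str, Vial] = {}
        self.log: List[InventoryEvent] = []

    def _record(self, vial_id: str, action: str, note: str) -> None:
        # Events are only ever appended, never edited or deleted.
        now = datetime.now(timezone.utc)
        self.log.append(InventoryEvent(vial_id, action, note, now))

    def add(self, vial: Vial) -> None:
        if vial.vial_id in self.vials:
            raise ValueError(f"vial {vial.vial_id} already in repository")
        self.vials[vial.vial_id] = vial
        self._record(vial.vial_id, "added", f"{vial.volume_ul} uL {vial.sample_type}")

    def thaw_and_use(self, vial_id: str, volume_used_ul: float) -> None:
        # Using part of a vial reduces its volume and adds one freeze/thaw cycle.
        vial = self.vials[vial_id]
        if volume_used_ul > vial.volume_ul:
            raise ValueError("requested volume exceeds remaining volume")
        vial.volume_ul -= volume_used_ul
        vial.freeze_thaw_cycles += 1
        self._record(vial_id, "modified", f"used {volume_used_ul} uL")

    def remove(self, vial_id: str, reason: str) -> None:
        del self.vials[vial_id]
        self._record(vial_id, "removed", reason)

    def history(self, vial_id: str) -> List[InventoryEvent]:
        # The log retains the full history even after a vial is removed.
        return [event for event in self.log if event.vial_id == vial_id]
```

Because the log is separate from the current-state table, the history of a vial remains available after the vial itself has been used up or withdrawn, and the per-vial freeze/thaw count makes it straightforward to see why storage in small vials limits repeated thawing.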
Sensitive topics
Many topics highly relevant for adolescent and young adult health and well-being are often considered to be private in nature, including pubertal development; sexual orientation and behaviors; gender identity and expression; substance use and abuse; disordered eating behaviors; experiences of bullying or relationship violence; mental health symptoms and diagnoses; and incarceration. Collection of some measures of high scientific interest, such as semen in males or pelvic ultrasounds in females, may also be considered intrusive or embarrassing. Collecting information regarding the death of young adult participants or their parents is also delicate. Cohorts may find it challenging to know when and how to incorporate these measures into their data collection protocols, especially when the brand of the study has been a focus on mothers and babies. As data collection transitions from parent-report to teen self-report, it is important to assess the participant’s developmental readiness to participate independently.
First, cohorts should be sensitive to these issues. We recommend that informed consent/assent documents include information about sensitive topics to be assessed; allow participants to opt out of any items that they do not wish to answer; incorporate trigger warnings ahead of sensitive questions; and include ongoing training for field staff about how to address participant questions regarding these topics. Wording in questionnaires should be carefully reviewed by both experts and laypeople with a range of perspectives, with an eye toward, for example, eliminating language that is unnecessarily gendered and allowing participants to report their gender identities and to update them over time.
Second, it is important for cohorts to communicate and plan for potential risks. Cohorts should decide on, and inform participants about, any results that would be reportable events, for example mandated reporting for child or elder abuse, or clinician follow-up for concerning mental health symptoms or clinical results. To do so, the cohort would need to develop a system for identifying such results, such as automated flags or real-time review of returned surveys, and may need to identify healthcare providers for consultation or referral. Cohorts should provide information to all participants about how to seek help, such as information on suicide and domestic violence hotline resources.
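One way the automated flags mentioned above might work is as a small set of rules applied to each returned survey. The Python sketch below is illustrative only: the item names (self_harm_thoughts, symptom_score), the cutoff value, and the follow-up actions are hypothetical placeholders, not validated clinical criteria, and any real rules would need to be defined by the cohort with clinical and ethical review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FlagRule:
    """A named check on one survey response and the follow-up it triggers."""
    name: str
    applies: Callable[[Dict[str, object]], bool]
    action: str


def _score_at_least(response: Dict[str, object], item: str, cutoff: float) -> bool:
    # Missing or non-numeric answers never trigger a score-based flag.
    value = response.get(item)
    return isinstance(value, (int, float)) and value >= cutoff


# Hypothetical rules: item names and the cutoff of 15 are placeholders only.
RULES: List[FlagRule] = [
    FlagRule(
        name="self_harm_item",
        applies=lambda r: r.get("self_harm_thoughts") in {"sometimes", "often"},
        action="same-day clinician review",
    ),
    FlagRule(
        name="high_symptom_score",
        applies=lambda r: _score_at_least(r, "symptom_score", 15),
        action="clinician follow-up",
    ),
]


def review_survey(response: Dict[str, object]) -> List[FlagRule]:
    """Return every rule triggered by a single returned survey."""
    return [rule for rule in RULES if rule.applies(response)]


# Example use with a made-up response.
flags = review_survey({"self_harm_thoughts": "often", "symptom_score": 9})
for rule in flags:
    print(f"{rule.name}: {rule.action}")
```

Keeping the rules in one list means that the criteria shared with participants in consent materials and the criteria actually applied to returned surveys can be reviewed side by side.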
Third, cohorts should strive to maintain the privacy of data, especially sensitive data elements. They should send study materials directly to the adolescent or young adult and provide ways for the participant to return them directly to the investigators, not via the parent. To the extent possible, they should conduct study visits in separate, private spaces. Even if participants are younger than age 18, cohorts may in some instances tell parents that they will not share any participant responses, in order to maintain privacy. This information should be outlined in the study consent, which may need to be updated for each visit as the cohort ages.
Maintaining a diverse cohort
Most adverse exposures and diseases disproportionately burden the most vulnerable in our population – those with lower education, lower income, disabilities, and who are Black, Indigenous and persons of color (BIPOC). Thus, it is essential for cohorts to strive for broad representation among their research participants including individuals from higher risk populations and from those with diversity in life experience. Unfortunately, higher risk populations are often less likely to enroll in research studies, and are more likely to be lost to follow-up over time. We have applied multiple approaches to try to maintain diverse representation in our studies. Incentives are especially important for lower income participants, and cohorts should be attentive to inflation and consider secondary costs to visits, such as gas and parking. On the other hand, in many settings, lower income people may be more likely to take public transportation which can take much longer than self-driving; thus, offering taxi or rideshare vouchers may help, as well as opportunities for home visits. Some cohorts may require multi-lingual staff who can conduct study visits in the preferred language of participants. This also requires the translation of all study materials, including the dissemination of study results and communication materials described above. Depending on the study population, translation and use of interpreter services might be critical to fostering participation. People with disabilities may require assistance or accommodations to attend study visits and complete assessments.
Cohorts should use diverse images and artwork in study materials, including websites. Every effort should be made to hire study personnel who share demographic backgrounds with the cohort participants. At minimum, cohorts can include diverse consultants and a community or participant advisory board. Communications should strive to achieve equitable reach, recognizing that participants may not have ready access to the internet or computers, and may have hearing or vision impairments, cognitive or developmental impairments, lower literacy or health literacy, or language barriers to completing questionnaires. Study materials should be accessible to people from a range of backgrounds, with no assumed prior knowledge and minimal technological requirement. Time should be set aside at all encounters to allow participants to ask questions.
Looking into the future
“Birth cohort studies may have a natural beginning but they have no obvious end.” Reference Najman, Alati and Bor22 Many researchers might hope to follow their cohort participants throughout their lifetime to assess early life influences on diseases that manifest with age and even on mortality. Moreover, long-term follow-up provides the opportunity to study preconception parental and grandparental influences on a third generation.
We recognize that we represent cohorts that are all based in the United States, and it is possible that our perspectives may be less applicable in other countries. However, many of us also have led or collaborated on cohorts in other parts of the world. While we are not aware of prior publications that explicitly aimed to summarize the challenges related to maintaining longitudinal birth cohorts and strategies to minimize them, investigators from Australia, Brazil, Norway, and Chile have also commented on difficulties with tracking, participant attrition, and maintaining cohort leadership over decades similar to those we have described here. Reference Najman, Alati and Bor22–Reference Horta, Gigante and Goncalves25 Investigators from the Mater-University of Queensland Study of Pregnancy (MUSP) have also nicely detailed the difficulties related to analysis of such complex datasets, such as how to address potential bias, multiple repeated measures, highly correlated variables, and questions of reverse causality. It would be interesting to see additional future work on both challenges and solutions from investigators in a variety of settings around the world. In addition, we welcome additional opportunities to hear from the study participants themselves about their experiences participating in such long-term research.
Securing funding can be one of the most substantial and existential challenges and may require multiple, recurrent applications to multiple funders. Even with funding in hand, the challenges of maintaining contact, collecting high-quality data, engaging young adults, addressing sensitive topics, and sustaining a diverse cohort are substantial and grow over time. Longitudinal follow-up of a study population over many decades ultimately requires flexibility, adaptability, and appropriate incentives to and opportunities for feedback from participants. The authors have learned from each other and hope that this publication will serve as a valuable resource for other scientists and study staff. Longitudinal birth cohort studies serve as tremendous resources to answer important questions about the DOHaD that remain unanswered, as well as those not yet asked that will emerge in the future.
Acknowledgment
The authors wish to thank our ECHO colleagues; the medical, nursing, and program staff; and the children and families participating in the ECHO cohorts. We also acknowledge the contribution of the following ECHO program collaborators: ECHO Components—Coordinating Center: Duke Clinical Research Institute, Durham, NC: Smith PB, Newby KL; Data Analysis Center: Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD: Jacobson LP; Research Triangle Institute, Durham, NC: Parker CB; Person-Reported Outcomes Core: Northwestern University, Evanston, IL: Gershon R, Cella D.
Financial support
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Research reported in this publication was supported by the Environmental influences on Child Health Outcomes (ECHO) program, Office of the Director, National Institutes of Health, under Award Numbers U2COD023375 (Coordinating Center), U24OD023382 (Data Analysis Center), U24OD023319 (PRO Core), UH3OD023282 (Gern), UH3OD023287 (Breton), UH3OD023286 (Oken), UH3OD023348 (O’Shea/Fry), and UH3OD023290 (Herbstman/Pereira). Other funding came from the National Institutes of Health awards R01HD034568, R01AI050681, P01AI089473, P30ES02095, R01AI024156, and R01AI051598.
Conflicts of Interest
Dr Jackson reports consulting for Regeneron, Sanofi, GSK, AstraZeneca and DSMB for Pfizer. None of the other authors has a conflict of interest to report.