Introduction
Developmental psychopathology is a framework that examines risk and resilience processes across development, with the aim of preventing psychopathology and promoting positive development (Cicchetti & Toth, 2009; Cicchetti, 1993; Masten, 2006; Sroufe & Rutter, 1984). This framework is integrative, multidisciplinary, and involves the examination of the dynamic interplay of multilevel influences across development (Cicchetti & Toth, 2009; Masten, 2006). Integral to this approach is the idea that associations between exposures and outcomes often vary across time. As a result, factors during certain developmental periods may have a larger influence on the development of psychopathology and related outcomes (Cicchetti & Toth, 2009). Thus, examining exposure–outcome associations across development, and not just in one developmental period, is critical. Additionally, it is important to consider how these associations may change over time (i.e., strengthen, attenuate, or vary with time since exposure).
While it is known that many associations differ over time, many analytic approaches assume that the association between an exposure and an outcome is constant, regardless of the developmental period in which the exposure and outcome are measured. In this tutorial, we present a more general approach to fitting longitudinal models of exposures and outcomes, one that allows the association between the exposure and the outcome to change across time. In many cases, the temporal model for how exposures and outcomes are associated is not fully understood, and an exploratory, hypothesis-generating approach can be used. For these cases, we present a planned testing cascade to select the model that best fits the data. The testing cascade compares the utility of four temporal models, under the assumption that the exposure precedes the outcome.
In this tutorial, we present a general approach to fitting longitudinal models that are relevant to some of the central tenets of the developmental psychopathology framework. One of these tenets is that the developmental psychopathology framework is inherently transdisciplinary and examines multilevel influences on development (Cicchetti & Toth, 2009; Doom & Cicchetti, 2020; Rutter & Sroufe, 2000). In this approach, researchers can test transdisciplinary questions such as whether a child's attachment style might be associated with other domains (e.g., executive functioning; immune system function) across development. Another central tenet of the framework that the current approach is well-suited to explore is developmental cascades, or influences that spread across multiple levels, domains, or systems over time (Doom & Cicchetti, 2020; Masten & Cicchetti, 2010; Sroufe & Rutter, 1984). Here researchers might be interested in understanding the cascade from parenting stress at one or multiple points in development to children's later externalizing symptoms. Another important consideration in the developmental psychopathology framework is the identification of sensitive periods in development (Cicchetti & Toth, 2009; Cicchetti, 2015b; Pollak, 2015). Researchers might examine whether there are specific developmental periods in which stress exposure has the largest impact on cognition. Finally, this approach allows for the testing of interactions between risk and protective factors over time, which is another central tenet of the developmental psychopathology framework (Doom & Cicchetti, 2020; Masten, 2001, 2006). In this approach, researchers could first assess the relation of maltreatment experiences at one or more points in development with later friendship quality. After establishing this main temporal model, researchers could then test whether having a safe, trusting adult might disrupt the association between maltreatment and friendship quality across development. The statistical approach described in the current manuscript is well suited for examining these central tenets of the developmental psychopathology framework.
Following from this framework, developmentalists often ask questions about how exposures at different time points across development may be associated with trajectories of later outcomes. The following questions are often of interest:
- Are all prior measurements of the predictor associated with the outcome, with potentially different strengths of association depending on the timing of the exposure and outcome measures (Figure 1a)?
- Is only the most recently recorded measure of the predictor associated with the outcome (Figure 1b)?
- Is there a sensitive period during which the exposure best predicts the outcome across development? If so:
  ○ Does the strength of the association between the exposure measured during a sensitive period and the outcome differ depending on when the outcome was assessed (Figure 1c)?
  ○ Or, does the strength of the association between the exposure measured during a sensitive period and the outcome remain constant no matter when the outcome was assessed (Figure 1d)?
These questions are quite common in developmental research as well as across a number of other fields. For example, a researcher interested in the association between maternal sensitivity and offspring internalizing symptoms over development might hypothesize that all antecedent measurements of maternal sensitivity are associated with offspring internalizing symptoms (Figure 1a). Similarly, a researcher could hypothesize that hypothyroidism at any time point would be associated with expressive language at any point in development (Figure 1a). A researcher interested in the relation between peer deviancy and externalizing problems over time might hypothesize a recency effect, which would be best modeled using measurements of peer deviancy collected immediately before the assessment of externalizing problems (Figure 1b). Similarly, a researcher could hypothesize a recency effect of alcohol intake on risky decision-making where alcohol intake at the preceding time point shows the strongest association with risky decision-making at the following time point (Figure 1b). A researcher interested in examining the relation between iron deficiency and attention over time might hypothesize that there is an early sensitive period in which iron deficiency has the largest impact. They may also be interested in whether iron deficiency in that early sensitive period has the same magnitude of association with attention over time (Figure 1d), or whether this magnitude differs over time, and therefore across development (Figure 1c). Researchers interested in teratogenic effects of antenatal Zika infection on offspring cognition might similarly hypothesize that there is a sensitive period during pregnancy where infection may have greater effects relative to exposures outside of this period, though the strength of the association between antenatal Zika infection and the outcome could be the same or differ over time (Figure 1c,d). While each of these researchers might hypothesize certain temporal models for their research questions, it is difficult to know which is the best temporal model without comparing multiple models.
This tutorial describes how to test four hypotheses that are relevant to the examples described above. We describe how to fit, compare, and evaluate models that align with each hypothesis. This tutorial has two main aims: (1) to describe how to fit several temporal models and (2) to describe how to compare the models and evaluate the associated hypotheses they represent, with an ordered sequence of tests. While any researcher could benefit from fitting the temporal model that matches their a priori hypothesis, the comparison of multiple models is especially helpful in situations where researchers have a hypothesized model but not enough prior literature to be confident that the proposed temporal model fits the data well.
The organization of the current tutorial is as follows. We first describe four possible temporal hypotheses, which correspond to four models. Mathematical descriptions and example code for the four models appear in the section entitled Supplementary Materials. Second, we show how to construct hypothesis tests to compare the four temporal models, and how a planned sequence of hypothesis tests allows investigators to choose the model that best fits the data. Third, we illustrate the approach by posing the questions in the context of an example longitudinal study of effortful control and body mass index (BMI), each measured at multiple time points. We provide a template for the interpretation of the results. Lastly, we contextualize recommendations for this modeling approach by discussing implications for conclusions in developmental psychopathology. See Table 1 for a glossary of terms used in the tutorial.
Analytic methods
This section details the general linear mixed model. First, we describe the model itself, then the components and assumptions, the hypothesis testing approach, four potential models that can be tested under this approach, and finally, how to compare models.
The general linear mixed model
The general linear mixed model (Laird & Ware, 1982) has several features that allow it to provide a good description of repeated measures data in developmental psychopathology. The mixed model can be used to analyze data with repeated measures of both exposures and outcomes. Additionally, the general linear mixed model can be used when some of the planned measures are missing, or do not occur exactly at the time the researchers planned. The model describes both the population-average response and participant-specific trajectories. Finally, the model permits modeling the covariance structure of the outcomes, which describes the variance of each measure and the correlations among the repeated measures. A description of the model appears in the Supplementary Materials section.
Model components
Full descriptions of the components of the general linear mixed model appear in the Supplementary Materials section. The general linear mixed model has a vector of outcomes, a fixed-effects design matrix of exposures, fixed-effect intercepts and slopes describing the population average, a random-effects design matrix with participant-specific random effects, and a vector of errors. The fixed-effects design matrix, also known as an X matrix, contains known constants and organizes the variables of interest.
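For orientation, the model for participant i can be written compactly in the Laird and Ware (1982) form (the Supplementary Materials give the full specification and notation):

Y_i = X_i β + Z_i b_i + e_i,   b_i ~ N(0, G),   e_i ~ N(0, R_i),

where Y_i is the vector of outcomes, X_i the fixed-effects design matrix of exposures, β the fixed-effect intercepts and slopes for the population average, Z_i the random-effects design matrix, b_i the participant-specific random effects, and e_i the errors.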
Variance assumptions
A complete description of normally distributed observations specifies not only the means and slopes but also the covariance matrix, which gives the variances of and correlations among observations. A general linear mixed model defines the covariance among observations with (1) the random-effects design matrix, (2) the covariance matrix for the random effects, and (3) the covariance matrix for the errors.
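In the notation sketched above, these three pieces combine to give the covariance of participant i's outcomes:

Var(Y_i) = Z_i G Z_i' + R_i,

with Z_i the random-effects design matrix, G the covariance matrix of the random effects, and R_i the covariance matrix of the errors.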
When fitting repeated measurements from the same individual, the covariance structure among observations must accurately account for within-participant correlation and potentially unequal variances. Choosing the incorrect covariance structure can greatly increase the Type I error rate (Gurka et al., 2011), the probability that a hypothesis test incorrectly rejects the null.
Explicit guidance for choosing an appropriate covariance structure for repeated measures appears in multiple manuscripts (e.g., Cheng et al., 2010). Here, we discuss two possible structures. When the true covariance pattern is unknown, a common approach is to fit an unstructured covariance (Gurka et al., 2011; Harrall et al., 2023). The unstructured covariance structure is estimated using the repeated measurements for each study participant. When the variance of the outcome increases with age, as is common for such outcomes as BMI, many authors fit a random regression covariance model. The random regression model allows participant-specific intercepts and slopes and models the variance and covariance as a function of age or duration of time. The random regression approach was used in the four models in this manuscript.
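As a sketch of why the random regression model accommodates variance that grows with age: suppose participant i has a random intercept and a random age slope with variances g_11 and g_22, covariance g_12, and residual variance σ². The implied variance of an outcome measured at age t is then

Var(Y_it) = g_11 + 2 g_12 t + g_22 t² + σ²,

a quadratic function of age that grows over the observed range whenever the slope variance g_22 is positive and the intercept–slope covariance is not strongly negative.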
The four models
Graphics are used to describe each model in Figure 1. Model equations are shown for each model in Table 2. The four models are presented in decreasing order of complexity, from most complicated, with many exposures, to least complicated, with fewer exposures.
For tutorial purposes, the models are presented with four repeated measurements. We assume that the exposures are measured before the outcomes. Therefore, we consider the exposure at times one, two, and three, and the outcome at times two, three, and four. Exposure matrices for the four models appear in the Supplementary Materials section.
Each of the four models described here can describe developmental processes. The all-times-before model associates the predictor at every prior time period with the outcome and can test different strengths of association at each time point. The immediately-before model tests the influence of a predictor on an outcome close in time. The two sensitive period models test whether there is a time period in which the predictor has greater effects on the outcome. Together, the four models enable assessment of which developmental processes are probable given the timing of the measurements. We may not know ahead of time which process is plausible, especially with a new predictor and outcome or in novel time periods, which is why testing all of the models is important.
All-times-before model
The all-times-before model is shown in Figure 1a and Table 2. For each outcome at each time, values of the exposure measured at any time before the outcome are used as the exposures. The model allows different strengths of association between each exposure and outcome pair. For example, in a study with four time points, the measurement of the exposure at time one is associated with the outcome at time two, the measurements of the exposure at times one and two are associated with the measurement of the outcome at time three, and the measurements of the exposure at times one, two, and three are associated with the outcome at time four.
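As a sketch of the mean structure only (Table 2 gives the exact model equations; age terms and random effects are omitted here), let X_{i,s} denote participant i's exposure at time s and Y_{i,t} the outcome at time t. With four time points, the all-times-before model specifies

E(Y_{i,t}) = β_{0,t} + Σ_{s=1}^{t−1} β_{s,t} X_{i,s},   t = 2, 3, 4,

so each exposure–outcome pair (s, t) receives its own coefficient β_{s,t}.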
Immediately-before model
The immediately-before model is shown in Figure 1b and Table 2. For each outcome at each time, values of the exposure variable measured at the time immediately before the outcome are used as the exposures. The model allows different strengths of associations between each exposure and outcome pair. In this model, in a study with four time points, the measurement of the exposure at time one is only associated with the measurement of the outcome at time two. Similarly, the measurement of the exposure at time two is only associated with the outcome at time three. Lastly, the measurement of the exposure at time three is only associated with the measurement of the outcome at time four.
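In the same sketch notation, the immediately-before model keeps only the most recent exposure for each outcome occasion:

E(Y_{i,t}) = β_{0,t} + β_{t−1,t} X_{i,t−1},   t = 2, 3, 4.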
Differential-sensitive-period model
The differential-sensitive-period model is shown in Figure 1c and Table 2, with the sensitive period assumed to be at time one. The exposure at time one is associated with the outcome at each later time point. The model allows different strengths of association between each exposure and outcome pair. For example, the magnitude of the relation might decrease as more time elapses between the measurements of the exposure and the outcome, or the magnitude of the relation might strengthen with age. In this case, in a study with four time points, the measurement of the exposure at time one is associated with the measurement of the outcome at times two, three, and four.
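In the same sketch notation, the differential-sensitive-period model retains only the time-one exposure but allows its coefficient to differ across outcome occasions:

E(Y_{i,t}) = β_{0,t} + β_{1,t} X_{i,1},   t = 2, 3, 4.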
Stable-sensitive-period model
The stable-sensitive-period model is shown in Figure 1d and Table 2, with the sensitive period assumed to be at time one. The exposure at time one is associated with the outcome at each later time point. The model assumes that the strength of the association is the same at each time period. Like the differential-sensitive-period model, the measurement of the exposure at time one is associated with the measurements of the outcome at times two, three, and four.
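In the same sketch notation, the stable-sensitive-period model constrains the time-one exposure to a single common coefficient across outcome occasions:

E(Y_{i,t}) = β_{0,t} + β_1 X_{i,1},   t = 2, 3, 4.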
Model comparison by hypothesis testing
To determine the importance of specific developmental periods or how an association may change over time, the all-times-before, immediately-before, differential-sensitive-period, and stable-sensitive-period models can be compared using a series of planned hypothesis tests. This series of tests uses a backward stepdown approach to find the best temporal model. If a sensitive period best predicts the outcome, a further test determines whether the association varies across time or is stable across time. A flowchart for this series of hypothesis tests comparing the four models appears in Figure 2. Mathematical specifications of the hypothesis tests appear in the Supplementary Materials section.
Each hypothesis test compares a complex model with a simpler model nested within it. A simple model is considered to be nested within a complex model if the simple model can be obtained from the complex model by setting coefficients equal to zero. The immediately-before model and the sensitive period models are nested within the all-times-before model and therefore can be compared with it. However, the sensitive period models are not nested within the immediately-before model and thus cannot be directly compared with it. The hypothesis testing sequence can arrive at any of the four models as the final model (see Figure 2).
The hypothesis testing sequence starts with the most complex model, the all-times-before model. Hypothesis tests are used to evaluate the utility of exposures. The tests provide comparisons of nested models. The first test compares the all-times-before model to a model nested within it: the immediately-before model. The second test compares the all-times-before model with a different model nested within it: the differential-sensitive-period model.
In both tests described, the hypothesis test assesses the multiple-degrees-of-freedom null hypothesis that the coefficients which occur in the more complex model, but not in the simpler model, are equal to zero. A multiple-degrees-of-freedom hypothesis tests multiple simple hypotheses at the same time. A significant p-value (i.e., one less than the nominal level set by the investigator, often 0.05) suggests that the null should be rejected, indicating that at least one of the coefficients differs from zero and therefore that the more complex model is needed to describe the data. A nonsignificant p-value indicates that the null should not be rejected and that the data provide no evidence that the more complex model fits better than the simpler model. For reasons of parsimony, the hypothesis testing cascade suggests using the simpler model wherever reasonable.
Figure 2 shows one last planned hypothesis test, for the case when a model using a single sensitive period as an exposure is the best fit. The hypothesis test compares two nested models. Testing whether slopes and intercepts are the same across time periods allows choosing between the differential-sensitive-period model and the stable-sensitive-period model.
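In the notation of the model sketches above, the three planned tests correspond to the following null hypotheses (formal specifications appear in the Supplementary Materials):

- All-times-before versus immediately-before: H_0: β_{1,3} = β_{1,4} = β_{2,4} = 0, i.e., exposures more than one occasion before the outcome add nothing beyond the immediately preceding exposure.
- All-times-before versus differential-sensitive-period: H_0: β_{2,3} = β_{2,4} = β_{3,4} = 0, i.e., exposures after time one add nothing beyond the time-one exposure.
- Differential- versus stable-sensitive-period: H_0: β_{1,2} = β_{1,3} = β_{1,4}, i.e., the time-one exposure has a common coefficient across outcome occasions (the corresponding time-specific intercepts can be tested analogously).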
Example matrices and code
Organizing the data for fitting the models and testing the hypotheses is key. Commented SAS code (SAS Institute, Cary, North Carolina) appears in the Supplementary Materials section, together with example datasets. The sample code demonstrates how to construct the analytic datasets, specify the hypothesis tests, and fit the four models presented in this manuscript. For clarity, the example analytic datasets and code assume four repeated measures of both the outcome and the predictor, and do not include covariates.
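As a brief illustration of the data organization (a minimal sketch only, with hypothetical dataset and variable names; the Supplementary Materials contain the complete, tested code), one way to build the all-times-before design columns is to create one row per participant per outcome occasion, with a separate column for each exposure-time-by-outcome-time pair:

/* Hypothetical input: one row per participant (id), exposures x1-x3,   */
/* outcomes y2-y4. Output: long format with one design column per       */
/* exposure-outcome pair, so each pair can receive its own coefficient. */
data atb_long;
  set wide_data;
  array yvals{2:4} y2-y4;
  array xvals{1:3} x1-x3;
  do time = 2 to 4;
    outcome = yvals{time};
    /* each column equals the exposure when the outcome occasion matches, else 0 */
    x1_y2 = (time = 2) * xvals{1};
    x1_y3 = (time = 3) * xvals{1};
    x1_y4 = (time = 4) * xvals{1};
    x2_y3 = (time = 3) * xvals{2};
    x2_y4 = (time = 4) * xvals{2};
    x3_y4 = (time = 4) * xvals{3};
    output;
  end;
  keep id time outcome x1_y2 x1_y3 x1_y4 x2_y3 x2_y4 x3_y4;
run;

The immediately-before and sensitive-period design matrices are obtained by keeping only the corresponding subsets of these columns (or, for the stable-sensitive-period model, a single column holding the time-one exposure at every outcome occasion).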
Example study
We provide an illustrative example to highlight the strength of this method. Deer et al. (2023) studied the longitudinal association from birth to adolescence between effortful control and BMI using data from a study of pregnant people and their offspring, described in Glynn et al. (2018). We use this example study to highlight how to test associations in the context of both a repeated exposure (in this example, effortful control) and a repeated outcome (in this example, child BMI). Effortful control, or the effortful regulation of thoughts and behaviors such as planning, attention, and inhibiting impulsive behaviors (Rothbart & Bates, 2006), has been identified as a transdiagnostic risk factor for mental and physical health (Anzman-Frasca et al., 2012; Santens et al., 2020; Thamotharan et al., 2013). BMI is a measure of cardiometabolic health and is predictive of weight stigma (Puhl & Lessard, 2020), being a victim of bullying (van Geel et al., 2014), and poorer socioemotional development (Black & Kassenboehmer, 2017), as well as physical and mental health across the lifespan (Jacobs et al., 2022; Sahoo et al., 2015). In the example presented, we operated under the temporality assumption, meaning that there must be temporal delays between causes and effects (Rothman et al., 2012). Here, we posited that alterations in effortful control would precede changes in BMI, but this does not preclude a hypothesis that BMI could predict later effortful control.
Example study methods
Participants in the example study were recruited during pregnancy if they spoke English, did not smoke, were over 18 years of age, were pregnant with a single child, and did not report using drugs or alcohol during pregnancy. Offspring were included if they were born after 34 weeks of gestation and had effortful control, height, and weight measured at six months of age. Effortful control was assessed using the Rothbart Temperament Questionnaires, a set of developmentally appropriate questionnaires that assess individual differences in reactivity and self-regulation across development (Ellis & Rothbart, 2001; Gartstein & Rothbart, 2003; Putnam et al., 2006; Rothbart et al., 2000; Simonds, 2006). These questionnaires were completed by mothers at the 6-month, 1-year, 2-year, 5-year, 6.5-year, 9.5-year, and 11.5-year visits. Weight and length or height were measured by trained research assistants to calculate BMI at the 6-month, 1-year, 2-year, 5-year, 6.5-year, 9.5-year, 11.5-year, and 13-year visits. Infant sex at birth was collected from medical records and used as a moderator in the final model. Mothers reported their income and family size at the 6-month visit for the calculation of the income-to-needs ratio (INR), which was used as a covariate in the final model. Further details appear in Deer, Doom et al. (2023).
Asking the four study questions in the context of the example study
The investigators were interested in assessing whether there were associations between repeated measures of effortful control and BMI. Consistent with the temporality assumption, effortful control was assumed to be measured prior to BMI in order to be related to it.
We now rephrase the potential research questions in the context of the example study.
- Are all previous measurements of effortful control associated with the current measurement of BMI?
- Across development, is the measure of effortful control immediately prior to the measure of BMI associated with that measure of BMI?
- Is there a sensitive period when effortful control best predicts BMI over development? If so:
  ○ Does the strength of the association between effortful control in that sensitive period and BMI differ depending on when BMI was assessed?
  ○ Or, does the strength of the association between effortful control in that sensitive period and BMI remain constant no matter when BMI was assessed?
Example study analytic methods
All models used general linear mixed models to assess the relation between time-specific or repeated measures of effortful control and BMI trajectories. As prior research has identified a roughly quadratic pattern of BMI across this period of development (Wen et al., 2012), each model was fit with a linear and a quadratic age term. A random intercept for each participant and a random slope for age were fit, with an unstructured covariance between the random effects, in order to account for within-person correlations between repeated measurements and the known increase in the variance of BMI over time (Wen et al., 2012). A series of four models was fit, following the planned hypothesis testing cascade shown in Figure 2 and used in the study (Deer, Doom et al., 2023). First, the all-times-before model was compared to the immediately-before model. Then, the all-times-before model was compared to the differential-sensitive-period model. Finally, the differential-sensitive-period model was compared to the stable-sensitive-period model.
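For concreteness, a minimal sketch of this specification in PROC MIXED is shown below, using hypothetical dataset and variable names and only a few BMI occasions for brevity; the complete, tested code (including covariates and all visits) appears in the Supplementary Materials.

/* bmi_long (hypothetical): one row per child (id) per BMI occasion,    */
/* with age at measurement and design columns ec6m_y2-ec6m_y4 holding   */
/* 6-month effortful control, one column per BMI occasion, as in the    */
/* differential-sensitive-period model.                                  */
proc mixed data=bmi_long method=reml;
  class id;
  model bmi = age age*age ec6m_y2 ec6m_y3 ec6m_y4 / solution ddfm=kr;
  random intercept age / subject=id type=un;   /* random regression      */
  /* differential- vs. stable-sensitive-period: equal coefficients?      */
  contrast 'DSP vs SSP' ec6m_y2 1 ec6m_y3 -1,
                        ec6m_y2 1 ec6m_y4 -1;
run;

The earlier steps of the cascade are tested in the same way, with multi-row CONTRAST statements applied to the all-times-before fit (for example, contrast 'ATB vs IB' x1_y3 1, x1_y4 1, x2_y4 1; using the hypothetical design columns sketched in the previous section).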
Results from the example study
Contrast tests indicated that the all-times-before model did not explain more variance than the immediately-before model (p > 0.05). The second set of contrast tests indicated that the all-times-before model did not explain more variance than the differential-sensitive-period model (p > 0.05). Finally, comparing the differential-sensitive-period model and the stable-sensitive-period model showed that the magnitude and direction of the association between 6-month effortful control and BMI at each time point did not differ (p > 0.05). Therefore, the stable-sensitive-period model was retained as the final model.
This example highlights the benefit of the strategy described in this manuscript. Deer, Doom et al. (2023) were able to find a model which best fit their data. Following the model identification stage, sex was added as a moderator of the association between effortful control at six months and BMI over time. Covariates were also added to the model.
Discussion
In this manuscript, we presented four questions that may be of scientific interest in studies in which both the exposure and the outcome are measured on multiple occasions across development. The questions allow an investigator to assess whether the exposure is associated with the outcome, to discern the temporal form of the association, and to test various developmental models. The models that follow from these questions are ideal for testing some of the central tenets of the developmental psychopathology framework, namely the transdisciplinary and multilevel nature of the approach, developmental cascades, understanding sensitive periods of development, and the interaction between risk and resilience factors shaping development over time.
The example study highlighted in this tutorial (Deer, Doom et al., 2023) concluded that the stable-sensitive-period model best fit the data. However, in other studies, other temporal models might have been better. Here we discuss the implications of choosing each of the potential models and what this might mean for intervention strategies. If the all-times-before model were the final model, this would indicate that all previous measurements of an exposure are important when explaining the relation between the exposure and the outcome. Thus, it might be best to intervene as early as possible. Alternatively, if the immediately-before model were the final model, this would indicate that timing does not matter as much; it would be possible to reduce levels of an exposure at any age in order to have an effect on the outcome. If either of the sensitive period models were the final model, this would indicate that the best time for intervention on the exposure would be during that sensitive period.
The paper fills an important gap by providing statistical models to evaluate sensitive periods and temporal associations between variables in the field of developmental psychopathology. It is possible, or perhaps likely in many instances, that factors at multiple points during development may predict repeated measures of socioemotional or behavioral outcomes, with different magnitudes of association across time. The current paper introduces methods to compare these developmental models.
This manuscript provides analytic methodology for evaluating whether a sensitive period model best fits repeated measures data in contrast to models which include all measurements of the predictor prior to or immediately before measurement of the outcome. Some time periods are considered sensitive periods of development for specific outcomes. Sensitive periods are times when physiological and behavioral systems are particularly open to inputs from the environment. Thus, different factors (e.g., stress, parenting, sociocultural influences) may play an outsized role in predicting the subsequent development of physical or mental health, in part due to changes in neurobiological functioning during sensitive periods (Cicchetti & Curtis, 2015; Cicchetti, 2016; Knudsen, 2004). Developmental researchers have identified several sensitive periods during which specific factors may predict various outcomes. Frequently studied sensitive periods include the prenatal period, infancy, early childhood, adolescence/puberty, the transition to adulthood, pregnancy, and the transition to parenthood (Davis et al., 2018; Davis & Narayan, 2020; Deer et al., 2023; Doom et al., 2024; Doom & Gunnar, 2013; Glynn et al., 2018; Kaliush et al., 2023; Powers & Casey, 2015; Schulenberg et al., 2004). The method described in the current manuscript allows researchers to test hypotheses about whether these periods are sensitive periods for specific outcomes, or whether models including other temporal patterns of exposures may have a better fit.
The manuscript provides a tutorial on approaches to test hypotheses about temporal issues and sensitive periods. The models described in this tutorial allow for assessment of how associations may change over time. The analytic approach described in this tutorial is a modification of the multiple design matrix approach of Srivastava (Srivastava & Giles, 1987b). For well-understood relations, it is possible that the researcher can choose the correct design matrix a priori. In many cases, however, it is not fully understood how the longitudinal predictors and outcomes of interest are associated. In both hypothesis-driven and exploratory cases, it is important that the researcher consider what plausible relations might exist. While it is possible that hypotheses may be developed with a literature search, comparing the fit of a series of different design matrices provides an empirical and rigorous test of timing questions. As a result, we strongly suggest that researchers test all of the models even if they have an a priori hypothesis that one will be the best fit. Additionally, it is important that researchers choose measures that are appropriate and measurable during the timeframe of interest. For example, if a researcher were interested in examining processes that occur during a specific developmental period (e.g., puberty), it would not be appropriate to continue modeling outcomes after this specific period.
When characterizing relations at sensitive periods, questions are often asked about how secondary predictors may moderate or mediate the association. When these types of questions arise and the temporal structure of a relation is unknown, we recommend that the researcher consider a two-step approach. First, use the modeling framework presented above to define the best-fitting temporal model of the primary predictor and outcome. Then, consider hypotheses about moderation or mediation in a second model, which includes the moderator or mediator of interest. This approach was used in the example study when sex was examined as a moderator following the initial stage of model identification.
Repeated measures of an exposure variable are typically correlated. In practice, correlations among exposures always introduce complexity in selecting a model. Muller and Fetterman (2002) provided extensive discussions of detecting and treating very high correlations among groups of exposures, often described as collinearity, and model selection. Cheng and colleagues (2010) provide some guidance on dealing with the complexity in “building a good enough mixed model.” If scientific considerations lead to exposures with moderate to high collinearity, then extra care must be taken in selecting tests to make correct decisions. One appealing approach is a backwards-stepdown strategy (Cheng et al., 2010; Muller & Fetterman, 2002), with variables ordered by the principles of chronology and parsimony. Chronological order avoids anachronisms by requiring exposures occurring earlier in time to be listed earlier in the model. Parsimonious order requires, for example, main effects to be listed earlier in the model than interactions. A backwards approach begins by comparing the ultimate, most complex, model to the penultimate model, etc. The approach ensures collinearity is accurately adjusted for statistically and logically.
While the models and associated questions discussed in the current manuscript are quite common in developmental psychopathology and other fields, we do not cover an exhaustive list of potential hypotheses and models. There are a variety of other statistical models which can describe longitudinal series of exposures and outcomes. Cluster-based approaches group participants with similar exposure patterns, providing a categorical exposure that summarizes the longitudinal exposure (e.g., Nagin et al., 2018; Nagin, 2014; Nagin & Odgers, 2010; Nagin & Tremblay, 2001). Cumulative approaches model exposure as area-under-the-curve, which is computed as the average product of the length of exposure and the strength of exposure. By contrast, the all-times-before model explicitly models the strength of the association between an exposure and outcome across different points in time. Thus, the all-times-before model cannot be considered a model of cumulative exposure.
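To make the area-under-the-curve summary concrete: with exposures x_{i,j} measured at ages t_j, one common trapezoidal computation is

AUC_i = Σ_j (t_{j+1} − t_j) (x_{i,j} + x_{i,j+1}) / 2,

a single number per participant that weights the strength of each exposure by how long it lasted. The all-times-before model, by contrast, keeps a separate coefficient for each exposure occasion rather than collapsing the exposure history into one summary.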
This paper provides code written in SAS (SAS Institute Inc., 2016). The programming language was chosen because SAS allows hypothesis testing for the general linear mixed model using the Kenward-Roger corrections (Kenward & Roger, 1997, 2009). The Kenward-Roger corrections ensure accurate Type I error rates, and thus the replicability of data analysis. Researchers who prefer other programming languages can use our code as a template to produce similar models in their preferred language.
The methods used in the paper have some limitations. The manuscript considers a restricted class of temporal models. The manuscript describes four models, chosen to address common questions in developmental psychopathology (e.g., sensitive period, all-times-before, and immediately-before), and a series of hypothesis tests to compare them. The four models described share a common feature: the exposures are all measured before the outcome. For many pairs of exposures and outcomes, however, a model including both contemporaneous and antecedent measures makes sense. Examples include models of pollution and cardiac autonomic physiology (Parenteau et al., 2022), racism and mental health (Liu et al., 2023), and stress and parenting (Brown et al., 2020). For such pairs of exposures and outcomes, different models than the four considered in this paper apply, including the marching covariate and the at-or-before models (Srivastava & Giles, 1987b). The marching covariate model uses contemporaneous exposures and outcomes and allows evaluating the hypothesis that only synchronous measures of exposures and outcomes are important. The at-or-before model uses exposures that are measured at the same time as the outcome, or at any time before the outcome, and allows evaluating whether antecedent or contemporaneous exposures are associated with an outcome. However, for the questions discussed in the current manuscript, the restricted class of temporal models is appropriate.
Another limitation of the models described in this manuscript is that they focus on a single sensitive period, rather than multiple sensitive periods. Numerous developmental processes are theorized to have multiple sensitive periods, such as cognitive (Thompson & Steinbeis, 2020) and emotional development (Woodard & Pollak, 2020). Future work should build on the current models to test multiple sensitive periods.
The strength of the manuscript is that it provides detailed instructions for developmental psychopathology researchers to understand temporal relations. The techniques described in this manuscript could be used to identify the timing for averting adverse sequelae of developmental issues. A thorough understanding of sensitive periods and developmental trajectories is needed in order to create effective prevention and intervention programming (Cicchetti, 2015a; Davis et al., 2018; Deer et al., 2019; Granic, 2005; Hankin et al., 2023; Masten, 2006; Sroufe & Rutter, 1984). The hope is that this tutorial manuscript can add to the analytic repertoire of researchers in the field, and help researchers better understand approaches for repeated measures data.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0954579424001299.
Author contributions
LillyBelle K. Deer and Kylie K. Harrall contributed to the manuscript equally.
Funding statement
DHG, DD, KKH and KEM were supported by NIH/NIGMS funded grants 5R01GM121081-08, 3R25GM111901 and 3R25GM111901-04S1. LKD was supported by NHLBI grant F32HL165844. JRD was supported by NHLBI grant K01HL143159. JRD and EPD were supported by NHLBI grant R01HL155744 and EPD was supported by NIMH grant R01MH109662.
Competing interests
None.