1 Introduction
The goal of many large-scale studies in the social sciences is to investigate the effect of an intervention or policy on an outcome of interest, such as the effect of a new mathematics curriculum on teachers’ instructional practices or the effect of vaccination advertisements on increasing vaccination rates. Such studies often involve large samples, requiring significant resources, but they are notorious for finding null results in educational studies (Coalition for Evidence-Based Policy, 2013). Further many interventions are difficult to implement or modify in the real world.
Thus, it is important that researchers implementing these large-scale studies also examine the mechanisms that underlie the intervention. Social network data may be one of the first observed intermediate outcomes before observing the primary study outcome; for example, an intervention about bullying will likely first impact how individuals interact. Second, social networks may even mediate the impacts of an intervention on individual-level outcomes; a bullying intervention may impact network structure in ways that are more conducive to improving student sense of belonging.
Social network data, however, present unique methodological challenges, since most statistical models are not appropriate for network data. This is because network ties violate the assumption of independent observations assumed by these models. Whether or not person a is friends with person b depends on what other friendships exist among the individuals in the network, whether a and b have any friends in common, and even the size of the network. To accommodate the unique interdependence present in social network data, researchers have proposed and continue to propose social network models that predict network ties (e.g. van Duijn et al., Reference van Duijn, Snijders and Zijlstra2004; Wasserman & Pattison, Reference Wasserman and Pattison1996; Hoff et al., Reference Hoff, Raftery and Handcock2002; Snijders & Nowicki, Reference Snijders and Nowicki1997).
Although most network researchers have focused on methods for a single network, several social network models have been extended for use across multiple, independent networks; that is, multilevel extensions of these network models have been developed that can be used on network data collected across ensembles or groups of networks (e.g. Snijders & Kenny, Reference Snijders and Kenny1999; Templin et al., Reference Templin, Ho, Anderson and Wasserman2003; Zijlstra et al., Reference Zijlstra, van Duijn and Snijders2006; Sweet et al., Reference Sweet, Thomas and Junker2013; Koskinen et al., Reference Koskinen, Caimo and Lomi2015; Slaughter & Koehly, Reference Slaughter and Koehly2016; Lorenz et al., Reference Lorenz, Boda, Salikutluk and Jansen2020). In addition, methods now exist to examine network-level interventions (Valente, Reference Valente2012) and estimate treatment effects (e.g. Sweet et al., Reference Sweet, Thomas and Junker2013). Moreover, Sweet (Reference Sweet2019) proposed the idea of modeling a social network as a mediator and introduced a mixed membership model for mediation. This model assumes that an intervention impacts the subgroup structure of the networks which then impacts the outcome.
While we believe the overall idea in Sweet (Reference Sweet2019) is promising and the proposed model has a great potential to be useful in assessing network mediation, the model is limited in that it requires networks to have clique structure. Further, the mediating variable in this model is a measure of subgroup structure, which may not always be the intermediate outcome observed as result of an intervention. Thus, alternative models are needed.
We introduce the hierarchical latent space mediation model (HLSMM), for mediation, which uses a latent space modeling approach instead of a mixed membership approach. Our model also assumes an intervention on a sample of multiple, independent networks, such as student friendship networks across schools or collaboration networks across organizations. Our model assumes that an intervention affects the network structure in treated networks, and this structure in turn affects the variability in an outcome variable in those networks. One possible application is to model the process through which group consensus is established.
Before introducing our model, we discuss modeling samples or ensembles of networks and estimating intervention effects using latent space approaches. We then introduce our model, a HLSM for mediation, and discuss its utility and estimation. We then present some evidence of feasibility through a simulation study and real-world analysis of teacher advice-seeking data.
2 Modeling ensembles of networks
Researchers have attempted to accommodate tie interdependency by either accounting for tie dependence explicitly and including network statistics as part of the model (e.g. exponential random graph models (Robins et al., Reference Robins, Pattison, Kalish and Lusher2007)) or accounting for tie dependence implicitly through the use of latent variables (Dabbs et al., Reference Dabbs, Adhikari and Sweet2020). We will focus on a latent variable model called a latent space model (LSM; Hoff et al., Reference Hoff, Raftery and Handcock2002), which incorporates latent space positions in the network model to account for the interdependence of network ties.
For modeling a sample of networks, we can use a HLSM (Sweet et al., Reference Sweet, Thomas and Junker2013), a multilevel extension of LSM, although other multilevel extensions have been proposed for other network models (Snijders & Kenny, Reference Snijders and Kenny1999; Templin et al., Reference Templin, Ho, Anderson and Wasserman2003; Zijlstra et al., Reference Zijlstra, van Duijn and Snijders2006; Koskinen et al., Reference Koskinen, Caimo and Lomi2015; Slaughter & Koehly, Reference Slaughter and Koehly2016; Lorenz et al., Reference Lorenz, Boda, Salikutluk and Jansen2020).
A HLSM for an ensemble of binary networks denoted by adjacency matrix, $A_k$ , for network k is given as:
where $A_{ijk}$ denotes a tie between nodes i and j for network k, $X_{ijk}$ is an edge-level covariate, $\beta_0$ is an intercept, and $\beta_k$ is a q-dimensional vector of the associated coefficients. Parameters $\beta_k$ may vary across each network k. Without covariates, Equation (1) specifies the probability of a tie $p_{ijk}$ between two nodes i and j in a network k as a function of the distance between the d-dimensional latent space positions $Z_{ik}$ and $Z_{jk}$ . The latent position $Z_{ik}$ is assumed to have multivariate normal distribution with mean 0 and diagonal variance–covariance matrix $(\sigma_{Z_k}^2 {I}_d)$ , where ${I}_d$ is a d-dimensional identity matrix.
Networks with larger values of the variance of the latent position, $\sigma_{Z_k}^2$ , would generally have latent positions that are more dispersed than networks with smaller values of $\sigma_{Z_k}^2$ . Because node dispersion is generally related to increases in distance between pairs of nodes, $\sigma_{Z_k}^2$ is correlated with the density of the network. Given a collection of networks, it may be of interest to explore or even model the differences in the variance of the latent space positions across networks.
2.1 Modeling interventions with LSMs
Consider an intervention in which entire networks are assigned to treatment or condition. Because we will eventually model the effect of the intervention as a function of $\sigma_Z^2$ in Equation (1), we first demonstrate how $\sigma_{Z}^2$ impacts network structure.
Recall that in a LSM, each network is represented by relative positions of its nodes in a low-dimensional space and the probability of a tie is a function of the distance between the nodes. Thus, $\sigma_{Z}^2$ controls the spread or scale of the distance, and networks with smaller values of $\sigma_{Z}^2$ will have higher density than networks with larger values of $\sigma_{Z}^2$ , because smaller values of $\sigma_{Z}^2$ map to smaller pairwise distances and higher tie probabilities. In addition, we also expect to see an increase in the number of isolated nodes as $\sigma_{Z}^2$ increases.
However, unlike random graph models, LSMs are also known to preserve reciprocity and transitivity (Hoff et al., Reference Hoff, Raftery and Handcock2002). This is because, as long as the position of a node relative to the neighboring nodes is preserved, reciprocity and transitivity will also be preserved. Therefore, with the exception of extreme differences in $\sigma_{Z}^2$ , reciprocity and transitivity should be similar across values of $\sigma_{Z}^2$ .
For illustration, we simulated 100 networks for values of $\sigma_{Z}^2$ and plotted the median network statistic value (orange circle) along with 0.025 and 0.975 quantiles, indicated by the line segment. Figure 1 illustrates how the variance of the latent space positions in a LSM impacts density, variance in degree, the number of isolated nodes, and mean geodesic distance (among connected nodes). The plot showing geodesic distance is misleading. The distances are not actually decreasing with increasing $\sigma_{Z}^2$ ; rather, the number of isolated nodes who are not connected to other nodes is increasing, and these distances are not included.
Aside for extreme differences in $\sigma_{Z}^2$ , there is little impact on the levels of reciprocity and transitivity although we note that reciprocity and transitivity naturally increase as density increases. Here, we use $\sigma_{Z}^2$ values 2 to 10 to illustrate a wide range, but note that in Section 4.1, we consider networks generated from LSMs with expected values of $\sigma_{Z}^2$ of 2.7 in the control condition.
Thus, an intervention that impacts $\sigma_{Z}^2$ would change the overall density and the number of isolated individuals in the network, while preserving some standard network structures such as transitivity and reciprocity. For example, in an intervention focused on improving school climate, we may expect greater unity and consensus among its teachers, but we would not expect clique structure or reciprocity to change. Similarly, teachers who were isolated previously may be now be connected to someone in the network as a result of a decrease in $\sigma_{Z}^2$ .
3 A hierarchical latent space mediation model
Let us consider an example that will motivate our proposed model. Suppose, there is a cluster-randomized trial in which schools are randomly assigned to an intervention and that the intervention changes how teachers within each school interact. Let us also assume the goal of this intervention is to improve teacher buy-in about an instructional method. We hypothesize that treated schools have different types of interactions which result in a stronger consensus about that instructional method.
To fit a mediation model, a classic approach is to use the three regression equations shown in Equation (2) and made popular by Baron & Kenny (Reference Baron and Kenny1986). We present the Bayesian version of these equations where generally the last two lines are estimated, and the mediation or indirect effect is defined as $\beta_2\gamma$ (Yuan & MacKinnon, Reference Yuan and MacKinnon2009). The model is given as
where k indexes the observed unit of randomization, $Y_k$ denotes the outcome, $T_k$ denotes the treatment status, and $M_k$ denotes the mediator. Similarly, $\alpha_1$ , $\alpha_2$ , and $\alpha_3$ are intercept parameters, $\beta_1$ , $\beta_2$ , and $\beta_3$ are coefficients associated with treatment indicator $T_k$ , and $\gamma$ is the coefficient associated with mediator $M_k$ .
In our context, our observed units are the networks, and our outcome of interest is teacher consensus, measured as the variability in opinions about this instructional method. Further, we assume that the intervention is focused on bringing teachers together and increasing collaboration. We present this scenario as a Bayesian mediation model where we replace $M_k$ with a parameter from a HLSM for network k and incorporate the last two lines of Equation (2).
Thus, the hierarchical latent space mediation model (HLSMM) model is given as:
where the treatment indicator is still $T_k$ ; i and j index the individuals within each network and k indexes the network. For any network, $A_k$ is generated from a LSM such that the variance–covariance matrix of the latent space positions is $\sigma^2_{Z_k}{I}_d$ . Note that pretreatment confounders can be included in the model as well.
We can also represent this model as the directed acyclic graph given in Figure 2; the mediation analysis is illustrated among treatment T, the mediator $\log(\sigma_{Z})$ , and the outcome variable $\log(\sigma_{Y})$ . Compared to Equation (2), the mediation analysis is also tied to network model such that a network tie from node i to node j in network k ( $A_{ijk}$ ) is a function of latent space positions $Z_{ik}$ and $Z_{jk}$ with variance $\sigma^2_{Z_k}$ .
Note that Figure 2 and Equation (3) denote the impact of an independent variable and mediating variable on the variability of some outcome $\log(\sigma_Y)$ , but other outcome variables could be used. Because the mediator is the log standard deviation of the latent space positions, a natural outcome is some measure of consensus or the log standard deviation in the outcome variable among individuals in each network. Also note that Equation (3) is parameterized such that the outcome and mediator both have normal distributions, similar to Equation (2), but we could parameterize this model so that $\sigma_Z$ and $\sigma_Y$ have lognormal distributions.
Returning to our contextual example, if the intervention is effective, the treated networks will have a smaller latent position variance. Decreasing the log standard deviation of the latent space positions, $\log(\sigma_{Z})$ , essentially decreases the number of isolated nodes and increases the overall density of the network. Not having any isolated nodes and increasing the number of ties would therefore result in higher levels of connection in the network. Thus, we hypothesize that well-connected networks have higher levels of consensus than networks with sparse connections.
3.1 Estimation
The HLSMMs in this manuscript were fit using a Markov chain Monte Carlo (MCMC) algorithm (Gelman et al., Reference Gelman, Carlin, Stern and Rubin2013), and we used a combination of Metropolis-Hastings and Metropolis updates within Gibbs sampling. The Metropolis/Gibbs algorithm is standard and allows any reader familiar with MCMC to understand the estimation process as well as augment the code to expand or adapt the proposed model for their purposes. However, the HLSMM can be fit using a variety of MCMC algorithms as well as posterior approximation methods such as variational inference.
Moreover, the MCMC algorithm combines the network portion of the model and the mediation model and updates all parameters simultaneously. That is, the latent positions given the observed networks are updated iteratively alongside the regression coefficients of the mediation portion of the model. Fitting the HLSMM in this way accounts for the estimation error in the latent positions when estimating the regression coefficients used to calculate the indirect effect.
Given the normal distributions specified in the model for $\log(\sigma_{Z})$ and $\log(\sigma_{Y})$ , we used Gibbs updates for $\gamma_0$ , $\gamma_1$ , $\theta_0$ , $\theta_1$ , and $\nu$ , using normal distributions as conjugate priors. Similarly, for variance hyper-parameters $\tau_{Z}$ and $\tau_Y$ , we employed conjugate inverse-Gamma distributions so that they were also updated via Gibbs. The latent space positions as well as the variance of the latent space positions (reparametrized as $\log(\sigma_Z)$ ) were each updated using a random walk Metropolis-Hastings update. We used conjugate priors when possible, but other prior distributions can be used. For example, half-Cauchy distributions could replace inverse-Gamma distributions for the variance updates.
Note that LSMs have two identifiability issues. The first is that there are an infinite number of latent space position configurations possible due to translation, reflection, and rotation. A procrustes transformation as used in Hoff et al. (Reference Hoff, Raftery and Handcock2002) can be applied to obtain estimates of the latent space positions with a stable configuration. Note that this transformation can be applied post-sampling as convergence can be obtained without it. The second identifiability issue is discussed less often but has larger impacts on the HLSMM parameters. There is an identifiability between $\beta_0$ , which can be thought of as an intercept and related to the overall probability of a tie, and the variance of the latent space positions $\sigma^2_{Z_k}$ , which also impacts tie probability through the dispersion of nodes in the latent space. We recommend fixing $\beta_0$ when estimating $\sigma^2_{Z_k}$ is of particular importance.
4 Empirical examples
As a proof of concept, we explore some of the operating characteristics and demonstrate the feasibility of the HLSMM with two empirical examples. The first is a series of simulations to help readers connect mediation parameter values to variance parameters and network structures. Through these simulations, we also examine the conditions under which the proposed model can be most useful. The second example is a real-world application of the HLSMM that involves advice-seeking networks among school staff at 14 schools in 2010 and 2015.
4.1 Simulation study
The purpose of this small simulation study is to examine the influence of various parameters on both parameter recovery and model utility. We consider nine different combinations of parameters $\gamma_1$ and $\theta_1$ in Equation (3). Recall that $\gamma_1$ is the effect of the intervention T on the log standard deviation of the latent space positions $\log(\sigma_{Z})$ , so we explore settings where the intervention has three different effects on this measure of variability. Similarly, $\theta_1$ is the conditional effect of $\log(\sigma_{Z})$ on $\log(\sigma_{Y})$ given T, and we consider three different values for this parameter as well.
We selected values of $\gamma_1$ that resulted in differences in $\log(\sigma_Z)$ that produced visually distinct but still realistic networks and values of $\theta_1$ that yielded analogous values for $\log(\sigma_Y)$ . Note that negative values of $\gamma_1$ result in smaller variance of latent space positions. Our positive value of $\theta_1$ maps to a positive relationship between latent space variability $\log(\sigma_Z)$ and variability in the outcome $\log(\sigma_Y)$ .
Table 1 summarizes the nine cells of our simulation, and each cell represents one combination of values for $\gamma_1$ and $\theta_1$ . We also include sample variances (across replications) to illustrate the indirect and direct effects on the outcome; note that $\sigma^2_{Z}$ and $\sigma^2_{Y}$ are given instead of $ \log(\sigma_{Y})$ and $\log(\sigma_{Y})$ for ease of interpretation. The full data generating model used is given as:
where k indexes the 30 networks each with 20 nodes. The independent variable $T_k$ indicates whether network k is in the treated group such that 15 networks are treated and 15 networks are not.
In Equation (4), the expected $\log(\sigma_Z)$ for control networks and treated networks is defined as $\gamma_0$ and $\gamma_0+ \gamma_1$ , respectively. Thus, in selecting values for each, our goal was to generate networks that reflected real-world data. The expected $\sigma^2_Z$ under the control condition is approximately 2.72 and under the three treated conditions ( $\gamma_1 = -0.2, -0.4, -0.6$ ) is 1.82, 1.22, and 0.82 respectively.
Figure 3 shows 30 networks generated from Equation (4) where $\gamma_0=0.5$ and $\gamma_1=-0.4$ . The control networks are shown on the top three rows of plots and the treated networks on the bottom three rows. Overall, the control networks tend to have more isolated nodes and are less dense than the treated networks. Thus, $\gamma_1=-0.2$ would result in less noticeable differences between the conditions, and $\gamma_1=-0.6$ would result in more noticeable differences between conditions.
We selected values for $\theta_1$ in a similar manner aiming for $\sigma_Y$ to be distinct enough to detect differences. Again, we selected error variances ( $\tau=0.09$ ) to observe adequate separation of distributions of $\log(\sigma_{Y_k})$ to detect effects. Finally, we also chose to include a nonzero value for $\nu=-0.1$ to indicate a partial mediation. Figure 4 shows a plot of $\log(\sigma_{Y_k})$ against $\log(\sigma_{Z_k})$ for the 30 networks generated from Equation (4) where $\gamma_0=0.5$ , $\gamma_1=-0.4$ , $\theta_0=-0.45$ , $\theta_1=0.9$ , and $\nu=0.1$ . There is evidence of a positive effect of $\log(\sigma_{Z})$ on $\log(\sigma_{Y})$ .
Within each simulation condition given in Table 1, we simulated data from a HLSMM generating model, Equation (4), and fit a HLSMM, Equation (3), using the following priors:
We selected weakly informative priors for $\gamma_0$ and $\theta_0$ and chose more informative priors for $\gamma_1$ , $\theta_1$ , and $\nu$ and error variances $\tau_Z$ and $\tau_W$ . (See the Appendix for a prior sensitivity analysis.) For each model fit, we ran three chains of length 10,000; we removed the first 1,000 draws and thinned each chain by 35 to obtain a final posterior sample of 774; this length and thinning provided adequate convergence and autocorrelation across all parameters. This process was repeated 100 times for each cell of the simulation (Table 1).
To evaluate model performance under each setting, we report the coverage rate, defined as the proportion of replications in which the 95% equal-tailed credible interval covered the data generating parameter. In addition, we also report the posterior probabilities that the mediation effect $\gamma_1\theta_1<0$ . Note that the mediation effect is negative for all cells of the simulation.
Figure 5 summarizes the posterior distributions of the mediation effect (indirect effect) $\gamma_1\theta_1$ of the 100 replications across all 9 conditions. The circles represent the maximum a posteriori (MAP) estimates, and the bars indicate the 95% highest posterior density (HPD) credible intervals. The coverage rates for the mediation effect are between 0.91 and 0.98, indicating that the indirect effect coverage is as expected. In addition, parameter coverage for all parameters estimated in the model range from 0.91 to 1.00. Furthermore, we note that the posterior variance for $\gamma_1\theta_1$ increases as the effect sizes of $\gamma_1$ and $\theta_1$ increase.
We also examine the posterior probability of the indirect effect being less than 0. We find that as the true indirect effect becomes more negative (a stronger effect), the posterior probability increases. It appears that changing the effect sizes of $\gamma_1$ has a larger impact than $\theta_1$ on these posterior probabilities. The exact probabilities are given in Table 2.
4.2 Application to teacher networks
We now present an application of the HLSMM to real-world data. These data come from a longitudinal study of workplace and social interactions among school staff that are part of the Distributed Leadership Studies at Northwestern University (distributedleadership.org). These data represent language arts advice-seeking relationships among school staff in 14 elementary schools in 2010 and 2015 in a suburban school district in the midwestern part of the US. There were major policy changes during this time including a new mathematics curriculum, the use of mathematics instructional coaches for teachers, and high stakes testing in mathematics.
While advice-seeking ties around mathematics have been studied in detail (Spillane et al., Reference Spillane, Shirrell and Sweet2017; Spillane et al., Reference Spillane, Hopkins and Sweet2015; Spillane et al., Reference Spillane, Hopkins and Sweet2018), language arts advice-seeking ties have not. We aim to examine the effects of the policy changes in mathematics on the language arts advice-seeking networks, and whether these networks then mediated consensus about teachers’ influence on school policy.
School staff in these 14 schools were asked “During this school year, to whom have you turned for advice and/or information about curriculum, teaching, and student learning?” For each person nominated, respondents then classified the connection as being about language arts, math, or science. The language arts networks for the years 2010 and 2015 are shown in Figures 6 and 7, respectively. Note that school staff includes teachers and administrators as well as instructional support teachers (math coaches) but does not include all staff employed by the school district such as nurses, administrative assistants, custodians, or para-educators.
Response rates varied by school but on average were approximately 85%. Respondents who did not submit a survey were excluded from the network unless they were nominated by at least two other staff members, and these individuals were assumed to have no outgoing ties. Further, staff members who took the survey but did not nominate anyone remained in the networks as having no outgoing ties.
As part of this study, school staff were asked a number of items about mathematics and mathematics pedagogy, but there were a few survey items that were not subject-specific. One is a series of items asking respondents “How much influence do teachers have over school policy in each of the areas below?” and the areas listed were as follows: hiring professional staff; planning how discretionary funds should be used; determining which books and instructional materials are used in classrooms; establishing the curriculum and instruction program; determining the content of in-service programs; setting standards for student behavior; and determining goals for improving the school.
To demonstrate how the HLSMM could be used in practice, we examine the extent to which language arts advice-seeking networks mediated the effect of the district changes on consensus about teacher’s influence on school policy. The belief is that structural differences in the language arts networks in 2010 compared to 2015 may impact differences in school staff consensus about teacher influence over policy (in 2010 compared to 2015).
The HLSMM fit to these data is given as:
where $A_{ijk}=1$ if teacher i seeks language arts advice from teacher j in school k and 0 otherwise, $T_k=1$ if the network is from 2015 and 0 otherwise, and $\sigma_{Y_{k}}$ is the standard deviation in teacher influence scores for school k.
Because our model hinges on accurate estimation of $\sigma_{Z_k}$ , we chose to fix $\beta_0$ . To fit the HLSMM, we ran a single chain of length 200,000. We broke the chain into three pieces to assess convergence; the potential scale reduction factor for the mediation effect was 1 with an upper confidence bound of 1.01, indicating adequate mixing. After tossing the first 1000 steps and thinning by every 175th step, our posterior sample size was 1138.
The posterior distributions for $\gamma_1$ , $\theta_1$ , $\nu$ , and $\gamma_1\theta_1$ are shown in Figure 8, where the HPD credible intervals are indicated by the shaded region. There is evidence of an intervention effect $\gamma_1$ on the $\log(\sigma_Z)$ , and the posterior probability that $\gamma_1>0$ is 0.94. The evidence of a mediated or indirect effect of $\log(\sigma_Z)$ is weak; the posterior probability that $\gamma_1\theta_1<0$ is 0.71, which is not statistically significant from a frequentist perspective that offers little evidence of a mediated effect. Further, the posterior probability that $\nu<0$ is 0.93 which suggests some evidence of a direct effect.
Thus, we conclude that there is evidence that the networks in 2015 compared to those in 2010 have a larger latent space variance in a LSM. There is some evidence of a direct effect of policy changes on staff consensus about their influence, but little evidence that the spread of the latent space positions mediated the effect of policy changes on teacher consensus.
This example serves as an illustration of our model as opposed to a formal analysis of these networks. Our evidence was quite weak, but even if our finding for a mediated effect could be considered statistically significant, there were too many limitations for these results to be used in practice. We are comparing networks in 2010 with networks in 2015, and there may be other causes for differences in network structure over the years than just the district-led policy changes. Similarly, there may be other variables that impact differences in teacher consensus.
As an illustration, however, we can continue with this real-world data analysis to examine goodness of fit. Note that we did fix $\beta_0$ as there is potentially an identifiability issue between $\beta_0$ and $\sigma_{Z_k}$ . We selected several possible values for $\beta_0 = (-3,-2,-1, 0, 1, 2,3)$ . If we fixed the intercept at a value that was too small, the overall probability of tie would be not be well recovered because the model is defined by subtracting distance between the latent positions from the intercept. Subtracting a number from a small number would yield results in fitted model that predicts extremely few ties. Similarly, if we selected an intercept that was too large, the latent positions would need to be quite far and it is possible that our model would produce networks dissimilar to our observed data. We hypothesized that $\beta_0=1$ would be ideal and decided to rely on goodness-of-fit checks to confirm.
We use posterior predictive checks and generate network data from the parameters specified at each step in our MCMC chain. We examine goodness of fit in two stages: we first investigate the network portion of the model and then we examine the regression portion of the model. For the network model fit, we use the latent space positions estimated at each step of the chain to generate 28 networks and evaluate our model fit using density, reciprocity, and transitivity in each of the 28 networks.
Figure 9 shows the distributions of density, reciprocity, and transitivity for each network generated from parameters at each step in the MCMC chain. The observed network statistic for each network are shown with a vertical line. In general, density, reciprocity, and transitivity are all well recovered, indicating that the LSM was able to capture these aspects of the observed network.
Note that posterior predictive checks using values of $\beta_0 =-1, -2$ resulted in models that generated networks with lower densities, reciprocity, and transitivity values than in our observed network; similarly values of $\beta_0=3$ produced networks that tended to generate networks higher in reciprocity and transitivity than in our observed data. Using $\beta_0=0$ or $\beta_0=2$ did not produce extremely different networks, although reciprocity recovery was slightly lower for $\beta_0=0$ .
For the regression portion of the model fit, we took the parameters at each step of the MCMC chain to generate outcome data $\log \sigma_{Y_k}$ . We show the distribution of our predicted outcome data with the observed data in Figure 10. The posterior predictive checks suggest the outcome data were well recovered.
Thus, we find the HLSMM fitted to the language arts advice-seeking network data to be adequate because the HLSMM generates data quite similar to our observed data. Despite the lack of evidence of mediation, we believe this example illustrates the feasibility of our model and demonstrates how posterior predictive checks can be used to assess model fit.
5 Discussion
To contribute to the methodological research exploring how social networks could act as mediators in interventions on networks, we introduced the HLSMM that embeds a LSM into a mediation analysis.
Our model is specifically aimed at interventions that increase or decrease connectedness (average centrality and density) and whose network structure is believed to impact a network-level outcome. The model we presented uses the variance of the latent positions in a network model to mediate the effect of a network-level treatment on the variance of an outcome.
We also illustrated the feasibility of our model through a small simulation study showing that our model recovers parameters and can estimate mediation effects. We also presented a real-world example of a natural experiment on networks. Although we did not find strong evidence of a mediation effect, this example showed how this model can be applied to data and how posterior predictive checks can be used for assessing goodness of fit.
We note that many mediation analyses are aimed at estimating a causal effect. We argue that the HLSMM could be used in a causal mediation analysis in the same way that the mediation model put forth by Baron & Kenny (Reference Baron and Kenny1986) can be used in a causal mediation analysis. Causal inference requires a series of assumptions, and our model’s ability to meet these assumptions is context- and data-dependent.
Further, this model does not account or accommodate within-network interactions, otherwise known as network interference or peer effects. How individuals within each network are influencing each other and their outcomes is not being modeled; this is a limitation of this model and other multilevel network models. Rather this model is looking at a mediation effect on a network-level outcome.
Thus, a natural next step in this work is to examine how network-level interventions and mediating variables could impact node-level outcomes. Being able to map changes across and within networks will help to improve our understanding of how individual outcomes are shaped through interactions. We imagine extensions to this model would then account for network interference/peer influence and align with current research in that area (e.g. An, Reference An2018; VanderWeele & An, Reference VanderWeele, An and Morgan2013; Ogburn, VanderWeele et al., Reference Ogburn and VanderWeele2017; Manski, Reference Manski2000).
Acknowledgments
We thank Jim Spillane and the Distributed Leadership Studies at Northwestern University (http://www.distributedleadership.org), including the NebraskaMATH study, for the data used in this paper.
Competing interests
None.
Appendix
Prior sensitivity analysis
We conducted a prior sensitivity analysis because prior distribution specification may be challenging with Bayesian models. We considered four different prior distributions as detailed in Table 3 where the first row is considered to be standard. We then vary the priors for the other parameters to be more or less informative.
For each replication of our sensitivity analysis, we changed the prior distribution for a single parameter and kept the priors for all other parameters to be those given in Table 3. For example, when exploring the priors for $\gamma_0$ , the priors for $\gamma_1$ , $\theta_0$ , $\theta_1$ , $\nu$ , $\tau_Z$ , $\tau_Y$ did not change and remained fixed to the distribution given in the first row of Table 3. Thus, our sensitivity analysis had 28 cells, and for each cell in the table, we ran 10 replications.
The data generating model is given as:
where $k=1,\ldots,30$ indexes the network, and $i,j=1\ldots,20 index the nodes$ . $X_k$ is a binary vector of length 30.
The prior distribution $\gamma_0$ , $\gamma_1$ , $\theta_0$ , $\theta_1$ , and $\nu$ had little to no impact on the parameter recovery of that parameter or on the parameter recovery of any other parameters in the model. The prior distributions of $\tau_Z$ and $\tau_Y$ did parameter recovery of $\tau_Z$ and $\tau_Y$ respectively; priors whose distributions that gave non-trivial weight to extremely large values resulted in more extreme values in the posterior distributions of $\tau_Z$ and $\tau_Y$ and worse parameter recovery.
Similarly, prior distributions for $\tau_Z$ and $\tau_Y$ also had varying impacts on the posterior variance for other parameters. When IG(1,1) or IG(1,2) was used for $\tau_Z$ , the posterior variances for $\gamma_0$ and $\gamma_1$ were slightly larger, but when IG(1,1) or IG(1,2) was used for $\tau_Y$ , the posterior variances for $\theta_0$ , $\theta_1$ , and $\nu$ were between 3 and 5 times higher.
Despite some effects of the choice prior distribution, the posterior distribution of the mediation effect was generally robust to choice of prior, and only the posterior variance was impacted when extreme prior distributions were used for variance components. This does impact the utility of the model in that users must be thoughtful regarding their choice of prior distribution for the $\tau_Y$ .