
Learning Bayesian Networks: A Copula Approach for Mixed-Type Data

Published online by Cambridge University Press: 27 December 2024

Federico Castelletti*
Affiliation: Università Cattolica del Sacro Cuore
*Correspondence should be made to Federico Castelletti, Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy. Email: [email protected]

Abstract

Estimating dependence relationships between variables is a crucial issue in many applied domains, and in particular in psychology. When several variables are entertained, these can be organized into a network which encodes their set of conditional dependence relations. Typically, however, the underlying network structure is completely unknown or can be only partially specified; accordingly, it should be learned from the available data, a process known as structure learning. In addition, data arising from social and psychological studies are often of different types, as they can include categorical, discrete and continuous measurements. In this paper, we develop a novel Bayesian methodology for structure learning of directed networks which applies to mixed data, i.e., possibly containing continuous, discrete, ordinal and binary variables simultaneously. Whenever available, our method can easily incorporate known dependence structures among variables, represented by paths or edge directions that can be postulated in advance based on the specific problem under consideration. We evaluate the proposed method through extensive simulation studies, with appreciable performance in comparison with current state-of-the-art alternative methods. Finally, we apply our methodology to well-being data from a social survey promoted by the United Nations, and to mental health data collected from a cohort of medical students. R code implementing the proposed methodology is available at https://github.com/FedeCastelletti/bayes_networks_mixed_data.

Type: Theory & Methods
Copyright: © 2024 The Author(s), under exclusive licence to The Psychometric Society

1. Introduction

1.1. Background and Motivation

Learning dependence relations between variables is a pervasive issue in many applied domains, such as biology, social sciences, and notably psychology (Briganti et al., 2022; Isvoranu et al., 2022). In the latter context, the recent field of network psychometrics considers a network-based approach to represent psychological constructs and understand directed interactions between behavioral, cognitive and biological factors, possibly allowing for causal interpretations (Borsboom et al., 2021). The 2022 Psychometrika special issue "Network Psychometrics in action" promoted the development of statistical methods for network modeling motivated by psychological problems, and collected several contributions to the field, covering both methodological and applied aspects (Marsman and Rhemtulla, 2022).

Early works in the network psychometrics area were conceived to support psychologists in providing insights into various psychological phenomena, such as those at the basis of psychopathology, and in particular the study of comorbidity and mental disorders (Borsboom, 2008; Cramer et al., 2010). Typical research questions thus relate to the identification of direct dependencies between manifest variables, differences in the underlying dependence structure across available groups of patients, or even the design of clinical interventions based on an estimated network. Crucial to these purposes is the development of models that make it possible to infer a plausible network structure from the available data and to provide a coherent quantification of the uncertainty related to directed links, specific paths or the whole network structure. In this regard, methodologies that fully account for network uncertainty lead to parameter estimates that are more robust w.r.t. possible network-model misspecifications; see Haslbeck and Waldorp (2018), Epskamp et al. (2017) and Marsman et al. (2022) for recent contributions in this area inspired by psychological problems.

All of the issues introduced above have motivated the development of dedicated statistical methodologies, based on both frequentist and Bayesian paradigms. In particular, probabilistic graphical models, based on either undirected or directed networks, provide an effective tool to infer conditional dependence relations from the data (Cowell et al., 1999; Edwards, 2000). Additionally, directed acyclic graphs (DAGs) offer a powerful framework for causal reasoning, even from observational, namely non-experimental, studies, and specifically to quantify the effects of hypothetical interventions on target variables w.r.t. outcome responses of interest; see Pearl (2000) for a general introduction to causal inference based on DAGs and Maathuis and Nandy (2016) for a review. The next section offers an overview of the main recent contributions to graphical modeling.

1.2. Literature Review

From a statistical perspective, learning a network of dependencies from the data is a model selection problem also known as structure learning. Several related methodologies that can deal with Gaussian and categorical data separately have been proposed. Specifically, score-based methods implement score functions for network estimation, such as those based on penalized maximum likelihood estimators (Friedman et al., 2008; Meinshausen and Bühlmann, 2006), or on marginal likelihoods for methodologies following a Bayesian perspective (Chickering, 2002; Heckerman et al., 1995). Moreover, constraint-based methods implement conditional independence tests to learn the set of (in)dependence constraints characterizing the underlying DAG structure, as in the popular PC algorithm (Kalisch and Bühlmann, 2007; Spirtes et al., 2000). On the other hand, Bayesian methodologies adopt Markov Chain Monte Carlo (MCMC) methods to approximate a posterior distribution over the space of network structures, or over related features of interest; see, for instance, Castelletti et al. (2018) and Castelletti and Peluso (2021) for Gaussian and categorical settings, respectively, and Ni et al. (2022) for a recent overview of Bayesian methods for structure learning with applications to biological problems.

Mixed-type data, i.e., observations from variables of different parametric families, are very common in many contexts and especially in psychological studies, where ordinal, discrete and continuous measurements are simultaneously collected on subjects. In particular, dealing with ordinal measurements within multivariate models is a pervasive issue in psychometrics, as clinical psychological data typically include polytomous items measured on ordinal or Likert-type scales. In this context, item response theory (IRT) based on factor analysis assumes that ordered-categorical responses are discrete representations of continuous latent scores. From a modeling perspective, observed categorical data are then assumed to be generated by discretization of latent data, as in the classical probit model and its extensions to ordinal responses. See in particular Wirth and Edwards (2007) for a review of item factor analysis within a structural equation modeling framework.

A few methodologies for structure learning from mixed data have been proposed. Harris and Drton (2013) introduce the Rank PC, an extension of the original PC algorithm to nonparanormal models, namely those based on a semi-parametric latent Gaussian copula model with purely continuous marginal distributions. Moreover, Cui et al. (2016) propose the Copula PC, an adaptation of the PC algorithm to a mixture of discrete and continuous data assumed to be drawn from a Gaussian copula model. Cui et al. (2018) extend the previous method to deal with data that are missing at random. Similar ideas, for the case of undirected graphs, are also considered by Müller and Czado (2019) and He et al. (2017). Still in the context of directed graphs, a more recent methodology for structure learning given both categorical and Gaussian data is proposed by Andrews et al. (2018). The authors introduce a mixed-variable polynomial score based on the notion of Conditional Gaussian (CG) distribution (Lauritzen and Wermuth, 1989), later extended to a highly scalable algorithm by Andrews et al. (2019). Conditional Gaussian distributions are also adopted for structure learning of undirected graphs by Lee and Hastie (2013) and Cheng et al. (2017), who implement penalized likelihoods and regression models with weighted lasso penalties, respectively. In a Bayesian setting, Bhadra et al. (2018) propose a unified framework for both categorical and Gaussian data based on Gaussian scale mixtures.

One main difficulty in developing statistical models for general mixed-type data is related to the non-standard joint support of the available variables. Typically, however, interest lies in estimating dependence parameters of the joint distribution, corresponding to a network structure or correlation-type measures, rather than parameters indexing the marginal distributions of the variables. In this context, copula models, which allow the two sets of parameters to be modeled separately, can provide an effective solution for statistical inference of network models. In addition, semiparametric copula models lack any parametric assumption on the marginal c.d.f.'s, which are estimated through their empirical distributions (Hoff, 2007). Contributions to copula graphical modeling based on undirected graphs are provided by Dobra and Lenkoski (2011) and Mohammadi et al. (2017).

1.3. Contribution and Structure of the Paper

We propose a novel methodology for structure learning of networks which applies to mixed data, i.e., data comprising continuous and categorical as well as discrete and ordinal measurements. Specifically, we consider a Gaussian copula DAG model in which each observable is associated with a latent counterpart, and the dependence parameter (covariance matrix) among latent variables reflects the conditional independencies imposed by a directed acyclic graph. We consider a Bayesian framework and proceed by assigning suitable prior distributions to DAG structures and DAG-dependent parameters. Inference is carried out by implementing an MCMC scheme which approximates the posterior distribution over network structures and covariance matrices. The main contributions of the proposed method can be summarized as follows: (i) we introduce a Bayesian framework for the analysis of complex dependence relations in multivariate settings characterized by mixed data; (ii) we provide a coherent quantification of the uncertainty around the estimated network or features of interest, such as directed links, together with a full posterior distribution of the underlying dependence parameter (correlation matrix), possibly summarized by Bayesian Model Averaging (BMA) estimates; (iii) our model makes it possible to incorporate prior knowledge of the underlying network, in terms of a partial ordering of the variables or edge orientations that are known in advance, thus improving DAG identification and enhancing causal inference.

The rest of the paper is organized as follows. In Sect. 2, we introduce Gaussian graphical models based on DAGs and the copula DAG model that we adopt for the analysis of mixed data. Section 3 completes our Bayesian model formulation by assigning prior distributions to DAG structures and DAG-model parameters. We implement in Sect. 4 an MCMC scheme which approximates the posterior distribution of DAGs and parameters. Our method is evaluated through extensive simulation experiments in Sect. 5. Section 6 is devoted to empirical studies, including the analysis of well-being data from a social survey promoted by the United Nations and mental health data collected from a cohort of medical students. Finally, in Sect. 7 we provide a discussion together with possible extensions of the proposed method to heterogeneous settings and latent trait models. Additional simulation results, comparisons with alternative methods and examples of MCMC convergence diagnostics are included in the Appendix.

2. Model Specification

2.1. Directed Acyclic Graphs

A Directed Acyclic Graph (DAG) is a pair $\mathcal{D}=(V,E)$ consisting of a set of vertices (or nodes) $V=\{1,\ldots,q\}$ and a set of directed edges $E \subseteq V \times V$. For any two nodes $u,v \in V$, we denote an edge from $u$ to $v$ as $(u,v)$ or $u\rightarrow v$ indifferently; also, the set $E$ is such that if $(u,v)\in E$ then $(v,u)\notin E$. A sequence of nodes $(v_1,v_2,\ldots,v_k)$ is a path if there exists $v_1\rightarrow v_2\rightarrow \cdots \rightarrow v_k$ in $\mathcal{D}$. We assume that $\mathcal{D}$ does not contain cycles, that is, paths such that $v_1\equiv v_k$.
For a given node $v\in V$ we let $\mathrm{pa}_{\mathcal{D}}(v)$ be the set of parents of $v$ in $\mathcal{D}$, i.e., the set of all nodes $u$ such that $(u,v)\in E$. Moreover, we say that $u$ is a descendant of $v$ if there exists a path from $v$ to $u$; conversely, $v$ is an ancestor of $u$. The sets of all descendants and ancestors of a node $v$ in $\mathcal{D}$ are denoted $\mathrm{de}_{\mathcal{D}}(v)$ and $\mathrm{an}_{\mathcal{D}}(v)$, respectively.

A DAG encodes a set of conditional independencies of the form $A \perp\!\!\!\perp B \mid C$, reading as "$A$ and $B$ are conditionally independent given $C$", where $A$, $B$, $C$ are disjoint subsets of the vertex set $V$. The set of all conditional independencies characterizing the DAG determines the DAG Markov property and can be read off from the graph using graphical criteria such as d-separation (Pearl, 2000). In particular, each node is conditionally independent of its non-descendants given its parents. Simple examples are provided in Fig. 1, where using d-separation it can be shown that $u \perp\!\!\!\perp z \mid v$ in both $\mathcal{D}_1$ and $\mathcal{D}_2$; differently, $u \perp\!\!\!\perp z$ in $\mathcal{D}_3$, meaning that $u$ and $z$ are marginally independent. We refer the reader to Lauritzen (1996) for further notions on graph theory.

Figure 1. Three DAGs on the set of nodes $V=\{u,v,z\}$. $\mathcal{D}_1$ and $\mathcal{D}_2$ encode the conditional independence $u \perp\!\!\!\perp z \mid v$. In $\mathcal{D}_3$ we instead have $u \perp\!\!\!\perp z$.
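The graph-theoretic notions above translate directly into code. The following minimal Python sketch (our own illustration, not the authors' R implementation; the edge set encodes DAG $\mathcal{D}_1$ of Fig. 1) stores a DAG as a set of directed edges and recovers the parent, descendant and ancestor sets:

```python
from collections import defaultdict

# DAG D_1 of Fig. 1 (u -> v -> z), stored as a set of directed edges (u, v).
edges = {("u", "v"), ("v", "z")}

def parents(v, E):
    """pa_D(v): all nodes u with an edge u -> v."""
    return {u for (u, w) in E if w == v}

def descendants(v, E):
    """de_D(v): all nodes reachable from v along a directed path."""
    children = defaultdict(set)
    for (u, w) in E:
        children[u].add(w)
    seen, stack = set(), [v]
    while stack:
        for w in children[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def ancestors(v, E):
    """an_D(v): all nodes u such that v is reachable from u."""
    nodes = {x for e in E for x in e}
    return {u for u in nodes if v in descendants(u, E)}

# Acyclicity: no node is its own descendant.
assert all(v not in descendants(v, edges) for v in {"u", "v", "z"})
```

For instance, `parents("z", edges)` returns `{"v"}` and `ancestors("z", edges)` returns `{"u", "v"}`, matching the fact that in $\mathcal{D}_1$ node $z$ has parent $v$ and ancestors $u$ and $v$.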

2.2. Gaussian DAG Models

Let $\mathcal{D}=(V,E)$ be a DAG and $\boldsymbol{z}=(Z_1,\ldots,Z_q)^{\top}$ a collection of $q$ real-valued random variables, each associated with a node in $\mathcal{D}$ and with joint p.d.f. $f(\cdot)$. We assume that

(1) $$Z_1,\ldots,Z_q \mid \boldsymbol{\Omega},\mathcal{D} \sim \mathcal{N}_q\!\left(\boldsymbol{0}, \boldsymbol{\Omega}^{-1}\right), \quad \boldsymbol{\Omega}\in \mathcal{P}_{\mathcal{D}},$$

where $\boldsymbol{\Omega}$ is the precision matrix (the inverse of the covariance matrix $\boldsymbol{\Sigma}$) and $\mathcal{P}_{\mathcal{D}}$ denotes the set of all symmetric positive definite (s.p.d.) precision matrices Markov w.r.t. DAG $\mathcal{D}$. Accordingly, we impose on $\boldsymbol{\Omega}$ the conditional independencies encoded by $\mathcal{D}$ that are deducible from d-separation (Sect. 2.1).

An equivalent representation of Model (1), useful for later developments, is given by the allied Structural Equation Model (SEM). To this end, let $\boldsymbol{D}=\mathrm{diag}(\boldsymbol{D}_{11},\ldots,\boldsymbol{D}_{qq})$ and let $\boldsymbol{L}$ be a $(q,q)$ matrix of (regression) coefficients with diagonal elements equal to 1 and $(u,v)$-element $\boldsymbol{L}_{u,v}\ne 0$ if and only if $u\rightarrow v$ in $\mathcal{D}$. Nonzero elements of $\boldsymbol{L}$ correspond to directed links between nodes, while zero entries correspond to missing edges in the DAG. Accordingly, $\boldsymbol{L}$ reflects the DAG structure and the set of all parent–child relations characterizing its Markov property. A Gaussian SEM can be written as

(2) $$\boldsymbol{L}^{\top}\boldsymbol{z} = \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon}\sim \mathcal{N}_q(\boldsymbol{0},\boldsymbol{D}),$$

where $\boldsymbol{D}$ corresponds to the covariance matrix of the error terms, which is assumed to be diagonal and collects the conditional variances of $(Z_1,\ldots,Z_q)$. Moreover, Eq. (2) implies $\mathbb{V}\mathrm{ar}(\boldsymbol{z})=\boldsymbol{\Sigma}=\boldsymbol{L}^{-\top}\boldsymbol{D}\boldsymbol{L}^{-1}$; equivalently, $\boldsymbol{\Omega}=\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^{\top}$.
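As a numerical check of the decomposition $\boldsymbol{\Omega}=\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^{\top}$, the following sketch builds $\boldsymbol{L}$ and $\boldsymbol{D}$ for a hypothetical three-node DAG (the coefficient values are arbitrary choices of ours, not taken from the paper), confirms that $\boldsymbol{\Omega}$ is the inverse of $\boldsymbol{\Sigma}=\boldsymbol{L}^{-\top}\boldsymbol{D}\boldsymbol{L}^{-1}$, and simulates from the SEM node by node:

```python
import numpy as np

# Toy DAG on nodes {0, 1, 2} with edges 0 -> 1, 1 -> 2, 0 -> 2;
# coefficient values are arbitrary, for illustration only.
q = 3
L = np.eye(q)
L[0, 1] = 0.8   # edge 0 -> 1
L[1, 2] = -0.5  # edge 1 -> 2
L[0, 2] = 0.3   # edge 0 -> 2
D = np.diag([1.0, 0.5, 2.0])  # conditional (error) variances

# Precision and covariance matrices implied by the SEM  L^T z = eps
Omega = L @ np.linalg.inv(D) @ L.T
Sigma = np.linalg.inv(L).T @ D @ np.linalg.inv(L)

# The two parameterizations agree: Sigma = Omega^{-1}
assert np.allclose(Sigma, np.linalg.inv(Omega))

# Sampling from the SEM node by node (nodes 0, 1, 2 are in topological order):
# Z_j = -L_{pa(j), j}^T z_{pa(j)} + eps_j,  eps_j ~ N(0, D_jj)
rng = np.random.default_rng(0)
n = 200_000
z = np.zeros((n, q))
for j in range(q):
    pa = [u for u in range(q) if u != j and L[u, j] != 0]
    z[:, j] = z[:, pa] @ (-L[pa, j]) + rng.normal(0.0, np.sqrt(D[j, j]), n)

# The empirical covariance approximates Sigma
assert np.allclose(np.cov(z, rowvar=False), Sigma, atol=0.05)
```

Here `Sigma[0, 1]` equals $-0.8$, as implied by $Z_1 = -0.8\,Z_0 + \varepsilon_1$ with $\mathbb{V}\mathrm{ar}(Z_0)=1$.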
The latter decomposition provides a re-parameterization of $\boldsymbol{\Omega}$ in terms of $(\boldsymbol{D},\boldsymbol{L})$. From (2) we have, for each $j=1,\ldots,q$, $Z_j = -\boldsymbol{L}_{\prec j\,]}^{\top}\boldsymbol{z}_{\mathrm{pa}_{\mathcal{D}}(j)} + \varepsilon_j$, with $\varepsilon_j\sim\mathcal{N}(0,\boldsymbol{D}_{jj})$, where $\prec j\,] = \mathrm{pa}_{\mathcal{D}}(j)\times j$ and $\boldsymbol{L}_{A\times B}$ denotes the sub-matrix of $\boldsymbol{L}$ with elements belonging to rows and columns indexed by $A$ and $B$, respectively. Each equation above has the structure of a linear regression model for variable $Z_j$, with $-\boldsymbol{L}_{\prec j\,]}$ collecting the regression coefficients associated with the variables in $\boldsymbol{z}_{\mathrm{pa}_{\mathcal{D}}(j)}$, namely the parents of node/variable $Z_j$; see also Sect. 3.2 for a comparison with the Bayesian analysis of normal linear regression models. Accordingly, Model (1) can be equivalently written as

(3) $$f(z_1,\ldots,z_q\,\vert\,\varvec{D},\varvec{L},\mathcal{D})=\prod_{j=1}^{q} d\,\mathcal{N}\big(z_j\,\vert\,-\varvec{L}_{\prec j\,]}^{\top}\varvec{z}_{\text{pa}_{\mathcal{D}}(j)},\,\varvec{D}_{jj}\big),$$

where $d\,\mathcal{N}(\cdot\,\vert\,\mu,\sigma^2)$ denotes the p.d.f. of a univariate $\mathcal{N}(\mu,\sigma^2)$ distribution. Finally, given $n$ i.i.d. samples from (3), $\varvec{z}_i=(z_{i,1},\ldots,z_{i,q})^{\top}$, $i=1,\ldots,n$, collected in the $(n,q)$ matrix $\varvec{Z}$ (row-binding of the $\varvec{z}_i$'s), the likelihood function can be written as

(4) $$f(\varvec{Z}\,\vert\,\varvec{D},\varvec{L},\mathcal{D})=\prod_{i=1}^{n}\left\{\prod_{j=1}^{q} d\,\mathcal{N}\big(z_{i,j}\,\vert\,-\varvec{L}_{\prec j\,]}^{\top}\varvec{z}_{i,\text{pa}_{\mathcal{D}}(j)},\,\varvec{D}_{jj}\big)\right\}.$$
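As a concrete numerical illustration of the re-parameterization and of the regression reading of (3), the following sketch builds $(\varvec{D},\varvec{L})$ for a hypothetical 3-node DAG and checks that $\varvec{\Omega}=\varvec{L}\varvec{D}^{-1}\varvec{L}^{\top}$ and that the residual variance of $Z_1$ given its parents recovers $\varvec{D}_{11}$. The example is in Python for brevity (the paper's accompanying implementation is in R); the DAG, coefficients and variances are arbitrary choices.

```python
import numpy as np

# Toy 3-node DAG with pa(1) = {2, 3}; node labels are chosen so that parents
# have larger indices and L is lower triangular with unit diagonal.
q = 3
L = np.eye(q)
L[1, 0] = 0.7    # entry of L_{<1]} for parent Z_2 (regression coefficient is -0.7)
L[2, 0] = -0.4   # entry of L_{<1]} for parent Z_3 (regression coefficient is 0.4)
D = np.diag([1.0, 2.0, 0.5])            # conditional variances D_jj

# Re-parameterization: Omega = L D^{-1} L^T and Sigma = L^{-T} D L^{-1}
Omega = L @ np.linalg.inv(D) @ L.T
Sigma = np.linalg.inv(L.T) @ D @ np.linalg.inv(L)
assert np.allclose(Sigma, np.linalg.inv(Omega))

# Regression reading of (3): Z_1 = -L_{<1]}^T z_{pa(1)} + eps_1, eps_1 ~ N(0, D_11)
rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(q), Sigma, size=200_000)
resid = Z[:, 0] + Z[:, 1:] @ L[1:, 0]   # Z_1 minus its regression on pa(1)
print(np.var(resid))                    # close to D_11 = 1.0
```

Note that the product over $j$ in (4) then factorizes the joint likelihood into $q$ independent regression likelihoods, one per node.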

2.3. Copula DAG Models

Consider now a collection of $q$ random variables $X_1,\ldots,X_q$, comprising binary, ordinal, continuous or count variables, each with marginal cumulative distribution function (c.d.f.) $F_j(\cdot)$, $j=1,\ldots,q$. In what follows we will consider a collection of $n$ $q$-dimensional observations from $X_1,\ldots,X_q$. To model these mixed data we need to specify a joint distribution for $X_1,\ldots,X_q$, which we define through a Gaussian copula DAG model.
Specifically, let $Z_1,\ldots,Z_q$ be a collection of $q$ latent random variables with joint Gaussian distribution as in (1). We establish a link between each observed variable $X_j$ and its latent counterpart $Z_j$ by assuming that

(5) $$X_j = F_j^{-1}\left\{\Phi(Z_j)\right\},$$

where $F_j^{-1}$ is the (pseudo) inverse c.d.f. of $X_j$ and $\Phi$ is the c.d.f. of a standard Normal distribution. The joint c.d.f. of $X_1,\ldots,X_q$ can be written as

(6) $$P(X_1\le x_1,\ldots,X_q\le x_q\,\vert\,\varvec{\Omega},F_1,\ldots,F_q)=\Phi_q\left(\Phi^{-1}(F_1(x_1)),\ldots,\Phi^{-1}(F_q(x_q))\,\vert\,\varvec{\Omega}\right),$$

where $\Phi_q(\cdot\,\vert\,\varvec{\Omega})$ denotes the c.d.f. of the $\mathcal{N}_q(\varvec{0},\varvec{\Omega}^{-1})$ distribution in (1). Also notice that Model (6) depends on the marginal distributions $F_1,\ldots,F_q$ and (although not emphasized in the notation) on their parameters, which would need to be estimated.
A semiparametric estimation strategy would replace each $F_j$ with the corresponding empirical estimate $\widehat{F}_j(k_j)=n^{-1}\sum_{i=1}^{n}\mathbbm{1}(x_{i,j}<k_j)$, where $k_j\in\text{unique}\{x_{1,j},\ldots,x_{n,j}\}$. A comparison with a parametric strategy based on probabilistic-model assumptions for the marginal c.d.f.'s is instead provided in the Appendix.
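The copula link (5) and the empirical-margin step can be made concrete with a small simulation. The following Python sketch (illustrative only; the marginal choices of a Bernoulli(0.3) and an Exponential(2) variable, and the latent correlation 0.6, are arbitrary) generates mixed data from a bivariate Gaussian copula and evaluates the empirical c.d.f. on one column:

```python
import numpy as np
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z), the standard Normal c.d.f."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)

# Latent bivariate Gaussian with (illustrative) correlation 0.6.
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=1000)
U = np.vectorize(std_normal_cdf)(Z)            # uniform scores Phi(Z_j)

# Copula link (5): X_j = F_j^{-1}{Phi(Z_j)} for two illustrative margins.
X1 = (U[:, 0] > 0.7).astype(int)               # Bernoulli(0.3) quantile function
X2 = -np.log(1.0 - U[:, 1]) / 2.0              # Exponential(rate 2) quantile function

# Semiparametric step: F_hat_j(k) = n^{-1} sum_i 1(x_{i,j} < k),
# evaluated at the unique observed values of column j.
def empirical_cdf(x):
    return {k: np.mean(x < k) for k in np.unique(x)}

F1_hat = empirical_cdf(X1)
print(F1_hat)   # F_hat(0) = 0; F_hat(1) = fraction of zeros, close to 0.7
```

Dependence between $X_1$ and $X_2$ is induced entirely through the latent Gaussian, while the margins are free to be of any (mixed) type.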

As an alternative to the estimation procedures above, Hoff (2007) proposes a rank-based nonparametric approach, which we also employ in our methodology. Specifically, let $\varvec{x}_i=(x_{i,1},\ldots,x_{i,q})^{\top}$, $i=1,\ldots,n$, be $n$ i.i.d. samples from (6) and $\varvec{X}$ the $(n,q)$ data matrix.
Since the $F_j$'s are non-decreasing, for each pair of distinct observations $x_{i,j}$ and $x_{l,j}$, if $x_{i,j}<x_{l,j}$ then $z_{i,j}<z_{l,j}$. Therefore, observing $\varvec{X}$ implies that the latent data $\varvec{Z}$ must lie in the set

(7) $$A(\varvec{X})=\left\{\varvec{Z}\in\mathbb{R}^{n\times q}:\,\max\left\{z_{k,j}:x_{k,j}<x_{i,j}\right\}<z_{i,j}<\min\left\{z_{k,j}:x_{i,j}<x_{k,j}\right\}\right\}$$

and one can take the occurrence of such an event as the data. Thus, the extended rank likelihood (Hoff, 2007) can be written as

(8) $$p(\varvec{Z}\in A(\varvec{X})\,\vert\,\varvec{D},\varvec{L},\mathcal{D})=\int_{A(\varvec{X})}f(\varvec{Z}\,\vert\,\varvec{D},\varvec{L},\mathcal{D})\,\text{d}\varvec{Z},$$

with $f(\varvec{Z}\,\vert\,\varvec{D},\varvec{L},\mathcal{D})$ as in Eq. (4).
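In practice, the constraint set (7) is what makes posterior computation tractable: given the current latent values, each $z_{i,j}$ is confined to an interval determined by the ranks of column $j$, which is the interval used when resampling $z_{i,j}$ (e.g., from a truncated Normal) in a Gibbs sampler. A minimal Python sketch of the bound computation, for a single hypothetical ordinal column, might look as follows:

```python
import numpy as np

def rank_bounds(x, z, i):
    """Truncation interval for z[i] implied by A(X) for one column:
    max{z_k : x_k < x_i} < z_i < min{z_k : x_i < x_k}."""
    lower = z[x < x[i]].max() if np.any(x < x[i]) else -np.inf
    upper = z[x > x[i]].min() if np.any(x > x[i]) else np.inf
    return lower, upper

# Illustrative ordinal column and latent values consistent with its ranks.
x = np.array([0, 2, 1, 2, 0])
z = np.array([-1.2, 1.5, 0.3, 0.9, -0.7])

# Each latent value respects the ordering of the observed data.
for i in range(len(x)):
    lo, hi = rank_bounds(x, z, i)
    assert lo < z[i] < hi

lo, hi = rank_bounds(x, z, 2)
print(float(lo), float(hi))   # -0.7 0.9
```

Ties in $x$ impose no ordering between the corresponding latent values, which is why the approach handles ordinal and discrete margins without specifying $F_j$.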

Our model formulation assumes that a DAG Markov property holds at the latent-variable stage, namely among $Z_1,\ldots,Z_q$, by virtue of factorization (3). Through the copula link (5), this translates into a Markov property for $X_1,\ldots,X_q$, provided that all the marginal distributions are continuous (Liu et al., 2009). The presence of non-continuous variables (e.g., ordinal, discrete) might induce additional dependencies among the observed variables (relative to those included in the latent model); however, such dependencies can be regarded as of secondary relevance, since they emerge from the marginals rather than from the joint distribution; see in particular Dobra & Lenkoski (2011, Section 4) for a related discussion.

3. Bayesian Inference

We complete the Gaussian copula DAG model introduced in the previous section by assigning prior distributions to the parameters $(\varvec{D},\varvec{L},\mathcal{D})$. Since the parameters $(\varvec{D},\varvec{L})$ are DAG-dependent, as they satisfy structural constraints imposed by the DAG (Sect. 2.2), we structure our prior as $p(\varvec{D},\varvec{L},\mathcal{D})=p(\varvec{D},\varvec{L}\,\vert\,\mathcal{D})\,p(\mathcal{D})$.

3.1. Prior on $\mathcal{D}$

Let $\mathcal{S}_q$ be the set of all DAG structures on $q$ nodes. In many applied problems, partial knowledge about the orientation of some edges, whenever these are present in the graph, is available, and accordingly one would like to incorporate such information into the model. Without loss of generality, let $\mathcal{S}_q^C$ be the set of all DAGs on $q$ nodes satisfying a set of structural constraints $C$, corresponding to edge orientations that are known in advance (equivalently, a set of "reversed" edge orientations that are forbidden). As an example, suppose node $u$ represents a response variable of interest, so that there are no outgoing edges from $u$ (equivalently, $u$ cannot have children). In such a case we have $C=\{u\not\rightarrow v\,\vert\,v=1,\ldots,q,\,v\ne u\}$, and accordingly an edge between $u$ and $v$, whenever present, will be oriented as $v\rightarrow u$. See also Sect. 6 for examples on real data.
We assign a prior to DAGs belonging to $\mathcal{S}_q^C$ as follows.

For a given DAG $\mathcal{D}=(V,E)\in\mathcal{S}_q^C$, let $\varvec{S}^{\mathcal{D}}$ be the $0\text{-}1$ adjacency matrix of its skeleton, that is, the underlying undirected graph obtained by removing the orientation of all its edges. For each $(u,v)$-element of $\varvec{S}^{\mathcal{D}}$, we have $\varvec{S}_{u,v}^{\mathcal{D}}=1$ if and only if $(u,v)\in E$ or $(v,u)\in E$, and zero otherwise. Conditionally on a prior probability of inclusion $\pi\in(0,1)$, we assume, for each $u>v$, $\varvec{S}_{u,v}^{\mathcal{D}}\,\vert\,\pi\overset{\text{iid}}{\sim}\text{Ber}(\pi)$, which implies

(9) $$p(\varvec{S}^{\mathcal{D}}\,\vert\,\pi)=\pi^{|\varvec{S}^{\mathcal{D}}|}(1-\pi)^{\frac{q(q-1)}{2}-|\varvec{S}^{\mathcal{D}}|},$$

where $|\varvec{S}^{\mathcal{D}}|$ is the number of edges in $\mathcal{D}$ (equivalently, in its skeleton) and $q(q-1)/2$ is the maximum number of edges in a DAG on $q$ nodes. We then assume $\pi\sim\text{Beta}(c,d)$, so that, by integrating out $\pi$, the resulting prior on $\varvec{S}^{\mathcal{D}}$ is

(10) $$p(\varvec{S}^{\mathcal{D}})=\frac{\Gamma\left(|\varvec{S}^{\mathcal{D}}|+c\right)\,\Gamma\left(\frac{q(q-1)}{2}-|\varvec{S}^{\mathcal{D}}|+d\right)}{\Gamma\left(\frac{q(q-1)}{2}+c+d\right)}\cdot\frac{\Gamma(c+d)}{\Gamma(c)\,\Gamma(d)}.$$

Finally, we set $p(\mathcal{D})\propto p(\varvec{S}^{\mathcal{D}})$ for each $\mathcal{D}\in\mathcal{S}_q^C$. See also Scott and Berger (2010) for a comparison with multiplicity-correction priors adopted in a linear model selection setting. The hyperparameters $c$ and $d$ can be chosen to reflect prior knowledge of sparsity in the graph, whenever available; in particular, any choice $c<d$ implies $\mathbb{E}(\pi)<0.5$, thus favoring sparse graphs.
The default choice $c=d=1$, which corresponds to $\pi\sim\text{Unif}(0,1)$, can instead be adopted in the absence of substantive prior information.
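Prior (10) is a standard beta-binomial form and is conveniently evaluated on the log scale via log-gamma functions. The sketch below (Python, illustrative) checks that it is a proper prior over skeletons and that a sparsity-favoring choice $c<d$ down-weights dense graphs:

```python
from math import lgamma, exp, comb

def log_skeleton_prior(k, q, c=1.0, d=1.0):
    """Log of prior (10), as a function of k = |S^D| edges in the skeleton."""
    m = q * (q - 1) // 2                      # maximum number of edges
    return (lgamma(k + c) + lgamma(m - k + d) - lgamma(m + c + d)
            + lgamma(c + d) - lgamma(c) - lgamma(d))

q = 4
m = q * (q - 1) // 2

# Summing p(S^D) over all 2^m skeletons (grouped by edge count) gives 1.
total = sum(comb(m, k) * exp(log_skeleton_prior(k, q)) for k in range(m + 1))
print(round(total, 10))   # 1.0

# With c < d, a full skeleton (k = m) is much less probable a priori
# than the empty one (k = 0).
print(log_skeleton_prior(m, q, c=1, d=10) < log_skeleton_prior(0, q, c=1, d=10))  # True
```

With $c=d=1$ the induced prior is uniform over the number of edges $k$, which is the multiplicity correction discussed by Scott and Berger (2010).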

3.2. Prior on $(\varvec{D},\varvec{L})$

Conditionally on DAG $\mathcal{D}$, we assign a prior to $(\varvec{D},\varvec{L})$ through a DAG-Wishart distribution with position hyperparameter $\varvec{U}$ (a $(q,q)$ s.p.d. matrix) and shape hyperparameter $a^{\mathcal{D}}=(a_1^{\mathcal{D}},\ldots,a_q^{\mathcal{D}})^{\top}$. The DAG-Wishart distribution (Cao et al., 2019) provides a conjugate prior for the Gaussian DAG model (3).
Accordingly, conditionally on the latent data $\varvec{Z}$, the posterior distribution of $(\varvec{D},\varvec{L})$, as well as the marginal likelihood of the model, are available in closed form; see Sect. 4 for more details. A further feature of the DAG-Wishart distribution is that the node-parameters $\{(\varvec{D}_{jj},\varvec{L}_{\prec j\,]}),\,j=1,\ldots,q\}$ are a priori independent, with distribution

(11) $$\begin{aligned} \mathbf{D}_{jj}\,\vert\,\mathcal{D} &\sim \text{I-Ga}\left(\tfrac{1}{2}a_j^{\mathcal{D}},\ \tfrac{1}{2}\mathbf{U}_{j\,\vert\,\text{pa}_{\mathcal{D}}(j)}\right),\\ \mathbf{L}_{\prec j\,]}\,\vert\,\mathbf{D}_{jj},\mathcal{D} &\sim \mathcal{N}_{|\text{pa}_{\mathcal{D}}(j)|}\left(-\mathbf{U}_{\prec j\succ}^{-1}\mathbf{U}_{\prec j\,]},\ \mathbf{D}_{jj}\mathbf{U}_{\prec j\succ}^{-1}\right), \end{aligned}$$

where $\text{I-Ga}(\alpha,\beta)$ stands for an Inverse-Gamma distribution with shape $\alpha>0$ and rate $\beta>0$, having expectation $\beta/(\alpha-1)$ for $\alpha>1$.
Moreover, $\mathbf{U}_{j\,\vert\,\text{pa}_{\mathcal{D}}(j)}=\mathbf{U}_{jj}-\mathbf{U}_{[\,j\succ}\mathbf{U}_{\prec j\succ}^{-1}\mathbf{U}_{\prec j\,]}$, with $\prec j\,]=\text{pa}_{\mathcal{D}}(j)\times j$, $[\,j\succ\,=j\times\text{pa}_{\mathcal{D}}(j)$ and $\prec j\succ\,=\text{pa}_{\mathcal{D}}(j)\times\text{pa}_{\mathcal{D}}(j)$.
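The node-wise prior in (11) can be sampled directly: an Inverse-Gamma draw for the conditional variance, followed by a multivariate normal draw for the regression coefficients. The authors' implementation is in R (see the linked repository); the following is an illustrative Python/NumPy sketch, with hypothetical function and argument names.

```python
import numpy as np

def sample_node_params(j, pa, U, a_j, rng):
    """Draw (D_jj, L_{<j]}) from the DAG-Wishart prior of Eq. (11).

    j   : node index
    pa  : list of parent indices pa_D(j)
    U   : (q, q) s.p.d. position hyperparameter
    a_j : shape hyperparameter a_j^D for node j
    """
    pa = np.asarray(pa, dtype=int)
    if pa.size > 0:
        U_pp_inv = np.linalg.inv(U[np.ix_(pa, pa)])   # U_{<j>}^{-1}
        U_pj = U[pa, j]                               # U_{<j]}
        # Schur complement U_{j|pa(j)} = U_jj - U_{[j>} U_{<j>}^{-1} U_{<j]}
        U_cond = U[j, j] - U_pj @ U_pp_inv @ U_pj
    else:
        U_cond = U[j, j]
    # D_jj ~ I-Ga(a_j/2, U_{j|pa}/2): reciprocal of a Gamma(a_j/2, rate=U_cond/2) draw
    D_jj = 1.0 / rng.gamma(shape=a_j / 2.0, scale=2.0 / U_cond)
    if pa.size > 0:
        # L_{<j]} | D_jj ~ N(-U_{<j>}^{-1} U_{<j]}, D_jj U_{<j>}^{-1})
        L_j = rng.multivariate_normal(-U_pp_inv @ U_pj, D_jj * U_pp_inv)
    else:
        L_j = np.empty(0)
    return D_jj, L_j
```

Under the default choice $\mathbf{U}=g\mathbf{I}_q$ discussed below, `U_pj` is a zero vector, so the coefficient prior is centered at zero with covariance $(\mathbf{D}_{jj}/g)\,\mathbf{I}$, matching the moments stated in the text.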

With regard to the hyperparameters $a_1^{\mathcal{D}},\ldots,a_q^{\mathcal{D}}$, we consider the default choice $a_j^{\mathcal{D}}=a+|\text{pa}_{\mathcal{D}}(j)|-q+1$ ($a>q-1$), which guarantees compatibility (i.e., the same marginal likelihood) among prior distributions for Markov equivalent DAGs; see also Peluso and Consonni (2020). Finally, the prior on $(\mathbf{D},\mathbf{L})$ is given by

(12) $$p(\mathbf{D},\mathbf{L}\,\vert\,\mathcal{D})=\prod_{j=1}^{q}p(\mathbf{L}_{\prec j\,]}\,\vert\,\mathbf{D}_{jj})\,p(\mathbf{D}_{jj}).$$

A DAG-Wishart prior on $(\mathbf{D},\mathbf{L})$ implicitly assigns independent Normal-Inverse-Gamma distributions to each pair of node-parameters $(\mathbf{D}_{jj},\mathbf{L}_{\prec j\,]})$, the conditional variance and vector of regression coefficients of the $j$-th term of the SEM, as in the standard conjugate Bayesian analysis of a normal linear regression model. Moreover, under the default choice $\mathbf{U}=g\mathbf{I}_q$ (Sect. 5), with $g>0$ and $\mathbf{I}_q$ the $(q,q)$ identity matrix, it is easy to show that $\mathbb{E}(\mathbf{L}_{\prec j\,]}\,\vert\,\mathbf{D}_{jj},\mathcal{D})=\mathbf{0}$ and $\mathbb{V}\text{ar}(\mathbf{L}_{\prec j\,]}\,\vert\,\mathbf{D}_{jj},\mathcal{D})=(\mathbf{D}_{jj}/g)\,\mathbf{I}_{|\text{pa}_{\mathcal{D}}(j)|}$ for each $j=1,\ldots,q$. The priors on the regression coefficients are therefore centered at zero, with a diagonal covariance matrix reflecting an assumption of prior independence across the elements of $\mathbf{L}_{\prec j\,]}$; moreover, smaller values of $g$ make the prior less informative. We refer to Sect. 5 for details about the choice of hyperparameters.

4. Computational Implementation and Posterior Inference

Our target is the joint posterior of $(\mathbf{D},\mathbf{L},\mathcal{D},\mathbf{Z})$, namely

(13) $$p(\mathbf{D},\mathbf{L},\mathcal{D},\mathbf{Z}\,\vert\,\mathbf{X})\propto p(\mathbf{Z}\in A(\mathbf{X})\,\vert\,\mathbf{D},\mathbf{L},\mathcal{D})\,p(\mathbf{D},\mathbf{L}\,\vert\,\mathcal{D})\,p(\mathcal{D}),$$

where $p(\mathbf{Z}\in A(\mathbf{X})\,\vert\,\mathbf{D},\mathbf{L},\mathcal{D})$ is the extended rank likelihood in (8), while $p(\mathcal{D})$ and $p(\mathbf{D},\mathbf{L}\,\vert\,\mathcal{D})$ are the priors on DAG $\mathcal{D}$ and on the DAG-parameters $(\mathbf{D},\mathbf{L})$ introduced in Sect. 3, respectively.
An MCMC scheme targeting the posterior (13) can be constructed by iteratively sampling $\mathcal{D}$, $(\mathbf{D},\mathbf{L})$ and $\mathbf{Z}$ from their full conditional distributions, as we detail in the following.

4.1. Update of $(\mathbf{D},\mathbf{L},\mathcal{D})$

The joint full conditional distribution of $(\mathbf{D},\mathbf{L},\mathcal{D})$ is given by

(14) $$p(\mathbf{D},\mathbf{L},\mathcal{D}\,\vert\,\mathbf{X},\mathbf{Z})=p(\mathbf{D},\mathbf{L},\mathcal{D}\,\vert\,\mathbf{Z})\propto f(\mathbf{Z}\,\vert\,\mathbf{D},\mathbf{L},\mathcal{D})\,p(\mathbf{D},\mathbf{L}\,\vert\,\mathcal{D})\,p(\mathcal{D}).$$

Conditionally on the latent data $\mathbf{Z}$, we can sample from $p(\mathbf{D},\mathbf{L},\mathcal{D}\,\vert\,\mathbf{Z})$ using the MCMC scheme for posterior inference on Gaussian DAGs presented in Castelletti and Consonni (2021, Supplementary Material). The latter consists of a partial analytic structure (PAS) algorithm (Godsill, 2012) based on the two following steps.

Given DAG $\mathcal{D}$, the parameters $(\mathbf{D},\mathbf{L})$ are first sampled from their full conditional distribution which, because of the conjugacy of the DAG-Wishart prior with the distribution of the latent data, is such that, for $j=1,\ldots,q$,

(15) $$\begin{aligned} \mathbf{D}_{jj}\,\vert\,\mathcal{D},\mathbf{Z} &\sim \text{I-Ga}\left(\tfrac{1}{2}\widetilde{a}_j^{\mathcal{D}},\ \tfrac{1}{2}\widetilde{\mathbf{U}}_{j\,\vert\,\text{pa}_{\mathcal{D}}(j)}\right),\\ \mathbf{L}_{\prec j\,]}\,\vert\,\mathbf{D}_{jj},\mathcal{D},\mathbf{Z} &\sim \mathcal{N}_{|\text{pa}_{\mathcal{D}}(j)|}\left(-\widetilde{\mathbf{U}}_{\prec j\succ}^{-1}\widetilde{\mathbf{U}}_{\prec j\,]},\ \mathbf{D}_{jj}\widetilde{\mathbf{U}}_{\prec j\succ}^{-1}\right), \end{aligned}$$

with $\widetilde{a}_j^{\mathcal{D}}=\widetilde{a}+|\text{pa}_{\mathcal{D}}(j)|-q+1$, $\widetilde{a}=a+n$ and $\widetilde{\mathbf{U}}=\mathbf{U}+\mathbf{Z}^{\top}\mathbf{Z}$.
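This Gibbs update reuses the prior structure of (11) with the posterior hyperparameters $\widetilde{a}$ and $\widetilde{\mathbf{U}}$. A minimal Python/NumPy sketch (an illustration only; the paper's code is in R, and all names here are hypothetical):

```python
import numpy as np

def sample_params_given_Z(Z, parents, U, a, rng):
    """One Gibbs draw of (D, L) from the full conditional in Eq. (15).

    Z       : (n, q) latent data matrix
    parents : dict mapping each node j to its parent list pa_D(j)
    U, a    : DAG-Wishart hyperparameters, with U (q, q) s.p.d. and a > q - 1
    """
    n, q = Z.shape
    U_t = U + Z.T @ Z                       # posterior position: U~ = U + Z'Z
    D = np.zeros(q)
    L = {}
    for j in range(q):
        pa = np.asarray(parents[j], dtype=int)
        a_j_t = (a + n) + pa.size - q + 1   # posterior shape: a~_j^D
        if pa.size > 0:
            Upp_inv = np.linalg.inv(U_t[np.ix_(pa, pa)])
            Upj = U_t[pa, j]
            # Schur complement of U~ restricted to {j} u pa(j)
            U_cond = U_t[j, j] - Upj @ Upp_inv @ Upj
        else:
            U_cond = U_t[j, j]
        # D_jj | D, Z ~ I-Ga(a~_j/2, U~_{j|pa}/2)
        D[j] = 1.0 / rng.gamma(a_j_t / 2.0, 2.0 / U_cond)
        if pa.size > 0:
            # L_{<j]} | D_jj, D, Z ~ N(-U~_{<j>}^{-1} U~_{<j]}, D_jj U~_{<j>}^{-1})
            L[j] = rng.multivariate_normal(-Upp_inv @ Upj, D[j] * Upp_inv)
        else:
            L[j] = np.empty(0)
    return D, L
```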

Next, DAG $\mathcal{D}$ is updated through a Metropolis-Hastings step in which a DAG $\mathcal{D}^*$ is drawn from a proposal distribution $q(\mathcal{D}^*\,\vert\,\mathcal{D})$. For a given DAG $\mathcal{D}\in\mathcal{S}_q^C$, the adopted proposal distribution is built upon the set of all DAGs in $\mathcal{S}_q^C$ that can be reached from $\mathcal{D}$ through the insertion, deletion or reversal of a single edge. Specifically, we construct the set of all direct successors of $\mathcal{D}$, denoted $\mathcal{O}_{\mathcal{D}}$, and then draw a DAG $\mathcal{D}^*$ uniformly from $\mathcal{O}_{\mathcal{D}}$. It follows that $q(\mathcal{D}^*\,\vert\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|$.
Each proposed DAG thus differs locally from $\mathcal{D}$ by a single edge $u\rightarrow v$, which is inserted (a), deleted (b) or reversed (c). It can be shown that the acceptance probability of $\mathcal{D}^*$ under a PAS algorithm is given by $\alpha_{\mathcal{D}^*}=\min\{1,r_{\mathcal{D}^*}\}$, with

(16) $$r_{\mathcal{D}^*}=\begin{cases} \dfrac{m(\mathbf{Z}_v\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}^*}(v)},\mathcal{D}^*)}{m(\mathbf{Z}_v\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}}(v)},\mathcal{D})}\cdot\dfrac{p(\mathcal{D}^*)}{p(\mathcal{D})}\cdot\dfrac{q(\mathcal{D}\,\vert\,\mathcal{D}^*)}{q(\mathcal{D}^*\,\vert\,\mathcal{D})} & \text{if (a) or (b)},\\[2ex] \dfrac{m(\mathbf{Z}_v\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}^*}(v)},\mathcal{D}^*)}{m(\mathbf{Z}_v\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}}(v)},\mathcal{D})}\cdot\dfrac{m(\mathbf{Z}_u\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}^*}(u)},\mathcal{D}^*)}{m(\mathbf{Z}_u\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}}(u)},\mathcal{D})}\cdot\dfrac{p(\mathcal{D}^*)}{p(\mathcal{D})}\cdot\dfrac{q(\mathcal{D}\,\vert\,\mathcal{D}^*)}{q(\mathcal{D}^*\,\vert\,\mathcal{D})} & \text{if (c)}, \end{cases}$$

where

$$m\left(\mathbf{Z}_v\,\vert\,\mathbf{Z}_{\text{pa}_{\mathcal{D}}(v)},\mathcal{D}\right)=(2\pi)^{-\frac{n}{2}}\cdot\frac{\big|\mathbf{U}_{\prec v\succ}\big|^{\frac{1}{2}}}{\big|\widetilde{\mathbf{U}}_{\prec v\succ}\big|^{\frac{1}{2}}}\cdot\frac{\Gamma\left(\frac{1}{2}\widetilde{a}_v^{\mathcal{D}}\right)}{\Gamma\left(\frac{1}{2}a_v^{\mathcal{D}}\right)}\cdot\frac{\Big(\frac{1}{2}\mathbf{U}_{v\,\vert\,\text{pa}_{\mathcal{D}}(v)}\Big)^{\frac{1}{2}a_v^{\mathcal{D}}}}{\Big(\frac{1}{2}\widetilde{\mathbf{U}}_{v\,\vert\,\text{pa}_{\mathcal{D}}(v)}\Big)^{\frac{1}{2}\widetilde{a}_v^{\mathcal{D}}}}$$

is the marginal (i.e., integrated with respect to $\mathbf{L}_{\prec v\,]}$ and $\mathbf{D}_{vv}$) data distribution relative to the conditional density of $Z_v$ given $\mathbf{z}_{\text{pa}_{\mathcal{D}}(v)}$ appearing in (4). We refer the reader to Castelletti and Mascaro (2022) for further computational details.
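In practice the acceptance ratio (16) only requires this node-wise marginal likelihood for the nodes whose parent set changes ($v$, and also $u$ for a reversal). The closed form above can be evaluated on the log scale for numerical stability; the sketch below is an illustrative Python version (the paper's code is in R), assuming the default shape choice $a_j^{\mathcal{D}}=a+|\text{pa}_{\mathcal{D}}(j)|-q+1$.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_node(v, pa, Z, U, a):
    """log m(Z_v | Z_pa(v), D), the node-wise marginal likelihood
    entering the Metropolis-Hastings ratio in Eq. (16).

    v  : node index; pa : list of parent indices pa_D(v)
    Z  : (n, q) latent data; U, a : DAG-Wishart hyperparameters (a > q - 1)
    """
    n, q = Z.shape
    pa = np.asarray(pa, dtype=int)
    a_v = a + pa.size - q + 1          # prior shape (default choice)
    a_v_t = a_v + n                    # posterior shape a~_v^D
    U_t = U + Z.T @ Z                  # posterior position U~

    def schur_and_logdet(M):
        # returns (M_{v|pa(v)}, log|M_{<v>}|)
        if pa.size == 0:
            return M[v, v], 0.0
        Mpp = M[np.ix_(pa, pa)]
        Mpv = M[pa, v]
        _, logdet = np.linalg.slogdet(Mpp)
        return M[v, v] - Mpv @ np.linalg.solve(Mpp, Mpv), logdet

    U_cond, ld_prior = schur_and_logdet(U)
    Ut_cond, ld_post = schur_and_logdet(U_t)
    return (-0.5 * n * np.log(2 * np.pi)
            + 0.5 * (ld_prior - ld_post)
            + gammaln(a_v_t / 2.0) - gammaln(a_v / 2.0)
            + 0.5 * a_v * np.log(U_cond / 2.0)
            - 0.5 * a_v_t * np.log(Ut_cond / 2.0))
```

The ratio in (16) is then `exp(log_marginal_node(v, pa_star, Z, U, a) - log_marginal_node(v, pa, Z, U, a))`, times the prior and proposal ratios.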

4.2. Update of $\mathbf{Z}$

Since the latent data $\mathbf{Z}$ are known only relative to the event $A(\mathbf{X})$ (Eq. 7), we also need to sample them from their full conditional distribution. The latter is given by

(17) $$p(\mathbf{Z}\,\vert\,\mathbf{D},\mathbf{L},\mathcal{D},\mathbf{X})\propto p(\mathbf{Z}\in A(\mathbf{X})\,\vert\,\mathbf{D},\mathbf{L},\mathcal{D}).$$

Also, by exploiting the structure of the extended rank likelihood and of the set $A(\mathbf{X})$ in (8) and (7), respectively, the conditional distribution of $Z_{i,j}$ corresponds to a $\mathcal{N}(z_{i,j}\,\vert\,-\mathbf{L}_{\prec j\,]}^{\top}\mathbf{z}_{i,\text{pa}(j)},\mathbf{D}_{jj})$ truncated to the interval $\big(z_{i,j}^{(l)},z_{i,j}^{(u)}\big)$, where $z_{i,j}^{(l)}=\max\{z_{k,j}:x_{k,j}<x_{i,j}\}$ and $z_{i,j}^{(u)}=\min\{z_{k,j}:x_{i,j}<x_{k,j}\}$. Therefore, conditionally on the current values of $\mathbf{D}$ and $\mathbf{L}$, the $(n,q)$ latent data matrix $\mathbf{Z}$ can be updated by drawing each latent observation $z_{i,j}$, $j=1,\ldots,q$, $i=1,\ldots,n$, from its corresponding truncated-Normal conditional distribution.
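This rank-preserving update keeps each resampled latent value strictly between the latents of observations with smaller and larger observed values. A Python/SciPy sketch of one column update (illustrative only; names are hypothetical and the paper's implementation is in R):

```python
import numpy as np
from scipy.stats import truncnorm

def update_latent_column(j, X, Z, mean_j, D_jj, rng):
    """Resample the j-th latent column of Z given the observed column X_j.

    mean_j : length-n vector of conditional means -L_{<j]}' z_{i,pa(j)}
    D_jj   : conditional variance of node j
    Each z_ij is drawn from a normal truncated to (z_ij^(l), z_ij^(u)),
    so the latent data remain consistent with the event A(X).
    """
    n = Z.shape[0]
    sd = np.sqrt(D_jj)
    for i in range(n):
        below = Z[X[:, j] < X[i, j], j]   # latents of smaller observed values
        above = Z[X[:, j] > X[i, j], j]   # latents of larger observed values
        lo = below.max() if below.size else -np.inf
        hi = above.min() if above.size else np.inf
        # scipy's truncnorm takes standardized bounds (lo - mu)/sd, (hi - mu)/sd
        a_std = (lo - mean_j[i]) / sd
        b_std = (hi - mean_j[i]) / sd
        Z[i, j] = truncnorm.rvs(a_std, b_std, loc=mean_j[i], scale=sd,
                                random_state=rng)
    return Z
```

Ties in $X_j$ leave the corresponding latents mutually unconstrained, as required by the extended rank likelihood.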

4.3. Posterior Inference

The output of our MCMC scheme is a collection of DAGs $\big\{\mathcal{D}^{(s)}\big\}_{s=1}^S$ and DAG-parameters $\big\{\big(\mathbf{D}^{(s)},\mathbf{L}^{(s)}\big)\big\}_{s=1}^S$ approximately sampled from (13), where $S$ is the fixed number of MCMC iterations. An approximate posterior distribution over the space of DAGs can be obtained by computing, for each $\mathcal{D}\in\mathcal{S}_q^C$,

(18) $$\widehat{p}(\mathcal{D}\,\vert\,\varvec{X})=\frac{1}{S}\sum_{s=1}^{S}\mathbbm{1}\left\{\mathcal{D}^{(s)}=\mathcal{D}\right\},$$

which approximates the posterior probability of each DAG structure through its MCMC frequency of visits. As a summary of the previous output, we can also recover a $(q,q)$ matrix of (marginal) posterior probabilities of edge inclusion, whose $(u,v)$-element corresponds to

(19) $$\widehat{p}(u\rightarrow v\,\vert\,\varvec{X})\equiv\widehat{p}_{u\rightarrow v}=\frac{1}{S}\sum_{s=1}^{S}\mathbbm{1}_{u\rightarrow v}\big\{\mathcal{D}^{(s)}\big\},$$

an MCMC frequency-based estimate of the posterior probability of $u\rightarrow v$, where $\mathbbm{1}_{u\rightarrow v}\big\{\mathcal{D}^{(s)}\big\}=1$ if $\mathcal{D}^{(s)}$ contains $u\rightarrow v$, and 0 otherwise. From these quantities, a single graph estimate summarizing the entire MCMC output can also be recovered. Specifically, by fixing a threshold for edge inclusion $k\in(0,1)$, a DAG estimate is obtained by including those edges whose posterior probability of inclusion in (19) exceeds $k$.
When $k=0.5$, we name the resulting estimate the Median Probability DAG Model (MPM DAG), following the idea proposed by Barbieri and Berger (2004) in a linear regression context. Finally, we can recover a posterior summary of the covariance matrix $\varvec{\Sigma}$ through the corresponding Bayesian model averaging (BMA; Hoeting et al., 1999) estimate

(20) $$\widehat{\varvec{\Sigma}}=\frac{1}{S}\sum_{s=1}^{S}\varvec{\Sigma}^{(s)},$$

where, for each $s=1,\ldots,S$, $\varvec{\Sigma}^{(s)}=\big(\varvec{L}^{(s)}\big)^{-\top}\varvec{D}^{(s)}\big(\varvec{L}^{(s)}\big)^{-1}$.
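For illustration, the summaries in (18)–(20) can be computed directly from the MCMC output. The following is a minimal Python sketch (the paper's implementation is in R, available at the repository cited in the abstract), assuming the sampled DAGs are stored as 0/1 adjacency matrices; the function name is ours.

```python
import numpy as np

def posterior_summaries(dag_samples, sigma_samples, k=0.5):
    """Compute posterior summaries from an MCMC sample of DAGs.

    dag_samples  : (S, q, q) array of 0/1 adjacency matrices, with entry
                   [s, u, v] = 1 when DAG D^(s) contains the edge u -> v.
    sigma_samples: (S, q, q) array of covariance matrices Sigma^(s).
    k            : threshold for edge inclusion; k = 0.5 gives the MPM DAG.
    """
    dag_samples = np.asarray(dag_samples, dtype=float)
    # (19): marginal posterior probabilities of edge inclusion (MCMC frequencies)
    p_edge = dag_samples.mean(axis=0)
    # MPM DAG: keep edges whose posterior inclusion probability exceeds k
    mpm_dag = (p_edge > k).astype(int)
    # (20): Bayesian model averaging estimate of Sigma
    sigma_hat = np.asarray(sigma_samples).mean(axis=0)
    return p_edge, mpm_dag, sigma_hat
```

Averaging the adjacency matrices over iterations gives exactly the frequency-of-visits estimator in (19), since each entry is the proportion of sampled DAGs containing the corresponding edge.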

5. Simulation Study

In this section, we evaluate the performance of the proposed method through simulations.

5.1. Scenarios

We fix the number of variables $q=20$ and consider different simulation settings where data are generated by varying the following features:

(a)

The class of DAG structure;

(b)

The type of variables;

(c)

The sample size.

With regard to (a), we consider three different classes of DAG structures:

(a.1)

Free: no structural constraints are imposed on the DAG;

(a.2)

Regression: we fix each node $u\in\{1,2,3\}$ as a response, so that edges $u\rightarrow v$, for $v\in V\setminus\{1,2,3\}$, are not allowed;

(a.3)

Block: nodes are partitioned into two sets A and B of equal size, and only edges within each set or from set A to set B are allowed.

Under each scenario defined by (a), $N=40$ DAGs are generated by fixing a probability of edge inclusion $\pi=0.1$ for all edges which are allowed in the corresponding class of DAG structures (a). For each given DAG $\mathcal{D}$, we generate the parameters of the underlying Gaussian DAG model in (3) by fixing $\varvec{D}=\varvec{I}_q$, while drawing the nonzero elements of $\varvec{L}$ uniformly in the interval $[-1,-0.1]\cup[0.1,1]$.
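The DAG and parameter generation just described can be sketched as follows (a Python illustration, not the authors' R code; acyclicity is enforced through a fixed topological order of the nodes, and the function name is ours).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_dag_params(q=20, pi=0.1, allowed=None):
    """Draw a random DAG with edge-inclusion probability pi, together with
    Gaussian DAG-model parameters: D = I_q and nonzero elements of L drawn
    uniformly from [-1, -0.1] U [0.1, 1].

    `allowed` is an optional 0/1 matrix restricting which edges may appear,
    as in the Regression and Block scenarios.
    """
    upper = np.triu(np.ones((q, q), dtype=int), k=1)  # acyclic by node order
    mask = upper if allowed is None else upper * allowed
    A = np.zeros((q, q), dtype=int)
    A[mask == 1] = rng.random(int(mask.sum())) < pi   # keep each edge w.p. pi
    D = np.eye(q)
    L = np.eye(q)                                     # unit diagonal
    signs = rng.choice([-1.0, 1.0], size=int(A.sum()))
    L[A == 1] = signs * rng.uniform(0.1, 1.0, size=int(A.sum()))
    return A, D, L
```

Drawing a sign and a magnitude separately is one simple way to sample uniformly from the symmetric two-interval set $[-1,-0.1]\cup[0.1,1]$.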
Latent observations collected in the $(n,q)$ data matrix $\varvec{Z}$ are then generated according to (3) for different sample sizes $n\in\{100,200,500,1000,2000\}$ (c). Starting from the latent data matrix $\varvec{Z}$, observed data are finally generated as in (5) for different choices of the marginal c.d.f.'s $F_1,\ldots,F_q$ (b). Specifically, we consider the following assumptions:

(b.1)

Binary: $X_j\sim\text{Ber}(\eta_j)$, $j=1,\ldots,q$, with $\eta_j$ randomly drawn in [0.2, 0.8];

(b.2)

Ordinal: $X_j\sim\text{Binomial}(\theta_j,5)$, $j=1,\ldots,q$, with $\theta_j$ randomly drawn in [0.2, 0.8];

(b.3)

Count: $X_j\sim\text{Poisson}(\lambda_j)$, $j=1,\ldots,q$, with $\lambda_j$ randomly drawn in [1, 10];

(b.4)

Mixed: $X_1,\ldots,X_7$ as in Binary; $X_8,\ldots,X_{15}$ as in Ordinal; $X_{16},\ldots,X_{20}$ as in Count.

Each combination of (a), (b) and (c) defines a simulation scenario, which consists of $N=40$ (true) DAGs and corresponding datasets.
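The two-stage generation of observed data — a latent Gaussian sample followed by the marginal quantile transform in (5) — can be sketched as follows. This is an illustrative Python version under the assumed convention $\varvec{\Sigma}=\varvec{L}^{-\top}\varvec{D}\varvec{L}^{-1}$; for self-containedness only Binomial-type margins are shown (Bernoulli is Binomial with one trial), and all function names are ours.

```python
import numpy as np
from math import erf, comb

rng = np.random.default_rng(0)
_phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / 2**0.5)))  # standard normal cdf

def binom_ppf(u, n, p):
    """Quantile function of Binomial(n, p), applied to an array of uniforms:
    F^{-1}(u) = min{k : F(k) >= u}."""
    pmf = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
    return np.searchsorted(np.cumsum(pmf), u, side="left").clip(0, n)

def simulate_mixed_data(L, D, n, margins):
    """Latent rows z ~ N(0, Sigma), Sigma = L^{-T} D L^{-1}; observed entries
    x_ij = F_j^{-1}(Phi(z_ij / sd_j)) via the Gaussian copula transform."""
    Linv = np.linalg.inv(L)
    Sigma = Linv.T @ D @ Linv
    Z = rng.multivariate_normal(np.zeros(L.shape[0]), Sigma, size=n)
    U = _phi(Z / np.sqrt(np.diag(Sigma)))      # uniform latent scores
    X = np.column_stack([F(U[:, j]) for j, F in enumerate(margins)])
    return Z, X

# Example: 3 variables with Binary, Binary and Ordinal margins
q = 3
L = np.eye(q); L[0, 1] = -0.6; L[1, 2] = 0.8   # two edges in the latent SEM
margins = [lambda u: binom_ppf(u, 1, 0.4),
           lambda u: binom_ppf(u, 1, 0.7),
           lambda u: binom_ppf(u, 5, 0.5)]
Z, X = simulate_mixed_data(L, np.eye(q), n=500, margins=margins)
```

Standardizing each latent column before applying $\Phi$ makes the scores uniform regardless of the marginal variances implied by $\varvec{L}$, so only the correlation structure of $\varvec{Z}$ propagates to the observed data.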

5.2. Results

We run our MCMC scheme to approximate the posterior distribution of DAG structures and parameters. In particular, under each scenario among Free, Regression and Block, we limit the DAG space $\mathcal{S}_q^C$ to those DAGs satisfying the constraints imposed by the corresponding class of DAG structures; see also Sect. 3. The number of iterations was fixed at $S=15{,}000$ after some pilot simulations that were used to assess the convergence of the MCMC chain. We also set $\varvec{U}=g\varvec{I}_q$ with $g=1/n$ and $a=q$ in the DAG-Wishart prior (11), while $c=1$, $d=5$ in (10), which corresponds to a $\text{Beta}(1,5)$ prior on the probability of edge inclusion $\pi$ and implies $\mathbb{E}(\pi)=0.1\bar{6}$. While this specific choice of $c$ and $d$ can favor sparsity in the graphs, further analyses, not reported for brevity, showed that results are quite insensitive to the values of these two hyperparameters.

We now evaluate the performance of our method in recovering the true DAG structure. To this end, we first estimate the posterior probabilities of edge inclusion in (19) for each pair of distinct nodes $(u,v)$. Given a threshold for edge inclusion $k\in[0,1]$, we construct a graph estimate by including those edges $(u,v)$ such that $\widehat{p}_{u\rightarrow v}\ge k$. We compare the resulting graph with the true DAG by computing the sensitivity (SEN) and specificity (SPE) indexes, defined as

$$\text{SEN}=\frac{\text{TP}}{\text{TP}+\text{FN}},\qquad \text{SPE}=\frac{\text{TN}}{\text{TN}+\text{FP}},$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, computed from the adjacency matrix of the estimated graph. By varying the threshold $k$ within a grid in [0, 1] and computing SEN and SPE for each value of $k$, we obtain a receiver operating characteristic (ROC) curve, where each point corresponds to the values of SEN and $(1-\text{SPE})$ computed for a given threshold $k$. Moreover, under each scenario, an average (w.r.t. the $N=40$ simulations) curve is constructed as follows. For each threshold $k$, we compute SEN and $(1-\text{SPE})$ under each of the 40 simulated DAGs and take the average of the two indexes; repeating this procedure for each $k$ yields the collection of points joined by the average ROC curve. Results for scenario Mixed (b.4), different sample sizes (c) and types of DAG structures (a) are summarized in Fig. 2. We proceed similarly with the 5th and 95th percentiles to obtain the blue band included in each plot of the same figure. As expected, the performance of the method in recovering the true DAG structure improves as the sample size $n$ increases under all settings Free, Regression and Block.
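The SEN/SPE indexes and the resulting ROC points can be computed from the matrix of posterior edge probabilities as in the following Python sketch (function names are ours; diagonal entries are excluded since self-loops are not allowed).

```python
import numpy as np

def sen_spe(A_true, A_est):
    """Sensitivity and specificity of an estimated adjacency matrix."""
    off = ~np.eye(A_true.shape[0], dtype=bool)     # ignore the diagonal
    t, e = A_true[off].astype(bool), A_est[off].astype(bool)
    TP, TN = (t & e).sum(), (~t & ~e).sum()
    FP, FN = (~t & e).sum(), (t & ~e).sum()
    return TP / (TP + FN), TN / (TN + FP)

def roc_points(A_true, p_edge, grid=np.linspace(0, 1, 21)):
    """ROC points (1 - SPE, SEN) across a grid of inclusion thresholds k."""
    pts = []
    for k in grid:
        sen, spe = sen_spe(A_true, (p_edge >= k).astype(int))
        pts.append((1 - spe, sen))
    return np.array(pts)
```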

As a single graph estimate summarizing the MCMC output, we also consider the median probability (MPM) DAG model $\widehat{\mathcal{D}}$, obtained by fixing the threshold for edge inclusion at $k=0.5$. We compare each DAG estimate $\widehat{\mathcal{D}}$ with the corresponding true DAG through the Structural Hamming Distance (SHD). The latter corresponds to the number of edge insertions, deletions or flips needed to transform the estimated DAG into the true DAG; accordingly, lower values of SHD correspond to better performances. Results for each simulation scenario are summarized in the box-plots of Fig. 3, where each plot reports the distribution (across the $N=40$ simulated datasets) of the ratio between SHD and the maximum number of edges in the graphs (SHD/edges), for each combination of (a) and (b), with increasing sample sizes $n$ (c).
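Under the convention that a reversed edge counts as a single flip, the SHD between two DAG adjacency matrices can be computed per unordered node pair, as in this Python sketch (the function name is ours):

```python
import numpy as np

def shd(A_true, A_est):
    """Structural Hamming distance between two DAG adjacency matrices.

    Each unordered node pair contributes one error when the two graphs
    disagree, so a missing edge, an extra edge, and a reversed edge all
    count as a single operation (insertion, deletion or flip).
    """
    q = A_true.shape[0]
    count = 0
    for u in range(q):
        for v in range(u + 1, q):
            if (A_true[u, v], A_true[v, u]) != (A_est[u, v], A_est[v, u]):
                count += 1
    return count
```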

The behavior observed in Fig. 2 for scenario Mixed is even more apparent here, with SHD decreasing toward zero as $n$ increases under all settings. In addition, DAG learning is more difficult in the Binary scenario, where the collected categorical data provide a limited source of information for estimating dependence relations. Conversely, structure learning is much easier under the Ordinal and Count scenarios. The performance in the case of mixed data is intermediate between the previous scenarios.

Figure 2 Simulations. Receiver operating characteristic (ROC) curves obtained under varying thresholds for the posterior probabilities of edge inclusion, for type of variables Mixed (b.4), sample size $n\in\{100,200,500,1000,2000\}$ (c) and type of DAG structure Free, Regression, Block (a). Dotted lines represent the ROC curve averaged over the 40 simulated DAGs, while the blue area represents the 5th–95th percentile band.

Figure 3 Simulations. Distribution (across $N=40$ simulated datasets) of the relative structural Hamming distance (SHD/edges) between estimated and true DAG for type of variables Binary, Ordinal, Count, Mixed (b), sample size $n\in\{100,200,500,1000,2000\}$ (c) and type of DAG structure Free, Regression, Block (a).

We finally summarize in Tables 1, 2, 3 and 4 the sensitivity (SEN) and specificity (SPE) indexes computed under the median probability DAG model estimate. Each table refers to one scenario among Binary, Ordinal, Count and Mixed, and reports the average (w.r.t. the $N=40$ simulations) percentage value of the two indexes under settings Free, Regression and Block, corresponding to the three different classes of DAG structures. While SPE attains high levels even for small sample sizes, SEN increases significantly as $n$ grows. As a consequence, the improvement in performance observed in Fig. 3 is mainly due to a reduction in the number of "false negative" edges (truly present edges that are missed) as the number of available observations grows. In addition, it appears that DAG identification is somewhat easier under scenarios Regression and Block. Here, structural constraints limiting the DAG space can help identify dependence relations between variables which would otherwise not be uniquely identifiable because of DAG Markov equivalence (Andersson et al., 1997).

Table 1 Simulations. Binary data. Specificity (SPE) and sensitivity (SEN) indexes (averaged over the 40 simulations) computed w.r.t. the median probability DAG estimate, for sample size $n\in\{100,200,500,1000,2000\}$ and under each scenario Free, Regression, Block corresponding to different classes of DAG structures.

Table 2 Simulations. Ordinal data. Specificity (SPE) and sensitivity (SEN) indexes (averaged over the 40 simulations) computed w.r.t. the median probability DAG estimate, for sample size $n\in\{100,200,500,1000,2000\}$ and under each scenario Free, Regression, Block corresponding to different classes of DAG structures.

Table 3 Simulations. Count data. Specificity (SPE) and sensitivity (SEN) indexes (averaged over the 40 simulations) computed w.r.t. the median probability DAG estimate, for sample size $n\in\{100,200,500,1000,2000\}$ and under each scenario Free, Regression, Block corresponding to different classes of DAG structures.

Table 4 Simulations. Mixed data. Specificity (SPE) and sensitivity (SEN) indexes (averaged over the 40 simulations) computed w.r.t. the median probability DAG estimate, for sample size $n\in\{100,200,500,1000,2000\}$ and under each scenario Free, Regression, Block corresponding to different classes of DAG structures.

5.3. Simulation Experiments with Unbalanced Correlation Structure

In the simulation scenarios considered above, we randomly drew the nonzero elements of the matrix $\varvec{L}$, corresponding to regression coefficients in the latent linear SEM, uniformly in the symmetric interval $[-1,-0.1]\cup[0.1,1]$. This implies an expected "balanced" correlation structure between variables, meaning that both positive and negative associations will be present in the generating model, in the same expected proportion. However, in many psychological applications variables are mostly positively correlated with each other; see also our results in the data analyses of Sect. 6. An important example is represented by cognitive test scores, whose pattern of positive correlations was already advocated by Spearman (1904) in his positive manifold theory. To assess the ability of our method to capture such an "unbalanced" correlation structure, we now draw the nonzero elements of $\varvec{L}$ uniformly in the interval [0.1, 1] and implement the same scenarios introduced in Sect. 5.1. Accordingly, all variables are now positively correlated at the latent level. Results, in terms of relative SHD between true and estimated DAG, are reported in Fig. 4, which summarizes the distribution of SHD across simulated datasets under all scenarios defined in Sect. 5.1. The performance of our method is in line with what we observed in Fig. 3 under the balanced setting, suggesting that our model specification is insensitive to the specific proportion of positive/negative correlations.

Figure 4 Simulations with unbalanced correlation structure. Distribution (across $N=40$ simulated datasets) of the relative structural Hamming distance (SHD/edges) between estimated and true DAG for type of variables Binary, Ordinal, Count, Mixed (b), sample size $n\in\{100,200,500,1000,2000\}$ (c) and type of DAG structure Free, Regression, Block (a).

6. Real Data Analyses

6.1. Well-Being Data

The Your Work-Life Balance project is promoted by the United Nations (UN) and implemented through a public survey available at http://www.authentic-happiness.com/your-life-satisfaction-score. The scope of the survey is to evaluate how people thrive in both their professional and personal lives, based on several indicators related to life satisfaction. Variables considered in the study are classified into five dimensions:

  • Healthy body, features reflecting fitness and healthy habits;

  • Healthy mind, indicating how well subjects embrace positive emotions;

  • Expertise, measuring the ability to grow expertise and achieve something unique;

  • Connection, assessing the strength of social relationships;

  • Meaning, evaluating compassion, generosity and happiness.

The survey supports the UN Sustainable Development Goals (https://sdgs.un.org) and aims at providing insights on the determinants of human well-being. Accordingly, some questions of interest are the following:

  • “What are the strongest correlations between the various dimensions?”

  • “What are the best predictors of a balanced life?”

The complete dataset is publicly available at https://www.kaggle.com/datasets/ydalat/lifestyle-and-wellbeing-data. It includes observations collected across years 2015–2020 on 20 ordinal variables (with levels ranging in 1–5 or 1–10), each measuring the closeness of a subject with respect to one perceived dimension, besides gender (binary) and age (ordinal with four classes). We include in our analysis the $n = 459$ observations available for year 2020. We consider the variable stress as the response and accordingly allow edges from each of the remaining variables to the response only. In addition, age and gender are considered as objective features and accordingly cannot have incoming edges. We do not impose any further constraints, in terms of edge directions known in advance, among the remaining variables.

Following the scope of the original project, we are interested in understanding how the various dimensions relate to each other and what the direct/indirect determinants of the perceived level of stress are. To this end, we run our MCMC algorithm for $S = 20{,}000$ iterations, after an initial burn-in period of 5000 runs that are discarded from the posterior analysis. Diagnostic tools based on multiple chains and graphical inspection of the behavior of sampled parameters across iterations were also adopted to assess the convergence of the MCMC; see the Appendix for details.

We use the MCMC output to provide an estimate of the posterior probabilities of edge inclusion, as well as a BMA estimate of the correlation matrix between (latent) variables. Results are reported in the two heat maps of Fig. 5. The upper map, which collects the estimated posterior probabilities of edge inclusion, clearly suggests a sparse structure in the underlying network, with only a few edges whose posterior probability exceeds 0.5. This also emerges from the graph estimate reported in Fig. 6, which corresponds to the CPDAG representing the equivalence class of the MPM DAG estimate.
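The two summaries above can be recovered mechanically from the MCMC sample of DAGs. As a minimal sketch (Python for illustration, under the simplifying assumption that each MCMC draw is stored as a 0/1 adjacency matrix; the actual implementation is the R code linked in the abstract), the posterior probability of edge inclusion is the fraction of sampled DAGs containing each directed edge, and the median probability model (MPM) retains the edges whose probability exceeds 0.5:

```python
def edge_inclusion_probs(dag_samples):
    """dag_samples: list of S sampled 0/1 adjacency matrices (q x q).
    Returns the q x q matrix of estimated posterior probabilities of edge
    inclusion, i.e., the fraction of sampled DAGs containing u -> v."""
    S = len(dag_samples)
    q = len(dag_samples[0])
    return [[sum(dag[u][v] for dag in dag_samples) / S for v in range(q)]
            for u in range(q)]

def median_probability_dag(dag_samples, threshold=0.5):
    """MPM estimate: retain the edges whose inclusion probability exceeds 0.5."""
    probs = edge_inclusion_probs(dag_samples)
    return [[1 if p > threshold else 0 for p in row] for row in probs]

# toy MCMC output: three sampled DAGs on q = 3 nodes
samples = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
]
print(edge_inclusion_probs(samples))    # 0->1 has probability 1.0, 1->2 has 2/3
print(median_probability_dag(samples))  # keeps 0->1 and 1->2
```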

Figure 5 Well-being data. Upper panel: Heat map with estimated posterior probabilities of edge inclusion $\widehat{p}(u\rightarrow v\,\vert\,\varvec{X})$ for each edge $(u,v)$. Lower panel: Estimated correlation matrix.

Figure 6 Well-being data. Estimated CPDAG.

The estimated graph reveals that two variables directly affect the perceived level of stress, namely shouting (“How often do you shout or sulk at somebody?”) and to do list (“How well do you complete your weekly to-do list?”). Moreover, the variable stress is conditionally independent of flow (“How many hours do you experience flow, i.e., feel fully immersed in performing an activity?”) given to do list. Also, flow and to do list are positively correlated, which suggests that people performing better in their activities also follow through with more of their weekly goals. This in turn has a direct impact on the perceived level of stress. It also appears that shouting is positively correlated with the level of stress, while there is a negative correlation with to do list; accordingly, better results in completing to-do lists imply a reduction in the stress level perceived by individuals.

6.2. Student Mental Health Data

We consider a dataset from a cross-sectional study conducted on 886 medical students from the University of Lausanne (Switzerland) and presented by Carrard et al. (2022). The data collection was based on a longitudinal open cohort study survey. In particular, all medical students enrolled in the university were eligible for participation and received a link to an online questionnaire; we refer to the original paper for details about the survey design and the collected data. The target of the study was to provide insights into students' well-being, in order to implement policies aimed at improving their academic-life satisfaction and conditions. A number of dimensions related to empathy are measured through self-reported questionnaires based on the Questionnaire of Cognitive and Affective Empathy (QCAE) and the Jefferson Scale of Physician Empathy (JSPE); related variables are the QCAE affective empathy score (qcae aff), the QCAE cognitive empathy score (qcae cog) and the JSPE total empathy score (jspe). Burnout is a state of emotional, physical, and mental exhaustion caused by excessive exposure to stress. The burnout dimension is measured through the Maslach Burnout Inventory-Student Survey (MBI-SS); the latter is based on 15 items and provides three scores evaluating the following dimensions: emotional exhaustion (mbi ex), cynicism (mbi cy), and academic efficacy (mbi ea). In addition, students' anxiety and depression are measured through the Center for Epidemiologic Studies Depression (CESD) score and the State-Trait Anxiety Inventory (STAI) score, both based on a questionnaire with self-report items on Likert scales (cesd and stai, respectively).
Finally, the dataset contains information on demographic factors such as age and gender, besides variables measuring job satisfaction (job), partnership status (part) and self-reported health status (health), represented by an ordinal variable with 5 categories corresponding to increasing levels of perceived health satisfaction. We again refer to Carrard et al. (2022) for a detailed description of the complete dataset, which is publicly available at https://zenodo.org/record/5702895. We emphasize that the structure of the analyzed dataset is quite heterogeneous, as it simultaneously collects binary, ordinal (with different ranges of levels) and continuous measurements.

One specific aim of the original study is to identify how variables included in the survey, in particular depressive symptoms, anxiety, and burnout, are related to empathy and mental health. In our analysis, we consider age, sex, part and job as exogenous variables, while we regard health as a response of interest.

Our MCMC algorithm is run for S = 20,000 iterations, after a burn-in period of 5000 runs that we use to assess the convergence of the chain. We use the MCMC output to provide an estimate of the (marginal) posterior probability of inclusion (PPI) for each possible directed edge in the DAG space; see Eq. (19). The resulting heat map with the collection of estimated PPIs (Fig. 7, upper plot) suggests the existence of a few strong dependence relations among variables, corresponding to directed links and paths in the graphs visited by the MCMC chain; this also appears from the MPM CPDAG estimate reported in Fig. 8, which contains a moderate number of edges. Furthermore, to investigate how variables correlate with each other, we provide a BMA estimate of the correlation matrix between (latent) variables (Fig. 7, lower plot).

Notably, most variables belonging to the burnout dimension, as well as to the anxiety/depression sphere, are positively correlated, with the exception of mbi ea, namely the score measuring academic efficacy, which as expected is negatively associated with the other variables, and in particular with the perceived level of anxiety (stai). mbi ea is also positively influenced by the variable study (hours of study per week), which in turn has a positive, although less marked, effect on students' anxiety. Importantly, however, students' depression, here quantified by the cesd score, has a negative effect on health, as appears from the graph estimate of Fig. 8 and the correlation matrix in Fig. 7. Also, age has a positive impact on the variable jspe, which summarizes the empathy dimension, implying that medical students develop throughout their academic life a growing ability to understand and share feelings experienced by others. Finally, an interesting set of structural dependencies is represented by the directed path study $\rightarrow$ stai $\rightarrow$ cesd $\rightarrow$ health. The latter structure suggests that students more involved in studying activities may incur higher levels of anxiety and depression, which in turn affect personal health conditions.

Figure 7 Student mental health data. Upper panel: heat map with estimated posterior probabilities of edge inclusion $\widehat{p}(u\rightarrow v\,\vert\,\varvec{X})$ for each edge $(u,v)$. Lower panel: estimated correlation matrix

Figure 8 Student mental health data. Estimated CPDAG.

7. Discussion

We proposed a Bayesian semi-parametric methodology for structure learning of directed networks which applies to mixed data, i.e., data that include categorical, discrete and continuous measurements. Our model formulation assumes that the available observations are generated by latent variables whose joint distribution is multivariate Gaussian and satisfies the independence constraints imposed by a directed acyclic graph (DAG). Following a copula-based approach, we model separately the dependence parameter and the marginal distributions of the observed variables, the latter being estimated nonparametrically. The former corresponds to the covariance matrix, Markov w.r.t. an unknown DAG, for which a DAG-Wishart prior is assumed. Importantly, we fully account for model uncertainty by assigning a prior distribution to DAG structures. In addition, constraints on the network structure that are known beforehand can be easily incorporated in our model. The resulting framework allows for posterior inference on DAGs and DAG-parameters, which is carried out by implementing an MCMC algorithm. Finally, we investigated the performance of our methodology through simulation studies. Results, included in the Appendix, show that our method, when adopted to provide a single network estimate, is highly competitive with the frequentist benchmark Copula PC. In addition, being Bayesian, our method fully quantifies uncertainty around network structures as well as dependence parameters.

The theory of latent traits assumes that observations collected on subjects are associated with a (possibly small) number of latent individual characteristics (Hambleton and Cook, 1977). A latent trait model then establishes a mathematical relationship between these unobservable features and the observed data, which represent manifestations of the traits. In this context, Moustaki and Knott (2000) generalized the classical latent trait model, originally introduced for categorical manifest variables, to mixed data and specifically to variables whose distribution belongs to the exponential family. For each manifest variable they specify a suitable generalized linear model, where covariates correspond to a set of common latent traits following independent normal distributions. Differently, our graphical modeling framework employs latent variables to project observable mixed-type features into a latent space (of the same dimension) which, once equipped with a network model, incorporates a dependence structure between latent variables. We conjecture that our method could be extended to a latent trait framework for mixed data such as the one considered by Moustaki and Knott (2000). A related model formulation would establish a link between each of the q manifest variables and $K<q$ latent factors whose joint Gaussian distribution satisfies the conditional independencies embedded in a DAG. The method could then identify a set of dependence relations between latent traits represented through a directed network.

Our model formulation assumes a common DAG structure, with allied dependence parameter, for all the available observations. In some settings, however, a clustering structure may be present in the sample, with subjects divided into groups that are either defined beforehand or unknown and therefore to be learned from the data. In the former case, multiple datasets could be analyzed jointly by using a multiple graphical model approach (Peterson et al., 2015), which adopts a Markov random field prior that encourages common edges among group-specific graphs, and a spike-and-slab prior controlling network relatedness parameters. In the second case, one could instead set up a mixture model, where each mixture component corresponds to a possibly different network with allied parameters; as output, a clustering structure of the subjects would also be available; see, for instance, Ickstadt et al. (2011) and Lee et al. (2022) for mixtures of graphical models in the Gaussian and ordinal frameworks, respectively. Following a Bayesian nonparametric approach, we could consider an infinite mixture model where each latent group is characterized by a component-specific parameter. A Dirichlet process prior on the space of DAGs and DAG-parameters could then be assumed; see in particular Rodríguez et al. (2011) and Castelletti and Consonni (2023) for, respectively, undirected and directed Gaussian graphical models. Extensions of the proposed copula model to multiple DAGs and mixtures of DAGs are possible and currently under investigation.

Acknowledgements

We thank the Editor and three reviewers for constructive comments that helped improve the paper. We gratefully acknowledge valuable suggestions from Maarten Marsman (University of Amsterdam) during the revision of our manuscript. Work carried out within MUR-PRIN grant 2022 SMNNKY - CUP J53D23003870008, funded by the European Union - Next Generation EU. The views and opinions expressed are only those of the authors and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them. Partial support from UCSC (D1 and 2019-D.3.2 research grants) is also acknowledged.

Appendix

Comparison with Copula PC

In this section, we compare our methodology with the benchmark Copula PC method of Cui et al. (2016); see also Cui et al. (2018). Copula PC is a two-step approach which can be applied to mixed data comprising categorical (binary and ordinal), discrete and continuous variables. It first estimates a correlation matrix in the space of latent variables (each associated with one of the observed variables), which is then used to test conditional independencies as in the standard PC algorithm. For the first step, the same Gibbs sampling scheme introduced by Hoff (2007), based on data augmentation with latent observations, is adopted. Moreover, conditional independence tests are implemented at significance level $\alpha$, which we vary in $\{0.01, 0.05, 0.10\}$; lower values of $\alpha$ imply a higher expected level of sparsity in the estimated graph. We refer to the three benchmarks as Copula PC 0.01, 0.05 and 0.10, respectively. The output of Copula PC is a completed partially directed acyclic graph (CPDAG) representing the estimated equivalence class. With regard to our method, as a single graph estimate summarizing the MCMC output we consider the CPDAG representing the equivalence class of the estimated median probability DAG model. Each model estimate is finally compared with the true CPDAG by means of the SHD between the two graphs.
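The SHD between estimated and true CPDAG can be sketched as follows. This is a minimal Python illustration; conventions for the SHD differ slightly across the literature, and the one below, where a skeleton mismatch (added/removed edge) and an orientation mismatch each count one unit, is an assumption:

```python
def shd(a_true, a_est):
    """Structural Hamming distance between two (CP)DAG adjacency matrices.

    One convention among several: a missing/extra edge in the skeleton counts
    one unit, and an edge present in both skeletons but with a different
    orientation pattern (e.g., a flip, or directed vs. undirected) also
    counts one unit."""
    q = len(a_true)
    dist = 0
    for u in range(q):
        for v in range(u + 1, q):  # visit each unordered pair once
            t = (a_true[u][v], a_true[v][u])
            e = (a_est[u][v], a_est[v][u])
            in_true, in_est = t != (0, 0), e != (0, 0)
            if in_true != in_est:
                dist += 1          # edge added or removed
            elif in_true and t != e:
                dist += 1          # orientation differs
    return dist

true_dag = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2
est_dag  = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]  # 0 -> 1, 2 -> 1 (one flip)
print(shd(true_dag, est_dag))  # -> 1
```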

Results for Scenario Free, type of variables Binary, Ordinal, Count, Mixed, and each sample size $n\in\{100,200,500,1000,2000\}$ are summarized in Fig. 9, which reports the distribution of the SHD across $N=40$ simulations. It first appears that all methods improve their performance as the sample size n increases. In addition, structure learning is more difficult in the Binary case, while generally easier for Ordinal and Count data. Moreover, Copula PC 0.01 (light gray) performs better than Copula PC 0.05 and 0.10 (middle and dark gray, respectively). Our method clearly outperforms the three benchmarks in the Binary scenario, a behavior which is more evident for large sample sizes. In addition, it performs better than Copula PC 0.05 and 0.10 most of the time under the remaining settings and remains highly competitive with Copula PC 0.01, with an overall better performance in terms of average SHD under almost all sample sizes for Scenarios Ordinal and Count.

Figure 9 Simulations. Distribution (across $N=40$ simulated datasets) of the structural Hamming distance (SHD) between estimated and true CPDAG for type of variables Binary, Ordinal, Count, Mixed (b), sample size $n\in\{100,200,500,1000,2000\}$ (c) and type of DAG structure Free. Methods under comparison are: our Bayesian Copula DAG model (light blue) and the Copula PC method with independence tests implemented at significance level $\alpha\in\{0.01, 0.05, 0.10\}$ (from light to dark gray).

Comparison with Bayesian Parametric Strategy

Our methodology is based on a semi-parametric strategy which models separately the dependence parameter, corresponding to a DAG-dependent covariance matrix, and the marginal distributions of the observed variables, which are estimated using a rank-based nonparametric approach.

Alternatively, one can adopt suitable parametric families for modeling the various mixed types of variables, as in a generalized linear model (glm) framework. To implement this parametric strategy, we generalize the latent Gaussian DAG-model in (1) to accommodate a nonzero marginal mean for the latent variables. Specifically, we assume

(21) $Z_1,\ldots,Z_q \,\vert\, \varvec{\mu},\varvec{\Omega},\mathcal{D} \sim \mathcal{N}_q(\varvec{\mu},\varvec{\Omega}^{-1}),$

with $\varvec{\mu}\in\mathbb{R}^{q}$ and $\varvec{\Omega}\in\mathcal{P}_{\mathcal{D}}$, the space of all s.p.d. precision matrices Markov w.r.t. DAG $\mathcal{D}$. The allied structural equation model (SEM) representation of such a model is given by $\varvec{\eta}+\varvec{L}^\top\varvec{z} = \varvec{\varepsilon}$, $\varvec{\varepsilon}\sim\mathcal{N}_q(\varvec{0},\varvec{D})$, or equivalently, in terms of node-distributions

(22) $Z_j = \eta_j - \varvec{L}_{\prec j\,]}^{\top}\varvec{z}_{\text{pa}_{\mathcal{D}}(j)} + \varepsilon_j, \quad \varepsilon_j \overset{\text{ind}}{\sim} \mathcal{N}(0, \varvec{D}_{jj}),$

for each $j=1,\ldots,q$, with $\varvec{D}_{jj} = \varvec{\Sigma}_{j\,\vert\,\text{pa}_{\mathcal{D}}(j)}$, $\varvec{L}_{\prec j\,]} = -\varvec{\Sigma}_{\prec j \succ}^{-1}\varvec{\Sigma}_{\prec j\,]}$, $\eta_j = \mu_j + \varvec{L}_{\prec j\,]}^{\top}\varvec{\mu}_{\text{pa}_{\mathcal{D}}(j)}$. Importantly, each equation in (22) now resembles the structure of a linear regression model with a nonzero intercept term $\eta_j$. A Normal-DAG-Wishart prior can then be assigned to $(\varvec{\eta},\varvec{D},\varvec{L})$; see Castelletti and Consonni (2023, Supplement, Section 1) for full details. Under such a prior, the posterior distribution of $(\varvec{\eta},\varvec{D},\varvec{L})$ given independent (latent) Gaussian data $\varvec{Z}$ is still Normal-DAG-Wishart, and the marginal data distribution is available in closed-form expression. Therefore, we can adapt the MCMC scheme of Sect. 4 to this more general framework, with the update of $(\varvec{D},\varvec{L},\mathcal{D})$ in Sect. 4.1 replaced by $(\varvec{\eta},\varvec{D},\varvec{L},\mathcal{D})$.
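The mapping from $(\varvec{\mu},\varvec{\Sigma})$ to the node-wise SEM parameters $(\eta_j, \varvec{L}_{\prec j\,]}, \varvec{D}_{jj})$ in (22) can be verified numerically. The following is an illustrative Python sketch (NumPy assumed; the function name is hypothetical) that applies the three formulas above to a toy two-node example:

```python
import numpy as np

def sem_params(mu, Sigma, parents, j):
    """Node-wise SEM parameters for node j given its DAG parents, as in (22):

    D_jj    = Sigma_{j | pa(j)}                 (conditional variance)
    L_{<j]} = -Sigma_{pa,pa}^{-1} Sigma_{pa,j}  (negative regression coefficients)
    eta_j   = mu_j + L_{<j]}' mu_{pa(j)}        (intercept)
    """
    mu, Sigma = np.asarray(mu, float), np.asarray(Sigma, float)
    pa = list(parents)
    if not pa:  # root node: marginal mean and variance
        return mu[j], np.array([]), Sigma[j, j]
    S_pp = Sigma[np.ix_(pa, pa)]
    S_pj = Sigma[np.ix_(pa, [j])].ravel()
    L_j = -np.linalg.solve(S_pp, S_pj)                     # -Sigma_pp^{-1} Sigma_pj
    D_jj = Sigma[j, j] - S_pj @ np.linalg.solve(S_pp, S_pj)
    eta_j = mu[j] + L_j @ mu[pa]
    return eta_j, L_j, D_jj

# toy check: Z0 ~ N(1, 1) and Z1 = 2 + 0.5*Z0 + eps with Var(eps) = 1,
# so E[Z1] = 2.5, Cov(Z0, Z1) = 0.5, Var(Z1) = 1.25
mu = [1.0, 2.5]
Sigma = [[1.0, 0.5], [0.5, 1.25]]
eta, L, D = sem_params(mu, Sigma, parents=[0], j=1)
print(eta, L, D)  # -> 2.0 [-0.5] 1.0  (Z1 = 2.0 - (-0.5)*Z0 + eps)
```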

Consider now the observed variables $X_1,\ldots,X_q$, where each $X_j\sim F_j(\cdot)$, a suitably specified parametric family for $X_j$, e.g., Bernoulli, Poisson, or Binomial; see also Sect. 4.1. As in a glm framework, we assume that

(23) $\mathbb{E}(X_j\,\vert\,\varvec{z}_{\text{pa}_{\mathcal{D}}(j)}) = h^{-1}(\eta_j - \varvec{L}_{\prec j\,]}^{\top}\varvec{z}_{\text{pa}_{\mathcal{D}}(j)})$

where $h^{-1}(\cdot)$ is a suitable inverse-link function, so that $\eta_j - \boldsymbol{L}_{\prec j\,]}^\top \boldsymbol{z}_{\text{pa}_{\mathcal{D}}(j)}$ plays the role of the linear predictor in the GLM for $X_j$. Specifically, we take $h(\cdot)=\text{logit}(\cdot)$ and $h(\cdot)=\log(\cdot)$ for $X_j \sim \text{Bern}(\pi_j)$ and $X_j \sim \text{Pois}(\lambda_j)$, respectively. Moreover, for $X_j \sim \text{Bin}(n_j,\pi_j)$ we take $h(\pi_j)=\text{logit}(\pi_j)$ and fix $n_j=\max\{x_{i,j},\, i=1,\ldots,n\}$. From (5) we then have
$$Z_j = \Phi^{-1}\left\{ F_j(X_j \,\vert\, \boldsymbol{z}_{\text{pa}_{\mathcal{D}}(j)})\right\},$$
with $\Phi(\cdot)$ the standard normal c.d.f. and $F_j$ implicitly depending on the DAG parameters $(\eta_j, \boldsymbol{L}_{\prec j\,]})$ through (23). The update of $\boldsymbol{Z}$ in Sect. 4.2 conditionally on the DAG parameters is then replaced by computing $z_{i,j}=\Phi^{-1}\left\{ F_j(x_{i,j} \,\vert\, \boldsymbol{z}_{\text{pa}_{\mathcal{D}}(j)})\right\}$ iteratively for each $i=1,\ldots,n$ and $j=1,\ldots,q$.
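For concreteness, the deterministic latent update above can be sketched for a count variable with Poisson margin and log link. This is a minimal illustration, not the paper's R implementation: the function name `latent_update` and its arguments are hypothetical, and the clipping of the CDF away from 0 and 1 is a numerical safeguard added here to keep $\Phi^{-1}$ finite.

```python
import numpy as np
from scipy.stats import norm, poisson

def latent_update(x, z_pa, eta, L):
    """Map observed counts x (shape (n,)) to latent Gaussian scores
    z_{i,j} = Phi^{-1}{ F_j(x_{i,j} | z_pa_i) } under a Poisson margin
    with log link: lambda_i = exp(eta - L' z_pa_i)."""
    lam = np.exp(eta - z_pa @ L)        # inverse log link applied to the linear predictor
    u = poisson.cdf(x, lam)             # F_j(x_{i,j} | parents)
    u = np.clip(u, 1e-10, 1 - 1e-10)    # guard: Phi^{-1}(0) and Phi^{-1}(1) are infinite
    return norm.ppf(u)                  # standard normal quantile function
```

With no parents (zero coefficients), larger observed counts map to larger latent scores, as expected from the monotonicity of $F_j$ and $\Phi^{-1}$.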

We consider the same simulation settings as in the Balanced Scenario, with the four different types of variables and with DAG structure class Free; see Sect. 5.1. We compare the performance of the parametric strategy introduced above with that of our original method. Specifically, from the MCMC output provided by each method we first recover a CPDAG estimate and then compare true and estimated graphs in terms of the structural Hamming distance (SHD); see Sect. 5.2 for details.
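Under one common convention, the SHD between two (CP)DAGs is the number of node pairs whose edge differs in presence or orientation. The encoding and function name below are illustrative, not the paper's implementation:

```python
import numpy as np

def shd(A, B):
    """Structural Hamming distance between two graph adjacency matrices.
    A[u, v] = 1 encodes the directed edge u -> v; an undirected edge is
    encoded by A[u, v] = A[v, u] = 1. Each node pair whose edge differs
    in presence or orientation contributes one unit of distance."""
    q = A.shape[0]
    dist = 0
    for u in range(q):
        for v in range(u + 1, q):
            if (A[u, v], A[v, u]) != (B[u, v], B[v, u]):
                dist += 1
    return dist
```

For instance, a reversed edge counts as a single difference under this convention, as does a missing or extra edge.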

Results are summarized in the box plots of Fig. 10, which represent the distribution of the SHD (across the 40 independent simulations) obtained from our original method (light blue) and its parametric version (dark blue) under the various scenarios. The parametric version of our method outperforms the original semi-parametric model in the Binary Scenario, while it is clearly outperformed under all the other scenarios for small-to-moderate sample sizes; however, the two approaches tend to perform similarly as the sample size n increases.

Figure 10 Simulations. Distribution (across $N=40$ simulated datasets) of the structural Hamming distance (SHD) between estimated and true CPDAG for type of variables Binary, Ordinal, Count, Mixed (b), sample size $n\in\{100,200,500,1000,2000\}$ (c) and type of DAG structure Free. Methods under comparison are: our original semi-parametric Bayesian Copula DAG model (light blue) and its modified version based on parametric assumptions (dark blue).

MCMC Diagnostics of Convergence and Computational Time

Our methodology relies on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of the parameters. Accordingly, convergence diagnostics for the resulting MCMC output should be performed before any posterior analysis. In the following, we report a few results relating to the application of our method to the well-being data presented in Sect. 6.1.

As a first diagnostic tool, we monitor the behavior of the estimated posterior expectation of each correlation coefficient across iterations. Each quantity is computed at MCMC iteration $s$ from the sampled values collected up to step $s$, for $s=1,\ldots,25{,}000$. Based on the results, reported for selected pairs of variables $(X_u, X_v)$ in Fig. 11, we discard the initial $B=5000$ draws as a burn-in period. Each trace plot suggests an appreciable degree of convergence of the running estimate to the posterior mean.
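The running-mean diagnostic described above amounts to a cumulative average of the MCMC draws; a minimal sketch (with a hypothetical function name) is:

```python
import numpy as np

def running_means(draws):
    """Cumulative mean of a sequence of MCMC draws: entry s is the
    posterior-mean estimate based on the first s+1 sampled values,
    so plotting the result against s gives the trace in Fig. 11."""
    draws = np.asarray(draws, dtype=float)
    return np.cumsum(draws) / np.arange(1, draws.size + 1)
```

Stabilization of this curve around a constant value is the visual signature of convergence looked for in the trace plots.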

Figure 11 Well-being data. Trace plots of the posterior mean of four correlation coefficients (for randomly selected variables $X_u, X_v$) estimated from the MCMC output up to iteration $s$, for $s=1,\ldots,25{,}000$.

As a further diagnostic, we run two independent MCMC chains of length $S=25{,}000$, again discarding a burn-in period of $B=5000$ draws, each initialized at a randomly chosen DAG. The estimated posterior probabilities of edge inclusion computed from the two chains are reported in the heatmaps of Fig. 12 and show close agreement between the two outputs.
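Posterior probabilities of edge inclusion are obtained as the proportion of post-burn-in draws whose sampled DAG contains each edge; a sketch under an assumed storage format (a stack of 0/1 adjacency matrices, with a hypothetical function name):

```python
import numpy as np

def edge_inclusion_probs(dag_draws, burn_in=0):
    """Estimated posterior probability of each directed edge u -> v,
    computed as the proportion of post-burn-in MCMC draws whose DAG
    contains the edge. dag_draws has shape (S, q, q), each slice a
    0/1 adjacency matrix."""
    kept = np.asarray(dag_draws)[burn_in:]
    return kept.mean(axis=0)
```

Comparing the two matrices returned for independent chains (e.g., by their maximum absolute difference) quantifies the agreement displayed in Fig. 12.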

Figure 12 Well-being data. Estimated correlation matrices obtained under two independent MCMC chains.

Finally, we investigate the computational time of our algorithm as a function of the number of variables $q$ and the sample size $n$. The plots in Fig. 13 summarize the running time per iteration (averaged over 40 replicates), as a function of $q\in\{5,10,20,50,100\}$ for $n=500$, and as a function of $n\in\{50,100,200,500,1000\}$ for $q=20$. Results were obtained on a PC with an Intel(R) Core(TM) i7-8550U 1.80 GHz processor.
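A per-iteration timing of this kind can be measured with a simple wall-clock harness; the sketch below (hypothetical names) averages over repeated calls to a zero-argument function representing one MCMC iteration:

```python
import time

def time_per_iteration(step, n_iter=100):
    """Average wall-clock time in seconds per call of `step`, a
    zero-argument callable performing one MCMC iteration."""
    t0 = time.perf_counter()
    for _ in range(n_iter):
        step()
    return (time.perf_counter() - t0) / n_iter
```

Repeating this for each $(q, n)$ configuration and averaging over simulated datasets yields curves such as those in Fig. 13.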

Figure 13 Computational time (in seconds) per iteration, as a function of the number of variables $q$ for fixed $n=500$ (upper plot) and as a function of the sample size $n$ for fixed $q=20$ (lower plot), averaged over 40 simulated datasets.

