
What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data

Published online by Cambridge University Press:  27 December 2024

Yang Liu*
Affiliation:
University of Maryland
Weimeng Wang
Affiliation:
University of Maryland
*Correspondence should be made to Yang Liu, Department of Human Development and Quantitative Methodology, University of Maryland, 3304R Benjamin Bldg, 3942 Campus Dr, College Park, MD 20742, USA. Email: [email protected]

Abstract

It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores than those conventionally predicted from responses only. For this purpose, a simple-structure factor model is often preferred, as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents’ ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which casts doubt on the suitability of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit after further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores.

Type
Application Reviews and Case Studies
Copyright
Copyright © 2023 The Author(s) under exclusive licence to The Psychometric Society

1. Introduction

Psychometric investigation of cognitive ability and speed has a long and rich history (e.g., Carroll, 1993; Gulliksen, 1950; Luce, 1986; Thorndike et al., 1926). In their 1926 monograph, Thorndike et al. stated that “level”, “extent”, and “speed” are three distinct aspects of any measure of performance: While both “level” and “extent” are manifested by the correctness of answers and thus can be collectively translated to ability in modern terminology, “the speed of producing any given product is defined, of course, by the time required” (Thorndike et al., 1926, p. 26). The prevalence of computerized test administration and data collection in recent years facilitates the acquisition of response-time (RT) data at the level of individual test items. In parallel, we have witnessed a mushrooming development of psychometric models for item responses and RT over the past few decades (see De Boeck & Jeon, 2019; Goldhammer, 2015, for reviews), which in turn gave rise to broader investigations of the relationship between response speed and accuracy in various substantive domains (see Lee & Chen, 2011; Kyllonen & Zu, 2016; von Davier et al., 2019, for reviews).
Empirical findings suggest that response speed not only contributes to proficiency or informs the construct to be measured but also bespeaks secondary test-taking behaviors such as rapid guessing (Deribo et al., 2021; Wise, 2017), using preknowledge (Qian et al., 2016; Sinharay, 2020; Sinharay & Johnson, 2020), and lacking motivation (Finn, 2015; Thurstone, 1937; Wise & Kong, 2005).

Characterizing individual differences in ability and speed with item responses and RT data is in essence a factor analysis problem (Molenaar et al., 2015a, 2015b). The two-factor simple-structure model proposed by van der Linden (2007) is arguably the most popular modeling option so far: Item responses and log-transformed RT variables are treated as two independent clusters of observed indicators for the ability and speed/slowness factors, respectively, and the two latent factors jointly follow a bivariate normal distribution (see Fig. 2 of Molenaar et al., 2015a, for a path-diagram representation). A notable merit of the simple-structure factor model is its plug-and-play nature: Analysts can separately apply standard item response theory (IRT) models for discrete responses (e.g., the one- to four-parameter logistic [1PL–4PL] models; Birnbaum, 1968; Barton & Lord, 1981) and standard factor analysis models for the continuous log-RT variables (e.g., the linear-normal factor model; Jöreskog, 1969), and then simply let the two latent factors covary. Despite its succinctness and popularity, the simple-structure model may fit poorly to empirical data. A widely endorsed interpretation of the lack of fit is that the two inter-dependent latent factors cannot fully explain the dependencies among item-level responses and RT variables.
Based on this rationale, numerous diagnostics for residual dependencies and remedial modifications of the simple-structure model have been proposed in the recent literature (e.g., Bolsinova, De Boeck, & Tijmstra, 2017; Bolsinova & Maris, 2016; Bolsinova & Molenaar, 2018; Bolsinova, Tijmstra, & Molenaar, 2017; Bolsinova & Tijmstra, 2016; Glas & van der Linden, 2010; Meng et al., 2015; Ranger & Ortner, 2012; van der Linden & Glas, 2010).

Augmenting standard IRT models with a measurement component for item-level RT may result in more precise ability scores, which is often highlighted as a practical benefit of RT modeling in educational assessment (Bolsinova & Tijmstra, 2018; van der Linden et al., 2010). Under a simple-structure model with bivariate normal factors, the degree to which item-level RT improves scoring precision is dictated by the strength of the inter-factor correlation (see Study 1 of van der Linden et al., 2010). However, near-zero correlation estimates between ability and speed have sometimes been encountered in real-world applications (e.g., Bolsinova, De Boeck, & Tijmstra, 2017; Bolsinova, Tijmstra, & Molenaar, 2017; Lee & Jia, 2014; van der Linden, Scrams, & Schnipke, 1999). Whenever this happens, analysts are inclined to conclude that item-level RT is not useful for ability estimation at all, or that a less parsimonious factor structure is needed to enhance the utility of RT for scoring purposes (e.g., allowing the log-RT variables to cross-load on the ability factor; Bolsinova & Tijmstra, 2018).

Indeed, van der Linden’s (2007) model could be overly restrictive for analyzing item responses and RT data. We, however, do not want to rush to the conclusion that it is the simple factor structure that should be blamed and abandoned. Other parametric assumptions, such as link functions, linear or curvilinear dependencies, and distributions of latent traits and error terms, are also part of the model specification and may contribute to the misfit as well. A fair evaluation of the tenability and usefulness of a simple factor structure demands a version of the model with minimal parametric assumptions beyond the simple factor structure itself, which we refer to as a semiparametric simple-structure model. Should the semiparametric model still struggle to fit the data adequately, we would no longer hesitate to give up on the simple factor structure.

Fortunately, the major components of a semiparametric simple-structure factor analysis have been readily developed in the existing literature. They are

1. A semiparametric (unidimensional) IRT model for dichotomous and polytomous responses (Abrahamowicz & Ramsay, 1992; Rossi et al., 2002);

2. A semiparametric (unidimensional) factor model for continuous log-RT variables (Liu & Wang, 2022);

3. A nonparametric copula density estimator for ability and speed/slowness with fixed marginals (Kauermann et al., 2013; Dou et al., 2021).

As a side remark, we are aware of alternative semiparametric approaches for each of the above three components: for example, the monotonic polynomial logistic model for item responses (Falk & Cai, 2016a, 2016b), the proportional hazards model (Kang, 2017; Ranger & Kuhn, 2012; Wang et al., 2013b) and the linear transformation model (Wang et al., 2013a) for item-level RT, and the finite normal mixture model (Bauer, 2005; Pek et al., 2009) and the Davidian curve model (Woods & Lin, 2009; Zhang & Davidian, 2001; Zhang et al., 2021) for the joint distribution of latent traits. However, we focus on methods based on smoothing splines in the current analysis. Moreover, the simultaneous incorporation of flexible models for all three components of a simple-structure model appears to be novel in the RT modeling literature. Compared to, e.g., Wang et al. (2013a) and Wang et al. (2013b), in which semiparametric models were applied only to the RT data, our model is more flexible and thus more likely to reveal sophisticated dependency patterns in a joint analysis of item responses and RT data.

By retrospectively analyzing a set of mathematics testing data from the 2015 Programme for International Student Assessment (PISA; OECD, 2016), we revisit the following research questions that have only been partially answered previously through parametric simple-structure models:

(1) Is a simple factor structure sufficient for a joint analysis of item responses and RT?

(2) How strongly are math ability and general processing speed associated in the population of respondents?

(3) To what extent can processing speed improve the precision of ability estimates under a simple-structure model?

It is worth mentioning that the data set was previously analyzed by Zhan et al. (2018) using a variant of van der Linden’s (2007) simple-structure model with testlet effects: A higher-order cognitive diagnostic model with testlet effects was used for item responses, a linear-normal factor model was used for log-transformed RT, and the (higher-order) ability and speed factors were assumed to be bivariate normal. Zhan et al. (2018) reported an estimated inter-factor correlation of $-0.2$ and hence concluded that the association between speed and ability is weak. We are particularly interested in whether their conclusion stands after abandoning inessential parametric assumptions other than the simple factor structure.

The rest of the paper is organized as follows. We first provide a technical introduction to the proposed semiparametric procedure in Sect. 2: The three components of the semiparametric simple-structure model are formulated in Sects. 2.1 and 2.2, penalized maximum likelihood (PML) estimation and empirical selection of penalty weights are outlined in Sect. 2.3, and bootstrap-based goodness-of-fit assessment and inferences are described in Sects. 2.4 and 2.5. Descriptive statistics for the 2015 PISA mathematics data and our analysis plan are summarized in Sect. 3, followed by a detailed report of results in Sect. 4. The paper concludes with a discussion of broader implications of our findings and limitations of our method.

2. Methods

2.1. Unidimensional Semiparametric Factor Models

Let $Y_{ij}\in {\mathcal {Y}}_j\subset {\mathbb {R}}$ be the $j$th manifest variable (MV) observed for respondent $i$: $Y_{ij}$ represents either a discrete response to a test item or a continuous item-level RT. In our semiparametric factor model, the distribution of $Y_{ij}$ is characterized by the following logistic conditional density of $Y_{ij} = y\in {\mathcal {Y}}_j$ given a unidimensional latent variable (LV; also known as latent factor, latent trait, etc.) $X_i = x\in {\mathcal {X}}\subset {\mathbb {R}}$:

(1) $$f_j(y|x) = \frac{\exp \left( g_j(x, y)\right) }{\int _{{\mathcal {Y}}_j}\exp \left( g_j(x, y')\right) \mu _j(\textrm{d}y')},$$

in which the normalizing integral with respect to the dominating measure $\mu _j$ on ${\mathcal {Y}}_j$ is assumed to be finite. Equation 1 defines a valid conditional density as it is non-negative and integrates to unity with respect to $y$ for a given $x$. However, the bivariate function $g_j: {\mathcal {X}}\times {\mathcal {Y}}_j\rightarrow {\mathbb {R}}$ is not identifiable: It is not difficult to see that adding any univariate function of $x$ to $g_j(x, y)$ does not change the value of Eq. 1 (Gu, 1995, 2013). To impose necessary identification constraints, we re-write $g_j$ by the functional analysis of variance (fANOVA) decomposition

(2) $$g_j(x, y) = g_j^y(y) + g_j^{xy}(x, y)$$

and require that

(3) $$g_j^y(y_0) = 0,\ g_j^{xy}(x_0, y)\equiv 0,\ \text {and}\ g_j^{xy}(x, y_0)\equiv 0$$

for some reference levels $x_0\in {\mathcal {X}}$ and $y_0\in {\mathcal {Y}}_j$. Equation 3 is referred to as side conditions; $x_0$ and $y_0$ can be set arbitrarily within the respective domains (see Liu & Wang, 2022, for more detailed comments). The univariate component $g_j^y$ and the bivariate component $g_j^{xy}$ are functional parameters to be estimated from observed data.
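To make Eq. 1 concrete, the sketch below evaluates the conditional density on a grid and normalizes it by quadrature. The bivariate surface `g` is a purely illustrative stand-in for $g_j$, not the estimator developed in this paper:

```python
import numpy as np
from scipy.integrate import trapezoid

def conditional_density(g, x, y_grid):
    """Logistic conditional density of Eq. 1: normalize exp(g(x, y))
    over the support of y by trapezoidal quadrature."""
    log_unnorm = g(x, y_grid)                    # g_j(x, y) on the grid
    log_unnorm = log_unnorm - log_unnorm.max()   # stabilize the exponential
    unnorm = np.exp(log_unnorm)
    return unnorm / trapezoid(unnorm, y_grid)

# Toy smooth surface on [0, 1]^2 (illustrative only)
g = lambda x, y: 3.0 * x * y - (y - 0.5) ** 2
y_grid = np.linspace(0.0, 1.0, 201)
f = conditional_density(g, x=0.7, y_grid=y_grid)

# Non-negative and integrates to unity in y, as noted below Eq. 1
assert np.all(f >= 0.0)
assert abs(trapezoid(f, y_grid) - 1.0) < 1e-10
```

Note that adding any function of $x$ alone to `g` cancels in the ratio, which is exactly the non-identifiability that the side conditions (Eq. 3) remove.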

Let $\varvec{\psi }_j: {\mathcal {Y}}_j\rightarrow {\mathbb {R}}^{L_j}$ be a collection of $L_j$ basis functions defined on the support of $Y_{ij}$, and $\varvec{\varphi }: {\mathcal {X}}\rightarrow {\mathbb {R}}^K$ be a collection of $K$ basis functions defined on the support of $X_i$. We proceed to approximate the functional parameters by basis expansion. In particular, we set the univariate component

(4) $$g_j^y(y) = \varvec{\psi }_j(y)^\top \varvec{\alpha }_j,$$

in which the coefficient vector $\varvec{\alpha }_j\in {\mathbb {R}}^{L_j}$ satisfies

(5) $$\varvec{\psi }_j(y_0)^\top \varvec{\alpha }_j = 0.$$

Similarly, the bivariate component is expressed as

(6) $$g_j^{xy}(x, y) = \varvec{\psi }_j(y)^\top {\textbf{B}}_j\varvec{\varphi }(x),$$

in which the coefficient matrix ${\textbf{B}}_j\in {\mathbb {R}}^{L_j\times K}$ satisfies

(7) $${\textbf{B}}_j\varvec{\varphi }(x_0) = \textbf{0}\ \text {and}\ {\textbf{B}}_j^\top \varvec{\psi }_j(y_0) = \textbf{0}.$$

The linear constraints imposed on the coefficients $\varvec{\alpha }_j$ and ${\textbf{B}}_j$ (Eqs. 5 and 7) guarantee that the side conditions (Eq. 3) are satisfied.
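One simple way to see that the constraints in Eqs. 5 and 7 are easy to enforce numerically is to project unconstrained coefficients onto the constraint set. This is only an illustration of the linear-algebraic structure, not the authors' estimation routine; the vectors `psi0` and `phi0` are arbitrary stand-ins for $\varvec{\psi }_j(y_0)$ and $\varvec{\varphi }(x_0)$:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 6, 6                        # numbers of basis functions for y and x

# Basis values at the reference levels y0 and x0 (arbitrary for illustration)
psi0 = rng.standard_normal(L)      # psi_j(y0)
phi0 = rng.standard_normal(K)      # varphi(x0)

def projector(v):
    """Orthogonal projector onto the complement of span{v}."""
    return np.eye(len(v)) - np.outer(v, v) / (v @ v)

P_psi, P_phi = projector(psi0), projector(phi0)

# Project unconstrained coefficients so that Eqs. 5 and 7 hold exactly
alpha = P_psi @ rng.standard_normal(L)           # psi_j(y0)' alpha_j = 0
B = P_psi @ rng.standard_normal((L, K)) @ P_phi  # B phi(x0) = 0, B' psi(y0) = 0

assert abs(psi0 @ alpha) < 1e-12
assert np.allclose(B @ phi0, 0.0) and np.allclose(B.T @ psi0, 0.0)
```

In practice the same effect can be achieved by reparameterizing to a lower-dimensional coefficient space, but the projection makes the role of the side conditions transparent.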

Continuous Data. When both $X_i\in {\mathcal {X}}$ and $Y_{ij}\in {\mathcal {Y}}_j$ (equipped with the Lebesgue measure $\mu _j$) are continuous random variables defined on closed intervals, Eq. 1 corresponds to the semiparametric factor model considered by Liu and Wang (2022). Without loss of generality, let ${\mathcal {X}}= {\mathcal {Y}}_j = [0, 1]$: Any closed interval can be rescaled to the unit interval via a linear transform, i.e., if $z\in [a, b]$ with $a < b$, then $(z - a)/(b - a)\in [0, 1]$. To approximate smooth functional parameters supported on unit intervals or squares, we use the same cubic B-spline basis with equally spaced knots (De Boor, 1978) for both $\varvec{\psi }_j$ and $\varvec{\varphi }$ (and thus $L_j = K$).
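The rescaling and a clamped cubic B-spline basis with equally spaced knots can be set up as follows (a sketch using SciPy's `BSpline.design_matrix`, available in SciPy 1.8+; the knot count and the data values are arbitrary choices for illustration):

```python
import numpy as np
from scipy.interpolate import BSpline

def rescale(z, a, b):
    """Map z in [a, b] onto the unit interval via (z - a) / (b - a)."""
    return (z - a) / (b - a)

# Clamped cubic B-spline basis on [0, 1] with equally spaced knots
k = 3                                         # cubic
knots = np.linspace(0.0, 1.0, 6)              # equally spaced knots on [0, 1]
t = np.r_[np.zeros(k), knots, np.ones(k)]     # repeat boundary knots (clamped)
K = len(t) - k - 1                            # number of basis functions (here 8)

# E.g., log-RTs observed on [1, 10], rescaled to the unit interval
x = rescale(np.array([1.2, 3.7, 9.9]), a=1.0, b=10.0)
Phi = BSpline.design_matrix(x, t, k).toarray()  # row i holds varphi(x_i)'

assert Phi.shape == (3, K)
assert np.allclose(Phi.sum(axis=1), 1.0)  # B-splines form a partition of unity
```

Using the same basis for $\varvec{\psi }_j$ and $\varvec{\varphi }$ means the same design-matrix construction serves both the $y$- and $x$-directions of $g_j^{xy}$.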
It is sometimes desirable to force the MV to be stochastically increasing as the LV increases. Liu and Wang (2022) considered a simple approach to impose likelihood-ratio monotonicity, which boils down to the following linear inequality constraints on the coefficient matrix ${\textbf{B}}_j$:

(8) $$({\textbf{D}}_K\otimes {\textbf{D}}_K)\,\text {vec}({\textbf{B}}_j)\ge \textbf{0}.$$

In Eq. 8, $\text {vec}(\cdot )$ denotes the vectorization operator, and

$${\textbf{D}}_K = \begin{bmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{bmatrix}$$

is a $(K - 1)\times K$ first-order difference matrix. We also set ${\textbf{D}}_1 = 1$ by convention.
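The constraint in Eq. 8 is straightforward to check numerically. With the column-stacking convention for vec, the identity $(\textbf{A}\otimes \textbf{B})\,\text {vec}(\textbf{X}) = \text {vec}(\textbf{B}\textbf{X}\textbf{A}^\top )$ reduces Eq. 8 to a sign check on ${\textbf{D}}_K{\textbf{B}}_j{\textbf{D}}_K^\top$. A minimal sketch, with a toy coefficient matrix chosen to satisfy the constraint:

```python
import numpy as np

def diff_matrix(K):
    """(K-1) x K first-order difference matrix D_K of Eq. 8."""
    return np.eye(K - 1, K) - np.eye(K - 1, K, k=1)

K = 4
D = diff_matrix(K)

# A K x K coefficient matrix with a monotone interaction, e.g. B_{kl} = k * l
B = np.outer(np.arange(K, dtype=float), np.arange(K, dtype=float))

# (D_K kron D_K) vec(B), with column-stacking vec ...
lhs = np.kron(D, D) @ B.flatten(order="F")
# ... equals vec(D_K B D_K'), so checking Eq. 8 only needs the small matrix
rhs = (D @ B @ D.T).flatten(order="F")

assert np.allclose(lhs, rhs)
assert np.all(lhs >= 0.0)   # Eq. 8 holds for this B
```

The entries of ${\textbf{D}}_K{\textbf{B}}_j{\textbf{D}}_K^\top$ are mixed second-order differences of the coefficients, so Eq. 8 constrains the fitted surface $g_j^{xy}$ through its spline coefficients only via linear inequalities, which keeps the optimization problem convex in ${\textbf{B}}_j$ for a fixed basis.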

Discrete Data. When $\mathcal{Y}_j = \{0, \dots, C_j - 1\}$ and $\mu_j$ is the associated counting measure, let $y_0 = 0$, $L_j = C_j - 1$, and $\boldsymbol{\psi}_j(y) = (\psi_{j1}(y), \dots, \psi_{j,C_j-1}(y))^\top$ such that $\psi_{jk}(y) = 1$ if $y = k$ and 0 if $y \ne k$. Then our generic model (Eqs. 1 and 2) reduces to Abrahamowicz and Ramsay's (1992) multi-categorical semiparametric IRT model for unordered polytomous responses, which is further equivalent to the semiparametric logistic IRT model proposed by Ramsay and Winsberg (1991) and Rossi et al. (2002) when $C_j = 2$ (i.e., dichotomous data). This is because the basis expansions (i.e., Eqs. 4 and 6) simplify to $g_j^y(y) = \alpha_{jy}$ and $g_j^{xy}(x, y) = \boldsymbol{\varphi}(x)^\top\boldsymbol{\beta}_{jy}$, in which $\boldsymbol{\beta}_{jy}^\top$ denotes the $y$th row of $\mathbf{B}_j$, for $y = 1, \dots, C_j - 1$; meanwhile, $g_j^y(0) = 0$ and $g_j^{xy}(x, 0) \equiv 0$ as part of the side conditions. The conditional density (e.g., Eq. 1) then becomes the item response function (IRF)

(9) $$f_j(y|x) = \begin{cases} \dfrac{1}{1 + \sum_{c=1}^{C_j-1}\exp\left(\alpha_{jc} + \boldsymbol{\varphi}(x)^\top\boldsymbol{\beta}_{jc}\right)}, & \text{if } y = 0,\\[10pt] \dfrac{\exp\left(\alpha_{jy} + \boldsymbol{\varphi}(x)^\top\boldsymbol{\beta}_{jy}\right)}{1 + \sum_{c=1}^{C_j-1}\exp\left(\alpha_{jc} + \boldsymbol{\varphi}(x)^\top\boldsymbol{\beta}_{jc}\right)}, & \text{if } y = 1, \dots, C_j - 1. \end{cases}$$

As in the continuous case, we only consider $\mathcal{X} = [0, 1]$ and take $\boldsymbol{\varphi}$ to be a cubic B-spline basis defined by a sequence of equally spaced knots. Similar to Eq. 8, we may impose likelihood-ratio monotonicity on the conditional density via

(10) $$(\mathbf{D}_K \otimes \mathbf{D}_{C_j-1})\,\mathrm{vec}(\mathbf{B}_j) \ge \mathbf{0},$$

which reduces to $\mathbf{D}_K\boldsymbol{\beta}_{j1} \ge \mathbf{0}$ when $C_j = 2$ (i.e., dichotomous items).
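To make the dichotomous special case concrete, the following sketch evaluates the semiparametric IRF of Eq. 9 with $C_j = 2$ on a cubic B-spline basis. The basis size $K = 7$, the intercept, and the (nondecreasing) spline coefficients are hypothetical values chosen for illustration, not estimates from the PISA data.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, K, degree=3):
    """K cubic B-spline basis functions with equally spaced knots on [0, 1],
    evaluated at points x; returns a (len(x), K) design matrix."""
    interior = np.linspace(0, 1, K - degree + 1)[1:-1]
    t = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    return np.column_stack([BSpline(t, np.eye(K)[k], degree)(x)
                            for k in range(K)])

K = 7
alpha = -0.5                                          # hypothetical intercept
beta = np.array([0.0, 0.4, 0.9, 1.5, 2.2, 3.0, 3.9])  # nondecreasing coefficients

# Eq. 10 with C_j = 2: first differences of beta must be nonnegative
assert np.all(np.diff(beta) >= 0)

x = np.linspace(0.001, 0.999, 200)
eta = alpha + bspline_basis(x, K) @ beta
irf = 1.0 / (1.0 + np.exp(-eta))                      # Eq. 9 with y = 1, C_j = 2
```

Because a B-spline with nondecreasing coefficients is itself nondecreasing, the constraint on the coefficient differences is exactly what guarantees a monotone IRF in the dichotomous case.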

2.2. Simple Factor Structure and Latent Variable Density

Consider a battery of $m_1$ continuous MVs and $m_2$ discrete MVs, and write $m = m_1 + m_2$. We typically have $m_1 = m_2 = m/2$ when the discrete responses and continuous RT variables are observed for the same set of items. From now on, denote by $Y_{i1}, \dots, Y_{i,m_1}$ the base-10 log-transformed RT, each of which is rescaled to [0, 1], and by $Y_{i,m_1+1}, \dots, Y_{im}$ the corresponding responses. Let $X_{i1}, X_{i2} \in [0, 1]$ be the slowness (Footnote 2) and ability factors for respondent $i$, respectively. A simple factor structure requires that the item responses $Y_{i,m_1+1}, \dots, Y_{im}$ are conditionally independent of the slowness factor $X_{i1}$ given the ability factor $X_{i2}$, and symmetrically that the log-RT variables $Y_{i1}, \dots, Y_{i,m_1}$ are independent of $X_{i2}$ given $X_{i1}$. We also make the local independence assumption that is standard in factor analysis (McDonald, 1982): $Y_{i1}, \dots, Y_{im}$ are mutually independent conditional on $X_{i1}$ and $X_{i2}$. Further let $\mathbf{Y}_i = (Y_{i1}, \dots, Y_{im})^\top$ collect all the MVs produced by respondent $i$. The simple structure and local independence assumptions imply that

(11) $$f(\mathbf{y}|x_1, x_2) = \prod_{j=1}^{m_1} f(y_j|x_1) \cdot \prod_{j=m_1+1}^{m} f(y_j|x_2),$$

in which $\mathbf{y} = (y_1, \dots, y_m)^\top \in [0, 1]^{m_1} \times \mathcal{Y}_{m_1+1} \times \cdots \times \mathcal{Y}_m$, and $x_1, x_2 \in [0, 1]$.

For convenience in approximating functional parameters, both $X_{i1}$ and $X_{i2}$ are assumed to follow a Uniform[0, 1] distribution marginally. However, we are aware that uniformly distributed LVs are less attractive for substantive interpretation. Adopting the strategy of Liu and Wang (2022), we define $X_{id}^* = \Phi^{-1}(X_{id})$, $d = 1, 2$, where $\Phi^{-1}$ is the standard normal quantile function; the transformed LVs are marginally $\mathcal{N}(0, 1)$ variates, in agreement with the standard formulation in parametric factor analysis. To capture the potentially complex association between latent slowness and ability, we employ a nonparametric estimator for the copula density (Sklar, 1959; Nelsen, 2006) of $(X_{i1}, X_{i2})^\top$, denoted $c(x_1, x_2)$. A copula density is non-negative and has uniform marginals: that is,

(12) $$c(x_1, x_2) \ge 0 \quad\text{and}\quad \int_0^1 c(x_1, x_2)\,\mathrm{d}x_1 = \int_0^1 c(x_1, x_2)\,\mathrm{d}x_2 \equiv 1, \quad \forall x_1, x_2 \in [0, 1].$$

$c$ is in fact the joint density of $(X_{i1}, X_{i2})^\top$ since both $X_{i1}$ and $X_{i2}$ are marginally uniform. In light of Sklar's theorem, the joint density of the transformed $(X_{i1}^*, X_{i2}^*)^\top$ can be calculated as

(13) $$h(x_1^*, x_2^*) = c(\Phi(x_1^*), \Phi(x_2^*))\,\phi(x_1^*)\,\phi(x_2^*),$$

in which $\phi$ and $\Phi$ are the density and distribution functions of $\mathcal{N}(0, 1)$, respectively.
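As a quick numerical check of Eq. 13, the sketch below plugs in the independence copula $c \equiv 1$, for which $h$ reduces to the product of two standard normal densities and therefore integrates to one; the integration bounds are arbitrary truncation points.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import dblquad

def h(x1s, x2s, c):
    """Joint density of the probit-transformed LVs (Eq. 13)."""
    return c(norm.cdf(x1s), norm.cdf(x2s)) * norm.pdf(x1s) * norm.pdf(x2s)

# Independence copula: c(x1, x2) = 1 on the unit square
indep = lambda u, v: 1.0

# h should then integrate to (almost) 1 over a wide truncated range
total, _ = dblquad(lambda a, b: h(a, b, indep), -6.0, 6.0, -6.0, 6.0)
```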

We approximate the bivariate copula density $c$ by a tensor-product spline (Dou et al., 2021; Kauermann et al., 2013):

(14) $$c(x_1, x_2) = \boldsymbol{\varphi}(x_2)^\top \boldsymbol{\Xi}\, \boldsymbol{\varphi}(x_1),$$

in which $\boldsymbol{\varphi}: [0, 1] \rightarrow \mathbb{R}^K$ is a set of cubic B-spline basis functions defined with equally spaced knots (Footnote 3), and $\boldsymbol{\Xi}$ is a $K \times K$ coefficient matrix. For Eq. 14 to be a proper copula density, we impose the following linear constraints on $\boldsymbol{\Xi}$:

(15) $$\xi_{kl} \ge 0,\ \forall k, l = 1, \dots, K, \quad\text{and}\quad \boldsymbol{\Xi}\boldsymbol{\kappa} = \boldsymbol{\Xi}^\top\boldsymbol{\kappa} = \mathbf{1},$$

in which $\xi_{kl}$ is the $(k, l)$th element of $\boldsymbol{\Xi}$, and

(16) $$\boldsymbol{\kappa} = \int_0^1 \boldsymbol{\varphi}(x)\,\mathrm{d}x$$

is a $K \times 1$ vector of normalizing constants for the basis functions. It can be verified by elementary properties of B-splines and straightforward algebra that Eqs. 14 and 15 imply Eq. 12.
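To illustrate Eqs. 14 through 16, the sketch below builds a $K = 7$ cubic B-spline basis, computes $\boldsymbol{\kappa}$ from the closed-form B-spline integral $\kappa_k = (t_{k+p+1} - t_k)/(p+1)$, and verifies that the all-ones coefficient matrix (which encodes the independence copula $c \equiv 1$) satisfies the constraints of Eq. 15 and yields a uniform marginal as in Eq. 12. The choice of $\boldsymbol{\Xi}$ is illustrative only.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.integrate import quad

K, p = 7, 3
interior = np.linspace(0, 1, K - p + 1)[1:-1]
t = np.r_[np.zeros(p + 1), interior, np.ones(p + 1)]  # clamped, equally spaced knots
basis = [BSpline(t, np.eye(K)[k], p) for k in range(K)]
phi = lambda x: np.array([b(x) for b in basis])

# Normalizing constants (Eq. 16): closed form for B-spline integrals
kappa = (t[p + 1:] - t[:K]) / (p + 1)

# All-ones coefficient matrix = independence copula c(x1, x2) = 1
Xi = np.ones((K, K))
ok_constraints = (np.all(Xi >= 0)
                  and np.allclose(Xi @ kappa, 1.0)
                  and np.allclose(Xi.T @ kappa, 1.0))  # Eq. 15

c = lambda x1, x2: phi(x2) @ Xi @ phi(x1)              # Eq. 14

# Uniform marginal (Eq. 12): integrate out x1 at an arbitrary fixed x2
marginal, _ = quad(lambda x1: c(x1, 0.3), 0.0, 1.0)
```

The marginal integral equals $\boldsymbol{\varphi}(x_2)^\top\boldsymbol{\Xi}\boldsymbol{\kappa} = \boldsymbol{\varphi}(x_2)^\top\mathbf{1} = 1$ by the partition-of-unity property of the B-spline basis, which is exactly the algebra behind the claim that Eqs. 14 and 15 imply Eq. 12.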

2.3. Estimation

For each MV $j = 1, \dots, m$, let $\boldsymbol{\theta}_j = (\boldsymbol{\alpha}_j^\top, \mathrm{vec}(\mathbf{B}_j)^\top)^\top$ collect all the coefficients in $g_j^y$ and $g_j^{xy}$. Also let $\boldsymbol{\theta} = (\boldsymbol{\theta}_1^\top, \dots, \boldsymbol{\theta}_m^\top, \mathrm{vec}(\boldsymbol{\Xi})^\top)^\top$ denote all the coefficients in the simple-structure factor model. We estimate $\boldsymbol{\theta}$ by penalized maximum (marginal) likelihood (PML). The marginal likelihood for the MV vector $\mathbf{Y}_i = \mathbf{y}$ amounts to the integration of Eq. 11 over $x_1$ and $x_2$ under the copula density $c(x_1, x_2)$: that is,

(17) $$f(\mathbf{y}; \boldsymbol{\theta}) = \iint_{[0,1]^2} f(\mathbf{y}|x_1, x_2)\, c(x_1, x_2)\,\mathrm{d}x_1\,\mathrm{d}x_2.$$

Pooling across an independent and identically distributed (i.i.d.) sample of size n, we arrive at the sample log-likelihood function

(18) $$\ell(\boldsymbol{\theta}; \mathbf{y}_{1:n}) = \sum_{i=1}^n \log f(\mathbf{y}_i; \boldsymbol{\theta}),$$

in which $\mathbf{y}_{1:n} = (\mathbf{y}_1, \dots, \mathbf{y}_n)^\top$ denotes an $n \times m$ matrix of observed MV data.
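Eqs. 17 and 18 can be approximated by two-dimensional numerical quadrature. The sketch below uses a tensor-product Gauss-Legendre rule mapped to the unit square; the conditional density and copula are stand-in toy functions with a known integral of one, not the model's actual components.

```python
import numpy as np

# Gauss-Legendre nodes/weights on [-1, 1], mapped to [0, 1]
nodes, wts = np.polynomial.legendre.leggauss(21)
nodes, wts = (nodes + 1.0) / 2.0, wts / 2.0

def marginal_loglik(y, f_cond, c):
    """log f(y; theta) of Eq. 17 by tensor-product quadrature over (x1, x2)."""
    X1, X2 = np.meshgrid(nodes, nodes, indexing="ij")
    W = np.outer(wts, wts)
    return np.log(np.sum(W * f_cond(y, X1, X2) * c(X1, X2)))

# Toy check: f(y|x1, x2) = 4*x1*x2 with c = 1 integrates to exactly 1,
# so the log-marginal should be 0
toy_cond = lambda y, x1, x2: 4.0 * x1 * x2
toy_copula = lambda x1, x2: np.ones_like(x1)
ll = marginal_loglik(None, toy_cond, toy_copula)
```

Summing `marginal_loglik` over the $n$ respondents gives the sample log-likelihood of Eq. 18.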

To avoid overfitting, we regularize the roughness of the estimated functional parameters by quadratic-form penalties on the spline coefficients. For a continuous MV $j$, the penalty term for $\boldsymbol{\theta}_j$ is the sum of a univariate P-spline penalty for $\boldsymbol{\alpha}_j$ and a bivariate P-spline penalty for $\mathbf{B}_j$ (Eilers and Marx, 1996; Currie et al., 2006):

(19) $$q_j(\boldsymbol{\theta}_j; \lambda_j) = \frac{\lambda_j}{2}\boldsymbol{\alpha}_j^\top \mathbf{E}_K^\top\mathbf{E}_K\boldsymbol{\alpha}_j + \frac{\lambda_j}{2}\mathrm{vec}(\mathbf{B}_j)^\top\left(\mathbf{I}_K \otimes \mathbf{E}_K^\top\mathbf{E}_K + \mathbf{E}_K^\top\mathbf{E}_K \otimes \mathbf{I}_K\right)\mathrm{vec}(\mathbf{B}_j),$$

in which $\lambda_j > 0$ is the penalty weight, $\mathbf{I}_K$ denotes a $K \times K$ identity matrix, and

$$\mathbf{E}_K = \begin{bmatrix} 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \end{bmatrix}$$

is a second-order difference matrix of dimension $(K-2) \times K$. If the MV is polytomous, no penalty is needed for the intercepts $\boldsymbol{\alpha}_j$ or within the columns of $\mathbf{B}_j$. The resulting P-spline penalty term then becomes

(20) $$q_j(\boldsymbol{\theta}_j; \lambda_j) = \frac{\lambda_j}{2}\mathrm{vec}(\mathbf{B}_j)^\top\left(\mathbf{E}_K^\top\mathbf{E}_K \otimes \mathbf{I}_{C_j-1}\right)\mathrm{vec}(\mathbf{B}_j).$$
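The penalty matrices in Eqs. 19 and 20 are easy to assemble from the second-order difference matrix $\mathbf{E}_K$. The sketch below constructs $\mathbf{E}_K$ and the univariate and bivariate quadratic forms; the coefficient values are arbitrary, and the check that linear-in-index coefficients incur zero penalty reflects the well-known null space of a second-difference penalty.

```python
import numpy as np

def second_diff(K):
    """(K-2) x K second-order difference matrix E_K."""
    E = np.zeros((K - 2, K))
    for r in range(K - 2):
        E[r, r:r + 3] = [1.0, -2.0, 1.0]
    return E

K = 7
E = second_diff(K)
P1 = E.T @ E                             # univariate penalty matrix (for alpha_j)
I = np.eye(K)
P2 = np.kron(I, P1) + np.kron(P1, I)     # bivariate penalty matrix (for vec(B_j), Eq. 19)

lam = 0.5
alpha = np.linspace(-2.0, 2.0, K)        # linear in the basis index
q_alpha = lam / 2 * alpha @ P1 @ alpha   # second differences vanish -> zero penalty

B = np.outer(np.arange(K), np.ones(K))   # linear in one direction, constant in the other
q_B = lam / 2 * B.flatten("F") @ P2 @ B.flatten("F")
```

Only departures from (bi)linearity are penalized, so the penalty shrinks the fitted functions toward smooth, slowly varying shapes without forcing them to be flat.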

A similar bivariate P-spline penalty is also introduced for the coefficient matrix $\boldsymbol{\Xi}$:

(21) $$q(\boldsymbol{\Xi}; \lambda_{m+1}) = \frac{\lambda_{m+1}}{2}\mathrm{vec}(\boldsymbol{\Xi})^\top\left(\mathbf{I}_K \otimes \mathbf{E}_K^\top\mathbf{E}_K + \mathbf{E}_K^\top\mathbf{E}_K \otimes \mathbf{I}_K\right)\mathrm{vec}(\boldsymbol{\Xi})$$

with a positive penalty weight $\lambda_{m+1}$. Combining Eqs. 18–21, we express the penalized sample log-likelihood function as

(22)
$$p(\varvec{\theta}; \textbf{y}_{1:n}, \varvec{\lambda}) = \ell(\varvec{\theta}; \textbf{y}_{1:n}) - n\left[ \sum_{j=1}^{m} q_j(\varvec{\theta}_j, \lambda_j) + q(\mathbf{\Xi}; \lambda_{m+1})\right],$$

in which $\varvec{\lambda} = (\lambda_1, \dots, \lambda_m, \lambda_{m+1})^\top \in (0, \infty)^{m+1}$. PML estimation amounts to finding the $\varvec{\theta}$ that maximizes Eq. 22 subject to a series of linear equality and inequality constraints (i.e., Eqs. 5, 7, 8, 10, and 15), which is accomplished by a modified expectation-maximization (EM; Bock & Aitkin, 1981; Dempster et al., 1977) algorithm. A sequential quadratic programming algorithm (Nocedal & Wright, 2006, Algorithm 18.3) is employed in the M-step to handle the constrained optimization. The algorithm is a simple extension of the one described in Sects. 4.1 and 4.2 of Liu and Wang (2022); further details are therefore omitted for succinctness.
Denote by $\hat{\varvec{\theta}}(\textbf{y}_{1:n}, \varvec{\lambda})$ the PML estimate of $\varvec{\theta}$ obtained from data $\textbf{y}_{1:n}$ and penalty weights $\varvec{\lambda}$.
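To make the penalty terms concrete, the following sketch (in Python with NumPy, not the authors' implementation) constructs the second-order difference matrix $\textbf{E}_K$ and evaluates the quadratic forms in Eqs. 20 and 21. The matrix shapes, in particular treating $\textbf{B}_j$ as $(C_j - 1)\times K$ with column-major vectorization, are our assumptions for illustration.

```python
import numpy as np

def second_diff_matrix(K):
    """(K-2) x K second-order difference matrix E_K with rows (1, -2, 1)."""
    E = np.zeros((K - 2, K))
    for k in range(K - 2):
        E[k, k:k + 3] = [1.0, -2.0, 1.0]
    return E

def penalty_univariate(B, lam):
    """Eq. 20: (lam/2) vec(B)^T (E_K^T E_K (x) I_{C-1}) vec(B).
    B is assumed (C-1) x K; smoothing acts across the K basis coefficients."""
    Cm1, K = B.shape
    E = second_diff_matrix(K)
    P = np.kron(E.T @ E, np.eye(Cm1))
    v = B.flatten(order="F")  # column-major vec()
    return 0.5 * lam * v @ P @ v

def penalty_bivariate(Xi, lam):
    """Eq. 21: smooths the K x K coefficient matrix Xi along both
    coordinates of the copula density."""
    K = Xi.shape[0]
    E = second_diff_matrix(K)
    P = np.kron(np.eye(K), E.T @ E) + np.kron(E.T @ E, np.eye(K))
    v = Xi.flatten(order="F")
    return 0.5 * lam * v @ P @ v
```

As expected of a second-order difference penalty, coefficient matrices that are (piecewise) linear in the smoothed direction incur zero penalty, so the penalty shrinks toward linear rather than constant fits.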

Larger penalty weights enforce less variable yet more biased solutions and vice versa, a well-known phenomenon referred to as the bias-variance trade-off. To strike a balance, we select the optimal $\varvec{\lambda}$ from a pre-specified grid by multi-fold cross-validation. Let $\Omega_1, \dots, \Omega_S$ be a partition of the sample: $\bigcup_{s=1}^S \Omega_s = \{1, \dots, n\}$ and $\Omega_s \cap \Omega_{s'} = \emptyset$ for all $s \ne s'$.
For each $s$, let $\Omega_s^c$ be the calibration set and $\Omega_s$ the validation set, in which the superscript $c$ denotes the complement of a set. Predictive adequacy associated with a particular $\varvec{\lambda}$ is gauged by the empirical risk

(23)
$$R(\textbf{y}_{1:n}, \varvec{\lambda}) = -\frac{1}{S}\sum_{s=1}^{S}\frac{1}{|\Omega_s|}\,\ell(\hat{\varvec{\theta}}(\textbf{y}_{\Omega_s^c}, \varvec{\lambda}); \textbf{y}_{\Omega_s}),$$

in which $|\Omega_s|$ denotes the size of $\Omega_s$, and $\ell(\hat{\varvec{\theta}}(\textbf{y}_{\Omega_s^c}, \varvec{\lambda}); \textbf{y}_{\Omega_s})$ denotes the log-likelihood of the validation subsample evaluated at the coefficients estimated from the calibration set. Instead of choosing the $\varvec{\lambda}$ that minimizes Eq. 23 (i.e., the best solution), we adopt the "one standard error (SE)" heuristic (Chen & Yang, 2021; Hastie et al., 2009) to take sampling variability into account: We select the smoothest solution within one SE of the $\varvec{\lambda}$ that minimizes the empirical risk, where the SE at a specific $\varvec{\lambda}$ is estimated by

(24)
$$\mathrm{SE}(\textbf{y}_{1:n}, \varvec{\lambda}) = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\left[ -\frac{1}{|\Omega_s|}\,\ell(\hat{\varvec{\theta}}(\textbf{y}_{\Omega_s^c}, \varvec{\lambda}); \textbf{y}_{\Omega_s}) - R(\textbf{y}_{1:n}, \varvec{\lambda})\right]^2}.$$
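The selection rule in Eqs. 23 and 24 can be sketched as follows. The code assumes the per-fold, per-observation negative validation log-likelihoods have already been computed for each candidate on a grid of penalty weights, and that larger weights yield smoother solutions; it implements the SE formula exactly as printed in Eq. 24 (the standard deviation of fold risks).

```python
import numpy as np

def select_one_se(lambdas, risk_matrix):
    """One-SE heuristic: among candidate penalty weights, pick the
    smoothest (largest) lambda whose cross-validated risk lies within
    one SE of the minimizer's risk.

    lambdas:      length-L array of candidates, sorted ascending.
    risk_matrix:  S x L array; entry (s, l) is the per-observation
                  negative validation log-likelihood for fold s,
                  candidate l.
    """
    risk = risk_matrix.mean(axis=0)            # Eq. 23, averaged over folds
    se = risk_matrix.std(axis=0, ddof=1)       # Eq. 24, SD of fold risks
    best = int(np.argmin(risk))
    threshold = risk[best] + se[best]
    eligible = np.where(risk <= threshold)[0]  # candidates within one SE
    return lambdas[eligible.max()]             # smoothest eligible solution
```

For instance, if a heavily penalized candidate fits only slightly worse than the risk minimizer, the rule prefers it, trading a negligible loss in fit for a smoother estimate.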

The vector $\varvec{\lambda}$ contains $m + 1$ elements. To alleviate the computational burden of penalty weight selection, we set $\lambda_1 = \cdots = \lambda_{m_1} = \lambda_{(c)}$ for continuous MVs, $\lambda_{m_1+1} = \cdots = \lambda_m = \lambda_{(d)}$ for discrete MVs, and $\lambda_{m+1} = \lambda_{(g)}$ for the copula density of the two LVs.
We then resort to a multistage workaround to select the remaining three penalty weights: (1) a unidimensional model is fitted to only the continuous MVs to find the optimal $\lambda_{(c)}$; (2) a unidimensional model is fitted to only the discrete MVs to find the optimal $\lambda_{(d)}$; and (3) a two-dimensional simple-structure model is fitted to all the MVs to find the optimal $\lambda_{(g)}$ while fixing $\lambda_{(c)}$ and $\lambda_{(d)}$ at the optimal values determined in the earlier stages.
The optimal weights thereby selected are denoted $\hat{\varvec{\lambda}}(\textbf{y}_{1:n})$. We then refit the model with the optimal weights and the full data set to obtain the final solution of spline coefficients, $\hat{\varvec{\theta}}(\textbf{y}_{1:n}, \hat{\varvec{\lambda}}(\textbf{y}_{1:n}))$.

2.4. Model Fit Diagnostics and Inferences

We quantify the sampling variability of sample statistics, including goodness-of-fit diagnostics and approximations to functional parameters, by bootstrapping (Efron & Tibshirani, 1994; Hastie et al., 2009). Let $\bar{\textbf{Y}}_i$ be a random draw from the collection of observed MV vectors $\{\textbf{y}_1, \dots, \textbf{y}_n\}$ such that each element is selected with probability $1/n$. Sampling with replacement $n$ times yields the bootstrap sample $\bar{\textbf{Y}}_{1:n}$.
We approximate the sampling distribution of any test statistic $T(\textbf{Y}_{1:n})$ by the bootstrap sampling distribution of $T(\bar{\textbf{Y}}_{1:n})$ conditional on $\textbf{y}_{1:n}$. Note that most of the test statistics under investigation depend on the optimal penalty weights $\hat{\varvec{\lambda}}(\textbf{y}_{1:n})$, which are a function of the observed data. Pilot runs suggest that the variability of the optimal weights across bootstrap samples is small; we therefore treat $\varvec{\lambda} = \hat{\varvec{\lambda}}(\textbf{y}_{1:n})$ as fixed and do not repeat penalty weight selection during resampling, which substantially reduces computation time.
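A minimal nonparametric bootstrap of this kind, with the penalty weights held fixed, might look as follows; `statistic` is a placeholder for any of the fit diagnostics discussed below, and the sample-mean illustration is ours.

```python
import numpy as np

rng = np.random.default_rng(2015)

def bootstrap_distribution(y, statistic, n_boot=200):
    """Approximate the sampling distribution of statistic(Y_{1:n}) by
    recomputing it on n_boot resamples drawn with replacement, each
    row selected with probability 1/n."""
    n = y.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resampled row indices
        stats[b] = statistic(y[idx])
    return stats

# toy illustration: bootstrap distribution of a sample mean
y = rng.normal(size=(500, 1))
boot = bootstrap_distribution(y, lambda yb: yb.mean(), n_boot=500)
```

The standard deviation of `boot` then estimates the SE of the statistic, and quantiles of `boot` give percentile confidence intervals.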

Let $S_{ij} = \varsigma_j(Y_{ij})$ be the MV score associated with the individual response entry $Y_{ij}$. For continuous log-RT variables and dichotomous items, we simply let $\varsigma_j$ be the identity function, so that $S_{ij} = Y_{ij}$; for unordered polytomous items, however, a customized $\varsigma_j$ function is needed to recode raw responses onto a more meaningful scale (see Sect. 3.3 for an example). To assess the lack of fit of the simple-structure semiparametric model, in particular the dependencies among observed MVs left unaccounted for, we compute the residual correlation statistic

(25)
$$e_{jj'}(\textbf{y}_{1:n}, \varvec{\lambda}) = r_{jj'} - \rho_{jj'}(\hat{\varvec{\theta}}(\textbf{y}_{1:n}, \varvec{\lambda}))$$

for $j, j' = 1, \dots, m$, $j < j'$. In Eq. 25, $r_{jj'}$ and $\rho_{jj'}$ are the respective sample and model-implied correlations between the $j$th and $j'$th MV scores. The model-implied correlation can be further expressed as

(26)
$$\rho_{jj'} = \frac{\mu_{jj'} - \mu_j \mu_{j'}}{\sqrt{(\mu_{jj} - \mu_j^2)(\mu_{j'j'} - \mu_{j'}^2)}},$$

in which we drop the dependency on $\varvec{\theta}$ for conciseness. In Eq. 26, the first moment $\mu_j$ can be computed as

(27)
$$\mu_j = \int_{\mathcal{Y}_j}\varsigma_j(y)\left[\int_0^1 f_j(y|x)\,\mathrm{d}x\right]\mathrm{d}y.$$

There are three cases when computing the second moment $\mu_{jj'}$: (1) for a single MV, i.e., $j = j'$,

(28)
$$\mu_{jj} = \int_{\mathcal{Y}_j}\varsigma_j(y)^2\left[\int_0^1 f_j(y|x)\,\mathrm{d}x\right]\mathrm{d}y;$$

(2) when $j \ne j'$ but the two MVs load on the same LV,

(29)
$$\mu_{jj'} = \int_{\mathcal{Y}_j\times\mathcal{Y}_{j'}}\varsigma_j(y)\varsigma_{j'}(z)\left[\int_0^1 f_j(y|x)\, f_{j'}(z|x)\,\mathrm{d}x\right]\mathrm{d}y\,\mathrm{d}z;$$

and (3) when the $j$th and $j'$th MVs load on the first and second LVs, respectively,

(30)
$$\mu_{jj'} = \int_{\mathcal{Y}_j\times\mathcal{Y}_{j'}}\varsigma_j(y)\varsigma_{j'}(z)\left[\int_{[0,1]^2} f_j(y|x_1)\, f_{j'}(z|x_2)\, c(x_1, x_2)\,\mathrm{d}x_1\,\mathrm{d}x_2\right]\mathrm{d}y\,\mathrm{d}z.$$
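As a numerical check of Eqs. 26–29, the sketch below computes the model-implied correlation for two hypothetical dichotomous MVs loading on the same LV. The toy conditional density $f_j(y|x) = x^y(1-x)^{1-y}$ stands in for the spline-based densities of the actual model, and the latent integral over $(0,1)$ is evaluated by Gauss–Legendre quadrature.

```python
import numpy as np

def f(y, x):
    """Hypothetical conditional density f_j(y|x): Bernoulli(x)."""
    return x if y == 1 else 1.0 - x

# Gauss-Legendre nodes mapped from (-1, 1) to the unit interval
nodes, weights = np.polynomial.legendre.leggauss(30)
x_nodes = 0.5 * (nodes + 1.0)
w = 0.5 * weights

def latent_integral(g):
    """Approximate the inner integral over x in (0,1)."""
    return float(np.sum(w * g(x_nodes)))

ys = [0, 1]  # support of a dichotomous MV; scoring is the identity
mu_j = sum(y * latent_integral(lambda x, y=y: f(y, x))            # Eq. 27
           for y in ys)
mu_jj = sum(y**2 * latent_integral(lambda x, y=y: f(y, x))        # Eq. 28
            for y in ys)
mu_jjp = sum(y * z * latent_integral(lambda x, y=y, z=z:          # Eq. 29
                                     f(y, x) * f(z, x))
             for y in ys for z in ys)
var_j = mu_jj - mu_j**2
rho = (mu_jjp - mu_j * mu_j) / np.sqrt(var_j * var_j)             # Eq. 26
```

For this toy model the integrals are available in closed form ($\mu_j = 1/2$, $\mu_{jj'} = 1/3$, so $\rho_{jj'} = 1/3$), which makes the quadrature easy to validate before swapping in estimated densities.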

2.5. Latent Variable Density and Scores

As mentioned in Sect. 2.2, inferences for the LVs are made in terms of the marginally normal $X_{i1}^*$ and $X_{i2}^*$. In particular, we are interested in the strength of association between the two LVs. To this end, we compute the coefficient of determination for predicting ability ($X_{i2}^*$) from slowness ($X_{i1}^*$):

(31)
$$\eta^2 = 1 - \frac{\mathbb{E}\left[\mathrm{Var}(X_{i2}^*\,|\,X_{i1}^*)\right]}{\mathrm{Var}(X_{i2}^*)} = \frac{\mathrm{Var}\left[\mathbb{E}(X_{i2}^*\,|\,X_{i1}^*)\right]}{\mathrm{Var}(X_{i2}^*)} = \int_{\mathbb{R}}\left[\int_{\mathbb{R}} x_2^*\, h(x_2^*|x_1^*)\,\mathrm{d}x_2^*\right]^2 h(x_1^*)\,\mathrm{d}x_1^*,$$

in which $h(x_d^*) = \int_{\mathbb{R}} h(x_1^*, x_2^*)\,\mathrm{d}x_{3-d}^*$, $d = 1, 2$, is the marginal density of $X_{id}^*$ (assumed to be standard normal), and $h(x_2^*|x_1^*) = h(x_1^*, x_2^*)/h(x_1^*)$ is the conditional density of $x_2^*$ given $x_1^*$.
Equation 31 reduces to the usual coefficient of determination for linear models when $(X_{i1}^*, X_{i2}^*)^\top$ follows a bivariate normal distribution. When analyzing real data, we evaluate $\eta^2$ using the estimated LV density; the resulting estimate is denoted $\hat{\eta}^2(\textbf{y}_{1:n}, \varvec{\lambda})$, and its sampling variability is again characterized by bootstrapping (Sect. 2.4).
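The bivariate normal reduction provides a convenient sanity check on Eq. 31: when the LV density has correlation $\rho$, the conditional mean is $\mathbb{E}(X_{i2}^*|X_{i1}^* = x_1^*) = \rho x_1^*$, so the integral should recover $\rho^2$. The sketch below (ours, using Gauss–Hermite quadrature for the outer integral) verifies this.

```python
import numpy as np

rho = 0.6  # assumed latent correlation for the check

# probabilists' Gauss-Hermite rule: nodes/weights for the weight
# function exp(-x^2/2); renormalize so the weights integrate phi(x)
t, w = np.polynomial.hermite_e.hermegauss(40)
w = w / np.sqrt(2.0 * np.pi)

# under bivariate normality, the inner integral of Eq. 31 is rho * x1*
cond_mean = rho * t
eta2 = float(np.sum(w * cond_mean**2))  # outer integral against h(x1*)
```

Because the integrand is a quadratic polynomial, a 40-node rule is exact up to rounding, and `eta2` matches $\rho^2 = 0.36$; for the semiparametric copula density the same outer quadrature applies, with the conditional mean computed numerically.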

For each respondent $i$, LV scores can be predicted from the posterior distribution of $(X_{i1}^*, X_{i2}^*)^\top$ given $\textbf{y}_i = \textbf{y}$, which has density

(32)
$$f^h(x_1^*, x_2^*|\textbf{y}) = \frac{f(\textbf{y}|\Phi(x_1^*), \Phi(x_2^*))\, h(x_1^*, x_2^*)}{\iint_{\mathbb{R}^2} f(\textbf{y}|\Phi(x_1^*), \Phi(x_2^*))\, h(x_1^*, x_2^*)\,\mathrm{d}x_1^*\,\mathrm{d}x_2^*}.$$

The means of the posterior distribution are often referred to as the expected a posteriori (EAP) scores, and the corresponding standard deviations (SDs) gauge the precision of the EAP scores (Thissen and Wainer, Reference Thissen and Wainer2001). In practice, density functions involved in Eq. 32 must be estimated from sample data, which introduces additional uncertainty to scores computed from the estimated posterior. Better precision measures can be obtained from a predictive distribution of LV scores (Liu and Yang, Reference Liu and Yang2018a, Reference Liu and Yangb; Yang et al., Reference Yang, Hansen and Cai2012). Let f^h(x1,x2|y;y1:n,λ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\hat{f}}}^h(x_1^*, x_2^*|{\textbf{y}}; {\textbf{y}}_{1:n}, \varvec{\lambda })$$\end{document} be the estimated posterior density. The bootstrap expectation Ef^h(x1,x2|y;Y¯1:n,λ) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbb {E}}{{\hat{f}}}^h(x_1^*, x_2^*|{\textbf{y}}; {{\bar{{\textbf{Y}}}}}_{1:n}, \varvec{\lambda })$$\end{document} with respect to the (random) bootstrap sample Y¯1:n \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\bar{{\textbf{Y}}}}}_{1:n}$$\end{document} defines a suitable predictive density; the inverse variance of the predictive distribution, which is henceforth referred to as the predictive precision, can be conveniently estimated from a collection of bootstrap samples. 
To set the baseline for assessing the gain in predictive precision, we also consider the marginal posterior density of the ability factor $X_{i2}^*$:

(33)
$$f^h(x_2^* \mid y_{m_1+1}, \dots, y_m) = \frac{\prod_{j=m_1+1}^{m} f(y_j \mid \Phi(x_2^*))\, h(x_2^*)}{\int_{\mathbb{R}} \prod_{j=m_1+1}^{m} f(y_j \mid \Phi(x_2^*))\, h(x_2^*)\, \mathrm{d}x_2^*}.$$

Estimated marginal EAP scores and the associated bootstrap predictive precisions can be obtained in a fashion similar to the two-dimensional case.
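To make the scoring recipe concrete, the following minimal Python sketch computes an EAP score and posterior SD by numerical quadrature for a one-dimensional posterior of the form in Eq. 33. The likelihood and prior below are synthetic stand-ins (the actual analysis used the semiparametric estimates and the R package spfa), so the example only illustrates the mechanics, not the fitted model.

```python
import numpy as np

def eap_and_sd(loglik, logprior, nodes, weights):
    """EAP score and posterior SD by quadrature (cf. Eq. 33).

    loglik, logprior: callables returning log f(y|x) and log h(x) at nodes.
    nodes, weights: quadrature nodes/weights on the latent scale.
    """
    logpost = loglik(nodes) + logprior(nodes)
    w = weights * np.exp(logpost - logpost.max())  # stabilized unnormalized posterior
    w /= w.sum()                                   # normalization (denominator of Eq. 33)
    eap = np.sum(w * nodes)                        # posterior mean
    sd = np.sqrt(np.sum(w * (nodes - eap) ** 2))   # posterior SD
    return eap, sd

# Toy normal-normal example: standard normal prior, N(1, 1) "likelihood"
nodes, weights = np.polynomial.legendre.leggauss(61)
nodes, weights = 4.0 * nodes, 4.0 * weights        # rescale [-1, 1] to [-4, 4]
loglik = lambda x: -0.5 * (x - 1.0) ** 2           # proportional to a N(1, 1) likelihood
logprior = lambda x: -0.5 * x ** 2                 # standard normal prior (up to a constant)
eap, sd = eap_and_sd(loglik, logprior, nodes, weights)
```

In this conjugate toy case the exact posterior is $N(0.5, 1/2)$, which provides a check on the quadrature approximation.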

3. Data and Analysis Plan

3.1. PISA 2015 Mathematics Data

The data we analyze next came from the PISA 2015 computer-based mathematics assessment (OECD, 2016). The test comprises 17 dichotomously scored items from two mathematics testing clusters (M1 and M2). Similar to Zhan et al. (2018), we retained only cases with complete response entries, yielding $n = 8606$ observations from 58 countries/economies.

Among the 17 items, there are four testlets (with item labels starting with CM155, CM411, CM496, and CM564), each of which involves a pair of items. We collapsed the two items within each testlet into a single four-category nominal item: Categories 0, 1, 2, and 3 indicate the original item response patterns (0, 0), (1, 0), (0, 1), and (1, 1), respectively. The corresponding RT entries were also summed into a single testlet-level RT variable. Accordingly, the number of items involved in the initial fitting is $m_1 = m_2 = 13$, and the number of MVs is $m = 26$. During data preprocessing, we identified a number of extremely small and large RT entries, which are potential outliers and may destabilize model fitting. We therefore excluded, for each MV, the top and bottom 1% of RT values along with the associated item response data (Footnote 4). We then took the base-10 logarithm of the RT variables and rescaled them to the unit interval. Selected descriptive statistics of the final data can be found in Tables 1 and 2.
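The preprocessing steps above (1% trimming, log transform, rescaling, and testlet recoding) can be sketched as follows. This is an illustrative Python rendition on synthetic data, not the pipeline used for the actual analysis.

```python
import numpy as np

def preprocess_rt(rt):
    """Trim the top/bottom 1% of RT, take log10, and rescale to [0, 1].

    rt: 1-D array of raw response times for one manifest variable.
    Returns (rescaled log-RT values, boolean mask of retained cases).
    """
    lo, hi = np.quantile(rt, [0.01, 0.99])
    keep = (rt >= lo) & (rt <= hi)                 # drop potential outliers
    log_rt = np.log10(rt[keep])
    scaled = (log_rt - log_rt.min()) / (log_rt.max() - log_rt.min())
    return scaled, keep

def recode_testlet(y1, y2):
    """Collapse two dichotomous testlet items into one nominal variable:
    (0,0) -> 0, (1,0) -> 1, (0,1) -> 2, (1,1) -> 3."""
    return y1 + 2 * y2

rng = np.random.default_rng(0)
rt = rng.lognormal(mean=3.5, sigma=0.6, size=1000)  # synthetic RT in seconds
scaled, keep = preprocess_rt(rt)
codes = recode_testlet(np.array([0, 1, 0, 1]), np.array([0, 0, 1, 1]))
```

The recoding maps the four response patterns to categories 0 through 3 in the same order as described above.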

Table 1 Descriptive statistics for transformed response time (RT).

For CM155, CM411, CM496, and CM564, summary statistics are computed for the log-transformed RT of the testlets. SD: Standard deviation. Skew: Skewness. Kurt: Kurtosis. CorrTotal: Correlation with the total sum of log-RT.

We applied the base-10 logarithm to the raw RT data and then rescaled the log-transformed variables to [0, 1].

Table 2 Descriptive statistics for item responses.

For testlets CM155, CM411, CM496, and CM564, the original item response patterns (0, 0), (1, 0), (0, 1), and (1, 1) are recoded to 0, 1, 2, and 3, respectively. P1–3: Observed proportions of response categories 1–3. CorrTotal: Correlation with total score; we apply the scoring function described in Sect. 3.3 to the testlet MVs before computing the total score.

3.2. Analysis Plan

As mentioned in Sect. 1, this data set was analyzed by Zhan et al. (2018) using a parametric simple-structure model. Although we retain the parsimonious simple factor structure, our analysis differs substantially from the previous work: We model MV-LV and LV-LV dependencies in a nonparametric fashion and are thereby able to provide a more definitive assessment of the validity of a simple factor structure for this data set. Once we confirm that the dependencies among the MVs are sufficiently accounted for, we present graphics and statistics based on the fitted model to demonstrate how the respective distributions of item responses and RT are governed by the ability and slowness factors, as well as how ability and slowness covary in the population of respondents.

Major steps of our analysis are outlined as follows.

  1. Determine the optimal penalty weights $\hat{\boldsymbol{\lambda}}(\mathbf{y}_{1:n})$ by the three-stage procedure described in Sect. 2.3.

  2. Draw $B = 100$ bootstrap samples (i.e., resample with replacement) from the observed data $\mathbf{y}_{1:n}$ and repeat model fitting in each bootstrap sample with $\boldsymbol{\lambda} = \hat{\boldsymbol{\lambda}}(\mathbf{y}_{1:n})$.

  3. Examine the residual correlation statistics (Eq. 25) for all pairs of MVs. Flag a pair if the 90% two-sided bootstrap CI for the residual correlation falls entirely above 0.1 or below $-0.1$.

  4. Remove problematic items from the test and repeat Steps 1-3 until no large residual correlation remains.

  5. Plot the conditional densities of the MVs given the marginally normal LVs (Eq. 1 with $x = \Phi(x^*)$) and the joint density of the two LVs (Eq. 13). Compute estimated $\eta^2$ statistics (Eq. 31), EAP scores, and the associated predictive SDs for the scores.
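The flagging rule in Step 3 can be sketched as a small helper. The bootstrap draws below are synthetic, and the percentile interval is one reasonable choice of two-sided bootstrap CI.

```python
import numpy as np

def flag_pair(boot_rescorr, threshold=0.1, level=0.90):
    """Flag an MV pair if the two-sided bootstrap percentile CI of its
    residual correlation lies entirely above +threshold or below -threshold."""
    alpha = 1.0 - level
    lo, hi = np.quantile(boot_rescorr, [alpha / 2.0, 1.0 - alpha / 2.0])
    return bool(lo > threshold or hi < -threshold)

# Synthetic bootstrap draws of a residual correlation statistic
flag_a = flag_pair(0.14 + np.linspace(-0.02, 0.02, 100))  # CI roughly [0.12, 0.16]: flagged
flag_b = flag_pair(np.linspace(0.08, 0.12, 100))          # CI covers 0.1: not flagged
```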

Per the request from two referees, we also report in the supplementary document the empirical risk statistics and density estimates for two parametric models. The first model is a standard baseline model for the joint analysis of item response and RT data, which features linear-normal factor models for log-RT variables, 2PL models for item responses, nominal response models for testlets, and a bivariate normal LV density. Due to the strong parametric assumptions made therein, we do not expect the baseline model to fit the data well. Inspired by the semiparametric fitting, we also specified an updated parametric model with nonlinear factor models with quintic mean functions for log-RT variables, 4PL models for item responses, nominal models for testlets, and a two-component normal mixture density for the LVs. Even though the updated model has yet to attain a fit comparable to the semiparametric model, it reproduces key functional patterns in the semiparametric estimates of the bivariate LV density and the conditional densities for the MVs. Despite being tangential to the specific aims of the present work, these additional analyses exemplify another standard usage of semiparametric/nonparametric models: to provide diagnostic information about model-data fit and to guide model modification.
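For reference, the 4PL response function used in the updated parametric comparison model has the familiar form sketched below. The parameter values are illustrative choices of ours, not estimates from the PISA data.

```python
import numpy as np

def irf_4pl(theta, a, b, c, d):
    """Four-parameter logistic IRF: discrimination a, difficulty b,
    lower asymptote c (guessing), upper asymptote d (slipping)."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

# At theta = b, the curve sits midway between the two asymptotes
p = irf_4pl(theta=0.0, a=1.2, b=0.0, c=0.1, d=0.9)
```

The restricted lower and upper asymptotes are exactly the feature needed to mimic the bounded ranges of the semiparametric IRFs reported in Sect. 4.2.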

3.3. Detailed Configuration

For replicability, we provide all the tuning details involved in our analysis. PML estimation of the semiparametric simple structure model was implemented in the R package spfa, which can be downloaded at https://github.com/wwang1370/spfa and https://cran.r-project.org/web/packages/spfa/index.html.

Estimation $K = 13$ B-spline basis functions were used to approximate smooth functions defined on the unit interval. Each log-RT variable was linearly transformed to [0, 1] using the sample minimum and maximum. The reference level for LVs and continuous MVs was set to $x_0 = y_0 = 0.5$; for discrete MVs, the reference level was set to the first response category, $y_0 = 0$. We imposed likelihood-ratio monotonicity on item CM442Q02 because both its responses and RT show the highest correlations with the totals (see Tables 1 and 2). Intractable integrals appearing in the conditional densities (Eq. 1) were approximated by a 21-point Gauss-Legendre quadrature rescaled to the unit interval. The marginal likelihood function (Eq. 17) involves a two-dimensional integral over the unit square and was approximated by a tensor-product Gauss-Legendre quadrature. In each fitting, we ran the EM algorithm until the change in the penalized log-likelihood (Eq. 22) between consecutive iterations was less than $10^{-3}$.
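The rescaled quadrature rule amounts to the standard affine change of variables from $[-1, 1]$ to $[0, 1]$, sketched below.

```python
import numpy as np

def gauss_legendre_unit(n):
    """Gauss-Legendre nodes and weights rescaled from [-1, 1] to [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (x + 1.0), 0.5 * w

nodes, weights = gauss_legendre_unit(21)       # 21-point rule, as in the paper
approx = np.sum(weights * nodes ** 4)          # integral of x^4 over [0, 1] is 0.2
```

An $n$-point Gauss-Legendre rule integrates polynomials up to degree $2n - 1$ exactly, so the 21-point rule handles the degree-4 integrand above to machine precision.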

Penalty Weight Selection We selected the three penalty weights $\lambda_{(c)}$, $\lambda_{(d)}$, and $\lambda_{(g)}$ from the following sequences of decreasing values:

$$\lambda_{(c)} \in \{10^{-1}, 10^{-2}, \dots, 10^{-6}\}, \quad \lambda_{(d)} \in \{10^{1}, 10^{-1}, \dots, 10^{-4}\}, \quad \lambda_{(g)} \in \{10^{-2}, 10^{-4}, \dots, 10^{-8}\}.$$

The empirical risk was computed by five-fold cross-validation (Eq. 23 with $S = 5$). The smoothest solution within one SE (estimated by Eq. 24) of the minimal-risk solution was deemed optimal.

Inference Conditional on the optimal penalty weights, we resampled $B = 100$ times with replacement, refit the model in each bootstrap sample, and examined the bootstrap distributions of fitted densities and model fit statistics. When computing fit diagnostics and summary statistics, we approximated intractable integrals with the same quadrature systems used in parameter estimation. The MV scoring function (Footnote 5) for testlet responses was defined by $\varsigma_j(0) = 0$, $\varsigma_j(1) = \varsigma_j(2) = 1$, and $\varsigma_j(3) = 2$.

4. Results

4.1. Model Fit and Modification

In the initial fitting of the semiparametric simple-structure model (using all 26 MVs), our cross-validation procedure selects $10^{-4}$, $10^{-1}$, and $10^{-4}$ as the respective optimal values for $\lambda_{(c)}$, $\lambda_{(d)}$, and $\lambda_{(g)}$. A graphical display of the results can be found in the first row of Fig. 1.

Figure 1 Empirical risks (Eq. 23) and standard errors (SEs; Eq. 24). The rows of the graphical table correspond to the initial fitting (with all items) and the updated fitting (without the response time of CM034Q01 and CM571Q01). The columns represent the three stages of penalty weight selection (see Sect. 2.3). Within each panel, empirical risk values are plotted as functions of base-10 log-transformed penalty weights. Vertical bars indicate one SE above and below the empirical risk. The minimized empirical risks are shown as circles, while the optimal solutions determined by the "one-SE rule" are highlighted as filled dots. The band formed by the two horizontal dashed lines indicates the one-SE region associated with the minimum empirical risk. Note that the two graphs in the second column are identical: Because all item responses are retained, $\lambda_{(d)}$ need not be re-selected. LD: Local dependence. $\lambda_{(c)}$, $\lambda_{(d)}$, $\lambda_{(g)}$: Penalty weights for continuous manifest variables (MVs), discrete MVs, and the latent density.

Based on a full-data fitting with the optimal penalty weights, we summarize the residual correlation statistics (Eq. 25) for all pairs of MVs in a graphical table (Fig. 2). Dependencies among the RT variables are well explained by the slowness factor, and, similarly, dependencies among the item responses are well explained by the ability factor. The largest residual correlation in the left panel of Fig. 2 is 0.1 (between the log-RT of CM571Q01 and CM603Q01), with a 90% bootstrap CI of [0.08, 0.12]. In contrast, we identify some non-ignorable residual dependencies between the log-RT and response of the same item (i.e., the diagonal entries in the right panel of Fig. 2). The within-item residual correlations reach 0.14 (with a bootstrap CI of [0.12, 0.15]) for both items CM034Q01 and CM571Q01. We also find a large negative residual correlation for item CM423Q01: The point estimate is $-0.12$, but the associated bootstrap CI $[-0.13, -0.1]$ covers $-0.1$. Meanwhile, the RT-response dependencies are well explained between items: The off-diagonal statistics in the right panel of Fig. 2 range between $-0.07$ and 0.08.

Figure 2 Residual correlation statistics for the initial fitting of the semiparametric simple-structure model. Left: Residual correlations between two log-transformed response time (log-RT) variables. Middle: Residual correlations between two item/testlet responses. Right: Residual correlations between a log-RT variable and an item/testlet response, in which rows represent log-RT variables and columns represent responses. Positive residual correlations are shown in red, negative ones in blue. A darker color indicates a larger magnitude (Color figure online).

Given the above findings, we conclude that a simple factor structure largely suffices for modeling the item responses and RT in the 2015 PISA mathematics data. For two of the 13 items (CM034Q01 and CM571Q01), however, the associations between item-level response speed and accuracy are not fully explained by individual differences in general processing speed and ability. To avoid any adverse impact of the unaccounted-for residual dependencies, we dropped the log-RT variables for items CM034Q01 and CM571Q01 while retaining their responses, which results in a modified simple-structure model with $m_1 = 11$ continuous MVs and $m_2 = 13$ discrete ones. Steps 1-3 (see Sect. 3.2) were then repeated. The optimal $\lambda_{(c)}$ remains $10^{-4}$, whereas the optimal $\lambda_{(g)}$ increases to $10^{-3}$ (see the second row of Fig. 1); the optimal $\lambda_{(d)} = 10^{-1}$ is retained because no change has been made to the item response variables. No large residual correlations remain this time: The residual correlations range over $[-0.04, 0.08]$ among log-RT variables, $[-0.03, 0.02]$ among response variables, and $[-0.11, 0.08]$ across responses and log-RT. Similar to the initial fitting, the only residual correlation beyond $\pm 0.1$ is observed between the response and log-RT of item CM423Q01; however, the 90% bootstrap CI of the statistic, $[-0.13, -0.09]$, contains $-0.1$. We therefore proceed to interpret the fitted densities based on the updated fitting.

4.2. Conditional Densities of Manifest Variables

Estimated conditional densities and means of the log-RT variables given the slowness factor are plotted in Fig. 3. Two major patterns are of interest. First, although the high and low ends of the LV scale roughly map onto the longest and shortest RT for a majority of items/testlets, which justifies our decision to label the LV "slowness," the conditional mean function appears to decrease at the high end for all items/testlets except CM442Q02, on which we imposed the monotonicity constraint (Eq. 8). In most cases, however, the observed downward trend cannot be distinguished from a flat one due to large sampling variability, which manifests as wider bootstrap confidence bands in those areas. For item CM603Q01 and testlet CM564, the downturn at the high end cannot be explained away by sampling variability: Among slow respondents, the slower they respond to the first nine items/testlets, the faster they tend to respond to the last two. The second observation concerns the dips in the conditional mean functions when latent slowness is between $-1$ and 0. Taking sampling variability into account, the dips are not substantial for CM603Q01 and CM564; also recall that the conditional mean function was forced to be non-decreasing for item CM442Q02. As such, the observed dips reflect a negative association between the above triplet and the remaining items/testlets for the subset of respondents whose latent slowness falls slightly below average.

Figure 3 Estimated conditional densities and means for log-10 response time (RT) variables (rescaled to [0, 1]). Each panel corresponds to a single item/testlet. Conditional densities of manifest variables given the slowness factor are visualized as contours in gray. Estimated conditional means are superimposed as solid curves in black. Dotted lines represent 90% bootstrap confidence bands (CBs) for estimated conditional mean curves.

Per a referee’s request, we also examine the relationship between item-level RT and the ability factor. In our simple-structure model, the log-RT variables $Y_{ij}$, $j = 1, \dots, m_1$, do not directly load on the ability factor $X_{i2}^*$. Nevertheless, it remains possible to characterize the predictive distribution $Y_{ij} \mid X_{i2}^*$ by combining the conditional distribution of the slowness factor given the ability factor, i.e., $X_{i1}^* \mid X_{i2}^*$, with the conditional distribution $Y_{ij} \mid X_{i1}^*$ (shown in Fig. 3). Such RT-ability associations turn out to be weak in the present data set; detailed results can be found in the supplementary document.

Estimated item/testlet response functions are displayed in Fig. 4. Due to the large penalty weight (i.e., $10^{-1}$), the fitted curves are smooth. For dichotomous items, the estimated curves for category 1 (i.e., the correct answer) are largely S-shaped and typically have a restricted range (narrower than the entire interval [0, 1]). Similarly, the estimated testlet response functions for the first and last categories also appear to have (often different) upper asymptotes. Some items, e.g., CM305Q01 and CM423Q01, discriminate poorly, as manifested by relatively flat IRFs.

Figure 4 Estimated conditional densities for discrete response variables, also known as item response functions (IRFs). Each panel corresponds to a single item/testlet. Curves for different categories are shown in different line types. Dotted lines represent 90% bootstrap confidence bands (CBs) for estimated IRFs.

4.3. Latent Density and Scores

A contour plot for the estimated two-dimensional LV density, which is computed from the estimated B-spline copula density with standard normal marginals (Eq. 13), is provided in the left panel of Fig. 5. High-ability respondents tend to respond at a moderate speed, whereas low-ability respondents may respond either very rapidly or very slowly. The shape of the density contours is nowhere near elliptical, which calls into question the standard practice of fitting a bivariate normal LV density. A better parameterization of the latent density for these data would be a mixture of two bivariate normals: one with a positive correlation for fast responders (i.e., slowness $<0$) and the other with a negative correlation for slow responders (i.e., slowness $>0$). A similar pattern is observed when we plot the ability EAP scores against the slowness EAP scores (right panel of Fig. 5), with the exception that EAP scores tend to be less variable than the true LVs.
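The mixture parameterization suggested by the contour plot can be sketched numerically. The snippet below is a minimal illustration, not a fitted model: the component means, mixing weight, and correlation magnitudes are hypothetical, and only the signs of the correlations follow the pattern described above.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters: positive within-component correlation for fast
# responders (slowness < 0), negative for slow responders (slowness > 0).
comp_fast = multivariate_normal(mean=[-0.8, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
comp_slow = multivariate_normal(mean=[0.8, 0.0], cov=[[1.0, -0.5], [-0.5, 1.0]])
w = 0.5  # mixing weight (illustrative)

def mixture_density(x):
    """Two-component bivariate normal mixture density at point(s) x."""
    return w * comp_fast.pdf(x) + (1 - w) * comp_slow.pdf(x)

# Sanity check: the mixture density integrates to 1 (Riemann sum on a grid).
grid = np.linspace(-6, 6, 241)
xx, yy = np.meshgrid(grid, grid)
pts = np.column_stack([xx.ravel(), yy.ravel()])
total = mixture_density(pts).sum() * (grid[1] - grid[0]) ** 2
```

Contouring `mixture_density` over the grid would produce the non-elliptical, bent shape qualitatively similar to the left panel of Fig. 5.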

Figure 5 Left: Estimated joint density for the slowness and ability factors (contours in gray) and the conditional mean of ability given slowness (black solid curve). Dotted lines represent 90% bootstrap confidence bands (CBs) for the estimated conditional mean curve. Right: Scatter plot for the expected a posteriori (EAP) scores of ability and slowness. A bivariate kernel density estimate (gray solid contours) and a smoothing spline regression line (black solid curve) are superimposed.

To better visualize the relationship between the two latent factors in the population, we also plot the conditional mean of ability given slowness (the black solid curve in the left panel of Fig. 5); in other words, a nonlinear regression that predicts ability from slowness. The $\eta^2$ statistic (Eq. 31) of this population nonlinear regression is 0.45 with a 90% bootstrap CI of [0.44, 0.47], indicating a strong association (Cohen, 1988, Chapter 9). Stated differently, knowing respondents’ processing speed on average reduces the uncertainty (measured by variance) in their mathematics ability by 45%. Recall that Zhan et al. (2018) reported a correlation of $-0.2$ between the speed (i.e., the reversal of slowness) and ability factors under an assumption of bivariate normality, which implies $\eta^2 = (-0.2)^2 = 0.04$. The divergent conclusion reached by Zhan et al. (2018) is likely attributable to the restrictive parameterization of their model: Forcing the LV density to be bivariate normal fails to capture the nonlinear relationship.
In addition, a smoothing spline regression fitted to the EAP scores (the black solid curve in the right panel of Fig. 5) suggests a similar predictive relationship: The observed multiple $R^2$ statistic is 0.54, even higher than the population $\eta^2$.
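The gap between a correlation-based summary and $\eta^2$ can be illustrated with a small simulation. The sketch below is purely hypothetical (an inverted-U conditional mean with Gaussian noise, not the PISA estimates); it shows how $\eta^2$, the variance of the conditional mean divided by the total variance, can be large even when the linear correlation, and hence the normal-theory squared correlation, is near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical inverted-U relation echoing the contour plot: ability peaks
# at moderate slowness.  Functional form and noise level are illustrative.
slowness = rng.standard_normal(n)
cond_mean = 1.0 - slowness**2                      # E[ability | slowness]
ability = cond_mean + rng.standard_normal(n) * np.sqrt(2.0)

# eta^2 = Var(E[ability | slowness]) / Var(ability): variance explained by
# the (possibly nonlinear) regression of ability on slowness.
eta_sq = cond_mean.var() / ability.var()

# The squared Pearson correlation only captures the linear component.
r_sq = np.corrcoef(slowness, ability)[0, 1] ** 2
```

Under this setup the population $\eta^2$ is 0.5 while the linear correlation is zero, mirroring how a bivariate-normal analysis can report a weak association that a nonlinear regression reveals to be strong.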

As slowness/speed is a useful predictor of ability, incorporating item-level RT information is anticipated to improve the precision of IRT scale scores. Inspired by Bolsinova and Tijmstra (2018), we compare ability scores from the two-dimensional simple-structure model to those from the unidimensional semiparametric IRT model fitted to responses only, in terms of their predictive precision (Sect. 2.5). The two sets of EAP scores are almost perfectly correlated (sample Pearson correlation > 0.99; see the left panel of Fig. 6). We then plot the predictive precisions associated with the two sets of EAP scores in the right panel of Fig. 6. Because the test is short and some items (e.g., CM305Q01 and CM423Q01) have low discriminative power (manifested by flat item response functions), the predictive precisions are not high in general. Pooling across the entire sample, the mean predictive precision based on the unidimensional model is 4.68 with an interquartile range (IQR) of [3.44, 5.71], and the mean predictive precision based on the two-dimensional simple-structure model is 5.15 with an IQR of [3.57, 6.45]. That is, the two-dimensional model improves the predictive precision of ability scores by 10.1% on average.

Figure 6 Comparing expected a posteriori (EAP) scores for ability (left) and the associated predictive precisions (right) between the one-dimensional (1D) response-only model and the two-dimensional (2D) simple-structure model. In both panels, the dashed diagonal line in gray indicates equality.

To assess scoring precision at different slowness and ability levels, we split the sample into quintile groups by the slowness and ability EAP scores (from the two-dimensional model), respectively. A group-by-group summary of scoring precisions is provided in Table 3. When groups are formed by slowness scores, larger increases in precision are typically observed in higher quintile groups; the percentage of improvement can be as high as 18.64% in the fifth quintile group. In contrast, when groups are determined by ability scores, the largest improvement is attained in the middle quintile group (15.89%); the one-dimensional ability scores in the fifth quintile group are almost as precise as the two-dimensional scores.

Table 3 Predictive precisions of ability scores in quintile groups.

Groups are determined by the slowness (left columns) and ability (right columns) scores computed from the two-dimensional simple-structure model.

Avg Prec: Average predictive precision within each group. 1D: One-dimensional model. 2D: Two-dimensional model.

5. Discussion

In the present paper, we perform a joint factor analysis of item response and RT data from the 2015 PISA mathematics assessment. In line with many previous studies that handled this type of data, our model features a simple factor structure with two LVs: The ability factor is indicated solely by item responses, the slowness factor is indicated solely by log-transformed RT variables, and the two LVs are permitted to covary in the population of respondents. The unique contribution of our work lies in the use of a semiparametric measurement model: We do not impose restrictive functional forms of dependencies or distributional assumptions beyond the simple factor structure. Our model therefore fits the data as well as possible insofar as a simple factor structure is deemed proper. We approximate the functional parameters in the semiparametric factor model by cubic splines and estimate the resulting coefficients by PML; the penalty weights involved in the objective function are empirically selected via cross-validation. Inferences about model fit statistics and estimated functional parameters are conducted based on the (nonparametric) bootstrap.

5.1. Implications

The semiparametric fitting reveals novel patterns that have not yet been noticed in the existing literature, with profound implications for the use of RT information in large-scale educational assessment.

First, a simple factor structure for ability and slowness fits the 2015 PISA mathematics data reasonably well. Only two pairs of MVs exhibit excessive dependencies that are not well explained by the simple-structure model: Both pairs comprise the response and RT of the same item. Furthermore, including or excluding the RT variables of the two flagged pairs is inconsequential for model-based inferences. Our finding supports the prevalent psychometric theory that between-person heterogeneity in item response behaviors reflects individual differences in ability and general processing speed. However, the existence of within-item local dependence between responses and RT, albeit not influential for the current analysis of the PISA data, should be reassessed in other applications of simple-structure factor models.

Second, commonly used parametric factor models are too simple to fully capture the MV–LV relations. Our semiparametric model implies that the conditional means of log-transformed RT variables are generally increasing but nonlinear functions of the slowness factor; the conditional variances also appear to be non-constant for some items. The most commonly used log-normal RT model, however, implies a linear conditional mean and a constant conditional variance and is thus evidently misspecified. As Liu and Wang (2022) also reported that the log-normal RT model fits substantially worse than the semiparametric model in a different empirical example, caution is advised in choosing a suitable measurement model for item-level RT. Meanwhile, a large penalty weight is selected for the semiparametric IRT model, and consequently the fitted IRFs are smooth. While the shapes of the IRFs closely resemble logistic curves, the presence of lower and upper asymptotes hints at a 4PL model (Barton and Lord, 1981) rather than the more popular 1PL and 2PL models in psychometric operations.
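For reference, the 4PL response function alluded to above augments the 2PL logistic curve with a lower asymptote $c$ and an upper asymptote $d$. A minimal sketch follows; the parameter values are illustrative, not estimates from the PISA data.

```python
import numpy as np

def irf_4pl(theta, a, b, c, d):
    """Four-parameter logistic IRF (Barton & Lord, 1981):
    P(theta) = c + (d - c) / (1 + exp(-a * (theta - b))),
    with discrimination a, difficulty b, lower asymptote c, upper asymptote d."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

# Illustrative parameters only: the curve rises from 0.15 toward 0.90,
# mimicking an IRF with a restricted range narrower than [0, 1].
theta = np.linspace(-4, 4, 9)
p = irf_4pl(theta, a=1.2, b=0.0, c=0.15, d=0.9)
```

At $\theta = b$ the curve passes through the midpoint $(c + d)/2$, and it is monotone increasing whenever $a > 0$ and $d > c$.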

Third, the ability and slowness factors are strongly associated, which is probably the most surprising observation, since a weak correlation was reported in Zhan et al.’s (2018) analysis of the same data. Our disparate finding is ascribed to the use of a nonparametric latent density estimator, whereas the LV density is by default assumed to be (multivariate) normal in the vast majority of factor analysis applications. This merely echoes a well-known fact: Overly restrictive assumptions may lead to poorly fitting models and subsequently biased inferences. Diagnostics for non-normal LVs and measurement models equipped with nonparametric LV densities should be added to the routine toolbox of psychometricians. Future research is encouraged to examine the extent to which nonlinear factor models with non-normal latent densities can be beneficial in other assessment contexts.

Fourth, including item-level RT in the measurement model improves the precision of ability scores, an expected consequence given that the ability factor can be well predicted by the slowness factor. While RT carries additional information about respondents’ ability, induced by the association between ability and general processing speed, it remains unclear whether RT should be officially used for scoring purposes in high-stakes educational assessment. On the one hand, the joint factor model estimated in the present paper yields an approximately 10% increase in predictive precision for ability scores on average. Adaptive tests based on such a joint factor model may need fewer test items to reach the desired measurement precision, leading to more cost-effective test administrations. On the other hand, the same measurement model may no longer hold once respondents are aware that response speed somehow affects their performance scores. In that case, a re-calibration of the joint factor model and a re-evaluation of the usefulness of RT information are necessary.

5.2. Limitations

There are also a number of limitations to be addressed by future investigation.

First, the selection of penalty weights by multifold cross-validation is time-consuming. A referee suggested that computing a one-sample estimate of the cross-validation error (e.g., the Akaike information criterion; AIC) or a large-sample approximation to the Bayesian marginal log-likelihood (e.g., the Bayesian information criterion; BIC) would be computationally advantageous. For nonparametric/semiparametric models using penalized smoothing splines, however, we must substitute a properly defined effective degrees of freedom (edf) for the number of parameters in the usual formulas of those information criteria. The ad hoc definition of edf proposed by Liu et al. (2016) for semiparametric IRT modeling can potentially be extended to the present context; however, the performance of the resulting information criteria in penalty weight selection remains unclear and should be investigated in future work.
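For linear smoothers, a standard definition of edf is the trace of the influence ("hat") matrix (Hastie et al., 2009), after which one could compute, e.g., $\mathrm{AIC} = -2\,\ell + 2\,\mathrm{edf}$. The sketch below illustrates this for a generic penalized least-squares smoother; the toy polynomial basis and ridge-type penalty are stand-ins for illustration, not the paper's cubic-spline setup.

```python
import numpy as np

def effective_df(X, P, lam):
    """Effective degrees of freedom of a penalized least-squares smoother:
    edf = trace(X (X'X + lam * P)^{-1} X') = trace((X'X + lam * P)^{-1} X'X)."""
    XtX = X.T @ X
    return float(np.trace(np.linalg.solve(XtX + lam * P, XtX)))

# Toy design: 6 polynomial basis functions with a ridge-type penalty.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
X = np.vander(x, 6)              # columns x^5, ..., x, 1
P = np.eye(6)
P[-1, -1] = 0.0                  # leave the intercept column unpenalized

# edf interpolates between the full parameter count (lam -> 0) and the
# dimension of the unpenalized subspace (lam -> infinity).
edf_small = effective_df(X, P, lam=1e-8)   # close to 6
edf_large = effective_df(X, P, lam=1e6)    # close to 1
```

The same trace formula applies to any quadratic penalty, which is why a workable edf definition is the key ingredient for AIC/BIC-style penalty weight selection.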

Second, the sequential selection of multiple penalty weights does not guarantee that a globally optimal combination is found; it was implemented only as a workaround to alleviate the computational burden. Meanwhile, simultaneous selection on an outer-product grid (cf. Liu et al., 2016) suffers from the “curse of dimensionality” and may be computationally infeasible when the total number of penalty weights to be selected is large. Future research is encouraged to apply and evaluate optimization-based penalty weight selection, such as the “performance-oriented iteration” of Gu (1992), in semiparametric factor analysis. With the aid of optimization-based selection, it is also possible to explore the feasibility of selecting different penalty weights for different MVs, which would further enhance the flexibility of the model.

Third, some of our decisions regarding locally dependent MVs can be refined. While coding each testlet response pattern as a unique category does not lead to any information loss, treating the summed RT within a testlet as a single MV does. In addition, we remove within-item local dependencies between responses and RT by simply excluding the RT variables. Although these treatments suffice for the purpose of the current analysis, it is natural to seek extensions of the proposed model that handle local dependencies more elegantly. In our opinion, the best strategy for a pair of locally dependent MVs is to directly model their bivariate conditional distribution given the LVs. For example, we may express the joint density of two log-RT variables, say $Y_{ij} = y$ and $Y_{ij'} = z$, given the latent slowness variable $X_{i1} = x$, using a logistic density transform with a three-way fANOVA decomposition (Gu, 1995, 2013):

(34) $$f(y, z|x) \propto \exp\big(g^y(y) + g^z(z) + g^{xy}(x, y) + g^{xz}(x, z) + g^{yz}(y, z) + g^{xyz}(x, y, z)\big).$$

Equation 34 involves six functional components, each of which can be approximated via basis expansion under suitable side conditions. Despite the straightforward formulation, simultaneous estimation of a large number of functional parameters proves to be computationally challenging.
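To make the logistic density transform in Eq. 34 concrete, the conditional density can be recovered by exponentiating the fANOVA components and normalizing numerically over a $(y, z)$ grid at each fixed $x$. The $g$-functions below are arbitrary placeholders, not estimated components.

```python
import numpy as np

# Placeholder fANOVA components (illustrative quadratic/bilinear forms,
# not basis-expansion estimates; the three-way interaction is set to zero).
g_y   = lambda y: -0.5 * y**2
g_z   = lambda z: -0.5 * z**2
g_xy  = lambda x, y: 0.3 * x * y
g_xz  = lambda x, z: 0.2 * x * z
g_yz  = lambda y, z: 0.25 * y * z
g_xyz = lambda x, y, z: 0.0 * y

def conditional_density(x, grid):
    """Normalize exp(g(...)) over a (y, z) grid to obtain f(y, z | x)."""
    yy, zz = np.meshgrid(grid, grid, indexing="ij")
    log_f = (g_y(yy) + g_z(zz) + g_xy(x, yy) + g_xz(x, zz)
             + g_yz(yy, zz) + g_xyz(x, yy, zz))
    f = np.exp(log_f - log_f.max())        # stabilize before exponentiating
    step = grid[1] - grid[0]
    return f / (f.sum() * step**2)         # numeric normalization constant

grid = np.linspace(-6, 6, 121)
f = conditional_density(0.5, grid)         # f(y, z | x = 0.5) on the grid
```

In an actual fit, each $g$ component would be a penalized basis expansion, and the same grid normalization would supply the intractable proportionality constant in Eq. 34.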

Fourth, a referee made an important point that the residual correlation statistic (Eq. 25) only captures linear dependencies; it cannot rule out the existence of nonlinear residual dependencies, which is a major limitation of our diagnostic procedure. Various measures of nonlinear association exist: Recent examples include the Hellinger correlation (Geenens and Lafaye de Micheaux, 2022) and the Wasserstein dependence coefficient (Mordant and Segers, 2022; see also Chatterjee, 2022, for a review). However, those measures are often less intuitive to interpret, as no common rules of thumb have been developed. As an alternative, one may fit an extended semiparametric factor model with bivariate conditional densities (Eq. 34) and identify nonlinear dependencies from graphical displays of the estimated conditional densities.
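As a crude, purely illustrative alternative (not one of the measures cited above, nor the paper's diagnostic), a plug-in mutual-information estimate from a two-dimensional histogram can flag dependencies that a Pearson correlation misses entirely:

```python
import numpy as np

def mi_histogram(u, v, bins=20):
    """Plug-in mutual-information estimate (in nats) from a 2-D histogram;
    near zero when the binned variables look independent, positive otherwise."""
    h, _, _ = np.histogram2d(u, v, bins=bins)
    p = h / h.sum()                             # joint cell probabilities
    pu = p.sum(axis=1, keepdims=True)           # marginal of u
    pv = p.sum(axis=0, keepdims=True)           # marginal of v
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (pu @ pv)[mask])).sum())

rng = np.random.default_rng(2)
x = rng.standard_normal(50_000)
y = x**2 + 0.1 * rng.standard_normal(50_000)    # dependent but uncorrelated

r = np.corrcoef(x, y)[0, 1]     # near zero: the linear diagnostic misses it
mi = mi_histogram(x, y)         # clearly positive: dependence is detected
```

Like the measures cited above, this estimate lacks established rules of thumb for effect size, which is precisely the interpretability issue raised in the text.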

Fifth, the proposed semiparametric factor model can be generalized in a number of ways. Sometimes, multiple latent constructs are simultaneously measured by an instrument (e.g., in personality assessment); hence, a joint factor analysis of responses and RT for those measures involves at least three LVs. Such extensions of the current semiparametric simple-structure model suffer from a two-fold “curse of dimensionality”: The number of tensor-product basis functions grows exponentially as the dimension of a functional parameter’s domain increases, and the number of tensor-product quadrature points for likelihood approximation also increases exponentially with the dimension of the LVs. While the EM algorithm with numerical quadrature can be replaced by stochastic approximation (Cai, 2010a, 2010b; Gu and Kong, 1998) to handle models with higher-dimensional LVs, reduced fANOVA parameterizations for conditional densities (Gu, 1995, 2013) and hierarchical formulations of the B-spline copula (Kauermann et al., 2013) are handy for constructing economical approximations of multivariate functional parameters.

Sixth, resampling-based procedures (e.g., the bootstrap) are time-consuming even when parallel processing via OpenMP (Dagum and Menon, 1998) is enabled in the current implementation of PML estimation. For parametric models, inferential procedures based on large-sample approximations are more computationally efficient. However, it is generally more difficult to establish large-sample results for semiparametric/nonparametric models, as the functional parameters are infinite-dimensional. The asymptotic theory for semiparametric/nonparametric measurement models has yet to be established and is left for future research.

Last but not least, we emphasize that semiparametric approaches are better suited for analyses that are exploratory and data-driven in nature. There are also scenarios in which confirmatory, theory-driven model building is preferred: for instance, when the test is designed based on cognitive theory and administered in a controlled laboratory setting (e.g., the well-known “mental rotation” example in the RT literature; Borst et al., 2011). One prominent example of theory-driven psychometrics is the integration of diffusion decision models with factor analysis (e.g., Kang et al., 2022, 2023a, 2023b). Data-driven semiparametric models and theory-driven parametric models are both important yet distinct tools for advancing psychometricians’ understanding of the role of processing speed in test-taking behavior.

Funding

The work is sponsored by the National Science Foundation under grant No. 1826535.

Declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The dataset analyzed during the current study is available in the OECD PISA Database (https://www.oecd.org/pisa/data/).

Footnotes

The manuscript was handled by the ARCS Editor Dr. Nidhi Kohli.

1 With a slight abuse of terminology, both probability density functions for continuous random variables and probability mass functions for discrete random variables are referred to as densities.

2 Slowness is the reversal of speed. We abide by the convention that the LV is positively associated with the MV.

3 For simplicity, the same set of basis functions is used for the LVs in Eqs. 6, 9, and 14.

4 Zhan et al. (2018) did not delete any extreme RT entries in their analysis. They performed Bayesian estimation with a somewhat informative prior configuration, which is presumably more stable in the presence of outlying observations.

5 Note that this scoring function was also applied before computing the item-total correlation statistics in Table 2.



References

Abrahamowicz, M., & Ramsay, J. O. (1992). Multicategorical spline model for item response theory. Psychometrika, 57(1), 5–27.
Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item-response model. ETS Research Report Series, 1981(1), 1–8.
Bauer, D. J. (2005). A semiparametric approach to modeling nonlinear relations among latent variables. Structural Equation Modeling, 12(4), 513–535.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
Bolsinova, M., De Boeck, P., & Tijmstra, J. (2017). Modelling conditional dependence between response time and accuracy. Psychometrika, 82(4), 1126–1148.
Bolsinova, M., & Maris, G. (2016). A test for conditional independence between response time and accuracy. British Journal of Mathematical and Statistical Psychology, 69(1), 62–79.
Bolsinova, M., & Molenaar, D. (2018). Modeling nonlinear conditional dependence between response time and accuracy. Frontiers in Psychology, 9, 1525.
Bolsinova, M., & Tijmstra, J. (2016). Posterior predictive checks for conditional independence between response time and accuracy. Journal of Educational and Behavioral Statistics, 41(2), 123–145.
Bolsinova, M., & Tijmstra, J. (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71(1), 13–38.
Bolsinova, M., Tijmstra, J., & Molenaar, D. (2017). Response moderation models for conditional dependence between response time and response accuracy. British Journal of Mathematical and Statistical Psychology, 70(2), 257–279.
Borst, G., Kievit, R. A., Thompson, W. L., & Kosslyn, S. M. (2011). Mental rotation is not easily cognitively penetrable. Journal of Cognitive Psychology, 23(1), 60–75.
Cai, L. (2010a). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.
Cai, L. (2010b). Metropolis–Hastings Robbins–Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.
Chatterjee, S. (2022). A survey of some recent developments in measures of association. arXiv preprint arXiv:2211.04702.
Chen, Y., & Yang, Y. (2021). The one standard error rule for model selection: Does it work? Stats, 4(4), 868–892.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates.
Currie, I. D., Durban, M., & Eilers, P. H. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2), 259–280.
Dagum, L., & Menon, R. (1998). OpenMP: An industry standard API for shared-memory programming. IEEE Computational Science and Engineering, 5(1), 46–55.
De Boeck, P., & Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102.
De Boor, C. (1978). A practical guide to splines. Springer.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1), 1–22.
Deribo, T., Kroehne, U., & Goldhammer, F. (2021). Model-based treatment of rapid guessing. Journal of Educational Measurement, 58(2), 281–303.
Dou, X., Kuriki, S., Lin, G. D., & Richards, D. (2021). Dependence properties of B-spline copulas. Sankhya A, 83(1), 283–311.
Efron, B., & Tibshirani, R. (1994). An introduction to the bootstrap. Taylor & Francis.
Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 89–102.
Falk, C. F., & Cai, L. (2016a). Maximum marginal likelihood estimation of a monotonic polynomial generalized partial credit model with applications to multiple group analysis. Psychometrika, 81(2), 434–460.
Falk, C. F., & Cai, L. (2016b). Semiparametric item response functions in the context of guessing. Journal of Educational Measurement, 53(2), 229–247.
Finn, B. (2015). Measuring motivation in low-stakes assessments. ETS Research Report Series, 2015(2), 1–17.
Geenens, G., & Lafaye de Micheaux, P. (2022). The Hellinger correlation. Journal of the American Statistical Association, 117(538), 639–653.
Glas, C. A., & van der Linden, W. J. (2010). Marginal likelihood inference for a model for item responses and response times. British Journal of Mathematical and Statistical Psychology, 63(3), 603–626.
Goldhammer, F. (2015). Measuring ability, speed, or both? Challenges, psychometric solutions, and what can be gained from experimental control. Measurement: Interdisciplinary Research and Perspectives, 13(3–4), 133–164.
Gu, C. (1992). Cross-validating non-Gaussian data. Journal of Computational and Graphical Statistics, 1(2), 169–179.
Gu, C. (1995). Smoothing spline density estimation: Conditional distribution. Statistica Sinica, 709–726.
Gu, C. (2013). Smoothing spline ANOVA models. Springer.
Gu, M. G., & Kong, F. H. (1998). A stochastic approximation algorithm with Markov chain Monte-Carlo method for incomplete data estimation problems. Proceedings of the National Academy of Sciences, 95(13), 7270–7274.
Gulliksen, H. (1950). Theory of mental tests. Wiley.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183–202.
Kang, H.-A. (2017). Penalized partial likelihood inference of proportional hazards latent trait models. British Journal of Mathematical and Statistical Psychology, 70(2), 187–208.
Kang, I., De Boeck, P., & Ratcliff, R. (2022). Modeling conditional dependence of response accuracy and response time with the diffusion item response theory model. Psychometrika, 1–24.
Kang, I., Jeon, M., & Partchev, I. (2023a). A latent space diffusion item response theory model to explore conditional dependence between responses and response times. Psychometrika, 1–35.
Kang, I., Molenaar, D., & Ratcliff, R. (2023b). A modeling framework to examine psychological processes underlying ordinal responses and response times of psychometric data. Psychometrika, 1–35.
Kauermann, G., Schellhase, C., & Ruppert, D. (2013). Flexible copula density estimation with penalized hierarchical B-splines. Scandinavian Journal of Statistics, 40(4), 685–705.
Kyllonen, P. C., & Zu, J. (2016). Use of response time for measuring cognitive ability. Journal of Intelligence, 4(4), 14.
Lee, Y.-H., & Chen, H. (2011). A review of recent response-time analyses in educational testing. Psychological Test and Assessment Modeling, 53(3), 359.
Lee, Y.-H., & Jia, Y. (2014). Using response time to investigate students’ test-taking behaviors in a NAEP computer-based study. Large-Scale Assessments in Education, 2(1), 1–24.
Liu, Y., Magnus, B. E., & Thissen, D. (2016). Modeling and testing differential item functioning in unidimensional binary item response models with a single continuous covariate: A functional data analysis approach. Psychometrika, 81, 371–398.
Liu, Y., & Wang, W. (2022). Semiparametric factor analysis for item-level response time data. Psychometrika, 87(2), 666–692.
Liu, Y., & Yang, J. S. (2018a). Bootstrap-calibrated interval estimates for latent variable scores in item response theory. Psychometrika, 83(2), 333–354.
Liu, Y., & Yang, J. S. (2018b). Interval estimation of latent variable scores in item response theory. Journal of Educational and Behavioral Statistics, 43(3), 259–285.
Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. Oxford University Press.
McDonald, R. P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6(4), 379–396.
Meng, X.-B., Tao, J., & Chang, H.-H. (2015). A conditional joint modeling approach for locally dependent item responses and response times. Journal of Educational Measurement, 52(1), 1–27.
Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. (2015a). A bivariate generalized linear item response theory modeling framework to the analysis of responses and response times. Multivariate Behavioral Research, 50(1), 56–74.
Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. (2015b). A generalized linear factor model approach to the hierarchical framework for responses and response times. British Journal of Mathematical and Statistical Psychology, 68(2), 197–219.
Mordant, G., & Segers, J. (2022). Measuring dependence between random vectors via optimal transport. Journal of Multivariate Analysis, 189.
Nelsen, R. B. (2006). An introduction to copulas. Springer.
Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer.
OECD. (2016). PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy. PISA, OECD Publishing.
Pek, J., Sterba, S. K., Kok, B. E., & Bauer, D. J. (2009). Estimating and visualizing nonlinear relations among latent variables: A semiparametric approach. Multivariate Behavioral Research, 44(4), 407–436.
Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer-based licensure examinations. Educational Measurement: Issues and Practice, 35(1), 38–47.
Ramsay, J. O., & Winsberg, S. (1991). Maximum marginal likelihood estimation for semiparametric item analysis. Psychometrika, 56(3), 365–379.
Ranger, J., & Kuhn, J.-T. (2012). A flexible latent trait model for response times in tests. Psychometrika, 77, 31–47.
Ranger, J., Ortner, T.. (2012). The case of dependency of responses and response times: A modeling approach based on standard latent trait models. Psychological Test and Assessment Modeling, 54(2), 128.Google Scholar
Rossi, N., Wang, X., Ramsay, J. O.. (2002). Nonparametric item response function estimates with the EM algorithm. Journal of Educational and Behavioral Statistics, 27(3), 291317.CrossRefGoogle Scholar
Sinharay, S.. (2020). Detection of item preknowledge using response times. Applied Psychological Measurement, 44(5), 376392.CrossRefGoogle ScholarPubMed
Sinharay, S., Johnson, M. S.. (2020). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology, 73(3), 397419.CrossRefGoogle ScholarPubMed
Sklar, M.. (1959). Fonctions de répartition àn dimensions et leurs marges. Publications de l’Institut de statistique de l’Université de Paris, 8, 229231.Google Scholar
Thissen, D., & Wainer, H. (2001). Test scoring. Taylor & Francis.CrossRefGoogle Scholar
Thorndike, E. L., Bregman, E. O., Cobb, M. V., & Woodyard, E. (1926). The measurement of intelligence. Teachers College Bureau of Publications.CrossRefGoogle Scholar
Thurstone, L. L.. (1937). Ability, motivation, and speed. Psychometrika, 2(4), 249254.CrossRefGoogle Scholar
van der Linden, W. J.. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287308.CrossRefGoogle Scholar
van der Linden, W. J., Glas, C. A.. (2010). Statistical tests of conditional independence between responses and/or response times on test items. Psychometrika, 75(1), 120139.CrossRefGoogle Scholar
van der Linden, W. J., Klein Entink, R. H., Fox, J.-P.. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327347.CrossRefGoogle Scholar
van der Linden, W. J., Scrams, D. J., Schnipke, D. L.. (1999). Using response-time constraints to control for differential speededness in computerized adaptive testing. Applied Psychological Measurement, 23(3), 195210.CrossRefGoogle Scholar
von Davier, M., Khorramdel, L., He, Q., Shin, H. J., Chen, H.. (2019). Developments in psychometric population models for technology-based large-scale assessments: An overview of challenges and opportunities. Journal of Educational and Behavioral Statistics, 44(6), 671705.CrossRefGoogle Scholar
Wang, C., Chang, H.-H., Douglas, J. A.. (2013). The linear transformation model with frailties for the analysis of item response times. British Journal of Mathematical and Statistical Psychology, 66(1), 144168.CrossRefGoogle ScholarPubMed
Wang, C., Fan, Z., Chang, H.-H., Douglas, J. A.. (2013). A semiparametric model for jointly analyzing response times and accuracy in computerized testing. Journal of Educational and Behavioral Statistics, 38(4), 381417.CrossRefGoogle Scholar
Wise, S. L.. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice, 36(4), 5261.CrossRefGoogle Scholar
Wise, S. L., Kong, X.. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163183.CrossRefGoogle Scholar
Woods, C. M., Lin, N.. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102117.CrossRefGoogle Scholar
Yang, J. S., Hansen, M., Cai, L.. (2012). Characterizing sources of uncertainty in item response theory scale scores. Educational and Psychological Measurement, 72(2), 264290.CrossRefGoogle Scholar
Zhan, P., Liao, M., Bian, Y.. (2018). Joint testlet cognitive diagnosis modeling for paired local item dependence in response times and response accuracy. Frontiers in Psychology, 9, 607.CrossRefGoogle ScholarPubMed
Zhang, D., & Davidian, M. (2001). Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57(3), 795–802.CrossRefGoogle Scholar
Zhang, X., Wang, C., Weiss, D. J., Tao, J.. (2021). Bayesian inference for IRT models with non-normal latent trait distributions. Multivariate Behavioral Research, 56(5), 703723.CrossRefGoogle ScholarPubMed
Table 1 Descriptive statistics for transformed response time (RT).

Table 2 Descriptive statistics for item responses.

Figure 1 Empirical risks (Eq. 23) and standard errors (SE; Eq. 24). The rows of the graphical table correspond to the initial fitting (with all items) and the updated fitting (without the response times of CM034Q01 and CM571Q01). The columns represent the three stages of penalty weight selection (see Sect. 2.3). Within each panel, empirical risk values are plotted as functions of base-10 log-transformed penalty weights. Vertical bars indicate one SE above and below the empirical risk. The minimized empirical risks are shown as circles, and the optimal solutions determined by the “one-SE rule” are highlighted as filled dots. The band formed by two horizontal dashed lines indicates the one-SE region associated with the minimum empirical risk. Note that the two graphs in the second column are identical: Because all item responses are retained, $\lambda_{(d)}$ does not need to be re-selected. LD: Local dependence. $\lambda_{(c)}$, $\lambda_{(d)}$, $\lambda_{(g)}$: Penalty weights for continuous manifest variables (MVs), discrete MVs, and the latent density, respectively.
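The “one-SE rule” referenced in this caption admits a compact computational statement: among all candidate penalty weights whose empirical risk lies within one standard error of the minimized risk, choose the largest (i.e., most heavily penalized, hence most parsimonious) one. The sketch below is a generic illustration of that rule, not the authors’ implementation; the function and variable names are hypothetical.

```python
import numpy as np

def one_se_rule(log_lambdas, risks, ses):
    """Select a penalty weight by the one-SE rule: among all candidates
    whose empirical risk is within one SE of the minimum risk, return
    the largest log penalty weight (the most parsimonious solution)."""
    log_lambdas = np.asarray(log_lambdas, dtype=float)
    risks = np.asarray(risks, dtype=float)
    ses = np.asarray(ses, dtype=float)
    i_min = int(np.argmin(risks))            # minimizer of the empirical risk
    threshold = risks[i_min] + ses[i_min]    # upper edge of the one-SE band
    within = risks <= threshold              # candidates inside the band
    return float(log_lambdas[within].max())  # largest penalty in the band
```

For instance, if the risk curve dips to its minimum at an interior grid point, the rule deliberately moves to the strongest penalty still inside the one-SE band, trading a negligible increase in risk for a smoother fit.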

Figure 2 Residual correlation statistics for the initial fitting of the semiparametric simple-structure model. Left: Residual correlations between two log-transformed response time (log-RT) variables. Middle: Residual correlations between two item/testlet responses. Right: Residual correlations between a log-RT variable and an item/testlet response, in which rows represent log-RT variables and columns represent responses. Positive residual correlations are shown in red, while negative residual correlations are shown in blue. A darker color indicates a larger magnitude (Color figure online).

Figure 3 Estimated conditional densities and means for log-10 response time (RT) variables (rescaled to [0, 1]). Each panel corresponds to a single item/testlet. Conditional densities of manifest variables given the slowness factor are visualized as contours in gray. Estimated conditional means are superimposed as solid curves in black. Dotted lines represent 90% bootstrap confidence bands (CBs) for estimated conditional mean curves.

Figure 4 Estimated conditional densities for discrete response variables, also known as item response functions (IRFs). Each panel corresponds to a single item/testlet. Curves for different categories are shown in different line types. Dotted lines represent 90% bootstrap confidence bands (CBs) for estimated IRFs.

Figure 5 Left: Estimated joint density for the slowness and ability factors (contours in gray) and the conditional mean of ability given slowness (black solid curve). Dotted lines represent 90% bootstrap confidence bands (CBs) for the estimated conditional mean curve. Right: Scatter plot for the expected a posteriori (EAP) scores of ability and slowness. A bivariate kernel density estimate (gray solid contours) and a smoothing spline regression line (black solid curve) are superimposed.

Figure 6 Comparing expected a posteriori (EAP) scores for ability (left) and the associated predictive precisions (right) between the one-dimensional (1D) response-only model and the two-dimensional (2D) simple-structure model. In both panels, the dashed diagonal line in gray indicates equality.
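The EAP scores and predictive precisions compared in this figure are posterior summaries of ability given a respondent’s observed data, which in practice are approximated by numerical quadrature over the latent-variable grid. The following is a minimal sketch under simplifying assumptions: a unidimensional 2PL response-only model with a standard normal prior, and hypothetical item parameters `a` and `b` (not the PISA estimates or the fitted semiparametric model).

```python
import numpy as np

def eap_score(responses, a, b, n_quad=61):
    """Approximate the EAP ability score and its posterior standard
    deviation by quadrature under a 2PL model with a N(0, 1) prior."""
    theta = np.linspace(-4.0, 4.0, n_quad)               # quadrature grid
    prior = np.exp(-0.5 * theta ** 2)                    # N(0, 1) kernel (unnormalized)
    logits = a[None, :] * (theta[:, None] - b[None, :])  # 2PL logits: grid x items
    p = 1.0 / (1.0 + np.exp(-logits))                    # correct-response probabilities
    u = np.asarray(responses, dtype=bool)
    like = np.prod(np.where(u, p, 1.0 - p), axis=1)      # response-pattern likelihood
    post = prior * like
    post /= post.sum()                                   # normalized posterior weights
    eap = np.sum(theta * post)                           # posterior mean (EAP)
    psd = np.sqrt(np.sum((theta - eap) ** 2 * post))     # posterior SD
    return eap, psd

a = np.array([1.2, 0.8, 1.5, 1.0])   # hypothetical discriminations
b = np.array([-0.5, 0.0, 0.5, 1.0])  # hypothetical difficulties
eap_hi, psd_hi = eap_score([1, 1, 1, 1], a, b)  # all correct: score above prior mean
eap_lo, psd_lo = eap_score([0, 0, 0, 0], a, b)  # all incorrect: score below prior mean
```

Augmenting the model with a correlated speed/slowness factor amounts to multiplying the likelihood above by the RT measurement model, which is how the 2D model sharpens the posterior and improves predictive precision.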

Table 3 Predictive precisions of ability scores in quintile groups.

Supplementary material: Liu and Wang supplementary material (File, 1.3 MB).