In the present paper a model for describing dynamic processes is constructed by combining the common Rasch model with the concept of structurally incomplete designs. This is accomplished by mapping each item onto a collection of virtual items, one of which is assumed to be presented to the respondent depending on the preceding responses and/or the feedback obtained. It is shown that, in the case of subject control, no unique conditional maximum likelihood (CML) estimates exist, whereas marginal maximum likelihood (MML) proves a suitable estimation procedure. A hierarchical family of dynamic models is presented, and it is shown how to test special cases against more general ones. Furthermore, it is shown that the model presented is a generalization of a class of mathematical learning models, known as Luce's beta-model.
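To make the virtual-item idea concrete, here is a minimal sketch. The ordinary Rasch model is as stated; the virtual-item function below is only one hypothetical instance of subject control (difficulty shifted by `delta` per preceding success), with the symbols `theta`, `beta`, and `delta` assumed for illustration rather than taken from the paper.

```python
import math

def rasch_prob(theta: float, beta: float) -> float:
    """Common Rasch model: P(correct) = exp(theta - beta) / (1 + exp(theta - beta))."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def virtual_item_prob(theta: float, beta: float,
                      n_prior_correct: int, delta: float) -> float:
    """One hypothetical virtual item under subject control: the item actually
    presented has its difficulty shifted by the number of preceding successes."""
    return rasch_prob(theta, beta - delta * n_prior_correct)
```

Each real item thus corresponds to a family of virtual items, and which member is "presented" depends on the response history.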
Consider an old test X consisting of s sections and two new tests Y and Z similar to X consisting of p and q sections respectively. All subjects are given test X plus two variable sections from either test Y or Z. Different pairings of variable sections are given to each subsample of subjects. We present a method of estimating the covariance matrix of the combined test (X1, ..., Xs, Y1, ..., Yp, Z1, ..., Zq) and describe an application of these estimation techniques to linear, observed-score, test equating.
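In such a design every covariance element is observed in at least one subsample, since each pairing of variable sections is administered to some group. The abstract's estimation method is not reproduced here; the sketch below shows only the naive pairwise-complete stand-in, where `NaN` marks a section not administered to a subject.

```python
import numpy as np

def pairwise_cov(data: np.ndarray) -> np.ndarray:
    """Covariance matrix using, for each pair of variables, only the subjects
    for whom both were observed (NaN = section not administered)."""
    n_vars = data.shape[1]
    cov = np.full((n_vars, n_vars), np.nan)
    for i in range(n_vars):
        for j in range(i, n_vars):
            mask = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            if mask.sum() > 1:
                c = np.cov(data[mask, i], data[mask, j])[0, 1]
                cov[i, j] = cov[j, i] = c
    return cov
```

Unlike a principled joint estimator, a pairwise-pooled matrix need not be positive definite, which is one motivation for the estimation techniques the paper develops.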
The posterior distribution of the bivariate correlation is analytically derived given a data set where x is completely observed but y is missing at random for a portion of the sample. Interval estimates of the correlation are then constructed from the posterior distribution in terms of highest density regions (HDRs). Various choices for the form of the prior distribution are explored. For each of these priors, the resulting Bayesian HDRs are compared with each other and with intervals derived from maximum likelihood theory.
Standard procedures for drawing inferences from complex samples do not apply when the variable of interest θ cannot be observed directly, but must be inferred from the values of secondary random variables that depend on θ stochastically. Examples are proficiency variables in item response models and class memberships in latent class models. Rubin's “multiple imputation” techniques yield approximations of sample statistics that would have been obtained, had θ been observable, and associated variance estimates that account for uncertainty due to both the sampling of respondents and the latent nature of θ. The approach is illustrated with data from the National Assessment of Educational Progress.
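The variance decomposition behind multiple imputation is Rubin's combining rule: the pooled estimate averages the per-imputation estimates, and the total variance adds the within-imputation variance to an inflated between-imputation component. A minimal sketch for a scalar statistic:

```python
import numpy as np

def combine_mi(estimates, variances):
    """Rubin's rules for m imputations: pooled estimate and total variance
    T = W + (1 + 1/m) B, where W is the mean within-imputation variance
    and B the between-imputation variance of the point estimates."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()              # pooled point estimate
    w_bar = u.mean()              # within-imputation variance
    b = q.var(ddof=1)             # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b
```

The `(1 + 1/m)` factor accounts for using a finite number of imputations; it is the extra uncertainty due to the latent nature of θ that a single-imputation analysis would miss.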
Missing data occur in many real-world studies. Knowing the type of missing-data mechanism is important for choosing an appropriate statistical analysis procedure. Many statistical methods assume data are missing completely at random (MCAR) because of the simplicity this affords. It is therefore necessary to test whether this assumption is satisfied before applying those procedures. In the literature, most procedures for testing MCAR were developed under a normality assumption, which is sometimes difficult to justify in practice. In this paper, we propose a nonparametric test of MCAR for incomplete multivariate data that does not require distributional assumptions. The proposed test is carried out by comparing the distributions of the observed data across different missing-pattern groups. We prove that the proposed test is consistent against any distributional differences in the observed data. Simulation shows that the proposed procedure keeps the Type I error well controlled at the nominal level and has good power against a variety of non-MCAR alternatives.
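The underlying idea can be illustrated in the simplest bivariate case: under MCAR, the distribution of a fully observed variable must be the same whether or not the other variable is missing. The sketch below compares the two groups with a plain two-sample Kolmogorov–Smirnov statistic; this is only an illustration of the principle, not the consistent test the paper constructs.

```python
import numpy as np

def ks_stat(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the ECDFs."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return np.abs(cdf_x - cdf_y).max()

def compare_patterns(x, miss_y):
    """Under MCAR, x should be distributed identically for subjects with y
    observed and subjects with y missing; a large statistic is evidence
    against MCAR."""
    return ks_stat(x[~miss_y], x[miss_y])
```

A large value of the statistic, calibrated by a permutation distribution, would indicate that missingness in `y` is related to the level of `x`, contradicting MCAR.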
Mediation analysis constitutes an important part of a treatment study, identifying the mechanisms by which an intervention achieves its effect. The structural equation model (SEM) is a popular framework for modeling such causal relationships. However, current methods impose various restrictions on study designs and data distributions, limiting the utility of the information they provide in real study applications. In particular, in longitudinal studies missing data are commonly addressed under the assumption of missing at random (MAR), and current methods are unable to handle such missing data if parametric assumptions are violated.
In this paper, we propose a new, robust approach to address the limitations of current SEM within the context of longitudinal mediation analysis by utilizing a class of functional response models (FRM). Being distribution-free, the FRM-based approach does not impose any parametric assumption on data distributions. In addition, by extending the inverse probability weighted (IPW) estimates to the current context, the FRM-based SEM provides valid inference for longitudinal mediation analysis under the two most popular missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). We illustrate the approach with both real and simulated data.
In this paper, the constrained maximum likelihood estimation of a two-level covariance structure model with unbalanced designs is considered. The two-level model is reformulated as a single-level model by treating the group-level latent random vectors as hypothetical missing data. Then, the popular EM algorithm is extended to obtain the constrained maximum likelihood estimates. For general nonlinear constraints, the multiplier method is used at the M-step to find the constrained minimum of the conditional expectation. An accelerated EM gradient procedure is derived to handle linear constraints. The empirical performance of the proposed EM-type algorithms is illustrated by some artificial and real examples.
A general approach for fitting a model to a data matrix by weighted least squares (WLS) is studied. This approach consists of iteratively performing (steps of) existing algorithms for ordinary least squares (OLS) fitting of the same model. The approach is based on minimizing a function that majorizes the WLS loss function. The generality of the approach implies that, for every model for which an OLS fitting algorithm is available, the present approach yields a WLS fitting algorithm. In the special case where the WLS weight matrix is binary, the approach reduces to missing data imputation.
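The binary-weight special case can be sketched concretely: to fit a rank-1 model by WLS with 0/1 weights, one alternates between imputing the unweighted (missing) cells with the current model values and recomputing an ordinary least squares rank-1 fit of the completed matrix. This is a minimal illustration of the majorization idea, assuming a rank-1 model purely for concreteness; the paper's approach applies to any model with an OLS algorithm.

```python
import numpy as np

def wls_rank1(x: np.ndarray, w: np.ndarray, n_iter: int = 500) -> np.ndarray:
    """Binary-weight WLS rank-1 fit by iterated OLS steps: cells with w == 0
    are imputed with the current model values, then an ordinary (unweighted)
    rank-1 fit is recomputed via the SVD.  The WLS loss never increases."""
    m = np.where(w == 1, x, x[w == 1].mean())     # initial imputation
    fit = m
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(m, full_matrices=False)
        fit = s[0] * np.outer(u[:, 0], vt[0])     # OLS (unweighted) rank-1 fit
        m = np.where(w == 1, x, fit)              # re-impute missing cells
    return fit
```

Because each OLS step minimizes a function that majorizes the WLS loss at the current fit, the iteration is monotonically convergent, and with binary weights it is exactly missing-data imputation.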
Situations sometimes arise in which variables collected in a study are not jointly observed. This typically occurs because of study design. An example is an equating study where distinct groups of subjects are administered different sections of a test. In the normal maximum likelihood function to estimate the covariance matrix among all variables, elements corresponding to those that are not jointly observed are unidentified. If a factor analysis model holds for the variables, however, then all sections of the matrix can be accurately estimated, using the fact that the covariances are a function of the factor loadings. Standard errors of the estimated covariances can be obtained by the delta method. In addition to estimating the covariance matrix in this design, the method can be applied to other problems such as regression factor analysis. Two examples are presented to illustrate the method.
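The key identity is the standard factor-analytic covariance structure: under an orthogonal factor model the full covariance matrix is determined by the loadings and unique variances, so entries for variable pairs never jointly observed follow from the loadings alone. A minimal sketch (names assumed):

```python
import numpy as np

def implied_cov(loadings: np.ndarray, uniquenesses: np.ndarray) -> np.ndarray:
    """Covariance matrix implied by an orthogonal factor model,
    Sigma = Lambda Lambda' + Psi.  Off-diagonal entries for variables
    that were never jointly observed are recovered from the loadings."""
    return loadings @ loadings.T + np.diag(uniquenesses)
```

Once the loadings are estimated from the sections that were observed, every covariance, including the unidentified cells of the raw likelihood, is read off this implied matrix.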
Existing test statistics for assessing whether incomplete data represent a missing completely at random sample from a single population are based on a normal likelihood rationale and effectively test for homogeneity of means and covariances across missing data patterns. The likelihood approach cannot be implemented adequately if a pattern of missing data contains very few subjects. A generalized least squares rationale is used to develop parallel tests that are expected to be more stable in small samples. Three factors were varied for a simulation: number of variables, percent missing completely at random, and sample size. One thousand data sets were simulated for each condition. The generalized least squares test of homogeneity of means performed close to an ideal Type I error rate for most of the conditions. The generalized least squares test of homogeneity of covariance matrices and a combined test also performed quite well.
Time limits are imposed on many computer-based assessments, and it is common to observe examinees who run out of time, resulting in missingness due to not-reached items. The present study proposes an approach to account for the missing mechanisms of not-reached items via response time censoring. The censoring mechanism is directly incorporated into the observed likelihood of item responses and response times. A marginal maximum likelihood estimator is proposed, and its asymptotic properties are established. The proposed method was evaluated and compared to several alternative approaches that ignore the censoring through simulation studies. An empirical study based on the PISA 2018 Science Test was further conducted.
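The censoring construction is the standard one from survival analysis: an observed response time contributes its density to the likelihood, while a not-reached item contributes only the probability that the time exceeds the limit. The sketch below uses a lognormal response-time model purely as an illustration of that likelihood structure; it is not the paper's full joint model of responses and response times.

```python
import math

def rt_loglik(times, censored, mu, sigma, limit):
    """Log-likelihood of lognormal response times with right censoring at the
    time limit: density for observed RTs, survival P(T > limit) for items
    not reached before time ran out."""
    ll = 0.0
    for t, c in zip(times, censored):
        if c:  # not reached: survival term at the time limit
            z = (math.log(limit) - mu) / sigma
            ll += math.log(1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2))))
        else:  # observed RT: lognormal log-density
            z = (math.log(t) - mu) / sigma
            ll += -math.log(t * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
    return ll
```

Ignoring the censoring, i.e., dropping the survival terms, is exactly what the alternative approaches compared in the simulation studies do.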
Measures of agreement are used in a wide range of behavioral, biomedical, psychosocial, and health-care related research to assess the reliability of diagnostic tests, the psychometric properties of instruments, the fidelity of psychosocial interventions, and the accuracy of proxy outcomes. The concordance correlation coefficient (CCC) is a popular measure of agreement for continuous outcomes. In modern-day applications, data are often clustered, making inference difficult to perform using existing methods. In addition, as longitudinal study designs become increasingly popular, missing data have become a serious issue, and the lack of methods to systematically address this problem has hampered the progress of research in the aforementioned fields. In this paper, we develop a novel approach to tackle the complexities involved in addressing missing data and other related issues for performing CCC analysis within a longitudinal data setting. The approach is illustrated with both real and simulated data.
In knowledge space theory, existing adaptive assessment procedures can only be applied when suitable estimates of their parameters are available. In this paper, an iterative procedure is proposed that updates its parameters as the number of assessments increases. The first assessments are run using parameter values that favor accuracy over efficiency. Subsequent assessments are run using new parameter values estimated on the incomplete response patterns from previous assessments. Parameter estimation is carried out through a new probabilistic model for missing-at-random data. Two simulation studies show that, as the number of assessments increases, the performance of the proposed procedure approaches that of gold standards.
A discussion of alternative constraint systems has been lacking in the literature on correspondence analysis and related techniques. This paper reiterates earlier results that an explicit choice of constraints has to be made which can have important effects on the resulting scores. The paper also presents new results on dealing with missing data and probabilistic category assignment.
Unless data are missing completely at random (MCAR), proper methodology is crucial for the analysis of incomplete data. Consequently, methods for effectively testing the MCAR mechanism become important, and procedures were developed via testing the homogeneity of means and variances–covariances across the observed patterns (e.g., Kim & Bentler in Psychometrika 67:609–624, 2002; Little in J Am Stat Assoc 83:1198–1202, 1988). The current article shows that the population counterparts of the sample means and covariances of a given pattern of the observed data depend on the underlying structure that generates the data, and the normal-distribution-based maximum likelihood estimates for different patterns of the observed sample can converge to the same values even when data are missing at random or missing not at random, although the values may not equal those of the underlying population distribution. The results imply that statistics developed for testing the homogeneity of means and covariances cannot be safely used for testing the MCAR mechanism even when the population distribution is multivariate normal.
A maximum likelihood method of estimating the parameters of the multiple factor model when data are missing from the sample is presented. A Monte Carlo study compares the method with 5 heuristic methods of dealing with the problem. The present method shows some advantage in accuracy of estimation over the heuristic methods but is considerably more costly computationally.
The validity of a test is often estimated in a nonrandom sample of selected individuals. To accurately estimate the relation between the predictor and the criterion we correct this correlation for range restriction. Unfortunately, this corrected correlation cannot be transformed using Fisher's Z transformation, and asymptotic tests of hypotheses based on small or moderate samples are not accurate. We developed a Fisher r to Z transformation for the corrected correlation for each of two conditions: (a) the criterion data were missing due to selection on the predictor (the missing data were MAR); and (b) the criterion was missing at random, not due to selection (the missing data were MCAR). The two Z transformations were evaluated in a computer simulation. The transformations were accurate, and tests of hypotheses and confidence intervals based on the transformations were superior to those that were not based on the transformations.
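For context, the classical correction under direct selection on the predictor (Thorndike's Case II) and the ordinary Fisher transformation are sketched below; the point of the paper is precisely that the ordinary transformation is *not* accurate for the corrected correlation, and the modified transformations it develops are not reproduced here.

```python
import math

def correct_range_restriction(r: float, k: float) -> float:
    """Thorndike Case II correction for direct selection on the predictor;
    k = (unrestricted SD of the predictor) / (restricted SD)."""
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)

def fisher_z(r: float) -> float:
    """Ordinary Fisher transformation z = atanh(r); accurate for an ordinary
    sample correlation but not for the range-restriction-corrected one."""
    return math.atanh(r)
```

With no restriction (k = 1) the correction leaves r unchanged; with k > 1 the corrected correlation exceeds the restricted one, and its sampling distribution differs from that of an ordinary correlation, which is why a tailored Z transformation is needed.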
Standard procedures for estimating item parameters in item response theory (IRT) ignore collateral information that may be available about examinees, such as their standing on demographic and educational variables. This paper describes circumstances under which collateral information about examinees may be used to make inferences about item parameters more precise, and circumstances under which it must be used to obtain correct inferences.
The existing maximum likelihood theory and its computer software in structural equation modeling are established based on linear relationships among manifest variables and latent variables. However, models with nonlinear relationships are often encountered in social and behavioral sciences. In this article, an EM type algorithm is developed for maximum likelihood estimation of a general nonlinear structural equation model. To avoid computation of the complicated multiple integrals involved, the E-step is completed by a Metropolis-Hastings algorithm. It is shown that the M-step can be completed efficiently by simple conditional maximization. Standard errors of the maximum likelihood estimates are obtained via Louis's formula. The methodology is illustrated with results from a simulation study and two real examples.
Weekly cycles in emotion were examined by combining item response modeling and spectral analysis approaches in an analysis of 179 college students' reports of daily emotions experienced over 7 weeks. We addressed the measurement of emotion using an item response model. Spectral analysis and multilevel sinusoidal models were used to identify interindividual differences in intraindividual cyclic change. Simulations and incomplete data designs were used to examine how well this combination of analysis techniques might work when applied to other practical data problems. Empirically, we found systematic individual differences in the extent to which individuals' emotions follow a weekly cycle, and in how such cycles are exhibited. Weekly cycles accounted for very little variance in day to day emotions at the individual level. Analytically, we illustrate how measurement, change, and interindividual difference models from different traditions may be combined in a practical manner to describe some of the complexities of human behavior.
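The sinusoidal part of such an analysis reduces to ordinary regression on sine and cosine terms at the weekly frequency; amplitude and phase follow from the two coefficients. This is a single-subject sketch only, not the multilevel model used in the study.

```python
import numpy as np

def fit_weekly_cycle(days: np.ndarray, y: np.ndarray):
    """OLS fit of y = a + b*sin(2*pi*d/7) + c*cos(2*pi*d/7): recovers the
    mean level, amplitude, and phase of a 7-day cycle in daily reports."""
    w = 2 * np.pi * days / 7.0
    X = np.column_stack([np.ones_like(w), np.sin(w), np.cos(w)])
    a, b, c = np.linalg.lstsq(X, y, rcond=None)[0]
    amplitude = np.hypot(b, c)   # strength of the weekly cycle
    phase = np.arctan2(c, b)     # where in the week the peak falls
    return a, amplitude, phase
```

Interindividual differences in cyclicity correspond to letting the amplitude and phase vary across persons, which is what the multilevel sinusoidal models in the study do.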