A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models

Yeojin Chung; Sophia Rabe-Hesketh; Vincent Dorie; Andrew Gelman; Jingchen Liu

doi:10.1007/s11336-013-9328-2

Published online by Cambridge University Press: 01 January 2025

Yeojin Chung*: Affiliation:
School of Business Administration, Kookmin University
Sophia Rabe-Hesketh: Affiliation:
Graduate School of Education, University of California, Berkeley; Institute of Education, University of London
Vincent Dorie: Affiliation:
Department of Statistics, Columbia University
Andrew Gelman: Affiliation:
Department of Statistics, Columbia University
Jingchen Liu: Affiliation:
Department of Statistics, Columbia University
*: Requests for reprints should be sent to Yeojin Chung, School of Business Administration, Kookmin University, Seoul, South Korea. E-mail: [email protected]

Abstract

Group-level variance estimates of zero often arise when fitting multilevel or hierarchical linear models, especially when the number of groups is small. For situations where zero variances are implausible a priori, we propose a maximum penalized likelihood approach to avoid such boundary estimates. This approach is equivalent to estimating variance parameters by their posterior mode, given a weakly informative prior distribution. By choosing the penalty from the log-gamma family with shape parameter greater than 1, we ensure that the estimated variance will be positive. We suggest a default log-gamma(2,λ) penalty with λ→0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate. We also show that the maximum penalized likelihood estimator with this default penalty is a good approximation to the posterior median obtained under a noninformative prior.
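
A hedged sketch of why a shape parameter above 1 rules out a zero estimate. The notation is an assumption made here for illustration, since the abstract does not spell it out: σ denotes the group-level standard deviation, ℓ(σ) the (profile) log-likelihood, and the penalty is taken to be the log of a gamma(α, λ) density in σ.

```latex
\begin{aligned}
\ell_p(\sigma) &= \ell(\sigma) + \log p(\sigma),
  \qquad p(\sigma) \propto \sigma^{\alpha-1} e^{-\lambda\sigma}, \\
\log p(\sigma) &= (\alpha-1)\log\sigma - \lambda\sigma + \text{const}, \\
\alpha > 1 &\;\Longrightarrow\; \log p(\sigma) \to -\infty \ \text{as}\ \sigma \to 0^{+}
  \;\Longrightarrow\; \arg\max_{\sigma \ge 0} \ell_p(\sigma) > 0, \\
\alpha = 2,\ \lambda \to 0 &\;\Longrightarrow\; \ell_p(\sigma) = \ell(\sigma) + \log\sigma + \text{const}.
\end{aligned}
```

The third line relies on ℓ(σ) staying bounded near σ = 0. Under this reading, the penalty does not grow with the number of groups while ℓ(σ) does, which is consistent with the estimator approaching maximum likelihood as the number of groups increases.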

Our default method provides better estimates of model parameters and standard errors than the maximum likelihood or the restricted maximum likelihood estimators. The log-gamma family can also be used to convey substantive prior information. In either case—pure penalization or prior information—our recommended procedure gives nondegenerate estimates and in the limit coincides with maximum likelihood as the number of groups increases.
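
The contrast between a boundary estimate and a nondegenerate one can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simple random-effects model with known within-group standard errors, applies the gamma(2, λ → 0) penalty on the standard-deviation scale, and uses a plain grid search. The input is the well-known eight-schools coaching estimates, used purely as illustrative data, and the function names `profile_loglik` and `log_gamma_penalty` are hypothetical.

```python
import math

# Eight-schools effect estimates and standard errors (illustrative input only).
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
se = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]


def profile_loglik(tau):
    """Log-likelihood of y_j ~ N(mu, se_j^2 + tau^2) with mu profiled out.

    Additive constants that do not depend on tau are dropped.
    """
    v = [s * s + tau * tau for s in se]
    mu_hat = sum(yj / vj for yj, vj in zip(y, v)) / sum(1.0 / vj for vj in v)
    return -0.5 * sum(
        math.log(vj) + (yj - mu_hat) ** 2 / vj for yj, vj in zip(y, v)
    )


def log_gamma_penalty(tau, shape=2.0, rate=0.0):
    """Log gamma(shape, rate) density in tau, up to a constant (shape > 1).

    Assumption: the penalty acts on the standard deviation tau.
    rate=0.0 stands in for the limit lambda -> 0.
    """
    if tau <= 0.0:
        return -math.inf  # the penalty excludes the boundary when shape > 1
    return (shape - 1.0) * math.log(tau) - rate * tau


# Grid over tau in [0, 50] with step 0.01; a crude stand-in for an optimizer.
grid = [i * 0.01 for i in range(5001)]

ml_tau = max(grid, key=profile_loglik)
mpl_tau = max(grid, key=lambda t: profile_loglik(t) + log_gamma_penalty(t))

print("ML estimate of tau:", ml_tau)
print("Penalized estimate of tau:", mpl_tau)
```

On these data the unpenalized maximum sits on the boundary at zero, while the penalized maximum is strictly positive. A grid search was chosen over a numerical optimizer only to keep the sketch dependency-free and deterministic.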

Keywords

Bayes modal estimation; hierarchical linear model; mixed model; multilevel model; penalized likelihood; variance estimation; weakly informative prior

Type: Original Paper
Information: Psychometrika, Volume 78, Issue 4, October 2013, pp. 685-709

DOI: https://doi.org/10.1007/s11336-013-9328-2
Copyright: Copyright © 2013 The Psychometric Society

