On the occupancy problem for a regime-switching model

Michael Grabchak; Mark Kelbert; Quentin Paris

doi:10.1017/jpr.2020.33

Part of: Markov processes

Published online by Cambridge University Press: 04 May 2020

Michael Grabchak*
Affiliation: University of North Carolina Charlotte
Mark Kelbert**
Affiliation: National Research University Higher School of Economics
Quentin Paris***
Affiliation: National Research University Higher School of Economics
*Postal address: Department of Mathematics and Statistics, Charlotte, NC, USA. Email address: [email protected]
**Postal address: National Research University Higher School of Economics (HSE), Faculty of Economics, Department of Statistics and Data Analysis, Moscow, Russia. Email address: [email protected]
***Postal address: National Research University Higher School of Economics (HSE), Faculty of Computer Science, School of Data Analysis and Artificial Intelligence & HDI Lab, Moscow, Russia. Email address: [email protected]

Abstract

This article studies the expected occupancy probabilities on an alphabet. Unlike the standard situation, where observations are assumed to be independent and identically distributed, we assume that they follow a regime-switching Markov chain. For this model, we (1) give finite sample bounds on the expected occupancy probabilities, and (2) provide detailed asymptotics in the case where the underlying distribution is regularly varying. We find that in the regularly varying case the finite sample bounds are rate optimal and have, up to a constant, the same rate of decay as the asymptotic result.
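As a rough illustration of the quantity discussed in the abstract, the following minimal Python sketch estimates by simulation the probability that the next observation is a letter not yet seen in a sample of size n. Everything in it is an assumption made for illustration and is not taken from the article: the model is read as a finite-state Markov chain of regimes, each regime drawing letters from its own distribution; the two-regime transition matrix, the uniform initial regime, and the truncated power-law letter distributions (a stand-in for regular variation) are all hypothetical, as are the function names.

```python
import random
from itertools import accumulate


def power_law(alpha, size):
    """Letters 1..size with cumulative weights proportional to k**(-alpha)."""
    letters = list(range(1, size + 1))
    cum_weights = list(accumulate(k ** -alpha for k in letters))
    return letters, cum_weights


# Hypothetical two-regime example; none of these numbers come from the article.
TRANS = [[0.9, 0.1], [0.2, 0.8]]  # regime transition probabilities, rows sum to 1
DISTS = [power_law(1.5, 1000), power_law(2.5, 1000)]  # one letter law per regime


def sample_path(n, trans, dists, rng):
    """Draw n letters; the regime moves as a Markov chain between draws."""
    regimes = range(len(trans))
    regime = rng.choice(regimes)  # assumption: uniform initial regime
    path = []
    for _ in range(n):
        letters, cum_weights = dists[regime]
        path.append(rng.choices(letters, cum_weights=cum_weights)[0])
        regime = rng.choices(regimes, weights=trans[regime])[0]
    return path


def estimate_new_letter_prob(n, trans, dists, trials=2000, seed=0):
    """Monte Carlo estimate of P(observation n+1 is absent from the first n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        path = sample_path(n + 1, trans, dists, rng)
        if path[-1] not in set(path[:-1]):
            hits += 1
    return hits / trials
```

Calling `estimate_new_letter_prob(n, TRANS, DISTS)` for increasing values of n gives a crude empirical picture of how this probability decays with the sample size; it says nothing about the constants or rates established in the article.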

Keywords

occupancy problem; regime switching; Markov chain; regular variation

MSC classification

Primary: 60J10: Markov chains (discrete-time Markov processes on discrete state spaces)

Secondary: 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)

Type: Research Papers
Information: Journal of Applied Probability, Volume 57, Issue 1, March 2020, pp. 53–77

DOI: https://doi.org/10.1017/jpr.2020.33
Copyright: © Applied Probability Trust 2020


