Hostname: page-component-78c5997874-ndw9j Total loading time: 0 Render date: 2024-11-05T12:01:37.819Z Has data issue: false hasContentIssue false

Analysis of an Algorithm for Random Sampling*

Published online by Cambridge University Press:  27 July 2009

Catherine B. Hurley
Affiliation:
The George Washington University, Washington, D.C. 20052
Hosam M. Mahmoud
Affiliation:
The George Washington University, Washington, D.C. 20052

Abstract

We analyze a standard algorithm for sampling m items without replacement from a computer file of n records. The algorithm repeatedly selects a record at random from the file, rejecting records that have previously been selected, until m records are obtained. The running time of the algorithm has two components: a rejection component and a search component. We show that the probability distribution of the rejection component undergoes an infinite series of phase transitions, depending on the order of magnitude of m relative to n. We identify an infinite number of ranges of m, each with a different behavior. The rejection component is distributed as a linear combination of Poisson-like random variables. The search component is customarily done using a hash table with separate chaining. The analysis of the hashing scheme in this problem differs from previous hashing analyses, as the number of lookups in the hash table for each insertion has a geometric distribution. We show that the average overall cost of searching is asymptotically linear with only two phase transitions in the coefficient of linearity.

Type
Research Article
Copyright
Copyright © Cambridge University Press 1994

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

1.Baum, L. & Billingsley, P. (1965). Asymptotic distributions for the coupon collector's problem. Annals of Statistics 36: 18351839.Google Scholar
2.Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.Google Scholar
3.Erdös, P. & Rényi, A. (1961). On a classical problem of probability theory. Magyar Tudomanyos Akademia Mat. Kutató Int. Közl. 6: 215220.Google Scholar
4.Ernvall, J. & Nevalainen, O. (1982). An algorithm for unbiased random sampling. The Computer Journal 25: 4547.Google Scholar
5.Feller, W. (1968). An introduction to probability theory and its applications, Vol. 1, 3rd ed.New York: Wiley.Google Scholar
6.Goodman, S. & Hedetniemi, S. (1977). Introduction to the design and analysis of algorithms. New York: McGraw-Hill.Google Scholar
7.Holst, L. (1986). On birthday, collectors', occupancy and other classical urn problems. International Statistical Review 54: 1527.Google Scholar
8.Jakobsson, M. (1985). Sampling without replacement in linear time. The Computer Journal 28: 412413.Google Scholar
9.Rényi, A. (1962). Three new proofs and a generalization of a theorem of Irving Weiss. Magyar Tudomanyos Akademia Mat. Kutató Int. Közl. 7: 203214.Google Scholar