Analysis of an Algorithm for Random Sampling*

Catherine B. Hurley; Hosam M. Mahmoud

doi:10.1017/S0269964800003302

Analysis of an Algorithm for Random Sampling*

Published online by Cambridge University Press: 27 July 2009

Catherine B. Hurley and

Hosam M. Mahmoud

Show author details

Catherine B. Hurley: Affiliation:
The George Washington University, Washington, D.C. 20052
Hosam M. Mahmoud: Affiliation:
The George Washington University, Washington, D.C. 20052

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

We analyze a standard algorithm for sampling m items without replacement from a computer file of n records. The algorithm repeatedly selects a record at random from the file, rejecting records that have previously been selected, until m records are obtained. The running time of the algorithm has two components: a rejection component and a search component. We show that the probability distribution of the rejection component undergoes an infinite series of phase transitions, depending on the order of magnitude of m relative to n. We identify an infinite number of ranges of m, each with a different behavior. The rejection component is distributed as a linear combination of Poisson-like random variables. The search component is customarily done using a hash table with separate chaining. The analysis of the hashing scheme in this problem differs from previous hashing analyses, as the number of lookups in the hash table for each insertion has a geometric distribution. We show that the average overall cost of searching is asymptotically linear with only two phase transitions in the coefficient of linearity.

Type: Research Article
Information: Probability in the Engineering and Informational Sciences , Volume 8 , Issue 2 , April 1994 , pp. 153 - 168

DOI: https://doi.org/10.1017/S0269964800003302 [Opens in a new window]
Copyright: Copyright © Cambridge University Press 1994

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

1.Baum, L. & Billingsley, P. (1965). Asymptotic distributions for the coupon collector's problem. Annals of Statistics 36: 1835–1839.Google Scholar

2.Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.Google Scholar

3.Erdös, P. & Rényi, A. (1961). On a classical problem of probability theory. Magyar Tudomanyos Akademia Mat. Kutató Int. Közl. 6: 215–220.Google Scholar

4.Ernvall, J. & Nevalainen, O. (1982). An algorithm for unbiased random sampling. The Computer Journal 25: 45–47.Google Scholar

5.Feller, W. (1968). An introduction to probability theory and its applications, Vol. 1, 3rd ed.New York: Wiley.Google Scholar

6.Goodman, S. & Hedetniemi, S. (1977). Introduction to the design and analysis of algorithms. New York: McGraw-Hill.Google Scholar

7.Holst, L. (1986). On birthday, collectors', occupancy and other classical urn problems. International Statistical Review 54: 15–27.Google Scholar

8.Jakobsson, M. (1985). Sampling without replacement in linear time. The Computer Journal 28: 412–413.Google Scholar

9.Rényi, A. (1962). Three new proofs and a generalization of a theorem of Irving Weiss. Magyar Tudomanyos Akademia Mat. Kutató Int. Közl. 7: 203–214.Google Scholar

Article contents

Analysis of an Algorithm for Random Sampling*

Abstract

Access options

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests