Hostname: page-component-586b7cd67f-t7czq Total loading time: 0 Render date: 2024-11-30T05:42:33.042Z Has data issue: false hasContentIssue false

Concentration of Size and Path Length of Tries

Published online by Cambridge University Press:  24 September 2004

WERNER SCHACHINGER
Affiliation:
Department of Statistics and Decision Support Systems, University of Vienna, Brünnerstr. 72, A-1210 Wien, Austria (e-mail: [email protected])

Abstract

We show that, for a certain class of probabilistic models, the number of internal nodes $S_n$ of a trie built from $n$ independent and identically distributed keys is concentrated around its mean, in the sense that $\Var S_n=\Oh(\EE S_n)$. Keys are sequences of symbols which may be taken from varying alphabets, and the choice of the alphabet from which the $k$th symbol is taken, as well as its distribution, may depend on all the preceding symbols. In the construction of the trie we also allow for bucket sizes greater than 1. The property that characterizes our models is the following: there is a constant $C$ such that for any word $v$ that may occur as a prefix of a key the size $S^v_n$ of a trie built from the suffixes of $n$ independent keys conditioned to have the common prefix $v$ has the property $\EE S^v_n\leq Cn$. This class of models contains memoryless and Markovian source models as well as the probabilistic dynamical source models that were recently introduced and thoroughly developed by Vallée [Algorithmica29 (2001) 262–306], in particular the continued fraction source. Furthermore we study the external path length $L_n$, which obeys $\EE L_n=\Oh(n\ln n)$ and $\Var L_n=\Oh(n\ln^2 n)$.

Type
Paper
Copyright
© 2004 Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)