Published online by Cambridge University Press: 25 November 2003
We aim at finding the minimal set of fragments that achieves maximal parse accuracy in Data Oriented Parsing (DOP). Experiments with the Penn Wall Street Journal (WSJ) treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank. We isolate a number of dependency relations which previous models neglect but which contribute to higher accuracy. We show that the history of statistical parsing models displays a tendency towards using more and larger fragments from training data.