Linguistic Corpora And Lexicography

Published online by Cambridge University Press:  19 November 2008

Extract

Over the past ten to fifteen years, the discipline of lexicography has changed almost beyond recognition. This change is due to the technological revolution which has computerized the lexicographers' working environment to a very high degree and which has permitted a veritable quantum leap in the amount and variety of resources that can be brought to bear on the lexicographical process. The most important of these resources are computerized corpora of real, mostly written, but now increasingly also spoken, running text. When the first entirely corpus-based dictionary—COBUILD1—came out in 1987, it was on the basis of a corpus of around 20 million words of connected text. Now all major British dictionary publishers use corpora of at least one hundred million words of text. Harrap/Chambers, Longman, and Oxford University Press have built the 100 million word British National Corpus (BNC), HarperCollins has the 200 million-plus word Cobuild Bank of English (BoE), and Cambridge University Press has compiled the 100 million word Cambridge Language Survey corpus (CLS).

Type
Technology and Language Analysis
Copyright
Copyright © Cambridge University Press 1996

References

UNANNOTATED BIBLIOGRAPHY

CIDE: Cambridge international dictionary of English. 1995. [ed. by Procter, P.] Cambridge: Cambridge University Press.
COBUILD1: Collins Cobuild English language dictionary. 1st ed. 1987. [ed. by Sinclair, J.] London: Collins.
COBUILD2: Collins Cobuild English dictionary. 2nd ed. 1995. [ed. by Sinclair, J.] London: HarperCollins.
Harrap's: Harrap's essential English dictionary. 1995. [ed. by Higgleton, E. et al.] Edinburgh: Harrap/Chambers.
LDOCE1: Longman dictionary of contemporary English. 1st ed. 1978. [ed. by Procter, P.] Harlow: Longman.
LDOCE3: Longman dictionary of contemporary English. 3rd ed. 1995. [ed. by Gadsby, A.] Harlow: Longman.
Longman language activator. 1993. [ed. by Summers, D.] Harlow: Longman.
OALD5: Oxford advanced learner's dictionary. 5th ed. 1995. [ed. by Crowther, J.] Oxford: Oxford University Press. [Original editor: A. S. Hornby].
OED: Oxford English dictionary. 1884–1928. [ed. by Murray, J. A. H., et al.] Oxford: Clarendon Press.
Akkerman, E., Masereeuw, P. C. and Meijs, W. J. 1985. Designing a computerized lexicon for linguistic purposes: ASCOT report no. 1. Amsterdam: Rodopi.
Akkerman, E., Voogt-Van Zutphen, H. J. and Meijs, W. J. 1988. A computerized lexicon for word-level tagging: ASCOT report no. 2. Amsterdam: Rodopi.
Atkins, B. T. S. 1994. Tools for computer-aided lexicography: The Hector project. In Kiefer, F., Kiss, G. and Pajzs, J. (eds.) Papers in computational lexicography: Complex 1994. Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences. 159.
Atkins, B. T. S. and Levin, B. 1995. Building on a corpus: A linguistic and lexicographical look at some near-synonyms. International Journal of Lexicography. 8. 85–114.
Biber, D. 1994. Using register-diversified corpora for general language studies. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 179–201.
Black, E. et al. (eds.) 1993. Statistically-driven computer grammars of English: The IBM/Lancaster approach. Amsterdam: Rodopi.
Boguraev, B. and Briscoe, E. (eds.) 1989. Computational lexicography for natural language processing. London: Longman.
Brent, M. R. 1994. From grammar to lexicon: Unsupervised learning of lexical syntax. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 203–222.
Briscoe, E. and Carroll, J. 1994. Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 25–59.
Church, K. W. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the second conference on applied natural language processing. Austin, TX. 136–143.
Church, K. W. and Hanks, P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics. 16. 22–29.
Church, K. W. and Mercer, R. L. 1994. Introduction to the special issue on computational linguistics using large corpora. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 1–24.
Clear, J. 1993. The British National Corpus. In Landau, G. P. and Delaney, P. (eds.) The digital word: Text-based computing in the humanities. London: MIT Press. 163–188.
Clear, J. 1994. I can't see the sense in a large corpus. In Kiefer, F., Kiss, G. and Pajzs, J. (eds.) Papers in computational lexicography: Complex 1994. Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences. 33–45.
Crowdy, S. 1993. Spoken corpus design. Literary and Linguistic Computing. 8. 259–265.
Crowdy, S. 1994. Spoken corpus transcription. Literary and Linguistic Computing. 9. 25–28.
Fontenelle, T. Forthcoming. Tools for corpus processing. [Report to appear in the European Community's Studies in machine translation and natural language processing series.]
Gale, W. A. and Church, K. 1994. A program for aligning sentences in bilingual corpora. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 75–102.
Garside, R., Leech, G. and Sampson, G. (eds.) 1987. The computational analysis of English: A corpus-based approach. London: Longman.
Grefenstette, G. 1994. Corpus-derived first, second and third-order word affinities. In Martin, W. et al. (eds.) Euralex '94 proceedings. Amsterdam: Free University of Amsterdam. 279–90.
Harley, A. 1994. Cambridge language survey: Semantic tagger. Cambridge: Cambridge University Press. [Acquilex II Working Paper No. 39.]
Hoey, M. 1991. Patterns of lexis in text. Oxford: Oxford University Press.
Johnson, S. 1755. Dictionary of the English Language.
Kay, M. and Röscheisen, M. 1994. Text-translation alignment. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 121–142.
Marcus, M., Santorini, B. and Marcinkiewicz, M. A. 1994. Building a large annotated corpus of English: The Penn Treebank. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 273–290.
Meijs, W. J. 1992a. Computers and dictionaries. In Butler, C. (ed.) Computers and written texts. Oxford: Blackwell. 141–165.
Meijs, W. J. 1992b. The expanding lexical universe: Extracting taxonomies from machine-readable dictionaries. In Bibliograf (ed.) Euralex '90 proceedings. Barcelona: Bibliograf. 119–128.
Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. 1985. A comprehensive grammar of the English language. London: Longman.
Rundell, M. and Ham, N. 1994. A new conceptual map of English. In Martin, W. et al. (eds.) Euralex '94 proceedings. Amsterdam: Free University of Amsterdam. 172–180.
Sinclair, J. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.
Smadja, F. 1994. Retrieving collocations from text: Xtract. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 143–77.
Summers, D. 1993. Longman/Lancaster English language corpus – criteria and design. International Journal of Lexicography. 6. 181–208.
Thorndike, E. L. and Lorge, I. 1944. The teacher's wordbook of 30,000 words. New York: Teachers' College.
Voutilainen, A. and Heikkilä, J. 1994. An English constraint grammar (ENGCG): A surface-syntactic parser of English. In Fries, U., Tottie, G. and Schneider, P. (eds.) Creating and using English language corpora. Amsterdam: Rodopi. 189–199.
West, M. 1953. General service list of English words. 2nd ed. [Reprinted 1965 London: Longman.]