Linguistic Corpora And Lexicography

Willem Meijs

doi:10.1017/S026719050000146X

Published online by Cambridge University Press: 19 November 2008

Extract

Over the past ten to fifteen years, the discipline of lexicography has changed almost beyond recognition. The change is due to the technological revolution that has computerized the lexicographers' working environment to a very high degree and has permitted a veritable quantum leap in the amount and variety of resources that can be brought to bear on the lexicographical process. The most important of these resources are computerized corpora of real running text: mostly written, but now increasingly also spoken. When the first entirely corpus-based dictionary, COBUILD1, came out in 1987, it was based on a corpus of around 20 million words of connected text. Now all major British dictionary publishers use corpora of at least 100 million words. Harrap/Chambers, Longman, and Oxford University Press have built the 100-million-word British National Corpus (BNC). HarperCollins has the Cobuild Bank of English (BoE) of more than 200 million words. Cambridge University Press has compiled the 100-million-word Cambridge Language Survey corpus (CLS).

Type: Technology and Language Analysis
Information: Annual Review of Applied Linguistics, Volume 16, March 1996, pp. 99–114

Copyright: Copyright © Cambridge University Press 1996

References

UNANNOTATED BIBLIOGRAPHY

CIDE: Cambridge international dictionary of English. 1995. [ed. by Procter, P.] Cambridge: Cambridge University Press.

COBUILD1: Collins Cobuild English language dictionary. 1st ed. 1987. [ed. by Sinclair, J.] London: Collins.

COBUILD2: Collins Cobuild English dictionary. 2nd ed. 1995. [ed. by Sinclair, J.] London: HarperCollins.

Harrap's: Harrap's essential English dictionary. 1995. [ed. by Higgleton, E. et al.] Edinburgh: Harrap/Chambers.

LDOCE1: Longman dictionary of contemporary English. 1st ed. 1978. [ed. by Procter, P.] Harlow: Longman.

LDOCE3: Longman dictionary of contemporary English. 3rd ed. 1995. [ed. by Gadsby, A.] Harlow: Longman.

Longman language activator. 1993. [ed. by Summers, D.] Harlow: Longman.

OALD5: Oxford advanced learner's dictionary. 5th ed. 1995. [ed. by Crowther, J.] Oxford: Oxford University Press. [Original editor: A. S. Hornby.]

OED: Oxford English dictionary. 1884–1928. [ed. by Murray, J. A. H., et al.] Oxford: Clarendon Press.

Akkerman, E., Masereeuw, P. C. and Meijs, W. J. 1985. Designing a computerized lexicon for linguistic purposes: ASCOT report no. 1. Amsterdam: Rodopi.

Akkerman, E., Voogt-Van Zutphen, H. J. and Meijs, W. J. 1988. A computerized lexicon for word-level tagging: ASCOT report no. 2. Amsterdam: Rodopi.

Atkins, B. T. S. 1994. Tools for computer-aided lexicography: The Hector project. In Kiefer, F., Kiss, G. and Pajzs, J. (eds.) Papers in computational lexicography: Complex 1994. Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences. 1–59.

Atkins, B. T. S. and Levin, B. 1995. Building on a corpus: A linguistic and lexicographical look at some near-synonyms. International Journal of Lexicography. 8. 85–114.

Biber, D. 1994. Using register-diversified corpora for general language studies. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 179–201.

Black, E. et al. (eds.) 1993. Statistically-driven computer grammars of English: The IBM/Lancaster approach. Amsterdam: Rodopi.

Boguraev, B. and Briscoe, E. (eds.) 1989. Computational lexicography for natural language processing. London: Longman.

Brent, M. R. 1994. From grammar to lexicon: Unsupervised learning of lexical syntax. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 203–222.

Briscoe, E. and Carroll, J. 1994. Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 25–59.

Church, K. W. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the second conference on applied natural language processing. Austin, TX. 136–143.

Church, K. W. and Hanks, P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics. 16. 22–29.

Church, K. W. and Mercer, R. L. 1994. Introduction to the special issue on computational linguistics using large corpora. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 1–24.

Clear, J. 1993. The British National Corpus. In Landow, G. P. and Delany, P. (eds.) The digital word: Text-based computing in the humanities. London: MIT Press. 163–188.

Clear, J. 1994. I can't see the sense in a large corpus. In Kiefer, F., Kiss, G. and Pajzs, J. (eds.) Papers in computational lexicography: Complex 1994. Budapest: Research Institute for Linguistics, Hungarian Academy of Sciences. 33–45.

Crowdy, S. 1993. Spoken corpus design. Literary and Linguistic Computing. 8. 259–265.

Crowdy, S. 1994. Spoken corpus transcription. Literary and Linguistic Computing. 9. 25–28.

Fontenelle, T. Forthcoming. Tools for corpus processing. [Report to appear in the European Community's Studies in machine translation and natural language processing series.]

Gale, W. A. and Church, K. W. 1994. A program for aligning sentences in bilingual corpora. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 75–102.

Garside, R., Leech, G. and Sampson, G. (eds.) 1987. The computational analysis of English: A corpus-based approach. London: Longman.

Grefenstette, G. 1994. Corpus-derived first, second and third-order word affinities. In Martin, W. et al. (eds.) Euralex '94 proceedings. Amsterdam: Free University of Amsterdam. 279–290.

Harley, A. 1994. Cambridge language survey: Semantic tagger. Cambridge: Cambridge University Press. [Acquilex II Working Paper No. 39.]

Hoey, M. 1991. Patterns of lexis in text. Oxford: Oxford University Press.

Johnson, S. 1755. A dictionary of the English language.

Kay, M. and Röscheisen, M. 1994. Text-translation alignment. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 121–142.

Marcus, M., Santorini, B. and Marcinkiewicz, M. A. 1994. Building a large annotated corpus of English: The Penn Treebank. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 273–290.

Meijs, W. J. 1992a. Computers and dictionaries. In Butler, C. (ed.) Computers and written texts. Oxford: Blackwell. 141–165.

Meijs, W. J. 1992b. The expanding lexical universe: Extracting taxonomies from machine-readable dictionaries. In Bibliograf (ed.) Euralex '90 proceedings. Barcelona: Bibliograf. 119–128.

Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. 1985. A comprehensive grammar of the English language. London: Longman.

Rundell, M. and Ham, N. 1994. A new conceptual map of English. In Martin, W. et al. (eds.) Euralex '94 proceedings. Amsterdam: Free University of Amsterdam. 172–180.

Sinclair, J. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.

Smadja, F. 1994. Retrieving collocations from text: Xtract. In Armstrong, S. (ed.) Using large corpora. London: MIT Press. 143–177.

Summers, D. 1993. Longman/Lancaster English language corpus: Criteria and design. International Journal of Lexicography. 6. 181–208.

Thorndike, E. L. and Lorge, I. 1944. The teacher's word book of 30,000 words. New York: Teachers College.

Voutilainen, A. and Heikkilä, J. 1994. An English constraint grammar (ENGCG): A surface-syntactic parser of English. In Fries, U., Tottie, G. and Schneider, P. (eds.) Creating and using English language corpora. Amsterdam: Rodopi. 189–199.

West, M. 1953. A general service list of English words. 2nd ed. [Reprinted 1965. London: Longman.]