Developing annotation solutions for online Data Driven Learning*

Published online by Cambridge University Press: 01 January 2009

Pascual Pérez-Paredes
Affiliation:
Universidad de Murcia, Departamento de Filología Inglesa, Facultad de Letras, Campus La Merced, 30071 – Murcia, Spain
Jose M. Alcaraz-Calero
Affiliation:
Universidad de Murcia, Departamento de Filología Inglesa, Facultad de Letras, Campus La Merced, 30071 – Murcia, Spain

Abstract

Although annotation is a widely researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most research on the use of DDL methods pays little attention to annotation in the design and implementation of corpus-based/driven language teaching.

In this paper, we examine the development of SACODEYL Annotator, an application that seeks to assist SACODEYL system users in annotating XML multilingual corpora. First, we discuss the role of annotation in DDL and the dominant paradigm in general corpus applications. In the context of the language classroom, we argue that it is essential that corpora be pedagogically motivated (Braun, 2005, 2007a). Then, we turn to the analysis and design stages of our annotation solution and illustrate its main features. These include a user-friendly, hierarchical and extensible taxonomy tree to facilitate the learner-oriented annotation of the corpora; a real-time graphical representation of the annotated corpus that conforms to the XML-based TEI (Text Encoding Initiative) standard; and intuitive management of the different data sections and associated metadata.

SACODEYL (System Aided Compilation and Open Distribution of European Youth Language) is an EU-funded MINERVA project which aims to develop an ICT-based system for the assisted compilation and open distribution of multimedia European teen talk in the context of language education. This research emphasises the functionalities of the application within the SACODEYL context. However, our paper likewise addresses the needs of multimedia language corpus administrators in general who are looking for powerful annotation software. SACODEYL Annotator is free to use and can be downloaded from our website.

Type
Original Article
Copyright
Copyright © European Association for Computer Assisted Language Learning 2009

References

Abe, M. and Tono, Y. (2005) Variations in L2 spoken and written English: investigating patterns of grammatical errors across proficiency levels. In: Corpus Linguistics Conference. http://www.corpus.bham.ac.uk/PCLC/CL2005proceedings_AbeTono.doc
Atserias, J., Casas, E. B., Comelles, M., González, L. and Padró, M. (2006) FreeLing 1.3: Syntactic and semantic services in an open-source NLP library. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC’06). Genoa, Italy.
Banker, R. D., Davis, G. B. and Slaughter, S. A. (1998) Software development practices, software complexity, and software maintenance performance: a field study. Management Science, 44(4): 433–450.
Bax, S. (2003) CALL – past, present and future. System, 31(3): 13–28.
Bernardini, S. (2000) Competence, capacity, corpora. A study in corpus-aided language learning. Bologna: CLUEB.
Bernardini, S. (2004) Corpora in the classroom: An overview and some reflections on future developments. In: Sinclair, J. McH. (ed.), How to Use Corpora in Language Teaching. Amsterdam; Philadelphia: J. Benjamins, 15–36.
Biber, D. and Finnegan, E. (1991) On the exploitation of computerized corpora in variation studies. In: Aijmer, K. and Altenberg, B. (eds.), English corpus linguistics. Studies in honour of Jan Svartvik. London: Longman, 204–220.
Biber, D., Conrad, S. and Reppen, R. (1998) Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Braun, S. (2005) From pedagogically relevant corpora to authentic language learning contents. ReCALL, 17(1): 47–64.
Braun, S., Kohn, K. and Mukherjee, J. (eds.) (2006) Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. Frankfurt/M: Peter Lang.
Braun, S. (2006a) ELISA – a pedagogically enriched corpus for language learning purposes. In: Braun, S., Kohn, K. and Mukherjee, J. (eds.), Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. Frankfurt/M: Peter Lang, 25–47.
Braun, S., Kohn, K. and Mukherjee, J. (2006b) Multi-Level Annotation of Linguistic Data with MMAX2. In: Braun, S., Kohn, K. and Mukherjee, J. (eds.), Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods (English Corpus Linguistics, Vol. 3). Frankfurt: Peter Lang, 197–214.
Braun, S. (2007a) Designing and exploiting small multimedia corpora for autonomous learning and teaching. In: Hidalgo, E., Quereda, L. and Santana, J. (eds.), Corpora in the Foreign Language Classroom. Selected papers from TaLC 2004. Amsterdam: Rodopi, 31–46.
Braun, S. (2007b) Integrating corpus work into secondary education: from data-driven learning to needs-driven corpora. ReCALL, 19(3): 307–328.
Burnard, L. (1995) The Text Encoding Initiative: an overview. In: Leech, G., Myers, G. and Thomas, J. (eds.), Spoken English on Computer: Transcription, Markup and Applications. London: Longman, 69–81.
Campbell, D. F., Mcdonnell, C., Meinardi, M. and Richardson, B. (2007) The need for a speech corpus. ReCALL, 19(1): 3–20.
Chambers, A. (2007) Popularising corpus consultation by language learners and teachers. In: Hidalgo, E., Quereda, L. and Santana, J. (eds.), Corpora in the Foreign Language Classroom. Selected papers from TaLC 2004. Amsterdam: Rodopi, 3–16.
Colpaert, J. (2004) Design of Online Interactive Language Courseware. Conceptualization, Specification and Prototyping, Research into the Impact of Linguistic-didactic Functionality on Software Architecture. Antwerp: University of Antwerp.
Cushion, S. (2004) Increasing accessibility by pooling digital resources. ReCALL, 16(1): 41–50.
Ellis, R. (2005) Principles of instructed language learning. System, 33(2): 209–224.
Flowerdew, J. (1993) An educational, or process, approach to the teaching of professional genres. ELT, 47(4): 305–316.
Frankenberg-Garcia, A. (2005) Pedagogical uses of monolingual and parallel concordances. ELT, 59(3): 189–198.
Garside, R. (1987) The CLAWS word-tagging system. In: Garside, R., Leech, F. and Sampson, G. (eds.), The Computational Analysis of English. London: Longman, 30–41.
Garside, R., Leech, G. and McEnery, A. (eds.) (1997) Corpus Annotation: Linguistic Information from Computer Text Corpora. London: Longman.
Gavioli, L. and Aston, G. (2001) Enriching reality: language corpora in language pedagogy. ELT, 55(3): 238–246.
Hidalgo, E., Quereda, L. and Santana, J. (2007) Corpora in the Foreign Language Classroom. In: TALC 2004. Proceedings. Amsterdam: Rodopi.
Jain, H., Vitharana, P. and Zahedi, F. M. (2003) An assessment model for requirements identification in component-based software development. Special Interest Group on Management Information Systems, 34(4): 48–63.
Larman, C. (2002) Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process. Indiana: Prentice Hall PTR.
Larman, C. (2003) Agile and Iterative Development: A Manager’s Guide. New York: Addison-Wesley Professional.
Lee, J. and Xue, N. L. (1999) Analyzing user requirements by use cases: a goal-driven approach. IEEE Software, 16(4): 92–101.
Leech, G. and Candlin, C. N. (eds.) (1986) Computers in English Language Teaching and Research. London: Longman.
Leech, G. (1986) Automatic grammatical analysis and its educational applications. In: Leech, G. and Candlin, C. (eds.), Computers in English Language Teaching and Research, 205–215.
Leech, G. (1991) The State of the Art in Corpus Linguistics. In: Aijmer, K. and Altenberg, B. (eds.), English corpus linguistics. Studies in honour of Jan Svartvik. London: Longman, 8–29.
Leech, G. (1993) Corpus Annotation Schemes. Literary and Linguistic Computing, 8(4): 275–281.
Levy, M. (1997) Theory-driven CALL and the development process. Computer Assisted Language Learning, 10(1): 41–56.
Mauranen, A. (2004) Spoken – general: Spoken corpus for an ordinary learner. In: Sinclair, J. McH. (ed.), How to Use Corpora in Language Teaching, 89–105.
McCarthy, M. and O’Dell, F. (2006) English Collocations in Use Intermediate. Cambridge: Cambridge University Press.
McEnery, A. M. and Wilson, A. (1996) Corpus Linguistics. Edinburgh: Edinburgh University Press.
McEnery, A. M. and Wilson, A. (1997) Corpora and language teaching. ReCALL, 9(1): 5–14.
Meunier, F. (2002) The pedagogical value of native and learner corpora in EFL grammar teaching. In: Granger, S., Hung, J. and Petch-Tyson, S. (eds.), Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: Benjamins, 119–142.
Mishan, F. (2004) Authenticating corpora for language learning: a problem and its resolution. ELT, 58(3): 219–227.
Mishan, M. and Strunz, B. (2003) An application of XML to the creation of an interactive resource for authentic language learning tasks. ReCALL, 15(2): 237–250.
Mukherjee, J. (2006) Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. In: Braun, S., Kohn, K. and Mukherjee, J. (eds.), Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. Frankfurt/M: Peter Lang, 1–4.
Needleman, M. (2000) The Unicode Standard. Serial Review, 26(2): 51–54.
O’Keeffe, A., McCarthy, M. and Carter, R. (2007) From Corpus to Classroom. Cambridge: Cambridge University Press.
Owen, C. (1996) Do concordances require to be consulted? ELT J, 50(3): 219–224.
Plass, J. L. (1998) Design and evaluation of the user interface of foreign language multimedia software: a cognitive approach. Language Learning & Technology, 2(1): 35–45.
Poesio, M. and Artstein, R. (2005) Annotating (Anaphoric) Ambiguity. Corpus Linguistics Conference 2005: http://ron.artstein.org/publications/anaphoric-ambiguity.pdf
Santos Pereira, L. (2004) Spoken – an example: The use of concordancing in the teaching of Portuguese. In: Sinclair, J. McH. (ed.), How to Use Corpora in Language Teaching, 109–122.
Schmid, H. (1995) TreeTagger – a language independent part-of-speech tagger. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
Sinclair, J. (2004) How to Use Corpora in Language Teaching. Amsterdam; Philadelphia: J. Benjamins.
Ward, M. (2002) Reusable XML technologies and the development of language learning materials. ReCALL, 14(2): 285–294.
Ward, M. (2006) Using Software Design Methods in CALL. Computer Assisted Language Learning, 19(2–3): 129–147.
Weber, J. J. (2001) A concordance- and genre-informed approach to ESP essay writing. ELT, 55(1): 14–20.