Published online by Cambridge University Press: 19 June 2019
Data-driven learning (DDL; Johns, 1991), in which students use corpora hands-on for self-guided language learning, is a methodology increasingly used in tertiary contexts to support the teaching of disciplinary postgraduate thesis writing. However, few studies have yet tracked students’ actual engagement with corpora for DDL. This mixed-methods study reports on tracking students’ corpus use through a purpose-built corpus query and data visualisation platform integrated into a large postgraduate disciplinary thesis writing program at a university in Hong Kong. Data were collected from 327 students, covering over 11,000 individual corpus queries. These data include corpus usage history (e.g. times of access, duration of use), query syntax (e.g. query lexis/phraseology and use of wildcards and part-of-speech tags), query function (e.g. frequency lists/distribution, concordance sorting and collocation) and query filters (e.g. searches by faculty, discipline, or thesis section). The results show significant interdisciplinary and inter-/intra-user trends and variation in the corpus functions and query syntax that users adopted. Students varied in the type of knowledge (e.g. domain-specific, language-specific) they were accessing, and frequently went beyond the exemplars in the DDL course materials to generate unique queries on their own initiative. Qualitative case study data from three corpus users’ activity logs also reveal distinctive patterns of individual corpus engagement in terms of query frequency and function. These data give a clearer insight into what students actually do during DDL and the different directions and trajectories that individual users take as a result of DDL. All accompanying DDL tasks are included as supplementary materials.