Published online by Cambridge University Press: 12 August 2020
What does it mean to be able to study Chinese history at scale? What methods, tools, and approaches will allow us to understand Chinese history and historiography from a larger perspective over the longue durée, including linguistic, philosophical, ethnographic, and literary concerns? In this article we present what we feel is one potential key to answering these questions and provide an overview of the utility and value of harnessing this framework for text-based historical research as a means to expand one's scholarship to virtually limitless scales.
Jeffrey Tharsen, University of Chicago, email: [email protected] and Clovis Gladstone, University of Chicago, email: [email protected]
1 We can trace concordances back to the Dominicans, who in the thirteenth century had the idea of providing a new way to study the Bible.
2 See Winter, Thomas Nelson, “Roberto Busa, S.J., and the Invention of the Machine-Generated Concordance,” The Classical Bulletin 75.1 (1999), 3–20.
3 Founded in 1982, the ARTFL Project is one of the oldest Digital Humanities Labs in North America. See http://artfl-project.uchicago.edu/ for more details.
6 PhiloLogic is completely open-source; installation instructions for Linux and Mac OS are available here: https://artfl-project.github.io/PhiloLogic4/installation.html. For those who prefer GitHub, the current PhiloLogic repository is https://github.com/ARTFL-Project/PhiloLogic4.
7 The Textual Optics Lab is a Digital Humanities Lab at the University of Chicago which comprises scholars from the ARTFL Project and the Chicago Text Lab. See https://textual-optics-lab.uchicago.edu/ for more details.
8 A collaboration between the ARTFL Project, the Centro di studi Opera del Vocabolario Italiano (Florence, Italy), the William and Katherine Devers Program in Dante Studies (University of Notre Dame), and the Department of Italian Studies at the University of Reading. See http://artfl-project.uchicago.edu/content/ovi for more details.
9 A collaboration between the University of Chicago and The Project on the History of Black Writing (HBW) at the University of Kansas. See https://textual-optics-lab.uchicago.edu/black_writing_corpus for more details.
10 A collaboration between the University of Chicago and the Chinese Periodical Database 全国报刊数据库 at the Shanghai Library. See https://textual-optics-lab.uchicago.edu/shanghai-library-republican-journal-corpus and http://www.cnbksy.cn/ for more details.
11 See https://artflsrv03.uchicago.edu/philologic4/histories_5_7 for the Concordance UI for the Twenty-four Chinese Histories. Figure 1 contains a screenshot of the UI.
12 For a description of the algorithm, see Olsen, Mark, Horton, Russell, and Roe, Glenn, “Something Borrowed: Sequence Alignment and the Identification of Similar Passages in Large Text Collections,” Digital Studies / Le Champ numérique 2.1 (2010), DOI: http://doi.org/10.16995/dscn.258. A similar method is used in D. A. Smith, R. Cordell and E. M. Dillon, “Infectious Texts: Modeling Text Reuse in Nineteenth-Century Newspapers,” 2013 IEEE International Conference on Big Data, Silicon Valley, CA, 2013, doi: 10.1109/BigData.2013.6691675.
14 For a similar example focused upon Ming literature, see Paul Vierthaler and Mees Gelein, “A BLAST-based, Language-agnostic Text Reuse Algorithm with a MARKUS Implementation and Sequence Alignment Optimized for Large Chinese Corpora,” Journal of Cultural Analytics, March 2019, http://doi.org/10.22148/16.034.
15 For example, for the Records of the Historian (Shi ji 史記) compared with the Book of Han (Han shu 漢書), TextPAIR identified 3,933 shared passages.