SEN: A subword-based ensemble network for Chinese historical entity extraction

Published online by Cambridge University Press: 22 December 2022

Chengxi Yan
Affiliation:
School of Information Resource Management, Renmin University of China, Beijing, China; Research Center for Digital Humanities of RUC, Beijing, China
Ruojia Wang*
Affiliation:
School of Management, Beijing University of Chinese Medicine, Beijing, China
Xiaoke Fang
Affiliation:
College of Applied Arts and Science, Beijing Union University, Beijing, China
*Corresponding author. E-mail: [email protected]

Abstract

Understanding historical entities (e.g., persons, locations, and times) plays an important role in reasoning about how historical events develop. With growing interest in digital humanities and natural language processing, named entity recognition (NER) offers a feasible way to extract these entities automatically from historical texts, especially in Chinese historical research. However, previous approaches are domain-specific, relatively inaccurate, and non-interpretable, which hinders the development of NER for Chinese history. In this paper, we propose a new hybrid deep learning model, the “subword-based ensemble network” (SEN), which incorporates subword information and a novel attention fusion mechanism. Experiments on CMAG, a large self-built Chinese historical corpus, show that SEN achieves the best results among the advanced models compared, with 93.87% F1-micro and 89.70% F1-macro. Further investigation reveals that SEN generalizes well for NER on Chinese historical texts: it is relatively insensitive to categories with fewer annotation labels (e.g., OFI), and it accurately captures diverse local and global semantic relations. Our research demonstrates the effectiveness of integrating subword information with attention fusion, which offers a promising solution for practical entity extraction in the Chinese historical domain.
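The abstract reports both F1-micro and F1-macro. As a minimal sketch of how the two averages are commonly computed for entity spans, and why a sparsely annotated category can pull F1-macro below F1-micro, consider the following. The span format, the function names, and the toy data are assumptions made for illustration; they are not taken from the paper or from the CMAG corpus, and the paper's exact scoring protocol may differ.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Score predicted entity spans against gold spans.

    gold, pred: sets of (sentence_id, start, end, label) tuples
    (a hypothetical span format chosen for this sketch).
    A prediction counts as correct only on an exact span and label match.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        if span in gold:
            tp[span[-1]] += 1
        else:
            fp[span[-1]] += 1
    for span in gold - pred:
        fn[span[-1]] += 1

    labels = sorted({span[-1] for span in gold | pred})
    if not labels:
        return 0.0, 0.0

    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy data (invented): four PER spans and one span of the rarer OFI category.
gold = {
    (0, 0, 2, "PER"), (0, 5, 7, "PER"), (1, 0, 2, "PER"), (1, 4, 6, "PER"),
    (1, 8, 10, "OFI"),
}
# The hypothetical tagger finds every PER span but misses the OFI span.
pred = {span for span in gold if span[-1] == "PER"}

micro, macro = micro_macro_f1(gold, pred)
print(f"F1-micro = {micro:.4f}, F1-macro = {macro:.4f}")
```

In this toy case the single missed OFI span barely affects the pooled micro score but halves the macro score, because macro averaging weights each category equally regardless of how many annotations it has.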

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press
