Exploring the effectiveness of linguistic knowledge for biographical relation extraction†

MARCOS GARCIA; PABLO GAMALLO

doi:10.1017/S1351324913000314

Published online by Cambridge University Press: 18 October 2013

MARCOS GARCIA: Centro Singular de Investigación en Tecnoloxías da Información (CITIUS), University of Santiago de Compostela, Coruña, Spain
PABLO GAMALLO: Centro Singular de Investigación en Tecnoloxías da Información (CITIUS), University of Santiago de Compostela, Coruña, Spain

Abstract

Machine learning techniques have been applied to extract instances of semantic relations using diverse features based on linguistic knowledge, such as tokens, lemmas, PoS-tags, or dependency paths. However, little work has aimed to determine which of these features works best for the relation extraction task, and even less has addressed languages other than English. In this paper, various features representing different levels of linguistic knowledge are systematically evaluated for biographical relation extraction. The effectiveness of these features was measured by training several supervised classifiers that differ only in the type of linguistic knowledge used to define their features. The experiments performed in this paper show that some basic linguistic knowledge (provided by lemmas and their combination in bigrams) performs better than more complex features, such as those based on syntactic analysis. Furthermore, some feature combinations using different levels of analysis are proposed in order (i) to avoid feature overlap and (ii) to evaluate the use of computationally inexpensive and widespread tools such as tokenization and lemmatization. This paper also describes two new freely available corpora for biographical relation extraction in Portuguese and Spanish, built by means of a distant-supervision strategy. Experiments were performed on these corpora with five semantic relations and two languages.
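Two ingredients named in the abstract are easy to make concrete: features built from lemmas and lemma bigrams, and distant supervision, in which a sentence mentioning an entity pair is labeled by looking the pair up in existing structured data. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The seed fact set `SEED_FACTS`, the helpers `distant_label` and `lemma_features`, the relation name `birthPlace`, the entity placeholders, and the hand-lemmatized example sentence are all hypothetical. A real pipeline would obtain lemmas from an external lemmatizer and pass the feature counts to a supervised classifier.

```python
from collections import Counter

# Hypothetical seed facts (person, relation, value). A distant-supervision
# setup would draw such facts from an existing structured resource.
SEED_FACTS = {
    ("Rosalía de Castro", "birthPlace", "Santiago de Compostela"),
}


def distant_label(person, relation, value, kb):
    """Hypothetical helper: a sentence mentioning both entities is labeled
    positive for `relation` if the triple is already a known fact."""
    return (person, relation, value) in kb


def lemma_features(lemmas, use_bigrams=True):
    """Hypothetical helper: count lemma unigrams and, optionally,
    adjacent lemma bigrams as sparse classifier features."""
    feats = Counter(f"L={lemma}" for lemma in lemmas)
    if use_bigrams:
        feats.update(f"B={a}_{b}" for a, b in zip(lemmas, lemmas[1:]))
    return feats


# Spanish sentence "Rosalía de Castro nació en Santiago de Compostela en 1837",
# with both entity mentions replaced by placeholders and the remaining tokens
# lemmatized by hand (an assumption standing in for a real lemmatizer).
lemmas = ["PERSON", "nacer", "en", "LOCATION", "en", "1837"]

label = distant_label("Rosalía de Castro", "birthPlace",
                      "Santiago de Compostela", SEED_FACTS)
features = lemma_features(lemmas)
print(label)              # True
print(features["L=en"])   # 2
print(sorted(k for k in features if k.startswith("B=")))
```

Switching `use_bigrams` on and off mimics, in a very reduced form, the idea of training classifiers that differ only in the linguistic knowledge behind their features.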

Type: Articles
Information: Natural Language Engineering, Volume 21, Issue 4, August 2015, pp. 519–551

Copyright: © Cambridge University Press 2013
