Exploring open information via event network

Published online by Cambridge University Press: 26 October 2017

YANPING CHEN
Affiliation:
Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, China
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China
QINGHUA ZHENG
Affiliation:
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China
FENG TIAN
Affiliation:
National Engineering Lab of Big Data Analytics, Xi’an Jiaotong University, China
HUAN LIU
Affiliation:
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China
YAZHOU HAO
Affiliation:
Shaanxi Province Key Laboratory of Satellite and Terrestrial Network Technology, Xi’an Jiaotong University, China
NAZARAF SHAH
Affiliation:
The Faculty of Engineering and Computing, Coventry University, UK

Abstract

Discovering information from a large amount of data in an open domain is a challenging task. This paper proposes an event network framework to address this challenge. The framework is an empirical construct for exploring open information and consists of three steps: document event detection, event network construction and event network analysis. First, documents are clustered into document events to reduce the impact of noisy and heterogeneous resources. Second, linguistic units (e.g., named entities or entity relations) are extracted from each document event and combined into an event network, which enables content-oriented retrieval. Finally, techniques such as social network analysis or complex network analysis can be applied to the event network to explore open information. In the implementation section, we provide examples of exploring open information via the event network.
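The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy Jaccard clustering, the capitalized-word stand-in for named entity extraction, the co-occurrence edges, the degree count, the function names, the 0.3 threshold and the sample documents are all assumptions chosen only to make the pipeline concrete.

```python
from collections import Counter
from itertools import combinations

PUNCT = ".,;:!?"


def tokens(text):
    """Lowercased word set used to compare documents."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_document_events(docs, threshold=0.3):
    """Step 1 (assumed method): greedy single-pass clustering.

    A document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster (a new document event).
    """
    events = []  # each event is a list of document indices
    for i, doc in enumerate(docs):
        for event in events:
            if jaccard(tokens(doc), tokens(docs[event[0]])) >= threshold:
                event.append(i)
                break
        else:
            events.append([i])
    return events


def extract_entities(text):
    """Toy stand-in for named entity recognition: capitalized words."""
    words = (w.strip(PUNCT) for w in text.split())
    return {w for w in words if w[:1].isupper()}


def build_event_network(docs, events):
    """Step 2: link entities that occur in the same document event."""
    edges = Counter()
    for event in events:
        entities = set()
        for i in event:
            entities |= extract_entities(docs[i])
        for u, v in combinations(sorted(entities), 2):
            edges[(u, v)] += 1  # weight = number of shared events
    return edges


def degree_centrality(edges):
    """Step 3: count distinct neighbours of each entity."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical sample documents, invented for this illustration.
docs = [
    "Alice met Bob in Paris to sign the trade agreement",
    "Bob and Alice sign the trade agreement in Paris",
    "Carol opened a new research lab in Berlin",
    "Alice visited the new research lab Carol opened in Berlin",
]

events = detect_document_events(docs)
edges = build_event_network(docs, events)
degree = degree_centrality(edges)
print(events)
print(degree.most_common(1))
```

In this toy data the four documents fall into two document events, and the entity that appears in both events ends up with the highest degree, which is the kind of signal a network analysis step could surface.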

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 

Footnotes

This research is supported in part by the National Science Foundation of China under grant numbers 201721002, 61462011, 61540050 and 61472315; the Fundamental Theory and Applications of Big Data with Knowledge Engineering under the National Key Research and Development Program of China, grant number 2016YFB1000903; the Project of China Knowledge Centre for Engineering Science and Technology; the Ministry of Education Innovation Research Team, no. IRT13035; the Open project, no. 2017BDKFJJ018; the Major Applied Basic Research Program of Guizhou Province, no. JZ20142001; and the Introduce Talents Science Projects of Guizhou University, no. 201650.
