ERD-MedLDA: Entity relation detection using supervised topic models with maximum margin learning

DINGCHENG LI; SWAPNA SOMASUNDARAN; AMIT CHAKRABORTY

doi:10.1017/S1351324912000058

Published online by Cambridge University Press: 14 March 2012

DINGCHENG LI: Affiliation:
Liberal Arts–TC, University of Minnesota, Twin Cities, MN 55455, USA
SWAPNA SOMASUNDARAN: Affiliation:
Siemens Corporate Research, Princeton, NJ 08540, USA
AMIT CHAKRABORTY: Affiliation:
Siemens Corporate Research, Princeton, NJ 08540, USA

Abstract

This paper proposes a novel application of topic models to entity relation detection (ERD). To make use of the latent semantics of text, we formulate the task of relation detection as a topic modeling problem. The motivation is to find underlying topics that are indicative of relations between named entities (NEs). Our approach treats pairs of NEs, together with the features associated with them, as mini documents, and aims to use the underlying topic distributions as indicators of the types of relations that may exist between the NEs in a pair. Our system, ERD-MedLDA, adapts Maximum Entropy Discriminant Latent Dirichlet Allocation (MedLDA) with mixed membership for relation detection. By using supervision, ERD-MedLDA is able to learn topic distributions indicative of relation types. Further, ERD-MedLDA is a topic model that combines the benefits of both maximum likelihood estimation (MLE) and maximum margin estimation (MME), and the mixed-membership formulation enables the system to incorporate heterogeneous features. We incorporate different features into the system and perform experiments on the ACE 2005 corpus. Our approach achieves better overall precision, recall, and F-measure than baseline SVM-based and LDA-based models. We also find that, compared with the baseline systems, our system shows better and more consistent improvements as complex informative features are added.
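The idea of treating each NE pair and its features as a mini document can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_mini_documents`, the feature names (`type1`, `head1`, `between`, and so on), the span format, and the example sentence are all assumptions made for illustration, and the feature set actually used in the paper is not stated in the abstract. The sketch only shows how a sentence with annotated entity mentions might be turned into one bag of features per entity pair, which a topic model could then consume as a document.

```python
from collections import Counter
from itertools import combinations


def build_mini_documents(tokens, entities):
    """Build one bag-of-features "mini document" per pair of entity mentions.

    tokens   -- list of word strings for one sentence
    entities -- list of (start, end, entity_type) token spans, end exclusive
                (hypothetical format, assumed for this sketch)
    Returns a list of ((i, j), Counter) items, where i and j index the
    entities in left-to-right order.
    """
    ordered = sorted(entities)  # left-to-right by start offset
    docs = []
    for (i, first), (j, second) in combinations(enumerate(ordered), 2):
        start1, end1, type1 = first
        start2, end2, type2 = second
        # Illustrative features only: entity types, last word of each
        # mention, and the words between the two mentions.
        features = [
            "type1=" + type1,
            "type2=" + type2,
            "head1=" + tokens[end1 - 1].lower(),
            "head2=" + tokens[end2 - 1].lower(),
        ]
        features.extend("between=" + tok.lower() for tok in tokens[end1:start2])
        docs.append(((i, j), Counter(features)))
    return docs


# Made-up example sentence with three annotated mentions.
tokens = ["Amit", "works", "for", "Siemens", "in", "Princeton"]
entities = [(0, 1, "PER"), (3, 4, "ORG"), (5, 6, "GPE")]
mini_docs = build_mini_documents(tokens, entities)
```

Each resulting `Counter` plays the role of a document's word counts. In a supervised setting such as the one the abstract describes, each mini document would also carry a relation-type label, which is omitted here.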

Type: Articles
Information: Natural Language Engineering, Volume 18, Special Issue 2: Statistical Learning of Natural Language Structured Input and Output, April 2012, pp. 263–289

DOI: https://doi.org/10.1017/S1351324912000058
Copyright: © Cambridge University Press 2012
