Network analysis of narrative content in large corpora

Published online by Cambridge University Press: 11 September 2013

SAATVIGA SUDHAHAR
Affiliation:
Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1TH, UK
GIANLUCA DE FAZIO
Affiliation:
Department of Sociology, Emory University, Atlanta, GA 30322, USA
ROBERTO FRANZOSI
Affiliation:
Department of Sociology, Emory University, Atlanta, GA 30322, USA
NELLO CRISTIANINI
Affiliation:
Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1TH, UK

Abstract

We present a methodology for extracting narrative information from a large corpus. The key idea is to transform the corpus into a network, formed by linking the key actors and objects of the narration, and then to analyse this network to extract information about their relations. By representing the information in a single network, it is possible to infer relations between these entities, even when they have never been mentioned together. We discuss various types of information that can be extracted by our method, various ways to validate the extracted information, and two different application scenarios. Our methodology is highly scalable and addresses specific research needs in the social sciences.
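The key idea can be illustrated with a minimal sketch. It is not the authors' implementation: the subject-verb-object triplets, the entity names, and the helper functions `build_network` and `shortest_path` are all hypothetical, and a breadth-first search stands in for whatever network analysis the full paper applies. The sketch only shows how linking actors and objects in one network exposes a relation between two entities that never appear in the same triplet.

```python
from collections import defaultdict, deque

# Hypothetical subject-verb-object triplets, invented for illustration only.
TRIPLETS = [
    ("police", "arrest", "protesters"),
    ("protesters", "criticise", "government"),
    ("government", "support", "police"),
    ("unions", "back", "protesters"),
]


def build_network(triplets):
    """Link each subject to its object and record the verbs on each edge."""
    edges = defaultdict(set)       # (subject, object) -> set of verbs
    neighbours = defaultdict(set)  # node -> adjacent nodes, direction ignored
    for subj, verb, obj in triplets:
        edges[(subj, obj)].add(verb)
        neighbours[subj].add(obj)
        neighbours[obj].add(subj)
    return edges, neighbours


def shortest_path(neighbours, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start not in neighbours or goal not in neighbours:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted so that the result is deterministic across runs.
        for nxt in sorted(neighbours[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    edges, neighbours = build_network(TRIPLETS)
    # "unions" and "government" never share a triplet, yet the network
    # connects them through an intermediate actor.
    print(shortest_path(neighbours, "unions", "government"))
```

In this toy corpus the two entities are linked only through a shared neighbour, which is the kind of indirect relation a single combined network makes visible.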

Type
Articles
Copyright
Copyright © Cambridge University Press 2013 
