Identifying signs of syntactic complexity for rule-based sentence simplification

Published online by Cambridge University Press: 31 October 2018

RICHARD EVANS
Affiliation:
Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, UK
CONSTANTIN ORĂSAN
Affiliation:
Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, UK

Abstract

This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.
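As a rough illustration of the two-stage idea described in the abstract (tag the signs first, then rewrite iteratively), the following Python sketch splits a sentence at each sign that has already been tagged as linking two clauses. It is not the authors' implementation: the tag name `CLAUSE_COORD`, the `(word, tag)` token format, and the helper names are all assumptions made for this example. The input is tagged by hand rather than by a trained sign tagger, and the single rule here covers only the simplest case of clause coordination.

```python
from typing import List, Optional, Tuple

# A token is (word, sign tag). The tag is None for ordinary words and for
# signs that do not link clauses. CLAUSE_COORD is a hypothetical label used
# only in this sketch; it is not taken from the article's annotation scheme.
Token = Tuple[str, Optional[str]]
CLAUSE_COORD = "CLAUSE_COORD"
PUNCT = {",", "."}


def split_once(sentence: List[Token]) -> Optional[Tuple[List[Token], List[Token]]]:
    """Split at the first sign tagged as a clause coordinator, dropping the sign."""
    for i, (_, tag) in enumerate(sentence):
        if tag == CLAUSE_COORD:
            return sentence[:i], sentence[i + 1:]
    return None


def render(tokens: List[Token]) -> str:
    """Turn tokens into a sentence string, trimming stray edge punctuation."""
    words = [w for w, _ in tokens]
    while words and words[-1] in PUNCT:
        words.pop()
    while words and words[0] in PUNCT:
        words.pop(0)
    if not words:
        return ""
    words[0] = words[0][:1].upper() + words[0][1:]
    return " ".join(words) + "."


def simplify(sentence: List[Token]) -> List[str]:
    """Apply the splitting rule repeatedly until no tagged coordinator remains."""
    pending = [sentence]
    done: List[str] = []
    while pending:
        current = pending.pop(0)
        parts = split_once(current)
        if parts is None:
            text = render(current)
            if text:
                done.append(text)
        else:
            left, right = parts
            # Put both halves back at the front so output order is preserved.
            pending[:0] = [left, right]
    return done


if __name__ == "__main__":
    tagged = [
        ("Ann", None), ("and", None), ("Bob", None), ("left", None),
        (",", None), ("and", CLAUSE_COORD),
        ("Carol", None), ("stayed", None), (".", None),
    ]
    for line in simplify(tagged):
        print(line)
```

In the example input, the first "and" joins two noun phrases and carries no tag, so only the second "and" triggers a split. This is why the quality of the rewriting depends on the quality of the sign tagging: a mistagged sign would cause a wrong split or a missed one.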

Type
Article
Copyright
Copyright © Cambridge University Press 2018 

Footnotes

This work was supported by the European Commission under the Seventh (FP7-2007–2013) Framework Programme for Research and Technological Development [287607]. We gratefully acknowledge Emma Franklin, Zoë Harrison, and Laura Hasler for their contribution to the development of the datasets used in our research and Iustin Dornescu for his contribution to the development of the sign tagger. For their participation in the user surveys, we thank Martina Cotella, Francesca Della Moretta, Arianna Fabbri, and Victoria Yaneva. We gratefully acknowledge Larissa Sayuri Futino Castro dos Santos for assistance in collating our survey data.
