1. Introduction
There is an emerging trend in AI surrounding explanation generation, which aims to improve either the interpretability of the machine learning process (i.e., to reveal how the system arrives at the prediction) (Lundberg and Lee 2017; Paul and Frank 2019) or the trustworthiness of the result (i.e., to make the prediction more believable as correct by justifying it) (Ribeiro, Singh, and Guestrin 2016; Jansen et al. 2018). On the one hand, interpretability enables better comprehension of so-called black-box machine learning; on the other hand, trustworthiness determines the usefulness of the technology in decision-critical domains such as medicine and law. In other words, machine learning predictions cannot be acted upon in such domains without additional supporting evidence, as the consequences of errors may be severe.
Concerning interpretability, it is debatable whether it is possible in practice to generate complete descriptions of complex systems as explanations; such explanations would most likely only describe a simplified version of the actual system (Mittelstadt, Russell, and Wachter 2019). In a sense, interpretability subsumes trustworthiness, but not completely, since trustworthiness is concerned to a lesser extent with why the model made the decision and rather with why the decision is true in light of additional world and commonsense knowledge (Bauer, Wang, and Bansal 2018). This view of enabling better trustworthiness of a machine-learned result via explanations is the focus of this work, which we apply to the domain of elementary science question answering (QA) over standardized tests. Our task is, given a corpus of elementary science question and correct answer pairs ("QA pairs" henceforth) taken from standardized tests, to automatically justify the correct answer with an explanation generated from science and commonsense facts. An example of a QA pair and its explanation is illustrated in Table 1. We generate these explanations from facts taken from the WorldTree Corpus (Jansen et al. 2018) – a newly released, manually authored knowledge base of semi-structured tables (also called a "tablestore") containing nearly 5000 elementary science and commonsense facts.
The introduction of the WorldTree corpus (2018) presents a new direction for evaluating machine intelligence. In the task defined by the corpus, systems can be evaluated, via the generated explanations, on their capacities for language understanding, reasoning, and use of commonsense knowledge. This presents new opportunities to advance the state of the art in machine intelligence w.r.t. natural language understanding, in a similar vein to the Turing test (Turing 2009).
Testing machine intelligence in natural language inference tasks over standardized tests was initiated by the AI2 ARC challenge (2018), which originally released just the QA part of the corpus. This challenge has helped to move forward the reasoning abilities of natural language inference systems on tasks that children can accomplish, while (ideally) increasing their ability to explain their reasoning. Progress largely stalled at around fifty percent accuracy for years, with some notable exceptions (Parikh et al. 2016; Seo et al. 2016; Khot, Sabharwal, and Clark 2018), and then large language models were shown to reach over 90% performance (Clark et al. 2019), but did so without producing explanations. The WorldTree corpus (2018) fills the gap by providing a way of explicitly measuring the explanation-generation ability of a model on the ARC corpus, as well as training models to perform the many-fact explanation-generation task.
Generally, in multiple-choice QA exams, a student selects one answer to each question from among typically four choices and can explain why they made that particular choice based on their world and commonsense knowledge. For a machine, on the other hand, constructing an explanation for the correct answer can be challenging for the following reasons: (1) It can be a multistep process, since some facts may directly relate to the question and correct answer, while others build on the earlier facts provided as explanation. Consider in Table 2 that facts f1 and f2 directly relate to the question and correct answer; however, fact f3 is an elaboration of f2. This phenomenon is even more prevalent in longer explanations. Consider the example in Table 1, where facts f6 to f14 are only indirectly related to the question or correct answer, but are nonetheless essential to the logical sequence of facts explaining the phenomenon of "heating of water caused by the pot on the burner." And (2) this multistep inference is highly susceptible to the phenomenon of semantic drift, that is, the tendency to compose spurious inference chains leading to wrong conclusions (Fried et al. 2015; Khashabi et al. 2019). This is depicted by the facts in red in Table 2, which on the surface are linguistically related to the question and correct answer, but are not semantically relevant to the explanation for the correct answer.
In this work, we address the aforementioned machine learning challenges by simultaneously expanding both the linguistic and conceptual vocabulary of the question, correct answer, and explanation fact words, in a domain-targeted manner, as features for machine learning. By expanding the vocabulary, we aim to obtain a greater number of lexical matches between the QA pair and explanation facts. In this way, we also indirectly aim to facilitate improved semantic relatedness between the QA pair and their explanation facts via this expected greater number of lexical matches. Overall, six differing and novel information categories are leveraged to represent the instances for learning. While in an earlier system (D'Souza, Mulang, and Auer 2019) we similarly employed a feature-based approach for this task, in the new version presented in this article, the generic features of that system are replaced by a domain-targeted set.
With respect to the machine learning strategy, we adopt the learning-to-order problem formulation, since the annotated explanations in the WorldTree corpus (2018) are made up of logically ordered facts in discourse. Specifically, in the context of the WorldTree, the automatic task entails learning and predicting preferences over candidate facts per QA pair explanation. Generally, learning a preference function involves ranking facts from a candidate set, that is, the relevant facts before the irrelevant facts, and the relevant facts in order w.r.t. each other. Further, it also includes an implicit "abstaining" from making ranking decisions between the irrelevant facts. Then, during testing, new QA pair explanations are generated by predicting the order of the facts using the trained preference function. Since the problem does not involve a total ordering of all facts in the tablestore for the explanations, but only of the relevant facts, we adopt the preference learning approach (Fürnkranz and Hüllermeier 2010; Kamishima et al. 2010) rather than a ranking approach, where the latter entails a total ordering. Nevertheless, preference learning is a class of problems that subsumes ranking functions. In fact, among the problems in the realm of preference learning, the task of "learning to rank" has probably received the most attention in the literature so far, and a number of different ranking problems have already been introduced. In this work, we compare a pointwise preference learning approach against the pairwise ranking approach. Further, the scoring and loss functions for both pointwise and pairwise ranking are from the support vector machine class of learning algorithms. Support vector machines are preferred by many as strong classifiers needing less computational power than neural models. Although we are not the first to contrast pointwise and pairwise learning, our study offers new observations on the comparison of these two techniques on a new problem, that is, the ranking of facts to construct explanations. In this way, we build on our earlier system (2019) that tested only the pairwise ranking approach with its generic feature set.
We conduct extensive empirical evaluations of our proposed approach with nine existing systems. Our main contributions are as follows:
-
Insights into the comparison between a pointwise and a pairwise machine learning technique for constructing explanations as a preference learning problem, thus presenting a new observation complementing existing studies (Kamishima, Kazawa, and Akaho 2005; Kamishima et al. 2010; Melnikov et al. 2016) in this field;
-
A domain-targeted space of representative features of world and commonsense knowledge to associate a QA pair and candidate explanation facts both linguistically and semantically. Consequently, our feature-based model is human interpretable. Further, empirical evaluations show that our model effectively outperforms standard BERT-based (Devlin et al. 2018) neural techniques, which, in contrast to ours, are seen as uninterpretable black-box models.
The rest of the article is structured as follows. We first describe the corpus we use (Section 2) and the task (Section 3). After that, we describe the related work (Section 4). Section 5 details our approach followed by a discussion on the features in our feature-rich approach in Section 6. Our experimental results and analysis are then presented (Section 7). Finally, we conclude with future directions in Section 8.
2. Corpus
The data used in this study come from the WorldTree corpus (Jansen et al. 2018). It comprises a portion of the standardized elementary science exam questions, 3rd to 5th grades, drawn from the AI2 Reasoning Challenge (ARC) corpus (Clark et al. 2018). The questions have multiple-choice answers with the correct answer known. Each question-correct answer pair (QA pair) in the WorldTree corpus (2018) has a detailed human-annotated explanation, consisting of between 1 and 21 facts that are arranged in logical discourse order w.r.t. each other. The QA pair instances are divided into the standard ARC train, development, and test sets. The WorldTree corpus is thus provided as 1190 training, 264 development, and 1248 test instances, where each instance is a QA pair and its explanation. In all, 14.4% of the training fold facts, 40.4% of the development fold facts, and 22.5% of the test fold facts are overlapping.
2.1 Explanations for correct answers to elementary science questions
As alluded to above, QA pairs in the WorldTree corpus (2018) are annotated with explanations of up to 21 facts (see in Figure 1 the distribution of facts in the explanations in the training and development sets).
Based on corpus design decisions, the inclusion criteria for facts in explanations were: lexical overlap—facts lexically overlap with the question or answer, or with other facts in the explanation; and coherency—the explanation facts form a logically coherent discourse fragment. As a consequence of the lexical overlap characteristic, a traversal path can be traced between each QA pair and its explanation facts via multiple lexical hops (depicted in Tables 1 and 2 via the underlined words). Further, as an additional annotation layer, facts in each training and development set explanation were categorized as one of three classes. These classes were determined by the role played by the fact in the explanation. Specifically, the classes were Central, Grounding and Lexical Glue. Central facts were defined as core scientific facts relevant to answering the question. For example, facts such as “as the amount of rainfall increases in an area, the amount of available water in that area will increase.” Grounding facts were those which connected to other core scientific facts present in the explanation. For example, “rain is a kind of water” would connect “rain” and “water” present across two or more Central facts in the explanation. Finally, lexical glue facts expressed synonymy or definitional relationships. For example, “rainfall is the amount of rain an area receives.” Table 3 offers statistics on the overall prevalence of explanation facts across QA pairs in the training and development sets, and also per explanation fact category.
We now elaborate on the facts' tablestore that formed the reference set for constructing the explanations per QA pair. The tablestore facts were authored based on the elementary science themes of the ARC question-answering data. They are organized in 65 tables representing relation predicates such as kind of (e.g., an acorn is a kind of seed), part of (e.g., bark is a part of a tree), cause (e.g., drought may cause wildfires); or the actions of organisms (e.g., some adult animals lay eggs); or the properties of things (e.g., an acid is acidic); or if-then conditions (e.g., when an animal sheds its fur, its fur becomes less dense). In Table 4, we depict the table types whose facts belonged to at least 1% of the explanations in the training and development sets. We see that only 21 of the 65 tables in total were represented in at least 1% of the training and development explanations. Of the remaining 44 less frequently selected tables, we show nine selected ones as examples in Table 5. Based on the table sizes, the least frequently occurring tables have fewer facts than most of the 21 frequently selected tables; however, there are one or two exceptions (e.g., the COUNTRY-HEMISPHERE table with 269 facts).
Figure 2 depicts the top six of the 21 frequently selected table types for their fact rankings in the explanations. The six tables are KINDOF, SYNONYMY, ACTION, IF_THEN, CAUSE, and USED-FOR. For these tables, we explicitly show the proportions of their facts at ranks 1 to 10 in the explanations and aggregate the remaining lower-ranked facts in a single proportion. In the figure, we can see that except for SYNONYMY, all the remaining tables have a major proportion of their selected facts appear between ranks 1 and 5. For SYNONYMY, however, we see that only 5% of its facts appear at rank 1. This comparatively low proportion is meaningful in the context of the role played by its facts in the explanations—more often than not, they supplement the information of a previous fact. For example, an explanation has the fact “the moon orbiting the Earth approximately occurs 13 times per year” at rank 1 which is supplemented by the SYNONYMY fact “approximately means about” at rank 2.
3. The explanation regeneration task description
Our task is defined after the TextGraph-19 Shared Task on Explanation Regeneration (Jansen and Ustalov 2019), where the WorldTree corpus (2018) was leveraged for the first time to facilitate machine learning system development. It was posited as an ordering task as follows.
For a QA pair, given an unordered collection of facts (in our case, the tablestore of 4789 facts), the task objective is to order the given collection (as shown in Tables 1 and 2), such that the relevant facts are top-ranked w.r.t. the irrelevant facts and, further, the top-ranked relevant facts are in a logical discourse order w.r.t. each other. Note that the irrelevant facts will also be returned; however, for the task it is sufficient that they are ranked lower than the relevant ones. Formally, given a question "q," its known correct answer "a," and an unordered collection of facts $F_{uno}$, the ordering objective is to (1) determine all facts $\in F_{uno}$ that are relevant to the (q, a) pair and (2) order the relevant facts to form a logically ordered discourse fragment, thus producing a partially rank-ordered collection $F_{po}$. The resulting collection is seen as partially rank-ordered since only the ordering of the relevant "k" facts to the QA pair is a meaningful result; the remaining "$|F_{po}|$-k" facts that are consequently ordered as a result of applying the learned function to the full facts' tablestore remain irrelevant to the given QA pair. Thus, at a high level, the task can be viewed as explanation regeneration, since each QA pair initially gets the entire collection of facts as an explanation, which it then must order by preference for relevance and discourse.
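To make the task interface concrete, the following minimal Python sketch orders the full tablestore for a single QA pair; the scoring function score(q, a, fact) is a hypothetical stand-in for the learned models described later.

```python
# A minimal sketch of explanation regeneration as an ordering task; `score` is
# a hypothetical learned scoring function, not part of the corpus or task.

def regenerate_explanation(q, a, tablestore, score):
    """Return every fact id in the tablestore, ordered by predicted relevance
    to the (q, a) pair; only the top-ranked relevant facts matter for evaluation."""
    scored = [(score(q, a, fact), fact_id) for fact_id, fact in tablestore.items()]
    scored.sort(reverse=True)                 # higher score = higher preference
    return [fact_id for _, fact_id in scored]
```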
4. Background and related work
Reasoning in Elementary Science QA entailing various knowledge sources
Clark, Harrison, and Balasubramanian (2013) and Jansen et al. (2016) in their respective studies found three main question categories in Elementary Science QA: (1) retrieval questions relying on taxonomic, definitional, or property knowledge; (2) inference questions tapping into knowledge of causality, processes, or specific instances of occurrences; and (3) domain-specific questions.
Thus, to cater to the different question categories, reasoning in Elementary Science QA entails reliance on various knowledge sources extending the linguistic information of the QA strings themselves. To this end, our proposed system employs lexical, grammatical, and semantic feature categories that, respectively, facilitate addressing the QA types found in the aforementioned studies. For instance, we employ lexical features that expand the given word vocabulary and grammatical features as an abstraction of the role of words in a sentence. The expanded vocabulary facilitates matches between synonyms and word types which we observed as a characteristic of many of the retrieval questions in our corpus—they do not match directly with a relevant explanation fact but indirectly via synonyms. Our semantic feature category including commonsense and knowledge embeddings ensures semantic coherence, beyond simple lexical matching between explanation sentences addressing taxonomic, definitional, or property knowledge, relevant also in inference or domain-specific questions.
Commonsense Knowledge for Explanations
ConceptNet (Speer, Chin, and Havasi 2017) as a commonsense knowledge source was employed in two recent systems (Bauer et al. 2018; Paul and Frank 2019) addressing inference tasks involving either explanation construction or narrative generation. Paul and Frank's (2019) system traced multihop paths across ConceptNet's entities as features to boost the performance of predicting human need categories, and in turn the traced paths seemed a reasonable explanation for the result in most cases. Bauer et al. (2018), for a narrative QA task, employed multistep reasoning by tracing a path via ConceptNet between entities in the question and entities in a given context, where the selected contextual sentences were then posited as the question's answer in narrative form.
Relatedly, our data domain, that is, Elementary Science QA, poses questions on entities like "sun," "fire," "friction," etc.—themes of varying degrees of abstraction that are often non-named entities. By leveraging ConceptNet, we were interested in obtaining additional qualifying information about these terms so as to better match a QA pair with its related explanation facts. For example, a "sun" is a type of a "star." Unlike earlier works (2018, 2019) that gathered information from paths traced through the ConceptNet graph, for our data domain and purpose, we queried just for additional qualifying commonsense information about the terms in the QA and the explanation facts themselves, which were then checked for matches. These qualifying features, in turn, served as the conceptual glue between disparate units of information. Additionally, a path traced through the ConceptNet graph between QA and explanation fact terms could have been a viable feature, but we have not tried it, as the term pair combinations between the QA and the explanation facts could have made the computation of our features prohibitively complex.
Generating Explanations for Elementary Science QA
Initial attempts at creating explanations for the correct answers to elementary science exam questions by Jansen et al. (2017) addressed answer extraction and explanation creation as a joint task. They, like Paul and Frank (2019), extract a short, linguistically highlighted path, based on their algorithm's features, through a number of textual knowledge bases such as study guides and science dictionaries, and examine whether the facts in the traversed path are valid explanation candidates. This early approach adopted an open-world assumption for generating explanations for a QA pair, where the knowledge store of explanation candidate facts was not fixed. Thus, any natural language text or book or even the web could be thought of as a source of candidate facts for explanations. This meant that explanation generation itself could not be realistically quantitatively evaluated across systems in such settings.
In contrast, the newly introduced WorldTree corpus (2018) adopted a closed-world assumption with QA pair explanations defined in terms of human-coded facts, similar to human-coded knowledge for declarative QA such as in LifeCycleQA (Mitra et al. 2019), or in the form of rules encoding the fundamental assumptions about puzzles in the puzzle-solving domain (Mitra and Baral 2015). In other words, explanation knowledge became confined to a fixed, smaller set of about 4700 human-annotated facts. This facilitated quantitative evaluations of approaches on the task based on the availability of a benchmark human-annotated test set. Thus far, via the TextGraph-19 Explanation Regeneration Shared Task (Jansen and Ustalov 2019), four known systems (Das et al. 2019; Chia et al. 2019; Banerjee 2019; D'Souza et al. 2019) have been developed on the WorldTree corpus. They demonstrated a diverse range of performances, from 56.3% mAP (Mean Average Precision) down to 39.4% mAP. The systems by Das et al. (2019), Chia et al. (2019), and Banerjee (2019) had two facets in common: BERT-based neural models, and a reranking paradigm, that is, the tablestore facts were first ordered by one approach, after which the ordered facts were reranked by a different approach. Our system (D'Souza et al. 2019) differed from the rest in that it leveraged a non-neural-network machine learning paradigm. In it, the core system was a traditional hand-crafted feature-based SVM ranker. It was integrated in a hybrid framework that additionally employed a set of rules to correct obvious machine learning prediction errors. The features in the machine learning system were mostly generically oriented, including knowledge sources such as Wiktionary categories and page titles, and FrameNet predicates and arguments (Swayamdipta et al. 2017). The generic features may have been a system limitation, since they most likely functioned as undesirable distractors in the machine learning task, causing low system performance.
Thus, while our hybrid system at 39.4% was significantly better than the basic TF-IDF system at 29.6%, and on par with Banerjee (2019)'s BERT-based system at 41.3%, it showed much lower performance compared to the top-ranked BERT-based ensemble at 56.3%.
In the new version of our system described in this paper, we re-engineer our earlier system (2019) on two criteria: (1) we train a more effective learning technique, making pointwise predictions with an SVM regressor for the task, differing from our earlier pairwise learning-to-rank approach, and (2) we replace the generic set of features with a targeted set of knowledge sources for the task, dropping information sources that did not prove informative (e.g., Wiktionary features and predicate-argument frame features) and adding domain-targeted sources such as IR-based optimized TF-IDF fact rankings, multihop inference targeted features, and BERT-based semantic abstraction features. This system achieves a more than 10-point boost in mAP over our shared task system, at 53.2% on the development set and 50.7% on the test set. Further, even the learning-to-rank system attains a 3-point boost with the better task-specific set of features, at 45.9% and 43.3% on the development and test sets, respectively, without having to rely on rules. Our new approach outperforms all existing BERT-based (Devlin et al. 2018) neural systems except the top-ranked, computationally intensive approach by Das et al. (2019), which we describe in detail in the experimental section (see Section 7.2).
Finally, we conclude our discussion on the related work by situating our system in the context of two different extended tasks: (1) to implement feedback between explanation generation and question answering to mutually improve the tasks’ performances and (2) to compose knowledge from facts.
Answering Multi-Choice Questions by Ranking Supporting Facts
Pirtoaca, Rebedea, and Ruseti (2019) leverage Wikipedia as a source of supporting facts to improve question answering on the ARC corpus. Thus, they considered Wikipedia as the facts knowledge base. Their system introduced a self-attention-based neural network that latently learns to rank sentences by their importance relative to a given question, whilst optimizing the objective of predicting the correct answer. Their work, which was performed independently of the release of the WorldTree corpus (2018), took the open-world assumption for Wikipedia sources of supporting factual evidence. The release of the human-annotated explanation facts in the WorldTree corpus, however, obviates such settings in which no supporting facts are explicitly provided and, being specifically human designed, could potentially boost QA performance further. While in the system described in this paper we solely address the problem of explanation fact ranking and not fact ranking to improve multiple-choice question answering, the latter task is left as follow-up work.
Knowledge Composition from Facts
Khot et al. (2020) define the Question Answering via Sentence Composition (QASC) corpus, in which they present the knowledge composition task for the first time. For example, the task combines two different facts, that is, "Differential heating of air produces wind" and "Wind is used for producing electricity," into a single knowledge sentence, "Differential heating of air can be harnessed for electricity production." The QASC corpus involves both the elementary and middle-school science domains. A small subset of the composition facts was taken from the WorldTree corpus (2018). This new task is considered useful since such composed knowledge was shown to be a useful signal to boost QA performance (Khot et al. 2020).
Essentially, IR systems on this data set would need to introduce new concepts or relations in order to discover relevant facts. Further, they must then learn to identify valid compositions of these retrieved facts using commonsense reasoning—functions that are ingrained in our system. Thus, our system, with some task-specific modifications, could also be leveraged for facts composition as future work.
This concludes our discussion on related work. In the next section, we provide details of our approach w.r.t. the task in the WorldTree corpus (2018).
5. Approach
As described in Section 3, explanation regeneration for Elementary Science QA pairs is posited as a ranking task given a collection of candidate facts, where for each QA pair explanation, the number of valid facts can vary up to 21 and the desired result is to have all the valid facts top-ranked. Formally, let (q, ca, f) be a triplet consisting of a question q, its correct answer ca, and a candidate explanation fact f that is a valid or invalid candidate from the given unordered facts tablestore $F_{uno}$ . Our task is, for each (q, ca) given $F_{uno}$ , to rank the generated (q, ca, f) triplets such that the group $(q,ca,f^{c})$ is top ranked to produce an ordered tablestore $F_{o}$ , where $f^{c}$ stands for the group of relevant facts in the explanation and $f^{c} \subseteq F_{uno}$ .
Within a preference-based object ordering formalism (Melnikov et al. 2016), the candidate facts $F_{uno}$ comprise the reference set of objects. Training data consist of a set of rankings $\{O_1, ... ,O_N\}$ of facts for the N (q, ca) training instances, respectively, where for $(q,ca)_i$, the ordering is

$O_i: f_{i,1} \succ f_{i,2} \succ \cdots \succ f_{i,|O_i|}$

such that $O_i$ is an ordering of only the valid facts $f^c_i$ for a $(q,ca)_i$ instance where $|O_i| < |F_{uno}|$. The order relation $\succ$ is interpreted in terms of preferences, that is, $f_a \succ f_b$ suggests that $f_a$ is preferred to $f_b$ in terms of logical discourse. And the remaining $F_{uno} \setminus f^c_i$ are assigned a uniform least rank.
The next natural question is which functions we choose to learn the set of orderings for (q, ca) pairs. Two such approaches are prevalent in the literature. The first reduces the original ordering problem to regression: it seeks a model that assigns appropriate scores to individual items and is hence referred to as the pointwise approach. The second reduces the problem to binary classification; the focus is on pairs of items, which is why it is also called the pairwise approach. Next, we briefly introduce these models in the context of the support vector machine (SVM) class of algorithms and describe how we train them.
At a high level, the objective of the SVM is to find the optimal separating hyperplane in an N-dimensional space (where "N" is the number of features), which maximizes the margin of classification error on the training data. The margin is defined in terms of certain select training data points that influence the position and the orientation of the hyperplane such that it is at maximal separating distance from the data points in the various classes. These points then constitute the support vectors of the trained SVM. The support vectors lie on boundary lines that run parallel to the classification hyperplane but at the maximal computable distance. Obtaining a maximal margin produces a classifier that generalizes better to unseen data instances. Note also that in real-world problems, the boundary lines are more practically considered soft boundaries with an error allowance defined by a slack variable $\xi$ that allows classifications to fall somewhere within the boundary margin from the classification hyperplane. Formally, as an optimization problem, the SVM classification objective is to

$\min_{w,\xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i\,(w \cdot \phi(x_i)) \geq 1 - \xi_i, \quad \xi_i \geq 0$
where $i = 1,...,N$ for N training instances, $\phi$ is a feature transformation function for input $x_i$ , w is the features’ weight vector over all instances, and $y_i$ is either +1 or –1. The constant $C > 0$ determines the tradeoff between the norm of the weight vector and error margin defined by slack variable $\xi$ .
5.1 Pairwise Learning-to-Rank (LTR) for preference ordering
The next question is how our preference ordering problem can be formulated in terms of binary classification. This is possible via the pairwise LTR transformation. Roughly, this is done by modeling: (1) whether a candidate fact is a valid candidate or not and (2) for the collection of valid explanation facts, the logical precedence of one fact over another. Thus, these decisions are identified in a relative sense, that is to say, by determining the pairwise preferences between facts in the explanation compared w.r.t. each other and w.r.t. the remaining facts in the tablestore.
Our data set originally is

$S = \{x_{ij}, y_{ij}\}, \quad x_{ij} = \phi((q_i,ca_i),f_{j})$
$(q_i,ca_i)$ is the ith QA pair instance, $f_j$ is the jth explanation fact from the tablestore where the ordering between facts is known during training and is unknown during development and testing. $\phi$ is a feature transformation function, and $y_{ij} \in \{1,2,3,...K\}$ denotes an order between the $(q_i,ca_i)$ pair and the explanation fact $f_j$ as a graded order w.r.t. the other relevant and irrelevant explanation fact candidates.
By the pairwise LTR transformation, our original data set S then becomes:

$S' = \{\,x_{ij} - x_{il},\ (y_{ij}\ \theta\ y_{il})\,\} = \{\,\phi((q_i,ca_i),f_{j}) - \phi((q_i,ca_i),f_{l}),\ (y_{ij}\ \theta\ y_{il})\,\}$
where $\theta$ is the rank difference so that $(y_{ij} \theta y_{il}) = 1$ if $y_{ij} > y_{il}$ and –1 otherwise, resulting as a binary classification task. The goal of the LTR algorithm is to acquire a ranker that minimizes the number of violations of pairwise rankings provided in the training set which is attempted as the above classification problem.
Essentially, since pairwise LTR only considers the labels where $y_{ij} > y_{il}$ between relevant candidates and $y_{il} > y_{ij}$ between relevant and irrelevant candidate pairs, respectively, we assign (a) higher ranks to the relevant facts to indicate precedence and (b) equal ranks of 1 to all irrelevant facts. Thus, for consecutive relevant instances, we offset the ranks by 1. That is, for a given $x_i$ , if there are $n_i$ correct facts in the explanation, then the first fact in the correct ordering receives a rank of $|n_i|$ + 1, the second is ranked as $|n_i|$ , and so on; the last correct fact receives rank 2, and all irrelevant facts have rank 1.
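As a concrete illustration of this rank assignment, the following minimal Python sketch (with hypothetical input names) computes the graded labels per QA pair.

```python
# A minimal sketch of the rank labels assigned per QA pair; `gold_ordered` is a
# hypothetical list of the explanation's fact ids in their logical order, and
# `tablestore_ids` lists every fact id in the tablestore.

def assign_ranks(gold_ordered, tablestore_ids):
    """Relevant facts get graded ranks n+1 .. 2 (higher = earlier in the
    explanation); all irrelevant facts share the least rank of 1."""
    n = len(gold_ordered)
    ranks = {fid: 1 for fid in tablestore_ids}            # irrelevant facts
    for position, fid in enumerate(gold_ordered):         # relevant facts
        ranks[fid] = (n + 1) - position                   # first -> n+1, last -> 2
    return ranks
```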
5.1.1 Training LTR for QA pair explanation fact(s) preference ordering
We use the SVM LTR learning algorithm as implemented in the SVM rank software package (Joachims 2006). To optimize ranker performance, we tune the regularization parameter C (which establishes the balance between generalizing and overfitting the ranker model to the training data). However, we noticed that a ranker trained on the entire tablestore set of facts is not able to learn a meaningful discriminative model at all, owing to the large bias from the negative examples outweighing the positive examples (consider that the number of relevant explanation facts ranges between 1 and 21, whereas there are 4789 available candidate facts in the tablestore). To overcome the class imbalance, we tune an additional parameter: the number of negative facts used for training. Every (q, ca) training instance is first assigned 1000 randomly selected irrelevant explanation facts; we then tune the number of irrelevant explanation facts used, ranging between 500 and 1000 in increments of 100.
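For illustration, a hedged sketch of how such a training fold could be serialized in the SVM-light/SVM-rank input format, with ranks assigned as above and the irrelevant facts downsampled per QA pair, is given below; featurize is a hypothetical stand-in for our feature function $\phi$.

```python
import random

# A hedged sketch of writing one training fold in the SVM-light/SVM-rank input
# format ("<rank> qid:<i> <index>:<value> ..."). `featurize` returns a sparse
# {feature_index: value} dict; `qa_instances` is a list of
# (q, ca, gold_ordered_fact_ids) tuples and `tablestore` maps fact ids to texts.

def write_svmrank_fold(path, qa_instances, tablestore, featurize, n_negatives=1000):
    with open(path, "w") as out:
        for qid, (q, ca, gold_ordered) in enumerate(qa_instances, start=1):
            gold = set(gold_ordered)
            negatives = random.sample(
                [fid for fid in tablestore if fid not in gold], n_negatives)
            n = len(gold_ordered)
            # Ranks follow the scheme above: n+1 .. 2 for relevant facts, 1 otherwise.
            labelled = [(n + 1 - pos, fid) for pos, fid in enumerate(gold_ordered)]
            labelled += [(1, fid) for fid in negatives]
            for rank, fid in labelled:
                feats = featurize(q, ca, tablestore[fid])
                body = " ".join(f"{i}:{v}" for i, v in sorted(feats.items()))
                out.write(f"{rank} qid:{qid} {body}\n")
```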
Both the regularization parameter and the number of negative explanation facts are tuned to maximize performance on development data. Note, however, that our development data are created to emulate the testing scenario. So every (q, ca) instance during development is given all 4789 facts to obtain results for the overall ordering task.
5.1.2 Testing LTR for QA pair explanation fact(s) preference ordering
During testing time, all facts in the tablestore are, respectively, paired with a (q, ca) test instance. Each instance is then represented by the features defined in our system before being fed as input to the SVM-LTR trained model. The trained model then predicts ranking scores for each data instance, where the scores are then used to order the facts as the regenerated explanation for the given (q, ca) test instance.
5.2 Pointwise preference ordering by regression
SVM regression differs from the SVM classification objective in that, instead of optimizing over binary targets, the optimization is performed for real-valued targets. To facilitate this, regression is defined in terms of an $\epsilon$-precision objective. In other words, we do not care about training errors as long as they are less than $\epsilon$. Further, as in the classification objective with soft decision boundaries, similar allowances are made with slack variables in the regression context, but defined over the targeted regression precision. Formally, the regression optimization problem is defined as follows:

$\min_{w,\xi,\xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i + \xi_i^*) \quad \text{subject to} \quad y_i - w \cdot \phi(x_i) \leq \epsilon + \xi_i, \quad w \cdot \phi(x_i) - y_i \leq \epsilon + \xi_i^*, \quad \xi_i, \xi_i^* \geq 0$
where $i = 1,...,N$ for N training instances, $\phi$ is a feature transformation function for input $x_i$ , w is the features’ weight vector over all instances, $y_i$ is a real-valued target, and $\epsilon$ is the regression targeted precision. The constant $C > 0$ determines the tradeoff between the norm of the weight vector and error margin defined by slack variables $\xi,\xi^*$ .
Next, an important question is how to represent our ordering problem in terms of a regression objective. We do this by defining regression targets in terms of the preference ordering expectations (Kamishima et al. 2005, 2010) rather than true regression quantities. In our data set $S=\{x_{ij},y_{ij}\}$ where $x_{ij} = \phi((q_i,ca_i),f_{j})$, the labels $y_{ij}$ for the correct candidate explanation facts are given as unit-graded relevance values in order of their preference, while all the incorrect candidates are relegated to a uniform least rank. The facts are assigned ranks in exactly the same manner as in the pairwise LTR setting.
By using regression for the preference ordering of facts in explanations, we make the assumption that all facts can be treated independently w.r.t. each other. Such assumptions are highly contingent on the properties of the underlying data set and may not apply in all preference ordering or ranking scenarios. In contrast to the regression setting, the pairwise LTR is, in principle, applicable in any ordering scenario. Evidently, in the WorldTree corpus (2018), the order of facts in the explanations is based on logical precedence rather than discourse, wherein discourse linguistic cues are not readily apparent in most explanations (consider the example depicted in Table 1). Thus, from the perspective that considers a pure logical precedence between the explanations' facts, a regression setting that predicts logical precedence weights as target values for the facts is nonetheless relevant and a sound modeling of the task.
5.2.1 Training SVR for QA pair explanation fact(s) preference ordering
We use the SVR learning algorithm (Vapnik 1999) as implemented in the SVM light software package (Joachims 2002) (hence called SVM reg since we employ its regression setting). Similar to the ranker system, to optimize regression performance, we tune the regularization parameter C on the development set with all the other parameters at their default values. Again, as in the ranking training set-up, we randomly select a smaller set of irrelevant explanation facts to learn a meaningful discriminative model, tuned on the development set to range between 500 and 1000 in increments of 100.
Note that our development data are created as usual to emulate the testing scenario given $F_{uno}$ . So every QA pair instance during development is given all 4789 candidate facts for regression predictions.
5.2.2 Testing SVR for QA pair explanation fact(s) preference ordering
The testing scenario is identical to the pairwise LTR model w.r.t. data input instance generation. In this case, however, a trained SVR model predicts regression target values which are then used to rank the explanation facts.
6. Features for the explanation regeneration task
In this section, we elaborate on our feature function $\phi$ introduced with our formal models used to transform a (q, ca, f) triplet to a one-hot encoded feature vector x as data instances for the learning algorithms.
Our motivation in selecting these features was to encode linguistically the necessary world and commonsense knowledge required for unifying facts as explanations to Elementary Science QA. There are six main feature groups that are described next.
6.1 Bags of lexical features (70,949 total features)
This feature group most generically encodes the lexical overlap criteria by including features as lemmas of $q/ca/f$ ; lemmas shared by q and f, ca and f, and q, ca and f; 5-, 4-, and 3-gram prefixes and suffixes of $q/ca/f$ ; 5-, 4-, and 3-gram prefixes and suffixes shared by q, ca, and f; and f’s table type from the provided annotated tablestore data. The lemma and n-gram features are filtered for common pronoun and prepositional stop words.
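To illustrate this feature group, a minimal Python sketch is given below; it assumes lemmatized, stopword-filtered token lists as input, and the feature names and affix handling are illustrative rather than the exact implementation.

```python
# A minimal sketch of the bag-of-lexical-features group, assuming lemmatized,
# stopword-filtered token lists for the question (q), correct answer (ca), and
# candidate fact (f). Feature naming is illustrative.

def lexical_features(q_lemmas, ca_lemmas, f_lemmas, table_type):
    feats = set()
    for prefix, lemmas in (("q", q_lemmas), ("ca", ca_lemmas), ("f", f_lemmas)):
        for lemma in lemmas:
            feats.add(f"{prefix}_lemma={lemma}")
            for n in (3, 4, 5):                       # n-gram prefixes/suffixes
                feats.add(f"{prefix}_pre{n}={lemma[:n]}")
                feats.add(f"{prefix}_suf{n}={lemma[-n:]}")
    for lemma in set(q_lemmas) & set(f_lemmas):
        feats.add(f"q_f_shared={lemma}")
    for lemma in set(ca_lemmas) & set(f_lemmas):
        feats.add(f"ca_f_shared={lemma}")
    for lemma in set(q_lemmas) & set(ca_lemmas) & set(f_lemmas):
        feats.add(f"q_ca_f_shared={lemma}")
    feats.add(f"f_table={table_type}")                # table type of the fact
    return feats
```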
6.2 ConceptNet (294,249 total features)
We hypothesize that semantic features, in particular commonsense knowledge, could be useful for the explanations to elementary science QA pairs. Since elementary science questions query general knowledge about common nouns like animals, planets, occupations, etc., we find that ConceptNet (Speer et al. 2017) as a resource with its focus on the general meanings of all words, whether they be nouns, verbs, adjectives, or adverbs, and less on named entities, is perfectly suited to our task. Let us illustrate with an example.
In this example, ConceptNet tells us that the answer “rabbit” is an “animal” and a “herbivore”, among other things. Extending the answer with this knowledge enables better semantic connection between the q, ca, and all three explanation facts, in the absence of which, the ranking could experience a semantic drift toward irrelevant explanation facts such as “long ears are a part of a rabbit” or “a jackrabbit is a kind of rabbit”.
Given the potential usefulness of ConceptNet for our task, we create conceptualization features as follows: the top 50 conceptualizations of q/ca/f words; the top 50 conceptualizations shared by q and f, ca and f, and q, ca, and f words; and the relation names of the ConceptNet facts that $q/ca/f$ words participate in, such as FormOf, IsA, HasContext, etc. For example, for the word "tea" in $q/ca/f$, the ConceptNet facts are "tea ReceivesAction brewed", "tea HasA caffeine", "tea IsA beverage", etc., from which the features are "ReceivesAction_brewed", "HasA_caffeine", and "IsA_beverage".
Note that since ConceptNet returns the conceptualizations of a queried term ordered from most precise to most generic, selecting only the top 50 for the first two feature types controls the genericity of the conceptualizations considered.
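For illustration, the following hedged Python sketch gathers ConceptNet-based features for a single term via the public REST API at api.conceptnet.io; the response fields and the feature naming reflect our reading of the API and are assumptions to be verified, not the exact pipeline used in our system.

```python
import requests

# A hedged sketch of collecting ConceptNet-based features for a single term via
# the public REST API; field names ("edges", "rel", "end") are as we recall them.

def conceptnet_features(term, limit=50):
    url = f"http://api.conceptnet.io/c/en/{term.lower().replace(' ', '_')}"
    edges = requests.get(url, params={"limit": limit}).json().get("edges", [])
    feats = set()
    for edge in edges:
        rel = edge["rel"]["label"]                    # e.g., IsA, HasA, FormOf
        end = edge["end"]["label"].lower()            # e.g., "beverage" for "tea"
        feats.add(f"{rel}_{end.replace(' ', '_')}")   # e.g., "IsA_beverage"
        if rel == "IsA":
            feats.add(f"concept={end}")               # conceptualization feature
    return feats
```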
6.3 OpenIE relations (36,989 total features)
We introduce features computed as open information extraction relation triples using the OpenIE tool (Angeli, Premkumar, and Manning 2015). We observed that the triple representations of the question, correct answer, and explanation fact sentences contained the content words needed to link across other feature groups (e.g., ConceptNet). This linkage also enabled indirect connections between (q, ca) and f. Let us illustrate with an example:
In the example, the given fact is top-ranked in the explanation. For it, from OpenIE we get the relation triple (hardness $\rightarrow$ is a property of $\rightarrow$ material). Further, ConceptNet tells us that the answer Hardness is related to concepts “property,” “material property,” etc. We see how pooling these information units together enables a unified word cloud involving the question, correct answer, and explanation fact for the terms “hardness,” “property,” and “material.” Features that enable grounding externally computed terms to the lexical items given in the QA pair or explanation facts create a tighter overlap improving task performance.
Given the potential usefulness of intersentence OpenIE triples for explanation generation, we create features as follows. For each triple produced by the parser, the features are the $q/ca/f$ lemmas in the relation subject role; shared q, ca, and f subject lemmas; $q/ca/f$ lemmas in the relation object role; shared q, ca, and f object lemmas; and $q/ca/f$ lemma as the relation predicate.
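A minimal sketch of this feature group follows; it assumes the OpenIE triples have already been extracted as (subject, relation, object) strings, and lemmatize is a hypothetical helper returning the set of lemmas of a phrase.

```python
# A minimal sketch of the OpenIE feature group, assuming triples were extracted
# beforehand (e.g., with Stanford OpenIE) as (subject, relation, object) strings.

def openie_features(q_triples, ca_triples, f_triples, lemmatize):
    feats = set()
    roles = {"subj": 0, "pred": 1, "obj": 2}
    lemma_sets = {}
    for src, triples in (("q", q_triples), ("ca", ca_triples), ("f", f_triples)):
        for role, idx in roles.items():
            lemmas = set()
            for triple in triples:
                lemmas |= set(lemmatize(triple[idx]))
            lemma_sets[(src, role)] = lemmas
            feats |= {f"{src}_{role}={lemma}" for lemma in lemmas}
    for role in ("subj", "obj"):                       # shared q/ca/f role lemmas
        shared = (lemma_sets[("q", role)] & lemma_sets[("ca", role)]
                  & lemma_sets[("f", role)])
        feats |= {f"shared_{role}={lemma}" for lemma in shared}
    return feats
```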
6.4 Multihop inference specific features (2,620 total features)
These features are a more selective bag of lexical features for obtaining matches with a positional emphasis. We find that adding positional information for lexical matches is a useful heuristic to identify the concepts that are the focus of the (q, ca) and explanation facts. Consider the underlined words in the two subsequent examples in this section.
As shown in the examples, often the focus words of the (q, ca) are at the start or end, and likewise at the start or end of the f. Further, for a one- or two-word ca, we can directly infer it as a focus concept, in which case we try to find a match with an f where it is the first or last word. And focus words that are verbs tend to occur in the middle.
The following features are considered in this group: the length of q and ca; the positions of q/ca verbs in the phrase (as 0 if it is the first word, 1 if it is the second word, and so on); for the verbs shared by q and f, whether they occur among the first few, middle, or last words; if ca is a uni- or bigram, whether f contains all its words/lemmas; whether f contains the last q lemma/word; whether the last q lemma/word is in the first position of f, or in the last position of f; and whether the first q lemma/word is in the first position of f.
Notably, the positional emphasis observations made for the features in this category are contingent on the language of the corpus. For example, we observe verbs as focus words occurring in the middle of the sentence because the WorldTree corpus is annotated for English sentences, which follow the subject-verb-object (SVO) sentence structure. Thus, it is necessary to highlight that if the WorldTree corpus were annotated for a language with a different sentence structure, for example, Persian, which follows the SOV order, the verbs would be expected at the end of the sentence and our "multihop inference specific features" would need to be adapted accordingly. Further, since the WorldTree corpus handles the Elementary Science level, it contains fairly short atomic sentences, and this accounts for our finding focus nouns at specific locations such as the sentence start or end and the focus verbs in the middle. A worthwhile consideration then is that for higher scientific levels the positions of the focus words could be different and harder to pin down to a fixed location, as was possible in our case for the WorldTree corpus.
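A minimal Python sketch of a subset of these positional features is shown below; it assumes tokenized, lowercased word lists and a hypothetical verbs() helper returning the verb tokens of a sentence.

```python
# A minimal sketch of the multihop-inference positional features; input names
# and the coarse start/middle/end bucketing are illustrative assumptions.

def positional_features(q_words, ca_words, f_words, verbs):
    feats = {
        "q_len": len(q_words),
        "ca_len": len(ca_words),
        "f_has_last_q_word": q_words[-1] in f_words,
        "last_q_word_starts_f": f_words[0] == q_words[-1],
        "last_q_word_ends_f": f_words[-1] == q_words[-1],
        "first_q_word_starts_f": f_words[0] == q_words[0],
    }
    if len(ca_words) <= 2:                      # short answers are focus concepts
        feats["f_contains_ca"] = all(w in f_words for w in ca_words)
    shared_verbs = set(verbs(q_words)) & set(verbs(f_words))
    for verb in shared_verbs:                   # rough position of shared verbs in f
        idx = f_words.index(verb)
        third = "start" if idx < len(f_words) / 3 else (
            "middle" if idx < 2 * len(f_words) / 3 else "end")
        feats[f"shared_verb_{third}"] = True
    return feats
```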
6.5 TF-IDF ranking (750,283 total features)
Ranking based on cosine similarities between the TF-IDF-weighted (q, ca) concatenated text and each candidate fact proves surprisingly effective for the explanation regeneration task (see the scores in the Evaluation section). We use the Iterated TF-IDF variant by Chia et al. (2019) to encode the text. The ranks obtained by cosine similarity on these instances are then used as features for the SVM learner. We hypothesize that employing the TF-IDF-based cosine similarity ranks as features will provide a baseline ordering signal to the learning algorithm.
Our TF-IDF features per (q, ca, f) are the following: f's rank; f's binned rank in bins of 50; f's binned rank in bins of 100; and whether f is in the top 100, 500, or 1000.
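For illustration, the following hedged sketch derives these rank features with a plain (non-iterated) TF-IDF ranking for brevity; the system itself uses the Iterated TF-IDF variant of Chia et al. (2019), and the bin sizes mirror the feature list above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A hedged sketch of the TF-IDF rank features for one QA pair; preprocessing is
# simplified and a plain TF-IDF ranking stands in for the Iterated variant.

def tfidf_rank_features(question, answer, fact_texts):
    vectorizer = TfidfVectorizer(stop_words="english")
    fact_matrix = vectorizer.fit_transform(fact_texts)
    query_vec = vectorizer.transform([question + " " + answer])
    sims = cosine_similarity(query_vec, fact_matrix).ravel()
    order = np.argsort(-sims)                     # best-matching fact first
    ranks = np.empty(len(fact_texts), dtype=int)
    ranks[order] = np.arange(1, len(fact_texts) + 1)
    return [{
        "rank": int(r),
        "rank_bin50": int(r) // 50,
        "rank_bin100": int(r) // 100,
        "in_top100": bool(r <= 100),
        "in_top500": bool(r <= 500),
        "in_top1000": bool(r <= 1000),
    } for r in ranks]
```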
6.6 BERT embeddings
BERT-based (Devlin et al. 2018) context embeddings are our last feature category. The out-of-the-box BERT model is pretrained on millions of words from Wikipedia, which as a commonsense knowledge source is already pertinent to elementary science QA. Thus, we simply query the BERT embeddings from the pretrained model using the bert-as-a-service library. For each data instance, we extract BERT embedding features that can easily be combined with the other linguistic features. This can be viewed as a semantic projection of an elementary science concept into the Wikipedia encyclopedia space. Specifically, we query the BERT $_{Base}$ Uncased English model (12 layers, 768 hidden units, 12 heads, 110M parameters), which outputs a 768-dimensional vector for a given input text. We treat each dimension of this context vector as a separate feature for representing the instance.
While the earlier five feature categories enabled extending the (q, ca, f) vocabulary beyond the given words both lexically and conceptually, with BERT embeddings we aim to leverage semantic abstractions as features. We hypothesize such features would be useful in creating semantic associations between the elements in the (q, ca, f) triple, which are topically similar based on knowledge from Wikipedia. As in the following example.
In the example, considering the focus words "diamonds," "earth," and "minerals" that reflect the topics of the QA pair, the word "minerals" in the fact is present in neither the q nor the ca, but is relevant to the semantic topic of the (q, ca). We hypothesize that BERT features will help capture such topicalized semantic abstractions of similarity.
We tested two ways of obtaining BERT features for (q, ca, f) triples: (i) query BERT separately for the question, correct answer, and fact embeddings, respectively, obtaining three 768 dimensional feature sets and resulting in 2304 additional features from BERT per instance and (ii) query BERT for aggregate 768-dimensional embedding features for the (q, ca, f) triple. In this configuration, we use the special [SEP] token to demarcate the q, ca and f segments. The start-of-the-sequence [CLS] token then learns the encoding for the entire input sequence. Experiments indicated that the latter method is a better-suited representation for the task, while the former method is ineffective. This can be attributed to two reasons: (1) The second approach allows the model to capture not only the representations of the sequences but also the proximity between the input segments. Thus, the [CLS] token in this case is a latent similarity vector for the q, ca, and f. In the first case, it merely represented the sequences separately. And (2) the length of the vectors produced in case (i) is three times the length of the vectors produced in case (ii). This results in sparse features which in turn limits the performance of the SVM classifier.
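For illustration, a hedged sketch of configuration (ii) is given below using the Hugging Face transformers library (our system itself queried bert-as-a-service); the [CLS] vector of the [SEP]-demarcated (q, ca, f) sequence supplies the 768 features.

```python
import torch
from transformers import BertModel, BertTokenizer

# A hedged sketch of configuration (ii): encode the (q, ca, f) triple as one
# [SEP]-demarcated sequence and use the [CLS] vector as 768 features. Shown with
# transformers for illustration; the paper's pipeline used bert-as-a-service.

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def cls_embedding(question, answer, fact):
    text = f"{question} [SEP] {answer} [SEP] {fact}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0, 0]          # [CLS] vector: 768 features
```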
Thus, all six feature categories used to represent (q, ca, f) triples, when taken together, should readily address the multistep inference process between (q, ca) and f candidates. This is because we have features extending the given information in the (q, ca) with world knowledge generically (e.g., ConceptNet, BERT), with other features providing lexical glue (at generic and task-specific levels) enabling traversal of the (q, ca, f) via multiple hops (e.g., multihop inference lexical features, OpenIE relations). Therefore, for themes such as vehicles, as an example from Jansen et al. (2016), describing their mechanisms, purposes, needs, and functions, our various feature groups can take such diverse aspects of the real world into account.
Finally, in Figure 3, we depict our overall approach including the feature modules and the two SVM-based machine learners we employed.
7. Evaluation
7.1 Experimental setup
Dataset
The experimental corpus of this study is the WorldTree corpus (2018), introduced in detail in Section 2. In our experiments, we maintain the same data set fold splits as provided by the data set creators.
Evaluation Metrics
We report one set of results in terms of the mean average precision (mAP) metric, which is standard in IR ranking tasks. With the mAP score, we see to what extent our system returns the relevant explanation facts as top-ranked. To evaluate our system for ordering the relevant explanation facts w.r.t. each other, we employ the Precision@k and Recall@k metrics, where k is the group of top-ranked facts, ranging between 2 and 50 facts in increments of 2. Note that for these latter metrics, a fact is counted as correct only if it is returned at exactly its given position in the ordered facts of the explanation. For instance, to get a score of 1.0 by the Precision@2 metric, the predicted top-scoring fact should be the top-ranked fact and the predicted second top-scoring fact should be the second-ranked fact in the gold data. These latter evaluations are a closer test of whether our system satisfies the task defined in the WorldTree corpus (2018), in other words, whether it logically orders the relevant facts at all.
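Under our reading of this strict position-matching criterion, Precision@k and Recall@k could be computed as in the following sketch (an assumption about the exact scoring, included for illustration).

```python
# A minimal sketch of strict Precision@k / Recall@k: a predicted fact counts
# only if it sits at exactly the same position as in the gold ordering.

def precision_recall_at_k(predicted_ids, gold_ids, k):
    hits = sum(1 for i in range(min(k, len(predicted_ids)))
               if i < len(gold_ids) and predicted_ids[i] == gold_ids[i])
    return hits / k, hits / len(gold_ids)            # (precision@k, recall@k)
```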
Parameter Tuning
For the SVM rank and the SVM reg systems, we jointly tune C and the number of negative training instances on the development data. Our best SVM rank model when evaluated on development data was obtained with C = 0.8 and 1000 negative training instances, while our best SVM reg model was obtained with C = 0.005 and 900 negative training instances.
We compare our models with nine existing systems as reference performances, where the systems we compare with have varying degrees of complexity, from simple IR approaches to neural-based machine learning approaches. Further, as the results will show, the systems also have varying degrees of performance, not necessarily correlated with system complexity—TF-IDF approaches prove surprisingly effective on this task. The nine systems we evaluate against are briefly described in the following section.
7.2 Nine reference evaluations
TF-IDF Baseline
Facts are ranked by cosine similarity of their TF-IDF representation with the TF-IDF representation of the query string composed of the question and all the available answer choices.
TF-IDF Baseline features + SVM rank (Jansen and Ustalov 2019)
For each data instance, two cosine similarity scores were computed: one between the question TF-IDF vector and the candidate explanation fact vector, and another between the correct answer and the candidate fact vectors. These scores were used as features within an SVM rank (Joachims 2006) setting, and a ranked list of facts was predicted.
Generic Feature-rich SVM rank (D'Souza et al. 2019)
Our previous system employed five main feature categories, including OpenIE (Angeli et al. 2015), ConceptNet (Liu and Singh 2004), Wiktionary, and FrameNet (Swayamdipta et al. 2017) representations for each (q, ca, f) triple, which are then ranked by SVM rank (Joachims 2006).
Rules + Generic Feature-rich SVM rank (D'Souza et al. 2019)
In this hybrid model, the Generic Feature-rich SVM rank system output is corrected for obvious errors by a set of 11 re-ranking rules applied sequentially, pipelined to the SVM system output. As an example of a rule, consider: all facts that contain the unigram or bigram correct answer are to be top-ranked.
BERT Iterative Re-ranking (Banerjee 2019)
The system models explanation regeneration using a re-ranking paradigm, where BERT (Devlin et al. 2018) transformer models are used to provide an initial ranking, and the top 15 facts output by the BERT model are re-ranked using a custom-designed relevance ranker to improve overall performance. Note that, in comparison with the top-performing BERT-based model (Das et al. 2019), described last in this section, this system is run in an out-of-the-box BERT configuration.
Optimized TF-IDF (Chia et al. 2019)
This system differs from TF-IDF Baseline in the following ways: all incorrect answer choices are dropped from the query string; the query and the fact strings are additionally preprocessed by lemmatization and the removal of their stopwords.
Iterated TF-IDF (Chia et al. Reference Chia, Witteveen and Andrews2019)
Whereas in the Optimized TF-IDF system the query string consists of only the question and the correct answer, in this system the query string is iteratively expanded to include the top-ranked fact. After each expansion step, cosine similarity is recomputed over the remaining facts to obtain the next top-ranked fact. This process is repeated until all facts are ranked.
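A minimal sketch of this query-expansion loop, assuming scikit-learn and invented example strings, is shown below.

```python
# Sketch of Iterated TF-IDF: after each step the top-ranked remaining fact is
# appended to the query, and similarities are recomputed for the rest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def iterated_tfidf_rank(query, facts):
    vectorizer = TfidfVectorizer().fit(facts + [query])
    remaining = list(facts)
    ranked = []
    while remaining:
        q_vec = vectorizer.transform([query])
        f_vecs = vectorizer.transform(remaining)
        scores = cosine_similarity(q_vec, f_vecs).ravel()
        best = remaining.pop(int(scores.argmax()))
        ranked.append(best)
        query = query + " " + best   # expand the query with the chosen fact
    return ranked

facts = ["a leaf is a part of a green plant",
         "photosynthesis means producers convert carbon dioxide and water into food",
         "rubber is a kind of material"]
print(iterated_tfidf_rank("which part of a tree makes food by photosynthesis leaves", facts))
```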
BERT Re-ranking with Iterated TF-IDF scores (Chia et al. Reference Chia, Witteveen and Andrews2019)
A BERT regression module is trained to predict a relevance score for each $(q+ca,f)$ pair, where the target relevance score is derived from the Iterated TF-IDF system ranking; to lower computational complexity and runtime, the model is trained and tested to re-rank only the top 64 facts of the Iterated TF-IDF output.
BERT Re-ranking with inference chains (Das et al. Reference Das, Godbole, Zaheer, Dhuliawala and McCallum2019)
This is an ensemble model composed of a BERT-based path ranker and a more advanced re-ranking system (Nogueira and Cho Reference Nogueira and Cho2019). The BERT-based path ranker uses a sophisticated multistep design. The initial step retrieves the top 50 facts by TF-IDF similarity with the (q, ca) query. In the next step, 1-hop lexical similarity paths are traced from each of the 50 retrieved facts to the remaining facts in the tablestore. Finally, the BERT path ranker is trained on pairwise fact instances. Instances are formed by exhaustively pairing each fact in the top-50 TF-IDF list with all facts at a 1-hop lexical distance from it; a pair is a true instance if both facts belong to the explanation for the given query, and a false instance otherwise. The overall ensemble relies on the BERT-based path ranker output when its score exceeds 0.5 and otherwise falls back to the re-ranker (Nogueira and Cho Reference Nogueira and Cho2019).
With its chaining of facts, this system models a vital aspect of the corpus: some valid explanation facts lexically overlap directly with the QA pair, while others lexically overlap only with other valid facts. As we will see next, this system has the overall best performance and is the only system we do not outperform. We note, however, that it also has a high degree of computational complexity. In our system, during training, each QA pair is linked with only roughly 1000 candidate explanation facts; the Das et al. system instead constructs training instances as follows: for each QA pair, given the top-50 ranked facts and assuming each fact has a 1-hop chain to at most 200 other facts, this results in nearly 10,000 instances. A larger training data set generally implies a longer training time, which is particularly true for BERT models such as the Das et al. system; it also holds for SVMs, although in the SVM case the number of features matters as well. Further, at test time, the Das et al. system still evaluates 10,000-odd chains, whereas we merely score all facts in the tablestore, which presently numbers about 5000. Thus, the Das et al. (Reference Das, Godbole, Zaheer, Dhuliawala and McCallum2019) system is the most effective and at the same time the most computationally intensive of all the systems, including ours.
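To make the instance construction concrete, the following is a hedged sketch of how such pairwise training instances could be built; the simple content-word overlap stands in for the 1-hop lexical similarity of Das et al., and the fact strings and stopword list are illustrative assumptions.

```python
# Hedged sketch: pair each top-retrieved fact with facts it lexically overlaps
# with, labelling a pair true only if both facts are in the gold explanation.
STOPWORDS = {"a", "an", "is", "of", "the", "to", "kind"}   # illustrative list

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def build_pair_instances(top_facts, tablestore, gold_explanation):
    gold = set(gold_explanation)
    instances = []
    for f1 in top_facts:
        for f2 in tablestore:
            if f2 != f1 and content_words(f1) & content_words(f2):  # 1-hop link
                instances.append((f1, f2, f1 in gold and f2 in gold))
    return instances

tablestore = ["a leaf is a part of a green plant",
              "a green plant performs photosynthesis to make food",
              "rubber is a kind of material"]
gold = tablestore[:2]
print(build_pair_instances(tablestore[:2], tablestore, gold))
```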
7.3 Results and discussion
Table 6 shows the explanation fact preference ordering results for elementary science QA pairs in terms of mAP, with the best results from the reference systems and from ours (last two rows) in bold.
Between our models, we find that SVM reg is significantly better than SVM rank, as determined by a paired t-test applied to their adjacent scores in the table. Thus, given our underlying data set, a pointwise learning approach proves better suited to it than a pairwise learning approach. Nevertheless, the latter remains a valid model for the task in principle, as it does not rely on the strong, and seemingly unrealistic, independence assumption between instances made by the SVM reg model. However, since SVM reg significantly outperforms SVM rank at $p < 0.05$, it proves practically better suited to this data set, implying that modeling the non-independence between facts, as SVM rank does, is not a crucial factor in learning the task defined in the data.
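For reference, the significance test can be run with a few lines of SciPy; the sketch below uses invented per-question average precision scores as placeholders, so it illustrates the procedure rather than reproducing our numbers.

```python
# Minimal sketch of the paired t-test between two models' paired scores.
from scipy.stats import ttest_rel

svm_reg_ap  = [0.62, 0.48, 0.55, 0.71, 0.40, 0.58]   # placeholder AP scores, SVM reg
svm_rank_ap = [0.55, 0.44, 0.51, 0.66, 0.35, 0.52]   # placeholder AP scores, SVM rank

t_stat, p_value = ttest_rel(svm_reg_ap, svm_rank_ap)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # significant if p < 0.05
```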
Compared with the nine reference systems, our SVM reg approach significantly outperforms eight of the models. This set includes the neural ranking model by Banerjee (Reference Banerjee2019), which our system surpasses by +9.4/+10.9 points in mAP, as well as the neural re-ranking model by Chia et al. (Reference Chia, Witteveen and Andrews2019), which we surpass by +3 mAP. Although we observe lower performance than the best-performing approach (–5.6/–5.3) by Das et al. (Reference Das, Godbole, Zaheer, Dhuliawala and McCallum2019), theirs is a significantly more computationally complex system than ours, as explained earlier (Section 7.2). Finally, in terms of scalability, next to our feature-rich SVM reg are the Iterated TF-IDF and Optimized TF-IDF models by Chia et al. (Reference Chia, Witteveen and Andrews2019). While the simplistic Optimized TF-IDF system, as expected, significantly underperforms the feature-rich systems, its ranking output nevertheless proves effective when used as features in our system, as we will see in the ablation analysis results (Section 7.3.1).
With our re-engineered system leveraging domain-targeted features, we have significantly outperformed our earlier system based on generic linguistic features (2019). Our SVM rank is at +9.2/+8.8 compared to the system without rules and at +3.9/+1.5 compared to the hybrid system, while our SVM reg is at +16.6/+16.1 compared to the without-rules system and +11.3/+8.8 compared to the hybrid system. This contrast shows the impact on the task of an effective learning algorithm combined with a set of features that specifically models the domain in our new system version.
In the preceding paragraphs, we discussed the performance of our system for producing top-ranked relevant facts. Next, we briefly examine its performance for ordering the relevant facts w.r.t. each other among the top-ranked, as presented by the Precision@k and Recall@k evaluations depicted in Figure 4. We see that at low values of k (i.e., ${\leq}$ 26), the ordering performance of SVM reg is distinctly better than that of SVM rank in terms of both precision and recall; beyond 26 facts there is no evident difference between them. At $k=2$, SVM reg has a recall of 0.42 at a precision of roughly 0.38, which indicates how often the top two automatically ranked facts are correct and in the top-two order. At $k=26$, 75% of the facts are retrieved (recall of 0.75), although at a low precision of 16% (precision of 0.16). Part of the low precision can be attributed to also retrieving irrelevant facts for explanations with fewer than 26 facts; these standard metrics do not adjust for cases where all the relevant facts of such explanations are already retrieved within the top 26. Aside from the spike in the precision score at $k = 4$ for SVM rank, both SVM reg and SVM rank show fairly stable precision and recall curves, with the expected steady decline and climb, respectively.
In summary, while our system attains 50.7% mAP for ranking the valid facts at the top, only a small proportion of the predictions are in the exact order given by the gold standard.
7.3.1 Feature ablation results
To provide further insight into the impact of adding different feature groups, we show ablation analysis results in Table 7. Our ablation strategy is to append each of the six feature groups, one at a time, to the baseline features as individual ablation experiments.
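Schematically, the ablation protocol amounts to the loop sketched below; the feature-group names and the train_and_score helper are hypothetical placeholders for our pipeline.

```python
# Hedged sketch of the ablation protocol: append one feature group at a time
# to the baseline features, retrain, and record the score.
FEATURE_GROUPS = ["tfidf_rank", "bert", "multihop",
                  "conceptnet", "openie", "lexical"]   # assumed group names

def run_ablation(train_and_score, baseline=("baseline",)):
    results = {"baseline": train_and_score(list(baseline))}
    for group in FEATURE_GROUPS:
        results[f"baseline + {group}"] = train_and_score(list(baseline) + [group])
    return results

# Toy stand-in scorer so the sketch runs: score grows with feature count.
print(run_ablation(lambda feats: round(0.3 + 0.05 * len(feats), 2)))
```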
From the reported scores in the table, we observe that, for both the SVM rank and SVM reg learners, the TF-IDF and BERT features show the highest impact, while the multihop and ConceptNet features are the second most impactful. The least impact comes from the OpenIE features, which show no improvement with SVM reg and only a minor improvement with SVM rank. Nonetheless, we retain this feature group since its ablation does not show a negative impact on system performance.
Next, we select a development set example for two of the most impactful feature groups, that is, the ConceptNet and BERT features, for qualitative analysis. The rest of our features can be justified similarly. In both examples, r $_{p}$ is the predicted rank and r $_{g}$ is the gold rank.
First, considering the ConceptNet features, a qualitative examination of our results showed that the added commonsense world knowledge prevented semantic drift in several cases. We explain this with the help of the selected example below. In the example, the concepts “tree,” “photosynthesis,” and “leaves” are the content concepts in the question and the correct answer. These three concepts are also the content concepts in the three gold explanation facts taken together; however, “plant” is an additional content concept in the explanation facts. The task then is to link the facts containing the “plant” concept more strongly with the question and the correct answer so that they can be ranked higher. In this respect, ConceptNet provides the information that the “plant” entity has the class “photosynthetic organism.” Our hypothesis was that this additional information should help boost the ranks of the 2nd and 3rd gold facts. As we can see in the example, adding the ConceptNet feature has indeed boosted the ranks of the 2nd and 3rd gold explanation facts, since it linked “plant” with a focus concept from the question, that is, “photosynthesis.” Further, comparing the Before and After sets, we can see that the semantic coherence contributed by ConceptNet lowered the ranks of facts with concepts such as “fruit,” “eating,” “digestion,” “animals,” and “consumers.”
In the second example, we qualitatively depict the impact of adding BERT features. We glean the theme of the QA pair as “falling under gravity.” While the dotted phrases “gravitational force,” “fall,” and “falling” encompass the theme, they are not directly present in the (q, ca) pair, unlike the underlined phrases. The third fact, “come down is similar to falling,” which contains one of the thematic phrases absent from the (q, ca) pair, viz. “falling,” gains 7 ranks after adding the BERT features, as shown in the After ranked collection. We posit this is due to the more abstract theme modeled by the BERT features. As a consequence, several unrelated facts were ranked lower: facts such as “rubber is a kind of material,” “to bounce back means to reflect,” and “objects are made of materials or substances or matter” no longer intervene in the returned After result collection.
7.3.2 Evaluating SVM reg for multihop inference
In this section, we present our best system performance, that is, the SVM reg system with all six feature categories, for multihop inference as a function of explanation length, that is, the number of facts in the explanation. Presumably, the more facts an explanation contains, the stronger the presence of the multihop phenomenon, that is, lexical hops across explanation facts whereby some facts share lexical matches with other explanation facts, which in turn overlap with the (q, ca) pair. This result is depicted in Figure 5.
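A minimal sketch of this breakdown (not our evaluation code; average precision is computed here in the standard rank-based way) is shown below: questions are bucketed by gold explanation length and mAP is reported per bucket.

```python
# Sketch: group questions by the number of gold facts and compute mAP per group.
from collections import defaultdict

def average_precision(ranked_facts, gold_facts):
    gold, hits, ap = set(gold_facts), 0, 0.0
    for i, fact in enumerate(ranked_facts, start=1):
        if fact in gold:
            hits += 1
            ap += hits / i
    return ap / len(gold) if gold else 0.0

def map_by_explanation_length(predictions):
    """predictions: list of (ranked_facts, gold_facts) pairs, one per question."""
    buckets = defaultdict(list)
    for ranked, gold in predictions:
        buckets[len(gold)].append(average_precision(ranked, gold))
    return {length: sum(aps) / len(aps) for length, aps in sorted(buckets.items())}

# Illustrative usage with two toy questions of explanation lengths 2 and 3.
preds = [(["f1", "f9", "f2"], ["f1", "f2"]),
         (["f4", "f5", "f6", "f7"], ["f5", "f6", "f7"])]
print(map_by_explanation_length(preds))
```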
8. Conclusion
In this work, we have investigated a knowledge-rich, feature-based approach for preference ordering of facts to explain the correct answer to elementary science questions. With the goal of creating meaningful unifications of (q, ca, f) triples, we have investigated six different feature categories targeted to the domain at hand, at varying levels of lexical and semantic representation. Further, our evaluations of regression versus learning-to-rank machine learning systems for preference ordering offer a new observation on the applicability of pointwise versus pairwise approaches (Kamishima et al. Reference Kamishima, Kazawa and Akaho2005, Reference Kamishima, Kazawa and Akaho2010; Fürnkranz and Hüllermeier Reference Fürnkranz and Hüllermeier2010; Melnikov et al. Reference Melnikov, Gupta, Frick, Kaimann and Hüllermeier2016).
Further, we have reported a detailed empirical analysis of our system on the task of explanation regeneration against nine existing reference systems. We have found that, when provided with domain-targeted features, SVMs can outperform BERT-based neural approaches (Banerjee Reference Banerjee2019). However, neural model variants applied in computationally complex task formulations (Das et al. Reference Das, Godbole, Zaheer, Dhuliawala and McCallum2019) can far surpass our system's performance. Deep learning models generally report the best performance on NLP tasks (Manning Reference Manning2015), albeit sometimes in computationally complex task formulations that are not practically suited to deployment, and they have consistently proven better than SVMs w.r.t. task performance and, in many cases, practicality. Our results obtained with SVMs, in light of this general deep learning success, therefore seem particularly interesting: they offer a renewed empirical perspective on SVMs for the task defined in the WorldTree corpus, a task that aligns well with symbolic problem formulations. While hand-crafting features for SVMs requires the linguistic insight of the practitioner to model the data set well, such features are often avoided in light of the recent performance boosts obtained from black-box neural models. Our work shows that systems based on explicit feature modeling can still contend with neural approaches.
Overall, in this work, we have revisited a more traditional natural language engineering approach of leveraging human-designed linguistic features at multiple levels of the text, including syntax, semantics, and context. Given that our model, which can outperform state-of-the-art BERT-based models, is fairly explainable, our paper offers insights into promising directions for further task improvements and for task engineering based on highly informative features.