Understanding Argument-Based Validity in Language Testing

doi:10.1017/9781108669849.004

Chapter 2, from Part I - Basic Concepts and Uses of Validity Argument in Language Testing and Assessment

Published online by Cambridge University Press: 14 January 2021

Carol A. Chapelle and Hye-won Lee

Edited by Carol A. Chapelle and Erik Voss


Carol A. Chapelle: Iowa State University
Erik Voss: Teachers College, Columbia University


Summary

Argument-based validity has evolved in response to the needs of language testing researchers for a systematic approach to investigating the validity of language tests. Based on a collection of 51 recent books, articles, and research reports in language assessment, this chapter describes the fundamental characteristics of an argument-based approach to validity, which has been operationalized in various ways in language assessment. These characteristics demonstrate how argument-based validity operationalizes the ideals for validation presented by Messick (1989) and accepted by most language testers: that a validity argument should be a unitary but multifaceted means of integrating a variety of evidence in an ongoing validation process. The chapter describes how validity arguments serve the multiple functions that language testers demand of their validation tools and takes into account the concepts that are important in language testing. It distinguishes between two formulations of argument-based validity that appear in language testing in order to introduce the conventions used throughout the papers in this volume.

Keywords

argument-based validation, language testing, validation research, test development, language testers

Type: Chapter
Information: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 19–44

DOI: https://doi.org/10.1017/9781108669849.004

Publisher: Cambridge University Press

Print publication year: 2021


References

Aryadoust, V. (2011). Validity arguments of the speaking and listening modules of international English language testing system: A synthesis of existing research. Asian ESP Journal, 7(2), 28–54.

Aryadoust, V. (2013). Building a validity argument for a listening test of academic proficiency. Newcastle upon Tyne: Cambridge Scholars Publishing.

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1–34.

Bachman, L. F., & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University Press.

Bachman, L. F., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University Press.

Barkaoui, K. (2017). Examining repeaters’ performance on second language proficiency tests: A review and a call for research. Language Assessment Quarterly, 14(4), 420–431.

Brooks, L., & Swain, M. (2014). Contextualizing performances: Comparing performances during TOEFL iBT™ and real-life academic speaking activities. Language Assessment Quarterly, 11(4), 353–373.

Carroll, P. E., & Bailey, A. L. (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing, 33(1), 23–52.

Chapelle, C. A. (1998). Construct definition and validity inquiry in SLA research. In Bachman, L. F. & Cohen, A. D. (Eds.), Second language acquisition and language testing interfaces (pp. 32–70). Cambridge: Cambridge University Press.

Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–272.

Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple… Language Testing, 29(1), 19–27.

Chapelle, C. A., Chung, Y.-R., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469.

Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405.

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the Test of English as a Foreign Language™. New York: Routledge.

Chapelle, C. A., & Voss, E. (2013). Evaluation of language tests through validation research. In Kunnan, A. J. (Ed.), The companion to language assessment (pp. 1079–1097). Chichester: Wiley.

Cheng, L., & Sun, Y. (2015). Interpreting the impact of the Ontario Secondary School Literacy Test on second language students within an argument-based validation framework. Language Assessment Quarterly, 12(1), 50–66.

Choi, Y. (2018). Graphic-prompt tasks for assessment of academic English writing ability: An argument-based approach to investigating validity. Unpublished doctoral dissertation, Iowa State University.

Chung, Y.-R. (2014). A test of productive English grammatical ability in academic writing: Development and validation. Unpublished doctoral dissertation, Iowa State University.

Colby-Kelly, C., & Turner, C. (2007). AFL research in the L2 classroom and evidence of usefulness: Taking formative assessment to the next level. Canadian Modern Language Review, 64(1), 9–37.

Creswell, J., & Plano Clark, V. (2017). Designing and conducting mixed methods research (3rd ed.). Thousand Oaks, CA: Sage Publications.

Cronbach, L. J. (1971). Test validation. In Thorndike, R. L. (Ed.), Educational measurement (pp. 443–507). Washington, DC: American Council on Education.

Cronbach, L. J. (1988). Internal consistency of tests: Analyses old and new. Psychometrika, 53(1), 63–70.

Doe, C. D. (2013). Validating the Canadian academic English language assessment for diagnostic purposes from three perspectives: Scoring, teaching, and learning. Unpublished doctoral dissertation, Queen’s University.

Doe, C. D. (2015). Student interpretations of diagnostic feedback. Language Assessment Quarterly, 12(1), 110–135.

Educational Testing Service (ETS). (2018). Validity evidence supporting the interpretation and use of TOEFL iBT® scores. TOEFL® Research Insight Series, Volume 4. Princeton, NJ: Educational Testing Service.

Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27(3), 317–334.

Farnsworth, T. L. (2013). An investigation into the validity of the TOEFL iBT Speaking test for international teaching assistant certification. Language Assessment Quarterly, 10(3), 274–291.

Frost, K., Elder, C., & Wigglesworth, G. (2012). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3), 345–369.

Fulcher, G., & Davidson, F. (2009). Test architecture, test retrofit. Language Testing, 26(1), 123–144.

He, L., & Min, S. (2017). Development and validation of a computer adaptive EFL test. Language Assessment Quarterly, 14(2), 160–176.

Im, G.-H., & Cheng, L. (2019). The Test of English for International Communication (TOEIC®). Language Testing, 36(2), 315–324.

Jia, Y. (2013). Justifying the use of a second language oral test as an exit test in Hong Kong: An application of assessment use argument framework. Unpublished doctoral dissertation, University of California, Los Angeles.

Johnson, R. C. (2011). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Unpublished doctoral dissertation, Macquarie University.

Jun, H. S. (2014). A validity argument for the use of scores from a web-search-permitted and web-source-based integrated writing test. Unpublished doctoral dissertation, Iowa State University.

Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38, 319–342.

Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Greenwood Publishing.

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.

Kenyon, D. (2012). Using Bachman’s assessment use argument as a tool in conceptualizing the issues surrounding linking ACTFL and CEFR. In Tschirner, E. (Ed.), Aligning frameworks of reference in language testing: The ACTFL Proficiency Guidelines and the Common European Framework of Reference for Languages (pp. 23–34). Tübingen, Germany: Stauffenburg Verlag.

Kim, Y.-H. (2010). An argument-based validity inquiry into the Empirically-derived Descriptor-based Diagnostic (EDD) assessment in ESL academic writing. Unpublished doctoral dissertation, University of Toronto.

Klebanov, B. B., Ramineni, C., Kaufer, D., Yeoh, P., & Ishizaki, S. (2019). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125–144.

Koizumi, R., Sakai, H., Ido, T., Ota, H., Hayama, M., Sato, M., & Nemoto, A. (2011). Toward validity argument for test interpretation and use based on scores of a diagnostic grammar test for Japanese learners of English. Japanese Journal for Research on Testing (『日本テスト学会誌』), 7(1), 99–119.

LaFlair, G. T., & Staples, S. (2017). Using corpus linguistics to examine the extrapolation inference in the validity argument for a high-stakes speaking assessment. Language Testing, 34(4), 451–475.

Lee, J. (2016). Transfer from ESL academic writing to first year composition and other disciplinary courses: An assessment perspective. Unpublished doctoral dissertation, Iowa State University.

Li, Z. (2015). An argument-based validation study of the English Placement Test (EPT): Focusing on the inferences of extrapolation and ramification. Unpublished doctoral dissertation, Iowa State University.

Llosa, L. (2005). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency. Unpublished doctoral dissertation, University of California, Los Angeles.

Llosa, L. (2008). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency based on teacher judgments. Educational Measurement: Issues and Practice, 27(3), 32–42.

Llosa, L., & Malone, M. E. (2019). Comparability of students’ writing performance on TOEFL iBT and in required university writing courses. Language Testing, 36(2), 235–263.

McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Oxford: Blackwell Publishing.

Messick, S. (1989). Validity. In Linn, R. L. (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan Publishing Co.

Mislevy, R. J., & Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educational Measurement: Issues and Practice, Winter, 6–20.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–62.

Norris, J. M. (2008). Validity evaluation in language assessment. New York: Peter Lang.

Pan, M., & Qian, D. D. (2017). Embedding corpora into the content validation of the grammar test of the National Matriculation English Test (NMET) in China. Language Assessment Quarterly, 14(2), 120–139.

Papageorgiou, S., & Tannenbaum, R. J. (2016). Situating standard setting within argument-based validity. Language Assessment Quarterly, 13(2), 109–123.

Pardo-Ballester, C. (2010). The validity argument of a web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly, 7(2), 137–159.

Park, M. (2015). Development and validation of virtual interactive tasks for an aviation English assessment. Unpublished doctoral dissertation, Iowa State University.

Plakans, L., & Burke, M. (2013). The decision-making process in language program placement: Test and nontest factors interacting in context. Language Assessment Quarterly, 10(2), 115–134.

Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing, 28(4), 463–481.

Sawaki, Y., & Sinharay, S. (2018). Do the TOEFL iBT® section scores provide value-added information to stakeholders? Language Testing, 35(4), 529–556.

Schmidgall, J. E., Getman, E. P., & Zu, J. (2018). Screener tests need validation too: Weighing an argument for test use against practical concerns. Language Testing, 35(4), 583–607.

Schmidgall, J. E., & Xi, X. (2020). Validation of language assessments. In Chapelle, C. A. (Ed.), Concise encyclopedia of applied linguistics (pp. 1123–1135). Oxford: Wiley-Blackwell.

So, Y. (2014). Are teacher perspectives useful? Incorporating EFL teacher feedback in the development of a large-scale international English test. Language Assessment Quarterly, 11(3), 283–303.

Suzuki, Y. (2015). Self-assessment of Japanese as a second language: The role of experiences in the naturalistic acquisition. Language Testing, 32(1), 63–81.

Toulmin, S. E. (2003). The uses of argument. Cambridge: Cambridge University Press.

Vongpumivitch, V. (2010). The General English Proficiency Test. In Cheng, L. & Curtis, A. (Eds.), English language assessment and the Chinese learner (pp. 158–172). New York: Routledge.

Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design. Unpublished doctoral dissertation, Iowa State University.

Wang, H., Choi, I., Schmidgall, J., & Bachman, L. F. (2012). Review of Pearson Test of English Academic. Language Testing, 29(4), 603–619.

Weigle, S. C., Yang, W., & Montee, M. (2013). Exploring reading processes in an academic reading test using short-answer questions. Language Assessment Quarterly, 10(1), 28–48.

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.

Xi, X. (2008). Methods of test validation. In Shohamy, E. & Hornberger, N. H. (Eds.), Encyclopedia of language and education, 2nd edition, Volume 7: Language testing and assessment (pp. 177–196). New York: Springer.

Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147–170.

Yang, H. (2016). Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation. Unpublished doctoral dissertation, Iowa State University.

Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199–225.