
4 - Variation in Participants and Stimuli in Acceptability Experiments

from Part I - General Issues in Acceptability Experiments

Published online by Cambridge University Press: 16 December 2021

Grant Goodall, University of California, San Diego

Summary

Judgments in acceptability judgment tasks are not uniform: they vary with the experimental conditions, but also across participants and across items. Some of this variation is meaningful; some is noise. This chapter discusses both types of variation and provides recommendations on how to deal with them. We show how some of the interspeaker variation stems from micro-differences between grammars. Statistical procedures such as distribution analysis or cluster analysis help to detect such variation, and the same procedures can be used to identify variation across items. We then outline how to reduce variation across and within items; in particular, we recommend keeping the length and complexity of sentences constant, as well as the accessibility of NP antecedents. The rest of the chapter deals with variation stemming from extralinguistic sources. Besides individual differences related to performance factors such as working memory, we discuss methodological artifacts such as scale effects and non-cooperative behavior.
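
The summary mentions cluster analysis as one way of detecting interspeaker variation. As an illustration only, and not as the chapter's own procedure, the following minimal Python sketch clusters participants by their per-condition rating profiles; all names and numbers in it are hypothetical (a simulated 40-participant by 4-condition matrix of z-scored ratings). Transposing the matrix would give the corresponding item-level analysis.

# Minimal sketch (not from the chapter): cluster analysis on participant
# rating profiles to look for interspeaker variation. All data are simulated
# and all names (ratings, group, etc.) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 40 participants x 4 conditions of mean z-scored ratings.
# Two hypothetical "grammar groups" differ only on condition 3.
group = rng.integers(0, 2, size=40)
ratings = rng.normal(0.0, 0.3, size=(40, 4))
ratings[:, 3] += np.where(group == 1, 1.0, -1.0)

def kmeans(X, k=2, n_iter=50, seed=0):
    """Plain k-means; returns a cluster label for each row of X."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each participant to the nearest cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean profile of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(ratings, k=2)
for j in range(2):
    print(f"cluster {j}: n={np.sum(labels == j)}, "
          f"mean profile={ratings[labels == j].mean(axis=0).round(2)}")

If the recovered clusters differ systematically on one condition, as in this simulation, that pattern is the kind of candidate micro-difference between grammars that the summary refers to.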

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


