Discovering multiword expressions

Aline Villavicencio; Marco Idiart

doi:10.1017/S1351324919000494

Published online by Cambridge University Press: 11 September 2019

Aline Villavicencio*: Affiliation:
Federal University of Rio Grande do Sul, Porto Alegre, Brazil; University of Sheffield, Sheffield, UK; University of Essex, Colchester, England, UK
Marco Idiart: Affiliation:
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
*Corresponding author. Email: [email protected]

Abstract

In this paper, we provide an overview of research on multiword expressions (MWEs) from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We concentrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.

Keywords

Multiword expressions; Association measures; Compositionality; Idiomaticity

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 6, November 2019, pp. 715–733

DOI: https://doi.org/10.1017/S1351324919000494
Copyright: © Cambridge University Press 2019

