Finding next of kin: Cross-lingual embedding spaces for related languages
Published online by Cambridge University Press: 04 September 2019
Abstract
Some languages have very few NLP resources, yet many of them are closely related to better-resourced languages. This paper explores how similarity between languages can be exploited by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure based on a weighted Levenshtein distance. One outcome of the experiments is a Panslavonic embedding space covering nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space is useful in applications such as morphological prediction, named-entity recognition and genre classification.
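For illustration, the following is a minimal sketch of the kind of weighted Levenshtein similarity the abstract alludes to: an edit distance whose substitution cost is reduced for character pairs that correspond regularly between related languages. The specific character pairs and weights below are hypothetical examples, not the weighting scheme used in the paper.

```python
def weighted_levenshtein(a: str, b: str, sub_cost) -> float:
    """Edit distance where the substitution cost depends on the character pair."""
    m, n = len(a), len(b)
    prev = [float(j) for j in range(n + 1)]  # distances for the previous row
    for i in range(1, m + 1):
        curr = [float(i)] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + 1.0,                                # deletion
                curr[j - 1] + 1.0,                            # insertion
                prev[j - 1] + sub_cost(a[i - 1], b[j - 1]),   # substitution
            )
        prev = curr
    return prev[n]

# Hypothetical substitution weights: cheaper for regular letter correspondences
# between related languages, full cost otherwise (illustrative only).
CHEAP_PAIRS = {("h", "g"), ("i", "y"), ("v", "w")}

def sub_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    if (x, y) in CHEAP_PAIRS or (y, x) in CHEAP_PAIRS:
        return 0.5
    return 1.0

def similarity(a: str, b: str) -> float:
    """Normalise the distance to a similarity score in [0, 1]."""
    d = weighted_levenshtein(a, b, sub_cost)
    return 1.0 - d / max(len(a), len(b), 1)

# Example: Czech 'hlava' vs. Russian transliterated 'glava' ('head')
print(similarity("hlava", "glava"))  # 0.9, since h/g is a cheap substitution
```

A similarity of this form can be used to link likely cognates across related languages, which in turn supplies anchor pairs for aligning their embedding spaces.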
© Cambridge University Press 2019