Published online by Cambridge University Press: 04 October 2018
As interest in linguistic linked data (LLD) grows within the Semantic Web and computational linguistics communities, and the number of contributions on LLD rapidly increases, scholars (and linguists in particular) interested in developing LLD resources sometimes find it difficult to determine which mechanism suits their needs and which challenges have already been addressed. This review presents the state of the art in models, ontologies and their extensions for representing language resources as LLD, focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4). The contributions in these four groups are described, highlighting their reuse by the community and the modelling challenges that remain open.
*We are very grateful to the anonymous reviewers for their meticulous reading of the survey and for providing us with numerous insightful and constructive suggestions to improve it. We would also like to thank Dr Guadalupe Aguado-de-Cea for her help in proofreading this manuscript. This work is supported by the Spanish Ministry of Education, Culture and Sports through the Formación del Profesorado Universitario (FPU) program, and by the Spanish Ministry of Economy and Competitiveness through the project 4V (TIN2013-46238-C4-2-R) within the FEDER funding scheme, the Juan de la Cierva program, and the Excellence Network ReTeLe (TIN2015-68955-REDT).