
Crowd-assessing quality in uncertain data linking datasets

Published online by Cambridge University Press:  02 July 2020

Daniel Faria
Affiliation:
Instituto Gulbenkian de Ciência, Oeiras, Portugal e-mail: [email protected] INESC-ID, Lisboa, Portugal
Alfio Ferrara
Affiliation:
Department of Computer Science, Università degli Studi di Milano, Milan, Italy e-mails: [email protected], [email protected] Data Science Research Center, Università degli Studi di Milano, Milan, Italy
Ernesto Jiménez-Ruiz
Affiliation:
City, University of London, London, UK e-mail: [email protected] Department of Informatics, University of Oslo, Oslo, Norway e-mail: [email protected]
Stefano Montanelli
Affiliation:
Department of Computer Science, Università degli Studi di Milano, Milan, Italy e-mails: [email protected], [email protected] Data Science Research Center, Università degli Studi di Milano, Milan, Italy
Catia Pesquita
Affiliation:
Lasige, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal e-mail: [email protected]

Abstract

The quality of a dataset used for evaluating data linking methods, techniques, and tools depends on the availability of a set of mappings, called the reference alignment, that is known to be correct. In particular, it is crucial that each mapping represents a relation between a pair of entities that are similar because they denote the same object. Since the reliability of the mappings is decisive for a fair evaluation of automatic linking methods and tools, we call this property mapping fairness. In this article, we propose a crowd-based approach, called Crowd Quality (CQ), for assessing the quality of data linking datasets by measuring the fairness of the mappings in the reference alignment. Moreover, we present a real experiment in which we evaluate two state-of-the-art data linking tools before and after refining the reference alignment with the CQ approach, in order to show the benefits of crowd assessment of mapping fairness.

Type
Research Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press

