3 - Cleaning
Published online by Cambridge University Press: 10 September 2022
Summary
Learning outcomes of this chapter
• Adopting a broader view of metadata quality
• Why you need to clean your metadata in the context of linked data
• Identifying the most common metadata quality issues
• Understanding the possibilities and limits of automated metadata cleaning
• Case study: cleaning metadata of the Schoenberg Database of Manuscripts
Introduction
‘It is not a bug, it is a feature’ is one of the more interesting lines one of us learnt when working for a software company. When a customer noticed an inconsistency in one of the products, the challenge was to convince the client that the issue was not a shortcoming but actually a quality of the software. This line comes to mind when we think about the relation between linked data and metadata quality. The lack of consistent, formalized and well-structured data on the web is often presented as the Achilles’ heel of the semantic web and linked data vision. However, we prefer to see the same reality from another viewpoint. Even the most ardent critic of linked data must admit at least one positive outcome: linked data have put metadata quality in the spotlight, finally giving this topic the attention it deserves.
If you only remember one thing from this chapter, it should be this: all metadata is dirty, but you can do something about it. Recurrent metadata quality issues, such as duplicate records or inconsistent encoding of dates or names, have a negative impact not only on the use of your metadata but also on the implementation of linked data methodologies. As Chapters 4 and 5 will demonstrate, the success rate of methods such as reconciliation and enrichment depends to a large extent on how consistent and well-structured your metadata are. Data profiling and cleaning techniques will teach you how to spot these issues and, where possible, mitigate them.
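To make the idea of data profiling concrete, here is a minimal sketch in Python. It is not the method used in this chapter: the sample records, the field names and the helper functions `normalize_name`, `date_pattern` and `profile` are all hypothetical and invented for illustration. The sketch flags names that collapse to the same form after trivial normalization (candidate duplicates) and counts the different shapes in which dates are encoded.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"id": 1, "name": "Schoenberg, Lawrence J.", "date": "1450"},
    {"id": 2, "name": "schoenberg,  lawrence j.", "date": "ca. 1450"},
    {"id": 3, "name": "Smith, John", "date": "15th century"},
    {"id": 4, "name": "Smith, John", "date": "1475-1480"},
]


def normalize_name(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def date_pattern(value):
    """Reduce a date string to its shape: digits become 9, letters become a."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "a", shape)


def profile(rows):
    """Return candidate duplicate names and a count of date encodings."""
    name_counts = Counter(normalize_name(r["name"]) for r in rows)
    duplicates = {n: c for n, c in name_counts.items() if c > 1}
    patterns = Counter(date_pattern(r["date"]) for r in rows)
    return duplicates, patterns


duplicate_names, date_patterns = profile(records)
```

In this invented sample, every date follows a different pattern, which is exactly the kind of inconsistency profiling is meant to surface. Deciding whether two matching names really describe the same record still requires human judgement.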
The difficulty of combining theory and practice
Data quality has attracted a lot of attention recently within academic circles. A large number of papers and books describe data quality with the help of theoretical concepts, models and frameworks which often refer to and build upon one another.
- Type: Chapter
- Information: Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish your Metadata, pp. 71-108
- Publisher: Facet
- Print publication year: 2015