4 - Reconciling
Published online by Cambridge University Press: 10 September 2022
Summary
Learning outcomes of this chapter
• Why controlled vocabularies are important for linked data
• Being able to compare the lightweight linked data approach with the full-fledged semantic web
• Understanding the role of SKOS
• Learning to select the most suitable vocabularies to leverage your metadata
• Case study: reconciling the metadata of the Powerhouse Museum
Introduction
‘Controlled vocabularies are like underwear. Everyone thinks they are a good idea but no one wants to use someone else’s.’ So goes a classic joke within library and information science circles. As this chapter will demonstrate, there are nonetheless important reasons why one would want to share and re-use vocabularies.
Thesauri, taxonomies, classification schemes or any other manifestation of controlled vocabularies constitute the very core of the LIS profession. The creation of library classifications such as the Dewey Decimal Classification (DDC) or the Universal Decimal Classification (UDC) represents in many ways the intellectual birth of the LIS discipline more than a century ago. Creating controlled vocabularies and using them to describe and give access to collections has long been a central activity for librarians, archivists and curatorial staff.
Throwing money into a black hole?
The creation and use of a controlled vocabulary is expensive, as both the development of a vocabulary and its subsequent use for indexing have to be performed by domain experts. For decades, computer science has tried to automate both processes. On the whole, these attempts have not been terribly successful. Data mining and natural language processing (NLP) techniques can certainly be used today to speed up the collection of candidate terms for the construction of a thesaurus. In a second stage, the same techniques can be used to analyse which terms from the thesaurus may be used to describe a document. In practice, both procedures need to be supervised by a domain expert, and are in that sense semi-automated methods.
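The two semi-automated stages described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not a description of any particular tool: stage 1 counts frequent words and word pairs in a corpus as candidate thesaurus terms, and stage 2 proposes thesaurus terms whose labels appear in a document. The stopword list, the function names, the sample corpus and the small thesaurus (a dictionary mapping a preferred label to alternative labels, loosely modelled on SKOS preferred and alternative labels) are all hypothetical. In both stages the output is only a list of suggestions that a domain expert still has to accept or reject.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for",
             "to", "is", "are", "with", "by", "from", "its"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(documents, min_count=2):
    """Stage 1 (hypothetical): count single words and adjacent word pairs
    that contain no stopwords, and return those occurring at least
    min_count times across the corpus, most frequent first."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Single words that are not stopwords
        counts.update(t for t in tokens if t not in STOPWORDS)
        # Adjacent word pairs in which neither word is a stopword
        counts.update(
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        )
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


def suggest_index_terms(document, thesaurus):
    """Stage 2 (hypothetical): propose each preferred term whose preferred
    or alternative labels occur as whole words in the document."""
    text = " ".join(tokenize(document))
    suggestions = []
    for preferred, alt_labels in thesaurus.items():
        for label in [preferred, *alt_labels]:
            normalized = " ".join(tokenize(label))
            if normalized and re.search(rf"\b{re.escape(normalized)}\b", text):
                suggestions.append(preferred)
                break
    return suggestions


# Hypothetical sample data for illustration only.
corpus = [
    "Steam engine built in Sydney for the railway.",
    "Model of a steam engine used on the railway.",
    "Photograph of a locomotive and steam engine.",
]
thesaurus = {
    "steam locomotives": ["locomotive", "steam engine"],
    "photographs": ["photograph"],
    "ceramics": ["pottery"],
}

print(candidate_terms(corpus))
print(suggest_index_terms(corpus[2], thesaurus))
```

With this sample data, stage 1 proposes "steam", "engine", "steam engine" and "railway" as candidate terms, and stage 2 suggests "steam locomotives" and "photographs" for the third record. Deciding, for instance, that "steam engine" should be an alternative label rather than a separate concept is exactly the kind of judgement that remains with the expert.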
The arrival of the web drastically affected views on the use of controlled vocabularies. Even their most ardent advocates from the LIS domain realize that maintaining and applying a controlled vocabulary on the scale of the web is a utopian idea. Moreover, the success of Google's indexing services, based on full-text search of unstructured HTML pages, led many to question traditional indexing practices.
- Type: Chapter
- Information: Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish your Metadata, pp. 109-158. Publisher: Facet. Print publication year: 2015.