RDM and MD Landscape in Earth & Environment

Compilation of Recommendations

Details


Short Title

R 4: Incorporate external vocabularies as necessary

Source Document

Guidelines for publishing structured metadata on the Web V3.0

Source Document Link

https://doi.org/10.15497/RDA00066

Publishing Organisation

RDA Research Metadata Schemas WG

Date of Publication

2021-06-15

Topic

Interlinking/interoperability

Addressed Stakeholders

data service providers, data stewards

Keywords

vocabulary, terminology, semantics

Text

A research data repository may use controlled vocabularies or other semantic resources to: specify relationships between described resources (for example, a dataset is a subset of another dataset, or a dataset is collected through an instrument and then cleaned and normalised by software); and provide the allowed range of a property value (for example, Library of Congress Subject Headings for indicating the topics of a library resource, or the BODC Parameter Usage Vocabulary (PUV) for labelling scientific variables).
The purpose of using controlled vocabularies is to standardise information so that there is a shared understanding of the concepts. This facilitates interoperability between adopters of those vocabularies and enables resources, or resources with the same property, to be linked, thereby improving data discovery.
However, generic schemas such as Schema.org do not enforce constraints or recommend controlled vocabularies for property values or rich relations between resource objects. This is a deliberate decision: Schema.org is for data from all domains (e.g., news, jobs, music, events, movies, among others), and fewer constraints make it more easily adoptable. A data repository can nevertheless use Schema.org together with vocabularies from other standards or namespaces. The incorporation of external vocabularies into Schema.org may enrich data search interfaces, such as faceted or filter searches (Wu et al., 2021), as well as enable APIs such as aggregated search across repositories of a specific domain or related domains.
When repositories plan to include vocabularies and properties outside of Schema.org, it is recommended that they use linked open vocabularies and dereferenceable property names as much as possible. Linked Open Vocabularies are a ‘high-quality catalogue of reusable vocabularies to describe Linked and Open Data’ (Vandenbussche et al., 2017).
The Linked Open Vocabularies website publishes about 723 vocabularies (e.g., SKOS) and 72k terms (e.g., all property names from dcterms). Using linked open vocabulary terms will enable the connection of data from multiple repositories, for example, linking data that share the same property value (e.g., datasets with the same subject heading ‘climate science’, or all datasets from location X). Furthermore, using dereferenceable Uniform Resource Identifiers (URIs) that point to a term or property value will provide unambiguous identification of the referenced resource (e.g., does the term “apple” mean a fruit in one repository and a corporation in another?); the URIs help provide the context needed to interpret properties precisely.
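
Illustrative example (not part of the source recommendation)

The following Python sketch shows one way the recommendation might look in practice: a Schema.org Dataset description, written as JSON-LD, that also carries a property from an external namespace (dcterms:subject), followed by a small index that links datasets from different repositories through the URI of the term rather than its label. The repository addresses, dataset identifiers, and term URIs under example.org are hypothetical placeholders, and the helper names make_dataset and group_by_subject are invented for this sketch; a real record would point to dereferenceable URIs published by the vocabulary maintainers.

```python
import json
from collections import defaultdict

# Hypothetical term URIs (placeholders): a real record would use
# dereferenceable URIs published by the vocabulary maintainers.
APPLE_FRUIT = "https://vocab.example.org/food/apple"
APPLE_COMPANY = "https://vocab.example.org/companies/apple"


def make_dataset(dataset_id, name, term_uri, term_label):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            # External namespace used alongside Schema.org.
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@type": "Dataset",
        "@id": dataset_id,
        "name": name,
        # Schema.org keyword expressed as a DefinedTerm that carries a URI.
        "keywords": [
            {"@type": "DefinedTerm", "name": term_label, "url": term_uri}
        ],
        # Property from the external namespace, pointing at the term URI.
        "dcterms:subject": {"@id": term_uri},
    }


def group_by_subject(records):
    """Index dataset identifiers by the URI of their dcterms:subject."""
    index = defaultdict(list)
    for record in records:
        index[record["dcterms:subject"]["@id"]].append(record["@id"])
    return dict(index)


# Three hypothetical datasets from two hypothetical repositories.
# All three share the label "apple", but only two share the term URI.
records = [
    make_dataset("https://repo-a.example.org/ds/1", "Orchard yields",
                 APPLE_FRUIT, "apple"),
    make_dataset("https://repo-b.example.org/ds/7", "Cider apple varieties",
                 APPLE_FRUIT, "apple"),
    make_dataset("https://repo-b.example.org/ds/9", "Device sales",
                 APPLE_COMPANY, "apple"),
]

by_subject = group_by_subject(records)
print(json.dumps(records[0], indent=2))
print(json.dumps(by_subject, indent=2))
```

Grouping on the label alone would place all three datasets under "apple"; grouping on the term URI keeps the two fruit datasets together across repositories and separates the corporate one, which is the disambiguation the text describes.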