Compilation of Recommendations

Details

Short Title

R1. Clarify the purpose(s) of your markup (or why you want to markup your data)

Source Document

Guidelines for publishing structured metadata on the Web V3.0

Source Document Link

https://doi.org/10.15497/RDA00066

Publishing Organisation

RDA Research Metadata Schemas WG

Date of Publication

2021-06-15

Topic

Discovery/ indexing/ search, Interlinking/ interoperability

Addressed Stakeholders

data service providers, data stewards

Keywords

metadata

Text

Before publishing structured data, the first question to ask is: what is the purpose of adding structured data to resource landing pages? The answer may affect the scope of the task and decisions made later in the process, for example, which resource objects from a repository should be in scope, and which schema, vocabulary and syntactic implementation are appropriate. In general, there are two broad use cases for publishing structured data:
1. For data discovery: The initial motivation for structured data came from web search engine operators, whose purpose is to improve data search and result presentation on the web. Repositories need to check which search features the targeted web search tools provide, as this can affect the required coverage of structured data. The most common search feature is keyword search. Keyword search indicates topical relevance between the searched keyword and the searched data; this topical information is usually captured in descriptive metadata such as title, description and keywords. On top of keyword search, some web data search tools offer advanced features such as facet filtering or faceted search along one or more data attributes, which help users narrow or broaden a search and assess the relevance or usefulness of candidate datasets. Other novel data discovery features use data linkage to construct knowledge graphs, for instance by combining Wikidata and Bioschemas data. Such strategies aim to give more precise answers to a search query. The more discovery features are offered, the more metadata coverage is required.
2. For exchanging metadata with other repositories: Embedding structured data in landing pages offers a new way for metadata aggregators to harvest metadata.
Currently, if a metadata aggregator harvests metadata from multiple data repositories, or a data repository exports detailed metadata to multiple downstream repositories or catalogues, either the aggregator or the repository has to implement and maintain several crosswalks. If both data repositories and aggregators implement structured data markup, they save resources on maintaining crosswalks, as each only needs a crosswalk between its own schema and the common markup vocabularies. Aggregators have a similar purpose to web discovery applications, namely to make the aggregated metadata (and thus the data) more discoverable. Domain-specific aggregators may accommodate and require more detailed metadata than generalist aggregators, so repositories involved in harvesting and exchanging metadata need to understand each other’s requirements and intended use of the metadata.
In either use case (or both), one first needs to identify the purpose and understand the requirements of downstream metadata consumers, as these affect the scope of a project that sets out to publish structured data.
In addition, it is worth noting that the power of structured data lies in its connections to other resources or entities published on the web. For example, a dataset may be a subset or derivative of another dataset, or a secondary product produced by software processing, as the result of a workflow, etc. Linking to other relevant resources is good practice for data discovery, metadata exchange and data aggregation.
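As an illustration of the crosswalk and linking ideas above, the following is a minimal sketch in Python. It assumes schema.org is the common markup vocabulary and JSON-LD is the syntax; the source recommendation does not prescribe either. The internal field names (title, summary, subjects, landing_page, derived_from), the example.org URLs and the function names are hypothetical and chosen only for this example.

```python
import json

# Hypothetical internal repository record. The field names and the
# example.org URLs are placeholders invented for this sketch.
record = {
    "title": "Example rainfall observations",
    "summary": "Daily rainfall totals for an example region.",
    "subjects": ["rainfall", "climate"],
    "landing_page": "https://example.org/datasets/child",
    "derived_from": "https://example.org/datasets/parent",
}


def to_schema_org(rec):
    """Crosswalk one internal record to a schema.org Dataset description."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Descriptive metadata that supports keyword search.
        "name": rec["title"],
        "description": rec["summary"],
        "keywords": rec["subjects"],
        "url": rec["landing_page"],
    }
    # Link to the dataset this one was derived from, when that is known.
    if rec.get("derived_from"):
        doc["isBasedOn"] = rec["derived_from"]
    return doc


def to_script_tag(doc):
    """Wrap the JSON-LD in a script element for embedding in a landing page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = to_schema_org(record)
print(to_script_tag(markup))
```

With a single mapping like this, a repository maintains one crosswalk from its own schema to the shared vocabulary rather than one per downstream consumer. Which properties are worth populating depends on the purpose identified and on what the targeted search tools or aggregators actually require.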