Compilation of Recommendations
Details
Short Title
FsF-R1.2-01M Metadata includes provenance information about data creation or generation.
Source Document
FAIRsFAIR Data Object Assessment Metrics
Source Document Link
https://doi.org/10.5281/zenodo.4081213
Publishing Organisation
FAIRsFAIR
Date of Publication
2020-10-12
Topic
Metadata richness / ingest / submission
Addressed Stakeholders
data service providers, research community, data stewards
Keywords
metadata, PID, provenance
Text
Data provenance (also known as lineage) represents a dataset's history, including the people, entities, and processes involved in its creation, management and longer-term curation. It is essential that data producers provide provenance information about the data to enable informed use and reuse. The level of provenance information needed can vary depending on the data type (e.g., measurement, observation, derived data, or data product) and the research domain. For that reason, it is difficult to define a finite set of provenance properties that will be adequate for all domains. Based on existing work, we suggest that, as a minimum, the following provenance properties of data generation or collection be included in the metadata record:
- Sources of data, e.g., datasets the data is derived from and instruments used
- Data creation or collection date
- Contributors involved in data creation and their roles
- Data publication, modification and versioning information
...
There are various ways in which provenance information may be included in a metadata record. Some of the provenance properties (e.g., instrument, contributor) may be best represented using PIDs (such as DOIs for data and ORCIDs for researchers). This way, humans and systems can retrieve more information about each property by resolving the PIDs. Alternatively, the provenance information can be given in a linked provenance record expressed explicitly in, e.g., PROV-O, PAV, or the Vocabulary of Interlinked Datasets (VoID).
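To make the four minimum provenance properties concrete, the sketch below checks a metadata record for them and flags source or contributor identifiers that are not given as PIDs. It is a minimal illustration under stated assumptions, not the FAIRsFAIR metric implementation: the field names (`sources`, `created`, `contributors`, `versioning`), the example DOI `10.1234/example-dataset` and the instrument label `thermometer-07` are hypothetical, and the DOI/ORCID regular expressions are simplified syntax checks that neither resolve the identifiers nor verify the ORCID checksum.

```python
import re
from typing import Any

# Hypothetical field names for the four minimum provenance properties;
# real metadata schemas (e.g. DataCite, schema.org) use their own terms.
MINIMUM_PROVENANCE_FIELDS = {
    "sources": "Sources of data (derived-from datasets, instruments)",
    "created": "Data creation or collection date",
    "contributors": "Contributors and their roles",
    "versioning": "Publication, modification and versioning information",
}

# Simplified syntax patterns for two common PID types (not full validators).
DOI_PATTERN = re.compile(r"^(https://doi\.org/)?10\.\d{4,9}/\S+$")
ORCID_PATTERN = re.compile(r"^(https://orcid\.org/)?\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def missing_provenance(record: dict[str, Any]) -> list[str]:
    """Return the minimum provenance fields that are absent or empty."""
    return [name for name in MINIMUM_PROVENANCE_FIELDS if not record.get(name)]


def is_pid(value: str) -> bool:
    """Rough syntactic check: does the string look like a DOI or an ORCID iD?"""
    return bool(DOI_PATTERN.match(value) or ORCID_PATTERN.match(value))


def unresolvable_references(record: dict[str, Any]) -> list[str]:
    """List source and contributor identifiers that are not given as PIDs."""
    refs = [source["id"] for source in record.get("sources", [])]
    refs += [contributor["id"] for contributor in record.get("contributors", [])]
    return [ref for ref in refs if not is_pid(ref)]


# Hypothetical example record covering all four minimum properties.
record = {
    "sources": [
        {"id": "https://doi.org/10.1234/example-dataset", "relation": "isDerivedFrom"},
        {"id": "thermometer-07", "relation": "instrument"},  # plain label, no PID
    ],
    "created": "2020-06-01",
    "contributors": [
        # ORCID's published example iD, used here purely for illustration.
        {"id": "https://orcid.org/0000-0002-1825-0097", "role": "DataCollector"},
    ],
    "versioning": {"version": "1.0", "published": "2020-10-12"},
}

print(missing_provenance(record))       # []
print(unresolvable_references(record))  # ['thermometer-07']
```

In practice, a check like this would be mapped onto the schema the repository actually uses (for example, DataCite `relatedIdentifiers` with relation type `IsDerivedFrom`, or `contributors` with a contributor type), and a fuller assessment would resolve each PID rather than only matching its syntax.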