RDM and MD Landscape in Earth & Environment


Compilation of Recommendations


Short Title

FsF-R1.2-01M Metadata includes provenance information about data creation or generation.

Source Document

FAIRsFAIR Data Object Assessment Metrics

Source Document Link


Publishing Organisation


Date of Publication



Metadata richness / ingest / submission

Addressed Stakeholders

data service providers, research community, data stewards


metadata, PID, provenance


Data provenance (also known as lineage) represents a dataset's history, including the people, entities, and processes involved in its creation, management and longer-term curation. It is essential that data producers provide provenance information about the data to enable informed use and reuse. The levels of provenance information needed can vary depending on the data type (e.g., measurement, observation, derived data, or data product) and research domains. For that reason, it is difficult to define a set of finite provenance properties that will be adequate for all domains. Based on existing work, we suggest that the following provenance properties of data generation or collection are included in the metadata record as a minimum.

- Sources of data, e.g., datasets the data is derived from and instruments
- Data creation or collection date
- Contributors involved in data creation and their roles
- Data publication, modification and versioning information

...

There are various ways through which provenance information may be included in a metadata record. Some of the provenance properties (e.g., instrument, contributor) may be best represented using PIDs (such as DOIs for data, ORCIDs for researchers). This way, humans and systems can retrieve more information about each of the properties by resolving the PIDs. Alternatively, the provenance information can be given in a linked provenance record expressed explicitly in, e.g., PROV-O or PAV or Vocabulary of Interlinked Datasets (VoID).
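
As an illustration only, the minimum provenance properties listed above could be checked against a metadata record along the following lines. This is a minimal sketch, not part of the FAIRsFAIR metric or its official test implementation: the field names (sources, date_created, contributors, version_info), the function missing_provenance, and the example record with its placeholder DOI and ORCID are all assumptions made for this example.

```python
from typing import Any

# Hypothetical field names chosen for this sketch; a real repository schema
# will use its own names for the same four provenance properties.
MINIMUM_PROVENANCE_FIELDS = {
    "sources": "Sources of data (datasets the data is derived from, instruments)",
    "date_created": "Data creation or collection date",
    "contributors": "Contributors involved in data creation and their roles",
    "version_info": "Data publication, modification and versioning information",
}


def missing_provenance(record: dict[str, Any]) -> list[str]:
    """Return the minimum provenance fields that are absent or empty."""
    return [name for name in MINIMUM_PROVENANCE_FIELDS if not record.get(name)]


# Example record with placeholder identifiers (not real PIDs).
example_record = {
    "title": "Example dataset",
    "sources": [{"type": "dataset", "pid": "https://doi.org/10.0000/placeholder"}],
    "date_created": "2020-01-01",
    "contributors": [
        {"pid": "https://orcid.org/0000-0000-0000-0000", "role": "DataCollector"}
    ],
    # "version_info" is deliberately left out to show a failing check.
}

for field in missing_provenance(example_record):
    print(f"Missing: {MINIMUM_PROVENANCE_FIELDS[field]}")
```

Sources and contributors are stored here as PIDs rather than free text, in line with the suggestion that humans and systems can resolve them for further detail. A check like this only tests for presence; it does not confirm that a PID resolves or that the values are correct.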