FsF-R1.2-01M Metadata includes provenance information about data creation or generation.
- Short Title: FsF-R1.2-01M Metadata includes provenance information about data creation or generation.
- Source Document: FAIRsFAIR Data Object Assessment Metrics
- Source Document Link: https://doi.org/10.5281/zenodo.4081213
- Publishing Organisation: FAIRsFAIR
- Date of Publication: 2020-10-12
- Topic: Metadata richness/ ingest/ submission
- Keywords: metadata, PID, provenance
- Addressed Stakeholders: data service providers, research community, data stewards
- Full Text: Data provenance (also known as lineage) represents a dataset's history, including the people, entities, and processes involved in its creation, management and longer-term curation. It is essential that data producers provide provenance information about the data to enable informed use and reuse. The levels of provenance information needed can vary depending on the data type (e.g., measurement, observation, derived data, or data product) and research domains. For that reason, it is difficult to define a set of finite provenance properties that will be adequate for all domains. Based on existing work, we suggest that the following provenance properties of data generation or collection are included in the metadata record as a minimum.
  - Sources of data, e.g., datasets the data is derived from and instruments
  - Data creation or collection date
  - Contributors involved in data creation and their roles
  - Data publication, modification and versioning information
  ...
  There are various ways through which provenance information may be included in a metadata record. Some of the provenance properties (e.g., instrument, contributor) may be best represented using PIDs (such as DOIs for data, ORCIDs for researchers). This way, humans and systems can retrieve more information about each of the properties by resolving the PIDs. Alternatively, the provenance information can be given in a linked provenance record expressed explicitly in, e.g., PROV-O or PAV or Vocabulary of Interlinked Datasets (VoID).
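
The minimum provenance properties listed in the full text can be illustrated with a small check over a metadata record. The following is only a sketch under stated assumptions: the field names (`derived_from`, `instrument`, `created`, `contributors`, `published`, `modified`, `version`), the helper functions, and the example record with its placeholder DOI and ORCID are all hypothetical and chosen for illustration. They are not defined by the metric, and this is not how any particular assessment tool evaluates it.

```python
from typing import Any

# Hypothetical mapping from the four minimum provenance properties to
# illustrative metadata field names (assumed for this sketch only).
PROVENANCE_FIELDS: dict[str, tuple[str, ...]] = {
    "sources": ("derived_from", "instrument"),
    "creation_date": ("created",),
    "contributors": ("contributors",),
    "publication_and_versioning": ("published", "modified", "version"),
}


def check_provenance(record: dict[str, Any]) -> dict[str, bool]:
    """Report, per property, whether at least one candidate field is non-empty."""
    return {
        prop: any(record.get(field) for field in fields)
        for prop, fields in PROVENANCE_FIELDS.items()
    }


def missing_provenance(record: dict[str, Any]) -> list[str]:
    """List the provenance properties for which no field is filled in."""
    return [prop for prop, ok in check_provenance(record).items() if not ok]


# Hypothetical record; the DOI and ORCID below are placeholders, not real PIDs.
example_record = {
    "title": "Example dataset",
    "derived_from": ["https://doi.org/10.1234/example-source"],
    "created": "2020-01-15",
    "contributors": [
        {"id": "https://orcid.org/0000-0000-0000-0000", "role": "DataCollector"},
    ],
    "published": "2020-03-01",
}

print(check_provenance(example_record))
print(missing_provenance(example_record))
```

Expressing sources and contributors as resolvable PIDs, as in the placeholder values above, follows the suggestion in the full text that humans and systems can retrieve further details by resolving them. A presence check like this says nothing about whether the values are correct or whether the PIDs actually resolve.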