RDM and MD Landscape in Earth & Environment

Compilation of Recommendations

Details


Short Title

Identification of data collections

Source Document

Principles and best practices in data versioning for all datasets big and small

Source Document Link

https://doi.org/10.15497/RDA00042

Publishing Organisation

RDA Data Versioning WG

Date of Publication

2020-01-16

Topic

Policy, Quality control/curation

Addressed Stakeholders

data stewards, policy makers

Keywords

data collections, PID, persistent identifiers

Text

Datasets may be aggregated into collections or time series. Such collections can be seen as “works of works” (Hourclé, 2009), similar to a journal series. Following this practice, both the collection (the work of works) and each of its constituent datasets (the works) should be identified and versioned (Klump et al., 2016). Some data collections, such as time series, are expected to change over time as new data are added. In that case, the entire time series should be identified, and so should its time-stamped revisions if the series is updated frequently (Rauber et al., 2016). Because not all changes come from adding data over time, and some result from corrections, recalibrations, and similar interventions, it is also recommended to adopt a dataset release policy for time series data.
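The recommendation can be illustrated with a minimal data model. This is a sketch under stated assumptions, not a scheme prescribed by the RDA Data Versioning WG. The class names (`Dataset`, `Collection`, `Revision`, `TimeSeries`), the `example:` identifiers, and the `PID@timestamp` revision reference are all hypothetical placeholders. The sketch shows a collection carrying its own identifier and version alongside separately identified and versioned member datasets, and a time series identified as a whole whose time-stamped revisions record why each change happened (new data, correction, recalibration).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    """One constituent dataset (a 'work'), identified and versioned on its own."""
    pid: str      # hypothetical placeholder identifier, not a real PID/DOI
    version: str


@dataclass
class Collection:
    """A collection (a 'work of works') with its own identifier and version."""
    pid: str
    version: str
    members: List[Dataset] = field(default_factory=list)


@dataclass(frozen=True)
class Revision:
    """A time-stamped revision of a time series and the reason for the change."""
    timestamp: datetime
    reason: str   # e.g. "new data", "correction", "recalibration"


@dataclass
class TimeSeries:
    """A time series identified as a whole, plus its time-stamped revisions."""
    pid: str
    revisions: List[Revision] = field(default_factory=list)

    def add_revision(self, timestamp: datetime, reason: str) -> str:
        """Record a revision; return a hypothetical 'PID@timestamp' reference."""
        self.revisions.append(Revision(timestamp, reason))
        self.revisions.sort(key=lambda r: r.timestamp)
        return f"{self.pid}@{timestamp.isoformat()}"

    def as_of(self, timestamp: datetime) -> Optional[Revision]:
        """Return the latest revision at or before `timestamp`, or None."""
        earlier = [r for r in self.revisions if r.timestamp <= timestamp]
        return earlier[-1] if earlier else None


# Usage: the collection and each member dataset carry their own PID and version.
coll = Collection("example:coll-1", "1.0",
                  [Dataset("example:ds-a", "2.1"), Dataset("example:ds-b", "1.0")])

# A time series identified once, with time-stamped revisions for each change.
series = TimeSeries("example:ts-1")
ref = series.add_revision(datetime(2020, 1, 1, tzinfo=timezone.utc), "new data")
series.add_revision(datetime(2020, 2, 1, tzinfo=timezone.utc), "recalibration")
print(ref)  # example:ts-1@2020-01-01T00:00:00+00:00
print(series.as_of(datetime(2020, 1, 15, tzinfo=timezone.utc)).reason)  # new data
```

Recording a `reason` with each revision is one way to keep additions distinguishable from corrections and recalibrations, which is the distinction a release policy for time series data would need to address. How versions are numbered and when a revision becomes a new release are left to that policy.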