Identification of data collections

Short Title: Identification of data collections
Source Document: Principles and best practices in data versioning for all datasets big and small
Source Document Link: https://doi.org/10.15497/RDA00042
Publishing Organisation: RDA Data Versioning WG
Date of Publication: 2020-01-16
Topic: Policy, Quality control/curation
Keywords: data collections, PID, persistent identifiers
Addressed Stakeholders: data stewards, policy makers
Full Text: Datasets may be aggregated into collections or time series. These collections can be seen as “works of works” (Hourclé, 2009), similar to a journal series. Following this practice, the collection (work of works) should be identified and versioned, as should each of its constituent datasets (works) (Klump et al., 2016). Some data collections, such as time series data, are expected to change over time as new data are added. Here, the entire time series should be identified, as should time-stamped revisions if the series is updated frequently (Rauber et al., 2016). Because not all changes result from the addition of data over time, and some result from corrections, recalibrations, etc., it is also recommended to adopt a dataset release policy for time series data.
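
The recommendation above could be modelled roughly as follows. This is a minimal illustrative sketch, not part of the source document: the class names (Dataset, Revision, Collection), the methods release and as_of, and the "example:" identifiers are all hypothetical. It assumes that the collection carries its own identifier, that each constituent dataset carries its own identifier and version, and that every release of the collection is recorded as a time-stamped revision with a stated reason.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    """A constituent dataset (a "work") with its own identifier and version."""
    pid: str       # hypothetical identifier, not a real PID
    version: str


@dataclass(frozen=True)
class Revision:
    """A time-stamped revision of the collection as a whole."""
    timestamp: datetime
    reason: str    # e.g. "addition", "correction", "recalibration"
    members: Tuple[Dataset, ...]


@dataclass
class Collection:
    """A collection (a "work of works") identified separately from its members."""
    pid: str
    revisions: List[Revision] = field(default_factory=list)

    def release(self, members, reason, timestamp=None) -> Revision:
        """Record a new time-stamped revision and return it."""
        ts = timestamp or datetime.now(timezone.utc)
        rev = Revision(ts, reason, tuple(members))
        self.revisions.append(rev)
        return rev

    def as_of(self, when: datetime) -> Optional[Revision]:
        """Return the latest revision released at or before `when`, or None."""
        candidates = [r for r in self.revisions if r.timestamp <= when]
        return max(candidates, key=lambda r: r.timestamp, default=None)


# Hypothetical time series that grows by one dataset per release.
series = Collection(pid="example:series")
d1 = Dataset("example:series/2019", "1.0")
d2 = Dataset("example:series/2020", "1.0")
t1 = datetime(2020, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2020, 6, 1, tzinfo=timezone.utc)
series.release([d1], "addition", t1)
series.release([d1, d2], "addition", t2)
```

Recording a reason with each revision is one possible way to support a release policy that distinguishes additions from corrections or recalibrations. Calling as_of with a timestamp then retrieves the state of the series that was current at that time.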
