Identification of data collections
- Short Title: Identification of data collections
- Source Document: Principles and best practices in data versioning for all datasets big and small
- Source Document Link: https://doi.org/10.15497/RDA00042
- Publishing Organisation: RDA Data Versioning WG
- Date of Publication: 2020-01-16
- Topic: Policy, Quality control/curation
- Keywords: data collections, PID, persistent identifiers
- Addressed Stakeholders: data stewards, policy makers
- Full Text: Datasets may be aggregated into collections or time series. These collections can be seen as “works of works” (Hourclé, 2009), similar to a journal series. Following this practice, the collection (work of works) should be identified and versioned, and so should each of its constituent datasets (works) (Klump et al., 2016). Some data collections, such as time series data, are expected to change over time as new data are added. Here, the entire time series should be identified, and, if the series is updated frequently, so should its time-stamped revisions (Rauber et al., 2016). Because not all changes come from adding data over time, and some result from corrections, recalibrations, etc., it is also recommended to adopt a dataset release policy for time series data.
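
The recommendation above can be illustrated with a minimal Python sketch. It is not taken from the source document. The class names (`Revision`, `Dataset`, `Collection`), the `reason` labels, the version strings, and the `example:` identifiers are all hypothetical, chosen only to show a collection and its constituent datasets each carrying their own identifier and their own time-stamped revisions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """One time-stamped release of a dataset or a collection."""
    version: str         # hypothetical release label, e.g. "1.1"
    timestamp: datetime  # when this revision was released
    reason: str          # e.g. "addition", "correction", "recalibration"


@dataclass
class Dataset:
    """A constituent dataset (a "work") with its own identifier and revisions."""
    pid: str
    revisions: list = field(default_factory=list)

    def release(self, version: str, reason: str, timestamp: datetime) -> Revision:
        rev = Revision(version, timestamp, reason)
        self.revisions.append(rev)
        return rev


@dataclass
class Collection:
    """A collection (a "work of works"), identified and versioned on its own."""
    pid: str
    members: dict = field(default_factory=dict)
    revisions: list = field(default_factory=list)

    def add(self, dataset: Dataset) -> None:
        self.members[dataset.pid] = dataset

    def release(self, version: str, reason: str, timestamp: datetime) -> Revision:
        rev = Revision(version, timestamp, reason)
        self.revisions.append(rev)
        return rev

    def as_of(self, when: datetime) -> Optional[Revision]:
        """Return the latest collection revision released at or before `when`."""
        candidates = [r for r in self.revisions if r.timestamp <= when]
        return max(candidates, key=lambda r: r.timestamp, default=None)


# Hypothetical identifiers; a real deployment would use registered PIDs.
series = Collection(pid="example:collection/temperature")
station = Dataset(pid="example:dataset/station-42")
series.add(station)

jan = datetime(2020, 1, 1, tzinfo=timezone.utc)
feb = datetime(2020, 2, 1, tzinfo=timezone.utc)

# New data added: both the dataset and the collection get a revision.
station.release("1.0", "addition", jan)
series.release("1.0", "addition", jan)

# A recalibration is recorded as a release too, not only additions.
station.release("1.1", "recalibration", feb)
series.release("1.1", "recalibration", feb)

cited = series.as_of(datetime(2020, 1, 15, tzinfo=timezone.utc))
```

Here `as_of` stands in for resolving a time-stamped citation: a reader who cited the series in mid-January would be pointed to the revision that was current at that time, not to the later recalibrated one. Recording a `reason` with each revision is one possible way to support a release policy that distinguishes additions from corrections.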