What Are The Characteristics of a Good Collection?

Making collections takes some preparation and planning. Here are some characteristics of good data collections:

  • Self-contained. Collections should contain everything someone needs to understand the data set. This includes the complete data set, notes on how to interpret the data, accurate and clear labels for variables and, if possible, any source code or software used to process the data.
  • Self-describing. Although it’s a good idea to include notes explaining how your data is organised, it’s even better if this is also clear from looking at the collection itself! For example, if your data was collected over a number of days, using filenames that start with the date in YYYYMMDD format (e.g. “20160315_Sugarloaf_Island_Humidity”) makes it clear on which day a particular file was generated.
  • Long-lived. This data might need to be stored for a long time (potentially up to 25 years). Over that period, the format used to store your data may no longer be supported when you or someone else needs to look at it again. This is a particular risk for files generated by rarely used proprietary software. Where possible, choose open or commonly used file formats to store your data.
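As an illustration of the self-describing filename convention, here is a minimal Python sketch. It assumes each file name is built from a date, a site name and a measured variable, following the pattern of the "20160315_Sugarloaf_Island_Humidity" example; the helper names `make_filename` and `parse_date` are hypothetical and not part of any particular tool.

```python
from datetime import date, datetime


def make_filename(day: date, site: str, variable: str) -> str:
    """Hypothetical helper: build a name that starts with the date in YYYYMMDD format."""
    # Replace spaces with underscores so the name is safe on most file systems.
    site_part = site.replace(" ", "_")
    return f"{day.strftime('%Y%m%d')}_{site_part}_{variable}"


def parse_date(filename: str) -> date:
    """Hypothetical helper: recover the collection date from the first eight characters."""
    return datetime.strptime(filename[:8], "%Y%m%d").date()


names = [
    make_filename(date(2016, 3, 16), "Sugarloaf Island", "Humidity"),
    make_filename(date(2016, 3, 15), "Sugarloaf Island", "Humidity"),
]

# YYYYMMDD sorts chronologically as plain text, so sorting the names sorts them by date.
for name in sorted(names):
    print(name, parse_date(name).isoformat())
```

A side benefit of putting the date first, most significant part first, is that an ordinary alphabetical file listing also shows the files in the order they were collected.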