Research Data Management Guide: Describe Your Data
Data Documentation
Why is data documentation important?
Providing documentation is a crucial part of the research process; for data to be used effectively and efficiently, certain facts about that data must be recorded. Relying on memory alone is not a dependable solution, especially in collaborative research environments. Describing research protocols and documenting data through metadata, lab notebooks, instrument calibrations, methodology outlines or codebooks helps ensure that your data can be understood, validated and reproduced. If you’re looking for more information about particular metadata standards, see our page on metadata.
Good documentation practices involve two levels of information:
1. Research study level documentation provides an overview of the project, and may cover any number of related datasets. It may include:
- the context of the research study as it relates to the dataset
  - history of the project
  - goals of the project
  - project staff and funding sources (if applicable)
  - operating hypotheses
- the origins of the data
  - if the data was collected or created as part of the research project
    - all collection protocols
    - instruments (including any equipment numbers)
    - hardware and software used
  - if the data was created outside the project
    - source or repository in which the data was found
    - any information about the original creators and their project
    - download date/time
2. Data level documentation provides technical information about each individual dataset, and may either be embedded in the data file or stored as a separate file. This level of documentation may include:
- file names and versions
- variable descriptions, data types and values
- location of header columns
- explanation of codes or classification systems
- explanations of missing values
- software or hardware information specific to the creation of a particular dataset
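As an illustration of how the two levels might be recorded together, the following is a minimal sketch in Python that stores study level notes alongside an automatically drafted description of one CSV dataset. Everything specific in it is an assumption made for the example: the function names (`infer_type`, `describe_dataset`), the field names, the sample data, and the missing-value codes are hypothetical and are not prescribed by this guide or by any metadata standard.

```python
import csv
import io
import json

# Hypothetical missing-value codes; a real project would define its own.
MISSING_CODES = {"", "NA", "-999"}


def infer_type(values):
    """Guess a simple data type from a list of non-missing string values."""
    if not values:
        return "unknown"
    for cast, name in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def describe_dataset(file_name, version, csv_text, descriptions):
    """Draft data level documentation for one CSV dataset."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    variables = []
    for name in reader.fieldnames or []:
        column = [row[name] for row in rows]
        # Treat absent cells and the agreed codes as missing.
        present = [v for v in column if v is not None and v not in MISSING_CODES]
        variables.append({
            "name": name,
            "description": descriptions.get(name, "TODO: describe"),
            "type": infer_type(present),
            "missing_count": len(column) - len(present),
        })
    return {
        "file_name": file_name,
        "version": version,
        "header_row": 1,  # assumption: the header is the first row
        "missing_value_codes": sorted(MISSING_CODES),
        "variables": variables,
    }


# Invented sample data, used only to show the shape of the output.
sample_csv = "site_id,temp_c,species\n1,12.5,oak\n2,NA,elm\n3,-999,oak\n"

documentation = {
    # Study level: filled in by hand, once per project.
    "study": {
        "context": "TODO: history, goals, staff, funding, hypotheses",
        "data_origin": "TODO: collected in the project, or source and download date",
    },
    # Data level: one entry per dataset.
    "datasets": [
        describe_dataset(
            "field_sites.csv",
            "v1",
            sample_csv,
            {"site_id": "Site identifier", "temp_c": "Air temperature in Celsius"},
        )
    ],
}

print(json.dumps(documentation, indent=2))
```

A draft like this only covers what can be read from the file itself. Variable descriptions, the meaning of codes, and the reasons values are missing still need to be written by someone who knows the data.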