Why is data documentation important?
Documentation is a crucial part of the research process: for data to be used effectively and efficiently, certain facts about it must be recorded. Relying on memory is not a good solution, especially in collaborative research environments. Describing research protocols and documenting data through metadata, lab notebooks, instrument calibrations, methodology outlines or codebooks helps ensure that your data can be validated and your results reproduced. If you’re looking for more information about particular metadata standards, see our page on metadata.
Good documentation practices involve two levels of information:
1. Research study level documentation provides an overview of the project, and may cover any number of related datasets. It may include:
the context of the research study as it relates to the dataset:
    history of the project
    goals of the project
    project staff and funding sources (if applicable)
    operating hypotheses
the origins of the data:
    if the data was collected or created as part of the research project:
        all collection protocols
        instruments (including any equipment numbers)
        hardware and software used
    if the data was created outside the project:
        source or repository in which the data was found
        any information about the original creators and their project
        download date/time
2. Data level documentation provides technical information about each individual dataset, and may either be embedded in the data file or stored as a separate file. This level of documentation may include:
file names and versions
variable descriptions, data types and values
location of header columns
explanation of codes or classification systems
explanations of missing values
software or hardware information specific to the creation of a particular dataset
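As an illustration of data level documentation stored as a separate file, the following is a minimal Python sketch that builds a simple codebook for a small CSV dataset. Everything specific in it is hypothetical and chosen only for the example: the file name `site_temps_v2.csv`, the column names, the missing-value codes `-999` and `NA`, the species codes, and the helper names `infer_type` and `build_codebook`. It is one possible layout, not a metadata standard.

```python
import csv
import io
import json

# Hypothetical sample data; in practice this would be read from a file.
SAMPLE_CSV = """site_id,temp_c,species_code
A1,12.5,1
A2,-999,2
B1,14.1,NA
"""


def infer_type(values):
    """Guess a simple data type label from a list of string values."""
    if not values:
        return "unknown"
    for caster, label in ((int, "integer"), (float, "number")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "string"


def build_codebook(csv_text, file_name, version, descriptions,
                   missing_codes, code_lists):
    """Return a dict documenting one dataset, column by column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    variables = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        # Values that match a documented missing-value code are set aside.
        observed = [v for v in values if v not in missing_codes]
        variables.append({
            "name": name,
            "description": descriptions.get(name, "UNDOCUMENTED"),
            "data_type": infer_type(observed),
            "missing_count": len(values) - len(observed),
            "codes": code_lists.get(name, {}),
        })
    return {
        "file_name": file_name,
        "version": version,
        "header_row": 1,  # the sample file has its header on the first line
        "missing_value_codes": missing_codes,
        "variables": variables,
    }


codebook = build_codebook(
    SAMPLE_CSV,
    file_name="site_temps_v2.csv",
    version="2",
    descriptions={
        "site_id": "Identifier of the sampling site",
        "temp_c": "Water temperature in degrees Celsius",
        "species_code": "Dominant species observed at the site",
    },
    missing_codes={"-999": "sensor failure", "NA": "not recorded"},
    code_lists={"species_code": {"1": "species A", "2": "species B"}},
)

# The codebook can be saved next to the data file as a separate JSON file.
print(json.dumps(codebook, indent=2))
```

The resulting JSON covers several of the items listed above: file name and version, variable descriptions and data types, the location of the header, the meaning of codes, and the explanation of missing values. Inferred data types are only a starting point and should be checked by someone who knows the dataset.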