Research Data Management Guide: Organize Your Data
Consistent File Naming and Organization
Following a consistent standard is extremely important for keeping data easy to find and well documented. The top-level directory should have a descriptive name, often one that includes the project name and year. Substructure within the directory should follow a consistent naming convention and could include separate sub-directories for different trials of an experiment, versions of a dataset, or researchers working on the project.
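As a rough illustration of the layout described above, the following Python sketch creates a top-level directory named from the project and year, with one sub-directory per trial. The function name `create_project_skeleton`, the `raw`, `processed`, and `docs` folder names, and the zero-padded `trial_NN` pattern are assumptions chosen for this example, not a required standard.

```python
from pathlib import Path


def create_project_skeleton(root: Path, project: str, year: int, trials: int = 2) -> Path:
    """Create a project directory with consistently named sub-directories.

    The layout (trial_NN/raw, trial_NN/processed, docs) is a hypothetical
    convention used only for illustration.
    """
    # Top-level name combines the project name and the year.
    top = root / f"{project}_{year}"
    for n in range(1, trials + 1):
        # Zero-padded trial numbers keep directories in order when sorted by name.
        trial_dir = top / f"trial_{n:02d}"
        (trial_dir / "raw").mkdir(parents=True, exist_ok=True)
        (trial_dir / "processed").mkdir(parents=True, exist_ok=True)
    # A place for README files, data dictionaries, and other documentation.
    (top / "docs").mkdir(parents=True, exist_ok=True)
    return top
```

Calling `create_project_skeleton(Path("."), "soil_moisture", 2024)` would create `soil_moisture_2024` in the current directory. Whatever layout a project settles on, the point is that every trial or dataset version follows the same pattern.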
Adopting a file naming convention can be a key part of keeping your project data organized. A good filename should:
- briefly and uniquely describe a dataset
- include a timestamp or location designation, if applicable
- not have any spaces; use “_” or “-” instead
- include some kind of versioning logic in the filename to track edits or changes to the dataset
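One possible way to apply these guidelines consistently is to generate filenames with a small helper rather than typing them by hand. The sketch below is an assumption-laden example: the function name `build_filename`, the field order, the `YYYYMMDD` date format, and the `vNN` version suffix are illustrative choices, not a prescribed convention.

```python
import re
from datetime import date


def build_filename(description: str, when: date, version: int, ext: str,
                   location: str = "") -> str:
    """Build a filename of the form description[_location]_YYYYMMDD_vNN.ext.

    The field order and formats here are hypothetical; adapt them to the
    convention your project agrees on.
    """
    parts = [description]
    if location:
        parts.append(location)
    # A year-month-day timestamp sorts chronologically by name.
    parts.append(when.strftime("%Y%m%d"))
    # A zero-padded version number tracks edits to the dataset.
    parts.append(f"v{version:02d}")
    # Replace runs of whitespace inside each field with a hyphen.
    cleaned = [re.sub(r"\s+", "-", p.strip()) for p in parts]
    # Fields are joined with underscores, so the result contains no spaces.
    return "_".join(cleaned) + "." + ext.lstrip(".")
```

For example, `build_filename("soil moisture", date(2024, 3, 15), 2, "csv", location="plot A")` returns `soil-moisture_plot-A_20240315_v02.csv`.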
Think carefully about how your data is stored in files and folders; someone new to the project should be able to find and access your data easily.
Choosing a File Format for Long-Term Access
Digital information is designed to be read and used by computers and is thus dependent on trends in hardware and software development. Access to your data over a long period of time cannot be guaranteed, in spite of the backwards compatibility built into many software packages.
Formats likely to remain accessible in the long term follow an open, documented standard; are non-proprietary, uncompressed, and unencrypted; and are in standard use across a community. For instance,
- text (.txt) or PDF/A (.pdf) instead of MS Word (.docx)
- comma-separated (.csv) or tab-separated (.tsv) instead of MS Excel (.xlsx)
If a proprietary format is a discipline or industry standard, a researcher could consider keeping a copy of the final dataset in the original format and, if possible, an additional copy in a non-proprietary open format. Migrating to an open standard later in the life of the dataset is possible, but risks data loss.
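Before archiving a project, it can help to list which files are still in a proprietary format so that an open copy can be made alongside the original. The following Python sketch scans a directory for the two proprietary formats named above and reports the suggested alternative for each. The mapping `OPEN_ALTERNATIVES` and the function name `find_proprietary_files` are assumptions for this example; extend the mapping to suit your discipline.

```python
from pathlib import Path

# Hypothetical mapping, limited to the two examples given in this guide.
OPEN_ALTERNATIVES = {
    ".docx": "plain text (.txt) or PDF/A",
    ".xlsx": "comma-separated (.csv) or tab-separated (.tsv)",
}


def find_proprietary_files(root: Path) -> list[tuple[Path, str]]:
    """Return (path, suggested open format) for each proprietary file under root."""
    flagged = []
    # rglob("*") walks the directory tree recursively; sorting gives stable output.
    for path in sorted(root.rglob("*")):
        # Compare extensions case-insensitively (.XLSX and .xlsx are treated alike).
        suggestion = OPEN_ALTERNATIVES.get(path.suffix.lower())
        if path.is_file() and suggestion:
            flagged.append((path, suggestion))
    return flagged
```

This only reports candidates; it does not convert anything. The conversion itself is best done in the software that produced the file, followed by a check that no data was lost.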
Data Services administers VTechData, a data repository for Virginia Tech research, and can support preservation migration of certain file formats.
For more information on short- and long-term storage, see our page on storage and security.
Resources
- File Formats Table (UK Data Archive)
- Spreadsheet Help (California Digital Library)
- Version Control and Authenticity (UK Data Archive)
- Version Control Video Lessons (Software Carpentry)