Virginia Tech Data Repository: Preparing Data for Deposit
Before depositing your data in the Virginia Tech Data Repository and requesting its publication to receive a digital object identifier (DOI), go through the following steps in preparation.
For the purposes of archiving and sharing in our repository, data includes, or can solely be, software or code.
Contact email@example.com for assistance in preparing your data for deposit; we are here to help!
You will be required to read and agree to this Deposit Agreement and to confirm that you understand your roles and responsibilities as the Depositor.
Can you deposit and openly share your data?
Virginia Tech researchers should deposit and openly share (publish) only data that they have the rights to share, and only when doing so does not violate laws or ethics.
More information on this requirement is given within the Virginia Tech Data Repository Deposit Agreement that you are required to agree to upon data deposit.
Files to Include in your Data Deposit
Be sure that you have on hand all relevant data and documentation for upload and deposit into the Virginia Tech Data Repository.
In depositing your data for publication there are a few questions you should ask yourself:
Who do I expect to make use of my data? Who is the intended audience?
For what purpose am I depositing data? What files containing data and documentation are needed to fit this purpose?
Should my deposit include code or software used to process or generate the data? Note that the figshare for institutions platform currently used by the Virginia Tech Data Repository allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.
Prior to publication you are able to upload files for your dataset in one or more sessions. However, any additional files to be added to a published dataset will necessitate a new version of this dataset and possibly a new DOI.
If the size of your dataset exceeds 25GB, please contact firstname.lastname@example.org to inform the Virginia Tech Data Repository administrators. Preserving datasets larger than this size requires more of their attention.
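To check whether a dataset crosses this threshold before upload, a short script can total the file sizes in the dataset folder. The following is a minimal sketch, not a repository tool: the function names are hypothetical, and it assumes "25GB" means 25 × 10^9 bytes, which may differ from how the repository counts.

```python
from pathlib import Path

# Assumption: "25GB" is read as 25 * 10**9 bytes; the repository may count differently.
SIZE_LIMIT_BYTES = 25 * 10**9


def dataset_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def exceeds_limit(root, limit=SIZE_LIMIT_BYTES):
    """Return True when the files under root total more than limit bytes."""
    return dataset_size_bytes(root) > limit
```

Calling exceeds_limit on your dataset folder returns True when the total is above the limit, in which case contact the repository administrators before uploading.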
Formatting your Data
Convert your files into more open, community-adopted or widely usable formats when feasible. As one broad example, comma-separated values (CSV) files are more usable and more easily imported into computing applications than Excel spreadsheets.
Your data may be created and analyzed in a format used by, and most useful for, a small research community. In that case, consider providing the research data in two formats: the format used by your research community and a format openly accessible to a wider user base.
Organizing your Data
If your dataset contains a large number of files (roughly more than ten), consider aggregating subsets of them into .zip, .tar or other archive file formats before uploading them. This will ease upload of the dataset for you and download of parts of the dataset for others. These subsets could be sub-folders within a project dataset folder, for example.
These archive file formats can also preserve a folder/directory structure, and this can provide valuable context for data you share. The figshare for institutions platform does not allow for the creation of or uploading of folders.
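One way to bundle sub-folders is sketched below using Python's standard shutil.make_archive function. The function name archive_subfolders and the folder layout are assumptions for illustration; adapt them to your own project. Each sub-folder becomes one .zip file, and the sub-folder name is kept as the top-level entry so the directory structure is restored on extraction.

```python
import shutil
from pathlib import Path


def archive_subfolders(dataset_dir, output_dir):
    """Create one .zip per immediate sub-folder of dataset_dir.

    Each archive keeps the sub-folder name as its top-level entry, so the
    folder structure is restored when the archive is extracted.
    """
    dataset_dir = Path(dataset_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for sub in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        archive = shutil.make_archive(
            base_name=str(output_dir / sub.name),  # ".zip" is appended for us
            format="zip",
            root_dir=str(dataset_dir),
            base_dir=sub.name,
        )
        archives.append(Path(archive))
    return archives
```

Files that sit directly in the top-level dataset folder are not archived by this sketch and would be uploaded individually. Keep output_dir outside dataset_dir so that archives are not swept into later archives.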
Documenting Your Data
Fields that are required for publishing on the figshare for institutions platform are as follows. Depositors are strongly encouraged to fill in the optional fields as well.
- Title of dataset
- Authors of dataset - First and last names are needed; authors can be ordered as the depositor wants
- Corresponding Author Name - contact person for technical questions related to this dataset
- Categories - FOR codes for defining and grouping research
- Item Type - dataset
- Keywords - for improving discoverability of the dataset
- License - CC0 Public Domain Dedication - see “Appending a License to your Deposited Data” below
We will create a README file from your input to these fields. This README file will be included as a separate file in your published dataset.
In developing your documentation (including your dataset Description and the Files/Folders in Dataset and their Descriptions) consider how your data should be documented for your intended audience.
Are there any standards for metadata that are appropriate to this research community?
What information and context will need to be added so that this audience can understand or make use of your data?
If you were giving a colleague your data to use in another project and didn’t want them asking you questions every hour about it, what information would you need to give them?
This documentation could include settings on instrumentation or parameter settings for computer models that created or analyzed the data, references to publications or technical manuals that help to describe how the data were created, and computational libraries used in data generation or analysis. Think broadly!
Not all of this documentation needs to be included in the description fields but much of it can be.
References to documentation available online elsewhere can be linked from the Resource and References fields.
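When drafting the list of files and folders with their descriptions, it can help to start from a generated inventory. The sketch below is one possible approach, not a repository requirement: the function name write_manifest and the column names are hypothetical. It writes a CSV listing every file in the dataset folder with an empty description column for you to fill in.

```python
import csv
from pathlib import Path


def write_manifest(dataset_dir, manifest_path):
    """Write a CSV listing every file under dataset_dir with a blank description.

    Returns the number of files listed.
    """
    dataset_dir = Path(dataset_dir)
    # Collect the files first so a manifest created inside dataset_dir
    # during this run is not listed in itself.
    files = sorted(p for p in dataset_dir.rglob("*") if p.is_file())
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes", "description"])
        for path in files:
            rel = path.relative_to(dataset_dir).as_posix()
            writer.writerow([rel, path.stat().st_size, ""])
    return len(files)
```

The resulting file can be opened in any spreadsheet or text editor, and the completed descriptions can then be pasted into the description fields or shared alongside the data.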
Uploading and Requesting Publication of Your Dataset
The Virginia Tech Data Repository uses the figshare for institutions platform for upload and sharing of research datasets.
figshare provides a great deal of well-written guidance on how to work within the system.
Note that the figshare for institutions platform currently used by the Virginia Tech Data Repository allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.
If you have any questions on how to work with the figshare for institutions platform, please contact us at email@example.com for assistance.
Appending a License to your Deposited Data
Datasets in the Virginia Tech Data Repository will have a Creative Commons Public Domain Dedication (CC0) applied upon publication.
Applying CC0 to a dataset indicates to prospective re-users that they can distribute, remix, adapt, and build upon the material in any medium or format. This allows for maximal re-use of published datasets by both humans and machines.
All datasets published in the Virginia Tech Data Repository will have data citations associated with them. Users of any dataset will be expected to cite their use following academic norms.
For a detailed rationale on why the Virginia Tech Data Repository Administrators strongly encourage the use of CC0 for all published datasets, read this blog post from our colleagues at Dryad, “Why does Dryad use CC0?”.
Contact the Virginia Tech Data Repository Administrators to discuss other licensing options as needed.
Choosing a License for Published Software/Code
Depositors of software or code into the Virginia Tech Data Repository either as part of a dataset or as the whole dataset are required to include an open source license within the software or code.
Without the inclusion of an open source license, shared software or code is automatically under copyright and is unable to be reused legally without the permission of the depositor. For maximal re-use of code an open source license is required.
The Depositor can choose the open source license appropriate for the desired re-use purposes. https://choosealicense.com/ provides a useful interface for making this decision. The Virginia Tech Data Repository Administrators recommend use of the MIT License or the BSD 3-Clause License.
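Before depositing code, you may want to confirm that a license file is actually present in the code folder. The sketch below checks for a few conventional license file names; the list of names and the function name find_license_files are assumptions for illustration, not a rule of the repository.

```python
from pathlib import Path

# Assumption: these conventional file names signal a license file.
# They are common practice, not a Virginia Tech Data Repository requirement.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying", "copying.txt"}


def find_license_files(code_dir):
    """Return license-like files found at the top level of code_dir."""
    return sorted(
        p
        for p in Path(code_dir).iterdir()
        if p.is_file() and p.name.lower() in LICENSE_NAMES
    )
```

An empty result suggests that a license file still needs to be added; the check does not verify that the file contains an open source license.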
Connecting Your Published Data to You (via ORCID)
The Virginia Tech Data Repository allows Depositors to associate themselves with an ORCID iD. Virginia Tech researchers are strongly encouraged to register with ORCID and to associate themselves with an ORCID iD within the Virginia Tech Data Repository.
An ORCID iD is a persistent digital identifier that you own and control, and that distinguishes you from every other researcher. You can connect your iD with your professional information (affiliations, grants, publications, peer review, and more). You can use your iD to share your information with other systems, ensuring you get recognition for all your contributions, saving you time and hassle, and reducing the risk of errors.
To learn more about ORCID and other VT-local systems that an ORCID iD can be used in, visit ORCID at VT.
Last modified 24 August 2022