Virginia Tech Data Repository: Preparing Data for Deposit

Preparing Data For Deposit

Before depositing your data in the Virginia Tech Data Repository and requesting its publication to receive a digital object identifier (DOI), go through the following steps in preparation.

For the purposes of archiving and sharing in a repository, data can include, or consist solely of, software or code.

Contact vtechdata@vt.edu for assistance in preparing your data for deposit; we are here to help!

Read the Virginia Tech Data Repository Deposit Agreement

You will be required to read and agree to this Deposit Agreement and to confirm that you understand your roles and responsibilities as the Depositor.

Can you deposit and openly share your data? 

Virginia Tech researchers should deposit and openly share (publish) only data that they have the rights to share, and only when doing so does not violate laws or ethics.

More information on this requirement is given within the Virginia Tech Data Repository Deposit Agreement that you are required to agree to upon data deposit.

Files to Include in your Data Deposit

Be sure that you have on hand all relevant data and documentation for upload and deposit into the Virginia Tech Data Repository.

In depositing your data for publication there are a few questions you should ask yourself:

Who do I expect to make use of my data? Who is the intended audience?

For what purpose am I depositing data? What files containing data and documentation are needed to fit this purpose?

Should my deposit include code or software used to process or generate the data? Note that the Virginia Tech Data Repository's current figshare for institutions platform allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.

Prior to publication you are able to upload files for your dataset in one or more sessions. However, any additional files to be added to a published dataset will necessitate a new version of this dataset and possibly a new DOI.

If the size of your dataset exceeds 25GB, please contact vtechdata@vt.edu to inform the Virginia Tech Data Repository administrators. Preserving datasets larger than this size requires more of their attention.
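As a quick check before upload, the total size of a dataset folder can be measured with a few lines of Python. This is a minimal sketch, not a repository tool: the function names are hypothetical, and it assumes that "25GB" means 25 × 10^9 bytes, which may differ from how the repository measures size.

```python
from pathlib import Path

# Assumption: "25GB" is read here as 25 * 10**9 bytes; the repository may
# measure it differently (for example as 25 * 2**30 bytes).
SIZE_LIMIT_BYTES = 25 * 10**9


def dataset_size_bytes(folder):
    """Return the total size in bytes of all regular files under `folder`."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def exceeds_limit(folder, limit=SIZE_LIMIT_BYTES):
    """Return True if the folder's total size is larger than `limit` bytes."""
    return dataset_size_bytes(folder) > limit
```

If `exceeds_limit("my_dataset")` returns True for your dataset folder (the folder name here is only a placeholder), contact vtechdata@vt.edu before uploading.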

Formatting your Data

Convert your files into more open, community-adopted or widely usable formats when feasible. For one broad example, comma-separated values (CSV) files are more usable and more easily importable into computing applications than Excel spreadsheets.

It is possible that your data are created and analyzed in a format used by and most useful for a small research community. Consider providing the research data in two formats: the format used by your research community and a format openly accessible to a wider user base.

Organizing your Data

If your dataset contains a large number of files (roughly more than ten), consider aggregating subsets of them into .zip, .tar or other archive file formats before uploading them. This will ease upload of the dataset for you and download of parts of the dataset for others. These subsets could be sub-folders within a project dataset folder, for example.

These archive file formats can also preserve a folder/directory structure, and this can provide valuable context for data you share. The figshare for institutions platform does not allow for the creation of or uploading of folders.
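One way to build such archives is sketched below in Python, using only the standard library. It writes one .zip file per immediate sub-folder of a project dataset folder and stores each file under its path relative to that project folder, so the folder structure is visible after extraction. The function name and folder layout are hypothetical, and the output folder is assumed to sit outside the dataset folder.

```python
import zipfile
from pathlib import Path


def zip_subfolders(dataset_dir, output_dir):
    """Create one .zip archive per immediate sub-folder of `dataset_dir`.

    Paths inside each archive are stored relative to `dataset_dir`, so the
    sub-folder name and its internal structure are preserved.
    Assumption: `output_dir` is located outside `dataset_dir`.
    Note: empty folders are not recorded, since only files are added.
    Returns the list of archive paths that were written.
    """
    dataset_dir = Path(dataset_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for subfolder in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        archive_path = output_dir / f"{subfolder.name}.zip"
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for file_path in sorted(subfolder.rglob("*")):
                if file_path.is_file():
                    arcname = file_path.relative_to(dataset_dir).as_posix()
                    zf.write(file_path, arcname)
        archives.append(archive_path)
    return archives
```

Files that sit directly in the project folder, such as a README, are left out of the archives in this sketch and can be uploaded individually.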

Documenting Your Data

We require that you create a README file for inclusion in your data deposit. Use the template at this link prior to depositing your data; it will ease the deposit process in the Virginia Tech Data Repository.

In developing this README file, consider how your data should be documented for your intended audience. 

Are there any standards for metadata that are appropriate to this research community?

What information and context will need to be added so that this audience can understand or make use of your data? 

If you were giving a colleague your data to use in another project and didn’t want them asking you questions every hour about it, what information would you need to give them?

Not all of this documentation needs to be included in the README file, but much of it can be. This documentation could include settings on instrumentation or parameter settings for computer models that created or analyzed the data, references to publications or technical manuals that help to describe how the data were created, and computational libraries used in data generation or analysis. Think broadly!

 

At a minimum the README file must include a list of all the files in the dataset, and a brief description of each file.
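The file list can be drafted automatically and then completed by hand. The Python sketch below walks a dataset folder and produces one line per file, pairing each relative path with a brief description. The function name and the "path: description" line format are assumptions made for illustration; they are not taken from the README template.

```python
from pathlib import Path


def file_listing(dataset_dir, descriptions=None):
    """Return README-style lines: one per file, with a brief description.

    `descriptions` maps a relative file path (POSIX style) to its description.
    Files without an entry get a visible placeholder so none is overlooked.
    The "path: description" layout is an illustrative assumption.
    """
    dataset_dir = Path(dataset_dir)
    descriptions = descriptions or {}
    relative_paths = sorted(
        p.relative_to(dataset_dir).as_posix()
        for p in dataset_dir.rglob("*")
        if p.is_file()
    )
    return [
        f"{rel}: {descriptions.get(rel, 'TODO: describe this file')}"
        for rel in relative_paths
    ]
```

Printing the returned lines and pasting them into the README gives a starting point; every TODO placeholder marks a file that still needs its description.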

 

Fields that are required for publishing on the figshare for institutions platform are as follows. Depositors are strongly encouraged to fill in the optional fields as well.

  • Title of dataset
  • Authors of dataset - first and last names are needed; authors can be ordered as the depositor wishes
  • Categories - FOR (Fields of Research) codes for defining and grouping research
  • Item Type - dataset
  • Keywords - for improving discoverability of the dataset
  • Description
  • License - CC0 Public Domain Dedication - see “Appending a License to your Deposited Data” below

Uploading and Requesting Publication of Your Dataset

 

The Virginia Tech Data Repository uses the figshare for institutions platform for upload and sharing of research datasets.

figshare provides a great deal of well-written guidance on how to work within the system, including on how to upload and publish your data. Note that the Virginia Tech Data Repository's current figshare for institutions platform allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.

If you have any questions on how to work with the figshare for institutions platform, please contact us at vtechdata@vt.edu for assistance.

Appending a License to your Deposited Data

Datasets in the Virginia Tech Data Repository will have a Creative Commons Public Domain Dedication (CC0) applied upon publication.

Applying CC0 to a dataset indicates to prospective re-users that they can distribute, remix, adapt, and build upon the material in any medium or format. This allows for maximal re-use of published datasets by both humans and machines.

All datasets published in the Virginia Tech Data Repository will have data citations associated with them. Users of any dataset will be expected to cite their use following academic norms.

For a detailed rationale on why the Virginia Tech Data Repository Administrators strongly encourage the use of CC0 for all published datasets, read this blog post from our colleagues at Dryad “Why does Dryad use CC0?”. 

Contact the Virginia Tech Data Repository Administrators to discuss other licensing options as needed.

Choosing a License for Published Software/Code

Depositors of software or code into the Virginia Tech Data Repository, either as part of a dataset or as the whole dataset, are required to include an open source license within the software or code.

Without the inclusion of an open source license, shared software or code is automatically under copyright and cannot legally be reused without the permission of the depositor. For maximal re-use of code an open source license is required.

The Depositor can choose the open source license appropriate for desired re-use purposes. https://choosealicense.com/ provides a useful interface for making this decision. The Virginia Tech Data Repository Administrators recommend use of the MIT License or the BSD 3-Clause License.

Connecting Your Published Data to You (via ORCID)

The Virginia Tech Data Repository allows Depositors to associate themselves with an ORCID iD. Virginia Tech researchers are strongly encouraged to register with ORCID and to associate themselves with an ORCID iD within the Virginia Tech Data Repository.

An ORCID iD is a persistent digital identifier that you own and control, and that distinguishes you from every other researcher. You can connect your iD with your professional information, including affiliations, grants, publications, peer review, and more. You can use your iD to share your information with other systems, ensuring you get recognition for all your contributions, saving you time and hassle, and reducing the risk of errors.

To learn more about ORCID and other VT-local systems that an ORCID iD can be used in, visit ORCID at VT.

Last modified 6 July 2021