Virginia Tech Data Repository: Preparing Data for Deposit

Preparing Data for Deposit

Before depositing your data in the Virginia Tech Data Repository and requesting its publication to receive a digital object identifier (DOI), go through the following steps in preparation.

For the purposes of archiving and sharing research outputs in our repository, ‘data’ includes, or can solely be, software or code.

Contact vtdatarepo-g@vt.edu for assistance in preparing your data for deposit; we are here to help!

Read the Virginia Tech Data Repository Deposit Agreement

You will be required to read and agree to this Deposit Agreement and understand your roles and responsibilities as the Depositor.

For a quick comparison of the sort of documentation you should include with your dataset against a dataset that lacks good documentation, please see our Poor vs. Better Documentation guide.

Can you deposit and openly share your data?

Virginia Tech researchers should deposit and openly share (publish) only data that they have the rights to share, and sharing it must not violate any laws or ethical obligations.

More information on this requirement is given within the Virginia Tech Data Repository Deposit Agreement that you are required to agree to upon data deposit.

Refer to our Human Participants Data Guide for considerations when sharing data originating from human participants.

Files to Include in your Data Deposit

Be sure that you have on hand all relevant data and documentation for upload and deposit into the Virginia Tech Data Repository.

In depositing your data for publication, there are a few questions you should ask yourself:

Who do I expect to make use of my data? Who is the intended audience?

For what purpose am I depositing data? What files containing data and documentation are needed to fit this purpose? Do I need further documentation to make the data understandable to my intended audience, such as a data dictionary?

Should my deposit include code or software used to process or generate the data? Note that the Virginia Tech Data Repository’s current Figshare for Institutions platform allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.
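
Where a data dictionary would help, its skeleton can be generated from a file's column headers and then completed by hand. The following is a minimal sketch, assuming a UTF-8 CSV file with a header row. The function names and the dictionary's columns (variable, example_value, description, units) are illustrative choices, not a repository requirement.

```python
import csv

# Illustrative column set for the data dictionary; adapt as needed.
FIELDS = ["variable", "example_value", "description", "units"]


def data_dictionary_skeleton(csv_path):
    """Return one dict per column of csv_path, with blanks for the author to fill in."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])      # first row holds the column names
        first_row = next(reader, [])   # second row supplies example values
    rows = []
    for i, name in enumerate(header):
        rows.append({
            "variable": name,
            "example_value": first_row[i] if i < len(first_row) else "",
            "description": "",  # to be written by the depositor
            "units": "",        # to be written by the depositor
        })
    return rows


def write_data_dictionary(rows, out_path):
    """Write the skeleton rows to out_path as a CSV file."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

The description and units columns are left empty on purpose: only the depositor can say what each variable means.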

Prior to publication you are able to upload files for your dataset in one or more sessions. However, any additional files to be added to a published dataset will necessitate a new version of this dataset and possibly a new DOI.

If the size of your dataset exceeds 25 GB, please contact vtdatarepo-g@vt.edu to inform the Virginia Tech Data Repository administrators. Preserving datasets larger than this requires more administrator time and effort.
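
As a rough check before uploading, the total size of a dataset folder can be summed and compared against this threshold. The sketch below is an illustration only: the function names are hypothetical, and it assumes the threshold means 25 × 10^9 bytes, which the guidance does not specify.

```python
import os

# Assumption: the threshold is read as 25 * 10**9 bytes; the guidance does
# not say whether decimal or binary units are meant.
SIZE_THRESHOLD_BYTES = 25 * 10**9


def dataset_size_bytes(folder):
    """Return the combined size in bytes of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def exceeds_threshold(folder, threshold=SIZE_THRESHOLD_BYTES):
    """Return True if the folder's total size is larger than threshold."""
    return dataset_size_bytes(folder) > threshold
```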

Formatting your Data

Convert your files into more open, community-adopted, or widely usable formats when feasible. As one example, comma-separated values (CSV) files are more usable and more easily imported into computing applications than Excel spreadsheets.

It is possible that your data are created and analyzed in a format used by, and most useful for, a small research community. Consider providing the research data in two formats: the format used by your research community and a format openly accessible to a wider user base.

Organizing your Data

If your dataset contains a large number of files (roughly more than ten), consider aggregating subsets of them into .zip, .tar, or other archive file formats before uploading them. This will ease the upload of the dataset for you and the download of parts of the dataset for others. These subsets could be sub-folders within a project dataset folder, for example. Archive file formats can also preserve a folder/directory structure, which can provide valuable context for the data you share.
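
One way to build such archives is sketched below: one .zip file per immediate sub-folder of a project dataset folder, with the sub-folder name kept as the top-level directory inside each archive. The function name and the one-archive-per-sub-folder layout are assumptions for illustration; only Python's standard library is used.

```python
import shutil
from pathlib import Path


def archive_subfolders(dataset_dir, output_dir):
    """Create one .zip per immediate sub-folder of dataset_dir.

    Returns the paths of the archives created, ordered by sub-folder name.
    Files sitting directly in dataset_dir are left for individual upload.
    """
    dataset_dir = Path(dataset_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for sub in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        # root_dir and base_dir together keep the sub-folder name as the
        # top-level directory inside the archive, preserving the structure.
        archive = shutil.make_archive(
            base_name=str(output_dir / sub.name),
            format="zip",
            root_dir=str(dataset_dir),
            base_dir=sub.name,
        )
        archives.append(Path(archive))
    return archives
```

Keep output_dir outside dataset_dir so that the archives are not themselves picked up as a sub-folder.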

The Figshare for Institutions platform also allows for the uploading of folders that will keep your folder structure in the system. However, the use of archive file formats is recommended unless each individual file is large and the download of a ZIP or TAR file containing many large files would be difficult for your intended audience.

Documenting Your Data

The following fields are required for publishing on the Figshare for Institutions platform. Depositors are strongly encouraged to fill in the optional fields as well.

Title of dataset
Group - This is filled out automatically; do not change it.
Item Type - Dataset or Software
Authors of dataset - first and last names are needed; authors can be listed in the order the depositor chooses
Categories - FoR (Fields of Research) codes for defining and grouping research
Keywords - for improving discoverability of the dataset
Description
License - CC0 Public Domain Dedication - see “Appending a License to your Deposited Data” below
Corresponding Author Name - contact person for technical questions related to this dataset
Files/Folders in Dataset and their Descriptions

We will create a README file from your input to these fields. This README file will be included as a separate file in your published dataset.

In developing your documentation (including your dataset Description and the Files/Folders in Dataset and their Descriptions), consider how your data should be documented for your intended audience. If your dataset contains archive files (e.g., .zip or .tar files), be sure to include descriptions of their contents, including the archived directory structure.

Are there any standards for metadata that are appropriate to this research community?

What information and context will need to be added so that this audience can understand or make use of your data?

If you were giving a colleague your data to use in another project and didn’t want them asking you questions every hour about it, what information would you need to give them?

This documentation could include settings on instrumentation or parameter settings for computer models that created or analyzed the data, references to publications or technical manuals that help to describe how the data were created, and computational libraries used in data generation or analysis. Think broadly!

Not all of this documentation needs to be included in the description fields, but much of it can be.
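
For the archive file descriptions mentioned above, a listing of each archive's contents is a useful starting point. The sketch below, which assumes .zip archives and uses a hypothetical function name, returns one line per entry with its path and size; the lines can be pasted into a description and annotated by hand.

```python
import zipfile


def describe_zip(archive_path):
    """Return text lines, one per archive entry, giving its path and size."""
    lines = []
    with zipfile.ZipFile(archive_path) as zf:
        # Sort by path so entries from the same folder appear together.
        for info in sorted(zf.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                lines.append(f"{info.filename}  (folder)")
            else:
                lines.append(f"{info.filename}  {info.file_size} bytes")
    return lines
```

The sizes reported are the uncompressed file sizes, which is what a re-user will see after extracting the archive.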

Documentation available online elsewhere can be linked from the Related Materials field. This documentation could include manuscripts that the published dataset supports or references, or technical manuals about data or modeling output creation. Use of the Related Materials field is described by figshare in “How to upload and publish your data”, item 9 under ‘Add descriptive metadata’.

Uploading and Requesting Publication of Your Dataset

The Virginia Tech Data Repository uses the Figshare for Institutions platform for upload and sharing of research datasets.

Figshare provides a great deal of well-written guidance on how to work within the system, including on how to upload and publish your data. Please note that actions described in this guidance will take place on the Virginia Tech Data Repository as opposed to figshare.com.

Note that the Virginia Tech Data Repository’s current Figshare for Institutions platform allows for integration of GitLab, GitHub, and Bitbucket accounts for ease of code or software inclusion.

If you have any questions on how to work with the Figshare for Institutions platform, please contact us at vtdatarepo-g@vt.edu for assistance.

Appending a License to your Deposited Data

Datasets in the Virginia Tech Data Repository will have a Creative Commons Public Domain Dedication (CC0) applied upon publication.

Applying CC0 to a dataset indicates to prospective re-users that they can distribute, remix, adapt, and build upon the material in any medium or format. This allows for maximal re-use of published datasets by both humans and machines.

All datasets published in the Virginia Tech Data Repository will have data citations associated with them. Users of any dataset will be expected to cite their use following academic norms.

For a detailed rationale on why the Virginia Tech Data Repository Administrators strongly encourage the use of CC0 for all published datasets, read this blog post from our colleagues at Dryad, “Why does Dryad use CC0?”.

Contact the Virginia Tech Data Repository Administrators at vtdatarepo-g@vt.edu to discuss other licensing options as needed.

Choosing a License for Published Software/Code

Depositors of software or code into the Virginia Tech Data Repository, either as part of a dataset or as the whole dataset, are required to include an open source license within the software or code.

Without an open source license, shared software or code is automatically under copyright and cannot legally be reused without the permission of the depositor. For maximal re-use of code, an open source license is required.

The Depositor can choose the open source license appropriate for the desired re-use purposes. https://choosealicense.com/ provides a useful interface for making this decision. The Virginia Tech Data Repository Administrators recommend use of the MIT License or the BSD 3-Clause License.
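
Before depositing, it can help to confirm that a license file is actually present in the code folder. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the list of file names reflects common conventions rather than a repository rule.

```python
from pathlib import Path

# Assumption: common names for a license file, compared case-insensitively.
# This list is a convention, not a repository requirement.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying", "copying.txt"}


def find_license_files(code_dir):
    """Return sorted names of top-level files in code_dir that look like a license."""
    return sorted(
        p.name
        for p in Path(code_dir).iterdir()
        if p.is_file() and p.name.lower() in LICENSE_NAMES
    )
```

An empty result suggests that a license file still needs to be added at the top level of the code folder.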

Connecting Your Published Data to You (via ORCID)

The Virginia Tech Data Repository allows Depositors to associate themselves with an ORCID iD. Virginia Tech researchers are strongly encouraged to register with ORCID and to associate themselves with their ORCID iD within the Virginia Tech Data Repository.

An ORCID iD is a persistent digital identifier that you own and control, and that distinguishes you from every other researcher. You can connect your iD with your professional information (affiliations, grants, publications, peer review, and more). You can use your iD to share your information with other systems, ensuring you get recognition for all your contributions, saving you time and hassle, and reducing the risk of errors.

To connect your Virginia Tech Data Repository (Figshare for Institutions) account with your ORCID account, follow the instructions in this figshare article. To learn more about ORCID and other VT-local systems in which an ORCID iD can be used, visit ORCID at VT.

Updating Your Published Dataset

Depositors sometimes need to update their published datasets. For example, when the Depositor’s manuscript is accepted, its Resource Title and DOI need to be updated in the README file and metadata fields; or the Depositor may need to update the Title field or the files in the dataset following the manuscript review process.

For details on updating a published dataset, please refer to https://help.figshare.com/article/how-to-edit-or-delete-my-data, ‘Public Items’.

After you make the necessary changes, click “Submit for Review”. The updated dataset will then be reviewed by the curators and published as a new version.

Please note that the base DOI (and citation) of the published dataset provided to you via e-mail will always point to the latest version. For example, if the DOI provided to you was https://doi.org/10.7294/199043 and the dataset was then updated and published as version 2, the base DOI https://doi.org/10.7294/199043 will point to version 2. To access the older version (version 1 in this case), use https://doi.org/10.7294/199043.v1.
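
The version suffix in this example can be expressed as a small helper. This is a sketch, not an official tool: the function name is hypothetical, and it assumes that later versions follow the same pattern as the .v1 example (.v2, .v3, and so on).

```python
def versioned_doi(base_doi, version):
    """Append a version suffix to a base DOI or DOI URL.

    Assumption: every version uses the ".vN" pattern shown for version 1.
    """
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{base_doi}.v{version}"
```

For instance, calling versioned_doi("https://doi.org/10.7294/199043", 1) returns the version 1 address given in the example, while the base DOI itself continues to resolve to the latest version.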