Virginia Tech Data Repository: Preparing Data for Deposit
Preparing Data for Deposit
Before depositing your data in the Virginia Tech Data Repository and requesting its publication to receive a digital object identifier (DOI), go through the following steps in preparation.
For the purposes of archiving and sharing in our repository, data includes, or can solely be, software or code.
Contact vtdatarepo-g@vt.edu for assistance in preparing your data for deposit; we are here to help!
Read the Virginia Tech Data Repository Deposit Agreement
You will be required to read and agree to this Deposit Agreement and to confirm that you understand your roles and responsibilities as the Depositor.
For a quick comparison between a dataset that includes the documentation you must/should provide and one that lacks good documentation, please see our Poor vs. Better Documentation guide.
Can you deposit and openly share your data?
Virginia Tech researchers should deposit and openly share (publish) only data for which they have the rights to do so, and only when doing so does not violate laws or ethics.
More information on this requirement is given within the Virginia Tech Data Repository Deposit Agreement that you are required to agree to upon data deposit.
Files to Include in your Data Deposit
Be sure that you have on hand all relevant data and documentation for upload and deposit into the Virginia Tech Data Repository.
In depositing your data for publication there are a few questions you should ask yourself:
Who do I expect to make use of my data? Who is the intended audience?
For what purpose am I depositing data? What files containing data and documentation are needed to fit this purpose?
Should my deposit include code or software used to process or generate the data? Note that the Virginia Tech Data Repository's current figshare for institutions platform allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.
Prior to publication you are able to upload files for your dataset in one or more sessions. However, any additional files to be added to a published dataset will necessitate a new version of this dataset and possibly a new DOI.
If the size of your dataset exceeds 25 GB, please contact vtdatarepo-g@vt.edu to inform the Virginia Tech Data Repository administrators. Preserving datasets larger than this requires additional attention from the administrators.
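One way to check a dataset folder against this threshold before uploading is sketched below. It is only an illustration: the function names are hypothetical, and it assumes that "25 GB" means 25 * 10**9 bytes, which may not match how the repository measures size.

```python
from pathlib import Path

# Assumption: 25 GB is read as 25 * 10**9 bytes; the repository may count differently.
SIZE_LIMIT_BYTES = 25 * 10**9


def dataset_size_bytes(folder: str) -> int:
    """Return the combined size in bytes of every regular file under folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def exceeds_limit(folder: str, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True when the folder's total size is strictly greater than limit."""
    return dataset_size_bytes(folder) > limit
```

If `exceeds_limit("my_dataset")` returns True for your (hypothetically named) folder, that is the cue to contact the administrators before depositing.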
Formatting your Data
Convert your files into more open, community-adopted, or widely usable formats when feasible. For example, comma-separated values (CSV) files are more usable and more easily imported into computing applications than Excel spreadsheets.
Your data may have been created and analyzed in a format used by, and most useful for, a small research community. Consider providing the research data in two formats: the format used by your research community and a format openly accessible to a wider user base.
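As a rough sketch of producing a second, more widely readable copy, the function below rewrites a tab-delimited text export as a UTF-8 CSV file using only the Python standard library. The function name and the tab-delimited input are assumptions for illustration; reading Excel workbooks directly needs a third-party library and is not shown here.

```python
import csv


def delimited_to_csv(src_path: str, dst_path: str, delimiter: str = "\t") -> int:
    """Copy rows from a delimited text file into a UTF-8 CSV file.

    Returns the number of rows written, header row included.
    """
    rows_written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst)  # default dialect: comma-separated, minimal quoting
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Fields that themselves contain commas are quoted by the writer, so the values survive the conversion unchanged.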
Organizing your Data
If your dataset contains a large number of files (roughly more than ten), consider aggregating subsets of them into .zip, .tar or other archive file formats before uploading them. This will ease upload of the dataset for you and download of parts of the dataset for others. These subsets could be sub-folders within a project dataset folder, for example.
These archive file formats can also preserve a folder/directory structure, and this can provide valuable context for data you share. Note that the figshare for institutions platform does not allow folders to be created or uploaded.
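The aggregation described above could be scripted along these lines. This is a minimal sketch, assuming one .zip per immediate sub-folder is the grouping you want; the function name is hypothetical, and `output_dir` should sit outside `dataset_dir` so that the archives are not themselves picked up as a sub-folder.

```python
import shutil
from pathlib import Path


def archive_subfolders(dataset_dir: str, output_dir: str) -> list[str]:
    """Create one .zip per immediate sub-folder of dataset_dir.

    Each archive keeps the sub-folder name as its top-level directory, so the
    original folder structure is restored on extraction.
    """
    source = Path(dataset_dir)
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    archives = []
    for sub in sorted(p for p in source.iterdir() if p.is_dir()):
        # root_dir/base_dir make paths inside the zip start with "<sub.name>/".
        archive = shutil.make_archive(
            str(target / sub.name), "zip", root_dir=str(source), base_dir=sub.name
        )
        archives.append(archive)
    return archives
```

Files that sit directly in `dataset_dir`, rather than in a sub-folder, are left alone and can be uploaded individually.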
Documenting Your Data
The following fields are required for publishing on the figshare for institutions platform. Depositors are strongly encouraged to fill in the optional fields as well.
- Title of dataset
- Authors of dataset - First and last names are needed; authors can be listed in whatever order the depositor chooses
- Corresponding Author Name - contact person for technical questions related to this dataset
- Categories - Fields of Research (FoR) codes for defining and grouping research
- Item Type - dataset
- Keywords - for improving discoverability of the dataset
- License - CC0 Public Domain Dedication - see “Appending a License to your Deposited Data” below
- Description
We will create a README file from your input to these fields. This README file will be included as a separate file in your published dataset.
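Before starting the deposit form, it can help to confirm that you have something prepared for every required field. The sketch below does that for a plain dictionary of your notes; the dictionary keys simply mirror the list above and are not figshare API field names, and the function name is hypothetical.

```python
# Required field names as listed above; these keys are illustrative only
# and are NOT figshare API field names.
REQUIRED_FIELDS = (
    "Title",
    "Authors",
    "Corresponding Author Name",
    "Categories",
    "Item Type",
    "Keywords",
    "License",
    "Description",
)


def missing_required_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty in metadata."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # Treat None, blank strings, and empty lists as missing.
        is_blank_text = isinstance(value, str) and not value.strip()
        if value is None or is_blank_text or value == []:
            missing.append(field)
    return missing
```

An empty result means every required field has at least some content; it says nothing about whether that content is adequate documentation.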
In developing your documentation (including your dataset Description and the Files/Folders in Dataset and their Descriptions) consider how your data should be documented for your intended audience.
Are there any standards for metadata that are appropriate to this research community?
What information and context will need to be added so that this audience can understand or make use of your data?
If you were giving a colleague your data to use in another project and didn’t want them asking you questions every hour about it, what information would you need to give them?
This documentation could include settings on instrumentation or parameter settings for computer models that created or analyzed the data, references to publications or technical manuals that help to describe how the data were created, and computational libraries used in data generation or analysis. Think broadly!
Not all of this documentation needs to be included in the description fields but much of it can be.
References to documentation available online elsewhere can be linked from the Resource and References fields. Note that the Resource Title and Resource DOI fields must both be filled or both be empty. If only one of these fields is filled, you will not be able to request publication of the dataset.
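The both-or-neither rule for the Resource Title and Resource DOI fields amounts to a simple check, sketched here. The function name is hypothetical, and treating whitespace-only input as empty is an assumption about how the form behaves.

```python
def resource_fields_consistent(resource_title: str, resource_doi: str) -> bool:
    """Return True when both fields are filled or both are empty.

    Assumption: whitespace-only values count as empty.
    """
    return bool(resource_title.strip()) == bool(resource_doi.strip())
```

For instance, a title with no DOI returns False, which corresponds to the case in which publication cannot be requested.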
Uploading and Requesting Publication of Your Dataset
The Virginia Tech Data Repository uses the figshare for institutions platform for upload and sharing of research datasets.
figshare provides a great deal of well-written guidance on how to work within the system, including on how to upload and publish your data. Please note that actions described in this guidance will take place on the Virginia Tech Data Repository as opposed to figshare.com.
Note that the Virginia Tech Data Repository's current figshare for institutions platform allows for integration of GitLab, GitHub and Bitbucket accounts for ease of code or software inclusion.
If you have any questions on how to work with the figshare for institutions platform, please contact us at vtdatarepo-g@vt.edu for assistance.
Appending a License to your Deposited Data
Datasets in the Virginia Tech Data Repository will have a Creative Commons Public Domain Dedication (CC0) applied upon publication.
Applying CC0 to a dataset indicates to prospective re-users that they can distribute, remix, adapt, and build upon the material in any medium or format. This allows for maximal re-use of published datasets by both humans and machines.
All datasets published in the Virginia Tech Data Repository will have data citations associated with them. Users of any dataset will be expected to cite their use following academic norms.
For a detailed rationale on why the Virginia Tech Data Repository Administrators strongly encourage the use of CC0 for all published datasets, read this blog post from our colleagues at Dryad: “Why does Dryad use CC0?”.
Contact the Virginia Tech Data Repository Administrators to discuss other licensing options as needed.
Choosing a License for Published Software/Code
Depositors of software or code into the Virginia Tech Data Repository, either as part of a dataset or as the whole dataset, are required to include an open source license within the software or code.
Without an open source license, shared software or code is automatically under copyright and cannot legally be reused without the permission of the depositor. For maximal re-use of code, an open source license is required.
The Depositor can choose the open source license appropriate for desired re-use purposes. https://choosealicense.com/ provides a useful interface for making this decision. The Virginia Tech Data Repository Administrators recommend use of the MIT License or the BSD 3-Clause License.
Connecting Your Published Data to You (via ORCID)
The Virginia Tech Data Repository allows Depositors to associate themselves with an ORCID iD. Virginia Tech researchers are strongly encouraged to register with ORCID and to associate their ORCID iD with their account in the Virginia Tech Data Repository.
An ORCID iD is a persistent digital identifier that you own and control, and that distinguishes you from every other researcher. You can connect your iD with your professional information: affiliations, grants, publications, peer review, and more. You can use your iD to share your information with other systems, ensuring you get recognition for all your contributions, saving you time and hassle, and reducing the risk of errors.
To connect your Virginia Tech Data Repository (figshare for institutions) account with your ORCID account, follow the instructions in this figshare article. To learn more about ORCID and the VT-local systems in which an ORCID iD can be used, visit ORCID at VT.
Updating Your Published Dataset
Depositors sometimes need to update their published datasets. For example, when the Depositor’s manuscript is accepted, the Resource Title and DOI need to be updated in the README file and metadata fields; or the Depositor may need to update the Title field or the files in the dataset following the manuscript review process.
For details on updating a published dataset, please refer to https://help.figshare.com/article/how-to-edit-or-delete-my-data, ‘Public Items’.
After you make the necessary changes, check the “Publish changes” box and click “Save changes”. The updated dataset will then be reviewed by the curators and published as a new version.
Please note that the base DOI/citation of the published dataset provided to you via e-mail will always point to the latest version. For example, if the DOI provided to you was https://doi.org/10.7294/199043 and the dataset was updated and published as version 2, the base DOI https://doi.org/10.7294/199043 will then point to version 2. To access the older version (version 1 in this case), use https://doi.org/10.7294/199043.v1.
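The version suffix pattern in this example can be expressed as a small helper. This is a sketch based only on the `.v1` form shown above; the function name is hypothetical, and it does not check that the requested version actually exists.

```python
def versioned_doi(base_doi: str, version: int) -> str:
    """Append the ".v<N>" suffix shown above to a base DOI or DOI URL."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{base_doi.rstrip('/')}.v{version}"
```

Calling `versioned_doi("https://doi.org/10.7294/199043", 1)` reproduces the version 1 link from the example.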
Last modified 8 October 2024