NSF Data Management Plan (DMP) Guide: Writing DMPs

What to Include in Your DMP

The DMP content suggested by NSF's general requirements and by Directorate- and/or Division-level guidelines is similar in substance but differs in structure and wording. We therefore developed the following questions to help you address the five or six sections of the DMP template when drafting your data management plan. A list of these questions (the DMP Checklist) is also available for download.

DMP content suggested by NSF guidelines, followed by questions to consider

NSF guidelines: Types of data produced; Product of research; Expected data
Questions to consider:
  • What type of data will be produced in the research?
  • How much data will there be, and at what rate will it grow?
  • How and when will the data be collected? What software is required to produce, analyze, read, or view the data?
  • Will you use existing data? If so, where is it from and why was it chosen for this research?

NSF guidelines: Data and metadata standards; Data format; Data collected, formats and standards
Questions to consider:
  • What file formats will be used? Are they standard to your field and/or proprietary?
  • What file naming conventions will be used for your data?
  • What contextual details (metadata) will you generate (automatically and/or manually) for others to understand and use your data?
  • What metadata standard(s) will you select, and why? (e.g. accepted domain-local standard, widespread usage, software-generated)
  • Will you track versions of your data? Will you use any version control software in doing so?

NSF guidelines: Policies for access and sharing; Policies for data sharing and public access; Access to data, and data sharing practices and policies; Dissemination methods; Data dissemination and policies for public access, sharing and publication delays; Policies for access and sharing, and provisions for appropriate protection/privacy
Questions to consider:
  • Which of the data used or generated during the project will be shared?
  • When and how will you share these data?
  • Will there be any embargo periods for political/commercial/patent reasons?
  • Does the data have to be protected (e.g. access restricted to only certain authorized users) and, if so, what is your plan for protection?
  • Does sharing the data raise privacy, ethical, or confidentiality concerns and, if so, how will they be addressed?

NSF guidelines: Policies for re-use, redistribution; Policies and provisions for re-use, re-distribution and production of derivatives
Questions to consider:
  • Will you permit reuse, redistribution, or the creation of new tools, services, data, or products (derivatives), and will commercial use be allowed?
  • How will you make your data available for re-use?
  • Who is expected to use your data (in the near and long term)?
  • How should users of your shared data give you credit (e.g. through data citation or in the acknowledgement section of a publication)?
  • If your data are in an uncommon or proprietary format, will they be converted to a more common non-proprietary format for reuse?
  • Could a licensing approach, particularly a Creative Commons license, serve your goals for encouraging, simplifying, and setting parameters for reuse?

NSF guidelines: Plans for archiving and preservation; Archiving of data; Data storage and preservation; Data storage and preservation of access
Questions to consider:
  • Which of the data used or generated during the project will be stored or archived after the project?
  • Will you archive your data in data repositories? Will you deposit your data into Virginia Tech's institutional repository VTechWorks? (If you plan to deposit your data into VTechWorks, see the bottom of this page for the boilerplate language.) If depositing into one of many discipline-specific data repositories, which one will you use, and why?
  • If using a service outside of your project team or institution to archive your data, will there be a formal archiving agreement? (e.g. Co-PI's institution, discipline-specific data repositories, journal publishers)
  • What transformations will be necessary to prepare data for preservation? (e.g. data cleaning, anonymization, converting your data to more stable file formats)

NSF guidelines: Roles and responsibilities
Questions to consider:
  • What are the responsibilities of staff and investigators for managing the data generated during and after the project?
  • Who is responsible for each data management activity to ensure the DMP is reviewed and implemented?
  • Who will have responsibility for decisions about the data once all the original personnel are no longer associated with the project?
  • Is there a formal process for transferring responsibility for the data should a PI or co-PI leave his or her institution?

NSF guidelines: Period of data retention
Questions to consider:
  • Which of the data you plan to generate will have long-term value to others?
  • How long will you keep your data beyond the life of the project? (e.g. 3-5 years, 10-20 years)
  • Which datasets will be archived (preserved for the long term) and made available, and which will not?
  • Who will maintain your data for the long term?

NSF guidelines: Additional possible data management requirements; Cost of implementing the DMP
Questions to consider:
  • Who will manage and administer the preserved or archived data? Is additional specialist expertise (or training for existing staff) required?
  • Who will bear the cost associated with data preparation, management, and preservation?
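
Several of the questions above (file naming conventions, contextual metadata, and preparing data for preservation) can be made concrete with a small script. The following is a minimal Python sketch under stated assumptions; it is not part of NSF guidance or of any Virginia Tech service. The naming pattern (project_description_YYYYMMDD_vNN.ext), the function names build_filename, describe_file, and write_sidecar, and the metadata fields are all hypothetical choices made for illustration. Your field's conventions or a repository's requirements may differ.

```python
import hashlib
import json
import re
from datetime import date
from pathlib import Path


def build_filename(project: str, description: str, collected: date,
                   version: int, extension: str) -> str:
    """Return a name like 'soil-survey_site-a-moisture_20240315_v02.csv'.

    The pattern is an assumed convention for illustration only.
    """
    def slug(text: str) -> str:
        # Lowercase, then replace runs of non-alphanumeric characters with '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return (f"{slug(project)}_{slug(description)}_"
            f"{collected:%Y%m%d}_v{version:02d}.{ext}")


def describe_file(path: Path, creator: str, license_name: str) -> dict:
    """Build a small metadata record (hypothetical fields) for one data file."""
    return {
        "file_name": path.name,
        "size_bytes": path.stat().st_size,
        # A checksum lets later users confirm the file has not changed.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "creator": creator,
        "license": license_name,
    }


def write_sidecar(path: Path, record: dict) -> Path:
    """Write the record as JSON next to the data file and return its path."""
    sidecar = path.with_name(path.name + ".json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Example values are invented for demonstration.
    print(build_filename("Soil Survey", "Site A moisture",
                         date(2024, 3, 15), 2, ".CSV"))
```

Encoding the collection date and a zero-padded version number in the file name keeps files sortable and makes versions visible without extra software. The JSON sidecar is one simple way to keep contextual details next to the data; it does not replace a domain metadata standard.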

DMP Boilerplate Language for VTechWorks Users

If you plan to deposit your data into VTechData, the institutional research data repository at Virginia Tech, you can integrate the following language into your DMP. If you use this language, please contact the data consultants at dmpreview@vt.edu, who can help ensure you have a strong plan for managing your data.

"Datasets selected for sharing will be published and made accessible through VTechData (https://data.lib.vt.edu/) managed by the University Libraries at Virginia Tech. VTechData highlights and provides access to data generated at Virginia Tech. The system relies on item- and dataset-level metadata as the primary building block for data discovery, access, and reuse. Published datasets are to be made accessible for at least five years.

University Libraries’ personnel provide advice and some assistance on organizing, documenting and otherwise curating research data to improve its discoverability and reusability. The original and curated datasets are published according to best practices developed by the University Libraries and accepted by the disciplinary communities. VTechData also provides researchers persistent digital object identifiers and data citations for published datasets. Researchers can assign licenses according to their public data sharing interests."