NSF Data Management Plan (DMP) Guide: Data Basics


Defining Research Data

Research data is defined as "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings." This widely used definition is provided by the U.S. Office of Management and Budget (OMB Circular A-110, last revised in 1993 and amended in 1999). More recently, in February 2013, the Office of Science and Technology Policy (OSTP) in the Executive Office of the President published a memorandum entitled "Increasing Access to the Results of Federally Funded Scientific Research," which uses essentially the same definition.

OSTP's February 22, 2013 memo also made the Administration’s position on sharing research data clear: “digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” The 2013 memo reiterated that "research data does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens."

Providing an authoritative definition is helpful as a point of reference, but it is also challenging, because any definition of research data is likely to differ across disciplines and to depend on the context in which the question is asked. In some disciplines, research data is less easily defined than in STEM subjects: in the Arts and Humanities, research data can be both tangible and intangible, digital and physical, and heterogeneous and infinite. For the purposes of a specific research agreement, the investigator and institution should review the funding agency’s particular definition and expectations. If the institution has, or is developing, research policies, adopting a broader, practical definition (e.g., one that covers types of data, formats, and the data lifecycle) will offer a more comprehensive and useful tool.

Virginia Tech's Policy 13015 (last revised in 2001) applies to ownership and retention of research data, results, and related records, but it does not define research data.

Data Lifecycle

Data Characteristics

Examples of Data

Examples of data include "a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen" (see International Digital Curation Center Glossary).

Types of Data

Research data include sensor data, instrument data, geospatial data, collated or aggregated data, observational data, experimental data, simulation data, numerical data, tabular data, textual data, audio/visual data or any other representation of information that can be communicated and reinterpreted by an expert. The type of data will affect your decisions about file organization, back-up formats, and short- and long-term access.

Data Formats

Data formats are usually determined by how you gather and process data, i.e., the software used for data collection and analysis. Formats also depend on: 1) the norms and conventions of your discipline; 2) the options you choose for storing and sharing data; and 3) your preferred options for preservation. Some formats are better suited to long-term preservation than others; these are typically open, non-proprietary, and widely adopted.
