Finding Data: Introduction

What is Data?

Data are informational values (numbers, text, images, ...) that are used in research, business, policy, and other areas, usually with additional context.

Tip: For consultations on how to design and use statistical analysis, check out the Statistical Applications and Innovations Group (SAIG) at Virginia Tech, via the Department of Statistics. SAIG's About page states, "Our mission is to provide statistical advice, analysis, and education to Virginia Tech researchers by offering individual collaboration meetings, walk-in consulting, educational short courses, and support for interdisciplinary research projects."

Use this guide to:

Image of a person's hand holding a light blue and white sphere with 1s and 0s on it in black type - representing data (on a medium blue background)

image by geralt via Pixabay, released to the public domain (CC0)

What are: datasets, data repositories, open data, and data citations?

A dataset is a defined, intentional collection of data points (informational values) with at least minimal description. For example, if information were collected about how many times students used computers around campus, the resulting dataset might be a spreadsheet with column labels such as start time, end time, duration, day of the week, date, and location, along with the values recorded for each student use.
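To make the spreadsheet example concrete, here is a minimal sketch of what such a dataset could look like when written as a CSV file. The column labels come from the example above; the two sample rows, the location names, and the helper names make_row and to_csv are invented for illustration and do not come from any real study.

```python
import csv
import io
from datetime import datetime

# Column labels taken from the example in the text.
COLUMNS = ["start time", "end time", "duration", "day of the week", "date", "location"]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def make_row(start, end, location):
    """Build one row (one student use) from start and end timestamps."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    return {
        "start time": start_dt.strftime("%H:%M"),
        "end time": end_dt.strftime("%H:%M"),
        # Duration in whole minutes, derived from the two timestamps.
        "duration": int((end_dt - start_dt).total_seconds() // 60),
        "day of the week": DAY_NAMES[start_dt.weekday()],
        "date": start_dt.date().isoformat(),
        "location": location,
    }


def to_csv(rows):
    """Serialize the rows as CSV text with a header line of column labels."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample values, for illustration only.
rows = [
    make_row("2024-03-04T09:15", "2024-03-04T10:00", "Library, floor 2"),
    make_row("2024-03-05T14:30", "2024-03-05T14:50", "Student center"),
]

print(to_csv(rows))
```

The header line supplies the "minimal description" mentioned above: without the column labels, the values in each row would be difficult for anyone else to interpret.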

A curated dataset includes further information, such as a README or description document that describes the purpose for collecting the information, why and how it was collected in this way, and how the data were analyzed or used after collection.

A data repository houses datasets. Data repositories may be openly accessible or restricted; they may have an open submission policy, or they may be focused on a particular topic, purpose, or community; they may preserve datasets for the long term; and they may provide additional tools or resources.

Open data, increasingly provided by researchers and organizations, refers to datasets that are shared in some way with the general public. At a minimum, a detailed description of a dataset is provided, and some datasets may be viewable, downloadable, and/or shared in a way that allows for their re-use by others. Sometimes ethical, legal, or other restrictions prevent sharing of data, but open data initiatives are making more and more information available for all of us to learn from or use. 

Data citations are used to reference a dataset that has been published or described and made publicly available, to credit the source of the dataset.

  • When referencing, re-using, or citing a dataset, best practice includes also citing the original researchers' publications that discuss the original study's analysis, results, and conclusions drawn from the dataset.
  • Visit the 'Citing Datasets' page for details on how and when to cite a dataset.