This part of the guide identifies key collections of data that can be exported and analyzed in analytical software. In general, you can select microdata based on narrative descriptions and data documentation. See the aggregate data tab for data in tables, which often can be exported and combined for analysis.
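If the difference between microdata and aggregate data is unfamiliar, the following minimal Python sketch may help. It assumes a tiny, invented set of respondent-level records (the field names `respondent_id`, `state`, and `voted` are hypothetical and do not come from any dataset in this guide) and shows how record-level microdata can be collapsed into the kind of table you would find under the aggregate data tab.

```python
from collections import Counter

# Hypothetical microdata: one record per survey respondent.
# Field names and values are invented for illustration only.
microdata = [
    {"respondent_id": 1, "state": "VA", "voted": True},
    {"respondent_id": 2, "state": "VA", "voted": False},
    {"respondent_id": 3, "state": "NC", "voted": True},
    {"respondent_id": 4, "state": "VA", "voted": True},
]


def aggregate_by(records, key):
    """Collapse individual records into counts per category."""
    return Counter(record[key] for record in records)


def share_true(records, group_key, flag_key):
    """Share of records in each group where flag_key is true."""
    totals = Counter(r[group_key] for r in records)
    hits = Counter(r[group_key] for r in records if r[flag_key])
    return {group: hits[group] / totals[group] for group in totals}


# Aggregate data: one row per state instead of one row per person.
respondents_per_state = aggregate_by(microdata, "state")
turnout_by_state = share_true(microdata, "state", "voted")
```

The aggregation only runs one way: you can build the state-level table from the individual records, but you cannot recover the individual records from the table. That is why microdata are usually preferable when you need to define your own groupings.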
Data can be hard to find and to work with. The VT Libraries have a team of informatics consultants to help you with methodology, interpretation, visualization, and management/curation of your research data. Some tabs in this guide are maintained by members of the library's data service group.
Overview: Finding data can be tricky
It can be hard to describe, much less find, the data you want. The VT library's Discovery Search does not distinguish data from other kinds of information: it won't help you identify aggregate numerical data in tables or datasets of microdata analyzed with sophisticated quantitative or qualitative software.
a tool to discover the scope of data from around the world and, often more important, to point you to data providers who might have more for you
and as a rich and varied source of numerical information in its own right, with some interesting visualization options
It's tempting to start with search engines to find numerical and geospatial data. They are a good start for finding what data exist about something, but they may not be as effective if you're interested in the variety of data available about a place.
Their reliability depends on how dataset providers comply with technical standards for describing data (ie, metadata). As with any search, your search terms have to match the words that the people who compiled the data put in the titles or headings of tables.
Only a few of our data providers (notably ICPSR, Roper Center, Harvard's Dataverse network) permit searching for data by variable name (which varies with the researcher).
Often the best way to start finding data is by asking yourself what kind of agency or business or research institution might have an interest in counting the people, things, behaviors (or whatever) that you want data on. Go to the source and dig around. Some words/labels that can signal where data lurk: data, dataset, indicators, repository, statistics, archive, visualization.
Compilations of data, such as traditional statistical abstracts or PolicyMap, usually group aggregate data according to predefined themes that may reflect the goals, practices, and jargon of the compilers -- not always yours.
News articles will often identify researchers who have not yet published their data in academic sources.
When you find a relevant table of data, use it as a discovery tool: scrutinize source notes and other annotations and trace back to the providers for further -- perhaps more recent -- data.
Sometimes the words in notes, titles, and column headings can be effective search terms.
Statista combines market research and analysis, content and information design, ecommerce, global consumer surveys, and digital and consumer market outlooks. It consolidates statistical data on over 80,000 topics from more than 22,500 sources and makes them available on four platforms: German, English, French, and Spanish.
IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community context. Data and services are available free of charge.
Eurostat, the statistical office of the European Union, aims to compile and make available statistics at European level that enable comparisons between countries and regions. Eurostat coordinates the statistical activities of the institutions and bodies of the Union, in particular with a view to ensuring consistency and quality of the data and minimising reporting burden.
Create downloadable, country-level reports drawing on 200 indicators to track how different societies perform along six indices of social development. The indices allow estimating the effects of social development for a large range of countries on indicators like economic growth, human development, and governance.
Data about policymakers rather than citizens. CAP monitors policy processes by tracking the actions that governments take in response to the challenges they face, classifying policy activities into a single, universal and consistent coding scheme. These activities can take many different forms, including debating a problem, delivering speeches (eg, the Queen’s speech in the United Kingdom), holding hearings, introducing or enacting laws (eg, Bills and Public Laws in the United States), or issuing judicial rulings (eg, rulings from the European Court of Justice).
Free resource. Election Passport provides free access to a rich dataset of constituency election results in over 100 countries and territories throughout the world. The data are unusually complete, including details that are otherwise difficult to locate: votes won by very small parties and independents and, frequently, candidate names. Additional elections are regularly added.
Free resource. Global Elections Database (formerly known as the Constituency-Level Elections Dataset, 2007) provides information on the results of both national and subnational elections around the world. These data are presented at two levels of analysis, allowing users to quickly identify the results of elections within a country as a whole or within particular constituencies or districts of a country. All parties are included in the database regardless of the number of votes that they won. The data are based on countries' official election results and have been amassed from various government institutions. The data are accessible in multiple formats: spreadsheets; tables; GIS maps.
Access to these data requires you to create a free, personal account, which then allows you to save customized datasets for future reference and to receive automatic updates to the data when they become available.
Free resource. Constituency-Level Elections Archive (CLEA) is a repository of detailed election results at the constituency level for lower house legislative elections from around the world. Its purpose is to preserve and consolidate these valuable data in one comprehensive and reliable resource that is ready for analysis and publicly available at no cost for research, education, and policy-making.
Searchable archive of datasets and data-related articles. Part of international "Dataverse Project," which is both a network of data repositories and a project to develop open source research data repository software.
ICPSR Bibliography of Data-related Literature is a freely-available, searchable database of citations to published and unpublished scholarly works. The database currently contains over 93,000 citations, with hundreds more added every month. Each citation has two-way links: out to the publication and into ICPSR’s study catalog, providing access to the data being analyzed in the publications. Because of these linkages, the Bibliography facilitates data discovery and literature searches by social scientists, students, librarians, journalists, policymakers, and funding agencies.
If you prefer to start by browsing by topic or place:
"The mission of Our World in Data is to make data and research on the world’s largest problems understandable and accessible." Leverage interpretative essays and data visualizations of very long-run trends in policy problems to inform your own (re)search. Broadly organized around the UN Sustainable Development Goals, themes include health, food provision, the growth and distribution of incomes, violence, rights, wars, culture, energy use, education, and environmental changes.
Produced by the Oxford Martin Programme on Global Development at the University of Oxford.
The ISSP is a cross-national survey program conducting annual surveys in a broad group of countries. The survey asks questions on a variety of topics. You can download full datasets or analyze online through the GESIS Archive.
Gateway to public sector information available on public data portals across European countries. Organized by topic. Also provides information regarding the provision of data and the benefits of re-using data.
re3data.org is a global registry of research data repositories that covers research data repositories from different academic disciplines. It presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions. The registry is funded by the German Research Foundation (DFG). Offers an interesting visual subject browse.
Provides data and analysis on the issues, attitudes and trends shaping the United States and the world. Datasets in seven areas: U.S. Politics & Policy; Journalism & Media; Internet, Science & Tech; Religion & Public Life; Hispanic Trends; Global Attitudes & Trends; and Social & Demographic Trends. Create a free account to download Pew datasets.
"Focusing on human interactions in the environment, SEDAC has as its mission to develop and operate applications that support the integration of socioeconomic and earth science data and to serve as an 'Information Gateway' between earth sciences and social sciences." Search, browse, download data, maps, other tools and resources. Hosted by CIESIN at Columbia University.
Global directory of academic open-access repositories (some will include datasets, but will also have many other information formats, such as articles, white papers, and more). Browse to repositories by global region and country. Search is limited to repository names, not to repository contents.
A handy point of departure, this libguide from UC San Diego identifies key data providers and major statistical publications from US federal agencies. (Also includes some databases restricted to UCSD.)
ICPSR is a large, searchable repository for social and behavioral science research datasets, covering political science, sociology, economics, demography, and interdisciplinary areas. ICPSR also curates and distributes data from public sources (like federal statistical agencies) with many value-added features, maintaining several topical archives in the areas of demography, criminal justice, mental health, aging, child care, and education.
Virginia Tech's institutional membership entitles members of the VT community both to download datasets and to deposit their research data for permanent curation and access; create a free ICPSR "My data" account and log in with it in order to download data.
Some datasets have access/use restrictions that may require approval by VT's institutional review board (among other offices) and by ICPSR prior to access; in some cases researchers are required to work only in secure "data enclaves." For highly sensitive data, such approvals can add months to the beginning of the research timeline. Restricted data at ICPSR are conspicuously marked. (These restrictions are to protect research respondents' identities in areas like drug use, sexuality, and criminal behavior.)
ICPSR's site includes instructional and best-practices materials; more are available on ICPSR's YouTube channel.
VT faculty and students qualify for discounts on ICPSR's summer training program of workshops and courses on social science research methods.
Searchable portal brings together contents of several major repositories of social data. Dataverse is a web application for sharing, preserving, citing, exploring, and analyzing research data. It facilitates making data available to others, and allows you to replicate others' work. Each Dataverse repository hosts multiple dataverses. Each dataverse contains datasets or other dataverses, and each dataset contains descriptive metadata and data files (including documentation and code that accompanies the data). Part of Harvard Dataverse.
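The nesting described above (a dataverse holds datasets or other dataverses, and each dataset holds metadata and files) can be pictured with a short Python sketch. This is only an illustrative model under stated assumptions: the class names, fields, and the `all_datasets` helper are hypothetical and are not the Dataverse software's own data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """A file in a dataset: data, documentation, or code."""
    name: str


@dataclass
class Dataset:
    """A dataset carries descriptive metadata plus its files."""
    title: str
    metadata: dict = field(default_factory=dict)
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Dataverse:
    """A dataverse holds datasets and, optionally, other dataverses."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    children: List["Dataverse"] = field(default_factory=list)

    def all_datasets(self):
        """Collect datasets from this dataverse and every nested one."""
        found = list(self.datasets)
        for child in self.children:
            found.extend(child.all_datasets())
        return found


# Invented example: a lab's dataverse nested inside a university's.
lab = Dataverse(
    name="Example Lab",
    datasets=[
        Dataset(
            title="Survey wave 1",
            metadata={"subject": "Social Sciences"},
            files=[DataFile("wave1.csv"), DataFile("codebook.pdf")],
        )
    ],
)
root = Dataverse(
    name="Example University",
    datasets=[Dataset(title="Campus climate study")],
    children=[lab],
)
titles = [d.title for d in root.all_datasets()]
```

In practice this means a search at the top level of a repository can surface datasets that live several dataverses down, so it is worth noting which dataverse a result belongs to when you cite or revisit it.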
Based at Syracuse University, QDR selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry. The repository develops and publicizes common standards and methodologically informed practices for these activities, as well as for the reuse and citation of qualitative data.
VT Libraries provides Tech's institutional membership in QDR. (In fact, Virginia Tech is the QDR's very first institutional member.)
Virginia Tech’s institutional data repository is a platform for depositing and providing public access to datasets and related research products created by Virginia Tech faculty, staff, and students. Other research universities may offer similar repositories.
GeoData is a discovery tool for geospatial data, primarily for Virginia, comprising not only datasets purchased as a part of the library collection, but also data created, collected, or digitized from printed maps at Virginia Tech. GeoData is an implementation of the inter-institutional GeoBlacklight collaboration, curated by the VT Libraries' Geospatial Data Consultant.
The home of the US government’s open data. Here you will find data compiled by federal agencies, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more. Browse by topic from the landing page or access the searchable data catalog from the Data menu at the top of the page. Files may be in TXT, HTML, XLS, CSV, or other formats.
See the Federal Committee on Statistical Methodology (FCSM) site for technical standards and guidelines behind the federal data.
The US Census Bureau Data Repository preserves and disseminates survey instruments, specifications, data dictionaries, codebooks, and other materials provided by the US Census Bureau. ICPSR, the host of this data repository, has also listed additional Census-related data collections from its larger holdings.
ResearchDataGov is a web portal and application system for discovering and requesting restricted-access microdata from various US federal statistical agencies. These data must be accessed and used only within a Federal Statistical Research Data Center -- Virginia Tech has an arrangement for VT faculty researchers to apply to use the FSRDC at Georgetown University. Access to these data will take several months, not moments: it requires application, then approval by the federal agency(ies) that generated the datasets -- as well as the Georgetown FSRDC administrator; Tech's Institute for Society, Culture, and Environment; and other campus offices. See VT application procedures at ISCE.
Public-use datasets have been created from the studies listed at RDC and, in many cases, these may be adequate for your research. The public-use data can often be found on federal websites or in the catalogs of repositories such as ICPSR. Public-use data from the Census Bureau, for example, can be found in the United States Census Bureau Data Repository, housed at ICPSR.
ICPSR developed ResearchDataGov with support and guidance from the Census Bureau, the Office of Management and Budget, and the Interagency Council on Statistical Policy.
Project to safeguard US federal agencies' data and their associated user interfaces to assure that reliable copies remain available to researchers. Initial concentration has been in environmental and climate data.
Some government digital data were distributed on disk or tape and not posted online, and some data that were available have moved or been taken down over the years. DataLumos is ICPSR's archive for valuable US government agencies' social data resources.
You can search for data sources and statistics resources in other VT Libraries' research guides. Here is a basic starter list. Sort and filter it in various ways and use its search box as a point of departure for more.
The Correlates of State Policy Project aims to compile, disseminate, and encourage the use of data relevant to US state policy research, tracking policy differences across and change over time in the 50 states. Comprises more than 900 variables from various sources, assembled into one large dataset. These cross-state and cross-time datasets are free and publicly available for academics, policy analysts, students, policymakers, and the research community. From the Institute for Public Policy and Social Research at Michigan State University.
Provided by the Roper Center for Public Opinion Research at Cornell University, Roper iPoll is the largest collection of public opinion poll data with results from 1935 to the present. Roper iPoll contains nearly 800,000 questions and over 23,000 datasets from both U.S. and international polling firms.
Surveys cover many topics, large and small, including social issues, politics, pop culture, international affairs, science, the environment, and much more. When available, results charts, demographic crosstabs and full datasets are provided for immediate download. Coverage is 1930s-present.
ANES has aimed since 1948 to provide data that support rich hypothesis testing about American voting behavior, maximize methodological excellence, measure many variables, and promote comparisons across people, contexts, and time. Includes a variable search tool, informational guides, and ANES study reports.
GSS gathers data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes. The GSS contains a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest, such as civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events. Hundreds of trends have been tracked since 1972. In addition, since the GSS adopted questions from earlier surveys, trends can be followed for up to 70 years. Datasets may be downloaded or analyzed online with GSS Data Explorer.
WorldPublicOpinion.org presents articles that summarize polling data and analyses from numerous sources, with links to questionnaires and results. Full datasets can be downloaded from http://drum.lib.umd.edu/handle/1903/10117. WorldPublicOpinion.org is an international collaborative project managed by the Program for Public Consultation at the University of Maryland.
This global study of changing values and their impact on social and political life consists of nationally representative surveys, conducted with a common questionnaire in almost 100 countries that together contain almost 90 percent of the world’s population. The WVS is the largest non-commercial, cross-national, time series investigation of human beliefs and values ever executed, currently including interviews with almost 400,000 respondents. Open access.
This project represents the largest, most careful and systematic comparative survey of attitudes and values toward politics, power, reform, democracy and citizens' political actions in Africa, Asia, Latin America and the Arabic region. It is based on a common module of questions contained in regional barometer surveys. Regional barometers:
Direct online access to over 25 years of public opinion survey data, collected by major survey research firms in Canada. In addition, CORA archives and provides access to the individual-level data files from most Canadian Election Studies since 1965. Search for survey questions and results frequencies from the data analysis page.
LAD provides a portal for Latin American datasets acquired, processed and archived by the Roper Center for Public Opinion Research. This valuable collection includes data from public opinion surveys conducted by the survey research community in Latin America and the Caribbean, including universities, institutes, individual scholars, private polling and public opinion research firms.
VT Libraries provide a limited number of computers loaded with specialized analytical applications that are not available through the internet (and may be expensive for individual purchase) -- along with consultants to help you gather, analyze, represent, and curate your data.
The DCL (Newman Library 3010, near the main elevators) provides Stata, RStudio, ArcGIS, and ERDAS, among others.
Open hours depend on staffing and can vary by semester; to schedule a consultation or reserve time on a workstation, email or stop by. Some remote access can be reserved at times when the PCs are not in use.
Tech's Statistical Applications and Innovations Group offers walk-in consulting hours in the Newman Library Data Transformation Lab (room 3010) four afternoons a week to address your quick questions or to help with research projects requiring less than 30 minutes of assistance. Walk-in hours are available only when classes are in session.
Tech's Technology-enhanced Learning and Online Strategies (TLOS) office transitioned 250 computers in its campus labs to virtual-only access via the VT VPN as a Covid protection measure. This page tells you how.
This list of applications on those TLOS lab machines shows which ones are available remotely. TLOS licenses for statistical software often expire every August, and updates may be delayed.
Data and analyses "on the causes, consequences and nature of Good Governance and the Quality of Government (QoG) -- that is, trustworthy, reliable, impartial, uncorrupted and competent government institutions. Our research addresses the questions of how to create and maintain high quality government institutions and how the quality of such institutions influences public policy and socio-economic conditions in a broader sense." Based at the University of Gothenburg, Sweden.
Provides analyses, index scores, and data documentation on two levels of governance
Regional Authority Index. RAI tracks regional authority on an annual basis from 1950 to 2010 in 81 countries. Datasets include annual scores for 231 regional governments/tiers and 81 countries for 1950-2010.
International Authority Index. MIA measures delegation and pooling of international authority for 76 international governmental organizations for 1950-2010. The MIA data are annual.
NCCS is a national clearinghouse of data on the nonprofit sector in the United States. This open-access version of NCCS Webster contains a variety of tools and reports to help you learn more about the nonprofit sector: find a nonprofit organization in your area, view IRS Form 990 images, analyze financial data on the sector, look at trends in charitable giving, or download data.
"GDELT Project is an open platform for research and analysis of global society" through mining news media from around the world, in 100 languages, since 1979. The "big data" project offers a free cloud-based analysis service, Google BigQuery, and -- for advanced users -- dataset downloads.
ATI is a new approach to connecting readers of qualitative and mixed-methods research to the underlying data, such as those curated by the Qualitative Data Repository at Syracuse University. ATI facilitates transparency by allowing scholars to “annotate” specific passages in an article. Annotations amplify the text and, when possible, include a link to one or more data sources underlying a claim; data sources are housed in a repository. (VT's institutional membership in the QDR is provided by the University Libraries.)