
Preparing Data for Input into Geospatial Applications: Introduction

This guide helps Virginia Tech community members prepare their data before loading it into geospatial software or applications.

Introduction

Geospatial applications rely on accurate, well-structured, and properly formatted data. Poorly prepared data can lead to errors in mapping, analysis, and decision-making. This guide covers the key steps in preparing data for geospatial use, including data sourcing, cleaning, transformation, and validation.

When starting a GIS mapping or analysis project, a common challenge is assembling the data needed to answer the question or produce the desired output. The datasets you need may be available, but at different accuracy levels, or may include the required geographic features but lack a key attribute. These sorts of issues may make data unusable without additional preparation.

Creating a map begins with raw spatial data, typically lacking visual styling or symbology. At this initial stage, the data consists merely of geometric forms—such as points, lines, and polygons—without any graphical interpretation.

These geometric features are then processed through a cartographic workflow using Geographic Information Systems (GIS) software (e.g., ArcGIS, QGIS). During this stage, the data are symbolized and classified to represent real-world features—for example, points may become icons for statues, pharmacies, or museums. Standard cartographic color conventions are applied to enhance readability; for instance, brown is often used for buildings, green for parks or vegetation, and blue for water bodies. These visual cues help make the map more intuitive and user-friendly.

Maps are generally composed of multiple data layers, each representing a specific category of spatial features. These layers correspond to separate tables within a geospatial database and might include information about buildings, road networks, water features, or green spaces. Together, these layers are combined to form a complete and coherent map.

 

Source: https://docs.geoserver.geo-solutions.it

Coordinate Systems

Coordinate systems are the foundation for working with geographic data in GIS. They are essential for accurately placing features on a map and provide a way to locate features on the Earth’s surface using numbers, either latitude and longitude (in a geographic coordinate system) or X and Y values like meters or feet (in a projected coordinate system). Coordinate systems are important because they ensure everything lines up correctly on a map. Whether using a map projection to flatten the globe, geocoding an address into a location, or georeferencing an image to match real-world coordinates, having the right coordinate system makes your spatial data accurate and meaningful. There are two main types of coordinate systems:

  • Projected Coordinate System (PCS) projects the Earth’s surface onto a flat, 2D map using X and Y coordinates (usually in meters or feet). Use this system when you need accurate measurements, such as calculating land area or planning infrastructure. Most local GIS analysis works better in a projected coordinate system.
  • Geographic Coordinate System (GCS) uses a 3D model of the Earth and expresses locations as latitude and longitude in degrees. This system is ideal for global data but is poorly suited to measuring distance or area, because the ground length of a degree changes with location.
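To illustrate why degrees are awkward for measurement, the sketch below estimates the ground length of one degree of longitude at two latitudes and the distance between two latitude/longitude points. It assumes a perfectly spherical Earth with a mean radius of 6,371 km, the function names are illustrative only, and the two sample points are taken as approximate campus locations. For real analysis, reproject the data into a suitable projected coordinate system in your GIS software instead.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation, an assumption)


def degree_of_longitude_m(lat):
    """Approximate ground length, in meters, of one degree of longitude at a latitude."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(lat))


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude shrinks as you move away from the Equator.
at_equator = degree_of_longitude_m(0.0)
at_blacksburg = degree_of_longitude_m(37.23)

# Distance between two sample points (approximate campus locations).
distance = haversine_m(37.228835, -80.41919, 37.230317, -80.420013)

print(f"1 degree of longitude at the Equator: {at_equator:,.0f} m")
print(f"1 degree of longitude at 37.23 N:     {at_blacksburg:,.0f} m")
print(f"Distance between the sample points:   {distance:,.0f} m")
```

Because the same one-degree step covers a different ground distance at each latitude, lengths and areas computed directly from degrees are not comparable from place to place.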

Geocoding

Geocoding is the process of converting addresses into coordinates: latitude (Y) and longitude (X). It takes a text-based description of a location, such as an address or a place name, and returns geographic coordinates, usually latitude and longitude, that identify the location on the Earth’s surface. For example:

| Your Address | Address Standardization | Address Locator (Latitude, Longitude) |
|---|---|---|
| 560 Drillfield Dr, Blacksburg | 560 Drillfield Dr, Blacksburg, VA 24061 | 37.228835, -80.41919 |
| 310 Alumni Mall, 24061 | 310 Alumni Mall, Blacksburg, VA 24061 | 37.230317, -80.420013 |
| 190 W Campus Dr, VA | 290 College Ave, Blacksburg, VA 24060 | 37.229626, -80.417971 |

Each coordinate pair is then plotted as a point on a map.

Note: 

  • Lines of latitude run east-west and measure distance north or south of the Equator (0°), from 90°S to 90°N.
  • Lines of longitude run north-south and measure distance east or west of the Greenwich (prime) meridian (0°), from 180°W to 180°E.
  • Together, latitude and longitude make it possible to locate any point or place on the globe.

Credit: Illinois State University
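Geocoded output often arrives as text in "latitude, longitude" order, while many GIS formats (GeoJSON, for example) store points as X, Y, which means longitude first. The sketch below parses such a pair, checks that both values fall within valid ranges, and returns them in X, Y order. The function names are hypothetical. Note that a range check catches only some swapped pairs: a longitude between -90 and 90 still passes as a latitude.

```python
def parse_lat_lon(text):
    """Parse 'latitude, longitude' text into floats and check both ranges."""
    lat_text, lon_text = text.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")
    return lat, lon


def to_xy(lat, lon):
    """Return (x, y) = (longitude, latitude)."""
    return lon, lat


lat, lon = parse_lat_lon("37.228835, -80.41919")
x, y = to_xy(lat, lon)
print(f"latitude={lat}, longitude={lon} -> x={x}, y={y}")
```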

Map Projections

Map projections represent the 3-dimensional Earth on a 2-dimensional surface, using various mathematical models. Maps require coordinates to model the locations of places on Earth accurately. The World Geodetic System 1984 (WGS84) is the standard coordinate system for mapping and navigation worldwide, including GPS (Global Positioning System). However, other coordinate systems are used for specific regions or continents, for example:

| Coordinate System | EPSG Code | Use Case |
|---|---|---|
| WGS 84 (EPSG:4326) | 4326 | GPS, global mapping |
| UTM (Universal Transverse Mercator) | Varies by zone | Regional mapping; more accurate over smaller areas |
| NAD 83 (EPSG:4269) | 4269 | North America-specific |
| ED 50 (EPSG:4230) | 4230 | Europe-specific |

An EPSG code is a unique identifier for a Coordinate Reference System (CRS) used in mapping and GIS applications. The European Petroleum Survey Group (EPSG) developed these codes, now widely used to define spatial reference systems in GIS software. EPSG codes are important for:

  • Standardization: Ensures consistency when working with spatial data across different software and platforms.
  • Quick Identification: Easily reference a CRS without manually defining parameters.
  • Compatibility: Used in GIS tools like QGIS, ArcGIS, GDAL, and web mapping APIs.
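As an illustration of an EPSG code that varies by zone, the sketch below works out the UTM zone for a longitude and the matching EPSG code for the WGS 84 / UTM family (32601 to 32660 for the northern hemisphere, 32701 to 32760 for the southern). The function names are hypothetical. The sketch assumes the standard 6-degree zones, ignores the special zone boundaries around Norway and Svalbard, and does not check that the latitude lies within the band UTM is defined for (80°S to 84°N).

```python
import math


def utm_zone(lon):
    """Return the standard 6-degree UTM zone number (1 to 60) for a longitude."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")
    # Zone 1 starts at 180 W; longitude 180 E is folded into zone 60.
    return min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)


def wgs84_utm_epsg(lat, lon):
    """Return the EPSG code of the WGS 84 / UTM zone containing the point."""
    base = 32600 if lat >= 0 else 32700  # north vs south hemisphere series
    return base + utm_zone(lon)


# A point on the Virginia Tech campus falls in UTM zone 17 North.
print(utm_zone(-80.41919))
print(wgs84_utm_epsg(37.228835, -80.41919))
```

Always confirm the code against your GIS software or the EPSG registry before reprojecting data.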

Note:

  • A geographic coordinate system describes where something is on Earth using latitude and longitude, like a globe.
  • A projected coordinate system flattens the Earth to make accurate measurements and is better for detailed, local maps.

Georeferencing

Georeferencing is the process of aligning spatial data, such as scanned maps or satellite images, to a known coordinate system so that they accurately represent real-world locations. This is essential for combining different datasets in GIS, where precise spatial alignment is critical.

How Georeferencing Works:

  • Choosing a Coordinate System: Select a coordinate system like WGS 84 (EPSG:4326) for global data.
  • Adding Control Points: Match identifiable features on the map (e.g., road intersections) with real-world coordinates.
  • Applying a Transformation: Adjust the image to fit the chosen coordinate system using methods like Affine or Polynomial transformations.
  • Checking for Accuracy: Validate the results to ensure precise alignment.

For example, if you have a historical map that lacks geographic coordinates, you can add control points at known locations, like the corners of a building or road intersections, to anchor the map to a real-world location. This makes the old map usable for modern analysis.
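The control-point and transformation steps can be sketched in a few lines. The example below fits an affine transformation (X = a·col + b·row + c, Y = d·col + e·row + f) exactly through three control points and then measures the error at a fourth check point. All coordinates are made-up values in a projected system in meters, and the function names are hypothetical. GIS software typically uses more control points and a least-squares fit, which this sketch does not attempt.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; choose better-spread points")
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(pixel_pts, world_pts):
    """Fit affine parameters (a, b, c, d, e, f) through exactly three control points."""
    m = [[px, py, 1.0] for px, py in pixel_pts]
    a, b, c = solve3(m, [x for x, _ in world_pts])
    d, e, f = solve3(m, [y for _, y in world_pts])
    return a, b, c, d, e, f


def apply_affine(params, px, py):
    """Convert a pixel position (column, row) to world coordinates."""
    a, b, c, d, e, f = params
    return a * px + b * py + c, d * px + e * py + f


# Hypothetical control points: image (column, row) -> map (X, Y) in meters.
pixels = [(0, 0), (1000, 0), (0, 800)]
world = [(550000.0, 4121000.0), (552000.0, 4121000.0), (550000.0, 4119400.0)]
params = fit_affine(pixels, world)

# Accuracy check: compare a fourth, independently known point with its prediction.
predicted = apply_affine(params, 500, 400)
known = (551001.5, 4120198.0)
error_m = math.hypot(predicted[0] - known[0], predicted[1] - known[1])
print(f"predicted={predicted}, error={error_m:.2f} m")
```

A large error at the check point suggests a misplaced control point or a distortion that an affine transformation cannot model, in which case a polynomial transformation with more control points may fit better.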

Image showing the georeferencing process for a scanned analog map in GIS software. 2021. Geography & Map Division. Library of Congress

  • Beginning of the georeferencing process (left): two control points have been placed between the scanned map image and the current aerial imagery, bringing the scanned map image to the correct scale but not the correct placement.
  • Completion of the georeferencing process (right): a number of well-distributed control points have been placed, bringing the scanned map image to both the correct scale and the correct geographic placement.

Geospatial Data Types

Before preparing data, it is essential to understand the different types of geospatial data, because different GIS applications require different data formats.

| Geospatial Data Type | Geospatial Data Format |
|---|---|
| Vector Data (Points, Lines, and Polygons) | Typically stored in Shapefiles (.shp), Geodatabases (.gdb), GeoJSON, KML |
| Raster Data (Satellite Images, DEMs) | Requires formats like GeoTIFF, ECW, MrSID |
| Tabular Data (Attribute Tables) | Best saved as CSV, Excel, or database files |

For example, if you have a list of coordinates in an Excel sheet, you need to format it as point data before importing it into a GIS application. You can read more about the types of geospatial data in our LibGuide, An Introduction to Geospatial Mapping: Geospatial Mapping Data.
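One way to turn a coordinate list into point data is to export the sheet as CSV and convert each row into a GeoJSON point feature. The sketch below assumes the sheet has been saved as CSV with columns named latitude and longitude (hypothetical names; adjust them to your file), that the coordinates are in WGS 84, and that the place names are illustrative labels only.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_field="latitude", lon_field="longitude"):
    """Convert CSV rows with latitude/longitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop(lat_field))
        lon = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Every remaining column becomes an attribute of the point.
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


sample = """name,latitude,longitude
Newman Library,37.228835,-80.41919
Alumni Mall,37.230317,-80.420013
"""
collection = csv_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Most GIS applications can also do this conversion through an "add delimited text" or "display XY data" style tool, provided you tell them which columns hold X and Y and which coordinate system the values use.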

Understanding geospatial data types also helps prevent data compatibility issues. For example, to join a CSV table of city names to a shapefile of cities, you need a common key field (e.g., City_ID) that matches records correctly. Read more on joins versus relates here: About joining and relating tables
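The idea behind a key-based join can be sketched without any GIS software. In the example below, the records, coordinates, and field names other than City_ID are hypothetical, and the keys are kept as text so that leading zeros survive (a common cause of failed joins when one side stores "001" and the other stores 1). Listing the unmatched keys before mapping helps you spot features that would otherwise end up with empty attributes.

```python
def join_by_key(features, table_rows, key="City_ID"):
    """Attach table attributes to features that share the same key value."""
    lookup = {row[key]: row for row in table_rows}
    joined, unmatched = [], []
    for feature in features:
        match = lookup.get(feature[key])
        if match is None:
            unmatched.append(feature[key])  # no attribute row for this feature
            continue
        joined.append({**feature, **match})
    return joined, unmatched


# Hypothetical spatial layer (approximate coordinates) and attribute table.
cities = [
    {"City_ID": "001", "geometry": (-80.41, 37.23)},
    {"City_ID": "002", "geometry": (-79.94, 37.27)},
    {"City_ID": "003", "geometry": (-77.44, 37.54)},
]
attributes = [
    {"City_ID": "001", "city_name": "Blacksburg"},
    {"City_ID": "002", "city_name": "Roanoke"},
]

joined, unmatched = join_by_key(cities, attributes)
print(f"joined {len(joined)} features; unmatched keys: {unmatched}")
```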

Geospatial Data Acquisition and Sourcing

Geospatial data can be obtained from various sources, each providing unique information about the Earth's surface and its attributes. They can be broadly classified into primary geospatial data and secondary geospatial data.

You can read more about the sources of geospatial data in the LibGuide referenced above. Sourcing primary geospatial data is typically tailored to the specific needs of a project. The key characteristics of primary geospatial data are:

  1. Firsthand collection – Gathered directly by researchers, surveyors, or organizations.
  2. Customizable – Data collection can be tailored to fit specific project goals or geographic areas.
  3. High accuracy – Often involves precise tools, such as GPS devices or drones.
  4. Up-to-date – Reflects current conditions at the time of collection.
  5. Resource-intensive – Requires time, tools, expertise, and sometimes permits.

It is essential to understand the sources of your geospatial data to assess their accuracy, completeness, and usability. For secondary geospatial data, obtained from existing records or databases, it is important to:

  1. Understand how the data was collected and how that will impact the map you will create
  2. Choose the right data for your needs
  3. Avoid using outdated and incompatible information (unless you are doing a comparison analysis)
  4. Cite sources properly in your work and respect the licensing of the dataset

The following table presents a short list of libraries and directories that provide access to open GIS data:

Geospatial Data Curator

Imma Mwanja
Contact:
Newman Library Room 3010
560 Drillfield Dr
Blacksburg, VA 24061
(540)-231-8665