Preparing Data for Input into Geospatial Applications: Introduction
Introduction
Geospatial applications rely on accurate, well-structured, and properly formatted data. Poorly prepared data can lead to errors in mapping, analysis, and decision-making. This guide covers the key steps in preparing data for geospatial use, including data sourcing, cleaning, transformation, and validation.
When starting a GIS mapping or analysis project, a common challenge is assembling the data needed to answer the question or produce the desired output. The datasets you need may be available, but at different accuracy levels, or may include the required geographic features but lack a key attribute. These sorts of issues may make data unusable without additional preparation.
Creating a map begins with raw spatial data, typically lacking visual styling or symbology. At this initial stage, the data consists merely of geometric forms—such as points, lines, and polygons—without any graphical interpretation.
These geometric features are then processed through a cartographic workflow using Geographic Information Systems (GIS) software (e.g., ArcGIS, QGIS). During this stage, the data are symbolized and classified to represent real-world features—for example, points may become icons for statues, pharmacies, or museums. Standard cartographic color conventions are applied to enhance readability; for instance, brown is often used for buildings, green for parks or vegetation, and blue for water bodies. These visual cues help make the map more intuitive and user-friendly.
Maps are generally composed of multiple data layers, each representing a specific category of spatial features. These layers correspond to separate tables within a geospatial database and might include information about buildings, road networks, water features, or green spaces. Together, these layers are combined to form a complete and coherent map.
Coordinate Systems
Coordinate systems are the foundation for working with geographic data in GIS. They locate features on the Earth’s surface using numbers: either latitude and longitude (in a geographic coordinate system) or X and Y values in units such as meters or feet (in a projected coordinate system). A shared coordinate system ensures that layers line up correctly on a map. Whether you are using a map projection to flatten the globe, geocoding an address into a location, or georeferencing an image to match real-world coordinates, the right coordinate system keeps your spatial data accurate and meaningful. There are two main types of coordinate systems:
- Projected Coordinate System (PCS) projects the Earth’s surface onto a flat, 2D map using X and Y coordinates (usually in meters or feet). This system is used when you need to make accurate measurements, such as calculating land area or planning infrastructure. Most local GIS analysis works best in a projected coordinate system.
- Geographic Coordinate System (GCS) uses a 3D model of the Earth and expresses locations using latitude and longitude in degrees. This system is ideal for global data but isn’t well suited to measuring distance or area, because the ground distance covered by one degree varies with location.
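To see why degrees are a poor unit for measurement, the following minimal sketch compares the ground length of one degree of longitude at the equator and at 60° north. It assumes a spherical Earth with a mean radius of 6,371 km (a simplification; GIS software uses an ellipsoid), and the function name `haversine_m` is chosen here for illustration.

```python
import math

# Mean Earth radius in meters; a spherical approximation, not an ellipsoid.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude, measured at the equator and at 60 degrees north.
at_equator = haversine_m(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_m(60.0, 0.0, 60.0, 1.0)

print(f"1 degree of longitude at the equator: {at_equator / 1000:.1f} km")
print(f"1 degree of longitude at 60 N:        {at_60_north / 1000:.1f} km")
```

Under these assumptions the same one-degree step covers roughly half the ground distance at 60° north that it covers at the equator, which is why distance and area calculations are normally done in a projected coordinate system.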
Geocoding
Geocoding is the process of converting addresses into coordinates: latitude (Y) and longitude (X). It takes a text-based description of a location, such as an address or a place name, and returns geographic coordinates, most often latitude and longitude, that identify the location on the Earth’s surface. For example:
Your Address | Address Standardization | Address Locator (Latitude, Longitude) |
---|---|---|
560 Drillfield Dr, Blacksburg | 560 Drillfield Dr, Blacksburg, VA 24061 | 37.228835, -80.41919 |
310 Alumni Mall, 24061 | 310 Alumni Mall, Blacksburg, VA 24061 | 37.230317, -80.420013 |
190 W Campus Dr, VA | 290 College Ave, Blacksburg, VA 24060 | 37.229626, -80.417971 |
Note:
- Lines of latitude run horizontally (east-west) and measure distance north or south of the equator (0°).
- Lines of longitude run vertically (north-south) and measure distance east or west of the Greenwich meridian (0°).
- Together, latitude and longitude make it possible to locate any point or place on the globe.
Credit: Illinois State University
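Geocoder output like the coordinate pairs in the table above usually needs a quick sanity check before it is loaded into a GIS. The sketch below parses a "latitude, longitude" string, validates both ranges, and returns the pair in X, Y order (longitude is X, latitude is Y). The function names are illustrative, not part of any geocoding service.

```python
def parse_lat_lon(text):
    """Parse a 'latitude, longitude' string and validate both ranges."""
    lat_text, lon_text = text.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def to_xy(lat, lon):
    """Return the pair in (x, y) order: longitude is X, latitude is Y."""
    return lon, lat


# First coordinate pair from the table above.
lat, lon = parse_lat_lon("37.228835, -80.41919")
x, y = to_xy(lat, lon)
print(f"x (longitude) = {x}, y (latitude) = {y}")
```

Range checks only catch some mistakes: a swapped pair such as "-80.41919, 37.228835" still passes because both values fall within valid ranges, so plotting a few points on a basemap remains a useful check.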
Map Projections
A map projection represents the three-dimensional Earth on a two-dimensional surface, based on one of various mathematical models. Maps require coordinates to model the locations of places on Earth accurately. The World Geodetic System 1984 (WGS84) is the global standard coordinate system for mapping and navigation, and it is the system used by GPS (Global Positioning System). However, other coordinate systems are used for specific areas or continents.
An EPSG code is a unique identifier for a Coordinate Reference System (CRS) used in mapping and GIS applications. The European Petroleum Survey Group (EPSG) developed these codes, now widely used to define spatial reference systems in GIS software. EPSG codes are important for:
- Standardization: Ensures consistency when working with spatial data across different software and platforms.
- Quick Identification: Easily reference a CRS without manually defining parameters.
- Compatibility: Used in GIS tools like QGIS, ArcGIS, GDAL, and web mapping APIs.
Note:
- A geographic coordinate system describes where something is on Earth using latitude and longitude, like a globe.
- A projected coordinate system flattens the Earth to make accurate measurements and is better for detailed, local maps. Most local GIS analysis works best in a projected coordinate system.
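As one illustration of how EPSG codes identify a projected system for local work, the sketch below derives the EPSG code of the WGS 84 / UTM zone that contains a given point. UTM is only one possible choice of projected coordinate system (many regions use state or national grids instead), and the function name `utm_epsg` is made up for this example. It relies on the numbering convention that northern-hemisphere zones are 32601 to 32660 and southern-hemisphere zones are 32701 to 32760.

```python
import math


def utm_epsg(lat, lon):
    """Return the EPSG code of the WGS 84 / UTM zone containing a point.

    Standard 6-degree zones only; the special zones around Norway and
    Svalbard are ignored in this sketch.
    """
    if not -80.0 <= lat <= 84.0:
        raise ValueError("UTM is defined only between 80 S and 84 N")
    # Zones are numbered 1-60, starting at 180 W and moving east.
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    base = 32600 if lat >= 0 else 32700
    return base + zone


# A point in Blacksburg, VA falls in UTM zone 17N.
print(f"EPSG:{utm_epsg(37.228835, -80.41919)}")  # prints "EPSG:32617"
```

The resulting code can then be entered wherever your GIS software asks for a CRS, instead of defining the projection parameters by hand.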
Georeferencing
Georeferencing is the process of aligning spatial data, such as scanned maps or satellite images, to a known coordinate system so that it accurately represents real-world locations. This is essential for combining different datasets in GIS, where precise spatial alignment is critical.
How Georeferencing Works:
- Choosing a Coordinate System: Select a coordinate system like WGS 84 (EPSG:4326) for global data.
- Adding Control Points: Match identifiable features on the map (e.g., road intersections) with real-world coordinates.
- Applying a Transformation: Adjust the image to fit the chosen coordinate system using methods like Affine or Polynomial transformations.
- Checking for Accuracy: Validate the results to ensure precise alignment.
For example, if you have a historical map that lacks geographic coordinates, you can add control points at known locations, like the corners of a building or road intersections, to anchor the map to a real-world location. This makes the old map usable for modern analysis.
Image showing the georeferencing process for a scanned analog map in GIS software. 2021. Geography & Map Division. Library of Congress
- Beginning of the georeferencing process (left): two control points have been placed between the scanned map image and the current aerial imagery, bringing the scanned map image to the correct scale but not the correct placement.
- Completion of the georeferencing process (right): a number of well-distributed control points have been placed, bringing the scanned map image to both the correct scale and the correct geographic placement.
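The transformation step can be made concrete with a small sketch of an affine fit. It solves for the six affine parameters from exactly three control points, each pairing a pixel position (column, row) on the scanned image with a map coordinate (x, y). The control-point values are hypothetical, and GIS software typically accepts more points and fits them by least squares rather than solving exactly as done here.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; choose better-distributed points")
    solution = []
    for j in range(3):
        # Replace column j of the matrix with the right-hand side.
        replaced = [r[:j] + [rhs[k]] + r[j + 1:] for k, r in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(pixel_pts, world_pts):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f from 3 control points."""
    m = [[float(col), float(row), 1.0] for col, row in pixel_pts]
    x_params = solve3(m, [x for x, _ in world_pts])
    y_params = solve3(m, [y for _, y in world_pts])
    return x_params, y_params


def apply_affine(params, col, row):
    """Convert a pixel position to map coordinates with fitted parameters."""
    (a, b, c), (d, e, f) = params
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical control points: pixel (col, row) -> projected map (x, y) in meters.
pixel_pts = [(0, 0), (1000, 0), (0, 1000)]
world_pts = [(500000.0, 4100000.0), (502000.0, 4100000.0), (500000.0, 4098000.0)]

params = fit_affine(pixel_pts, world_pts)
print(apply_affine(params, 500, 500))  # center of the image in map coordinates
```

With these made-up points each pixel covers 2 m, and the y coordinate decreases as the row number increases, which is the usual orientation for scanned images. The collinearity check reflects the advice in the caption above: control points should be well distributed across the image.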
Geospatial Data Types
Before preparing data, it is essential to understand the different types of geospatial data, because different GIS applications require different data formats.
For example, if you have a list of coordinates in an Excel sheet, you need to format it as point data before importing it into a GIS application. You can read more about the types of geospatial data in our LibGuide, An Introduction to Geospatial Mapping: Geospatial Mapping Data.
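One way to turn a coordinate list into point data is to export the sheet as CSV and convert each row into a GeoJSON point feature, a format most GIS applications can read. The sketch below uses only the Python standard library; the column names (`name`, `latitude`, `longitude`) and the sample rows are assumptions for illustration.

```python
import csv
import io
import json

# Hypothetical spreadsheet export; the column names are assumptions.
csv_text = """name,latitude,longitude
Site A,37.228835,-80.41919
Site B,37.230317,-80.420013
"""


def rows_to_geojson(rows, lat_field="latitude", lon_field="longitude"):
    """Convert rows with latitude/longitude columns to a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        lon, lat = float(row[lon_field]), float(row[lat_field])
        # Every other column is kept as an attribute of the point.
        properties = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates in [longitude, latitude] order.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


collection = rows_to_geojson(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(collection, indent=2))
```

GeoJSON coordinates are assumed to be WGS 84 longitude and latitude, so a sheet holding projected X and Y values would need to be converted, or imported through the GIS application's own delimited-text tool with the correct CRS selected.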
Understanding geospatial data types also helps prevent data compatibility issues. For example, if you join a CSV table of city names to a shapefile of cities, both need a common key field (e.g., City_ID) so that records match correctly. Read more on when to use joins versus relates here: About joining and relating tables.
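The join described above can be sketched as a simple key lookup. The records below are hypothetical stand-ins for a shapefile's attribute table and a CSV table. The example also shows a common compatibility problem: the key is stored as a number in one table and as text in the other, so both are normalized to strings before matching.

```python
# Hypothetical records standing in for a shapefile's attribute table and a CSV table.
city_features = [
    {"City_ID": 1, "name": "City A"},
    {"City_ID": 2, "name": "City B"},
    {"City_ID": 3, "name": "City C"},
]
csv_rows = [
    {"City_ID": "1", "sample_value": "10"},
    {"City_ID": " 2", "sample_value": "20"},  # stray space, as often seen in CSV exports
]


def join_by_key(features, table_rows, key="City_ID"):
    """Left join: keep every feature and attach matching table attributes."""
    # Normalize keys to stripped strings so 1, "1", and " 1" all match.
    lookup = {str(row[key]).strip(): row for row in table_rows}
    joined, unmatched = [], []
    for feature in features:
        match = lookup.get(str(feature[key]).strip())
        if match is None:
            unmatched.append(feature[key])
            joined.append(dict(feature))
        else:
            extra = {k: v for k, v in match.items() if k != key}
            joined.append({**feature, **extra})
    return joined, unmatched


joined, unmatched = join_by_key(city_features, csv_rows)
print(joined)
print("Features with no matching table row:", unmatched)
```

Reporting the unmatched keys is a design choice worth keeping in real workflows, since silent mismatches are a frequent cause of empty attribute columns after a join.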
Geospatial Data Acquisition and Sourcing
Geospatial data can be obtained from various sources, each providing unique information about the Earth's surface and its attributes. These sources can be broadly classified into primary geospatial data and secondary geospatial data.
You can read more about the sources of geospatial data in the LibGuide referenced above. Sourcing primary geospatial data is typically tailored to the specific needs of a project. The key characteristics of primary geospatial data are:
- Firsthand collection – Gathered directly by researchers, surveyors, or organizations.
- Customizable – Data collection can be tailored to fit specific project goals or geographic areas.
- High accuracy – Often involves precise tools, such as GPS devices or drones.
- Up-to-date – Reflects current conditions at the time of collection.
- Resource-intensive – Requires time, tools, expertise, and sometimes permits.
It is essential to understand the sources of your geospatial data to assess their accuracy, completeness, and usability. For secondary geospatial data, obtained from existing records or databases, it is important to:
- Understand how the data was collected and how that will impact the map you will create
- Choose the right data for your needs
- Avoid using outdated or incompatible information (unless you are doing a comparison analysis)
- Cite sources properly in your work and respect the licensing of the dataset
The following table presents a short list of libraries and directories that provide access to open GIS data: