Skip to content

Geospatial Engineering Overview

Every time you open a ride-hailing app, search for "coffee near me", or watch a delivery driver navigate to your door, geospatial engineering is doing the heavy lifting. Behind the simple blue dot on a map lies a stack of coordinate systems, spatial indexes, projection math, and distance algorithms that most engineers never learn until they need them.

This page gives you the conceptual foundation. The companion Spatial Indexing Deep Dive goes into advanced data structures and production implementations.


Why Geospatial Engineering Matters

Location is one of the most common dimensions in modern applications, yet most CS curricula skip it entirely. If you are building any of the following, you need geospatial fundamentals:

DomainGeospatial Problem
Ride-hailing (Uber, Lyft)Match riders to nearest drivers in real time
Food delivery (DoorDash)Assign orders to couriers within a delivery radius
E-commerce (Amazon)Find the nearest warehouse with stock
Social (Tinder, Snapchat)"People nearby" with privacy-preserving location
Real estate (Zillow)Search within a drawn polygon, school district boundaries
Logistics (FedEx, UPS)Route optimization across thousands of stops
IoT / Fleet trackingMonitor thousands of vehicles, geofence alerting
Gaming (Pokemon Go)Augmented reality anchored to real-world coordinates

The Shape of the Earth

Before you can reason about coordinates, you need to understand what you are projecting onto.

The Geoid, Ellipsoid, and Datum

The Earth is not a sphere. It is not even a perfect ellipsoid. The actual shape — the geoid — is an irregular surface defined by gravity. Since we cannot do math on an irregular potato, we approximate.

ConceptDefinitionExample
GeoidThe true gravitational shape of the EarthMeasured by satellites (GRACE mission)
EllipsoidA mathematically defined oval that approximates the geoidWGS84 ellipsoid: semi-major axis 6,378,137 m
DatumAn ellipsoid anchored to specific reference pointsWGS84, NAD83, ETRS89
CRSCoordinate Reference System — the full specificationEPSG:4326 (WGS84 geographic)

WARNING

Mixing datums is a common bug. A point in NAD27 and the same point in WGS84 can differ by up to 200 meters in North America. Always track which datum your coordinates use.

WGS84 — The Universal Standard

WGS84 (World Geodetic System 1984) is the datum used by GPS, Google Maps, and nearly every web mapping system. When someone gives you a latitude and longitude without specifying a datum, they almost certainly mean WGS84.

  • EPSG:4326 — WGS84 geographic coordinates (latitude, longitude in degrees)
  • EPSG:3857 — Web Mercator projection (used by all major web map tile servers)
Latitude:   -90 to +90   (north-south, equator = 0)
Longitude: -180 to +180  (east-west, Prime Meridian = 0)

TIP

Latitude is like a ladder (lat-itude, lat-er) — it goes up and down. Longitude is long — it wraps around the Earth the long way.


Coordinate Systems and Projections

The Earth is 3D. Maps are 2D. Converting between them always introduces distortion. The type of projection you choose determines what gets distorted and what is preserved.

Common Projections

ProjectionPreservesDistortsUse Case
MercatorAngles, local shapesArea (Greenland looks huge)Navigation, web maps
Web Mercator (3857)AnglesArea, poles (cuts off at ~85 degrees)Google Maps, Mapbox, Leaflet
UTMShape within each zoneNothing significant within zoneMilitary, surveying, local engineering
AlbersAreaShape at edgesThematic maps, census data
Lambert Conformal ConicShape, anglesAreaAviation charts, state-level mapping

When Projection Matters in Engineering

For most backend applications — finding nearby drivers, geofencing, proximity search — you work in WGS84 (EPSG:4326) and use geodesic distance formulas. You only need to worry about projections when:

  1. Rendering map tiles — tile servers use Web Mercator (EPSG:3857)
  2. Computing areas — Mercator wildly distorts area; use equal-area projections
  3. High-precision surveying — use UTM for sub-meter accuracy
  4. Storing in PostGIS — use geography type (WGS84) for global data, geometry type with a local projection for regional data

Distance Calculations

"How far apart are two points?" is the foundational geospatial query, and the answer depends on how much accuracy you need versus how much CPU you want to spend.

Haversine Formula

The Haversine formula computes the great-circle distance between two points on a sphere. It ignores the ellipsoidal shape of the Earth but is accurate to about 0.3% (good enough for most applications).

python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Distance in kilometers between two points on Earth."""
    R = 6371  # Earth radius in km

    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)

    a = (math.sin(dlat / 2) ** 2 +
         math.cos(math.radians(lat1)) *
         math.cos(math.radians(lat2)) *
         math.sin(dlon / 2) ** 2)

    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c

# New York to London
print(haversine(40.7128, -74.0060, 51.5074, -0.1278))
# Output: 5570.25 km

Vincenty / Karney (Geodesic)

For sub-meter accuracy, use Vincenty's formulae or the improved Karney algorithm (implemented in GeographicLib). These compute distance on the WGS84 ellipsoid.

python
from geographiclib.geodesic import Geodesic

result = Geodesic.WGS84.Inverse(40.7128, -74.0060, 51.5074, -0.1278)
print(f"Distance: {result['s12'] / 1000:.2f} km")
# Output: 5570.22 km (more accurate than Haversine)

Distance Method Comparison

MethodAccuracySpeedUse Case
Euclidean (flat earth)Terrible for > 1 kmFastestOnly for very small areas (< 1 km, same city block)
Haversine~0.3% errorFastProximity search, "nearby" queries, 99% of apps
Vincenty~0.05 mmMediumSurveying, geodesy, aviation
Karney~15 nmMediumBest for all cases needing high precision

TIP

Use Haversine for application-level "find nearby" queries. Use PostGIS ST_Distance with geography type for database-level queries — it uses the Karney algorithm internally.


Geohashing

Geohashing converts a 2D coordinate (latitude, longitude) into a 1D string that preserves spatial locality. Nearby points tend to share a common prefix, which makes geohashes perfect for database indexing.

How Geohashing Works

The algorithm recursively bisects the world:

Each additional character narrows the area:

Geohash LengthCell WidthCell HeightApproximate Area
15,000 km5,000 kmContinent
21,250 km625 kmLarge country region
3156 km156 kmState / Province
439.1 km19.5 kmMetro area
54.9 km4.9 kmTown
61.2 km0.6 kmNeighborhood
7153 m153 mCity block
838 m19 mBuilding
94.8 m4.8 mRoom

Geohash Properties

Prefix sharing — if two geohashes share a prefix, they are in the same region. 9q8yyk and 9q8yym are in the same level-5 cell.

Edge problem — two points across a cell boundary can be very close but share no prefix at all. The classic example: points on either side of the Prime Meridian or the Equator.

python
import geohash2

# San Francisco
gh1 = geohash2.encode(37.7749, -122.4194, precision=6)
print(gh1)  # "9q8yyk"

# Nearby point in SF
gh2 = geohash2.encode(37.7750, -122.4190, precision=6)
print(gh2)  # "9q8yyk" — same geohash!

# Finding neighbors to handle edge cases
neighbors = geohash2.neighbors(gh1)
print(neighbors)
# ['9q8yym', '9q8yys', '9q8yyw', '9q8yyj', ...]

WARNING

Never use exact geohash prefix matching alone for proximity search. Always query the target cell plus its 8 neighbors to avoid missing nearby points across cell boundaries.

Geohash in Databases

Geohashes work beautifully as database index keys because prefix searches become range scans:

sql
-- Find all restaurants in a geohash region
-- The geohash column has a B-tree index
SELECT * FROM restaurants
WHERE geohash LIKE '9q8yy%';

-- With neighbors for completeness
SELECT * FROM restaurants
WHERE geohash LIKE '9q8yy%'
   OR geohash LIKE '9q8yz%'
   OR geohash LIKE '9q8yw%'
   -- ... all 8 neighbors

Spatial Data Types and Formats

Geometry Types

The OGC (Open Geospatial Consortium) defines standard geometry types:

TypeExample UseWKT Example
PointStore location, pin on mapPOINT(-122.4194 37.7749)
LineStringRoad segment, route, riverLINESTRING(-122.4 37.7, -122.3 37.8)
PolygonDelivery zone, city boundaryPOLYGON((-122.5 37.7, -122.4 37.7, ...))
MultiPolygonCountry with islands, discontiguous zonesMULTIPOLYGON((...), (...))

Data Formats

FormatTypeUse Case
GeoJSONText (JSON)Web APIs, Leaflet/Mapbox, interchange
WKTTextSQL queries, human-readable
WKBBinaryDatabase internal storage, efficient transfer
ShapefileBinary + sidecar filesLegacy GIS (ESRI), government data
GeoParquetColumnar binaryAnalytics, data lakes, large datasets
FlatGeobufBinaryStreaming large datasets over HTTP
MVTBinary (protobuf)Vector map tiles
json
// GeoJSON Point
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [-122.4194, 37.7749]
  },
  "properties": {
    "name": "San Francisco",
    "population": 873965
  }
}

DANGER

GeoJSON uses [longitude, latitude] order (x, y). Most humans think latitude-first. This mismatch causes countless bugs. Google Maps API uses (lat, lng), but GeoJSON uses [lng, lat]. Always check which convention your library expects.


When You Need Geospatial Infrastructure

Not every app that stores lat/lng needs a full geospatial stack. Use this decision tree:

Complexity Tiers

TierComplexityToolsExample Apps
1 — Store & DisplayLowRegular float columns, GeoJSON in frontendContact directory with addresses
2 — Proximity SearchMediumGeohash in Redis/DynamoDB or PostGISRestaurant finder, store locator
3 — Polygon OperationsHighPostGIS, Turf.jsDelivery zone management, geofencing
4 — Real-Time SpatialVery HighH3/S2 + Redis + stream processingUber, fleet tracking, Pokemon Go

Common Geospatial Pitfalls

1. Storing Coordinates as Integers

Storing lat/lng as integers (multiplied by 1e6 or 1e7) saves space but introduces rounding errors and makes joins with GIS tools painful. Use DOUBLE PRECISION or geography type instead.

2. Ignoring the Antimeridian

The International Date Line (180 / -180 longitude) breaks naive bounding box queries:

sql
-- WRONG: this returns nothing for a box crossing the antimeridian
WHERE lng BETWEEN 170 AND -170

-- CORRECT: split into two ranges
WHERE (lng >= 170 OR lng <= -170)

3. Treating Lat/Lng as a Flat Grid

One degree of latitude is always ~111 km. But one degree of longitude varies from 111 km at the equator to 0 km at the poles.

python
import math

def lng_km_per_degree(latitude):
    """Kilometers per degree of longitude at a given latitude."""
    return 111.32 * math.cos(math.radians(latitude))

print(lng_km_per_degree(0))    # 111.32 km (equator)
print(lng_km_per_degree(45))   # 78.71 km (Paris)
print(lng_km_per_degree(60))   # 55.66 km (Helsinki)
print(lng_km_per_degree(89))   # 1.94 km (near pole)

4. Using Float Equality

Floating-point coordinates should never be compared with ==. Use a tolerance:

python
# WRONG
if user_lat == store_lat and user_lng == store_lng:
    ...

# CORRECT (within ~1 meter)
EPSILON = 0.00001  # ~1.1 meters
if abs(user_lat - store_lat) < EPSILON and abs(user_lng - store_lng) < EPSILON:
    ...

5. Privacy: Storing Exact User Locations

Storing exact GPS coordinates is a privacy risk. Common mitigations:

TechniquePrecisionMethod
Geohash truncation~1 kmStore only 5-6 character geohash
Random jitter~100-500 mAdd random offset before storage
Snap to gridVariableRound to nearest grid cell center
k-anonymityVariableOnly reveal location if k users share the same cell

Geospatial Tech Stack at a Glance

LayerTools
DatabasePostGIS (Postgres extension), MongoDB (2dsphere), Elasticsearch
In-memory indexRedis GEO commands, H3, S2
Tile serverMartin, pg_tileserv, Tegola
FrontendMapbox GL JS, Leaflet, Deck.gl, Google Maps API
GeocodingNominatim (OSM), Google Geocoding API, Mapbox
RoutingOSRM, Valhalla, GraphHopper, Google Directions API
Data processingApache Sedona (Spark), DuckDB Spatial, GDAL/OGR
File formatsGeoJSON, GeoParquet, FlatGeobuf, Shapefile

What to Learn Next

TopicLink
Spatial indexing deep dive (geohash, S2, H3, R-tree, PostGIS)Spatial Indexing Deep Dive
Design Uber/Lyft (ride matching, real-time location)Design Uber
Design Google Maps (routing, tile serving)Design Google Maps
Caching strategies for location dataCaching
Stream processing for real-time locationStream Processing

Key Takeaways

  1. The Earth is not flat — always use geodesic distance (Haversine minimum) for points more than a kilometer apart
  2. Geohashes are your gateway drug — they turn 2D coordinates into 1D sortable strings, perfect for database indexes
  3. GeoJSON is [lng, lat] — the most common source of geospatial bugs is coordinate order
  4. Always query neighbors — geohash proximity search must include the 8 surrounding cells
  5. Choose the right tier — do not build PostGIS infrastructure for a simple store locator with 200 locations
  6. Privacy is not optional — truncate, jitter, or anonymize user locations before storage

"What I cannot create, I do not understand." — Richard Feynman