Skip to content
Unverified — AI-generated content. Help verify this page

Spatial Indexing Deep Dive

A regular B-tree cannot answer "find all restaurants within 2 km of me" efficiently. Spatial data lives in two dimensions (or more), and standard one-dimensional indexes cannot capture proximity in 2D space without scanning far too many rows. Spatial indexes solve this by partitioning space into cells, hierarchies, or trees that make multi-dimensional proximity queries fast.

This page covers the five most important spatial indexing approaches: Geohash, S2 Geometry, H3 Hexagonal, R-tree, and PostGIS — with production use cases for each.


The Spatial Indexing Landscape

IndexDimension ReductionStorageBest For
Geohash2D → 1D stringAny key-value storeSimple proximity, Redis, DynamoDB
S23D sphere → 1D integerAny sorted storeGoogle-scale, covering queries
H32D → hexagonal cellsAny storeAnalytics, ride matching, heatmaps
R-treeBounding box hierarchyPostGIS, SQLitePolygon queries, complex GIS
KD-treeBinary space partitionIn-memory onlyML nearest-neighbor, small datasets
QuadtreeRecursive 4-way splitIn-memory, tile serversMap rendering, collision detection

1. Geohash

Geohash uses a Z-order curve (Morton curve) to interleave latitude and longitude bits into a single string. This is the simplest spatial index and works with any database that supports prefix queries.

How It Works (Bit Interleaving)

Longitude bits: 0 1 1 0 1 ...
Latitude bits:  1 0 1 1 0 ...
Interleaved:   01 10 11 01 10 ...

Base32 encode → "9q8yyk"

Precision Table

CharactersBitsCell WidthCell HeightError (lat)Error (lon)
155,000 km5,000 km+/- 23 deg+/- 23 deg
2101,250 km625 km+/- 2.8 deg+/- 5.6 deg
315156 km156 km+/- 0.7 deg+/- 0.7 deg
42039.1 km19.5 km+/- 0.087 deg+/- 0.18 deg
5254.9 km4.9 km+/- 0.022 deg+/- 0.022 deg
6301.2 km610 m+/- 0.0027 deg+/- 0.0055 deg
735153 m153 m+/- 0.00069 deg+/- 0.00069 deg
84038.2 m19.1 m+/- 0.000086 deg+/- 0.00017 deg

Proximity Search with Geohash

The standard algorithm: compute the target geohash, expand to include all 8 neighbors, query for matching prefixes, then filter by exact distance.

python
import geohash2
from math import radians, sin, cos, sqrt, atan2

def find_nearby(lat, lng, radius_km, precision=6):
    """Find candidate cells for a proximity search."""
    center = geohash2.encode(lat, lng, precision=precision)

    # Always include neighbors to handle edge cases
    candidates = {center}
    candidates.update(geohash2.neighbors(center))

    return candidates

def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat/2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon/2)**2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))

# Usage: Find restaurants within 2 km
cells = find_nearby(37.7749, -122.4194, radius_km=2, precision=6)
# Query DB: SELECT * FROM restaurants WHERE geohash IN ('9q8yyk', '9q8yym', ...)
# Post-filter: keep only results within exact 2 km radius

Geohash in Redis

Redis has native geospatial support built on sorted sets with geohash scoring:

bash
# Add locations
GEOADD restaurants -122.4194 37.7749 "pizza-place"
GEOADD restaurants -122.4089 37.7837 "sushi-spot"
GEOADD restaurants -122.4058 37.7880 "taco-stand"

# Find within 2 km radius
GEORADIUS restaurants -122.4194 37.7749 2 km WITHCOORD WITHDIST ASC COUNT 10

# Or use the newer GEOSEARCH (Redis 6.2+)
GEOSEARCH restaurants FROMLONLAT -122.4194 37.7749 BYRADIUS 2 km ASC COUNT 10

TIP

Redis GEOADD internally stores coordinates as a 52-bit geohash score in a sorted set. GEORADIUS / GEOSEARCH queries the sorted set by score range (exploiting geohash prefix locality) and then filters by exact distance. This gives O(N+log(M)) performance where N is results and M is total items.

Geohash in DynamoDB

DynamoDB has no native geospatial support, but geohash works perfectly as a sort key:

typescript
// Store with geohash as sort key
const params = {
  TableName: 'Locations',
  Item: {
    pk: 'RESTAURANT',
    sk: '9q8yyk8v',  // geohash
    lat: 37.7749,
    lng: -122.4194,
    name: 'Pizza Place',
  }
};

// Query: find all in geohash prefix
// Use BEGINS_WITH on the sort key
const query = {
  TableName: 'Locations',
  KeyConditionExpression: 'pk = :pk AND begins_with(sk, :prefix)',
  ExpressionAttributeValues: {
    ':pk': 'RESTAURANT',
    ':prefix': '9q8yy',  // 5-char prefix for ~5 km resolution
  }
};

Geohash Limitations

LimitationImpactMitigation
Edge discontinuityNearby points across cell boundary share no prefixAlways query 8 neighbors
Z-order curve jumpsSome spatially close cells have distant geohash valuesAccept false positives, filter post-query
Non-uniform cellsCells are rectangular, not square (except at equator)Use S2 or H3 for uniform cells
Pole distortionCells get very narrow near polesNot an issue for most applications

2. S2 Geometry (Google)

S2, developed by Google, projects the Earth onto a cube and uses a Hilbert curve to map the 2D surface to 1D cell IDs. The Hilbert curve has better spatial locality than the Z-order curve used by geohash — nearby cells on the sphere almost always have nearby cell IDs.

S2 Architecture

S2 Cell Levels

S2 cells range from level 0 (face of the cube) to level 30 (sub-centimeter):

LevelApprox Cell SizeTypical Use
0~7,800 kmHemisphere
5~250 kmCountry region
10~8 kmCity district
12~2 kmNeighborhood
14~460 mCity block
16~120 mBuilding cluster
18~30 mIndividual building
20~7 mRoom
24~0.5 mSub-meter precision
30~1 cmMaximum precision

S2 Covering

The killer feature of S2 is covering: given any region (circle, polygon, arbitrary shape), S2 can compute a set of cells at various levels that tightly cover the region. These cells become simple range queries on a sorted index.

python
import s2sphere

def s2_covering_for_radius(lat, lng, radius_km, max_cells=8):
    """Compute S2 cell covering for a circular region."""
    center = s2sphere.LatLng.from_degrees(lat, lng)
    # Earth radius ≈ 6371 km
    angle = s2sphere.Angle.from_degrees(radius_km / 111.32)

    cap = s2sphere.Cap.from_axis_angle(
        center.to_point(),
        angle
    )

    coverer = s2sphere.RegionCoverer()
    coverer.min_level = 10
    coverer.max_level = 16
    coverer.max_cells = max_cells

    covering = coverer.get_covering(cap)
    return covering

# Find cells covering 5 km around San Francisco
cells = s2_covering_for_radius(37.7749, -122.4194, 5.0)
for cell_id in cells:
    print(f"Cell: {cell_id.id()}, Level: {cell_id.level()}")

S2 vs Geohash

FeatureGeohashS2
Curve typeZ-order (Morton)Hilbert
Spatial localityGood (jumps at boundaries)Excellent (minimal jumps)
Cell shapeRectangle (varies by latitude)Nearly uniform quadrilateral
Multi-level coveringNo (fixed precision)Yes (mix levels for tight fit)
Pole handlingDegenerate cellsUniform cells everywhere
Cell ID formatBase32 string64-bit integer
EcosystemUbiquitous (Redis, DynamoDB)Google, Foursquare, MongoDB (2dsphere)
Implementation complexitySimpleComplex

TIP

Google uses S2 internally for Google Maps, Google Earth, and BigQuery GIS. MongoDB's 2dsphere index also uses S2 under the hood. If you need to cover arbitrary polygons efficiently, S2 is the best choice.


3. H3 Hexagonal Indexing (Uber)

H3 is Uber's open-source hierarchical hexagonal grid system. Unlike geohash (rectangles) and S2 (quadrilaterals), H3 uses hexagons, which have a crucial property: every neighbor is equidistant from the center. This eliminates the directional bias of rectangular grids.

Why Hexagons?

PropertySquare GridHexagonal Grid
Neighbors4 edge + 4 corner (unequal distance)6 (all equidistant)
Edge effectsDiagonal vs adjacent ambiguityNo ambiguity
CoveragePerfect tilingPerfect tiling
Area uniformityVaries with projectionNearly uniform (icosahedron base)
Gradient smoothingCheckerboard artifactsSmooth gradients

H3 Resolution Levels

ResolutionAvg Hex AreaAvg Edge LengthUse Case
04,357,449 km21,107 kmContinental
312,392 km259.8 kmRegional analysis
5252.9 km28.5 kmCity-level
75.16 km21.22 kmNeighborhood
80.737 km2461 mSurge pricing zone
90.105 km2174 mCity block
100.015 km265.9 mBuilding cluster
12307 m29.4 mIndividual building
150.9 m20.5 mSub-meter

H3 for Ride Matching

Uber uses H3 for surge pricing, supply/demand analysis, and ride matching:

python
import h3

# Convert lat/lng to H3 cell
lat, lng = 37.7749, -122.4194
h3_index = h3.latlng_to_cell(lat, lng, res=8)
print(h3_index)  # '8828308281fffff'

# Get k-ring (all cells within k steps)
# k=1 gives the hex + its 6 neighbors (7 total)
# k=2 gives 19 cells, k=3 gives 37, etc.
ring = h3.grid_disk(h3_index, k=2)
print(f"Cells in k=2 ring: {len(ring)}")  # 19

# Compute supply/demand per hex
def compute_surge(h3_cell, drivers, riders):
    """Compute surge multiplier for a hexagonal zone."""
    supply = len([d for d in drivers if h3.latlng_to_cell(d.lat, d.lng, 8) == h3_cell])
    demand = len([r for r in riders if h3.latlng_to_cell(r.lat, r.lng, 8) == h3_cell])

    if supply == 0:
        return 3.0  # max surge
    ratio = demand / supply
    if ratio > 2.0:
        return min(1.0 + (ratio - 1.0) * 0.5, 3.0)
    return 1.0

# Polyfill a polygon with hexagons (e.g., a delivery zone)
polygon = h3.LatLngPoly(
    [(37.78, -122.42), (37.79, -122.40), (37.77, -122.40), (37.76, -122.42)]
)
hexagons = h3.polygon_to_cells(polygon, res=9)
print(f"Hexagons covering zone: {len(hexagons)}")

H3 for Analytics and Heatmaps

sql
-- PostgreSQL with h3-pg extension
-- Count events per hexagon for a heatmap
SELECT
    h3_lat_lng_to_cell(location, 8) AS hex,
    COUNT(*) AS event_count,
    AVG(value) AS avg_value
FROM events
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY hex
ORDER BY event_count DESC
LIMIT 100;

WARNING

H3 has a subtle parent-child relationship issue: a hex at resolution N does not perfectly subdivide into 7 child hexes at resolution N+1 due to the geometry of hexagonal tiling. H3 handles this with "pentagon" cells at icosahedron vertices (12 total globally), but be aware of this when doing hierarchical aggregation.


4. R-tree (PostGIS / GIST)

R-trees are balanced search trees for spatial data. Unlike geohash and S2 (which map 2D to 1D), R-trees work natively in 2D by organizing objects into nested bounding boxes.

R-tree Structure

How R-tree Search Works

  1. Start at root. Check if query region overlaps root's bounding box (it always does).
  2. Descend to child nodes whose bounding boxes overlap the query region.
  3. Skip subtrees whose bounding boxes do not overlap (this is the pruning).
  4. At leaf nodes, check individual geometries against the query.

PostGIS GIST Index

PostGIS uses R-trees via the GIST (Generalized Search Tree) framework. This is the most powerful spatial index for complex queries.

sql
-- Create a table with geography type
CREATE TABLE businesses (
    id SERIAL PRIMARY KEY,
    name TEXT,
    location GEOGRAPHY(POINT, 4326),
    boundary GEOGRAPHY(POLYGON, 4326)
);

-- Create GIST spatial index
CREATE INDEX idx_businesses_location ON businesses USING GIST (location);
CREATE INDEX idx_businesses_boundary ON businesses USING GIST (boundary);

-- Proximity search: find businesses within 5 km
SELECT name, ST_Distance(location, ST_MakePoint(-122.4194, 37.7749)::geography) AS dist_m
FROM businesses
WHERE ST_DWithin(
    location,
    ST_MakePoint(-122.4194, 37.7749)::geography,
    5000  -- meters
)
ORDER BY dist_m
LIMIT 20;

-- Polygon containment: is this point inside a delivery zone?
SELECT name
FROM delivery_zones
WHERE ST_Contains(
    boundary::geometry,
    ST_MakePoint(-122.4194, 37.7749)::geometry
);

-- Intersection: which zones overlap with this new zone?
SELECT a.name
FROM delivery_zones a
WHERE ST_Intersects(
    a.boundary,
    ST_GeomFromGeoJSON('{
        "type": "Polygon",
        "coordinates": [[[-122.5,37.7],[-122.4,37.7],[-122.4,37.8],[-122.5,37.8],[-122.5,37.7]]]
    }')::geography
);

Geography vs Geometry in PostGIS

Featuregeometrygeography
CoordinatesProjected (flat, in meters/feet)WGS84 (lat/lng in degrees)
DistanceEuclidean (flat earth)Geodesic (curved earth)
AccuracyDepends on projection, often wrong for large areasAlways correct globally
SpeedFaster (simpler math)Slower (geodesic calculations)
FunctionsAll 300+ PostGIS functionsSubset (~40 functions)
Use whenSingle city, known projectionGlobal data, multi-region

TIP

Rule of thumb: use geography for anything spanning more than one city. Use geometry with a local projection (UTM zone) for high-performance queries within a single metro area.


5. Choosing the Right Spatial Index

Decision Matrix

Production Use Cases

CompanyIndexUse Case
UberH3Surge pricing zones, supply/demand analysis
GoogleS2Google Maps, BigQuery GIS, Earth
TinderGeohash"People nearby" with Redis
DoorDashPostGIS + H3Delivery zones (PostGIS), analytics (H3)
FoursquareS2Place search, venue recommendations
SnapS2 + H3Snap Map, location stories
MongoDBS22dsphere index
ElasticsearchGeohash + BKDGeo queries, geo-aggregations

Geofencing at Scale

Geofencing — "is this point inside this polygon?" — is one of the most performance-sensitive geospatial operations.

Architecture for Real-Time Geofencing

Optimization: Pre-filter with H3

For millions of geofences, checking every polygon is too slow. Pre-compute which H3 cells each geofence covers, then use cell lookup for O(1) pre-filtering:

python
import h3

def precompute_geofence_index(geofences, resolution=7):
    """Build an H3-based geofence lookup table."""
    cell_to_fences = {}

    for fence in geofences:
        # Convert polygon to H3 cells
        polygon = h3.LatLngPoly(fence.vertices)
        cells = h3.polygon_to_cells(polygon, res=resolution)

        for cell in cells:
            if cell not in cell_to_fences:
                cell_to_fences[cell] = []
            cell_to_fences[cell].append(fence.id)

    return cell_to_fences

def check_geofence(lat, lng, cell_index, geofences, resolution=7):
    """Fast geofence check using H3 pre-filter."""
    cell = h3.latlng_to_cell(lat, lng, res=resolution)

    # O(1) lookup — only check geofences that overlap this cell
    candidate_ids = cell_index.get(cell, [])

    # Exact point-in-polygon only for candidates
    results = []
    for fid in candidate_ids:
        fence = geofences[fid]
        if point_in_polygon(lat, lng, fence.vertices):
            results.append(fid)
    return results

Performance Benchmarks

Real-world performance for "find nearest K points within radius R" with 10 million points:

IndexQuery Latency (p50)Query Latency (p99)MemoryNotes
Redis GEOSEARCH0.3 ms1.2 ms~800 MBIn-memory, limited to points
PostGIS GIST2 ms12 ms~400 MB indexSupports all geometry types
Elasticsearch5 ms25 ms~1.5 GBAlso does full-text search
DynamoDB + Geohash8 ms35 msManagedRequires client-side distance filter
MongoDB 2dsphere3 ms15 ms~600 MB indexS2-based, good polygon support
H3 + PostgreSQL1 ms5 ms~300 MB indexPre-computed cells, integer lookups

DANGER

These benchmarks are illustrative. Real performance depends heavily on data distribution, query shape, hardware, and configuration. Always benchmark with your own data and access patterns before choosing.


Combining Spatial Indexes

Production systems often use multiple spatial indexes for different purposes:

LayerIndexWhy
Hot (real-time)Redis GEO or in-memory H3Sub-millisecond proximity, moving objects
Warm (persistent)PostGIS GISTComplex polygon queries, geofencing
AnalyticalH3 in ClickHouse/BigQueryAggregation, heatmaps, trend analysis

Key Takeaways

  1. Geohash is the simplest — use it when you need "nearby" queries in Redis, DynamoDB, or any key-value store
  2. S2 is the most versatile — multi-level covering makes it ideal for arbitrary polygon queries at global scale
  3. H3 is the best for analytics — hexagonal uniformity eliminates directional bias in aggregations
  4. R-tree (PostGIS) is the most powerful — use it when you need polygon intersection, containment, and the full GIS function library
  5. Combine indexes — real systems use Redis for real-time, PostGIS for persistence, and H3 for analytics
  6. Always post-filter — spatial indexes produce candidates; verify with exact distance or point-in-polygon
  7. Pre-compute when possible — geofence checking at scale requires H3/S2 pre-filtering, not brute-force polygon checks

"What I cannot create, I do not understand." — Richard Feynman