Basic geospatial queries in postgres

Table of contents

The need for geospatial queries is not necessarily very frequent in the broader field of software development, leaving developers confused as to what systems to choose when implementing features requiring them. The most common need, finding something close to or in a limited radius around a known point, is fairly straight-forward to implement in pure postgres without the need for complex extensions like PostGIS.

Getting started

While it is possible to do basic distance searching in postgres, be warned that this feature is fairly limited and the PostGIS extension contains many more features like shapes (e.g. to store areas of zipcodes or city boundaries), queries to check if something is contained in a bounding box, intersects or touches it, and more efficient storage and indexing of coordinates and shape data.

However, if all you need is to find something close to a known point, or within a specific distance of it, all those features are overkill. Instead, you can rely on the earthdistance extension:

CREATE EXTENSION IF NOT EXISTS cube;
CREATE EXTENSION IF NOT EXISTS earthdistance;

Both extensions ship by default with most installations, as well as the official postgres docker image. If they are missing from your installation, you are likely missing the postgres-contrib package:

sudo apt install postgres-contrib

Re-run the SQL commands above after the installation finishes and your database is ready to run simple geospatial queries.

Geocoding

Before working with GPS coordinates, you need to have coordinates in the first place. Humans tend to use textual representation of locations, using street names and numbers, city names and zipcodes. Turning this text into lat/long coordinates is called geocoding, and often requires a third-party service or dedicated database to look up.


For simple needs, you can use OpenStreetMap and their publicly hosted geocoding API for free, but note their usage limitations. If you need more requests, either use a hosted third-party solution like the paid google maps geocoding service, or host your own Nominatim instance.

Setting up a sample table

We need some data to query, so let's set up a sample table:

CREATE TABLE locations (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  coordinates point NOT NULL
);

Next, insert the locations of some capitol cities around the world:

INSERT INTO locations (name, coordinates) VALUES
('Washington, D.C., United States', '(-77.036871, 38.907192)'),
('London, United Kingdom', '(-0.127758, 51.507351)'),
('Paris, France', '(2.352222, 48.856613)'),
('Berlin, Germany', '(13.405037, 52.520008)'),
('Rome, Italy', '(12.496366, 41.902783)'),
('Madrid, Spain', '(-3.703790, 40.416775)'),
('Canberra, Australia', '(149.128067, -35.280937)'),
('Ottawa, Canada', '(-75.697193, 45.421530)'),
('Tokyo, Japan', '(139.691711, 35.689487)'),
('Beijing, China', '(116.407396, 39.904200)'),
('Moscow, Russia', '(37.617634, 55.755825)'),
('Brasília, Brazil', '(-47.929222, -15.780148)'),
('New Delhi, India', '(77.209021, 28.613939)'),
('Buenos Aires, Argentina', '(-58.381592, -34.603722)'),
('Mexico City, Mexico', '(-99.133208, 19.432608)'),
('Cairo, Egypt', '(31.235725, 30.244219)'),
('Seoul, South Korea', '(126.978413, 37.566535)'),
('Athens, Greece', '(23.810301, 37.927214)'),
('Stockholm, Sweden', '(18.071930, 59.329323)'),
('Brussels, Belgium', '(4.351810, 50.846733)'),
('Lisbon, Portugal', '(-9.139337, 38.722252)'),
('Warsaw, Poland', '(21.012229, 52.229676)'),
('Vienna, Austria', '(16.373819, 48.208174)');

This is dummy data, not actual city coordinates.

Note the data type POINT used for the coordinates. This type was introduced through the earthdistance extension and enables more efficient storage of GPS location data.

Finding rows by distance to a location

The most basic geospatial query is to find locations close to another location. This example uses the coordinates of Paris, France to find the 3 closest cities from our sample table:

SELECT
   id,
   name,
   earth_distance(
       ll_to_earth(2.352222, 48.856613),
       ll_to_earth(coordinates[0], coordinates[1])
   ) AS distance
FROM locations
ORDER BY distance
LIMIT 3;

The earth_distance function computes the distance between two points on the earth in meters. The result will be:

 id |         name           |    distance     
----+------------------------+-------------------
 3  | Paris, France          |                0
 20 | Brussels, Belgium      | 313776.0683129836
 2  | London, United Kingdom | 404031.4043157682

Calculating the distance between two points in a 2D space causes errors when applied to a 3D sphere like the earth, resulting in increasingly inaccurate results the longer the distance. The earthdistance extension accounts for this and calculates distances correctly on a 3D sphere, but may be slightly slower than other 2D alternatives.

The ll_to_earth functions turn lat/lon coordinates into a point on a sphere (earth), in preparation for distance calculations.

Optimizing query performance

Before any other optimization steps, always make sure you have properly indexed the column used in the distance calculations:

CREATE INDEX ON locations USING GIST(coordinates);

As you may have guessed, calculating the distance for each row on every query can become expensive really fast, depending on the number of rows in the table and the request frequency. Since this problem is very common, earthdistance brings a helper function to take some of the load off the server for such queries: earth_box. The earth_box function takes a location and a distance, then draws a bounding box rectangle around that point. This bounding box can be used in a WHERE condition to discard locations outside of it, without calculating the exact distance from it to the point:

SELECT
  name,
  earth_distance(
      ll_to_earth(2.352222, 48.856613),
      ll_to_earth(coordinates[0], coordinates[1])
  ) AS distance
FROM locations
WHERE
   earth_box(ll_to_earth(2.352222, 48.856613), 500000) @>
   ll_to_earth(coordinates[0],coordinates[1])
ORDER BY distance;

The filter condition in the WHERE clause runs before the distance calculation for each row, and can discard rows outside the box when reading row data.

Coordinates outside the bounding box are eliminated through simple mathematical comparisons instead of expensive calculations, by simply checking if their lat/lon coordinates are both located within the rectangle.


A slight caveat with the bounding box approach is that it is not exactly precise: while using a rectangle makes the filter condition extremely cheap when querying data, a rectangle does not accurately represent a radius. More specifically, the rectangle distance to the base point is at least the given distance, but may be slightly more from the base point to into the corners of the rectangle:

To account for this inaccuracy, use a combination of a bounding box together with a distance check if accurate results are important for your query:

SELECT
  name,
  earth_distance(
      ll_to_earth(2.352222, 48.856613),
      ll_to_earth(coordinates[0], coordinates[1])
  ) AS distance
FROM locations
WHERE
   earth_box(ll_to_earth(2.352222, 48.856613), 500000) @>
   ll_to_earth(coordinates[0],coordinates[1])
AND
   earth_distance(
      ll_to_earth(2.352222, 48.856613),
      ll_to_earth(coordinates[0], coordinates[1])
  ) <= 500000
ORDER BY distance;

The query first discards all cities outside the bounding box, then calculates the distances for all remaining rows, removes rows that exceed the radius and finally orders them by distance. Be aware that this query is a fair bit more expensive than just using the bounding box alone (it computes the distance for results within the bounding box twice), so only use it if you really need precise results and cannot have any results slightly outside the search radius within the query results.

More articles

Understanding computer storage units and transfer speeds

Making sense of Gb, GB, GiB, Gbps and GBps

A Practical Guide to Linux Disk Encryption

Keeping disk contents safe from prying eyes

Editing files with nano

Terminal file-editing made easy

Why boring software is a smart choice

Not everything is about excitement

Common pitfalls running docker in production

Avoiding the mistakes many make when embracing containerized deployments

Modern linux networking basics

Getting started with systemd-networkd, NetworkManager and the iproute2 suite