Skip to main content

Geocode rows from a SQLite table

Project description

geocode-sqlite

PyPI Changelog Tests License

Geocode rows from a SQLite table

Installation

Install this tool using pip or pipx:

# install inside a virtualenv
pip install geocode-sqlite

# install globally
pipx install geocode-sqlite

Usage

Let's say you have a spreadsheet with addresses in it, and you'd like to map those locations. First, create a SQLite database and insert rows from that spreadsheet using sqlite-utils.

sqlite-utils insert data.db data data.csv --csv

Now, geocode it using OpenStreetMap's Nominatim geocoder.

geocode-sqlite nominatim data.db data \
 --location="{address}, {city}, {state} {zip}" \
 --delay=1 \
 --user-agent="this-is-me"

In the command above, you're using Nominatim, which is free and only asks for a unique user agent (--user-agent).

This will connect to a database (data.db) and read all rows from the table data (skipping any that already have both a latitude and longitude column filled).

You're also telling the geocoder how to extract a location query (--location) from a row of data, using Python's built-in string formatting, and setting a rate limit (--delay) of one request per second.

For each row where geocoding succeeds, latitude and longitude will be populated. If you hit an error, or a rate limit, run the same query and pick up where you left off.

The resulting table layout can be visualized with datasette-cluster-map.

Under the hood, this package uses the excellent geopy library, which is stable and thoroughly road-tested. If you need help understanding a particular geocoder's options, consult geopy's documentation.

Supported Geocoders

The CLI currently supports these geocoders:

  • bing
  • googlev3
  • mapquest (and open-mapquest)
  • mapbox
  • nominatim
  • opencage

Adding new geocoders

  1. Open an issue with the name of the geocoding service as the ticket title (example). Put any noteworthy implementation details in the ticket body, like where to get an API key if one is required.
  2. Fork the repo and add a geocoder.
  3. Add an example to the Makefile. Add tests if there's new shared functionality.

Common arguments and options

Each geocoder needs to know where to find the data it's working with. These are the first two arguments:

  • database: a path to a SQLite file, which must already exist
  • table: the name of a table, in that database, which exists and has data to geocode

From there, we have a set of options passed to every geocoder:

  • location: a string format that will be expanded with each row to build a full query, to be geocoded
  • delay: a delay between each call (some services require this)
  • latitude: latitude column name
  • longitude: longitude column name
  • geojson: store results as GeoJSON, instead of in latitude and longitude columns
  • spatialite: store results in a SpatiaLite geometry column, instead of in latitude and longitude columns
  • raw: store raw geocoding results in a JSON column

Each geocoder takes additional, specific arguments beyond these, such as API keys. Again, geopy's documentation is an excellent resource.

Using SpatiaLite

The --spatialite flag will store results in a geometry column, instead of latitude and longitude columns. This is useful if you're doing other GIS operations, such as using a spatial index. See the SpatiaLite cookbook and functions list for more of what's possible.

Capturing additional geocoding data

Geocoding services typically return more data than just coordinates. This might include accuracy, normalized addresses or other context. This can be captured using the --raw flag. By default, this will add a raw column and store the full geocoding response as JSON. If you want to rename that column, pass a value, like --raw custom_raw.

The shape of this response object will vary between services. You can query specific values using SQLite's built-in JSON functions. For example, this will work with Google's geocoder:

select
  json_extract(raw, '$.formatted_address') as address,
  json_extract(raw, '$.geometry.location_type') as location_type
from
  innout_test

Check each geocoding service's documentation for what's included in the response.

Python API

The command line interface aims to support the most common options for each geocoder. For more fine-grained control, use the Python API.

As with the CLI, this assumes you already have a SQLite database and a table of location data.

from geocode_sqlite import geocode_table
from geopy.geocoders import Nominatim

# create a geocoder instance, with some extra options
nominatim = Nominatim(user_agent="this-is-me", domain="nominatim.local.dev", scheme="http")

# assuming our database is in the same directory
count = geocode_table("data.db", "data", query_template="{address}, {city}, {state} {zip}")

# when it's done
print(f"Geocoded {count} rows")

Any geopy geocoder can be used with the Python API.

Development

To contribute to this tool, first checkout the code. Then create a new virtual environment:

cd geocode-sqlite
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and tests:

pip install -e '.[test]'

To run the tests:

pytest

Please remember that this library is mainly glue code between other well-tested projects, specifically: click, geopy and sqlite-utils. Tests should focus on making sure those parts fit together correctly. We can assume the parts themselves already work.

To that end, there is a test geocoder included: geocode_sqlite.testing.DummyGeocoder. That geocoder works with an included dataset of In-N-Out Burger locations provided by AllThePlaces. It works like a normal GeoPy geocoder, except it will only return results for In-N-Out locations using the included database.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geocode-sqlite-0.8.2.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution

geocode_sqlite-0.8.2-py3-none-any.whl (14.7 kB view details)

Uploaded Python 3

File details

Details for the file geocode-sqlite-0.8.2.tar.gz.

File metadata

  • Download URL: geocode-sqlite-0.8.2.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.14

File hashes

Hashes for geocode-sqlite-0.8.2.tar.gz
Algorithm Hash digest
SHA256 11793ae1663bf1c892a4947395f094280e9ae89a6360ee0266ef81dc0d7464a6
MD5 e3bc113dc33ed6b43bf84fdf7768935d
BLAKE2b-256 f7699a68b0a68715b2a8f3e86e9cfb151ad28cdff0c1e5a53776a91242c42fbd

See more details on using hashes here.

File details

Details for the file geocode_sqlite-0.8.2-py3-none-any.whl.

File metadata

File hashes

Hashes for geocode_sqlite-0.8.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e125df6a027d95a0773f2c5b444d9368cf2adf090591251f177c6017758755f1
MD5 20b8d8a37388b907e9477856bf3291dd
BLAKE2b-256 4704755fa05b3bc5070e9c968e317c2e13d333e59442fd795b4a0a85b2c5d6d4

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page