Sitemap generation for ASGI applications.

asgi-sitemaps

Generate and check sitemap files from ASGI apps without having to spin up a server. Powered by HTTPX and anyio.

Note: This is alpha software. Be sure to pin your dependencies to the latest minor release.

Quickstart

$ python -m asgi_sitemaps 'example:app' --base-url https://app.example.io > sitemap.xml
$ cat sitemap.xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <url><loc>https://app.example.io/</loc><changefreq>daily</changefreq></url>
</urlset>
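
For reference, here is a minimal sketch of what the `example:app` module used above might look like. The file name `example.py`, the `HOME` constant, and the page content are assumptions made for illustration; any ASGI application (Starlette, FastAPI, etc.) should work the same way.

```python
# example.py (hypothetical module matching the 'example:app' argument)

# Illustrative page content; not taken from the project.
HOME = b"<html><body><h1>Home</h1></body></html>"


async def app(scope, receive, send):
    """Minimal raw ASGI app: one HTML page at "/", 404 for anything else."""
    if scope["type"] != "http":
        return  # this sketch only handles HTTP requests

    if scope["path"] == "/":
        status, content_type, body = 200, b"text/html; charset=utf-8", HOME
    else:
        status, content_type, body = 404, b"text/plain; charset=utf-8", b"Not found"

    await send(
        {
            "type": "http.response.start",
            "status": status,
            "headers": [(b"content-type", content_type)],
        }
    )
    await send({"type": "http.response.body", "body": body})
```

Since this app serves a single HTML page at `/`, the resulting sitemap would be expected to contain one entry, as in the output above.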

Use the --check mode to verify that the sitemap is in sync (e.g. as part of CI checks):

$ cat sitemap.xml | python -m asgi_sitemaps --check 'example:app' --base-url https://app.example.io
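
Conceptually, a check like this compares the sitemap read from stdin with a freshly computed one and fails if they differ. The sketch below illustrates that idea using only the standard library. The function names `sitemap_diff` and `check`, the diff output, and the exit codes are assumptions for illustration, not the package's actual implementation.

```python
import difflib
import sys
from typing import List


def sitemap_diff(existing: str, computed: str) -> List[str]:
    """Return unified-diff lines; an empty list means the sitemaps match."""
    return list(
        difflib.unified_diff(
            existing.splitlines(),
            computed.splitlines(),
            fromfile="sitemap.xml (stdin)",
            tofile="sitemap.xml (computed)",
            lineterm="",
        )
    )


def check(existing: str, computed: str) -> int:
    """Return an exit code: 0 when in sync, 1 otherwise (assumed convention)."""
    diff = sitemap_diff(existing, computed)
    for line in diff:
        # Print the differences so a CI log shows what is out of date.
        print(line, file=sys.stderr)
    return 1 if diff else 0
```

In a CI script, the return value could be passed to `sys.exit()` so that an out-of-date sitemap fails the build.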

Features

  • Offline sitemap generation from an ASGI app callable.
  • Support for --check mode.
  • Invoke from the command line, or use the programmatic async API (supports asyncio and trio).
  • Fully type annotated.
  • 100% test coverage.

Installation

Install with pip:

$ pip install asgi-sitemaps

asgi-sitemaps requires Python 3.7+.

Command line reference

$ python -m asgi_sitemaps --help
usage: __main__.py [-h] [--base-url BASE_URL] [-I IGNORE_PATH_PREFIX]
                   [--max-concurrency MAX_CONCURRENCY] [--check]
                   app

positional arguments:
  app                   Path to an ASGI app, formatted as
                        '<module>:<attribute>'.

optional arguments:
  -h, --help            show this help message and exit
  --base-url BASE_URL   Base URL to use when building sitemap entries.
                        (default: http://testserver/)
  -I IGNORE_PATH_PREFIX, --ignore-path-prefix IGNORE_PATH_PREFIX
                        Prevent crawling URLs that start with this path
                        prefix. Can be used multiple times. (default: [])
  --max-concurrency MAX_CONCURRENCY
                        Maximum number of concurrently processed URLs.
                        (default: 100)
  --check               Read existing sitemap from stdin and fail if computed
                        sitemap differs. (default: False)
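
To make the `--ignore-path-prefix` description more concrete, here is a sketch of the prefix matching it suggests. The helper name `is_ignored` is hypothetical, and the exact matching rule (a plain `startswith` on the URL path) is an assumption based on the help text above rather than on the package's source.

```python
from typing import Sequence
from urllib.parse import urlsplit


def is_ignored(url: str, ignore_prefixes: Sequence[str]) -> bool:
    """Return True if the URL's path starts with any of the given prefixes."""
    path = urlsplit(url).path
    return any(path.startswith(prefix) for prefix in ignore_prefixes)


# Roughly corresponds to passing: -I /admin -I /api
prefixes = ["/admin", "/api"]
print(is_ignored("http://testserver/admin/users", prefixes))
print(is_ignored("http://testserver/blog/hello", prefixes))
```

Under this assumed rule, a prefix such as `/admin` would also match `/administrator`; a trailing slash (`/admin/`) narrows the match.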

Programmatic API

The .crawl() async function takes the following arguments:

  • app: an ASGI application instance.
  • base_url: see --base-url.
  • ignore: see --ignore-path-prefix.
  • max_concurrency: see --max-concurrency.

It returns a list of strings referring to the discovered URLs.
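
As an illustration of what a `max_concurrency` limit means in practice, the sketch below caps the number of coroutines running at once with a semaphore. It uses plain `asyncio` rather than anyio, and the helper name `gather_bounded` is hypothetical; it shows the general technique, not how `.crawl()` is implemented.

```python
import asyncio
from typing import Any, Awaitable, Iterable, List


async def gather_bounded(
    awaitables: Iterable[Awaitable[Any]], max_concurrency: int = 100
) -> List[Any]:
    """Await all items, with at most `max_concurrency` running at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(awaitable: Awaitable[Any]) -> Any:
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await awaitable

    # Results come back in the same order as the input.
    return await asyncio.gather(*(run(a) for a in awaitables))
```

A lower limit trades crawl speed for a lighter load on the app and its backing services.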

You can use the .make_xml(urls) helper to generate the sitemap XML, then save it as appropriate for your use case.

Example usage, outputting the sitemap to a sitemap.xml file (mimicking the CLI behavior):

import asgi_sitemaps

from .app import app

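async def main():
    # Crawl the app in-process; returns the list of discovered URLs.
    urls = await asgi_sitemaps.crawl(app)
    # The generated XML declares UTF-8, so write the file with that encoding.
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(asgi_sitemaps.make_xml(urls))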

By default, .make_xml() generates <url> tags with a daily change frequency. You can customize the generation of URL tags by passing a custom urltag callable:

from urllib.parse import urlsplit

import asgi_sitemaps

def urltag(url):
    path = urlsplit(url).path
    changefreq = "monthly" if path.startswith("/reports") else "daily"
    return f"<url><loc>{url}</loc><changefreq>{changefreq}</changefreq></url>"

urls = ...
xml = asgi_sitemaps.make_xml(urls, urltag=urltag)
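
A slightly more defensive variant of such a callable is sketched below. It escapes XML special characters in the URL (for example `&` in a query string) and adds a `<priority>` element. The priority values are arbitrary choices for illustration, and whether `.make_xml()` escapes URLs on its own is not stated here, so treat the escaping as an assumption about what a custom `urltag` may need to handle.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def urltag(url: str) -> str:
    """Build a <url> entry with an escaped location and a per-path priority."""
    path = urlsplit(url).path
    changefreq = "monthly" if path.startswith("/reports") else "daily"
    # Illustrative policy: the home page ranks highest, everything else is neutral.
    priority = "1.0" if path == "/" else "0.5"
    return (
        f"<url><loc>{escape(url)}</loc>"
        f"<changefreq>{changefreq}</changefreq>"
        f"<priority>{priority}</priority></url>"
    )


print(urltag("https://app.example.io/search?q=a&b=1"))
# <url><loc>https://app.example.io/search?q=a&amp;b=1</loc><changefreq>daily</changefreq><priority>0.5</priority></url>
```

It can be passed to `.make_xml()` in the same way as the callable above, via the `urltag` argument.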

License

MIT

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

0.2.0 - 2020-06-01

Changed

  • Project was renamed from sitemaps to asgi-sitemaps - sitemap generation for ASGI apps. (Pull #2)
  • Change options of CLI and programmatic API to fit new "ASGI-only" project scope. (Pull #2)
  • CLI now reads from stdin (for --check mode) and outputs sitemap to stdout. (Pull #2)

Removed

  • Drop support for crawling arbitrary remote servers. (Pull #2)

Fixed

  • Don't include non-200 or non-HTML URLs in sitemap. (Pull #2)

0.1.0 - 2020-05-31

Added

  • Initial implementation: CLI and programmatic async API.

