Sitemap generation for ASGI applications.

asgi-sitemaps

Generate and check sitemap files from ASGI apps without having to spin up a server. Powered by HTTPX and anyio.

Note: This is alpha software. Be sure to pin your dependencies to the latest minor release.

Quickstart

$ python -m asgi_sitemaps 'example:app' --base-url https://app.example.io > sitemap.xml
$ cat sitemap.xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <url><loc>https://app.example.io/</loc><changefreq>daily</changefreq></url>
</urlset>
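
The 'example:app' argument above names a module example that exposes an attribute app; that module is not shown here. As a minimal sketch, assuming a hand-written ASGI callable with no framework, an example.py could look like the following. The module contents, the HOME_PAGE constant and the 404 branch are illustrative assumptions, not part of asgi-sitemaps.

```python
# example.py: hypothetical minimal ASGI app used only for illustration.

HOME_PAGE = b"<!DOCTYPE html><html><body><h1>Hello</h1></body></html>"
NOT_FOUND_PAGE = b"<!DOCTYPE html><html><body>Not found</body></html>"


async def app(scope, receive, send):
    """Serve one HTML page at "/" and a 404 page for every other path."""
    if scope["type"] != "http":
        # This sketch only handles plain HTTP requests.
        return

    if scope["path"] == "/":
        status, body = 200, HOME_PAGE
    else:
        status, body = 404, NOT_FOUND_PAGE

    # An ASGI HTTP response is sent as a start event followed by a body event.
    await send(
        {
            "type": "http.response.start",
            "status": status,
            "headers": [(b"content-type", b"text/html; charset=utf-8")],
        }
    )
    await send({"type": "http.response.body", "body": body})
```

With an app like this, only the "/" page returns a 200 HTML response, which is consistent with the single <url> entry in the output shown above.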

Use the --check mode to verify that the sitemap is in sync (e.g. as part of CI checks):

$ cat sitemap.xml | python -m asgi_sitemaps --check 'example:app' --base-url https://app.example.io
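
Conceptually, a check of this kind compares the committed sitemap with a freshly computed one and fails when they differ. The sketch below illustrates that idea with the standard library only. The names sitemap_diff and check are hypothetical, and the line-based comparison is an assumption: the exact comparison rules used by --check are not described here.

```python
import difflib
from typing import List


def sitemap_diff(existing: str, computed: str) -> List[str]:
    """Return unified-diff lines between two sitemaps (empty when they match).

    Assumption: the comparison is line-based, so line-ending differences
    are ignored. The real tool may compare differently.
    """
    return list(
        difflib.unified_diff(
            existing.splitlines(),
            computed.splitlines(),
            fromfile="sitemap.xml (committed)",
            tofile="sitemap.xml (computed)",
            lineterm="",
        )
    )


def check(existing: str, computed: str) -> int:
    """Print any differences and return an exit code: 0 in sync, 1 otherwise."""
    diff = sitemap_diff(existing, computed)
    for line in diff:
        print(line)
    return 1 if diff else 0
```

In a CI job, a non-zero exit code from the real command is what fails the build; the diff output here is only a convenience added for this sketch.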

Features

  • Offline sitemap generation from an ASGI app callable.
  • Support for --check mode.
  • Invoke from the command line, or use the programmatic async API (supports asyncio and trio).
  • Fully type annotated.
  • 100% test coverage.

Installation

Install with pip:

$ pip install asgi-sitemaps

asgi-sitemaps requires Python 3.7+.

Command line reference

$ python -m asgi_sitemaps --help
usage: __main__.py [-h] [--base-url BASE_URL] [-I IGNORE_PATH_PREFIX]
                   [--max-concurrency MAX_CONCURRENCY] [--check]
                   app

positional arguments:
  app                   Path to an ASGI app, formatted as
                        '<module>:<attribute>'.

optional arguments:
  -h, --help            show this help message and exit
  --base-url BASE_URL   Base URL to use when building sitemap entries.
                        (default: http://testserver/)
  -I IGNORE_PATH_PREFIX, --ignore-path-prefix IGNORE_PATH_PREFIX
                        Prevent crawling URLs that start with this path
                        prefix. Can be used multiple times. (default: [])
  --max-concurrency MAX_CONCURRENCY
                        Maximum number of concurrently processed URLs.
                        (default: 100)
  --check               Read existing sitemap from stdin and fail if computed
                        sitemap differs. (default: False)

Programmatic API

The .crawl() async function takes the following arguments:

  • app: an ASGI application instance.
  • base_url: see --base-url.
  • ignore: see --ignore-path-prefix.
  • max_concurrency: see --max-concurrency.

It returns a list of strings referring to the discovered URLs.
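
To make the ignore argument concrete, the sketch below shows one plausible reading of "URLs that start with this path prefix": a plain string-prefix test on the URL path. The is_ignored helper is hypothetical and not part of asgi_sitemaps; the library's actual matching may differ in edge cases.

```python
from typing import Sequence
from urllib.parse import urlsplit


def is_ignored(url: str, ignore: Sequence[str]) -> bool:
    """Return True if the URL's path starts with any of the given prefixes."""
    path = urlsplit(url).path
    return any(path.startswith(prefix) for prefix in ignore)
```

Under this reading, a prefix of "/admin" excludes "/admin/users" but also "/administrators", so a trailing slash ("/admin/") is the safer choice when only a subtree should be skipped.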

You can use the .make_xml(urls) helper to generate the sitemap XML, then save it however suits your use case.

Example usage, outputting the sitemap to a sitemap.xml file (mimicking the CLI behavior):

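import asyncio

import asgi_sitemaps

from .app import app

async def main():
    # Crawl the app in-process and collect the discovered URLs.
    urls = await asgi_sitemaps.crawl(app)
    # Write the XML as UTF-8, matching the encoding in the XML declaration.
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(asgi_sitemaps.make_xml(urls))

if __name__ == "__main__":
    asyncio.run(main())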

By default, .make_xml() generates <url> tags with a daily change frequency. You can customize the generation of URL tags by passing a custom urltag callable:

from urllib.parse import urlsplit

import asgi_sitemaps

def urltag(url):
    path = urlsplit(url).path
    changefreq = "monthly" if path.startswith("/reports") else "daily"
    return f"<url><loc>{url}</loc><changefreq>{changefreq}</changefreq></url>"

urls = ...
xml = asgi_sitemaps.make_xml(urls, urltag=urltag)
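
A custom urltag is responsible for the markup it returns, so URLs containing characters such as & need to be XML-escaped inside <loc>. The variant below is a sketch that adds escaping with the standard library. It assumes the URLs passed to urltag are raw (not already escaped); if they were already escaped, this would escape them twice.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def urltag(url: str) -> str:
    """Build a <url> element, XML-escaping the location."""
    path = urlsplit(url).path
    changefreq = "monthly" if path.startswith("/reports") else "daily"
    # escape() replaces &, < and > with their XML entities.
    return f"<url><loc>{escape(url)}</loc><changefreq>{changefreq}</changefreq></url>"
```

It is passed to make_xml() in the same way as the urltag above.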

License

MIT

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

0.2.0-rc.1 - 2020-06-01

Changed

  • Project was renamed from sitemaps to asgi-sitemaps - sitemap generation for ASGI apps. (Pull #2)
  • Change options of CLI and programmatic API to fit new "ASGI-only" project scope. (Pull #2)
  • CLI now reads from stdin (for --check mode) and outputs sitemap to stdout. (Pull #2)

Removed

  • Drop support for crawling arbitrary remote servers. (Pull #2)

Fixed

  • Don't include non-200 or non-HTML URLs in sitemap. (Pull #2)

0.1.0 - 2020-05-31

Added

  • Initial implementation: CLI and programmatic async API.
