Sitemap generation for ASGI applications.
asgi-sitemaps
Generate and check sitemap files from ASGI apps without having to spin up a server. Powered by HTTPX and anyio.
Note: This is alpha software. Be sure to pin your dependencies to the latest minor release.
Quickstart
$ python -m asgi_sitemaps 'example:app' --base-url https://app.example.io > sitemap.xml
$ cat sitemap.xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url><loc>https://app.example.io/</loc><changefreq>daily</changefreq></url>
</urlset>
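For readers who want something to point the command at, here is a minimal sketch of what a module behind 'example:app' could look like. The module contents, the PAGES table and the fetch_status helper are assumptions made for illustration; they are not part of asgi-sitemaps. The helper calls the app in-process, which is the general idea behind working with an ASGI app without a running server.

```python
# example.py: hypothetical module matching the 'example:app' argument.
import asyncio

# Two tiny HTML pages; the home page links to /about.
PAGES = {
    "/": b'<html><body><a href="/about">About</a></body></html>',
    "/about": b"<html><body><h1>About</h1></body></html>",
}


async def app(scope, receive, send):
    """Bare ASGI 3 callable serving the pages above."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    body = PAGES.get(scope["path"])
    status = 200 if body is not None else 404
    await send(
        {
            "type": "http.response.start",
            "status": status,
            "headers": [(b"content-type", b"text/html; charset=utf-8")],
        }
    )
    await send({"type": "http.response.body", "body": body or b"Not found"})


async def fetch_status(path):
    """Call the app in-process (no server) and return the HTTP status."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return messages[0]["status"]


if __name__ == "__main__":
    print(asyncio.run(fetch_status("/")))
```

Any ASGI framework application (Starlette, FastAPI, and so on) exposes the same callable interface, so in practice the app argument would usually point at a framework app rather than a hand-written callable like this one.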
Use the --check mode to verify that the sitemap is in sync (e.g. as part of CI checks):
cat sitemap.xml | python -m asgi_sitemaps --check 'example:app' --base-url https://app.example.io
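The idea behind such a check can be sketched in a few lines: compare the committed sitemap with a freshly computed one and report any difference. The sitemap_diff function below is a hypothetical helper, and the comparison rule (plain text, ignoring surrounding whitespace) is an assumption; the project does not document how --check compares the two documents.

```python
import difflib


def sitemap_diff(existing: str, computed: str) -> list:
    """Return unified-diff lines; an empty list means the sitemap is in sync."""
    return list(
        difflib.unified_diff(
            existing.strip().splitlines(),
            computed.strip().splitlines(),
            fromfile="sitemap.xml (existing)",
            tofile="sitemap.xml (computed)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    old = "<urlset>\n<url><loc>https://app.example.io/</loc></url>\n</urlset>\n"
    new = old.replace("</urlset>", "<url><loc>https://app.example.io/about</loc></url>\n</urlset>")
    for line in sitemap_diff(old, new):
        print(line)
```

A CI job could fail the build whenever the returned list is non-empty, which mirrors the documented behavior of failing when the computed sitemap differs.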
Features
- Offline sitemap generation from an ASGI app callable.
- Support for --check mode.
- Invoke from the command line, or use the programmatic async API (supports asyncio and trio).
- Fully type annotated.
- 100% test coverage.
Installation
Install with pip:
$ pip install asgi-sitemaps
asgi-sitemaps requires Python 3.7+.
Command line reference
$ python -m asgi_sitemaps --help
usage: __main__.py [-h] [--base-url BASE_URL] [-I IGNORE_PATH_PREFIX]
                   [--max-concurrency MAX_CONCURRENCY] [--check]
                   app

positional arguments:
  app                   Path to an ASGI app, formatted as
                        '<module>:<attribute>'.

optional arguments:
  -h, --help            show this help message and exit
  --base-url BASE_URL   Base URL to use when building sitemap entries.
                        (default: http://testserver/)
  -I IGNORE_PATH_PREFIX, --ignore-path-prefix IGNORE_PATH_PREFIX
                        Prevent crawling URLs that start with this path
                        prefix. Can be used multiple times. (default: [])
  --max-concurrency MAX_CONCURRENCY
                        Maximum number of concurrently processed URLs.
                        (default: 100)
  --check               Read existing sitemap from stdin and fail if computed
                        sitemap differs. (default: False)
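When the CLI is driven from a build script, it can help to assemble the argument list in one place. The sketch below uses only the flags listed in the help output above; build_command and its parameter names are hypothetical and not part of asgi-sitemaps.

```python
import sys


def build_command(
    app_path,
    base_url=None,
    ignore_path_prefixes=(),
    max_concurrency=None,
    check=False,
):
    """Build an argv list for `python -m asgi_sitemaps` from keyword options."""
    command = [sys.executable, "-m", "asgi_sitemaps", app_path]
    if base_url is not None:
        command += ["--base-url", base_url]
    for prefix in ignore_path_prefixes:
        # The flag can be repeated, once per prefix.
        command += ["--ignore-path-prefix", prefix]
    if max_concurrency is not None:
        command += ["--max-concurrency", str(max_concurrency)]
    if check:
        command.append("--check")
    return command


if __name__ == "__main__":
    print(build_command("example:app", base_url="https://app.example.io", check=True))
```

The resulting list can be passed to subprocess.run, with stdin set to the existing sitemap file when check is enabled and stdout redirected to a file otherwise.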
Programmatic API
The .crawl() async function takes the following arguments:
- app: an ASGI application instance.
- base_url: see --base-url.
- ignore: see --ignore-path-prefix.
- max_concurrency: see --max-concurrency.
It returns a list of strings referring to the discovered URLs.
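To make the ignore option more concrete, the following sketch shows one way path-prefix filtering could work, based only on the CLI description ("URLs that start with this path prefix"). The is_ignored function is hypothetical and the library's actual matching rules may differ.

```python
from urllib.parse import urlsplit


def is_ignored(url: str, ignore: list) -> bool:
    """Return True if the URL's path starts with any of the given prefixes."""
    path = urlsplit(url).path or "/"
    return any(path.startswith(prefix) for prefix in ignore)


if __name__ == "__main__":
    candidates = [
        "https://app.example.io/",
        "https://app.example.io/admin/users",
        "https://app.example.io/about",
    ]
    # Keep only the URLs that would still be crawled under this assumption.
    print([url for url in candidates if not is_ignored(url, ["/admin"])])
```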
You can use the .make_xml(urls) helper to generate the sitemap XML and save it as appropriate for your use case.
Example usage, outputting the sitemap to a sitemap.xml file (mimicking the CLI behavior):
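import asgi_sitemaps

from .app import app

async def main():
    # Crawl the app in-process and collect the discovered URLs.
    urls = await asgi_sitemaps.crawl(app)

    # Render the sitemap XML and write it to disk.
    with open("sitemap.xml", "w") as f:
        f.write(asgi_sitemaps.make_xml(urls))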
By default, .make_xml() generates <url> tags with a daily change frequency. You can customize the generation of URL tags by passing a custom urltag callable:
from urllib.parse import urlsplit
import asgi_sitemaps

def urltag(url):
    # Reports change rarely; everything else keeps the daily default.
    path = urlsplit(url).path
    changefreq = "monthly" if path.startswith("/reports") else "daily"
    return f"<url><loc>{url}</loc><changefreq>{changefreq}</changefreq></url>"

urls = ...
xml = asgi_sitemaps.make_xml(urls, urltag=urltag)
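If a URL can contain characters that are special in XML (for example an ampersand in a query string), a custom urltag may want to escape it before interpolation. The variant below is a sketch using the standard library's xml.sax.saxutils.escape. Whether crawled URLs can include such characters is not stated here, so treat this as a defensive assumption.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def urltag(url: str) -> str:
    """Build a <url> tag, escaping &, < and > in the location."""
    path = urlsplit(url).path
    changefreq = "monthly" if path.startswith("/reports") else "daily"
    return f"<url><loc>{escape(url)}</loc><changefreq>{changefreq}</changefreq></url>"


if __name__ == "__main__":
    print(urltag("https://app.example.io/search?q=a&page=2"))
```

This function has the same shape as the urltag callable shown above, so it can be passed to make_xml in the same way.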
License
MIT
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
0.2.0-rc.1 - 2020-06-01
Changed
- Project was renamed from sitemaps to asgi-sitemaps: sitemap generation for ASGI apps. (Pull #2)
- Change options of CLI and programmatic API to fit new "ASGI-only" project scope. (Pull #2)
- CLI now reads from stdin (for --check mode) and outputs sitemap to stdout. (Pull #2)
Removed
- Drop support for crawling arbitrary remote servers. (Pull #2)
Fixed
- Don't include non-200 or non-HTML URLs in sitemap. (Pull #2)
0.1.0 - 2020-05-31
Added
- Initial implementation: CLI and programmatic async API.