Fetch a given sitemap and retrieve all URLs in it.

fetch-sitemap

Retrieves all URLs listed in a given sitemap.xml and fetches each of those pages. Useful for (load) testing an entire site for error responses.
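To illustrate the first step, here is a minimal sketch of pulling the page URLs out of a sitemap document with the Python standard library. It is not taken from fetch-sitemap itself; the function name `extract_urls` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document.

    Hypothetical helper for illustration, not fetch-sitemap's own code.
    """
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample sitemap with two entries.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(SAMPLE):
        print(url)
```

Each extracted URL would then be requested and its response status checked.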

Note: The default concurrency limit is 5, so up to five URLs are fetched at once. Depending on your server's worker count, this may already be enough to overload it (an unintended DoS). Start with --concurrency-limit=2 and raise the value once you are confident the server can handle the load.
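For readers unfamiliar with the idea, a concurrency limit can be sketched with an `asyncio.Semaphore`. This is only one common way to cap in-flight requests and is not necessarily how fetch-sitemap implements it; `fetch_all`, `fake_fetch`, and the example URLs are hypothetical names used for this sketch.

```python
import asyncio


async def fetch_all(urls, fetch, concurrency_limit=5):
    """Run fetch(url) for every URL, with at most `concurrency_limit` in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def bounded(url):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(url) for url in urls))


async def demo():
    """Fetch 20 fake URLs with a limit of 2 and record the peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_fetch(url):
        # Stand-in for a real HTTP request; tracks how many run at once.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return 200

    urls = [f"https://example.com/page-{i}" for i in range(20)]
    statuses = await fetch_all(urls, fake_fetch, concurrency_limit=2)
    return statuses, peak


if __name__ == "__main__":
    statuses, peak = asyncio.run(demo())
    print(f"{len(statuses)} responses, peak concurrency {peak}")
```

With a limit of 2, no more than two requests reach the server at any moment, which is why a lower value is the safer starting point.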

Usage: fetch-sitemap [-h] [--basic-auth BASIC_AUTH] [-l LIMIT] [-c CONCURRENCY_LIMIT]
                     [-t REQUEST_TIMEOUT] [--random] [--report-path REPORT_PATH]
                     [-o OUTPUT] [-v]
                     sitemap_url

Fetch a given sitemap and retrieve all URLs in it.

Positional Arguments:
  sitemap_url           URL of the sitemap to fetch

Options:
  -h, --help            show this help message and exit
  --basic-auth BASIC_AUTH
                        Basic auth information. Use: 'username:password' (default: None)
  -l, --limit LIMIT     Maximum number of URLs to fetch from the given sitemap.xml
                        (default: None)
  -c, --concurrency-limit CONCURRENCY_LIMIT
                        Max number of concurrent requests (default: 5)
  -t, --request-timeout REQUEST_TIMEOUT
                        Timeout for fetching a URL in seconds (default: 30)
  --random              Append a random string like ?12334232343 to each URL to bypass
                        frontend cache (default: False)
  --report-path REPORT_PATH
                        Store results in a CSV file (example: ./report.csv) (default:
                        None)
  -o, --output-dir OUTPUT
                        Store all fetched sitemap documents in this folder (default: None)
  -v, --version         Show program's version number and exit
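
The --random and --basic-auth options can be illustrated with a short sketch. The helpers below are assumptions for this example, not fetch-sitemap's actual code: `add_cache_buster` appends a random numeric query string, and `basic_auth_header` turns a 'username:password' value into a standard HTTP Basic `Authorization` header. How the tool treats URLs that already carry a query string is not documented here, so joining with `&` is a guess.

```python
import base64
import random
from urllib.parse import urlsplit, urlunsplit


def add_cache_buster(url: str) -> str:
    """Append a random 11-digit number as a query string, e.g. ?12334232343."""
    parts = urlsplit(url)
    token = str(random.randint(10**10, 10**11 - 1))
    # Assumption: an existing query string is kept and the token is appended.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


def basic_auth_header(credentials: str) -> dict[str, str]:
    """Build an HTTP Basic auth header from a 'username:password' string."""
    # Split on the first colon only, so passwords may contain colons.
    username, _, password = credentials.partition(":")
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    print(add_cache_buster("https://example.com/about"))
    print(basic_auth_header("user:pass"))
```

Because the query string differs on every request, a frontend cache keyed on the full URL treats each request as a miss and forwards it to the origin server.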

🤺 Local Development

poetry install
poetry run fetch-sitemap -h
poetry run ./tests.sh
