Fetch a given sitemap and retrieve all URLs in it.

Project description

fetch-sitemap

Retrieves all URLs listed in a given sitemap.xml and fetches each page. Useful for (load) testing an entire site for error responses.
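
To illustrate the first step (collecting the URLs from a sitemap), here is a minimal sketch using only the Python standard library. It is not taken from fetch-sitemap's source; the function name `extract_locs` and the sample document are assumptions made for illustration. It relies on the standard sitemap namespace and returns the text of every `<loc>` element.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap and sitemap index documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document.

    Hypothetical helper for illustration, not fetch-sitemap's own code.
    """
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


# Made-up sample sitemap with two pages.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(extract_locs(sample))
```

The same function also picks up the `<loc>` entries of a sitemap index, since it matches the element anywhere in the tree; whether and how fetch-sitemap follows nested sitemaps is not stated here.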

Sample Output

Note: The default concurrency limit is 5, so up to 5 URLs are fetched at once. Depending on your server's worker count, this might already be enough to DoS it. Start with --concurrency-limit=2 and increase it once you are confident the server can handle the load.
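
The idea of a concurrency limit can be sketched with `asyncio.Semaphore`. This is an assumption about how such a limit is commonly implemented, not a description of fetch-sitemap's internals. `fetch_all`, `FakeFetcher` and `demo` are hypothetical names, and the fake fetcher sleeps instead of making real requests so the example runs offline.

```python
import asyncio


async def fetch_all(urls, fetch, concurrency_limit=5):
    """Call fetch(url) for every URL with at most concurrency_limit in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def bounded(url):
        # Wait here until one of the concurrency_limit slots is free.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(url) for url in urls))


class FakeFetcher:
    """Stand-in for an HTTP client; records the peak number of parallel calls."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, url):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        self.in_flight -= 1
        return 200  # pretend every page answered with HTTP 200


def demo(concurrency_limit=2):
    fetcher = FakeFetcher()
    urls = [f"https://example.com/page-{i}" for i in range(10)]
    statuses = asyncio.run(fetch_all(urls, fetcher, concurrency_limit))
    return statuses, fetcher.peak


if __name__ == "__main__":
    statuses, peak = demo(concurrency_limit=2)
    print(len(statuses), "responses, peak concurrency:", peak)
```

With a limit of 2, no more than two simulated requests overlap, which mirrors the effect of starting with a low --concurrency-limit.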

usage: fetch-sitemap
    [-h]
    [--basic-auth BASIC_AUTH]
    [-l LIMIT]
    [-c CONCURRENCY_LIMIT]
    [-t REQUEST_TIMEOUT]
    [--random]
    [--report-path REPORT_PATH]
    [-o OUTPUT]
    [-v]
    sitemap_url

Fetch a given sitemap and retrieve all URLs in it.

positional arguments:
  sitemap_url           URL of the sitemap to fetch

options:
  -h, --help            show this help message and exit
  --basic-auth BASIC_AUTH
                        Basic auth information. Use: 'username:password'.
  -l LIMIT, --limit LIMIT
                        Maximum number of URLs to fetch from the given sitemap.xml. Default: All
  -c CONCURRENCY_LIMIT, --concurrency-limit CONCURRENCY_LIMIT
                        Max number of concurrent requests. Default: 5
  -t REQUEST_TIMEOUT, --request-timeout REQUEST_TIMEOUT
                        Timeout for fetching a URL in seconds. Default: 30
  --random              Append a random string like ?12334232343 to each URL to bypass frontend cache. Default: False
  --report-path REPORT_PATH
                        Store results in a CSV file. Example: ./report.csv
  -o OUTPUT, --output-dir OUTPUT
                        Store all fetched sitemap documents in this folder.
  -v, --version         show program's version number and exit
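
Two of the options above are easy to picture in code. The sketch below is illustrative only and makes assumptions the help text does not confirm: `add_cache_buster` and `parse_basic_auth` are hypothetical helpers, the random token is a 10-digit number, and a URL that already has a query string gets the token appended with `&`. The tool's actual behaviour may differ.

```python
import random
from urllib.parse import urlsplit, urlunsplit


def add_cache_buster(url: str) -> str:
    """Append a random numeric token to the query string, as --random describes.

    Assumption: an existing query string is extended with '&<token>'.
    """
    parts = urlsplit(url)
    token = str(random.randint(10**9, 10**10 - 1))
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


def parse_basic_auth(value: str) -> tuple[str, str]:
    """Split a 'username:password' string at the first colon."""
    username, sep, password = value.partition(":")
    if not sep or not username:
        raise ValueError("expected 'username:password'")
    return username, password


if __name__ == "__main__":
    print(add_cache_buster("https://example.com/page"))
    print(parse_basic_auth("user:secret"))
```

Splitting at the first colon keeps passwords that themselves contain a colon intact.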

Download files

Source Distribution

fetch_sitemap-11.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

fetch_sitemap-11-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file fetch_sitemap-11.tar.gz.

File metadata

  • Download URL: fetch_sitemap-11.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-11.tar.gz
Algorithm    Hash digest
SHA256       160957d1b1326f9839c9a432364bcaca57cf09d155cc569719d5d4e5a95665d6
MD5          edf3ee9d9a12a2e5bf33d39c38848811
BLAKE2b-256  56385a8974ad459a77e0e2c9eb175f36318b70e276ec944055ea2bc6214aa93e

File details

Details for the file fetch_sitemap-11-py3-none-any.whl.

File metadata

  • Download URL: fetch_sitemap-11-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-11-py3-none-any.whl
Algorithm    Hash digest
SHA256       6dd2212726a52b148b34f2580d666751564e2b3b05f5ad2b92ca4bce92e910f8
MD5          b7661afa26b02b8f2d177f0066896ef1
BLAKE2b-256  fcecfa46075dafe4f40188877775274d3f2ff720a5da87be0eddcb1f9230b6e4
