Fetch a given sitemap and retrieve all URLs in it.

fetch-sitemap

Retrieves all URLs listed in the sitemap.xml at a given URL and fetches each page, with a configurable number of concurrent requests (--concurrency-limit). Useful for testing, or load testing, an entire site for error responses.
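
As an illustration of the first step, here is a minimal sketch of pulling the URLs out of a sitemap document with Python's standard library. It is not fetch-sitemap's own code: the function name extract_locations and the example.com URLs are made up for this example, and the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, nested_sitemap_urls) found in one sitemap document.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # A <urlset> document lists pages in <url><loc> elements.
    pages = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # A <sitemapindex> document lists further sitemaps in <sitemap><loc>.
    nested = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return pages, nested


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    pages, nested = extract_locations(example)
    print(pages)
    print(nested)
```

A recursive fetch, as described for --recursive, would repeat the same extraction for every entry in the second list.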

Installation

$ pip install fetch-sitemap

Usage

$ fetch-sitemap --help

 Usage: fetch-sitemap [OPTIONS] SITEMAP_URL

 Fetch a given sitemap and retrieve all URLs in it.

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ --basic-auth                -a  TEXT              Basic auth information. Format: 'username:password'   │
│ --limit                     -l  INTEGER           Maximum number of URLs to fetch from the given        │
│                                                   sitemap.xml.                                          │
│ --recursive/--no-recursive                        Recursively fetch all sitemap documents from the      │
│                                                   given sitemap.xml.                                    │
│                                                   [default: recursive]                                  │
│ --concurrency-limit         -c  INTEGER           Max number of concurrent requests. [default: 5]       │
│ --request-timeout           -t  INTEGER           Timeout for fetching a URL in seconds. [default: 30]  │
│ --random                    -r                    Append a random string like ?12334232343 to each URL  │
│                                                   to bypass frontend cache.                             │
│ --random-length                 INTEGER           Length of the --random hash. [default: 15]            │
│ --report-path               -p  FILE              Store results in a CSV file. Example: ./report.csv    │
│ --output-dir                -o  DIRECTORY         Store all fetched sitemap documents in this folder.   │
│                                                   Example: /tmp/my.domain.com/                          │
│ --slow-threshold                FLOAT             Responses slower than this (in seconds) are           │
│                                                   considered 'slow'.                                    │
│                                                   [default: 5.0]                                        │
│ --slow-num                      INTEGER OR "ALL"  How many 'slow' responses to show. [default: 10]      │
│ --user-agent                    TEXT              User-Agent string set in the HTTP header.             │
│                                                   [default: Mozilla/5.0 (compatible; fetch-sitemap/21)] │
│ --version                                         Show the version and exit.                            │
│ --help                                            Show this message and exit.                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
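
To make a few of the options above concrete, here is a rough sketch of how a concurrency limit, a cache-busting suffix, and a slow threshold could fit together. It shows the general technique under stated assumptions, not fetch-sitemap's implementation: the names add_cache_buster, fake_fetch and fetch_all are hypothetical, fake_fetch stands in for a real HTTP request, and joining with & when a URL already carries a query string is a guess.

```python
import asyncio
import secrets
import time


def add_cache_buster(url: str, length: int = 15) -> str:
    """Append a random run of digits as a query string (hypothetical helper)."""
    token = "".join(secrets.choice("0123456789") for _ in range(length))
    # Assumption: reuse an existing query string instead of adding a second "?".
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{token}"


async def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP request; always reports status 200."""
    await asyncio.sleep(0.01)
    return 200


async def fetch_all(urls, concurrency_limit=5, slow_threshold=5.0):
    """Fetch every URL with at most `concurrency_limit` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def fetch_one(url):
        # The semaphore blocks here once the limit is reached.
        async with semaphore:
            started = time.perf_counter()
            status = await fake_fetch(add_cache_buster(url))
            elapsed = time.perf_counter() - started
        return {
            "url": url,
            "status": status,
            "seconds": elapsed,
            "slow": elapsed > slow_threshold,
        }

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(u) for u in urls))


if __name__ == "__main__":
    demo_urls = [f"https://example.com/page-{i}" for i in range(10)]
    results = asyncio.run(fetch_all(demo_urls))
    slow = sum(r["slow"] for r in results)
    print(f"{len(results)} responses, {slow} slow")
```

The semaphore is what bounds the load on the target site: raising the limit sends more requests in parallel, lowering it to 1 fetches pages strictly one after another.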

🤺 Local Development

poetry install
poetry run fetch-sitemap -h
poetry run ./tests.sh

Download files

Source Distribution

fetch_sitemap-22.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

fetch_sitemap-22-py3-none-any.whl (9.9 kB view details)

Uploaded Python 3

File details

Details for the file fetch_sitemap-22.tar.gz.

File metadata

  • Download URL: fetch_sitemap-22.tar.gz
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-22.tar.gz
Algorithm Hash digest
SHA256 9ebf50d366e148f9d6d995f1ec9959dcc9e206198e8152476a017d9a5feab560
MD5 ce1031265325461c87aa8ab2baa69fc9
BLAKE2b-256 8938f6e1b80fb89ef807043c98cd4be7921ab94589165ed89548df4cc5a7982f

File details

Details for the file fetch_sitemap-22-py3-none-any.whl.

File metadata

  • Download URL: fetch_sitemap-22-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-22-py3-none-any.whl
Algorithm Hash digest
SHA256 8387d4b5563dfe2e2a72d4326d164df5f49c25aa84c4cc1f8dc6e77d02d25c9a
MD5 3ae0c242f81ec432825b0cf9920c1e25
BLAKE2b-256 6bdd2f1c385b4bf5e2082c3b9a6240f28081027cf86d52688de03ece7cc14878
