Fetch a given sitemap and retrieve all URLs in it.

fetch-sitemap

Retrieves all URLs listed in a given sitemap.xml and fetches each page. Useful for (load) testing an entire site for error responses.
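
As an illustration of the first step, collecting the URLs a sitemap lists, here is a minimal sketch using only Python's standard library. It is not taken from the fetch-sitemap source: the function name `extract_urls` and the sample document are assumptions made for this example, and no network access is involved.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample document, used instead of a real download.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""


def extract_urls(xml_text: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document.

    <loc> appears in both <urlset> and <sitemapindex> documents, so the
    same function also lists the child sitemaps of a sitemap index.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    ]


if __name__ == "__main__":
    for url in extract_urls(SAMPLE_SITEMAP):
        print(url)
```

A recursive crawl, as the `--recursive` option describes, would call such a function again on every child sitemap found in a sitemap index.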

Installation

$ pip install fetch-sitemap

Usage

$ fetch-sitemap --help

 Usage: fetch-sitemap [OPTIONS] SITEMAP_URL

 Fetch a given sitemap and retrieve all URLs in it.

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ --basic-auth                -a  TEXT              Basic auth information. Format: 'username:password'   │
│ --limit                     -l  INTEGER           Maximum number of URLs to fetch from the given        │
│                                                   sitemap.xml.                                          │
│ --recursive/--no-recursive                        Recursively fetch all sitemap documents from the      │
│                                                   given sitemap.xml.                                    │
│                                                   [default: recursive]                                  │
│ --concurrency-limit         -c  INTEGER           Max number of concurrent requests. [default: 5]       │
│ --request-timeout           -t  INTEGER           Timeout for fetching a URL in seconds. [default: 30]  │
│ --random                    -r                    Append a random string like ?12334232343 to each URL  │
│                                                   to bypass frontend cache.                             │
│ --random-length                 INTEGER           Length of the --random hash. [default: 15]            │
│ --report-path               -p  FILE              Store results in a CSV file. Example: ./report.csv    │
│ --output-dir                -o  DIRECTORY         Store all fetched sitemap documents in this folder.   │
│                                                   Example: /tmp/my.domain.com/                          │
│ --slow-threshold                FLOAT             Responses slower than this (in seconds) are           │
│                                                   considered 'slow'.                                    │
│                                                   [default: 5.0]                                        │
│ --slow-num                      INTEGER OR "ALL"  How many 'slow' responses to show. [default: 10]      │
│ --user-agent                    TEXT              User-Agent string set in the HTTP header.             │
│                                                   [default: Mozilla/5.0 (compatible; fetch-sitemap/21)] │
│ --version                                         Show the version and exit.                            │
│ --help                                            Show this message and exit.                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
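
To make `--concurrency-limit`, `--request-timeout` and `--slow-threshold` more concrete, the sketch below shows one common way such options can interact, using `asyncio`. It is an assumption about the general technique, not the tool's confirmed implementation: `fake_fetch` and `fetch_all` are hypothetical names, and the request is simulated with a sleep so the example runs without a network.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float) -> float:
    """Hypothetical stand-in for an HTTP GET: sleeps, then returns elapsed seconds."""
    started = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - started


async def fetch_all(
    simulated_delays: dict[str, float],
    concurrency_limit: int = 5,     # mirrors the --concurrency-limit default
    request_timeout: float = 30.0,  # mirrors the --request-timeout default
    slow_threshold: float = 5.0,    # mirrors the --slow-threshold default
):
    """Fetch every URL with bounded concurrency and flag slow responses."""
    # The semaphore lets at most `concurrency_limit` requests run at once.
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def worker(url: str, delay: float):
        async with semaphore:
            # Raises a timeout error if the request exceeds request_timeout.
            elapsed = await asyncio.wait_for(
                fake_fetch(url, delay), timeout=request_timeout
            )
        return url, elapsed

    # gather() returns results in the same order as the input URLs.
    results = await asyncio.gather(
        *(worker(url, delay) for url, delay in simulated_delays.items())
    )
    slow = [(url, elapsed) for url, elapsed in results if elapsed > slow_threshold]
    return results, slow


if __name__ == "__main__":
    delays = {"https://example.com/": 0.0, "https://example.com/slow": 0.2}
    _, slow = asyncio.run(fetch_all(delays, slow_threshold=0.1))
    for url, elapsed in slow:
        print(f"slow: {url} ({elapsed:.2f}s)")
```

In this sketch a timeout propagates as an exception; a real report would more likely record it as a failed URL and continue.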

🤺 Local Development

poetry install
poetry run fetch-sitemap -h
poetry run ./tests.sh

Download files

Download the file for your platform.

Source Distribution

fetch_sitemap-24.tar.gz (9.2 kB)

Uploaded Source

Built Distribution

fetch_sitemap-24-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file fetch_sitemap-24.tar.gz.

File metadata

  • Download URL: fetch_sitemap-24.tar.gz
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-24.tar.gz
Algorithm    Hash digest
SHA256       131b763710a96b5ebdc4ffb4a7a704bfc1b318ec5f3312e08d2f742edfdefa4c
MD5          82cdc6ac6ff69702a419eaeb8b2d3494
BLAKE2b-256  bb613e0fb9eb090ca4eabfd30ef536646ef5fee9fc19011e0520f5fc5ecd0c3c

File details

Details for the file fetch_sitemap-24-py3-none-any.whl.

File metadata

  • Download URL: fetch_sitemap-24-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-24-py3-none-any.whl
Algorithm    Hash digest
SHA256       6509f38eec5f9ecb050a6d7918b69ab95205663cde16b0b1160249b4d7011795
MD5          56d2a4e7f5d55b345d43512b10423619
BLAKE2b-256  29583ec1b28868fbe820514a06aa73dfc38140c9bcbc9f7bb265f1740ff418bd
