Fetch a given sitemap and retrieve all URLs in it.

Project description

fetch-sitemap

Retrieves all URLs listed in a given sitemap.xml and fetches each of those pages. Useful for (load) testing an entire site for error responses.
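
To illustrate the first step (pulling URLs out of a sitemap document), here is a minimal sketch using only the Python standard library. It is not fetch-sitemap's own code; the function name `extract_locations` and the sample XML are assumptions made for this example. It relies on the standard sitemap namespace and on the fact that a sitemap is either a `urlset` (page URLs) or a `sitemapindex` (links to further sitemaps).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(xml_text: str) -> tuple[str, list[str]]:
    """Return the document kind and every <loc> value (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The root tag looks like "{namespace}urlset" or "{namespace}sitemapindex".
    kind = root.tag.rsplit("}", 1)[-1]
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return kind, locs


# Made-up sample document for demonstration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

kind, urls = extract_locations(SAMPLE)
print(kind, urls)
```

When the kind is `sitemapindex`, each location points to another sitemap, which is what a recursive fetch would follow.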

Installation

$ pip install fetch-sitemap

Usage

$ fetch-sitemap --help

 Usage: fetch-sitemap [OPTIONS] SITEMAP_URL

 Fetch a given sitemap and retrieve all URLs in it.

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮
│ --basic-auth                -a  TEXT              Basic auth information. Format: 'username:password'   │
│ --limit                     -l  INTEGER           Maximum number of URLs to fetch from the given        │
│                                                   sitemap.xml.                                          │
│ --recursive/--no-recursive                        Recursively fetch all sitemap documents from the      │
│                                                   given sitemap.xml.                                    │
│                                                   [default: recursive]                                  │
│ --concurrency-limit         -c  INTEGER           Max number of concurrent requests. [default: 5]       │
│ --request-timeout           -t  INTEGER           Timeout for fetching a URL in seconds. [default: 30]  │
│ --random                    -r                    Append a random string like ?12334232343 to each URL  │
│                                                   to bypass frontend cache.                             │
│ --random-length                 INTEGER           Length of the --random hash. [default: 15]            │
│ --report-path               -p  FILE              Store results in a CSV file. Example: ./report.csv    │
│ --output-dir                -o  DIRECTORY         Store all fetched sitemap documents in this folder.   │
│                                                   Example: /tmp/my.domain.com/                          │
│ --slow-threshold                FLOAT             Responses slower than this (in seconds) are           │
│                                                   considered 'slow'.                                    │
│                                                   [default: 5.0]                                        │
│ --slow-num                      INTEGER OR "ALL"  How many 'slow' responses to show. [default: 10]      │
│ --user-agent                    TEXT              User-Agent string set in the HTTP header.             │
│                                                   [default: Mozilla/5.0 (compatible; fetch-sitemap/21)] │
│ --version                                         Show the version and exit.                            │
│ --help                                            Show this message and exit.                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
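
The `--random` and `--random-length` options describe a cache-busting technique: a random string is appended to each URL so that a frontend cache treats every request as new. The sketch below shows the idea in Python. It is an illustration, not the tool's implementation; the name `add_cache_buster`, the use of digits only, and the handling of URLs that already carry a query string are assumptions.

```python
import random
import string
from urllib.parse import urlsplit, urlunsplit


def add_cache_buster(url: str, length: int = 15) -> str:
    """Append a random numeric token as (or to) the query string."""
    token = "".join(random.choices(string.digits, k=length))
    parts = urlsplit(url)
    # Assumption: if a query already exists, join the token with "&".
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


print(add_cache_buster("https://example.com/page"))
print(add_cache_buster("https://example.com/page?lang=en", length=8))
```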

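The `--concurrency-limit` and `--request-timeout` options describe a common pattern: cap the number of requests in flight and give up on any single request after a fixed time. One way to sketch this pattern in Python is with `asyncio.Semaphore` and `asyncio.wait_for`. This is a hedged illustration and not necessarily how fetch-sitemap works internally; `fetch_all` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for a real HTTP call.

```python
import asyncio


async def fetch_all(urls, fetch, concurrency_limit=5, request_timeout=30):
    """Run fetch(url) for every URL, at most concurrency_limit at a time."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def fetch_one(url):
        async with semaphore:  # blocks while the limit is reached
            try:
                status = await asyncio.wait_for(fetch(url), timeout=request_timeout)
            except asyncio.TimeoutError:
                status = None  # assumption: record a timeout as None
            return url, status

    return await asyncio.gather(*(fetch_one(u) for u in urls))


async def fake_fetch(url):
    """Stand-in for an HTTP request; always 'succeeds' after a short delay."""
    await asyncio.sleep(0.01)
    return 200


results = asyncio.run(
    fetch_all([f"https://example.com/{i}" for i in range(12)], fake_fetch)
)
print(len(results), "URLs fetched")
```
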
🤺 Local Development

poetry install
poetry run fetch-sitemap -h
poetry run ./tests.sh

Download files

Source Distribution

fetch_sitemap-23.tar.gz (9.2 kB)

Uploaded Source

Built Distribution

fetch_sitemap-23-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file fetch_sitemap-23.tar.gz.

File metadata

  • Download URL: fetch_sitemap-23.tar.gz
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-23.tar.gz
Algorithm Hash digest
SHA256 66a7f105c2523a84bdfd5b8f8cb4d1611d85c42797bb8afbc102e9cd436b9267
MD5 d0d73f7dd5928aac55977d3be2b536c2
BLAKE2b-256 13399248d0cacee1b735c11a1339fe1c0c9ed1f463dc8d47dc4439c3a85067fc

File details

Details for the file fetch_sitemap-23-py3-none-any.whl.

File metadata

  • Download URL: fetch_sitemap-23-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Darwin/23.4.0

File hashes

Hashes for fetch_sitemap-23-py3-none-any.whl
Algorithm Hash digest
SHA256 bdf3d4eb419309260d230392f8a3ddd8247c41c0b98fbc3d75e6044d405fcafc
MD5 f02f957e9f8a31a70f8b71b5b2e58953
BLAKE2b-256 b2b9cc4fe438854b86486966f0032ad16e5c6922e081720198e377bc90a39ed2
