A downloader middleware that stores the current request chain to be crawled at another time.

scrapy-time-machine

Run your spider with a previously crawled request chain.

Why?

Let's say your spider crawls a page every day, and after some time you notice that an important piece of information has been added and you want to start saving it.

You can modify your spider to extract this information from now on, but what if you also want the historical values of this data, going back to when it was first introduced to the site?

With this extension you can save a snapshot of the site at every run to be used in the future (as long as you don't change the request chain).
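To illustrate why a snapshot only replays while the request chain stays the same, here is a minimal conceptual sketch. It is not this package's implementation or API: `fingerprint` and `SnapshotStore` are hypothetical names invented for this example, and the store is a plain in-memory dictionary. The assumption is that stored responses are looked up by a key derived from the request itself.

```python
import hashlib


def fingerprint(method: str, url: str, body: bytes = b"") -> str:
    """Hypothetical helper: derive a stable key from the parts of a request."""
    digest = hashlib.sha1()
    for part in (method.upper().encode(), url.encode(), body):
        digest.update(part)
        digest.update(b"\x00")  # separator so adjacent parts cannot run together
    return digest.hexdigest()


class SnapshotStore:
    """Hypothetical in-memory snapshot mapping request keys to response bodies."""

    def __init__(self):
        self._responses = {}

    def save(self, method: str, url: str, response_body: str, body: bytes = b"") -> None:
        # Called while crawling the live site: remember what came back.
        self._responses[fingerprint(method, url, body)] = response_body

    def replay(self, method: str, url: str, body: bytes = b""):
        # Called on a later run: return the stored response, or None if this
        # request was not part of the chain that produced the snapshot.
        return self._responses.get(fingerprint(method, url, body))


store = SnapshotStore()
store.save("GET", "https://example.com/items", "<html>day 1</html>")

# Same request as the original run: the stored response is available.
replayed = store.replay("GET", "https://example.com/items")

# A request the original run never made: nothing to replay.
missing = store.replay("GET", "https://example.com/items?page=2")
```

In this sketch, a spider that starts issuing different requests (new URLs, methods, or bodies) produces keys the snapshot never recorded, which is the reason the request chain has to stay unchanged for a replay to work.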

Sample project

A sample Scrapy project is available in the examples directory.
