Restrict authorized Scrapy redirections to the website start_urls

scrapy-redirect restricts the HTTP redirections that Scrapy is allowed to follow to those coming from the website's start_urls.

Why?

If the Scrapy REDIRECT_ENABLED setting is False and a request to the homepage of the crawled website returns a 3XX status code, the crawl stops immediately, because the redirection is not followed.

scrapy-redirect makes Scrapy tolerate redirections for requests to the start_urls when REDIRECT_ENABLED = False, which avoids this particular problem.
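The rule described above can be sketched in plain Python. This is only an illustration of the decision, not the package's actual code: the function name tolerates_redirect, its parameters, and the set of redirect status codes are assumptions made for the example.

```python
# Hypothetical sketch of the decision described above; not scrapy-redirect's code.

# Assumption: the 3XX codes treated as redirections for this example.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def tolerates_redirect(request_url, status, start_urls, redirect_enabled=False):
    """Return True when a redirection should be followed despite REDIRECT_ENABLED = False."""
    if redirect_enabled:
        # Scrapy already follows redirections itself, so there is nothing to do.
        return False
    if status not in REDIRECT_STATUSES:
        # Not a redirection response.
        return False
    # Only requests made to one of the start_urls are allowed to be redirected.
    return request_url in start_urls
```

With start_urls = ['http://example.com/'], a 301 response to 'http://example.com/' would be tolerated, while a 301 response to 'http://example.com/page' would not.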

Installation

$ pip install scrapy-redirect

Configuration

Enable scrapy-redirect by adding the following key/value pair to the SPIDER_MIDDLEWARES setting (in settings.py):

SPIDER_MIDDLEWARES = {
    ...
    'scrapyredirect.HomepageRedirectMiddleware': 575,
    ...
}

Note that the middleware order value must be lower than 600 (the default value of the 'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware' middleware), as scrapy-redirect must be executed before Scrapy blocks the redirection.

NB: if REDIRECT_ENABLED = True, scrapy-redirect does nothing.
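Putting these settings together, a settings.py for the case this package targets might look like the minimal sketch below. It only combines the values already mentioned in this section; any other middlewares your project uses would sit alongside the entry shown.

```python
# settings.py (sketch): redirections are disabled globally,
# and scrapy-redirect tolerates them for the start_urls only.
REDIRECT_ENABLED = False

SPIDER_MIDDLEWARES = {
    # Order value kept below 600, as required above.
    'scrapyredirect.HomepageRedirectMiddleware': 575,
}
```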

License

scrapy-redirect is published under the MIT License.
