
scrapy-sticky-meta-params

A Scrapy spider middleware that forwards meta params through subsequent requests.

What does it do?

This middleware simplifies the process of carrying information through requests and responses in Scrapy spiders.
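The forwarding rule can be sketched as a plain function. For each key the spider marks as sticky, the value is copied from the meta of the response being handled into the meta of each new request. The function name `forward_sticky_meta` is hypothetical. Letting a value the callback sets explicitly take precedence over the forwarded one is an assumption, not documented behavior of the package.

```python
from typing import Any, Dict, Iterable, Mapping


def forward_sticky_meta(
    parent_meta: Mapping[str, Any],
    child_meta: Dict[str, Any],
    sticky_keys: Iterable[str],
) -> Dict[str, Any]:
    """Hypothetical helper: copy sticky keys from a response's meta into a new request's meta."""
    for key in sticky_keys:
        # Assumption: a key the new request already sets explicitly is not overwritten.
        if key in parent_meta and key not in child_meta:
            child_meta[key] = parent_meta[key]
    return child_meta
```

For example, `forward_sticky_meta({"param": 3, "info": "x"}, {"info": "y"}, ["param"])` returns `{"info": "y", "param": 3}`. Only `param` is carried over, and the request's own `info` is kept.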

Without the middleware

from scrapy import Request, Spider


class SampleSpider(Spider):
    name = "without_middleware"
    start_urls = ["https://www.example.com"]

    def parse(self, response):
        for param in range(5):
            yield Request(
                "https://www.example.com/next",
                meta={"param": param},
                callback=self.parse_2
            )

    def parse_2(self, response):
        # Get important information from response
        info = response.xpath("//info/text()").get("info")
        # We need to get the param from meta and forward it again in this request
        param = response.meta["param"]
        yield Request(
            "https://www.example.com/next",
            meta={"info": info, "param": param},
            callback=self.parse_3
        )

    def parse_3(self, response):
        yield {
            "param": response.meta["param"],  # The value that we've extracted in the first callback
            "info": response.meta["info"]
        }

With the middleware

from scrapy import Request, Spider


class SampleSpider(Spider):
    name = "with_middleware"
    start_urls = ["https://www.example.com"]
    sticky_meta_keys = ["param"]  # Will always forward the meta param "param"

    def parse(self, response):
        for param in range(5):
            yield Request(
                "https://www.example.com/next",
                meta={"param": param},
                callback=self.parse_2
            )

    def parse_2(self, response):
        # Get important information from response
        info = response.xpath("//info/text()").get("info")
        # We don't need to read "param" from meta and resend it; the middleware forwards it.
        yield Request(
            "https://www.example.com/next",
            meta={"info": info},
            callback=self.parse_3
        )

    def parse_3(self, response):
        yield {
            "param": response.meta["param"],  # The value that we've extracted in the first callback
            "info": response.meta["info"]
        }

Awesome, how to use it?

To enable the middleware, add it to the SPIDER_MIDDLEWARES setting in your project's settings.py:

SPIDER_MIDDLEWARES = {
    'scrapy_sticky_meta_params.middleware.StickyMetaParamsMiddleware': 550,
}

The middleware must also be enabled per spider. To do this, add the following attribute to your spider:

sticky_meta_keys = []

Fill this list with every meta key that should be forwarded to subsequent requests.
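The following minimal sketch shows how forwarding of this kind can be wired into Scrapy through the spider-middleware hook `process_spider_output(response, result, spider)`, which receives everything a callback yields. It is a hypothetical re-implementation, not the package's source, and the class name `StickyMetaSketchMiddleware` is made up. It assumes that a value the callback sets explicitly wins over a forwarded one. It also detects requests by duck typing so it runs without Scrapy installed, whereas real code would use `isinstance(obj, scrapy.Request)`.

```python
from typing import Any, Iterable, Iterator


class StickyMetaSketchMiddleware:
    """Hypothetical sketch of a sticky-meta spider middleware (not the package's code)."""

    def process_spider_output(
        self, response: Any, result: Iterable[Any], spider: Any
    ) -> Iterator[Any]:
        # Spiders opt in by defining sticky_meta_keys; other spiders pass through untouched.
        sticky_keys = getattr(spider, "sticky_meta_keys", None) or []
        for obj in result:
            # Duck typing stands in for isinstance(obj, scrapy.Request);
            # dict items have no .meta attribute and are yielded unchanged.
            meta = getattr(obj, "meta", None)
            if isinstance(meta, dict):
                for key in sticky_keys:
                    # Assumption: a value the callback set explicitly is kept.
                    if key in response.meta and key not in meta:
                        meta[key] = response.meta[key]
            yield obj
```

Reading `sticky_meta_keys` with `getattr` is what makes the behavior per-spider. A spider without the attribute gets an empty list, so its requests are yielded unchanged.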


Download files

Source Distribution

scrapy-sticky-meta-params-1.0.0.tar.gz (3.4 kB)

Built Distribution

scrapy_sticky_meta_params-1.0.0-py3-none-any.whl (4.0 kB)

File details

Details for the file scrapy-sticky-meta-params-1.0.0.tar.gz.

File metadata

  • Download URL: scrapy-sticky-meta-params-1.0.0.tar.gz
  • Upload date:
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for scrapy-sticky-meta-params-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       7e4d887d08ed703a4190129e21cfc034db9be3045c3b2e9b16f731e60d8c05c8
MD5          fe3c48c56d9744768e8241c8f6a1151c
BLAKE2b-256  083bbbfb905d08b1f9befa007cd72ec41a692b9437c0dd87a4596edc06686ca2
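A downloaded archive can be checked against the SHA256 digest listed for the source distribution. The following is a short sketch using Python's standard `hashlib`. The helper names `sha256_of` and `matches` are hypothetical, and the file is assumed to have been downloaded to a local path of your choosing.

```python
import hashlib
from pathlib import Path

# SHA256 digest of scrapy-sticky-meta-params-1.0.0.tar.gz, as listed in the hashes for that file.
EXPECTED_SHA256 = "7e4d887d08ed703a4190129e21cfc034db9be3045c3b2e9b16f731e60d8c05c8"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so it never has to be read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since hex digests may be published in either case."""
    return sha256_of(path) == expected.strip().lower()
```

For instance, `matches(Path("scrapy-sticky-meta-params-1.0.0.tar.gz"))` should return `True` for an untampered download.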

File details

Details for the file scrapy_sticky_meta_params-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: scrapy_sticky_meta_params-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for scrapy_sticky_meta_params-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       da7f3af2d21303cfb681218bcd7f8d067de34b77a872346dec11a729d7e108fb
MD5          c029a26e3f7e5c26f59c8d1792374369
BLAKE2b-256  1f9ce15c0afaf26072f0c0402113e04844859a0949221a4cb8ffce479844d03d
