
Client library to process URLs through Zyte API


Requirements

  • Python 3.7+

  • Scrapy

Installation

pip install scrapy-zyte-api

This package requires Python 3.7+.

Configuration

Replace the default http and https download handlers in Scrapy’s DOWNLOAD_HANDLERS setting, in the settings.py of your Scrapy project.

You also need to set the ZYTE_API_KEY setting.

Lastly, make sure to install the asyncio-based Twisted reactor in the settings.py file as well.

Here is an example of what a Scrapy project’s settings.py file needs:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.handler.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.handler.ScrapyZyteAPIDownloadHandler"
}

# Alternatively, set ZYTE_API_KEY as an environment variable.
ZYTE_API_KEY = "<your API key>"

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
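
The three requirements above can be checked before running a spider. The following is a minimal sketch, not part of scrapy-zyte-api: missing_zyte_api_settings is a hypothetical helper that inspects a plain dict of settings, and it assumes that the environment variable mentioned in the comment above is also named ZYTE_API_KEY.

```python
import os

HANDLER = "scrapy_zyte_api.handler.ScrapyZyteAPIDownloadHandler"
REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def missing_zyte_api_settings(settings, environ=None):
    """Return a list of problems found in a dict of Scrapy settings.

    Hypothetical helper for illustration; not provided by scrapy-zyte-api.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # Both schemes must be routed through the Zyte API download handler.
    handlers = settings.get("DOWNLOAD_HANDLERS", {})
    for scheme in ("http", "https"):
        if handlers.get(scheme) != HANDLER:
            problems.append(f"DOWNLOAD_HANDLERS[{scheme!r}] is not the Zyte API handler")

    # Assumption: the key may come from the setting or from an
    # environment variable of the same name.
    if not (settings.get("ZYTE_API_KEY") or environ.get("ZYTE_API_KEY")):
        problems.append("ZYTE_API_KEY is not set")

    if settings.get("TWISTED_REACTOR") != REACTOR:
        problems.append("TWISTED_REACTOR is not the asyncio reactor")

    return problems
```

An empty list means all three settings are in place; each string in a non-empty list names one setting that still needs attention.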

Usage

To send a scrapy.Request through Zyte Data API, the zyte_api key must be present in Request.meta and hold a dict-like value.

To set default parameters for Zyte API enabled requests, define the following in the settings.py file or anywhere else Scrapy settings can be set:

ZYTE_API_DEFAULT_PARAMS = {
    "browserHtml": True,
    "geolocation": "US",
}

You can see the full list of parameters in the Zyte Data API Specification.

Note that ZYTE_API_DEFAULT_PARAMS only takes effect for requests that have the zyte_api key set in Request.meta. Parameters given in Request.meta override the same parameters from the ZYTE_API_DEFAULT_PARAMS setting.

import scrapy


class SampleQuotesSpider(scrapy.Spider):
    name = "sample_quotes"

    custom_settings = {
        "ZYTE_API_DEFAULT_PARAMS": {
            "geolocation": "US",  # You can set any Geolocation region you want.
        }
    }

    def start_requests(self):
        yield scrapy.Request(
            url="https://quotes.toscrape.com/",
            callback=self.parse,
            meta={
                "zyte_api": {
                    "browserHtml": True,
                    "javascript": True,
                    "echoData": {"some_value_I_could_track": 123},
                }
            },
        )

    def parse(self, response):
        yield {"URL": response.url, "status": response.status, "HTML": response.body}

        print(response.raw_api_response)
        # {
        #     'url': 'https://quotes.toscrape.com/',
        #     'browserHtml': '<html> ... </html>',
        #     'echoData': {'some_value_I_could_track': 123},
        # }

        print(response.request.meta)
        # {
        #     'zyte_api': {
        #         'browserHtml': True,
        #         'geolocation': 'US',
        #         'javascript': True,
        #         'echoData': {'some_value_I_could_track': 123}
        #     },
        #     'download_timeout': 180.0,
        #     'download_slot': 'quotes.toscrape.com'
        # }
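
The merged zyte_api dict printed above follows the precedence rule for ZYTE_API_DEFAULT_PARAMS. That rule can be pictured as a plain dictionary merge. This is only a sketch of the documented behaviour, not the handler's actual code; effective_zyte_api_params is a hypothetical name, and the "DE" geolocation value is purely illustrative.

```python
def effective_zyte_api_params(default_params, meta):
    """Return the parameters a request would carry, or None when the
    request has no "zyte_api" key in its meta.

    Hypothetical helper for illustration; not provided by scrapy-zyte-api.
    """
    if "zyte_api" not in meta:
        # Without the key, the defaults are not applied at all.
        return None
    # Later keys win, so per-request values override the defaults.
    return {**default_params, **meta["zyte_api"]}


defaults = {"browserHtml": True, "geolocation": "US"}

print(effective_zyte_api_params(defaults, {"zyte_api": {"geolocation": "DE"}}))
# {'browserHtml': True, 'geolocation': 'DE'}

print(effective_zyte_api_params(defaults, {}))
# None
```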

The raw Zyte Data API response can be accessed via the raw_api_response attribute of the response object. Such responses are instances of ZyteAPIResponse or ZyteAPITextResponse, which are subclasses of scrapy.http.Response and scrapy.http.TextResponse respectively. These classes exist to hold the raw Zyte Data API response.

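If multiple requests target the same URL with different Zyte Data API parameters, pass dont_filter=True to Request. Otherwise Scrapy's duplicate filter, which by default does not take Request.meta into account, drops all but the first of them.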
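
As a rough sketch of that advice, the hypothetical helper below (not part of scrapy-zyte-api) builds the keyword arguments for several requests to one URL, each with its own parameter set and with dont_filter=True. It returns plain dicts so that it runs without Scrapy installed; the parameter values are illustrative.

```python
def zyte_api_request_kwargs(url, param_sets):
    """Build keyword arguments for one scrapy.Request per parameter set.

    Hypothetical helper for illustration; not provided by scrapy-zyte-api.
    """
    return [
        {
            "url": url,
            "meta": {"zyte_api": dict(params)},
            # The URL repeats across requests, so opt out of duplicate filtering.
            "dont_filter": True,
        }
        for params in param_sets
    ]


all_kwargs = zyte_api_request_kwargs(
    "https://quotes.toscrape.com/",
    [
        {"browserHtml": True, "geolocation": "US"},
        {"browserHtml": True, "geolocation": "DE"},
    ],
)

# Inside a spider's start_requests(), each dict would be used as:
#     yield scrapy.Request(callback=self.parse, **kwargs)
```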
