
Page Object pattern for Scrapy

Project description


scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. scrapy-poet allows you to write spiders where extraction logic is separated from crawling logic. With scrapy-poet, it is possible to make a single spider that supports many sites with different layouts.
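The separation of extraction from crawling can be illustrated in plain Python. This is a minimal sketch of the pattern, not the scrapy-poet or web-poet API: `BookPage`, `SiteABookPage`, `SiteBBookPage`, the `PAGE_OBJECTS` registry and `extract_item` are hypothetical names, and the HTML layouts are invented. The crawling side only asks for an item, while each page object encapsulates one site's layout, which is what lets one spider cover several sites.

```python
import re
from dataclasses import dataclass
from typing import Dict, Type
from urllib.parse import urlparse


@dataclass
class BookPage:
    """Hypothetical generic page object: wraps a downloaded page, knows nothing about crawling."""

    url: str
    html: str

    def to_item(self) -> dict:
        raise NotImplementedError


class SiteABookPage(BookPage):
    # Hypothetical layout: the title is inside an <h1> element.
    def to_item(self) -> dict:
        match = re.search(r"<h1>(.*?)</h1>", self.html)
        return {"url": self.url, "title": match.group(1) if match else None}


class SiteBBookPage(BookPage):
    # Hypothetical layout: the title is inside <span class="title">.
    def to_item(self) -> dict:
        match = re.search(r'<span class="title">(.*?)</span>', self.html)
        return {"url": self.url, "title": match.group(1) if match else None}


# Hypothetical registry mapping a domain to the page object that understands it.
PAGE_OBJECTS: Dict[str, Type[BookPage]] = {
    "site-a.example": SiteABookPage,
    "site-b.example": SiteBBookPage,
}


def extract_item(url: str, html: str) -> dict:
    """Plays the role of a spider callback: it picks a page object and never parses HTML itself.

    Raises KeyError for a domain with no registered page object.
    """
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(url=url, html=html).to_item()


if __name__ == "__main__":
    print(extract_item("https://site-a.example/book/1", "<h1>Dune</h1>"))
    print(extract_item("https://site-b.example/b?id=1", '<span class="title">Dune</span>'))
```

Supporting a new site means adding one page object class; the crawling code does not change. In scrapy-poet itself, the callback declares the page object it needs through a type annotation and scrapy-poet builds and injects it, so the lookup above is only a stand-in for that mechanism; see the documentation for the real API.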

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Requires Python 3.8+ and Scrapy >= 2.6.0.

Usage in a Scrapy Project

Add the following to your Scrapy project’s settings.py file:

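DOWNLOADER_MIDDLEWARES = {
    # Injects page objects and their dependencies into spider callbacks
    "scrapy_poet.InjectionMiddleware": 543,
    # Replace Scrapy's built-in downloader stats middleware with scrapy-poet's
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
    # Lets page objects ask for the response they were built from to be retried
    "scrapy_poet.RetryMiddleware": 275,
}
# Request fingerprinter provided by scrapy-poet
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"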
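Scrapy merges DOWNLOADER_MIDDLEWARES with its built-in DOWNLOADER_MIDDLEWARES_BASE, drops entries set to None, and orders the rest by their numbers; process_request() is called in increasing order. The sketch below is a simplified model of that merge, showing where the settings above place scrapy-poet's middlewares. `merge_middlewares` is a hypothetical helper, not Scrapy's internal function, and `BASE` is only a partial excerpt of the built-in defaults, so check your Scrapy version for the full list.

```python
from typing import Dict, List, Optional

# Partial excerpt of Scrapy's DOWNLOADER_MIDDLEWARES_BASE (not the full list).
BASE: Dict[str, int] = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
}

# The DOWNLOADER_MIDDLEWARES value from settings.py above.
CUSTOM: Dict[str, Optional[int]] = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}


def merge_middlewares(base: Dict[str, int], custom: Dict[str, Optional[int]]) -> List[str]:
    """Simplified model: custom entries override base ones, None disables an entry,
    and the remaining middlewares are sorted by their order value."""
    merged: Dict[str, Optional[int]] = {**base, **custom}
    enabled = {path: order for path, order in merged.items() if order is not None}
    return [path for path, _ in sorted(enabled.items(), key=lambda kv: kv[1])]


if __name__ == "__main__":
    for path in merge_middlewares(BASE, CUSTOM):
        print(path)
```

With this excerpt, InjectionMiddleware (543) runs between UserAgentMiddleware (500) and RetryMiddleware (550), and scrapy-poet's DownloaderStatsMiddleware takes the 850 slot that the disabled built-in DownloaderStats occupied.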

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also invoke pre-commit run --all-files or tox -e linters directly to run them without making a commit.

Download files

Source Distribution

scrapy-poet-0.22.0.tar.gz (57.1 kB)

Built Distribution

scrapy_poet-0.22.0-py3-none-any.whl (29.1 kB)

File details

Details for the file scrapy-poet-0.22.0.tar.gz.

File metadata

  • Download URL: scrapy-poet-0.22.0.tar.gz
  • Size: 57.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for scrapy-poet-0.22.0.tar.gz
Algorithm     Hash digest
SHA256        cb3ed9142c3af50199ab4fc290f6332e4f76b22f558902c52af9c65856175a4b
MD5           3fb0a268175dfc263aedbb9d7262c7ab
BLAKE2b-256   94a11c0908732a950a5655da5d5e7571a04adc3c9f348f26a1d689be6d88af05
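A file downloaded by hand can be checked against the digests listed above before installing it. This is a minimal sketch using only the standard library; `file_digest` and `verify` are hypothetical helper names, and the local file path is an assumption about where the download was saved. PyPI's BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for scrapy-poet-0.22.0.tar.gz
EXPECTED_SHA256 = "cb3ed9142c3af50199ab4fc290f6332e4f76b22f558902c52af9c65856175a4b"


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b truncated to 256 bits
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's hex digest matches the expected one."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

For an intact download, verify(Path("scrapy-poet-0.22.0.tar.gz"), EXPECTED_SHA256) should return True. pip can also enforce digests itself in hash-checking mode, using --hash=sha256:... options on requirement lines.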

File details

Details for the file scrapy_poet-0.22.0-py3-none-any.whl.

File hashes

Hashes for scrapy_poet-0.22.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        b40504a262cc77085398e1193b72d3e98c2efcb2391900064785dd6ccc1ea2a1
MD5           a2293dac28823ba8cf99fb0c388fac55
BLAKE2b-256   326f62bd28dbd919c99ecc1a901e4de915652d2dd240b35fc1ee91419a95ebde