
Page Object pattern for Scrapy

Project description


scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. scrapy-poet allows you to write spiders in which the extraction logic is separated from the crawling logic. With scrapy-poet, it is possible to write a single spider that supports many sites with different layouts.
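The separation described above can be illustrated without Scrapy at all. The following is a minimal, dependency-free sketch of the Page Object pattern, not scrapy-poet's or web-poet's API: `ProductPage`, `PAGE_OBJECTS`, and `parse` are hypothetical names, the two sites are made up, and regular expressions stand in for real selectors. In scrapy-poet itself, page objects come from web-poet and are injected into callbacks by the middleware; see the documentation for the real interface.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Product:
    name: str
    price: str


class ProductPage:
    """Hypothetical base class: wraps one downloaded page and holds the
    extraction logic, independent of how the page was crawled."""

    def __init__(self, url: str, html: str) -> None:
        self.url = url
        self.html = html

    def _first(self, pattern: str) -> str:
        # Return the first captured group, stripped, or "" if absent.
        match = re.search(pattern, self.html)
        return match.group(1).strip() if match else ""

    def to_item(self) -> Product:
        raise NotImplementedError


class ExampleComProductPage(ProductPage):
    # Layout assumed for the hypothetical site example.com.
    def to_item(self) -> Product:
        return Product(
            name=self._first(r"<h1>(.*?)</h1>"),
            price=self._first(r'<span class="price">(.*?)</span>'),
        )


class ExampleOrgProductPage(ProductPage):
    # A different layout assumed for the hypothetical site example.org.
    def to_item(self) -> Product:
        return Product(
            name=self._first(r'<div id="title">(.*?)</div>'),
            price=self._first(r"<b>Price:</b>(.*?)<"),
        )


# Hypothetical registry: which page object handles which domain.
PAGE_OBJECTS = {
    "example.com": ExampleComProductPage,
    "example.org": ExampleOrgProductPage,
}


def parse(url: str, html: str) -> Product:
    """Crawling-side callback: picks a page object by domain and delegates
    all extraction to it, so it holds no site-specific logic itself."""
    page_cls = PAGE_OBJECTS[urlparse(url).netloc]
    return page_cls(url, html).to_item()
```

Adding support for a new layout means adding one page object class and one registry entry; the `parse` callback stays unchanged.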

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Requires Python 3.8+ and Scrapy >= 2.6.0.
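To check an existing environment against these requirements, a small stand-alone helper like the following can be used. It is a sketch, not part of scrapy-poet: `parse_release`, `meets_requirements`, and `check_installed` are hypothetical names, and the version parsing is deliberately naive (numeric components only). For full PEP 440 handling, `packaging.version.Version` is the usual choice.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple:
    """Turn a release string such as '2.11.2' into (2, 11, 2).

    Naive on purpose: only the leading digits of each component are kept,
    so pre-release suffixes like 'rc1' are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.6' compares equal to '2.6.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_requirements(python_version: tuple, scrapy_version: str) -> bool:
    # Stated requirements: Python 3.8+ and Scrapy >= 2.6.0.
    return tuple(python_version[:2]) >= (3, 8) and parse_release(scrapy_version) >= (2, 6, 0)


def check_installed() -> str:
    """Report whether the running interpreter and installed Scrapy qualify."""
    try:
        installed = version("scrapy")
    except PackageNotFoundError:
        return "Scrapy is not installed"
    ok = meets_requirements(sys.version_info, installed)
    return f"Scrapy {installed}: {'OK' if ok else 'requirements not met'}"
```

Calling `print(check_installed())` from the target environment reports the result.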

Usage in a Scrapy Project

Add the following to your Scrapy project's settings.py file:

DOWNLOADER_MIDDLEWARES = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
    "scrapy_poet.RetryMiddleware": 275,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
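The numbers in these dicts are orders, and `None` disables a built-in component. The sketch below is a simplified, stand-alone model of how Scrapy combines a project's `DOWNLOADER_MIDDLEWARES` with its built-in `DOWNLOADER_MIDDLEWARES_BASE`. It is not Scrapy's actual implementation, `effective_order` is a hypothetical helper, and `BASE` is only a three-entry excerpt of the defaults. It shows how the configuration above swaps Scrapy's `DownloaderStats` for scrapy-poet's `DownloaderStatsMiddleware` at the same position (850) and slots `InjectionMiddleware` in at 543.

```python
# Small excerpt of Scrapy's built-in DOWNLOADER_MIDDLEWARES_BASE (orders as in
# recent Scrapy releases; check scrapy/settings/default_settings.py for yours).
BASE = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
}

# The project setting from settings.py.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}


def effective_order(base: dict, custom: dict) -> list:
    """Simplified model: project entries override base entries, None
    disables a component, and the rest are sorted by order value
    (a stable sort, so ties keep their dict order)."""
    merged = {**base, **custom}
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.__getitem__)


# process_request() runs through this list from first to last (lowest order,
# closest to the engine, first); process_response() runs in reverse.
for path in effective_order(BASE, DOWNLOADER_MIDDLEWARES):
    print(path)
```

`SPIDER_MIDDLEWARES` is combined with `SPIDER_MIDDLEWARES_BASE` in the same way, which is how scrapy-poet's `RetryMiddleware` is added at order 275.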

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also directly invoke pre-commit run --all-files or tox -e linters to run them without performing a commit.


Download files


Source Distribution

scrapy_poet-0.23.0.tar.gz (59.6 kB)

Uploaded Source

Built Distribution

scrapy_poet-0.23.0-py3-none-any.whl (30.1 kB)

Uploaded Python 3

File details

Details for the file scrapy_poet-0.23.0.tar.gz.

File metadata

  • Download URL: scrapy_poet-0.23.0.tar.gz
  • Upload date:
  • Size: 59.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for scrapy_poet-0.23.0.tar.gz
Algorithm Hash digest
SHA256 d650d62edf453afa57273f4c294262d33467a11cce7fe9a3db05388cf7dca007
MD5 9394df0c0a9415d4f76e2a475d219e24
BLAKE2b-256 024e708228d66b2fdf01fce617e736165593dfc61bb06459015bb1797d464910
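The published digests make it possible to confirm that a downloaded file is intact. A minimal sketch using only the standard library; `sha256_of`, `verify`, and `EXPECTED_SDIST_SHA256` are hypothetical names, and the digest is the SHA256 value listed for scrapy_poet-0.23.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for scrapy_poet-0.23.0.tar.gz.
EXPECTED_SDIST_SHA256 = "d650d62edf453afa57273f4c294262d33467a11cce7fe9a3db05388cf7dca007"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("scrapy_poet-0.23.0.tar.gz"), EXPECTED_SDIST_SHA256)` returns True only for an unmodified download. pip can perform the same check itself in hash-checking mode: pin `scrapy-poet==0.23.0 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`, in which case every dependency must be pinned with a hash as well.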

File details

Details for the file scrapy_poet-0.23.0-py3-none-any.whl.

File metadata

  • Download URL: scrapy_poet-0.23.0-py3-none-any.whl
  • Upload date:
  • Size: 30.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for scrapy_poet-0.23.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ff6a1f62a25cf2b7545778e75b7dada8b8e2b4895607a2aefc835d8111fcc680
MD5 c2f9d1d34a0a86f45e9e566a92bd75b4
BLAKE2b-256 3b17a6e9bbdf367e4d865dcf54377bd00a5eff5555ec87f4472485e11aa4dd1f