Page Object pattern for Scrapy

Project description

scrapy-poet is the web-poet Page Object pattern implementation for Scrapy. scrapy-poet lets you write spiders in which the extraction logic is separated from the crawling logic. With scrapy-poet, it is possible to write a single spider that supports many sites with different layouts.
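
To make the idea of separating extraction from crawling concrete, here is a minimal plain-Python sketch. It does not use the scrapy-poet or web-poet API; the names BookPage, ArticlePage, and crawl are hypothetical and exist only for illustration. Each page object knows how to extract data from one layout, and the crawling code only calls to_item().

```python
import re
from dataclasses import dataclass


@dataclass
class BookPage:
    """Hypothetical page object for a layout that puts the title in <h1>."""

    html: str

    def to_item(self) -> dict:
        match = re.search(r"<h1>(.*?)</h1>", self.html, re.S)
        return {"title": match.group(1).strip() if match else None}


@dataclass
class ArticlePage:
    """Hypothetical page object for a layout that uses <h2 class="headline">."""

    html: str

    def to_item(self) -> dict:
        match = re.search(r'<h2 class="headline">(.*?)</h2>', self.html, re.S)
        return {"title": match.group(1).strip() if match else None}


def crawl(pages) -> list:
    """Stand-in for the crawling side: it never inspects the HTML itself."""
    return [page.to_item() for page in pages]


items = crawl(
    [
        BookPage("<h1> Dune </h1>"),
        ArticlePage('<h2 class="headline">News</h2>'),
    ]
)
```

Because crawl() depends only on the to_item() method, supporting another site layout means adding another page object rather than changing the crawling code.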

Read the documentation for more information.

License is BSD 3-clause.

Quick Start

Installation

pip install scrapy-poet

Requires Python 3.8+ and Scrapy >= 2.6.0.

Usage in a Scrapy Project

Add the following inside Scrapy’s settings.py file:

DOWNLOADER_MIDDLEWARES = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}
SPIDER_MIDDLEWARES = {
    "scrapy_poet.RetryMiddleware": 275,
}
REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
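
The settings above map each middleware path to a priority number, and a value of None disables a middleware. The following is a rough sketch of how such a dictionary resolves into an ordered list. It is not Scrapy's actual implementation: effective_order is a hypothetical helper, and BASE_EXCERPT is a one-entry stand-in for Scrapy's built-in defaults, with the priority of the built-in stats middleware assumed to be 850.

```python
def effective_order(base: dict, project: dict) -> list:
    """Merge defaults with project settings, drop disabled entries, sort by priority."""
    merged = {**base, **project}  # project values override the defaults
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


# Assumption: a one-entry excerpt of the built-in defaults, for illustration only.
BASE_EXCERPT = {
    "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
}

# The DOWNLOADER_MIDDLEWARES value from the settings above.
PROJECT = {
    "scrapy_poet.InjectionMiddleware": 543,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
    "scrapy_poet.DownloaderStatsMiddleware": 850,
}

order = effective_order(BASE_EXCERPT, PROJECT)
```

Under these assumptions, the built-in stats middleware is switched off and the scrapy-poet one takes its slot at priority 850, after the injection middleware at 543.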

Developing

Set up your local Python environment via:

  1. pip install -r requirements-dev.txt

  2. pre-commit install

Now every time you perform a git commit, these tools will run against the staged files:

  • black

  • isort

  • flake8

You can also invoke pre-commit run --all-files or tox -e linters directly to run them without performing a commit.

Download files

Source Distribution

scrapy_poet-0.22.5.tar.gz (57.8 kB)

Uploaded Source

Built Distribution

scrapy_poet-0.22.5-py3-none-any.whl (29.3 kB)

Uploaded Python 3

File details

Details for the file scrapy_poet-0.22.5.tar.gz.

File metadata

  • Download URL: scrapy_poet-0.22.5.tar.gz
  • Size: 57.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for scrapy_poet-0.22.5.tar.gz
Algorithm Hash digest
SHA256 b388834664676e0b56e02d7162a14f7b53e9bf2c76a8072e5c646bcd7d8afcd7
MD5 0c1efdafc2fddfade3f302af056eb895
BLAKE2b-256 b811b69afecb41ee782c3ccd9fb9c708ef38e0aaa53ec27147510dfc57f56be2
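
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library; the helper names sha256_hex and matches_published_hash are hypothetical, and the archive is assumed to have been saved in the current directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_hex(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_hex(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "b388834664676e0b56e02d7162a14f7b53e9bf2c76a8072e5c646bcd7d8afcd7"

# Assumption: the archive was downloaded into the current directory.
archive = Path("scrapy_poet-0.22.5.tar.gz")
if archive.exists():
    print("hash matches:", matches_published_hash(archive, EXPECTED_SHA256))
```

The same helper works for the wheel if its own SHA256 digest is passed instead.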

File details

Details for the file scrapy_poet-0.22.5-py3-none-any.whl.

File metadata

  • Download URL: scrapy_poet-0.22.5-py3-none-any.whl
  • Size: 29.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for scrapy_poet-0.22.5-py3-none-any.whl
Algorithm Hash digest
SHA256 c23896016cf64ae6f85bf7ba01b42d60a6c710527e003ab49c6f8eea59a0eb7c
MD5 a926cda7eb965bdf56a49777a5f4cbb5
BLAKE2b-256 eac0f1d0660bfd07f96cb5364077379760c0c66c8dae32f34537cfe54ab85080
