
Put Scrapy spiders behind an HTTP API

Project description


Introduction

An HTTP server that provides an API for scheduling Scrapy spiders and making requests with them.

Features

  • Allows you to easily add an HTTP API to your existing Scrapy project

  • All Scrapy project components (e.g. middleware, pipelines, extensions) are supported out of the box.

  • You simply run Scrapyrt in a Scrapy project directory and it starts an HTTP server that lets you schedule your spiders and get spider output in JSON format.

Note

  • The project is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long crawls

  • Not suitable for long-running spiders; well suited to spiders that fetch one response from a website and return it

Getting started

To install Scrapyrt:

pip install scrapyrt

Now you can run Scrapyrt from within a Scrapy project by typing:

scrapyrt

in the Scrapy project directory.

Scrapyrt will look for a scrapy.cfg file to determine your project settings, and will raise an error if it doesn't find one. Note that you need to have all your project requirements installed.

Scrapyrt supports the endpoint /crawl.json, which can be requested with two methods: GET and POST.
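For the POST method, the parameters travel in a JSON body instead of query-string arguments. A minimal standard-library sketch, assuming the body layout {"spider_name": ..., "request": {"url": ...}} described in the Scrapyrt docs (check the docs for your version); build_crawl_body and post_crawl are illustrative helper names, not part of Scrapyrt itself:

```python
import json
from urllib import request as urlrequest

def build_crawl_body(spider_name, url):
    """Build the JSON body for the POST form of /crawl.json: the spider name
    plus a Scrapy request object, mirroring the GET parameters."""
    return {"spider_name": spider_name, "request": {"url": url}}

def post_crawl(spider_name, url, endpoint="http://localhost:9080/crawl.json"):
    """POST a crawl request to a locally running Scrapyrt instance and
    return the decoded JSON reply."""
    data = json.dumps(build_crawl_body(spider_name, url)).encode("utf-8")
    req = urlrequest.Request(endpoint, data=data,
                             headers={"Content-Type": "application/json"})
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)
```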

To run the sample toscrape-css spider from Quotesbot, parsing a page of famous quotes:

curl "http://localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

To run the same spider, allowing only one request and parsing the URL with the callback parse_foo:

curl "http://localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/&callback=parse_foo&max_requests=1"
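The same GET requests can be built from Python using only the standard library. A minimal sketch; crawl_url is an illustrative helper, and fetching the returned URL (e.g. with urllib.request) against a running Scrapyrt instance would yield the spider output as JSON:

```python
from urllib.parse import urlencode

# Default endpoint of a locally running Scrapyrt instance (port 9080).
ENDPOINT = "http://localhost:9080/crawl.json"

def crawl_url(spider_name, url, **extra):
    """Build the /crawl.json GET URL for a spider, with optional extra
    arguments such as callback or max_requests."""
    params = {"spider_name": spider_name, "url": url, **extra}
    return ENDPOINT + "?" + urlencode(params)

# Equivalent to the curl example with callback and max_requests above
# (urlencode percent-encodes the target URL, which curl left literal):
print(crawl_url("toscrape-css", "http://quotes.toscrape.com/",
                callback="parse_foo", max_requests=1))
```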

Documentation

Documentation is available on Read the Docs.

Support

Open source support is provided on GitHub. Please create a question issue (i.e. an issue with the "question" label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under the BSD 3-Clause license.

Development

Development takes place on GitHub.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapyrt-0.12.0.tar.gz (27.8 kB)


Built Distribution

scrapyrt-0.12.0-py2.py3-none-any.whl (35.1 kB)


File details

Details for the file scrapyrt-0.12.0.tar.gz.

File metadata

  • Download URL: scrapyrt-0.12.0.tar.gz
  • Upload date:
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7

File hashes

Hashes for scrapyrt-0.12.0.tar.gz:

  • SHA256: e73687dbadd05aa7403c9762f9a487b1e88e4aa701a6087416ceffe61cad7d45
  • MD5: a85b2be202cd7a9ffabc2d15206b9e1c
  • BLAKE2b-256: 356499f0414577b690145861efa33f7c968b34d5a36d48ab0c755a437e8d902b



File details

Details for the file scrapyrt-0.12.0-py2.py3-none-any.whl.

File metadata

  • Download URL: scrapyrt-0.12.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 35.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7

File hashes

Hashes for scrapyrt-0.12.0-py2.py3-none-any.whl:

  • SHA256: e69691dc883bd628b331b0112637170f21de6ca13ac08a415f83a288edf150d1
  • MD5: efae64ad475e36c44a24b1800c44d25c
  • BLAKE2b-256: 6d20d7eed0c0ecb10533a3bcd6c7705d17a04fa4a37a3662f5f84bc289c03b7a


