Put Scrapy spiders behind an HTTP API

Project description

ScrapyRT (Scrapy realtime)

Add an HTTP API to your Scrapy project in minutes.

You send a request to ScrapyRT with a spider name and a URL, and in response you get the items collected by that spider when it visits the URL.

  • All Scrapy project components (e.g. middleware, pipelines, extensions) are supported.

  • You run ScrapyRT in the Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get their output as JSON.

Quickstart

1. Install ScrapyRT

> pip install scrapyrt

2. Switch to your Scrapy project directory (e.g. the quotesbot project)

> cd my/project_path/is/quotesbot

3. Launch ScrapyRT

> scrapyrt

4. Run your spiders

> curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

5. Run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider

> curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v
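The two curl calls in steps 4 and 5 can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names (build_get_request, build_post_request, send) are hypothetical and not part of ScrapyRT, and setting a JSON Content-Type header is an assumption (curl --data does not send one). The script only builds and prints the requests, so it runs without a ScrapyRT server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Address used in the quickstart; adjust if ScrapyRT listens elsewhere.
BASE_URL = "http://localhost:9080/crawl.json"


def build_get_request(spider_name, url):
    """Step 4: spider name and start URL passed as query parameters."""
    query = urlencode({"spider_name": spider_name, "url": url})
    return Request(f"{BASE_URL}?{query}", method="GET")


def build_post_request(spider_name, request, crawl_args=None):
    """Step 5: the same fields as the curl --data JSON body."""
    payload = {"spider_name": spider_name, "request": request}
    if crawl_args:
        payload["crawl_args"] = crawl_args
    body = json.dumps(payload).encode("utf-8")
    # Assumption: a JSON content type is acceptable to the server.
    headers = {"Content-Type": "application/json"}
    return Request(BASE_URL, data=body, headers=headers, method="POST")


def send(req, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


get_req = build_get_request("toscrape-css", "http://quotes.toscrape.com/")
post_req = build_post_request(
    "toscrape-css",
    {"url": "http://quotes.toscrape.com/page/2/", "callback": "some_callback"},
    crawl_args={"zipcode": "14000"},
)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data.decode("utf-8"))
```

With ScrapyRT running in the project directory, calling send(get_req) or send(post_req) would perform the crawl and return the decoded JSON response.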

ScrapyRT looks for a scrapy.cfg file to determine your project settings and raises an error if it cannot find one. Note that all of your project's requirements must be installed.
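A quick pre-flight check can confirm that a scrapy.cfg file is reachable before launching the server. This is an illustrative sketch only: the helper find_scrapy_cfg is hypothetical, and searching the parent directories as well as the current one is an assumption, not a description of how ScrapyRT itself performs the lookup.

```python
from pathlib import Path


def find_scrapy_cfg(start="."):
    """Return the nearest scrapy.cfg at or above `start`, or None if absent."""
    start_dir = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (start_dir, *start_dir.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


cfg = find_scrapy_cfg()
if cfg is None:
    print("No scrapy.cfg found; run scrapyrt from inside a Scrapy project.")
else:
    print(f"Found project config: {cfg}")
```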

Note

  • The project is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long-running crawls.

  • It is not suitable for long-running spiders. It works well for spiders that fetch one response from a website and return items quickly.

Documentation

Documentation is available on Read the Docs.

Support

Open source support is provided on GitHub. Please create a question issue (i.e. an issue with the “question” label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under the BSD 3-Clause license.

Development

Development takes place on GitHub.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapyrt-0.13.0.tar.gz (29.4 kB view details)

Uploaded Source

Built Distribution

scrapyrt-0.13.0-py2.py3-none-any.whl (36.3 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file scrapyrt-0.13.0.tar.gz.

File metadata

  • Download URL: scrapyrt-0.13.0.tar.gz
  • Upload date:
  • Size: 29.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for scrapyrt-0.13.0.tar.gz
Algorithm Hash digest
SHA256 cd193a887199ef253a19d902acd8ea99c13fc889fae41b3e3ee63d4449e6cc08
MD5 6217dae175f32f7cade86005cb07bd3c
BLAKE2b-256 eeb66b783c8a997a7c8217165445e04a3f68678595e7d3e5b158bb68f07aaebb

File details

Details for the file scrapyrt-0.13.0-py2.py3-none-any.whl.

File metadata

  • Download URL: scrapyrt-0.13.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 36.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for scrapyrt-0.13.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 2d033650535c8c0ee9474657e97fc78eb50642fa7e1be0c61420a1b38c81181c
MD5 bc76888be29f3d3e2fbb6b7821ac4de7
BLAKE2b-256 2d3c6a43502e67ddfa7f4e62604de7a3d793dc9b1d6106db60e705510837f9b0
