
Put Scrapy spiders behind an HTTP API

Project description


Introduction

An HTTP server that provides an API for scheduling Scrapy spiders and making requests with them.

Features

  • Allows you to easily add an HTTP API to your existing Scrapy project

  • All Scrapy project components (e.g. middleware, pipelines, extensions) are supported out of the box.

  • You simply run Scrapyrt in a Scrapy project directory and it starts an HTTP server that lets you schedule your spiders and get spider output in JSON format.

Note

  • The project is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long crawls

  • Not suitable for long-running spiders; well suited to spiders that fetch one response from a website and return it

Getting started

To install Scrapyrt:

pip install scrapyrt

Now you can run Scrapyrt from within a Scrapy project by typing:

scrapyrt

in the Scrapy project directory.

Scrapyrt will look for a scrapy.cfg file to determine your project settings, and will raise an error if it doesn't find one. Note that you need to have all your project requirements installed.

Scrapyrt supports the endpoint /crawl.json, which can be requested with two methods: GET and POST.
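For the POST method, the parameters travel in a JSON body instead of query-string arguments. A minimal standard-library sketch, assuming the body layout {"spider_name": ..., "request": {"url": ...}} described in the Scrapyrt docs (check the docs for your version); build_crawl_body and post_crawl are illustrative helper names, not part of Scrapyrt itself:

```python
import json
from urllib import request as urlrequest

def build_crawl_body(spider_name, url):
    """Build the JSON body for the POST form of /crawl.json: the spider name
    plus a Scrapy request object, mirroring the GET parameters."""
    return {"spider_name": spider_name, "request": {"url": url}}

def post_crawl(spider_name, url, endpoint="http://localhost:9080/crawl.json"):
    """POST a crawl request to a locally running Scrapyrt instance and
    return the decoded JSON reply."""
    data = json.dumps(build_crawl_body(spider_name, url)).encode("utf-8")
    req = urlrequest.Request(endpoint, data=data,
                             headers={"Content-Type": "application/json"})
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)
```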

To run the sample toscrape-css spider from Quotesbot, parsing a page of famous quotes:

curl "http://localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

To run the same spider, allowing only one request and parsing the URL with the callback parse_foo:

curl "http://localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/&callback=parse_foo&max_requests=1"
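The same GET requests can be built from Python using only the standard library. A minimal sketch; crawl_url is an illustrative helper, and fetching the returned URL (e.g. with urllib.request) against a running Scrapyrt instance would yield the spider output as JSON:

```python
from urllib.parse import urlencode

# Default endpoint of a locally running Scrapyrt instance (port 9080).
ENDPOINT = "http://localhost:9080/crawl.json"

def crawl_url(spider_name, url, **extra):
    """Build the /crawl.json GET URL for a spider, with optional extra
    arguments such as callback or max_requests."""
    params = {"spider_name": spider_name, "url": url, **extra}
    return ENDPOINT + "?" + urlencode(params)

# Equivalent to the curl example with callback and max_requests above
# (urlencode percent-encodes the target URL, which curl left literal):
print(crawl_url("toscrape-css", "http://quotes.toscrape.com/",
                callback="parse_foo", max_requests=1))
```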

Documentation

Documentation is available on Read the Docs.

Support

Open source support is provided on GitHub. Please create a question issue (i.e. an issue with the "question" label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under the BSD 3-Clause license.

Development

Development takes place on GitHub.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapyrt-0.12.0.tar.gz (27.8 kB)


Built Distribution

scrapyrt-0.12.0-py2.py3-none-any.whl (35.1 kB)


File details

Details for the file scrapyrt-0.12.0.tar.gz.

File metadata

  • Download URL: scrapyrt-0.12.0.tar.gz
  • Upload date:
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7

File hashes

Hashes for scrapyrt-0.12.0.tar.gz:

  • SHA256: e73687dbadd05aa7403c9762f9a487b1e88e4aa701a6087416ceffe61cad7d45
  • MD5: a85b2be202cd7a9ffabc2d15206b9e1c
  • BLAKE2b-256: 356499f0414577b690145861efa33f7c968b34d5a36d48ab0c755a437e8d902b



File details

Details for the file scrapyrt-0.12.0-py2.py3-none-any.whl.

File metadata

  • Download URL: scrapyrt-0.12.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 35.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.7

File hashes

Hashes for scrapyrt-0.12.0-py2.py3-none-any.whl:

  • SHA256: e69691dc883bd628b331b0112637170f21de6ca13ac08a415f83a288edf150d1
  • MD5: efae64ad475e36c44a24b1800c44d25c
  • BLAKE2b-256: 6d20d7eed0c0ecb10533a3bcd6c7705d17a04fa4a37a3662f5f84bc289c03b7a


