
Project description

scrapelib is a library for making requests to less-than-reliable websites. As of version 0.7, it is implemented as a wrapper around requests.

scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures. As a result, it was designed with features that are desirable when dealing with sites that have intermittent errors or require rate-limiting.

Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:

  • All of the power of the superb requests library.

  • HTTP, HTTPS, and FTP requests via an identical API

  • support for simple caching with pluggable cache backends

  • request throttling

  • configurable retries for non-permanent site failures
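To make the throttling and retry features above more concrete, here is a minimal standard-library sketch of the two ideas. It is not scrapelib's implementation: the names `Throttle` and `with_retries`, the injectable `clock`/`sleep` parameters, and the doubling wait between retries are all assumptions chosen for illustration.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between successive calls."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        # 10 requests per minute means at least 6 seconds between requests.
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def with_retries(func, attempts, wait_seconds, sleep=time.sleep):
    """Call func(); on an exception, retry up to `attempts` more times.

    The wait doubles after each failure (an assumption of this sketch).
    """
    for attempt in range(attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of retries, surface the last error
            sleep(wait_seconds * 2 ** attempt)
```

In a scraper, one would call `wait()` on a shared `Throttle` before each request and wrap the request itself in `with_retries`, so that transient failures are retried without exceeding the rate limit.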

Written by James Turk <dev@jamesturk.net>, with thanks to Michael Stephens for the initial urllib2/httplib2 version.

See https://github.com/jamesturk/scrapelib/graphs/contributors for contributors.

Requirements

  • python >=3.7

  • requests >= 2.0

Example Usage

Documentation: http://scrapelib.readthedocs.org/en/latest/

import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)

# Grab Google front page
s.get('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')


Download files

Source Distribution

scrapelib-2.0.4.tar.gz (15.4 kB)

Uploaded Source

Built Distribution

scrapelib-2.0.4-py3-none-any.whl (16.3 kB)

Uploaded Python 3

File details

Details for the file scrapelib-2.0.4.tar.gz.

File metadata

  • Download URL: scrapelib-2.0.4.tar.gz
  • Size: 15.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.4 CPython/3.9.1 Darwin/20.3.0

File hashes

Hashes for scrapelib-2.0.4.tar.gz
Algorithm    Hash digest
SHA256       460d6b620e35ee36f0e37f6e8c6a5a27f38c08f9293ad9b7ed139ac3fa191eb2
MD5          a68d8e65ced081a01a6e39873c9fa0fb
BLAKE2b-256  22f5a7abe9e85e835246403bf0e4d1c9dc5a49a36fb6c16896f40a17d1d4e8ce

File details

Details for the file scrapelib-2.0.4-py3-none-any.whl.

File metadata

  • Download URL: scrapelib-2.0.4-py3-none-any.whl
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.4 CPython/3.9.1 Darwin/20.3.0

File hashes

Hashes for scrapelib-2.0.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       1455d443bd0cda14d240c8e5c746c64c50aef94beb8b5821417e21ff1c79595a
MD5          9fd2b13cf67c8a50d536c14e8ba77c81
BLAKE2b-256  ea4aa7f788c7a6afec11233f7763810e1569cffb46d7fb35baf29a43df3c36b3
