a library for scraping things

Project description

scrapelib is a library for making requests to less-than-reliable websites. As of version 0.7, it is implemented as a wrapper around requests.

scrapelib originated as part of the Open States project, which scrapes the websites of all 50 state legislatures. As a result, it was designed with features that help when dealing with sites that have intermittent errors or require rate limiting.

Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:

  • All of the power of the superb requests library.

  • HTTP, HTTPS, and FTP requests via an identical API

  • support for simple caching with pluggable cache backends

  • request throttling

  • configurable retries for non-permanent site failures
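
To make the last bullet concrete, here is a minimal, standard-library-only sketch of what retrying a non-permanent failure can look like. It is not scrapelib's implementation: TransientError, fetch_with_retries, and the flaky demo function are hypothetical names invented for this illustration, and doubling the wait after each failure is an assumption of the sketch rather than something stated above.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a non-permanent failure (e.g. an HTTP 5xx)."""


def fetch_with_retries(fetch, retry_attempts=3, retry_wait_seconds=5,
                       sleep=time.sleep):
    """Call fetch() until it succeeds or the retries are used up.

    The wait doubles after each failed attempt (an assumption of this sketch).
    """
    wait = retry_wait_seconds
    for attempt in range(retry_attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == retry_attempts:
                raise  # out of retries: surface the last error
            sleep(wait)
            wait *= 2


# Demo: a callable that fails twice, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("temporary failure")
    return "ok"


# Record the waits in a list instead of actually sleeping.
waits = []
result = fetch_with_retries(flaky, retry_attempts=3, retry_wait_seconds=5,
                            sleep=waits.append)
```

Passing sleep as a parameter keeps the sketch easy to try without real delays. In the demo, result ends up as "ok" after two recorded waits.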

scrapelib is a project of Sunlight Labs released under a BSD-style license; see LICENSE for details.

Written by James Turk <jturk@sunlightfoundation.com>

Contributors:
  • Michael Stephens - initial urllib2/httplib2 version

  • Joe Germuska - fix for IPython embedding

  • Alex Chiang - fix to test suite

Requirements

  • python 2.7, 3.3, 3.4

  • requests >= 1.0

Installation

scrapelib is available on PyPI and can be installed via pip install scrapelib

PyPI package: http://pypi.python.org/pypi/scrapelib

Source: http://github.com/sunlightlabs/scrapelib

Documentation: http://scrapelib.readthedocs.org/en/latest/

Example Usage

import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)

# Grab Google front page
s.get('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')
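
The arithmetic behind requests_per_minute=10 is one request every 60 / 10 = 6 seconds. The sketch below shows one simple way such a throttle could work, using only the standard library. It is an illustration under stated assumptions, not scrapelib's actual code: the Throttle class and its clock and sleep parameters are hypothetical names introduced here.

```python
import time


class Throttle:
    """Space calls at least 60 / requests_per_minute seconds apart.

    Hypothetical helper for illustration; not part of scrapelib.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic,
                 sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(requests_per_minute=10)
# throttle.interval is 6.0: call throttle.wait() before each request.
```

The first call to wait() returns immediately. Later calls sleep only for whatever part of the 6-second interval has not already elapsed, so time spent processing a response counts toward the delay.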
