
a library for scraping things

Project description


scrapelib is a library for making requests to less-than-reliable websites. As of 0.7, it is implemented as a wrapper around requests.

scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures, and as a result was designed with features desirable when dealing with sites that have intermittent errors or require rate-limiting.

Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:

  • All of the power of the superb requests library.

  • HTTP, HTTPS, and FTP requests via an identical API

  • support for simple caching with pluggable cache backends

  • request throttling

  • configurable retries for non-permanent site failures
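To make the "configurable retries" idea concrete, here is a minimal sketch of retrying a request after a temporary failure, waiting longer before each new attempt. It is not scrapelib's implementation: the function name `fetch_with_retries` and the parameters `max_retries`, `base_wait`, and `sleep` are hypothetical names chosen for illustration, and treating `OSError` as the "non-permanent" failure is an assumption.

```python
import time


def fetch_with_retries(fetch, max_retries=3, base_wait=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries are used up.

    All names here are hypothetical; this only illustrates the general
    retry-with-backoff pattern, not scrapelib's own code.
    """
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except OSError:
            # Assumption: OSError stands in for a temporary site failure.
            if attempt == max_retries:
                raise  # out of retries, let the caller see the error
            sleep(wait)
            wait *= 2  # double the wait before the next attempt
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; in ordinary use the default `time.sleep` applies.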

scrapelib is a project of Sunlight Labs released under a BSD-style license; see LICENSE for details.

Written by James Turk <jturk@sunlightfoundation.com>

Contributors:
  • Michael Stephens - initial urllib2/httplib2 version

  • Joe Germuska - fix for IPython embedding

  • Alex Chiang - fix to test suite

Requirements

  • Python 2.7, 3.3, 3.4

  • requests >= 1.0

Installation

scrapelib is available on PyPI and can be installed via pip install scrapelib

PyPI package: http://pypi.python.org/pypi/scrapelib

Source: http://github.com/sunlightlabs/scrapelib

Documentation: http://scrapelib.readthedocs.org/en/latest/

Example Usage

import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)

# Grab Google front page
s.urlopen('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.urlopen('http://example.com')
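For intuition about what a limit of 10 requests per minute means, the following sketch spaces calls at least 6 seconds apart (60 / 10). It is an illustration under stated assumptions, not how scrapelib itself enforces the limit: the `Throttle` class and its `clock` and `sleep` parameters are hypothetical names introduced here.

```python
import time


class Throttle:
    """Space calls so that at most `requests_per_minute` happen per minute.

    Hypothetical helper for illustration; not part of scrapelib.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `wait()` before each request would give the same observable effect as the loop above: the first call returns immediately and each later call is delayed until the minimum interval has elapsed.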

