a library for scraping things

Project description

scrapelib is a library for making requests to less-than-reliable websites. As of 0.7, it is implemented as a wrapper around requests.

scrapelib originated as part of the Open States project to scrape the websites of all 50 state legislatures. As a result, it was designed with features that are desirable when dealing with sites that have intermittent errors or require rate-limiting.

Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:

  • All of the power of the superb requests library.

  • HTTP, HTTPS, and FTP requests via an identical API

  • support for simple caching with pluggable cache backends

  • request throttling

  • configurable retries for non-permanent site failures
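
To make the retry idea in the list above concrete, here is a minimal standard-library sketch of retrying a request after a temporary failure, doubling the wait between attempts. It is not scrapelib's implementation: the function name fetch_with_retries and its parameters (retry_attempts, retry_wait_seconds, retry_on, sleep) are hypothetical and chosen only for illustration.

```python
import time


def fetch_with_retries(fetch, retry_attempts=3, retry_wait_seconds=5,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds or the retry budget is used up.

    fetch is any zero-argument callable that raises one of the
    exceptions in retry_on when a temporary failure occurs.
    All names here are illustrative assumptions, not scrapelib API.
    """
    wait = retry_wait_seconds
    for attempt in range(retry_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == retry_attempts:
                raise  # out of retries: surface the last error
            sleep(wait)
            wait *= 2  # back off: double the wait after each failure
```

Passing the sleep function as a parameter keeps the sketch easy to test without real delays; in ordinary use the default time.sleep applies.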

scrapelib is a project of Sunlight Labs, released under a BSD-style license; see LICENSE for details.

Written by James Turk <jturk@sunlightfoundation.com>

Contributors:
  • Michael Stephens - initial urllib2/httplib2 version

  • Joe Germuska - fix for IPython embedding

  • Alex Chiang - fix to test suite

Requirements

  • Python 2.7, 3.3, or 3.4

  • requests >= 1.0

Installation

scrapelib is available on PyPI and can be installed via pip install scrapelib

PyPI package: http://pypi.python.org/pypi/scrapelib

Source: http://github.com/sunlightlabs/scrapelib

Documentation: http://scrapelib.readthedocs.org/en/latest/

Example Usage

import scrapelib
s = scrapelib.Scraper(requests_per_minute=10)

# Grab Google front page
s.urlopen('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.urlopen('http://example.com')
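
As a rough illustration of what throttling to a fixed number of requests per minute involves, the sketch below spaces calls at least 60 / requests_per_minute seconds apart. It uses only the standard library and is an assumption about the general technique, not a description of how scrapelib does it; the Throttle class and its wait method are hypothetical names.

```python
import time


class Throttle:
    """Space calls so that at most requests_per_minute happen per minute.

    Hypothetical helper for illustration; not part of scrapelib.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic,
                 sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_request = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now
```

A caller would create one Throttle(10) and call its wait() method immediately before each request, so the first request goes out at once and later ones are delayed only when they arrive too soon.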

Download files

Source Distribution

scrapelib-0.10.0.tar.gz (12.3 kB)

Uploaded Source

Built Distribution

scrapelib-0.10.0-py2.py3-none-any.whl (15.2 kB)

Uploaded Python 2 Python 3

File details

Details for the file scrapelib-0.10.0.tar.gz.

File metadata

  • Download URL: scrapelib-0.10.0.tar.gz
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for scrapelib-0.10.0.tar.gz
Algorithm Hash digest
SHA256 7118a8daeb3b5578dab219e47d990a186c969ea25d4df229461b6facd0ac2b26
MD5 6171f845ee3373725eb8c7f5323fdda2
BLAKE2b-256 ae315794f3106af97c39ea2c4f4728060fa41a6767d9f2aea42f4b924632d284

File details

Details for the file scrapelib-0.10.0-py2.py3-none-any.whl.

File hashes

Hashes for scrapelib-0.10.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 eb529bf1bde9b0785c9f9e12be81ea729e7592c167a3e0e00fde6fdc6f7bfaec
MD5 e2b84a12a970a7eea35db47a9e9ad4c2
BLAKE2b-256 243c082d5819538810a4241cd3879f45b850dd8e342b885d1cfb1af88731ab10
