Skip to main content

Site Scraping Framework

Project description

https://travis-ci.org/lorien/grab.png

Grab is a python site scraping framework. Grab provides powerful interface to two libraries: lxml and pycurl. There are two ways how to use Grab: 1) Use Grab to configure network requests and to process fetched documents. In this way you should manually control flow of you program. 2) Use Grab::Spider to buld asynchronous site scrapers. This is how scrapy works.

Example of Grab usage:

from grab import Grab

g = Grab()
g.go('https://github.com/login')
g.set_input('login', 'lorien')
g.set_input('password', '***')
g.submit()
for elem in g.doc.select('//ul[@id="repo_listing"]/li/a'):
    print '%s: %s' % (elem.text(), elem.attr('href'))

Example of Grab::Spider usage:

from grab.spider import Spider, Task
import logging

class ExampleSpider(Spider):
    def task_generator(self):
        for lang in ('python', 'ruby', 'perl'):
            url = 'https://www.google.com/search?q=%s' % lang
            yield Task('search', url=url)

    def task_search(self, grab, task):
        print grab.doc.select('//div[@class="s"]//cite').text()


logging.basicConfig(level=logging.DEBUG)
bot = ExampleSpider()
bot.run()

Installation

Pip is recommended way to install Grab and its dependencies:

$ pip install lxml
$ pip install pycurl
$ pip install grab

Documentation

Russian docs: http://grablib.org/docs/ English docs in progress.

Discussion group (Russian or English): http://groups.google.com/group/python-grab/

Contribution

If you found a bug or if you want new feature please create new issue on bitbucket or github:

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grab-0.4.11.tar.gz (115.8 kB view details)

Uploaded Source

File details

Details for the file grab-0.4.11.tar.gz.

File metadata

  • Download URL: grab-0.4.11.tar.gz
  • Upload date:
  • Size: 115.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for grab-0.4.11.tar.gz
Algorithm Hash digest
SHA256 21c49f8dab2671081d755a069d6e6c26d81efb52c9c23b635d74330089e72e2c
MD5 f02e9a5bf15dd326921db30967555791
BLAKE2b-256 dc0e8fb51c0ddb1f28f4be4847f7824e9eaf21180ab89ab3846a5cecbb11b407

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page