Web Scraping Framework

Grab

What is Grab?

Grab is a Python web scraping framework. It provides many helper methods for scraping web sites and processing the scraped content:

  • Automatic cookies (session) support

  • HTTP and SOCKS proxy with and without authorization

  • Keep-Alive support

  • IDN support

  • Tools to work with web forms

  • Easy multipart file uploading

  • Flexible customization of HTTP requests

  • Automatic charset detection

  • Powerful API for extracting data from HTML documents with XPath queries

  • Asynchronous API for making thousands of simultaneous requests. This part of the library is called Spider, and it has too many features to list in this README.

  • Python 3 ready
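To make the IDN item above concrete: an internationalized domain name has to be converted to its ASCII (Punycode) form before it can be used in a request. The sketch below shows that conversion using only the Python standard library's `idna` codec. It illustrates the concept and is not Grab's own code path; the helper `to_ascii_host` and the host name `bücher.example` are made up for this illustration.

```python
# Illustration only: convert an internationalized host name to ASCII
# with the standard library's "idna" codec.
# The helper name and the host name are hypothetical examples.

def to_ascii_host(host):
    """Return the ASCII (Punycode) form of a host name."""
    return host.encode("idna").decode("ascii")


ascii_host = to_ascii_host("bücher.example")
print(ascii_host)
```

A host name that is already plain ASCII passes through the codec unchanged.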

Grab Example

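from grab import Grab
import logging

# Enable debug logging to see what Grab is doing
logging.basicConfig(level=logging.DEBUG)
g = Grab()

# Open the login page, fill in the form fields and submit the form
g.go('https://github.com/login')
g.set_input('login', '***')
g.set_input('password', '***')
g.submit()
# Save the response body to a file for inspection
g.doc.save('/tmp/x.html')

# Fail early if the sign-out icon is missing, i.e. the login did not work
g.doc('//span[contains(@class, "octicon-sign-out")]').assert_exists()
# Read the link to the user's home page and build the repository list URL
home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
repo_url = home_url + '?tab=repositories'

# Print the name and absolute URL of every repository on the page
g.go(repo_url)
for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
    print('%s: %s' % (elem.text(),
                      g.make_url_absolute(elem.attr('href'))))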
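The last step of the example turns each relative `href` into a full URL. As a rough sketch of what that resolution means, the standard library's `urllib.parse.urljoin` does the same kind of job against a base URL. This is an analogy, not Grab's implementation; the helper `absolutize`, the base URL and the paths below are hypothetical.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url, as a browser would."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical page URL and links, chosen only for illustration
base = "https://github.com/someuser?tab=repositories"
links = absolutize(base, [
    "/someuser/project",      # root-relative path
    "other/page",             # path relative to the current page
    "https://example.com/x",  # already absolute, left unchanged
])

for link in links:
    print(link)
```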

Grab::Spider Example

from grab.spider import Spider, Task
import logging

class ExampleSpider(Spider):
    # Produce the initial tasks: one search request per language
    def task_generator(self):
        for lang in ('python', 'ruby', 'perl'):
            url = 'https://www.google.com/search?q=%s' % lang
            yield Task('search', url=url, lang=lang)

    # Handle the response of every task named 'search'
    def task_search(self, grab, task):
        print('%s: %s' % (task.lang,
                          grab.doc('//div[@class="s"]//cite').text()))


logging.basicConfig(level=logging.DEBUG)
bot = ExampleSpider()
bot.run()
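The example relies on a naming convention: a task created as `Task('search', ...)` is handled by the method `task_search`. The toy code below sketches only that name-based dispatch, with no networking and no concurrency. `MiniTask`, `MiniSpider` and `LangSpider` are hypothetical stand-ins written for this illustration, not Grab classes, and the real Spider does considerably more.

```python
class MiniTask:
    """Hypothetical stand-in for a task: a name plus arbitrary attributes."""

    def __init__(self, name, **attrs):
        self.name = name
        for key, value in attrs.items():
            setattr(self, key, value)


class MiniSpider:
    """Toy dispatcher: routes each task to the method 'task_<name>'."""

    def task_generator(self):
        # Subclasses override this to yield their initial tasks
        return iter(())

    def run(self):
        results = []
        for task in self.task_generator():
            # Look up the handler by the task's name
            handler = getattr(self, "task_" + task.name)
            results.append(handler(task))
        return results


class LangSpider(MiniSpider):
    def task_generator(self):
        for lang in ("python", "ruby", "perl"):
            yield MiniTask("search", lang=lang)

    def task_search(self, task):
        return "handled search for %s" % task.lang


results = LangSpider().run()
print(results)
```

Because the handler is looked up by name at run time, adding a new kind of task in this sketch only requires yielding a task with a new name and defining the matching `task_<name>` method.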

Installation

Pip is the recommended way to install Grab and its dependencies:

$ pip install -U grab

For details, see http://docs.grablib.org/en/latest/usage/installation.html
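After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses `importlib.metadata` from the standard library (Python 3.8 or newer); the helper name `installed_version` is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("grab")
print("grab version:", found if found else "not installed")
```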

Documentation and Help

Documentation: http://docs.grablib.org/en/latest/

English mailing list: http://groups.google.com/group/grab-users/

Russian mailing list: http://groups.google.com/group/python-grab/

Contribution

To report a bug, please use the GitHub issue tracker: https://github.com/lorien/grab/issues

If you want to develop a new feature in Grab, please describe what you want to do in the issue tracker or contact me at lorien@lorien.name

Source Distribution

grab-0.6.22.tar.gz (91.1 kB)

File metadata

  • Size: 91.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for grab-0.6.22.tar.gz:

  • SHA256: c2d754e8ed670ea3d709357d777d355fb58e5b45e34ef9356338c392ef343454
  • MD5: c962725092cb9d458cbc91294496342f
  • BLAKE2b-256: 5b8ea584c6e3ca735a97f1c6bb7e345ec9d2427aa8f85efb5a3faa961a7c8a7e