robobrowser

Your friendly neighborhood web scraper

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Natural Language
- English
Programming Language

Project description

RoboBrowser: Your friendly neighborhood web scraper
===============================================

.. image:: https://badge.fury.io/py/robobrowser.png
:target: http://badge.fury.io/py/robobrowser

.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
:target: https://travis-ci.org/jmcarp/robobrowser

.. image:: https://coveralls.io/repos/jmcarp/robobrowser/badge.png?branch=master
:target: https://coveralls.io/r/jmcarp/robobrowser

Homepage: `http://robobrowser.readthedocs.org/ <http://robobrowser.readthedocs.org/>`_

RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser
can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services
that don't have APIs, RoboBrowser can help.

.. code-block:: python

import re
from robobrowser import RoboBrowser

# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')

# Search for Queen
form = browser.get_form(action='/search')
form # <RoboForm q=>
form['q'].value = 'queen'
browser.submit_form(form)

# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.select('.lyrics')
lyrics[0].text # \n[Intro]\nIs this the real life...

# Back to results page
browser.back()

# Look up my favorite song
browser.follow_link('death on two legs')

# Can also search HTML using regex patterns
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Verse 1]\nYou suck my blood like a leech...

RoboBrowser combines the best of two excellent Python libraries:
`Requests <http://docs.python-requests.org/en/latest/>`_ and
`BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/>`_.
RoboBrowser represents browser sessions using Requests and HTML responses
using BeautifulSoup, transparently exposing methods of both libraries:

.. code-block:: python

import re
from robobrowser import RoboBrowser

browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')

# Inspect the browser session
browser.session.cookies['_gh_sess'] # BAh7Bzo...
browser.session.headers['User-Agent'] # a python robot

# Search the parsed HTML
browser.select('div.teaser-icon') # [<div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# </div>,
# ...
browser.find(class_=re.compile(r'column', re.I)) # <div class="one-third column">
# <div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# ...

RoboBrowser also includes tools for working with forms, inspired by
`WebTest <https://github.com/Pylons/webtest>`_ and `Mechanize <http://wwwsearch.sourceforge.net/mechanize/>`_.

.. code-block:: python

from robobrowser import RoboBrowser

browser = RoboBrowser()
browser.open('http://twitter.com')

# Get the signup form
signup_form = browser.get_form(class_='signup')
signup_form # <RoboForm user[name]=, user[email]=, ...

# Inspect its values
signup_form['authenticity_token'].value # 6d03597 ...

# Fill it out
signup_form['user[name]'].value = 'python-robot'
signup_form['user[user_password]'].value = 'secret'

# Serialize it to JSON
signup_form.serialize() # {'data': {'authenticity_token': '6d03597...',
# 'context': '',
# 'user[email]': '',
# 'user[name]': 'python-robot',
# 'user[user_password]': ''}}

# And submit
browser.submit_form(signup_form)

Requirements
------------

- Python >= 2.6 or >= 3.3

License
-------

MIT licensed. See the bundled `LICENSE <https://github.com/jmcarp/robobrowser/blob/master/LICENSE>`_ file for more details.

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Natural Language
- English
Programming Language

Release history Release notifications | RSS feed

0.5.3

Jun 7, 2015

0.5.2

Apr 18, 2015

0.5.1

Sep 15, 2014

0.5.0

Jul 28, 2014

0.4.1

Jul 27, 2014

0.4.0

Jul 24, 2014

0.3.3

Jul 22, 2014

0.3.2

Jul 20, 2014

0.3.1

Jul 5, 2014

0.3

May 4, 2014

0.2

Mar 10, 2014

This version

0.1.1

Feb 11, 2014

0.1.0

Feb 8, 2014

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

robobrowser-0.1.1.tar.gz (16.2 kB view details)

Uploaded Feb 11, 2014 Source

File details

Details for the file robobrowser-0.1.1.tar.gz.

File metadata

Download URL: robobrowser-0.1.1.tar.gz
Upload date: Feb 11, 2014
Size: 16.2 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for robobrowser-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`13810a168c42ffb6959d0edbc0ad3653234ea7dcea11ef736ae19186c6345859`
MD5	`51142589ed60361b8ff2b8be225db67a`
BLAKE2b-256	`3e9669c70e62c33c21a9cd5091d9fb677827fef2f29f759a7ed879106cfbe9e0`