Your friendly neighborhood web scraper
Project description
RoboBrowser: Your friendly neighborhood web scraper
===============================================
.. image:: https://badge.fury.io/py/robobrowser.png
:target: http://badge.fury.io/py/robobrowser
.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
:target: https://travis-ci.org/jmcarp/robobrowser
.. image:: https://coveralls.io/repos/jmcarp/robobrowser/badge.png?branch=master
:target: https://coveralls.io/r/jmcarp/robobrowser
Homepage: `http://robobrowser.readthedocs.org/ <http://robobrowser.readthedocs.org/>`_
RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser
can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services
that don't have APIs, RoboBrowser can help.
.. code-block:: python
import re
from robobrowser import RoboBrowser
# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')
# Search for Queen
form = browser.get_form(action='/search')
form # <RoboForm q=>
form['q'].value = 'queen'
browser.submit_form(form)
# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.select('.lyrics')
lyrics[0].text # \n[Intro]\nIs this the real life...
# Back to results page
browser.back()
# Look up my favorite song
browser.follow_link('death on two legs')
# Can also search HTML using regex patterns
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Verse 1]\nYou suck my blood like a leech...
RoboBrowser combines the best of two excellent Python libraries:
`Requests <http://docs.python-requests.org/en/latest/>`_ and
`BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/>`_.
RoboBrowser represents browser sessions using Requests and HTML responses
using BeautifulSoup, transparently exposing methods of both libraries:
.. code-block:: python
import re
from robobrowser import RoboBrowser
browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')
# Inspect the browser session
browser.session.cookies['_gh_sess'] # BAh7Bzo...
browser.session.headers['User-Agent'] # a python robot
# Search the parsed HTML
browser.select('div.teaser-icon') # [<div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# </div>,
# ...
browser.find(class_=re.compile(r'column', re.I)) # <div class="one-third column">
# <div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# ...
RoboBrowser also includes tools for working with forms, inspired by
`WebTest <https://github.com/Pylons/webtest>`_ and `Mechanize <http://wwwsearch.sourceforge.net/mechanize/>`_.
.. code-block:: python
from robobrowser import RoboBrowser
browser = RoboBrowser()
browser.open('http://twitter.com')
# Get the signup form
signup_form = browser.get_form(class_='signup')
signup_form # <RoboForm user[name]=, user[email]=, ...
# Inspect its values
signup_form['authenticity_token'].value # 6d03597 ...
# Fill it out
signup_form['user[name]'].value = 'python-robot'
signup_form['user[user_password]'].value = 'secret'
# Serialize it to JSON
signup_form.serialize() # {'data': {'authenticity_token': '6d03597...',
# 'context': '',
# 'user[email]': '',
# 'user[name]': 'python-robot',
# 'user[user_password]': ''}}
# And submit
browser.submit_form(signup_form)
Requirements
------------
- Python >= 2.6 or >= 3.3
License
-------
MIT licensed. See the bundled `LICENSE <https://github.com/jmcarp/robobrowser/blob/master/LICENSE>`_ file for more details.
===============================================
.. image:: https://badge.fury.io/py/robobrowser.png
:target: http://badge.fury.io/py/robobrowser
.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
:target: https://travis-ci.org/jmcarp/robobrowser
.. image:: https://coveralls.io/repos/jmcarp/robobrowser/badge.png?branch=master
:target: https://coveralls.io/r/jmcarp/robobrowser
Homepage: `http://robobrowser.readthedocs.org/ <http://robobrowser.readthedocs.org/>`_
RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser
can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services
that don't have APIs, RoboBrowser can help.
.. code-block:: python
import re
from robobrowser import RoboBrowser
# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')
# Search for Queen
form = browser.get_form(action='/search')
form # <RoboForm q=>
form['q'].value = 'queen'
browser.submit_form(form)
# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.select('.lyrics')
lyrics[0].text # \n[Intro]\nIs this the real life...
# Back to results page
browser.back()
# Look up my favorite song
browser.follow_link('death on two legs')
# Can also search HTML using regex patterns
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Verse 1]\nYou suck my blood like a leech...
RoboBrowser combines the best of two excellent Python libraries:
`Requests <http://docs.python-requests.org/en/latest/>`_ and
`BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/>`_.
RoboBrowser represents browser sessions using Requests and HTML responses
using BeautifulSoup, transparently exposing methods of both libraries:
.. code-block:: python
import re
from robobrowser import RoboBrowser
browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')
# Inspect the browser session
browser.session.cookies['_gh_sess'] # BAh7Bzo...
browser.session.headers['User-Agent'] # a python robot
# Search the parsed HTML
browser.select('div.teaser-icon') # [<div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# </div>,
# ...
browser.find(class_=re.compile(r'column', re.I)) # <div class="one-third column">
# <div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# ...
RoboBrowser also includes tools for working with forms, inspired by
`WebTest <https://github.com/Pylons/webtest>`_ and `Mechanize <http://wwwsearch.sourceforge.net/mechanize/>`_.
.. code-block:: python
from robobrowser import RoboBrowser
browser = RoboBrowser()
browser.open('http://twitter.com')
# Get the signup form
signup_form = browser.get_form(class_='signup')
signup_form # <RoboForm user[name]=, user[email]=, ...
# Inspect its values
signup_form['authenticity_token'].value # 6d03597 ...
# Fill it out
signup_form['user[name]'].value = 'python-robot'
signup_form['user[user_password]'].value = 'secret'
# Serialize it to JSON
signup_form.serialize() # {'data': {'authenticity_token': '6d03597...',
# 'context': '',
# 'user[email]': '',
# 'user[name]': 'python-robot',
# 'user[user_password]': ''}}
# And submit
browser.submit_form(signup_form)
Requirements
------------
- Python >= 2.6 or >= 3.3
License
-------
MIT licensed. See the bundled `LICENSE <https://github.com/jmcarp/robobrowser/blob/master/LICENSE>`_ file for more details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
robobrowser-0.1.1.tar.gz
(16.2 kB
view details)
File details
Details for the file robobrowser-0.1.1.tar.gz
.
File metadata
- Download URL: robobrowser-0.1.1.tar.gz
- Upload date:
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 13810a168c42ffb6959d0edbc0ad3653234ea7dcea11ef736ae19186c6345859 |
|
MD5 | 51142589ed60361b8ff2b8be225db67a |
|
BLAKE2b-256 | 3e9669c70e62c33c21a9cd5091d9fb677827fef2f29f759a7ed879106cfbe9e0 |