Your friendly neighborhood web scraper
Project description
Homepage: http://pyrobot.readthedocs.org/
import re
from pyrobot import RoboBrowser
# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')
# Search for Queen
form = browser.get_form(action=re.compile(r'search'))
form['q'].value = 'queen'
browser.submit_form(form)
# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Intro]\nIs this the real life...
# Back to results page
browser.back()
# Look up my favorite song
browser.follow_link('death on two legs')
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Verse 1]\nYou suck my blood like a leech...
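The `history=True` option and the `browser.back()` call above imply that the browser remembers the pages it has visited. As a rough illustration of that idea only, here is a minimal sketch of a visited-page list with a cursor. The `History` class, its method names, and the page labels are hypothetical and are not taken from PyRobot's source.

```python
class History:
    """Hypothetical visited-page list with a movable cursor."""

    def __init__(self):
        self._pages = []    # visited pages, oldest first
        self._cursor = -1   # index of the current page; -1 means empty

    def visit(self, page):
        # Visiting a new page discards any entries ahead of the cursor,
        # the way a desktop browser drops its "forward" pages.
        del self._pages[self._cursor + 1:]
        self._pages.append(page)
        self._cursor += 1

    def back(self, n=1):
        # Move the cursor n pages toward the oldest entry.
        if self._cursor - n < 0:
            raise IndexError('no earlier page in history')
        self._cursor -= n
        return self.current

    @property
    def current(self):
        if self._cursor < 0:
            raise IndexError('history is empty')
        return self._pages[self._cursor]


history = History()
history.visit('search results')
history.visit('first song')
history.back()                 # cursor returns to 'search results'
history.visit('favorite song')  # 'first song' is dropped from the list
```

Keeping a cursor instead of popping entries leaves room for a matching forward operation later.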
PyRobot combines the best of two excellent Python libraries: Requests and BeautifulSoup. PyRobot represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing methods of both libraries:
import re
from pyrobot import RoboBrowser
browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')
# Inspect the browser session
browser.session.cookies['_gh_sess'] # BAh7Bzo...
browser.session.headers['User-Agent'] # a python robot
# Search the parsed HTML
browser.select('div.teaser-icon') # [<div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# </div>,
# ...
browser.find(class_=re.compile(r'column', re.I)) # <div class="one-third column">
# <div class="teaser-icon">
# <span class="mega-octicon octicon-checklist"></span>
# ...
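The same two building blocks can be tried offline, without PyRobot or a network connection. The sketch below uses Requests and BeautifulSoup directly: it sets a header on a `requests.Session` and runs the two searches against a small inline HTML string. The markup is made up for illustration and is not GitHub's real page, and this shows the underlying libraries, not how PyRobot wires them together internally.

```python
import re

import requests
from bs4 import BeautifulSoup

# Illustrative HTML standing in for a fetched page (an assumption,
# not real GitHub markup).
HTML = """
<div class="one-third column">
  <div class="teaser-icon"><span class="mega-octicon"></span></div>
</div>
"""

# A Requests session carries headers and cookies across requests.
session = requests.Session()
session.headers['User-Agent'] = 'a python robot'

# BeautifulSoup parses the response body into a searchable tree.
soup = BeautifulSoup(HTML, 'html.parser')

# CSS selector search: returns a list of matching tags.
icons = soup.select('div.teaser-icon')

# Regex search on the class attribute: returns the first matching tag.
column = soup.find(class_=re.compile(r'column', re.I))
```

Here `select` returns a list and `find` returns a single tag, which matches the shape of the outputs shown in the comments above.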
Download files
Source Distribution
pyrobot-0.1.0.tar.gz (10.9 kB)