Your friendly neighborhood web scraper
Project description
Homepage: http://pyrobot.readthedocs.org/
import re
from pyrobot import RoboBrowser
# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')
# Search for Queen
form = browser.get_form(action=re.compile(r'search'))
form['q'].value = 'queen'
browser.submit_form(form)
# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Intro]\nIs this the real life...
# Back to results page
browser.back()
# Look up my favorite song
browser.follow_link('death on two legs')
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Verse 1]\nYou suck my blood like a leech...
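Both lookups above pass `re.compile(r'\blyrics\b')` as `class_` rather than a plain string. The sketch below uses only the standard `re` module to show what the word boundaries buy you. The class names are invented for illustration, and it assumes, as BeautifulSoup documents, that a compiled pattern is applied to class values with `search()`.

```python
import re

# Same pattern as in the example above.
LYRICS_CLASS = re.compile(r'\blyrics\b')

# Hypothetical class names, made up for this illustration.
candidates = ['lyrics', 'song-lyrics', 'lyrics_container', 'lyricsheet']

# search() succeeds only where "lyrics" stands as a whole word.
# "-" is a non-word character, so "song-lyrics" matches;
# "_" and letters are word characters, so the last two do not.
matches = [name for name in candidates if LYRICS_CLASS.search(name)]
print(matches)
```

A plain substring pattern such as `re.compile('lyrics')` would also accept `lyrics_container` and `lyricsheet`, which may or may not be what you want on a given page.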
PyRobot combines the best of two excellent Python libraries: Requests and BeautifulSoup. PyRobot represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing methods of both libraries:
import re
from pyrobot import RoboBrowser
browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')
# Inspect the browser session
browser.session.cookies['_gh_sess'] # BAh7Bzo...
browser.session.headers['User-Agent'] # a python robot
# Search the parsed HTML
browser.select('div.teaser-icon') # [<div class="teaser-icon">
                                  #    <span class="mega-octicon octicon-checklist"></span>
                                  #  </div>,
                                  #  ...
browser.find(class_=re.compile(r'column', re.I)) # <div class="one-third column">
                                                 #   <div class="teaser-icon">
                                                 #     <span class="mega-octicon octicon-checklist"></span>
                                                 #   ...
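To make the division of labor concrete, here is a minimal sketch of the same idea built directly on Requests and BeautifulSoup. `MiniBrowser` and its `load` method are hypothetical names invented for this illustration; this is not PyRobot's actual implementation, only one way a session object and a parsed document could be combined behind a single interface. It assumes the `requests` and `beautifulsoup4` packages are installed.

```python
import re

import requests
from bs4 import BeautifulSoup


class MiniBrowser:
    """Hypothetical stand-in for illustration; not PyRobot's real class."""

    def __init__(self, user_agent=None):
        # The Requests session carries cookies and headers across requests.
        self.session = requests.Session()
        if user_agent is not None:
            self.session.headers['User-Agent'] = user_agent
        self.parsed = None

    def open(self, url):
        # Fetch a page through the session, then parse the response body.
        response = self.session.get(url)
        response.raise_for_status()
        self.load(response.text)

    def load(self, html):
        # Parse HTML with BeautifulSoup (standard-library parser backend).
        self.parsed = BeautifulSoup(html, 'html.parser')

    def find(self, *args, **kwargs):
        # Delegate straight to BeautifulSoup's find().
        return self.parsed.find(*args, **kwargs)

    def select(self, selector):
        # Delegate straight to BeautifulSoup's CSS selector support.
        return self.parsed.select(selector)


# Offline usage: feed in a small HTML fragment instead of fetching a page.
browser = MiniBrowser(user_agent='a python robot')
browser.load(
    '<div class="one-third column">'
    '<div class="teaser-icon">'
    '<span class="mega-octicon octicon-checklist"></span>'
    '</div></div>'
)
print(browser.session.headers['User-Agent'])
print(len(browser.select('div.teaser-icon')))
print(browser.find(class_=re.compile(r'column', re.I))['class'])
```

Keeping `session` as a public attribute is what lets callers reach Requests features such as cookies and headers without the wrapper having to re-export each one.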
Download files
Source Distribution
pyrobot-0.1.0.tar.gz (10.9 kB)
File details
Details for the file pyrobot-0.1.0.tar.gz.
File metadata
- Download URL: pyrobot-0.1.0.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9f0abbbf665c6187e7739d4cebc3bf6bebea103dc08c7f1800e1784ad0d35243
MD5 | d5b9f52a79752c495eeae2e46b4c255d
BLAKE2b-256 | 25b980d4869915307feb8e749b24e3c9329184101561a8b5ada07fd7cc7ba08c