Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Free software: BSD license
Documentation: https://parsel.readthedocs.org.
Features
Extract text using CSS or XPath selectors
Regular expression helper methods
Example:
>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
...     <body>
...         <h1>Hello, Parsel!</h1>
...         <ul>
...             <li><a href="http://example.com">Link 1</a></li>
...             <li><a href="http://scrapy.org">Link 2</a></li>
...         </ul>
...     </body>
... </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re('\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
...     print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org
History
0.9.0 (2015-07-30)
First release on PyPI.
Download files
Source Distribution
parsel-0.9.0.tar.gz (20.0 kB)
Built Distribution
parsel-0.9.0-py2-none-any.whl (7.5 kB)
File details
Details for the file parsel-0.9.0.tar.gz.
File metadata
- Download URL: parsel-0.9.0.tar.gz
- Upload date:
- Size: 20.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 620adecb14f2306d914b2f5e0f977c9f834f094f577a554fed0adc7b3f33295c
MD5 | c422f4a04fac7613155ea576a4f70a39
BLAKE2b-256 | d5ed869a69da3de9e52e8ab0e1ffb5410378f23ce6eca7d115a920c55ca053cb
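The digests listed above can be used to check that a downloaded copy of parsel-0.9.0.tar.gz is intact. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical and chosen for illustration, and the local file path is assumed to be wherever the archive was saved.

```python
import hashlib

# SHA256 digest copied from the hash table for parsel-0.9.0.tar.gz.
EXPECTED_SHA256 = "620adecb14f2306d914b2f5e0f977c9f834f094f577a554fed0adc7b3f33295c"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals `expected` (hypothetical helper)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected("parsel-0.9.0.tar.gz")` on the downloaded archive should return `True` if the file is byte-for-byte identical to the one described by the table; any other result suggests a corrupted or different file.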
File details
Details for the file parsel-0.9.0-py2-none-any.whl.
File metadata
- Download URL: parsel-0.9.0-py2-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 903f9a915ce2912f0b0a498630347eb6312729cd267a7eeac95618d0a02e76d1
MD5 | f74e90f216b4b68bac9935a11f6b63e2
BLAKE2b-256 | 1140cbe6e7e99e3e7d5ee80cf840219d5dc3fb00ea7bb2026aa776e89e0e07df