Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Project description

Features

  • Extract text using CSS or XPath selectors

  • Regular expression helper methods

Example:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
            <ul>
                <li><a href="http://example.com">Link 1</a></li>
                <li><a href="http://scrapy.org">Link 2</a></li>
            </ul>
        </body>
        </html>""")
>>>
>>> sel.css('h1::text').extract_first()
u'Hello, Parsel!'
>>>
>>> sel.css('h1::text').re(r'\w+')
[u'Hello', u'Parsel']
>>>
>>> for e in sel.css('ul > li'):
...     print(e.xpath('.//a/@href').extract_first())
http://example.com
http://scrapy.org

History

1.4.0 (2018-02-08)

  • Selector and SelectorList can’t be pickled because pickling/unpickling doesn’t work for lxml.html.HtmlElement; parsel now raises TypeError explicitly instead of allowing pickle to silently produce wrong output. This is technically backwards-incompatible if you’re using Python < 3.6.

1.3.1 (2017-12-28)

  • Fix artifact uploads to pypi.

1.3.0 (2017-12-28)

  • has-class XPath extension function;

  • parsel.xpathfuncs.set_xpathfunc is a simplified way to register XPath extensions;

  • Selector.remove_namespaces now removes namespace declarations;

  • Python 3.3 support is dropped;

  • make htmlview command for easier Parsel docs development.

  • CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.

1.2.0 (2017-05-17)

  • Add SelectorList.get and SelectorList.getall methods as aliases for SelectorList.extract_first and SelectorList.extract respectively

  • Add default value parameter to SelectorList.re_first method

  • Add Selector.re_first method

  • Add replace_entities argument on .re() and .re_first() to turn off replacing of character entity references

  • Bug fix: detect a None result from lxml parsing and fall back to an empty document

  • Rearrange XML/HTML examples in the selectors usage docs

  • Travis CI:

    • Test against Python 3.6

    • Test against PyPy using “Portable PyPy for Linux” distribution

1.1.0 (2016-11-22)

  • Change default HTML parser to lxml.html.HTMLParser, which makes it easier to use some HTML-specific features

  • Add css2xpath function to translate CSS to XPath

  • Add support for ad-hoc namespace declarations

  • Add support for XPath variables

  • Documentation improvements and updates

1.0.3 (2016-07-29)

  • Add BSD-3-Clause license file

  • Re-enable PyPy tests

  • Integrate py.test runs with setuptools (needed for Debian packaging)

  • Changelog is now called NEWS

1.0.2 (2016-04-26)

  • Fix bug in exception handling causing original traceback to be lost

  • Added docstrings and other doc fixes

1.0.1 (2015-08-24)

  • Updated PyPI classifiers

  • Added docstrings for csstranslator module and other doc fixes

1.0.0 (2015-08-22)

  • Documentation fixes

0.9.6 (2015-08-14)

  • Updated documentation

  • Extended test coverage

0.9.5 (2015-08-11)

  • Support for extending SelectorList

0.9.4 (2015-08-10)

  • Try workaround for travis-ci/dpl#253

0.9.3 (2015-08-07)

  • Add base_url argument

0.9.2 (2015-08-07)

  • Rename module unified -> selector and promote root attribute

  • Add create_root_node function

0.9.1 (2015-08-04)

  • Setup Sphinx build and docs structure

  • Build universal wheels

  • Rename some leftovers from package extraction

0.9.0 (2015-07-30)

  • First release on PyPI.
