Requests-XML: XML Parsing for Humans
====================================

.. image:: https://travis-ci.org/erinxocon/requests-xml.svg?branch=master
    :target: https://travis-ci.org/erinxocon/requests-xml
.. image:: https://img.shields.io/pypi/v/requests-xml.svg?maxAge=2592000
    :target: https://pypi-hypernode.com/pypi/requests-xml/
.. image:: https://img.shields.io/pypi/l/requests-xml.svg?maxAge=2592000
    :target: https://opensource.org/licenses/MIT

This library aims to make parsing XML as
simple and intuitive as possible. **Requests-XML** is related
to the amazing `Requests-HTML <http://html.python-requests.org/>`_
and delivers the same quality of user experience, with support for our beloved XML documents.

When using this library you automatically get:

- *XPath Selectors*, for the *brave* at heart.
- *Simple Search/Find* for the *faint* at heart.
- XML to JSON conversion thanks to `xmljson <https://github.com/sanand0/xmljson/>`_.
- Mocked user-agent (like a real web browser).
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magical XML parsing abilities.


Installation
============

.. code-block:: shell

    $ pipenv install requests-xml
    ✨🍰✨

Only **Python 3.6** is supported.


Tutorial & Usage
================

Make a GET request to `nasa.gov <https://www.nasa.gov/rss/dyn/lg_image_of_the_day.rss/>`_, using `Requests <https://docs.python-requests.org/>`_:

.. code-block:: pycon

    >>> from requests_xml import XMLSession
    >>> session = XMLSession()

    >>> r = session.get('https://www.nasa.gov/rss/dyn/lg_image_of_the_day.rss')

Grab a list of all links on the page, as-is (this only works for RSS feeds, or other feeds that happen to have ``link`` elements):

.. code-block:: pycon

    >>> r.xml.links
    ['http://www.nasa.gov/image-feature/from-the-earth-moon-and-beyond', 'http://www.nasa.gov/image-feature/jpl/pia21974/jupiter-s-colorful-cloud-belts', 'http://www.nasa.gov/', 'http://www.nasa.gov/image-feature/portrait-of-the-expedition-54-crew-on-the-space-station', ...]
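
As a rough illustration of what "all links" means for an RSS feed, the sketch below collects the text of every ``link`` element using only the standard library. It is not how Requests-XML implements ``links``; the inline feed and the ``collect_links`` helper are made up for this example so that it runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RSS feed used only for illustration.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def collect_links(xml_text):
    """Return the text of every <link> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("link") if el.text]


links = collect_links(FEED)
print(links)
```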


XPath is the main supported way to query an element (`learn more <https://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx>`_):

.. code-block:: pycon

    >>> item = r.xml.xpath('//item', first=True)
    >>> item
    <Element 'item' >

Grab an element's text contents:

.. code-block:: pycon

    >>> print(item.text)
    The Beauty of Light
    http://www.nasa.gov/image-feature/the-beauty-of-light
    The Soyuz MS-08 rocket is launched with Soyuz Commander Oleg Artemyev of Roscosmos and astronauts Ricky Arnold and Drew Feustel of NASA, March 21, 2018, to join the crew of the Space Station.
    http://www.nasa.gov/image-feature/the-beauty-of-light
    Wed, 21 Mar 2018 14:12 EDT
    NASA Image of the Day

Introspect an element's attributes (`learn more <https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes>`__):

.. code-block:: pycon

    >>> rss = r.xml.xpath('//rss', first=True)
    >>> rss.attrs
    {'version': '2.0', '{http://www.w3.org/XML/1998/namespace}base': 'http://www.nasa.gov/'}
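
The ``{http://www.w3.org/XML/1998/namespace}base`` key above is the ``xml:base`` attribute written with its namespace URI in braces. The standard-library sketch below shows the same key format on a one-line document; the document and variable names are assumptions for illustration, and the sketch does not use Requests-XML itself.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix always maps to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical one-element document carrying an xml:base attribute.
root = ET.fromstring('<rss version="2.0" xml:base="http://www.nasa.gov/"/>')

# Namespaced attributes are keyed as "{namespace-uri}localname".
attrs = dict(root.attrib)
base = root.get("{%s}base" % XML_NS)

print(attrs)
print(base)
```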

Render out an element's XML (note: namespace declarations from the document are applied to sub-elements when they are grabbed):

.. code-block:: pycon

    >>> item.xml
    '<item xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:media="http://search.yahoo.com/mrss/"> <title>The Beauty of Light</title>\n <link>http://www.nasa.gov/image-feature/the-beauty-of-light</link>\n <description>The Soyuz MS-08 rocket is launched with Soyuz Commander Oleg Artemyev of Roscosmos and astronauts Ricky Arnold and Drew Feustel of NASA, March 21, 2018, to join the crew of the Space Station.</description>\n <enclosure url="http://www.nasa.gov/sites/default/files/thumbnails/image/nhq201803210005.jpg" length="1267028" type="image/jpeg"/>\n <guid isPermaLink="false">http://www.nasa.gov/image-feature/the-beauty-of-light</guid>\n <pubDate>Wed, 21 Mar 2018 14:12 EDT</pubDate>\n <source url="http://www.nasa.gov/rss/dyn/lg_image_of_the_day.rss">NASA Image of the Day</source>\n</item>'


Select a list of elements within an element:

.. code-block:: pycon

    >>> item.xpath('//enclosure')[0].attrs['url']
    'http://www.nasa.gov/sites/default/files/thumbnails/image/nhq201803210005.jpg'
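
To make the idea of querying inside an element concrete, here is a minimal standard-library sketch. It uses ``xml.etree.ElementTree`` and its limited XPath subset rather than Requests-XML, and the two-item feed is invented for the example. A path starting with ``.//`` is evaluated relative to the element it is called on.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with two items, each carrying one enclosure.
FEED = """\
<rss version="2.0"><channel>
  <item><title>First</title><enclosure url="http://example.com/a.jpg" type="image/jpeg"/></item>
  <item><title>Second</title><enclosure url="http://example.com/b.jpg" type="image/jpeg"/></item>
</channel></rss>
"""

root = ET.fromstring(FEED)

# Query the whole document: every enclosure in the feed.
all_urls = [e.get("url") for e in root.findall(".//enclosure")]

# Query within one element: only enclosures below the first item.
first_item = root.find(".//item")
urls_in_item = [e.get("url") for e in first_item.findall(".//enclosure")]

print(all_urls)
print(urls_in_item)
```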

Search for links within an element:

.. code-block:: pycon

    >>> item.links
    ['http://www.nasa.gov/image-feature/the-beauty-of-light']


Search for text on the page. This is useful if you wish to search out things between specific tags without using XPath:

.. code-block:: pycon

    >>> r.xml.search('<title>{}</title>')
    <Result ('NASA Image of the Day',) {}>
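
The ``{}`` in the template acts as a placeholder for whatever text sits between the surrounding literal parts. The sketch below imitates that behaviour with a regular expression from the standard library. It is a stand-in for illustration only, not the mechanism Requests-XML uses, and ``search_template`` and the sample text are hypothetical.

```python
import re


def search_template(template, text):
    """Return the captured fields of the first match, or None.

    Each '{}' in the template captures the shortest run of characters
    that lets the rest of the template match.
    """
    literal_parts = template.split("{}")
    pattern = "(.+?)".join(re.escape(part) for part in literal_parts)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.groups() if match else None


# Hypothetical snippet of a feed; the first <title> wins.
feed = (
    "<channel><title>NASA Image of the Day</title>"
    "<item><title>The Beauty of Light</title></item></channel>"
)
result = search_template("<title>{}</title>", feed)
print(result)
```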


Using PyQuery we can use CSS selectors to easily grab an element, with a simple syntax for ensuring the element
contains certain text. This can be used as another easy way to grab an element without an XPath:

.. code-block:: pycon

    >>> light_title = r.xml.find('title', containing='The Beauty of Light')
    >>> light_title
    [<Element 'title' >]

    >>> light_title[0].text
    'The Beauty of Light'

Note: XPath is preferred because it lets you be very specific about which elements you select. Find is intended to be
an easy way of grabbing all elements of a certain name. Find does, however, accept CSS selectors, and if you can get those
to work with straight XML, go for it!
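
The "elements of a certain name that contain some text" idea can be sketched with the standard library alone. This is not the PyQuery-based implementation; ``find_containing`` and the sample feed are hypothetical, and the match here is a plain case-sensitive substring test chosen for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with several <title> elements.
FEED = """\
<rss version="2.0"><channel>
  <title>NASA Image of the Day</title>
  <item><title>The Beauty of Light</title></item>
  <item><title>Jupiter's Cloud Belts</title></item>
</channel></rss>
"""


def find_containing(root, tag, containing):
    """Return elements named `tag` whose text content includes `containing`."""
    return [
        el for el in root.iter(tag)
        if containing in "".join(el.itertext())
    ]


root = ET.fromstring(FEED)
matches = find_containing(root, "title", "Beauty of Light")
titles = [el.text for el in matches]
print(titles)
```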

JSON Support
============

Using the great `xmljson <https://github.com/sanand0/xmljson/>`_ package, we convert the whole
XML document into a JSON representation. There are six different conversion conventions available.
See the xmljson `about <https://github.com/sanand0/xmljson#about>`_ section for what they are. The default is ``badgerfish``.
If you wish to use a different conversion convention, pass in a string with the name of the convention to the
``.json()`` method.
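
To give a feel for what a conversion convention does, the sketch below hand-rolls the core BadgerFish rules with the standard library: attributes become ``@``-prefixed keys, text goes under ``$``, and repeated child names collapse into a list. It is a simplified illustration and not xmljson's implementation; it ignores namespaces and mixed content and keeps every value as a string. The ``badgerfish`` helper and the sample document are assumptions for this example.

```python
import json
import xml.etree.ElementTree as ET


def badgerfish(element):
    """Convert an element to a dict using the core BadgerFish rules."""
    node = {}
    # Attributes become properties prefixed with "@".
    for name, value in element.attrib.items():
        node["@" + name] = value
    # Non-empty text content is stored under "$".
    text = (element.text or "").strip()
    if text:
        node["$"] = text
    # Children become properties; repeated names collapse into a list.
    for child in element:
        converted = badgerfish(child)[child.tag]
        if child.tag in node:
            existing = node[child.tag]
            if not isinstance(existing, list):
                existing = [existing]
                node[child.tag] = existing
            existing.append(converted)
        else:
            node[child.tag] = converted
    return {element.tag: node}


# Hypothetical document chosen to exercise all three rules.
doc = '<book id="1"><title>XML Basics</title><author>Ann</author><author>Bo</author></book>'
result = badgerfish(ET.fromstring(doc))
print(json.dumps(result, indent=2))
```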


Using without Requests
======================

You can also use this library without Requests:

.. code-block:: pycon

    >>> from requests_xml import XML
    >>> doc = """
    ... <employees>
    ...     <person>
    ...         <name value="Alice"/>
    ...     </person>
    ...     <person>
    ...         <name value="Bob"/>
    ...     </person>
    ... </employees>
    ... """

    >>> xml = XML(xml=doc)
    >>> xml.json()
    {
        "employees": [{
            "person": {
                "name": {
                    "@value": "Alice"
                }
            }
        }, {
            "person": {
                "name": {
                    "@value": "Bob"
                }
            }
        }]
    }

License
=======
MIT

