Parse hentry from microformats.
Parse a well-designed webpage that uses microformats markup. If you are not familiar with microformats, see http://microformats.org/wiki/hentry.
A typical hentry entry looks like this:
<article class="hentry">
  <h1 class="entry-title">Article title</h1>
  <time class="updated" datetime="2014-11-06T20:00:00Z" pubdate>2014-11-06</time>
  <div class="entry-content">
    <p>Here is the content</p>
  </div>
  <div class="entry-tags">
    <a href="#tag1" rel="tag">tag1</a>
    <a href="#tag2" rel="tag">tag2</a>
  </div>
  <div class="vcard author">
    <span class="fn">Author Name</span>
  </div>
</article>
With this library, hentry.py, you can parse the HTML into metadata:
hentry.parse_html(text, format='html')
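Purely to illustrate what parsing hentry involves, here is a minimal stdlib-only sketch that walks the markup above with Python's html.parser and maps the hentry class names (entry-title, entry-content, updated, rel="tag", vcard author/fn) to fields. It is not the hentry library's implementation: HEntrySketchParser and parse_hentry_sketch are hypothetical names, and the sketch assumes well-formed markup (every non-void element explicitly closed) containing a single hentry.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HEntrySketchParser(HTMLParser):
    """Hypothetical, minimal hentry extractor; not the hentry library's code."""

    def __init__(self):
        super().__init__()
        self.result = {"title": "", "content": "", "author": "",
                       "pubdate": None, "tags": []}
        # One set of field markers per currently open (non-void) element.
        self._stack = []

    def _inside(self, marker):
        return any(marker in markers for markers in self._stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        rels = set((attrs.get("rel") or "").split())
        markers = set()
        if "entry-title" in classes:
            markers.add("title")
        if "entry-content" in classes:
            markers.add("content")
        if "author" in classes:
            markers.add("author_scope")  # the vcard that holds the author
        if "fn" in classes and ("author_scope" in markers
                                or self._inside("author_scope")):
            markers.add("author")  # formatted name inside the author vcard
        if "tag" in rels:
            markers.add("tag")
        # Keep the first updated/published timestamp, read from its datetime attribute.
        if classes & {"updated", "published"} and self.result["pubdate"] is None:
            self.result["pubdate"] = attrs.get("datetime")
        if tag not in VOID_ELEMENTS:
            self._stack.append(markers)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the innermost open element.
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        active = set().union(*self._stack)
        for field in ("title", "content", "author"):
            if field in active:
                self.result[field] += data
        if "tag" in active and data.strip():
            self.result["tags"].append(data.strip())


def parse_hentry_sketch(html):
    """Parse one hentry and return a dict of title/content/author/pubdate/tags."""
    parser = HEntrySketchParser()
    parser.feed(html)
    parser.close()
    result = parser.result
    for field in ("title", "content", "author"):
        result[field] = " ".join(result[field].split())  # collapse whitespace
    return result
```

Run on the sample markup above, this sketch returns "Article title" as the title, "Author Name" as the author, "2014-11-06T20:00:00Z" as the pubdate and ["tag1", "tag2"] as the tags. A real parser also has to cope with unclosed tags, several entries per page and the alternative "published" and "author" forms, which this sketch deliberately ignores.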
Installation
Install hentry with pip:
$ pip install hentry
Basic Usage
Parse a webpage from a URL:
hentry.parse_url(url)
Parse a webpage from its HTML content:
hentry.parse_html(content)
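If you want control over the download yourself (for example a timeout or a custom User-Agent), one option is to fetch the page with the standard library and pass the text to hentry.parse_html. This is a sketch, not how parse_url works internally: fetch_html, its defaults and the "hentry-example/0.1" User-Agent string are hypothetical, and it assumes parse_html accepts a decoded str.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0, user_agent="hentry-example/0.1"):
    """Download a page and decode it to str using the charset the server declares."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the Content-Type header carries no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would then be along the lines of `entry = hentry.parse_html(fetch_html(url))`.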
The result is a dict containing the following fields:
title
content
author
pubdate
tags
categories
image
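The exact value types are not documented here, so code that consumes the result should be defensive. The sketch below assumes title and author are plain strings, tags is a list of strings, and pubdate is either an ISO 8601 string like the datetime attribute in the sample markup or already a datetime; pubdate_as_datetime and summarize are hypothetical helpers, and missing or empty fields such as image or categories are simply skipped.

```python
from datetime import datetime


def pubdate_as_datetime(value):
    """Return pubdate as a datetime; accepts None, a datetime, or an ISO 8601 string."""
    if value is None or isinstance(value, datetime):
        return value
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(entry):
    """Build a one-line summary from a result dict, tolerating missing keys."""
    title = entry.get("title") or "(untitled)"
    author = entry.get("author") or "unknown author"
    line = f"{title} by {author}"
    published = pubdate_as_datetime(entry.get("pubdate"))
    if published is not None:
        line += f" on {published.date().isoformat()}"
    tags = entry.get("tags") or []
    if tags:
        line += " [" + ", ".join(tags) + "]"
    return line


# Hypothetical result for the sample markup, under the type assumptions above.
sample = {"title": "Article title", "author": "Author Name",
          "pubdate": "2014-11-06T20:00:00Z", "tags": ["tag1", "tag2"]}
print(summarize(sample))  # Article title by Author Name on 2014-11-06 [tag1, tag2]
```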
Download files
Source Distribution
hentry-0.1.tar.gz (3.5 kB)
File details
Details for the file hentry-0.1.tar.gz.
File metadata
- Download URL: hentry-0.1.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0d433180f01b66966f556d8f01aef38fb2655c59f7462f29ee103043eea6b43a
MD5 | 4d44f9a1745e5851ec6e84d20e49fd02
BLAKE2b-256 | afb96ab464dd1abf3b511598ac5bd0d28254c23bf2ec07a58aba0600e689f2e0
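To check a downloaded copy of hentry-0.1.tar.gz against the SHA256 digest listed above, you can hash the file in chunks with hashlib. A minimal sketch; sha256_of and verify_download are hypothetical helper names, and the file path is whatever you saved the archive as.

```python
import hashlib

# SHA256 digest published for hentry-0.1.tar.gz in the table above.
EXPECTED_SHA256 = "0d433180f01b66966f556d8f01aef38fb2655c59f7462f29ee103043eea6b43a"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_download("hentry-0.1.tar.gz")` should return True for an intact download. pip can enforce the same check itself: put `hentry==0.1 --hash=sha256:0d433180f01b66966f556d8f01aef38fb2655c59f7462f29ee103043eea6b43a` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.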