Extract metadata from HTML pages using Open Graph metadata, HTML metadata, and a series of fallbacks
HTMLmetadata
Inspired by https://metascraper.js.org
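The "series of fallbacks" idea can be illustrated with a small standard-library sketch. This is not the package's implementation: the names MetaCollector, first_present and summarize are hypothetical, and the fallback order (Open Graph first, then plain HTML tags) is an assumption made for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta> tags, the <title> text, and the <html lang> attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self.lang = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:...", plain HTML uses name="...".
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first value seen for each key.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def first_present(*candidates):
    """Return the first non-empty candidate, or None if all are empty."""
    for value in candidates:
        if value:
            return value
    return None


def summarize(html):
    """Build a summary dict, preferring Open Graph values over plain HTML ones."""
    parser = MetaCollector()
    parser.feed(html)
    meta = parser.meta
    return {
        "title": first_present(meta.get("og:title"), parser.title.strip()),
        "description": first_present(
            meta.get("og:description"), meta.get("description")
        ),
        "language": first_present(parser.lang, meta.get("og:locale")),
    }
```

Each field is resolved independently, so a page with an og:title but no og:description still gets its description from the plain meta tag.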
Install
pip install htmlmetadata
Use
You can use it by calling the module directly.
python -m htmlmetadata http://schema.org/docs/about.html
{
  "request": {
    "url": "http://schema.org/docs/about.html"
  },
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}
Or use it directly in your code.
from htmlmetadata import extract_metadata
data = extract_metadata("http://schema.org/docs/about.html")
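A short sketch of reading fields from the result. It assumes that extract_metadata returns a dict shaped like the command-line output shown earlier, which the description does not confirm. The helper summary_field is hypothetical, and the sample dict stands in for a real call so the snippet runs without network access.

```python
# Sample result copied from the command-line output. That extract_metadata
# returns a dict of exactly this shape is an assumption.
sample = {
    "request": {"url": "http://schema.org/docs/about.html"},
    "summary": {
        "description": (
            "Schema.org is a set of extensible schemas that enables webmasters "
            "to embed\n structured data on their web pages for use by search "
            "engines and other applications."
        ),
        "title": "about page - schema.org",
        "language": "en",
    },
}


def summary_field(data, field, default=None):
    """Return data["summary"][field], or default when it is missing or empty."""
    return (data.get("summary") or {}).get(field) or default


print(summary_field(sample, "title"))                    # about page - schema.org
print(summary_field(sample, "image", default="(none)"))  # (none)
```

Reading through a helper with a default avoids a KeyError when a page offers no value for a field.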
Download files
Source Distribution
htmlmetadata-1.1.zip (8.4 kB)
Built Distribution
htmlmetadata-1.1-py2.py3-none-any.whl (5.4 kB)
File details
Details for the file htmlmetadata-1.1.zip.
File metadata
- Download URL: htmlmetadata-1.1.zip
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 153e82b57392d62b57cb687ef45f7f99fd5d2e3705c8412db723945e281efd16
MD5 | 00dc5a8e29293b47bf268e6e70063242
BLAKE2b-256 | f33602dd088e3aee77b8a85d8e7aeaf7e9cb8a7ca963d6e51e85d6035a698a48
File details
Details for the file htmlmetadata-1.1-py2.py3-none-any.whl.
File metadata
- Download URL: htmlmetadata-1.1-py2.py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.8.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 004d36c78fd27366b7ee97943d7293d54dd51a9f16efbbb6163c6d1b25619635
MD5 | e13f15ec806fbe9aa4c57d3ba6d010e1
BLAKE2b-256 | ebd0e4077f28c0af6c47ce8d3411bdd252521311f2804ed9a99bdd3648fd45a5