HTMLmetadata

Extract metadata from HTML pages using Open Graph metadata, HTML metadata, and a series of fallbacks.

Inspired by https://metascraper.js.org

Install

pip install htmlmetadata

Use

You can use it by calling the module directly.

python -m htmlmetadata http://schema.org/docs/about.html
{
  "request": {
    "url": "http://schema.org/docs/about.html"
  },
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}

Or use it directly in your code.

from htmlmetadata import extract_metadata

data = extract_metadata("http://schema.org/docs/about.html")
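
The following is a minimal sketch of how the result might be consumed. It assumes that extract_metadata returns a dictionary shaped like the command-line output shown earlier; the source does not state this, so treat it as an assumption. To keep the example runnable without network access, it works on a sample copied from that output. The helpers summary_field and clean_text are hypothetical and are not part of htmlmetadata.

```python
# Sample copied from the command-line output above.
# Assumption: extract_metadata(url) returns a dictionary with this same shape.
sample = {
    "request": {"url": "http://schema.org/docs/about.html"},
    "summary": {
        "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
        "title": "about page - schema.org",
        "language": "en",
    },
}


def summary_field(data, key, default=None):
    """Return one field of the "summary" block, or default if it is absent."""
    return data.get("summary", {}).get(key, default)


def clean_text(text):
    """Collapse newlines and runs of spaces into single spaces."""
    return " ".join(text.split())


title = summary_field(sample, "title", default="(untitled)")
description = clean_text(summary_field(sample, "description", default=""))

print(title)
print(description)
```

Reading fields through a helper with a default avoids a KeyError on pages where a given field could not be extracted, and clean_text removes the embedded line break visible in the sample description.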

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

htmlmetadata-1.1.zip (8.4 kB)

Uploaded Source

Built Distribution

htmlmetadata-1.1-py2.py3-none-any.whl (5.4 kB)

Uploaded Python 2 Python 3

File details

Details for the file htmlmetadata-1.1.zip.

File metadata

  • Download URL: htmlmetadata-1.1.zip
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.8.0

File hashes

Hashes for htmlmetadata-1.1.zip
Algorithm    Hash digest
SHA256       153e82b57392d62b57cb687ef45f7f99fd5d2e3705c8412db723945e281efd16
MD5          00dc5a8e29293b47bf268e6e70063242
BLAKE2b-256  f33602dd088e3aee77b8a85d8e7aeaf7e9cb8a7ca963d6e51e85d6035a698a48

File details

Details for the file htmlmetadata-1.1-py2.py3-none-any.whl.

File metadata

  • Download URL: htmlmetadata-1.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.8.0

File hashes

Hashes for htmlmetadata-1.1-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       004d36c78fd27366b7ee97943d7293d54dd51a9f16efbbb6163c6d1b25619635
MD5          e13f15ec806fbe9aa4c57d3ba6d010e1
BLAKE2b-256  ebd0e4077f28c0af6c47ce8d3411bdd252521311f2804ed9a99bdd3648fd45a5
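
As an illustration of how the digests listed for each file can be used, the sketch below checks a downloaded file against its published SHA256 value using Python's standard hashlib module. The function names sha256_of_file and matches_expected are hypothetical, and the script assumes the wheel has been saved in the current directory under its original file name.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for htmlmetadata-1.1-py2.py3-none-any.whl.
EXPECTED_SHA256 = "004d36c78fd27366b7ee97943d7293d54dd51a9f16efbbb6163c6d1b25619635"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with the expected value, ignoring letter case."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


if __name__ == "__main__":
    # Assumed location of the downloaded file; adjust as needed.
    wheel = Path("htmlmetadata-1.1-py2.py3-none-any.whl")
    if wheel.exists():
        print("OK" if matches_expected(wheel, EXPECTED_SHA256) else "MISMATCH")
    else:
        print("File not found; download it first.")
```

SHA256 is used here because it is the strongest of the three listed algorithms that is also universally available; MD5 should not be relied on for integrity against deliberate tampering.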
