Extract metadata from HTML pages using Open Graph metadata, HTML metadata, and a series of fallbacks.

Project description

HTMLmetadata

Inspired by https://metascraper.js.org
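To make the idea of "a series of fallbacks" concrete, here is a minimal sketch that uses only the Python standard library. It is not htmlmetadata's implementation: the names `MetaCollector`, `first_present`, and `summarize` are hypothetical, and the fallback order (Open Graph tag first, then the plain HTML equivalent) is an assumption chosen for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tags, the <title> text, and the <html lang> attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # maps a meta property/name to its content
        self.title_parts = []   # text chunks seen inside <title>
        self.language = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.language = attrs.get("lang")
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:..."; plain HTML uses name="...".
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def first_present(*candidates):
    """Return the first non-empty candidate, or None if all are empty."""
    return next((c for c in candidates if c), None)


def summarize(html):
    """Build a small summary dict, preferring Open Graph over plain HTML."""
    collector = MetaCollector()
    collector.feed(html)
    meta = collector.meta
    html_title = "".join(collector.title_parts).strip()
    return {
        "title": first_present(meta.get("og:title"), html_title),
        "description": first_present(
            meta.get("og:description"), meta.get("description")
        ),
        "language": collector.language,
    }


sample = """<html lang="en"><head>
<title>about page - schema.org</title>
<meta name="description" content="Fallback description">
<meta property="og:description" content="Open Graph description">
</head><body></body></html>"""

print(summarize(sample))
# {'title': 'about page - schema.org', 'description': 'Open Graph description', 'language': 'en'}
```

In the sample, the page has no `og:title`, so the title falls back to the `<title>` element, while the description comes from the Open Graph tag because it is present.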

Install

pip install htmlmetadata

Use

You can use it by calling the module directly.

python -m htmlmetadata http://schema.org/docs/about.html
{
  "request": {
    "url": "http://schema.org/docs/about.html"
  },
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}

Or use it directly in your code.

from htmlmetadata import extract_metadata

data = extract_metadata("http://schema.org/docs/about.html")
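The JSON printed by the command-line call can be consumed from a script. The sketch below parses a copy of that output and reads fields defensively, since a page may not provide every field. The helper `summary_field` is hypothetical, and whether `extract_metadata` returns a dictionary with the same `request`/`summary` layout is an assumption, not something confirmed here.

```python
import json

# A copy of the JSON printed by the command-line example, with the
# description left out on purpose to show the default value in action.
cli_output = """
{
  "request": {"url": "http://schema.org/docs/about.html"},
  "summary": {
    "title": "about page - schema.org",
    "language": "en"
  }
}
"""


def summary_field(result, field, default=None):
    """Read one field from the "summary" object, tolerating missing keys."""
    return result.get("summary", {}).get(field, default)


result = json.loads(cli_output)
print(summary_field(result, "title"))        # about page - schema.org
print(summary_field(result, "description", default="(no description)"))  # (no description)
```

Reading through a helper with a default keeps calling code free of `KeyError` handling when a page lacks a title, description, or language.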

Download files

Download the file for your platform.

Source Distribution

htmlmetadata-1.0.zip (8.3 kB)

Uploaded Source

Built Distribution

htmlmetadata-1.0-py2.py3-none-any.whl (5.4 kB)

Uploaded Python 2 Python 3

File details

Details for the file htmlmetadata-1.0.zip.

File metadata

  • Download URL: htmlmetadata-1.0.zip
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.8.0

File hashes

Hashes for htmlmetadata-1.0.zip
Algorithm    Hash digest
SHA256       f4e3934edd422e90acbc3de0fc68008564c7fde00ce944804d9c362f2c7509d5
MD5          643df0df7ff0bf232856ecbb227d4b93
BLAKE2b-256  b017524ab54164fcc98279f9b3d005e15ff929a340d01d0e5f636bf48d6f63c3

File details

Details for the file htmlmetadata-1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: htmlmetadata-1.0-py2.py3-none-any.whl
  • Size: 5.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.8.0

File hashes

Hashes for htmlmetadata-1.0-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       6715d60226dbcbb826f79472771b45089f08cf9d6e65d323b5c9ba77bb27f171
MD5          43a4cab795f1e1446d51d22300524e74
BLAKE2b-256  177ea3bd8045025135c40615b2318e5d741d4e8465b9895ad754fc1ea323b68e
