library to compare HTML while ignoring non-functional differences

htmlcompare

A Python library to check whether two HTML documents are "equal". The functionality is currently very limited, but the idea is that the library should automatically ignore differences that are irrelevant to HTML semantics (e.g. <img style=""> is the same as <img>, and style="color: black; font-weight: bold" is equal to style="font-weight:bold;color:black;").
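To make the idea of "irrelevant differences" concrete, here is a rough sketch that uses only the standard library. It is not how htmlcompare is implemented; the names `_TagCollector` and `start_tags` are made up for this illustration, and it only looks at start tags (text and nesting are ignored).

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Hypothetical helper: records start tags with normalized attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        normalized = []
        for name, value in attrs:
            value = value or ''
            # Assumption for this sketch: an empty style attribute has no effect.
            if name == 'style' and not value.strip():
                continue
            normalized.append((name, value))
        # Attribute order is insignificant in HTML, so sort before comparing.
        self.tags.append((tag, tuple(sorted(normalized))))


def start_tags(html):
    """Return the list of (tag, attributes) pairs found in the HTML string."""
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


# Different attribute order and an empty style attribute compare as equal.
same = start_tags('<img style="" src="a.png" alt="x">') == start_tags('<img alt="x" src="a.png">')
```

A real comparison also has to deal with text nodes, nesting and CSS, which is what the library is meant to take care of.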

Usage

import htmlcompare

diff = htmlcompare.compare('<div>', '<p>')
is_same = bool(diff)

To ease testing, the library provides some helpers:

from htmlcompare import assert_different_html, assert_same_html

assert_different_html('<br>', '<p>')
assert_same_html('<div />', '<div></div>')

Limitations / Plans

Only basic CSS support. Declarations in style attributes are parsed with tinycss2 (Python 3.5+), so the ordering of declarations and extra whitespace should not matter. tinycss2 does not support Python 2 or 3.4, so on those versions the only help is stripping trailing ;s in style attributes. Contents of <style> tags are completely ignored for now (even with tinycss2).

No validation of conditional comments. Not sure which library I can use here but at some point I'll likely need this as well.

JavaScript - for obvious reasons it will be impossible to implement perfect JS comparison but it might be possible to run some kind of "beautifier" to take care of insignificant stylistic changes. However I don't need this feature so this is unlikely to get implemented (unless contributed by someone else).

Custom hooks could help adapt the comparison to your specific needs. However, I don't know which API would be best, so this will wait until there are real-world use cases.

Better API: The current API is very minimal and implements just what I needed right now. I hope to improve the API once I use this project in more complex scenarios.
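As an illustration of the CSS point in the list above (declaration order and extra whitespace should not matter), here is a deliberately naive sketch using only the standard library. It is not the library's code, which relies on tinycss2; `normalize_style` is a hypothetical name. Splitting on ";" and ":" breaks for values that contain those characters (e.g. quoted strings or data URLs), and using a set drops the "last declaration wins" rule for repeated properties.

```python
def normalize_style(style):
    """Return the declarations of a style attribute as a frozenset of (property, value) pairs."""
    declarations = set()
    for chunk in style.split(';'):
        if ':' not in chunk:
            # Skips empty chunks, such as the one after a trailing ';'.
            continue
        prop, value = chunk.split(':', 1)
        # Property names are case-insensitive; collapse whitespace runs in the value.
        declarations.add((prop.strip().lower(), ' '.join(value.split())))
    return frozenset(declarations)


# The two style strings from the project description compare as equal.
equal = normalize_style('color: black; font-weight: bold') == normalize_style('font-weight:bold;color:black;')
```

A proper CSS parser avoids the pitfalls mentioned above, which is presumably why the library uses tinycss2 where it is available.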

Other projects

xmldiff is a well-established project to compare two XML documents. However, its code does not seem to contain knowledge about specific HTML semantics (e.g. CSS, empty attributes, insignificant attribute order).

Misc

The code is licensed under the MIT license. It supports Python 2.7 and Python 3.4+.
