lxml_html_clean

Motivation

This project was initially part of lxml. Because the HTML cleaner is designed around a blocklist, many reports about possible security vulnerabilities were filed against lxml, which made the project problematic for security-sensitive environments. We therefore decided to extract the problematic part into a separate project.

Important: the HTML Cleaner in lxml_html_clean is not considered appropriate for security-sensitive environments. See, for example, bleach for an alternative.

This project uses functions from Python's urllib.parse module for URL parsing, and these functions do not validate their inputs. For more information on the potential security risks, refer to the "URL parsing security" section of the urllib.parse documentation. A maliciously crafted URL could potentially bypass the allowed hosts check in Cleaner.
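To illustrate the kind of discrepancy this warning is about, here is a minimal sketch using only the standard library. The ALLOWED_HOSTS set and the is_allowed_host helper are hypothetical and are not how Cleaner implements its check. The point is only that urlsplit() splits strings without validating them, so a host check built on it can disagree with how a browser reads the same URL.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist, for illustration only; not Cleaner's real configuration.
ALLOWED_HOSTS = {"trusted.example"}


def is_allowed_host(url: str) -> bool:
    """Hypothetical naive check: accept the URL if urlsplit() reports an allowed host."""
    # urlsplit() does not validate its input; it only splits the string.
    hostname = urlsplit(url).hostname
    return hostname in ALLOWED_HOSTS


# urlsplit() accepts arbitrary text without raising an error.
print(urlsplit("not a url at all"))

# A plain URL pointing at the allowed host passes, as expected.
print(is_allowed_host("https://trusted.example/page"))

# urllib.parse treats "\" as an ordinary character and everything before the
# last "@" as user info, so it reports the host as "trusted.example".
# A WHATWG-style parser (as used by browsers) treats "\" like "/" in http(s)
# URLs and would resolve the host as "evil.example" instead.
print(is_allowed_host("https://evil.example\\@trusted.example/"))
```

Both is_allowed_host calls print True, even though a browser following the last URL would contact evil.example.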

Installation

You can install this project directly via pip install lxml_html_clean, or as an extra of lxml via pip install lxml[html_clean]. Either way, lxml itself is installed along with this project.

Security

For discussions regarding security-related issues or any sensitive reports, please contact us privately. You can reach out to lbalhar(at)redhat.com or frenzy.madness(at)gmail.com to ensure your concerns are addressed confidentially and securely.

Documentation

https://lxml-html-clean.readthedocs.io/

License

BSD-3-Clause

Release history

0.3.1 (this version): Oct 9, 2024
0.3.0: Oct 9, 2024
0.2.2: Aug 30, 2024
0.2.1: Aug 29, 2024
0.2.0: Jul 29, 2024
0.1.1: Apr 5, 2024
0.1.0: Feb 26, 2024

Download files

Source Distribution

lxml_html_clean-0.3.1.tar.gz (20.9 kB), uploaded Oct 9, 2024

Built Distribution

lxml_html_clean-0.3.1-py3-none-any.whl (13.9 kB), uploaded Oct 9, 2024, for Python 3

Hashes for lxml_html_clean-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`d9f7d8ae36092f4ed5079cfbf95ff06d3c6fd04f4a861422ce251ece72d3c4b5`
MD5	`3714194432fb5a1982436cb136f7f066`
BLAKE2b-256	`ebc9efd2064658c33d248a9522825dfb38c82619579754c0320103e632829b16`

Hashes for lxml_html_clean-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`081e6378c68ebb4113940ed82a3534c99e24ba1ca5ad5ce8868c7c4d264618f1`
MD5	`9b60f2bbe4eafeed09f68fd185487b17`
BLAKE2b-256	`0eae2fc4fc394031ba4e474ae4e0da6822f601900f278f3fade5e9422897eb89`