Library to extract HTML elements while preserving their ancestors and cleaning up the CSS
Project description
Chopper is a tool that extracts elements from HTML while preserving their ancestors and the CSS rules that apply to them.
Compatible with Python >= 2.6, <= 3.4
Installation
pip install chopper
Full documentation
Quick start
from chopper.extractor import Extractor
HTML = """
<html>
<head>
<title>Test</title>
</head>
<body>
<div id="header"></div>
<div id="main">
<div class="iwantthis">
HELLO WORLD
<a href="/nope">Do not want</a>
</div>
</div>
<div id="footer"></div>
</body>
</html>
"""
CSS = """
div { border: 1px solid black; }
div#main { color: blue; }
div.iwantthis { background-color: red; }
a { color: green; }
div#footer { border-top: 2px solid red; }
"""
extractor = Extractor.keep('//div[@class="iwantthis"]').discard('//a')
html, css = extractor.extract(HTML, CSS)
The result is:
>>> html
"""
<html>
<body>
<div id="main">
<div class="iwantthis">
HELLO WORLD
</div>
</div>
</body>
</html>"""
>>> css
"""
div{border:1px solid black;}
div#main{color:blue;}
div.iwantthis{background-color:red;}
"""
Project details
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
chopper-0.4.8-py3-none-any.whl (12.4 kB)
File details
Details for the file chopper-0.4.8-py3-none-any.whl.
File metadata
- Download URL: chopper-0.4.8-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 46c08b4904e2fde976e225d77ee3987e03e1e8e8522efc47d0e16f1ca0a9eda5
MD5 | 6580550cf0e7c23b0533ea9ba8c30653
BLAKE2b-256 | 7526c008ac779cc539ce1109d6180bcebe5c883df65d4f9e07727fb12da85fde