This blueprint extracts the title, description and body from HTML, either via XPath or by automatic cluster analysis.
Project description
Introduction
- transmogrify.htmlcontentextractor
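The XPath-based extraction described above can be illustrated with a minimal sketch. This is not the blueprint's own code or configuration: the `FIELD_XPATHS` mapping, the `extract_fields` helper and the sample markup are all hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of item fields to XPath expressions (illustrative only;
# not the blueprint's actual option names or syntax).
FIELD_XPATHS = {
    "title": ".//h1",
    "description": ".//p[@class='summary']",
    "body": ".//div[@id='content']",
}


def extract_fields(markup, xpaths=FIELD_XPATHS):
    """Return a dict of field name -> text for each XPath that matches."""
    root = ET.fromstring(markup)
    item = {}
    for field, path in xpaths.items():
        node = root.find(path)  # first matching element, or None
        if node is not None:
            # itertext() yields the text inside the node, but not the tail
            # text that follows the node's closing tag.
            item[field] = "".join(node.itertext()).strip()
    return item


SAMPLE = """<html><body>
<h1>Example page</h1>
<p class="summary">A short summary.</p>
<div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>"""

item = extract_fields(SAMPLE)
print(item)
```

Real-world HTML is often not well-formed, so a more tolerant parser such as `lxml.html` would likely be a better fit in practice; the standard library is used here only to keep the sketch self-contained.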
Changelog
1.0b3 (2010-12-13)
simpler autogenerated xpath
better logging
1.0b2 (2010-11-09)
Put condition on autofinder so it can be turned off
1.0b1 (2010-11-03)
ignore already found items. better debug [“Dylan Jay”]
skip templates if item already parsed [“Dylan Jay”]
print automatically found XPaths [“Dylan Jay”]
make text fields strip tail text [“Vitaliy Podoba”]
1.0dev (2010-03-22)
split the auto templatefinder out to its own blueprint [“Dylan Jay”]
Hashes for transmogrify.htmlcontentextractor-1.0b3.zip
Algorithm | Hash digest |
---|---|
SHA256 | 39c7a17c99481b7e376d8b63b38a9c7a18692676289519dc48b2f0506d2696a6 |
MD5 | 7a656cc5a6c36a6938c36089b49a1af1 |
BLAKE2b-256 | aeb12ef7cbb6524cdc1d7e4ba97843b8c2848aa4f1d9294bf159c71d29c38105 |