transmogrify.htmlcontentextractor


This blueprint extracts the title, description, and body from HTML, either via XPath or by automatic cluster analysis.


Project description

Introduction

transmogrify.htmlcontentextractor: This blueprint extracts the title, description, and body from HTML, either via XPath or by automatic cluster analysis.
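
To make the XPath mode of extraction concrete, here is a minimal, generic sketch that uses only the Python standard library. It is not this blueprint's API or configuration syntax: the sample page, the `FIELD_XPATHS` mapping, and the `extract_fields` helper are all hypothetical names chosen for illustration, and the sample is well-formed markup so that `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; kept well-formed so the stdlib XML parser accepts it.
SAMPLE_HTML = """<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short summary." />
  </head>
  <body>
    <div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
  </body>
</html>"""

# Hypothetical field-to-XPath mapping (assumed names, not the blueprint's options).
FIELD_XPATHS = {
    "title": ".//head/title",
    "description": ".//meta[@name='description']",
    "body": ".//div[@id='content']",
}


def extract_fields(html, xpaths):
    """Return a dict with one entry per field whose XPath matches a node."""
    root = ET.fromstring(html)
    item = {}
    for field, path in xpaths.items():
        node = root.find(path)
        if node is None:
            continue  # leave the field out when nothing matches
        if node.tag == "meta":
            # <meta> carries its value in the "content" attribute
            item[field] = node.get("content", "")
        else:
            # join all nested text and normalise whitespace
            item[field] = " ".join(" ".join(node.itertext()).split())
    return item


item = extract_fields(SAMPLE_HTML, FIELD_XPATHS)
print(item)
```

Real-world HTML is rarely well-formed, so a tolerant HTML parser would be needed in practice; the standard-library parser is used here only to keep the sketch self-contained.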

Changelog

1.0b1 (2010-11-03)

ignore already found items. better debug [“Dylan Jay”]
skip templates if item already parsed [“Dylan Jay”]
print automatically found XPaths [“Dylan Jay”]
make text fields strip tail text [“Vitaliy Podoba”]

1.0dev (2010-03-22)

split the auto templatefinder out to its own blueprint [“Dylan Jay”]


Release history

1.0 (Mar 22, 2010)
1.0b5 pre-release (Jun 29, 2011)
1.0b5dev pre-release (Jun 29, 2011)
1.0b4 pre-release (Feb 6, 2011)
1.0b3 pre-release (Dec 13, 2010)
1.0b2 pre-release (Nov 8, 2010)
1.0b1 pre-release (Nov 7, 2010), this version
1.0dev pre-release (Mar 22, 2010)

Download files


Source Distribution

transmogrify.htmlcontentextractor-1.0b1.zip (381.2 kB), uploaded Nov 7, 2010

Hashes for transmogrify.htmlcontentextractor-1.0b1.zip
Algorithm	Hash digest
SHA256	`1698e6ef619b670ba16ed72492aa05e649bbc7b78dd951b07d79f11ec161f04f`
MD5	`74cf35ddd26825c6acc3a2c92f2da7ba`
BLAKE2b-256	`be872d68d34c6d889d0a230ba259b57ce4cc1898992acd021d4e5f295636be0d`
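
The digests above can be used to check a downloaded archive. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the sketch assumes the archive has already been saved locally.

```python
import hashlib

# SHA256 digest listed above for transmogrify.htmlcontentextractor-1.0b1.zip
EXPECTED_SHA256 = "1698e6ef619b670ba16ed72492aa05e649bbc7b78dd951b07d79f11ec161f04f"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected one, ignoring hex case."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("transmogrify.htmlcontentextractor-1.0b1.zip")` should return `True` for an intact copy of the archive saved in the current directory.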
