A library for creating statistical NER systems that work on HTML data

Project description

Webstruct is a library for creating statistical named entity recognition (NER) systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from web pages.

Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use the HTML structure, and also to embed annotation results back into the HTML.
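
To illustrate what "features that use the HTML structure" can mean, here is a minimal sketch that uses only the Python standard library. It is not webstruct's API: the names `TokenCollector`, `html_token_features`, and the feature keys are hypothetical, chosen for this example. It pairs every text token with the tag that encloses it, which is the kind of signal a plain-text NER system never sees.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TokenCollector(HTMLParser):
    """Collect text tokens together with the tag that encloses them."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, innermost last
        self.tokens = []   # (token, enclosing_tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else None
        for token in data.split():
            self.tokens.append((token, parent))


def html_token_features(html):
    """Return one feature dict per token (illustrative, not webstruct's API)."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return [
        {"token": tok, "parent_tag": tag, "is_title": tok.istitle()}
        for tok, tag in collector.tokens
    ]


features = html_token_features(
    "<p>Visit <b>Acme Corp</b> at <a href='#'>12 Main St</a></p>"
)
```

In this sketch the tokens "Acme" and "Corp" carry `parent_tag == "b"`, while the address tokens carry `parent_tag == "a"`. A sequence model could use such features alongside the usual lexical ones.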

Read the docs for more info.

License is MIT.

Contributing

To run tests, make sure tox is installed, then run tox from the source root.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

webstruct-0.5.tar.gz (42.3 kB)

Uploaded Source

Built Distribution

webstruct-0.5-py2.py3-none-any.whl (56.4 kB)

Uploaded Python 2 Python 3

File details

Details for the file webstruct-0.5.tar.gz.

File metadata

  • Download URL: webstruct-0.5.tar.gz
  • Upload date:
  • Size: 42.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.5.tar.gz
Algorithm Hash digest
SHA256 bbf7abd484dc2beb9a19842aeb8cd1777cab7ce71afdbb974e6362a0b3b1b8f4
MD5 e3f36d3477b193666443b16c4de427be
BLAKE2b-256 28f89b046f1972415a6355c035f99a6410181d1e4fdc2bf478bfe2708bc39327
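
A downloaded file can be checked against the published SHA256 digest. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file path is assumed to point at a local copy of the archive.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

For example, `matches_published_hash("webstruct-0.5.tar.gz", "bbf7abd484dc2beb9a19842aeb8cd1777cab7ce71afdbb974e6362a0b3b1b8f4")` should return `True` for an intact copy of the source distribution, assuming the file sits in the current directory.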

See more details on using hashes here.

File details

Details for the file webstruct-0.5-py2.py3-none-any.whl.

File hashes

Hashes for webstruct-0.5-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 7513fd32e4ccca73b4074fc06bc8a0e2e5c1350212549ad2caa462be8db71669
MD5 43e2de233d994d83c7781ad92cd497d6
BLAKE2b-256 7e492a7f087df7a532e8cc620dc9276278349532353fa78a1c6b83794d1b43b7
