A library for creating statistical NER systems that work on HTML data

Project description

Webstruct is a library for creating statistical NER systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from webpages.
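Statistical NER systems commonly label each token with a BIO-style tag (B- begins an entity, I- continues it, O is outside any entity) and then group the tagged tokens into entities. The sketch below shows only that grouping step. The function name `bio_to_entities` and the labels `ORG` and `ADDR` are hypothetical illustrations, not webstruct's API.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) pairs.

    Tags look like "B-ORG" (begin), "I-ORG" (inside) or "O" (outside).
    An I- tag that does not continue the open entity starts a new one,
    a common lenient convention for imperfect model output.
    """
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None  # label of the entity currently being built
    words: List[str] = []        # tokens of the entity currently being built

    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, tag_label = "O", None
        else:
            prefix, _, tag_label = tag.partition("-")
        # Close the open entity when leaving it or when a new one begins.
        if label is not None and (prefix != "I" or tag_label != label):
            entities.append((label, " ".join(words)))
            label, words = None, []
        if tag_label is not None:
            if label is None:
                label = tag_label
            words.append(token)

    # Flush an entity that runs to the end of the sequence.
    if label is not None:
        entities.append((label, " ".join(words)))
    return entities
```

For example, the tokens `Visit Acme Corp at 12 Main St` tagged `O B-ORG I-ORG O B-ADDR I-ADDR I-ADDR` yield `[("ORG", "Acme Corp"), ("ADDR", "12 Main St")]`.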

Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use HTML structure and to embed annotation results back into HTML.
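To illustrate what a feature that uses HTML structure can look like, here is a sketch built only on the standard library's `html.parser`: each text token is paired with the tags that enclose it, and the per-token feature dict records cues such as the parent tag or whether the token sits inside a link or heading. The class `TokenCollector`, the function `token_features` and all feature names are hypothetical; webstruct's own tokenizer and feature functions may work differently.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TokenCollector(HTMLParser):
    """Collect whitespace-separated text tokens with their enclosing tags."""

    # Void elements have no closing tag, so they are never pushed on the stack.
    VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
    # Text inside these elements is not visible content, so it is skipped.
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.tokens: List[Tuple[str, Tuple[str, ...]]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray closes.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.SKIP.intersection(self.stack):
            return
        for token in data.split():
            self.tokens.append((token, tuple(self.stack)))


def token_features(html: str) -> List[Dict[str, object]]:
    """Return one feature dict per token, mixing text and HTML-structure cues."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    features = []
    for token, ancestors in parser.tokens:
        features.append({
            "lower": token.lower(),
            "is_title": token.istitle(),
            "is_digit": token.isdigit(),
            "parent_tag": ancestors[-1] if ancestors else "",
            "inside_a": "a" in ancestors,
            "inside_heading": bool(HEADINGS.intersection(ancestors)),
        })
    return features
```

For `<h1>Acme Corp</h1>`, the token `Acme` gets `parent_tag` `"h1"` and `inside_heading` `True`, cues a text-only NER system never sees. Per-token feature dicts like these are the kind of input that CRF sequence labelers commonly accept.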

Read the docs for more info.

License is MIT.

Contributing

To run tests, make sure tox is installed, then run tox from the source root.

Download files

Source Distribution

webstruct-0.6.tar.gz (48.3 kB)

Uploaded Source

Built Distribution

webstruct-0.6-py2.py3-none-any.whl (63.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file webstruct-0.6.tar.gz.

File metadata

  • Download URL: webstruct-0.6.tar.gz
  • Size: 48.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.6.tar.gz
Algorithm Hash digest
SHA256 839443b4d22c2e3cca58545d3947b752132c640d5753580b5faa2e05374e79cd
MD5 96ebcf483c5dea21a0d2ed5a07679945
BLAKE2b-256 42336da21470f8eba9ea2858c394fd806af9f9d191d8f234df8de5d69c9b2f69
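The published digests can be checked locally before installing. A minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `verify_sha256` are illustrative and not part of webstruct.

```python
import hashlib

# SHA256 digest published above for webstruct-0.6.tar.gz.
SDIST_SHA256 = "839443b4d22c2e3cca58545d3947b752132c640d5753580b5faa2e05374e79cd"


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """Return True if the file's SHA256 hex digest matches the expected value."""
    return sha256_of_file(path) == expected.strip().lower()
```

After downloading, `verify_sha256("webstruct-0.6.tar.gz", SDIST_SHA256)` should return True; False means the file is corrupted or is not the published artifact. pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).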


File details

Details for the file webstruct-0.6-py2.py3-none-any.whl.

File hashes

Hashes for webstruct-0.6-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 b482e789bb39291e62b573c9a089ce06a2510f4f967695b5824252010bf4c332
MD5 46f4f7b5da5d9129848b70d941da403c
BLAKE2b-256 e7c4b0c13f60b24013e4a560a27b23b4547191d0581c67a35d8c64bc57eb83cc

