A library for creating statistical NER systems that work on HTML data

Project description

Webstruct is a library for creating statistical NER systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from webpages.
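Statistical NER systems commonly label each token with a tag and then group tagged tokens into entities. The sketch below shows that grouping step under the assumption of IOB2 encoding (`B-` begins an entity, `I-` continues it, `O` is outside any entity). The function name `iob_to_entities` and the `ORG`/`ADDR` labels are hypothetical illustrations, not webstruct's API.

```python
from typing import List, Tuple


def iob_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (label, text) entities (hypothetical helper)."""
    entities: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose label differs from the open entity,
        # closes any open entity and starts a new one.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            if current_label is not None:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O": close any open entity.
            if current_label is not None:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    # Flush an entity that runs to the end of the sequence.
    if current_label is not None:
        entities.append((current_label, " ".join(current_tokens)))
    return entities


tokens = ["Visit", "Acme", "Corp", "at", "12", "Main", "St", "today"]
tags = ["O", "B-ORG", "I-ORG", "O", "B-ADDR", "I-ADDR", "I-ADDR", "O"]
print(iob_to_entities(tokens, tags))  # [('ORG', 'Acme Corp'), ('ADDR', '12 Main St')]
```

An `I-` tag with no matching open entity is treated leniently here as the start of a new entity; a stricter decoder could reject it instead.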

Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use the HTML structure, and also to embed annotation results back into HTML.
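To illustrate what an HTML-structure feature can look like, here is a minimal sketch using only Python 3's standard `html.parser`. It splits text into whitespace tokens and records, for each token, its enclosing tag, nesting depth, and whether it sits inside a link. The class `TokenFeatureExtractor` and the feature names are hypothetical; this is not how webstruct itself tokenizes HTML or names its features.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class TokenFeatureExtractor(HTMLParser):
    """Split HTML text into tokens, each paired with structural features (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []  # currently open tags, outermost first
        self.features: List[Dict[str, object]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children;
        # ignore stray closing tags that were never opened.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        for token in data.split():
            self.features.append({
                "token": token,
                "parent_tag": self.stack[-1] if self.stack else None,
                "depth": len(self.stack),
                "inside_a": "a" in self.stack,
            })


parser = TokenFeatureExtractor()
parser.feed("<div><b>Acme Corp</b><br>Open <a href='#'>9am</a></div>")
parser.close()
for feats in parser.features:
    print(feats)
```

A statistical tagger would consume these per-token dictionaries alongside ordinary text features, so that, for example, bold text or link text can be weighted differently from plain body text.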

Read the docs for more info.

License is MIT.

Contributing

To run tests, make sure tox is installed, then run tox from the source root.

Download files

Source Distribution

webstruct-0.4.1.tar.gz (40.4 kB)

Built Distribution

webstruct-0.4.1-py2.py3-none-any.whl (53.9 kB), for Python 2 and Python 3

File details

Details for the file webstruct-0.4.1.tar.gz.

File metadata

  • Download URL: webstruct-0.4.1.tar.gz
  • Size: 40.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.4.1.tar.gz
Algorithm     Hash digest
SHA256        af61c40f9d379530dc5b53832aea7dfde4711e15ead08c3bd6c2b1ad371d8863
MD5           d26c7ce9eaa134aff3bfe87f40a2f73d
BLAKE2b-256   bdc31e602693b6f6a1d8f2e753ebb718b548570b59f7b970f06170ef578c250d
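A downloaded archive can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the expected digest is the SHA256 value listed for webstruct-0.4.1.tar.gz.

```python
import hashlib

# Published SHA256 digest for webstruct-0.4.1.tar.gz.
EXPECTED_SHA256 = "af61c40f9d379530dc5b53832aea7dfde4711e15ead08c3bd6c2b1ad371d8863"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(verify("webstruct-0.4.1.tar.gz"))
```

pip can perform the same check during installation in hash-checking mode (`--require-hashes`), where each requirement line carries a `--hash=sha256:...` option.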

File details

Details for the file webstruct-0.4.1-py2.py3-none-any.whl.

File hashes

Hashes for webstruct-0.4.1-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        1fee1794794e82298b782050aeb90ef1482b47a1187fbdb07019cc0ac7cc6ce3
MD5           29f99a62b2ada4e8fda6248810bf9cac
BLAKE2b-256   f32d6523d8717fec4eca493b55b149123a7af4ec1b511da4cb2f63d133b44445
