A library for creating statistical NER systems that work on HTML data

Project description

Webstruct is a library for creating statistical NER systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from web pages.

Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use the HTML structure, and also to embed annotation results back into the HTML.
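
To make the idea of features that use HTML structure more concrete, here is a minimal sketch written with the Python standard library only. It is not webstruct's API: the class `TokenFeatureExtractor`, the function `extract_token_features`, and the feature names (`parent_tag`, `inside_link`, and so on) are hypothetical and chosen only for illustration. The sketch splits text into whitespace-separated tokens and records, for each token, which HTML element encloses it.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TokenFeatureExtractor(HTMLParser):
    """Collect a feature dict for every text token, including its enclosing tag."""

    def __init__(self):
        super().__init__()
        self._open_tags = []  # stack of currently open element names
        self.tokens = []      # list of (token, features) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self._open_tags:
            while self._open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self._open_tags[-1] if self._open_tags else None
        for token in data.split():
            features = {
                "lower": token.lower(),                  # plain-text feature
                "is_title": token.istitle(),             # plain-text feature
                "parent_tag": parent,                    # HTML-structure feature
                "inside_link": "a" in self._open_tags,   # HTML-structure feature
            }
            self.tokens.append((token, features))


def extract_token_features(html):
    """Return a list of (token, features) pairs for an HTML string."""
    parser = TokenFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    sample = '<div><h1>Acme Corp</h1><p>Visit <a href="/c">our office</a> today</p></div>'
    for token, features in extract_token_features(sample):
        print(token, features)
```

A sequence model trained on such feature dicts can learn, for example, that title-cased tokens inside a heading are more likely to be organization names than the same tokens in body text. A text-only NER system has no access to that signal.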

Read the docs for more info.

License is MIT.

Contributing

To run the tests, make sure nose is installed, then run the runtests.sh script.

Download files

Source Distribution

webstruct-0.3.tar.gz (40.3 kB)

Uploaded Source

Built Distribution

webstruct-0.3-py3-none-any.whl (51.8 kB)

Uploaded Python 3

File details

Details for the file webstruct-0.3.tar.gz.

File metadata

  • Download URL: webstruct-0.3.tar.gz
  • Upload date:
  • Size: 40.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.3.tar.gz
Algorithm Hash digest
SHA256 1a9e06726891d041d662e352a45d0c4e23dc01105ce659bb3b48674c8300e8d4
MD5 0f384ee6c807347cd9761ff22094fd3e
BLAKE2b-256 afb2b4d7bd1931795c1f10889d39f262e63ca6d422382584a3bb516a69ecc90d
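
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` and `pathlib` modules; the helper names `sha256_of_file` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative and not part of webstruct or PyPI tooling.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the table above for webstruct-0.3.tar.gz.
EXPECTED_SHA256 = "1a9e06726891d041d662e352a45d0c4e23dc01105ce659bb3b48674c8300e8d4"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("webstruct-0.3.tar.gz")  # assumed local download location
    if archive.exists():
        print("hash OK" if matches_expected(archive) else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check works for the wheel by passing its path and its own SHA256 digest as `expected_hex`.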

File details

Details for the file webstruct-0.3-py3-none-any.whl.

File hashes

Hashes for webstruct-0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 89a3be6e8d4fb689310c1605e492a70e6f730140be984972e9ef923e7280fc48
MD5 125cf23eff305384422eddb39b60f088
BLAKE2b-256 e0add186214161c1a08a520f3d3efb95b4cf8494bd414dde5ae17d8e61e7a9cf
