A library for creating statistical NER systems that work on HTML data

Project description

Webstruct is a library for creating statistical NER systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from web pages.

Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use the HTML structure, and also to embed annotation results back into the HTML.
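To illustrate what a feature that uses HTML structure can look like, here is a minimal sketch built only on the Python standard library. It is not webstruct's API: the names TokenCollector, token_features and VOID_TAGS, as well as the particular features chosen, are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TokenCollector(HTMLParser):
    """Collect text tokens together with information about enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.tokens = []  # (token, parent_tag, in_bold) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else None
        in_bold = any(tag in ("b", "strong") for tag in self.stack)
        for token in data.split():
            self.tokens.append((token, parent, in_bold))


def token_features(html):
    """Return one feature dict per token, mixing text and HTML features."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return [
        {
            "token": token,
            "is_title": token.istitle(),  # plain-text feature
            "parent_tag": parent,         # HTML-structure feature
            "in_bold": in_bold,           # HTML-structure feature
        }
        for token, parent, in_bold in collector.tokens
    ]


features = token_features("<p>Visit <b>Acme Corp</b> today</p>")
for row in features:
    print(row)
```

A sequence model trained on such dicts can learn, for example, that capitalized tokens inside bold text are more likely to be organization names, which a text-only NER system cannot see.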

Read the docs for more info.

License is MIT.

Contributing

To run the tests, make sure nose is installed, then run the runtests.sh script.

Download files

Source Distribution

webstruct-0.2.tar.gz (32.2 kB)

File details

Details for the file webstruct-0.2.tar.gz.

File metadata

  • Download URL: webstruct-0.2.tar.gz
  • Size: 32.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.2.tar.gz
Algorithm    Hash digest
SHA256       52c568b3e2460538c7ea5f6339c231398b4e6e891e1277b6bc2e993c519c2bd9
MD5          8ae6e94a3f4c4bbc518a1f3bf0171cad
BLAKE2b-256  01b8a300513adfaf62abd8223299a59e057d34202725d660e8d798dae2ce5507
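
The SHA256 digest listed above can be used to check a downloaded copy of the archive. The following is a minimal sketch using Python's hashlib; the helper names sha256_of and matches_expected are hypothetical, and the file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for webstruct-0.2.tar.gz.
EXPECTED_SHA256 = "52c568b3e2460538c7ea5f6339c231398b4e6e891e1277b6bc2e993c519c2bd9"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected("webstruct-0.2.tar.gz") should return True for an intact download saved under that name in the current directory.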
