webstruct

A library for creating statistical NER systems that work on HTML data

These details have been verified by PyPI

Maintainers

dangra kmike lopuhin pablohoffman scrapinghub scrapy

These details have not been verified by PyPI

Project links

Homepage

Project description

Webstruct is a library for creating statistical NER systems that work on HTML data, i.e. a library for building tools that extract named entities (addresses, organization names, opening hours, etc.) from web pages.

Unlike most NER systems, webstruct works on HTML data, not only on text data. This makes it possible to define features that use HTML structure, and to embed annotation results back into the HTML.
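
To make the idea of features that use HTML structure concrete, here is a minimal sketch built only on the Python standard library. It is not webstruct's API: the names `TokenFeatureExtractor` and `html_token_features`, and the feature keys, are hypothetical and chosen for illustration. It shows how a token can carry information about the markup around it, which a plain-text NER system would not see.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TokenFeatureExtractor(HTMLParser):
    """Collect (token, features) pairs; features describe the HTML context.

    Hypothetical helper for illustration only, not part of webstruct.
    """

    def __init__(self):
        super().__init__()
        self._open_tags = []  # stack of currently open element names
        self.tokens = []      # list of (token, feature dict) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self._open_tags:
            while self._open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self._open_tags[-1] if self._open_tags else None
        for token in data.split():
            features = {
                "lower": token.lower(),                  # text-only feature
                "parent_tag": parent,                    # structural feature
                "inside_link": "a" in self._open_tags,   # structural feature
            }
            self.tokens.append((token, features))


def html_token_features(html):
    """Return whitespace-separated tokens with their HTML-context features."""
    parser = TokenFeatureExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.tokens


if __name__ == "__main__":
    for token, feats in html_token_features(
        '<p>Visit <a href="/c">Acme Corp</a> today</p>'
    ):
        print(token, feats)
```

In a sequence-labelling setup, feature dicts of this kind would be passed to a statistical model alongside per-token labels; the choice of model is outside the scope of this sketch.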

Read the docs for more info.

License is MIT.

Contributing

Source code: https://github.com/scrapinghub/webstruct
Bug tracker: https://github.com/scrapinghub/webstruct/issues

To run tests, make sure tox is installed, then run tox from the source root.

Project details

Release history

Version	Release date
0.6 (this version)	Dec 29, 2017
0.5	May 10, 2017
0.4.1	Nov 28, 2016
0.4	Nov 26, 2016
0.3	Sep 19, 2016
0.2	Apr 21, 2014

Download files

Source Distribution

webstruct-0.6.tar.gz (48.3 kB)

Uploaded Dec 29, 2017 Source

Built Distribution

webstruct-0.6-py2.py3-none-any.whl (63.0 kB)

Uploaded Dec 29, 2017 Python 2 Python 3

File details

Details for the file webstruct-0.6.tar.gz.

File metadata

Download URL: webstruct-0.6.tar.gz
Upload date: Dec 29, 2017
Size: 48.3 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.6.tar.gz
Algorithm	Hash digest
SHA256	`839443b4d22c2e3cca58545d3947b752132c640d5753580b5faa2e05374e79cd`
MD5	`96ebcf483c5dea21a0d2ed5a07679945`
BLAKE2b-256	`42336da21470f8eba9ea2858c394fd806af9f9d191d8f234df8de5d69c9b2f69`

File details

Details for the file webstruct-0.6-py2.py3-none-any.whl.

File metadata

Download URL: webstruct-0.6-py2.py3-none-any.whl
Upload date: Dec 29, 2017
Size: 63.0 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for webstruct-0.6-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`b482e789bb39291e62b573c9a089ce06a2510f4f967695b5824252010bf4c332`
MD5	`46f4f7b5da5d9129848b70d941da403c`
BLAKE2b-256	`e7c4b0c13f60b24013e4a560a27b23b4547191d0581c67a35d8c64bc57eb83cc`
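
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the expected values are the SHA256 digests listed above.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "webstruct-0.6.tar.gz":
        "839443b4d22c2e3cca58545d3947b752132c640d5753580b5faa2e05374e79cd",
    "webstruct-0.6-py2.py3-none-any.whl":
        "b482e789bb39291e62b573c9a089ce06a2510f4f967695b5824252010bf4c332",
}


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """Compare a local file against the digest published for `filename`."""
    return sha256_of_file(path) == EXPECTED_SHA256[filename]
```

For example, `matches_published_hash("downloads/webstruct-0.6.tar.gz", "webstruct-0.6.tar.gz")` would return `True` only if the local copy (the path here is an assumed location) is byte-for-byte identical to the uploaded file.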