An information extraction toolkit built on top of NLTK.

To discuss the project with us, join our mailing list: http://groups.google.com/forum/?fromgroups#!forum/bluestocking-dev

This project depends on NLTK. You will need to install it before running these scripts.

To run tests:

python tests.py

To run the factchecker demo, try this:

python factchecker.py "The sky is not blue."

or this:

python factchecker.py "People never eat fish. Goldfish are unpopular."

This tests a document against the Simple English Wikipedia articles for each word in the tested document. Try replacing test-factchecker.txt with your own text file!

(Warning: documents with long sentences take longer to query)

Scripts included:

### parse.py

Defines a Document class for wrapping raw text and a Parser class for extracting Relations from a Document.

A Relation encapsulates a semantically significant lexical co-occurrence.

Documents have a method to turn them into Doxaments (see below).
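
To make the idea of a Relation concrete, here is a minimal sketch of extracting co-occurrence relations from raw text. It is not the code in parse.py: the names `SketchRelation` and `extract_relations`, the stopword and negation lists, and the rule "every pair of content words in a sentence forms a relation" are all assumptions made for illustration, and plain regular expressions stand in for NLTK.

```python
import re
from dataclasses import dataclass
from itertools import combinations

# Hypothetical word lists, chosen only for this illustration.
NEGATIONS = {"not", "never", "no"}
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and"}


@dataclass(frozen=True)
class SketchRelation:
    """Illustrative stand-in for a Relation: two co-occurring words plus polarity."""
    left: str
    right: str
    positive: bool


def extract_relations(text):
    """Return the set of word-pair relations found in each sentence of `text`."""
    relations = set()
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # A sentence containing any negation word yields negative relations.
        positive = not any(t in NEGATIONS for t in tokens)
        # Keep content words only; sorting makes each pair order-independent.
        content = sorted({t for t in tokens
                          if t not in NEGATIONS and t not in STOPWORDS})
        for left, right in combinations(content, 2):
            relations.add(SketchRelation(left, right, positive))
    return relations


if __name__ == "__main__":
    for relation in sorted(extract_relations("The sky is not blue."),
                           key=lambda r: (r.left, r.right)):
        print(relation)
```

A real parser would use proper sentence and word tokenization and a more careful treatment of negation scope; the sketch only shows the shape of the data a Relation might carry.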

### doxament.py

Defines a Doxament class. A Doxament contains many Relations. A Doxament may be queried for consistency with another Doxament. Doxaments may also be merged to form a more complete knowledge base.
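
The consistency query and merge described above can be sketched as follows. This is an illustration only, not the implementation in doxament.py: the class name `SketchDoxament`, its method names, and the representation of a relation as a `(left, right, positive)` tuple are assumptions, as is the rule that two relations contradict when they link the same words with opposite polarity.

```python
class SketchDoxament:
    """Illustrative stand-in for a Doxament: a set of (left, right, positive) tuples."""

    def __init__(self, relations=()):
        self.relations = set(relations)

    def contradictions(self, other):
        """Relations in `other` that this object asserts with the opposite polarity."""
        return {(left, right, positive)
                for (left, right, positive) in other.relations
                if (left, right, not positive) in self.relations}

    def is_consistent_with(self, other):
        """True when no relation in `other` contradicts one held here."""
        return not self.contradictions(other)

    def merge(self, other):
        """Return a new object holding the union of both relation sets."""
        return SketchDoxament(self.relations | other.relations)


# Hypothetical data: a tiny knowledge base and two claims to check against it.
knowledge = SketchDoxament({("blue", "sky", True)})
false_claim = SketchDoxament({("blue", "sky", False)})
new_fact = SketchDoxament({("fish", "water", True)})

if __name__ == "__main__":
    print(knowledge.is_consistent_with(false_claim))
    print(knowledge.is_consistent_with(new_fact))
    print(sorted(knowledge.merge(new_fact).relations))
```

Returning a new object from `merge` rather than mutating in place is a design choice of the sketch; it keeps the original knowledge base unchanged while a candidate document is being checked.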

### other

wikipedia.py and wiki2plain.py are taken from http://stackoverflow.com/questions/4460921/extract-the-first-paragraph-from-a-wikipedia-article-python
