An information extraction toolkit built on top of NLTK.
Project description
An information extraction toolkit.
To discuss the project with us, join our mailing list: http://groups.google.com/forum/?fromgroups#!forum/bluestocking-dev
This project depends on NLTK. You will need to install it before running these scripts.
To run tests:
python tests.py
To run the factchecker demo, try this:
python factchecker.py "The sky is not blue."
or this:
python factchecker.py "People never eat fish. Goldfish are unpopular."
This tests a document against the Simple English Wikipedia articles for each word in the tested document. Try replacing test-factchecker.txt with your own text file!
(Warning: documents with long sentences take longer to query.)
Scripts included:
### parse.py
Defines a Document class for wrapping raw text and a Parser class for extracting Relations from a Document.
A Relation encapsulates a semantically significant lexical co-occurrence.
Documents have a method to turn them into Doxaments (see below).
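To make the idea of a Relation concrete, here is a minimal sketch of co-occurrence extraction. It is not the code in parse.py: the names `Relation` and `extract_relations`, the `negated` flag, and the stopword list are all assumptions made for illustration, and it uses a plain regular expression where the real Parser relies on NLTK.

```python
import re
from itertools import combinations
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical stand-in: two co-occurring words plus a negation flag."""
    left: str
    right: str
    negated: bool


# Illustrative word lists only; a real parser would use NLTK resources.
NEGATIONS = {"not", "never", "no"}
STOPWORDS = {"the", "a", "an", "is", "are"}


def extract_relations(text):
    """Return the set of word-pair Relations found within each sentence."""
    relations = set()
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # A sentence counts as negated if it contains any negation word.
        negated = any(t in NEGATIONS for t in tokens)
        # Keep content words only, sorted so each pair has a stable order.
        content = sorted({t for t in tokens if t not in STOPWORDS | NEGATIONS})
        for left, right in combinations(content, 2):
            relations.add(Relation(left, right, negated))
    return relations


if __name__ == "__main__":
    for relation in sorted(extract_relations("The sky is not blue.")):
        print(relation)
```

Pairing every content word in a sentence grows quadratically with sentence length, which is one plausible reason long sentences would be slower to process.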
### doxament.py
Defines a Doxament class. A Doxament contains many Relations. A Doxament may be queried for consistency with another Doxament. Doxaments may also be merged to form a more complete knowledge base.
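The querying and merging behaviour described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the class in doxament.py: relations are modelled as plain `(left, right, negated)` tuples, and the method names `merge`, `contradictions`, and `consistency`, as well as the scoring rule, are hypothetical.

```python
class Doxament:
    """Hypothetical sketch: a set of (left, right, negated) relation tuples."""

    def __init__(self, relations=()):
        self.relations = set(relations)

    def merge(self, other):
        """Return a new Doxament holding the relations of both."""
        return Doxament(self.relations | other.relations)

    def contradictions(self, other):
        """Relations in `other` whose opposite-polarity twin is in self."""
        return {
            (left, right, negated)
            for (left, right, negated) in other.relations
            if (left, right, not negated) in self.relations
        }

    def consistency(self, other):
        """Fraction of `other`'s relations that self does not contradict."""
        if not other.relations:
            return 1.0  # nothing to contradict
        return 1 - len(self.contradictions(other)) / len(other.relations)


if __name__ == "__main__":
    knowledge = Doxament({("blue", "sky", False)})
    claim = Doxament({("blue", "sky", True)})
    print(knowledge.consistency(claim))
    print(len(knowledge.merge(claim).relations))
```

Returning a new object from `merge` leaves both inputs unchanged, so a knowledge base can be tested against a document before deciding whether to absorb it.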
### other
wikipedia.py and wiki2plain.py are taken from http://stackoverflow.com/questions/4460921/extract-the-first-paragraph-from-a-wikipedia-article-python
Download files
Source Distribution
File details
Details for the file bluestocking-0.1.1.tar.gz.
File metadata
- Download URL: bluestocking-0.1.1.tar.gz
- Upload date:
- Size: 19.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 6b1d823a920e1735f68af99b642ce808534bd4fd1841981318c8cde9d589786c |
MD5 | 4969728cd2568c0704c84aecc5a3f41d |
BLAKE2b-256 | 504d0b697491597b9dbc07ba31ae845f14cd979fe22b6bd54a18290c11c43f2a |