Skip to main content

Simple, Pythonic text processing. Sentiment analysis, POS tagging, noun phrase parsing, and more.

Project description

TextBlob: Simplified Text Processing

Latest version Travis-CI Number of PyPI downloads Flattr Steve

Homepage: https://textblob.readthedocs.org/

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

from text.blob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.tags           # [(u'The', u'DT'), (u'titular', u'JJ'),
                    #  (u'threat', u'NN'), (u'of', u'IN'), ...]

blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment)  # returns (polarity, subjectivity)
# (0.060, 0.605)
# (-0.341, 0.767)

blob.translate(to="es")  # 'La amenaza titular de The Blob...'

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

Features

  • Noun phrase extraction

  • Part-of-speech tagging

  • Sentiment analysis

  • Classification (Naive Bayes)

  • Language translation and detection powered by Google Translate

  • Tokenization (splitting text into words and sentences)

  • Word and phrase frequencies

  • Parsing

  • n-grams

  • Word inflection (pluralization and singularization)

  • Spelling correction

  • JSON serialization

  • Easily swap models, or create your own

Get it now

$ pip install -U textblob
$ curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python

Examples

See more examples at the Quickstart guide.

Documentation

Full documentation is available at https://textblob.readthedocs.org/.

Requirements

  • Python >= 2.6 or >= 3.3

License

MIT licensed. See the bundled LICENSE file for more details.

Changelog

0.6.0 (2013-08-25)

  • Add Naive Bayes classification. New text.classifiers module, TextBlob.classify(), and Sentence.classify() methods.

  • Add parsing functionality via the TextBlob.parse() method. The text.parsers module currently has one implementation (PatternParser).

  • Add spelling correction. This includes the TextBlob.correct() and Word.spellcheck() methods.

  • Update NLTK.

  • Backwards incompatible: clean_html has been deprecated, just as it has in NLTK. Use Beautiful Soup’s soup.get_text() method for HTML-cleaning instead.

  • Slight API change to language translation: if from_lang isn’t specified, attempts to detect the language.

  • Add itokenize() method to tokenizers that returns a generator instead of a list of tokens.

0.5.3 (2013-08-21)

  • Unicode fixes: This fixes a bug that sometimes raised a UnicodeEncodeError upon creating accessing sentences for TextBlobs with non-ascii characters.

  • Update NLTK

0.5.2 (2013-08-14)

  • Important patch update for NLTK users: Fix bug with importing TextBlob if local NLTK is installed.

  • Fix bug with computing start and end indices of sentences.

0.5.1 (2013-08-13)

  • Fix bug that disallowed display of non-ascii characters in the Python REPL.

  • Backwards incompatible: Restore blob.json property for backwards compatibility with textblob<=0.3.10. Add a to_json() method that takes the same arguments as json.dumps.

  • Add WordList.append and WordList.extend methods that append Word objects.

0.5.0 (2013-08-10)

  • Language translation and detection API!

  • Add text.sentiments module. Contains the PatternAnalyzer (default implementation) as well as a NaiveBayesAnalyzer.

  • Part-of-speech tags can be accessed via TextBlob.tags or TextBlob.pos_tags.

  • Add polarity and subjectivity helper properties.

0.4.0 (2013-08-05)

  • New text.tokenizers module with WordTokenizer and SentenceTokenizer. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob’s constructor. Tokens are accessed through the new tokens property.

  • New Blobber class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.

  • Add ngrams method.

  • Backwards-incompatible: TextBlob.json() is now a method, not a property. This allows you to pass arguments (the same that you would pass to json.dumps()).

  • New home for documentation: https://textblob.readthedocs.org/

  • Add parameter for cleaning HTML markup from text.

  • Minor improvement to word tokenization.

  • Updated NLTK.

  • Fix bug with adding blobs to bytestrings.

0.3.10 (2013-08-02)

  • Bundled NLTK no longer overrides local installation.

  • Fix sentiment analysis of text with non-ascii characters.

0.3.9 (2013-07-31)

  • Updated nltk.

  • ConllExtractor is now Python 3-compatible.

  • Improved sentiment analysis.

  • Blobs are equal (with ==) to their string counterparts.

  • Added instructions to install textblob without nltk bundled.

  • Dropping official 3.1 and 3.2 support.

0.3.8 (2013-07-30)

  • Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).

  • Add text.taggers module which allows user to change which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).

  • NPExtractor and Tagger objects can be passed to TextBlob’s constructor.

  • Fix bug with POS-tagger not tagging one-letter words.

  • Rename text/np_extractor.py -> text/np_extractors.py

  • Add run_tests.py script.

0.3.7 (2013-07-28)

  • Every word in a Blob or Sentence is a Word instance which has methods for inflection, e.g word.pluralize() and word.singularize().

  • Updated the np_extractor module. Now has an new implementation, ConllExtractor that uses the Conll2000 chunking corpus. Only works on Py2.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

textblob-0.6.0.tar.gz (1.8 MB view details)

Uploaded Source

Built Distribution

textblob-0.6.0-py2.py3-none-any.whl (1.5 MB view details)

Uploaded Python 2 Python 3

File details

Details for the file textblob-0.6.0.tar.gz.

File metadata

  • Download URL: textblob-0.6.0.tar.gz
  • Upload date:
  • Size: 1.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for textblob-0.6.0.tar.gz
Algorithm Hash digest
SHA256 bdb78f358721520d91a222c84bb1a9f1e4e88a9f58748521ede4e7279cac1b97
MD5 d986a813c9746c815742b3073cc32872
BLAKE2b-256 45e8dac0701597b9104ad91c991b0c414cd5f6f50ac4a564f2b1f5a81869929a

See more details on using hashes here.

File details

Details for the file textblob-0.6.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for textblob-0.6.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 a446aac630ec3c2273e44cc5ca7b31709425022b461e8560b4be7300d688b23d
MD5 7fc30420b714d3c5617b001e194d8de9
BLAKE2b-256 ca150364db2b9e4a030ef887a069046df1811d8851e4f02051a8db6a7d0c72d1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page