Treebank tokenizer for English

Project description

Penn Treebank tokenizer

This is a simple fork of the famous Penn Treebank tokenizer. It is forked from DetectorMorse via NLTK.

  • It is appropriate for English, but not other languages.
  • It is appropriate when applied one sentence at a time, but should not be applied to paragraphs or documents.

Unlike the NLTK equivalent, it has no library or data dependencies beyond the built-in re module, and it is not hostilely polymorphic.
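To make the points above concrete, here is a minimal sketch of Treebank-style tokenization that uses only re. It is not ptbtok's actual code or API: the function name tokenize_sketch and the small rule list are assumptions chosen for illustration, and the real rule set is larger. The sketch also shows why one-sentence-at-a-time input matters, since in this style of tokenizer a period is split off only at the very end of the string.

```python
import re
from typing import List

# Illustrative subset of Penn Treebank-style rules. These are assumptions made
# for this sketch, not the rule set or API that ptbtok actually ships.
_RULES = [
    # Pad common punctuation with spaces so it becomes its own token.
    (re.compile(r'([,;:?!()\[\]{}"])'), r" \1 "),
    # Split a period only at the very end of the string.
    (re.compile(r"\.\s*$"), " ."),
    # Split the negative contraction: "don't" -> "do n't".
    (re.compile(r"(\w)(n't)\b", re.IGNORECASE), r"\1 \2"),
    # Split common clitics: "She's" -> "She 's".
    (re.compile(r"(\w)('(?:s|m|d|re|ve|ll))\b", re.IGNORECASE), r"\1 \2"),
]


def tokenize_sketch(sentence: str) -> List[str]:
    """Apply each rewrite rule in order, then split on whitespace."""
    for pattern, replacement in _RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence.split()


if __name__ == "__main__":
    print(tokenize_sketch("She's sure it won't rain."))
    # ['She', "'s", 'sure', 'it', 'wo', "n't", 'rain', '.']

    # Two sentences in one call: only the final period is split off,
    # so "rained." stays glued together.
    print(tokenize_sketch("It rained. We left."))
    # ['It', 'rained.', 'We', 'left', '.']
```

The second call shows the failure mode to expect when a paragraph is passed in whole: sentence-internal periods stay attached to the preceding word, so text should be split into sentences first.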


This version

0.1

Download files

Source Distribution

ptbtok-0.1.tar.gz (3.1 kB view details)

Uploaded Source

File details

Details for the file ptbtok-0.1.tar.gz.

File metadata

  • Download URL: ptbtok-0.1.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for ptbtok-0.1.tar.gz
Algorithm    Hash digest
SHA256       33a7f1447dba8edad9a9fa89921dea44f0d2ccf3634dbc6a55b2383a2f69da6f
MD5          b9a2758f0065060733bc2961a3b021c3
BLAKE2b-256  c34cf7bfe412c409a2fb74b0726a6a7d540704c70c625eca1f5b7bcec663f414
