Treebank tokenizer for English
Penn Treebank tokenizer
This is a simple fork of the famous Penn Treebank tokenizer. It is forked from DetectorMorse via NLTK.
- It is appropriate for English, but not other languages.
- It is appropriate when applied one sentence at a time, but should not be applied to paragraphs or documents.
Unlike the NLTK equivalent, it has no (library or data) dependencies except the built-in re module. Unlike the NLTK equivalent, it is not hostilely polymorphic.
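To make the two points above concrete, here is a minimal sketch of a Treebank-style tokenizer that uses nothing but the built-in re module. It is an illustration only: the rule list is a small, simplified subset chosen for this example, and the names `_RULES` and `tokenize` are hypothetical, not this package's actual API or rule set.

```python
import re

# Illustrative subset of Treebank-style rewrite rules (an assumption for this
# sketch, not the package's real rule list). Each rule inserts spaces so that
# a final str.split() yields the tokens.
_RULES = [
    (re.compile(r'^"'), '`` '),                      # opening quote at sentence start
    (re.compile(r'([ (\[{<])"'), r'\1 `` '),         # opening quote after space or bracket
    (re.compile(r'"'), " '' "),                      # any remaining quote is a closing quote
    (re.compile(r'([,;:?!()\[\]{}])'), r' \1 '),     # separate common punctuation
    (re.compile(r"(\w)(n't|'s|'m|'d|'ll|'re|'ve)\b", re.IGNORECASE), r'\1 \2'),  # clitics
    (re.compile(r'([^.\s])\.\s*$'), r'\1 .'),        # split only the sentence-final period
]


def tokenize(sentence):
    """Tokenize a single English sentence into a list of strings."""
    for pattern, replacement in _RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence.split()


print(tokenize("They can't go, it's late."))
print(tokenize('"Hello," she said.'))
```

In this sketch only the period at the very end of the input is split off, so passing two sentences at once leaves the first period attached to its word (`tokenize("It rained. We left.")` keeps `rained.` as one token). That is the kind of behavior that makes sentence-at-a-time input the appropriate unit for this style of tokenizer.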
Download files
Source Distribution
ptbtok-0.1.tar.gz (3.1 kB)
File details
Details for the file ptbtok-0.1.tar.gz.
File metadata
- Download URL: ptbtok-0.1.tar.gz
- Upload date:
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 33a7f1447dba8edad9a9fa89921dea44f0d2ccf3634dbc6a55b2383a2f69da6f
MD5 | b9a2758f0065060733bc2961a3b021c3
BLAKE2b-256 | c34cf7bfe412c409a2fb74b0726a6a7d540704c70c625eca1f5b7bcec663f414
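The digests listed above can be used to check a downloaded copy of the source distribution. The following is a small sketch using the standard-library hashlib module; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the file path passed in is assumed to point at your local download.

```python
import hashlib

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "33a7f1447dba8edad9a9fa89921dea44f0d2ccf3634dbc6a55b2383a2f69da6f"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected
```

For example, `matches_published_digest("ptbtok-0.1.tar.gz")` should return True for an unmodified download, assuming the file sits in the current directory.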