
Fast and Customizable Tokenizers



Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

These are Python bindings over the Rust implementation. If you are interested in the high-level design, you can check it there.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignment tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
  • Does all the pre-processing: truncate, pad, and add the special tokens your model needs.
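
To make the alignment-tracking idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function names `normalize_with_alignment` and `original_span` are hypothetical, and the normalization used (lowercasing plus accent stripping) is only an assumed example. The point is that each normalized character remembers the index of the original character it came from, so a span in the normalized text can be mapped back.

```python
import unicodedata


def normalize_with_alignment(text):
    """Lowercase and strip accents, recording for every output character
    the index of the original character it was derived from."""
    normalized_chars = []
    alignment = []
    for index, char in enumerate(text):
        # NFD splits an accented letter into base letter + combining marks.
        for piece in unicodedata.normalize("NFD", char):
            if unicodedata.combining(piece):
                continue  # drop the accent itself
            for lowered in piece.lower():
                normalized_chars.append(lowered)
                alignment.append(index)
    return "".join(normalized_chars), alignment


def original_span(text, alignment, start, end):
    """Map the half-open span [start, end) of the normalized text back to
    the matching slice of the original text."""
    first = alignment[start]
    last = alignment[end - 1]
    return text[first:last + 1]


original = "H\u00e9llo W\u00f6rld"  # "Héllo Wörld"
normalized, alignment = normalize_with_alignment(original)

# Pretend a tokenizer produced the token "world" from the normalized text.
token_start = normalized.index("world")
span = original_span(original, alignment, token_start, token_start + len("world"))
print(normalized)
print(span)
```

Storing one original index per normalized character is the simplest possible layout; it costs memory proportional to the text length but keeps the lookup trivial.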

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings as follows:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these using some vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")
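
For readers who want to see what training and encoding mean for a BPE model, the following is a minimal sketch of the classic merge-learning loop in plain Python. It is an illustration under stated assumptions, not the library's code: the helper names (`split_word`, `merge_pair`, `train_bpe`, `encode_word`) are hypothetical, the `</w>` end-of-word marker is assumed, and words are split on whitespace only.

```python
from collections import Counter

END_OF_WORD = "</w>"  # assumed end-of-word marker for this sketch


def split_word(word):
    """Split a non-empty word into characters, marking the last one as word-final."""
    return tuple(word[:-1]) + (word[-1] + END_OF_WORD,)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with the joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    words = Counter(split_word(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties fall back to the pair itself
        # so that repeated runs give the same result.
        best = max(pair_counts, key=lambda pair: (pair_counts[pair], pair))
        merges.append(best)
        new_words = Counter()
        for symbols, count in words.items():
            new_words[merge_pair(symbols, best)] += count
        words = new_words
    return merges


def encode_word(word, merges):
    """Apply learned merges to one non-empty word, earliest-learned merge first."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = split_word(word)
    while len(symbols) > 1:
        candidates = [pair for pair in zip(symbols, symbols[1:]) if pair in ranks]
        if not candidates:
            break
        symbols = merge_pair(symbols, min(candidates, key=ranks.get))
    return list(symbols)


merges = train_bpe("ab ab ab ac", num_merges=5)
print(merges)
print(encode_word("ab", merges))
print(encode_word("ad", merges))
```

The sketch recounts all pairs on every iteration, which is easy to follow but slow on real corpora.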

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous Bert tokenizer, using WordPiece

All of these can be used and trained as explained above!
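
WordPiece differs from BPE at encoding time: rather than replaying merges, it is usually described as a greedy longest-match-first lookup against the vocabulary. The sketch below illustrates that idea in plain Python. It is not the library's implementation; the function name `wordpiece_tokenize` and the toy vocabulary are hypothetical, and the `##` continuation prefix and `[UNK]` token are assumed from the usual BERT conventions.

```python
def wordpiece_tokenize(word, vocab, unknown_token="[UNK]", prefix="##"):
    """Greedily split `word` into the longest vocabulary pieces, left to right.

    Pieces that do not start the word carry the continuation prefix.
    If any position cannot be matched, the whole word becomes unknown.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unknown_token]
        tokens.append(match)
        start = end
    return tokens


# Toy vocabulary, made up for this example.
vocab = {"token", "##izer", "##s", "un", "##able"}

print(wordpiece_tokenize("tokenizers", vocab))
print(wordpiece_tokenize("xyz", vocab))
```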

Build your own

When the provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together the different parts you need. You can check how we implemented the provided tokenizers and adapt them to your own needs.

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)
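
The initial alphabet passed to the trainer above contains one printable character per possible byte value. As a rough illustration of how such an alphabet can be built, here is a plain-Python sketch of the byte-to-character table popularized by GPT-2. It assumes the library follows that same convention; the function name `bytes_to_unicode` is borrowed from the GPT-2 reference code and is not part of the `tokenizers` API.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a distinct printable character.

    Bytes that are already printable keep their own character; the
    remaining ones are shifted to code points starting at 256.
    """
    printable = (
        list(range(0x21, 0x7F))    # '!' to '~'
        + list(range(0xA1, 0xAD))  # inverted '!' to the not sign
        + list(range(0xAE, 0x100)) # registered sign to 'y' with diaeresis
    )
    byte_values = printable[:]
    code_points = printable[:]
    shifted = 0
    for byte in range(256):
        if byte not in printable:
            byte_values.append(byte)
            code_points.append(256 + shifted)
            shifted += 1
    return {byte: chr(code) for byte, code in zip(byte_values, code_points)}


mapping = bytes_to_unicode()

# Encode a string to UTF-8 bytes, then to the printable alphabet.
# The leading space mimics what a prefix-space option would add.
visible = "".join(mapping[byte] for byte in " magic".encode("utf-8"))
print(len(mapping))
print(visible)
```

Working on bytes means any input can be represented with a base alphabet of 256 symbols, so no unknown token is needed at this level.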

Now, when you want to use this tokenizer, it is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")

