Fast and Customizable Tokenizers

Reason this release was yanked:

Breaking change with unexpected effect

Project description





Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

This package provides Python bindings over the Rust implementation. If you are interested in the high-level design, you can find it in the Rust implementation.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignment tracking: it is always possible to get the part of the original sentence that corresponds to a given token.
  • Handles all the pre-processing: truncation, padding, and adding the special tokens your model needs.
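As a rough illustration of the alignment tracking and pre-processing features listed above, the sketch below builds a toy word-level tokenizer entirely in memory. The vocabulary and sentences are made up for the example; `normalizers.Lowercase`, `pre_tokenizers.Whitespace`, `enable_padding` and `enable_truncation` are part of the `tokenizers` API, but check the documentation of your installed version for the exact parameters.

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers

# Toy vocabulary, hypothetical and for illustration only
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3, "again": 4}
tokenizer = Tokenizer(models.WordLevel(vocab, unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Lowercase()
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Alignment tracking: offsets point back into the original, un-lowercased text
text = "Hello WORLD"
encoding = tokenizer.encode(text)
print(encoding.tokens)  # normalized tokens
spans = [text[start:end] for start, end in encoding.offsets]
print(spans)  # the original substrings each token came from

# Padding: pad every sequence of a batch to the longest one
tokenizer.enable_padding(pad_id=vocab["[PAD]"], pad_token="[PAD]")
batch = tokenizer.encode_batch(["hello world", "hello"])
for enc in batch:
    print(enc.ids, enc.attention_mask)

# Truncation: cap sequences at a maximum number of tokens
tokenizer.enable_truncation(max_length=2)
print(tokenizer.encode("hello world again").ids)
```

Even though the tokens are lowercased by the normalizer, slicing the input with the offsets gives back `"Hello"` and `"WORLD"`.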

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings as follows:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install
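Whichever installation route you take, a quick sanity check is to import the package and run a tiny tokenizer built in memory. The two-entry vocabulary below is purely illustrative; with no pre-tokenizer set, the whole input string is looked up as a single word.

```python
import tokenizers
from tokenizers import Tokenizer, models

# The installed version string, useful to confirm which build is active
print(tokenizers.__version__)

# A trivial word-level model: running it confirms the compiled Rust extension loads
tokenizer = Tokenizer(models.WordLevel({"hello": 0, "[UNK]": 1}, unk_token="[UNK]"))
print(tokenizer.encode("hello").ids)    # known word
print(tokenizer.encode("goodbye").ids)  # unknown word falls back to [UNK]
```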

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of them from a vocab.json file and a merges.txt file:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")
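If you want to try the train/save/load cycle without preparing files, the sketch below trains a CharBPETokenizer on a small in-memory corpus with `train_from_iterator`, then reloads it two ways: from the single JSON file written by `save`, and from the vocab.json/merges.txt pair written by `save_model` (the format loaded in the first example). The corpus is invented, and we assume a recent `tokenizers` version where `train_from_iterator`, `save_model` and `CharBPETokenizer.from_file` are available.

```python
import os
import tempfile

from tokenizers import CharBPETokenizer, Tokenizer

# A tiny in-memory corpus stands in for the training files (illustrative data)
corpus = [
    "I can feel the magic, can you?",
    "The magic of tokenization is in the merges.",
] * 50

tokenizer = CharBPETokenizer()
# train_from_iterator avoids writing the corpus to disk first
tokenizer.train_from_iterator(corpus, vocab_size=200, min_frequency=2, show_progress=False)

text = "I can feel the magic, can you?"
encoded = tokenizer.encode(text)

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: one JSON file holding the whole pipeline
    json_path = os.path.join(tmp, "my-bpe.tokenizer.json")
    tokenizer.save(json_path)
    from_json = Tokenizer.from_file(json_path)

    # Option 2: only the model files, vocab.json and merges.txt
    tokenizer.save_model(tmp)
    from_files = CharBPETokenizer.from_file(
        os.path.join(tmp, "vocab.json"), os.path.join(tmp, "merges.txt")
    )

# Both reloaded tokenizers should produce the same ids as the original
print(from_json.encode(text).ids == encoded.ids)
print(from_files.encode(text).ids == encoded.ids)
```

The JSON file is usually the more convenient choice, since it also stores the normalizer, pre-tokenizer and decoder, while the vocab/merges pair only stores the model.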

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous Bert tokenizer, using WordPiece

All of these can be used and trained as explained above!
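To give an intuition of what "training" means for the BPE variants above, here is a simplified, pure-Python sketch of the classic BPE merge-learning loop: repeatedly count adjacent symbol pairs and merge the most frequent one. This is an illustration, not the library's Rust implementation; the tie-breaking rule (lexicographically smallest pair) and the toy word counts are our own choices. As with CharBPETokenizer's `</w>` suffix, an end-of-word marker is glued to the last character of each word.

```python
from collections import Counter


def to_symbols(word, suffix="</w>"):
    """Split a non-empty word into characters, marking the last one as end-of-word."""
    return tuple(word[:-1]) + (word[-1] + suffix,)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dictionary."""
    words = {to_symbols(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        # Most frequent pair; ties broken by the lexicographically smallest pair
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        words = {merge_pair(symbols, best): count for symbols, count in words.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = to_symbols(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=4)
print(merges)
print(apply_bpe("lowest", merges))
```

Here the unseen word "lowest" is segmented into `lo`, `w` and `est</w>` using merges learned from other words, which is what lets BPE handle words that never appeared in the training data.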

Build your own

Whenever the provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together all the different parts you need. You can check how we implemented the provided tokenizers and easily adapt them to your own needs.
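As a sketch of how the parts fit together, the example below assembles a BERT-style pipeline: a WordPiece model, a BERT normalizer and pre-tokenizer, a WordPiece decoder, and a `TemplateProcessing` post-processor that adds the `[CLS]`/`[SEP]` special tokens and sets the type ids of a sentence pair. The corpus, vocabulary size and special-token list are illustrative assumptions, not the exact configuration of BertWordPieceTokenizer.

```python
from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, processors, trainers

special_tokens = ["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]

# Model: WordPiece, with "##" marking word continuations (the default prefix)
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
# BERT-style normalization and pre-tokenization
tokenizer.normalizer = normalizers.BertNormalizer(lowercase=True)
tokenizer.pre_tokenizer = pre_tokenizers.BertPreTokenizer()
tokenizer.decoder = decoders.WordPiece()

# Illustrative in-memory corpus (replace with your own files or iterator)
corpus = ["I can feel the magic, can you?", "Hello world, hello tokenizers!"] * 50
trainer = trainers.WordPieceTrainer(vocab_size=200, special_tokens=special_tokens)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Post-processing: wrap inputs with [CLS]/[SEP] and give the second segment type id 1
cls_id = tokenizer.token_to_id("[CLS]")
sep_id = tokenizer.token_to_id("[SEP]")
tokenizer.post_processor = processors.TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", cls_id), ("[SEP]", sep_id)],
)

encoding = tokenizer.encode("I can feel the magic", "can you?")
print(encoding.tokens)
print(encoding.type_ids)
```

Looking the special-token ids up with `token_to_id` after training, rather than hard-coding them, keeps the template consistent with whatever vocabulary the trainer produced.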

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And Save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)
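Passing `pre_tokenizers.ByteLevel.alphabet()` as `initial_alphabet` works because a byte-level tokenizer maps each of the 256 possible byte values to a visible character, so any input can be represented without an unknown token. The pure-Python sketch below builds such a table in the style of GPT-2's byte-to-unicode mapping; we assume, without checking the library source here, that the ByteLevel pre-tokenizer uses an equivalent table.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable Unicode character (GPT-2 style)."""
    # Printable Latin-1 bytes map to themselves
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes (control characters, space, ...) are shifted to code points 256 and up
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


BYTE_TO_CHAR = bytes_to_unicode()


def to_byte_level(text):
    """Represent text as one visible character per UTF-8 byte."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


print(len(BYTE_TO_CHAR))      # 256 symbols: the full initial alphabet
print(to_byte_level(" can"))  # the leading space becomes a visible "Ġ"
print(to_byte_level("é"))     # a 2-byte UTF-8 character becomes 2 symbols
```

This is also why byte-level BPE vocabularies contain tokens such as `Ġcan`: the `Ġ` is simply the visible stand-in for the space byte.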

Now, using this tokenizer is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
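For a version of the byte-level BPE example you can run without dataset files, the sketch below trains the same pipeline on a small in-memory corpus with `train_from_iterator`, serializes it to a JSON string and back with `to_str`/`from_str` (the string counterparts of `save`/`from_file`), and decodes the ids. The corpus and vocabulary size are illustrative. Because `add_prefix_space=True` inserts a space before the text, the decoded string may keep that leading space, hence the `strip()` in the checks.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, processors, trainers

# Same pipeline as above, trained on a small in-memory corpus instead of files
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

trainer = trainers.BpeTrainer(
    vocab_size=300,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
    show_progress=False,
)
corpus = ["I can feel the magic, can you?", "Can you feel it too?"] * 50  # illustrative data
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Serialize to a JSON string and load it back
restored = Tokenizer.from_str(tokenizer.to_str())

text = "I can feel the magic, can you?"
encoded = restored.encode(text)
print(encoded.tokens)
decoded = restored.decode(encoded.ids)
print(decoded)

# The 256-symbol alphabet covers every byte, so characters unseen in training still round-trip
sparkles = restored.encode("magic ✨")
print(restored.decode(sparkles.ids))
```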

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • tokenizers-0.12.0.tar.gz (220.9 kB): Source

Built Distributions

  • tokenizers-0.12.0-cp310-cp310-win_amd64.whl (3.2 MB): CPython 3.10, Windows x86-64
  • tokenizers-0.12.0-cp310-cp310-win32.whl (3.0 MB): CPython 3.10, Windows x86
  • tokenizers-0.12.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.6 MB): CPython 3.10, manylinux: glibc 2.17+ s390x
  • tokenizers-0.12.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (7.9 MB): CPython 3.10, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.12.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.8 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.12.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB): CPython 3.10, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.12.0-cp310-cp310-macosx_10_11_x86_64.whl (3.6 MB): CPython 3.10, macOS 10.11+ x86-64
  • tokenizers-0.12.0-cp39-cp39-win_amd64.whl (3.2 MB): CPython 3.9, Windows x86-64
  • tokenizers-0.12.0-cp39-cp39-win32.whl (3.0 MB): CPython 3.9, Windows x86
  • tokenizers-0.12.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.6 MB): CPython 3.9, manylinux: glibc 2.17+ s390x
  • tokenizers-0.12.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (7.9 MB): CPython 3.9, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.12.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.8 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.12.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB): CPython 3.9, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.12.0-cp39-cp39-macosx_10_11_x86_64.whl (3.6 MB): CPython 3.9, macOS 10.11+ x86-64
  • tokenizers-0.12.0-cp38-cp38-win_amd64.whl (3.2 MB): CPython 3.8, Windows x86-64
  • tokenizers-0.12.0-cp38-cp38-win32.whl (3.0 MB): CPython 3.8, Windows x86
  • tokenizers-0.12.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.6 MB): CPython 3.8, manylinux: glibc 2.17+ s390x
  • tokenizers-0.12.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (7.9 MB): CPython 3.8, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.12.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.8 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.12.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB): CPython 3.8, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.12.0-cp38-cp38-macosx_10_11_x86_64.whl (3.6 MB): CPython 3.8, macOS 10.11+ x86-64
  • tokenizers-0.12.0-cp37-cp37m-win_amd64.whl (3.2 MB): CPython 3.7m, Windows x86-64
  • tokenizers-0.12.0-cp37-cp37m-win32.whl (3.0 MB): CPython 3.7m, Windows x86
  • tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.6 MB): CPython 3.7m, manylinux: glibc 2.17+ s390x
  • tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (7.9 MB): CPython 3.7m, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.8 MB): CPython 3.7m, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.12.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB): CPython 3.7m, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.12.0-cp37-cp37m-macosx_10_11_x86_64.whl (3.6 MB): CPython 3.7m, macOS 10.11+ x86-64
  • tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.6 MB): CPython 3.6m, manylinux: glibc 2.17+ s390x
  • tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (7.9 MB): CPython 3.6m, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.8 MB): CPython 3.6m, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.12.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB): CPython 3.6m, manylinux: glibc 2.12+ x86-64

File details

Details for the file tokenizers-0.12.0.tar.gz.

File metadata

  • Download URL: tokenizers-0.12.0.tar.gz
  • Upload date:
  • Size: 220.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0.tar.gz
Algorithm Hash digest
SHA256 81e2b69005d717c2856995979fe1f17646eec8e1bb974852a10b2195b3f7a64e
MD5 07f6728e88f5eb8cb2db7986975df226
BLAKE2b-256 25393db82f83e42e95e6949d2f196ee08bb6f3e651d8eced80f1d2dde9fbbdc0

See more details on using hashes here.
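To check a downloaded file against the SHA256 digests listed in this section, Python's standard `hashlib` module is enough. The helper names below are our own; pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.lower()


# Digest published above for the source distribution
EXPECTED = "81e2b69005d717c2856995979fe1f17646eec8e1bb974852a10b2195b3f7a64e"
# Usage, assuming the archive was downloaded to the current directory:
# print(verify("tokenizers-0.12.0.tar.gz", EXPECTED))

# Self-check on a throwaway file: the helper agrees with hashing the bytes directly
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"tokenizers")
demo_ok = verify(tmp.name, hashlib.sha256(b"tokenizers").hexdigest())
os.remove(tmp.name)
print(demo_ok)
```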

File details

Details for the file tokenizers-0.12.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 437976576e8fddafad81a29c2db31c958b3c02f4b87db725869099d235942f0d
MD5 d2e377e4f069c8a6219a36783bf67bfc
BLAKE2b-256 75358e723904f4c929cf6b4afe359d2f46fbd7363211f8655e27c09440aeac06

File details

Details for the file tokenizers-0.12.0-cp310-cp310-win32.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp310-cp310-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.10, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 a066a28157202c33c9791957215ac812f3fe2071659ad5cb2d468ad2d8d17bcf
MD5 1cfd071cc744c96a46bbf74ef4970515
BLAKE2b-256 466311937e015c85ab5fb86d6648a9b352dd7c821efb7c70456fe47ce9aff851

File details

Details for the file tokenizers-0.12.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 93a6515ee713a6291904d9dffb640ed1aeede35f0aee174d101d1bd3f71fa680
MD5 8aca74fda5f99c3ba885edb38daf3a38
BLAKE2b-256 9c12a637b6ad1cdbb9d7c69d9f388103588e70d98bce195b3b033aa45995d7b8

File details

Details for the file tokenizers-0.12.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 241e677a567044abc131a1a6375498057e8e0a5c1a03ef7d411591fdfe5cc6a5
MD5 ba28c483bf8226a6a7957c0324f02862
BLAKE2b-256 456b930d5c9edf0c737da822898c1c92e0674bc2b3d1f6707e0fe8ae05c7e5c7

File details

Details for the file tokenizers-0.12.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 45f5524951564ff9f0132a9905e15e827bbe4f1610b480d1150bf700fae91305
MD5 27aca34ead1f33524fb21ba4a8028446
BLAKE2b-256 ef9fb8fa39a28db1145f5847900f83e47950e8658737808dde331293a678ca32

File details

Details for the file tokenizers-0.12.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: CPython 3.10, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 e404cb3e96a83d1f86834a0a5d80a7cdcc42d3bf7082999405ae7cf435a39c07
MD5 f4a2605efb16b7387646d1564b477f58
BLAKE2b-256 bb5880446e8a5014db06a3fc22c570dc7602c9215d1ddf78cd1b1e295a6be07a

File details

Details for the file tokenizers-0.12.0-cp310-cp310-macosx_10_11_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp310-cp310-macosx_10_11_x86_64.whl
  • Upload date:
  • Size: 3.6 MB
  • Tags: CPython 3.10, macOS 10.11+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp310-cp310-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 e0f3129c0a35d10834457a81b994e42ec67b77f19d5d075821b68720dd4eeb03
MD5 e3470140149f1ad4331b99937759f226
BLAKE2b-256 a319895ed398028dc39bd33128d1f2b2af519fc97f5ca276113a35bba3e3fc92

File details

Details for the file tokenizers-0.12.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 30b5e9f64cd2bb4ada58788be5e47e50901c4a239056b0254b666b863b01e16d
MD5 80944129764ad2d5d4acc548747b392e
BLAKE2b-256 0e057e133847e144abfaa57ef2780eb5fd096c3ff57e84285e4731eb35f0d915

File details

Details for the file tokenizers-0.12.0-cp39-cp39-win32.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp39-cp39-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 7335c702a52d8fdc4c2a907ffa155aabe544d6870284cf5855de75214d6f04e5
MD5 d3f716e7b04465ac3b00db7e1472ba3e
BLAKE2b-256 e05f80972427ac375f6d4d758f268a31f829df2adbe389fd59e9349e841f7f3f

File details

Details for the file tokenizers-0.12.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 34ef31e39bd10cb019b704278af11c7af5286fd3754b6925aca0bf31fd0779d2
MD5 f2a24f4b2274bf983d1f8b2bbc4a4350
BLAKE2b-256 07ff15b45103a5adf4d8d61b589d575f463f5496fad391a64175b2c4d6587722

File details

Details for the file tokenizers-0.12.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 ac3b2185d0062cb03f031fd34998aecb430ece9d4815ef0fefe9c6947c2b70ee
MD5 7d37b573c754ebab0fd83fe663fc0159
BLAKE2b-256 eb397f603f1a01c65da902fe33cf0b771860324630a2e0f854ccca9164777581

File details

Details for the file tokenizers-0.12.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b5ccf61acc517804a6ef65cee7ada09f6f408aa43cd4c0053bd39059c7d7a352
MD5 751b01fa614508511271746459fa8332
BLAKE2b-256 863e0f4833ddcd3460f12c67f9b19120c79e285c56a6f24e3ddda8674fbb1519

File details

Details for the file tokenizers-0.12.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 72a9c1cdc7109f2a412d1edf1ce477590700bc74855d9377f81fa8cb9503cd74
MD5 040470af1d1df94a9c4fcb76412680ae
BLAKE2b-256 a03955d61a2edcfa4620474f2fdee76c9937958324b933f4a8cad424f2c8869a

File details

Details for the file tokenizers-0.12.0-cp39-cp39-macosx_10_11_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp39-cp39-macosx_10_11_x86_64.whl
  • Upload date:
  • Size: 3.6 MB
  • Tags: CPython 3.9, macOS 10.11+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp39-cp39-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 67a41b74a3c726b20e7c8d10af2866365d441d88cb0d36013d2524c931b95b22
MD5 53d2155e4bcc6a887476d5b0740d1f06
BLAKE2b-256 56640559a231c14934a543e561916f2aa40e41586aa39146ccdfdc3ab2b9a44e

File details

Details for the file tokenizers-0.12.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 b08d09cf3f05d12039338dcb8bba04a892efd20b30bcd60dd8fd563decccef57
MD5 d2113bd5d71e44ca566d0484239a5ad5
BLAKE2b-256 0f4c21ac777cfcf4a127ab4b63738d8df25bf4c158aa41e886d6d9383f28794d

File details

Details for the file tokenizers-0.12.0-cp38-cp38-win32.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp38-cp38-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.8, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 8c0a0d3c08c15748ab0d9816a1563878c5a6501fac44b2c84a24ec1a7d2cacad
MD5 1fccc4ad365ca386dd33bf19edb29cf2
BLAKE2b-256 1ecc2b5d2fe1f386f88870097b9dbed2c6593dee212a2956ee0b055f90006ac3

File details

Details for the file tokenizers-0.12.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 9a5e1719a9bb5e20f87c1f3f010ddf33f00d3c76141d0d5fce144cb9798755a6
MD5 bbed6d679f45756c08cf2d406723a303
BLAKE2b-256 ca7e31420d97908e1ee1b1022841821a5739f8d2bfbed4240e21ebdab8167bcb

File details

Details for the file tokenizers-0.12.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 8f88c8f9a7b50fdf02e339aacd6757de72c4570d7a05ba070889011fa9ae4b65
MD5 f00d58d234878498fd63ddef5de81ca1
BLAKE2b-256 e19525ff6991989925330c0cdd59ee7386711ffeebf2847b6c7c73185aa44844

File details

Details for the file tokenizers-0.12.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 628c0cb51ca3f8cf2b809b7d340fb0fb1a91bfd57f0504d3f890101e058eaa1a
MD5 873753e69c40599f156c7cb24244f470
BLAKE2b-256 ec4dd720632f66cf044f86081d83570c597e35576fdc9e545cd9cdecdcd2f92b

File details

Details for the file tokenizers-0.12.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3f5c527b225b0a059dabc8f1d41796d709832373f0fd15cb433d78d54a980a46
MD5 e91d9b0d638df278b17c2b722ad1e0b1
BLAKE2b-256 f032d66b166066dc64732865e631d35004c49fa0fb987179fae10a04055685a8

File details

Details for the file tokenizers-0.12.0-cp38-cp38-macosx_10_11_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp38-cp38-macosx_10_11_x86_64.whl
  • Upload date:
  • Size: 3.6 MB
  • Tags: CPython 3.8, macOS 10.11+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp38-cp38-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 6364677ff1bac939e5f22f5d417cc608e9cf9818ff18c66cbd8f2446bfeeab94
MD5 cd819b4b2a972e1f309b4ef46b44d620
BLAKE2b-256 fbeedf31f5cc56c3751506e7d414e7041c5cceb69d1ed0aebdb2ea75a20cfae5

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 3.2 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 1f08da1d57874545b45a117e882d0256fd9de761433c35205f7e4ab841e43134
MD5 2b6aa429a287df613883fc3a7b2f1bee
BLAKE2b-256 f89f51c2b85330c1aca7c738184aee5c70dbe0c3b9f0acb6e1a33cd9bd88286f

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-win32.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp37-cp37m-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 10d1d380ccac80b3e4d980daeb4fa99231d1ee97c48cff41ef7c2f589877758b
MD5 d7c87c30ab2f6873e8fc94baeb7499df
BLAKE2b-256 cdf09be6c2c409bd01aaddbc1fd86d17f89737ed332b04542a7e26047403cf51

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 36b2fccdf3c631ed509b54e8e766fcfacbba0ad3edd08d94e8e010aed3af59d6
MD5 a6f4c850985971689b5b49dc324969d1
BLAKE2b-256 2ccfcc7476fb6c798da541413da6849a08b45ddcdc4dc61fc4d710b0b032d560

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 89ea69799ef1eb417b89cd3be66545e437c9d451110df6d4711360f6b6f57257
MD5 7c834381be06deb62d4956ee948bcf86
BLAKE2b-256 18cb5dc7a83e16d28ad48b4546734abdd2c011f71c5b36e7aaeb61f6f37beb44

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 3c6d80b2526acb2e294a62af32a54cfaab3ff7b68ca887b5d7d40b1cd1579348
MD5 756b9b181faad4b7f09d831155f27658
BLAKE2b-256 01bfd28f00aa6b7955fa5434a1b40ad0663f1f639e3dbe02c624ffb8f195b530

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f01db8dbc5f1a6674d7bcb89e29da8bc8fc8b9d0df00a4967b1b2e7399e5e1f3
MD5 de42f0e388f33932ac95cc3b6930de9d
BLAKE2b-256 9321c718a87ac8f97b6dded2306c341ac3f941d678913aeb0ae3630359b96701

File details

Details for the file tokenizers-0.12.0-cp37-cp37m-macosx_10_11_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp37-cp37m-macosx_10_11_x86_64.whl
  • Upload date:
  • Size: 3.6 MB
  • Tags: CPython 3.7m, macOS 10.11+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp37-cp37m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 28ae4b10e7a9ab05616b5d4de1dc69d4e3ac974a5937e128db783b4aa859ad14
MD5 4c2530eda11b767d39cc1cbb3ad1f2bb
BLAKE2b-256 0447dbd4b31891dbc3012c6f47219eeb10e6ae23f0f3d70c68769860639088a1

File details

Details for the file tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 fe855f9e07f4a54f70d1262c1fd5e794353bf518593379528b0ceb28bc9b297f
MD5 78cdbd0e9f4e3a3733e21dcbc533d742
BLAKE2b-256 0f8a0b355c3a5934b7015e077534e1b1fdacf201e2c471d0b7bb7d75663139d8

File details

Details for the file tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 9f809d6bf7eeaaf2d48e47e2d462d71837876c6a1cf3fbe209377f7a02c98741
MD5 9f401febe0b35eb574da13b64a3a84fd
BLAKE2b-256 9bb70492109f7bf7e507061a2aa8cd90a866056c85e06066e73d600555a85be8

File details

Details for the file tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.12.0-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 4e1106734d0189c97554ab222c72d6e72fa7bb2cf819422a2d6096045fbf3cfd
MD5 a1b2000610b3f7b29287d57f38805ae3
BLAKE2b-256 a790367118b3d8e155fb3a1fe36215b16b75c6932d54a6adfdea14f4234ed925

File details

Details for the file tokenizers-0.12.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

  • Download URL: tokenizers-0.12.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
  • Upload date:
  • Size: 6.6 MB
  • Tags: CPython 3.6m, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.10.2

File hashes

Hashes for tokenizers-0.12.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 97943d09d8062974058ebc5abad9dc0b059501583dd70a92fe77537873fa77d2
MD5 cb7a6eb081d474a785fc9d0f4f8a37c9
BLAKE2b-256 db075125318be83d03c311125412da13988e422b0c0255c08621637614c40b86
