
Fast and Customizable Tokenizers

Project description





Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

This package provides bindings over the Rust implementation. If you are interested in the high-level design, you can check it out in the main tokenizers repository.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignment tracking: it's always possible to get the part of the original sentence that corresponds to a given token.
  • Does all the pre-processing: truncate, pad, and add the special tokens your model needs (see the sketch after this list).
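
For instance, alignment tracking and the built-in truncation/padding can be exercised in just a few lines. This is a minimal sketch, assuming network access to the Hugging Face Hub to fetch the bert-base-cased tokenizer (introduced in more detail below):

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")

# Truncation and padding are applied automatically during encoding
tokenizer.enable_truncation(max_length=16)
tokenizer.enable_padding(length=16)

sentence = "I can feel the magic, can you?"
encoded = tokenizer.encode(sentence)

# Each token carries (start, end) offsets into the original sentence
for token, (start, end) in zip(encoded.tokens, encoded.offsets):
    print(token, "->", repr(sentence[start:end]))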

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile by running the following:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")
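
As a quick follow-up (assuming the download above succeeded), the loaded tokenizer can encode single sentences as well as whole batches:

# Encode a single sentence
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.tokens)

# Or encode several sentences at once
batch = tokenizer.encode_batch(["Hello, world!", "Tokenizers are fast."])
print([e.ids for e in batch])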

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these using its vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)
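
These implementation classes also ship with a matching decoder, so the ids can be turned back into text. A short sketch continuing the example above (still assuming the same vocab.json and merges.txt files):

# Map the ids back to a string (special tokens are skipped by default)
decoded = tokenizer.decode(encoded.ids)
print(decoded)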

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous Bert tokenizer, using WordPiece

All of these can be used and trained as explained above!
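
For example, BertWordPieceTokenizer follows the exact same train/encode/save pattern. A minimal sketch, with hypothetical training files:

from tokenizers import BertWordPieceTokenizer

# Initialize and train a WordPiece tokenizer the same way
tokenizer = BertWordPieceTokenizer()
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Use it, then save it
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.tokens)
tokenizer.save("./path/to/directory/my-wordpiece.tokenizer.json")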

Build your own

Whenever these provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together all the different parts you need. You can check how we implemented the provided tokenizers and easily adapt them to your own needs.

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And Save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)

Now, using this tokenizer is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
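
And since a ByteLevel decoder was attached above, the same tokenizer can reconstruct the original text from the ids. A brief continuation of the example:

print(encoded.tokens)

# The ByteLevel decoder restores the original text from the ids
print(tokenizer.decode(encoded.ids))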


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • tokenizers-0.13.0.tar.gz (358.7 kB): Source

Built Distributions

  • tokenizers-0.13.0-cp310-cp310-win_amd64.whl (3.3 MB): CPython 3.10, Windows x86-64
  • tokenizers-0.13.0-cp310-cp310-win32.whl (3.0 MB): CPython 3.10, Windows x86
  • tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB): CPython 3.10, manylinux: glibc 2.17+ s390x
  • tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB): CPython 3.10, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB): CPython 3.10, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl (3.6 MB): CPython 3.10, macOS 12.0+ ARM64
  • tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl (3.8 MB): CPython 3.10, macOS 10.11+ x86-64
  • tokenizers-0.13.0-cp39-cp39-win_amd64.whl (3.3 MB): CPython 3.9, Windows x86-64
  • tokenizers-0.13.0-cp39-cp39-win32.whl (3.0 MB): CPython 3.9, Windows x86
  • tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB): CPython 3.9, manylinux: glibc 2.17+ s390x
  • tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB): CPython 3.9, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB): CPython 3.9, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl (3.6 MB): CPython 3.9, macOS 12.0+ ARM64
  • tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl (3.8 MB): CPython 3.9, macOS 10.11+ x86-64
  • tokenizers-0.13.0-cp38-cp38-win_amd64.whl (3.3 MB): CPython 3.8, Windows x86-64
  • tokenizers-0.13.0-cp38-cp38-win32.whl (3.0 MB): CPython 3.8, Windows x86
  • tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB): CPython 3.8, manylinux: glibc 2.17+ s390x
  • tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB): CPython 3.8, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB): CPython 3.8, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl (3.8 MB): CPython 3.8, macOS 10.11+ x86-64
  • tokenizers-0.13.0-cp37-cp37m-win_amd64.whl (3.3 MB): CPython 3.7m, Windows x86-64
  • tokenizers-0.13.0-cp37-cp37m-win32.whl (3.0 MB): CPython 3.7m, Windows x86
  • tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB): CPython 3.7m, manylinux: glibc 2.17+ s390x
  • tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB): CPython 3.7m, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB): CPython 3.7m, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB): CPython 3.7m, manylinux: glibc 2.12+ x86-64
  • tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl (3.8 MB): CPython 3.7m, macOS 10.11+ x86-64

File details

Details for the file tokenizers-0.13.0.tar.gz.

File metadata

  • Download URL: tokenizers-0.13.0.tar.gz
  • Upload date:
  • Size: 358.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0.tar.gz
Algorithm Hash digest
SHA256 e49a137bd9321905bd37abcc77d34b7d9d6d11e09da3a901bd127e640be55985
MD5 d8d9ab2ea3358970b8c40eabdb1f0405
BLAKE2b-256 444b323787e105caddf5ace40c4007e0745abf97e00ef21554e268c6d266d64d

See more details on using hashes here.
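
The digests listed above can be checked against a downloaded archive before installing it. Below is a minimal sketch using Python's hashlib; the local file path is a hypothetical example:

import hashlib

# Hypothetical path to the downloaded source distribution
path = "tokenizers-0.13.0.tar.gz"

# SHA256 digest listed above for this file
expected = "e49a137bd9321905bd37abcc77d34b7d9d6d11e09da3a901bd127e640be55985"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "hash mismatch")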

File details

Details for the file tokenizers-0.13.0-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 5013e8822bfef5e339d086cff96f3d69eaeed58e471ef5dd71b323470a73cd64
MD5 2af114a7b1b6c3712f1d3c4025b2b253
BLAKE2b-256 49c1f7fc459b8d3d0742f64d6dee117421eb16c3353daa65aa26b2e1f04d5c99

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-win32.whl.

File metadata

  • Download URL: tokenizers-0.13.0-cp310-cp310-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.10, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 4ff01a4320dd47127fc52c242fcfa33b661f89b50ed4ad70436b06179f567f1e
MD5 d173123d3110d04557c2f9bc3c71ad5e
BLAKE2b-256 a68fdb9e8e2566ed4f48368f9181b125f48de15919581dac2564f1b9908399c9

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 ad3951e9584cd386111fd4edfdce70083f62a3e82f739347192c0c4af1288554
MD5 636ffbc118afac2be7924d6f456267e9
BLAKE2b-256 ae011be2be7a700b7a21dff4a4add07c6c2e489b28f11afa5aead41f65fd0d8a

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 db9a71168b0f12f1092752bc0c58c3d0e87e58262de3ab3424dfb052f1003def
MD5 30fe87bcc1b8b35634dcaf7fb3fb6c41
BLAKE2b-256 683587bd74adbef807368a57096af7414e4b44931a7849bfe163cd128937f32e

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 076659344b815464d4f2965fc90c52ebbb2df8d2dcf40ab044705cd15178c44b
MD5 122e21b52840d178d929de8cc88d734e
BLAKE2b-256 9cdc4b6a74c7205cb92c78948c2e8bb023d910de0c41974d8c122719c59fb3be

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 e2dbd5105adbf582fc992eaddc39b469f34f1216f9c79d5bbb0a3f2700fbc0ac
MD5 b6b2a462618926ec77e6b1ea662853c7
BLAKE2b-256 cc674c05eb8cbe8d20e52f5f47a9c591738d8cbc2a29e918813b7fcc431ec3db

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 cef0026587d0f911232b8d6d38da64ea8ea187139f2a9506efd0c422e5e49926
MD5 5c19e7832bd96e70a26bc3b041933677
BLAKE2b-256 1ec115067b1918e0d4573aead3f3ac89568514a57dd251d837917e5385a43204

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 a3c69203690744fe357b95a4fbeb8207b3a3b0d565fba07ccab5f62491f8b6f0
MD5 a0bd1e8c2432d268326615bdaf57cad3
BLAKE2b-256 9a325dfa7777886a710a7e92b71985e1b8efd4f48da435eef1dae8df826fece8

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 7dcaf96ec433b944f295902af0fe9aad6724b38f55544cbd81c30e0c4b401198
MD5 93d900bd697da1f4180345a72c81e07b
BLAKE2b-256 53b5e856f2d280f5a21db4c9cfa2498b3b26555e32515bc0518d6c2388d623d2

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-win32.whl.

File metadata

  • Download URL: tokenizers-0.13.0-cp39-cp39-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 74db42e50607a2a7b93beb37f6ddcbb989c895236cd472edccae0960a64c15b2
MD5 1cfe1425f4638cc89b40494f61b61087
BLAKE2b-256 a37f9bab774e840efa794290855d2487874f36b443009df5cf6faafa69ba1b8c

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 ed08aa1ed60de3ddbf88329b4567a77d72d012161c4df692ac4b5b2294933a6d
MD5 40c3c8065ed44eeaf0694a03a1131d07
BLAKE2b-256 5007727210ead354d5cfcf537244f22c173b5394605873b1f8db8d9c1cda6ede

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 0b17d9f269102241c63d4e46c4f64966fba0cc59a8c765ccba67e1873dd1f9d9
MD5 b0b68bdc221815b17273be8c81d27b01
BLAKE2b-256 d33a95f217de0d7f824ae6aabe0359dc2595d063cac65eb6aa4b5b4690fc377f

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5311b14d16a8dd8f68218e2df21fc5e230871fae9a866598addd94f0fb67daa7
MD5 81f7bad97645d22574a9eb13ea67f51a
BLAKE2b-256 5c68eb821731a4704c08f9cba80ab2f37fe87a0df07bbf72dff020ac0d68b68c

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 61d9697d79277cf56562e22ccda3de4b9f535e6e7e38b0b099e592bf953ad168
MD5 635c3c1a2ae0d1c9055c3adb7f0fcc45
BLAKE2b-256 4756b57fcc25a8aa4921b7201187563905a57d092b0af4998ad2aa038b7224fd

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl
Algorithm Hash digest
SHA256 bd437dc2d3cd847828ca437bcbd0656fe0d8b722a495f852b4625a24a666c275
MD5 779d329013013303c3468e5ca8b38b59
BLAKE2b-256 36de1d66b543f121e3a18db00a15d838f65dca71d744ff51c3ebeb0ea8db03e1

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 619c002d74bfa78713aa451d5f630ea742a7628a022af858fbc29e4ac55d0be9
MD5 830c5c3f3c866a88462bc234b7a4dd64
BLAKE2b-256 111e1cf5c6fc7bb9cae4e6d89b0cbbc4814405af24f9a1db009cf528beb339e4

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 66fbf06c25b176228125ddb4c67b26c1d1cd46c2b82d1fa952cd0d2e6ed7762e
MD5 cfe53036235cb3c9517e5d715d44dccf
BLAKE2b-256 6f2d626fc1256873a80e2f126eba9d56596940a6862ec8adb7826cb8fcd2e4ea

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-win32.whl.

File metadata

  • Download URL: tokenizers-0.13.0-cp38-cp38-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.8, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 9bcbd62830219cf8699d7acdbbbf8625cb65bcea8e189f14463edd3bcf021eed
MD5 c4dfeaafbc16e6f047ab7c57e7e0d407
BLAKE2b-256 fa1cd39a357560bf0e895ca580cf4774490be6c5d18f42967b88aecc52e1fbbf

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 4125db11089d923452bb47bfc050c9b42f459f978ee1a214cabecbb98ea4b2be
MD5 4ec65cec09e9186b8f286e4431b5aaca
BLAKE2b-256 97e6cd412f662ca4de2e1ca37e6c5827c61de5bdafc8051f23d52a51966010ef

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 479320e697cdf538e5d4ebffc4e1dc38b23388ca7dbb2fa7266fbd7173c84a57
MD5 e1018e436d4dc93a2d7958e20ac623ad
BLAKE2b-256 47bcb296189f1483ac38907cf2d7be82f2e3e8f4a814502b9e0e59fff0fcb245

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 9846d131ae4789bdce999276245d571d5abcd647ca4e913d5ff0f45b4dcf7fae
MD5 33d7468710e1535eb2a49bdd2d8e183a
BLAKE2b-256 c2d640f51550630154526cb30ed9f1c6d896d736f88198d59c471a4038087077

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 879585ef9b219ea0e300c670d4545e6a007363a022966a27c1dee8146666d38a
MD5 48fee4380c1fad08a2a544ad86b12e31
BLAKE2b-256 61fcd82d60ed5c7306e0b991a2966dec0b839426ea783668a9d409a32a59bbb4

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 8eacb864f764ea3f12fd8735a1879ac84b9658d3083c47b0f005a5ed8c97fd5a
MD5 f40e346d6b61a4e953aa90a9f3155436
BLAKE2b-256 0c1a3b349adb50e289699096eded8e76d3b124100b68fde66455c00ff6c94283

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 2b8636aa712b2de7efccef67c63eb1e4c10df3bcbd3a467573890e134cf00ba9
MD5 37ee5eb4a6589cabb2ecfeada6c97ff3
BLAKE2b-256 8281bc120d3e466a65cb135ee8d6192b6743573ecb3550c8ab8e2cf4b6fe3f2a

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-win32.whl.

File metadata

  • Download URL: tokenizers-0.13.0-cp37-cp37m-win32.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 4f88a82db27de2353ea9b3519b7bd3bb27f65fa01860a714cd959e5b574b99fc
MD5 a86265d81073d1ccd9e4dd903b2e1173
BLAKE2b-256 a15f1de38fc31054c3ff7f24c277f579632ec87dbfff2f69907ed741c970c27a

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm Hash digest
SHA256 00c295f301568b4f77e3d8eff3dac9791b184d615d0ba0c2bc10a1e7556b3c37
MD5 ca255c52fc21a56a5c5aa881bfdf030a
BLAKE2b-256 17ed6111bd017a860c36d930527f18644d723e58aa1407facc83edbaafeabb03

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm Hash digest
SHA256 4b822ab35ab6a874a5142a5e6cad703e049ec0836cc1cdc20f2c0aca4d568ec3
MD5 e099f3218e5d97b19cca4237799aab15
BLAKE2b-256 fb9c76d4428f4cdc9b1dbeb3482ea974ab94fbfbee2260b42221b2f2c6048bc2

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 f8f6c2d896e0a59d42b58de83291f8c2c392d12737ac49484e53bd66b0fe32e1
MD5 393f9a3ad928d99186b72884274bdee6
BLAKE2b-256 4eca4ab54f8cce18ec5f5a99b6abf7cc5ff062fb2ad329d29504fb1993720ed7

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 b73fc2f3c9dfe6df219335edafb14de9f93ffbe3916911e76030834f7d4cefcd
MD5 cece6c932f67488588aa80714e30c191
BLAKE2b-256 29b0a0d409092a885aef3a2522cd0ac1cf439a6fa9b1c91a0a1257924a838bb2

See more details on using hashes here.

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 bd25e9ddf712f8699a29298c563340cf2d6d91e5dd0816a4bfdbb6319bc9e1b0
MD5 75ba7a0cfe8f0fde96e543ba72522af7
BLAKE2b-256 4036e050b40ae5f9c81e2826e021ada917711ecad1583338d87fdf2373269804

See more details on using hashes here.
