Fast and Customizable Tokenizers

Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

These are Python bindings over the Rust implementation. If you are interested in the high-level design, you can check it there.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (BERT WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
  • Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.
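To make the alignment-tracking feature concrete, here is a minimal plain-Python sketch of the idea. It is not the library's implementation: the function names `normalize_with_alignment` and `original_span` are hypothetical, and the normalization (lowercasing plus accent stripping) is just one assumed example. The point is that each normalized character remembers which original character produced it, so any span of the normalized text can be mapped back.

```python
import unicodedata

def normalize_with_alignment(text):
    """Lowercase and strip accents, recording for every output character
    the index of the input character it came from (hypothetical helper)."""
    out_chars = []
    alignment = []  # alignment[i] = index in `text` that produced out_chars[i]
    for i, ch in enumerate(text):
        # NFD splits an accented letter into base letter + combining mark.
        for piece in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(piece):
                continue  # drop the accent itself
            for low in piece.lower():
                out_chars.append(low)
                alignment.append(i)
    return "".join(out_chars), alignment

def original_span(text, alignment, start, end):
    """Map a [start, end) span of the normalized string back to `text`."""
    return text[alignment[start]: alignment[end - 1] + 1]

original = "H\u00e9llo W\u00f6rld"
normalized, alignment = normalize_with_alignment(original)

# Pretend a tokenizer produced the token "world" from the normalized text.
start = normalized.index("world")
recovered = original_span(original, alignment, start, start + len("world"))
print(normalized)
print(recovered)
```

The recovered slice keeps the original casing and accents even though the token was found in the normalized string.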

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings as follows:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these using some vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)
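The ids printed above are what a model consumes. Before batching, sequences are usually truncated, wrapped in special tokens, and padded, which is the pre-processing mentioned in the feature list. The following is a plain-Python sketch of what those steps amount to, not the library's code: `prepare_batch` is a hypothetical helper, and the id values (101, 102, 0) are assumptions chosen for illustration.

```python
def prepare_batch(batch_ids, max_length, cls_id, sep_id, pad_id):
    """Truncate, add special tokens, then right-pad to the longest sequence."""
    prepared = []
    for ids in batch_ids:
        # Leave room for the two special tokens added around the sequence.
        body = ids[: max_length - 2]
        prepared.append([cls_id] + body + [sep_id])

    longest = max(len(ids) for ids in prepared)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in prepared]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in prepared
    ]
    return input_ids, attention_mask

input_ids, attention_mask = prepare_batch(
    [[5, 6, 7, 8, 9, 10], [11, 12]],
    max_length=6, cls_id=101, sep_id=102, pad_id=0,
)
print(input_ids)
print(attention_mask)
```

In the library itself this behavior is configured on the tokenizer (for example through its `enable_truncation` and `enable_padding` methods) rather than written by hand.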

Training one of these tokenizers is just as simple:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")
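For readers curious about what BPE training does conceptually, here is a small plain-Python sketch of the classic merge-learning loop. It is an illustration under simplifying assumptions, not the Rust implementation: the helper `learn_bpe` is hypothetical, end-of-word markers and minimum-frequency filtering are omitted, and ties between equally frequent pairs are broken by taking the lexicographically larger pair so the result is deterministic.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn `num_merges` BPE merges from a list of words (toy version)."""
    # Each distinct word becomes a tuple of symbols with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the lexicographically larger pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)

        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

corpus = "low low low lower lowest".split()
merges = learn_bpe(corpus, num_merges=2)
print(merges)
```

Each learned merge becomes a new vocabulary entry, so the number of merges effectively controls the vocabulary size.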

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous BERT tokenizer, using WordPiece

All of these can be used and trained as explained above!
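To show how WordPiece differs from BPE at encoding time, here is a plain-Python sketch of greedy longest-match-first tokenization as used by BERT. It is a simplified illustration, not the library's code: `wordpiece_tokenize` and the tiny vocabulary are hypothetical, and the `##` continuation prefix and `[UNK]` token follow the BERT convention.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # mark pieces that continue a word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens

vocab = {"un", "##aff", "##able", "[UNK]"}
tokens = wordpiece_tokenize("unaffable", vocab)
unknown = wordpiece_tokenize("xyz", vocab)
print(tokens)
print(unknown)
```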

Build your own

Whenever the provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together the different parts you need. You can check how we implemented the provided tokenizers and adapt them easily to your own needs.
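As a rough mental model of "putting the parts together", the sketch below chains four stages: normalization, pre-tokenization, a model, and post-processing. Everything here is a toy written for illustration: `ToyTokenizer`, the two-word vocabulary, and the stage functions are assumptions, not the library's classes, although the stage names loosely mirror its modules.

```python
class ToyTokenizer:
    """A toy pipeline: normalizer -> pre-tokenizer -> model -> post-processor."""

    def __init__(self, normalizer, pre_tokenizer, model, post_processor):
        self.normalizer = normalizer
        self.pre_tokenizer = pre_tokenizer
        self.model = model
        self.post_processor = post_processor

    def encode(self, text):
        text = self.normalizer(text)          # e.g. lowercase
        words = self.pre_tokenizer(text)      # e.g. split on whitespace
        tokens = [tok for word in words for tok in self.model(word)]
        return self.post_processor(tokens)    # e.g. add special tokens

toy_vocab = {"feel", "the"}

toy = ToyTokenizer(
    normalizer=str.lower,
    pre_tokenizer=str.split,
    model=lambda word: [word] if word in toy_vocab else ["[UNK]"],
    post_processor=lambda tokens: ["[CLS]"] + tokens + ["[SEP]"],
)

tokens = toy.encode("Feel the MAGIC")
print(tokens)
```

Because each stage is an independent callable, any one of them can be swapped without touching the others, which is the design idea behind assembling a custom tokenizer from parts.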

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)

Now, when you want to use this tokenizer, it is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
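The training example above seeds the trainer with `pre_tokenizers.ByteLevel.alphabet()`. To give an intuition for what a byte-level alphabet is, here is a plain-Python sketch of the byte-to-character mapping popularized by GPT-2, where each of the 256 byte values gets its own printable character. It is assumed, not verified here, that the library's alphabet follows this same convention; the helper names are hypothetical.

```python
def bytes_to_unicode():
    """Map every byte value (0-255) to a distinct printable character."""
    # Bytes that are already printable keep their own character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes (space, control chars, ...) are shifted
            # to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}

byte_encoder = bytes_to_unicode()
byte_decoder = {ch: b for b, ch in byte_encoder.items()}

def to_byte_level(text):
    return "".join(byte_encoder[b] for b in text.encode("utf-8"))

def from_byte_level(symbols):
    return bytes(byte_decoder[ch] for ch in symbols).decode("utf-8")

encoded_text = to_byte_level(" magic")
round_trip = from_byte_level(to_byte_level("caf\u00e9 \u2615"))
print(encoded_text)
print(round_trip)
```

Since every possible byte is in the initial alphabet, a byte-level BPE never needs an unknown token: any input string can be represented.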
