Convert tokenizers into OpenVINO models

OpenVINO Tokenizers

OpenVINO Tokenizers adds text processing operations to OpenVINO.

Features

  • Perform tokenization and detokenization without third-party dependencies
  • Convert a HuggingFace tokenizer into an OpenVINO tokenizer model and detokenizer model
  • Combine OpenVINO models into a single model
  • Add a greedy decoding pipeline to a text generation model

Installation

(Recommended) Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate
 # or
conda create --name openvino_tokenizer
conda activate openvino_tokenizer

Minimal Installation

Use the minimal installation when you already have a converted OpenVINO tokenizer:

pip install openvino-tokenizers
 # or
conda install -c conda-forge openvino openvino-tokenizers

Convert Tokenizers Installation

If you want to convert HuggingFace tokenizers into OpenVINO tokenizers:

pip install openvino-tokenizers[transformers]
 # or
conda install -c conda-forge openvino openvino-tokenizers && pip install transformers[sentencepiece] tiktoken
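
After either installation, it can help to confirm which packages are actually importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` and the package list are illustrative assumptions, not part of OpenVINO Tokenizers.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Import names for the packages mentioned in the install commands above
# (assumed list; adjust it to the installation variant you chose).
CONVERSION_PACKAGES = ["openvino", "openvino_tokenizers", "transformers", "tiktoken"]

if __name__ == "__main__":
    missing = missing_packages(CONVERSION_PACKAGES)
    print("missing packages:", ", ".join(missing) if missing else "none")
```

An empty result suggests the conversion installation is complete; for the minimal installation only `openvino` and `openvino_tokenizers` need to be present.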

Build and install from source after OpenVINO installation

source path/to/installed/openvino/setupvars.sh
git clone https://github.com/openvinotoolkit/openvino_tokenizers.git
cd openvino_tokenizers
pip install .[transformers]

Build and install for development

source path/to/installed/openvino/setupvars.sh
git clone https://github.com/openvinotoolkit/openvino_tokenizers.git
cd openvino_tokenizers
pip install -e .[all]
# verify installation by running tests
cd python/tests/
pytest .

C++ Installation

You can use converted tokenizers in C++ pipelines with prebuilt binaries.

  1. Download the OpenVINO archive distribution for your OS from here and extract the archive.
  2. Download the OpenVINO Tokenizers prebuilt libraries from here. To ensure compatibility, the first three numbers of the OpenVINO Tokenizers version should match the OpenVINO version, and the archive should match your OS.
  3. Extract the OpenVINO Tokenizers archive into the OpenVINO installation directory:
    • Windows: <openvino_dir>\runtime\bin\intel64\Release\
    • MacOS_x86: <openvino_dir>/runtime/lib/intel64/Release/
    • MacOS_arm64: <openvino_dir>/runtime/lib/arm64/Release/
    • Linux_x86: <openvino_dir>/runtime/lib/intel64/
    • Linux_arm64: <openvino_dir>/runtime/lib/aarch64/
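
The version rule in step 2 can be checked mechanically. The following is a rough sketch, assuming both versions are available as strings that begin with three dot-separated numbers; the function names and the sample version strings are illustrative only.

```python
import re


def release_triplet(version):
    """Extract the first three numeric components, e.g. '2024.0.0.0' -> (2024, 0, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def versions_compatible(tokenizers_version, openvino_version):
    """True when the first three numbers of both versions are identical."""
    return release_triplet(tokenizers_version) == release_triplet(openvino_version)


print(versions_compatible("2024.0.0.0", "2024.0.0"))  # True
print(versions_compatible("2024.0.0.0", "2023.3.0"))  # False
```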

After extracting the archive, you can add the binary extension in your code with:

  • core.add_extension("openvino_tokenizers.dll") for Windows
  • core.add_extension("libopenvino_tokenizers.dylib") for MacOS
  • core.add_extension("libopenvino_tokenizers.so") for Linux

and then read and compile the converted (de)tokenizer models.
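
When the same code has to run on several operating systems, the library name can be selected at run time. This is a small sketch in Python, assuming the three file names listed above; the mapping and the helper `tokenizers_extension_name` are hypothetical conveniences rather than part of the package.

```python
import platform

# File names taken from the list above, keyed by the value of platform.system().
EXTENSION_LIBRARIES = {
    "Windows": "openvino_tokenizers.dll",
    "Darwin": "libopenvino_tokenizers.dylib",
    "Linux": "libopenvino_tokenizers.so",
}


def tokenizers_extension_name(system=None):
    """Return the extension library file name for the given (or current) OS."""
    system = system or platform.system()
    try:
        return EXTENSION_LIBRARIES[system]
    except KeyError:
        raise RuntimeError(f"no prebuilt library name known for {system!r}") from None


print(tokenizers_extension_name("Linux"))  # libopenvino_tokenizers.so
```

The returned name, or its full path inside the OpenVINO installation directory, is the value to pass to core.add_extension.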

Usage

Warning: OpenVINO Tokenizers models can run inference on a CPU device only.

Convert HuggingFace tokenizer

OpenVINO Tokenizers ships with a CLI tool that can convert tokenizers from the HuggingFace Hub or HuggingFace tokenizers saved on disk:

convert_tokenizer codellama/CodeLlama-7b-hf --with-detokenizer -o output_dir

There is also a convert_tokenizer function that converts a tokenizer Python object.

import numpy as np
from transformers import AutoTokenizer
from openvino import compile_model, save_model
from openvino_tokenizers import convert_tokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ov_tokenizer = convert_tokenizer(hf_tokenizer)

compiled_tokenizer = compile_model(ov_tokenizer)
text_input = ["Test string"]

hf_output = hf_tokenizer(text_input, return_tensors="np")
ov_output = compiled_tokenizer(text_input)

for output_name in hf_output:
    print(f"OpenVINO {output_name} = {ov_output[output_name]}")
    print(f"HuggingFace {output_name} = {hf_output[output_name]}")
# OpenVINO input_ids = [[ 101 3231 5164  102]]
# HuggingFace input_ids = [[ 101 3231 5164  102]]
# OpenVINO token_type_ids = [[0 0 0 0]]
# HuggingFace token_type_ids = [[0 0 0 0]]
# OpenVINO attention_mask = [[1 1 1 1]]
# HuggingFace attention_mask = [[1 1 1 1]]

# save tokenizer for later use
save_model(ov_tokenizer, "openvino_tokenizer.xml")

loaded_tokenizer = compile_model("openvino_tokenizer.xml")
loaded_ov_output = loaded_tokenizer(text_input)
for output_name in hf_output:
    assert np.all(loaded_ov_output[output_name] == ov_output[output_name])
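
The CLI call and the Python calls above can be combined into one helper that converts a checkpoint and saves both models to a directory. This is a sketch, not the CLI's actual implementation: the function names are hypothetical, and the output file names `openvino_tokenizer.xml` and `openvino_detokenizer.xml` are an assumption extrapolated from the file name used in the example above.

```python
from pathlib import Path


def output_paths(output_dir):
    """Assumed file layout: one XML file per converted model inside output_dir."""
    out = Path(output_dir)
    return out / "openvino_tokenizer.xml", out / "openvino_detokenizer.xml"


def convert_and_save(checkpoint, output_dir):
    """Convert a HuggingFace tokenizer and save the tokenizer and detokenizer models."""
    # Imported lazily so that output_paths() works without the heavy packages installed.
    from openvino import save_model
    from openvino_tokenizers import convert_tokenizer
    from transformers import AutoTokenizer

    hf_tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

    tokenizer_path, detokenizer_path = output_paths(output_dir)
    tokenizer_path.parent.mkdir(parents=True, exist_ok=True)
    save_model(ov_tokenizer, str(tokenizer_path))
    save_model(ov_detokenizer, str(detokenizer_path))
    return tokenizer_path, detokenizer_path
```

Calling `convert_and_save("codellama/CodeLlama-7b-hf", "output_dir")` would mirror the CLI example, provided the conversion dependencies are installed and the checkpoint can be downloaded.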

Connect Tokenizer to a Model

To run and convert the original model, install torch or torch-cpu in the virtual environment.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openvino import compile_model, convert_model
from openvino_tokenizers import convert_tokenizer, connect_models

checkpoint = "mrm8488/bert-tiny-finetuned-sms-spam-detection"
hf_tokenizer = AutoTokenizer.from_pretrained(checkpoint)
hf_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

text_input = ["Free money!!!"]
hf_input = hf_tokenizer(text_input, return_tensors="pt")
hf_output = hf_model(**hf_input)

ov_tokenizer = convert_tokenizer(hf_tokenizer)
ov_model = convert_model(hf_model, example_input=hf_input.data)
combined_model = connect_models(ov_tokenizer, ov_model)
compiled_combined_model = compile_model(combined_model)

openvino_output = compiled_combined_model(text_input)

print(f"OpenVINO logits: {openvino_output['logits']}")
# OpenVINO logits: [[ 1.2007061 -1.4698029]]
print(f"HuggingFace logits {hf_output.logits}")
# HuggingFace logits tensor([[ 1.2007, -1.4698]], grad_fn=<AddmmBackward0>)
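
The two printed results differ only in display precision, so an exact equality check is not the right comparison. A tolerance-based check such as the sketch below is one option; the arrays are copied from the printed output above, and the tolerance of 1e-4 is an assumption chosen to match the four decimals PyTorch prints.

```python
import numpy as np

# Values copied from the printed output above.
ov_logits = np.array([[1.2007061, -1.4698029]])
hf_logits = np.array([[1.2007, -1.4698]])

# Largest element-wise difference between the two results.
max_abs_diff = float(np.max(np.abs(ov_logits - hf_logits)))

# Treat the outputs as equal when they agree to within the assumed tolerance.
outputs_match = bool(np.allclose(ov_logits, hf_logits, atol=1e-4))

print(f"max abs diff: {max_abs_diff:.1e}, match: {outputs_match}")
```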

Use Extension With Converted (De)Tokenizer or Model With (De)Tokenizer

Importing openvino_tokenizers adds all tokenizer-related operations to OpenVINO, after which you can work with saved tokenizers and detokenizers.

import numpy as np
import openvino_tokenizers
from openvino import Core

core = Core()

# detokenizer from codellama sentencepiece model
compiled_detokenizer = core.compile_model("detokenizer.xml")

token_ids = np.random.randint(100, 1000, size=(3, 5))
openvino_output = compiled_detokenizer(token_ids)

print(openvino_output["string_output"])
# ['sc�ouition�', 'intvenord hasient', 'g shouldwer M more']

Text generation pipeline

import numpy as np
from openvino import compile_model, convert_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from openvino_tokenizers import add_greedy_decoding, convert_tokenizer

# Use a different repo for the tokenizer because the original repo doesn't have a .model file
# A SentencePiece (Unigram) tokenizer is supported only with a .model file
tokenizer_checkpoint = "microsoft/Llama2-7b-WhoIsHarryPotter"
model_checkpoint = "nickypro/tinyllama-15M"
hf_tokenizer = AutoTokenizer.from_pretrained(tokenizer_checkpoint)
hf_model = AutoModelForCausalLM.from_pretrained(model_checkpoint, use_cache=False)

# convert hf tokenizer
text_input = ["Quick brown fox was"]
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)
compiled_tokenizer = compile_model(ov_tokenizer)

# transform input text into tokens
ov_input = compiled_tokenizer(text_input)
hf_input = hf_tokenizer(text_input, return_tensors="pt")

# convert the PyTorch model to OpenVINO IR and add a greedy decoding pipeline to it
ov_model = convert_model(hf_model, example_input=hf_input.data)
ov_model_with_greedy_decoding = add_greedy_decoding(ov_model)
compiled_model = compile_model(ov_model_with_greedy_decoding)

# generate new tokens
new_tokens_size = 10
prompt_size = ov_input["input_ids"].shape[-1]
input_dict = {
  output.any_name: np.hstack([tensor, np.zeros(shape=(1, new_tokens_size), dtype=np.int_)])
  for output, tensor in ov_input.items()
}
for idx in range(prompt_size, prompt_size + new_tokens_size):
  output = compiled_model(input_dict)["token_ids"]
  input_dict["input_ids"][:, idx] = output[:, idx - 1]
  input_dict["attention_mask"][:, idx] = 1
ov_token_ids = input_dict["input_ids"]

hf_token_ids = hf_model.generate(
  **hf_input,
  min_new_tokens=new_tokens_size,
  max_new_tokens=new_tokens_size,
  do_sample=False,  # greedy decoding
)

# decode model output
compiled_detokenizer = compile_model(ov_detokenizer)
ov_output = compiled_detokenizer(ov_token_ids)["string_output"]
hf_output = hf_tokenizer.batch_decode(hf_token_ids, skip_special_tokens=True)
print(f"OpenVINO output string: `{ov_output}`")
# OpenVINO output string: `['Quick brown fox was walking through the forest. He was looking for something']`
print(f"HuggingFace output string: `{hf_output}`")
# HuggingFace output string: `['Quick brown fox was walking through the forest. He was looking for something']`
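
The generation loop above pre-allocates space for the new tokens and fills one position per step. The sketch below isolates that pattern with a toy stand-in for the model, so it runs without OpenVINO or a checkpoint; `toy_model`, `VOCAB_SIZE`, and the "next token is the previous token plus one" rule are invented purely for illustration.

```python
import numpy as np

VOCAB_SIZE = 8


def toy_model(input_ids, attention_mask):
    """Stand-in for a model with greedy decoding: returns one token id per position.

    The toy rule scores (token + 1) % VOCAB_SIZE highest at every position.
    A real model would use attention_mask to ignore padding; this toy does not.
    """
    logits = np.zeros((*input_ids.shape, VOCAB_SIZE))
    next_ids = (input_ids + 1) % VOCAB_SIZE
    np.put_along_axis(logits, next_ids[..., None], 1.0, axis=-1)
    return logits.argmax(axis=-1)  # greedy choice: highest-scoring token


def greedy_generate(prompt_ids, new_tokens):
    prompt_size = prompt_ids.shape[-1]
    pad = np.zeros((prompt_ids.shape[0], new_tokens), dtype=prompt_ids.dtype)
    # Pre-allocate room for the new tokens, as the pipeline above does with np.hstack.
    input_ids = np.hstack([prompt_ids, pad])
    attention_mask = np.hstack([np.ones_like(prompt_ids), pad])
    for idx in range(prompt_size, prompt_size + new_tokens):
        token_ids = toy_model(input_ids, attention_mask)
        # The prediction made at position idx - 1 becomes the token at position idx.
        input_ids[:, idx] = token_ids[:, idx - 1]
        attention_mask[:, idx] = 1
    return input_ids


generated = greedy_generate(np.array([[1, 2, 3]]), new_tokens=4)
print(generated)  # [[1 2 3 4 5 6 7]]
```

Re-running the model over the whole buffer at every step is simple but repeats work for the prompt on each iteration, which is consistent with the example above loading the model with use_cache=False.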

Supported Tokenizer Types

| Huggingface Tokenizer Type | Tokenizer Model Type | Tokenizer | Detokenizer |
|---|---|---|---|
| Fast | WordPiece | | |
| | BPE | | |
| | Unigram | | |
| Legacy | SentencePiece .model | | |
| Custom | tiktoken | | |

Test Results

This report is autogenerated and includes tokenizer and detokenizer tests. The Output Matched, % column shows the percentage of test strings for which the OpenVINO and Huggingface tokenizers produce the same result. To update the report, run pytest --update_readme tokenizers_test.py in the tests directory.

Output Match by Tokenizer Type

| Tokenizer Type | Output Matched, % | Number of Tests |
|---|---|---|
| BPE | 96.74 | 3439 |
| SentencePiece | 76.33 | 3620 |
| Tiktoken | 97.25 | 327 |
| WordPiece | 90.43 | 533 |

Output Match by Model

| Tokenizer Type | Model | Output Matched, % | Number of Tests |
|---|---|---|---|
| BPE | EleutherAI/gpt-j-6b | 98.90 | 181 |
| BPE | EleutherAI/gpt-neo-125m | 98.90 | 181 |
| BPE | EleutherAI/gpt-neox-20b | 97.79 | 181 |
| BPE | EleutherAI/pythia-12b-deduped | 97.79 | 181 |
| BPE | KoboldAI/fairseq-dense-13B | 98.90 | 181 |
| BPE | Salesforce/codegen-16B-multi | 97.79 | 181 |
| BPE | ai-forever/rugpt3large_based_on_gpt2 | 97.79 | 181 |
| BPE | bigscience/bloom | 99.45 | 181 |
| BPE | databricks/dolly-v2-3b | 97.79 | 181 |
| BPE | facebook/bart-large-mnli | 98.90 | 181 |
| BPE | facebook/galactica-120b | 98.34 | 181 |
| BPE | facebook/opt-66b | 98.90 | 181 |
| BPE | gpt2 | 98.90 | 181 |
| BPE | laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | 65.19 | 181 |
| BPE | microsoft/deberta-base | 98.90 | 181 |
| BPE | roberta-base | 98.90 | 181 |
| BPE | sentence-transformers/all-roberta-large-v1 | 98.90 | 181 |
| BPE | stabilityai/stablecode-completion-alpha-3b-4k | 98.34 | 181 |
| BPE | stabilityai/stablelm-tuned-alpha-7b | 97.79 | 181 |
| SentencePiece | NousResearch/Llama-2-13b-hf | 100.00 | 181 |
| SentencePiece | NousResearch/Llama-2-13b-hf_slow | 100.00 | 181 |
| SentencePiece | THUDM/chatglm2-6b | 100.00 | 181 |
| SentencePiece | THUDM/chatglm2-6b_slow | 100.00 | 181 |
| SentencePiece | THUDM/chatglm3-6b | 19.34 | 181 |
| SentencePiece | THUDM/chatglm3-6b_slow | 19.34 | 181 |
| SentencePiece | camembert-base | 0.55 | 181 |
| SentencePiece | camembert-base_slow | 74.03 | 181 |
| SentencePiece | codellama/CodeLlama-7b-hf | 100.00 | 181 |
| SentencePiece | codellama/CodeLlama-7b-hf_slow | 100.00 | 181 |
| SentencePiece | facebook/musicgen-small | 80.11 | 181 |
| SentencePiece | facebook/musicgen-small_slow | 74.03 | 181 |
| SentencePiece | microsoft/deberta-v3-base | 93.37 | 181 |
| SentencePiece | microsoft/deberta-v3-base_slow | 100.00 | 181 |
| SentencePiece | t5-base | 81.22 | 181 |
| SentencePiece | t5-base_slow | 75.14 | 181 |
| SentencePiece | xlm-roberta-base | 97.24 | 181 |
| SentencePiece | xlm-roberta-base_slow | 97.24 | 181 |
| SentencePiece | xlnet-base-cased | 61.33 | 181 |
| SentencePiece | xlnet-base-cased_slow | 53.59 | 181 |
| Tiktoken | Qwen/Qwen-14B-Chat | 98.17 | 109 |
| Tiktoken | Salesforce/xgen-7b-8k-base | 97.25 | 109 |
| Tiktoken | stabilityai/stablelm-2-1_6b | 96.33 | 109 |
| WordPiece | ProsusAI/finbert | 95.12 | 41 |
| WordPiece | bert-base-multilingual-cased | 95.12 | 41 |
| WordPiece | bert-large-cased | 95.12 | 41 |
| WordPiece | cointegrated/rubert-tiny2 | 80.49 | 41 |
| WordPiece | distilbert-base-uncased-finetuned-sst-2-english | 95.12 | 41 |
| WordPiece | google/electra-base-discriminator | 95.12 | 41 |
| WordPiece | google/mobilebert-uncased | 95.12 | 41 |
| WordPiece | jhgan/ko-sbert-sts | 75.61 | 41 |
| WordPiece | prajjwal1/bert-mini | 95.12 | 41 |
| WordPiece | rajiv003/ernie-finetuned-qqp | 95.12 | 41 |
| WordPiece | rasa/LaBSE | 87.80 | 41 |
| WordPiece | sentence-transformers/all-MiniLM-L6-v2 | 75.61 | 41 |
| WordPiece | squeezebert/squeezebert-uncased | 95.12 | 41 |
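
The Output Matched, % figures above are plain ratios of matching test strings to total test strings, and the per-type figures pool the tests of all models of that type. The sketch below reproduces the arithmetic; the matched counts (179, 107, 106, 105) are not stated in the report and are inferred here from the rounded percentages, so treat them as assumptions.

```python
def output_matched_percent(matched, total):
    """Percentage of test strings with identical output, rounded to two decimals."""
    return round(100 * matched / total, 2)


# A single model: 179 of 181 strings matching gives the 98.90 seen for gpt2.
print(f"{output_matched_percent(179, 181):.2f}")  # 98.90

# Pooling the three Tiktoken models (109 tests each, inferred matched counts).
tiktoken_matched = [107, 106, 105]
pooled = output_matched_percent(sum(tiktoken_matched), 109 * len(tiktoken_matched))
print(f"{pooled:.2f}")  # 97.25
```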

Recreating Tokenizers From Tests

For some tokenizers, you need to select certain settings so that their output is closer to that of the Huggingface tokenizers:

  • The THUDM/chatglm2-6b detokenizer always skips special tokens. Use skip_special_tokens=True during conversion.
  • The THUDM/chatglm3-6b detokenizer does not skip special tokens. Use skip_special_tokens=False during conversion.
  • All tested tiktoken-based detokenizers leave extra spaces. Use clean_up_tokenization_spaces=False during conversion.
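
One way to keep these settings together is a small lookup that builds the conversion arguments per model. This is a sketch under the assumption that the settings named above are passed as keyword arguments to convert_tokenizer; the dictionary names and the helper `conversion_kwargs` are hypothetical.

```python
# Per-model settings taken from the list above.
DETOKENIZER_SETTINGS = {
    "THUDM/chatglm2-6b": {"skip_special_tokens": True},
    "THUDM/chatglm3-6b": {"skip_special_tokens": False},
}

# Setting recommended above for all tested tiktoken-based detokenizers.
TIKTOKEN_SETTINGS = {"clean_up_tokenization_spaces": False}


def conversion_kwargs(model_id, is_tiktoken=False):
    """Build keyword arguments for converting a tokenizer together with its detokenizer."""
    kwargs = {"with_detokenizer": True}
    kwargs.update(DETOKENIZER_SETTINGS.get(model_id, {}))
    if is_tiktoken:
        kwargs.update(TIKTOKEN_SETTINGS)
    return kwargs


print(conversion_kwargs("THUDM/chatglm3-6b"))
# Assumed usage: convert_tokenizer(hf_tokenizer, **conversion_kwargs(model_id))
```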
