
Convert tokenizers into OpenVINO models

Project description

OpenVINO Tokenizers

OpenVINO Tokenizers adds text processing operations to OpenVINO.

Features

  • Perform tokenization and detokenization without third-party dependencies
  • Convert a HuggingFace tokenizer into OpenVINO tokenizer and detokenizer models
  • Combine OpenVINO models into a single model
  • Add a greedy decoding pipeline to a text generation model

Installation

(Recommended) Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate
 # or
conda create --name openvino_tokenizer 
conda activate openvino_tokenizer

Minimal Installation

Use the minimal installation when you already have a converted OpenVINO tokenizer:

pip install openvino-tokenizers
 # or
conda install -c conda-forge openvino openvino-tokenizers

Convert Tokenizers Installation

If you want to convert HuggingFace tokenizers into OpenVINO tokenizers:

pip install openvino-tokenizers[transformers]
 # or
conda install -c conda-forge openvino openvino-tokenizers && pip install transformers[sentencepiece] tiktoken

Build and install from source after OpenVINO installation

source path/to/installed/openvino/setupvars.sh
git clone https://github.com/openvinotoolkit/openvino_contrib.git
cd openvino_contrib/modules/custom_operations/
pip install .[transformers]

Build and install for development

source path/to/installed/openvino/setupvars.sh
git clone https://github.com/openvinotoolkit/openvino_contrib.git
cd openvino_contrib/modules/custom_operations/
pip install -e .[all]
# verify installation by running tests
cd user_ie_extensions/tokenizer/python/tests/
pytest .

C++ Installation

You can use converted tokenizers in C++ pipelines with prebuilt binaries.

  1. Download the OpenVINO archive distribution for your OS and extract the archive.
  2. Download the OpenVINO Tokenizers prebuilt libraries. To ensure compatibility, the first three numbers of the OpenVINO Tokenizers version should match your OpenVINO version and OS.
  3. Extract the OpenVINO Tokenizers archive into the OpenVINO installation directory:
    • Windows: <openvino_dir>\runtime\bin\intel64\Release\
    • MacOS_x86: <openvino_dir>/runtime/lib/intel64/Release/
    • MacOS_arm64: <openvino_dir>/runtime/lib/arm64/Release/
    • Linux_x86: <openvino_dir>/runtime/lib/intel64/
    • Linux_arm64: <openvino_dir>/runtime/lib/aarch64/

After that, you can add the binary extension in your code with:

  • core.add_extension("user_ov_extensions.dll") for Windows
  • core.add_extension("libuser_ov_extensions.dylib") for MacOS
  • core.add_extension("libuser_ov_extensions.so") for Linux

and then read/compile converted (de)tokenizer models.
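
As a rough sketch, here is the Python equivalent, assuming a Linux host and a tokenizer already converted to openvino_tokenizer.xml (a hypothetical path); the C++ Core API mirrors these calls:

from openvino import Core

core = Core()
# register tokenizer operations from the extracted binary (Linux library name)
core.add_extension("libuser_ov_extensions.so")

# read/compile a converted tokenizer; tokenizers are inferred on CPU only
compiled_tokenizer = core.compile_model("openvino_tokenizer.xml", "CPU")
print(compiled_tokenizer(["Test string"])["input_ids"])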

Usage

:warning: OpenVINO Tokenizers can only be inferred on a CPU device.
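
For example, you can pin the device explicitly when compiling (here ov_tokenizer stands for any converted tokenizer model, as in the examples below):

from openvino import compile_model

compiled_tokenizer = compile_model(ov_tokenizer, device_name="CPU")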

Convert HuggingFace tokenizer

OpenVINO Tokenizers ships with a CLI tool that can convert tokenizers from the HuggingFace Hub or HuggingFace tokenizers saved on disk:

convert_tokenizer codellama/CodeLlama-7b-hf --with-detokenizer -o output_dir
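
The resulting IR files can be loaded back afterwards. A minimal sketch, assuming the tool wrote openvino_tokenizer.xml and openvino_detokenizer.xml into output_dir (the exact output file names are an assumption here):

import openvino_tokenizers  # registers tokenizer-related operations
from openvino import Core

core = Core()
compiled_tokenizer = core.compile_model("output_dir/openvino_tokenizer.xml", "CPU")
compiled_detokenizer = core.compile_model("output_dir/openvino_detokenizer.xml", "CPU")

# tokenize a string and turn the token ids back into text
tokens = compiled_tokenizer(["def fibonacci(n):"])
print(compiled_detokenizer(tokens["input_ids"])["string_output"])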

There is also a convert_tokenizer function that can convert a tokenizer Python object:

import numpy as np
from transformers import AutoTokenizer
from openvino import compile_model, save_model
from openvino_tokenizers import convert_tokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ov_tokenizer = convert_tokenizer(hf_tokenizer)

compiled_tokenizer = compile_model(ov_tokenizer)
text_input = ["Test string"]

hf_output = hf_tokenizer(text_input, return_tensors="np")
ov_output = compiled_tokenizer(text_input)

for output_name in hf_output:
    print(f"OpenVINO {output_name} = {ov_output[output_name]}")
    print(f"HuggingFace {output_name} = {hf_output[output_name]}")
# OpenVINO input_ids = [[ 101 3231 5164  102]]
# HuggingFace input_ids = [[ 101 3231 5164  102]]
# OpenVINO token_type_ids = [[0 0 0 0]]
# HuggingFace token_type_ids = [[0 0 0 0]]
# OpenVINO attention_mask = [[1 1 1 1]]
# HuggingFace attention_mask = [[1 1 1 1]]

# save tokenizer for later use
save_model(ov_tokenizer, "openvino_tokenizer.xml")

loaded_tokenizer = compile_model("openvino_tokenizer.xml")
loaded_ov_output = loaded_tokenizer(text_input)
for output_name in hf_output:
    assert np.all(loaded_ov_output[output_name] == ov_output[output_name])

Connect Tokenizer to a Model

To convert and infer the original model, install torch (or its CPU-only build) in the virtual environment.
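
For example (the CPU-only wheel index URL is PyTorch's standard index, not part of this project):

pip install torch
 # or CPU-only build
pip install torch --index-url https://download.pytorch.org/whl/cpu

With torch available, the tokenizer and model can be converted and connected: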

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openvino import compile_model, convert_model
from openvino_tokenizers import convert_tokenizer, connect_models

checkpoint = "mrm8488/bert-tiny-finetuned-sms-spam-detection"
hf_tokenizer = AutoTokenizer.from_pretrained(checkpoint)
hf_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

text_input = ["Free money!!!"]
hf_input = hf_tokenizer(text_input, return_tensors="pt")
hf_output = hf_model(**hf_input)

ov_tokenizer = convert_tokenizer(hf_tokenizer)
ov_model = convert_model(hf_model, example_input=hf_input.data)
combined_model = connect_models(ov_tokenizer, ov_model)
compiled_combined_model = compile_model(combined_model)

openvino_output = compiled_combined_model(text_input)

print(f"OpenVINO logits: {openvino_output['logits']}")
# OpenVINO logits: [[ 1.2007061 -1.4698029]]
print(f"HuggingFace logits {hf_output.logits}")
# HuggingFace logits tensor([[ 1.2007, -1.4698]], grad_fn=<AddmmBackward0>)

Use Extension With Converted (De)Tokenizer or Model With (De)Tokenizer

Importing openvino_tokenizers adds all tokenizer-related operations to OpenVINO, after which you can work with saved tokenizers and detokenizers:

import numpy as np
import openvino_tokenizers
from openvino import Core

core = Core()

# detokenizer from codellama sentencepiece model
compiled_detokenizer = core.compile_model("detokenizer.xml")

token_ids = np.random.randint(100, 1000, size=(3, 5))
openvino_output = compiled_detokenizer(token_ids)

print(openvino_output["string_output"])
# ['sc�ouition�', 'intvenord hasient', 'g shouldwer M more']

Text generation pipeline

import numpy as np
from openvino import compile_model, convert_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from openvino_tokenizers import add_greedy_decoding, convert_tokenizer

# Use a different repo for the tokenizer because the original repo doesn't have a .model file
# Sentencepiece (Unigram) tokenizers are supported only with a .model file
tokenizer_checkpoint = "microsoft/Llama2-7b-WhoIsHarryPotter"
model_checkpoint = "nickypro/tinyllama-15M"
hf_tokenizer = AutoTokenizer.from_pretrained(tokenizer_checkpoint)
hf_model = AutoModelForCausalLM.from_pretrained(model_checkpoint, use_cache=False)

# convert hf tokenizer
text_input = ["Quick brown fox was"]
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)
compiled_tokenizer = compile_model(ov_tokenizer)

# transform input text into tokens
ov_input = compiled_tokenizer(text_input)
hf_input = hf_tokenizer(text_input, return_tensors="pt")

# convert the PyTorch model to OpenVINO IR and add a greedy decoding pipeline to it
ov_model = convert_model(hf_model, example_input=hf_input.data)
ov_model_with_greedy_decoding = add_greedy_decoding(ov_model)
compiled_model = compile_model(ov_model_with_greedy_decoding)

# generate new tokens
new_tokens_size = 10
prompt_size = ov_input["input_ids"].shape[-1]
input_dict = {
  output.any_name: np.hstack([tensor, np.zeros(shape=(1, new_tokens_size), dtype=np.int_)])
  for output, tensor in ov_input.items()
}
for idx in range(prompt_size, prompt_size + new_tokens_size):
  output = compiled_model(input_dict)["token_ids"]
  input_dict["input_ids"][:, idx] = output[:, idx - 1]
  input_dict["attention_mask"][:, idx] = 1
ov_token_ids = input_dict["input_ids"]

hf_token_ids = hf_model.generate(
  **hf_input,
  min_new_tokens=new_tokens_size,
  max_new_tokens=new_tokens_size,
  temperature=0,  # greedy decoding
)

# decode model output
compiled_detokenizer = compile_model(ov_detokenizer)
ov_output = compiled_detokenizer(ov_token_ids)["string_output"]
hf_output = hf_tokenizer.batch_decode(hf_token_ids, skip_special_tokens=True)
print(f"OpenVINO output string: `{ov_output}`")
# OpenVINO output string: `['Quick brown fox was walking through the forest. He was looking for something']`
print(f"HuggingFace output string: `{hf_output}`")
# HuggingFace output string: `['Quick brown fox was walking through the forest. He was looking for something']`

Supported Tokenizer Types

Huggingface Tokenizer Type  Tokenizer Model Type  Tokenizer  Detokenizer
Fast                        WordPiece             ✅          ❌
Fast                        BPE                   ✅          ✅
Fast                        Unigram               ❌          ❌
Legacy                      SentencePiece .model  ✅          ✅
Custom                      tiktoken              ✅          ✅

Test Results

This report is autogenerated and includes tokenizers and detokenizers tests. The Output Matched, % column shows the percentage of test strings for which the results of OpenVINO and Huggingface Tokenizers are the same. To update the report, run pytest tokenizers_test.py --update_readme in the modules/custom_operations/user_ie_extensions/tokenizer/python/tests directory.

Output Match by Tokenizer Type

Tokenizer Type Output Matched, % Number of Tests
BPE 95.82 3420
SentencePiece 86.28 2880
Tiktoken 97.69 216
WordPiece 82.12 520

Output Match by Model

Tokenizer Type Model Output Matched, % Number of Tests
BPE EleutherAI/gpt-j-6b 98.33 180
BPE EleutherAI/gpt-neo-125m 98.33 180
BPE EleutherAI/gpt-neox-20b 97.78 180
BPE EleutherAI/pythia-12b-deduped 97.78 180
BPE KoboldAI/fairseq-dense-13B 98.89 180
BPE Salesforce/codegen-16B-multi 97.22 180
BPE ai-forever/rugpt3large_based_on_gpt2 97.78 180
BPE bigscience/bloom 99.44 180
BPE databricks/dolly-v2-3b 97.78 180
BPE facebook/bart-large-mnli 97.22 180
BPE facebook/galactica-120b 98.33 180
BPE facebook/opt-66b 98.89 180
BPE gpt2 97.22 180
BPE laion/CLIP-ViT-bigG-14-laion2B-39B-b160k 61.11 180
BPE microsoft/deberta-base 96.11 180
BPE roberta-base 96.11 180
BPE sentence-transformers/all-roberta-large-v1 96.11 180
BPE stabilityai/stablecode-completion-alpha-3b-4k 98.33 180
BPE stabilityai/stablelm-tuned-alpha-7b 97.78 180
SentencePiece NousResearch/Llama-2-13b-hf 100.00 180
SentencePiece NousResearch/Llama-2-13b-hf_slow 100.00 180
SentencePiece THUDM/chatglm2-6b 100.00 180
SentencePiece THUDM/chatglm2-6b_slow 100.00 180
SentencePiece THUDM/chatglm3-6b 100.00 180
SentencePiece THUDM/chatglm3-6b_slow 100.00 180
SentencePiece camembert-base 0.00 180
SentencePiece camembert-base_slow 75.00 180
SentencePiece codellama/CodeLlama-7b-hf 100.00 180
SentencePiece codellama/CodeLlama-7b-hf_slow 100.00 180
SentencePiece microsoft/deberta-v3-base 93.33 180
SentencePiece microsoft/deberta-v3-base_slow 100.00 180
SentencePiece xlm-roberta-base 98.89 180
SentencePiece xlm-roberta-base_slow 98.89 180
SentencePiece xlnet-base-cased 61.11 180
SentencePiece xlnet-base-cased_slow 53.33 180
Tiktoken Qwen/Qwen-14B-Chat 98.15 108
Tiktoken Salesforce/xgen-7b-8k-base 97.22 108
WordPiece ProsusAI/finbert 80.00 40
WordPiece bert-base-multilingual-cased 80.00 40
WordPiece bert-large-cased 80.00 40
WordPiece cointegrated/rubert-tiny2 80.00 40
WordPiece distilbert-base-uncased-finetuned-sst-2-english 80.00 40
WordPiece google/electra-base-discriminator 80.00 40
WordPiece google/mobilebert-uncased 95.00 40
WordPiece jhgan/ko-sbert-sts 75.00 40
WordPiece prajjwal1/bert-mini 95.00 40
WordPiece rajiv003/ernie-finetuned-qqp 95.00 40
WordPiece rasa/LaBSE 72.50 40
WordPiece sentence-transformers/all-MiniLM-L6-v2 75.00 40
WordPiece squeezebert/squeezebert-uncased 80.00 40

Recreating Tokenizers From Tests

For some tokenizers, you need to select certain conversion settings so that their output is closer to the Huggingface tokenizers (see the sketch after this list):

  • The THUDM/chatglm2-6b detokenizer always skips special tokens. Use skip_special_tokens=True during conversion.
  • The THUDM/chatglm3-6b detokenizer does not skip special tokens. Use skip_special_tokens=False during conversion.
  • All tested tiktoken-based detokenizers leave extra spaces. Use clean_up_tokenization_spaces=False during conversion.
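
For example, a minimal sketch of converting the chatglm2 tokenizer with such a setting (trust_remote_code is required for the chatglm checkpoints):

from transformers import AutoTokenizer
from openvino_tokenizers import convert_tokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
ov_tokenizer, ov_detokenizer = convert_tokenizer(
    hf_tokenizer,
    with_detokenizer=True,
    skip_special_tokens=True,  # chatglm2 detokenizer always skips special tokens
)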

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

openvino_tokenizers-2023.3.0.0-py3-none-win_amd64.whl (14.0 MB)

Uploaded Python 3 Windows x86-64

openvino_tokenizers-2023.3.0.0-py3-none-manylinux_2_27_aarch64.whl (13.8 MB)

Uploaded Python 3 manylinux: glibc 2.27+ ARM64

openvino_tokenizers-2023.3.0.0-py3-none-manylinux_2_17_x86_64.whl (13.7 MB)

Uploaded Python 3 manylinux: glibc 2.17+ x86-64

openvino_tokenizers-2023.3.0.0-py3-none-macosx_11_0_arm64.whl (13.6 MB)

Uploaded Python 3 macOS 11.0+ ARM64

openvino_tokenizers-2023.3.0.0-py3-none-macosx_10_12_x86_64.whl (13.7 MB)

Uploaded Python 3 macOS 10.12+ x86-64

File details

Details for the file openvino_tokenizers-2023.3.0.0-py3-none-win_amd64.whl.


File hashes

Hashes for openvino_tokenizers-2023.3.0.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 832c5b7bc25332531fa365ace90781d944073a38c54866b85b5dbfcd11371fa8
MD5 987ae1bcaf143918fd1a266c9a48aee3
BLAKE2b-256 8a0a3b130e0e8ece49d0c0b77159f44be45c95e4cf3dc53331fbb8135a272723


File details

Details for the file openvino_tokenizers-2023.3.0.0-py3-none-manylinux_2_27_aarch64.whl.


File hashes

Hashes for openvino_tokenizers-2023.3.0.0-py3-none-manylinux_2_27_aarch64.whl
Algorithm Hash digest
SHA256 ff5fdc3582f75b5e7af036b1a4390a08012fd3efac7fe2ddcfb54a1a0be92cd1
MD5 10b56938c54b92a60a240e883037f992
BLAKE2b-256 06146bd27ccff109d48791431afa001af1713c0e6ed1417b587b3600d1435a82


File details

Details for the file openvino_tokenizers-2023.3.0.0-py3-none-manylinux_2_17_x86_64.whl.


File hashes

Hashes for openvino_tokenizers-2023.3.0.0-py3-none-manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 488955e8a3a057b26e77015c3b77beae9bd590adb2e44cd9bf536518499a8925
MD5 3e312746f92d37297eb651255cdbfad1
BLAKE2b-256 552568302a54097d1008bb4995a93448d3bf67dea818f35303ce98fecf8a9d87


File details

Details for the file openvino_tokenizers-2023.3.0.0-py3-none-macosx_11_0_arm64.whl.


File hashes

Hashes for openvino_tokenizers-2023.3.0.0-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 43fb7ed01c98952a6c050d6ca8865809baaba0b1f79826953616f968c84021a4
MD5 d4d5935afbd4ddbbe45cec4e86b57b51
BLAKE2b-256 fac1e97b9171c4656c3fd200f9337d267bc1069ddfb1ad25c6ce30f6850d117c


File details

Details for the file openvino_tokenizers-2023.3.0.0-py3-none-macosx_10_12_x86_64.whl.


File hashes

Hashes for openvino_tokenizers-2023.3.0.0-py3-none-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b7ee7c000f74f0d842013d343887f37d8a0eb1402cb4ee73890273fd8bb7d5bc
MD5 d2d80bcd6620c15f6d7fa53e2515f2db
BLAKE2b-256 58071a765eb9d2011d5acd409c21f01f05de944f51699ce08ba7effa21aa9f6d

