# Insanely Fast Whisper

Powered by 🤗 Transformers, Optimum & flash-attn
TL;DR - Transcribe 300 minutes (5 hours) of audio in less than 5 minutes - with OpenAI's Whisper Large v2. Blazingly fast transcription is now a reality! ⚡️
Not convinced? Here are some benchmarks we ran on a free Google Colab T4 GPU! 👇
| Optimisation type | Time to Transcribe (150 mins of Audio) |
|---|---|
| Transformers (`fp32`) | ~31 min (31 min 1 sec) |
| Transformers (`fp32` + `batching [8]`) | ~13 min (13 min 19 sec) |
| Transformers (`fp16` + `batching [24]` + `bettertransformer`) | ~5 min (5 min 2 sec) |
| Transformers (distil-whisper) (`fp16` + `batching [24]` + `bettertransformer`) | ~3 min (3 min 16 sec) |
| Faster Whisper (`fp16` + `beam_size [1]`) | ~9 min (9 min 23 sec) |
| Faster Whisper (`8-bit` + `beam_size [1]`) | ~8 min (8 min 15 sec) |
🆕 You can now access blazingly fast transcriptions via your terminal! ⚡️
We've added a CLI to enable fast transcriptions. Here's how you can use it:
## Transcribe your audio

Install `insanely-fast-whisper` with `pipx`:

```bash
pipx install insanely-fast-whisper
```

Run inference from any path on your computer:

```bash
insanely-fast-whisper --file-name <filename or URL>
```

Don't want to install? Just use `pipx run`:

```bash
pipx run insanely-fast-whisper --file-name <filename or URL>
```
Note: The CLI is opinionated and currently only works for Nvidia GPUs. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run `insanely-fast-whisper --help` or `pipx run insanely-fast-whisper --help` to get all the CLI arguments and defaults.
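For example, to transcribe a file hosted at a URL without installing anything first (the URL below is just a placeholder):

```bash
pipx run insanely-fast-whisper --file-name https://example.com/sample.mp3
```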
## How to use it without a CLI?

For older GPUs (such as the Colab T4 benchmarked above), all you need to run is:
```python
import torch
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                device="cuda:0")

# Convert the model to BetterTransformer for faster attention kernels
pipe.model = pipe.model.to_bettertransformer()

outputs = pipe("<FILE_NAME>",
               chunk_length_s=30,  # chunked long-form transcription
               batch_size=24,      # number of chunks transcribed in parallel
               return_timestamps=True)

outputs["text"]
```
For newer GPUs (A10, A100, H100), use Flash Attention:
```python
import torch
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,
                # Requires the flash-attn package to be installed
                model_kwargs={"use_flash_attention_2": True},
                device="cuda:0")

outputs = pipe("<FILE_NAME>",
               chunk_length_s=30,
               batch_size=24,
               return_timestamps=True)

outputs["text"]
```
## Roadmap

- Add benchmarks for Whisper.cpp
- Add benchmarks for 4-bit inference
- Add a light CLI script
- Deployment script with Inference API
## Community showcase

- @ochen1 created a brilliant MVP for a CLI here: https://github.com/ochen1/insanely-fast-whisper-cli (Try it out now!)