Insanely Fast Whisper speeds up audio transcription 19× with Flash Attention 2
The open-source Insanely Fast Whisper CLI tool leverages Flash Attention 2 to accelerate OpenAI Whisper transcription by 19 times, cutting transcription of a 2.5-hour audio file from 31 minutes to just 98 seconds on an Nvidia A100, while preserving accuracy and adding multilingual, speaker-diarization, and precise-timestamp features.
Insanely Fast Whisper accelerates OpenAI Whisper transcription by 19×, reducing the processing time for a 2.5‑hour audio from 31 minutes to 98 seconds.
Technical core: Flash Attention 2
The tool integrates Flash Attention 2 while keeping the model weights identical to the standard Whisper model, resulting in zero quality loss and a dramatic speed boost.
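Since the project grew out of a Hugging Face Transformers benchmark, the same idea can be sketched directly against the Transformers ASR pipeline. This is a minimal sketch, assuming a CUDA GPU and an installed flash-attn package; the chunk length and batch size here are illustrative assumptions, not the tool's exact defaults.

```python
import torch
from transformers import pipeline

# Sketch: load Whisper large-v3 in fp16 with Flash Attention 2 enabled.
# Requires a CUDA GPU and the flash-attn package; values are illustrative.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

# Chunked, batched long-form transcription with timestamps.
result = pipe(
    "audio.mp3",              # hypothetical input file
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)
print(result["text"])
```

Because the attention kernel changes but the weights do not, the transcript should match the standard pipeline's output.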
Performance comparison (Nvidia A100, 80 GB)
Standard Whisper large-v3: 31 minutes for 2.5 hours of audio
Optimized large-v3 (Insanely Fast Whisper): 1 minute 38 seconds
Distil-Whisper large-v2: 1 minute 18 seconds
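The quoted 19× figure is consistent with these timings, as a quick check shows:

```python
baseline_s = 31 * 60   # standard Whisper large-v3: 31 minutes
optimized_s = 98       # optimized large-v3: 1 min 38 s

speedup = baseline_s / optimized_s  # 1860 / 98 ≈ 18.98
print(round(speedup))  # → 19
```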
Beyond speed
Multilingual support: automatic detection of dozens of languages and optional translation to English
Speaker diarization: built-in speaker identification to separate speakers
Precise timestamps: word-level and segment-level timestamps for exact audio navigation
Cross-platform compatibility: works on NVIDIA GPUs and Apple Silicon Macs without code changes
Free operation: can run on Google Colab's free tier even without a local GPU
Installation and usage
pipx install insanely-fast-whisper
insanely-fast-whisper --file-name <audio_file_path_or_URL>

For temporary use, the tool can be run without installation:

pipx run insanely-fast-whisper

Background
The project started as a benchmark demo for Hugging Face Transformers. After community members discovered its practical value, the developer added features that users needed, evolving it into a full‑featured command‑line utility and spawning related community projects such as a web app and a Python package.
Repository: https://github.com/Vaibhavs10/insanely-fast-whisper
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).