Insanely Fast Whisper speeds audio transcription 19× with Flash Attention 2

The open‑source Insanely Fast Whisper CLI uses Flash Attention 2 to speed up OpenAI Whisper transcription roughly 19×, cutting the processing time for a 2.5‑hour audio file from 31 minutes to 98 seconds on an Nvidia A100, while preserving accuracy and adding multilingual support, speaker diarization, and precise timestamps.

AI Engineering

Apr 28, 2026

Insanely Fast Whisper accelerates OpenAI Whisper transcription by roughly 19×, reducing the processing time for 2.5 hours of audio from 31 minutes to 98 seconds.

Technical core: Flash Attention 2

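The tool integrates Flash Attention 2 while keeping the model weights identical to the standard Whisper model. Flash Attention 2 computes exact attention (it reorganizes the computation to reduce GPU memory traffic rather than approximating it), so transcription accuracy is preserved while throughput rises sharply.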
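For readers who want to see what this looks like outside the CLI, the following is a minimal sketch of selecting Flash Attention 2 through the Hugging Face Transformers pipeline API. It assumes `torch`, `transformers`, and the `flash-attn` package are installed and that a CUDA GPU is available. The helper names `pipeline_kwargs` and `load_pipeline`, the device string, and the fp16 setting are illustrative choices, not necessarily what the tool does internally.

```python
def pipeline_kwargs(use_flash: bool = True, device: str = "cuda:0") -> dict:
    """Keyword arguments for transformers.pipeline (hypothetical helper)."""
    # "flash_attention_2" and "sdpa" are attention backends selectable in Transformers.
    attn = "flash_attention_2" if use_flash else "sdpa"
    return {
        "task": "automatic-speech-recognition",
        "model": "openai/whisper-large-v3",  # same checkpoint (weights) either way
        "device": device,
        "model_kwargs": {"attn_implementation": attn},
    }


def load_pipeline(use_flash: bool = True):
    """Build the ASR pipeline; heavy imports stay inside so this file loads anywhere."""
    import torch
    from transformers import pipeline

    # Assumption: half precision, since Flash Attention 2 needs fp16 or bf16 inputs.
    return pipeline(torch_dtype=torch.float16, **pipeline_kwargs(use_flash))
```

With the dependencies in place, something like `load_pipeline()("audio.mp3", chunk_length_s=30, batch_size=24, return_timestamps=True)` should transcribe a file. The chunk length and batch size here are example values to tune for the available GPU memory.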

Performance comparison (Nvidia A100, 80 GB)

Standard Whisper large‑v3: 31 minutes for 2.5 h audio

Optimized large‑v3 (Insanely Fast Whisper): 1 minute 38 seconds

Distil‑Whisper large‑v2: 1 minute 18 seconds
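The headline 19× figure can be checked directly from the timings above. The short script below is only arithmetic on the numbers quoted in this article; the "real time" factor (audio duration divided by processing time) is an extra derived metric, not one the benchmark reports.

```python
AUDIO_S = 2.5 * 3600  # 2.5 hours of audio, in seconds

# Wall-clock times from the comparison above, in seconds.
timings = {
    "Standard Whisper large-v3": 31 * 60,
    "Optimized large-v3": 1 * 60 + 38,
    "Distil-Whisper large-v2": 1 * 60 + 18,
}

baseline = timings["Standard Whisper large-v3"]


def speedup(seconds: float) -> float:
    """How many times faster than the standard large-v3 run."""
    return baseline / seconds


def realtime_factor(seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return AUDIO_S / seconds


for name, secs in timings.items():
    print(f"{name}: {speedup(secs):.1f}x vs baseline, "
          f"{realtime_factor(secs):.1f}x real time")
```

The optimized large‑v3 run works out to about 19.0× the baseline (1860 s / 98 s), and the Distil‑Whisper run to about 23.8×.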

Beyond speed

Multilingual support: automatic detection of dozens of languages and optional translation to English

Speaker diarization: built‑in identification of which speaker is talking in each part of the audio

Precise timestamps: word‑level and segment‑level timestamps for exact audio navigation

Cross‑platform compatibility: works on NVIDIA GPUs and Apple Silicon Macs without code changes

Free to run: works on Google Colab’s free tier, so a local GPU is not required
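As an illustration of how segment‑level timestamps can be used for navigation, here is a small sketch that turns a transcript into timestamped lines. It assumes the transcript is JSON with a `chunks` list whose items carry a `timestamp` pair (start, end, in seconds) and a `text` field. That shape is an assumption to verify against the tool's actual output, and the sample data below is made up.

```python
import json


def fmt(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def chunk_lines(transcript: dict) -> list[str]:
    """One '[start -> end] text' line per chunk (assumed transcript shape)."""
    lines = []
    for chunk in transcript.get("chunks", []):
        start, end = chunk["timestamp"]
        # Defensive: if an end time is missing, fall back to the start time.
        end = start if end is None else end
        lines.append(f"[{fmt(start)} -> {fmt(end)}] {chunk['text'].strip()}")
    return lines


# Made-up sample in the assumed shape, not real tool output.
sample = json.loads(
    '{"text": " Hello there. Welcome back.", "chunks": ['
    '{"timestamp": [0.0, 1.5], "text": " Hello there."},'
    '{"timestamp": [1.5, null], "text": " Welcome back."}]}'
)
print("\n".join(chunk_lines(sample)))
```

The same loop could be adapted to emit subtitle formats such as SRT, which use a similar start/end layout.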

Installation and usage

pipx install insanely-fast-whisper
insanely-fast-whisper --file-name <audio_file_path_or_URL>

For temporary use, the tool can be run without installation:

pipx run insanely-fast-whisper --file-name <audio_file_path_or_URL>
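For batch jobs, the CLI can also be driven from a script. The sketch below only builds the argument list. Apart from `--file-name`, which appears above, the flag names (`--device-id`, `--batch-size`, `--flash`, `--transcript-path`) are written from memory of the project's README and should be confirmed with `insanely-fast-whisper --help`. The helper `build_command` and its defaults are hypothetical.

```python
import shlex


def build_command(audio: str, device_id: str = "0", flash: bool = False,
                  batch_size: int = 24, out_path: str = "output.json") -> list[str]:
    """Assemble an insanely-fast-whisper invocation (flag names are assumptions)."""
    cmd = [
        "insanely-fast-whisper",
        "--file-name", audio,
        "--device-id", device_id,        # e.g. "0" for the first CUDA GPU, "mps" on Apple Silicon
        "--batch-size", str(batch_size),
        "--transcript-path", out_path,
    ]
    if flash:
        # Only pass the flag when enabling it, so a literal "False" string
        # is never handed to the CLI's boolean parser.
        cmd += ["--flash", "True"]
    return cmd


cmd = build_command("meeting.mp3", device_id="mps", batch_size=4)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces. A smaller batch size is used in the Apple Silicon example on the assumption that memory is tighter there than on a data‑center GPU.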

Background

The project started as a benchmark demo for Hugging Face Transformers. After community members discovered its practical value, the developer added features that users needed, evolving it into a full‑featured command‑line utility and spawning related community projects such as a web app and a Python package.

Repository: https://github.com/Vaibhavs10/insanely-fast-whisper
