1.5B‑Parameter Model Enables Offline Real‑Time Speech Transcription

Liquid AI's new 1.5 B-parameter LFM2-Audio model delivers high-quality offline, real-time speech-to-text, text-to-speech, and multimodal dialogue on local devices. It pairs a 1.2 B-parameter language backbone with a FastConformer encoder, supports two generation strategies, and posts benchmark scores that match or beat larger rivals.

AI Engineering

Jan 6, 2026

Cloud-based speech transcription is common, but fully offline real-time transcription has only recently become viable. Liquid AI has released its first end-to-end audio foundation model, LFM2-Audio-1.5B, demonstrating that a 1.5 B-parameter model can handle high-quality audio tasks locally. The model's key specifications are:

Language model backbone: 1.2 B-parameter LFM2 model

Audio encoder: FastConformer-based, 115 M parameters

Audio tokenizer: Mimi from Kyutai, using eight codebooks

Context length: 32,768 tokens

Supported precision: bfloat16
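
As a rough sizing aid, the sketch below estimates weight storage only, starting from the nominal 1.5 B parameter count quoted above. It assumes every weight is stored at the stated width, which real checkpoints only approximate, and it ignores the KV cache, activations, and runtime overhead. The Q8_0 figure relies on the GGUF block layout of 32 int8 values plus one 2-byte scale (34 bytes per 32 weights); Q8_0 is the quantization used by the checkpoint files in the usage examples.

```shell
# Back-of-the-envelope weight-memory estimate (weights only).
# PARAMS is the nominal 1.5 B figure from the article, not an exact count.
PARAMS=1500000000

# bfloat16 stores each weight in 2 bytes.
bf16_bytes=$(( PARAMS * 2 ))

# Q8_0 stores 32 int8 weights plus one 2-byte scale per block:
# 34 bytes for every 32 weights (8.5 bits per weight).
q8_bytes=$(( PARAMS * 34 / 32 ))

# Report in MiB using integer division (fractions are truncated).
echo "bfloat16: $(( bf16_bytes / 1048576 )) MiB"
echo "Q8_0:     $(( q8_bytes / 1048576 )) MiB"
```

Under these assumptions the quantized weights need a little over half the space of the bfloat16 weights, which is part of what makes on-device use practical.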

Beyond its small size, the model is a unified multimodal system that does not require separate ASR and TTS components: a single model performs speech-to-text and text-to-speech and handles mixed multi-turn dialogue.

The model supports two generation strategies:

Interleaved generation: Text and audio tokens alternate in a fixed pattern, which minimizes the delay before the first audio output and suits real-time voice dialogue.

Sequential generation: A special token tells the model when to switch modalities, which suits ASR, TTS, and other non-dialogue tasks.

This flexibility lets a single model adapt to different usage scenarios.
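
The difference between the two strategies is easiest to see as a token layout. The toy sketch below prints one possible ordering for each, as if a single response contained both modalities. `T` stands for a text token, `A` for an audio token, and `<switch>` for the special modality-switch token. The function names and chunk sizes are hypothetical and chosen only for illustration; the article does not give the model's actual interleaving pattern.

```shell
# Toy illustration of output token layouts. This is not the model's real
# pattern: chunk sizes and names are made up for the example.

# repeat_token TOKEN COUNT: print TOKEN COUNT times, each followed by a space.
repeat_token() {
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '%s ' "$1"
        i=$(( i + 1 ))
    done
}

# interleaved N_TEXT N_AUDIO ROUNDS: alternate fixed-size text and audio chunks.
interleaved() {
    r=0
    while [ "$r" -lt "$3" ]; do
        repeat_token T "$1"
        repeat_token A "$2"
        r=$(( r + 1 ))
    done
    printf '\n'
}

# sequential N_TEXT N_AUDIO: finish the text, emit the switch marker, then audio.
sequential() {
    repeat_token T "$1"
    printf '%s ' '<switch>'
    repeat_token A "$2"
    printf '\n'
}

interleaved 2 3 2
sequential 4 6
```

With these toy numbers both streams carry four text and six audio tokens, but the first audio token is the third item in the interleaved stream and the sixth in the sequential one. That earlier first audio token is why interleaving fits live dialogue.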

Typical usage examples (run with llama-lfm2-audio) follow. For speech-to-text (ASR):

./llama-lfm2-audio \
    -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
    --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
    -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
    -sys "Perform ASR." \
    --audio $INPUT_WAV
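
To transcribe a folder of recordings, the same command can be wrapped in a loop. The sketch below is one way to do that, using only the flags shown above. The function name `transcribe_dir` and the `DRY_RUN` switch are hypothetical helpers, not part of the tool. `DRY_RUN` defaults to 1, so the function prints each command for inspection instead of running it.

```shell
# Batch ASR sketch: one llama-lfm2-audio invocation per .wav file in a directory.
# Assumes CKPT points at the checkpoint directory, as in the article's examples.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
transcribe_dir() {
    dir=$1
    for wav in "$dir"/*.wav; do
        # Skip the literal pattern when the directory holds no .wav files.
        [ -e "$wav" ] || continue

        # Build the argument list; quoting keeps paths with spaces intact.
        set -- ./llama-lfm2-audio \
            -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
            --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
            -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
            -sys "Perform ASR." \
            --audio "$wav"

        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Printed for inspection only; arguments are not re-quoted.
            printf '%s\n' "$*"
        else
            "$@"
        fi
    done
}
```

Calling `transcribe_dir ./recordings` previews the commands, and `DRY_RUN=0 transcribe_dir ./recordings` runs them. The article does not say where the tool writes the transcript, so collecting the output is left to the caller.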

For text‑to‑speech:

./llama-lfm2-audio \
    -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
    --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
    -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
    -sys "Perform TTS." \
    -p "My name is Pau Labarta Bajo and I love AI" \
    --output $OUTPUT_WAV

And for TTS with a voice description in the system prompt:

./llama-lfm2-audio \
    -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf \
    --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf \
    -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf \
    -sys "Perform TTS.
    Use the following voice: A male speaker delivers a very expressive and animated speech, with a low-pitch voice and a slightly close-sounding tone. The recording carries a slight background noise." \
    -p "What is your name man?" \
    --output $OUTPUT_WAV

Despite its modest parameter count, the model's performance rivals that of larger competitors. On VoiceBench, LFM2-Audio-1.5B achieved a composite score of 56.78, well above the 7 B-parameter Moshi model (29.51). On ASR, its average word error rate (WER) is 7.24%, slightly lower (better) than Whisper-large-V3's 7.93%.
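
For readers unfamiliar with the metric, WER is conventionally computed from the word-level edit distance between the system transcript and a reference transcript. The formula below is that standard definition; the article does not say how the benchmark normalizes text or averages across datasets.

```latex
\mathrm{WER} = \frac{S + D + I}{N} \times 100\%
```

Here S, D, and I are the numbers of substituted, deleted, and inserted words, and N is the number of words in the reference. As a made-up example, a 100-word reference transcribed with 4 substitutions, 2 deletions, and 1 insertion gives a WER of 7%.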

A notable comparison is with Qwen2.5-Omni-3B, which has more than three times the parameters yet posts similar scores on most metrics, highlighting Liquid AI's efficiency optimizations.

The main current limitation is that the model supports English only, which rules out some use cases.

Conclusion: Local processing suits the many applications that value data privacy and independence from network connectivity, which opens up a wide range of offline-first use cases.

Repository: https://github.com/Liquid4All/liquid-audio
