Open‑Source GLM‑ASR‑Nano‑2512: Chinese Dialect‑Optimized Speech Recognition on Consumer‑Grade GPUs
GLM‑ASR‑Nano‑2512, a 1.5 B‑parameter open‑source speech‑recognition model released in December 2025, delivers state‑of‑the‑art accuracy on Chinese dialects and low‑volume audio, outperforms Whisper V3 on benchmark tests, runs on consumer GPUs, and provides detailed installation and deployment guides for transformers, vLLM and SGLang.
Model Overview
GLM‑ASR‑Nano‑2512 was released in December 2025 by Zhipu AI (Z.AI). It has 1.5 B parameters and a small footprint. The official evaluation reports that it outperforms OpenAI Whisper V3 on Chinese benchmarks.
Dialect support: optimized for Cantonese and other Chinese dialects; standard ASR models often struggle when dialects are mixed with Mandarin.
Low‑volume speech: trained on “whisper” scenarios such as distant speakers, weak telephone recordings, and quiet speech in noisy environments.
SOTA performance: average error rate of 4.10 % on Wenet Meeting (real meeting recordings) and Aishell‑1 (standard Mandarin).
Language coverage: 17 languages with WER ≤ 20 %.
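To make the error-rate figures above concrete, here is a minimal sketch of how a character or word error rate is commonly computed: the edit distance between the reference and the hypothesis, divided by the reference length. The article does not describe how the official evaluation tokenizes or normalizes text, so the character-level split for Chinese and the sample sentences below are assumptions for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length (WER or CER, depending on tokens)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Assumption: character-level scoring for Chinese (illustrative sentences, not benchmark data).
reference = "今天天气很好"
hypothesis = "今天天汽很好"  # one wrong character out of six
cer = error_rate(list(reference), list(hypothesis))
print(f"CER = {cer:.2%}")

# Assumption: word-level scoring for space-delimited languages.
wer = error_rate("the cat sat".split(), "the cat sat down".split())
print(f"WER = {wer:.2%}")
```

Note that the result can exceed 100 % when the hypothesis contains many insertions, because the denominator is the reference length only.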
Benchmark
Official benchmark results show GLM‑ASR‑Nano leads across reported metrics.
Comparison with Whisper
Scenarios where GLM‑ASR‑Nano is preferred:
Need to recognize Cantonese, Sichuanese, or other Chinese dialects.
Meeting recordings contain many low‑volume utterances.
Require on‑premises deployment (data never leaves domain).
Plan to fine‑tune for domain‑specific data (medical, legal, finance).
Seek cost‑effective solution without API fees.
Scenarios where Whisper is preferred:
Coverage of 100+ languages.
Mature community ecosystem and extensive documentation.
Built‑in transcribe‑and‑translate capability.
Processing of global accents.
Hardware Requirements
Minimum configuration:
GPU: 8 GB+ VRAM (e.g., RTX 3060)
Memory: 16 GB
Storage: 5 GB for model weights
Production recommendation:
GPU: NVIDIA A100, V100 or equivalent
Memory: 32 GB+
Storage: SSD for faster loading
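As a rough sanity check on these figures, the memory needed just to hold the weights is the parameter count times the bytes per parameter. The sketch below assumes the 1.5 B parameter count stated in the overview and ignores activations, decoding caches, and framework overhead, all of which add to the real footprint.

```python
# Bytes used per parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

NUM_PARAMS = 1.5e9  # parameter count stated in the article


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("float32", "bfloat16"):
    print(f"{dtype:>8}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

In bfloat16 this comes to roughly 2.8 GiB for the weights alone, which leaves headroom on an 8 GB card for inference-time buffers.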
With faster‑whisper optimization, mid‑range GPUs such as a down‑clocked 1080Ti can achieve faster‑than‑real‑time decoding.
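"Faster than real time" is usually quantified with the real-time factor (RTF): processing time divided by audio duration, where values below 1 mean decoding keeps up with the audio. A minimal sketch for measuring it follows. Here `transcribe` is any callable you supply, and `fake_transcribe` is a hypothetical stand-in so the example runs without a model; neither comes from the GLM‑ASR codebase.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio_path: str, audio_seconds: float):
    """Time one call to transcribe(audio_path) and return (result, RTF)."""
    start = time.perf_counter()
    result = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)


def fake_transcribe(audio_path: str) -> str:
    # Hypothetical stand-in for a real model call; sleeps briefly instead of decoding.
    time.sleep(0.05)
    return f"<transcript of {audio_path}>"


text, rtf = measure_rtf(fake_transcribe, "example_zh.wav", audio_seconds=10.0)
print(f"RTF = {rtf:.3f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

For a meaningful number, average over several clips and exclude the first call, which typically includes model warm-up.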
Installation
pip install -r requirements.txt
sudo apt install ffmpeg
pip install git+https://github.com/huggingface/transformers  # installs transformers 5.0.0 from source
Basic Usage (Transformers 5.0.0)
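Before loading the model, it can help to confirm that the prerequisites from the installation step are visible to the Python environment you are about to use. The following is a small sketch using only the standard library; the helper names `installed_version` and `check_prerequisites` are illustrative, not part of any package mentioned here.

```python
import shutil
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_prerequisites() -> dict:
    """Collect what was found for each prerequisite (None means missing)."""
    return {
        "ffmpeg": shutil.which("ffmpeg"),  # path to the binary on PATH, if any
        "transformers": installed_version("transformers"),
        "torch": installed_version("torch"),
    }


for name, found in check_prerequisites().items():
    print(f"{name:<14}{found if found else 'NOT FOUND'}")
```

A source install of transformers typically reports a development version string, so treat the printed value as informational and not as a strict version check.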
from transformers import AutoModel, AutoProcessor
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
repo_id = "zai-org/GLM-ASR-Nano-2512"

# Load the processor and the model weights in bfloat16.
processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id, dtype=torch.bfloat16, device_map=device)

# A single user turn: the audio clip plus the transcription instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "audio", "url": "example_zh.wav"},
        {"type": "text", "text": "Please transcribe this audio into text"}
    ]
}]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(device, dtype=torch.bfloat16)

# Greedy decoding; slice off the prompt tokens so only the new text is decoded.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True))
Service Deployment with vLLM
Upgrade to vLLM 0.14.0 and install a matching transformers version:
pip install git+https://github.com/huggingface/transformers
python -m vllm.entrypoints.openai.api_server \
    --model /data/models/GLM-ASR-Nano-2512 \
    --served-model-name GLM-ASR-Nano-2512 \
    --trust-remote-code \
    --dtype bfloat16 \
    --host 0.0.0.0 \
    --port 8000
Client example (OpenAI‑compatible); the model name must match the served model name:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="GLM-ASR-Nano-2512", file=audio_file)
print(transcript.text)
Service Deployment with SGLang
docker pull lmsysorg/sglang:dev
pip install git+https://github.com/huggingface/transformers
python -m sglang.launch_server \
    --model-path zai-org/GLM-ASR-Nano-2512 \
    --served-model-name glm-asr \
    --host 0.0.0.0 \
    --port 8000
OpenAI‑compatible call:
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:8000/v1")
response = client.chat.completions.create(
    model="glm-asr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "audio_url", "audio_url": {"url": "example_zh.wav"}},
            {"type": "text", "text": "Please transcribe this audio into text"}
        ]
    }],
    max_tokens=1024,
)
print(response.choices[0].message.content.strip())
Batch Inference
from transformers import GlmAsrForConditionalGeneration, AutoProcessor

processor = AutoProcessor.from_pretrained("zai-org/GLM-ASR-Nano-2512")
model = GlmAsrForConditionalGeneration.from_pretrained(
    "zai-org/GLM-ASR-Nano-2512", dtype="auto", device_map="auto"
)
inputs = processor.apply_transcription_request(["audio1.mp3", "audio2.mp3"])
inputs = inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=500)
decoded = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(decoded)
Application Scenarios
Enterprise meeting transcription with mixed dialects and distant speakers.
Call‑center handling regional accents.
Medical record dictation with low‑volume, fast speech.
Media & broadcasting for local TV or online streams.
Edge‑device deployment; 1.5 B parameters run on consumer‑grade GPUs.
Download Links
🤗 Hugging Face: https://huggingface.co/zai-org/GLM-ASR-Nano-2512
🤖 ModelScope: https://modelscope.cn/models/ZhipuAI/GLM-ASR-Nano-2512
GitHub: https://github.com/zai-org/GLM-ASR
Note: Models downloaded before 27 December 2025 must be re‑pulled because the weight format was updated for compatibility with transformers and SGLang.
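If you are unsure whether a local copy predates the format change, one option is to compare the date you downloaded it with the cutoff and force a fresh download when needed. The sketch below assumes you track the download date yourself and that the `huggingface_hub` package is installed. The helper names `needs_repull` and `repull` are hypothetical, and `repull` is defined but not called here because it requires network access.

```python
from datetime import date

FORMAT_UPDATE = date(2025, 12, 27)  # cutoff date stated in the note above
REPO_ID = "zai-org/GLM-ASR-Nano-2512"


def needs_repull(downloaded_on: date) -> bool:
    """True if the local copy was downloaded before the weight-format update."""
    return downloaded_on < FORMAT_UPDATE


def repull(repo_id: str = REPO_ID) -> str:
    """Re-download every file in the repo, bypassing the local cache."""
    # Imported lazily so the date check above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, force_download=True)


my_download_date = date(2025, 12, 20)  # example value; replace with your own
if needs_repull(my_download_date):
    print("Local weights predate the format update; call repull() to refresh them.")
```

When the download date is unknown, re-pulling unconditionally is the safer choice.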
Advantages & Limitations
Advantages
Strong Cantonese and other dialect recognition.
Effective low‑volume speech handling.
Open‑source, free, supports local deployment and fine‑tuning.
Compatible with major inference frameworks: transformers 5.x, vLLM, SGLang.
Limitations
Language coverage limited to 17 languages (vs 100+ for Whisper).
Community ecosystem still under development.
Requires building transformers from source (5.0.0).
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
