How to Build Real‑Time Streaming Speech Recognition with a Large‑Model API (Go & Python)
This guide explains the background of speech‑to‑text technology, introduces the large‑model streaming speech recognition API, walks through obtaining an API key, and provides detailed Go and Python code for establishing a WebSocket connection, sending full‑client and audio‑only requests, and parsing server responses.
Background
By some estimates, roughly 70% of the information people exchange is conveyed by voice, making speech recognition a critical component of intelligent assistants, voice input, and smart speakers. To meet this demand, a large‑model streaming speech recognition API has been added to the API marketplace.
Overview
The API converts spoken audio into text using advanced speech‑recognition and natural‑language‑understanding techniques. It supports scenarios such as intelligent customer service, novel reading, online education, meeting transcription, and video subtitles. The service uses a bidirectional streaming mode: the server returns a data packet only when the recognition result changes, which improves the real‑time factor (RTF) and reduces first‑word and last‑word latency.
API Documentation
Technical reference: https://zyun.360.cn/product/apimarketitem/asr
Usage Instructions
1. Obtain an API Key
Log in to the cloud platform, navigate to the API Marketplace, locate the Speech Recognition service, create an application and generate an API Key. Save the key for later use.
2. Call the API
Establish WebSocket connection
func (c *AsrWsClient) createConnection() error {
	tokenHeader := http.Header{"Authorization": []string{fmt.Sprintf("Bearer %s", "your token")}}
	fmt.Println("Connecting to ws://audio-asr.api.zyuncs.com/sauc ...")
	conn, resp, err := websocket.DefaultDialer.Dial("ws://audio-asr.api.zyuncs.com/sauc", tokenHeader)
	if err != nil {
		fmt.Println(err)
		return err
	}
	log.Printf("logid: %s\n", resp.Header.Get("X-Tt-Logid"))
	c.connect = conn
	return nil
}
Send full client request
The first message after the connection is a full client request containing user metadata, audio metadata, and request configuration.
func NewFullClientRequest() []byte {
	var request bytes.Buffer
	// Header flags mark this packet as carrying a positive sequence number.
	request.Write(DefaultHeader().WithMessageTypeSpecificFlags(POS_SEQUENCE).toBytes())
	payload := AsrRequestPayload{
		User:  UserMeta{Uid: "demo_uid"},
		Audio: AudioMeta{Format: "wav", Codec: "raw", Rate: 16000, Bits: 16, Channel: 1},
		Request: RequestMeta{
			ModelName:       "bigmodel",
			EnableITN:       true,
			EnablePUNC:      true,
			EnableDDC:       true,
			ShowUtterances:  true,
			EnableNonstream: false,
		},
	}
	// Serialize the payload to JSON and GZIP-compress it.
	payloadArr, _ := sonic.Marshal(payload)
	payloadArr = GzipCompress(payloadArr)
	payloadSizeArr := make([]byte, 4)
	binary.BigEndian.PutUint32(payloadSizeArr, uint32(len(payloadArr)))
	// Body layout: int32 sequence number (1 for the first packet),
	// uint32 payload size, then the compressed payload bytes.
	binary.Write(&request, binary.BigEndian, int32(1))
	request.Write(payloadSizeArr)
	request.Write(payloadArr)
	return request.Bytes()
}
Send audio‑only requests
After the full client request, audio data is sent as a series of audio‑only client requests. Each request carries a sequence number that continues the session sequence begun by the full client request (which used sequence 1); the sequence number of the final packet is negated to signal the end of the stream. The payload of each request is a GZIP‑compressed audio segment.
func (c *AsrWsClient) sendMessages(segmentSize int, content []byte, stopChan <-chan struct{}) error {
	messageChan := make(chan []byte)
	go func() {
		for message := range messageChan {
			// The protocol frames are binary (header + compressed payload),
			// so they must be sent as binary WebSocket messages.
			if err := c.connect.WriteMessage(websocket.BinaryMessage, message); err != nil {
				log.Printf("write message err: %s", err)
				return
			}
		}
	}()
	audioSegments := splitAudio(content, segmentSize)
	// Pace the segments with a ticker to simulate real-time capture.
	ticker := time.NewTicker(time.Duration(c.segmentDuration) * time.Millisecond)
	defer ticker.Stop()
	defer close(messageChan)
	for _, segment := range audioSegments {
		select {
		case <-ticker.C:
			// Negate the sequence number on the final segment to mark end of stream.
			if c.seq == len(audioSegments)+1 {
				c.seq = -c.seq
			}
			message := NewAudioOnlyRequest(c.seq, segment)
			messageChan <- message
			log.Printf("send message: seq: %d", c.seq)
			c.seq++
		case <-stopChan:
			return nil
		}
	}
	return nil
}
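sendMessages relies on a splitAudio helper that is not shown above. The sketch below is a minimal assumed implementation, inferred from the call site rather than taken from the service's SDK: it simply chops the raw audio buffer into fixed-size segments, with the final segment possibly shorter.

```go
package main

import "fmt"

// splitAudio chops raw audio bytes into fixed-size segments; the final
// segment may be shorter than segmentSize. This is an illustrative sketch,
// not the service's own helper.
func splitAudio(content []byte, segmentSize int) [][]byte {
	var segments [][]byte
	for start := 0; start < len(content); start += segmentSize {
		end := start + segmentSize
		if end > len(content) {
			end = len(content)
		}
		segments = append(segments, content[start:end])
	}
	return segments
}

func main() {
	// One second of 16 kHz, 16-bit mono audio is 32000 bytes; with the
	// sample's segmentSize of 6400 (200 ms), half a second splits into
	// two full segments and one partial one.
	segs := splitAudio(make([]byte, 16000), 6400)
	fmt.Println(len(segs), len(segs[0]), len(segs[2])) // prints: 3 6400 3200
}
```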
func NewAudioOnlyRequest(seq int, segment []byte) []byte {
	var request bytes.Buffer
	header := DefaultHeader()
	// A negative sequence number flags the last audio packet in the stream.
	if seq < 0 {
		header.WithMessageTypeSpecificFlags(NEG_WITH_SEQUENCE)
	} else {
		header.WithMessageTypeSpecificFlags(POS_SEQUENCE)
	}
	header.WithMessageType(CLIENT_AUDIO_ONLY_REQUEST)
	request.Write(header.toBytes())
	binary.Write(&request, binary.BigEndian, int32(seq))
	// GZIP-compress the raw audio segment before framing it.
	payload := GzipCompress(segment)
	binary.Write(&request, binary.BigEndian, int32(len(payload)))
	request.Write(payload)
	return request.Bytes()
}
Parse full server response
The server replies with a full server response whose payload is a JSON object containing the recognition result. Parse the JSON according to the schema described in the documentation to extract the transcribed text.
Sample output
segmentSize is 6400
Connecting to ws://audio-asr.api.zyuncs.com/sauc ...
logid: 1234567890
send message: seq: 2
... (intermediate JSON fragments) ...
final result: "华为致力于把数字世界带入每个人、每个家庭、每个组织,构建万物互联的智能世界" (Huawei is committed to bringing the digital world to every person, home, and organization for a fully connected, intelligent world.)
Conclusion
The large‑model streaming speech recognition API provides an optimized bidirectional streaming solution that delivers higher accuracy and contextual awareness for real‑time voice applications. By obtaining an API key, establishing a WebSocket connection, sending a full client request followed by audio‑only packets, and parsing the server’s JSON response, developers can integrate high‑quality speech‑to‑text capabilities into their services.
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.