
Using OpenSL ES for Low‑Latency Audio Capture, Transmission, and Playback on Android

The article describes using the OpenSL ES API via the NDK to achieve low-latency audio capture, UDP transmission, and playback on Android. It covers engine, recorder, and player initialization, the buffer‑queue mechanism, and thread handling, with the goal of keeping latency under 20 ms.

Tencent Music Tech Team

Background

OpenSL ES is a hardware‑accelerated audio API optimized for embedded systems. It is royalty‑free, cross‑platform, and provides high‑performance, standardized, low‑latency audio processing, making native audio development on Android simpler and more efficient.

Framework Overview

The hardware and software implementations are shown in the original diagrams. Typical Android audio paths exhibit round‑trip latencies (RTL) of up to 300 ms, which is unacceptable for real‑time applications. Low‑latency targets are <100 ms, ideally <20 ms.
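
To relate these targets to buffer sizes, the delay contributed by queued buffers can be estimated from the frame count and the sample rate. The following is a minimal sketch; the function names and the 240-frame / 48 kHz figures in the usage note are illustrative assumptions, not values taken from the article.

```cpp
#include <cstdint>

// Duration of one audio buffer in milliseconds.
// framesPerBuf and sampleRateHz are illustrative parameter names.
constexpr double bufferDurationMs(std::uint32_t framesPerBuf, std::uint32_t sampleRateHz) {
    return 1000.0 * framesPerBuf / sampleRateHz;
}

// Buffering delay when bufCount buffers of that size are queued ahead of the device.
// This covers only the queued buffers, not driver, hardware, or network delay.
constexpr double queuedLatencyMs(std::uint32_t framesPerBuf, std::uint32_t sampleRateHz,
                                 std::uint32_t bufCount) {
    return bufferDurationMs(framesPerBuf, sampleRateHz) * bufCount;
}
```

For example, a 240-frame buffer at 48000 Hz lasts 5 ms, so four such buffers queued in a row already account for 20 ms before any other delay is added.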

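OpenSL ES invokes a callback only to signal that a buffer has been consumed or filled and that a new one can be queued. The callback runs on an internal audio thread rather than an application thread, so it should do as little work as possible and return quickly.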

Why Use OpenSL ES

Its buffer‑queue mechanism is more efficient within the Android media framework.

Before Android 8.0 introduced AAudio, the low‑latency audio path was available only through the OpenSL ES API in the Android NDK.

Being native code, it avoids Java/Dalvik overhead and delivers higher performance.

Key Interfaces

SLObjectItf – object interface

SLEngineItf – engine interface

SLPlayItf – playback interface

SLBufferQueueItf – buffer‑queue interface

SLVolumeItf – volume interface

Initialization

Engine Initialization

The engine object is created, realized, and its interfaces are obtained. Sample parameters (sample rate, frames per buffer, channels, bits per sample) are configured, and buffer queues for free and recorded buffers are allocated.

SLresult result;
memset(&engine, 0, sizeof(engine));
// Stream parameters; OpenSL ES expresses sample rates in milliHertz
engine.fastPathSampleRate_   = static_cast<SLmilliHertz>(sampleRate) * 1000;
engine.fastPathFramesPerBuf_ = static_cast<uint32_t>(framesPerBuf);
engine.sampleChannels_      = AUDIO_SAMPLE_CHANNELS;
engine.bitsPerSample_       = SL_PCMSAMPLEFORMAT_FIXED_16;
// Create the engine object, realize it synchronously, then fetch the engine interface
result = slCreateEngine(&engine.slEngineObj_, 0, NULL, 0, NULL, NULL);
SLASSERT(result);
result = (*engine.slEngineObj_)->Realize(engine.slEngineObj_, SL_BOOLEAN_FALSE);
SLASSERT(result);
result = (*engine.slEngineObj_)->GetInterface(engine.slEngineObj_, SL_IID_ENGINE, &engine.slEngineItf_);
SLASSERT(result);
// Buffer size: frames × channels × bits per sample, rounded up to whole bytes
uint32_t bufSize = engine.fastPathFramesPerBuf_ * engine.sampleChannels_ * engine.bitsPerSample_;
bufSize = (bufSize + 7) >> 3; // bits → bytes
// Allocate the sample buffers and place all of them on the free queue
engine.bufCount_ = BUF_COUNT;
engine.bufs_ = allocateSampleBufs(engine.bufCount_, bufSize);
engine.freeBufQueue_ = new AudioQueue(engine.bufCount_);
engine.recBufQueue_  = new AudioQueue(engine.bufCount_);
for(uint32_t i=0; i<engine.bufCount_; i++) {
    engine.freeBufQueue_->push(&engine.bufs_[i]);
}
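
The two unit conversions in the initialization code (Hz to milliHertz, and bits to bytes with rounding up) can be checked in isolation. This is a small sketch with hypothetical helper names (`toMilliHertz`, `bufferSizeBytes`) that are not part of the article's code.

```cpp
#include <cstdint>

// OpenSL ES expresses sample rates in milliHertz, so 48000 Hz becomes 48000000.
constexpr std::uint32_t toMilliHertz(std::uint32_t sampleRateHz) {
    return sampleRateHz * 1000u;
}

// Bytes needed for one buffer: frames × channels × bits per sample,
// rounded up to the next whole byte.
constexpr std::uint32_t bufferSizeBytes(std::uint32_t framesPerBuf,
                                        std::uint32_t channels,
                                        std::uint32_t bitsPerSample) {
    return (framesPerBuf * channels * bitsPerSample + 7u) >> 3;
}
```

With 16-bit stereo audio and 240 frames per buffer, `bufferSizeBytes(240, 2, 16)` yields 960 bytes. The rounding only matters for sample widths that are not a multiple of 8 bits.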

Recorder Initialization

The recorder sets up the audio source, data format, and buffer queue, then obtains the recording and buffer‑queue interfaces.

sampleInfo_ = *sampleFormat;
SLAndroidDataFormat_PCM_EX format_pcm;
ConvertToSLSampleFormat(&format_pcm, &sampleInfo_);
// Audio source: the default audio input device (microphone)
SLDataLocator_IODevice loc_dev = {SL_DATALOCATOR_IODEVICE, SL_IODEVICE_AUDIOINPUT, SL_DEFAULTDEVICEID_AUDIOINPUT, NULL};
SLDataSource audioSrc = {&loc_dev, NULL};
// Audio sink: a simple buffer queue that receives PCM data in the format above
SLDataLocator_AndroidSimpleBufferQueue loc_bq = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, DEVICE_SHADOW_BUFFER_QUEUE_LEN};
SLDataSink audioSnk = {&loc_bq, &format_pcm};
// Create the recorder and require the buffer-queue and configuration interfaces
const SLInterfaceID id[2] = {SL_IID_ANDROIDSIMPLEBUFFERQUEUE, SL_IID_ANDROIDCONFIGURATION};
const SLboolean req[2] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};
result = (*slEngine)->CreateAudioRecorder(slEngine, &recObjectItf_, &audioSrc, &audioSnk, 2, id, req);
SLASSERT(result);
// The recording preset must be set before the recorder is realized
SLAndroidConfigurationItf inputConfig;
result = (*recObjectItf_)->GetInterface(recObjectItf_, SL_IID_ANDROIDCONFIGURATION, &inputConfig);
if (SL_RESULT_SUCCESS == result) {
    SLuint32 presetValue = SL_ANDROID_RECORDING_PRESET_VOICE_RECOGNITION;
    (*inputConfig)->SetConfiguration(inputConfig, SL_ANDROID_KEY_RECORDING_PRESET, &presetValue, sizeof(SLuint32));
}
result = (*recObjectItf_)->Realize(recObjectItf_, SL_BOOLEAN_FALSE);
SLASSERT(result);
// Fetch the record and buffer-queue interfaces, then register the callback
result = (*recObjectItf_)->GetInterface(recObjectItf_, SL_IID_RECORD, &recItf_);
SLASSERT(result);
result = (*recObjectItf_)->GetInterface(recObjectItf_, SL_IID_ANDROIDSIMPLEBUFFERQUEUE, &recBufQueueItf_);
SLASSERT(result);
result = (*recBufQueueItf_)->RegisterCallback(recBufQueueItf_, bqRecorderCallback, this);
SLASSERT(result);
// Tracks the buffers currently owned by the device
devShadowQueue_ = new AudioQueue(DEVICE_SHADOW_BUFFER_QUEUE_LEN);

Player Initialization

The player adds an OutputMix object for audio output and obtains playback, volume, and buffer‑queue interfaces.

sampleInfo_ = *sampleFormat;
// Create and realize the output mix that the player renders into
result = (*slEngine)->CreateOutputMix(slEngine, &outputMixObjectItf_, 0, NULL, NULL);
SLASSERT(result);
result = (*outputMixObjectItf_)->Realize(outputMixObjectItf_, SL_BOOLEAN_FALSE);
SLASSERT(result);
// Audio source: a simple buffer queue carrying PCM data
SLDataLocator_AndroidSimpleBufferQueue loc_bufq = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, DEVICE_SHADOW_BUFFER_QUEUE_LEN};
SLAndroidDataFormat_PCM_EX format_pcm;
ConvertToSLSampleFormat(&format_pcm, &sampleInfo_);
SLDataSource audioSrc = {&loc_bufq, &format_pcm};
// Audio sink: the output mix
SLDataLocator_OutputMix loc_outmix = {SL_DATALOCATOR_OUTPUTMIX, outputMixObjectItf_};
SLDataSink audioSnk = {&loc_outmix, NULL};
// Create the player and require the buffer-queue and volume interfaces
SLInterfaceID ids[2] = {SL_IID_BUFFERQUEUE, SL_IID_VOLUME};
SLboolean req[2] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};
result = (*slEngine)->CreateAudioPlayer(slEngine, &playerObjectItf_, &audioSrc, &audioSnk, 2, ids, req);
SLASSERT(result);
result = (*playerObjectItf_)->Realize(playerObjectItf_, SL_BOOLEAN_FALSE);
SLASSERT(result);
// Fetch the play, volume, and buffer-queue interfaces, then register the callback
result = (*playerObjectItf_)->GetInterface(playerObjectItf_, SL_IID_PLAY, &playItf_);
SLASSERT(result);
result = (*playerObjectItf_)->GetInterface(playerObjectItf_, SL_IID_VOLUME, &volumeItf_);
SLASSERT(result);
result = (*playerObjectItf_)->GetInterface(playerObjectItf_, SL_IID_BUFFERQUEUE, &playBufferQueueItf_);
SLASSERT(result);
result = (*playBufferQueueItf_)->RegisterCallback(playBufferQueueItf_, bqPlayerCallback, this);
SLASSERT(result);

Audio Data Capture

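Each time the device fills a buffer, the recorder callback takes that buffer off the device shadow queue and sends it over UDP. It then moves buffers from the free queue to the recorder, recording each one in the shadow queue so it can be matched with the next callback.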

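// Body of the recorder buffer-queue callback: the device has filled the oldest queued buffer
sample_buf *dataBuf = NULL;
if (!devShadowQueue_->front(&dataBuf)) {
    return; // no buffer is outstanding, so there is nothing to send
}
devShadowQueue_->pop();
dataBuf->size_ = dataBuf->cap_; // the callback fires only once the buffer is full
sendUdpMessage(dataBuf);        // sendto() copies the payload, so the buffer can be reused
dataBuf->size_ = 0;
freeQueue_->push(dataBuf);      // recycle the buffer instead of allocating a new one per callback
// Hand free buffers back to the recorder until the shadow queue is full
sample_buf* freeBuf;
while (freeQueue_->front(&freeBuf) && devShadowQueue_->push(freeBuf)) {
    freeQueue_->pop();
    SLresult result = (*bq)->Enqueue(bq, freeBuf->buf_, freeBuf->cap_);
    SLASSERT(result);
}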
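
The article does not show the `AudioQueue` class behind the `push`, `front`, and `pop` calls used above. As an illustration only, a fixed-capacity queue with that interface could look like the following single-producer, single-consumer ring buffer. The class name `SpscQueue` and its details are assumptions, not the author's implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer that is safe for exactly one producer thread
// and one consumer thread. Hypothetical stand-in for the article's AudioQueue.
template <typename T>
class SpscQueue {
public:
    // One extra slot distinguishes "full" from "empty".
    explicit SpscQueue(std::size_t capacity)
        : slots_(capacity + 1), head_(0), tail_(0) {}

    // Producer side: returns false when the queue is full.
    bool push(const T& item) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = item;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: copies the oldest item to *out, returns false when empty.
    bool front(T* out) const {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        *out = slots_[head];
        return true;
    }

    // Consumer side: discards the oldest item; does nothing when empty.
    void pop() {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return;
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
    }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> head_;  // next slot to read
    std::atomic<std::size_t> tail_;  // next slot to write
};
```

Because `push` fails rather than blocks when the queue is full, a loop such as `while (free->front(&b) && shadow->push(b))` stops on its own once the device-side queue has no room left, which suits code running on an audio callback thread.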

Audio Data Transmission

Captured buffers are sent via UDP; the receiver places incoming packets into the playback buffer queue.

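// Sender: called from the recorder callback with a filled buffer
void sendUdpMessage(sample_buf *dataBuf){
    sendto(client_socket_fd, dataBuf->buf_, dataBuf->size_, 0,
           (struct sockaddr *)&server_addr, sizeof(server_addr));
}
// Receiver: runs once per incoming datagram
sample_buf *vien_buf = sampleBufs(BUF_SIZE);
ssize_t received = recvfrom(server_socket_fd, vien_buf->buf_, BUF_SIZE, 0, (struct sockaddr*)&client_addr, &client_addr_length);
if (received == -1) {
    exit(1);
}
vien_buf->size_ = received; // play only the bytes that actually arrived
if (getAudioPlayer() != NULL) {
    getRecBufQueue()->push(vien_buf);
    // Kick-start playback once a few buffers have been queued
    if (count_buf++ == 3) {
        getAudioPlayer()->PlayAudioBuffers(PLAY_KICKSTART_BUFFER_COUNT);
    }
}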
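
The send and receive calls can be tried without any audio hardware by sending one buffer to a socket on the loopback interface. The sketch below assumes a POSIX system; the function name `udpLoopbackRoundTrip`, the kernel-chosen port, and the one-second timeout are illustrative choices that do not appear in the article.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sends payload as one UDP datagram over loopback and returns what the
// receiving socket got. Returns an empty vector on any socket error.
std::vector<std::uint8_t> udpLoopbackRoundTrip(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> received;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0) {
        if (rx >= 0) close(rx);
        if (tx >= 0) close(tx);
        return received;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    timeval tv = {1, 0};  // receive timeout, so a lost datagram cannot block forever

    bool ok = bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
              getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &len) == 0 &&
              setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
    if (ok) {
        ssize_t sent = sendto(tx, payload.data(), payload.size(), 0,
                              reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        if (sent == static_cast<ssize_t>(payload.size())) {
            std::vector<std::uint8_t> buf(payload.size());
            ssize_t n = recvfrom(rx, buf.data(), buf.size(), 0, nullptr, nullptr);
            if (n > 0) {
                buf.resize(static_cast<std::size_t>(n));
                received = buf;
            }
        }
    }
    close(rx);
    close(tx);
    return received;
}
```

Each call to `sendto` produces one datagram, and `recvfrom` returns the length of the datagram it read, which is why the receiver should take the buffer size from that return value rather than assume a full buffer.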

Audio Playback

Playback pulls buffers from the play queue, moves them through a shadow queue, and enqueues them to the OpenSL ES buffer queue. If no data is available, an empty buffer is generated to keep the pipeline alive.

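// (1) Kick-start: body of the loop in PlayAudioBuffers(), run once per buffer
sample_buf *buf = NULL;
if(!playQueue_->front(&buf)) {
    // Nothing to play yet: notify the engine and stop
    uint32_t totalBufCount;
    callback_(ctx_, ENGINE_SERVICE_MSG_RETRIEVE_DUMP_BUFS, &totalBufCount);
    break;
}
if(!devShadowQueue_->push(buf)) {
    break; // Player buffer queue full
}
(*playBufferQueueItf_)->Enqueue(playBufferQueueItf_, buf->buf_, buf->size_);
playQueue_->pop();

// (2) Steady state: body of the player buffer-queue callback,
// invoked each time the device finishes playing a buffer
sample_buf *buf;
if(!devShadowQueue_->front(&buf)) {
    if(callback_) {
        uint32_t count;
        callback_(ctx_, ENGINE_SERVICE_MSG_RETRIEVE_DUMP_BUFS, &count);
    }
    return;
}
devShadowQueue_->pop();
buf->size_ = 0; // the buffer that just finished playing holds no valid data
if(playQueue_->front(&buf) && devShadowQueue_->push(buf)) {
    // Data is available: enqueue the next received buffer
    (*bq)->Enqueue(bq, buf->buf_, buf->size_);
    playQueue_->pop();
} else {
    // Underrun: enqueue a zero-filled buffer so the callback chain keeps running
    sample_buf *buf_temp = new sample_buf;
    buf_temp->buf_ = new uint8_t[BUF_SIZE](); // value-initialized, so it plays as silence
    buf_temp->size_ = BUF_SIZE;
    buf_temp->cap_ = BUF_SIZE;
    (*bq)->Enqueue(bq, buf_temp->buf_, BUF_SIZE);
    devShadowQueue_->push(buf_temp);
}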
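
The two playback ideas above, starting only after a few buffers have arrived and substituting a placeholder buffer on underrun, can be sketched without OpenSL ES. The class below is a hypothetical, single-threaded illustration: `PlaybackFeeder` and its members are not part of the article's code, and a real player would need a thread-safe queue between the network thread and the audio callback.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Feeds a player callback from a queue of received buffers and
// falls back to one preallocated silence buffer when the queue is empty.
class PlaybackFeeder {
public:
    PlaybackFeeder(std::size_t bufBytes, std::size_t kickstartCount)
        : silence_(bufBytes, 0), kickstartCount_(kickstartCount) {}

    // Receiver side: queue one buffer; playback may start once enough are queued.
    void onPacket(std::vector<std::uint8_t> pcm) {
        queue_.push_back(std::move(pcm));
        if (queue_.size() >= kickstartCount_) started_ = true;
    }

    bool started() const { return started_; }
    std::size_t underruns() const { return underruns_; }

    // Player-callback side: returns the buffer to enqueue next.
    // The reference stays valid until the next call to next().
    const std::vector<std::uint8_t>& next() {
        if (queue_.empty()) {
            ++underruns_;
            return silence_;  // the same zero-filled buffer is reused on every underrun
        }
        current_ = std::move(queue_.front());
        queue_.pop_front();
        return current_;
    }

private:
    std::vector<std::uint8_t> silence_;
    std::vector<std::uint8_t> current_;
    std::deque<std::vector<std::uint8_t>> queue_;
    std::size_t kickstartCount_;
    std::size_t underruns_ = 0;
    bool started_ = false;
};
```

Reusing one silence buffer avoids allocating memory each time the queue runs dry, and counting underruns gives a simple signal for tuning how many buffers to collect before playback starts.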

The presented workflow demonstrates how native OpenSL ES code can achieve low‑latency audio capture, network transmission, and playback on Android devices.
