Remote Audio Interaction for Car Devices: Challenges, Solutions, and Final Selection
To enable reliable two‑way audio for remote testing of car‑mounted navigation devices, the team evaluated remote‑submix, software hooking, USB, Bluetooth and hardware forwarding, ultimately selecting a hybrid software‑hook and hardware‑forwarding approach combined with WebRTC encoding to meet sub‑500 ms latency and broad device coverage.
With the growth of mobile internet, many platforms (e.g., Alibaba Cloud MQC, Testin, Baidu MTC, Tencent WeTest, Huawei, Samsung) provide cloud‑controlled device management. However, they all share a known limitation: they cannot handle audio playback or voice interaction on remote (cloud) devices.
Our product, Gaode Map Auto (car‑mounted navigation), relies heavily on audio scenarios such as navigation prompts, multimedia mixing, and voice‑assistant dialogs, which account for more than 25% of usage. Therefore, enabling two‑way audio in remote testing is a prerequisite for making cloud devices a daily production tool.
Challenges
Capability: support bidirectional audio for all car devices (which are more customized than phones).
Latency: keep transmission delay below 500 ms under typical network conditions.
Experience: avoid noticeable stutter or noise.
Audio acquisition and writing
We first examined the Android audio stack (application → libraries → HAL → driver → hardware). The MediaRecorder.AudioSource.REMOTE_SUBMIX API (added in API 19) can capture the system mix, but it requires the CAPTURE_AUDIO_OUTPUT permission, which only system components have. Third‑party apps would need system signing or OS modification, making this approach unsuitable for broad deployment.
Because most car devices are rooted (≈80%), we explored software hooking. By hooking the HAL functions (open_output_stream, close_output_stream, open_input_stream, close_input_stream) in system/lib/hw/audio.primary.*.so and the TinyALSA functions (pcm_open, pcm_close, pcm_write, pcm_read) in system/lib/libtinyalsa.so, we can intercept audio data before it reaches the driver.
However, hooking is costly: it requires root on each device and adaptation to different Android versions, and it may not reproduce the exact mixed sound because some car units apply DSP processing.
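The interception pattern described above can be illustrated in a few lines. This is a minimal, language-neutral sketch in Python, not the native hook itself: `original_pcm_write` is a hypothetical stand-in for the driver-facing write call, and the queue size is an assumption. The real hook would be native code patched into the shared libraries named above.

```python
from collections import deque
from typing import Callable, Deque


def original_pcm_write(data: bytes) -> int:
    """Hypothetical stand-in for the driver-facing write (e.g. pcm_write).

    It only reports how many bytes it accepted.
    """
    return len(data)


def make_hooked_write(original: Callable[[bytes], int],
                      captured: Deque[bytes]) -> Callable[[bytes], int]:
    """Return a replacement that copies each buffer, then calls the original."""
    def hooked(data: bytes) -> int:
        captured.append(bytes(data))  # copy kept for the remote listener
        return original(data)         # local playback path is unchanged
    return hooked


# Bounded queue (size is an assumption) so a stalled consumer cannot
# exhaust memory; the oldest buffers are dropped first.
captured_frames: Deque[bytes] = deque(maxlen=256)
pcm_write = make_hooked_write(original_pcm_write, captured_frames)

written = pcm_write(b"\x01\x02\x03\x04")
```

The key property is that the hooked function returns exactly what the original returns, so the caller cannot tell that a copy was taken.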
USB audio
Android supports three USB modes: Host (audio possible but no ADB), Development (ADB available but no native USB audio), and Accessory (both ADB and audio output, from Android 4.1 onward). Accessory mode is rarely supported by car head‑units, which limits this approach.
Bluetooth reception
Using a Bluetooth receiver as a pseudo‑headset yields bidirectional audio, but only 5 out of 35 tested car units (14%) support it, making it impractical for large‑scale deployment.
Hardware forwarding
We can tap the audio line that drives the car speaker, convert it to a USB‑compatible stream, and process it on a PC. This solution is cross‑platform (Android, Linux, QNX, iOS), eliminates the mixing‑reconstruction issue, and supports full duplex audio. Drawbacks include the need for custom wiring on some car mirrors, additional hardware cost, and the requirement to design a suitable cable harness.
Solution comparison
The table below summarizes the pros, cons, and coverage of each approach:
| Solution | Advantages | Disadvantages | Coverage |
| --- | --- | --- | --- |
| REMOTE_SUBMIX | Existing API; sync audio/video; no hardware cost | Requires system signature/root; no input channel; API 19+; cannot play locally while recording | Low |
| Software hook | No hardware cost; sync audio/video; works on all car OS versions | Root required; high adaptation cost; development difficulty; mixing artifacts | Theoretical 100% (actual 52% accurate mix) |
| USB audio | Existing tech; no hardware cost | Output only; works only in accessory mode; most cars lack support | Low |
| Bluetooth reception | Bidirectional | Requires USB hub & PC software; many cars restrict Bluetooth to slave mode | Low |
| Hardware forwarding | Cross‑platform; full duplex; no mixing issues | Some cars lack accessible wiring; custom harness needed; hardware cost | Pre‑install 100%; aftermarket 58% |
Considering the car‑specific constraints, we selected a combination of software hook and hardware forwarding as the final solution.
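One way to picture how the two selected approaches could complement each other is a per-device fallback rule. The sketch below is an illustration only: the `CarUnit` fields, the device names, and above all the priority order (hook first, hardware second) are assumptions, since the article does not state how a path is chosen for a given unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarUnit:
    """Hypothetical description of one device in the test lab."""
    name: str
    rooted: bool                  # the software hook needs root
    hook_mix_accurate: bool       # hooked PCM matches what the speaker plays
    audio_line_accessible: bool   # the speaker line can be tapped


def choose_capture_path(unit: CarUnit) -> str:
    """Pick a capture path; the priority order here is an assumption."""
    if unit.rooted and unit.hook_mix_accurate:
        return "software_hook"        # no extra hardware needed
    if unit.audio_line_accessible:
        return "hardware_forwarding"  # independent of OS and root
    return "unsupported"


units = [
    CarUnit("unit-a", rooted=True, hook_mix_accurate=True, audio_line_accessible=False),
    CarUnit("unit-b", rooted=True, hook_mix_accurate=False, audio_line_accessible=True),
    CarUnit("unit-c", rooted=False, hook_mix_accurate=False, audio_line_accessible=False),
]
paths = {u.name: choose_capture_path(u) for u in units}
```

A unit is left uncovered only when both paths fail, which is the motivation for combining them rather than picking one.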
Audio encoding & transmission
We evaluated several streaming protocols (RTMP, HTTP‑FLV, RTP, WebRTC). RTMP and HTTP‑FLV run over TCP, so they lose no packets, but they incur 2‑5 s of latency. RTP can achieve ~1 s latency but suffers from packet loss and lacks browser support. WebRTC with the Opus codec offers ~500 ms latency with acceptable packet loss, and it works across browsers when paired with a dedicated server.
Given our real‑time requirements, we chose the WebRTC solution.
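The selection can be checked against the 500 ms target from the Challenges section. The sketch below uses only the typical latencies quoted above; taking the lower end of the 2‑5 s range for RTMP and HTTP‑FLV, and treating "~500 ms" as meeting the budget, are assumptions made for illustration.

```python
LATENCY_BUDGET_MS = 500  # target from the Challenges section

# Typical latency per protocol in milliseconds, from the figures quoted above.
# For RTMP and HTTP-FLV the lower end of the 2-5 s range is used (assumption).
TYPICAL_LATENCY_MS = {
    "RTMP": 2000,
    "HTTP-FLV": 2000,
    "RTP": 1000,
    "WebRTC": 500,
}


def protocols_within_budget(latencies: dict, budget_ms: int) -> list:
    """Return the protocols whose typical latency fits the budget, sorted by name."""
    return sorted(name for name, ms in latencies.items() if ms <= budget_ms)


viable = protocols_within_budget(TYPICAL_LATENCY_MS, LATENCY_BUDGET_MS)
```

Even with the most favorable figure for each protocol, only WebRTC remains in `viable`.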
Final architecture
The final technical diagram (see image) integrates the audio capture (hook + hardware forwarding), encoding with WebRTC, and the video stream optimization path.
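Between the capture stage and the encoder, raw PCM usually has to be cut into fixed-size frames, because Opus encodes one short frame at a time. The sketch below shows that step under stated assumptions: 48 kHz, stereo, 16-bit samples, and 20 ms frames are common choices for Opus in WebRTC, but the article does not say which settings the team used, and the encoder itself is left out.

```python
from typing import List, Tuple

# Assumed capture format; the article does not specify these values.
SAMPLE_RATE_HZ = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2   # 16-bit PCM
FRAME_MS = 20          # a common Opus frame duration


def frame_size_bytes(rate_hz: int = SAMPLE_RATE_HZ,
                     channels: int = CHANNELS,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     frame_ms: int = FRAME_MS) -> int:
    """Bytes of interleaved PCM that make up one encoder frame."""
    samples_per_channel = rate_hz * frame_ms // 1000
    return samples_per_channel * channels * bytes_per_sample


def split_frames(pcm: bytes, frame_bytes: int) -> Tuple[List[bytes], bytes]:
    """Split a PCM buffer into whole frames plus the leftover tail.

    The tail should be prepended to the next captured buffer so that no
    samples are dropped between reads.
    """
    usable = len(pcm) - len(pcm) % frame_bytes
    frames = [pcm[i:i + frame_bytes] for i in range(0, usable, frame_bytes)]
    return frames, pcm[usable:]


size = frame_size_bytes()
frames, tail = split_frames(bytes(size * 2 + 100), size)
```

Shorter frames lower the buffering delay each frame adds but increase per-packet overhead, which is one of the trade-offs behind the latency target.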
In summary, a hybrid software‑hardware approach enables reliable two‑way audio for car devices and can be adapted to other targets, such as smartphones and other devices running Android, iOS, Linux, or QNX.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.