
Video Streaming Solution for the ARC Car Cloud Control Platform

The ARC Car Cloud Control platform now streams the vehicle’s screen using Android’s Virtual Display and a C++‑based H.264 hardware encoder, sending raw video over a TCP socket to a server that adaptively adjusts bitrate and frame rate, while the web client decodes the fragmented MP4 via MSE, dramatically lowering CPU usage and latency on low‑end head‑units.

Amap Tech

Background

ARC (Gaode Car Cloud Control Platform) is a cloud‑control platform deeply customized for in‑vehicle devices. It enables remote usage of various car devices. To let remote users operate the devices as if they were local, the device screen must be transmitted back in real time, making screen transmission a core component of ARC.

Initially we used the widely adopted open‑source screen transmission solution minicap. It captures the screen, compresses each frame into a JPEG image, and sends the frames to the web client. However, car head‑units have far lower performance than smartphones; image compression can consume up to 80% of the CPU on low‑end devices, causing noticeable lag. Moreover, JPEG does not achieve high compression ratios, so bandwidth consumption is large and latency becomes excessive on low‑bandwidth networks.

Therefore we needed a solution that balances screen quality with CPU usage on the vehicle side. This article summarizes the video‑streaming approach adopted for the cloud‑control platform.

Approach

Transmitting raw images without compression would avoid CPU load on the device, but USB bandwidth on the car head‑unit cannot sustain uncompressed HD frames – only about three frames per second are possible.
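The arithmetic behind that limit can be sketched as follows, assuming a 1920×1080 RGBA frame and roughly 30 MB/s of effective USB 2.0 throughput (both figures are illustrative assumptions, not from the article):

```java
public class UsbBandwidthEstimate {

    // Maximum sustainable frame rate for uncompressed frames over a link.
    public static double maxUncompressedFps(int width, int height,
                                            int bytesPerPixel,
                                            double linkBytesPerSecond) {
        double bytesPerFrame = (double) width * height * bytesPerPixel;
        return linkBytesPerSecond / bytesPerFrame;
    }

    public static void main(String[] args) {
        // 1920x1080 RGBA ≈ 8.3 MB per frame; ~30 MB/s effective USB 2.0 throughput.
        double fps = maxUncompressedFps(1920, 1080, 4, 30_000_000.0);
        System.out.printf("max uncompressed fps ≈ %.1f%n", fps);  // ≈ 3.6
    }
}
```

At roughly 3–4 frames per second, uncompressed transfer is clearly a non-starter for interactive use.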

Another idea was to use the device’s hardware encoder to reduce CPU consumption. Since Android 4.1, most devices have included an H.264 hardware encoder, so we adopted a video‑streaming solution: the device encodes the screen into an H.264 stream, which the server forwards to the web client for decoding and display.

Implementation

The solution consists of three parts:

Device side: captures the screen and performs encoding.

Server side: handles video‑stream transmission and control.

Web side: decodes and displays the video stream.

Screen Capture and Encoding

Screen capture uses Android’s Virtual Display. Several encoding methods exist. The Java‑based approach only supports Android 5.0+, but a large share of in‑vehicle devices still run Android 4.x, so we implemented a C++ solution compatible with Android 4.3 and above.

Video Stream Transmission and Control

Common live‑streaming solutions (RTMP, HLS, flv.js) introduce 1‑3 seconds of latency, which is unacceptable for interactive cloud‑control scenarios that require millisecond‑level delay. Consequently we chose to transmit raw H.264 over a TCP socket, allowing the web client to play the stream directly.
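The article does not specify the on‑wire format for the raw stream; one common choice, sketched here as an assumption, is to prefix each encoded access unit with a 4‑byte length so the receiver can split the TCP byte stream back into frames:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Minimal length-prefixed framing for H.264 access units over a byte stream.
// DataOutputStream/DataInputStream work identically over a TCP Socket's streams.
public class H264Framing {

    // Writes one encoded frame: a 4-byte big-endian length, then the payload.
    public static void writeFrame(DataOutputStream out, byte[] accessUnit) throws IOException {
        out.writeInt(accessUnit.length);
        out.write(accessUnit);
        out.flush();
    }

    // Reads one frame; readFully blocks until the whole payload has arrived,
    // so partial TCP reads are handled transparently.
    public static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Round-trip through an in-memory buffer standing in for a socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        byte[] frame = {0, 0, 0, 1, 0x65};  // Annex-B start code + IDR NAL header
        writeFrame(new DataOutputStream(wire), frame);
        byte[] received = readFrame(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        System.out.println(Arrays.equals(frame, received));  // true
    }
}
```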

To improve user experience, the server adds adaptive control: a buffer queue monitors front‑end bandwidth and automatically adjusts frame rate and bitrate to maintain smooth playback.
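The article does not detail the control algorithm; one plausible sketch keys frame rate and bitrate to the depth of the outgoing buffer queue. The thresholds and step sizes below are made-up illustrative values:

```java
// Hypothetical adaptive controller: when the send queue backs up (the client
// can't keep up), step frame rate and bitrate down; when it drains, probe back up.
public class AdaptiveController {
    private int fps = 30;            // current target frame rate
    private int bitrateKbps = 2000;  // current target bitrate

    // Called periodically with the number of frames waiting in the send queue.
    public void onQueueDepth(int queuedFrames) {
        if (queuedFrames > 10) {          // backlog: halve both targets
            fps = Math.max(5, fps / 2);
            bitrateKbps = Math.max(250, bitrateKbps / 2);
        } else if (queuedFrames == 0) {   // drained: step gently upward
            fps = Math.min(30, fps + 5);
            bitrateKbps = Math.min(2000, bitrateKbps + 250);
        }
    }

    public int fps() { return fps; }
    public int bitrateKbps() { return bitrateKbps; }
}
```

Halving on congestion while recovering additively is the same conservative shape TCP uses for its own congestion control, which is why it tends to converge rather than oscillate.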

Web‑Side Presentation and Decoding

The web client uses Media Source Extensions (MSE) with fragmented MP4. The raw H.264 stream is packaged into fragmented MP4 (fMP4) segments and fed to MSE for decoding and playback, following the open‑source Jmuxer implementation.

Frame Dropping and Frame Insertion

Android Virtual Display can produce up to 60 fps, but 30 fps is sufficient for perceived smoothness. To save bandwidth we cap the output at 30 fps and lower it further under poor network conditions. Because MediaCodec treats the configured frame rate only as a hint rather than a hard limit, we implement frame dropping ourselves to regulate the output rate.
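The dropping decision itself reduces to timestamp spacing: render a frame only if enough time has passed since the last rendered one. A minimal sketch (timestamps passed in explicitly so the logic is deterministic and testable; in the real pipeline the "render" would be drawing the texture onto the encoder's input surface):

```java
// Timestamp-based frame dropper: a frame is kept only if at least one
// target-frame-interval has elapsed since the last kept frame.
public class FrameDropper {
    private final long minIntervalMs;          // e.g. 1000 / 30 for a 30 fps cap
    private long lastRenderMs = -1_000_000L;   // far in the past: first frame always renders

    public FrameDropper(int targetFps) {
        this.minIntervalMs = 1000L / targetFps;
    }

    // Returns true if the frame arriving at nowMs should be rendered.
    public boolean shouldRender(long nowMs) {
        if (nowMs - lastRenderMs >= minIntervalMs) {
            lastRenderMs = nowMs;
            return true;
        }
        return false;  // drop this frame
    }

    public static void main(String[] args) {
        // Feed 60 frames at ~60 fps; roughly every other frame survives a 30 fps cap.
        FrameDropper dropper = new FrameDropper(30);
        int rendered = 0;
        for (int i = 0; i < 60; i++) {
            if (dropper.shouldRender(i * 1000L / 60)) rendered++;
        }
        System.out.println(rendered);  // 30
    }
}
```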

Hardware decoders on Windows 7 lack a low‑latency mode and buffer about 10 frames before playback starts. Because Virtual Display only generates frames when the screen content changes, playback would stall on a static screen, so we also implement frame insertion (repeating the last frame).

We create an EGLSurface to handle dropping and inserting frames: frame dropping is achieved by controlling the time interval for drawing textures onto the EGLSurface, while frame insertion repeats the last frame data.
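The insertion side can be sketched as a timer that re‑submits the previous frame whenever the capture source has gone quiet for longer than one frame interval. This is an illustrative model of the logic, not the actual EGL code; time and frame data are passed explicitly for testability:

```java
// Frame inserter: if no fresh frame has arrived within one frame interval,
// the previous frame is re-submitted so the buffering decoder does not stall.
public class FrameInserter {
    private final long intervalMs;
    private byte[] lastFrame;   // last frame actually sent downstream
    private long lastSentMs;

    public FrameInserter(int fps) {
        this.intervalMs = 1000L / fps;
    }

    // Called when the capture pipeline produces a fresh frame.
    public byte[] onNewFrame(byte[] frame, long nowMs) {
        lastFrame = frame;
        lastSentMs = nowMs;
        return frame;
    }

    // Called on a periodic timer tick; returns the repeated frame, or null
    // if a frame was sent recently (or nothing has been captured yet).
    public byte[] onTick(long nowMs) {
        if (lastFrame != null && nowMs - lastSentMs >= intervalMs) {
            lastSentMs = nowMs;
            return lastFrame;  // re-submit the previous frame
        }
        return null;
    }
}
```

In the real pipeline the re‑submission is the EGLSurface redraw of the last texture described above; the decision of *when* to redraw follows this shape.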

Conclusion

The solution deployed on the ARC platform improves transmission quality while significantly reducing CPU load on the vehicle side, resulting in smoother user interaction. The approach can be applied to other cloud‑control platforms. If Android 4.x support is not required, a Java‑level API can be used to obtain video data, further reducing development and adaptation effort.

Android · H.264 · Video Streaming · Low Latency · Cloud Control · Media Source Extensions
Written by Amap Tech

Official Amap technology account showcasing all of Amap's technical innovations.