
How Hisense Juhau Revamped Its Big Data Platform for Real‑Time Intelligence

Hisense Juhau, an AI‑enabled TV cloud service, overhauled its massive offline‑centric data platform by adopting a real‑time data lake, compute‑storage separation, and serverless Spark/StarRocks on Alibaba Cloud. The new architecture delivers sub‑5‑minute data freshness, elastic scaling, and dramatically better performance for personalized content recommendation and smart operations.


Company Overview

Hisense Juhau (聚好看) is an internet TV cloud service provider under Hisense Group, serving over 120 million households with AI‑enabled smart TV experiences.

Challenges

The rapid shift to fine‑grained, personalized services demanded near‑real‑time data insight, dynamic user profiles, and minute‑level operational feedback loops. This exposed the limitations of the traditional batch‑centric Lambda architecture: long data pipelines, tightly coupled compute and storage, high expansion costs, and slow data‑to‑lake ingestion.

Architecture Upgrade

In partnership with Alibaba Cloud, Juhau rebuilt its platform using the full open‑source stack (EMR on ECS, Serverless Spark, Serverless StarRocks) and adopted Apache Paimon as the unified lake format, achieving:

Real‑time data lake

Compute‑storage separation

Serverless compute model

Continuous performance optimization

Real‑time Data Lake Solution

By introducing Paimon with Serverless Spark, Juhau enabled stream‑batch unified ingestion, reducing data‑to‑lake latency from hours to under five minutes and supporting billion‑scale device data with minute‑level freshness.
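As a rough illustration, stream‑batch unified ingestion hinges on lake tables that accept both streaming upserts and batch writes. The sketch below, which assumes a Paimon catalog is already registered in Spark and uses purely illustrative table and column names, shows the kind of primary‑key table definition this pattern relies on:

```sql
-- Illustrative Paimon table for device events. A primary-key table
-- accepts streaming upserts and batch inserts alike, so the same table
-- serves both pipelines; frequent commits keep freshness in the
-- minute range.
CREATE TABLE ods.device_events (
    device_id  STRING,
    event_type STRING,
    event_time TIMESTAMP,
    payload    STRING,
    dt         STRING
) PARTITIONED BY (dt)
TBLPROPERTIES (
    'primary-key' = 'dt,device_id,event_time'
);
```

With a table like this, a streaming job and a batch backfill job can write to the same partition without a separate speed layer, which is what removes the Lambda‑style duplication described above.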

Real-time data lake architecture diagram

Compute‑Storage Separation

Data was migrated to Alibaba Cloud OSS, decoupling storage from compute. EMR Serverless Spark and StarRocks clusters access it over a high‑speed internal network, enabling elastic scaling, relieving NameNode pressure, and allowing multiple engines to share the same data.
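In configuration terms, pointing the lake warehouse at object storage rather than HDFS is what makes compute clusters stateless. A minimal spark‑defaults‑style fragment, with an illustrative bucket name and assuming the open‑source Paimon Spark catalog, might look like:

```
# Point the Paimon catalog at an OSS bucket instead of HDFS, so compute
# clusters hold no persistent state and any engine reading the same
# warehouse path sees the same tables.
spark.sql.catalog.paimon            org.apache.paimon.spark.SparkCatalog
spark.sql.catalog.paimon.warehouse  oss://example-bucket/warehouse
spark.sql.defaultCatalog            paimon
```

Because the warehouse path is the only shared state, Spark and StarRocks clusters can be resized or recreated independently without data migration.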

Compute-storage separation diagram

Serverless Compute

Serverless Spark and StarRocks provide on‑demand, second‑level elastic compute. Jobs can scale to thousands of vCores within a minute, aligning resources with business‑driven SLAs and cutting TCO by over 30%.
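The elasticity described above maps onto the familiar dynamic‑allocation model in open‑source Spark; the serverless services manage the underlying capacity, but the job‑level knobs are conceptually similar. The values below are illustrative, not Juhau's actual settings:

```
# Illustrative burst-workload settings: the job grows toward its vCore
# ceiling under load and shrinks back when idle, so resources track
# demand rather than peak provisioning.
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.minExecutors  2
spark.dynamicAllocation.maxExecutors  500
spark.executor.cores                  4
# Peak here is 500 executors x 4 cores = 2000 vCores; per-job ceilings
# like this are how SLA tiers translate into resource budgets.
```

Tying ceilings like `maxExecutors` to per‑pipeline SLAs, rather than sizing a shared static cluster for the worst case, is what drives the TCO reduction.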

Serverless Spark elasticity

Performance Optimizations

Leveraging EMR Serverless Spark, Juhau integrated the Fusion Engine (a vectorized Spark runtime), the Celeborn remote shuffle service, and Apache Paimon‑based small‑file compaction, delivering up to 5× faster query execution, a 30% overall performance gain, and a >90% reduction in small files.
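Of these optimizations, the small‑file reduction is the most directly visible at the SQL layer: frequent streaming commits leave many small files behind, and Paimon exposes a compaction procedure to merge them. A sketch, using the same illustrative table name as above:

```sql
-- Merge the small files left by frequent streaming commits into larger
-- ones (Apache Paimon's compaction procedure as exposed in Spark SQL;
-- the table name is illustrative). Fewer, larger files mean fewer
-- metadata entries and faster scans for downstream queries.
CALL sys.compact(table => 'ods.device_events');
```

Running compaction as a scheduled maintenance job, separate from the ingestion pipeline, keeps write latency low while still bounding file counts.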

Outcome

The upgraded platform delivers sub‑5‑minute data freshness, elastic resource provisioning, and higher stability, and lays the groundwork for future AI model training and cross‑scene smart services.

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
