
Real-Time Monitoring Dashboard Solution in Future Cloud – Architecture, Technical Challenges, and Product Insights

This article presents the Future Cloud Business Monitoring real-time dashboard solution, detailing its technical architecture, key challenges in massive log processing, storage choices, product considerations, experience sharing, future plans, and concrete case studies such as live classroom monitoring.

TAL Education Technology

Background: Real-time monitoring dashboards are classic big‑data applications widely used by internet companies (e.g., Taobao Double‑11 sales screen, Didi user distribution). This article introduces the real‑time monitoring dashboard solution offered by Future Cloud – Business Monitoring.

Key Points: The construction of a real‑time dashboard involves both technical and product dimensions, each with specific challenges and considerations.

Technical Challenges:
• Computing metrics at one-second granularity over massive log volumes;
• Keeping computation stable through tidal traffic spikes;
• Choosing appropriate storage for each metric type.

Product Challenges:
• Identifying which business scenarios require a real-time dashboard;
• Determining which metrics should be displayed;
• Defining the overall value of a real-time dashboard.

Technical Architecture:

1. Source Layer: Includes client logs (user events, network requests, crashes), trace logs (DCDN, origin, server trace), business logs (key business events), and structured business data (users, courses, orders).

2. Transport Layer: Relies on Future Cloud Log Center’s collection channel to unify ingestion and registration of source data, adhering to metadata standards.

3. Processing Layer: Utilizes distributed real‑time computation engines: Flink for low‑latency stream processing (on K8s or Yarn) and Spark for micro‑batch processing (on Yarn).

4. Storage Layer: Supports high‑traffic OLAP scenarios with two components: ClickHouse (column‑store, distributed, SQL‑enabled, high write throughput) and Redis (in‑memory, atomic operations, used as cache and for massive real‑time metric queries).

5. Presentation Layer: Backend services assemble data and expose it to the frontend via HTTP (minute‑level refresh) or WebSocket (sub‑second push for sales, online users, alerts).
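To make the Processing Layer's per-second metric calculation concrete, here is a minimal sketch in plain Python rather than Flink or Spark. It groups log events into one-second tumbling windows by event time and derives a request count and an error rate for each window. The event fields `ts` and `status`, and the rule that a status of 500 or above counts as an error, are assumptions made for illustration; a production job would rely on the engine's own windowing and late-data handling.

```python
from collections import defaultdict

def per_second_metrics(events):
    """Aggregate events into one-second tumbling windows.

    Each event is a dict with hypothetical fields:
      ts     - event time in epoch seconds (non-negative float)
      status - HTTP-style status code (int)
    Returns {window_start_second: {"qps": int, "error_rate": float}}.
    """
    counts = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        window = int(event["ts"])  # truncate to the containing second
        counts[window] += 1
        if event["status"] >= 500:  # assumed error definition
            errors[window] += 1
    return {
        w: {"qps": counts[w], "error_rate": errors[w] / counts[w]}
        for w in sorted(counts)
    }

sample = [
    {"ts": 100.1, "status": 200},
    {"ts": 100.7, "status": 500},
    {"ts": 101.2, "status": 200},
]
result = per_second_metrics(sample)
```

Here `result[100]` holds two requests with an error rate of 0.5, and `result[101]` holds one request with no errors.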

Experience Sharing – Stability under Tidal Traffic:

Stress Testing: Replay real peak-period logs so that the test reproduces production data skew, and evaluate QPS, record size, and field diversity. Test both off-peak and peak periods, each with a minimal resource allocation.
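The quantities named above can be measured from a replayed log sample before any load is applied. The sketch below assumes the logs are JSON lines with a numeric `ts` field and a partitioning key such as `room`; both field names are hypothetical. It reports peak QPS, average record size, the number of distinct fields, and the share of traffic going to the hottest key as a rough skew indicator.

```python
import json
from collections import Counter

def profile_logs(lines, key_field):
    """Summarize a non-empty list of JSON log lines for a stress test.

    Assumes each line is a JSON object containing a numeric `ts`
    (epoch seconds) and the field named by `key_field`.
    """
    per_second = Counter()
    per_key = Counter()
    fields = set()
    total_bytes = 0
    for line in lines:
        record = json.loads(line)
        per_second[int(record["ts"])] += 1
        per_key[record[key_field]] += 1
        fields.update(record.keys())
        total_bytes += len(line.encode("utf-8"))
    n = len(lines)
    hottest_key, hottest_count = per_key.most_common(1)[0]
    return {
        "peak_qps": max(per_second.values()),
        "avg_record_bytes": total_bytes / n,
        "distinct_fields": len(fields),
        "hottest_key": hottest_key,
        "hottest_key_share": hottest_count / n,
    }

lines = [
    '{"ts": 10.0, "room": "a", "event": "join"}',
    '{"ts": 10.5, "room": "a", "event": "msg", "size": 12}',
    '{"ts": 11.2, "room": "b", "event": "join"}',
    '{"ts": 10.9, "room": "a", "event": "leave"}',
]
profile = profile_logs(lines, key_field="room")
```

A high `hottest_key_share` suggests that a single key (one very large classroom, for example) will dominate one partition, which is the kind of skew a synthetic uniform load would hide.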

Task Scheduling: Minimize network overhead by keeping data flow within the same data center and avoiding cross-region transfers, which can cause bandwidth contention.

Resource Scaling: Leverage Spark on Yarn’s dynamic allocation and Flink on K8s native mode for automatic scaling, while retaining a buffer for unexpected spikes.
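One way to size the buffer is to derive an upper bound on executors from the measured peak rate. The following is a back-of-the-envelope sketch, not the team's actual formula: the function name, the per-executor throughput figure, and the 25% buffer are all assumptions. The result could serve as an upper limit for an autoscaler, for instance Spark's `spark.dynamicAllocation.maxExecutors` setting.

```python
import math

def executors_needed(peak_events_per_sec, events_per_executor_per_sec,
                     buffer_ratio=0.3):
    """Executors required to absorb the peak rate plus a safety buffer."""
    if events_per_executor_per_sec <= 0:
        raise ValueError("per-executor throughput must be positive")
    base = peak_events_per_sec / events_per_executor_per_sec
    # Round up: a fractional executor still needs a whole one.
    return math.ceil(base * (1 + buffer_ratio))

# Assumed figures: 200,000 events/s at peak, 25,000 events/s per executor.
max_executors = executors_needed(200_000, 25_000, buffer_ratio=0.25)
```

With these assumed figures the base requirement is 8 executors and the buffered bound is 10. The per-executor throughput should come from the stress test rather than from a guess.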

Storage Selection for Different Metrics:

Second-level metrics (refreshed every second): Use Redis for simple, high-frequency reads (e.g., QPS, error rate); use ClickHouse for complex queries requiring joins, aggregations, or time-series analysis.
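The Redis pattern for simple per-second counters can be sketched without a Redis server. The class below is an in-memory stand-in, not a Redis client: its `incr` method combines what would be a Redis `INCR` followed by an `EXPIRE`, and the key scheme `metric:epoch_second` is a hypothetical naming choice. Timestamps are passed in explicitly so the behavior is deterministic.

```python
import threading

class InMemoryCounterStore:
    """Stand-in for a Redis-like store: atomic increment plus key expiry."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, expires_at)

    def incr(self, key, now, ttl_seconds):
        """Atomically add 1 to key, reset its expiry, return the new value."""
        with self._lock:
            value, expires_at = self._data.get(key, (0, None))
            if expires_at is not None and now >= expires_at:
                value = 0  # the previous value has expired
            value += 1
            self._data[key] = (value, now + ttl_seconds)
            return value

    def get(self, key, now):
        """Return the current value, or 0 if the key is missing or expired."""
        with self._lock:
            value, expires_at = self._data.get(key, (0, None))
            if expires_at is not None and now >= expires_at:
                return 0
            return value

def second_key(metric, ts):
    """Hypothetical key scheme: one counter per metric per second."""
    return f"{metric}:{int(ts)}"

store = InMemoryCounterStore()
for ts in (100.1, 100.4, 101.0):
    store.incr(second_key("orders", ts), now=ts, ttl_seconds=60)

latest = store.get(second_key("orders", 100), now=101.5)
```

A dashboard backend reading the counter for second 100 gets 2 here. The expiry keeps old per-second keys from accumulating, which matters when every metric creates a new key each second.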

Minute‑level and above: Store in ClickHouse; ingest from MySQL via binlog for incremental updates when MySQL load is low, otherwise pull from read replicas.
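The selection rules above can be written down as a small routing function. This is only a restatement of the article's guidance in code; the function name and the string labels are illustrative.

```python
def choose_store(granularity, needs_complex_query):
    """Pick a store for a metric, following the rules described above.

    granularity: "second" or "minute" ("minute" covers minute-level
    and coarser metrics)
    needs_complex_query: True if the query needs joins, aggregations,
    or time-series analysis
    """
    if granularity not in ("second", "minute"):
        raise ValueError(f"unknown granularity: {granularity!r}")
    if granularity == "second" and not needs_complex_query:
        return "redis"
    return "clickhouse"

qps_store = choose_store("second", needs_complex_query=False)
funnel_store = choose_store("second", needs_complex_query=True)
daily_store = choose_store("minute", needs_complex_query=False)
```

Under these rules a per-second QPS counter goes to Redis, while a per-second metric that needs a join, and any minute-level metric, goes to ClickHouse.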

Product Thinking:

Which businesses need a real‑time dashboard? Scenarios with strict timeliness such as live classrooms or enrollment bursts, where rapid issue detection directly impacts user experience.

Which metrics to display? Business‑level KPIs (orders, online users, messages) and technical KPIs (interface QPS, system capacity, SLA).

How to define dashboard value? Measure how many users are actively on the dashboard during critical periods, and how well it helps them discover problems promptly through alerts and early warnings.

Future Planning: Provide a self‑service capability allowing users to configure dashboards via drag‑and‑drop, and expand templates to cover more business scenarios, thereby reducing cost and improving efficiency.

Business Cases:

1. Live Classroom – Monitoring modules for live stream, messaging, and interaction; health maps visualize end‑to‑end latency, attendance, and complaint metrics.

2. Teaching Business (Enrollment) – Displays nationwide school schedules, live‑class attendance, streaming quality, real‑time payment peaks, success rates, and enrollment participation counts.

Future Cloud – Business Monitoring invites experts in real-time data computation, OLAP, and data products to join the team and support the businesses of TAL (Good Future). Interested candidates can send their resumes to [email protected] .
