
Building a System Observability Framework with YHP: Practices, Challenges, and Integrated Solutions

This article explains how YHP brings comprehensive observability to cloud‑native systems: it defines the three core signals (metrics, traces, and logs), examines common enterprise pain points, and presents an integrated platform that unifies data collection, storage, analysis, and visualization for efficient fault diagnosis and performance monitoring.

DataFunTalk

Observability has become a hot topic as cloud‑native microservice architectures increase system complexity, making fault diagnosis harder; a well‑observed system can boost production efficiency, product quality, and user satisfaction.

The article outlines four parts: (1) how to build an observability system, (2) typical enterprise pain points, (3) an introduction to Yanhuang Data and its products, and (4) YHP’s own observability practice.

It defines the three essential observability signals—Metrics, Traces, and Logs—explaining their roles in monitoring system health, locating issues, and recording events, and notes that additional signals such as dumps, profiles, and events can complement them.
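To make the three roles concrete, here is a minimal sketch of what each signal might look like for a single failing request. All field names are illustrative assumptions for this sketch, not YHP's actual schema:

```python
# Illustrative records for one hypothetical failed checkout request.
# Field names are assumptions for illustration only.

# Metric: an aggregated health signal ("how many errors, how fast?").
metric = {
    "name": "http_requests_total",
    "labels": {"service": "checkout", "status": "500"},
    "value": 1,
}

# Trace span: locates *where* time was spent and which call failed.
span = {
    "trace_id": "4bf92f35",
    "span_id": "00f067aa",
    "operation": "POST /checkout",
    "duration_ms": 412,
    "status": "ERROR",
}

# Log: records the concrete event with free-form detail.
log = {
    "trace_id": "4bf92f35",
    "level": "ERROR",
    "message": "payment gateway timed out",
}

# A shared trace_id is what later enables cross-signal correlation.
assert span["trace_id"] == log["trace_id"]
```

The metric tells you *that* something is wrong, the trace tells you *where*, and the log tells you *why*; the shared identifier is what ties the three views together.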

Enterprise challenges include high system complexity, data silos, and poor user experience, which stem from the difficulty of integrating multiple data types and tools.

The proposed solution is an integrated observability platform that unifies data collection (using the self‑developed DataScale collector), storage, and analysis, reducing complexity and providing a consistent user interface for querying, visualization, and alerting.

YHP’s architecture leverages cloud‑native microservices on Kubernetes, supports both single‑node and clustered deployments, and offers standard RESTful APIs for extensibility.

Data collection covers metrics (via Prometheus exporters), traces (via OpenTelemetry), and logs, with DataScale handling source configuration, metadata enrichment, and preprocessing before ingestion.
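The collector‑side metadata enrichment step can be sketched as follows. The `enrich` function and its metadata fields are hypothetical stand‑ins for illustration, not DataScale's real API:

```python
import json
from datetime import datetime, timezone

def enrich(raw_line: str, source: dict) -> dict:
    """Attach source metadata and a normalized timestamp to a raw log
    line before ingestion (hypothetical preprocessing step)."""
    return {
        "message": raw_line.strip(),
        "host": source.get("host"),
        "app": source.get("app"),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

record = enrich("ERROR payment gateway timeout\n",
                {"host": "node-1", "app": "checkout"})
print(json.dumps(record))
```

Doing this enrichment once at the collector, rather than in every downstream tool, is what keeps metadata consistent across metrics, traces, and logs.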

Collected data are stored in separate datasets for each signal to optimize query performance and lifecycle management, while still allowing cross‑signal correlation through shared metadata such as trace IDs.
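The idea of per‑signal datasets with independent lifecycle policies can be sketched like this; the retention numbers and structure are invented for the example:

```python
# Hypothetical per-signal datasets, each with its own retention window.
DATASETS = {
    "metrics": {"retention_days": 30, "rows": []},
    "traces":  {"retention_days": 7,  "rows": []},
    "logs":    {"retention_days": 14, "rows": []},
}

def ingest(signal: str, row: dict) -> None:
    DATASETS[signal]["rows"].append(row)

def expire(signal: str, now_day: int) -> None:
    """Drop rows older than this dataset's retention window."""
    ds = DATASETS[signal]
    ds["rows"] = [r for r in ds["rows"]
                  if now_day - r["day"] < ds["retention_days"]]

ingest("traces", {"day": 0, "trace_id": "abc"})
ingest("traces", {"day": 9, "trace_id": "def"})
expire("traces", now_day=10)  # the day-0 span falls outside the 7-day window
```

Separating the datasets lets each signal be tuned independently (traces are bulky and short‑lived, metrics compact and long‑lived) while shared metadata such as `trace_id` still allows cross‑signal joins.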

The platform provides unified SQL queries that can join traces and logs, visual dashboards, and alerting mechanisms, demonstrating the benefits of eliminating data silos.
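The cross‑signal join described above can be illustrated with plain SQL, here run through Python's built‑in `sqlite3`. The table and column names are invented for this sketch and do not reflect YHP's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE traces (trace_id TEXT, operation TEXT, duration_ms INTEGER);
    CREATE TABLE logs   (trace_id TEXT, level TEXT, message TEXT);
    INSERT INTO traces VALUES ('abc', 'POST /checkout', 412);
    INSERT INTO logs   VALUES ('abc', 'ERROR', 'payment gateway timeout');
""")

# Join slow traces with their error logs in one query -- no tool switching.
rows = con.execute("""
    SELECT t.operation, t.duration_ms, l.message
    FROM traces t
    JOIN logs l ON l.trace_id = t.trace_id
    WHERE t.duration_ms > 300 AND l.level = 'ERROR'
""").fetchall()
print(rows)  # [('POST /checkout', 412, 'payment gateway timeout')]
```

When traces and logs live in separate tools, answering "show me the error logs behind my slow requests" requires copying IDs between UIs; a single query surface collapses that into one step.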

Additional capabilities include dumps collection via shared Kubernetes PVs, Kubernetes health monitoring using Kuberhealthy with Prometheus exporters, and a unified visualization and alerting cockpit.

In conclusion, the integrated observability platform simplifies data pipelines, lowers operational costs, enhances user experience, and unlocks deeper insights from combined observability data.

Tags: cloud native · observability · metrics · data-platform · logs · traces
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
