
Log Analysis and Schema‑On‑Read: Design and Implementation of the Honghu Real‑Time Heterogeneous Data Platform

This article examines the challenges and value of log analysis, introduces the concepts of schema‑on‑read versus schema‑on‑write, and details how the Honghu platform implements real‑time, one‑stop heterogeneous data analytics with flexible storage, indexing, and SQL‑based query engines.

DataFunTalk

The article begins by outlining the importance of log analysis for both consumer (e.g., e‑commerce recommendation) and enterprise scenarios (e.g., process mining, IT operations), and highlights technical challenges such as high ingestion rates, diverse and evolving log formats, and the need for both batch and ad‑hoc queries.

It then contrasts two technical approaches: Schema‑On‑Write (traditional ETL, which requires tables and schemas to be defined before data is loaded) and Schema‑On‑Read (also called ELT, where raw logs are stored as they arrive and schemas are derived at query time). The advantages of schema‑on‑read (flexibility, lower storage cost, and faster ingestion) are explained, along with its trade‑offs compared to schema‑on‑write.
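The contrast can be illustrated with a minimal sketch. It is not Honghu's implementation; the sample log lines, the `FIXED_COLUMNS` tuple, and all function names are hypothetical and chosen only to show where parsing happens in each approach.

```python
import json

# Hypothetical raw log lines; the second one carries an extra "user" field.
RAW_LOGS = [
    '{"ts": "2023-01-01T00:00:00", "level": "ERROR", "msg": "disk full"}',
    '{"ts": "2023-01-01T00:00:05", "level": "INFO", "msg": "ok", "user": "alice"}',
]

FIXED_COLUMNS = ("ts", "level", "msg")  # schema decided before ingestion


def ingest_schema_on_write(lines):
    """Parse at ingest time; fields outside the fixed schema are dropped."""
    return [tuple(json.loads(line).get(c) for c in FIXED_COLUMNS) for line in lines]


def ingest_schema_on_read(lines):
    """Store the raw text untouched; no parsing cost at ingest time."""
    return list(lines)


def query_schema_on_read(store, field, value):
    """Derive fields at query time, so unplanned fields remain queryable."""
    return [rec for rec in map(json.loads, store) if rec.get(field) == value]


table = ingest_schema_on_write(RAW_LOGS)   # "user" is no longer available
store = ingest_schema_on_read(RAW_LOGS)    # raw lines kept as-is
hits = query_schema_on_read(store, "user", "alice")
```

In this sketch the schema‑on‑write table answers queries on `ts`, `level`, and `msg` without reparsing, while the schema‑on‑read store pays the parsing cost on every query but can still filter on `user`, a field nobody planned for.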

The piece introduces the Honghu platform (by Yanhuang Data) as a free, one‑stop solution for heterogeneous real‑time data analysis. It describes Honghu’s storage model (schemaless event sets, timestamp indexing, inverted indexing, column‑oriented shards, and automatic handling of schema evolution) and its compute engine, which offers a schemaless SQL interface, scalar and table functions, CTE support, and materialized view acceleration.
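How a timestamp index and an inverted index can work together over raw events is sketched below. This is a toy in-memory model under stated assumptions, not Honghu's storage format: the `EventSet` class, its methods, and the integer timestamps are all hypothetical.

```python
import bisect
import re
from collections import defaultdict


class EventSet:
    """Toy schemaless event store with a time index and an inverted index."""

    def __init__(self):
        self.events = []                   # raw text; list position is the event id
        self.times = []                    # sorted (timestamp, event id) pairs
        self.inverted = defaultdict(set)   # token -> set of event ids

    def append(self, ts, raw):
        eid = len(self.events)
        self.events.append(raw)
        bisect.insort(self.times, (ts, eid))
        # Index every word-like token so keyword search avoids a full scan.
        for token in re.findall(r"\w+", raw.lower()):
            self.inverted[token].add(eid)

    def search(self, keyword, start, end):
        """Return raw events containing keyword with start <= ts <= end."""
        lo = bisect.bisect_left(self.times, (start, -1))
        hi = bisect.bisect_right(self.times, (end, float("inf")))
        in_range = {eid for _, eid in self.times[lo:hi]}
        matched = self.inverted.get(keyword.lower(), set()) & in_range
        return [self.events[eid] for eid in sorted(matched)]


es = EventSet()
es.append(100, "ERROR disk full on node1")
es.append(105, "INFO login user=alice")
es.append(110, "ERROR timeout on node2")
results = es.search("error", 100, 105)
```

The time range narrows the candidate set first, and the inverted index then selects matching events without parsing any of them, which is why neither index requires a schema.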

Key technical details include automatic JSON/key‑value parsing, field extraction, dynamic schema union, vectorized execution, JIT‑compiled functions, and the ability to reuse modeling logic via views. The platform also supports pipelines that combine read‑time and write‑time modeling, enabling seamless transition from traditional ETL to schema‑on‑read workflows.
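Automatic JSON/key‑value parsing and dynamic schema union can be sketched as follows. The parsing rules, the `KV_PATTERN` regular expression, and the function names are assumptions made for illustration; the source does not describe Honghu's actual extraction logic.

```python
import json
import re

# Assumed key=value syntax: bare values or double-quoted values.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw):
    """Try JSON first, then fall back to key=value pairs."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except ValueError:
        pass
    return {k: v.strip('"') for k, v in KV_PATTERN.findall(raw)}


def union_schema(records):
    """Build the column list as the ordered union of all field names seen."""
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    return columns


def to_rows(records, columns):
    """Project each record onto the unioned columns; missing fields become None."""
    return [[rec.get(c) for c in columns] for rec in records]


raw_events = [
    '{"level": "ERROR", "code": 500}',
    'level=INFO user="alice smith" latency=12',
]
records = [extract_fields(r) for r in raw_events]
columns = union_schema(records)
rows = to_rows(records, columns)
```

Events with different shapes end up in one result set whose columns grow as new fields appear. Note that key‑value values stay strings here (`"12"`), so a real system would need a separate type inference step.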

Finally, a Q&A section addresses migration strategies from ETL to schema‑on‑read, automatic type inference, and use cases such as IoT sensor data, emphasizing that both approaches can complement each other in modern big‑data environments.

Tags: Big Data, Real-time Analytics, Data Platform, log analysis, schema-on-read
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
