Tag

real-time ETL


DataFunTalk
Jan 8, 2023 · Big Data

ByteDance Event‑Tracking Data Cost Governance Practices

This article describes ByteDance's comprehensive approach to managing the massive volume of event‑tracking (埋点) data, detailing the background, cost‑reduction strategies, experience review, future plans, and a Q&A session that together illustrate how systematic data governance can dramatically cut storage and processing expenses.

ByteDance, Data Governance, big data
18 min read
Didi Tech
Jul 1, 2021 · Big Data

Full-Chain Traffic Data Detection in DiDi's Omega Platform

DiDi's Omega platform provides an end‑to‑end traffic‑data pipeline, from SDK collection through real‑time and offline ETL to storage and analysis. A detection service measures loss, duplication, and accuracy; the article reports SDK loss below 1%, along with integrity tagging and comprehensive monitoring dashboards. It closes with a hiring call for senior data engineers.

Data Pipeline, Omega platform, big data
9 min read
Big Data Technology Architecture
Mar 2, 2021 · Big Data

Implementing Real-Time Log Ingestion with Delta Lake on EMR: Architecture, Challenges, and Solutions

This article describes how a data engineering team replaced nightly batch ETL with a Delta Lake‑based real‑time log ingestion pipeline on EMR. It covers the motivation, architecture, and implementation steps, the issues the team encountered (such as data skew and schema evolution), and the practical solutions they applied to achieve low‑latency, reliable data delivery.

Data Lake, Delta Lake, Hive
14 min read
DataFunTalk
Aug 1, 2019 · Big Data

Streaming Data Platform Practices and Challenges at Beike Real Estate

This article presents an in‑depth overview of Beike's four‑layer streaming data platform, covering the foundational infrastructure, capability aggregation, data content, and output layers, as well as the challenges of metadata management, real‑time processing, and productization through the Ark and Tianyan systems.

Ark platform, Beike, Tianyan
14 min read