Apache Hudi Asia Summit Successfully Held
The first Apache Hudi Asia Summit in Beijing attracted over 230 attendees, featuring technical discussions on data lake optimization and case studies from companies like Fastly and Meituan.
Taobao’s AIGC pipeline combines a human‑feedback multimodal reward model, audio‑visual joint pre‑training, and Mixture‑of‑Experts distillation to clean data, align outputs with user preferences, and achieve state‑of‑the‑art multimodal LLM performance that drives content cold‑start and conversion gains in e‑commerce.
This presentation covers Tencent's real‑time lakehouse architecture in four parts: lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook. Components discussed include Spark, Flink, Iceberg, the Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg.
This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.
This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.
This article discusses the implementation of Apache Kylin as an OLAP engine for logistics data, focusing on optimizing cube building and query performance to handle large-scale, high-dimensional data analytics.
The article explores how early 1980s game programmers managed extremely limited memory for graphics, audio, and code by using techniques such as tile-based rendering, simple audio synthesis, and data size estimation, highlighting the stark contrast with modern development resources.
The article surveys unconventional offline data‑task optimizations—such as distribution‑by, seeded random shuffling, explode‑based skew mitigation, hash bucketing, task‑parallelism tuning, and multi‑insert materialization—organized by point, line, and surface perspectives, and stresses that effective performance gains require both technical tricks and business‑driven pipeline adjustments.
This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.
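As a rough illustration of the Z‑order idea mentioned in the summary above, the sketch below interleaves the bits of two integer column values into a single sort key, so rows that are close in both columns tend to land near each other in a file. The function name `z_order_key` and the 16‑bit width are assumptions made for this example; it is not taken from the Bilibili article or from Iceberg's implementation.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) key.

    Hypothetical helper for illustration only: bit i of x goes to position 2*i,
    bit i of y goes to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Sorting rows by the interleaved key clusters them on both columns at once.
rows = [(3, 3), (0, 1), (2, 0), (1, 1), (0, 0), (1, 0)]
rows_sorted = sorted(rows, key=lambda r: z_order_key(r[0], r[1]))
```

Sorting by a key like this is one way a writer could lay out data so that min/max file statistics stay selective for filters on either column.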
This article examines performance bottlenecks in a high‑traffic e‑commerce product service and proposes data‑centric optimizations—including read‑only focus, field‑level selection via bit‑masking, and Redis hash storage—to reduce payload size, lower GC pressure, and improve latency while maintaining scalability.
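A minimal sketch of the field‑level selection via bit‑masking that the summary above mentions: the caller passes a bitmask naming the fields it needs, and only those fields are returned. The `ProductField` flags, the `FIELD_KEYS` mapping, and the sample product are hypothetical names invented for this example, not details from the article.

```python
from enum import IntFlag


class ProductField(IntFlag):
    # Hypothetical field flags; each field occupies one bit.
    TITLE = 1
    PRICE = 2
    STOCK = 4
    IMAGES = 8


# Assumed mapping from flag to the stored field name.
FIELD_KEYS = {
    ProductField.TITLE: "title",
    ProductField.PRICE: "price",
    ProductField.STOCK: "stock",
    ProductField.IMAGES: "images",
}


def select_fields(product: dict, mask: ProductField) -> dict:
    """Return only the fields whose bit is set in `mask`."""
    return {
        key: product[key]
        for flag, key in FIELD_KEYS.items()
        if flag & mask and key in product
    }


product = {"title": "Mug", "price": 9.5, "stock": 12, "images": ["a.png"]}
slim = select_fields(product, ProductField.TITLE | ProductField.PRICE)
```

If the product were stored as a Redis hash, the same mask could be translated into a list of hash field names and fetched with a single `HMGET`, so unrequested fields never leave the store.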
This article examines the challenges of mobile‑based trajectory tracking in city management and presents a comprehensive set of optimizations—including adaptive GPS sampling, keep‑alive strategies, accuracy enhancements, algorithmic fitting, and cinematic animation effects—to produce smooth, accurate, and visually appealing trajectory displays at scale.