DataFun Summit 2022 – Modern Data Stack Forum: Speaker Lineup and Session Overviews
The DataFun Summit 2022 featured a Data Lake & Warehouse forum with expert talks on PALO, ByteDance LAS, Iceberg at Huawei, lakehouse architecture from Databricks, and Presto‑Alluxio acceleration. This overview lists each session's technical outline, speaker background, and audience takeaways for modern big‑data architectures.
On September 17, 2022, from 09:00 to 12:45, the DataFun Summit 2022: Modern Data Stack Technology Summit hosted a Data Lake & Warehouse forum organized by Alluxio evangelist Fu Zhengjia, with frontline experts invited from Baidu, ByteDance, Huawei, Databricks, and Alluxio.
Fu Zhengjia (Alluxio evangelist) holds a Ph.D. in Information Engineering from CUHK and has published papers on computer networks and distributed systems. He previously served as Machine Learning R&D Director at Bigo Technology.
Peng Xiangyu (Senior R&D Engineer, Baidu) graduated from Shanghai Jiao Tong University and has ten years of big‑data engineering experience with Hadoop, Spark, Flink, ClickHouse, and related systems. He has led projects such as CloudMap, Minos, and Pingo, and currently works on real‑time data warehouse development in Baidu's PALO team.
Session: "From Apache Doris Compute‑Storage Separation to PALO’s Lake‑Warehouse Integration" – Outline: (1) History of Baidu PALO real‑time warehouse and its lineage with Doris; (2) PALO data storage structure; (3) Implementation of PALO’s compute‑storage separation; (4) Practical experiences and future directions of lake‑warehouse integration. Audience will learn PALO’s data model, the principles of compute‑storage separation, and real‑world lake‑warehouse practices.
Geng Xiaoyu (Data Platform Engineer, ByteDance) holds a master’s degree from Nanjing University’s PASA Lab and focuses on data‑lake deployment.
Session: "ByteDance LAS Data Lake Storage Engine Revealed" – Outline: (1) Challenges in data‑lake production; (2) Metadata services for data lakes; (3) Asynchronous operation management services; (4) Future plans. Audience will understand production issues, metadata management solutions, and asynchronous operation services for data lakes.
Li Liwei (Senior Big‑Data Engineer, Huawei) is an active contributor to Apache Iceberg.
Session: "Exploring Iceberg in Huawei’s Edge Cloud" – Outline: (1) Overall architecture overview; (2) Application scenarios; (3) Feature enhancements. Audience will learn how Iceberg reduces storage at scale, the ecosystem around Iceberg, and real‑time processing on Iceberg.
Fan Wenchen (Technical Lead, Databricks) is an Apache Spark PMC member and a top contributor to the Spark community.
Session: "Lakehouse Technology as the Future of Data Warehousing" – Outline: Introduction to lakehouse architecture concepts and practical experiences building lakehouse systems. Audience will gain an understanding of lakehouse design and industry implementation insights.
Wang Beinan (Software Engineer, Alluxio) holds a Ph.D. in Computer Engineering from Syracuse University, contributes to Presto, Iceberg, Druid, and Parquet modules, and previously led large‑scale distributed SQL development at Twitter.
Session: "Presto + Alluxio Accelerating Iceberg Data Lake Access" – Outline: (1) Overview of Presto‑Iceberg connector; (2) Discussion on eliminating Hive Metastore and ensuring metadata consistency; (3) Encrypted storage of Parquet format; (4) Partition pruning in Presto and Alluxio’s local cache; (5) Advanced pruning and semantic caching; (6) Future work on Arrow and native operators. Audience will learn the latest advances in Presto‑Iceberg connector, Alluxio caching benefits, and future research directions.
About DataFun: Founded in 2017, DataFun focuses on sharing and exchanging big‑data and AI technologies, having organized over 100 offline and online events across major Chinese cities, attracting more than 2,000 experts and publishing over 700 original articles with millions of reads.
