Databricks Acquires Tabular, the Company Behind Apache Iceberg, to Boost Lakehouse Interoperability
Databricks announced that it will acquire Tabular, the company founded by the creators of Apache Iceberg. The deal aims to unify lakehouse formats, building on Delta Lake UniForm, against a backdrop of rising lakehouse adoption, format fragmentation, and a push toward open data interoperability.
On June 4, 2024, Databricks announced it will acquire Tabular, the data‑management startup founded by Ryan Blue, Daniel Weeks, and Jason Reid. The deal brings together the original creators of Apache Iceberg™ and of the Linux Foundation’s Delta Lake, the two leading open‑source lakehouse formats, with the goal of eliminating format‑related data compatibility concerns.
Lakehouse Architecture’s Rise and Format Incompatibility
Databricks coined the lakehouse concept in 2020 to combine traditional data‑warehouse and AI workloads on a single managed copy of data, which requires all data to be stored in open formats so that diverse engines can share it. Four years later, 74% of enterprises have deployed lakehouse architectures. Proprietary warehouses, by contrast, lock data behind vendor‑specific SQL engines.
The foundation of lakehouses is an open data format that provides ACID transactions on object storage, optimized for engines such as Apache Spark™, Trino, and Presto. In partnership with the Linux Foundation, Databricks launched the Delta Lake project, which now has over 500 code contributors and serves more than 10,000 companies processing over 4 EB of data daily.
Around the same time Delta Lake was created, Ryan Blue and Daniel Weeks launched the Iceberg project at Netflix and donated it to the Apache Software Foundation. Both Delta Lake and Iceberg build on Apache Parquet and share similar goals, yet their independent development paths have resulted in incompatibility between the two formats.
As more open‑source and proprietary engines adopt these formats, most support only one of them, and often only a subset of its features. The result is fragmented, siloed data, which diminishes the value of the lakehouse model.
The Road to Interoperability
To realize the full benefits of lakehouses, Databricks is working closely with the Delta Lake and Iceberg communities to achieve cross‑format interoperability, a multi‑year effort. Last year Databricks introduced Delta Lake UniForm, a table abstraction that enables interoperability among Delta Lake, Iceberg, and Hudi and supports Iceberg’s REST catalog interface. UniForm is now widely used, letting companies process data with familiar analytics engines regardless of the underlying format, and Databricks expects the addition of the original Iceberg team to expand its impact.
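As a rough illustration of what enabling UniForm looks like in practice, the sketch below builds a CREATE TABLE statement that asks Delta Lake to also write Iceberg metadata. It is not taken from the announcement. The table property names follow the Delta Lake UniForm documentation as understood at the time of writing and should be treated as assumptions to check against your Delta Lake or Databricks Runtime version. The helper name `uniform_iceberg_ddl` and the example table `main.sales.orders` are hypothetical.

```python
# Minimal sketch: render DDL that enables UniForm's Iceberg metadata on a Delta table.
# Assumption: these property names match the Delta Lake UniForm docs for your version.
UNIFORM_ICEBERG_PROPERTIES = {
    "delta.columnMapping.mode": "name",                 # column mapping, required by IcebergCompat
    "delta.enableIcebergCompatV2": "true",              # keep the table Iceberg-compatible
    "delta.universalFormat.enabledFormats": "iceberg",  # also generate Iceberg metadata
}


def uniform_iceberg_ddl(table_name: str, columns: str) -> str:
    """Return a CREATE TABLE statement with the UniForm properties set.

    `table_name` and `columns` are interpolated as-is, so pass trusted values only.
    """
    rendered = ", ".join(
        f"'{key}' = '{value}'" for key, value in UNIFORM_ICEBERG_PROPERTIES.items()
    )
    return f"CREATE TABLE {table_name} ({columns}) USING DELTA TBLPROPERTIES ({rendered})"


if __name__ == "__main__":
    # Hypothetical table; in a Spark session this string would be passed to spark.sql(...).
    print(uniform_iceberg_ddl("main.sales.orders", "id BIGINT, amount DOUBLE"))
```

Once a table is created this way, an Iceberg‑capable engine would typically read it through a catalog that exposes the Iceberg REST interface, while Delta clients continue to use the same underlying Parquet files.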
Ali Ghodsi, co‑founder and CEO of Databricks, said the lakehouse architecture has become a global standard that reduces total cost of ownership and accelerates AI projects, but that the split between Delta Lake and Iceberg creates friction. Ryan Blue added that Iceberg was created to address data correctness, performance, and scalability, and that with Tabular on board Databricks can build a superior data‑management platform that frees enterprises from having to choose the “right” format.
Both Databricks and Tabular have a history of supporting open‑source formats. Databricks, now the largest independent open‑source company, has contributed roughly 12 million lines of code to open‑source projects. Tabular, founded by the original Iceberg creators, offers a standalone data platform that solves infrastructure gaps for data engineers and scientists.
Databricks serves over 10,000 organizations through its data‑intelligence platform, including Block, Comcast, Condé Nast, Rivian, Shell, and more than 60% of Fortune 500 companies. The company was founded by the creators of Lakehouse, Apache Spark™, Delta Lake, and MLflow.
The proposed acquisition is subject to customary closing conditions and is expected to close in Databricks’s second fiscal quarter.
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
