Tagged articles

Schema Evolution

13 articles · Page 1 of 1

Jun 18, 2026 · Big Data

How AI-Driven Real-Time Data Lakes Are Ditching ETL: A Kafka‑to‑Iceberg Architecture Simplification

In the AI era, enterprises need a data foundation that supports both low‑latency streaming and long‑term analytics, and the combination of Kafka, Iceberg, and object storage is emerging as a preferred solution. By moving ingestion closer to the message layer and eliminating external ETL jobs, a "zero‑ETL" approach reduces architectural complexity, improves consistency, and streamlines schema evolution and small‑file management.

CDC · Data Lake · Iceberg

0 likes · 27 min read

Alibaba Cloud Big Data AI Platform

Aug 29, 2025 · Big Data

How MaxCompute Streaming Insert Revolutionized Real‑Time Data Migration from BigQuery

This article details how a leading Southeast Asian tech group migrated its real‑time write workloads from Google BigQuery to MaxCompute using MaxCompute Streaming Insert, covering architecture, core features, migration challenges, optimization strategies, business impact, and future enhancements.

Big Data · BigQuery Migration · MaxCompute

0 likes · 9 min read

Java Baker

Jul 7, 2025 · Databases

Choosing the Right Database Schema for Dynamic Business Field Expansion

This article compares five common database extension strategies—from simple MySQL column additions to a hybrid MySQL‑HBase solution—detailing their implementation, advantages, drawbacks, and ideal scenarios, helping architects select the most scalable and maintainable design for evolving business data requirements.

Database Design · Dynamic Fields · HBase

0 likes · 8 min read

Big Data Technology & Architecture

Aug 20, 2024 · Big Data

Practical Insights on Using Apache Paimon for Real-World Data Lake Scenarios

This article shares a personal, experience‑driven overview of Apache Paimon, highlighting its design simplicity, key capabilities such as schema evolution, stream‑batch unified processing, primary‑key support, and closed‑loop data handling, while discussing when its features are appropriate for production environments.

Apache Paimon · Batch Processing · Big Data

0 likes · 5 min read

StarRocks

May 22, 2024 · Big Data

Unlocking Data Lake Power: Iceberg Architecture & StarRocks Acceleration

Apache Iceberg offers a modern, ACID‑compliant table format for data lakes with features like hidden partitions and schema evolution, while StarRocks provides high‑performance query acceleration, metadata caching, and distributed planning to address Iceberg’s latency challenges, enabling seamless lake‑warehouse integration and real‑time analytics.

Apache Iceberg · Data Lake · Metadata Caching

0 likes · 19 min read

DataFunSummit

Mar 17, 2024 · Big Data

OPPO Smart Data Lakehouse: Architecture, Real‑time Lakehouse, and Technical Practices

This article presents OPPO's smart data lakehouse solution, describing its EB‑scale architecture, the integration of batch and streaming engines, the Glacier service for table management, schema‑adaptive ingestion, performance optimizations, and the future technical roadmap for unified data processing.

Big Data · Data Lakehouse · Flink

0 likes · 15 min read

DataFunSummit

Oct 1, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation introduces Iceberg's core capabilities, details Xiaomi's practical applications—including log ingestion, near‑real‑time warehousing, offline challenges, column‑level encryption, and Hive migration—and outlines future development directions such as materialized views and cloud migration, providing a comprehensive view of modern data‑lake engineering.

Big Data · Data Lake · Flink

0 likes · 22 min read

DataFunTalk

Jun 26, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation details Iceberg's core capabilities (transactional writes, schema evolution, implicit partitioning, and row‑level updates), showcases Xiaomi's real‑world applications such as log ingestion redesign, near‑real‑time warehousing, offline optimizations, column‑level encryption, and Hive migration strategies, and outlines upcoming enhancements like materialized views and cloud migration.

Big Data · Column Encryption · Data Lake

0 likes · 20 min read

DataFunTalk

May 11, 2023 · Big Data

Scaling ByteDance Feature Store to EB‑Level with Apache Iceberg: Architecture, Practices, and Future Roadmap

This article describes how ByteDance tackled petabyte‑scale feature storage by adopting Apache Iceberg, detailing the problem background, design choices, implementation of COW and MOR back‑fill strategies, performance optimizations, and future plans such as lake‑cold‑layering and materialized views.

Apache Iceberg · Big Data · Data Lake

0 likes · 16 min read

DataFunSummit

Oct 29, 2022 · Big Data

Apache Iceberg in Tencent: Architecture, Spark Read/Write, Production Practices, and Data Governance

This article presents an in‑depth overview of Apache Iceberg as used at Tencent, covering its table format architecture, Spark read/write mechanisms, production challenges and optimizations such as schema evolution, file filtering, upsert strategies, and the surrounding data‑governance services.

Apache Iceberg · Big Data · Data Governance

0 likes · 19 min read

JavaEdge

Jun 26, 2022 · Backend Development

Ensuring Forward and Backward Compatibility in Distributed Systems

This article explains why forward and backward compatibility are crucial for evolving systems, covering database encoding, schema evolution, REST and RPC communication, message brokers, and actor frameworks, and provides practical guidance for designing compatible data flows across services.

Message Broker · REST · RPC

0 likes · 22 min read

dbaplus Community

Jan 15, 2020 · Databases

How Didi Built Fusion-NewSQL: A High‑Throughput, Low‑Latency NewSQL on Distributed KV

Fusion-NewSQL is Didi’s internally developed NewSQL system built atop the Fusion distributed KV store, offering MySQL compatibility, high throughput, low latency, flexible schema changes, secondary indexes, and integration with Elasticsearch and Hive. The article covers its architecture, data flow, and future roadmap.

Distributed storage · MySQL Compatibility · NewSQL

0 likes · 16 min read

Didi Tech

Oct 8, 2019 · Databases

Design and Implementation of Fusion-NewSQL: A NewSQL System Built on Distributed NoSQL Storage

Fusion‑NewSQL is a NewSQL layer built atop Didi’s distributed KV store Fusion, translating MySQL queries into Redis‑style hashmaps, asynchronously maintaining secondary indexes, supporting fast Hive‑to‑Fusion loads and Elasticsearch integration, thereby delivering over 2 million QPS, 600 TB storage and flexible schema evolution for dozens of services.

Indexing · MySQL Compatibility · NewSQL

0 likes · 15 min read
