Tagged articles
5 articles
Page 1 of 1
Alibaba Cloud Big Data AI Platform
Alibaba Cloud Big Data AI Platform
Jan 20, 2025 · Big Data

How Alibaba Cloud Handles Petabyte-Scale Data Lake Migration Without Downtime

This article explores the challenges of migrating hundreds of petabytes of data to a cloud data lake and details Alibaba Cloud's Data Lake Formation solution, covering bulk and incremental migration, dual‑run validation, automated path routing, and comprehensive project management to achieve near‑zero downtime and sub‑0.1% data deviation.

Alibaba CloudDLFcloud migration
0 likes · 11 min read
How Alibaba Cloud Handles Petabyte-Scale Data Lake Migration Without Downtime
DataFunSummit
DataFunSummit
Oct 30, 2022 · Big Data

Integrating Apache Spark with Cloud‑Native Technologies: Principles, Kubernetes Deployments, EMR on ACK, and Serverless Spark on DLF

This article examines the challenges of traditional Spark clusters and explains how integrating Spark with cloud‑native platforms—through Kubernetes deployment modes, EMR on ACK practices, Remote Shuffle Service, and serverless Spark on DLF—provides elastic scaling, lower operational costs, and advanced features such as executor rolling and custom scheduler support.

Big DataDLFKubernetes
0 likes · 18 min read
Integrating Apache Spark with Cloud‑Native Technologies: Principles, Kubernetes Deployments, EMR on ACK, and Serverless Spark on DLF
Big Data Technology Architecture
Big Data Technology Architecture
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features, file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, performance optimizations, and future roadmap.

CDCDLFDelta Lake
0 likes · 10 min read
Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions
Alibaba Cloud Developer
Alibaba Cloud Developer
May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing, with a detailed file structure and metadata mechanism, while Alibaba Cloud EMR enhances it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDCDLFData Lakehouse
0 likes · 11 min read
Unlocking Delta Lake: Key Features, Architecture, and EMR Integration