Big Data Technology & Architecture
Author: Wang Zhiwu, a big data expert dedicated to sharing big data technology.

1.0k Articles · 0 Likes · 41 Views · 0 Comments
Recent Articles

Jan 15, 2025 · Big Data

From Operations to Data Engineering: A Student’s Real‑World Journey and Practical Guide

A data-engineering student describes moving from a mismatched operations role into big data: learning the core technologies, building a portfolio, tailoring a resume, and getting through multi-stage interviews. The article offers concrete advice and a structured learning roadmap for aspiring data professionals.

Big Data · Interview preparation · Learning Path
0 likes · 14 min read
Jan 13, 2025 · Big Data

How Apache Paimon Manages Snapshot Expiration: Synchronous vs Asynchronous Modes

This article explains how Apache Paimon expires snapshots, compares the synchronous and asynchronous execution modes and their trade-offs, and shows how table properties control expiration to balance data consistency, performance, and back-pressure in large-scale data processing systems.

Apache Paimon · Data Consistency · Synchronous
0 likes · 6 min read
Jan 2, 2025 · Big Data

Apache Paimon: Core Capabilities, Table Types, LSM Tree, Buckets, Merge Engines, and Operational Details

This article provides a comprehensive overview of Apache Paimon, covering its real‑time lake ingestion, unified stream‑batch processing, table types (primary‑key and append‑only), LSM‑tree storage, bucket mechanisms, merge‑engine options, compaction strategies, concurrency control, consumption methods, tag management, data cleanup, and system tables for big‑data workloads.

Apache Paimon · Big Data · Flink
0 likes · 25 min read
Dec 31, 2024 · Big Data

Eliminating Shuffle in Spark Joins with Storage Partitioned Join (SPJ) for Iceberg Tables

This article explains Storage Partitioned Join (SPJ), available since Spark 3.3, which avoids costly shuffle operations when joining partitioned V2 source tables such as Apache Iceberg. It covers the required conditions, configuration settings, practical code examples, and several join scenarios, including mismatched partitions and data skew.

Bucketing · Data Skew · SQL
0 likes · 15 min read
Dec 26, 2024 · Fundamentals

Detailed Granularity Fact Tables (DWD): Types, Design Principles, and Comparison

The article explains the three fine-grained fact table types (transaction, periodic snapshot, and accumulating snapshot), covering their purposes, design principles, and how they compare, and offers a simplified interpretation to help data engineers choose the appropriate fact table when modeling a data warehouse.

Big Data · DWD · Data Modeling
0 likes · 5 min read
Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0, including compute-storage separation, a new Materialized Table for unified batch-stream processing, and deeper integration with Paimon for streaming warehouses, while taking a cautious view of their practical impact and the challenges of migrating.

Batch-Stream Integration · Big Data · Compute-Storage Separation
0 likes · 5 min read
Dec 12, 2024 · Big Data

Understanding Time Travel and Snapshot Retention in Lake Frameworks (Hudi & Paimon)

This article explains how lake frameworks such as Hudi and Paimon implement Time Travel by retaining older data versions, how snapshot retention policies limit access to historical data, and how to manage snapshots and consumption patterns to reduce storage costs in large-scale data warehouses.

Big Data · Hudi · Paimon
0 likes · 7 min read