Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies

Recent Articles

Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Concurrency Control · 18 min read

Aug 13, 2022 · Big Data

Apache Doris at Xiaomi: Architecture Evolution, Performance Optimizations, and Production Practices

This article details Xiaomi's three‑year journey of adopting Apache Doris across dozens of internal services. It covers the move from a Spark‑SQL‑based Lambda architecture to a unified MPP database, along with performance benchmarks, data ingestion pipelines, compaction tuning, two‑phase commit, single‑replica writes, monitoring, and community contributions.

Apache Doris · Data Warehouse · MPP · 19 min read

Jul 28, 2022 · Big Data

Reflections on Data Governance Challenges and Approaches

The author gives a candid account of moving from a non‑data role to confronting data‑centric bottlenecks, describing the current state of data projects, common pitfalls, and practical ideas for simplifying data governance under limited resources and tight budgets.

DAMA · Data Management · Data quality · 7 min read

Jul 15, 2022 · Big Data

Using and Designing the Apache SeaTunnel Examples Module

This article introduces Apache SeaTunnel's Examples module, compares SeaTunnel with DataX, explains its multi‑engine design, demonstrates Flink and Spark example implementations, and shares the speaker's experiences contributing to the open‑source community, providing practical guidance for big‑data integration projects.

Apache SeaTunnel · Data integration · Flink · 10 min read

Jul 14, 2022 · Operations

Postmortem of Bilibili SLB Outage on July 13, 2021

This postmortem details the July 13, 2021 Bilibili outage, caused by a Lua bug in the OpenResty‑based SLB that drove CPU usage to 100%. It walks through the incident timeline, root‑cause analysis, and mitigation steps, then describes the technical and process improvements made afterward to strengthen reliability and multi‑active deployment.

Incident · Load Balancer · Lua · 16 min read

Jul 2, 2022 · Fundamentals

Indirect Shareholding Ratio Calculation Using Graph Techniques

This article explains how to compute indirect shareholding ratios between companies by generating synthetic relationship data, cleaning and normalizing it with multiprocessing, constructing a weighted directed graph using NetworkX, and applying a matrix‑based algorithm to derive the final ownership matrix.

Python · data processing · graph-analysis · 7 min read

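The summary does not spell out the matrix step. One common formulation, offered here as an assumption rather than the article's exact method: if A[i][j] is the fraction of company j held directly by company i, then entry (i, j) of A^k sums the products of stakes along every k‑step ownership chain, so A + A^2 + A^3 + ... gives the combined direct plus indirect ratio (equal to (I − A)^-1 − I when the series converges). The minimal pure‑Python sketch below uses hypothetical names (`matmul`, `total_ownership`) and plain lists instead of the NetworkX graph the article builds.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_ownership(direct, tol=1e-12, max_iter=1000):
    """Return T = A + A^2 + A^3 + ... for a direct-ownership matrix A.

    direct[i][j] is the fraction of company j held directly by company i.
    Entry (i, j) of A^k sums the products of stakes along every k-step
    chain from i to j, so the series accumulates all indirect paths.
    It converges only when cross-holdings form no 100% cycle
    (spectral radius of A below 1).
    """
    total = [row[:] for row in direct]
    power = [row[:] for row in direct]
    for _ in range(max_iter):
        power = matmul(power, direct)          # chains one step longer
        for i, row in enumerate(power):
            for j, value in enumerate(row):
                total[i][j] += value
        if max(abs(v) for row in power for v in row) < tol:
            break                              # remaining terms are negligible
    return total


# Hypothetical example: company 0 holds 50% of company 1,
# which in turn holds 40% of company 2.
A = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0],
]
T = total_ownership(A)
print(round(T[0][2], 6))  # 0.2 = 0.5 * 0.4, held indirectly via company 1
```

The power series handles cross‑holdings as well: if two companies own stakes in each other, each extra loop around the cycle adds a geometrically shrinking term until it drops below `tol`.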
Jun 29, 2022 · Fundamentals

Deriving Data Lineage from Python Code Using AST and Pyflakes

This article explains how to automatically extract data lineage and code dependencies from large collections of Python scripts by leveraging the language's compilation stages, abstract syntax trees, and the Pyflakes static‑analysis library, providing practical code examples and custom parsers for SQL extraction.

AST · Code Parsing · big data · 12 min read

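To make the idea concrete, here is a minimal sketch that uses only the standard‑library `ast` module; it does not reproduce the article's Pyflakes‑based parsers. It assumes SQL is passed as a string literal to a method named `sql` (as in `spark.sql(...)`) and pulls table names out with deliberately crude regular expressions. The function `extract_lineage` and both regexes are hypothetical illustrations, not a production SQL parser.

```python
import ast
import re

# Crude, illustrative patterns: tables read after FROM/JOIN, and tables
# written after INSERT INTO / INSERT OVERWRITE [TABLE].
READ_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)
WRITE_RE = re.compile(r"\bINSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?([\w.]+)",
                      re.IGNORECASE)


def extract_lineage(source, method_name="sql"):
    """Return (tables_read, tables_written) for SQL literals in Python source.

    Only calls of the form <obj>.<method_name>("...literal SQL...") are
    inspected; SQL built dynamically (f-strings, concatenation) is skipped.
    """
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == method_name
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            sql = node.args[0].value
            reads.update(READ_RE.findall(sql))
            writes.update(WRITE_RE.findall(sql))
    return reads, writes


script = '''
spark.sql("""
    INSERT OVERWRITE TABLE dw.daily_orders
    SELECT o.id, u.name
    FROM ods.orders o JOIN ods.users u ON o.uid = u.id
""")
'''
reads, writes = extract_lineage(script)
print(sorted(reads), sorted(writes))  # ['ods.orders', 'ods.users'] ['dw.daily_orders']
```

Because the script is only parsed and never executed, this approach can scan large collections of jobs safely; each (reads, writes) pair becomes an edge set from source tables to target tables in a lineage graph.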