Big Data Technology Architecture
Author

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies

290 articles · 0 likes · 602 views · 0 comments
Recent Articles

Latest from Big Data Technology Architecture

Oct 25, 2022 · Big Data

Rebuilding Shopee's Data Integration Platform with Apache SeaTunnel

Shopee faced fragmented data‑ingestion pipelines, limited source support, and high maintenance overhead. After evaluating open‑source tools, it adopted Apache SeaTunnel to unify batch and streaming data transfers, simplify ETL workflows, and provide a scalable, extensible solution for its multi‑TB daily data processing needs.

Apache · Data integration · ETL
0 likes · 17 min read
Oct 18, 2022 · Databases

Debezium 2.0.0.Final Release: New Features, Connector Enhancements, and Improvements

Debezium 2.0.0.Final introduces major enhancements such as Java 11 migration, improved incremental snapshot controls, multi‑partition support, new storage modules, pluggable topic naming, expanded connector capabilities for Cassandra, MongoDB, MySQL, Oracle, PostgreSQL and Vitess, plus ARM64 container images and community updates.

Change Data Capture · Database Connectors · Debezium
0 likes · 28 min read
Oct 15, 2022 · Operations

The Rise of Platform Engineering: From DevOps Frustrations to Internal Developer Platforms

This article explains how platform engineering emerged from DevOps frustrations. It defines internal developer platforms, outlines their principles, benefits, and implementation guidelines, and argues that organizations should adopt them to reduce cognitive load and improve developer productivity.

Internal Developer Platform · operations · platform engineering
0 likes · 11 min read
Oct 10, 2022 · Big Data

Integrating Apache Hudi with MinIO: A Comprehensive Tutorial

This tutorial explains how to set up Apache Hudi on cloud‑native object storage with MinIO, covering Hudi’s architecture, file format, timeline, write and read paths, core features, schema evolution, and step‑by‑step Spark commands for ingesting, updating, deleting, and querying data in a streaming data‑lake environment.

Apache Hudi · MinIO · Spark
0 likes · 26 min read
Sep 18, 2022 · Backend Development

Design and Source Code Analysis of Apache DolphinScheduler

This article provides an in‑depth technical overview of Apache DolphinScheduler, covering its distributed design strategies, fault‑tolerance mechanisms, remote log access, source‑code module breakdown, API interfaces, Quartz integration, master‑worker execution flows, RPC communication, load‑balancing algorithms, logging services, and community contribution guidelines.

DolphinScheduler · Load Balancing · Log Service
0 likes · 47 min read
Sep 17, 2022 · Databases

Design and Optimization of Bilibili Log Service 2.0 Using ClickHouse and OpenTelemetry

This article describes how Bilibili redesigned its log service by replacing Elasticsearch with ClickHouse, introducing OpenTelemetry‑based logging, optimizing storage, query, and alerting components, and enhancing ClickHouse features such as configuration tuning, Map types, and implicit columns to achieve higher performance, lower cost, and better observability.

ClickHouse · OpenTelemetry · database optimization
0 likes · 28 min read
Aug 23, 2022 · Big Data

Apache Hudi 0.12.0 Release Highlights: Presto Connector, Archive Beyond Savepoint, File‑System Locks, Deltastreamer Termination, Spark & Flink Support, Performance Improvements, and Configuration Updates

The Apache Hudi 0.12.0 release introduces a native Presto connector, archive‑beyond‑savepoint capability, file‑system based locking, new deltastreamer termination strategies, expanded Spark and Flink support, numerous performance enhancements, and a series of configuration and API updates for better data‑lake management.

Apache Hudi · Flink · Presto
0 likes · 12 min read