Big Data Technology Architecture
Author


Exploring Open Source Big Data and AI Technologies

290 Articles · 0 Likes · 602 Views · 0 Comments
Recent Articles

Latest from Big Data Technology Architecture

Jun 14, 2022 · Big Data

Applying Apache DolphinScheduler in a Big Data Platform: Architecture, Migration, and Future Plans

This presentation details the background, redesign, and migration of a large‑scale data platform at Dangbei Network Technology, focusing on the adoption of Apache DolphinScheduler, ClickHouse migration, storage and compute separation, monitoring solutions, and the roadmap for future upgrades and open‑source involvement.

Apache DolphinScheduler, ClickHouse, HA
0 likes · 12 min read
Jun 9, 2022 · Databases

Building a Real‑Time Data Warehouse with Apache Doris: Architecture, Benefits, and Lessons Learned

This article details how a fast‑growing supply‑chain platform migrated from MySQL and Hive to Apache Doris for real‑time analytics, describing the architectural evolution, the advantages of the new design, practical implementation steps, encountered challenges, and the performance and cost benefits achieved.

Apache Doris, Data integration, Flink CDC
0 likes · 12 min read
Jun 8, 2022 · Big Data

Bilibili Offline Computing Platform: Migration from Hive to Spark and Comprehensive Performance Optimizations

The article details Bilibili's evolution of its offline computing platform from Hadoop‑based Hive to Spark, describing migration tools, SQL conversion, result and resource comparison, shuffle stability, small‑file handling, runtime filters, data skipping, ZSTD support, Hive Metastore federation, traffic control, and future optimization directions.

Hive, Spark, data migration
0 likes · 29 min read
Jun 7, 2022 · Big Data

Multi-Modal Index in Apache Hudi 0.11.0: Design, Implementation, and Performance Benefits

This article explains the motivation, design principles, implementation details, and performance improvements of the new multi‑modal indexing subsystem introduced in Apache Hudi 0.11.0 for Lakehouse architectures, covering scalable metadata, ACID updates, fast lookups, file listing, data skipping, upsert performance, and future work.

Apache Hudi, Indexing, metadata
0 likes · 19 min read
Jun 5, 2022 · Big Data

Introduction to Data Lake Concepts, Capabilities, and Applications

This article explains the origin and definition of data lakes, describes their ability to store structured, semi‑structured and unstructured data at any scale on‑premises or in the cloud, outlines essential lake capabilities such as unified storage, raw‑data preservation, scalable compute, metadata and security management, and compares data lakes with data warehouses and lakehouse architectures through real‑world cloud‑native examples.

cloud storage, metadata management
0 likes · 16 min read
Jun 3, 2022 · Operations

Understanding Apache Airflow DAGs, Operators, and Scheduling

This article explains Apache Airflow's core concepts, including DAG definitions, scheduling intervals, task dependencies, various operators such as BashOperator, PythonOperator, Branch operators, sensors, and custom operators, and provides code examples and configuration details for building robust data pipelines.

Apache Airflow, DAG, Data Pipelines
0 likes · 15 min read
May 31, 2022 · Big Data

Comprehensive Guide to Installing and Using Apache Airflow with Docker on Windows

This article provides a detailed tutorial on Apache Airflow fundamentals, Docker-based installation on Windows, Dockerfile creation, container deployment via Docker run and Docker Compose, Airflow configuration, and practical usage of DAGs, tasks, connections, and UI features for data pipeline orchestration.

Apache Airflow, Data Pipelines, Docker
0 likes · 14 min read
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features and its file and metadata structures, details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, and performance optimizations, and outlines the future roadmap.

CDC, DLF, Delta Lake
0 likes · 10 min read