Big Data Technology Architecture
Author

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies

290
Articles
0
Likes
602
Views
0
Comments
Recent Articles

Latest from Big Data Technology Architecture

100 recent articles max
Big Data Technology Architecture
Big Data Technology Architecture
Sep 28, 2021 · Big Data

Integrating Apache Kyuubi with CDH 6 and Spark 3: Deployment, Configuration, and Performance Tuning

This guide explains how to deploy Apache Kyuubi on a CDH 6 cluster, replace HiveServer2 with Kyuubi, integrate Spark 3, apply necessary patches, configure environment and Spark settings, and optimize engine sharing for various workloads, providing complete code snippets and step‑by‑step instructions.

CDHHiveServer2Kyuubi
0 likes · 19 min read
Integrating Apache Kyuubi with CDH 6 and Spark 3: Deployment, Configuration, and Performance Tuning
Big Data Technology Architecture
Big Data Technology Architecture
Sep 17, 2021 · Big Data

Real‑time Computing Platform Architecture, Flink Migration, and One‑stop Platform at 58.com

This article details the design and implementation of 58.com’s real‑time computing platform, covering its architecture, data ingestion, storage, Flink‑based stream processing, SQL extensions, performance optimizations, Storm‑to‑Flink migration tools, the Wstream management console, state handling, monitoring, and future roadmap.

FlinkReal‑Time ComputingStorm Migration
0 likes · 16 min read
Real‑time Computing Platform Architecture, Flink Migration, and One‑stop Platform at 58.com
Big Data Technology Architecture
Big Data Technology Architecture
Aug 24, 2021 · Big Data

Comprehensive Guide to Spark Performance Optimization, Data Skew Mitigation, and Troubleshooting

This article presents a detailed collection of Spark performance‑tuning techniques—including submit‑script parameters, RDD and operator optimizations, parallelism and memory settings, broadcast variables, Kryo serialization, locality wait adjustments—as well as systematic methods for detecting and resolving data skew and common runtime issues such as shuffle failures, serialization errors, and JVM memory problems.

Data SkewShuffleSpark
0 likes · 21 min read
Comprehensive Guide to Spark Performance Optimization, Data Skew Mitigation, and Troubleshooting
Big Data Technology Architecture
Big Data Technology Architecture
Aug 17, 2021 · Big Data

Detailed Overview of Flink CDC 2.0: Architecture, Features, and Future Roadmap

This article provides an in‑depth technical overview of Flink CDC 2.0, covering its CDC fundamentals, comparison of query‑based and log‑based approaches, the new lock‑free chunk algorithm, FLIP‑27 based parallel snapshot reading, performance benchmarks, documentation improvements, and future roadmap for stability and ecosystem integration.

Change Data CaptureData integrationDebezium
0 likes · 16 min read
Detailed Overview of Flink CDC 2.0: Architecture, Features, and Future Roadmap
Big Data Technology Architecture
Big Data Technology Architecture
Aug 12, 2021 · Big Data

Enterprise Data Lake Architecture, Delta Lake Core Capabilities, and Stream‑Batch Integrated Analytics on Alibaba Cloud

This article explains the rapid growth of data, the limitations of traditional warehouses, and how a cloud‑based data lake built on object storage with Delta Lake format provides low‑cost, flexible, and ACID‑compliant analytics, followed by a step‑by‑step guide to ingest, manage, and analyze data using Alibaba Cloud DLF and Databricks DDI with Spark streaming and batch jobs.

Alibaba CloudDelta LakeSpark
0 likes · 19 min read
Enterprise Data Lake Architecture, Delta Lake Core Capabilities, and Stream‑Batch Integrated Analytics on Alibaba Cloud
Big Data Technology Architecture
Big Data Technology Architecture
Aug 12, 2021 · Databases

Understanding HBase HLog and Fault Recovery Mechanisms

This article explains HBase's write path using Memstore and HLog, details the lifecycle of HLog including construction, rolling, expiration, and deletion, and thoroughly analyzes the three fault‑recovery models—Log Splitting, Distributed Log Splitting, and Distributed Log Replay—highlighting their processes, advantages, and configuration nuances.

HBaseHLogLog Splitting
0 likes · 14 min read
Understanding HBase HLog and Fault Recovery Mechanisms
Big Data Technology Architecture
Big Data Technology Architecture
Aug 10, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's practical experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points of traditional Lambda architectures, Iceberg's table format and capabilities, Flink‑Iceberg sink design, small‑file handling, and future roadmap for a unified streaming‑batch data lake.

Apache FlinkApache IcebergBatch Processing
0 likes · 20 min read
Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices
Big Data Technology Architecture
Big Data Technology Architecture
Jul 27, 2021 · Big Data

Key Components of the Big Data Ecosystem: Hadoop, Hive, HBase, Spark, Kafka, and Elasticsearch

This article introduces the most important and still mainstream components of the big data ecosystem—including Hadoop’s storage and compute framework, Hive data warehouse, HBase NoSQL database, Spark unified engine, Kafka messaging platform, and Elasticsearch search engine—explaining their core concepts, architectures, and typical use cases.

ElasticsearchHBaseHadoop
0 likes · 9 min read
Key Components of the Big Data Ecosystem: Hadoop, Hive, HBase, Spark, Kafka, and Elasticsearch