Big Data Technology Architecture
Author

Exploring Open Source Big Data and AI Technologies

290 articles · 0 likes · 602 views · 0 comments
Recent Articles

Latest from Big Data Technology Architecture

Jul 20, 2021 · Big Data

PB‑Level Ad‑hoc Query Practice with Flink: Threat Hunting Platform Architecture and IO‑Reducing Optimizations

This article details 360's Threat Hunting platform built on Flink, covering its evolution, architecture, block‑index design, Hilbert‑curve data ordering, LIKE predicate pushdown, join optimizations, Alluxio caching, and future plans for BI and multi‑user concurrency, all aimed at reducing IO for efficient PB‑scale ad‑hoc queries.

Alluxio · Block Index · Flink
0 likes · 18 min read
Jul 15, 2021 · Big Data

Resolving Spark Task Not Serializable Errors: Causes, Code Examples, and Best Practices

This article analyzes why Spark tasks fail with a "Task not serializable" exception when closures reference class members, demonstrates the issue with Scala code examples, and provides practical solutions such as using @transient annotations, moving functions to objects, and ensuring proper class serialization.

Scala · Spark · Task Not Serializable
0 likes · 12 min read
Jul 15, 2021 · Big Data

Building Data Lake Solutions with Iceberg and Object Storage: Architecture, Write/Read Processes, and Storage Optimization

This article presents a comprehensive overview of using Apache Iceberg with object storage to construct scalable data lake solutions, covering lake architecture, Iceberg table organization, Flink‑based write and read workflows, catalog abstractions, object storage versus HDFS comparisons, append‑upload and atomic‑commit challenges, a demonstration setup, and ideas for storage optimization.

Catalog · Flink · Iceberg
0 likes · 16 min read
Jun 22, 2021 · Databases

Hopsworks Feature Store: Transparent Dual‑Storage System for Online and Offline Machine Learning Features

This article explains how Hopsworks’ feature store unifies online low‑latency and offline high‑throughput storage using a dual‑system architecture built on RonDB, detailing its API, metadata handling, ingestion pipeline, benchmarks, and how it simplifies production machine‑learning feature access.

RonDB · benchmark · feature store
0 likes · 17 min read
Jun 17, 2021 · Databases

Key Features of ClickHouse: DBMS Capabilities, Columnar Storage, Vectorized Execution, and Distributed Architecture

ClickHouse is a high‑performance MPP column‑store DBMS that combines complete DBMS functions, column‑oriented storage with aggressive compression, SIMD‑based vectorized execution, flexible table engines, multithreading, distributed processing, a multi‑master architecture, and SQL compatibility to deliver fast online analytical queries on massive data sets.

ClickHouse · Columnar Storage · DBMS
0 likes · 21 min read
Jun 16, 2021 · Big Data

HBase Read and Write Performance Optimization Guide

This guide details practical server‑side and client‑side techniques for improving HBase read and write throughput, covering rowkey design, BlockCache configuration, HFile management, compaction tuning, scan cache sizing, bulkload usage, WAL policies, and SSD storage options.

Database Tuning · HBase · read optimization
0 likes · 8 min read
Jun 10, 2021 · Big Data

Understanding Apache Iceberg: Design, Architecture, and Its Application at NetEase Cloud Music

This article explains Apache Iceberg’s table‑format design, compares it with Hive’s limitations, details its snapshot‑based architecture and metadata handling, and describes how NetEase Cloud Music leveraged Iceberg to dramatically improve large‑scale log processing performance and stability.

Apache Iceberg · Spark · Table Format
0 likes · 12 min read
Jun 4, 2021 · Big Data

Types of OLAP Data Warehouses and Performance Optimization Techniques

This article explains how OLAP data warehouses are classified by data volume and modeling approach (MOLAP, ROLAP, HOLAP, and HTAP), reviews common open‑source ROLAP products, and details performance‑boosting techniques such as MPP architecture, cost‑based optimization, vectorized execution, and storage optimizations.

Data Warehouse · MPP · OLAP
0 likes · 27 min read