Tagged articles

Storage Compute Separation

26 articles · Page 1 of 1

Dec 11, 2025 · Databases

How StarRocks Redesigns Bulk Import to Cut Small Files and Boost Throughput

This article explains how StarRocks mitigates the hidden risks of massive one‑time data imports in a storage‑compute separated architecture by redesigning the write path to spill to local disk, merge centrally, and write to object storage, resulting in fewer small files, higher write throughput, and more stable query performance.

Bulk Import · Compaction · Data Engineering

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Apr 27, 2025 · Big Data

Scaling Property Services: StarRocks‑Powered Storage‑Compute Separation for 8000+ Communities

Facing a flood of data from over 8,000 communities, the Bifeng service team migrated from a monolithic storage‑compute architecture to a StarRocks‑based storage‑compute separation solution, achieving lower costs, higher resource utilization, faster queries, and improved SLA across their property management platform.

Big Data · Data Warehouse · Infrastructure Migration

0 likes · 11 min read

DataFunTalk

Feb 20, 2025 · Big Data

From Integrated Storage‑Compute to Decoupled Architecture: Practical Exploration of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue in Big Data Platforms

This article analyzes the transition from a tightly coupled storage‑compute architecture to a decoupled model, detailing how Kubernetes, Kyuubi, Celeborn, Blaze, and Hue together solve resource inefficiencies, improve scalability, and boost query performance in modern big‑data environments.

Big Data · Blaze · Kubernetes

0 likes · 16 min read

dbaplus Community

Jan 5, 2025 · Big Data

How DeWu Halved Observability Costs Using AutoMQ and ClickHouse Storage‑Compute Separation

DeWu’s observability platform faced scalability, cost, and operational challenges from petabyte‑scale trace data, prompting a shift to a storage‑compute separated architecture that leverages AutoMQ’s Kafka‑compatible service and ClickHouse Enterprise’s SharedMergeTree engine, ultimately achieving up to 50% cost reduction and five‑fold cold‑read performance gains.

AutoMQ · Big Data · ClickHouse

0 likes · 20 min read

DataFunSummit

Dec 21, 2024 · Big Data

Big Data Implementation Practices and Architecture in a Foreign Bank

This article shares a foreign bank's big data implementation journey, covering background and goals, overall planning and architecture, practical insights, phased rollout, data governance, security, and Q&A, and illustrates how a unified data platform, storage‑compute separation, and AI‑driven tools drive business innovation.

AI · Data Architecture · Data Governance

0 likes · 19 min read

Big Data Technology & Architecture

Oct 21, 2024 · Big Data

Key New Features of Apache Doris 3.0: Storage‑Compute Separation, Lakehouse Integration, Semi‑Structured Data, ETL Enhancements, Materialized Views, and Java UDTF

Apache Doris 3.0 introduces storage‑compute separation, native lakehouse write‑back, optimized Variant handling for semi‑structured data, stronger ETL transaction support, enhanced multi‑table materialized views, and Java UDTF capabilities, providing developers with more flexible, cost‑effective, and high‑performance analytics solutions.

Apache Doris · Data Warehouse · ETL

0 likes · 7 min read

ITPUB

Sep 11, 2024 · Big Data

Is Storage‑Compute Separation the Future? Unpacking the Lakehouse Debate

The article examines the concepts of storage‑compute separation and the lake‑warehouse (lakehouse) model, tracing their evolution from physical Hadoop clusters to containerized compute and object storage, and argues that true separation requires MPP systems to adopt open standards, effectively merging lake and warehouse architectures.

Big Data Architecture · Hadoop · Lakehouse

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Jun 6, 2024 · Databases

How StarRocks Redefines Lakehouse Architecture with Ultra-Fast Unified Analytics

StarRocks combines extreme query speed and a unified architecture to deliver a lakehouse solution that separates storage and compute, supports multi‑warehouse resource isolation, offers Trino compatibility, materialized‑view acceleration, and cost‑effective scaling, making it suitable for real‑time analytics, data‑lake queries, and traditional OLAP workloads.

Big Data · Lakehouse · StarRocks

0 likes · 23 min read

Baidu Geek Talk

Apr 3, 2024 · Databases

Cloud-Native Database: Market Trends, Technical Evolution and Accessibility

Cloud-native databases, now backed by major providers and projected to power 95% of digital business by 2025, are evolving from traditional systems into flexible, serverless platforms that are Kubernetes-compatible, MySQL/PostgreSQL-compatible, and HTAP-enabled. Baidu’s GaiaDB exemplifies the trend with advanced consensus, low-latency networking, columnar storage, and AI-driven operations, while enterprises weigh adoption benefits against deployment, maturity, and sustainability concerns.

AI4DB · GaiaDB · HTAP

0 likes · 15 min read

DataFunSummit

Feb 1, 2024 · Databases

StarRocks 3.0 Storage‑Compute Separation Architecture: Design, Implementation, and Evaluation

This article explains the storage‑compute separation architecture introduced in StarRocks 3.0, presents industry case studies, details the design of StarOS and compute nodes, discusses technical challenges and key techniques, and evaluates cost, reliability, elasticity, and performance through benchmarks and user feedback.

Cloud Native · StarRocks · Storage Compute Separation

0 likes · 11 min read

DataFunTalk

Jan 24, 2024 · Databases

Kuaishou Graph Database Storage‑Compute Separation Architecture and Its Application in Real‑Time Recommendation

This article presents Kuaishou's graph database storage‑compute separation architecture, detailing its application in real‑time recommendation scenarios, core requirements of cost, performance and usability, the layered service design, memory‑compact models, edge structures, snapshot isolation, and key performance optimizations such as Share‑Nothing and columnar data flow.

Storage Compute Separation · graph database · real-time recommendation

0 likes · 11 min read

StarRocks

Dec 22, 2023 · Databases

What’s New in StarRocks 3.2? Key Features and Usability Enhancements

StarRocks 3.2, released on December 21, 2023, introduces major usability upgrades—including optimized random bucketing, fast schema evolution, PIPE import, HTTP SQL API, runtime profiling, enhanced storage‑compute separation, data lake analysis, and advanced materialized view capabilities—while refining existing features such as indexing, catalog support, and export syntax.

Database · StarRocks · Storage Compute Separation

0 likes · 15 min read

Sohu Tech Products

Nov 1, 2023 · Databases

Engineering Practices of Douyin's Vector Database: From Retrieval Challenges to Cloud‑Native Solutions

Douyin tackled vector‑retrieval challenges by optimizing HNSW and creating a high‑performance IVF algorithm, implementing custom scalar quantization, SIMD acceleration, and a DSL‑driven engine that merges filtering with search, then built a cloud‑native, storage‑compute‑separated vector database (VikingDB) delivering sub‑10 ms latency, real‑time updates, multi‑tenant support, and secure, scalable retrieval for LLM‑driven applications.

ANN · LLM integration · Storage Compute Separation

0 likes · 18 min read

DataFunTalk

Sep 17, 2023 · Cloud Native

REDck: A Cloud‑Native Real‑Time Data Warehouse Built on ClickHouse

REDck is a cloud‑native, storage‑compute separated real‑time OLAP data warehouse derived from ClickHouse that addresses scalability, operational cost, and reliability challenges through a unified metadata service, object‑storage optimizations, multi‑level caching, distributed task scheduling, and two‑phase commit transactions.

ClickHouse · Real-time OLAP · Storage Compute Separation

0 likes · 18 min read

Xiaohongshu Tech REDtech

Sep 6, 2023 · Databases

REDck: A Cloud‑Native Real‑Time OLAP Data Warehouse Built on ClickHouse

REDck is a cloud‑native, real‑time OLAP data warehouse built on ClickHouse that adds elastic compute and storage scaling, object‑storage optimizations, multi‑level caching, and exactly‑once ingestion, delivering petabyte‑scale interactive analytics with ten‑fold CPU efficiency, ten‑fold cost reduction, and 99.9% availability.

Big Data · ClickHouse · Real-time OLAP

0 likes · 21 min read

Volcano Engine Developer Services

Sep 21, 2022 · Big Data

Unlocking Enterprise Data Lakehouse: Trends, Challenges, and Volcano Engine EMR Solutions

This article explores the open‑source lakehouse trend, outlines the architectural features of Volcano Engine EMR, examines key challenges of building enterprise‑grade data lakehouses, and presents best‑practice case studies demonstrating how EMR enables scalable, real‑time analytics, storage‑compute separation, and seamless integration with modern big‑data engines.

Data Lakehouse · EMR · Open Source

0 likes · 22 min read

Baidu Geek Talk

Jul 1, 2022 · Big Data

Evolution of Data Platform Technology: From Data Warehouse to Lakehouse Architecture

The article traces the evolution of data platforms from early data warehouses (schema‑on‑write, columnar storage, and MPP engines) to data lakes that retain raw data with schema‑on‑read, and finally to lakehouse architectures that merge the two models, offering unified metadata, versioning, and support for BI, big‑data, AI, and HPC workloads.

Data Architecture · Lakehouse · OLAP

0 likes · 25 min read

DataFunSummit

May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Cloud Native

0 likes · 15 min read

Tencent Cloud Developer

Feb 28, 2022 · Big Data

GooseFS: Distributed Caching System for Storage-Compute Separation Architecture

GooseFS, Tencent Cloud’s distributed caching system for storage‑compute separation, links compute frameworks to underlying storage (COS, CHDFS, COSN) and boosts big‑data and AI workloads by 2‑10× through transparent acceleration, robust master‑worker architecture, Raft‑based HA, tiered caching, and metadata optimizations, delivering up to 50% cost savings and 29% faster compute jobs.

Big Data Architecture · GooseFS · Raft consensus

0 likes · 18 min read

Tencent Architect

Sep 10, 2021 · Databases

Design and Advantages of a Cloud‑Native ClickHouse OLAP System

This article presents the architecture, key features, and operational benefits of a cloud‑native ClickHouse OLAP platform, describing how storage‑compute separation, a unified master node, and shared storage reduce cost, improve availability, and simplify management while remaining fully compatible with the open‑source ClickHouse ecosystem.

ClickHouse · Database Architecture · OLAP

0 likes · 18 min read

Programmer DD

Dec 11, 2019 · Big Data

Big Data Architecture Secrets: Storage-Compute Separation & Spark in Action

This article explores how enterprises can tackle the explosive growth of data by adopting modern big‑data architectures, including storage‑compute separation, data‑driven workflows, risk‑control frameworks, and real‑world Spark optimizations, offering practical guidance for scalable, high‑performance analytics.

Big Data · Data Architecture · Data-Driven

0 likes · 12 min read

MaGe Linux Operations

Dec 6, 2019 · Backend Development

How Xiaomi’s Talos Redefined Distributed Messaging for Massive Scale

Xiaomi’s Talos, a self‑developed distributed message queue, tackles the limitations of Kafka by separating storage and compute on HDFS, introducing stateless scaling, advanced consistency, partition delay allocation, and extensive performance and resource optimizations to support trillions of daily messages and multi‑tenant workloads.

Distributed Messaging · Performance Optimization · Storage Compute Separation

0 likes · 16 min read

UCloud Tech

Dec 4, 2019 · Big Data

How to Evolve Big Data Architectures for ZB‑Scale Analytics and Real‑World Use Cases

This article reviews the challenges of handling Zettabyte‑scale data, outlines practical big‑data processing architectures, discusses storage‑compute separation, data‑driven workflows, risk‑control frameworks, and shares concrete Spark implementations at MobTech, offering actionable insights for modern data engineers.

Data Architecture · Spark · Storage Compute Separation

0 likes · 13 min read

Tencent Cloud Developer

Dec 20, 2018 · Databases

CynosDB Architecture and Optimization: A PostgreSQL-Compatible NewSQL Database

CynosDB, Tencent’s PostgreSQL‑compatible NewSQL service, separates compute and storage, uses a log‑based distributed CynosStore with idempotent logs, offloads CRC checks, and implements async table extension, eliminating full‑page writes and dirty‑page flushing to deliver scalable, cost‑effective performance while preserving PostgreSQL features.

CynosDB · Database Architecture · Distributed storage

0 likes · 12 min read

Alibaba Cloud Developer

Dec 7, 2018 · Databases

How Alibaba Achieved Extreme Database Elasticity with Hybrid Cloud, Containers, and Storage‑Compute Separation

This article explains how Alibaba transformed its database infrastructure through hybrid‑cloud high‑performance ECS, container‑based multi‑instance deployment, and a user‑space storage‑compute separation architecture with RDMA, dramatically improving resource utilization, scaling speed, and cost efficiency for massive traffic spikes.

Databases · RDMA · Storage Compute Separation

0 likes · 15 min read

Alibaba Cloud Developer

Nov 26, 2018 · Databases

How Alibaba’s DBFS Achieved Storage‑Compute Separation for Massive 11.11 Sales

This article details Alibaba's journey from the 2017 pilot of storage‑compute separation to the 2018 large‑scale deployment of the DBFS user‑space file system, highlighting innovations such as zero‑copy I/O, RDMA integration, adaptive page cache, asynchronous I/O, atomic writes, online resize, and hardware‑software co‑design that enabled elastic, high‑performance database operations during the Double‑11 shopping festival.

DBFS · Database Performance · RDMA

0 likes · 15 min read
