Tagged articles
26 articles
StarRocks
Dec 11, 2025 · Databases

How StarRocks Redesigns Bulk Import to Cut Small Files and Boost Throughput

This article explains how StarRocks mitigates the hidden risks of massive one‑time data imports in a storage‑compute separated architecture by redesigning the write path to spill to local disk, merge centrally, and write to object storage, resulting in fewer small files, higher write throughput, and more stable query performance.

Bulk Import · S3 · StarRocks
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Apr 27, 2025 · Big Data

Scaling Property Services: StarRocks‑Powered Storage‑Compute Separation for 8000+ Communities

Facing a flood of data from over 8,000 communities, the Bifeng service team migrated from a monolithic storage‑compute architecture to a StarRocks‑based storage‑compute separation solution, achieving lower costs, higher resource utilization, faster queries, and improved SLA across their property management platform.

Big Data · Data Warehouse · Infrastructure Migration
0 likes · 11 min read
DataFunTalk
Feb 20, 2025 · Big Data

From Integrated Storage‑Compute to Decoupled Architecture: Practical Exploration of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue in Big Data Platforms

This article analyzes the transition from a tightly coupled storage‑compute architecture to a decoupled model, detailing how Kubernetes, Kyuubi, Celeborn, Blaze, and Hue together solve resource inefficiencies, improve scalability, and boost query performance in modern big‑data environments.

Big Data · Blaze · Kubernetes
0 likes · 16 min read
dbaplus Community
Jan 5, 2025 · Big Data

How DeWu Halved Observability Costs Using AutoMQ and ClickHouse Storage‑Compute Separation

DeWu’s observability platform faced scalability, cost, and operational challenges from petabyte‑scale trace data, prompting a shift to a storage‑compute separated architecture that leverages AutoMQ’s Kafka‑compatible service and ClickHouse Enterprise’s SharedMergeTree engine, ultimately achieving up to 50% cost reduction and five‑fold cold‑read performance gains.

AutoMQ · Big Data · ClickHouse
0 likes · 20 min read
DataFunSummit
Dec 21, 2024 · Big Data

Big Data Implementation Practices and Architecture in a Foreign Bank

This article shares a foreign bank's big data implementation journey, covering background and goals, overall planning and architecture, practical insights, phased rollout, data governance, security, and Q&A, illustrating how a unified data platform, storage‑compute separation, and AI‑driven tools drive business innovation.

AI · Banking · Data Architecture
0 likes · 19 min read
Big Data Technology & Architecture
Oct 21, 2024 · Big Data

Key New Features of Apache Doris 3.0: Storage‑Compute Separation, Lakehouse Integration, Semi‑Structured Data, ETL Enhancements, Materialized Views, and Java UDTF

Apache Doris 3.0 introduces storage‑compute separation, native lakehouse write‑back, optimized Variant handling for semi‑structured data, stronger ETL transaction support, enhanced multi‑table materialized views, and Java UDTF capabilities, providing developers with more flexible, cost‑effective, and high‑performance analytics solutions.

Apache Doris · Data Warehouse · ETL
0 likes · 7 min read
ITPUB
Sep 11, 2024 · Big Data

Is Storage‑Compute Separation the Future? Unpacking the Lakehouse Debate

The article examines the concepts of storage‑compute separation and the lake‑warehouse (lakehouse) model, tracing their evolution from physical Hadoop clusters to containerized compute and object storage, and argues that true separation requires MPP systems to adopt open standards, effectively merging lake and warehouse architectures.

Big Data Architecture · Hadoop · Lakehouse
0 likes · 7 min read
Alibaba Cloud Big Data AI Platform
Jun 6, 2024 · Databases

How StarRocks Redefines Lakehouse Architecture with Ultra-Fast Unified Analytics

StarRocks combines extreme query speed and a unified architecture to deliver a lakehouse solution that separates storage and compute, supports multi‑warehouse resource isolation, offers Trino compatibility, materialized‑view acceleration, and cost‑effective scaling, making it suitable for real‑time analytics, data‑lake queries, and traditional OLAP workloads.

Big Data · Lakehouse · Real-time analytics
0 likes · 23 min read
Baidu Geek Talk
Apr 3, 2024 · Databases

Cloud-Native Database: Market Trends, Technical Evolution and Accessibility

Cloud-native databases, now backed by major providers and projected to power 95% of digital business by 2025, are rapidly evolving from traditional systems into flexible, Kubernetes-compatible, MySQL/PostgreSQL-compatible, HTAP-enabled, serverless platforms. Baidu’s GaiaDB exemplifies the trend with advanced consensus, low-latency networking, columnar storage, and AI-driven operations, while enterprises weigh adoption benefits against deployment, maturity, and sustainability concerns.

AI4DB · GaiaDB · HTAP
0 likes · 15 min read
DataFunSummit
Feb 1, 2024 · Databases

StarRocks 3.0 Storage‑Compute Separation Architecture: Design, Implementation, and Evaluation

This article explains the storage‑compute separation architecture introduced in StarRocks 3.0, presents industry case studies, details the design of StarOS and compute nodes, discusses technical challenges and key techniques, and evaluates cost, reliability, elasticity, and performance through benchmarks and user feedback.

Cloud Native · Performance Evaluation · StarRocks
0 likes · 11 min read
DataFunTalk
Jan 24, 2024 · Databases

Kuaishou Graph Database Storage‑Compute Separation Architecture and Its Application in Real‑Time Recommendation

This article presents Kuaishou's graph database storage‑compute separation architecture, detailing its application in real‑time recommendation scenarios, core requirements of cost, performance and usability, the layered service design, memory‑compact models, edge structures, snapshot isolation, and key performance optimizations such as Share‑Nothing and columnar data flow.

Storage Compute Separation · graph database · real-time recommendation
0 likes · 11 min read
StarRocks
Dec 22, 2023 · Databases

What’s New in StarRocks 3.2? Key Features and Usability Enhancements

StarRocks 3.2, released on December 21, 2023, introduces major usability upgrades—including optimized random bucketing, fast schema evolution, PIPE import, HTTP SQL API, runtime profiling, enhanced storage‑compute separation, data lake analysis, and advanced materialized view capabilities—while refining existing features such as indexing, catalog support, and export syntax.

Release Notes · StarRocks · Storage Compute Separation
0 likes · 15 min read
Sohu Tech Products
Nov 1, 2023 · Databases

Engineering Practices of Douyin's Vector Database: From Retrieval Challenges to Cloud‑Native Solutions

Douyin tackled vector‑retrieval challenges by optimizing HNSW and creating a high‑performance IVF algorithm, implementing custom scalar quantization, SIMD acceleration, and a DSL‑driven engine that merges filtering with search, then built a cloud‑native, storage‑compute‑separated vector database (VikingDB) delivering sub‑10 ms latency, real‑time updates, multi‑tenant support, and secure, scalable retrieval for LLM‑driven applications.

ANN · LLM integration · Storage Compute Separation
0 likes · 18 min read
DataFunTalk
Sep 17, 2023 · Cloud Native

REDck: A Cloud‑Native Real‑Time Data Warehouse Built on ClickHouse

REDck is a cloud‑native, storage‑compute separated real‑time OLAP data warehouse derived from ClickHouse that addresses scalability, operational cost, and reliability challenges through a unified metadata service, object‑storage optimizations, multi‑level caching, distributed task scheduling, and two‑phase commit transactions.

ClickHouse · Distributed Transactions · Real-time OLAP
0 likes · 18 min read
Xiaohongshu Tech REDtech
Sep 6, 2023 · Databases

REDck: A Cloud‑Native Real‑Time OLAP Data Warehouse Built on ClickHouse

REDck is a cloud‑native, real‑time OLAP data warehouse built on ClickHouse that adds elastic compute and storage scaling, object‑storage optimizations, multi‑level caching, and exactly‑once ingestion, delivering petabyte‑scale interactive analytics with ten‑fold CPU efficiency, ten‑fold cost reduction, and 99.9% availability.

Big Data · ClickHouse · Real-time OLAP
0 likes · 21 min read
Volcano Engine Developer Services
Sep 21, 2022 · Big Data

Unlocking Enterprise Data Lakehouse: Trends, Challenges, and Volcano Engine EMR Solutions

This article explores the open‑source lakehouse trend, outlines the architectural features of Volcano Engine EMR, examines key challenges of building enterprise‑grade data lakehouses, and presents best‑practice case studies demonstrating how EMR enables scalable, real‑time analytics, storage‑compute separation, and seamless integration with modern big‑data engines.

Data Lakehouse · EMR · Storage Compute Separation
0 likes · 22 min read
Baidu Geek Talk
Jul 1, 2022 · Big Data

Evolution of Data Platform Technology: From Data Warehouse to Lakehouse Architecture

The article traces the evolution of data platforms from early data warehouses (using schema‑on‑write, columnar storage, and MPP engines) to data lakes that retain raw data with schema‑on‑read, and finally to lakehouse architectures that merge the two, offering unified metadata, versioning, and support for BI, big‑data, AI, and HPC workloads.

Data Architecture · Lakehouse · OLAP
0 likes · 25 min read
DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · ClickHouse · Cloud Native
0 likes · 15 min read
Tencent Cloud Developer
Feb 28, 2022 · Big Data

GooseFS: Distributed Caching System for Storage-Compute Separation Architecture

GooseFS, Tencent Cloud’s distributed caching system for storage‑compute separation, links compute frameworks to underlying storage (COS, CHDFS, COSN) and boosts big‑data and AI workloads by 2‑10× through transparent acceleration, robust master‑worker architecture, Raft‑based HA, tiered caching, and metadata optimizations, delivering up to 50% cost savings and 29% faster compute jobs.

Big Data Architecture · GooseFS · Raft consensus
0 likes · 18 min read
Tencent Architect
Sep 10, 2021 · Databases

Design and Advantages of a Cloud‑Native ClickHouse OLAP System

This article presents the architecture, key features, and operational benefits of a cloud‑native ClickHouse OLAP platform, describing how storage‑compute separation, a unified master node, and shared storage reduce cost, improve availability, and simplify management while remaining fully compatible with the open‑source ClickHouse ecosystem.

ClickHouse · Database Architecture · Distributed Systems
0 likes · 18 min read
Programmer DD
Dec 11, 2019 · Big Data

Big Data Architecture Secrets: Storage-Compute Separation & Spark in Action

This article explores how enterprises can tackle the explosive growth of data by adopting modern big‑data architectures, including storage‑compute separation, data‑driven workflows, risk‑control frameworks, and real‑world Spark optimizations, offering practical guidance for scalable, high‑performance analytics.

Big Data · Data Architecture · Data-driven
0 likes · 12 min read
MaGe Linux Operations
Dec 6, 2019 · Backend Development

How Xiaomi’s Talos Redefined Distributed Messaging for Massive Scale

Xiaomi’s Talos, a self‑developed distributed message queue, tackles the limitations of Kafka by separating storage and compute on HDFS, introducing stateless scaling, advanced consistency, partition delay allocation, and extensive performance and resource optimizations to support trillions of daily messages and multi‑tenant workloads.

Distributed Messaging · Performance Optimization · Storage Compute Separation
0 likes · 16 min read
UCloud Tech
Dec 4, 2019 · Big Data

How to Evolve Big Data Architectures for ZB‑Scale Analytics and Real‑World Use Cases

This article reviews the challenges of handling Zettabyte‑scale data, outlines practical big‑data processing architectures, discusses storage‑compute separation, data‑driven workflows, risk‑control frameworks, and shares concrete Spark implementations at MobTech, offering actionable insights for modern data engineers.

Data Architecture · Spark · Storage Compute Separation
0 likes · 13 min read
Tencent Cloud Developer
Dec 20, 2018 · Databases

CynosDB Architecture and Optimization: A PostgreSQL-Compatible NewSQL Database

CynosDB, Tencent’s PostgreSQL‑compatible NewSQL service, separates compute and storage, uses a log‑based distributed CynosStore with idempotent logs, offloads CRC checks, and implements async table extension, eliminating full‑page writes and dirty‑page flushing to deliver scalable, cost‑effective performance while preserving PostgreSQL features.

CynosDB · Database Architecture · Log System Optimization
0 likes · 12 min read
Alibaba Cloud Developer
Dec 7, 2018 · Databases

How Alibaba Achieved Extreme Database Elasticity with Hybrid Cloud, Containers, and Storage‑Compute Separation

This article explains how Alibaba transformed its database infrastructure through hybrid‑cloud high‑performance ECS, container‑based multi‑instance deployment, and a user‑space storage‑compute separation architecture with RDMA, dramatically improving resource utilization, scaling speed, and cost efficiency for massive traffic spikes.

RDMA · Storage Compute Separation · cloud-native
0 likes · 15 min read
Alibaba Cloud Developer
Nov 26, 2018 · Databases

How Alibaba’s DBFS Achieved Storage‑Compute Separation for Massive 11.11 Sales

This article details Alibaba's journey from the 2017 pilot of storage‑compute separation to the 2018 large‑scale deployment of the DBFS user‑space file system, highlighting innovations such as zero‑copy I/O, RDMA integration, adaptive page cache, asynchronous I/O, atomic writes, online resize, and hardware‑software co‑design that enabled elastic, high‑performance database operations during the Double‑11 shopping festival.

DBFS · Database Performance · RDMA
0 likes · 15 min read