Tagged articles
24 articles
Alibaba Cloud Big Data AI Platform
Aug 13, 2025 · Big Data

How ODPS Evolved Over 15 Years into a Next‑Gen AI‑Ready Big Data Platform

This article chronicles ODPS's 15‑year journey from its exploratory beginnings to a modern, AI‑enabled big data platform, detailing its four development phases, architectural layers, SQL engine upgrades, real‑time processing, lakehouse integration, and the new Data+AI capabilities offered by MaxCompute and DataWorks.

AI integration · Big Data · DataWorks
0 likes · 12 min read
Baidu Geek Talk
Jun 24, 2024 · Big Data

Accelerating Spark with ClickHouse Native Techniques: Design, Implementation, and Performance Evaluation

The paper presents a Spark acceleration framework that replaces Java‑based task operators with a ClickHouse native library, converting plans via Protobuf and JNI, leveraging columnar storage, SIMD and JIT to achieve up to 3× speed‑up on TPC‑DS workloads while providing fallback mechanisms to ensure no performance loss.

Big Data · Native Acceleration · SQL Engine
0 likes · 31 min read
Huawei Cloud Developer Alliance
Apr 24, 2024 · Databases

How Does GaussDB Execute SQL? Inside Its Engine and Optimization

This article explains GaussDB's system architecture, the roles of GTM, CN, and DN, the SQL and storage engines, parsing stages, rule‑based and cost‑based optimizers (including AI‑based techniques), distributed query plans, execution operators, and the parallel execution framework that together enable high‑performance SQL processing in a cloud‑native distributed database.

Database Architecture · GaussDB · SQL Engine
0 likes · 15 min read
DataFunSummit
Feb 26, 2024 · Big Data

Building a New Lakehouse Analytics Paradigm with StarRocks and Paimon

This article introduces a new lakehouse analytics paradigm by combining StarRocks and Paimon, covering the evolution of data lake technologies, key integration scenarios, core technical mechanisms such as JNI connectors, materialized views, and future roadmap for enhanced lakehouse capabilities.

Analytics · Big Data · Data Lake
0 likes · 16 min read
dbaplus Community
Jan 21, 2024 · Databases

Why SQLite Dominates Everywhere: Origins, Architecture, and Secrets

This article explores why SQLite is the world’s most ubiquitous database, tracing its birth from a Navy project, its early implementation atop GDBM, the layered architecture that processes SQL statements, the transition to a B‑tree engine, and the creator’s philosophy of self‑contained software.

Embedded Database · GDBM · Richard Hipp
0 likes · 10 min read
DataFunTalk
Nov 12, 2023 · Big Data

MaxCompute Incremental Update Architecture, Intelligent Materialized Views, and Adaptive Execution Optimizations

This article presents a comprehensive overview of MaxCompute's near‑real‑time incremental update and processing architecture, the design of Transactional Table 2.0, intelligent materialized view evolution and recommendation, as well as multi‑level adaptive execution optimizations for the SQL engine, illustrating how these innovations improve efficiency, cost, and scalability for large‑scale data workloads.

Adaptive Execution · MaxCompute · SQL Engine
0 likes · 20 min read
DataFunSummit
Jul 18, 2023 · Databases

Apache Doris Data Lake Federation Features Overview

This article introduces Apache Doris’s data lake federation capabilities, detailing its lake‑warehouse integration design, supported data sources such as Hive, Iceberg, Hudi, and Elasticsearch, performance optimizations for metadata and file access, case studies, community roadmap, and Q&A on replacing Presto.

Apache Doris · Data Lake · SQL Engine
0 likes · 21 min read
Big Data Technology Architecture
Jul 4, 2023 · Databases

Apache Doris 2.0‑beta Release: New Query Optimizer, Pipeline Engine, Workload Management and Performance Enhancements

Apache Doris 2.0‑beta, released on July 3, 2023, introduces a modern Cascades‑based query optimizer, a data‑driven pipeline execution engine, fine‑grained workload groups, enhanced memory management, partial‑column updates, compute nodes, cold‑hot tiering and cross‑cluster replication, delivering up to tenfold speedups and significant cost reductions for real‑time analytics.

Apache Doris · Pipeline Execution · SQL Engine
0 likes · 24 min read
Alimama Tech
Feb 15, 2023 · Big Data

Dolphin: Alibaba's Hyper‑Converged Multi‑Modal Big Data Engine Overview

Dolphin, Alibaba’s hyper‑converged multi‑modal big‑data engine, unifies OLAP, AI, streaming, and batch workloads on a decoupled compute‑storage MPP foundation, offering a Dolphin SQL layer, advanced bitmap/GroupTable/AFile indexes, intelligent materialization, and one‑write‑multiple‑read storage that cuts costs over 70% while delivering sub‑millisecond queries on trillion‑row datasets.

Big Data · OLAP · SQL Engine
0 likes · 14 min read
DaTaobao Tech
Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines suffer from low productivity and inconsistent data quality, and introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, leveraging Janino code generation, multi‑backend state storage, and snapshot‑join semantics to boost performance and simplify development.

Batch Processing · Calcite · Flink
0 likes · 16 min read
DataFunSummit
Mar 21, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, and Future Roadmap

This article explains how Apache Doris adopts CPU‑level vectorization and columnar storage to boost query performance, details the design and current status of its vectorized engine, and outlines future work such as JOIN acceleration, storage‑layer vectorization, import optimization, and extensive SQL function support.

Apache Doris · Columnar Storage · SIMD
0 likes · 21 min read
Bilibili Tech
Feb 18, 2022 · Big Data

Evolution of Bilibili's Data Retrieval Services and Lakehouse Architecture

Bilibili’s data retrieval journey progressed from a fragmented, chimney‑style pipeline to a unified Flink‑based service layer with the Ark construction system and Akuya SQL engine, and finally to an Iceberg‑driven lakehouse that eliminates data duplication, streamlines cross‑engine optimization, and offers platformized, low‑latency analytics.

Big Data · Bilibili · Data Retrieval
0 likes · 14 min read
DataFunTalk
Jan 27, 2022 · Big Data

Kyuubi: NetEase’s Open‑Source Multi‑Tenant SQL Engine for Large‑Scale Data Processing

This article introduces Kyuubi, the first NetEase project contributed to the Apache Foundation, describing its core features, multi‑tenant architecture, Spark‑based execution engine, cloud‑native capabilities, and real‑world use cases within NetEase’s data‑warehouse, ad‑hoc, and internal systems, along with performance gains and community resources.

Apache · Big Data · Kyuubi
0 likes · 23 min read
NetEase Game Operations Platform
May 22, 2021 · Big Data

Comprehensive Overview and Source Code Analysis of NetEase Spark Kyuubi

This article systematically introduces NetEase Kyuubi, an open‑source high‑performance JDBC and SQL execution engine built on Apache Spark, covering its background, core architecture, service discovery, session and operation management, startup processes, and key source‑code implementations with detailed code examples.

Apache Thrift · Big Data · Kyuubi
0 likes · 47 min read
DataFunTalk
Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Distributed Systems · Druid Connector
0 likes · 17 min read
ITPUB
Oct 10, 2020 · Big Data

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Didi’s three‑year journey with Presto transformed it into the company’s primary ad‑hoc and Hive‑SQL acceleration engine, serving over 6 000 users, processing 2‑3 PB of HDFS data daily, and achieving major gains in stability, performance, cost, and usability through extensive architectural tweaks, resource isolation, connector extensions, and monitoring enhancements.

Big Data · Cluster Management · Druid Connector
0 likes · 18 min read
Dada Group Technology
Apr 15, 2020 · Big Data

Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

This article details Dada Group's development of the Dada Flink SQL engine, describing its background, architecture, parser design, dimension‑table join strategies, numerous enhancements such as HA support, Kafka keyword handling, metadata integration, Redis and ClickHouse sinks, BINLOG simplification, and future migration plans toward Flink 1.10.

Flink · Real‑Time Computing · SQL Engine
0 likes · 12 min read
Alibaba Cloud Developer
Mar 27, 2020 · Databases

How OceanBase Delivers Cloud‑Native Distributed Relational Database Performance and Availability

This article explains OceanBase's public‑cloud deployment, its unique architecture without a central controller, horizontal scaling via partition groups, LSM‑Tree storage design, advanced SQL engine features, ACID‑plus‑Availability guarantees, and real‑world performance records, illustrating why it suits high‑availability financial workloads.

Cloud Native · LSM‑Tree · OceanBase
0 likes · 12 min read
360 Zhihui Cloud Developer
Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · SQL Engine
0 likes · 12 min read
Alibaba Cloud Developer
Apr 18, 2019 · Big Data

How MaxCompute Evolved: 10 Years of Big Data Innovation at Alibaba

This article reviews a decade of MaxCompute development, covering its origins, core technologies, performance gains, ecosystem integration, intelligent features, competitive positioning, and commercialization, while highlighting the platform's role as Alibaba's central big‑data compute engine.

AI integration · Big Data · MaxCompute
0 likes · 21 min read
360 Tech Engineering
Dec 28, 2018 · Databases

Quicksql: A Unified, Secure, and Fast Cross-Data-Source SQL Query Engine

Quicksql is an open‑source unified SQL query engine that simplifies and secures cross‑data‑source queries by providing a consistent ANSI‑based language, automatic engine selection, and support for mixed queries across Hive, MySQL, Elasticsearch, and other platforms, reducing learning and integration costs.

Data Integration · SQL Engine · Unified query
0 likes · 6 min read