Tag

SQL engine


Baidu Geek Talk
Jun 24, 2024 · Big Data

Accelerating Spark with ClickHouse Native Techniques: Design, Implementation, and Performance Evaluation

The paper presents a Spark acceleration framework that replaces Java‑based task operators with a ClickHouse native library, converting plans via Protobuf and JNI, leveraging columnar storage, SIMD and JIT to achieve up to 3× speed‑up on TPC‑DS workloads while providing fallback mechanisms to ensure no performance loss.

Big Data · ClickHouse · Native Acceleration
31 min read

DataFunSummit
Feb 26, 2024 · Big Data

Building a New Lakehouse Analytics Paradigm with StarRocks and Paimon

This article introduces a new lakehouse analytics paradigm that combines StarRocks and Paimon, covering the evolution of data lake technologies, key integration scenarios, core technical mechanisms such as JNI connectors and materialized views, and the future roadmap for enhanced lakehouse capabilities.

Big Data · Lakehouse · Paimon
16 min read

DataFunTalk
Nov 12, 2023 · Big Data

MaxCompute Incremental Update Architecture, Intelligent Materialized Views, and Adaptive Execution Optimizations

This article presents a comprehensive overview of MaxCompute's near‑real‑time incremental update and processing architecture, the design of Transactional Table 2.0, intelligent materialized view evolution and recommendation, as well as multi‑level adaptive execution optimizations for the SQL engine, illustrating how these innovations improve efficiency, cost, and scalability for large‑scale data workloads.

Adaptive Execution · Big Data · MaxCompute
20 min read

DataFunSummit
Jul 18, 2023 · Databases

Apache Doris Data Lake Federation Features Overview

This article introduces Apache Doris’s data lake federation capabilities, detailing its lake‑warehouse integration design, supported data sources such as Hive, Iceberg, Hudi, and Elasticsearch, performance optimizations for metadata and file access, case studies, community roadmap, and Q&A on replacing Presto.

Apache Doris · Big Data · SQL engine
21 min read

Big Data Technology Architecture
Jul 4, 2023 · Databases

Apache Doris 2.0‑beta Release: New Query Optimizer, Pipeline Engine, Workload Management and Performance Enhancements

Apache Doris 2.0‑beta, released on July 3, 2023, introduces a modern Cascades‑based query optimizer, a data‑driven pipeline execution engine, fine‑grained workload groups, enhanced memory management, partial‑column updates, compute nodes, cold‑hot tiering and cross‑cluster replication, delivering up to tenfold speedups and significant cost reductions for real‑time analytics.

Apache Doris · Database · Performance
24 min read

Alimama Tech
Feb 15, 2023 · Big Data

Dolphin: Alibaba's Hyper‑Converged Multi‑Modal Big Data Engine Overview

Dolphin, Alibaba’s hyper‑converged multi‑modal big‑data engine, unifies OLAP, AI, streaming, and batch workloads on a decoupled compute‑storage MPP foundation, offering a Dolphin SQL layer, advanced bitmap/GroupTable/AFile indexes, intelligent materialization, and one‑write‑multiple‑read storage that cuts costs over 70% while delivering sub‑millisecond queries on trillion‑row datasets.

AI · Big Data · Indexing
14 min read

DataFunTalk
Aug 21, 2022 · Databases

Deep Dive into OpenMLDB Architecture: Millisecond‑Level Real‑Time Feature Computation Engine

This article provides a comprehensive technical overview of OpenMLDB, covering its overall architecture, online real‑time SQL execution and storage engines, core data structures, pre‑aggregation techniques, and performance test results that demonstrate millisecond‑level latency for feature computation.

Distributed Database · OpenMLDB · Performance Testing
13 min read

DaTaobao Tech
Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines suffer from low productivity and inconsistent data quality, and introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, leveraging Janino code generation, multi‑backend state storage, and snapshot‑join semantics to boost performance and simplify development.

Calcite · Data Warehouse · Flink
16 min read

DataFunSummit
Mar 21, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, and Future Roadmap

This article explains how Apache Doris adopts CPU‑level vectorization and columnar storage to boost query performance, details the design and current status of its vectorized engine, and outlines future work such as JOIN acceleration, storage‑layer vectorization, import optimization, and extensive SQL function support.

Apache Doris · Performance Optimization · SIMD
21 min read

Bilibili Tech
Feb 18, 2022 · Big Data

Evolution of Bilibili's Data Retrieval Services and Lakehouse Architecture

Bilibili’s data retrieval journey progressed from a fragmented, siloed ("chimney‑style") pipeline to a unified Flink‑based service layer with the Ark construction system and Akuya SQL engine, and finally to an Iceberg‑driven lakehouse that eliminates data duplication, streamlines cross‑engine optimization, and offers platformized, low‑latency analytics.

Big Data · Bilibili · Data Retrieval
14 min read

DataFunTalk
Jan 27, 2022 · Big Data

Kyuubi: NetEase’s Open‑Source Multi‑Tenant SQL Engine for Large‑Scale Data Processing

This article introduces Kyuubi, the first NetEase project contributed to the Apache Foundation, describing its core features, multi‑tenant architecture, Spark‑based execution engine, cloud‑native capabilities, and real‑world use cases within NetEase’s data‑warehouse, ad‑hoc, and internal systems, along with performance gains and community resources.

Apache · Big Data · Kyuubi
23 min read

Architecture Digest
Jul 25, 2021 · Big Data

Design and Architecture of Hera Data Service for Unified Data Access at Vipshop

The article details the background, architecture, core features, scheduling mechanisms, Lisp‑based query DSL, and Alluxio integration of Vipshop's self‑developed Hera data service, illustrating how it unifies multi‑engine data access, improves SLA, and accelerates large‑scale crowd computing tasks.

Alluxio · Big Data · Data Service
21 min read

NetEase Game Operations Platform
May 22, 2021 · Big Data

Comprehensive Overview and Source Code Analysis of NetEase Spark Kyuubi

This article systematically introduces NetEase Kyuubi, an open‑source high‑performance JDBC and SQL execution engine built on Apache Spark, covering its background, core architecture, service discovery, session and operation management, startup processes, and key source‑code implementations with detailed code examples.

Apache Thrift · Big Data · Kyuubi
47 min read

DataFunTalk
Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Distributed Systems · Druid Connector
17 min read

Dada Group Technology
Apr 15, 2020 · Big Data

Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

This article details Dada Group's development of the Dada Flink SQL engine, describing its background, architecture, parser design, and dimension‑table join strategies, along with enhancements such as HA support, Kafka keyword handling, metadata integration, Redis and ClickHouse sinks, and BINLOG simplification, and its planned migration to Flink 1.10.

Big Data · ClickHouse · Flink
12 min read

360 Zhihui Cloud Developer
Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · SQL engine · data integration
12 min read

360 Tech Engineering
Dec 28, 2018 · Databases

Quicksql: A Unified, Secure, and Fast Cross-Data-Source SQL Query Engine

Quicksql is an open‑source unified SQL query engine that simplifies and secures cross‑data‑source queries by providing a consistent ANSI‑based language, automatic engine selection, and support for mixed queries across Hive, MySQL, Elasticsearch, and other platforms, reducing learning and integration costs.

SQL engine · cross-data-source · data integration
6 min read