Databases 24 min read

Exploring Presto SQL Engine (3) - Implementing WHERE Condition Filtering with Antlr and Dynamic Code Generation

The third article in the Presto SQL Engine series demonstrates how to implement WHERE‑clause filtering with Antlr, contrasting a direct AST‑traversal visitor approach—hampered by branch prediction and JVM inlining issues—with runtime bytecode generation using airlift.bytecode, which yields roughly three‑fold speed gains but adds complexity.

vivo Internet Technology

May 31, 2022

Exploring Presto SQL Engine (3) - Implementing WHERE Condition Filtering with Antlr and Dynamic Code Generation

This article is the third in the "Exploring Presto SQL Engine" series, focusing on implementing WHERE condition filtering using Antlr and comparing two implementation approaches: direct AST traversal versus dynamic bytecode generation.

Background: In massive data scenarios with strict response time requirements, simple data filtering using WHERE and JOIN presents many technical challenges. Boolean expression evaluation is fundamental to WHERE and JOIN processing in Presto. The article explores the implementation of WHERE conditions supporting AND, OR, and NOT logic operators, custom优先级 via parentheses, and field/function expressions.

Implementation Approach 1 - Direct AST Traversal: Using Antlr to define grammar rules for booleanExpression, implementing visitor methods for comparison and logical binary expressions, and traversing the SQL syntax tree using the Visitor pattern. The implementation uses a stack to store comparison results and performs recursive evaluation for AND/OR operations.

Performance Analysis: The article explains why direct parsing has performance issues: 1) CPU branch prediction is affected by conditional logic in data processing; 2) JVM method inlining is hindered by polymorphic calls for different data types and operators. A benchmark shows a 6x performance difference between normal execution and disabled inlining.

Implementation Approach 2 - Dynamic Code Generation: Using airlift.bytecode framework to generate bytecode at runtime. The approach treats WHERE conditions as binary trees and generates code by recursively traversing the expression tree. For leaf nodes (ComparisonExpression), it generates code to extract values from parameters and perform comparisons; for non-leaf nodes (LogicalBinaryExpression), it generates AND/OR operations using the stack.

Performance Results: Using JMH with 100,000 records, the code generation approach achieves ~211 ops/s compared to ~62 ops/s for direct parsing, demonstrating approximately 3x performance improvement.

Conclusion: Dynamic code generation significantly improves performance by eliminating branch prediction overhead and enabling better JVM optimization. However, developers should weigh the ROI considering the increased complexity in code readability and maintainability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization SQL parsing Airlift Bytecode Bytecode Generation Dynamic Code Generation Presto SQL Engine WHERE Condition Filtering

Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.