Understanding the SQL Execution Process in ClickHouse
This article explains in detail how ClickHouse processes a user‑submitted SQL query. It covers the server's request handling, parsing, query rewrite, optimization, interpreter execution, and result transmission, and walks through key source‑code snippets and the architectural components involved.
User Submits a Query: What Happens Inside ClickHouse?
When a client sends an SQL statement, the ClickHouse server receives the network packet, extracts the query text, sets up a context for the query, and hands the request to the handler for the protocol the connection arrived on (e.g., the native TCP protocol or HTTP).
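The handoff follows a per-connection factory pattern. The server registers one factory per listening port, and each accepted connection gets a freshly created handler. The sketch below models that pattern without real sockets. `ToyServer`, `LoggingHandler`, and `Connection` are hypothetical names. The ports 9000 (native TCP) and 8123 (HTTP) are simply ClickHouse's usual defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a server: no sockets, just "connections" carrying queries.
struct Connection {
    std::vector<std::string> queries;   // one query per packet
};

class IHandler {
public:
    virtual ~IHandler() = default;
    // Serves one connection; returns a record of the queries it handled.
    virtual std::vector<std::string> run(const Connection & conn) = 0;
};

class LoggingHandler : public IHandler {
public:
    explicit LoggingHandler(std::string protocol) : protocol_(std::move(protocol)) {}
    std::vector<std::string> run(const Connection & conn) override {
        std::vector<std::string> handled;
        for (const auto & q : conn.queries)
            handled.push_back(protocol_ + ": " + q);   // a real handler would execute q here
        return handled;
    }
private:
    std::string protocol_;
};

using HandlerFactory = std::function<std::unique_ptr<IHandler>()>;

class ToyServer {
public:
    // One factory per listening port, registered at startup.
    void addListener(unsigned port, HandlerFactory factory) { factories_[port] = std::move(factory); }

    // Simulates accepting a connection: a fresh handler is created just for it.
    std::vector<std::string> accept(unsigned port, const Connection & conn) {
        auto it = factories_.find(port);
        if (it == factories_.end()) throw std::runtime_error("nothing listens on this port");
        std::unique_ptr<IHandler> handler = it->second();
        ++handlers_created;
        return handler->run(conn);
    }

    std::size_t handlers_created = 0;

private:
    std::map<unsigned, HandlerFactory> factories_;
};
```

For example, register `[] { return std::make_unique<LoggingHandler>("tcp"); }` on port 9000 and then call `accept(9000, ...)`. Creating a handler per connection keeps per-connection state, such as buffers and the current query, out of the shared server object.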
Server Initialization (server.cpp)
int Server::main(const std::vector<std::string> & /*args*/){
    // Initialize the global context shared by all queries
    global_context = std::make_unique<Context>(Context::createGlobal());
    global_context->setApplicationType(Context::ApplicationType::SERVER);
    // Cache of ZooKeeper nodes that the server configuration may reference
    zkutil::ZooKeeperNodeCache main_config_zk_node_cache([&] { return global_context->getZooKeeper(); });
    // Create servers for the supported protocols (HTTP, HTTPS, TCP, etc.).
    // For the native TCP port: bind and listen on a socket, then hand it to a Poco
    // TCPServer, which creates a TCPHandler (via TCPHandlerFactory) per connection.
    create_server("tcp_port", [&](UInt16 port){
        Poco::Net::ServerSocket socket;
        // socket_bind_listen() resolves the address (make_socket_address(host, port)),
        // calls socket.bind(address, true) to bind with address reuse, and starts listening
        auto address = socket_bind_listen(socket, listen_host, port);
        servers.emplace_back(std::make_unique<Poco::Net::TCPServer>(
            new TCPHandlerFactory(*this), server_pool, socket, new Poco::Net::TCPServerParams));
    });
    // Start all servers
    for (auto & server : servers)
        server->start();
    return Application::EXIT_OK;
}
TCP Handler Workflow
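void TCPHandler::runImpl(){
    // Buffered reader and writer over this connection's socket
    in = std::make_shared<ReadBufferFromPocoSocket>(socket());
    out = std::make_shared<WriteBufferFromPocoSocket>(socket());
    while (true){
        // Read the next packet (query, data, cancel, ping, ...) from the client
        receivePacket();
        // Parse, analyze and plan the query; the result is a BlockIO
        state.io = executeQuery(state.query, *query_context, false, state.stage, may_have_embedded_data);
        // Dispatch on what the BlockIO contains (simplified; arguments omitted)
        if (state.io.out)                             // INSERT: data flows from the client into a table
            processInsertQuery();
        else if (state.io.pipeline.initialized())     // SELECT built on the processors pipeline
            processOrdinaryQueryWithProcessors();
        else if (state.io.in)                         // SELECT built on BlockInputStreams
            processOrdinaryQuery();
    }
}
Core Query Execution (executeQueryImpl)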
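static std::tuple<ASTPtr, BlockIO> executeQueryImpl(){
    // Top-level parser that accepts any supported statement
    ParserQuery parser(end, settings.enable_debug_queries);
    // SQL text -> AST; max_query_size limits how much of the text is parsed
    ASTPtr ast = parseQuery(parser, begin, end, "", max_query_size);
    // Choose the interpreter that matches the AST type (SELECT, INSERT, CREATE, ...)
    auto interpreter = InterpreterFactory::get(ast, context, stage);
    // Run it; the BlockIO carries the query's input and/or output streams
    BlockIO res = interpreter->execute();
    return std::make_tuple(ast, res);
}
The function builds a parser, converts the SQL text into an abstract syntax tree (AST), creates an interpreter based on the AST type, and runs the interpreter to obtain a BlockIO result.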
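The parse, AST, interpreter, BlockIO flow can be modeled in a few lines. This is a hypothetical miniature. `AST`, `BlockIO`, `getInterpreter`, and the two interpreter classes are simplified stand-ins invented for illustration. The real `BlockIO` holds streams rather than flags, and "parsing" here only inspects the leading keyword. The point is the dispatch: the interpreter is chosen from the AST type, and only the interpreter knows how to produce the result object.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical AST: just a statement kind plus the rest of the text.
struct AST {
    enum class Kind { Select, Insert } kind;
    std::string body;
};
using ASTPtr = std::shared_ptr<AST>;

// Stand-in for BlockIO: records which side of the I/O the query uses.
struct BlockIO {
    bool has_in = false;    // SELECT produces data to read
    bool has_out = false;   // INSERT consumes data to write
    std::string description;
};

ASTPtr parseQuery(const std::string & query) {
    auto starts_with = [&](const std::string & kw) { return query.rfind(kw, 0) == 0; };
    if (starts_with("SELECT ")) return std::make_shared<AST>(AST{AST::Kind::Select, query.substr(7)});
    if (starts_with("INSERT ")) return std::make_shared<AST>(AST{AST::Kind::Insert, query.substr(7)});
    throw std::runtime_error("Syntax error: " + query);
}

struct IInterpreter {
    virtual ~IInterpreter() = default;
    virtual BlockIO execute() = 0;
};

struct SelectInterpreter : IInterpreter {
    ASTPtr ast;
    explicit SelectInterpreter(ASTPtr a) : ast(std::move(a)) {}
    BlockIO execute() override { return {true, false, "read " + ast->body}; }
};

struct InsertInterpreter : IInterpreter {
    ASTPtr ast;
    explicit InsertInterpreter(ASTPtr a) : ast(std::move(a)) {}
    BlockIO execute() override { return {false, true, "write " + ast->body}; }
};

// Plays the role of InterpreterFactory::get: pick an interpreter by AST type.
std::unique_ptr<IInterpreter> getInterpreter(const ASTPtr & ast) {
    switch (ast->kind) {
        case AST::Kind::Select: return std::make_unique<SelectInterpreter>(ast);
        case AST::Kind::Insert: return std::make_unique<InsertInterpreter>(ast);
    }
    throw std::logic_error("unknown AST kind");
}

BlockIO executeQuery(const std::string & query) {
    ASTPtr ast = parseQuery(query);            // text -> AST
    return getInterpreter(ast)->execute();     // AST -> interpreter -> result
}
```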
SQL Processor Components
Query Parsing: Lexical analysis splits the input into tokens, and a recursive‑descent parser builds the AST from them. The entry point is tryParseQuery(), which tokenizes the text and passes the token stream to the parser (ParserQuery::parseImpl() for a complete statement).
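The two phases can be illustrated with a toy lexer and a recursive-descent parser for arithmetic expressions. This is not ClickHouse's grammar or its `Lexer`/`IParser` classes. `tokenize`, `Parser`, and `toString` are hypothetical, and the sketch shows only the general technique: one function per grammar rule, with lower-precedence rules calling higher-precedence ones.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Phase 1: lexical analysis turns characters into tokens.
struct Token {
    enum Type { Number, Ident, Op, End } type;
    std::string text;
};

std::vector<Token> tokenize(const std::string & s) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t start = i;
        if (std::isspace(c)) {
            ++i;
        } else if (std::isdigit(c)) {
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
            out.push_back({Token::Number, s.substr(start, i - start)});
        } else if (std::isalpha(c) || c == '_') {
            while (i < s.size() && (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_')) ++i;
            out.push_back({Token::Ident, s.substr(start, i - start)});
        } else if (std::string("+-*/()").find(s[i]) != std::string::npos) {
            out.push_back({Token::Op, std::string(1, s[i])});
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    out.push_back({Token::End, ""});
    return out;
}

// Phase 2: recursive descent, one function per rule:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := Number | Ident | '(' expr ')'
struct Node {
    std::string value;                  // operator, number, or identifier
    std::shared_ptr<Node> left, right;  // both null for leaves
};
using NodePtr = std::shared_ptr<Node>;

class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : toks_(std::move(tokens)) {}
    NodePtr parse() {
        NodePtr n = expr();
        if (toks_[pos_].type != Token::End) throw std::runtime_error("unexpected trailing tokens");
        return n;
    }
private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
    bool accept(const char * op) {
        if (toks_[pos_].type == Token::Op && toks_[pos_].text == op) { ++pos_; return true; }
        return false;
    }
    NodePtr expr() {   // loops make '+' and '-' left-associative
        NodePtr n = term();
        while (true) {
            if (accept("+")) n = std::make_shared<Node>(Node{"+", n, term()});
            else if (accept("-")) n = std::make_shared<Node>(Node{"-", n, term()});
            else return n;
        }
    }
    NodePtr term() {
        NodePtr n = factor();
        while (true) {
            if (accept("*")) n = std::make_shared<Node>(Node{"*", n, factor()});
            else if (accept("/")) n = std::make_shared<Node>(Node{"/", n, factor()});
            else return n;
        }
    }
    NodePtr factor() {
        const Token & t = toks_[pos_];
        if (t.type == Token::Number || t.type == Token::Ident) {
            ++pos_;
            return std::make_shared<Node>(Node{t.text, nullptr, nullptr});
        }
        if (accept("(")) {
            NodePtr n = expr();
            if (!accept(")")) throw std::runtime_error("expected ')'");
            return n;
        }
        throw std::runtime_error("expected an operand");
    }
};

// Prints the AST in prefix form, e.g. "(+ a b)".
std::string toString(const NodePtr & n) {
    if (!n->left) return n->value;
    return "(" + n->value + " " + toString(n->left) + " " + toString(n->right) + ")";
}
```

Parsing `a + 2 * (b + 1)` yields `(+ a (* 2 (+ b 1)))`. Precedence falls out of which rule calls which, not from explicit precedence tables.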
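ASTPtr tryParseQuery(){
    // Lexical analysis: split [pos, end) into tokens, reading at most max_query_size bytes
    Tokens tokens(pos, end, max_query_size);
    IParser::Pos token_iterator(tokens);
    // Syntax analysis: recursive descent over the token stream
    ASTPtr res;
    Expected expected;   // records what the parser expected, for error messages
    bool parse_res = parser.parse(token_iterator, res, expected);
    if (!parse_res)
        return nullptr;  // the real function also builds an error message at this point
    return res;
}
Query Rewrite (Logical Optimizer): Applies rule‑based transformations such as predicate push‑down, view expansion, constant folding, and expression simplification.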
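Two of the rules listed, constant folding and expression simplification, can be sketched as a bottom-up rewrite of an expression tree. The `Expr` type and `simplify` function are hypothetical. ClickHouse applies such rewrites to its own AST classes, and a production rule set must also respect NULL and overflow semantics, which this sketch ignores. For that reason it deliberately leaves `x * 0` alone.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Expr {
    enum Kind { Const, Column, Add, Mul } kind;
    long long value = 0;              // used by Const
    std::string name;                 // used by Column
    std::shared_ptr<Expr> lhs, rhs;   // used by Add and Mul
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr constant(long long v) { return std::make_shared<Expr>(Expr{Expr::Const, v, "", nullptr, nullptr}); }
ExprPtr column(std::string n) { return std::make_shared<Expr>(Expr{Expr::Column, 0, std::move(n), nullptr, nullptr}); }
ExprPtr binary(Expr::Kind k, ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{k, 0, "", std::move(a), std::move(b)}); }

bool isConst(const ExprPtr & e, long long v) { return e->kind == Expr::Const && e->value == v; }

// Rewrite children first, then apply the rules at this node.
ExprPtr simplify(const ExprPtr & e) {
    if (e->kind == Expr::Const || e->kind == Expr::Column) return e;
    ExprPtr a = simplify(e->lhs), b = simplify(e->rhs);
    bool is_add = e->kind == Expr::Add;
    // Constant folding: both operands are known before execution.
    if (a->kind == Expr::Const && b->kind == Expr::Const)
        return constant(is_add ? a->value + b->value : a->value * b->value);
    // Simplification: x + 0 -> x and x * 1 -> x.
    long long identity = is_add ? 0 : 1;
    if (isConst(a, identity)) return b;
    if (isConst(b, identity)) return a;
    return binary(e->kind, a, b);
}

std::string toString(const ExprPtr & e) {
    switch (e->kind) {
        case Expr::Const: return std::to_string(e->value);
        case Expr::Column: return e->name;
        case Expr::Add: return "(" + toString(e->lhs) + " + " + toString(e->rhs) + ")";
        case Expr::Mul: return "(" + toString(e->lhs) + " * " + toString(e->rhs) + ")";
    }
    return "";
}
```

For instance, `(1 + 2) * x + 0` simplifies to `(3 * x)`. The work is done once at planning time instead of once per row.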
Query Optimizer (Physical Optimizer): Generates an efficient execution plan, represented as a data‑flow graph of operators (e.g., scan, filter, join, aggregation).
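Query Executor: Executes the plan, reading data from storage engines and producing result blocks. Instead of the classic tuple‑at‑a‑time Volcano model, ClickHouse uses vectorized execution. In the BlockInputStream model shown in this article, operators still pull data from their inputs, but each call processes a whole block of column values rather than a single row. This amortizes per‑call overhead and lets tight loops over arrays run efficiently.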
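The vectorized idea can be sketched by treating a block as a set of equally long column arrays. The predicate is evaluated over a whole column in one loop to build a byte mask, and the same mask is then applied to every column. This mirrors the general approach of filtering columns with a mask, but `Block`, `greaterThan`, `filterBlock`, and `sumAmountsAbove` are hypothetical names, not ClickHouse APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A columnar block: both columns always have the same number of rows.
struct Block {
    std::vector<std::int64_t> user_id;
    std::vector<std::int64_t> amount;
    std::size_t rows() const { return user_id.size(); }
};

using Filter = std::vector<std::uint8_t>;   // 1 = keep the row, 0 = drop it

// Evaluate the predicate over a whole column in one simple loop,
// which compilers can often auto-vectorize.
Filter greaterThan(const std::vector<std::int64_t> & col, std::int64_t threshold) {
    Filter mask(col.size());
    for (std::size_t i = 0; i < col.size(); ++i)
        mask[i] = col[i] > threshold;
    return mask;
}

// Apply one mask to one column.
std::vector<std::int64_t> filterColumn(const std::vector<std::int64_t> & col, const Filter & mask) {
    std::vector<std::int64_t> out;
    out.reserve(col.size());
    for (std::size_t i = 0; i < col.size(); ++i)
        if (mask[i]) out.push_back(col[i]);
    return out;
}

// The same mask is applied to every column, keeping the rows aligned.
Block filterBlock(const Block & in, const Filter & mask) {
    return Block{filterColumn(in.user_id, mask), filterColumn(in.amount, mask)};
}

// Pull-style loop over blocks: per-call overhead is paid once per block, not once per row.
std::int64_t sumAmountsAbove(const std::vector<Block> & source, std::int64_t threshold) {
    std::int64_t total = 0;
    for (const Block & block : source) {
        Block filtered = filterBlock(block, greaterThan(block.amount, threshold));
        for (std::int64_t v : filtered.amount) total += v;
    }
    return total;
}
```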
Interpreter and Execution Pipeline
Each AST type has a corresponding interpreter, created by InterpreterFactory::get. For a SELECT query, InterpreterSelectQuery runs syntax analysis and builds an ExpressionActionsChain describing which expressions to compute at each step. It then assembles the pipeline by calling methods such as executeWhere, executeAggregation, and executeDistinct. Each call adds a stage (a BlockInputStream such as a filtering or aggregating stream) on top of the stages added before it.
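The way each `execute*` call stacks a new stage on the existing streams can be modeled with a small pull-based pipeline. Everything here is a hypothetical stand-in: `IStream`, `FilterStream`, `SumByKeyStream`, `DistinctStream`, and `buildPipeline`. Rows are plain integer vectors rather than columnar blocks, and an empty block signals the end of data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <utility>
#include <vector>

using Row = std::vector<std::int64_t>;
using Block = std::vector<Row>;

struct IStream {
    virtual ~IStream() = default;
    virtual Block read() = 0;   // an empty block means "no more data"
};
using StreamPtr = std::shared_ptr<IStream>;

struct SourceStream : IStream {            // stands in for the table scan
    std::vector<Block> blocks;
    std::size_t cursor = 0;
    explicit SourceStream(std::vector<Block> b) : blocks(std::move(b)) {}
    Block read() override { return cursor < blocks.size() ? blocks[cursor++] : Block{}; }
};

struct FilterStream : IStream {            // the kind of stage "executeWhere" adds
    StreamPtr input;
    std::function<bool(const Row &)> pred;
    FilterStream(StreamPtr in, std::function<bool(const Row &)> p) : input(std::move(in)), pred(std::move(p)) {}
    Block read() override {
        // Keep pulling until a row survives, so a fully filtered block
        // is not mistaken for the end of the stream.
        for (Block in = input->read(); !in.empty(); in = input->read()) {
            Block out;
            for (const Row & r : in) if (pred(r)) out.push_back(r);
            if (!out.empty()) return out;
        }
        return {};
    }
};

struct SumByKeyStream : IStream {          // "executeAggregation": GROUP BY r[0], SUM(r[1])
    StreamPtr input;
    bool done = false;
    explicit SumByKeyStream(StreamPtr in) : input(std::move(in)) {}
    Block read() override {
        if (done) return {};
        done = true;
        std::map<std::int64_t, std::int64_t> sums;   // must consume all input first
        for (Block in = input->read(); !in.empty(); in = input->read())
            for (const Row & r : in) sums[r[0]] += r[1];
        Block out;
        for (const auto & kv : sums) out.push_back(Row{kv.second});   // output only SUM(v)
        return out;
    }
};

struct DistinctStream : IStream {          // "executeDistinct"
    StreamPtr input;
    std::set<Row> seen;
    explicit DistinctStream(StreamPtr in) : input(std::move(in)) {}
    Block read() override {
        for (Block in = input->read(); !in.empty(); in = input->read()) {
            Block out;
            for (const Row & r : in) if (seen.insert(r).second) out.push_back(r);
            if (!out.empty()) return out;
        }
        return {};
    }
};

// Models SELECT DISTINCT sum(v) FROM t WHERE v > 0 GROUP BY k, with rows {k, v}.
// Stages are stacked in the same order as the calls in executeImpl.
StreamPtr buildPipeline(StreamPtr source) {
    StreamPtr s = std::move(source);
    s = std::make_shared<FilterStream>(s, [](const Row & r) { return r[1] > 0; });
    s = std::make_shared<SumByKeyStream>(s);
    s = std::make_shared<DistinctStream>(s);
    return s;
}

Block drain(const StreamPtr & s) {          // the consumer's read loop
    Block all;
    for (Block b = s->read(); !b.empty(); b = s->read())
        all.insert(all.end(), b.begin(), b.end());
    return all;
}
```

Only the outermost stream is read. Each `read()` call pulls from the stage beneath it, so building the pipeline and running it are separate steps.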
void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputStreamPtr & prepared_input){
    auto & query = getSelectQuery();
    // Determine which expressions must be computed before WHERE, before aggregation, etc.
    AnalysisResult expressions = analyzeExpressions(...);
    // Read the required columns from the table (or use prepared_input)
    executeFetchColumns(...);
    // Each call below wraps the pipeline's current streams in a new stage
    executeWhere(pipeline, expressions.before_where, ...);
    executeAggregation(pipeline, expressions.before_aggregation, ...);
    executeDistinct(pipeline, true, expressions.selected_columns);
}
The SyntaxAnalyzer, which runs before this step, applies rule‑based optimizations such as common sub‑expression elimination, replacement of scalar sub‑queries with their constant results, and predicate push‑down.
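Common sub-expression elimination can be sketched by giving every expression a canonical name and computing each name only once. In `SELECT a + b, (a + b) * 2`, for example, the sum is then evaluated a single time. The `plus(a, b)` naming imitates the style of ClickHouse column names, but `Node`, `canonicalName`, and `schedule` are hypothetical helpers, not the SyntaxAnalyzer's actual code.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string op;                             // function name, or a column/literal for leaves
    std::vector<std::shared_ptr<Node>> args;    // empty for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(std::string name) { return std::make_shared<Node>(Node{std::move(name), {}}); }
NodePtr call(std::string op, NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{std::move(op), {std::move(a), std::move(b)}});
}

// Structurally equal expressions get equal names, e.g. "plus(a, b)".
// (Recomputed recursively for simplicity; a real analyzer would cache it.)
std::string canonicalName(const NodePtr & n) {
    if (n->args.empty()) return n->op;
    return n->op + "(" + canonicalName(n->args[0]) + ", " + canonicalName(n->args[1]) + ")";
}

// Appends the computation steps needed for `n`, skipping anything whose
// canonical name is already available. Returns the name of the result.
std::string schedule(const NodePtr & n, std::set<std::string> & available, std::vector<std::string> & steps) {
    std::string name = canonicalName(n);
    if (available.count(name)) return name;       // already computed: reuse it
    for (const auto & arg : n->args) schedule(arg, available, steps);
    if (!n->args.empty()) steps.push_back(name);  // leaves are inputs, not steps
    available.insert(name);
    return name;
}
```

Scheduling `plus(a, b)` and then `multiply(plus(a, b), 2)` produces only two steps, because the second expression finds `plus(a, b)` already available.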
Insert Query Handling
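BlockIO InterpreterInsertQuery::execute(){
    // Resolve the target table
    StoragePtr table = getTable(query);
    // Output stream chain on top of the stream returned by table->write(); this stage
    // fills in default values for columns the INSERT does not mention
    BlockOutputStreamPtr out = std::make_shared<AddingDefaultBlockOutputStream>( ... );
    // An INSERT returns a BlockIO with only the output side set; the TCP handler
    // then writes the data blocks received from the client into res.out
    BlockIO res;
    res.out = std::move(out);
    return res;
}
Both reads and writes ultimately go through the table's storage engine (e.g., MergeTree) via its IStorage::read() and IStorage::write() interfaces.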
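The division of labor between the interpreter and the storage engine can be sketched with a minimal read/write interface. A wrapper stream fills in defaults for columns an INSERT omits, in the spirit of `AddingDefaultBlockOutputStream`. `IStorage`, `MemoryTable`, and `AddingDefaultsStream` are hypothetical simplifications, not ClickHouse's real interfaces. Here a block is a map from column name to values, and defaults are constants.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Column = std::vector<long long>;
using Block = std::map<std::string, Column>;   // column name -> values

struct IBlockOutputStream {
    virtual ~IBlockOutputStream() = default;
    virtual void write(const Block & block) = 0;
};
using OutputPtr = std::shared_ptr<IBlockOutputStream>;

// The two entry points every engine provides in this sketch.
struct IStorage {
    virtual ~IStorage() = default;
    virtual std::vector<Block> read() const = 0;
    virtual OutputPtr write() = 0;
};

// Toy in-memory engine that keeps every inserted block.
// The stream returned by write() refers to the table, so the table must outlive it.
class MemoryTable : public IStorage {
public:
    std::vector<Block> read() const override { return blocks_; }
    OutputPtr write() override {
        struct Sink : IBlockOutputStream {
            std::vector<Block> & target;
            explicit Sink(std::vector<Block> & t) : target(t) {}
            void write(const Block & b) override { target.push_back(b); }
        };
        return std::make_shared<Sink>(blocks_);
    }
private:
    std::vector<Block> blocks_;
};

// Wraps another output stream and adds missing columns filled with defaults.
class AddingDefaultsStream : public IBlockOutputStream {
public:
    AddingDefaultsStream(OutputPtr out, std::map<std::string, long long> defaults)
        : out_(std::move(out)), defaults_(std::move(defaults)) {}
    void write(const Block & block) override {
        Block full = block;
        std::size_t rows = block.empty() ? 0 : block.begin()->second.size();
        for (const auto & [name, value] : defaults_)
            if (!full.count(name)) full[name] = Column(rows, value);
        out_->write(full);
    }
private:
    OutputPtr out_;
    std::map<std::string, long long> defaults_;
};
```

Stacking `AddingDefaultsStream` on top of `table->write()` keeps the engine unaware of INSERT-level concerns. The engine only ever receives complete blocks.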
Result Transmission
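void TCPHandler::processOrdinaryQuery(){
    // Read result blocks from the query's input stream in a background thread
    AsynchronousBlockInputStream async_in(state.io.in);
    while (true){
        Block block = async_in.read();
        sendData(block);
        // An empty block marks the end of the result; it is sent too, so the client knows
        if (!block)
            break;
    }
}
void TCPHandler::sendData(const Block & block){
    initBlockOutput(block);               // lazily create the (optionally compressed) block writer
    state.block_out->write(block);        // serialize the block into the buffer
    state.maybe_compressed_out->next();   // flush the compression buffer, if any
    out->next();                          // flush the socket buffer to the network
}
The server streams the resulting blocks back to the client through the socket buffer.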
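Streaming results requires a way for the client to tell where one block ends and when the result is complete. The sketch below assumes a made-up framing, not ClickHouse's Native format: each block is a row count followed by its values, and a zero-row block marks the end of data. A byte vector stands in for the socket buffer, and integers are written in host byte order, which is only acceptable within a single process. `sendAll` and `receiveAll` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using Block = std::vector<std::int64_t>;
using Bytes = std::vector<unsigned char>;

void appendU64(Bytes & out, std::uint64_t v) {
    unsigned char buf[sizeof v];
    std::memcpy(buf, &v, sizeof v);
    out.insert(out.end(), buf, buf + sizeof v);
}

std::uint64_t readU64(const Bytes & in, std::size_t & pos) {
    if (pos + sizeof(std::uint64_t) > in.size()) throw std::runtime_error("truncated stream");
    std::uint64_t v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Server side: send every non-empty block, then the empty terminator block.
void sendAll(Bytes & wire, const std::vector<Block> & blocks) {
    for (const Block & b : blocks) {
        if (b.empty()) continue;   // an empty block mid-stream would end the result early
        appendU64(wire, b.size());
        for (std::int64_t v : b) appendU64(wire, static_cast<std::uint64_t>(v));
    }
    appendU64(wire, 0);            // end-of-data marker
}

// Client side: read blocks until the empty one arrives.
std::vector<Block> receiveAll(const Bytes & wire) {
    std::vector<Block> result;
    std::size_t pos = 0;
    for (std::uint64_t rows = readU64(wire, pos); rows != 0; rows = readU64(wire, pos)) {
        Block b;
        for (std::uint64_t i = 0; i < rows; ++i)
            b.push_back(static_cast<std::int64_t>(readU64(wire, pos)));
        result.push_back(std::move(b));
    }
    return result;
}
```

Because the terminator is itself a (zero-row) block, the client's read loop needs no separate end-of-result message to stop.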
Conclusion
Understanding ClickHouse’s end‑to‑end SQL processing pipeline, from network reception through parsing, logical rewrite, physical optimization, and execution to result delivery, helps database users write more efficient queries. It also helps kernel developers grasp the system’s architecture for performance tuning and feature development.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.