
Understanding the SQL Execution Process in ClickHouse

This article explains in detail how ClickHouse processes a user‑submitted SQL query, covering the server’s request handling, parsing, query rewrite, optimization, interpreter execution, and result transmission, while illustrating key source code snippets and architectural components.

Big Data Technology & Architecture

User Submits a Query: What Happens Inside ClickHouse?

When a client sends an SQL statement, the ClickHouse server reads the packet from the network, extracts the query text, and sets up a query context. The handler for the protocol that accepted the connection (for example, the TCP handler) then drives the rest of the execution. The servers for each protocol are created once, at startup.

Server Initialization (server.cpp)

int Server::main(){
    // Initialize global context
    global_context = std::make_unique<Context>(Context::createGlobal());
    global_context->setApplicationType(Context::ApplicationType::SERVER);

    // ZooKeeper initialization
    zkutil::ZooKeeperNodeCache main_config_zk_node_cache([&] { return global_context->getZooKeeper(); });

    // Bind listening port
    auto address = make_socket_address(host, port);
    socket.bind(address, true);

    // Create servers for supported protocols (HTTP, HTTPS, TCP, etc.)
    create_server("tcp_port", [&](UInt16 port){
        Poco::Net::ServerSocket socket;
        auto address = socket_bind_listen(socket, listen_host, port);
        servers.emplace_back(std::make_unique<Poco::Net::TCPServer>(
            new TCPHandlerFactory(*this), server_pool, socket, new Poco::Net::TCPServerParams));
    });

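    // Start all servers
    for (auto & server : servers)
        server->start();

    // Waiting for the termination request and shutdown handling are omitted in this excerpt.
    return Application::EXIT_OK;
}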

TCP Handler Workflow

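void TCPHandler::runImpl(){
    // Wrap the client socket in buffered reader and writer objects
    in = std::make_shared<ReadBufferFromPocoSocket>(socket());
    out = std::make_shared<WriteBufferFromPocoSocket>(socket());

    while (true){
        // Read the next packet from the client (query text, settings, data)
        receivePacket();

        // Parse the query and build its execution streams; the result is kept in state.io
        state.io = executeQuery(state.query, *query_context, false, state.stage, may_have_embedded_data);

        // Only one of these branches runs for a given query (conditions abridged)
        if (state.need_receive_data_for_insert)
            processInsertQuery();
        else if (state.io.pipeline.initialized())
            processOrdinaryQueryWithProcessors();
        else
            processOrdinaryQuery();
    }
}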

Core Query Execution (executeQueryImpl)

static std::tuple<ASTPtr, BlockIO> executeQueryImpl(){
    ParserQuery parser(end, settings.enable_debug_queries);
    ASTPtr ast = parseQuery(parser, begin, end, "", max_query_size);
    auto interpreter = InterpreterFactory::get(ast, context, stage);
    BlockIO res = interpreter->execute();
    return std::make_tuple(ast, res);
}

The function builds a parser, converts the SQL text into an abstract syntax tree (AST), creates an interpreter based on the AST type, and runs the interpreter to obtain a BlockIO result.
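The parse, dispatch, and execute sequence described above can be sketched in a few lines of standalone C++. This is a toy model, not ClickHouse code: AstSketch, InterpreterSketch, makeInterpreter, and executeQuerySketch are hypothetical names, and the "parser" only looks at the leading keyword.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for ClickHouse's AST and interpreter classes.
enum class QueryKind { Select, Insert };

struct AstSketch {
    QueryKind kind;
    std::string text;
};

// "Parsing" here only inspects the leading keyword; a real parser builds a full tree.
std::shared_ptr<AstSketch> parseQuerySketch(const std::string & sql) {
    if (sql.rfind("SELECT", 0) == 0)
        return std::make_shared<AstSketch>(AstSketch{QueryKind::Select, sql});
    if (sql.rfind("INSERT", 0) == 0)
        return std::make_shared<AstSketch>(AstSketch{QueryKind::Insert, sql});
    throw std::runtime_error("syntax error: unsupported statement");
}

struct InterpreterSketch {
    virtual ~InterpreterSketch() = default;
    virtual std::string execute() = 0;
};

struct SelectInterpreterSketch : InterpreterSketch {
    std::string execute() override { return "stream of result blocks"; }
};

struct InsertInterpreterSketch : InterpreterSketch {
    std::string execute() override { return "sink that accepts blocks"; }
};

// Factory: choose the interpreter from the AST node type.
std::unique_ptr<InterpreterSketch> makeInterpreter(const AstSketch & ast) {
    switch (ast.kind) {
        case QueryKind::Select: return std::make_unique<SelectInterpreterSketch>();
        case QueryKind::Insert: return std::make_unique<InsertInterpreterSketch>();
    }
    throw std::logic_error("unknown query kind");
}

// Text in, AST in the middle, interpreter result out.
std::string executeQuerySketch(const std::string & sql) {
    std::shared_ptr<AstSketch> ast = parseQuerySketch(sql);
    assert(ast != nullptr);
    return makeInterpreter(*ast)->execute();
}
```

The point of the factory is that the code driving the query never needs to know which statement it is running: a SELECT yields something to read from, an INSERT yields something to write to, and both come back through the same call.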

SQL Processor Components

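Query Parsing: Lexical analysis tokenizes the input, and a recursive‑descent parser builds the AST. The excerpt from tryParseQuery() shows the token flow: the tokens are wrapped in an iterator and handed to the parser, whose statement-level entry point is ParserQuery::parseImpl().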

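ASTPtr tryParseQuery(){
    // Lexer: produce tokens lazily from the query text
    Tokens tokens(pos, end, max_query_size);
    IParser::Pos token_iterator(tokens);

    // Parser: consume tokens and fill `res` with the root of the AST
    ASTPtr res;
    bool parse_res = parser.parse(token_iterator, res, expected);

    // Error reporting is omitted in this excerpt
    if (!parse_res)
        return nullptr;
    return res;
}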
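To make the lexer plus recursive‑descent idea concrete, here is a minimal sketch for a tiny grammar, SELECT column_list FROM table. Everything in it (tokenize, SelectAstSketch, SelectParserSketch) is hypothetical and far simpler than ClickHouse's parsers; it only mirrors the convention of returning a bool and filling an output node.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Lexer: split the input into words and single-character punctuation tokens.
std::vector<std::string> tokenize(const std::string & sql) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < sql.size()) {
        if (std::isspace(static_cast<unsigned char>(sql[i]))) { ++i; continue; }
        if (isWordChar(sql[i])) {
            std::size_t start = i;
            while (i < sql.size() && isWordChar(sql[i])) ++i;
            tokens.push_back(sql.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, sql[i]));
            ++i;
        }
    }
    return tokens;
}

struct SelectAstSketch {
    std::vector<std::string> columns;
    std::string table;
};

// Recursive descent: one method per grammar rule.
class SelectParserSketch {
public:
    explicit SelectParserSketch(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    // select := "SELECT" column_list "FROM" identifier
    bool parse(SelectAstSketch & ast) {
        if (!accept("SELECT") || !parseColumnList(ast.columns) || !accept("FROM"))
            return false;
        return parseIdentifier(ast.table) && pos_ == tokens_.size();
    }

private:
    bool accept(const std::string & expected) {
        if (pos_ < tokens_.size() && tokens_[pos_] == expected) { ++pos_; return true; }
        return false;
    }

    bool parseIdentifier(std::string & out) {
        if (pos_ >= tokens_.size()) return false;
        const std::string & tok = tokens_[pos_];
        assert(!tok.empty());
        bool starts_ok = std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_';
        if (!starts_ok || tok == "SELECT" || tok == "FROM") return false;
        out = tok;
        ++pos_;
        return true;
    }

    // column_list := identifier ("," identifier)*
    bool parseColumnList(std::vector<std::string> & out) {
        std::string name;
        if (!parseIdentifier(name)) return false;
        out.push_back(name);
        while (accept(",")) {
            if (!parseIdentifier(name)) return false;
            out.push_back(name);
        }
        return true;
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};
```

Calling SelectParserSketch(tokenize("SELECT a, b FROM t")).parse(ast) fills ast with two columns and the table name, while a malformed query such as "SELECT a, FROM t" makes parse() return false.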

Query Rewrite (Logical Optimizer): Applies rule‑based transformations such as predicate push‑down, view expansion, constant folding, and expression simplification.
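As an illustration of one such rule, the sketch below folds constant sub‑expressions in a tiny expression tree. The Expr type and the helpers literal, column, binary, and foldConstants are hypothetical and exist only for this example; ClickHouse's own rewrite passes work on its real AST classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A tiny expression tree: integer literals, column references, + and *.
struct Expr {
    enum class Kind { Literal, Column, Add, Mul };
    Kind kind = Kind::Literal;
    long value = 0;                  // used by Literal
    std::string name;                // used by Column
    std::shared_ptr<Expr> lhs, rhs;  // used by Add and Mul
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr literal(long v) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Literal;
    e->value = v;
    return e;
}

ExprPtr column(std::string n) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::Column;
    e->name = std::move(n);
    return e;
}

ExprPtr binary(Expr::Kind k, ExprPtr l, ExprPtr r) {
    assert(k == Expr::Kind::Add || k == Expr::Kind::Mul);
    auto e = std::make_shared<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Bottom-up rewrite: if both operands fold to literals, replace the node with its value.
ExprPtr foldConstants(const ExprPtr & e) {
    if (e->kind == Expr::Kind::Literal || e->kind == Expr::Kind::Column)
        return e;
    ExprPtr l = foldConstants(e->lhs);
    ExprPtr r = foldConstants(e->rhs);
    if (l->kind == Expr::Kind::Literal && r->kind == Expr::Kind::Literal) {
        long v = (e->kind == Expr::Kind::Add) ? l->value + r->value : l->value * r->value;
        return literal(v);
    }
    return binary(e->kind, l, r);
}
```

Applied to x + (2 * 3), the rewrite returns x + 6: the multiplication is evaluated once at planning time instead of once per row.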

Query Optimizer (Physical Optimizer): Generates an efficient execution plan, represented as a data‑flow graph of operators (e.g., scan, filter, join, aggregation).

Query Executor: Executes the plan, reading data from the storage engines and producing result blocks. In the classic Volcano model, operators pass one row at a time; ClickHouse's vectorized execution engine passes blocks of column data instead.
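The difference between row‑at‑a‑time and block‑at‑a‑time processing is easiest to see on a filter. The following sketch evaluates a predicate over a whole column in one loop and then applies the resulting mask to every column of the block. ColumnarBlock, greaterThan, and filterBlock are hypothetical names, and the two-column layout is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A block holds a batch of rows stored column by column.
struct ColumnarBlock {
    std::vector<std::int64_t> id;
    std::vector<std::int64_t> amount;
};

// Evaluate "column > threshold" for the whole column in one tight loop.
std::vector<std::uint8_t> greaterThan(const std::vector<std::int64_t> & column, std::int64_t threshold) {
    std::vector<std::uint8_t> mask(column.size());
    for (std::size_t i = 0; i < column.size(); ++i)
        mask[i] = column[i] > threshold ? 1 : 0;
    return mask;
}

// Apply the mask to every column of the block, keeping only the selected rows.
ColumnarBlock filterBlock(const ColumnarBlock & in, const std::vector<std::uint8_t> & mask) {
    assert(in.id.size() == in.amount.size());
    assert(mask.size() == in.id.size());
    ColumnarBlock out;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            out.id.push_back(in.id[i]);
            out.amount.push_back(in.amount[i]);
        }
    }
    return out;
}
```

Because the predicate loop touches a single contiguous array and has no per-row virtual calls, it is the kind of code a compiler can vectorize, which is the main motivation for passing blocks rather than rows between operators.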

Interpreter and Execution Pipeline

Each AST type has a corresponding interpreter created by InterpreterFactory::get. For a SELECT query, InterpreterSelectQuery performs syntax analysis, builds an ExpressionActionsChain, and then runs a pipeline of BlockInputStream / BlockOutputStream stages such as executeWhere, executeAggregation, and executeDistinct.

void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputStreamPtr & prepared_input){
    auto & query = getSelectQuery();
    AnalysisResult expressions = analyzeExpressions(...);
    executeFetchColumns(...);
    executeWhere(pipeline, expressions.before_where, ...);
    executeAggregation(pipeline, expressions.before_aggregation, ...);
    executeDistinct(pipeline, true, expressions.selected_columns);
}
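Each execute step above adds a stage to the pipeline. A minimal sketch of that idea, assuming a pull-based design in which every stage wraps its input and an empty block signals end of stream, might look like this. IStreamSketch, SourceStream, FilterStream, and drain are hypothetical names; a block is reduced to a single column of ints.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using IntBlock = std::vector<int>;  // one column of ints; empty means end of stream

struct IStreamSketch {
    virtual ~IStreamSketch() = default;
    virtual IntBlock read() = 0;
};
using StreamPtr = std::shared_ptr<IStreamSketch>;

// Leaf stage: hands out pre-built blocks, standing in for a storage read.
class SourceStream : public IStreamSketch {
public:
    explicit SourceStream(std::vector<IntBlock> blocks) : blocks_(std::move(blocks)) {}
    IntBlock read() override {
        if (next_ >= blocks_.size()) return {};
        return blocks_[next_++];
    }
private:
    std::vector<IntBlock> blocks_;
    std::size_t next_ = 0;
};

// WHERE-like stage: wraps another stream and keeps only matching values.
class FilterStream : public IStreamSketch {
public:
    FilterStream(StreamPtr input, std::function<bool(int)> predicate)
        : input_(std::move(input)), predicate_(std::move(predicate)) {}
    IntBlock read() override {
        // Keep pulling until some rows survive the filter or the input is exhausted.
        for (;;) {
            IntBlock block = input_->read();
            if (block.empty()) return {};
            IntBlock kept;
            for (int v : block)
                if (predicate_(v)) kept.push_back(v);
            if (!kept.empty()) return kept;
        }
    }
private:
    StreamPtr input_;
    std::function<bool(int)> predicate_;
};

// Pull every block from the top of the pipeline and concatenate the rows.
std::vector<int> drain(const StreamPtr & stream) {
    assert(stream != nullptr);
    std::vector<int> all;
    for (IntBlock b = stream->read(); !b.empty(); b = stream->read())
        all.insert(all.end(), b.begin(), b.end());
    return all;
}
```

Stacking another stage, such as a DISTINCT or an aggregation, means wrapping the current top of the pipeline in one more stream object; the consumer still only calls read() on the outermost one.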

The optimizer (SyntaxAnalyzer) performs common sub‑expression elimination, scalar sub‑query constant replacement, and predicate push‑down, among other rule‑based improvements.

Insert Query Handling

BlockIO InterpreterInsertQuery::execute(){
    StoragePtr table = getTable(query);
    BlockOutputStreamPtr out = std::make_shared<AddingDefaultBlockOutputStream>( ... );
    BlockIO res;
    res.out = std::move(out);
    return res;
}

Both reads and writes ultimately invoke the underlying storage engine’s read and write interfaces (e.g., MergeTree).
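The insert path wraps the table's output stream in decorators before any data arrives. The sketch below shows the decorator idea for default values only, on rows rather than blocks to keep it short. IOutputSketch, TableSinkSketch, and AddingDefaultsSketch are hypothetical names, and the behavior is an assumption about what a default-filling stage does, not a copy of AddingDefaultBlockOutputStream.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using RowSketch = std::map<std::string, std::string>;  // column name -> value

struct IOutputSketch {
    virtual ~IOutputSketch() = default;
    virtual void write(const RowSketch & row) = 0;
};

// Stand-in for the storage engine's write interface: it simply collects rows.
class TableSinkSketch : public IOutputSketch {
public:
    std::vector<RowSketch> rows;
    void write(const RowSketch & row) override { rows.push_back(row); }
};

// Decorator: fills in columns the client did not send, then forwards the row.
class AddingDefaultsSketch : public IOutputSketch {
public:
    AddingDefaultsSketch(std::shared_ptr<IOutputSketch> next, RowSketch defaults)
        : next_(std::move(next)), defaults_(std::move(defaults)) {
        assert(next_ != nullptr);
    }
    void write(const RowSketch & row) override {
        RowSketch completed = row;
        for (const auto & kv : defaults_)
            completed.insert(kv);  // insert() leaves values that are already present untouched
        next_->write(completed);
    }
private:
    std::shared_ptr<IOutputSketch> next_;
    RowSketch defaults_;
};
```

The caller writes to the outermost stream and never sees the chain behind it, which is why the interpreter can return a single output stream in BlockIO regardless of how many stages sit in front of the table.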

Result Transmission

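void TCPHandler::processOrdinaryQuery(){
    // Read result blocks in a background thread while this thread talks to the client
    AsynchronousBlockInputStream async_in(state.io.in);
    while (true){
        Block block = async_in.read();

        // An empty block marks the end of the result stream
        if (!block)
            break;

        sendData(block);
    }
}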

void TCPHandler::sendData(const Block & block){
    initBlockOutput(block);
    state.block_out->write(block);
    state.maybe_compressed_out->next();
    out->next();
}

The server streams the resulting blocks back to the client through the socket buffer.
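The write-then-next() pattern in sendData can be sketched as follows: each block is serialized into a buffer, and next() pushes the buffered bytes to the underlying sink so the client can start consuming results before the query finishes. WriteBufferSketch, serializeBlock, and sendBlocks are hypothetical, and the text framing is made up for the example; the native protocol uses a binary format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using WireBlock = std::vector<int>;  // one column of ints

// Buffered writer: write() accumulates bytes, next() flushes them to the sink.
class WriteBufferSketch {
public:
    explicit WriteBufferSketch(std::string & sink) : sink_(sink) {}
    void write(const std::string & bytes) { pending_ += bytes; }
    void next() {
        sink_ += pending_;
        pending_.clear();
    }
    std::size_t pendingBytes() const { return pending_.size(); }
private:
    std::string & sink_;   // stands in for the client socket
    std::string pending_;
};

// Made-up text framing: "<row count>:<v1>,<v2>,...;"
std::string serializeBlock(const WireBlock & block) {
    std::string out = std::to_string(block.size()) + ":";
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (i > 0) out += ",";
        out += std::to_string(block[i]);
    }
    return out + ";";
}

// Send blocks one by one, flushing after each so data reaches the client early.
void sendBlocks(const std::vector<WireBlock> & blocks, WriteBufferSketch & out) {
    for (const WireBlock & block : blocks) {
        if (block.empty()) break;  // an empty block marks the end of data in this sketch
        out.write(serializeBlock(block));
        out.next();
    }
    assert(out.pendingBytes() == 0);
}
```

Flushing per block trades a few more system calls for lower latency to the first rows; a client reading a long result can begin processing while the server is still producing later blocks.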

Conclusion

Understanding ClickHouse's end‑to‑end SQL processing pipeline, from network reception through parsing, logical rewrite, and physical optimization to execution and result delivery, helps database users write more efficient queries. It also gives kernel developers a map of the system's architecture for performance tuning and feature development.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
