Why DuckDB Beats ClickHouse for Light‑Weight Analytics: 18 M Rows/sec in a Single Go Binary

The article analyzes why traditional row‑oriented databases struggle with high‑volume analytics, introduces DuckDB as an embedded columnar engine for Go, presents benchmark results of up to 18.6 M rows per second writes and 6 M rows per second scans, walks through the Appender API code, and outlines the trade‑offs and ideal hybrid architecture.

TonyBai

Why MySQL/PostgreSQL struggle with large‑scale analytics

Row‑oriented storage forces the engine to read every column of each row, including large JSON or TEXT fields, even when only a few columns (e.g., user_id and timestamp) are needed. This results in massive I/O waste and CPU spikes to 100 % for queries such as GROUP BY or COUNT(DISTINCT) over millions of rows.
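The I/O gap can be put into rough numbers. The following is a minimal sketch, not a model of any real engine: the column names and average sizes are illustrative assumptions, and it ignores compression, page layout, and indexes.

```go
package main

import "fmt"

// Column describes one field of a hypothetical table. The sizes are
// illustrative assumptions, not measurements.
type Column struct {
	Name     string
	AvgBytes int
}

// rowStoreBytes estimates the bytes touched by a row-oriented scan:
// every column of every row is read, whatever the query needs.
func rowStoreBytes(cols []Column, rows int) int {
	perRow := 0
	for _, c := range cols {
		perRow += c.AvgBytes
	}
	return perRow * rows
}

// columnStoreBytes estimates the bytes touched by a columnar scan:
// only the requested columns are read.
func columnStoreBytes(cols []Column, rows int, needed ...string) int {
	want := make(map[string]bool, len(needed))
	for _, n := range needed {
		want[n] = true
	}
	perRow := 0
	for _, c := range cols {
		if want[c.Name] {
			perRow += c.AvgBytes
		}
	}
	return perRow * rows
}

func main() {
	cols := []Column{
		{Name: "user_id", AvgBytes: 8},
		{Name: "timestamp", AvgBytes: 8},
		{Name: "payload", AvgBytes: 2048}, // stands in for a large JSON/TEXT field
	}
	const rows = 1_000_000
	r := rowStoreBytes(cols, rows)
	c := columnStoreBytes(cols, rows, "user_id", "timestamp")
	fmt.Printf("row store: %d bytes, column store: %d bytes (ratio %.0f)\n",
		r, c, float64(r)/float64(c))
}
```

With one wide column in the table, the estimate is dominated by data the query never uses, which is the waste described above.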

Deploying a dedicated analytics stack (ClickHouse, Elasticsearch, Spark, etc.) solves the I/O problem but brings heavyweight cluster deployment and ongoing operational overhead.

DuckDB – the SQLite of the OLAP world

DuckDB is an embedded columnar database that runs inside the application process without a separate server. The Go driver (import "github.com/duckdb/duckdb-go/v2") plugs into database/sql and is built with cgo; the DuckDB library itself can be linked statically or dynamically.

Benchmarks from the open‑source project Arc (https://github.com/Basekick-Labs/arc) on a MacBook Pro M3 Max show:

Write performance: 18.6 M+ records/second

Write latency: P50 < 0.5 ms, P99 < 4 ms

Query performance: 6 M+ rows/second scanned (Arrow format)

DuckDB achieves these numbers with a vectorized execution engine and native Parquet file support.
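To give a feel for what "vectorized" means, here is a toy sketch in plain Go. It is not DuckDB's implementation: the function names are hypothetical, and only the batch size of 2048 mirrors DuckDB's default vector size. Each operator works on a whole batch of one column at a time, and a filter passes a selection vector (a list of matching positions) to the next operator.

```go
package main

import "fmt"

// vectorSize is the batch length; 2048 mirrors DuckDB's default vector size.
const vectorSize = 2048

// filterGreater records the positions of batch values above threshold in sel
// (reusing its backing array) and returns the filled selection vector.
func filterGreater(batch []float64, threshold float64, sel []int) []int {
	sel = sel[:0]
	for i, v := range batch {
		if v > threshold {
			sel = append(sel, i)
		}
	}
	return sel
}

// sumSelected adds up the batch values at the selected positions.
func sumSelected(batch []float64, sel []int) float64 {
	total := 0.0
	for _, i := range sel {
		total += batch[i]
	}
	return total
}

// sumGreaterThan evaluates SUM(v) WHERE v > threshold over one column,
// running each operator over a full batch before moving to the next batch.
func sumGreaterThan(column []float64, threshold float64) float64 {
	sel := make([]int, 0, vectorSize)
	total := 0.0
	for start := 0; start < len(column); start += vectorSize {
		end := start + vectorSize
		if end > len(column) {
			end = len(column)
		}
		batch := column[start:end]
		sel = filterGreater(batch, threshold, sel)
		total += sumSelected(batch, sel)
	}
	return total
}

func main() {
	column := make([]float64, 10000)
	for i := range column {
		column[i] = float64(i % 100)
	}
	fmt.Println(sumGreaterThan(column, 89))
}
```

The tight loops over contiguous slices are the point: per-row interpretation overhead is paid once per batch rather than once per value.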

Hand‑crafting 18 M rows/sec writes with the Appender API

Standard INSERT INTO … VALUES loops reach only tens of thousands of rows per second. The Appender API bypasses the SQL parser and pushes Go memory structures directly into DuckDB’s column store.

// https://go.dev/play/p/mHXu-kAydDX
package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"
    "time"
    duckdb "github.com/duckdb/duckdb-go/v2"
)

func main() {
    // 1. Create a connector for a local file
    connector, err := duckdb.NewConnector("analytics.db", nil)
    if err != nil { log.Fatal(err) }
    defer connector.Close()

    // 2. Open a standard sql.DB for DDL
    db := sql.OpenDB(connector)
    defer db.Close()
    _, err = db.Exec(`CREATE TABLE IF NOT EXISTS metrics (id INTEGER, name VARCHAR, value DOUBLE, ts TIMESTAMP)`) 
    if err != nil { log.Fatal(err) }

    // 3. Obtain the low‑level driver.Conn for the Appender
    conn, err := connector.Connect(context.Background())
    if err != nil { log.Fatal(err) }
    defer conn.Close()

    // 4. Create an Appender for the "metrics" table
    appender, err := duckdb.NewAppenderFromConn(conn, "", "metrics")
    if err != nil { log.Fatal(err) }
    defer appender.Close()

    startTime := time.Now()
    for i := 0; i < 100000; i++ {
        err := appender.AppendRow(
            int32(i),
            fmt.Sprintf("metric_%d", i%10),
            float64(i%100),
            time.Now(),
        )
        if err != nil { log.Fatal(err) }
    }
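    // Rows still buffered here are flushed by the deferred appender.Close(),
    // so this timing may not include the final flush.
    elapsed := time.Since(startTime)
    fmt.Printf("Inserted 100k rows in: %v\n", elapsed)
    fmt.Printf("Throughput: %.0f rows/sec\n", 100000.0/elapsed.Seconds())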
}

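On a 2019 Intel‑based MacBook Pro, the program inserts 100 k rows in 69 ms, or about 1.45 M rows/second. That is roughly one to two orders of magnitude faster than a loop of plain SQL INSERT statements. It is also a single‑threaded, 100 k‑row run on older hardware, so it lands well below the 18.6 M figure reported by Arc.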
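The arithmetic behind that throughput figure is easy to check. In the sketch below, the 100 k rows and 69 ms come from the run above, while the two INSERT baselines are assumptions picked from the "tens of thousands of rows per second" range, not measurements.

```go
package main

import "fmt"

// rowsPerSecond converts a row count and an elapsed time in milliseconds
// into a throughput figure.
func rowsPerSecond(rows int, elapsedMillis float64) float64 {
	return float64(rows) / (elapsedMillis / 1000)
}

// speedup reports how many times faster one throughput is than a baseline.
func speedup(fast, baseline float64) float64 {
	return fast / baseline
}

func main() {
	appender := rowsPerSecond(100_000, 69)
	fmt.Printf("appender: %.0f rows/sec\n", appender)

	// Assumed baselines for a plain INSERT loop (hypothetical values).
	for _, baseline := range []float64{10_000, 50_000} {
		fmt.Printf("vs %.0f rows/sec INSERT loop: %.0fx\n",
			baseline, speedup(appender, baseline))
	}
}
```

The speedup depends heavily on where in that range the INSERT baseline actually falls, so it is worth measuring both paths on the same machine before quoting a ratio.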

Replacing ELK with a single Go binary

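DuckDB can query CSV or Parquet files on local disk or S3 in place, without a separate ETL step. Reading from S3 relies on DuckDB's httpfs extension and on S3 credentials being configured. The following Go method returns hourly PV/UV statistics directly from Parquet files: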

func (adb *AnalyticsDB) GetHourlyStats() (map[string]interface{}, error) {
    rows, err := adb.db.Query(`
        SELECT
            DATE_TRUNC('hour', timestamp) AS hour,
            COUNT(*) AS pv,
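            -- Note: this counts distinct paths; true UV would count a distinct visitor column, if the logs have one.
            COUNT(DISTINCT path) AS uv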
        FROM read_parquet('s3://my-bucket/nginx_logs/*.parquet')
        WHERE timestamp > NOW() - INTERVAL '24 hours'
        GROUP BY hour
        ORDER BY hour DESC
    `)
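    if err != nil {
        return nil, err
    }
    defer rows.Close()
    // One possible result shape (needs the "time" import): RFC 3339 hour -> counts.
    stats := make(map[string]interface{})
    for rows.Next() {
        var hour time.Time
        var pv, uv int64
        if err := rows.Scan(&hour, &pv, &uv); err != nil {
            return nil, err
        }
        stats[hour.Format(time.RFC3339)] = map[string]int64{"pv": pv, "uv": uv}
    }
    return stats, rows.Err()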
}
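For readers less used to SQL aggregation, the grouping in that query can be sketched in plain Go. This is only an illustration of what the query computes, not how DuckDB executes it. The LogEntry and HourlyStat types are hypothetical, timestamps are simplified to non-negative Unix seconds, and the 24-hour WHERE filter is left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// LogEntry is a hypothetical parsed access-log record.
type LogEntry struct {
	UnixSec int64 // request time in Unix seconds (assumed non-negative)
	Path    string
}

// HourlyStat mirrors one output row of the query.
type HourlyStat struct {
	HourUnix int64 // start of the hour, Unix seconds
	PV       int   // COUNT(*)
	UV       int   // COUNT(DISTINCT path), as written in the query
}

// hourlyStats groups entries by hour and returns the newest hour first.
func hourlyStats(entries []LogEntry) []HourlyStat {
	pv := make(map[int64]int)
	paths := make(map[int64]map[string]struct{})
	for _, e := range entries {
		hour := e.UnixSec - e.UnixSec%3600 // DATE_TRUNC('hour', ...)
		pv[hour]++
		if paths[hour] == nil {
			paths[hour] = make(map[string]struct{})
		}
		paths[hour][e.Path] = struct{}{}
	}
	stats := make([]HourlyStat, 0, len(pv))
	for hour, n := range pv {
		stats = append(stats, HourlyStat{HourUnix: hour, PV: n, UV: len(paths[hour])})
	}
	// ORDER BY hour DESC
	sort.Slice(stats, func(i, j int) bool { return stats[i].HourUnix > stats[j].HourUnix })
	return stats
}

func main() {
	entries := []LogEntry{
		{UnixSec: 0, Path: "/"},
		{UnixSec: 10, Path: "/"},
		{UnixSec: 20, Path: "/about"},
		{UnixSec: 3600, Path: "/"},
	}
	for _, s := range hourlyStats(entries) {
		hour := time.Unix(s.HourUnix, 0).UTC().Format(time.RFC3339)
		fmt.Printf("%s pv=%d uv=%d\n", hour, s.PV, s.UV)
	}
}
```

Doing this by hand means loading every log line into the Go process; the SQL version lets the engine read only the timestamp and path columns from the Parquet files.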

This pattern brings ClickHouse‑style columnar analytics to light‑weight workloads while the deployment artifact remains a single self‑contained Go binary.

Limitations – not a universal silver bullet

DuckDB excels at high‑throughput analytical workloads but lacks row‑level locking and optimizations for short, frequent transactions. It is therefore unsuitable for high‑concurrency OLTP scenarios such as e‑commerce order processing.

Practical architecture

Keep PostgreSQL/MySQL for core transactional logic and embed DuckDB in the Go application for side‑track log or report aggregation.
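One way to wire up the analytics side of this split is to buffer events in the application and hand them to the analytical store in bulk. The sketch below is an assumption about how that could look, not a prescribed design: Event, Sink, Batcher, and countingSink are hypothetical names, and Sink stands in for something like a wrapper around a DuckDB Appender. It is not safe for concurrent use as written.

```go
package main

import "fmt"

// Event is a hypothetical analytics record emitted alongside a transaction.
type Event struct {
	UserID int64
	Action string
}

// Sink stands in for the analytical store (for example, a wrapper around a
// DuckDB Appender). It is an assumption of this sketch, not a library API.
type Sink interface {
	WriteBatch(events []Event) error
}

// Batcher buffers events in memory and forwards them to the sink in bulk.
type Batcher struct {
	sink  Sink
	limit int
	buf   []Event
}

// NewBatcher returns a Batcher that flushes once limit events are buffered.
func NewBatcher(sink Sink, limit int) *Batcher {
	return &Batcher{sink: sink, limit: limit, buf: make([]Event, 0, limit)}
}

// Add buffers one event and flushes when the buffer reaches the limit.
func (b *Batcher) Add(e Event) error {
	b.buf = append(b.buf, e)
	if len(b.buf) >= b.limit {
		return b.Flush()
	}
	return nil
}

// Flush writes buffered events to the sink. On failure the events stay in
// the buffer so a later Flush can retry them.
func (b *Batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	if err := b.sink.WriteBatch(b.buf); err != nil {
		return err
	}
	b.buf = b.buf[:0]
	return nil
}

// countingSink is a test double that records the size of each batch.
type countingSink struct{ batches []int }

func (s *countingSink) WriteBatch(events []Event) error {
	s.batches = append(s.batches, len(events))
	return nil
}

func main() {
	sink := &countingSink{}
	b := NewBatcher(sink, 3)
	for i := 0; i < 7; i++ {
		if err := b.Add(Event{UserID: int64(i), Action: "view"}); err != nil {
			fmt.Println("add failed:", err)
			return
		}
	}
	if err := b.Flush(); err != nil {
		fmt.Println("flush failed:", err)
		return
	}
	fmt.Println("batch sizes:", sink.batches)
}
```

Keeping the sink behind an interface means the transactional path never blocks on DuckDB directly, and the batching layer can be tested without a database.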

