How to Process a 16 GB Log File in Seconds with Go

This guide shows how to efficiently extract timestamp‑filtered logs from a 16 GB text file in Go by reading the file in chunks, reusing memory with sync.Pool, and parallelising the work with goroutines, bringing the runtime down to roughly 25 seconds.


Modern computer systems generate massive logs daily, and storing all that debugging data in a database quickly becomes impractical, so most companies keep logs as plain files on local disks. This article demonstrates how to extract specific log entries from a 16 GB .txt / .log file using Go.

The program starts by opening the file with os.Open and handling any error. Two naïve approaches are considered: line‑by‑line reading (low memory, but slow and single‑threaded) and loading the whole file into memory (fast, but infeasible for 16 GB on a typical machine). The chosen solution reads the file in fixed‑size chunks using bufio.NewReader, which balances memory usage and speed.

f, err := os.Open(fileName)
if err != nil {
    fmt.Println("cannot read the file:", err)
    return
}
defer f.Close()

Chunked reading is performed in a loop that obtains a buffer from a sync.Pool, reads into it, and processes the data concurrently. The pool reuses byte slices to reduce garbage‑collector pressure.

r := bufio.NewReader(f)
var wg sync.WaitGroup

for {
    buf := linesPool.Get().([]byte)
    buf = buf[:cap(buf)] // a previous Put may have left the slice truncated
    n, err := r.Read(buf)
    buf = buf[:n]
    if n == 0 {
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Println(err)
            break
        }
    }
    // read up to the next newline so no log line is split across chunks
    nextUntilNewline, err := r.ReadBytes('\n')
    if err != io.EOF {
        buf = append(buf, nextUntilNewline...)
    }
    wg.Add(1)
    go func() {
        ProcessChunk(buf, &linesPool, &stringPool, &slicePool, start, end)
        wg.Done()
    }()
}
wg.Wait()
wg.Wait()

Two optimisation points are highlighted:

sync.Pool reuses memory slices, lowering GC overhead.

Goroutines process each chunk in parallel, dramatically increasing throughput.

The ProcessChunk function splits a chunk into individual log lines, parses the ISO‑8601 timestamp, and prints the line if the timestamp falls between the user‑provided start and end times.

func ProcessChunk(chunk []byte, linesPool *sync.Pool, stringPool *sync.Pool, slicePool *sync.Pool, start time.Time, end time.Time) {
    var wg2 sync.WaitGroup

    logs := string(chunk)
    linesPool.Put(chunk) // the byte buffer is no longer needed; return it to the pool

    logsSlice := strings.Split(logs, "\n")
    stringPool.Put(logs) // likewise for the intermediate string

    chunkSize := 300
    n := len(logsSlice)
    noOfThread := n / chunkSize
    if n%chunkSize != 0 {
        noOfThread++
    }

    for i := 0; i < noOfThread; i++ {
        wg2.Add(1)
        go func(s, e int) {
            defer wg2.Done()
            for i := s; i < e; i++ {
                text := logsSlice[i]
                if len(text) == 0 {
                    continue
                }
                logSlice := strings.SplitN(text, ",", 2)
                logCreationTimeString := logSlice[0]
                logCreationTime, err := time.Parse("2006-01-02T15:04:05.0000Z", logCreationTimeString)
                if err != nil {
                    fmt.Printf("\ncould not parse the time %s for log: %v", logCreationTimeString, text)
                    return
                }
                if logCreationTime.After(start) && logCreationTime.Before(end) {
                    fmt.Println(text)
                }
            }
        }(i*chunkSize, int(math.Min(float64((i+1)*chunkSize), float64(len(logsSlice)))))
    }
    wg2.Wait()
}

The command‑line interface expects six arguments: the executable name, -f flag, start timestamp, -t flag, end timestamp, and the log file path. It validates the argument count and parses the timestamps using the layout 2006-01-02T15:04:05.0000Z.

func main() {
    s := time.Now()

    args := os.Args
    if len(args) != 6 {
        fmt.Println("usage: <executable> -f <start time> -t <end time> <log file path>")
        return
    }
    startTimeArg := args[2]
    finishTimeArg := args[4]
    fileName := args[5]

    // open file and parse timestamps ...
    // call Process or ProcessChunk as needed

    fmt.Println("\nTime taken - ", time.Since(s))
}

Benchmarking on a 16 GB log file shows the extraction completes in about 25 seconds, demonstrating that chunked reading, memory pooling, and concurrent processing can handle massive files efficiently.

For the full source code, see the original Medium article: https://medium.com/swlh/processing-16gb-file-in-seconds-go-lang-3982c235dfa2

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Go, sync.Pool, Large Files, File Processing
Written by Go Development Architecture Practice

Daily sharing of Golang-related technical articles, practical resources, language news, tutorials, real-world projects, and more. Looking forward to growing together. Let's go!