Why ByteDance’s Sonic JSON Library Beats the Rest: JIT, SIMD, and Lazy‑Load Explained

The article introduces Sonic, ByteDance’s high‑performance Go JSON library built with Just‑In‑Time compilation and SIMD vectorization, explains its design motivations, usage patterns, API features, compatibility considerations, and showcases benchmark results that demonstrate its superiority over other popular JSON parsers.

MaGe Linux Operations

Sonic is ByteDance’s open‑source Go JSON library that leverages Just‑In‑Time compilation and Single Instruction Multiple Data (SIMD) techniques to dramatically improve JSON encoding and decoding performance, while offering a lazy‑load design for versatile APIs across various business scenarios.

Why Build an In‑House Library

Go includes the standard JSON library encoding/json and many third‑party options such as Json‑iterator, Easyjson, Gjson, and Sjson, with Json‑iterator being the most popular. ByteDance chose to develop its own JSON parser because JSON’s text‑based nature and lack of schema enforcement often lead to low‑efficiency encoding/decoding, and improper library choices can severely degrade service performance.

Analysis of ByteDance’s production services revealed that JSON serialization and deserialization consume nearly 10% of CPU, sometimes exceeding 40% in extreme cases, making JSON library performance a critical factor for improving machine utilization.

We evaluated existing Go JSON libraries and categorized their APIs into three usage patterns:

Generic encoding/decoding: JSON has no schema and is decoded into runtime objects such as map[string]interface{}.

Binding encoding/decoding: JSON is bound to a Go struct, which provides both parsing and validation.

Get & set: a specific path is used to retrieve or modify a portion of the JSON value.
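To make the first two patterns concrete, here is a minimal sketch using the standard library's encoding/json. It was chosen so the example runs without extra dependencies and is not Sonic's API. The User type and the decodeGeneric/decodeBinding helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical schema used only to illustrate binding.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// decodeGeneric: no schema, so objects become map[string]interface{} and numbers float64.
func decodeGeneric(src []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal(src, &m)
	return m, err
}

// decodeBinding: the struct acts as the schema; unknown keys are ignored and types are checked.
func decodeBinding(src []byte) (User, error) {
	var u User
	err := json.Unmarshal(src, &u)
	return u, err
}

func main() {
	src := []byte(`{"name":"alice","age":30,"tags":["a","b"]}`)
	m, _ := decodeGeneric(src)
	u, _ := decodeBinding(src)
	fmt.Printf("%T %v\n", m["age"], m["age"]) // float64 30
	fmt.Println(u.Name, u.Age)                // alice 30
}
```

Generic decoding needs no schema but loses type information (every number becomes float64). Binding checks types against the struct and silently drops keys it does not know, such as tags.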

We also defined three JSON size levels based on key count and depth:

Small: 400 B, 11 keys, depth 3.

Medium: 110 KB, 300+ keys, depth 4 (real‑world data with many nested JSON strings).

Large: 550 KB, 10 000+ keys, depth 6.

How to Use

Requirements

Go 1.16~1.20

Linux / macOS / Windows (Windows requires Go 1.17+)

amd64 architecture

Features

Runtime object binding without code generation

Comprehensive JSON operation API

Fast, faster, fastest!

Marshal / Unmarshal

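The default behavior is mostly consistent with encoding/json. The exceptions are HTML escaping and key sorting: RFC 8259 requires neither, so Sonic disables both by default and offers them as options (see Sorting Keys and HTML Escaping below).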

import "github.com/bytedance/sonic"

var data YourSchema // YourSchema stands for your own type
// Marshal
output, err := sonic.Marshal(&data)
// Unmarshal (err already exists, so assign with =)
err = sonic.Unmarshal(output, &data)

Streaming I/O

Sonic can decode JSON from an io.Reader or encode an object to JSON and write it to an io.Writer, reducing memory usage for multiple values.

Encoder

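import "bytes"
import "fmt"
import "github.com/bytedance/sonic"

var o1 = map[string]interface{}{"a": "b"}
var o2 = 1
var w = bytes.NewBuffer(nil)
var enc = sonic.ConfigDefault.NewEncoder(w)
enc.Encode(o1) // like encoding/json, each encoded value ends with '\n'
enc.Encode(o2)
fmt.Print(w.String())
// Output:
// {"a":"b"}
// 1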

Decoder

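import "fmt"
import "strings"
import "github.com/bytedance/sonic"

var o = map[string]interface{}{}
var r = strings.NewReader(`{"a":"b"}{"1":"2"}`)
var dec = sonic.ConfigDefault.NewDecoder(r)
dec.Decode(&o) // both values are decoded into the same map
dec.Decode(&o)
fmt.Printf("%+v", o) // map[1:2 a:b]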

Using Number / int64

import (
    "encoding/json"
    "github.com/bytedance/sonic"
    "github.com/bytedance/sonic/decoder"
)

var input = `1`
var data interface{}
// default: numbers decode as float64
dc := decoder.NewDecoder(input)
dc.Decode(&data) // data == float64(1)
// use json.Number
dc = decoder.NewDecoder(input)
dc.UseNumber()
dc.Decode(&data) // data == json.Number("1")
// use int64
dc = decoder.NewDecoder(input)
dc.UseInt64()
dc.Decode(&data) // data == int64(1)

// ast.Node accessors return (value, error)
root, _ := sonic.GetFromString(input)
jn, _ := root.Number()
ji, _ := root.InterfaceUseNumber()
jm := ji.(json.Number) // jn == jm
fn, _ := root.Float64()
fi, _ := root.Interface()
fm := fi.(float64) // fn == fm

Sorting Keys

Sorting costs roughly 10% in performance, so Sonic does not sort keys by default. Enable it when a component depends on sorted keys (zstd, for example):

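import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/encoder"

// Option 1: encoder.SortMapKeys (applies to maps only)
m := map[string]interface{}{}
v, err := encoder.Encode(m, encoder.SortMapKeys)
// Option 2: call ast.Node.SortKeys() before marshaling
root, err := sonic.Get([]byte(`{"b":1,"a":2}`))
err = root.SortKeys(true) // true: also sort nested objects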
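The extra cost of sorting comes from collecting and ordering the keys before anything can be written. The hypothetical encodeSorted helper below makes that step explicit for a flat map[string]int. It is a simplified illustration, not Sonic's encoder, and it reuses encoding/json only to quote keys correctly.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// encodeSorted writes a flat map as a JSON object with keys in ascending order.
func encodeSorted(m map[string]int) ([]byte, error) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // O(n log n) on top of the linear encoding pass
	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, k := range keys {
		if i > 0 {
			buf.WriteByte(',')
		}
		kb, err := json.Marshal(k) // correct JSON quoting for the key
		if err != nil {
			return nil, err
		}
		buf.Write(kb)
		fmt.Fprintf(&buf, ":%d", m[k])
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}

func main() {
	out, _ := encodeSorted(map[string]int{"b": 2, "a": 1, "c": 3})
	fmt.Println(string(out)) // {"a":1,"b":2,"c":3}
}
```

Without sorting, the keys could be written in map iteration order as they are visited, skipping both the key slice and the sort.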

HTML Escaping

HTML escaping is disabled by default (≈15% overhead). Enable it with encoder.EscapeHTML, which behaves like encoding/json.HTMLEscape:

import "github.com/bytedance/sonic/encoder"

v := map[string]string{"&&": "<>"}
ret, err := encoder.Encode(v, encoder.EscapeHTML) // ret == `{"\u0026\u0026":"\u003c\u003e"}`

Compact Format

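By default Sonic encodes primitive objects (structs, maps, and so on) as compact JSON. The output of json.RawMessage and json.Marshaler values is validated but, for performance, not compacted; use the encoder.CompactMarshaler option to compact it as well.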

Printing Errors

For invalid JSON input, Sonic returns decoder.SyntaxError, which can pretty-print the error position.

import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/decoder"

var data interface{}
err := sonic.UnmarshalString("[[[}]]", &data)
if err != nil {
    println(err.Error()) // "Syntax error at index 3: invalid char..."
    if e, ok := err.(decoder.SyntaxError); ok {
        print(e.Description())
    }
}

Type Mismatch [Sonic v1.6.0]

When a value's type does not match the target, Sonic returns decoder.MismatchTypeError (reporting the last one if there are several), skips the offending value, and keeps decoding the rest of the JSON.

import "fmt"
import "github.com/bytedance/sonic"

var data struct{ A int; B int }
err := sonic.UnmarshalString(`{"A":"1","B":1}`, &data)
println(err.Error()) // e.g. "Mismatch type int with value string ..."
fmt.Printf("%+v", data) // {A:0 B:1}

ast.Node

ast.Node is a self-contained JSON abstract syntax tree that implements both serialization and deserialization and provides robust APIs for reading and modifying generic data.

Get / Index

Given a path (non‑negative integers, strings, or nil), the library returns the matching JSON fragment.

import "github.com/bytedance/sonic"

input := []byte(`{"key1":[{},{"key2":{"key3":[1,2,3]}}]}`)
root, _ := sonic.Get(input)
raw, _ := root.Raw() // == string(input)
sub, _ := sonic.Get(input, "key1", 1, "key2")
value, _ := sub.Get("key3").Index(2).Int64() // == 3

Note: because Index() locates data by offset, it is much faster than the scanning performed by Get(); prefer it where possible.

Modify

Use Set() / Unset() to modify JSON content.

import "fmt"
import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/ast"

root, _ := sonic.Get(input)
exist, _ := root.Set("key4", ast.NewBool(true)) // exist == false: key4 is new
alias1 := root.Get("key4")
println(alias1.Valid()) // true
exist, _ = root.UnsetByIndex(1) // exist == true: removes key4, the second key
fmt.Println(root.Get("key4").Check()) // "value not exist"

Serialize

Encode an ast.Node to JSON with MarshalJson() or json.Marshal(); with json.Marshal() you must pass a pointer to the node.

import (
    "encoding/json"
    "github.com/bytedance/sonic"
)

buf, _ := root.MarshalJson()
println(string(buf)) // {"key1":[{},{"key2":{"key3":[1,2,3]}}]}
exp, _ := json.Marshal(&root)
println(string(buf) == string(exp)) // true

APIs

Validity checks: Check(), Error(), Valid(), Exist()

Indexing: Index(), Get(), IndexPair(), IndexOrGet(), GetByPath()

Conversion to Go types: Int64(), Float64(), String(), Number(), Bool(), Map[UseNumber|UseNode](), Array[UseNumber|UseNode](), Interface[UseNumber|UseNode]()

Go type constructors: NewRaw(), NewNumber(), NewNull(), NewBool(), NewString(), NewObject(), NewArray()

Iteration: Values(), Properties(), ForEach(), SortKeys()

Modification: Set(), SetByIndex(), Add()

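ast.Visitor

Sonic provides an advanced API that fully parses JSON into non-standard types (neither structs nor map[string]interface{}) without any intermediate representation such as ast.Node or interface{}. You implement the SAX-style ast.Visitor interface and pass it to Preorder():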

type Visitor interface {
    OnNull() error
    OnBool(v bool) error
    OnString(v string) error
    OnInt64(v int64, n json.Number) error
    OnFloat64(v float64, n json.Number) error
    OnObjectBegin(capacity int) error
    OnObjectKey(key string) error
    OnObjectEnd() error
    OnArrayBegin(capacity int) error
    OnArrayEnd() error
}

func Preorder(str string, visitor Visitor, opts *VisitorOptions) error { /* ... */ }

Compatibility

Because high-performance code is hard to make portable, Sonic does not guarantee support for every environment. For developing in other environments we suggest:

Mac M1: install Rosetta 2 and set GOARCH=amd64 when building; Rosetta 2 then runs the resulting x86 binary on the M1.

Linux arm64: install QEMU and run the amd64 binary under emulation with qemu-x86_64 -cpu max.

Sonic offers three configuration presets; use ConfigStd when you need strict compatibility with encoding/json:

ConfigDefault: Sonic's default (EscapeHTML=false, SortKeys=false, ...), fast while keeping the output safe.

ConfigStd: compatible with the standard library (EscapeHTML=true, SortKeys=true, ...).

ConfigFastest: the fastest preset (NoQuoteTextMarshaler=true); some options become ineffective.

Caveats

Pretouch (Warm-Up)

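Because Sonic uses golang-asm as its JIT assembler, which is not well suited to compiling at run time, the first call on a huge schema can cause request timeouts or even an out-of-memory crash. For stability, call Pretouch() on huge schemas (or in memory-constrained applications) before the first Marshal()/Unmarshal().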

import (
    "reflect"
    "github.com/bytedance/sonic"
    "github.com/bytedance/sonic/option"
)

func init() {
    var v HugeStruct // HugeStruct stands for your own large type
    // Simple pretouch for typical nesting depth
    if err := sonic.Pretouch(reflect.TypeOf(v)); err != nil {
        panic(err)
    }
    // For deeper nesting, customize compile options (loop and depth are placeholder ints)
    err := sonic.Pretouch(reflect.TypeOf(v),
        option.WithCompileRecursiveDepth(loop),
        option.WithCompileMaxInlineDepth(depth))
    if err != nil {
        panic(err)
    }
}

Copying Strings

When decoding strings without escape characters, Sonic references the original JSON buffer instead of copying, saving CPU cycles but potentially retaining the entire buffer in memory. Use decoder.CopyString() to force copying when memory usage is a concern.

Pass a String or a Byte Slice?

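For compatibility with encoding/json, Sonic accepts []byte input, but it copies that input for safety, which can cost noticeable time when the JSON is huge. If your data is already a string (or a no-copy cast of your []byte is safe), use UnmarshalString() or GetFromString(). Likewise, MarshalString() returns the encoded JSON as a string without an extra copy, which is safe because Sonic's output buffer is always freshly allocated.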

Speeding Up encoding.TextMarshaler

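To keep output safe, Sonic's encoder by default quotes and escapes the strings returned by encoding.TextMarshaler implementations, which can be costly when much of your data takes that form. The encoder.NoQuoteTextMarshaler option skips these steps, but you must then ensure the output is already quoted and escaped according to RFC 8259.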

Optimizing Generic Data Handling

When parsing an entire document, Unmarshal() outperforms Get() followed by Node.Interface(). When you only have a schema for part of the JSON, combine the two: use Get() to extract that fragment, then Unmarshal() it into the partial schema.

import "github.com/bytedance/sonic"

node, _ := sonic.GetFromString(_TwitterJson, "statuses", 3, "user")
var user User // partial schema
raw, _ := node.Raw() // Raw returns (string, error)
err := sonic.UnmarshalString(raw, &user)

When no concrete schema is needed, use ast.Node instead of map or interface{} for lazy parsing and lower memory overhead.

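import "github.com/bytedance/sonic"

root, _ := sonic.GetFromString(_TwitterJson)
user := root.GetByPath("statuses", 3, "user") // *ast.Node
if err := user.Check(); err != nil {
    // the path does not exist or the JSON is invalid
}
// ast.Node is lazy-loaded: load it fully before sharing it across goroutines
if err := user.LoadAll(); err != nil {
    // handle the error
}
go someFunc(user) // someFunc is a placeholder for your own function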

Note: because ast.Node parses lazily, it does not support concurrent reads by default. Call Node.Load() or Node.LoadAll() before reading it from multiple goroutines; this costs some performance but is still cheaper than converting to map or interface{}.
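Why lazy loading conflicts with concurrent reads: the first access to an unparsed node writes parsed state into it. The hypothetical lazyObject below (built on encoding/json, not ast.Node) shows the pattern and the remedy the note describes: call Load() once before sharing, after which Get only reads.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// lazyObject is a hypothetical stand-in for a lazily parsed JSON object.
type lazyObject struct {
	raw    []byte
	fields map[string]json.RawMessage // nil until parsed
}

// Load parses the raw bytes once. Call it before sharing the object across goroutines.
func (o *lazyObject) Load() error {
	if o.fields != nil {
		return nil
	}
	return json.Unmarshal(o.raw, &o.fields)
}

// Get parses on first access, which writes o.fields: a data race if two goroutines do it at once.
func (o *lazyObject) Get(key string) (json.RawMessage, bool) {
	if o.fields == nil {
		if err := o.Load(); err != nil {
			return nil, false
		}
	}
	v, ok := o.fields[key]
	return v, ok
}

func main() {
	o := &lazyObject{raw: []byte(`{"a":1,"b":"x"}`)}
	if err := o.Load(); err != nil { // materialize first; readers below only read
		panic(err)
	}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, _ := o.Get("a")
			fmt.Println(string(v)) // 1
		}()
	}
	wg.Wait()
}
```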

ast.Node or ast.Visitor?

For most generic data parsing, ast.Node suffices. However, when ultimate performance is required, implementing a custom ast.Visitor provides a SAX‑style parsing path similar to Unmarshal() but without intermediate representations.

How It Works

ByteDance’s R&D team identified three core problems:

The standard library handles schemas through reflection, which incurs high call overhead.

Json‑iterator’s generated functions still suffer from interface dispatch and lack of inlining.

SIMD‑based parsers excel on large inputs but add overhead on small or irregular strings.

To address these, Sonic adopts:

JIT compilation to assemble bytecode matching the Go type schema, eliminating dynamic dispatch.

Hybrid SIMD‑scalar execution, selecting the optimal path based on input size and characteristics.

C/Clang‑compiled core functions, transformed to Plan 9 assembly via an asm2asm tool for seamless Go runtime loading.
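As a rough illustration of choosing a code path by input size, the sketch below searches for the bytes that end the fast path of a JSON string scan ('"' or '\\'). Short inputs use a byte loop; longer ones use SWAR (SIMD within a register), testing 8 bytes per step with portable bit tricks. SWAR is a stand-in for real SIMD instructions here, and the 16-byte threshold is an arbitrary assumption, not Sonic's value.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

const (
	ones  = 0x0101010101010101 // 0x01 in every byte lane
	highs = 0x8080808080808080 // 0x80 in every byte lane
)

// zeroLanes sets the high bit of each zero byte in x. Borrows can add false flags,
// but only above a real zero byte, so the lowest flag is always exact.
func zeroLanes(x uint64) uint64 { return (x - ones) &^ x & highs }

// scalarIndex returns the index of the first '"' or '\\', or -1.
func scalarIndex(s []byte) int {
	for i, c := range s {
		if c == '"' || c == '\\' {
			return i
		}
	}
	return -1
}

// swarIndex checks 8 bytes per step, then finishes the tail with scalarIndex.
func swarIndex(s []byte) int {
	i := 0
	for ; i+8 <= len(s); i += 8 {
		x := binary.LittleEndian.Uint64(s[i:]) // first byte lands in the lowest lane
		m := zeroLanes(x^('"'*ones)) | zeroLanes(x^('\\'*ones))
		if m != 0 {
			return i + bits.TrailingZeros64(m)/8
		}
	}
	if j := scalarIndex(s[i:]); j >= 0 {
		return i + j
	}
	return -1
}

// findSpecial dispatches on input size, mimicking a hybrid scalar/vector strategy.
func findSpecial(s []byte) int {
	if len(s) < 16 {
		return scalarIndex(s)
	}
	return swarIndex(s)
}

func main() {
	fmt.Println(findSpecial([]byte(`abc"`)))                   // 3 (scalar path)
	fmt.Println(findSpecial([]byte(`abcdefghijklmnopqrst\n`))) // 20 (word-at-a-time path)
}
```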

Additional optimizations include a lightweight global function table with register‑based parameter passing, and a custom open‑addressing hash cache (replacing sync.Map) for high‑throughput static caching.
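A read-mostly cache of this kind can be sketched as an open-addressing table (linear probing) that readers access without locks while writers publish a fresh copy. The design below is an assumption for illustration, not Sonic's implementation. Keys stand for type identities such as type pointers and must be non-zero; atomic.Pointer requires Go 1.19+.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type entry struct {
	key uintptr // 0 marks an empty slot, so keys must be non-zero
	val interface{}
}

// table is an open-addressing hash table with linear probing; len(slots) is a power of two.
type table struct {
	slots []entry
	count int
}

// hash spreads aligned pointer values across slots (multiplicative hashing).
func hash(k uintptr) uintptr { return uintptr((uint64(k) * 0x9E3779B97F4A7C15) >> 32) }

func (t *table) get(k uintptr) (interface{}, bool) {
	mask := uintptr(len(t.slots) - 1)
	for i := hash(k) & mask; ; i = (i + 1) & mask {
		switch t.slots[i].key {
		case k:
			return t.slots[i].val, true
		case 0:
			return nil, false // an empty slot ends the probe sequence
		}
	}
}

func (t *table) insert(k uintptr, v interface{}) {
	mask := uintptr(len(t.slots) - 1)
	i := hash(k) & mask
	for t.slots[i].key != 0 && t.slots[i].key != k {
		i = (i + 1) & mask
	}
	if t.slots[i].key == 0 {
		t.count++
	}
	t.slots[i] = entry{key: k, val: v}
}

// Cache serves lock-free reads from an immutable table; writers copy it (copy-on-write).
type Cache struct {
	mu  sync.Mutex
	cur atomic.Pointer[table]
}

func NewCache() *Cache {
	c := &Cache{}
	c.cur.Store(&table{slots: make([]entry, 8)})
	return c
}

func (c *Cache) Get(k uintptr) (interface{}, bool) { return c.cur.Load().get(k) }

func (c *Cache) Put(k uintptr, v interface{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	old := c.cur.Load()
	size := len(old.slots)
	if (old.count+1)*2 > size {
		size *= 2 // keep the load factor at or below 1/2 so probes stay short
	}
	next := &table{slots: make([]entry, size)}
	for _, e := range old.slots {
		if e.key != 0 {
			next.insert(e.key, e.val)
		}
	}
	next.insert(k, v)
	c.cur.Store(next)
}

func main() {
	c := NewCache()
	c.Put(0x1000, "encoder for T1")
	v, ok := c.Get(0x1000)
	fmt.Println(v, ok) // encoder for T1 true
}
```

Each Put copies the table, so writes are O(n). That suits a static cache that is filled once per Go type and then read on every call.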

Benchmarks

The benchmark scripts cover encoding, decoding, AST operations, and parsing for small (400 B), medium (13 KB), and large (635 KB) JSON payloads. In the original results, Sonic achieved the highest throughput and the fewest allocations compared with Json‑iterator, GoJson, and the standard library.

Sonic results for medium JSON (13 KB, 300+ keys, depth 6):

goversion: 1.17.1
cpu: Intel(R) Core(TM) i9-9880H @ 2.30GHz

BenchmarkEncoder_Generic_Sonic-16    32393 ns/op    402.40 MB/s   11965 B/op     4 allocs/op
BenchmarkEncoder_Binding_Sonic-16     6269 ns/op   2079.26 MB/s   14173 B/op     4 allocs/op
BenchmarkDecoder_Binding_Sonic-16    32557 ns/op    400.38 MB/s   28302 B/op   137 allocs/op
BenchmarkGetOne_Sonic-16              3276 ns/op   3975.78 MB/s      24 B/op     1 allocs/op
BenchmarkSetOne_Sonic-16              9571 ns/op   1360.61 MB/s    1584 B/op    17 allocs/op

The original article reports the same ranking for the small (400 B) and large (635 KB) payloads.

Original article: https://blog.csdn.net/qq_27681741/article/details/131696806 (© original author).

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
