Mastering Go’s unsafe Package: 5 Real‑World Cases for Zero‑Copy and High‑Performance Tricks
This article walks through five production‑grade Go unsafe techniques—including zero‑copy string conversion, deep struct copying, lock‑free queues, memory‑mapped files, and custom serialization—explaining core concepts, providing benchmark results, and offering a detailed safety checklist to avoid common pitfalls.
Why use unsafe?
Before diving in, consider a performance comparison: converting a string to []byte the ordinary way, with []byte(s), copies the underlying memory, while the unsafe approach uses unsafe.Slice to share it, achieving zero copy. Benchmarks show roughly a 130× speedup and zero allocations.
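A minimal sketch of that benchmark (the package-level sink guards against dead-code elimination; exact numbers vary with string length and hardware):

var sink []byte

func BenchmarkStdConversion(b *testing.B) {
	s := strings.Repeat("x", 4096)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = []byte(s) // allocates and copies 4 KB every iteration
	}
}

func BenchmarkUnsafeConversion(b *testing.B) {
	s := strings.Repeat("x", 4096)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = unsafe.Slice(unsafe.StringData(s), len(s)) // shares memory: no copy, no allocation
	}
}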
Core concepts: unsafe.Pointer vs uintptr
unsafe.Pointer is visible to the garbage collector and safe to hold on to: the GC treats it as a reference and keeps its target alive. uintptr is a plain integer the GC cannot track; if the underlying object is moved or reclaimed, converting the integer back to a pointer leads to crashes or corrupted data.
type MyStruct struct {
	value int
}

func DangerousExample() {
	s := &MyStruct{value: 42}
	// Wrong: store the address as a uintptr
	addr := uintptr(unsafe.Pointer(s))
	runtime.GC()                // the GC may collect s here
	ptr := unsafe.Pointer(addr) // may point to reclaimed memory
	s2 := (*MyStruct)(ptr)
	fmt.Println(s2.value) // crash or garbage
}

func SafeExample() {
	s := &MyStruct{value: 42}
	// Correct: keep an unsafe.Pointer
	ptr := unsafe.Pointer(s)
	runtime.GC() // the GC sees the pointer and keeps s alive
	s2 := (*MyStruct)(ptr)
	fmt.Println(s2.value) // prints 42 safely
}

Rule of thumb: use unsafe.Pointer whenever possible; reach for uintptr only when the whole operation fits in a single expression.
Case 1 – Zero‑copy string ↔ byte slice conversion
Typical scenario: an API gateway handling 100 000 requests per second needs to switch between string and []byte without allocating.
func String2Bytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func Bytes2String(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func HandleRequest(jsonStr string) {
	jsonBytes := String2Bytes(jsonStr)
	var data map[string]interface{}
	json.Unmarshal(jsonBytes, &data)
	// ...process data
}

Note: unsafe.StringData, unsafe.String, and unsafe.SliceData require Go 1.20 or later. Important: the returned []byte must not be modified, because it points to read-only string memory. If modification is required, copy the slice first.
Case 2 – Deep copy of structs with unexported fields
When a generic deep-copy function must copy unexported fields, reflection falls short: reflect.Value.Set panics on values obtained from unexported fields, and reflection-based workarounds are slow. Using unsafe, we compute field offsets and copy the values directly.
type User struct {
	ID       int64
	name     string // unexported
	password string // unexported
}

func DeepCopy(src *User) *User {
	dst := &User{}
	dst.ID = src.ID
	// copy unexported field 'name'
	srcName := (*string)(unsafe.Pointer(uintptr(unsafe.Pointer(src)) + unsafe.Offsetof(src.name)))
	dstName := (*string)(unsafe.Pointer(uintptr(unsafe.Pointer(dst)) + unsafe.Offsetof(dst.name)))
	*dstName = *srcName
	// copy unexported field 'password' the same way
	srcPwd := (*string)(unsafe.Pointer(uintptr(unsafe.Pointer(src)) + unsafe.Offsetof(src.password)))
	dstPwd := (*string)(unsafe.Pointer(uintptr(unsafe.Pointer(dst)) + unsafe.Offsetof(dst.password)))
	*dstPwd = *srcPwd
	return dst
}

Go 1.17+ provides unsafe.Add, which makes the code more concise:
func DeepCopyModern(src *User) *User {
	dst := &User{ID: src.ID}
	srcNamePtr := (*string)(unsafe.Add(unsafe.Pointer(src), unsafe.Offsetof(src.name)))
	dstNamePtr := (*string)(unsafe.Add(unsafe.Pointer(dst), unsafe.Offsetof(dst.name)))
	*dstNamePtr = *srcNamePtr
	// repeat for the other fields
	return dst
}
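For context on why plain reflection cannot do this copy, a small sketch (using the same User type) shows that reflect refuses to write unexported fields:

func ReflectCannotCopy(dst *User) {
	nameField := reflect.ValueOf(dst).Elem().FieldByName("name")
	fmt.Println(nameField.CanSet()) // false – unexported fields are read-only via reflect
	// nameField.SetString("x")     // would panic: value obtained using unexported field
}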
Case 3 – Lock‑free ring buffer
High‑concurrency logging can benefit from a lock‑free queue that avoids mutex contention.
type LockFreeQueue struct {
	_     [8]uint64 // padding to avoid false sharing
	head  uint64    // read index
	_     [7]uint64
	tail  uint64 // write index
	_     [7]uint64
	mask  uint64
	nodes unsafe.Pointer // first element of a []unsafe.Pointer; keeps the array alive
}

func NewLockFreeQueue(size uint64) *LockFreeQueue {
	size = roundUpToPower2(size)
	nodes := make([]unsafe.Pointer, size)
	return &LockFreeQueue{mask: size - 1, nodes: unsafe.Pointer(&nodes[0])}
}

// slot returns the address of element idx of the backing array.
func (q *LockFreeQueue) slot(idx uint64) *unsafe.Pointer {
	return (*unsafe.Pointer)(unsafe.Add(q.nodes, uintptr(idx)*unsafe.Sizeof(unsafe.Pointer(nil))))
}

func (q *LockFreeQueue) Enqueue(val interface{}) bool {
	for {
		tail := atomic.LoadUint64(&q.tail)
		head := atomic.LoadUint64(&q.head)
		if tail-head >= q.mask+1 { // full
			return false
		}
		if atomic.CompareAndSwapUint64(&q.tail, tail, tail+1) {
			// We own the slot; publish the value.
			atomic.StorePointer(q.slot(tail&q.mask), unsafe.Pointer(&val))
			return true
		}
	}
}

func (q *LockFreeQueue) Dequeue() (interface{}, bool) {
	for {
		head := atomic.LoadUint64(&q.head)
		tail := atomic.LoadUint64(&q.tail)
		if head >= tail { // empty
			return nil, false
		}
		if atomic.CompareAndSwapUint64(&q.head, head, head+1) {
			node := q.slot(head & q.mask)
			// The matching Enqueue may not have published the value yet:
			// spin until it appears, clearing the slot for reuse.
			for {
				if valPtr := atomic.SwapPointer(node, nil); valPtr != nil {
					return *(*interface{})(valPtr), true
				}
				runtime.Gosched()
			}
		}
	}
}

func roundUpToPower2(n uint64) uint64 {
	n--
	n |= n >> 1
	n |= n >> 2
	n |= n >> 4
	n |= n >> 8
	n |= n >> 16
	n |= n >> 32
	n++
	return n
}

Caveat: this is a simplified design. Under heavy contention, strict FIFO order across slots is not guaranteed; production-grade MPMC queues (Vyukov-style bounded queues) add a per-slot sequence counter to close that gap. Benchmarks show the lock-free queue is 3–5× faster than a buffered channel.
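A sketch of how that comparison might be benchmarked (a simplified enqueue/dequeue ping-pong per goroutine; the result sink is illustrative, and real producer/consumer workloads will show different ratios):

var result interface{}

func BenchmarkLockFreeQueue(b *testing.B) {
	q := NewLockFreeQueue(1024)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			for !q.Enqueue(1) {
			}
			result, _ = q.Dequeue()
		}
	})
}

func BenchmarkBufferedChannel(b *testing.B) {
	ch := make(chan interface{}, 1024)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ch <- 1
			result = <-ch
		}
	})
}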
Case 4 – Memory‑mapped file (mmap)
Processing gigabyte‑scale log files benefits from mapping the file directly into memory.
type MmapFile struct {
	data []byte
	file *os.File
}

// OpenMmap maps the file into memory via syscall.Mmap (Unix-only;
// Windows needs CreateFileMapping or a cross-platform library).
func OpenMmap(filename string, size int64) (*MmapFile, error) {
	f, err := os.OpenFile(filename, os.O_RDWR|os.O_CREATE, 0644)
	if err != nil {
		return nil, err
	}
	if err := f.Truncate(size); err != nil {
		f.Close()
		return nil, err
	}
	data, err := syscall.Mmap(int(f.Fd()), 0, int(size), syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		f.Close()
		return nil, err
	}
	return &MmapFile{data: data, file: f}, nil
}

func (m *MmapFile) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m.data)) {
		return 0, io.EOF
	}
	n := copy(m.data[off:], p) // may be a short write near the end of the mapping
	return n, nil
}

func (m *MmapFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m.data)) {
		return 0, io.EOF
	}
	n := copy(p, m.data[off:])
	return n, nil
}

func (m *MmapFile) Close() error {
	if err := syscall.Munmap(m.data); err != nil {
		return err
	}
	return m.file.Close()
}

func ProcessLargeFile() {
	mf, err := OpenMmap("large.log", 1<<30) // 1 GB
	if err != nil {
		panic(err)
	}
	defer mf.Close()

	data := []byte("quick write")
	mf.WriteAt(data, 0)

	buf := make([]byte, 100)
	n, _ := mf.ReadAt(buf, 0)
	fmt.Println(string(buf[:n]))
}
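Because the mapping is just an ordinary []byte, the zero-copy helpers from Case 1 compose naturally with it. A sketch (the ScanLines helper is illustrative) that walks the mapped log line by line without allocating a string per line:

func ScanLines(m *MmapFile, fn func(line string)) {
	data := m.data
	for len(data) > 0 {
		i := bytes.IndexByte(data, '\n')
		if i < 0 {
			fn(Bytes2String(data)) // last, unterminated line
			return
		}
		fn(Bytes2String(data[:i])) // zero-copy view into the mapping
		data = data[i+1:]
	}
}

The strings handed to fn are views into the mapping and must not be retained after Close.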
Case 5 – High‑performance serialization
When serializing millions of structs, json.Marshal becomes a bottleneck. A custom serializer that writes directly into a byte buffer using unsafe zero-copy techniques can be 5–8× faster.
type FastSerializer struct{ buf []byte }

func NewSerializer() *FastSerializer {
	return &FastSerializer{buf: make([]byte, 0, 4096)}
}

func (s *FastSerializer) WriteStruct(v interface{}) error {
	val := reflect.ValueOf(v)
	if val.Kind() == reflect.Ptr {
		val = val.Elem()
	}
	for i := 0; i < val.NumField(); i++ {
		f := val.Field(i)
		switch f.Kind() {
		case reflect.String:
			s.writeString(f.String())
		case reflect.Int64:
			s.writeInt64(f.Int())
		case reflect.Slice:
			s.writeSlice(f)
		}
	}
	return nil
}

func (s *FastSerializer) writeString(str string) {
	s.writeInt32(int32(len(str)))
	strBytes := unsafe.Slice(unsafe.StringData(str), len(str)) // zero-copy view
	s.buf = append(s.buf, strBytes...)
}

// Integers are written in native byte order; the reader must run on a
// machine with the same endianness (use encoding/binary for portability).
func (s *FastSerializer) writeInt64(v int64) {
	start := len(s.buf)
	s.buf = append(s.buf, make([]byte, 8)...)
	*(*int64)(unsafe.Pointer(&s.buf[start])) = v
}

func (s *FastSerializer) writeInt32(v int32) {
	start := len(s.buf)
	s.buf = append(s.buf, make([]byte, 4)...)
	*(*int32)(unsafe.Pointer(&s.buf[start])) = v
}

// writeSlice bulk-copies the raw element memory; it is only valid for
// slices of fixed-size, pointer-free element types.
func (s *FastSerializer) writeSlice(v reflect.Value) {
	if v.Len() == 0 {
		s.writeInt32(0)
		return
	}
	s.writeInt32(int32(v.Len()))
	elemSize := v.Type().Elem().Size()
	totalSize := uintptr(v.Len()) * elemSize
	sliceData := unsafe.Pointer(v.Pointer())
	start := len(s.buf)
	s.buf = append(s.buf, make([]byte, totalSize)...)
	copy(s.buf[start:], unsafe.Slice((*byte)(sliceData), totalSize))
}

func (s *FastSerializer) Bytes() []byte { return s.buf }
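A usage sketch (the Event type here is illustrative):

type Event struct {
	Name string
	ID   int64
	Tags []int32
}

func SerializeDemo() {
	s := NewSerializer()
	s.WriteStruct(&Event{Name: "login", ID: 42, Tags: []int32{1, 2, 3}})
	fmt.Printf("%d bytes\n", len(s.Bytes())) // 4+5 + 8 + 4+12 = 33 bytes
}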
A benchmark comparing json.Marshal with FastSerializer shows the 5–8× speedup claimed above.
Ultimate Pitfall Guide
Pitfall 1 – Using uintptr in multiple steps
// Wrong
func Trap1() {
	s := &MyStruct{value: 42}
	addr := uintptr(unsafe.Pointer(s))  // step 1
	doSomething()                       // step 2 – the GC may run here
	ptr := unsafe.Pointer(addr)         // step 3 – may be invalid
	fmt.Println((*MyStruct)(ptr).value) // crash
}

// Correct – single expression
func Fix1() {
	s := &MyStruct{value: 42}
	valuePtr := (*int)(unsafe.Pointer(uintptr(unsafe.Pointer(s)) + unsafe.Offsetof(s.value)))
	fmt.Println(*valuePtr) // safe
}

Pitfall 2 – Modifying read‑only memory
// Wrong
func Trap2() {
	s := "hello"
	b := unsafe.Slice(unsafe.StringData(s), len(s))
	b[0] = 'H' // fault: string memory is read-only
}

// Correct – copy before modifying
func Fix2() {
	s := "hello"
	b := unsafe.Slice(unsafe.StringData(s), len(s))
	writable := make([]byte, len(b))
	copy(writable, b)
	writable[0] = 'H' // safe
}

Pitfall 3 – Passing uintptr across goroutines
// Wrong
func Trap3() {
	s := &MyStruct{value: 42}
	addr := uintptr(unsafe.Pointer(s))
	go func() {
		ptr := unsafe.Pointer(addr) // unsafe – s may already be collected
		fmt.Println((*MyStruct)(ptr).value)
	}()
	time.Sleep(time.Second)
}

// Correct – pass the unsafe.Pointer directly
func Fix3() {
	s := &MyStruct{value: 42}
	ptr := unsafe.Pointer(s)
	go func() {
		fmt.Println((*MyStruct)(ptr).value) // safe
	}()
	time.Sleep(time.Second)
}

Pitfall 4 – Assuming field offsets
// Wrong – assumes field b sits at offset 4
func Trap4() {
	type MyStruct struct{ a int32; b int64; c int32 }
	s := &MyStruct{a: 1, b: 2, c: 3}
	bPtr := (*int64)(unsafe.Pointer(uintptr(unsafe.Pointer(s)) + 4))
	fmt.Println(*bPtr) // alignment padding puts b at offset 8, so this reads garbage
}

// Correct – use unsafe.Offsetof
func Fix4() {
	type MyStruct struct{ a int32; b int64; c int32 }
	s := &MyStruct{a: 1, b: 2, c: 3}
	bPtr := (*int64)(unsafe.Pointer(uintptr(unsafe.Pointer(s)) + unsafe.Offsetof(s.b)))
	fmt.Println(*bPtr) // prints 2 safely
}

Performance‑Optimization Checklist
Ask yourself: Do I really need unsafe? Use it only in hot paths where profiling shows a clear bottleneck.
Avoid premature optimization; keep the code simple until benchmarks prove otherwise.
Run thorough tests: go test -race ./..., go test -cover ./..., and go test -bench=. -benchmem.
Document the safety rationale with comments such as // SAFETY: using unsafe.Pointer (never uintptr) keeps the object GC-tracked. And demand a measurable payoff: target at least a 2× performance gain before committing unsafe code.
Conclusion
Unsafe is a powerful but dangerous tool; it can deliver massive speedups—often over 100× for zero‑copy operations—when used correctly. Follow the guidelines: prefer unsafe.Pointer, limit usage to high‑frequency code, back it with benchmarks, and write clear safety comments. When these principles are respected, unsafe becomes a reliable ally for building high‑performance Go services.
Code Wrench
Focuses on code debugging, performance optimization, and real-world engineering, sharing efficient development tips and pitfall guides. We break down technical challenges in a down-to-earth style, helping you craft handy tools so every line of code becomes a problem‑solving weapon. 🔧💻