Loop Engineering in Action: Porting an RDMA Library to Go for Only 239 CNY
This article recounts a nearly automatic library-development experiment: a PRD was split into 15 issues, a Loop Engineering agent executed them, and the result was a working Go RDMA library, along with hidden costs, bugs, and lessons learned.
0. Origin
I wanted a Go language RDMA library. Since last year we have been using RDMA for high‑performance network monitoring, but the existing Go bindings were either internal, poorly maintained, or unstable. Re‑implementing in C would be too costly for the team, and a pure Go solution seemed challenging yet attractive.
The market offers few Go RDMA libraries: either tightly‑coupled internal wrappers or abandoned bindings. I needed a genuine Go API that wraps libibverbs + librdmacm (the user‑space part of rdma‑core) and supports RC/UD transports, Send/Recv, RDMA Read/Write, plus Go equivalents of the perftest tools ib_send_bw/lat, ib_write_bw/lat, and ib_read_bw/lat.
With recent advances in AI-assisted coding, I felt confident enough to attempt the port. I added a loop-it skill to my personal goal-workflow system to try a lightweight Loop Engineering approach, and I also wanted to measure the token cost of such a practice.
1. Pipeline: PRD → Issues → Loop
The end‑to‑end chain looks like this:
/prd → /to-issues → /loop-it (→ /goal → /review-it → /note-it → /ship-it) ×N
PRD: a requirements document (tasks/prd-rdma-go-library.md) that pins the key decisions: cgo wrapper, six perftest tools, RC+UD, TCP handshake + rdma_cm connection, target Mellanox/NVIDIA NICs with RoCE v2, default -t tx-depth 128, and latency histogram output. I triggered the PRD skill with the phrase “port RDMA to Go”.
Issue splitting: the to-issues skill automatically breaks the PRD into 15 issues with dependencies.
loop-it: an automation loop that runs each issue in order, providing checkpoint-based recovery. The issues ran in the following order:
1→2→3→4→5→6→7→8→9→14→10→11→12→13→15
For each issue the sub-workflow is:
branch preparation → /goal implement → /review-it vet/build/test → /note-it record → /ship-it publish → cleanup → write checkpoint
After each issue the state is written to .loop-state.json (added to .gitignore) so that a crashed run can resume from the last checkpoint.
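The checkpoint-and-resume idea can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions: the article does not show the real schema of .loop-state.json, so the LoopState type, its JSON field names, and the status strings other than in_progress and shipped are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// LoopState is a hypothetical checkpoint layout; the real
// .loop-state.json schema is not shown in the article.
type LoopState struct {
	Order  []int          `json:"order"`  // issue numbers in execution order
	Status map[int]string `json:"status"` // "in_progress", "shipped"; absent means pending
}

// Save writes the state to a temp file and renames it over the target,
// so a crash mid-write does not leave a truncated checkpoint.
func (s *LoopState) Save(path string) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// Load reads a checkpoint written by Save.
func Load(path string) (*LoopState, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var s LoopState
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// Next returns the first issue in Order that has not shipped,
// or 0 when every issue is done.
func (s *LoopState) Next() int {
	for _, n := range s.Order {
		if s.Status[n] != "shipped" {
			return n
		}
	}
	return 0
}

func main() {
	dir, err := os.MkdirTemp("", "loop")
	if err != nil {
		fmt.Println("mkdtemp:", err)
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, ".loop-state.json")

	s := &LoopState{
		Order:  []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 10, 11, 12, 13, 15},
		Status: map[int]string{1: "shipped", 2: "in_progress"}, // simulated crash during issue 2
	}
	if err := s.Save(path); err != nil {
		fmt.Println("save:", err)
		return
	}
	resumed, err := Load(path)
	if err != nil {
		fmt.Println("load:", err)
		return
	}
	fmt.Println("resume at issue", resumed.Next()) // prints "resume at issue 2"
}
```

An issue that was in_progress when the run died is retried from the start, which is the simplest recovery policy; whether loop-it does the same is not stated in the article.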
2. How it actually runs
Pre‑checks verify GitHub authentication, a clean worktree, remote reachability, and absence of old state files. All 15 issues start as pending and execution begins with issue #1.
Example, issue #3 (PD/MR):
git checkout -b feat/issue-3-pd-mr, write checkpoint in_progress
Implement mr_linux.go (real cgo code) + mr_stub.go (non-Linux stub) + mr_test.go
Run go vet / go build / go test
Commit, push, create PR with gh pr create, squash-merge, delete branch
Sync master and write checkpoint shipped
All 15 issues completed without failure, producing a full set of device enumeration, six perftest tools, CI, and documentation.
3. First lesson: the pipeline skipped review
In the retrospective I noticed that /review-it was never invoked. The reason was that /goal is an interactive UI slash command, not a skill, and the first attempt to invoke it produced a “UI command, not skill” error. Consequently, the agent fell back to manual steps:
Pipeline step | What I actually did
---|---
/goal | Read PRD/issue, write code
/review‑it | Run go vet + go build + go test (not a real code review)
/note‑it | Skipped, no implementation notes generated
/ship-it | Manual git / PR / squash-merge / close

Thus the skills review-it, note-it, and ship-it never really ran; the agent used ad-hoc replacements.
4. Remedy: a real code review
I finally triggered a proper /code-review (high-effort, parallel-angle review) on the merged code. It uncovered eight credible problems, two of which were fatal compile errors:
cq_linux.go: c.imm_data undefined. In struct ibv_wc, imm_data sits in an anonymous union (a __be32 in network byte order), and cgo cannot access members of an anonymous union by name.
device_linux.go: type mismatch for ibv_query_port. In modern rdma-core the ibv_query_port that C callers use is a static inline wrapper, while the real exported symbol takes a struct _compat_ibv_port_attr *, so cgo cannot call it directly.
Additional runtime/logic issues included:
Endpoint.Peer never set → all -R Write/Read return errNoPeer.
Busy-wait in write.go lacks a memory barrier, allowing the compiler to hoist the load and cause infinite loops or stale reads.
Missing host-to-network byte-order conversion for imm_data on the send side.
Various rdma_cm error-path leaks, wrong errno timing, and ignored -c/-d/-i/-x parameters.
The most striking fact: the two compile errors meant the core cgo code had never been compiled on Linux. I had been developing on macOS using the stub implementation; go build succeeded only because the stub compiled.
5. Fix phase: extracting truth step by step
Step 1 – Quality cleanup (/simplify): parallel reviews (reuse, simplification, efficiency, altitude) fixed several real problems, such as reducing repeated sorts in stats.go to a single Stats() call, consolidating the three similar copy loops in the bandwidth tests into runBWPipeline(cfg, cq, post), merging duplicate -R / -UD reject logic into Config.RequireOneSidedTCP(), and removing dead cgo anchors.
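The "single sort" cleanup can be sketched as follows. This is an illustration only: the article names a Stats() call but does not show its shape, so the Stats struct, the ComputeStats function, the nearest-rank percentile rule, and the use of microseconds as float64 are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Stats holds latency summary values in microseconds (hypothetical layout).
type Stats struct {
	Min, Median, P99, Max float64
}

// rank returns the zero-based index of the p-th percentile in a sorted
// slice of length n, using the nearest-rank rule with integer math.
func rank(n, p int) int {
	return (p*n+99)/100 - 1
}

// ComputeStats sorts a copy of the samples exactly once and derives every
// summary value from that one sorted slice, instead of re-sorting for
// each percentile.
func ComputeStats(samples []float64) Stats {
	n := len(samples)
	if n == 0 {
		return Stats{}
	}
	sorted := make([]float64, n)
	copy(sorted, samples) // leave the caller's slice untouched
	sort.Float64s(sorted)
	return Stats{
		Min:    sorted[0],
		Median: sorted[rank(n, 50)],
		P99:    sorted[rank(n, 99)],
		Max:    sorted[n-1],
	}
}

func main() {
	samples := []float64{3.2, 1.1, 2.5, 9.8, 1.4}
	st := ComputeStats(samples)
	fmt.Printf("min=%.1f median=%.1f p99=%.1f max=%.1f\n", st.Min, st.Median, st.P99, st.Max)
}
```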
Step 2 – Engineering: added a Makefile covering vet/build/test/tools/cross/stub/integration/lint/fmt, with hardware-specific integration tests guarded by GORDMA_HW=1. Running make lint produced 20 issues (16 errcheck, 2 unused, 2 staticcheck SA5002); the two SA5002 findings pointed to the busy-wait race already identified by /code-review.
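An environment-variable guard of this kind can be sketched as below. Only the variable name GORDMA_HW and the value 1 come from the article; the helper name and the injected getenv parameter are hypothetical and chosen to keep the check easy to test.

```go
package main

import (
	"fmt"
	"os"
)

// hardwareTestsEnabled reports whether hardware-dependent checks should
// run. The lookup function is injected so the guard can be exercised
// without touching the process environment.
func hardwareTestsEnabled(getenv func(string) string) bool {
	return getenv("GORDMA_HW") == "1"
}

func main() {
	if !hardwareTestsEnabled(os.Getenv) {
		fmt.Println("skipping hardware integration checks (set GORDMA_HW=1 to enable)")
		return
	}
	fmt.Println("running hardware integration checks")
}
```

In a _test.go file the same check would typically sit at the top of each hardware test and call t.Skip when it returns false, so that a plain go test stays green on machines without an RDMA NIC.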
When fixing the race I first tried atomic.LoadUint8, which Go lacks; I switched to loading the containing 4‑byte word with atomic.LoadUint32 plus CompareAndSwapUint32, requiring Size >= 4.
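The word-sized workaround can be sketched like this. It is a minimal illustration, not the code in write.go: the function names are hypothetical, the buffer is backed by a []uint32 to guarantee 4-byte alignment (which is where a Size >= 4 style requirement comes from), and a goroutine stands in for the remote RDMA write. Go's memory model only covers Go code, so with a real NIC writing the buffer the atomic load mainly stops the compiler from hoisting the read out of the loop.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"unsafe"
)

// loadByte atomically loads the aligned 4-byte word that contains byte i
// and returns that byte. Viewing the loaded copy as [4]byte keeps the
// extraction independent of host endianness.
func loadByte(words []uint32, i int) byte {
	w := atomic.LoadUint32(&words[i/4])
	return (*[4]byte)(unsafe.Pointer(&w))[i%4]
}

// storeByte updates one byte with a compare-and-swap loop on the
// containing word, leaving the other three bytes as they were.
func storeByte(words []uint32, i int, b byte) {
	addr := &words[i/4]
	for {
		old := atomic.LoadUint32(addr)
		next := old
		(*[4]byte)(unsafe.Pointer(&next))[i%4] = b
		if atomic.CompareAndSwapUint32(addr, old, next) {
			return
		}
	}
}

func main() {
	words := make([]uint32, 2) // an 8-byte buffer, 4-byte aligned by construction
	done := make(chan struct{})
	go func() {
		storeByte(words, 5, 0xAB) // stands in for the remote write landing
		close(done)
	}()
	// Busy-wait on the flag byte; every iteration performs a real load.
	for loadByte(words, 5) != 0xAB {
		runtime.Gosched()
	}
	<-done
	fmt.Printf("byte 5 = %#x\n", loadByte(words, 5)) // prints "byte 5 = 0xab"
}
```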
Step 3 – Real-machine “slap in the face”:
First attempt on an H20 GPU server failed with fatal error: rdma/rdma_cma.h: No such file or directory – missing librdmacm‑dev. Switched to a Docker image with the dependencies installed.
Second attempt reproduced the two compile errors ( c.imm_data undefined and ibv_query_port type mismatch) that /code‑review had predicted. Fixed by adding a C helper wc_imm_data() (with ntohl) and a wrapper gordma_query_port().
Third attempt uncovered a misuse of unsafe.Pointer in device_linux.go. Replaced the unsafe arithmetic with unsafe.Add while keeping the pointer as unsafe.Pointer.
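The unsafe.Add pattern from the third attempt can be sketched without cgo. Here a Go slice of pointers stands in for the C array of device pointers; the device type and walkDevices function are hypothetical, and the real code in device_linux.go is not shown in the article. The point is that the address never passes through a uintptr variable, which is the pattern go vet flags as a possible misuse of unsafe.Pointer.

```go
package main

import (
	"fmt"
	"unsafe"
)

// device is a stand-in for a C device struct.
type device struct {
	name string
}

// walkDevices reads n consecutive pointers starting at list, stepping
// with unsafe.Add so each intermediate value stays an unsafe.Pointer.
func walkDevices(list **device, n int) []*device {
	out := make([]*device, 0, n)
	base := unsafe.Pointer(list)
	for i := 0; i < n; i++ {
		slot := unsafe.Add(base, uintptr(i)*unsafe.Sizeof(*list))
		out = append(out, *(**device)(slot))
	}
	return out
}

func main() {
	devs := []*device{{name: "dev0"}, {name: "dev1"}, {name: "dev2"}}
	for _, d := range walkDevices(&devs[0], len(devs)) {
		fmt.Println(d.name)
	}
}
```

unsafe.Add requires Go 1.17 or later. When the element count is known, unsafe.Slice is an alternative that produces an ordinary slice over the same memory.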
6. Full commit timeline
edd44e2 docs: add badges to README
4fa8b9c fix: use unsafe.Add to walk device list (go vet unsafe.Pointer) ← real‑machine #3
f0b0453 fix: cgo compile errors (imm_data, ibv_query_port) + imm byte order ← real‑machine #2
d253e36 Chore/simplify cleanups (#31) ← quality cleanup + engineering
f2519af docs: README usage, godoc, CI workflow (#15) ← loop‑it final issue
...
d15513b project skeleton: go.mod, cgo config, non‑Linux stub (#1) ← loop‑it first issue
56514ff Initial commit
Issues #1 through #15 were generated automatically by loop-it. Subsequent fixes (e.g., d253e36) were human-driven corrections after the agent hit reality.
7. Cost: the 239 yuan
The headline cost is the total token consumption of Claude Code for the whole round‑trip: one PRD, 15 issue implementations, one high‑effort code‑review, one four‑angle simplify, plus multiple real‑machine fix cycles, amounting to 239 CNY.
This represents “a few days of work for an experienced engineer”: cgo wrapper for the full verb set, six perftest‑style tools, cross‑platform stub, CI, and documentation. In labor terms it would cost thousands of yuan.
The expensive part is the back-and-forth. The first implementation was cheap (≈ 100 CNY). The later rounds of debugging and real-machine testing incurred the bulk of the cost. Had the development started on a Linux box with an RDMA NIC, the two compile errors would have been caught at the first build instead of surviving until real-machine testing.
Skipping the automated review saved a step but cost three steps later: the missing /review‑it was compensated by a manual /code‑review and three rounds of hardware debugging. “Save one, pay three.”
Appendix: Project information
Repository: github.com/smallnest/gordma
Size: 52 Go files, ~3981 lines
Capabilities: full device/context/PD/MR/CQ/QP/AH verbs; RC + UD; TCP handshake + rdma_cm connections; six perftest-style tools
Build: Linux + cgo real implementation; non-Linux stub built with CGO_ENABLED=0
Status: cgo compile errors fixed on real hardware; end-to-end data path verified on RoCE v2 hardware
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
