
Is gstack’s 118K Stars Earned by Real Engineering or Just Markdown? A Deep Source‑Code Dive

This article dissects the gstack open‑source project—its 117,967 GitHub stars, 170k+ lines of TypeScript, a persistent Chromium daemon, a dual‑engine architecture, six‑layer prompt‑injection defenses, and a sprint‑style workflow—to determine whether its popularity stems from solid engineering or merely a collection of Markdown files.

Shuge Unlimited

Jun 30, 2026


Project Overview

gstack, created by YC CEO Garry Tan, has reached 117,967 stars on GitHub, contains 170,765 lines of TypeScript, and ships as a 58 MB compiled binary. In less than four months it has accumulated 395 releases (roughly 4 releases per day).

Architecture

ARCHITECTURE.md defines a two‑layer design:

┌──────────────────────────────────────┐
│  Skills layer (~40 SKILL.md files)   │
│  • each file = an expert role        │
│  • all Markdown, human-readable      │
└──────────────────────────────────────┘
                   +
┌──────────────────────────────────────┐
│  Browser layer (58 MB binary)        │
│  • long-running headless Chromium    │
│  • exposes HTTP API on localhost     │
│  • the hard engineering part         │
└──────────────────────────────────────┘

Skills are invoked through slash commands such as /plan-ceo-review, /cso, or /qa. Because the browser keeps running between commands, only the first command pays the ~3 s cold start; after that, commands complete in 100-200 ms, and cookies, local storage, and login state persist across them.

Why Bun Instead of Node.js

Single binary: bun build --compile produces one 58 MB executable, so there is no node_modules directory or PATH configuration to manage.

Built-in SQLite: new Database() (from bun:sqlite) replaces native addons.

Native TypeScript: bun run server.ts executes TypeScript directly, removing the need for ts-node and its source-map overhead.

Built-in HTTP server: Bun.serve() handles the roughly 10 routes without Express or Fastify.
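To illustrate how little is needed without a framework, here is a minimal sketch of a lookup-table router written against the standard Fetch API Request/Response types that Bun.serve() accepts. The two routes and their bodies are illustrative placeholders, not gstack's actual handlers.

```typescript
// Each route receives the request and its parsed URL.
type Handler = (req: Request, url: URL) => Response | Promise<Response>;

// Small helper for JSON responses using only the standard Response constructor.
const json = (data: unknown, status = 200): Response =>
  new Response(JSON.stringify(data), {
    status,
    headers: { "content-type": "application/json" },
  });

// A plain table keyed by "METHOD /path" stands in for an Express/Fastify router.
// These routes are hypothetical placeholders, not gstack's real endpoints.
const routes: Record<string, Handler> = {
  "GET /health": () => json({ ok: true }),
  "POST /command": async (req) => {
    const body = (await req.json()) as { cmd?: string };
    return json({ received: body.cmd ?? null });
  },
};

// Dispatch on method + path; anything not in the table gets a 404.
function handle(req: Request): Response | Promise<Response> {
  const url = new URL(req.url);
  const route = routes[`${req.method} ${url.pathname}`];
  return route ? route(req, url) : new Response("Not Found", { status: 404 });
}
```

Under Bun, the handler would be wired up with Bun.serve({ hostname: "127.0.0.1", port, fetch: handle }). Keeping it a plain function of Request means it can be exercised without starting a server.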

The documentation notes that the bottleneck is always Chromium, not the CLI, so Bun’s speed is a bonus rather than a necessity.

Persistent Browser Performance

Cold-starting the daemon takes about 3 seconds. After that first start, each command executes in 100-200 ms. The browser process shuts down after 30 minutes of inactivity.
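The article does not show how the 30-minute timeout is enforced. One common pattern, sketched here as an assumption, is a resettable timer that every command pushes forward; IdleTimer and IDLE_MS are hypothetical names.

```typescript
// Hypothetical idle-shutdown helper: each command calls touch(); if no command
// arrives within idleMs, onIdle runs (for example: close Chromium, then exit).
class IdleTimer {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly idleMs: number;
  private readonly onIdle: () => void;

  constructor(idleMs: number, onIdle: () => void) {
    this.idleMs = idleMs;
    this.onIdle = onIdle;
  }

  // Restart the countdown; call this on every incoming command.
  touch(): void {
    this.stop();
    this.timer = setTimeout(this.onIdle, this.idleMs);
  }

  // Cancel the countdown, e.g. during an explicit shutdown.
  stop(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }
}

// 30 minutes, matching the inactivity window described above.
const IDLE_MS = 30 * 60 * 1000;
```

A server would create new IdleTimer(IDLE_MS, shutdown) at startup and call touch() at the top of every command handler, so only a genuinely quiet period ever reaches shutdown.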

Ref System

Rather than injecting CSS selectors or marker attributes into the page, gstack assigns stable references (@e1, @e2, …) from a Playwright snapshot of the accessibility tree.

1. Agent calls $B snapshot -i
2. Server runs page.accessibility.snapshot()
3. ARIA tree is traversed, refs are assigned
4. Playwright locators are built (getByRole(...).nth(...))
5. Ref map stored in Map<string, RefEntry>
6. Annotated tree returned as plain text
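The six steps above can be sketched as a single tree walk. AXNode mirrors the role/name/children fields of Playwright's accessibility snapshot; RefEntry's fields, the INTERACTIVE role list used for -i, and the text format are assumptions. The nth index counts earlier nodes with the same role and name, assuming snapshot order matches document order.

```typescript
// Minimal accessibility-tree node (an assumed subset of Playwright's snapshot shape).
interface AXNode {
  role: string;
  name?: string;
  children?: AXNode[];
}

// Everything needed to rebuild page.getByRole(role, { name, exact: true }).nth(nth).
interface RefEntry {
  role: string;
  name: string;
  nth: number;
}

// Hypothetical roles that count as "interactive" for snapshot -i.
const INTERACTIVE = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

function assignRefs(root: AXNode, interactiveOnly = true): { refs: Map<string, RefEntry>; text: string } {
  const refs = new Map<string, RefEntry>();
  const seen = new Map<string, number>(); // role+name -> matches so far, used as nth
  const lines: string[] = [];

  // Depth-first, pre-order walk.
  const walk = (node: AXNode, depth: number): void => {
    if (!interactiveOnly || INTERACTIVE.has(node.role)) {
      const name = node.name ?? "";
      const key = `${node.role}\u0000${name}`;
      const nth = seen.get(key) ?? 0;
      seen.set(key, nth + 1);
      const ref = `@e${refs.size + 1}`;
      refs.set(ref, { role: node.role, name, nth });
      lines.push(`${"  ".repeat(depth)}${ref} [${node.role}] "${name}"`);
    }
    for (const child of node.children ?? []) walk(child, depth + 1);
  };

  walk(root, 0);
  return { refs, text: lines.join("\n") };
}
```

Two buttons both named "Save" get distinct refs whose nth values are 0 and 1, which is exactly what .nth() needs to tell them apart.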

Later, the agent clicks an element with $B click @e3; the server resolves @e3 to the corresponding locator and calls locator.click(). Because nothing is injected into the page, this avoids CSP conflicts, framework hydration issues, and Shadow DOM barriers.
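Resolving a ref is then a map lookup followed by a role-based query. The interfaces below are minimal stand-ins for Playwright's Page and Locator so the sketch runs without a browser; resolveRef and runClick are hypothetical names, and passing exact: true (a real getByRole option) is an assumption that stops the name from also matching longer names.

```typescript
// Minimal stand-ins for Playwright's Page and Locator.
interface Clickable {
  click(): Promise<void>;
}
interface RoleQueryable {
  getByRole(role: string, options: { name: string; exact: boolean }): { nth(index: number): Clickable };
}

// Same shape as the ref map built during the snapshot (assumed fields).
interface RefEntry {
  role: string;
  name: string;
  nth: number;
}

// Turn a token such as "@e3" back into a locator. An unknown token usually
// means the ref came from an older snapshot, so the agent should take a new one.
function resolveRef(page: RoleQueryable, refs: Map<string, RefEntry>, token: string): Clickable {
  const entry = refs.get(token);
  if (!entry) throw new Error(`Unknown ref ${token}; take a fresh snapshot`);
  return page.getByRole(entry.role, { name: entry.name, exact: true }).nth(entry.nth);
}

// Hypothetical handler for "click @eN".
async function runClick(page: RoleQueryable, refs: Map<string, RefEntry>, token: string): Promise<string> {
  await resolveRef(page, refs, token).click();
  return `clicked ${token}`;
}
```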

Six-Layer Prompt-Injection Defense

L1-L3: content-level defenses – datamarking, hidden-element stripping, ARIA regex filtering, a URL blocklist, and a trust-boundary envelope.

L4: ML classifier – a 22 MB BERT-small ONNX model (int8-quantized) that runs locally.

L4b: transcription classifier – a single Claude Haiku call, gated by a LOG_ONLY threshold of 0.40.

L5: canary token – a random token is injected into the system prompt; if it ever leaks, the request is blocked immediately.

L6: ensemble decision – a BLOCK is issued only when both ML classifiers reach the WARN threshold (0.75).

An emergency switch, GSTACK_SECURITY_OFF=1, disables all defenses for debugging.
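The decision layers can be sketched as one function. The thresholds (0.40, 0.75), the LOG_ONLY/WARN/BLOCK verdicts, and the kill switch come from the article; the ALLOW verdict, the rule ordering, and all function and field names are assumptions, since the article does not show how gstack combines the layers.

```typescript
import { randomBytes } from "node:crypto";

// ALLOW is an assumed name for "no action"; the other verdicts come from the article.
type Verdict = "ALLOW" | "LOG_ONLY" | "WARN" | "BLOCK";

const LOG_ONLY_AT = 0.4;
const WARN_AT = 0.75;

// L5: a fresh random canary to embed in the system prompt.
function makeCanary(): string {
  return `CANARY-${randomBytes(8).toString("hex")}`;
}

interface Signals {
  mlScore: number;    // L4: local BERT-small classifier score, 0..1
  haikuScore: number; // L4b: Claude Haiku classifier score, 0..1
  output: string;     // model output to scan for the canary
  canary: string;
}

function decide(s: Signals, env: Record<string, string | undefined> = process.env): Verdict {
  if (env.GSTACK_SECURITY_OFF === "1") return "ALLOW"; // emergency kill switch
  if (s.output.includes(s.canary)) return "BLOCK"; // L5: any leak blocks immediately
  if (s.mlScore >= WARN_AT && s.haikuScore >= WARN_AT) return "BLOCK"; // L6: both must agree
  const top = Math.max(s.mlScore, s.haikuScore);
  if (top >= WARN_AT) return "WARN"; // a single classifier alone only warns
  if (top >= LOG_ONLY_AT) return "LOG_ONLY";
  return "ALLOW";
}
```

Requiring agreement before BLOCK trades some recall for fewer false positives, which matches the article's framing that AI should not silently overrule the user.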

SKILL.md Template System

Documentation drift is prevented by generating SKILL.md from a template (SKILL.md.tmpl) with a generator script (gen-skill-docs.ts). Placeholders such as {{COMMAND_REFERENCE}}, {{SNAPSHOT_FLAGS}}, and {{PREAMBLE}} are filled from source metadata, so every documented command exists in code and every command in code is documented.
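A minimal sketch of the placeholder-filling step, assuming each {{NAME}} maps to a function that renders it from source metadata. The resolver table and registry contents are illustrative, not the real gen-skill-docs.ts.

```typescript
// Each placeholder name maps to a function that renders it from source metadata.
type Resolvers = Record<string, () => string>;

// Replace every {{NAME}}; unknown names abort generation so a typo can never
// leave a raw placeholder in the published SKILL.md.
function renderTemplate(template: string, resolvers: Resolvers): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (_match: string, name: string) => {
    const resolve = resolvers[name];
    if (!resolve) throw new Error(`Unknown placeholder {{${name}}}`);
    return resolve();
  });
}

// Illustrative command registry; in gstack this metadata lives in the source code.
const registry: Record<string, string> = {
  snapshot: "Print the accessibility tree with @e refs",
  click: "Click the element behind an @e ref",
};

const resolvers: Resolvers = {
  // Generated from the registry, so the docs cannot list a command the code lacks.
  COMMAND_REFERENCE: () =>
    Object.entries(registry)
      .map(([cmd, desc]) => `- ${cmd}: ${desc}`)
      .join("\n"),
};
```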

Three‑Layer Testing

Static validation: parse each $B command in SKILL.md and cross-check it against the command registry. Cost: free. Time: under 2 s.

E2E with Claude: run a real Claude session for each skill. Cost: about $3.85. Time: about 20 min.

LLM judge: Sonnet scores the docs for clarity, completeness, and operability. Cost: about $0.15. Time: about 30 s.

This tiered approach catches about 95% of issues cheaply and spends LLM budget only on borderline cases.
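The free first tier can be approximated with a regex pass over SKILL.md. This sketch assumes commands appear as "$B <name>" with lowercase, optionally hyphenated names; the function names and the registry used in any example are hypothetical.

```typescript
// Collect every command name used as "$B <name>" in a SKILL.md body.
function extractCommands(markdown: string): string[] {
  const found = new Set<string>();
  for (const m of markdown.matchAll(/\$B\s+([a-z][a-z-]*)/g)) found.add(m[1]);
  return Array.from(found);
}

// Cross-check docs against the registry in both directions.
function crossCheck(
  markdown: string,
  registry: ReadonlySet<string>,
): { unknown: string[]; undocumented: string[] } {
  const documented = new Set(extractCommands(markdown));
  return {
    unknown: Array.from(documented).filter((cmd) => !registry.has(cmd)),
    undocumented: Array.from(registry).filter((cmd) => !documented.has(cmd)),
  };
}
```

A non-empty unknown list means the docs promise a command the binary lacks; a non-empty undocumented list means a command shipped without docs. Either can fail CI in well under the quoted 2 s.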

Preamble (Per‑Skill Bash Block)

Each skill begins with a {{PREAMBLE}} block that runs five tasks before the skill logic:

Update check – runs gstack-update-check.

Session tracking – creates a file under ~/.gstack/sessions/$PPID; if more than three sessions are running concurrently, the skill switches to ELI16 mode and re-establishes full context in every request.

Self-improvement logging – writes failures and learnings to a JSONL file for later sessions to read.

Uniform user-question formatting – enforces a consistent "RECOMMENDATION: …" style.

Principle injection – adds the "Search Before Building" principle from ETHOS.md.
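The preamble itself is a Bash block; the TypeScript sketch below only illustrates the session-counting rule. It assumes one file per session named after the parent PID and ignores files untouched for two hours; that cutoff (STALE_MS) and registerSession are hypothetical, since the article does not say how dead sessions are pruned.

```typescript
import { mkdirSync, readdirSync, statSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical cutoff: session files untouched this long are treated as stale.
const STALE_MS = 2 * 60 * 60 * 1000;

// Record this session as <dir>/<ppid> and report how many sessions look active.
// More than three active sessions switches the skill into ELI16 mode.
function registerSession(dir: string, ppid: number): { active: number; eli16: boolean } {
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, String(ppid)), new Date().toISOString()); // rewriting refreshes mtime
  const now = Date.now();
  const active = readdirSync(dir).filter((f) => now - statSync(join(dir, f)).mtimeMs < STALE_MS).length;
  return { active, eli16: active > 3 };
}
```

A caller would pass join(homedir(), ".gstack", "sessions") and process.ppid. Re-registering the same PPID overwrites its file, so one terminal never counts twice.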

Design Philosophy (ETHOS.md)

Boil the Ocean: With AI assistance, fully testing a module costs only minutes more, so the project inverts the usual advice "don't boil the ocean" and aims for 100% coverage.

Search Before Building: Three knowledge layers, Layer 1 (tried-and-true), Layer 2 (new-and-popular), and Layer 3 (first principles), guide when to reuse an existing solution and when to build one.

User Sovereignty: AI makes recommendations; users keep final authority. This principle drives the strict canary-token and dual-classifier design for security findings.

Dual‑Listener Tunnel Architecture

When pair-agent --client is used, the server creates two HTTP listeners:

Local listener (127.0.0.1:LOCAL_PORT) runs permanently and handles internal endpoints such as /cookie-picker and /inspector/*, plus the full command API.

Tunnel listener (127.0.0.1:TUNNEL_PORT) is bound lazily on /tunnel/start and torn down on /tunnel/stop. It exposes only an allowlisted set of endpoints (/connect, /command, /sidebar-chat) and returns 404 for everything else.

Separating the listeners by port is stronger than header-based authentication: the tunneled socket simply cannot reach internal endpoints such as /health or /cookie-picker, whatever headers a request carries.
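A sketch of the tunnel listener's allowlist, assuming Fetch-style handlers like those Bun.serve() uses. The path list comes from the article; tunnelHandler is a hypothetical name.

```typescript
type FetchHandler = (req: Request) => Response | Promise<Response>;

// The only paths reachable through the tunnel, per the article.
const TUNNEL_ALLOWLIST = new Set(["/connect", "/command", "/sidebar-chat"]);

// Wrap a handler so the tunnel socket can reach allowlisted paths only;
// everything else (e.g. /health, /cookie-picker) gets a plain 404.
function tunnelHandler(inner: FetchHandler): FetchHandler {
  return (req: Request) => {
    const { pathname } = new URL(req.url);
    if (!TUNNEL_ALLOWLIST.has(pathname)) return new Response("Not Found", { status: 404 });
    return inner(req);
  };
}
```

Under Bun, the two listeners would be separate Bun.serve() instances on different ports, with the tunnel server's stop() called on /tunnel/stop. Answering 404 rather than 403 also avoids confirming that an internal endpoint exists.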

Cookie Security Model (Five Principles)

Keychain access requires explicit user approval.

Decryption occurs in‑process (PBKDF2 + AES‑128‑CBC) and never writes plaintext to disk.

Chromium cookie DB is copied to a temporary file and opened read‑only.

AES keys are cached only for the server’s lifetime.

Logs never contain cookie values; commands show only metadata.
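The article names only the primitives (PBKDF2 + AES-128-CBC). The sketch below fills in the parameters long documented for Chromium's cookie encryption on macOS (salt "saltysalt", 1003 iterations, SHA-1, a 16-byte key, an IV of 16 spaces, and a "v10" prefix on each value); treat them as assumptions about what gstack does. Recent Chromium versions reportedly also prepend a 32-byte domain hash to the plaintext, which this sketch does not strip.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync } from "node:crypto";

const SALT = "saltysalt";
const ITERATIONS = 1003;
const KEY_BYTES = 16; // AES-128
const IV = Buffer.alloc(16, " "); // 16 space characters

// Derive the AES key from the Keychain password; the caller would keep it in
// memory only for the server's lifetime.
function deriveKey(keychainPassword: string): Buffer {
  return pbkdf2Sync(keychainPassword, SALT, ITERATIONS, KEY_BYTES, "sha1");
}

// Decrypt one encrypted_value blob entirely in memory; nothing touches disk.
function decryptCookie(encrypted: Buffer, key: Buffer): string {
  const prefix = encrypted.subarray(0, 3).toString("latin1");
  if (prefix !== "v10") throw new Error(`Unsupported cookie format: ${prefix}`);
  const decipher = createDecipheriv("aes-128-cbc", key, IV);
  return Buffer.concat([decipher.update(encrypted.subarray(3)), decipher.final()]).toString("utf8");
}

// Inverse operation, handy for round-trip tests; Chromium itself does the encrypting.
function encryptCookie(value: string, key: Buffer): Buffer {
  const cipher = createCipheriv("aes-128-cbc", key, IV);
  return Buffer.concat([Buffer.from("v10"), cipher.update(value, "utf8"), cipher.final()]);
}
```

In line with the third principle, the blobs would be read from a read-only temporary copy of the Cookies database rather than from the live file.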

Crash Recovery

If the Chromium process disconnects, the server exits immediately. The CLI detects the dead server on the next command and restarts it, avoiding complex self‑healing logic. This “die‑and‑restart” strategy is simpler and more reliable than trying to reconnect a half‑dead browser.
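The CLI side of "die-and-restart" fits in a few lines. In this sketch, isHealthy and start are hypothetical injected operations (in practice, a request to the local /health endpoint and a process spawn); on the server side, the matching piece would simply exit when Playwright's Browser emits its "disconnected" event.

```typescript
// Operations the CLI needs; injected so the control flow stays visible and testable.
interface DaemonOps {
  isHealthy(): Promise<boolean>;
  start(): Promise<void>;
}

// Never try to repair a half-dead browser: if the server is gone (it exits as
// soon as Chromium disconnects), start a fresh one and verify it came up.
async function ensureServer(ops: DaemonOps): Promise<"reused" | "restarted"> {
  if (await ops.isHealthy()) return "reused";
  await ops.start();
  if (!(await ops.isHealthy())) throw new Error("server failed to start");
  return "restarted";
}
```

Every CLI invocation would call ensureServer() before sending its command, so a crash costs one extra cold start rather than a stuck session.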

Community Criticism (Hacker News)

LOC as a metric – commenters dismissed the claim of 600K lines of production code as a relic of outdated productivity metrics; the project responds in its own ON_THE_LOC_CONTROVERSY.md.

Just Markdown – some argued the repo is “a bunch of files telling Claude to pretend to be different people”, overlooking the substantial browser daemon.

Nepotism – the YC CEO’s name was seen as an unfair boost; the most up‑voted HN post was a satirical PR translating the README.

Trust-chain breakage – users reported skills that sound confident yet mislead, pointing to the lack of regression tests for skill logic and the reliance on LLM confidence.

Derivative Ecosystem

More than ten “inspired by gstack” projects have appeared, e.g. upstack (red/green TDD), nanopm (project‑management layer), CFO‑stack (double‑entry accounting), tonone (role‑based Claude extensions), Gstack++ (C++ support), and CTP Room (multi‑agent collaboration). They extend the core idea of encoding AI agent workflows as role‑based skills.

Conclusion

gstack’s popularity rests on three concrete pillars:

A production‑grade, long‑running Chromium daemon with sub‑second latency and robust security.

A sprint‑style, slash‑command workflow that turns team processes into repeatable code (“process = code”).

A vendor-neutral design that supports ten AI coding agents and makes adding new hosts easy.

Shortcomings – lack of skill‑level regression tests, LOC hype, and perceived over‑engineering – are real and should be considered when adopting the tool. Nonetheless, the paradigm of role‑based AI agent workflows is likely to influence future developer tooling.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Prompt Engineering open source security browser automation AI workflow gstack

Written by

Shuge Unlimited

Formerly "Ops with Skill", now officially upgraded. Fully dedicated to AI, we share both the why (fundamental insights) and the how (practical implementation). From technical operations to breakthrough thinking, we help you understand AI's transformation and master the core abilities needed to shape the future. ShugeX: boundless exploration, skillful execution.
