Mastering LangGraph Streaming: Token, Node, and Event-Level Output to Prevent UI Crashes
The article explains why streaming output is essential for responsive LLM agents, compares batch and streaming latency, details the five LangGraph streamMode options with code examples, shows how to combine them, and lists common pitfalls to avoid runtime errors and poor user experience.
Why streaming output matters
When .invoke() is used, the UI stays blank for the entire response time, leaving users unsure whether anything is happening. Streaming reduces the first-character delay from seconds to sub-second, moving the perceived wait from the "attention drifts" range (1–10 s) into the "acceptable delay" range (0.1–1 s) in Nielsen's perceived-latency terms.
The .stream() API resolves to an async iterable, allowing data to be consumed while it is still being generated, unlike .invoke(), which resolves only with the final result.
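A minimal sketch of the difference in consumption, assuming a compiled graph named graph and an inputs object as in the examples below (render() is a hypothetical UI callback):

const finalState = await graph.invoke(inputs); // resolves once, after every node has run

const stream = await graph.stream(inputs); // resolves to an async iterable right away
for await (const chunk of stream) {
  render(chunk); // chunks arrive while the graph is still executing
}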
Stream mode options
values – emits the full state after each step; useful for debugging and inspecting state accumulation (sketched just after this list).
updates – emits only the incremental changes (which node changed what); ideal for monitoring node-level progress.
messages – emits LLM token chunks together with metadata; enables a typewriter-style UI.
custom – emits arbitrary data pushed by a node or tool; suited for tool-progress updates.
debug – emits checkpoint and task events (the most detailed); used for deep debugging.
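values is the only mode without an example below, so here is a minimal sketch, reusing the same hypothetical graph and inputs. Note that each chunk is the entire accumulated state, not a delta:

for await (const state of await graph.stream(inputs, { streamMode: "values" })) {
  // the full graph state after each super-step – it grows as nodes write to it
  console.log(Object.keys(state));
}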
Updates mode – node‑level progress
Example:
for await (const chunk of await graph.stream(inputs, { streamMode: "updates" })) {
  console.log(chunk);
  // → { plan: { outline: "Outline for: LangGraph Streaming" } }
  // → { write: { article: "Article based on: Outline for: …" } }
}

The output is a plain object keyed by node name, making it obvious which node finished and what it returned.
Messages mode – true typewriter effect
When the LLM is created with streaming: true, each generated token triggers a chunk:
for await (const [messageChunk, metadata] of await graph.stream(
  { messages: [new HumanMessage("Explain quantum entanglement in three sentences")] },
  { streamMode: "messages" }
)) {
  if (messageChunk.content) {
    process.stdout.write(messageChunk.content as string);
  }
}

Each chunk has the shape [messageChunk, metadata]: messageChunk.content holds the text, while metadata carries fields such as langgraph_node, run_id, and tags.
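For completeness, a minimal sketch of creating such a model, assuming @langchain/openai (the model name is illustrative; any LangChain chat model that accepts a streaming option behaves the same way):

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini", // illustrative model name
  streaming: true, // without this, messages mode emits a single chunk at the end
});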
Custom mode – tool execution progress
Nodes and tools that do not call an LLM can still push progress updates via getLangGraphStreamWriter():
const searchDatabase = tool(
  async ({ query }) => {
    const writer = getLangGraphStreamWriter();
    writer({ type: "progress", message: "Connecting…", progress: 0 });
    await new Promise((r) => setTimeout(r, 300));
    writer({ type: "progress", message: "Running query…", progress: 30 });
    // …more steps…
    writer({ type: "progress", message: "Done", progress: 100 });
    return `Found 42 records for "${query}"`;
  },
  {
    name: "search_database",
    description: "Search internal DB",
    schema: z.object({ query: z.string() }),
  }
);
for await (const chunk of await graph.stream(inputs, { streamMode: "custom" })) {
  if (chunk.type === "progress") {
    console.log(`[${chunk.progress}%] ${chunk.message}`);
  }
}

Note that getLangGraphStreamWriter() can only be called inside the LangGraph execution context; calling it anywhere else throws a runtime error.
Mixed mode – subscribing to multiple streams
Production systems often need both token‑level typing and node‑level progress. Pass an array to streamMode:
for await (const [mode, chunk] of await graph.stream(
  { messages: [new HumanMessage("Write a poem about autumn")] },
  { streamMode: ["updates", "messages", "custom"] }
)) {
  switch (mode) {
    case "updates":
      console.log("✅ Node completed:", Object.keys(chunk));
      break;
    case "messages": {
      const [msgChunk] = chunk;
      if (msgChunk.content) process.stdout.write(msgChunk.content as string);
      break;
    }
    case "custom":
      if (chunk.type === "progress") console.log(`\n[Progress] ${chunk.message}`);
      break;
  }
}

In mixed mode the generator yields a two-element tuple [mode, chunk]; single-mode calls yield the raw chunk directly.
Common pitfalls (5 issues)
Missing streaming: true on the LLM – messages mode emits nothing until the end, arriving as a single chunk.
Calling getLangGraphStreamWriter outside the graph – results in a runtime error; it must be used inside a node or a tool.
Forgetting to destructure in mixed mode – reading fields off the yielded value fails because the generator returns a [mode, data] tuple, not the data itself.
Sub-graph tokens are not propagated by default – pass subgraphs: true to .stream() to forward inner tokens (see the sketch after this list).
Breaking out of the async generator early – may leak resources; call await gen.return() to close the generator cleanly.
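Pitfalls 4 and 5 can be sketched together, again assuming the same hypothetical graph and inputs (isCancelled() stands in for whatever cancellation signal the UI provides):

const stream = await graph.stream(inputs, {
  streamMode: "messages",
  subgraphs: true, // forward token chunks from nested sub-graphs as well
});

try {
  for await (const chunk of stream) {
    // with subgraphs: true, each chunk is prefixed with a namespace identifying the sub-graph
    if (isCancelled()) break; // early exit
    // … render the chunk …
  }
} finally {
  await stream.return?.(undefined); // close the generator cleanly, per pitfall 5
}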
Key takeaways
updates is the most common starting point for node-level progress when you only need to know which node finished and what it returned.
True typewriter effect requires both messages mode and streaming: true on the model; omitting either disables token‑level streaming.
Tool progress should be emitted via custom mode together with getLangGraphStreamWriter – this is the only supported path for non‑LLM tools.
Mixed mode lets you handle multiple granularities simultaneously; remember the generator yields a [mode, chunk] tuple.
When nesting graphs, pass subgraphs: true to .stream() to surface inner token streams; otherwise sub-graph output remains hidden.