AI Handles 80% of a Medical Triage Agent, Product Managers Cover the Rest
The article walks through a medical triage AI Agent built with LangChain, LangGraph, and LangSmith: the framework supplies the core model and tool interfaces, graph‑based orchestration manages complex branching, loops and human‑in‑the‑loop steps, and tracing plus evaluation demonstrate reliability to product managers.
Setting the Stage
Before diving in, the three core concepts—LangChain, LangGraph, and LangSmith—are defined as the building blocks of an AI Agent lifecycle, analogous to parts suppliers, assembly lines, and quality‑control stations in car manufacturing.
Phase 1: Build the Skeleton with LangChain
Standardised Model Interface
LangChain abstracts away the differences between Claude, GPT‑4, Gemini and other LLM APIs. By changing a single model‑name string, the underlying request format, parameters and response parsing are adapted automatically, allowing rapid model swaps without rewriting integration code. This standardisation matters for product managers who need to benchmark multiple models on medical reasoning.
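The pattern can be shown in a minimal plain‑Python sketch (this is not LangChain's actual API; the adapter classes and `init_model` helper are hypothetical stand‑ins for the provider integrations):

```python
# Sketch of a unified chat-model interface: each provider adapter hides its
# own request/response format behind the same invoke(prompt) -> str contract.
class ClaudeAdapter:
    def invoke(self, prompt: str) -> str:
        # A real adapter would call the Anthropic API here; stubbed for illustration.
        return f"[claude] {prompt}"

class GPT4Adapter:
    def invoke(self, prompt: str) -> str:
        return f"[gpt-4] {prompt}"

_REGISTRY = {"claude": ClaudeAdapter, "gpt-4": GPT4Adapter}

def init_model(name: str):
    """Swap the underlying model by changing a single name string."""
    return _REGISTRY[name]()

model = init_model("gpt-4")
answer = model.invoke("Which department treats morning dizziness with tinnitus?")
```

Benchmarking a second model is then a one‑string change (`init_model("claude")`), which is the property the article attributes to LangChain's standardised interface.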
Tool‑Calling Wrapper
Agents can invoke external tools such as a disease‑knowledge base or department‑matching database. Each tool is defined by a name, description, and execution function. During a triage conversation, the model decides when and which tool to call, passing relevant arguments (e.g., symptoms “dizziness”, “morning‑worsening”, “tinnitus”).
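A tool in this sense is just a name, a description (which the model reads to decide when to call it), and an execution function. A plain‑Python sketch under those assumptions (the knowledge‑base contents and tool name here are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[List[str]], str]

def search_knowledge_base(symptoms: List[str]) -> str:
    # Stub: a real implementation would query a disease-knowledge base.
    if "dizziness" in symptoms and "tinnitus" in symptoms:
        return "Possible vestibular disorder; consider ENT referral."
    return "No strong match found."

kb_tool = Tool(
    name="disease_knowledge_base",
    description="Look up likely conditions for a list of symptoms.",
    func=search_knowledge_base,
)

# During the conversation, the model decides when to call the tool
# and which arguments to pass:
result = kb_tool.func(["dizziness", "morning-worsening", "tinnitus"])
```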
Agent Pre‑built Architecture (ReAct)
LangChain’s built‑in ReAct loop (Reason → Act → Observe → Reason) enables multi‑step reasoning. In the example, the agent first recognises the need for more information, calls the knowledge‑base tool, observes the results, and then decides whether to ask follow‑up questions or generate a structured recommendation.
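The loop itself is simple enough to sketch in plain Python. Here a scripted stand‑in plays the model's role (first requesting a tool call, then finishing), so the control flow — not any real LLM call — is what's illustrated:

```python
# Minimal ReAct loop: Reason -> Act -> Observe -> Reason, until the
# "model" decides it has enough information to answer.
def scripted_model(history: list) -> dict:
    # Stand-in for the LLM's reasoning step.
    if not any(h.startswith("Observation:") for h in history):
        return {"action": "lookup", "input": "dizziness, tinnitus"}
    return {"action": "finish", "input": "Recommend ENT (otolaryngology)."}

def lookup(query: str) -> str:
    # Stand-in for the knowledge-base tool.
    return f"Knowledge base matched '{query}' to vestibular disorders."

def react_loop(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = scripted_model(history)                  # Reason
        if step["action"] == "finish":
            return step["input"]
        observation = lookup(step["input"])             # Act
        history.append(f"Observation: {observation}")   # Observe
    return "Step limit reached."
```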
Phase 2: Compose the Workflow with LangGraph
When the basic agent runs, limitations appear: real‑world triage requires conditional branches, loops, and human approval. LangGraph models the agent as a directed graph where each node represents an action (symptom collection, information‑sufficiency check, emergency assessment, department matching, recommendation generation) and edges encode transition rules.
Persistent Execution: After each node, the state is checkpointed, allowing the system to resume after a crash without losing patient input.
State Backtracking: If an unexpected recommendation occurs, the full state history (dialogue, tool outputs, decisions) can be inspected.
Memory Management: Short‑term memory holds the current conversation; long‑term memory can store past triage records for chronic‑patient follow‑up.
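Persistence and backtracking both reduce to one mechanism: snapshot the full state after every node. A plain‑Python sketch of that idea (not LangGraph's actual checkpointer API):

```python
import copy

class Checkpointer:
    """Snapshot the agent state after every node, so a crashed run can
    resume from the latest checkpoint and any earlier state can be
    inspected for backtracking."""
    def __init__(self):
        self.history = []

    def save(self, node: str, state: dict):
        # Deep-copy so later mutations don't rewrite past checkpoints.
        self.history.append((node, copy.deepcopy(state)))

    def latest(self):
        return self.history[-1] if self.history else None

cp = Checkpointer()
state = {"symptoms": ["dizziness"], "messages": []}
cp.save("collect_symptoms", state)

state["symptoms"].append("tinnitus")
cp.save("sufficiency_check", state)

# Resume from the last checkpoint, or inspect any earlier one:
node, restored = cp.latest()
```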
The graph includes loops for information sufficiency (re‑collect symptoms until enough data is gathered) and conditional branches for emergency detection (immediate advice to call 120, China's medical emergency number) and age‑specific handling (different pathways for children).
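The graph structure above can be sketched in plain Python: nodes are functions over a shared state dict, and the edges — including the sufficiency loop and the emergency branch — are encoded by the next‑node name each node returns. (This mirrors the shape of a LangGraph `StateGraph`, but the node names and logic here are illustrative, not the article's actual implementation.)

```python
def collect_symptoms(state):
    state["symptoms"].append(state["pending"].pop(0))
    return "sufficiency_check"

def sufficiency_check(state):
    # Loop edge: go back and re-collect until we have enough data.
    return "emergency_check" if len(state["symptoms"]) >= 2 else "collect_symptoms"

def emergency_check(state):
    # Conditional branch: emergencies short-circuit the normal pathway.
    if "chest pain" in state["symptoms"]:
        state["advice"] = "Call 120 immediately."
        return "END"
    return "match_department"

def match_department(state):
    state["advice"] = "Recommend ENT (otolaryngology)."
    return "END"

NODES = {
    "collect_symptoms": collect_symptoms,
    "sufficiency_check": sufficiency_check,
    "emergency_check": emergency_check,
    "match_department": match_department,
}

def run(state, entry="collect_symptoms"):
    node = entry
    while node != "END":
        node = NODES[node](state)
    return state
```

Running `run({"symptoms": [], "pending": ["dizziness", "tinnitus"]})` walks the sufficiency loop twice before reaching department matching, while a state containing "chest pain" exits through the emergency branch instead.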
Phase 3: Validate Quality with LangSmith
LangSmith provides full‑trace visibility, batch evaluation, and an interactive Studio for debugging.
Trace: Every node's input, output, latency, token usage, and tool calls are recorded and visualised as a waterfall diagram.
Eval: A test set of 100 real triage cases is run; metrics such as triage accuracy, emergency‑recognition recall, follow‑up relevance, and recommendation readability are computed.
Studio: Engineers can replay a failing case, pause at any node, modify intermediate data, and observe the impact on the final recommendation.
Eval results (e.g., overall accuracy 87%, emergency recall 95%) guide iterative improvements—updating knowledge‑base content, adjusting prompts, or refining graph edges.
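The two headline metrics reduce to simple counts over a labeled test set. A sketch of how they could be computed (the cases and field names below are hypothetical, not LangSmith's evaluation API):

```python
# Each labeled case records whether the agent routed to the correct
# department, whether it was a true emergency, and whether the agent
# flagged it as one.
cases = [
    {"dept_ok": True,  "is_emergency": False, "flagged": False},
    {"dept_ok": True,  "is_emergency": True,  "flagged": True},
    {"dept_ok": False, "is_emergency": False, "flagged": False},
    {"dept_ok": True,  "is_emergency": True,  "flagged": True},
]

def triage_accuracy(cases):
    """Share of cases routed to the correct department."""
    return sum(c["dept_ok"] for c in cases) / len(cases)

def emergency_recall(cases):
    """Share of true emergencies the agent actually flagged."""
    emergencies = [c for c in cases if c["is_emergency"]]
    return sum(c["flagged"] for c in emergencies) / len(emergencies)

accuracy = triage_accuracy(cases)   # 0.75 on this toy set
recall = emergency_recall(cases)    # 1.0 on this toy set
```

Recall is measured only over true emergencies because a missed emergency is the costliest failure mode in triage, which is why the article tracks it separately from overall accuracy.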
Phase 4: Deploy and Operate
In production, LangSmith shifts from a debugging tool to a monitoring hub, tracking metrics like average response time, emergency‑trigger frequency, human‑approval rates, and drop‑off ratios. Anomalies (e.g., a sudden spike in emergency triggers) are traced back to data or graph changes, fixed, re‑evaluated, and redeployed, completing a closed‑loop lifecycle.
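Detecting "a sudden spike in emergency triggers" is essentially a rolling‑rate check against a baseline. A minimal sketch of that monitoring idea (the window size, baseline rate, and alert threshold are illustrative assumptions):

```python
from collections import deque

class RateMonitor:
    """Alert when the emergency-trigger rate over a recent window
    rises well above the long-run baseline rate."""
    def __init__(self, window=100, baseline=0.05, factor=3.0):
        self.window = deque(maxlen=window)
        self.baseline = baseline
        self.factor = factor

    def record(self, triggered: bool) -> bool:
        """Record one triage session; return True if an alert fires."""
        self.window.append(triggered)
        rate = sum(self.window) / len(self.window)
        # Only alert once the window is full, to avoid noisy early rates.
        return len(self.window) == self.window.maxlen and rate > self.baseline * self.factor

monitor = RateMonitor(window=10, baseline=0.05, factor=3.0)
# Eight normal sessions, then two emergency triggers in a row:
alerts = [monitor.record(t) for t in [False] * 8 + [True, True]]
```

Once an alert fires, the trace history points at the sessions responsible, which is what closes the fix → re‑evaluate → redeploy loop the article describes.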
Takeaways for Product Managers
Product managers should:
Define the agent’s required capabilities (model, tools, data sources) using LangChain’s abstraction.
Sketch the workflow as a state‑graph (nodes, edges, conditions) to communicate with engineers via LangGraph.
Design evaluation criteria (accuracy, emergency detection, follow‑up relevance) and leverage LangSmith for systematic testing and monitoring.
Understanding these three layers—capability, orchestration, and observability—provides a concrete framework for turning vague AI product ideas into reliable, measurable solutions.
PMTalk Product Manager Community
One of China's top product manager communities, gathering 210,000 product managers, operations specialists, designers and other internet professionals; over 800 leading product experts nationwide are signed authors; hosts more than 70 product and growth events each year; all the product manager knowledge you want is right here.