How Self‑Programming AI Agents Are Built: From LLM Brain to Dynamic Code Execution
This article explains how a self‑programming AI Agent is built: a large language model serves as the brain, a multi‑area architecture organizes behavior, layered memory and segment‑based prompt engineering supply context, and a Python‑Java bridge lets the Agent generate and execute code. It also shares practical insights and future directions.
Introduction
Exploring LLM applications reveals that large language models can not only write high‑quality code but also generate and run code to control an Agent’s behavior, enabling complex logic such as branches and loops.
Agent System Design
The system builds on the ReAct Agent pattern and extends it in several ways:
Code generation plus generalized invocation through Py4j, for flexible tool calls.
Spring Boot backend integrating Spring AI and Alibaba capabilities.
Full‑chain evaluation and observation platforms.
MCP protocol for business capability augmentation.
A2A protocol for Multi‑Agent architecture.
Model Strategy
Translation/Data extraction: Qwen3‑Turbo (low latency).
Thinking/Dynamic code generation: Qwen3‑Coder (enhanced coding).
General scenarios: on‑demand Qwen, DeepSeek, etc.
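The model strategy above amounts to a routing table from task type to model. A minimal sketch, assuming hypothetical task-type keys and a hypothetical `pick_model` helper (only the model names come from the list above):

```python
# Hypothetical routing table: the task-type keys are illustrative,
# the model names are the ones listed in the model strategy.
MODEL_BY_TASK = {
    "translation": "Qwen3-Turbo",
    "data_extraction": "Qwen3-Turbo",
    "thinking": "Qwen3-Coder",
    "code_generation": "Qwen3-Coder",
}

# General scenarios pick a model on demand (Qwen, DeepSeek, etc.);
# a single default stands in for that choice here.
DEFAULT_MODEL = "Qwen"


def pick_model(task_type: str) -> str:
    """Return the model for a task type, falling back to the general default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```

Keeping the mapping in data rather than in branching code makes it easy to swap a model for one task type without touching the Agent logic.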
Core Coding‑Driven Logic
The Agent uses a [Segment]: [Content] structure to compose prompts, enabling chain‑of‑thought reasoning.
System tools allow the Agent to inspect tasks, query skills, sleep, converse, and perform deep reasoning via large models.
Each toolbox is a Python class; functions are exposed as callable methods, and code is generated in Fill‑In‑Middle (FIM) format.
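To make the FIM idea concrete, the sketch below shows how a prefix and suffix could be wrapped in FIM markers for the model, and how the returned middle could be stitched back into executable code. The helper names `build_fim_prompt` and `stitch` are assumptions for illustration, and the `middle` string is a stand-in for real model output.

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the fixed code around the gap; the model continues after the middle marker."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def stitch(prefix: str, middle: str, suffix: str) -> str:
    """Reassemble the final program from the fixed parts and the generated middle."""
    return prefix + middle + suffix


prefix = 'project_id = "123"\ntry:\n'
suffix = 'except Exception as e:\n    result = f"error: {e}"\n'
middle = "    result = int(project_id) + 1\n"  # stand-in for model output

code = stitch(prefix, middle, suffix)
compile(code, "<generated>", "exec")  # raises SyntaxError if the pieces do not fit
```

One benefit of this layout is that the framework, not the model, owns the error handling in the suffix, so generated code always runs inside a known `try`/`except` frame.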
IntentPlanner
├── SegmentBuilder (members)
│   ├── List<InferencePromptNewBuilder> promptBuilders
│   ├── InferencePromptConfigManager (config manager)
│   └── build(type, segments, context, config) → InferenceSegment
└── PromptBuilder implementations (ThoughtPromptBuilder, CmdPromptBuilder, …)
Agent Architecture
An Agent runs as a thread, containing multiple Areas (functional zones) and Acts (actions). Its lifecycle is illustrated below.
Areas
Perception Area: receives external information (messages, UI events, async results) and performs initial processing.
Cognition Area: after perception, the IntentPlanner uses segments to generate code, which is sent to the Python executor.
Movement Area (Advanced Cognition): handles complex tasks by looping over segments until goals are met.
Expression Area: delivers responses, cards, or events to the user.
Self‑Evaluation Area: after task completion, SelfTaskCheck evaluates success and may trigger re‑execution.
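The flow through the five Areas can be sketched as a simple pipeline. This is an illustrative toy, not the system's implementation: the `Task` class, the function names, and the retry limit are all assumptions, and each function is a trivial stand-in for the real Area.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    message: str                                  # raw external input
    steps: list = field(default_factory=list)     # planned by cognition
    results: list = field(default_factory=list)   # produced by movement
    attempts: int = 0


def perception(task):
    # Initial processing of the incoming message.
    task.message = task.message.strip()


def cognition(task):
    # Stand-in for IntentPlanner: one step per comma-separated request.
    task.steps = [s.strip() for s in task.message.split(",") if s.strip()]


def movement(task):
    # Loop until every planned step has been handled.
    while task.steps:
        task.results.append(f"done: {task.steps.pop(0)}")


def expression(task):
    return "; ".join(task.results)


def self_evaluation(task):
    # Stand-in for SelfTaskCheck: succeed when at least one result exists.
    return bool(task.results)


def run_agent(message, max_attempts=2):
    task = Task(message)
    while task.attempts < max_attempts:
        task.attempts += 1
        perception(task)
        cognition(task)
        movement(task)
        reply = expression(task)
        if self_evaluation(task):
            return reply
    return "task failed"
```

The retry loop in `run_agent` mirrors the re‑execution that self‑evaluation may trigger; a real system would presumably feed the evaluation result back into planning rather than repeating the same steps.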
Context Engineering
Context and Prompt are the core decision‑making components. Segments are assembled into prompts, supporting configuration and modularity.
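As a rough illustration of segment assembly, the sketch below renders a list of segments into the [Segment]: [Content] form and lets a configuration set decide which segment kinds are included. The `Segment` class and `render_prompt` function are hypothetical names, not the system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str      # for example "User", "Context", "Thought"
    content: str


def render_prompt(segments, enabled=None):
    """Render segments as 'Kind: content' lines, optionally filtered by kind."""
    lines = [
        f"{s.kind}: {s.content}"
        for s in segments
        if enabled is None or s.kind in enabled
    ]
    return "\n".join(lines)


segments = [
    Segment("User", 'Help me query requirements for project 123 with keyword "login".'),
    Segment("Context", "Current page URL https://example.com/project/123."),
    Segment("Thought", "Use example_toolkit.search_requirements."),
]

# Configuration decides which segment kinds reach the model.
prompt = render_prompt(segments, enabled={"User", "Context"})
```

Because each segment is an independent unit, the same list can be rendered differently per model or per task type simply by changing the `enabled` set.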
## Platform Introduction
You are an Agent specialized in Alibaba DevOps, capable of code generation, tool invocation, and task execution.
## Role Definition
Professional DevOps assistant with core abilities.
## Work Mechanism
Think‑Execute‑Evaluate loop.
## Input/Output Formats
User input may include text, screenshots, URLs. Output code follows FIM format with <|fim_prefix|>, <|fim_middle|>, <|fim_suffix|> markers.
Prompt Example
User: Help me query requirements for project 123 with keyword "login".
Context: Current page URL https://example.com/project/123.
Thought: Use example_toolkit.search_requirements.
Python:
<|fim_prefix|># Query project requirements
project_id = "123"
query = "login"
try:
<|fim_suffix|>
except Exception as e:
    print(f"Error during query: {e}")
<|fim_middle|>
Memory System
The memory hierarchy mirrors human memory: Sensory, Short‑Term, and Long‑Term. Short‑Term (session) memory stores recent segments; Long‑Term retains experiences, user preferences, and platform knowledge.
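The split between short‑term and long‑term memory can be pictured with two small stores. This is a minimal sketch in Python for illustration only: the class names, the fixed capacity, and the oldest-first eviction policy are assumptions, not details from the system.

```python
from collections import deque


class ShortTermMemory:
    """Session memory: keeps only the most recent segments."""

    def __init__(self, capacity=3):
        # Assumed policy: once full, the oldest segment is dropped.
        self._items = deque(maxlen=capacity)

    def add(self, segment):
        self._items.append(segment)

    def recent(self):
        return list(self._items)


class LongTermMemory:
    """Durable memory: experiences, user preferences, platform knowledge."""

    def __init__(self):
        self._facts = {}

    def remember(self, key, value):
        self._facts[key] = value

    def recall(self, key, default=None):
        return self._facts.get(key, default)


session = ShortTermMemory(capacity=2)
for text in ["Thought: plan", "Python: code", "Result: ok"]:
    session.add(text)

knowledge = LongTermMemory()
knowledge.remember("preferred_language", "English")
```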
class SegmentMemoryMessage extends ShortTermMemoryMessage {
    InferenceSegment segment;
    String execId;
    List<String> parentExecId;
}
Code Execution Engine
The PythonExecutionEngine runs generated Python code in isolated threads, monitors execution, enforces timeouts, and reports results back to the Agent.
Asynchronous execution in separate threads.
Lifecycle management with execution IDs.
Timeout control (default 3600 s).
Resource isolation and monitoring.
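The engine's responsibilities listed above (threaded execution, execution IDs, timeouts, result reporting) can be sketched with the standard library. This is not the PythonExecutionEngine itself: `MiniExecutionEngine` and the convention of reading a `result` variable are assumptions. Note two limits of the sketch: the timeout only stops waiting and cannot kill the worker thread, and plain `exec` provides no resource isolation.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class MiniExecutionEngine:
    def __init__(self, default_timeout=3600.0, max_workers=4):
        self.default_timeout = default_timeout
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    @staticmethod
    def _run(code):
        namespace = {}
        exec(code, namespace)  # no sandboxing: trusted code only
        # Assumed convention: generated code leaves its output in `result`.
        return namespace.get("result")

    def submit(self, code):
        """Start execution in a worker thread and return its execution ID."""
        exec_id = uuid.uuid4().hex
        future = self._pool.submit(self._run, code)
        with self._lock:
            self._futures[exec_id] = future
        return exec_id

    def result(self, exec_id, timeout=None):
        """Wait for an execution and report its outcome as a dict."""
        with self._lock:
            future = self._futures[exec_id]
        wait = self.default_timeout if timeout is None else timeout
        try:
            value = future.result(timeout=wait)
            return {"exec_id": exec_id, "status": "ok", "result": value}
        except FutureTimeout:
            # The worker thread keeps running; only the wait is abandoned.
            return {"exec_id": exec_id, "status": "timeout", "result": None}
        except Exception as exc:
            return {"exec_id": exec_id, "status": "error", "result": repr(exc)}


engine = MiniExecutionEngine()
ok_id = engine.submit("result = 6 * 7")
```

A production engine would need a stronger boundary than threads (for example separate processes) to actually enforce timeouts and isolation.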
Toolkit Bridge (Python ↔ Java)
Py4j provides a bidirectional gateway. Python code calls Java toolkits via a dynamic proxy.
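class PythonToolkitProxy:
    """Dynamic proxy: any attribute access becomes an async toolkit call."""

    def __init__(self, toolkit_name):
        self.toolkit_name = toolkit_name

    def __getattr__(self, operation_name):
        # Invoked only for attributes not found normally, i.e. operation names.
        async def async_operation_method(*args, **kwargs):
            # Positional arguments are accepted but not forwarded; only kwargs cross the bridge.
            return await call_toolkit_async(self.toolkit_name, operation_name, kwargs)
        return async_operation_method

# Example usage (inside an async context)
result = await example_toolkit.get_project_info(project_id="123")
Toolkit Registration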
Two registration mechanisms coexist:
Dynamic Toolkit: implements DynamicToolkit interface; supports runtime registration and self‑describing metadata.
Annotation Toolkit: uses @ToolkitComponent and @ToolkitOperation annotations; benefits from compile‑time checks and Spring integration.
Both are managed by separate registries and accessed uniformly via ToolkitMapper.
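The idea of two registries behind one lookup can be sketched in Python, with a decorator standing in for the annotations. All names here (`AnnotationRegistry`, `DynamicRegistry`, the `call` method) are illustrative assumptions; the real ToolkitMapper lives on the Java side and its API is not shown in this article.

```python
class AnnotationRegistry:
    """Operations registered declaratively, via a decorator."""

    def __init__(self):
        self.operations = {}

    def operation(self, toolkit):
        def decorator(func):
            self.operations[(toolkit, func.__name__)] = func
            return func
        return decorator


class DynamicRegistry:
    """Operations registered at runtime, with self-describing metadata."""

    def __init__(self):
        self.operations = {}
        self.descriptions = {}

    def register(self, toolkit, name, func, description=""):
        self.operations[(toolkit, name)] = func
        self.descriptions[(toolkit, name)] = description


class ToolkitMapper:
    """Uniform access: callers do not need to know which registry owns an operation."""

    def __init__(self, *registries):
        self.registries = registries

    def call(self, toolkit, operation, **kwargs):
        for registry in self.registries:
            func = registry.operations.get((toolkit, operation))
            if func is not None:
                return func(**kwargs)
        raise KeyError(f"unknown operation: {toolkit}.{operation}")


annotated = AnnotationRegistry()
dynamic = DynamicRegistry()
mapper = ToolkitMapper(annotated, dynamic)


@annotated.operation("example_toolkit")
def get_project_info(project_id):
    return {"project_id": project_id}


dynamic.register("runtime_toolkit", "echo", lambda text: text, "Returns its input.")
```

In this sketch the registries are consulted in the order they are passed to the mapper, so a declaratively registered operation would shadow a dynamic one with the same name; how the real system resolves such conflicts is not stated in the article.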
Reflections & Future Outlook
Key takeaways for building robust AI Agents:
Design prompt assembly and context engineering carefully to ensure multi‑turn stability.
Invest in solid engineering architecture; model quality sets the ceiling, engineering sets the floor.
Enable agents to accumulate experience and learn from past executions.
Future directions include more dynamic prompt generation, unified coding‑driven agent models, finer task isolation, continuous knowledge refresh, deeper MCP integration, desktop client development, and comprehensive observability.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.