AI Tech Publishing
Author

Thorough explanations of stable technical foundations for the fast-evolving AI era.

77 articles · 0 likes · 1 view · 0 comments
Recent Articles

AI Tech Publishing
Apr 16, 2026 · Cloud Native

Deploying a Stateful AI Agent on a Stateless Web Architecture: Challenges, Solutions, and Code Walkthrough

This article analyzes the fundamental conflict between stateful AI agents and the inherently stateless, distributed nature of modern web services, explores time, state, and execution model mismatches, and presents a practical Agent‑as‑API solution using FastAPI, Redis, SSE, and Kubernetes to achieve scalable, fault‑tolerant deployments.

AI Agent · FastAPI · Kubernetes
0 likes · 30 min read
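The Agent-as-API approach that article describes hinges on one move: agent state lives in a shared external store, so any stateless worker can serve any turn. A minimal sketch of that idea, using an in-memory dict where the article uses Redis (the `SessionStore` name and methods are illustrative, not from the article):

```python
import json
import uuid

# Stand-in for Redis: every worker holding a reference to the same store
# (in production, a shared Redis instance) can resume any session.
_store: dict[str, str] = {}

class SessionStore:
    """Persists agent conversation state outside the web process."""

    def __init__(self, backend: dict[str, str] = _store):
        self.backend = backend  # swap for a redis.Redis client in production

    def create(self) -> str:
        sid = str(uuid.uuid4())
        self.backend[sid] = json.dumps([])
        return sid

    def append(self, sid: str, role: str, content: str) -> None:
        history = json.loads(self.backend[sid])
        history.append({"role": role, "content": content})
        self.backend[sid] = json.dumps(history)  # a single SET in Redis

    def history(self, sid: str) -> list[dict]:
        return json.loads(self.backend[sid])

# Two "workers" sharing the store: either one can serve the next request.
worker_a, worker_b = SessionStore(), SessionStore()
sid = worker_a.create()
worker_a.append(sid, "user", "Summarize my logs")
worker_b.append(sid, "assistant", "Working on it...")
print(len(worker_b.history(sid)))  # → 2
```

Because the state round-trips through the store on every turn, workers can crash or scale horizontally without losing sessions, which is the fault-tolerance property the article targets.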
AI Tech Publishing
Apr 15, 2026 · Artificial Intelligence

8 Critical Harness Design Issues That Threaten Long‑Running Agent Accuracy

The article systematically breaks down why autonomous agents lose control during long‑running engineering tasks—missing context, short‑sighted planning, context anxiety, and plan drift—and shows how a well‑designed harness layer can preempt these problems without changing the underlying model.

AI engineering · Context Management · Harness
0 likes · 11 min read
AI Tech Publishing
Apr 14, 2026 · Artificial Intelligence

12 Harness Design Patterns from Claude Code: Memory, Workflow, Tools, and Automation

The article dissects twelve concrete harness design patterns uncovered in the leaked Claude Code source, organized into four categories—memory & context, workflow & orchestration, tools & permissions, and automation—detailing their use cases, trade‑offs, and implementation costs for building production‑grade AI agents.

Agent design · Automation · Claude Code
0 likes · 14 min read
AI Tech Publishing
Apr 13, 2026 · Artificial Intelligence

12 Core Components of a Production-Grade Agent Harness and Framework Comparison

The article explains why production issues often stem from the agent harness rather than the model, defines the harness concept, breaks down its twelve essential components, shows a full execution loop, compares Anthropic, OpenAI, LangChain and other frameworks, and discusses key design trade‑offs for building robust AI agents.

AI agents · Agent Harness · framework comparison
0 likes · 21 min read
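The execution loop such an article walks through can be condensed to a few lines. A hedged sketch (the stub model and the `echo` tool are invented for illustration; a real harness would call an LLM API and layer on permissions, retries, and cost budgets):

```python
from typing import Callable

# Tool registry: the harness, not the model, decides what is allowed to run.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda arg: f"echoed: {arg}",
}

def stub_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM call: requests one tool, then finishes."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "echo", "arg": "hello"}
    return {"type": "final", "content": "done"}

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # step budget: a core harness responsibility
        action = stub_model(messages)
        if action["type"] == "final":
            return action["content"]
        tool = TOOLS[action["tool"]]       # lookup doubles as a permission gate
        observation = tool(action["arg"])  # execution happens outside the model
        messages.append({"role": "tool", "content": observation})
    return "step budget exhausted"

print(run_agent("say hello"))  # → done
```

Everything around this loop — memory, context reduction, observability, recovery — is what separates a demo from a production harness, which is the article's point.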
AI Tech Publishing
Apr 12, 2026 · Artificial Intelligence

How Hermes Agent’s Multi‑Layer Memory Beats OpenClaw’s Simple Markdown Store

The article dissects Hermes Agent’s four‑store memory architecture—declarative, procedural, situational, and persona—deterministic routing, frozen snapshots, nudge‑driven persistence, security scanning, dual‑peer modeling, skill management, and three‑phase context compression, showing why it outperforms OpenClaw’s breadth‑first design.

Context Compression · Hermes Agent · LLM agents
0 likes · 17 min read
AI Tech Publishing
Apr 9, 2026 · Artificial Intelligence

Engineering‑Focused Guide to Training and Inference of Large Language Models

This article walks engineers through the full LLM stack—from tokenization and positional encoding to transformer blocks, efficient fine‑tuning, quantization, and production‑grade inference techniques such as KV‑cache, FlashAttention, PagedAttention, continuous batching, and speculative decoding—highlighting trade‑offs, toolchains, and practical workflow steps.

Attention · Fine-tuning · Inference
0 likes · 13 min read
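Among the techniques that guide covers, quantization is the easiest to show in miniature. A sketch of symmetric int8 round-trip quantization in pure Python (production toolchains do this per-channel with calibration data; this is only the core arithmetic):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats onto int8 range [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

w = [0.5, -1.2, 0.03, 2.54]
q, s = quantize_int8(w)
restored = dequantize(q, s)
# Rounding bounds the reconstruction error by scale/2 per weight.
print(max(abs(a - b) for a, b in zip(w, restored)) <= s / 2 + 1e-9)  # → True
```

The trade-off the article discusses falls out of `scale`: 4x less memory per weight, at the cost of bounded per-weight error that grows with the dynamic range of the tensor.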
AI Tech Publishing
Apr 8, 2026 · Artificial Intelligence

How Model, Harness, and Memory Enable Continual Learning for AI Agents

The article breaks down AI agent continual learning into three layers—model, harness, and context—explains their distinct challenges, shows how traces link them, and argues that focusing on harness and context yields faster, more practical improvements than merely retraining models.

AI agents · Continual Learning · context memory
0 likes · 9 min read
AI Tech Publishing
Apr 7, 2026 · Artificial Intelligence

Auto Dream vs OpenClaw Dreaming: How AI Agents Consolidate Memory

The article examines the noise‑accumulation problem of AI‑Agent memory, explains Claude Code’s Auto Memory and its four‑step Auto Dream consolidation process, details OpenClaw’s three‑stage Dreaming mechanism, compares the two systems across several dimensions, and relates the design to human memory science and practical agent engineering.

AI · Agent Memory · Auto-dream
0 likes · 15 min read
AI Tech Publishing
Apr 6, 2026 · Artificial Intelligence

Six Core Components of a Coding Agent Explained with Code

The article systematically breaks down the six essential building blocks of a programming agent—live repository context, prompt shape and cache reuse, structured tool access and validation, context reduction, structured session memory, and bounded sub‑agent delegation—illustrated with a Mini Coding Agent implementation and comparisons to Claude Code, Codex, and OpenClaw.

Context Compression · LLM · Python
0 likes · 15 min read
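Of the six components, context reduction is the simplest to demonstrate. A sketch of middle-out truncation for oversized tool output (the budget is in characters and the numbers are arbitrary; real agents budget in tokens):

```python
def truncate_middle(text: str, budget: int,
                    marker: str = "\n...[truncated]...\n") -> str:
    """Keep the head and tail of oversized output; drop the middle.

    Heads tend to hold commands and headers, tails tend to hold errors
    and results, so both ends usually matter more than the middle.
    """
    if len(text) <= budget:
        return text
    keep = budget - len(marker)
    head, tail = keep // 2, keep - keep // 2
    return text[:head] + marker + text[-tail:]

log = "line\n" * 10_000            # a 50,000-character tool output
short = truncate_middle(log, 200)
print(len(short) <= 200)  # → True
```

Feeding `short` instead of `log` back to the model is what keeps a long session inside the context window without discarding the parts most likely to matter.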
AI Tech Publishing
Apr 5, 2026 · Artificial Intelligence

Why the First Token Is Slow: A Deep Dive into KV Cache for LLM Inference

The article explains how KV cache eliminates redundant computations in autoregressive LLM generation, detailing the attention mechanism, the O(n²) waste of recomputing K and V, the cache‑based solution, its impact on time‑to‑first‑token, and the memory‑vs‑speed trade‑off.

Attention · KV cache · LLM
0 likes · 7 min read
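The O(n²)-versus-O(n) claim in that summary can be checked with a toy counter: without a cache, each decode step recomputes K and V for every previous token; with a cache, each token's K/V is computed exactly once. A minimal sketch (the projection math is stubbed out; only the bookkeeping is real):

```python
kv_computations = 0

def compute_kv(token: int) -> tuple[int, int]:
    """Stub for the K/V projections; we only count how often it runs."""
    global kv_computations
    kv_computations += 1
    return token, token  # stand-in for the real key/value vectors

def decode(n_tokens: int, use_cache: bool) -> int:
    """Simulate autoregressive decoding; return total K/V computations."""
    global kv_computations
    kv_computations = 0
    cache: list[tuple[int, int]] = []
    for step in range(n_tokens):
        if use_cache:
            cache.append(compute_kv(step))                    # one new K/V
        else:
            cache = [compute_kv(t) for t in range(step + 1)]  # recompute all
        # ...attention over `cache` would happen here...
    return kv_computations

print(decode(100, use_cache=False))  # → 5050  (1 + 2 + ... + 100)
print(decode(100, use_cache=True))   # → 100
```

The price, as the article notes, is memory: the cache holds K/V for every generated token, which is exactly the memory-versus-speed trade-off.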