Tagged articles
117 articles
DataFunTalk
May 19, 2026 · Artificial Intelligence

How Knora’s Ontology‑Enhanced AI Tackles Hallucinations and Execution Gaps in Enterprise Deployments

The article explains how Knora 4.0 combines enterprise‑level ontologies with large‑model capabilities to overcome six common AI challenges (hallucination, instability, weak planning, poor responsiveness, data integration, and long cold‑start cycles), enabling autonomous, auditable execution, illustrated by an LED production‑line case that achieved a 70‑fold efficiency boost.

AI Architecture · Autonomous Agents · Enterprise AI
16 min read
FunTester
May 19, 2026 · Artificial Intelligence

How Memory Layering Makes AI Agents Smarter Over Time

The article explains why default agent memory is fleeting, proposes a two‑layer design of session and long‑term memory with a post‑session “dreaming” integration step, and shows how selective persistence and shared long‑term storage keep agents continuously improving.

AI Architecture · Agent Memory · Dream Integration
8 min read
DataFunTalk
May 5, 2026 · Artificial Intelligence

How Knora’s Ontology‑Enhanced AI Tackles Hallucinations and Execution Gaps in Enterprise Deployments

The article analyzes Knora 4.0, an ontology‑enhanced AI platform that combines large‑model capabilities with a structured knowledge graph to overcome hallucinations and execution gaps in enterprise deployments, detailing its architecture, autonomous agent Knora Claw, real‑world case studies, and a three‑year roadmap.

AI Architecture · Autonomous Agents · Business Automation
18 min read
Spring Full-Stack Practical Cases
May 3, 2026 · Artificial Intelligence

9 Advanced Retrieval‑Augmented Generation (RAG) Architectures Explained

This article introduces Retrieval‑Augmented Generation (RAG) and systematically details nine distinct RAG architectures—standard, conversational with memory, corrective (CRAG), adaptive, self‑RAG, fusion, HyDE, agentic, and Graph RAG—highlighting their workflows, real‑world examples, advantages, and trade‑offs.

AI Architecture · GraphRAG · LLM
17 min read
DataFunSummit
Apr 28, 2026 · Artificial Intelligence

How Knora’s Ontology‑Enhanced Large Model Solves Hallucination and Execution Gaps in Enterprise AI

The article explains how Knora 4.0 combines enterprise ontologies with large‑model AI to create a unified, autonomous execution loop, addressing six common AI‑deployment challenges, detailing the platform’s architecture, autonomous agents, real‑world case studies, roadmap, and expert round‑table insights.

AI Architecture · Autonomous Agents · Enterprise AI
17 min read
Architect
Apr 27, 2026 · Artificial Intelligence

Sub-Agent vs Agent Team: Designing Multi-Agent Architectures Around Context Boundaries

The article explains how to choose between Sub‑Agent and Agent Team structures for multi‑agent systems by evaluating whether sub‑tasks share context, need isolation, compression, parallelism, or continuous collaboration, and provides practical guidelines, pitfalls, and a decision framework to avoid over‑engineering.

AI Architecture · Agent Team · Context Boundaries
18 min read
Architect's Tech Stack
Apr 25, 2026 · Artificial Intelligence

DeepSeek‑V4 Launch: 1.6 T Parameters, 1 M‑Token Context, Programming Skills Lead Open‑Source Rankings

DeepSeek released the V4 series—V4‑Pro (1.6 T total, 49 B active) and V4‑Flash (284 B total, 13 B active)—featuring three architectural upgrades, three inference modes, mixed‑precision FP4/FP8 weights, and benchmark results that place its programming ability at the top of open‑source models while supporting a million‑token context window.

AI Architecture · Benchmark · DeepSeek
5 min read
IT Services Circle
Apr 25, 2026 · Artificial Intelligence

Understanding AI Core Concepts: Agent, Skills, Tools, and MCP

The article explains the four core AI components—Agent, Tools, Skills, and MCP—detailing their definitions, roles, the problems they address, and how they interoperate within the Cursor platform to transform a conversational model into a functional digital worker.

AI Architecture · Agent · MCP
13 min read
PaperAgent
Apr 24, 2026 · Artificial Intelligence

DeepSeek‑V4 Open‑Sources Its Million‑Token Architecture and Calls Out Claude Opus 4.6

DeepSeek‑V4’s open‑source report reveals a hybrid CSA/HCA attention design, manifold‑constrained residuals and the Muon optimizer that cut per‑token FLOPs to 27 % and KV‑Cache to 10 % at 1 M tokens, while benchmark results show it outperforms Claude Opus 4.6 on most tasks yet still lags on complex instruction following and multi‑turn dialogue.

AI Architecture · Benchmark · Claude Opus
11 min read
Architect's Must-Have
Apr 23, 2026 · Artificial Intelligence

OpenAI Images 2.0 Deep Dive: How AI Image Generation Enters the “Thinking Era”

The article provides a comprehensive technical analysis of OpenAI's ChatGPT Images 2.0 (gpt‑image‑2), detailing its strategic launch, new autoregressive architecture, integrated reasoning and web‑search capabilities, multi‑image consistency, pricing model, competitive landscape, limitations, and future impact on visual AI workflows.

AI Architecture · GPT Image 2 · Multimodal AI
28 min read
Machine Learning Algorithms & Natural Language Processing
Apr 21, 2026 · Artificial Intelligence

How a 22‑Year‑Old Reverse‑Engineered Mythos into OpenMythos Using MoE and DeepSeek‑Inspired Attention

OpenMythos re‑creates the Claude Mythos architecture as a Recurrent‑Depth Transformer with MoE routing, achieving comparable performance to larger Transformers while using roughly half the parameters, and demonstrates systematic generalization and depth extrapolation through looped inference in latent space.

AI Architecture · Looped Language Models · Mixture of Experts
6 min read
Old Zhang's AI Learning
Apr 21, 2026 · Artificial Intelligence

Is DeepSeek V4 Really Launching Next Week? Inside Its Core Architecture

Analyzing the credibility of Yifan Zhang’s brief “V4, next week” tweet, the article examines five supporting signals, details three newly revealed architecture components—Sparse MQA, Fused MoE Mega Kernel, and Manifold‑Constrained Hyper‑Connections—and summarizes V4’s rumored specifications, pricing, and strategic implications.

AI Architecture · DeepSeek · Fused MoE
7 min read
PaperAgent
Apr 21, 2026 · Artificial Intelligence

OpenMythos: Rebuilding Claude Mythos with Recursive Transformers and MoE

OpenMythos is an open‑source PyTorch reimplementation of Anthropic's Claude Mythos that uses a Mixture‑of‑Experts (MoE) routed recurrent Transformer, introduces Recursive Depth Transformers, Multi‑Latent Attention, and several stability mechanisms, and demonstrates parameter‑efficient scaling backed by empirical studies.

AI Architecture · Claude Mythos · MoE
6 min read
AI Architect Hub
Apr 20, 2026 · Artificial Intelligence

Why LLMs Need RAG: Overcoming Core Limitations and Building Scalable AI Solutions

This article analyzes the fundamental shortcomings of large language models for enterprise use, explains how Retrieval‑Augmented Generation (RAG) bridges those gaps through a detailed offline‑online workflow, and explores emerging trends that will shape the next generation of intelligent AI architectures.

AI Architecture · Enterprise AI · Future AI
10 min read
Code Mala Tang
Apr 19, 2026 · Artificial Intelligence

Why Real‑World Constraints Define the Success of Claude Code Agents

The analysis of the arXiv paper “Dive into Claude Code” reveals that beyond model loops, the decisive factors for coding agents are practical system design issues such as permission control, context compression, safety, user intervention, and reliable execution in real environments.

AI Architecture · Claude Code · Coding Agent
5 min read
Architect
Apr 18, 2026 · Artificial Intelligence

Why Multi‑Agent Systems Need More Than Role‑Playing: 5 Coordination Patterns Explained

Anthropic’s recent analysis reveals five multi‑agent coordination patterns—Generator‑Verifier, Orchestrator‑Subagent, Agent Teams, Message Bus, and Shared State—highlighting that the real challenges lie in context boundaries, information flow, verification standards, and termination conditions rather than merely assigning roles.

AI Architecture · Agent orchestration · Coordination Patterns
30 min read
Machine Heart
Apr 17, 2026 · Artificial Intelligence

Combining Transformers and RNNs: Google’s Memory Caching Unlocks Ultra‑Long Context

Google Research introduces Memory Caching (MC), a technique that gives RNNs growing memory capacity, bridging the gap with Transformers to enable ultra‑long context processing while reducing memory demands, and demonstrates its effectiveness through extensive language‑modeling and recall experiments.

AI Architecture · Google Research · Memory Caching
7 min read
Qborfy AI
Apr 15, 2026 · Artificial Intelligence

Why Three AI Agents Beat One: Planner‑Generator‑Evaluator Architecture Explained

The article analyzes why a single AI struggles to self‑evaluate, presents Anthropic’s three‑agent (Planner, Generator, Evaluator) architecture with concrete DAW‑building examples, sprint contracts, cost‑benefit tables, and step‑by‑step processes that show how each role solves specific problems and improves overall quality.

AI Architecture · Evaluator · Multi-Agent
24 min read
FunTester
Apr 14, 2026 · Artificial Intelligence

Why Long-Term Memory Is the Next Frontier for Large Language Models

The article examines how the evolution of large‑language‑model memory is shifting from expanding context windows to building controllable, auditable long‑term memory systems, comparing strategies of OpenAI, Anthropic, Google, Microsoft and Meta, and outlining future trends such as automatic memory policies, multimodal storage, agent‑shared memory, and memory‑reasoning integration.

AI Architecture · Long-term Memory · future AI trends
8 min read
AI Explorer
Apr 14, 2026 · Artificial Intelligence

OpenAI Launches Spud to Counter Anthropic’s Claude Mythos on Blackwell

OpenAI’s newly announced Spud model directly targets Anthropic’s Claude Mythos, leveraging Nvidia’s Blackwell architecture to shift the AI race from sheer scale toward hardware efficiency, signalling a strategic pivot where performance per compute unit becomes the next competitive benchmark.

AI Architecture · Anthropic · Blackwell
6 min read
AI Engineering
Apr 14, 2026 · Artificial Intelligence

Anthropic’s Multi‑Agent Coordination Guide: 5 Architectures and When to Use Them

When a single AI agent can’t finish a task, Anthropic’s new guide outlines five proven multi‑agent coordination patterns—generate‑validate, orchestrate‑sub‑agent, team, message‑bus, and shared‑state—detailing suitable scenarios, common pitfalls, and a recommendation to start simple and scale only as needed.

AI Architecture · Anthropic · Coordination Patterns
4 min read
AndroidPub
Apr 9, 2026 · Artificial Intelligence

Beyond Prompting: Mastering Harness Engineering to Build Reliable LLM Applications

This article examines the evolution from Prompt Engineering to Context Engineering and finally to Harness Engineering, presenting a six‑layer architecture and practical modules that turn large language models into robust, observable, and maintainable AI systems.

AI Architecture · Context Engineering · Harness Engineering
28 min read
AI Step-by-Step
Apr 6, 2026 · Artificial Intelligence

Why Single Agents Fail: Embracing Multi‑Agent Microservice Architecture

When a single AI agent’s logic hits bottlenecks, the article explains how breaking responsibilities into bounded microservice agents, using pipelines for deterministic steps and supervisors for dynamic routing, yields clearer contracts, shared state, easier debugging, and more stable, scalable task execution.

AI Architecture · Agent Frameworks · Microservices
12 min read
Architecture and Beyond
Apr 4, 2026 · Artificial Intelligence

How Claude Code Structures Its Memory: A Deep Dive into Multi‑Layered Agent Memory Design

This article dissects Claude Code's memory architecture, explaining its four distinct memory layers, file‑based long‑term storage, dynamic retrieval without embeddings, multi‑stage write paths, and session‑compression strategies, while highlighting design trade‑offs and practical takeaways for building robust AI agents.

AI Architecture · Agent Memory · Claude Code
20 min read
Advanced AI Application Practice
Apr 3, 2026 · Industry Insights

In-Depth Breakdown of the AI Business Architect Role and Interview Strategies

This article dissects the AI Business Architect position, detailing its true responsibilities, core competency formula, key role personas, supply‑demand matching scenarios, end‑to‑end technical architecture (including RAG and multi‑agent design), evaluation metrics, and provides concrete interview questions with model answers to help candidates prepare effectively.

AI Architecture · Agent Systems · Interview Prep
18 min read
Architect
Apr 1, 2026 · Artificial Intelligence

Inside Claude Code: How Anthropic Built a Secure, Scalable Local Agent Runtime

This article dissects Claude Code’s open‑source repository, revealing how its startup sequence, context assembly, main loop, tool contracts, permission pipeline, and long‑task handling are engineered layer by layer to create a performant, secure local AI agent runtime.

AI Architecture · Claude Code · Context management
24 min read
Ray's Galactic Tech
Mar 31, 2026 · Artificial Intelligence

From Single-Node RAG to Scalable Go AI Services: A Hands‑On Architecture Blueprint

This comprehensive guide walks Go engineers through the evolution from a prototype Retrieval‑Augmented Generation (RAG) service to a production‑grade, distributed AI platform, covering architecture, component boundaries, caching strategies, async indexing, observability, security, and step‑by‑step deployment.

AI Architecture · Backend Development · Distributed Systems
42 min read
Data STUDIO
Mar 30, 2026 · Artificial Intelligence

Why a Single AI Falls Short: Building a Multi‑Agent Expert Team for Superior Reports

The article demonstrates how a monolithic LLM struggles with multi‑dimensional market analysis and shows, through step‑by‑step code, how assembling specialized AI agents for news, technical and financial analysis yields clearer structure, deeper insight, and higher evaluation scores.

AI Architecture · LLM evaluation · LangChain
17 min read
AI Large-Model Wave and Transformation Guide
Mar 28, 2026 · Artificial Intelligence

From RNNs to Multimodal Agents: A Decade of Transformer Evolution

This article traces the evolution of sequence models from early RNN/LSTM designs through the breakthrough Transformer, its major branches, dense scaling, efficiency‑focused variants, next‑generation linear‑complexity SSMs, and finally multimodal agent architectures, highlighting each stage's strengths, weaknesses, and typical use cases.

AI Architecture · LLM · Transformer
12 min read
Wu Shixiong's Large Model Academy
Mar 28, 2026 · Artificial Intelligence

Mastering Multi‑Agent Systems: Design, Parallel Execution, and Interview Strategies

This article dissects the shortcomings of single‑agent LLM pipelines, introduces the Supervisor‑based Multi‑Agent architecture with LangGraph, demonstrates parallel task execution, robust error handling, and result merging, and provides concrete interview guidance backed by real performance data.

AI Architecture · Error Handling · LLM
19 min read
AI Explorer
Mar 27, 2026 · Artificial Intelligence

Why Tsinghua’s Multi‑Intelligence DeepSeek‑R1 Shifts AI from Depth to Width

Tsinghua University and WuWen XinQiong unveil DeepSeek‑R1, a multi‑model AI architecture that prioritizes width over depth, enabling parallel expert models to tackle complex, multi‑format data, addressing single‑model limitations while attracting significant industry investment and posing new engineering challenges.

AI Architecture · DeepSeek-R1 · Tsinghua
7 min read
AI Info Trend
Mar 24, 2026 · Artificial Intelligence

How OpenClaw 2.0 Turns AI from Chatbot to Actionable Agent – A Deep Dive

The OpenClaw 2.0 research report maps the evolution from simple chatbots to fully‑actionable AI agents, detailing its market surge, four‑layer memory architecture, zero‑code deployment options, cost‑saving token optimization, and a roadmap that predicts AI agents will reshape personal productivity and enterprise workflows.

AI Agent · AI Architecture · AI trends
6 min read
Architect
Mar 22, 2026 · Artificial Intelligence

Can Frozen LLMs Keep Learning? Inside Memento‑Skills' Deployment‑Time Learning

The article analyses the Memento‑Skills paper and its open‑source implementation, showing how a frozen large language model can continuously improve by treating skills as external memory, using a five‑step Observe‑Read‑Act‑Feedback‑Write loop, advanced routing, and modular architecture to achieve significant gains on GAIA and HLE benchmarks.

AI Architecture · Agent · Deployment-Time Learning
21 min read
SuanNi
Mar 21, 2026 · Artificial Intelligence

Can AI Achieve Human‑Like Autonomous Learning? A Blueprint from Top Researchers

The article analyzes a groundbreaking AI research blueprint proposed by Yann LeCun, Emmanuel Dupoux, and Jitendra Malik, outlining three interacting systems—observation, action, and meta‑control—to enable machines to learn autonomously like infants, while highlighting technical and ethical challenges.

AI Architecture · Meta Learning · autonomous learning
13 min read
Data Party THU
Mar 21, 2026 · Artificial Intelligence

Why Bigger Context Windows Hurt LLMs and How RAG Still Wins

The article explains that expanding LLM context windows leads to attention dilution and retrieval collapse, degrading answer quality, and argues that Retrieval‑Augmented Generation remains essential because it preserves signal density through focused retrieval and selective prompting.

AI Architecture · Attention Dilution · LLM
8 min read
Alibaba Cloud Developer
Mar 20, 2026 · Artificial Intelligence

Mastering Multi‑Agent Patterns with AgentScope and Spring AI Alibaba

This article analyzes the evolution of enterprise AI from single‑model chat to scalable multi‑agent workflows, explains seven core multi‑agent patterns—including Pipeline, Routing, Skills, Subagents, Supervisor, Handoffs, and Custom Workflow—provides detailed implementation guidance with Java code, and shows how Spring AI Alibaba now natively supports AgentScope orchestration for robust, observable AI applications.

AI Architecture · AgentScope · Java
23 min read
Coder Circle
Mar 19, 2026 · Artificial Intelligence

OpenAI’s GPT‑5.4 mini and nano usher in the AI Execution‑Layer era

OpenAI’s March 17 release of GPT‑5.4 mini and nano marks a shift from single‑large‑model AI to a layered architecture with a control plane for complex reasoning and a data plane for high‑frequency tasks, delivering near‑flagship performance at a fraction of the cost and paving the way for hybrid agent systems and micro‑service‑style AI infrastructure.

AI Architecture · Control Plane · Data Plane
8 min read
Tech Freedom Circle
Mar 19, 2026 · Artificial Intelligence

Failed Alibaba Interview: The 4 RAG Modules and 6 Design Principles You Need

The article dissects a failed Alibaba second‑round interview where the candidate answered only “vector‑search‑enhanced” for a RAG design, and then presents a systematic, four‑module RAG architecture together with six design principles, detailed indexing, query understanding, multi‑path recall, and context generation techniques to help candidates demonstrate comprehensive technical depth.

AI Architecture · Knowledge Graph · Multi‑Path Recall
22 min read
DeepHub IMBA
Mar 13, 2026 · Artificial Intelligence

Why Bigger Context Windows Make RAG Essential, Not Redundant

Although expanding LLM context windows seems to eliminate the need for Retrieval‑Augmented Generation, in practice larger windows dilute attention and cause retrieval failures, so RAG remains crucial for filtering high‑signal content and maintaining answer quality.

AI Architecture · Attention Dilution · LLM
7 min read
AI Waka
Mar 13, 2026 · Artificial Intelligence

How to Map Enterprise Workflows to Agentic AI Execution Graphs

This article explores the evolution of Agentic AI, outlines a full lifecycle for designing, deploying, and governing AI agents, presents a reference architecture, and demonstrates a practical case study of automating a customer service desk using agentified workflows.

AI Architecture · Agentic AI · Enterprise Automation
15 min read
AI Explorer
Mar 12, 2026 · Artificial Intelligence

Nvidia’s Open‑Source Nemotron 3 Super: Hybrid Mamba‑MoE Architecture Boosts Performance and Efficiency

Nvidia’s newly released open‑source 120‑billion‑parameter Nemotron 3 Super uses a hybrid Mamba‑MoE architecture that activates only a fraction of its parameters during inference, delivering up to 300 % faster inference while cutting costs, and its open‑source release aims to set new AI standards, influence ecosystem adoption, and spark a competition between architectural innovation and data quality.

AI Architecture · Mamba-MoE · Nemotron-3-Super
6 min read
SuanNi
Mar 7, 2026 · Artificial Intelligence

How HY‑WU Enables Real‑Time Dynamic Parameters for Large‑Scale AI Models

Tencent's HY‑WU architecture introduces functional memory that generates task‑specific parameters on the fly, overcoming catastrophic forgetting and static‑weight limitations, and demonstrates superior performance in image‑editing benchmarks compared to leading open‑source and closed‑source models.

AI Architecture · Tencent · dynamic parameters
12 min read
JD Tech
Feb 27, 2026 · Artificial Intelligence

Why Agent Skills and MCP Should Work Together, Not Compete

This article clarifies the distinct roles of Agent Skills and Model Context Protocol (MCP), compares their core features, shows how they complement each other through design philosophy and real‑world scenarios, and provides a decision framework for choosing the right tool in AI agent architectures.

AI Architecture · Agent Skills · Agentic AI
26 min read
SuanNi
Feb 26, 2026 · Artificial Intelligence

How Alibaba’s Qwen3.5 Series Redefines Efficient Large‑Model Design

Alibaba’s newly released Qwen3.5 series—spanning 27B, 35B, and 122B parameter models—demonstrates how hybrid compute, high‑quality data, and reinforcement‑learning can boost multimodal understanding, ultra‑long‑context handling, and multilingual support while drastically lowering hardware requirements, marking a shift from pure scaling to efficient AI evolution.

AI Architecture · Multimodal AI · long context
7 min read
PaperAgent
Feb 23, 2026 · Industry Insights

Why Enterprise AI Fails and How Unified Context Layers Can Unlock True Autonomy

Enterprise AI projects are failing at alarming rates because fragmented context and lack of governance prevent autonomous agents from making decisions, and the Unified Context Layer (UCL) architecture offers a comprehensive solution that operationalizes context graphs, integrates existing systems, and enables truly autonomous, production‑grade AI.

AI Architecture · Autonomous Agents · Context Engineering
15 min read
Old Zhang's AI Learning
Feb 19, 2026 · Artificial Intelligence

Inside GLM-5: Training Techniques, Architecture Innovations, and Benchmark Performance

The article dissects GLM-5’s 744B‑parameter MoE design, 28.5 T token training corpus, novel Muon Split and MLA‑256 optimizations, DSA sparse attention, a fully asynchronous RL pipeline, extensive domestic chip adaptation, and benchmark results that place it on par with Claude Opus 4.5 and ahead of Gemini 3 Pro.

AI Architecture · Benchmark · DSA
13 min read
Alibaba Cloud Developer
Feb 14, 2026 · Artificial Intelligence

Revamping AliGo’s AI Travel Assistant: Multi‑Agent Architecture & Prompt Engineering

The AliGo travel platform upgraded its AI assistant by replacing a single‑agent workflow with a modular multi‑agent system, introducing dynamic prompt generation, real‑time reasoning chains, context sharing, observability, and a knowledge base, which dramatically improved accuracy, stability, and user experience.

AI Architecture · AgentScope · Knowledge Base
19 min read
PMTalk Product Manager Community
Feb 13, 2026 · Artificial Intelligence

From Zero to One: Building a Deployable RAG System for Intelligent Customer Service

This article walks product managers through the end‑to‑end design of a Retrieval‑Augmented Generation (RAG) intelligent‑customer‑service system, covering business value, knowledge‑base preparation, hybrid retrieval, prompt‑driven generation, deployment choices, monitoring metrics, and common methodological pitfalls.

AI Architecture · Intelligent Customer Service · Knowledge Retrieval
11 min read
AI Software Product Manager
Feb 4, 2026 · Artificial Intelligence

Mastering Agent Skills: A Systematic Guide to Large Model Capabilities

This article traces the evolution of large‑model capabilities from early plugins to the standardized Agent Skills framework, explains the core concepts, technical composition, and progressive disclosure mechanism, and provides a step‑by‑step practical guide for building, configuring, and deploying Skills across ecosystems.

AI Architecture · AI Operations · Agent Skills
11 min read
大转转FE
Feb 2, 2026 · Artificial Intelligence

Inside Moltbot’s Core Architecture, AI Memory Systems, and ToolRL Advances

This edition of the ZuanZuan Frontend Weekly curates five in‑depth articles covering Moltbot’s underlying gateway architecture, the explosive growth of Moltbook AI agents, practical integration of Alibaba Cloud RDS AI assistants, the design of short‑ and long‑term AI Agent memory systems, and a two‑stage ToolRL approach that dramatically improves AI‑driven recommendation performance.

AI Architecture · AI Ops · Agent Memory
7 min read
PaperAgent
Jan 21, 2026 · Artificial Intelligence

Inside DeepSeek’s FlashMLA Update: What’s New in the MODEL1 Architecture

DeepSeek’s recent FlashMLA update introduces the new MODEL1, featuring a tighter KV-Cache layout, an extra two-stage cache, and a fixed 512×512 head dimension, with four code changes detailed in a public GitHub commit and illustrated by comparative diagrams.

AI Architecture · DeepSeek · FlashMLA
3 min read
Architect
Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large Model Training Efficiency

DeepSeek’s new paper introduces mHC, a manifold‑constrained version of Hyper‑Connections that stabilizes gradient flow, adds only 6.7% training overhead, and enables reliable training of 27‑billion‑parameter models while improving benchmark performance by about 2%.

AI Architecture · Deep Learning · Large-Scale Training
7 min read
PaperAgent
Jan 1, 2026 · Artificial Intelligence

How Manifold-Constrained Hyper-Connections Boost Large-Scale Model Training Efficiency

The article introduces mHC, a Manifold‑Constrained Hyper‑Connections technique that replaces standard residual links with multiple learned pathways and constrains them with doubly stochastic matrices to keep gradients stable, achieving stable training of 27‑billion‑parameter models with only 6.7% extra compute and superior performance across eight downstream benchmarks.

AI Architecture · Efficient Implementation · Manifold-Constrained
0 likes · 6 min read
PaperAgent
Dec 31, 2025 · Artificial Intelligence

World Models Meet Embodied AI: The Next Leap for Agentic Systems

The article surveys the rise of agentic AI in 2025, highlights 2026’s shift toward world models combined with embodied intelligence, explains the concept and benefits of world models, and compares three architectural paradigms—modular, sequential, and unified—offering guidance for selecting the best approach.

AI Architecture · Agentic AI · Embodied Intelligence
0 likes · 8 min read
Tencent Cloud Developer
Dec 24, 2025 · Backend Development

How IMA Scaled Its AI Knowledge Base from Monolith to Micro‑services

This article walks through the end‑to‑end design of IMA's AI‑driven knowledge base, covering its definition, core business flow, architecture evolution, data ingestion pipelines, management challenges, asynchronous processing, permission modeling, and the business value demonstrated by the prototype.

AI Architecture · Data Consistency · Knowledge Base
0 likes · 14 min read
Baobao Algorithm Notes
Dec 20, 2025 · Artificial Intelligence

How General‑Purpose Agents Are Converging on Claude Code and Deep Agent Designs

The article analyzes the 2025 shift toward a unified general‑purpose agent architecture exemplified by Claude Code and Deep Agent, detailing industry adoption, core technical features, skill‑based extensions, long‑running capabilities, and practical steps for building domain‑specific agents.

AI Architecture · Agent Skills · Claude Code
0 likes · 25 min read
ShiZhen AI
Dec 5, 2025 · Artificial Intelligence

Can AI Achieve Human‑Like Long‑Term Memory? Inside Google’s Titans Architecture

Google’s newly unveiled Titans architecture tackles AI’s “forgetfulness” by embedding a Neural Long‑Term Memory (LMM) module that updates model weights during inference using a test‑time training approach and a MIRAS surprise metric, enabling over 2 million‑token context with linear O(N) computation and superior benchmark results versus GPT‑4 RAG.

AI Architecture · Google Titans · Long-term Memory
0 likes · 5 min read
ITPUB
Nov 24, 2025 · Artificial Intelligence

Why Memory, Not Size, Is the Next Bottleneck for Large Language Models

In a detailed interview, the CTO of Memory Tensor (Shanghai) explains how limited memory capacity hampers large models, outlines the MemOS memory operating system, discusses information‑theoretic metrics, multimodal extensions, and reinforcement‑learning strategies for scalable, secure, and explainable AI memory management.

AI Architecture · Multimodal AI · information theory
0 likes · 23 min read
Data Party THU
Nov 21, 2025 · Artificial Intelligence

Unlocking 2025 Multi-Agent AI: Core Tech, Frameworks, and Emerging Trends

This article analyzes the technical foundations, development frameworks, real‑time inference optimizations, typical industry deployments, and future research directions of multi‑agent systems in 2025, highlighting protocols like FIPA‑ACL and MCP, tools such as LangGraph and ADP3.0, and edge‑computing breakthroughs.

AI Architecture · Model Quantization · distributed computing
0 likes · 16 min read
AI2ML AI to Machine Learning
Oct 13, 2025 · Artificial Intelligence

How Large‑and‑Small Language Model Collaboration Is Shaping the Future

The article argues that combining large, high‑capacity models with lightweight, fine‑tuned small models can cut costs, lower latency, enable specialized vertical tasks, and shift development from chasing ever‑bigger models toward optimal system architectures, outlining key techniques such as state‑space models, knowledge distillation, and staged fine‑tuning.

AI Architecture · Fine-tuning · efficiency
0 likes · 3 min read
Fun with Large Models
Sep 30, 2025 · Artificial Intelligence

DeepSeek-V3.2 Architecture Breakthrough: A 5‑Minute Guide to Its Core Features

The article introduces DeepSeek-V3.2, highlighting its new DeepSeek Sparse Attention (DSA) that boosts training and inference efficiency by up to 50%, cuts model usage costs dramatically, explains the updated API endpoints, and details the four‑stage post‑training pipeline that underpins the model’s performance improvements.

AI Architecture · DSA · DeepSeek-V3.2
0 likes · 8 min read
Data Party THU
Sep 28, 2025 · Artificial Intelligence

Can the OaK Architecture Unlock General AI? A Deep Dive into Continuous Learning and Planning

The article presents Richard Sutton’s OaK architecture—a domain‑general, empirical, open‑ended framework that equips agents with continuously learnable components, meta‑learned step‑sizes, and a five‑stage FC‑STOMP pipeline to build world models, generate sub‑problems, learn options, and plan at run‑time.

AI Architecture · World Models · continual learning
0 likes · 22 min read
Tech Freedom Circle
Sep 25, 2025 · Artificial Intelligence

Inside RAGFlow: How Its Microservice Architecture Powers an Enterprise‑Grade Retrieval‑Augmented Generation Platform

This article provides a detailed technical walkthrough of RAGFlow's architecture, covering its microservice design, directory layout, layered structure, cloud‑native deployment, core modules such as DeepDoc, RAG engine, Agent system, and web UI, as well as multi‑tenant isolation, streaming responses, asynchronous task handling, concurrency controls, scalability strategies, and a complete request‑lifecycle example for document upload.

AI Architecture · DeepDoc · Docker Compose
0 likes · 26 min read
IT Architects Alliance
Sep 17, 2025 · Artificial Intelligence

How Distributed Scheduling Redefines AI Large-Model Training Architecture

The article examines how the explosive compute, storage, network, and fault‑tolerance demands of AI large‑model training force a fundamental redesign of system architecture, covering layered storage, optimized All‑Reduce communication, elastic resource orchestration, observability, and cost‑saving strategies.

AI Architecture · Compute Scheduling · Cost Optimization
0 likes · 9 min read
IT Architects Alliance
Sep 10, 2025 · Cloud Native

How AI, Cloud‑Native, and Platform Engineering Redefine System Architecture in 2024

Amid rapid AI breakthroughs, mature cloud‑native infrastructure, and rising edge computing, architects must adopt platform engineering, event‑driven and composable architectures, and AI‑native designs, while evolving technical and soft skills to meet escalating business complexity and guide technology selection over the next five years.

AI Architecture · Edge Computing · Event-driven
0 likes · 12 min read
Baobao Algorithm Notes
Sep 10, 2025 · Artificial Intelligence

Qwen3-Next Unveiled: Sparse MoE, Hybrid Attention & Multi‑Token Prediction

A recent Hugging Face pull request reveals Alibaba’s upcoming Qwen3‑Next series, highlighting its extreme‑context, parameter‑efficient design that combines a 1:50 high‑sparsity MoE, a hybrid attention architecture mixing gated attention with Gated DeltaNet, and a Multi‑Token Prediction technique, promising ten‑fold throughput gains for 32K‑plus token contexts.

AI Architecture · Multi-token Prediction · Qwen3-Next
0 likes · 8 min read
Architects Research Society
Sep 9, 2025 · Artificial Intelligence

Unlocking AI Autonomy: How Agentic Workflows Transform Complex Processes

Agentic Workflows introduce a dynamic, multi‑step AI orchestration framework that externalizes decision points, embeds observability, and supports branching, looping, and human intervention, enabling autonomous agents to automate intricate workflows across domains such as threat detection, fraud handling, and research assistance.

AI · AI Architecture · Automation
0 likes · 3 min read
Architects Research Society
Sep 6, 2025 · Artificial Intelligence

From Hype to Engineered AI: The Core Architecture Behind Modern AI Apps

This article breaks down the essential components of production‑grade AI applications, covering the intelligent core (model, orchestration, memory), enterprise‑level supporting infrastructure, and critical governance, security, and data‑integrity measures required for reliable AI systems.

AI Architecture · AI Ops · LLM Orchestration
0 likes · 4 min read
Instant Consumer Technology Team
Sep 3, 2025 · Artificial Intelligence

Why Context Modeling Could Replace RAG – Insights from DeepVista CEO Jing Conan Wang

In a two‑hour interview, DeepVista CEO Jing Conan Wang explains how his new "context modeling" paradigm addresses the rigidity, lack of personalization, and performance limits of current RAG‑based AI agents, proposing a dual‑model architecture that learns and adapts context dynamically for faster, more accurate results.

AI Architecture · LLM optimization · Personalized AI
0 likes · 15 min read
AI Algorithm Path
Aug 8, 2025 · Artificial Intelligence

GPT‑5 Is Here: In‑Depth Technical Walkthrough of Architecture, Features, and Benchmarks

OpenAI’s GPT‑5, released on August 7 2025, introduces a unified system with real‑time routing, up to 400 k token context windows, multiple model families, refined safety mechanisms, new API controls, and benchmark results that show it surpasses GPT‑4 across intelligence, coding, instruction following, function calling and multimodal tasks.

AI Architecture · API · Benchmark
0 likes · 9 min read
Alibaba Cloud Developer
Jul 29, 2025 · Artificial Intelligence

How to Transform Chaotic AI Prompts into Robust System Designs

This article examines the pitfalls of rule‑heavy prompt engineering, introduces a systematic four‑layer architecture for AI prompts, outlines six practical compilation principles, and demonstrates how to rewrite a tangled prompt into a clear, maintainable, and scalable system blueprint.

AI Architecture · LLM · Prompt engineering
0 likes · 84 min read
AI Frontier Lectures
Jul 24, 2025 · Artificial Intelligence

State Space Models vs Transformers: Uncovering the Real Trade‑offs in Sequence Modeling

This article analyzes the fundamental differences between state space models (SSM) and Transformer architectures, highlighting their three core components, training efficiency, memory handling, tokenization impact, and empirical performance trade‑offs, and argues why SSMs can outperform Transformers on many sequence tasks.

AI Architecture · Sequence Modeling · Transformers
0 likes · 19 min read
Architect
Jul 11, 2025 · Artificial Intelligence

How OpenAI’s Zero‑Vector Agentic RAG Redefines AI Knowledge Retrieval

OpenAI’s new non‑vectorized Agentic RAG approach replaces traditional vector search with a hierarchical, multi‑round content selection process, leveraging large‑context models like GPT‑4.1‑mini for efficient document loading, dynamic navigation, and accurate answer generation, while outlining model selection strategies, cost trade‑offs, and production considerations.

AI Architecture · Model Selection · RAG
0 likes · 15 min read
Data Thinking Notes
Jun 24, 2025 · Artificial Intelligence

Anthropic’s Multi‑Agent Research System: Architecture, Lessons & 90% Performance Boost

Anthropic’s detailed post explains how its new Research feature uses a multi‑agent architecture with a lead coordinator and parallel sub‑agents, covering design principles, prompt engineering tricks, evaluation methods, production reliability challenges, and the substantial performance gains achieved over single‑agent baselines.

AI Architecture · LLM research · Prompt engineering
0 likes · 21 min read
Alibaba Cloud Developer
Jun 10, 2025 · Artificial Intelligence

How AI Application Architectures Evolve: From Simple LLM Calls to Guardrails, Routing, and Agents

This article traces the evolution of AI application architectures—from the earliest minimal user‑LLM interaction to advanced designs featuring context enhancement, input/output guardrails, intent routing, model gateways, caching strategies, agent capabilities, monitoring, and inference performance optimizations—providing practical insights and references for developers.

AI Architecture · Agent · Inference Optimization
0 likes · 21 min read
ITFLY8 Architecture Home
Jun 9, 2025 · Artificial Intelligence

What Are Foundation Agents? A Deep Dive into Next‑Gen AI Architectures

This article reviews the 2025 "Advances and Challenges in Foundation Agents" paper, defining the Foundation Agent concept, detailing its seven core components, exploring self‑evolution, multi‑agent collaboration, and the safety and alignment challenges required to build trustworthy, autonomous AI systems.

AI Architecture · Alignment · Foundation Agents
0 likes · 16 min read
ITFLY8 Architecture Home
Jun 5, 2025 · Artificial Intelligence

Why Large Models Are Redefining Software: The Four AI Tech Drivers

The article explains how rapid AI advances and the AI Agent architecture are reshaping software development, outlines four key technical drivers (embedding, Transformer scaling laws, scenario Moore's law, and LLM OS), and discusses the security, professionalism, and responsibility challenges enterprises face when deploying AI‑native applications.

AI Architecture · Embedding · Enterprise AI
0 likes · 6 min read
Java Web Project
Jun 4, 2025 · Artificial Intelligence

Why DeepSeek V3 Stands Out: Architecture, Performance, and Open‑Source Edge

The article analyzes DeepSeek's rapid adoption, detailing its seven core models, the third‑generation MoE architecture, FP8 mixed‑precision training, 128K context window, benchmark superiority on MMLU/HumanEval/CMMLU, low training cost, and fully open‑source release, while also introducing a companion guide for developers.

AI Architecture · DeepSeek · FP8 training
0 likes · 9 min read
DataFunSummit
Jun 2, 2025 · Artificial Intelligence

Enterprise Knowledge Brain Powered by Large Models and Knowledge Graphs

This article explains how the rapid development of large language models and knowledge graph technologies creates new opportunities for enterprise knowledge management, outlines the challenges of massive unstructured data, describes the architecture and core data flow of a corporate knowledge brain, and showcases key technologies and real‑world applications.

AI Architecture · Data Integration · Enterprise AI
0 likes · 13 min read
Tencent Technical Engineering
Apr 14, 2025 · Artificial Intelligence

MCP Protocol: Technical Principles and Business Applications

The article examines the Model Context Protocol (MCP), detailing its microkernel‑based technical architecture, development timeline from Anthropic’s 2024 release to industry adoption, hands‑on implementation examples, and business use cases such as multi‑agent QQ robots, highlighting MCP’s potential to standardize AI tool integration across industries.

AI Architecture · AI applications · Business Implementation
0 likes · 14 min read
Ma Wei Says
Mar 15, 2025 · Artificial Intelligence

Understanding Model Context Protocol (MCP) vs. Function Calling

The Model Context Protocol (MCP), announced by Anthropic, standardizes how AI applications provide context to LLMs, offering a client‑server architecture that simplifies data and tool integration, and is compared with function calling, highlighting its benefits, workflow, controversies, and future prospects.

AI Architecture · Anthropic · Function Calling
0 likes · 9 min read
AI Frontier Lectures
Mar 12, 2025 · Artificial Intelligence

Can Diffusion LLMs Replace Transformers? Inside Mercury Coder’s Speed Surge

The article analyzes the growing dissatisfaction with large language models, highlights generation speed as a critical bottleneck, compares the autoregressive approach with emerging diffusion LLMs, and examines Mercury Coder’s impressive token‑per‑second performance and its implications for the future of AI architecture.

AI Architecture · Mercury Coder · Model Speed
0 likes · 10 min read
Architect
Feb 24, 2025 · Artificial Intelligence

Inside MoBA: A Sparse Attention Framework for 10‑Million‑Token Contexts

The article details the development, architectural evolution, and practical challenges of MoBA—a sparse attention framework inspired by Mixture‑of‑Experts that scales LLM context length to 10 M tokens, supports seamless switching between full and sparse attention, and is now released as a minimal open‑source solution.

AI Architecture · Context Parallel · LLM training
0 likes · 13 min read
Architects' Tech Alliance
Feb 24, 2025 · Artificial Intelligence

NSA: Hardware‑Optimized Sparse Attention Mechanism from DeepSeek, Peking University and University of Washington

The NSA mechanism introduces a three‑branch hardware‑optimized sparse attention architecture—token compression, token selection, and sliding window—combined with learnable gating to balance global and local context, dramatically improving inference speed and efficiency for long‑context large language models.

AI Architecture · DeepSeek · Hardware acceleration
0 likes · 5 min read
Architect
Feb 16, 2025 · Artificial Intelligence

DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights

This article provides an in‑depth technical overview of DeepSeek‑V3, DeepSeek‑R1 and Janus‑Pro models, covering their Mixture‑of‑Experts architecture, novel MLA attention, auxiliary‑loss‑free load balancing, multi‑token prediction, FP8 mixed‑precision training, efficient cross‑node communication, reinforcement‑learning pipelines, multimodal modeling strategies, performance comparisons, cost statistics, and current limitations.

AI Architecture · DeepSeek-V3 · FP8 training
0 likes · 18 min read
Lao Guo's Learning Space
Feb 15, 2025 · Artificial Intelligence

What Is deepseek-MoE? Understanding the Mixture‑of‑Experts Architecture

The article explains deepseek-MoE (Mixture of Experts), describing its full English name, Chinese translation, how a gating network selects and weights multiple expert models for each input, and uses an analogy to illustrate load‑balancing and the divide‑and‑conquer design in large AI models.

AI Architecture · Mixture of Experts · deepseek-MoE
0 likes · 2 min read
IT Architects Alliance
Feb 8, 2025 · Artificial Intelligence

Inside DeepSeek: How Its Innovative Architecture Redefines AI Performance

This article examines DeepSeek's advanced Transformer‑based architecture, dynamic routing, MoE system, multi‑stage training, efficient inference, multimodal capabilities, real‑world applications, technical challenges, and future prospects, providing a comprehensive technical analysis of the model's strengths and limitations.

AI Architecture · DeepSeek · Model Optimization
0 likes · 15 min read
Infra Learning Club
Feb 7, 2025 · Artificial Intelligence

Understanding LLM Agents: Architecture, Capabilities, and Key Challenges

This article explains what LLM agents are, their core components—brain, memory, planning, and tool use—illustrates how they handle complex queries through task decomposition, surveys notable frameworks, and discusses key challenges such as limited context, long‑term planning difficulties, output inconsistency, and prompt dependence.

AI Architecture · LLM agents · Memory
0 likes · 15 min read
Alibaba Cloud Developer
Feb 7, 2025 · Artificial Intelligence

Why DeepSeek V3 Achieves Low Training Costs: Inside Its AI Innovations

This article provides a comprehensive analysis of DeepSeek's large‑language‑model technology, covering the company's background, model capabilities, remarkably low training and inference costs, and the core architectural and algorithmic innovations such as MoE, MLA attention, FP8 mixed‑precision, and the DualPipe pipeline that enable efficient large‑scale AI deployment.

AI Architecture · DeepSeek · FP8 training
0 likes · 19 min read
Airbnb Technology Team
Dec 12, 2024 · Artificial Intelligence

Airbnb Automation Platform v2: Enabling LLM‑Driven Conversational AI

Airbnb’s Automation Platform v2 replaces the rigid, workflow‑driven architecture of v1 with an LLM‑centric design that orchestrates context gathering, chain‑of‑thought reasoning, tool execution, and guardrails, enabling more natural, scalable, and safe conversational AI while preserving the reliability of traditional workflows.

AI Architecture · Airbnb · Conversational AI
0 likes · 11 min read