
Design and High‑Availability Architecture of Alibaba LangEngine AI Application Framework

This article introduces Alibaba's LangEngine, a pure Java AI application framework, detailing its high‑availability gateway architecture, communication protocols, streaming and non‑streaming output, multi‑level metadata caching, asynchronous and serverless designs, and future open‑source roadmap, offering practical guidance for building robust AI services.

Cognitive Technology Team

LangEngine is Alibaba's internal pure-Java AI application development framework. It has been widely adopted across business scenarios such as Taobao, Tmall, and Alibaba Cloud, and has now been open-sourced at https://github.com/AIDC-AI/ali-langengine.

The article describes the design principles and experience of building a high‑availability AI application gateway using LangEngine, covering communication protocols (HTTP and HSF, both streaming and non‑streaming), runtime protocols, AI component integration, and gateway routing.

Key capabilities include multi-level metadata caching to reduce database load, comprehensive logging and monitoring via SLS, Blink, and Hologres, and support for both streaming and non-streaming output, with streaming reducing memory footprint, first-token latency, and server load.
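The streaming-versus-non-streaming trade-off can be sketched as follows. This is an illustrative example, not LangEngine's API: a non-streaming call buffers the full completion in memory before returning, while a streaming call hands each token to a consumer as it arrives, so nothing accumulates server-side and the client sees output immediately.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch comparing the two output modes.
class StreamingSketch {

    // Non-streaming: the whole answer is assembled and held in memory
    // before the caller sees anything.
    public static String completeBlocking(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) sb.append(t);
        return sb.toString();
    }

    // Streaming: each token is pushed to the consumer as soon as it is
    // produced; the server keeps no growing buffer, and the client's
    // first-token latency is one token, not the whole generation.
    public static void completeStreaming(List<String> tokens, Consumer<String> onToken) {
        for (String t : tokens) onToken.accept(t);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Lang", "Engine", " streams");
        System.out.println(completeBlocking(tokens));

        StringBuilder clientView = new StringBuilder();
        completeStreaming(tokens, clientView::append);
        System.out.println(clientView);
    }
}
```

In a real gateway the consumer would typically write each token to an HTTP chunked response or SSE event rather than a StringBuilder.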

LangEngine implements a multi‑tier cache architecture (local → distributed → database) to enhance performance under high concurrency, and provides asynchronous design patterns where synchronous HTTP requests can be transformed into asynchronous tasks using CompletableFuture, improving throughput and preventing thread blocking.
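The local → distributed → database read path, and the CompletableFuture wrapping of a synchronous lookup, might look like the following minimal sketch. The class and tier names are assumptions for illustration (the distributed tier stands in for something like Redis); this is not LangEngine's actual cache implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical three-tier metadata cache: local -> distributed -> database.
class TieredMetadataCache {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    // Stand-in for a shared distributed cache such as Redis.
    private final Map<String, String> distributedCache = new ConcurrentHashMap<>();
    private final Function<String, String> database; // last-resort loader

    public TieredMetadataCache(Function<String, String> database) {
        this.database = database;
    }

    public String get(String key) {
        String v = localCache.get(key);            // tier 1: in-process, fastest
        if (v != null) return v;
        v = distributedCache.get(key);             // tier 2: shared across nodes
        if (v == null) {
            v = database.apply(key);               // tier 3: the database
            distributedCache.put(key, v);          // backfill tier 2
        }
        localCache.put(key, v);                    // backfill tier 1
        return v;
    }

    // Async variant: the synchronous lookup becomes a CompletableFuture task,
    // so the request-handling thread is not blocked on the slower tiers.
    public CompletableFuture<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> get(key));
    }
}
```

Each miss backfills the faster tiers, so under high concurrency repeated reads of hot metadata never reach the database; a production version would also need TTLs and invalidation, which are omitted here.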

The framework also adopts serverless techniques to isolate application containers, allowing developers to focus on business logic while the platform handles underlying dependencies and upgrades.

Core modules (LangEngine‑Core) consist of Retrieval, Model I/O, Memory, Chains, Agents, and Callbacks, with open‑source repositories for core and community extensions. The LangRunnable architecture enables dynamic chain composition and unified invocation methods (invoke, batch, stream, invokeAsync).
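A LangRunnable-style unified interface might look like the sketch below. The four method names (invoke, batch, stream, invokeAsync) come from the article; the interface name, generics, default implementations, and the pipe composition helper are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical LangRunnable-style interface: one abstract method plus
// defaults that derive the other invocation styles from it.
interface SimpleRunnable<I, O> {
    O invoke(I input);

    default List<O> batch(List<I> inputs) {
        return inputs.stream().map(this::invoke).collect(Collectors.toList());
    }

    // Non-streaming fallback: emit the full result as a single chunk.
    default void stream(I input, Consumer<O> onChunk) {
        onChunk.accept(invoke(input));
    }

    default CompletableFuture<O> invokeAsync(I input) {
        return CompletableFuture.supplyAsync(() -> invoke(input));
    }

    // Dynamic chain composition: feed this runnable's output into the next.
    default <O2> SimpleRunnable<I, O2> pipe(SimpleRunnable<O, O2> next) {
        return in -> next.invoke(invoke(in));
    }
}

class RunnableDemo {
    public static void main(String[] args) {
        SimpleRunnable<String, String> prompt = s -> "Q: " + s;
        SimpleRunnable<String, String> model = p -> p + " -> A";
        SimpleRunnable<String, String> chain = prompt.pipe(model);
        System.out.println(chain.invoke("hi"));
    }
}
```

Because only invoke is abstract, a chain step can be written as a lambda, and composed chains automatically inherit batch, stream, and async variants.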

Future directions include open‑sourcing the AgentFramework, extending streaming and agent asynchronous support, developing a multi‑agent execution engine, and providing a visual AI application building platform with one‑click REST API gateway deployment.

Tags: LLM, High Availability, Streaming, Caching, AI Framework, Asynchronous Design, LangEngine
Written by

Cognitive Technology Team

Cognitive Technology Team regularly shares the latest IT news, original content, programming tutorials, and hands-on experience.
