Deploying LLMs with LangServe: A Complete Guide from Setup to Client Calls
This article introduces LangServe, explains its key features for LLM deployment, walks through environment setup, shows how to build a FastAPI‑based REST API with code examples, demonstrates testing via Postman and remote client calls, and summarizes its benefits for AI model serving.
Introduction
LangServe is a framework that simplifies the deployment and operation of large language models (LLMs) by turning them into production‑ready RESTful services.
Overview
LangServe builds on FastAPI and Pydantic, together with Python's asyncio stack (e.g. uvloop), to generate a complete REST API. It bundles components for model management, request handling, inference, result caching, monitoring, logging, and an API gateway, reducing operational overhead from prototype to production.
Repository: https://github.com/langchain-ai/langserve
Core Features
Multi‑Model Support
Deploy and switch among various AI models (text generation, image recognition, speech processing, etc.).
Efficient Inference Cache
Built‑in result caching stores hot data to accelerate responses and save compute resources.
Secure Access Control
Role‑ and policy‑based access management protects service security and data privacy.
Real‑time Monitoring & Logging
Integrated monitoring tracks service health; detailed logs aid debugging and analysis.
Simple API Design
The API is concise and intuitive, lowering the learning curve for developers.
Building a REST API with LangServe
Environment Preparation
Install the required packages:

pip install "langserve[all]"

Set the provider API key as an environment variable (example for OpenAI):

export OPENAI_API_KEY=<your_valid_openai_api_key>

Server Code
The following example creates a FastAPI server, adds a direct OpenAI endpoint, and a translation endpoint built from a prompt:
from fastapi import FastAPI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from langserve import add_routes
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple API server using LangChain's Runnable interfaces",
)
# Endpoint 1: direct OpenAI model
add_routes(app, ChatOpenAI(), path="/openai")
# Prompt for translation
system_message_prompt = SystemMessagePromptTemplate.from_template(
    """You are a helpful assistant that translates {input_language} to {output_language}."""
)
human_message_prompt = HumanMessagePromptTemplate.from_template("{text}")
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
# Endpoint 2: translation using the prompt
add_routes(app, chat_prompt | ChatOpenAI(), path="/translate")
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=9999)

Running the Server

python app.py

Testing with Postman
After the server starts, the interactive OpenAPI documentation is normally available at http://localhost:9999/docs. For each registered path, LangServe exposes invoke, batch, and stream routes (e.g. POST /translate/invoke). If the docs page is unavailable due to compatibility issues, you can still exercise the /openai and /translate endpoints directly with Postman.
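The same request Postman would send can also be scripted. The sketch below is a minimal example, assuming the server from the previous section is running at localhost:9999; it posts to the invoke route of the /translate chain, passing the prompt's three template variables under the "input" key. The concrete field values ("English", "French", "Hello, world!") are illustrative assumptions.

```python
import json
import urllib.error
import urllib.request

# Input for the /translate chain: the three variables used by its
# prompt templates ({input_language}, {output_language}, {text}).
payload = {
    "input": {
        "input_language": "English",
        "output_language": "French",
        "text": "Hello, world!",
    }
}

# LangServe registers POST <path>/invoke for each add_routes() call.
req = urllib.request.Request(
    "http://localhost:9999/translate/invoke",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    print(body["output"])  # the chain's reply
except urllib.error.URLError:
    print("Server not reachable; start it first with: python app.py")
```

The response JSON carries the chain result under "output"; a production client would add error handling for non-2xx status codes and malformed bodies.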
Client‑Side Invocation
A Python client can call the remote LangServe endpoints using RemoteRunnable:
from langchain.prompts.chat import ChatPromptTemplate
from langserve import RemoteRunnable
# Configure remote endpoint
openai_llm = RemoteRunnable("http://localhost:9999/openai/")
# Build a prompt for translation (example prompt)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a senior LLM expert"),
    ("human", "Please list five popular open‑source LLMs from China and five from abroad.")
]).format_messages()
# Invoke the model
response = openai_llm.invoke(prompt)
print(response)

Sample response (truncated):
AIMessage(content='When discussing open‑source LLMs, common examples include:
**Foreign:**
1. BERT (Google)
2. GPT‑3 (OpenAI)
3. RoBERTa (Meta)
4. T5 (Google)
5. XLNet (Google Brain)
**Domestic:**
1. ERNIE (Baidu)
2. GPT‑2 (Harbin Institute of Technology & iFlytek)
3. HFL/THU‑Bert (Tsinghua University)
4. RoFormer (Huawei)
5. PaddleNLP (Baidu)')

Conclusion
LangServe provides a purpose‑built platform for deploying and operating AI models. Its architecture and feature set reduce development effort, improve service stability, and support scalability for both startups and large enterprises as LLM technology continues to evolve.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
