Deploying LLMs with LangServe: A Complete Guide from Setup to Client Calls

This article introduces LangServe, explains its key features for LLM deployment, walks through environment setup, shows how to build a FastAPI‑based REST API with code examples, demonstrates testing via Postman and remote client calls, and summarizes its benefits for AI model serving.

JavaEdge

Introduction

LangServe is a framework that simplifies the deployment and operation of large language models (LLMs) by turning them into production‑ready RESTful services.

Overview

LangServe builds on common Python web technologies such as FastAPI and Pydantic (with asyncio and uvloop for asynchronous execution) to generate a complete REST API. It bundles components for model management, request handling, inference, result caching, monitoring, logging, and an API gateway, reducing operational overhead from prototype to production.

Repository: https://github.com/langchain-ai/langserve

Core Features

Multi‑Model Support

Deploy and switch among various AI models (text generation, image recognition, speech processing, etc.).

Efficient Inference Cache

Built‑in result caching stores hot data to accelerate responses and save compute resources.

Secure Access Control

Role‑ and policy‑based access management protects service security and data privacy.

Real‑time Monitoring & Logging

Integrated monitoring tracks service health; detailed logs aid debugging and analysis.

Simple API Design

The API is concise and intuitive, lowering the learning curve for developers.

Building a REST API with LangServe

Environment Preparation

Install the required packages:

pip install "langserve[all]"

Then set the provider API key as an environment variable (example for OpenAI):

export OPENAI_API_KEY=<your_valid_openai_api_key>

Server Code

The following example creates a FastAPI server, adds a direct OpenAI endpoint, and a translation endpoint built from a prompt:

from fastapi import FastAPI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from langserve import add_routes

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple API server using LangChain's Runnable interfaces",
)

# Endpoint 1: direct OpenAI model
add_routes(app, ChatOpenAI(), path="/openai")

# Prompt for translation
system_message_prompt = SystemMessagePromptTemplate.from_template(
    """You are a helpful assistant that translates {input_language} to {output_language}."""
)
human_message_prompt = HumanMessagePromptTemplate.from_template("{text}")
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Endpoint 2: translation using the prompt
add_routes(app, chat_prompt | ChatOpenAI(), path="/translate")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="localhost", port=9999)
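In add_routes(app, chat_prompt | ChatOpenAI(), ...), the | operator chains Runnables: the prompt's output becomes the model's input. The idea can be sketched in plain Python (toy classes standing in for LangChain's actual implementation):

```python
class Runnable:
    """Toy stand-in for LangChain's Runnable (illustration only)."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` returns a new Runnable that pipes a's output into b.
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Mirrors the shape of `chat_prompt | ChatOpenAI()` above:
prompt = Runnable(lambda d: f"Translate to {d['output_language']}: {d['text']}")
model = Runnable(str.upper)  # stand-in for an LLM call

chain = prompt | model
print(chain.invoke({"output_language": "French", "text": "hello"}))
# -> TRANSLATE TO FRENCH: HELLO
```

Because the composed chain is itself a Runnable, add_routes can expose it as an endpoint exactly like a bare model.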

Running the Server

Assuming the server code above is saved as app.py, start it with:

python app.py

Testing with Postman

After the server starts, the interactive OpenAPI documentation is normally available at http://localhost:9999/docs. If the docs page fails to render (for example, due to Pydantic version incompatibilities), you can still exercise the /openai and /translate endpoints directly with Postman.
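For each route, LangServe exposes invoke, batch, and stream endpoints (e.g., POST /translate/invoke). The request body wraps the chain input under an "input" key; a minimal sketch of the /translate payload using only the standard library (the inner keys match the prompt variables defined in the server code):

```python
import json

# LangServe wraps the chain input under an "input" key; the inner keys
# must match the prompt variables ({input_language}, {output_language}, {text}).
payload = {
    "input": {
        "input_language": "English",
        "output_language": "French",
        "text": "Good morning!",
    }
}

# JSON body for: POST http://localhost:9999/translate/invoke
body = json.dumps(payload)
print(body)
```

Pasting this body into Postman (with a Content-Type: application/json header) is enough to test the endpoint without writing a client.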

Client‑Side Invocation

A Python client can call the remote LangServe endpoints using RemoteRunnable:

from langchain.prompts.chat import ChatPromptTemplate
from langserve import RemoteRunnable

# Configure remote endpoint
openai_llm = RemoteRunnable("http://localhost:9999/openai/")

# Build a chat prompt (this example asks a general question rather than a translation)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a senior LLM expert"),
    ("human", "Please list five popular open‑source LLMs from China and five from abroad.")
]).format_messages()

# Invoke the model
response = openai_llm.invoke(prompt)
print(response)

Sample response (truncated):

AIMessage(content='When discussing open‑source LLMs, common examples include:

**Foreign:**
1. BERT (Google)
2. GPT‑3 (OpenAI)
3. RoBERTa (Meta)
4. T5 (Google)
5. XLNet (Google Brain)

**Domestic:**
1. ERNIE (Baidu)
2. GPT‑2 (Harbin Institute of Technology & iFlytek)
3. HFL/THU‑Bert (Tsinghua University)
4. RoFormer (Huawei)
5. PaddleNLP (Baidu)')

Conclusion

LangServe provides a purpose‑built platform for deploying and operating AI models. Its architecture and feature set reduce development effort, improve service stability, and support scalability for both startups and large enterprises as LLM technology continues to evolve.

Written by

JavaEdge

Hands-on development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
