Deploy Langchain‑ChatGLM on Volcengine VKE: A Step‑by‑Step Cloud‑Native Guide

This tutorial walks you through preparing a VKE cluster, pulling the Langchain‑ChatGLM container image, creating the necessary Deployment and Service resources, and adding a local knowledge base, enabling you to run a Langchain‑based ChatGLM service with GPU support on Volcengine’s cloud‑native platform.


Langchain is a framework for building applications with large language models, providing components to connect LLMs with external data sources. This article explains how to deploy Langchain‑ChatGLM on Volcengine’s VKE platform.

What Is Langchain‑ChatGLM?

Langchain‑ChatGLM combines the Langchain framework with the ChatGLM‑6B model, enabling conversational AI with knowledge base integration.

Step 1: Prepare VKE Cluster

Log in to the Volcengine console and create a VKE cluster (version 1.24) with VPC‑CNI networking and public access enabled. Add a GPU node pool using ecs.gni2.3xlarge instances (NVIDIA A10) and install the nvidia‑device‑plugin component so the cluster can schedule GPU workloads.
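Once the cluster is up, you can sanity-check the GPU node pool from your workstation. A minimal sketch, assuming kubectl is already configured against the new cluster and the nvidia-device-plugin component has finished installing:

```shell
# List the nodes and confirm the GPU node pool has joined the cluster.
kubectl get nodes -o wide

# Confirm the A10 GPU is advertised as an allocatable resource.
# (Replace <gpu-node-name> with a node name printed by the command above.)
kubectl describe node <gpu-node-name> | grep nvidia.com/gpu
```

If nvidia.com/gpu shows up under Allocatable with a count of 1, the device plugin is working and GPU pods can be scheduled.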

[Figure: VKE cluster creation UI]

Step 2: Download Code, Model and Build Image

The code and models are open‑source and can be obtained from GitHub and Hugging Face:

Code: https://github.com/imClumsyPanda/langchain-ChatGLM

ChatGLM‑6B model: https://huggingface.co/THUDM/chatglm-6b

Embedding model: https://huggingface.co/GanymedeNil/text2vec-large-chinese

A pre‑built container image (cr-demo-cn-beijing.cr.volces.com/vke-ai/langchain-chatglm:v0.0.1) containing the models (~24 GB) is provided for quick deployment.
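If you want to inspect the image locally rather than letting VKE pull it directly, a sketch, assuming Docker is installed and you have network access to the registry (note that the pull is large because the model weights are baked into the image):

```shell
# Pull the pre-built image containing the code and model weights (~24 GB).
docker pull cr-demo-cn-beijing.cr.volces.com/vke-ai/langchain-chatglm:v0.0.1

# Inspect the image once the pull completes.
docker images cr-demo-cn-beijing.cr.volces.com/vke-ai/langchain-chatglm
```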

Step 3: Create Langchain‑ChatGLM Service

In the VKE console, create a Deployment named langchain-new with one replica, using the provided image and requesting one GPU:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-new
spec:
  replicas: 1
  selector:
    matchLabels:
      app: langchain-new
  template:
    metadata:
      labels:
        app: langchain-new
    spec:
      containers:
      - image: cr-demo-cn-beijing.cr.volces.com/vke-ai/langchain-chatglm:v0.0.1
        name: langchain
        resources:
          limits:
            nvidia.com/gpu: "1"
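The same Deployment can also be created from the command line instead of the console. A sketch, assuming the manifest above is saved as deployment.yaml (the file name is arbitrary) and kubectl points at the cluster:

```shell
# Create the Deployment and wait for the pod to become Ready.
kubectl apply -f deployment.yaml
kubectl rollout status deployment/langchain-new

# The pod should land on the GPU node.
kubectl get pods -l app=langchain-new -o wide
```

The first rollout can take several minutes, since the node has to pull the ~24 GB image before the container starts.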

Expose the deployment with a LoadBalancer Service on port 80 mapping to container port 7860:

apiVersion: v1
kind: Service
metadata:
  name: langchain-new
spec:
  ports:
  - name: langchain
    port: 80
    protocol: TCP
    targetPort: 7860
  selector:
    app: langchain-new
  type: LoadBalancer
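Likewise, the Service can be applied from the command line. A sketch, assuming the manifest above is saved as service.yaml:

```shell
# Create the LoadBalancer Service and watch for the external IP.
kubectl apply -f service.yaml
kubectl get service langchain-new --watch
```

The external IP appears in the EXTERNAL-IP column once the load balancer has been provisioned.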

Step 4: Add a Local Knowledge Base

In the running service, switch to the “knowledge base Q&A” mode, create a new knowledge base (the name must not contain Chinese characters), and upload files or folders; the service will automatically embed and index the uploaded content so it can be used to answer questions.

Final Demonstration

After the Service is created, open the external IP in a browser to interact with the Langchain‑ChatGLM service. You can also expose it through an ALB Ingress or an API Gateway instead of a plain LoadBalancer.
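A quick reachability check from the command line, assuming <EXTERNAL-IP> stands in for the address assigned to the Service:

```shell
# The Gradio web UI listens on container port 7860, exposed on port 80.
curl -I http://<EXTERNAL-IP>/
```

An HTTP 200 response indicates the web UI is up and serving.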

Related Links

Volcengine homepage: https://www.volcengine.com

Container Service (VKE): https://www.volcengine.com/product/vke

Image Registry: https://www.volcengine.com/product/cr

Model image: cr-demo-cn-beijing.cr.volces.com/vke-ai/langchain-chatglm:v0.0.1

Tags: Kubernetes, AI Deployment, GPU, ChatGLM
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
