Deploy ChatGLM3‑6B with FastGPT, One‑API, and M3E on Linux

This guide walks you through deploying the ChatGLM3‑6B large language model locally, adding the M3E vector embedding model, setting up One‑API and FastGPT with Docker, configuring environments, fine‑tuning with LoRA, and testing the integrated knowledge‑base Q&A system.


This article covers a complete technical pipeline for building a private knowledge‑base Q&A system: ChatGLM3‑6B as the core LLM, the M3E embedding model for vector search, One‑API for unified API management, and FastGPT as the front‑end service.

1. Local deployment of ChatGLM3

Clone the repository, download the model weights from Hugging Face into the models directory, and create a Conda environment with Python 3.11+, CUDA, and PyTorch. Install the required packages with pip install -r requirements.txt, then run the demo with python cli_demo.py or streamlit run web_demo_streamlit.py.

# Example commands
conda create -n chatglm3 python=3.11
conda activate chatglm3
# install CUDA, PyTorch according to nvidia‑smi output
pip install -r requirements.txt
python cli_demo.py

2. LoRA fine‑tuning

Install fine‑tuning dependencies from finetune_demo/requirements.txt. Convert dataset to the ChatGLM3 conversation format, then run the fine‑tuning script with a lora.yaml configuration that specifies data paths, training arguments, and LoRA hyper‑parameters (r, lora_alpha, dropout, etc.).

# Sample conversion: AdGen {"content", "summary"} records to ChatGLM3 conversation format
import json
from pathlib import Path

def convert_adgen(data_dir, save_dir):
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    with open(Path(data_dir) / "train.json", encoding="utf-8") as fin, \
            open(Path(save_dir) / "train.json", "w", encoding="utf-8") as fout:
        for line in fin:
            sample = json.loads(line)
            conv = {"conversations": [
                {"role": "user", "content": sample["content"]},
                {"role": "assistant", "content": sample["summary"]}]}
            fout.write(json.dumps(conv, ensure_ascii=False) + "\n")
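
A lora.yaml along these lines ties the data paths, training arguments, and LoRA hyper‑parameters together. This is a sketch only; the field names and values approximate the example shipped with finetune_demo, so check the copy in your repo version before training:

```yaml
data_config:
  train_file: train.json
  val_file: dev.json
max_input_length: 256
max_output_length: 512
training_args:
  output_dir: ./output
  max_steps: 3000
  learning_rate: 5e-5
  per_device_train_batch_size: 4
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
```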

3. Deploying the M3E embedding model

M3E (Moka Massive Mixed Embedding) provides bilingual text embeddings. Clone the repository from Hugging Face, place the model under m3e-base, and load it in the API server with SentenceTransformer.

from sentence_transformers import SentenceTransformer
# Load the locally cloned m3e-base weights onto the GPU
embedding_model = SentenceTransformer('m3e-base/', device='cuda')
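
FastGPT retrieves knowledge‑base chunks by vector similarity over these embeddings. A minimal sketch of the ranking step with NumPy (the vectors below are stand‑ins for real SentenceTransformer output):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank document vectors by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per document
    order = np.argsort(-sims)[:k]     # indices of the k best matches
    return [(int(i), float(sims[i])) for i in order]

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print(top_k(np.array([1.0, 0.2]), docs))
```

In production FastGPT does this ranking inside its vector store; the sketch only illustrates what "vector search" means for the M3E embeddings.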

4. Setting up One‑API

Install Docker and docker‑compose, then run:

# Install Docker
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
systemctl enable --now docker
# Install docker‑compose
curl -L https://github.com/docker/compose/releases/download/v2.20.3/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

Deploy One‑API with:

docker run --name one-api -d --restart always -p 3080:3000 -e TZ=Asia/Shanghai -v /home/data/one-api:/data justsong/one-api

Configure the base URL and API key for the ChatGLM and M3E models in One‑API’s channel settings.
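
Once the channel is set up, any OpenAI‑compatible client can reach ChatGLM3 through One‑API. A sketch that builds such a request; the base URL, port, and key are assumptions from the deployment above, and the actual send is left commented out:

```python
import json
from urllib import request

ONE_API_BASE = "http://127.0.0.1:3080/v1"  # assumed One-API address from the docker run above
API_KEY = "sk-xxxx"                        # a token created in One-API's console

def chat_request(prompt, model="chatglm3-6B"):
    """Build an OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

body = chat_request("Introduce ChatGLM3 briefly.")
req = request.Request(
    f"{ONE_API_BASE}/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
)
# print(request.urlopen(req).read().decode())  # requires a running One-API
```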

5. Deploying FastGPT

FastGPT is a knowledge‑base Q&A system built on large language models. Pull the Docker compose files:

mkdir fastgpt && cd fastgpt
curl -O https://raw.githubusercontent.com/labring/FastGPT/main/files/deploy/fastgpt/docker-compose.yml
curl -O https://raw.githubusercontent.com/labring/FastGPT/main/projects/app/data/config.json

Edit docker-compose.yml to set OPENAI_BASE_URL to the One‑API address (e.g., http://<IP>:3080/v1) and provide the CHAT_API_KEY. Adjust config.json to register the ChatGLM3‑6B model under llmModels and the M3E model under vectorModels, specifying context length, temperature, and other flags.

{
  "llmModels": [{
    "model": "chatglm3-6B",
    "name": "chatglm3-6B",
    "maxContext": 16000,
    "maxResponse": 4000,
    "datasetProcess": true,
    "usedInClassify": true
  }],
  "vectorModels": [{
    "model": "m3e",
    "name": "m3e",
    "maxToken": 3000
  }]
}
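
A quick sanity check before restarting FastGPT catches a malformed config.json early. A minimal sketch; the required‑key lists reflect only the fields used above, not FastGPT's full schema:

```python
import json

REQUIRED_LLM_KEYS = {"model", "name", "maxContext", "maxResponse"}
REQUIRED_VEC_KEYS = {"model", "name", "maxToken"}

def check_config(path):
    """Verify the config declares at least one chat and one vector model."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    assert cfg.get("llmModels"), "no llmModels configured"
    assert cfg.get("vectorModels"), "no vectorModels configured"
    for m in cfg["llmModels"]:
        missing = REQUIRED_LLM_KEYS - m.keys()
        assert not missing, f"llmModel {m.get('model')} missing {missing}"
    for m in cfg["vectorModels"]:
        missing = REQUIRED_VEC_KEYS - m.keys()
        assert not missing, f"vectorModel {m.get('model')} missing {missing}"
    return True
```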

Start FastGPT:

# Pull images and launch
docker-compose pull
docker-compose up -d

After containers are running, access FastGPT at http://<local‑ip>:3000, log in with the default root credentials (user: root, password: 1234), create an application, and test the end‑to‑end workflow.

The guide also includes troubleshooting tips such as restarting One‑API if it fails to connect to MySQL on the first attempt.
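
The MySQL race on first boot can also be handled programmatically rather than by restarting manually. A generic retry helper that polls until a check passes; the One‑API URL in the comment is an assumption from the deployment above:

```python
import time

def wait_for(check, retries=10, delay=1.0):
    """Call `check` until it returns truthy or retries run out."""
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay)
    return False

# Example: poll One-API's web port after `docker restart one-api`
# (the URL is an assumption; adjust to your host):
#
# from urllib import request
# up = wait_for(lambda: request.urlopen("http://127.0.0.1:3080/").status == 200,
#               retries=30, delay=2.0)
```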

Tags: Docker, ChatGLM3, LLM deployment, FastGPT, M3E, One-API
Written by

Architect's Alchemy Furnace

A comprehensive platform that combines Java development and architecture design, guaranteeing 100% original content. We explore the essence and philosophy of architecture and provide professional technical articles for aspiring architects.
