Deploy ChatGLM3‑6B with FastGPT, One‑API, and M3E on Linux
This guide walks through building a private knowledge‑base Q&A system end to end: deploying the ChatGLM3‑6B large language model locally, fine‑tuning it with LoRA, adding the M3E embedding model for vector search, unifying API access with One‑API, and serving the front end with FastGPT via Docker, finishing with an integrated test of the whole pipeline.
1. Local deployment of ChatGLM3
Clone the repository, download the model weights from HuggingFace into the models directory, and create a Conda environment with Python 3.11+, CUDA, and PyTorch. Install required packages with pip install -r requirements.txt. Run the demo with python cli_demo.py or streamlit run web_demo_streamlit.py.
# Example commands
conda create -n chatglm3 python=3.11
conda activate chatglm3
# install CUDA, PyTorch according to nvidia‑smi output
pip install -r requirements.txt
python cli_demo.py

2. LoRA fine‑tuning
Install fine‑tuning dependencies from finetune_demo/requirements.txt. Convert dataset to the ChatGLM3 conversation format, then run the fine‑tuning script with a lora.yaml configuration that specifies data paths, training arguments, and LoRA hyper‑parameters (r, lora_alpha, dropout, etc.).
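The lora.yaml referenced above follows the layout of the sample configs in finetune_demo; a minimal illustrative configuration might look like the sketch below. The field names mirror the repository's examples and the values are placeholders, not recommendations — check the repository's own lora.yaml for the authoritative schema.

```yaml
data_config:
  train_file: train.json      # converted conversation-format data
  val_file: dev.json
  test_file: dev.json
max_input_length: 256
max_output_length: 512
training_args:
  output_dir: ./output
  max_steps: 3000
  learning_rate: 5e-5
  per_device_train_batch_size: 4
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8                        # LoRA rank
  lora_alpha: 32
  lora_dropout: 0.1
```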
# Sample JSON conversion snippet
import json
from pathlib import Path

def convert_adgen(data_dir, save_dir):
    # Map each AdGen-style {"content", "summary"} record to the
    # ChatGLM3 conversation format expected by the fine-tuning script.
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    for file in Path(data_dir).glob("*.json"):
        with open(file, encoding="utf-8") as fin, open(save_dir / file.name, "w", encoding="utf-8") as fout:
            for line in fin:
                sample = json.loads(line)
                fout.write(json.dumps({"conversations": [
                    {"role": "user", "content": sample["content"]},
                    {"role": "assistant", "content": sample["summary"]},
                ]}, ensure_ascii=False) + "\n")

3. Deploying the M3E embedding model
M3E (Moka Massive Mixed Embedding) provides bilingual text embeddings. Clone the repository from Hugging Face, place the model under m3e-base, and load it in the API server with SentenceTransformer.
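Downstream, FastGPT's retrieval step ranks knowledge-base chunks by cosine similarity between these embedding vectors. Stripped of the model itself, that scoring is a few lines of plain Python:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In the real pipeline the inputs are the vectors produced by M3E (loaded below), and the highest-scoring chunks are passed to ChatGLM3 as context.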
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer('m3e-base/', device='cuda')

4. Setting up One‑API
Install Docker and docker‑compose, then run:
# Install Docker
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
systemctl enable --now docker
# Install docker‑compose
curl -L https://github.com/docker/compose/releases/download/v2.20.3/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

Deploy One‑API with:
docker run --name one-api -d --restart always -p 3080:3000 -e TZ=Asia/Shanghai -v /home/data/one-api:/data justsong/one-api

Configure the base URL and API key for the ChatGLM and M3E models in One‑API’s channel settings.
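Because One‑API exposes an OpenAI-compatible /v1 endpoint, the ChatGLM3 channel can be exercised with a plain HTTP request once the channel is configured. The sketch below builds such a request with the standard library; the host, port, and token are placeholders for your own One‑API settings, not values from this guide:

```python
import json
from urllib.request import Request

ONE_API_BASE = "http://localhost:3080/v1"  # placeholder One-API address
ONE_API_KEY = "sk-xxxx"                    # placeholder token issued by One-API

def chat_request(prompt, model="chatglm3-6B"):
    # Build an OpenAI-style chat completion request against One-API.
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return Request(
        f"{ONE_API_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {ONE_API_KEY}",
            "Content-Type": "application/json",
        },
    )

# With the container running, urllib.request.urlopen(chat_request("Hello"))
# sends the call and returns the model's JSON response.
```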
5. Deploying FastGPT
FastGPT is a knowledge‑base Q&A system built on large language models. Pull the Docker compose files:
mkdir fastgpt && cd fastgpt
curl -O https://raw.githubusercontent.com/labring/FastGPT/main/files/deploy/fastgpt/docker-compose.yml
curl -O https://raw.githubusercontent.com/labring/FastGPT/main/projects/app/data/config.json

Edit docker-compose.yml to set OPENAI_BASE_URL to the One‑API address (e.g., http://<IP>:3080/v1) and provide the CHAT_API_KEY. Adjust config.json to register the ChatGLM3‑6B model under llmModels and the M3E model under vectorModels, specifying context length, temperature, and other flags.
{
  "llmModels": [{
    "model": "chatglm3-6B",
    "name": "chatglm3-6B",
    "maxContext": 16000,
    "maxResponse": 4000,
    "datasetProcess": true,
    "usedInClassify": true
  }],
  "vectorModels": [{
    "model": "m3e",
    "name": "m3e",
    "maxToken": 3000
  }]
}

Start FastGPT:
# Pull images and launch
docker-compose pull
docker-compose up -d

After containers are running, access FastGPT at http://<local‑ip>:3000, log in with the default root credentials (user: root, password: 1234), create an application, and test the end‑to‑end workflow.
The guide also includes troubleshooting tips such as restarting One‑API if it fails to connect to MySQL on the first attempt.