Tagged articles
4 articles
Architect's Alchemy Furnace
Jul 3, 2024 · Artificial Intelligence

Deploy ChatGLM3‑6B with FastGPT, One‑API, and M3E on Linux

This guide walks you through deploying the ChatGLM3‑6B large language model locally, adding the M3E vector embedding model, setting up One‑API and FastGPT with Docker, configuring environments, fine‑tuning with LoRA, and testing the integrated knowledge‑base Q&A system.

ChatGLM3 · Docker · FastGPT
15 min read
DaTaobao Tech
Dec 27, 2023 · Artificial Intelligence

Deploying a Private LLM Knowledge Base on a MacBook

The guide walks through installing and quantizing the open‑source ChatGLM3‑6B model and the m3e‑base embedding model on a MacBook, then wrapping both in a FastAPI service that exposes an OpenAI‑compatible API. Requests are routed through a One‑API gateway, metadata is stored in MongoDB, vectors are stored in PostgreSQL with pgvector, and FastGPT provides the retrieval‑augmented generation (RAG) layer. After ingesting data, the setup answers queries in 5‑7 seconds, and the article closes by outlining future improvements.

ChatGLM3 · Deployment · FastAPI
23 min read
Rare Earth Juejin Tech Community
Nov 29, 2023 · Artificial Intelligence

Building a Private LLM‑Powered Knowledge Base with LangChain and ChatGLM3

This article explains how to turn personal notes into a private knowledge base by combining a large language model with an external vector store. It first covers the underlying concepts (tokenization, embeddings, and vector databases) and then walks step by step through a deployment using LangChain‑Chatchat and the open‑source ChatGLM3 model.

ChatGLM3 · Embedding · Knowledge Base
10 min read
Huawei Cloud Developer Alliance
Nov 16, 2023 · Artificial Intelligence

ChatGLM2 vs ChatGLM3: MQA, FlashAttention, and New Prompt Features

During the Saturday session, we reviewed ChatGLM2’s two main upgrades, Multi‑Query Attention (MQA) and FlashAttention, and demonstrated a deployment on Ascend with ModelArts and MindSpore. We then introduced ChatGLM3’s redesigned prompt format and its native tool‑calling and code‑interpreter capabilities, and closed with a preview of the next lecture on text‑generation decoding.

ChatGLM2 · ChatGLM3 · FlashAttention
6 min read