Unlock Free GLM-4-Flash API: Step-by-Step Guide, Code Samples, and Logic Puzzle Test
This article explores the free GLM-4-Flash API from Zhipu AI, detailing its lightweight architecture and performance specs, walking through a logic-puzzle demonstration, and providing a step-by-step tutorial that covers data upload, model fine-tuning, deployment, and example code for building a LangChain-based knowledge-base retrieval system.
The post introduces the GLM-4-Flash model, a lightweight, high-speed large language model released by Zhipu AI with a 128K context window, multilingual support, and a generation speed of about 72 tokens (≈115 characters) per second. Trained on roughly 10 TB of cleaned data, it is positioned for tasks such as tagging, summarization, code generation, and translation.
Logic Puzzle Demonstration
To showcase reasoning ability, the author presents a classic “blue-eyes/red-eyes” puzzle involving three villagers. The model solves the puzzle correctly, showing that even the flash (FP8) version retains strong logical-inference capability.
API Features and Access
The GLM-4-Flash API is offered for free, with generous concurrency limits for both existing and new users. The service supports standard chat completions, streaming responses, and can be called via HTTP or SDKs.
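For callers who prefer raw HTTP over an SDK, the request shape can be sketched as below. The v4 endpoint URL and bearer-token auth scheme shown here are assumptions based on Zhipu's REST API; verify both against the official API reference.

```python
import json
import os
import urllib.request

# Assumed v4 chat-completions endpoint; confirm in Zhipu's API docs.
API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completions HTTP request for GLM-4-Flash."""
    body = json.dumps({
        "model": "glm-4-flash",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__":
    key = os.environ.get("ZHIPUAI_API_KEY", "")
    if key:  # only hit the network when a key is configured
        with urllib.request.urlopen(build_request("Hello!", key)) as resp:
            print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload works for streaming by adding `"stream": true`, in which case the server returns incremental chunks instead of one JSON body.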
Fine‑Tuning Procedure
Users can fine‑tune the model by uploading data in a simple JSON format. The steps are:
Navigate to the model page and click the provided button (see image).
Upload a JSON file containing a {"messages": [...]} array. Example payloads are shown below.
Confirm the creation of the fine‑tuned model.
After training completes, deploy the model.
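Before uploading, the training file can be sanity-checked in code. The sketch below follows the `{"messages": [...]}` record shape used in the example payloads; the platform's exact server-side validation rules are an assumption.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_record(line: str) -> list[str]:
    """Return a list of problems in one training record (empty = OK)."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"]:
            problems.append(f"message {i}: missing content")
    # a record needs at least one user/assistant exchange to learn from
    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles or "assistant" not in roles:
        problems.append("record lacks a user/assistant pair")
    return problems

good = '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]}'
print(validate_record(good))  # → []
```

Running this over every line of the upload file catches malformed records before the fine-tuning job fails on them.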
Example JSON payloads:
```json
{"messages":[{"role":"system","content":"You are a helpful AI assistant."},{"role":"user","content":"Introduce the basics of AlphaGo."},{"role":"assistant","content":"AlphaGo combines deep neural networks with tree search..."}]}
```

For simpler use-cases, the system message can be omitted:

```json
{"messages":[{"role":"user","content":"Introduce the basics of AlphaGo."},{"role":"assistant","content":"AlphaGo combines deep neural networks with tree search..."}]}
```

Calling the API with Python
After deployment, the model can be accessed using the zhipuai SDK:
```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_KEY")
response = client.chat.completions.create(
    model="chatglm3-6b-1001",  # use your deployed (fine-tuned) model ID here
    messages=[
        {"role": "system", "content": "You are an AI assistant named chatGLM."},
        {"role": "user", "content": "Hello! What is your name?"},
    ],
    stream=True,
)
# stream=True yields chunks; print each delta's text as it arrives
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```

Building a Retrieval-Augmented Generation (RAG) System with LangChain
By combining the free GLM-4-Flash API with LangChain, users can create a local knowledge‑base search engine. Required libraries include langchain, zhipuai, unstructured, pdf2image, chromadb, and tiktoken.
```bash
pip install langchain
pip install zhipuai
pip install unstructured
pip install pdf2image
pip install chromadb
pip install tiktoken
```

Sample script:
```python
import os

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatZhipuAI

os.environ["OPENAI_API_KEY"] = "Your openai key"    # used by OpenAIEmbeddings
os.environ["ZHIPUAI_API_KEY"] = "Your zhipuai key"  # used by ChatZhipuAI

# Load and chunk the local knowledge base
loader = DirectoryLoader('./knowledge_base', glob='**/*.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_docs = text_splitter.split_documents(documents)

# Embed the chunks and index them in a Chroma vector store
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(split_docs, embeddings)

# Answer questions with GLM-4-Flash via LangChain's Zhipu chat wrapper
chatmodel = ChatZhipuAI(model="glm-4-flash")
qa = RetrievalQA.from_chain_type(
    llm=chatmodel,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    return_source_documents=True,
)
result = qa({"query": "When did the GLM-4-Flash API become free?"})
print(result)
```

The query returns August 2024 as the answer, confirming that the free API launched at that time.
Conclusion
Providing a free, high‑performance LLM API lowers entry barriers for developers, accelerates AI adoption, and fosters a broader ecosystem of applications—from tagging and summarization to code generation and knowledge‑base retrieval—ultimately expanding the AI industry’s overall capacity.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
