Unlock Free GLM-4-Flash API: Step-by-Step Guide, Code Samples, and Logic Puzzle Test

This article explores Zhipu AI's free GLM-4-Flash API, covering its lightweight architecture and performance specs, a logic-puzzle demonstration of its reasoning ability, and a step-by-step tutorial on data upload, model fine-tuning, and deployment, plus example code for building a LangChain-based knowledge-base retrieval system.

Baobao Algorithm Notes

The post introduces the GLM-4-Flash model, a lightweight, high-speed large language model released by Zhipu AI with a 128K context window, multilingual support, and a generation speed of about 72 tokens per second (roughly 115 characters per second). Trained on roughly 10 TB of cleaned data, it is positioned for tasks such as tagging, summarization, code generation, and translation.

Logic Puzzle Demonstration

To showcase reasoning ability, the author presents a classic “blue-eyes/red-eyes” puzzle involving three villagers. The model solves it correctly, showing that even the Flash (FP8) version retains strong logical-inference capability.

API Features and Access

The GLM-4-Flash API is offered for free, with generous concurrency limits for both existing and new users. The service supports standard chat completions, streaming responses, and can be called via HTTP or SDKs.
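For quick experiments without the SDK, the service can also be called directly over HTTP. Below is a minimal sketch using only the standard library; it assumes bearer-token authentication and the chat-completions path published on Zhipu's open platform, so verify both against the current documentation before relying on them:

```python
import json
import os
import urllib.request

# Endpoint as documented on Zhipu AI's open platform; confirm before use.
API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_request(prompt, api_key, model="glm-4-flash"):
    """Build a single-turn chat-completion HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_request("Hello!", api_key=os.environ.get("ZHIPUAI_API_KEY", ""))
# Uncomment to actually send the request (requires a valid key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works for streaming by adding `"stream": true` to the payload, in which case the server returns server-sent events instead of a single JSON body.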

Fine‑Tuning Procedure

Users can fine‑tune the model by uploading data in a simple JSON format. The steps are:

1. Navigate to the model page and click the provided button (see image).

2. Upload a JSON file containing a {"messages": [...]} array; example payloads are shown below.

3. Confirm the creation of the fine-tuned model.

4. After training completes, deploy the model.

Example JSON payloads:

{"messages":[{"role":"system","content":"You are a helpful AI assistant."},{"role":"user","content":"Introduce the basics of AlphaGo."},{"role":"assistant","content":"AlphaGo combines deep neural networks with tree search..."}]}

For simpler use‑cases, the system message can be omitted:

{"messages":[{"role":"user","content":"Introduce the basics of AlphaGo."},{"role":"assistant","content":"AlphaGo combines deep neural networks with tree search..."}]}
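Before uploading, it helps to sanity-check the training file locally. The sketch below writes one sample in the format shown above and validates each line; note that these checks are only local assumptions about the structure, not the platform's official validation rules:

```python
import json

def validate_sample(line):
    """Check one fine-tuning sample: a JSON object with a 'messages' list
    of valid roles and non-empty content, ending with the assistant reply."""
    sample = json.loads(line)
    msgs = sample.get("messages")
    assert isinstance(msgs, list) and msgs, "missing 'messages' array"
    for m in msgs:
        assert m.get("role") in {"system", "user", "assistant"}, "bad role"
        assert isinstance(m.get("content"), str) and m["content"], "empty content"
    assert msgs[-1]["role"] == "assistant", "last message must be the assistant reply"
    return True

# Write one sample per line (JSON Lines), then validate every line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"messages": [
        {"role": "user", "content": "Introduce the basics of AlphaGo."},
        {"role": "assistant", "content": "AlphaGo combines deep neural networks with tree search..."},
    ]}, ensure_ascii=False) + "\n")

for n, line in enumerate(open("train.jsonl", encoding="utf-8"), 1):
    validate_sample(line)
```

Catching a malformed line locally is much faster than waiting for an upload to be rejected.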

Calling the API with Python

After deployment, the model can be accessed using the zhipuai SDK:

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_KEY")
response = client.chat.completions.create(
    model="chatglm3-6b-1001",  # replace with your deployed/fine-tuned model ID
    messages=[
        {"role": "system", "content": "You are an AI assistant named chatGLM."},
        {"role": "user", "content": "Hello! What is your name?"}
    ],
    stream=True,  # tokens arrive incrementally as they are generated
)
for chunk in response:
    # each chunk carries the newly generated text in choices[0].delta
    print(chunk.choices[0].delta.content or "", end="")

Building a Retrieval‑Augmented Generation (RAG) System with LangChain

By combining the free GLM-4-Flash API with LangChain, users can create a local knowledge‑base search engine. Required libraries include langchain, zhipuai, unstructured, pdf2image, chromadb, and tiktoken.

pip install langchain
pip install zhipuai
pip install unstructured
pip install pdf2image
pip install chromadb
pip install tiktoken

Sample script:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatZhipuAI
import os

os.environ["OPENAI_API_KEY"] = "Your openai key"   # used by the embeddings
os.environ["ZHIPUAI_API_KEY"] = "Your zhipu key"   # used by the chat model

# Load and chunk the local knowledge base
loader = DirectoryLoader('./knowledge_base', glob='**/*.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
split_docs = text_splitter.split_documents(documents)

# Embed the chunks and index them in Chroma
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(split_docs, embeddings)

# RetrievalQA needs a LangChain chat-model wrapper, not a raw SDK call;
# ChatZhipuAI is the community wrapper for Zhipu models.
chatmodel = ChatZhipuAI(model="glm-4-flash")
qa = RetrievalQA.from_chain_type(
    llm=chatmodel,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    return_source_documents=True,
)
result = qa({"query": "When did the GLM-4-Flash API become free?"})
print(result)
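Under the hood, Chroma answers the query by embedding it and ranking the stored chunks by vector similarity. The toy sketch below illustrates that ranking step with hand-made 3-dimensional vectors standing in for real embedding-model output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d "embeddings" standing in for real model output.
chunks = {
    "GLM-4-Flash went free in August 2024.": [0.9, 0.1, 0.2],
    "AlphaGo combines networks with tree search.": [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.25]  # pretend embedding of the user's question

# The retriever returns the chunk(s) most similar to the query vector.
best = max(chunks, key=lambda c: cosine(chunks[c], query_vec))
print(best)  # the chunk about the free API ranks highest
```

The retrieved chunk is then stuffed into the prompt (the "stuff" chain type above), so the chat model answers from your documents rather than from its training data alone.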

The query returns the answer “August 2024,” confirming that the free API was launched at that time.

Conclusion

Providing a free, high‑performance LLM API lowers entry barriers for developers, accelerates AI adoption, and fosters a broader ecosystem of applications—from tagging and summarization to code generation and knowledge‑base retrieval—ultimately expanding the AI industry’s overall capacity.

Tags: Python, LangChain, Fine-tuning, large language model, AI Deployment, GLM-4-Flash, Free API
Written by Baobao Algorithm Notes, author of the BaiMian large model, offering technology and industry insights.
