Stream Real-Time Chat with Ollama’s qwen3 Model via Async Python & LangChain
This guide walks you through installing Ollama, downloading the qwen3:4b model, and using Python’s async client to perform streaming chat requests. It then shows how to integrate the same model with LangChain, covering setup, initialization, and both regular and streaming output examples.
Prerequisites
Ensure that Ollama is installed locally and that the qwen3:4b model has been pulled.
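If you have not set these up yet, the following commands cover both steps (assuming a Linux/macOS shell; the install script is Ollama’s official one, and Windows users should use the installer from ollama.com instead):

```shell
# Install Ollama (official install script for Linux/macOS)
curl -fsSL https://ollama.com/install.sh | sh

# Download the qwen3:4b model weights
ollama pull qwen3:4b

# Confirm the model now appears in the local model list
ollama list
```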
Asynchronous Python with Ollama
The script below creates an AsyncClient, sends a user message to the qwen3:4b model, and streams the response token‑by‑token.
import asyncio
from ollama import AsyncClient
# Connect to the local Ollama service
client = AsyncClient(host='http://localhost:11434')
async def chat():
    # User message
    message = {'role': 'user', 'content': 'Hello, how do I get started with learning large language models?'}
    # Stream the chat response
    async for part in await client.chat(
        model='qwen3:4b',
        messages=[message],
        stream=True
    ):
        # Print each fragment immediately
        print(part['message']['content'], end='', flush=True)

# Run the async function
asyncio.run(chat())
Key Points
AsyncClient: Ollama’s asynchronous client for non‑blocking or high‑concurrency use cases.
stream=True: Enables incremental token delivery instead of waiting for the full answer.
async for: Asynchronously iterates over the streamed chunks.
flush=True: Forces each printed fragment to appear immediately in the terminal.
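To see the `async for` streaming pattern in isolation, here is a minimal sketch that replaces the model with a plain async generator emitting fake tokens; no Ollama service is required, and the name `fake_stream` is purely illustrative:

```python
import asyncio

async def fake_stream(tokens):
    # Yield one chunk at a time, mimicking the shape of Ollama's streamed parts
    for tok in tokens:
        await asyncio.sleep(0)  # yield control, like awaiting a network chunk
        yield {'message': {'content': tok}}

async def consume():
    pieces = []
    # Same iteration shape as the Ollama client: async for over chunks
    async for part in fake_stream(['Hello', ', ', 'world']):
        pieces.append(part['message']['content'])
    return ''.join(pieces)

print(asyncio.run(consume()))  # → Hello, world
```

The real client swaps `fake_stream(...)` for `await client.chat(..., stream=True)`; the consuming loop is unchanged.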
Running the script
When executed, the program prints the model’s answer in real time.
💡 Tip: If a connection error occurs, verify that the Ollama service is running by opening http://localhost:11434 in a browser.
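One way to perform that check programmatically is a small probe of the endpoint; the helper name `ollama_reachable` is our own, and we assume the service answers its root URL with an HTTP 200 when running:

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url='http://localhost:11434', timeout=2.0):
    # Return True if the Ollama HTTP endpoint answers, False otherwise
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_reachable())
```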
Connecting LangChain to Ollama
Install the required packages
pip install langchain langchain-community langchain-core langchain-ollama
Initialize the Ollama model in LangChain
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage
# Create a ChatOllama instance (base_url defaults to http://localhost:11434)
llm = ChatOllama(
base_url="http://localhost:11434",
model="qwen3:4b",
temperature=0.7,
)
# Send a single message and retrieve the response
messages = [HumanMessage(content="Hello, how do I get started with large language models?")]
response = llm.invoke(messages)
print(response.content)
Streaming output (ChatGPT‑like experience)
for chunk in llm.stream("Write a short poem about spring"):
    print(chunk.content, end="", flush=True)
Full source example
from langchain_ollama import ChatOllama
llm = ChatOllama(
base_url="http://localhost:11434",
model="qwen3:4b",
temperature=0.7,
)
for chunk in llm.stream("How do I get started with large language models?"):
    print(chunk.content, end="", flush=True)
Typical use cases
Local knowledge‑base question answering (RAG)
Personal data‑analysis assistants
Offline code‑completion or assistance tools
Privacy‑preserving chatbots
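Whichever use case you build, the streaming loops above share one pattern: printing each chunk as it arrives while concatenating every chunk’s content into the full answer. That pattern can be sketched without a running model; the `FakeChunk` class below is a stand‑in for LangChain’s streamed chunk objects, which expose a `.content` attribute:

```python
from dataclasses import dataclass

@dataclass
class FakeChunk:
    # Stand-in for a LangChain streaming chunk, which exposes .content
    content: str

def accumulate(chunks):
    # Print fragments as they arrive and return the assembled answer
    parts = []
    for chunk in chunks:
        print(chunk.content, end='', flush=True)
        parts.append(chunk.content)
    return ''.join(parts)

answer = accumulate([FakeChunk('Spring '), FakeChunk('breeze '), FakeChunk('returns.')])
```

In a real application, `accumulate(llm.stream(prompt))` would give you live terminal output plus the complete response for logging or further processing.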
Cognitive Technology Team