Stream Real-Time Chat with Ollama’s qwen3 Model via Async Python & LangChain

This guide walks you through installing Ollama, pulling the qwen3:4b model, and using Python's async client to make streaming chat requests. It then shows how to drive the same model from LangChain, covering setup, initialization, and both regular and streaming output.


Prerequisites

Ensure that Ollama is installed locally and that the qwen3:4b model has been pulled.
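If either piece is missing, the Ollama CLI handles both steps. Download the installer from https://ollama.com, then pull and verify the model (the exact tag qwen3:4b matches the examples below), and install the Python client used in the next section:

# Pull the qwen3:4b model from the Ollama registry
ollama pull qwen3:4b

# Confirm the model is available locally
ollama list

# Install the official Python client
pip install ollama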

Asynchronous Python with Ollama

The script below creates an AsyncClient, sends a user message to the qwen3:4b model, and streams the response token‑by‑token.

import asyncio
from ollama import AsyncClient

# Connect to the local Ollama service
client = AsyncClient(host='http://localhost:11434')

async def chat():
    # User message
    message = {'role': 'user', 'content': 'Hello, how do I get started with learning large language models?'}
    # Stream the chat response
    async for part in await client.chat(
        model='qwen3:4b',
        messages=[message],
        stream=True
    ):
        # Print each fragment immediately
        print(part['message']['content'], end='', flush=True)

# Run the async function
asyncio.run(chat())

Key Points

AsyncClient: Ollama's asynchronous client for non-blocking or high-concurrency use cases.

stream=True: Enables incremental token delivery instead of waiting for the full answer.

async for: Asynchronously iterates over the streamed chunks.

flush=True: Forces each printed fragment to appear immediately in the terminal.
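For contrast, here is a minimal sketch of the same request without streaming, where the client resolves to the complete answer in a single response object (assuming the same local service and model as above):

import asyncio
from ollama import AsyncClient

client = AsyncClient(host='http://localhost:11434')

async def chat_once():
    # Without stream=True, chat() resolves to the full response at once
    response = await client.chat(
        model='qwen3:4b',
        messages=[{'role': 'user', 'content': 'Hello, how do I get started with learning large language models?'}]
    )
    print(response['message']['content'])

asyncio.run(chat_once())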

Running the script

When executed, the program prints the model’s answer in real time.

💡 Tip: If a connection error occurs, verify that the Ollama service is running by opening http://localhost:11434 in a browser.
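If you would rather fail gracefully than print a traceback, you can wrap the call. This is a sketch assuming connection failures surface as httpx exceptions (the ollama package is built on httpx) and API errors as ollama.ResponseError:

import asyncio
import httpx
from ollama import AsyncClient, ResponseError

async def safe_chat():
    client = AsyncClient(host='http://localhost:11434')
    try:
        async for part in await client.chat(
            model='qwen3:4b',
            messages=[{'role': 'user', 'content': 'ping'}],
            stream=True
        ):
            print(part['message']['content'], end='', flush=True)
    except httpx.ConnectError:
        # The service is not reachable; start it with `ollama serve`
        print('Cannot reach Ollama at http://localhost:11434. Is the service running?')
    except ResponseError as err:
        # The service answered but rejected the request (e.g. model not pulled)
        print(f'Ollama returned an error: {err.error}')

asyncio.run(safe_chat())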

Connecting LangChain to Ollama

Install the required packages

pip install langchain langchain-community langchain-core langchain-ollama

Initialize the Ollama model in LangChain

from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage

# Create a ChatOllama instance (base_url defaults to http://localhost:11434)
llm = ChatOllama(
    base_url="http://localhost:11434",
    model="qwen3:4b",
    temperature=0.7,
)

# Send a single message and retrieve the response
messages = [HumanMessage(content="Hello, how do I get started with large language models?")]
response = llm.invoke(messages)
print(response.content)
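ChatOllama accepts the standard LangChain message types, so you can also prepend a system prompt to steer the model. A small sketch (the system text here is only an illustration):

from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage

llm = ChatOllama(base_url="http://localhost:11434", model="qwen3:4b", temperature=0.7)

messages = [
    # The system message sets the model's role and tone
    SystemMessage(content="You are a patient tutor who explains concepts step by step."),
    HumanMessage(content="How do I get started with large language models?"),
]
response = llm.invoke(messages)
print(response.content)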

Streaming output (ChatGPT‑like experience)

for chunk in llm.stream("Write a short poem about spring"):
    print(chunk.content, end="", flush=True)
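Since the first half of this guide is async, note that LangChain chat models also expose an async counterpart, astream. A minimal sketch:

import asyncio
from langchain_ollama import ChatOllama

llm = ChatOllama(base_url="http://localhost:11434", model="qwen3:4b")

async def main():
    # astream yields the same chunks as stream, without blocking the event loop
    async for chunk in llm.astream("Write a short poem about spring"):
        print(chunk.content, end="", flush=True)

asyncio.run(main())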

Full source example

from langchain_ollama import ChatOllama

llm = ChatOllama(
    base_url="http://localhost:11434",
    model="qwen3:4b",
    temperature=0.7,
)

for chunk in llm.stream("How do I get started with large language models?"):
    print(chunk.content, end="", flush=True)
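From here it is a short step to composing the model into a LangChain pipeline. The sketch below chains a prompt template, the model, and a string output parser, then streams the result (the template text and input are illustrative):

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOllama(base_url="http://localhost:11434", model="qwen3:4b", temperature=0.7)

# Prompt -> model -> plain-string parser, composed with the | operator
prompt = ChatPromptTemplate.from_template("Explain {topic} to a beginner in three sentences.")
chain = prompt | llm | StrOutputParser()

# Thanks to the parser, each streamed chunk is already a plain string
for chunk in chain.stream({"topic": "retrieval-augmented generation"}):
    print(chunk, end="", flush=True)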

Typical use cases

Local knowledge‑base question answering (RAG)

Personal data‑analysis assistants

Offline code‑completion or assistance tools

Privacy‑preserving chatbots

Tags: LangChain, Chatbot, Ollama, local AI, Qwen3, Async Python, Streaming LLM