Run OpenAI’s Open‑Source gpt‑oss Models Locally with Ollama – A Quick Guide
OpenAI’s new open‑weight gpt‑oss models, available in 20B and 120B sizes, can be run locally via Ollama and offer agentic capabilities, configurable reasoning effort, fine‑tuning support, and MXFP4 quantization; this article walks through installation, usage, and integration step by step.
OpenAI released its latest open‑weight model series, gpt‑oss, and partnered with Ollama so developers can run the models locally.
gpt‑oss model overview
Two model sizes are offered:
gpt-oss-20b : 21‑billion‑parameter model optimized for low latency and local or domain‑specific use, suitable for personal computers.
gpt-oss-120b : 117‑billion‑parameter flagship model for production, general‑purpose and heavy‑inference workloads, best run on servers with professional‑grade GPUs.
Key features
gpt-oss includes built‑in agent capabilities such as function calling, web browsing, and structured output generation (e.g., JSON). It also provides full chain‑of‑thought visibility, configurable reasoning effort (low/medium/high), fine‑tuning support, and an Apache 2.0 license.
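As one concrete illustration of structured output, Ollama's generate API accepts a "format" field set to "json" that constrains the model to emit valid JSON. The payload builder below is a minimal sketch of that request shape; send it with any HTTP client to http://localhost:11434/api/generate on a machine where the model is installed.

```java
// Sketch: building a request body for Ollama's /api/generate endpoint
// that asks gpt-oss for JSON-only (structured) output.
public class StructuredOutputDemo {

    // Assemble the JSON body; "format": "json" requests structured output,
    // "stream": false returns a single response object. (No JSON escaping
    // here, so keep prompts free of quotes in this sketch.)
    static String jsonRequest(String model, String prompt) {
        return """
            {"model": "%s",
             "prompt": "%s",
             "format": "json",
             "stream": false}""".formatted(model, prompt);
    }

    public static void main(String[] args) {
        System.out.println(jsonRequest("gpt-oss:20b",
            "List the two elements in water as JSON."));
    }
}
```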
Technical deep‑dive: MXFP4 quantization
OpenAI uses MXFP4 quantization (≈4.25 bits per parameter) to compress the models. Over 90% of the parameters sit in the MoE layers; after quantization, gpt-oss-20b runs smoothly on a system with only 16 GB of memory, and gpt-oss-120b fits on a single 80 GB GPU.
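A back-of-the-envelope calculation shows why these footprints work out. At ≈4.25 bits per parameter, the weights of a ~21B-parameter model occupy roughly 11 GB and a ~117B-parameter model roughly 62 GB, leaving headroom for activations and KV cache within the 16 GB and 80 GB budgets. The figures below are estimates from the published parameter counts, not measured memory use:

```java
// Sketch: rough weight-memory estimate for MXFP4 quantization (~4.25 bits/param).
// Real memory use is higher: activations, KV cache, and runtime overhead add to this.
public class Mxfp4Estimate {

    // Convert a parameter count and bit width into gigabytes of weight storage.
    static double gigabytes(double params, double bitsPerParam) {
        return params * bitsPerParam / 8 / 1e9;
    }

    public static void main(String[] args) {
        System.out.printf("gpt-oss-20b  (~21B params):  %.1f GB%n", gigabytes(21e9, 4.25));
        System.out.printf("gpt-oss-120b (~117B params): %.1f GB%n", gigabytes(117e9, 4.25));
    }
}
```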
Ollama natively supports the MXFP4 format, requiring no extra conversion and matching OpenAI’s reference implementation in benchmarks.
Quick start guide
Install Ollama for your OS, then verify the installation by running ollama --version. Run a model with a single command, e.g.:

ollama run gpt-oss:20b

For the 120B version (requires a GPU with ≥80 GB VRAM):

ollama run gpt-oss:120b

After the model downloads, you can chat directly in the terminal.
Ways to interact with the model
1. Command‑line tool
Use ollama run to ask questions and retrieve model information.
2. cURL API
Ollama exposes an HTTP service on port 11434. Example:
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "What is water made of?"
}'

3. Java integration
Add the spring-ai-ollama dependency and configure the base URL and model:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=gpt-oss:20b

Then use the injected ChatModel to call the model. Note that the current Spring AI 1.0 release does not support configuring the reasoning‑effort parameter for gpt‑oss; use the HTTP API for that purpose.
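Calling the HTTP API directly for reasoning effort can look like the sketch below, using the JDK's built-in java.net.http client. It assumes the convention from OpenAI's gpt-oss model card that the effort level is set via a "Reasoning: low|medium|high" line in the system message; verify this against the documentation for your Ollama version before relying on it.

```java
// Sketch: setting gpt-oss reasoning effort through Ollama's /api/chat endpoint,
// since Spring AI 1.0 does not expose this parameter.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReasoningEffortDemo {

    // Build the /api/chat body; the assumed convention is that gpt-oss reads
    // the effort level from a "Reasoning: ..." system message. (No JSON
    // escaping here, so keep prompts free of quotes in this sketch.)
    static String chatBody(String model, String effort, String prompt) {
        return """
            {"model": "%s",
             "stream": false,
             "messages": [
               {"role": "system", "content": "Reasoning: %s"},
               {"role": "user", "content": "%s"}]}"""
            .formatted(model, effort, prompt);
    }

    // Send the request to a locally running Ollama instance and return the raw JSON reply.
    static String ask(String effort, String prompt) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:11434/api/chat"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                chatBody("gpt-oss:20b", effort, prompt)))
            .build();
        return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}
```

With Ollama running, ask("high", "What is water made of?") would return the model's JSON chat response.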
Important reminder: If you already have Ollama installed, upgrade to the latest version to ensure full support for the gpt‑oss models.
Java Architecture Diary