Turn LM Studio into a Local OpenAI‑Compatible API Server
This guide shows how to select a model in LM Studio, expose a local port, start the HTTP server, and interact with it via curl, covering model listing, chat completions, and the difference between streaming and full‑response modes.
Overview
LM Studio can expose a local HTTP endpoint that mimics the OpenAI API, allowing client applications to call a loaded model through standard REST calls.
1. Select a Model
Open the Developer tab in LM Studio and choose the desired model from the list (e.g., llama-4-maverick-17b-128e-instruct).
2. Configure Port Exposure
In the Server settings set the listening port (default 1234) and enable CORS so that web pages or other tools can connect.
3. Start the Service
Switch to the Status tab. LM Studio will start an HTTP server and print the available endpoints.
2025-04-26 20:55:13 [INFO] [LM STUDIO SERVER] Success! HTTP server listening on port 1234
2025-04-26 20:55:13 [INFO] [LM STUDIO SERVER] Supported endpoints:
GET http://localhost:1234/v1/models
POST http://localhost:1234/v1/chat/completions
POST http://localhost:1234/v1/completions
POST http://localhost:1234/v1/embeddings
[LM STUDIO SERVER] Logs are saved into /Users/javaedge/.lmstudio/server-logs
Server started.
4. Quick‑Start API Calls
4.1 List Loaded Models
Verify that the server is reachable and see which models are loaded:
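The endpoint answers with the OpenAI‑style list shape (`{"object": "list", "data": [...]}`). A short Python sketch of parsing that response; the sample payload below is illustrative, not captured server output:

```python
import json

# Illustrative /v1/models payload in the OpenAI list format
# (field values are examples, not real server output).
sample = '''{
  "object": "list",
  "data": [
    {"id": "llama-4-maverick-17b-128e-instruct", "object": "model"}
  ]
}'''

# Collect the model identifiers from the "data" array.
models = [m["id"] for m in json.loads(sample)["data"]]
print(models)  # ['llama-4-maverick-17b-128e-instruct']
```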
curl http://127.0.0.1:1234/v1/models
4.2 Chat Completion
Send a /v1/chat/completions request that follows the OpenAI JSON schema. The request is stateless; the client must include the full conversation history each time.
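Because the server keeps no session state, the client owns the transcript and resends it in full on every turn. A minimal Python sketch of that bookkeeping; `send` is a hypothetical stand-in for the HTTP call, returning a canned reply so the sketch runs offline:

```python
# Client-side conversation state for a stateless chat endpoint.
def send(messages):
    # Hypothetical stand-in: a real client would POST `messages`
    # to /v1/chat/completions and return the assistant's text.
    return "A model am I, here to reply."

history = [{"role": "system", "content": "Always answer in rhymes."}]

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    reply = send(history)  # the FULL history goes out on every turn
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Introduce yourself.")
print(len(history))  # 3: system + user + assistant
```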
curl http://127.0.0.1:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-4-maverick-17b-128e-instruct",
"messages": [
{ "role": "system", "content": "Always answer in rhymes." },
{ "role": "user", "content": "Introduce yourself." }
],
"temperature": 0.7,
"max_tokens": -1,
"stream": true
}'
4.3 Streaming vs. Full Response
If "stream": true, LM Studio returns each generated token as soon as it is produced, enabling low‑latency, incremental display. Setting "stream": false makes the server accumulate the entire completion before sending the response, which can increase latency for long outputs.
JavaEdge
Hands‑on development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k online followers; expertise in distributed system design, AIGC application development, and quantitative finance.
