How to Turn Thinking Mode On or Off for Qwen3.5 Models in Ollama, LM Studio, llama.cpp, and vLLM
This guide shows step‑by‑step how to enable or disable the thinking mode of Qwen3.5 series large language models across Ollama, LM Studio (GGUF and MLX), llama.cpp, and vLLM/SGLang using command‑line flags, custom model YAML files, and API parameters.
Qwen3.5 series large language models can be run with or without a "thinking" phase that outputs intermediate reasoning before the final answer. The article collects the most common ways to toggle this feature for four popular runtimes.
Ollama
Ollama provides the simplest switch: add --think=false to the run command to disable thinking, or omit it to keep thinking enabled.
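Beyond the CLI flag, the toggle is also exposed over Ollama's REST API as a top-level think field on chat requests. The sketch below only builds and prints the request body; the model tag and the commented-out curl target (Ollama's default port, 11434) are assumptions to adapt to your own setup.

```shell
# JSON body that disables thinking for a single chat turn.
# The model tag is a placeholder; use whatever tag you pulled locally.
BODY='{"model":"qwen3.5:0.8b","messages":[{"role":"user","content":"Hello"}],"think":false,"stream":false}'

# To actually send it to a locally running Ollama server, uncomment:
# curl -s http://localhost:11434/api/chat -d "$BODY"
echo "$BODY"
```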
Command-line example:
ollama run qwen3.5:0.8b --think=false

LM Studio
LM Studio defines virtual models through a model.yaml file, which can expose a custom field named enableThinking. Setting its default value to false creates a "no-think" variant. The example below wraps an MLX build, but the same file works for GGUF models.
model: mlx-community/Qwen3.5-0.8B-MLX-4bit-no-think
base: mlx-community/Qwen3.5-0.8B-MLX-4bit
metadataOverrides:
  reasoning: false
customFields:
  - key: enableThinking
    displayName: "Enable Thinking"
    description: "Whether to allow thinking output before the final answer"
    type: boolean
    defaultValue: false
    effects:
      - type: setJinjaVariable
        variable: enable_thinking

After adding the file, LM Studio lists two models (the original and the virtual "no-think" one), and the custom field appears in the UI, allowing the user to toggle thinking on or off.
When the switch is on, the model shows a long reasoning output; when off, it returns the answer instantly.
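Because the "no-think" variant is listed as a separate model, it can also be selected by name through LM Studio's OpenAI-compatible local server. The sketch below only builds and prints the request body; the model identifier and the commented-out curl target (LM Studio's default port, 1234) are assumptions to check against your install.

```shell
# Request the no-think variant by its model identifier (a placeholder;
# copy the exact identifier LM Studio shows for the virtual model).
BODY='{"model":"qwen3.5-0.8b-mlx-4bit-no-think","messages":[{"role":"user","content":"Hello"}]}'

# To actually send it to LM Studio's local server, uncomment:
# curl -s http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d "$BODY"
echo "$BODY"
```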
llama.cpp
The author has not tested this runtime, but according to an Unsloth tutorial the default mode is non-thinking. The commands for the two modes are:
# Non‑Thinking mode (default, recommended for daily use)
./build/bin/llama-server \
-m ./models/Qwen3.5-9B-Q4_K_M.gguf \
--ctx-size 16384 \
--port 8080 \
--n-gpu-layers 35
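Some recent llama-server builds also accept a per-request chat_template_kwargs field in the request body, mirroring the --chat-template-kwargs startup flag; treat this as an assumption to verify against your build. The sketch below only builds and prints the request body; the commented-out curl target matches the port 8080 used in the server commands.

```shell
# Request thinking on a single call, overriding the template default.
BODY='{"messages":[{"role":"user","content":"Hello"}],"chat_template_kwargs":{"enable_thinking":true}}'

# To actually send it to the running llama-server, uncomment:
# curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d "$BODY"
echo "$BODY"
```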
# Thinking mode
./build/bin/llama-server \
-m ./models/Qwen3.5-9B-Q4_K_M.gguf \
--ctx-size 16384 \
--port 8080 \
--n-gpu-layers 35 \
--chat-template-kwargs '{"enable_thinking":true}'

vLLM / SGLang
During deployment the author could not find a runtime flag to disable thinking. The only available method is to set the flag when calling the API:
{
"chat_template_kwargs": {"enable_thinking": false}
}

The article concludes with a brief thank-you note.
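That fragment goes into the body of an ordinary chat-completions request. The sketch below only builds and prints such a body; the served model name and the commented-out curl target (vLLM's default port, 8000) are assumptions about the deployment.

```shell
# Full request body with thinking disabled via chat_template_kwargs.
# The model name is a placeholder for whatever name vLLM is serving under.
BODY='{"model":"Qwen3.5-9B","messages":[{"role":"user","content":"Hello"}],"chat_template_kwargs":{"enable_thinking":false}}'

# To actually send it to the vLLM OpenAI-compatible server, uncomment:
# curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "$BODY"
echo "$BODY"
```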
Old Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.