Master Ollama on macOS: Install, Run, and Optimize Large Language Models
This step‑by‑step guide shows how to install Ollama on macOS, verify the installation, manage and run open‑source LLMs, create custom models, enable the OpenAI‑compatible API, integrate with Open WebUI, and troubleshoot performance issues across different Apple silicon chips.
1. Installation and Basic Configuration
Ollama can be installed on macOS via two methods. The recommended one is to download the .dmg from the official website, drag Ollama.app into the Applications folder, and allow execution in System Settings → Privacy & Security if the system blocks the app. Alternatively, install the CLI with Homebrew:
brew install ollama
(The widely cited curl -fsSL https://ollama.com/install.sh | sh one-liner is the Linux installer and does not apply to macOS.)
Verify the installation by opening a terminal and executing:
ollama --version
A version number (e.g., 0.5.7) confirms success.
Default directories are:
Models: ~/.ollama/models
Logs: ~/.ollama/logs
Config: ~/.ollama/config.json
You can change the model storage location by setting the OLLAMA_MODELS environment variable, e.g.:
# Add to ~/.zshrc
export OLLAMA_MODELS="/Volumes/ExternalSSD/ollama_models"
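Note that variables exported in ~/.zshrc only reach processes started from a shell. If you run Ollama as the macOS menu-bar app, the official FAQ recommends setting the variable with launchctl instead; a minimal sketch:
# Set the variable for GUI-launched apps, then restart Ollama.app
launchctl setenv OLLAMA_MODELS "/Volumes/ExternalSSD/ollama_models"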
2. Core Commands and Model Management
Ollama’s CLI mirrors Docker’s style and provides the following essential commands:
ollama run <model> – launch a model (e.g., ollama run llama3).
ollama pull <model> – pre‑download a model.
ollama list – list installed models.
ollama show <model> – display model details.
ollama rm <model> – delete a model.
ollama serve – start the background service (usually auto‑started).
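For example, a typical management workflow chains these commands together (using llama3 from the list below):
# Download without launching, inspect, then remove
ollama pull llama3
ollama list
ollama show llama3
ollama rm llama3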
Popular 2026 models include:
llama3:8b – balanced performance.
qwen:7b – strong Chinese language capabilities.
deepseek-coder:6.7b – best for code generation.
phi3:3.8b – lightweight, suitable for M1/M2 base Macs.
Example of running Llama 3:
# First run will download the model
ollama run llama3
# In the interactive prompt, type your question
> Hello, please introduce yourself.
3. Advanced Techniques and Deep Usage
3.1 Create a custom model with a Modelfile
Write a Modelfile that defines the base model, system prompt, and parameters:
FROM llama3
SYSTEM """
You are a professional technical consultant. Keep your answers concise and accurate, and use Markdown formatting.
"""
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
Build and run the custom model:
ollama create my-tech-assistant -f ./Modelfile
ollama run my-tech-assistant
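To confirm the build picked up your system prompt and parameters, recent Ollama versions can print the stored Modelfile back; a quick check, assuming your build supports the --modelfile flag:
# Print the Modelfile Ollama stored for the custom model
ollama show my-tech-assistant --modelfile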
3.2 Enable the OpenAI‑compatible API service
Ollama includes an API server listening at http://localhost:11434. It serves the native endpoints (/api/generate, /api/chat) as well as an OpenAI‑compatible API under the /v1 prefix; the full reference is in the official documentation at https://github.com/ollama/ollama/blob/main/docs/api.md.
Python example to generate a response:
import requests

# Non-streaming request: the full answer arrives as a single JSON object
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama3',
        'prompt': 'Explain the basic principles of quantum computing.',
        'stream': False
    }
)
print(response.json()['response'])
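The example above uses Ollama's native API. For comparison, here is a sketch of the equivalent call against the OpenAI‑compatible route, assuming a recent Ollama build that exposes /v1:
# Same question via the OpenAI-compatible chat endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Explain the basic principles of quantum computing."}]
  }'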
3.3 Integrate with Open WebUI
Open WebUI provides a ChatGPT‑like web front‑end. Deploy it with Docker:
# One‑click Docker deployment
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui --restart always \
ghcr.io/open-webui/open-webui:main
After the container starts, open http://localhost:3000 and set the Ollama address to http://host.docker.internal:11434.
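If the web UI cannot see your models, first confirm that Ollama itself is reachable from the host; the /api/tags endpoint lists installed models:
# Should return a JSON list of installed models
curl http://localhost:11434/api/tags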
4. Performance Tuning and Troubleshooting
4.1 Chip‑specific model recommendations
M1/M2 (base) – use phi3 or llama3:8b‑instruct‑q4_K_M with ≥8 GB RAM.
M1 Pro/Max, M2 Pro/Max – llama3:8b or qwen:7b with ≥16 GB RAM.
M3 Max/Ultra – large models like llama3:70b or qwen:72b with ≥32 GB RAM.
For devices with limited memory, prefer quantized variants (e.g., q4_K_M) for smaller size and faster inference.
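To pick up a quantized build explicitly, pull it by tag; the q4_K_M tag below is the Llama 3 variant mentioned above, but tag names vary per model in the Ollama library:
# Pull a 4-bit quantized Llama 3 8B instruct build
ollama pull llama3:8b-instruct-q4_K_M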
4.2 Common issues
"command not found: ollama" – the binary is not in PATH. Add /usr/local/bin or restart the terminal.
Slow model download – use a mirror closer to your region or route the download through a proxy.
Excessive fan noise – the model is too large for the hardware. Switch to a smaller or quantized model, or reduce num_ctx in the Modelfile.
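If editing the Modelfile feels heavy just to shrink the context window, recent Ollama builds also accept runtime parameter changes inside the interactive prompt; a sketch, assuming your version supports the /set command:
# Inside an `ollama run` session
/set parameter num_ctx 2048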
5. Conclusion
Ollama transforms how AI can be used on a personal computer by providing a simple installation path, a Docker‑style CLI for model management, support for custom Modelfiles, an OpenAI‑compatible API, and easy integration with web front‑ends like Open WebUI. Whether you are a developer, researcher, or casual user, you now have the tools to run powerful LLMs locally with full control over privacy and offline availability.