Deploy and Run Llama 3 Locally with Ollama in Minutes
This guide explains how to download a GGUF‑format Llama 3 model, create a Modelfile, use Ollama commands to build and run the model locally, test it, and interact via the built‑in REST API, including useful Docker and model‑management tips.
Ollama Overview
Ollama is an open‑source tool for serving LLMs locally. It provides a simple CLI and a local HTTP API, and can also be run inside a Docker container.
Step 01 – Download GGUF Model
Obtain the Chinese Llama 3 8B GGUF file from Hugging Face:
https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit/tree/main
The required file is Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf. Save it to a local directory, e.g. D:/AI/Download/.
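If you prefer to fetch the file from the command line, Hugging Face serves raw repository files under the repo's resolve/main/ path. The sketch below only constructs and prints the direct-download URL; the actual curl download is left commented out so you can run it deliberately:

```shell
# Sketch: build the direct-download URL for the GGUF file.
# Hugging Face exposes raw files at <repo>/resolve/main/<file>.
REPO_URL="https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit"
GGUF_FILE="Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf"
DL_URL="$REPO_URL/resolve/main/$GGUF_FILE"
echo "$DL_URL"
# To download (large file, ~8.5 GB):
# curl -L -o "$GGUF_FILE" "$DL_URL"
```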
Step 02 – Create Modelfile
Create a text file named Modelfile with a single FROM directive that points to the GGUF file path:
# FROM specifies the GGUF file path
FROM D:/AI/Download/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf
Step 03 – Build the Ollama Model
Run the ollama create command, providing a model name and the Modelfile:
ollama create tinywan-Llama3-8B-Chinese -f ./Modelfile
Ollama reads the Modelfile, imports the GGUF weights, and registers a new model called tinywan-Llama3-8B-Chinese. Verify with:
ollama list
NAME                               ID            SIZE    MODIFIED
tinywan-Llama3-8B-Chinese:latest   adcb30feaee5  16 GB   About a minute ago
llama3:8b                          a6990ed6be41  4.7 GB  2 weeks ago
Step 04 – Run the Model
Start an interactive session or generate a single response:
ollama run tinywan-Llama3-8B-Chinese:latest
Example Prompt
>> Write a poem
…" I said.
He blinked, smiled, and replied, "Alright, let me try." Then he closed his eyes and began to recite:
Beneath the starry sky,
moonlight spills its silver glow.
…
REST API Access
Ollama also exposes an HTTP API on port 11434. To generate a completion, POST a JSON payload to /api/generate:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama3:8b",
"prompt": "中文回答。你是什么大模型?",
"stream": false
}'
With "stream": false, the response is a single JSON object containing fields such as model, response, total_duration, load_duration, etc. The full API reference is in the Ollama repository at https://github.com/ollama/ollama/blob/main/docs/api.md
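To use the reply in a script, pull the generated text out of the "response" field. A minimal sketch, using a shortened, hypothetical reply rather than real server output:

```shell
# Hypothetical, abbreviated /api/generate reply used for illustration only.
REPLY='{"model":"llama3:8b","response":"Hello! I am a large language model.","done":true}'
# Extract the generated text from the "response" field with Python's json module.
TEXT=$(printf '%s' "$REPLY" | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])')
echo "$TEXT"
```

Against a live server, the same extraction works by piping the curl output straight into the python3 command.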
Meta‑Llama Reference
The original Llama 3 source code is hosted at https://github.com/meta-llama/llama3. A ready-made GGUF version is distributed on Hugging Face, which avoids having to convert the weights yourself with llama.cpp.
Additional Model Management Commands
Delete a model:
ollama rm my-model
Copy a model:
ollama cp original-model new-model
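When several imported models need cleaning up, the delete command can be scripted. The loop below is a sketch with hypothetical model names; the DRY_RUN guard (an assumption of this sketch, not an Ollama feature) only prints what would be deleted until you flip it to 0:

```shell
# Hypothetical model names; replace with entries shown by `ollama list`.
MODELS="old-experiment:latest scratch-model:latest"
DRY_RUN=1   # set to 0 to actually delete
for MODEL in $MODELS; do
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of executing it.
    echo "would run: ollama rm $MODEL"
  else
    ollama rm "$MODEL"
  fi
done
```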
