Deploy and Run Llama 3 Locally with Ollama in Minutes

This guide explains how to download a GGUF‑format Llama 3 model, create a Modelfile, use Ollama commands to build and run the model locally, test it, and interact via the built‑in REST API, including useful Docker and model‑management tips.

Open Source Tech Hub

Ollama Overview

Ollama is an open‑source tool for serving and running LLMs locally through a simple CLI; it can also be deployed inside a Docker container.

Step 01 – Download GGUF Model

Obtain the Chinese Llama 3 8B GGUF file from Hugging Face:

https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit/tree/main

The required file is Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf. Save it to a local directory, e.g. D:/AI/Download/.

Step 02 – Create Modelfile

Create a text file named Modelfile with a single FROM directive that points to the GGUF file path:

# FROM specifies the GGUF file path
FROM D:/AI/Download/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf
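
Beyond FROM, a Modelfile can also carry sampling parameters and a baked‑in system prompt. A minimal sketch (the parameter values and system prompt below are illustrative, not required by this model):

```
# FROM specifies the GGUF file path
FROM D:/AI/Download/Llama3-8B-Chinese-Chat-q8_0-v2_1.gguf

# Optional: sampling and context settings (values are illustrative)
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# Optional: a system prompt applied to every conversation
SYSTEM You are a helpful assistant. Answer in Chinese.
```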

Step 03 – Build the Ollama Model

Run the ollama create command, providing a model name and the Modelfile:

ollama create tinywan-Llama3-8B-Chinese -f ./Modelfile

Ollama reads the Modelfile, imports the GGUF weights, and registers a new model called tinywan-Llama3-8B-Chinese. Verify with:

ollama list
NAME                               ID              SIZE    MODIFIED
tinywan-Llama3-8B-Chinese:latest   adcb30feaee5    16 GB   About a minute ago
llama3:8b                          a6990ed6be41    4.7 GB  2 weeks ago

Step 04 – Run the Model

Start an interactive session or generate a single response:

ollama run tinywan-Llama3-8B-Chinese:latest

Example Prompt

>>> Write a poem
He blinked, smiled, and replied: "Sure, let me try." Then he closed his eyes and began to chant:
Beneath the starry sky,
Moonlight spills its silver glow.
…

REST API Access

Ollama also exposes an HTTP API on port 11434. To generate a completion, POST a JSON payload to /api/generate:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "中文回答。你是什么大模型?",
  "stream": false
}'
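
The same call can be made from Python with only the standard library. A minimal sketch, assuming an Ollama server is listening on localhost:11434 (`build_payload` and `generate` are helper names introduced here, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Encode the JSON body expected by /api/generate."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": stream}
    ).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """POST a prompt to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("tinywan-Llama3-8B-Chinese:latest", "中文回答。你是什么大模型?")` returns the model's answer as a plain string.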

Because "stream" is set to false here, the response arrives as a single JSON object containing fields such as model, response, total_duration, and load_duration. The full API reference is in the Ollama repository at https://github.com/ollama/ollama/blob/main/docs/api.md
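
When "stream" is left at its default of true, /api/generate instead returns one JSON object per line, each carrying a chunk of the reply in its "response" field; the final object has "done": true and holds the timing stats. A sketch of collecting such a stream (`collect_stream` is a helper introduced here; the sample lines are hand‑written, not real server output):

```python
import json

def collect_stream(ndjson_lines):
    """Concatenate the "response" chunks from a streamed /api/generate reply.

    Each line of the stream is a standalone JSON object; the final one
    has "done": true and carries stats such as total_duration.
    """
    text, final = [], {}
    for line in ndjson_lines:
        obj = json.loads(line)
        text.append(obj.get("response", ""))
        if obj.get("done"):
            final = obj
    return "".join(text), final

# Hand-written two-chunk stream for illustration:
sample = [
    '{"model": "llama3:8b", "response": "Hel", "done": false}',
    '{"model": "llama3:8b", "response": "lo", "done": true, "total_duration": 123}',
]
reply, final = collect_stream(sample)
```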

Meta‑Llama Reference

The original Llama 3 source code is hosted at https://github.com/meta-llama/llama3. The GGUF build is distributed on Hugging Face so you can use it directly, without converting the original weights yourself with llama.cpp.

Additional Model Management Commands

Delete a model:

ollama rm my-model

Copy a model:

ollama cp original-model new-model
Tags: Docker, LLM, REST API, Ollama, Llama3, GGUF