Build a Private ChatGPT on Your Laptop with Ollama, DeepSeek‑R1 and Go MCP
This guide walks through installing Ollama, pulling the open‑source DeepSeek‑R1:1.5B model, wrapping it in a Go‑based Model Context Protocol (MCP) server, calling that server from a client, adding Open‑WebUI as a chat front end, and tuning performance.
1. Install Ollama
Download and install Ollama from https://ollama.com/download.
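After installation, one way to confirm that the Ollama service is up is to query its version endpoint. The sketch below assumes Ollama is listening on its default address, http://localhost:11434, and uses the GET /api/version endpoint; the helper name parseVersion is made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// parseVersion extracts the "version" field from the JSON body
// returned by Ollama's GET /api/version endpoint.
func parseVersion(body []byte) (string, error) {
	var v struct {
		Version string `json:"version"`
	}
	if err := json.Unmarshal(body, &v); err != nil {
		return "", err
	}
	if v.Version == "" {
		return "", fmt.Errorf("no version field in response")
	}
	return v.Version, nil
}

func main() {
	// Assumption: Ollama runs locally on its default port.
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get("http://localhost:11434/api/version")
	if err != nil {
		fmt.Println("Ollama is not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("could not read response:", err)
		return
	}
	version, err := parseVersion(body)
	if err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	fmt.Println("Ollama version:", version)
}
```

If the program reports that Ollama is not reachable, start the Ollama application (or its background service) before continuing.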
2. Pull the DeepSeek‑R1:1.5B model
ollama pull deepseek-r1:1.5b
3. Run the model
ollama run deepseek-r1:1.5b
4. Wrap the model with an MCP‑Go server
The server is implemented in main.go and uses the github.com/mark3labs/mcp-go library.
package main

import (
	"bufio"
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

const ollamaURL = "http://localhost:11434"

// OllamaOptions carries sampling parameters. Ollama's /api/generate reads
// them from the nested "options" object, not from top-level fields.
type OllamaOptions struct {
	NumPredict  int     `json:"num_predict,omitempty"` // maximum number of tokens to generate
	Temperature float64 `json:"temperature,omitempty"`
}

type OllamaGenerateRequest struct {
	Model   string        `json:"model"`
	Prompt  string        `json:"prompt"`
	Options OllamaOptions `json:"options"`
}

// OllamaGenerateResponse is one newline-delimited JSON chunk of the stream.
type OllamaGenerateResponse struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// callOllamaStream sends the prompt to Ollama and concatenates the streamed chunks.
func callOllamaStream(prompt string) (string, error) {
	reqBody := OllamaGenerateRequest{
		Model:   "deepseek-r1:1.5b",
		Prompt:  prompt,
		Options: OllamaOptions{NumPredict: 256, Temperature: 0.7},
	}
	bodyBytes, err := json.Marshal(reqBody)
	if err != nil {
		return "", fmt.Errorf("error encoding request: %v", err)
	}

	// Timeout 0 disables the client-side timeout so long generations are not cut off.
	client := &http.Client{Timeout: 0}
	resp, err := client.Post(ollamaURL+"/api/generate", "application/json", bytes.NewReader(bodyBytes))
	if err != nil {
		return "", fmt.Errorf("error calling Ollama: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status from Ollama: %s", resp.Status)
	}

	reader := bufio.NewReader(resp.Body)
	var output string
	for {
		line, readErr := reader.ReadBytes('\n')
		// ReadBytes can return data together with io.EOF, so parse the line first.
		var genResp OllamaGenerateResponse
		if err := json.Unmarshal(line, &genResp); err == nil {
			output += genResp.Response
			if genResp.Done {
				break
			}
		}
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			return "", readErr
		}
	}
	return output, nil
}
// handlerDeepSeek forwards the "prompt" argument to DeepSeek-R1 via Ollama.
func handlerDeepSeek(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
	prompt, err := request.RequireString("prompt")
	if err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("missing prompt: %v", err)), nil
	}
	output, err := callOllamaStream(prompt)
	if err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("stream error: %v", err)), nil
	}
	return mcp.NewToolResultText(output), nil
}

// handlerLLaMA2 is a placeholder that echoes the prompt rather than calling a model.
func handlerLLaMA2(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
	prompt, err := request.RequireString("prompt")
	if err != nil {
		return mcp.NewToolResultError(fmt.Sprintf("missing prompt: %v", err)), nil
	}
	response := "[LLaMA2 response] " + prompt
	return mcp.NewToolResultText(response), nil
}
func main() {
	s := server.NewMCPServer("Multi-Model MCP Server", "1.0.0",
		server.WithToolCapabilities(false),
		server.WithRecovery(),
	)

	deepseekTool := mcp.NewTool("deepseek_completions",
		mcp.WithDescription("Generate text using DeepSeek-R1:1.5B via Ollama"),
		mcp.WithString("prompt", mcp.Required(), mcp.Description("Prompt for DeepSeek")),
	)
	s.AddTool(deepseekTool, handlerDeepSeek)

	llama2Tool := mcp.NewTool("llama2_chat",
		mcp.WithDescription("Generate conversational responses using LLaMA2"),
		mcp.WithString("prompt", mcp.Required(), mcp.Description("Prompt for LLaMA2")),
	)
	s.AddTool(llama2Tool, handlerLLaMA2)

	httpSrv := server.NewStreamableHTTPServer(s)
	addr := ":9000"
	log.Printf("Multi-Model MCP-Go server started at %s ...\n", addr)
	if err := httpSrv.Start(addr); err != nil {
		log.Fatalf("HTTP server failed: %v", err)
	}
}
Run the server
go run main.go
The server listens at http://localhost:9000/mcp.
5. Client JSON‑RPC call example
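package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// MCPRequest is a JSON-RPC 2.0 request envelope.
type MCPRequest struct {
	JSONRPC string      `json:"jsonrpc"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params"`
	ID      int         `json:"id"`
}

// MCPResponse holds the parts of a tools/call result that this example reads.
type MCPResponse struct {
	Result struct {
		Content []struct {
			Type string `json:"type"`
			Text string `json:"text"`
		} `json:"content"`
	} `json:"result"`
	ID int `json:"id"`
}

func main() {
	// MCP invokes a tool with the "tools/call" method and "name"/"arguments" params.
	req := MCPRequest{
		JSONRPC: "2.0",
		Method:  "tools/call",
		Params: map[string]interface{}{
			"name":      "llama2_chat",
			"arguments": map[string]string{"prompt": "Hello, please introduce yourself"},
		},
		ID: 1,
	}
	bodyBytes, err := json.Marshal(req)
	if err != nil { panic(err) }
	httpReq, err := http.NewRequest(http.MethodPost, "http://localhost:9000/mcp", bytes.NewReader(bodyBytes))
	if err != nil { panic(err) }
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Accept", "application/json, text/event-stream")
	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil { panic(err) }
	defer resp.Body.Close()
	respBytes, err := io.ReadAll(resp.Body)
	if err != nil { panic(err) }
	// A Streamable HTTP server may require an "initialize" request first and reject
	// calls that lack the Mcp-Session-Id header it issued; the status reveals that case.
	if resp.StatusCode != http.StatusOK {
		panic(fmt.Sprintf("server returned %s: %s", resp.Status, respBytes))
	}
	var result MCPResponse
	if err := json.Unmarshal(respBytes, &result); err != nil { panic(err) }
	for _, item := range result.Result.Content {
		fmt.Println("Model output:", item.Text)
	}
}
6. Enhance interaction with Open‑WebUI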
Install via Docker:
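docker run -it -p 7080:8080 ghcr.io/open-webui/open-webui:main
Configure the API endpoint to point to either Ollama (http://localhost:11434) or the MCP‑Go server (http://localhost:9000/mcp); whether an MCP endpoint can be registered depends on the Open‑WebUI version. Because Open‑WebUI runs inside a container here, localhost refers to the container itself, so on Docker Desktop use host.docker.internal in place of localhost to reach services running on the host.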
Access the UI at http://localhost:7080 for a visual chat experience.
7. Deployment optimization and performance tips
CPU / GPU allocation: Prefer a GPU. On CPU‑only machines, lower the maximum number of generated tokens or use a smaller model, and cap the thread count to avoid contention.
Concurrent request management: Use a queue or rate limiting to keep the model from being overloaded.
Cache strategy: Cache repeated prompts for a short time; cache frequent Q&A pairs or summaries for longer.
Streaming output optimization: Return tokens as they are generated to improve responsiveness; the WebUI can display incremental results.
Security and access control: Add authentication, IP restrictions, or password protection for the WebUI.
Multi‑model and tool management: Register DeepSeek, LLaMA2, and other models in MCP‑Go, allocate resources to each, and limit concurrency per model.
8. Summary
Local deployment of DeepSeek‑R1:1.5B with Ollama.
MCP‑Go server exposing a multi‑model JSON‑RPC interface.
Streaming output handling.
Client JSON‑RPC invocation.
Unified management of multiple models and tools.
WebUI for interactive chat.
Performance tuning and security considerations.
Following these steps turns your laptop into a private AI toolbox that can serve, combine, and interact with multiple large language models efficiently.
Code Wrench
