Build a Private ChatGPT on Your Laptop with Ollama, DeepSeek‑R1 and Go MCP

This guide walks through installing Ollama, pulling the open‑source DeepSeek‑R1:1.5B model, wrapping it in a Go‑based Model Context Protocol (MCP) server, calling that server from a small Go client, adding Open‑WebUI as a chat front end, and tuning the deployment for performance.

Code Wrench

1. Install Ollama

Download and install Ollama from https://ollama.com/download.

2. Pull the DeepSeek‑R1:1.5B model

ollama pull deepseek-r1:1.5b

3. Run the model

ollama run deepseek-r1:1.5b
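
Before wiring the model into a server, it can help to confirm from Go that Ollama is reachable and the model has actually been pulled. The following is a minimal sketch, assuming Ollama's default address and its GET /api/tags endpoint, which lists locally available models. The helper name modelListed is hypothetical and only illustrates the check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// tagsResponse mirrors only the part of the /api/tags reply used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// modelListed (hypothetical helper) reports whether the /api/tags body
// contains a model with exactly the given name, e.g. "deepseek-r1:1.5b".
func modelListed(body []byte, model string) (bool, error) {
	var tags tagsResponse
	if err := json.Unmarshal(body, &tags); err != nil {
		return false, fmt.Errorf("decode /api/tags: %w", err)
	}
	for _, m := range tags.Models {
		if m.Name == model {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Assumption: Ollama is listening on its default port.
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:11434/api/tags")
	if err != nil {
		fmt.Println("Ollama not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	ok, err := modelListed(body, "deepseek-r1:1.5b")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("deepseek-r1:1.5b pulled:", ok)
}
```

If the program reports false, running the pull command from step 2 again is the likely fix.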

4. Wrap the model with an MCP‑Go server

The server is implemented in main.go and uses the github.com/mark3labs/mcp-go library.

package main

import (
    "bufio"
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"

    "github.com/mark3labs/mcp-go/mcp"
    "github.com/mark3labs/mcp-go/server"
)

const ollamaURL = "http://localhost:11434"

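// OllamaOptions carries sampling parameters. Ollama reads these from the
// "options" object of the request, not from top-level fields.
type OllamaOptions struct {
    NumPredict  int     `json:"num_predict,omitempty"` // maximum number of tokens to generate
    Temperature float64 `json:"temperature,omitempty"`
}

type OllamaGenerateRequest struct {
    Model   string        `json:"model"`
    Prompt  string        `json:"prompt"`
    Options OllamaOptions `json:"options"`
}

// OllamaGenerateResponse is one line of the newline-delimited JSON stream.
type OllamaGenerateResponse struct {
    Response string `json:"response"`
    Done     bool   `json:"done"`
}

// callOllamaStream sends the prompt to Ollama and concatenates the streamed chunks.
func callOllamaStream(prompt string) (string, error) {
    reqBody := OllamaGenerateRequest{
        Model:   "deepseek-r1:1.5b",
        Prompt:  prompt,
        Options: OllamaOptions{NumPredict: 256, Temperature: 0.7},
    }
    bodyBytes, err := json.Marshal(reqBody)
    if err != nil { return "", fmt.Errorf("error encoding request: %v", err) }
    // No client timeout: generation on a laptop can take a long time.
    client := &http.Client{Timeout: 0}
    resp, err := client.Post(ollamaURL+"/api/generate", "application/json", bytes.NewReader(bodyBytes))
    if err != nil { return "", fmt.Errorf("error calling Ollama: %v", err) }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK { return "", fmt.Errorf("Ollama returned %s", resp.Status) }
    reader := bufio.NewReader(resp.Body)
    var output string
    for {
        line, err := reader.ReadBytes('\n')
        if err != nil && err != io.EOF { return "", err }
        // Each line is a standalone JSON object; blank or malformed lines are skipped.
        var genResp OllamaGenerateResponse
        if jsonErr := json.Unmarshal(line, &genResp); jsonErr == nil {
            output += genResp.Response
            if genResp.Done { break }
        }
        if err == io.EOF { break }
    }
    return output, nil
}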

func handlerDeepSeek(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
    prompt, err := request.RequireString("prompt")
    if err != nil { return mcp.NewToolResultError(fmt.Sprintf("missing prompt: %v", err)), nil }
    output, err := callOllamaStream(prompt)
    if err != nil { return mcp.NewToolResultError(fmt.Sprintf("stream error: %v", err)), nil }
    return mcp.NewToolResultText(output), nil
}

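// handlerLLaMA2 is a placeholder: it echoes the prompt instead of calling a real model.
func handlerLLaMA2(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
    prompt, err := request.RequireString("prompt")
    if err != nil { return mcp.NewToolResultError(fmt.Sprintf("missing prompt: %v", err)), nil }
    response := "[LLaMA2 response] " + prompt
    return mcp.NewToolResultText(response), nil
}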

func main() {
    s := server.NewMCPServer("Multi-Model MCP Server", "1.0.0",
        server.WithToolCapabilities(false),
        server.WithRecovery(),
    )
    deepseekTool := mcp.NewTool("deepseek_completions",
        mcp.WithDescription("Generate text using DeepSeek-R1:1.5B via Ollama"),
        mcp.WithString("prompt", mcp.Required(), mcp.Description("Prompt for DeepSeek")),
    )
    s.AddTool(deepseekTool, handlerDeepSeek)
    llama2Tool := mcp.NewTool("llama2_chat",
        mcp.WithDescription("Generate conversational responses using LLaMA2"),
        mcp.WithString("prompt", mcp.Required(), mcp.Description("Prompt for LLaMA2")),
    )
    s.AddTool(llama2Tool, handlerLLaMA2)
    httpSrv := server.NewStreamableHTTPServer(s)
    addr := ":9000"
    log.Printf("Multi-Model MCP-Go server started at %s ...\n", addr)
    if err := httpSrv.Start(addr); err != nil { log.Fatalf("HTTP server failed: %v", err) }
}

Run the server

go run main.go

The server listens at http://localhost:9000/mcp.

5. Client JSON‑RPC call example

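package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

// MCPRequest is a JSON-RPC 2.0 request envelope.
type MCPRequest struct {
    JSONRPC string      `json:"jsonrpc"`
    Method  string      `json:"method"`
    Params  interface{} `json:"params"`
    ID      int         `json:"id"`
}

// MCPResponse covers the fields of a tools/call reply that this example reads.
type MCPResponse struct {
    Result struct {
        Content []struct {
            Type string `json:"type"`
            Text string `json:"text"`
        } `json:"content"`
        IsError bool `json:"isError"`
    } `json:"result"`
    Error *struct {
        Code    int    `json:"code"`
        Message string `json:"message"`
    } `json:"error"`
    ID int `json:"id"`
}

func main() {
    // MCP invokes tools through the "tools/call" method and names the tool in "name".
    // A spec-compliant session starts with an "initialize" request; whether the server
    // accepts a bare tools/call depends on the mcp-go version and its options.
    req := MCPRequest{JSONRPC: "2.0", Method: "tools/call", Params: map[string]interface{}{"name": "llama2_chat", "arguments": map[string]string{"prompt": "Hello, please introduce yourself"}}, ID: 1}
    bodyBytes, _ := json.Marshal(req)
    httpReq, err := http.NewRequest(http.MethodPost, "http://localhost:9000/mcp", bytes.NewReader(bodyBytes))
    if err != nil { panic(err) }
    httpReq.Header.Set("Content-Type", "application/json")
    // Streamable HTTP clients advertise that they accept both plain JSON and SSE.
    httpReq.Header.Set("Accept", "application/json, text/event-stream")
    resp, err := http.DefaultClient.Do(httpReq)
    if err != nil { panic(err) }
    defer resp.Body.Close()
    respBytes, err := io.ReadAll(resp.Body)
    if err != nil { panic(err) }
    // This assumes a single JSON body; an SSE reply would need event parsing first.
    var result MCPResponse
    if err := json.Unmarshal(respBytes, &result); err != nil { panic(err) }
    if result.Error != nil { panic(result.Error.Message) }
    for _, c := range result.Result.Content {
        if c.Type == "text" { fmt.Println("Model output:", c.Text) }
    }
}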

6. Enhance interaction with Open‑WebUI

Install via Docker:

docker run -it -p 7080:8080 ghcr.io/open-webui/open-webui:main

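Configure the API endpoint to point to either Ollama (http://localhost:11434) or the MCP‑Go server (http://localhost:9000/mcp). Because Open‑WebUI runs inside a container here, localhost in its settings refers to the container itself; the host is usually reached as host.docker.internal instead (on Linux this requires adding --add-host=host.docker.internal:host-gateway to the docker run command).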

Access the UI at http://localhost:7080 for a visual chat experience.

7. Deployment optimization and performance tips

CPU / GPU allocation: Prefer a GPU. On CPU‑only machines, lower the output token limit (num_predict in Ollama's options) or use a smaller model, and cap the thread count to avoid contention.

Concurrent request management: Use a queue or rate limiting to keep simultaneous requests from overloading the model.

Cache strategy: Cache repeated prompts for a short time; keep frequently requested Q&A pairs or summaries in a longer‑lived cache.

Streaming output optimization: Return tokens as they are generated to improve responsiveness; the WebUI can display incremental results.

Security and access control: Add authentication, IP restrictions, or password protection for the WebUI.

Multi‑model and tool management: Register DeepSeek, LLaMA2, and other models in MCP‑Go, allocate resources to each, and limit concurrency per model.

8. Summary

Local deployment of DeepSeek‑R1:1.5B with Ollama.

MCP‑Go server exposing a multi‑model JSON‑RPC interface.

Streaming output handling.

Client JSON‑RPC invocation.

Unified management of multiple models and tools.

WebUI for interactive chat.

Performance tuning and security considerations.

Following these steps turns your laptop into a private AI toolbox that can serve, combine, and interact with multiple large language models efficiently.
