Building Enterprise‑Grade Semantic Search with Ollama—No External APIs Required

This article walks through the complete design and implementation of a locally deployed, enterprise‑level semantic search system using Ollama for embedding generation and Easysearch for vector retrieval, covering problem analysis, architecture decisions, pipeline configuration, bulk indexing, and hybrid query execution.

Mingyi World Elasticsearch

Aug 4, 2025

1. Problem Origin

Traditional keyword search fails to capture user intent. A query for “high‑performance cheap headphones” misses a product whose description reads “超高性价比无线蓝牙耳机” (roughly, “extremely cost‑effective wireless Bluetooth earphones”), because the query and the description share no terms. The same problem appears in technical document search, where relevant articles stay hidden because their titles do not contain the exact query terms.

2. Problem Analysis

Keyword‑based search relies on inverted indexes and TF‑IDF, which only perform string matching and lack true semantic understanding. Semantic search solves this by converting text into high‑dimensional vectors and measuring cosine similarity, allowing semantically related texts to be retrieved even with different wording.
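As a minimal sketch of the similarity measure described above, the following computes cosine similarity between small vectors. The three-dimensional vectors are made-up numbers for illustration only; real embeddings come from the model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not model output).
query = [0.9, 0.1, 0.0]
doc_similar = [1.8, 0.2, 0.0]    # same direction as the query, twice the length
doc_unrelated = [0.0, 0.0, 1.0]  # orthogonal to the query

print(cosine_similarity(query, doc_similar))
print(cosine_similarity(query, doc_unrelated))
```

Because the measure depends only on direction, the longer vector scores the same as a vector of the query's own length would.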

3. Solution Exploration

The proposed solution consists of two components:

Ollama – a local vectorisation service that transforms text into embeddings.

Easysearch – a search engine that stores vectors and provides k‑NN retrieval.

3.1 Why Ollama?

Fully local deployment keeps data inside the enterprise network.

Supports multiple open‑source embedding models (e.g., nomic‑embed‑text, mxbai‑embed‑large).

Resource consumption is moderate; a single machine can handle medium‑scale workloads.

Simple API reduces integration effort.

3.2 Why Easysearch?

API compatibility with Elasticsearch makes migration easy.

Native vector search with mature k‑NN implementation.

Built‑in pipeline mechanism enables seamless integration of external AI services.

Supports hybrid retrieval, combining keyword and semantic search.

3.3 Architecture – Dual‑Pipeline Design

Ingest Pipeline: Calls the Ollama API during document ingestion to generate and store vectors.

Search Pipeline: Converts user queries into vectors at search time for similarity matching.

This design is transparent to business code; existing indexing and query logic require little change.

3.4 Key Technical Decisions

Vector dimension (768, the output size of nomic‑embed‑text) chosen to balance storage cost and retrieval accuracy.

Cosine similarity selected for its length‑insensitivity, suitable for text.

LSH (locality‑sensitive hashing) used for indexing: an approximate nearest‑neighbor method that gives up a small amount of recall in exchange for fast queries.
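To make the LSH decision more concrete, here is a sketch of random-hyperplane hashing, a common LSH family for cosine similarity. It illustrates the general technique only; it is an assumption that Easysearch's internal implementation resembles it, and the function names are hypothetical.

```python
import random

def make_hyperplanes(dims, num_bits, seed=42):
    """Draw num_bits random hyperplanes (as normal vectors) in dims dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """Hash a vector to a bit tuple: bit i is 1 when the vector lies on the
    non-negative side of hyperplane i."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def hamming(sig_a, sig_b):
    """Number of differing bits; fewer differences suggest a smaller angle."""
    return sum(a != b for a, b in zip(sig_a, sig_b))

planes = make_hyperplanes(dims=8, num_bits=16)
base = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -0.125]
scaled = [2.0 * x for x in base]  # same direction, different length

# Scaling does not change which side of a hyperplane a vector is on,
# so both vectors receive the same signature.
print(hamming(signature(base, planes), signature(scaled, planes)))
```

Vectors with matching or nearly matching signatures become candidates, and only those candidates are compared exactly, which is where the speed-up over a full scan comes from.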

4. Practical Implementation

4.1 Environment Setup

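# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# ollama serve runs in the foreground; use a separate terminal for later commands,
# and skip it if the installer already started Ollama as a background service
ollama serve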

# Pull embedding model
ollama pull nomic-embed-text:latest

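Test the model with a simple HTTP request. This call uses Ollama's older /api/embeddings endpoint, which takes a prompt field; the pipelines below call the newer /api/embed endpoint instead: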

curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
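The same check can be scripted. The sketch below uses only the Python standard library and targets the same /api/embeddings endpoint as the curl command; it assumes the response is a JSON object with an embedding array, and the helper names are hypothetical.

```python
import json
import urllib.request

# Same endpoint and default port as the curl test above.
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_request(text, model="nomic-embed-text"):
    """Build the POST request that asks Ollama to embed one text."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_embedding(response_body):
    """Extract the vector from a response body, failing loudly if it is absent."""
    data = json.loads(response_body)
    embedding = data.get("embedding")
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("response did not contain an embedding")
    return embedding

def embed(text, model="nomic-embed-text", timeout=30):
    """Send the request and return the embedding (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(text, model), timeout=timeout) as resp:
        return parse_embedding(resp.read())

# Example (requires Ollama to be running locally):
# vector = embed("test")
# print(len(vector))
```

Printing the vector length is a quick way to confirm the dimension before creating the index mapping.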

4.2 Configure Easysearch Ingest Pipeline

PUT _ingest/pipeline/ollama-embedding-pipeline
{
  "description": "Ollama embedding example",
  "processors": [
    {
      "text_embedding": {
        "url": "http://localhost:11434/api/embed",
        "vendor": "ollama",
        "text_field": "content",
        "vector_field": "content_vector",
        "model_id": "nomic-embed-text:latest"
      }
    }
  ]
}

Important parameters: text_field is the source field to embed (content here), and vector_field is the destination field for the embedding. A batch_size of 5 is intended to improve throughput while respecting Ollama’s capacity; note that this parameter does not appear in the pipeline body shown above.
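To illustrate what a batch size of 5 means in practice, the sketch below splits a list of texts into groups of at most five. It is a client-side illustration of the idea under the assumption that batching simply groups consecutive documents; it does not show how the Easysearch processor batches internally, and the helper name is hypothetical.

```python
def batched(items, batch_size=5):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"document {i}" for i in range(12)]
for batch in batched(texts, batch_size=5):
    # Each batch would be sent to the embedding service in one request.
    print(len(batch), batch[0])
```

A larger batch means fewer round trips but more work per request, so the value is worth tuning against the capacity of the machine running Ollama.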

4.3 Create Vector Index

PUT /knowledge-base
{
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "ik_max_word"},
      "content": {"type": "text", "analyzer": "ik_max_word"},
      "category": {"type": "keyword"},
      "content_vector": {
        "type": "knn_dense_float_vector",
        "knn": {
          "dims": 768,
          "model": "lsh",
          "similarity": "cosine",
          "L": 99,
          "k": 1
        }
      }
    }
  }
}

Key points: type set to knn_dense_float_vector, dims matches the 768‑dimensional output of nomic-embed-text, and LSH provides a good precision‑performance trade‑off.
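Since dims must match the model's output size, a small guard can catch a mismatch before indexing, for example after switching embedding models. This is a sketch: the mapping dictionary is a trimmed copy of the one above, and check_dims is a hypothetical helper, not part of any Easysearch client.

```python
# Trimmed copy of the index mapping from section 4.3.
MAPPING = {
    "mappings": {
        "properties": {
            "content_vector": {
                "type": "knn_dense_float_vector",
                "knn": {"dims": 768, "model": "lsh", "similarity": "cosine", "L": 99, "k": 1},
            }
        }
    }
}

def check_dims(mapping, field, embedding):
    """Raise if the embedding length differs from the mapped vector dimension."""
    expected = mapping["mappings"]["properties"][field]["knn"]["dims"]
    if len(embedding) != expected:
        raise ValueError(
            f"{field}: mapping expects {expected} dimensions, got {len(embedding)}"
        )
    return expected

# A placeholder vector of the right length passes the check.
check_dims(MAPPING, "content_vector", [0.0] * 768)
```

Running the check against one real embedding from the chosen model is enough, because a given model always produces vectors of the same length.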

4.4 Bulk Import Documents

POST /_bulk?pipeline=ollama-embedding-pipeline&refresh=wait_for
{ "index": {"_index": "knowledge-base", "_id": "1"} }
{ "title": "一本书讲透 Elasticsearch", "content": "本文详细介绍了Elasticsearch集群的性能调优方法...", "category": "技术文档" }
{ "index": {"_index": "knowledge-base", "_id": "2"} }
{ "title": "数据库查询优化技巧", "content": "深入分析SQL查询性能瓶颈...", "category": "技术文档" }
{ "index": {"_index": "knowledge-base", "_id": "3"} }
{ "title": "微服务架构设计原则", "content": "探讨微服务架构的核心设计理念...", "category": "架构设计" }
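When documents come from application code rather than a console, the bulk body has to be assembled as newline-delimited JSON. The sketch below builds that body for the request above; build_bulk_body is a hypothetical helper, and sending the result to /_bulk with a Content-Type of application/x-ndjson is left to whichever HTTP client is in use.

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON bulk body: one action line, then one source line per document.

    docs is a list of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_id": doc_id}}
        # ensure_ascii=False keeps Chinese text readable instead of \u escapes.
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    ("1", {"title": "一本书讲透 Elasticsearch", "category": "技术文档"}),
    ("2", {"title": "数据库查询优化技巧", "category": "技术文档"}),
]
body = build_bulk_body("knowledge-base", docs)
print(body)
```

The documents do not include content_vector; with the pipeline parameter set on the request, the ingest pipeline fills that field in.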

4.5 Configure Search Pipeline

PUT /_search/pipeline/ollama-search-pipeline
{
  "request_processors": [
    {
      "semantic_query_enricher": {
        "tag": "ollama_enricher",
        "description": "Vectorize queries with Ollama",
        "url": "http://localhost:11434/api/embed",
        "vendor": "ollama",
        "default_model_id": "nomic-embed-text:latest",
        "vector_field_model_id": {"content_vector": "nomic-embed-text:latest"}
      }
    }
  ]
}

Set it as the default search pipeline:

PUT /knowledge-base/_settings
{
  "index.search.default_pipeline": "ollama-search-pipeline"
}

4.6 Execute Semantic Search

GET /knowledge-base/_search
{
  "_source": ["title", "content", "category"],
  "query": {
    "semantic": {
      "content_vector": {
        "query_text": "如何提升Elasticsearch 查询速度",
        "candidates": 20,
        "query_strategy": "LSH_COSINE"
      }
    }
  },
  "size": 5
}

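The query text “如何提升Elasticsearch 查询速度” means “how to speed up Elasticsearch queries”. It returns relevant documents such as “一本书讲透 Elasticsearch” (whose content covers Elasticsearch cluster performance tuning) and “数据库查询优化技巧” (“Database Query Optimization Tips”), showing that matching works on meaning rather than exact wording.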

4.7 Hybrid Retrieval Optimization

A combined query places a traditional multi_match clause and a semantic clause under bool.should, giving keyword matches a higher boost (1.5 versus 1.0) while still using semantic similarity. The query text “数据库查询优化” means “database query optimization”:

POST /knowledge-base/_search
{
  "_source": ["title", "content", "category"],
  "query": {
    "bool": {
      "should": [
        {"multi_match": {"query": "数据库查询优化", "fields": ["title^3", "content"], "type": "best_fields", "boost": 1.5}},
        {"semantic": {"content_vector": {"query_text": "数据库查询优化", "candidates": 50, "query_strategy": "LSH_COSINE", "boost": 1.0}}}
      ],
      "minimum_should_match": 1
    }
  },
  "size": 10
}
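In application code, the hybrid request is easier to maintain when it is built from parameters instead of a hard-coded string. The sketch below reproduces the query body above as a Python dictionary; build_hybrid_query is a hypothetical helper, and the field names and defaults are taken from the example request.

```python
import json

def build_hybrid_query(text, keyword_boost=1.5, semantic_boost=1.0,
                       candidates=50, size=10):
    """Return a search body combining a keyword clause and a semantic clause."""
    keyword_clause = {
        "multi_match": {
            "query": text,
            "fields": ["title^3", "content"],
            "type": "best_fields",
            "boost": keyword_boost,
        }
    }
    semantic_clause = {
        "semantic": {
            "content_vector": {
                "query_text": text,
                "candidates": candidates,
                "query_strategy": "LSH_COSINE",
                "boost": semantic_boost,
            }
        }
    }
    return {
        "_source": ["title", "content", "category"],
        "query": {
            "bool": {
                "should": [keyword_clause, semantic_clause],
                "minimum_should_match": 1,
            }
        },
        "size": size,
    }

body = build_hybrid_query("数据库查询优化")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Exposing the two boosts as parameters makes it straightforward to experiment with the balance between keyword and semantic relevance.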

5. Conclusion

Local deployment brings data security, controllability, and customization. By selecting suitable embedding models and tuning algorithm parameters, the solution integrates seamlessly with existing systems and upgrades search from simple string matching to true semantic understanding, benefiting any text‑heavy business scenario.
