
Quick Setup of a Search Engine with Searx Docker Image

This article shows how to quickly stand up a search engine with the open-source Searx Docker image. It gives the Docker commands needed, walks through the core Python code that aggregates query results, and suggests customizing the responses to serve your own data sources.

Python Programming Learning Circle

Jul 24, 2021


A group member asked how to build a search engine quickly. The answer: the open-source Searx project, a metasearch engine that is available as a ready-to-use Docker image.

Code Location

Git repository: https://github.com/asciimoo/searx

Using the community-maintained wonderfall/searx image, a few shell commands remove any previous searx container and start a fresh one:

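# Find any existing container whose name contains "searx"
cid=$(sudo docker ps -aq --filter name=searx)
echo "searx cid is $cid"
if [ -n "$cid" ]; then
    # $cid is left unquoted on purpose so several IDs are passed as separate arguments
    sudo docker stop $cid
    sudo docker rm $cid
fi
# Start a fresh container: host port 7777 -> container port 8888
sudo docker run -d --name searx -e IMAGE_PROXY=True -e BASE_URL=http://yourdomain.com -p 7777:8888 wonderfall/searx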

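Once the container is running, open http://<host>:7777 in a browser: the -p 7777:8888 flag maps host port 7777 to port 8888 inside the container. Replace http://yourdomain.com in BASE_URL with the address the instance will actually be reached at.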
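As a quick check that the container answers, a search URL for it can be built in Python. This is a sketch under assumptions: the instance runs on the same machine (localhost) on host port 7777 as in the command above, and it accepts the q, pageno, and format parameters on /search. Whether format=json is actually allowed depends on how the image configures Searx.

```python
from urllib.parse import urlencode

# Assumption: the searx container is reachable on this machine via host port 7777
BASE = "http://localhost:7777"


def search_url(terms, fmt=None, pageno=1):
    # q carries the query text; pageno selects the result page (1-based)
    params = {"q": terms, "pageno": pageno}
    if fmt:
        # e.g. "json"; only works if the instance permits that output format
        params["format"] = fmt
    # urlencode escapes the values (spaces become "+")
    return BASE + "/search?" + urlencode(params)


print(search_url("docker searx"))
print(search_url("searx", fmt="json", pageno=2))
```

The resulting URL can be opened in a browser, or fetched with urllib.request.urlopen and decoded with json.loads when JSON output is enabled.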

Thoughts

The setup is very convenient; let's look at the source code to see how it works.

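The snippet below appears to be taken from Searx's generic JSON engine (searx/engines/json_engine.py in the repository). request() builds the upstream URL from the user's query, and response() pulls url, title, and content fields out of the JSON reply using slash-separated path queries such as data/items. Any source that can answer in JSON, whether it is backed by a database, files, or something else, can be plugged in this way.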

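# Python 3 imports: urlencode lives in urllib.parse, Iterable in collections.abc
from urllib.parse import urlencode
from json import loads
from collections.abc import Iterable

# Engine settings, set per engine from Searx's configuration
search_url = None
url_query = None
content_query = None
title_query = None
paging = False  # read by request() below
suggestion_query = ''
results_query = ''

# parameters for engines with paging support
# number of results on each page
# (matters when search_url expects an offset rather than a page number)
page_size = 1
# number of the first page (usually 0 or 1)
first_page_num = 1

def iterate(iterable):
    # Yield (key, value) for dicts and (index, value) for lists,
    # with the key stringified so it can be compared with a path segment
    if isinstance(iterable, dict):
        it = iterable.items()
    else:
        it = enumerate(iterable)
    for index, value in it:
        yield str(index), value

def is_iterable(obj):
    # Strings are iterable in Python but are leaf values in a JSON document
    if isinstance(obj, (str, bytes)):
        return False
    return isinstance(obj, Iterable)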

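def parse(query):
    # Split a path such as "data/items/title" into ["data", "items", "title"],
    # ignoring empty segments (leading, trailing, or doubled slashes)
    return [part for part in query.split('/') if part]

def do_query(data, q):
    # Collect every value reached by following path q through nested data.
    # A segment that does not match at the current level is searched for deeper down.
    ret = []
    if not q:
        return ret

    qkey = q[0]

    for key, value in iterate(data):
        if len(q) == 1:
            # Last segment: keep matching values, otherwise keep descending
            if key == qkey:
                ret.append(value)
            elif is_iterable(value):
                ret.extend(do_query(value, q))
        else:
            # Intermediate segment: only containers can hold the rest of the path
            if not is_iterable(value):
                continue
            if key == qkey:
                ret.extend(do_query(value, q[1:]))
            else:
                ret.extend(do_query(value, q))
    return ret

def query(data, query_string):
    # Entry point: returns a list of all matches (possibly empty)
    q = parse(query_string)
    return do_query(data, q)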

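def request(query, params):
    # urlencode returns "q=<escaped query>"; [2:] strips the "q=" prefix
    query = urlencode({'q': query})[2:]
    fp = {'query': query}
    # Searx's pageno starts at 1; convert it to the page number or offset the site expects
    if paging and '{pageno}' in search_url:
        fp['pageno'] = (params['pageno'] - 1) * page_size + first_page_num
    params['url'] = search_url.format(**fp)
    params['query'] = query
    return params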

def response(resp):
    results = []
    json = loads(resp.text)
    if results_query:
        # results_query points at the list of result objects ([0] takes the first match);
        # each field is then looked up inside one result object
        for result in query(json, results_query)[0]:
            url = query(result, url_query)[0]
            title = query(result, title_query)[0]
            content = query(result, content_query)[0]
            results.append({'url': url, 'title': title, 'content': content})
    else:
        # No results_query: gather each field across the whole document and pair them by position
        for url, title, content in zip(
            query(json, url_query),
            query(json, title_query),
            query(json, content_query)
        ):
            results.append({'url': url, 'title': title, 'content': content})

    # Optional suggestions are appended as separate entries
    if not suggestion_query:
        return results
    for suggestion in query(json, suggestion_query):
        results.append({'suggestion': suggestion})
    return results
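To see what path queries such as url_query and results_query resolve to, the helpers can be run against a sample document. The sketch below is a self-contained Python 3 restatement of iterate, is_iterable, and query from the listing, followed by both branches of response(). The payload and its field names (link, name, summary, data/items) are invented for illustration and do not come from any real API.

```python
from collections.abc import Iterable


def iterate(iterable):
    # (key, value) pairs with stringified keys, for dicts and lists alike
    if isinstance(iterable, dict):
        return ((str(k), v) for k, v in iterable.items())
    return ((str(i), v) for i, v in enumerate(iterable))


def is_iterable(obj):
    # Strings count as leaf values, not containers
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def do_query(data, q):
    ret = []
    if not q:
        return ret
    qkey = q[0]
    for key, value in iterate(data):
        if len(q) == 1:
            if key == qkey:
                ret.append(value)
            elif is_iterable(value):
                ret.extend(do_query(value, q))
        else:
            if not is_iterable(value):
                continue
            if key == qkey:
                ret.extend(do_query(value, q[1:]))
            else:
                ret.extend(do_query(value, q))
    return ret


def query(data, query_string):
    return do_query(data, [p for p in query_string.split('/') if p])


# Hypothetical API payload, invented for this example
payload = {
    "meta": {"total": 2},
    "data": {
        "items": [
            {"link": "https://example.com/a", "name": "A", "summary": "first"},
            {"link": "https://example.com/b", "name": "B", "summary": "second"},
        ]
    },
}

# Branch 1 (results_query = "data/items"): take the list, then query each entry
results = []
for item in query(payload, "data/items")[0]:
    results.append({
        "url": query(item, "link")[0],
        "title": query(item, "name")[0],
        "content": query(item, "summary")[0],
    })

# Branch 2 (no results_query): match each field anywhere, then zip by position
zipped = [
    {"url": u, "title": t, "content": c}
    for u, t, c in zip(query(payload, "link"), query(payload, "name"), query(payload, "summary"))
]

print(query(payload, "data/items/1/name"))
print(results == zipped)
```

Both branches yield the same list here. If one result lacked a summary field, however, the zip branch would pair fields from different results and drop the last entry, which is one reason to prefer results_query when the API groups fields per result.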

Result

By customizing the response handling, you can feed your own data sources (databases, files, and so on) into Searx and end up with a personal mini search engine. Pairing it with jieba, a Chinese word-segmentation library, to tokenize Chinese text makes it even more interesting.
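One way to feed your own data in is to put a small service in front of it that answers in JSON and point a JSON engine at that service. The sketch below covers only the data side, under stated assumptions: a hypothetical in-memory corpus, naive case-insensitive substring matching (not jieba tokenization), and a JSON body shaped so that results_query="results", url_query="url", title_query="title", and content_query="content" would pick it up. offset_for repeats the paging arithmetic from request(). Serving this over HTTP and registering the engine in Searx are not shown.

```python
import json

# Hypothetical corpus standing in for a database table or a directory of files
DOCS = [
    {"url": "file:///notes/docker.txt", "title": "Docker notes", "content": "docker run maps ports with -p"},
    {"url": "file:///notes/searx.txt", "title": "Searx notes", "content": "searx aggregates results from engines"},
    {"url": "file:///notes/python.txt", "title": "Python notes", "content": "json.loads parses a JSON string"},
]


def search_backend(query, offset=0, limit=10):
    """Naive case-insensitive substring search returning a JSON body.

    The {"results": [{"url", "title", "content"}]} shape is an assumption
    chosen to match the path queries listed in the lead-in.
    """
    needle = query.lower()
    hits = [d for d in DOCS
            if needle in d["title"].lower() or needle in d["content"].lower()]
    return json.dumps({"results": hits[offset:offset + limit]})


def offset_for(pageno, page_size, first_page_num):
    # Same formula as request(): map Searx's 1-based pageno to the backend's value
    return (pageno - 1) * page_size + first_page_num


print(search_backend("json"))
print(offset_for(2, 10, 0))
```

For example, with page_size = 10 and first_page_num = 0, Searx's second page maps to offset 10, which would be substituted for {pageno} in a hypothetical search_url such as http://localhost:5000/search?q={query}&offset={pageno}.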

Original link: https://brucedone.com/archives/838
