Quick Setup of a Search Engine with Searx Docker Image
This article shows how to quickly set up a search engine using the open-source Searx Docker image. It gives the Docker commands needed, walks through the core Python code that turns a JSON API's response into search results, and suggests plugging in your own data sources.
A member of our group asked how to build a search engine quickly. The simplest answer is Searx, an open-source metasearch engine that can be run from a ready-made Docker image.
Code Location
Git repository: https://github.com/asciimoo/searx
A prebuilt Docker image (the commands below use the community image wonderfall/searx) can be pulled and run with a few commands:
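```shell
# Stop and remove any existing container whose name contains "searx"
cid=$(sudo docker ps -aq --filter name=searx)
echo "searx cid is $cid"
if [ -n "$cid" ]; then
    sudo docker stop $cid   # left unquoted on purpose: may expand to several IDs
    sudo docker rm $cid
fi
# Start Searx in the background; host port 7777 is mapped to container port 8888.
# Replace http://yourdomain.com with the address the instance will be served from.
sudo docker run -d --name searx -e IMAGE_PROXY=True -e BASE_URL=http://yourdomain.com -p 7777:8888 wonderfall/searx
```
Once the container is running, the search engine is available on host port 7777 (for example, http://localhost:7777 on the Docker host).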
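To query the instance from a script instead of a browser, you can ask Searx for JSON output. The sketch below assumes the container above is reachable at http://localhost:7777 and that the instance allows `format=json` (some deployments disable non-HTML output formats). The helper names are hypothetical, and the fields read from the response (`results`, `url`, `title`, `content`) are an assumption about the JSON layout rather than a documented contract.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the container started above is reachable on host port 7777.
SEARX_ADDRESS = "http://localhost:7777"


def build_search_url(base, query, pageno=1):
    """Build a /search URL that asks Searx for JSON instead of HTML."""
    params = urlencode({"q": query, "format": "json", "pageno": pageno})
    return f"{base.rstrip('/')}/search?{params}"


def extract_results(payload):
    """Keep url/title/content from each entry of the JSON 'results' list."""
    data = json.loads(payload)
    return [
        {"url": r.get("url"), "title": r.get("title"), "content": r.get("content", "")}
        for r in data.get("results", [])
    ]


def search(query, base=SEARX_ADDRESS, timeout=10):
    """Send the query to a running instance (needs network access to it)."""
    with urlopen(build_search_url(base, query), timeout=timeout) as resp:
        return extract_results(resp.read().decode("utf-8"))
```

With the container up, `search("docker")` returns a list of dicts with `url`, `title` and `content` keys. Splitting URL building and parsing from the network call keeps both pieces easy to check without a live instance.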
Thoughts
The setup is very convenient; let's look at the source code to see how it works.
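The listing below is Searx's generic JSON engine (searx/engines/json_engine.py). request() fills a configured URL template with the encoded query and, for engines with paging, a page number; response() then walks the returned JSON with slash-separated paths (url_query, title_query, content_query) to extract each result's URL, title and content. Because those paths come from configuration, the same engine can serve results from any backend, such as a database or a set of files, as long as that backend is exposed through a JSON API.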
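```python
# Searx's generic JSON engine, ported to Python 3 (the original targeted
# Python 2: urllib.urlencode, dict.iteritems and the unicode type).
from collections.abc import Iterable
from json import loads
from urllib.parse import urlencode

# Engine settings; Searx fills these in from the engine's configuration.
search_url = None      # URL template containing {query} and optionally {pageno}
url_query = None       # path to each result's URL
content_query = None   # path to each result's snippet
title_query = None     # path to each result's title
paging = False         # missing from the excerpt, but read in request()
suggestion_query = ''
results_query = ''     # optional path to the list of result objects

# parameters for engines with paging support
# number of results on each page
page_size = 1
# number of the first page (usually 0 or 1)
first_page_num = 1


def iterate(iterable):
    """Yield (key, value) for dicts and (index, value) for lists, keys as str."""
    if isinstance(iterable, dict):
        it = iterable.items()
    else:
        it = enumerate(iterable)
    for index, value in it:
        yield str(index), value


def is_iterable(obj):
    """Containers count as iterable; strings and bytes do not."""
    if isinstance(obj, (str, bytes)):
        return False
    return isinstance(obj, Iterable)


def parse(query):
    """Split a path such as 'data/items/title' into ['data', 'items', 'title']."""
    return [part for part in query.split('/') if part != '']


def do_query(data, q):
    """Collect every value reachable via path q; keys may match at any depth."""
    ret = []
    if not q:
        return ret
    qkey = q[0]
    for key, value in iterate(data):
        if len(q) == 1:
            if key == qkey:
                ret.append(value)
            elif is_iterable(value):
                ret.extend(do_query(value, q))
        else:
            if not is_iterable(value):
                continue
            if key == qkey:
                ret.extend(do_query(value, q[1:]))
            else:
                ret.extend(do_query(value, q))
    return ret


def query(data, query_string):
    q = parse(query_string)
    return do_query(data, q)


def request(query, params):
    """Fill the search_url template; Searx then fetches params['url']."""
    query = urlencode({'q': query})[2:]  # URL-encode, then strip the leading 'q='
    fp = {'query': query}
    if paging and search_url.find('{pageno}') >= 0:
        fp['pageno'] = (params['pageno'] - 1) * page_size + first_page_num
    params['url'] = search_url.format(**fp)
    params['query'] = query
    return params


def response(resp):
    """Turn the JSON body into Searx result dicts (url, title, content)."""
    results = []
    json = loads(resp.text)
    if results_query:
        # one URL/title/content per object in the result list
        for result in query(json, results_query)[0]:
            url = query(result, url_query)[0]
            title = query(result, title_query)[0]
            content = query(result, content_query)[0]
            results.append({'url': url, 'title': title, 'content': content})
    else:
        # no result list: pair up the Nth url, title and content found anywhere
        for url, title, content in zip(
            query(json, url_query),
            query(json, title_query),
            query(json, content_query)
        ):
            results.append({'url': url, 'title': title, 'content': content})
    if not suggestion_query:
        return results
    for suggestion in query(json, suggestion_query):
        results.append({'suggestion': suggestion})
    return results
```
Result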
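The paging line in `request()` maps Searx's page number, which starts at 1, onto whatever the remote API expects: `(pageno - 1) * page_size + first_page_num`. A small sketch with a hypothetical offset-based API (10 results per page, offsets starting at 0) makes the arithmetic concrete; an API that numbers its pages from 1 keeps the defaults and receives the page number unchanged. The helper names and the example URL are made up for illustration, and unlike `request()` this version does not check the `paging` flag.

```python
from urllib.parse import urlencode


def page_param(pageno, page_size=1, first_page_num=1):
    """Map Searx's 1-based page number to the value a remote API expects."""
    # Same formula as in request(): (pageno - 1) * page_size + first_page_num
    return (pageno - 1) * page_size + first_page_num


def build_url(search_url, query, pageno, page_size=1, first_page_num=1):
    """Fill {query} and, if present, {pageno} in a URL template (hypothetical helper)."""
    fields = {"query": urlencode({"q": query})[2:]}  # encode, then drop the "q="
    if "{pageno}" in search_url:
        fields["pageno"] = page_param(pageno, page_size, first_page_num)
    return search_url.format(**fields)


# Offset-based API, 10 results per page, offsets start at 0 (hypothetical URL)
OFFSET_URL = "https://api.example.com/search?q={query}&offset={pageno}"
print([page_param(p, page_size=10, first_page_num=0) for p in (1, 2, 3)])  # [0, 10, 20]
print(build_url(OFFSET_URL, "searx docker", 3, page_size=10, first_page_num=0))
# https://api.example.com/search?q=searx+docker&offset=20
```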
By adapting the response handling, you can feed your own data sources (databases, files, and so on) into Searx and end up with a personal mini search engine. Adding jieba for Chinese word segmentation makes it even more fun.
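A minimal sketch of such a data source, assuming your data is a list of records with `url`, `title` and `content` keys, for example loaded from a JSON Lines file (the file format and all function names are hypothetical). It ranks records by simple term overlap and returns dicts shaped like the ones `response()` produces. This is not Searx's engine or plugin API; to serve it through Searx you would still need to expose it, for instance as a JSON endpoint the JSON engine can query. Whitespace splitting does not work for Chinese, so for Chinese text you could pass a segmenter such as `jieba.lcut` as `tokenize` (requires installing jieba; not exercised here).

```python
import json


def simple_tokenize(text):
    """Lowercase and split on whitespace; fine for English, not for Chinese."""
    return text.lower().split()


def load_records(path):
    """Read one {"url", "title", "content"} JSON object per line (hypothetical format)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def search_records(records, query, tokenize=simple_tokenize, limit=10):
    """Return Searx-style result dicts, ranked by how many query terms each record contains."""
    terms = set(tokenize(query))
    scored = []
    for rec in records:
        tokens = set(tokenize(rec["title"] + " " + rec["content"]))
        score = len(terms & tokens)  # number of distinct query terms found
        if score:
            scored.append((score, rec))
    # sort() is stable, so records with equal scores keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"url": r["url"], "title": r["title"], "content": r["content"]}
        for _, r in scored[:limit]
    ]
```

Usage would look like `search_records(load_records("docs.jsonl"), "searx docker")`. Term overlap is deliberately crude; it keeps the example short, and a real setup would swap in a proper index or the database's own full-text search.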
Original link: https://brucedone.com/archives/838
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.