
How to Build a Fast, Accurate Honeypot Detector with Python and CrawlerGo

This article explains what web honeypots are, outlines their distinctive JSONP‑hijacking behavior, and provides a step‑by‑step guide—including asset collection with CrawlerGo, data cleaning, dictionary matching, multithreaded scanning, and proxy‑pool integration—to automatically identify and filter honeypots from large domain lists.

MaGe Linux Operations

Sep 15, 2023


What is a honeypot?

A honeypot is a trap designed to lure attackers; once they interact with it, their activity can be traced back to them, which makes it a powerful tool for defenders gathering intelligence on malicious actors.

Why honeypots are useful

Web honeypots exploit JSONP hijacking: the honeypot page makes the visitor's browser send numerous cross‑domain requests to popular services such as Bilibili, Baidu, Huya, and Youku, and uses the responses to capture social‑network information about the attacker.

Key characteristics of honeypots

Typical signs include a large number of abnormal cross‑domain requests and the presence of specific JavaScript files (e.g., js/portrait.js or js/moment.min.js) that contain the string parcelRequire=function(e,r,t,n).
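The first sign, an unusual number of cross‑domain requests, can be turned into a simple count. The following is a minimal sketch, not part of the original tooling: the function name and the rule "same host or a subdomain of it counts as same‑site" are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def cross_domain_hosts(page_url, requested_hosts):
    """Return the requested hosts that do not belong to the page's own host."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    foreign = set()
    for host in requested_hosts:
        host = host.strip().lower()
        if not host:
            continue
        # Assumption: the page host and its subdomains are treated as same-site.
        if host == page_host or host.endswith("." + page_host):
            continue
        foreign.add(host)
    return foreign
```

A page that pulls in a dozen unrelated login‑bearing services would yield a large set here, whereas an ordinary site usually references only a few third‑party hosts.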

Detecting honeypots from massive asset lists

The workflow starts with gathering sub‑domains, then filtering out honeypots before proceeding with penetration testing.

Using crawlergo, every HTTP request issued while crawling a target is captured and saved as a JSON file. The all_domain_list field contains every domain requested during the crawl.

{"all_domain_list": ["api.bilibili.com", "api.huya.com", ...]}

From this list a dictionary of known honeypot request domains is built and later used for matching.
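One way to build that dictionary is to keep only the domains that recur across several confirmed honeypots. The sketch below assumes you already have the all_domain_list values from a few known honeypots; build_honeypot_dict and the min_share cutoff are hypothetical names and values chosen for illustration.

```python
from collections import Counter

def build_honeypot_dict(crawls, min_share=0.5):
    """crawls: a list of all_domain_list values taken from confirmed honeypots."""
    if not crawls:
        return set()
    counts = Counter()
    for domain_list in crawls:
        # Count each domain once per crawl so one noisy page cannot dominate.
        counts.update({d.strip().lower() for d in domain_list if d.strip()})
    needed = min_share * len(crawls)
    # Keep domains requested by at least min_share of the known honeypots.
    return {domain for domain, n in counts.items() if n >= needed}
```

Domains that show up in only one sample are dropped, which keeps incidental third‑party requests out of the dictionary.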

Implementation steps

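1. Crawl assets

crawlergo -c /path/to/chrome --output-json result.json http://target.ip

The flags follow the crawlergo README (-c is the Chromium path, --output-json the result file); confirm them with crawlergo --help for your build.

2. Data cleaning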

Extract the all_domain_list array. json.load already returns it as a Python list, so no regular expressions are needed.

import json

with open('result.json') as f:
    data = json.load(f)
# all_domain_list is already a list of strings; normalize case and whitespace
domains = [d.strip().lower() for d in data['all_domain_list']]
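The cleaning step can be wrapped in a small helper so that a missing or empty field does not crash a batch run. This is a sketch under the assumption that the result file has the shape shown earlier; load_domains is a hypothetical name.

```python
import json

def load_domains(path):
    """Read a crawl result file and return its domains, deduplicated and sorted."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # A missing or null field is treated as "no domains requested".
    raw = data.get("all_domain_list") or []
    return sorted({d.strip().lower() for d in raw if isinstance(d, str) and d.strip()})
```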

3. Dictionary matching

Use set intersection to find which known honeypot domains appear in the crawled list, then count them.

honeypot_dict = {'api.bilibili.com', 'api.huya.com', 'api.youku.com'}
matches = honeypot_dict & set(domains)
if matches:
    print(f'Honeypot detected ({len(matches)} known domains requested)')
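Because an ordinary site may legitimately embed one of these services, a single hit can be a false positive. One possible refinement, shown here as a sketch, is to require a minimum number of matches; the default of 2 is an arbitrary assumption that should be tuned against real data, and honeypot_score is a hypothetical name.

```python
def honeypot_score(domains, honeypot_dict, min_matches=2):
    """Return (is_honeypot, matched_domains) using a match-count threshold."""
    matches = set(honeypot_dict) & {d.lower() for d in domains}
    return len(matches) >= min_matches, sorted(matches)
```

For example, a crawl that touches both api.bilibili.com and api.huya.com is flagged, while a crawl that touches only one of them is reported with its match but not flagged.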

4. File I/O

with open('result.txt', 'w') as out:
    out.write('\n'.join(matches))
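When many targets are scanned, writing to one file in 'w' mode overwrites the previous result each time. A possible alternative, sketched below, appends one JSON record per target; the record layout and the name append_result are assumptions, not part of the original script.

```python
import json

def append_result(path, target, matches):
    """Append one JSON line describing the scan result for a single target."""
    record = {
        "target": target,
        "matches": sorted(matches),
        "honeypot": bool(matches),
    }
    with open(path, "a", encoding="utf-8") as out:
        out.write(json.dumps(record) + "\n")
```

If several threads write to the same file, it is safer to collect results in the main thread and write them there.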

5. Speed optimization with multithreading

Wrap the detection function in a ThreadPoolExecutor (or threadpool) to scan many IPs concurrently.

from concurrent.futures import ThreadPoolExecutor

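ip_list = ['http://target.ip']  # targets to scan

def scan(ip):
    # call crawlergo, clean data, match dictionary; return the matched domains
    pass

with ThreadPoolExecutor(max_workers=10) as pool:
    # list() consumes the results so exceptions raised in workers surface here
    results = list(pool.map(scan, ip_list))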

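Because targets are scanned in parallel, the average time per IP across a batch drops from ~10 seconds to roughly 1–2 seconds; an individual scan still takes as long as before.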
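In a large batch, one unreachable target should not abort the whole run. The sketch below shows one way to collect per‑target results and errors; scan_all is a hypothetical helper, and scan_one stands for whatever function performs the crawl, cleaning, and matching for a single target.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scan_all(targets, scan_one, max_workers=10):
    """Run scan_one over all targets concurrently; map each target to its result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scan_one, t): t for t in targets}
        for fut in as_completed(futures):
            target = futures[fut]
            try:
                results[target] = fut.result()
            except Exception as exc:
                # Record the failure and keep going with the remaining targets.
                results[target] = exc
    return results
```

Threads suit this workload because each scan spends most of its time waiting on an external process or the network rather than running Python code.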

6. Proxy pool integration

To avoid being blocked or traced, route requests through a rotating proxy pool. A free example is proxy_pool; paid services offer higher reliability.

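# Example of fetching a proxy
import requests
# Recent proxy_pool versions answer /get/ with JSON containing a "proxy" field (host:port)
proxy = requests.get('http://127.0.0.1:5010/get/', timeout=5).json().get('proxy')
if proxy is None:
    raise RuntimeError('proxy pool returned no proxy')
proxies = {'http': f'http://{proxy}', 'https': f'http://{proxy}'}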

Combine the proxy with the multithreaded scanner to achieve fast, stealthy detection.
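Proxy pool replies differ between services and versions, so it helps to normalize them before handing them to the scanner. The following sketch accepts either a JSON object with a "proxy" field or a bare host:port string; parse_proxy_reply is a hypothetical helper, and both reply shapes are assumptions about what a pool might return.

```python
import json

def parse_proxy_reply(body):
    """Turn a proxy pool reply into a requests-style proxies dict, or None if unusable."""
    body = body.strip()
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON: treat the reply as a bare "host:port" string.
        data = body
    address = data.get("proxy") if isinstance(data, dict) else str(data)
    if not address or ":" not in address:
        return None
    if "://" not in address:
        address = "http://" + address
    return {"http": address, "https": address}
```

Returning None for an empty or malformed reply lets each worker thread retry or skip the target instead of sending traffic without a proxy.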

Further improvements

Static fingerprinting (checking for js/portrait.js or js/moment.min.js) can speed up detection even more, though it may fail against obfuscated or randomized honeypots.
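A static check of this kind could look like the sketch below. The two paths and the signature string come from the characteristics described earlier; has_static_fingerprint and the fetch callable (any function that returns a URL's body as text and raises on failure) are assumptions introduced for illustration.

```python
SIGNATURE = "parcelRequire=function(e,r,t,n)"
CANDIDATE_PATHS = ("js/portrait.js", "js/moment.min.js")

def has_static_fingerprint(base_url, fetch):
    """Return True if any candidate script under base_url contains the signature."""
    for path in CANDIDATE_PATHS:
        url = base_url.rstrip("/") + "/" + path
        try:
            body = fetch(url)
        except Exception:
            # Missing file or network error: try the next candidate path.
            continue
        if SIGNATURE in body:
            return True
    return False
```

Because it needs at most two plain HTTP requests and no browser, this check is cheap enough to run before the full crawl, with the crawl kept as the fallback for targets that show no static match.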

Collecting additional fingerprints and continuously updating the dictionary will improve both accuracy and coverage.

The snippets above are sketches: scan() in particular is a stub, and each one needs adapting to the specific environment.


