How to Build a Fast, Accurate Honeypot Detector with Python and CrawlerGo
This article explains what web honeypots are, outlines their distinctive JSONP‑hijacking behavior, and provides a step‑by‑step guide—including asset collection with CrawlerGo, data cleaning, dictionary matching, multithreaded scanning, and proxy‑pool integration—to automatically identify and filter honeypots from large domain lists.
What is a honeypot?
A honeypot ("蜜罐" in Chinese) is a trap designed to lure attackers; once they interact with it, their activity can be traced back to them, making it a powerful counter-reconnaissance tool for defenders.
Why honeypots are useful
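The web honeypots discussed here exploit JSONP hijacking to capture attackers' social-network information: the trap page sends numerous cross-domain requests to popular services such as Bilibili, Baidu, Huya, and Youku, and any service the visitor is logged in to may return identifying account data.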
Key characteristics of honeypots
Typical signs include a large number of abnormal cross‑domain requests and the presence of specific JavaScript files (e.g., js/portrait.js or js/moment.min.js) that contain the string parcelRequire=function(e,r,t,n).
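The first sign, an unusual number of cross-domain requests, can be checked with a few lines of Python. The following is a minimal sketch, not the original author's code: the function name third_party_hosts and the sample URLs are hypothetical, and what counts as "abnormal" is left to the reader.

```python
from urllib.parse import urlsplit

def third_party_hosts(page_url, request_urls):
    """Return the hosts a page requested that differ from the page's own host."""
    page_host = urlsplit(page_url).hostname
    hosts = set()
    for url in request_urls:
        host = urlsplit(url).hostname
        if host and host != page_host:
            hosts.add(host)
    return hosts

# Hypothetical requests observed while loading one page.
requests_seen = [
    "http://192.0.2.10/js/portrait.js",
    "https://api.bilibili.com/x/web-interface/nav?callback=cb",
    "https://api.huya.com/user?callback=cb",
]
foreign = third_party_hosts("http://192.0.2.10/", requests_seen)
print(sorted(foreign))
```

A login page or bare IP that contacts many unrelated social-media APIs is a stronger signal than a single third-party host, so the size of the returned set is the number to watch.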
Detecting honeypots from massive asset lists
The workflow starts with gathering sub‑domains, then filtering out honeypots before proceeding with penetration testing.
Using crawlergo, all HTTP requests made while crawling a target are captured and saved as a JSON file. The all_domain_list field contains every domain requested during the crawl.
{"all_domain_list": ["api.bilibili.com", "api.huya.com", ...]}
From this list a dictionary of known honeypot request domains is built and later used for matching.
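One way to build such a dictionary is to crawl several hosts already confirmed to be honeypots and keep only the domains that recur across them. The sketch below assumes this approach; build_honeypot_dict, the min_hits parameter, and the sample data are illustrative and do not come from the original article.

```python
from collections import Counter

def build_honeypot_dict(crawl_results, min_hits=2):
    """Keep domains that appear in at least min_hits crawl results."""
    counts = Counter()
    for result in crawl_results:
        # set() so a domain counts once per crawl, however often it was requested
        counts.update(set(result.get("all_domain_list", [])))
    return {domain for domain, hits in counts.items() if hits >= min_hits}

# Hypothetical crawl results from two confirmed honeypots.
known_honeypot_crawls = [
    {"all_domain_list": ["api.bilibili.com", "api.huya.com", "cdn.example.net"]},
    {"all_domain_list": ["api.bilibili.com", "api.huya.com"]},
]
honeypot_dict = build_honeypot_dict(known_honeypot_crawls)
print(sorted(honeypot_dict))
```

Requiring a domain to recur filters out one-off requests such as CDNs that a single honeypot happens to load.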
Implementation steps
1. Crawl assets
Run crawlergo against the target and write the results to result.json (the Chromium path below is a placeholder).
crawlergo -c /path/to/chrome --output-json result.json http://target.ip
2. Data cleaning
Load the JSON file and read the all_domain_list array. json.load already returns it as a Python list, so no regular expression is needed.
import json
with open('result.json') as f:
    data = json.load(f)
domains = data['all_domain_list']
3. Dictionary matching
Use set intersection to find which known honeypot domains appear in the crawled list.
honeypot_dict = {'api.bilibili.com', 'api.huya.com', 'api.youku.com'}
matches = honeypot_dict & set(domains)
if matches:
    print(f'Honeypot detected ({len(matches)} known domains matched)')
4. File I/O
# Write the matched domains to a text file, one per line
with open('result.txt', 'w') as out:
    out.write('\n'.join(sorted(matches)))
5. Speed optimization with multithreading
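Submit the detection function to a ThreadPoolExecutor (or the third-party threadpool package) to scan many IPs concurrently.
from concurrent.futures import ThreadPoolExecutor
ip_list = ['192.0.2.1', '192.0.2.2']  # example targets
def scan(ip):
    # call crawlergo, clean data, match dictionary
    pass
with ThreadPoolExecutor(max_workers=10) as pool:
    # list() drains the iterator so exceptions raised inside scan() surface here
    results = list(pool.map(scan, ip_list))
Because the crawls overlap instead of running one after another, this reduces the average scan time per IP from ~10 seconds to 1–2 seconds.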
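A slightly fuller sketch of the concurrent scanner is given here. It is an illustration under stated assumptions, not the author's implementation: the crawl step is passed in as a function so that the real crawlergo invocation can be plugged in, and classify, scan_all, the threshold of 2, and fake_crawl are all hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor

HONEYPOT_DICT = {"api.bilibili.com", "api.huya.com", "api.youku.com"}

def classify(domains, threshold=2):
    """Flag a target when at least `threshold` known honeypot domains were requested."""
    matches = HONEYPOT_DICT & set(domains)
    return len(matches) >= threshold, matches

def scan_all(targets, crawl, max_workers=10):
    """Run crawl(target) -> list of domains for every target in a thread pool."""
    def scan(target):
        try:
            flagged, matches = classify(crawl(target))
            return target, flagged, sorted(matches)
        except Exception:
            # A failed crawl should not abort the batch; None marks "unknown".
            return target, None, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `targets`
        return list(pool.map(scan, targets))

def fake_crawl(target):
    """Stand-in for the crawlergo step, returning canned domain lists."""
    canned = {
        "192.0.2.1": ["api.bilibili.com", "api.huya.com", "example.org"],
        "192.0.2.2": ["example.org"],
    }
    return canned[target]

results = scan_all(["192.0.2.1", "192.0.2.2", "192.0.2.3"], fake_crawl)
for target, flagged, matches in results:
    print(target, flagged, matches)
```

Threads suit this workload because each scan spends most of its time waiting on the crawler and the network rather than on Python computation.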
6. Proxy pool integration
To avoid being blocked or traced, route requests through a rotating proxy pool. A free example is proxy_pool, while paid services offer higher reliability.
# Example of fetching a proxy from a locally running proxy_pool instance
import requests
# If your proxy_pool version returns JSON, read the "proxy" field instead of .text
proxy = requests.get('http://127.0.0.1:5010/get/').text.strip()
proxies = {'http': proxy, 'https': proxy}
Combine the proxy with the multithreaded scanner to achieve fast, stealthy detection.
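The proxy lookup can be made more tolerant of different response formats. The sketch below assumes the pool endpoint answers either with a plain "host:port" string or with a JSON object carrying the address in a "proxy" field; that assumption, the helper names, and the sample address are illustrative and should be checked against the pool you actually run. It uses only the standard library so it has no extra dependencies.

```python
import json
import urllib.request

def parse_proxy_response(body):
    """Extract 'host:port' from a JSON object or a plain-text response body."""
    body = body.strip()
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON: treat the body itself as the proxy address.
        return body or None
    if isinstance(payload, dict):
        return payload.get("proxy") or None
    return None

def build_proxies(address):
    """Build the proxies mapping expected by requests from a 'host:port' string."""
    url = f"http://{address}"
    return {"http": url, "https": url}

def fetch_proxies(endpoint="http://127.0.0.1:5010/get/", timeout=5):
    """Ask the pool for one proxy; returns None when the pool has none to offer."""
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        address = parse_proxy_response(resp.read().decode("utf-8"))
    return build_proxies(address) if address else None

# Offline demonstration with a hypothetical JSON response body.
demo = build_proxies(parse_proxy_response('{"proxy": "203.0.113.5:8080"}'))
print(demo)
```

Calling fetch_proxies() once per scanned target, rather than once per run, is what makes the pool rotate.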
Further improvements
Static fingerprinting (checking for js/portrait.js or js/moment.min.js) can speed up detection even more, though it may fail against obfuscated or randomized honeypots.
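A static check of this kind could look like the following minimal sketch. The fingerprint string and the two script paths are taken from the characteristics listed earlier; the function names and sample script bodies are hypothetical, and downloading the scripts is left out.

```python
FINGERPRINT = "parcelRequire=function(e,r,t,n)"
SUSPECT_PATHS = ("js/portrait.js", "js/moment.min.js")

def suspect_script_urls(base_url):
    """List the script URLs worth fetching for a given site root."""
    root = base_url.rstrip("/")
    return [f"{root}/{path}" for path in SUSPECT_PATHS]

def has_static_fingerprint(script_text):
    """True when a downloaded script contains the known honeypot string."""
    return FINGERPRINT in script_text

# Hypothetical script bodies for demonstration.
honeypot_js = "parcelRequire=function(e,r,t,n){var i;/* bundled payload */}"
benign_js = "console.log('hello');"
print(suspect_script_urls("http://192.0.2.10/"))
print(has_static_fingerprint(honeypot_js), has_static_fingerprint(benign_js))
```

Because this is an exact substring match, any minifier change or renamed file defeats it, which is why it works best as a fast first pass before the crawl-based check.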
Collecting additional fingerprints and continuously updating the dictionary will improve both accuracy and coverage.
The code snippets above are minimal examples; adapt the paths, targets, and dictionary to your own environment.
MaGe Linux Operations
