How to Bypass Captchas with Selenium and Tesseract: A Step‑by‑Step Python Guide
This tutorial walks through using Selenium to handle pop‑ups and simple numeric captchas on a web portal: capturing the captcha image, applying binary thresholding, recognizing the text with Tesseract OCR, and submitting the login credentials, with retry logic for failed recognitions.
Introduction
When crawling websites, captchas are a common obstacle. This article demonstrates how to use Selenium to handle pop‑ups and solve a simple numeric captcha on an instrument reservation platform.
The captcha consists of colored digits over simple background noise, so it can be solved without training a custom recognition model: apply a binary threshold and feed the result to the open-source Tesseract OCR engine.
Note: Installing and configuring Selenium and Tesseract is left to the reader.
Python Practice
import re
# image processing
from PIL import Image
# text recognition
import pytesseract
# browser automation
from selenium import webdriver
import time
Solving Pop‑up Dialogs
First open the target site:
url = 'http://lims.gzzoc.com/client'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(30)
The site shows an alert‑style pop‑up. Common dialog types are:
alert(message) – simple OK alert
confirm(message) – OK/Cancel dialog
prompt(text, defaultText) – input dialog
Non‑traditional alerts may be embedded in div, iframe, or separate windows and require element location or frame/window switching.
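For the native dialog types listed above, a minimal sketch of dismissing the pop‑up might look like the following. The helper name accept_native_alert is hypothetical, and driver is assumed to be a Selenium WebDriver; the broad except clause is used only so the sketch needs no Selenium import.

```python
def accept_native_alert(driver):
    """Accept an open alert/confirm/prompt and return its text, or None if absent."""
    try:
        alert = driver.switch_to.alert
        # Reading the text fails if no native dialog is currently open.
        text = alert.text
    except Exception:
        # Broad on purpose for this sketch; in real code, narrow this to
        # selenium.common.exceptions.NoAlertPresentException.
        return None
    alert.accept()  # alert.dismiss() would press Cancel on a confirm dialog
    return text
```

Pop‑ups built from div or iframe elements are not reachable this way; they need ordinary element lookups or a frame switch first.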
Capturing the Captcha Image
Steps to capture and crop the captcha:
1. Locate the captcha image element.
2. Get its size and position.
3. Take a screenshot of the browser window.
4. Crop the screenshot to the captcha region.
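# Selenium 4 removed find_element_by_xpath; 'xpath' is the value of By.XPATH
img = driver.find_element('xpath', '//img[@id="valiCode"]')
time.sleep(1)
location = img.location
size = img.size
# the factor 2 assumes the screenshot has twice the resolution of the page coordinates
left = 2 * location['x']
top = 2 * location['y']
# the -10 trims the right and bottom edges of the crop
right = left + 2 * size['width'] - 10
bottom = top + 2 * size['height'] - 10
driver.save_screenshot('valicode.png')
page_snap_obj = Image.open('valicode.png')
image_obj = page_snap_obj.crop((left, top, right, bottom))
image_obj.show()
Adjust the scaling factor if the crop is offset, for example because of browser zoom or display scaling.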
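Rather than hard-coding the factor, the crop box can be computed from a scale value. The sketch below is an illustration, not the original author's code: crop_box, scale and trim are hypothetical names, and the arithmetic matches the code above when scale is 2 and trim is 10.

```python
def crop_box(location, size, scale=1.0, trim=0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location and size are the dicts Selenium reports for an element;
    scale is the ratio of screenshot pixels to page pixels;
    trim shaves pixels off the right and bottom edges.
    """
    left = round(location['x'] * scale)
    top = round(location['y'] * scale)
    right = round((location['x'] + size['width']) * scale) - trim
    bottom = round((location['y'] + size['height']) * scale) - trim
    return left, top, right, bottom


# Example with made-up coordinates on a 2x display:
print(crop_box({'x': 100, 'y': 50}, {'width': 60, 'height': 20}, scale=2, trim=10))
# (200, 100, 310, 130)
```

One way to obtain the scale at run time is driver.execute_script('return window.devicePixelRatio'), which reflects both display density and browser zoom.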
Binary Threshold Processing
The threshold (e.g., 205) separates foreground digits from background noise.
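To make the rule concrete, here is a tiny sketch on a hand-written grid of grayscale values instead of a real image. The helper name binarize and the sample values are illustrative assumptions.

```python
def binarize(gray_rows, threshold=205):
    """Map each grayscale value (0-255) to 0 if below the threshold, else 255."""
    return [[0 if value < threshold else 255 for value in row] for row in gray_rows]


sample = [
    [250, 120, 240],
    [90, 204, 205],
]
print(binarize(sample))  # [[255, 0, 255], [0, 0, 255]]
```

Values darker than the threshold become black (0) and everything else becomes white (255), so lowering the threshold keeps only the darkest strokes. The PIL code that follows applies the same comparison to every pixel of the cropped captcha.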
img = image_obj.convert("L")  # convert to grayscale
pixdata = img.load()
w, h = img.size
threshold = 205
for y in range(h):
    for x in range(w):
        # pixels darker than the threshold become black, the rest white
        if pixdata[x, y] < threshold:
            pixdata[x, y] = 0
        else:
            pixdata[x, y] = 255
Next, remove isolated noise pixels:
data = img.getdata()
w, h = img.size
black_point = 0
for x in range(1, w-1):
    for y in range(1, h-1):
        mid_pixel = data[w*y + x]
        if mid_pixel < 50:  # only inspect dark pixels
            top_pixel = data[w*(y-1) + x]
            left_pixel = data[w*y + (x-1)]
            down_pixel = data[w*(y+1) + x]
            right_pixel = data[w*y + (x+1)]
            # count how many of the four neighbors are black
            if top_pixel < 10: black_point += 1
            if left_pixel < 10: black_point += 1
            if down_pixel < 10: black_point += 1
            if right_pixel < 10: black_point += 1
            # a dark pixel with no black neighbor is treated as noise
            if black_point < 1:
                img.putpixel((x, y), 255)
            black_point = 0
img.show()
Text Recognition
result = pytesseract.image_to_string(img)
# keep only the digits; the raw string avoids an invalid-escape warning for \d
regex = r'\d+'
result = ''.join(re.findall(regex, result))
print(result)
Submitting Credentials
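Before typing the recognized text into the form, a quick sanity check can avoid wasting a login attempt on an obviously bad read. The sketch below is an assumption-laden illustration: clean_captcha_text and expected_length are hypothetical names, and the source does not state how many digits the captcha contains.

```python
import re


def clean_captcha_text(raw_text, expected_length=None):
    """Return the digits found in raw OCR text, or None if they look implausible.

    expected_length is optional: set it only if the captcha's digit count is known.
    """
    digits = ''.join(re.findall(r'\d', raw_text))
    if not digits:
        return None
    if expected_length is not None and len(digits) != expected_length:
        return None
    return digits


print(clean_captcha_text(' 48 21\n', expected_length=4))  # 4821
print(clean_captcha_text('4B21', expected_length=4))      # None
```

A None result can be treated as a failed attempt, for example by refreshing the captcha instead of submitting the form.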
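# Selenium 4 removed the find_element_by_* helpers; 'name' and 'xpath' are the values of By.NAME and By.XPATH
driver.find_element('name', 'code').send_keys(result)
# replace 'xxx' with the real user name and password
driver.find_element('name', 'userName').send_keys('xxx')
driver.find_element('name', 'password').send_keys('xxx')
# click confirm button
driver.find_element('xpath', "//div[@class='form-group login-input'][3]").click()
Because recognition of the thresholded captcha is not 100% reliable, the login steps can be wrapped in a retry loop: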
while True:
    try:
        # ... perform login steps ...
        break
    except Exception:
        # click the captcha image (presumably to load a new one) before retrying
        driver.find_element('id', 'valiCode').click()
Conclusion
After a successful login, the session cookies can be used for further automation with Selenium or passed to requests for data extraction.
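As a rough sketch of the hand-off to requests, the cookie list returned by driver.get_cookies() can be flattened into a plain dictionary. The helper name cookies_for_requests and the sample cookie values are hypothetical.

```python
def cookies_for_requests(selenium_cookies):
    """Convert the list of cookie dicts from driver.get_cookies() to a name -> value dict."""
    return {cookie['name']: cookie['value'] for cookie in selenium_cookies}


# Made-up example in the shape Selenium returns:
sample = [
    {'name': 'JSESSIONID', 'value': 'abc123', 'domain': 'example.com', 'path': '/'},
    {'name': 'lang', 'value': 'en', 'domain': 'example.com', 'path': '/'},
]
print(cookies_for_requests(sample))  # {'JSESSIONID': 'abc123', 'lang': 'en'}
```

The resulting dictionary can be passed as the cookies argument of requests.get or attached to a requests.Session; domain and path restrictions are dropped in this simplified form.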
