How to Bypass Captchas with Selenium and Tesseract: A Step‑by‑Step Python Guide

This tutorial uses Selenium to handle pop‑ups and solve a simple numeric captcha on a web portal: it captures the captcha image, applies binary thresholding, recognizes the digits with Tesseract OCR, and submits the login credentials, with retry logic for failed recognitions.

Python Crawling & Data Mining

Introduction

When crawling websites, captchas are a common obstacle. This article demonstrates how to use Selenium to handle pop‑ups and solve a simple numeric captcha on an instrument reservation platform.

The captcha consists of colored digits on a background with simple noise, so it can be solved without AI: apply a binary threshold and feed the result to Google’s Tesseract OCR engine.

Note: Installing and configuring Selenium and Tesseract is left to the reader.

Python Practice

import re
# image processing
from PIL import Image
# text recognition
import pytesseract
# browser automation
from selenium import webdriver
import time

Solving Pop‑up Dialogs

First open the target site:

url = 'http://lims.gzzoc.com/client'
driver = webdriver.Chrome()
driver.get(url)
time.sleep(30)

The site shows an alert‑style pop‑up. Common dialog types are:

alert(message) – simple OK alert
confirm(message) – OK/Cancel dialog
prompt(text, defaultText) – input dialog

Non‑traditional alerts may be embedded in div, iframe, or separate windows and require element location or frame/window switching.
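
For the native dialog types listed above, Selenium exposes the open dialog through driver.switch_to.alert. The following is a minimal sketch, assuming the pop-up on the target site is a native JavaScript dialog; the helper name handle_native_dialog and its parameters are hypothetical and not part of the original code.

```python
def handle_native_dialog(driver, accept=True, prompt_text=None):
    """Close an open alert/confirm/prompt dialog and return its message.

    `driver` is a Selenium WebDriver. Selenium raises
    NoAlertPresentException if no native dialog is open, so call this
    only once the dialog has appeared.
    """
    dialog = driver.switch_to.alert
    message = dialog.text
    if prompt_text is not None:
        dialog.send_keys(prompt_text)  # only meaningful for prompt()
    if accept:
        dialog.accept()   # the OK button
    else:
        dialog.dismiss()  # the Cancel button of confirm()/prompt()
    return message
```

Pop-ups built from div or iframe elements are ordinary page content, so this helper does not apply to them; they need a normal element lookup and, for an iframe, driver.switch_to.frame first.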

Capturing the Captcha Image

Steps to capture and crop the captcha:

1. Locate the captcha image element.
2. Get its size and position.
3. Take a full‑page screenshot.
4. Crop the screenshot to the captcha region.

# the string locator 'xpath' is the value of By.XPATH
img = driver.find_element('xpath', '//img[@id="valiCode"]')
time.sleep(1)
location = img.location  # top-left corner of the element on the page
size = img.size          # width and height of the element
# scale page coordinates to screenshot pixels, trimming 10 px at the right and bottom
left = 2 * location['x']
top = 2 * location['y']
right = left + 2 * size['width'] - 10
bottom = top + 2 * size['height'] - 10
driver.save_screenshot('valicode.png')
page_snap_obj = Image.open('valicode.png')
image_obj = page_snap_obj.crop((left, top, right, bottom))
image_obj.show()

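The factor of 2 converts the element's page coordinates to screenshot pixels, which is typically needed on a high‑DPI display where the screenshot is twice the size of the page; the -10 trims the right and bottom edges. Adjust both values if the crop is offset, for example because of display scaling or browser zoom.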
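
The crop arithmetic can be written as a small function with the scale made explicit. This is only a sketch: crop_box, scale, and trim are hypothetical names, and the sample element geometry is made up. One plausible way to obtain the scale instead of hard-coding it is driver.execute_script('return window.devicePixelRatio'), assuming the offset comes from the display's pixel ratio.

```python
def crop_box(location, size, scale=2, trim=10):
    """Return (left, top, right, bottom) in screenshot pixels.

    location/size are the dicts from WebElement.location and WebElement.size.
    scale converts page coordinates to screenshot pixels; trim shaves a few
    pixels off the right and bottom edges, like the -10 in the code above.
    """
    left = scale * location['x']
    top = scale * location['y']
    right = left + scale * size['width'] - trim
    bottom = top + scale * size['height'] - trim
    return (left, top, right, bottom)


# Hypothetical element at (100, 50), 60 x 20 in page coordinates.
print(crop_box({'x': 100, 'y': 50}, {'width': 60, 'height': 20}))
# -> (200, 100, 310, 130)
```

The returned tuple can be passed directly to Image.crop.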

Binary Threshold Processing

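The threshold (205 here) separates the foreground digits from the background noise: after conversion to grayscale, every pixel darker than the threshold becomes black (0) and every other pixel becomes white (255).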

img = image_obj.convert("L")  # convert to grayscale
pixdata = img.load()
w, h = img.size
threshold = 205
for y in range(h):
    for x in range(w):
        if pixdata[x, y] < threshold:
            pixdata[x, y] = 0
        else:
            pixdata[x, y] = 255
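
To see why a high threshold such as 205 can work, it helps to look at the grayscale values involved. Pillow documents the conversion to mode "L" as L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula to two made-up colours, a saturated digit colour and a pale noise colour; both are assumptions and were not sampled from the real captcha, and Pillow's own rounding may differ by one.

```python
def luma(rgb):
    """Approximate grayscale value using the ITU-R 601-2 luma formula."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) // 1000


def binarize(value, threshold=205):
    """Map a grayscale value to black (0) or white (255), like the loop above."""
    return 0 if value < threshold else 255


digit_colour = (200, 30, 30)    # assumed: a saturated red digit
noise_colour = (230, 235, 240)  # assumed: pale background speckle

for name, colour in [('digit', digit_colour), ('noise', noise_colour)]:
    gray = luma(colour)
    print(name, gray, binarize(gray))
# -> digit 80 0
# -> noise 234 255
```

Saturated colours land far below the threshold while pale speckle stays above it, so the digits survive as black and the noise turns white. Darker noise would need a lower threshold or the clean-up step that follows.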

Then remove isolated noise pixels, that is, dark pixels with no dark neighbour above, below, left, or right:

data = img.getdata()
w, h = img.size
for x in range(1, w - 1):
    for y in range(1, h - 1):
        if data[w * y + x] < 50:  # only examine dark pixels
            # the four directly adjacent pixels: up, left, down, right
            neighbours = (
                data[w * (y - 1) + x],
                data[w * y + (x - 1)],
                data[w * (y + 1) + x],
                data[w * y + (x + 1)],
            )
            black_point = sum(1 for p in neighbours if p < 10)
            # a dark pixel with no dark neighbour is isolated noise
            if black_point < 1:
                img.putpixel((x, y), 255)
img.show()
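
The effect of this clean-up is easier to see on a tiny grid. The sketch below is not the code above: it works on plain lists rather than a Pillow image and reads from an unmodified copy, so a removed pixel never influences later neighbour checks. The function name remove_isolated_pixels and the sample grid are made up for illustration.

```python
def remove_isolated_pixels(grid):
    """Return a copy of grid (rows of 0/255 values) without isolated black pixels.

    A black pixel is isolated when none of its four direct neighbours is black.
    Border pixels are left untouched.
    """
    h, w = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if grid[y][x] != 0:
                continue
            neighbours = (grid[y - 1][x], grid[y][x - 1],
                          grid[y + 1][x], grid[y][x + 1])
            if all(p != 0 for p in neighbours):
                cleaned[y][x] = 255
    return cleaned


W, B = 255, 0
sample = [
    [W, W, W, W, W],
    [W, B, W, W, W],   # lone speck in the second row
    [W, W, W, B, W],   # two-pixel vertical stroke in the fourth column
    [W, W, W, B, W],
    [W, W, W, W, W],
]
for row in remove_isolated_pixels(sample):
    print(''.join('#' if p == B else '.' for p in row))
# -> .....
# -> .....
# -> ...#.
# -> ...#.
# -> .....
```

The lone speck disappears while the two-pixel stroke is kept, because each of its pixels has a black neighbour.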

Text Recognition

result = pytesseract.image_to_string(img)
regex = r'\d+'  # raw string, so the backslash reaches the regex engine unchanged
result = ''.join(re.findall(regex, result))
print(result)
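
OCR output often contains stray whitespace or misread characters, so it can help to validate the result before typing it into the form. The sketch below is an assumption-laden example: extract_code and DIGITS_CONFIG are hypothetical names, the expected length of 4 is a guess (the article does not state how many digits the captcha has), and the Tesseract options shown (page segmentation mode 7 for a single text line, plus a digit whitelist) are one plausible configuration rather than the one the author used.

```python
import re

# Assumed Tesseract options: treat the image as a single text line and
# restrict the output to digits. They would be passed as
# pytesseract.image_to_string(img, config=DIGITS_CONFIG).
DIGITS_CONFIG = '--psm 7 -c tessedit_char_whitelist=0123456789'


def extract_code(ocr_text, expected_length=None):
    """Return the digits found in ocr_text, or None if the count looks wrong."""
    digits = ''.join(re.findall(r'\d', ocr_text))
    if not digits:
        return None
    if expected_length is not None and len(digits) != expected_length:
        return None
    return digits


print(extract_code(' 48 21\n', expected_length=4))  # -> 4821
print(extract_code('4B21', expected_length=4))      # -> None
```

Returning None for an implausible result lets the caller refresh the captcha instead of submitting a code that is known to be wrong.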

Submitting Credentials

# the string locators 'name' and 'xpath' are the values of By.NAME and By.XPATH
driver.find_element('name', 'code').send_keys(result)
driver.find_element('name', 'userName').send_keys('xxx')
driver.find_element('name', 'password').send_keys('xxx')
# click confirm button
driver.find_element('xpath', "//div[@class='form-group login-input'][3]").click()

Because binary captcha recognition is not 100% reliable, a retry loop can be used:

while True:
    try:
        # ... perform login steps ...
        break
    except Exception:
        # click the captcha image to request a fresh one, then try again
        driver.find_element('id', 'valiCode').click()
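
An unbounded loop keeps hammering the site if the login can never succeed, for example with wrong credentials. A bounded variant might look like the sketch below. The names retry_login, attempt_login, and refresh_captcha are hypothetical, and the sketch assumes that attempt_login raises an exception whenever the login did not go through; a wrong captcha alone may not raise anything, so that check has to be written for the specific site.

```python
import time


def retry_login(attempt_login, refresh_captcha, max_attempts=5, delay=1.0):
    """Call attempt_login() until it succeeds or max_attempts is reached.

    attempt_login: callable that performs one full login attempt and raises
        an exception on failure.
    refresh_captcha: callable that requests a new captcha image.
    Returns the 1-based number of the successful attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            attempt_login()
            return attempt
        except Exception as exc:
            print(f'attempt {attempt} failed: {exc}')
            if attempt == max_attempts:
                raise  # give up and surface the last error
            refresh_captcha()
            time.sleep(delay)  # give the new image time to load
```

With the Selenium session from this article, refresh_captcha could be lambda: driver.find_element('id', 'valiCode').click(), and attempt_login a function wrapping the capture, recognition, and submit steps.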

Conclusion

After a successful login, the session cookies can be used for further automation with Selenium or passed to requests for data extraction.
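
Passing the session on to requests mostly means reshaping the cookie list. Selenium's driver.get_cookies() returns a list of dictionaries, while requests accepts a plain name-to-value mapping. A minimal sketch follows; cookies_for_requests is a hypothetical helper and the sample cookie names and values are invented.

```python
def cookies_for_requests(selenium_cookies):
    """Convert driver.get_cookies() output into a {name: value} dict."""
    return {cookie['name']: cookie['value'] for cookie in selenium_cookies}


# Shape of what driver.get_cookies() returns; the values here are made up.
sample = [
    {'name': 'JSESSIONID', 'value': 'abc123', 'domain': 'lims.gzzoc.com', 'path': '/'},
    {'name': 'lang', 'value': 'en', 'domain': 'lims.gzzoc.com', 'path': '/'},
]
print(cookies_for_requests(sample))
# -> {'JSESSIONID': 'abc123', 'lang': 'en'}
```

With requests installed, the result can be passed as requests.get(url, cookies=cookies_for_requests(driver.get_cookies())). The mapping drops domain, path, and expiry information, which is usually acceptable for a single site.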
