How to Crack Different Captcha Types with Python OCR and Selenium
This article explains the main captcha varieties—input, sliding, grid, and click‑based—and provides step‑by‑step Python solutions using OCR libraries, image preprocessing, and Selenium automation to bypass them effectively.
Introduction
Many websites employ captchas to prevent automated access. This article introduces the most common captcha types and presents Python‑based techniques to recognize and bypass them.
Common Captcha Types
The four mainstream captchas are input (text) captchas, sliding puzzles, grid (jigsaw) captchas, and click‑based image/text verification.
01 Input Captcha
These captchas require users to type characters shown in an image. The simplest solution is to apply OCR. Python libraries such as tesserocr and pytesseract wrap the Tesseract‑OCR engine. For noisy images, preprocessing steps—grayscale conversion, binarization, and denoising—greatly improve accuracy.
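A fixed binarization threshold (the snippet in this section uses 140) only works when captchas share similar colours and contrast. Otsu's method is a widely used alternative that is not part of the original article. It picks the threshold that best separates the grey-level histogram into a dark class and a light class. The sketch below is pure Python and assumes the histogram comes from Pillow's `Image.histogram()` on a mode `'L'` image. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(histogram):
    """Pick a binarization threshold with Otsu's method.

    `histogram` is a list of 256 pixel counts (index = grey level). The
    returned value follows the `0 if i < threshold else 1` convention, i.e.
    levels strictly below it form the dark class.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    sum_bg = 0
    weight_bg = 0
    best_threshold, best_variance = 0, -1.0
    for level, count in enumerate(histogram):
        # Grow the dark class by one grey level
        weight_bg += count
        sum_bg += level * count
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor)
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance = variance
            # Levels <= `level` are dark, so the strict-less-than threshold is level + 1
            best_threshold = level + 1
    return best_threshold
```

The return value is meant to replace the constant, for example `threshold = otsu_threshold(im_gray.histogram())`. The same `0 if i < threshold else 1` lookup table is kept.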
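```python
import os

import pytesseract
from PIL import Image

img_path = 'captcha.png'  # example path to the captcha image
img_name = os.path.basename(img_path)

# Load image
im = Image.open(img_path)
# Convert to grayscale
im_gray = im.convert('L')
# Save grayscale image
im_gray.save('gray-' + img_name)
# Binarize: pixels below the threshold become black (0), the rest white (1)
threshold = 140
table = [0 if i < threshold else 1 for i in range(256)]
im_bin = im_gray.point(table, '1')
im_bin.save('bin-' + img_name)
# OCR recognition (requires the Tesseract-OCR engine to be installed)
text = pytesseract.image_to_string(im_bin)
print('Recognition result:', text)
```
02 Sliding Captcha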
This captcha shows a puzzle piece that must be dragged into a missing gap. The solution simulates human behavior with Selenium: first click the button to reveal the gap, capture before‑and‑after screenshots, compare pixel differences to locate the gap position, then move the slider to that coordinate while varying speed to avoid detection.
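The Selenium code for this captcha omits the pixel-comparison step and the speed variation. The sketch below shows one way to fill both in, under several assumptions. Both screenshots are assumed to have been cropped to the captcha area and converted to 2D lists of grey values, for example via Pillow. A column counts as part of the gap when at least `min_diff_pixels` of its pixels differ by more than `tolerance`. The drag is split into fast steps for the first 70% of the distance and slow steps afterwards. All names and numbers here (`find_gap_x`, `build_track`, `tolerance=60`, the 70% split, the step ranges) are illustrative choices, not values from the source.

```python
import random


def find_gap_x(before, after, tolerance=60, min_diff_pixels=3, start_x=0):
    """Return the first column (from start_x on) where two equal-sized grey
    grids differ noticeably, or None if no column does.

    start_x lets you skip the puzzle piece itself at the left edge.
    """
    height, width = len(before), len(before[0])
    for x in range(start_x, width):
        changed = sum(
            1 for y in range(height) if abs(before[y][x] - after[y][x]) > tolerance
        )
        if changed >= min_diff_pixels:
            return x
    return None


def build_track(distance, seed=None):
    """Split `distance` pixels into steps: large at first, small near the end.

    The steps always sum exactly to `distance`.
    """
    rng = random.Random(seed)
    track, moved = [], 0
    while moved < distance:
        remaining = distance - moved
        if moved < distance * 0.7:
            step = rng.randint(5, 12)  # fast phase
        else:
            step = rng.randint(1, 3)  # slow phase close to the gap
        step = min(step, remaining)
        track.append(step)
        moved += step
    return track
```

Each step from `build_track(gap_x)` can be fed to `ActionChains.move_by_offset(step, 0)` between `click_and_hold(slider)` and `release()`. The column returned by `find_gap_x` is measured in screenshot pixels inside the cropped region. It may therefore need to be shifted by the piece's starting position, and scaled on high-DPI screens, before it is used as a drag distance.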
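```python
import time

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Assumes `driver` is a WebDriver that has already loaded the captcha page
# Locate slider element
slider = driver.find_element(By.ID, 'slider')
# Get element location and size
loc = slider.location
size = slider.size
left = int(loc['x'])
right = int(loc['x'] + size['width'])
# Capture screenshots before and after clicking the button that reveals the gap
before = driver.get_screenshot_as_png()
driver.find_element(By.ID, 'button').click()
time.sleep(0.5)
after = driver.get_screenshot_as_png()
# ... compute gap position by pixel comparison ...
# gap_x: horizontal distance in pixels from the slider's start to the gap
# Drag slider to gap
ActionChains(driver).click_and_hold(slider).move_by_offset(gap_x, 0).release().perform()
```
03 Grid Captcha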
Grid captchas present several shuffled image pieces that must be dragged in a specific order. By collecting templates of all possible pieces, one can perform template matching to identify each piece and its required sequence. After determining the order (e.g., 4→3→2→1), Selenium can automate the drag actions accordingly.
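Template matching for the grid pieces can be very simple. For each on-screen piece, pick the stored template with the smallest sum of absolute pixel differences. The sketch assumes that pieces and templates are equal-sized 2D lists of grey values. It also assumes each template is labelled with the position the piece must end up in (1, 2, 3, ...). These labels and the function names are hypothetical.

```python
def difference(a, b):
    """Sum of absolute pixel differences between two equal-sized grids."""
    return sum(
        abs(pa - pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)
    )


def identify_pieces(pieces, templates):
    """Label each on-screen piece with the key of its closest template.

    pieces: list of grids in on-screen order.
    templates: dict mapping target position -> template grid.
    """
    return [
        min(templates, key=lambda label: difference(piece, templates[label]))
        for piece in pieces
    ]


def drag_order(labels):
    """Return 1-based screen slots sorted by each piece's target position."""
    return [slot + 1 for slot, _ in sorted(enumerate(labels), key=lambda item: item[1])]
```

`drag_order` returns the screen slots in the order they should be dragged. This is the kind of sequence (such as 4→3→2→1) that the Selenium drag actions then follow.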
04 Click‑Based Captcha
These captchas ask users to click specific characters or icons. The typical approach is to send the captcha image to a third‑party recognition service that returns the required click coordinates, then use Selenium to simulate the clicks.
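Recognition services usually return coordinates in the captcha image's own pixel space. The element on the page, however, may be rendered at a different size. The sketch below assumes a hypothetical response format `'x1,y1|x2,y2'`; real services each define their own. It converts the points into offsets from the centre of the rendered image element.

```python
def parse_points(response_text):
    """Parse a hypothetical 'x1,y1|x2,y2' string into (x, y) integer tuples."""
    points = []
    for pair in response_text.strip().split('|'):
        x, y = pair.split(',')
        points.append((int(x), int(y)))
    return points


def to_element_offsets(points, image_size, element_size):
    """Map image-pixel points to offsets from the rendered element's centre.

    image_size: (width, height) of the captcha image the service saw.
    element_size: (width, height) of the element as rendered on the page.
    """
    img_w, img_h = image_size
    el_w, el_h = element_size
    offsets = []
    for x, y in points:
        # Rescale from image pixels to rendered pixels
        sx = x * el_w / img_w
        sy = y * el_h / img_h
        # Move the origin from the top-left corner to the element centre
        offsets.append((round(sx - el_w / 2), round(sy - el_h / 2)))
    return offsets
```

Each offset pair can then be passed to `ActionChains(driver).move_to_element_with_offset(element, dx, dy).click().perform()`. Selenium 4 documents these offsets as relative to the element's in-view centre point, while older Selenium 3 drivers measured them from the top-left corner. Check which behaviour your version has.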
OCR Workflow
The OCR pipeline consists of preprocessing (grayscale conversion, binarization, and denoising), character segmentation, and recognition. pytesseract is only a wrapper, so the Tesseract‑OCR engine itself must be installed before pytesseract can be used.
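Vertical projection is one way to illustrate the segmentation stage. It scans the binarised image column by column and treats every run of columns that contains black pixels as one character. This is a minimal sketch with three assumptions. The input is a 2D list with 0 for black and 255 for white, which are the values Pillow reports for a mode `'1'` image. Characters do not touch each other. `min_width` is a hypothetical parameter for discarding one-column specks.

```python
def segment_columns(binary, min_width=2):
    """Split a binarised grid (0 = black, 255 = white) into character slices.

    Returns half-open column ranges (start, end) for each run of columns
    that contains at least one black pixel and is at least min_width wide.
    """
    height, width = len(binary), len(binary[0])
    has_ink = [any(binary[y][x] == 0 for y in range(height)) for x in range(width)]
    segments, start = [], None
    # A trailing False acts as a sentinel that closes a run touching the right edge
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None
    return segments
```

Each `(start, end)` pair can be cut out with `[row[start:end] for row in binary]` before matching.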
Preprocessing Example
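```python
from PIL import Image

im = Image.open('captcha.png')  # example path to the captcha image
# Convert to grayscale
im_gray = im.convert('L')
# Binarize
threshold = 140
table = [0 if i < threshold else 1 for i in range(256)]
im_bin = im_gray.point(table, '1')
```
Noise Removal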
Simple thresholding can eliminate most background noise. In a grayscale image, every pixel that is not pure black (value greater than zero) is set to white (255), so only the darkest strokes remain.
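Thresholding alone can leave isolated dark specks wherever a noise pixel happens to be as dark as the text. A common complement, not described in the original article, is to whiten any black pixel that has too few black neighbours. The sketch works on a binarised 2D list (0 = black, 255 = white). The name `remove_isolated_pixels` and the default `min_neighbours=2` are assumptions to tune per captcha.

```python
def remove_isolated_pixels(grid, min_neighbours=2):
    """Whiten black pixels (0) with fewer than min_neighbours black pixels
    among their 8 neighbours. Returns a new grid; the input is unchanged."""
    height, width = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 0:
                continue
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and grid[ny][nx] == 0:
                        neighbours += 1
            if neighbours < min_neighbours:
                cleaned[y][x] = 255
    return cleaned
```

Counting is always done on the original grid, so removing one speck never changes the neighbour count of another pixel in the same pass.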
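```python
from PIL import Image

# Example path; convert to grayscale (mode 'L') so each pixel is a single value
img = Image.open('captcha.png').convert('L')
pixel_matrix = img.load()
for y in range(img.height):
    for x in range(img.width):
        # Any pixel that is not pure black becomes white
        if pixel_matrix[x, y] != 0:
            pixel_matrix[x, y] = 255
img.save('clean.png')
```
Building a Character Library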
After extracting individual characters, store multiple samples for each alphanumeric symbol (A‑Z, 0‑9). Choose the clearest sample (fewest black pixels) as the template for matching new captchas.
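Once each sample is a binarised 2D list (0 = black), selecting the clearest sample per character as described above reduces to a single step. The dictionary layout and the function names are assumptions for illustration.

```python
def black_pixel_count(grid):
    """Number of black (0) pixels in a binarised grid."""
    return sum(1 for row in grid for value in row if value == 0)


def build_library(samples):
    """samples: dict mapping a character to a list of binarised sample grids.

    Keeps, for each character, the sample with the fewest black pixels
    (the first one wins on ties).
    """
    return {char: min(grids, key=black_pixel_count) for char, grids in samples.items()}
```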
Recognition Algorithm
For a new captcha, apply the same preprocessing, segment it into character blocks, and compare each block against the template library. The character with the highest pixel‑match percentage is selected as the recognized result.
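The pixel-match percentage can be computed as the fraction of positions where a block and a template hold the same value. The article does not say how block and template sizes are aligned, so the sketch assumes every segmented block has already been resized to its template's dimensions. `library` is a hypothetical dict mapping each character to its template grid.

```python
def match_ratio(block, template):
    """Fraction of positions where two equal-sized grids agree."""
    total = agree = 0
    for row_b, row_t in zip(block, template):
        for pb, pt in zip(row_b, row_t):
            total += 1
            if pb == pt:
                agree += 1
    return agree / total if total else 0.0


def recognise(blocks, library):
    """Return the best-matching character for each block, joined as a string."""
    return ''.join(
        max(library, key=lambda ch: match_ratio(block, library[ch])) for block in blocks
    )
```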
By following these steps, a fully automated Python pipeline can solve a wide range of captchas.
Python Crawling & Data Mining
