Build a Serverless CAPTCHA Solver with CNN – Full Step‑by‑Step Tutorial

This article explains how to create a serverless CAPTCHA recognition service using a convolutional neural network (CNN) in Python, covering CAPTCHA types, data generation, model training, API implementation, front‑end integration, and deployment on Alibaba Cloud with detailed code examples.

Alibaba Cloud Native

Dec 31, 2020

Serverless computing has become a hot topic, and combining it with AI techniques like convolutional neural networks (CNN) enables powerful CAPTCHA recognition services.

Understanding CAPTCHAs

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is used to prevent automated attacks such as credential stuffing, voting fraud, and forum spam. CAPTCHAs have evolved from simple numeric codes to complex image‑based challenges, including sliding puzzles, click‑through tasks, and distorted text.

Simple CAPTCHA Recognition

Early CAPTCHAs can be solved by image binarization, segmentation, and character‑wise classification. The article shows how to generate synthetic alphanumeric CAPTCHAs, perform binary conversion, and split characters for individual recognition.

# coding:utf-8
import random
import numpy as np
from PIL import Image
from captcha.image import ImageCaptcha

CAPTCHA_LIST = [c for c in "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"]
CAPTCHA_LEN = 4  # length of the code
CAPTCHA_HEIGHT = 60
CAPTCHA_WIDTH = 160

randomCaptchaText = lambda char=CAPTCHA_LIST, size=CAPTCHA_LEN: "".join([random.choice(char) for _ in range(size)])

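def genCaptchaTextImage(width=CAPTCHA_WIDTH, height=CAPTCHA_HEIGHT, save=None):
    """Return (text, image array). `save` may be True (writes ./img/<text>.jpg) or a file path."""
    image = ImageCaptcha(width=width, height=height)
    captchaText = randomCaptchaText()
    # Render once so the saved file and the returned array are the same image
    captchaImage = Image.open(image.generate(captchaText))
    if save:
        # The target directory must already exist
        path = save if isinstance(save, str) else f'./img/{captchaText}.jpg'
        captchaImage.convert('RGB').save(path)
    return captchaText, np.array(captchaImage)

# Guarded so that importing this module from the training script does not generate an image
if __name__ == '__main__':
    print(genCaptchaTextImage(save=True))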
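
The binarization and segmentation steps described above can be sketched on a tiny grayscale grid. This is a minimal illustration, not the article's code: the function names binarize and splitColumns and the threshold of 128 are assumptions, and plain Python lists stand in for the NumPy arrays a real pipeline would use.

```python
def binarize(gray, threshold=128):
    """Map a 2-D grid of 0-255 gray values to 1 (ink) / 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def splitColumns(binary):
    """Return (start, end) column ranges that contain ink, end exclusive."""
    width = len(binary[0])
    hasInk = [any(row[c] for row in binary) for c in range(width)]
    segments, start = [], None
    for c, ink in enumerate(hasInk):
        if ink and start is None:
            start = c                      # a character begins
        elif not ink and start is not None:
            segments.append((start, c))    # a blank column ends it
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Two dark "characters" separated by a blank column
gray = [
    [255,  10,  10, 255,  20, 255],
    [255,  10, 255, 255,  20, 255],
]
binary = binarize(gray)
segments = splitColumns(binary)
```

Each column range can then be cropped and passed to a per-character classifier. This only works when characters do not touch or overlap, which is why the approach breaks down on distorted CAPTCHAs.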

CNN‑Based CAPTCHA Recognition

CNNs reduce the need for manual feature extraction and handle distorted, noisy images more robustly. The article describes a CNN with three convolution‑and‑pooling stages followed by a fully‑connected layer and an output layer, explains convolution, pooling, and fully‑connected layers, and highlights advantages such as parameter sharing and translation invariance.

import tensorflow.compat.v1 as tf
from datetime import datetime
from util import getNextBatch
from captcha_gen import CAPTCHA_HEIGHT, CAPTCHA_WIDTH, CAPTCHA_LEN, CAPTCHA_LIST

tf.disable_eager_execution()  # tf is already the compat.v1 namespace

variable = lambda shape, alpha=0.01: tf.Variable(alpha * tf.random_normal(shape))
conv2d = lambda x, w: tf.nn.conv2d(x, w, strides=[1,1,1,1], padding='SAME')
maxPool2x2 = lambda x: tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
optimizeGraph = lambda y, y_conv: tf.train.AdamOptimizer(1e-3).minimize(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_conv)))

def cnnGraph(x, keepProb, size, captchaList=CAPTCHA_LIST, captchaLen=CAPTCHA_LEN):
    imageHeight, imageWidth = size
    xImage = tf.reshape(x, shape=[-1, imageHeight, imageWidth, 1])
    hDrop1 = maxPool2x2(tf.nn.relu(conv2d(xImage, variable([3,3,1,32]))) )
    hDrop2 = maxPool2x2(tf.nn.relu(conv2d(hDrop1, variable([3,3,32,64]))) )
    hDrop3 = maxPool2x2(tf.nn.relu(conv2d(hDrop2, variable([3,3,64,64]))) )
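    # Fully connected layer: the three 2x2 poolings shrink the feature map,
    # so read its size from the tensor instead of reusing the input size
    poolHeight, poolWidth = int(hDrop3.shape[1]), int(hDrop3.shape[2])
    fc_input = tf.reshape(hDrop3, [-1, poolHeight*poolWidth*64])
    wFc = variable([poolHeight*poolWidth*64, 1024])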
    bFc = variable([1024])
    hFc = tf.nn.relu(tf.matmul(fc_input, wFc) + bFc)
    hDropFc = tf.nn.dropout(hFc, keepProb)
    # Output layer
    wOut = variable([1024, len(captchaList)*captchaLen])
    bOut = variable([len(captchaList)*captchaLen])
    return tf.matmul(hDropFc, wOut) + bOut
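
To see what size the fully connected layer actually receives, the effect of the three pooling steps can be worked out by hand. The sketch below assumes 'SAME' padding with stride 2, where each pooling yields ceil(n / 2); pooledSize is a hypothetical helper name.

```python
import math


def pooledSize(n, layers=3, stride=2):
    """Spatial size after `layers` poolings with 'SAME' padding: ceil(n / stride) each time."""
    for _ in range(layers):
        n = math.ceil(n / stride)
    return n


height, width, channels = 60, 160, 64
pooledHeight = pooledSize(height)   # 60 -> 30 -> 15 -> 8
pooledWidth = pooledSize(width)     # 160 -> 80 -> 40 -> 20
flatSize = pooledHeight * pooledWidth * channels

# Parameter sharing: a 3x3 convolution from 1 to 32 channels needs only
# 3*3*1*32 weights, independent of the image size.
convWeights = 3 * 3 * 1 * 32
print(pooledHeight, pooledWidth, flatSize, convWeights)
```

For a 60×160 input, the flattened feature vector therefore has 10240 elements, and the first convolution has 288 weights no matter how large the image is.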

Training the Model

The training script creates batches of synthetic CAPTCHAs, converts images to grayscale, flattens them, and feeds them to the CNN. Accuracy is evaluated on a fresh batch of 100 samples every 100 steps. As written, the listing saves a checkpoint the first time that accuracy exceeds accRate (0.95) and then stops training.
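
getNextBatch is imported from util but not shown in this article. A minimal sketch of the preprocessing it presumably performs follows; the names toGray, text2Vec, and vec2Text are assumptions, the grayscale step is a plain channel mean, and lists are used in place of NumPy arrays to keep the example dependency-free.

```python
CAPTCHA_LIST = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
CAPTCHA_LEN = 4


def toGray(rgbImage):
    """Average the R, G, B channels of each pixel and scale to [0, 1]; returns a flat list."""
    return [sum(px) / 3 / 255.0 for row in rgbImage for px in row]


def text2Vec(text, charset=CAPTCHA_LIST, length=CAPTCHA_LEN):
    """One-hot encode each character into one flat vector of length * len(charset)."""
    if len(text) != length:
        raise ValueError(f"expected {length} characters, got {len(text)}")
    vec = [0.0] * (length * len(charset))
    for i, ch in enumerate(text):
        vec[i * len(charset) + charset.index(ch)] = 1.0
    return vec


def vec2Text(vec, charset=CAPTCHA_LIST):
    """Invert text2Vec: pick the highest-scoring class in each character slot."""
    n = len(charset)
    slots = [vec[i:i + n] for i in range(0, len(vec), n)]
    return "".join(charset[slot.index(max(slot))] for slot in slots)


label = text2Vec("aB3z")
pixels = toGray([[(255, 255, 255), (0, 0, 0)]])
```

The 248-element label (62 classes × 4 positions) matches ySize in the training script, and a flattened grayscale image matches the height*width placeholder.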

def train(height=CAPTCHA_HEIGHT, width=CAPTCHA_WIDTH, ySize=len(CAPTCHA_LIST)*CAPTCHA_LEN):
    accRate = 0.95
    x = tf.placeholder(tf.float32, [None, height*width])
    y = tf.placeholder(tf.float32, [None, ySize])
    keepProb = tf.placeholder(tf.float32)
    yConv = cnnGraph(x, keepProb, (height, width))
    optimizer = optimizeGraph(y, yConv)
    accuracy = accuracyGraph(y, yConv)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        step = 0
        while True:
            batchX, batchY = getNextBatch(64)
            sess.run(optimizer, feed_dict={x: batchX, y: batchY, keepProb: 0.75})
            if step % 100 == 0:
                batchXTest, batchYTest = getNextBatch(100)
                acc = sess.run(accuracy, feed_dict={x: batchXTest, y: batchYTest, keepProb: 1.0})
                print(datetime.now().strftime('%c'), ' step:', step, ' accuracy:', acc)
                if acc > accRate:
                    modelPath = "./model/captcha.model"
                    saver.save(sess, modelPath, global_step=step)
                    accRate += 0.01
                    # accRate starts at 0.95, so this check passes on the first save and training stops here
                    if accRate > 0.90:
                        break
            step += 1

train()
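
The script calls accuracyGraph, which is not shown here. One plausible definition, sketched below in plain Python rather than as a TensorFlow graph, is per-character accuracy: split each output vector into one slot per character, take the argmax of every slot, and count how many positions match the label. charAccuracy, argmaxSlots, and their arguments are assumptions for illustration.

```python
def argmaxSlots(vec, numClasses):
    """Index of the largest value in each consecutive block of numClasses entries."""
    return [
        max(range(numClasses), key=lambda k: vec[start + k])
        for start in range(0, len(vec), numClasses)
    ]


def charAccuracy(labels, logits, numClasses):
    """Fraction of character positions where the predicted class equals the label."""
    correct = total = 0
    for y, yConv in zip(labels, logits):
        for want, got in zip(argmaxSlots(y, numClasses), argmaxSlots(yConv, numClasses)):
            correct += want == got
            total += 1
    return correct / total


# Toy example: 2 characters, 3 classes each
labels = [[1, 0, 0, 0, 0, 1]]
logits = [[2.5, 0.1, -1.0, 0.3, 0.9, 0.2]]  # first slot right, second slot wrong
acc = charAccuracy(labels, logits, numClasses=3)
```

Under this definition the metric counts characters, not whole CAPTCHAs. With four characters at 90% per-character accuracy, the chance of getting an entire code right would be roughly 0.9⁴ ≈ 0.66 if errors were independent.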

Serverless API Integration

A lightweight WSGI handler exposes two operations, selected by the type field of the JSON request body: get_captcha returns a base64‑encoded CAPTCHA image, and get_text accepts a base64‑encoded image, runs the trained model, and returns the predicted text. Inference uses the same TensorFlow graph as training.

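import base64
import io
import json
import uuid

import numpy as np
from PIL import Image
# Assumes the generator listing is saved as captcha_gen.py, as the training script's import suggests
from captcha_gen import genCaptchaTextImage
# convert2Gray and captcha2Text are the project's preprocessing and inference helpers (not shown here)

class Response:
    def __init__(self, start_response, response, errorCode=None):
        self.start = start_response
        body = {'Error': {'Code': errorCode, 'Message': response}} if errorCode else {'Response': response}
        body['ResponseId'] = str(uuid.uuid1())
        self.response = json.dumps(body)
    def __iter__(self):
        status = '200 OK'  # WSGI expects a status code plus reason phrase
        headers = [('Content-type', 'application/json; charset=UTF-8')]
        self.start(status, headers)
        yield self.response.encode('utf-8')

def handler(environ, start_response):
    try:
        size = int(environ.get('CONTENT_LENGTH', 0))
    except ValueError:
        size = 0
    try:
        requestBody = json.loads(environ['wsgi.input'].read(size).decode('utf-8'))
    except ValueError:
        # Error codes here are illustrative
        return Response(start_response, 'Request body must be JSON', errorCode='InvalidRequest')
    reqType = requestBody.get('type')
    if reqType == 'get_captcha':
        text, img = genCaptchaTextImage()
        buf = io.BytesIO()
        Image.fromarray(img).save(buf, format='PNG')  # encode in memory instead of via a temp file
        data = base64.b64encode(buf.getvalue()).decode()
        return Response(start_response, {'image': data})
    if reqType == 'get_text':
        imgData = base64.b64decode(requestBody['image'])
        img = Image.open(io.BytesIO(imgData)).resize((160,60)).convert('RGB')
        img = np.asarray(img)
        img = convert2Gray(img).flatten() / 255.0
        result = captcha2Text([img])
        return Response(start_response, {'result': result})
    # Always return a WSGI iterable, even for unknown request types
    return Response(start_response, 'Unknown request type', errorCode='InvalidType')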
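
Before deploying, a WSGI callable like handler can be exercised locally by building the environ dictionary by hand. The sketch below is a hypothetical test harness, not part of the article's project: callWsgi is an assumed helper name, and echoApp is a stand-in application so the example runs without the model.

```python
import io
import json


def callWsgi(app, payload):
    """Invoke a WSGI app with a JSON POST body and return (status, decoded JSON response)."""
    body = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    # Joining the iterable drives generators that call start_response lazily
    chunks = b"".join(app(environ, start_response))
    return captured["status"], json.loads(chunks.decode("utf-8"))


def echoApp(environ, start_response):
    """Stand-in for the real handler: reports which request type it received."""
    size = int(environ.get("CONTENT_LENGTH", 0))
    request = json.loads(environ["wsgi.input"].read(size).decode("utf-8"))
    start_response("200 OK", [("Content-type", "application/json; charset=UTF-8")])
    return [json.dumps({"Response": {"type": request.get("type")}}).encode("utf-8")]


status, reply = callWsgi(echoApp, {"type": "get_captcha"})
```

Replacing echoApp with handler gives a quick check of both request types, since the Response object is iterable and calls start_response while it is being iterated.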

Front‑End Demo

A simple HTML page using the Bottle framework provides two buttons: one to fetch a new CAPTCHA image and another to submit the image for recognition. The page displays the generated image and the recognition result returned by the serverless API.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>CAPTCHA Recognition Demo</title>
  <script>
    var image;
    function getCaptcha(){
      var xhr = new XMLHttpRequest();
      xhr.open('GET','/get_captcha',true);
      xhr.onreadystatechange = function(){
        if(xhr.readyState==4 && xhr.status==200){
          image = JSON.parse(xhr.responseText).Response.image;
          document.getElementById('captcha').src = 'data:image/png;base64,'+image;
          document.getElementById('getResult').style.visibility='visible';
        }
      };
      xhr.setRequestHeader('Content-type','application/json');
      xhr.send();
    }
    function getCaptchaResult(){
      var xhr = new XMLHttpRequest();
      xhr.open('POST','/get_captcha_result',true);
      xhr.onreadystatechange = function(){
        if(xhr.readyState==4 && xhr.status==200){
          document.getElementById('result').innerText = 'Result: '+JSON.parse(xhr.responseText).Response.result;
        }
      };
      xhr.setRequestHeader('Content-type','application/json');
      xhr.send(JSON.stringify({image: image}));
    }
  </script>
</head>
<body>
  <img id="captcha" src=""/>
  <button onclick="getCaptcha()">Get CAPTCHA</button>
  <button id="getResult" style="visibility:hidden" onclick="getCaptchaResult()">Recognize</button>
  <p id="result"></p>
</body>
</html>
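
The page talks to two routes, /get_captcha and /get_captcha_result, which the Bottle service would need to translate into the get_captcha and get_text request types of the backend. The Bottle code is not shown in this article; the sketch below only illustrates that translation using the standard library. The backend URL is a placeholder, and ROUTE_TO_TYPE and buildBackendRequest are hypothetical names.

```python
import json
import urllib.request

BACKEND_URL = "https://example.invalid/captcha"  # placeholder, not a real endpoint

# Front-end route -> backend request type (assumed mapping)
ROUTE_TO_TYPE = {
    "/get_captcha": "get_captcha",
    "/get_captcha_result": "get_text",
}


def buildBackendRequest(route, image=None, url=BACKEND_URL):
    """Build (but do not send) the POST request the proxy would forward to the backend."""
    payload = {"type": ROUTE_TO_TYPE[route]}
    if image is not None:
        payload["image"] = image  # base64 string received from the browser
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = buildBackendRequest("/get_captcha_result", image="aGVsbG8=")
sent = json.loads(req.data.decode("utf-8"))
```

Sending the request is then a matter of urllib.request.urlopen(req) and passing the body back to the browser unchanged, so that the page can read Response.image or Response.result.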

Deployment with Serverless Devs

The project is packaged for Alibaba Cloud Function Compute using the Serverless Devs framework. A YAML configuration defines two components: the backend function (Python 3 runtime, 3072 MB memory) and the front‑end Bottle service. After running s deploy, the service is accessible via an HTTP endpoint.

Conclusion

Combining serverless infrastructure with a CNN‑based CAPTCHA solver creates a scalable, low‑maintenance service that can achieve around 90% accuracy on generated samples. While many CAPTCHA varieties exist, extending the dataset and model can address more complex challenges, making this approach valuable for automated data collection and security testing.
