Information Security 9 min read

Understanding CSS Sprites and Techniques to Bypass Sprite‑Based Anti‑Scraping

This article explains the concept and benefits of CSS sprites, analyzes their drawbacks for web performance and security, and provides a step‑by‑step Python‑based method—including code snippets—to extract and sum numbers hidden behind sprite images used as an anti‑scraping measure.

Python Programming Learning Circle

Web pages often load many small images, each requiring a separate HTTP request, which can increase latency and server load; combining these images into a single CSS sprite reduces the number of requests.

CSS sprites (also called CSS spritesheets) merge multiple icons or background images into one file and use background-position to display the required portion, but they introduce challenges such as increased memory usage and more complex CSS.

The main advantages of sprites are fewer HTTP requests and faster page loads, while disadvantages include wasted whitespace, scaling issues, maintenance difficulty, and larger CSS files.

A practical example is the site http://www.glidedsky.com/level/web/crawler-sprite-image-1 , where numbers are rendered via div elements with CSS background positions; the sprite image contains digits 0‑9, and the displayed number is determined by the background-position-x value.

To defeat this anti‑scraping technique, three approaches are suggested: (1) collect all background-position-x values and map them to digits; (2) download the sprite image and segment it according to the positions; (3) estimate digit width from the image size and compute digits via integer division.
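
Approach (1) can be sketched with the standard library alone. The snippet below is a minimal illustration rather than the site's actual markup: the sample CSS, the class names, and the helper `rank_offsets` are hypothetical, and it assumes that all ten digits occur on the page and that the sprite lays the digits out left to right from 0 to 9.

```python
import re

# Hypothetical offsets and CSS rules in the shape the article describes.
SAMPLE_OFFSETS = [0, 12, 25, 36, 48, 61, 73, 84, 97, 109]
SAMPLE_CSS = "\n".join(
    ".c{} {{ background-position-x:-{}px }}".format(i, x)
    for i, x in enumerate(SAMPLE_OFFSETS)
)


def rank_offsets(css: str) -> dict:
    """Map each distinct background-position-x offset to a digit by its rank."""
    found = re.findall(r"background-position-x:-?(\d+)px", css)
    offsets = sorted({int(value) for value in found})
    if len(offsets) != 10:
        # Without all ten digits the rank no longer identifies the digit.
        raise ValueError("expected 10 distinct offsets, found {}".format(len(offsets)))
    return {offset: str(digit) for digit, offset in enumerate(offsets)}


mapping = rank_offsets(SAMPLE_CSS)
print(mapping[25])  # "2"
```

Ranking the offsets avoids any assumption about digit width, but it fails loudly when a page happens not to show every digit.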

The first implementation step downloads the base64‑encoded sprite image, decodes it, saves it locally, and returns its width. The Python code is:

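import base64
import os
import uuid

from PIL import Image


def save_img(img_data):
    """save image in local directory
    :param img_data: image base64 data
    :return: width of image
    """
    # Data URIs use the standard base64 alphabet
    img = base64.b64decode(img_data)
    # open() fails if the target directory does not exist yet
    os.makedirs("./Images", exist_ok=True)
    filename = "{}.{}".format(uuid.uuid4(), "png")
    filepath = os.path.join("./Images", filename)
    with open(filepath, "wb") as f:
        f.write(img)
    # Close the image file once the width has been read
    with Image.open(filepath) as image:
        return image.width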
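
If saving the file is not needed, the width can also be read straight from the decoded bytes. This is an alternative sketch, not the article's method: it assumes the sprite is embedded as a `data:image/png;base64,...` URI and that the payload is a PNG, whose IHDR chunk stores the width as a big-endian 32-bit integer at byte offset 16. The helper names `extract_base64_png` and `png_width` are hypothetical.

```python
import base64
import re
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_base64_png(text: str) -> str:
    """Return the base64 payload of the first PNG data URI found in text."""
    match = re.search(r"data:image/png;base64,([A-Za-z0-9+/=]+)", text)
    if match is None:
        raise ValueError("no PNG data URI found")
    return match.group(1)


def png_width(img_data: str) -> int:
    """Read the width from the PNG header without writing a file."""
    raw = base64.b64decode(img_data)
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # IHDR layout: 4-byte width, then 4-byte height, both big-endian.
    (width,) = struct.unpack(">I", raw[16:20])
    return width


# Synthetic header only (not a complete image), enough to exercise the parser.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 130, 20)
css = 'background-image:url("data:image/png;base64,{}")'.format(
    base64.b64encode(header).decode()
)
print(png_width(extract_base64_png(css)))  # 130
```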

The next step extracts all background-position-x values using a regular expression:

re.findall(r"background-position-x:-?(\d+)px", html)

With the sprite width known, the average digit width is calculated (image width ÷ 10), and a mapping function translates positions to digits:

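def parse(num_list: list, gap: float):
    """translate position to digit
    :param num_list: number list (ints, or the strings returned by re.findall)
    :param gap: average gap between numbers
    :return: dict mapping position to digit
    """
    # int(num) lets the function accept the string matches from re.findall directly
    return {str(num): str(int(int(num) // gap)) for num in num_list}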
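
As a worked example of the integer-division idea, suppose (hypothetically) the sprite is 130 px wide, so the estimated digit width is 13 px. The offsets below are invented for illustration, and `offset_to_digit` is a hypothetical helper. The estimate only holds when the digits are spaced roughly evenly; with uneven glyph widths two offsets can fall into the same bucket, which the check at the end is meant to catch.

```python
def offset_to_digit(offset: int, sprite_width: int, digit_count: int = 10) -> int:
    """Estimate which digit starts at the given x offset in the sprite."""
    gap = sprite_width / digit_count  # estimated width of one digit
    return int(offset // gap)


sprite_width = 130  # hypothetical sprite width in pixels
offsets = [0, 13, 27, 40, 53, 66, 79, 92, 105, 118]  # invented sample offsets

digits = [offset_to_digit(x, sprite_width) for x in offsets]
print(digits)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Sanity check: distinct offsets should map to distinct digits.
if len(set(digits)) != len(offsets):
    raise ValueError("two offsets mapped to the same digit; the width estimate is too coarse")
```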

Finally, the digits are collected from the HTML, converted using the position‑digit dictionary, concatenated into three‑digit numbers, and summed:

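import re

from lxml import etree


def get_digits(html, pos_dict):
    """get digit according to the class and sum up the numbers
    :param html: html
    :param pos_dict: position to digit
    :return: None (prints result)
    """
    et = etree.HTML(html)
    # Each digit is a div whose first class name selects a background-position-x rule
    pos_classes = et.xpath('//*[@id="app"]/main/div[1]/div/div/div/div/div/@class')
    digits, d = [], ""
    for pos in pos_classes:
        # Every number is assumed to have three digits; flush once three are collected
        if len(d) == 3:
            digits.append(d)
            d = ""
        # Find the CSS rule for this class; escape the name so it is matched literally
        class_name = re.escape(pos.split(" ")[0])
        pos_x = re.findall(class_name + r" \{ background-position-x:-?(\d+)px \}", html)
        d = d + pos_dict[pos_x[0]]
    digits.append(d)
    result = sum(int(i) for i in digits)
    print("The result is : {}".format(result))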
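
The grouping and summing step can also be looked at in isolation. The sketch below is a variant, not the article's function: `group_and_sum` is a hypothetical helper that takes the already-translated digit characters, assumes (as the article does) that every number has exactly three digits, and returns the total instead of printing it so the result can be reused or tested.

```python
def group_and_sum(digit_chars, size=3):
    """Join digit characters into fixed-size numbers and return their sum."""
    if len(digit_chars) % size != 0:
        # A leftover group means a digit was missed or the size assumption is wrong.
        raise ValueError("digit count is not a multiple of {}".format(size))
    numbers = [
        int("".join(digit_chars[i:i + size]))
        for i in range(0, len(digit_chars), size)
    ]
    return sum(numbers)


# Invented sample: the digits of 123 followed by the digits of 045.
print(group_and_sum(["1", "2", "3", "0", "4", "5"]))  # 168
```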

By following these steps, the hidden numbers can be extracted and summed despite the sprite‑based anti‑scraping protection.
