Applying Image Recognition in UI Automation Testing with Sikuli
This article introduces the use of image‑recognition techniques, particularly the Sikuli tool, for UI automation testing. It covers typical scenarios, the underlying principles, and key functions such as find, click, wait, and type, along with example code, and discusses the advantages and limitations of this approach.
Introduction
When thinking of UI automation, most people first consider element‑based methods such as XPath, ID, or CSS selectors. However, many web and mobile scenarios require locating elements by their visual appearance, which traditional selectors cannot handle. This article explains how image recognition can be applied to testing, using the Sikuli tool.
Typical Scenarios for Image‑Recognition in Testing
During testing, capture screenshots of the application under test and use image‑recognition algorithms to detect predefined actionable controls; if found, trigger corresponding actions.
Validate test results by comparing screenshots of the actual UI with expected images, automatically determining pass/fail.
Perform performance testing, such as measuring response time in app testing, by comparing visual states.
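The second scenario above, comparing an actual screenshot against an expected image to decide pass/fail, can be sketched as a simple pixel‑level check. This is a minimal illustration in plain Python with NumPy, not Sikuli's own comparison API; the `tolerance` parameter and array shapes are assumptions for the example:

```python
import numpy as np

def screens_match(actual, expected, tolerance=0.01):
    """Pass/fail check: the fraction of differing pixels must stay under tolerance."""
    if actual.shape != expected.shape:
        return False
    diff_ratio = np.mean(actual != expected)  # fraction of mismatched pixels
    return bool(diff_ratio <= tolerance)

# Expected state: an all-black 100x100 grayscale "screenshot"
expected = np.zeros((100, 100), dtype=np.uint8)
actual = expected.copy()
actual[0, :5] = 255  # 5 of 10,000 pixels differ (0.05%) - within tolerance
print(screens_match(actual, expected))
```

A small tolerance matters in practice, since anti‑aliasing and rendering differences make exact pixel equality too brittle for real UIs.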
Principle
Sikuli scripts are written in Jython and use image‑recognition to simulate keyboard and mouse events, achieving UI‑level automation. The core consists of a Java library that wraps two main components:
java.awt.Robot – sends keyboard and mouse events to specific screen coordinates.
C++ engine (based on OpenCV) – searches for target images on the screen, providing coordinates to the Robot via JNI. On top of this low‑level layer, Sikuli offers a simple API for script writers.
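The search the OpenCV engine performs is essentially template matching: slide the reference image over the screenshot and score each position by similarity. The toy sketch below implements the idea with a sum‑of‑squared‑differences score in plain NumPy; Sikuli's actual engine uses OpenCV's optimized matching with a similarity threshold, so treat this only as an illustration of the principle:

```python
import numpy as np

def find_template(screen, template):
    """Return the (row, col) of the best match of template inside screen,
    scored by sum of squared pixel differences (lower is better)."""
    sh, sw = screen.shape
    th, tw = template.shape
    best_score, best_pos = None, None
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            window = screen[r:r + th, c:c + tw]
            score = np.sum((window - template) ** 2)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# A 6x6 "screen" with a distinctive 2x2 patch placed at row 3, column 2
screen = np.zeros((6, 6))
screen[3:5, 2:4] = [[1, 2], [3, 4]]
template = np.array([[1.0, 2.0], [3.0, 4.0]])
print(find_template(screen, template))  # (3, 2)
```

Once the engine reports these coordinates, the Robot layer can move the mouse there and fire click or keyboard events.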
Function Introduction
find(x)
Locate a single occurrence of the image x on the screen.
findAll(x)
Locate all occurrences of the image x on the screen.
wait(x, 10)
Wait up to 10 seconds for image x to appear in a specified region.
waitVanish(x, 10)
Wait up to 10 seconds for image x to disappear from the screen or region.
exists(x)
Check whether image x exists; returns None instead of throwing an exception if not found.
click(x)
Perform a left‑click on the best‑matching GUI component represented by image x.
doubleClick(x)
Double‑click on the best‑matching component.
rightClick(x)
Right‑click on the best‑matching component.
hover(x)
Move the mouse pointer over the best‑matching component.
dragDrop(x, y)
Drag the component matching image x and drop it onto the location matching image y.
type(x, "text")
Click the best match of image x, then send the string "text" to it as simulated keystrokes.
paste(x, "text")
Paste the string "text" into the element matched by x via the system clipboard; generally faster than type and more reliable for non‑ASCII text.
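The functions above can be combined into a short end‑to‑end script. The sketch below assumes a hypothetical login form; the .png names are reference screenshots you would capture yourself, and the script runs only inside the Sikuli IDE or Jython runtime, not as standalone Python:

```python
# Sikuli script sketch (hypothetical reference images)
wait("login_form.png", 10)                 # wait up to 10 s for the form to appear
if exists("username_field.png"):           # returns None instead of throwing if absent
    click("username_field.png")            # focus the username field
    type("username_field.png", "tester")
    type("password_field.png", "secret")
    click("login_button.png")
    waitVanish("login_form.png", 10)       # the form should disappear after login
```

Wrapping the interaction in exists() rather than find() keeps the script from aborting with an exception when the UI is not in the expected state.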
Code Example
A sample script for measuring response time can be found on the Sikuli productivity page (http://www.sikuli.org/productivity.html); the original article reproduced screenshots of the code used for such tests.
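The response‑time idea can be sketched in a few lines: trigger an action, then block until the expected visual state appears and measure the elapsed time. The image names below are hypothetical, and the script requires the Sikuli runtime:

```python
# Sikuli script sketch (hypothetical reference images)
import time

click("refresh_button.png")        # trigger the action to be measured
start = time.time()
wait("result_panel.png", 30)       # block until the expected state is visible
print("response time: %.2f s" % (time.time() - start))
```

The measurement includes Sikuli's own screen‑capture and matching overhead, so it is best suited for coarse, comparative timings rather than precise benchmarks.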
Advantages
Very simple – a screenshot is enough to start automating.
Effective for games or applications with custom UI controls that are hard to locate with traditional selectors.
Low learning curve; common functions are already wrapped and easy to use.
Open‑source, allowing custom extensions.
Can handle visual elements like Flash that lack DOM‑based identifiers.
Disadvantages
The screen must be unobstructed; any overlay will cause the target image to be missed.
Changing display resolution or moving to a different monitor requires new reference screenshots.
Cannot run in the background; the desktop must be visible and active.
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.