
How to Separate Complex Image Foreground from Background Using AI and Classic CV Techniques

This article presents a step‑by‑step solution that combines computer‑vision preprocessing, OCR, CNN classification, shape matching, and inpainting to isolate meaningful foreground elements from images with intricate backgrounds, discussing practical results, limitations, and code implementations.

Xianyu Technology

Nov 20, 2018

Background

Previous work introduced a UI‑automation goal: converting a single design image into code. The first step is to cut meaningful blocks (text, buttons, product images) out of the picture. Traditional slicing treats the whole picture as one image and loses structural information, especially when the background is complex. Industry solutions often rely on computer vision or AI (e.g., FCN+CRF), but these reach only about 80% accuracy, do not produce pixel‑level edges, require costly labeling, are hard to train, and behave like a black box. We therefore tested a hypothesis: UI foregrounds usually exhibit clear geometric features (regular shapes, presence of text, closed contours), so they can be separated without heavy AI.

Practical Results

Extensive testing of many CV algorithms showed that no single method works universally; each works only in specific scenarios and requires different parameters for varying color complexities. A case‑by‑case approach would become unmaintainable.

We therefore built a pipeline that:

Detects as many foreground regions as possible.

Filters out low‑confidence regions.

Assigns foreground‑background layers with a hierarchical allocator.

Repairs the image by filling blank background areas.

Below are the foreground‑search process (GIF) and the final layered separation result.

Logic Overview

Text Processing

OCR Rough Position

OCR provides approximate bounding boxes for text. For example, the left image is the Xianyu homepage and the right image shows OCR‑derived white boxes. OCR gives coarse positions but cannot separate individual characters; whole lines may be merged and non‑text elements (e.g., banner slogans) can be mistakenly recognized as text.

Segmentation and CNN Classifier

Each OCR‑detected region is cropped to the smallest possible image and fed to a TensorFlow CNN that decides whether the region is editable text or an image‑based graphic.

"""
    UI basic element recognition
"""
import tensorflow.compat.v1 as tf  # the snippet uses the TF 1.x graph/session API

tf.disable_eager_execution()

# Load label file (one class name per line)
with tf.gfile.GFile("AI_models/CNN/ui-elements-NN/tf_files/retrained_labels.txt") as f:
    ui_label_lines = [line.rstrip() for line in f]

# Load graph. g2 and ui_sess were not defined in the original snippet;
# creating them this way is an assumption.
g2 = tf.Graph()
with g2.as_default():
    with tf.gfile.GFile("AI_models/CNN/ui-elements-NN/tf_files/retrained_graph.pb", 'rb') as f:
        ui_graph_def = tf.GraphDef()
        ui_graph_def.ParseFromString(f.read())
        tf.import_graph_def(ui_graph_def, name='')
ui_sess = tf.Session(graph=g2)
ui_softmax_tensor = g2.get_tensor_by_name('final_result:0')

def ui_classify(image_path):
    """Print every class score and return (label, score) of the best class."""
    with tf.gfile.GFile(image_path, 'rb') as f:
        image_data = f.read()
    predictions = ui_sess.run(ui_softmax_tensor, {'DecodeJpeg/contents:0': image_data})
    scores = predictions[0]
    top_k = scores.argsort()[::-1]  # class indices, highest score first
    for node_id in top_k:
        print('%s (score = %s)' % (ui_label_lines[node_id], scores[node_id]))
    # Return after the loop; the original returned inside it, on the first iteration
    best = top_k[0]
    return ui_label_lines[best], scores[best]

Text Extraction

When the background is uniform, text regions are easy to extract. For complex backgrounds we evaluated Harris corners, Canny edges, SWT, and K‑means; K‑means gave the best results. The following code reshapes the gray region, runs K‑means, and reconstructs the segmented mask.

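import cv2
import numpy as np

K = 2  # assumed number of clusters (e.g., text vs. background); not given in the original
# gray_region: single-channel crop of the text region
Z = np.float32(gray_region.reshape((-1, 1)))  # one sample per pixel
# Stop after 10 iterations or once the requested accuracy (epsilon = 1.0) is reached
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
center = np.uint8(center)
res = center[label.flatten()]  # replace each pixel with its cluster center
res2 = res.reshape(gray_region.shape)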

Foreground Search

Enhance Edges, Suppress Non‑edges

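We convolve the original image with a Laplacian‑style kernel whose coefficients sum to zero. Flat areas therefore produce a response of zero, while pixels that differ from their neighbors (edges) produce a strong response.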

conv_kernel = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
]

Denoising

The convolved image is converted to grayscale, binarized, and small noisy components are removed using cv2.connectedComponentsWithStats().

Contour Search Based on Text Position

Using the top‑left corner of each OCR‑detected text block as a seed, we perform flood‑fill to obtain a region, then extract its external contour with cv2.findContours(). We test whether the contour encloses the text (via cv2.pointPolygonTest) to decide if it represents a valid foreground.

Determine Inner/Outer Contours

If the text lies inside a contour, the contour is expanded outward until the border is captured; otherwise the existing border is used directly.

Foreground Classifier

Define Valid Shapes

Three template shapes (square, rectangle, circle) are pre‑loaded. A contour is considered valid if its matchShapes score against any template is below a threshold (empirically < 3) and the contour contains text.

import os

import cv2

class ShapeDetector:  # class name is hypothetical; the original shows only the method bodies
    def __init__(self):
        # Load shape templates; each template image contains a single shape
        shape_dir = os.getcwd() + '/fgbgIsolation/utils/shapes/'
        self.circle = self._load_template(shape_dir + 'circle.png')
        self.square = self._load_template(shape_dir + 'square.png')
        self.rect = self._load_template(shape_dir + 'rect.png')

    @staticmethod
    def _load_template(path):
        img = cv2.imread(path, 0)  # 0 = read as grayscale
        # [-2] selects the contour list under both OpenCV 3.x (3 return values) and 4.x (2)
        contours = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
        return contours[0]

    def detect(self, cnt):
        """Return (shape name, score) for the first template scoring below the threshold."""
        shape = "unidentified"
        score = None
        templates = [self.square, self.rect, self.circle]
        names = ['square', 'rect', 'circle']
        for name, template in zip(names, templates):
            # lower score = more similar
            score = cv2.matchShapes(template, cnt, cv2.CONTOURS_MATCH_I1, 0.0)
            if score < 3:
                shape = name
                break
        return shape, score

Image Inpainting

Compute Overlap Regions

Only overlapping parts of layered foregrounds need repair. We compute the intersection of the current layer mask with masks of all higher layers using cv2.bitwise_and, then build a combined overlap mask.

import cv2
import numpy as np

# mask: mask of the current layer (layer index i)
# layers_merge: array holding, for every pixel, the index of the layer it belongs to
#               (as the comparison below implies)
# inpaint_mask: candidate repair mask computed earlier
# Pixels owned by any layer above the current one
UPPER_level_mask = np.where(layers_merge > i, 255, 0).astype(np.uint8)
# [-2] selects the contour list under both OpenCV 3.x and 4.x
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
overlaps_mask = np.zeros(mask.shape, np.uint8)
for cnt in contours:
    cnt_mask = np.zeros(mask.shape, np.uint8)
    cv2.drawContours(cnt_mask, [cnt], 0, 255, cv2.FILLED, cv2.LINE_AA)
    # Keep only the part of this contour that is covered by a higher layer
    overlap_mask = cv2.bitwise_and(inpaint_mask, cnt_mask, mask=UPPER_level_mask)
    overlaps_mask = cv2.bitwise_or(overlaps_mask, overlap_mask)
# Assign the computed overlap mask to the inpainting mask
inpaint_mask = overlaps_mask

Inpainting

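We call cv2.inpaint with the cv2.INPAINT_TELEA flag (Telea's fast‑marching method). It starts at the boundary of the masked area and works inward, estimating each pixel from its already‑known neighbors until the whole masked area is filled.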

# img: original image; inpaint_mask: mask from previous step
# dst: repaired image
dst = cv2.inpaint(img, inpaint_mask, 3, cv2.INPAINT_TELEA)

Extension

The presented computer‑vision‑centric, deep‑learning‑assisted pipeline works well for many UI screenshots, but challenging cases with high‑contrast edges, heavy noise, or indistinct contours still leave room for improvement.
