How Precise Customer‑Flow Algorithms Transform Retail with AI Vision

This article explains how AI‑driven precise customer‑flow algorithms—covering pedestrian detection, full‑scene tracking, and person re‑identification—enable accurate offline traffic analysis, real‑time shopper profiling, and data‑driven store management for modern retail environments.

Suning Technology

On September 30, Suning Retail Technology Research Institute algorithm engineer Cai Zhongqiang presented the talk “Implementation of Precise Customer‑Flow Algorithms,” analyzing how video image data captured in offline retail scenes can be continuously tracked to recognize user attributes and behaviors.

Application Scenarios and Value of Precise Customer‑Flow in Store Digitization

In the era where traffic is king, merely attracting visitors does not guarantee conversion or revenue. Precise offline traffic measurement—such as shopper counts, new vs. returning customer ratios, and dwell time—provides core metrics for operators to formulate scientific strategies, reduce costs, and increase profitability. The rise of unmanned retail and visual technologies has sparked a revolution in offline shopping models, enabling digital modeling of the “people‑goods‑place” ecosystem.

The precise‑customer‑flow algorithm framework consists of three technical components: target detection, full‑scene multi‑target tracking, and pedestrian re‑identification (ReID). Detection supplies inputs for tracking; tracking provides real‑time localization of shoppers throughout the store; ReID links identities across frames and cameras.

Full‑scene tracking combines detection outputs to generate accurate shopper profiles, which can be merged with online data to support decision‑making. The three technical pillars unlock substantial business value in offline scenarios, creating a win‑win situation for operators and consumers.

Target Detection Technology Challenges and Model Design

Precise customer‑flow requires three types of information: spatial location (person detection), attribute estimation (appearance, age, etc.), and behavior recognition (shopping actions). A top‑down approach first detects pedestrians with bounding boxes, then extracts cropped human images for attribute and behavior analysis.
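The top‑down flow described above can be sketched as follows. This is a minimal illustration with hypothetical detector and classifier interfaces, not Suning's actual models:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

@dataclass
class ShopperObservation:
    box: Box
    attributes: dict   # e.g. {"age_band": "25-34"}
    behavior: str      # e.g. "browsing"

def analyze_frame(frame, detect: Callable, classify_attrs: Callable,
                  recognize_behavior: Callable) -> List[ShopperObservation]:
    """Top-down analysis: detect people first, then run per-crop models."""
    observations = []
    for box in detect(frame):                      # stage 1: person detection
        x1, y1, x2, y2 = box
        crop = frame[y1:y2, x1:x2]                 # stage 2: crop the person
        observations.append(ShopperObservation(    # stage 3: per-person models
            box=box,
            attributes=classify_attrs(crop),
            behavior=recognize_behavior(crop)))
    return observations
```

The key design point is that attribute and behavior models never see the full frame, only person crops, which keeps them simple and reusable across stores.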

Traditional detection methods relied on handcrafted features (Haar‑like, HOG, LBP) and classifiers (SVM, AdaBoost) with multi‑stage pipelines (region proposal → feature extraction → classification → NMS). The deep‑learning era introduced anchor‑based models (YOLOv4, PAA) and anchor‑free models (FCOS, CenterNet), dramatically improving accuracy at the cost of higher GPU consumption.
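The non‑maximum suppression step at the end of that classic pipeline can be written in a few lines. This is the standard greedy IoU‑based formulation, not any particular library's implementation:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping rivals."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```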

Detection performance can be degraded by low resolution, poor lighting, fast motion, and occlusion, leading to missed or false detections. To mitigate occlusion, a double‑anchor model simultaneously detects head boxes and full‑body boxes, allowing joint NMS to boost recall and reduce errors.

Our optimizations include: (1) converting a two‑stage detector into a single‑stage architecture; (2) integrating the double‑anchor concept for joint head‑body detection; (3) applying a generalized focal loss to handle boundary uncertainty caused by occlusion. These improvements substantially alleviate missed and false detections in real store environments.
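The third optimization replaces hard 0/1 classification targets with soft quality targets. A minimal sketch of the quality‑focal‑loss form used in generalized focal loss, assuming a soft IoU‑quality label y in [0, 1] (the hyperparameters shown are the common defaults, not Suning's settings):

```python
import math

def quality_focal_loss(pred, target, beta=2.0, eps=1e-12):
    """Quality focal loss for one prediction.

    pred   -- sigmoid output in (0, 1)
    target -- soft quality label in [0, 1] (e.g. IoU with ground truth)
    beta   -- focusing parameter; beta=2 is the common default
    """
    # cross-entropy against the soft target, scaled by how far off we are
    ce = -(target * math.log(pred + eps)
           + (1.0 - target) * math.log(1.0 - pred + eps))
    return abs(target - pred) ** beta * ce
```

Because the modulating factor |y − σ|^β vanishes when the prediction matches the quality target, confidently correct boxes contribute little, and learning focuses on ambiguous, often occluded, examples.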

Algorithmic Solution for Full‑Scene Tracking

Full‑scene tracking corresponds to Multi‑Camera Multi‑Target Tracking (MTMC). Each camera runs a MOT algorithm to generate per‑frame trajectories (IDs). Cross‑camera association is achieved by extracting robust feature vectors for each person (e.g., clothing color, body shape) and matching them across cameras, effectively solving the ReID problem.
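Cross‑camera association can be illustrated with a greedy cosine‑similarity matcher over per‑track appearance features. Real MTMC systems use stronger learned embeddings and global optimization; this is a deliberately simplified sketch with an illustrative threshold:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_across_cameras(feats_cam_a, feats_cam_b, threshold=0.7):
    """Greedily pair tracklets from two cameras by appearance similarity.

    feats_cam_a/b -- {track_id: feature vector} from each camera's MOT output
    Returns a list of (id_a, id_b) pairs judged to be the same person.
    """
    candidates = sorted(
        ((cosine_sim(fa, fb), ia, ib)
         for ia, fa in feats_cam_a.items()
         for ib, fb in feats_cam_b.items()),
        reverse=True)
    pairs, used_a, used_b = [], set(), set()
    for sim, ia, ib in candidates:
        if sim < threshold:
            break                      # remaining pairs are even weaker
        if ia not in used_a and ib not in used_b:
            pairs.append((ia, ib))
            used_a.add(ia)
            used_b.add(ib)
    return pairs
```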

Breakthroughs in Person Re‑Identification

ReID aims to determine whether a query (probe) and a gallery entry depict the same individual. For image inputs, CNNs extract feature vectors; for video inputs, CNN+LSTM pipelines are used. Suning’s research introduced a Multi‑Scale Body‑Part Mask‑Guided Attention model built on ResNet‑50, employing dual mask‑guided attention modules and a combined classification‑triplet loss to learn discriminative, robust person embeddings.
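The combined classification‑triplet objective mentioned above can be sketched in numpy. The margin and weighting values here are illustrative, not Suning's training configuration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull same-ID embeddings together, push different-ID ones apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def combined_reid_loss(anchor, positive, negative, logits, label,
                       margin=0.3, weight=1.0):
    """Softmax cross-entropy over person IDs plus a weighted triplet term."""
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    ce = -np.log(probs[label] + 1e-12)
    return ce + weight * triplet_loss(anchor, positive, negative, margin)
```

The classification term teaches identity‑discriminative features, while the triplet term directly shapes the embedding space used for gallery matching.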

Because pedestrian appearance changes (clothing, pose) and lacks stable features, the ReID gallery must be built dynamically. Suning proposes a patented dynamic gallery construction method that evaluates detection quality, camera source, and orientation before adding a new feature vector, preventing gallery contamination and improving tracking accuracy.
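The gating idea behind dynamic gallery construction can be sketched as a simple admission check. The thresholds and fields below are illustrative; the actual criteria of the patented method are not detailed in the talk:

```python
def should_add_to_gallery(candidate, gallery, min_quality=0.8,
                          max_per_camera=3):
    """Admit a feature vector only if it is trustworthy and adds coverage.

    candidate -- dict with 'quality' (detection confidence), 'camera',
                 'orientation' ('front'/'back'/'side'), and 'feature'
    gallery   -- list of previously admitted entries for one person
    """
    if candidate["quality"] < min_quality:       # reject blurry/occluded crops
        return False
    same_cam = [g for g in gallery if g["camera"] == candidate["camera"]]
    if len(same_cam) >= max_per_camera:          # avoid one camera dominating
        return False
    # skip orientations this camera has already contributed
    if any(g["orientation"] == candidate["orientation"] for g in same_cam):
        return False
    return True
```

Rejecting low‑quality or redundant entries is what prevents gallery contamination: one bad feature vector can otherwise cause identity switches for the rest of a shopper's visit.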

How Precise Customer‑Flow Algorithms Empower Store Digitization

High‑precision real‑time tracking enables detailed shopper profiling, heat‑map generation, and trajectory analysis. Combined with attribute recognition, operators can make data‑driven decisions on product placement, route planning, and personalized recommendations. The technology also supports event monitoring (e.g., theft detection), VR fitting mirrors, and identity‑based services (VIP, staff) through facial and pose analysis.
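As one concrete example, a store heat map is essentially a 2D histogram of tracked floor positions. Grid size and coordinate units below are illustrative:

```python
def trajectory_heatmap(trajectories, grid_w, grid_h, cell=1.0):
    """Accumulate dwell counts on a floor grid from per-shopper tracks.

    trajectories -- iterable of tracks, each a list of (x, y) floor positions
    cell         -- grid cell size in the same units as the positions
    Returns a grid_h x grid_w nested list of visit counts.
    """
    grid = [[0] * grid_w for _ in range(grid_h)]
    for track in trajectories:
        for x, y in track:
            col, row = int(x // cell), int(y // cell)
            if 0 <= col < grid_w and 0 <= row < grid_h:
                grid[row][col] += 1
    return grid
```

Hot cells reveal where shoppers linger, which feeds directly into the product‑placement and route‑planning decisions described above.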

By integrating offline visual data with online user profiles, Suning replicates its “thousands of faces” recommendation engine in unmanned stores, achieving a seamless end‑to‑end shopping experience across channels.

Tags: object detection, person re-identification, retail AI, customer flow, multi-camera tracking