Alibaba's AI-Driven In-Store Foot Traffic Digitization

Alibaba's search division shows how AI can digitize in-store foot traffic in traditional retail. The approach combines camera-based person detection, re-identification, RFID-enhanced product interaction detection, and edge-optimized models to produce real-time customer insights, heatmaps, and personalized recommendations that bridge offline and online shopping.

Alibaba Cloud Developer

Jan 29, 2019

Overall Solution

The solution integrates off‑site product selection guidance, in‑store traffic acquisition, customer profiling, trajectory tracking, interaction data collection, smart mirror recommendations, and post‑visit online re‑engagement, forming a full‑process pipeline from offline to online.

In‑Store Foot Traffic Digitization

The hardware consists of existing surveillance cameras and RFID tags, combining visual and radio-frequency sensing, with GPU edge terminals for computation. The system applies face recognition to infer basic attributes (gender, age, new or returning customer), and uses pedestrian detection with cross-camera re-identification to map movement and generate heatmaps of store zones. Fusing camera and RFID data captures actions such as browsing and trying on garments, so that product interactions can be counted accurately.
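As a rough illustration of the heatmap step, the sketch below bins tracked positions into a grid of store cells. It assumes the detections have already been projected from image coordinates onto floor-plan coordinates in metres, which the article does not describe; the function name build_heatmap, the cell size, and the sample points are all hypothetical.

```python
import math

def build_heatmap(points, width_m, height_m, cell_m=1.0):
    """Count trajectory samples per grid cell; row index grows with y."""
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(height_m / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Ignore samples that fall outside the store floor plan.
        if 0 <= x < width_m and 0 <= y < height_m:
            grid[int(y // cell_m)][int(x // cell_m)] += 1
    return grid

# Hypothetical floor-plane samples (metres) from tracked customers.
samples = [(0.5, 0.5), (0.7, 0.2), (2.1, 0.9), (2.9, 1.5), (9.0, 9.0)]
heatmap = build_heatmap(samples, width_m=3, height_m=2)
```

Cells with higher counts correspond to zones where customers linger or pass more often; a real deployment would likely weight samples by dwell time rather than by raw count.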

Pedestrian Detection

Real-time detection must run on low-cost hardware, and standard YOLO models are too heavy for that. A lightweight model based on Tiny DarkNet with a feature pyramid network (FPN) and Spatial Pyramid Pooling is about 1/10 the size of YOLOv3 while losing less than 2% AP. It achieves a 268% FPS increase on a Tesla P4 and supports 16 simultaneous video streams on a GTX 1070.
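The article does not say which Spatial Pyramid Pooling variant the model uses. The sketch below assumes the variant common in detection networks: stride-1 max pooling at several kernel sizes, with the pooled maps stacked alongside the input as extra channels. It operates on a single-channel feature map stored as nested lists; the kernel sizes and the names max_pool_same and spp_block are illustrative.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling; windows are clipped at the border so size is kept."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out

def spp_block(fmap, kernel_sizes=(3, 5)):
    """Return the input plus one pooled map per kernel size, stacked as channels."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]

feature_map = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 2],
]
channels = spp_block(feature_map)
```

Each pooled channel summarises a larger neighbourhood than the previous one, which gives the detector multi-scale context without adding learned parameters.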

Knowledge Distillation Optimization

Knowledge distillation transfers soft targets from a teacher network (YOLOv3) to a student network, using hint layers and selective loss on uncertain predictions to improve convergence without increasing model complexity.
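The soft-target part of this idea can be sketched for a single classification output as follows. This is a minimal illustration in the style of Hinton et al., not the article's detection-specific loss: it omits hint layers and the selective treatment of uncertain predictions, and the temperature and alpha values are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * hard-label cross-entropy."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard

teacher = [3.0, 0.5]        # hypothetical teacher logits (person, background)
good_student = [3.0, 0.5]   # student that already matches the teacher
poor_student = [0.1, 2.0]   # student that disagrees with the teacher
loss_good = distillation_loss(good_student, teacher, label=0)
loss_poor = distillation_loss(poor_student, teacher, label=0)
```

The student that matches the teacher pays only the hard-label term, while the disagreeing student is penalised by both terms.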

Pedestrian Re‑Identification

Re‑identification links the same person across multiple cameras. Global features struggle with pose, occlusion, and lighting variations, so a multi‑scale local feature fusion network is used, combining local and global cues to improve Rank@1 on Market1501 from 89.9% (global only) to 96.19% (fused).
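To make the fusion idea concrete, the sketch below concatenates a normalised global feature with normalised local (body-part) features and evaluates Rank@1 by nearest-neighbour search over a gallery. The tiny 2-D vectors are made up for illustration, and the article's actual network and fusion rule may differ; fuse and rank1_accuracy are hypothetical names.

```python
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def fuse(global_feat, local_feats):
    """Concatenate the normalised global vector with each normalised local vector."""
    fused = l2_normalize(global_feat)
    for part in local_feats:
        fused += l2_normalize(part)
    return l2_normalize(fused)

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # inputs are unit length

def rank1_accuracy(queries, gallery):
    """queries and gallery are lists of (person_id, feature) pairs."""
    hits = 0
    for qid, qfeat in queries:
        best_id, _ = max(gallery, key=lambda item: cosine(qfeat, item[1]))
        hits += best_id == qid
    return hits / len(queries)

# Two people whose global features are nearly identical but whose parts differ.
gallery = [
    ("A", fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])),
    ("B", fuse([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0]])),
]
queries = [
    ("A", fuse([1.0, 0.05], [[0.9, 0.1], [0.1, 0.9]])),
    ("B", fuse([1.0, 0.05], [[0.1, 0.9], [0.9, 0.1]])),
]
accuracy = rank1_accuracy(queries, gallery)
```

In this toy data the two queries share the same global feature, so only the local parts can separate them, which mirrors the motivation for fusing local and global cues.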

Cross‑Dataset Re‑Identification Exploration

To adapt models to new stores without per‑store labeling, a spatio‑temporal mixture model combines visual features with camera transition probabilities, leveraging typical customer movement patterns (e.g., entering through entrance camera, moving between adjacent zones).
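One simple way to picture this combination is to scale each candidate's visual similarity by the probability of moving between the two cameras involved. The sketch below is a deliberate simplification: it ignores the time gap between sightings and uses a plain product with a floor value, whereas the article's mixture model is not specified in detail. The camera names, probabilities, and track IDs are hypothetical.

```python
def fused_score(visual_sim, cam_from, cam_to, transition_prob, floor=0.05):
    """Scale visual similarity by how plausible the camera-to-camera move is."""
    prior = transition_prob.get((cam_from, cam_to), 0.0)
    # The floor keeps unseen transitions from zeroing out a strong visual match.
    return visual_sim * max(prior, floor)

# Hypothetical transition probabilities estimated from typical movement patterns.
transition_prob = {
    ("entrance", "zone_a"): 0.6,
    ("entrance", "zone_c"): 0.05,
    ("zone_a", "zone_b"): 0.7,
}

# Candidate matches for a person last seen at the entrance camera.
candidates = [
    {"track": "t17", "camera": "zone_c", "visual_sim": 0.82},
    {"track": "t21", "camera": "zone_a", "visual_sim": 0.78},
]
best = max(
    candidates,
    key=lambda c: fused_score(c["visual_sim"], "entrance", c["camera"], transition_prob),
)
```

Here the candidate with slightly lower visual similarity wins because the move from the entrance to the adjacent zone is far more plausible.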

Person‑Product Interaction Detection

Combining vision with RFID, the system detects when a customer picks up or flips a product. The RFID reader reports RSSI and phase values for each tag; changes in these signals are analyzed via the Jensen-Shannon (JS) divergence of their frequency distributions to infer motion, achieving 94% accuracy versus 91.9% when using raw RSSI/phase.
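The JS-divergence comparison can be sketched as follows: build a histogram of readings for a reference window and for the current window, then measure how far apart the two distributions are. The bin layout, the sample phase readings, and the 0.3 threshold are assumptions for illustration only.

```python
import math

def histogram(values, lo, bin_width, n_bins):
    """Normalised frequency distribution of readings over fixed-width bins."""
    counts = [0] * n_bins
    for v in values:
        idx = min(max(int((v - lo) // bin_width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint inputs."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical phase readings (radians) for one tag.
resting = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00]
still_resting = [1.01, 0.99, 1.00, 1.02, 1.00, 0.98]
picked_up = [0.4, 1.9, 2.6, 0.9, 3.1, 1.5]

bins = dict(lo=0.0, bin_width=0.5, n_bins=8)
d_static = js_divergence(histogram(resting, **bins), histogram(still_resting, **bins))
d_moving = js_divergence(histogram(resting, **bins), histogram(picked_up, **bins))
MOTION_THRESHOLD = 0.3  # illustrative value, would be tuned on labelled data
is_moving = d_moving > MOTION_THRESHOLD
```

Comparing distributions rather than individual readings makes the decision less sensitive to single noisy samples, which is consistent with the accuracy gain the article reports over raw RSSI/phase.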

Image‑based action classification uses MobileNet on detected pedestrians, optimized for recall while maintaining precision.

Application of Foot Traffic Digitization

Collected data supports offline store operations (heatmaps, flow maps, demographic profiling) and enables personalized offline traffic distribution via interactive screens outside and inside the mall, matching user attributes with store demographics and offering tailored coupons.

Engineering Implementation

The initial serial AI inference left both CPU and GPU under-utilized. Moving to a pipeline architecture, with independent processes exchanging data through shared memory (mmap with ctypes), increased throughput and achieved a more than 300× speedup over pipe-based communication.
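The shared-memory hand-off can be sketched with the standard mmap and ctypes modules. The example below lays a ctypes structure over an anonymous memory map and shows a producer writing a frame and a consumer reading it. For brevity both sides run in one process; in a pipeline they would live in separate processes that share the mapping. The FrameSlot layout, the tiny frame size, and the single ready flag (in place of proper locking) are assumptions.

```python
import ctypes
import mmap

FRAME_BYTES = 4 * 4 * 3  # hypothetical tiny 4x4 RGB frame

class FrameSlot(ctypes.Structure):
    _fields_ = [
        ("frame_id", ctypes.c_uint32),
        ("ready", ctypes.c_uint8),
        ("pixels", ctypes.c_uint8 * FRAME_BYTES),
    ]

# Anonymous mapping; on Unix a forked child process would see the same pages.
shared = mmap.mmap(-1, ctypes.sizeof(FrameSlot))

def write_frame(buf, frame_id, data):
    """Producer side: copy one frame into the slot, then flag it as ready."""
    if len(data) > FRAME_BYTES:
        raise ValueError("frame does not fit in the slot")
    slot = FrameSlot.from_buffer(buf)  # view over the mapping, no copy
    ctypes.memmove(slot.pixels, data, len(data))
    slot.frame_id = frame_id
    slot.ready = 1

def read_frame(buf):
    """Consumer side: return (frame_id, pixels) if a frame is ready, else None."""
    slot = FrameSlot.from_buffer(buf)
    if not slot.ready:
        return None
    frame = (slot.frame_id, bytes(slot.pixels))
    slot.ready = 0  # mark the slot as free for the next frame
    return frame

write_frame(shared, 7, bytes(range(FRAME_BYTES)))
result = read_frame(shared)
```

Because the consumer reads the pixels directly from the mapped region, the frame is not serialised and pushed through a pipe, which is the kind of overhead the reported speedup refers to. A production version would need real synchronisation, such as a semaphore or lock, instead of a bare flag.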

Business Impact

Metrics now include daily foot traffic counts, hourly heatmaps, zone flow diagrams, and demographic breakdowns, allowing merchants to adjust product placement, store layout, and promotional strategies based on real‑time insights.
