How AI-Powered Hand Gesture Detection Drove a Double‑11 Celebrity Rock‑Paper‑Scissors Game
This article describes how Alibaba used AI‑driven hand‑gesture detection, built on a lightweight SSD‑based object detection model, to create an interactive rock‑paper‑scissors game for Double‑11. It covers the main challenges (loosely defined gestures, real‑time performance on mobile devices, and data collection) and the outcome: more than 16 million page views and high detection accuracy.
Project Background
Alibaba sought new ways to connect merchants and consumers as traditional marketing became less effective and users grew resistant to frequent promotions. Interactive mini‑games that are easy to play can boost engagement, and advances in computer‑vision AI provide novel interaction methods.
Problem Definition
The rock‑paper‑scissors (RPS) game requires recognizing hand gestures in real time, which we frame as an object detection task rather than plain image classification. Classification alone cannot handle multiple hands, varying positions and sizes, or background clutter, so a detection‑first approach is needed.
Challenges
Uncertain number of hands and potential cheating gestures.
Hands appear small amid faces and background, making classification difficult.
No clear definition for each gesture; variations in shape, angle, and user style make exhaustive labeling impossible.
Dynamic timing: only gestures within a specific time window after the cue should be counted.
The model must be small and fast, and must run on diverse mobile devices, especially Android.
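The timing constraint above can be sketched as a filter over per-frame detections followed by a vote. This is a minimal illustration, not the production logic: the function name `pick_gesture`, the one-second window, and the 0.5 confidence cutoff are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One detected hand in one frame: (timestamp in seconds, gesture label, confidence).
Detection = Tuple[float, str, float]


def pick_gesture(
    detections: List[Detection],
    cue_time: float,
    window: float = 1.0,    # assumed window length, not from the source
    min_conf: float = 0.5,  # assumed confidence cutoff, not from the source
) -> Optional[str]:
    """Return the most frequent confident gesture seen in [cue_time, cue_time + window]."""
    votes = Counter(
        label
        for t, label, conf in detections
        if cue_time <= t <= cue_time + window and conf >= min_conf
    )
    if not votes:
        return None  # nothing usable was detected inside the window
    return votes.most_common(1)[0][0]


frames = [
    (0.2, "paper", 0.90),     # before the cue: ignored
    (1.1, "rock", 0.80),
    (1.3, "rock", 0.90),
    (1.5, "scissors", 0.40),  # below the confidence cutoff: ignored
    (2.6, "paper", 0.95),     # after the window closes: ignored
]
result = pick_gesture(frames, cue_time=1.0)
```

Voting over several frames, rather than trusting a single frame, is one simple way to tolerate a misdetection while the hand is still moving.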
Algorithm Overview
We adopted a one‑stage object detection framework (SSD) for its speed and memory efficiency, enhanced with feature‑pyramid networks (FPN) for multi‑scale detection and a lightweight backbone (MNasNet) optimized for mobile.
Target Detection
Detection outputs both class and bounding‑box coordinates for each hand. One‑stage models like SSD share convolutions for classification and localization, offering higher speed than two‑stage methods.
SSD Details
Multi‑scale feature maps predict objects at different resolutions.
Anchor boxes of various sizes and aspect ratios serve as priors for bounding‑box regression.
All‑convolutional design reduces memory usage.
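To make the anchor idea concrete, the sketch below generates SSD-style prior boxes over several feature-map resolutions. The grid sizes, scales, and aspect ratios are illustrative assumptions, not the values used in the actual model.

```python
from itertools import product
from math import sqrt
from typing import List, Sequence, Tuple

# An anchor is (center_x, center_y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def make_anchors(
    grid_sizes: Sequence[int],
    scales: Sequence[float],
    aspect_ratios: Sequence[float],
) -> List[Box]:
    """Place one anchor per aspect ratio at every cell of every feature map."""
    anchors: List[Box] = []
    for grid, scale in zip(grid_sizes, scales):
        for row, col in product(range(grid), repeat=2):
            cx = (col + 0.5) / grid  # cell center in normalized coordinates
            cy = (row + 0.5) / grid
            for ratio in aspect_ratios:
                # Width and height vary with the ratio; the area stays near scale**2.
                w = scale * sqrt(ratio)
                h = scale / sqrt(ratio)
                anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical configuration: fine grids carry small anchors, coarse grids large ones.
anchors = make_anchors(
    grid_sizes=[4, 2, 1],
    scales=[0.2, 0.5, 0.9],
    aspect_ratios=[1.0, 2.0, 0.5],
)
```

With these example settings the call yields (16 + 4 + 1) × 3 = 63 anchors. The network then predicts, for each anchor, class scores and offsets relative to that prior box.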
Backbone Network
We replaced the original VGG backbone with a mobile‑friendly architecture (MNasNet) discovered via neural‑architecture search, balancing accuracy and latency.
Feature Fusion
FPN combines shallow high‑resolution features with deep semantic features, improving detection of small hands.
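The top-down fusion step can be illustrated on tiny single-channel grids. This is a simplified sketch: a real FPN also applies a 1×1 lateral convolution before the addition and a 3×3 convolution after it, both omitted here, and the feature values are made up.

```python
from typing import List

Grid = List[List[float]]  # a single-channel feature map


def upsample2x(fmap: Grid) -> Grid:
    """Nearest-neighbor upsampling: every cell becomes a 2x2 block."""
    out: Grid = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge(shallow: Grid, deep: Grid) -> Grid:
    """Add the upsampled deep (semantic) map onto the shallow (high-resolution) map."""
    up = upsample2x(deep)
    return [
        [s + u for s, u in zip(shallow_row, up_row)]
        for shallow_row, up_row in zip(shallow, up)
    ]


deep = [[10.0]]                                # 1x1: strongest semantics
mid = [[1.0, 2.0], [3.0, 4.0]]                 # 2x2
shallow = [[0.5] * 4 for _ in range(4)]        # 4x4: finest resolution

p_mid = merge(mid, deep)           # semantics flow from the 1x1 map into the 2x2 map
p_shallow = merge(shallow, p_mid)  # and from there into the 4x4 map
```

After the merge, every cell of the finest map carries information from the deepest layer, which is what helps when a hand occupies only a few pixels.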
Loss Function
We combined smooth L1 loss for localization with sigmoid‑based binary cross‑entropy for classification. The ambiguous “other” class was removed, so the problem is treated as four classes: scissors, rock, paper, and background. Focal loss was added to focus training on hard examples.
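The two loss terms can be written out for a single scalar prediction. This is a sketch of the standard formulas only: `beta`, `alpha`, and `gamma` are set to commonly used defaults, which is an assumption, since the article does not state the values that were used.

```python
import math


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones (used on box offsets)."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_focal_loss(
    logit: float,
    target: float,        # 1.0 if this class is present, else 0.0
    alpha: float = 0.25,  # assumed default
    gamma: float = 2.0,   # assumed default
) -> float:
    """Binary cross-entropy scaled down for examples the model already gets right."""
    p = sigmoid(logit)
    p_t = p if target == 1.0 else 1.0 - p
    alpha_t = alpha if target == 1.0 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = sigmoid_focal_loss(4.0, 1.0)   # confident and correct: tiny loss
hard = sigmoid_focal_loss(-4.0, 1.0)  # confident and wrong: large loss
```

The factor `(1 - p_t) ** gamma` is what shifts the training signal toward hard examples: it is close to zero when the prediction is already correct and close to one when it is badly wrong.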
Data Collection & Annotation
Hundreds of short videos were crowdsourced, each showing users performing RPS gestures. A pre‑trained hand detector extracted hand crops, which were then labeled as rock, paper, scissors, other, or uncertain. Uncertain samples were assigned a weight of zero during training.
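One way to turn these annotations into training targets is sketched below. It assumes three sigmoid outputs, with "other" encoded as all zeros (that is, as background) and "uncertain" kept in the batch but given zero weight. The encoding of "other" and the helper names are assumptions for illustration; only the zero weight for uncertain samples comes from the text above.

```python
from typing import List, Sequence, Tuple

GESTURES = ("rock", "paper", "scissors")
VALID_LABELS = GESTURES + ("other", "uncertain")


def encode(label: str) -> Tuple[List[float], float]:
    """Return (per-class sigmoid targets, sample weight) for one labeled hand crop."""
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label}")
    if label == "uncertain":
        return [0.0, 0.0, 0.0], 0.0  # contributes nothing to the loss
    # "other" matches no gesture, so its targets are all zeros.
    targets = [1.0 if label == gesture else 0.0 for gesture in GESTURES]
    return targets, 1.0


def weighted_mean(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average per-sample losses, ignoring samples whose weight is zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(loss * w for loss, w in zip(losses, weights)) / total


paper_targets, paper_weight = encode("paper")
batch_loss = weighted_mean([1.0, 3.0, 100.0], [1.0, 1.0, 0.0])
```

In the example batch, the third sample has a large loss but zero weight, so it does not move the average.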
Results
The final model is 1.9 MB, runs inference in about 17 ms on iOS devices, and reaches a mean average precision (mAP, measured at a fixed IoU threshold) of 0.984 on internal test data. During the Double‑11 event, the game generated 16 million+ page views, 10 million+ unique visitors, and strong merchant feedback.
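Detection metrics such as mean average precision rest on an overlap test: a predicted box counts as correct when its intersection-over-union (IoU) with a ground-truth box of the same class reaches a threshold. The sketch below shows that test. The 0.5 threshold is a common convention and an assumption here, not a figure confirmed by the source.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count as a hit."""
    return iou(pred, truth) >= threshold


overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))  # 1 / 7
```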
Future Work
Handling crowded offline scenes with many hands.
Improving detection of very small hands in full‑body shots.
Further optimizing inference speed on low‑end devices.
Overall, the RPS game demonstrates how a well‑engineered, mobile‑friendly object detection pipeline can turn a simple interactive concept into a high‑impact commercial solution.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
