Technical Overview of Alipay's 2018 Spring Festival “Scan Fu” Image Recognition System
The article details Alipay's 2018 Spring Festival "Scan Fu" initiative, describing the challenges of high‑volume Chinese character detection, the client‑server architecture, the lightweight xFuNet deep‑learning model, training strategies, performance results, and future AR extensions.
Introduction Alipay’s Spring Festival "Scan Fu" activity, launched in 2018, required a robust image‑recognition pipeline to quickly and accurately detect various forms of the Chinese character "福" while handling massive user traffic and preventing misrecognition of similar characters.
2017 Review In 2017 the feature was a small add‑on that quickly became a mainstream game mechanic. A client‑side fallback detector was introduced for moments when server capacity was exceeded, but it proved insufficient during peak usage, prompting a redesign.
2018 Goals The 2018 version aimed for higher accuracy, faster response, and stronger concurrency. Specific objectives included reducing false positives for non‑"福" characters, supporting high‑throughput AR entry points, and maintaining seamless user experience without degradation.
Architecture Overview Two detection paths were adopted: a lightweight traditional detector on the client to generate candidate regions, followed by a small verification network (xFuNet). The server complements the client with a deep‑learning based detector for cases the client cannot handle, and performs secondary verification.
Client‑Side Process a) Fast detection of candidate "福" regions across all device models, targeting >90% recall with tolerable false positives. b) xFuNet verification removes most false detections; uncertain cases are escalated to the server for further analysis.
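The two client-side steps can be sketched as a simple routing decision: xFuNet scores each candidate region, confident results are resolved on the device, and only ambiguous crops are escalated to the server. The threshold values and function names below are illustrative assumptions, not Alipay's actual implementation.

```python
# Hypothetical sketch of the client-side decision flow described above.
# The thresholds are assumed values for illustration only.

ACCEPT_THRESHOLD = 0.90  # assumed: confidently a "福", accept on-device
REJECT_THRESHOLD = 0.10  # assumed: confidently not "福", drop on-device

def route_candidate(xfunet_score: float) -> str:
    """Route a candidate region based on the verification score."""
    if xfunet_score >= ACCEPT_THRESHOLD:
        return "accept"    # recognized locally, no server round-trip
    if xfunet_score <= REJECT_THRESHOLD:
        return "reject"    # false detection removed by xFuNet
    return "escalate"      # uncertain: send the crop to the server
```

Keeping both thresholds on-device means the server only sees the narrow band of ambiguous candidates, which is what makes the high-concurrency goal tractable.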
Server‑Side Process a) Complementary detection using a small‑network deep‑learning model for the few characters missed by the client. b) Secondary verification, including handling devices that cannot run xFuNet locally and performing final validation of server results.
Core Technology – xFuNet To meet the roughly 10 ms latency budget, Alipay built a custom lightweight network derived from ResNet‑18, named xFuNet. Design principles include early feature‑map reduction, limited kernel sizes and output channels, and a reduced number of convolutional layers, resulting in a 120 KB model with about 10 ms inference time.
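To see why those design principles yield such a small model, it helps to count parameters. The layer configuration below is not the real xFuNet architecture (which is not public); it is an assumed stack of narrow 3×3 convolutions with aggressive early striding, chosen only to show that a verification CNN of this shape lands in the same size range as the quoted 120 KB.

```python
# Illustrative parameter-count estimate for a hypothetical lightweight
# verification CNN. Layer widths are assumptions, not xFuNet's real spec.

def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases for a single k x k convolution."""
    return c_in * c_out * k * k + c_out

# (in_channels, out_channels) per assumed conv layer; the early layers
# would stride by 2 to shrink feature maps quickly, keeping compute low.
layers = [(3, 8), (8, 16), (16, 32), (32, 32), (32, 32)]
total = sum(conv_params(ci, co) for ci, co in layers)
total += 32 * 2 + 2  # final fully connected layer: 福 vs. not-福

total_bytes = total * 4  # float32 weights
print(f"{total:,} parameters, about {total_bytes / 1024:.0f} KB")
```

With only a few narrow layers, the whole model stays well under 150 KB in float32, which is consistent with the article's 120 KB figure and explains why inference fits in a ~10 ms budget on phones.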
Performance xFuNet achieves high accuracy comparable to MobileNet‑0.25 while being significantly smaller and faster, making it suitable for the client‑heavy "Scan Fu" scenario.
Training the Verification Model A large dataset of real‑world "福" images (handwritten, printed, window‑flower, etc.) and negative samples (e.g., characters like "逼" or "祸") was collected. The classification task was split into multiple fine‑grained categories to improve robustness and enable rapid fine‑tuning when new negative samples appear.
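The fine-grained labeling strategy above can be sketched as a mapping from sub-categories to the final binary answer. The category names here are hypothetical examples based on the sample types the article mentions; the point is that new negative sub-categories can be added and fine-tuned without disturbing the positive classes.

```python
# Hypothetical fine-grained label scheme: train on sub-categories, then
# collapse to the binary 福 / not-福 decision at inference time.

FINE_TO_BINARY = {
    "fu_handwritten": True,   # handwritten 福
    "fu_printed": True,       # printed 福
    "fu_paper_cut": True,     # window-flower / paper-cut 福
    "neg_confusable": False,  # visually similar characters, e.g. 逼, 祸
    "neg_background": False,  # everything else
}

def is_fu(fine_label: str) -> bool:
    """Collapse a fine-grained class prediction to the binary answer."""
    return FINE_TO_BINARY[fine_label]
```

Splitting the task this way gives each confusable character its own decision boundary, which is what makes rapid fine-tuning possible when a new negative sample type shows up.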
2018 Results The new system delivered markedly higher recognition rates and lower latency than in 2017, achieving a sub‑1% false‑positive rate and avoiding the negative public sentiment that followed the previous year's event.
Conclusion After a year of technical consolidation, the 2018 "Scan Fu" activity succeeded, and the team plans to extend the AR platform with additional capabilities such as gesture recognition, leveraging the same lightweight AI infrastructure.
AntTech
Technology is the core driver of Ant's future.