Automated Detection of Illegal Watermarks in Images Using Deep Learning at 58.com
This article describes how 58.com built an end‑to‑end deep‑learning watermark detection service, covering business needs, data collection and augmentation, model selection and iterative improvements (Faster‑RCNN, SSD, YOLOv3, anchor‑free methods), deployment results, and future research directions.
58.com processes billions of user‑uploaded images daily, many of which contain illegal watermarks such as third‑party logos, mosaics, QR codes, or other copyrighted marks. Manual review cannot keep up, so an automated detection system based on computer‑vision techniques was required.
The system builds on recent advances in deep‑learning object detection. Two main families of detectors were considered: two‑stage methods such as Faster‑RCNN, which offer high accuracy but are computationally heavy, and one‑stage methods such as SSD and YOLO, which provide faster inference.
Data processing began with large‑scale collection of watermark images, followed by extensive augmentation (brightness, contrast, saturation, noise, random cropping, mirroring) and the use of MixUp to improve robustness. Custom anchors were generated via k‑means clustering to better suit the small size of watermarks.
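The anchor step can be illustrated with a minimal sketch. The article only states that k‑means clustering was used; the 1 − IoU distance on (width, height) pairs below is the variant popularized by YOLO and is an assumption here, as are the function names and the sample box sizes.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors, using 1 - IoU as distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k distinct labeled boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
                continue
            w = sum(b[0] for b in members) / len(members)
            h = sum(b[1] for b in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:  # assignments stopped changing
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical watermark box sizes in pixels: small square marks and wide logos.
sample_boxes = [(10, 10), (12, 11), (11, 12), (60, 20), (62, 22), (58, 21)]
anchors = kmeans_anchors(sample_boxes, k=2)
```

Clustering on IoU rather than Euclidean distance keeps large boxes from dominating the result, which matters when most targets are as small as watermarks.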
The detection scheme evolved through five key iterations:
1. An initial model based on MobileNetV2‑SSD achieved 51.79% accuracy; after adding the missing watermark classes and further augmentation, accuracy rose to 93.06%.
2. Switching to YOLOv3 with a Feature Pyramid Network increased recall, but the larger model then had to be pruned (using BN‑γ scaling factors), which reduced its size from 236 MB to 34 MB and raised QPS from 20 to 70.85.
3. Fine‑grained classification of mosaic watermarks reduced the false‑positive rate from 3% to 1.4%.
4. Incorporating mis‑detected samples as synthetic training data lowered the error rate to 0.34%.
5. Exploring anchor‑free detectors (CenterNet) further improved small‑object detection without the need for non‑maximum suppression.
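The pruning step in the second iteration can be sketched as follows. The article does not spell out the exact procedure; this example assumes one common way of using BN‑γ scaling factors, in which channels whose |γ| falls below a global percentile threshold are dropped. The function name, layer names, and γ values are hypothetical.

```python
def select_channels(bn_gammas, prune_ratio):
    """Pick the channel indices to keep in each BatchNorm layer.

    bn_gammas:   dict mapping a layer name to its list of BN scale factors.
    prune_ratio: fraction of all channels (across layers) to try to remove.
    """
    # Rank every channel in the network by the magnitude of its gamma.
    all_abs = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    cut = int(len(all_abs) * prune_ratio)
    threshold = all_abs[cut] if cut < len(all_abs) else float("inf")

    keep = {}
    for name, gammas in bn_gammas.items():
        idx = [i for i, g in enumerate(gammas) if abs(g) >= threshold]
        if not idx:
            # Never remove a whole layer: keep its strongest channel.
            idx = [max(range(len(gammas)), key=lambda i: abs(gammas[i]))]
        keep[name] = idx
    return keep


# Hypothetical gamma values for two convolution + BN layers.
gammas = {
    "conv1": [0.9, 0.01, 0.5, 0.02],
    "conv2": [0.03, 0.001, 0.04, 0.8],
}
kept = select_channels(gammas, prune_ratio=0.5)
```

In practice the kept indices would be used to rebuild narrower convolution layers, followed by fine‑tuning to recover any lost accuracy.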
Extensive ablation experiments on training cost, convergence time, and model performance (mAP, recall) were documented in Tables 1‑3, demonstrating that the final model balances high accuracy (≈97% mAP) with low latency.
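For readers unfamiliar with the mAP metric, the sketch below shows one way it can be computed. The article does not say which evaluation protocol was used; this example assumes all‑point interpolated average precision per class, with detections already matched to ground truth. The class names and scores are made up for illustration.

```python
def average_precision(detections, num_gt):
    """All-point interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_gt:     number of ground-truth boxes for this class.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing from right to left (the PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the enveloped precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name to (detections, num_gt)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical results for two watermark classes.
results = {
    "logo": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "qr_code": ([(0.95, True)], 1),
}
score = mean_average_precision(results)
```

Recall at a given confidence threshold is simply the last value of `tp / num_gt` among detections above that threshold, so the same ranking loop yields both metrics.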
In production, the watermark detection service handles about 71 million requests per day across 11 scenarios, achieving a 72% recall and processing over 25 000 posts daily.
The authors conclude with practical lessons (importance of balanced, large datasets, synthetic data labeling) and outline future work: improving synthetic data quality with GANs, continual model upgrades, and addressing data imbalance across categories.
References to MobileNetV2, SSD, YOLOv3, CenterNet, and related papers are provided.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.