From Blur to Brilliance: How AI‑Powered Image Quality Assessment Transformed 58.com’s Recruitment Images
This article reviews image quality assessment fundamentals and modern CNN-based IQA models, then describes their deployment at 58.com to automatically score, filter, and rank recruitment photos. The deployed service reduced the image-defect rate from 9% to 0% and raised classification accuracy on the in-house test set from 86.93% to 94.72%.
Background
Image Quality Assessment (IQA) evaluates the visual fidelity of images. IQA methods are divided by reference requirement:
Full-Reference (FR): requires the pristine image.
Reduced-Reference (RR): uses partial features of the reference.
No-Reference (NR) / Blind: operates without any reference.
Traditional FR metrics include MSE, PSNR, SSIM, and VMAF. NR methods such as BRISQUE and MSDD rely on handcrafted features, whereas RankIQA learns quality from ranked image pairs. Deep learning has produced both FR and NR CNN-based models that correlate more closely with human judgments.
Evaluation Metrics
Performance is measured by Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank‑order Correlation Coefficient (SRCC). PLCC is the covariance of predicted and ground‑truth scores divided by the product of their standard deviations. SRCC is the Pearson correlation of the rank vectors of predictions and MOS.
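The two definitions above can be sketched in plain Python. This is a minimal illustration, not the evaluation code used in the article; the helper names `pearson`, `average_ranks`, and `spearman` are chosen here for readability, and ties are handled by average ranks, which is a common convention but an assumption about the original setup.

```python
import math

def pearson(x, y):
    """PLCC: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """SRCC: Pearson correlation computed on the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

predicted = [1.0, 2.0, 3.0, 4.0]
mos = [1.0, 4.0, 9.0, 16.0]   # monotonic in the predictions, but not linear
plcc = pearson(predicted, mos)
srcc = spearman(predicted, mos)
```

In this toy example the predictions order the images perfectly, so SRCC is 1, while PLCC is slightly lower because the relationship is not linear.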
Typical IQA Datasets
LIVE (2006): 29 reference images, 779 distorted images, 5 distortion types.
TID2013 (2013): 25 references, 3000 distorted images, 24 distortion types.
CSIQ: 512×512 images, 6 distortion types, 25 observers.
KonIQ‑10k (2018): 10 073 authentically distorted images with crowdsourced MOS.
Datasets provide MOS (Mean Opinion Score) or DMOS (Differential MOS) labels for training and testing.
CNN‑Based IQA Models
WaDIQaM
WaDIQaM uses a twin VGG-16 backbone that processes 32×32 patches from the reference and distorted images. The patch features f_r and f_d are concatenated with their difference f_r - f_d and fed to a fully-connected layer that regresses a patch quality score. Two spatial-pooling strategies are supported:
Simple average of patch scores.
Learned weighted average via an additional FC layer.
Training uses L1 loss on randomly sampled patches.
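The two pooling strategies can be illustrated with a short sketch. It assumes the patch scores and raw patch weights have already been produced by the network; the function names are hypothetical, and the weight activation (clamping at zero plus a small epsilon) is an assumption about how the learned weights are kept positive.

```python
def average_pool(patch_scores):
    """Simple average: every patch contributes equally to the image score."""
    return sum(patch_scores) / len(patch_scores)

def weighted_pool(patch_scores, raw_weights, eps=1e-6):
    """Weighted average: each patch score is scaled by a non-negative weight.

    raw_weights stands in for the output of the extra FC layer (assumed given).
    """
    weights = [max(0.0, w) + eps for w in raw_weights]
    total = sum(w * s for w, s in zip(weights, patch_scores))
    return total / sum(weights)

flat = average_pool([20.0, 40.0, 60.0])
focused = weighted_pool([20.0, 80.0], [0.0, 1.0])  # second patch dominates
```

With weighted pooling, patches the network considers uninformative (for example flat sky regions) can be given near-zero influence on the final score.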
DBCNN
DBCNN consists of two CNN branches:
S-CNN is a custom CNN pretrained on synthetic distortions generated from the Waterloo Exploration and PASCAL VOC datasets (39 distortion categories).
The second branch is a VGG-16 pretrained on ImageNet, which captures authentic distortions.
Features from both branches are combined by bilinear pooling (B = Y_1^T Y_2, where Y_1 and Y_2 are the two feature maps flattened over spatial positions) and regressed to a quality score with an L2 loss.
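As a rough illustration of the bilinear pooling step, the sketch below computes B = Y_1^T Y_2 for two tiny feature maps stored as nested lists, one row per spatial position. The function name and the toy values are assumptions for illustration; a real model would use tensor operations on much larger maps.

```python
def bilinear_pool(y1, y2):
    """Return the C1 x C2 matrix Y1^T Y2.

    y1: N rows (spatial positions) with C1 channels each.
    y2: N rows (same positions) with C2 channels each.
    """
    if len(y1) != len(y2):
        raise ValueError("both feature maps must cover the same spatial positions")
    c1, c2 = len(y1[0]), len(y2[0])
    return [
        [sum(row1[i] * row2[j] for row1, row2 in zip(y1, y2)) for j in range(c2)]
        for i in range(c1)
    ]

y1 = [[1, 2], [3, 4]]          # 2 positions, 2 channels
y2 = [[1, 0, 1], [0, 1, 1]]    # 2 positions, 3 channels
pooled = bilinear_pool(y1, y2)
```

Each entry of the result sums the product of one channel from each branch over all positions, so the output size depends only on the channel counts and not on the image size.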
hyperIQA
hyperIQA comprises three modules:
Feature extractor: ResNet-50 (four stages) extracts multi-scale features.
Hyper-network: Generates adaptive weights for the quality predictor based on the extracted features.
Quality regression head: A fully-connected layer predicts the final score.
Training samples random 224×224 patches and uses L1 loss. The hyper‑network enables per‑image adaptation, improving robustness to diverse real‑world distortions.
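The hyper-network idea can be sketched with a single linear layer: the weights of the quality predictor are not fixed parameters but are generated from the image's own features. This is a deliberately simplified illustration; the names `semantic`, `target`, `hyper_w`, and `hyper_b` are hypothetical, and the real model uses convolutional features and a multi-layer predictor.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def hyper_predict(semantic, target, hyper_w, hyper_b):
    """Predict a score with a regression head whose weights depend on the image.

    semantic: features describing image content (input to the hyper-network).
    target:   features fed to the quality regression head.
    hyper_w, hyper_b: hyper-network parameters; its output has len(target) + 1
                      values, read as the head's weights followed by its bias.
    """
    generated = [g + b for g, b in zip(matvec(hyper_w, semantic), hyper_b)]
    weights, bias = generated[:-1], generated[-1]
    return sum(w * t for w, t in zip(weights, target)) + bias

hyper_w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
hyper_b = [0.0, 0.0, 0.5]
target = [2.0, 3.0]
score_a = hyper_predict([1.0, 0.0], target, hyper_w, hyper_b)
score_b = hyper_predict([0.0, 1.0], target, hyper_w, hyper_b)
```

The same `target` features receive different scores because the generated weights change with the content features, which is the per-image adaptation the paragraph above describes.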
Business Scenario – 58.com Recruitment
The recruitment platform receives a large volume of user-uploaded images every day. Before the IQA service was deployed, about 9% of these images had quality defects such as blur, noise, extreme aspect ratio, solid-color backgrounds, large text blocks, certificates, or low resolution.
Quality Rules
Severe blur/noise/exposure/distortion → low quality.
Extreme aspect‑ratio (max(width/height, height/width) > 2.5) → non‑high quality.
Solid‑color product or logo images → non‑high quality.
Text area > 30 % of image → non‑high quality.
Certificates, licenses, contracts → non‑high quality.
Resolution < 256 px (width or height) → low quality.
Technical Solution
The pipeline consists of an IQA model followed by rule‑based calibrations:
Resolution: penalize images < 256 px, reward > 1000 px.
Aspect-ratio: penalize if max(width/height, height/width) > 2.5.
OCR (DBNet): compute text-pixel ratio; if > 30 % downgrade to medium quality.
Text recognition (CRNN): detect keywords (e.g., “license”, “certificate”, “contract”); if found, downgrade.
Background-color clustering: identify product-showcase or logo images and map their scores to the medium range.
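The calibration steps listed above might be combined roughly as follows. The thresholds 256 px, 1000 px, 2.5, and 30 % come from the text; everything else is an assumption made for illustration, including the function name `calibrate`, the cap and bonus values, the choice to test the shorter side, and the idea of implementing each downgrade as a score cap. The outputs of DBNet, CRNN, and the background clustering are assumed to be available as plain inputs.

```python
ASPECT_LIMIT = 2.5          # from the article
TEXT_RATIO_LIMIT = 0.30     # from the article
MIN_SIDE_LOW = 256          # from the article
MIN_SIDE_HIGH = 1000        # from the article
LOW_CAP = 39.0              # assumption: highest score still counted as low
MEDIUM_CAP = 59.0           # assumption: highest score still counted as medium
HIGH_RES_BONUS = 5.0        # assumption: size of the high-resolution reward
KEYWORDS = ("license", "certificate", "contract")

def calibrate(score, width, height, text_ratio=0.0, ocr_text="", solid_background=False):
    """Adjust a raw 0-100 model score with the rule-based checks."""
    shorter = min(width, height)
    aspect = max(width / height, height / width)
    if shorter > MIN_SIDE_HIGH:
        score = min(100.0, score + HIGH_RES_BONUS)
    non_high = (
        aspect > ASPECT_LIMIT
        or text_ratio > TEXT_RATIO_LIMIT
        or any(word in ocr_text.lower() for word in KEYWORDS)
        or solid_background
    )
    if non_high:
        score = min(score, MEDIUM_CAP)   # never rated high
    if shorter < MIN_SIDE_LOW:
        score = min(score, LOW_CAP)      # always rated low
    return score

sharp_large = calibrate(85.0, 1200, 1200)
tiny_banner = calibrate(85.0, 800, 200)
text_heavy = calibrate(85.0, 800, 600, text_ratio=0.4)
```

Implementing the rules as caps means a calibration can only lower a category, never promote a poor image, which matches the "non-high quality" wording of the rules.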
The model outputs a continuous score from 0 to 100. K-means clustering maps the score to three discrete categories (low 0-40, medium 40-60, high 60-100). Thresholds are tuned on a validation set.
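One way to derive such thresholds is a one-dimensional k-means over validation scores, taking the midpoints between adjacent centroids as category boundaries. The sketch below is an assumption about how this could be done, not the article's implementation; the function names, the quantile-based initialization, and the sample scores are all hypothetical.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Cluster scalar scores into k groups and return the sorted centroids."""
    data = sorted(values)
    # Deterministic start: centroids at evenly spaced quantiles of the data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def thresholds_from_centroids(centroids):
    """Boundaries are the midpoints between neighbouring centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

def to_category(score, thresholds, labels=("low", "medium", "high")):
    """Map a continuous score to the first category whose upper bound it is below."""
    for bound, label in zip(thresholds, labels):
        if score < bound:
            return label
    return labels[-1]

validation_scores = [10, 20, 30, 45, 50, 55, 70, 80, 90]  # made-up sample
centroids = kmeans_1d(validation_scores)
bounds = thresholds_from_centroids(centroids)
```

The same `to_category` helper also works with fixed boundaries such as `(40, 60)`, the ranges quoted in the text.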
Dataset and Training
A custom test set 58zhaopin‑5k contains 814 low‑quality, 1 605 medium‑quality, and 2 572 high‑quality images.
Models were trained on public datasets:
WaDIQaM on LIVE.
DBCNN on TID2013.
hyperIQA on KonIQ‑10k (achieving 86.93 % raw accuracy on 58zhaopin‑5k).
To reduce inference cost, the ResNet‑50 backbone of hyperIQA was replaced with ResNet‑18, yielding comparable accuracy with lower computational complexity.
Calibration Impact
After applying the rule-based calibrations, overall classification accuracy on 58zhaopin-5k increased from 86.93 % to 94.72 %, an absolute gain of 7.79 percentage points.
Deployment Results
The service processes ~120 k new images and ~200 k total images per day. The image‑defect rate dropped from 9 % to 0 %, meeting the business’s quality requirements.
Conclusion and Future Work
A customized IQA pipeline combining a deep learning model with domain‑specific calibrations successfully filtered low‑quality recruitment images. Future work includes:
Generalizing the pipeline to other visual services.
Integrating high‑level semantic cues with low‑level distortion features.
Extending the approach to video quality assessment.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
