How Alibaba Cloud’s New Transformers and Model Fingerprinting Are Shaping ICCV 2023
Alibaba Cloud’s PAI platform presented three papers at ICCV 2023: the Scale‑Aware Modulation Transformer (SMT), an efficient vision backbone; Stable‑DINO, a detection transformer with more stable matching; and a non‑invasive fingerprinting method for deep image‑restoration models.
Alibaba Cloud’s Machine Learning Platform PAI had three papers accepted at ICCV 2023, underscoring its expanding influence in the international computer‑vision community.
Scale‑Aware Modulation Meets Transformer
The paper introduces SMT (Scale‑Aware Modulation Transformer), a hybrid CNN‑Transformer backbone that uses a lightweight Scale‑Aware Modulation (SAM) unit to capture multi‑scale features while expanding the receptive field. It also proposes an Evolutionary Hybrid Network (EHN) that better models the transition from local to global dependencies as depth increases. SMT achieves strong results on ImageNet, COCO, and ADE20K, reaching 88.1% top‑1 accuracy on ImageNet‑1k with only 80.5 M parameters after pre‑training on ImageNet‑22k.
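The article does not spell out how the SAM unit works internally, so the following is only a toy sketch under stated assumptions: parallel depthwise convolutions with different kernel sizes (one per head) are averaged into a modulator, which then rescales the features element-wise. The function names, the box kernels, the 1D signals, and the one-channel-per-head layout are all hypothetical simplifications, not the paper's implementation.

```python
def depthwise_conv1d(signal, kernel):
    """Same-length 1D convolution with zero padding (odd kernel length)."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def scale_aware_modulation(channels, kernel_sizes=(3, 5, 7)):
    """Toy modulation: one channel per head, one kernel size per head.

    Assumption: the modulator is the mean of the per-head responses, and the
    'value' branch is the unmodified input (no learned projections).
    """
    assert len(channels) == len(kernel_sizes)
    # Each head sees its own scale via a box filter of a different width.
    heads = [
        depthwise_conv1d(ch, [1.0 / k] * k)
        for ch, k in zip(channels, kernel_sizes)
    ]
    length = len(channels[0])
    # Aggregate the scales into a single modulator per position.
    modulator = [sum(h[i] for h in heads) / len(heads) for i in range(length)]
    # Modulate every channel element-wise.
    return [[modulator[i] * ch[i] for i in range(length)] for ch in channels]


base = [[1.0] * 11 for _ in range(3)]
out_base = scale_aware_modulation(base)

# Perturb the widest head (kernel 7) three positions away from index 5.
near = [ch[:] for ch in base]
near[2][8] = 2.0
out_near = scale_aware_modulation(near)

# Perturb it four positions away, outside the widest kernel's reach.
far = [ch[:] for ch in base]
far[2][9] = 2.0
out_far = scale_aware_modulation(far)
```

In this toy setup, the output at index 5 of every channel reacts to a change three positions away but not four, so the receptive field is set by the largest kernel while the smaller kernels keep fine-grained detail in the modulator.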
Stable Matching Improves Detection Transformers
The authors identify an instability in DETR’s one‑to‑one matching caused by multiple optimization paths. By adding a position‑based term to the classification loss, they design a position‑supervised loss and a position‑modulated matching cost that can be applied to any DETR‑style model. They also introduce dense memory fusion to enhance encoder and backbone features. Experiments show Stable‑DINO reaches 50.4 AP and 51.5 AP on COCO with a ResNet‑50 backbone under standard settings, and 63.8 AP and 64.8 AP with Swin‑Large and Focal‑Huge backbones, respectively.
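The idea of tying classification to localization quality can be illustrated with a minimal sketch. The exact functional forms are not given in this article, so everything below is an assumption: the loss uses the IoU as a soft classification target, the matching cost multiplies the class probability by IoU raised to a hypothetical exponent alpha, and a brute-force search stands in for the Hungarian algorithm.

```python
import math
from itertools import permutations


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def binary_cross_entropy(prob, target, eps=1e-7):
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def position_supervised_loss(prob, iou):
    """Assumed form: the matched query is pushed toward its IoU, not toward 1."""
    return binary_cross_entropy(prob, iou)


def matching_cost(prob, iou, alpha):
    """Assumed form: confidence modulated by localization quality.

    With alpha = 0 this reduces to a class-only cost.
    """
    return -prob * (iou ** alpha)


def match(probs, boxes, gt_boxes, alpha):
    """Brute-force one-to-one assignment; returns one query index per GT box."""
    best, best_cost = None, math.inf
    for assignment in permutations(range(len(boxes)), len(gt_boxes)):
        cost = sum(
            matching_cost(probs[q], box_iou(boxes[q], g), alpha)
            for q, g in zip(assignment, gt_boxes)
        )
        if cost < best_cost:
            best, best_cost = assignment, cost
    return list(best)


gt_boxes = [(0, 0, 10, 10)]
boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
probs = [0.6, 0.9, 0.1]  # query 1 is confident but poorly localized

plain_match = match(probs, boxes, gt_boxes, alpha=0.0)   # class-only cost
stable_match = match(probs, boxes, gt_boxes, alpha=0.5)  # position-modulated
```

In this example the class-only cost picks the confident but badly placed query 1, while the position-modulated cost picks the well-localized query 0, which is the kind of conflict between classification and localization that the matching change is meant to remove.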
Fingerprinting Deep Image Restoration Models
To protect the intellectual property of deep image‑restoration networks, the paper proposes a non‑invasive fingerprinting scheme that extracts a unique fingerprint from a model without altering its parameters. The workflow consists of three steps: (1) extract a fingerprint from the source model; (2) extract a fingerprint from a suspect model (which may be stolen or benign); (3) compare the two fingerprints using feature extraction and statistical similarity to assess ownership. The method leverages model inversion to generate a critical image that balances reconstruction difficulty and gradient non‑smoothness, yielding a robust fingerprint.
Advantages: no impact on model performance, resistance to common attacks.
Limitation: requires access to model gradients, demanding higher verification permissions.
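The three-step workflow above can be sketched with toy stand-ins. Everything here is an assumption for illustration: small 1D filters play the role of restoration models, a fixed non-smooth probe signal plays the role of the critical image (the paper obtains it by model inversion, which is not reproduced here), the fingerprint is the residual the model leaves on the probe, and Pearson correlation is used as the statistical similarity.

```python
import math


def make_filter(kernel):
    """Hypothetical 'restoration model': a 3-tap filter with edge clamping."""
    left, center, right = kernel

    def model(signal):
        n = len(signal)
        out = []
        for i in range(n):
            prev = signal[max(i - 1, 0)]
            nxt = signal[min(i + 1, n - 1)]
            out.append(left * prev + center * signal[i] + right * nxt)
        return out

    return model


def extract_fingerprint(model, probe):
    """Steps 1 and 2: the residual the model leaves on the probe input."""
    restored = model(probe)
    return [r - p for r, p in zip(restored, probe)]


def pearson(a, b):
    """Step 3: statistical similarity between two fingerprints."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Stand-in for the critical image: a short signal with sharp transitions.
probe = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]

source = make_filter((0.25, 0.50, 0.25))  # the protected model
stolen = make_filter((0.22, 0.52, 0.26))  # slightly altered copy
benign = make_filter((0.00, 0.30, 0.70))  # independently built model

fp_source = extract_fingerprint(source, probe)
sim_stolen = pearson(fp_source, extract_fingerprint(stolen, probe))
sim_benign = pearson(fp_source, extract_fingerprint(benign, probe))

THRESHOLD = 0.95  # arbitrary cut-off chosen for this toy example only
verdicts = {"stolen": sim_stolen > THRESHOLD, "benign": sim_benign > THRESHOLD}
```

The altered copy keeps a fingerprint almost identical to the source, while the independent model scores noticeably lower. A real verification would need a calibrated threshold and, as noted above, gradient access to build the critical image.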
The source code for both the SMT and fingerprinting methods has been open‑sourced, and a PAI‑based training and deployment framework is planned for release in October.