How Alibaba Cloud’s New Transformers and Model Fingerprinting Are Shaping ICCV 2023

Alibaba Cloud’s PAI platform had three papers accepted at ICCV 2023: the Scale‑Aware Modulation Transformer, an efficient vision backbone; Stable‑DINO, a detection transformer with more stable matching; and a non‑invasive fingerprinting method for deep image‑restoration models.

Alibaba Cloud Big Data AI Platform

Alibaba Cloud’s Machine Learning Platform PAI had three papers accepted at ICCV 2023, underscoring its expanding influence in the international computer‑vision community.

Scale‑Aware Modulation Meets Transformer

The paper introduces SMT (Scale‑Aware Modulation Transformer), a hybrid CNN‑Transformer backbone that uses a lightweight Scale‑Aware Modulation (SAM) unit to capture multi‑scale features while expanding the receptive field. It also proposes an Evolutionary Hybrid Network (EHN) that better models the transition from local to global dependencies as depth increases. SMT achieves strong results on ImageNet, COCO, and ADE20K, reaching 88.1% top‑1 accuracy on ImageNet‑1k with only 80.5 M parameters after pre‑training on ImageNet‑22k.
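The multi-scale idea behind the SAM unit can be illustrated with a toy 1-D sketch. It assumes that channel groups ("heads") are filtered with different kernel sizes and that the result multiplies a value branch element-wise. The box filter stands in for a learned depthwise convolution, the cross-head aggregation step is omitted, and all function names are hypothetical rather than taken from the SMT code.

```python
def box_filter(signal, kernel_size):
    """Same-length moving average with zero padding (kernel_size is odd)."""
    half = kernel_size // 2
    n = len(signal)
    return [
        sum(signal[j] for j in range(i - half, i + half + 1) if 0 <= j < n) / kernel_size
        for i in range(n)
    ]


def toy_scale_aware_modulation(features, values, kernel_sizes=(3, 5, 7)):
    """features/values: lists of channels; each channel group gets one kernel size."""
    if len(features) % len(kernel_sizes) != 0:
        raise ValueError("channel count must be divisible by the number of heads")
    per_head = len(features) // len(kernel_sizes)
    output = []
    for c, (feat, val) in enumerate(zip(features, values)):
        kernel_size = kernel_sizes[c // per_head]    # one scale per head
        modulator = box_filter(feat, kernel_size)    # stand-in for a learned conv
        output.append([m * v for m, v in zip(modulator, val)])  # modulate the values
    return output


# An impulse in every channel shows how far each head "sees".
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
features = [impulse[:] for _ in range(3)]
values = [[1.0] * 7 for _ in range(3)]
modulated = toy_scale_aware_modulation(features, values)
spread = [sum(1 for x in channel if x != 0.0) for channel in modulated]
```

Here `spread` counts the positions each head's response reaches, so larger kernels translate directly into a wider receptive field for that channel group.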

Stable Matching Improves Detection Transformers

The authors trace an instability in DETR’s one‑to‑one matching to the existence of multiple optimization paths. Their fix is to supervise the classification scores of positive examples with a positional metric such as IoU, which yields a position‑supervised loss and a position‑modulated matching cost that can be applied to any DETR‑style model. They also introduce dense memory fusion to enhance encoder and backbone features. With a ResNet‑50 backbone, Stable‑DINO reaches 50.4 AP and 51.5 AP on COCO under the 1× (12‑epoch) and 2× (24‑epoch) training schedules; with the larger Swin‑Large and Focal‑Huge backbones it scales up to 63.8 AP and 64.8 AP.
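As a rough illustration of the position-based idea, the sketch below uses IoU as a soft classification target for a matched query and scales the score by a power of IoU inside the matching cost. The function names, the `alpha` exponent, and the plain binary cross-entropy are assumptions made for this example; the exact functional forms in the paper may differ.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability and a (possibly soft) target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def position_supervised_cls_loss(score, pred_box, gt_box):
    """Assumed form: the target for a matched query is its IoU, not a hard 1."""
    return bce(score, iou(pred_box, gt_box))


def position_modulated_cls_cost(score, pred_box, gt_box, alpha=0.5):
    """Assumed form: the score is scaled by IoU**alpha before being compared to 1."""
    return bce(score * iou(pred_box, gt_box) ** alpha, 1.0)


gt = (0.0, 0.0, 10.0, 10.0)
well_localized = (0.0, 0.0, 10.0, 9.0)    # IoU 0.9 with gt
poorly_localized = (5.0, 5.0, 15.0, 15.0)  # small overlap with gt
score = 0.9  # both queries are equally confident

cost_good = position_modulated_cls_cost(score, well_localized, gt)
cost_poor = position_modulated_cls_cost(score, poorly_localized, gt)
loss_good = position_supervised_cls_loss(score, well_localized, gt)
loss_poor = position_supervised_cls_loss(score, poorly_localized, gt)
```

With equal confidence, the poorly localized query receives both a higher matching cost and a higher loss, so a high score alone cannot win the match or escape a penalty.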

Fingerprinting Deep Image Restoration Models

To protect the intellectual property of deep image‑restoration networks, the paper proposes a non‑invasive fingerprinting scheme that extracts a unique fingerprint from a model without altering its parameters. The workflow consists of three steps: (1) extract a fingerprint from the source model; (2) extract a fingerprint from a suspect model (which may be stolen or benign); (3) compare the two fingerprints using feature extraction and statistical similarity to assess ownership. The method leverages model inversion to generate a critical image that balances reconstruction difficulty and gradient non‑smoothness, yielding a robust fingerprint.

Advantages: no impact on model performance, resistance to common attacks.

Limitation: requires access to model gradients, demanding higher verification permissions.
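The three-step comparison described above can be sketched on toy 1-D "restoration models". This is a minimal illustration under stated assumptions: the fingerprint is taken to be the model's residual on a fixed probe signal, similarity is Pearson correlation, and the 0.9 threshold is arbitrary. The model-inversion step that generates the critical image is not implemented, and every name here is hypothetical.

```python
import math


def make_filter_model(kernel):
    """Toy 1-D 'restoration model': a 3-tap filter with edge replication."""
    left_w, center_w, right_w = kernel

    def model(image):
        last = len(image) - 1
        return [
            left_w * image[max(i - 1, 0)]
            + center_w * image[i]
            + right_w * image[min(i + 1, last)]
            for i in range(len(image))
        ]

    return model


def extract_fingerprint(model, critical_image):
    """Steps 1 and 2: the model's residual on the probe, read without touching weights."""
    output = model(critical_image)
    return [o - x for o, x in zip(output, critical_image)]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def same_origin(source_model, suspect_model, critical_image, threshold=0.9):
    """Step 3: compare the two fingerprints with a statistical similarity score."""
    fp_source = extract_fingerprint(source_model, critical_image)
    fp_suspect = extract_fingerprint(suspect_model, critical_image)
    return pearson(fp_source, fp_suspect) >= threshold


probe = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]  # stand-in for a critical image
source = make_filter_model((0.25, 0.50, 0.25))    # owner's smoothing model
stolen = make_filter_model((0.27, 0.50, 0.23))    # slightly perturbed copy
benign = make_filter_model((-0.5, 2.0, -0.5))     # unrelated sharpening model

stolen_flagged = same_origin(source, stolen, probe)
benign_flagged = same_origin(source, benign, probe)
```

In this toy setting the perturbed copy is flagged and the unrelated model is not. A real verifier would need a probe that stays discriminative under fine-tuning, pruning, and similar attacks, which is the role the paper assigns to the critical image.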

The source code for both the SMT and fingerprinting methods has been open‑sourced, and a PAI‑based training and deployment framework is planned for release in October.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
