AIWalker
Jan 10, 2025 · Artificial Intelligence
How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090
This paper presents SiCLIP, a framework that simplifies the Transformer architecture and combines weight sharing, multi-stage knowledge distillation, and a novel pair-matching loss with synthetic captions to train a competitive CLIP model on a single RTX3090 GPU with 1 TB of storage. The authors report a state-of-the-art trade-off among data size, parameter count, and accuracy.
CLIP · Lightweight Training · Synthetic Captions
19 min read
