Kuaishou Tech
Oct 20, 2021 · Artificial Intelligence
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
This paper proposes HiT, a hierarchical transformer with momentum contrast for video-text retrieval. To address limitations of existing multimodal learning methods, it introduces Hierarchical Cross-modal Contrast Matching (HCM) and Momentum Cross-modal Contrast (MCC), which together improve retrieval performance.
Artificial Intelligence · HCM · MCC