Tagged articles

CVPR2025

8 articles · Page 1 of 1

Aug 3, 2025 · Artificial Intelligence

CVPR 2025: DeQA-Score Lets LLMs Predict Image Quality Score Distributions

DeQA-Score introduces a soft‑label discretization that lets multimodal large language models regress continuous image‑quality scores as Gaussian distributions. It reports 30× lower mean error while preserving variance and inter‑image relationships, with KL‑divergence and fidelity losses driving state‑of‑the‑art performance.

CVPR2025 · DeQA-Score · image quality assessment

0 likes · 8 min read


Tencent Technical Engineering

Jun 30, 2025 · Artificial Intelligence

How iMatch Won CVPR2025 NTIRE Image-Text Alignment: Techniques & Benchmarks

The IH‑VQA team’s iMatch solution won the CVPR2025 NTIRE Image‑Text Alignment championship using dual‑model fusion, pseudo‑label data augmentation, Q‑Align probability mapping, and visual augmentations. The paper also presents a comprehensive iMatch benchmark evaluating 23 state‑of‑the‑art text‑to‑image models across multiple resolutions.

AI quality assessment · CVPR2025 · Multimodal Evaluation

0 likes · 15 min read


AntTech

Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Generative AI · computer vision

0 likes · 20 min read


Kuaishou Audio & Video Technology

Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Multimodal

0 likes · 21 min read


Kuaishou Tech

Jun 10, 2025 · Artificial Intelligence

Teaching Large Language Models to Predict Image Quality Scores with DeQA-Score

DeQA-Score, a CVPR 2025 work, shows how to train multimodal large language models to regress continuous image quality scores by discretizing scores into soft-label level tokens, preserving Gaussian distribution statistics and achieving state‑of‑the‑art performance.

CVPR2025 · DeQA-Score · image quality assessment

0 likes · 8 min read


AntTech

Mar 14, 2025 · Artificial Intelligence

MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding

The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" presents a novel algorithm that enhances multimodal large language models' ability to perceive and reason about graphical user interfaces by integrating text, visual, and spatial signals through specialized perception modules and a dynamic fusion gate, achieving state‑of‑the‑art performance on multiple GUI benchmarks.

CVPR2025 · GUI Understanding · MLLM

0 likes · 5 min read


AIWalker

Mar 7, 2025 · Artificial Intelligence

How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks

The paper introduces GIFNet, a three‑branch network that leverages low‑level visual tasks and a cross‑fusion gating mechanism to achieve a single, task‑agnostic image‑fusion model with dramatically reduced computation, strong generalization to unseen modalities, and even single‑modal enhancement capabilities.

CVPR2025 · GIFNet · Image Fusion

0 likes · 20 min read
