Tagged articles
8 articles
AIWalker
Aug 3, 2025 · Artificial Intelligence

CVPR 2025: DeQA-Score Lets LLMs Predict Image Quality Score Distributions

DeQA-Score introduces a soft‑label discretization that lets multimodal large language models regress continuous image‑quality scores as Gaussian distributions, achieving 30× lower mean error and preserving variance and inter‑image relationships, with KL‑divergence and fidelity losses driving state‑of‑the‑art performance.

CVPR2025 · DeQA-Score · image quality assessment
8 min read
Tencent Technical Engineering
Jun 30, 2025 · Artificial Intelligence

How iMatch Won CVPR2025 NTIRE Image-Text Alignment: Techniques & Benchmarks

The IH‑VQA team’s iMatch solution won the CVPR2025 NTIRE Image‑Text Alignment championship by introducing dual‑model fusion, pseudo‑label data augmentation, Q‑Align probability mapping, and visual augmentations. The paper also presents a comprehensive iMatch benchmark evaluating 23 state‑of‑the‑art text‑to‑image models across multiple resolutions.

AI quality assessment · CVPR2025 · Multimodal Evaluation
15 min read
AntTech
Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Computer Vision · Video Generation
20 min read
Kuaishou Audio & Video Technology
Jun 11, 2025 · Artificial Intelligence

Kuaishou Showcases 12 Cutting-Edge CVPR 2025 Papers on Video Generation and AI

Kuaishou presented twelve peer‑reviewed papers at CVPR 2025 covering video quality assessment, large‑scale video datasets, dynamic 3D avatar reconstruction, 4D scene simulation, controllable video generation, scaling laws for diffusion transformers, multimodal foundations, and more, highlighting the company's leading research in computer vision and AI.

AI research · CVPR2025 · Deep Learning
21 min read
Kuaishou Tech
Jun 10, 2025 · Artificial Intelligence

Top 12 Cutting-Edge Video Generation Papers from Kuaishou at CVPR 2025

The article highlights CVPR 2025’s acceptance statistics and showcases twelve cutting‑edge video‑generation papers from Kuaishou, spanning datasets, quality assessment, style control, scaling laws, 4D simulation, interleaved image‑text data, vision‑language acceleration, high‑fidelity avatars, patch‑wise super‑resolution, narrative‑driven benchmarks, sketch‑based editing, and spatio‑temporal diffusion, each with links and abstracts.

CVPR2025 · Computer Vision · Kuaishou
20 min read
AIWalker
Apr 11, 2025 · Artificial Intelligence

Teaching Large Language Models to Predict Image Quality Scores with DeQA-Score

DeQA-Score, a CVPR 2025 work, shows how to train multimodal large language models to regress continuous image quality scores by discretizing scores into soft-label level tokens, preserving Gaussian distribution statistics and achieving state‑of‑the‑art performance.

CVPR2025 · DeQA-Score · image quality assessment
8 min read
AntTech
Mar 14, 2025 · Artificial Intelligence

MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding

The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" presents a novel algorithm that enhances multimodal large language models' ability to perceive and reason about graphical user interfaces by integrating text, visual, and spatial signals through specialized perception modules and a dynamic fusion gate, achieving state‑of‑the‑art performance on multiple GUI benchmarks.

CVPR2025 · Computer Vision · GUI Understanding
5 min read
AIWalker
Mar 7, 2025 · Artificial Intelligence

How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks

The paper introduces GIFNet, a three‑branch network that leverages low‑level visual tasks and a cross‑fusion gating mechanism to achieve a single, task‑agnostic image‑fusion model with dramatically reduced computation, strong generalization to unseen modalities, and even single‑modal enhancement capabilities.

CVPR2025 · Computer Vision · GIFNet
20 min read