AIWalker
Author


Focused on computer vision, image processing, color science, and AI algorithms, sharing in‑depth technical content, engineering practice, and insights as a hands‑on AI practitioner.

153 Articles · 0 Likes · 0 Views · 0 Comments
Recent Articles

AIWalker
Mar 19, 2026 · Artificial Intelligence

Vision‑R1 Multimodal Reasoning Model Delivers Human‑Level Logic and Accuracy Close to OpenAI o1

Vision‑R1 is a 7B multimodal large language model that combines a 200K‑sample chain‑of‑thought (CoT) dataset built without human annotation, Modality Bridging, and Progressive Thinking Suppression Training to overcome data scarcity and overthinking. It reaches 73.5% accuracy on MathVista, within 0.4 percentage points of OpenAI o1.

benchmark performance · chain of thought · large language models
0 likes · 12 min read
AIWalker
Mar 18, 2026 · Artificial Intelligence

7× Faster Inference: Tsinghua's Gao Huang Team Redesigns Vision Transformer Attention via Fourier Transforms

An AAAI 2026 paper from Gao Huang's team at Tsinghua University shows that modeling Vision Transformer attention as a block‑circulant matrix and computing it with the fast Fourier transform (FFT) reduces attention's quadratic O(N²) complexity in the number of tokens N to O(N log N), delivering up to 7× real‑world speedups without sacrificing accuracy.

AAAI 2026 · Attention Mechanisms · Circulant Matrices
0 likes · 15 min read
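The O(N log N) claim rests on a standard identity: every circulant matrix is diagonalized by the discrete Fourier transform, so multiplying it by a vector becomes an element‑wise product in the frequency domain. A minimal sketch, assuming NumPy and the simplest case of a single (non‑block) circulant matrix; the function names are illustrative, and this is not the paper's attention module, whose block‑circulant formulation is not reproduced here.

```python
import numpy as np

def circulant_matvec_fft(c: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply the circulant matrix whose first column is c by x in O(N log N).

    Illustrative helper (not from the paper). C @ x is the circular
    convolution of c and x, which the FFT turns into an element-wise product.
    """
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

def dense_circulant(c: np.ndarray) -> np.ndarray:
    """Build the dense N x N circulant matrix C[i, j] = c[(i - j) % N] (O(N^2) reference)."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)
x = rng.standard_normal(n)

fast = circulant_matvec_fft(c, x)   # FFT path: O(N log N)
slow = dense_circulant(c) @ x       # explicit matrix product: O(N^2)
print(np.allclose(fast, slow))      # True
```

The dense product is kept only as a correctness check; the point is that the FFT path never materializes the N × N matrix.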
AIWalker
Mar 17, 2026 · Artificial Intelligence

How a 4B-Parameter Open-Source Model Outperforms 14B Multimodal Giants

InternVL-U is an open‑source, 4‑billion‑parameter unified multimodal model that pairs a 2B MLLM backbone with a 1.7B visual generation head. Trained with a reasoning‑centric data pipeline and Chain‑of‑Thought guidance, it outperforms much larger 14–20B models on multiple understanding, generation, and editing benchmarks.

AI research · Image Generation · InternVL-U
0 likes · 22 min read
AIWalker
Mar 16, 2026 · Artificial Intelligence

DETR Drops Hungarian Matching: Doubled Training Speed, +4.2 AP on Large Objects

Beyond-Hungarian replaces DETR's costly Hungarian assignment with a differentiable, query‑free matching scheme built around a GT‑Probe module and a dual‑loss framework. It halves training latency and raises large‑object AP by 4.2 points; the article also covers trade‑offs, ablations, and open challenges.

DETR · GT-Probe · Hungarian matching
0 likes · 14 min read
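For context on what Beyond-Hungarian replaces, here is a minimal sketch of standard DETR‑style bipartite matching, assuming NumPy and SciPy: a cost matrix over (query, ground truth) pairs built from a classification term and an L1 box term, solved with `scipy.optimize.linear_sum_assignment`. The GIoU term is omitted for brevity, and the function name and weights are illustrative assumptions; this is the baseline being removed, not the Beyond-Hungarian scheme.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes,
                    w_class=1.0, w_bbox=5.0):
    """Illustrative DETR-style matcher (hypothetical helper; GIoU cost omitted).

    pred_logits: (Q, C) class logits, pred_boxes: (Q, 4) boxes,
    gt_labels: (G,) class ids, gt_boxes: (G, 4) boxes.
    Returns (query indices, ground-truth indices) of the optimal one-to-one match.
    """
    # Numerically stable softmax over classes.
    z = pred_logits - pred_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Classification cost: negative probability of each GT's class, shape (Q, G).
    cost_class = -probs[:, gt_labels]
    # L1 distance between every predicted box and every GT box, shape (Q, G).
    cost_bbox = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(axis=-1)
    cost = w_class * cost_class + w_bbox * cost_bbox
    # Hungarian-style assignment on the rectangular (Q >= G) cost matrix.
    return linear_sum_assignment(cost)

# Toy example: 4 queries, 2 ground-truth objects, uniform class logits.
pred_logits = np.zeros((4, 3))
pred_boxes = np.array([[0.8, 0.8, 0.2, 0.2],
                       [0.1, 0.9, 0.1, 0.1],
                       [0.2, 0.2, 0.3, 0.3],
                       [0.5, 0.5, 0.9, 0.9]])
gt_labels = np.array([1, 2])
gt_boxes = np.array([[0.2, 0.2, 0.3, 0.3],
                     [0.8, 0.8, 0.2, 0.2]])

q_idx, g_idx = hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes)
print(q_idx, g_idx)  # [0 2] [1 0]
```

Only the matched queries receive box supervision; the remaining queries are trained toward the "no object" class, and this per‑batch assignment step is the cost the article says Beyond-Hungarian avoids.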
AIWalker
Mar 13, 2026 · Artificial Intelligence

Towards AI That Truly Understands Art: Introducing the ArtiMuse Aesthetic Understanding Model

ArtiMuse, an image aesthetics model presented at CVPR 2026 by Shanghai AI Lab and the China Academy of Art, combines a 10K‑sample fine‑grained dataset, a Token‑As‑Score scoring scheme, and unified textual and numeric feedback to deliver culturally aware, expert‑level art analysis alongside robust quantitative ratings.

AI aesthetics · Token-As-Score · art analysis
0 likes · 7 min read
AIWalker
Mar 12, 2026 · Artificial Intelligence

Mind-Brush: ‘Think‑Research‑Create’ Intent Reasoning for Image Generation

Mind-Brush introduces a ‘think‑research‑create’ agentic framework that unifies intent analysis, multimodal evidence retrieval, and knowledge‑driven reasoning, turning text‑to‑image generation from static decoding into an active cognitive workflow. It achieves large accuracy gains on the new Mind‑Bench benchmark, surpassing existing state‑of‑the‑art models.

Agentic AI · Image Generation · Mind-Brush
0 likes · 15 min read
AIWalker
Mar 12, 2026 · Artificial Intelligence

BeautyGRPO: RL‑Driven Realistic Portrait Retouching Ends Over‑Beautification (CVPR 2026)

The paper introduces BeautyGRPO, a reinforcement‑learning framework that combines a fine‑grained preference dataset (FRPref‑10K) with Dynamic Path Guidance to balance aesthetic enhancement against high‑fidelity preservation in portrait retouching, outperforming existing SFT and RL methods on both quantitative metrics and user preference.

AI aesthetics · CVPR 2026 · dynamic path guidance
0 likes · 11 min read
AIWalker
Mar 11, 2026 · Artificial Intelligence

Why 90% of DETR Queries Stay Idle and How PaQ‑DETR Boosts mAP by 4.2%

The article dissects the query‑activation imbalance in DETR‑based detectors, explains PaQ‑DETR’s pattern‑sharing and quality‑aware assignment mechanisms, and shows how these jointly raise detection mAP by up to 4.2% on COCO with less than 5% extra FLOPs.

DETR · PaQ-DETR · object detection
0 likes · 15 min read
AIWalker
Mar 10, 2026 · Artificial Intelligence

MIGM-Shortcut: Learning Controlled Latent Dynamics to Speed Up Masked Image Generation

The paper introduces MIGM-Shortcut, a self‑supervised method that learns controlled latent‑state dynamics to bypass redundant bidirectional attention in masked image generation models (MIGMs), achieving more than 4× speedup on state‑of‑the‑art multimodal diffusion models such as Lumina‑DiMOO while preserving image quality.

AI · Diffusion Models · MIGM
0 likes · 8 min read
AIWalker
Mar 9, 2026 · Artificial Intelligence

How EFSI‑DETR Achieves 188 FPS and Boosts Small‑Object Detection Accuracy by 5.8%

The article dissects EFSI‑DETR, a lightweight UAV small‑object detector that combines simulated frequency processing with dynamic semantic enhancement to address three problems: too few pixels per object, static feature fusion, and neglected frequency cues. It runs at 188 FPS and improves APₛ by 5.8% on VisDrone.

DETR · Real-time Inference · UAV vision
0 likes · 16 min read