Author

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.

163

Articles

Likes

231

Views

Comments

Latest from AIWalker

100 recent articles max

AIWalker

Mar 19, 2026 · Artificial Intelligence

Vision‑R1 Multimodal Reasoning Model Delivers Human‑Level Logic and Near‑OpenAI O1 Accuracy

Vision‑R1 is a 7B multimodal large language model that uses 200K unsupervised CoT samples, Modality Bridging, and Progressive Thinking Suppression Training to overcome data scarcity and over‑thinking, reaching 73.5% accuracy on MathVista, within 0.4 percentage points of OpenAI’s O1.

Chain-of-Thought · Multimodal Reasoning · benchmark performance

0 likes · 12 min read

AIWalker

Mar 18, 2026 · Artificial Intelligence

7× Faster Inference: Tsinghua’s Huang‑Gao Team Redesigns Vision‑Transformer Attention via Fourier Transforms

The AAAI 2026 paper by Tsinghua’s Huang‑Gao team shows that modeling Vision‑Transformer attention as a block‑circulant matrix and computing it with the FFT reduces its quadratic complexity to O(N log N), delivering up to sevenfold real‑world speedups without sacrificing accuracy.

AAAI 2026 · Circulant Matrices · FFT

0 likes · 15 min read

AIWalker

Mar 17, 2026 · Artificial Intelligence

How a 4B-Parameter Open-Source Model Outperforms 14B Multimodal Giants

InternVL-U, an open‑source 4‑billion‑parameter unified multimodal model, combines a 2B MLLM backbone with a 1.7B visual generation head. With a reasoning‑centric data pipeline and Chain‑of‑Thought guidance, it surpasses much larger 14‑20B models in understanding, generation, and editing on multiple benchmarks.

AI research · InternVL-U · image generation

0 likes · 22 min read

AIWalker

Mar 16, 2026 · Artificial Intelligence

DETR Drops Hungarian Matching: Double Training Speed, +4.2 AP on Large Objects

Beyond-Hungarian replaces the costly Hungarian assignment in DETR with a differentiable, query‑free matching scheme, adding a GT‑Probe module and a dual‑loss framework. It halves training latency and boosts large‑object AP by 4.2 points; the article also details trade‑offs, ablations, and future challenges.

DETR · GT-Probe · Hungarian matching

0 likes · 14 min read

AIWalker

Mar 13, 2026 · Artificial Intelligence

Towards AI That Truly Understands Art: Introducing the ArtiMuse Aesthetic Understanding Model

ArtiMuse, a new image aesthetic model unveiled at CVPR 2026 by Shanghai AI Lab and the China Academy of Art, combines a massive 10K fine‑grained dataset, a Token‑As‑Score scoring scheme, and unified textual‑and‑numeric feedback to deliver culturally aware, expert‑level art analysis and robust quantitative ratings.

AI aesthetics · Token-As-Score · art analysis

0 likes · 7 min read

AIWalker

Mar 12, 2026 · Artificial Intelligence

Mind-Brush: ‘Think‑Research‑Create’ Intent Reasoning for Image Generation

Mind-Brush introduces a ‘think‑research‑create’ agentic framework that unifies intent analysis, multimodal evidence retrieval, and knowledge‑driven reasoning to transform text‑to‑image generation from static decoding into an active cognitive workflow, achieving large accuracy gains on the new Mind‑Bench benchmark and surpassing existing SOTA models.

Mind-Brush · Multimodal Reasoning · agentic AI

0 likes · 15 min read

AIWalker

Mar 12, 2026 · Artificial Intelligence

BeautyGRPO: RL‑Driven Realistic Portrait Retouching Ends Over‑Beautification (CVPR 2026)

The paper introduces BeautyGRPO, a reinforcement‑learning framework that combines a fine‑grained preference dataset (FRPref‑10K) with Dynamic Path Guidance to balance aesthetic enhancement and high‑fidelity preservation in portrait retouching, achieving superior metrics and user preference over existing SFT and RL models.

AI aesthetics · CVPR 2026 · dynamic path guidance

0 likes · 11 min read

AIWalker

Mar 11, 2026 · Artificial Intelligence

Why 90% of DETR Queries Stay Idle and How PaQ‑DETR Boosts mAP by 4.2%

The article dissects the query‑activation imbalance in DETR‑based detectors, explains PaQ‑DETR’s pattern‑sharing and quality‑aware assignment mechanisms, and shows how these jointly raise detection mAP by up to 4.2% on COCO with less than 5% extra FLOPs.

DETR · PaQ-DETR · object detection

0 likes · 15 min read

AIWalker

Mar 10, 2026 · Artificial Intelligence

MIGM-Shortcut: Learning Controlled Latent Dynamics to Speed Up Masked Image Generation

The paper introduces MIGM-Shortcut, a self‑supervised method that learns controlled latent‑state dynamics to bypass redundant bidirectional attention in Masked Image Generation Models, achieving over 4× speed‑up on state‑of‑the‑art multimodal diffusion models like Lumina‑DiMOO while preserving image quality.

AI · MIGM · diffusion models

0 likes · 8 min read

AIWalker

Mar 9, 2026 · Artificial Intelligence

How EFSI‑DETR Achieves 188 FPS and Boosts Small‑Object Detection Accuracy by 5.8%

The article dissects EFSI‑DETR, a UAV small‑object detector that combines simulated frequency processing with dynamic semantic enhancement to overcome pixel scarcity, static fusion, and ignored frequency cues, delivering 188 FPS and a 5.8% APₛ gain on VisDrone while remaining lightweight.

DETR · UAV vision · dynamic fusion

0 likes · 16 min read
