AIWalker

A diligent AI practitioner focused on computer vision, image processing, color science, and AI algorithms, sharing hardcore technology, engineering practice, and deep insights.

153 articles
Recent Articles
Aug 13, 2025 · Artificial Intelligence

One‑Model‑For‑All: Inception‑Level AI Try‑On/Off with Arbitrary Poses and No Masks

The paper presents OMFA, a diffusion‑based unified framework for virtual try‑on and try‑off that removes the need for garment templates, segmentation masks, and fixed poses by leveraging a novel partial‑diffusion mechanism and SMPL‑X pose conditioning, achieving state‑of‑the‑art results on VITON‑HD and DeepFashion‑MultiModal datasets.

AI try-on · SMPL-X · computer vision
15 min read
Aug 13, 2025 · Artificial Intelligence

Look-Back Triggers Visual Reflection in Qwen-2.5-VL, +6.3% Perception

Look-Back is an implicit training paradigm that enables the Qwen‑2.5‑VL‑7B multimodal LLM to autonomously re‑focus on its visual inputs during reasoning, achieving a 6.3% boost on perception tasks and outperforming prior baselines while requiring no extra image tokens or architectural changes.

Look-Back · Qwen-2.5-VL · implicit training
26 min read
Aug 6, 2025 · Artificial Intelligence

Why ByteDance’s 7B BAGEL Model Rivals GPT‑4o in Unified Multimodal Understanding and Generation

The article provides an in‑depth technical analysis of ByteDance's 7‑billion‑parameter BAGEL model, detailing its Mixture‑of‑Transformer‑Experts (MoT) architecture, high‑quality interleaved multimodal pre‑training data, multi‑stage training strategy, emergent capabilities, and extensive benchmark results showing BAGEL matching or surpassing GPT‑4o on vision‑language tasks.

BAGEL · Emergent Abilities · GPT-4o comparison
24 min read
Aug 5, 2025 · Artificial Intelligence

Perception‑R1: RL Gives Visual Insight Without Chain‑of‑Thought, Leads on Four Benchmarks

The paper introduces Perception‑R1, a rule‑based reinforcement‑learning framework that trains multimodal large language models for visual perception tasks without relying on chain‑of‑thought reasoning, demonstrates up to 17.9% performance gains on RefCOCO+, PixMo‑Count, PageOCR, and COCO2017, and analyzes the key roles of perception confusion and reward design.

RLHF · Reward Design · benchmark
24 min read
Aug 4, 2025 · Artificial Intelligence

Can Lumina-mGPT 2.0 Replace Diffusion Models? A Deep Dive into Its Autoregressive Power

Lumina-mGPT 2.0 is a decoder‑only autoregressive image model trained from scratch that rivals diffusion systems such as DALL·E 3 in quality while offering unified multimodal tokenization, flexible multi‑task generation, and several inference‑speed tricks, yet it still faces licensing, scaling, and sampling‑time challenges.

AI model analysis · Image Generation · Lumina-mGPT
22 min read
Aug 3, 2025 · Artificial Intelligence

Tree-Guided CNN Boosts Image Super-Resolution in Joint University Study

A collaborative team from five universities proposes a tree-structured convolutional neural network that leverages binary‑tree guidance, cosine cross‑domain extraction, and an adaptive Nesterov momentum optimizer to markedly improve image super‑resolution performance.

adaptive optimizer · computer vision · deep learning
5 min read
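For context on the optimizer mentioned in this summary, one step of classic Nesterov momentum can be sketched as below; the "adaptive" scheduling of the paper's variant is not reproduced here, and the function and parameter names are illustrative, not the authors' API.

```python
def nesterov_step(params, grad_fn, velocity, lr=0.01, momentum=0.9):
    """One classic Nesterov momentum update (a sketch; the paper's
    adaptive variant presumably adjusts lr/momentum per step).

    params, velocity: lists of floats; grad_fn evaluates the gradient
    at the look-ahead point rather than at the current parameters.
    """
    # Look ahead along the current velocity before taking the gradient.
    lookahead = [p + momentum * v for p, v in zip(params, velocity)]
    grads = grad_fn(lookahead)
    # Decay the old velocity and step against the look-ahead gradient.
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    params = [p + v for p, v in zip(params, velocity)]
    return params, velocity


# Usage sketch: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
params, velocity = [0.0], [0.0]
for _ in range(300):
    params, velocity = nesterov_step(
        params, lambda xs: [2 * (x - 3) for x in xs], velocity
    )
```

Evaluating the gradient at the look-ahead point is what distinguishes Nesterov momentum from plain heavy-ball momentum.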
Aug 3, 2025 · Artificial Intelligence

CVPR 2025: DeQA-Score Lets LLMs Predict Image Quality Score Distributions

DeQA-Score introduces a soft‑label discretization that lets multimodal large language models regress continuous image‑quality scores as Gaussian distributions, achieving 30× lower mean error and preserving variance and inter‑image relationships, with KL‑divergence and fidelity losses driving state‑of‑the‑art performance.

CVPR 2025 · DeQA-Score · image quality assessment
8 min read
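The soft-label idea in this summary can be sketched in a few lines: a continuous quality score with uncertainty is discretized into a distribution over fixed levels, and a KL term compares two such distributions. The level set, normalization, and epsilon smoothing here are illustrative assumptions, not DeQA-Score's exact formulation.

```python
import math

def soft_labels(mean, std, levels=(1, 2, 3, 4, 5)):
    """Discretize a Gaussian score N(mean, std^2) into soft labels over
    fixed quality levels by evaluating the density at each level and
    normalizing (a sketch of the soft-label discretization idea)."""
    pdf = [math.exp(-0.5 * ((lv - mean) / std) ** 2) for lv in levels]
    total = sum(pdf)
    return [p / total for p in pdf]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, eps-smoothed."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Usage sketch: a target score distribution vs. a predicted one.
target = soft_labels(3.4, 0.6)   # e.g. annotator mean 3.4, std 0.6
pred = soft_labels(3.0, 0.8)
loss = kl_divergence(target, pred)
```

Keeping the full distribution, rather than only its mean, is what lets the loss also penalize mismatched variance.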
Jul 15, 2025 · Artificial Intelligence

Dynamic Vision Mamba: Re‑ordering Pruning and Adaptive Block Selection Cut FLOPs by 35.2%

This article presents Dynamic Vision Mamba (DyVM), a method that tackles token and block redundancy in Mamba‑based visual models through a novel re‑ordering pruning strategy and dynamic block selection, achieving a 35.2% FLOPs reduction with only a 1.7% accuracy loss while demonstrating strong generalization across tasks and architectures.

Dynamic Block Selection · FLOPs Reduction · Model Efficiency
22 min read
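As a minimal illustration of the token-pruning half of this idea: score tokens, keep the highest-scoring fraction, and preserve sequence order among survivors. The scoring criterion and keep ratio below are hypothetical; DyVM's re-ordering strategy and dynamic block selection are more involved.

```python
def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring fraction of tokens (illustrative
    top-k pruning, not DyVM's actual criterion).

    tokens: list of token embeddings (any objects);
    scores: one importance score per token, higher = more important.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    # Rank token indices by descending importance score.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Keep the top-k, restoring original sequence order for the survivors.
    keep = sorted(order[:k])
    return [tokens[i] for i in keep]


# Usage sketch with toy tokens and scores.
kept = prune_tokens(["a", "b", "c", "d", "e", "f"],
                    [0.1, 0.9, 0.3, 0.8, 0.2, 0.7],
                    keep_ratio=0.5)
```

Dropping tokens before the sequence-mixing layers is what turns the pruning ratio into a roughly proportional FLOPs reduction.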
Jul 1, 2025 · Artificial Intelligence

How a Minor Tweak to PINN Achieves Up to 100× Speedup

Recent breakthroughs such as VS‑PINN, Stiff‑PINN, MAD‑Scientist and KAN‑ODEs demonstrate how small algorithmic changes and novel training strategies can accelerate physics‑informed neural networks by orders of magnitude while expanding their applicability to stiff PDEs, chemical kinetics and dynamical systems.

PINN · Physics-Informed Neural Networks · Scientific Machine Learning
6 min read