AIWalker
Author

Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.

153 articles
Recent Articles

Latest from AIWalker

100 recent articles max
AIWalker
May 26, 2025 · Artificial Intelligence

VisionReasoner: RL‑Unified Model Beats YOLO‑World on Detection, Segmentation, Counting

VisionReasoner presents a reinforcement‑learning‑driven unified framework that tackles detection, segmentation, and counting in a single model. It employs a novel multi‑target cognition strategy with efficient Hungarian‑based matching and reports gains of 29.1% on COCO detection, 22.1% on ReasonSeg, and 15.3% on CountBench, using only 7,000 training samples.

Multi-Task Learning · VisionReasoner · counting
20 min read
AIWalker
May 22, 2025 · Artificial Intelligence

VisionReasoner: RL‑Unified System Beats YOLO‑World on Detection, Segmentation, Counting

VisionReasoner introduces a reinforcement‑learning‑driven unified framework that simultaneously handles detection, segmentation, and counting tasks within a single model, achieving 29.1% higher COCO detection AP, 22.1% better ReasonSeg segmentation, and 15.3% improvement on CountBench, while requiring only 7,000 training samples and offering efficient multi‑target matching via batch computation and the Hungarian algorithm.

LVLM · VisionReasoner · image segmentation
19 min read
AIWalker
May 18, 2025 · Artificial Intelligence

YOLOE: Open‑Source Real‑Time Anything Detector Beats YOLO‑World v2

YOLOE unifies object detection and segmentation in a single efficient model that supports text, visual, and prompt‑free inference, introduces RepRTA, SAVPE, and LRPC strategies, and achieves higher AP with up to three‑fold lower training cost and 1.4× faster inference on GPUs and mobile devices, as demonstrated by extensive LVIS and COCO experiments.

YOLOE · computer vision · object detection
29 min read
AIWalker
May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Diffusion Models · Video Generation · autoregressive modeling
16 min read
AIWalker
May 14, 2025 · Artificial Intelligence

Guided Image Filtering Explained: Insights from He Kaiming’s Classic Paper

This article reviews the Guided Image Filtering technique introduced by He Kaiming, compares it with other edge‑preserving filters, provides detailed OpenCV C++ and Python implementations, discusses the fast variant, analyzes computational complexity, and showcases visual results.

Edge Preserving · Guided Filtering · Image Processing
10 min read
AIWalker
May 14, 2025 · Artificial Intelligence

How HGO‑YOLO Achieves 87.4% mAP at 56 FPS with Only 4.6 MB of Parameters

This paper presents HGO‑YOLO, a lightweight real‑time anomaly‑behavior detector that integrates HGNetv2 and GhostConv into YOLOv8, achieving 87.4% mAP with just 4.6 MB of parameters and 56 FPS on CPU, and validates its performance across multiple datasets and hardware platforms.

Edge AI · Lightweight Models · YOLO
25 min read
AIWalker
May 13, 2025 · Artificial Intelligence

PixelHacker: Diffusion‑Based Image Inpainting with Latent Class Guidance Beats SOTA

PixelHacker introduces a latent class guidance (LCG) paradigm that injects foreground and background embeddings into a diffusion model, training on 14 million image‑mask pairs and achieving state‑of‑the‑art structural and semantic consistency across Places2, CelebA‑HQ and FFHQ benchmarks.

Diffusion Models · PixelHacker · SOTA
16 min read
AIWalker
May 12, 2025 · Artificial Intelligence

DefMamba: A Deformable Multi‑Scale Visual Foundation Model that Boosts Vision Tasks

DefMamba introduces a multi‑scale backbone, deformable Mamba modules, and a dynamic scanning strategy to preserve image spatial structure, achieving state‑of‑the‑art performance on image classification, object detection, and semantic segmentation benchmarks.

DefMamba · computer vision · deformable state space
23 min read
AIWalker
May 11, 2025 · Artificial Intelligence

Unified Multimodal Understanding and Generation: A 30K‑Word Survey of Recent Advances

This comprehensive survey reviews the rapid progress of multimodal understanding and text‑to‑image generation models, categorises existing unified architectures into diffusion‑based, autoregressive, and hybrid paradigms, analyses their tokenisation strategies, datasets and benchmarks, and highlights current challenges and future research directions.

Datasets · Diffusion Models · Survey
64 min read