Author

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms; shares in‑depth technical content, engineering practice, and insights as a hands‑on AI practitioner.

Latest from AIWalker

May 6, 2025 · Artificial Intelligence

SimpleAR: High‑Quality 1024×1024 Images with Just 0.5B Parameters via Pretraining, SFT, and RL

SimpleAR shows that a vanilla autoregressive model with only 0.5B parameters can generate high‑fidelity 1024×1024 images. The article walks through pretraining, supervised fine‑tuning, and reinforcement learning; the resulting model achieves competitive GenEval (0.59) and DPG‑Bench (79.66) scores, and vLLM with KV‑cache optimizations cuts inference time to about 14 seconds.

Pretraining · Supervised Fine‑Tuning · autoregressive

0 likes · 14 min read
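The KV‑cache optimization mentioned above rests on a simple observation: in causal attention, the keys and values of earlier tokens never change, so they can be computed once, stored, and reused at every later decoding step instead of being recomputed for the whole prefix. The sketch below is a toy, single‑head, single‑layer illustration in plain Python, not SimpleAR's or vLLM's implementation; the names `KVCache`, `step_with_cache`, and `step_without_cache` are hypothetical, and the token embeddings are fixed random vectors rather than outputs fed back from a model. It checks that cached decoding yields the same attention outputs as full recomputation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Holds the key/value projections of every token processed so far (hypothetical helper)."""

    def __init__(self):
        self.keys = []
        self.values = []


def step_with_cache(x, cache, Wq, Wk, Wv):
    # Project only the newest token; earlier keys/values are read from the cache.
    cache.keys.append(matvec(Wk, x))
    cache.values.append(matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)


def step_without_cache(prefix, Wq, Wk, Wv):
    # Re-project every token in the prefix at each step.
    keys = [matvec(Wk, x) for x in prefix]
    values = [matvec(Wv, x) for x in prefix]
    return attend(matvec(Wq, prefix[-1]), keys, values)


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4
Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))
tokens = random_matrix(6, d, rng)  # six toy token embeddings

cache = KVCache()
cached_outputs = [step_with_cache(x, cache, Wq, Wk, Wv) for x in tokens]
full_outputs = [step_without_cache(tokens[: t + 1], Wq, Wk, Wv) for t in range(len(tokens))]
```

With caching, each step projects only the newest token, so the projection cost per step stays constant instead of growing with sequence length; the attention itself still scans every cached key. Serving engines such as vLLM add paged memory management (PagedAttention) on top of this basic idea.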

Apr 28, 2025 · Artificial Intelligence

SimpleAR: Autoregressive Visual Generation at 1024×1024 Using Only 0.5B Parameters

SimpleAR is a minimalist autoregressive visual generation framework that achieves competitive 1024×1024 image synthesis with only 0.5B parameters. It follows a three‑stage pipeline of large‑scale pretraining, supervised fine‑tuning, and GRPO‑based reinforcement learning, and shows significant inference speedups from KV caching, vLLM, and speculative decoding.

Pretraining · autoregressive generation · benchmark

0 likes · 14 min read
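GRPO, the reinforcement‑learning stage named above, does without PPO's learned value network: for each prompt it samples a group of outputs, scores them, and uses each output's reward relative to its own group as the advantage. The following is a minimal sketch of that advantage computation plus the PPO‑style clipped term it feeds, assuming population standard deviation and omitting the KL penalty; the function names, the rewards, and the `clip_eps=0.2` default are illustrative assumptions, not SimpleAR's code or reward setup.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumption; some code uses sample std)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one token; the KL penalty term is omitted here."""
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical rewards for four images sampled from the same prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.414, -1.414, 0.0, 0.0]
```

Outputs that beat their group's average receive positive advantages and are reinforced, while below‑average ones are pushed down. Implementations that use the sample rather than the population standard deviation rescale every advantage in a group by the same constant factor.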

Apr 17, 2025 · Artificial Intelligence

Unveiling DeepSeek’s Janus Series: Decoupled Visual Encoding for Unified Multimodal Understanding and Generation

This article provides an in‑depth analysis of DeepSeek’s Janus and Janus‑Pro models, explaining how decoupling visual encoding resolves the conflict between multimodal understanding and generation, detailing training stages, data scaling, architectural choices, and presenting extensive benchmark results that demonstrate significant performance gains.

DeepSeek · Janus · Model Scaling

0 likes · 23 min read

Apr 16, 2025 · Artificial Intelligence

Fast and Precise: FloED Sets New State‑of‑the‑Art in Video Restoration Over All Diffusion Models

FloED introduces a dual‑branch, flow‑guided diffusion framework that dramatically improves spatio‑temporal consistency and computational efficiency for video restoration, outperforming existing text‑guided diffusion methods on both object removal and background repair benchmarks.

FloED · diffusion models · efficiency

0 likes · 16 min read

Apr 16, 2025 · Artificial Intelligence

Plug‑and‑Play Multi‑Scale Attention: A Seamless Boost for Model Performance

This article reviews recent multi‑scale attention breakthroughs—including EMA, MSDA, VWA, and related modules—showing how they improve accuracy, cut FLOPs by up to 70%, and can be inserted into existing models with minimal effort, backed by code and paper links.

Plug-and-Play · computer vision · deep learning

0 likes · 10 min read

Apr 14, 2025 · Artificial Intelligence

Breaking the Binary: FlexIP Enables Both Identity Preservation and Personalized Editing

FlexIP introduces a dual‑adapter architecture and a dynamic weight‑gating mechanism that decouple identity preservation from personalized editing, allowing continuous control over image generation and outperforming prior SOTA methods in both fidelity and flexibility.

AI · diffusion models · dual-adapter

0 likes · 16 min read

Apr 13, 2025 · Artificial Intelligence

Huawei Pangu Ultra: 135B Ascend‑Native Dense LLM Without Nvidia GPUs

Huawei's Pangu Ultra is a 135‑billion‑parameter dense language model trained entirely on Ascend NPUs. The article details its architectural techniques for training stability, a domain‑aware tokenizer, multi‑stage pre‑training, and extensive system optimizations, along with benchmark results that surpass dense models such as Llama 405B and are competitive with DeepSeek‑R1.

Ascend NPU · Dense Model · System Optimization

0 likes · 15 min read

Apr 11, 2025 · Artificial Intelligence

Teaching Large Language Models to Predict Image Quality Scores with DeQA-Score

DeQA-Score, a CVPR 2025 work, shows how to train multimodal large language models to regress continuous image quality scores: each score is discretized into soft labels over level tokens in a way that preserves the statistics of the underlying Gaussian distribution. The method achieves state‑of‑the‑art performance, and the released model can be used without any installation.

CVPR2025 · DeQA-Score · image quality assessment

0 likes · 8 min read
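The core idea in the summary, turning a continuous quality score into soft labels over discrete level tokens, can be sketched as follows. The sketch assumes the score is modeled as a Gaussian N(μ, σ²) on a 1 to 5 scale, uses five level names in the Q‑Align style (an assumption here), and gives each level the Gaussian mass in a unit‑width bin around its value, with open‑ended edge bins. It performs plain binning only: DeQA‑Score additionally adjusts the labels so the discrete distribution preserves the Gaussian's statistics, which this sketch does not attempt, and the small gap between the recovered score and μ shows why such an adjustment helps.

```python
import math

# Five level tokens, lowest to highest (Q-Align-style naming; an assumption here).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); math.erf handles x = +/-inf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def soft_label(mu, sigma):
    """Split N(mu, sigma^2) into bins centred on level values 1..5 (edge bins are open-ended)."""
    edges = [-math.inf, 1.5, 2.5, 3.5, 4.5, math.inf]
    return [normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
            for lo, hi in zip(edges, edges[1:])]


def expected_score(probs):
    """Recover a continuous score as the probability-weighted mean of level values 1..5."""
    return sum(p * value for value, p in enumerate(probs, start=1))


probs = soft_label(mu=3.2, sigma=0.4)
for name, p in zip(LEVELS, probs):
    print(f"{name:>9}: {p:.3f}")
print(f"recovered score: {expected_score(probs):.3f}")  # slightly below 3.2 from binning alone
```

The same weighted‑mean readout is a common way to turn a model's probabilities over the five level tokens back into a continuous score at inference time.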

Apr 10, 2025 · Artificial Intelligence

DCEdit: Precise Text-Guided Image Editing that Preserves Backgrounds

DCEdit introduces a precise semantic localization strategy and a dual-level control mechanism for text‑guided image editing, delivering superior background preservation and editing quality, as demonstrated on the new RW‑800 benchmark and extensive comparisons with state‑of‑the‑art diffusion models.

AI · Image editing · benchmark

0 likes · 16 min read

Apr 8, 2025 · Artificial Intelligence

AgenticIR: An Agentic System for Restoring Images with Complex Degradations

AgenticIR combines vision‑language models and large language models in a multi‑stage reasoning workflow (perception, planning, execution, reflection, and adjustment) to assess an image, plan a restoration, and iteratively apply specialized restoration tools, outperforming baseline methods on images with complex degradations.

Agentic Systems · ICLR 2025 · Vision-Language Models

0 likes · 15 min read