Author

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.

165 Articles

Likes

397 Views

Comments

Latest from AIWalker

100 recent articles max

AIWalker

Jun 4, 2026 · Artificial Intelligence

How YOLO26 Redefines Real‑Time Detection: NMS‑Free Dual‑Head Architecture Beats YOLO11

YOLO26 eliminates NMS and DFL, adopts a dual‑head design, MuSGD optimizer, progressive loss weighting, and STAL small‑object assignment, achieving 57.5 mAP with 1.7 ms latency on COCO while unifying detection, segmentation, pose, OBB and open‑set tasks, as shown by extensive ablations.

MuSGD optimizer · STAL small-object assignment · YOLO26

0 likes · 14 min read

AIWalker

May 21, 2026 · Artificial Intelligence

AnyFlow: Generate High‑Quality Video in 4 Steps with Unlimited Sampling Improvement

AnyFlow introduces a flow‑map distillation framework that enables video diffusion models to produce high‑quality results in just four steps while continuously improving with additional sampling steps, supporting both causal and bidirectional architectures up to 14 B parameters and allowing downstream fine‑tuning.

AI Video Generation · Video Diffusion · any-step sampling

0 likes · 13 min read

AIWalker

May 20, 2026 · Artificial Intelligence

AnyFlow: Generate High‑Quality Video in 4 Steps and Keep Improving with More Sampling

AnyFlow introduces a flow‑map distillation framework that enables video diffusion models to produce high‑quality results in just four sampling steps while still gaining quality as the number of steps increases, supporting both causal and bidirectional architectures and scaling up to 14 B parameters.

Video Diffusion · bidirectional video · causal video

0 likes · 14 min read

AIWalker

May 19, 2026 · Artificial Intelligence

How EUPE’s Three‑Stage Distillation Lets an 86M Model Run Classification, Segmentation and VLM on iPhone in 62 ms (SOTA)

EUPE introduces a three‑stage “scale‑then‑shrink” distillation pipeline that first trains a large proxy model to absorb heterogeneous expert knowledge and then compresses it into an 86M encoder, achieving state‑of‑the‑art performance on image classification, dense prediction and vision‑language tasks on an iPhone with only 62 ms latency.

EUPE · Model Compression · ViT

0 likes · 16 min read

AIWalker

May 19, 2026 · Industry Insights

How TCL’s XRGB Achieves BT.2020 131% Color Gamut with a Four‑Color Pixel Architecture

TCL Huaxing’s XRGB technology adds an independent cyan sub‑pixel to the traditional RGB layout, forming an RGBC four‑color architecture. Together with a custom cyan color filter and backlight and a dedicated color‑mapping algorithm, it delivers a 131% BT.2020 color gamut, 7000:1 contrast, 0.7% reflectivity and true‑4K resolution, redefining LCD display limits.

BT.2020 · LCD display · RGBC

0 likes · 7 min read

AIWalker

May 19, 2026 · Artificial Intelligence

Why Attention Transfer Fails for DINOv2 and Other Modern ViTs: Architecture Mismatch Revealed

A large-scale benchmark of 20 pretrained ViT teachers across 11 families shows that attention copy and distillation improve some models but hurt others—especially DINOv2, CLIP, and BEiTv2—due to architecture mismatches, and adding the teachers' native components to students restores the lost performance.

Architecture Compatibility · Attention Transfer · Deep Learning

0 likes · 13 min read

AIWalker

May 18, 2026 · Artificial Intelligence

ByteDance Teams with He Kaiming to Open‑Source the Continuous Diffusion Language Model Cola DLM

The article analyzes ByteDance's Cola DLM, a fully open‑source continuous diffusion language model that abandons token‑centric generation in favor of latent semantic representations, detailing its architecture, training strategy, scaling stability, and how it compares with the earlier ELF model.

ByteDance · Cola DLM · Language Model

0 likes · 14 min read

AIWalker

May 17, 2026 · Industry Insights

Why Converting SDR to HDR Involves More Than Just Brightening the Image

The paper presents a pixel‑level statistical study of the ASC StEM2 test film, building a three‑layer physical‑perceptual comparison of its EXR, SDR and HDR masters. It finds that about 82 % of image regions can be restored through a restrained process, while the remaining areas require targeted semantic adjustments, and it offers concrete guidance for AI‑driven HDR conversion and industry standards.

Artificial Intelligence · Digital Cinema · HDR

0 likes · 29 min read

AIWalker

May 17, 2026 · Artificial Intelligence

From Image Captioning to Detective‑Style Perception: Pixel‑Searcher Beats Closed‑Source Models

Pixel‑Searcher introduces an agentic search‑driven visual perception framework that integrates web‑based evidence with pixel‑level grounding, and the new WebEyes benchmark demonstrates its superiority over existing open‑ and closed‑source multimodal models across localization, segmentation, and VQA tasks.

Agentic Search · Pixel-Searcher · WebEyes

0 likes · 16 min read

AIWalker

May 16, 2026 · Artificial Intelligence

Qwen3-VL-Seg Unlocks Pixel‑Level Open‑World Segmentation

Qwen3-VL-Seg, the latest open‑source multimodal LLM from Alibaba, extends bounding‑box predictions to pixel‑accurate masks using a lightweight box‑guided decoder, achieving strong performance on both closed‑set and open‑world segmentation tasks with only 0.4% extra parameters.

Multimodal LLM · Qwen3-VL-Seg · SA1B-ORS dataset

0 likes · 6 min read