AI Frontier Lectures
Author: AI Frontier Lectures

Leading AI knowledge platform

164 articles · 0 likes · 1 view · 0 comments
Recent Articles

Latest from AI Frontier Lectures

Jan 5, 2026 · Artificial Intelligence

Can AI Really Understand Dynamic First‑Person Scenes? Inside the New EOC‑Bench

The article introduces EOC‑Bench, a benchmark that evaluates multimodal large language models on dynamic first‑person visual tasks across past, present, and future time dimensions. It covers the benchmark's 3,277 questions, a novel multi‑scale temporal accuracy metric, extensive model comparisons, and a detailed error analysis revealing current models' limitations in temporal perception and memory.

MLLM evaluation · dynamic perception · multimodal AI
0 likes · 10 min read
Dec 17, 2025 · Artificial Intelligence

Can OmniVGGT Unlock Multi‑Modal 3D Vision with Any Number of Inputs?

OmniVGGT introduces a flexible omni‑modality driven transformer that can ingest arbitrary numbers of geometric cues such as depth maps and camera parameters, achieving state‑of‑the‑art performance on diverse 3D tasks while keeping inference speed comparable to its RGB‑only predecessor.

3D vision · Multi-Modal · OmniVGGT
0 likes · 13 min read
Dec 15, 2025 · Artificial Intelligence

How UnityVideo Unifies Multimodal Training to Boost Video Generation

UnityVideo, a new vision framework from HKUST, CUHK, Tsinghua, and Kuaishou, unifies training across depth, flow, pose, segmentation, and RGB modalities. Compared with existing single‑modality video generators, it achieves faster convergence, higher video quality, zero‑shot generalization, and stronger physical reasoning.

AI research · UnityVideo · Vision Models
0 likes · 15 min read
Dec 9, 2025 · Artificial Intelligence

CrossVid: The New Benchmark Exposing AI’s Struggle with Cross‑Video Reasoning

CrossVid is an open‑source benchmark that evaluates multimodal large language models on cross‑video reasoning, offering 5,331 videos and 9,015 high‑quality QA pairs across four reasoning dimensions, and revealing that even the strongest models achieve only about 50% accuracy compared with human performance.

AI evaluation · cross-video reasoning · video understanding
0 likes · 9 min read
Dec 9, 2025 · Artificial Intelligence

Can Token‑Level Surrogates Stabilize RL for Large Language Models? A Deep Dive

This article analyzes why optimizing sequence‑level rewards for LLMs with token‑level surrogate objectives can improve reinforcement‑learning stability, explains the theoretical conditions required, introduces Routing Replay for MoE models, and presents extensive experiments validating the approach.

Importance Sampling · Mixture of Experts · large language models
0 likes · 12 min read
Nov 28, 2025 · Artificial Intelligence

Can AI Generate the Next Step in a Video? Inside the VANS Model

Researchers from Kuaishou and City University of Hong Kong introduce VANS, a novel Video-as-Answer system that predicts and visualizes the next event in a video by jointly optimizing a visual language model and a video diffusion model, enabling personalized step‑by‑step guidance and future scenario generation.

Video Generation · future prediction · joint optimization
0 likes · 10 min read
Nov 28, 2025 · Artificial Intelligence

How Meta’s SAM 3D Turns a Single Photo into Detailed 3D Models

Meta’s newly released SAM 3 and SAM 3D models enable promptable segmentation and single‑image 3D reconstruction, outperforming prior methods on benchmarks. They introduce a shared perception encoder, a Presence Head to reduce hallucinations, and a two‑stage generation pipeline that produces high‑fidelity geometry and texture.

3D reconstruction · Meta · SAM 3
0 likes · 12 min read
Nov 25, 2025 · Artificial Intelligence

How RoMa v2 Achieves Harder, Better, Faster, Denser Feature Matching

RoMa v2 introduces a two‑stage matching‑then‑refinement pipeline powered by DINOv3 features, custom CUDA kernels, and diverse training data, delivering state‑of‑the‑art accuracy, speed, and pixel‑level uncertainty estimation across a wide range of dense matching benchmarks.

DINOv3 · RoMa v2 · benchmark results
0 likes · 10 min read