Tagged articles
28 articles
Machine Heart
May 14, 2026 · Artificial Intelligence

Breaking the 3D Perception Bottleneck: VGGT Series Enables Dynamic High‑Fidelity Reconstruction

The VGGT series from KOKONI 3D and collaborators tackles three core 3D perception limits—unbounded sequence memory, dynamic‑static entanglement, and compute‑precision trade‑offs—by introducing StreamCacheVGGT, progressive decoupling, and HD‑VGGT, achieving O(1) memory streaming, 15%+ accuracy gains on dynamic benchmarks, and record‑high AUC on RealEstate10K.

3D reconstruction · Computer Vision · VGGT
10 min read
Machine Heart
May 6, 2026 · Artificial Intelligence

Scal3R Enables Stable Kilometer-Scale 3D Reconstruction of Long Videos

Scal3R introduces test‑time training with a global‑context memory and synchronization mechanism that lets models train on and infer over ultra‑long video sequences, achieving accurate camera poses and dense point clouds for kilometer‑scale scenes while outperforming prior SLAM, SfM and streaming baselines on multiple benchmarks.

3D reconstruction · Computer Vision · Scal3R
11 min read
Machine Heart
Apr 26, 2026 · Industry Insights

Is 3D Reconstruction the Spatial Foundation for Next‑Gen Models?

The article examines how 3D reconstruction is evolving from offline, single‑scene pipelines to continuous, streaming workflows that feed web distribution, robot simulation, visual positioning, spatial editing, and world‑generation systems, highlighting recent research, standards, and industry deployments.

3D reconstruction · Digital Twin · Robotics
10 min read
Machine Heart
Apr 15, 2026 · Artificial Intelligence

From Clip Generation to Long‑Video Roaming: OmniRoam Enables Stable, Trajectory‑Controlled Video Synthesis

OmniRoam introduces a panoramic, coarse‑to‑fine framework that generates long, trajectory‑controlled videos with higher spatial consistency and temporal coherence, offering a stable and controllable alternative to short‑clip generation and supporting real‑time preview, high‑resolution refinement, and 3D reconstruction applications.

3D reconstruction · OmniRoam · generative AI
8 min read
Machine Heart
Apr 2, 2026 · Artificial Intelligence

HSImul3R: Bridging Perception and Simulation for Physics‑Ready 3D Human‑Scene Interaction

HSImul3R introduces a physics‑in‑the‑loop reconstruction pipeline that closes the perception‑simulation gap by jointly optimizing human motion and scene geometry, leveraging reinforcement learning, direct simulation‑reward optimization, and a new HSIBench dataset to produce simulation‑ready 3D human‑scene interactions.

3D reconstruction · DSRO · HSIBench
12 min read
Data Party THU
Mar 29, 2026 · Artificial Intelligence

How LoGeR Enables Minute‑Long 3D Reconstruction with Hybrid Memory

The article presents LoGeR, a long‑context geometric reconstruction framework that combines test‑time‑training memory and sliding‑window attention to achieve minute‑scale, fully‑feedforward 3D reconstruction with superior accuracy on benchmarks such as KITTI and VBR.

3D reconstruction · Computer Vision · Hybrid Memory
11 min read
HyperAI Super Neural
Mar 26, 2026 · Artificial Intelligence

MIT’s Wave‑Former Reconstructs Fully Occluded Objects with 85% Precision, Boosting Recall to 72%

MIT researchers introduce Wave‑Former, a physics‑aware, generative‑AI framework for mmWave sensing that achieves high‑precision 3D reconstruction of completely hidden objects, raising recall from 54% to 72% while maintaining 85% precision and outperforming existing baselines on real‑world datasets.

3D reconstruction · Benchmark · generative AI
15 min read
AI Frontier Lectures
Mar 16, 2026 · Artificial Intelligence

How LoGeR Extends 3D Reconstruction to Thousands of Frames with Hybrid Memory

LoGeR, a new long‑context geometric reconstruction framework from DeepMind and UC Berkeley, uses a hybrid memory module combining test‑time‑training (TTT) and sliding‑window attention (SWA) to enable feed‑forward 3D reconstruction over sequences of up to tens of thousands of frames, achieving state‑of‑the‑art accuracy on KITTI, VBR, 7‑Scenes, ScanNetV2 and TUM‑Dynamics benchmarks.

3D reconstruction · Deep Learning · Hybrid Memory
11 min read
Data Party THU
Feb 19, 2026 · Artificial Intelligence

How Data Priors and Scene Parameterization Boost 3D Indoor Reconstruction

This thesis investigates the two core challenges of data prior utilization and scene parameterization in multi‑view RGB‑based 3D indoor reconstruction, proposing novel representations and learning‑based methods to improve reconstruction quality, generalization, and applicability across AR, robotics, and autonomous navigation.

3D reconstruction · Computer Vision · data priors
8 min read
HyperAI Super Neural
Dec 6, 2025 · Artificial Intelligence

Quick Look at This Week’s Frontier AI Papers: DeepSeekMath‑V2, MedSAM‑3, SAM 3D, Qwen3‑VL, and M²

This roundup surveys five cutting‑edge AI papers—DeepSeekMath‑V2’s self‑verifiable mathematical reasoning, MedSAM‑3’s promptable medical image and video segmentation, SAM 3D’s single‑image 3D reconstruction, Qwen3‑VL’s high‑capacity vision‑language model, and the M² memory‑mesh transformer for image captioning—highlighting their key methods, benchmarks, and code links.

3D reconstruction · Image Captioning · Mathematical Reasoning
6 min read
AI Frontier Lectures
Nov 28, 2025 · Artificial Intelligence

How Meta’s SAM 3D Turns a Single Photo into Detailed 3D Models

Meta’s newly released SAM 3 and SAM 3D models enable promptable segmentation and single‑image 3D reconstruction and outperform prior methods on benchmarks. They introduce a shared perception encoder, a Presence Head to reduce hallucinations, and a two‑stage generation pipeline that produces high‑fidelity geometry and texture.

3D reconstruction · Meta · SAM 3
12 min read
Rare Earth Juejin Tech Community
Sep 26, 2025 · Artificial Intelligence

How AI is Revolutionizing 3D Content Creation for Immersive Experiences

The Volcano Engine Multimedia Lab showcases cutting‑edge AI‑driven 3D and VR technologies, including volumetric video, dual‑Gaussian modeling, topology‑aware representations, and the Beaver3D AIGC model, that aim to lower creation barriers, enable real‑time immersive interaction, and bridge research breakthroughs with industry applications.

3D reconstruction · AI-generated 3D · AIGC3D
15 min read
AI Frontier Lectures
Jun 19, 2025 · Industry Insights

What Made SIGGRAPH 2025’s Top Papers Stand Out? A Deep Dive into Award‑Winning Research

SIGGRAPH 2025 announced record‑breaking submissions and awarded five best papers, several honorable mentions, and a Test‑of‑Time prize, highlighting breakthroughs in 3D reconstruction, neural fields, Monte‑Carlo rendering, cloth simulation, and IMU calibration, with detailed author, institution, and technical insights provided.

3D reconstruction · AI · Monte Carlo rendering
13 min read
AI Frontier Lectures
Jun 16, 2025 · Artificial Intelligence

What Do the CVPR 2025 Awards Reveal About the Future of Computer Vision?

The CVPR 2025 awards spotlight groundbreaking work—from the VGGT transformer that predicts full 3D scenes in a single feed‑forward pass to neural inverse rendering that reconstructs geometry from time‑resolved light—offering a comprehensive view of emerging trends, novel architectures, and performance breakthroughs across computer‑vision research.

3D reconstruction · CVPR 2025 · Deep Learning
11 min read
AI Frontier Lectures
Apr 28, 2025 · Artificial Intelligence

How DP-Recon Uses Diffusion Models to Reconstruct 3D Scenes from Sparse Photos

DP-Recon leverages generative diffusion priors and a visibility‑guided SDS loss to achieve high‑fidelity, compositional 3D scene reconstruction from extremely sparse images, delivering superior geometry, texture, and text‑driven editing capabilities demonstrated on benchmark datasets and real‑world indoor scenarios.

3D reconstruction · AI · diffusion models
10 min read
Sohu Tech Products
Sep 11, 2024 · Artificial Intelligence

Low‑Cost 3D Reconstruction Using 3D Gaussian Splatting

This article explains how to create high‑quality 3D scenes from ordinary video footage by extracting frames with ffmpeg, estimating camera poses with COLMAP, and applying 3D Gaussian Splatting in place of traditional mesh‑and‑texture pipelines, which dramatically lowers equipment costs and data size.

3D reconstruction · COLMAP · Computer Vision
6 min read
AsiaInfo Technology: New Tech Exploration
Jan 12, 2024 · Artificial Intelligence

Exploring NeRF: From Theory to Real-World 3D Reconstruction Tools

This article introduces Neural Radiance Fields (NeRF) as a cutting‑edge AI technique for high‑quality 3D reconstruction, explains its core principles and advantages, outlines a step‑by‑step building workflow, reviews popular tools and libraries such as Luma AI, NVIDIA Instant NeRF and NeRFStudio, and offers a forward‑looking summary of its potential and challenges.

3D reconstruction · AI · Computer Vision
12 min read
Bilibili Tech
Nov 1, 2023 · Artificial Intelligence

Neural Radiance Fields and Generative Intelligent Media: Recent Advances and Applications

Professor Hu Qiang presented recent progress in Neural Radiance Fields—covering implicit/explicit representations, hybrid models, and solutions for dynamic scenes, cloud‑based and edge‑cloud rendering—while also reviewing generative AI advances such as diffusion‑based text‑to‑image/video/3D, LoRA fine‑tuning, and large‑scale story‑book datasets, highlighting applications in virtual‑real content, smart‑city modeling, and 6‑DoF e‑commerce displays.

3D reconstruction · NeRF · diffusion models
14 min read
DataFunTalk
Sep 28, 2023 · Artificial Intelligence

Panoramic Image Indoor Layout Estimation Using Vision Transformer (PanoViT)

This article introduces the PanoViT method for indoor layout estimation from panoramic images, covering research background, the transformer‑based architecture with backbone, vision transformer encoder, boundary‑enhancement and 3D loss modules, experimental results, and step‑by‑step usage in ModelScope.

3D reconstruction · indoor layout estimation · panoramic vision transformer
7 min read
DataFunSummit
Aug 24, 2023 · Artificial Intelligence

Panoramic Indoor Layout Estimation with Vision Transformer (PanoViT)

This article introduces the PanoViT model, a vision‑transformer‑based approach for indoor layout estimation from panoramic images, covering its research background, architectural components, experimental results on public datasets, and step‑by‑step usage within ModelScope.

3D reconstruction · Computer Vision · Deep Learning
8 min read
DaTaobao Tech
Jun 14, 2023 · Artificial Intelligence

Optimizing NeRF for Real-Time Mobile 3D Rendering in Alibaba's Object Drawer

Alibaba’s Taobao engineers detail how they transformed slow, high‑quality NeRF reconstruction into a real‑time mobile solution by combining an Octree‑Tiny‑MLP architecture, SNeRG optimizations, and a high‑frequency voxel reduction that shrank models to ~5 MB and achieved ~6 FPS on low‑end Android phones, targeting sub‑1 MB models and 50 FPS.

3D reconstruction · Mobile Optimization · NeRF
10 min read
DaTaobao Tech
Jun 10, 2022 · Artificial Intelligence

NeRF-Editing: Geometry Editing of Neural Radiance Fields

NeRF‑Editing, presented at CVPR 2022 and described as the first method of its kind, introduces an interactive framework that lets users freely deform the geometry of neural radiance fields. It couples an explicit mesh with the implicit NeRF representation and propagates mesh vertex changes through tetrahedral ARAP optimization to bend rays during rendering, enabling realistic edits and animations on synthetic and real‑world scenes.

3D reconstruction · ARAP deformation · Computer Vision
6 min read
Alibaba Terminal Technology
Mar 16, 2022 · Cloud Computing

Transforming Immersive Streaming with Free-Viewpoint Video: Capture to Cloud

This article explains the end‑to‑end workflow of free‑viewpoint video technology—from multi‑camera on‑site capture and hardware setup, through cloud‑based 3D reconstruction, depth estimation and encoding, to mobile SDK rendering—highlighting the technical challenges and optimizations that enable real‑time immersive streaming.

3D reconstruction · cloud rendering · free-viewpoint
10 min read
Kuaishou Tech
Sep 17, 2021 · Artificial Intelligence

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

SnowflakeNet introduces a novel Snowflake Point Deconvolution architecture combined with a Skip-Transformer to progressively split seed points, enabling high‑quality point‑cloud completion that preserves fine‑grained geometric details such as smooth surfaces, sharp edges, and corners across dense and sparse datasets.

3D reconstruction · Computer Vision · Deep Learning
10 min read
Suning Technology
Sep 17, 2020 · Artificial Intelligence

Unlocking Retail Innovation: 3D Digital Storebuilding with Multi‑Camera Vision

This article explores how 3D digital storebuilding integrates multiple visual sensors, GPU acceleration, and advanced camera calibration to create high‑precision, real‑time digital twins of retail spaces, enabling fine‑grained lifecycle management and immersive customer experiences.

3D reconstruction · GPU Acceleration · camera calibration
15 min read
Didi Tech
Sep 10, 2020 · Artificial Intelligence

Technical Overview of DiDi's AR Indoor Navigation System

DiDi's AR indoor navigation system addresses GPS unreliability in large indoor venues by using SfM-based 3D reconstruction, robust visual localization with magnetometer/GNSS priors, and sensor fusion with pedestrian dead‑reckoning and deep‑learning heading estimation, cutting passenger pick‑up time by up to 25% across dozens of airports and malls.

3D reconstruction · AR navigation · Sensor Fusion
19 min read