Tagged articles
7 articles
Machine Heart
May 5, 2026 · Artificial Intelligence

Monocular Open‑Vocabulary Occupancy Prediction Sets New SOTA for Indoor 3D Scenes (CVPR 2026 Oral)

The paper introduces LegoOcc, a monocular open‑vocabulary occupancy framework that unifies geometry and semantics through language‑embedded Gaussians, using Poisson‑based aggregation and progressive temperature decay. It more than doubles the previous mIoU on Occ‑ScanNet while running at 22.47 FPS, which makes it well suited to embodied robots.

3D vision · CVPR 2026 · Monocular
12 min read
HyperAI Super Neural
Dec 22, 2025 · Artificial Intelligence

DA3 Enables Arbitrary‑View 3D Reconstruction with a Single Transformer

The ByteDance‑Seed team introduces Depth Anything 3 (DA3), a minimalist visual‑geometry model that uses a vanilla Transformer backbone and depth‑ray representation to jointly predict depth and camera pose from any number of images, achieving state‑of‑the‑art performance with a 35.7% gain in pose accuracy and a 23.6% improvement in geometric precision over prior methods.

3D vision · DA3 · Depth estimation
6 min read
AI Frontier Lectures
Dec 17, 2025 · Artificial Intelligence

Can OmniVGGT Unlock Multi‑Modal 3D Vision with Any Number of Inputs?

OmniVGGT is a flexible omni‑modality transformer that can ingest any number of geometric cues, such as depth maps and camera parameters. It achieves state‑of‑the‑art performance on diverse 3D tasks while keeping inference speed comparable to that of its RGB‑only predecessor.

3D vision · Geometry · OmniVGGT
13 min read
Baidu Tech Salon
Apr 14, 2023 · Artificial Intelligence

How PaddleDepth and Paddle3D Enable Low‑Cost 3D Vision Development

This article examines the challenges of 3D vision data acquisition and explains how Baidu's PaddleDepth and Paddle3D toolkits provide low‑cost depth collection, super‑resolution, and end‑to‑end perception pipelines, showcasing performance on KITTI and Middlebury datasets with code examples.

3D vision · Computer Vision · Depth estimation
12 min read
AntTech
Apr 12, 2023 · Artificial Intelligence

Ant Technology Research Institute Interactive Intelligence Lab – 13 Papers Accepted at CVPR 2023 and Recent AI Research Highlights

The Ant Technology Research Institute’s Interactive Intelligence Lab announced that 13 of its papers were accepted at CVPR 2023. The article also covers the lab’s other recent work in generative models and 3D vision, its collaborations with top universities, and its contributions to artificial intelligence research.

3D vision · CVPR · Computer Vision
6 min read
Kuaishou Large Model
Sep 30, 2021 · Artificial Intelligence

How SnowflakeNet Revolutionizes Point Cloud Completion with Skip‑Transformer

SnowflakeNet introduces a novel Snowflake Point Deconvolution architecture combined with a Skip‑Transformer to explicitly split and refine points, enabling high‑quality reconstruction of fine local geometry in incomplete point clouds and outperforming prior methods on both dense and sparse benchmarks.

3D vision · Deep Learning · Skip-Transformer
11 min read