Tagged articles

3D vision

9 articles · Page 1 of 1

Jul 23, 2026 · Artificial Intelligence

Turning Global 3D Maps into a UAV Training Ground: AirZoo’s Unified 3D Vision Benchmark

AirZoo introduces a large‑scale, globally distributed UAV dataset and an automated AirSim‑Cesium‑Unreal pipeline that provides pixel‑level RGB‑D images, precise 6‑DoF poses, and diverse weather conditions. It enables unified training and evaluation for aerial image retrieval, cross‑view matching, and multi‑view 3D reconstruction, with demonstrated performance gains on several state‑of‑the‑art models.

3D vision · AirSim · Cesium

11 min read

Machine Heart

Jun 9, 2026 · Artificial Intelligence

Why Standard Vision‑Language Models + Scale Data Beat Specialized 3D Vision Designs (VLM³)

Meta’s VLM³ demonstrates that a plain vision‑language model, when trained on large‑scale data with simple camera‑focal‑length and pixel‑space normalization, matches or surpasses expert 3D vision models across monocular depth estimation, object‑level understanding, pixel‑matching and camera‑pose tasks, eliminating the need for task‑specific architectures, loss functions, data augmentations or regression formulations.

3D vision · Depth Estimation · Meta

6 min read

Geek Labs

May 19, 2026 · Artificial Intelligence

Four Must‑Try Open‑Source Projects for AI Coding, 3D Vision, Code Optimization, and Desktop Beautification

The article introduces four popular GitHub open‑source projects—a design guide for AI agents, a Python code‑complexity optimizer, a Meta/Oxford 3D scene reconstruction model, and a C++ implementation of Wallpaper Engine—detailing their features, usage, and resource links.

3D vision · AI agents · Open Source

8 min read

Machine Heart

May 5, 2026 · Artificial Intelligence

Monocular Open‑Vocabulary Occupancy Prediction Sets New SOTA for Indoor 3D Scenes (CVPR 2026 Oral)

The paper introduces LegoOcc, a monocular open‑vocabulary occupancy framework that unifies geometry and semantics via language‑embedded Gaussians, uses Poisson‑based aggregation and progressive temperature decay, and achieves over twice the previous mIoU on Occ‑ScanNet while running at 22.47 FPS, making it well suited for embodied robots.

3D vision · CVPR 2026 · Monocular

12 min read

HyperAI Super Neural

Dec 22, 2025 · Artificial Intelligence

DA3 Enables Arbitrary‑View 3D Reconstruction with a Single Transformer

The ByteDance‑Seed team introduces Depth Anything 3 (DA3), a minimalist visual‑geometry model that uses a vanilla Transformer backbone and depth‑ray representation to jointly predict depth and camera pose from any number of images, achieving state‑of‑the‑art performance with a 35.7% gain in pose accuracy and a 23.6% improvement in geometric precision over prior methods.

3D vision · DA3 · Depth Estimation

6 min read

AI Frontier Lectures

Dec 17, 2025 · Artificial Intelligence

Can OmniVGGT Unlock Multi‑Modal 3D Vision with Any Number of Inputs?

OmniVGGT introduces a flexible omni‑modality‑driven transformer that can ingest any number of geometric cues, such as depth maps and camera parameters, achieving state‑of‑the‑art performance on diverse 3D tasks while keeping inference speed comparable to its RGB‑only predecessor.

3D vision · Geometry · Multi-modal

13 min read

Baidu Tech Salon

Apr 14, 2023 · Artificial Intelligence

How PaddleDepth and Paddle3D Enable Low‑Cost 3D Vision Development

This article examines the challenges of 3D vision data acquisition and explains how Baidu's PaddleDepth and Paddle3D toolkits provide low‑cost depth collection, super‑resolution, and end‑to‑end perception pipelines, showcasing performance on KITTI and Middlebury datasets with code examples.

3D vision · Depth Estimation · Open Source

12 min read

AntTech

Apr 12, 2023 · Artificial Intelligence

Ant Technology Research Institute Interactive Intelligence Lab – 13 Papers Accepted at CVPR 2023 and Recent AI Research Highlights

The Ant Technology Research Institute’s Interactive Intelligence Lab announced that 13 of its papers were accepted at CVPR 2023, alongside other recent achievements in generative models and 3D vision, highlighting collaborations with top universities and summarizing the lab’s contributions to artificial intelligence research.

3D vision · CVPR · Generative Models

6 min read

Kuaishou Large Model

Sep 30, 2021 · Artificial Intelligence

How SnowflakeNet Revolutionizes Point Cloud Completion with Skip‑Transformer

SnowflakeNet introduces a novel Snowflake Point Deconvolution architecture combined with a Skip‑Transformer to explicitly split and refine points, enabling high‑quality reconstruction of fine local geometry in incomplete point clouds and outperforming prior methods on both dense and sparse benchmarks.

3D vision · Deep Learning · Skip-Transformer

11 min read
