How UniSHARP Enables One‑Shot Monocular 3DGS Across All Camera Types

UniSHARP is an open-source monocular 3D Gaussian Splatting model that takes a single input image and directly predicts a high-quality set of 3D Gaussians for perspective, wide-angle, fisheye and 360° cameras, with no need for multi-view inputs or per-scene optimization.

Machine Heart

Problem and Limitations of Existing 3DGS

Most monocular 3D Gaussian Splatting (3DGS) methods assume a pinhole camera model and are trained on narrow‑field perspective images. When applied to fisheye or equirectangular (ERP) panoramas they generalize poorly. Many approaches also require multiple views or per‑scene optimization, which is infeasible when only a single image is available.

Why Simple Fixes Fail

Two intuitive alternatives were examined. (1) Fine‑tuning a perspective‑trained model to larger fields of view fails because the network is bound to normalized device coordinates of a pinhole camera and cannot correctly predict geometry under distortion. (2) Cutting a wide‑angle or panoramic image into many perspective tiles adds computational overhead and produces visible stitching seams and geometric discontinuities.
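One way to see why a pinhole parameterization struggles at large fields of view is to compare where a ray lands on the image plane under two textbook camera models. The sketch below is an illustration only, not the authors' analysis: it uses the ideal pinhole model (r = f·tan θ) and the equidistant fisheye model (r = f·θ), and the focal length is a hypothetical value.

```python
import math


def pinhole_radius(theta_rad, focal_px):
    """Image-plane radius in pixels for a ray at angle theta from the optical
    axis under the ideal pinhole model: r = f * tan(theta)."""
    if theta_rad >= math.pi / 2:
        # Rays at or beyond 90 degrees never intersect the image plane.
        raise ValueError("pinhole model cannot image rays at or beyond 90 degrees")
    return focal_px * math.tan(theta_rad)


def equidistant_radius(theta_rad, focal_px):
    """Image radius under the equidistant fisheye model: r = f * theta."""
    return focal_px * theta_rad


if __name__ == "__main__":
    focal_px = 500.0  # hypothetical focal length, for illustration only
    for deg in (10, 45, 80, 89):
        theta = math.radians(deg)
        print(deg,
              round(pinhole_radius(theta, focal_px), 1),
              round(equidistant_radius(theta, focal_px), 1))
```

The pinhole radius grows without bound as the angle approaches 90°, while the fisheye radius stays linear in the angle, so coordinates tied to the pinhole image plane have no natural place for the rays a fisheye or panoramic camera records.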

Ray‑Based Unified Representation

UniSHARP abandons the pinhole assumption. For each pixel it predicts a unit ray direction r̂ and a radial distance d along that ray, so the pixel's 3D point is p = d·r̂. This places every Gaussian primitive in a common metric space regardless of camera type (perspective, fisheye, or ERP). The design follows UniK3D and adapts natively to arbitrary fields of view without tiling. By contrast, rendering a full ERP with SHARP requires six cube faces and exhibits stitching artifacts, whereas UniSHARP renders a coherent view directly.
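The relation p = d·r̂ can be sketched for two camera types as follows. This is a minimal illustration, not UniSHARP's code: in the model the rays are predicted by the network, whereas here they are computed analytically from assumed intrinsics. The function names and the axis convention (x right, y down, z forward) are assumptions made for the example.

```python
import math


def pinhole_ray(u, v, fx, fy, cx, cy):
    """Unit ray through pixel (u, v) of an ideal pinhole camera."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def erp_ray(u, v, width, height):
    """Unit ray through pixel (u, v) of an equirectangular panorama.

    Longitude spans [-pi, pi) across the width and latitude spans
    [pi/2, -pi/2] from the top row to the bottom row.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return (math.cos(lat) * math.sin(lon),
            -math.sin(lat),                 # y points down in this convention
            math.cos(lat) * math.cos(lon))


def unproject(ray, distance):
    """3D point at radial distance d along a unit ray: p = d * r_hat."""
    return tuple(distance * c for c in ray)


if __name__ == "__main__":
    # Hypothetical intrinsics and distances, for illustration only.
    print(unproject(pinhole_ray(400, 300, 500.0, 500.0, 320.0, 240.0), 2.0))
    # An ERP pixel near the left edge looks backwards (negative z), which a
    # depth-along-z pinhole parameterization cannot express.
    print(unproject(erp_ray(0, 512, 2048, 1024), 2.0))
```

Because the distance is measured along the ray rather than along the optical axis, the same `unproject` step works for every camera type, including rays that point behind the camera.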

Geometry‑Anchored Gaussians + Feature‑Conditioned Residuals

Within the ray grid, UniSHARP first builds a two-layer Geometry-Anchored Gaussian (GAG) structure. The first layer aligns with visible surfaces; the second layer captures occluded regions and high-frequency details. Together they provide a stable geometric foundation. 2D semantic features and 3D geometric features are then fused to predict Feature-Conditioned Gaussian Residuals, which refine the Gaussian primitives for fine-grained appearance. For ERP inputs the authors add spherical Gaussian initialization and a distortion-aware dropout to mitigate severe distortion, improving results on datasets such as HM3D.

Mixed‑Camera Training and Pose‑Free Inference

Training mixes data from perspective (RealEstate10K, DL3DV, WildRGB‑D), fisheye (ScanNet++ Fisheye) and panoramic (HM3D, OmniRooms) sources using the same ray interface; no camera‑specific branches are introduced, so a single network processes all samples. At inference time the Pose‑Free mode infers camera type and intrinsics from the predicted ray field, eliminating the need for calibrated parameters.
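The article does not describe how Pose-Free mode recovers intrinsics, so the following is only a sketch of the general idea under a strong assumption: if the predicted rays come from an ideal pinhole camera, then along one image axis the ratio x/z of each ray is a linear function of the pixel coordinate, (u − cx)/fx, and a least-squares line fit recovers the focal length and principal point. The function name and the synthetic data are hypothetical.

```python
def fit_pinhole_axis(pixel_coords, ray_ratios):
    """Fit ratio = a * pixel + b by least squares and return
    (focal_length, principal_point) = (1 / a, -b / a).

    pixel_coords: pixel positions along one image axis.
    ray_ratios:   x/z (or y/z) of the ray at each of those pixels.
    Assumes an ideal pinhole camera and rays with z > 0.
    """
    n = len(pixel_coords)
    mean_p = sum(pixel_coords) / n
    mean_r = sum(ray_ratios) / n
    cov = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(pixel_coords, ray_ratios))
    var = sum((p - mean_p) ** 2 for p in pixel_coords)
    a = cov / var
    b = mean_r - a * mean_p
    return 1.0 / a, -b / a


if __name__ == "__main__":
    # Synthetic "predicted" rays from a hypothetical camera (fx=600, cx=320).
    us = list(range(0, 640, 40))
    ratios = [(u - 320.0) / 600.0 for u in us]
    print(fit_pinhole_axis(us, ratios))
```

A fisheye or ERP ray field would not fit this line well, so the fit residual is one conceivable signal for distinguishing camera types; whether UniSHARP does anything similar is not stated in the article.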

OmniRooms Dataset and FoV‑Layered Benchmark

The authors released the OmniRooms dataset to support evaluation across fields of view from 60° to 360°. It contains approximately 300 000 equirectangular images (1024×2048) with depth maps across 16 large indoor scenes. For each anchor pose the dataset renders one central camera and 29 nearby cameras on a 0.5 m voxel grid, yielding 30 viewpoints per anchor for 3DGS-style reconstruction.

16 large indoor scenes

≈300 000 ERP images (1024×2048) with depth

30 viewpoints per anchor (1 center + 29 offsets) on a 0.5 m voxel grid
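A back-of-the-envelope calculation gives a feel for the scale these figures imply. The anchor counts below are not stated in the article: they assume that every image belongs to exactly one 30-view group and that anchors are spread evenly across scenes, and the image total itself is approximate.

```python
total_images = 300_000       # approximate figure from the article
views_per_anchor = 30        # 1 center + 29 offsets
num_scenes = 16
height, width = 1024, 2048   # ERP resolution

# Assumption: every image belongs to exactly one 30-view anchor group.
approx_anchors = total_images // views_per_anchor
# Assumption: anchors are spread evenly across scenes.
anchors_per_scene = approx_anchors / num_scenes
pixels_per_image = height * width

print(approx_anchors, anchors_per_scene, pixels_per_image)
```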

Benchmark Results

On perspective benchmarks UniSHARP matches or exceeds prior methods such as SHARP and Flash3D and achieves the highest PSNR on zero‑shot Tanks & Temples. On panoramic benchmarks the advantage is larger: visual comparisons show artifact‑free renders where SHARP exhibits stitching seams. Quantitative tables (not reproduced) report consistent PSNR gains across the FoV‑layered benchmark.
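For readers unfamiliar with the metric, PSNR is computed from the mean squared error between a rendered image and its reference as 10·log10(MAX² / MSE). The sketch below uses the standard definition on flat lists of pixel values in [0, 1]. The article does not describe the benchmark's exact evaluation protocol (for example, whether panoramic metrics are latitude-weighted), so this is not a reproduction of it.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of pixel values."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Toy pixel values, for illustration only.
    print(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
```

Higher is better, and because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.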

Open‑Source Release

Training and testing code, model weights, the OmniRooms dataset, and an online demo are publicly available at https://github.com/Insta360-Research-Team/UniSHARP and on HuggingFace.
