How UniSHARP Enables One‑Shot Monocular 3DGS Across All Camera Types
UniSHARP is an open‑source monocular 3D Gaussian Splatting (3DGS) model that takes a single input image and directly produces a high‑quality set of 3D Gaussians for perspective, wide‑angle, fisheye and 360° cameras, with no need for multi‑view inputs or per‑scene optimization.
Problem and Limitations of Existing 3DGS
Most monocular 3D Gaussian Splatting (3DGS) methods assume a pinhole camera model and are trained on narrow‑field perspective images. When applied to fisheye or equirectangular (ERP) panoramas they generalize poorly. Many approaches also require multiple views or per‑scene optimization, which is infeasible when only a single image is available.
Why Simple Fixes Fail
Two intuitive alternatives were examined. (1) Fine‑tuning a perspective‑trained model on images with larger fields of view fails because the network is bound to the normalized device coordinates of a pinhole camera and cannot correctly predict geometry under distortion. (2) Cutting a wide‑angle or panoramic image into many perspective tiles adds computational overhead and produces visible stitching seams and geometric discontinuities.
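One way to see why a pinhole parameterization struggles at wide fields of view: a ray at angle θ from the optical axis lands at the normalized image‑plane coordinate tan(θ), which grows without bound as θ approaches 90° and has no finite value beyond it. The short Python sketch below only illustrates that property of an ideal pinhole camera; the function name is made up for this example and nothing here is taken from the UniSHARP code.

```python
import math

def pinhole_normalized_x(theta_deg):
    """Normalized image-plane coordinate (x / z) of a ray at theta degrees off-axis.

    Returns None at or beyond 90 degrees, where an ideal pinhole camera
    has no finite projection for the ray.
    """
    if abs(theta_deg) >= 90.0:
        return None
    return math.tan(math.radians(theta_deg))

# The coordinate explodes as the ray approaches 90 degrees off-axis.
for theta in (30, 60, 85, 89, 100):
    print(theta, pinhole_normalized_x(theta))
```

A 180° fisheye or a 360° panorama contains rays in the `None` regime, so no single pinhole projection can describe the whole image.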
Ray‑Based Unified Representation
UniSHARP abandons the pinhole assumption. For each pixel it predicts a unit ray direction r̂ and a radial distance d. The 3D point is p = d·r̂. This places every Gaussian primitive in a common metric space regardless of camera type (perspective, fisheye, or ERP). The design follows UniK3D and enables native adaptation to arbitrary fields of view without tiling. Rendering a full ERP with SHARP requires six cube faces and exhibits stitching artifacts, whereas UniSHARP renders a coherent view directly.
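The relation p = d·r̂ can be sketched in a few lines of Python. The two ray generators below use standard textbook conventions (x right, y down, z forward; ERP longitude across the width, latitude down the height). These conventions and all function names are assumptions made for this illustration, not UniSHARP's actual interface.

```python
import math

def pinhole_ray(u, v, fx, fy, cx, cy):
    """Unit ray through pixel (u, v) of an ideal pinhole camera."""
    x, y = (u - cx) / fx, (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

def erp_ray(u, v, width, height):
    """Unit ray through the center of pixel (u, v) of an equirectangular image."""
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi   # -pi .. pi, left to right
    lat = (0.5 - (v + 0.5) / height) * math.pi        # +pi/2 .. -pi/2, top to bottom
    return (math.cos(lat) * math.sin(lon),
            -math.sin(lat),
            math.cos(lat) * math.cos(lon))

def unproject(ray, d):
    """3D point p = d * r_hat for a unit ray and a radial distance."""
    return tuple(d * c for c in ray)

# The same unprojection serves both camera types.
p_persp = unproject(pinhole_ray(320, 240, 500.0, 500.0, 320.0, 240.0), 2.0)
p_back = unproject(erp_ray(0, 511, 2048, 1024), 2.0)  # left edge: behind the camera
```

Because d is a distance along the ray rather than a depth along the optical axis, the point behind the camera (`p_back`) is as easy to express as the one in front of it.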
Geometry‑Anchored Gaussians + Feature‑Conditioned Residuals
Within the ray grid, UniSHARP first builds a two‑layer Geometry‑Anchored Gaussian (GAG) structure. The first layer aligns with visible surfaces; the second layer captures occluded regions and high‑frequency details, providing a stable geometric foundation. 2D semantic features and 3D geometric features are then fused to predict Feature‑Conditioned Gaussian Residuals, which refine the Gaussian primitives for fine‑grained appearance. For ERP inputs the authors add spherical Gaussian initialization and a distortion‑aware dropout to mitigate severe distortion, improving results on datasets such as HM3D.
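The source does not describe how the distortion‑aware dropout is implemented. The sketch below only illustrates the distortion it has to cope with: in an equirectangular image the solid angle covered by a pixel shrinks with cos(latitude), so rows near the poles are heavily oversampled. The keep rule shown here (keep a pixel with probability cos(latitude)) is a hypothetical choice made for this example, and all names are invented.

```python
import math
import random

def erp_row_latitude(v, height):
    """Latitude in radians of the center of row v (top row is near +pi/2)."""
    return (0.5 - (v + 0.5) / height) * math.pi

def keep_probability(v, height):
    """Hypothetical rule: keep probability proportional to pixel solid angle."""
    return math.cos(erp_row_latitude(v, height))

def dropout_mask(width, height, seed=0):
    """Boolean mask[v][u]: True where the pixel's Gaussian would be kept."""
    rng = random.Random(seed)
    return [[rng.random() < keep_probability(v, height) for _ in range(width)]
            for v in range(height)]

mask = dropout_mask(width=64, height=32)
kept_near_pole = sum(mask[0])      # top row, strongly oversampled
kept_at_equator = sum(mask[16])    # row next to the equator
```

Under this assumed rule, almost every equatorial pixel survives while most polar pixels are dropped, which evens out the density of primitives on the sphere.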
Mixed‑Camera Training and Pose‑Free Inference
Training mixes data from perspective (RealEstate10K, DL3DV, WildRGB‑D), fisheye (ScanNet++ Fisheye) and panoramic (HM3D, OmniRooms) sources using the same ray interface; no camera‑specific branches are introduced, so a single network processes all samples. At inference time the Pose‑Free mode infers camera type and intrinsics from the predicted ray field, eliminating the need for calibrated parameters.
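How the Pose‑Free mode reads camera parameters off the ray field is not detailed in the source. As a minimal sketch of the idea, assuming a centered principal point and using invented function names, a horizontal field of view can be measured by summing the angles between neighboring rays along the middle row; for a sub‑180° result, a pinhole focal length then follows from it.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def angle_between(a, b):
    """Angle in radians between two unit vectors (atan2 form, stable for small angles)."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot)

def horizontal_fov(row_rays):
    """Sum of angles between adjacent rays, so fields of view above 180 degrees also work."""
    return sum(angle_between(a, b) for a, b in zip(row_rays, row_rays[1:]))

def pinhole_focal(fov, span_px):
    """Focal length in pixels for a symmetric pinhole view; valid only for fov < pi."""
    return 0.5 * span_px / math.tan(0.5 * fov)

# Synthetic middle row of a pinhole camera with a 500 px focal length.
pinhole_row = [normalize(((u - 320) / 500.0, 0.0, 1.0)) for u in range(641)]
focal = pinhole_focal(horizontal_fov(pinhole_row), len(pinhole_row) - 1)

# Synthetic ERP row sampled every degree of longitude.
erp_row = [(math.sin(math.radians(k - 180)), 0.0, math.cos(math.radians(k - 180)))
           for k in range(360)]
erp_fov = horizontal_fov(erp_row)
```

On the synthetic pinhole row the estimate recovers the 500 px focal length, while the ERP row sums to nearly a full turn, which is the kind of cue that could separate camera types.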
OmniRooms Dataset and FoV‑Layered Benchmark
The authors released the OmniRooms dataset to evaluate reconstruction across fields of view from 60° to 360°. It contains approximately 300 000 equirectangular images (1024×2048) with depth maps across 16 large indoor scenes. For each anchor pose, the dataset renders one central camera and 29 nearby cameras on a 0.5 m voxel grid, yielding 30 viewpoints per anchor for 3DGS‑style reconstruction.
16 large indoor scenes
≈300 000 ERP images (1024×2048) with depth
30 viewpoints per anchor (1 center + 29 offsets) on a 0.5 m voxel grid
Benchmark Results
On perspective benchmarks UniSHARP matches or exceeds prior methods such as SHARP and Flash3D and achieves the highest PSNR on zero‑shot Tanks & Temples. On panoramic benchmarks the advantage is larger: visual comparisons show artifact‑free renders where SHARP exhibits stitching seams. Quantitative tables (not reproduced) report consistent PSNR gains across the FoV‑layered benchmark.
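For readers unfamiliar with the metric, PSNR compares a rendered image with its ground truth through the mean squared error: PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. The helper below is a generic textbook implementation over flat lists of pixel values in [0, 1]; it is not the benchmark's evaluation script.

```python
import math

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
example = psnr([0.5, 0.5, 0.5, 0.5], [0.6, 0.6, 0.6, 0.6])
```

Higher is better, and because the scale is logarithmic a gain of 1 dB corresponds to roughly a 20% reduction in mean squared error.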
Open‑Source Release
Training and testing code, model weights, the OmniRooms dataset, and an online demo are publicly available at https://github.com/Insta360-Research-Team/UniSHARP and on HuggingFace.
