How SpatialLM Turns 3D Point Clouds into Structured Scene Understanding

SpatialLM is a large language model for 3D spatial understanding that converts point‑cloud data from videos, RGB‑D images, or LiDAR into structured scene descriptions. This guide covers its architecture, model versions, and repository links, then walks through step‑by‑step deployment on Ubuntu with PyTorch.

Sohu Tech Products

Overview

SpatialLM is a large language model for 3‑D spatial understanding. It takes point‑cloud data from monocular video, RGB‑D images or LiDAR, encodes geometry, and generates structured scene descriptions (walls, doors, windows, semantically labeled object bounding boxes). The pipeline: an RGB video is reconstructed into a dense point cloud with MASt3R‑SLAM, a point‑cloud encoder compresses the cloud into feature vectors, and the LLM produces scene codes that can be converted to a 3‑D layout.
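
To make the "compresses the cloud into feature vectors" step more concrete, the sketch below pools raw points into one centroid per voxel. It is only a conceptual stand-in: SpatialLM's encoder is a learned network, and the function name voxel_mean_pool, the voxel size, and the sample points are assumptions made for illustration, not part of the project.

```python
from collections import defaultdict


def voxel_mean_pool(points, voxel_size):
    """Group (x, y, z) points into cubic voxels and return one centroid per voxel.

    Illustrative only: a hand-written reduction, not SpatialLM's learned encoder.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis (floor division handles negatives).
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))

    pooled = {}
    for key, members in buckets.items():
        n = len(members)
        # Average each coordinate over the points that fell into this voxel.
        pooled[key] = tuple(sum(axis) / n for axis in zip(*members))
    return pooled


# Made-up points: two share a voxel, one sits in a neighbouring voxel.
points = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.1, 0.1)]
pooled = voxel_mean_pool(points, voxel_size=1.0)
print(len(points), "points ->", len(pooled), "voxel centroids")
```

The point of the sketch is the shape of the operation: many unordered points go in, a smaller set of fixed-size vectors comes out, and that smaller set is what a language model can consume as input tokens.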

Two model variants are provided: 1 B parameters (Llama‑based) and 0.5 B parameters (Qwen‑based). Both use a multimodal architecture that fuses unstructured geometry with structured scene representations, enabling spatial reasoning for robotics and autonomous navigation.

Resources

GitHub repository: https://github.com/manycore-research/SpatialLM

ModelScope checkpoints:

1 B: https://modelscope.cn/models/manycore-research/SpatialLM-Llama-1B

0.5 B: https://modelscope.cn/models/manycore-research/SpatialLM-Qwen-0.5B

[Figure: SpatialLM overview]

Deployment (Ubuntu 22.04, PyTorch 2.5.1, Python 3.12, CUDA 12.4)

Install the CUDA toolkit and sparsehash with conda:

conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash

Install Poetry and the project dependencies (the original pyproject.toml has been adjusted for Python 3.12):

pip install poetry && poetry config virtualenvs.create false --local
poetry install

Compile and install TorchSparse (required for sparse 3‑D convolutions):

pip install git+https://github.com/mit-han-lab/torchsparse.git
# or, as an alternative, run the task defined in the project's pyproject.toml (poe is the Poe the Poet task runner)
poe install-torchsparse
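
Before moving on, it can help to confirm that the interpreter and the compiled packages are actually visible. The following is a minimal sketch, assuming the packages are importable as torch and torchsparse; check_environment is a hypothetical helper written for this guide, not part of the SpatialLM repository.

```python
import importlib.util
import sys

# Python version named in this guide's heading; adjust if your setup differs.
EXPECTED_PYTHON = (3, 12)
REQUIRED_MODULES = ["torch", "torchsparse"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict of booleans: Python version match and module availability."""
    report = {"python_ok": sys.version_info[:2] == EXPECTED_PYTHON}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    report = check_environment()
    for key, ok in report.items():
        print(f"{key}: {'ok' if ok else 'missing or mismatched'}")
    if report["torch"]:
        import torch

        # torch.version.cuda is None for CPU-only builds.
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
```

A missing torchsparse entry usually means the build step above did not finish, which is worth catching before running inference.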

Download a test point‑cloud (PLY) from the official dataset:

huggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .
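
A quick way to confirm the download is a readable point cloud is to inspect the PLY header, which lists each element and its properties in plain text. The sketch below uses only the standard library; read_ply_header is a hypothetical helper, and the tiny sample file is made up so the example runs without the real dataset.

```python
import os
import tempfile


def read_ply_header(path):
    """Return (format, {element: count}, {element: [property names]}) from a PLY header."""
    fmt = None
    counts = {}
    properties = {}
    current = None
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens:
                continue
            if tokens[0] == "end_header":
                break  # everything after this line is vertex/face data
            if tokens[0] == "format":
                fmt = tokens[1]
            elif tokens[0] == "element":
                current = tokens[1]
                counts[current] = int(tokens[2])
                properties[current] = []
            elif tokens[0] == "property" and current is not None:
                # The property name is always the last token on the line.
                properties[current].append(tokens[-1])
    return fmt, counts, properties


# Made-up two-point ASCII PLY, used only to exercise the parser.
sample = (
    b"ply\nformat ascii 1.0\nelement vertex 2\n"
    b"property float x\nproperty float y\nproperty float z\n"
    b"end_header\n0 0 0\n1 1 1\n"
)
with tempfile.NamedTemporaryFile(suffix=".ply", delete=False) as tmp:
    tmp.write(sample)
fmt, counts, props = read_ply_header(tmp.name)
os.remove(tmp.name)
print(fmt, counts, props)
```

To inspect the real file, call read_ply_header("./pcd/scene0000_00.ply") and check that the vertex count is non-zero and that x, y, and z appear among the vertex properties.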

Download the desired model checkpoint (example uses the 1 B model, ~2.6 GB):

modelscope download manycore-research/SpatialLM-Llama-1B --local_dir ./manycore-research/SpatialLM-Llama-1B
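
Interrupted downloads are a common cause of confusing load errors, so it may be worth checking the size of the checkpoint directory against the roughly 2.6 GB quoted above. This is a minimal sketch using only the standard library; checkpoint_summary is a hypothetical helper, and it assumes the files landed in the --local_dir path used in the command.

```python
from pathlib import Path


def checkpoint_summary(model_dir):
    """Return (file_count, total_bytes) for every file under model_dir."""
    root = Path(model_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


if __name__ == "__main__":
    # Path assumed from the download command in this guide.
    model_dir = Path("./manycore-research/SpatialLM-Llama-1B")
    if model_dir.is_dir():
        count, total = checkpoint_summary(model_dir)
        print(f"{count} files, {total / 1e9:.2f} GB")
    else:
        print(f"{model_dir} not found; run the download step first")
```

A total far below the expected size suggests the download should be repeated before running inference.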

Run inference to generate a textual description of the scene:

python inference.py --point_cloud ./pcd/scene0000_00.ply --output scene0000_00.txt --model_path ./manycore-research/SpatialLM-Llama-1B
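
The output file can then be post-processed like any other text. The sketch below assumes each line has the form name=Type(arg, arg, ...), for example a wall, door, or bounding-box entry; that line format, the sample values, and the parse_layout helper are all assumptions made for illustration, so compare them against an actual scene0000_00.txt before relying on them.

```python
import re

# Assumed line shape: identifier = TypeName(comma-separated arguments)
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$")


def parse_layout(text):
    """Parse 'name=Type(args)' lines into dicts; lines that do not match are skipped."""
    entities = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        name, kind, arg_text = match.groups()
        args = []
        for token in arg_text.split(","):
            token = token.strip()
            if not token:
                continue
            try:
                args.append(float(token))  # numeric argument
            except ValueError:
                args.append(token)  # reference or label, kept as a string
        entities.append({"name": name, "type": kind, "args": args})
    return entities


# Made-up sample in the assumed format, not real model output.
sample = """wall_0=Wall(0,0,0,4,0,0,2.8,0.1)
door_0=Door(wall_0,2,0,1,0.9,2.0)
bbox_0=Bbox(sofa,1.5,2,0.4,0,2,0.9,0.8)"""

entities = parse_layout(sample)
for entity in entities:
    print(entity["type"], entity["name"], entity["args"])
```

Keeping non-numeric arguments as strings lets references between entities (such as a door pointing at its wall) survive parsing, so they can be resolved in a second pass if needed.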

Visualize the result (install rerun-sdk if not already present):

python visualize.py --point_cloud ./pcd/scene0000_00.ply --layout scene0000_00.txt --save scene0000_00.rrd
rerun scene0000_00.rrd

If the cloud environment cannot display the visualization, copy the generated .rrd file to a local machine, install rerun-sdk there, and run the rerun command.

[Figure: SpatialLM inference result]