How SpatialLM Turns 3D Point Clouds into Structured Scene Understanding
SpatialLM is a large language model for 3D spatial understanding: it converts point clouds derived from video, RGB‑D images, or LiDAR into structured scene descriptions. This guide covers its architecture, the available model versions, repository links, and step‑by‑step deployment on Ubuntu with PyTorch.
Overview
SpatialLM is a large language model for 3‑D spatial understanding. It takes point‑cloud data from monocular video, RGB‑D images, or LiDAR, encodes the geometry, and generates structured scene descriptions (walls, doors, windows, and semantically labeled object bounding boxes). The pipeline has three stages: MASt3R‑SLAM reconstructs an RGB video into a dense point cloud, a point‑cloud encoder compresses the cloud into feature vectors, and the LLM produces scene code that can be converted into a 3‑D layout.
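The encoder stage is easier to picture with a concrete preprocessing step. The sketch below shows voxel-grid quantization, the usual first step before sparse 3‑D convolutions of the kind TorchSparse provides. It assumes, without confirmation from the source, that SpatialLM's encoder consumes a sparse voxel grid; the helper names `voxelize` and `voxel_centroids` and the 2 cm voxel size are hypothetical, and this is not SpatialLM's actual encoder code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Sequence[Point], voxel_size: float) -> Dict[Voxel, List[int]]:
    """Map each occupied voxel (integer grid coordinate) to the indices of its points."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    grid: Dict[Voxel, List[int]] = {}
    for idx, (x, y, z) in enumerate(points):
        # math.floor (not int) so that negative coordinates land in the correct cell
        key = (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        grid.setdefault(key, []).append(idx)
    return grid


def voxel_centroids(points: Sequence[Point], grid: Dict[Voxel, List[int]]) -> Dict[Voxel, Point]:
    """Average the points inside each occupied voxel, a simple per-voxel feature."""
    centroids: Dict[Voxel, Point] = {}
    for key, idxs in grid.items():
        n = len(idxs)
        cx = sum(points[i][0] for i in idxs) / n
        cy = sum(points[i][1] for i in idxs) / n
        cz = sum(points[i][2] for i in idxs) / n
        centroids[key] = (cx, cy, cz)
    return centroids


if __name__ == "__main__":
    # Toy cloud: the first two points share a 2 cm voxel, the third is far away.
    cloud = [(0.01, 0.01, 0.0), (0.015, 0.005, 0.0), (0.5, 0.5, 1.0)]
    grid = voxelize(cloud, voxel_size=0.02)
    print(f"{len(cloud)} points -> {len(grid)} occupied voxels")
    print(voxel_centroids(cloud, grid))
```

Only occupied voxels are stored, which is what keeps sparse convolutions tractable on room-scale scans: empty space costs nothing.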
Two model variants are provided: 1 B parameters (Llama‑based) and 0.5 B parameters (Qwen‑based). Both use a multimodal architecture that fuses unstructured geometry with structured scene representations, enabling spatial reasoning for robotics and autonomous navigation.
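As a back-of-the-envelope check, parameter counts translate into weight storage as parameters × bytes per parameter. The sketch below assumes 16-bit weights (2 bytes per parameter) and decimal gigabytes; neither assumption is stated in the source, and `weight_size_gb` is a hypothetical helper.

```python
def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumes every parameter is stored with the same width, e.g. 2 bytes for
    bf16/fp16 or 4 bytes for fp32.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Nominal (rounded) parameter counts taken from the model names.
    for name, params in [("SpatialLM-Llama-1B", 1.0e9), ("SpatialLM-Qwen-0.5B", 0.5e9)]:
        print(f"{name}: ~{weight_size_gb(params):.1f} GB (16-bit), "
              f"~{weight_size_gb(params, 4):.1f} GB (fp32)")
```

The "1 B" and "0.5 B" labels are rounded, and the checkpoint presumably also bundles the point-cloud encoder and configuration files, so real downloads can exceed this estimate (the 1 B checkpoint is about 2.6 GB). GPU memory at inference time is higher still because of activations and the KV cache.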
Resources
GitHub repository: https://github.com/manycore-research/SpatialLM
ModelScope checkpoints:
1 B: https://modelscope.cn/models/manycore-research/SpatialLM-Llama-1B
0.5 B: https://modelscope.cn/models/manycore-research/SpatialLM-Qwen-0.5B
Deployment (Ubuntu 22.04, PyTorch 2.5.1, Python 3.12, CUDA 12.4)
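Because this guide targets a specific stack, a quick sanity check can confirm that the environment matches it; it is useful both before and after installing the dependencies. The sketch below is a hypothetical helper (`check_environment` is not part of SpatialLM) that uses only the standard library plus PyTorch if it is already installed, and reports mismatches instead of failing.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 12), expected_torch="2.5.1", expected_cuda="12.4"):
    """Return a list of human-readable mismatches against the versions in this guide."""
    problems = []
    if sys.version_info[:2] != tuple(expected_python):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            f"!= {expected_python[0]}.{expected_python[1]}"
        )
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch

    # torch.__version__ may carry a local suffix such as "+cu124"
    torch_version = str(torch.__version__).split("+")[0]
    if torch_version != expected_torch:
        problems.append(f"PyTorch {torch_version} != {expected_torch}")
    # torch.version.cuda is None for CPU-only builds
    if torch.version.cuda != expected_cuda:
        problems.append(f"PyTorch built for CUDA {torch.version.cuda}, expected {expected_cuda}")
    if not torch.cuda.is_available():
        problems.append("No CUDA device visible to PyTorch")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```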
Install system dependencies:
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash
Install Poetry and the project dependencies (the original pyproject.toml has been adjusted for Python 3.12):
pip install poetry && poetry config virtualenvs.create false --local
poetry install
Compile and install TorchSparse (required for sparse 3‑D convolutions):
pip install git+https://github.com/mit-han-lab/torchsparse.git
# or use the Poe task provided by the project
poe install-torchsparse
Download a test point cloud (PLY) from the official dataset:
huggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .
Download the desired model checkpoint (this example uses the 1 B model, about 2.6 GB):
modelscope download manycore-research/SpatialLM-Llama-1B --local_dir ./manycore-research/SpatialLM-Llama-1B
Run inference to generate a textual description of the scene:
python inference.py --point_cloud ./pcd/scene0000_00.ply --output scene0000_00.txt --model_path ./manycore-research/SpatialLM-Llama-1B
Visualize the result (install rerun-sdk first if it is not already present):
python visualize.py --point_cloud ./pcd/scene0000_00.ply --layout scene0000_00.txt --save scene0000_00.rrd
rerun scene0000_00.rrd
If the cloud environment cannot display the visualization, copy the generated .rrd file to a local machine, install rerun-sdk there, and run the rerun command.
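The inference step writes the scene as plain-text scene code (scene0000_00.txt), which visualize.py then reads as a layout. To use that layout in your own code, a small parser is enough. The sketch below assumes each non-empty line has the form `name=Type(arg1,arg2,...)`; that format is an assumption to check against your actual output, since the field order for each entity type is defined by the SpatialLM repository, not by this sketch. `parse_layout` and `SceneEntity` are hypothetical names, and the sample string is made up.

```python
import re
from dataclasses import dataclass
from typing import List, Union

Arg = Union[float, str]

# Assumed line format: name=Type(arg1,arg2,...)
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$")


@dataclass
class SceneEntity:
    name: str
    kind: str
    args: List[Arg]


def _coerce(token: str) -> Arg:
    """Turn numeric tokens into floats; keep labels and references as strings."""
    token = token.strip()
    try:
        return float(token)
    except ValueError:
        return token  # e.g. a class label or a reference such as "wall_0"


def parse_layout(text: str) -> List[SceneEntity]:
    entities = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno} is not in name=Type(...) form: {line!r}")
        name, kind, body = match.groups()
        args = [_coerce(tok) for tok in body.split(",")] if body.strip() else []
        entities.append(SceneEntity(name, kind, args))
    return entities


if __name__ == "__main__":
    # Made-up sample for illustration only, not real SpatialLM output.
    sample = "wall_0=Wall(0,0,0,4,0,0,2.8,0.1)\ndoor_0=Door(wall_0,1.5,0,1.0,0.9,2.0)\n"
    for entity in parse_layout(sample):
        print(entity.kind, entity.name, entity.args)
```

With a real output file, the call would be `parse_layout(open("scene0000_00.txt").read())`. Raising on unrecognized lines, rather than skipping them, makes a format mismatch visible immediately.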
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
