MR.ScaleMaster

Scale-Consistent Collaborative Mapping
from Crowd-Sourced Monocular Videos

Go2 (Legged Robot)
Scout Mini (Wheeled Robot)
Human #1
Human #2

Fig. 1. Real-world heterogeneous multi-robot mapping with MR.ScaleMaster in a multi-floor indoor environment. Four agents (a legged robot, a wheeled robot, and two handheld cameras) collaboratively build a unified dense 3D map. Inset pairs show inter-agent loop closures across different platforms.

Abstract

Crowd-sourced cooperative mapping from monocular cameras promises scalable 3D reconstruction without specialized sensors, yet remains hindered by two scale-specific failure modes: abrupt scale collapse caused by false-positive loop closures in repetitive environments, and gradual scale drift over long trajectories combined with per-robot scale ambiguity, which together prevent direct multi-session fusion. We present MR.ScaleMaster, a cooperative mapping system that addresses both failure modes through three key mechanisms. A Scale Collapse Alarm rejects spurious loop closures before they corrupt the pose graph. A Sim(3) anchor node formulation generalizes the classical SE(3) framework to explicitly estimate per-session scale, resolving per-robot scale ambiguity and enforcing global scale consistency. A modular, open-source, plug-and-play interface lets any monocular reconstruction model integrate without backend modification. On KITTI sequences with up to 15 agents, the Sim(3) formulation achieves a 7.2× ATE reduction over the SE(3) baseline, and the alarm rejects all false-positive loops while preserving every valid constraint. We further demonstrate heterogeneous multi-robot dense mapping that fuses MASt3R-SLAM, π³, and VGGT-SLAM 2.0 within a single unified map.

What Stops Monocular Collaborative Mapping?

Challenge 1
Abrupt Scale Collapse
False-positive loop closures in repetitive environments (e.g., corridors) force the optimizer to satisfy conflicting constraints by collapsing scale toward zero — or exploding it to infinity. A single erroneous factor poisons the entire pose graph.
→ Scale Collapse Alarm
Challenge 2
Gradual Drift & Inter-Session Scale Mismatch
Scale estimation error accumulates over long trajectories. In multi-robot settings, each agent carries its own unknown scale factor, making direct fusion infeasible without explicit scale alignment across sessions.
→ Sim(3) Anchor Node Optimization
Challenge 3
Heterogeneous Front-End Integration
The 3D vision foundation models in today's rapidly growing ecosystem (MASt3R-SLAM, π³, VGGT-SLAM 2.0) each use different scale conventions, which makes plug-and-play multi-robot deployment difficult without a unified backend.
→ Front-End-Agnostic Modular Architecture
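
To make the inter-session scale mismatch of Challenge 2 concrete, the following is a minimal sketch, not the MR.ScaleMaster implementation. It assumes two sessions have reconstructed the same landmarks, with the second session at an unknown scale of 0.4, and compares a rigid SE(3) fit against a Sim(3) fit using the standard Umeyama closed-form alignment. The function names `umeyama` and `rmse` and all numeric values are illustrative.

```python
import numpy as np

def umeyama(src, dst, with_scale=True):
    """Least-squares fit of dst ≈ s * R @ src + t (Umeyama, 1991)."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)          # cross-covariance of the two sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # guard against a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = float(np.trace(np.diag(D) @ S) / var_src) if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t

def rmse(src, dst, s, R, t):
    """Root-mean-square point error after applying the fitted transform."""
    residual = dst - (s * src @ R.T + t)
    return float(np.sqrt((residual ** 2).sum(axis=1).mean()))

# Synthetic landmarks seen by two sessions (illustrative values only).
rng = np.random.default_rng(0)
landmarks = rng.normal(scale=5.0, size=(50, 3))
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
session_a = landmarks                                   # reference session
session_b = 0.4 * (landmarks @ R_true.T) + np.array([1.0, -2.0, 0.5])

s_sim, R_sim, t_sim = umeyama(session_b, session_a, with_scale=True)
s_se3, R_se3, t_se3 = umeyama(session_b, session_a, with_scale=False)
err_sim = rmse(session_b, session_a, s_sim, R_sim, t_sim)
err_se3 = rmse(session_b, session_a, s_se3, R_se3, t_se3)
print(f"Sim(3): scale={s_sim:.3f}, RMSE={err_sim:.2e}")
print(f"SE(3):  scale={s_se3:.3f}, RMSE={err_se3:.2e}")
```

In this noise-free toy case the Sim(3) fit recovers the inverse of the injected scale (2.5) and drives the residual to numerical zero, while the rigid fit cannot absorb the scale difference and leaves a large residual.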

System Overview


Fig. 2. System overview of MR.ScaleMaster. (1) Front-end-agnostic multi-robot partitioning bounds gradual scale drift by distributing a long trajectory across short per-agent sessions. (2) The Scale Collapse Alarm monitors per-session scale evolution and rejects false-positive loop closures before they enter the pose graph. (3) Global Sim(3) anchor node optimization resolves inter-session scale discrepancies, producing a unified dense map.

1
Front-End-Agnostic Partitioning
Each agent runs any monocular front-end (MASt3R-SLAM, π³, VGGT-SLAM 2.0) and transmits keyframe packets (I_k, T_k, P_k). Shorter per-session trajectories bound drift accumulation.
2
Scale Collapse Alarm
A two-criterion alarm — accumulated rotation check and adaptive scale-jump detection — gates degenerate loop factors before they enter the pose graph, with transactional rollback on detection.
3
Global Sim(3) Anchor Optimization
Sim(3) anchor nodes map each session's local frame into a shared global frame. Per-session scale is an explicit degree of freedom, resolved jointly across all inter-session loop constraints via g2o with analytic Jacobians.
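
The gate-and-rollback behavior described in step 2 can be sketched as a small transaction around a toy graph state. This is a hedged illustration only: the page does not spell out either alarm criterion, so the accumulated rotation check is passed in as a precomputed boolean, and the adaptive scale-jump test (a multiple of the median of previously accepted log-scale steps, with a floor) is an assumption. The class `ScaleGate`, its parameters `k` and `floor`, and the `optimize` callback are hypothetical names, not part of MR.ScaleMaster.

```python
import copy
import math
from statistics import median

class ScaleGate:
    """Toy transactional gate around a pose-graph-like state (hypothetical)."""

    def __init__(self, k=5.0, floor=0.05):
        self.state = {"factors": [], "scale": 1.0}
        self.accepted_steps = []   # |delta log-scale| of loops accepted so far
        self.k = k                 # assumed multiplier for the adaptive threshold
        self.floor = floor         # assumed minimum threshold

    def _jump_threshold(self):
        # Adaptive threshold: grows with the typical step seen so far.
        if not self.accepted_steps:
            return self.floor
        return max(self.floor, self.k * median(self.accepted_steps))

    def try_add_loop(self, factor, rotation_check_passed, optimize):
        snapshot = copy.deepcopy(self.state)          # begin transaction
        self.state["factors"].append(factor)
        new_scale = optimize(self.state)              # stand-in for the solver
        if new_scale > 0:
            step = abs(math.log(new_scale / snapshot["scale"]))
        else:
            step = math.inf                           # collapsed scale
        alarm = (not rotation_check_passed) or step > self._jump_threshold()
        if alarm:
            self.state = snapshot                     # rollback: drop the factor
            return False
        self.state["scale"] = new_scale               # commit
        self.accepted_steps.append(step)
        return True

gate = ScaleGate()
ok_good = gate.try_add_loop("loop_a", True, lambda state: 1.02)   # mild change
ok_bad = gate.try_add_loop("loop_b", True, lambda state: 0.05)    # collapse
print(ok_good, ok_bad, gate.state)
```

The design point the sketch tries to capture is that the candidate factor is evaluated against a snapshot, so a rejected loop leaves the graph exactly as it was before the attempt.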

Scale Collapse Alarm — Before & After

Exp 1 — Corridor

Before After

Exp 2 — Perceptually Aliased Corridor

Before After

Sim(3) Anchor Node Optimization — Visualization

Qualitative Results

Fig. 4. Effect of scale estimation on KITTI 00 with 15 robots. Ours jointly refines anchor nodes and keyframe poses, yielding globally consistent trajectories.
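
As a rough illustration of what a Sim(3) anchor node does, the sketch below maps points and keyframe poses from a session's local frame into a shared global frame using a scale `s`, rotation `R`, and translation `t`. It is a minimal sketch under stated assumptions: the `Sim3` class, the `lift_pose` helper, and the numeric values are hypothetical, and the joint optimization that the system performs in g2o is not reproduced here.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Sim3:
    """Similarity transform p -> s * R @ p + t (illustrative container)."""
    s: float
    R: np.ndarray
    t: np.ndarray

    def apply(self, p):
        return self.s * (self.R @ p) + self.t

    def compose(self, other):
        """Return self ∘ other: apply `other` first, then `self`."""
        return Sim3(self.s * other.s,
                    self.R @ other.R,
                    self.s * (self.R @ other.t) + self.t)

    def inverse(self):
        R_inv = self.R.T
        return Sim3(1.0 / self.s, R_inv, -(R_inv @ self.t) / self.s)

def lift_pose(anchor, R_local, t_local):
    """Map a keyframe pose from the session frame to the global frame.

    Rotation is composed directly; only the position is rescaled.
    """
    return anchor.R @ R_local, anchor.apply(t_local)

# Two sessions with different (assumed) scales observe one global landmark.
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
anchor_a = Sim3(1.0, np.eye(3), np.zeros(3))
anchor_b = Sim3(2.5, Rz90, np.array([1.0, 0.0, 0.0]))

landmark_global = np.array([4.0, 2.0, 0.0])
landmark_in_a = anchor_a.inverse().apply(landmark_global)
landmark_in_b = anchor_b.inverse().apply(landmark_global)

# After applying each session's anchor, both observations agree again.
from_a = anchor_a.apply(landmark_in_a)
from_b = anchor_b.apply(landmark_in_b)
R_kf, t_kf = lift_pose(anchor_b, np.eye(3), landmark_in_b)
print(from_a, from_b, t_kf)
```

With an SE(3) anchor the scale would be fixed at 1, so the two local observations of the same landmark could not be reconciled when the sessions disagree on scale.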

Heterogeneous Front-End Deployment

KITTI 00 · 15 robots · The same backend handles all front-ends without modification.

Fig. 7. Front-end agnostic validation on KITTI 00 with 15 robots. The heterogeneous deployment (18.74 m) outperforms homogeneous π³, indicating that stronger front-ends contribute corrective constraints that benefit weaker ones.
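
One way to picture a backend that handles every front-end without modification is a single packet type plus a minimal interface that each front-end satisfies. The sketch below is an assumption-laden illustration, not the project's actual API: the names `KeyframePacket`, `FrontEnd`, `Backend`, and `DummyFrontEnd` are hypothetical, and the reading of the keyframe packet as an image, a pose, and a set of 3D points is an interpretation of the (I_k, T_k, P_k) notation.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol
import numpy as np

@dataclass
class KeyframePacket:
    agent_id: str
    image: np.ndarray    # I_k: keyframe image (assumed H x W x 3)
    pose: np.ndarray     # T_k: 4 x 4 pose in the session's own frame and scale
    points: np.ndarray   # P_k: N x 3 points in the session's own frame and scale

class FrontEnd(Protocol):
    """Anything that can yield keyframe packets counts as a front-end."""
    def keyframes(self) -> Iterator[KeyframePacket]: ...

class Backend:
    """Collects packets per agent; never inspects which model produced them."""
    def __init__(self) -> None:
        self.sessions: dict[str, list[KeyframePacket]] = {}

    def ingest(self, front_end: FrontEnd) -> None:
        for packet in front_end.keyframes():
            self.sessions.setdefault(packet.agent_id, []).append(packet)

class DummyFrontEnd:
    """Stand-in for a real model; emits a straight line at its own scale."""
    def __init__(self, agent_id: str, n_keyframes: int, scale: float) -> None:
        self.agent_id = agent_id
        self.n_keyframes = n_keyframes
        self.scale = scale

    def keyframes(self) -> Iterator[KeyframePacket]:
        for k in range(self.n_keyframes):
            pose = np.eye(4)
            pose[0, 3] = self.scale * k            # position along x
            yield KeyframePacket(
                agent_id=self.agent_id,
                image=np.zeros((2, 2, 3), dtype=np.uint8),
                pose=pose,
                points=self.scale * np.ones((1, 3)),
            )

backend = Backend()
backend.ingest(DummyFrontEnd("agent_0", n_keyframes=3, scale=1.0))
backend.ingest(DummyFrontEnd("agent_1", n_keyframes=2, scale=0.4))
print({agent: len(packets) for agent, packets in backend.sessions.items()})
```

Because the backend depends only on the packet fields, swapping one front-end for another changes the scale and quality of the incoming data but not the ingestion code.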

Video Comparison

KITTI benchmark — select a sequence to compare all front-ends side by side.

MASt3R-SLAM CVPR 2025
VGGT-SLAM 2.0 arXiv 2026
Heterogeneous

Trajectory Comparison

Estimated trajectories are compared against ground truth for KITTI 00, 02, 05, 07, and 08.

Real-World Experiment

Four heterogeneous agents capture the environment independently. MR.ScaleMaster fuses their monocular streams into a single consistent 3D map.

Exp 1 — Indoor & Outdoor Multi-Agent
Go2 (Legged Robot)
Human #1
Scout Mini (Wheeled Robot)
Human #2
MR.ScaleMaster
Collaborative 3D Map

Exp 2 — Indoor Library (Multi-Floor)
Collaborative 3D Map