Scale-Consistent Collaborative Mapping
from Crowd-Sourced Monocular Videos
Crowd-sourced cooperative mapping from monocular cameras promises scalable 3D reconstruction without specialized sensors, yet it remains hindered by two scale-specific failure modes: abrupt scale collapse caused by false-positive loop closures in repetitive environments, and gradual scale drift over long trajectories which, together with per-robot scale ambiguity, prevents direct multi-session fusion. We present MR.ScaleMaster, a cooperative mapping system that addresses both failure modes through three key mechanisms. A Scale Collapse Alarm rejects spurious loop closures before they corrupt the pose graph. A Sim(3) anchor node formulation generalizes the classical SE(3) framework to explicitly estimate per-session scale, resolving per-robot scale ambiguity and enforcing global scale consistency. A modular, open-source, plug-and-play interface enables any monocular reconstruction model to integrate without backend modification. On KITTI sequences with up to 15 agents, the Sim(3) formulation achieves a 7.2× ATE reduction over the SE(3) baseline, and the alarm rejects all false-positive loops while preserving every valid constraint. We further demonstrate heterogeneous multi-robot dense mapping fusing MASt3R-SLAM, π³, and VGGT-SLAM 2.0 within a single unified map.
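To make the Sim(3) anchor idea concrete, the following is a minimal sketch, not the MR.ScaleMaster implementation. It assumes a similarity transform is stored as a scale s, a rotation R, and a translation t acting as x -> s R x + t, and that a session's keyframe is lifted into the global frame by composing the session anchor with the local pose. The names `Sim3`, `rot_z`, and `to_global` are hypothetical and introduced only for illustration. Setting s = 1 recovers the SE(3) anchor case.

```python
import math
from dataclasses import dataclass


def mat_vec(R, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))


def mat_mul(A, B):
    """Multiply two 3x3 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


@dataclass(frozen=True)
class Sim3:
    """Hypothetical similarity transform: x -> s * R @ x + t."""
    s: float
    R: tuple
    t: tuple

    def apply(self, p):
        rp = mat_vec(self.R, p)
        return tuple(self.s * rp[i] + self.t[i] for i in range(3))

    def compose(self, other):
        # self(other(x)) = (s1 s2)(R1 R2) x + (s1 R1 t2 + t1)
        return Sim3(self.s * other.s, mat_mul(self.R, other.R), self.apply(other.t))


def to_global(anchor, local_pose):
    """Lift a session-local keyframe pose into the global frame (assumed convention)."""
    return anchor.compose(local_pose)


# Illustrative values: a session whose map is half the global scale,
# rotated 90 degrees about z and offset by 10 units along x.
anchor = Sim3(2.0, rot_z(math.pi / 2), (10.0, 0.0, 0.0))
local_keyframe = Sim3(1.0, rot_z(0.0), (1.0, 0.0, 0.0))
global_keyframe = to_global(anchor, local_keyframe)
print(global_keyframe.s, global_keyframe.t)  # scale 2.0, position close to (10, 2, 0)
```

In this sketch the anchor carries the entire per-session scale, so local keyframe poses can stay in each front-end's native units while the optimizer adjusts one scale per session.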
Fig. 2. System overview of MR.ScaleMaster. (1) Front-end-agnostic multi-robot partitioning bounds gradual scale drift by distributing a long trajectory across short per-agent sessions. (2) The Scale Collapse Alarm monitors per-session scale evolution and rejects false-positive loop closures before they enter the pose graph. (3) Global Sim(3) anchor node optimization resolves inter-session scale discrepancies, producing a unified dense map.
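The caption does not specify how the Scale Collapse Alarm decides that a loop closure is spurious, so the following is only an illustrative sketch under stated assumptions. It assumes each candidate loop closure implies a relative scale for its session, and that a candidate is suspicious when this scale departs from the median of the session's recent scale estimates by more than a fixed log-ratio. The function names, the median reference, and the factor-of-two threshold are all hypothetical choices, not the authors' method.

```python
import math
from statistics import median


def is_scale_collapse(scale_history, candidate_scale, max_log_ratio=math.log(2.0)):
    """Return True if a candidate loop's implied scale looks like a collapse.

    scale_history: recent per-session scale estimates (assumed positive).
    candidate_scale: scale implied by the candidate loop closure.
    max_log_ratio: hypothetical tolerance; log(2) allows a factor of two either way.
    """
    if candidate_scale <= 0.0:
        return True  # a non-positive scale is never physically valid
    if not scale_history:
        return False  # no reference yet, so nothing to compare against
    reference = median(scale_history)
    return abs(math.log(candidate_scale / reference)) > max_log_ratio


def filter_loops(scale_history, candidates):
    """Keep only the loop ids whose implied scale passes the alarm check."""
    return [
        loop_id
        for loop_id, scale in candidates
        if not is_scale_collapse(scale_history, scale)
    ]


# Illustrative values: a stable session near unit scale and three candidates.
history = [1.0, 1.02, 0.98]
candidates = [("loop_a", 1.05), ("loop_b", 0.1), ("loop_c", 0.0)]
accepted = filter_loops(history, candidates)
print(accepted)
```

Comparing in log space treats shrinking and growing by the same factor symmetrically, which suits a quantity like scale that composes multiplicatively. The check runs before a constraint is added, matching the caption's requirement that rejected loops never enter the pose graph.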
Exp 1 — Corridor
Exp 2 — Perceptually Aliased Corridor
Sim(3) Anchor Node Optimization — Visualization
Fig. 4. Effect of scale estimation on KITTI 00 with 15 robots. Ours jointly refines anchor nodes and keyframe poses, yielding globally consistent trajectories.
KITTI 00 · 15 robots · The same backend handles all front-ends without modification.
Fig. 7. Front-end agnostic validation on KITTI 00 with 15 robots. The heterogeneous deployment (18.74 m) outperforms homogeneous π³, indicating that stronger front-ends contribute corrective constraints that benefit weaker ones.
KITTI benchmark: side-by-side comparison of all front-ends for each sequence.
Estimated trajectory (left) vs. ground truth (right).
Four heterogeneous agents capture the environment independently. MR.ScaleMaster fuses their monocular streams into a single consistent 3D map.