MR.ScaleMaster: Scale-Consistent Collaborative Mapping

Fig. 1. Real-world heterogeneous multi-robot mapping with MR.ScaleMaster in a multi-floor indoor environment. Four agents (a legged robot, a wheeled robot, and two handheld cameras) collaboratively build a unified dense 3D map. Inset pairs show inter-agent loop closures across different platforms.

Abstract

Crowd-sourced cooperative mapping from monocular cameras promises scalable 3D reconstruction without specialized sensors, yet it remains hindered by two scale-specific failure modes: abrupt scale collapse caused by false-positive loop closures in repetitive environments, and gradual scale drift over long trajectories, which, combined with per-robot scale ambiguity, prevents direct multi-session fusion. We present MR.ScaleMaster, a cooperative mapping system that addresses both failure modes through three key mechanisms. A Scale Collapse Alarm rejects spurious loop closures before they corrupt the pose graph. A Sim(3) anchor node formulation generalizes the classical SE(3) framework to explicitly estimate per-session scale, resolving per-robot scale ambiguity and enforcing global scale consistency. A modular, open-source, plug-and-play interface lets any monocular reconstruction model integrate without backend modification. On KITTI sequences with up to 15 agents, the Sim(3) formulation achieves a 7.2× ATE reduction over the SE(3) baseline, and the alarm rejects all false-positive loops while preserving every valid constraint. We further demonstrate heterogeneous multi-robot dense mapping that fuses MASt3R-SLAM, π³, and VGGT-SLAM 2.0 within a single unified map.

What Stops Monocular Collaborative Mapping?

Challenge 1

Abrupt Scale Collapse

False-positive loop closures in repetitive environments (e.g., corridors) force the optimizer to reconcile conflicting constraints by collapsing the scale toward zero or inflating it toward infinity. A single erroneous factor poisons the entire pose graph.

→ Scale Collapse Alarm
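To make the collapse mechanism concrete, here is a deliberately tiny model (our own illustration, not the paper's formulation). Assume a single session-wide scale s, a soft prior pulling s toward 1 with weight `w_scale`, and one false loop claiming that two keyframes separated by `loop_span` (in session units) coincide. Least squares satisfies the false loop most cheaply by shrinking s, and the minimizer is s* = w_scale / (w_scale + w_loop * loop_span^2). The function name and weights are hypothetical.

```python
def collapsed_scale(loop_span: float, w_scale: float = 1.0, w_loop: float = 1.0) -> float:
    """Minimizer over s of  w_scale * (s - 1)**2 + w_loop * (s * loop_span)**2.

    s: one global scale applied to a session's odometry (toy assumption).
    loop_span: end-to-end displacement, in session units, that a false loop claims is zero.
    """
    # Setting the derivative 2*w_scale*(s - 1) + 2*w_loop*loop_span**2 * s to zero
    # and solving for s gives the closed form below.
    return w_scale / (w_scale + w_loop * loop_span ** 2)


for span in (1.0, 10.0, 100.0):
    print(f"loop_span={span:6.1f} -> optimal scale {collapsed_scale(span):.6f}")
```

With unit weights, a false loop spanning 100 units drives the scale to roughly 1e-4, which is the "scale toward zero" failure in miniature. The toy ignores rotation and does not model the opposite, exploding-scale case.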

Challenge 2

Gradual Drift & Inter-Session Scale Mismatch

Scale estimation error accumulates over long trajectories. In multi-robot settings, each agent carries its own unknown scale factor, making direct fusion infeasible without explicit scale alignment across sessions.

→ Sim(3) Anchor Node Optimization
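One way to see what explicit scale alignment must recover: if two sessions observe the same landmarks, their relative scale can be estimated in closed form from the ratio of the points' RMS spreads about their centroids, a quantity unaffected by the unknown rotation and translation (the symmetric scale estimate from Horn's closed-form absolute-orientation solution). This is a swapped-in textbook technique for illustration only; MR.ScaleMaster instead estimates per-session scale inside the joint Sim(3) optimization. The name `relative_scale` and the assumption of known, noise-free correspondences are ours.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _centered(points: Sequence[Point]) -> List[List[float]]:
    """Subtract the centroid from every point."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in points]


def relative_scale(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Estimate s in dst_i ~ s * R @ src_i + t from corresponding points.

    The spread about the centroid is invariant to R and t, so the ratio of
    spreads isolates the scale factor.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding points")
    spread_src = sum(v * v for p in _centered(src) for v in p)
    spread_dst = sum(v * v for p in _centered(dst) for v in p)
    if spread_src == 0.0:
        raise ValueError("source points are all identical")
    return math.sqrt(spread_dst / spread_src)


# Same landmarks seen by two sessions: B's frame is rotated 30 deg about z,
# translated, and scaled by 2.5 relative to A's frame.
theta = math.radians(30.0)
cos_t, sin_t = math.cos(theta), math.sin(theta)
pts_a = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0), (-1.5, 0.5, 2.0), (0.3, -1.0, -0.7)]
pts_b = [(2.5 * (cos_t * x - sin_t * y) + 4.0,
          2.5 * (sin_t * x + cos_t * y) - 1.0,
          2.5 * z + 0.5) for x, y, z in pts_a]
print(relative_scale(pts_a, pts_b))
```

With noise-free correspondences the estimate matches the true factor 2.5 up to floating-point error; with real data, outliers from false matches would need robust handling.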

Challenge 3

Heterogeneous Front-End Integration

Front-ends built on 3D vision foundation models (MASt3R-SLAM, π³, VGGT-SLAM 2.0) are multiplying rapidly, and each follows its own scale convention, which makes plug-and-play multi-robot deployment difficult without a unified backend.

→ Front-End-Agnostic Modular Architecture
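A plug-and-play backend only needs front-ends to agree on a message shape, not on a scale convention. The sketch below expresses that contract with Python's `typing.Protocol`; the names `KeyframePacket`, `FrontEnd`, `keyframes()`, `Backend.ingest`, and `ReplayFrontEnd` are hypothetical and are not taken from the MR.ScaleMaster codebase. Each new session simply receives its own scale degree of freedom, initialized to 1.0, regardless of which model produced it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Protocol, Sequence


@dataclass
class KeyframePacket:
    """Hypothetical keyframe message loosely mirroring the (I_k, T_k, P_k) tuple."""
    session_id: str
    index: int
    image_ref: str                                   # I_k: reference to the keyframe image
    pose: Sequence[Sequence[float]]                  # T_k: 4x4 pose in the session's local frame
    points: List[Sequence[float]] = field(default_factory=list)  # P_k: local 3D points


class FrontEnd(Protocol):
    """Contract: stream keyframes in the front-end's own, arbitrarily scaled local frame."""
    session_id: str

    def keyframes(self) -> Iterator[KeyframePacket]: ...


class Backend:
    """Toy backend that never inspects which model produced a session."""

    def __init__(self) -> None:
        self.session_scale: Dict[str, float] = {}
        self.keyframes: List[KeyframePacket] = []

    def ingest(self, front_end: FrontEnd) -> None:
        # Every session gets its own scale degree of freedom, initialized to 1.0;
        # a global Sim(3) optimization would refine it later.
        self.session_scale.setdefault(front_end.session_id, 1.0)
        self.keyframes.extend(front_end.keyframes())


IDENTITY_4X4 = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


class ReplayFrontEnd:
    """Stand-in for a wrapper around, e.g., MASt3R-SLAM, pi^3, or VGGT-SLAM 2.0 output."""

    def __init__(self, session_id: str, num_keyframes: int) -> None:
        self.session_id = session_id
        self.num_keyframes = num_keyframes

    def keyframes(self) -> Iterator[KeyframePacket]:
        for k in range(self.num_keyframes):
            yield KeyframePacket(self.session_id, k, f"{self.session_id}/{k:06d}.png", IDENTITY_4X4)


backend = Backend()
for fe in (ReplayFrontEnd("legged", 3), ReplayFrontEnd("handheld_1", 2)):
    backend.ingest(fe)
print(len(backend.keyframes), backend.session_scale)
```

Structural typing keeps the backend free of per-model branches: any wrapper exposing `session_id` and `keyframes()` satisfies the contract without inheriting from a base class.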

System Overview

Fig. 2. System overview of MR.ScaleMaster. (1) Front-end-agnostic multi-robot partitioning bounds gradual scale drift by distributing a long trajectory across short per-agent sessions. (2) The Scale Collapse Alarm monitors per-session scale evolution and rejects false-positive loop closures before they enter the pose graph. (3) Global Sim(3) anchor node optimization resolves inter-session scale discrepancies, producing a unified dense map.

Front-End-Agnostic Partitioning

Each agent runs any monocular front-end (MASt3R-SLAM, π³, VGGT-SLAM 2.0) and transmits keyframe packets (I_k, T_k, P_k). Shorter per-session trajectories bound drift accumulation.
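The drift-bounding argument can be illustrated with a hypothetical compounding model: if every keyframe multiplies the scale error by (1 + δ), a session of n keyframes can drift by up to (1 + δ)^n, so cutting one long trajectory into shorter per-agent sessions shrinks the worst case exponentially. Both `partition_sessions` (a contiguous, near-equal split) and `worst_case_scale_drift` are illustrative assumptions, not the system's actual partitioning rule or drift model.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_sessions(keyframes: Sequence[T], num_agents: int) -> List[List[T]]:
    """Split one keyframe sequence into `num_agents` contiguous sessions whose sizes differ by at most one."""
    if num_agents < 1:
        raise ValueError("num_agents must be >= 1")
    base, extra = divmod(len(keyframes), num_agents)
    sessions: List[List[T]] = []
    start = 0
    for i in range(num_agents):
        end = start + base + (1 if i < extra else 0)  # first `extra` sessions take one more keyframe
        sessions.append(list(keyframes[start:end]))
        start = end
    return sessions


def worst_case_scale_drift(delta: float, n: int) -> float:
    """Hypothetical model: relative scale error `delta` per keyframe, compounded over n keyframes."""
    return (1.0 + delta) ** n


sessions = partition_sessions(list(range(4500)), 15)
print(len(sessions), {len(s) for s in sessions})
print(worst_case_scale_drift(0.001, 4500), worst_case_scale_drift(0.001, len(sessions[0])))
```

Under this toy model with δ = 0.001, a single 4500-keyframe session could drift by a factor of about 90, while each 300-keyframe session stays near 1.35; the remaining inter-session mismatch is then left to the Sim(3) anchors.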

Scale Collapse Alarm

A two-criterion alarm, combining an accumulated-rotation check with adaptive scale-jump detection, gates degenerate loop factors before they enter the pose graph and performs a transactional rollback when it fires.
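The gating logic can be sketched as a transaction around each candidate loop. This is our reading of the two criteria, with hypothetical forms and thresholds: the rotation criterion is modeled as a cap on the rotation correction (in degrees) a loop would impose, and the adaptive scale-jump test flags a new scale estimate whose log-ratio to the last committed value exceeds `k` times the median absolute deviation of past log-scale steps, with a floor `min_jump`. `GraphState`, `try_add_loop`, and the stub optimizer lambdas are illustrative stand-ins for the real g2o-based backend.

```python
import copy
import math
import statistics
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Loop = Tuple[str, int, str, int]  # (session_a, keyframe_a, session_b, keyframe_b)


@dataclass
class GraphState:
    """Hypothetical backend state: committed per-session scales and accepted loop factors."""
    scale_history: Dict[str, List[float]] = field(default_factory=dict)
    loops: List[Loop] = field(default_factory=list)


def scale_jump_alarm(history: List[float], new_scale: float,
                     k: float = 5.0, min_jump: float = 0.05) -> bool:
    """Flag new_scale if its log-jump from the last committed scale is anomalously large."""
    if not history:
        return False                                  # nothing to compare against yet
    steps = [math.log(b / a) for a, b in zip(history, history[1:])]
    threshold = min_jump
    if steps:
        med = statistics.median(steps)
        mad = statistics.median([abs(x - med) for x in steps])
        threshold = max(min_jump, k * mad)            # adapts to each session's scale noise
    return abs(math.log(new_scale / history[-1])) > threshold


def try_add_loop(state: GraphState, loop: Loop, rotation_correction_deg: float,
                 optimize: Callable[[GraphState], Dict[str, float]],
                 max_rotation_deg: float = 30.0) -> Tuple[GraphState, bool]:
    """Tentatively add a loop factor; return the pre-loop snapshot if either criterion fires."""
    # Criterion 1 (hypothetical form): cap the rotation correction the loop would impose.
    if rotation_correction_deg > max_rotation_deg:
        return state, False
    snapshot = copy.deepcopy(state)                   # begin transaction
    state.loops.append(loop)
    new_scales = optimize(state)                      # tentative re-optimization
    # Criterion 2: adaptive scale-jump detection on every session the solver touched.
    if any(scale_jump_alarm(state.scale_history.get(sid, []), s)
           for sid, s in new_scales.items()):
        return snapshot, False                        # rollback: loop and its effects discarded
    for sid, s in new_scales.items():                 # commit
        state.scale_history.setdefault(sid, []).append(s)
    return state, True


state = GraphState(scale_history={"A": [1.0, 1.01, 0.99, 1.0, 1.02]})
# Stub optimizers stand in for re-solving the pose graph with the tentative loop.
state, ok_valid = try_add_loop(state, ("A", 10, "B", 4), 3.0, lambda g: {"A": 1.01})
state, ok_false_loop = try_add_loop(state, ("A", 40, "B", 90), 5.0, lambda g: {"A": 0.2})
print(ok_valid, ok_false_loop, len(state.loops))
```

Measuring jumps in log space treats halving and doubling the scale symmetrically, and the MAD-based threshold adapts to how noisy each session's scale estimate already is.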

Global Sim(3) Anchor Optimization

Sim(3) anchor nodes map each session's local frame into a shared global frame. Per-session scale is an explicit degree of freedom, resolved jointly across all inter-session loop constraints via g2o with analytic Jacobians.
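The anchor parameterization itself is compact: each session owns one similarity transform (s, R, t) that maps its local frame into the global one, and a keyframe's global pose is the anchor composed with its local SE(3) pose. The pure-Python sketch below shows only that composition, with SE(3) recovered as the s = 1 case; it does not reproduce the g2o vertices, edges, or analytic Jacobians, and the `Sim3` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]

IDENTITY3: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(a[i][k] * v[k] for k in range(3)) for i in range(3))


@dataclass(frozen=True)
class Sim3:
    """Similarity transform x -> s * R @ x + t with s > 0; SE(3) is the special case s = 1."""
    s: float
    R: Mat3
    t: Vec3

    def act(self, x: Vec3) -> Vec3:
        rx = matvec(self.R, x)
        return tuple(self.s * rx[i] + self.t[i] for i in range(3))

    def compose(self, other: "Sim3") -> "Sim3":
        """Apply `other` first, then `self`: s1*s2, R1 @ R2, s1 * R1 @ t2 + t1."""
        return Sim3(self.s * other.s, matmul(self.R, other.R), self.act(other.t))


anchor_a = Sim3(1.0, IDENTITY3, (0.0, 0.0, 0.0))
anchor_b = Sim3(2.0, IDENTITY3, (0.0, 0.0, 0.0))  # session B's local map is at half scale
kf_a = Sim3(1.0, IDENTITY3, (10.0, 0.0, 0.0))     # keyframe pose in A's local frame
kf_b = Sim3(1.0, IDENTITY3, (5.0, 0.0, 0.0))      # same place, in B's local frame
print(anchor_a.compose(kf_a).t, anchor_b.compose(kf_b).t)  # both print (10.0, 0.0, 0.0)
```

Because the scale lives in the anchor rather than in every keyframe, one scalar per session rescales that whole session consistently; holding keyframes fixed and optimizing only the anchors roughly corresponds to the anchor-only variant in Fig. 4 (b).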

Scale Collapse Alarm — Before & After

Exp 1: Corridor (before / after)

Exp 2: Perceptually Aliased Corridor (before / after)

Sim(3) Anchor Node Optimization — Visualization

Qualitative Results

(a) SE(3) opt. — scale absorbed as rotation

(b) Sim(3) anchor-only — keyframes fixed

(d) Ground truth

Fig. 4. Effect of scale estimation on KITTI 00 with 15 robots. Ours jointly refines anchor nodes and keyframe poses, yielding globally consistent trajectories.

Heterogeneous Front-End Deployment

KITTI 00 · 15 robots · The same backend handles all front-ends without modification.

(a) MASt3R-SLAM

(b) VGGT-SLAM 2.0

(d) Heterogeneous deployment

Fig. 7. Front-end-agnostic validation on KITTI 00 with 15 robots. The heterogeneous deployment (18.74 m) outperforms the homogeneous π³ deployment, indicating that stronger front-ends contribute corrective constraints that benefit weaker ones.