Monocular Dynamic Gaussian Splatting Overfits: A Diagnostic Study

TL;DR

Dynamic 3D Gaussian Splatting overfits by 6.18 dB on average on D-NeRF. A systematic ablation traces >80% of this gap to the split operation of Adaptive Density Control. Across 9 ablation conditions we see a log-linear count–gap correlation (r = 0.995). Then EER—a k-NN elastic-strain penalty on per-Gaussian deformation—breaks this correlation: it reduces the gap by 40.8% while increasing the Gaussian count by 85%. Our full combination closes 57.4% of the baseline gap.

6.18 dB baseline gap (8 D-NeRF scenes)

99.72% EER strain reduction (8-scene mean)

40.8% EER gap reduction (D-NeRF)

15.9% EER gap reduction (HyperNeRF, real)

57.4% full combination

Count vs gap: EER breaks the log-linear correlation. — **The count–gap paradigm shift.** Ablations (gray) follow a log-linear trend (r = 0.995, bootstrap 95% CI [0.993, 1.000]). EER (green) uses *more* Gaussians yet overfits *less*. The correlation holds within 41 non-EER configurations (r = 0.987) — EER is the only lever we found that breaks it.

Abstract

Dynamic 3D Gaussian Splatting achieves impressive novel-view synthesis on monocular video by coupling a deformable point cloud with Adaptive Density Control (ADC), but exhibits a severe train–test generalization gap. On the D-NeRF benchmark (8 synthetic scenes) we measure an average gap of 6.18 dB (up to 11 dB per scene) and, through a systematic ablation of every ADC sub-operation (split, clone, prune, frequency, threshold, schedule), identify splitting as the dominant pathway.

Our central finding is that Elastic Energy Regularization (EER)—an isotropic k-NN penalty on the relative deformation of neighboring Gaussians—breaks the log-linear count–gap correlation observed across ablations. This reframes overfitting from a capacity problem to an incoherent deformation problem. We evaluate 48 configurations spanning four axes of control (capacity, deformation complexity, view-dependent encoding, stochastic regularization); capacity control and coherence regularization compound, and GAD+LogiGrow+PTDrop+EER closes 57.4% of the baseline gap.

The main findings are on synthetic D-NeRF scenes. Real-world validation (HyperNeRF) and cross-architecture validation (Deformable-3DGS) are partial and still in progress; see the corresponding sections below.

Key Findings

1. Split drives >80% of overfitting

Disabling split collapses both the cloud (2K vs 44K Gaussians) and the gap (1.15 dB vs 6.18 dB). Disabling pruning changes nothing.

2. Count–gap correlation is real but incomplete

r = 0.995 on 9 ablation conditions, holding within both sub-clusters (r = 0.998 on high-count, 0.95 on low-count) and across 41 non-EER configurations (r = 0.987).

3. EER breaks the correlation

+85% Gaussians, −40.8% gap. At the per-Gaussian level, EER reduces deformation strain by 99.6% on Lego, 99.8% on T-Rex, 99.6% on Hellwarrior.

4. Orthogonal axes compound

GAD+EER = 48.2% reduction. Adding LogiGrow + PTDrop = 57.4%, the only configuration in our sweep to more than halve the gap.
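
The count–gap correlation in finding 2 is a Pearson r between log Gaussian count and the train–test gap. A minimal sketch of that computation with a percentile bootstrap interval follows. The data here are synthetic placeholders, not the paper's measurements, and the function names `log_linear_r` and `bootstrap_ci` are hypothetical. The exact bootstrap variant used in the paper is not stated on this page, so the percentile method is an assumption.

```python
import numpy as np

def log_linear_r(counts, gaps):
    """Pearson r between log10(Gaussian count) and the train-test gap in dB."""
    x = np.log10(np.asarray(counts, dtype=float))
    y = np.asarray(gaps, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def bootstrap_ci(counts, gaps, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r, resampling configurations with replacement."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    gaps = np.asarray(gaps, dtype=float)
    n = len(counts)
    rs = []
    while len(rs) < n_boot:
        idx = rng.integers(0, n, size=n)
        # Skip degenerate resamples where r is undefined (zero variance).
        if np.unique(counts[idx]).size < 2 or np.unique(gaps[idx]).size < 2:
            continue
        rs.append(log_linear_r(counts[idx], gaps[idx]))
    lo, hi = np.quantile(rs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# SYNTHETIC placeholder data (9 made-up conditions), not the paper's results.
rng = np.random.default_rng(1)
counts = np.array([2e3, 5e3, 1e4, 2e4, 3e4, 4e4, 4.4e4, 5e4, 6e4])
gaps = 3.7 * np.log10(counts) - 11.0 + rng.normal(0.0, 0.1, size=counts.size)

r = log_linear_r(counts, gaps)
ci_lo, ci_hi = bootstrap_ci(counts, gaps)
print(f"r = {r:.3f}, 95% CI [{ci_lo:.3f}, {ci_hi:.3f}]")
```

With only 9 conditions the bootstrap interval is sensitive to which points are resampled, which is one reason to report the correlation on the larger configuration set as well.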

Method ranking across 48 configurations

Method ranking by gap reduction. — Every EER-containing method dominates. View-dependent regularization (ChromReg, OEM) has no meaningful effect.

Pareto frontier: quality vs overfitting

Ablation summary

Gap grows with densification, not with iterations alone

Overfitting gap over training iterations. — Train–test PSNR gap over training (mean ± std across 8 scenes). Baseline grows to ~6 dB; disabling split holds it at ~1 dB. The divergence tracks the densification window (iters 500–15,000).

Why early stopping fails: densification is front-loaded

Front-loaded densification bar chart. — 84–89% of cloud growth happens before iter 7,500. Stopping densification at iter 7,500 (A6) only trims the count by 10% and has essentially no effect on the gap — confirming that mitigation must modulate densification *from the start*, not truncate it at the end.

Dose–response across all methods

Dose-response curves for all methods. — Each panel sweeps a method's strength parameter. EER shows the steepest dose–response; ChromReg and OEM are essentially flat, confirming that view-dependent regularization is not the right axis.

Method Taxonomy

We organize 8 mitigation methods along 3 axes of control, plus stochastic regularization:

Capacity (how many Gaussians)

GAD — BIC-motivated adaptive threshold
LogiGrow — Verhulst logistic carrying capacity
SGD — spectral gating on loss FFT

Deformation coherence (how deformations behave)

EER ★ — elastic strain energy on k-NN graph
STSR — H¹ Sobolev on the deformation in time

View-dependent encoding

ChromReg — penalize high-degree SH coefficients
OEM — opacity entropy maximization

Stochastic regularization

PTDrop — temporal-consistency-weighted dropout

GAD: a BIC-motivated threshold schedule

We adapt the per-iteration gradient threshold as

τ_GAD(t) = τ_base · (1 + λ · K(t) / (N · Δℓ_ema(t)))

where K(t) is the current Gaussian count, N is the number of training pixels, and Δℓ_ema(t) is an exponential moving average (EMA) of the per-iteration loss improvement. λ is the single tunable knob. The mapping from BIC to this formula is a heuristic (see paper, §6.2); the empirical diminishing-returns exponent we measure (α ≈ 0.04) is too mild to justify the often-quoted O((N/λ)^{1/4}) growth bound, so we present the bound qualitatively as "sublinear in N".
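
The threshold schedule can be sketched in a few lines of plain Python. This is an illustration of the formula only: the helper names `ema_update` and `gad_threshold`, the EMA decay of 0.99, the small floor on the denominator, and all numbers in the toy loop are assumptions, not values from the paper.

```python
def ema_update(prev, value, decay=0.99):
    """Exponential moving average; the first observation initialises it.
    The decay value is an assumption, it is not given in the text."""
    return value if prev is None else decay * prev + (1.0 - decay) * value

def gad_threshold(tau_base, lam, count, n_pixels, dloss_ema, floor=1e-12):
    """tau_GAD = tau_base * (1 + lam * K / (N * dloss_ema)).
    `floor` (an assumption) keeps the ratio finite when improvement stalls."""
    dloss = max(dloss_ema, floor)
    return tau_base * (1.0 + lam * count / (n_pixels * dloss))

# Toy loop with made-up losses and counts, for illustration only.
tau_base, lam, n_pixels = 2e-4, 1.0, 1_000_000
losses = [1.00, 0.90, 0.85, 0.83, 0.82]
counts = [1_000, 2_000, 4_000, 8_000, 16_000]

ema, taus = None, []
for prev, cur, k in zip(losses, losses[1:], counts[1:]):
    ema = ema_update(ema, prev - cur)          # per-iteration loss improvement
    taus.append(gad_threshold(tau_base, lam, k, n_pixels, ema))

print([f"{t:.3e}" for t in taus])
```

The threshold rises as the count K grows or as the loss improvement shrinks, so each additional split has to clear a higher gradient bar late in training.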

EER: k-NN elastic strain energy

For a subset of Gaussians i and their k=8 canonical neighbors j, we penalize

ℒ_EER = mean_i,j ‖ u(x_i, t) − u(x_j, t) ‖² / (‖ x_i − x_j ‖² + ε)

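where u(x, t) is the deformation offset at time t and ε in the denominator is a small stabilizing constant. This is a discrete elastic strain energy, the physically motivated choice for linear elasticity: Hooke's law penalizes the displacement gradient ∂u/∂x, not the displacement u itself. The k-NN graph is built in canonical space, where it is stable, so we rebuild it only every 500 iterations; the penalty is ramped in with a cosine schedule from iteration 3K to 10K.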
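
A minimal NumPy sketch of the penalty, under stated assumptions: it uses all Gaussians rather than a subset, a brute-force neighbour search that only suits small clouds, and a half-cosine form for the ramp (the text specifies only the 3K to 10K window). The function names are hypothetical, and a training implementation would use a differentiable framework instead of NumPy.

```python
import numpy as np

def knn_indices(x, k=8):
    """Brute-force k nearest neighbours in canonical space (O(n^2) memory)."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)           # exclude each point from its own list
    return np.argsort(d2, axis=1)[:, :k]   # shape (n, k)

def eer_loss(x, u, nbr, eps=1e-8):
    """mean over (i, j) of ||u_i - u_j||^2 / (||x_i - x_j||^2 + eps)."""
    du = u[:, None, :] - u[nbr]            # (n, k, 3) relative deformation
    dx = x[:, None, :] - x[nbr]            # (n, k, 3) canonical offsets
    num = np.sum(du ** 2, axis=-1)
    den = np.sum(dx ** 2, axis=-1) + eps
    return float(np.mean(num / den))

def cosine_ramp(it, start=3_000, end=10_000):
    """Weight rising smoothly from 0 at `start` to 1 at `end` (assumed form)."""
    s = np.clip((it - start) / (end - start), 0.0, 1.0)
    return float(0.5 * (1.0 - np.cos(np.pi * s)))

rng = np.random.default_rng(0)
x = rng.uniform(size=(200, 3))                 # toy canonical positions
nbr = knn_indices(x, k=8)

rigid = np.tile([0.1, 0.0, 0.0], (200, 1))     # pure translation
scale = 0.1 * x                                # uniform 10% stretch
noisy = rng.normal(scale=0.1, size=(200, 3))   # incoherent offsets

for name, u in [("rigid", rigid), ("scale", scale), ("noisy", noisy)]:
    print(name, eer_loss(x, u, nbr))
```

A rigid translation costs nothing, a uniform stretch by factor a costs about a², and spatially incoherent offsets are penalized heavily. That ordering is the behaviour the strain formulation is meant to produce.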

Interactive 3D Deformation Viewer

Explore the deformation field in 3D. Left panel: baseline (incoherent per-Gaussian deformation). Right panel: EER (coherent elastic deformation). Use the time slider to animate — watch how baseline Gaussians scatter chaotically at novel timesteps while EER maintains spatial coherence. Drag to orbit; scroll to zoom. Cameras are linked between panels.

12,000 highest-opacity Gaussians per scene, 11 timesteps (t=0.0 to 1.0). Color by displacement magnitude (viridis) or strain (inferno). Requires serving via HTTP (python -m http.server 8000).

What EER Actually Does to the Deformation Field

For every D-NeRF scene, we load the trained 4DGS model, query the per-Gaussian deformation at 4 timesteps, and plot the distribution of per-Gaussian strain ε_i = mean_j ‖u_i−u_j‖² / ‖x_i−x_j‖² over its 8 canonical neighbors.

Lego deformation field. — **Lego**: strain ↓ 99.62%

T-Rex deformation field. — **T-Rex**: strain ↓ 99.80%

Hellwarrior deformation field. — **Hellwarrior**: strain ↓ 99.58%

Bouncing-balls deformation field. — **Bouncing-balls**: strain ↓ 99.90%

Jumping-jacks deformation field. — **Jumping-jacks**: strain ↓ 99.84%

Stand-up deformation field. — **Stand-up**: strain ↓ 99.82%

Mutant deformation field. — **Mutant**: strain ↓ 99.64%

Hook deformation field. — **Hook**: strain ↓ 99.59%

Each panel shows (left) the canonical cloud colored by displacement magnitude, (middle) a subsampled quiver of u(x, t=0.5), and (right) the per-Gaussian strain histogram. Baseline is bimodal with heavy tails; EER collapses the distribution by more than two orders of magnitude (mean ε drops from 3.539 to 0.00922, a factor of roughly 380). This is the direct mechanism behind EER's overfitting reduction.

Strain reduction on every scene

Scene	Baseline ε	EER ε	Reduction
bouncingballs	2.835	0.00296	99.90%
hellwarrior	5.785	0.02408	99.58%
hook	2.627	0.01090	99.59%
jumpingjacks	6.772	0.01106	99.84%
lego	1.573	0.00594	99.62%
mutant	1.323	0.00481	99.64%
standup	3.686	0.00667	99.82%
trex	3.715	0.00738	99.80%
mean (n=8)	3.539	0.00922	99.72%

Measured at iter 20,000 on trained 4DGS checkpoints. Strain ε is mean over k=8 canonical neighbors of ‖u_i−u_j‖² / ‖x_i−x_j‖², averaged over 4 timesteps (t=0, 0.25, 0.5, 0.75).
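
The reduction column can be reproduced from the two ε columns. The short check below uses only the values in the table; the variable names are ours. It also suggests that the 99.72% summary figure is the mean of the per-scene reductions, since the reduction of the mean strains comes out slightly higher.

```python
# Strain values copied from the table above (baseline, EER).
baseline = {
    "bouncingballs": 2.835, "hellwarrior": 5.785, "hook": 2.627,
    "jumpingjacks": 6.772, "lego": 1.573, "mutant": 1.323,
    "standup": 3.686, "trex": 3.715,
}
eer = {
    "bouncingballs": 0.00296, "hellwarrior": 0.02408, "hook": 0.01090,
    "jumpingjacks": 0.01106, "lego": 0.00594, "mutant": 0.00481,
    "standup": 0.00667, "trex": 0.00738,
}

# Per-scene relative reduction in percent.
reduction = {s: 100.0 * (1.0 - eer[s] / baseline[s]) for s in baseline}

# Two ways to summarise: mean of per-scene reductions vs. reduction of the means.
mean_of_reductions = sum(reduction.values()) / len(reduction)
mean_base = sum(baseline.values()) / len(baseline)
mean_eer = sum(eer.values()) / len(eer)
reduction_of_means = 100.0 * (1.0 - mean_eer / mean_base)

for scene, pct in reduction.items():
    print(f"{scene:14s} {pct:6.2f}%")
print(f"mean of reductions: {mean_of_reductions:.2f}%")
print(f"reduction of means: {reduction_of_means:.2f}%")
```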

EER: The Paradigm Shift

EER three-panel analysis. — (a) EER λ sweep: consistent gap reduction across scenes. (b) EER *increases* final Gaussian count — the reverse of capacity control. (c) Per-scene gap reduction: consistent across all 8 scenes, including the pathological Lego and Hellwarrior.

Combination additivity plot. — Combinations are *super-additive*: GAD+EER exceeds the sum of individual reductions, confirming capacity and coherence target orthogonal failure modes.

Real-World Validation (HyperNeRF)

EER transfers to real monocular video. On HyperNeRF chickchicken, with 4DGS and the same λ=0.05 tuned on D-NeRF, EER reduces every generalization-gap metric:

Metric	Baseline	EER λ=0.05	Reduction
PSNR gap (dB)	5.48	4.61	15.9%
SSIM gap	0.067	0.051	23.7%
LPIPS gap	0.030	0.020	33.4%
Test PSNR (dB)	26.42	26.22	−0.20

4DGS on HyperNeRF chickchicken, 14K iterations (stock HyperNeRF config), single run on an RTX 3070. The same λ=0.05 used on D-NeRF transfers directly to this scene with no per-dataset tuning. Multi-scene HyperNeRF, iPhone, and Nerfies evaluations are left as future work.

Cross-Architecture Validation (Deformable-3DGS)

Main experiments are on 4DGS (HexPlane deformation). We ported EER and GAD to Deformable-3DGS (MLP deformation) and ran baseline + EER on three D-NeRF scenes for 20K iterations.

Phase 1: direct-transfer test at D-NeRF-tuned λ=0.05

Scene	Baseline gap	EER λ=0.05 gap	Reduction	ΔTest PSNR
lego	13.15 dB	13.56 dB	−3.1%	−0.02 dB
trex	1.50 dB	1.81 dB	−20.8%	−0.38 dB
hellwarrior	4.08 dB	3.87 dB	+5.2%	−0.22 dB

Direct transfer at λ=0.05 is poor (mean −6% reduction, i.e. the gap grows slightly on average). Why? Deformable-3DGS trains with L1 + 0.2·(1−SSIM), whereas 4DGS uses pure L1; the loss magnitude is roughly 3× larger, so λ=0.05 under-regularizes. Our dimensional-analysis note (paper §6.2) predicts that the correct λ for Deformable-3DGS is ≈ 0.15–0.30. Testing this directly:

Phase 2: λ sweep on Deformable-3DGS Lego (dimensional-analysis test)

λ	Gap (dB)	Train PSNR	Test PSNR	ΔTest	Reduction
0 (baseline)	13.15	38.38	25.23	—	—
0.05	13.56	38.77	25.21	−0.02	−3.1%
0.15	10.23	35.55	25.33	+0.10	+22.3%
0.30	8.26	33.60	25.34	+0.11	+37.2%
0.60	7.82	33.21	25.39	+0.16	+40.6%

Clean dose-response, monotonic once λ ≥ 0.15. At every such λ on Lego, EER both reduces the gap by 22–41% and slightly improves test PSNR, a rare regularizer that gives you both. The coherence mechanism transfers across deformation architectures; the hyperparameter requires per-architecture calibration, exactly as the dimensional-analysis note predicted.
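
The derived columns of the Phase 2 table follow from the train and test PSNR values alone. The sketch below recomputes them as a consistency check; it uses only numbers from the table, and small differences of up to about 0.1 from the printed values are expected because the inputs are already rounded to two decimals.

```python
# (lambda, train PSNR, test PSNR) copied from the Phase 2 table.
rows = [
    (0.00, 38.38, 25.23),
    (0.05, 38.77, 25.21),
    (0.15, 35.55, 25.33),
    (0.30, 33.60, 25.34),
    (0.60, 33.21, 25.39),
]

base_gap = rows[0][1] - rows[0][2]   # baseline train-test gap in dB
base_test = rows[0][2]

summary = []
for lam, train, test in rows:
    gap = train - test
    summary.append({
        "lam": lam,
        "gap": gap,
        "dtest": test - base_test,                   # change in test PSNR
        "reduction": 100.0 * (1.0 - gap / base_gap), # gap reduction in percent
    })

for s in summary:
    print(f"lam={s['lam']:.2f}  gap={s['gap']:.2f} dB  "
          f"dTest={s['dtest']:+.2f} dB  reduction={s['reduction']:+.1f}%")
```

The recomputed rows show the same pattern as the table: the gap shrinks through lower train PSNR while test PSNR holds or rises slightly, which is what one would expect from a regularizer that removes memorized detail rather than useful signal.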

BibTeX

@article{droby2026monodygs,
  author  = {Ahmad Droby},
  title   = {Monocular Dynamic Gaussian Splatting Overfits:
             A Diagnostic Study of Densification in 4D Gaussian Fields},
  journal = {arXiv preprint},
  year    = {2026}
}