
Overview

This release presents a controlled empirical study of diffusion-based action-sequence modeling for manipulation, focusing on execution strategy. We compare:

- Gaussian MLP Behavior Cloning (BC)
- Diffusion Policy (open-loop execution)
- Diffusion Policy (receding-horizon execution)

All experiments are:

- Deterministic
- Multi-seed (3 seeds)
- CPU reproducible
- Fully scripted end-to-end

Configuration:

- Environment: gym_pusht/PushT-v0
- Observation: Low-dimensional state
- Horizon (H): 8
- DDIM steps (K): 10
- Diffusion T: 50 (linear schedule)

Multi-Seed Results (3 Seeds)

| Method                | Return (mean ± std) |
|-----------------------|---------------------|
| Gaussian BC           | 3.98 ± 1.60         |
| Diffusion (Open-loop) | 7.70 ± 1.99         |
| Diffusion (Receding)  | **7.72 ± 1.98**     |

Per-seed returns:

| Seed | BC   | Diff Open | Diff Receding |
|------|------|-----------|---------------|
| 0    | 1.97 | 5.65      | 5.68          |
| 1    | 4.09 | 7.05      | 7.07          |
| 2    | 5.88 | 10.40     | 10.41         |

Key Observations

- Diffusion-based sequence modeling outperforms Gaussian BC on all three seeds in this small-data PushT setting.
- Open-loop and receding-horizon execution produce nearly identical returns under a fixed horizon (H=8) and fixed DDIM steps (K=10): mean return 7.70 vs. 7.72, with per-seed gaps of at most 0.03.
- Differences between execution strategies do not materially manifest in this short-horizon regime.

Note: Return is the primary metric. success_rate is not used as the primary metric in this setup because PushT does not expose a binary success signal under the current evaluator configuration.
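The summary row can be cross-checked against the per-seed returns. The following is a minimal sketch using only the numbers listed in the tables and Python's standard `statistics` module. The reported spreads appear to match the population standard deviation (ddof = 0) rather than the sample standard deviation; that is an inference from the numbers, not something the release states. The names `per_seed_returns` and `summarize` are illustrative and are not part of the release scripts.

```python
from statistics import mean, pstdev

# Per-seed returns copied from the per-seed table (seeds 0, 1, 2).
per_seed_returns = {
    "Gaussian BC": [1.97, 4.09, 5.88],
    "Diffusion (Open-loop)": [5.65, 7.05, 10.40],
    "Diffusion (Receding)": [5.68, 7.07, 10.41],
}


def summarize(returns):
    """Return (mean, population std), each rounded to two decimals."""
    return round(mean(returns), 2), round(pstdev(returns), 2)


summary = {name: summarize(vals) for name, vals in per_seed_returns.items()}

for name, (mu, sigma) in summary.items():
    print(f"{name:<22} {mu:.2f} ± {sigma:.2f}")
```

With only three seeds, the sample standard deviation (ddof = 1) would be noticeably larger, so it is worth knowing which convention a reported spread uses before comparing it with other results.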

Reproducibility

To reproduce the multi-seed experiment:

    python scripts/reproduce_multiseed.py \
      --env_id gym_pusht/PushT-v0 \
      --seeds 0 1 2 \
      --episodes_record 20 \
      --max_steps_record 200 \
      --steps_bc 3000 \
      --steps_diff 5000 \
      --episodes_eval 20 \
      --max_steps_eval 200 \
      --results_root results/rq_exec_mode \
      --device cpu

    python scripts/aggregate_results.py \
      --results_root results/rq_exec_mode \
      --seeds 0 1 2

Plots are generated via:

    python scripts/plot_summary.py
    python scripts/plot_per_seed.py

Scope & Limitations

- Single environment (PushT)
- Low-dimensional state input
- Fixed horizon (H=8)
- Fixed sampler steps (K=10)
- No vision encoder
- No latency benchmarking
- No sim-to-real claims

This release isolates execution strategy under controlled conditions.
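The two execution strategies compared here differ in how much of each sampled action sequence is executed before replanning. The following is a minimal sketch under stated assumptions: `ToyEnv` and `sample_plan` are hypothetical stand-ins (not the PushT environment or the release's diffusion sampler), and `n_exec=1` for the receding-horizon case is an assumed setting that the release does not specify.

```python
from typing import Callable, List, Tuple

H = 8  # planning horizon, matching the fixed horizon used in this release


class ToyEnv:
    """Hypothetical 1-D stand-in for an environment (not PushT)."""

    def __init__(self) -> None:
        self.state = 0.0

    def step(self, action: float) -> Tuple[float, float]:
        self.state += action
        reward = -abs(1.0 - self.state)  # reward is highest at state 1.0
        return self.state, reward


def sample_plan(state: float, horizon: int = H) -> List[float]:
    """Hypothetical planner standing in for a diffusion sampler.

    Returns `horizon` equal actions that together move the state to 1.0.
    """
    return [(1.0 - state) / horizon] * horizon


def rollout(env: ToyEnv, plan_fn: Callable[[float], List[float]],
            max_steps: int, n_exec: int) -> Tuple[float, int]:
    """Execute the first `n_exec` actions of each plan, then replan.

    n_exec == H gives open-loop execution of the full plan;
    n_exec < H gives receding-horizon execution.
    Returns (total return, number of planner calls).
    """
    state, total, steps, n_plans = env.state, 0.0, 0, 0
    while steps < max_steps:
        plan = plan_fn(state)
        n_plans += 1
        for action in plan[:n_exec]:
            if steps >= max_steps:
                break
            state, reward = env.step(action)
            total += reward
            steps += 1
    return total, n_plans


open_return, open_plans = rollout(ToyEnv(), sample_plan, max_steps=16, n_exec=H)
rec_return, rec_plans = rollout(ToyEnv(), sample_plan, max_steps=16, n_exec=1)
print(f"open-loop: {open_plans} planner calls, receding: {rec_plans} planner calls")
```

In this toy run, receding-horizon execution calls the planner once per step while open-loop execution calls it once per H steps. That difference in sampler calls is the cost side of the comparison, which this release does not benchmark.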
