RL for robotics

12 hypotheses · composite score 0.641

Running
EXP-015 · Run #1
Started 45m ago · 12 hypotheses completed
Composite baseline: 0.641 (+3.1% from start)
Sampler Agents (2 running)

SA-001 · DEPTH · 11m 03s elapsed · 73% complete
Increasing the replay-buffer priority exponent alpha from 0.6 to 0.8 to emphasize high-TD-error transitions.
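The depth sampler's hypothesis targets prioritized experience replay, where transition i is drawn with probability proportional to its priority raised to the exponent alpha. A minimal sketch (TD-error values made up) of how raising alpha from 0.6 to 0.8 concentrates sampling on high-TD-error transitions:

```python
import numpy as np

def sampling_probs(td_errors, alpha, eps=1e-6):
    # Prioritized replay: P(i) proportional to (|delta_i| + eps)^alpha
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

td = np.array([0.1, 0.1, 0.1, 2.0])        # one high-TD-error transition
lo = sampling_probs(td, alpha=0.6)          # old exponent
hi = sampling_probs(td, alpha=0.8)          # new exponent
# Raising alpha shifts more probability mass onto the outlier transition
assert hi[-1] > lo[-1]
```

Note that a larger alpha also increases the bias that importance-sampling weights must correct, which is the usual risk this hypothesis trades against.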
SA-002 · BREADTH · 6m 22s elapsed · 42% complete
Tuning the entropy target from -dim(action) to -0.5*dim(action) for a better exploration-exploitation trade-off.
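SAC auto-tunes the temperature alpha so that policy entropy tracks a target; -dim(A) is the common heuristic, and -0.5*dim(A) is a less negative target, i.e. it asks the policy to keep more entropy. A minimal sketch of the temperature objective under that change (the action dimension and log-probability samples are made up):

```python
import numpy as np

def temperature_loss(log_alpha, log_probs, target_entropy):
    # SAC temperature objective: J(alpha) = E[-alpha * (log pi(a|s) + H_target)];
    # minimizing it raises alpha whenever entropy falls below the target.
    return float(-np.exp(log_alpha) * np.mean(log_probs + target_entropy))

action_dim = 6                              # hypothetical, e.g. a 6-DoF arm
default_target = -1.0 * action_dim          # common heuristic: -dim(A)
tuned_target = -0.5 * action_dim            # less negative => more entropy requested

log_probs = np.array([-2.1, -3.4, -2.8])    # made-up log pi(a|s) samples
loss_default = temperature_loss(0.0, log_probs, default_target)
loss_tuned = temperature_loss(0.0, log_probs, tuned_target)
```

With the higher target, the entropy-below-target condition triggers sooner, so the temperature stays larger for longer and the policy explores more.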
Evaluator Agents (4 active)

Task Success Rate: 0.670 (+0.050) · best 0.670
Sample Efficiency: 0.412 (+0.032) · best 0.412
Policy Stability:  0.083 (-0.011) · best 0.083
Generalization:    0.521 (+0.031) · best 0.521
Experiment Log · EXP-015

ID       Status    Δ Score   Hypothesis                                           Time
EXP-015  KEPT      +2.1%     TD3 policy update delay: 2 → 4 steps                 3m ago
         KEPT      +0.9%     Exploration noise std 0.1 → 0.2 with decay schedule  18m ago
         REVERTED  -0.6%     Critic network: 2 layers → 3 layers, hidden 256      33m ago
         KEPT      baseline  SAC baseline — auto-tuned entropy, twin Q-networks   45m ago
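The top kept hypothesis changes TD3's delayed-update schedule: critics train every step, while the actor and the target networks update only every `policy_delay` steps, letting critic estimates settle before each policy step. A sketch of the bookkeeping under the new delay of 4 (the step count is illustrative):

```python
policy_delay = 4  # raised from the TD3 default of 2 by this experiment

actor_updates = 0
critic_updates = 0
for step in range(1, 1001):
    critic_updates += 1              # twin-critic TD update every step
    if step % policy_delay == 0:
        actor_updates += 1           # delayed deterministic policy-gradient step
        # ...Polyak-average the target networks here as well

# With delay 4, the actor takes a quarter as many steps as the critics
assert critic_updates == 1000 and actor_updates == 250
```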
Soft-Stop Checkpoints

Marginal Return Threshold: avg gain over last 10 experiments +0.9%
Eval Convergence: 3/4 evals still improving
Compute Budget: 78% of budget consumed
Depth/Breadth Balance: sampler split 2:1
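The checkpoints above could feed a single soft-stop decision. The sketch below is hypothetical: the rule, the `gain_threshold`, and the `budget_cap` are assumptions for illustration, not values taken from the experiment:

```python
def should_soft_stop(recent_gains, budget_used, evals_improving, total_evals,
                     gain_threshold=0.005, budget_cap=0.9):
    # Hypothetical rule combining the dashboard's checkpoints: stop when
    # marginal gains stall, most evals have converged, or budget nears its cap.
    marginal_stalled = sum(recent_gains) / len(recent_gains) < gain_threshold
    evals_converged = evals_improving / total_evals < 0.5
    over_budget = budget_used >= budget_cap
    return marginal_stalled or evals_converged or over_budget

# Current dashboard state: +0.9% avg gain, 78% budget used, 3/4 evals improving
assert not should_soft_stop([0.009] * 10, 0.78, 3, 4)
```

Under these assumed thresholds the run would continue, which matches the dashboard showing the experiment still in progress.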