12 hypotheses · composite score 0.641
Increasing replay buffer priority exponent alpha from 0.6 to 0.8 to emphasize high-TD-error transitions
Tuning entropy target from -dim(action) to -0.5*dim(action) for better exploration-exploitation trade-off
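
The two pending hypotheses above can be illustrated with a minimal sketch. It assumes proportional prioritization, where transition i is sampled with probability p_i^alpha / sum_j p_j^alpha and p_i = |TD error| + eps. The function names, the `eps` value, and the sample TD errors are hypothetical and not taken from the tracked codebase.

```python
import numpy as np


def per_sampling_probs(td_errors, alpha, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_j p_j^alpha.

    p_i = |td_error_i| + eps; eps (an assumed value) keeps zero-error
    transitions sampleable.
    """
    priorities = np.abs(np.asarray(td_errors, dtype=np.float64)) + eps
    scaled = priorities ** alpha
    return scaled / scaled.sum()


def entropy_target(action_dim, scale=1.0):
    """Entropy target as -scale * dim(action).

    scale=1.0 gives -dim(action); scale=0.5 gives -0.5*dim(action).
    """
    return -scale * action_dim


# Hypothetical TD errors, used only to compare the two exponents.
td_errors = [0.1, 0.5, 2.0]
p_06 = per_sampling_probs(td_errors, alpha=0.6)
p_08 = per_sampling_probs(td_errors, alpha=0.8)

# The highest-error transition receives a larger share under alpha=0.8.
print("alpha=0.6:", p_06)
print("alpha=0.8:", p_08)
print("entropy targets for 6-dim actions:",
      entropy_target(6), entropy_target(6, scale=0.5))
```

A larger alpha shifts sampling mass toward high-TD-error transitions. A less negative entropy target asks the policy to keep more entropy, which means more exploration.
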
| ID | Status | Δ Score | Hypothesis | Time |
|---|---|---|---|---|
| EXP-015 | KEPT | +2.1% | TD3 policy update delay: 2 → 4 steps | 3m ago |
| ╰ | KEPT | +0.9% | Exploration noise std 0.1 → 0.2 with decay schedule | 18m ago |
| ╰ | REVERTED | -0.6% | Critic network: 2 layers → 3 layers, hidden 256 | 33m ago |
| ╰ | KEPT | baseline | SAC baseline — auto-tuned entropy, twin Q-networks | 45m ago |
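
As a rough sketch of the two kept changes in the table, the snippet below shows a delayed policy update check and a decaying exploration noise schedule. The table does not specify the decay schedule, so the linear shape, the `end_std` floor of 0.05, and all function names are assumptions made for illustration only.

```python
def should_update_policy(critic_step, policy_delay):
    """Return True on every policy_delay-th critic update (delayed policy updates)."""
    return critic_step % policy_delay == 0


def exploration_noise_std(step, total_steps, start_std=0.2, end_std=0.05):
    """Linearly anneal the noise std from start_std to end_std.

    The linear shape and end_std are assumptions; the source only says
    "0.1 -> 0.2 with decay schedule".
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_std + frac * (end_std - start_std)


# Count actor updates over 100 critic updates for each delay setting.
actor_updates_delay2 = sum(should_update_policy(s, 2) for s in range(1, 101))
actor_updates_delay4 = sum(should_update_policy(s, 4) for s in range(1, 101))
print(actor_updates_delay2, actor_updates_delay4)

# Noise std at the start, midpoint, and end of a hypothetical 1000-step run.
print([round(exploration_noise_std(s, 1000), 3) for s in (0, 500, 1000)])
```

Moving the delay from 2 to 4 halves the number of actor updates per critic update, which gives the critics more time to settle before the policy moves.
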
Avg gain last 10 exp: +0.9%
3/4 evals still improving
78% of budget consumed
Sampler split 2:1