Claude for Research

EXP-015

Run #1

45m ago | 12 hypotheses completed

Composite Baseline: 0.641 (+3.1% from start)

Sampler Agents: 2 running

SA-001 (DEPTH)

11m 03s

Increasing replay buffer priority exponent alpha from 0.6 to 0.8 to emphasize high-TD-error transitions

73%
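
To make the SA-001 hypothesis concrete, here is a minimal sketch of how the priority exponent alpha reshapes sampling in proportional prioritized replay, where P(i) = p_i^alpha / sum_k p_k^alpha. The TD-error values, the `eps` constant, and the function name are illustrative assumptions, not values or code from this run.

```python
def sampling_probs(td_errors, alpha, eps=1e-6):
    """Proportional prioritized-replay probabilities: p_i**alpha / sum_k p_k**alpha."""
    # eps keeps zero-error transitions sampleable (assumed constant).
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


# Hypothetical TD errors for three stored transitions.
td_errors = [0.1, 0.5, 2.0]

probs_06 = sampling_probs(td_errors, alpha=0.6)  # current setting
probs_08 = sampling_probs(td_errors, alpha=0.8)  # proposed setting

for name, probs in (("alpha=0.6", probs_06), ("alpha=0.8", probs_08)):
    print(name, [round(p, 3) for p in probs])
```

With the larger exponent, the transition with the largest TD error takes a bigger share of the sampling mass and the smallest takes less, which is the emphasis the hypothesis describes.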

SA-002 (BREADTH)

6m 22s

Tuning entropy target from -dim(action) to -0.5*dim(action) for better exploration-exploitation trade-off

42%
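
As a rough illustration of the SA-002 hypothesis, the sketch below compares the two entropy targets and shows the direction in which the temperature would move under the common SAC temperature objective mean(-alpha * (log_pi + target)). The action dimension, the log-probabilities, and both function names are assumptions made up for this example.

```python
def target_entropy(action_dim, scale=1.0):
    """Entropy target of the form -scale * dim(action)."""
    return -scale * action_dim


def alpha_gradient(log_probs, target):
    """Gradient of mean(-alpha * (log_pi + target)) with respect to alpha."""
    mean_log_prob = sum(log_probs) / len(log_probs)
    return -(mean_log_prob + target)


action_dim = 6                    # hypothetical action dimension
log_probs = [4.2, 3.8, 4.0]       # hypothetical log pi(a|s); entropy estimate is about -4.0

old_target = target_entropy(action_dim)        # -dim(action)
new_target = target_entropy(action_dim, 0.5)   # -0.5 * dim(action)

# Gradient descent moves alpha opposite to the gradient's sign.
grad_old = alpha_gradient(log_probs, old_target)  # positive: alpha shrinks
grad_new = alpha_gradient(log_probs, new_target)  # negative: alpha grows
print(old_target, new_target, grad_old, grad_new)
```

In this made-up case the policy's entropy already exceeds the old target but falls short of the new one, so the higher target would push the temperature up and keep the policy more exploratory.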

Evaluator Agents: 4 active

Task Success Rate

0.670 (+0.050)

best: 0.670

Sample Efficiency

0.412 (+0.032)

best: 0.412

Policy Stability

0.083 (-0.011)

best: 0.083

Generalization

0.521 (+0.031)

best: 0.521

Experiment Log: EXP-015

ID	Status	Δ Score	Hypothesis	Time
EXP-015	KEPT	+2.1%	TD3 policy update delay: 2 → 4 steps	3m ago
╰	KEPT	+0.9%	Exploration noise std 0.1 → 0.2 with decay schedule	18m ago
╰	REVERTED	-0.6%	Critic network: 2 layers → 3 layers, hidden 256	33m ago
╰	KEPT	baseline	SAC baseline — auto-tuned entropy, twin Q-networks	45m ago

Soft-Stop Checkpoints

Marginal Return Threshold

Avg gain last 10 exp: +0.9%

Eval Convergence

3/4 evals still improving

Compute Budget

78% of budget consumed

Depth/Breadth Balance

Sampler split 2:1
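
One way the four checkpoints above could be combined into soft-stop signals is sketched below. The dashboard shows only the current readings, so every threshold here (`min_gain`, `budget_cap`, `max_split`), along with the `RunState` and `soft_stop_signals` names, is a hypothetical placeholder rather than the tool's actual logic.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    avg_gain_last_10: float   # average gain over the last 10 experiments, as a fraction
    evals_improving: int      # evaluators still improving
    evals_total: int          # evaluators in total
    budget_used: float        # fraction of compute budget consumed
    split: tuple              # (depth, breadth) sampler split


def soft_stop_signals(state, min_gain=0.005, budget_cap=0.9, max_split=3.0):
    """Return one boolean per checkpoint; True means that checkpoint suggests stopping."""
    depth, breadth = state.split
    # Treat a missing side of the split as maximally unbalanced.
    ratio = depth / breadth if breadth else float("inf")
    return {
        "marginal_return": state.avg_gain_last_10 < min_gain,
        "eval_convergence": state.evals_improving == 0,
        "compute_budget": state.budget_used >= budget_cap,
        "depth_breadth_balance": ratio > max_split or ratio < 1.0 / max_split,
    }


# Readings taken from the checkpoints above.
state = RunState(avg_gain_last_10=0.009, evals_improving=3, evals_total=4,
                 budget_used=0.78, split=(2, 1))
signals = soft_stop_signals(state)
print(signals)
```

Under these assumed thresholds none of the four signals fires for the current readings. Different thresholds would give a different answer.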