Drug discovery - ADMET

89 hypotheses · composite score 0.891

Complete
EXP-046 | 89 hypotheses evaluated | Best composite: 0.902 (+9.8% from baseline)
All success criteria met
Why It Stopped
Soft-stop triggered: AUC-ROC and MCC both converged within 0.5% of their individual maxima
Over the final 22 hypotheses, AUC-ROC varied by ±0.002 and MCC by ±0.003 (roughly 0.2% and 0.4% relative to the reported 0.903 and 0.814), both below the 0.5% convergence threshold. The composite gain rate dropped to +0.03% per hypothesis.
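The soft-stop rule above can be sketched in a few lines. This is a minimal illustration, not the optimizer's actual implementation: the function names, the 22-hypothesis window, and the reading of "within 0.5% of individual maxima" as a relative band below each metric's best value are assumptions, and the histories are made-up numbers.

```python
def has_converged(history, window=22, rel_tol=0.005):
    """Return True if the last `window` values all lie within `rel_tol`
    (relative) of the best value seen so far.

    Assumption: the 0.5% threshold is relative to the metric's maximum.
    """
    if len(history) < window:
        return False
    best = max(history)
    recent = history[-window:]
    return all(best - value <= rel_tol * best for value in recent)


def soft_stop(metric_histories, window=22, rel_tol=0.005):
    """Trigger the soft stop only when every tracked metric has converged."""
    return all(
        has_converged(history, window, rel_tol)
        for history in metric_histories.values()
    )


# Illustrative histories only; these are not the run's real per-hypothesis values.
auc_history = [0.82, 0.86, 0.89] + [0.901, 0.903] * 11
mcc_history = [0.73, 0.78, 0.80] + [0.811, 0.814] * 11
stop = soft_stop({"auc_roc": auc_history, "mcc": mcc_history})
print(stop)
```

Requiring all metrics to converge, rather than any one, matches the wording "AUC-ROC and MCC both converged".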
Success criteria
AUC-ROC ≥ 0.90 (achieved 0.903)
ECE < 0.05 (achieved 0.038)
MCC ≥ 0.80 (achieved 0.814)
Triggered at
Hypothesis 89 of 120
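The success criteria listed above can be checked mechanically. The sketch below copies the thresholds and final values from this report; the dictionary layout and the `check_criteria` helper are hypothetical names used for illustration only.

```python
import operator

# Thresholds as stated in the success criteria: (comparison, threshold).
CRITERIA = {
    "auc_roc": (operator.ge, 0.90),  # AUC-ROC >= 0.90
    "ece": (operator.lt, 0.05),      # ECE < 0.05 (lower is better)
    "mcc": (operator.ge, 0.80),      # MCC >= 0.80
}

# Final values reported for the run.
final_metrics = {"auc_roc": 0.903, "ece": 0.038, "mcc": 0.814}


def check_criteria(values, criteria=CRITERIA):
    """Return a per-metric pass/fail mapping."""
    return {
        name: compare(values[name], threshold)
        for name, (compare, threshold) in criteria.items()
    }


results = check_criteria(final_metrics)
print(results, all(results.values()))
```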
Best Configuration
Cumulative changes from the highest-scoring hypothesis chain
Architecture diff vs. baseline: 0.902 composite (+9.8%)
Component | Baseline | Best configuration | Gain
Pooling | Mean pooling | Attention-weighted sum | +1.2%
Augmentation | SMILES dropout 0.1 | SMILES dropout 0.2 | +0.7%
Features | 2D fingerprint only | + 3D conformer embedding | +2.3% (flagged)
Loss weighting | Uniform | Inverse class frequency | +1.8%
Cross-val split | Random scaffold | Scaffold stratified k-fold | +0.5%
Best Hypothesis Per Eval
Depth-first optimizer results — the single change that most improved each eval
AUC-ROC · HIGH
auc_roc_macro: 0.903 (+0.083 vs baseline, 0.820 → 0.903)
Graph conv pooling: mean → attention-weighted sum
Attention pooling learns which atoms matter per endpoint, recovering signal lost in uniform aggregation.
SA-003 · DEPTH · Hypothesis 61
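To make the pooling change concrete, the sketch below contrasts mean pooling with an attention-weighted sum over per-atom feature vectors. It is an illustration under assumptions: the report does not give the attention parameterization, so a single scoring vector `w` followed by a softmax is used here, and the atom features are toy values.

```python
import math


def mean_pool(atom_feats):
    """Uniform aggregation: every atom contributes equally."""
    n, dim = len(atom_feats), len(atom_feats[0])
    return [sum(atom[d] for atom in atom_feats) / n for d in range(dim)]


def attention_pool(atom_feats, w):
    """Attention-weighted sum.

    Each atom gets a score (dot product with the scoring vector `w`,
    which would be learned in practice), scores are softmax-normalized,
    and the atom features are summed with those weights.
    """
    scores = [sum(x * wi for x, wi in zip(atom, w)) for atom in atom_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(atom_feats[0])
    pooled = [
        sum(a * atom[d] for a, atom in zip(alphas, atom_feats))
        for d in range(dim)
    ]
    return pooled, alphas


# Toy molecule with three atoms and 2-dimensional features.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
pooled, alphas = attention_pool(atoms, w=[2.0, 0.0])
print(mean_pool(atoms), pooled, alphas)
```

With a zero scoring vector the softmax weights are uniform and the result reduces to mean pooling, which is one way to see mean pooling as a special case.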
F1 Score · HIGH
f1_macro: 0.871 (+0.080 vs baseline, 0.791 → 0.871)
Multi-task loss weighting: inverse class frequency
Rare ADMET endpoints (BBB, hERG) were under-weighted. Inverse-frequency re-weighting recovered macro F1 on minority classes.
SA-001 · DEPTH · Hypothesis 44
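A minimal sketch of inverse class frequency weighting follows. The normalization n / (k · count) is one common convention and is an assumption here; the report does not say which variant the run used, and the 90/10 label split is a toy example rather than the real endpoint distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count).

    n is the number of samples and k the number of classes, so rare
    classes get larger weights and the weights average to 1 over samples.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Toy imbalanced endpoint: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

Each sample's loss term would then be multiplied by the weight of its class, so the minority class contributes as much total weight as the majority class.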
MCC · HIGH
matthews_corrcoef: 0.814 (+0.080 vs baseline, 0.734 → 0.814)
Scaffold-stratified k-fold cross-validation
Random splits leak scaffold information across folds; scaffold-stratified splits keep each scaffold within a single fold, which yields more reliable MCC estimates and discourages overfitting to common scaffolds.
SA-002 · DEPTH · Hypothesis 38
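The core idea of a scaffold-aware split is that all molecules sharing a scaffold land in the same fold. The sketch below shows only that grouping step, assuming scaffold identifiers have already been computed for each molecule (for example as Bemis-Murcko scaffold strings). It balances fold sizes greedily and does not stratify by label, so it is a simplification of whatever the run actually used; `scaffold_kfold` is a hypothetical name.

```python
from collections import defaultdict


def scaffold_kfold(scaffolds, k=5):
    """Assign molecule indices to k folds so that no scaffold spans two folds.

    `scaffolds[i]` is the (precomputed) scaffold identifier of molecule i.
    Groups are placed largest-first into the currently smallest fold.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    folds = [[] for _ in range(k)]
    # Sort by descending group size, then by name for determinism.
    for scaffold in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[scaffold])
    return folds


# Toy scaffold labels for nine molecules.
scaffold_ids = ["A", "A", "A", "B", "B", "C", "D", "D", "E"]
folds = scaffold_kfold(scaffold_ids, k=3)
print(folds)
```

Each fold in turn serves as the held-out set while the remaining folds are used for training.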
Calibration (ECE) · MEDIUM
expected_calibration_error: 0.038 (−0.030 vs baseline, 0.068 → 0.038)
Temperature scaling post-hoc calibration (T=1.4)
Without calibration the model was overconfident at high-probability predictions. Temperature scaling brought ECE well below the 0.05 target.
SA-004 · BREADTH · Hypothesis 77
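Temperature scaling and ECE can both be sketched briefly. The example below assumes binary predictions from a single logit, uses T = 1.4 as stated above, and computes ECE with ten equal-width confidence bins. The bin count and the toy logits and labels are assumptions; in practice T would be fitted on a held-out validation set rather than fixed by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def temperature_scale(logits, temperature=1.4):
    """Divide logits by T before the sigmoid; T > 1 softens confidence."""
    return [sigmoid(z / temperature) for z in logits]


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: weighted mean of |confidence - accuracy| over bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(a for _, a in members) / len(members)
            ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Toy overconfident model: very confident logits, but only 6 of 8 correct.
logits = [3.0, 3.0, 3.0, 3.0, -3.0, -3.0, -3.0, -3.0]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
ece_raw = expected_calibration_error([sigmoid(z) for z in logits], labels)
ece_scaled = expected_calibration_error(temperature_scale(logits), labels)
print(ece_raw, ece_scaled)
```

Because dividing by a positive temperature never changes the sign of a logit, the predicted classes and therefore accuracy-based metrics are unchanged; only the confidence values move.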