This report is evidence for /supergoal. It is not part of the skill runtime
contract.
Does /supergoal improve difficult coding-task outcomes against plain Codex CLI
and Codex Goal mode?
| Arm | Hidden checks | Verification | Token signal | Outcome |
|---|---|---|---|---|
| Plain Codex CLI | Failed | No solution diff; no final output | Not reported | No usable result |
| /supergoal | Passed all | Focused regressions green; neighbor checks green; `git diff --check` green; delivery gate green | 378,468 | Best result |
| Codex Goal mode | Failed 1 check | Focused regressions green; `git diff --check` green | 165,336 CLI + 130,543 internal | Partial result |
/supergoal passed every hidden check.

Both solved arms also probed a broad Gradle suite. That suite failed on pre-existing fixture, config, and context failures outside the changed surface. Scoring therefore used the focused checks plus the shared hidden scorer.
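The broad-suite figures are Gradle test summaries of the form `N tests completed, F failed, S skipped`. As we read that format, `N` already includes the failed and skipped tests, so the passed count is `N - F - S`. The minimal Python sketch below turns such a line into counts under that assumption. The helper names (`parse_summary`, `SuiteSummary`) are hypothetical and not part of any Gradle or /supergoal tooling.

```python
import re
from typing import NamedTuple

# Matches a Gradle-style test summary such as
# "352 tests completed, 47 failed, 3 skipped"; the failed/skipped parts are optional.
SUMMARY_RE = re.compile(
    r"(?P<completed>\d+) tests? completed"
    r"(?:, (?P<failed>\d+) failed)?"
    r"(?:, (?P<skipped>\d+) skipped)?"
)


class SuiteSummary(NamedTuple):
    completed: int
    failed: int
    skipped: int

    @property
    def passed(self) -> int:
        # Assumption: "completed" counts every test, including failed and skipped ones.
        return self.completed - self.failed - self.skipped


def parse_summary(line: str) -> SuiteSummary:
    """Extract the completed/failed/skipped counts from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"no test summary found in: {line!r}")
    return SuiteSummary(
        completed=int(match["completed"]),
        failed=int(match["failed"] or 0),
        skipped=int(match["skipped"] or 0),
    )


# Counts quoted in this report for the two solved arms.
runs = {
    "/supergoal": parse_summary("352 tests completed, 47 failed, 3 skipped"),
    "Codex Goal mode": parse_summary("342 tests completed, 47 failed, 3 skipped"),
}
for arm, summary in runs.items():
    print(f"{arm}: {summary.passed} passed, {summary.failed} failed, {summary.skipped} skipped")
```

Under this reading, the two runs show 302 and 292 passing tests, each with the same 47 failures. The matching failure count is consistent with the failures being pre-existing. Equal counts alone do not prove that the same tests failed, though. A per-test comparison of the Gradle reports would be needed for that.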
Broad-suite results:

- /supergoal: 352 tests completed, 47 failed, 3 skipped.
- Codex Goal mode: 342 tests completed, 47 failed, 3 skipped.

On this harder private-codebase task, /supergoal produced the only complete
answer. The difference was not code generation alone: the delivery gate, review
loop, and hidden-check discipline caught coverage and completion gaps that the
other arms missed.