Measurable refactor missions for IBM Bob
Give Bob a map, a mission, and a validation gate. MissionForge turns refactor work into scoped, auditable, PR-ready evidence.
$ missionforge decompose MF-001
✓ MF-001-A ready (no dependencies)
✓ MF-001-B ready after A
✓ MF-001-C ready after A
◎ MF-001-D waiting on B + C
The problem
Bob re-reads the same files 2–3× per session
Bob measures the code. MissionForge measures the mission.
The CLI never reads source code. It is language-agnostic by construction.
Workflow
Six phases. Most CLI commands cost zero tokens.
Init
Create mission workspace, write goal and forbidden paths.
Decompose
Bob reads the codebase once and writes sub-mission files plus a dependency plan.
Baseline
Bob measures before-state metrics for each sub-mission's scoped files.
Implement
Bob implements within allowed paths. Forbidden paths enforced by the CLI.
Validate
CLI runs git diff + tests. Bob fills final metric values.
Report
CLI templates the PR-ready evidence report. No Bob needed.
Decomposition
Without MissionForge, Bob is flying blind.
With it, every refactor has a contract, a dependency graph, and a validation gate. Here's the difference — and exactly how Bob and MissionForge work together to get there.
Bob explores, implements, then re-reads the same files to verify — no stable place to store what it learned.
Nothing stops Bob from touching files it shouldn't. You find out in code review — or you don't.
Did the refactor actually work? You have Bob's word for it. No baseline, no metric, no proof.
One large session that fails halfway means starting over. No sub-tasks, no recovery, no independent audit trail.
MissionForge records findings to disk. Subsequent steps reference the mission file — no re-exploration.
The CLI checks every changed file against the allowed list. Violations are flagged before validation can pass.
Bob measures once. The CLI commits the numbers. Validation compares final state against the baseline — no ambiguity.
Bob decomposes the work. Each sub-mission passes its own gate. A failure retries without restarting the parent.
Define what needs to change, what must never be touched, and what success looks like — in one YAML file. One command creates the workspace.
MissionForge generates focused prompts from the mission contract. Bob reads only what's in scope — storing findings to disk so they're never re-read.
Bob writes sub-mission YAML files — each with its own scope, metrics, and dependencies. You review and approve before any code changes begin.
Checks for cycles, overlapping paths, and invalid references. Computes topological order and tells you exactly what's ready to run.
After each sub-mission, the CLI runs git diff against forbidden paths, executes the test command, and records pass/fail with full evidence. Bob never re-reads to verify — MissionForge does it deterministically.
Baselines, final metrics, test results, git diff, and scope audit — all templated automatically. Zero Bob tokens spent on reporting.
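As a rough illustration of the dependency check described above, here is a minimal Python sketch using the standard library's `graphlib`. It is not MissionForge's actual implementation: the `DEPENDENCIES` mapping and the function names are hypothetical, modelled on the MF-001 example, and the overlapping-path check is left out.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency plan mirroring the MF-001 example:
# each sub-mission maps to the set of sub-missions it depends on.
DEPENDENCIES = {
    "MF-001-A": set(),
    "MF-001-B": {"MF-001-A"},
    "MF-001-C": {"MF-001-A"},
    "MF-001-D": {"MF-001-B", "MF-001-C"},
}


def invalid_references(deps):
    """Return dependency ids that are not defined as sub-missions."""
    known = set(deps)
    referenced = {dep for targets in deps.values() for dep in targets}
    return sorted(referenced - known)


def execution_waves(deps):
    """Group sub-missions into waves; a wave is ready once all earlier waves are done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the plan contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(invalid_references(DEPENDENCIES))
    try:
        for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
            print(f"wave {number}: {', '.join(wave)}")
    except CycleError as exc:
        print(f"cycle detected: {exc.args[1]}")
```

With the example plan, A forms the first wave, B and C the second, and D the third, which matches the "ready" and "waiting on" states in the decompose output.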
Mission contract
Goal, forbidden paths, metrics, and validation — in one file
id: MF-001
goal: |
  Modernize the authentication layer to remove all legacy token
  handling while preserving existing session behaviour end-to-end.
forbidden_paths:
  - src/payments/**
  - src/user-profile/**
  - src/admin/**
aggregate_metrics:
  - id: legacy_token_calls
    baseline_target: 22
    final_target: 0
  - id: auth_e2e_test_passes
    baseline_target: false
    final_target: true
test_command: ./run-tests.sh --suite auth
Plain English mission statement. Bob writes this with you before decomposition begins.
Global safety boundary. No sub-mission can touch these files — the CLI enforces it via git diff.
Before/after numbers. Bob measures; the CLI records them and validates against targets.
Shell command the CLI runs for deterministic test evidence.
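The forbidden-path check can be pictured as matching changed file names against the contract's glob patterns. The sketch below is an assumption about how such a check might work, not the CLI's real code: the function names are hypothetical, `git diff --name-only` is used to list changed files, and `fnmatchcase` is a simplification in which `*` also matches `/`.

```python
import subprocess
from fnmatch import fnmatchcase

# Patterns copied from the example mission contract.
FORBIDDEN_PATHS = ["src/payments/**", "src/user-profile/**", "src/admin/**"]


def changed_files(base_ref):
    """List files changed relative to base_ref (hypothetical helper, needs a git repo)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def scope_violations(changed, forbidden):
    """Return the changed paths that match any forbidden pattern."""
    return sorted(
        path
        for path in changed
        if any(fnmatchcase(path, pattern) for pattern in forbidden)
    )


if __name__ == "__main__":
    sample = ["src/auth/token.py", "src/payments/charge.py"]
    print(scope_violations(sample, FORBIDDEN_PATHS))
```

Only file paths are inspected here, never file contents, which is consistent with the claim that the CLI does not read source code.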
Mission Board
Kanban view, dependency diamond, stakeholder translation
The Mission Board visualizes CLI state in real time. Engineers see scope and metrics. PMs get a business-friendly translation of mission progress.
Expand any parent mission to reveal the dependency diamond with ready, in-progress, and blocked states visible at a glance.
Board updates every 3–5 seconds as Bob and the CLI change mission state. No WebSockets required.
One click translates technical evidence into a plain-English business summary, generated by watsonx.ai.
Benchmark
A simple, falsifiable comparison
| Metric | Without MissionForge | With MissionForge |
|---|---|---|
| Bobcoin cost on same refactor | ✗ Measured (baseline) | ✓ Measured — target: lower |
| Forbidden files touched | ✗ Inspected manually | ✓ Enforced by CLI — zero violations |
| PR-attachable evidence report | ✗ Not produced | ✓ Generated automatically |
| Baseline metrics recorded | ✗ None | ✓ Structured JSON, immutable after commit |
| Per-metric pass/fail validation | ✗ None | ✓ Every sub-mission, every metric |
| Visible to non-engineers | ✗ No | ✓ Mission Board + Stakeholder Translation |
| Re-runnable validation | ✗ No | ✓ Yes — CLI is idempotent |
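Per-metric pass/fail validation might look like the following Python sketch. The pass rule (final value must equal `final_target` exactly, with matching type) is an assumption, as are the `Metric` class and `validate` function; only the metric ids and targets come from the example contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One entry of aggregate_metrics (hypothetical in-memory form)."""
    id: str
    baseline_target: object
    final_target: object


# Values taken from the example mission contract.
METRICS = [
    Metric("legacy_token_calls", 22, 0),
    Metric("auth_e2e_test_passes", False, True),
]


def validate(metrics, final_values):
    """Return {metric id: passed} by comparing recorded final values to targets."""
    results = {}
    for metric in metrics:
        value = final_values.get(metric.id)
        # Compare types too, so that 0 and False are not treated as equal.
        results[metric.id] = (
            type(value) is type(metric.final_target)
            and value == metric.final_target
        )
    return results


if __name__ == "__main__":
    final = {"legacy_token_calls": 0, "auth_e2e_test_passes": True}
    print(validate(METRICS, final))
```

Because the function depends only on recorded values, running it again on the same inputs gives the same result, which is the property the "re-runnable validation" row relies on.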
One mission down. The next is already planned.
Bob measures the code. MissionForge measures the mission. The Board makes it visible. One mission at a time, until the system is done.