IBM Bob Hackathon 2026

Measurable refactor missions for IBM Bob

Give Bob a map, a mission, and a validation gate. MissionForge turns refactor work into scoped, auditable, PR-ready evidence.

$ missionforge decompose MF-001
✓ MF-001-A  ready (no dependencies)
✓ MF-001-B  ready after A
✓ MF-001-C  ready after A
◎ MF-001-D  waiting on B + C

The problem

Bob re-reads the same files 2–3× per session

Read files broadly: explore codebase structure
Plan the refactor: Bob reasons and outlines steps
Re-read during implementation: same files again, token cost ×2 (token waste)
Run grep / find: verify scope
Re-read to verify: same files again, token cost ×3 (token waste)
Re-read to summarize: final confirmation pass (token waste)

Bob measures the code. MissionForge measures the mission.

The CLI never reads source code. It is language-agnostic by construction.

Workflow

Six phases. Most CLI commands cost zero tokens.

01 · Init · /mf-init · Token cost: zero
Create the mission workspace; write the goal and forbidden paths.

02 · Decompose · /mf-decompose · Token cost: high
Bob reads the codebase once and writes sub-mission files plus a dependency plan.

03 · Baseline · /mf-baseline · Token cost: focused
Bob measures before-state metrics for each sub-mission's scoped files.

04 · Implement · Bob works · Token cost: focused
Bob implements within allowed paths. Forbidden paths are enforced by the CLI.

05 · Validate · /mf-validate · Token cost: focused
The CLI runs git diff and the tests. Bob fills in the final metric values.

06 · Report · /mf-report · Token cost: zero
The CLI templates the PR-ready evidence report. No Bob needed.
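
The validate phase has the CLI run the test command and keep the result as evidence. A minimal sketch of that step in Python, assuming the command is a shell-style string and that exit code 0 means success. The function name `run_test_command` and the evidence fields are hypothetical and are not taken from MissionForge itself:

```python
import shlex
import subprocess
import sys


def run_test_command(command: str) -> dict:
    """Run a test command and return a pass/fail evidence record (hypothetical shape)."""
    completed = subprocess.run(
        shlex.split(command),  # split like a POSIX shell, without invoking a shell
        capture_output=True,
        text=True,
    )
    return {
        "command": command,
        "exit_code": completed.returncode,
        "passed": completed.returncode == 0,  # assumption: exit code 0 means the tests passed
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


# Stand-in command so the sketch runs anywhere; a real mission would pass its own test_command.
evidence = run_test_command(shlex.join([sys.executable, "-c", "print('tests ok')"]))
print(evidence["passed"], evidence["exit_code"])
```

Because the record is built only from the process result, re-running the same command on the same tree should give the same verdict, which is the property the validation gate relies on.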

Decomposition

Without MissionForge, Bob is flying blind.

With it, every refactor has a contract, a dependency graph, and a validation gate. Here's the difference — and exactly how Bob and MissionForge work together to get there.

Without MissionForge: Bob free-exploring

Files read 2–3× per session

Bob explores, implements, then re-reads the same files to verify — no stable place to store what it learned.

No scope boundary

Nothing stops Bob from touching files it shouldn't. You find out in code review — or you don't.

No before/after evidence

Did the refactor actually work? You have Bob's word for it. No baseline, no metric, no proof.

All-or-nothing execution

One large session that fails halfway means starting over. No sub-tasks, no recovery, no independent audit trail.

With MissionForge: Bob with a contract

Bob reads each file once

MissionForge records findings to disk. Subsequent steps reference the mission file — no re-exploration.

Forbidden paths enforced by git diff

The CLI checks every changed file against the allowed list. Violations are flagged before validation can pass.

Immutable baseline + final metrics

Bob measures once. The CLI commits the numbers. Validation compares final state against the baseline — no ambiguity.

Scoped sub-missions, dependency order

Bob decomposes the work. Each sub-mission passes its own gate. A failure retries without restarting the parent.
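
The forbidden-path check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MissionForge's implementation: the changed-file list is assumed to come from something like `git diff --name-only`, the patterns are matched with the standard-library `fnmatch` module (whose `*` also matches `/`), and `find_violations` is a hypothetical name:

```python
from fnmatch import fnmatchcase


def find_violations(changed_files, forbidden_patterns):
    """Return the changed paths that match any forbidden glob pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in forbidden_patterns)
    ]


# Hypothetical input; in practice the list could be parsed from `git diff --name-only`.
changed = ["src/auth/token.py", "src/payments/charge.py"]
forbidden = ["src/payments/**", "src/user-profile/**", "src/admin/**"]

violations = find_violations(changed, forbidden)
print(violations)  # ['src/payments/charge.py']
```

An empty result means the change stayed inside the boundary; any entry would block validation from passing.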

You
Write the mission goal and safety boundary

Define what needs to change, what must never be touched, and what success looks like — in one YAML file. One command creates the workspace.

$ missionforge init MF-001
Bob + MissionForge
Bob reads the codebase once. MissionForge scopes the questions.

MissionForge generates focused prompts from the mission contract. Bob reads only what's in scope — storing findings to disk so they're never re-read.

$ missionforge decompose MF-001
Bob
Proposes the sub-mission breakdown

Bob writes sub-mission YAML files — each with its own scope, metrics, and dependencies. You review and approve before any code changes begin.

MF-001-A Replace token validation
MF-001-B Update session middleware
MF-001-C Integration test (gates on A+B)
MissionForge CLI
Validates the graph. Resolves execution order.

Checks for cycles, overlapping paths, and invalid references. Computes topological order and tells you exactly what's ready to run.

→ MF-001-A and MF-001-B are ready. MF-001-C waits on both to pass.
Bob + MissionForge
Bob implements. MissionForge enforces scope + captures evidence.

After each sub-mission, the CLI runs git diff against forbidden paths, executes the test command, and records pass/fail with full evidence. Bob never re-reads to verify — MissionForge does it deterministically.

$ missionforge validate MF-001-A --capture
MissionForge CLI
Generates the PR-ready evidence report

Baselines, final metrics, test results, git diff, and scope audit — all templated automatically. Zero Bob tokens spent on reporting.

$ missionforge report MF-001
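
The graph-validation step in the walkthrough above (invalid references, cycles, execution order) can be sketched with Python's standard-library `graphlib`. This is a sketch under assumptions: the dependency map is a plain dictionary from each sub-mission id to the ids it waits on, `execution_waves` is a hypothetical name, and the overlapping-paths check is left out:

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-missions into waves; a wave is ready once all earlier waves pass."""
    known = set(dependencies)
    for mission, needs in dependencies.items():
        unknown = needs - known
        if unknown:
            raise ValueError(f"{mission} depends on unknown sub-missions: {sorted(unknown)}")

    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


plan = {
    "MF-001-A": set(),
    "MF-001-B": set(),
    "MF-001-C": {"MF-001-A", "MF-001-B"},
}
print(execution_waves(plan))
# [['MF-001-A', 'MF-001-B'], ['MF-001-C']]
```

The first wave matches the walkthrough: MF-001-A and MF-001-B are ready immediately, and MF-001-C waits on both.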

Mission contract

Goal, forbidden paths, metrics, and validation — in one file

id: MF-001
goal: |
  Modernize the authentication layer to remove
  all legacy token handling while preserving
  existing session behaviour end-to-end.

forbidden_paths:
  - src/payments/**
  - src/user-profile/**
  - src/admin/**

aggregate_metrics:
  - id: legacy_token_calls
    baseline_target: 22
    final_target: 0

  - id: auth_e2e_test_passes
    baseline_target: false
    final_target: true

test_command: ./run-tests.sh --suite auth
Goal prose

Plain-English mission statement. Bob writes this with you before decomposition begins.

Forbidden paths

Global safety boundary. No sub-mission can touch these files — the CLI enforces it via git diff.

Aggregate metrics

Before/after numbers. Bob measures; the CLI records them and validates against targets.

Test command

Shell command the CLI runs for deterministic test evidence.
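
To make the metric gate concrete, here is a minimal Python sketch of checking measured values against the `final_target` entries in the contract above. It assumes a metric passes only on an exact match with its target; MissionForge's actual comparison rules are not specified here, and `validate_metrics` is a hypothetical name:

```python
def validate_metrics(final_targets: dict, measured: dict) -> dict:
    """Return a per-metric pass/fail map (assumes a metric passes on an exact match)."""
    results = {}
    for metric_id, target in final_targets.items():
        value = measured.get(metric_id)
        # Compare type as well as value so that 0 does not pass for False (0 == False in Python).
        results[metric_id] = type(value) is type(target) and value == target
    return results


# Targets mirror the contract above; the measured values are made up for illustration.
final_targets = {"legacy_token_calls": 0, "auth_e2e_test_passes": True}
measured = {"legacy_token_calls": 3, "auth_e2e_test_passes": True}

results = validate_metrics(final_targets, measured)
print(results)
print("mission passes:", all(results.values()))
```

A missing metric counts as a failure in this sketch, so a mission cannot pass by leaving a value unfilled.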

Mission Board

Kanban view, dependency diamond, stakeholder translation

The Mission Board visualizes CLI state in real time. Engineers see scope and metrics. PMs get a business-friendly translation of mission progress.

Parent + sub-mission cards

Expand any parent mission to reveal the dependency diamond with ready, in-progress, and blocked states visible at a glance.

Real-time polling

Board updates every 3–5 seconds as Bob and the CLI change mission state. No WebSockets required.

Stakeholder view toggle

One click translates technical evidence into a plain-English business summary, generated by watsonx.ai.

Benchmark

A simple, falsifiable comparison

Metric | Without MissionForge | With MissionForge
Bobcoin cost on same refactor | ✗ Measured (baseline) | ✓ Measured (target: lower)
Forbidden files touched | ✗ Inspected manually | ✓ Enforced by CLI, zero violations
PR-attachable evidence report | ✗ Not produced | ✓ Generated automatically
Baseline metrics recorded | ✗ None | ✓ Structured JSON, immutable after commit
Per-metric pass/fail validation | ✗ None | ✓ Every sub-mission, every metric
Visible to non-engineers | ✗ No | ✓ Mission Board + Stakeholder Translation
Re-runnable validation | ✗ No | ✓ Yes, CLI is idempotent

One mission down. The next is already planned.

Phase 1: CLI Harness · Complete
Phase 2: Mission Board · Building
Phase 3: True Parallel Execution · Roadmap

Bob measures the code. MissionForge measures the mission. The Board makes it visible. One mission at a time, until the system is done.