Architecture

Agent Workflows & Communication

How agents communicate and coordinate through asynchronous update queues: a message-passing architecture that keeps agents loosely coupled while enabling cross-agent collaboration — all under your control.

Key Principles

The toolkit uses a message-passing architecture that keeps agents independent while enabling continuous improvement.

Self-Improving System

Agents don't just follow instructions — they identify gaps and queue improvements. Your toolkit gets better every time you use it.

You Stay in Control

Every update is queued, not applied. You review, approve, modify, or reject changes on your schedule.

Separation of Concerns

Agents stay in their lane. Implementation agents don't modify the toolkit; toolkit agents don't touch your code.

Asynchronous Collaboration

Queue an idea now, address it later. Keep your flow while building a backlog of improvements.

The Three Primary Agents

These are the entry-point agents you invoke directly. Each has a distinct role and communicates with others through the update queue system.

Model Selection

Primary agents use your active OpenCode model selection by default. When no explicit model is specified in agent frontmatter, the agent inherits your current model choice — whether that's Claude, GPT-4, or another provider. This lets you switch models globally without editing agent definitions.
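For illustration only, a hedged sketch of what minimal agent frontmatter could look like — the field names are assumptions based on common OpenCode conventions, so check your toolkit's actual schema:

```yaml
---
description: Project planning and PRD management
# model: anthropic/claude-sonnet-4   # assumption: omitting `model` makes the
#                                    # agent inherit your active model selection
tools:
  write: true
  edit: true
---
```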

@planner

Project planning and PRD management

  • Bootstrap new projects with stack detection
  • Create and refine PRDs (Product Requirements Documents)
  • Manage the PRD registry and lifecycle states
  • Queue toolkit updates when discovering missing stack support

@builder

Feature implementation orchestrator

  • Build features from PRDs or ad-hoc requests
  • Orchestrate implementation agents (developer, tester, critic)
  • Apply project updates queued by @toolkit
  • Queue toolkit updates when discovering workflow gaps

@toolkit

AI toolkit maintenance

  • Maintain agents, skills, templates, and scaffolds
  • Process toolkit update requests from other agents
  • Queue project updates when schema or patterns change
  • Keep toolkit-structure.json and documentation in sync

Communication Flow

Agents communicate through file-based update queues. This keeps them loosely coupled while enabling cross-agent collaboration.

USER CONTROL — reviews, approves, rejects, or escalates all updates

@planner (Project) · @builder (Feature) · @toolkit (AI)
        │  queue updates
        ▼
pending-updates/          (any agent → @toolkit)
        │  reads & processes
        ▼
@toolkit — reviews & applies, queues project changes
        │
        ▼
project-updates/          (@toolkit → @builder)
        │  applies to projects
        ▼
@builder — applies to projects

Update Queues

File-based message queues enable cross-agent communication without tight coupling.

pending-updates/ (any agent → @toolkit)

Toolkit improvement requests discovered during work

When any agent discovers a gap in the toolkit — a missing pattern, outdated guidance, or needed enhancement — it writes a request file here instead of modifying toolkit files directly.

Workflow

  1. Agent discovers a toolkit gap during normal work
  2. Agent writes a structured request to pending-updates/
  3. Agent notifies the user: "Queued toolkit update for @toolkit"
  4. @toolkit presents pending requests at session start
  5. User reviews each request and approves/rejects
  6. @toolkit applies approved changes, deletes the update file, and commits
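
For illustration, the structured request written in step 2 might look like the following — the filename and field names are hypothetical, patterned after the project-update file format documented later on this page:

```markdown
---
from: "@planner"
date: 2026-02-21
priority: normal
---

# Request: Add Fastify stack detection

## Gap discovered
Bootstrap ran against a Fastify project, but stack detection only
recognizes the stacks currently listed in the toolkit.

## Suggested change
Extend the stack-detection guidance with a Fastify signature
(`fastify` present in package.json dependencies).
```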

project-updates/ (@toolkit → @builder)

Project changes required by toolkit updates

When toolkit changes require updates to existing projects — schema migrations, new required fields, or pattern changes — @toolkit queues instructions here for @builder to apply.

Workflow

  1. @toolkit makes a change that affects project structure
  2. @toolkit writes update instructions for each affected project
  3. @builder presents pending updates when the user selects a project
  4. User chooses to apply, defer, or skip each update
  5. @builder delegates to @developer to apply the changes
  6. @builder deletes the update file and verifies deletion before marking complete

User Control: Users can review requests, ask clarifying questions, modify the scope, or reject updates entirely. Nothing is applied automatically.

Project Update File Format

Each project update file requires YAML frontmatter with a scope field indicating what type of work is required.

project-updates/my-project/2026-02-21-migrate-to-capabilities.md
---
scope: implementation
date: 2026-02-21
priority: normal
breaking: false
---

# Update: Migrate 'features' to 'capabilities'

## What to change
Rename the `features` field to `capabilities` in project.json.

## Steps
1. Open docs/project.json
2. Rename `features` key to `capabilities`
3. Run npm run typecheck to verify

## Verification
- `npm run typecheck` passes
- No references to old field name remain

Required Scope Values

| Scope | Handled By | Description |
|---|---|---|
| implementation | @builder | Code changes only — file edits, config updates, migrations |
| planning | @planner | PRD/documentation changes, architecture decisions, planning metadata in docs/project.json |
| mixed | Both | Requires both planning and implementation work |
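
The scope field drives routing between agents; a minimal Python sketch of that dispatch — the helper names are hypothetical, while the routing table comes from this section:

```python
def parse_frontmatter(text: str) -> dict:
    """Extract simple key: value pairs from a YAML frontmatter block."""
    meta = {}
    if text.startswith("---"):
        body = text.split("---", 2)[1]
        for line in body.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

# Which agent(s) handle each scope, per the table above.
SCOPE_HANDLERS = {
    "implementation": ["@builder"],
    "planning": ["@planner"],
    "mixed": ["@planner", "@builder"],
}

def route_update(file_text: str) -> list[str]:
    scope = parse_frontmatter(file_text).get("scope", "implementation")
    return SCOPE_HANDLERS[scope]

update = """---
scope: planning
date: 2026-02-21
---
# Update: adjust PRD lifecycle metadata
"""
print(route_update(update))  # ['@planner']
```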

Planner Write Allowlist

Planning-scope updates can target docs/project.json for planning metadata changes (e.g., capability flags, PRD lifecycle settings). This file is included in the planner's write allowlist alongside PRD files and the prd-registry. @planner owns all planning metadata updates to docs/project.json.

Update File Lifecycle

Update files must be deleted after successful application to prevent re-processing. Agents must verify deletion before marking the update complete.

  1. Apply changes — execute all steps specified in the update file
  2. Run verification — execute verification commands from the update file
  3. Delete update file — remove the update file from the queue
  4. Verify deletion — confirm the file no longer exists before marking complete

Why verify deletion? If an agent fails to delete the update file (or deletion fails silently), the update will be presented again on the next session, leading to confusion or duplicate work. Verification ensures the lifecycle completes cleanly.
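
The delete-and-verify tail of the lifecycle can be sketched in Python — paths and the helper name are illustrative:

```python
from pathlib import Path
import tempfile

def complete_update(update_file: Path) -> bool:
    """Lifecycle steps 3-4: delete the queue file, then verify it is
    actually gone before reporting the update as complete."""
    update_file.unlink(missing_ok=True)
    if update_file.exists():  # deletion failed silently?
        raise RuntimeError(f"{update_file} still present; not marking complete")
    return True

# Usage: simulate a queued update file in a temp directory.
queue = Path(tempfile.mkdtemp())
f = queue / "2026-02-21-migrate-to-capabilities.md"
f.write_text("---\nscope: implementation\n---\n")
print(complete_update(f))  # True
```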

Integration Provisioning Automation

When projects need integrations (Stripe, Resend, OpenAI, etc.), the agent system uses a lightweight workflow to provision and promote integration-specific skills.

1. Planner Adds Integration Tasks

During PRD creation, @planner identifies integration needs by analyzing user stories and requirements. When an integration is detected (e.g., "accept payments" → Stripe), the planner:

  • Sets capability flags in docs/project.json (e.g., capabilities.payments: true)
  • Adds the integration to the integrations array
  • Adds an integration skill task to the PRD (non-blocking, just part of the work)

2. Builder Creates Missing Skills

During build, @builder checks if integration skills exist and creates missing ones on-demand using meta-skill generators:

  • Checks docs/project.json for required integrations
  • Loads the appropriate meta-skill generator (e.g., stripe-skill-generator)
  • Generates the skill file to docs/skills/<integration>/SKILL.md
  • Records the generated skill in docs/project.json → skills.generated[]

3. Builder Queues Toolkit Promotion

After creating a new integration skill, @builder always queues a toolkit promotion update. @toolkit later reviews and promotes mature patterns:

  • @builder queues promotion update to pending-updates/ for every new skill
  • @toolkit reviews queued updates and extracts reusable patterns
  • Updates meta-skill generators with improved patterns
  • Future projects automatically benefit from the promoted patterns

Available Meta-Skill Generators

These generators create project-specific skills based on detected capabilities and integrations.

| Generator | Triggered By | Generates |
|---|---|---|
| auth-skill-generator | capabilities.authentication | Auth flow patterns, session handling |
| stripe-skill-generator | integrations: ["stripe"] | Payment flows, webhook handling |
| email-skill-generator | capabilities.email | Transactional email templates, delivery |
| crud-skill-generator | Any project with a database | Entity patterns, validation, API routes |
| ai-tools-skill-generator | capabilities.ai | AI tool definitions, chatbot patterns |
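
Given the trigger conditions above, selecting generators from docs/project.json could look like this sketch — the schema fields are taken from this page, while the selection helper itself is hypothetical:

```python
def select_generators(project: dict) -> list[str]:
    """Map capability flags and integrations to meta-skill generators."""
    caps = project.get("capabilities", {})
    gens = []
    if caps.get("authentication"):
        gens.append("auth-skill-generator")
    if "stripe" in project.get("integrations", []):
        gens.append("stripe-skill-generator")
    if caps.get("email"):
        gens.append("email-skill-generator")
    if project.get("database"):  # any project with a database
        gens.append("crud-skill-generator")
    if caps.get("ai"):
        gens.append("ai-tools-skill-generator")
    return gens

project = {
    "capabilities": {"authentication": True, "payments": True},
    "integrations": ["stripe"],
    "database": "postgres",
}
print(select_generators(project))
# ['auth-skill-generator', 'stripe-skill-generator', 'crud-skill-generator']
```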

Lightweight Integration Workflow

Integration skills are created on-demand during build, with no blocking dependencies. Builder automatically queues promotion updates, ensuring patterns flow back to the toolkit for future projects.

Builder Startup Behavior

When @builder starts a session, it performs optional file checks to determine current work state. Missing files are expected and handled gracefully.

Startup Sequence

  1. Check docs directory listing — list docs/ contents to see which files exist before attempting reads.
  2. Read builder-state.json (if present) — only read docs/builder-state.json if it appears in the listing. A missing file is normal — it means no prior session state.
  3. Load PRD and project context — read docs/prd.json and docs/project.json to understand current work and project configuration.
  4. Present workflow dashboard — show the interactive dashboard with workflow options: P (PRD), A (Ad-hoc), U (Updates), E (E2E tests). Wait for user selection before proceeding.
  5. Start dev server (after workflow selection) — check dev server health and start it if needed. This happens after the user selects a workflow, not immediately on session start.
  6. Detect available CLIs (optional) — check which service CLIs are installed and authenticated (vercel, supabase, aws, gh, etc.). Authenticated CLIs are shown in the dashboard and stored for direct use throughout the session.

Why check first? Attempting to read a missing file triggers an error. By listing the directory first, the builder knows which files exist and avoids unnecessary error handling for expected missing state.

Dev Server Check Timing

Dev server checks are deferred until workflow selection — after the user chooses P, A, U, or E. The agent must pass the strict readiness gate via scripts/check-dev-server.sh before proceeding. This avoids blocking the dashboard while waiting for a server that may not be needed (e.g., for documentation-only updates).

Correct

Load state → Show dashboard → User selects workflow → Check/start dev server → Execute workflow

Avoid

Load state → Check/start dev server → Show dashboard (blocks user while server starts)

Token Budget Management

Builder startup enforces token-light file reads to preserve context window capacity for actual work.

Why This Matters

AI agents have ~128K token context limits. Careless file reads during startup can consume 15,000+ tokens from a single file — for example, prd-registry.json at 50KB+. This leaves insufficient capacity for implementation, debugging, and iterative conversation.

Token-Light Read Rules

| File Type | Strategy |
|---|---|
| JSON files >10KB | Use jq to extract only needed fields |
| Text files >50 lines | Use offset/limit reads — specific sections only |
| Log files | Never read in full — use tail or grep |
| Source code | Read specific files, not entire directories |

Example: Reading prd-registry.json

Wasteful (~50KB)

cat docs/prd-registry.json

Token-Efficient (~2KB)

jq '[.prds[] | {id, name, status}]' \
  docs/prd-registry.json

Skills Loading Strategy

Skills are loaded on-demand, never eagerly at session start. Large skills are deferred until actually needed:

| Skill | Size / Tokens | Load When |
|---|---|---|
| test-flow | ~698 lines (unified) | Any story execution — unified orchestrator (skip gate → activity resolution → quality pipeline → completion) |
| adhoc-workflow | 61KB / ~15K tokens | Entering ad-hoc mode |
| prd-workflow | 34KB / ~9K tokens | Selecting a PRD |
| builder-state | 23KB / ~6K tokens | Referenced inline; the full skill is rarely needed |
| builder-dashboard | 5KB / ~1.3K tokens | Rendering status dashboards |
| builder-error-recovery | 4KB / ~1K tokens | Error recovery escalation |
| builder-verification | 14KB / ~3.5K tokens | Verification loops and quality gates |
| session-setup | 3KB / ~750 tokens | Session startup sequence |

Note: The test-flow skill was consolidated from 7 sub-skills into a unified orchestrator (~698 lines). It absorbed test-activity-resolution and test-quality-checks directly, while continuing to load 5 Tier 2 sub-skills on demand (test-verification-loop, test-prerequisite-detection, ui-test-flow, test-ui-verification, test-failure-handling). The unified orchestrator owns the full quality cycle: skip gate → activity resolution → quality check pipeline → completion prompt.

Builder Extraction (Phase 2): Following the same pattern, builder.md had three large sections extracted into dedicated skills: builder-dashboard, builder-error-recovery, and builder-verification. This reduced the base agent file by ~30% while making these capabilities available on-demand. The new session-setup skill handles session initialization with the always-on coordination model.

Key Rule: Startup file reads should total <10KB. Skills are loaded on-demand, never eagerly at session start. This preserves context capacity for the actual implementation work.

Verification Pipeline Resolution

Before committing any code change, Builder must resolve and execute the correct verification pipeline. This ensures desktop apps are rebuilt and relaunched, while web apps rely on HMR — all verified using the appropriate Playwright variant.

Automatic UI Project Detection

No configuration is required for Playwright verification. Projects are automatically detected as UI projects when any of:

  • postChangeWorkflow.steps[] contains a step with "playwright" in its name or command
  • apps.*.testing.framework contains "playwright"
  • apps.*.type is "frontend" or "desktop"

The legacy agents.verification.mode: "playwright-required" is still respected but no longer required. Use "no-ui" to explicitly opt out.
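
The detection rules compose into a single predicate. A sketch, assuming apps is a keyed object as the apps.*.type notation suggests — the helper name is hypothetical:

```python
def is_ui_project(project: dict) -> bool:
    """Auto-detect UI projects per the three rules above."""
    # Explicit opt-out via the legacy verification mode.
    mode = project.get("agents", {}).get("verification", {}).get("mode")
    if mode == "no-ui":
        return False
    # Rule 1: any postChangeWorkflow step mentioning playwright.
    for step in project.get("postChangeWorkflow", {}).get("steps", []):
        if "playwright" in (step.get("name", "") + step.get("command", "")).lower():
            return True
    # Rules 2-3: per-app testing framework, or frontend/desktop app type.
    for app in project.get("apps", {}).values():
        if "playwright" in app.get("testing", {}).get("framework", "").lower():
            return True
        if app.get("type") in ("frontend", "desktop"):
            return True
    return False

print(is_ui_project({"apps": {"desktop": {"type": "desktop"}}}))  # True
print(is_ui_project({"apps": {"api": {"type": "backend"}}}))      # False
```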

1. Check for postChangeWorkflow Override

If project.json defines a postChangeWorkflow, its steps array is executed in order. If any required: true step fails, the commit is blocked. Auto-inference (Step 2) is skipped entirely.

2. Auto-Infer from apps[] Configuration

When no override exists, Builder reads apps[] from project.json and selects the pipeline based on app type, framework, and web content strategy.

| App Type | Framework | webContent | Pipeline |
|---|---|---|---|
| desktop | electron | bundled | typecheck → test → build → relaunch Electron → verify-with-playwright-electron |
| desktop | electron | remote | typecheck → test → ensure Electron running → verify-with-playwright-electron |
| desktop | electron | hybrid | typecheck → test → build → relaunch Electron → verify-with-playwright-electron |
| web | any | n/a | typecheck → test → verify-with-playwright (dev server + HMR) |
| mobile | react-native | n/a | typecheck → test → (no automated UI verify yet) |
| No apps[] | — | — | Fall back to existing quality checks (typecheck/lint/test) |

Critical: Desktop apps always use playwright-electron, never browser-based verification. Even webContent: "remote" requires connecting Playwright to the Electron process, not opening localhost in a browser.

3. Execute the Resolved Pipeline

Run each step in order. If any required step fails, block the commit, fix the issue, and re-run from the failed step.

4. Skip Conditions

The verification pipeline can be skipped when all changed files match one of these patterns:

  • Docs-only changes (*.md files only)
  • Config-only changes (project.json or similar)
  • Test-only changes (*.test.ts, *.spec.ts, __tests__/)
  • CI/build config changes (.github/, Dockerfile)
  • Lockfile-only changes (pnpm-lock.yaml, package-lock.json)
  • User explicitly says "skip verification"

5. Story-Scoped Playwright (PRD mode)

In PRD mode, Playwright runs are story-scoped: only tests covering changed files and their 1-hop import consumers are executed. The full suite is never auto-run per-story.

Scoping Algorithm

  1. Find direct test files for each changed source file (path overlap, route overlap, explicit mapping)
  2. Follow the import graph 1 hop to find direct consumers of each changed file
  3. Find test files for those consumers too
  4. Union all scoped tests — if empty, fall back to the full Playwright command from postChangeWorkflow

Example: 1-hop dependency scoping

Story changes: calculateLineItem() in src/invoices/line-items.ts

Direct tests:
  → e2e/invoices/line-items.spec.ts

1-hop consumers (files that import line-items.ts):
  → src/invoices/invoice-total.ts
  → src/components/InvoiceEditor.tsx

Consumer tests:
  → e2e/invoices/totals.spec.ts
  → e2e/desktop/invoice-editor.spec.ts

Scoped run: 3 test files

Full suite is user-initiated only. Builder does not auto-run the full Playwright suite. Users can request it at any time.
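
The scoping algorithm above can be sketched over a prebuilt import graph — building that graph and mapping tests to files are assumed inputs here, not specified by this page:

```python
def scope_tests(changed: set[str], imports: dict[str, set[str]],
                tests_for: dict[str, set[str]]) -> set[str]:
    """changed: modified source files.
    imports: file -> files it imports (used to find 1-hop consumers).
    tests_for: source file -> its test files."""
    # Step 1: direct tests for each changed file.
    scoped = set().union(*(tests_for.get(f, set()) for f in changed), set())
    # Step 2: 1-hop consumers = files that import a changed file.
    consumers = {f for f, deps in imports.items() if deps & changed}
    # Step 3: tests for those consumers too.
    scoped |= set().union(*(tests_for.get(f, set()) for f in consumers), set())
    return scoped  # Step 4: caller falls back to the full suite if empty

# The worked example from this section:
imports = {
    "src/invoices/invoice-total.ts": {"src/invoices/line-items.ts"},
    "src/components/InvoiceEditor.tsx": {"src/invoices/line-items.ts"},
}
tests_for = {
    "src/invoices/line-items.ts": {"e2e/invoices/line-items.spec.ts"},
    "src/invoices/invoice-total.ts": {"e2e/invoices/totals.spec.ts"},
    "src/components/InvoiceEditor.tsx": {"e2e/desktop/invoice-editor.spec.ts"},
}
print(len(scope_tests({"src/invoices/line-items.ts"}, imports, tests_for)))  # 3
```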

6. Playwright Retry Strategy (PRD mode)

In PRD mode, Playwright failures use a 5-attempt retry with fix attempts between each retry. After 5 failures, the Playwright step is skipped and logged — Builder continues to the next story.

| Attempt | Context Given to @developer |
|---|---|
| 1 | Error message + test file |
| 2 | Previous error + what attempt 1 tried |
| 3 | Full history + stack trace + DOM snapshot |
| 4 | All prior context + alternative approach suggestion |
| 5 | All prior context + "last attempt before skip" flag |

Differs from ad-hoc mode: The general verification loop uses 3 attempts then stops and asks the user. In PRD per-story mode, the goal is momentum — log the failure, continue to the next story. Skips are recorded in builder-state.json → activeWork.stories[].playwrightSkips.
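
A sketch of the escalating retry loop, with context fields mirroring the table above — all names are hypothetical:

```python
MAX_ATTEMPTS = 5

def build_context(attempt: int, history: list[str]) -> dict:
    """Assemble the context handed to @developer for each fix attempt."""
    ctx = {"error": history[-1], "attempt": attempt}
    if attempt >= 2:
        ctx["prior_attempts"] = history[:-1]
    if attempt >= 3:
        ctx["include_stack_trace"] = True
        ctx["include_dom_snapshot"] = True
    if attempt >= 4:
        ctx["suggest_alternative_approach"] = True
    if attempt == MAX_ATTEMPTS:
        ctx["last_attempt_before_skip"] = True
    return ctx

def run_with_retries(run_test, fix) -> str:
    """run_test() returns an error string or None on pass."""
    history = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        error = run_test()
        if error is None:
            return "passed"
        history.append(error)
        fix(build_context(attempt, history))  # delegate a fix to @developer
    return "skipped"  # logged in builder-state.json; continue to next story

# A test that never passes is skipped after 5 attempts:
print(run_with_retries(lambda: "timeout", lambda ctx: None))  # skipped
```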

7. Ops-Only Task Verification (ad-hoc mode)

When taskType is ops-with-runtime-impact, the standard pipeline (typecheck → build → test) is skipped because no source files changed. Instead, Builder runs Playwright verification against the affected runtime behavior directly after ops commands complete.

This closes the gap where ops-only fixes to browser-visible issues (CORS errors, auth failures, missing deployments) were declared "done" without browser-level verification.

| Task Type | When | Verification Pipeline |
|---|---|---|
| source-change | Modifying source files (.ts, .tsx, .go, etc.) | Standard: typecheck → test → build → Playwright |
| ops-with-runtime-impact | CLI/ops commands only AND the original issue is browser-visible (CORS, auth, API errors) | Reduced: skip typecheck/build, run Playwright against the affected behavior |
| ops-only | CLI/ops commands only AND no browser-visible impact (CI config, log rotation) | None: mark complete after ops commands succeed |

⛔ ops-only Guard

If the implementation modifies any source file, the task is NOT ops-only. Source files (.ts, .tsx, .go, .py, etc.) always require the standard verification pipeline — even if the change is to "infrastructure" code like IPC handlers, main process files, build config, or native API wrappers.

Common misclassification attempts:

  • "This is an IPC handler change, it's infrastructure" → It's a source file → source-change
  • "This is a main process change, not web content" → It's a source file → source-change
  • "This is a build/config change" → If it modifies a source file → source-change

✅ Valid ops-only examples: supabase secrets set, vercel env add, gh secret set, restarting a service

Post-Ops Verification Flow

After ops commands complete, Builder reads taskType from builder-state.json and branches:

  1. Read verificationTarget from state (what behavior to verify)
  2. Write or identify an existing Playwright test for the target behavior
  3. Execute Playwright — retry up to 3 times with fix attempts on failure
  4. If the test passes → mark verified and proceed to completion

| Ops Task | Test Target | Example Assertion |
|---|---|---|
| Deploy edge functions | Function endpoint responds | expect(response.status()).toBe(200) |
| Set secrets / env vars | Feature that uses the secret | Navigate to feature → verify no error |
| Deploy infrastructure | Dependent app behavior | Feature using infra is functional |
| Database migration (remote) | App reads migrated data | Navigate to page → verify data renders |

Browser verification required: The Playwright test must verify the user-visible behavior that was broken, not just that the ops command succeeded. A curl 200 is NOT sufficient when the original issue was browser-visible.
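
The branch on taskType can be sketched as follows — pipeline step names come from the table above, while the helper itself is hypothetical:

```python
def verification_plan(task_type: str) -> list[str]:
    """Map taskType (read from builder-state.json) to pipeline steps."""
    if task_type == "source-change":
        return ["typecheck", "test", "build", "playwright"]
    if task_type == "ops-with-runtime-impact":
        # No source files changed, but the original issue is browser-visible:
        # skip typecheck/build and verify the runtime behavior directly.
        return ["playwright"]
    if task_type == "ops-only":
        return []  # complete once the ops commands succeed
    raise ValueError(f"unknown taskType: {task_type}")

print(verification_plan("ops-with-runtime-impact"))  # ['playwright']
```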

Todo Synchronization

All three primary agents maintain a synchronization contract between OpenCode's right-panel todos and persistent state files. This enables session resumability across restarts.

Synchronization Contract

  1. Restore on startup — each agent checks for its state file and replays all stored todos to the OpenCode right panel using todowrite.
  2. Sync during work — when creating, updating, or completing todos in the UI, immediately persist the change to the agent's state file with status and priority.
  3. Clean up on completion — when a workflow completes (PRD shipped, ad-hoc finished, update applied), clear the relevant todos from both the UI and persistent state.

Agent State Files

Each primary agent persists todos to its own state file:

| Agent | State File |
|---|---|
| @builder | <project>/docs/builder-state.json |
| @planner | <project>/docs/planner-state.json |
| @toolkit | ~/.config/opencode/.tmp/toolkit-state.json |

Builder Todo Flows

PRD Workflow (P)

  • One todo per user story
  • Reference ID: Story ID (US-###)

Ad-hoc Workflow (A)

  • One todo per decomposed task
  • Reference ID: adhoc-###
  • Pre-analysis screenshot captured before code analysis (Step 0.0a)
  • Task type classified: source-change, ops-with-runtime-impact, or ops-only (Step 0.1a)
  • Playwright analysis probe confirms DOM state before implementation (Step 0.1b)
  • Design decision detection surfaces implicit choices before coding (Step 0.1c)
  • Playwright analysis validation confirms visual alignment (Step 0.1d)
  • Task Spec generated from analysis — used as @developer delegation contract
  • [F] Flow chart option shows implementation plan with per-story pipeline steps
  • Mandatory verification plan in every ANALYSIS COMPLETE dashboard

Pending Updates (U)

  • One todo per update file
  • Reference ID: Update filename

Deferred E2E (E)

  • One todo per queued E2E test
  • Reference ID: E2E file path

Playwright Analysis Probe (Step 0.1b)

Before showing the ANALYSIS COMPLETE dashboard, Builder runs a lightweight Playwright probe against the live app to confirm code analysis conclusions. This catches discrepancies between what the code says and what actually renders — such as elements hidden by CSS, gated by feature flags, or rendered differently at runtime.

6 Assertion Types

visible, absent, enabled, disabled, text-contains, exists

Probe Outcomes

Confirmed → proceed with badge · Contradicted → re-analyze with lower confidence

🔌 Probe Transport Detection (Step 2)

Before building the probe spec, Builder reads project.json → architecture.deployment to select the correct transport:

| Deployment Type | Transport | Spec Format |
|---|---|---|
| web, web-*, serverless | browser | Browser Probe Spec |
| electron-only, desktop, tauri | electron | Electron Probe Spec |
| hybrid | both | One probe per transport |

⛔ Desktop apps have NO browser-accessible web server

Never use baseUrl: "http://localhost:{devPort}" for desktop/Electron apps. Using browser transport against localhost will probe the wrong thing (or nothing at all). If architecture.deployment is electron-only, you must use transport: electron with paths from apps.desktop.testing.

Browser Probe Spec

transport: browser
baseUrl: "http://localhost:{devPort}"
assertions:
  - page: "/checkout"
    checks:
      - selector: "button[type='submit']"
        expect: "visible"

Electron Probe Spec

transport: electron
launchTarget: # from project.json
executablePath: # per-platform
zombieCleanup: true
assertions:
  - window: "main"
    checks:
      - selector: "button[type='submit']"
        expect: "visible"
electronChecks:
  - type: "ipc-response"
    channel: # relevant IPC channel

⛔ No-Bypass Rule

The probe cannot be skipped through rationalization. Common invalid excuses:

| Invalid Rationalization | Why It's Wrong |
|---|---|
| "This is an Electron/desktop app" | Electron apps have web content — probe the web UI |
| "The analysis is clear from code" | Code analysis misses runtime state, CSS, route guards |
| "This is a UX flow restructuring" | UX changes affect visible elements — probe them |
| "The user described it clearly" | User descriptions are input, not verification |
| "I already took a screenshot" | Screenshots show current state; probes verify specific assertions |
| "This is a backend/config change" | If the change has any runtime UI impact, probe the affected pages |
| "This change cannot be verified via Playwright" | Every source code change has observable effects — re-analyze what the change affects in the rendered UI |
| "This is a main process / IPC / native API change" | Main process changes affect what the renderer shows — IPC handlers serve data to web content |
| "Code analysis is definitive" | Code analysis is input to the probe, not a replacement — runtime behavior diverges from source |
| "The critical path cannot be verified in a browser" | If the app has any web content, there are browser-observable effects of every code change |

The ONLY way a probe can be skipped is if the user explicitly accepts a skip after Builder has exhausted all options and asked for assistance. This sets probeStatus: "user-skipped" — Builder cannot set this status autonomously.

⚠️ Page Targeting Rule

Probes must target the actual pages being modified — not just whatever public pages are accessible. If the analysis identifies changes to /dashboard and /settings, assertions must be generated for those pages — not only /login because it's public.

If target pages require authentication, Builder follows the autonomous auth resolution protocol to authenticate before probing. A probe that only checked public pages when the actual changes target authenticated pages is not "partially confirmed" — it's not probed at all.

🔐 Auth Resolution Escalation

When probing authenticated pages, Builder resolves authentication autonomously — it never asks the user for credentials. The escalation ladder:

  1. Check project.json → authentication for existing config
  2. If configured — load the matching auth skill (Supabase OTP, NextAuth credentials, headless, etc.)
  3. If not configured — load the setup-auth skill to auto-detect and configure
  4. Only if all approaches fail — degrade to public-page-only probing with degraded-no-auth status

Mandatory Verification Plan

Every ANALYSIS COMPLETE dashboard includes a 🔧 VERIFICATION PLAN section making explicit what verification will run and why. This prevents Builder from "forgetting" verification after ops commands complete.

  • Source Changes — whether any source files will be modified
  • Pipeline — the exact verification steps that will execute
  • Playwright Scope — what behavior will be browser-verified

Dashboard Layout

The ANALYSIS COMPLETE dashboard organizes its recommendations into distinct sections:

  • RECOMMENDED APPROACH — always shown as its own section, never listed inside alternatives
  • 🔀 ALTERNATIVES — non-recommended options only; collapsed if no alternatives exist
  • 🔧 VERIFICATION PLAN — mandatory; shows task type, source changes, pipeline, and Playwright scope
  • ⚙️ IMPLEMENTATION DECISIONS — shown when Step 0.1c detected and resolved design decisions; lists each decision with the user's choice

Planner Todo Flows

Draft Refinement (D)

  • One todo per refinement task/question batch
  • Reference ID: draft-<slug>-task-###

New PRD (N)

  • One todo per creation step
  • Reference ID: new-prd-<slug>-step-###

Move to Ready (R)

  • One todo per PRD moved
  • Reference ID: prd-<slug>

Planning Updates (U)

  • One todo per planning update file
  • Reference ID: Update filename

Toolkit Todo Flows

Pending Updates

  • One todo per pending update file
  • Ref: pending-update filename

Direct Requests

  • One todo per user request task
  • Ref: toolkit-task-###

Post-Change Workflow

  • One todo per mandatory step
  • Ref: postchange-step-###

uiTodos Schema (builder-state.json)

The uiTodos object in builder-state.json stores the current todo state.

docs/builder-state.json
{
  "lastActivity": "2026-02-21T15:30:00Z",
  "currentPrd": "feature-auth",
  "currentStory": "US-003",
  "uiTodos": {
    "items": [
      {
        "id": "todo-1",
        "content": "Implement login form validation",
        "status": "in_progress",
        "priority": "high",
        "createdAt": "2026-02-21T14:00:00Z"
      },
      {
        "id": "todo-2",
        "content": "Add password reset flow",
        "status": "pending",
        "priority": "medium",
        "createdAt": "2026-02-21T14:05:00Z"
      }
    ],
    "syncedAt": "2026-02-21T15:30:00Z"
  }
}

Why persist todos? Sessions can be interrupted at any time. By persisting todo state, each primary agent can restore exactly where work left off — including in-progress items the user was tracking.
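
Restoring todos on startup (step 1 of the synchronization contract) might look like this sketch, where todowrite stands in for OpenCode's todo tool and its call shape is an assumption:

```python
import json
import tempfile
from pathlib import Path

def restore_todos(state_path: Path, todowrite) -> int:
    """Replay persisted todos into the right panel.
    A missing state file means no prior session, not an error."""
    if not state_path.exists():
        return 0
    state = json.loads(state_path.read_text())
    items = state.get("uiTodos", {}).get("items", [])
    for item in items:
        todowrite(id=item["id"], content=item["content"],
                  status=item["status"], priority=item["priority"])
    return len(items)

# Usage with the schema shown above:
tmp = Path(tempfile.mkdtemp()) / "builder-state.json"
tmp.write_text(json.dumps({"uiTodos": {"items": [
    {"id": "todo-1", "content": "Implement login form validation",
     "status": "in_progress", "priority": "high"}]}}))
restored = []
print(restore_todos(tmp, lambda **todo: restored.append(todo)))  # 1
```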

Story Processing Pipeline

Every story — whether from a PRD or an ad-hoc request — passes through the same mandatory 6-step pipeline. No agent may skip steps or reorder them. The adhoc-workflow and prd-workflow skills reference this pipeline — they do not define their own.

for each story in activeWork.stories where status == "pending":
    run Pipeline Steps 1–6

1. Set story status → in_progress

Update activeWork.stories[currentStoryIndex].status to "in_progress" in builder-state.json.

2. Delegate implementation → @developer

Delegate the story to @developer with full story context (story ID, description, acceptance criteria, project context block). If @developer returns an error, the story is marked failed and the pipeline stops.

3. Run test-flow → unconditional call

Load and execute test-flow unconditionally. test-flow owns the full quality cycle including skip-gate evaluation, activity resolution, quality checks (typecheck / lint / test / rebuild / critic / Playwright), fix loop (redelegation to @developer, re-check, retry), and the completion prompt. This is not a single pass — it includes the entire fix/critic/redelegation loop until pass or exhaustion.

4. Auto-commit → mandatory after test-flow passes

Auto-commit is unconditional and mandatory — it always commits after each story completes, regardless of any git.autoCommit setting. Per-story commits are required for resumability and audit trail.

git add -A
git commit -m "feat: [story description] ([story-id])"

5. Update story status → completed

Update the story with status: "completed", committedAt timestamp, commitHash, and testFlowResult.

6. Advance to next story

Increment activeWork.currentStoryIndex. If more pending stories exist, the loop continues from Step 1.

Failure Handling

| Failure Point | Story Status | Pipeline Action |
|---|---|---|
| @developer returns error (Step 2) | failed | STOP — report to user |
| test-flow exhausts retries (Step 3) | failed | STOP — report to user |

Session Resume

When Builder starts, it checks builder-state.json → activeWork. If any story has a non-terminal status, a Resume Dashboard is shown instead of the normal startup.

Old-Format Field Detection

If builder-state.json contains legacy fields (activePrd, activeTask, adhocQueue) without an activeWork field, they are cleared entirely and the session starts fresh. No backward-compatibility migration is performed.

Resume Dashboard

Resume Dashboard
Mode:   prd (feature-auth)
Branch: feature/auth

Stories:
  ✅ US-001  Create user model          completed
  ✅ US-002  Add validation              completed
  ❌ US-003  Implement auth flow         failed
  ⏳ US-004  Add error handling          pending
  ⏳ US-005  Write integration tests     pending

Progress: 2/5 completed | 1 failed | 2 remaining

[R] Resume from next pending story
[A] Abort — mark remaining as cancelled
[S] Start fresh — archive and begin new session

Status icons: ✅ completed · ❌ failed · 🔄 in_progress · ⏸ skipped · ⏳ pending · 🚫 cancelled

Failed Story Handling

If any stories have status: "failed", they are listed individually before the main resume options. The user must explicitly choose for each failed story — no automatic retry.

❌ US-003: Implement auth flow
   Error: test-flow failed — 2 unit tests failing
   Files: src/auth/flow.ts, src/auth/middleware.ts

   [R] Retry — reset to pending and re-run full pipeline
   [S] Skip — mark as skipped, move on
   [A] Abort — stop all work, cancel remaining stories

Choice           Behavior
[R] Resume       Continue from first pending story. Use existing activeWork — do not re-analyze.
[A] Abort        Set all pending stories to cancelled. Keep completed and skipped as-is. Clear activeWork.
[S] Start fresh  Archive current activeWork, then clear it. Start a new session from the main dashboard.
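The three choices can be sketched as one dispatch over activeWork. The ActiveWork shape and the archive callback are assumptions for illustration, not the real builder-state schema.

```typescript
// Sketch of the [R]/[A]/[S] semantics; shapes are assumed.
type Status =
  | "pending" | "in_progress" | "completed"
  | "failed" | "skipped" | "cancelled";

interface ActiveWork {
  stories: { id: string; status: Status }[];
  currentStoryIndex: number;
}

// Returns the activeWork to continue with, or null when it was cleared.
function applyResumeChoice(
  choice: "R" | "A" | "S",
  work: ActiveWork,
  archive: (w: ActiveWork) => void,
): ActiveWork | null {
  switch (choice) {
    case "R": {
      // Resume: jump to the first pending story, reusing activeWork as-is.
      const next = work.stories.findIndex((s) => s.status === "pending");
      return next === -1 ? null : { ...work, currentStoryIndex: next };
    }
    case "A":
      // Abort: pending becomes cancelled; completed/skipped stay untouched.
      for (const s of work.stories) {
        if (s.status === "pending") s.status = "cancelled";
      }
      return null; // activeWork cleared
    case "S":
      // Start fresh: archive the whole session, then clear activeWork.
      archive(work);
      return null;
  }
}
```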

Design Decision Detection

Step 0.1c in the ad-hoc workflow surfaces implicit design and implementation decisions that the user should weigh in on before Builder proceeds. These are decisions about how to build it well — not clarifications about what to build.

When to Skip (No Questions)

Decision detection is skipped entirely when the request is clearly trivial:

Skip Criterion                       Examples
Bug fix with clear root cause        "Fix the 404 on /settings"
Typo / copy correction               "Change 'Submitt' to 'Submit'"
Version bump / dependency update     "Update React to 18.3"
Config-only change                   "Change the timeout to 30s"
Ops-only task                        "Deploy the edge functions"
Single-file, single-behavior change  "Make the header sticky"

When to Detect Decisions

Run decision detection when the request involves:

  • Multiple reasonable implementation variants — different experienced developers would reasonably choose different approaches
  • UX behavior choices — navigation, state persistence, validation timing, progressive disclosure
  • Data lifecycle decisions — soft vs hard delete, sync vs async, cache invalidation, retry policy
  • Component composition — modal vs page, wizard vs form, inline vs overlay, tabs vs accordion
  • Error handling strategy — toast vs inline, retry vs fail, graceful degradation approach

Questions UI

When decisions are detected, Builder presents them as lettered multiple-choice questions:

1. Should wizard state persist so users can leave and resume?
   A. Yes — save progress to localStorage/DB
   B. No — reset on page leave (simpler)

2. When should validation run?
   A. Per-step — each step validates before allowing Next
   B. Final step — validate everything at submission

Reply with codes (e.g., "1A, 2B") or describe your preference.
Type "you decide" to let me choose based on best practices.

  • Maximum 5 questions per request (highest-impact first)
  • Each question has 2–4 concrete options with brief explanations
  • Single round only — no follow-up questions after user answers
  • Decisions the user already specified in their request are omitted entirely
  • Supports planning.considerations from project.json — relevant consideration questions are included (up to the 5-question max)
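Parsing the reply line above ("1A, 2B", or "you decide") might look like the following; the function name and return shape are hypothetical.

```typescript
// Hypothetical parser for decision replies; option letters run A-D
// since each question has 2-4 options.
function parseAnswers(
  reply: string,
  questionCount: number,
): Map<number, string> | "delegate" {
  // "you decide" hands all decisions back to Builder's best judgment.
  if (/\byou decide\b/i.test(reply)) return "delegate";

  const answers = new Map<number, string>();
  for (const match of reply.matchAll(/(\d+)\s*([A-D])\b/gi)) {
    const question = Number(match[1]);
    if (question >= 1 && question <= questionCount) {
      answers.set(question, match[2].toUpperCase());
    }
  }
  return answers;
}
```

Out-of-range question numbers are silently dropped here; a real implementation would likely re-prompt instead.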

Playwright Analysis Validation (Step 0.1d)

After design decisions are resolved, Builder runs a second Playwright pass to visually confirm that the complete analysis — including any adjustments from decision resolution — aligns with what's actually rendered. This applies to every request — all projects get full Playwright verification.

Step 0.1b (Probe)

Confirms analysis findings — element existence, absence, state. Tests specific assertions.

Step 0.1d (Validation)

Validates overall analysis makes sense visually — right page, right components, right context.

Validation Result                  Action
Analysis aligns with visual state  Proceed — record visualValidation: "confirmed"
Minor discrepancies                Adjust analysis, note discrepancies in dashboard, proceed
Major contradiction                Re-analyze from updated visual context, lower confidence

Flow Chart Option

When the ANALYSIS COMPLETE dashboard is shown, the [F] option generates an ASCII flow chart showing the full implementation plan adapted to the specific stories from analysis.

Implementation Flow Chart
  4 stories │ Story Processing Pipeline (per story)
  ──────────┤
            │
  ┌─────────────────────────────────────────────────┐
  │ TSK-001: Add loading state to SubmitButton       │
  │   implement → test-flow → auto-commit            │
  └──────────────────────┬──────────────────────────┘
                         │
  ┌─────────────────────────────────────────────────┐
  │ TSK-002: Show Spinner when loading               │
  │   implement → test-flow → auto-commit            │
  └──────────────────────┬──────────────────────────┘
                         │
  ┌─────────────────────────────────────────────────┐
  │ TSK-003: Disable button during submission        │
  │   implement → test-flow → auto-commit            │
  └──────────────────────┬──────────────────────────┘
                         │
  ┌─────────────────────────────────────────────────┐
  │ TSK-004: Add unit tests                          │
  │   implement → test-flow → auto-commit            │
  └─────────────────────────────────────────────────┘

  Pipeline per story:
    1. Set status → in_progress
    2. Delegate to @developer
    3. Run test-flow (typecheck → lint → test → Playwright → fix loop)
    4. Auto-commit (mandatory, unconditional)
    5. Update status → completed
    6. Advance to next story

Scenario                   Flow Chart Behavior
Single story               One box, no connecting lines
Multi-story (no deps)      Vertical sequence with connectors
Stories with dependencies  Show dependency arrows
PRD mode                   Use US-XXX prefixes instead of TSK-XXX
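A rough sketch of how the per-story boxes and connectors could be rendered. The widths and connector placement are simplified relative to the example above, and the function itself is hypothetical.

```typescript
// Hypothetical renderer for the vertical single-column case
// (no dependency arrows); widths are fixed for simplicity.
function renderFlowChart(stories: { id: string; title: string }[]): string {
  const width = 49;
  const line = "─".repeat(width);
  const pad = (s: string) => `│ ${s.padEnd(width - 1)}│`;
  return stories
    .map((s, i) => {
      const rows = [
        `┌${line}┐`,
        pad(`${s.id}: ${s.title}`),
        pad(`  implement → test-flow → auto-commit`),
        // Drop a connector from the bottom edge except after the last box.
        i < stories.length - 1
          ? `└${"─".repeat(22)}┬${"─".repeat(width - 23)}┘\n${" ".repeat(23)}│`
          : `└${line}┘`,
      ];
      return rows.join("\n");
    })
    .join("\n");
}
```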

Escalation to PRD

Simple fixes stay simple. Complex changes become proper PRDs with planning and decomposition.

When to escalate:

  • Update requires multiple coordinated changes across the toolkit
  • Update introduces new concepts that need documentation and testing
  • Update affects multiple agents or skills in complex ways
  • User requests formal planning for a queued update

# Example: "Add Rust support" is too complex for a simple update

@planner create a PRD from the pending Rust support request

The original pending update is deleted (superseded by PRD), and @builder implements through the normal PRD workflow.

Governance Critics

These specialized critics ensure the agent system maintains consistency and follows established contracts. They're invoked automatically during toolkit changes.

@workflow-enforcement-critic

Verifies mandatory toolkit post-change workflow artifacts and completion reporting

Purpose: Ensures agents follow required workflows after making changes

@handoff-contract-critic

Checks builder/planner/toolkit routing contracts for ownership contradictions and scope drift

Purpose: Prevents agents from stepping outside their defined responsibilities

@update-schema-critic

Validates project-updates file structure, required frontmatter, and required workflow sections

Purpose: Ensures update files are properly formatted for reliable processing

@policy-testability-critic

Flags non-testable MUST/CRITICAL/NEVER rules and suggests enforceable rewrites

Purpose: Keeps policy rules concrete and verifiable rather than aspirational

Real-World Examples

Continuous Toolkit Evolution

You're building a feature when @developer mentions it doesn't have guidance for a pattern you're using. It queues an update. Next week, @toolkit presents the queue — 5 improvements discovered organically. You review and approve them. Your toolkit just got better without any dedicated "maintenance time."

Schema Migrations at Scale

@toolkit updates the project.json schema to add a new required field. It automatically queues migration instructions for all 12 of your projects. As you work on each project, @builder offers to apply the migration. No project gets left behind; no project is forced to update before you're ready.

Team Knowledge Capture

A team member discovers a gotcha with the database library. They tell @developer to queue a toolkit update. @toolkit adds it to the coding conventions. Now every future implementation — by any team member — benefits from that knowledge. Institutional memory, automated.