Architecture

Agent Workflows & Communication

How agents communicate and coordinate through asynchronous update queues: a message-passing architecture that keeps agents loosely coupled while enabling cross-agent collaboration — all under your control.

Key Principles

The toolkit uses a message-passing architecture that keeps agents independent while enabling continuous improvement.

Self-Improving System

Agents don't just follow instructions — they identify gaps and queue improvements. Your toolkit gets better every time you use it.

You Stay in Control

Every update is queued, not applied. You review, approve, modify, or reject changes on your schedule.

Separation of Concerns

Agents stay in their lane. Implementation agents don't modify the toolkit; toolkit agents don't touch your code.

Asynchronous Collaboration

Queue an idea now, address it later. Keep your flow while building a backlog of improvements.

The Three Primary Agents

These are the entry-point agents you invoke directly. Each has a distinct role and communicates with others through the update queue system.

Model Selection

Primary agents use your active OpenCode model selection by default. When no explicit model is specified in agent frontmatter, the agent inherits your current model choice — whether that's Claude, GPT-4, or another provider. This lets you switch models globally without editing agent definitions.
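For illustration only, a hedged sketch of what minimal agent frontmatter could look like — the field names are assumptions based on common OpenCode conventions, so check your toolkit's actual schema:

```yaml
---
description: Project planning and PRD management
# model: anthropic/claude-sonnet-4   # assumption: omitting `model` makes the
#                                    # agent inherit your active model selection
tools:
  write: true
  edit: true
---
```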

@planner

Project planning and PRD management

  • Bootstrap new projects with stack detection
  • Create and refine PRDs (Product Requirements Documents)
  • Manage the PRD registry and lifecycle states
  • Queue toolkit updates when discovering missing stack support

@builder

Feature implementation orchestrator

  • Build features from PRDs or ad-hoc requests
  • Orchestrate implementation agents (developer, tester, critic)
  • Apply project updates queued by @toolkit
  • Queue toolkit updates when discovering workflow gaps

@toolkit

AI toolkit maintenance

  • Maintain agents, skills, templates, and scaffolds
  • Process toolkit update requests from other agents
  • Queue project updates when schema or patterns change
  • Keep toolkit-structure.json and documentation in sync

Communication Flow

Agents communicate through file-based update queues. This keeps them loosely coupled while enabling cross-agent collaboration.

USER CONTROL — reviews, approves, rejects, or escalates all updates

@planner (Project) · @builder (Feature) · @toolkit (AI)
        │  queue updates
        ▼
pending-updates/          (any agent → @toolkit)
        │  reads & processes
        ▼
@toolkit — reviews & applies, queues project changes
        │
        ▼
project-updates/          (@toolkit → @builder)
        │  applies to projects
        ▼
@builder — applies to projects

Update Queues

File-based message queues enable cross-agent communication without tight coupling.

pending-updates/ (any agent → @toolkit)

Toolkit improvement requests discovered during work

When any agent discovers a gap in the toolkit — a missing pattern, outdated guidance, or needed enhancement — it writes a request file here instead of modifying toolkit files directly.

Workflow

  1. Agent discovers a toolkit gap during normal work
  2. Agent writes a structured request to pending-updates/
  3. Agent notifies the user: "Queued toolkit update for @toolkit"
  4. @toolkit presents pending requests at session start
  5. User reviews each request and approves/rejects
  6. @toolkit applies approved changes, deletes the update file, and commits
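
For illustration, the structured request written in step 2 might look like the following — the filename and field names are hypothetical, patterned after the project-update file format documented later on this page:

```markdown
---
from: "@planner"
date: 2026-02-21
priority: normal
---

# Request: Add Fastify stack detection

## Gap discovered
Bootstrap ran against a Fastify project, but stack detection only
recognizes the stacks currently listed in the toolkit.

## Suggested change
Extend the stack-detection guidance with a Fastify signature
(`fastify` present in package.json dependencies).
```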

project-updates/ (@toolkit → @builder)

Project changes required by toolkit updates

When toolkit changes require updates to existing projects — schema migrations, new required fields, or pattern changes — @toolkit queues instructions here for @builder to apply.

Workflow

  1. @toolkit makes a change that affects project structure
  2. @toolkit writes update instructions for each affected project
  3. @builder presents pending updates when the user selects a project
  4. User chooses to apply, defer, or skip each update
  5. @builder delegates to @developer to apply the changes
  6. @builder deletes the update file and verifies deletion before marking complete

User Control: Users can review requests, ask clarifying questions, modify the scope, or reject updates entirely. Nothing is applied automatically.

Project Update File Format

Each project update file requires YAML frontmatter with a scope field indicating what type of work is required.

project-updates/my-project/2026-02-21-migrate-to-capabilities.md
---
scope: implementation
date: 2026-02-21
priority: normal
breaking: false
---

# Update: Migrate 'features' to 'capabilities'

## What to change
Rename the `features` field to `capabilities` in project.json.

## Steps
1. Open docs/project.json
2. Rename `features` key to `capabilities`
3. Run npm run typecheck to verify

## Verification
- `npm run typecheck` passes
- No references to old field name remain

Required Scope Values

| Scope | Handled By | Description |
|---|---|---|
| implementation | @builder | Code changes only — file edits, config updates, migrations |
| planning | @planner | PRD/documentation changes, architecture decisions, planning metadata in docs/project.json |
| mixed | Both | Requires both planning and implementation work |
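
The scope field drives routing between agents; a minimal Python sketch of that dispatch — the helper names are hypothetical, while the routing table comes from this section:

```python
def parse_frontmatter(text: str) -> dict:
    """Extract simple key: value pairs from a YAML frontmatter block."""
    meta = {}
    if text.startswith("---"):
        body = text.split("---", 2)[1]
        for line in body.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

# Which agent(s) handle each scope, per the table above.
SCOPE_HANDLERS = {
    "implementation": ["@builder"],
    "planning": ["@planner"],
    "mixed": ["@planner", "@builder"],
}

def route_update(file_text: str) -> list[str]:
    scope = parse_frontmatter(file_text).get("scope", "implementation")
    return SCOPE_HANDLERS[scope]

update = """---
scope: planning
date: 2026-02-21
---
# Update: adjust PRD lifecycle metadata
"""
print(route_update(update))  # ['@planner']
```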

Planner Write Allowlist

Planning-scope updates can target docs/project.json for planning metadata changes (e.g., capability flags, PRD lifecycle settings). This file is included in the planner's write allowlist alongside PRD files and the prd-registry. @planner owns all planning metadata updates to docs/project.json.

Update File Lifecycle

Update files must be deleted after successful application to prevent re-processing. Agents must verify deletion before marking the update complete.

  1. Apply changes — execute all steps specified in the update file
  2. Run verification — execute verification commands from the update file
  3. Delete update file — remove the update file from the queue
  4. Verify deletion — confirm the file no longer exists before marking complete

Why verify deletion? If an agent fails to delete the update file (or deletion fails silently), the update will be presented again on the next session, leading to confusion or duplicate work. Verification ensures the lifecycle completes cleanly.
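
The delete-and-verify tail of the lifecycle can be sketched in Python — paths and the helper name are illustrative:

```python
from pathlib import Path
import tempfile

def complete_update(update_file: Path) -> bool:
    """Lifecycle steps 3-4: delete the queue file, then verify it is
    actually gone before reporting the update as complete."""
    update_file.unlink(missing_ok=True)
    if update_file.exists():  # deletion failed silently?
        raise RuntimeError(f"{update_file} still present; not marking complete")
    return True

# Usage: simulate a queued update file in a temp directory.
queue = Path(tempfile.mkdtemp())
f = queue / "2026-02-21-migrate-to-capabilities.md"
f.write_text("---\nscope: implementation\n---\n")
print(complete_update(f))  # True
```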

Integration Provisioning Automation

When projects need integrations (Stripe, Resend, OpenAI, etc.), the agent system uses a lightweight workflow to provision and promote integration-specific skills.

1. Planner Adds Integration Tasks

During PRD creation, @planner identifies integration needs by analyzing user stories and requirements. When an integration is detected (e.g., "accept payments" → Stripe), the planner:

  • Sets capability flags in docs/project.json (e.g., capabilities.payments: true)
  • Adds the integration to the integrations array
  • Adds an integration skill task to the PRD (non-blocking, just part of the work)

2. Builder Creates Missing Skills

During build, @builder checks if integration skills exist and creates missing ones on-demand using meta-skill generators:

  • Checks docs/project.json for required integrations
  • Loads the appropriate meta-skill generator (e.g., stripe-skill-generator)
  • Generates the skill file to docs/skills/<integration>/SKILL.md
  • Records the generated skill in docs/project.json → skills.generated[]

3. Builder Queues Toolkit Promotion

After creating a new integration skill, @builder always queues a toolkit promotion update. @toolkit later reviews and promotes mature patterns:

  • @builder queues promotion update to pending-updates/ for every new skill
  • @toolkit reviews queued updates and extracts reusable patterns
  • Updates meta-skill generators with improved patterns
  • Future projects automatically benefit from the promoted patterns

Available Meta-Skill Generators

These generators create project-specific skills based on detected capabilities and integrations.

| Generator | Triggered By | Generates |
|---|---|---|
| auth-skill-generator | capabilities.authentication | Auth flow patterns, session handling |
| stripe-skill-generator | integrations: ["stripe"] | Payment flows, webhook handling |
| email-skill-generator | capabilities.email | Transactional email templates, delivery |
| crud-skill-generator | Any project with a database | Entity patterns, validation, API routes |
| ai-tools-skill-generator | capabilities.ai | AI tool definitions, chatbot patterns |
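
Given the trigger conditions above, selecting generators from docs/project.json could look like this sketch — the schema fields are taken from this page, while the selection helper itself is hypothetical:

```python
def select_generators(project: dict) -> list[str]:
    """Map capability flags and integrations to meta-skill generators."""
    caps = project.get("capabilities", {})
    gens = []
    if caps.get("authentication"):
        gens.append("auth-skill-generator")
    if "stripe" in project.get("integrations", []):
        gens.append("stripe-skill-generator")
    if caps.get("email"):
        gens.append("email-skill-generator")
    if project.get("database"):  # any project with a database
        gens.append("crud-skill-generator")
    if caps.get("ai"):
        gens.append("ai-tools-skill-generator")
    return gens

project = {
    "capabilities": {"authentication": True, "payments": True},
    "integrations": ["stripe"],
    "database": "postgres",
}
print(select_generators(project))
# ['auth-skill-generator', 'stripe-skill-generator', 'crud-skill-generator']
```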

Lightweight Integration Workflow

Integration skills are created on-demand during build, with no blocking dependencies. Builder automatically queues promotion updates, ensuring patterns flow back to the toolkit for future projects.

Builder Startup Behavior

When @builder starts a session, it performs optional file checks to determine current work state. Missing files are expected and handled gracefully.

Startup Sequence

  1. Check docs directory listing — list docs/ contents to see which files exist before attempting reads.
  2. Read builder-state.json (if present) — only read docs/builder-state.json if it appears in the listing. A missing file is normal — it means no prior session state.
  3. Load PRD and project context — read docs/prd.json and docs/project.json to understand current work and project configuration.
  4. Present workflow dashboard — show the interactive dashboard with workflow options: P (PRD), A (Ad-hoc), U (Updates), E (E2E tests). Wait for user selection before proceeding.
  5. Start dev server (after workflow selection) — check dev server health and start it if needed. This happens after the user selects a workflow, not immediately on session start.
  6. Detect available CLIs (optional) — check which service CLIs are installed and authenticated (vercel, supabase, aws, gh, etc.). Authenticated CLIs are shown in the dashboard and stored for direct use throughout the session.

Why check first? Attempting to read a missing file triggers an error. By listing the directory first, the builder knows which files exist and avoids unnecessary error handling for expected missing state.

Dev Server Check Timing

Dev server checks are deferred until workflow selection — after the user chooses P, A, U, or E. The agent must pass the strict readiness gate via scripts/check-dev-server.sh before proceeding. This avoids blocking the dashboard while waiting for a server that may not be needed (e.g., for documentation-only updates).

Correct

Load state → Show dashboard → User selects workflow → Check/start dev server → Execute workflow

Avoid

Load state → Check/start dev server → Show dashboard (blocks user while server starts)

Token Budget Management

Builder startup enforces token-light file reads to preserve context window capacity for actual work.

Why This Matters

AI agents have ~128K token context limits. Careless file reads during startup can consume 15,000+ tokens from a single file — for example, prd-registry.json at 50KB+. This leaves insufficient capacity for implementation, debugging, and iterative conversation.

Token-Light Read Rules

| File Type | Strategy |
|---|---|
| JSON files >10KB | Use jq to extract only needed fields |
| Text files >50 lines | Use offset/limit reads — specific sections only |
| Log files | Never read in full — use tail or grep |
| Source code | Read specific files, not entire directories |

Example: Reading prd-registry.json

Wasteful (~50KB)

cat docs/prd-registry.json

Token-Efficient (~2KB)

jq '[.prds[] | {id, name, status}]' \
  docs/prd-registry.json

Skills Loading Strategy

Skills are loaded on-demand, never eagerly at session start. Large skills are deferred until actually needed:

| Skill | Size / Tokens | Load When |
|---|---|---|
| test-flow | ~698 lines (unified) | Any story execution — unified orchestrator (skip gate → activity resolution → quality pipeline → completion) |
| adhoc-workflow | 61KB / ~15K tokens | Entering ad-hoc mode |
| prd-workflow | 34KB / ~9K tokens | Selecting a PRD |
| builder-state | 23KB / ~6K tokens | Referenced inline; the full skill is rarely needed |
| builder-dashboard | 5KB / ~1.3K tokens | Rendering status dashboards |
| builder-error-recovery | 4KB / ~1K tokens | Error recovery escalation |
| builder-verification | 14KB / ~3.5K tokens | Verification loops and quality gates |
| session-setup | 3KB / ~750 tokens | Session startup sequence |

Note: The test-flow skill was consolidated from 7 sub-skills into a unified orchestrator (~698 lines). It absorbed test-activity-resolution and test-quality-checks directly, while continuing to load 5 Tier 2 sub-skills on demand (test-verification-loop, test-prerequisite-detection, ui-test-flow, test-ui-verification, test-failure-handling). The unified orchestrator owns the full quality cycle: skip gate → activity resolution → quality check pipeline → completion prompt.

Builder Extraction (Phase 2): Following the same pattern, builder.md had three large sections extracted into dedicated skills: builder-dashboard, builder-error-recovery, and builder-verification. This reduced the base agent file by ~30% while making these capabilities available on-demand. The new session-setup skill handles session initialization with the always-on coordination model.

Key Rule: Startup file reads should total <10KB. Skills are loaded on-demand, never eagerly at session start. This preserves context capacity for the actual implementation work.

Verification Pipeline Resolution

Before committing any code change, Builder must resolve and execute the correct verification pipeline. This ensures desktop apps are rebuilt and relaunched, while web apps rely on HMR — all verified using the appropriate Playwright variant.

Automatic UI Project Detection

No configuration is required for Playwright verification. Projects are automatically detected as UI projects when any of:

  • postChangeWorkflow.steps[] contains a step with "playwright" in its name or command
  • apps.*.testing.framework contains "playwright"
  • apps.*.type is "frontend" or "desktop"

The legacy agents.verification.mode: "playwright-required" is still respected but no longer required. Use "no-ui" to explicitly opt out.
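
The detection rules compose into a single predicate. A sketch, assuming apps is a keyed object as the apps.*.type notation suggests — the helper name is hypothetical:

```python
def is_ui_project(project: dict) -> bool:
    """Auto-detect UI projects per the three rules above."""
    # Explicit opt-out via the legacy verification mode.
    mode = project.get("agents", {}).get("verification", {}).get("mode")
    if mode == "no-ui":
        return False
    # Rule 1: any postChangeWorkflow step mentioning playwright.
    for step in project.get("postChangeWorkflow", {}).get("steps", []):
        if "playwright" in (step.get("name", "") + step.get("command", "")).lower():
            return True
    # Rules 2-3: per-app testing framework, or frontend/desktop app type.
    for app in project.get("apps", {}).values():
        if "playwright" in app.get("testing", {}).get("framework", "").lower():
            return True
        if app.get("type") in ("frontend", "desktop"):
            return True
    return False

print(is_ui_project({"apps": {"desktop": {"type": "desktop"}}}))  # True
print(is_ui_project({"apps": {"api": {"type": "backend"}}}))      # False
```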

1. Check for postChangeWorkflow Override

If project.json defines a postChangeWorkflow, its steps array is executed in order. If any required: true step fails, the commit is blocked. Auto-inference (Step 2) is skipped entirely.

2. Auto-Infer from apps[] Configuration

When no override exists, Builder reads apps[] from project.json and selects the pipeline based on app type, framework, and web content strategy.

| App Type | Framework | webContent | Pipeline |
|---|---|---|---|
| desktop | electron | bundled | typecheck → test → build → relaunch Electron → verify-with-playwright-electron |
| desktop | electron | remote | typecheck → test → ensure Electron running → verify-with-playwright-electron |
| desktop | electron | hybrid | typecheck → test → build → relaunch Electron → verify-with-playwright-electron |
| web | any | n/a | typecheck → test → verify-with-playwright (dev server + HMR) |
| mobile | react-native | n/a | typecheck → test → (no automated UI verify yet) |
| No apps[] | — | — | Fall back to existing quality checks (typecheck/lint/test) |

Critical: Desktop apps always use playwright-electron, never browser-based verification. Even webContent: "remote" requires connecting Playwright to the Electron process, not opening localhost in a browser.

3. Execute the Resolved Pipeline

Run each step in order. If any required step fails, block the commit, fix the issue, and re-run from the failed step.

4. Skip Conditions

The verification pipeline can be skipped when all changed files match one of these patterns:

  • Docs-only changes (*.md files only)
  • Config-only changes (project.json or similar)
  • Test-only changes (*.test.ts, *.spec.ts, __tests__/)
  • CI/build config changes (.github/, Dockerfile)
  • Lockfile-only changes (pnpm-lock.yaml, package-lock.json)
  • User explicitly says "skip verification"

5. Story-Scoped Playwright (PRD mode)

In PRD mode, Playwright runs are story-scoped: only tests covering changed files and their 1-hop import consumers are executed. The full suite is never auto-run per-story.

Scoping Algorithm

  1. Find direct test files for each changed source file (path overlap, route overlap, explicit mapping)
  2. Follow the import graph 1 hop to find direct consumers of each changed file
  3. Find test files for those consumers too
  4. Union all scoped tests — if empty, fall back to the full Playwright command from postChangeWorkflow

Example: 1-hop dependency scoping

Story changes: calculateLineItem() in src/invoices/line-items.ts

Direct tests:
  → e2e/invoices/line-items.spec.ts

1-hop consumers (files that import line-items.ts):
  → src/invoices/invoice-total.ts
  → src/components/InvoiceEditor.tsx

Consumer tests:
  → e2e/invoices/totals.spec.ts
  → e2e/desktop/invoice-editor.spec.ts

Scoped run: 3 test files

Full suite is user-initiated only. Builder does not auto-run the full Playwright suite. Users can request it at any time.
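
The scoping algorithm above can be sketched over a prebuilt import graph — building that graph and mapping tests to files are assumed inputs here, not specified by this page:

```python
def scope_tests(changed: set[str], imports: dict[str, set[str]],
                tests_for: dict[str, set[str]]) -> set[str]:
    """changed: modified source files.
    imports: file -> files it imports (used to find 1-hop consumers).
    tests_for: source file -> its test files."""
    # Step 1: direct tests for each changed file.
    scoped = set().union(*(tests_for.get(f, set()) for f in changed), set())
    # Step 2: 1-hop consumers = files that import a changed file.
    consumers = {f for f, deps in imports.items() if deps & changed}
    # Step 3: tests for those consumers too.
    scoped |= set().union(*(tests_for.get(f, set()) for f in consumers), set())
    return scoped  # Step 4: caller falls back to the full suite if empty

# The worked example from this section:
imports = {
    "src/invoices/invoice-total.ts": {"src/invoices/line-items.ts"},
    "src/components/InvoiceEditor.tsx": {"src/invoices/line-items.ts"},
}
tests_for = {
    "src/invoices/line-items.ts": {"e2e/invoices/line-items.spec.ts"},
    "src/invoices/invoice-total.ts": {"e2e/invoices/totals.spec.ts"},
    "src/components/InvoiceEditor.tsx": {"e2e/desktop/invoice-editor.spec.ts"},
}
print(len(scope_tests({"src/invoices/line-items.ts"}, imports, tests_for)))  # 3
```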

6. Playwright Retry Strategy (PRD mode)

In PRD mode, Playwright failures use a 5-attempt retry with fix attempts between each retry. After 5 failures, the Playwright step is skipped and logged — Builder continues to the next story.

| Attempt | Context Given to @developer |
|---|---|
| 1 | Error message + test file |
| 2 | Previous error + what attempt 1 tried |
| 3 | Full history + stack trace + DOM snapshot |
| 4 | All prior context + alternative approach suggestion |
| 5 | All prior context + "last attempt before skip" flag |

Differs from ad-hoc mode: The general verification loop uses 3 attempts then stops and asks the user. In PRD per-story mode, the goal is momentum — log the failure, continue to the next story. Skips are recorded in builder-state.json → activeWork.stories[].playwrightSkips.
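
A sketch of the escalating retry loop, with context fields mirroring the table above — all names are hypothetical:

```python
MAX_ATTEMPTS = 5

def build_context(attempt: int, history: list[str]) -> dict:
    """Assemble the context handed to @developer for each fix attempt."""
    ctx = {"error": history[-1], "attempt": attempt}
    if attempt >= 2:
        ctx["prior_attempts"] = history[:-1]
    if attempt >= 3:
        ctx["include_stack_trace"] = True
        ctx["include_dom_snapshot"] = True
    if attempt >= 4:
        ctx["suggest_alternative_approach"] = True
    if attempt == MAX_ATTEMPTS:
        ctx["last_attempt_before_skip"] = True
    return ctx

def run_with_retries(run_test, fix) -> str:
    """run_test() returns an error string or None on pass."""
    history = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        error = run_test()
        if error is None:
            return "passed"
        history.append(error)
        fix(build_context(attempt, history))  # delegate a fix to @developer
    return "skipped"  # logged in builder-state.json; continue to next story

# A test that never passes is skipped after 5 attempts:
print(run_with_retries(lambda: "timeout", lambda ctx: None))  # skipped
```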

7. Ops-Only Task Verification (ad-hoc mode)

When taskType is ops-with-runtime-impact, the standard pipeline (typecheck → build → test) is skipped because no source files changed. Instead, Builder runs Playwright verification against the affected runtime behavior directly after ops commands complete.

This closes the gap where ops-only fixes to browser-visible issues (CORS errors, auth failures, missing deployments) were declared "done" without browser-level verification.

| Task Type | When | Verification Pipeline |
|---|---|---|
| source-change | Modifying source files (.ts, .tsx, .go, etc.) | Standard: typecheck → test → build → Playwright |
| ops-with-runtime-impact | CLI/ops commands only AND the original issue is browser-visible (CORS, auth, API errors) | Reduced: skip typecheck/build, run Playwright against the affected behavior |
| ops-only | CLI/ops commands only AND no browser-visible impact (CI config, log rotation) | None: mark complete after ops commands succeed |

⛔ ops-only Guard

If the implementation modifies any source file, the task is NOT ops-only. Source files (.ts, .tsx, .go, .py, etc.) always require the standard verification pipeline — even if the change is to "infrastructure" code like IPC handlers, main process files, build config, or native API wrappers.

Common misclassification attempts:

  • "This is an IPC handler change, it's infrastructure" → It's a source file → source-change
  • "This is a main process change, not web content" → It's a source file → source-change
  • "This is a build/config change" → If it modifies a source file → source-change

✅ Valid ops-only examples: supabase secrets set, vercel env add, gh secret set, restarting a service

Post-Ops Verification Flow

After ops commands complete, Builder reads taskType from builder-state.json and branches:

  1. Read verificationTarget from state (what behavior to verify)
  2. Write or identify an existing Playwright test for the target behavior
  3. Execute Playwright — retry up to 3 times with fix attempts on failure
  4. If the test passes → mark verified and proceed to completion

| Ops Task | Test Target | Example Assertion |
|---|---|---|
| Deploy edge functions | Function endpoint responds | expect(response.status()).toBe(200) |
| Set secrets / env vars | Feature that uses the secret | Navigate to feature → verify no error |
| Deploy infrastructure | Dependent app behavior | Feature using infra is functional |
| Database migration (remote) | App reads migrated data | Navigate to page → verify data renders |

Browser verification required: The Playwright test must verify the user-visible behavior that was broken, not just that the ops command succeeded. A curl 200 is NOT sufficient when the original issue was browser-visible.
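
The branch on taskType can be sketched as follows — pipeline step names come from the table above, while the helper itself is hypothetical:

```python
def verification_plan(task_type: str) -> list[str]:
    """Map taskType (read from builder-state.json) to pipeline steps."""
    if task_type == "source-change":
        return ["typecheck", "test", "build", "playwright"]
    if task_type == "ops-with-runtime-impact":
        # No source files changed, but the original issue is browser-visible:
        # skip typecheck/build and verify the runtime behavior directly.
        return ["playwright"]
    if task_type == "ops-only":
        return []  # complete once the ops commands succeed
    raise ValueError(f"unknown taskType: {task_type}")

print(verification_plan("ops-with-runtime-impact"))  # ['playwright']
```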

Todo Synchronization

All three primary agents maintain a synchronization contract between OpenCode's right-panel todos and persistent state files. This enables session resumability across restarts.

Synchronization Contract

  1. Restore on startup — each agent checks for its state file and replays all stored todos to the OpenCode right panel using todowrite.
  2. Sync during work — when creating, updating, or completing todos in the UI, immediately persist the change to the agent's state file with status and priority.
  3. Clean up on completion — when a workflow completes (PRD shipped, ad-hoc finished, update applied), clear the relevant todos from both the UI and persistent state.

Agent State Files

Each primary agent persists todos to its own state file:

| Agent | State File |
|---|---|
| @builder | <project>/docs/builder-state.json |
| @planner | <project>/docs/planner-state.json |
| @toolkit | ~/.config/opencode/.tmp/toolkit-state.json |

Builder Todo Flows

PRD Workflow (P)

  • One todo per user story
  • Reference ID: Story ID (US-###)

Ad-hoc Workflow (A)

  • One todo per decomposed task
  • Reference ID: adhoc-###
  • Pre-analysis screenshot captured before code analysis (Step 0.0a)
  • Task type classified: source-change, ops-with-runtime-impact, or ops-only (Step 0.1a)
  • Playwright analysis probe confirms DOM state before implementation (Step 0.1b)
  • Design decision detection surfaces implicit choices before coding (Step 0.1c)
  • Playwright analysis validation confirms visual alignment (Step 0.1d)
  • Task Spec generated from analysis — used as @developer delegation contract
  • [F] Flow chart option shows implementation plan with per-story pipeline steps
  • Mandatory verification plan in every ANALYSIS COMPLETE dashboard

Pending Updates (U)

  • One todo per update file
  • Reference ID: Update filename

Deferred E2E (E)

  • One todo per queued E2E test
  • Reference ID: E2E file path

Playwright Analysis Probe (Step 0.1b)

Before showing the ANALYSIS COMPLETE dashboard, Builder runs a lightweight Playwright probe against the live app to confirm code analysis conclusions. This catches discrepancies between what the code says and what actually renders — such as elements hidden by CSS, gated by feature flags, or rendered differently at runtime.

6 Assertion Types

visible, absent, enabled, disabled, text-contains, exists

Probe Outcomes

Confirmed → proceed with badge · Contradicted → re-analyze with lower confidence

🔌 Probe Transport Detection (Step 2)

Before building the probe spec, Builder reads project.json → architecture.deployment to select the correct transport:

| Deployment Type | Transport | Spec Format |
|---|---|---|
| web, web-*, serverless | browser | Browser Probe Spec |
| electron-only, desktop, tauri | electron | Electron Probe Spec |
| hybrid | both | One probe per transport |

⛔ Desktop apps have NO browser-accessible web server

Never use baseUrl: "http://localhost:{devPort}" for desktop/Electron apps. Using browser transport against localhost will probe the wrong thing (or nothing at all). If architecture.deployment is electron-only, you must use transport: electron with paths from apps.desktop.testing.

Browser Probe Spec

transport: browser
baseUrl: "http://localhost:{devPort}"
assertions:
  - page: "/checkout"
    checks:
      - selector: "button[type='submit']"
        expect: "visible"

Electron Probe Spec

transport: electron
launchTarget: # from project.json
executablePath: # per-platform
zombieCleanup: true
assertions:
  - window: "main"
    checks:
      - selector: "button[type='submit']"
        expect: "visible"
electronChecks:
  - type: "ipc-response"
    channel: # relevant IPC channel

⛔ No-Bypass Rule

The probe cannot be skipped through rationalization. Common invalid excuses:

| Invalid Rationalization | Why It's Wrong |
|---|---|
| "This is an Electron/desktop app" | Electron apps have web content — probe the web UI |
| "The analysis is clear from code" | Code analysis misses runtime state, CSS, route guards |
| "This is a UX flow restructuring" | UX changes affect visible elements — probe them |
| "The user described it clearly" | User descriptions are input, not verification |
| "I already took a screenshot" | Screenshots show current state; probes verify specific assertions |
| "This is a backend/config change" | If the change has any runtime UI impact, probe the affected pages |
| "This change cannot be verified via Playwright" | Every source code change has observable effects — re-analyze what the change affects in the rendered UI |
| "This is a main process / IPC / native API change" | Main process changes affect what the renderer shows — IPC handlers serve data to web content |
| "Code analysis is definitive" | Code analysis is input to the probe, not a replacement — runtime behavior diverges from source |
| "The critical path cannot be verified in a browser" | If the app has any web content, there are browser-observable effects of every code change |

The ONLY way a probe can be skipped is if the user explicitly accepts a skip after Builder has exhausted all options and asked for assistance. This sets probeStatus: "user-skipped" — Builder cannot set this status autonomously.

⚠️ Page Targeting Rule

Probes must target the actual pages being modified — not just whatever public pages are accessible. If the analysis identifies changes to /dashboard and /settings, assertions must be generated for those pages — not only /login because it's public.

If target pages require authentication, Builder follows the autonomous auth resolution protocol to authenticate before probing. A probe that only checked public pages when the actual changes target authenticated pages is not "partially confirmed" — it's not probed at all.

🔐 Auth Resolution Escalation

When probing authenticated pages, Builder resolves authentication autonomously — it never asks the user for credentials. The escalation ladder:

  1. Check project.json → authentication for existing config
  2. If configured — load the matching auth skill (Supabase OTP, NextAuth credentials, headless, etc.)
  3. If not configured — load the setup-auth skill to auto-detect and configure
  4. Only if all approaches fail — degrade to public-page-only probing with degraded-no-auth status

Mandatory Verification Plan

Every ANALYSIS COMPLETE dashboard includes a 🔧 VERIFICATION PLAN section making explicit what verification will run and why. This prevents Builder from "forgetting" verification after ops commands complete.

  • Source Changes — whether any source files will be modified
  • Pipeline — the exact verification steps that will execute
  • Playwright Scope — what behavior will be browser-verified

Dashboard Layout

The ANALYSIS COMPLETE dashboard organizes its recommendations into distinct sections:

  • RECOMMENDED APPROACH — always shown as its own section, never listed inside alternatives
  • 🔀 ALTERNATIVES — non-recommended options only; collapsed if no alternatives exist
  • 🔧 VERIFICATION PLAN — mandatory; shows task type, source changes, pipeline, and Playwright scope
  • ⚙️ IMPLEMENTATION DECISIONS — shown when Step 0.1c detected and resolved design decisions; lists each decision with the user's choice

Planner Todo Flows

Draft Refinement (D)

  • One todo per refinement task/question batch
  • Reference ID: draft-<slug>-task-###

New PRD (N)

  • One todo per creation step
  • Reference ID: new-prd-<slug>-step-###

Move to Ready (R)

  • One todo per PRD moved
  • Reference ID: prd-<slug>

Planning Updates (U)

  • One todo per planning update file
  • Reference ID: Update filename

Toolkit Todo Flows

Pending Updates

  • One todo per pending update file
  • Ref: pending-update filename

Direct Requests

  • One todo per user request task
  • Ref: toolkit-task-###

Post-Change Workflow

  • One todo per mandatory step
  • Ref: postchange-step-###

uiTodos Schema (builder-state.json)

The uiTodos object in builder-state.json stores the current todo state.

docs/builder-state.json
{
  "lastActivity": "2026-02-21T15:30:00Z",
  "currentPrd": "feature-auth",
  "currentStory": "US-003",
  "uiTodos": {
    "items": [
      {
        "id": "todo-1",
        "content": "Implement login form validation",
        "status": "in_progress",
        "priority": "high",
        "createdAt": "2026-02-21T14:00:00Z"
      },
      {
        "id": "todo-2",
        "content": "Add password reset flow",
        "status": "pending",
        "priority": "medium",
        "createdAt": "2026-02-21T14:05:00Z"
      }
    ],
    "syncedAt": "2026-02-21T15:30:00Z"
  }
}

Why persist todos? Sessions can be interrupted at any time. By persisting todo state, each primary agent can restore exactly where work left off — including in-progress items the user was tracking.
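
Restoring todos on startup (step 1 of the synchronization contract) might look like this sketch, where todowrite stands in for OpenCode's todo tool and its call shape is an assumption:

```python
import json
import tempfile
from pathlib import Path

def restore_todos(state_path: Path, todowrite) -> int:
    """Replay persisted todos into the right panel.
    A missing state file means no prior session, not an error."""
    if not state_path.exists():
        return 0
    state = json.loads(state_path.read_text())
    items = state.get("uiTodos", {}).get("items", [])
    for item in items:
        todowrite(id=item["id"], content=item["content"],
                  status=item["status"], priority=item["priority"])
    return len(items)

# Usage with the schema shown above:
tmp = Path(tempfile.mkdtemp()) / "builder-state.json"
tmp.write_text(json.dumps({"uiTodos": {"items": [
    {"id": "todo-1", "content": "Implement login form validation",
     "status": "in_progress", "priority": "high"}]}}))
restored = []
print(restore_todos(tmp, lambda **todo: restored.append(todo)))  # 1
```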

Story Processing Pipeline

Every story — whether from a PRD or an ad-hoc request — passes through the same mandatory 6-step pipeline. No agent may skip steps or reorder them. The adhoc-workflow and prd-workflow skills reference this pipeline — they do not define their own.

for each story in activeWork.stories where status == "pending":
    run Pipeline Steps 1–6

1. Set story status → in_progress

Update activeWork.stories[currentStoryIndex].status to "in_progress" in builder-state.json.

2. Delegate implementation → @developer

Delegate the story to @developer with full story context (story ID, description, acceptance criteria, project context block). If @developer returns an error, the story is marked failed and the pipeline stops.

3. Run test-flow → unconditional call

Load and execute test-flow unconditionally. test-flow owns the full quality cycle including skip-gate evaluation, activity resolution, quality checks (typecheck / lint / test / rebuild / critic / Playwright), fix loop (redelegation to @developer, re-check, retry), and the completion prompt. This is not a single pass — it includes the entire fix/critic/redelegation loop until pass or exhaustion.

4. Auto-commit → mandatory after test-flow passes

Auto-commit is unconditional and mandatory — it always commits after each story completes, regardless of any git.autoCommit setting. Per-story commits are required for resumability and audit trail.

git add -A
git commit -m "feat: [story description] ([story-id])"

5. Update story status → completed

Update the story with status: "completed", committedAt timestamp, commitHash, and testFlowResult.

6. Advance to next story

Increment activeWork.currentStoryIndex. If more pending stories exist, the loop continues from Step 1.

Failure Handling

| Failure Point | Story Status | Pipeline Action |
|---|---|---|
| @developer returns error (Step 2) | failed | STOP — report to user |
| test-flow exhausts retries (Step 3) | failed | STOP — report to user |

Session Resume

When Builder starts, it checks builder-state.json → activeWork. If any story has a non-terminal status, a Resume Dashboard is shown instead of the normal startup.

Old-Format Field Detection

If builder-state.json contains legacy fields (activePrd, activeTask, adhocQueue) without an activeWork field, they are cleared entirely and the session starts fresh. No backward-compatibility migration is performed.

Resume Dashboard

Resume Dashboard
Mode:   prd (feature-auth)
Branch: feature/auth

Stories:
  ✅ US-001  Create user model          completed
  ✅ US-002  Add validation              completed
  ❌ US-003  Implement auth flow         failed
  ⏳ US-004  Add error handling          pending
  ⏳ US-005  Write integration tests     pending

Progress: 2/5 completed | 1 failed | 2 remaining

[R] Resume from next pending story
[A] Abort — mark remaining as cancelled
[S] Start fresh — archive and begin new session

Status icons: ✅ completed · ❌ failed · 🔄 in_progress · ⏸ skipped · ⏳ pending · 🚫 cancelled

Failed Story Handling

If any stories have status: "failed", they are listed individually before the main resume options. The user must explicitly choose for each failed story — no automatic retry.

❌ US-003: Implement auth flow
   Error: test-flow failed — 2 unit tests failing
   Files: src/auth/flow.ts, src/auth/middleware.ts

   [R] Retry — reset to pending and re-run full pipeline
   [S] Skip — mark as skipped, move on
   [A] Abort — stop all work, cancel remaining stories

Choice           Behavior
[R] Resume       Continue from first pending story. Use existing activeWork — do not re-analyze.
[A] Abort        Set all pending stories to cancelled. Keep completed and skipped as-is. Clear activeWork.
[S] Start fresh  Archive current activeWork, then clear it. Start a new session from the main dashboard.
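The three choices can be sketched as one dispatch over activeWork. The ActiveWork shape and the archive callback are assumptions for illustration, not the real builder-state schema.

```typescript
// Sketch of the [R]/[A]/[S] semantics; shapes are assumed.
type Status =
  | "pending" | "in_progress" | "completed"
  | "failed" | "skipped" | "cancelled";

interface ActiveWork {
  stories: { id: string; status: Status }[];
  currentStoryIndex: number;
}

// Returns the activeWork to continue with, or null when it was cleared.
function applyResumeChoice(
  choice: "R" | "A" | "S",
  work: ActiveWork,
  archive: (w: ActiveWork) => void,
): ActiveWork | null {
  switch (choice) {
    case "R": {
      // Resume: jump to the first pending story, reusing activeWork as-is.
      const next = work.stories.findIndex((s) => s.status === "pending");
      return next === -1 ? null : { ...work, currentStoryIndex: next };
    }
    case "A":
      // Abort: pending becomes cancelled; completed/skipped stay untouched.
      for (const s of work.stories) {
        if (s.status === "pending") s.status = "cancelled";
      }
      return null; // activeWork cleared
    case "S":
      // Start fresh: archive the whole session, then clear activeWork.
      archive(work);
      return null;
  }
}
```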

Design Decision Detection

Step 0.1c in the ad-hoc workflow surfaces implicit design and implementation decisions that the user should weigh in on before Builder proceeds. These are decisions about how to build it well — not clarifications about what to build.

When to Skip (No Questions)

Decision detection is skipped entirely when the request is clearly trivial:

Skip Criterion                       Examples
Bug fix with clear root cause        "Fix the 404 on /settings"
Typo / copy correction               "Change 'Submitt' to 'Submit'"
Version bump / dependency update     "Update React to 18.3"
Config-only change                   "Change the timeout to 30s"
Ops-only task                        "Deploy the edge functions"
Single-file, single-behavior change  "Make the header sticky"

When to Detect Decisions

Run decision detection when the request involves:

  • Multiple reasonable implementation variants — different experienced developers would reasonably choose different approaches
  • UX behavior choices — navigation, state persistence, validation timing, progressive disclosure
  • Data lifecycle decisions — soft vs hard delete, sync vs async, cache invalidation, retry policy
  • Component composition — modal vs page, wizard vs form, inline vs overlay, tabs vs accordion
  • Error handling strategy — toast vs inline, retry vs fail, graceful degradation approach

Questions UI

When decisions are detected, Builder presents them as lettered multiple-choice questions:

1. Should wizard state persist so users can leave and resume?
   A. Yes — save progress to localStorage/DB
   B. No — reset on page leave (simpler)

2. When should validation run?
   A. Per-step — each step validates before allowing Next
   B. Final step — validate everything at submission

Reply with codes (e.g., "1A, 2B") or describe your preference.
Type "you decide" to let me choose based on best practices.

  • Maximum 5 questions per request (highest-impact first)
  • Each question has 2–4 concrete options with brief explanations
  • Single round only — no follow-up questions after user answers
  • Decisions the user already specified in their request are omitted entirely
  • Supports planning.considerations from project.json — relevant consideration questions are included (up to the 5-question max)
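Parsing the reply line above ("1A, 2B", or "you decide") might look like the following; the function name and return shape are hypothetical.

```typescript
// Hypothetical parser for decision replies; option letters run A-D
// since each question has 2-4 options.
function parseAnswers(
  reply: string,
  questionCount: number,
): Map<number, string> | "delegate" {
  // "you decide" hands all decisions back to Builder's best judgment.
  if (/\byou decide\b/i.test(reply)) return "delegate";

  const answers = new Map<number, string>();
  for (const match of reply.matchAll(/(\d+)\s*([A-D])\b/gi)) {
    const question = Number(match[1]);
    if (question >= 1 && question <= questionCount) {
      answers.set(question, match[2].toUpperCase());
    }
  }
  return answers;
}
```

Out-of-range question numbers are silently dropped here; a real implementation would likely re-prompt instead.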

Playwright Analysis Validation (Step 0.1d)

After design decisions are resolved, Builder runs a second Playwright pass to visually confirm that the complete analysis — including any adjustments from decision resolution — aligns with what's actually rendered. This applies to every request — all projects get full Playwright verification.

Step 0.1b (Probe)

Confirms analysis findings — element existence, absence, state. Tests specific assertions.

Step 0.1d (Validation)

Validates overall analysis makes sense visually — right page, right components, right context.

Validation Result                  Action
Analysis aligns with visual state  Proceed — record visualValidation: "confirmed"
Minor discrepancies                Adjust analysis, note discrepancies in dashboard, proceed
Major contradiction                Re-analyze from updated visual context, lower confidence

Flow Chart Option

When the ANALYSIS COMPLETE dashboard is shown, the [F] option generates an ASCII flow chart showing the full implementation plan adapted to the specific stories from analysis.

Implementation Flow Chart
  4 stories │ Story Processing Pipeline (per story)
  ──────────┤
            │
  ┌─────────────────────────────────────────────────┐
  │ TSK-001: Add loading state to SubmitButton       │
  │   implement → test-flow → auto-commit            │
  └──────────────────────┬──────────────────────────┘
                         │
  ┌─────────────────────────────────────────────────┐
  │ TSK-002: Show Spinner when loading               │
  │   implement → test-flow → auto-commit            │
  └──────────────────────┬──────────────────────────┘
                         │
  ┌─────────────────────────────────────────────────┐
  │ TSK-003: Disable button during submission        │
  │   implement → test-flow → auto-commit            │
  └──────────────────────┬──────────────────────────┘
                         │
  ┌─────────────────────────────────────────────────┐
  │ TSK-004: Add unit tests                          │
  │   implement → test-flow → auto-commit            │
  └─────────────────────────────────────────────────┘

  Pipeline per story:
    1. Set status → in_progress
    2. Delegate to @developer
    3. Run test-flow (typecheck → lint → test → Playwright → fix loop)
    4. Auto-commit (mandatory, unconditional)
    5. Update status → completed
    6. Advance to next story

Scenario                   Flow Chart Behavior
Single story               One box, no connecting lines
Multi-story (no deps)      Vertical sequence with connectors
Stories with dependencies  Show dependency arrows
PRD mode                   Use US-XXX prefixes instead of TSK-XXX
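A rough sketch of how the per-story boxes and connectors could be rendered. The widths and connector placement are simplified relative to the example above, and the function itself is hypothetical.

```typescript
// Hypothetical renderer for the vertical single-column case
// (no dependency arrows); widths are fixed for simplicity.
function renderFlowChart(stories: { id: string; title: string }[]): string {
  const width = 49;
  const line = "─".repeat(width);
  const pad = (s: string) => `│ ${s.padEnd(width - 1)}│`;
  return stories
    .map((s, i) => {
      const rows = [
        `┌${line}┐`,
        pad(`${s.id}: ${s.title}`),
        pad(`  implement → test-flow → auto-commit`),
        // Drop a connector from the bottom edge except after the last box.
        i < stories.length - 1
          ? `└${"─".repeat(22)}┬${"─".repeat(width - 23)}┘\n${" ".repeat(23)}│`
          : `└${line}┘`,
      ];
      return rows.join("\n");
    })
    .join("\n");
}
```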

Escalation to PRD

Simple fixes stay simple. Complex changes become proper PRDs with planning and decomposition.

When to escalate:

  • Update requires multiple coordinated changes across the toolkit
  • Update introduces new concepts that need documentation and testing
  • Update affects multiple agents or skills in complex ways
  • User requests formal planning for a queued update

# Example: "Add Rust support" is too complex for a simple update

@planner create a PRD from the pending Rust support request

The original pending update is deleted (superseded by PRD), and @builder implements through the normal PRD workflow.

Governance Critics

These specialized critics ensure the agent system maintains consistency and follows established contracts. They're invoked automatically during toolkit changes.

@workflow-enforcement-critic

Verifies mandatory toolkit post-change workflow artifacts and completion reporting

Purpose: Ensures agents follow required workflows after making changes

@handoff-contract-critic

Checks builder/planner/toolkit routing contracts for ownership contradictions and scope drift

Purpose: Prevents agents from stepping outside their defined responsibilities

@update-schema-critic

Validates project-updates file structure, required frontmatter, and required workflow sections

Purpose: Ensures update files are properly formatted for reliable processing

@policy-testability-critic

Flags non-testable MUST/CRITICAL/NEVER rules and suggests enforceable rewrites

Purpose: Keeps policy rules concrete and verifiable rather than aspirational

Real-World Examples

Continuous Toolkit Evolution

You're building a feature when @developer mentions it doesn't have guidance for a pattern you're using. It queues an update. Next week, @toolkit presents the queue — 5 improvements discovered organically. You review and approve them. Your toolkit just got better without any dedicated "maintenance time."

Schema Migrations at Scale

@toolkit updates the project.json schema to add a new required field. It automatically queues migration instructions for all 12 of your projects. As you work on each project, @builder offers to apply the migration. No project gets left behind; no project is forced to update before you're ready.

Team Knowledge Capture

A team member discovers a gotcha with the database library. They tell @developer to queue a toolkit update. @toolkit adds it to the coding conventions. Now every future implementation — by any team member — benefits from that knowledge. Institutional memory, automated.