The Human in the Loop (you)

Working with AI agents is a collaboration, not a handoff. This guide explains how to work effectively with Planner, Builder, and Toolkit to move from idea to shipped change.

What You'll Learn

Each primary agent has a distinct purpose. Understanding when and how to use each one helps you accomplish goals efficiently.

Planner

Turn ideas into implementation-ready PRDs. Refine scope, clarify requirements, and break features into user stories.

Builder

Execute PRDs or handle ad-hoc tasks. Builder implements features, runs quality gates, and commits working code.

Toolkit

Evolve agents, skills, and scaffolds safely. Toolkit handles changes that affect how all agents behave.

Working with Planner

Turn ideas into implementation-ready PRDs

Planner is your partner for requirements engineering. You bring the vision; Planner helps structure it into clear, testable user stories. The goal is always a PRD that Builder can execute without ambiguity.

The Draft-to-Ready Journey

Most features go through a refinement loop before they're ready for implementation. Here's what that looks like:

Create a Draft PRD

Start by describing your feature at a high level. Planner generates a draft PRD with user stories, acceptance criteria, and a suggested branch name. Draft PRDs live in docs/prds/ with status "draft".

Refine Requirements

Review the draft with Planner. Challenge assumptions, add edge cases, and tighten acceptance criteria. This is where you catch scope creep before implementation begins.

Clarify Scope

Explicitly call out what's in scope and what's not. This prevents Builder from over-building and keeps PRDs focused on one coherent set of changes.

Mark as Ready

When you're confident the PRD is complete, mark it ready. The PRD is copied to docs/prd.json (or stays in docs/prds/ with status "ready"), and Builder can now pick it up.
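As a rough illustration of the draft-to-ready transition, the Python sketch below models a PRD as a dictionary and flips its status. The field names (status, branchName, userStories, acceptanceCriteria) and the mark_ready helper are assumptions made for this example; the actual PRD schema may differ.

```python
import copy

# Hypothetical PRD shape; the real schema may use different field names.
draft_prd = {
    "status": "draft",
    "branchName": "feature/user-notifications",
    "userStories": [
        {
            "id": "US-001",
            "title": "In-app alert for mentions",
            "acceptanceCriteria": ["Alert appears when the user is mentioned"],
            "passes": False,
        }
    ],
}


def mark_ready(prd: dict) -> dict:
    """Return a copy of the PRD with status 'ready', refusing incomplete drafts."""
    if prd.get("status") != "draft":
        raise ValueError("only draft PRDs can be marked ready")
    for story in prd.get("userStories", []):
        if not story.get("acceptanceCriteria"):
            raise ValueError(f"{story.get('id')} has no acceptance criteria")
    ready = copy.deepcopy(prd)  # leave the draft untouched
    ready["status"] = "ready"
    return ready
```

In practice you would ask Planner to make this change rather than editing the file yourself; the sketch only shows what "ready" means structurally.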

Draft Refinement Tips

Challenge the stories

Ask "What if this story fails?" and "What edge cases does this miss?" Planner can add error handling and boundary conditions.

Split large stories

If a story has more than 4-5 acceptance criteria, it's probably too big. Ask Planner to break it into smaller pieces.

Review Planner's definition of done

Planner authors acceptance criteria for each story. Review them for specificity: push back on vague criteria like "works well" and ask for measurable outcomes like "response time under 200 ms".

Check dependencies

Ensure stories are ordered so dependencies are built first. Ask Planner to reorder if the sequence doesn't make sense.
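The story-size tip above can be turned into a quick check. This is a minimal Python sketch, assuming each story is a dictionary with an id and an acceptanceCriteria list (both field names are assumptions); the threshold of 5 follows the "4-5 criteria" rule of thumb.

```python
def oversized_stories(stories: list[dict], max_criteria: int = 5) -> list[str]:
    """Return the IDs of stories with more acceptance criteria than max_criteria."""
    return [
        story["id"]
        for story in stories
        if len(story.get("acceptanceCriteria", [])) > max_criteria
    ]


stories = [
    {"id": "US-001", "acceptanceCriteria": ["a", "b", "c"]},
    {"id": "US-002", "acceptanceCriteria": ["a", "b", "c", "d", "e", "f"]},
]
print(oversized_stories(stories))  # ['US-002']
```

Any story the check flags is a candidate to hand back to Planner for splitting.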

Clarifying Scope

A good PRD explicitly states boundaries. When refining with Planner, address these questions:

In Scope

• What features are included?
• Which user flows are covered?
• What quality bars must be met?
• Which platforms/browsers are supported?

Out of Scope

• What's explicitly excluded?
• What's deferred to future PRDs?
• What edge cases won't be handled?
• What existing behavior stays unchanged?

Pro tip: Add a "Non-Goals" section to your PRD. This prevents Builder from implementing features you didn't ask for and keeps the scope tight.

Practical Prompts for Planning Sessions

Here are copy-ready prompts for common planning scenarios:

New Feature

Starting a planning session for a new feature

@planner I want to add [feature description]. The main user goal is [what the user wants to accomplish]. Key constraints: [constraints if any].

Refine Draft

Continuing to refine an existing draft PRD

@planner Review the draft PRD at docs/prds/[prd-name].json. I have concerns about [specific stories or criteria]. Can we refine those?

Edge Cases

Adding error handling and edge cases to stories

@planner For US-[number] in the current PRD, what edge cases are we missing? Add acceptance criteria for error states and boundary conditions.

Scope Check

Verifying scope boundaries before marking ready

@planner Review the scope of this PRD. Is anything missing that should be included? Is anything included that should be deferred? Add explicit non-goals if needed.

Mark Ready

Finalizing and activating a PRD for implementation

@planner The PRD looks good. Mark it as ready and copy it to docs/prd.json so Builder can start implementation.

When to Use Planner vs Builder

Situation	Agent	Why
New multi-story feature	Planner	Needs requirements breakdown before code
Quick bug fix	Builder	Scope is already clear, just needs implementation
Uncertain requirements	Planner	Need to explore and clarify before building
Existing PRD ready	Builder	Planning done, time to execute
Refactoring with defined scope	Builder	Technical change, requirements are implicit

Working with Builder

Execute PRDs or handle ad-hoc tasks

Builder is your implementation partner. Give it a ready PRD and it works through each story systematically. Give it a direct task and it handles it immediately. Builder orchestrates specialist agents, runs quality gates, and commits working code.

Two Operating Modes

Builder operates in two distinct modes depending on what you need. Choose the right mode to get the best results.

PRD Mode

For implementing features from a ready PRD. Builder picks up stories in priority order, implements each one, runs tests, commits changes, and marks stories as complete.

Systematic story-by-story execution

Auto-generates tests after each story

Tracks progress in prd.json
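To make "picks up stories in priority order" concrete, here is a small Python sketch of how the next story could be selected. It assumes stories carry a numeric priority field where a lower number means higher priority, and a passes flag; the priority field and the next_story helper are hypothetical and not confirmed by the toolkit.

```python
from typing import Optional


def next_story(prd: dict) -> Optional[dict]:
    """Return the highest-priority story that has not passed yet, or None."""
    pending = [s for s in prd.get("userStories", []) if not s.get("passes", False)]
    if not pending:
        return None  # every story passes; the PRD is complete
    return min(pending, key=lambda s: s.get("priority", 0))


prd = {
    "userStories": [
        {"id": "US-001", "priority": 1, "passes": True},
        {"id": "US-003", "priority": 3, "passes": False},
        {"id": "US-002", "priority": 2, "passes": False},
    ]
}
print(next_story(prd)["id"])  # US-002
```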

Ad-hoc Mode

For quick fixes, one-off tasks, and direct requests. No PRD needed—just describe what you need and Builder handles it immediately with a batch/verify/ship workflow.

Immediate execution

Tests on request or after completion

Great for bug fixes and refactoring

When to Use Each Mode

Task	Mode	Why
Multi-story feature	PRD	Structured execution with progress tracking
Quick bug fix	Ad-hoc	No planning overhead for simple changes
Code refactoring	Ad-hoc	Technical work with implicit requirements
New user flow	PRD	Complex feature needing structured stories
Add a new API endpoint	Ad-hoc	Single-task, clear scope
Launch a new feature	PRD	Multiple stories, acceptance criteria needed

The Update Flow

Builder participates in a continuous improvement cycle. When specialist agents discover gaps or toolkit changes require project updates, the update flow keeps everything in sync.

Agent discovers a gap or improvement

Queues update in pending-updates/

@toolkit reviews and applies toolkit changes

@builder applies project updates when you work

You stay in control: Updates are queued, not applied automatically. You review, approve, defer, or skip each update when Builder presents them.

Expected Outcomes

Here's what you can expect when working with Builder in each mode:

PRD Mode Outcomes

Each story implemented and committed separately
Unit tests auto-generated after each story
E2E tests queued for affected UI areas
Progress tracked in prd.json and progress.txt
Feature branch ready for PR when complete

Ad-hoc Mode Outcomes

Task completed and committed quickly
Tests generated on request or at completion
Quality gates still enforced (lint, typecheck)
No PRD overhead for simple changes
Ready to push or create PR immediately

Practical Prompts for Builder Sessions

Here are copy-ready prompts for common Builder scenarios:

PRD Mode

Starting implementation of a ready PRD

@builder Start implementing docs/prd.json. Begin with the first incomplete story.

PRD Mode

Continuing work on the next story

@builder Continue with the PRD. Pick up where we left off.

Ad-hoc

Fixing a bug without a PRD

@builder Fix the login form validation—it's allowing empty email addresses through.

Ad-hoc

Quick refactoring task

@builder Extract the date formatting logic in UserProfile into a shared utility.

Ad-hoc

Adding a new API endpoint

@builder Add a GET /api/users/me endpoint that returns the current user's profile.

Advanced Reliability Features

Builder applies techniques from AI delegation research to ensure sub-agent work is verifiable, resumable, and recoverable — even under adverse conditions.

Verification Contracts

Before delegating a story or task to a sub-agent, Builder generates a verification contract — a structured spec of what "done" looks like: expected files, required behaviours, and validation checks. After delegation completes, Builder validates the output against the contract before marking the story as passing. This catches incomplete work that might otherwise silently pass.
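A verification contract might look something like the Python sketch below. The VerificationContract class, its fields, and the validate function are illustrative assumptions, not Builder's actual format; they only show the idea of checking delegated output against a spec written beforehand.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class VerificationContract:
    """Hypothetical spec of what 'done' looks like for one story."""
    story_id: str
    expected_files: list[str] = field(default_factory=list)
    checks: dict[str, Callable[[], bool]] = field(default_factory=dict)


def validate(contract: VerificationContract, root: Path) -> list[str]:
    """Return a list of problems; an empty list means the contract is met."""
    problems = []
    for rel_path in contract.expected_files:
        if not (root / rel_path).exists():
            problems.append(f"missing file: {rel_path}")
    for name, check in contract.checks.items():
        if not check():
            problems.append(f"check failed: {name}")
    return problems
```

A story would be marked as passing only when validate returns an empty list.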

Checkpoint Serialization

Builder saves a detailed checkpoint at each major milestone — story start, post-delegation, post-test, pre-commit. Checkpoints capture the completed steps, pending steps, and any decisions made. If a session ends mid-story (rate limit, network drop, browser close), the next session resumes from the exact checkpoint rather than the beginning of the story.
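The checkpoint idea can be sketched in a few lines of Python. This is a simplification that treats the four milestones named above as the only steps; the field names (storyId, completedSteps, pendingSteps, decisions) and both helper functions are assumptions for illustration.

```python
import json
from typing import Optional

# Milestones named in this guide, in the order they occur.
MILESTONES = ["story-start", "post-delegation", "post-test", "pre-commit"]


def build_checkpoint(story_id: str, milestone: str, decisions: list[str]) -> dict:
    """Capture completed steps, pending steps, and decisions at a milestone."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    reached = MILESTONES.index(milestone) + 1
    return {
        "storyId": story_id,
        "completedSteps": MILESTONES[:reached],
        "pendingSteps": MILESTONES[reached:],
        "decisions": list(decisions),
    }


def next_step(checkpoint: dict) -> Optional[str]:
    """Return the first pending step, or None when nothing is left."""
    pending = checkpoint["pendingSteps"]
    return pending[0] if pending else None


# Round-trip through JSON, as a new session would after an interruption.
saved = json.dumps(build_checkpoint("US-003", "post-delegation", ["reuse Button"]))
print(next_step(json.loads(saved)))  # post-test
```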

Dynamic Reassignment

When a sub-agent fails twice on the same task, Builder consults a fallback chain — an ordered list of alternative agents that can handle the same work. For example, if the primary React developer agent fails, Builder may reassign to a general developer agent. Fallback chains are defined in the toolkit and can be overridden per project in docs/project.json under agents.fallbackChains.
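The Python sketch below shows one way a fallback chain could be consulted. The agent names, the exact shape of agents.fallbackChains, and the pick_agent helper are hypothetical; the sketch also assumes every agent in the chain gets two attempts, which the guide states only for the primary agent.

```python
from typing import Optional

# Hypothetical shape of agents.fallbackChains in docs/project.json.
project_config = {
    "agents": {
        "fallbackChains": {
            "react-dev": ["general-dev"],
        }
    }
}

MAX_FAILURES = 2  # reassignment happens after two failures on the same task


def pick_agent(primary: str, failures: int, config: dict) -> Optional[str]:
    """Return the agent to try next, or None when the chain is exhausted."""
    fallbacks = config.get("agents", {}).get("fallbackChains", {}).get(primary, [])
    chain = [primary] + list(fallbacks)
    index = failures // MAX_FAILURES  # move one step down the chain per two failures
    return chain[index] if index < len(chain) else None


print(pick_agent("react-dev", 1, project_config))  # react-dev
print(pick_agent("react-dev", 2, project_config))  # general-dev
```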

Pro Tips for Working with Builder

Let it commit: Builder commits after each story/task. This creates clean, reviewable history.
Check progress.txt: Builder logs learnings here. Read it to see patterns it discovered.
Quality gates are automatic: Lint and typecheck run before commits. Don't worry about broken code sneaking through.
Start with PRD mode: For anything beyond a one-liner, PRD mode gives you better tracking and traceability.

Working with Toolkit

Evolve agents, skills, and scaffolds safely

Toolkit operates at a different level than Planner and Builder. While those agents work within a specific project, Toolkit manages the shared infrastructure—agents, skills, scaffolds, and data files—that all projects use. Changes here ripple across every project that uses the toolkit.

The Key Distinction: Toolkit vs Project

Understanding which level you're working at prevents confusion and keeps changes in the right place.

Project Level

Changes that affect one project. Features, bug fixes, refactoring—all handled by @planner and @builder.

Uses

docs/prd.json
docs/project.json
docs/progress.txt

Toolkit Level

Changes that affect all projects. Agent behavior, skill definitions, scaffold templates—handled by @toolkit.

Manages

agents/*.md
skills/*/SKILL.md
scaffolds/*

What Toolkit Changes

Toolkit owns the "meta" layer—the definitions that control how agents behave across all projects.

Agents

The instruction sets that define how each agent thinks and acts. Located in agents/*.md. Changes here affect all projects using that agent.

Skills

Specialized workflows agents can load on demand. Located in skills/*/SKILL.md. Add new skills for patterns that agents encounter repeatedly.

Scaffolds

Templates for new projects. Located in scaffolds/*. They define project structure, dependencies, and configuration for different stacks.

Data Files

Configuration and reference data. Located in data/*.json. Detection rules, triggers, and lookup tables that agents consult.

When to Use Toolkit vs Project Flows

Scenario	Agent	Why
Fix a bug in your app	Builder	Project-specific change
Improve how Builder handles tests	Toolkit	Changes agent behavior globally
Plan a new feature	Planner	Project-specific requirements
Add a new skill for form handling	Toolkit	Reusable across projects
Add project-specific docs	Builder	Lives in project repo
Update scaffold templates	Toolkit	Affects new project creation

Rule of thumb: If your change affects only the current project, use @planner or @builder. If it affects how agents work across all projects, use @toolkit.

The Pending-Updates Handoff Flow

Project agents can't modify toolkit files directly. Instead, they queue requests that @toolkit reviews and applies. This keeps toolkit changes intentional and coordinated.

@builder discovers a toolkit gap while working

Example: "This project uses Playwright but there's no E2E skill"

@builder writes request to pending-updates/

File format: YYYY-MM-DD-agent-description.md

You invoke @toolkit to review pending updates

@toolkit shows queued requests and asks what to do with each

@toolkit applies approved changes to toolkit repo

Updates agents, skills, or scaffolds; archives the request

All projects get improved agents on next session

Changes propagate automatically via toolkit config

Why this indirection? Toolkit changes are high-impact. This flow ensures you review each change before it affects all projects, preventing accidental regressions.
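For reference, a request filename in the YYYY-MM-DD-agent-description.md format could be built as in this Python sketch. The pending_update_filename helper and the way the description is turned into a lowercase, hyphenated slug are assumptions; the toolkit may normalize descriptions differently.

```python
import re
from datetime import date


def pending_update_filename(agent: str, description: str, today: date) -> str:
    """Build a filename of the form YYYY-MM-DD-agent-description.md."""
    # Lowercase the description and collapse anything non-alphanumeric to '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{today.isoformat()}-{agent}-{slug}.md"


print(pending_update_filename("builder", "Missing Playwright E2E skill", date(2025, 1, 15)))
# 2025-01-15-builder-missing-playwright-e2e-skill.md
```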

Practical Prompts for Toolkit Sessions

Here are copy-ready prompts for common Toolkit scenarios:

Review

Reviewing and applying queued updates

@toolkit Review pending updates. Show me what's queued and let me decide what to apply.

New Skill

Creating a new skill for a common pattern

@toolkit Create a skill for [pattern name]. It should help agents [what the skill enables]. Use [project] as a reference.

Agent Update

Improving an agent's behavior

@toolkit Update @builder to [describe the behavior change]. This fixes [problem observed in projects].

Scaffold

Creating a scaffold for a new stack

@toolkit Create a scaffold for [stack name]. Base it on [existing scaffold] but add [modifications].

Audit

Checking toolkit coverage for a project

@toolkit Audit toolkit coverage for [project path]. What skills or agents are missing for its stack?

Pro Tips for Working with Toolkit

Batch your reviews: Let updates accumulate for a few days, then review them together for context.
Test on one project first: After toolkit changes, run a project through @builder to verify the change works as expected.
Document skill triggers: Good skills have clear trigger phrases so agents know when to load them.
Keep agents focused: Resist adding too much to a single agent. If behavior gets complex, extract it to a skill.

Website Sync Modes

When Toolkit makes changes that affect documentation websites (like this one), it uses configurable sync modes to determine how updates are handled. The mode is resolved from your local overrides file, with a safe public default.

Mode Resolution

Toolkit checks .local/toolkit-overrides.json for your configured sync mode. If the file or the setting is not present, the public default, disabled, is used.

Mode	Behavior
`disabled` (default)	No website sync. Toolkit makes local changes only.
`owner-managed`	Toolkit owner has direct access. Syncs changes to linked website projects.
`queue-file`	Writes sync requests to a queue file for later processing.

Configuring Local Overrides

Create .local/toolkit-overrides.json in your toolkit directory to configure sync behavior:

{
  "websiteSync": {
    "mode": "owner-managed",
    "projectId": "opencode-toolkit-website"
  }
}

The websiteSync.projectId identifies which website project to sync with when running in owner-managed mode. This file is gitignored and stays local to your machine.

Note: The public toolkit defaults to disabled to ensure safe out-of-the-box behavior. Only toolkit maintainers with direct website access should configure owner-managed or queue-file modes.
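The mode resolution described in this section can be approximated with the Python sketch below. The resolve_sync_mode function is hypothetical, and falling back to disabled for an unreadable file or an unknown mode is an assumption about safe behavior, not documented toolkit behavior.

```python
import json
from pathlib import Path

VALID_MODES = {"disabled", "owner-managed", "queue-file"}
DEFAULT_MODE = "disabled"  # the public default


def resolve_sync_mode(toolkit_dir: Path) -> str:
    """Read websiteSync.mode from the local overrides file, else the default."""
    overrides = toolkit_dir / ".local" / "toolkit-overrides.json"
    if not overrides.is_file():
        return DEFAULT_MODE
    try:
        data = json.loads(overrides.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return DEFAULT_MODE  # assumption: treat a broken file as "no override"
    sync = data.get("websiteSync") if isinstance(data, dict) else None
    mode = sync.get("mode") if isinstance(sync, dict) else None
    return mode if mode in VALID_MODES else DEFAULT_MODE
```

Calling resolve_sync_mode(Path(".")) from the toolkit directory would return the configured mode, or disabled when no overrides file exists.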

Multi-Session Coordination

What you need to do when running parallel sessions

Session coordination is now always-on. The toolkit automatically detects when you're running multiple sessions and activates full coordination. In solo sessions, it uses a lightweight "lazy heartbeat" (local-only, no git ops). Here's what you need to know as the human operator.

Your Decision Points

Session coordination is automatic: The toolkit detects multiple sessions and coordinates them. No flag to enable—just start your sessions and the system handles locks and heartbeats.
Assign PRDs to sessions: When starting each session, tell it which PRD to work on. Agents will auto-claim but you decide the assignment.
Release stale locks: If a session crashes, its lock lingers. Check docs/session-locks.json and delete entries whose lastHeartbeat is older than 10 minutes to unblock other sessions.
Resolve merge conflicts: If agents can't auto-resolve conflicts during rebase, you'll need to step in. Keep PRDs small and independent to minimize this.

Do

• Keep PRDs focused (5–10 stories max)
• Assign non-overlapping PRDs to sessions
• Check session-locks.json periodically
• Let agents finish before switching PRDs

Don't

• Don't edit files an agent session is working on
• Don't manually force-merge without rebasing
• Don't run sessions on the same PRD simultaneously
• Don't ignore stuck sessions—release their locks

Quick check: Run cat docs/session-locks.json to see active locks. If a session's lastHeartbeat is stale, remove that entry to release the lock.
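If you prefer a script to eyeballing the file, stale locks can be detected along the lines of this Python sketch. It assumes the lock file is a JSON object keyed by session ID and that lastHeartbeat is an ISO 8601 timestamp with a UTC offset; only the lastHeartbeat field name comes from this guide, the rest is hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)  # threshold from the guidance above


def stale_sessions(locks: dict, now: datetime) -> list[str]:
    """Return session IDs whose last heartbeat is older than STALE_AFTER."""
    stale = []
    for session_id, entry in locks.items():
        last = datetime.fromisoformat(entry["lastHeartbeat"])
        if now - last > STALE_AFTER:
            stale.append(session_id)
    return stale


# Example lock data in the assumed shape.
locks = {
    "session-a": {"lastHeartbeat": "2025-01-15T11:58:00+00:00"},
    "session-b": {"lastHeartbeat": "2025-01-15T11:40:00+00:00"},
}
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(stale_sessions(locks, now))  # ['session-b']
```

To run it against real data, load the file with json.load and pass datetime.now(timezone.utc) as now.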

End-to-End Operating Loops

Repeatable workflows from idea to shipped change

These loops give you a predictable path from idea to production. Each loop defines clear handoff points between agents and explicit completion criteria so you always know where you are and what's next.

New Feature Loop

Use this loop when building a new capability that requires planning, multiple stories, and structured implementation.

Plan with @planner

Start Here

Describe your feature idea. Planner creates a draft PRD with user stories and acceptance criteria.

@planner I want to add user notifications. Users should get in-app and email alerts for mentions.

Handoff: PRD marked ready

Implement with @builder

Builder picks up stories in priority order. Each story gets implemented, tested, and committed. Repeat until all stories pass.

@builder Implement docs/prd.json. Start with the first story.

Optional Handoff: Builder queues toolkit gaps

Apply toolkit updates (if queued)

Optional

If Builder discovered missing agents or skills, review and apply them before your next feature.

@toolkit Review pending updates and apply approved changes.

Completion: All stories pass, PR ready

Ship

Done

Review PR, merge to main, deploy. PRD is archived automatically.

Loop Complete When:

All stories in prd.json have passes: true
All tests pass (unit + E2E if applicable)
PR merged to main branch
PRD archived to docs/prds/archive/

Quick Fix Loop

Use this loop for bug fixes, small improvements, and one-off tasks that don't warrant full planning.

Fix directly with @builder (ad-hoc mode)

Start Here

Describe the problem or task. Builder implements immediately without a PRD.

@builder The submit button doesn't disable during form submission. Fix it to prevent double-clicks.

Builder implements + commits

Verify the fix

Check in browser or ask Builder to add a regression test. Quality gates (lint, typecheck) run automatically.

@builder Add a test to prevent this bug from recurring.

Completion: Fix verified, ready to push

Ship

Done

Push directly or create a quick PR. No PRD cleanup needed.

Loop Complete When:

Fix implemented and committed
Quality gates pass (lint, typecheck)
Changes pushed or PR created

Toolkit Sync Loop

Use this loop periodically to apply queued improvements and keep your toolkit evolving based on real project learnings.

Review pending updates with @toolkit

Start Here

See what gaps Builder discovered across your projects. Approve, defer, or skip each update.

@toolkit Show me pending updates. What's queued?

Decision: Approve changes to apply

Apply approved changes

Toolkit updates agents, skills, or scaffolds, then moves the applied requests to pending-updates/archive/.

@toolkit Apply the email skill update and the builder test improvement. Skip the rest for now.

Verify: Test changes on a project

Test on a project with @builder

Run Builder on a project to verify the toolkit changes work as expected before they affect all your work.

@builder Run the test suite to verify the toolkit changes didn't break anything.

Completion: Toolkit updated, verified working

Sync complete

Done

All projects now benefit from the improved agents, skills, and scaffolds.

Loop Complete When:

Pending updates reviewed (applied, deferred, or skipped)
Toolkit changes committed
Changes verified on at least one project

Choosing the Right Loop

Situation	Loop	Agents Involved
Multi-story feature	New Feature	Planner → Builder (→ Toolkit)
Bug fix or quick improvement	Quick Fix	Builder only
Refactoring with clear scope	Quick Fix	Builder only
Improving agent behavior	Toolkit Sync	Toolkit → Builder (verify)
Weekly maintenance	Toolkit Sync	Toolkit → Builder (verify)

Pro Tips for Operating Loops

Don't mix loops: If a quick fix grows complex, pause and start a New Feature loop with proper planning.
Batch Toolkit Syncs: Don't sync after every feature. Let 3-5 updates accumulate for context.
Check progress.txt first: Before starting any loop, read the Codebase Patterns section to avoid repeating mistakes.
Trust the completion criteria: A loop isn't done until all criteria are met. Partial completion leads to drift.

Agent Resilience

Agents are designed to handle real-world interruptions gracefully. Whether a network hiccup cuts a session short or the AI provider temporarily limits requests, agents save their state and resume cleanly — without losing work or requiring you to start over.

Rate Limit Handling

When the AI provider temporarily limits requests (HTTP 429), agents detect it immediately and pause gracefully instead of retrying in a loop.

1. Detect the limit: The agent identifies the 429 response and stops making new requests immediately.
2. Save state: Current task description, last action, and context anchor are written to builder-state.json before stopping.
3. Notify you: A clear message shows what was in progress, what was saved, and how to resume after waiting a few minutes.
4. Resume cleanly: When you return, the agent reads saved state and continues from where it left off.
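The four steps above can be sketched in Python as follows. The handle_status function and the field names inside activeWork (task, lastAction, contextAnchor, rateLimitedAt) are assumptions for illustration; the real builder-state.json layout may differ.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def handle_status(
    status_code: int,
    task: str,
    last_action: str,
    context_anchor: str,
    now: datetime,
) -> tuple[bool, Optional[dict]]:
    """Return (keep_going, state_to_save); on HTTP 429, stop and build saved state."""
    if status_code != 429:
        return True, None
    state = {
        "activeWork": {
            "task": task,
            "lastAction": last_action,
            "contextAnchor": context_anchor,
            "rateLimitedAt": now.isoformat(),
        }
    }
    return False, state  # no retry: the caller saves the state and notifies you


keep_going, state = handle_status(
    429,
    "US-004: email alerts",
    "committed US-003, about to start US-004",
    "Notification model exists; mailer not yet wired",
    datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc),
)
print(keep_going)  # False
print(json.dumps(state, indent=2))  # what would be written to builder-state.json
```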

Session Resumability

Builder tracks an activeWork object in builder-state.json throughout every session. This unified state model covers both PRD and ad-hoc work — if a session ends unexpectedly (power loss, network drop, or a browser close), the next session reads this state and offers to resume exactly where work stopped.

What is saved per task:

Task description and which story or ad-hoc todo was active
Last completed action (e.g., "committed US-003, about to start US-004")
Context anchor — a short summary of what the agent knew at pause time
Rate limit timestamp, if that was the reason for stopping
Analysis gate status — whether you already approved the task for implementation
Playwright probe status — whether live DOM checks confirmed or contradicted the code analysis (includes auth degradation states like degraded-no-auth when authenticated pages couldn't be probed)

The analysis gate checkpoint (analysisCompleted) and probe status (probeStatus) are particularly important: they survive context compaction, ensuring Builder never starts implementing without your prior approval and live DOM confirmation — even in very long sessions where earlier conversation history has been summarized.

Tool Error Recovery

Transient errors — network timeouts, brief disconnects — are retried automatically once before escalating. Rate limits are never auto-retried; they always pause and notify you.

Error Type	Behavior
429 Rate Limit	Save state, notify, pause — no auto-retry
499 / Timeout	Retry once automatically, then ask you
Network drop	Retry once automatically, then ask you
Sub-agent failure	Check partial work, retry with context, report after 2 failures
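The table above amounts to a small decision function. The Python sketch below is one possible encoding; the error type labels and action names are invented for this example, and failures counts how many times the same operation has failed so far, including the current failure.

```python
def next_action(error_type: str, failures: int) -> str:
    """Map an error type and failure count to the recovery action in the table."""
    if error_type == "rate_limit":
        return "save_state_and_pause"  # never auto-retried
    if error_type in ("timeout", "network_drop"):
        # Transient errors get exactly one automatic retry.
        return "retry" if failures <= 1 else "ask_user"
    if error_type == "sub_agent_failure":
        # Check partial work and retry with context until the second failure.
        return "retry_with_context" if failures < 2 else "report_to_user"
    return "ask_user"  # assumption: unknown errors are escalated


print(next_action("timeout", 1))            # retry
print(next_action("timeout", 2))            # ask_user
print(next_action("sub_agent_failure", 2))  # report_to_user
```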

Commit Gate

Builder will not commit code for a completed story if the required post-change checks have not passed. This prevents half-finished work from landing in your codebase silently.

Required before any commit:

Typecheck must pass (tsc --noEmit)
Unit tests must pass (if story has testIntensity > low)
Story status in PRD JSON updated to passes: true
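As a sketch of how these conditions combine, the Python function below returns the reasons a commit would be blocked. The intensity levels other than low are assumed for the example, since this guide names only low, and commit_blockers is a hypothetical helper rather than Builder's actual check.

```python
# Assumed ordering of testIntensity levels; only "low" appears in this guide.
INTENSITY_ORDER = ["low", "medium", "high"]


def commit_blockers(
    typecheck_ok: bool,
    unit_tests_ok: bool,
    test_intensity: str,
    story_passes: bool,
) -> list[str]:
    """Return the reasons a commit is blocked; an empty list means it may proceed."""
    blockers = []
    if not typecheck_ok:
        blockers.append("typecheck failed")
    tests_required = INTENSITY_ORDER.index(test_intensity) > INTENSITY_ORDER.index("low")
    if tests_required and not unit_tests_ok:
        blockers.append("unit tests failed")
    if not story_passes:
        blockers.append("story not marked passes: true")
    return blockers


print(commit_blockers(True, False, "high", True))  # ['unit tests failed']
print(commit_blockers(True, False, "low", True))   # []
```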

Root Cause Analysis Requirement

Before attempting any fix, agents must diagnose the root cause first. This prevents band-aid fixes that hide real bugs and create technical debt.

Agents are instructed to stop if they catch themselves:

Adding setTimeout or delays to mask timing issues
Using !important in CSS instead of fixing specificity
Making multiple speculative changes in one edit
Swallowing errors with empty catch blocks

Instead, agents trace the problem systematically — checking for duplicate selectors, cascade conflicts, conditional branches, and data flow — before forming a hypothesis and making a targeted single-change fix.

Identity Lock Protection

When agents commit code, they verify the git identity matches your configured user to prevent commits under the wrong identity. This is especially important when working across multiple machines with different git configurations.

Protection includes:

Verify git user.name and user.email before committing
Alert if identity differs from expected configuration
Never modify git config — report and ask instead
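A check like this can be reproduced with a few lines of Python. The sketch reads the identity with git config --get and compares it to the values you expect; the function names and the expected-identity dictionary are illustrative assumptions, and the code only reports mismatches without changing any configuration.

```python
import subprocess


def git_config_value(key: str) -> str:
    """Read one git config value; returns an empty string if it is unset."""
    result = subprocess.run(
        ["git", "config", "--get", key],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def identity_mismatches(actual: dict, expected: dict) -> list[str]:
    """Compare user.name and user.email; report differences, never fix them."""
    return [
        f"{key}: expected {expected[key]!r}, found {actual.get(key)!r}"
        for key in ("user.name", "user.email")
        if actual.get(key) != expected[key]
    ]


def current_identity() -> dict:
    """Collect the identity git would use for the next commit."""
    return {key: git_config_value(key) for key in ("user.name", "user.email")}
```

Calling identity_mismatches(current_identity(), expected) before a commit gives you a list to review; an empty list means the identity matches.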

Quick Start Prompts

Copy-ready prompts to start working with each agent

Copy any prompt below to start a session with the right agent. Replace bracketed placeholders with your specific details.

Planner

Use these prompts to plan features and create PRDs.

Start planning a new feature

@planner I want to add [feature]. The main user goal is [what they accomplish]. Key constraints: [any limitations].

Refine an existing draft PRD

@planner Review the draft PRD at docs/prds/[name].json. Add edge cases for [specific area] and tighten the acceptance criteria.

Mark a PRD as ready for implementation

@planner The PRD looks good. Mark it ready and copy to docs/prd.json.

Split a story that's too large

@planner US-[number] has too many acceptance criteria. Break it into smaller, focused stories.

Builder

Use these prompts for PRD implementation and ad-hoc tasks.

PRD Mode

Start implementing a ready PRD

@builder Implement docs/prd.json. Start with the first incomplete story.

Continue with the next story

@builder Continue with the PRD. Pick up the next incomplete story.

Implement a specific story

@builder Implement US-[number] from the current PRD.

Ad-hoc Mode

Fix a bug

@builder Fix [describe the bug]. It happens when [trigger condition].

Add a quick feature

@builder Add [small feature]. It should [expected behavior].

Refactor code

@builder Refactor [component/function]. Extract [what to extract] into a shared utility.

Add tests for existing code

@builder Add unit tests for [file or component]. Cover the main flows and edge cases.

Toolkit

Use these prompts to evolve agents, skills, and scaffolds.

Review pending updates from projects

@toolkit Review pending updates. Show what's queued and let me decide what to apply.

Create a new skill

@toolkit Create a skill for [pattern]. It should help agents [what it enables]. Reference [project] for examples.

Update agent behavior

@toolkit Update @builder to [behavior change]. This addresses [problem observed].

Audit toolkit coverage for a project

@toolkit Audit toolkit coverage for [project path]. What skills or agents are missing?

Create a new scaffold template

@toolkit Create a scaffold for [stack]. Base it on [existing scaffold] with [modifications].

Prompt Tips

Be specific: "Fix the login form" is better than "fix the bug".
Include context: Mention file paths, user flows, or error messages when relevant.
State constraints: If something should NOT change, say so upfront.
Use the right agent: Planning goes to @planner, implementation to @builder, toolkit changes to @toolkit.