The Human in the Loop (you)
Working with AI agents is a collaboration, not a handoff. This guide explains how to work effectively with Planner, Builder, and Toolkit to move from idea to shipped change.
What You'll Learn
Each primary agent has a distinct purpose. Understanding when and how to use each one helps you accomplish goals efficiently.
Planner
Turn ideas into implementation-ready PRDs. Refine scope, clarify requirements, and break features into user stories.
Builder
Execute PRDs or handle ad-hoc tasks. Implements features, runs quality gates, and commits working code.
Toolkit
Evolve agents, skills, and scaffolds safely. Handles toolkit-level changes that affect how all agents behave.
Working with Planner
Turn ideas into implementation-ready PRDs
Planner is your partner for requirements engineering. You bring the vision; Planner helps structure it into clear, testable user stories. The goal is always a PRD that Builder can execute without ambiguity.
The Draft-to-Ready Journey
Most features go through a refinement loop before they're ready for implementation. Here's what that looks like:
Create a Draft PRD
Start by describing your feature at a high level. Planner generates a draft PRD with user stories, acceptance criteria, and a suggested branch name. Draft PRDs live in docs/prds/ with status "draft".
Refine Requirements
Review the draft with Planner. Challenge assumptions, add edge cases, and tighten acceptance criteria. This is where you catch scope creep before implementation begins.
Clarify Scope
Explicitly call out what's in scope and what's not. This prevents Builder from over-building and keeps PRDs focused on one coherent set of changes.
Mark as Ready
When you're confident the PRD is complete, mark it ready. The file moves to docs/prd.json (or stays in docs/prds/ with status "ready"). Builder can now pick it up.
Draft Refinement Tips
Challenge the stories
Ask "What if this story fails?" and "What edge cases does this miss?" Planner can add error handling and boundary conditions.
Split large stories
If a story has more than 4-5 acceptance criteria, it's probably too big. Ask Planner to break it into smaller pieces.
Review Planner's definition of done
Planner authors acceptance criteria for each story. Review them for specificity—push back on vague criteria like "works well" and ask for measurable outcomes like "Response time under 200ms".
Check dependencies
Ensure stories are ordered so dependencies are built first. Ask Planner to reorder if the sequence doesn't make sense.
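The tips above can be turned into a quick mechanical pre-check before you sit down with Planner. The sketch below is only an illustration: the `Story` shape, the `depends_on` field, and the list of vague phrases are assumptions made for this example, not the real PRD schema.

```python
from dataclasses import dataclass, field


# Hypothetical story shape for illustration; the real PRD schema may differ.
@dataclass
class Story:
    id: str
    acceptance_criteria: list[str]
    depends_on: list[str] = field(default_factory=list)


VAGUE_PHRASES = ("works well", "is fast", "looks good")  # assumed examples
MAX_CRITERIA = 5  # from the "more than 4-5 acceptance criteria" rule of thumb


def review_stories(stories: list[Story]) -> list[str]:
    """Return warnings for a list of stories given in execution order."""
    warnings = []
    seen = set()
    for story in stories:
        # Tip: split large stories.
        if len(story.acceptance_criteria) > MAX_CRITERIA:
            warnings.append(f"{story.id}: too many acceptance criteria, consider splitting")
        # Tip: push back on vague criteria.
        for criterion in story.acceptance_criteria:
            if any(phrase in criterion.lower() for phrase in VAGUE_PHRASES):
                warnings.append(f"{story.id}: vague criterion: {criterion!r}")
        # Tip: dependencies must be built first.
        for dep in story.depends_on:
            if dep not in seen:
                warnings.append(f"{story.id}: depends on {dep}, which is not built earlier")
        seen.add(story.id)
    return warnings
```

A check like this only flags candidates for discussion. Whether a criterion is really measurable is still a judgment call for you and Planner.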
Clarifying Scope
A good PRD explicitly states boundaries. When refining with Planner, address these questions:
In Scope
- What features are included?
- Which user flows are covered?
- What quality bars must be met?
- Which platforms/browsers are supported?
Out of Scope
- What's explicitly excluded?
- What's deferred to future PRDs?
- What edge cases won't be handled?
- What existing behavior stays unchanged?
Pro tip: Add a "Non-Goals" section to your PRD. This prevents Builder from implementing features you didn't ask for and keeps the scope tight.
Practical Prompts for Planning Sessions
Here are copy-ready prompts for common planning scenarios:
- Starting a planning session for a new feature
- Continuing to refine an existing draft PRD
- Adding error handling and edge cases to stories
- Verifying scope boundaries before marking ready
- Finalizing and activating a PRD for implementation
When to Use Planner vs Builder
| Situation | Agent | Why |
|---|---|---|
| New multi-story feature | Planner | Needs requirements breakdown before code |
| Quick bug fix | Builder | Scope is already clear, just needs implementation |
| Uncertain requirements | Planner | Need to explore and clarify before building |
| Existing PRD ready | Builder | Planning done, time to execute |
| Refactoring with defined scope | Builder | Technical change, requirements are implicit |
Working with Builder
Execute PRDs or handle ad-hoc tasks
Builder is your implementation partner. Give it a ready PRD and it works through each story systematically. Give it a direct task and it handles it immediately. Builder orchestrates specialist agents, runs quality gates, and commits working code.
Two Operating Modes
Builder operates in two distinct modes depending on what you need. Choose the right mode to get the best results.
PRD Mode
For implementing features from a ready PRD. Builder picks up stories in priority order, implements each one, runs tests, commits changes, and marks stories as complete.
Ad-hoc Mode
For quick fixes, one-off tasks, and direct requests. No PRD needed—just describe what you need and Builder handles it immediately with a batch/verify/ship workflow.
When to Use Each Mode
| Task | Mode | Why |
|---|---|---|
| Multi-story feature | PRD | Structured execution with progress tracking |
| Quick bug fix | Ad-hoc | No planning overhead for simple changes |
| Code refactoring | Ad-hoc | Technical work with implicit requirements |
| New user flow | PRD | Complex feature needing structured stories |
| Add a new API endpoint | Ad-hoc | Single-task, clear scope |
| Launch a new feature | PRD | Multiple stories, acceptance criteria needed |
The Update Flow
Builder participates in a continuous improvement cycle. When specialist agents discover gaps or toolkit changes require project updates, the update flow keeps everything in sync.
You stay in control: Updates are queued in pending-updates/, not applied automatically. You review, approve, defer, or skip each update when Builder presents them.
Expected Outcomes
Here's what you can expect when working with Builder in each mode:
PRD Mode Outcomes
- Each story implemented and committed separately
- Unit tests auto-generated after each story
- E2E tests queued for affected UI areas
- Progress tracked in prd.json and progress.txt
- Feature branch ready for PR when complete
Ad-hoc Mode Outcomes
- Task completed and committed quickly
- Tests generated on request or at completion
- Quality gates still enforced (lint, typecheck)
- No PRD overhead for simple changes
- Ready to push or create PR immediately
Practical Prompts for Builder Sessions
Here are copy-ready prompts for common Builder scenarios:
- Starting implementation of a ready PRD
- Continuing work on the next story
- Fixing a bug without a PRD
- Quick refactoring task
- Adding a new API endpoint
Advanced Reliability Features
Builder applies techniques from AI delegation research to ensure sub-agent work is verifiable, resumable, and recoverable — even under adverse conditions.
Verification Contracts
Before delegating a story or task to a sub-agent, Builder generates a verification contract — a structured spec of what "done" looks like: expected files, required behaviours, and validation checks. After delegation completes, Builder validates the output against the contract before marking the story as passing. This catches incomplete work that might otherwise silently pass.
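To make the idea concrete, here is a minimal sketch of what a verification contract and its validation step could look like. The `VerificationContract` class, its field names, and the `validate` function are hypothetical and exist only for this example. Builder's actual contract format is not shown in this guide.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical contract shape; names are illustrative, not Builder's real format.
@dataclass
class VerificationContract:
    story_id: str
    expected_files: list[str]
    # Named validation checks, each returning True when it passes.
    checks: dict[str, Callable[[], bool]] = field(default_factory=dict)


def validate(contract: VerificationContract, produced_files: set[str]) -> list[str]:
    """Return a list of failures; an empty list means the contract is satisfied."""
    failures = [
        f"missing file: {path}"
        for path in contract.expected_files
        if path not in produced_files
    ]
    for name, check in contract.checks.items():
        if not check():
            failures.append(f"check failed: {name}")
    return failures
```

The story would only be marked as passing when `validate` returns an empty list, which is how incomplete work gets caught instead of silently passing.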
Checkpoint Serialization
Builder saves a detailed checkpoint at each major milestone — story start, post-delegation, post-test, pre-commit. Checkpoints capture the completed steps, pending steps, and any decisions made. If a session ends mid-story (rate limit, network drop, browser close), the next session resumes from the exact checkpoint rather than the beginning of the story.
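A checkpoint of this kind can be sketched as a small serializable record. The field names and milestone identifiers below are assumptions chosen to mirror the description above. The real on-disk format may look different.

```python
import json
from dataclasses import asdict, dataclass, field

# Assumed identifiers for the four milestones named in the text.
MILESTONES = ("story-start", "post-delegation", "post-test", "pre-commit")


# Hypothetical checkpoint shape for illustration only.
@dataclass
class Checkpoint:
    story_id: str
    milestone: str
    completed_steps: list[str] = field(default_factory=list)
    pending_steps: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)


def serialize(checkpoint: Checkpoint) -> str:
    """Turn a checkpoint into JSON text that can be written to disk."""
    return json.dumps(asdict(checkpoint), indent=2)


def resume(text: str) -> Checkpoint:
    """Rebuild a checkpoint from JSON text saved by an earlier session."""
    checkpoint = Checkpoint(**json.loads(text))
    if checkpoint.milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {checkpoint.milestone}")
    return checkpoint


def next_step(checkpoint: Checkpoint):
    """Return the first pending step, or None when nothing is left to do."""
    return checkpoint.pending_steps[0] if checkpoint.pending_steps else None
```

Because pending steps are stored explicitly, a new session can pick up at `next_step` rather than replaying the story from the start.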
Dynamic Reassignment
When a sub-agent fails twice on the same task, Builder consults a fallback chain — an ordered list of alternative agents that can handle the same work. For example, if the primary React developer agent fails, Builder may reassign to a general developer agent. Fallback chains are defined in the toolkit and can be overridden per project in docs/project.json under agents.fallbackChains.
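The reassignment logic can be sketched roughly as follows. The agent names and the `run_agent` callback are hypothetical, and giving each fallback agent the same two attempts as the primary is an assumption of this example, not documented behavior.

```python
from typing import Callable, Optional

MAX_FAILURES = 2  # "fails twice on the same task" triggers reassignment


def run_with_fallback(
    task: str,
    chain: list[str],
    run_agent: Callable[[str, str], bool],
) -> Optional[str]:
    """Try each agent in the chain in order.

    `run_agent(agent, task)` is a stand-in for delegating work and returns
    True on success. Returns the name of the agent that succeeded, or None
    if every agent in the chain exhausted its attempts.
    """
    for agent in chain:
        for _ in range(MAX_FAILURES):
            if run_agent(agent, task):
                return agent
    return None
```

With a chain such as `["react-dev", "general-dev"]` (illustrative names), the second agent is only tried after the first has failed twice.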
Pro Tips for Working with Builder
- Let it commit: Builder commits after each story/task. This creates clean, reviewable history.
- Check progress.txt: Builder logs learnings here. Read it to see patterns it discovered.
- Quality gates are automatic: Lint and typecheck run before commits. Don't worry about broken code sneaking through.
- Start with PRD mode: For anything beyond a one-liner, PRD mode gives you better tracking and traceability.
Working with Toolkit
Evolve agents, skills, and scaffolds safely
Toolkit operates at a different level than Planner and Builder. While those agents work within a specific project, Toolkit manages the shared infrastructure—agents, skills, scaffolds, and data files—that all projects use. Changes here ripple across every project that uses the toolkit.
The Key Distinction: Toolkit vs Project
Understanding which level you're working at prevents confusion and keeps changes in the right place.
Project Level
Changes that affect one project. Features, bug fixes, refactoring—all handled by @planner and @builder.
Uses
docs/prd.json, docs/project.json, docs/progress.txt
Toolkit Level
Changes that affect all projects. Agent behavior, skill definitions, scaffold templates—handled by @toolkit.
Manages
agents/*.md, skills/*/SKILL.md, scaffolds/*
What Toolkit Changes
Toolkit owns the "meta" layer—the definitions that control how agents behave across all projects.
Agents
The instruction sets that define how each agent thinks and acts. Located in agents/*.md. Changes here affect all projects using that agent.
Skills
Specialized workflows agents can load on demand. Located in skills/*/SKILL.md. Add new skills for patterns that agents encounter repeatedly.
Scaffolds
Templates for new projects. Located in scaffolds/*. Defines project structure, dependencies, and configuration for different stacks.
Data Files
Configuration and reference data. Located in data/*.json. Detection rules, triggers, and lookup tables that agents consult.
When to Use Toolkit vs Project Flows
| Scenario | Agent | Why |
|---|---|---|
| Fix a bug in your app | Builder | Project-specific change |
| Improve how Builder handles tests | Toolkit | Changes agent behavior globally |
| Plan a new feature | Planner | Project-specific requirements |
| Add a new skill for form handling | Toolkit | Reusable across projects |
| Add project-specific docs | Builder | Lives in project repo |
| Update scaffold templates | Toolkit | Affects new project creation |
Rule of thumb: If your change affects only the current project, use @planner or @builder. If it affects how agents work across all projects, use @toolkit.
The Pending-Updates Handoff Flow
Project agents can't modify toolkit files directly. Instead, they queue requests that @toolkit reviews and applies. This keeps toolkit changes intentional and coordinated.
Example: "This project uses Playwright but there's no E2E skill"
Queued in pending-updates/. File format: YYYY-MM-DD-agent-description.md
@toolkit shows queued requests and asks what to do with each
Updates agents, skills, or scaffolds; archives the request
Changes propagate automatically via toolkit config
Why this indirection? Toolkit changes are high-impact. This flow ensures you review each change before it affects all projects, preventing accidental regressions.
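As a small illustration of the YYYY-MM-DD-agent-description.md naming format used for queued requests, the helper below builds a file name from its parts. The function name and the way the description is turned into a slug are assumptions for this sketch.

```python
import re
from datetime import date


def request_filename(day: date, agent: str, description: str) -> str:
    """Build a pending-update file name in YYYY-MM-DD-agent-description.md form."""
    # Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}-{agent}-{slug}.md"
```

Date-first names sort chronologically in a directory listing, which makes it easy to review the oldest requests first.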
Practical Prompts for Toolkit Sessions
Here are copy-ready prompts for common Toolkit scenarios:
- Reviewing and applying queued updates
- Creating a new skill for a common pattern
- Improving an agent's behavior
- Creating a scaffold for a new stack
- Checking toolkit coverage for a project
Pro Tips for Working with Toolkit
- Batch your reviews: Let updates accumulate for a few days, then review them together for context.
- Test on one project first: After toolkit changes, run a project through @builder to verify the change works as expected.
- Document skill triggers: Good skills have clear trigger phrases so agents know when to load them.
- Keep agents focused: Resist adding too much to a single agent. If behavior gets complex, extract it to a skill.
Website Sync Modes
When Toolkit makes changes that affect documentation websites (like this one), it uses configurable sync modes to determine how updates are handled. The mode is resolved from your local overrides file, with a safe public default.
Mode Resolution
Toolkit checks .local/toolkit-overrides.json for your configured sync mode. If not present, the public default disabled is used.
| Mode | Behavior |
|---|---|
| disabled (default) | No website sync. Toolkit makes local changes only. |
| owner-managed | Toolkit owner has direct access. Syncs changes to linked website projects. |
| queue-file | Writes sync requests to a queue file for later processing. |
Configuring Local Overrides
Create .local/toolkit-overrides.json in your toolkit directory to configure sync behavior:
{
  "websiteSync": {
    "mode": "owner-managed",
    "projectId": "opencode-toolkit-website"
  }
}

The websiteSync.projectId identifies which website project to sync with when running in owner-managed mode. This file is gitignored and stays local to your machine.
Note: The public toolkit defaults to disabled to ensure safe out-of-the-box behavior. Only toolkit maintainers with direct website access should configure owner-managed or queue-file modes.
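The mode resolution described in this section can be sketched in a few lines. The function names are hypothetical, and falling back to disabled when the configured mode is unrecognized is an assumption of this example. Only the file path, the key names, and the three modes come from the text above.

```python
import json
from pathlib import Path
from typing import Optional

VALID_MODES = {"disabled", "owner-managed", "queue-file"}
DEFAULT_MODE = "disabled"  # the safe public default


def mode_from_config(config: Optional[dict]) -> str:
    """Pick the sync mode from a parsed overrides object (or None if absent)."""
    if not config:
        return DEFAULT_MODE
    mode = config.get("websiteSync", {}).get("mode", DEFAULT_MODE)
    # Assumption: unknown values fall back to the safe default.
    return mode if mode in VALID_MODES else DEFAULT_MODE


def resolve_sync_mode(toolkit_dir: Path) -> str:
    """Read .local/toolkit-overrides.json if it exists, else use the default."""
    overrides = toolkit_dir / ".local" / "toolkit-overrides.json"
    if not overrides.is_file():
        return DEFAULT_MODE
    return mode_from_config(json.loads(overrides.read_text(encoding="utf-8")))
```

Keeping the default in one place means a fresh clone without a .local directory never syncs to a website by accident.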
Multi-Session Coordination
What you need to do when running parallel sessions
Session coordination is now always-on. The toolkit automatically detects when you're running multiple sessions and activates full coordination. In solo sessions, it uses a lightweight "lazy heartbeat" (local-only, no git ops). Here's what you need to know as the human operator.
Your Decision Points
- Session coordination is automatic: The toolkit detects multiple sessions and coordinates them. No flag to enable—just start your sessions and the system handles locks and heartbeats.
- Assign PRDs to sessions: When starting each session, tell it which PRD to work on. Agents will auto-claim but you decide the assignment.
- Release stale locks: If a session crashes, its lock lingers. Check session-locks.json and delete entries older than 10 minutes to unblock other sessions.
- Resolve merge conflicts: If agents can't auto-resolve conflicts during rebase, you'll need to step in. Keep PRDs small and independent to minimize this.
Do
- Keep PRDs focused (5–10 stories max)
- Assign non-overlapping PRDs to sessions
- Check session-locks.json periodically
- Let agents finish before switching PRDs
Don't
- Don't edit files an agent session is working on
- Don't manually force-merge without rebasing
- Don't run sessions on the same PRD simultaneously
- Don't ignore stuck sessions; release their locks
Quick check: Run cat docs/session-locks.json to see active locks. If a session's lastHeartbeat is stale, remove that entry to release the lock.
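If you would rather not eyeball timestamps, a stale-lock check could look something like the sketch below. It assumes the lock file parses to a mapping from session ID to an entry with an ISO 8601 `lastHeartbeat` string. That layout is a guess for illustration, so adjust it to what your session-locks.json actually contains.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)  # the 10-minute threshold from this guide


def stale_sessions(locks: dict, now: datetime) -> list[str]:
    """Return IDs of sessions whose last heartbeat is older than the threshold.

    Assumed shape: {"session-id": {"lastHeartbeat": "2025-01-15T11:45:00+00:00"}}
    """
    stale = []
    for session_id, entry in locks.items():
        last_heartbeat = datetime.fromisoformat(entry["lastHeartbeat"])
        if now - last_heartbeat > STALE_AFTER:
            stale.append(session_id)
    return stale
```

In practice you would load the file with `json.load` and pass `datetime.now(timezone.utc)` as `now`. The function only reports candidates, so deleting an entry stays your decision.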
End-to-End Operating Loops
Repeatable workflows from idea to shipped change
These loops give you a predictable path from idea to production. Each loop defines clear handoff points between agents and explicit completion criteria so you always know where you are and what's next.
New Feature Loop
Use this loop when building a new capability that requires planning, multiple stories, and structured implementation.
Plan with @planner
Start here: Describe your feature idea. Planner creates a draft PRD with user stories and acceptance criteria.
Implement with @builder
Builder picks up stories in priority order. Each story gets implemented, tested, and committed. Repeat until all stories pass.
Apply toolkit updates (if queued)
Optional: If Builder discovered missing agents or skills, review and apply them before your next feature.
Ship
Done: Review PR, merge to main, deploy. PRD is archived automatically.
Loop Complete When:
- All stories in prd.json have passes: true
- All tests pass (unit + E2E if applicable)
- PR merged to main branch
- PRD archived to docs/prds/archive/
Quick Fix Loop
Use this loop for bug fixes, small improvements, and one-off tasks that don't warrant full planning.
Fix directly with @builder (ad-hoc mode)
Start here: Describe the problem or task. Builder implements immediately without a PRD.
Verify the fix
Check in browser or ask Builder to add a regression test. Quality gates (lint, typecheck) run automatically.
Ship
Done: Push directly or create a quick PR. No PRD cleanup needed.
Loop Complete When:
- Fix implemented and committed
- Quality gates pass (lint, typecheck)
- Changes pushed or PR created
Toolkit Sync Loop
Use this loop periodically to apply queued improvements and keep your toolkit evolving based on real project learnings.
Review pending updates with @toolkit
Start here: See what gaps Builder discovered across your projects. Approve, defer, or skip each update.
Apply approved changes
Toolkit updates agents, skills, or scaffolds. Archived requests are moved to pending-updates/archive/.
Test on a project with @builder
Run Builder on a project to verify the toolkit changes work as expected before they affect all your work.
Sync complete
Done: All projects now benefit from the improved agents, skills, and scaffolds.
Loop Complete When:
- Pending updates reviewed (applied, deferred, or skipped)
- Toolkit changes committed
- Changes verified on at least one project
Choosing the Right Loop
| Situation | Loop | Agents Involved |
|---|---|---|
| Multi-story feature | New Feature | Planner → Builder (→ Toolkit) |
| Bug fix or quick improvement | Quick Fix | Builder only |
| Refactoring with clear scope | Quick Fix | Builder only |
| Improving agent behavior | Toolkit Sync | Toolkit → Builder (verify) |
| Weekly maintenance | Toolkit Sync | Toolkit → Builder (verify) |
Pro Tips for Operating Loops
- Don't mix loops: If a quick fix grows complex, pause and start a New Feature loop with proper planning.
- Batch Toolkit Syncs: Don't sync after every feature. Let 3-5 updates accumulate for context.
- Check progress.txt first: Before starting any loop, read the Codebase Patterns section to avoid repeating mistakes.
- Trust the completion criteria: A loop isn't done until all criteria are met. Partial completion leads to drift.
Agent Resilience
Agents are designed to handle real-world interruptions gracefully. Whether a network hiccup cuts a session short or the AI provider temporarily limits requests, agents save their state and resume cleanly — without losing work or requiring you to start over.
Rate Limit Handling
When the AI provider temporarily limits requests (HTTP 429), agents detect it immediately and pause gracefully instead of retrying in a loop.
1. Detect the limit: Agent identifies the 429 response and stops making new requests immediately.
2. Save state: Current task description, last action, and context anchor are written to builder-state.json before stopping.
3. Notify you: A clear message shows what was in progress, what was saved, and how to resume after waiting a few minutes.
4. Resume cleanly: When you return, the agent reads saved state and continues from where it left off.
Session Resumability
Builder tracks an activeWork object in builder-state.json throughout every session. This unified state model covers both PRD and ad-hoc work — if a session ends unexpectedly (power loss, network drop, or a browser close), the next session reads this state and offers to resume exactly where work stopped.
What is saved per task:
- Task description and which story or ad-hoc todo was active
- Last completed action (e.g., "committed US-003, about to start US-004")
- Context anchor — a short summary of what the agent knew at pause time
- Rate limit timestamp, if that was the reason for stopping
- Analysis gate status — whether you already approved the task for implementation
- Playwright probe status: whether live DOM checks confirmed or contradicted the code analysis (includes auth degradation states like degraded-no-auth when authenticated pages couldn't be probed)
The analysis gate checkpoint (analysisCompleted) and probe status (probeStatus) are particularly important: they survive context compaction, ensuring Builder never starts implementing without your prior approval and live DOM confirmation — even in very long sessions where earlier conversation history has been summarized.
Tool Error Recovery
Transient errors — network timeouts, brief disconnects — are retried automatically once before escalating. Rate limits are never auto-retried; they always pause and notify you.
| Error Type | Behavior |
|---|---|
| 429 Rate Limit | Save state, notify, pause — no auto-retry |
| 499 / Timeout | Retry once automatically, then ask you |
| Network drop | Retry once automatically, then ask you |
| Sub-agent failure | Check partial work, retry with context, report after 2 failures |
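The first three rows of this table boil down to a small policy: retry transient errors once, never retry a rate limit. Here is a rough sketch of that policy. The exception classes and the `save_state` callback are invented for this example, and sub-agent failures are left out because they follow a different path.

```python
from typing import Any, Callable, Optional, Tuple


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from the AI provider."""


class TransientError(Exception):
    """Stand-in for a timeout or a brief network drop."""


def call_with_recovery(
    operation: Callable[[], Any],
    save_state: Callable[[], None],
) -> Tuple[str, Optional[Any]]:
    """Run `operation` under the recovery policy.

    Returns ("ok", result), ("paused", None) after a rate limit, or
    ("ask-user", None) when a transient error survives one retry.
    """
    retried = False
    while True:
        try:
            return "ok", operation()
        except RateLimited:
            # Rate limits are never auto-retried: save state, then pause.
            save_state()
            return "paused", None
        except TransientError:
            if retried:
                return "ask-user", None
            retried = True  # one automatic retry, then escalate
```

Treating rate limits differently from other failures matters because retrying a 429 right away usually just extends the wait.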
Commit Gate
Builder will not commit code for a completed story if the required post-change checks have not passed. This prevents half-finished work from landing in your codebase silently.
Required before any commit:
- Typecheck must pass (tsc --noEmit)
- Unit tests must pass (if story has testIntensity > low)
- Story status in PRD JSON updated to passes: true
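The commit gate can be pictured as a function that lists everything still blocking a commit. This is only a sketch: the function name is hypothetical, and the ordering of testIntensity levels ("low", "medium", "high") is an assumption, since this guide only names "low".

```python
# Assumed ordering of testIntensity levels; only "low" appears in this guide.
INTENSITY_ORDER = ["low", "medium", "high"]


def commit_blockers(
    typecheck_ok: bool,
    unit_tests_ok: bool,
    test_intensity: str,
    story_passes: bool,
) -> list[str]:
    """Return the reasons a commit must wait; an empty list means go ahead."""
    blockers = []
    if not typecheck_ok:
        blockers.append("typecheck failed")
    # Unit tests are only required above the "low" intensity level.
    needs_unit_tests = INTENSITY_ORDER.index(test_intensity) > INTENSITY_ORDER.index("low")
    if needs_unit_tests and not unit_tests_ok:
        blockers.append("unit tests failed")
    if not story_passes:
        blockers.append("story not marked passes: true")
    return blockers
```

Returning reasons instead of a bare boolean makes it easier to see why a commit was held back.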
Root Cause Analysis Requirement
Before attempting any fix, agents must diagnose the root cause first. This prevents band-aid fixes that hide real bugs and create technical debt.
Agents are instructed to stop if they catch themselves:
- Adding setTimeout or delays to mask timing issues
- Using !important in CSS instead of fixing specificity
- Making multiple speculative changes in one edit
- Swallowing errors with empty catch blocks
Instead, agents trace the problem systematically — checking for duplicate selectors, cascade conflicts, conditional branches, and data flow — before forming a hypothesis and making a targeted single-change fix.
Identity Lock Protection
When agents commit code, they verify the git identity matches your configured user to prevent commits under the wrong identity. This is especially important when working across multiple machines with different git configurations.
Protection includes:
- Verify git user.name and user.email before committing
- Alert if identity differs from expected configuration
- Never modify git config — report and ask instead
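A read-only identity check along these lines might look like the sketch below. It relies on the standard `git config --get <key>` command. The function names and the shape of the `expected` mapping are assumptions for illustration.

```python
import subprocess
from typing import Callable, Optional


def git_config_value(key: str) -> Optional[str]:
    """Read one git config value; returns None when the key is not set."""
    result = subprocess.run(
        ["git", "config", "--get", key],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def identity_mismatches(
    expected: dict[str, str],
    read: Callable[[str], Optional[str]] = git_config_value,
) -> list[str]:
    """Compare user.name and user.email against the expected identity.

    Only reads configuration and reports differences; it never writes.
    """
    problems = []
    for key in ("user.name", "user.email"):
        actual = read(key)
        if actual != expected[key]:
            problems.append(f"{key}: expected {expected[key]!r}, found {actual!r}")
    return problems
```

An empty result means the commit can proceed. Anything else is reported to you rather than fixed automatically, in line with the "report and ask" rule above.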
Quick Start Prompts
Copy-ready prompts to start working with each agent
Copy any prompt below to start a session with the right agent. Replace bracketed placeholders with your specific details.
Planner
Use these prompts to plan features and create PRDs.
Start planning a new feature
@planner I want to add [feature]. The main user goal is [what they accomplish]. Key constraints: [any limitations].
Refine an existing draft PRD
@planner Review the draft PRD at docs/prds/[name].json. Add edge cases for [specific area] and tighten the acceptance criteria.
Mark a PRD as ready for implementation
@planner The PRD looks good. Mark it ready and copy to docs/prd.json.
Split a story that's too large
@planner US-[number] has too many acceptance criteria. Break it into smaller, focused stories.
Builder
Use these prompts for PRD implementation and ad-hoc tasks.
Start implementing a ready PRD
@builder Implement docs/prd.json. Start with the first incomplete story.
Continue with the next story
@builder Continue with the PRD. Pick up the next incomplete story.
Implement a specific story
@builder Implement US-[number] from the current PRD.
Fix a bug
@builder Fix [describe the bug]. It happens when [trigger condition].
Add a quick feature
@builder Add [small feature]. It should [expected behavior].
Refactor code
@builder Refactor [component/function]. Extract [what to extract] into a shared utility.
Add tests for existing code
@builder Add unit tests for [file or component]. Cover the main flows and edge cases.
Toolkit
Use these prompts to evolve agents, skills, and scaffolds.
Review pending updates from projects
@toolkit Review pending updates. Show what's queued and let me decide what to apply.
Create a new skill
@toolkit Create a skill for [pattern]. It should help agents [what it enables]. Reference [project] for examples.
Update agent behavior
@toolkit Update @builder to [behavior change]. This addresses [problem observed].
Audit toolkit coverage for a project
@toolkit Audit toolkit coverage for [project path]. What skills or agents are missing?
Create a new scaffold template
@toolkit Create a scaffold for [stack]. Base it on [existing scaffold] with [modifications].
Prompt Tips
- Be specific: "Fix the login form" is better than "fix the bug".
- Include context: Mention file paths, user flows, or error messages when relevant.
- State constraints: If something should NOT change, say so upfront.
- Use the right agent: Planning goes to @planner, implementation to @builder, toolkit changes to @toolkit.