How I run 4–8 parallel coding agents with tmux and markdown specs
I’ve been running 4–8 parallel coding agents as my primary development setup since January 2025. No orchestrator, no custom infrastructure. Just tmux, markdown files, bash aliases, and six slash commands.
Each agent takes on a role:
| Agent | What it does |
|---|---|
| Planner | Explore code, design features, iterate on specs |
| Worker | Implement from a finished spec |
| PM | Backlog grooming, prioritization, idea dumping |
These are not formal subagents. No special system prompts, no skill definitions, no subagent configs. Just a naming convention.
The core idea: every unit of work gets a written spec (I call them Feature Designs, or FDs) before any agent writes code. Agents pick up specs, execute them, and close them. This separation of planning and execution is what makes parallelism work. In the last two months, this has produced 300+ completed FDs (small-to-medium scoped changes; one spec, one implementation pass) across three projects.
/fd-init bootstraps the full system into any repo. This article walks through how it works.
Feature Design tracking
Each FD is:
- A numbered spec file (FD-001, FD-002, etc.) with the problem, solution, files to change, and verification plan
- Tracked in an index across all FDs
- Managed through slash commands for the full lifecycle
Each FD file lives in docs/features/ and moves through 8 stages:
| Stage | What it means |
|---|---|
| Planned | Identified, not yet designed |
| Design | Actively designing the solution |
| Open | Designed, ready for implementation |
| In Progress | Currently being implemented |
| Pending Verification | Code complete, awaiting runtime verification |
| Complete | Verified working, ready to archive |
| Deferred | Postponed indefinitely |
| Closed | Won’t fix |
Six slash commands handle the lifecycle:
| Command | What it does |
|---|---|
| /fd-new | Create a new FD from an idea dump |
| /fd-status | Show the index: what’s active, pending verification, and done |
| /fd-explore | Bootstrap a session: load architecture docs, dev guide, FD index |
| /fd-deep | Launch 4 parallel Opus agents to explore a hard design problem |
| /fd-verify | Proofread code, propose a verification plan, commit |
| /fd-close | Archive the FD, update the index, update the changelog |
Every commit ties back to its FD: `FD-049: Implement incremental embedding refresh`. The changelog accumulates automatically as FDs complete.
A typical FD file looks like this:
```
FD-051: Topic Hierarchy Deduplication

Status: Open      Priority: Medium
Effort: Medium    Impact: Clean topic hierarchy for downstream queries

## Problem

Topic extraction generates near-duplicate topics across runs:
"customer onboarding", "onboarding flow", "user onboarding"
all exist as separate entries.

## Solution

Add a merge step after extraction:
1. Cluster by cosine similarity (>=0.90)
2. Pick the most frequent as canonical, and alias the rest.
3. For ambiguous matches (>0.5 and <0.90), use an LLM to verify fit before merging.
4. Store alias mappings so downstream queries resolve automatically.

## Files to Modify

- src/topics/merge.py (new: LLM + clustering logic)
- src/topics/aliases.py (new: alias resolution for queries)
- sql/01_schema.sql (add topic_aliases table)
- sql/06_merge_procedure.sql (new: scheduled merge after extraction)

## Verification

1. Deploy to test env, run merge on live topic table
2. Verify no errors in operation log, run health checks
3. Query topic_aliases: "customer onboarding" cluster has expected mappings
4. Run tests, confirm alias resolution in downstream queries
```
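As an illustration, the clustering step in the Solution could be sketched like this. This is hypothetical code, not the project’s actual `src/topics/merge.py`: the function names are made up, and the LLM check for the ambiguous band is left out.

```python
import numpy as np

def cluster_topics(names, embeddings, threshold=0.90):
    """Greedy single-link clustering of topics by cosine similarity (sketch)."""
    vecs = np.asarray(embeddings, dtype=float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit vectors: dot = cosine
    clusters = []  # each cluster is a list of indices into `names`
    for i in range(len(names)):
        for cluster in clusters:
            # join the first cluster where i is similar to every existing member
            if all(vecs[i] @ vecs[j] >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def pick_canonical(cluster, names, frequency):
    """Most frequent surface form becomes canonical; the rest become aliases."""
    canonical = max(cluster, key=lambda i: frequency[names[i]])
    return names[canonical], [names[i] for i in cluster if i != canonical]
```

The alias mappings returned here would then be written to the `topic_aliases` table so downstream queries resolve automatically.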
The FEATURE_INDEX.md tracks status across all FDs:
## Active Features
| FD | Title | Status | Effort | Priority |
|--------|--------------------------------|----------------------|--------|----------|
| FD-051 | Topic hierarchy deduplication | Open | Medium | Medium |
| FD-052 | Streaming entity extraction | In Progress | Large | High |
| FD-050 | Confidence-based routing | Pending Verification | Medium | High |
## Completed
| FD | Title | Completed | Notes |
|--------|--------------------------------|------------|---------------------|
| FD-049 | Incremental embedding refresh  | 2026-02-20 | 45 min → 2 min      |
| FD-048 | Operation log cleanup | 2026-02-18 | |
Portable: /fd-init
The original project’s FD system was built slowly over months. I wanted the same structure in every new project without repeating that work, so I packaged it as a slash command.
Run /fd-init in any repo and it:
- Infers project context from CLAUDE.md, package configs, and git log
- Creates the directory structure (`docs/features/`, `docs/features/archive/`)
- Generates a `FEATURE_INDEX.md` customized to the project
- Creates an FD template
- Installs the slash commands (`/fd-new`, `/fd-status`, `/fd-explore`, `/fd-deep`, `/fd-verify`, `/fd-close`)
- Appends FD lifecycle conventions to the project’s CLAUDE.md
The command finishes with a summary of what it created:

```
FD System Initialized

Files Created
- docs/features/FEATURE_INDEX.md  → Feature index
- docs/features/TEMPLATE.md       → FD file template
- docs/features/archive/          → Archive directory
- CHANGELOG.md                    → Changelog (Keep a Changelog format)
- CLAUDE.md                       → Project conventions with FD management section
- .claude/commands/fd-new.md      → Create new FD
- .claude/commands/fd-explore.md  → Project exploration
- .claude/commands/fd-deep.md     → Deep parallel analysis
- .claude/commands/fd-status.md   → Status and grooming
- .claude/commands/fd-verify.md   → Verification workflow
- .claude/commands/fd-close.md    → Close and archive FD with changelog update

Next Steps
1. Run /fd-new to create your first feature design
2. Run /fd-status to check the current state
```
The development loop
How I plan
I spend most of my time working with Planners. Each one starts with /fd-explore, which loads codebase context and past work so the agent doesn’t start from zero: architecture docs, dev guide, readmes, FD index. I customize it per project as it grows.
From there, I work through the FD design:
on fd14 - can we use airflow? how is the legacy batch notify system scheduled on airflow? service user?
In Boris Tane’s How I Use Claude Code, he describes how he uses inline annotations to give Claude feedback. I adapted this pattern for complex FDs where conversational back-and-forth can be imprecise. I edit the FD file directly in Cursor and add inline annotations prefixed with %%:
```
## Solution

Replace cron-based notify scheduling with Snowflake Tasks.
EVALUATE_ALERTS_SP runs as SYSTEM via user_email().
%% system user won't run as main modeling role? can we run a quick experiment to check

Batch window: 15-min intervals, drain in-flight notifications before cutover.
Route failures to the existing dead-letter queue in autonotify.
%% what happens to in-flight notifications during cutover? need to confirm drain behavior
```
Then in Claude Code:
fd14 - check %% notes.
Claude revises the design, removes the annotations, and the cycle repeats until the design is solid.
For critical designs, I may do two things:
- Cross-check the plan in Cursor with GPT or Gemini to catch blind spots.
- Run /fd-deep, which launches 4 Opus agents in parallel to explore different angles:

are you sure setting this up with the Airflow service user will bypass the RAP policy on this table? use /fd-deep.
Each agent runs in read-only Explore mode with a specific angle to investigate (algorithmic, structural, incremental, environmental, or whatever fits the problem). Once they report back, the orchestrator verifies their factual claims (file paths, function signatures, behavioral assumptions), flags contradictions, and synthesizes a ranked recommendation with confidence levels, tradeoffs, and a concrete first step.
The pattern borrows from GPT Pro’s parallel test-time compute¹, adapted for design questions where there’s no single correct answer.
Worker execution
Once an FD is marked Open, a Worker picks it up. I point it at the FD, turn on plan mode so Claude builds a line-level implementation plan, review it, then switch to accept edits and let it run. Most FDs are self-contained: one design, one implementation pass, working on a dev branch. When a feature needs isolation, I tell the agent to create a git worktree. Claude Code handles it natively. The finished FD contains all the files and details, so even after compaction the Worker stays on track.
Verification
When Workers finished, I kept typing the same things:
proofread your code end to end, must be airtight
check for edge cases again
commit now, then create a verification plan on live test deployment.
It’s well known that agents consistently find more issues when prompted to review their own work. So I built /fd-verify. It does a proofread pass, proposes a verification plan, and commits.
Some projects go further with dedicated slash commands like /test-cli that run full verification against real deployments. These aren’t traditional test suites. There’s no test runner and no assert statements. The agent reads markdown instructions, executes commands against real infrastructure, reasons about whether the results are correct, and writes structured results: markdown files with tables, timestamps, and diagnostic notes.
When something fails, the agent can investigate on the spot rather than just flagging it. By the end, the result comes back diagnosed. For systems that are inherently async and run on real data, an LLM following markdown instructions is a more natural verification harness than pytest.
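A results file from one of these runs might look something like this. This is purely illustrative; the exact layout, checks, and notes vary by project:

```markdown
## /test-cli results — 2026-02-21

| Check                           | Result      | Notes                                  |
|---------------------------------|-------------|----------------------------------------|
| Deploy procedures to test env   | PASS        |                                        |
| Run merge on live topic table   | PASS        | row counts before/after recorded       |
| Operation log                   | PASS        | no ERROR entries for this run          |
| Alias resolution in queries     | FAIL → PASS | stale cache diagnosed, cleared, re-ran |
```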
One cycle
Putting it all together:
```
PM window:
 1. /fd-status                      → What's active, what's pending, what's done
 2. Pick an FD (or /fd-new)         → Groom the backlog or dump a new idea

Planner window (new agent session):
 3. /fd-explore                     → Load project context
 4. Design the FD                   → /fd-deep if stuck, cross-check in Cursor
 5. Mark FD status Open             → Design is solid, ready for implementation

Worker window (fresh agent session):
 6. /fd-explore                     → Fresh context load
 7. "Implement FD-XXX" (plan mode)  → Claude builds a line-level implementation plan
 8. Implement with atomic commits   → FD-XXX: description
 9. /fd-verify                      → Proofread, verification plan
10. Test on real deployment         → Verification skills or manual
11. /fd-close                       → Archive, update index, changelog
```
The Planner and Worker are separate sessions on purpose. Planning can burn through multiple context windows as the agent explores the codebase, and compaction tends to drop files the Planner still needs. I always start Workers fresh with just the FD, or with /fd-explore when they need broader project context.
Where the decisions live
FD files as decision traces
The development loop produces a trail of FD files. Each one captures more than the task itself: it records what was considered, what was chosen, what was rejected, and why. In practice, when a new agent picks up an FD, it may launch an Explore subagent that, unprompted, finds past FDs with related work. The agent arrives with context about prior decisions. The FD archive is institutional memory that accumulates with every completed feature.
The dev guide
Every project accumulates practical lessons. The dev guide (docs/dev_guide/) captures these as short entries. Agents read a summary on session start and can go deeper into any specific entry when it’s relevant to the task. Unlike the FD system (which bootstraps in seconds via /fd-init), the dev guide grows organically. Each lesson becomes a new entry as it comes up.
Some examples from one project:
| Entry | What it covers |
|---|---|
| No silent fallback values | Config errors fail loudly instead of hiding behind defaults |
| DRY: extract helpers and utilities | Don’t rewrite the same parser or validation logic twice |
| No backwards compatibility | All deployments are test environments, no migration code necessary |
| Operation log naming conventions | Uniform operation types across all features |
| Embedding handling | Always normalize with parse_embedding(), never trust raw format from Snowpark |
| Deployment safety | Destructive ops must wait for running tasks to complete before modifying procedures |
| LLM JSON parsing | Use parse_llm_json() with strict=False and regex fallback, never raw json.loads() |
The dev guide is separate from CLAUDE.md on purpose. CLAUDE.md loads into every session, so it stays lean: commit style, tool preferences, hard guardrails. The dev guide entries are denser, often with inline code examples, and load on demand via /fd-explore when they’re relevant to the current task.
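For instance, the “LLM JSON parsing” entry might carry a sketch like this. Treat it as a hypothetical reconstruction: the real `parse_llm_json()` helper in the project may behave differently.

```python
import json
import re

def parse_llm_json(text: str):
    """Parse JSON out of an LLM reply that may wrap it in prose or code fences."""
    try:
        # strict=False tolerates control characters inside strings
        return json.loads(text, strict=False)
    except json.JSONDecodeError:
        pass
    # Regex fallback: grab the outermost {...} or [...] span and retry
    match = re.search(r"\{.*\}|\[.*\]", text, re.DOTALL)
    if match:
        return json.loads(match.group(0), strict=False)
    raise ValueError("no JSON found in LLM output")
```

The point of the entry is the convention, not the code: agents reach for this helper instead of raw `json.loads()` every time they parse model output.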
Two-tier CLAUDE.md
Claude Code loads a CLAUDE.md file at the start of every session. I split this into two tiers:
Global (~/.claude/CLAUDE.md) sets rules that apply everywhere: no AI attribution in commits, Python and SQL conventions, and never bypass denied commands.
Project-level (<repo>/CLAUDE.md) adds project conventions and FD lifecycle rules (written by /fd-init).
How it compounds
Past FDs reduce future decision points. For example, an early FD ran performance optimization experiments on a hotpath and documented what worked and what didn’t. Months later, a new agent working on a feature that touched that same hotpath found the old FD and asked whether benchmarking was needed before writing any code. Without that FD in the archive, I would have had to catch that myself or the agent would have just gone ahead.
The physical setup
```
┌─────────────────────────┬─────────────────────────┬─────────────────────────┐
│                         │                         │                         │
│  Cursor (IDE)           │  Ghostty Terminal 1     │  Ghostty Terminal 2     │
│                         │  tmux                   │  tmux                   │
│                         │                         │                         │
│  Visual browsing        │  Window 1: PM           │  Window 1: Worker       │
│  Hand edits             │  Window 2: Planner      │  Window 2: Worker       │
│  Cross-model checks     │  Window 3: Planner      │  Window 3: Worker       │
│                         │  Window 4: Planner      │  Window 4: bash         │
│                         │                         │                         │
└─────────────────────────┴─────────────────────────┴─────────────────────────┘
```
Three panels across an ultrawide monitor:
- Cursor (left) for visual code browsing, hand edits, and cross-checking plans with other models like GPT or Gemini.
- Two Ghostty terminals (middle and right), each running a tmux session.
Two coding agents across the terminals:
- Claude Code is my daily driver for general-purpose coding.
- Cortex Code is Snowflake’s coding agent. It runs the latest Opus model, loads the same
CLAUDE.md file, and comes bundled with internal custom profiles that connect to MCPs and prebuilt skills tailored to Snowflake workflows.
I use mostly vanilla tmux to navigate: Ctrl-b n/p to cycle windows, Ctrl-b , to rename them (planner, worker-fd038, PM), Ctrl-b c to spin up a new agent, Ctrl-b s to browse sessions. A few custom additions: Shift-Left/Right to reorder windows, m to move a window between sessions, and renumber-windows on so closing a tab doesn’t leave gaps.
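The custom additions look roughly like this in `~/.tmux.conf`. This is a sketch of the bindings described above; the exact key tables and commands in my config may differ:

```
# Reorder windows with Shift-Left/Right
bind -n S-Left  swap-window -t -1 \; select-window -t -1
bind -n S-Right swap-window -t +1 \; select-window -t +1

# Move the current window to another session (prompts for the session name)
bind m command-prompt -p "target session:" "move-window -t '%%:'"

# Closing a window renumbers the rest, so no gaps are left behind
set -g renumber-windows on
```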
Every project gets a g* alias (“go to”) for instant navigation:
| Alias | Project |
|---|---|
| `gllmt` | `~/workspace/snowflake/llmt` |
| `gautonotify` | `~/workspace/snowflake/autonotify` |
| `gdatakit` | `~/workspace/datakit` |
| `gclaude` | `~/.claude` |
Claude reads them too. I tell Claude “run the eval in gllmt” and it resolves the alias to the actual path.
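In shell-rc form, the aliases are one line each (a sketch; adjust the paths to your own checkout layout):

```shell
# ~/.zshrc — "go to" aliases for instant project navigation
alias gllmt='cd ~/workspace/snowflake/llmt'
alias gautonotify='cd ~/workspace/snowflake/autonotify'
alias gdatakit='cd ~/workspace/datakit'
alias gclaude='cd ~/.claude'
```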
When an agent finishes, the tmux tab changes color. Two config layers make this work:
| Layer | File | What it does |
|---|---|---|
| Claude Code | `~/.claude/settings.json` | Notification hook (matcher: `idle_prompt`) sends bell (`\a`) to terminal |
| tmux | `~/.tmux.conf` | `monitor-bell on`, `bell-action any`, `window-status-bell-style reverse` |
Agent goes idle, Claude Code fires the hook, tmux catches the bell and inverts the tab color.
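Concretely, both layers are small. Here is a sketch; check the hook schema against your Claude Code version before copying, as the exact field names are assumptions. In `~/.claude/settings.json`:

```json
{
  "hooks": {
    "Notification": [
      {
        "matcher": "idle_prompt",
        "hooks": [{ "type": "command", "command": "printf '\\a'" }]
      }
    ]
  }
}
```

And in `~/.tmux.conf`:

```
setw -g monitor-bell on                   # watch every window for a bell
set  -g bell-action any
setw -g window-status-bell-style reverse  # invert the tab color on bell
```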
What’s hard
With 6+ agents running, there’s always something waiting for me, like a Planner with design questions or a Worker ready for verification. Managing that is where the system starts to strain.
Cognitive load is the real ceiling. Around 8 agents is my practical max. Past that, I lose track of what each one is doing and design decisions suffer.
Not everything parallelizes. Some features have sequential dependencies. Forcing parallelism on inherently serial work creates merge conflicts and wasted effort.
Context window limits. Planners burn through context windows fast. When compaction kicks in, it can drop files the agent needs to continue the design. I’ve learned to checkpoint FD progress before compaction hits.
Sandbox whack-a-mole. I deny destructive commands (`rm`, `git reset --hard`, `DROP`). The agent finds creative alternatives: `unlink`, `python -c "import os; os.remove()"`, `find ... -delete`. The permission system has evaluation order quirks where blanket allows override specific denies. My CLAUDE.md now says “If a command is denied, that’s the answer. Ask the user to do it.”
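The deny rules live under `permissions` in settings.json. A sketch, using Claude Code’s prefix-matching pattern style; verify the exact syntax against the current permission docs, and remember from above that this is not airtight:

```json
{
  "permissions": {
    "deny": [
      "Bash(rm:*)",
      "Bash(unlink:*)",
      "Bash(git reset --hard:*)"
    ]
  }
}
```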
Translating business context into FDs is still manual. Jira tickets, Slack threads, meeting notes, product decisions. I’m the bridge between all of that and a well-scoped FD. A dedicated subagent profile would close this gap.
¹ OpenAI describes GPT-5 Pro as using "scaled but efficient parallel test-time compute." Nathan Lambert on Lex Fridman #490 discusses the broader pattern of inference-time scaling: giving models more compute at generation time to explore multiple reasoning paths.
If you try this, I'd love to hear what you change.