# Token Usage & Cost Analysis

**Version:** 1.0.0 · **Date:** 2025-10-31 · **Purpose:** Understand the true cost of concurrent agents vs. single-agent reviews

## Quick Cost Comparison
| Metric | Single Agent | Concurrent Agents | Multiplier |
|---|---|---|---|
| Tokens per review | ~35,000 | ~68,000 | 1.9x |
| Monthly reviews (5M tokens) | 142 | 73 | 0.5x |
| Cost multiplier | 1x | 2x | - |
| Time to execute | 39-62 min | 31-42 min | 0.7-0.8x |
| Perspectives | 1 | 4 | 4x |
**Bottom line:** You pay ~2x tokens to get 4x perspectives and a 20-30% time saving.
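As a sanity check, here is the arithmetic behind the table as a minimal Python sketch; the token figures are this document's estimates, not measured values:

```python
# Cost model behind the quick comparison table (estimates, not measurements).
SINGLE_TOKENS = 35_000       # typical single-agent review
CONCURRENT_TOKENS = 68_000   # typical concurrent (4-agent) review
MONTHLY_BUDGET = 5_000_000

print(f"Token multiplier: {CONCURRENT_TOKENS / SINGLE_TOKENS:.1f}x")       # 1.9x
print(f"Single-agent reviews/month: {MONTHLY_BUDGET // SINGLE_TOKENS}")    # 142
print(f"Concurrent reviews/month: {MONTHLY_BUDGET // CONCURRENT_TOKENS}")  # 73
```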
## Detailed Token Breakdown

### Single Agent Review (Baseline)

```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens

STAGES 2-5: COMPREHENSIVE ANALYSIS (Single Agent)
├─ Code review analysis: ~8,000 tokens
├─ Architecture analysis: ~10,000 tokens
├─ Security analysis: ~8,000 tokens
├─ Multi-perspective analysis: ~6,000 tokens
└─ Subtotal: ~32,000 tokens

STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~3,000 tokens
├─ Action plan creation: ~2,000 tokens
└─ Subtotal: ~5,000 tokens

STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens

TOTAL SINGLE AGENT: ~44,500 tokens (~35,000-45,000 typical)
```
### Concurrent Agents Review

```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens

STAGE 2: CODE REVIEW AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
│  (re-establishing context, no shared history)
├─ Git diff input: ~2,000 tokens
│  (agent needs its own copy of the diff)
├─ Code quality analysis: ~10,000 tokens
│  (duplication, errors, secrets, style)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~15,500 tokens

STAGE 3: ARCHITECTURE AUDIT AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ File structure input: ~2,500 tokens
│  (agent needs file paths and structure)
├─ Architecture analysis: ~12,000 tokens
│  (6-dimensional analysis)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~18,000 tokens

STAGE 4: SECURITY & COMPLIANCE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Code input for security review: ~2,000 tokens
├─ Security analysis: ~11,000 tokens
│  (OWASP, dependencies, secrets)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~16,000 tokens

STAGE 5: MULTI-PERSPECTIVE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Feature description: ~1,500 tokens
│  (agent needs less context, just requirements)
├─ Multi-perspective analysis: ~9,000 tokens
│  (6 stakeholder perspectives)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~13,500 tokens

STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~4,000 tokens
│  (4 sets of results to aggregate)
├─ Action plan creation: ~2,500 tokens
└─ Subtotal: ~6,500 tokens

STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens

TOTAL CONCURRENT AGENTS: ~77,000 tokens (~68,000-78,000 typical)
```
## Why Concurrent Costs More

Cost difference breakdown:

```
Extra overhead from the concurrent approach:
├─ Agent initialization (4x): 8,000 tokens
│  (each agent re-establishes context)
├─ Input duplication (4x): 8,000 tokens
│  (each agent gets its own copy of the files)
├─ Result aggregation: 2,000 tokens
│  (main thread consolidates 4 result sets)
├─ Synthesis complexity: 1,500 tokens
│  (harder to merge 4 perspectives)
└─ API overhead: ~500 tokens
   (4 separate API requests)

TOTAL STRUCTURAL OVERHEAD: ~20,000 tokens

Naive expectation: since the agents split the work, 4 concurrent agents
should cost about the same as one sequential agent:
- Sequential single agent: ~44,500 tokens
- Concurrent 4 agents: ~44,500 / 4 = ~11,125 tokens per agent
- Expected total: ~44,500 tokens

ACTUAL concurrent total: ~77,000 tokens

Why the gap?
- No shared context between agents
- Each agent repeats its own setup
- Each agent needs a full copy of the input data
- Each agent analyzes its area more deeply than a single agent's pass
- Result aggregation is not free
```
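The structural overhead can be tallied directly; a small sketch using this section's per-item estimates:

```python
# Tally the structural overhead of the concurrent approach (estimates).
per_agent = 2_000 + 2_000      # initialization + duplicated input, per agent
shared = 2_000 + 1_500 + 500   # aggregation + synthesis + API overhead
extra = 4 * per_agent + shared
print(extra)                   # 20000 tokens of structural overhead

# The rest of the ~32,500-token gap (44,500 -> ~77,000) comes from each
# agent analyzing its area more deeply than a single agent's pass would.
```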
## Token Cost by Analysis Type

### Code Review Agent Token Budget

```
Input processing:
├─ Git diff loading: ~2,000 tokens
├─ File context: ~1,000 tokens
└─ Subtotal: ~3,000 tokens

Analysis:
├─ Readability review: ~2,000 tokens
├─ Duplication detection: ~2,000 tokens
├─ Error handling check: ~2,000 tokens
├─ Secret detection: ~1,500 tokens
├─ Test coverage review: ~1,500 tokens
├─ Performance analysis: ~1,000 tokens
└─ Subtotal: ~10,000 tokens

Output:
├─ Formatting results: ~1,000 tokens
├─ Severity prioritization: ~500 tokens
└─ Subtotal: ~1,500 tokens

Code Review Total: ~14,500 tokens
```
### Architecture Audit Agent Token Budget

```
Input processing:
├─ File structure loading: ~2,500 tokens
├─ Module relationship mapping: ~2,000 tokens
└─ Subtotal: ~4,500 tokens

Analysis (6 dimensions):
├─ Architecture & Design: ~2,500 tokens
├─ Code Quality: ~2,000 tokens
├─ Security: ~2,000 tokens
├─ Performance: ~1,500 tokens
├─ Testing: ~1,500 tokens
├─ Maintainability: ~1,500 tokens
└─ Subtotal: ~11,000 tokens

Output:
├─ Dimension scoring: ~1,500 tokens
├─ Recommendations: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Architecture Total: ~18,000 tokens
```
### Security & Compliance Agent Token Budget

```
Input processing:
├─ Code loading: ~2,000 tokens
├─ Dependency list: ~1,000 tokens
└─ Subtotal: ~3,000 tokens

Analysis:
├─ OWASP Top 10 check: ~3,000 tokens
├─ Dependency vulnerability scan: ~2,500 tokens
├─ Secrets/keys detection: ~2,000 tokens
├─ Encryption review: ~1,500 tokens
├─ Auth/AuthZ review: ~1,500 tokens
├─ Compliance requirements: ~1,000 tokens
└─ Subtotal: ~11,500 tokens

Output:
├─ Severity assessment: ~1,000 tokens
├─ Remediation guidance: ~1,000 tokens
└─ Subtotal: ~2,000 tokens

Security Total: ~16,500 tokens
```
### Multi-Perspective Agent Token Budget

```
Input processing:
├─ Feature description: ~1,500 tokens
├─ Change summary: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Analysis (6 perspectives):
├─ Product perspective: ~1,500 tokens
├─ Dev perspective: ~1,500 tokens
├─ QA perspective: ~1,500 tokens
├─ Security perspective: ~1,500 tokens
├─ DevOps perspective: ~1,000 tokens
├─ Design perspective: ~1,000 tokens
└─ Subtotal: ~8,000 tokens

Output:
├─ Stakeholder summary: ~1,500 tokens
├─ Risk assessment: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Multi-Perspective Total: ~13,000 tokens
```
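Taken together, the four per-agent budgets can be expressed as data; a sketch where the figures are this section's estimates and the ~2,000-token initialization per agent is counted separately:

```python
# Per-agent analysis budgets from this section (tokens, estimated).
AGENT_BUDGETS = {
    "code_review":       14_500,
    "architecture":      18_000,
    "security":          16_500,
    "multi_perspective": 13_000,
}

analysis_total = sum(AGENT_BUDGETS.values())  # 62,000 tokens of agent work
with_init = analysis_total + 4 * 2_000        # ~70,000 before main-thread stages
```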
## Monthly Cost Comparison

### Scenario: 5M Token Monthly Budget

```
SINGLE AGENT APPROACH
├─ Tokens per review: ~35,000
├─ Reviews per month: 5,000,000 / 35,000 = 142 reviews
├─ Cost efficiency: Excellent
└─ Best for: High-frequency reviews, rapid feedback

CONCURRENT AGENTS APPROACH
├─ Tokens per review: ~68,000
├─ Reviews per month: 5,000,000 / 68,000 = 73 reviews
├─ Cost efficiency: Half as many reviews
└─ Best for: Selective, high-quality reviews

COST COMPARISON
├─ Same budget: 5M tokens
├─ Single agent can do: 142 reviews
├─ Concurrent can do: 73 reviews
├─ Sacrifice: 69 fewer reviews per month
└─ Gain: 4 expert perspectives per review
```
### Pricing Impact (USD)

Assuming a blended rate of ~$3 per 1M tokens (Claude 3.5 Sonnet input pricing; output tokens cost more, so treat these as lower-bound estimates):

```
SINGLE AGENT
├─ 35,000 tokens per review: ~$0.105 per review
├─ 142 reviews per month: ~$14.91/month (from shared budget)
└─ Annual cost: ~$180/year

CONCURRENT AGENTS
├─ 68,000 tokens per review: ~$0.204 per review
├─ 73 reviews per month: ~$14.89/month (from shared budget)
└─ Annual cost: ~$179/year

WITHIN THE SAME 5M BUDGET:
├─ Concurrent approach: ~2x cost per review
├─ But the same monthly spend
└─ Trade-off: quantity vs. quality
```
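Converting token counts to dollars is a one-liner; a sketch under the blended-rate assumption above (real pricing distinguishes input from output tokens):

```python
# Dollar cost at the assumed blended rate of $3 per 1M tokens.
RATE_PER_TOKEN = 3 / 1_000_000

def review_cost_usd(tokens: int) -> float:
    return tokens * RATE_PER_TOKEN

print(f"${review_cost_usd(35_000):.3f}")  # $0.105 per single-agent review
print(f"${review_cost_usd(68_000):.3f}")  # $0.204 per concurrent review
```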
## Optimization Strategies

### Strategy 1: Use a Single Agent for Everyday Reviews

```
Mix approach:
├─ 80% of code reviews: Single agent (~28,000 tokens avg)
└─ 20% of code reviews: Concurrent agents (for critical work)

Monthly breakdown (5M budget):
├─ ~110 single-agent reviews @ 28K tokens = ~3.1M tokens
├─ ~28 concurrent reviews @ 68K tokens = ~1.9M tokens
├─ Monthly capacity: ~138 reviews
└─ A better mix of quality and quantity
```
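The mixed-strategy capacity follows from a weighted average of the two per-review costs; a minimal sketch (all figures are the document's estimates), as used for the breakdown above:

```python
# Monthly capacity of an 80/20 single/concurrent mix under a fixed budget.
def mixed_capacity(budget=5_000_000, single=28_000, concurrent=68_000,
                   concurrent_share=0.20):
    avg_tokens = (1 - concurrent_share) * single + concurrent_share * concurrent
    total = int(budget / avg_tokens)
    return total, round(total * (1 - concurrent_share)), round(total * concurrent_share)

print(mixed_capacity())  # (138, 110, 28): ~138 reviews, ~110 single + ~28 concurrent
```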
### Strategy 2: Off-Peak Concurrent

```
Timing-based approach:
├─ Daytime (peak): Use single agent
└─ Nighttime/weekend (off-peak): Use concurrent agents
   (API is less congested, so concurrency is more effective)

Benefits:
├─ Off-peak: Concurrent runs faster and more reliably
├─ Peak: Avoids rate-limiting issues
├─ Cost: Still ~2x tokens
└─ Experience: Better latency during off-peak hours
```
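A time-based gate for this strategy can be as simple as the sketch below; the peak window is illustrative, not something the system defines:

```python
from datetime import datetime

# Choose review mode by time of day (illustrative peak window: Mon-Fri 9-18).
def review_mode(now: datetime | None = None) -> str:
    now = now or datetime.now()
    off_peak = now.weekday() >= 5 or not (9 <= now.hour < 18)
    return "concurrent" if off_peak else "single"
```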
### Strategy 3: Cost-Conscious Concurrent

```
Limited use of concurrent:
├─ Release reviews: Always concurrent (quality matters)
├─ Security-critical changes: Always concurrent
├─ Regular features: Single agent
└─ Bug fixes: Single agent

Monthly breakdown (5M budget):
├─ 2 releases/month @ 68K: 136K tokens
├─ 6 security reviews @ 68K: 408K tokens
├─ 100 regular features @ 28K: 2,800K tokens
├─ 50 bug fixes @ 28K: 1,400K tokens
└─ Total: ~4.7M tokens (stays within budget)
```
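Encoded as a rule, this strategy is a small lookup; the change-type labels below are hypothetical metadata your review tooling would supply:

```python
# Cost-conscious mode selection: concurrent only where quality matters most.
ALWAYS_CONCURRENT = {"release", "security"}

def select_mode(change_type: str) -> str:
    return "concurrent" if change_type in ALWAYS_CONCURRENT else "single"

assert select_mode("release") == "concurrent"
assert select_mode("bug_fix") == "single"
```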
## Reducing Token Costs

### For Concurrent Agents

#### 1. Use "Lightweight" Input Mode

```
Standard input (full context):
├─ Complete git diff: 2,500 tokens
├─ All modified files: 2,000 tokens
├─ Full file structure: 2,500 tokens
└─ Total input: ~7,000 tokens

Lightweight input (summary):
├─ Summarized diff: 500 tokens
├─ File names only: 200 tokens
├─ Structure summary: 500 tokens
└─ Total input: ~1,200 tokens

Savings: ~5,800 tokens per agent × 4 = ~23,200 tokens saved
New total: ~44,800 tokens (just ~1.3x a single agent)
```
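One plausible way to build the lightweight input is to keep file names plus a diff-stat summary instead of the full diff; a sketch (the parsing is naive, and the real savings depend on how much context each agent genuinely needs):

```python
# Reduce agent input from a full git diff to a short summary.
def lightweight_input(diff: str, max_files: int = 20) -> str:
    lines = diff.splitlines()
    files = [l.split(" b/")[-1] for l in lines if l.startswith("diff --git")]
    added = sum(1 for l in lines if l.startswith("+") and not l.startswith("+++"))
    removed = sum(1 for l in lines if l.startswith("-") and not l.startswith("---"))
    header = f"{len(files)} files changed, +{added}/-{removed} lines"
    return "\n".join([header, *files[:max_files]])
```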
#### 2. Reduce Agent Scope

```
Full scope (current):
├─ Code Review: All aspects
├─ Architecture: 6 dimensions
├─ Security: Full OWASP
├─ Multi-Perspective: 6 angles
└─ Total: ~68,000 tokens

Reduced scope:
├─ Code Review: Security + structure only (saves 2,000)
├─ Architecture: Top 3 dimensions (saves 4,000)
├─ Security: OWASP critical only (saves 2,000)
├─ Multi-Perspective: 3 key angles (saves 3,000)
└─ Total: ~57,000 tokens

Savings: ~11,000 tokens (a 16% reduction)
```
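The reduced scope is naturally a configuration object; a sketch in which the agent names and scope keys are illustrative, not a real config schema:

```python
# Illustrative reduced-scope configuration for the four agents.
REDUCED_SCOPE = {
    "code_review":       ["security", "structure"],          # not all aspects
    "architecture":      ["design", "quality", "security"],  # top 3 of 6 dimensions
    "security":          ["owasp_critical"],                 # critical checks only
    "multi_perspective": ["product", "dev", "security"],     # 3 of 6 angles
}
```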
#### 3. Skip Non-Critical Agents

```
Full pipeline (4 agents):
└─ Total: ~68,000 tokens

Critical only (2 agents):
├─ Code Review Agent: ~15,000 tokens
├─ Security Agent: ~16,000 tokens
└─ Total: ~31,000 tokens (comparable to a single-agent review)
```

Use when:
- The change is simple (no architecture impact)
- There are no security implications
- Team-wide review is not needed
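A pipeline can apply this rule automatically; a sketch in which the risk flags are hypothetical inputs your review tooling would need to provide:

```python
# Run the full pipeline only when the change warrants it.
def agents_for(change: dict) -> list[str]:
    if change.get("architecture_impact") or change.get("security_sensitive") \
            or change.get("needs_team_review"):
        return ["code_review", "architecture", "security", "multi_perspective"]
    return ["code_review", "security"]  # ~31,000 tokens instead of ~68,000
```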
## When Higher Token Cost is Worth It

### ROI Calculation

```
Extra cost per review: ~33,000 tokens (~$0.10)

Value of a single finding (returns relative to the extra token spend; illustrative):
├─ 1 critical security issue: ~100x+
│  (cost of breach: $1M+; detection cost: $0.10)
├─ 1 architectural mistake: ~50x
│  (cost of refactoring: weeks; detection cost: $0.10)
├─ 1 major duplication: ~10x
│  (maintenance burden: months; detection cost: $0.10)
├─ 1 compliance gap: ~100x+
│  (regulatory fine: thousands; detection cost: $0.10)
└─ 1 performance regression: ~20x
   (production incident: hours of downtime; detection cost: $0.10)
```
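The asymmetry is easy to make concrete; a sketch using this section's figures:

```python
# How many extra-cost reviews a single avoided incident pays for.
EXTRA_TOKENS = 33_000
EXTRA_USD = EXTRA_TOKENS * 3 / 1_000_000  # ~$0.10 per review

def reviews_funded(incident_cost_usd: float) -> int:
    return int(incident_cost_usd / EXTRA_USD)

print(reviews_funded(1_000_000))  # a $1M breach funds ~10 million reviews
```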
### Examples Where ROI is Positive

1. **Security-Critical Code**
   - Payment processing
   - Authentication systems
   - Data encryption
   - Cost of a miss: Breach ($1M+), regulatory fine ($1M+)
   - Cost of concurrent review: ~$0.10
   - ROI: Effectively unbounded (one miss pays for millions of reviews)

2. **Release Preparation**
   - Release branches
   - Major features
   - API changes
   - Cost of a miss: Outage, rollback, customer impact
   - Cost of concurrent review: ~$0.10
   - ROI: Extremely high

3. **Regulatory Compliance**
   - HIPAA-covered code
   - PCI-DSS systems
   - SOC 2 requirements
   - Cost of a miss: Regulatory fine ($100K-$1M+)
   - Cost of concurrent review: ~$0.10
   - ROI: Astronomical

4. **Enterprise Standards**
   - Multi-team sign-off
   - Audit trail requirements
   - Stakeholder input
   - Cost of a miss: Rework, team friction
   - Cost of concurrent review: ~$0.10
   - ROI: High (prevents rework)
## Token Usage Monitoring

### What to Track

```
Per review:
├─ Actual tokens used (not estimated)
├─ Agent breakdown (which agent used the most)
├─ Input size (diff size, file count)
└─ Output length (findings generated)

Monthly:
├─ Total tokens used
├─ Reviews completed
├─ Average tokens per review
└─ Trend analysis

Annually:
├─ Total token spend
├─ Cost vs. budget
├─ Reviews completed
└─ ROI analysis
```
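A per-review usage record covering these fields might look like the sketch below; the field names are hypothetical:

```python
from dataclasses import dataclass

# Minimal per-review usage record for the metrics listed above.
@dataclass
class ReviewUsage:
    review_id: str
    tokens_by_agent: dict[str, int]  # agent breakdown (actual, not estimated)
    files_changed: int               # input size proxy
    findings: int                    # output length proxy

    @property
    def total_tokens(self) -> int:
        return sum(self.tokens_by_agent.values())
```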
### Setting Alerts

```
Rate-limit alerts:
├─ 70% of TPM used in a minute → Warning
├─ 90% of TPM used in a minute → Critical
└─ TPM limit hit → Block and notify

Monthly budget alerts:
├─ 50% of budget used → Informational
├─ 75% of budget used → Warning
└─ 90% of budget used → Critical

Cost thresholds:
├─ Single review > 100K tokens → Unexpected (investigate)
├─ Average > 80K tokens → Possible over-analysis (review)
└─ Concurrent runs during peak hours → Suboptimal (schedule off-peak)
```
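The thresholds above translate directly into checks; a sketch with the limits hard-coded as examples:

```python
# Alert levels for monthly budget burn and per-review anomalies.
def budget_alert(tokens_used: int, budget: int = 5_000_000) -> str | None:
    pct = tokens_used / budget
    if pct >= 0.90:
        return "critical"
    if pct >= 0.75:
        return "warning"
    if pct >= 0.50:
        return "informational"
    return None

def review_anomaly(tokens: int) -> str | None:
    return "investigate: review exceeded 100K tokens" if tokens > 100_000 else None
```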
## Cost Optimization Summary

| Strategy | Tokens Saved | When to Use |
|---|---|---|
| Mix single + concurrent | ~40% per month (vs. all-concurrent) | Daily workflow |
| Off-peak scheduling | ~15% (better concurrency) | When possible |
| Lightweight input mode | ~35% per concurrent review | Non-critical reviews |
| Reduce agent scope | 15-20% | Simple changes |
| Skip non-critical agents | ~50% | Low-risk PRs |
| Single agent only | ~50% (baseline cost) | Cost-sensitive work |
## Recommendation

```
Use concurrent agents when:
├─ Token budget > 5M per month
├─ Quality outranks cost
├─ Security-critical code
├─ Release reviews
├─ Multiple perspectives are needed
└─ Regulatory requirements apply

Use a single agent when:
├─ Token budget is limited
├─ High-frequency reviews are needed
├─ Changes are simple
├─ Turnaround matters but the 20-30% concurrent gain is not material
├─ Cost sensitivity is high
└─ No multi-perspective requirement

Use the mix strategy when:
├─ You want both quality and quantity
├─ You can reserve concurrent reviews for high-value work
├─ You have a moderate token budget
├─ You are an enterprise with varied code types
└─ You want the best of both worlds
```
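The three lists condense into a small decision helper; a sketch that encodes this document's guidance naively (every threshold and flag name is illustrative):

```python
# Decision matrix: which review approach to recommend.
def recommend(monthly_budget_tokens: int, security_critical: bool,
              is_release: bool, needs_perspectives: bool) -> str:
    if security_critical or is_release or needs_perspectives:
        if monthly_budget_tokens >= 5_000_000:
            return "concurrent"
        return "mix"  # reserve concurrent reviews for high-value work
    return "single"
```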
For the full analysis, see REALITY.md and ARCHITECTURE.md.