This update corrects misleading performance and cost claims in the documentation.

**Corrected claims:**
- Performance: changed from "40-50% faster" to "20-30% faster" (honest observation)
- Token cost: changed from "60-70% savings" to "1.9-2.0x more expensive" (actual cost)
- Parallelism: clarified "concurrent requests" vs. "true parallel execution"
- Architecture: updated from "parallel" to "concurrent" throughout

**New documentation:**
- REALITY.md: honest assessment, reality vs. marketing
- ARCHITECTURE.md: technical details on concurrent vs. parallel execution
- TOKEN-USAGE.md: detailed token cost breakdown and optimization strategies

**Updated files:**
- master-orchestrator.md: accurate performance, cost, and when-to-use guidance
- README.md: updated architecture overview and trade-offs

**Key insights:**
- Concurrent agent architecture IS valuable, but for different reasons:
  - Main thread context stays clean (20-30% of single-agent size)
  - 4 independent expert perspectives (genuine value)
  - API rate limiting caps the actual speedup (20-30% typical)
  - Cost is 1.9-2.0x the tokens of a single-agent analysis
- Best for enterprise quality-critical work, NOT cost-efficient projects
- Includes a decision matrix and cost optimization strategies

This update maintains technical accuracy while preserving the genuine benefits of multi-perspective analysis and context isolation that make the system valuable.
# Token Usage & Cost Analysis

**Version:** 1.0.0

**Date:** 2025-10-31

**Purpose:** Understand the true cost of concurrent agents vs. single-agent reviews

---

## Quick Cost Comparison

| Metric | Single Agent | Concurrent Agents | Multiplier |
|--------|--------------|-------------------|------------|
| **Tokens per review** | ~35,000 | ~68,000 | 1.9x |
| **Monthly reviews (5M tokens)** | 142 | 73 | 0.5x |
| **Cost multiplier** | 1x | 2x | - |
| **Time to execute** | 39-62 min | 31-42 min | 0.6-0.8x |
| **Perspectives** | 1 | 4 | 4x |

**Bottom Line**: You pay ~2x the tokens to get 4 expert perspectives and a 20-30% time saving.

---

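The multipliers in the table follow directly from the assumed per-review figures; a quick sanity check in Python (numbers are the table's estimates, not measurements):

```python
# Sanity-check the comparison table (figures are the table's estimates).
SINGLE_TOKENS = 35_000      # typical single-agent review
CONCURRENT_TOKENS = 68_000  # typical concurrent (4-agent) review
MONTHLY_BUDGET = 5_000_000  # 5M-token monthly budget

token_multiplier = CONCURRENT_TOKENS / SINGLE_TOKENS      # ≈ 1.94x
single_reviews = MONTHLY_BUDGET // SINGLE_TOKENS          # 142
concurrent_reviews = MONTHLY_BUDGET // CONCURRENT_TOKENS  # 73

print(f"token multiplier: {token_multiplier:.1f}x")  # 1.9x
print(f"reviews/month: {single_reviews} vs {concurrent_reviews}")  # 142 vs 73
```
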
## Detailed Token Breakdown

### Single Agent Review (Baseline)

```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens

STAGES 2-5: COMPREHENSIVE ANALYSIS (Single Agent)
├─ Code review analysis: ~8,000 tokens
├─ Architecture analysis: ~10,000 tokens
├─ Security analysis: ~8,000 tokens
├─ Multi-perspective analysis: ~6,000 tokens
└─ Subtotal: ~32,000 tokens

STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~3,000 tokens
├─ Action plan creation: ~2,000 tokens
└─ Subtotal: ~5,000 tokens

STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens

TOTAL SINGLE AGENT: ~44,500 tokens (~35,000-45,000 typical)
```

### Concurrent Agents Review

```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens

STAGE 2: CODE REVIEW AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
│  (re-establishing context, no shared history)
├─ Git diff input: ~2,000 tokens
│  (agent needs its own copy of the diff)
├─ Code quality analysis: ~10,000 tokens
│  (duplication, errors, secrets, style)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~15,500 tokens

STAGE 3: ARCHITECTURE AUDIT AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ File structure input: ~2,500 tokens
│  (agent needs file paths and structure)
├─ Architecture analysis: ~12,000 tokens
│  (6-dimensional analysis)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~18,000 tokens

STAGE 4: SECURITY & COMPLIANCE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Code input for security review: ~2,000 tokens
├─ Security analysis: ~11,000 tokens
│  (OWASP, dependencies, secrets)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~16,000 tokens

STAGE 5: MULTI-PERSPECTIVE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Feature description: ~1,500 tokens
│  (agent needs less context, just requirements)
├─ Multi-perspective analysis: ~9,000 tokens
│  (6 stakeholder perspectives)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~13,500 tokens

STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~4,000 tokens
│  (4 sets of results to aggregate)
├─ Action plan creation: ~2,500 tokens
└─ Subtotal: ~6,500 tokens

STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens

TOTAL CONCURRENT AGENTS: ~77,000 tokens (~68,000-78,000 typical)
```

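The stage subtotals above can be summed programmatically; this sketch uses the estimates from the breakdown (planning figures, not measurements):

```python
# Sum the concurrent-pipeline stage estimates from the breakdown above.
stages = {
    "git preparation": 3_500,
    "code review agent": 15_500,
    "architecture agent": 18_000,
    "security agent": 16_000,
    "multi-perspective agent": 13_500,
    "synthesis": 6_500,
    "interactive resolution": 4_000,
}
total = sum(stages.values())
print(f"concurrent total: ~{total:,} tokens")  # ~77,000 tokens
print(f"vs. single agent: {total / 44_500:.2f}x")
```
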
### Why Concurrent Costs More

```
Cost Difference Breakdown:

Extra overhead from the concurrent approach:
├─ Agent initialization (4x): ~8,000 tokens
│  (each agent re-establishes context)
├─ Input duplication (4x): ~8,000 tokens
│  (each agent gets its own copy of files)
├─ Result aggregation: ~2,000 tokens
│  (main thread consolidates 4 result sets)
├─ Synthesis complexity: ~1,500 tokens
│  (harder to merge 4 perspectives)
└─ API overhead: ~500 tokens
   (4 separate API requests)

OVERHEAD SUBTOTAL: ~20,000 tokens
Deeper per-agent analysis and output formatting add roughly
another ~12,500 tokens (~42,000 of agent analysis vs. ~32,000
in the single-agent pass), so ~44,500 + ~32,500 ≈ ~77,000 total.

Naively, you might expect concurrency to be free:
- Sequential single agent: ~44,500 tokens
- Split across 4 concurrent agents: still ~44,500 tokens,
  just ~11,125 per agent

ACTUAL concurrent total: ~77,000 tokens

Why the gap?
- No shared context between agents
- Each agent re-does setup
- Each agent needs full input data
- Results aggregation is not "free"
```

---

## Token Cost by Analysis Type

### Code Review Agent Token Budget

```
Input Processing:
├─ Git diff loading: ~2,000 tokens
├─ File context: ~1,000 tokens
└─ Subtotal: ~3,000 tokens

Analysis:
├─ Readability review: ~2,000 tokens
├─ Duplication detection: ~2,000 tokens
├─ Error handling check: ~2,000 tokens
├─ Secret detection: ~1,500 tokens
├─ Test coverage review: ~1,500 tokens
├─ Performance analysis: ~1,000 tokens
└─ Subtotal: ~10,000 tokens

Output:
├─ Formatting results: ~1,000 tokens
├─ Severity prioritization: ~500 tokens
└─ Subtotal: ~1,500 tokens

Code Review Total: ~14,500 tokens
```

### Architecture Audit Agent Token Budget

```
Input Processing:
├─ File structure loading: ~2,500 tokens
├─ Module relationship mapping: ~2,000 tokens
└─ Subtotal: ~4,500 tokens

Analysis (6 dimensions):
├─ Architecture & Design: ~2,500 tokens
├─ Code Quality: ~2,000 tokens
├─ Security: ~2,000 tokens
├─ Performance: ~1,500 tokens
├─ Testing: ~1,500 tokens
├─ Maintainability: ~1,500 tokens
└─ Subtotal: ~11,000 tokens

Output:
├─ Dimension scoring: ~1,500 tokens
├─ Recommendations: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Architecture Total: ~18,000 tokens
```

### Security & Compliance Agent Token Budget

```
Input Processing:
├─ Code loading: ~2,000 tokens
├─ Dependency list: ~1,000 tokens
└─ Subtotal: ~3,000 tokens

Analysis:
├─ OWASP Top 10 check: ~3,000 tokens
├─ Dependency vulnerability scan: ~2,500 tokens
├─ Secrets/keys detection: ~2,000 tokens
├─ Encryption review: ~1,500 tokens
├─ Auth/AuthZ review: ~1,500 tokens
├─ Compliance requirements: ~1,000 tokens
└─ Subtotal: ~11,500 tokens

Output:
├─ Severity assessment: ~1,000 tokens
├─ Remediation guidance: ~1,000 tokens
└─ Subtotal: ~2,000 tokens

Security Total: ~16,500 tokens
```

### Multi-Perspective Agent Token Budget

```
Input Processing:
├─ Feature description: ~1,500 tokens
├─ Change summary: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Analysis (6 perspectives):
├─ Product perspective: ~1,500 tokens
├─ Dev perspective: ~1,500 tokens
├─ QA perspective: ~1,500 tokens
├─ Security perspective: ~1,500 tokens
├─ DevOps perspective: ~1,000 tokens
├─ Design perspective: ~1,000 tokens
└─ Subtotal: ~8,000 tokens

Output:
├─ Stakeholder summary: ~1,500 tokens
├─ Risk assessment: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Multi-Perspective Total: ~13,000 tokens
```

---

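The four per-agent budgets above can be tabulated in one place; the figures are this document's estimates (the stage-level subtotals earlier differ slightly because they also count agent initialization):

```python
# Per-agent budgets from the sections above (document estimates).
budgets = {
    "code_review": {"input": 3_000, "analysis": 10_000, "output": 1_500},
    "architecture": {"input": 4_500, "analysis": 11_000, "output": 2_500},
    "security": {"input": 3_000, "analysis": 11_500, "output": 2_000},
    "multi_perspective": {"input": 2_500, "analysis": 8_000, "output": 2_500},
}
for agent, parts in budgets.items():
    print(f"{agent}: ~{sum(parts.values()):,} tokens")
# code_review: ~14,500; architecture: ~18,000;
# security: ~16,500; multi_perspective: ~13,000
```
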
## Monthly Cost Comparison

### Scenario: 5M Token Monthly Budget

```
SINGLE AGENT APPROACH
├─ Tokens per review: ~35,000
├─ Reviews per month: 5,000,000 / 35,000 = 142 reviews
├─ Cost efficiency: Excellent
└─ Best for: High-frequency reviews, rapid feedback

CONCURRENT AGENTS APPROACH
├─ Tokens per review: ~68,000
├─ Reviews per month: 5,000,000 / 68,000 = 73 reviews
├─ Cost efficiency: Half as many reviews
└─ Best for: Selective, high-quality reviews

COST COMPARISON
├─ Same budget: 5M tokens
├─ Single agent can do: 142 reviews
├─ Concurrent can do: 73 reviews
├─ Sacrifice: 69 fewer reviews per month
└─ Gain: 4 expert perspectives per review
```

### Pricing Impact (USD)

Assuming Claude 3.5 Sonnet pricing (~$3 per 1M tokens):

```
SINGLE AGENT
├─ 35,000 tokens per review: $0.105 per review
├─ 142 reviews per month: $14.91/month (from shared budget)
└─ Annual spend at that rate: ~$179/year

CONCURRENT AGENTS
├─ 68,000 tokens per review: $0.204 per review
├─ 73 reviews per month: $14.89/month (from shared budget)
└─ Annual spend at that rate: ~$179/year

WITHIN THE SAME 5M BUDGET:
├─ Concurrent approach: ~2x cost per review
├─ But the same monthly spend
└─ Trade-off: Quantity vs. Quality
```

---

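The per-review dollar figures fall out of one multiplication; a sketch assuming the flat ~$3-per-1M-token rate used above (illustrative, so check current provider pricing):

```python
# Dollar cost per review and monthly capacity at a flat token price.
PRICE_PER_TOKEN = 3 / 1_000_000  # ~$3 per 1M tokens (assumed flat rate)
BUDGET = 5_000_000               # monthly token budget

for mode, tokens in [("single", 35_000), ("concurrent", 68_000)]:
    per_review = tokens * PRICE_PER_TOKEN
    reviews = BUDGET // tokens
    print(f"{mode}: ${per_review:.3f}/review, {reviews} reviews/month")
# single: $0.105/review, 142 reviews/month
# concurrent: $0.204/review, 73 reviews/month
```
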
## Optimization Strategies

### Strategy 1: Use a Single Agent for Everyday Work

```
Mix Approach:
├─ 80% of code reviews: Single agent (~28,000 tokens avg)
└─ 20% of code reviews: Concurrent agents (for critical work)

Monthly breakdown (5M budget):
├─ Blended cost: 0.8 × 28K + 0.2 × 68K = ~36K tokens per review
├─ ~110 single-agent reviews @ 28K tokens = ~3.1M tokens
├─ ~28 concurrent reviews @ 68K tokens = ~1.9M tokens
├─ Monthly capacity: ~138 reviews
└─ Better mix of quality and quantity
```

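The blended-capacity math generalizes to any split; a small calculator using the document's per-review estimates:

```python
# Blended capacity for a mixed single/concurrent review strategy.
def mix_capacity(budget, single_share, single_cost, concurrent_cost):
    # Weighted-average tokens per review for the chosen split.
    blended = single_share * single_cost + (1 - single_share) * concurrent_cost
    total = int(budget // blended)
    singles = round(total * single_share)
    return total, singles, total - singles

total, singles, concurrents = mix_capacity(5_000_000, 0.8, 28_000, 68_000)
print(total, singles, concurrents)  # 138 110 28
```

Shifting the split toward concurrent reviews lowers total capacity but raises the share of multi-perspective coverage.
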
### Strategy 2: Off-Peak Concurrent

```
Timing-Based Approach:
├─ Daytime (peak): Use the single agent
└─ Nighttime/weekend (off-peak): Use concurrent agents
   (API is less congested, so concurrency works better)

Benefits:
├─ Off-peak: Concurrent runs faster and more reliably
├─ Peak: Avoids rate-limiting issues
├─ Cost: Still ~2x tokens
└─ Experience: Better latency during off-peak
```

### Strategy 3: Cost-Conscious Concurrent

```
Limited Use of Concurrent:
├─ Release reviews: Always concurrent (quality matters)
├─ Security-critical changes: Always concurrent
├─ Regular features: Single agent
└─ Bug fixes: Single agent

Monthly breakdown (5M budget):
├─ 2 releases/month @ 68K: 136K tokens
├─ 6 security reviews @ 68K: 408K tokens
├─ 100 regular features @ 28K: 2,800K tokens
├─ 50 bug fixes @ 28K: 1,400K tokens
└─ Total: ~4.7M tokens (stays within budget)
```

---

## Reducing Token Costs

### For Concurrent Agents

#### 1. Use "Lightweight" Input Mode

```
Standard Input (Full Context):
├─ Complete git diff: ~2,500 tokens
├─ All modified files: ~2,000 tokens
├─ Full file structure: ~2,500 tokens
└─ Total input: ~7,000 tokens

Lightweight Input (Summary):
├─ Summarized diff: ~500 tokens
├─ File names only: ~200 tokens
├─ Structure summary: ~500 tokens
└─ Total input: ~1,200 tokens

Savings: ~5,800 tokens per agent × 4 = ~23,200 tokens saved
New total: ~68,000 − 23,200 ≈ ~44,800 tokens (only ~1.3x single agent!)
```

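One way to produce the lightweight input is to collapse a unified diff to per-file change counts before handing it to each agent. A minimal sketch (the helper name and output format are illustrative, not part of the pipeline):

```python
import re

def summarize_diff(diff_text: str) -> str:
    """Collapse a unified diff to one line per file: name, +added, -removed."""
    stats, current = {}, None
    for line in diff_text.splitlines():
        m = re.match(r"^\+\+\+ b/(.+)$", line)
        if m:  # start of a new file's hunk headers
            current = m.group(1)
            stats[current] = [0, 0]
        elif current and line.startswith("+") and not line.startswith("+++"):
            stats[current][0] += 1  # added line
        elif current and line.startswith("-") and not line.startswith("---"):
            stats[current][1] += 1  # removed line
    return "\n".join(f"{f}: +{a} -{r}" for f, (a, r) in stats.items())

diff = """--- a/app.py
+++ b/app.py
@@ -1,2 +1,2 @@
-old = 1
+new = 2
"""
print(summarize_diff(diff))  # app.py: +1 -1
```
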
#### 2. Reduce Agent Scope

```
Full Scope (Current):
├─ Code Review: All aspects
├─ Architecture: 6 dimensions
├─ Security: Full OWASP
├─ Multi-Perspective: 6 angles
└─ Total: ~68,000 tokens

Reduced Scope:
├─ Code Review: Security + structure only (saves ~2,000)
├─ Architecture: Top 3 dimensions (saves ~4,000)
├─ Security: OWASP critical only (saves ~2,000)
├─ Multi-Perspective: 3 key angles (saves ~3,000)
└─ Total: ~57,000 tokens

Savings: ~11,000 tokens (16% reduction)
```

#### 3. Skip Non-Critical Agents

```
Full Pipeline (4 agents):
└─ Total: ~68,000 tokens

Critical Only (2 agents):
├─ Code Review Agent: ~15,000 tokens
├─ Security Agent: ~16,000 tokens
└─ Agent total: ~31,000 tokens (close to single-agent territory)

Use when:
- Simple changes (no architecture impact)
- No security implications
- Team review not needed
```

---

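Skipping agents can be automated with a small policy function; the flag and agent names below are hypothetical stand-ins, not part of the pipeline's actual API:

```python
# Pick which review agents to launch for a change (hypothetical policy).
def select_agents(touches_architecture: bool,
                  security_sensitive: bool,
                  needs_stakeholder_review: bool) -> list[str]:
    agents = ["code_review"]  # always run the core reviewer
    if security_sensitive:
        agents.append("security")
    if touches_architecture:
        agents.append("architecture")
    if needs_stakeholder_review:
        agents.append("multi_perspective")
    return agents

# A simple bug fix: only the core reviewer runs.
print(select_agents(False, False, False))  # ['code_review']
# A security-sensitive change: core reviewer plus security agent.
print(select_agents(False, True, False))   # ['code_review', 'security']
```
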
## When Higher Token Cost is Worth It

### ROI Calculation

```
Extra cost per review: ~33,000 tokens (~$0.10)

Value of a single catch:
├─ 1 critical security issue
│  (cost of a breach: $1M+; detection cost: $0.10)
├─ 1 architectural mistake
│  (cost of refactoring: weeks of work; detection cost: $0.10)
├─ 1 major duplication
│  (maintenance burden: months; detection cost: $0.10)
├─ 1 compliance gap
│  (regulatory fine: thousands to millions; detection cost: $0.10)
└─ 1 performance regression
   (production incident: hours of downtime; detection cost: $0.10)
```

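Using the figures above, the break-even arithmetic is stark (illustrative numbers from the text, not measured outcomes):

```python
# How many extra-cost concurrent reviews one avoided breach pays for.
EXTRA_COST_PER_REVIEW = 0.10  # ~33,000 extra tokens at ~$3/1M
BREACH_COST = 1_000_000       # the text's $1M+ breach estimate

reviews_funded = BREACH_COST / EXTRA_COST_PER_REVIEW
print(f"{reviews_funded:,.0f} reviews")  # 10,000,000 reviews
```
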
### Examples Where ROI is Positive
|
||
|
||
1. **Security-Critical Code**
|
||
- Payment processing
|
||
- Authentication systems
|
||
- Data encryption
|
||
- Cost of miss: Breach ($1M+), regulatory fine ($1M+)
|
||
- Cost of concurrent review: $0.10
|
||
- ROI: Infinite (one miss pays for millions of reviews)
|
||
|
||
2. **Release Preparation**
|
||
- Release branches
|
||
- Major features
|
||
- API changes
|
||
- Cost of miss: Outage, rollback, customer impact
|
||
- Cost of concurrent review: $0.10
|
||
- ROI: Extremely high
|
||
|
||
3. **Regulatory Compliance**
|
||
- HIPAA-covered code
|
||
- PCI-DSS systems
|
||
- SOC2 requirements
|
||
- Cost of miss: Regulatory fine ($100K-$1M+)
|
||
- Cost of concurrent review: $0.10
|
||
- ROI: Astronomical
|
||
|
||
4. **Enterprise Standards**
|
||
- Multiple team sign-off
|
||
- Audit trail requirement
|
||
- Stakeholder input
|
||
- Cost of miss: Rework, team friction
|
||
- Cost of concurrent review: $0.10
|
||
- ROI: High (prevents rework)
|
||
|
||
---
|
||
|
||
## Token Usage Monitoring

### What to Track

```
Per Review:
├─ Actual tokens used (not estimated)
├─ Agent breakdown (which agent used most)
├─ Input size (diff size, file count)
└─ Output length (findings generated)

Monthly:
├─ Total tokens used
├─ Reviews completed
├─ Average tokens per review
└─ Trend analysis

Annual:
├─ Total token spend
├─ Cost vs. budget
├─ Reviews completed
└─ ROI analysis
```

### Setting Alerts

```
Rate Limit Alerts:
├─ 70% of TPM used in a minute → Warning
├─ 90% of TPM used in a minute → Critical
└─ Hit TPM limit → Block and notify

Monthly Budget Alerts:
├─ 50% of budget used → Informational
├─ 75% of budget used → Warning
└─ 90% of budget used → Critical

Cost Thresholds:
├─ Single review > 100K tokens → Unexpected (investigate)
├─ Average > 80K tokens → Possible over-analysis (review)
└─ Concurrent running during peak hours → Not optimal (schedule off-peak)
```

---

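The monthly budget thresholds map naturally onto a small helper; the level names mirror the list above and are otherwise arbitrary:

```python
# Map monthly budget usage to the alert levels listed above.
def budget_alert(tokens_used: int, monthly_budget: int) -> str:
    fraction = tokens_used / monthly_budget
    if fraction >= 0.90:
        return "critical"
    if fraction >= 0.75:
        return "warning"
    if fraction >= 0.50:
        return "informational"
    return "ok"

print(budget_alert(2_600_000, 5_000_000))  # informational (52% used)
print(budget_alert(4_600_000, 5_000_000))  # critical (92% used)
```

The same shape works for the per-minute TPM thresholds with 0.70/0.90 cutoffs.
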
## Cost Optimization Summary

| Strategy | Tokens Saved | When to Use |
|----------|--------------|-------------|
| **Mix single + concurrent** | ~40% per month vs. all-concurrent | Daily workflow |
| **Off-peak scheduling** | ~15% (better concurrency) | When possible |
| **Lightweight input mode** | ~35% per concurrent review | Non-critical reviews |
| **Reduce agent scope** | ~15-20% | Simple changes |
| **Skip non-critical agents** | ~50% | Low-risk PRs |
| **Single agent only** | ~50% (baseline cost) | Cost-sensitive |

---

## Recommendation

```
Use Concurrent Agents When:
├─ Token budget > 5M per month
├─ Quality > cost as a priority
├─ Security-critical code
├─ Release reviews
├─ Multiple perspectives needed
└─ Regulatory requirements

Use a Single Agent When:
├─ Limited token budget
├─ High-frequency reviews needed
├─ Simple changes
├─ The 20-30% speed gain from concurrency is not material
├─ Cost-sensitive work
└─ No multi-perspective requirement

Use a Mix Strategy When:
├─ You want both quality and quantity
├─ You can run selective, high-value concurrent reviews
├─ You have a moderate token budget
├─ You are an enterprise with varied code types
└─ You want the best of both worlds
```

---

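The recommendation tree condenses into a simple selector; the criteria flags are hypothetical names standing in for the bullets above:

```python
# Choose a review mode from the recommendation criteria (hypothetical flags).
def choose_mode(security_critical: bool, release_review: bool,
                regulatory: bool, large_budget: bool) -> str:
    if security_critical or release_review or regulatory:
        return "concurrent"  # quality-critical: pay the ~2x tokens
    if large_budget:
        return "mix"         # selective high-value concurrent reviews
    return "single"          # cost-efficient default

print(choose_mode(True, False, False, False))   # concurrent
print(choose_mode(False, False, False, False))  # single
```
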
**For full analysis, see [REALITY.md](REALITY.md) and [ARCHITECTURE.md](ARCHITECTURE.md).**