# Token Usage & Cost Analysis
**Version:** 1.0.0
**Date:** 2025-10-31
**Purpose:** Understand the true cost of concurrent agents vs. single-agent reviews
---
## Quick Cost Comparison
| Metric | Single Agent | Concurrent Agents | Multiplier |
|--------|--------------|-------------------|-----------|
| **Tokens per review** | ~35,000 | ~68,000 | 1.9x |
| **Monthly reviews (5M tokens)** | 142 | 73 | 0.5x |
| **Cost multiplier** | 1x | 2x | - |
| **Time to execute** | 39-62 min | 31-42 min | 0.6-0.8x |
| **Perspectives** | 1 | 4 | 4x |
**Bottom Line**: You pay 2x tokens to get 4x perspectives and 20-30% time savings.
---
## Detailed Token Breakdown
### Single Agent Review (Baseline)
```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens
STAGES 2-5: COMPREHENSIVE ANALYSIS (Single Agent)
├─ Code review analysis: ~8,000 tokens
├─ Architecture analysis: ~10,000 tokens
├─ Security analysis: ~8,000 tokens
├─ Multi-perspective analysis: ~6,000 tokens
└─ Subtotal: ~32,000 tokens
STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~3,000 tokens
├─ Action plan creation: ~2,000 tokens
└─ Subtotal: ~5,000 tokens
STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens
TOTAL SINGLE AGENT: ~44,500 tokens (~35,000-45,000 typical)
```
### Concurrent Agents Review
```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens
STAGE 2: CODE REVIEW AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
│ (re-establishing context, no shared history)
├─ Git diff input: ~2,000 tokens
│ (agent needs own copy of diff)
├─ Code quality analysis: ~10,000 tokens
│ (duplication, errors, secrets, style)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~15,500 tokens
STAGE 3: ARCHITECTURE AUDIT AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ File structure input: ~2,500 tokens
│ (agent needs file paths and structure)
├─ Architecture analysis: ~12,000 tokens
│ (6-dimensional analysis)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~18,000 tokens
STAGE 4: SECURITY & COMPLIANCE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Code input for security review: ~2,000 tokens
├─ Security analysis: ~11,000 tokens
│ (OWASP, dependencies, secrets)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~16,000 tokens
STAGE 5: MULTI-PERSPECTIVE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Feature description: ~1,500 tokens
│ (agent needs less context, just requirements)
├─ Multi-perspective analysis: ~9,000 tokens
│ (6 stakeholder perspectives)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~13,500 tokens
STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~4,000 tokens
│ (4 sets of results to aggregate)
├─ Action plan creation: ~2,500 tokens
└─ Subtotal: ~6,500 tokens
STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens
TOTAL CONCURRENT AGENTS: ~77,000 tokens (~68,000-78,000 typical)
```
### Why Concurrent Costs More
```
Cost Difference Breakdown:
Extra overhead from concurrent approach:
├─ Agent initialization (4x): 8,000 tokens
│ (each agent re-establishes context)
├─ Input duplication (4x): 8,000 tokens
│ (each agent gets its own copy of files)
├─ Result aggregation: 2,000 tokens
│ (main thread consolidates 4 result sets)
├─ Synthesis complexity: 1,500 tokens
│ (harder to merge 4 perspectives)
└─ API overhead: ~500 tokens
(4 separate API requests)
TOTAL EXTRA OVERHEAD: ~20,000 tokens
(deeper per-agent analysis adds another ~12,500,
bringing ~44,500 up to ~77,000 total)
Agents run concurrently, so you might naively expect a flat total:
- Sequential single agent: 44,500 tokens
- Concurrent 4 agents: 44,500 / 4 ≈ 11,125 per agent
- Expected total: ~44,500 tokens
ACTUAL concurrent: ~77,000 tokens
Why the gap?
- No shared context between agents
- Each agent re-does setup
- Each agent needs full input data
- Results aggregation is not "free"
```
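For a quick sanity check, here is the same arithmetic as a minimal Python sketch. The figures are the rough estimates from the breakdowns above, not measured values; the headline 1.9-2.0x multiplier uses the typical per-review midpoints (~35K vs. ~68K) rather than these upper-end totals.
```python
# Rough per-stage token estimates from the breakdowns above (not measurements).
SINGLE_AGENT = {
    "git_prep": 3_500,
    "comprehensive_analysis": 32_000,
    "synthesis": 5_000,
    "interactive_resolution": 4_000,
}
CONCURRENT = {
    "git_prep": 3_500,
    "code_review_agent": 15_500,
    "architecture_agent": 18_000,
    "security_agent": 16_000,
    "multi_perspective_agent": 13_500,
    "synthesis": 6_500,
    "interactive_resolution": 4_000,
}

single_total = sum(SINGLE_AGENT.values())    # ~44,500 tokens
concurrent_total = sum(CONCURRENT.values())  # ~77,000 tokens
print(f"single:     {single_total:,}")
print(f"concurrent: {concurrent_total:,}")
print(f"multiplier: {concurrent_total / single_total:.2f}x")
```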
---
## Token Cost by Analysis Type
### Code Review Agent Token Budget
```
Input Processing:
├─ Git diff loading: ~2,000 tokens
├─ File context: ~1,000 tokens
└─ Subtotal: ~3,000 tokens
Analysis:
├─ Readability review: ~2,000 tokens
├─ Duplication detection: ~2,000 tokens
├─ Error handling check: ~2,000 tokens
├─ Secret detection: ~1,500 tokens
├─ Test coverage review: ~1,500 tokens
├─ Performance analysis: ~1,000 tokens
└─ Subtotal: ~10,000 tokens
Output:
├─ Formatting results: ~1,000 tokens
├─ Severity prioritization: ~500 tokens
└─ Subtotal: ~1,500 tokens
Code Review Total: ~14,500 tokens
```
### Architecture Audit Agent Token Budget
```
Input Processing:
├─ File structure loading: ~2,500 tokens
├─ Module relationship mapping: ~2,000 tokens
└─ Subtotal: ~4,500 tokens
Analysis (6 dimensions):
├─ Architecture & Design: ~2,500 tokens
├─ Code Quality: ~2,000 tokens
├─ Security: ~2,000 tokens
├─ Performance: ~1,500 tokens
├─ Testing: ~1,500 tokens
├─ Maintainability: ~1,500 tokens
└─ Subtotal: ~11,000 tokens
Output:
├─ Dimension scoring: ~1,500 tokens
├─ Recommendations: ~1,000 tokens
└─ Subtotal: ~2,500 tokens
Architecture Total: ~18,000 tokens
```
### Security & Compliance Agent Token Budget
```
Input Processing:
├─ Code loading: ~2,000 tokens
├─ Dependency list: ~1,000 tokens
└─ Subtotal: ~3,000 tokens
Analysis:
├─ OWASP Top 10 check: ~3,000 tokens
├─ Dependency vulnerability scan: ~2,500 tokens
├─ Secrets/keys detection: ~2,000 tokens
├─ Encryption review: ~1,500 tokens
├─ Auth/AuthZ review: ~1,500 tokens
├─ Compliance requirements: ~1,000 tokens
└─ Subtotal: ~11,500 tokens
Output:
├─ Severity assessment: ~1,000 tokens
├─ Remediation guidance: ~1,000 tokens
└─ Subtotal: ~2,000 tokens
Security Total: ~16,500 tokens
```
### Multi-Perspective Agent Token Budget
```
Input Processing:
├─ Feature description: ~1,500 tokens
├─ Change summary: ~1,000 tokens
└─ Subtotal: ~2,500 tokens
Analysis (6 perspectives):
├─ Product perspective: ~1,500 tokens
├─ Dev perspective: ~1,500 tokens
├─ QA perspective: ~1,500 tokens
├─ Security perspective: ~1,500 tokens
├─ DevOps perspective: ~1,000 tokens
├─ Design perspective: ~1,000 tokens
└─ Subtotal: ~8,000 tokens
Output:
├─ Stakeholder summary: ~1,500 tokens
├─ Risk assessment: ~1,000 tokens
└─ Subtotal: ~2,500 tokens
Multi-Perspective Total: ~13,000 tokens
```
---
## Monthly Cost Comparison
### Scenario: 5M Token Monthly Budget
```
SINGLE AGENT APPROACH
├─ Tokens per review: ~35,000
├─ Reviews per month: 5,000,000 / 35,000 = 142 reviews
├─ Cost efficiency: Excellent
└─ Best for: High-frequency reviews, rapid feedback
CONCURRENT AGENTS APPROACH
├─ Tokens per review: ~68,000
├─ Reviews per month: 5,000,000 / 68,000 = 73 reviews
├─ Cost efficiency: Half as many reviews
└─ Best for: Selective, high-quality reviews
COST COMPARISON
├─ Same budget: 5M tokens
├─ Single agent can do: 142 reviews
├─ Concurrent can do: 73 reviews
├─ Sacrifice: 69 fewer reviews per month
└─ Gain: 4 expert perspectives per review
```
### Pricing Impact (USD)
Assuming Claude 3.5 Sonnet input pricing (~$3 per 1M tokens; output tokens are billed higher, so treat these figures as rough lower bounds):
```
SINGLE AGENT
├─ 35,000 tokens per review: $0.105 per review
├─ 142 reviews per month: $14.91/month (from shared budget)
└─ Annual cost: ~$179/year
CONCURRENT AGENTS
├─ 68,000 tokens per review: $0.204 per review
├─ 73 reviews per month: $14.89/month (from shared budget)
└─ Annual cost: ~$179/year
WITHIN SAME 5M BUDGET:
├─ Concurrent approach: 2x cost per review
├─ But same monthly spend
└─ Trade-off: Quantity vs. Quality
```
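A minimal sketch of the budget math, assuming the flat ~$3 per 1M token blended rate used above (real pricing bills input and output tokens at different rates):
```python
PRICE_PER_MTOK = 3.00        # USD per 1M tokens (blended-rate assumption)
MONTHLY_BUDGET = 5_000_000   # tokens

def monthly_stats(tokens_per_review: int) -> tuple[int, float]:
    """How many reviews fit in the budget, and the USD cost per review."""
    reviews = MONTHLY_BUDGET // tokens_per_review
    cost = tokens_per_review / 1_000_000 * PRICE_PER_MTOK
    return reviews, cost

for label, tokens in [("single", 35_000), ("concurrent", 68_000)]:
    reviews, cost = monthly_stats(tokens)
    print(f"{label:>10}: {reviews} reviews/month @ ${cost:.3f} each")
```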
---
## Optimization Strategies
### Strategy 1: Use Single Agent for Everyday Reviews
```
Mix Approach:
├─ 80% of code reviews: Single agent (~28,000 tokens avg)
└─ 20% of code reviews: Concurrent agents (for critical work)
Monthly breakdown (5M budget):
├─ 80% single agent: ~111 reviews @ 28K tokens = ~3.1M tokens
├─ 20% concurrent agents: ~27 reviews @ 68K tokens = ~1.8M tokens
├─ Monthly capacity: ~138 reviews
└─ Better mix of quality and quantity
```
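To check how an 80/20 split fits the budget, a small helper (same rough per-review figures as above):
```python
BUDGET = 5_000_000                   # monthly token budget
SINGLE, CONCURRENT = 28_000, 68_000  # rough tokens per review

def mixed_capacity(concurrent_share: float) -> int:
    """Total reviews per month when a share of reviews use concurrent agents."""
    avg_tokens = (1 - concurrent_share) * SINGLE + concurrent_share * CONCURRENT
    return int(BUDGET // avg_tokens)

print(mixed_capacity(0.0))   # 178 all-single (at the 28K everyday average)
print(mixed_capacity(0.2))   # 138 with the 80/20 mix
print(mixed_capacity(1.0))   # 73 all-concurrent
```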
### Strategy 2: Off-Peak Concurrent
```
Timing-Based Approach:
├─ Daytime (peak): Use single agent
├─ Nighttime/weekend (off-peak): Use concurrent agents
│ (API is less congested, better concurrency)
Benefits:
├─ Off-peak: Concurrent runs faster and more reliably
├─ Peak: Avoid rate limiting issues
├─ Cost: Still 2x tokens
└─ Experience: Better latency during off-peak
```
### Strategy 3: Cost-Conscious Concurrent
```
Limited Use of Concurrent:
├─ Release reviews: Always concurrent (quality matters)
├─ Security-critical changes: Always concurrent
├─ Regular features: Single agent
└─ Bug fixes: Single agent
Monthly breakdown (5M budget):
├─ 2 releases/month @ 68K: 136K tokens
├─ 6 security reviews @ 68K: 408K tokens
├─ 100 regular features @ 28K: 2,800K tokens
├─ 50 bug fixes @ 28K: 1,400K tokens
└─ Total: ~4.7M tokens (stays within budget)
```
---
## Reducing Token Costs
### For Concurrent Agents
#### 1. Use "Lightweight" Input Mode
```
Standard Input (Full Context):
├─ Complete git diff: 2,500 tokens
├─ All modified files: 2,000 tokens
├─ Full file structure: 2,500 tokens
└─ Total input: ~7,000 tokens
Lightweight Input (Summary):
├─ Summarized diff: 500 tokens
├─ File names only: 200 tokens
├─ Structure summary: 500 tokens
└─ Total input: ~1,200 tokens
Savings: ~5,800 tokens per agent × 4 = ~23,200 tokens saved
New total: ~44,800 tokens (only ~1.3x single agent)
```
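One way to produce the lightweight input is to trim the diff before handing it to each agent. This is a hypothetical pre-processing helper, not part of the pipeline, and it assumes you control exactly what each agent receives:
```python
def lightweight_diff(diff: str, keep_changed_lines: bool = False) -> str:
    """Reduce a git diff to file and hunk headers, optionally keeping
    the changed lines themselves but never the surrounding context."""
    kept = []
    for line in diff.splitlines():
        if line.startswith(("diff --git", "@@")):
            kept.append(line)  # which file, which region
        elif keep_changed_lines and line.startswith(("+", "-")) \
                and not line.startswith(("+++", "---")):
            kept.append(line)  # the change itself, no context lines
    return "\n".join(kept)
```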
#### 2. Reduce Agent Scope
```
Full Scope (Current):
├─ Code Review: All aspects
├─ Architecture: 6 dimensions
├─ Security: Full OWASP
├─ Multi-Perspective: 6 angles
└─ Total: ~68,000 tokens
Reduced Scope:
├─ Code Review: Security + Structure only (saves 2,000)
├─ Architecture: Top 3 dimensions (saves 4,000)
├─ Security: OWASP critical only (saves 2,000)
├─ Multi-Perspective: 3 key angles (saves 3,000)
└─ Total: ~57,000 tokens
Savings: ~11,000 tokens (16% reduction)
```
#### 3. Skip Non-Critical Agents
```
Full Pipeline (4 agents):
└─ Total: ~68,000 tokens
Critical Only (2 agents):
├─ Code Review Agent: ~15,000 tokens
├─ Security Agent: ~16,000 tokens
└─ Total: ~31,000 tokens (comparable to a single agent)
Use when:
- Simple changes (no architecture impact)
- No security implications
- Team review not needed
```
---
## When Higher Token Cost is Worth It
### ROI Calculation
```
Extra cost per review: ~33,000 tokens (~$0.10)
The value of a single finding dwarfs that cost:
├─ 1 critical security issue
│   (cost of breach: $1M+ vs. detection: $0.10)
├─ 1 architectural mistake
│   (cost of refactoring: weeks of work vs. detection: $0.10)
├─ 1 major duplication
│   (maintenance burden: months vs. detection: $0.10)
├─ 1 compliance gap
│   (regulatory fine: thousands to millions vs. detection: $0.10)
└─ 1 performance regression
    (production incident: hours of downtime vs. detection: $0.10)
```
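The same argument as a back-of-the-envelope expected-value check (the 0.1% catch rate is illustrative, not a measured figure):
```python
extra_cost = 0.10          # USD, extra spend per concurrent review
breach_cost = 1_000_000    # USD, per the estimate above
catch_rate = 0.001         # illustrative: 1-in-1,000 reviews catches a breach

expected_value = catch_rate * breach_cost - extra_cost
print(f"expected value per review: ${expected_value:,.2f}")  # $999.90
```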
### Examples Where ROI is Positive
1. **Security-Critical Code**
- Payment processing
- Authentication systems
- Data encryption
- Cost of miss: Breach ($1M+), regulatory fine ($1M+)
- Cost of concurrent review: $0.10
- ROI: Effectively infinite (one prevented breach pays for millions of reviews)
2. **Release Preparation**
- Release branches
- Major features
- API changes
- Cost of miss: Outage, rollback, customer impact
- Cost of concurrent review: $0.10
- ROI: Extremely high
3. **Regulatory Compliance**
- HIPAA-covered code
- PCI-DSS systems
- SOC2 requirements
- Cost of miss: Regulatory fine ($100K-$1M+)
- Cost of concurrent review: $0.10
- ROI: Astronomical
4. **Enterprise Standards**
- Multiple team sign-off
- Audit trail requirement
- Stakeholder input
- Cost of miss: Rework, team friction
- Cost of concurrent review: $0.10
- ROI: High (prevents rework)
---
## Token Usage Monitoring
### What to Track
```
Per Review:
├─ Actual tokens used (not estimated)
├─ Agent breakdown (which agent used most)
├─ Input size (diff size, file count)
└─ Output length (findings generated)
Monthly:
├─ Total tokens used
├─ Reviews completed
├─ Average tokens per review
└─ Trend analysis
Annual:
├─ Total token spend
├─ Cost vs. budget
├─ Reviews completed
└─ ROI analysis
```
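A sketch of per-review tracking, assuming the Anthropic Python SDK (the `usage` fields on a response report actual, not estimated, tokens); `tracked_call` and the ledger are illustrative names, not part of the pipeline:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def tracked_call(agent: str, prompt: str, ledger: dict[str, int]) -> str:
    """Run one agent request and record its actual token usage."""
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    ledger[agent] = ledger.get(agent, 0) + (
        message.usage.input_tokens + message.usage.output_tokens
    )
    return message.content[0].text

ledger: dict[str, int] = {}
# ...one tracked_call per agent during a review...
# sorted(ledger.items(), key=lambda kv: -kv[1])  # which agent used most
```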
### Setting Alerts
```
Rate Limit Alerts:
├─ 70% of TPM (tokens per minute) used → Warning
├─ 90% of TPM used → Critical
└─ TPM limit hit → Block and notify
Monthly Budget Alerts:
├─ 50% of budget used → Informational
├─ 75% of budget used → Warning
└─ 90% of budget used → Critical
Cost Thresholds:
├─ Single review > 100K tokens → Unexpected (investigate)
├─ Average > 80K tokens → Possible over-analysis (review)
└─ Concurrent running during peak hours → Not optimal (schedule off-peak)
```
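The budget tiers above map directly to a small helper; a minimal sketch:
```python
def budget_alert(tokens_used: int, monthly_budget: int) -> str | None:
    """Return the alert level for current monthly usage, or None if under 50%."""
    pct = tokens_used / monthly_budget
    if pct >= 0.90:
        return "CRITICAL"
    if pct >= 0.75:
        return "WARNING"
    if pct >= 0.50:
        return "INFO"
    return None

assert budget_alert(4_600_000, 5_000_000) == "CRITICAL"  # 92% used
assert budget_alert(2_000_000, 5_000_000) is None        # 40% used
```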
---
## Cost Optimization Summary
| Strategy | Savings | When to Use |
|----------|---------|-------------|
| **Mix single + concurrent** | ~40% per month vs. all-concurrent | Daily workflow |
| **Off-peak scheduling** | Faster runs, no token savings | When possible |
| **Lightweight input mode** | ~35% per concurrent review | Non-critical reviews |
| **Reduce agent scope** | 15-20% | Simple changes |
| **Skip non-critical agents** | ~50% | Low-risk PRs |
| **Single agent only** | ~50% vs. concurrent | Cost-sensitive |
---
## Recommendation
```
Use Concurrent Agents When:
├─ Token budget > 5M per month
├─ Quality > Cost priority
├─ Security-critical code
├─ Release reviews
├─ Multiple perspectives needed
└─ Regulatory requirements
Use Single Agent When:
├─ Limited token budget
├─ High-frequency reviews needed
├─ Simple changes
├─ Speed acceptable (the 20-30% concurrent gain is not material)
├─ Cost sensitive
└─ No multi-perspective requirement
Use Mix Strategy When:
├─ Want both quality and quantity
├─ Can do selective high-value concurrent reviews
├─ Have moderate token budget
├─ Enterprise with varied code types
└─ Want best of both worlds
```
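One way to encode this decision matrix in code; a sketch only, with the thresholds and flags taken from the lists above:
```python
def choose_review_mode(*, security_critical: bool, release_review: bool,
                       needs_perspectives: bool, monthly_budget: int) -> str:
    """Pick 'concurrent', 'single', or 'mixed' per the decision matrix above."""
    high_value = security_critical or release_review or needs_perspectives
    if high_value and monthly_budget > 5_000_000:
        return "concurrent"
    if high_value:
        return "mixed"   # reserve concurrent runs for these reviews only
    return "single"

print(choose_review_mode(security_critical=True, release_review=False,
                         needs_perspectives=False, monthly_budget=6_000_000))
# -> concurrent
```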
---
**For full analysis, see [REALITY.md](REALITY.md) and [ARCHITECTURE.md](ARCHITECTURE.md).**