# Token Usage & Cost Analysis
**Version:** 1.0.0
**Date:** 2025-10-31
**Purpose:** Understand the true cost of concurrent agents vs. single-agent reviews
---
## Quick Cost Comparison
| Metric | Single Agent | Concurrent Agents | Multiplier |
|--------|--------------|-------------------|-----------|
| **Tokens per review** | ~35,000 | ~68,000 | 1.9x |
| **Monthly reviews (5M tokens)** | 142 | 73 | 0.5x |
| **Cost multiplier** | 1x | 2x | - |
| **Time to execute** | 39-62 min | 31-42 min | 0.6-0.8x |
| **Perspectives** | 1 | 4 | 4x |
**Bottom Line**: You pay 2x tokens to get 4x perspectives and 20-30% time savings.
---
## Detailed Token Breakdown
### Single Agent Review (Baseline)
```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens
STAGES 2-5: COMPREHENSIVE ANALYSIS (Single Agent)
├─ Code review analysis: ~8,000 tokens
├─ Architecture analysis: ~10,000 tokens
├─ Security analysis: ~8,000 tokens
├─ Multi-perspective analysis: ~6,000 tokens
└─ Subtotal: ~32,000 tokens
STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~3,000 tokens
├─ Action plan creation: ~2,000 tokens
└─ Subtotal: ~5,000 tokens
STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens
TOTAL SINGLE AGENT: ~44,500 tokens (~35,000-45,000 typical)
```
### Concurrent Agents Review
```
STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens
STAGE 2: CODE REVIEW AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
│ (re-establishing context, no shared history)
├─ Git diff input: ~2,000 tokens
│ (agent needs own copy of diff)
├─ Code quality analysis: ~10,000 tokens
│ (duplication, errors, secrets, style)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~15,500 tokens
STAGE 3: ARCHITECTURE AUDIT AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ File structure input: ~2,500 tokens
│ (agent needs file paths and structure)
├─ Architecture analysis: ~12,000 tokens
│ (6-dimensional analysis)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~18,000 tokens
STAGE 4: SECURITY & COMPLIANCE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Code input for security review: ~2,000 tokens
├─ Security analysis: ~11,000 tokens
│ (OWASP, dependencies, secrets)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~16,000 tokens
STAGE 5: MULTI-PERSPECTIVE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Feature description: ~1,500 tokens
│ (agent needs less context, just requirements)
├─ Multi-perspective analysis: ~9,000 tokens
│ (6 stakeholder perspectives)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~13,500 tokens
STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~4,000 tokens
│ (4 sets of results to aggregate)
├─ Action plan creation: ~2,500 tokens
└─ Subtotal: ~6,500 tokens
STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens
TOTAL CONCURRENT AGENTS: ~77,000 tokens (~68,000-78,000 typical)
```
### Why Concurrent Costs More
```
Cost Difference Breakdown:
Extra overhead from concurrent approach:
├─ Agent initialization (4x): 8,000 tokens
│ (each agent re-establishes context)
├─ Input duplication (4x): 8,000 tokens
│ (each agent gets its own copy of files)
├─ Result aggregation: 2,000 tokens
│ (main thread consolidates 4 result sets)
├─ Synthesis complexity: 1,500 tokens
│ (harder to merge 4 perspectives)
└─ API overhead: ~500 tokens
(4 separate API requests)
TOTAL EXTRA OVERHEAD: ~20,000 tokens
(deeper per-agent analysis adds another ~12,500,
bringing ~44,500 up to ~77,000 total)
Agents run concurrently, so you might naively expect a flat total:
- Sequential single agent: 44,500 tokens
- Concurrent 4 agents: 44,500 / 4 ≈ 11,125 per agent
- Expected total: ~44,500 tokens
ACTUAL concurrent: ~77,000 tokens
Why the gap?
- No shared context between agents
- Each agent re-does setup
- Each agent needs full input data
- Results aggregation is not "free"
```
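For a quick sanity check, here is the same arithmetic as a minimal Python sketch. The figures are the rough estimates from the breakdowns above, not measured values; the headline 1.9-2.0x multiplier uses the typical per-review midpoints (~35K vs. ~68K) rather than these upper-end totals.
```python
# Rough per-stage token estimates from the breakdowns above (not measurements).
SINGLE_AGENT = {
    "git_prep": 3_500,
    "comprehensive_analysis": 32_000,
    "synthesis": 5_000,
    "interactive_resolution": 4_000,
}
CONCURRENT = {
    "git_prep": 3_500,
    "code_review_agent": 15_500,
    "architecture_agent": 18_000,
    "security_agent": 16_000,
    "multi_perspective_agent": 13_500,
    "synthesis": 6_500,
    "interactive_resolution": 4_000,
}

single_total = sum(SINGLE_AGENT.values())    # ~44,500 tokens
concurrent_total = sum(CONCURRENT.values())  # ~77,000 tokens
print(f"single:     {single_total:,}")
print(f"concurrent: {concurrent_total:,}")
print(f"multiplier: {concurrent_total / single_total:.2f}x")
```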
---
## Token Cost by Analysis Type
### Code Review Agent Token Budget
```
Input Processing:
├─ Git diff loading: ~2,000 tokens
├─ File context: ~1,000 tokens
└─ Subtotal: ~3,000 tokens
Analysis:
├─ Readability review: ~2,000 tokens
├─ Duplication detection: ~2,000 tokens
├─ Error handling check: ~2,000 tokens
├─ Secret detection: ~1,500 tokens
├─ Test coverage review: ~1,500 tokens
├─ Performance analysis: ~1,000 tokens
└─ Subtotal: ~10,000 tokens
Output:
├─ Formatting results: ~1,000 tokens
├─ Severity prioritization: ~500 tokens
└─ Subtotal: ~1,500 tokens
Code Review Total: ~14,500 tokens
```
### Architecture Audit Agent Token Budget
```
Input Processing:
├─ File structure loading: ~2,500 tokens
├─ Module relationship mapping: ~2,000 tokens
└─ Subtotal: ~4,500 tokens
Analysis (6 dimensions):
├─ Architecture & Design: ~2,500 tokens
├─ Code Quality: ~2,000 tokens
├─ Security: ~2,000 tokens
├─ Performance: ~1,500 tokens
├─ Testing: ~1,500 tokens
├─ Maintainability: ~1,500 tokens
└─ Subtotal: ~11,000 tokens
Output:
├─ Dimension scoring: ~1,500 tokens
├─ Recommendations: ~1,000 tokens
└─ Subtotal: ~2,500 tokens
Architecture Total: ~18,000 tokens
```
### Security & Compliance Agent Token Budget
```
Input Processing:
├─ Code loading: ~2,000 tokens
├─ Dependency list: ~1,000 tokens
└─ Subtotal: ~3,000 tokens
Analysis:
├─ OWASP Top 10 check: ~3,000 tokens
├─ Dependency vulnerability scan: ~2,500 tokens
├─ Secrets/keys detection: ~2,000 tokens
├─ Encryption review: ~1,500 tokens
├─ Auth/AuthZ review: ~1,500 tokens
├─ Compliance requirements: ~1,000 tokens
└─ Subtotal: ~11,500 tokens
Output:
├─ Severity assessment: ~1,000 tokens
├─ Remediation guidance: ~1,000 tokens
└─ Subtotal: ~2,000 tokens
Security Total: ~16,500 tokens
```
### Multi-Perspective Agent Token Budget
```
Input Processing:
├─ Feature description: ~1,500 tokens
├─ Change summary: ~1,000 tokens
└─ Subtotal: ~2,500 tokens
Analysis (6 perspectives):
├─ Product perspective: ~1,500 tokens
├─ Dev perspective: ~1,500 tokens
├─ QA perspective: ~1,500 tokens
├─ Security perspective: ~1,500 tokens
├─ DevOps perspective: ~1,000 tokens
├─ Design perspective: ~1,000 tokens
└─ Subtotal: ~8,000 tokens
Output:
├─ Stakeholder summary: ~1,500 tokens
├─ Risk assessment: ~1,000 tokens
└─ Subtotal: ~2,500 tokens
Multi-Perspective Total: ~13,000 tokens
```
---
## Monthly Cost Comparison
### Scenario: 5M Token Monthly Budget
```
SINGLE AGENT APPROACH
├─ Tokens per review: ~35,000
├─ Reviews per month: 5,000,000 / 35,000 = 142 reviews
├─ Cost efficiency: Excellent
└─ Best for: High-frequency reviews, rapid feedback
CONCURRENT AGENTS APPROACH
├─ Tokens per review: ~68,000
├─ Reviews per month: 5,000,000 / 68,000 = 73 reviews
├─ Cost efficiency: Half as many reviews
└─ Best for: Selective, high-quality reviews
COST COMPARISON
├─ Same budget: 5M tokens
├─ Single agent can do: 142 reviews
├─ Concurrent can do: 73 reviews
├─ Sacrifice: 69 fewer reviews per month
└─ Gain: 4 expert perspectives per review
```
### Pricing Impact (USD)
Assuming Claude 3.5 Sonnet input pricing (~$3 per 1M tokens; output tokens are billed higher, so treat these figures as rough lower bounds):
```
SINGLE AGENT
├─ 35,000 tokens per review: $0.105 per review
├─ 142 reviews per month: $14.91/month (from shared budget)
└─ Annual cost: ~$179/year
CONCURRENT AGENTS
├─ 68,000 tokens per review: $0.204 per review
├─ 73 reviews per month: $14.89/month (from shared budget)
└─ Annual cost: ~$179/year
WITHIN SAME 5M BUDGET:
├─ Concurrent approach: 2x cost per review
├─ But same monthly spend
└─ Trade-off: Quantity vs. Quality
```
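A minimal sketch of the budget math, assuming the flat ~$3 per 1M token blended rate used above (real pricing bills input and output tokens at different rates):
```python
PRICE_PER_MTOK = 3.00        # USD per 1M tokens (blended-rate assumption)
MONTHLY_BUDGET = 5_000_000   # tokens

def monthly_stats(tokens_per_review: int) -> tuple[int, float]:
    """How many reviews fit in the budget, and the USD cost per review."""
    reviews = MONTHLY_BUDGET // tokens_per_review
    cost = tokens_per_review / 1_000_000 * PRICE_PER_MTOK
    return reviews, cost

for label, tokens in [("single", 35_000), ("concurrent", 68_000)]:
    reviews, cost = monthly_stats(tokens)
    print(f"{label:>10}: {reviews} reviews/month @ ${cost:.3f} each")
```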
---
## Optimization Strategies
### Strategy 1: Use Single Agent for Everyday Reviews
```
Mix Approach:
├─ 80% of code reviews: Single agent (~28,000 tokens avg)
└─ 20% of code reviews: Concurrent agents (for critical work)
Monthly breakdown (5M budget):
├─ 80% single agent: ~111 reviews @ 28K tokens = ~3.1M tokens
├─ 20% concurrent agents: ~27 reviews @ 68K tokens = ~1.8M tokens
├─ Monthly capacity: ~138 reviews
└─ Better mix of quality and quantity
```
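To check how an 80/20 split fits the budget, a small helper (same rough per-review figures as above):
```python
BUDGET = 5_000_000                   # monthly token budget
SINGLE, CONCURRENT = 28_000, 68_000  # rough tokens per review

def mixed_capacity(concurrent_share: float) -> int:
    """Total reviews per month when a share of reviews use concurrent agents."""
    avg_tokens = (1 - concurrent_share) * SINGLE + concurrent_share * CONCURRENT
    return int(BUDGET // avg_tokens)

print(mixed_capacity(0.0))   # 178 all-single (at the 28K everyday average)
print(mixed_capacity(0.2))   # 138 with the 80/20 mix
print(mixed_capacity(1.0))   # 73 all-concurrent
```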
### Strategy 2: Off-Peak Concurrent
```
Timing-Based Approach:
├─ Daytime (peak): Use single agent
├─ Nighttime/weekend (off-peak): Use concurrent agents
│ (API is less congested, better concurrency)
Benefits:
├─ Off-peak: Concurrent runs faster and more reliably
├─ Peak: Avoid rate limiting issues
├─ Cost: Still 2x tokens
└─ Experience: Better latency during off-peak
```
### Strategy 3: Cost-Conscious Concurrent
```
Limited Use of Concurrent:
├─ Release reviews: Always concurrent (quality matters)
├─ Security-critical changes: Always concurrent
├─ Regular features: Single agent
└─ Bug fixes: Single agent
Monthly breakdown (5M budget):
├─ 2 releases/month @ 68K: 136K tokens
├─ 6 security reviews @ 68K: 408K tokens
├─ 100 regular features @ 28K: 2,800K tokens
├─ 50 bug fixes @ 28K: 1,400K tokens
└─ Total: ~4.7M tokens (stays within budget)
```
---
## Reducing Token Costs
### For Concurrent Agents
#### 1. Use "Lightweight" Input Mode
```
Standard Input (Full Context):
├─ Complete git diff: 2,500 tokens
├─ All modified files: 2,000 tokens
├─ Full file structure: 2,500 tokens
└─ Total input: ~7,000 tokens
Lightweight Input (Summary):
├─ Summarized diff: 500 tokens
├─ File names only: 200 tokens
├─ Structure summary: 500 tokens
└─ Total input: ~1,200 tokens
Savings: ~5,800 tokens per agent × 4 = ~23,200 tokens saved
New total: ~44,800 tokens (only ~1.3x single agent)
```
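One way to produce the lightweight input is to trim the diff before handing it to each agent. This is a hypothetical pre-processing helper, not part of the pipeline, and it assumes you control exactly what each agent receives:
```python
def lightweight_diff(diff: str, keep_changed_lines: bool = False) -> str:
    """Reduce a git diff to file and hunk headers, optionally keeping
    the changed lines themselves but never the surrounding context."""
    kept = []
    for line in diff.splitlines():
        if line.startswith(("diff --git", "@@")):
            kept.append(line)  # which file, which region
        elif keep_changed_lines and line.startswith(("+", "-")) \
                and not line.startswith(("+++", "---")):
            kept.append(line)  # the change itself, no context lines
    return "\n".join(kept)
```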
#### 2. Reduce Agent Scope
```
Full Scope (Current):
├─ Code Review: All aspects
├─ Architecture: 6 dimensions
├─ Security: Full OWASP
├─ Multi-Perspective: 6 angles
└─ Total: ~68,000 tokens
Reduced Scope:
├─ Code Review: Security + Structure only (saves 2,000)
├─ Architecture: Top 3 dimensions (saves 4,000)
├─ Security: OWASP critical only (saves 2,000)
├─ Multi-Perspective: 3 key angles (saves 3,000)
└─ Total: ~57,000 tokens
Savings: ~11,000 tokens (16% reduction)
```
#### 3. Skip Non-Critical Agents
```
Full Pipeline (4 agents):
└─ Total: ~68,000 tokens
Critical Only (2 agents):
├─ Code Review Agent: ~15,000 tokens
├─ Security Agent: ~16,000 tokens
└─ Total: ~31,000 tokens (comparable to a single agent)
Use when:
- Simple changes (no architecture impact)
- No security implications
- Team review not needed
```
---
## When Higher Token Cost is Worth It
### ROI Calculation
```
Extra cost per review: ~33,000 tokens (~$0.10)
The value of a single finding dwarfs that cost:
├─ 1 critical security issue
│   (cost of breach: $1M+ vs. detection: $0.10)
├─ 1 architectural mistake
│   (cost of refactoring: weeks of work vs. detection: $0.10)
├─ 1 major duplication
│   (maintenance burden: months vs. detection: $0.10)
├─ 1 compliance gap
│   (regulatory fine: thousands to millions vs. detection: $0.10)
└─ 1 performance regression
    (production incident: hours of downtime vs. detection: $0.10)
```
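The same argument as a back-of-the-envelope expected-value check (the 0.1% catch rate is illustrative, not a measured figure):
```python
extra_cost = 0.10          # USD, extra spend per concurrent review
breach_cost = 1_000_000    # USD, per the estimate above
catch_rate = 0.001         # illustrative: 1-in-1,000 reviews catches a breach

expected_value = catch_rate * breach_cost - extra_cost
print(f"expected value per review: ${expected_value:,.2f}")  # $999.90
```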
### Examples Where ROI is Positive
1. **Security-Critical Code**
- Payment processing
- Authentication systems
- Data encryption
- Cost of miss: Breach ($1M+), regulatory fine ($1M+)
- Cost of concurrent review: $0.10
- ROI: Effectively infinite (one prevented breach pays for millions of reviews)
2. **Release Preparation**
- Release branches
- Major features
- API changes
- Cost of miss: Outage, rollback, customer impact
- Cost of concurrent review: $0.10
- ROI: Extremely high
3. **Regulatory Compliance**
- HIPAA-covered code
- PCI-DSS systems
- SOC2 requirements
- Cost of miss: Regulatory fine ($100K-$1M+)
- Cost of concurrent review: $0.10
- ROI: Astronomical
4. **Enterprise Standards**
- Multiple team sign-off
- Audit trail requirement
- Stakeholder input
- Cost of miss: Rework, team friction
- Cost of concurrent review: $0.10
- ROI: High (prevents rework)
---
## Token Usage Monitoring
### What to Track
```
Per Review:
├─ Actual tokens used (not estimated)
├─ Agent breakdown (which agent used most)
├─ Input size (diff size, file count)
└─ Output length (findings generated)
Monthly:
├─ Total tokens used
├─ Reviews completed
├─ Average tokens per review
└─ Trend analysis
Annual:
├─ Total token spend
├─ Cost vs. budget
├─ Reviews completed
└─ ROI analysis
```
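A sketch of per-review tracking, assuming the Anthropic Python SDK (the `usage` fields on a response report actual, not estimated, tokens); `tracked_call` and the ledger are illustrative names, not part of the pipeline:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def tracked_call(agent: str, prompt: str, ledger: dict[str, int]) -> str:
    """Run one agent request and record its actual token usage."""
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    ledger[agent] = ledger.get(agent, 0) + (
        message.usage.input_tokens + message.usage.output_tokens
    )
    return message.content[0].text

ledger: dict[str, int] = {}
# ...one tracked_call per agent during a review...
# sorted(ledger.items(), key=lambda kv: -kv[1])  # which agent used most
```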
### Setting Alerts
```
Rate Limit Alerts:
├─ 70% of TPM (tokens per minute) used → Warning
├─ 90% of TPM used → Critical
└─ TPM limit hit → Block and notify
Monthly Budget Alerts:
├─ 50% of budget used → Informational
├─ 75% of budget used → Warning
└─ 90% of budget used → Critical
Cost Thresholds:
├─ Single review > 100K tokens → Unexpected (investigate)
├─ Average > 80K tokens → Possible over-analysis (review)
└─ Concurrent running during peak hours → Not optimal (schedule off-peak)
```
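The budget tiers above map directly to a small helper; a minimal sketch:
```python
def budget_alert(tokens_used: int, monthly_budget: int) -> str | None:
    """Return the alert level for current monthly usage, or None if under 50%."""
    pct = tokens_used / monthly_budget
    if pct >= 0.90:
        return "CRITICAL"
    if pct >= 0.75:
        return "WARNING"
    if pct >= 0.50:
        return "INFO"
    return None

assert budget_alert(4_600_000, 5_000_000) == "CRITICAL"  # 92% used
assert budget_alert(2_000_000, 5_000_000) is None        # 40% used
```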
---
## Cost Optimization Summary
| Strategy | Savings | When to Use |
|----------|---------|-------------|
| **Mix single + concurrent** | ~40% per month vs. all-concurrent | Daily workflow |
| **Off-peak scheduling** | Faster runs, no token savings | When possible |
| **Lightweight input mode** | ~35% per concurrent review | Non-critical reviews |
| **Reduce agent scope** | 15-20% | Simple changes |
| **Skip non-critical agents** | ~50% | Low-risk PRs |
| **Single agent only** | ~50% vs. concurrent | Cost-sensitive |
---
## Recommendation
```
Use Concurrent Agents When:
├─ Token budget > 5M per month
├─ Quality > Cost priority
├─ Security-critical code
├─ Release reviews
├─ Multiple perspectives needed
└─ Regulatory requirements
Use Single Agent When:
├─ Limited token budget
├─ High-frequency reviews needed
├─ Simple changes
├─ Speed acceptable (the 20-30% concurrent gain is not material)
├─ Cost sensitive
└─ No multi-perspective requirement
Use Mix Strategy When:
├─ Want both quality and quantity
├─ Can do selective high-value concurrent reviews
├─ Have moderate token budget
├─ Enterprise with varied code types
└─ Want best of both worlds
```
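One way to encode this decision matrix in code; a sketch only, with the thresholds and flags taken from the lists above:
```python
def choose_review_mode(*, security_critical: bool, release_review: bool,
                       needs_perspectives: bool, monthly_budget: int) -> str:
    """Pick 'concurrent', 'single', or 'mixed' per the decision matrix above."""
    high_value = security_critical or release_review or needs_perspectives
    if high_value and monthly_budget > 5_000_000:
        return "concurrent"
    if high_value:
        return "mixed"   # reserve concurrent runs for these reviews only
    return "single"

print(choose_review_mode(security_critical=True, release_review=False,
                         needs_perspectives=False, monthly_budget=6_000_000))
# -> concurrent
```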
---
**For full analysis, see [REALITY.md](REALITY.md) and [ARCHITECTURE.md](ARCHITECTURE.md).**