
Token Usage & Cost Analysis

Version: 1.0.0
Date: 2025-10-31
Purpose: Understand the true cost of concurrent agents vs. single-agent reviews


Quick Cost Comparison

| Metric | Single Agent | Concurrent Agents | Multiplier |
|--------|--------------|-------------------|------------|
| Tokens per review | ~35,000 | ~68,000 | 1.9x |
| Monthly reviews (5M tokens) | 142 | 73 | 0.5x |
| Cost multiplier | 1x | 2x | - |
| Time to execute | 39-62 min | 31-42 min | 0.6-0.8x |
| Perspectives | 1 | 4 | 4x |

Bottom Line: You pay 2x tokens to get 4x perspectives and 20-30% time savings.
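To make the table's arithmetic explicit, here is a minimal Python sketch using the rounded per-review figures above (this document's estimates, not measurements):

```python
# Sketch: derive the quick-comparison multipliers from the rounded
# per-review figures above. All numbers are this doc's estimates.
SINGLE_TOKENS = 35_000
CONCURRENT_TOKENS = 68_000
MONTHLY_BUDGET = 5_000_000

token_multiplier = CONCURRENT_TOKENS / SINGLE_TOKENS        # ~1.9x
single_reviews = MONTHLY_BUDGET // SINGLE_TOKENS            # 142
concurrent_reviews = MONTHLY_BUDGET // CONCURRENT_TOKENS    # 73
review_multiplier = concurrent_reviews / single_reviews     # ~0.5x

print(f"Token multiplier:  {token_multiplier:.1f}x")
print(f"Reviews/month:     {single_reviews} vs {concurrent_reviews}")
print(f"Review multiplier: {review_multiplier:.2f}x")
```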


Detailed Token Breakdown

Single Agent Review (Baseline)

STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens

STAGES 2-5: COMPREHENSIVE ANALYSIS (Single Agent)
├─ Code review analysis: ~8,000 tokens
├─ Architecture analysis: ~10,000 tokens
├─ Security analysis: ~8,000 tokens
├─ Multi-perspective analysis: ~6,000 tokens
└─ Subtotal: ~32,000 tokens

STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~3,000 tokens
├─ Action plan creation: ~2,000 tokens
└─ Subtotal: ~5,000 tokens

STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens

TOTAL SINGLE AGENT: ~44,500 tokens (~35,000-45,000 typical)

Concurrent Agents Review

STAGE 1: GIT PREPARATION (Main Thread)
├─ Git status check: ~500 tokens
├─ Git diff analysis: ~2,500 tokens
├─ File listing: ~500 tokens
└─ Subtotal: ~3,500 tokens

STAGE 2: CODE REVIEW AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
│  (re-establishing context, no shared history)
├─ Git diff input: ~2,000 tokens
│  (agent needs own copy of diff)
├─ Code quality analysis: ~10,000 tokens
│  (duplication, errors, secrets, style)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~15,500 tokens

STAGE 3: ARCHITECTURE AUDIT AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ File structure input: ~2,500 tokens
│  (agent needs file paths and structure)
├─ Architecture analysis: ~12,000 tokens
│  (6-dimensional analysis)
├─ Results generation: ~1,500 tokens
└─ Subtotal: ~18,000 tokens

STAGE 4: SECURITY & COMPLIANCE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Code input for security review: ~2,000 tokens
├─ Security analysis: ~11,000 tokens
│  (OWASP, dependencies, secrets)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~16,000 tokens

STAGE 5: MULTI-PERSPECTIVE AGENT (Independent Context)
├─ Agent initialization: ~2,000 tokens
├─ Feature description: ~1,500 tokens
│  (agent needs less context, just requirements)
├─ Multi-perspective analysis: ~9,000 tokens
│  (6 stakeholder perspectives)
├─ Results generation: ~1,000 tokens
└─ Subtotal: ~13,500 tokens

STAGE 6: SYNTHESIS (Main Thread)
├─ Results consolidation: ~4,000 tokens
│  (4 sets of results to aggregate)
├─ Action plan creation: ~2,500 tokens
└─ Subtotal: ~6,500 tokens

STAGES 7-9: INTERACTIVE RESOLUTION (Main Thread)
├─ User interaction: Variable (assume 2,000 tokens)
├─ Pre-push verification: ~1,500 tokens
├─ Commit message generation: ~500 tokens
└─ Subtotal: ~4,000 tokens

TOTAL CONCURRENT AGENTS: ~77,000 tokens (~68,000-78,000 typical)
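As a sanity check, a short sketch that sums the stage estimates above; all figures are this document's rough estimates, not measured values:

```python
# Sketch: sum the per-stage estimates above and compare pipelines.
# Figures are this document's rough estimates, not measurements.
single_stages = {
    "git_prep": 3_500,
    "comprehensive_analysis": 32_000,
    "synthesis": 5_000,
    "interactive_resolution": 4_000,
}
concurrent_stages = {
    "git_prep": 3_500,
    "code_review_agent": 15_500,
    "architecture_agent": 18_000,
    "security_agent": 16_000,
    "multi_perspective_agent": 13_500,
    "synthesis": 6_500,
    "interactive_resolution": 4_000,
}
single_total = sum(single_stages.values())           # 44,500
concurrent_total = sum(concurrent_stages.values())   # 77,000
print(f"single: {single_total:,}  concurrent: {concurrent_total:,}")
print(f"gap: {concurrent_total - single_total:,} tokens")
# Note: the headline 1.9-2.0x multiplier compares the typical
# figures (~35K vs. ~68K) rather than these stage sums.
```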

Why Concurrent Costs More

Cost Difference Breakdown:

Extra overhead from the concurrent approach:
├─ Agent initialization (4 × ~2,000): ~8,000 tokens
│  (each agent re-establishes context)
├─ Input duplication (4 agents): ~8,000 tokens
│  (each agent gets its own copy of the relevant inputs)
├─ Per-agent result generation: ~5,000 tokens
│  (each agent formats its own findings)
├─ Deeper per-domain analysis: ~10,000 tokens
│  (~42,000 across four specialists vs. ~32,000 in one pass)
└─ Heavier synthesis: ~1,500 tokens
   (the main thread consolidates four result sets)

TOTAL EXTRA COST: ~32,500 tokens
                  (~44,500 single agent + ~32,500 = ~77,000)

Since the agents run concurrently, you might naively expect the work to
split four ways at no extra token cost:
- Sequential single agent: ~44,500 tokens
- Four concurrent agents: 44,500 / 4 ≈ 11,125 tokens each
- Expected total: ~44,500 tokens

ACTUAL concurrent total: ~77,000 tokens

Why the gap?
- No shared context between agents
- Each agent repeats its own setup
- Each agent needs its own copy of the input data
- Aggregating four result sets is not free

Token Cost by Analysis Type

The per-agent budgets below are standalone working estimates covering each agent's own input, analysis, and output; they exclude initialization overhead and so differ slightly from the stage subtotals above.

Code Review Agent Token Budget

Input Processing:
├─ Git diff loading: ~2,000 tokens
├─ File context: ~1,000 tokens
└─ Subtotal: ~3,000 tokens

Analysis:
├─ Readability review: ~2,000 tokens
├─ Duplication detection: ~2,000 tokens
├─ Error handling check: ~2,000 tokens
├─ Secret detection: ~1,500 tokens
├─ Test coverage review: ~1,500 tokens
├─ Performance analysis: ~1,000 tokens
└─ Subtotal: ~10,000 tokens

Output:
├─ Formatting results: ~1,000 tokens
├─ Severity prioritization: ~500 tokens
└─ Subtotal: ~1,500 tokens

Code Review Total: ~14,500 tokens

Architecture Audit Agent Token Budget

Input Processing:
├─ File structure loading: ~2,500 tokens
├─ Module relationship mapping: ~2,000 tokens
└─ Subtotal: ~4,500 tokens

Analysis (6 dimensions):
├─ Architecture & Design: ~2,500 tokens
├─ Code Quality: ~2,000 tokens
├─ Security: ~2,000 tokens
├─ Performance: ~1,500 tokens
├─ Testing: ~1,500 tokens
├─ Maintainability: ~1,500 tokens
└─ Subtotal: ~11,000 tokens

Output:
├─ Dimension scoring: ~1,500 tokens
├─ Recommendations: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Architecture Total: ~18,000 tokens

Security & Compliance Agent Token Budget

Input Processing:
├─ Code loading: ~2,000 tokens
├─ Dependency list: ~1,000 tokens
└─ Subtotal: ~3,000 tokens

Analysis:
├─ OWASP Top 10 check: ~3,000 tokens
├─ Dependency vulnerability scan: ~2,500 tokens
├─ Secrets/keys detection: ~2,000 tokens
├─ Encryption review: ~1,500 tokens
├─ Auth/AuthZ review: ~1,500 tokens
├─ Compliance requirements: ~1,000 tokens
└─ Subtotal: ~11,500 tokens

Output:
├─ Severity assessment: ~1,000 tokens
├─ Remediation guidance: ~1,000 tokens
└─ Subtotal: ~2,000 tokens

Security Total: ~16,500 tokens

Multi-Perspective Agent Token Budget

Input Processing:
├─ Feature description: ~1,500 tokens
├─ Change summary: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Analysis (6 perspectives):
├─ Product perspective: ~1,500 tokens
├─ Dev perspective: ~1,500 tokens
├─ QA perspective: ~1,500 tokens
├─ Security perspective: ~1,500 tokens
├─ DevOps perspective: ~1,000 tokens
├─ Design perspective: ~1,000 tokens
└─ Subtotal: ~8,000 tokens

Output:
├─ Stakeholder summary: ~1,500 tokens
├─ Risk assessment: ~1,000 tokens
└─ Subtotal: ~2,500 tokens

Multi-Perspective Total: ~13,000 tokens

Monthly Cost Comparison

Scenario: 5M Token Monthly Budget

SINGLE AGENT APPROACH
├─ Tokens per review: ~35,000
├─ Reviews per month: 5,000,000 / 35,000 = 142 reviews
├─ Cost efficiency: Excellent
└─ Best for: High-frequency reviews, rapid feedback

CONCURRENT AGENTS APPROACH
├─ Tokens per review: ~68,000
├─ Reviews per month: 5,000,000 / 68,000 = 73 reviews
├─ Cost efficiency: Half the review capacity
└─ Best for: Selective, high-quality reviews

COST COMPARISON
├─ Same budget: 5M tokens
├─ Single agent can do: 142 reviews
├─ Concurrent can do: 73 reviews
├─ Sacrifice: 69 fewer reviews per month
└─ Gain: 4 expert perspectives per review

Pricing Impact (USD)

Assuming Claude 3.5 Sonnet pricing (~$3 per 1M input tokens; output tokens bill at a higher rate, so treat these as lower-bound estimates):

SINGLE AGENT
├─ 35,000 tokens per review: $0.105 per review
├─ 142 reviews per month: $14.91/month (from shared budget)
└─ Cost per enterprise: ~$180/year

CONCURRENT AGENTS
├─ 68,000 tokens per review: $0.204 per review
├─ 73 reviews per month: $14.89/month (from shared budget)
└─ Cost per enterprise: ~$179/year

WITHIN SAME 5M BUDGET:
├─ Concurrent approach: 2x cost per review
├─ But same monthly spend
└─ Trade-off: Quantity vs. Quality
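The dollar math as a sketch, under the same ~$3 per 1M input-token assumption (output tokens are ignored here, so results are lower bounds):

```python
# Sketch: per-review and monthly USD cost under the ~$3/M-token
# input-pricing assumption above. Output tokens (billed at a higher
# rate) are ignored, so treat these as lower bounds.
PRICE_PER_MTOK = 3.00
MONTHLY_BUDGET = 5_000_000

def cost_usd(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MTOK

for name, per_review in [("single", 35_000), ("concurrent", 68_000)]:
    reviews = MONTHLY_BUDGET // per_review
    print(f"{name:>10}: ${cost_usd(per_review):.3f}/review, "
          f"{reviews} reviews = ${cost_usd(reviews * per_review):.2f}/month")
# ->     single: $0.105/review, 142 reviews = $14.91/month
# -> concurrent: $0.204/review, 73 reviews = $14.89/month
```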

Optimization Strategies

Strategy 1: Use Single Agent for Everyday

Mix Approach:
├─ 80% of code reviews: Single agent (~28,000 tokens avg)
└─ 20% of code reviews: Concurrent agents (for critical work)

Monthly breakdown (5M budget):
├─ 80% single agent: ~111 reviews @ 28K tokens = ~3.1M tokens
├─ 20% concurrent agents: ~28 reviews @ 68K tokens = ~1.9M tokens
├─ Monthly capacity: ~139 reviews (vs. 73 all-concurrent)
└─ Better mix of quality and quantity (see the sketch below)
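A sketch of the capacity math behind this mix: fix the budget and the split by review count, then solve for how many reviews fit (per-review costs are the estimates above):

```python
# Sketch: solve for monthly capacity under a fixed budget and a
# single/concurrent split by review count. Estimates, not measurements.
def mixed_capacity(budget: int, single_share: float,
                   single_cost: int = 28_000,
                   concurrent_cost: int = 68_000) -> dict:
    # Average tokens per review under the split:
    avg = single_share * single_cost + (1 - single_share) * concurrent_cost
    total = int(budget // avg)
    return {
        "total_reviews": total,
        "single_reviews": round(total * single_share),
        "concurrent_reviews": round(total * (1 - single_share)),
    }

print(mixed_capacity(5_000_000, 0.80))
# -> {'total_reviews': 138, 'single_reviews': 110, 'concurrent_reviews': 28}
```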

Strategy 2: Off-Peak Concurrent

Timing-Based Approach:
├─ Daytime (peak): Use single agent
└─ Nighttime/weekend (off-peak): Use concurrent agents
   (API is less congested, better concurrency)

Benefits:
├─ Off-peak: Concurrent runs faster and better
├─ Peak: Avoid rate limiting issues
├─ Cost: Still 2x tokens
└─ Experience: Better latency during off-peak
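One way to encode the timing rule as a sketch; the weekday 9-to-5 peak window below is an illustrative assumption, not a measured traffic pattern:

```python
# Sketch: pick review mode by time of day. The peak window is an
# illustrative assumption; tune it to your own API traffic patterns.
from datetime import datetime

def review_mode(now: datetime | None = None) -> str:
    now = now or datetime.now()
    is_weekend = now.weekday() >= 5          # Saturday/Sunday
    is_peak_hours = 9 <= now.hour < 17       # assumed business hours
    return "single" if (is_peak_hours and not is_weekend) else "concurrent"

print(review_mode())  # e.g. "concurrent" on a Saturday afternoon
```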

Strategy 3: Cost-Conscious Concurrent

Limited Use of Concurrent:
├─ Release reviews: Always concurrent (quality matters)
├─ Security-critical changes: Always concurrent
├─ Regular features: Single agent
└─ Bug fixes: Single agent

Monthly breakdown (5M budget):
├─ 2 releases/month @ 68K: 136K tokens
├─ 6 security reviews @ 68K: 408K tokens
├─ 100 regular features @ 28K: 2,800K tokens
├─ 50 bug fixes @ 28K: 1,400K tokens
└─ Total: ~4.7M tokens (stays within budget)
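The same breakdown as a quick budget check (counts and per-review costs are the example figures above):

```python
# Sketch: verify a planned monthly workload fits the 5M budget,
# using the example mix above.
workload = [  # (category, reviews per month, tokens per review)
    ("releases",           2, 68_000),
    ("security reviews",   6, 68_000),
    ("regular features", 100, 28_000),
    ("bug fixes",         50, 28_000),
]
total = sum(count * tokens for _, count, tokens in workload)
print(f"planned: {total:,} tokens "
      f"({'within' if total <= 5_000_000 else 'over'} the 5M budget)")
# -> planned: 4,744,000 tokens (within the 5M budget)
```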

Reducing Token Costs

For Concurrent Agents

1. Use "Lightweight" Input Mode

Standard Input (Full Context):
├─ Complete git diff: 2,500 tokens
├─ All modified files: 2,000 tokens
├─ Full file structure: 2,500 tokens
└─ Total input: ~7,000 tokens

Lightweight Input (Summary):
├─ Summarized diff: 500 tokens
├─ File names only: 200 tokens
├─ Structure summary: 500 tokens
└─ Total input: ~1,200 tokens

Savings: ~5,800 tokens per agent × 4 = ~23,200 tokens saved
New total: ~44,800 tokens (just ~1.3x single agent!)
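A sketch of assembling the lightweight input from git metadata; `git diff --stat` and `git diff --name-only` are standard git commands, and the base ref is an assumption:

```python
# Sketch: build a lightweight agent input from git metadata instead
# of the full diff. Uses standard git flags; token counts will vary.
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

def lightweight_input(base: str = "origin/main") -> str:
    file_names = git("diff", "--name-only", base)   # file names only
    diff_stat = git("diff", "--stat", base)         # per-file +/- summary
    return f"Changed files:\n{file_names}\nChange summary:\n{diff_stat}"

# The full diff might be ~7,000 tokens of input per agent; this
# summary is typically a small fraction of that.
```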

2. Reduce Agent Scope

Full Scope (Current):
├─ Code Review: All aspects
├─ Architecture: 6 dimensions
├─ Security: Full OWASP
├─ Multi-Perspective: 6 angles
└─ Total: ~68,000 tokens

Reduced Scope:
├─ Code Review: Security + Structure only (saves 2,000)
├─ Architecture: Top 3 dimensions (saves 4,000)
├─ Security: OWASP critical only (saves 2,000)
├─ Multi-Perspective: 3 key angles (saves 3,000)
└─ Total: ~57,000 tokens

Savings: ~11,000 tokens (16% reduction)

3. Skip Non-Critical Agents

Full Pipeline (4 agents):
└─ Total: ~68,000 tokens

Critical Only (2 agents):
├─ Code Review Agent: ~15,000 tokens
├─ Security Agent: ~16,000 tokens
└─ Total: ~31,000 tokens (roughly the cost of a single-agent review)

Use when:
- Simple changes (no architecture impact)
- No security implications
- Team review not needed
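As a sketch, the selection rule this section implies; the boolean signals are hypothetical inputs you would derive from your own change metadata (changed paths, labels, and so on):

```python
# Sketch: choose which agents to run for a change. The boolean
# signals are hypothetical; derive them from your own change metadata.
def select_agents(touches_security: bool,
                  touches_architecture: bool,
                  needs_team_review: bool) -> list[str]:
    agents = ["code_review"]              # always worth running
    if touches_security:
        agents.append("security")
    if touches_architecture:
        agents.append("architecture")
    if needs_team_review:
        agents.append("multi_perspective")
    return agents

# A simple bug fix with no security or architecture impact:
print(select_agents(False, False, False))   # ['code_review']
```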

When Higher Token Cost is Worth It

ROI Calculation

Extra cost per review: 33,000 tokens (~$0.10)

Value of a single finding (vs. ~$0.10 of extra review cost):
├─ 1 critical security issue
│  (cost of a breach: $1M+)
├─ 1 architectural mistake
│  (cost of refactoring: weeks of work)
├─ 1 major duplication
│  (maintenance burden: months)
├─ 1 compliance gap
│  (regulatory fines: thousands to millions)
└─ 1 performance regression
   (production incident: hours of downtime)
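The implied expected-value math as a sketch; the catch probabilities and incident costs are placeholder assumptions, and only the ~$0.10 extra review cost comes from this document:

```python
# Sketch: expected value of the extra review spend. Probabilities and
# incident costs are placeholder assumptions; only the ~$0.10 extra
# review cost comes from this document.
EXTRA_REVIEW_COST = 0.10  # USD, ~33,000 extra tokens

findings = [  # (finding, assumed catch probability, avoided cost USD)
    ("critical security issue", 0.001, 1_000_000),
    ("architectural mistake",   0.010,    50_000),
    ("compliance gap",          0.002,   250_000),
]
expected_value = sum(p * cost for _, p, cost in findings)
print(f"expected value per review: ${expected_value:,.2f} "
      f"vs. extra cost ${EXTRA_REVIEW_COST:.2f}")
# Even tiny catch probabilities dwarf a $0.10 review cost.
```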

Examples Where ROI is Positive

1. Security-Critical Code
   • Payment processing
   • Authentication systems
   • Data encryption
   • Cost of a miss: Breach ($1M+), regulatory fine ($1M+)
   • Cost of concurrent review: $0.10
   • ROI: Effectively infinite (one avoided breach pays for millions of reviews)

2. Release Preparation
   • Release branches
   • Major features
   • API changes
   • Cost of a miss: Outage, rollback, customer impact
   • Cost of concurrent review: $0.10
   • ROI: Extremely high

3. Regulatory Compliance
   • HIPAA-covered code
   • PCI-DSS systems
   • SOC2 requirements
   • Cost of a miss: Regulatory fine ($100K-$1M+)
   • Cost of concurrent review: $0.10
   • ROI: Astronomical

4. Enterprise Standards
   • Multiple-team sign-off
   • Audit trail requirements
   • Stakeholder input
   • Cost of a miss: Rework, team friction
   • Cost of concurrent review: $0.10
   • ROI: High (prevents rework)

Token Usage Monitoring

What to Track

Per Review:
├─ Actual tokens used (not estimated)
├─ Agent breakdown (which agent used most)
├─ Input size (diff size, file count)
└─ Output length (findings generated)

Monthly:
├─ Total tokens used
├─ Reviews completed
├─ Average tokens per review
└─ Trend analysis

Annual:
├─ Total token spend
├─ Cost vs. budget
├─ Reviews completed
└─ ROI analysis
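A minimal per-review tracker as a sketch; it assumes you feed it the input/output token counts the API reports for each request:

```python
# Sketch: per-review token tracking. Feed it the usage numbers the
# API reports per request; aggregation is the easy part.
from dataclasses import dataclass, field

@dataclass
class ReviewUsage:
    review_id: str
    agent_tokens: dict[str, int] = field(default_factory=dict)

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> None:
        self.agent_tokens[agent] = (self.agent_tokens.get(agent, 0)
                                    + input_tokens + output_tokens)

    @property
    def total(self) -> int:
        return sum(self.agent_tokens.values())

usage = ReviewUsage("pr-123")
usage.record("security", input_tokens=4_800, output_tokens=1_100)
usage.record("code_review", input_tokens=4_200, output_tokens=1_300)
print(usage.total, max(usage.agent_tokens, key=usage.agent_tokens.get))
# -> 11400 security  (total tokens, heaviest agent)
```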

Setting Alerts

Rate Limit Alerts:
├─ 70% of TPM (tokens per minute) used → Warning
├─ 90% of TPM used → Critical
└─ TPM limit hit → Block and notify

Monthly Budget Alerts:
├─ 50% of budget used → Informational
├─ 75% of budget used → Warning
└─ 90% of budget used → Critical

Cost Thresholds:
├─ Single review > 100K tokens → Unexpected (investigate)
├─ Average > 80K tokens → Possible over-analysis (review)
└─ Concurrent runs during peak hours → Suboptimal (schedule off-peak)
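A minimal checker for the monthly-budget thresholds above (notification wiring omitted):

```python
# Sketch: map budget consumption to the alert levels defined above.
def budget_alert(used: int, budget: int = 5_000_000) -> str | None:
    ratio = used / budget
    if ratio >= 0.90:
        return "critical"
    if ratio >= 0.75:
        return "warning"
    if ratio >= 0.50:
        return "informational"
    return None

print(budget_alert(3_900_000))  # 78% of budget -> 'warning'
```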

Cost Optimization Summary

| Strategy | Tokens Saved | When to Use |
|----------|--------------|-------------|
| Mix single + concurrent | ~40% per month | Daily workflow |
| Off-peak scheduling | ~15% (better concurrency) | When possible |
| Lightweight input mode | ~35% per concurrent review | Non-critical reviews |
| Reduce agent scope | 15-20% | Simple changes |
| Skip non-critical agents | ~50% | Low-risk PRs |
| Single agent only | ~50% of baseline cost | Cost-sensitive work |

Recommendation

Use Concurrent Agents When:
├─ Token budget > 5M per month
├─ Quality > Cost priority
├─ Security-critical code
├─ Release reviews
├─ Multiple perspectives needed
└─ Regulatory requirements

Use Single Agent When:
├─ Limited token budget
├─ High-frequency reviews needed
├─ Simple changes
├─ The 20-30% concurrent speed gain isn't material
├─ Cost sensitive
└─ No multi-perspective requirement

Use Mix Strategy When:
├─ Want both quality and quantity
├─ Can do selective high-value concurrent reviews
├─ Have moderate token budget
├─ Enterprise with varied code types
└─ Want best of both worlds
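To close, the recommendation as a decision-function sketch; the inputs mirror the checklists above, and the 5M threshold is this document's own suggestion:

```python
# Sketch: the recommendation above as a decision function. Inputs
# mirror the checklists; the 5M threshold is this doc's suggestion.
def recommend(monthly_token_budget: int,
              quality_over_cost: bool,
              security_critical: bool,
              needs_multiple_perspectives: bool) -> str:
    if security_critical or needs_multiple_perspectives:
        return "concurrent"
    if monthly_token_budget > 5_000_000 and quality_over_cost:
        return "mix"   # concurrent for high-value reviews, single otherwise
    return "single"

print(recommend(5_000_000, False, False, False))  # -> 'single'
```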

For full analysis, see REALITY.md and ARCHITECTURE.md.