Confidential Document

This document is restricted to RRI leadership.

DERISK — Remove What Can Kill You
D4

Auth Load Capacity & Cognito Hardening

IN PROGRESS Wave 0 · 7 days

Executive Summary

Cognito's default rate limit is 120 RPS, shared across all user pools in the AWS account. UPW on March 12 is virtual, with ~20,000 participants and ~1,500 buyers spread across 4 sales moments over 4 days. Peak auth load during the biggest pitch window is likely 1-3 RPS at checkout and 10-50 RPS including page loads, well within the 120 RPS default limit.
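The checkout estimate can be sanity-checked with a short calculation. This is a rough sketch only: the buyer count, number of sales moments, and 120 RPS limit come from the summary above, while the even arrival pattern and the 2-minute and 5-minute purchase windows are illustrative assumptions, not measured figures.

```python
# Back-of-envelope auth load estimate for one sales moment.
# Values marked "from the brief" come from the text; everything else is assumed.

BUYERS_TOTAL = 1_500       # from the brief
SALES_MOMENTS = 4          # from the brief
COGNITO_LIMIT_RPS = 120    # from the brief (default account-wide limit)


def checkout_rps(window_seconds: float) -> float:
    """Average checkout auth rate if one moment's buyers arrive evenly in the window."""
    buyers_per_moment = BUYERS_TOTAL / SALES_MOMENTS
    return buyers_per_moment / window_seconds


def headroom(rps: float) -> float:
    """How many times the given rate fits under the default limit."""
    return COGNITO_LIMIT_RPS / rps


if __name__ == "__main__":
    for window in (120, 300):  # assumed purchase windows: 2 and 5 minutes
        rate = checkout_rps(window)
        print(f"{window}s window: {rate:.3f} RPS, {headroom(rate):.1f}x headroom")
```

Under these assumptions, 375 buyers per moment works out to 3.125 RPS over 2 minutes and 1.25 RPS over 5 minutes, which is consistent with the 1-3 RPS checkout estimate. Real traffic arrives in bursts, so short-lived peaks will be higher than these averages.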

For March 12, we deploy three layers: token caching, guest checkout fallback, and CloudWatch monitoring. The default 120 RPS limit is more than sufficient for a virtual event.

Key insight: Stripe natively supports guest customers, and rri-order-ingestion already handles anonymous charges by email, so the guest checkout fallback (Layer 2) requires zero downstream pipeline changes. For a virtual UPW where buyers click a link from Zoom, guest checkout is the simplest, most reliable path.

What Needs to Happen

Three-Layer Defense for UPW

Layer | What | Timeline | Cost
1 | Token caching via ElastiCache Redis | 3-4 days | $50-80/month
2 | Guest checkout fallback (3-sec timeout → Stripe guest) | 2-3 days | $0
3 | CloudWatch monitoring + Chatot alerts | 1-2 days | $0

User pool isolation (checkout vs portal) is a Q2 follow-up at minimal cost.

  1. Deploy ElastiCache Redis token cache — Cache Cognito tokens to reduce direct API calls. 3-4 days.
  2. Build guest checkout fallback — 3-second timeout on Cognito auth → automatic fallback to Stripe guest mode. 2-3 days.
  3. Configure CloudWatch monitoring + Chatot alerts — Real-time auth failure rate monitoring with automated alerting. 1-2 days.
  4. Configure Chatot pre-warm cron — 100-200 bot sessions 45 minutes before each sales pitch window. Replaces Johnny’s manual warm-up process.

Revenue at risk per 30-minute auth failure window: $371K (750 failed logins × $495 average). Layer 2 (guest checkout fallback) is the safety net: it bypasses Cognito entirely if auth is slow.

Claude Code acceleration: Redis cache configuration, fallback code patterns, and CloudWatch setup are all highly automatable. Estimated savings: 2-3 days from the original 7-day timeline.

Completion Criteria

  • ElastiCache Redis token cache deployed and reducing Cognito direct API calls
  • Guest checkout fallback tested: 3-second Cognito timeout → Stripe guest mode
  • CloudWatch monitoring active with Chatot alerting on auth failure rate spikes
  • Chatot pre-warm cron configured: 100-200 bot sessions 45 minutes before each pitch window
  • March 11 Go/No-Go check passed: token caching active, guest checkout fallback tested, monitoring confirmed
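For the first criterion, the read-through pattern behind a token cache might look like the sketch below. Several things here are assumptions for illustration: the key format, the 300-second TTL, the `fetch_token` callable standing in for the Cognito round trip, and keying by session id. The in-memory store mimics only the `get` and `setex` calls of a Redis client so the example runs without a server.

```python
import time


class InMemoryStore:
    """Stand-in for Redis exposing only get/setex, so the sketch runs without a server."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value


def get_token(store, session_id, fetch_token, ttl_seconds=300):
    """Return a cached token, calling fetch_token (the Cognito round trip) only on a miss."""
    key = f"auth:token:{session_id}"  # hypothetical key format
    cached = store.get(key)
    if cached is not None:
        return cached
    token = fetch_token(session_id)
    store.setex(key, ttl_seconds, token)
    return token
```

The TTL should stay shorter than the token's own expiry so the cache never serves a stale token. If the store is swapped for a real redis-py client, note that `get` returns bytes unless the client is created with `decode_responses=True`.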

Initiative Attributes

D4 — Auth Load Capacity & Cognito Hardening
Cost: $50-80/month ongoing (token caching) + ~$8-12K one-time labor
Timeline (Original): 7 working days; MUST complete before March 12
Timeline (With Claude Code): 4-5 days (Redis cache config + fallback code + CloudWatch setup)
Owner: Johnny Yarlott + Zach Hardesty (CloudWatch/monitoring) + Spork (Chatot pre-warm)
Dependencies: None (starts immediately). Soft: D3 (credential rotation closes one attack vector)
Unblocks: D6 (load testing validates D4 layers), U2 (checkout depends on Cognito being reliable), U3 (SSO needs Cognito as foundation)
Revenue at Risk: $371K per 30-min window; guest checkout fallback is the safety net
Success Metrics: March 11: token caching active, guest checkout fallback tested, CloudWatch alerting confirmed

Tools Required

Tool | Purpose | Cost
ElastiCache Redis | Token caching: reduces direct Cognito API calls during peak auth | $50-80/month
CloudWatch | Auth failure rate monitoring + automated alerting | Included in AWS
Chatot | Pre-warm cron: 100-200 bot sessions before pitch windows | Existing infrastructure
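As one possible shape for the CloudWatch piece, the sketch below builds the arguments for a throttle alarm on a Cognito user pool. Treat it as a starting point under assumptions: the alarm name, the choice of the `SignInThrottles` metric as a proxy for auth failures, the one-minute period, the zero threshold, and the SNS topic used to reach Chatot are all illustrative. Metric and dimension names should be verified against the current Cognito CloudWatch documentation before use.

```python
def throttle_alarm_params(user_pool_id, client_id, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm (no AWS call is made here)."""
    return {
        "AlarmName": f"cognito-signin-throttles-{user_pool_id}",  # hypothetical name
        "Namespace": "AWS/Cognito",
        "MetricName": "SignInThrottles",
        "Dimensions": [
            {"Name": "UserPool", "Value": user_pool_id},
            {"Name": "UserPoolClient", "Value": client_id},
        ],
        "Statistic": "Sum",
        "Period": 60,                 # assumed: evaluate once per minute
        "EvaluationPeriods": 1,
        "Threshold": 0,               # assumed: any throttled sign-in raises the alarm
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttling
        "AlarmActions": [sns_topic_arn],     # assumed: alerts are routed through SNS
    }


# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**throttle_alarm_params(...))
```

Keeping the parameters in a plain function makes them easy to review and test before anything is created in the account. A failure-rate alarm (rather than a throttle count) would need metric math over sign-in successes and total attempts, which is not shown here.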

Related Risks

ID | Risk | Severity | Probability | Mitigation
RF7 | Spork overload in Wave 0 (4 initiatives in 9 days) | MEDIUM | HIGH | Kill 6+ daily meetings before March 12. Route status through Kingler. Erik must cancel cross-department meetings.