Team Structure & Process Design
Current state, proposed restructuring, new roles, sprint ceremonies, and product ownership — the organizational foundation for executing the technical roadmap.
Current Team Structure (March 2026)
RRI’s engineering organization is structurally broken: everyone does everything, which means nothing gets done reliably. No sprint has closed in 3 weeks. The team burned out after 3-4 weeks of 10-hour days. Spork attends 6+ standing meetings daily, functioning as a human router instead of an engineering director.
Structural problems:
• No separation between operations (reactive) and product development (planned)
• Spork is a human router — 6+ standing meetings/day, no time for engineering
• No sprint has closed in 3 weeks
• No PM layer — unplanned work hits engineers directly
• No QA function — engineers test their own code
• No product ownership model — 6+ people own fragments of the customer experience
• 5 engineers are bus factor 1 on revenue-critical systems
• Contractor notice periods unknown
Proposed Team Structure
The fix is a well-established pattern: separate Run (Kanban, reactive) from Build (Scrum, planned) with a hard organizational wall between them. Framework: Team Topologies (2025 update), with Run = Platform Team and Build = Stream-Aligned Team.
Before (Current)
- Everyone does everything
- No sprint velocity tracking
- Spork routes all requests manually
- Engineers handle ops + features
- No PM or PO layer
- No QA function
- No on-call rotation
- 10-hour days, burnout
After (Restructured)
- Run Team (Kanban) + Build Team (Scrum)
- 70%+ sprint commitment completion target
- Run Team Lead triages all operational requests
- Build engineers protected from ops interrupts
- Dedicated PM + PO layer
- AI-powered QA agents in CI/CD
- OpsGenie on-call rotation
- 40-hour weeks, sustainable pace
IT Service Desk & MSP Layer
Internal support requests currently go directly to engineers via Slack DMs. This creates constant interrupts and makes it impossible to measure support volume. The fix: a formal three-tier support model (L1-L3) with Dean Schwartz as IT Service Desk Lead, fronted by AI triage at L0 as the first line of defense.
Three-Tier Support Model
| Tier | Team | Handles | SLA | Escalation |
|---|---|---|---|---|
| L0 — AI Triage | Chatot Agent | Zendesk/Slack intake, auto-categorization, known-issue resolution, password resets, FAQ responses | < 2 min response | Auto-route to L1 if unresolved |
| L1 — IT Service Desk | Dean Schwartz + MSP | Account provisioning, hardware/software requests, VPN/network issues, vendor coordination, basic troubleshooting | 4 hour response | Run Team Lead (L2) |
| L2 — Engineering Ops | Run Team | Infrastructure issues, deployment failures, database problems, integration bugs, performance degradation | P2: 1 hour / P3: next day | Build Team PO (L3) |
| L3 — Engineering Dev | Build Team | Code-level bugs requiring feature changes, architectural issues, new integration development | Next sprint planning | CTO / Product Council |
Ticket Routing Flow
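The routing rules combine the tier table above with the exclusions listed below. Here is a minimal sketch of that logic. The `Kind` categories, function names, and destination strings are hypothetical labels; the actual Zendesk/Chatot routing configuration is not specified in this plan. The sketch also assumes that P3 issues enter the service desk like any other request.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    """Hypothetical intake categories for a new request."""
    SUPPORT = "support"
    PRODUCTION_INCIDENT = "incident"
    FEATURE_REQUEST = "feature"
    INFRA_CHANGE = "infra_change"
    SECURITY = "security"


# Escalation chain for requests that enter through the service desk.
ESCALATION_CHAIN = [
    "L0: Chatot AI triage",
    "L1: IT Service Desk",
    "L2: Run Team",
    "L3: Build Team",
    "CTO / Product Council",
]


def route_intake(kind: Kind, severity: Optional[str] = None) -> str:
    """Return the first stop for a new request.

    The four bypass rules are checked first; everything else starts at L0.
    """
    if kind is Kind.SECURITY:
        return "Sean + CTO escalation"
    if kind is Kind.PRODUCTION_INCIDENT and severity in ("P1", "P2"):
        return "OpsGenie -> Run Team on-call"
    if kind is Kind.FEATURE_REQUEST:
        return "Product Council -> Build Team backlog"
    if kind is Kind.INFRA_CHANGE:
        return "Change Advisory Board"
    # Assumption: P3 issues and general support requests enter the desk at L0.
    return ESCALATION_CHAIN[0]


def escalate(current: str) -> str:
    """Move an unresolved ticket up one tier; the last stop is terminal."""
    i = ESCALATION_CHAIN.index(current)
    return ESCALATION_CHAIN[min(i + 1, len(ESCALATION_CHAIN) - 1)]


stop = route_intake(Kind.SUPPORT)
print(stop, "->", escalate(stop))
```

`escalate` mirrors the "Auto-route to L1 if unresolved" rule and the tier-by-tier escalation column. Keeping the bypass checks ahead of the L0 default means a P1 never waits on AI triage.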
MSP Evaluation
Decision pending: Evaluate whether RRI needs a Managed Service Provider (MSP) for after-hours IT coverage, hardware lifecycle management, and L1 overflow. Dean currently handles this alone — single point of failure for IT support during events and after hours.
Evaluation criteria: After-hours coverage model, per-seat pricing vs. fixed fee, Zendesk integration capability, onsite support during events, SLA guarantees. Target decision: end of Phase 2 (Week 8).
What does NOT go through IT Service Desk:
• Production incidents (P1/P2) — go directly to OpsGenie → Run Team on-call
• Feature requests — go through Product Council → Build Team backlog
• Infrastructure changes — go through Change Advisory Board (Run Team Lead + Zach)
• Security incidents — go directly to Sean + CTO escalation
New Roles & Hiring Timeline
| Role | Team | Salary | Priority | Post Date | Start Date | First Productive |
|---|---|---|---|---|---|---|
| Jay Lane (FT conversion) | AI | $175K ($87.5K incr.) | #0 | N/A | April 1 | Immediate |
| Run Team Lead | Run (Spork) | $140-160K | #1 BLOCKING | March 17 | May 15 | June 15 |
| DevOps Engineer | Run (Spork) | $130-150K | #2 | March 17 | May 19 | June 19 |
| Integration Engineer | Run (Spork) | $120-140K | #3 | March 17 | May 21 | June 21 |
| Data Engineer | Build (Justin) | $130-150K | #4 | April 15 | June 3 | July 3 |
| Senior Backend Developer | Build (Justin) | $140-160K | #5 | April 15 | June 10 | July 10 |
| Full-Stack Developer | Build (Justin) | $120-140K | #6 | April 15 | June 17 | July 17 |
| PM / Scrum Master | Build + Cross-team | $120-140K | #5 HIGH | March 17 | May 26 | June 26 |
| Tony AI Product Owner | Build (Justin) | $130-160K | #6 | April 15 | June 17 | July 17 |
| TR Experience Product Owner | Build (Justin) | $130-160K | #7 | April 15 | June 24 | July 24 |
| Event Ops Contractor #1 | Run (event-gated) | $50-80K | Immediate | March 17 | April 14 | April 28 |
| Event Ops Contractor #2 | Run (event-gated) | $50-80K | Immediate | March 17 | April 21 | May 5 |
Role Descriptions
Run Team Lead ($140-160K) — This is the #1 blocking hire. Owns all operational triage. Routes P1/P2/P3 incidents. Shields Build team from interrupts. Without this role, the Build vs. Run separation is organizational theater — Spork continues as human router. 45-55 days post-to-offer means we must post within 5 days of announcement.
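Here is a quick check of the post-to-offer math. It assumes the 2026 calendar (from the March 2026 baseline) and uses the Run Team Lead's March 17 post date and May 15 start date from the hiring table. `offer_window` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Tuple


def offer_window(post: date, min_days: int = 45, max_days: int = 55) -> Tuple[date, date]:
    """Earliest and latest expected offer dates for a role posted on `post`."""
    return post + timedelta(days=min_days), post + timedelta(days=max_days)


post = date(2026, 3, 17)          # Run Team Lead post date (hiring table)
target_start = date(2026, 5, 15)  # Run Team Lead start date (hiring table)

earliest, latest = offer_window(post)
# Days between offer and planned start: (best case, worst case).
slack = ((target_start - earliest).days, (target_start - latest).days)
print(earliest, latest, slack)
```

With these inputs the offer lands between May 1 and May 11. That leaves 4 to 14 days before the May 15 start, so every day the posting slips comes straight out of that margin.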
DevOps Engineer ($130-150K) — Zach’s designated backup. Critical for S2 (Heroku → K8s migration) — Zach cannot architect AND execute a 16-week infrastructure migration alone. Also reduces bus factor 1 risk on all infrastructure.
Integration Engineer ($120-140K) — Owns Salesforce-Stripe-HubSpot-Obv.io integration layer. Relieves Tim Hooker (currently doing Salesforce + 4 JIRA projects) and provides backstop for Federico (Members Portal maintenance).
Data Engineer ($130-150K) — Builds the data pipelines that make S8 (Event Intelligence Dashboard) and the broader TROS vision possible. Partners with Caitlin Noble (Data Analyst). Enables Yogesh to get the ROI data he needs to approve AI investments.
Senior Backend Developer ($140-160K) — Johnny Yarlott’s designated backup on payments, authentication, and core backend services. Addresses the organization’s single highest bus-factor risk. Also accelerates Build Team velocity by adding backend capacity; today the Build Team has zero dedicated backend engineers.
Full-Stack Developer ($120-140K) — Splits time between Members Portal (Federico backup) and TonyRobbins.com (Nick backup). Directly addresses the two highest contractor-dependency risks. Must be comfortable with Next.js, Node.js, and the Sanity CMS stack.
PM / Scrum Master ($120-140K) — The person who sits in requirements meetings so Justin and Spork don’t have to. Attends all stakeholder meetings, translates requirements into stories (with BA Agent support), runs all sprint ceremonies, provides weekly status updates to leadership, and manages cross-team dependencies. This isn’t a coordinator — it’s a real PM who understands the technical stack and can push back on scope creep.
Tony AI Product Owner ($130-160K) — Dedicated owner for Tony AI — the $23M ARR product with 49K subscribers. Owns the product roadmap, growth strategy, retention metrics, feature prioritization, and the path from $39/mo to a full coaching companion. Reports to Justin. Must have SaaS product management experience, ideally in AI/ML consumer products.
TR Experience Product Owner ($130-160K) — Dedicated owner for the Tony Robbins Experience platform — the portal unification (S5), Mastery Path (S3), and Event Passport (S4). This is the product that turns RRI from an events company into a technology company. Owns the unified customer journey from event purchase through lifetime engagement. Reports to Justin. Must understand subscription models and multi-product platforms.
Event Ops Contractors ($50-80K each) — Dedicated to event operations (kiosk setup, day-of support, attendee troubleshooting). Frees senior engineers from event duty. Event-gated — only active during event windows.
Developer Derisking & AI-Augmented Resilience
Five engineers are bus factor 1 on revenue-critical systems. The traditional fix (hire backups) takes 3-6 months per person and doubles headcount cost. Our approach: a three-layer resilience model combining human backups with AI agents that serve as always-available knowledge repositories.
Three-Layer Resilience Model
Layer 1: Primary Owner
- Deep system expertise
- Makes architectural decisions
- Reviews all PRs for their system
- Writes documentation continuously
- Trains both human backup and AI agent
Layer 2: Human Backup
- Can handle P1 incidents solo
- Reviews 30%+ of PRs
- Shadows primary on deployments
- Rotates in during PTO/events
- Documented runbooks for key scenarios
Layer 3: AI Agent
- Instant codebase knowledge recall
- Answers “how does X work?” in seconds
- Guides human backup through unfamiliar code
- Generates context for incident response
- Never forgets, never goes on PTO
Critical System Resilience Map
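One way to keep this map honest is to encode it as data and recompute bus factor as hires land. The systems, owners, and backups below are inferred from the role descriptions above and the agent table below, so treat them as illustrative. The 0.5 credit for an AI agent mirrors this plan's "2.5 effective" heuristic; it is not a standard metric.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class System:
    name: str
    primary: str
    backups: List[str] = field(default_factory=list)  # humans able to handle a P1 solo
    ai_agent: Optional[str] = None                     # knowledge agent covering the system

    def bus_factor(self) -> int:
        """People who must be lost before nobody can handle a P1 here."""
        return 1 + len(self.backups)

    def effective_bus_factor(self) -> float:
        """The plan's heuristic: an AI agent counts as roughly half a person."""
        return self.bus_factor() + (0.5 if self.ai_agent else 0.0)


# Inferred target state once backups are hired and planned agents are built.
PLANNED = [
    System("Infrastructure", "Zach", ["DevOps Engineer"], "Kingler"),
    System("Payments / auth / core backend", "Johnny Yarlott", ["Senior Backend Developer"], "Kingler"),
    System("Members Portal", "Federico", ["Full-Stack Developer", "Integration Engineer"], "Portal Agent"),
    System("TonyRobbins.com", "Nick", ["Full-Stack Developer"], "TonyRobbins.com Agent"),
    System("Salesforce integrations", "Tim Hooker", ["Integration Engineer"], "Kingler"),
]
# Humans only, before any backup is in seat.
TODAY = [System(s.name, s.primary) for s in PLANNED]


def below_target(systems: List[System], target: int = 2) -> List[str]:
    """Systems whose human bus factor is under the target."""
    return [s.name for s in systems if s.bus_factor() < target]


def overloaded_backups(systems: List[System]) -> List[str]:
    """Backups covering more than one system (a new concentration risk)."""
    counts = Counter(b for s in systems for b in s.backups)
    return sorted(name for name, n in counts.items() if n > 1)
```

Even in the planned state, the Full-Stack Developer and the Integration Engineer each back up two systems. Losing either one weakens two systems at once and puts one of them back at bus factor 1.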
AI Agents as Knowledge Repositories
| Agent | Status | Knowledge Domain | Primary Use Case |
|---|---|---|---|
| Kingler | ACTIVE | All RRI repositories, architecture docs, deployment configs | Codebase Q&A, onboarding acceleration, incident context |
| Chatot | ACTIVE | IT support knowledge base, Zendesk history, common issues | L0 triage, auto-resolution of known issues, ticket routing |
| Inigo | ACTIVE | Product strategy, roadmap context, competitive intelligence | Strategy analysis, pre-read generation, decision support for Justin |
| TonyRobbins.com Agent | PLANNED | TR.com codebase, Sanity CMS schemas, Next.js architecture | Nick’s knowledge backup, onboarding new devs to the site |
| Portal Agent | PLANNED | Members Portal codebase, API contracts, user flows | Federico dependency reduction, Josh Fuller training acceleration |
The resilience math: with all three layers active, losing any single person degrades capability but doesn’t create a crisis. The human backup can handle incidents with AI agent guidance, and the AI agent provides instant context that would otherwise take weeks to rebuild. Combined effect: bus factor moves from 1 to an effective 2.5 (primary owner + human backup, plus roughly half a person’s worth of AI-assisted recovery).
QA Strategy: AI-Powered Testing Agents
RRI has no QA function. Engineers test their own code, which means bugs ship to production regularly. Hiring a QA engineer ($90-120K) adds headcount; instead, we deploy AI-powered QA agents that run continuously in CI/CD for a fraction of the cost.
Before (No QA)
- Engineers test their own code
- No visual regression testing
- No accessibility auditing
- No load testing before events
- No API contract validation
- Bugs found in production by users
- No test coverage metrics
After (Agent-Powered QA)
- Automated QA in every PR and deploy
- Visual regression catches UI breaks
- WCAG 2.1 AA compliance enforced
- Pre-event load testing automated
- API contracts validated on every change
- Bugs caught before merge
- Coverage dashboards in Swarmia
Frontend QA Agent
Capabilities:
• Visual Regression Testing — Playwright screenshots compared against baselines on every PR. Catches unintended UI changes across all breakpoints.
• End-to-End Testing — Critical user flows (signup, purchase, login, RPM access) tested on every deploy. Playwright + custom assertions.
• Accessibility Auditing — axe-core integrated into CI. Every page scanned for WCAG 2.1 AA violations. PR blocked if new violations introduced.
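• Performance Monitoring — Lighthouse CI runs on every PR and tracks Core Web Vitals. It raises regression alerts if LCP, CLS, or Total Blocking Time degrade beyond threshold. Lighthouse is a lab tool, so TBT stands in for INP, which replaced FID as a Core Web Vital in March 2024.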
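"PR blocked if new violations introduced" implies comparing each run against a stored baseline rather than failing on every pre-existing issue. Here is a minimal sketch of that comparison. It assumes axe-core's JSON `violations` have already been flattened into (rule id, page, CSS target) tuples, and Lighthouse metrics into a flat dict. The budget values and helper names are hypothetical, and this is not Lighthouse CI's own assertion format.

```python
from typing import Dict, Iterable, List, Set, Tuple

Violation = Tuple[str, str, str]  # (axe rule id, page URL, CSS target)

# Hypothetical regression budgets: allowed increase over baseline per metric.
BUDGETS = {"lcp_ms": 250.0, "cls": 0.02, "tbt_ms": 100.0}


def new_violations(baseline: Iterable[Violation], current: Iterable[Violation]) -> Set[Violation]:
    """Violations present in this run that were not in the baseline."""
    return set(current) - set(baseline)


def metric_regressions(baseline: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Metrics whose increase exceeds the budget (lower is better for all three)."""
    return [name for name, budget in BUDGETS.items()
            if current[name] - baseline[name] > budget]


def pr_gate(base_v: Iterable[Violation], cur_v: Iterable[Violation],
            base_m: Dict[str, float], cur_m: Dict[str, float]) -> bool:
    """True if the PR may merge: no new a11y violations and no budget breaches."""
    return not new_violations(base_v, cur_v) and not metric_regressions(base_m, cur_m)
```

Lighthouse CI also has its own assertion configuration, which may be simpler for the performance half. The baseline-diff idea is mainly what the accessibility gate needs.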
Backend QA Agent
Capabilities:
• API Contract Testing — OpenAPI spec validation on every backend PR. Ensures frontend/backend contracts stay in sync. Breaking changes flagged automatically.
• Pre-Event Load Testing — k6/Artillery load tests run automatically 48 hours before every event. Simulates expected concurrent users. Alerts if response times breach thresholds.
• Data Integrity Checks — Validates Stripe ↔ Salesforce ↔ Portal data consistency. Runs nightly + pre-event. Catches sync failures before they impact customers.
• Integration Health — Monitors all third-party API endpoints (Stripe, Salesforce, HubSpot, Obv.io). Proactive alerts before failures cascade.
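At heart, the Stripe ↔ Salesforce ↔ Portal consistency check is a three-way reconciliation keyed on a shared identifier. Here is a minimal sketch. It assumes each system has been exported to a dict mapping a customer key (email here) to an active-subscription flag. The export shape, key choice, and function name are hypothetical, and a real check would compare more fields, such as plan, amount, and renewal date.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, bool]  # customer key -> has active subscription (hypothetical shape)


def reconcile(stripe: Snapshot, salesforce: Snapshot, portal: Snapshot) -> List[Tuple[str, str]]:
    """Return (customer, problem) pairs where the three systems disagree."""
    issues = []
    for key in sorted(set(stripe) | set(salesforce) | set(portal)):
        views = {"stripe": stripe.get(key),
                 "salesforce": salesforce.get(key),
                 "portal": portal.get(key)}
        missing = [name for name, v in views.items() if v is None]
        if missing:
            # Record exists in at least one system but not all three.
            issues.append((key, "missing in " + ", ".join(missing)))
        elif len(set(views.values())) > 1:
            # Present everywhere, but subscription status disagrees.
            issues.append((key, f"status mismatch {views}"))
    return issues
```

Running this nightly and again before events turns silent sync drift into a short, sorted list a human can act on.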
Implementation Phases
| Phase | Timeline | Deliverables | Tools |
|---|---|---|---|
| Phase 1: Foundation | Weeks 5-8 | E2E tests for 5 critical user flows, API contract testing in CI, basic Lighthouse CI integration | Playwright, OpenAPI validator, Lighthouse CI |
| Phase 2: Visual + Accessibility | Weeks 9-12 | Visual regression baselines for all customer-facing pages, axe-core a11y scanning in CI, coverage dashboards | Playwright visual compare, axe-core, Swarmia |
| Phase 3: Load + Data | Weeks 13-16 | Pre-event load testing automation, Stripe/SF data integrity nightly checks, integration health monitoring | k6/Artillery, custom data validators, Datadog |
Cost comparison: QA Engineer salary: $90-120K/year + benefits. AI QA agent infrastructure: ~$200-500/month (CI compute + tool licenses), or about $2.4-6K/year. That’s roughly 93-98% cheaper on salary alone. The agents also provide 24/7 coverage, never call in sick, carry no context-switching overhead, and cost scales linearly with the number of repos.
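The savings range follows directly from the stated figures. This calculation counts salary only, because the benefits load isn't specified:

```python
# Annual QA engineer salary range (benefits excluded).
salary_low, salary_high = 90_000, 120_000
# Annual AI QA agent cost from the ~$200-500/month range.
agent_low, agent_high = 200 * 12, 500 * 12  # 2,400 and 6,000

# Worst case: cheapest engineer vs. most expensive agent setup.
savings_min = 1 - agent_high / salary_low
# Best case: most expensive engineer vs. cheapest agent setup.
savings_max = 1 - agent_low / salary_high
print(f"{savings_min:.1%} to {savings_max:.1%} cheaper")
```

The saving falls between about 93% and 98% on salary alone. Including benefits on the engineer side only widens the gap.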
Product Ownership & PM Layer
Product ownership across RRI is currently fragmented across 6+ people with no one owning the full customer experience. The proposed model unifies ownership under Justin Kahn as VP/Head of Product with a formal governance structure and dedicated PM roles for each team.
Why Dedicated POs Matter
Tony AI alone has 49K paying subscribers and $23M ARR — that’s a standalone product that needs a dedicated owner who wakes up thinking about retention, engagement, and growth. The Tony Robbins Experience (portal unification, Mastery Path, Event Passport) is the platform play that drives the $1B valuation story. These can’t be side projects for people who also handle RPM, integrations, and CRM.
The Core Problem: Justin & Spork Are Stuck in Meetings
Why nothing ships: Justin and Spork spend their days in requirements meetings, stakeholder updates, and cross-department coordination instead of leading their teams. Spork has 6+ standing meetings daily. Justin is pulled into every product conversation because there’s no one else. Engineers get interrupted directly via Slack. Nobody is protecting development time or running the process. The fix isn’t better time management — it’s dedicated people whose job is the meetings, the process, and the stakeholder communication.
Project Manager / Scrum Master (New Hire)
This is a dedicated PM hire — not Justin wearing another hat. This person sits in requirements meetings so Justin doesn’t have to. They update stakeholders on project status so Spork doesn’t have to. They run sprint ceremonies, protect the team from scope creep, and are the single point of contact for “when will X be done?”
PM / Scrum Master Responsibilities
- Attends all requirements meetings — so Justin and Spork don’t
- Updates stakeholders on project status — weekly reports, ad-hoc questions
- Facilitates all sprint ceremonies (planning, standup, review, retro)
- Translates business requirements into technical stories (with BA Agent support)
- Protects sprint from scope creep and unplanned work
- Tracks velocity, burndown, and DORA metrics in Swarmia
- Manages cross-team dependencies and blockers
- Coaches team on Scrum practices
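Swarmia computes DORA metrics itself, so this is only a sketch of what the four numbers mean. It assumes a hypothetical list of deployment records with commit and deploy timestamps, a failure flag, and a restore time. For simplicity, time to restore is measured from the deploy. The real definitions, such as what counts as a failed change, should follow whatever Swarmia is configured to use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service recovered, if failed


def dora(deploys: List[Deploy], period_days: int) -> dict:
    """Simplified DORA metrics over a reporting period."""
    if not deploys:
        raise ValueError("no deployments in period")
    failures = [d for d in deploys if d.failed]
    # Simplification: restore time measured from the failing deploy.
    restore = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore) if restore else None,
    }
```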
Run Team: Technical PM (Run Team Lead)
- Manages Kanban board WIP limits
- Tracks SLA compliance
- Coordinates incident response
- Reports ops metrics to leadership
- Manages vendor/MSP relationships
- Built into the Run Team Lead role
The unlock: With a dedicated PM in requirements meetings and handling stakeholder updates, Justin focuses on product vision and architecture decisions. Spork focuses on engineering leadership and system reliability. Neither is a human router anymore. The PM becomes the “shield” that lets technical leaders do technical work.
Dedicated Product Owners by Product
Alex Hoisington currently covers too much ground. The three biggest products each need a dedicated owner who lives and breathes that product every day.
AI Business Analyst Agent
Requirements meetings generate ideas, decisions, and action items — but translating those into well-structured Jira tickets with acceptance criteria is tedious, error-prone, and often doesn’t happen. The BA Agent sits in every requirements meeting (via transcript) and automatically generates tickets.
BA Agent Capabilities:
• Meeting → Tickets: Ingests meeting transcripts (Zoom/Teams recording → Whisper transcription). Identifies action items, decisions, and feature requests. Generates draft Jira stories with title, description, acceptance criteria, and suggested priority.
• Requirements Structuring: Takes loose stakeholder language (“we need the checkout to be faster”) and structures it into testable acceptance criteria (“checkout page loads in <2s on 3G, Stripe Payment Element renders within 1s”).
• Impact Assessment: Cross-references new requirements against existing backlog and active sprint to flag conflicts, duplicates, and dependencies before tickets are committed.
• PM Review Queue: All BA Agent-generated tickets go into a PM review queue — the PM / Scrum Master approves, edits, or rejects before they hit the backlog. No auto-create to backlog.
| Input | BA Agent Action | Output |
|---|---|---|
| Meeting transcript | Extract action items, feature requests, bug reports | Draft Jira stories in PM review queue |
| Slack thread with stakeholder request | Structure into story with acceptance criteria | Draft ticket + link to original thread |
| Email from marketing (“new SKU needed”) | Generate PCR draft + engineering impact estimate | PCR in Product Council approval queue |
| Incident post-mortem | Extract follow-up action items | Bug/improvement tickets with post-mortem link |
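The "no auto-create to backlog" rule is easy to encode so it can't be bypassed by accident. Here is a minimal sketch using a hypothetical `DraftStory` record and an in-memory queue; in practice drafts would live in Jira, and all names are illustrative. The sketch also requires at least one acceptance criterion before approval. That requirement is an added assumption in the spirit of the "testable acceptance criteria" goal.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DraftStory:
    title: str
    acceptance_criteria: List[str]
    source: str                      # transcript, Slack thread, email, or post-mortem link
    status: Status = Status.PENDING


@dataclass
class ReviewQueue:
    drafts: List[DraftStory] = field(default_factory=list)
    backlog: List[DraftStory] = field(default_factory=list)

    def submit(self, draft: DraftStory) -> None:
        """BA Agent output always lands here, never directly in the backlog."""
        self.drafts.append(draft)

    def review(self, draft: DraftStory, approve: bool) -> None:
        """PM decision; only approved stories with criteria reach the backlog."""
        self.drafts.remove(draft)
        if approve and draft.acceptance_criteria:
            draft.status = Status.APPROVED
            self.backlog.append(draft)
        else:
            draft.status = Status.REJECTED
```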
Current Fragmentation
| Product Area | Current Owner | Problem |
|---|---|---|
| Tony AI & RPM | Justin Kahn | No dedicated product manager |
| Coaching Programs | Chris Schenke | No tech integration |
| Platinum Partnership | Scotty | Siloed from digital products |
| Inner Circle / Biz Accelerator | Bree (under Diane) | Separate tech stack |
| Summit & Marketing | Jesse | Controls HubSpot, changes pages 5 min before go-live |
| Traditional Events | No single owner | Requirements come from everywhere |
Proposed: SVPG Product Council
Based on Silicon Valley Product Group methodology. Justin Kahn becomes unified product owner. All product decisions evaluated against a single North Star: the Mastery Path progression (UPW → Tony AI → RPM → Coaching → Inner Circle → Platinum).
- Quarterly Strategy Review (3 hours) — 7 members max. Sets product direction for the quarter. Inigo (Justin’s AI strategy agent) drafts pre-reads.
- Monthly Operating Review (90 min) — Progress against quarterly goals, resource reallocation, cross-product dependencies.
- Product Change Request (PCR) Process — 30-day lead time for new SKUs. Engineering impact assessment required. Council approval gate before any SKU touches Stripe/Salesforce/Sanity/order-ingestion.
- Freeze Mode — PAD (Product Admin Dashboard, U4) enforces freeze windows. Product changes can be created but cannot publish without CTO approval. This turns policy into a system-enforced guardrail.
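Together, the PCR gate and Freeze Mode reduce to a small publish check. Here is a minimal sketch using a hypothetical `ProductChange` record. The field names and the representation of freeze windows as (start, end) date pairs are illustrative, not PAD's actual schema. As in the process above, creating a change is never blocked; only publishing is.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

PCR_LEAD_TIME = timedelta(days=30)  # new SKUs need 30 days' lead time


@dataclass
class ProductChange:
    submitted: date
    go_live: date
    is_new_sku: bool
    impact_assessed: bool    # engineering impact assessment attached
    council_approved: bool
    cto_approved: bool = False


def can_publish(change: ProductChange, today: date,
                freezes: List[Tuple[date, date]]) -> Tuple[bool, str]:
    """Return (allowed, reason) for publishing a change on `today`."""
    if change.is_new_sku:
        if change.go_live - change.submitted < PCR_LEAD_TIME:
            return False, "new SKU submitted with less than 30 days' lead time"
        if not (change.impact_assessed and change.council_approved):
            return False, "PCR needs impact assessment and Council approval"
    in_freeze = any(start <= today <= end for start, end in freezes)
    if in_freeze and not change.cto_approved:
        return False, "freeze window: CTO approval required"
    return True, "ok"
```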
Critical: Authority without enforcement is theater. Erik must enforce the structure when the first bypass attempt happens. One enforcement moment sets the precedent. A code freeze was attempted a year ago — it lasted 2 weeks because nobody enforced it.
Sprint Ceremonies & Process Cadence
Build Team (Scrum)
2-week sprints with 20% interrupt buffer built in. First sprint deliberately at 50% velocity — build trust before optimizing throughput. Product Owner (Alex Hoisington) gates ALL unplanned work. No direct Slack pings to Build engineers about ops issues.
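The buffer and ramp-up rules translate into a simple capacity calculation. This sketch rests on several assumptions. A 2-week sprint is taken as 10 working days, and "50% velocity" is read as planning half the normal capacity. The 70% target is measured as completed work over committed work. Team size is a free input, since it isn't fixed here.

```python
INTERRUPT_BUFFER = 0.20    # share of capacity held back for unplanned work
FIRST_SPRINT_FACTOR = 0.5  # first sprint planned at half capacity (interpretation)
COMPLETION_TARGET = 0.70   # 70%+ of commitments completed


def plannable_days(engineers: int, sprint_days: int = 10, first_sprint: bool = False) -> float:
    """Engineer-days that may be committed in one sprint after the buffer."""
    days = engineers * sprint_days * (1 - INTERRUPT_BUFFER)
    return days * FIRST_SPRINT_FACTOR if first_sprint else days


def met_target(committed: float, completed: float) -> bool:
    """True if the sprint completed at least 70% of what it committed."""
    return committed > 0 and completed / committed >= COMPLETION_TARGET
```

For a hypothetical four-engineer team, this gives about 32 plannable engineer-days in a normal sprint and about 16 in the first one.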
Run Team (Kanban)
Kanban with WIP limits (team_size + 1). P1/P2/P3 incident severity tiers. OpsGenie on-call rotation ($9/user/month). No sprints — continuous flow with SLA targets.
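The WIP rule is mechanical enough to check in code. A new card may move into In Progress only while the column is under team_size + 1. The plan doesn't say whether P1s may bypass the limit. This sketch assumes they use an expedite lane, a common Kanban convention, and the function names are hypothetical.

```python
def wip_limit(team_size: int) -> int:
    """Run Team WIP limit: one more card than there are people."""
    return team_size + 1


def can_pull(in_progress: int, team_size: int, severity: str = "P3") -> bool:
    """Whether a new card may move into In Progress.

    Assumption: P1 incidents use an expedite lane and ignore the limit.
    """
    if severity == "P1":
        return True
    return in_progress < wip_limit(team_size)
```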
Incident Severity Tiers
| Tier | Definition | Response SLA | Resolution SLA | Escalation |
|---|---|---|---|---|
| P1 — Critical | Revenue impact, system down, data loss | 15 minutes | 30 minutes | All hands + CTO + Erik |
| P2 — High | Degraded service, workaround exists | 1 hour | 4 hours | Run Team Lead + Spork |
| P3 — Normal | Non-urgent bugs, enhancement requests | Next business day | 5 business days | Run Team Lead routes |
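Tracking SLA compliance against this table needs only three timestamps per incident: opened, first response, and resolution. Here is a sketch that assumes those timestamps are available from OpsGenie or Jira exports; the record shape is hypothetical. P3 is left out because its business-day SLAs need a business calendar rather than fixed durations.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# (response SLA, resolution SLA) from the severity table; P3 omitted.
SLAS: Dict[str, Tuple[timedelta, timedelta]] = {
    "P1": (timedelta(minutes=15), timedelta(minutes=30)),
    "P2": (timedelta(hours=1), timedelta(hours=4)),
}


def sla_breaches(severity: str, opened: datetime, responded: Optional[datetime],
                 resolved: Optional[datetime], now: datetime) -> List[str]:
    """List which SLAs an incident has breached so far."""
    respond_by, resolve_by = SLAS[severity]
    breaches = []
    # Open items are measured against the current time.
    if (responded or now) - opened > respond_by:
        breaches.append("response")
    if (resolved or now) - opened > resolve_by:
        breaches.append("resolution")
    return breaches
```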
Cross-Team Ceremonies
Process Tooling
| Tool | Purpose | Team | Cost |
|---|---|---|---|
| Jira | Sprint boards (Build) + Kanban boards (Run) | Both | Existing |
| OpsGenie | On-call rotation, incident alerting, escalation | Run | $9/user/month |
| Swarmia | DORA metrics, sprint velocity, engineering analytics | Both | ~$30/user/month |
| Confluence | Documentation, post-mortems, architecture decisions | Both | Existing |
| GitHub | Code, PRs, CI/CD, CODEOWNERS | Both | Existing |
| Playwright | E2E testing, visual regression, cross-browser QA | QA Agents | Open source |
| axe-core | Accessibility auditing (WCAG 2.1 AA) in CI | QA Agents | Open source |
| k6 / Artillery | Load testing, pre-event capacity validation | QA Agents | Open source / ~$100/mo |
| Zendesk | IT service desk ticketing, Chatot AI triage integration | Run (IT) | ~$55/agent/month |
| Reclaim.ai | AI calendar tool for engineering focus time | Build | $10/user/month (optional) |
| Retool | Product Admin Dashboard Phase 1 UI | Build | $50/user/month |
Restructuring Timeline
| Phase | Timeline | Key Actions |
|---|---|---|
| Phase 1: Stabilize & Separate | Weeks 1-2 | Announce restructuring. Create two Jira boards. Establish P1/P2/P3 tiers. Set WIP limits. Spork stops attending Build standups. Implement 20% interrupt buffer. Kill Spork’s 6+ daily meetings. |
| Phase 2: Hire & Stabilize | Weeks 3-8 | Post Run Team Lead + DevOps + Integration Engineer + Data Engineer + Senior Backend Dev + Full-Stack Dev. Install OpsGenie. Documentation sprints for Zach and Johnny (D1). Convert Jay Lane full-time. Begin QA agent Phase 1. Evaluate MSP options. |
| Phase 3: Operational Cadence | Weeks 9-12 | Run Team Lead onboarded and independent. First quarterly roadmap planning session. Swarmia DORA metrics baseline established. Build team completing 70%+ of sprint commitments. QA agent Phase 2 (visual + a11y). IT Service Desk model operational. |
| Phase 4: Scale & Harden | Weeks 13-16 | New developers onboarded and productive. QA agents fully operational (Phase 3: load + data). Three-layer resilience model validated for all BF1 systems. MSP evaluation complete. Pre-event load testing automated. AI agent knowledge repositories indexed for all critical systems. |
Success looks like: Build team completing 70%+ of sprint commitments. P1 incidents resolved in 30 minutes. No engineer working 10-hour days for 3+ consecutive days. First event with a clean code freeze that actually holds. Product Council meeting monthly with documented decisions. All BF1 systems at bus factor 2+ with AI agent knowledge backup. QA agents catching bugs before they reach production.