Customer Portal — Failure Modes
Status: populated | Owner: Product Engineering (Frontend) | Last updated: 2026-04-18
1. Failure Taxonomy
| Mode | Detection | Portal Behavior | Recovery |
|---|---|---|---|
| Kong gateway down | 502/504 from fetch | Full-page error boundary; "Service temporarily unavailable" message | Auto-retry with exponential backoff on next navigation |
| auth-service down | 502 on /v1/auth/me or /v1/auth/firebase | Login fails with "Authentication service unavailable" toast | Retry on next login attempt |
| JWT expired (normal) | 401 from Kong | Middleware transparently refreshes using __refresh cookie | Silent; user unaffected |
| JWT expired + refresh token expired | 401 from refresh endpoint | Redirect to /login?reason=session_expired | User re-authenticates |
| Firebase Auth outage | Firebase SDK error during signInWithEmailAndPassword | Error toast: "Sign-in service unavailable. Please try again." | Retry when Firebase recovers |
| billing-service down | 502 on /v1/billing/* | /billing page shows error state card; rest of portal unaffected | Manual refresh |
| webhook-dispatcher down | 502 on /v1/webhooks/* | /webhooks page shows error state; no cascading effect | Manual refresh |
| Rate limiting (429) | 429 from Kong | Toast: "Rate limit exceeded — try again shortly." | Wait and retry |
| API key creation fails | 4xx/5xx on POST /v1/api-keys | Modal shows error message; no raw key displayed | User retries |
| Network timeout (server component) | fetch AbortError | Page renders error boundary component | Next navigation re-fetches |
| Sentry unavailable | SDK init failure | Portal continues normally; errors not captured externally | — |
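
The HTTP-detectable rows of the table above can be read as a classification function. The following is a minimal sketch, not the portal's actual code: the `FailureMode` names, the `FailedRequest` shape, and the `isRefreshCall` flag are hypothetical, and the precedence between overlapping rules (for example, a 429 on `POST /v1/api-keys`) is an assumption the table does not specify. Firebase SDK errors and Sentry init failures are not HTTP responses and are left out.

```typescript
// Hypothetical names for the HTTP-detectable rows of the failure table.
type FailureMode =
  | "gateway_down"
  | "auth_service_down"
  | "billing_down"
  | "webhooks_down"
  | "jwt_expired"
  | "session_expired"
  | "rate_limited"
  | "api_key_creation_failed"
  | "timeout"
  | "unknown";

// Hypothetical description of a failed request.
interface FailedRequest {
  path: string;
  method: string;
  status?: number;         // absent when the fetch was aborted
  aborted?: boolean;       // true for a fetch AbortError
  isRefreshCall?: boolean; // true when the request was the token refresh call
}

function classifyFailure(req: FailedRequest): FailureMode {
  if (req.aborted) return "timeout";
  const status = req.status ?? 0;

  // Assumption: rate limiting and auth expiry take precedence over path rules.
  if (status === 429) return "rate_limited";
  if (status === 401) return req.isRefreshCall ? "session_expired" : "jwt_expired";

  // A 502 on a service-specific path points at that service.
  if (status === 502) {
    if (req.path === "/v1/auth/me" || req.path === "/v1/auth/firebase") {
      return "auth_service_down";
    }
    if (req.path.startsWith("/v1/billing/")) return "billing_down";
    if (req.path.startsWith("/v1/webhooks/")) return "webhooks_down";
  }

  // Any remaining 4xx/5xx on key creation is reported inside the modal.
  if (req.method === "POST" && req.path === "/v1/api-keys" && status >= 400) {
    return "api_key_creation_failed";
  }

  // Any other 502/504 is treated as the gateway itself being down.
  if (status === 502 || status === 504) return "gateway_down";
  return "unknown";
}
```

Keeping the mapping in one pure function would make it straightforward to unit-test each row of the table independently of the UI that renders the toast, card, or error boundary.
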
2. Partial Degradation Strategy
The portal's layout is divided into independent data-fetching sections. A failure in one section does not break others:
- Dashboard summary cards each use independent `Suspense` boundaries with `ErrorBoundary` wrappers.
- If usage data fails to load, the API key count card still renders.
- The test SMS sender is independent of the message log; each has its own error boundary.
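
The isolation described above can be illustrated without any framework. The sketch below is an analogy only, not React's `Suspense` or error boundary API: each section's render function is wrapped in its own try/catch, so one failing section yields a fallback while its siblings render normally. The `Section` type, function names, and fallback text are all hypothetical.

```typescript
// Hypothetical model of a dashboard section: a name plus a render function
// that may throw when its data could not be loaded.
type Section = { name: string; render: () => string };

// Plays the role of a per-section error boundary: a failure is caught here
// and replaced by a fallback instead of propagating to the whole page.
function renderWithBoundary(section: Section): string {
  try {
    return section.render();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return `[${section.name} unavailable: ${reason}]`;
  }
}

// Each section is rendered independently of the others.
function renderDashboard(sections: Section[]): string[] {
  return sections.map(renderWithBoundary);
}

// Usage: the usage card fails, the API key count card still renders.
const output = renderDashboard([
  { name: "usage", render: () => { throw new Error("502 from gateway"); } },
  { name: "api-keys", render: () => "API keys: 3" },
]);
```

In this run, `output` holds a fallback string for the usage card followed by the normally rendered API key count, which mirrors the behavior the bullets describe.
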
3. Auth Failure Escalation
    Request to protected route
      └─► middleware reads __session cookie
            ├─ Valid JWT → proceed
            ├─ Expired JWT + valid __refresh → refresh silently → set new cookies → proceed
            └─ Both invalid/missing
                  └─► Redirect to /login?redirect=<original-path>
                        └─► User authenticates → redirected back to original path
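
The escalation flow above reduces to a small decision function. This is a minimal sketch under stated assumptions rather than the actual middleware: `TokenState`, `AuthDecision`, and `decideAuth` are hypothetical names, URL-encoding the original path is an assumption, and the case the diagram leaves open (session missing or invalid while `__refresh` is still valid) is treated conservatively as a redirect.

```typescript
// Hypothetical result of validating a cookie's token.
type TokenState = "valid" | "expired" | "invalid" | "missing";

// Hypothetical outcomes matching the three branches of the diagram.
type AuthDecision =
  | { action: "proceed" }
  | { action: "refresh" } // refresh silently, set new cookies, then proceed
  | { action: "redirect"; location: string };

function decideAuth(
  session: TokenState,  // state of the __session cookie
  refresh: TokenState,  // state of the __refresh cookie
  originalPath: string,
): AuthDecision {
  if (session === "valid") return { action: "proceed" };
  if (session === "expired" && refresh === "valid") return { action: "refresh" };

  // Everything else falls through to re-authentication. Assumption: the
  // original path is URL-encoded so it survives as a single query parameter.
  return {
    action: "redirect",
    location: `/login?redirect=${encodeURIComponent(originalPath)}`,
  };
}

// Usage: both tokens are unusable, so the user is sent to the login page.
const decision = decideAuth("expired", "expired", "/billing");
```

Separating the decision from the cookie parsing and the actual refresh call keeps each branch of the diagram testable on its own.
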
4. Known Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| No real-time message status updates | Delivered status only visible after page refresh | Add SSE/polling in post-MVP |
| Firebase Auth as single IdP | Firebase outage blocks all logins | Email magic link fallback planned for v2 |
| No offline support | Portal unusable without network | PWA offline mode not in scope |