SERVICE_RISK_REGISTER — billing-service
Risks are tracked at service level and scored as Likelihood × Impact (1–5 each, maximum 25). The register is reviewed monthly. P1 items have a named owner and an active mitigation plan; P2 items are watched through observability metrics and alerts; P3 items are documented and accepted.
| # | Risk | Likelihood | Impact | Score | Tier | Owner | Mitigation |
|---|---|---|---|---|---|---|---|
| R1 | PCI scope creep: a future contributor stores card data inadvertently (e.g., logging a processorRawResponse, joining a dev table that contains test PANs) | 3 | 5 | 15 | P1 | Security | CI guard rejecting PAN-shaped column names and Luhn-valid digit strings in code, a log sample, and an outbox sample (see TESTING §9); architecture forbids payment-gateway-service from passing PAN to billing; PR review checklist item; quarterly DLP scan in audit-service |
| R2 | Cash variance fraud (insider): cashier and supervisor collude to under-report counted floats | 3 | 5 | 15 | P1 | Finance Ops | Two-staff sign-off enforced cryptographically + AI cash pattern detector + nightly variance trend per actor + GM weekly review + drawer rotation policy in tenant playbooks |
| R3 | Dunning regulatory exposure: auto-suspending a tenant where local law requires written notice with N days lead time | 3 | 4 | 12 | P1 | Compliance | Per-tenant configurable gracePeriodDays and dunningNotificationTemplate; suspend step blocked if the last dunning notification.sent.v1 event is less than gracePeriodDays days old; legal review per market on plan launch |
| R4 | Cross-tenant data leak via misrouted connection | 2 | 5 | 10 | P1 | Security | 4-layer defense (SECURITY_MODEL §2); cross-tenant attack test in CI; audit.cross_tenant_access alert |
| R5 | Multi-currency rounding drift over time: small per-folio residuals accumulate to a material number on the platform's books | 4 | 3 | 12 | P2 | Finance Eng | Banker's half-up enforced; nightly reconciliation alert on Settlement.residual_micro != 0; quarterly platform-wide reconciliation against gateway; minimum residual absorbed via adjustment charge |
| R6 | Tax rule out of date: a jurisdiction changes a VAT rate, the tenant doesn't update its rules, and incorrect tax is issued | 4 | 4 | 16 | P1 | Tenant Success | taxRules.effectiveTo field + taxRules.staleness job alerts after 180 days; per-jurisdiction subscription to gov-tax-board feeds where available; tenant-admin nudge on rule expiry |
| R7 | Invoice numbering gap or duplication after re-open or void | 3 | 5 | 15 | P1 | Finance Eng | Numbering uses per-tenant + per-jurisdiction monotonic DB sequence with UNIQUE constraint; void preserves the number; re-open issues a new sequenced number; CI assertion on number monotonicity per tenant per day |
| R8 | Cash drawer "stuck" sessions block shift handover when an offline period lasts longer than the shift | 3 | 3 | 9 | P2 | Engineering | Desktop UX prominently displays pending close + "use cloud console to close" supervisor escape hatch (FAILURE_MODES §8) |
| R9 | AI false positive flood: anomaly detector overwhelms supervisors with low-quality signals; "alert fatigue" hides real fraud | 4 | 3 | 12 | P2 | Engineering + Finance | Per-tenant signal threshold; weekly precision review on a labeled sample; capability kill switch per tenant (AI_INTEGRATION §7) |
| R10 | Subscription invoice payment-method token revoked by processor (card expired, PayPal closed) | 4 | 3 | 12 | P2 | Engineering | Pre-cycle freshness check; tenant nudge 7d before cycle if token age > 11 months; processor-emitted token-status events update subscriptions.payment_method_token |
| R11 | Per-tenant schema migrator drift: a tenant skips a migration window, then runs into a forward-incompatible release | 3 | 4 | 12 | P1 | SRE | Migrator is run for every tenant before deploying an API revision that requires the new schema; expand-and-contract pattern across two releases; integrity audit job alerts on tenants > 2 schema versions behind |
| R12 | Sharia-compliance invariant bypass via API: a caller posts kind='late_fee' with feeKind!='interest' to evade the rejection while the text labels still suggest interest | 2 | 4 | 8 | P3 | Compliance | Domain enforces feeKind taxonomy; tax-engine rule blocks computation; copy review per template at launch |
| R13 | PDF tampering after issuance | 1 | 5 | 5 | P3 | Security | PDF embeds sha256(payload) signed by per-tenant key; verifier endpoint; PDFs stored in CMEK GCS with versioning |
| R14 | Data-residency mismatch for Saudi tenant (request to host in me-central1) | 3 | 3 | 9 | P2 | SRE | v2 work item to add per-tenant region pinning; until then, contractual disclosure in tenant onboarding |
| R15 | Outbox drainer single-point bottleneck during burst (e.g., POS posting hour) | 2 | 3 | 6 | P3 | SRE | Drainer scales on outbox lag custom metric; per-schema sharding tactic in FAILURE_MODES §4 |
| R16 | Test data with PAN-shaped strings leaks into production logs via fixture import | 1 | 5 | 5 | P3 | Security | Fixtures live in test/; production code path cannot import; CI fence; runtime DLP scan |
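
The CI guard named in R1 and R16 could be approximated as follows. This is a minimal sketch, not the guard described in TESTING §9: the function names `luhn_valid` and `find_pan_like`, the 13 to 19 digit length window, and the tolerated separators (single spaces or hyphens) are all assumptions made for illustration.

```python
import re

# Assumption: a PAN candidate is 13-19 digits, optionally separated
# by single spaces or hyphens, and not embedded in a longer digit run.
_PAN_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pan_like(text: str) -> list[str]:
    """Return digit runs in text that look like PANs and pass Luhn."""
    hits = []
    for match in _PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

A guard built on this would fail the build when `find_pan_like` returns anything for a scanned source file, log sample, or outbox sample. Luhn-valid strings can occur by chance in ordinary identifiers, so an allowlist for known false positives is likely needed in practice.
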
Risk-tier conventions
- P1 (score ≥ 12, or any catastrophic risk): active mitigation, named owner, monthly review; documented in this register.
- P2 (8 ≤ score < 12): observed via metrics / alerts; quarterly review.
- P3 (score < 8): accepted; documented; reviewed annually.
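
The scoring and tier thresholds above can be sketched as a small helper. The register does not define "catastrophic", so the sketch takes it as a reviewer-supplied flag; the names `risk_score` and `risk_tier` are hypothetical. Recorded tiers can differ from the computed one (R5 scores 12 but is listed as P2), so the result is best read as a default that a reviewer may override.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Score a risk as Likelihood x Impact, each on a 1-5 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return likelihood * impact


def risk_tier(score: int, catastrophic: bool = False) -> str:
    """Map a score to a tier using the thresholds in this register.

    Assumption: `catastrophic` is decided by the reviewer, because the
    register does not say how a catastrophic risk is identified.
    """
    if catastrophic or score >= 12:
        return "P1"
    if score >= 8:
        return "P2"
    return "P3"
```

For example, `risk_tier(risk_score(3, 3))` yields P2, which matches the R8 and R14 rows.
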
Cross-references
- Security threat model: SECURITY_MODEL §10.
- Operational failures and runbook: FAILURE_MODES.
- Readiness gates that close several of these risks: SERVICE_READINESS.