DLR Processor — Service Risk Register
Status: populated · Owner: Platform Engineering · Last updated: 2026-04-18
Risk Matrix
| ID | Risk | Likelihood | Impact | Severity | Mitigation | Owner |
|---|---|---|---|---|---|---|
| RISK-DLR-01 | High orphan rate causes billing gaps | Medium | High | HIGH | Orphan-rate monitoring and reconciliation job; alert when the orphan rate reaches 0.5% | Platform Eng |
| RISK-DLR-02 | PostgreSQL write contention at peak DLR volume | Low | Medium | MEDIUM | Connection pooling (PgBouncer); ON CONFLICT fast path for duplicate inserts; table partitioning | Platform Eng |
| RISK-DLR-03 | smpp-connector schema change breaks inbound parsing | Medium | High | HIGH | Pact contract tests; tolerant reader pattern; schema versioning | smpp-connector team |
| RISK-DLR-04 | NATS stream retention too short → DLR loss on processor downtime | Low | High | HIGH | Set stream retention to 24 h; monitor consumer lag; HPA scaling on pending message count | SRE |
| RISK-DLR-05 | PII leakage via the `rawPayload` field in `orphaned_receipts` | Low | Critical | CRITICAL | Tablespace encryption; restricted SELECT grants; no PII in logs | Security |
| RISK-DLR-06 | Outbox relay failure leaves billing records and webhook deliveries stale | Low | High | HIGH | Alert on outbox pending backlog; automatic retry; on-call runbook | Platform Eng |
| RISK-DLR-07 | Duplicate DLR flood from an operator degrades performance | Medium | Medium | MEDIUM | Idempotency index short-circuits duplicates; planned: Redis Bloom filter for high-volume operators | Platform Eng |
| RISK-DLR-08 | Race condition: DLR arrives before the SENT status is written to orch | Medium | Medium | MEDIUM | Correlation retry with 3 s backoff before orphaning; reconciliation job | Platform Eng |
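
A minimal sketch of the RISK-DLR-08 and RISK-DLR-01 mitigations, assuming the processor has a lookup callable that returns the matching outbound message or None. The 3 s backoff and the 0.5% alert threshold come from the register; the single retry, the function names, and the injectable `sleep` parameter are hypothetical choices for illustration, not the service's actual code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# From RISK-DLR-01: alert when the orphan rate reaches 0.5%.
ORPHAN_ALERT_THRESHOLD = 0.005
# From RISK-DLR-08: wait 3 s before retrying correlation.
CORRELATION_BACKOFF_S = 3.0


def correlate_with_retry(
    lookup: Callable[[str], Optional[T]],
    message_id: str,
    retries: int = 1,  # assumption: the register does not state a retry count
    backoff_s: float = CORRELATION_BACKOFF_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Match a DLR to its outbound message, retrying before giving up.

    Returns None when every attempt misses; the caller would then
    record the receipt as orphaned.
    """
    for attempt in range(retries + 1):
        match = lookup(message_id)
        if match is not None:
            return match
        if attempt < retries:
            # Give the SENT status write time to land before trying again.
            sleep(backoff_s)
    return None


def orphan_alert_due(orphaned: int, total: int,
                     threshold: float = ORPHAN_ALERT_THRESHOLD) -> bool:
    """True when orphaned / total has reached the alert threshold."""
    if total <= 0:
        return False
    return orphaned / total >= threshold
```

Injecting `sleep` keeps the backoff testable without real waits. A deployed consumer might instead defer the retry without blocking a worker (for example by negatively acknowledging the message with a delay); that is an assumption about the deployment, not something this register specifies.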
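
For RISK-DLR-03, the tolerant reader pattern means the processor reads only the fields it needs and ignores anything extra the smpp-connector adds. The sketch below assumes JSON-like payloads with hypothetical `messageId`, `status`, and `schemaVersion` fields and a hypothetical default of version "1.0" when none is sent; the real inbound schema is not given in this register.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical: major schema versions this processor understands.
SUPPORTED_MAJOR_VERSIONS = frozenset({1})


@dataclass(frozen=True)
class InboundDlr:
    message_id: str
    status: str
    schema_version: str


def parse_inbound_dlr(payload: Mapping[str, Any]) -> Optional[InboundDlr]:
    """Tolerant reader: read only the fields we need and ignore the rest.

    Returns None for payloads that cannot be processed (unsupported major
    version or missing required fields) so the caller can route them aside.
    """
    # Assumption: a missing version means the original "1.0" schema.
    version = str(payload.get("schemaVersion", "1.0"))
    major = version.split(".", 1)[0]
    if not major.isdigit() or int(major) not in SUPPORTED_MAJOR_VERSIONS:
        return None

    message_id = payload.get("messageId")
    status = payload.get("status")
    if not isinstance(message_id, str) or not isinstance(status, str):
        return None

    # Unknown extra fields are deliberately ignored, not rejected.
    return InboundDlr(message_id=message_id,
                      status=status.strip().upper(),
                      schema_version=version)
```

Rejecting only on an unknown major version lets additive, minor-version changes flow through, while genuinely breaking changes surface as unparseable payloads that the Pact contract tests should catch before release.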
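
RISK-DLR-02 and RISK-DLR-07 both rely on an idempotency index with an ON CONFLICT fast path. The sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, since SQLite 3.24 and later accept the same `INSERT ... ON CONFLICT (...) DO NOTHING` clause; the `delivery_receipts` table, its `(operator_id, receipt_id)` key, and `record_dlr` are hypothetical names.

```python
import sqlite3

# Stand-in schema: SQLite (3.24+) accepts the same ON CONFLICT ... DO NOTHING
# clause as PostgreSQL, so the fast path can be shown without a PG server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS delivery_receipts (
    operator_id TEXT NOT NULL,
    receipt_id  TEXT NOT NULL,
    status      TEXT NOT NULL,
    PRIMARY KEY (operator_id, receipt_id)  -- hypothetical idempotency key
)
"""


def record_dlr(conn: sqlite3.Connection, operator_id: str,
               receipt_id: str, status: str) -> bool:
    """Insert a DLR once; return False without further work for duplicates."""
    cur = conn.execute(
        "INSERT INTO delivery_receipts (operator_id, receipt_id, status) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (operator_id, receipt_id) DO NOTHING",
        (operator_id, receipt_id, status),
    )
    # rowcount is 0 when the unique index already held this key.
    return cur.rowcount == 1
```

Against PostgreSQL the same check would read the DB-API `cursor.rowcount` after the insert. The planned Bloom filter from RISK-DLR-07 would sit in front of this call to skip the database round trip for receipts already seen.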
Risk Review Cadence
The risk register is reviewed monthly in the Platform Engineering architecture sync. New risks are added as incidents or near-misses occur.