
DLR Processor — Service Readiness

Status: populated | Owner: Platform Engineering | Last updated: 2026-04-18

1. Definition of Ready (Before Sprint Start)

  • Domain model documented and reviewed
  • Event schemas published to schema registry
  • Data model migrations written and reviewed
  • Downstream contracts agreed with billing-service and webhook-dispatcher teams
  • NATS stream SMS_DLR provisioned in all environments
  • Test data seeded in dev environment

2. Definition of Done (Before Merge)

  • All unit and integration tests pass
  • Coverage ≥ 80%
  • Pact contracts verified
  • npm audit --audit-level=high passes (no critical or high severity vulnerabilities)
  • Trivy image scan clean
  • PR reviewed by ≥ 1 engineer
  • Flyway migrations tested on clean DB
  • Feature flag added if partially complete
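The coverage bar can be enforced in CI by reading the json-summary report that Istanbul-compatible tools (Jest, nyc, Vitest) can write to coverage/coverage-summary.json, then failing the build on any shortfall. This is a minimal sketch. It assumes that report format and assumes the 80% bar applies to all four metrics. findCoverageShortfalls is a hypothetical helper, not part of any tool.

```typescript
// Shape of the "total" section of an Istanbul json-summary report
// (coverage/coverage-summary.json). Per-file entries are ignored here.
interface CoverageMetric {
  total: number;
  covered: number;
  pct: number;
}

interface CoverageSummary {
  total: {
    lines: CoverageMetric;
    statements: CoverageMetric;
    functions: CoverageMetric;
    branches: CoverageMetric;
  };
}

// Assumption: the Definition of Done bar applies to every metric.
const COVERAGE_METRICS = ["lines", "statements", "functions", "branches"] as const;

// Hypothetical CI helper: returns one message per metric below the threshold.
// An empty array means the gate passes.
function findCoverageShortfalls(summary: CoverageSummary, threshold = 80): string[] {
  return COVERAGE_METRICS
    .filter((metric) => summary.total[metric].pct < threshold)
    .map((metric) => `${metric}: ${summary.total[metric].pct}% < ${threshold}%`);
}
```

In a CI step, JSON.parse the file and exit non-zero when the returned array is non-empty. Projects already on Jest can get the same gate from its built-in coverageThreshold option.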

3. Production Readiness Checklist

Code Quality

  • No console.log or debug statements
  • No hardcoded secrets or configuration
  • Error handling is comprehensive at all layers
  • Graceful shutdown implemented (SIGTERM → drain NATS consumer → close PG pool)
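The shutdown order in the last item can be sketched as follows. To stay library-agnostic, the sketch depends on two hypothetical minimal interfaces. A nats.js NatsConnection (whose drain() lets already-delivered messages finish before closing) and a node-postgres Pool (whose end() closes all clients) should satisfy them structurally. The 25-second drain timeout and all function names are assumptions.

```typescript
// Hypothetical minimal interfaces. A nats.js NatsConnection (drain()) and a
// node-postgres Pool (end()) are expected to satisfy them structurally.
interface DrainableConsumer {
  drain(): Promise<void>;
}
interface ClosablePool {
  end(): Promise<void>;
}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Checklist order: drain NATS first so messages already delivered can finish
// processing, then close the PG pool those handlers write to.
async function shutdownDlrProcessor(
  nats: DrainableConsumer,
  pgPool: ClosablePool,
  drainTimeoutMs = 25000, // assumed: leaves headroom inside Kubernetes' default 30 s grace period
): Promise<void> {
  try {
    await withTimeout(nats.drain(), drainTimeoutMs, "NATS drain");
  } finally {
    // Close the pool even if the drain failed or timed out.
    await pgPool.end();
  }
}

// Kubernetes sends SIGTERM when a pod is being terminated.
function installShutdownHandler(nats: DrainableConsumer, pgPool: ClosablePool): void {
  process.once("SIGTERM", () => {
    shutdownDlrProcessor(nats, pgPool)
      .then(() => process.exit(0))
      .catch((err: unknown) => {
        process.stderr.write(JSON.stringify({ event: "shutdown.failed", error: String(err) }) + "\n");
        process.exit(1);
      });
  });
}
```

Putting pgPool.end() in a finally block is a design choice. A stuck drain still releases database connections before the pod is killed.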

Observability

  • All key metrics instrumented
  • Structured log events for all processing paths
  • OTLP trace spans cover full pipeline
  • Grafana dashboard reviewed and approved
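One way to make every processing path emit comparable events is a single helper that writes one JSON object per line, with a fixed event name and correlation fields. This is a minimal sketch. The event names and field names are assumptions, not an agreed schema, and so is the traceId field (included for joining logs to OTLP spans).

```typescript
// Hypothetical event names for the DLR pipeline.
type DlrLogEvent =
  | "dlr.received"
  | "dlr.parsed"
  | "dlr.persisted"
  | "dlr.forwarded"
  | "dlr.failed";

type DlrLogLevel = "info" | "warn" | "error";

interface DlrLogFields {
  messageId: string;
  traceId?: string; // assumed: OTLP trace id, to join log lines with spans
  [key: string]: string | number | boolean | undefined;
}

// Serialises one event as a single JSON line. Fixed keys are written last so
// caller-supplied fields cannot overwrite them; undefined fields are dropped.
function formatDlrLog(level: DlrLogLevel, event: DlrLogEvent, fields: DlrLogFields, now: Date = new Date()): string {
  return JSON.stringify({ ...fields, ts: now.toISOString(), level, event });
}

function logDlr(level: DlrLogLevel, event: DlrLogEvent, fields: DlrLogFields): void {
  process.stdout.write(formatDlrLog(level, event, fields) + "\n");
}
```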

Operations

  • Runbooks written for all FM-DLR-* failure modes
  • Alert rules reviewed with SRE team
  • On-call rotation informed of new service
  • Deployment documented and tested in staging
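Runbook coverage can be checked mechanically if the failure-mode catalogue and the runbook index are both machine-readable. This sketch assumes a hypothetical list of FM-DLR-* identifiers and a map from identifier to runbook location. This document defines neither format, so the identifiers used in any example are made up.

```typescript
interface RunbookCoverage {
  missing: string[];  // failure modes with no (or an empty) runbook entry
  orphaned: string[]; // runbook entries for failure modes no longer listed
}

// Hypothetical check: compares FM-DLR-* failure modes against a runbook index.
function checkRunbookCoverage(
  failureModes: readonly string[],
  runbooks: ReadonlyMap<string, string>,
): RunbookCoverage {
  const modes = new Set(failureModes.filter((id) => id.startsWith("FM-DLR-")));
  const missing = Array.from(modes).filter((id) => (runbooks.get(id) ?? "").trim() === "");
  const orphaned = Array.from(runbooks.keys()).filter(
    (id) => id.startsWith("FM-DLR-") && !modes.has(id),
  );
  return { missing: missing.sort(), orphaned: orphaned.sort() };
}
```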

Security

  • Security review completed
  • All secrets in Vault; none in source code or environment at build time
  • NetworkPolicy applied and verified
  • Pod Security Admission "restricted" profile applied
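Keeping secrets out of source code and the build environment means the process reads them at runtime. This sketch assumes the Vault Agent Injector renders each secret to a file under /vault/secrets/, which is its default location. The file name pg-password and the helper are hypothetical. The reader function is injectable, so the logic can be exercised without Vault.

```typescript
import { readFileSync } from "node:fs";

// Reads one secret at runtime from a file rendered by Vault Agent, e.g. the
// hypothetical /vault/secrets/pg-password. `read` is injectable for tests.
function readRuntimeSecret(
  path: string,
  read: (p: string) => string = (p) => readFileSync(p, "utf8"),
): string {
  let raw: string;
  try {
    raw = read(path);
  } catch {
    // Fail fast at startup; error messages name the path, never the value.
    throw new Error(`secret file not readable: ${path}`);
  }
  const value = raw.trim();
  if (value === "") {
    throw new Error(`secret file is empty: ${path}`);
  }
  return value;
}
```

At startup the service would call readRuntimeSecret("/vault/secrets/pg-password") before creating the PG pool. A missing or empty file then stops the pod instead of producing connection errors later.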

4. Launch Phases

Phase | Criteria                             | Rollout
------|--------------------------------------|-----------------------
Alpha | Internal test traffic only           | 1 replica, staging
Beta  | Canary: 5% of production DLR traffic | 2 replicas, production
GA    | Full production traffic              | 3 replicas + HPA
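This page does not say how the Beta phase's 5% is selected. If the split is made in the application rather than by the platform, one option is deterministic hash-based sampling on the message id. With that approach, a redelivered DLR always takes the same path. This sketch uses 32-bit FNV-1a. The function names and the choice of messageId as the key are assumptions.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the key.
function fnv1a32(key: string): number {
  const bytes = new TextEncoder().encode(key);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, mod 2^32
  }
  return hash >>> 0;
}

// Hypothetical router: sends roughly `percent`% of message ids to the canary.
// The same id always gets the same answer.
function isCanary(messageId: string, percent = 5): boolean {
  return fnv1a32(messageId) % 100 < percent;
}
```

Hashing rather than calling Math.random() keeps the decision stable per message, and raising percent only ever adds ids to the canary set.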