DLR Processor — Migration Plan

Status: populated
Owner: Platform Engineering
Last updated: 2026-04-18
Companion: DATA_MODEL · DEPLOYMENT_TOPOLOGY

1. Initial Bootstrap Migration (Phase 0 — Greenfield)

Since the platform is new, there is no legacy system to migrate from. The migration plan covers schema initialisation and the procedure for deploying the service to a new environment.

Flyway Migration Sequence

Version  File                                        Description
V1       V1__create_dlr_schema.sql                   Create dlr schema
V2       V2__create_delivery_receipts.sql            dlr.delivery_receipts table + indexes
V3       V3__create_orphaned_receipts.sql            dlr.orphaned_receipts table + indexes
V4       V4__add_dlr_status_check.sql                CHECK constraint on dlr_status enum values
V5       V5__add_outbox_table.sql                    dlr.outbox table for transactional outbox relay
V6       V6__add_partitioning_delivery_receipts.sql  Convert delivery_receipts to monthly partitioned table
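
To confirm that the sequence above was applied in order, Flyway's history table can be queried directly. This is a minimal sketch: `flyway_schema_history` is Flyway's default table name, but its placement in the `dlr` schema is an assumption that depends on how the `schemas` / `defaultSchema` settings are configured for this service.

```sql
-- Assumption: the history table lives in the dlr schema; adjust the
-- schema qualifier if Flyway is configured with a different default schema.
SELECT installed_rank,
       version,
       description,
       script,
       success,
       installed_on
FROM dlr.flyway_schema_history
WHERE version IS NOT NULL          -- skip non-versioned rows such as schema-creation markers
ORDER BY installed_rank;
```

A complete bootstrap should show versions 1 through 6, each with `success = true`.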

2. orch.sms_messages Column Addition

The DLR Processor requires new columns on the orchestrator-owned table. These are applied by the sms-orchestrator Flyway migrations, not this service.

Columns required (coordinate with sms-orchestrator team):

  • dlr_status VARCHAR(16) — nullable initially
  • dlr_received_at TIMESTAMPTZ — nullable
  • operator_message_id VARCHAR(64) — needed for correlation (index required)
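
The column list above could translate into a migration along the following lines in the sms-orchestrator repository. This is a sketch only: the index name `idx_sms_messages_operator_message_id` is hypothetical, and the orchestrator team owns the actual migration file and its version number.

```sql
-- Sketch of the orchestrator-side migration (PostgreSQL).
-- All three columns are added as nullable, so existing rows need no backfill.
ALTER TABLE orch.sms_messages
    ADD COLUMN IF NOT EXISTS dlr_status          VARCHAR(16),
    ADD COLUMN IF NOT EXISTS dlr_received_at     TIMESTAMPTZ,
    ADD COLUMN IF NOT EXISTS operator_message_id VARCHAR(64);

-- Index used to correlate incoming receipts with outbound messages.
-- The index name is hypothetical.
CREATE INDEX IF NOT EXISTS idx_sms_messages_operator_message_id
    ON orch.sms_messages (operator_message_id);
```

On a large, live table a plain `CREATE INDEX` blocks writes while it builds. `CREATE INDEX CONCURRENTLY` avoids that but cannot run inside a transaction block, so it would need a non-transactional migration.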

3. Deployment Checklist (New Environment)

  • PostgreSQL dlr schema created with correct owner (dlr_svc)
  • Flyway migrations applied (V1–V6)
  • orch.sms_messages columns confirmed present
  • NATS stream SMS_DLR created with correct retention policy
  • NATS durable consumer dlr-processor provisioned
  • Kubernetes Secrets created (DATABASE_URL, NATS TLS certs)
  • ConfigMap applied
  • Deployment scaled to 3 replicas
  • HPA applied
  • Prometheus scrape target configured
  • Grafana dashboard imported
  • Alerting rules applied
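
The first and third checklist items can be verified from a SQL session with catalog queries. The sketch below uses only standard PostgreSQL catalogs and assumes the connecting role has permission to see the `orch` schema.

```sql
-- 1. Confirm the dlr schema exists and is owned by dlr_svc.
SELECT nspname                     AS schema_name,
       pg_get_userbyid(nspowner)   AS owner
FROM pg_namespace
WHERE nspname = 'dlr';

-- 2. Confirm the three orchestrator columns are present.
--    Three rows are expected; VARCHAR is reported as 'character varying'
--    and TIMESTAMPTZ as 'timestamp with time zone'.
SELECT column_name,
       data_type,
       character_maximum_length,
       is_nullable
FROM information_schema.columns
WHERE table_schema = 'orch'
  AND table_name   = 'sms_messages'
  AND column_name IN ('dlr_status', 'dlr_received_at', 'operator_message_id')
ORDER BY column_name;
```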

4. Rollback Procedure

If a migration or deployment must be rolled back:

  1. Scale down dlr-processor to 0 replicas.
  2. Restore previous Docker image tag in Deployment.
  3. If schema migration must be reversed: apply down-migration script (maintained alongside each Flyway version).
  4. Scale back to 3 replicas.
  5. Verify /ready returns 200 on all pods.
  6. Confirm dlr_nats_consumer_status gauge = 1.

5. Data Backfill (Orphan Resolution)

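After a significant schema fix (e.g., adding an operator_message_id column that was missing from orch.sms_messages), run the reconciliation job to reprocess dlr.orphaned_receipts. For each orphan it manages to match, the job marks the row as resolved: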

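-- Mark an orphan as resolved once its receipt has been matched.
-- $receipt_id and $orphan_id are placeholders bound per row by the job.
-- The resolved_at IS NULL guard keeps already-resolved rows untouched on re-runs.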
UPDATE dlr.orphaned_receipts
SET resolved_at = now(), resolved_receipt_id = $receipt_id
WHERE orphan_id = $orphan_id AND resolved_at IS NULL;

The reconciliation job is a separate one-shot Kubernetes Job triggered manually post-fix.
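
Before and after triggering the job, a quick count of resolved versus unresolved orphans shows how much work remains. This sketch relies only on the `resolved_at` column used in the statement above.

```sql
-- Progress check for the reconciliation job.
SELECT count(*) FILTER (WHERE resolved_at IS NULL)     AS unresolved,
       count(*) FILTER (WHERE resolved_at IS NOT NULL) AS resolved
FROM dlr.orphaned_receipts;
```

If `unresolved` stays above zero after the job completes, the remaining rows likely still lack a matching message and need separate investigation.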