sender-id-registry-service — Deployment Topology
Version: 1.0
Status: Draft
Owner: Trust & Safety + Platform SRE
Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · SECURITY_MODEL · ADR-0004 §6
1. Runtime
- Language: Node.js 20 LTS
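- Framework: NestJS 10 with the Fastify adapter for the HTTP listener (lower per-request overhead than Express); the gRPC listener runs on the NestJS gRPC transport, which the HTTP adapter does not affect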
- Container base: node:20-alpine with non-root user, distroless final stage
- Image: ghcr.io/ghasi/sender-id-registry-service:{git-sha}
- Node pool: np-ctrl (control-plane, general-purpose) per ADR-0004 §6; sender-ID is control-plane data, not data-plane traffic
- Region affinity: Active in both kbl and mzr (multi-master per ADR-0004 §5)
2. Kubernetes Resources
2.1 Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sender-id-registry-service
  namespace: sms-platform
spec:
  replicas: 3
  selector:
    matchLabels: { app: sender-id-registry-service }
  template:
    metadata:
      labels: { app: sender-id-registry-service }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3091"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: sender-id-registry-service
      nodeSelector: { ghasi-node-pool: np-ctrl }
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels: { app: sender-id-registry-service }
                topologyKey: kubernetes.io/hostname
      containers:
        - name: sender-id-registry-service
          # {git-sha} is a placeholder: pin the tag from section 1, not :latest
          image: ghcr.io/ghasi/sender-id-registry-service:{git-sha}
          ports:
            - containerPort: 50091 # gRPC
              name: grpc
            - containerPort: 3091 # HTTP (REST + metrics + health)
              name: http
          env:
            - name: NODE_ENV
              value: production
            - name: GRPC_PORT
              value: "50091"
            - name: HTTP_PORT
              value: "3091"
            - name: REGION
              valueFrom:
                fieldRef:
                  # The Downward API exposes the pod's own labels, not node labels,
                  # so the pod must carry this label for REGION to be populated.
                  fieldPath: "metadata.labels['topology.kubernetes.io/region']"
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: sid-db-secret, key: url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: sid-redis-secret, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - name: NATS_CREDS_PATH
              value: /etc/nats/nats.creds
            # $(REGION) is expanded by Kubernetes from the REGION variable defined above
            - name: S3_KYC_BUCKET
              value: "ghasi-sid-kyc-$(REGION)"
            - name: S3_REGULATOR_EXPORT_BUCKET
              value: "ghasi-sid-regulator-export-$(REGION)"
            - name: VAULT_ADDR
              value: https://vault.sms-platform.svc.cluster.local:8200
            - name: VAULT_TRANSIT_PATH_PREFIX
              value: transit/ghasi-sid-kyc
            - name: HSM_PKCS11_LIB
              value: /opt/hsm/libpkcs11.so
            - name: HSM_REGULATOR_EXPORT_KEY_SLOT
              valueFrom: { secretKeyRef: { name: sid-hsm-config, key: regulator_export_slot } }
            - name: AI_LLM_URL
              value: http://local-llm-service.sms-platform.svc.cluster.local:8000
            - name: AI_OCR_URL
              value: http://sid-kyc-ocr.sms-platform.svc.cluster.local:8001
            - name: ATRA_SFTP_HOST
              valueFrom: { secretKeyRef: { name: sid-regulator-targets, key: atra_sftp_host } }
            - name: KYC_DOC_MAX_BYTES
              value: "26214400"
            - name: VERIFY_CACHE_TTL_SECONDS
              value: "300"
            - name: REPUTATION_CRON
              value: "30 0 * * *"
            - name: REGULATOR_EXPORT_CRON
              value: "0 4 * * *"
            - name: LOG_LEVEL
              value: info
          envFrom:
            - secretRef: { name: sid-vault-secrets }
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits: { cpu: 2000m, memory: 1Gi }
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: tls-certs
              mountPath: /etc/tls
              readOnly: true
            - name: nats-creds
              mountPath: /etc/nats
              readOnly: true
            - name: atra-sftp-key
              mountPath: /etc/atra-sftp
              readOnly: true
      volumes:
        - name: tls-certs
          secret: { secretName: sid-tls }
        - name: nats-creds
          secret: { secretName: nats-credentials }
        - name: atra-sftp-key
          secret: { secretName: sid-atra-sftp-key }
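The readinessProbe in this manifest calls /health/ready, and section 7 states that the endpoint checks Postgres connectivity, Redis connectivity, and Vault token validity. The sketch below shows one way the three results could be combined into a probe response. The check names and the `ReadinessReport` shape are assumptions for illustration, not the service's actual types.

```typescript
// Hypothetical names for the three dependency checks listed in section 7.
const REQUIRED_CHECKS = ["postgres", "redis", "vault-token"] as const;
type CheckName = (typeof REQUIRED_CHECKS)[number];

interface ReadinessReport {
  httpStatus: 200 | 503;
  failed: CheckName[];
}

// A pod is ready only when every required check reported true.
// A check that is missing from the input counts as failed.
function summarizeReadiness(
  results: Partial<Record<CheckName, boolean>>,
): ReadinessReport {
  const failed = REQUIRED_CHECKS.filter((name) => results[name] !== true);
  return { httpStatus: failed.length === 0 ? 200 : 503, failed };
}
```

With periodSeconds: 5 and failureThreshold: 3, a pod that keeps returning 503 is marked unready after three consecutive failed probes, roughly 15 s.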
2.2 HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sender-id-registry-service-hpa
  namespace: sms-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sender-id-registry-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 65 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 75 }
    - type: Pods
      pods:
        metric: { name: sid_verify_duration_seconds_p95 }
        target: { type: AverageValue, averageValue: "0.005" } # scale up if P95 ≥ 5 ms
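To make the interaction of the three metrics concrete, the sketch below applies the replica formula documented for the Kubernetes HPA: each metric proposes ceil(currentReplicas × current / target), the largest proposal wins, and the result is clamped to minReplicas and maxReplicas. It is a simplification that ignores stabilization windows and unready pods. The function name and the `MetricReading` type are illustrative, and the 0.1 tolerance is the Kubernetes default rather than a value taken from this manifest.

```typescript
interface MetricReading {
  name: string;
  current: number; // observed per-pod average (percent, or seconds for P95)
  target: number;  // target from the HPA spec, in the same unit
}

function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas = 3,
  maxReplicas = 10,
  tolerance = 0.1, // ratios within 1 ± tolerance cause no change
): number {
  const proposals = metrics.map(({ current, target }) => {
    const ratio = current / target;
    if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // With several metrics, the HPA follows the largest proposal.
  const wanted = proposals.length > 0 ? Math.max(...proposals) : currentReplicas;
  return Math.min(maxReplicas, Math.max(minReplicas, wanted));
}
```

For example, at 3 replicas with CPU at 90% against the 65% target, the CPU metric alone proposes ceil(3 × 90 / 65) = 5 replicas, even if memory and P95 latency are below their targets.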
2.3 PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sender-id-registry-service-pdb
  namespace: sms-platform
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: sender-id-registry-service }
2.4 Services
apiVersion: v1
kind: Service
metadata:
  name: sender-id-registry-service-grpc
  namespace: sms-platform
spec:
  selector: { app: sender-id-registry-service }
  ports:
    - name: grpc
      port: 50091
      targetPort: grpc
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: sender-id-registry-service-http
  namespace: sms-platform
spec:
  selector: { app: sender-id-registry-service }
  ports:
    - name: http
      port: 3091
      targetPort: http
  type: ClusterIP
2.5 NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sender-id-registry-service-netpol
  namespace: sms-platform
spec:
  podSelector:
    matchLabels: { app: sender-id-registry-service }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector: { matchLabels: { app: compliance-engine } }
        - podSelector: { matchLabels: { app: routing-engine } }
        - podSelector: { matchLabels: { app: sms-firewall-service } }
        - podSelector: { matchLabels: { app: channel-router-service } }
      ports: [{ port: 50091 }]
    - from:
        - podSelector: { matchLabels: { app: kong } }
        - podSelector: { matchLabels: { app: admin-dashboard } }
        - podSelector: { matchLabels: { app: customer-portal } }
        - podSelector: { matchLabels: { app: regulator-portal-service } }
      ports: [{ port: 3091 }]
    - from:
        - namespaceSelector: { matchLabels: { name: monitoring } }
      ports: [{ port: 3091 }]
  egress:
    - to: [{ podSelector: { matchLabels: { app: postgresql } } }]
      ports: [{ port: 5432 }]
    - to: [{ podSelector: { matchLabels: { app: redis } } }]
      ports: [{ port: 6379 }]
    - to: [{ podSelector: { matchLabels: { app: nats } } }]
      ports: [{ port: 4222 }]
    - to: [{ podSelector: { matchLabels: { app: minio } } }]
      ports: [{ port: 9000 }]
    - to: [{ podSelector: { matchLabels: { app: vault } } }]
      ports: [{ port: 8200 }]
    - to: [{ podSelector: { matchLabels: { app: local-llm } } }]
      ports: [{ port: 8000 }]
    - to: [{ podSelector: { matchLabels: { app: sid-kyc-ocr } } }]
      ports: [{ port: 8001 }]
    - to: [{ podSelector: { matchLabels: { app: channel-router-service } } }]
      ports: [{ port: 50061 }]
    # External egress: ATRA SFTP + DNS resolvers
    # Cluster DNS (port 53 to kube-dns) is not allowed by any rule in this policy;
    # service-name resolution needs it granted here or by a separate policy.
    - to:
        - ipBlock: { cidr: 0.0.0.0/0, except: [10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16] }
      ports:
        - { port: 22 } # SFTP to ATRA
        - { port: 853 } # DoT to 1.1.1.1, 8.8.8.8
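The external egress rule combines an allow-all CIDR with three private-range exceptions and two ports. The sketch below evaluates that single rule for an IPv4 destination, which can help when reasoning about whether a given ATRA or resolver address would be reachable. It is an illustration of the rule's semantics only, not of how a CNI plugin enforces it, and the helper names are hypothetical. The other egress rules in the policy are additive and are not modelled here.

```typescript
// Values copied from the external egress rule in the policy above.
const EXTERNAL_RULE = {
  cidr: "0.0.0.0/0",
  except: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
  ports: [22, 853],
};

// Convert dotted-quad IPv4 text to an unsigned integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when ip lies inside the CIDR block (IPv4 only).
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = cidr.slice(0, slash);
  const bits = Number(cidr.slice(slash + 1));
  const size = 2 ** (32 - bits);
  const start = Math.floor(ipv4ToInt(base) / size) * size;
  const value = ipv4ToInt(ip);
  return value >= start && value < start + size;
}

// Allowed when the port is listed, the address is in cidr, and it is in no except block.
function externalEgressAllowed(ip: string, port: number): boolean {
  if (!EXTERNAL_RULE.ports.some((p) => p === port)) return false;
  if (!inCidr(ip, EXTERNAL_RULE.cidr)) return false;
  return !EXTERNAL_RULE.except.some((blocked) => inCidr(ip, blocked));
}
```

Under this rule a private address such as 10.2.3.4 is rejected on port 22, so an SFTP target reachable only over a private link would need its own rule.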
3. Background Workers
Workers run as a separate Deployment sender-id-registry-workers to allow independent scaling. Each worker uses Redis distributed locks (SET NX EX) for multi-replica safety.
| Worker | Schedule | Description |
|---|---|---|
| DnsVerificationPoller | every 30 min | Re-checks IN_PROGRESS DNS verifications; expires after 24 h (UC-05) |
| OtpDispatcher | event-driven | Dispatches OTP via channel-router on Verification.created (UC-04) |
| ReputationDailyCron | daily 00:30 UTC | Full reputation recompute (UC-15) |
| ReputationIntradayWorker | event-driven | Consumes fraud.detected.*, compliance.message.blocked.v1 (UC-15) |
| RegulatorExportCron | daily 04:00 UTC | Generate + sign + SFTP export (UC-14) |
| KycHashTamperCheck | daily 02:00 UTC | Compare stored SHA-256 of every KYC blob to S3 ETag-extended attribute |
| OutboxRelay | continuous (200 ms poll) | Drain outbox to NATS |
| PartitionMaintenance | daily 03:00 UTC | Provision next 3 monthly partitions for audit + reputation_history |
| KycViewArtefactCleanup | hourly | Lifecycle-purge watermarked KYC artefacts > 1 h old |
| RestrictedPatternReloader | every 5 min | Refresh in-process compiled pattern set if version changed |
| NotaryWhitelistReloader | every 5 min | Refresh in-process notary whitelist |
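The SET NX EX locking mentioned at the top of this section can be sketched as follows. To keep the example runnable without a Redis server, it uses a small synchronous in-memory stand-in (`FakeRedis`) that mirrors the two operations the pattern needs. Real client calls would be asynchronous, and the key layout, class, and function names here are assumptions for illustration.

```typescript
// In-memory stand-in for the two Redis operations the lock pattern relies on.
class FakeRedis {
  private entries = new Map<string, { value: string; expiresAtMs: number }>();

  // Mirrors `SET key value NX EX ttl`: succeeds only if the key is absent or expired.
  setNxEx(key: string, value: string, ttlSeconds: number, nowMs: number): boolean {
    const existing = this.entries.get(key);
    if (existing !== undefined && existing.expiresAtMs > nowMs) return false;
    this.entries.set(key, { value, expiresAtMs: nowMs + ttlSeconds * 1000 });
    return true;
  }

  // Compare-and-delete; against real Redis this is usually a short Lua script.
  deleteIfValueMatches(key: string, value: string, nowMs: number): boolean {
    const existing = this.entries.get(key);
    if (existing === undefined || existing.expiresAtMs <= nowMs) return false;
    if (existing.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

// Runs job only if this replica wins the lock; returns whether it ran.
function runExclusive(
  redis: FakeRedis,
  worker: string,
  replicaToken: string, // unique per replica, so a replica never frees another's lock
  ttlSeconds: number,
  nowMs: number,
  job: () => void,
): boolean {
  const key = `sid:lock:${worker}`; // hypothetical key layout
  if (!redis.setNxEx(key, replicaToken, ttlSeconds, nowMs)) return false;
  try {
    job();
  } finally {
    redis.deleteIfValueMatches(key, replicaToken, nowMs);
  }
  return true;
}
```

The TTL bounds how long a crashed replica can block the others, so it should exceed the worker's expected run time. The token check on release prevents a slow replica from deleting a lock that has already expired and been taken by another replica.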
4. Infrastructure Dependencies
| Dependency | Version | Topology |
|---|---|---|
| PostgreSQL | 15+ | Patroni cluster per region; logical replication kbl ↔ mzr for sender_id_registry schema (per ADR-0004 §14) |
| Redis | 7.0+ | Cluster mode; sender-id-registry uses DB 5 (Redis Cluster supports only DB 0, so a non-zero DB index requires a non-clustered endpoint) |
| NATS JetStream | 2.10+ | Super-cluster kbl ↔ mzr; SENDER_ID_EVENTS mirrored; leaf-mirror to dxb |
| Object storage (MinIO / S3-compat) | latest | ghasi-sid-kyc-{region} buckets with per-tenant encryption + WORM retention; ghasi-sid-regulator-export-{region} WORM bucket |
| Vault | 1.15+ | Transit + KV; PKI for mTLS |
| HSM (PKCS#11, FIPS 140-2 L3) | per ADR-0004 §11 | Per-region HSM appliance pair; regulator-export key in dedicated slot |
| local-llm (vLLM) | shared with compliance-engine | per AI_INTEGRATION |
| sid-kyc-ocr sidecar deployment | PaddleOCR + GPU (optional) | per AI_INTEGRATION |
| channel-router-service | latest | OTP delivery on lane P1 |
| ATRA SFTP endpoint | external | Configured per environment |
5. Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| NODE_ENV | Yes | - | production / staging / development |
| REGION | Yes | - | kbl / mzr / dxb |
| GRPC_PORT | No | 50091 | gRPC listener |
| HTTP_PORT | No | 3091 | HTTP listener |
| DATABASE_URL | Yes | - | Postgres connection string (schema search path includes sender_id_registry) |
| REDIS_URL | Yes | - | Redis URL with DB index /5 |
| NATS_URL | Yes | - | NATS server URL |
| NATS_CREDS_PATH | Yes | - | Path to NATS credentials |
| S3_KYC_BUCKET | Yes | - | KYC bucket name (region-suffixed) |
| S3_REGULATOR_EXPORT_BUCKET | Yes | - | Export bucket name (region-suffixed) |
| VAULT_ADDR | Yes | - | Vault address |
| VAULT_TRANSIT_PATH_PREFIX | Yes | transit/ghasi-sid-kyc | Per-tenant KEK prefix |
| HSM_PKCS11_LIB | Yes (prod) | - | PKCS#11 provider library |
| HSM_REGULATOR_EXPORT_KEY_SLOT | Yes (prod) | - | HSM slot id holding signing key |
| AI_LLM_URL | No | - | local-llm endpoint (AI assist) |
| AI_OCR_URL | No | - | OCR sidecar endpoint |
| ATRA_SFTP_HOST | Yes (prod) | - | SFTP host for regulator export |
| KYC_DOC_MAX_BYTES | No | 26214400 | 25 MB cap |
| VERIFY_CACHE_TTL_SECONDS | No | 300 | Verify cache TTL |
| REPUTATION_CRON | No | 30 0 * * * | Daily cron schedule |
| REGULATOR_EXPORT_CRON | No | 0 4 * * * | Daily export schedule |
| OTP_LENGTH | No | 6 | OTP digit count |
| OTP_TTL_SECONDS | No | 300 | OTP TTL |
| OTP_RATE_LIMIT_PER_HOUR | No | 3 | Per registrant MSISDN |
| DNS_RESOLVER_PRIMARY | No | 1.1.1.1 | DoT primary |
| DNS_RESOLVER_SECONDARY | No | 8.8.8.8 | DoT secondary |
| GRPC_TLS_ENABLED | No | true | mTLS toggle (must be true outside dev) |
| LOG_LEVEL | No | info | debug / info / warn / error |
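A startup-time loader can enforce the Required and Default columns of this table before the service accepts traffic. The sketch below covers a subset of the variables. The `SidConfig` shape and helper names are hypothetical, and the environment is passed in as a plain object so the function stays pure (in the service it would typically receive `process.env`).

```typescript
type Env = Record<string, string | undefined>;

const REGIONS = ["kbl", "mzr", "dxb"] as const;
type Region = (typeof REGIONS)[number];

// Hypothetical config shape covering a subset of the table above.
interface SidConfig {
  nodeEnv: string;
  region: Region;
  databaseUrl: string;
  grpcPort: number;
  httpPort: number;
  kycDocMaxBytes: number;
  verifyCacheTtlSeconds: number;
  grpcTlsEnabled: boolean;
}

function isRegion(value: string): value is Region {
  return REGIONS.some((r) => r === value);
}

function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") throw new Error(`missing required variable ${name}`);
  return value;
}

function intOrDefault(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): SidConfig {
  const nodeEnv = required(env, "NODE_ENV");
  const region = required(env, "REGION");
  if (!isRegion(region)) throw new Error(`REGION must be one of ${REGIONS.join(", ")}`);
  const grpcTlsEnabled = (env["GRPC_TLS_ENABLED"] ?? "true") === "true";
  // The table requires mTLS everywhere except development.
  if (!grpcTlsEnabled && nodeEnv !== "development") {
    throw new Error("GRPC_TLS_ENABLED must be true outside development");
  }
  return {
    nodeEnv,
    region,
    databaseUrl: required(env, "DATABASE_URL"),
    grpcPort: intOrDefault(env, "GRPC_PORT", 50091),
    httpPort: intOrDefault(env, "HTTP_PORT", 3091),
    kycDocMaxBytes: intOrDefault(env, "KYC_DOC_MAX_BYTES", 26214400),
    verifyCacheTtlSeconds: intOrDefault(env, "VERIFY_CACHE_TTL_SECONDS", 300),
    grpcTlsEnabled,
  };
}
```

Failing fast on a missing or malformed variable surfaces misconfiguration as a crash at startup rather than as a runtime error on the first request.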
6. Deployment Environments
| Environment | Replicas | HSM | LLM | OCR | ATRA SFTP |
|---|---|---|---|---|---|
| Production (kbl, mzr) | 3–10 (HPA) | Real PKCS#11 | local-llm shared | sid-kyc-ocr | Real ATRA |
| Staging | 2 | Soft-HSM (SoftHSM2) | local-llm shared (smaller) | sid-kyc-ocr | mock-regulator-sftp |
| Development | 1 | Soft-HSM (SoftHSM2) | mock | mock | mock-regulator-sftp |
| CI | 1 | Soft-HSM | mock | mock | mock |
7. Rollout Strategy
- Strategy: RollingUpdate with maxUnavailable: 0, maxSurge: 1.
- Pre-stop hook: drain in-flight gRPC connections (30 s grace).
- Health gating: the readiness probe checks Postgres connectivity, Redis connectivity, and Vault token validity. If any check fails, the pod is removed from the Service endpoints and receives no traffic.
- Canary: new image deployed to 1 pod first; promote to full only if the error ratio, sid_verify_requests_total{status="error"} divided by all sid_verify_requests_total, stays below 0.01% over 10 min.
- Rollback: kubectl rollout undo returns to the previous image; the cache is populated lazily, so no warm-up is needed.
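The canary promotion rule above reduces to comparing an error ratio over the observation window with the 0.01% threshold. A minimal sketch, assuming the gate receives cumulative counter values sampled at the start and end of the 10-minute window. The `CounterSample` type, the function name, and the `minRequests` guard are assumptions, not part of the documented process.

```typescript
// Cumulative values of sid_verify_requests_total at one instant:
// `total` across all statuses, `errors` for status="error".
interface CounterSample {
  total: number;
  errors: number;
}

function canaryMayPromote(
  start: CounterSample,
  end: CounterSample,
  maxErrorRatio = 0.0001, // 0.01%
  minRequests = 1,        // hypothetical guard: no traffic means no evidence
): boolean {
  const requests = end.total - start.total;
  const errors = end.errors - start.errors;
  // A negative delta means the counter reset (pod restart); do not promote on it.
  if (requests < 0 || errors < 0) return false;
  if (requests < minRequests) return false;
  return errors / requests < maxErrorRatio;
}
```

Because the threshold is so low, the canary pod needs more than 10,000 requests in the window before a single error can be tolerated, so a realistic `minRequests` would likely be set well above 1.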