
sender-id-registry-service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Trust & Safety + Platform SRE
Last Updated: 2026-04-21
Companion: SERVICE_OVERVIEW · SECURITY_MODEL · ADR-0004 §6


1. Runtime

  • Language: Node.js 20 LTS
  • Framework: NestJS 10 with Fastify adapter (lower per-request overhead than Express on the HTTP listener; the adapter does not affect the gRPC listener)
  • Container base: node:20-alpine build stage; distroless final stage running as a non-root user
  • Image: ghcr.io/ghasi/sender-id-registry-service:{git-sha}
  • Node pool: np-ctrl (control-plane, general-purpose) per ADR-0004 §6 — sender-ID is control-plane data, not data-plane traffic
  • Region affinity: Active in both kbl and mzr (multi-master per ADR-0004 §5)

2. Kubernetes Resources

2.1 Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sender-id-registry-service
  namespace: sms-platform
spec:
  replicas: 3
  selector:
    matchLabels: { app: sender-id-registry-service }
  template:
    metadata:
      labels: { app: sender-id-registry-service }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3091"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: sender-id-registry-service
      nodeSelector: { ghasi-node-pool: np-ctrl }
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels: { app: sender-id-registry-service }
                topologyKey: kubernetes.io/hostname
      containers:
        - name: sender-id-registry-service
          # Immutable per-commit tag, matching the image reference in §1 (not :latest)
          image: ghcr.io/ghasi/sender-id-registry-service:{git-sha}
          ports:
            - containerPort: 50091 # gRPC
              name: grpc
            - containerPort: 3091 # HTTP (REST + metrics + health)
              name: http
          env:
            - name: NODE_ENV
              value: production
            - name: GRPC_PORT
              value: "50091"
            - name: HTTP_PORT
              value: "3091"
            - name: REGION
              # fieldRef reads the pod's own labels, not node labels, so the pod
              # must carry this label. Block style is required here because
              # [ and ] are not allowed in plain scalars inside a flow mapping.
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['topology.kubernetes.io/region']
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: sid-db-secret, key: url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: sid-redis-secret, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - name: NATS_CREDS_PATH
              value: /etc/nats/nats.creds
            # $(REGION) is expanded by Kubernetes from the REGION variable defined above
            - name: S3_KYC_BUCKET
              value: "ghasi-sid-kyc-$(REGION)"
            - name: S3_REGULATOR_EXPORT_BUCKET
              value: "ghasi-sid-regulator-export-$(REGION)"
            - name: VAULT_ADDR
              value: https://vault.sms-platform.svc.cluster.local:8200
            - name: VAULT_TRANSIT_PATH_PREFIX
              value: transit/ghasi-sid-kyc
            - name: HSM_PKCS11_LIB
              value: /opt/hsm/libpkcs11.so
            - name: HSM_REGULATOR_EXPORT_KEY_SLOT
              valueFrom: { secretKeyRef: { name: sid-hsm-config, key: regulator_export_slot } }
            - name: AI_LLM_URL
              value: http://local-llm-service.sms-platform.svc.cluster.local:8000
            - name: AI_OCR_URL
              value: http://sid-kyc-ocr.sms-platform.svc.cluster.local:8001
            - name: ATRA_SFTP_HOST
              valueFrom: { secretKeyRef: { name: sid-regulator-targets, key: atra_sftp_host } }
            - name: KYC_DOC_MAX_BYTES
              value: "26214400"
            - name: VERIFY_CACHE_TTL_SECONDS
              value: "300"
            - name: REPUTATION_CRON
              value: "30 0 * * *"
            - name: REGULATOR_EXPORT_CRON
              value: "0 4 * * *"
            - name: LOG_LEVEL
              value: info
          envFrom:
            - secretRef: { name: sid-vault-secrets }
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits: { cpu: 2000m, memory: 1Gi }
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          volumeMounts:
            - name: tls-certs
              mountPath: /etc/tls
              readOnly: true
            - name: nats-creds
              mountPath: /etc/nats
              readOnly: true
            - name: atra-sftp-key
              mountPath: /etc/atra-sftp
              readOnly: true
      volumes:
        - name: tls-certs
          secret: { secretName: sid-tls }
        - name: nats-creds
          secret: { secretName: nats-credentials }
        - name: atra-sftp-key
          secret: { secretName: sid-atra-sftp-key }

2.2 HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sender-id-registry-service-hpa
  namespace: sms-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sender-id-registry-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 65 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 75 }
    - type: Pods
      pods:
        metric: { name: sid_verify_duration_seconds_p95 }
        target: { type: AverageValue, averageValue: "0.005" } # scale up if P95 ≥ 5 ms
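A Pods-type metric such as sid_verify_duration_seconds_p95 is read from the Kubernetes custom metrics API, so some adapter has to publish it. The fragment below is a minimal sketch of a prometheus-adapter rule that could do so. It rests on assumptions this document does not confirm: that prometheus-adapter is the adapter in use, that the service exports a Prometheus histogram named sid_verify_duration_seconds, that scraped series carry namespace and pod labels, and that a 2-minute rate window is acceptable.

```yaml
# prometheus-adapter rule sketch. Assumed, not confirmed by this document:
# the histogram name, the namespace/pod label names, and the 2m rate window.
rules:
  - seriesQuery: 'sid_verify_duration_seconds_bucket{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }
        pod: { resource: pod }
    name:
      matches: "^sid_verify_duration_seconds_bucket$"
      as: "sid_verify_duration_seconds_p95"   # name the HPA asks for
    metricsQuery: |
      histogram_quantile(0.95,
        sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))
```

With a rule of this shape, the HPA compares the per-pod P95 (in seconds) against the 0.005 target, alongside the CPU and memory targets.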

2.3 PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sender-id-registry-service-pdb
  namespace: sms-platform
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: sender-id-registry-service }

2.4 Services

apiVersion: v1
kind: Service
metadata:
  name: sender-id-registry-service-grpc
  namespace: sms-platform
spec:
  selector: { app: sender-id-registry-service }
  ports:
    - name: grpc
      port: 50091
      targetPort: grpc
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: sender-id-registry-service-http
  namespace: sms-platform
spec:
  selector: { app: sender-id-registry-service }
  ports:
    - name: http
      port: 3091
      targetPort: http
  type: ClusterIP

2.5 NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sender-id-registry-service-netpol
  namespace: sms-platform
spec:
  podSelector:
    matchLabels: { app: sender-id-registry-service }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector: { matchLabels: { app: compliance-engine } }
        - podSelector: { matchLabels: { app: routing-engine } }
        - podSelector: { matchLabels: { app: sms-firewall-service } }
        - podSelector: { matchLabels: { app: channel-router-service } }
      ports: [{ port: 50091 }]
    - from:
        - podSelector: { matchLabels: { app: kong } }
        - podSelector: { matchLabels: { app: admin-dashboard } }
        - podSelector: { matchLabels: { app: customer-portal } }
        - podSelector: { matchLabels: { app: regulator-portal-service } }
      ports: [{ port: 3091 }]
    - from:
        - namespaceSelector: { matchLabels: { name: monitoring } }
      ports: [{ port: 3091 }]
  egress:
    - to: [{ podSelector: { matchLabels: { app: postgresql } } }]
      ports: [{ port: 5432 }]
    - to: [{ podSelector: { matchLabels: { app: redis } } }]
      ports: [{ port: 6379 }]
    - to: [{ podSelector: { matchLabels: { app: nats } } }]
      ports: [{ port: 4222 }]
    - to: [{ podSelector: { matchLabels: { app: minio } } }]
      ports: [{ port: 9000 }]
    - to: [{ podSelector: { matchLabels: { app: vault } } }]
      ports: [{ port: 8200 }]
    - to: [{ podSelector: { matchLabels: { app: local-llm } } }]
      ports: [{ port: 8000 }]
    - to: [{ podSelector: { matchLabels: { app: sid-kyc-ocr } } }]
      ports: [{ port: 8001 }]
    - to: [{ podSelector: { matchLabels: { app: channel-router-service } } }]
      ports: [{ port: 50061 }]
    # External egress: ATRA SFTP + DNS resolvers
    - to:
        - ipBlock: { cidr: 0.0.0.0/0, except: [10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16] }
      ports:
        - { port: 22 } # SFTP to ATRA
        - { port: 853 } # DoT to 1.1.1.1, 8.8.8.8
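The egress list above allows in-cluster destinations by pod label, while the container reaches several of them by service DNS name (for example vault.sms-platform.svc.cluster.local). Because the policy declares the Egress type and the external ipBlock excludes private ranges, lookups against the cluster resolver pass only if a rule permits them. The fragment below is a sketch of one extra egress entry that would do so. The namespace and pod labels are assumptions (common CoreDNS defaults) and should be checked against the target cluster.

```yaml
# Additional entry for spec.egress (sketch).
# Assumptions: cluster DNS runs in kube-system and its pods are labelled
# k8s-app: kube-dns; neither is stated in this document.
- to:
    - namespaceSelector:
        matchLabels: { kubernetes.io/metadata.name: kube-system }
      podSelector:
        matchLabels: { k8s-app: kube-dns }
  ports:
    - { port: 53, protocol: UDP }
    - { port: 53, protocol: TCP }
```

Placing namespaceSelector and podSelector in the same list item means both must match, which limits the rule to DNS pods rather than everything in kube-system.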

3. Background Workers

Workers run as a separate Deployment sender-id-registry-workers to allow independent scaling. Each worker uses Redis distributed locks (SET NX EX) for multi-replica safety.

Worker | Schedule | Description
DnsVerificationPoller | every 30 min | Re-checks IN_PROGRESS DNS verifications; expires after 24 h (UC-05)
OtpDispatcher | event-driven | Dispatches OTP via channel-router on Verification.created (UC-04)
ReputationDailyCron | daily 00:30 UTC | Full reputation recompute (UC-15)
ReputationIntradayWorker | event-driven | Consumes fraud.detected.*, compliance.message.blocked.v1 (UC-15)
RegulatorExportCron | daily 04:00 UTC | Generate + sign + SFTP export (UC-14)
KycHashTamperCheck | daily 02:00 UTC | Compare stored SHA-256 of every KYC blob to S3 ETag-extended attribute
OutboxRelay | continuous (200 ms poll) | Drain outbox to NATS
PartitionMaintenance | daily 03:00 UTC | Provision next 3 monthly partitions for audit + reputation_history
KycViewArtefactCleanup | hourly | Lifecycle-purge watermarked KYC artefacts > 1 h old
RestrictedPatternReloader | every 5 min | Refresh in-process compiled pattern set if version changed
NotaryWhitelistReloader | every 5 min | Refresh in-process notary whitelist
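The separate workers Deployment described at the start of this section could look roughly like the sketch below. Only the name sender-id-registry-workers, the namespace, the node pool, and the image repository come from this document. The replica count, the app label, the entrypoint path, and the resource figures are hypothetical placeholders, and the full environment from §2.1 is omitted for brevity.

```yaml
# Sketch of the workers Deployment. Hypothetical values are marked as such.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sender-id-registry-workers
  namespace: sms-platform
spec:
  replicas: 2   # hypothetical; more than one replica relies on the Redis locks
  selector:
    matchLabels: { app: sender-id-registry-workers }
  template:
    metadata:
      labels: { app: sender-id-registry-workers }   # hypothetical label
    spec:
      nodeSelector: { ghasi-node-pool: np-ctrl }
      containers:
        - name: sender-id-registry-workers
          image: ghcr.io/ghasi/sender-id-registry-service:{git-sha}   # assumes the same image as the API
          args: ["dist/workers/main.js"]   # hypothetical worker entrypoint
          resources:
            requests: { cpu: 250m, memory: 256Mi }   # hypothetical sizing
            limits: { cpu: 1000m, memory: 512Mi }    # hypothetical sizing
```

Using a distinct app label keeps worker pods out of the API Services in §2.4 and lets the worker replica count change without touching the HPA in §2.2. It also means the NetworkPolicy in §2.5 would not select these pods, so they would need their own policy.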

4. Infrastructure Dependencies

Dependency | Version | Topology
PostgreSQL | 15+ | Patroni cluster per region; logical replication kbl ↔ mzr for sender_id_registry schema (per ADR-0004 §14)
Redis | 7.0+ | Cluster mode; sender-id-registry uses DB 5
NATS JetStream | 2.10+ | Super-cluster kbl ↔ mzr; SENDER_ID_EVENTS mirrored; leaf-mirror to dxb
Object storage (MinIO / S3-compat) | latest | ghasi-sid-kyc-{region} buckets with per-tenant encryption + WORM retention; ghasi-sid-regulator-export-{region} WORM bucket
Vault | 1.15+ | Transit + KV; PKI for mTLS
HSM (PKCS#11, FIPS 140-2 L3) | per ADR-0004 §11 | Per-region HSM appliance pair; regulator-export key in dedicated slot
local-llm (vLLM) | shared with compliance-engine | per AI_INTEGRATION
sid-kyc-ocr sidecar deployment | PaddleOCR + GPU (optional) | per AI_INTEGRATION
channel-router-service | latest | OTP delivery on lane P1
ATRA SFTP endpoint | external | Configured per environment

5. Environment Variables

Variable | Required | Default | Description
NODE_ENV | Yes | - | production / staging / development
REGION | Yes | - | kbl / mzr / dxb
GRPC_PORT | No | 50091 | gRPC listener
HTTP_PORT | No | 3091 | HTTP listener
DATABASE_URL | Yes | - | Postgres connection string (schema search path includes sender_id_registry)
REDIS_URL | Yes | - | Redis URL with DB index /5
NATS_URL | Yes | - | NATS server URL
NATS_CREDS_PATH | Yes | - | Path to NATS credentials
S3_KYC_BUCKET | Yes | - | KYC bucket name (region-suffixed)
S3_REGULATOR_EXPORT_BUCKET | Yes | - | Export bucket name (region-suffixed)
VAULT_ADDR | Yes | - | Vault address
VAULT_TRANSIT_PATH_PREFIX | Yes | transit/ghasi-sid-kyc | Per-tenant KEK prefix
HSM_PKCS11_LIB | Yes (prod) | - | PKCS#11 provider library
HSM_REGULATOR_EXPORT_KEY_SLOT | Yes (prod) | - | HSM slot id holding signing key
AI_LLM_URL | No | - | local-llm endpoint (AI assist)
AI_OCR_URL | No | - | OCR sidecar endpoint
ATRA_SFTP_HOST | Yes (prod) | - | SFTP host for regulator export
KYC_DOC_MAX_BYTES | No | 26214400 | 25 MiB cap (25 × 1024 × 1024 bytes)
VERIFY_CACHE_TTL_SECONDS | No | 300 | Verify cache TTL
REPUTATION_CRON | No | 30 0 * * * | Daily reputation recompute schedule
REGULATOR_EXPORT_CRON | No | 0 4 * * * | Daily export schedule
OTP_LENGTH | No | 6 | OTP digit count
OTP_TTL_SECONDS | No | 300 | OTP TTL
OTP_RATE_LIMIT_PER_HOUR | No | 3 | Per registrant MSISDN
DNS_RESOLVER_PRIMARY | No | 1.1.1.1 | DoT primary
DNS_RESOLVER_SECONDARY | No | 8.8.8.8 | DoT secondary
GRPC_TLS_ENABLED | No | true | mTLS toggle (must be true outside dev)
LOG_LEVEL | No | info | debug / info / warn / error

6. Deployment Environments

Environment | Replicas | HSM | LLM | OCR | ATRA SFTP
Production (kbl, mzr) | 3–10 (HPA) | Real PKCS#11 | local-llm shared | sid-kyc-ocr | Real ATRA
Staging | 2 | Soft-HSM (SoftHSM2) | local-llm shared (smaller) | sid-kyc-ocr | mock-regulator-sftp
Development | 1 | Soft-HSM (SoftHSM2) | mock | mock | mock-regulator-sftp
CI | 1 | Soft-HSM | mock | mock | mock

7. Rollout Strategy

  • Strategy: RollingUpdate with maxUnavailable: 0, maxSurge: 1.
  • Pre-stop hook: drain in-flight gRPC connections (30 s grace).
  • Health gating: the readiness probe checks Postgres connectivity, Redis connectivity, and Vault token validity. If any check fails, the pod is removed from the Service endpoints and receives no traffic until it passes again.
  • Canary: new image deployed to 1 pod first; promote to full only if the error ratio (sid_verify_requests_total{status="error"} as a share of all sid_verify_requests_total) stays below 0.01% over 10 min.
  • Rollback: kubectl rollout undo returns to the previous image; the verify cache is populated lazily, so no warm-up is needed.
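The rolling-update and pre-stop settings in the bullets above could be expressed as the following fragment, merged into the Deployment from §2.1. This is a sketch under stated assumptions. The drain endpoint path is hypothetical, and the 40 s termination grace period (30 s drain plus 10 s headroom) is an illustrative choice, not a value from this document.

```yaml
# Fragment to merge into the Deployment in §2.1 (sketch, not the confirmed manifest).
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove a ready pod before its replacement is ready
      maxSurge: 1         # bring up one extra pod at a time
  template:
    spec:
      # Assumption: 30 s drain plus 10 s headroom before the kubelet sends SIGKILL.
      terminationGracePeriodSeconds: 40
      containers:
        - name: sender-id-registry-service
          lifecycle:
            preStop:
              httpGet:
                path: /internal/drain   # hypothetical endpoint that blocks until gRPC calls finish
                port: http
```

An httpGet hook is used here because a distroless final image typically has no shell or sleep binary for an exec hook. The time spent in the hook counts against terminationGracePeriodSeconds, which is why the grace period is set above the 30 s drain window.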