
Consent Ledger Service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Platform SRE / Trust & Safety
Last Updated: 2026-04-21
Companion: SECURITY_MODEL · OBSERVABILITY · ADR-0004

1. Kubernetes resources

Namespace

sms-platform (consent-ledger-service co-resides with the broader platform). All resources carry the labels tier=control-plane and app=consent-ledger-service, which ADR-0004 §3 uses for region selection.

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: consent-ledger-service
  namespace: sms-platform
  labels:
    app: consent-ledger-service
    tier: control-plane
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  selector:
    matchLabels: { app: consent-ledger-service }
  template:
    metadata:
      labels:
        app: consent-ledger-service
        tier: control-plane
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3071"
        prometheus.io/path: "/metrics"
        sidecar.istio.io/inject: "true"
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "consent-ledger-service"
    spec:
      serviceAccountName: consent-ledger-service-sa
      # Assumed value: must exceed the 30 s preStop sleep below (the Kubernetes default is 30 s).
      terminationGracePeriodSeconds: 45
      affinity:
        podAntiAffinity:
          # Prefer spreading replicas across zones
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels: { app: consent-ledger-service }
                topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          # Hard requirement: af-* regions AND control-plane nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/region
                    operator: In
                    values: [af-kabul-1, af-herat-1, af-mazar-1]
                  - key: workload-class
                    operator: In
                    values: [control-plane]
      containers:
        - name: consent-ledger-service
          image: ghcr.io/ghasi/consent-ledger-service:1.0.0
          ports:
            - { containerPort: 50071, name: grpc }
            - { containerPort: 3071, name: http }
          env:
            - { name: NODE_ENV, value: production }
            - { name: LOG_LEVEL, value: info }
            - { name: GRPC_PORT, value: "50071" }
            - { name: HTTP_PORT, value: "3071" }
            - { name: GRPC_TLS_ENABLED, value: "true" }
            - { name: TLS_CERT_PATH, value: /etc/tls/tls.crt }
            - { name: TLS_KEY_PATH, value: /etc/tls/tls.key }
            - { name: TLS_CA_PATH, value: /etc/tls/ca.crt }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: consent-ledger-db, key: url } }
            - name: DATABASE_REPLICA_URL
              valueFrom: { secretKeyRef: { name: consent-ledger-db, key: replica_url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: consent-ledger-redis, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - name: NATS_CREDS_PATH
              value: /etc/nats/creds
            - name: VAULT_ADDR
              value: https://vault.platform.svc.cluster.local:8200
            - name: VAULT_TRANSIT_KEY_PATH
              value: transit/ghasi-consent-audit-signing
            - name: ATRA_DND_ENDPOINT
              value: sftp://atra-dnd.gov.af/dnd/latest.csv
            - name: ATRA_PGP_FINGERPRINT
              valueFrom: { secretKeyRef: { name: consent-ledger-atra, key: pgp_fingerprint } }
            - name: REGION
              # The downward API reads a pod label here, not the node label. The pod template
              # must carry this label (for example, set per region), otherwise the value is
              # not populated.
              valueFrom:
                fieldRef:
                  fieldPath: "metadata.labels['topology.kubernetes.io/region']"
            - name: PEPPER_VERSION
              value: "v3"
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits: { cpu: 2, memory: 1Gi }
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                # Graceful drain; POSIX sh expects the signal name without the SIG prefix.
                command: ["/bin/sh", "-c", "kill -TERM 1 && sleep 30"]
          volumeMounts:
            - { name: tls, mountPath: /etc/tls, readOnly: true }
            - { name: nats, mountPath: /etc/nats, readOnly: true }
      volumes:
        - name: tls
          secret: { secretName: consent-ledger-tls }
        - name: nats
          secret: { secretName: nats-credentials }
      tolerations:
        - { key: workload-class, operator: Equal, value: control-plane, effect: NoSchedule }

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: consent-ledger-service
  namespace: sms-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consent-ledger-service
  minReplicas: 5
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 60 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 70 }
    - type: Pods
      pods:
        metric: { name: consent_check_in_flight }
        target: { type: AverageValue, averageValue: "1500" }   # scale up if approaching the 2,000 cap
    - type: Pods
      pods:
        metric: { name: consent_check_duration_seconds_p95 }
        target: { type: AverageValue, averageValue: "0.004" }  # scale up before P95 breaches 5 ms
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - { type: Pods, value: 3, periodSeconds: 60 }   # add at most 3 pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - { type: Pods, value: 1, periodSeconds: 60 }   # remove at most 1 pod per minute
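
The two Pods-type metrics above are only visible to the HPA if something serves them through the custom metrics API. The source does not say which component does this; the sketch below assumes prometheus-adapter, and it further assumes that the P95 value is derived from a Prometheus histogram named consent_check_duration_seconds. Both are assumptions for illustration, not statements about the actual platform setup.

```yaml
# prometheus-adapter rules (sketch). Assumption: prometheus-adapter backs the custom metrics API.
rules:
  # Per-pod in-flight gauge, exposed under the name the HPA references.
  - seriesQuery: 'consent_check_in_flight{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }
        pod: { resource: pod }
    name:
      matches: "^consent_check_in_flight$"
      as: "consent_check_in_flight"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
  # Per-pod P95 latency, computed from an assumed histogram's _bucket series.
  - seriesQuery: 'consent_check_duration_seconds_bucket{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }
        pod: { resource: pod }
    name:
      matches: "^consent_check_duration_seconds_bucket$"
      as: "consent_check_duration_seconds_p95"
    metricsQuery: 'histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (le, <<.GroupBy>>))'
```

The 2-minute rate window is an arbitrary choice here. A shorter window reacts faster but makes the P95 noisier, which interacts with the 30 s scale-up stabilization window.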

PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: consent-ledger-service-pdb, namespace: sms-platform }
spec:
  minAvailable: 4
  selector: { matchLabels: { app: consent-ledger-service } }

Services

apiVersion: v1
kind: Service
metadata: { name: consent-ledger-service-grpc, namespace: sms-platform }
spec:
  selector: { app: consent-ledger-service }
  ports: [ { name: grpc, port: 50071, targetPort: grpc } ]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: consent-ledger-service-http, namespace: sms-platform }
spec:
  selector: { app: consent-ledger-service }
  ports: [ { name: http, port: 3071, targetPort: http } ]
  type: ClusterIP

NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: consent-ledger-service, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: consent-ledger-service } }
  policyTypes: [Ingress, Egress]
  ingress:
    # gRPC callers
    - from:
        - podSelector: { matchLabels: { app: compliance-engine } }
        - podSelector: { matchLabels: { app: routing-engine } }
        - podSelector: { matchLabels: { app: sms-firewall-service } }
        - podSelector: { matchLabels: { app: tenant-sdk-gateway } }
        - podSelector: { matchLabels: { app: regulator-portal-service } }
      ports: [{ port: 50071, protocol: TCP }]
    # REST callers
    - from:
        - podSelector: { matchLabels: { app: kong } }   # admin / tenant / citizen
      ports: [{ port: 3071, protocol: TCP }]
    # Prometheus
    - from:
        - namespaceSelector: { matchLabels: { name: monitoring } }
      ports: [{ port: 3071, protocol: TCP }]
  egress:
    # A bare podSelector matches pods in this namespace (sms-platform) only.
    - to: [{ podSelector: { matchLabels: { app: postgresql } } }]
      ports: [{ port: 5432, protocol: TCP }]
    - to: [{ podSelector: { matchLabels: { app: redis } } }]
      ports: [{ port: 6379, protocol: TCP }]
    - to: [{ podSelector: { matchLabels: { app: nats } } }]
      ports: [{ port: 4222, protocol: TCP }]
    # VAULT_ADDR points at the platform namespace, so the namespace is selected explicitly.
    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: platform } }
          podSelector: { matchLabels: { app: vault } }
      ports: [{ port: 8200, protocol: TCP }]
    - to: [{ podSelector: { matchLabels: { app: auth-service } } }]
      ports: [{ port: 50051, protocol: TCP }]
    # ATRA SFTP: explicit allowlist (no general egress)
    - to:
        - ipBlock:
            cidr: 10.50.10.0/24   # ATRA SFTP VLAN (Afghan IP space; verified by NAC)
      ports: [{ port: 22, protocol: TCP }]
    # On-cluster LLM (compliance-ai), used only by the AI keyword suggester
    - to: [{ podSelector: { matchLabels: { app: compliance-ai } } }]
      ports: [{ port: 8000, protocol: TCP }]
    # DNS (kube-system, port 53). No NTP port (123/UDP) is opened by this rule.
    - to: [{ namespaceSelector: { matchLabels: { name: kube-system } } }]
      ports: [{ port: 53, protocol: UDP }, { port: 53, protocol: TCP }]
    # NO catch-all egress: offshore traffic is explicitly forbidden by absence

A platform-level default-deny NetworkPolicy ensures any pod without an explicit egress rule cannot reach external IPs; the lack of a 0.0.0.0/0 egress in this manifest is load-bearing for data-residency compliance.
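
The platform-level default-deny policy itself is not reproduced in this document. As a rough sketch of what such a policy commonly looks like when applied per namespace (the policy name is hypothetical, and the real platform policy may be defined differently):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all        # hypothetical name
  namespace: sms-platform
spec:
  podSelector: {}               # empty selector: applies to every pod in the namespace
  policyTypes: [Ingress, Egress]
  # No ingress or egress rules are listed, so all traffic is denied unless
  # another NetworkPolicy (such as the one above) allows it.
```

NetworkPolicies are additive: a pod's permitted traffic is the union of every policy that selects it, which is why the absence of a 0.0.0.0/0 rule in the service's own policy matters.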

Service mesh (Istio)

  • mTLS strict in the namespace (PeerAuthentication mode: STRICT).
  • AuthorizationPolicy restricts CheckConsent callers by SAN to the allowlist in SECURITY_MODEL §1.1.
  • DestinationRule: outlier detection on the Postgres replica connection (eject after 5 consecutive 5xx; for a TCP upstream such as Postgres, Istio counts connection failures and timeouts as 5xx).
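
The three mesh settings above could be expressed roughly as follows. This is a sketch under stated assumptions: the resource names, the caller service-account names, the cluster.local trust domain, the Postgres replica host, and the interval and ejection-time values are all hypothetical. The authoritative caller allowlist is the one in SECURITY_MODEL §1.1.

```yaml
# Namespace-wide strict mTLS.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata: { name: default, namespace: sms-platform }
spec:
  mtls: { mode: STRICT }
---
# Caller allowlist for the gRPC port, matched on the peer's SPIFFE identity.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: consent-ledger-grpc-callers, namespace: sms-platform }
spec:
  selector:
    matchLabels: { app: consent-ledger-service }
  action: ALLOW
  rules:
    - from:
        - source:
            principals:   # hypothetical service accounts
              - cluster.local/ns/sms-platform/sa/compliance-engine-sa
              - cluster.local/ns/sms-platform/sa/routing-engine-sa
      to:
        - operation: { ports: ["50071"] }
    # An ALLOW policy denies whatever no rule matches, so the HTTP port needs its own rule.
    - to:
        - operation: { ports: ["3071"] }
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata: { name: consent-ledger-pg-replica, namespace: sms-platform }
spec:
  host: postgresql-replica.sms-platform.svc.cluster.local   # hypothetical host
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # eject after 5 consecutive 5xx
      interval: 10s             # assumed sweep interval
      baseEjectionTime: 30s     # assumed minimum ejection time
```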

2. Background workers (CronJobs / Deployments)

| Worker | Kind | Schedule / pattern | Lock |
| --- | --- | --- | --- |
| consent-dnd-sync | CronJob | 0 3 * * * Asia/Kabul | Redis consent:lock:dnd_sync |
| consent-keyword-reload | Long-running pod (in main Deployment) | Every 60 s | None (per-replica memory mirror) |
| consent-double-optin-expiry | CronJob | */5 * * * * | Redis consent:lock:double_optin_expiry |
| consent-record-expiry | CronJob | */15 * * * * | Redis consent:lock:record_expiry |
| consent-erasure-processor | CronJob | 0 * * * * | Redis consent:lock:erasure_processor |
| consent-audit-chain-verifier | CronJob | 0 2 * * * Asia/Kabul | Redis consent:lock:audit_verifier |
| consent-audit-partition-maintainer | CronJob | 30 2 * * * Asia/Kabul | Redis consent:lock:partition_maint |
| consent-outbox-relay | Long-running Deployment (3 replicas) | continuous | Postgres FOR UPDATE SKIP LOCKED |
| consent-cache-warmer | CronJob | 0 */6 * * * | Redis consent:lock:cache_warmer |
| consent-keyword-suggester (AI) | CronJob | 0 4 * * 0 | Redis consent:lock:keyword_suggester |

CronJob template:

apiVersion: batch/v1
kind: CronJob
metadata: { name: consent-dnd-sync, namespace: sms-platform }
spec:
  schedule: "0 3 * * *"
  timeZone: "Asia/Kabul"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 1800
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: consent-ledger-service-sa
          containers:
            - name: worker
              image: ghcr.io/ghasi/consent-ledger-service:1.0.0
              args: ["worker", "dnd-sync"]
              env:
                # … same env as Deployment …
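
To illustrate how a row of the worker table maps onto this template, here is a sketch for consent-double-optin-expiry. The schedule comes from the table. The subcommand name in args and the activeDeadlineSeconds value are assumptions, and the env block is left out here just as it is elided in the template.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata: { name: consent-double-optin-expiry, namespace: sms-platform }
spec:
  schedule: "*/5 * * * *"          # from the worker table; an interval schedule needs no timeZone
  concurrencyPolicy: Forbid        # a slow run is never overlapped by the next tick
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 240   # assumption: kept below the 5-minute period
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: consent-ledger-service-sa
          containers:
            - name: worker
              image: ghcr.io/ghasi/consent-ledger-service:1.0.0
              args: ["worker", "double-optin-expiry"]   # hypothetical subcommand name
```

concurrencyPolicy: Forbid only prevents overlap within one cluster's CronJob controller. The Redis lock listed in the table would still be what guards against concurrent runs started from elsewhere.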

3. Infrastructure dependencies

| Dependency | Version | Topology | Region |
| --- | --- | --- | --- |
| PostgreSQL | 16+ | Patroni cluster (1 primary + 1 sync standby + 1 async standby) | Kabul / Herat / Mazar (per ADR-0004 §3) |
| PgBouncer | 1.22+ | Sidecar in app pod, pool=transaction, pool_size=20 per replica | Co-located |
| Redis | 7.2+ | Cluster mode 3 masters × 2 replicas; consent-ledger-service uses logical DB 4 | Kabul + Herat replicas |
| NATS JetStream | 2.10+ | 3-node cluster per region; mesh-bridged across regions for control-plane streams | Kabul / Herat / Mazar |
| Vault | 1.16+ | HA mode; PKI engine (pki/ghasi-consent), Transit (transit/ghasi-consent-*), KV (secret/ghasi/consent/*) | Kabul + Herat HA |
| ATRA SFTP/HTTPS | | External, on-prem at ATRA NOC | Kabul |
| Object storage (S3-compatible) | MinIO 2026.01+ | 4-node ec(2,4) cluster; bucket ghasi-consent-audit-cold with Object Lock | Kabul + Herat replication |
| auth-service | platform | gRPC dependency for citizen-OTP receipt verification | Kabul |
| channel-router-service | platform | NATS publish/consume | All regions |
| compliance-ai (on-cluster LLM) | vLLM 0.5+ | GPU pods (shared with compliance-engine) | Kabul + Herat |

4. Region affinity (per ADR-0004 §3)

| Region | Role | Notes |
| --- | --- | --- |
| af-kabul-1 | Primary for control-plane writes | All RecordConsent/RevokeConsent writes routed here by default |
| af-herat-1 | Synchronous standby | RPO ≤ 5 s; promotion target if Kabul fails > 60 s |
| af-mazar-1 | Asynchronous tertiary | RPO ≤ 60 s; promotion target if both Kabul and Herat fail |

The consent-ledger-service Deployment runs in all three regions (replicas: 5 per region, 15 in total across regions before any HPA scale-up). Reads are served by the local PG replica; writes are proxied to the current primary via Patroni's leader-election Service VIP.

DNS-based routing:

  • consent-ledger-service-grpc.sms-platform.svc.cluster.local → local cluster's pods
  • consent-ledger-service-grpc.global.ghasi.gov.af → GeoDNS to the nearest healthy region
  • Inter-region promotion is handled by Patroni; clients reconnect transparently.

No replica is permitted outside af-* regions; the cluster admission webhook rejects any pod scheduled onto a node whose topology.kubernetes.io/region label does not match af-*.

5. Environment variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| NODE_ENV | yes | | production / staging / development |
| GRPC_PORT | no | 50071 | gRPC listener |
| HTTP_PORT | no | 3071 | HTTP listener |
| GRPC_TLS_ENABLED | no | true | Service refuses to start if set to false outside development |
| TLS_CERT_PATH / TLS_KEY_PATH / TLS_CA_PATH | yes (prod) | | mTLS certs from Vault PKI |
| DATABASE_URL | yes | | Primary Postgres |
| DATABASE_REPLICA_URL | yes | | Read replica |
| REDIS_URL | yes | | Redis cluster |
| NATS_URL | yes | | NATS server |
| NATS_CREDS_PATH | yes | | NATS credentials file |
| VAULT_ADDR | yes | | Vault HA endpoint |
| VAULT_TRANSIT_KEY_PATH | yes | | Audit signing key |
| ATRA_DND_ENDPOINT | yes | | SFTP / HTTPS URL for daily DND fetch |
| ATRA_PGP_FINGERPRINT | yes | | Expected signature fingerprint |
| REGION | yes | | Used in metric labels and routing decisions |
| PEPPER_VERSION | yes | | Active MSISDN-pepper version (rotation support) |
| EVAL_BUDGET_MS | no | 15 | CheckConsent internal budget |
| STOP_MO_BUDGET_MS | no | 1500 | STOP MO end-to-end budget |
| LOG_LEVEL | no | info | debug / info / warn / error |
| OTLP_ENDPOINT | no | http://otel-collector:4317 | OpenTelemetry collector |

6. Secrets (Vault paths)

| Secret | Vault path |
| --- | --- |
| Database creds | database/creds/ghasi-consent (24 h dynamic) |
| Redis creds | secret/ghasi/consent/redis |
| NATS creds | secret/ghasi/consent/nats |
| TLS server cert | pki/ghasi-consent/issue/server (30 d) |
| TLS client cert (for mesh peers) | pki/ghasi-consent/issue/client/<peer> |
| MSISDN pepper | secret/ghasi/consent/msisdn_pepper (versioned) |
| Audit signing key | transit/ghasi-consent-audit-signing |
| Per-tenant KEK | transit/ghasi-consent-<tenantId> |
| ATRA SFTP key | secret/ghasi/consent/atra-sftp |
| Citizen OTP HMAC | secret/ghasi/consent/citizen-otp-hmac |
| Confirmation token salt | secret/ghasi/consent/double-optin-salt |

The Vault Agent Sidecar Injector renders these into in-memory tmpfs mounts; nothing is written to disk in cleartext.
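
As a sketch of how one of these paths could be wired up, the injector is driven by pod annotations of the form agent-inject-secret-<name> and agent-inject-template-<name>. The file name db-creds and the key=value output format below are assumptions for illustration; the actual annotations used by the service are not shown in this document.

```yaml
# Fragment of spec.template.metadata in the Deployment (sketch)
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "consent-ledger-service"
  # Renders the dynamic database credentials to /vault/secrets/db-creds
  # ("db-creds" is a hypothetical file name).
  vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/ghasi-consent"
  vault.hashicorp.com/agent-inject-template-db-creds: |
    {{- with secret "database/creds/ghasi-consent" -}}
    username={{ .Data.username }}
    password={{ .Data.password }}
    {{- end -}}
```

By default the injector mounts rendered files under /vault/secrets, on a memory-backed volume shared between the agent and the application container.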

7. Resource sizing reference

Based on load tests (CONS-US-004 §6 — sustained 5,000 RPS):

  • 5 replicas × (2 vCPU, 1 GiB limits) = 10 vCPU, 5 GiB total per region at minReplicas
  • Per-replica capacity: ~1,200 CheckConsent RPS at P95 ≤ 5 ms on cache hits
  • Burst headroom (with HPA at maxReplicas=15): ~18,000 RPS, i.e. 15 × ~1,200 (enough for 3× peak + national-event spikes)
  • Postgres connection pool per replica: 20 transaction-mode connections; up to 300 in total at maxReplicas=15 (PgBouncer mediates)
  • Redis QPS: ~10,000 at peak; well within cluster headroom
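
The figures in this list follow from simple multiplication, written out below. It assumes throughput scales linearly with replica count and takes the ~1,200 RPS per-replica figure as given, so treat the results as estimates rather than measured values.

```latex
\begin{align*}
\text{capacity at minReplicas} &= 5 \times 1200 = 6000\ \text{RPS} \quad (\geq 5000\ \text{RPS sustained test load}) \\
\text{capacity at maxReplicas} &= 15 \times 1200 = 18000\ \text{RPS} \\
\text{headroom over sustained test load} &= 18000 / 5000 = 3.6 \\
\text{Postgres connections at maxReplicas} &= 15 \times 20 = 300
\end{align*}
```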

8. Deployment environments

| Environment | Replicas | Postgres | Redis | NATS | Notes |
| --- | --- | --- | --- | --- | --- |
| Production | 5–15 (HPA) per region; 3 regions | Patroni 3-node | Cluster 3×2 | 3-node JS cluster | mTLS strict, full residency enforcement |
| Staging | 3 (no HPA scale-up) | 1 primary + 1 standby | Cluster 3×2 (smaller) | 3-node | Mirrors prod minus regional fan-out |
| Development | 1 | Single instance | Single instance | Embedded | mTLS optional; AI mock mode |
| CI | 1 | Testcontainers | Testcontainers | Testcontainers | Deterministic fixtures; no Vault (env-var stubs) |