Consent Ledger Service — Deployment Topology

Version: 1.0 Status: Draft Owner: Platform SRE / Trust & Safety Last Updated: 2026-04-21 Companion: SECURITY_MODEL · OBSERVABILITY · ADR-0004

1. Kubernetes resources

Namespace

sms-platform (consent-ledger-service co-resides with the broader platform). The Deployment and its pods carry the app=consent-ledger-service and tier=control-plane labels used for ADR-0004 §3 region selection.

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: consent-ledger-service
  namespace: sms-platform
  labels:
    app: consent-ledger-service
    tier: control-plane
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  selector:
    matchLabels: { app: consent-ledger-service }
  template:
    metadata:
      labels:
        app: consent-ledger-service
        tier: control-plane
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3071"
        prometheus.io/path: "/metrics"
        sidecar.istio.io/inject: "true"
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "consent-ledger-service"
    spec:
      serviceAccountName: consent-ledger-service-sa
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels: { app: consent-ledger-service }
                topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/region
                    operator: In
                    values: [af-kabul-1, af-herat-1, af-mazar-1]
                  - key: workload-class
                    operator: In
                    values: [control-plane]
      containers:
        - name: consent-ledger-service
          image: ghcr.io/ghasi/consent-ledger-service:1.0.0
          ports:
            - { containerPort: 50071, name: grpc }
            - { containerPort: 3071,  name: http }
          env:
            - { name: NODE_ENV,           value: production }
            - { name: LOG_LEVEL,          value: info }
            - { name: GRPC_PORT,          value: "50071" }
            - { name: HTTP_PORT,          value: "3071" }
            - { name: GRPC_TLS_ENABLED,   value: "true" }
            - { name: TLS_CERT_PATH,      value: /etc/tls/tls.crt }
            - { name: TLS_KEY_PATH,       value: /etc/tls/tls.key }
            - { name: TLS_CA_PATH,        value: /etc/tls/ca.crt }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: consent-ledger-db, key: url } }
            - name: DATABASE_REPLICA_URL
              valueFrom: { secretKeyRef: { name: consent-ledger-db, key: replica_url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: consent-ledger-redis, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - name: NATS_CREDS_PATH
              value: /etc/nats/creds
            - name: VAULT_ADDR
              value: https://vault.platform.svc.cluster.local:8200
            - name: VAULT_TRANSIT_KEY_PATH
              value: transit/ghasi-consent-audit-signing
            - name: ATRA_DND_ENDPOINT
              value: sftp://atra-dnd.gov.af/dnd/latest.csv
            - name: ATRA_PGP_FINGERPRINT
              valueFrom: { secretKeyRef: { name: consent-ledger-atra, key: pgp_fingerprint } }
            - name: REGION
              # resolves only if the pod itself carries this label; the Downward API cannot read node labels
              valueFrom: { fieldRef: { fieldPath: metadata.labels['topology.kubernetes.io/region'] } }
            - name: PEPPER_VERSION
              value: "v3"
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits:   { cpu: 2,    memory: 1Gi }
          livenessProbe:
            httpGet: { path: /health/live, port: http }
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            httpGet: { path: /health/ready, port: http }
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
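          lifecycle:
            preStop:
              exec:
                # Delay SIGTERM so endpoint removal propagates before the process starts draining;
                # the kubelet sends SIGTERM to PID 1 itself once this hook returns.
                command: ["/bin/sh", "-c", "sleep 10"]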
          volumeMounts:
            - { name: tls,   mountPath: /etc/tls,   readOnly: true }
            - { name: nats,  mountPath: /etc/nats,  readOnly: true }
      volumes:
        - name: tls
          secret: { secretName: consent-ledger-tls }
        - name: nats
          secret: { secretName: nats-credentials }
      tolerations:
        - { key: workload-class, operator: Equal, value: control-plane, effect: NoSchedule }
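With replicas: 5, maxUnavailable: 1 and maxSurge: 2, a rollout keeps at least 4 pods available and at most 7 pods in total. A minimal Python sketch of that arithmetic follows; the helper name is illustrative, and the percentage handling follows the Kubernetes rule that maxSurge rounds up and maxUnavailable rounds down.

```python
import math


def rollout_bounds(replicas, max_unavailable, max_surge):
    """Return (min_available, max_total) pods for a RollingUpdate Deployment.

    Integers are absolute counts; strings such as "25%" are resolved against
    `replicas`, rounding maxSurge up and maxUnavailable down as Kubernetes does.
    """

    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            raw = replicas * int(value[:-1]) / 100
            return math.ceil(raw) if round_up else math.floor(raw)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge


print(rollout_bounds(5, 1, 2))    # base replica count from the manifest
print(rollout_bounds(15, 1, 2))   # HPA maxReplicas
```

The same settings at the HPA ceiling give a window of 14 available to 17 total pods.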

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: consent-ledger-service
  namespace: sms-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consent-ledger-service
  minReplicas: 5
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 60 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 70 }
    - type: Pods
      pods:
        metric: { name: consent_check_in_flight }
        target: { type: AverageValue, averageValue: "1500" }   # scale up if approaching the 2,000 cap
    - type: Pods
      pods:
        metric: { name: consent_check_duration_seconds_p95 }
        target: { type: AverageValue, averageValue: "0.004" }   # scale up before P95 breaches 5 ms
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - { type: Pods, value: 3, periodSeconds: 60 }
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - { type: Pods, value: 1, periodSeconds: 60 }
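The autoscaling/v2 controller computes ceil(currentReplicas × currentValue / targetValue) for each metric, treats a metric within its default 10% tolerance as asking for no change, and acts on the largest proposal. A simplified Python sketch of that rule as applied to this HPA; it models the Pods policies (+3 / -1 per 60 s) and the 5 to 15 clamp but not the stabilization windows, and the function name is hypothetical:

```python
import math


def hpa_desired_replicas(current, metrics, min_replicas=5, max_replicas=15,
                         scale_up_pods=3, scale_down_pods=1, tolerance=0.1):
    """Simplified autoscaling/v2 calculation for the HPA above.

    metrics: (current_value, target_value) pairs, e.g. (2400, 1500) for
    consent_check_in_flight. Stabilization windows are not modelled.
    """
    proposals = []
    for value, target in metrics:
        ratio = value / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)                  # within tolerance: no change
        else:
            proposals.append(math.ceil(current * ratio))
    desired = max(proposals)                           # largest proposal wins
    desired = min(desired, current + scale_up_pods)    # scaleUp policy: +3 pods / 60 s
    desired = max(desired, current - scale_down_pods)  # scaleDown policy: -1 pod / 60 s
    return max(min_replicas, min(max_replicas, desired))


# CPU 30/60 %, memory 35/70 %, in-flight 3000/1500, P95 0.002/0.004 s
print(hpa_desired_replicas(5, [(30, 60), (35, 70), (3000, 1500), (0.002, 0.004)]))
```

In this example the in-flight metric alone asks for 10 replicas, but the scale-up policy limits the first step to 8.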

PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: consent-ledger-service-pdb, namespace: sms-platform }
spec:
  minAvailable: 4
  selector: { matchLabels: { app: consent-ledger-service } }
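A PodDisruptionBudget with minAvailable permits (healthy pods minus minAvailable) voluntary evictions at any moment. A short Python illustration of how that allowance grows as the HPA adds replicas; the helper name is illustrative:

```python
def disruptions_allowed(healthy_pods, min_available=4):
    """Voluntary evictions (e.g. node drains) a minAvailable PDB allows right now."""
    return max(0, healthy_pods - min_available)


for healthy in (4, 5, 15):
    print(healthy, disruptions_allowed(healthy))
```

At the base of 5 replicas a drain proceeds one pod at a time; at maxReplicas=15 the same budget allows 11 simultaneous evictions, because minAvailable is an absolute count rather than a percentage.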

Services

apiVersion: v1
kind: Service
metadata: { name: consent-ledger-service-grpc, namespace: sms-platform }
spec:
  selector: { app: consent-ledger-service }
  ports: [ { name: grpc, port: 50071, targetPort: grpc } ]
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata: { name: consent-ledger-service-http, namespace: sms-platform }
spec:
  selector: { app: consent-ledger-service }
  ports: [ { name: http, port: 3071, targetPort: http } ]
  type: ClusterIP

NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: consent-ledger-service, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: consent-ledger-service } }
  policyTypes: [Ingress, Egress]
  ingress:
    # gRPC callers
    - from:
        - podSelector: { matchLabels: { app: compliance-engine } }
        - podSelector: { matchLabels: { app: routing-engine } }
        - podSelector: { matchLabels: { app: sms-firewall-service } }
        - podSelector: { matchLabels: { app: tenant-sdk-gateway } }
        - podSelector: { matchLabels: { app: regulator-portal-service } }
      ports: [{ port: 50071, protocol: TCP }]
    # REST callers
    - from:
        - podSelector: { matchLabels: { app: kong } }   # admin / tenant / citizen
      ports: [{ port: 3071, protocol: TCP }]
    # Prometheus
    - from:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: monitoring } }
      ports: [{ port: 3071, protocol: TCP }]
  egress:
    - to: [{ podSelector: { matchLabels: { app: postgresql } } }]
      ports: [{ port: 5432, protocol: TCP }]
    - to: [{ podSelector: { matchLabels: { app: redis } } }]
      ports: [{ port: 6379, protocol: TCP }]
    - to: [{ podSelector: { matchLabels: { app: nats } } }]
      ports: [{ port: 4222, protocol: TCP }]
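    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: platform } }   # Vault runs in the platform namespace (VAULT_ADDR)
          podSelector: { matchLabels: { app: vault } }
      ports: [{ port: 8200, protocol: TCP }]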
    - to: [{ podSelector: { matchLabels: { app: auth-service } } }]
      ports: [{ port: 50051, protocol: TCP }]
    # ATRA SFTP — explicit allowlist (no general egress)
    - to:
        - ipBlock:
            cidr: 10.50.10.0/24                  # ATRA SFTP VLAN (private RFC 1918 range routed to the ATRA NOC; verified by NAC)
      ports: [{ port: 22, protocol: TCP }]
    # On-cluster LLM (compliance-ai) — used only by AI keyword suggester
    - to: [{ podSelector: { matchLabels: { app: compliance-ai } } }]
      ports: [{ port: 8000, protocol: TCP }]
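    # DNS
    - to: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } } }]
      ports: [{ port: 53, protocol: UDP }, { port: 53, protocol: TCP }]
    # Istio sidecar: xDS configuration and workload certificates from istiod (default istio-system install)
    - to: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: istio-system } } }]
      ports: [{ port: 15012, protocol: TCP }]
  # No catch-all egress: offshore traffic is forbidden by its absence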

A platform-level default-deny NetworkPolicy ensures any pod without an explicit egress rule cannot reach external IPs; the lack of a 0.0.0.0/0 egress in this manifest is load-bearing for data-residency compliance.
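To reason about which off-cluster destinations the policy admits, here is a minimal Python model of its single ipBlock rule. Pod- and namespace-selector rules are in-cluster and out of scope, and the helper name is hypothetical:

```python
import ipaddress

# The only ipBlock rule in the manifest: ATRA SFTP VLAN, TCP/22.
# Pod- and namespace-selector rules are in-cluster and not modelled here.
IP_BLOCK_RULES = [(ipaddress.ip_network("10.50.10.0/24"), 22, "TCP")]


def external_egress_allowed(dest_ip, port, protocol="TCP"):
    """True if an ipBlock rule admits the destination; anything else is denied."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net and port == rule_port and protocol == rule_proto
               for net, rule_port, rule_proto in IP_BLOCK_RULES)


print(external_egress_allowed("10.50.10.17", 22))    # ATRA SFTP
print(external_egress_allowed("10.50.10.17", 443))   # HTTPS to the same host
print(external_egress_allowed("203.0.113.5", 22))    # any other address
```

The 443 case matters because ATRA_DND_ENDPOINT may be an SFTP or HTTPS URL: as written, the policy opens only port 22 towards ATRA.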

Service mesh (Istio)

mTLS strict in the namespace (PeerAuthentication mode: STRICT).
AuthorizationPolicy restricts CheckConsent callers by SAN to the allowlist in SECURITY_MODEL §1.1.
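DestinationRule: outlier detection on the Postgres replica connection (eject after 5 consecutive errors; for opaque TCP traffic such as Postgres, Istio counts connect timeouts and connection failures toward consecutive5xxErrors).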
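Consecutive-error ejection amounts to a per-host streak counter that resets on success. A toy Python model of the 5-error threshold; ejection duration (baseEjectionTime) and maxEjectionPercent are not modelled, and the class name is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class OutlierDetector:
    """Toy model of consecutive5xxErrors: 5 for one upstream host.

    For opaque TCP (Postgres), connect timeouts and connection failures
    count as errors. Once ejected, the host stays ejected in this model.
    """
    threshold: int = 5
    consecutive_errors: int = 0
    ejected: bool = False

    def record(self, ok):
        if ok:
            self.consecutive_errors = 0       # a success breaks the streak
        else:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.threshold:
                self.ejected = True
        return self.ejected


host = OutlierDetector()
for outcome in [False, False, False, False, True, False, False, False, False, False]:
    host.record(outcome)
print(host.ejected)
```

Four failures followed by a success do not eject the host; the next unbroken run of five does.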

2. Background workers (CronJobs / Deployments)

Worker	Kind	Schedule / pattern	Lock
`consent-dnd-sync`	CronJob	`0 3 * * *` Asia/Kabul	Redis `consent:lock:dnd_sync`
`consent-keyword-reload`	In-process loop in the main Deployment's pods	Every 60 s	None (per-replica memory mirror)
`consent-double-optin-expiry`	CronJob	`*/5 * * * *`	Redis `consent:lock:double_optin_expiry`
`consent-record-expiry`	CronJob	`*/15 * * * *`	Redis `consent:lock:record_expiry`
`consent-erasure-processor`	CronJob	`0 * * * *`	Redis `consent:lock:erasure_processor`
`consent-audit-chain-verifier`	CronJob	`0 2 * * *` Asia/Kabul	Redis `consent:lock:audit_verifier`
`consent-audit-partition-maintainer`	CronJob	`30 2 * * *` Asia/Kabul	Redis `consent:lock:partition_maint`
`consent-outbox-relay`	Long-running Deployment (3 replicas)	continuous	Postgres `FOR UPDATE SKIP LOCKED`
`consent-cache-warmer`	CronJob	`0 */6 * * *`	Redis `consent:lock:cache_warmer`
`consent-keyword-suggester` (AI)	CronJob	`0 4 * * 0`	Redis `consent:lock:keyword_suggester`
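The `consent:lock:*` entries imply a lease-style Redis lock that lets only one run of a job proceed, beyond what concurrencyPolicy: Forbid enforces inside a single cluster. The client code is not part of this document; the sketch below is a minimal in-memory Python model of the common pattern (SET key token NX PX ttl, release only by the token holder), assuming that is how these keys are used. `FakeLockStore` is hypothetical and does not talk to Redis:

```python
import uuid
from typing import Dict, Optional, Tuple


class FakeLockStore:
    """In-memory stand-in for a Redis lease lock (hypothetical helper).

    acquire() mirrors SET key token NX PX ttl; release() mirrors a
    compare-and-delete that only the current holder can perform.
    """

    def __init__(self):
        self._locks: Dict[str, Tuple[str, float]] = {}   # key -> (token, expires_at)

    def acquire(self, key, ttl_s, now) -> Optional[str]:
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return None                                  # another run holds a live lease
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_s)
        return token

    def release(self, key, token) -> bool:
        held = self._locks.get(key)
        if held is not None and held[0] == token:        # never delete someone else's lease
            del self._locks[key]
            return True
        return False


store = FakeLockStore()
first = store.acquire("consent:lock:dnd_sync", ttl_s=1800, now=0)
second = store.acquire("consent:lock:dnd_sync", ttl_s=1800, now=60)
print(first is not None, second is None)
```

Choosing a TTL at least as long as the Job's activeDeadlineSeconds (1800 s for consent-dnd-sync) keeps the lease from lapsing while a run is still active. The outbox relay does not need this: Postgres `FOR UPDATE SKIP LOCKED` already partitions rows between its replicas.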

CronJob template:

apiVersion: batch/v1
kind: CronJob
metadata: { name: consent-dnd-sync, namespace: sms-platform }
spec:
  schedule: "0 3 * * *"
  timeZone: "Asia/Kabul"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 1800
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: consent-ledger-service-sa
          containers:
            - name: worker
              image: ghcr.io/ghasi/consent-ledger-service:1.0.0
              args: ["worker", "dnd-sync"]
              env:
                # … same env as Deployment …
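`timeZone: "Asia/Kabul"` (a stable CronJob field since Kubernetes 1.27) makes `0 3 * * *` fire at 03:00 Kabul time. Afghanistan observes no daylight saving, so a fixed UTC+04:30 offset is enough to check when the Kabul-scheduled jobs run in UTC. A short Python sketch, with an illustrative helper name:

```python
from datetime import datetime, timedelta, timezone

KABUL = timezone(timedelta(hours=4, minutes=30), "Asia/Kabul")   # UTC+04:30, no DST

# Daily Kabul-time schedules from the worker table: (hour, minute)
KABUL_JOBS = {
    "consent-audit-chain-verifier": (2, 0),
    "consent-audit-partition-maintainer": (2, 30),
    "consent-dnd-sync": (3, 0),
}


def fire_time_utc(hour, minute, kabul_date):
    """UTC instant at which a daily `minute hour * * *` Kabul schedule fires."""
    local = datetime(kabul_date.year, kabul_date.month, kabul_date.day,
                     hour, minute, tzinfo=KABUL)
    return local.astimezone(timezone.utc)


for name, (hour, minute) in KABUL_JOBS.items():
    print(name, fire_time_utc(hour, minute, datetime(2026, 4, 21)).isoformat())
```

All three land on the previous UTC calendar day, which matters when correlating job runs with UTC-stamped logs.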

3. Infrastructure dependencies

Dependency	Version	Topology	Region
PostgreSQL	16+	Patroni cluster (1 primary + 1 sync standby + 1 async standby)	Kabul / Herat / Mazar (per ADR-0004 §3)
PgBouncer	1.22+	Sidecar in app pod, pool_mode=transaction, default_pool_size=20 per replica	Co-located
Redis	7.2+	Cluster mode 3 masters × 2 replicas; Redis Cluster supports only logical DB 0, so `consent-ledger-service` keys are namespaced under the `consent:` prefix	Kabul + Herat replicas
NATS JetStream	2.10+	3-node cluster per region; mesh-bridged across regions for control-plane streams	Kabul / Herat / Mazar
Vault	1.16+	HA mode; PKI engine (`pki/ghasi-consent`), Transit (`transit/ghasi-consent-`), KV (`secret/ghasi/consent/`)	Kabul + Herat HA
ATRA SFTP/HTTPS	—	External, on-prem at ATRA NOC	Kabul
Object storage (S3-compatible)	MinIO 2026.01+	4-node ec(2,4) cluster; bucket `ghasi-consent-audit-cold` with Object Lock	Kabul + Herat replication
`auth-service`	platform	gRPC dependency for citizen-OTP receipt verification	Kabul
`channel-router-service`	platform	NATS publish/consume	All regions
`compliance-ai` (on-cluster LLM)	vLLM 0.5+	GPU pods (shared with compliance-engine)	Kabul + Herat

4. Region affinity (per ADR-0004 §3)

Region	Role	Notes
`af-kabul-1`	Primary for control-plane writes	All `RecordConsent`/`RevokeConsent` writes routed here by default
`af-herat-1`	Synchronous standby	RPO ≤ 5 s; promotion target if Kabul fails > 60 s
`af-mazar-1`	Asynchronous tertiary	RPO ≤ 60 s; promotion target if both Kabul and Herat fail

consent-ledger-service runs in all three regions: 5 replicas per region at minimum (15 across regions), scaling up to 15 per region (45 in total) under the HPA. Local reads are served by the local PG replica; writes proxy to the current primary via Patroni's leader-election Service VIP.

DNS-based routing:

consent-ledger-service-grpc.sms-platform.svc.cluster.local → local cluster's pods
consent-ledger-service-grpc.global.ghasi.gov.af → GeoDNS to the nearest healthy region
Inter-region promotion is handled by Patroni; clients reconnect transparently.

No replica is permitted outside af-* regions: the cluster admission webhook rejects scheduling a pod onto any node whose topology.kubernetes.io/region label does not match af-*.
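The residency rule reduces to a check on the target node's region label. A minimal Python sketch of that decision, assuming the webhook sees the node's labels and fails closed when the label is missing (both are assumptions; the function name is hypothetical):

```python
from fnmatch import fnmatchcase

REGION_LABEL = "topology.kubernetes.io/region"


def admit_binding(node_labels):
    """Return (allowed, reason) for binding a pod to a node with these labels."""
    region = node_labels.get(REGION_LABEL)
    if region is None:
        return False, f"node has no {REGION_LABEL} label"   # fail closed
    if not fnmatchcase(region, "af-*"):
        return False, f"region {region!r} is outside af-*"
    return True, "ok"


print(admit_binding({REGION_LABEL: "af-herat-1"}))
print(admit_binding({REGION_LABEL: "eu-west-1"}))
print(admit_binding({}))
```

Failing closed on an unlabelled node keeps a misconfigured node pool from silently accepting consent pods.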

5. Environment variables

Variable	Required	Default	Description
`NODE_ENV`	yes	—	`production` / `staging` / `development`
`GRPC_PORT`	no	`50071`	gRPC listener
`HTTP_PORT`	no	`3071`	HTTP listener
`GRPC_TLS_ENABLED`	no	`true`	The service refuses to start with `false` outside `development`
`TLS_CERT_PATH` / `TLS_KEY_PATH` / `TLS_CA_PATH`	yes (prod)	—	mTLS certs from Vault PKI
`DATABASE_URL`	yes	—	Primary Postgres
`DATABASE_REPLICA_URL`	yes	—	Read replica
`REDIS_URL`	yes	—	Redis cluster
`NATS_URL`	yes	—	NATS server
`NATS_CREDS_PATH`	yes	—	NATS credentials file
`VAULT_ADDR`	yes	—	Vault HA endpoint
`VAULT_TRANSIT_KEY_PATH`	yes	—	Audit signing key
`ATRA_DND_ENDPOINT`	yes	—	SFTP / HTTPS URL for daily DND fetch
`ATRA_PGP_FINGERPRINT`	yes	—	Expected signature fingerprint
`REGION`	yes	—	Used in metric labels and routing decisions
`PEPPER_VERSION`	yes	—	Indicates the active MSISDN-pepper version (rotation support)
`EVAL_BUDGET_MS`	no	`15`	CheckConsent internal budget
`STOP_MO_BUDGET_MS`	no	`1500`	STOP MO end-to-end budget
`LOG_LEVEL`	no	`info`	`debug` / `info` / `warn` / `error`
`OTLP_ENDPOINT`	no	`http://otel-collector:4317`	OpenTelemetry collector
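The table's rules (required variables, defaults, TLS paths required in production, TLS disablement only in development) can be expressed as a startup check. The service itself is a Node.js process; this is an illustrative Python validator transcribed from the table, not its actual config loader, and `load_config` is a hypothetical name:

```python
import os
from typing import Dict, Mapping

# Transcribed from the table above; an illustrative validator, not the
# service's own (Node.js) configuration loader.
REQUIRED = ("NODE_ENV", "DATABASE_URL", "DATABASE_REPLICA_URL", "REDIS_URL",
            "NATS_URL", "NATS_CREDS_PATH", "VAULT_ADDR", "VAULT_TRANSIT_KEY_PATH",
            "ATRA_DND_ENDPOINT", "ATRA_PGP_FINGERPRINT", "REGION", "PEPPER_VERSION")
TLS_VARS = ("TLS_CERT_PATH", "TLS_KEY_PATH", "TLS_CA_PATH")   # required in production
DEFAULTS = {"GRPC_PORT": "50071", "HTTP_PORT": "3071", "GRPC_TLS_ENABLED": "true",
            "EVAL_BUDGET_MS": "15", "STOP_MO_BUDGET_MS": "1500", "LOG_LEVEL": "info",
            "OTLP_ENDPOINT": "http://otel-collector:4317"}
NODE_ENVS = {"production", "staging", "development"}


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    known = set(REQUIRED) | set(TLS_VARS) | set(DEFAULTS)
    cfg = dict(DEFAULTS)
    cfg.update({k: v for k, v in env.items() if k in known})
    missing = [k for k in REQUIRED if not cfg.get(k)]
    if cfg.get("NODE_ENV") == "production":
        missing += [k for k in TLS_VARS if not cfg.get(k)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    if cfg["NODE_ENV"] not in NODE_ENVS:
        raise ValueError(f"unsupported NODE_ENV {cfg['NODE_ENV']!r}")
    # Anything other than "true" disables TLS, which only development may do.
    if cfg["GRPC_TLS_ENABLED"].lower() != "true" and cfg["NODE_ENV"] != "development":
        raise ValueError("GRPC_TLS_ENABLED=false is only permitted in development")
    return cfg
```

Against the Deployment's env block this check passes in production, since every required variable and all three TLS paths are set there.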

6. Secrets (Vault paths)

Secret	Vault path
Database creds	`database/creds/ghasi-consent` (24 h dynamic)
Redis creds	`secret/ghasi/consent/redis`
NATS creds	`secret/ghasi/consent/nats`
TLS server cert	`pki/ghasi-consent/issue/server` (30 d)
TLS client cert (for mesh peers)	`pki/ghasi-consent/issue/client/<peer>`
MSISDN pepper	`secret/ghasi/consent/msisdn_pepper` (versioned)
Audit signing key	`transit/ghasi-consent-audit-signing`
Per-tenant KEK	`transit/ghasi-consent-<tenantId>`
ATRA SFTP key	`secret/ghasi/consent/atra-sftp`
Citizen OTP HMAC	`secret/ghasi/consent/citizen-otp-hmac`
Confirmation token salt	`secret/ghasi/consent/double-optin-salt`

Vault Agent Sidecar Injector renders these into in-memory tmpfs mounts; nothing is written to disk in clear.
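`PEPPER_VERSION` and the versioned `secret/ghasi/consent/msisdn_pepper` suggest that stored MSISDN digests carry the pepper version they were computed with, so lookups keep working across a rotation. A minimal Python sketch of one way that can work, assuming HMAC-SHA256 with a version prefix; the algorithm, the prefix format and the pepper values are all hypothetical:

```python
import hashlib
import hmac

# Hypothetical pepper material keyed by version; real values would come from
# secret/ghasi/consent/msisdn_pepper via the Vault Agent mount.
PEPPERS = {"v2": b"example-pepper-v2", "v3": b"example-pepper-v3"}
ACTIVE_VERSION = "v3"   # mirrors PEPPER_VERSION in the Deployment


def msisdn_digest(msisdn, version=ACTIVE_VERSION):
    """Version-prefixed HMAC-SHA256 of an MSISDN string."""
    mac = hmac.new(PEPPERS[version], msisdn.encode("utf-8"), hashlib.sha256)
    return f"{version}:{mac.hexdigest()}"


def candidate_digests(msisdn):
    """Digests to try during rotation: active version first, then older ones."""
    others = [v for v in PEPPERS if v != ACTIVE_VERSION]
    return [msisdn_digest(msisdn)] + [msisdn_digest(msisdn, v) for v in others]


print(candidate_digests("+93700000000"))
```

Keeping the version in the stored value lets records hashed under v2 be found, and re-hashed lazily, after PEPPER_VERSION moves to v3.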

7. Resource sizing reference

Based on load tests (CONS-US-004 §6 — sustained 5,000 RPS):

5 replicas × (0.5 vCPU, 512 MiB) requests = 2.5 vCPU, 2.5 GiB reserved at min; limits of (2 vCPU, 1 GiB) per replica cap the total at 10 vCPU, 5 GiB
Per-replica capacity: ~1,200 CheckConsent RPS at P95 ≤ 5 ms with cache hit
Burst headroom (with HPA at maxReplicas=15): ~18,000 RPS (enough for 3× peak + national-event spikes)
Postgres connection pool per replica: 20 transaction-mode connections; total at maxReplicas=15 is 300 (15 × 20), mediated by PgBouncer
Redis QPS: ~10,000 at peak; well within cluster headroom
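The figures above follow from the per-replica numbers in the Deployment and the load test. A small Python check of the arithmetic for one regional Deployment; the constants are taken from this section and the Deployment manifest, and the helper names are illustrative:

```python
import math

REPLICA_RPS = 1_200          # per-replica CheckConsent capacity at P95 <= 5 ms (cache hit)
REQUEST = (0.5, 0.5)         # (vCPU, GiB) requests per replica
LIMIT = (2.0, 1.0)           # (vCPU, GiB) limits per replica
POOL_PER_REPLICA = 20        # PgBouncer transaction-mode connections


def sizing(replicas):
    return {
        "rps_capacity": replicas * REPLICA_RPS,
        "cpu_request": replicas * REQUEST[0],
        "mem_request_gib": replicas * REQUEST[1],
        "cpu_limit": replicas * LIMIT[0],
        "mem_limit_gib": replicas * LIMIT[1],
        "pg_connections": replicas * POOL_PER_REPLICA,
    }


def replicas_for(rps):
    """Replicas needed to serve `rps` at the measured per-replica capacity."""
    return math.ceil(rps / REPLICA_RPS)


print(sizing(5))
print(sizing(15))
print(replicas_for(5_000), replicas_for(15_000))   # sustained load and 3x peak
```

Three times the 5,000 RPS sustained load needs 13 replicas, inside the HPA ceiling of 15.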

8. Deployment environments

Environment	Replicas	Postgres	Redis	NATS	Notes
Production	5–15 (HPA) per region; 3 regions	Patroni 3-node	Cluster 3×2	3-node JS cluster	mTLS strict, full residency enforcement
Staging	3 (no HPA scale-up)	1 primary + 1 standby	Cluster 3×2 (smaller)	3-node	Mirrors prod minus regional fan-out
Development	1	Single instance	Single instance	Embedded	mTLS optional; AI mock mode
CI	1	Testcontainers	Testcontainers	Testcontainers	Deterministic fixtures; no Vault (env-var stubs)
