Number Intelligence Service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Messaging Core / Platform SRE
Last Updated: 2026-04-21
Companion: SECURITY_MODEL · OBSERVABILITY · ADR-0004 §14

1. Kubernetes resources

1.1 Namespace and labels

Namespace: sms-platform. Hot-path and batch resources carry the labels tier=control-plane-adjacent (NI is read-heavy active-active, not strict primary-standby) and service=number-intelligence-service, which are used for ADR-0004 §14 region selection. The ni-hlr-gateway pods carry tier=data-plane instead.

1.2 Workloads

NI deploys as two Deployments and one DaemonSet:

number-intelligence-service Deployment: gRPC + REST API (HPA on grpc_inflight_requests and CPU).
number-intelligence-batch Deployment: MNP / EIR reconciliation, audit verifier, partition maintainer (two replicas under leader election, one active and one warm standby; pinned to the Kabul region, af-kabul-1).
ni-hlr-gateway DaemonSet: one pod per data-plane node; holds SIGTRAN sockets and per-MNO REST connections.

1.3 Hot-path Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: number-intelligence-service
  namespace: sms-platform
  labels: { app: number-intelligence-service, tier: control-plane-adjacent }
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxUnavailable: 1, maxSurge: 2 }
  selector: { matchLabels: { app: number-intelligence-service } }
  template:
    metadata:
      labels: { app: number-intelligence-service, tier: control-plane-adjacent }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3073"
        prometheus.io/path: "/metrics"
        sidecar.istio.io/inject: "true"
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "number-intelligence-service"
    spec:
      serviceAccountName: number-intelligence-service-sa
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector: { matchLabels: { app: number-intelligence-service } }
                topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-kabul-1, af-mzr-1] }
                  - { key: workload-class, operator: In, values: [hot-path] }
      containers:
        - name: number-intelligence-service
          image: ghcr.io/ghasi/number-intelligence-service:1.0.0
          ports:
            - { containerPort: 50073, name: grpc }
            - { containerPort: 3073,  name: http }
          env:
            - { name: NODE_ENV, value: production }
            - { name: LOG_LEVEL, value: info }
            - { name: GRPC_PORT, value: "50073" }
            - { name: HTTP_PORT, value: "3073" }
            - { name: GRPC_TLS_ENABLED, value: "true" }
            - { name: TLS_CERT_PATH, value: /etc/tls/tls.crt }
            - { name: TLS_KEY_PATH,  value: /etc/tls/tls.key }
            - { name: TLS_CA_PATH,   value: /etc/tls/ca.crt }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: numint-db, key: url } }
            - name: DATABASE_REPLICA_URL
              valueFrom: { secretKeyRef: { name: numint-db, key: replica_url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: numint-redis, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - name: NATS_CREDS_PATH
              value: /etc/nats/creds
            - name: VAULT_ADDR
              value: https://vault.platform.svc.cluster.local:8200
            - name: HLR_GATEWAY_ENDPOINT
              value: dns:///ni-hlr-gateway.sms-platform.svc.cluster.local:50074
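            # NOTE: the downward API exposes pod labels only, not node labels, so this
            # resolves only if topology.kubernetes.io/region is also set on the pod itself.
            - name: REGION
              valueFrom: { fieldRef: { fieldPath: metadata.labels['topology.kubernetes.io/region'] } }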
            - name: CACHE_WARM_TARGET
              value: "500000"
            - name: LRU_MAX_ENTRIES
              value: "100000"
          readinessProbe:
            httpGet: { path: /health/ready, port: 3073 }
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet: { path: /health/live, port: 3073 }
            initialDelaySeconds: 15
            periodSeconds: 10
          resources:
            requests: { cpu: "1500m", memory: "2Gi" }
            limits:   { cpu: "4000m", memory: "4Gi" }
          lifecycle:
            preStop:
              exec: { command: ["/bin/sh", "-c", "sleep 15"] }
      terminationGracePeriodSeconds: 30

1.4 HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: number-intelligence-service, namespace: sms-platform }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: number-intelligence-service }
  minReplicas: 6
  maxReplicas: 30
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 65 } }
    - type: Pods
      pods: { metric: { name: grpc_inflight_requests }, target: { type: AverageValue, averageValue: "1500" } }
  behavior:
    scaleDown: { stabilizationWindowSeconds: 300, policies: [{ type: Percent, value: 25, periodSeconds: 60 }] }
    scaleUp:   { stabilizationWindowSeconds: 30,  policies: [{ type: Percent, value: 100, periodSeconds: 30 }] }

1.5 PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: number-intelligence-service-pdb, namespace: sms-platform }
spec:
  minAvailable: 4   # per region; ensures rolling restarts cannot drop below 4 hot-path replicas
  selector: { matchLabels: { app: number-intelligence-service } }

1.6 Batch Deployment (single-leader)

apiVersion: apps/v1
kind: Deployment
metadata: { name: number-intelligence-batch, namespace: sms-platform }
spec:
  replicas: 2          # leader-election; 1 active, 1 warm-standby
  selector: { matchLabels: { app: number-intelligence-batch } }
  template:
    metadata: { labels: { app: number-intelligence-batch, tier: control-plane-adjacent } }
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-kabul-1] }
                  - { key: workload-class, operator: In, values: [batch] }
      containers:
        - name: batch
          image: ghcr.io/ghasi/number-intelligence-service:1.0.0
          command: ["node", "dist/batch/main.js"]
          env:
            - { name: BATCH_LEADER_LOCK_KEY, value: "numint:lock:batch-leader" }
            # MNP/EIR cron schedules read from ConfigMap
          resources:
            requests: { cpu: "1000m", memory: "2Gi" }
            limits:   { cpu: "2000m", memory: "4Gi" }

A Mazar-region warm standby (also replicas: 2, pinned to af-mzr-1) contends for the same Redis lock and takes over only if the lock has gone unheld for more than 10 minutes.
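
The Mazar standby described above has no manifest in this section. A minimal sketch, assuming it is a copy of the Kabul batch Deployment that differs only in name, labels, and region pin. The name number-intelligence-batch-mzr and the BATCH_LEADER_TAKEOVER_AFTER_SECONDS variable are hypothetical and not taken from the service; the takeover threshold may well be configured elsewhere.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata: { name: number-intelligence-batch-mzr, namespace: sms-platform }   # hypothetical name
spec:
  replicas: 2
  selector: { matchLabels: { app: number-intelligence-batch-mzr } }
  template:
    metadata: { labels: { app: number-intelligence-batch-mzr, tier: control-plane-adjacent } }
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-mzr-1] }
                  - { key: workload-class, operator: In, values: [batch] }
      containers:
        - name: batch
          image: ghcr.io/ghasi/number-intelligence-service:1.0.0
          command: ["node", "dist/batch/main.js"]
          env:
            # Same lock key as Kabul, so both regions contend for a single leader.
            - { name: BATCH_LEADER_LOCK_KEY, value: "numint:lock:batch-leader" }
            # Hypothetical knob: take over only after the lock has been unheld for 10 min.
            - { name: BATCH_LEADER_TAKEOVER_AFTER_SECONDS, value: "600" }
          resources:
            requests: { cpu: "1000m", memory: "2Gi" }
            limits:   { cpu: "2000m", memory: "4Gi" }
```

A distinct app label keeps this Deployment's selector from overlapping with the Kabul one if both ever run in the same cluster.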

1.7 ni-hlr-gateway DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata: { name: ni-hlr-gateway, namespace: sms-platform }
spec:
  selector: { matchLabels: { app: ni-hlr-gateway } }
  template:
    metadata: { labels: { app: ni-hlr-gateway, tier: data-plane } }
    spec:
      hostNetwork: true   # required for SCTP M3UA on dedicated NIC
      dnsPolicy: ClusterFirstWithHostNet   # keeps cluster DNS resolution working under hostNetwork
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: workload-class, operator: In, values: [signalling] }
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-kabul-1, af-mzr-1] }
      containers:
        - name: gateway
          image: ghcr.io/ghasi/ni-hlr-gateway:1.0.0
          ports:
            - { containerPort: 50074, name: grpc, protocol: TCP }
            - { containerPort: 2905,  name: m3ua, protocol: SCTP }
          env:
            - { name: SIGTRAN_LOCAL_PC, valueFrom: { secretKeyRef: { name: numint-sigtran, key: local_pc } } }
            - { name: PCAP_KMS_KEY,    value: numint-pcap-kek }
          securityContext:
            capabilities: { add: [NET_ADMIN, NET_RAW] }
          resources:
            requests: { cpu: "500m", memory: "512Mi" }
            limits:   { cpu: "1500m", memory: "1Gi" }

2. Services

apiVersion: v1
kind: Service
metadata: { name: number-intelligence-service, namespace: sms-platform }
spec:
  type: ClusterIP
  ports:
    - { name: grpc, port: 50073, targetPort: 50073 }
    - { name: http, port: 3073,  targetPort: 3073 }
  selector: { app: number-intelligence-service }
---
apiVersion: v1
kind: Service
metadata: { name: ni-hlr-gateway, namespace: sms-platform }
spec:
  type: ClusterIP
  clusterIP: None     # headless — gRPC clients DNS-resolve all gateway pods
  ports:
    - { name: grpc, port: 50074, targetPort: 50074 }
  selector: { app: ni-hlr-gateway }

3. Kong route (Public Lookup API)

apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata: { name: numint-public-lookup, namespace: sms-platform }
upstream:
  algorithm: least-connections
proxy:
  read_timeout: 5000
  connect_timeout: 2000
  retries: 0
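# Shorthand: KongIngress has no plugins field. In Kong Ingress Controller each entry
# below is a KongPlugin resource attached through the konghq.com/plugins annotation.
plugins:
  - { name: jwt }
  - { name: rate-limiting-advanced, config: { limit: [600], window_size: [60], strategy: redis } }
  - { name: cors }
  - { name: bot-detection }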
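
The plugin list above can be sketched as standalone KongPlugin resources, which is how Kong Ingress Controller attaches plugins to a route. This shows two of the four plugins. The resource names are hypothetical, and the Redis connection settings that the redis strategy needs are omitted because they are not given in this document.

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata: { name: numint-lookup-jwt, namespace: sms-platform }          # hypothetical name
plugin: jwt
---
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata: { name: numint-lookup-rate-limit, namespace: sms-platform }   # hypothetical name
plugin: rate-limiting-advanced
config:
  limit: [600]          # requests per window
  window_size: [60]     # seconds
  strategy: redis       # Redis connection settings omitted (not specified here)
---
# Fragment: annotation on the Ingress (or Service) exposing the Public Lookup API.
metadata:
  annotations:
    konghq.com/plugins: numint-lookup-jwt,numint-lookup-rate-limit
```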

4. ConfigMap

apiVersion: v1
kind: ConfigMap
metadata: { name: numint-config, namespace: sms-platform }
data:
  cron.mnp_recon.daily_at: "02:30"   # Asia/Kabul
  cron.eir_recon.daily_at: "03:30"
  cron.audit_verifier.daily_at: "04:30"
  cron.cache_warm.hourly: "*/60"
  ttl.line_type_seconds: "2592000"   # 30d
  ttl.mno_seconds: "86400"           # 24h
  ttl.vlr_seconds: "300"             # 5m
  ttl.eir_seconds: "86400"
  budget.lookup_total_ms: "12"
  hlr.map.timeout_ms: "1500"
  hlr.rest.timeout_ms: "800"
  hlr.pcap_sample_rate: "0.001"
  prefix_table_csv_url: "vault:secret/ghasi/numint/prefix-table#csv"

5. NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: numint-ingress, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: number-intelligence-service } }
  policyTypes: [Ingress]
  ingress:
    - from:
        - { podSelector: { matchLabels: { app: routing-engine } } }
        - { podSelector: { matchLabels: { app: sms-firewall-service } } }
        - { podSelector: { matchLabels: { app: compliance-engine } } }
        - { podSelector: { matchLabels: { app: channel-router-service } } }
        - { podSelector: { matchLabels: { app: fraud-intel-service } } }
        - { podSelector: { matchLabels: { app: sms-orchestrator } } }
        - { namespaceSelector: { matchLabels: { name: kong } } }   # Public Lookup REST
      ports:
        - { port: 50073, protocol: TCP }
        - { port: 3073,  protocol: TCP }
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: numint-egress, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: number-intelligence-service } }
  policyTypes: [Egress]
  egress:
    - to:
        - { podSelector: { matchLabels: { component: postgres-numint } } }
        - { podSelector: { matchLabels: { component: redis-numint } } }
        - { namespaceSelector: { matchLabels: { name: nats } } }
        - { namespaceSelector: { matchLabels: { name: vault } } }
        - { podSelector: { matchLabels: { app: ni-hlr-gateway } } }
        - { podSelector: { matchLabels: { app: minio } } }
    - to:
        - { ipBlock: { cidr: 10.0.0.0/8 } }   # internal cluster only
      ports:
        - { port: 443, protocol: TCP }   # NO public internet egress from hot path
    # No rule here allows DNS (port 53); name resolution depends on a separate policy permitting it.

A separate egress policy on number-intelligence-batch permits SFTP to per-MNO endpoints (Vault-pinned IP allowlist).
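
The batch egress policy mentioned above is not shown. A minimal sketch, assuming SFTP on the default TCP port 22 and one ipBlock entry per MNO endpoint. The policy name is hypothetical, and the addresses are placeholders from the RFC 5737 documentation ranges, not real MNO endpoints.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: numint-batch-egress-sftp, namespace: sms-platform }   # hypothetical name
spec:
  podSelector: { matchLabels: { app: number-intelligence-batch } }
  policyTypes: [Egress]
  egress:
    - to:
        # Placeholder addresses; the real per-MNO allowlist is pinned in Vault.
        - { ipBlock: { cidr: 203.0.113.10/32 } }
        - { ipBlock: { cidr: 198.51.100.24/32 } }
      ports:
        - { port: 22, protocol: TCP }   # SFTP (assumed default port)
```

Using /32 entries keeps the allowlist to single hosts, so a change to an MNO endpoint requires an explicit policy update.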

6. Istio AuthorizationPolicy

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: numint-spiffe-allowlist, namespace: sms-platform }
spec:
  selector: { matchLabels: { app: number-intelligence-service } }
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "spiffe://ghasi.platform/ns/routing/sa/routing-engine"
              - "spiffe://ghasi.platform/ns/firewall/sa/sms-firewall-service"
              - "spiffe://ghasi.platform/ns/compliance/sa/compliance-engine"
              - "spiffe://ghasi.platform/ns/router/sa/channel-router-service"
              - "spiffe://ghasi.platform/ns/fraud/sa/fraud-intel-service"
              - "spiffe://ghasi.platform/ns/orchestrator/sa/sms-orchestrator"
              - "spiffe://ghasi.platform/ns/gateway/sa/tenant-sdk-gateway"
      to:
        - operation: { ports: ["50073"] }

7. Infrastructure dependencies

Dependency	Spec
PostgreSQL (Patroni)	`numint` schema; 3-node cluster per region (kbl + mzr); 16 vCPU / 64 GB / 2 TB NVMe per node; PgBouncer in transaction-pooling mode (max_client_conn 5000, default_pool_size 50)
PostgreSQL read-replicas	2 per region for hot-path SELECTs
Redis cluster	6 nodes per region (3 primaries, each with 1 replica); cluster mode; 16 GB memory per primary; hash tags `{tenantId}` and `{msisdnHash}`
NATS JetStream	Shared platform cluster; streams `NUMBER_INTELLIGENCE_EVENTS`, `NUMINT_RECONCILIATION`, `NUMINT_EIR`, `NUMINT_AUDIT_OPS`, `NUMINT_BILLING`
Vault	KV (peppers, salts, MNO creds), Transit (PCAP KEK, audit signing), PKI (mTLS certs)
MinIO / S3	Buckets `numint-mnp-raw`, `numint-eir-raw`, `numint-hlr-pcap`, `numint-audit-cold`; SSE-KMS; Object Lock (governance, 7 y) on audit cold
Optional SS7 gateway	Per-MNO M3UA/SCTP point codes; SCCP GTs; provisioned in `ni-hlr-gateway` config map

8. Secrets (Vault Agent injection)

Secret	Vault path	Mounted as
gRPC server cert + key	`pki/ghasi-numint/issue/server`	File `/etc/tls/{tls.crt,tls.key,ca.crt}`
Postgres credentials	`database/creds/numint-app`	Env `DATABASE_URL`
Redis credentials	`secret/ghasi/numint/redis`	Env `REDIS_URL`
NATS credentials	`secret/ghasi/nats/numint`	File `/etc/nats/creds`
MSISDN pepper	`secret/ghasi/numint/msisdn_pepper`	Env (refresh on rotation)
IMEI pepper	`secret/ghasi/numint/imei_pepper`	Env
Per-tenant salts	`secret/ghasi/numint/tenant-salts/*`	Lazy fetch (cached ≤ 5 min)
MNO MNP SFTP keys	`secret/ghasi/numint/mno-sftp/*`	Mounted in batch pods only
MNO REST adapter creds	`secret/ghasi/numint/mno-rest/*`	Mounted in `ni-hlr-gateway` pods
PCAP KEK	`transit/ghasi-numint-pcap-kek`	Vault Transit (no export)
Audit chain signing key	`transit/ghasi-numint-audit-signing`	Vault Transit (no export)
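
As an illustration of the file-mounted rows above, the following sketch shows Vault Agent injector annotations that would render the NATS credentials to /etc/nats/creds. It assumes the `secret/` mount is KV version 2 (hence the `data/` path segment and `.Data.data` in the template) and that the secret stores the credentials under a key named `creds`. Neither assumption is confirmed by this document.

```yaml
# Pod template annotations (fragment), alongside agent-inject and role.
annotations:
  # The suffix after "agent-inject-secret-" becomes the rendered file name (creds).
  vault.hashicorp.com/agent-inject-secret-creds: "secret/data/ghasi/nats/numint"
  # Render into /etc/nats instead of the default /vault/secrets.
  vault.hashicorp.com/secret-volume-path-creds: "/etc/nats"
  # Write only the raw value of the assumed "creds" key.
  vault.hashicorp.com/agent-inject-template-creds: |
    {{- with secret "secret/data/ghasi/nats/numint" -}}
    {{ .Data.data.creds }}
    {{- end -}}
```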

9. Multi-region posture (per ADR-0004 §14)

              ┌───────────────────────┐         ┌───────────────────────┐
              │   af-kabul-1 (RW)     │ stream  │   af-mzr-1 (RW)       │
              │  ─────────────────── │ ◄──────►│  ─────────────────── │
              │  6× hot-path pods     │  sync   │  6× hot-path pods     │
              │  2× batch (leader)    │         │  2× batch (warm-idle) │
              │  1× hlr-gw / node     │         │  1× hlr-gw / node     │
              │  Postgres primary     │         │  Postgres primary     │
              │  Redis cluster        │         │  Redis cluster        │
              └───────────────────────┘         └───────────────────────┘
                       ▲                                   ▲
                       └────── routing-engine reads ───────┘
                                  (region-local)

Both regions actively serve hot reads. Writes go to the region-local Postgres primary; the two primaries replicate to each other, and conflicts are resolved by the per-aggregate policy in SYNC_CONTRACT §4. Batch jobs run in Kabul only, under the leader lock; Mazar takes over only on an extended Kabul outage.

10. Rollout strategy

Image pinned by digest. Promotion path: staging → canary 10 % in kbl for 1 h → canary 50 % for 1 h → 100 % in both regions.
Cache warm runs on every replica; the readiness gate withholds traffic until the cache is ≥ 80 % warmed.
No deploys during MNP-recon hours (02:00 – 05:00 Asia/Kabul) without on-call approval.
Rollback: re-pin the image to the previous digest (kubectl rollout undo). Cache invalidation is not required: the LRU rebuilds and Redis keys are unchanged.

11. Observability binding

Prometheus scrapes /metrics on the HTTP port (3073); the ServiceMonitor is created by the Helm chart.
Grafana dashboards numint-hot-path.json, numint-mnp-eir.json, numint-adapter.json, numint-public-api.json, numint-audit.json deployed via the Grafana operator.
Alerts in OBSERVABILITY §3.
Tempo/Jaeger receives OTLP from all NI pods.
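
The ServiceMonitor generated by the Helm chart is not reproduced here. A minimal sketch of what it might look like, assuming the Prometheus Operator CRDs are installed, the number-intelligence-service Service carries an app label (the Service in section 2 defines none, so one would have to be added), and a 30 s scrape interval:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata: { name: number-intelligence-service, namespace: sms-platform }
spec:
  selector:
    matchLabels: { app: number-intelligence-service }   # assumes the Service carries this label
  endpoints:
    - port: http          # named Service port (3073)
      path: /metrics
      interval: 30s       # assumed scrape interval
```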
