
Number Intelligence Service — Deployment Topology

Version: 1.0
Status: Draft
Owner: Messaging Core / Platform SRE
Last Updated: 2026-04-21
Companion: SECURITY_MODEL · OBSERVABILITY · ADR-0004 §14

1. Kubernetes resources

1.1 Namespace and labels

Namespace: sms-platform. Resources carry the labels tier=control-plane-adjacent (NI is read-heavy active-active, not strict primary-standby) and service=number-intelligence-service for ADR-0004 §14 region selection. The exception is the ni-hlr-gateway DaemonSet, which is labelled tier=data-plane.

1.2 Workloads

NI deploys as two Deployments + one DaemonSet:

  • number-intelligence-service Deployment: gRPC + REST API (HPA on grpc_inflight_requests + cpu).
  • number-intelligence-batch Deployment: MNP / EIR reconciliation, audit verifier, partition maintainer (one active replica under leader election plus a warm standby; pinned to the kbl region).
  • ni-hlr-gateway DaemonSet: one pod per signalling node (workload-class=signalling); holds SIGTRAN sockets and per-MNO REST connections.

1.3 Hot-path Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: number-intelligence-service
  namespace: sms-platform
  labels: { app: number-intelligence-service, tier: control-plane-adjacent }
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxUnavailable: 1, maxSurge: 2 }
  selector: { matchLabels: { app: number-intelligence-service } }
  template:
    metadata:
      labels: { app: number-intelligence-service, tier: control-plane-adjacent }
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3073"
        prometheus.io/path: "/metrics"
        sidecar.istio.io/inject: "true"
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "number-intelligence-service"
    spec:
      serviceAccountName: number-intelligence-service-sa
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector: { matchLabels: { app: number-intelligence-service } }
                topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-kabul-1, af-mzr-1] }
                  - { key: workload-class, operator: In, values: [hot-path] }
      containers:
        - name: number-intelligence-service
          image: ghcr.io/ghasi/number-intelligence-service:1.0.0
          ports:
            - { containerPort: 50073, name: grpc }
            - { containerPort: 3073, name: http }
          env:
            - { name: NODE_ENV, value: production }
            - { name: LOG_LEVEL, value: info }
            - { name: GRPC_PORT, value: "50073" }
            - { name: HTTP_PORT, value: "3073" }
            - { name: GRPC_TLS_ENABLED, value: "true" }
            - { name: TLS_CERT_PATH, value: /etc/tls/tls.crt }
            - { name: TLS_KEY_PATH, value: /etc/tls/tls.key }
            - { name: TLS_CA_PATH, value: /etc/tls/ca.crt }
            - name: DATABASE_URL
              valueFrom: { secretKeyRef: { name: numint-db, key: url } }
            - name: DATABASE_REPLICA_URL
              valueFrom: { secretKeyRef: { name: numint-db, key: replica_url } }
            - name: REDIS_URL
              valueFrom: { secretKeyRef: { name: numint-redis, key: url } }
            - name: NATS_URL
              valueFrom: { secretKeyRef: { name: nats-credentials, key: url } }
            - name: NATS_CREDS_PATH
              value: /etc/nats/creds
            - name: VAULT_ADDR
              value: https://vault.platform.svc.cluster.local:8200
            - name: HLR_GATEWAY_ENDPOINT
              value: dns:///ni-hlr-gateway.sms-platform.svc.cluster.local:50074
            - name: REGION
              # topology.kubernetes.io/region is normally a Node label, not a Pod label; this fieldRef
              # resolves only if that label is also present on the Pod itself.
              valueFrom: { fieldRef: { fieldPath: "metadata.labels['topology.kubernetes.io/region']" } }
            - name: CACHE_WARM_TARGET
              value: "500000"
            - name: LRU_MAX_ENTRIES
              value: "100000"
          readinessProbe:
            httpGet: { path: /health/ready, port: 3073 }
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet: { path: /health/live, port: 3073 }
            initialDelaySeconds: 15
            periodSeconds: 10
          resources:
            requests: { cpu: "1500m", memory: "2Gi" }
            limits: { cpu: "4000m", memory: "4Gi" }
          lifecycle:
            preStop:
              exec: { command: ["/bin/sh", "-c", "sleep 15"] }
      terminationGracePeriodSeconds: 30 # pod-level field; includes the 15 s preStop sleep, leaving ~15 s to drain after SIGTERM
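LRU_MAX_ENTRIES (100000 above) caps the in-process cache each replica holds; §10 notes that this LRU rebuilds on its own after a rollback. The sketch below is a minimal illustration, assuming a Map-based LRU with a per-entry TTL; the class name, the injectable clock and the TTL handling are hypothetical rather than taken from the NI codebase.

```typescript
// Hypothetical bounded LRU with per-entry TTL, sized by LRU_MAX_ENTRIES.
// A Map iterates in insertion order, so its first key is the least recently used.
class BoundedLru<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(maxEntries: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired entries count as misses
      return undefined;
    }
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string; // least recently used
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A replica would construct it as new BoundedLru<string>(100_000) and store each kind of data with its own TTL, for example ttl.vlr_seconds (300) for VLR results. Re-inserting a key on every hit moves it to the end of the Map's iteration order, which keeps get and set O(1) without a separate linked list.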

1.4 HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: number-intelligence-service, namespace: sms-platform }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: number-intelligence-service }
  minReplicas: 6
  maxReplicas: 30
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 65 } }
    - type: Pods
      # Pods metrics are read from the custom.metrics.k8s.io API (requires an adapter such as prometheus-adapter)
      pods: { metric: { name: grpc_inflight_requests }, target: { type: AverageValue, averageValue: "1500" } }
  behavior:
    scaleDown: { stabilizationWindowSeconds: 300, policies: [{ type: Percent, value: 25, periodSeconds: 60 }] }
    scaleUp: { stabilizationWindowSeconds: 30, policies: [{ type: Percent, value: 100, periodSeconds: 30 }] }
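The HPA acts on the larger of the CPU and in-flight proposals, and the behavior policies then bound how fast it moves. A simplified sketch of that arithmetic follows; it assumes all pods are ready, all metrics are present and no earlier scaling happened within the policy period, so it will not match the controller in every case. Function and constant names are illustrative.

```typescript
// Simplified model of the autoscaling/v2 replica calculation for the HPA above.
// Assumptions: every pod is ready, all metrics are available, and no scaling events
// happened earlier in the current policy period (stabilization windows are ignored).
interface MetricSample {
  current: number; // observed per-pod average (CPU % of request, or grpc_inflight_requests)
  target: number; // averageUtilization (65) or averageValue (1500)
}

const TOLERANCE = 0.1; // kube-controller-manager default --horizontal-pod-autoscaler-tolerance
const SCALE_UP_PERCENT = 100; // behavior.scaleUp: Percent 100 per 30 s
const SCALE_DOWN_PERCENT = 25; // behavior.scaleDown: Percent 25 per 60 s

function proposal(currentReplicas: number, m: MetricSample): number {
  const ratio = m.current / m.target;
  if (Math.abs(ratio - 1) <= TOLERANCE) return currentReplicas; // close enough: no change
  return Math.ceil(currentReplicas * ratio);
}

function desiredReplicas(currentReplicas: number, metrics: MetricSample[], minReplicas = 6, maxReplicas = 30): number {
  // With several metrics the HPA takes the largest proposal.
  let desired = Math.max(...metrics.map((m) => proposal(currentReplicas, m)));
  if (desired > currentReplicas) {
    // Percent policy on scale-up: at most (1 + 100 %) of the current count per period.
    desired = Math.min(desired, Math.ceil(currentReplicas * (1 + SCALE_UP_PERCENT / 100)));
  } else if (desired < currentReplicas) {
    // Percent policy on scale-down: remove at most 25 % per period (rounded down).
    desired = Math.max(desired, Math.floor(currentReplicas * (1 - SCALE_DOWN_PERCENT / 100)));
  }
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}
```

For example, desiredReplicas(6, [{ current: 260, target: 65 }, { current: 1500, target: 1500 }]) returns 12: CPU alone proposes 24 replicas, but the scaleUp policy allows at most a doubling per 30 s period.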

1.5 PodDisruptionBudget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: number-intelligence-service-pdb, namespace: sms-platform }
spec:
  minAvailable: 4 # per region; limits voluntary evictions (node drains, upgrades). Rolling updates are bounded by maxUnavailable, not by the PDB.
  selector: { matchLabels: { app: number-intelligence-service } }
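The PDB and the rolling-update settings limit different things: the PDB caps voluntary evictions such as node drains, while maxUnavailable and maxSurge bound a rollout. The helpers below (hypothetical names) spell out both for the values in §1.3 and §1.5.

```typescript
// Sketch of the two independent limits on hot-path pod loss (illustrative helper names).

// PodDisruptionBudget: how many more voluntary evictions (e.g. node drains) are allowed now.
function disruptionsAllowed(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

// RollingUpdate with absolute maxUnavailable/maxSurge: pod-count bounds during a rollout.
function rollingUpdateBounds(replicas: number, maxUnavailable: number, maxSurge: number) {
  return {
    minReady: replicas - maxUnavailable, // ready pods never drop below this
    maxPods: replicas + maxSurge, // old + new pods never exceed this
  };
}
```

With 6 healthy replicas a drain may evict 2 pods, and a rollout keeps at least 5 ready and at most 8 in total. Because minAvailable is absolute, the eviction budget grows as the HPA scales out: at 30 replicas up to 26 pods may be evicted. policy/v1 also accepts a percentage for minAvailable if the budget should scale with the replica count.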

1.6 Batch Deployment (single-leader)

apiVersion: apps/v1
kind: Deployment
metadata: { name: number-intelligence-batch, namespace: sms-platform }
spec:
  replicas: 2 # leader-election; 1 active, 1 warm-standby
  selector: { matchLabels: { app: number-intelligence-batch } }
  template:
    metadata: { labels: { app: number-intelligence-batch, tier: control-plane-adjacent } }
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-kabul-1] }
                  - { key: workload-class, operator: In, values: [batch] }
      containers:
        - name: batch
          image: ghcr.io/ghasi/number-intelligence-service:1.0.0
          command: ["node", "dist/batch/main.js"]
          env:
            - { name: BATCH_LEADER_LOCK_KEY, value: "numint:lock:batch-leader" }
            # MNP/EIR cron schedules read from ConfigMap
          resources:
            requests: { cpu: "1000m", memory: "2Gi" }
            limits: { cpu: "2000m", memory: "4Gi" }

A Mazar-region warm standby (also replicas: 2, pinned to af-mzr-1) contends for the same Redis lock but acquires it only after the lock has gone unheld for more than 10 minutes.
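The leader election above can be sketched as a lease on BATCH_LEADER_LOCK_KEY with an extra waiting rule for Mazar. This is a minimal model under stated assumptions: FakeRedis is an in-memory stand-in for SET NX PX and an owner-checked renewal, the 30 s lease length is invented for illustration, and both regions are assumed to reach the same lock store, as the text implies. "Unheld for more than 10 minutes" is approximated by how long the Mazar candidate has continuously observed the lock free.

```typescript
// In-memory stand-in for the two Redis operations the lock needs (hypothetical, not a client):
// SET key value NX PX ttl, and an owner-checked renewal, which on real Redis has to be an
// atomic Lua script (compare the stored value, then PEXPIRE).
class FakeRedis {
  private readonly store = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  private live(key: string): { value: string; expiresAt: number } | undefined {
    const e = this.store.get(key);
    if (e !== undefined && e.expiresAt <= this.now()) {
      this.store.delete(key); // lease expired
      return undefined;
    }
    return e;
  }

  setNxPx(key: string, value: string, ttlMs: number): boolean {
    if (this.live(key) !== undefined) return false;
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
    return true;
  }

  renewIfOwner(key: string, value: string, ttlMs: number): boolean {
    const e = this.live(key);
    if (e === undefined || e.value !== value) return false;
    e.expiresAt = this.now() + ttlMs;
    return true;
  }

  isHeld(key: string): boolean {
    return this.live(key) !== undefined;
  }
}

const LOCK_KEY = "numint:lock:batch-leader"; // BATCH_LEADER_LOCK_KEY
const LEASE_MS = 30_000; // hypothetical lease length
const MAZAR_TAKEOVER_MS = 10 * 60_000; // "unheld > 10 min"

class BatchCandidate {
  private freeSince: number | undefined;
  private readonly id: string;
  private readonly region: string;
  private readonly redis: FakeRedis;
  private readonly now: () => number;

  constructor(id: string, region: string, redis: FakeRedis, now: () => number) {
    this.id = id;
    this.region = region;
    this.redis = redis;
    this.now = now;
  }

  // One election tick; returns true if this candidate is the leader after the tick.
  tick(): boolean {
    if (this.redis.renewIfOwner(LOCK_KEY, this.id, LEASE_MS)) return true;
    if (this.redis.isHeld(LOCK_KEY)) {
      this.freeSince = undefined; // another candidate leads
      return false;
    }
    const t = this.now();
    if (this.freeSince === undefined) this.freeSince = t;
    // Kabul candidates take a free lock at once; Mazar only after observing it free > 10 min.
    if (this.region === "af-mzr-1" && t - this.freeSince <= MAZAR_TAKEOVER_MS) return false;
    return this.redis.setNxPx(LOCK_KEY, this.id, LEASE_MS);
  }
}
```

In this model the second Kabul replica takes over as soon as the lease lapses, so Mazar only ever wins when every Kabul candidate has stopped renewing for over 10 minutes.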

1.7 ni-hlr-gateway DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata: { name: ni-hlr-gateway, namespace: sms-platform }
spec:
  selector: { matchLabels: { app: ni-hlr-gateway } }
  template:
    metadata: { labels: { app: ni-hlr-gateway, tier: data-plane } }
    spec:
      hostNetwork: true # required for SCTP M3UA on dedicated NIC
      dnsPolicy: ClusterFirstWithHostNet # otherwise hostNetwork pods use the node's resolver and typically cannot resolve cluster Service names
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - { key: workload-class, operator: In, values: [signalling] }
                  - { key: topology.kubernetes.io/region, operator: In, values: [af-kabul-1, af-mzr-1] }
      containers:
        - name: gateway
          image: ghcr.io/ghasi/ni-hlr-gateway:1.0.0
          ports:
            - { containerPort: 50074, name: grpc, protocol: TCP }
            - { containerPort: 2905, name: m3ua, protocol: SCTP }
          env:
            - { name: SIGTRAN_LOCAL_PC, valueFrom: { secretKeyRef: { name: numint-sigtran, key: local_pc } } }
            - { name: PCAP_KMS_KEY, value: numint-pcap-kek }
          securityContext:
            capabilities: { add: [NET_ADMIN, NET_RAW] }
          resources:
            requests: { cpu: "500m", memory: "512Mi" }
            limits: { cpu: "1500m", memory: "1Gi" }

2. Services

apiVersion: v1
kind: Service
metadata: { name: number-intelligence-service, namespace: sms-platform }
spec:
  type: ClusterIP
  ports:
    - { name: grpc, port: 50073, targetPort: 50073 }
    - { name: http, port: 3073, targetPort: 3073 }
  selector: { app: number-intelligence-service }
---
apiVersion: v1
kind: Service
metadata: { name: ni-hlr-gateway, namespace: sms-platform }
spec:
  type: ClusterIP
  clusterIP: None # headless: gRPC clients DNS-resolve all gateway pods
  ports:
    - { name: grpc, port: 50074, targetPort: 50074 }
  selector: { app: ni-hlr-gateway }

3. Kong route (Public Lookup API)

apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata: { name: numint-public-lookup, namespace: sms-platform }
upstream:
  algorithm: least-connections
proxy:
  read_timeout: 5000
  connect_timeout: 2000
  retries: 0
# KongIngress itself has no plugins field: with the Kong Ingress Controller each entry below is
# declared as a KongPlugin resource and attached via the konghq.com/plugins annotation.
plugins:
  - jwt
  - { name: rate-limiting-advanced, config: { limit: [600], window_size: [60], strategy: redis } }
  - { name: cors }
  - { name: bot-detection }
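The rate-limiting-advanced entry above allows 600 requests per 60 s window, with counters shared through Redis (strategy: redis). The plugin supports a sliding window, which can be approximated by weighting the previous fixed window's count by the fraction of it still inside the sliding window. The class below is a single-process sketch of that technique with hypothetical names; it is not Kong's implementation, and Kong's exact counting and Redis synchronisation may differ.

```typescript
// Sliding-window approximation (illustrative; not Kong's source): the previous fixed window's
// count is weighted by how much of it still overlaps the sliding window.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly state = new Map<string, { windowStart: number; current: number; previous: number }>();

  constructor(limit: number, windowSeconds: number) {
    this.limit = limit;
    this.windowMs = windowSeconds * 1000;
  }

  // Returns true and counts the request if it fits under the limit for this consumer key.
  allow(key: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let s = this.state.get(key);
    if (s === undefined) {
      s = { windowStart, current: 0, previous: 0 };
      this.state.set(key, s);
    } else if (s.windowStart !== windowStart) {
      // Roll forward; if a whole window was skipped, the previous window saw no traffic.
      s.previous = windowStart - s.windowStart === this.windowMs ? s.current : 0;
      s.current = 0;
      s.windowStart = windowStart;
    }
    const elapsed = (nowMs - windowStart) / this.windowMs; // fraction of the current window used
    const estimate = s.previous * (1 - elapsed) + s.current;
    if (estimate >= this.limit) return false; // rejected requests are not counted
    s.current += 1;
    return true;
  }
}
```

With new SlidingWindowLimiter(600, 60), a burst of 600 requests early in one window keeps that consumer throttled at the start of the next, and capacity returns linearly as the old window slides out: halfway through the next window, 300 requests are available again.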

4. ConfigMap

apiVersion: v1
kind: ConfigMap
metadata: { name: numint-config, namespace: sms-platform }
data:
  cron.mnp_recon.daily_at: "02:30" # Asia/Kabul
  cron.eir_recon.daily_at: "03:30"
  cron.audit_verifier.daily_at: "04:30"
  cron.cache_warm.hourly: "*/60"
  ttl.line_type_seconds: "2592000" # 30d
  ttl.mno_seconds: "86400" # 24h
  ttl.vlr_seconds: "300" # 5m
  ttl.eir_seconds: "86400"
  budget.lookup_total_ms: "12"
  hlr.map.timeout_ms: "1500"
  hlr.rest.timeout_ms: "800"
  hlr.pcap_sample_rate: "0.001"
  prefix_table_csv_url: "vault:secret/ghasi/numint/prefix-table#csv"
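ConfigMap data values are always strings, so the service has to parse them. A minimal typed loader might look like the following; the NumintConfig shape and the validation rules are assumptions for illustration, and only the key names come from the ConfigMap above.

```typescript
// Hypothetical typed view of numint-config; key names follow the ConfigMap above.
interface NumintConfig {
  ttlSeconds: { lineType: number; mno: number; vlr: number; eir: number };
  lookupBudgetMs: number;
  hlrMapTimeoutMs: number;
  hlrRestTimeoutMs: number;
  pcapSampleRate: number;
}

type ConfigData = Record<string, string | undefined>;

// Parse one numeric value, rejecting missing, blank or non-numeric strings.
function num(data: ConfigData, key: string): number {
  const raw = data[key];
  const n = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
  if (!Number.isFinite(n)) throw new Error(`numint-config: ${key} is not a number: ${String(raw)}`);
  return n;
}

function parseNumintConfig(data: ConfigData): NumintConfig {
  const cfg: NumintConfig = {
    ttlSeconds: {
      lineType: num(data, "ttl.line_type_seconds"),
      mno: num(data, "ttl.mno_seconds"),
      vlr: num(data, "ttl.vlr_seconds"),
      eir: num(data, "ttl.eir_seconds"),
    },
    lookupBudgetMs: num(data, "budget.lookup_total_ms"),
    hlrMapTimeoutMs: num(data, "hlr.map.timeout_ms"),
    hlrRestTimeoutMs: num(data, "hlr.rest.timeout_ms"),
    pcapSampleRate: num(data, "hlr.pcap_sample_rate"),
  };
  for (const [field, ttl] of Object.entries(cfg.ttlSeconds)) {
    if (ttl <= 0) throw new Error(`numint-config: TTL ${field} must be positive`);
  }
  if (cfg.pcapSampleRate < 0 || cfg.pcapSampleRate > 1) {
    throw new Error("numint-config: hlr.pcap_sample_rate must be within [0, 1]");
  }
  return cfg;
}
```

Failing fast at startup on a malformed TTL or an out-of-range hlr.pcap_sample_rate is cheaper than discovering it on the lookup path.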

5. NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: numint-ingress, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: number-intelligence-service } }
  policyTypes: [Ingress]
  ingress:
    # podSelector-only peers match pods in sms-platform only; callers running in other namespaces
    # (as the SPIFFE IDs in §6 suggest) need a namespaceSelector in the same peer entry.
    - from:
        - { podSelector: { matchLabels: { app: routing-engine } } }
        - { podSelector: { matchLabels: { app: sms-firewall-service } } }
        - { podSelector: { matchLabels: { app: compliance-engine } } }
        - { podSelector: { matchLabels: { app: channel-router-service } } }
        - { podSelector: { matchLabels: { app: fraud-intel-service } } }
        - { podSelector: { matchLabels: { app: sms-orchestrator } } }
        - { namespaceSelector: { matchLabels: { name: kong } } } # Public Lookup REST
      ports:
        - { port: 50073, protocol: TCP }
        - { port: 3073, protocol: TCP }
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: numint-egress, namespace: sms-platform }
spec:
  podSelector: { matchLabels: { app: number-intelligence-service } }
  policyTypes: [Egress]
  egress:
    - to:
        - { podSelector: { matchLabels: { component: postgres-numint } } }
        - { podSelector: { matchLabels: { component: redis-numint } } }
        - { namespaceSelector: { matchLabels: { name: nats } } }
        - { namespaceSelector: { matchLabels: { name: vault } } }
        - { podSelector: { matchLabels: { app: ni-hlr-gateway } } }
        - { podSelector: { matchLabels: { app: minio } } }
    - to:
        - { ipBlock: { cidr: 10.0.0.0/8 } } # internal cluster only
      ports:
        - { port: 443, protocol: TCP } # NO public internet egress from hot path
    # DNS: without this rule, name resolution (VAULT_ADDR, HLR_GATEWAY_ENDPOINT) is blocked unless a
    # cluster-wide policy already permits it. Assumes CoreDNS pods carry k8s-app=kube-dns.
    - to:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: kube-system } }
          podSelector: { matchLabels: { k8s-app: kube-dns } }
      ports:
        - { port: 53, protocol: UDP }
        - { port: 53, protocol: TCP }

A separate egress policy on number-intelligence-batch permits SFTP to per-MNO endpoints (Vault-pinned IP allowlist).
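NetworkPolicy rules are additive: a connection is allowed if any rule matches, any peer in a rule's to list can match, and a rule without ports allows every port. For numint-egress this means the first rule opens all ports to the listed peers, while the 10.0.0.0/8 rule opens only TCP 443. The simplified evaluator below encodes the first two numint-egress rules to make this concrete; it is an illustration with hypothetical names, not how a CNI plugin enforces policy.

```typescript
// Simplified NetworkPolicy egress evaluator (illustrative): label and IPv4 CIDR matching only;
// ignores ipBlock.except, endPort and combined namespace+pod selectors.
type Labels = Record<string, string>;

interface Destination {
  namespace: string;
  namespaceLabels: Labels;
  podLabels: Labels; // empty for non-pod destinations
  ip: string; // dotted IPv4
}

type Peer =
  | { kind: "pod"; matchLabels: Labels } // podSelector alone: pods in the policy's namespace
  | { kind: "namespace"; matchLabels: Labels } // namespaceSelector alone: all pods in matching namespaces
  | { kind: "ipBlock"; cidr: string };

interface EgressRule {
  to: Peer[];
  ports?: { port: number; protocol: "TCP" | "UDP" }[]; // omitted = all ports
}

function labelsMatch(want: Labels, have: Labels): boolean {
  return Object.keys(want).every((k) => have[k] === want[k]);
}

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split("/");
  const size = Math.pow(2, 32 - Number(bits));
  const start = Math.floor(ipv4ToInt(base) / size) * size;
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + size;
}

function peerMatches(peer: Peer, dst: Destination, policyNamespace: string): boolean {
  if (peer.kind === "pod") {
    return dst.namespace === policyNamespace && labelsMatch(peer.matchLabels, dst.podLabels);
  }
  if (peer.kind === "namespace") return labelsMatch(peer.matchLabels, dst.namespaceLabels);
  return inCidr(dst.ip, peer.cidr);
}

// Rules are OR'd; within a rule any peer may match, and the port list (if present) must match.
function egressAllowed(rules: EgressRule[], dst: Destination, port: number, protocol: "TCP" | "UDP", policyNamespace: string): boolean {
  return rules.some(
    (r) =>
      r.to.some((p) => peerMatches(p, dst, policyNamespace)) &&
      (r.ports === undefined || r.ports.some((p) => p.port === port && p.protocol === protocol)),
  );
}

// The first two egress rules of numint-egress.
const numintEgress: EgressRule[] = [
  {
    to: [
      { kind: "pod", matchLabels: { component: "postgres-numint" } },
      { kind: "pod", matchLabels: { component: "redis-numint" } },
      { kind: "namespace", matchLabels: { name: "nats" } },
      { kind: "namespace", matchLabels: { name: "vault" } },
      { kind: "pod", matchLabels: { app: "ni-hlr-gateway" } },
      { kind: "pod", matchLabels: { app: "minio" } },
    ],
  },
  { to: [{ kind: "ipBlock", cidr: "10.0.0.0/8" }], ports: [{ port: 443, protocol: "TCP" }] },
];
```

In this model a Postgres pod in sms-platform is reachable on any port, an arbitrary 10.x address only on TCP 443, and a public address such as 8.8.8.8 not at all.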

6. Istio AuthorizationPolicy

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: numint-spiffe-allowlist, namespace: sms-platform }
spec:
  selector: { matchLabels: { app: number-intelligence-service } }
  action: ALLOW
  rules:
    - from:
        - source:
            # Istio principals use <trust-domain>/ns/<namespace>/sa/<service-account>, without the spiffe:// scheme.
            principals:
              - "ghasi.platform/ns/routing/sa/routing-engine"
              - "ghasi.platform/ns/firewall/sa/sms-firewall-service"
              - "ghasi.platform/ns/compliance/sa/compliance-engine"
              - "ghasi.platform/ns/router/sa/channel-router-service"
              - "ghasi.platform/ns/fraud/sa/fraud-intel-service"
              - "ghasi.platform/ns/orchestrator/sa/sms-orchestrator"
              - "ghasi.platform/ns/gateway/sa/tenant-sdk-gateway"
      # With action ALLOW, requests matching no rule are denied, including any request to port 3073.
      to:
        - operation: { ports: ["50073"] }
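Once an ALLOW policy selects a workload, Istio denies any request that matches none of its rules. The sketch below models that for this policy, using Istio's principal form <trust-domain>/ns/<namespace>/sa/<service-account>; it is a simplification (exact matches only, no DENY or CUSTOM policies, no wildcards) with hypothetical names.

```typescript
// Simplified evaluation of one ALLOW AuthorizationPolicy (illustrative): exact principal and
// destination-port matching only.
interface AuthzRequest {
  principal: string | null; // mTLS peer identity; null when the caller presents no mesh certificate
  port: number;
}

interface AllowRule {
  principals: string[];
  ports: number[];
}

// With action ALLOW, a request is permitted only if it matches at least one rule.
function allowedByPolicy(rules: AllowRule[], req: AuthzRequest): boolean {
  return rules.some(
    (r) => req.principal !== null && r.principals.includes(req.principal) && r.ports.includes(req.port),
  );
}

const numintAllowlist: AllowRule[] = [
  {
    principals: [
      "ghasi.platform/ns/routing/sa/routing-engine",
      "ghasi.platform/ns/firewall/sa/sms-firewall-service",
      "ghasi.platform/ns/compliance/sa/compliance-engine",
      "ghasi.platform/ns/router/sa/channel-router-service",
      "ghasi.platform/ns/fraud/sa/fraud-intel-service",
      "ghasi.platform/ns/orchestrator/sa/sms-orchestrator",
      "ghasi.platform/ns/gateway/sa/tenant-sdk-gateway",
    ],
    ports: [50073],
  },
];
```

Under these semantics a request to the REST port 3073, which §5 opens to Kong for the Public Lookup API, matches no rule and is denied, as is any caller without a listed mesh identity.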

7. Infrastructure dependencies

| Dependency | Spec |
|---|---|
| PostgreSQL (Patroni) | numint schema; 3-node cluster per region (kbl + mzr); 16 vCPU / 64 GB / 2 TB NVMe per node; PgBouncer in transaction-pooling mode (max_client_conn 5000, default_pool_size 50) |
| PostgreSQL read replicas | 2 per region for hot-path SELECTs |
| Redis cluster | 6 nodes per region (3 primaries, each with 1 replica); cluster mode; 16 GB memory per primary; hash-slot tags {tenantId} and {msisdnHash} |
| NATS JetStream | Shared platform cluster; streams NUMBER_INTELLIGENCE_EVENTS, NUMINT_RECONCILIATION, NUMINT_EIR, NUMINT_AUDIT_OPS, NUMINT_BILLING |
| Vault | KV (peppers, salts, MNO creds), Transit (PCAP KEK, audit signing), PKI (mTLS certs) |
| MinIO / S3 | Buckets numint-mnp-raw, numint-eir-raw, numint-hlr-pcap, numint-audit-cold; SSE-KMS; Object Lock (governance mode, 7 years) on numint-audit-cold |
| Optional SS7 gateway | Per-MNO M3UA/SCTP point codes; SCCP GTs; provisioned in the ni-hlr-gateway ConfigMap |
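The hash-slot tags {tenantId} and {msisdnHash} rely on Redis Cluster key hashing: a key's slot is CRC16 of the key modulo 16384, and if the key contains a non-empty {...} section only that section is hashed, so keys sharing a tag land in the same slot and can be used together in multi-key commands. A sketch of that rule follows; the key names in the tests are hypothetical and the helper handles ASCII keys only.

```typescript
// CRC16-XMODEM (poly 0x1021, init 0x0000), the checksum Redis Cluster uses for key slots.
function crc16(bytes: number[]): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// This sketch only handles ASCII keys; Redis itself hashes the raw key bytes.
function asciiBytes(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c > 0x7f) throw new Error("sketch handles ASCII keys only");
    out.push(c);
  }
  return out;
}

// Hash only the text between the first "{" and the first "}" after it, if that text is non-empty.
function hashSlot(key: string): number {
  let hashed = key;
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close > open + 1) hashed = key.slice(open + 1, close);
  }
  return crc16(asciiBytes(hashed)) % 16384;
}
```

hashSlot("foo") returns 12182, the same value CLUSTER KEYSLOT foo reports, and every key of the form numint:{t-42}:... maps to the slot of "t-42".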

8. Secrets (Vault Agent injection)

| Secret | Vault path | Mounted as |
|---|---|---|
| gRPC server cert + key | pki/ghasi-numint/issue/server | File /etc/tls/{tls.crt,tls.key,ca.crt} |
| Postgres credentials | database/creds/numint-app | Env DATABASE_URL |
| Redis credentials | secret/ghasi/numint/redis | Env REDIS_URL |
| NATS credentials | secret/ghasi/nats/numint | File /etc/nats/creds |
| MSISDN pepper | secret/ghasi/numint/msisdn_pepper | Env (refresh on rotation) |
| IMEI pepper | secret/ghasi/numint/imei_pepper | Env |
| Per-tenant salts | secret/ghasi/numint/tenant-salts/* | Lazy fetch (cached ≤ 5 min) |
| MNO MNP SFTP keys | secret/ghasi/numint/mno-sftp/* | Mounted in batch pods only |
| MNO REST adapter creds | secret/ghasi/numint/mno-rest/* | Mounted in ni-hlr-gateway pods |
| PCAP KEK | transit/ghasi-numint-pcap-kek | Vault Transit (no export) |
| Audit chain signing key | transit/ghasi-numint-audit-signing | Vault Transit (no export) |
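Per-tenant salts are fetched lazily and cached for at most 5 minutes. A minimal sketch of such a cache follows, assuming fetchSalt wraps a Vault KV read under secret/ghasi/numint/tenant-salts/; the class and function names are hypothetical. Concurrent misses for one tenant share a single in-flight fetch, so a cold cache does not stampede Vault.

```typescript
// Hypothetical lazy, age-bounded cache for per-tenant salts. `fetchSalt` stands in for a
// Vault KV read; concurrent misses for the same tenant share one in-flight fetch.
type SaltFetcher = (tenantId: string) => Promise<string>;

class TenantSaltCache {
  private readonly cached = new Map<string, { salt: string; fetchedAt: number }>();
  private readonly inflight = new Map<string, Promise<string>>();
  private readonly fetchSalt: SaltFetcher;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(fetchSalt: SaltFetcher, maxAgeMs = 5 * 60_000, now: () => number = () => Date.now()) {
    this.fetchSalt = fetchSalt;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  get(tenantId: string): Promise<string> {
    const hit = this.cached.get(tenantId);
    if (hit !== undefined && this.now() - hit.fetchedAt < this.maxAgeMs) {
      return Promise.resolve(hit.salt); // fresh enough: no Vault call
    }
    const pending = this.inflight.get(tenantId);
    if (pending !== undefined) return pending; // join the fetch already in progress
    const p = this.fetchSalt(tenantId)
      .then((salt) => {
        this.cached.set(tenantId, { salt, fetchedAt: this.now() });
        return salt;
      })
      .finally(() => {
        this.inflight.delete(tenantId); // failures are not cached, so the next call retries
      });
    this.inflight.set(tenantId, p);
    return p;
  }
}
```

Because only successful fetches are stored, a transient Vault error surfaces to the current caller and the next lookup simply retries.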

9. Multi-region posture (per ADR-0004 §14)

┌───────────────────────┐         ┌───────────────────────┐
│ af-kabul-1 (RW)       │ stream  │ af-mzr-1 (RW)         │
│ ───────────────────── │ ◄─────► │ ───────────────────── │
│ 6× hot-path pods      │  sync   │ 6× hot-path pods      │
│ 2× batch (leader)     │         │ 2× batch (warm-idle)  │
│ 1× hlr-gw / node      │         │ 1× hlr-gw / node      │
│ Postgres primary      │         │ Postgres primary      │
│ Redis cluster         │         │ Redis cluster         │
└───────────────────────┘         └───────────────────────┘
            ▲                                 ▲
            └────── routing-engine reads ─────┘
                      (region-local)

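Both regions actively serve hot reads. Writes go to the region-local Postgres primary, and the two primaries replicate to each other (multi-master, which requires bidirectional logical replication rather than physical streaming replication, since streaming standbys are read-only; per-aggregate conflict policy in SYNC_CONTRACT §4). Batch jobs run in Kabul only, under the leader lock; Mazar takes over only during an extended Kabul outage.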

10. Rollout strategy

  • Images are pinned by digest. Promotion: staging → 10 % canary in kbl for 1 h → 50 % canary for 1 h → 100 % in both regions.
  • Every replica warms its cache on start; the readiness gate keeps it out of Service endpoints until it is ≥ 80 % warmed.
  • No deploy is permitted during MNP-recon hours (02:00 – 05:00 Asia/Kabul) without on-call approval.
  • Rollback: re-pin the image to the previous digest (kubectl rollout undo); no cache invalidation is required, because the in-process LRU rebuilds and Redis keys are unchanged.
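The readiness and deploy-freeze gates above reduce to simple checks. The sketch below is illustrative: it assumes the 80 % threshold is measured against CACHE_WARM_TARGET (500000 in §1.3) and treats the freeze as the half-open interval [02:00, 05:00) Kabul time; the function names are hypothetical.

```typescript
// Hypothetical helpers for two rollout gates.
const KABUL_UTC_OFFSET_MIN = 4 * 60 + 30; // Asia/Kabul is UTC+04:30 all year (no DST)

// Readiness gate: true once at least `thresholdPercent` of the warm target is loaded.
// Integer arithmetic avoids floating-point edge cases at the boundary.
function cacheWarmReady(warmedEntries: number, warmTarget: number, thresholdPercent = 80): boolean {
  return warmedEntries * 100 >= warmTarget * thresholdPercent;
}

// Deploy freeze: true if `instant` falls in [02:00, 05:00) Asia/Kabul local time.
function inMnpReconFreeze(instant: Date): boolean {
  const utcMinutes = instant.getUTCHours() * 60 + instant.getUTCMinutes();
  const kabulMinutes = (utcMinutes + KABUL_UTC_OFFSET_MIN) % (24 * 60);
  return kabulMinutes >= 2 * 60 && kabulMinutes < 5 * 60;
}
```

Because Asia/Kabul has a fixed UTC+04:30 offset, the freeze always starts at 21:30 UTC on the previous day, so no time-zone database is needed.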

11. Observability binding

  • Prometheus scrapes both ports; the ServiceMonitor is created by the Helm chart.
  • Grafana dashboards numint-hot-path.json, numint-mnp-eir.json, numint-adapter.json, numint-public-api.json, numint-audit.json deployed via the Grafana operator.
  • Alerts in OBSERVABILITY §3.
  • Tempo/Jaeger receives OTLP from all NI pods.