
numbering-service — Security Model

Version: 1.0
Status: Draft
Owner: Security + Commerce Engineering
Last Updated: 2026-04-21
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · ../../docs/13-security-compliance-tenancy.md


1. Authentication

1.1 gRPC plane — :50061

  • mTLS required — gRPC server accepts only connections presenting a client certificate signed by the platform CA (Vault PKI root).
  • Caller CN allowlist pins the exact set of authorised services and the RPCs each may invoke (see API_CONTRACTS §1.mTLS caller allowlist). Violations → UNAUTHENTICATED.
  • Client certs are mounted from Vault via the Vault Agent Sidecar Injector and rotated every 30 days; the server hot-reloads on file change.
  • Local-dev bypass via GRPC_TLS_ENABLED=false is permitted only when NODE_ENV = 'development'; a start-up guard refuses to boot with TLS disabled in any other environment.
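The start-up guard described in the last bullet could look roughly like this. It is a minimal sketch, not the service's actual code: the function name `assertTlsPolicy` and the error text are hypothetical, and only the `GRPC_TLS_ENABLED` and `NODE_ENV` variables come from the text above.

```typescript
// Hypothetical start-up guard: refuses to boot with TLS disabled outside development.
type Env = Record<string, string | undefined>;

function assertTlsPolicy(env: Env): void {
  // Assumption: TLS counts as enabled unless the variable is exactly "false".
  const tlsDisabled = env.GRPC_TLS_ENABLED === "false";
  const isDevelopment = env.NODE_ENV === "development";

  if (tlsDisabled && !isDevelopment) {
    throw new Error(
      `GRPC_TLS_ENABLED=false is only permitted when NODE_ENV=development (got ${env.NODE_ENV ?? "unset"})`,
    );
  }
}
```

Calling `assertTlsPolicy(process.env)` before the gRPC server binds its port would make a misconfigured deployment fail at boot rather than serve plaintext.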

1.2 REST plane — :3021

  • Kong validates the platform JWT (issued by auth-service, RS256, JWKS-backed). Missing / invalid tokens → 401 at the edge; numbering-service never parses JWTs directly.
  • Kong injects X-Tenant-Id, X-Account-Id, X-User-Id, X-Roles, X-Idp headers on authenticated upstream requests.
  • Tenant portal handlers run SET LOCAL app.current_tenant_id = <X-Tenant-Id> inside each request's transaction, enforcing Row-Level Security as a defence-in-depth layer.
  • Admin handlers run SET LOCAL app.caller_role = <platform-role> in the same way to activate the RLS admin-bypass policies.
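The per-request tenant context can be sketched as follows. Everything named here is an assumption for illustration: `Queryable` is a minimal interface modelled on the `query(text, values)` shape of common Postgres clients, and `tenantContextStatement` and `withTenantContext` are hypothetical helpers. The sketch uses `set_config(name, value, true)`, the function form of SET LOCAL, because it accepts a bind parameter whereas a plain SET statement does not.

```typescript
interface SqlStatement {
  text: string;
  values: string[];
}

// Minimal structural type for a database client (assumption, not a real driver type).
interface Queryable {
  query(text: string, values?: string[]): Promise<unknown>;
}

// Builds the statement that scopes the current transaction to one tenant.
// Throws when the Kong header is absent so no tenant-scoped query can run.
function tenantContextStatement(tenantId: string | undefined): SqlStatement {
  if (tenantId === undefined || tenantId.trim() === "") {
    throw new Error("X-Tenant-Id header missing; refusing to run tenant-scoped queries");
  }
  return {
    // Third argument true = local to the current transaction, like SET LOCAL.
    text: "SELECT set_config('app.current_tenant_id', $1, true)",
    values: [tenantId],
  };
}

// Runs `work` inside a transaction that carries the tenant setting.
async function withTenantContext<T>(
  client: Queryable,
  tenantId: string | undefined,
  work: () => Promise<T>,
): Promise<T> {
  const stmt = tenantContextStatement(tenantId); // throws before any SQL is sent
  await client.query("BEGIN");
  try {
    await client.query(stmt.text, stmt.values);
    const result = await work();
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is transaction-local, it disappears at COMMIT or ROLLBACK, so a pooled connection cannot carry one tenant's context into another request.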

1.3 IdP-agnostic

Per ADR-0002, numbering-service is indifferent to the IdP that issued the token (Keycloak default / tenant OIDC / tenant SAML / Firebase legacy). The idp claim is captured in numbering.audit for forensic purposes but does not affect authorisation.


2. Authorization (RBAC)

Operations require specific platform or tenant-scoped roles. Role definitions live in auth-service; numbering enforces scope/role at handler entry via NestJS Guards.

| Role | Capabilities |
|---|---|
| platform.numbering.admin | Full CRUD on pools, contracts, blocks; import MNO batches; recall/reinstate; quarantine override; regenerate regulator exports |
| platform.numbering.ops | Read all; quarantine admin-override (requires justification); bulk recall assist |
| platform.auditor | Read-only on numbers, leases, audit, regulator_exports; no state changes |
| platform.support | Read-only, heavily redacted view (no per-number lease history for other tenants) |
| Tenant sms:numbering:read | Read own tenant's pool, leases, reservations; browse AVAILABLE |
| Tenant sms:numbering:write | Reserve / Hold / Assign / Release on own tenant |
| Tenant sms:numbering:vanity | Lease vanity short codes (premium tier gate) |
| Tenant sms:numbering:billing | Read lease billing terms, trigger renewal |

Enforcement points

  1. NestJS RoleGuard — runs first, rejects with 403 INSUFFICIENT_SCOPE.
  2. Per-handler @RequireRoles(...) decorator — declarative and contract-tested.
  3. Postgres RLS on numbers, leases, tenant_pools — final defence for tenant isolation. Without app.current_tenant_id, SELECTs return zero rows for tenant-scoped queries.
  4. gRPC CallerAllowlistInterceptor — matches client certificate CN against the per-RPC allowlist.
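The two application-level checks above, role matching and per-RPC CN matching, reduce to small predicates. The sketch below is illustrative only: it assumes the X-Roles header is a comma-separated list (the format is not specified here), and the names `parseRoles`, `hasAnyRole`, `CallerAllowlist` and `isCallerAllowed` are hypothetical rather than taken from the service.

```typescript
// Parses an X-Roles header into a set of role names.
// Assumption: roles are comma-separated; a missing header yields an empty set.
function parseRoles(header: string | undefined): Set<string> {
  return new Set(
    (header ?? "")
      .split(",")
      .map((role) => role.trim())
      .filter((role) => role.length > 0),
  );
}

// True when the caller holds at least one of the roles a handler requires.
function hasAnyRole(held: ReadonlySet<string>, required: readonly string[]): boolean {
  return required.some((role) => held.has(role));
}

// Per-RPC allowlist: RPC name -> client-certificate CNs permitted to call it.
type CallerAllowlist = ReadonlyMap<string, ReadonlySet<string>>;

// An RPC with no allowlist entry is denied for every caller (fail-closed).
function isCallerAllowed(allowlist: CallerAllowlist, rpc: string, cn: string): boolean {
  return allowlist.get(rpc)?.has(cn) ?? false;
}
```

A guard built on `hasAnyRole` would answer 403 INSUFFICIENT_SCOPE on a false result, and an interceptor built on `isCallerAllowed` would answer UNAUTHENTICATED, matching the responses named in this document.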

3. Data Protection

3.1 Data inventory & classification

| Field | Class | Storage | Transit |
|---|---|---|---|
| numbers.value (MSISDN / short / alpha) | INTERNAL: platform inventory asset, not subscriber-linked | DB-at-rest encryption (disk-level); per-field encryption not applied | TLS 1.3 |
| leases.tenant_id / leases.account_id | INTERNAL | Plain | TLS 1.3 |
| mno_signing_keys.public_key_pem | INTERNAL (public) | Plain | TLS 1.3 |
| lease_contracts.signature_ref | Reference to Vault-managed blob | Pointer only | TLS 1.3 |
| lease_import_batches.file_sha256 | Integrity hash | Plain | TLS 1.3 |
| audit.* | CONFIDENTIAL: tamper-evident evidence | Hash-chained; append-only; DB-at-rest encryption | TLS 1.3 |
| regulator_exports.* | CONFIDENTIAL: regulator submission | S3 object-lock WORM (7 y) | TLS 1.3 |
| idempotency_keys.response | INTERNAL: cached API responses | Plain | TLS 1.3 |

Note on MSISDNs: Unlike consumer-facing services (consent-ledger-service, subscription services), MSISDNs in numbering-service represent inventory leased from MNOs to the platform, not subscriber identities. They are platform assets. Still classified INTERNAL and protected accordingly; analytics exports hash them (sha256(value)) by default.
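The default hashing of numbers in analytics exports, sha256(value), might be implemented along these lines. This is a sketch under assumptions: the helper name `hashNumberValue`, UTF-8 input encoding and lowercase hex output are illustrative choices, not details given in this document.

```typescript
import { createHash } from "node:crypto";

// Returns the SHA-256 digest of a number value as lowercase hex.
// Assumption: the value is hashed as its UTF-8 string form, without normalisation.
function hashNumberValue(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}
```

An unsalted hash of a phone number is deterministic, which is what lets analytics join on it, but the small keyspace of valid numbers means it can be reversed by enumeration, so the hashed export is best treated as INTERNAL as well.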

3.2 Encryption keys

| Key | Store | Rotation |
|---|---|---|
| mTLS server + client certs | Vault PKI | 30 d auto-rotate |
| PostgreSQL credentials | Vault DB dynamic secret | Per-session |
| Redis credentials | Vault KV | Quarterly |
| NATS credentials | Vault KV | Quarterly |
| MNO public signing keys | numbering.mno_signing_keys (PG) | Per MNO policy; recorded on rotation |
| Regulator-export signing key (platform RSA) | Vault Transit (transit/ghasi-numbering-regulator) | Annual |
| S3 object-lock retention | IAM + bucket policy; WORM | Immutable |
| Disk-at-rest encryption keys | Cloud KMS (platform-wide) | Annual |

3.3 Hash-chained audit

numbering.audit.row_hash is computed as sha256(prev_hash || canonical_row_bytes) by a Postgres BEFORE INSERT trigger. Each number_id has its own chain, as the trigger's lookup shows. A daily reconciliation cron validates the chains; any gap or hash mismatch raises a CRITICAL alert and blocks the next regulator export until investigated.

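-- digest() is provided by the pgcrypto extension.
CREATE FUNCTION numbering.compute_audit_hash() RETURNS TRIGGER AS $$
DECLARE
  prev BYTEA;
BEGIN
  -- Most recent hash in this number's chain (one chain per number_id).
  SELECT row_hash INTO prev
  FROM numbering.audit
  WHERE number_id = NEW.number_id
  ORDER BY occurred_at DESC
  LIMIT 1;

  -- The first row of a chain links to a fixed one-byte genesis value.
  NEW.prev_hash := COALESCE(prev, '\x00'::bytea);

  -- Hash prev_hash followed by a delimited rendering of the row.
  -- COALESCE keeps a NULL column from turning the whole input, and so
  -- row_hash, into NULL; the '|' delimiter keeps field boundaries unambiguous.
  NEW.row_hash := digest(
    NEW.prev_hash || convert_to(
      concat_ws('|',
        NEW.number_id::text,
        COALESCE(NEW.from_state::text, ''),
        COALESCE(NEW.to_state::text, ''),
        COALESCE(NEW.reason_code::text, ''),
        NEW.occurred_at::text),
      'UTF8'),
    'sha256'
  );
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER trg_audit_hash BEFORE INSERT ON numbering.audit
  FOR EACH ROW EXECUTE FUNCTION numbering.compute_audit_hash();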
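The daily reconciliation can be pictured as a replay of each chain. The sketch below is an assumption-laden illustration, not the actual cron: `AuditRow`, `computeRowHash` and `findChainBreak` are hypothetical names, rows are assumed to arrive oldest-first for a single number_id, and `canonicalBytes` stands for whatever byte rendering of the row the trigger hashes.

```typescript
import { createHash } from "node:crypto";

interface AuditRow {
  prevHash: Buffer;
  rowHash: Buffer;
  canonicalBytes: Buffer; // hypothetical: the row's canonical serialisation
}

// One zero byte, matching COALESCE(prev, '\x00') in the trigger.
const GENESIS_HASH: Buffer = Buffer.from([0x00]);

// row_hash = sha256(prev_hash || canonical_row_bytes)
function computeRowHash(prevHash: Buffer, canonicalBytes: Buffer): Buffer {
  return createHash("sha256").update(prevHash).update(canonicalBytes).digest();
}

// Returns the index of the first row that breaks the chain, or -1 if it is intact.
function findChainBreak(rows: readonly AuditRow[]): number {
  let expectedPrev: Buffer = GENESIS_HASH;
  let index = 0;
  for (const row of rows) {
    // A gap or reordering shows up as a prev_hash that does not match its predecessor.
    const linkOk = row.prevHash.equals(expectedPrev);
    // An edited row shows up as a stored hash that no longer matches its content.
    const hashOk = row.rowHash.equals(computeRowHash(row.prevHash, row.canonicalBytes));
    if (!linkOk || !hashOk) return index;
    expectedPrev = row.rowHash;
    index += 1;
  }
  return -1;
}
```

Checking both the link and the recomputed hash is what catches a deleted row as well as an edited one. A job built on this would raise the CRITICAL alert for any chain where the result is not -1.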

4. Tenant Isolation

  • Postgres RLS on numbers, leases, reservations, tenant_pools keyed on current_setting('app.current_tenant_id'). Handlers unable to set the variable (missing Kong header) cannot read tenant rows.
  • num:valid:*, num:pool:*, num:quota:* Redis keys are tenant-scoped so a cache invalidation for tenant A never affects tenant B.
  • platform.numbering.admin bypasses RLS via app.caller_role — audit trail captures the admin's actorUserId for every cross-tenant action.
  • Alpha-ID platform uniqueness is enforced at the inventory level — a tenant cannot observe another tenant's active alpha-IDs through the browse API (portal filters by state = AVAILABLE only).
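Tenant scoping of the Redis keys can be illustrated with a small key builder. The exact key layout is not specified in this document, so the position of the tenant id (directly after the family segment) and the helper names `tenantKey` and `tenantInvalidationPattern` are assumptions made for the sketch.

```typescript
type KeyFamily = "valid" | "pool" | "quota";

// Characters that would let a tenant id alter key structure or widen a glob pattern.
const UNSAFE_TENANT_ID = /[:*?\[\]\s]/;

// Builds a key of the assumed form num:<family>:<tenantId>:<suffix>.
function tenantKey(family: KeyFamily, tenantId: string, suffix: string): string {
  if (tenantId === "" || UNSAFE_TENANT_ID.test(tenantId)) {
    throw new Error("tenant id is empty or contains key/pattern metacharacters");
  }
  return `num:${family}:${tenantId}:${suffix}`;
}

// Glob pattern that matches only one tenant's keys within a family.
function tenantInvalidationPattern(family: KeyFamily, tenantId: string): string {
  return tenantKey(family, tenantId, "*");
}
```

With the tenant id embedded as a fixed segment, an invalidation pattern for tenant A shares no prefix with tenant B's keys, which is the isolation property the bullet describes.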

5. Secrets

| Secret | Store | Injected as |
|---|---|---|
| gRPC server cert + key | Vault PKI → K8s Secret (Vault Agent) | File mount /etc/tls/server.{crt,key} |
| gRPC CA bundle | Vault PKI | File mount /etc/tls/ca.crt |
| PostgreSQL credentials | Vault DB (dynamic) | Env var DATABASE_URL |
| Redis credentials | Vault KV | Env var REDIS_URL |
| NATS credentials | Vault KV | File mount via NATS_CREDS_PATH |
| Regulator-export signing key | Vault Transit (referenced) | Not injected (never exported) |
| S3 credentials for regulator exports | Vault IAM | Env vars |

No secret is written to logs, events, or config files. Pre-commit gitleaks + CI trivy-config scans block accidental commits.


6. Fail-Closed Posture

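numbering-service is fail-closed on writes. Hot-path reads are fail-open only to the extent of the cache: they are served from Redis while an entry is within its TTL and fail closed once it has expired: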

| Scenario | Behaviour |
|---|---|
| PG down (writes) | Reserve/Assign/Release/Recall return UNAVAILABLE; clients retry or fail-closed |
| PG down (reads, cache warm) | ValidateLease serves Redis for up to 60 s TTL, returning the last-known state |
| PG down (reads, cache expired) | ValidateLease returns UNAVAILABLE; sms-orchestrator fail-closes (does not dispatch) |
| Redis down | PG direct; latency increases but correctness maintained |
| sender-id-registry down during Assign (alpha path) | Assign returns FAILED_PRECONDITION (fail-closed); tenant must retry later |
| NATS down (outbox relay) | State writes succeed; events buffer in numbering.outbox and publish on NATS recovery |
| MNO signing key expired during import | Import rejected with SIGNATURE_INVALID; no partial ingest |
| Monthly regulator export fails | Export stays in PENDING status; alert fires; platform admin manually retries |

Security implication: availability attacks against numbering-service cannot cause incorrect allocation. At worst they delay allocations, which is acceptable in the control-plane context.
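The read-path rows of the table can be sketched as a single decision function. This is an illustration under assumptions, not the service's handler: `LeaseLookup`, `ValidateResult` and `validateLease` are hypothetical names, the lookups are synchronous for brevity where real clients would be asynchronous, and an expired cache entry is assumed to be simply absent.

```typescript
// Hypothetical lookup ports; either may throw when its backing store is down.
interface LeaseLookup {
  fromCache(value: string): string | undefined; // undefined = miss or expired entry
  fromDb(value: string): string;
}

type ValidateResult =
  | { status: "OK"; state: string; source: "cache" | "db" }
  | { status: "UNAVAILABLE" };

function validateLease(lookup: LeaseLookup, value: string): ValidateResult {
  let cached: string | undefined;
  try {
    cached = lookup.fromCache(value);
  } catch {
    cached = undefined; // Redis down: fall through to Postgres
  }
  if (cached !== undefined) {
    // Warm cache: serve the last-known state even if Postgres is down.
    return { status: "OK", state: cached, source: "cache" };
  }
  try {
    return { status: "OK", state: lookup.fromDb(value), source: "db" };
  } catch {
    // No warm entry and no database: fail closed rather than guess.
    return { status: "UNAVAILABLE" };
  }
}
```

The function never fabricates a state. The only answer it gives without Postgres is one that was cached within the TTL, so an outage can delay a dispatch decision but cannot invent a lease.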


7. Threat Model

| Threat | Mitigation |
|---|---|
| Malicious admin tier-overrides a tenant's pool to bypass quota | All pool changes audit-logged with actorUserId; compliance.rule.changed.v1 mirror (numbering.audit.v1) replicated to SIEM; quarterly cross-admin review |
| Compromised sms-orchestrator floods ValidateLease | mTLS + per-cert rate limit; aggressive 60 s Redis caching makes replay cheap; per-pod concurrency cap |
| Cross-tenant claim (tenant A leases tenant B's number) | Partial unique index on (value, type) WHERE state IN active states; CAS on every transition; RLS as defence-in-depth |
| Double-assignment across regions | Synchronous cross-region quorum on numbers/leases updates per ADR-0004 §14; CAS races resolved; nightly reconciliation catches divergence |
| CSV-injection via MNO lease import | CSV parsed row-by-row with strict column typing; ', \r, \n rejected; E.164 regex enforced; MNO RSA signature verifies source |
| Attacker brute-forces AVAILABLE pool enumeration via ValidateLease | NOT_REGISTERED responses cached 30 s; browse endpoints rate-limited 120/min/tenant; unknown-identifier burst → fraud-intel signal |
| Audit log tampering | Hash chain with pgcrypto; daily verification cron; append-only Postgres rules; cold archive to S3 object-lock |
| Malicious tenant reserves inventory to exhaust pool | maxActiveReservations quota; num:rate:reserve:{tenantId} sliding window (60/min); RESERVATION_BURST anomaly → fraud-intel; auto-release TTL |
| Phisher recalls-and-re-leases same number under new tenant identity | Quarantine cool-off (90 d MSISDN / 30 d short / 365 d vanity); platform-wide uniqueness on alpha-IDs; admin override requires justification ≥ 20 chars; fraud-intel signal on repeat pattern |
| Leak of MNO signing key allows forged lease imports | Rotate mno_signing_keys quarterly; import pipeline logs file_sha256, duplicates flagged; CFO-level approval required for emergency block ingest |
| Regulator-export tampering | SHA-256 + RSA signature at generation; stored in S3 object-lock bucket (WORM); ATRA verifies via platform public key |
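The CSV-injection mitigation (rejecting ', \r and \n and enforcing an E.164 pattern) can be sketched as a row check. The sketch makes several assumptions: the import layout is unknown, so the MSISDN column index is a parameter. The names `checkImportRow` and `acceptBatch` are hypothetical, and rejecting the whole batch on any bad row is an illustrative policy. Signature verification and per-column typing are out of scope here.

```typescript
// E.164: a plus sign, a non-zero leading digit, at most 15 digits in total.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;
// The characters this document lists as rejected: apostrophe, CR, LF.
const FORBIDDEN_CHARS = /['\r\n]/;

type RowCheck = { ok: true } | { ok: false; reason: string };

// Validates one already-split CSV row.
function checkImportRow(cells: readonly string[], msisdnColumn: number): RowCheck {
  for (const cell of cells) {
    if (FORBIDDEN_CHARS.test(cell)) {
      return { ok: false, reason: "forbidden character in cell" };
    }
  }
  const msisdn = cells[msisdnColumn];
  if (msisdn === undefined || !E164_PATTERN.test(msisdn)) {
    return { ok: false, reason: "MSISDN is not valid E.164" };
  }
  return { ok: true };
}

// Assumed all-or-nothing policy: one bad row means no rows are accepted.
function acceptBatch(
  rows: readonly (readonly string[])[],
  msisdnColumn: number,
): readonly (readonly string[])[] {
  const allValid = rows.every((cells) => checkImportRow(cells, msisdnColumn).ok);
  return allValid ? rows : [];
}
```

Returning an empty batch rather than the valid subset keeps a crafted file from being partially ingested.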

8. GDPR / Regulatory

  • Right to erasure (GDPR Art. 17): numbering-service consumes auth.user.erased.v1. Response:
    • leases, reservations, numbers.assigned_tenant_id for the affected tenant → tenant-id tombstoned (replaced with 00000000-0000-0000-0000-{tombstone-hash}) only after all active leases are recalled. Raw MSISDNs are not PII in this context and are retained.
    • audit is NOT deleted — regulatory evidence overrides erasure; the row stays, but any embedded actorUserId referencing the erased user is pseudonymised.
  • Data residency: control-plane data is multi-region across kbl and mzr (both Afghanistan). No cross-border data flow by default.
  • ATRA audit evidence window: hash-chained audit retained 13 m hot + 7 y cold (S3 object-lock).
  • Sub-processor list: no third-party AI/ML provider; numbering-service core logic is deterministic.

9. Security Testing

  • Role-matrix integration test — every REST endpoint × every role → expected 200 / 403.
  • RLS cross-tenant test — tenant A API calls MUST NOT return tenant B's leases/reservations/quota data. Verified on each CI run.
  • CAS race test — 100 concurrent Reserve calls on same AVAILABLE number; exactly one succeeds, 99 get CONFLICT.
  • mTLS allowlist test — client cert with non-allowlisted CN is rejected with UNAUTHENTICATED.
  • CSV injection test — fuzzing harness of malformed / injection-laced rows on MNO import; verify zero rows ingested.
  • Hash-chain tampering test — attempt direct UPDATE numbering.audit → Postgres rule rejects; attempt raw-SQL manipulation via disaster-recovery tool → daily reconciliation flags.
  • Contract tests (Pact) on gRPC ValidateLease schema with sms-orchestrator, routing-engine.
  • Quarterly penetration test scoped to numbering REST surface.
  • gitleaks, osv-scanner, trivy in CI.
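The expected outcome of the CAS race test can be shown with an in-memory model. This is only a sketch of the outcome shape, not a concurrency test: the `Inventory` class is hypothetical, it runs single-threaded, and its `conditionalReserve` method stands in for a conditional UPDATE whose WHERE clause requires the row to still be AVAILABLE (the actual SQL and column names are not given in this document).

```typescript
type NumberState = "AVAILABLE" | "RESERVED";

interface NumberRecord {
  state: NumberState;
  reservedBy?: string;
}

class Inventory {
  private readonly records = new Map<string, NumberRecord>();

  add(value: string): void {
    this.records.set(value, { state: "AVAILABLE" });
  }

  // Emulates a conditional update: changes the row only if it is still
  // AVAILABLE and reports how many rows were affected (0 or 1).
  private conditionalReserve(value: string, tenantId: string): number {
    const record = this.records.get(value);
    if (record === undefined || record.state !== "AVAILABLE") return 0;
    this.records.set(value, { state: "RESERVED", reservedBy: tenantId });
    return 1;
  }

  // Compare-and-set: the caller wins only if its update affected the row.
  reserve(value: string, tenantId: string): "OK" | "CONFLICT" {
    return this.conditionalReserve(value, tenantId) === 1 ? "OK" : "CONFLICT";
  }

  holder(value: string): string | undefined {
    return this.records.get(value)?.reservedBy;
  }
}
```

Driving 100 `reserve` calls at one number yields one OK and 99 CONFLICT in this model. The real test would need to show the same counts when the calls genuinely overlap in the database.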

End of SECURITY_MODEL.md