numbering-service — Security Model
Version: 1.0 | Status: Draft | Owner: Security + Commerce Engineering | Last Updated: 2026-04-21 | Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS · ../../docs/13-security-compliance-tenancy.md
1. Authentication
1.1 gRPC plane — :50061
- mTLS required: the gRPC server accepts only connections presenting a client certificate signed by the platform CA (Vault PKI root).
- Caller CN allowlist pins the exact set of authorised services and the RPCs each may invoke (see API_CONTRACTS §1.mTLS caller allowlist). Violations → `UNAUTHENTICATED`.
- Client certs are mounted from Vault via the Vault Agent Sidecar Injector and rotated every 30 days; the server hot-reloads on file change.
- Local-dev bypass via `GRPC_TLS_ENABLED=false` is permitted only when `NODE_ENV = 'development'`; a start-up guard refuses to boot with TLS disabled in any other environment.
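
The start-up guard described in the last bullet could look roughly like the sketch below. The function name `assertTlsConfig` and the `Env` type are hypothetical, and the sketch assumes TLS counts as disabled only when the variable holds the literal string `false`.

```typescript
// Hypothetical start-up guard; names are illustrative, not taken from the service code.
type Env = Record<string, string | undefined>;

// Returns whether TLS is enabled, and throws when TLS is disabled
// anywhere other than NODE_ENV=development.
function assertTlsConfig(env: Env): boolean {
  // Assumption: TLS is on unless GRPC_TLS_ENABLED is exactly "false".
  const tlsEnabled = env.GRPC_TLS_ENABLED !== "false";
  if (!tlsEnabled && env.NODE_ENV !== "development") {
    const actual = env.NODE_ENV ?? "(unset)";
    throw new Error(
      `Refusing to boot: GRPC_TLS_ENABLED=false is not allowed when NODE_ENV=${actual}`,
    );
  }
  return tlsEnabled;
}
```

Calling a guard like this with `process.env` before the gRPC server binds its port means a misconfigured deployment fails at boot rather than serving plaintext.
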
1.2 REST plane — :3021
- Kong validates the platform JWT (issued by `auth-service`, RS256, JWKS-backed). Missing / invalid tokens → 401 at the edge; numbering-service never parses JWTs directly.
- Kong injects `X-Tenant-Id`, `X-Account-Id`, `X-User-Id`, `X-Roles`, `X-Idp` headers on authenticated upstream requests.
- Tenant portal handlers run `SET LOCAL app.current_tenant_id = <X-Tenant-Id>` per request, enforcing Row-Level Security as a defence-in-depth layer.
- Admin handlers run `SET LOCAL app.caller_role = <platform-role>` to activate RLS admin bypass policies.
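
As a minimal sketch of the per-request tenant binding above: `SET LOCAL` does not accept bind parameters, so the example uses PostgreSQL's `set_config(name, value, is_local)` with `is_local = true`, which has the same transaction-scoped effect. The helper name `tenantContextQuery`, the `Query` shape, and the assumption that tenant IDs are UUIDs are illustrative and not confirmed by this document.

```typescript
// Parameterised query shape ({ text, values }) as accepted by common Postgres clients.
interface Query {
  text: string;
  values: string[];
}

// Assumption: tenant IDs are UUIDs; anything else is rejected before reaching SQL.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Builds the statement a handler would run first inside its transaction.
// Header keys are lower-case because Node's HTTP layer normalises them.
function tenantContextQuery(headers: Record<string, string | undefined>): Query {
  const tenantId = headers["x-tenant-id"];
  if (!tenantId || !UUID_RE.test(tenantId)) {
    throw new Error("Missing or malformed X-Tenant-Id header");
  }
  // The third argument (true) scopes the setting to the current transaction, like SET LOCAL.
  return {
    text: "SELECT set_config('app.current_tenant_id', $1, true)",
    values: [tenantId],
  };
}
```

Because the setting is transaction-local, it must be issued inside the same transaction as the tenant queries it protects; on a pooled connection it is discarded at commit or rollback.
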
1.3 IdP-agnostic
Per ADR-0002, numbering-service is indifferent to the IdP that issued the token (Keycloak default / tenant OIDC / tenant SAML / Firebase legacy). The idp claim is captured in numbering.audit for forensic purposes but does not affect authorisation.
2. Authorization (RBAC)
Operations require specific platform or tenant-scoped roles. Role definitions live in auth-service; numbering enforces scope/role at handler entry via NestJS Guards.
| Role | Capabilities |
|---|---|
| `platform.numbering.admin` | Full CRUD on pools, contracts, blocks; import MNO batches; recall/reinstate; quarantine override; regenerate regulator exports |
| `platform.numbering.ops` | Read all; quarantine admin-override (requires justification); bulk recall assist |
| `platform.auditor` | Read-only on numbers, leases, audit, regulator_exports; no state changes |
| `platform.support` | Read-only, heavily redacted view (no per-number lease history for other tenants) |
| Tenant `sms:numbering:read` | Read own tenant's pool, leases, reservations; browse AVAILABLE |
| Tenant `sms:numbering:write` | Reserve / Hold / Assign / Release on own tenant |
| Tenant `sms:numbering:vanity` | Lease vanity short codes (premium tier gate) |
| Tenant `sms:numbering:billing` | Read lease billing terms, trigger renewal |
Enforcement points
- NestJS `RoleGuard`: runs first, rejects with 403 `INSUFFICIENT_SCOPE`.
- Per-handler `@RequireRoles(...)` decorator: declarative and contract-tested.
- Postgres RLS on `numbers`, `leases`, `tenant_pools`: final defence for tenant isolation. Without `app.current_tenant_id`, SELECTs return zero rows for tenant-scoped queries.
- gRPC `CallerAllowlistInterceptor`: matches client certificate CN against the per-RPC allowlist.
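
The role check performed at handler entry can be illustrated with the framework-free sketch below. It is not the actual `RoleGuard`. It assumes that `X-Roles` carries a comma-separated list and that holding any one of the required roles is sufficient; neither detail is stated in this document, and `parseRoles` and `checkRoles` are hypothetical names.

```typescript
// Outcome of a role check; the 403 / INSUFFICIENT_SCOPE pair mirrors the guard's rejection.
type Decision =
  | { allowed: true }
  | { allowed: false; status: 403; code: "INSUFFICIENT_SCOPE" };

// Assumption: X-Roles is a comma-separated list such as "platform.auditor,platform.support".
function parseRoles(header: string | undefined): Set<string> {
  const roles = (header ?? "")
    .split(",")
    .map((role) => role.trim())
    .filter((role) => role.length > 0);
  return new Set(roles);
}

// Assumption: any-of semantics. The caller passes if it holds at least one required role.
function checkRoles(rolesHeader: string | undefined, required: readonly string[]): Decision {
  const held = parseRoles(rolesHeader);
  if (required.some((role) => held.has(role))) {
    return { allowed: true };
  }
  return { allowed: false, status: 403, code: "INSUFFICIENT_SCOPE" };
}
```

A missing or empty header yields an empty role set, so the check denies by default.
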
3. Data Protection
3.1 Data inventory & classification
| Field | Class | Storage | Transit |
|---|---|---|---|
| `numbers.value` (MSISDN / short / alpha) | INTERNAL: platform inventory asset, not subscriber-linked | DB-at-rest encryption (disk-level); per-field encryption not applied | TLS 1.3 |
| `leases.tenant_id` / `leases.account_id` | INTERNAL | Plain | TLS 1.3 |
| `mno_signing_keys.public_key_pem` | INTERNAL (public) | Plain | TLS 1.3 |
| `lease_contracts.signature_ref` | Reference to Vault-managed blob | Pointer only | TLS 1.3 |
| `lease_import_batches.file_sha256` | Integrity hash | Plain | TLS 1.3 |
| `audit.*` | CONFIDENTIAL: tamper-evident evidence | Hash-chained; append-only; DB-at-rest encryption | TLS 1.3 |
| `regulator_exports.*` | CONFIDENTIAL: regulator submission | S3 object-lock WORM (7 y) | TLS 1.3 |
| `idempotency_keys.response` | INTERNAL: cached API responses | Plain | TLS 1.3 |
Note on MSISDNs: Unlike consumer-facing services (consent-ledger-service, subscription services), MSISDNs in numbering-service represent inventory that MNOs lease to the platform, not subscriber identities. They are platform assets, but they are still classified INTERNAL and protected accordingly; analytics exports hash them (`sha256(value)`) by default.
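
The default hashing applied to analytics exports could be sketched as follows. The function name `hashNumberForExport` is hypothetical, and the hex output encoding and UTF-8 input encoding are assumptions; the document specifies only `sha256(value)`.

```typescript
import { createHash } from "node:crypto";

// Hashes a number value (MSISDN, short code, or alpha-ID) for analytics export.
// Assumption: the value is hashed as UTF-8 text and emitted as lower-case hex.
function hashNumberForExport(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex");
}
```

The hash is deterministic, so the same number maps to the same token across exports and can still be joined on without exposing the raw value.
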
3.2 Encryption keys
| Key | Store | Rotation |
|---|---|---|
| mTLS server + client certs | Vault PKI | 30 d auto-rotate |
| PostgreSQL credentials | Vault DB dynamic secret | Per-session |
| Redis credentials | Vault KV | Quarterly |
| NATS credentials | Vault KV | Quarterly |
| MNO public signing keys | numbering.mno_signing_keys (PG) | Per MNO policy; recorded on rotation |
| Regulator-export signing key (platform RSA) | Vault Transit (transit/ghasi-numbering-regulator) | Annual |
| S3 object-lock retention | IAM + bucket policy; WORM | Immutable |
| Disk-at-rest encryption keys | Cloud KMS (platform-wide) | Annual |
3.3 Hash-chained audit
numbering.audit.row_hash = sha256(prev_hash || canonical_row_bytes) computed by a Postgres trigger at INSERT time. The chain is validated daily by a reconciliation cron. Any gap or hash mismatch raises a CRITICAL alert and blocks the next regulator export until investigated.
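
    -- Requires the pgcrypto extension, which provides digest().
    CREATE FUNCTION numbering.compute_audit_hash() RETURNS TRIGGER AS $$
    DECLARE
      prev BYTEA;
    BEGIN
      -- Latest hash in this number's chain, if one exists.
      SELECT row_hash INTO prev
        FROM numbering.audit
       WHERE number_id = NEW.number_id
       ORDER BY occurred_at DESC
       LIMIT 1;

      -- The first row of a chain links to a fixed one-byte genesis value.
      NEW.prev_hash := COALESCE(prev, '\x00'::bytea);

      -- bytea cannot be concatenated with text directly, so the row fields are
      -- joined as text and encoded to bytes first. concat() treats a NULL field
      -- (e.g. from_state on the first transition) as empty rather than nulling the hash.
      NEW.row_hash := digest(
        NEW.prev_hash || convert_to(
          concat(
            NEW.number_id::text,
            NEW.from_state::text,
            NEW.to_state::text,
            NEW.reason_code,
            NEW.occurred_at::text
          ),
          'UTF8'
        ),
        'sha256'
      );
      RETURN NEW;
    END $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_audit_hash BEFORE INSERT ON numbering.audit
      FOR EACH ROW EXECUTE FUNCTION numbering.compute_audit_hash();
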
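
The daily chain validation can be illustrated with the sketch below, which recomputes each row's hash for one number's chain and reports the first row that does not link correctly. It is not the reconciliation job itself. It assumes hashes are handled as lower-case hex strings, that all fields are non-null, and that `occurredAt` is the exact string Postgres produced for `occurred_at::text` (which depends on the session's `TimeZone` and `DateStyle`). The names `AuditRow`, `computeRowHash`, and `findFirstBreak` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// One audit row, ordered by occurred_at within a single number's chain.
interface AuditRow {
  numberId: string;
  fromState: string;
  toState: string;
  reasonCode: string;
  occurredAt: string; // must match Postgres' occurred_at::text byte for byte
  prevHash: string;   // lower-case hex
  rowHash: string;    // lower-case hex
}

type AuditFields = Omit<AuditRow, "prevHash" | "rowHash">;

// Hex form of the one-byte genesis value '\x00' used for the first row of a chain.
const GENESIS_HEX = "00";

// sha256(prev_hash || fields), with the fields concatenated in the trigger's order.
function computeRowHash(prevHashHex: string, fields: AuditFields): string {
  const payload =
    fields.numberId + fields.fromState + fields.toState + fields.reasonCode + fields.occurredAt;
  return createHash("sha256")
    .update(Buffer.from(prevHashHex, "hex"))
    .update(payload, "utf8")
    .digest("hex");
}

// Returns the index of the first row whose link or hash is wrong, or -1 if the chain is intact.
function findFirstBreak(rows: readonly AuditRow[]): number {
  let expectedPrev = GENESIS_HEX;
  let index = 0;
  for (const row of rows) {
    const linked = row.prevHash === expectedPrev;
    const intact = row.rowHash === computeRowHash(row.prevHash, row);
    if (!linked || !intact) return index;
    expectedPrev = row.rowHash;
    index += 1;
  }
  return -1;
}
```

A result other than `-1` corresponds to the gap or hash-mismatch condition that raises the CRITICAL alert.
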
4. Tenant Isolation
- Postgres RLS on `numbers`, `leases`, `reservations`, `tenant_pools` keyed on `current_setting('app.current_tenant_id')`. Handlers unable to set the variable (missing Kong header) cannot read tenant rows.
- `num:valid:*`, `num:pool:*`, `num:quota:*` Redis keys are tenant-scoped so a cache invalidation for tenant A never affects tenant B.
- `platform.numbering.admin` bypasses RLS via `app.caller_role`; the audit trail captures the admin's `actorUserId` for every cross-tenant action.
- Alpha-ID platform uniqueness is enforced at the inventory level; a tenant cannot observe another tenant's active alpha-IDs through the browse API (portal filters by `state = AVAILABLE` only).
5. Secrets
| Secret | Store | Injected as |
|---|---|---|
| gRPC server cert + key | Vault PKI → K8s Secret (Vault Agent) | File mount /etc/tls/server.{crt,key} |
| gRPC CA bundle | Vault PKI | File mount /etc/tls/ca.crt |
| PostgreSQL credentials | Vault DB (dynamic) | Env var DATABASE_URL |
| Redis credentials | Vault KV | Env var REDIS_URL |
| NATS credentials | Vault KV | File mount via NATS_CREDS_PATH |
| Regulator-export signing key | Vault Transit (referenced) | — (never exported) |
| S3 credentials for regulator exports | Vault IAM | Env vars |
No secret is written to logs, events, or config files. Pre-commit gitleaks + CI trivy-config scans block accidental commits.
6. Fail-Closed Posture
Numbering-service is fail-closed on writes and fail-open-cache-only on hot-path reads:
| Scenario | Behaviour |
|---|---|
| PG down (writes) | Reserve/Assign/Release/Recall return UNAVAILABLE; clients retry or fail-closed |
| PG down (reads, cache warm) | `ValidateLease` serves from Redis until the 60 s TTL expires, returning the last-known state |
| PG down (reads, cache expired) | ValidateLease returns UNAVAILABLE; sms-orchestrator fail-closes (does not dispatch) |
| Redis down | PG direct; latency increases but correctness maintained |
| `sender-id-registry` down during Assign (alpha path) | Assign returns `FAILED_PRECONDITION` (fail-closed); tenant must retry later |
| NATS down (outbox relay) | State writes succeed; events buffer in numbering.outbox and publish on NATS recovery |
| MNO signing key expired during import | Import rejected with SIGNATURE_INVALID — no partial ingest |
| Monthly regulator export fails | Export stays in PENDING status; alert fires; platform admin manually retries |
Security implication: availability attacks against numbering-service cannot cause incorrect allocation. At worst they delay allocations, which is acceptable in the control-plane context.
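
The read-path rows of the table above can be sketched as a cache-first lookup that fails closed once the cached entry is older than 60 s. This is a simplified, synchronous model: an in-memory `Map` stands in for Redis, `readDb` stands in for the Postgres query, and the Redis-down case is not modelled. All names here are hypothetical.

```typescript
interface CacheEntry {
  state: string;
  storedAtMs: number;
}

type ValidateResult =
  | { status: "OK"; state: string; source: "cache" | "db" }
  | { status: "UNAVAILABLE" };

const CACHE_TTL_MS = 60_000; // the 60 s TTL from the table

function validateLease(
  key: string,
  readDb: () => string, // throws when Postgres is unreachable
  cache: Map<string, CacheEntry>,
  nowMs: number,
): ValidateResult {
  // 1. Serve a fresh cache entry without touching the database.
  const hit = cache.get(key);
  if (hit !== undefined && nowMs - hit.storedAtMs < CACHE_TTL_MS) {
    return { status: "OK", state: hit.state, source: "cache" };
  }
  // 2. Cache miss or expired entry: the database is the only acceptable source.
  try {
    const state = readDb();
    cache.set(key, { state, storedAtMs: nowMs });
    return { status: "OK", state, source: "db" };
  } catch {
    // 3. Fail closed: never serve an expired entry as if it were current.
    return { status: "UNAVAILABLE" };
  }
}
```

The important property is in step 3: an expired entry is treated as absent, so a database outage longer than the TTL results in `UNAVAILABLE` rather than stale approvals.
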
7. Threat Model
| Threat | Mitigation |
|---|---|
| Malicious admin tier-overrides a tenant's pool to bypass quota | All pool changes audit-logged with actorUserId; compliance.rule.changed.v1 mirror (numbering.audit.v1) replicated to SIEM; quarterly cross-admin review |
| Compromised `sms-orchestrator` floods `ValidateLease` | mTLS + per-cert rate limit; aggressive 60 s Redis caching makes replay cheap; per-pod concurrency cap |
| Cross-tenant claim (tenant A leases tenant B's number) | Partial unique index on (value, type) WHERE state IN active states; CAS on every transition; RLS as defence-in-depth |
| Double-assignment across regions | Synchronous cross-region quorum on numbers/leases updates per ADR-0004 §14; CAS races resolved; nightly reconciliation catches divergence |
| CSV-injection via MNO lease import | CSV parsed row-by-row with strict column typing; ', \r, \n rejected; E.164 regex enforced; MNO RSA signature verifies source |
| Attacker brute-forces `AVAILABLE` pool enumeration via `ValidateLease` | `NOT_REGISTERED` responses cached 30 s; browse endpoints rate-limited 120/min/tenant; unknown-identifier burst → fraud-intel signal |
| Audit log tampering | Hash chain with pgcrypto; daily verification cron; append-only Postgres rules; cold archive to S3 object-lock |
| Malicious tenant reserves inventory to exhaust pool | maxActiveReservations quota; num:rate:reserve:{tenantId} sliding window (60/min); RESERVATION_BURST anomaly → fraud-intel; auto-release TTL |
| Phisher recalls-and-re-leases same number under new tenant identity | Quarantine cool-off (90 d MSISDN / 30 d short / 365 d vanity); platform-wide uniqueness on alpha-IDs; admin override requires justification ≥ 20 chars; fraud-intel signal on repeat pattern |
| Leak of MNO signing key allows forged lease imports | Rotate mno_signing_keys quarterly; import pipeline logs file_sha256 — duplicates flagged; CFO-level approval required for emergency block ingest |
| Regulator-export tampering | SHA-256 + RSA signature at generation; stored in S3 object-lock bucket (WORM); ATRA verifies via platform public key |
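
The per-tenant sliding window mentioned in the pool-exhaustion row (60 reservations per minute) can be illustrated with the in-memory sketch below. The service keeps this state in Redis under `num:rate:reserve:{tenantId}`; the class here is a hypothetical stand-in that shows only the windowing logic.

```typescript
// In-memory sliding-window limiter; one timestamp list per tenant.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records the attempt and returns true if the tenant is under its limit.
  tryAcquire(tenantId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only attempts that still fall inside the window.
    const recent = (this.hits.get(tenantId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(tenantId, recent);
    return allowed;
  }
}
```

Denied attempts are not recorded in this sketch, so a tenant that keeps retrying while blocked does not extend its own lockout; whether the real limiter behaves the same way is not stated here.
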
8. GDPR / Regulatory
- Right to erasure (GDPR Art. 17): numbering-service consumes `auth.user.erased.v1`. Response:
  - `leases`, `reservations`, `numbers.assigned_tenant_id` for the affected tenant → tenant-id tombstoned (replaced with `00000000-0000-0000-0000-{tombstone-hash}`) only after all active leases are recalled.
  - Raw MSISDNs are not PII in this context and are retained.
  - `audit` is NOT deleted: regulatory evidence overrides erasure; the row stays, but any embedded `actorUserId` referencing the erased user is pseudonymised.
- Data residency: control-plane data is multi-region across `kbl` and `mzr` (both Afghanistan). No cross-border data flow by default.
- ATRA audit evidence window: hash-chained `audit` retained 13 m hot + 7 y cold (S3 object-lock).
- Sub-processor list: no third-party AI/ML provider; numbering-service core logic is deterministic.
9. Security Testing
- Role-matrix integration test: every REST endpoint × every role → expected 200 / 403.
- RLS cross-tenant test: tenant A API calls MUST NOT return tenant B's leases/reservations/quota data. Verified on each CI run.
- CAS race test: 100 concurrent `Reserve` calls on the same `AVAILABLE` number; exactly one succeeds, 99 get `CONFLICT`.
- mTLS allowlist test: client cert with non-allowlisted CN is rejected with `UNAUTHENTICATED`.
- CSV injection test: fuzzing harness of malformed / injection-laced rows on MNO import; verify zero rows ingested.
- Hash-chain tampering test: attempt direct `UPDATE numbering.audit` → Postgres rule rejects; attempt raw-SQL manipulation via disaster-recovery tool → daily reconciliation flags.
- Contract tests (Pact) on gRPC `ValidateLease` schema with `sms-orchestrator`, `routing-engine`.
- Quarterly penetration test scoped to numbering REST surface.
- `gitleaks`, `osv-scanner`, `trivy` in CI.
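
The invariant behind the CAS race test can be shown with a small in-memory model: every caller reads the same snapshot of an `AVAILABLE` number, and only the first compare-and-set against that snapshot succeeds. The version-column mechanism and all names below are assumptions for illustration; the real test runs concurrent `Reserve` calls against the service.

```typescript
type NumberState = "AVAILABLE" | "RESERVED";

interface NumberRow {
  state: NumberState;
  version: number;
}

// Applies the transition only if the row is unchanged since the caller read it.
function compareAndSet(row: NumberRow, expectedVersion: number, next: NumberState): boolean {
  if (row.version !== expectedVersion || row.state !== "AVAILABLE") {
    return false; // someone else won the race: the caller would see CONFLICT
  }
  row.state = next;
  row.version += 1;
  return true;
}

// Simulates `callers` clients that all observed the same version before any of them wrote.
function raceReserve(callers: number): { succeeded: number; conflicts: number } {
  const row: NumberRow = { state: "AVAILABLE", version: 1 };
  const snapshots = Array.from({ length: callers }, () => row.version);
  let succeeded = 0;
  let conflicts = 0;
  for (const seenVersion of snapshots) {
    if (compareAndSet(row, seenVersion, "RESERVED")) {
      succeeded += 1;
    } else {
      conflicts += 1;
    }
  }
  return { succeeded, conflicts };
}
```

In a database-backed implementation the same check is usually a single conditional `UPDATE` whose affected-row count tells the caller whether it won.
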
End of SECURITY_MODEL.md