
sender-id-registry-service — Service Readiness

Version: 1.0
Status: Draft
Owner: Trust and Safety + Regulator-facing
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, docs/architecture/ADR-0004-national-backbone-resilience.md

Readiness criteria for taking sender-id-registry-service from development to production. The service is the national authority for SMS sender-ID registration. compliance-engine, routing-engine, and sms-firewall-service consult it synchronously on every outbound message. Its reputation feed affects tenant scoring and auto-suspension, and its regulator export is a regulator-reportable artefact. The readiness bar is elevated accordingly.


1. Code Readiness

| Criterion | Status | Notes |
|---|---|---|
| gRPC SenderIdRegistryService.v1: Verify / GetReputation | | Core hot-path. |
| REST /v1/sender-ids/*: tenant submit, verification artefact upload, public search, admin workflow | | |
| State machine: SUBMITTED → KYC_REVIEW → VERIFIED → ACTIVE → SUSPENDED → REVOKED | | |
| DNS-TXT verification challenger (_ghasi-sid-verify.{domain}) | | US-SID-006 |
| OTP verification via channel-router (lane=P1, 5-min TTL, 3-attempt max) | | US-SID-007 |
| Notarised-document review UI (admin) with document watermarking | | US-SID-008 |
| Document verification (commercial licence + national ID) | | US-SID-009 |
| Restricted-pattern enforcement at submit time | | US-SID-005 |
| Reputation cron (daily 00:30 UTC) computes score from compliance + fraud + DLR signals | | US-SID-017 |
| Reputation auto-suspension when score < 30 | | US-SID-012 |
| Reputation history persistence + trend API | | US-SID-018, US-SID-019 |
| Fraud signal NATS consumer (fraud.detected.*, compliance.message.blocked.v1) | | US-SID-020 |
| Public search endpoint with rate-limit + tarpit | | US-SID-015 |
| Regulator export cron (daily 04:00 Asia/Kabul) with signed JSON Lines output to S3 | | US-SID-016 |
| KYC document S3 storage with per-tenant DEK encryption | | |
| Audit hash-chain on sender_id_registry.audit | | |
| Redis hot-cache for Verify (TTL 300 s; invalidated on state change) | | |
| mTLS gRPC + Kong JWT REST | | |
| Idempotency-Key support on REST writes | | |
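
The state machine above can be encoded as an explicit transition table. The same table then drives the "every allowed / disallowed edge" unit tests in §2. Six states give 6 × 5 = 30 ordered pairs of distinct states, which matches the "≥ 30" target. This is a minimal sketch. It assumes the only allowed edges are the arrows drawn in the chain. Any other edge, such as a hypothetical SUSPENDED → ACTIVE reinstatement, would need to be confirmed against DOMAIN_MODEL.md before it is added. The names `ALLOWED`, `canTransition`, `transition`, and `allEdges` are illustrative.

```typescript
// Hypothetical sketch: states come from the §1 chain. The edge set contains
// ONLY the arrows drawn there; extra edges (e.g. reinstatement) are unconfirmed.
type SenderIdState =
  | "SUBMITTED"
  | "KYC_REVIEW"
  | "VERIFIED"
  | "ACTIVE"
  | "SUSPENDED"
  | "REVOKED";

const ALL_STATES: readonly SenderIdState[] = [
  "SUBMITTED", "KYC_REVIEW", "VERIFIED", "ACTIVE", "SUSPENDED", "REVOKED",
];

const ALLOWED: Readonly<Record<SenderIdState, readonly SenderIdState[]>> = {
  SUBMITTED: ["KYC_REVIEW"],
  KYC_REVIEW: ["VERIFIED"],
  VERIFIED: ["ACTIVE"],
  ACTIVE: ["SUSPENDED"],
  SUSPENDED: ["REVOKED"],
  REVOKED: [], // terminal state
};

function canTransition(from: SenderIdState, to: SenderIdState): boolean {
  return ALLOWED[from].includes(to);
}

// Returns the new state, or throws so illegal edges never reach the database.
function transition(from: SenderIdState, to: SenderIdState): SenderIdState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Enumerates every ordered pair of distinct states (6 x 5 = 30), each tagged
// allowed or disallowed, so a test suite can assert on all of them.
function allEdges(): Array<{ from: SenderIdState; to: SenderIdState; allowed: boolean }> {
  const edges: Array<{ from: SenderIdState; to: SenderIdState; allowed: boolean }> = [];
  for (const from of ALL_STATES) {
    for (const to of ALL_STATES) {
      if (from !== to) edges.push({ from, to, allowed: canTransition(from, to) });
    }
  }
  return edges;
}
```

Keeping the edges in data rather than in scattered `if` statements means a reviewer can diff the table when the state machine changes.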

2. Testing Readiness

| Criterion | Target | Status |
|---|---|---|
| Unit test coverage | ≥ 90% line (domain), ≥ 80% branch | |
| Unit tests for state-machine transitions | ≥ 30 (every allowed / disallowed edge) | |
| Unit tests for restricted-pattern matcher | ≥ 50, including obfuscated variants | |
| Unit tests for reputation formula | ≥ 20, including boundary conditions | |
| Unit tests for verification challenge generators (DNS / OTP / document) | ≥ 15 each | |
| Property-based tests: reputation monotonicity under increasing fraud events | ≥ 10 properties | |
| Integration: gRPC Verify P95 ≤ 5 ms @ 5 000 RPS | | Passed |
| Integration: full registration → verification → activation e2e | | Passed |
| Integration: OTP verification through channel-router mock | | Passed |
| Integration: DNS-TXT verification against local DNS test server | | Passed |
| Integration: regulator export cron produces a signed file conforming to the expected schema | | Passed |
| Integration: fraud signal consumer reduces reputation within 60 s of the NATS event | | Passed |
| Integration: RLS cross-tenant access blocked (mandatory tenant-isolation.spec.ts) | | Passed |
| Contract: compliance-engine SENDER_ID_VERIFICATION rule (EP-CE-15) integration | | Passed |
| Contract: routing-engine last-mile veto | | Passed |
| Contract: sms-firewall EvaluateTransit flow | | Passed |
| Chaos: Postgres unavailable → Verify fails closed; public search fails safe (503) | | Passed |
| Chaos: S3 KYC fetch fails → admin UI shows a degraded state, but the reviewer can proceed via cache | | Passed |
| Chaos: DNS resolver unreachable → verification defers to manual review | | Passed |
| Security: restricted-pattern enforcement cannot be bypassed via Unicode homoglyphs | | Passed |
| Security: document watermark present on every inline view, confirmed by an audit row | | Passed |
| Security: KYC documents encrypted at rest with the tenant DEK | | Passed |
| Security: audit log UPDATE/DELETE rejected | | Passed |
| Load test: 500 registrations/min sustained; 10 000 Verify RPS burst | | Passed |

3. Observability Readiness

| Criterion | Status |
|---|---|
| All Prometheus metrics emitting (see OBSERVABILITY.md §1) | |
| Grafana dashboard sender-id-registry-service.json deployed | |
| All alerts configured in Alertmanager with runbook links | |
| Structured JSON logs with document-id hashing (no raw contents) | |
| OTel trace propagation from compliance-engine → sender-id-registry verified | |
| Loki parsing rules validated | |
| SIEM forwarding of sender.id.* events verified | |
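
The logging criterion asks for document-id hashing with no raw contents. Below is a minimal sketch of one way to emit such a log line. It makes one design choice the checklist does not specify: it uses a keyed hash (HMAC-SHA256 with a per-environment secret, the "pepper") rather than a bare SHA-256. Document IDs are often low-entropy and could otherwise be recovered by hashing candidate IDs. The pepper, the 16-hex-character truncation, the field names, and the event name are all hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Assumption: `pepper` is a per-environment secret kept outside the log pipeline.
// Truncating to 16 hex chars (64 bits) keeps lines short while staying
// collision-resistant enough for log correlation.
function hashDocumentId(pepper: string, documentId: string): string {
  return createHmac("sha256", pepper).update(documentId).digest("hex").slice(0, 16);
}

// Emits one structured JSON line; only the hashed ID is written and document
// contents are never passed in.
function logDocumentEvent(pepper: string, event: string, documentId: string, tenantId: string): string {
  const line = JSON.stringify({
    ts: new Date().toISOString(),
    event,
    tenant_id: tenantId,
    document_id_hash: hashDocumentId(pepper, documentId),
  });
  console.log(line);
  return line;
}
```

The same document ID always produces the same hash, so Loki queries can still follow one document across events without the ID itself ever being logged.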

Alerts Configured

  • SidVerifyLatencyHigh (gRPC Verify P95 > 15 ms for 5 min)
  • SidKycReviewSlaBreach (any registration pending for > 5 business days)
  • SidPublicSearchAbuse (single IP > 200 RPS for 5 min)
  • SidReputationCollapse (≥ 2 sender-IDs dropping below 30 within 1 h)
  • SidRegulatorExportFailed (daily export job failed or missing)
  • SidAuditChainBroken (daily verifier detected break — Critical)
  • SidDocumentFetchFailureHigh (> 5% of inline view attempts fail for 10 min)
  • SidDnsVerificationStorm (> 100 verification retries/min)
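
SidAuditChainBroken depends on a daily verifier that recomputes the audit hash-chain on sender_id_registry.audit. The sketch below illustrates the general hash-chain technique under stated assumptions. Each row stores the previous row's hash, and its own hash is SHA-256 over `prevHash|seq|payload`. The row shape, the `|` separator, and the all-zero genesis value are hypothetical; the real columns live in DATA_MODEL.md.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape for illustration only.
interface AuditRow {
  seq: number;
  payload: string; // assumed: canonical (stable key order) JSON of the event
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed predecessor hash for the first row

function computeHash(prevHash: string, seq: number, payload: string): string {
  return createHash("sha256").update(`${prevHash}|${seq}|${payload}`).digest("hex");
}

// Appends a row linked to the current tail of the chain.
function appendRow(rows: AuditRow[], payload: string): AuditRow {
  const prevHash = rows.length === 0 ? GENESIS : rows[rows.length - 1].hash;
  const seq = rows.length + 1;
  const row: AuditRow = { seq, payload, prevHash, hash: computeHash(prevHash, seq, payload) };
  rows.push(row);
  return row;
}

// Returns the seq of the first row that fails verification, or null if intact.
// An edited payload fails the hash check; a deleted middle row breaks the
// next row's prevHash link.
function findFirstBreak(rows: readonly AuditRow[]): number | null {
  let expectedPrev = GENESIS;
  for (const row of rows) {
    if (row.prevHash !== expectedPrev) return row.seq;
    if (computeHash(row.prevHash, row.seq, row.payload) !== row.hash) return row.seq;
    expectedPrev = row.hash;
  }
  return null;
}
```

A chain check like this detects edits and mid-chain deletions. It cannot detect truncation of the most recent rows on its own, because the shortened chain still verifies. This is one reason the Postgres trigger rejecting UPDATE/DELETE is listed as a separate control.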

4. Security Readiness

| Criterion | Status |
|---|---|
| mTLS on gRPC + SPIRE SVID per ADR-0004 §12 | |
| NetworkPolicy restricting gRPC ingress to compliance-engine, routing-engine, sms-firewall | |
| Public search route has separate Kong rate-limit + JA3 fingerprint filter | |
| KYC S3 bucket: per-tenant DEK; HSM-wrapped KEK; bucket policy forbids public access | |
| Document watermark HMAC-signed with reviewer identity and timestamp | |
| Audit UPDATE/DELETE rejected by Postgres trigger | |
| Notary whitelist digitally signed by Legal | |
| Restricted-pattern list digitally signed by Legal + CISO on each change | |
| Pen test against public-search + admin review UI completed | |
| Homoglyph attack corpus tested against restricted-pattern matcher | |
| Security team sign-off | |
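
The watermark criterion says each inline document view carries an HMAC over the reviewer identity and a timestamp. Here is a minimal sketch of signing and verifying such a watermark with Node's `crypto` module. The field set, the newline-joined canonical form, and the key handling are all assumptions. In the service, the key would presumably come from the HSM-backed key hierarchy rather than a literal buffer.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical watermark payload.
interface Watermark {
  reviewerId: string;
  documentId: string;
  viewedAt: string; // ISO-8601 UTC timestamp
  sig: string;      // hex-encoded HMAC-SHA256
}

// Joins fields with "\n", assumed never to occur inside IDs, so that
// ("ab", "c") and ("a", "bc") cannot produce the same signed string.
function canonical(reviewerId: string, documentId: string, viewedAt: string): string {
  return [reviewerId, documentId, viewedAt].join("\n");
}

function signWatermark(key: Buffer, reviewerId: string, documentId: string, viewedAt: Date): Watermark {
  const ts = viewedAt.toISOString();
  const sig = createHmac("sha256", key).update(canonical(reviewerId, documentId, ts)).digest("hex");
  return { reviewerId, documentId, viewedAt: ts, sig };
}

function verifyWatermark(key: Buffer, wm: Watermark): boolean {
  const expected = createHmac("sha256", key)
    .update(canonical(wm.reviewerId, wm.documentId, wm.viewedAt))
    .digest();
  const given = Buffer.from(wm.sig, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

With the signature bound to the reviewer, a leaked screenshot can be attributed to a specific reviewer and viewing time. Any edit to the visible reviewer ID or timestamp invalidates the signature.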

5. Operational Readiness

| Criterion | Status |
|---|---|
| K8s Deployment (3–10 replicas) reviewed; HPA on gRPC RPS | |
| PDB minAvailable: 2 per region | |
| Rolling update tested: zero dropped Verify calls under 2 000 RPS | |
| Graceful shutdown: 15 s SIGTERM drain | |
| Postgres connection pool sized (pgbouncer transaction mode) | |
| Redis connection pool sized (50 min / 200 max per pod) | |
| S3 IAM policy reviewed; only the service account can Get/Put KYC objects | |
| Multi-region replication verified (sender-IDs are control-plane data, replicated multi-master) | |
| Regulator-export SFTP delivery tested end-to-end | |
| DNS-TXT verification runbook drafted | |
| Reputation-collapse incident runbook drafted | |
| Restricted-pattern-update runbook (dual-control) drafted | |
| On-call rotation assigned (Trust & Safety primary; SRE secondary) | |

6. Documentation Readiness

| Document | Status |
|---|---|
| SERVICE_OVERVIEW.md | Complete |
| DOMAIN_MODEL.md | Complete |
| APPLICATION_LOGIC.md | Complete |
| API_CONTRACTS.md | Complete |
| EVENT_SCHEMAS.md | Complete |
| DATA_MODEL.md | Complete |
| SYNC_CONTRACT.md | Complete |
| SECURITY_MODEL.md | Complete |
| OBSERVABILITY.md | Complete |
| TESTING_STRATEGY.md | Complete |
| DEPLOYMENT_TOPOLOGY.md | Complete |
| FAILURE_MODES.md | Complete |
| LOCAL_DEV_SETUP.md | Complete |
| AI_INTEGRATION.md | Complete |
| MIGRATION_PLAN.md | Complete |
| SERVICE_RISK_REGISTER.md | Complete |
| Runbook: Verify latency spike | |
| Runbook: KYC review SLA breach triage | |
| Runbook: reputation-score collapse | |
| Runbook: regulator export failure | |
| Runbook: restricted-pattern change | |
| Reviewer handbook for Trust & Safety workbench | |
| Tenant onboarding guide (registration + verification) | |

7. Compliance / Regulatory Readiness

| Criterion | Status |
|---|---|
| ATRA engagement on regulator-export format + cadence | |
| Notary-whitelist convention agreed with Legal | |
| Bank/GOV/MNO restricted-pattern list ratified by Legal + CISO | |
| DPIA authored for KYC document processing | |
| Citizen public-search Terms of Use + privacy notice approved | |
| Audit retention policy (13 months hot, 7 years cold) configured | |
| SIEM forwarding of sender.id.* events to regulator-portal approved | |
| Tenant attestation form for KYC-document authenticity signed at onboarding | |

8. Go/No-Go Criteria Summary

Production deployment is GO when all of the following are met:

  • All items in §1 Code Readiness complete.
  • All §2 coverage targets met.
  • Load test at 1.5× expected peak RPS (target 7 500 Verify RPS) sustains P99 ≤ 25 ms.
  • Homoglyph and restricted-pattern bypass corpus test passes (0 bypasses).
  • Regulator Liaison signs off on daily export schema.
  • 14-day shadow run of the verification workflow with design-partner tenants completed.
  • Chaos drill: Postgres, Redis, S3, DNS, NATS failure injections all degrade as designed.
  • Security team sign-off.
  • Reputation formula reviewed by the Trust & Safety lead; projected false-positive rate < 2%.
  • Legal sign-off on notary whitelist + restricted-pattern list.
  • Rollback plan validated in staging.

9. Post-Launch Review

Within 30 days of full enforcement:

  • Registration throughput vs. forecast; adjust reviewer capacity if needed.
  • KYC-review SLA compliance (target: 100% within 5 business days; 95% within 3).
  • Restricted-pattern false-positive rate (target: < 5% of rejections are legitimate brand names).
  • Reputation-score distribution (sanity check: not pathologically bimodal).
  • Auto-suspension review (target: < 10% of auto-suspensions reversed on manual review).
  • Count of public-search scraping attempts detected and mitigated.
  • Regulator-export job reliability (target: 100% daily exports delivered + ACK).
  • Audit-chain integrity: 0 breaks.
  • Cost analysis: cost per 1 000 Verify calls; S3 storage growth; notary-review labour hours.
  • Correlation analysis between sender-ID suspensions and tenant churn.
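
Both the SidKycReviewSlaBreach alert and the KYC-review SLA target above are measured in business days. A deadline therefore depends on which days count as non-working. Below is a minimal sketch of the deadline arithmetic. Weekend days and public holidays are caller-supplied parameters because the checklist does not define them. The example treats Friday and Saturday as the weekend purely for illustration. Converting a submission timestamp into its Asia/Kabul calendar date is left to the caller, and the function names are hypothetical.

```typescript
// Dates are "YYYY-MM-DD" strings handled at UTC midnight, so no DST or offset
// drift creeps into the arithmetic.
function addBusinessDays(
  startIso: string,                 // day the registration entered review
  businessDays: number,
  weekendDays: ReadonlySet<number>, // 0 = Sunday ... 6 = Saturday (caller-defined)
  holidays: ReadonlySet<string>,    // "YYYY-MM-DD" public holidays (caller-defined)
): string {
  const d = new Date(`${startIso}T00:00:00Z`);
  let remaining = businessDays;
  while (remaining > 0) {
    d.setUTCDate(d.getUTCDate() + 1); // rolls over month/year boundaries
    const iso = d.toISOString().slice(0, 10);
    if (!weekendDays.has(d.getUTCDay()) && !holidays.has(iso)) remaining--;
  }
  return d.toISOString().slice(0, 10);
}

// Breached once "today" is strictly after the business-day deadline.
// ISO dates compare correctly as plain strings.
function isSlaBreached(
  startIso: string,
  todayIso: string,
  slaDays: number,
  weekendDays: ReadonlySet<number>,
  holidays: ReadonlySet<string>,
): boolean {
  return todayIso > addBusinessDays(startIso, slaDays, weekendDays, holidays);
}
```

With a Friday/Saturday weekend and no holidays, a registration entering review on Tuesday 2026-04-21 reaches its 5-business-day deadline on 2026-04-28. Passing `slaDays = 3` gives the deadline for the stricter 95% target.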

10. Phased Rollout

| Phase | Duration | Behaviour | Exit criteria |
|---|---|---|---|
| P1: Registration Open, Enforcement Off | 30 d | Tenants can submit; KYC review and verification workflows are active. Verify returns ACTIVE for any submitted sender-ID regardless of its actual status (observation mode). | 80% of active tenants have ≥ 1 registered sender-ID. |
| P2: Enforcement for DOCUMENT+ tiers | 14 d | Verify returns the actual status and level; the compliance-engine SENDER_ID_VERIFICATION rule is enabled. Unregistered sender-IDs are still allowed. | No tenant escalations beyond forecast; regulator sign-off. |
| P3: Full Enforcement | Ongoing | Unregistered sender-IDs are blocked on P0/P1/P2 lanes. On P3/P4 lanes they receive a warning during a 30-day grace period and are blocked afterwards. Public search and regulator export are live. | N/A |

Rollback at any phase is via the feature flag SID_ENFORCEMENT_LEVEL=OFF|DOCUMENT_PLUS|ALL, whose values correspond to the P1, P2, and P3 behaviours respectively.
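
One reading of the phase table treats each flag value as selecting one phase's behaviour. The sketch below is a hypothetical decision function under that reading, for a sender-ID on a given lane:

- OFF (P1): every submitted sender-ID is reported ACTIVE.
- DOCUMENT_PLUS (P2): the real status is reported, and unregistered sender-IDs still pass.
- ALL (P3): unregistered sender-IDs are blocked on P0 to P2 lanes. On P3/P4 lanes they are warned until the 30-day grace period ends, then blocked.

Two points are assumptions not stated in the checklist: that registered but non-ACTIVE sender-IDs are blocked once enforcement is on, and that an unset or unknown flag value is a hard error. The DOCUMENT+ verification-level tiers are omitted for brevity. The names `decide` and `parseLevel` are illustrative only.

```typescript
type EnforcementLevel = "OFF" | "DOCUMENT_PLUS" | "ALL";
type Lane = "P0" | "P1" | "P2" | "P3" | "P4";
type Status = "SUBMITTED" | "KYC_REVIEW" | "VERIFIED" | "ACTIVE" | "SUSPENDED" | "REVOKED";
type Action = "ALLOW" | "WARN" | "BLOCK";

interface Decision {
  reportedStatus: Status | "UNREGISTERED";
  action: Action;
}

// Assumption: an unset or unrecognised flag is a startup error, never a silent default.
function parseLevel(raw: string | undefined): EnforcementLevel {
  if (raw === "OFF" || raw === "DOCUMENT_PLUS" || raw === "ALL") return raw;
  throw new Error(`invalid SID_ENFORCEMENT_LEVEL: ${String(raw)}`);
}

function decide(
  level: EnforcementLevel,
  status: Status | null,  // null = sender-ID was never submitted
  lane: Lane,
  graceExpired: boolean,  // the 30-day P3/P4 grace period has elapsed
): Decision {
  if (status === null) {
    // Unregistered sender-IDs are only restricted under full enforcement (P3).
    if (level !== "ALL") return { reportedStatus: "UNREGISTERED", action: "ALLOW" };
    if (lane === "P3" || lane === "P4") {
      return { reportedStatus: "UNREGISTERED", action: graceExpired ? "BLOCK" : "WARN" };
    }
    return { reportedStatus: "UNREGISTERED", action: "BLOCK" };
  }
  // P1 observation mode: any submitted sender-ID is reported ACTIVE.
  if (level === "OFF") return { reportedStatus: "ACTIVE", action: "ALLOW" };
  // Assumption: with enforcement on, only ACTIVE registrations pass.
  return { reportedStatus: status, action: status === "ACTIVE" ? "ALLOW" : "BLOCK" };
}
```

Because the flag is read once and fed into a pure function, rolling back from P3 to P2 only changes `level`. Registration data is untouched, which keeps the rollback path cheap to validate in staging.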