sender-id-registry-service — Service Readiness
Version: 1.0
Status: Draft
Owner: Trust and Safety + Regulator-facing
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, docs/architecture/ADR-0004-national-backbone-resilience.md
Readiness criteria for taking sender-id-registry-service from development to production. The service is the national authority for SMS sender-ID registration. compliance-engine, routing-engine, and sms-firewall-service consult it synchronously on every outbound message. Its reputation feed affects tenant scoring and auto-suspension, and its regulator export is a regulator-reportable artefact. The readiness bar is elevated accordingly.
1. Code Readiness
| Criterion | Status | Notes |
|---|---|---|
| gRPC SenderIdRegistryService.v1: Verify / GetReputation | ☐ | Core hot path. |
| REST /v1/sender-ids/*: tenant submit, verification artefact upload, public search, admin workflow | ☐ | |
| State machine: SUBMITTED → KYC_REVIEW → VERIFIED → ACTIVE → SUSPENDED → REVOKED | ☐ | |
| DNS-TXT verification challenger (_ghasi-sid-verify.{domain}) | ☐ | US-SID-006 |
| OTP verification via channel-router (lane=P1, 5-min TTL, 3-attempt max) | ☐ | US-SID-007 |
| Notarised-document review UI (admin) with document watermarking | ☐ | US-SID-008 |
| Document verification (commercial licence + national ID) | ☐ | US-SID-009 |
| Restricted-pattern enforcement at submit time | ☐ | US-SID-005 |
| Reputation cron (daily 00:30 UTC) computes score from compliance+fraud+DLR signals | ☐ | US-SID-017 |
| Reputation auto-suspension when score < 30 | ☐ | US-SID-012 |
| Reputation history persistence + trend API | ☐ | US-SID-018, US-SID-019 |
| Fraud signal NATS consumer (fraud.detected.*, compliance.message.blocked.v1) | ☐ | US-SID-020 |
| Public search endpoint with rate-limit + tarpit | ☐ | US-SID-015 |
| Regulator export cron (daily 04:00 Asia/Kabul) with signed JSON Lines output to S3 | ☐ | US-SID-016 |
| KYC document S3 storage with per-tenant DEK encryption | ☐ | |
| Audit hash-chain on sender_id_registry.audit | ☐ | |
| Redis hot-cache for Verify (TTL 300 s; invalidated on state change) | ☐ | |
| mTLS gRPC + Kong JWT REST | ☐ | |
| Idempotency-Key support on REST writes | ☐ | |
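The state-machine row and the "every allowed / disallowed edge" test target in §2 fit together neatly. Six states give 6 × 5 = 30 ordered pairs of distinct states, which matches the ≥ 30 figure. Below is a minimal sketch that encodes only the linear path listed above and enumerates all 30 edges for exhaustive tests. The names canTransition, transition and allEdges are hypothetical. Any further edges, such as reinstatement from SUSPENDED, are deliberately left out as unconfirmed assumptions.

```typescript
// Sender-ID lifecycle states, in the order listed in §1.
type SenderIdState =
  | "SUBMITTED"
  | "KYC_REVIEW"
  | "VERIFIED"
  | "ACTIVE"
  | "SUSPENDED"
  | "REVOKED";

const STATES: readonly SenderIdState[] = [
  "SUBMITTED",
  "KYC_REVIEW",
  "VERIFIED",
  "ACTIVE",
  "SUSPENDED",
  "REVOKED",
];

// Allowed edges: only the linear path named in §1.
// ASSUMPTION: other edges (e.g. SUSPENDED -> ACTIVE reinstatement) are
// intentionally absent until confirmed against the domain model.
const ALLOWED = new Map<SenderIdState, ReadonlySet<SenderIdState>>([
  ["SUBMITTED", new Set<SenderIdState>(["KYC_REVIEW"])],
  ["KYC_REVIEW", new Set<SenderIdState>(["VERIFIED"])],
  ["VERIFIED", new Set<SenderIdState>(["ACTIVE"])],
  ["ACTIVE", new Set<SenderIdState>(["SUSPENDED"])],
  ["SUSPENDED", new Set<SenderIdState>(["REVOKED"])],
  ["REVOKED", new Set<SenderIdState>()], // terminal state
]);

function canTransition(from: SenderIdState, to: SenderIdState): boolean {
  return ALLOWED.get(from)?.has(to) ?? false;
}

// Applies a transition or throws, so callers cannot apply an illegal edge.
function transition(from: SenderIdState, to: SenderIdState): SenderIdState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal sender-ID transition ${from} -> ${to}`);
  }
  return to;
}

// Every ordered pair of distinct states (6 x 5 = 30), tagged allowed or not.
// A unit suite can iterate this list to cover each edge exactly once.
function allEdges(): Array<[SenderIdState, SenderIdState, boolean]> {
  const edges: Array<[SenderIdState, SenderIdState, boolean]> = [];
  for (const from of STATES) {
    for (const to of STATES) {
      if (from !== to) edges.push([from, to, canTransition(from, to)]);
    }
  }
  return edges;
}
```

Generating the edge list from the same table the service uses means a new edge cannot ship without changing the expected allowed/disallowed counts in the tests.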
2. Testing Readiness
| Criterion | Target | Status |
|---|---|---|
| Unit test coverage | ≥ 90% line (domain), ≥ 80% branch | ☐ |
| Unit tests for state-machine transitions | ≥ 30 (every allowed / disallowed edge) | ☐ |
| Unit tests for restricted-pattern matcher | ≥ 50 including obfuscated variants | ☐ |
| Unit tests for reputation formula | ≥ 20 including boundary conditions | ☐ |
| Unit tests for verification challenge generators (DNS / OTP / document) | ≥ 15 each | ☐ |
| Property-based tests — reputation monotonicity under increasing fraud events | ≥ 10 properties | ☐ |
| Integration: gRPC Verify P95 ≤ 5 ms @ 5 000 RPS | Passed | ☐ |
| Integration: full registration → verification → activation e2e | Passed | ☐ |
| Integration: OTP verification through channel-router mock | Passed | ☐ |
| Integration: DNS-TXT verification against local DNS test server | Passed | ☐ |
| Integration: regulator export cron produces signed file under expected schema | Passed | ☐ |
| Integration: fraud signal consumer reduces reputation within 60 s of NATS event | Passed | ☐ |
| Integration: RLS cross-tenant access blocked (mandatory tenant-isolation.spec.ts) | Passed | ☐ |
| Contract: compliance-engine SENDER_ID_VERIFICATION rule (EP-CE-15) integration | Passed | ☐ |
| Contract: routing-engine last-mile veto | Passed | ☐ |
| Contract: sms-firewall EvaluateTransit flow | Passed | ☐ |
| Chaos: Postgres unavailable → Verify fails closed; public search fails safe (503) | Passed | ☐ |
| Chaos: S3 KYC fetch fails → admin UI shows a degraded state, but the reviewer can proceed via cache | Passed | ☐ |
| Chaos: DNS resolver unreachable → verification defers to manual | Passed | ☐ |
| Security: restricted-pattern enforcement cannot be bypassed via Unicode homoglyphs | Passed | ☐ |
| Security: document watermark present on every inline view; audit row confirms | Passed | ☐ |
| Security: KYC document encrypted at rest with tenant DEK | Passed | ☐ |
| Security: audit log UPDATE/DELETE rejected | Passed | ☐ |
| Load test: 500 registrations/min sustained; 10 000 Verify RPS burst | Passed | ☐ |
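The restricted-pattern and homoglyph rows above are usually implemented by reducing both the submitted sender-ID and each pattern to a comparison "skeleton" before matching. The sketch below follows that approach. It applies NFKC normalisation, lowercases, folds a small confusable table and strips separators. The confusable table is a tiny illustrative subset, not the Unicode confusables data (UTS #39) a production matcher would likely load. skeleton and findRestricted are hypothetical names, and the "BANK"/"GOV" entries are placeholders for the Legal + CISO signed list.

```typescript
// Illustrative subset of confusable folds, applied after NFKC + lowercasing.
// ASSUMPTION: production would load the full UTS #39 confusables data.
const FOLD: Readonly<Record<string, string>> = {
  "\u0430": "a", // Cyrillic small a
  "\u0432": "b", // Cyrillic small ve (its capital looks like Latin B)
  "\u0435": "e", // Cyrillic small ie
  "\u043a": "k", // Cyrillic small ka
  "\u043d": "h", // Cyrillic small en (its capital looks like Latin H)
  "\u043e": "o", // Cyrillic small o
  "\u0440": "p", // Cyrillic small er
  "\u0441": "c", // Cyrillic small es
  "\u0442": "t", // Cyrillic small te
  "\u0445": "x", // Cyrillic small ha
  "\u03b1": "a", // Greek small alpha
  "\u03bd": "v", // Greek small nu
  "\u03bf": "o", // Greek small omicron
  "0": "o",
  "1": "l",
  "3": "e",
  "5": "s",
  "@": "a",
  "$": "s",
};

// NFKC folds fullwidth/compatibility forms; after lowercasing and folding,
// anything outside [a-z0-9] is dropped so ".", "-", "_" or spaces cannot
// split a restricted word.
function skeleton(input: string): string {
  let out = "";
  for (const ch of input.normalize("NFKC").toLowerCase()) {
    const folded = FOLD[ch] ?? ch;
    if (/^[a-z0-9]$/.test(folded)) out += folded;
  }
  return out;
}

// Returns the first restricted pattern whose skeleton occurs in the
// sender-ID's skeleton, or null when none does.
function findRestricted(senderId: string, restricted: readonly string[]): string | null {
  const s = skeleton(senderId);
  for (const pattern of restricted) {
    const p = skeleton(pattern);
    if (p.length > 0 && s.includes(p)) return pattern; // skip empty patterns
  }
  return null;
}

// HYPOTHETICAL entries standing in for the signed restricted-pattern list.
const RESTRICTED: readonly string[] = ["BANK", "GOV"];
```

Substring matching on skeletons errs towards rejection. For example, "EMBANKMENT" contains "bank". The §9 false-positive target therefore matters as much as the zero-bypass target.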
3. Observability Readiness
| Criterion | Status |
|---|---|
| All Prometheus metrics emitting (see OBSERVABILITY.md §1) | ☐ |
| Grafana dashboard sender-id-registry-service.json deployed | ☐ |
| All alerts configured in Alertmanager with runbook links | ☐ |
| Structured JSON logs with document-id hashing (no raw contents) | ☐ |
| OTel trace propagation from compliance-engine → sender-id-registry verified | ☐ |
| Loki parsing rules validated | ☐ |
| SIEM forwarding of sender.id.* events verified | ☐ |
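For the "document-id hashing (no raw contents)" log criterion, one possible shape is a log builder with two rules. It drops raw document fields outright and replaces the document ID with a keyed hash, so log lines stay correlatable without exposing the ID. This sketch assumes a Node.js runtime (node:crypto) and a secret "pepper" held by the service. The field names and the helpers logLine and hashDocumentId are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// ASSUMPTION: illustrative names for fields carrying raw document material.
const FORBIDDEN_FIELDS: ReadonlySet<string> = new Set(["documentContents", "documentBytes"]);

// Keyed hash: with a secret pepper, guessable document IDs cannot be
// recovered by hashing candidate values. Truncated to 16 hex chars for logs.
function hashDocumentId(documentId: string, pepper: string): string {
  return createHmac("sha256", pepper).update(documentId).digest("hex").slice(0, 16);
}

// Builds one structured JSON log line with document identifiers redacted.
function logLine(event: string, fields: Record<string, unknown>, pepper: string): string {
  const out: Record<string, unknown> = { event };
  for (const [key, value] of Object.entries(fields)) {
    if (FORBIDDEN_FIELDS.has(key)) continue; // never log raw document material
    if (key === "documentId") {
      out["documentIdHash"] = hashDocumentId(String(value), pepper);
      continue;
    }
    out[key] = value;
  }
  return JSON.stringify(out);
}
```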
Alerts Configured
- SidVerifyLatencyHigh (gRPC Verify P95 > 15 ms for 5 min)
- SidKycReviewSlaBreach (any pending registration > 5 business days)
- SidPublicSearchAbuse (single IP > 200 RPS for 5 min)
- SidReputationCollapse (≥ 2 sender-IDs dropping below 30 in 1 h)
- SidRegulatorExportFailed (daily export job failed or missing)
- SidAuditChainBroken (daily verifier detected a break; Critical)
- SidDocumentFetchFailureHigh (> 5% of inline view attempts fail for 10 min)
- SidDnsVerificationStorm (> 100 verification retries/min)
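SidAuditChainBroken depends on a daily verifier walking the audit hash-chain. The sketch below shows one common construction under simplified assumptions. Each row stores the previous row's hash, and its own hash covers that link plus its sequence number and payload. Editing or deleting any row therefore breaks verification at that point. The row shape, the all-zero genesis hash and the helper names are hypothetical, not the actual sender_id_registry.audit schema. It assumes a Node.js runtime (node:crypto).

```typescript
import { createHash } from "node:crypto";

// ASSUMPTION: simplified row shape; the real audit table columns may differ.
interface AuditRow {
  readonly seq: number;
  readonly payload: string; // canonical serialisation of the audited change
  readonly prevHash: string;
  readonly hash: string;
}

const GENESIS_HASH = "0".repeat(64); // ASSUMPTION: fixed anchor for row 0

function rowHash(prevHash: string, seq: number, payload: string): string {
  return createHash("sha256").update(`${prevHash}|${seq}|${payload}`).digest("hex");
}

function appendRow(chain: readonly AuditRow[], payload: string): AuditRow[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1]!.hash : GENESIS_HASH;
  const seq = chain.length;
  return [...chain, { seq, payload, prevHash, hash: rowHash(prevHash, seq, payload) }];
}

// Walks rows in order; returns the seq of the first row whose link or content
// hash fails, or -1 when the whole chain verifies.
function firstBreak(chain: readonly AuditRow[]): number {
  let expectedPrev = GENESIS_HASH;
  for (const row of chain) {
    const linkOk = row.prevHash === expectedPrev;
    const contentOk = row.hash === rowHash(row.prevHash, row.seq, row.payload);
    if (!linkOk || !contentOk) return row.seq;
    expectedPrev = row.hash;
  }
  return -1;
}
```

A non-negative result from firstBreak would be the trigger for SidAuditChainBroken.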
4. Security Readiness
| Criterion | Status |
|---|---|
| mTLS on gRPC + SPIRE SVID per ADR-0004 §12 | ☐ |
| NetworkPolicy restricting gRPC ingress to compliance-engine, routing-engine, sms-firewall | ☐ |
| Public search route has separate Kong rate-limit + JA3 fingerprint filter | ☐ |
| KYC S3 bucket: per-tenant DEK; HSM-wrapped KEK; bucket policy forbids public | ☐ |
| Document watermark HMAC-signed with reviewer identity and timestamp | ☐ |
| Audit UPDATE/DELETE rejected by Postgres trigger | ☐ |
| Notary whitelist digitally signed by Legal | ☐ |
| Restricted-pattern list digitally signed by Legal+CISO on each change | ☐ |
| Pen test against public-search + admin review UI completed | ☐ |
| Homoglyph attack corpus tested against restricted-pattern matcher | ☐ |
| Security team sign-off | ☐ |
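The watermark criterion requires an HMAC over reviewer identity and timestamp. The sketch below also binds the document ID, as an assumption. The signing input is a JSON array so field boundaries cannot be shifted, and verification recomputes the MAC and compares it in constant time. It assumes a Node.js runtime (node:crypto). The key source (for example HSM-backed) and the helper names makeWatermark and verifyWatermark are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface Watermark {
  reviewerId: string;
  documentId: string;
  viewedAt: string; // ISO-8601 UTC timestamp
  signature: string; // hex-encoded HMAC-SHA256
}

// JSON array encoding keeps field boundaries unambiguous.
function signingInput(reviewerId: string, documentId: string, viewedAt: string): string {
  return JSON.stringify([reviewerId, documentId, viewedAt]);
}

function mac(key: Buffer, message: string): Buffer {
  return createHmac("sha256", key).update(message).digest();
}

function makeWatermark(key: Buffer, reviewerId: string, documentId: string, viewedAt: Date): Watermark {
  const iso = viewedAt.toISOString();
  const signature = mac(key, signingInput(reviewerId, documentId, iso)).toString("hex");
  return { reviewerId, documentId, viewedAt: iso, signature };
}

// Recomputes the MAC; any edited field or wrong key fails. The length check
// precedes timingSafeEqual, which throws on buffers of unequal length.
function verifyWatermark(key: Buffer, wm: Watermark): boolean {
  const expected = mac(key, signingInput(wm.reviewerId, wm.documentId, wm.viewedAt));
  const given = Buffer.from(wm.signature, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```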
5. Operational Readiness
| Criterion | Status |
|---|---|
| K8s Deployment (3–10 replicas) reviewed; HPA on gRPC RPS | ☐ |
| PDB minAvailable: 2 per region | ☐ |
| Rolling update tested: zero dropped Verify calls under 2 000 RPS | ☐ |
| Graceful shutdown: 15 s SIGTERM drain | ☐ |
| Postgres conn pool sized (pgbouncer transaction mode) | ☐ |
| Redis conn pool sized (50 min / 200 max per pod) | ☐ |
| S3 IAM policy reviewed; only the service account can Get/Put KYC objects | ☐ |
| Multi-region replication verified (sender-IDs are control-plane — multi-master) | ☐ |
| Regulator-export SFTP delivery tested end-to-end | ☐ |
| DNS-TXT verification runbook drafted | ☐ |
| Reputation-collapse incident runbook drafted | ☐ |
| Restricted-pattern-update runbook (dual-control) drafted | ☐ |
| On-call rotation assigned (Trust & Safety primary; SRE secondary) | ☐ |
6. Documentation Readiness
| Document | Status |
|---|---|
| SERVICE_OVERVIEW.md | Complete |
| DOMAIN_MODEL.md | Complete |
| APPLICATION_LOGIC.md | Complete |
| API_CONTRACTS.md | Complete |
| EVENT_SCHEMAS.md | Complete |
| DATA_MODEL.md | Complete |
| SYNC_CONTRACT.md | Complete |
| SECURITY_MODEL.md | Complete |
| OBSERVABILITY.md | Complete |
| TESTING_STRATEGY.md | Complete |
| DEPLOYMENT_TOPOLOGY.md | Complete |
| FAILURE_MODES.md | Complete |
| LOCAL_DEV_SETUP.md | Complete |
| AI_INTEGRATION.md | Complete |
| MIGRATION_PLAN.md | Complete |
| SERVICE_RISK_REGISTER.md | Complete |
| Runbook: Verify latency spike | ☐ |
| Runbook: KYC review SLA breach triage | ☐ |
| Runbook: reputation-score collapse | ☐ |
| Runbook: regulator export failure | ☐ |
| Runbook: restricted-pattern change | ☐ |
| Reviewer handbook for Trust & Safety workbench | ☐ |
| Tenant onboarding guide (registration + verification) | ☐ |
7. Compliance / Regulatory Readiness
| Criterion | Status |
|---|---|
| ATRA engagement on regulator-export format + cadence | ☐ |
| Notary-whitelist convention agreed with Legal | ☐ |
| Bank/GOV/MNO restricted-pattern list ratified by Legal + CISO | ☐ |
| DPIA authored for KYC document processing | ☐ |
| Citizen public-search Terms of Use + privacy notice approved | ☐ |
| Audit retention policy (13 months hot, 7 years cold) configured | ☐ |
| SIEM forwarding of sender.id.* events to regulator-portal approved | ☐ |
| Tenant attestation form for KYC-document authenticity signed at onboarding | ☐ |
8. Go/No-Go Criteria Summary
Production deployment is GO when all of the following are met:
- All items in §1 Code Readiness complete.
- All §2 coverage targets met.
- Load test at 1.5× expected peak RPS (target 7 500 Verify RPS) sustains P99 ≤ 25 ms.
- Homoglyph and restricted-pattern bypass corpus test passes (0 bypasses).
- Regulator Liaison signs off on daily export schema.
- 14-day shadow run of the verification workflow with design-partner tenants completed.
- Chaos drill: Postgres, Redis, S3, DNS, NATS failure injections all degrade as designed.
- Security team sign-off.
- Reputation formula reviewed by Trust & Safety lead; false-positive rate projection < 2%.
- Legal sign-off on notary whitelist + restricted-pattern list.
- Rollback plan validated in staging.
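This checklist does not define the reputation formula; it only sends the formula to the Trust & Safety lead for review. To make the §2 boundary and monotonicity tests concrete, here is a sketch with a purely hypothetical linear penalty over the three signal families named in §1: fraud events, compliance blocks and DLR failures. The 30-point auto-suspension threshold comes from §1. The weights, the ReputationSignals shape and the function names are all assumptions.

```typescript
// HYPOTHETICAL per-sender-ID inputs aggregated over the scoring window.
interface ReputationSignals {
  fraudEvents: number; // e.g. count of fraud.detected.* events (non-negative)
  complianceBlocks: number; // e.g. count of compliance.message.blocked.v1 events
  dlrFailureRatio: number; // failed / total delivery reports, 0..1
}

// HYPOTHETICAL weights; the real formula is what Trust & Safety reviews.
const WEIGHTS = { fraud: 10, block: 2, dlr: 40 } as const;
const AUTO_SUSPEND_BELOW = 30; // threshold from the §1 auto-suspension row

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

// Score starts at 100 and loses points per signal; result is kept in 0..100.
function reputationScore(s: ReputationSignals): number {
  const penalty =
    WEIGHTS.fraud * s.fraudEvents +
    WEIGHTS.block * s.complianceBlocks +
    WEIGHTS.dlr * clamp(s.dlrFailureRatio, 0, 1);
  return clamp(100 - penalty, 0, 100);
}

const shouldAutoSuspend = (score: number): boolean => score < AUTO_SUSPEND_BELOW;

// Property from §2: adding fraud events never raises the score.
function fraudMonotonic(base: ReputationSignals, maxExtra: number): boolean {
  let previous = reputationScore(base);
  for (let extra = 1; extra <= maxExtra; extra++) {
    const next = reputationScore({ ...base, fraudEvents: base.fraudEvents + extra });
    if (next > previous) return false;
    previous = next;
  }
  return true;
}
```

In a real suite, the monotonicity check would typically be driven by a property-based testing library over random inputs rather than the fixed loop used here.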
9. Post-Launch Review
Within 30 days of full enforcement:
- Registration throughput vs. forecast; adjust reviewer capacity if needed.
- KYC-review SLA compliance (target: 100% within 5 business days; 95% within 3).
- Restricted-pattern false-positive rate (target: < 5% of rejections are legitimate brand names).
- Reputation-score distribution (sanity: not pathologically bimodal).
- Auto-suspension review (target: < 10% of auto-suspensions reversed on manual review).
- Count of public-search scraping attempts detected and mitigated.
- Regulator-export job reliability (target: 100% daily exports delivered + ACK).
- Audit-chain integrity: 0 breaks.
- Cost analysis: per-1000 Verify calls; S3 storage growth; notary-review labour hours.
- Sender-ID suspend ↔ tenant churn correlation analysis.
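The KYC-review SLA target above (100% within 5 business days, 95% within 3) shares its business-day definition with the SidKycReviewSlaBreach alert. The sketch below makes that definition explicit under stated assumptions. It counts whole UTC calendar days after submission up to and including the decision day, skipping caller-supplied weekend days. It ignores public holidays. Which weekend days, holiday calendar and time zone actually apply is not specified here. All names are hypothetical.

```typescript
// ASSUMPTION: weekend days are passed in (0 = Sunday ... 6 = Saturday, as
// returned by Date.prototype.getUTCDay); public holidays are ignored here.
function businessDaysBetween(start: Date, end: Date, weekend: ReadonlySet<number>): number {
  const MS_PER_DAY = 86_400_000;
  // Whole UTC calendar days, so the time of day does not shift the count.
  const startDay = Math.floor(start.getTime() / MS_PER_DAY);
  const endDay = Math.floor(end.getTime() / MS_PER_DAY);
  let count = 0;
  for (let day = startDay + 1; day <= endDay; day++) {
    if (!weekend.has(new Date(day * MS_PER_DAY).getUTCDay())) count++;
  }
  return count;
}

interface ReviewRecord {
  submittedAt: Date;
  decidedAt: Date;
}

interface SlaCompliance {
  within3: number; // fraction of reviews decided within 3 business days
  within5: number; // fraction decided within 5 business days
}

function slaCompliance(records: readonly ReviewRecord[], weekend: ReadonlySet<number>): SlaCompliance {
  // No reviews in the window: treated as compliant (a convention choice).
  if (records.length === 0) return { within3: 1, within5: 1 };
  let within3 = 0;
  let within5 = 0;
  for (const r of records) {
    const days = businessDaysBetween(r.submittedAt, r.decidedAt, weekend);
    if (days <= 3) within3++;
    if (days <= 5) within5++;
  }
  return { within3: within3 / records.length, within5: within5 / records.length };
}

// Post-launch targets: 100% within 5 business days and 95% within 3.
function meetsSlaTargets(c: SlaCompliance): boolean {
  return c.within5 === 1 && c.within3 >= 0.95;
}
```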
10. Phased Rollout
| Phase | Duration | Behaviour | Exit criteria |
|---|---|---|---|
| P1: Registration Open, Enforcement Off | 30 d | Tenants can submit; KYC review and verification workflows are active. Verify returns ACTIVE for any submitted sender-ID regardless of status (observation mode). | 80% of active tenants have ≥ 1 registered sender-ID. |
| P2: Enforcement (DOCUMENT+ tiers) | 14 d | Verify returns actual status + level; compliance-engine SENDER_ID_VERIFICATION rule enabled. Unregistered sender-IDs still allowed. | No tenant escalation beyond forecast; regulator sign-off. |
| P3: Full Enforcement | Ongoing | Unregistered sender-IDs are blocked on P0/P1/P2 lanes; on P3/P4 lanes they trigger warnings during a 30-day grace period and are blocked afterwards. Public search live. Regulator export live. | N/A |
Rollback is possible at any phase by setting the feature flag SID_ENFORCEMENT_LEVEL to OFF, DOCUMENT_PLUS, or ALL.
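The phase table can be read as a small decision function. This sketch covers only the case the table spells out exactly: traffic from an unregistered sender-ID. It rests on several assumptions. OFF is taken to correspond to P1, DOCUMENT_PLUS to P2, and ALL to P3. The 30-day grace clock is taken to start when full enforcement begins. Unknown flag values are rejected rather than silently defaulted. The names parseLevel and decideUnregistered are hypothetical.

```typescript
type EnforcementLevel = "OFF" | "DOCUMENT_PLUS" | "ALL";
type Lane = "P0" | "P1" | "P2" | "P3" | "P4";
type Decision = "ALLOW" | "WARN" | "BLOCK";

const GRACE_DAYS = 30;
const STRICT_LANES: ReadonlySet<Lane> = new Set<Lane>(["P0", "P1", "P2"]);

// Fail loudly on a bad flag value instead of quietly changing enforcement.
function parseLevel(raw: string): EnforcementLevel {
  if (raw === "OFF" || raw === "DOCUMENT_PLUS" || raw === "ALL") return raw;
  throw new Error(`invalid SID_ENFORCEMENT_LEVEL: ${raw}`);
}

// Decision for an UNREGISTERED sender-ID only.
// ASSUMPTION: daysSinceFullEnforcement counts from the start of phase P3.
function decideUnregistered(level: EnforcementLevel, lane: Lane, daysSinceFullEnforcement: number): Decision {
  if (level !== "ALL") return "ALLOW"; // P1 and P2: unregistered IDs still allowed
  if (STRICT_LANES.has(lane)) return "BLOCK"; // P0/P1/P2 lanes block immediately
  return daysSinceFullEnforcement < GRACE_DAYS ? "WARN" : "BLOCK"; // P3/P4 grace
}
```

With this structure, a rollback is a single flag change: dropping from ALL to DOCUMENT_PLUS or OFF immediately returns unregistered traffic to ALLOW without redeploying the service.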