consent-ledger-service — Service Readiness
Version: 1.0
Status: Draft
Owner: Trust and Safety
Last Updated: 2026-04-21
References: SERVICE_OVERVIEW.md, _report.md, docs/architecture/ADR-0004-national-backbone-resilience.md
This document tracks the readiness criteria for taking consent-ledger-service from development to production. Because the service is the platform's authoritative consent ledger and is consulted synchronously on every outbound SMS, the readiness bar is elevated: fail-closed behaviour, sub-5 ms P95 CheckConsent, hash-chain audit integrity, and 7-year regulator-defensible retention.
1. Code Readiness
| Criterion | Status | Notes |
|---|---|---|
| gRPC ConsentLedgerService.v1: CheckConsent / RecordConsent / RevokeConsent / RecordConsentBatch | ☐ | Core hot-path. |
| REST /v1/consent/*: tenant records, double-opt-in, erasure, admin DND | ☐ | |
| Citizen-portal REST /v1/consent/records?msisdn= with MSISDN-OTP verification | ☐ | |
| STOP-keyword NATS consumer on sms.mo.inbound | ☐ | Durable, queue group consent-ledger-stop. |
| ATRA National DND sync worker (cron daily 03:00 Asia/Kabul) | ☐ | Graceful on ATRA unreachable. |
| Audit hash-chain implementation (prev_hash, `record_hash = sha256(payload \|\| prev_hash)`) | ☐ | |
| Audit chain daily verifier job (last 24 h) | ☐ | |
| Erasure processor (MSISDN → deterministic-hash tokenisation) | ☐ | GDPR 30-day SLA. |
| Monthly partition creator + cold-tier archive job (> 13 m → S3) | ☐ | |
| Redis hot-cache fill + invalidation on state change | ☐ | TTL 300 s; invalidation on revoke. |
| Fail-closed on CheckConsent when Redis cache miss + Postgres unavailable | ☐ | Return allowed=false, reason=CONSENT_UNKNOWN. |
| Localised STOP-ack dispatcher (en/fa/ps/ar) via channel-router | ☐ | Lane=P2 transactional. |
| Bulk-import CSV processor (US-CONS-018) | ☐ | |
| Consent SDK published (US-CONS-019) | ☐ | Node, Python, Java initial set. |
| Idempotency-Key support on REST writes | ☐ | |
| mTLS gRPC client-cert verification | ☐ | Mesh SVID enforcement. |
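The hash-chain criterion above (`record_hash = sha256(payload || prev_hash)`) can be illustrated with a minimal in-memory sketch. It is not the service's implementation: the `AuditRow` shape, the `GENESIS` sentinel, the hex encoding, and the helper names `appendRow` and `firstBreak` are all assumptions made for illustration, and the payload is assumed to be an already-canonicalised string.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit row. Only prev_hash and record_hash are named in this
// document; the payload field and its string type are assumptions.
interface AuditRow {
  payload: string;     // canonical serialisation of the audit event (assumed)
  prev_hash: string;   // record_hash of the preceding row, or GENESIS
  record_hash: string; // sha256(payload || prev_hash), hex-encoded
}

// Assumed sentinel used as prev_hash for the first row of a chain.
const GENESIS = "0".repeat(64);

// "||" is concatenation: payload bytes first, then prev_hash.
function hashRecord(payload: string, prevHash: string): string {
  return createHash("sha256")
    .update(payload, "utf8")
    .update(prevHash, "utf8")
    .digest("hex");
}

// Returns a new chain with one row appended; the input is not mutated.
function appendRow(chain: readonly AuditRow[], payload: string): AuditRow[] {
  const prevHash =
    chain.length > 0 ? chain[chain.length - 1].record_hash : GENESIS;
  const row: AuditRow = {
    payload,
    prev_hash: prevHash,
    record_hash: hashRecord(payload, prevHash),
  };
  return [...chain, row];
}

// Returns the index of the first row that breaks the chain, or -1 if intact.
function firstBreak(chain: readonly AuditRow[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    // A removed or reordered row shows up as a prev_hash mismatch.
    if (row.prev_hash !== expectedPrev) return i;
    // An edited payload shows up as a record_hash mismatch.
    if (row.record_hash !== hashRecord(row.payload, row.prev_hash)) return i;
    expectedPrev = row.record_hash;
  }
  return -1;
}
```

A daily verifier along these lines would walk the rows written in the last 24 h, seeding `expectedPrev` from the last verified row rather than from `GENESIS`. Because `prev_hash` has a fixed length here, plain concatenation stays unambiguous; a variable-length suffix would need a length prefix or delimiter.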
2. Testing Readiness
| Criterion | Target | Status |
|---|---|---|
| Unit test coverage | ≥ 90% line (domain) / ≥ 80% branch | ☐ |
| Unit tests for consent state machine transitions | ≥ 20 tests per scope | ☐ |
| Unit tests for STOP-keyword matcher per language | ≥ 50 per language (en/fa/ps/ar) | ☐ |
| Unit tests for MSISDN normalisation and hash-tokenisation | ≥ 30 | ☐ |
| Unit tests for hash-chain integrity (happy path, tamper, break) | ≥ 15 | ☐ |
| Property-based tests (fast-check) — chain monotonicity, scope isolation | ≥ 10 properties | ☐ |
| Integration test: gRPC CheckConsent P95 ≤ 5 ms @ 5000 RPS | Passed | ☐ |
| Integration test: STOP MO → consent.revoked.v1 end-to-end < 1 s | Passed | ☐ |
| Integration test: ATRA DND sync with mock endpoint | Passed | ☐ |
| Integration test: multi-region replication of control-plane data | Passed | ☐ |
| Contract test with compliance-engine (CONSENT rule integration) | Passed | ☐ |
| Contract test with routing-engine (last-mile veto) | Passed | ☐ |
| Contract test with channel-router (MO STOP detection) | Passed | ☐ |
| Chaos test: Postgres unavailable → fail-closed verified | Passed | ☐ |
| Chaos test: Redis unavailable → PG fallback, P95 degrades gracefully | Passed | ☐ |
| Chaos test: NATS lag → STOP-keyword processing queues, no message loss | Passed | ☐ |
| Security test: RLS cross-tenant read/write blocked | Passed | ☐ |
| Security test: audit log UPDATE/DELETE rejected at Postgres trigger | Passed | ☐ |
| Security test: hash-chain tamper detected by verifier within 24 h | Passed | ☐ |
| Security test: MSISDN erasure actually purges from records + audit (tokenised) | Passed | ☐ |
| Load test: 10 000 RPS sustained for 1 h, P99 ≤ 20 ms | Passed | ☐ |
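The two chaos-test rows for Redis and Postgres outages exercise one decision path: use the cache when it answers, fall back to Postgres on a miss or a Redis outage, and fail closed with `allowed=false, reason=CONSENT_UNKNOWN` when Postgres cannot answer either. The sketch below models that path as a synchronous pure function so it is easy to assert on. The `CacheLookup`, `DbLookup`, and `Verdict` types and the `source` field are assumptions for illustration; a real handler would be asynchronous and would also need a rule for MSISDNs with no record at all, which this document does not specify.

```typescript
// Outcome of the Redis lookup. "unavailable" models a timeout or a
// connection error. These shapes are illustrative assumptions.
type CacheLookup =
  | { kind: "found"; allowed: boolean }
  | { kind: "miss" }
  | { kind: "unavailable" };

// Outcome of the Postgres lookup, simplified to answered or unavailable.
type DbLookup =
  | { kind: "found"; allowed: boolean }
  | { kind: "unavailable" };

interface Verdict {
  allowed: boolean;
  reason?: string; // only CONSENT_UNKNOWN is taken from this document
  source: "redis" | "postgres" | "none"; // hypothetical diagnostic field
}

// queryPostgres is passed lazily so a cache hit never touches Postgres.
function checkConsent(
  cache: CacheLookup,
  queryPostgres: () => DbLookup,
): Verdict {
  if (cache.kind === "found") {
    return { allowed: cache.allowed, source: "redis" };
  }
  // Cache miss or Redis outage: fall back to Postgres.
  const db = queryPostgres();
  if (db.kind === "found") {
    return { allowed: db.allowed, source: "postgres" };
  }
  // Neither store could answer: fail closed.
  return { allowed: false, reason: "CONSENT_UNKNOWN", source: "none" };
}
```

A chaos test can then pin the contract by asserting that no combination of `miss` or `unavailable` inputs ever yields `allowed: true`.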
3. Observability Readiness
| Criterion | Status |
|---|---|
| All Prometheus metrics emitting (see OBSERVABILITY.md §1) | ☐ |
| Grafana dashboard consent-ledger-service.json deployed | ☐ |
| All alerts configured in Alertmanager with runbooks | ☐ |
| Structured JSON logs (Pino) with MSISDN hash-masking | ☐ |
| OpenTelemetry trace propagation from Kong → compliance-engine → consent-ledger verified | ☐ |
| Loki parsing rules for service logs validated | ☐ |
| SIEM forwarding of consent.* events via regulator-portal-service verified | ☐ |
4. Security Readiness
| Criterion | Status |
|---|---|
| mTLS enforced on gRPC port (SPIRE SVID, per ADR-0004 §12) | ☐ |
| NetworkPolicy restricting ingress to compliance-engine, routing-engine, sms-firewall, channel-router | ☐ |
| Kong JWT validation on all REST endpoints | ☐ |
| Citizen-portal MSISDN-OTP verification flow hardened (rate-limit, anti-enumeration) | ☐ |
| RBAC: tenant scope, citizen self-only, admin | ☐ |
| MSISDN encryption at rest (per-tenant DEK wrapped by HSM KEK per ADR-0004 §11) | ☐ |
| Erasure tokenisation uses HSM-bound deterministic key (FF1) | ☐ |
| Audit log trigger rejects UPDATE/DELETE (Postgres rule) | ☐ |
| RLS policies verified on consent.records and consent.audit | ☐ |
| Penetration test against citizen-portal + gRPC completed | ☐ |
| Security team sign-off | ☐ |
5. Operational Readiness
| Criterion | Status |
|---|---|
| K8s Deployment manifest (3–15 replicas, HPA on gRPC RPS) reviewed | ☐ |
| PodDisruptionBudget minAvailable: 2 (per region) | ☐ |
| Rolling update tested: zero dropped gRPC calls under steady 2 000 RPS | ☐ |
| Graceful shutdown: 15 s drain with SIGTERM handler | ☐ |
| Resource requests/limits validated under 5 000 RPS load | ☐ |
| Postgres connection pool sized (pgbouncer in transaction mode recommended) | ☐ |
| Redis connection pool sized (min 50, max 200 per pod) | ☐ |
| Multi-region replication verified (kbl ↔ mzr logical repl for control-plane) | ☐ |
| ATRA DND-sync runbook drafted | ☐ |
| Erasure-request handling runbook drafted (Legal + Trust & Safety joint) | ☐ |
| Hash-chain-break incident runbook drafted | ☐ |
| On-call rotation assigned (Trust & Safety + SRE shared primary) | ☐ |
6. Documentation Readiness
| Document | Status |
|---|---|
| SERVICE_OVERVIEW.md | Complete |
| DOMAIN_MODEL.md | Complete |
| APPLICATION_LOGIC.md | Complete |
| API_CONTRACTS.md | Complete |
| EVENT_SCHEMAS.md | Complete |
| DATA_MODEL.md | Complete |
| SYNC_CONTRACT.md | Complete |
| SECURITY_MODEL.md | Complete |
| OBSERVABILITY.md | Complete |
| TESTING_STRATEGY.md | Complete |
| DEPLOYMENT_TOPOLOGY.md | Complete |
| FAILURE_MODES.md | Complete |
| LOCAL_DEV_SETUP.md | Complete |
| AI_INTEGRATION.md | Complete |
| MIGRATION_PLAN.md | Complete |
| SERVICE_RISK_REGISTER.md | Complete |
| Runbook: DND sync staleness | ☐ |
| Runbook: hash-chain verifier break | ☐ |
| Runbook: STOP-keyword false positive triage | ☐ |
| Runbook: citizen erasure end-to-end | ☐ |
| Legal briefing: 7-year retention + GDPR erasure interaction | ☐ |
| Operator handbook for Trust & Safety reviewers | ☐ |
7. Compliance / Regulatory Readiness
| Criterion | Status |
|---|---|
| DPIA authored for MSISDN processing and hashing | ☐ |
| Legal review of 7-year audit retention vs. GDPR right-to-erasure interaction | ☐ |
| ATRA Memorandum of Understanding on National DND registry integration | ☐ |
| Citizen-portal Terms of Use and privacy notice approved | ☐ |
| STOP-keyword default catalog reviewed by Legal and Trust & Safety lead | ☐ |
| Initial scope catalog (TRANSACTIONAL/MARKETING/OTP/EMERGENCY) approved | ☐ |
| Cross-tenant STOP-propagation policy signed off (per-tenant default) | ☐ |
| Tenant-portal consent-inspection flow reviewed | ☐ |
| 7-year retention policy with S3 immutable bucket configured | ☐ |
| SIEM-forwarding of consent events to regulator-portal approved | ☐ |
8. Go/No-Go Criteria Summary
Production deployment is GO when all of the following are met:
9. Post-Launch Review
Within 30 days of full enforcement:
10. Phased Rollout
The service follows a 3-phase rollout to de-risk national-backbone deployment:
| Phase | Duration | Behaviour | Exit criteria |
|---|---|---|---|
| P1: Shadow | 14 d | CheckConsent returns allowed=true for all requests; hypothetical verdicts are recorded in the audit table. No enforcement. | Metrics parity; projected false-positive rate < 1%. |
| P2: Enforcement (single scope) | 7 d | Enforcement is on for the MARKETING scope only; all other scopes remain in shadow. | No tenant escalation beyond expected; SLA met. |
| P3 — Full Enforcement | Ongoing | All scopes enforced; citizen-portal live; STOP-keyword processing live. | N/A (steady state). |
Rollback at any phase: set the feature flag CONSENT_ENFORCEMENT_ENABLED=false, after which consumers fall back to their previous behaviour (which, per the critique, was implicit).
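The phase table and the rollback flag can be read as one gating function that decides, per request, whether the ledger's verdict is enforced or only recorded. The sketch below is one possible reading, not the service's code. The `Phase` identifiers, the `RolloutConfig` and `Outcome` shapes, and the function names are hypothetical. Treating the flag-off case as "not enforced, allow" is also an assumption, since the document describes the previous consumer behaviour only as implicit.

```typescript
type Scope = "TRANSACTIONAL" | "MARKETING" | "OTP" | "EMERGENCY";

// Hypothetical identifiers for the three phases in the table.
type Phase = "P1_SHADOW" | "P2_SINGLE_SCOPE" | "P3_FULL";

interface RolloutConfig {
  enforcementEnabled: boolean; // mirrors CONSENT_ENFORCEMENT_ENABLED
  phase: Phase;
}

interface Outcome {
  allowed: boolean;             // what the caller receives
  hypotheticalAllowed: boolean; // ledger verdict, recorded for audit
  enforced: boolean;            // whether the ledger verdict was applied
}

function isEnforced(cfg: RolloutConfig, scope: Scope): boolean {
  // Rollback switch: assumed to disable enforcement in every phase.
  if (!cfg.enforcementEnabled) return false;
  if (cfg.phase === "P1_SHADOW") return false;
  if (cfg.phase === "P2_SINGLE_SCOPE") return scope === "MARKETING";
  return true; // P3_FULL: all scopes enforced
}

function applyRollout(
  cfg: RolloutConfig,
  scope: Scope,
  ledgerAllowed: boolean,
): Outcome {
  const enforced = isEnforced(cfg, scope);
  return {
    // In shadow mode the caller always sees allowed=true.
    allowed: enforced ? ledgerAllowed : true,
    hypotheticalAllowed: ledgerAllowed,
    enforced,
  };
}
```

Keeping `hypotheticalAllowed` on every outcome is what makes the P1 exit criterion measurable: the projected false-positive rate can be computed from shadow verdicts before any message is actually blocked.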