Consent Ledger Service — Sync Contract
Version: 1.0 | Status: Draft | Owner: Trust & Safety | Last Updated: 2026-04-21 | Companion: API_CONTRACTS · APPLICATION_LOGIC · SECURITY_MODEL · ADR-0004 §3
This document defines what other services depend on from consent-ledger-service and what it depends on from others, how its aggregates resolve concurrent updates, how it replicates across regions per ADR-0004, and the proto contract for the synchronous gRPC surface.
1. Consumers of consent-ledger-service
| Service | Interface | Dependency type | SLA expectation |
|---|---|---|---|
| compliance-engine (CONSENT rule type) | gRPC ConsentLedgerService/CheckConsent | Synchronous, per-message in async pipeline | P95 ≤ 5 ms; availability 99.95% |
| routing-engine (last-mile veto) | gRPC CheckConsent | Synchronous, last-mile pre-dispatch | P95 ≤ 5 ms; availability 99.95% |
| sms-firewall-service (inbound MT) | gRPC CheckConsent | Synchronous, per inbound MT | P95 ≤ 10 ms (the firewall has a looser SLA); availability 99.9% |
| Tenant SDK / dev portal | gRPC RecordConsent, RecordConsentBatch, RevokeConsent | Synchronous tenant write path | P95 ≤ 80 ms; availability 99.9% |
| Tenant portal (REST) | /v1/consent/records*, /v1/consent/double-opt-in/* | Synchronous admin-style | P95 ≤ 200 ms; availability 99.5% |
| Citizen portal | /v1/consent/citizen/* | Synchronous public-facing | P95 ≤ 500 ms; availability 99.5% |
| regulator-portal-service | /v1/admin/consent/audit*, NATS consent.audit.* | Synchronous query + async events | Query P95 ≤ 2 s (hot window); event delivery P95 ≤ 5 s |
Async contract semantics
Although CheckConsent is synchronous, every caller operates inside the platform's async outbound message pipeline: tenants have already received 202 Accepted, the orchestrator is processing from NATS, and any CheckConsent failure translates to a do-not-ack and JetStream redelivery upstream. As a result, consent-ledger-service failing closed for operational reasons never violates the tenant's API contract; it surfaces as DEAD_LETTER status in the portal after retry exhaustion.
compliance-engine, routing-engine, and sms-firewall-service MUST treat any non-OK response from CheckConsent as allowed: false for non-emergency lanes (defence in depth even though consent-ledger-service itself returns fail-closed on its own internal errors).
2. Dependencies of consent-ledger-service
| Dependency | Interface | Failure mode if unavailable |
|---|---|---|
| PostgreSQL consent schema | Read/write SQL via PgBouncer pool | CheckConsent falls back to Redis only; if both miss → fail-closed CONSENT_UNKNOWN. Writes return 503 |
| Redis (cluster mode, DB 4) | GET/SET; Lua scripts for chained operations | Writes succeed without cache fill (DB hit on next read); CheckConsent falls back to Postgres direct |
| NATS JetStream | Publishes via outbox; consumes sms.mo.inbound, auth.user.erased.v1, tenant.lifecycle.* | Outbox accumulates rows; OutboxRelay retries; STOP processing pauses; alert ConsentMoConsumerLag |
| ATRA DND endpoint (SFTP/HTTPS) | Daily fetch | DndSyncWorker records error; ConsentDndStale fires after 24 h; verdicts continue against last-known DND |
| Vault (KMS Transit, KV, PKI) | Per-tenant KEK; pepper; mTLS certs | Service refuses to boot without TLS; cached DEKs cover ≤ 30 min outage; alert and refuse new tenant onboarding after expiry |
| auth-service.VerifyOtpReceipt | gRPC | Citizen erasure / inspection blocked; tenant flows unaffected |
| channel-router-service (publishes ack-back MT and double-opt-in MT) | NATS sms.outbound.request | Ack-back / opt-in SMS not dispatched; consent revocation still recorded; alert |
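The Postgres and Redis rows above imply a read path that can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual code: `resolveConsent`, `CacheLookup`, and `DbLookup` are hypothetical names, the lookups are synchronous for brevity, and checking the cache before the database is an assumed ordering.

```typescript
// Hypothetical sketch of the CheckConsent read path implied by the table above.
// All names here are illustrative; only CONSENT_UNKNOWN comes from the document.
type Verdict = { allowed: boolean; reason: string };

// Returns a verdict on a hit, null on a miss, and throws when Redis is unavailable.
type CacheLookup = (key: string) => Verdict | null;

// Returns a definitive verdict, and throws when Postgres is unavailable.
type DbLookup = (key: string) => Verdict;

function resolveConsent(key: string, cache: CacheLookup, db: DbLookup): Verdict {
  let cached: Verdict | null = null;
  try {
    cached = cache(key);
  } catch {
    cached = null; // Redis unavailable: fall back to Postgres directly
  }
  if (cached !== null) return cached;

  try {
    return db(key);
  } catch {
    // Postgres unavailable and nothing usable in Redis: fail closed.
    return { allowed: false, reason: "CONSENT_UNKNOWN" };
  }
}
```

With both stores healthy the cache answers first; with either one down the other is consulted; only when neither yields a verdict does the caller see the fail-closed CONSENT_UNKNOWN result.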
3. Proto Definition
```protobuf
syntax = "proto3";

package ghasi.sms.consent.v1;

option go_package = "github.com/ghasi/sms-gateway/consent/v1";

import "google/protobuf/timestamp.proto";

service ConsentLedgerService {
  rpc CheckConsent (CheckConsentRequest) returns (CheckConsentResponse);
  rpc RecordConsent (RecordConsentRequest) returns (RecordConsentResponse);
  rpc RecordConsentBatch (RecordConsentBatchRequest) returns (RecordConsentBatchResponse);
  rpc RevokeConsent (RevokeConsentRequest) returns (RevokeConsentResponse);
  rpc GetReputation (GetReputationRequest) returns (GetReputationResponse);
}

message CheckConsentRequest {
  string tenant_id = 1;
  string msisdn = 2;
  ConsentScope scope = 3;
  string trace_id = 4;
  Lane lane = 5;
}

message CheckConsentResponse {
  bool allowed = 1;
  CheckConsentReason reason = 2;
  string record_id = 3;
  google.protobuf.Timestamp cached_at = 4;
  google.protobuf.Timestamp valid_until = 5;
}

// (full enums and other messages as in API_CONTRACTS §1)
```
The complete proto including enums, batch/reputation messages, and error definitions is the same as in API_CONTRACTS §1 and is not duplicated here.
4. Per-aggregate conflict policy
The six aggregates below have distinct conflict semantics. All inter-region replication follows ADR-0004 §3: Kabul is the control-plane primary, with Herat and Mazar as control-plane standbys. The consent ledger is control-plane data, so it is Kabul-primary with Herat as hot standby.
| Aggregate | Policy | Rationale |
|---|---|---|
| ConsentRecord | server_authoritative with monotonic version | Records are immutable per row (replaced_by chain). Concurrent RecordConsent and RevokeConsent from the same tenant on the same (msisdn, scope) are serialised by a row-level advisory lock (`pg_advisory_xact_lock(hashtext(tenant_id::text …`). |
| ConsentAuditEntry | append_only with chain ordering | Per-partition seq is monotonic; Postgres advisory lock on partition_name serialises inserts. The hash chain itself is the integrity guarantee. Cross-region replication is logical streaming replication (pg_logical); Herat replays the chain in order. The chain verifier runs in both regions and cross-checks; a divergence raises ConsentAuditChainBroken (CRITICAL) with region tags. |
| NationalDndEntry | server_authoritative sourced from ATRA | The DND aggregate is a mirror; ATRA is the upstream master. The DndSyncWorker writes only in the Kabul primary; Herat receives via streaming replication. In a Kabul-region failure, the standby region serves DND reads from its replicated copy. |
| StopKeyword | server_authoritative with last-write-wins on updated_at | Catalog mutations are infrequent and admin-driven; LWW is acceptable. Hot reload in process memory uses consent.stop_keywords.updated_at as the change marker. |
| DoubleOptin | server_authoritative with state-machine guards | Status transitions enforced at the SQL level (CHECK + advisory lock on optin_id). |
| ErasureRequest | server_authoritative | Single owner per erasure_id; processor uses row-level advisory lock to prevent double processing. |
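To make the ConsentAuditEntry row more concrete, here is a hedged sketch of what hash-chain verification could look like. The entry shape, the hash input layout, and the names `AuditEntry`, `appendEntry`, and `firstBrokenSeq` are all assumptions made for illustration; the document does not specify them. The toy FNV-1a hash is swapped in only to keep the example dependency-free and is not cryptographic; a real chain would use a cryptographic hash such as SHA-256.

```typescript
// Hypothetical sketch of audit hash-chain verification. Field names and the
// hash input layout are assumptions, not the service's actual schema.
interface AuditEntry {
  seq: number;      // per-partition, monotonic
  payload: string;
  prevHash: string; // hash of the previous entry ("" for the first entry)
  hash: string;     // hash over prevHash, seq and payload
}

type HashFn = (input: string) => string;

// Toy 32-bit FNV-1a, used only so the example runs without dependencies.
// NOT cryptographic; substitute a real hash in practice.
const fnv1a: HashFn = (input) => {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    // Multiply by the FNV prime 0x01000193 using shifts, modulo 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
};

function entryHash(hash: HashFn, prevHash: string, seq: number, payload: string): string {
  return hash(`${prevHash}|${seq}|${payload}`);
}

// Appends an entry linked to the current tail of the chain.
function appendEntry(chain: AuditEntry[], payload: string, hash: HashFn): AuditEntry[] {
  const prev = chain.length > 0 ? chain[chain.length - 1] : null;
  const seq = prev ? prev.seq + 1 : 1;
  const prevHash = prev ? prev.hash : "";
  return [...chain, { seq, payload, prevHash, hash: entryHash(hash, prevHash, seq, payload) }];
}

// Returns the seq of the first entry that breaks the chain, or null if intact.
function firstBrokenSeq(chain: AuditEntry[], hash: HashFn): number | null {
  let prevHash = "";
  let prevSeq = 0;
  for (const e of chain) {
    const intact =
      e.seq === prevSeq + 1 &&
      e.prevHash === prevHash &&
      e.hash === entryHash(hash, e.prevHash, e.seq, e.payload);
    if (!intact) return e.seq;
    prevHash = e.hash;
    prevSeq = e.seq;
  }
  return null;
}
```

A cross-region check in this style would run `firstBrokenSeq` on each region's copy and compare the tail hashes; any mismatch corresponds to the ConsentAuditChainBroken condition described in the table.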
Outbox pattern
Every state mutation writes a row to consent.outbox in the same Postgres transaction. The OutboxRelay worker (continuous; per-replica with SELECT ... FOR UPDATE SKIP LOCKED):
- Picks up to 200 unpublished rows ordered by created_at.
- Publishes to NATS with Msg-Id: event_id for consumer-side dedup.
- Updates published_at on success; increments attempts and stores last_error on failure.
After 3 attempts the row remains for SRE inspection; ConsentOutboxStuck alert fires on count(unpublished AND attempts >= 3) > 0.
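The relay's bookkeeping can be sketched as pure functions over an in-memory row list. This is an illustration only: the row shape mirrors the columns named above, but `selectBatch`, `applyResult`, and `stuckCount` are hypothetical names, and excluding rows with 3 or more attempts from later batches is an assumed reading of "the row remains for SRE inspection". In the real worker the selection would be the SELECT ... FOR UPDATE SKIP LOCKED query and the publish step would be asynchronous.

```typescript
// Hypothetical sketch of OutboxRelay bookkeeping over in-memory rows.
interface OutboxRow {
  eventId: string;
  createdAt: number;          // epoch milliseconds
  publishedAt: number | null; // null while unpublished
  attempts: number;
  lastError: string | null;
}

const BATCH_SIZE = 200;  // from the text above
const MAX_ATTEMPTS = 3;  // from the text above

// Picks up to BATCH_SIZE unpublished rows, oldest first. Rows that have
// exhausted their attempts are skipped (assumption) and left for inspection.
function selectBatch(rows: OutboxRow[]): OutboxRow[] {
  return rows
    .filter((r) => r.publishedAt === null && r.attempts < MAX_ATTEMPTS)
    .sort((a, b) => a.createdAt - b.createdAt)
    .slice(0, BATCH_SIZE);
}

type PublishResult = { status: "published" } | { status: "failed"; error: string };

// Applies the outcome of one publish attempt to a row without mutating it.
function applyResult(row: OutboxRow, result: PublishResult, now: number): OutboxRow {
  if (result.status === "published") {
    return { ...row, publishedAt: now };
  }
  return { ...row, attempts: row.attempts + 1, lastError: result.error };
}

// Mirrors the ConsentOutboxStuck condition: unpublished AND attempts >= 3.
function stuckCount(rows: OutboxRow[]): number {
  return rows.filter((r) => r.publishedAt === null && r.attempts >= MAX_ATTEMPTS).length;
}
```

Keeping the state transition separate from the publish call makes the retry accounting easy to test without a NATS connection.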
Cross-region replication topology
```text
             Kabul (primary)             Herat (sync standby)          Mazar (async standby)
           ┌──────────────────┐          ┌──────────────────┐          ┌──────────────────┐
           │ Postgres 16      │   sync   │ Postgres 16      │  async   │ Postgres 16      │
Writes ──▶ │ consent schema   │ ───────▶ │ consent schema   │ ───────▶ │ consent schema   │
           │ RPO ≤ 5 s        │          │ RPO ≤ 5 s        │          │ RPO ≤ 60 s       │
           │ NATS cluster A   │          │ NATS cluster B   │          │ NATS cluster C   │
           │ Vault Transit    │          │ Vault Transit    │          │ Vault Transit    │
           │ Redis cluster    │          │ Redis cluster    │          │ Redis cluster    │
           └──────────────────┘          └──────────────────┘          └──────────────────┘
                    ▲                             ▲                             ▲
                    │ promote on failure ─────────┘                             │
                    └──────────────────── tertiary failover ────────────────────┘
```
Per ADR-0004 §3:
- Kabul is the control-plane primary for consent-ledger-service.
- Herat is the synchronous standby; promotion target for any Kabul outage > 60 s.
- Mazar is the asynchronous tertiary; promotion target if both Kabul and Herat are unavailable.
- All three regions are within Afghanistan; no consent data leaves the country (per CONS-US-015 data residency invariant).
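The promotion order in the list above can be expressed as a small decision function. This is a sketch under assumptions: `ClusterState` and `promotionTarget` are hypothetical names, and region health is reduced to simple flags. It only selects the target; how a promotion is triggered (automatic or manual) is outside its scope.

```typescript
// Hypothetical sketch of promotion-target selection per the list above.
type Region = "kabul" | "herat" | "mazar";

interface ClusterState {
  kabulOutageMs: number; // 0 when Kabul is healthy
  heratUp: boolean;
  mazarUp: boolean;
}

const KABUL_OUTAGE_THRESHOLD_MS = 60000; // "any Kabul outage > 60 s"

// Returns the region that should hold the primary role, or null if no
// region is able to take it.
function promotionTarget(s: ClusterState): Region | null {
  if (s.kabulOutageMs <= KABUL_OUTAGE_THRESHOLD_MS) return "kabul"; // no promotion yet
  if (s.heratUp) return "herat"; // synchronous standby is preferred
  if (s.mazarUp) return "mazar"; // tertiary, only if Kabul and Herat are both out
  return null;
}
```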
Failover semantics
| Failure | Detection | Action |
|---|---|---|
| Kabul Postgres primary down | Patroni health-check + 2 of 3 etcd consensus | Auto-promote Herat to primary; ≤ 90 s RTO; clients reconnect via DNS-managed Service VIP |
| Kabul control-plane region isolated | NetworkPolicy + cluster Heartbeat lost > 30 s | Manual cutover by on-call; consent-ledger-service write traffic routed to Herat |
| Audit chain divergence between regions | Cross-region AuditChainVerifier cron | CRITICAL alert; freeze writes; investigate; never auto-resolve |
| ATRA DND fetch fails | Worker error metric | Service continues with last-known DND; ConsentDndStale after 24 h |
5. Schema stability guarantees
gRPC proto
| Field | Stability |
|---|---|
| CheckConsentRequest.tenant_id, msisdn, scope | Stable; required forever |
| CheckConsentResponse.allowed, reason | Stable |
| CheckConsentReason enum | Stable; new values may be added; callers MUST handle REASON_UNSPECIFIED |
| Lane enum | Stable; new values may be added |
| RecordConsentRequest.source | Stable; ConsentSource may add fields |
| New fields with proto3 default values | Non-breaking |
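Because new CheckConsentReason values may appear at any time, a caller built against an older proto can receive a number it does not recognise. One way to stay forward-compatible, sketched here under assumptions, is to normalise unrecognised wire values to REASON_UNSPECIFIED before acting on them. The numeric assignments below are hypothetical apart from 0 being the proto3 default; the real enum is defined in API_CONTRACTS §1, and `normalizeReason` is an illustrative name.

```typescript
// Hypothetical sketch: tolerate enum values added after this client was built.
// Only the names REASON_UNSPECIFIED and CONSENT_UNKNOWN come from this
// document; the number assigned to CONSENT_UNKNOWN is an assumption.
const KNOWN_REASONS: { [wireValue: number]: string } = {
  0: "REASON_UNSPECIFIED", // proto3 enums start at the zero value
  1: "CONSENT_UNKNOWN",    // assumed number, for illustration only
};

// Maps a wire value to a known reason name, falling back to
// REASON_UNSPECIFIED for anything this client does not recognise.
function normalizeReason(wireValue: number): string {
  const name = KNOWN_REASONS[wireValue];
  return name !== undefined ? name : "REASON_UNSPECIFIED";
}
```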
REST API
- Routes under /v1/consent/* maintain backwards compatibility within v1.
- Breaking changes require /v2/consent/* and a 90-day deprecation window.
- Citizen-facing endpoints have stricter compatibility because of the human-facing flow; deprecation requires UX comms.
Event subjects
Per EVENT_SCHEMAS §7.
6. Versioning policy
- gRPC package: ghasi.sms.consent.v1. Major version bump → coordinated migration plan.
- REST: semantic versioning at the API level; OpenAPI document at /v1/consent/openapi.json is the contract source of truth.
- Contract tests: Pact (admin REST), gRPC reflection-based contract tests for compliance-engine, routing-engine, sms-firewall-service. Run on every PR; failures block merge.
7. Failure-closed integration semantics
Consumers MUST implement defence-in-depth:
```typescript
// inside compliance-engine CONSENT rule evaluator
let consent: CheckConsentResponse;
try {
  consent = await consentClient.checkConsent(
    { tenantId, msisdn, scope, traceId, lane },
    // Absolute deadline 50 ms from now (well above the 5 ms P95), assuming a
    // grpc-js style client where `deadline` is a point in time, not a duration.
    { deadline: new Date(Date.now() + 50) },
  );
} catch (err) {
  // UNAVAILABLE / DEADLINE_EXCEEDED / INTERNAL → fail-closed,
  // except on the emergency lane, which is allowed through.
  if (lane === Lane.P0_EMERGENCY) {
    return { verdict: 'ALLOW', finding: 'consent_check_failed_emergency_bypass' };
  }
  return { verdict: 'BLOCK', finding: 'consent_check_failed_failclosed' };
}

if (!consent.allowed) {
  return {
    verdict: 'BLOCK',
    finding: `consent_${consent.reason.toLowerCase()}`,
    evidence: { recordId: consent.recordId },
  };
}

return { verdict: 'ALLOW' };
```
routing-engine and sms-firewall-service use the same pattern with their own deadlines (routing-engine: 30 ms, sms-firewall-service: 80 ms to accommodate inbound peer SLAs).
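The per-consumer deadlines can be kept in one place, as in the sketch below. The millisecond values are the ones stated in this section; the names `CHECK_CONSENT_DEADLINE_MS` and `absoluteDeadline` are hypothetical. The helper returns an absolute `Date` on the assumption that the gRPC client expects a point in time rather than a duration, as @grpc/grpc-js call options do.

```typescript
// Hypothetical sketch: one table of CheckConsent deadlines per consumer.
const CHECK_CONSENT_DEADLINE_MS = {
  "compliance-engine": 50,
  "routing-engine": 30,
  "sms-firewall-service": 80,
} as const;

type Consumer = keyof typeof CHECK_CONSENT_DEADLINE_MS;

// Converts the relative budget into an absolute deadline for call options.
function absoluteDeadline(consumer: Consumer, nowMs: number): Date {
  return new Date(nowMs + CHECK_CONSENT_DEADLINE_MS[consumer]);
}
```

A caller would pass `absoluteDeadline("routing-engine", Date.now())` as the `deadline` option when invoking CheckConsent.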