Consent Ledger Service — Sync Contract

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: API_CONTRACTS · APPLICATION_LOGIC · SECURITY_MODEL · ADR-0004 §3

This document defines what other services depend on from consent-ledger-service and what it depends on from others, how its aggregates resolve concurrent updates, how it replicates across regions per ADR-0004, and the proto contract for the synchronous gRPC surface.

1. Consumers of `consent-ledger-service`

Service	Interface	Dependency type	SLA expectation
`compliance-engine` (CONSENT rule type)	gRPC `ConsentLedgerService/CheckConsent`	Synchronous, per-message in async pipeline	P95 ≤ 5 ms; availability 99.95%
`routing-engine` (last-mile veto)	gRPC `CheckConsent`	Synchronous, last-mile pre-dispatch	P95 ≤ 5 ms; availability 99.95%
`sms-firewall-service` (inbound MT)	gRPC `CheckConsent`	Synchronous, per inbound MT	P95 ≤ 10 ms (firewall has a looser SLA); availability 99.9%
Tenant SDK / dev portal	gRPC `RecordConsent`, `RecordConsentBatch`, `RevokeConsent`	Synchronous tenant write path	P95 ≤ 80 ms; availability 99.9%
Tenant portal (REST)	`/v1/consent/records`, `/v1/consent/double-opt-in/`	Synchronous admin-style	P95 ≤ 200 ms; availability 99.5%
Citizen portal	`/v1/consent/citizen/*`	Synchronous public-facing	P95 ≤ 500 ms; availability 99.5%
`regulator-portal-service`	`/v1/admin/consent/audit`, NATS `consent.audit.`	Synchronous query + async events	Query P95 ≤ 2 s (hot window); event delivery P95 ≤ 5 s

Async contract semantics

Although CheckConsent is synchronous, all callers operate inside the platform's async outbound message pipeline: tenants have already received 202 Accepted, the orchestrator is processing from NATS, and any failure of CheckConsent translates to a do-not-ack and JetStream redelivery upstream. As a result, when consent-ledger-service fails closed for operational reasons, the tenant's API contract is never violated: the message surfaces with DEAD_LETTER status in the portal once retries are exhausted.

compliance-engine, routing-engine, and sms-firewall-service MUST treat any non-OK response from CheckConsent as allowed: false for non-emergency lanes (defence in depth even though consent-ledger-service itself returns fail-closed on its own internal errors).

2. Dependencies of `consent-ledger-service`

Dependency	Interface	Failure mode if unavailable
PostgreSQL `consent` schema	Read/write SQL via PgBouncer pool	`CheckConsent` falls back to Redis only; if both miss → fail-closed `CONSENT_UNKNOWN`. Writes return `503`
Redis (cluster mode, DB 4)	GET/SET; Lua scripts for chained operations	Writes succeed without cache fill (DB hit on next read); `CheckConsent` falls back to Postgres direct
NATS JetStream	Publishes via outbox; consumes `sms.mo.inbound`, `auth.user.erased.v1`, `tenant.lifecycle.*`	Outbox accumulates rows; `OutboxRelay` retries; STOP processing pauses; alert `ConsentMoConsumerLag`
ATRA DND endpoint (SFTP/HTTPS)	Daily fetch	`DndSyncWorker` records error; `ConsentDndStale` fires after 24 h; verdicts continue against last-known DND
Vault (KMS Transit, KV, PKI)	Per-tenant KEK; pepper; mTLS certs	Service refuses to boot without TLS; cached DEKs cover ≤ 30 min outage; alert and refuse new tenant onboarding after expiry
`auth-service.VerifyOtpReceipt`	gRPC	Citizen erasure / inspection blocked; tenant flows unaffected
`channel-router-service` (publishes ack-back MT and double-opt-in MT)	NATS `sms.outbound.request`	Ack-back / opt-in SMS not dispatched; consent revocation still recorded; alert
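
The `CheckConsent` fallbacks in the first two rows of the table can be sketched as below. This is a minimal illustration under stated assumptions, not the service's implementation: `Verdict`, `Lookup`, `tryLookup`, and `resolveVerdict` are hypothetical names, and the sketch deliberately collapses "store unavailable" and "no entry found" into the same outcome so that the next tier is tried.

```typescript
// Hypothetical shapes for illustration; the real types come from the proto in API_CONTRACTS §1.
type Verdict = { allowed: boolean; reason: string };

// A lookup resolves to a verdict, resolves to undefined on a miss,
// or rejects when the backing store is unavailable.
type Lookup = (key: string) => Promise<Verdict | undefined>;

// Returned when neither store can answer (fail-closed).
const FAIL_CLOSED: Verdict = { allowed: false, reason: "CONSENT_UNKNOWN" };

// Treats an unavailable store the same as a miss so the caller can try the next tier.
async function tryLookup(lookup: Lookup, key: string): Promise<Verdict | undefined> {
  try {
    return await lookup(key);
  } catch {
    return undefined;
  }
}

// Redis first, Postgres second, fail-closed when neither yields a verdict.
async function resolveVerdict(key: string, redis: Lookup, postgres: Lookup): Promise<Verdict> {
  return (await tryLookup(redis, key)) ?? (await tryLookup(postgres, key)) ?? FAIL_CLOSED;
}
```

The stores are injected as plain functions so the ordering and the fail-closed default can be exercised without a live Redis or Postgres.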

3. Proto Definition

syntax = "proto3";
package ghasi.sms.consent.v1;
option go_package = "github.com/ghasi/sms-gateway/consent/v1";

import "google/protobuf/timestamp.proto";

service ConsentLedgerService {
  rpc CheckConsent       (CheckConsentRequest)        returns (CheckConsentResponse);
  rpc RecordConsent      (RecordConsentRequest)       returns (RecordConsentResponse);
  rpc RecordConsentBatch (RecordConsentBatchRequest)  returns (RecordConsentBatchResponse);
  rpc RevokeConsent      (RevokeConsentRequest)       returns (RevokeConsentResponse);
  rpc GetReputation      (GetReputationRequest)       returns (GetReputationResponse);
}

message CheckConsentRequest {
  string tenant_id     = 1;
  string msisdn        = 2;
  ConsentScope scope   = 3;
  string trace_id      = 4;
  Lane lane            = 5;
}

message CheckConsentResponse {
  bool allowed                          = 1;
  CheckConsentReason reason             = 2;
  string record_id                      = 3;
  google.protobuf.Timestamp cached_at   = 4;
  google.protobuf.Timestamp valid_until = 5;
}

// (full enums and other messages as in API_CONTRACTS §1)

The complete proto including enums, batch/reputation messages, and error definitions is the same as in API_CONTRACTS §1 and is not duplicated here.

4. Per-aggregate conflict policy

The six aggregates below each have their own conflict semantics. All inter-region replication follows ADR-0004 §3: Kabul is the control-plane primary, and Herat and Mazar are control-plane standbys. The consent ledger is control-plane data, so its primary is in the Kabul region with a hot standby in Herat.

Aggregate	Policy	Rationale
`ConsentRecord`	server_authoritative with monotonic version	Records are immutable per row (`replaced_by` chain). Concurrent `RecordConsent` and `RevokeConsent` from the same tenant on the same `(msisdn, scope)` are serialised by a row-level advisory lock (`pg_advisory_xact_lock(hashtext(tenant_id::text
`ConsentAuditEntry`	append_only with chain ordering	Per-partition `seq` is monotonic; Postgres advisory lock on `partition_name` serialises inserts. The hash chain itself is the integrity guarantee. Cross-region replication is logical streaming replication (`pg_logical`) — Herat replays the chain in order. The chain verifier runs in both regions and cross-checks; a divergence raises `ConsentAuditChainBroken` (CRITICAL) with region tags.
`NationalDndEntry`	server_authoritative sourced from ATRA	The DND aggregate is a mirror; ATRA is the upstream master. The `DndSyncWorker` writes only in the Kabul primary; Herat receives via streaming replication. In a Kabul-region failure, the standby region serves DND reads from its replicated copy.
`StopKeyword`	server_authoritative with last-write-wins on `updated_at`	Catalog mutations are infrequent and admin-driven; LWW is acceptable. Hot reload in process memory uses `consent.stop_keywords.updated_at` as the change marker.
`DoubleOptin`	server_authoritative with state-machine guards	Status transitions enforced at the SQL level (`CHECK` + advisory lock on `optin_id`).
`ErasureRequest`	server_authoritative	Single owner per `erasure_id`; processor uses row-level advisory lock to prevent double processing.
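
As an illustration of the `StopKeyword` row, last-write-wins on `updated_at` and the hot-reload change marker might look like the sketch below. The row shape, `mergeLww`, and `needsReload` are hypothetical; only the `updated_at` column is named in this document, and keeping the current row on a timestamp tie is an assumption.

```typescript
// Hypothetical row shape; only `updated_at` is named in this document.
interface StopKeywordRow {
  keyword: string;
  enabled: boolean;
  updatedAt: number; // epoch milliseconds
}

// Last-write-wins: the row with the later updatedAt survives.
// Assumption: on a tie the current row is kept.
function mergeLww(current: StopKeywordRow, incoming: StopKeywordRow): StopKeywordRow {
  return incoming.updatedAt > current.updatedAt ? incoming : current;
}

// Hot-reload check: reload the in-memory catalog only when some row
// is newer than the marker recorded at the last load.
function needsReload(cachedMarker: number, rows: StopKeywordRow[]): boolean {
  return rows.some((r) => r.updatedAt > cachedMarker);
}
```

LWW silently discards the older of two concurrent edits, which is why the table restricts it to an infrequently mutated, admin-driven catalog.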

Outbox pattern

Every state mutation writes a row to consent.outbox in the same Postgres transaction. The OutboxRelay worker runs continuously on every replica and claims rows with SELECT ... FOR UPDATE SKIP LOCKED, so replicas never process the same row concurrently. On each pass it:

Picks up to 200 unpublished rows ordered by created_at.
Publishes to NATS with Msg-Id: event_id for consumer-side dedup.
Updates published_at on success; increments attempts and stores last_error on failure.

After 3 attempts the row remains for SRE inspection; ConsentOutboxStuck alert fires on count(unpublished AND attempts >= 3) > 0.
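
One relay pass and the alert predicate can be sketched as follows. This is an in-memory illustration only: the array stands in for the consent.outbox table, `OutboxRow`, `Publish`, `relayOnce`, and `outboxStuck` are hypothetical names, and skipping rows that have already reached 3 attempts is an assumption drawn from "the row remains for SRE inspection".

```typescript
// Hypothetical in-memory stand-in for a consent.outbox row.
interface OutboxRow {
  eventId: string;
  createdAt: number;
  publishedAt: number | null;
  attempts: number;
  lastError: string | null;
}

// Publishes one row; msgId is the dedup identifier (event_id).
type Publish = (msgId: string, row: OutboxRow) => Promise<void>;

const BATCH_SIZE = 200;
const MAX_ATTEMPTS = 3;

// One relay pass: oldest unpublished rows first, at most BATCH_SIZE of them.
// Assumption: rows that already failed MAX_ATTEMPTS times are left alone.
async function relayOnce(
  rows: OutboxRow[],
  publish: Publish,
  now: () => number = Date.now,
): Promise<void> {
  const batch = rows
    .filter((r) => r.publishedAt === null && r.attempts < MAX_ATTEMPTS)
    .sort((a, b) => a.createdAt - b.createdAt)
    .slice(0, BATCH_SIZE);

  for (const row of batch) {
    try {
      await publish(row.eventId, row);
      row.publishedAt = now(); // success: mark as published
    } catch (err) {
      row.attempts += 1; // failure: record the attempt and the error
      row.lastError = err instanceof Error ? err.message : String(err);
    }
  }
}

// Mirrors the alert condition: count(unpublished AND attempts >= 3) > 0.
function outboxStuck(rows: OutboxRow[]): boolean {
  return rows.some((r) => r.publishedAt === null && r.attempts >= MAX_ATTEMPTS);
}
```

In the real worker the selection and the updates would be SQL statements inside a transaction; the sketch only shows the ordering, the batch cap, and the bookkeeping on success and failure.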

Cross-region replication topology

              Kabul (primary)              Herat (sync standby)        Mazar (async standby)
              ┌──────────────────┐          ┌──────────────────┐        ┌──────────────────┐
              │ Postgres 16      │ sync     │ Postgres 16      │ async  │ Postgres 16      │
Writes ──▶    │ consent schema   │ ───────▶ │ consent schema   │ ─────▶ │ consent schema   │
              │  RPO ≤ 5 s       │          │  RPO ≤ 5 s       │        │  RPO ≤ 60 s      │
              │ NATS cluster A   │          │ NATS cluster B   │        │ NATS cluster C   │
              │ Vault Transit    │          │ Vault Transit    │        │ Vault Transit    │
              │ Redis cluster    │          │ Redis cluster    │        │ Redis cluster    │
              └──────────────────┘          └──────────────────┘        └──────────────────┘
                       ▲                              ▲                              ▲
                       │ promote on failure ──────────┘                              │
                       └──────── tertiary failover ────────────────────────────────┘

Per ADR-0004 §3:

Kabul is the control-plane primary for consent-ledger-service.
Herat is the synchronous standby; promotion target for any Kabul outage > 60 s.
Mazar is the asynchronous tertiary; promotion target if both Kabul and Herat are unavailable.
All three regions are within Afghanistan; no consent data leaves the country (per CONS-US-015 data residency invariant).

Failover semantics

Failure	Detection	Action
Kabul Postgres primary down	Patroni health-check + 2 of 3 etcd consensus	Auto-promote Herat to primary; ≤ 90 s RTO; clients reconnect via DNS-managed Service VIP
Kabul control-plane region isolated	NetworkPolicy + cluster Heartbeat lost > 30 s	Manual cutover by on-call; `consent-ledger-service` write traffic routed to Herat
Audit chain divergence between regions	Cross-region `AuditChainVerifier` cron	CRITICAL alert; freeze writes; investigate; never auto-resolve
ATRA DND fetch fails	Worker error metric	Service continues with last-known DND; `ConsentDndStale` after 24 h

5. Schema stability guarantees

gRPC proto

Field	Stability
`CheckConsentRequest.tenant_id`, `msisdn`, `scope`	Stable; required forever
`CheckConsentResponse.allowed`, `reason`	Stable
`CheckConsentReason` enum	Stable; new values may be added; callers MUST handle `REASON_UNSPECIFIED`
`Lane` enum	Stable; new values may be added
`RecordConsentRequest.source`	Stable; `ConsentSource` may add fields
New fields with proto3 default values	Non-breaking
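
Because `CheckConsentReason` may gain values, a caller can keep its verdict tied to `allowed` and use `reason` only for labelling. The sketch below shows one way to do that; `Finding` and `toFinding` are hypothetical names, and the assumption that an unrecognised value may arrive as a raw number depends on the code generator in use.

```typescript
type Finding = { verdict: "ALLOW" | "BLOCK"; finding?: string };

// The verdict depends only on `allowed`; `reason` is used for labelling,
// so a reason value added later cannot change the decision.
function toFinding(allowed: boolean, reason: string | number | undefined): Finding {
  if (allowed) {
    return { verdict: "ALLOW" };
  }
  let label: string;
  if (reason === undefined || reason === "REASON_UNSPECIFIED") {
    label = "unspecified"; // missing or default enum value
  } else if (typeof reason === "number") {
    label = `unknown_${reason}`; // assumption: unrecognised values surface as numbers
  } else {
    label = reason.toLowerCase(); // known or newly added named value
  }
  return { verdict: "BLOCK", finding: `consent_${label}` };
}
```

For example, `toFinding(false, "SOME_FUTURE_REASON")` still blocks and yields the finding `consent_some_future_reason` without any code change in the caller.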

REST API

Routes under /v1/consent/* maintain backwards compatibility within v1.
Breaking changes require /v2/consent/* and a 90-day deprecation window.
Citizen-facing endpoints have stricter compatibility requirements because they sit behind a human-facing flow; deprecating one also requires UX communication to affected users.

Event subjects

Per EVENT_SCHEMAS §7.

6. Versioning policy

gRPC package: ghasi.sms.consent.v1. A major version bump requires a coordinated migration plan.
REST: semantic versioning at the API level; OpenAPI document at /v1/consent/openapi.json is the contract source of truth.
Contract tests: Pact (admin REST), gRPC reflection-based contract tests for compliance-engine, routing-engine, sms-firewall-service. Run on every PR; failures block merge.

7. Failure-closed integration semantics

Consumers MUST implement defence-in-depth:

// inside compliance-engine CONSENT rule evaluator
let consent: CheckConsentResponse;
try {
  consent = await consentClient.checkConsent({
    tenantId, msisdn, scope, traceId, lane,
  }, { deadline: Date.now() + 50 /* absolute deadline, 50 ms from now; well above 5 ms P95 */ });
} catch (err) {
  // UNAVAILABLE / DEADLINE_EXCEEDED / INTERNAL → fail-closed
  if (lane === Lane.P0_EMERGENCY) {
    return { verdict: 'ALLOW', finding: 'consent_check_failed_emergency_bypass' };
  }
  return { verdict: 'BLOCK', finding: 'consent_check_failed_failclosed' };
}

if (!consent.allowed) {
  return {
    verdict: 'BLOCK',
    finding: `consent_${consent.reason.toLowerCase()}`,
    evidence: { recordId: consent.recordId },
  };
}
return { verdict: 'ALLOW' };

routing-engine and sms-firewall-service use the same pattern with their own deadlines (routing-engine: 30 ms, sms-firewall-service: 80 ms to accommodate inbound peer SLAs).
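
The three deadlines can share one guard, sketched here in a self-contained form. `Consumer`, `Check`, `withTimeout`, and `guardedCheck` are hypothetical names, the `check` callback stands in for the real gRPC client call, and enforcing the deadline with a local timer (rather than relying only on the transport) is an assumption.

```typescript
type Consumer = "compliance-engine" | "routing-engine" | "sms-firewall-service";

// Per-consumer deadlines in milliseconds, as stated in this section.
const DEADLINE_MS: Record<Consumer, number> = {
  "compliance-engine": 50,
  "routing-engine": 30,
  "sms-firewall-service": 80,
};

type Decision = { verdict: "ALLOW" | "BLOCK"; finding?: string };

// Stand-in for the real client call; receives the deadline budget in ms.
type Check = (deadlineMs: number) => Promise<{ allowed: boolean; reason: string }>;

// Rejects if the wrapped promise does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("DEADLINE_EXCEEDED")), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Any error or timeout fails closed, except on the emergency lane.
async function guardedCheck(consumer: Consumer, emergency: boolean, check: Check): Promise<Decision> {
  const ms = DEADLINE_MS[consumer];
  try {
    const res = await withTimeout(check(ms), ms);
    return res.allowed
      ? { verdict: "ALLOW" }
      : { verdict: "BLOCK", finding: `consent_${res.reason.toLowerCase()}` };
  } catch {
    return emergency
      ? { verdict: "ALLOW", finding: "consent_check_failed_emergency_bypass" }
      : { verdict: "BLOCK", finding: "consent_check_failed_failclosed" };
  }
}
```

Keeping the deadlines in one table makes it harder for a consumer to drift from the values agreed in this contract.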
