
Consent Ledger Service — Sync Contract

Version: 1.0
Status: Draft
Owner: Trust & Safety
Last Updated: 2026-04-21
Companion: API_CONTRACTS · APPLICATION_LOGIC · SECURITY_MODEL · ADR-0004 §3

This document defines what other services depend on from consent-ledger-service and what it depends on from others, how its aggregates resolve concurrent updates, how it replicates across regions per ADR-0004, and the proto contract for the synchronous gRPC surface.


| Service | Interface | Dependency type | SLA expectation |
| --- | --- | --- | --- |
| compliance-engine (CONSENT rule type) | gRPC ConsentLedgerService/CheckConsent | Synchronous, per-message in async pipeline | P95 ≤ 5 ms; availability 99.95% |
| routing-engine (last-mile veto) | gRPC CheckConsent | Synchronous, last-mile pre-dispatch | P95 ≤ 5 ms; availability 99.95% |
| sms-firewall-service (inbound MT) | gRPC CheckConsent | Synchronous, per inbound MT | P95 ≤ 10 ms (firewall has slacker SLA); availability 99.9% |
| Tenant SDK / dev portal | gRPC RecordConsent, RecordConsentBatch, RevokeConsent | Synchronous tenant write path | P95 ≤ 80 ms; availability 99.9% |
| Tenant portal (REST) | /v1/consent/records*, /v1/consent/double-opt-in/* | Synchronous admin-style | P95 ≤ 200 ms; availability 99.5% |
| Citizen portal | /v1/consent/citizen/* | Synchronous public-facing | P95 ≤ 500 ms; availability 99.5% |
| regulator-portal-service | /v1/admin/consent/audit*, NATS consent.audit.* | Synchronous query + async events | Query P95 ≤ 2 s (hot window); event delivery P95 ≤ 5 s |

Async contract semantics

Although CheckConsent is synchronous, every caller operates inside the platform's async outbound message pipeline: the tenant has already received 202 Accepted, the orchestrator is processing from NATS, and any CheckConsent failure translates to a do-not-ack and JetStream redelivery upstream. An operational fail-closed in consent-ledger-service therefore never violates the tenant's API contract; it surfaces as DEAD_LETTER status in the portal after retry exhaustion.

compliance-engine, routing-engine, and sms-firewall-service MUST treat any non-OK response from CheckConsent as allowed: false for non-emergency lanes. This is defence in depth: consent-ledger-service already fails closed on its own internal errors, and callers must not rely on that alone.


| Dependency | Interface | Failure mode if unavailable |
| --- | --- | --- |
| PostgreSQL consent schema | Read/write SQL via PgBouncer pool | CheckConsent falls back to Redis only; if both miss → fail-closed CONSENT_UNKNOWN. Writes return 503 |
| Redis (cluster mode, DB 4) | GET/SET; Lua scripts for chained operations | Writes succeed without cache fill (DB hit on next read); CheckConsent falls back to Postgres direct |
| NATS JetStream | Publishes via outbox; consumes sms.mo.inbound, auth.user.erased.v1, tenant.lifecycle.* | Outbox accumulates rows; OutboxRelay retries; STOP processing pauses; alert ConsentMoConsumerLag |
| ATRA DND endpoint (SFTP/HTTPS) | Daily fetch | DndSyncWorker records error; ConsentDndStale fires after 24 h; verdicts continue against last-known DND |
| Vault (KMS Transit, KV, PKI) | Per-tenant KEK; pepper; mTLS certs | Service refuses to boot without TLS; cached DEKs cover ≤ 30 min outage; alert and refuse new tenant onboarding after expiry |
| auth-service.VerifyOtpReceipt | gRPC | Citizen erasure / inspection blocked; tenant flows unaffected |
| channel-router-service (publishes ack-back MT and double-opt-in MT) | NATS sms.outbound.request | Ack-back / opt-in SMS not dispatched; consent revocation still recorded; alert |
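
The CheckConsent read-path fallback described in the first two rows can be sketched as a pure decision function. This is a minimal illustration, not the service's implementation: the `Lookup` and `Verdict` types, the `resolveVerdict` name, and the read order (Redis first, then Postgres) are assumptions; only the CONSENT_UNKNOWN fail-closed outcome is taken from the table.

```typescript
// Outcome of one store lookup. "unavailable" means the store could not be reached.
// These types are hypothetical; the real service may model lookups differently.
type Lookup =
  | { kind: "hit"; allowed: boolean; recordId: string }
  | { kind: "miss" }
  | { kind: "unavailable" };

interface Verdict {
  allowed: boolean;
  reason?: "CONSENT_UNKNOWN"; // set only on the fail-closed path
  recordId?: string;
  servedFrom: "redis" | "postgres" | "none";
}

// Resolve a verdict from the cache and database lookups.
// Assumed read order: Redis cache first, then Postgres.
function resolveVerdict(redis: Lookup, postgres: Lookup): Verdict {
  if (redis.kind === "hit") {
    return { allowed: redis.allowed, recordId: redis.recordId, servedFrom: "redis" };
  }
  if (postgres.kind === "hit") {
    return { allowed: postgres.allowed, recordId: postgres.recordId, servedFrom: "postgres" };
  }
  // Neither store produced a record (miss or outage on both sides): fail closed.
  return { allowed: false, reason: "CONSENT_UNKNOWN", servedFrom: "none" };
}
```

With Postgres down and a warm cache, `resolveVerdict(hit, { kind: "unavailable" })` is served from Redis; with Postgres down and a cache miss, the caller receives `allowed: false` with CONSENT_UNKNOWN.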

3. Proto Definition

syntax = "proto3";
package ghasi.sms.consent.v1;
option go_package = "github.com/ghasi/sms-gateway/consent/v1";

import "google/protobuf/timestamp.proto";

service ConsentLedgerService {
  rpc CheckConsent (CheckConsentRequest) returns (CheckConsentResponse);
  rpc RecordConsent (RecordConsentRequest) returns (RecordConsentResponse);
  rpc RecordConsentBatch (RecordConsentBatchRequest) returns (RecordConsentBatchResponse);
  rpc RevokeConsent (RevokeConsentRequest) returns (RevokeConsentResponse);
  rpc GetReputation (GetReputationRequest) returns (GetReputationResponse);
}

message CheckConsentRequest {
  string tenant_id = 1;
  string msisdn = 2;
  ConsentScope scope = 3;
  string trace_id = 4;
  Lane lane = 5;
}

message CheckConsentResponse {
  bool allowed = 1;
  CheckConsentReason reason = 2;
  string record_id = 3;
  google.protobuf.Timestamp cached_at = 4;
  google.protobuf.Timestamp valid_until = 5;
}

// (full enums and other messages as in API_CONTRACTS §1)

The complete proto, including enums, batch and reputation messages, and error definitions, is defined in API_CONTRACTS §1 and is not duplicated here.


4. Per-aggregate conflict policy

The six aggregates below have distinct conflict semantics. All inter-region replication follows ADR-0004 §3: Kabul is the control-plane primary and Herat / Mazar are control-plane standbys. The consent ledger is control-plane data, so its primary is in the Kabul region with a hot standby in Herat.

| Aggregate | Policy | Rationale |
| --- | --- | --- |
| ConsentRecord | server_authoritative with monotonic version | Records are immutable per row (replaced_by chain). Concurrent RecordConsent and RevokeConsent from the same tenant on the same (msisdn, scope) are serialised by a row-level advisory lock (`pg_advisory_xact_lock(hashtext(tenant_id::text |
| ConsentAuditEntry | append_only with chain ordering | Per-partition seq is monotonic; Postgres advisory lock on partition_name serialises inserts. The hash chain itself is the integrity guarantee. Cross-region replication is logical streaming replication (pg_logical); Herat replays the chain in order. The chain verifier runs in both regions and cross-checks; a divergence raises ConsentAuditChainBroken (CRITICAL) with region tags. |
| NationalDndEntry | server_authoritative sourced from ATRA | The DND aggregate is a mirror; ATRA is the upstream master. The DndSyncWorker writes only in the Kabul primary; Herat receives via streaming replication. In a Kabul-region failure, the standby region serves DND reads from its replicated copy. |
| StopKeyword | server_authoritative with last-write-wins on updated_at | Catalog mutations are infrequent and admin-driven; LWW is acceptable. Hot reload in process memory uses consent.stop_keywords.updated_at as the change marker. |
| DoubleOptin | server_authoritative with state-machine guards | Status transitions enforced at the SQL level (CHECK + advisory lock on optin_id). |
| ErasureRequest | server_authoritative | Single owner per erasure_id; processor uses row-level advisory lock to prevent double processing. |
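
To make the ConsentAuditEntry row concrete, the following is a minimal sketch of how a hash chain over a monotonic per-partition seq can be built and verified. The entry layout, the use of SHA-256, the `GENESIS` sentinel, and the function names are all assumptions for illustration; the service's actual chain format is not specified in this document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape: each entry commits to its predecessor's hash.
interface AuditEntry {
  seq: number;      // per-partition monotonic sequence, starting at 1
  payload: string;  // canonical serialisation of the audit event (format assumed)
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string;     // hash over (seq, prevHash, payload)
}

const GENESIS = "0".repeat(64); // assumed sentinel for the first entry

function entryHash(seq: number, prevHash: string, payload: string): string {
  return createHash("sha256").update(`${seq}\n${prevHash}\n${payload}`).digest("hex");
}

// Append-only: returns a new chain with one more entry linked to the tail.
function appendEntry(chain: AuditEntry[], payload: string): AuditEntry[] {
  const last = chain[chain.length - 1];
  const seq = last ? last.seq + 1 : 1;
  const prevHash = last ? last.hash : GENESIS;
  return [...chain, { seq, payload, prevHash, hash: entryHash(seq, prevHash, payload) }];
}

// Returns the seq of the first entry that breaks the chain, or null if intact.
function firstBrokenSeq(chain: AuditEntry[]): number | null {
  let prevHash = GENESIS;
  let expectedSeq = 1;
  for (const e of chain) {
    const ok =
      e.seq === expectedSeq &&
      e.prevHash === prevHash &&
      e.hash === entryHash(e.seq, e.prevHash, e.payload);
    if (!ok) return e.seq;
    prevHash = e.hash;
    expectedSeq += 1;
  }
  return null;
}
```

A verifier in each region could run `firstBrokenSeq` over its own copy and compare tail hashes with the other region; any tampered, missing, or reordered entry shows up as a non-null result.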

Outbox pattern

Every state mutation writes a row to consent.outbox in the same Postgres transaction. The OutboxRelay worker (continuous; per-replica with SELECT ... FOR UPDATE SKIP LOCKED):

  1. Picks up to 200 unpublished rows ordered by created_at.
  2. Publishes to NATS with Msg-Id: event_id for consumer-side dedup.
  3. Updates published_at on success; increments attempts and stores last_error on failure.

After 3 failed attempts the row stays in the outbox for SRE inspection; the ConsentOutboxStuck alert fires when count(unpublished AND attempts >= 3) > 0.
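
The relay steps above can be sketched against an in-memory outbox. This is an illustration under stated assumptions: the `OutboxRow` shape, the function names, and the synchronous `publish` callback are hypothetical, and skipping rows that already have 3 attempts is an assumed reading of "remains for SRE inspection". The header name follows the text above; JetStream's built-in de-duplication uses the `Nats-Msg-Id` header.

```typescript
// Hypothetical in-memory view of a consent.outbox row.
interface OutboxRow {
  eventId: string;
  createdAt: number;          // epoch ms
  publishedAt: number | null; // null while unpublished
  attempts: number;
  lastError: string | null;
}

const BATCH_SIZE = 200;
const MAX_ATTEMPTS = 3;

// Step 1: up to BATCH_SIZE unpublished rows, oldest first.
// Assumption: rows that exhausted MAX_ATTEMPTS are left alone for inspection.
function pickBatch(rows: OutboxRow[], limit: number = BATCH_SIZE): OutboxRow[] {
  return rows
    .filter((r) => r.publishedAt === null && r.attempts < MAX_ATTEMPTS)
    .sort((a, b) => a.createdAt - b.createdAt)
    .slice(0, limit);
}

// The publisher throws on failure; a real relay would await a NATS publish here.
type Publish = (row: OutboxRow, headers: Record<string, string>) => void;

// Steps 2 and 3: publish with the event id as message id, then record the outcome.
function relayOnce(rows: OutboxRow[], publish: Publish, nowMs: number): void {
  for (const row of pickBatch(rows)) {
    try {
      publish(row, { "Msg-Id": row.eventId });
      row.publishedAt = nowMs;
    } catch (err) {
      row.attempts += 1;
      row.lastError = err instanceof Error ? err.message : String(err);
    }
  }
}

// The quantity behind the ConsentOutboxStuck alert condition.
function stuckCount(rows: OutboxRow[]): number {
  return rows.filter((r) => r.publishedAt === null && r.attempts >= MAX_ATTEMPTS).length;
}
```

In the real worker the batch selection would be the `SELECT ... FOR UPDATE SKIP LOCKED` query, so that replicas running the relay concurrently never pick the same rows.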

Cross-region replication topology

              Kabul (primary)             Herat (sync standby)        Mazar (async standby)
            ┌──────────────────┐          ┌──────────────────┐        ┌──────────────────┐
            │ Postgres 16      │   sync   │ Postgres 16      │ async  │ Postgres 16      │
Writes ──▶  │ consent schema   │ ───────▶ │ consent schema   │ ─────▶ │ consent schema   │
            │ RPO ≤ 5 s        │          │ RPO ≤ 5 s        │        │ RPO ≤ 60 s       │
            │ NATS cluster A   │          │ NATS cluster B   │        │ NATS cluster C   │
            │ Vault Transit    │          │ Vault Transit    │        │ Vault Transit    │
            │ Redis cluster    │          │ Redis cluster    │        │ Redis cluster    │
            └──────────────────┘          └──────────────────┘        └──────────────────┘
                     ▲                             ▲                           ▲
                     │ promote on failure ─────────┘                           │
                     └──────── tertiary failover ──────────────────────────────┘

Per ADR-0004 §3:

  • Kabul is the control-plane primary for consent-ledger-service.
  • Herat is the synchronous standby; promotion target for any Kabul outage > 60 s.
  • Mazar is the asynchronous tertiary; promotion target if both Kabul and Herat are unavailable.
  • All three regions are within Afghanistan; no consent data leaves the country (per CONS-US-015 data residency invariant).
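
The promotion order in the bullets above can be expressed as a small selection function. This is a sketch only: the `RegionHealth` shape and the `writeRegion` name are hypothetical, and the function merely names the target region; whether promotion is automatic or a manual cutover is governed by the failover table that follows.

```typescript
type Region = "kabul" | "herat" | "mazar";

// Hypothetical health snapshot as seen by whatever component drives failover.
interface RegionHealth {
  kabulDownMs: number;     // 0 while Kabul is healthy
  heratAvailable: boolean;
  mazarAvailable: boolean;
}

const KABUL_OUTAGE_THRESHOLD_MS = 60_000; // "any Kabul outage > 60 s"

// Returns the region that should hold the write primary, or null if none can.
function writeRegion(h: RegionHealth): Region | null {
  if (h.kabulDownMs <= KABUL_OUTAGE_THRESHOLD_MS) return "kabul"; // healthy or short blip
  if (h.heratAvailable) return "herat"; // synchronous standby is the first target
  if (h.mazarAvailable) return "mazar"; // asynchronous tertiary, only if Herat is also out
  return null;
}
```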

Failover semantics

| Failure | Detection | Action |
| --- | --- | --- |
| Kabul Postgres primary down | Patroni health-check + 2 of 3 etcd consensus | Auto-promote Herat to primary; ≤ 90 s RTO; clients reconnect via DNS-managed Service VIP |
| Kabul control-plane region isolated | NetworkPolicy + cluster Heartbeat lost > 30 s | Manual cutover by on-call; consent-ledger-service write traffic routed to Herat |
| Audit chain divergence between regions | Cross-region AuditChainVerifier cron | CRITICAL alert; freeze writes; investigate; never auto-resolve |
| ATRA DND fetch fails | Worker error metric | Service continues with last-known DND; ConsentDndStale after 24 h |

5. Schema stability guarantees

gRPC proto

| Field | Stability |
| --- | --- |
| CheckConsentRequest.tenant_id, msisdn, scope | Stable; required forever |
| CheckConsentResponse.allowed, reason | Stable |
| CheckConsentReason enum | Stable; new values may be added; callers MUST handle REASON_UNSPECIFIED |
| Lane enum | Stable; new values may be added |
| RecordConsentRequest.source | Stable; ConsentSource may add fields |
| New fields with proto3 default values | Non-breaking |
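
Because CheckConsentReason may gain values over time, a caller should not assume it recognises every reason it receives. The sketch below shows one defensive way to derive a block finding. It assumes the reason arrives as its enum name in string form; the `KNOWN_REASONS` set, the `blockFinding` name, and the two generic finding strings are hypothetical, and only REASON_UNSPECIFIED and CONSENT_UNKNOWN are names taken from this document.

```typescript
// Reasons this caller was built against. Only CONSENT_UNKNOWN is named in this
// document; a real caller would list every value from API_CONTRACTS §1.
const KNOWN_REASONS: ReadonlySet<string> = new Set(["CONSENT_UNKNOWN"]);

// Maps a denial reason to a finding string without failing on unexpected input.
function blockFinding(reason: string | undefined): string {
  if (reason === undefined || reason === "" || reason === "REASON_UNSPECIFIED") {
    // proto3 default: the server did not set a reason, or an older server sent none.
    return "consent_reason_unspecified";
  }
  if (!KNOWN_REASONS.has(reason)) {
    // A value added after this caller was built: still a block, generic finding.
    return "consent_reason_unrecognised";
  }
  return `consent_${reason.toLowerCase()}`;
}
```

The verdict itself stays driven by `allowed`; the reason only shapes the finding, so an unrecognised value never turns a denial into an allow.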

REST API

  • Routes under /v1/consent/* maintain backwards compatibility within v1.
  • Breaking changes require /v2/consent/* and a 90-day deprecation window.
  • Citizen-facing endpoints have stricter compatibility because of the human-facing flow; deprecation requires UX comms.

Event subjects

Per EVENT_SCHEMAS §7.


6. Versioning policy

  • gRPC package: ghasi.sms.consent.v1. Major version bump → coordinated migration plan.
  • REST: semantic versioning at the API level; OpenAPI document at /v1/consent/openapi.json is the contract source of truth.
  • Contract tests: Pact (admin REST), gRPC reflection-based contract tests for compliance-engine, routing-engine, sms-firewall-service. Run on every PR; failures block merge.

7. Failure-closed integration semantics

Consumers MUST implement defence-in-depth:

// inside compliance-engine CONSENT rule evaluator
let consent: CheckConsentResponse;
try {
  consent = await consentClient.checkConsent(
    { tenantId, msisdn, scope, traceId, lane },
    // 50 ms budget, well above the 5 ms P95. This assumes the client wrapper
    // treats a number as milliseconds from now; raw @grpc/grpc-js expects an
    // absolute deadline instead (e.g. Date.now() + 50).
    { deadline: 50 },
  );
} catch (err) {
  // UNAVAILABLE / DEADLINE_EXCEEDED / INTERNAL → fail closed, except on the emergency lane
  if (lane === Lane.P0_EMERGENCY) {
    return { verdict: 'ALLOW', finding: 'consent_check_failed_emergency_bypass' };
  }
  return { verdict: 'BLOCK', finding: 'consent_check_failed_failclosed' };
}

// The service answered: block on an explicit denial and surface its reason.
if (!consent.allowed) {
  return {
    verdict: 'BLOCK',
    finding: `consent_${consent.reason.toLowerCase()}`,
    evidence: { recordId: consent.recordId },
  };
}
return { verdict: 'ALLOW' };

routing-engine and sms-firewall-service use the same pattern with their own deadlines (routing-engine: 30 ms, sms-firewall-service: 80 ms to accommodate inbound peer SLAs).
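
One way to keep the three consumers aligned is to share the deadline budgets and the failure verdict as small helpers. The sketch below is illustrative: the helper names and the `Consumer` type are hypothetical, the lane is compared by its enum name as a string, and the budgets and finding strings are the ones quoted in this section.

```typescript
type Consumer = "compliance-engine" | "routing-engine" | "sms-firewall-service";

// Per-consumer CheckConsent deadline budgets in milliseconds, as stated above.
const CHECK_CONSENT_DEADLINE_MS: Record<Consumer, number> = {
  "compliance-engine": 50,
  "routing-engine": 30,
  "sms-firewall-service": 80,
};

// Absolute deadline, for gRPC clients that expect a point in time rather than an offset.
function deadlineFor(consumer: Consumer, nowMs: number): Date {
  return new Date(nowMs + CHECK_CONSENT_DEADLINE_MS[consumer]);
}

interface FailureVerdict {
  verdict: "ALLOW" | "BLOCK";
  finding: string;
}

// Verdict to return when CheckConsent throws or times out.
function verdictOnFailure(lane: string): FailureVerdict {
  return lane === "P0_EMERGENCY"
    ? { verdict: "ALLOW", finding: "consent_check_failed_emergency_bypass" }
    : { verdict: "BLOCK", finding: "consent_check_failed_failclosed" };
}
```

A consumer would pass `deadlineFor("routing-engine", Date.now())` as the call deadline and return `verdictOnFailure(lane)` from its catch block.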