
sender-id-registry-service — Sync Contract

Version: 1.0
Status: Draft
Owner: Trust & Safety + Regulator-facing
Last Updated: 2026-04-21
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS

This document defines what other services depend on from sender-id-registry-service, what it depends on from others, the conflict policy per aggregate for the multi-region (kbl ↔ mzr) replication topology, and the canonical proto/IDL.


1. Consumers of sender-id-registry-service

| Service | Interface | Dependency type | SLA expectation |
| --- | --- | --- | --- |
| compliance-engine | gRPC Verify, GetReputation | Synchronous on every per-message evaluation (SENDER_ID + REPUTATION rule) | P95 ≤ 5 ms; availability 99.99% |
| routing-engine | gRPC Verify | Synchronous last-mile veto before submit_sm | P95 ≤ 5 ms; availability 99.99% |
| sms-firewall-service | gRPC Verify | Synchronous inbound MT firewall + transit firewall | P95 ≤ 5 ms; availability 99.99% |
| channel-router-service | gRPC Verify | Synchronous on multi-channel fallback | P95 ≤ 10 ms; availability 99.95% |
| admin-dashboard | HTTP REST /v1/admin/sender-ids/* + SSE on sender.id.* | Reviewer workbench (per EP-ADMDASH-11) | P95 ≤ 500 ms; availability 99.9% |
| customer-portal | HTTP REST /v1/sender-ids/* (tenant-scoped) | Tenant submission, verification, status | P95 ≤ 500 ms; availability 99.9% |
| regulator-portal-service | HTTP REST /v1/admin/sender-ids/export (mTLS) + NATS sender.id.regulator.exported.v1 | Regulator export receiver | P95 ≤ 2 s; availability 99.5% |
| Public citizens | HTTP REST /v1/sender-ids/public/* (anonymous, edge-cached) | Citizen lookup | P95 ≤ 100 ms (edge) |
| analytics-service | NATS sender.id.* | Long-term archival | best-effort |
| notification-service | NATS sender.id.kyc_*, .activated, .suspended, .reactivated, .revoked, .reputation.changed | Tenant-side notifications | best-effort |

Contract semantics

  • Hot-path callers (compliance-engine, routing-engine, sms-firewall-service) MUST treat any error or RegistryStatus ∈ {UNKNOWN, TENANT_MISMATCH, SUSPENDED, REVOKED} as a non-allow signal and apply their lane-specific fail-closed rule. They MUST NOT cache Verify responses for longer than 5 minutes and MUST honour sender.id.cache.invalidate events by evicting the affected in-process cache entries immediately.
  • Admin / portal callers receive HTTP 503 DEPENDENCY_UNAVAILABLE on Postgres outage and SHOULD retry with backoff.
  • Regulator export is eventually consistent: an on-demand export reflects state at request time; concurrent state changes are captured in the next export.
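The hot-path rules above can be sketched as a small caller-side wrapper. This is a minimal illustration, not the callers' actual implementation: `VerifyCache`, `is_allowed`, and the `verify_rpc` callable are hypothetical names, statuses are modelled as plain strings, and the sketch assumes the invalidation event carries the sender ID and its type. It also assumes the strictest reading of "fail-closed", allowing only ACTIVE, which is stricter than the four statuses the contract lists; each lane may define its own rule.

```python
import time
from typing import Callable, Dict, Optional, Tuple

MAX_CACHE_TTL_S = 5 * 60  # contract: Verify responses are cached for at most 5 minutes

Key = Tuple[str, str, str]  # (sender_id, type, tenant_id)


class VerifyCache:
    """Hypothetical in-process cache of Verify statuses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}

    def get(self, key: Key) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        status, stored_at = entry
        if self._clock() - stored_at >= MAX_CACHE_TTL_S:
            del self._entries[key]  # expired: force a fresh Verify call
            return None
        return status

    def put(self, key: Key, status: str) -> None:
        self._entries[key] = (status, self._clock())

    def invalidate(self, sender_id: str, sender_type: str) -> None:
        """React to a sender.id.cache.invalidate event (payload shape assumed)."""
        stale = [k for k in self._entries if k[0] == sender_id and k[1] == sender_type]
        for key in stale:
            del self._entries[key]


def is_allowed(cache: VerifyCache,
               verify_rpc: Callable[[str, str, str], str],
               sender_id: str, sender_type: str, tenant_id: str) -> bool:
    key = (sender_id, sender_type, tenant_id)
    status = cache.get(key)
    if status is None:
        try:
            status = verify_rpc(sender_id, sender_type, tenant_id)
        except Exception:
            return False  # any error is a non-allow signal; errors are not cached
        cache.put(key, status)
    # Assumption: only ACTIVE allows; every other status fails closed.
    return status == "ACTIVE"
```

Injecting the clock keeps the 5-minute bound testable without sleeping; a production caller would likely also bound the cache size.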

2. Dependencies of sender-id-registry-service

| Dependency | Interface | Failure mode if unavailable |
| --- | --- | --- |
| PostgreSQL sender_id_registry schema | Read/write SQL via connection pool | Verify falls back to last-cached Redis value; cold miss → status: UNKNOWN. Submission returns 503. |
| Redis | GET/SET/INCR/EXPIRE | Cache miss fallback to DB; latency degrades P95 to ~30 ms; OTP submit returns 503 (no plaintext to compare) |
| NATS JetStream | Publish (outbox relay) + consume (fraud.detected.*, compliance.message.*, regulator.complaint.*, dlr.aggregate.*, auth.user.erased.*) | Outbox accumulates; reputation deltas pause; regulator export still possible |
| Object storage (S3-compatible / MinIO) | KYC blob + regulator export blob | Submission returns 503 (cannot persist KYC); regulator export deferred |
| Vault Transit | Per-tenant KEK for KYC encryption + signing of regulator export | Submission returns 503; export queued |
| HSM (PKCS#11) | Regulator export signing key (per ADR-0004 §11) | Export queued; alert SidExportSignerDown |
| auth-service | Validates JWTs upstream of Kong | Reject all REST traffic (Kong handles) |
| channel-router-service | OTP delivery on lane P1 | OTP issuance returns 503; tenant uses DOCUMENT or NOTARISED method as workaround |
| External DNS (DoT to 1.1.1.1, 8.8.8.8) | TXT record resolution for DOMAIN_DNS verification | Verification check defers; manual fallback after 24 h |
| ATRA SFTP endpoint | Regulator export delivery | Local export persisted; transmission retried 6 h; on persistent failure alert regulator-liaison |
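The PostgreSQL and Redis rows together imply a degradation order for Verify, which the following sketch illustrates. It assumes a cache-first read path and that a sender ID absent from the database also maps to UNKNOWN; neither is stated in this document. `resolve_status`, `Unavailable`, and the two lookup callables are hypothetical names, not the service's real API.

```python
from typing import Callable, Optional


class Unavailable(Exception):
    """Hypothetical error raised by a backend client that cannot be reached."""


def resolve_status(key: str,
                   redis_get: Callable[[str], Optional[str]],
                   db_get: Callable[[str], Optional[str]]) -> str:
    # 1. Try the cache. A Redis outage is treated as a plain cache miss.
    try:
        cached = redis_get(key)
    except Unavailable:
        cached = None
    if cached is not None:
        return cached

    # 2. Fall back to the database. If Postgres is down and nothing was
    #    cached, this is the "cold miss" case and the answer is UNKNOWN.
    try:
        found = db_get(key)
    except Unavailable:
        return "UNKNOWN"

    # Assumption: a sender ID that is not registered also resolves to UNKNOWN.
    return found if found is not None else "UNKNOWN"
```

Because hot-path callers treat UNKNOWN as a non-allow signal, a simultaneous Redis and Postgres outage fails closed rather than open.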

3. Proto Definition

syntax = "proto3";
package ghasi.sms.sid.v1;
option go_package = "github.com/ghasi/sms-gateway/sid/v1";

import "google/protobuf/timestamp.proto";

service SenderIdRegistryService {
  // Hot path. P95 ≤ 5 ms, P99 ≤ 15 ms. mTLS required.
  rpc Verify(VerifyRequest) returns (VerifyResponse);

  // Hot path. P95 ≤ 5 ms. mTLS required.
  rpc GetReputation(GetReputationRequest) returns (GetReputationResponse);

  // Bulk lookup (warm). P95 ≤ 50 ms for ≤ 100 ids. mTLS required.
  rpc BatchVerify(BatchVerifyRequest) returns (BatchVerifyResponse);
}

message VerifyRequest {
  string sender_id = 1;
  SenderIdType type = 2;
  string tenant_id = 3;
  string trace_id = 4;
}

message VerifyResponse {
  RegistryStatus status = 1;
  VerificationLevel current_level = 2;
  bool has_domain_dns = 3;
  google.protobuf.Timestamp last_verified_at = 4;
  int32 reputation_score = 5;
  string restricted_category = 6;
  bool meets_required_level = 7;
  string registrant_org_name = 8;
}

message GetReputationRequest {
  string sender_id = 1;
  SenderIdType type = 2;
  bool include_trend_90d = 3;
}

message GetReputationResponse {
  int32 score = 1;
  google.protobuf.Timestamp last_computed_at = 2;
  repeated DailyScore trend_90d = 3;
}

message DailyScore { string date = 1; int32 score = 2; }

message BatchVerifyRequest { repeated VerifyRequest items = 1; }
message BatchVerifyResponse { repeated VerifyResponse items = 1; }

enum SenderIdType {
  SENDER_ID_TYPE_UNSPECIFIED = 0; ALPHA = 1; SHORT = 2; LONG = 3;
}

enum RegistryStatus {
  REGISTRY_STATUS_UNSPECIFIED = 0;
  ACTIVE = 1; SUSPENDED = 2; REVOKED = 3;
  UNKNOWN = 4; TENANT_MISMATCH = 5; PENDING = 6;
}

enum VerificationLevel {
  VERIFICATION_LEVEL_UNSPECIFIED = 0;
  NONE = 1; OTP = 2; DOCUMENT = 3; NOTARISED = 4;
}

4. Conflict policy per aggregate (multi-region)

Per ADR-0004 §5, sender-ID data is multi-master across kbl and mzr. The following conflict policies are enforced at the logical-replication apply layer:

| Aggregate | Policy | Rationale |
| --- | --- | --- |
| SenderId | server_authoritative + LWW (HLC) on the active row keyed by (value_normalised, type) | The "winner" is the row whose origin region accepted the KYC_APPROVED transition first (recorded in HLC); subsequent state changes are LWW. Same-instant submissions in two regions resolve to the row with the lower senderIdInternalId UUID lexicographically. |
| KycDocument | append_only | UUID-keyed; no conflict possible |
| Verification | append_only + LWW on terminal state | Multiple concurrent verifications can exist; the first to reach SUCCEEDED raises level. State machine prohibits regression. |
| RestrictedPattern | server_authoritative + LWW | Admin-only writes; conflict rare; LWW on version. |
| ReputationSnapshot | append_only with deduplication | Each region computes its own daily snapshots; deduplication key (snapshot_at, sender_id_internal_id, trigger_event_id). Cron is single-region (lock in Redis). |
| RegulatorExport | single-region ownership | Export is owned by exactly one region at a time (Redis lock sid:export:lock:cluster); the other region records receipt only. |
| AuditEntry | append_only | UUID-keyed; both regions can write; replication union-merges. |
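The SenderId rule can be illustrated with a short sketch. It assumes an HLC timestamp can be modelled as a `(physical_ms, logical_counter)` tuple and that UUIDs are compared in canonical lowercase text form; the real apply layer and its HLC encoding are not specified in this document. `Candidate`, `StateChange`, `pick_winner`, and `lww` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Hlc = Tuple[int, int]  # assumed encoding: (physical_ms, logical_counter)


@dataclass(frozen=True)
class Candidate:
    internal_id: str   # senderIdInternalId as canonical lowercase UUID text
    region: str        # "kbl" or "mzr"
    approved_hlc: Hlc  # HLC recorded at the KYC_APPROVED transition


@dataclass(frozen=True)
class StateChange:
    state: str
    hlc: Hlc


def pick_winner(rows: Iterable[Candidate]) -> Candidate:
    """Earliest KYC_APPROVED wins; equal HLCs fall back to the lower UUID."""
    return min(rows, key=lambda r: (r.approved_hlc, r.internal_id))


def lww(current: StateChange, incoming: StateChange) -> StateChange:
    """Last-writer-wins for later state changes; ties keep the current value."""
    return incoming if incoming.hlc > current.hlc else current
```

Sorting on the pair `(approved_hlc, internal_id)` makes the outcome deterministic in both regions, which is what allows each side to apply the same resolution independently.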

Split-brain scenario. If the kbl ↔ mzr link is partitioned for more than 5 min:

  • Both regions continue accepting submissions and state transitions.
  • On heal, replication apply detects conflicts via HLC + UUID. Same-(value, type) collisions trigger Trust & Safety alert SidSplitBrainConflict for manual reconciliation. The "winning" row's tenant retains the value; the losing tenant is notified to resubmit.
  • Audit and reputation diverge harmlessly; both rows are preserved.

5. Outbox Pattern

sender-id-registry-service writes events via a transactional outbox to guarantee state-change ↔ event coupling.

-- In a single transaction:
BEGIN;
UPDATE sender_id_registry.sender_ids SET state = 'ACTIVE', ... WHERE ...;
INSERT INTO sender_id_registry.outbox (event_id, subject, payload) VALUES (...);
INSERT INTO sender_id_registry.audit (...) VALUES (...);
COMMIT;

A relay process (OutboxPublisher, leader-elected via Redis lock per pod) polls every 200 ms:

SELECT event_id, subject, payload
FROM sender_id_registry.outbox
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 200
FOR UPDATE SKIP LOCKED;

The relay publishes each selected row to NATS and, on success, runs UPDATE outbox SET published_at = now() for that row. On failure it increments attempts; after 10 attempts the row is moved to outbox_dead (a separate table) and an alert fires.
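The relay's per-batch bookkeeping can be sketched in memory as follows. This is an illustration of the attempt counting and dead-lettering described above, not the OutboxPublisher itself: plain lists stand in for the outbox and outbox_dead tables, list order stands in for ORDER BY created_at, and `OutboxRow`, `relay_batch`, and the `publish` callable are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_ATTEMPTS = 10  # from the text: move to outbox_dead after 10 attempts
BATCH_LIMIT = 200  # mirrors LIMIT 200 in the polling query


@dataclass(eq=False)  # identity comparison, so list.remove() targets this exact row
class OutboxRow:
    event_id: str
    subject: str
    payload: bytes
    attempts: int = 0
    published_at: Optional[float] = None


def relay_batch(outbox: List[OutboxRow],
                dead: List[OutboxRow],
                publish: Callable[[str, bytes], None],
                now: Callable[[], float],
                limit: int = BATCH_LIMIT) -> int:
    """Publish up to `limit` unpublished rows; return how many succeeded."""
    pending = [row for row in outbox if row.published_at is None][:limit]
    sent = 0
    for row in pending:
        try:
            publish(row.subject, row.payload)
        except Exception:
            row.attempts += 1
            if row.attempts >= MAX_ATTEMPTS:
                outbox.remove(row)  # stands in for the move to outbox_dead
                dead.append(row)    # an alert would fire here
            continue
        row.published_at = now()    # stands in for UPDATE ... SET published_at
        sent += 1
    return sent
```

A failed row does not block the rows behind it in this sketch; whether the real relay preserves strict per-subject ordering across failures is not stated here.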


6. Inbox Pattern

For consumed NATS events that mutate state (e.g. fraud.detected.* adjusting reputation), an inbox table prevents duplicate processing on consumer redelivery:

CREATE TABLE sender_id_registry.inbox (
  event_id    UUID PRIMARY KEY,
  subject     TEXT NOT NULL,
  consumed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Consumer logic:

on event(eventId, subject, payload):
  BEGIN;
  INSERT INTO inbox (event_id, subject) VALUES (eventId, subject)
    ON CONFLICT DO NOTHING RETURNING event_id;
  if no row inserted:              -- duplicate delivery, already processed
    ROLLBACK; ack NATS message; return
  apply state mutation (insert reputation_history row, etc.)
  write outbox row for any downstream emission
  COMMIT;
  ack NATS message                 -- ack only after the commit succeeds

Inbox rows are deleted after a 7-day TTL (the NATS deduplication window covers earlier replays).
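The consumer logic above can be exercised end to end with a small sketch. SQLite stands in for PostgreSQL here purely so the example is self-contained, so `INSERT OR IGNORE` plus `rowcount` replaces `ON CONFLICT DO NOTHING RETURNING`. The `reputation_history` columns, `open_db`, and `handle_event` are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inbox (
            event_id TEXT PRIMARY KEY,
            subject  TEXT NOT NULL
        );
        -- Hypothetical shape; the real reputation_history table is defined elsewhere.
        CREATE TABLE reputation_history (
            event_id TEXT NOT NULL,
            delta    INTEGER NOT NULL
        );
    """)
    return conn


def handle_event(conn: sqlite3.Connection, event_id: str, subject: str, delta: int) -> bool:
    """Apply the event once. Returns False when it was a redelivery."""
    with conn:  # one transaction: commits on exit, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id, subject) VALUES (?, ?)",
            (event_id, subject),
        )
        if cur.rowcount == 0:
            return False  # already processed: caller just acks the message
        conn.execute(
            "INSERT INTO reputation_history (event_id, delta) VALUES (?, ?)",
            (event_id, delta),
        )
    return True  # caller acks only after the transaction has committed
```

Because the inbox insert and the state mutation share one transaction, a crash between them leaves neither behind, and the redelivered message is processed as if it were new.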


7. Schema Stability Guarantees

gRPC Proto

| Field | Stability |
| --- | --- |
| VerifyRequest.* required fields | Stable |
| VerifyResponse.* required fields | Stable |
| RegistryStatus enum values | Stable: new values may be added; callers must handle UNSPECIFIED |
| VerificationLevel enum values | Stable: additions only |
| New fields with default values | Non-breaking (proto3 semantics) |

REST API

  • Routes under /v1/sender-ids/* and /v1/admin/sender-ids/* maintain backwards compatibility within the major version.
  • Breaking changes require /v2/ prefix and a 90-day deprecation window.

NATS subjects

  • .v1 suffix on every event subject; new subjects (sender.id.transferred.v1 etc.) are non-breaking; consumers opt in.
  • Event payload schemas are JSON Schema Draft 2020-12; additions are non-breaking; removals or type changes require .v2.

8. Versioning Policy

  • gRPC package: ghasi.sms.sid.v1. Co-ordinated v2 migration if breaking.
  • REST: semver. Breaking changes only on major versions, parallel-served for ≥ 90 days.
  • NATS: per-event schemaVersion field; subject-suffix major version.
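A consumer can enforce the subject-suffix rule with a small guard. This is a sketch under the assumption that the major version is always the final `.vN` token of the subject; `subject_major` and `should_consume` are hypothetical helper names.

```python
import re
from typing import FrozenSet, Optional

_MAJOR_SUFFIX = re.compile(r"\.v(\d+)$")


def subject_major(subject: str) -> Optional[int]:
    """Return the major version encoded in the subject suffix, if any."""
    match = _MAJOR_SUFFIX.search(subject)
    return int(match.group(1)) if match else None


def should_consume(subject: str,
                   supported_majors: FrozenSet[int] = frozenset({1})) -> bool:
    """Skip subjects whose major version this consumer was not built for."""
    return subject_major(subject) in supported_majors
```

With this guard a consumer built against `.v1` ignores a future `.v2` subject instead of misparsing its payload, which matches the opt-in model described for new subjects.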