sender-id-registry-service — Sync Contract
Version: 1.0
Status: Draft
Owner: Trust & Safety + Regulator-facing
Last Updated: 2026-04-21
Companion: API_CONTRACTS · DATA_MODEL · EVENT_SCHEMAS
This document defines what other services depend on from sender-id-registry-service, what it depends on from others, the conflict policy per aggregate for the multi-region (kbl ↔ mzr) replication topology, and the canonical proto/IDL.
1. Consumers of sender-id-registry-service
| Service | Interface | Dependency type | SLA expectation |
|---|---|---|---|
| compliance-engine | gRPC Verify, GetReputation | Synchronous on every per-message evaluation (SENDER_ID + REPUTATION rule) | P95 ≤ 5 ms; availability 99.99% |
| routing-engine | gRPC Verify | Synchronous last-mile veto before submit_sm | P95 ≤ 5 ms; availability 99.99% |
| sms-firewall-service | gRPC Verify | Synchronous inbound MT firewall + transit firewall | P95 ≤ 5 ms; availability 99.99% |
| channel-router-service | gRPC Verify | Synchronous on multi-channel fallback | P95 ≤ 10 ms; availability 99.95% |
| admin-dashboard | HTTP REST /v1/admin/sender-ids/* + SSE on sender.id.* | Reviewer workbench (per EP-ADMDASH-11) | P95 ≤ 500 ms; availability 99.9% |
| customer-portal | HTTP REST /v1/sender-ids/* (tenant-scoped) | Tenant submission, verification, status | P95 ≤ 500 ms; availability 99.9% |
| regulator-portal-service | HTTP REST /v1/admin/sender-ids/export (mTLS) + NATS sender.id.regulator.exported.v1 | Regulator export receiver | P95 ≤ 2 s; availability 99.5% |
| Public citizens | HTTP REST /v1/sender-ids/public/* (anonymous, edge-cached) | Citizen lookup | P95 ≤ 100 ms (edge) |
| analytics-service | NATS sender.id.* | Long-term archival | best-effort |
| notification-service | NATS sender.id.kyc_*, .activated, .suspended, .reactivated, .revoked, .reputation.changed | Tenant-side notifications | best-effort |
Contract semantics
- Hot-path callers (compliance-engine, routing-engine, sms-firewall-service) MUST treat any error or RegistryStatus ∈ {UNKNOWN, TENANT_MISMATCH, SUSPENDED, REVOKED} as a non-allow signal and apply their lane-specific fail-closed rule. They MUST NOT cache Verify responses beyond 5 minutes and MUST honour sender.id.cache.invalidate events to drop in-process cache eagerly.
- Admin / portal callers receive HTTP 503 DEPENDENCY_UNAVAILABLE on Postgres outage and SHOULD retry with backoff.
- Regulator export is eventually consistent: an on-demand export reflects state at request time; concurrent state changes are captured in the next export.
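The hot-path rules above can be illustrated with a minimal caller-side sketch. It is not the callers' actual implementation: `VerifyCache`, `is_allowed`, the cache key shape and the idea that an invalidation event carries a sender ID are all assumptions made for illustration. The sketch also denies PENDING and UNSPECIFIED, which the contract does not list explicitly; that is a conservative assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Statuses the contract lists as non-allow for hot-path callers.
NON_ALLOW = {"UNKNOWN", "TENANT_MISMATCH", "SUSPENDED", "REVOKED"}
MAX_CACHE_TTL_S = 300  # contract: never cache Verify beyond 5 minutes

Key = Tuple[str, str, str]  # (sender_id, type, tenant_id), hypothetical key shape


class VerifyCache:
    """Hypothetical in-process cache with a hard 5-minute ceiling."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, str]] = {}

    def get(self, key: Key) -> Optional[str]:
        hit = self._entries.get(key)
        if hit is None:
            return None
        stored_at, status = hit
        if self._clock() - stored_at >= MAX_CACHE_TTL_S:
            del self._entries[key]  # expired: force a fresh Verify call
            return None
        return status

    def put(self, key: Key, status: str) -> None:
        self._entries[key] = (self._clock(), status)

    def invalidate(self, sender_id: str) -> None:
        """Drop all entries for a sender ID (assumed invalidation payload)."""
        for key in [k for k in self._entries if k[0] == sender_id]:
            del self._entries[key]


def is_allowed(cache: VerifyCache, key: Key,
               verify: Callable[[str, str, str], str]) -> bool:
    """Return True only when the registry positively reports ACTIVE."""
    status = cache.get(key)
    if status is None:
        try:
            status = verify(*key)
        except Exception:
            return False  # any error is a non-allow signal (fail closed)
        cache.put(key, status)
    if status in NON_ALLOW:
        return False
    # Assumption: PENDING and UNSPECIFIED are also treated as non-allow.
    return status == "ACTIVE"
```

A caller would invoke `cache.invalidate(sender_id)` from its sender.id.cache.invalidate subscription and route every `False` result into its lane-specific fail-closed handling.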
2. Dependencies of sender-id-registry-service
| Dependency | Interface | Failure mode if unavailable |
|---|---|---|
| PostgreSQL sender_id_registry schema | Read/write SQL via connection pool | Verify falls back to last-cached Redis value; cold miss → status: UNKNOWN. Submission returns 503. |
| Redis | GET/SET/INCR/EXPIRE | Cache miss fallback to DB; latency degrades P95 to ~30 ms; OTP submit returns 503 (no plaintext to compare) |
| NATS JetStream | Publish (outbox relay) + consume (fraud.detected.*, compliance.message.*, regulator.complaint.*, dlr.aggregate.*, auth.user.erased.*) | Outbox accumulates; reputation deltas pause; regulator export still possible |
| Object storage (S3-compatible / MinIO) | KYC blob + regulator export blob | Submission returns 503 (cannot persist KYC); regulator export deferred |
| Vault Transit | Per-tenant KEK for KYC encryption + signing of regulator export | Submission returns 503; export queued |
| HSM (PKCS#11) | Regulator export signing key (per ADR-0004 §11) | Export queued; alert SidExportSignerDown |
| auth-service | Validates JWTs upstream of Kong | Reject all REST traffic (Kong handles) |
| channel-router-service | OTP delivery on lane P1 | OTP issuance returns 503; tenant uses DOCUMENT or NOTARISED method as workaround |
| External DNS (DoT to 1.1.1.1, 8.8.8.8) | TXT record resolution for DOMAIN_DNS verification | Verification check defers; manual fallback after 24 h |
| ATRA SFTP endpoint | Regulator export delivery | Local export persisted; transmission retried 6 h; on persistent failure alert regulator-liaison |
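The Verify degradation described in the PostgreSQL and Redis rows can be sketched as a small resolution function. This is an illustration only: `resolve_status`, `DependencyDown` and the cache-first read order are assumptions, and the behaviour when both stores are down (return UNKNOWN) is inferred from the cold-miss rule rather than stated in the table.

```python
from typing import Callable, Optional, Tuple

Key = Tuple[str, str, str]  # (sender_id, type, tenant_id), hypothetical key shape


class DependencyDown(Exception):
    """Raised by a reader when its backing store is unreachable (hypothetical)."""


def resolve_status(key: Key,
                   read_cache: Callable[[Key], Optional[str]],
                   read_db: Callable[[Key], str]) -> str:
    """Resolve a registry status under partial dependency failure."""
    try:
        cached = read_cache(key)
    except DependencyDown:
        cached = None  # Redis outage: treat as a miss and fall through to the DB
    if cached is not None:
        return cached  # also covers "Postgres down, last-cached value available"
    try:
        return read_db(key)
    except DependencyDown:
        return "UNKNOWN"  # cold miss while Postgres is unavailable
```

Because hot-path callers treat UNKNOWN as non-allow, the cold-miss branch fails closed without the registry having to raise an error.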
3. Proto Definition
syntax = "proto3";
package ghasi.sms.sid.v1;
option go_package = "github.com/ghasi/sms-gateway/sid/v1";
import "google/protobuf/timestamp.proto";

service SenderIdRegistryService {
  // Hot path. P95 ≤ 5 ms, P99 ≤ 15 ms. mTLS required.
  rpc Verify(VerifyRequest) returns (VerifyResponse);
  // Hot path. P95 ≤ 5 ms. mTLS required.
  rpc GetReputation(GetReputationRequest) returns (GetReputationResponse);
  // Bulk lookup (warm). P95 ≤ 50 ms for ≤ 100 ids. mTLS required.
  rpc BatchVerify(BatchVerifyRequest) returns (BatchVerifyResponse);
}

message VerifyRequest {
  string sender_id = 1;
  SenderIdType type = 2;
  string tenant_id = 3;
  string trace_id = 4;
}

message VerifyResponse {
  RegistryStatus status = 1;
  VerificationLevel current_level = 2;
  bool has_domain_dns = 3;
  google.protobuf.Timestamp last_verified_at = 4;
  int32 reputation_score = 5;
  string restricted_category = 6;
  bool meets_required_level = 7;
  string registrant_org_name = 8;
}

message GetReputationRequest {
  string sender_id = 1;
  SenderIdType type = 2;
  bool include_trend_90d = 3;
}

message GetReputationResponse {
  int32 score = 1;
  google.protobuf.Timestamp last_computed_at = 2;
  repeated DailyScore trend_90d = 3;
}

message DailyScore {
  string date = 1;
  int32 score = 2;
}

message BatchVerifyRequest { repeated VerifyRequest items = 1; }
message BatchVerifyResponse { repeated VerifyResponse items = 1; }

enum SenderIdType {
  SENDER_ID_TYPE_UNSPECIFIED = 0;
  ALPHA = 1;
  SHORT = 2;
  LONG = 3;
}

enum RegistryStatus {
  REGISTRY_STATUS_UNSPECIFIED = 0;
  ACTIVE = 1;
  SUSPENDED = 2;
  REVOKED = 3;
  UNKNOWN = 4;
  TENANT_MISMATCH = 5;
  PENDING = 6;
}

enum VerificationLevel {
  VERIFICATION_LEVEL_UNSPECIFIED = 0;
  NONE = 1;
  OTP = 2;
  DOCUMENT = 3;
  NOTARISED = 4;
}
4. Conflict policy per aggregate (multi-region)
Per ADR-0004 §5, sender-ID data is multi-master across kbl and mzr. The following conflict policies are enforced at the logical-replication apply layer:
| Aggregate | Policy | Rationale |
|---|---|---|
| SenderId | server_authoritative + LWW (HLC) on the active row keyed by (value_normalised, type) | The "winner" is the row whose origin region accepted the KYC_APPROVED transition first (recorded in HLC); subsequent state changes are LWW. Same-instant submissions in two regions resolve to the row with the lower senderIdInternalId UUID lexicographically. |
| KycDocument | append_only | UUID-keyed; no conflict possible |
| Verification | append_only + LWW on terminal state | Multiple concurrent verifications can exist; the first to reach SUCCEEDED raises level. State machine prohibits regression. |
| RestrictedPattern | server_authoritative + LWW | Admin-only writes; conflict rare; LWW on version. |
| ReputationSnapshot | append_only with deduplication | Each region computes its own daily snapshots; deduplication key (snapshot_at, sender_id_internal_id, trigger_event_id). Cron is single-region (lock in Redis). |
| RegulatorExport | single-region ownership | Export is owned by exactly one region at a time (Redis lock sid:export:lock:cluster); the other region records receipt only. |
| AuditEntry | append_only | UUID-keyed; both regions can write; replication union-merges. |
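The SenderId winner rule in the table can be expressed as a single ordering. The sketch below is a simplified model, not the replication apply layer itself: `Candidate`, `pick_winner` and the representation of an HLC as a `(physical_ms, logical_counter)` tuple are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    """One region's active row for a given (value_normalised, type). Hypothetical."""
    internal_id: uuid.UUID
    region: str
    kyc_approved_hlc: Tuple[int, int]  # assumed HLC shape: (physical ms, logical counter)


def pick_winner(rows: Iterable[Candidate]) -> Candidate:
    """Earliest KYC_APPROVED HLC wins; ties fall to the lexicographically lower UUID."""
    # str(UUID) is fixed-width lowercase hex, so string order is well defined.
    return min(rows, key=lambda r: (r.kyc_approved_hlc, str(r.internal_id)))
```

Because the ordering depends only on replicated fields, both regions would reach the same verdict independently after the link heals, under the assumption that the HLC and UUID replicate unchanged.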
Split-brain scenario. If kbl ↔ mzr link partitions for >5 min:
- Both regions continue accepting submissions and state transitions.
- On heal, replication apply detects conflicts via HLC + UUID. Same-(value, type) collisions trigger Trust & Safety alert SidSplitBrainConflict for manual reconciliation. The "winning" row's tenant retains the value; the losing tenant is notified to resubmit.
- Audit and reputation diverge harmlessly; both rows are preserved.
5. Outbox Pattern
sender-id-registry-service writes events via a transactional outbox to guarantee state-change ↔ event coupling.
-- In a single transaction:
BEGIN;
UPDATE sender_id_registry.sender_ids SET state = 'ACTIVE', ... WHERE ...;
INSERT INTO sender_id_registry.outbox (event_id, subject, payload) VALUES (...);
INSERT INTO sender_id_registry.audit (...) VALUES (...);
COMMIT;
A relay process (OutboxPublisher, leader-elected via Redis lock per pod) polls every 200 ms:
SELECT event_id, subject, payload
FROM sender_id_registry.outbox
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 200
FOR UPDATE SKIP LOCKED;
The relay publishes each selected row to NATS and, on success, runs UPDATE outbox SET published_at = now() for that row. On failure it increments attempts; after 10 attempts the row is moved to outbox_dead (a separate table) and an alert fires.
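The relay's retry and dead-letter behaviour can be modelled in memory as follows. This is a sketch under stated assumptions, not OutboxPublisher itself: the lists stand in for the outbox and outbox_dead tables, `publish` is a hypothetical callable wrapping the NATS client, and leader election, row locking and alerting are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_ATTEMPTS = 10  # from the contract: dead-letter after 10 attempts
BATCH_SIZE = 200   # mirrors LIMIT 200 in the polling query


@dataclass
class OutboxRow:
    event_id: str
    subject: str
    payload: bytes
    created_at: float
    attempts: int = 0
    published_at: Optional[float] = None


def relay_once(outbox: List[OutboxRow], dead: List[OutboxRow],
               publish: Callable[[str, bytes], None], now: float) -> int:
    """Run one polling pass; return how many rows were published."""
    pending = sorted(
        (r for r in outbox if r.published_at is None),
        key=lambda r: r.created_at,
    )[:BATCH_SIZE]
    published = 0
    for row in pending:
        try:
            publish(row.subject, row.payload)
        except Exception:
            row.attempts += 1
            if row.attempts >= MAX_ATTEMPTS:
                outbox.remove(row)  # move to the dead-letter store
                dead.append(row)
            continue
        row.published_at = now  # marks the row so later passes skip it
        published += 1
    return published
```

In the real service a scheduler would call the equivalent of `relay_once` every 200 ms on the leader pod; a failed row does not block later rows in the same pass, which is one possible design choice rather than something the contract specifies.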
6. Inbox Pattern
For consumed NATS events that mutate state (e.g. fraud.detected.* adjusting reputation), an inbox table prevents duplicate processing on consumer redelivery:
CREATE TABLE sender_id_registry.inbox (
  event_id    UUID PRIMARY KEY,
  subject     TEXT NOT NULL,
  consumed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Consumer logic:
on event(eventId, subject, payload):
  BEGIN;
  INSERT INTO inbox (event_id, subject) VALUES (eventId, subject)
    ON CONFLICT DO NOTHING RETURNING event_id;
  if no row inserted -> already processed: ROLLBACK, ack and return
  apply state mutation (insert reputation_history row, etc.)
  write outbox row for any downstream emission
  COMMIT;
  ack NATS message
Inbox rows expire after 7 days (the NATS deduplication window covers earlier replays).
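The consumer logic above can be tried out with a small runnable sketch. It uses SQLite as a stand-in for PostgreSQL, so `INSERT OR IGNORE` plus a row-count check replaces `ON CONFLICT DO NOTHING RETURNING`; the `reputation_history` columns, `make_db` and `handle_event` are hypothetical, and the outbox write is left out for brevity.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the inbox and a target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inbox (event_id TEXT PRIMARY KEY, subject TEXT NOT NULL)"
    )
    # Hypothetical shape; the real reputation_history schema is not shown here.
    conn.execute("CREATE TABLE reputation_history (event_id TEXT, delta INTEGER)")
    return conn


def handle_event(conn: sqlite3.Connection, event_id: str,
                 subject: str, delta: int) -> bool:
    """Apply the event once. Returns False for a redelivered duplicate.

    The caller acks the NATS message in both cases.
    """
    with conn:  # one transaction: commit on success, roll back on exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id, subject) VALUES (?, ?)",
            (event_id, subject),
        )
        if cur.rowcount == 0:
            return False  # already processed: skip the mutation
        conn.execute(
            "INSERT INTO reputation_history (event_id, delta) VALUES (?, ?)",
            (event_id, delta),
        )
    return True
```

Recording the inbox row and the state mutation in the same transaction is what makes redelivery safe: if the mutation fails, the inbox row rolls back too and the event can be retried.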
7. Schema Stability Guarantees
gRPC Proto
| Field | Stability |
|---|---|
| VerifyRequest.* required fields | Stable |
| VerifyResponse.* required fields | Stable |
| RegistryStatus enum values | Stable (new values may be added; callers must handle UNSPECIFIED) |
| VerificationLevel enum values | Stable (additions only) |
| New fields with default values | Non-breaking (proto3 semantics) |
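Since RegistryStatus may gain values, a caller built against an older schema needs a defined fallback for numbers it does not recognise. The sketch below shows one possible approach, mapping unrecognised wire values to UNSPECIFIED; `decode_status` is a hypothetical helper and the Python enum is a hand-written mirror of the proto, not generated code.

```python
from enum import IntEnum


class RegistryStatus(IntEnum):
    """Hand-written mirror of the proto enum, for illustration only."""
    REGISTRY_STATUS_UNSPECIFIED = 0
    ACTIVE = 1
    SUSPENDED = 2
    REVOKED = 3
    UNKNOWN = 4
    TENANT_MISMATCH = 5
    PENDING = 6


def decode_status(wire_value: int) -> RegistryStatus:
    """Map a wire integer to a known status, falling back to UNSPECIFIED."""
    try:
        return RegistryStatus(wire_value)
    except ValueError:
        # A value added after this client was built: treat it as UNSPECIFIED.
        return RegistryStatus.REGISTRY_STATUS_UNSPECIFIED
```

A hot-path caller would then be expected to treat UNSPECIFIED as non-allow, so an unrecognised status fails closed.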
REST API
- Routes under /v1/sender-ids/* and /v1/admin/sender-ids/* maintain backwards compatibility within the major version.
- Breaking changes require /v2/ prefix and a 90-day deprecation window.
NATS subjects
- .v1 suffix on every event subject; new subjects (sender.id.transferred.v1 etc.) are non-breaking; consumers opt in.
- Event payload schemas are JSON Schema Draft 2020-12; additions are non-breaking; removals or type changes require .v2.
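A consumer can enforce the subject-suffix rule by reading the major version off the subject before it deserialises the payload. The following is a minimal sketch; `split_subject`, `should_consume` and `SUPPORTED_MAJOR` are hypothetical names, and the regular expression assumes the version is always the final `.vN` token.

```python
import re
from typing import Tuple

# Assumption: the major version is always the last dot-separated token, "vN".
_SUBJECT_RE = re.compile(r"^(?P<base>.+)\.v(?P<major>\d+)$")

SUPPORTED_MAJOR = 1  # the major version this consumer was built against


def split_subject(subject: str) -> Tuple[str, int]:
    """Split 'sender.id.activated.v1' into ('sender.id.activated', 1)."""
    match = _SUBJECT_RE.match(subject)
    if match is None:
        raise ValueError(f"subject has no version suffix: {subject!r}")
    return match.group("base"), int(match.group("major"))


def should_consume(subject: str) -> bool:
    """Accept only subjects whose major version this consumer understands."""
    try:
        _, major = split_subject(subject)
    except ValueError:
        return False
    return major == SUPPORTED_MAJOR
```

With this guard in place, a future .v2 subject is skipped rather than misparsed until the consumer opts in.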
8. Versioning Policy
- gRPC package: ghasi.sms.sid.v1. Co-ordinated v2 migration if breaking.
- REST: semver. Breaking changes only on major versions, parallel-served for ≥ 90 days.
- NATS: per-event schemaVersion field; subject-suffix major version.