SMS Orchestrator — Jira-Ready Epics & User Stories
Status: populated | Owner: Platform Engineering | Last updated: 2026-04-18 | Service prefix: ORCH
Scope: New epics/stories covering HTTP submit migration (from retired api-gateway per ADR-0001), pipeline orchestration, idempotency, retry/DLQ, and observability.
Epic Summary
| Epic ID | Title | Stories | Points |
|---|---|---|---|
| EP-ORCH-01 | HTTP Submit API (Kong-Fronted) | US-ORCH-001 – US-ORCH-006 | 34 |
| EP-ORCH-02 | Outbound Pipeline Orchestration | US-ORCH-010 – US-ORCH-016 | 40 |
| EP-ORCH-03 | Idempotency & Deduplication | US-ORCH-020 – US-ORCH-023 | 18 |
| EP-ORCH-04 | Retry & Dead-Letter Handling | US-ORCH-030 – US-ORCH-034 | 22 |
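| EP-ORCH-05 | Observability & Readiness | US-ORCH-040 – US-ORCH-044 | 16 |
| EP-ORCH-06 | Priority-Lane Routing on Inbound Submit | US-ORCH-050 – US-ORCH-054 | 21 |
| EP-ORCH-07 | Trusted-Tenant Fast-Path Submit | US-ORCH-060 – US-ORCH-064 | 21 |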
EP-ORCH-01 · HTTP Submit API (Kong-Fronted)
Context: Per ADR-0001, the retired custom `api-gateway` is replaced by Kong. HTTP submit responsibility moves to `sms-orchestrator`. This epic covers implementing the HTTP-facing submit endpoints that Kong proxies.
US-ORCH-001 · Implement POST /v1/sms/send endpoint
Type: Feature | Points: 5
Description:
As a Kong upstream target, I need a POST /v1/sms/send endpoint in sms-orchestrator that accepts a single outbound SMS request so that clients can submit messages through Kong.
Acceptance Criteria:
- `POST /v1/sms/send` accepts `{ to, from, body, messageId?, metadata? }` JSON payload
- Reads `X-Tenant-Id` and `X-Request-Id` headers injected by Kong
- Returns `202 Accepted` with `{ messageId, status: "QUEUED", acceptedAt }` on success
- Returns `400 Bad Request` with structured error body on Zod validation failure
- Returns `409 Conflict` on idempotency key collision (duplicate `Idempotency-Key` header within 48h)
- `messageId` auto-generated as UUID v4 if not provided by client
- Integration test: valid payload → 202 with messageId
US-ORCH-002 · Implement POST /v1/sms/bulk endpoint
Type: Feature | Points: 8
Description:
As a Kong upstream target, I need a POST /v1/sms/bulk endpoint accepting up to 1,000 SMS submissions per request so that clients can submit bulk campaigns efficiently.
Acceptance Criteria:
- Accepts `{ messages: Array<{ to, from, body, messageId? }> }` with max 1,000 items
- Returns `202 Accepted` with `{ batchId, accepted: N, rejected: N, results: [...] }`
- Each message in results includes `messageId` and `status` (QUEUED or INVALID)
- Invalid messages within a batch are rejected individually; valid ones proceed
- Returns `400` if all messages fail validation
- Returns `413` if array exceeds 1,000 items
- E2E test: 500 messages, mix of valid/invalid → correct accepted/rejected counts
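The accept/reject rules for the bulk endpoint can be sketched as a pure function. This is a minimal illustration under assumptions, not the service's implementation: `handleBulk`, `BulkItem`, `isValid`, and `newId` are hypothetical names, with `isValid` standing in for the Zod schema check and `newId` for a UUID v4 generator.

```typescript
// Hypothetical types; field names follow the acceptance criteria above.
type BulkItem = { to: string; from: string; body: string; messageId?: string };
type ItemResult = { messageId: string; status: "QUEUED" | "INVALID" };
type BulkOutcome =
  | { httpStatus: 413 }
  | { httpStatus: 400; results: ItemResult[] }
  | { httpStatus: 202; batchId: string; accepted: number; rejected: number; results: ItemResult[] };

const MAX_BULK_ITEMS = 1000;

function handleBulk(
  messages: BulkItem[],
  isValid: (m: BulkItem) => boolean, // stand-in for the Zod schema check
  newId: () => string, // stand-in for a UUID v4 generator
): BulkOutcome {
  // Oversized arrays are rejected before any per-item work.
  if (messages.length > MAX_BULK_ITEMS) return { httpStatus: 413 };

  // Each item is validated on its own; one bad item does not reject the batch.
  const results = messages.map((m): ItemResult => ({
    messageId: m.messageId ?? newId(),
    status: isValid(m) ? "QUEUED" : "INVALID",
  }));
  const accepted = results.filter((r) => r.status === "QUEUED").length;
  const rejected = results.length - accepted;

  // 400 only when nothing in the batch survives validation.
  if (accepted === 0) return { httpStatus: 400, results };
  return { httpStatus: 202, batchId: newId(), accepted, rejected, results };
}
```

Keeping the decision free of I/O makes the mixed valid/invalid E2E case cheap to cover at unit level as well.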
US-ORCH-003 · Implement GET /v1/sms/{messageId} status endpoint
Type: Feature | Points: 3
Description:
As a client, I need to poll GET /v1/sms/{messageId} to check the current status of a submitted message.
Acceptance Criteria:
- Returns `{ messageId, status, tenantId, to, from, createdAt, updatedAt }` for known message
- Returns `404` for unknown `messageId`
- Returns `403` if `X-Tenant-Id` header does not match message's tenantId
- Response time P95 ≤ 50 ms (PG indexed query on `messageId`)
US-ORCH-004 · Zod schema validation middleware
Type: Feature | Points: 5
Description:
As the submit pipeline, I need all incoming payloads validated against Zod schemas before processing so that malformed requests are rejected at the HTTP boundary.
Acceptance Criteria:
- E.164 regex validation on `to` field; returns field-level error path on failure
- `from` (sender ID): 1–11 chars alphanumeric or 1–15 digit numeric
- `body`: 1–1600 characters; segment count computed and returned in 202 response
- `messageId`: UUID v4 format when provided
- Validation errors return `{ errors: [{ field, message, code }] }` array
- Unit tests for all validation rules including boundary values
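As a rough sketch of these rules, the following uses plain TypeScript rather than Zod so that it runs without dependencies. The regular expressions, the error `code` values, and the names `validateSubmit` and `SubmitPayload` are assumptions for illustration; the story names the rules but not their exact expressions.

```typescript
type FieldError = { field: string; message: string; code: string };
type SubmitPayload = { to?: unknown; from?: unknown; body?: unknown; messageId?: unknown };

// Assumed patterns for the rules listed above.
const E164 = /^\+[1-9]\d{1,14}$/;
const SENDER_ALPHANUMERIC = /^[A-Za-z0-9]{1,11}$/;
const SENDER_NUMERIC = /^\d{1,15}$/;
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function validateSubmit(p: SubmitPayload): { errors: FieldError[] } {
  const errors: FieldError[] = [];
  const fail = (field: string, code: string, message: string): void => {
    errors.push({ field, message, code });
  };

  if (typeof p.to !== "string" || !E164.test(p.to)) {
    fail("to", "INVALID_E164", "to must be an E.164 number");
  }
  if (
    typeof p.from !== "string" ||
    !(SENDER_ALPHANUMERIC.test(p.from) || SENDER_NUMERIC.test(p.from))
  ) {
    fail("from", "INVALID_SENDER", "from must be 1-11 alphanumeric chars or 1-15 digits");
  }
  if (typeof p.body !== "string" || p.body.length < 1 || p.body.length > 1600) {
    fail("body", "INVALID_BODY_LENGTH", "body must be 1-1600 characters");
  }
  // messageId is optional, but must be a UUID v4 when present.
  if (p.messageId !== undefined && (typeof p.messageId !== "string" || !UUID_V4.test(p.messageId))) {
    fail("messageId", "INVALID_UUID", "messageId must be a UUID v4");
  }
  return { errors };
}
```

All failing fields are collected in one pass, so a client sees every problem in a single 400 response rather than fixing them one at a time.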
US-ORCH-005 · Idempotency-Key header processing (HTTP layer)
Type: Feature | Points: 8
Description:
As the HTTP submit layer, I need to process Idempotency-Key headers so that duplicate requests within 48 hours return the original response without reprocessing.
Acceptance Criteria:
- On first request: compute `sha256(tenantId + ":" + Idempotency-Key)`, store in Redis `orch:submit-idem:{hash}` with 48h TTL, value = serialized 202 response
- On replay: return stored 202 response with `Idempotency-Replayed: true` header, skip pipeline
- On Redis unavailable: process request normally (fail open) + emit `warn` log
- Key collision scenario tested: two concurrent requests with same key → only one processed
- `SET NX EX` used atomically
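The first-request and replay flow can be sketched as follows. This is an illustration under assumptions: `KeyValueStore` and `InMemoryStore` are hypothetical stand-ins for a Redis client (a real client is asynchronous, and `setIfAbsent` corresponds to `SET key value NX EX ttl`), and `submitOnce` and `submitIdemKey` are made-up names.

```typescript
import { createHash } from "node:crypto";

const TTL_SECONDS = 48 * 60 * 60; // 172800 s = 48h

type Accepted = { messageId: string; status: "QUEUED"; acceptedAt: string };

// Builds orch:submit-idem:{sha256(tenantId + ":" + Idempotency-Key)}.
function submitIdemKey(tenantId: string, idempotencyKey: string): string {
  const hash = createHash("sha256").update(`${tenantId}:${idempotencyKey}`).digest("hex");
  return `orch:submit-idem:${hash}`;
}

// Hypothetical synchronous stand-in for the Redis calls used here.
interface KeyValueStore {
  setIfAbsent(key: string, value: string, ttlSeconds: number): boolean;
  get(key: string): string | null;
}

class InMemoryStore implements KeyValueStore {
  private data = new Map<string, { value: string; expiresAt: number }>();
  setIfAbsent(key: string, value: string, ttlSeconds: number): boolean {
    if (this.get(key) !== null) return false;
    this.data.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return true;
  }
  get(key: string): string | null {
    const entry = this.data.get(key);
    return entry && entry.expiresAt > Date.now() ? entry.value : null;
  }
}

function submitOnce(
  store: KeyValueStore,
  tenantId: string,
  idempotencyKey: string,
  response: Accepted,
  enqueue: (r: Accepted) => void,
): { body: Accepted; replayed: boolean } {
  let stored: string | null = null;
  try {
    const created = store.setIfAbsent(
      submitIdemKey(tenantId, idempotencyKey),
      JSON.stringify(response),
      TTL_SECONDS,
    );
    if (!created) stored = store.get(submitIdemKey(tenantId, idempotencyKey));
  } catch {
    // Fail open: a store outage must not block submission.
    console.warn("idempotency store unavailable; processing without replay protection");
  }
  // Replay: return the original response and skip the pipeline.
  if (stored !== null) return { body: JSON.parse(stored) as Accepted, replayed: true };
  enqueue(response);
  return { body: response, replayed: false };
}
```

A caller would set the `Idempotency-Replayed: true` header whenever `replayed` is true.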
US-ORCH-006 · Kong route configuration for /v1/sms/* routes
Type: Configuration | Points: 5
Description:
As a platform operator, I need Kong routes for /v1/sms/send, /v1/sms/bulk, and /v1/sms/{messageId} pointing to sms-orchestrator so that client traffic reaches the correct upstream.
Acceptance Criteria:
- Kong `Service` resource: `sms-orchestrator`, upstream `http://sms-orchestrator:3001`
- Kong `Route` resources for all three paths with correct methods (POST, POST, GET)
- `jwt` plugin applied (validates Bearer token from auth-service JWKS)
- `correlation-id` plugin injects `X-Request-Id`
- `request-transformer` plugin injects `X-Tenant-Id` from JWT `sub` claim
- Configuration stored in `services/api-gateway/kong/` declarative config
- Integration test through Kong: 401 without token, 202 with valid token + payload
EP-ORCH-02 · Outbound Pipeline Orchestration
Context: Core NATS consumer pipeline: idempotency → validation → routing → operator publish → state persistence.
US-ORCH-010 · NATS consumer setup (sms.outbound.request)
Type: Feature | Points: 5
Description:
As the pipeline, I need a durable NATS JetStream consumer on sms.outbound.request so that submitted messages are processed reliably with at-least-once delivery.
Acceptance Criteria:
- Durable consumer name: `orch-consumer`
- `AckExplicit` mode: NATS message acknowledged only after pipeline completion
- `AckWait` 30s; `MaxDeliver` 3 (application handles retries, not NATS)
- Configurable `MAX_CONCURRENCY` (default 10 in-flight messages)
- Reconnect on NATS disconnect without losing in-flight messages
- Metrics: `nats_consumer_lag`, `nats_messages_in_flight` exposed on `/metrics`
US-ORCH-011 · Pipeline idempotency check (NATS layer)
Type: Feature | Points: 3
Description:
As the NATS consumer pipeline, I need to check Redis for a processed messageId before executing pipeline stages so that NATS redeliveries don't double-process messages.
Acceptance Criteria:
- Key pattern: `orch:idem:{messageId}` checked with Redis GET
- On key present: ACK NATS message, emit `warn` log with `duplicate: true`, return
- On key absent: SET NX with 48h TTL before processing
- On Redis unavailable: proceed with processing, emit `warn`
US-ORCH-012 · Domain validation (pipeline stage)
Type: Feature | Points: 3
Description:
As the pipeline, I need domain-level validation of the NATS message payload so that structurally invalid messages are terminated early without retrying.
Acceptance Criteria:
- E.164 `to` validation, non-empty `from`, body length ≤ 1600 chars, valid UUID `messageId`, non-empty `tenantId`
- On failure: update PG status to `FAILED`, publish `sms.outbound.deadletter`, ACK NATS, no retry
- Segment count computed and stored in `sms_messages.segment_count`
US-ORCH-013 · gRPC routing stage (routing-engine integration)
Type: Feature | Points: 8
Description:
As the pipeline, I need to call the routing-engine via gRPC to select an operator for each message so that messages are dispatched to the correct SMPP connector.
Acceptance Criteria:
- gRPC call: `SelectOperator(tenantId, to, from, messageType, messageId)` → `{operatorId, operatorSubject}`
- `NO_ROUTE_FOUND` error → permanent failure: FAILED status + DLQ, no retry
- Transient gRPC error (timeout, UNAVAILABLE) → triggers retry mechanism (EP-ORCH-04)
- P95 gRPC call latency ≤ 50 ms (measured via span)
- `operatorId` and `routeId` stored in `sms_messages` on success
- Update PG status to `ROUTING` before gRPC call, `ROUTED` on success
US-ORCH-014 · Operator NATS publish stage
Type: Feature | Points: 5
Description:
As the pipeline, I need to publish the SMS payload to smpp.operator.{operatorId} after routing so that the SMPP connector receives the message for carrier submission.
Acceptance Criteria:
- Published subject: `smpp.operator.{operatorId}` with `SmppOutboundMessage` schema
- `X-Correlation-ID` NATS header set to `messageId`
- Original `messageId`, `tenantId`, and routing metadata included in payload
- Update PG status to `SENT` only after successful NATS publish ACK
- On NATS publish failure: triggers retry mechanism
US-ORCH-015 · Domain event emission (sms.events.status)
Type: Feature | Points: 8
Description:
As downstream consumers (billing, webhooks), I need status change events published to sms.events.status after every PG write so that consumers can react to message lifecycle transitions.
Acceptance Criteria:
- Payload: `{ messageId, tenantId, previousStatus, newStatus, timestamp, metadata? }`
- Published after PG commit, not before
- Publish failure logged but does not fail the pipeline stage
- All transitions emitted: QUEUED→ROUTING, ROUTING→ROUTED, ROUTED→SENT, *→FAILED, *→DEAD_LETTER, *→RETRY
- TypeScript interface defined in `event-schemas.ts`
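One possible shape for the event interface and the commit-then-publish ordering is sketched below. The names `SmsStatusEvent`, `isEmittedTransition`, and `publishAfterCommit` are hypothetical, and the ISO 8601 string for `timestamp` is an assumption, since the story does not fix the encoding.

```typescript
type MessageStatus =
  | "QUEUED" | "ROUTING" | "ROUTED" | "SENT" | "FAILED" | "DEAD_LETTER" | "RETRY";

// Mirrors the payload fields listed in the acceptance criteria.
interface SmsStatusEvent {
  messageId: string;
  tenantId: string;
  previousStatus: MessageStatus;
  newStatus: MessageStatus;
  timestamp: string; // assumed ISO 8601
  metadata?: Record<string, unknown>;
}

// True for the transitions the story requires to be emitted.
function isEmittedTransition(prev: MessageStatus, next: MessageStatus): boolean {
  // Wildcard transitions: any status may move to FAILED, DEAD_LETTER, or RETRY.
  if (next === "FAILED" || next === "DEAD_LETTER" || next === "RETRY") return true;
  return (
    (prev === "QUEUED" && next === "ROUTING") ||
    (prev === "ROUTING" && next === "ROUTED") ||
    (prev === "ROUTED" && next === "SENT")
  );
}

// Commit first; a failed publish is logged and must not fail the stage.
function publishAfterCommit(
  commit: () => void,
  publish: (event: SmsStatusEvent) => void,
  event: SmsStatusEvent,
): void {
  commit(); // if this throws, no event is published
  try {
    publish(event);
  } catch (err) {
    console.error("sms.events.status publish failed", { messageId: event.messageId, err });
  }
}
```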
US-ORCH-016 · Message state persistence (PostgreSQL)
Type: Feature | Points: 8
Description:
As the audit trail, I need all message state transitions written to orch.sms_messages atomically so that message history is reliable and queryable.
Acceptance Criteria:
- INSERT on QUEUED (HTTP layer); UPDATE on all subsequent transitions
- All status updates wrapped in PG transactions
- `status_updated_at` updated on every transition
- `processed_at` set on first SENT or DEAD_LETTER terminal transition
- `attempt_count` incremented on each RETRY
- `last_error` updated with failure reason on each failed attempt
- PG partitioned by month (`PARTITION BY RANGE (created_at)`), 90-day retention policy
EP-ORCH-03 · Idempotency & Deduplication
US-ORCH-020 · Redis SET NX idempotency key creation
Type: Feature | Points: 3
Description:
As the pipeline, I need atomic SET NX operations for idempotency keys so that concurrent message deliveries don't result in double-processing in multi-replica deployments.
Acceptance Criteria:
- `SET NX EX 172800` used for all idempotency key writes
- Key pattern `orch:idem:{messageId}` for pipeline-level; `orch:submit-idem:{hash}` for HTTP-level
- Race condition test: two concurrent goroutines/workers with same messageId → only one proceeds
- Redis `MULTI/EXEC` not required (SET NX is atomic)
US-ORCH-021 · Idempotency TTL and expiry behaviour
Type: Feature | Points: 3
Description:
As the platform, I need idempotency keys to expire after 48 hours so that Redis memory is bounded and replay protection windows are well-defined.
Acceptance Criteria:
- TTL = 172800 seconds (48h) set at key creation
- Expired keys allow re-processing (new submission treated as fresh request)
- TTL visible in Redis key inspection (`TTL orch:idem:*`)
- Redis key count monitored in Prometheus: `redis_key_count{prefix="orch:idem"}`
US-ORCH-022 · Idempotency replay response for HTTP clients
Type: Feature | Points: 5
Description:
As an HTTP client, I need replayed requests to return the original 202 response body so that retrying clients receive consistent responses.
Acceptance Criteria:
- Original 202 response body serialized and stored in Redis alongside idempotency key
- Replay returns identical `{ messageId, status, acceptedAt }` body
- Response includes `Idempotency-Replayed: true` header
- Storage overhead bounded: response body stored as compact JSON string
US-ORCH-023 · Idempotency Redis failover behaviour
Type: Feature | Points: 7
Description:
As the platform, I need graceful degradation when Redis is unavailable so that idempotency failures don't block message submission.
Acceptance Criteria:
- Redis connection failure → `warn` log emitted with `component: idempotency`
- Message processing continues (fail open); no 503 to client
- `redis_idempotency_skip_total` counter incremented on each skip
- Alert rule: `redis_idempotency_skip_total > 10` in 5m window → PagerDuty warning
EP-ORCH-04 · Retry & Dead-Letter Handling
US-ORCH-030 · Exponential backoff retry policy
Type: Feature | Points: 5
Description:
As the pipeline, I need transient failures to trigger an exponential backoff retry so that temporary operator or routing outages don't immediately dead-letter messages.
Acceptance Criteria:
- Max 3 attempts (1 initial + 2 retries)
- Delays: attempt 1 → 1s, attempt 2 → 2s, attempt 3 → 4s
- `attempt_count` incremented in PG on each attempt
- Status updated to `RETRY` with `last_error` populated
- Retry timing enforced via NATS delayed NAK (`nak(delay)`)
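The delay schedule above doubles from a 1 s base, which can be sketched as below. `retryDelayMs` and `hasAttemptsLeft` are hypothetical helper names; the returned value is what would be passed to the delayed NAK.

```typescript
const MAX_ATTEMPTS = 3; // 1 initial + 2 retries
const BASE_DELAY_MS = 1000;

// attempt is 1-based: 1 → 1000 ms, 2 → 2000 ms, 3 → 4000 ms.
function retryDelayMs(attempt: number): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError(`attempt must be a positive integer, got ${attempt}`);
  }
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

// True while the message may still be retried rather than dead-lettered.
function hasAttemptsLeft(attemptCount: number): boolean {
  return attemptCount < MAX_ATTEMPTS;
}
```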
US-ORCH-031 · sms.outbound.retry event on each retry
Type: Feature | Points: 3
Description:
As downstream consumers and operations, I need a sms.outbound.retry event emitted on each retry attempt so that retry patterns are observable.
Acceptance Criteria:
- Published to `sms.outbound.retry` subject with `{ messageId, tenantId, attemptNumber, failureReason, nextRetryAt }`
- TypeScript interface defined in event-schemas
- Published before delayed NAK
- Unit test: 3 consecutive failures → 3 retry events emitted
US-ORCH-032 · Dead-letter queue routing after max retries
Type: Feature | Points: 5
Description:
As the platform, I need exhausted messages routed to sms.outbound.deadletter so that no message is silently lost and dead-letter consumers can handle reprocessing or alerting.
Acceptance Criteria:
- After 3 failed attempts: publish to `sms.outbound.deadletter`
- DLQ payload: `{ messageId, tenantId, to, from, body, attemptCount, failureReason, failedAt }`
- PG status updated to `DEAD_LETTER`
- Original NATS message ACK'd only after successful DLQ publish
- DLQ publish failure retried up to 3 times independently before logging error
US-ORCH-033 · Permanent failure handling (validation + no-route)
Type: Feature | Points: 5
Description:
As the pipeline, I need certain failure types (invalid payload, no route found) to skip retries and go directly to DLQ so that unretryable messages don't consume retry budget.
Acceptance Criteria:
- Validation failure → immediate FAILED + DLQ, `attemptCount: 1`
- `NO_ROUTE_FOUND` gRPC error → immediate FAILED + DLQ, `attemptCount: 1`
- Failure reason encoded as structured object `{ code: "VALIDATION_FAILED" | "NO_ROUTE" | ..., detail: string }`
- Unit tests for each permanent failure path
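The split between permanent and retryable failures can be sketched as a small classifier. This is illustrative only: `classifyFailure` and `Outcome` are hypothetical names, and `"TRANSIENT"` is a placeholder for the failure codes the story leaves unspecified.

```typescript
type FailureReason = {
  // "TRANSIENT" is a placeholder; the story elides the remaining codes.
  code: "VALIDATION_FAILED" | "NO_ROUTE" | "TRANSIENT";
  detail: string;
};

type Outcome =
  | { next: "DLQ"; status: "FAILED" | "DEAD_LETTER"; attemptCount: number }
  | { next: "RETRY"; status: "RETRY"; attemptCount: number };

const PERMANENT_CODES: ReadonlySet<FailureReason["code"]> = new Set<FailureReason["code"]>([
  "VALIDATION_FAILED",
  "NO_ROUTE",
]);

function classifyFailure(reason: FailureReason, attemptCount: number, maxAttempts = 3): Outcome {
  // Permanent failures skip the retry budget and go straight to the DLQ.
  if (PERMANENT_CODES.has(reason.code)) {
    return { next: "DLQ", status: "FAILED", attemptCount };
  }
  // Transient failures are dead-lettered only once the budget is exhausted.
  if (attemptCount >= maxAttempts) {
    return { next: "DLQ", status: "DEAD_LETTER", attemptCount };
  }
  return { next: "RETRY", status: "RETRY", attemptCount };
}
```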
US-ORCH-034 · Stuck ROUTED row reconciliation job
Type: Feature | Points: 4
Description:
As the platform, I need a periodic reconciliation job that detects messages stuck in ROUTED status (operator publish confirmed but crash before PG update) so that they can be recovered or alerting triggered.
Acceptance Criteria:
- Cron job runs every 5 minutes
- Queries `orch.sms_messages WHERE status = 'ROUTED' AND status_updated_at < NOW() - INTERVAL '2 minutes'`
- Emits `warn` log per stuck message; increments `orch_stuck_routed_total` counter
- Configurable threshold: `STUCK_ROUTED_THRESHOLD_SECONDS` env var (default 120)
- Does NOT auto-retry (manual intervention or separate DLQ consumer)
EP-ORCH-05 · Observability & Readiness
US-ORCH-040 · Health and readiness endpoints
Type: Feature | Points: 2
Description:
As Kubernetes, I need /health/live and /health/ready endpoints so that liveness and readiness probes work correctly.
Acceptance Criteria:
- `GET /health/live` → 200 always if process is running
- `GET /health/ready` → 200 only if NATS, PG, Redis, and routing-engine gRPC are reachable
- `GET /health/ready` → 503 with `{ dependencies: { nats: "down", ... } }` if any dependency unhealthy
- Response time ≤ 200 ms (with 1s timeout per dependency check)
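A readiness handler along these lines might run the four checks in parallel, each bounded by the 1 s timeout, and then fold the results into a status code. The sketch below assumes hypothetical names (`withTimeout`, `summarise`, `checkReadiness`) and dependency keys other than `nats`; returning the `dependencies` map on 200 as well is also an assumption.

```typescript
type Dependency = "nats" | "pg" | "redis" | "routingEngine";
type DependencyState = "up" | "down";
type Readiness = { httpStatus: 200 | 503; body: { dependencies: Record<Dependency, DependencyState> } };

// Resolves true if the check succeeds in time, false on error or timeout.
function withTimeout(check: () => Promise<unknown>, timeoutMs = 1000): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    Promise.resolve()
      .then(check)
      .then(() => resolve(true), () => resolve(false))
      .finally(() => clearTimeout(timer));
  });
}

// Pure fold from per-dependency results to the HTTP response.
function summarise(results: Record<Dependency, boolean>): Readiness {
  const state = (ok: boolean): DependencyState => (ok ? "up" : "down");
  const dependencies: Record<Dependency, DependencyState> = {
    nats: state(results.nats),
    pg: state(results.pg),
    redis: state(results.redis),
    routingEngine: state(results.routingEngine),
  };
  const allUp = Object.values(results).every(Boolean);
  return { httpStatus: allUp ? 200 : 503, body: { dependencies } };
}

// Runs all checks concurrently so the slowest one bounds the response time.
async function checkReadiness(
  checks: Record<Dependency, () => Promise<unknown>>,
): Promise<Readiness> {
  const [nats, pg, redis, routingEngine] = await Promise.all([
    withTimeout(checks.nats),
    withTimeout(checks.pg),
    withTimeout(checks.redis),
    withTimeout(checks.routingEngine),
  ]);
  return summarise({ nats, pg, redis, routingEngine });
}
```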
US-ORCH-041 · Prometheus metrics endpoint
Type: Feature | Points: 3
Description:
As Prometheus, I need a /metrics endpoint in OpenMetrics format so that all pipeline metrics are scrapable.
Acceptance Criteria:
- Metrics exposed: `orch_messages_submitted_total`, `orch_pipeline_duration_seconds`, `orch_retry_total`, `orch_dlq_total`, `orch_idempotency_hit_total`, `nats_consumer_lag`
- All metrics labeled with `tenant_id`, `status`, `operator_id` where applicable
- Histogram buckets for `orch_pipeline_duration_seconds`: 50ms, 100ms, 200ms, 500ms, 1s, 2s
- `/metrics` endpoint not exposed via Kong (internal only)
US-ORCH-042 · Structured JSON logging
Type: Feature | Points: 3
Description:
As the operations team, I need all logs emitted as structured JSON so that Loki can index and query them efficiently.
Acceptance Criteria:
- Log fields: `level`, `timestamp`, `messageId`, `tenantId`, `traceId`, `spanId`, `service: "sms-orchestrator"`, `action`, `durationMs`
- `pino` logger (or equivalent) configured with JSON transport
- Sensitive fields redacted: `body` (SMS content) not logged at INFO; only at DEBUG with explicit opt-in
- Log level configurable via `LOG_LEVEL` env var
US-ORCH-043 · OpenTelemetry trace propagation
Type: Feature | Points: 5
Description:
As the observability platform, I need trace context propagated from Kong through sms-orchestrator to downstream services so that end-to-end request traces are available in Grafana Tempo.
Acceptance Criteria:
- `W3C TraceContext` header propagated from Kong `X-Request-Id` → OTel span
- Spans created for each pipeline stage: validate, idempotency, route, publish, persist
- gRPC call to routing-engine propagates trace context
- NATS publish includes `traceparent` header
- OTel exporter configured via `OTEL_EXPORTER_OTLP_ENDPOINT` env var
US-ORCH-044 · Kubernetes deployment manifest and HPA
Type: DevOps | Points: 3
Description:
As the platform, I need a Kubernetes Deployment, Service, and HPA for sms-orchestrator so that it scales horizontally under load.
Acceptance Criteria:
- `Deployment` with `minReplicas: 2`, resource requests `cpu: 100m mem: 128Mi`, limits `cpu: 500m mem: 512Mi`
- `HPA` scaling on `nats_consumer_lag > 1000` (KEDA) or CPU > 70%
- `Service` ClusterIP on port 3001
- `livenessProbe` and `readinessProbe` wired to health endpoints
- `PodDisruptionBudget` ensuring at least 1 pod available during rolling updates
EP-ORCH-06 · Priority-Lane Routing on Inbound Submit
Context: Per `EP-PLAT-NB-06` (national priority lanes), every inbound submit must be tagged into one of P0 (emergency), P1 (OTP), P2 (transactional), P3 (marketing), P4 (broadcast). The orchestrator is the single point where lane assignment happens, and lane assignment determines NATS subject (`lane.p0.*` … `lane.p4.*`), TPS budget, compliance treatment, and SLA target.
US-ORCH-050 · Accept and validate X-Priority-Lane header
Type: Feature | Points: 5
Description:
As a tenant, I need to optionally hint the priority lane via X-Priority-Lane: P0|P1|P2|P3|P4 so that traffic class is explicit. The orchestrator must validate the tenant is authorised for the requested lane.
Acceptance Criteria:
- Header values constrained to `P0..P4`; invalid values return 400 with `code: "INVALID_LANE"`.
- Authorisation table `auth.tenant_lane_grants(tenantId, allowedLanes[])`; enforced in orchestrator. Unauthorised lane → 403 `code: "LANE_NOT_GRANTED"`.
- Default lane assignment when header absent: per tenant tier (`tier=ENTERPRISE → P2`, `tier=STANDARD → P3`, `tier=TRIAL → P3`).
- P0 requests require both `X-Priority-Lane: P0` and a valid government PKI signature in `X-Gov-Signature` (else 403).
- Unit test covers: each lane × authorised vs. unauthorised matrix (10 cases minimum).
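Lane resolution as described above could look roughly like this. The sketch rests on assumptions: `resolveLane` is a hypothetical name, signature verification is reduced to a boolean input, the default lane is taken to need no grant check, and `"GOV_SIGNATURE_REQUIRED"` is a made-up code because the story only specifies the 403 status for that case.

```typescript
type Lane = "P0" | "P1" | "P2" | "P3" | "P4";
type Tier = "ENTERPRISE" | "STANDARD" | "TRIAL";
type LaneResult =
  | { ok: true; lane: Lane }
  | { ok: false; httpStatus: 400 | 403; code: string };

const LANES: readonly Lane[] = ["P0", "P1", "P2", "P3", "P4"];
const DEFAULT_LANE: Record<Tier, Lane> = { ENTERPRISE: "P2", STANDARD: "P3", TRIAL: "P3" };

function resolveLane(
  header: string | undefined, // X-Priority-Lane value, if sent
  tier: Tier,
  allowedLanes: readonly Lane[], // from auth.tenant_lane_grants
  govSignatureValid: boolean, // result of verifying X-Gov-Signature
): LaneResult {
  // No hint: fall back to the tier default (assumed to need no grant).
  if (header === undefined) return { ok: true, lane: DEFAULT_LANE[tier] };

  const lane = LANES.find((l) => l === header);
  if (lane === undefined) return { ok: false, httpStatus: 400, code: "INVALID_LANE" };
  if (!allowedLanes.includes(lane)) return { ok: false, httpStatus: 403, code: "LANE_NOT_GRANTED" };

  // P0 additionally requires a valid government PKI signature.
  if (lane === "P0" && !govSignatureValid) {
    return { ok: false, httpStatus: 403, code: "GOV_SIGNATURE_REQUIRED" }; // hypothetical code
  }
  return { ok: true, lane };
}
```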
US-ORCH-051 · Lane-aware NATS subject routing
Type: Feature | Points: 5
Description:
As the orchestrator, I need to publish sms.outbound.request to a lane-specific subject so that downstream consumers (compliance, routing, smpp) can apply lane-aware processing.
Acceptance Criteria:
- Subjects: `lane.p0.outbound.request`, `lane.p1.outbound.request`, …, `lane.p4.outbound.request`.
- Subject choice driven by the resolved lane (US-ORCH-050).
- Each lane subject is served by a distinct JetStream consumer with independent ack/lag metrics.
- Metric `orch_publish_lane_total{lane}` increments per publish.
US-ORCH-052 · Per-lane back-pressure and 429-shaping
Type: Feature | Points: 5
Description: As the orchestrator, I need to shed inbound P3/P4 traffic before P0/P1 if NATS lag spikes so that high-priority traffic always wins.
Acceptance Criteria:
- Redis gauge `lane:lag:{P}` updated every 5 s by a sidecar that reads JetStream consumer lag.
- If `lag.P0 > 0` (any backlog) for 30 s → reject all P3/P4 with 429 `code: "LANE_SHED"`.
- If `lag.P1 > 1000` for 30 s → reject P4 with 429 `code: "LANE_SHED"`.
- Shed responses include `Retry-After: 60`.
- Telemetry: `orch_lane_shed_total{shedLane,protectedLane}` counter.
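The shedding rules combine a "condition held for 30 s" check with a per-lane decision. A minimal sketch follows, assuming hypothetical names (`SustainedCondition`, `shedDecision`) and that lag samples arrive with a caller-supplied clock value; reading the Redis gauges is left out.

```typescript
type Lane = "P0" | "P1" | "P2" | "P3" | "P4";
type ShedDecision =
  | { shed: false }
  | { shed: true; httpStatus: 429; code: "LANE_SHED"; retryAfterSeconds: 60; protectedLane: "P0" | "P1" };

// Tracks whether a condition has held continuously for at least holdMs.
class SustainedCondition {
  private since: number | null = null;
  private readonly holdMs: number;
  constructor(holdMs: number) {
    this.holdMs = holdMs;
  }
  update(active: boolean, nowMs: number): boolean {
    if (!active) {
      this.since = null; // any recovery resets the window
      return false;
    }
    if (this.since === null) this.since = nowMs;
    return nowMs - this.since >= this.holdMs;
  }
}

// p0Sustained: lag.P0 > 0 for 30 s; p1Sustained: lag.P1 > 1000 for 30 s.
function shedDecision(lane: Lane, p0Sustained: boolean, p1Sustained: boolean): ShedDecision {
  const shedFor = (protectedLane: "P0" | "P1"): ShedDecision => ({
    shed: true, httpStatus: 429, code: "LANE_SHED", retryAfterSeconds: 60, protectedLane,
  });
  if (p0Sustained && (lane === "P3" || lane === "P4")) return shedFor("P0");
  if (p1Sustained && lane === "P4") return shedFor("P1");
  return { shed: false };
}
```

For example, a caller might keep one `SustainedCondition(30_000)` per rule, feed it each 5 s gauge reading, and pass the two booleans into `shedDecision` on every submit.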
US-ORCH-053 · Lane carried in NATS payload and DB row
Type: Feature | Points: 3
Description:
The lane assignment is a permanent property of the message and must be persisted on orch.sms_messages.lane and included in every downstream NATS payload so it can be enforced and reported on.
Acceptance Criteria:
- Schema migration adds `lane VARCHAR(2) NOT NULL DEFAULT 'P3'` to `orch.sms_messages`.
- All downstream NATS payloads (`sms.outbound.request`, `compliance.evaluate.request`, `sms.dispatch.command`) include `lane` field.
- `GET /v1/sms/{messageId}` response includes `lane` in the payload.
US-ORCH-054 · Lane-SLO Prometheus instrumentation
Type: Feature | Points: 3
Description: As SRE, I need Prometheus histograms per lane so that lane-specific SLO alerts can be wired (P1 OTP submit→DLR P95 ≤ 3 s, P2 transactional ≤ 10 s, …).
Acceptance Criteria:
- Histograms `orch_submit_to_dlr_seconds{lane}` and `orch_submit_to_ack_seconds{lane}` registered.
- Buckets tuned per lane (P1 buckets 0.5–10 s, P3 buckets 5–300 s).
- Recording rules emit per-lane SLO compliance percentages.
- Alert `OrchLaneSloBreach{lane=P1}` fires when 5-min P95 > 3 s.
EP-ORCH-07 · Trusted-Tenant Fast-Path Submit (signed-template short-circuit)
Context: Per `EP-PLAT-NB-08` and `EP-CE-13` (trusted-tenant fast-path), pre-vetted regulated tenants (banks, ministries, healthcare) submit using a pre-approved signed template; the orchestrator verifies fingerprint and routes with compliance in shadow mode rather than blocking. This delivers OTP-class latency without losing compliance evidence.
US-ORCH-060 · Accept X-Template-Id and variable bindings
Type: Feature | Points: 5
Description: As a trusted tenant, I want to submit by template ID + variable bindings so that the message body is reconstructed server-side from a pre-approved template.
Acceptance Criteria:
- `POST /v1/sms/send` accepts `{ to, from, templateId, variables }` instead of `body`.
- Template fetched from `compliance.approved_templates` (cached in Redis 5 min); 404 with `code: "TEMPLATE_NOT_FOUND"` if missing or revoked.
- Variables substituted via Mustache; unsupplied variables → 400 with `code: "TEMPLATE_VARIABLE_MISSING"`.
- Resulting body length recomputed; segment count returned in 202 response.
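The substitution and missing-variable check can be illustrated with a dependency-free stand-in. This is not Mustache itself: it handles only plain `{{name}}` tags, with no sections, partials, or HTML escaping, and `renderTemplate` is a hypothetical name.

```typescript
type RenderResult =
  | { ok: true; body: string }
  | { ok: false; httpStatus: 400; code: "TEMPLATE_VARIABLE_MISSING"; missing: string[] };

// Matches {{name}} with optional inner whitespace.
const TAG = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

function renderTemplate(template: string, variables: Record<string, string>): RenderResult {
  const missing: string[] = [];
  const body = template.replace(TAG, (_match: string, name: string): string => {
    const value = Object.prototype.hasOwnProperty.call(variables, name)
      ? variables[name]
      : undefined;
    if (value === undefined) {
      missing.push(name); // collect every unsupplied variable, not just the first
      return "";
    }
    return value;
  });
  if (missing.length > 0) {
    return { ok: false, httpStatus: 400, code: "TEMPLATE_VARIABLE_MISSING", missing };
  }
  return { ok: true, body };
}
```

The rendered `body` is what the length and segment-count recomputation would then run against.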
US-ORCH-061 · Verify content fingerprint and tenant approval
Type: Feature | Points: 5
Description: As the orchestrator, I need to verify that the rendered body fingerprint matches the approved template hash and that the tenant is on the trusted-tenant allow-list for that template.
Acceptance Criteria:
- Fingerprint = `sha256(templateId || normalised(rendered_body))`.
- `compliance.approved_templates.fingerprint_pattern` is a regex (allows variable spans); orchestrator verifies the rendered body matches.
- `compliance.template_tenant_grants(templateId, tenantId, expiresAt)`; enforced.
- Mismatch → fall back to full compliance evaluation; emit `orch.fastpath.fallback.v1` event.
US-ORCH-062 · Compliance shadow-mode evaluation in fast path
Type: Feature | Points: 5
Description: Even when the fast path is taken, compliance must be evaluated in shadow mode so that any drift is detected without blocking delivery.
Acceptance Criteria:
- Orchestrator publishes `compliance.evaluate.shadow.v1` (non-blocking, fire-and-forget within ack budget).
- Compliance verdict on shadow path is logged but does not alter delivery decision.
- If shadow verdict ≠ `ALLOW`, alert `ComplianceFastpathDrift{tenantId,templateId}` fires.
- 1-in-1000 sample of fast-path messages is re-evaluated in blocking mode for drift detection.
US-ORCH-063 · Fast-path metrics and audit
Type: Feature | Points: 3
Description: As an auditor, I need metrics distinguishing fast-path from full-compliance traffic so that the share of bypass is monitorable.
Acceptance Criteria:
- Metric `orch_submit_path_total{path="fastpath"|"full"}` increments per submit.
- `orch.sms_messages.compliance_path` column (FAST_PATH|FULL) populated.
- Audit event `orch.fastpath.taken.v1` emitted with `tenantId, templateId, messageId, fingerprintMatched`.
- Grafana panel: fast-path share by tenant per hour.
US-ORCH-064 · Per-tenant fast-path kill-switch
Type: Feature | Points: 3
Description: As a security incident responder, I need to disable fast-path for a specific tenant or template within 30 s so that an abused fast-path can be revoked instantly.
Acceptance Criteria:
- `POST /v1/internal/orch/fastpath/disable` accepts `{ tenantId?, templateId? }` and pushes a Redis key with TTL 24 h.
- Orchestrator checks the kill-switch on every submit before granting fast path.
- Re-enable via `DELETE` or TTL expiry.
- Audit event `orch.fastpath.killed.v1` published.