AI Gateway Service — Epics

Service: ai-gateway-service
Epic prefix: AIGW-EPIC
Last updated: 2026-04-18

Epics

AIGW-EPIC-01 — AI Assist Core (Policy-Gated Inference)

Field	Value
Issue type	Epic
Summary	Policy-gated AI assist endpoint with provenance stamping
Status	To Do
Priority	Must
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S0
Components	ai-gateway-service, identity-service, config-service
Fix version	M1
FR references	FR-AIGW-001, FR-AIGW-002, FR-AIGW-003
Legacy FR refs	FR-AI-002, FR-AI-003, FR-NFR-015 (from _sources/ai-orchestrator/)
Dependencies	AIGW-EPIC-02, cross-service: IDENT-EPIC-01
Rollup status	Not started

Business outcome: Every AI-enabled feature in the platform has a single, secure, policy-checked inference entry point — eliminating direct provider key exposure and providing a traceable provenance record for every AI output.

Description: Implement the POST /v1/ai/assist endpoint as the sole gateway for all model inference calls. Before forwarding to a provider, the endpoint must evaluate JWT validity, module entitlement, feature-flag status, quota availability, and (for PHI-touching features) consent. Every completed inference produces an immutable AIProvenance record and an AIDecision in draft state. The endpoint must fail closed when the policy service is unavailable. Success criteria: all consumer services call the gateway, with zero direct provider calls remaining.
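The gate order in the description can be sketched as follows. This is a minimal illustration, not the service's design: the AssistRequest fields, the GateResult type, the PolicyServiceUnavailable exception, and every reason code except AI_QUOTA_EXCEEDED are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class PolicyServiceUnavailable(Exception):
    """Hypothetical error raised by a check whose backing service is unreachable."""


@dataclass(frozen=True)
class AssistRequest:
    # Hypothetical, pre-resolved inputs; a real gateway would look these up.
    jwt_valid: bool
    entitled: bool
    flag_enabled: bool
    quota_available: bool
    touches_phi: bool
    consent_given: bool


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: Optional[str] = None


# Checks run in the order the description lists them; each returns True to pass.
CHECKS: list[tuple[str, Callable[[AssistRequest], bool]]] = [
    ("JWT_INVALID", lambda r: r.jwt_valid),
    ("MODULE_NOT_ENTITLED", lambda r: r.entitled),
    ("FEATURE_FLAG_OFF", lambda r: r.flag_enabled),
    ("AI_QUOTA_EXCEEDED", lambda r: r.quota_available),
    # Consent is only evaluated for PHI-touching features.
    ("CONSENT_MISSING", lambda r: (not r.touches_phi) or r.consent_given),
]


def evaluate_gate(request: AssistRequest, checks=CHECKS) -> GateResult:
    """Return the first failing check, or an allow result if all checks pass."""
    for reason, check in checks:
        try:
            if not check(request):
                return GateResult(False, reason)
        except PolicyServiceUnavailable:
            # Fail closed: an unreachable policy service denies the request.
            return GateResult(False, "POLICY_UNAVAILABLE")
    return GateResult(True)
```

Only a request that passes every check would be forwarded to a provider; any exception signalling an unavailable policy dependency is treated as a denial rather than a pass.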

Stories: AIGW-US-001, AIGW-US-002, AIGW-US-003, AIGW-US-004

AIGW-EPIC-02 — Provider Routing and Resilience

Field	Value
Issue type	Epic
Summary	Multi-provider routing with circuit breakers and fallback
Status	To Do
Priority	Must
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S0
Components	ai-gateway-service, config-service
Fix version	M1
FR references	FR-AIGW-004, FR-AIGW-005
Legacy FR refs	FR-NFR-015, FR-NFR-017
Dependencies	AIGW-EPIC-01, cross-service: CONFIG-EPIC-01
Rollup status	Not started

Business outcome: The platform is not dependent on a single AI provider. Routing rules, data-residency constraints, and fallback chains allow transparent provider failover with no caller impact, supporting Afghanistan MoPH data-residency requirements through on-prem routing.

Description: Build a ProviderRoutingRule aggregate, driven by config-service, that maps (tenant, featureKey, residency) to an ordered provider list with fallback. Implement adapters for Anthropic, OpenAI, Azure OpenAI, AWS Bedrock, on-prem vLLM, and Ollama behind the ModelProvider port. Circuit breakers must open after consecutive failures and emit ai_gateway.provider.degraded.v1. Routing rules must be hot-reloadable without a service restart. Success criteria: a single-provider outage triggers automatic failover within 500 ms, with no 5xx returned to callers.
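A rough sketch of ordered fallback with a consecutive-failure circuit breaker is shown below. It assumes a failure threshold of 3 (the epic does not state one), omits half-open recovery and hot reload, and uses hypothetical names (Router, CircuitBreaker, the emit callback, and the payload shape). Only the event name and the (tenant, featureKey, residency) key come from the description.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3  # assumed value; the epic does not specify one
    consecutive_failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> bool:
        """Count a failure; return True only when this failure opens the breaker."""
        was_open = self.is_open
        self.consecutive_failures += 1
        return self.is_open and not was_open


class Router:
    def __init__(self, rules, providers, emit):
        self.rules = rules          # {(tenant, feature_key, residency): [provider names]}
        self.providers = providers  # {name: callable(prompt) -> str}
        self.emit = emit            # callable(event_name, payload)
        self.breakers = {name: CircuitBreaker() for name in providers}

    def infer(self, tenant, feature_key, residency, prompt):
        """Try providers in rule order, skipping any whose breaker is open."""
        for name in self.rules[(tenant, feature_key, residency)]:
            breaker = self.breakers[name]
            if breaker.is_open:
                continue
            try:
                result = self.providers[name](prompt)
            except Exception:
                if breaker.record_failure():
                    self.emit("ai_gateway.provider.degraded.v1", {"provider": name})
                continue  # fall through to the next provider in the chain
            breaker.record_success()
            return name, result
        raise RuntimeError("no healthy provider for this routing rule")
```

Because a failed provider is skipped inside the same call, the caller sees the fallback provider's response rather than an error, which is the behaviour the success criteria describe.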

Stories: AIGW-US-005, AIGW-US-006, AIGW-US-007

AIGW-EPIC-03 — HITL (Human-in-the-Loop) Workflow

Field	Value
Issue type	Epic
Summary	HITL queue with reviewer accept/reject and clinical safety gate
Status	To Do
Priority	Must
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S1
Components	ai-gateway-service, communication-service, provider-directory-service
Fix version	M2
FR references	FR-AIGW-006, FR-AIGW-007, FR-AIGW-008
Legacy FR refs	FR-AI-004
Dependencies	AIGW-EPIC-01, cross-service: COMMS-EPIC-01
Rollup status	Not started

Business outcome: Clinical AI outputs with HITLPolicy=required never reach patient charts without explicit reviewer sign-off, satisfying the core clinical safety requirement that AI is assistive and not autonomous for clinical decision-adjacent features.

Description: When HITLPolicy for a feature key is required or required_for_phi, the AIDecision persists in draft state and a reviewer notification is dispatched via communication-service. Reviewers access the queue via GET /v1/ai/decisions/review-queue and submit a verdict via POST /v1/ai/decisions/:id/review. Accepted decisions emit ai_gateway.decision.accepted.v1 with full provenance, which the owning clinical service consumes. Auto-reject fires on a configurable timeout. Success criteria: HITL is tested end to end, with patient-chart-service consuming the accepted event and writing NoteAIProvenance.
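The draft, accepted, and rejected transitions described above could look roughly like this. The AIDecision fields, the verdict strings, the payload keys, and the function names are assumptions made for illustration; only the event name and the draft state come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class DecisionState(Enum):
    DRAFT = "draft"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class AIDecision:
    decision_id: str
    provenance_id: str
    created_at: datetime
    state: DecisionState = DecisionState.DRAFT


def review(decision: AIDecision, verdict: str, emit) -> None:
    """Apply a reviewer verdict; only draft decisions may transition."""
    if decision.state is not DecisionState.DRAFT:
        raise ValueError("only draft decisions can be reviewed")
    if verdict == "accept":
        decision.state = DecisionState.ACCEPTED
        # Hypothetical payload shape for the accepted event.
        emit("ai_gateway.decision.accepted.v1", {
            "decisionId": decision.decision_id,
            "provenanceId": decision.provenance_id,
        })
    elif verdict == "reject":
        decision.state = DecisionState.REJECTED
    else:
        raise ValueError(f"unknown verdict: {verdict!r}")


def auto_reject_if_expired(decision: AIDecision, now: datetime, timeout: timedelta) -> bool:
    """Reject a draft that has waited at least `timeout`; report whether it fired."""
    if decision.state is DecisionState.DRAFT and now - decision.created_at >= timeout:
        decision.state = DecisionState.REJECTED
        return True
    return False
```

Keeping the draft check inside review means a decision that was already auto-rejected cannot later be accepted by a late reviewer.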

Stories: AIGW-US-008, AIGW-US-009, AIGW-US-010

AIGW-EPIC-04 — AI Moderation and Safety Controls

Field	Value
Issue type	Epic
Summary	Pre- and post-inference moderation for safety and PHI minimisation
Status	To Do
Priority	Must
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S1
Components	ai-gateway-service, moderation-classifier
Fix version	M2
FR references	FR-AIGW-009, FR-AIGW-010
Legacy FR refs	FR-AI-005
Dependencies	AIGW-EPIC-01
Rollup status	Not started

Business outcome: Malicious prompts and unsafe outputs are intercepted before they enter the clinical record or reach patients, protecting both clinicians and the platform from prompt-injection and content safety violations.

Description: Deploy a moderation pipeline that runs pre-inference (input check) and post-inference (output check). The classifier evaluates safety categories (violence, self-harm, misinformation, PHI exposure) and returns a ModerationVerdict of allow, flag, or block. Blocked inputs return 422 AI_MODERATION_BLOCKED and emit ai_gateway.moderation.flagged.v1. PHI minimisation preprocessing strips or masks patient identifiers before prompt construction when the feature key's configuration requires it. Success criteria: the adversarial test suite passes with a false-positive rate < 1 %.
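As an illustration of the input-side path, the sketch below masks identifiers and maps a block verdict to the 422 response. The classifier is passed in as a plain callable, the MRN pattern and "[MRN]" placeholder are invented for the example, and the handling of the flag verdict (pass through) is an assumption, since the description only specifies behaviour for blocked inputs.

```python
import re
from enum import Enum


class ModerationVerdict(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    BLOCK = "block"


# Hypothetical identifier format; real patient identifiers will differ.
MRN_PATTERN = re.compile(r"\bMRN[- ]?\d{6,10}\b", re.IGNORECASE)


def minimise_phi(text: str) -> str:
    """Mask identifiers matching the assumed MRN pattern."""
    return MRN_PATTERN.sub("[MRN]", text)


def check_input(prompt: str, classify, emit, minimise: bool = False):
    """Run the pre-inference check and return (http_status, body)."""
    verdict = classify(prompt)
    if verdict is ModerationVerdict.BLOCK:
        emit("ai_gateway.moderation.flagged.v1",
             {"stage": "pre_inference", "verdict": verdict.value})
        return 422, {"code": "AI_MODERATION_BLOCKED"}
    # Assumption: allow and flag both proceed to prompt construction.
    safe_prompt = minimise_phi(prompt) if minimise else prompt
    return 200, {"prompt": safe_prompt, "verdict": verdict.value}
```

The emitted payload deliberately omits the prompt text; it carries only the stage and verdict.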

Stories: AIGW-US-011, AIGW-US-012

AIGW-EPIC-05 — Quota and Cost Governance

Field	Value
Issue type	Epic
Summary	Per-tenant rolling quota with spend alerts
Status	To Do
Priority	Should
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S1
Components	ai-gateway-service, config-service
Fix version	M2
FR references	FR-AIGW-011, FR-AIGW-012
Legacy FR refs	FR-AI-006
Dependencies	AIGW-EPIC-01, AIGW-EPIC-02
Rollup status	Not started

Business outcome: Tenant AI usage is bounded and predictable; runaway spend from misconfiguration or abuse is prevented; platform SRE can observe and alert on quota utilisation in real time.

Description: Implement rolling quota windows backed by Redis INCR with a window TTL. Quota is consumed atomically with assist request acceptance; compensating decrements apply on provider error. Exceeding the quota returns 429 AI_QUOTA_EXCEEDED with a Retry-After header. Spend alerts fire at 80 % and 100 % of the configured window quota. Tenant admins can view current quota status via GET /v1/ai/quota. Success criteria: exceeding the quota reliably returns 429 under concurrent load; each alert fires within 2 minutes of its threshold being crossed.
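The counting logic can be sketched with an in-process stand-in for the Redis INCR, DECR, EXPIRE, and TTL commands. The key format, function names, and response shape are hypothetical. The sketch is single-threaded; against real Redis the increment and expiry would need to run atomically (for example in a script or transaction). It also undoes the increment for rejected requests, which is a design choice the description does not mandate.

```python
import math
import time


class InMemoryCounterStore:
    """In-process stand-in for the Redis INCR, DECR, EXPIRE, and TTL commands."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> [count, expires_at or None]

    def _live(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= self._clock():
            del self._entries[key]  # the window has elapsed
            return None
        return entry

    def incr(self, key, amount=1):
        entry = self._live(key)
        if entry is None:
            entry = self._entries[key] = [0, None]
        entry[0] += amount
        return entry[0]

    def decr(self, key):
        return self.incr(key, -1)

    def expire(self, key, seconds):
        entry = self._live(key)
        if entry is not None:
            entry[1] = self._clock() + seconds

    def ttl(self, key):
        entry = self._live(key)
        if entry is None or entry[1] is None:
            return 0
        return math.ceil(entry[1] - self._clock())


def quota_key(tenant: str) -> str:
    return f"aigw:quota:{tenant}"  # hypothetical key format


def consume(store, tenant, limit, window_seconds):
    """Consume one unit of quota; return (status, body, headers)."""
    key = quota_key(tenant)
    used = store.incr(key)
    if used == 1:
        store.expire(key, window_seconds)  # first hit starts the window
    if used > limit:
        store.decr(key)  # do not count the rejected request
        return 429, {"code": "AI_QUOTA_EXCEEDED"}, {"Retry-After": str(store.ttl(key))}
    return 200, {"used": used}, {}


def refund(store, tenant):
    """Compensating decrement, applied when the provider call fails."""
    store.decr(quota_key(tenant))
```

Retry-After is derived from the remaining window TTL, so a caller that waits that long should land in a fresh window.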

Stories: AIGW-US-013, AIGW-US-014

AIGW-EPIC-06 — Prompt Template Governance

Field	Value
Issue type	Epic
Summary	Semver-versioned prompt template registry with tenant overrides
Status	To Do
Priority	Should
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S0
Components	ai-gateway-service, config-service
Fix version	M1
FR references	FR-AIGW-013
Legacy FR refs	—
Dependencies	AIGW-EPIC-01
Rollup status	Not started

Business outcome: Prompt templates are centrally managed, versioned, and tamper-evident — ensuring clinical AI behaviour is reproducible, auditable, and can be updated without code deployments.

Description: Build a PromptTemplate registry accessible via POST /v1/admin/prompt-templates (platform admin) and GET /v1/ai/prompt-templates/:key (caller). Templates are semver-versioned; the raw template body is stored only in a secure registry, and the database holds its hash. Feature keys reference templates by { key, version }. Tenant-scoped overrides are supported. The deprecation workflow prevents deletion of templates referenced by active AIDecision rows. Success criteria: a template version bump causes no silent behaviour change in active features; a hash mismatch returns 500 and raises an alert.
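One way to picture the hash check and tenant override lookup is sketched below. It assumes SHA-256 as the hash (the epic does not name an algorithm) and models both the secure registry and the database as dictionaries keyed by (tenant, key, version), with tenant None standing for the platform default. All names are illustrative.

```python
import hashlib


class TemplateHashMismatch(Exception):
    """Hypothetical error that the endpoint would map to a 500 plus an alert."""


def sha256_hex(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def load_template(tenant, key, version, db_hashes, secure_registry) -> str:
    """Resolve a template by exact { key, version }, preferring a tenant override."""
    for scope in (tenant, None):  # tenant override first, then platform default
        ref = (scope, key, version)
        if ref in secure_registry:
            body = secure_registry[ref]
            if sha256_hex(body) != db_hashes.get(ref):
                raise TemplateHashMismatch(f"hash mismatch for {key}@{version}")
            return body
    raise KeyError(f"no template {key}@{version}")
```

Because features pin an exact version, publishing a new version leaves existing references untouched until a feature key is explicitly updated.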

Stories: AIGW-US-015, AIGW-US-016

AIGW-EPIC-07 — AI Gateway Observability and Audit Integration

Field	Value
Issue type	Epic
Summary	Full OTEL tracing, SLO dashboards, and audit event emission
Status	To Do
Priority	Must
Labels	service:ai-gateway-service, domain:ai_gateway, slice:S0
Components	ai-gateway-service, audit-service, observability
Fix version	M1
FR references	FR-AIGW-014, FR-AIGW-015
Legacy FR refs	FR-NFR-018
Dependencies	AIGW-EPIC-01, cross-service: AUDIT-EPIC-01
Rollup status	Not started

Business outcome: Every AI decision is traceable from request to audit trail with sub-second latency observability, enabling compliance officers to answer "what AI output was used in this clinical decision?" with full provenance.

Description: Instrument every assist, moderation, HITL, and provider call with OpenTelemetry spans. Emit structured ai.* NATS events (consumed by audit-service) for every state transition. Publish ai_gateway_assist_duration_ms, ai_gateway_moderation_blocked_total, and ai_gateway_quota_exceeded_total Prometheus metrics. Deploy Grafana dashboard "AI Gateway — Overview" with SLO burn-rate panels. All ai.* events must carry correlationId and provenanceId but must NOT carry raw prompt text in the event payload (FR-AIGW-015 / FR-NFR-018). Success criteria: traces are visible in Tempo; audit-service ingests all ai.* events; the dashboard is live in staging.
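The payload rule (correlationId and provenanceId required, raw prompt text forbidden) could be enforced at the point where events are built, as in this sketch. The envelope shape, the attribute names treated as forbidden, and the function name are assumptions; only the two required identifiers and the no-raw-prompt rule come from the description.

```python
# Hypothetical attribute names that would indicate raw prompt text.
FORBIDDEN_KEYS = frozenset({"prompt", "promptText", "rawPrompt"})


def build_audit_event(name, correlation_id, provenance_id, attributes=None):
    """Build an event envelope, refusing payloads that break the rule."""
    if not correlation_id or not provenance_id:
        raise ValueError("correlationId and provenanceId are required")
    attributes = dict(attributes or {})
    leaked = FORBIDDEN_KEYS.intersection(attributes)
    if leaked:
        raise ValueError(f"raw prompt text is not allowed in events: {sorted(leaked)}")
    return {
        "type": name,
        "correlationId": correlation_id,
        "provenanceId": provenance_id,
        "attributes": attributes,
    }
```

A key-name check like this only catches obvious leaks; it does not inspect values, so it complements rather than replaces review of what each emitter puts in its attributes.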

Stories: AIGW-US-017, AIGW-US-018
