AI Gateway Service — Epics
Service: ai-gateway-service
Epic prefix: AIGW-EPIC
Last updated: 2026-04-18
Epics
AIGW-EPIC-01 — AI Assist Core (Policy-Gated Inference)
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | Policy-gated AI assist endpoint with provenance stamping |
| Status | To Do |
| Priority | Must |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S0 |
| Components | ai-gateway-service, identity-service, config-service |
| Fix version | M1 |
| FR references | FR-AIGW-001, FR-AIGW-002, FR-AIGW-003 |
| Legacy FR refs | FR-AI-002, FR-AI-003, FR-NFR-015 (from _sources/ai-orchestrator/) |
| Dependencies | AIGW-EPIC-02, cross-service: IDENT-EPIC-01 |
| Rollup status | Not started |
Business outcome: Every AI-enabled feature in the platform has a single, secure, policy-checked inference entry point — eliminating direct provider key exposure and providing a traceable provenance record for every AI output.
Description:
Implement the POST /v1/ai/assist endpoint as the sole gateway for all model inference calls. The endpoint must evaluate JWT validity, module entitlement, feature-flag status, quota availability, and (for PHI-touching features) consent before forwarding to a provider. Every completed inference produces an immutable AIProvenance record and an AIDecision in draft state. The gateway must fail closed: when the policy service is unavailable, requests are denied rather than forwarded. Success criteria: all consumer services call the gateway, with zero direct provider calls remaining.
Stories: AIGW-US-001, AIGW-US-002, AIGW-US-003, AIGW-US-004
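The policy gate described for AIGW-EPIC-01 could be sketched roughly as follows. This is a minimal illustration, not the service's implementation: the `AssistRequest` fields, the reason codes, and the `PolicyUnavailable` exception are all hypothetical names introduced here, and the check order is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


class PolicyUnavailable(Exception):
    """Hypothetical error: a policy dependency could not be reached."""


@dataclass(frozen=True)
class AssistRequest:
    # Hypothetical, pre-resolved inputs to the gate.
    jwt_valid: bool
    module_entitled: bool
    feature_flag_on: bool
    quota_available: bool
    touches_phi: bool
    consent_given: bool


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: str


Check = Callable[[AssistRequest], bool]

# Ordered checks; the reason codes are illustrative, not taken from the spec.
DEFAULT_CHECKS: list[tuple[str, Check]] = [
    ("INVALID_JWT", lambda r: r.jwt_valid),
    ("MODULE_NOT_ENTITLED", lambda r: r.module_entitled),
    ("FEATURE_DISABLED", lambda r: r.feature_flag_on),
    ("QUOTA_EXCEEDED", lambda r: r.quota_available),
    # Consent is only required for PHI-touching features.
    ("CONSENT_REQUIRED", lambda r: (not r.touches_phi) or r.consent_given),
]


def evaluate_gate(req: AssistRequest, checks=DEFAULT_CHECKS) -> GateResult:
    """Run every check in order; deny on the first failure."""
    for reason, check in checks:
        try:
            passed = check(req)
        except PolicyUnavailable:
            # Fail closed: an unreachable policy source denies the request.
            return GateResult(False, "POLICY_UNAVAILABLE")
        if not passed:
            return GateResult(False, reason)
    return GateResult(True, "OK")
```

Only a request for which `evaluate_gate` returns `allowed=True` would be forwarded to a provider; every other path, including a policy outage, results in a denial.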
AIGW-EPIC-02 — Provider Routing and Resilience
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | Multi-provider routing with circuit breakers and fallback |
| Status | To Do |
| Priority | Must |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S0 |
| Components | ai-gateway-service, config-service |
| Fix version | M1 |
| FR references | FR-AIGW-004, FR-AIGW-005 |
| Legacy FR refs | FR-NFR-015, FR-NFR-017 |
| Dependencies | AIGW-EPIC-01, cross-service: CONFIG-EPIC-01 |
| Rollup status | Not started |
Business outcome: The platform is not dependent on a single AI provider. Routing rules, data-residency constraints, and fallback chains allow transparent provider failover with no caller impact, supporting Afghanistan MoPH data-residency requirements through on-prem routing.
Description:
Build a ProviderRoutingRule aggregate driven by config-service that maps (tenant, featureKey, residency) to an ordered provider list with fallback. Implement adapters for Anthropic, OpenAI, Azure OpenAI, AWS Bedrock, on-prem vLLM, and Ollama behind the ModelProvider port. Circuit breakers must open after consecutive failures and emit ai_gateway.provider.degraded.v1. Routing rules must be hot-reloadable without service restart. Success criteria: a single-provider outage triggers automatic failover within 500 ms, and callers see no 5xx responses.
Stories: AIGW-US-005, AIGW-US-006, AIGW-US-007
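One way to picture the routing and fallback behaviour in AIGW-EPIC-02 is the sketch below. It assumes a failure threshold of 3 (the epic does not state one), omits half-open recovery and hot reload, and uses plain callables as stand-ins for the ModelProvider adapters. The `Router`, `CircuitBreaker`, and `ProviderError` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error raised by a provider adapter."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3  # assumed value; not specified in the epic
    consecutive_failures: int = 0
    is_open: bool = False

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> bool:
        """Count a failure; return True only when this failure opens the breaker."""
        self.consecutive_failures += 1
        if not self.is_open and self.consecutive_failures >= self.failure_threshold:
            self.is_open = True
            return True
        return False


@dataclass
class Router:
    # (tenant, featureKey, residency) -> ordered provider names
    rules: dict[tuple[str, str, str], list[str]]
    providers: dict[str, Callable[[str], str]]
    breakers: dict[str, CircuitBreaker] = field(default_factory=dict)
    emitted: list[tuple[str, str]] = field(default_factory=list)

    def infer(self, tenant: str, feature_key: str, residency: str, prompt: str) -> str:
        for name in self.rules[(tenant, feature_key, residency)]:
            breaker = self.breakers.setdefault(name, CircuitBreaker())
            if breaker.is_open:
                continue  # skip degraded providers without calling them
            try:
                result = self.providers[name](prompt)
            except ProviderError:
                if breaker.record_failure():
                    self.emitted.append(("ai_gateway.provider.degraded.v1", name))
                continue  # fall through to the next provider in the chain
            breaker.record_success()
            return result
        raise ProviderError("all providers in the fallback chain failed")
```

With a rule such as `("t1", "summarise", "on-prem") -> ["vllm", "ollama"]`, a failing `vllm` adapter is retried on each call until its breaker opens, after which requests go straight to `ollama`.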
AIGW-EPIC-03 — HITL (Human-in-the-Loop) Workflow
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | HITL queue with reviewer accept/reject and clinical safety gate |
| Status | To Do |
| Priority | Must |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S1 |
| Components | ai-gateway-service, communication-service, provider-directory-service |
| Fix version | M2 |
| FR references | FR-AIGW-006, FR-AIGW-007, FR-AIGW-008 |
| Legacy FR refs | FR-AI-004 |
| Dependencies | AIGW-EPIC-01, cross-service: COMMS-EPIC-01 |
| Rollup status | Not started |
Business outcome: Clinical AI outputs with HITLPolicy=required never reach patient charts without explicit reviewer sign-off, satisfying the core clinical safety requirement that AI is assistive and not autonomous for clinical decision-adjacent features.
Description:
When HITLPolicy for a feature key is required or required_for_phi, the AIDecision persists in draft state and a reviewer notification is dispatched via communication-service. Reviewers access the queue via GET /v1/ai/decisions/review-queue and submit a verdict via POST /v1/ai/decisions/:id/review. Accepted decisions emit ai_gateway.decision.accepted.v1 with full provenance, consumed by the owning clinical service. Auto-reject fires on a configurable timeout. Success criteria: HITL is tested end to end, with patient-chart-service consuming the accepted event and writing NoteAIProvenance.
Stories: AIGW-US-008, AIGW-US-009, AIGW-US-010
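The review lifecycle in AIGW-EPIC-03 can be read as a small state machine. The sketch below is an assumption-laden illustration: the `AIDecision` fields, the `DecisionState` values other than draft, the event payload shape, and the `review` / `auto_reject_expired` helpers are hypothetical, and notification dispatch is left out.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionState(Enum):
    DRAFT = "draft"
    ACCEPTED = "accepted"  # assumed state name
    REJECTED = "rejected"  # assumed state name


@dataclass
class AIDecision:
    decision_id: str
    provenance_id: str
    created_at: float  # seconds on any monotonic clock
    state: DecisionState = DecisionState.DRAFT


def review(decision: AIDecision, accept: bool, events: list) -> None:
    """Apply a reviewer verdict; only draft decisions may be reviewed."""
    if decision.state is not DecisionState.DRAFT:
        raise ValueError("only draft decisions can be reviewed")
    if accept:
        decision.state = DecisionState.ACCEPTED
        # Payload keys are illustrative; the epic only requires full provenance.
        events.append((
            "ai_gateway.decision.accepted.v1",
            {"decisionId": decision.decision_id, "provenanceId": decision.provenance_id},
        ))
    else:
        decision.state = DecisionState.REJECTED


def auto_reject_expired(decisions: list[AIDecision], now: float, timeout_s: float) -> list[str]:
    """Reject drafts older than the configured timeout; return their ids."""
    expired = []
    for decision in decisions:
        if decision.state is DecisionState.DRAFT and now - decision.created_at >= timeout_s:
            decision.state = DecisionState.REJECTED
            expired.append(decision.decision_id)
    return expired
```

Because `review` refuses anything that is not in draft state, a decision that has already been auto-rejected cannot later be accepted by a late reviewer.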
AIGW-EPIC-04 — AI Moderation and Safety Controls
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | Pre- and post-inference moderation for safety and PHI minimisation |
| Status | To Do |
| Priority | Must |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S1 |
| Components | ai-gateway-service, moderation-classifier |
| Fix version | M2 |
| FR references | FR-AIGW-009, FR-AIGW-010 |
| Legacy FR refs | FR-AI-005 |
| Dependencies | AIGW-EPIC-01 |
| Rollup status | Not started |
Business outcome: Malicious prompts and unsafe outputs are intercepted before they enter the clinical record or reach patients, protecting both clinicians and the platform from prompt-injection and content safety violations.
Description:
Deploy a moderation pipeline running pre-inference (input check) and post-inference (output check). The classifier evaluates safety categories (violence, self-harm, misinformation, PHI exposure) and returns a ModerationVerdict of allow, flag, or block. Blocked inputs return 422 AI_MODERATION_BLOCKED and emit ai_gateway.moderation.flagged.v1. PHI minimisation preprocessing strips or masks patient identifiers before prompt construction when feature key config requires it. Success criteria: adversarial test suite passes with false-positive rate < 1 %.
Stories: AIGW-US-011, AIGW-US-012
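The pre- and post-inference checks in AIGW-EPIC-04 might be wired together along these lines. The keyword-based `toy_classifier` is a deliberately naive stand-in for the real moderation-classifier, and two behaviours are assumptions rather than requirements from the epic: a blocked output is treated the same way as a blocked input, and a flag verdict is allowed through.

```python
from enum import Enum
from typing import Callable


class ModerationVerdict(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    BLOCK = "block"


class ModerationBlocked(Exception):
    """Maps to the 422 AI_MODERATION_BLOCKED response."""
    status = 422
    code = "AI_MODERATION_BLOCKED"


Classifier = Callable[[str], ModerationVerdict]


def toy_classifier(text: str) -> ModerationVerdict:
    # Naive stand-in; a real classifier would score safety categories.
    lowered = text.lower()
    if "ignore previous instructions" in lowered:
        return ModerationVerdict.BLOCK
    if "overdose" in lowered:
        return ModerationVerdict.FLAG
    return ModerationVerdict.ALLOW


def moderated_assist(
    prompt: str,
    infer: Callable[[str], str],
    classify: Classifier,
    events: list[str],
) -> str:
    """Run the input check, then inference, then the output check."""
    if classify(prompt) is ModerationVerdict.BLOCK:
        events.append("ai_gateway.moderation.flagged.v1")
        raise ModerationBlocked("input blocked by moderation")
    output = infer(prompt)
    # Assumption: blocked outputs are handled like blocked inputs.
    if classify(output) is ModerationVerdict.BLOCK:
        events.append("ai_gateway.moderation.flagged.v1")
        raise ModerationBlocked("output blocked by moderation")
    return output
```

Note that a blocked input never reaches `infer`, so no provider call or quota consumption would occur for it in this arrangement.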
AIGW-EPIC-05 — Quota and Cost Governance
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | Per-tenant rolling quota with spend alerts |
| Status | To Do |
| Priority | Should |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S1 |
| Components | ai-gateway-service, config-service |
| Fix version | M2 |
| FR references | FR-AIGW-011, FR-AIGW-012 |
| Legacy FR refs | FR-AI-006 |
| Dependencies | AIGW-EPIC-01, AIGW-EPIC-02 |
| Rollup status | Not started |
Business outcome: Tenant AI usage is bounded and predictable; runaway spend from misconfiguration or abuse is prevented; platform SRE can observe and alert on quota utilisation in real time.
Description:
Implement rolling quota windows backed by Redis INCR with window TTL. Quota is consumed atomically with assist request acceptance; compensating decrements apply on provider error. Exceeded quota returns 429 AI_QUOTA_EXCEEDED with a Retry-After header. Spend alerts fire at 80 % and 100 % of the configured window quota. Tenant admins can view current quota status via GET /v1/ai/quota. Success criteria: exceeded quota returns 429 reliably under concurrent load; alerts fire within 2 minutes of crossing a threshold.
Stories: AIGW-US-013, AIGW-US-014
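The counter-with-TTL approach in AIGW-EPIC-05 is sketched below against an in-memory `FakeRedis`, a hypothetical stand-in that imitates only the INCR, DECR, EXPIRE, and TTL behaviour needed here. The key format and helper names are assumptions. The sketch resets the count when the TTL lapses, which is a simplification of a truly rolling window, and against a real Redis the INCR and EXPIRE pair would need to run atomically (for example in a Lua script or transaction).

```python
import math
import time


class QuotaExceeded(Exception):
    """Maps to the 429 AI_QUOTA_EXCEEDED response."""
    status = 429
    code = "AI_QUOTA_EXCEEDED"

    def __init__(self, retry_after_s: int):
        super().__init__("quota exceeded")
        self.retry_after_s = retry_after_s  # value for the Retry-After header


class FakeRedis:
    """In-memory stand-in; not a Redis client."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._values: dict[str, int] = {}
        self._expiry: dict[str, float] = {}

    def _purge(self, key: str) -> None:
        deadline = self._expiry.get(key)
        if deadline is not None and self._now() >= deadline:
            self._values.pop(key, None)
            self._expiry.pop(key, None)

    def _add(self, key: str, delta: int) -> int:
        self._purge(key)
        self._values[key] = self._values.get(key, 0) + delta
        return self._values[key]

    def incr(self, key: str) -> int:
        return self._add(key, 1)

    def decr(self, key: str) -> int:
        return self._add(key, -1)

    def expire(self, key: str, seconds: float) -> None:
        self._expiry[key] = self._now() + seconds

    def ttl(self, key: str) -> int:
        self._purge(key)
        deadline = self._expiry.get(key)
        return -1 if deadline is None else max(0, math.ceil(deadline - self._now()))


def consume(redis: FakeRedis, tenant: str, limit: int, window_s: int) -> int:
    """Consume one unit of quota; raise QuotaExceeded when over the limit."""
    key = f"quota:{tenant}"  # hypothetical key format
    used = redis.incr(key)
    if used == 1:
        redis.expire(key, window_s)  # first hit starts the window
    if used > limit:
        redis.decr(key)  # undo the rejected request's increment
        raise QuotaExceeded(retry_after_s=redis.ttl(key))
    return used


def refund(redis: FakeRedis, tenant: str) -> None:
    """Compensating decrement after a provider error."""
    redis.decr(f"quota:{tenant}")


def crossed_alerts(used: int, limit: int, thresholds_pct=(80, 100)) -> list[int]:
    """Thresholds (in percent) crossed by the request that moved usage to `used`."""
    return [t for t in thresholds_pct if (used - 1) * 100 < t * limit <= used * 100]
```

Computing alert crossings from the post-increment count means each threshold is reported once per window rather than on every request above it.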
AIGW-EPIC-06 — Prompt Template Governance
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | Semver-versioned prompt template registry with tenant overrides |
| Status | To Do |
| Priority | Should |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S0 |
| Components | ai-gateway-service, config-service |
| Fix version | M1 |
| FR references | FR-AIGW-013 |
| Legacy FR refs | — |
| Dependencies | AIGW-EPIC-01 |
| Rollup status | Not started |
Business outcome: Prompt templates are centrally managed, versioned, and tamper-evident — ensuring clinical AI behaviour is reproducible, auditable, and can be updated without code deployments.
Description:
Build a PromptTemplate registry accessible via POST /v1/admin/prompt-templates (platform admin) and GET /v1/ai/prompt-templates/:key (caller). Templates are semver-versioned; the raw template body is stored only in a secure registry, and the database holds only its hash. Feature keys reference templates by { key, version }. Tenant-scoped overrides are supported. The deprecation workflow prevents deletion of templates referenced by active AIDecision rows. Success criteria: a template version bump triggers no silent behaviour change in active features; a hash mismatch returns 500 and raises an alert.
Stories: AIGW-US-015, AIGW-US-016
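The tamper-evidence requirement in AIGW-EPIC-06 could work roughly as in this sketch, which assumes SHA-256 as the hash (the epic does not name an algorithm) and models the secure registry and the database as two in-memory dictionaries. `TemplateRef`, `TemplateRegistry`, and `TemplateIntegrityError` are hypothetical names; tenant overrides and deprecation are omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass


class TemplateIntegrityError(Exception):
    """Stored hash and template body disagree; would surface as a 500 plus alert."""


@dataclass(frozen=True)
class TemplateRef:
    key: str
    version: str  # semver string, e.g. "1.2.0"


def digest(body: str) -> str:
    # SHA-256 is an assumption; the epic only says the body is hashed.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class TemplateRegistry:
    def __init__(self) -> None:
        self._secure_store: dict[TemplateRef, str] = {}  # raw bodies
        self._db_hashes: dict[TemplateRef, str] = {}     # hashes only

    def publish(self, ref: TemplateRef, body: str) -> None:
        """Store a new version; published versions are immutable."""
        if ref in self._db_hashes:
            raise ValueError("version already published; bump the version instead")
        self._secure_store[ref] = body
        self._db_hashes[ref] = digest(body)

    def resolve(self, ref: TemplateRef) -> str:
        """Return the body only if it still matches the recorded hash."""
        body = self._secure_store[ref]
        if not hmac.compare_digest(digest(body), self._db_hashes[ref]):
            raise TemplateIntegrityError(f"hash mismatch for {ref.key}@{ref.version}")
        return body
```

Refusing to overwrite a published `{ key, version }` pair is what keeps a feature pinned to that version from changing behaviour silently; any change has to arrive as a new version.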
AIGW-EPIC-07 — AI Gateway Observability and Audit Integration
| Field | Value |
|---|---|
| Issue type | Epic |
| Summary | Full OTEL tracing, SLO dashboards, and audit event emission |
| Status | To Do |
| Priority | Must |
| Labels | service:ai-gateway-service, domain:ai_gateway, slice:S0 |
| Components | ai-gateway-service, audit-service, observability |
| Fix version | M1 |
| FR references | FR-AIGW-014, FR-AIGW-015 |
| Legacy FR refs | FR-NFR-018 |
| Dependencies | AIGW-EPIC-01, cross-service: AUDIT-EPIC-01 |
| Rollup status | Not started |
Business outcome: Every AI decision is traceable from request to audit trail with sub-second latency observability, enabling compliance officers to answer "what AI output was used in this clinical decision?" with full provenance.
Description:
Instrument every assist, moderation, HITL, and provider call with OpenTelemetry spans. Emit structured ai.* NATS events (consumed by audit-service) for every state transition. Publish ai_gateway_assist_duration_ms, ai_gateway_moderation_blocked_total, and ai_gateway_quota_exceeded_total Prometheus metrics. Deploy Grafana dashboard "AI Gateway — Overview" with SLO burn-rate panels. All ai.* events must carry correlationId and provenanceId but must NOT carry raw prompt text in event payload (FR-AIGW-015 / FR-NFR-018). Success criteria: traces visible in Tempo; audit-service ingesting all ai.* events; dashboard live in staging.
Stories: AIGW-US-017, AIGW-US-018
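The payload rule in AIGW-EPIC-07 (correlationId and provenanceId required, raw prompt text forbidden) lends itself to a guard at event-construction time. The sketch below is one possible shape: the `build_event` helper, the `FORBIDDEN_FIELDS` list, and the `featureKey` attribute are hypothetical, and a name-based check like this catches only obvious leaks.

```python
import json

# Hypothetical field names that would indicate raw prompt text in a payload.
FORBIDDEN_FIELDS = frozenset({"prompt", "rawPrompt", "promptText"})


def build_event(event_type: str, correlation_id: str, provenance_id: str, **attrs) -> dict:
    """Build an event payload, enforcing the required and forbidden fields."""
    if not correlation_id or not provenance_id:
        raise ValueError("correlationId and provenanceId are required")
    leaked = FORBIDDEN_FIELDS.intersection(attrs)
    if leaked:
        raise ValueError(f"raw prompt fields not allowed in events: {sorted(leaked)}")
    return {
        "type": event_type,
        "correlationId": correlation_id,
        "provenanceId": provenance_id,
        **attrs,
    }


if __name__ == "__main__":
    event = build_event(
        "ai_gateway.decision.accepted.v1",
        correlation_id="c-1",
        provenance_id="p-1",
        featureKey="note_summary",
    )
    print(json.dumps(event, sort_keys=True))
```

Centralising construction in one helper means the restriction is enforced in a single place instead of relying on each emitter to remember it.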