
AI Gateway Service — Service Readiness

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · DEFINITION_OF_DONE

1. Readiness gate overview

The AI Gateway is a safety-critical platform service. Production deployment is blocked until every gate below is green. Each check has a designated verifier, and each gate must be signed off by the roles listed in the sign-off matrix (section 3).


2. Gate checklist

Gate 1 — Documentation

| Check | Status | Verifier |
|---|---|---|
| All 17 canonical docs populated (not stubs) | | Tech Lead |
| EPICS.md + USER_STORIES.md in Jira | | Product Owner |
| SECURITY_MODEL.md reviewed and signed | | Security Lead |
| AI_INTEGRATION.md up to date with all feature keys | | Clinical Informatics |
| MIGRATION_PLAN.md phase completion signed | | Platform Eng Lead |

Gate 2 — Code quality

| Check | Status | Verifier |
|---|---|---|
| TypeScript strict mode; zero compiler errors | | CI |
| ESLint: zero violations; hexagonal import rules pass | | CI |
| Unit test coverage ≥ 80 % (domain and application layers) | | CI |
| Integration test coverage ≥ 70 % | | CI |
| test/integration/tenant-isolation.spec.ts green | | CI (mandatory) |
| test/integration/outbox.spec.ts green | | CI (mandatory) |
| test/integration/inbox.spec.ts green | | CI (mandatory) |
| Pact consumer contract tests green against the Pact Broker | | CI |
| Event schema conformance tests green | | CI |
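
The mandatory inbox spec guards against processing the same event twice when the broker redelivers it. Below is a minimal in-memory sketch of the inbox pattern, assuming at-least-once delivery and a unique ID on every message: the consumer records each ID it has handled and acknowledges duplicates without re-running the handler. `InboxMessage`, `Inbox`, and the subject string are hypothetical; a production inbox would persist processed IDs in the same database transaction as the handler's writes rather than in memory.

```typescript
// Hypothetical message shape; real events follow the service's event schemas.
interface InboxMessage {
  id: string; // unique message ID used for deduplication
  subject: string;
  payload: unknown;
}

type Handler = (msg: InboxMessage) => void;

// Minimal in-memory inbox (illustration only; not durable across restarts).
class Inbox {
  private readonly processed = new Set<string>();
  private readonly handler: Handler;

  constructor(handler: Handler) {
    this.handler = handler;
  }

  // Returns true if the handler ran, false if the message was a duplicate.
  receive(msg: InboxMessage): boolean {
    if (this.processed.has(msg.id)) {
      return false; // already handled: acknowledge without re-running
    }
    this.handler(msg);
    // Record the ID only after the handler succeeds, so a failed attempt can be retried.
    this.processed.add(msg.id);
    return true;
  }
}

// Usage: the same message delivered twice is applied once.
let applied = 0;
const inbox = new Inbox(() => {
  applied += 1;
});
const msg: InboxMessage = { id: "evt-1", subject: "ai.example", payload: {} }; // hypothetical subject
inbox.receive(msg);
inbox.receive(msg);
console.log(applied); // 1
```

Because the ID is recorded only after the handler returns, a handler that throws leaves the message eligible for redelivery, which is the behaviour an inbox test would typically assert.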

Gate 3 — Security

| Check | Status | Verifier |
|---|---|---|
| No raw model API keys in any service other than ai-gateway-service | | Security Scan / SRE |
| API keys stored in a vault (never in a .env file committed to source control) | | Security Lead |
| Row-level security (RLS) policies enabled on all tables | | DBA |
| ai_provenance: UPDATE/DELETE privileges revoked from the ai_gateway_app role | | DBA |
| PHI minimisation: prompt text not written to default logs | | Security Review |
| DPIA completed for each provider serving routes that carry PHI | | Compliance Officer |
| Prompt-injection defences tested (adversarial test suite) | | Security Lead |
| Moderation classifier deployed and passing its baseline test suite | | AI Safety Lead |
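
The PHI-minimisation check is easiest to keep green when log records are built from an allow-list, so prompt text cannot reach the default logger by accident. A minimal sketch, assuming a hypothetical `AssistRequest` shape whose field names are illustrative rather than the service's real contract:

```typescript
// Hypothetical request shape; field names are illustrative only.
interface AssistRequest {
  tenantId: string;
  featureKey: string;
  provider: string;
  prompt: string; // may contain PHI: must never reach default logs
}

// Fields explicitly allowed in default logs. Anything not listed is dropped.
const LOGGABLE_FIELDS = ["tenantId", "featureKey", "provider"] as const;

type LogRecord = Record<string, string>;

// Builds a log record containing only allow-listed fields.
function toLogRecord(req: AssistRequest): LogRecord {
  const record: LogRecord = {};
  for (const field of LOGGABLE_FIELDS) {
    record[field] = req[field];
  }
  return record;
}

const record = toLogRecord({
  tenantId: "t-123",
  featureKey: "example.summary", // hypothetical feature key
  provider: "anthropic",
  prompt: "Patient details that must not be logged",
});
console.log(JSON.stringify(record)); // {"tenantId":"t-123","featureKey":"example.summary","provider":"anthropic"}
```

An allow-list is preferred here over redacting known-sensitive fields because a newly added field stays out of the logs until someone deliberately permits it.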

Gate 4 — Observability

| Check | Status | Verifier |
|---|---|---|
| OpenTelemetry traces visible in Grafana Tempo | | SRE |
| ai_gateway_assist_duration_ms histogram publishing | | SRE |
| NATS events reaching audit-service (ai.* stream healthy) | | SRE |
| Grafana dashboard "AI Gateway — Overview" deployed | | SRE |
| SLO burn-rate alert configured (P99 latency > 5 s) | | SRE |
| Dead-letter queue alert configured | | SRE |
| On-call runbooks linked from OBSERVABILITY.md | | SRE |
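
A burn-rate alert compares how fast the latency error budget is being spent with the rate the SLO allows. A minimal sketch, assuming the "P99 latency > 5 s" condition is expressed as an SLO of 99 % of assist requests completing within 5000 ms, so the error budget is 1 % of requests: a burn rate of 1 spends the budget exactly over the SLO window, and higher values spend it faster. The constant and function names are assumptions, and in practice the calculation would typically run as a query over the ai_gateway_assist_duration_ms histogram buckets rather than over raw samples.

```typescript
// Assumed SLO: 99 % of assist requests complete within 5000 ms.
const LATENCY_THRESHOLD_MS = 5000;
const ERROR_BUDGET = 0.01; // 1 - 0.99, written directly to avoid float drift

// Burn rate = observed fraction of slow requests / error budget.
// 1.0 means the budget would be used up exactly at the end of the SLO window.
function latencyBurnRate(durationsMs: number[]): number {
  if (durationsMs.length === 0) return 0; // no traffic, no budget spent
  const slow = durationsMs.filter((d) => d > LATENCY_THRESHOLD_MS).length;
  const badFraction = slow / durationsMs.length;
  return badFraction / ERROR_BUDGET;
}

// Usage: 20 of 1000 requests exceed 5 s, i.e. 2 % slow against a 1 % budget.
const samples = new Array<number>(980).fill(120).concat(new Array<number>(20).fill(6200));
console.log(latencyBurnRate(samples)); // 2
```

The alert itself would fire when the burn rate over a chosen window exceeds a chosen threshold; those values are not specified in this checklist.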

Gate 5 — Operations

| Check | Status | Verifier |
|---|---|---|
| Canary deploy completed (5 %, 30 min) in staging; rollback verified | | SRE |
| Circuit-breaker behaviour manually validated per provider | | Platform Eng |
| Quota enforcement tested (quota exceeded → HTTP 429 returned) | | Platform Eng |
| HITL queue tested end-to-end (draft → review → accept / reject) | | Clinical Informatics |
| SERVICE_RISK_REGISTER.md reviewed; all CRITICAL/HIGH risks mitigated | | Tech Lead + SRE |
| On-call rotation assigned | | Engineering Manager |
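
The quota check expects an over-quota request to be rejected with HTTP 429 (Too Many Requests). A minimal sketch of per-tenant enforcement, assuming a fixed-window request counter; the `QuotaEnforcer` name, limit, and window length are hypothetical, and the real quota may be measured in tokens or cost rather than request count. The current time is passed in explicitly so the behaviour is deterministic under test.

```typescript
interface QuotaDecision {
  status: 200 | 429;
  retryAfterSeconds?: number; // suitable for a Retry-After header on 429
}

// Hypothetical fixed-window quota: at most `limit` requests per tenant per window.
class QuotaEnforcer {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly usage = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(tenantId: string, nowMs: number): QuotaDecision {
    let entry = this.usage.get(tenantId);
    // Start a fresh window if none exists or the previous one has expired.
    if (entry === undefined || nowMs - entry.windowStart >= this.windowMs) {
      entry = { windowStart: nowMs, count: 0 };
      this.usage.set(tenantId, entry);
    }
    if (entry.count >= this.limit) {
      const remainingMs = entry.windowStart + this.windowMs - nowMs;
      return { status: 429, retryAfterSeconds: Math.ceil(remainingMs / 1000) };
    }
    entry.count += 1;
    return { status: 200 };
  }
}

// Usage: tenant "t-1" gets 2 requests per minute; the third is rejected.
const quota = new QuotaEnforcer(2, 60_000);
console.log(quota.check("t-1", 0).status); // 200
console.log(quota.check("t-1", 1_000).status); // 200
console.log(quota.check("t-1", 2_000)); // { status: 429, retryAfterSeconds: 58 }
console.log(quota.check("t-2", 2_000).status); // 200 (quotas are per tenant)
```

A fixed window is the simplest scheme to test; sliding windows or token buckets smooth out bursts at window boundaries but follow the same 429 contract.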

Gate 6 — Clinical safety (AI-specific)

| Check | Status | Verifier |
|---|---|---|
| No clinical-decision feature key classified HITLPolicy=none without CMO sign-off | | Chief Medical Officer / Clinical Informatics |
| AIProvenance requirement enforced: no clinical artifact merged without a provenanceId | | Tech Lead |
| Moderation false-positive rate < 1 % confirmed in staging | | AI Safety Lead |
| Reviewer notification latency < 5 min (P95) tested in staging | | Clinical Informatics |
| HITL timeout / auto-reject behaviour confirmed per feature | | Clinical Informatics + Product Owner |
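
The first two clinical-safety checks lend themselves to automated guards that run in CI or at startup. A minimal sketch, assuming a hypothetical feature-key registry in which each entry records its HITL policy, whether it supports clinical decisions, and an optional CMO sign-off reference. Apart from `HITLPolicy=none` and `provenanceId`, which appear in the checklist, the type names, field names, and policy values are illustrative.

```typescript
// Policy values other than "none" are illustrative assumptions.
type HITLPolicy = "none" | "optional" | "required";

interface FeatureKey {
  key: string;
  clinicalDecision: boolean;
  hitlPolicy: HITLPolicy;
  cmoSignOffRef?: string; // hypothetical reference to the recorded CMO sign-off
}

// Returns keys that break the rule: a clinical-decision feature may run
// without human review (HITLPolicy=none) only if the CMO has signed off.
function findUnapprovedNoReviewFeatures(registry: FeatureKey[]): string[] {
  return registry
    .filter((f) => f.clinicalDecision && f.hitlPolicy === "none" && !f.cmoSignOffRef)
    .map((f) => f.key);
}

interface ClinicalArtifact {
  id: string;
  provenanceId?: string;
}

// Refuses to merge a clinical artifact that carries no AI provenance record.
function assertMergeable(artifact: ClinicalArtifact): void {
  if (!artifact.provenanceId) {
    throw new Error(`artifact ${artifact.id} has no provenanceId; refusing to merge`);
  }
}

// Usage with hypothetical feature keys.
const violations = findUnapprovedNoReviewFeatures([
  { key: "example.note-summary", clinicalDecision: false, hitlPolicy: "none" },
  { key: "example.dose-suggestion", clinicalDecision: true, hitlPolicy: "none" },
  { key: "example.triage", clinicalDecision: true, hitlPolicy: "required" },
]);
console.log(violations); // [ 'example.dose-suggestion' ]
```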

3. Sign-off matrix

| Gate | Required signers |
|---|---|
| 1. Documentation | Tech Lead, Product Owner |
| 2. Code quality | CI (automated) + Tech Lead |
| 3. Security | Security Lead, DBA, Compliance Officer |
| 4. Observability | SRE Lead |
| 5. Operations | SRE Lead, Engineering Manager |
| 6. Clinical safety | Chief Medical Officer or designee, Clinical Informatics Lead |

Production deployment is blocked until every check in all six gate checklists is green and each gate carries its required sign-offs.
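
The deploy rule reduces to a predicate over the six gates. A minimal sketch, assuming each gate is tracked as a list of checks plus the set of roles that have signed it; the `Gate` shape and the `deployBlockers` and `canDeployToProduction` names are hypothetical. Required signers are modelled as groups in which any one role suffices, which covers entries such as "Chief Medical Officer or designee".

```typescript
interface Check {
  description: string;
  green: boolean;
}

interface Gate {
  name: string;
  checks: Check[];
  // Each inner list is one required sign-off that any listed role can satisfy.
  requiredSigners: string[][];
  signedBy: Set<string>;
}

const EXPECTED_GATE_COUNT = 6;

// Lists every reason production deploy is still blocked; empty means unblocked.
function deployBlockers(gates: Gate[]): string[] {
  const blockers: string[] = [];
  if (gates.length !== EXPECTED_GATE_COUNT) {
    blockers.push(`expected ${EXPECTED_GATE_COUNT} gates, got ${gates.length}`);
  }
  for (const gate of gates) {
    for (const check of gate.checks) {
      if (!check.green) blockers.push(`${gate.name}: check not green: ${check.description}`);
    }
    for (const group of gate.requiredSigners) {
      if (!group.some((role) => gate.signedBy.has(role))) {
        blockers.push(`${gate.name}: missing sign-off from ${group.join(" or ")}`);
      }
    }
  }
  return blockers;
}

function canDeployToProduction(gates: Gate[]): boolean {
  return deployBlockers(gates).length === 0;
}

// Usage: a single unsigned gate is not enough to deploy.
const observability: Gate = {
  name: "Gate 4 (Observability)",
  checks: [{ description: "Dead-letter queue alert configured", green: true }],
  requiredSigners: [["SRE Lead"]],
  signedBy: new Set<string>(),
};
console.log(canDeployToProduction([observability])); // false
```

Returning the list of blockers rather than a bare boolean lets a release pipeline report exactly which checks or sign-offs are outstanding.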


4. Readiness level targets

| Level | Description | Target milestone |
|---|---|---|
| L1 | Service boots; mock provider works; health endpoint returns 200 | M0 |
| L2 | Single-provider (Anthropic) assist and provenance live in staging | M1 |
| L3 | HITL queue live; moderation live; all consumer cutovers complete | M2 |
| L4 | Multi-provider; per-tenant quota; all 6 gates green; production live | M3 |