AI Gateway Service — Service Readiness
Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · DEFINITION_OF_DONE
1. Readiness gate overview
The AI Gateway is a safety-critical platform service. Production deployment is blocked until every gate below is green. Each check names the verifier responsible for confirming it.
2. Gate checklist
Gate 1 — Documentation
| Check | Status | Verifier |
|---|---|---|
| All 17 canonical docs populated (not stubs) | ☐ | Tech Lead |
| EPICS.md + USER_STORIES.md in Jira | ☐ | Product Owner |
| SECURITY_MODEL.md reviewed and signed | ☐ | Security Lead |
| AI_INTEGRATION.md up-to-date with all feature keys | ☐ | Clinical Informatics |
| MIGRATION_PLAN.md phase completion signed | ☐ | Platform Eng Lead |
Gate 2 — Code quality
| Check | Status | Verifier |
|---|---|---|
| TypeScript strict; zero errors | ☐ | CI |
| ESLint: zero violations; hexagonal import rules pass | ☐ | CI |
| Unit test coverage ≥ 80 % (domain + application layers) | ☐ | CI |
| Integration test coverage ≥ 70 % | ☐ | CI |
| test/integration/tenant-isolation.spec.ts green | ☐ | CI — mandatory |
| test/integration/outbox.spec.ts green | ☐ | CI — mandatory |
| test/integration/inbox.spec.ts green | ☐ | CI — mandatory |
| Pact consumer contract tests green against Pact broker | ☐ | CI |
| Event schema conformance tests green | ☐ | CI |
Gate 3 — Security
| Check | Status | Verifier |
|---|---|---|
| No raw model API keys in any service other than ai-gateway-service | ☐ | Security Scan / SRE |
| API keys stored in vault (not in .env committed to source) | ☐ | Security Lead |
| RLS policies enabled on all tables | ☐ | DBA |
| ai_provenance: UPDATE/DELETE revoked from ai_gateway_app role | ☐ | DBA |
| PHI-minimisation: prompt text not written to default logs | ☐ | Security Review |
| DPIA completed for each provider handling PHI data routes | ☐ | Compliance Officer |
| Prompt-injection defenses tested (adversarial test suite) | ☐ | Security Lead |
| Moderation classifier deployed and passing baseline test suite | ☐ | AI Safety Lead |
Gate 4 — Observability
| Check | Status | Verifier |
|---|---|---|
| OpenTelemetry traces visible in Grafana Tempo | ☐ | SRE |
| ai_gateway_assist_duration_ms histogram publishing | ☐ | SRE |
| NATS events reaching audit-service (ai.* stream healthy) | ☐ | SRE |
| Grafana dashboard "AI Gateway — Overview" deployed | ☐ | SRE |
| SLO burn-rate alert configured (P99 latency > 5 s) | ☐ | SRE |
| Dead-letter queue alert configured | ☐ | SRE |
| On-call runbooks linked from OBSERVABILITY.md | ☐ | SRE |
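The burn-rate check in the table above can be illustrated with a minimal TypeScript sketch. It assumes the SLO is read as "99 % of requests complete within 5 s", which leaves an error budget of 1 %. The function name `latencyBurnRate` and the sample data are hypothetical, and a real alert would normally be defined in the metrics backend rather than in application code.

```typescript
// Hypothetical sketch: burn rate for a latency SLO.
// Assumption: "P99 latency > 5 s" means at most 1 % of requests may exceed 5 s.
const THRESHOLD_MS = 5000;
const ERROR_BUDGET = 0.01; // fraction of requests allowed to be slow

function latencyBurnRate(durationsMs: number[]): number {
  if (durationsMs.length === 0) return 0; // no traffic, no budget consumed
  const slowCount = durationsMs.filter((d) => d > THRESHOLD_MS).length;
  const slowFraction = slowCount / durationsMs.length;
  // A value of 1 means the budget is consumed exactly at the sustainable rate.
  return slowFraction / ERROR_BUDGET;
}

// Illustrative window: 98 fast requests and 2 slow ones.
const sampleWindow: number[] = [...new Array<number>(98).fill(1200), 6000, 7500];
const sampleBurnRate = latencyBurnRate(sampleWindow);
```

A burn rate above 1 means the window is consuming budget faster than the SLO allows. The threshold at which to page is a policy decision that this document does not specify.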
Gate 5 — Operations
| Check | Status | Verifier |
|---|---|---|
| Canary deploy completed (5 %, 30 min) in staging; rollback verified | ☐ | SRE |
| Circuit-breaker behaviour manually validated per provider | ☐ | Platform Eng |
| Quota enforcement tested (quota exceeded → 429 returned) | ☐ | Platform Eng |
| HITL queue tested end-to-end (draft → review → accept / reject) | ☐ | Clinical Informatics |
| SERVICE_RISK_REGISTER.md reviewed; all CRITICAL/HIGH mitigated | ☐ | Tech Lead + SRE |
| On-call rotation assigned | ☐ | Engineering Manager |
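The quota check above (quota exceeded → 429 returned) could be exercised against logic along these lines. This is a sketch under stated assumptions, not the service's actual implementation: `QuotaState`, `QuotaDecision`, and `checkQuota` are hypothetical names, and the fixed-window counter is one of several possible quota models.

```typescript
// Hypothetical per-tenant quota check; all names are illustrative.
interface QuotaState {
  limit: number; // requests allowed in the current window
  used: number;  // requests already consumed in the current window
}

type QuotaDecision =
  | { allowed: true; remaining: number }
  | { allowed: false; status: 429 };

function checkQuota(state: QuotaState): QuotaDecision {
  if (state.used >= state.limit) {
    // Quota exhausted: the caller should respond with HTTP 429 Too Many Requests.
    return { allowed: false, status: 429 };
  }
  state.used += 1;
  return { allowed: true, remaining: state.limit - state.used };
}
```

A test for this gate would drive a tenant past its limit and assert that the first rejected request carries status 429 while earlier requests succeed.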
Gate 6 — Clinical safety (AI-specific)
| Check | Status | Verifier |
|---|---|---|
| No clinical-decision feature key classified HITLPolicy=none without CMO sign-off | ☐ | Chief Medical Officer / Clinical Informatics |
| AIProvenance requirement enforced: no clinical artifact merged without provenanceId | ☐ | Tech Lead |
| Moderation false-positive rate < 1 % confirmed in staging | ☐ | AI Safety Lead |
| Reviewer notification latency < 5 min (P95) tested in staging | ☐ | Clinical Informatics |
| HITL timeout / auto-reject behaviour confirmed per feature | ☐ | Clinical Informatics + Product Owner |
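The HITL flow named in Gates 5 and 6 (draft → review → accept / reject, with timeout leading to auto-reject) can be pictured as a small state machine. The sketch below is illustrative only: the state and event names are assumptions, and it assumes a timeout always rejects, whereas the checklist says this behaviour is confirmed per feature.

```typescript
// Hypothetical HITL review state machine; state and event names are assumptions.
type HitlState = "draft" | "review" | "accepted" | "rejected";
type HitlEvent = "submit" | "accept" | "reject" | "timeout";

function nextHitlState(state: HitlState, event: HitlEvent): HitlState {
  switch (state) {
    case "draft":
      if (event === "submit") return "review";
      break;
    case "review":
      if (event === "accept") return "accepted";
      // Assumption: an unreviewed item is auto-rejected when its timeout fires.
      if (event === "reject" || event === "timeout") return "rejected";
      break;
  }
  // "accepted" and "rejected" are terminal; any other combination is invalid.
  throw new Error(`Invalid HITL transition: ${state} + ${event}`);
}
```

Treating accepted and rejected as terminal means a late reviewer action after an auto-reject raises an error instead of silently reviving the item.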
3. Sign-off matrix
| Gate | Required signers |
|---|---|
| 1 — Documentation | Tech Lead, Product Owner |
| 2 — Code quality | CI (automated) + Tech Lead |
| 3 — Security | Security Lead, DBA, Compliance Officer |
| 4 — Observability | SRE Lead |
| 5 — Operations | SRE Lead, Engineering Manager |
| 6 — Clinical safety | Chief Medical Officer or designee, Clinical Informatics Lead |
Production deploy is blocked until all 6 gate checklists are fully checked.
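The blocking rule above could be automated by evaluating the checklists as data. The following is a minimal sketch, assuming the gates are exported into a structure like the hypothetical `Gate` and `Check` interfaces below. Neither the shape nor the function names come from this document. It does not model the sign-off matrix, which would need a separate check.

```typescript
// Hypothetical model of the gate checklists; field names are illustrative.
interface Check {
  name: string;
  done: boolean;
}

interface Gate {
  id: number;
  title: string;
  checks: Check[];
}

// Lists every unchecked item so that a blocked deploy explains itself.
function blockingChecks(gates: Gate[]): string[] {
  const blocking: string[] = [];
  for (const gate of gates) {
    for (const check of gate.checks) {
      if (!check.done) {
        blocking.push(`Gate ${gate.id} (${gate.title}): ${check.name}`);
      }
    }
  }
  return blocking;
}

// Assumption: an empty gate list, or a gate with no checks, counts as not green.
function canDeployToProduction(gates: Gate[]): boolean {
  const allPopulated = gates.length > 0 && gates.every((g) => g.checks.length > 0);
  return allPopulated && blockingChecks(gates).length === 0;
}
```

Returning the list of blocking checks alongside the boolean lets a pipeline step report exactly which items remain open.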
4. Readiness level targets
| Level | Description | Target milestone |
|---|---|---|
| L1 | Service boots; mock provider works; health endpoint 200 | M0 |
| L2 | Single-provider (Anthropic) assist + provenance live in staging | M1 |
| L3 | HITL queue live; moderation live; all consumer cutovers complete | M2 |
| L4 | Multi-provider; per-tenant quota; all 6 gates green; production live | M3 |