ai-orchestrator-service — Service Readiness
Companion to: docs/standards/SERVICE_TEMPLATE.md · OBSERVABILITY.md · SECURITY_MODEL.md
The readiness checklist gates promotion of ai-orchestrator-service to production. Each item is binary (yes/no) and links to the artifact that proves it.
1. Documentation
| Item | Status |
|---|---|
| SERVICE_OVERVIEW.md complete | yes |
| DOMAIN_MODEL.md enumerates all aggregates + invariants | yes |
| APPLICATION_LOGIC.md documents every use case + ports | yes |
| API_CONTRACTS.md matches generated OpenAPI for v1 | yes |
| EVENT_SCHEMAS.md matches the event registry for every published event | yes |
| DATA_MODEL.md matches Flyway migrations and includes RLS policies | yes |
| SYNC_CONTRACT.md describes desktop snapshot and push payloads | yes |
| SECURITY_MODEL.md includes auth, multi-tenancy, redaction, egress rules | yes |
| OBSERVABILITY.md lists all OTel metrics and SLOs | yes |
| TESTING_STRATEGY.md lists eval suites and red-team coverage | yes |
| DEPLOYMENT_TOPOLOGY.md describes Cloud Run + Vertex region choices | yes |
| FAILURE_MODES.md covers all error codes with responses | yes |
| LOCAL_DEV_SETUP.md is reproducible from a clean clone in ≤ 10 min | yes |
| MIGRATION_PLAN.md is current per release | yes |
2. Code & quality
| Item | Threshold | Status |
|---|---|---|
| Unit test coverage | ≥ 90% lines, ≥ 85% branches | gated in CI |
| Integration tests IT-AI-001..015 | all green | gated in CI |
| Red-team suite | all green | nightly + gated on prompt promotions |
| Edge replay nightly | passes 2 nights in a row | gated for release tags |
| ESLint / TS strict | 0 errors | gated in CI |
| Dependency audit | 0 high/critical | gated in CI |
| Secret scanner | 0 findings | gated in CI |
| Performance test | 200 RPS sustained, p95 ≤ 2.5 s | required for release tags |
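
The numeric thresholds in this table can be checked mechanically. The following is a minimal sketch, assuming hypothetical `CoverageSummary` and `LoadTestResult` shapes and a nearest-rank p95; none of these names come from the service's actual CI configuration.

```typescript
// Hypothetical input shapes; the real CI reports may look different.
interface CoverageSummary {
  linesPct: number;
  branchesPct: number;
}

interface LoadTestResult {
  sustainedRps: number;
  latenciesMs: number[];
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs stay exact.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Thresholds from the table: >= 90% lines, >= 85% branches.
function coverageGate(c: CoverageSummary): boolean {
  return c.linesPct >= 90 && c.branchesPct >= 85;
}

// Thresholds from the table: 200 RPS sustained, p95 <= 2.5 s.
function performanceGate(r: LoadTestResult): boolean {
  return r.sustainedRps >= 200 && percentile(r.latenciesMs, 95) <= 2500;
}
```

Nearest-rank is only one of several percentile definitions; a load-testing tool that interpolates may report a slightly different p95 for the same samples.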
3. Operations
| Item | Status |
|---|---|
| PagerDuty service ai-orchestrator exists with on-call rotation | yes |
| Runbooks for every error code in ERROR_CODES.md | yes |
| Dashboard AI Orchestrator — Golden published in Cloud Monitoring | yes |
| Dashboard AI Token Cost published in Looker Studio | yes |
| BigQuery ai_calls_fact schema reviewed by data-eng | yes |
| Alert routing reviewed (PagerDuty + Slack + finance) | yes |
| Game-day Q-1 executed within last quarter | yes |
| Backup + restore drill (Cloud SQL PITR) successful within last quarter | yes |
4. Security
| Item | Status |
|---|---|
| Threat model reviewed by security-eng | yes |
| mTLS verified between this service and every consuming service | yes |
| RLS policies enforced; cross-tenant test green | yes |
| Provider keys in Secret Manager; rotation policy 60 d active | yes |
| Manifest signing key in KMS; rotation policy 365 d active | yes |
| VPC-SC perimeter includes the AI project | yes |
| External provider DPAs signed (Anthropic, OpenAI) | yes |
| PII flow review: capabilities marked pii_class correctly | yes |
| Cloud Armor rules deployed; DDoS response tested | yes |
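
The two rotation policies above (60 d for provider keys, 365 d for the manifest signing key) reduce to an age comparison. This is a minimal sketch of that check, assuming a hypothetical `isRotationOverdue` helper and that the last rotation timestamp is available from wherever the keys are managed; it does not call Secret Manager or KMS.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Maximum key ages in days, taken from the checklist rows above.
// The object and its key names are illustrative.
const ROTATION_MAX_AGE_DAYS = {
  providerKey: 60,
  manifestSigningKey: 365,
} as const;

// True when the key is strictly older than the allowed age.
// A key that is exactly maxAgeDays old is still treated as compliant.
function isRotationOverdue(
  lastRotated: Date,
  maxAgeDays: number,
  now: Date,
): boolean {
  return now.getTime() - lastRotated.getTime() > maxAgeDays * DAY_MS;
}
```

For example, `isRotationOverdue(lastRotated, ROTATION_MAX_AGE_DAYS.providerKey, new Date())` could feed a periodic audit job; whether the boundary day counts as overdue is a choice made here, not something the checklist specifies.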
5. Data & compliance
| Item | Status |
|---|---|
| Audit log retention 7 years configured | yes |
| Operational data retention configured (180 d hot, archive after) | yes |
| Right-to-erasure flow tested end-to-end (consumes tenant.guest.erasure_requested.v1, purges embeddings + provenance with caveats) | yes |
| BAA / DPA referenced from Compliance register | yes |
6. Cost
| Item | Status |
|---|---|
| Per-tenant default budgets configured per tier | yes |
| Per-purpose default budgets configured | yes |
| Daily cost dashboard reviewed by finance-ops weekly | yes |
| Cache hit ratio ≥ 35% in 7-day trailing | yes |
| Top-10 expensive prompts reviewed monthly with optimization tickets | yes |
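
The cache hit ratio item depends on how the 7-day trailing window is computed. The sketch below assumes a hypothetical `DailyCacheStats` record per day and pools hits and misses across the window rather than averaging daily ratios; the actual dashboard may define the metric differently.

```typescript
// Hypothetical per-day rollup; `date` is an ISO day string (YYYY-MM-DD).
interface DailyCacheStats {
  date: string;
  hits: number;
  misses: number;
}

// Pooled hit ratio over the most recent `windowDays` entries.
// Returns 0 when the window contains no lookups at all.
function trailingHitRatio(days: DailyCacheStats[], windowDays = 7): number {
  const recent = [...days]
    // ISO day strings sort chronologically under plain string comparison.
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0))
    .slice(-windowDays);
  let hits = 0;
  let total = 0;
  for (const day of recent) {
    hits += day.hits;
    total += day.hits + day.misses;
  }
  return total === 0 ? 0 : hits / total;
}

// Threshold from the table: >= 35% over the trailing 7 days.
function meetsCacheTarget(days: DailyCacheStats[]): boolean {
  return trailingHitRatio(days) >= 0.35;
}
```

Pooling weights busy days more heavily than quiet ones, which tends to track cost better than a mean of daily ratios.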
7. Sign-off
| Role | Owner | Date |
|---|---|---|
| Service tech lead | (to fill) | yyyy-mm-dd |
| AI engineering lead | (to fill) | yyyy-mm-dd |
| Security engineering | (to fill) | yyyy-mm-dd |
| SRE / on-call lead | (to fill) | yyyy-mm-dd |
| Compliance / DPO | (to fill) | yyyy-mm-dd |
Sign-off is renewed every 6 months or upon a major architectural change (e.g., new provider added, schema-breaking migration, new HITL policy class).
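
The renewal rule can be expressed as a date check. This is a minimal sketch, assuming hypothetical helpers `addMonthsUtc` and `signOffNeedsRenewal`, calendar months counted in UTC, and renewal falling due on the 6-month anniversary itself; the major-change flag is supplied by the caller.

```typescript
// Adds calendar months in UTC, clamping to the last day of the target
// month (for example, 31 Aug + 6 months gives 28 Feb in a non-leap year).
// The time of day is dropped; only the date is kept.
function addMonthsUtc(d: Date, months: number): Date {
  const year = d.getUTCFullYear();
  const month = d.getUTCMonth() + months;
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(d.getUTCDate(), lastDay);
  return new Date(Date.UTC(year, month, day));
}

// Renewal is needed once 6 months have elapsed since sign-off, or
// immediately if a major architectural change has occurred since then.
function signOffNeedsRenewal(
  signedOn: Date,
  now: Date,
  majorChangeSinceSignOff: boolean,
): boolean {
  if (majorChangeSinceSignOff) return true;
  return now.getTime() >= addMonthsUtc(signedOn, 6).getTime();
}
```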