ai-orchestrator-service — Service Readiness

Companion to: docs/standards/SERVICE_TEMPLATE.md · OBSERVABILITY.md · SECURITY_MODEL.md

The readiness checklist gates promotion of ai-orchestrator-service to production. Each item is binary (yes/no) and links to the artifact that proves it.
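
The "binary, with a linked artifact" rule can be made machine-checkable. The sketch below is a minimal illustration in TypeScript, not the service's actual tooling: the `ReadinessItem` shape and the `blockingItems` and `canPromote` functions are hypothetical names introduced here.

```typescript
// Hypothetical model of one checklist row; field names are assumptions.
interface ReadinessItem {
  section: string;
  item: string;
  status: "yes" | "no";
  artifactUrl?: string; // link to the artifact that proves the item
}

// An item blocks promotion if it is not "yes" or has no proving artifact.
function blockingItems(items: ReadinessItem[]): ReadinessItem[] {
  return items.filter(
    (i) => i.status !== "yes" || !i.artifactUrl || i.artifactUrl.trim() === ""
  );
}

// Promotion is allowed only when nothing blocks it.
function canPromote(items: ReadinessItem[]): boolean {
  return blockingItems(items).length === 0;
}
```

Treating a "yes" without an artifact link as blocking is a design choice of this sketch; it keeps the gate aligned with the requirement that every item links to its proof.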

1. Documentation

| Item | Status |
| --- | --- |
| SERVICE_OVERVIEW.md complete | yes |
| DOMAIN_MODEL.md enumerates all aggregates + invariants | yes |
| APPLICATION_LOGIC.md documents every use case + ports | yes |
| API_CONTRACTS.md matches generated OpenAPI for v1 | yes |
| EVENT_SCHEMAS.md matches the event registry for every published event | yes |
| DATA_MODEL.md matches Flyway migrations and includes RLS policies | yes |
| SYNC_CONTRACT.md describes desktop snapshot and push payloads | yes |
| SECURITY_MODEL.md includes auth, multi-tenancy, redaction, egress rules | yes |
| OBSERVABILITY.md lists all OTel metrics and SLOs | yes |
| TESTING_STRATEGY.md lists eval suites and red-team coverage | yes |
| DEPLOYMENT_TOPOLOGY.md describes Cloud Run + Vertex region choices | yes |
| FAILURE_MODES.md covers all error codes with responses | yes |
| LOCAL_DEV_SETUP.md is reproducible from a clean clone in ≤ 10 min | yes |
| MIGRATION_PLAN.md is current per release | yes |

2. Code & quality

| Item | Threshold | Status |
| --- | --- | --- |
| Unit test coverage | ≥ 90% lines, ≥ 85% branches | gated in CI |
| Integration tests IT-AI-001..015 | all green | gated in CI |
| Red-team suite | all green | nightly + gated on prompt promotions |
| Edge replay nightly | passes 2 nights in a row | gated for release tags |
| ESLint / TS strict | 0 errors | gated in CI |
| Dependency audit | 0 high/critical | gated in CI |
| Secret scanner | 0 findings | gated in CI |
| Performance test | 200 RPS sustained, p95 ≤ 2.5 s | required for release tags |
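
To make the numeric thresholds above concrete, here is a minimal sketch of how the coverage and performance gates could be evaluated. The threshold values come from the table; the function names, the input shapes, and the choice of the nearest-rank method for p95 are assumptions of this sketch, not a description of the actual CI scripts.

```typescript
// Thresholds copied from the table; the constant and its keys are hypothetical.
const THRESHOLDS = {
  minLineCoverage: 90, // percent
  minBranchCoverage: 85, // percent
  minSustainedRps: 200,
  maxP95Seconds: 2.5,
};

function coverageGatePasses(linePct: number, branchPct: number): boolean {
  return (
    linePct >= THRESHOLDS.minLineCoverage &&
    branchPct >= THRESHOLDS.minBranchCoverage
  );
}

// Nearest-rank p95: the smallest sample such that at least 95% of
// samples are less than or equal to it.
function p95(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // integer math avoids float drift
  return sorted[rank - 1];
}

function performanceGatePasses(
  sustainedRps: number,
  latenciesSeconds: number[]
): boolean {
  return (
    sustainedRps >= THRESHOLDS.minSustainedRps &&
    p95(latenciesSeconds) <= THRESHOLDS.maxP95Seconds
  );
}
```

Load-testing tools often report p95 using interpolation rather than nearest rank, so results near the 2.5 s boundary may differ slightly from this sketch.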

3. Operations

| Item | Status |
| --- | --- |
| PagerDuty service ai-orchestrator exists with on-call rotation | yes |
| Runbooks for every error code in ERROR_CODES.md | yes |
| Dashboard AI Orchestrator — Golden published in Cloud Monitoring | yes |
| Dashboard AI Token Cost published in Looker Studio | yes |
| BigQuery ai_calls_fact schema reviewed by data-eng | yes |
| Alert routing reviewed (PagerDuty + Slack + finance) | yes |
| Game-day Q-1 executed within last quarter | yes |
| Backup + restore drill (Cloud SQL PITR) successful within last quarter | yes |

4. Security

| Item | Status |
| --- | --- |
| Threat model reviewed by security-eng | yes |
| mTLS verified with every consuming service | yes |
| RLS policies enforced; cross-tenant test green | yes |
| Provider keys in Secret Manager; rotation policy 60 d active | yes |
| Manifest signing key in KMS; rotation policy 365 d active | yes |
| VPC-SC perimeter includes the AI project | yes |
| External provider DPAs signed (Anthropic, OpenAI) | yes |
| PII flow review — capabilities marked pii_class correctly | yes |
| Cloud Armor rules deployed; DDoS response tested | yes |

5. Data & compliance

| Item | Status |
| --- | --- |
| Audit log retention 7 years configured | yes |
| Operational data retention configured (180 d hot, archive after) | yes |
| Right-to-erasure flow tested end-to-end (consumes tenant.guest.erasure_requested.v1, purges embeddings + provenance with caveats) | yes |
| BAA / DPA referenced from Compliance register | yes |

6. Cost

| Item | Status |
| --- | --- |
| Per-tenant default budgets configured per tier | yes |
| Per-purpose default budgets configured | yes |
| Daily cost dashboard reviewed by finance-ops weekly | yes |
| Cache hit ratio ≥ 35% over the trailing 7 days | yes |
| Top-10 expensive prompts reviewed monthly with optimization tickets | yes |
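
The cache hit ratio target can be illustrated with a short sketch. It assumes the ratio is defined as hits / (hits + misses), pooled over the most recent 7 daily records; that definition, the `DailyCacheStats` shape, and the function names are assumptions of this example and are not taken from the dashboards named above.

```typescript
// Hypothetical daily aggregate; `date` is an ISO yyyy-mm-dd string.
interface DailyCacheStats {
  date: string;
  hits: number;
  misses: number;
}

// Pooled hit ratio over the most recent `windowDays` records.
function trailingHitRatio(days: DailyCacheStats[], windowDays = 7): number {
  const recent = [...days]
    .sort((a, b) => a.date.localeCompare(b.date)) // ISO dates sort lexicographically
    .slice(-windowDays);
  const hits = recent.reduce((sum, d) => sum + d.hits, 0);
  const total = recent.reduce((sum, d) => sum + d.hits + d.misses, 0);
  return total === 0 ? 0 : hits / total; // no traffic counts as 0 in this sketch
}

// Target from the checklist: at least 35%.
function cacheTargetMet(days: DailyCacheStats[], target = 0.35): boolean {
  return trailingHitRatio(days) >= target;
}
```

Pooling hits and lookups across the window weights busy days more heavily than quiet ones; averaging the seven daily ratios instead would give a different number when traffic is uneven.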

7. Sign-off

| Role | Owner | Date |
| --- | --- | --- |
| Service tech lead | (to fill) | yyyy-mm-dd |
| AI engineering lead | (to fill) | yyyy-mm-dd |
| Security engineering | (to fill) | yyyy-mm-dd |
| SRE / on-call lead | (to fill) | yyyy-mm-dd |
| Compliance / DPO | (to fill) | yyyy-mm-dd |

Sign-off is renewed every 6 months or upon a major architectural change (e.g., new provider added, schema-breaking migration, new HITL policy class).
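
The time-based half of the renewal rule could be checked as in the sketch below. It covers only the 6-month interval, not the "major architectural change" trigger, and it assumes sign-off dates are recorded as `yyyy-mm-dd` and interpreted in UTC. The function names are hypothetical, and treating the sign-off as lapsed on the due date itself is an assumption.

```typescript
// Add calendar months in UTC, clamping to the last day of the target month
// (for example, 31 August + 6 months lands on the last day of February).
function addMonthsUtc(date: Date, months: number): Date {
  const y = date.getUTCFullYear();
  const m = date.getUTCMonth() + months;
  const d = date.getUTCDate();
  // Day 0 of month m + 1 is the last day of month m.
  const lastDay = new Date(Date.UTC(y, m + 1, 0)).getUTCDate();
  return new Date(Date.UTC(y, m, Math.min(d, lastDay)));
}

// Date on which a sign-off recorded as yyyy-mm-dd must be renewed.
function renewalDueDate(signedOn: string): Date {
  return addMonthsUtc(new Date(`${signedOn}T00:00:00Z`), 6);
}

// True while the sign-off is still within its 6-month validity window.
function isSignOffCurrent(signedOn: string, now: Date): boolean {
  return now.getTime() < renewalDueDate(signedOn).getTime();
}
```

For example, under these assumptions a sign-off dated 2025-01-15 is current through 2025-07-14 and due for renewal on 2025-07-15.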