SERVICE_READINESS — inventory-service
Siblings: SERVICE_RISK_REGISTER · SECURITY_MODEL · OBSERVABILITY · TESTING_STRATEGY · DEPLOYMENT_TOPOLOGY
Strategic anchor: standards/DEFINITION_OF_DONE
This is the gate the service must clear to be promoted from "in development" to "production-supported." Each row maps to verifiable evidence; placeholders are unacceptable.
1. Status snapshot
| Dimension | Status | Notes |
|---|---|---|
| Domain model | Ready | DOMAIN_MODEL — invariants enforced; OCC; state machine documented and tested |
| Application layer | Ready | APPLICATION_LOGIC — every use case has tests; saga participation matrix documented |
| API contracts | Ready | API_CONTRACTS — OpenAPI emitted; Pact contracts published |
| Event contracts | Ready | EVENT_SCHEMAS — JSON Schemas registered; ordering keys defined |
| Data model | Ready | DATA_MODEL — partitioning live; RLS enabled; advisory-lock function deployed |
| Sync contract | Ready | SYNC_CONTRACT — desktop snapshot endpoint live; offline-arbitration saga implemented and tested |
| AI integration | Ready | AI_INTEGRATION — no critical-path AI; events flow to ai-orchestrator |
| Security | Ready | SECURITY_MODEL — five-layer tenant isolation; threat model reviewed |
| Observability | Ready | OBSERVABILITY — dashboards live; alert ladder wired with runbooks |
| Testing | Ready | TESTING_STRATEGY — mandatory three integration tests present; concurrency suite green; Jepsen-style consistency check green |
| Deployment | Ready | DEPLOYMENT_TOPOLOGY — Cloud Run revisions; Cloud Scheduler entries; canary process |
| Failure modes | Ready | FAILURE_MODES — runbooks linked from every alert |
| Risk register | Ready | SERVICE_RISK_REGISTER — top risks acknowledged + mitigated |
| Migration plan | Ready | MIGRATION_PLAN — expand/contract gates enforced |
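The snapshot only supports promotion if every dimension is "Ready" and cites real evidence. The sketch below shows one way to compute that roll-up mechanically.

It assumes a hypothetical row shape (`SnapshotRow`) and an assumed list of placeholder strings. Neither is defined by the service; both exist only for illustration.

```typescript
// Hypothetical model of one status-snapshot row; field names are illustrative.
type DimensionStatus = "Ready" | "Not ready";

interface SnapshotRow {
  dimension: string; // e.g. "Data model"
  status: DimensionStatus;
  evidenceDoc: string; // sibling document cited in Notes, e.g. "DATA_MODEL"
}

// Values treated as placeholders rather than evidence (assumed list).
const PLACEHOLDERS = new Set(["", "-", "tbd", "todo", "n/a"]);

// A dimension blocks promotion if it is not "Ready" or if its evidence
// reference is missing or a placeholder.
function blockingDimensions(rows: SnapshotRow[]): string[] {
  return rows
    .filter(
      (r) =>
        r.status !== "Ready" ||
        PLACEHOLDERS.has(r.evidenceDoc.trim().toLowerCase()),
    )
    .map((r) => r.dimension);
}

// Made-up example input, not the real snapshot.
const snapshot: SnapshotRow[] = [
  { dimension: "Domain model", status: "Ready", evidenceDoc: "DOMAIN_MODEL" },
  { dimension: "Sync contract", status: "Not ready", evidenceDoc: "SYNC_CONTRACT" },
  { dimension: "Risk register", status: "Ready", evidenceDoc: "TBD" },
];

console.log(blockingDimensions(snapshot).join(", ")); // Sync contract, Risk register
```

A "Ready" row whose evidence is a placeholder is treated the same as a row that is not ready. That matches the rule that placeholders are unacceptable.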
2. Production readiness checklist
| # | Item | Owner | Evidence |
|---|---|---|---|
| 1 | Mandatory three integration tests pass on main (tenant-isolation, outbox, inbox) | tech-lead | CI green; test/integration/*.spec.ts |
| 2 | Concurrency suite passes including the double-allocation race | tech-lead | test/concurrency/double-allocation-race.spec.ts |
| 3 | Jepsen-style consistency check passes on RC | tech-lead | test/jepsen/ last run id |
| 4 | OpenAPI snapshot diff clean against last release | api-owner | CI step openapi:diff |
| 5 | Every produced event has a JSON Schema in registry; consumers verified | events-owner | Pact broker dashboard |
| 6 | Every consumed event has an inbound contract test | events-owner | test/contract/ |
| 7 | RLS enabled on every inventory table; verified by tenant-isolation.spec.ts | security | DB introspection script |
| 8 | Advisory-lock function acquire_alloc_lock(tenant_id, …) deployed; integration test | tech-lead | test/integration/advisory-lock.spec.ts |
| 9 | Partition rotation job scheduled and verified to run end-to-end | sre | Cloud Scheduler entry; last successful run |
| 10 | Calendar extender scheduled; horizon ≥ 540 days for all live properties | sre | Metric inventory_calendar_horizon_days_min |
| 11 | Hold-expiry sweeper scheduled (every 30 s); single-replica enforced | sre | Cloud Scheduler entry |
| 12 | Reconciliation job scheduled; drift alert wired | sre | Cloud Scheduler entry |
| 13 | Outbox relay live; lag SLO panel green for 7 days | sre | Dashboard inventory-service: integrity |
| 14 | DLQs configured with alert RESV-INV-007 | sre | Pub/Sub config |
| 15 | All 14 alerts wired with runbooks under runbooks/inventory/ | sre | Runbook directory |
| 16 | Synthetic checks live for availability/search, walk-in, end-to-end Pub/Sub | sre | Cloud Monitoring uptime checks |
| 17 | RESV-INV-001 (false overbooking) routes to on-call and tech lead simultaneously | sre | Alert routing config |
| 18 | Audit-log emission proven for every state-changing endpoint | security | audit-service query for sample principals |
| 19 | IAM service accounts least-privileged; documented in SECURITY_MODEL §4 | security | IAM policy export |
| 20 | Break-glass admin endpoints behind two-person rule and audited | security | IAM + audit log |
| 21 | Backups: daily Cloud SQL snapshot + 35-day PITR; restore drill documented | sre | Backup config; last drill date |
| 22 | Region pinning honored; cross-region writes rejected | platform | Connection middleware test |
| 23 | Load test (500 search / 100 hold concurrent) passed last release | tech-lead | k6 report artifact |
| 24 | Chaos drill (Postgres failover, Pub/Sub blackout, sweeper outage) executed in the last 90 days | sre | Chaos report |
| 25 | All public docs in this folder reviewed and signed off | tech-lead + product | Doc PR approval log |
A row marked "no evidence" blocks promotion.
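The blocking rule can also be expressed as a small check that groups unevidenced rows by owner, so each owner sees their own blockers. This is a sketch under assumptions. `ChecklistItem` is a hypothetical record in which `evidence` is null until an artifact is attached, and the sample rows are illustrative rather than the current state of the checklist.

```typescript
// Hypothetical record for one checklist row. `evidence` is null until an
// artifact (CI run, dashboard, report, config export) is attached.
interface ChecklistItem {
  id: number;
  owner: string; // as written in the Owner column, e.g. "sre"
  evidence: string | null;
}

// Collects the ids of rows without evidence, grouped by owner.
function blockersByOwner(items: ChecklistItem[]): Map<string, number[]> {
  const blockers = new Map<string, number[]>();
  for (const row of items) {
    // Rows with a non-blank evidence string are satisfied.
    if (row.evidence !== null && row.evidence.trim() !== "") continue;
    const ids = blockers.get(row.owner) ?? [];
    ids.push(row.id);
    blockers.set(row.owner, ids);
  }
  return blockers;
}

// Promotion is allowed only when no row lacks evidence.
function canPromote(items: ChecklistItem[]): boolean {
  return blockersByOwner(items).size === 0;
}

// Illustrative input only.
const rows: ChecklistItem[] = [
  { id: 4, owner: "api-owner", evidence: "CI step openapi:diff" },
  { id: 21, owner: "sre", evidence: null },
  { id: 24, owner: "sre", evidence: "" },
];

console.log(canPromote(rows)); // false
console.log(Array.from(blockersByOwner(rows).entries()));
```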
3. Operational signals to watch on day 1 of production
- inventory_overbooking_actual_total should remain at 0 forever.
- inventory_allocation_duration_ms p99 < 200 ms.
- inventory_availability_search_duration_ms{cache="miss"} p99 < 300 ms.
- inventory_outbox_lag_seconds p99 < 5 s.
- inventory_hold_expiry_lag_seconds p99 < 30 s.
- Calendar horizon ≥ 30 days for every live property.
- DLQs flat at zero on every inbound subject.
- inventory_partition_rotation_total{outcome="success"} increases by exactly 1 each day after 02:30 UTC.
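The day-1 targets above are simple threshold comparisons, so they can be encoded as data and evaluated against a snapshot of observed values. The following is a minimal sketch with these assumptions:

- The ":p99" and ":increase_24h" suffixes are a made-up labeling convention.
- The DLQ key is a hypothetical aggregate name, not a metric defined in this document.
- The observed values are invented example input.

Fetching real values from Cloud Monitoring is deliberately left out.

```typescript
type Op = "==" | "<" | ">=";

interface SignalCheck {
  signal: string; // metric/selector from the list, plus an assumed suffix
  op: Op;
  threshold: number;
}

const COMPARE: Record<Op, (value: number, threshold: number) => boolean> = {
  "==": (v, t) => v === t,
  "<": (v, t) => v < t,
  ">=": (v, t) => v >= t,
};

// Thresholds transcribed from the day-1 list. The DLQ key is a hypothetical
// aggregate; the partition-rotation counter is checked as a 24 h increase.
const DAY1_CHECKS: SignalCheck[] = [
  { signal: "inventory_overbooking_actual_total", op: "==", threshold: 0 },
  { signal: "inventory_allocation_duration_ms:p99", op: "<", threshold: 200 },
  { signal: 'inventory_availability_search_duration_ms{cache="miss"}:p99', op: "<", threshold: 300 },
  { signal: "inventory_outbox_lag_seconds:p99", op: "<", threshold: 5 },
  { signal: "inventory_hold_expiry_lag_seconds:p99", op: "<", threshold: 30 },
  { signal: "inventory_calendar_horizon_days_min", op: ">=", threshold: 30 },
  { signal: "dlq_depth_max_across_inbound_subjects", op: "==", threshold: 0 },
  { signal: 'inventory_partition_rotation_total{outcome="success"}:increase_24h', op: "==", threshold: 1 },
];

// Lists every breached check; a signal with no observation counts as breached.
function violations(observed: Record<string, number>): string[] {
  return DAY1_CHECKS.filter(
    (c) => !(c.signal in observed) || !COMPARE[c.op](observed[c.signal], c.threshold),
  ).map((c) => `${c.signal} ${c.op} ${c.threshold}`);
}

// Invented observation for illustration.
const observed: Record<string, number> = {
  "inventory_overbooking_actual_total": 0,
  "inventory_allocation_duration_ms:p99": 140,
  'inventory_availability_search_duration_ms{cache="miss"}:p99': 260,
  "inventory_outbox_lag_seconds:p99": 7.5, // breaches the 5 s target
  "inventory_hold_expiry_lag_seconds:p99": 12,
  "inventory_calendar_horizon_days_min": 540,
  "dlq_depth_max_across_inbound_subjects": 0,
  'inventory_partition_rotation_total{outcome="success"}:increase_24h': 1,
};

console.log(violations(observed).join("\n")); // inventory_outbox_lag_seconds:p99 < 5
```

A missing observation is treated as a breach rather than a pass. On day 1 a silent metric is itself a signal worth paging on.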
4. Sign-off
| Role | Sign-off | Date |
|---|---|---|
| Service tech lead | required | — |
| Platform SRE | required | — |
| Security engineer | required | — |
| Product owner (PMS) | required | — |
| Architecture review | required | — |
A failed promotion sets the service back to "in development" and requires re-running this checklist end-to-end.
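The promotion rule combines two conditions: all five sign-offs and a passing checklist. Failing either one returns the service to "in development". The sketch below captures that as a pure function. `PromotionAttempt` and `PromotionResult` are hypothetical types, and sign-offs are reduced to booleans (the Date column is ignored) for illustration only.

```typescript
type Stage = "in development" | "production-supported";

// Roles from the sign-off table; all are required.
const REQUIRED_ROLES = [
  "Service tech lead",
  "Platform SRE",
  "Security engineer",
  "Product owner (PMS)",
  "Architecture review",
] as const;

type Role = (typeof REQUIRED_ROLES)[number];

interface PromotionAttempt {
  signedOff: Partial<Record<Role, boolean>>; // role -> has signed
  checklistPassed: boolean; // every checklist row has evidence
}

interface PromotionResult {
  stage: Stage;
  missing: Role[];
}

// Any missing sign-off or failed checklist yields "in development";
// the checklist must then be re-run end-to-end before the next attempt.
function promote(attempt: PromotionAttempt): PromotionResult {
  const missing = REQUIRED_ROLES.filter((role) => attempt.signedOff[role] !== true);
  const ok = attempt.checklistPassed && missing.length === 0;
  return { stage: ok ? "production-supported" : "in development", missing };
}

const result = promote({
  signedOff: {
    "Service tech lead": true,
    "Platform SRE": true,
    "Security engineer": true,
    "Product owner (PMS)": true,
  },
  checklistPassed: true,
});

console.log(result.stage + "; missing: " + result.missing.join(", "));
// in development; missing: Architecture review
```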