SERVICE_READINESS — inventory-service

Siblings: SERVICE_RISK_REGISTER · SECURITY_MODEL · OBSERVABILITY · TESTING_STRATEGY · DEPLOYMENT_TOPOLOGY

Strategic anchor: standards/DEFINITION_OF_DONE

This is the gate the service must clear to be promoted from "in development" to "production-supported." Each row maps to verifiable evidence; placeholders are unacceptable.


1. Status snapshot

Dimension | Status | Notes
Domain model | Ready | DOMAIN_MODEL: invariants enforced; optimistic concurrency control (OCC); state machine documented and tested
Application layer | Ready | APPLICATION_LOGIC: every use case has tests; saga participation matrix documented
API contracts | Ready | API_CONTRACTS: OpenAPI emitted; Pact contracts published
Event contracts | Ready | EVENT_SCHEMAS: JSON Schemas registered; ordering keys defined
Data model | Ready | DATA_MODEL: partitioning live; row-level security (RLS) enabled; advisory-lock function deployed
Sync contract | Ready | SYNC_CONTRACT: desktop snapshot endpoint live; offline-arbitration saga implemented and tested
AI integration | Ready | AI_INTEGRATION: no AI on the critical path; events flow to ai-orchestrator
Security | Ready | SECURITY_MODEL: five-layer tenant isolation; threat model reviewed
Observability | Ready | OBSERVABILITY: dashboards live; alert ladder wired with runbooks
Testing | Ready | TESTING_STRATEGY: the mandatory three integration tests present; concurrency suite green; Jepsen-style check green
Deployment | Ready | DEPLOYMENT_TOPOLOGY: Cloud Run revisions; Cloud Scheduler entries; canary process
Failure modes | Ready | FAILURE_MODES: runbooks linked from every alert
Risk register | Ready | SERVICE_RISK_REGISTER: top risks acknowledged and mitigated
Migration plan | Ready | MIGRATION_PLAN: expand/contract gates enforced

2. Production readiness checklist

# | Item | Owner | Evidence
1 | Mandatory three integration tests (tenant-isolation, outbox, inbox) pass on main | tech-lead | CI green; test/integration/*.spec.ts
2 | Concurrency suite passes, including the double-allocation race | tech-lead | test/concurrency/double-allocation-race.spec.ts
3 | Jepsen-style consistency check passes on the release candidate (RC) | tech-lead | test/jepsen/ last run ID
4 | OpenAPI snapshot diff is clean against the last release | api-owner | CI step openapi:diff
5 | Every produced event has a JSON Schema in the registry; consumers verified | events-owner | Pact broker dashboard
6 | Every consumed event has an inbound contract test | events-owner | test/contract/
7 | RLS enabled on every inventory table; verified by tenant-isolation.spec.ts | security | DB introspection script
8 | Advisory-lock function acquire_alloc_lock(tenant_id, …) deployed and covered by an integration test | tech-lead | test/integration/advisory-lock.spec.ts
9 | Partition rotation job scheduled and verified to run end-to-end | sre | Cloud Scheduler entry; last successful run
10 | Calendar extender scheduled; horizon ≥ 540 days for all live properties | sre | Metric inventory_calendar_horizon_days_min
11 | Hold-expiry sweeper scheduled (every 30 s); single-replica execution enforced | sre | Cloud Scheduler entry
12 | Reconciliation job scheduled; drift alert wired | sre | Cloud Scheduler entry
13 | Outbox relay live; lag SLO panel green for 7 days | sre | Dashboard "inventory-service: integrity"
14 | DLQs configured with alert RESV-INV-007 | sre | Pub/Sub config
15 | All 14 alerts wired with runbooks under runbooks/inventory/ | sre | Runbook directory
16 | Synthetic checks live for availability/search, walk-in, and end-to-end Pub/Sub | sre | Cloud Monitoring uptime checks
17 | RESV-INV-001 (false overbooking) routes to on-call and the tech lead simultaneously | sre | Alert routing config
18 | Audit-log emission proven for every state-changing endpoint | security | audit-service query for sample principals
19 | IAM service accounts least-privileged; documented in SECURITY_MODEL §4 | security | IAM policy export
20 | Break-glass admin endpoints behind a two-person rule and audited | security | IAM + audit log
21 | Backups: daily Cloud SQL snapshot + 35-day PITR; restore drill documented | sre | Backup config; last drill date
22 | Region pinning honored; cross-region writes rejected | platform | Connection middleware test
23 | Load test (500 concurrent searches / 100 concurrent holds) passed in the last release | tech-lead | k6 report artifact
24 | Chaos drill (Postgres failover, Pub/Sub blackout, sweeper outage) executed within the last 90 days | sre | Chaos report
25 | All public docs in this folder reviewed and signed off | tech-lead + product | Doc PR approval log

A row marked "no evidence" blocks promotion.
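The blocking rule can be expressed as a small check over the checklist rows. This is a minimal sketch, assuming a hypothetical `ChecklistRow` shape and treating a missing, blank, or literal "no evidence" value as absent evidence; none of these names are defined by this document.

```typescript
// Hypothetical representation of one row of the production readiness checklist.
// Field names are illustrative assumptions, not part of any real schema.
interface ChecklistRow {
  id: number;
  item: string;
  owner: string;
  evidence: string | null; // CI step, file path, run ID, dashboard, etc.
}

// A row lacks evidence if the field is null, blank, or literally "no evidence".
function lacksEvidence(row: ChecklistRow): boolean {
  if (row.evidence === null) return true;
  const text = row.evidence.trim().toLowerCase();
  return text === "" || text === "no evidence";
}

// IDs of the rows that block promotion.
function blockingRows(rows: readonly ChecklistRow[]): number[] {
  return rows.filter(lacksEvidence).map((row) => row.id);
}

// Promotion may proceed only when no row is blocking.
function canPromote(rows: readonly ChecklistRow[]): boolean {
  return blockingRows(rows).length === 0;
}

// Usage: row 9 has no evidence yet, so promotion is blocked.
const rows: ChecklistRow[] = [
  { id: 4, item: "OpenAPI snapshot diff clean", owner: "api-owner", evidence: "CI step openapi:diff" },
  { id: 9, item: "Partition rotation job verified", owner: "sre", evidence: "no evidence" },
];
console.log(blockingRows(rows)); // [ 9 ]
console.log(canPromote(rows)); // false
```

Returning the blocking row IDs, rather than only a boolean, lets a CI step report exactly which owners still have to supply evidence.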


3. Operational signals to watch on day 1 in production

  • inventory_overbooking_actual_total must remain at 0 at all times.
  • inventory_allocation_duration_ms p99 < 200 ms.
  • inventory_availability_search_duration_ms{cache="miss"} p99 < 300 ms.
  • inventory_outbox_lag_seconds p99 < 5 s.
  • inventory_hold_expiry_lag_seconds p99 < 30 s.
  • Calendar horizon ≥ 30 days for every live property.
  • DLQs flat at zero on every inbound subject.
  • inventory_partition_rotation_total{outcome="success"} increments by exactly 1 each day, after 02:30 UTC.
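The day-1 thresholds above can be encoded as a single evaluation function. This is a sketch under stated assumptions: the `Day1Snapshot` type, its field names, and the DLQ subject name in the usage are hypothetical, and fetching the values from the metrics backend is out of scope. The strict and non-strict comparisons mirror the list (for example, allocation p99 must be strictly below 200 ms, horizon at least 30 days).

```typescript
// Hypothetical snapshot of day-1 signals; how these values are queried
// from the metrics backend is deliberately left out of this sketch.
interface Day1Snapshot {
  overbookingActualTotal: number;            // inventory_overbooking_actual_total
  allocationP99Ms: number;                   // inventory_allocation_duration_ms p99
  searchCacheMissP99Ms: number;              // inventory_availability_search_duration_ms{cache="miss"} p99
  outboxLagP99Seconds: number;               // inventory_outbox_lag_seconds p99
  holdExpiryLagP99Seconds: number;           // inventory_hold_expiry_lag_seconds p99
  calendarHorizonDaysMin: number;            // smallest horizon across live properties
  dlqDepthBySubject: Record<string, number>; // DLQ depth per inbound subject
  partitionRotationSuccessDelta24h: number;  // increase of the success counter over the last day
}

// Returns the violated signals; an empty array means every signal is green.
function day1Violations(s: Day1Snapshot): string[] {
  const v: string[] = [];
  if (s.overbookingActualTotal !== 0) v.push("overbooking_actual_total != 0");
  if (s.allocationP99Ms >= 200) v.push("allocation p99 >= 200 ms");
  if (s.searchCacheMissP99Ms >= 300) v.push("search cache-miss p99 >= 300 ms");
  if (s.outboxLagP99Seconds >= 5) v.push("outbox lag p99 >= 5 s");
  if (s.holdExpiryLagP99Seconds >= 30) v.push("hold-expiry lag p99 >= 30 s");
  if (s.calendarHorizonDaysMin < 30) v.push("calendar horizon < 30 days");
  for (const [subject, depth] of Object.entries(s.dlqDepthBySubject)) {
    if (depth !== 0) v.push(`DLQ not empty: ${subject}`);
  }
  if (s.partitionRotationSuccessDelta24h !== 1) {
    v.push("partition rotation success count != 1 in the last day");
  }
  return v;
}

// Usage with illustrative values; "reservation.created" is a made-up subject name.
const green: Day1Snapshot = {
  overbookingActualTotal: 0,
  allocationP99Ms: 120,
  searchCacheMissP99Ms: 210,
  outboxLagP99Seconds: 1.5,
  holdExpiryLagP99Seconds: 12,
  calendarHorizonDaysMin: 540,
  dlqDepthBySubject: { "reservation.created": 0 },
  partitionRotationSuccessDelta24h: 1,
};
console.log(day1Violations(green).length); // 0
```

Checking the partition-rotation counter by its daily increase, rather than its absolute value, matches how a monotonic counter is normally read.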

4. Sign-off

Role | Sign-off | Date
Service tech lead | required |
Platform SRE | required |
Security engineer | required |
Product owner (PMS) | required |
Architecture review | required |

A failed promotion sets the service back to "in development" and requires re-running this checklist end-to-end.
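The promotion rule can be modeled as a pure function of a single attempt. This is a minimal sketch, assuming a hypothetical `PromotionAttempt` input; the role strings come from the sign-off table, and the status strings come from the gate description at the top of this page.

```typescript
type ServiceStatus = "in development" | "production-supported";

// The five roles from the sign-off table; every one is required.
const REQUIRED_ROLES = [
  "Service tech lead",
  "Platform SRE",
  "Security engineer",
  "Product owner (PMS)",
  "Architecture review",
] as const;

type Role = (typeof REQUIRED_ROLES)[number];

// Hypothetical input describing one end-to-end promotion attempt.
interface PromotionAttempt {
  signedOff: ReadonlySet<Role>; // roles that signed off on this attempt
  checklistClean: boolean;      // no checklist row lacks evidence
}

// Each attempt is evaluated from scratch: nothing carries over from an
// earlier attempt, mirroring the rule that a failed promotion requires
// re-running the checklist end-to-end.
function promotionOutcome(attempt: PromotionAttempt): ServiceStatus {
  const allSigned = REQUIRED_ROLES.every((role) => attempt.signedOff.has(role));
  return allSigned && attempt.checklistClean ? "production-supported" : "in development";
}

// Usage: one missing sign-off keeps the service in development.
const partial = new Set<Role>(REQUIRED_ROLES.filter((r) => r !== "Architecture review"));
console.log(promotionOutcome({ signedOff: partial, checklistClean: true })); // in development
```

Keeping the function free of prior state is a deliberate choice: it makes it impossible to reuse sign-offs or evidence from a failed attempt.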