maintenance-service · SERVICE_READINESS
Production cutover gate. Every box must be ticked and signed off by the named role before traffic is permitted on a fresh region or before declaring the service GA.
1. Documentation
-
docs/03-microservices/maintenance-service.mdpublished and linked from index - All 17 service-bundle files present, no
TODOmarkers -
SYNC_CONTRACT.mdcross-referenced from desktop's docs index - Error codes registered in
docs/standards/ERROR_CODES.mdwith HTTP, retriability, i18n, runbooks - ULID prefixes registered in
docs/standards/NAMING.md - Topics + schemas registered in
docs/04-event-driven-architecture.mdtopic registry - OpenAPI v1 published to schema registry; consumers notified
- Runbooks linked from every alert in
OBSERVABILITY.md§6 — links resolve - Sign-off: Tech writer, Domain owner
2. Code quality
- No
TODO/FIXMEcomments indomain/orapplication/ - No
anyindomain/or public DTOs - All public functions JSDoc-documented at the use case level
- No dead code (verified by
ts-prune) - No
console.log; structured logger only -
eslint,tsc --noEmit,prettier --checkall green on main - No circular dependencies (
madge) - No imports from
infrastructure/insideapplication/ordomain/ - Sign-off: Engineering lead
3. Test coverage and gates
- Branch coverage:
domain/≥ 95%,application/≥ 90%,infrastructure/≥ 70%,controllers/≥ 80% - Mandatory specs present and green:
-
integration/tenant-isolation.spec.ts -
integration/outbox.spec.ts -
integration/inbox.spec.ts
-
- State machine: every transition (allowed and denied) covered
- Saga compensation: room-block rejection, vendor no-show, relocation no-inventory, OCC concurrent resolve
- Concurrency: at least one fast-check property test on cost rollup; explicit OCC race spec
- Sync push: OCC, idempotency, partial failure, clock skew specs
- AI: orchestrator timeout, invalid enum, budget exhaustion specs
- Performance: locust scripts pass NFR targets in
OBSERVABILITY.md§3 - Pact contracts published and verified against all consumers (
property,reservation,housekeeping,notification,billing,analytics,audit,bff-backoffice,sync) - OpenAPI diff against previous version: no breaking changes
- Event schema golden tests: TS / JSON Schema / sample payload all in lock-step
- Sign-off: Engineering lead, QA
4. API & event hygiene
- OpenAPI spec auto-generated; CI fails on uncommitted diff
- All public endpoints documented with examples
- All error codes return RFC 7807 envelope
- Idempotency keys honoured on every state-changing endpoint
- Rate limits configured at Kong: 50 RPS per tenant for BFF
- Each event has TS interface + JSON Schema + golden sample + retention class set
- No event has "raw PII" — vendor identifiers used, no plaintext numbers
- Outbox + inbox correctness verified by chaos run
- Sign-off: API governance, Eventing team
5. Storage & migrations
- All tables have RLS enabled with
tenant_idpolicy - All money columns are
bigintmicro-units; nevernumericdecimal - All
CREATE INDEXstatements useCONCURRENTLYin production migrations - Expand-then-contract migrations only; no destructive ops in single step
- PgBouncer in transaction mode in front of Cloud SQL
- Backups: daily full + 7 d PITR; cross-region replica in
europe-west4 - DR runbook exists and was last drilled in staging within the last 90 days
- Sign-off: DBA, SRE
6. Security
- All endpoints behind JWT + tenant header check at Kong
- RLS verified by
tenant-isolation.spec.ts - Workload Identity used for all GCP API calls; no static SA keys
- Secrets in Secret Manager, never in env vars
- CMEK enabled on Cloud SQL and
melmastoon-vendor-invoices/bucket - AI calls go through orchestrator; no model SDK in service
- PII redaction verified before any orchestrator call
- Audit events emitted on every state mutation
- Threat model in
SECURITY_MODEL.md§11 reviewed in last 12 months - Penetration test report on file (last 12 months)
- Sign-off: Security
7. Observability
- OTel tracing enabled with required attributes (per
OBSERVABILITY.md§1) - All logs structured JSON; no PII in logs
- Dashboards published: Service health, Operations, AI, Sync
- Alerts configured per
OBSERVABILITY.md§6 with runbook links - SLOs published in error-budget tracking system
- Synthetic checks running every 60 s from 3 regions
- BigQuery archive sink connected and ingesting
- Sign-off: SRE
8. Deployment
- Two Cloud Run services (api + workers) deployed via CI from
mainonly - Canary release: 10% → 50% → 100% traffic shift with bake intervals
- Migration job runs pre-deploy; failure halts pipeline
- Rollback runbook tested (last quarter)
- Health and readiness probes configured
- Resource limits set (CPU, mem) per
DEPLOYMENT_TOPOLOGY.md§2 - Cloud Scheduler entries deployed for all worker cron jobs
- Outbox relay configured for
maintenance.outboxtable - Sign-off: SRE, Platform
9. Desktop & sync integration
-
SYNC_CONTRACT.mdaccepted by Desktop team - Pull payload size verified at < 200 KB compressed for typical property
- Push commands accept the canonical command set; rejected commands return clear error codes
- Conflict log entry surfaced in desktop "Sync issues" view
- Offline UX walkthrough recorded and approved
- Clock skew warning UX implemented
- Sign-off: Desktop lead
10. AI integration
- All capabilities registered with
ai-orchestrator-service - Provenance persisted on every AI-influenced field
- HITL flow implemented in BFF for severity and vendor-message-draft
- Per-tenant budget configured; soft-cap and hard-cap behaviour verified
- PII redaction verified in audit trail samples
- Model rollback procedure documented and tested
- Sign-off: AI/ML lead
11. Operations
- On-call rotation defined; PagerDuty integration tested
- Runbooks linked from every alert
- Tenant onboarding runbook tested in staging
- Tenant offboarding (data export + erasure) runbook tested
- Customer-support runbooks for common issues drafted (vendor no-show, OOO not lifting, AI suggestion wrong)
- Dependency map up to date in service catalog
- Sign-off: Support, On-call
12. Compliance
- Data retention configured per
DATA_MODEL.md§5 - Vendor invoice files retained 7 years (regulated)
- Audit events retained per platform policy
- GDPR: erasure flow tested for vendor records
- Data residency: confirmed all production storage in
europe-west1 - DPIA on file (Data Protection Impact Assessment)
- Sign-off: Compliance, Legal
13. Capacity & cost
- Capacity plan in
DEPLOYMENT_TOPOLOGY.md§9 reviewed - Cloud SQL sized for 2× expected peak
- Cloud Run min/max instances set per environment
- AI cost guardrails configured per tenant
- BigQuery storage cost projection within budget
- Sign-off: Finance ops, SRE
14. Communication
- Service announcement drafted for internal Slack
- Customer-facing changelog entry drafted
- Backoffice product team trained on the new operations
- Vendor outreach playbook updated (WhatsApp/SMS templates approved)
- Sign-off: Product, Marketing ops
Final cutover sign-off: Engineering lead + SRE + Security + Product + Domain owner (5 signatures). The signed checklist is committed at services/maintenance-service/READINESS_<YYYY-MM-DD>.md and referenced in the release tag annotation.