file-storage-service — SERVICE_READINESS
Companion: SERVICE_OVERVIEW · DEPLOYMENT_TOPOLOGY · SERVICE_RISK_REGISTER · DoD
This is the gate that decides whether the service is allowed into a given environment (dev, staging, pilot, GA). Each environment progressively raises the bar. This document is the checklist on the door: it must be satisfied, with evidence, before promotion.
1. Readiness levels
| Level | Audience | Stable allowed? | SLO published? | On-call? |
|---|---|---|---|---|
| L0 — Dev | engineers, integration tests | no | n/a | no |
| L1 — Staging | internal demo, QA | no | shadow | best-effort |
| L2 — Pilot | first paying tenant(s) | yes (limited) | yes | yes (24×5) |
| L3 — GA | all tenants | yes | yes | yes (24×7) |
| L4 — Mature | regulated workloads, multi-region | yes | tightened | yes (24×7 + DR drills) |
Today, file-storage-service targets the L2 → L3 transition for Phase 1 GA.
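
The level table can also be read as data, which makes promotion rules checkable. The following is a minimal sketch only: `ReadinessLevel`, `LevelPolicy`, `LEVELS` and `isValidPromotion` are hypothetical names, and the rule that a promotion moves exactly one level up is an assumption drawn from "each environment progressively raises the bar".

```typescript
// Hypothetical encoding of the readiness table; names are illustrative and
// do not come from the service code.
type ReadinessLevel = "L0" | "L1" | "L2" | "L3" | "L4";

interface LevelPolicy {
  name: string;
  stableAllowed: "no" | "limited" | "yes";
  sloPublished: "n/a" | "shadow" | "yes" | "tightened";
  onCall: "no" | "best-effort" | "24x5" | "24x7" | "24x7 + DR drills";
}

const LEVELS: Record<ReadinessLevel, LevelPolicy> = {
  L0: { name: "Dev", stableAllowed: "no", sloPublished: "n/a", onCall: "no" },
  L1: { name: "Staging", stableAllowed: "no", sloPublished: "shadow", onCall: "best-effort" },
  L2: { name: "Pilot", stableAllowed: "limited", sloPublished: "yes", onCall: "24x5" },
  L3: { name: "GA", stableAllowed: "yes", sloPublished: "yes", onCall: "24x7" },
  L4: { name: "Mature", stableAllowed: "yes", sloPublished: "tightened", onCall: "24x7 + DR drills" },
};

const ORDER: readonly ReadinessLevel[] = ["L0", "L1", "L2", "L3", "L4"];

// Assumption: a promotion moves exactly one level up; skipping is rejected.
function isValidPromotion(from: ReadinessLevel, to: ReadinessLevel): boolean {
  return ORDER.indexOf(to) - ORDER.indexOf(from) === 1;
}
```

Under these assumptions, `isValidPromotion("L2", "L3")` describes the transition the service currently targets, while `isValidPromotion("L1", "L3")` is rejected.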
2. Domain readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| D1 | Bounded context documented | SERVICE_OVERVIEW exists, DOMAIN_MODEL exists | services/file-storage-service/SERVICE_OVERVIEW.md, DOMAIN_MODEL.md |
| D2 | Aggregates and invariants enumerated | DOMAIN_MODEL §3-§8 | code references to invariant guards |
| D3 | Domain errors mapped to platform error codes | DOMAIN_MODEL §10, ERROR_CODES.md | MELMASTOON.FILE.* registered |
| D4 | Tenant prefix invariant tested | tenant-isolation.spec.ts includes ObjectKey check | unit + integration test green |
| D5 | Domain events versioned and registered | events/file-storage/REGISTRY.md lists all .v1 topics | CI check passes |
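
As an illustration of what the D4 check guards against, here is a minimal sketch of a tenant prefix guard. The key layout `<tenantId>/<rest>` and the function name `assertTenantPrefix` are assumptions; the real ObjectKey format is defined in DOMAIN_MODEL, not here.

```typescript
// Assumed key layout: "<tenantId>/<rest>". Hypothetical helper, not the
// service's actual guard.
function assertTenantPrefix(objectKey: string, tenantId: string): void {
  if (tenantId.length === 0 || tenantId.includes("/")) {
    throw new Error("invalid tenant id");
  }
  // Reject path traversal segments before comparing prefixes.
  if (objectKey.split("/").includes("..")) {
    throw new Error("object key contains a traversal segment");
  }
  // The trailing slash matters: without it, tenant "ten_1" would also
  // match keys that belong to tenant "ten_10".
  if (!objectKey.startsWith(`${tenantId}/`)) {
    throw new Error(`object key is outside the prefix of tenant ${tenantId}`);
  }
}
```

A test in the spirit of tenant-isolation.spec.ts would assert that a key owned by one tenant is rejected when checked against any other tenant, including tenants whose id is a string prefix of the owner's id.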
3. API readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| A1 | OpenAPI committed and lint-clean | openapi/v1.yaml, spectral 0 errors | CI artifact |
| A2 | All endpoints implemented | API_CONTRACTS endpoints have unit + integration coverage | coverage report |
| A3 | Idempotency middleware applied | unit IdempotencyMiddleware.spec.ts | passes |
| A4 | Tenant header enforced | TenantContextGuard.spec.ts | passes |
| A5 | Pact contracts published for downstream consumers | property-service, billing-service, notification-service, reservation-service have Pact files | Pact broker green |
| A6 | Error responses follow MELMASTOON.<DOMAIN>.<CODE> | error-response.spec.ts | passes |
| A7 | Rate limit configuration documented | API_CONTRACTS §11 | per-endpoint table reviewed |
| A8 | Auth scopes and roles documented | API_CONTRACTS §3, SECURITY_MODEL §5 | reviewed |
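
The A6 naming rule can be expressed as a small validator. This is a sketch under assumptions: the allowed character set (uppercase letters, digits, underscores) is a guess, and `NOT_FOUND` in the usage note is a hypothetical code rather than one taken from ERROR_CODES.md.

```typescript
// Shape from the checklist: MELMASTOON.<DOMAIN>.<CODE>.
// Assumption: DOMAIN and CODE are upper snake case.
const PLATFORM_ERROR_CODE = /^MELMASTOON\.[A-Z][A-Z0-9_]*\.[A-Z][A-Z0-9_]*$/;

function isPlatformErrorCode(code: string): boolean {
  return PLATFORM_ERROR_CODE.test(code);
}

// Builds a code in the FILE domain and fails fast on malformed input.
function fileErrorCode(code: string): string {
  const full = `MELMASTOON.FILE.${code}`;
  if (!isPlatformErrorCode(full)) {
    throw new Error(`malformed error code: ${full}`);
  }
  return full;
}
```

For example, `fileErrorCode("NOT_FOUND")` yields a string in the `MELMASTOON.FILE.*` namespace, while `isPlatformErrorCode("FILE.NOT_FOUND")` is false because the platform prefix is missing.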
4. Event readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| E1 | Outbox + relay deployed | worker:relay job in helm chart | helm get values |
| E2 | Inbox + dedupe verified | outbox.spec.ts, inbox.spec.ts | passes |
| E3 | Event JSON Schemas committed | events/file-storage/*.json schemas | repo |
| E4 | Schema-compat CI gate | events:compatibility-check | passes |
| E5 | DLQ subscriptions present | terraform pubsub_dlq.tf per topic | applied |
| E6 | Consumer expectations documented | EVENT_SCHEMAS §11 | reviewed |
| E7 | Consumed events handled idempotently | inbox.spec covers each consumed topic | passes |
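
E2 and E7 both rest on the same idea: a consumed event is handled at most once per event id. The sketch below shows that idea with an in-memory set. `InboundEvent`, `InMemoryInbox` and the topic name in the usage note are hypothetical, and a real inbox would presumably persist the key in the same database transaction as the handler's side effects.

```typescript
// Hypothetical event envelope; field names are assumptions.
interface InboundEvent {
  eventId: string;
  topic: string;
  payload: unknown;
}

class InMemoryInbox {
  private readonly seen = new Set<string>();

  // Returns true if the handler ran, false if the event was a duplicate.
  handleOnce(event: InboundEvent, handler: (e: InboundEvent) => void): boolean {
    const key = `${event.topic}:${event.eventId}`;
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    try {
      handler(event);
    } catch (err) {
      // Forget the key so a redelivery can retry the failed event.
      this.seen.delete(key);
      throw err;
    }
    return true;
  }
}
```

Delivering the same event twice, for instance on a hypothetical `file.uploaded.v1` topic, runs the handler once. A handler that throws leaves the event eligible for redelivery.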
5. Data readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| DA1 | DDL committed under migrations/ | initial migration applied to staging | kysely_migrations_log |
| DA2 | RLS enabled and forced on all tenant tables | tenant-isolation.spec.ts covers every table | passes |
| DA3 | ID prefix conventions enforced via CHECK | DDL includes CHECK (id LIKE 'med_%') etc. | DDL review |
| DA4 | Indexes for hot queries present | DATA_MODEL §6 | EXPLAIN review |
| DA5 | Backups and PITR configured | Cloud SQL automated backups + 7d PITR | screenshot in DR runbook |
| DA6 | BigQuery export wired | Datastream pipeline live, freshness < 5 min | dashboard |
| DA7 | CMEK applied to private bucket | Storage > Bucket > Encryption shows CMEK | screenshot |
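
One detail worth checking during the DA3 DDL review: in SQL `LIKE`, `_` matches any single character, so a pattern such as `'med_%'` also accepts ids like `medX123`. Assuming the database is PostgreSQL (the document mentions Cloud SQL and RLS but does not name the engine), the default escape character is a backslash, so `'med\_%'` matches a literal underscore. The sketch below emulates `LIKE` in TypeScript to show the difference; `likeToRegExp` is an illustrative helper, not part of the service.

```typescript
// Escape characters that are special inside a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Translate a SQL LIKE pattern (backslash as escape character) to a RegExp.
function likeToRegExp(pattern: string): RegExp {
  let out = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "\\" && i + 1 < pattern.length) {
      i += 1;
      out += escapeRegExp(pattern.charAt(i)); // escaped char is literal
    } else if (ch === "%") {
      out += "[\\s\\S]*"; // any sequence of characters
    } else if (ch === "_") {
      out += "[\\s\\S]"; // exactly one character
    } else {
      out += escapeRegExp(ch);
    }
  }
  return new RegExp(`^${out}$`);
}

const loose = likeToRegExp("med_%");    // underscore acts as a wildcard
const strict = likeToRegExp("med\\_%"); // underscore is literal
```

Here `loose` accepts both `med_01H` and `medX01H`, whereas `strict` accepts only the first. Whether the committed DDL escapes the underscore is something only the DDL review can confirm.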
6. Security readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| S1 | Secrets in Secret Manager only | grep -r 'process.env.*SECRET' finds no hardcoded secret values | scan |
| S2 | mTLS enforced for service-to-service | mesh policy STRICT | mesh config |
| S3 | Per-tenant prefix invariant has 3 layers | DOMAIN_MODEL §6, persistence CHECK, GCS Conditions | code review |
| S4 | Signed URLs scoped + revocable | SECURITY_MODEL §6 | code review |
| S5 | Quarantined files cannot be downloaded | FsmGuard.spec.ts | passes |
| S6 | DSR / GDPR erasure E2E test | dsr-erasure.e2e.spec.ts | passes |
| S7 | Security bounty hunter scan run | report attached to release | report |
| S8 | Dependency scan clean (no high CVEs) | pnpm audit --prod, Snyk green | report |
| S9 | Threat model reviewed by security-reviewer | comment on PR / Linear issue | link |
| S10 | EXIF scrub verified | unit + e2e | passes |
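
S5 is the kind of rule that is safest as a default-deny guard. The sketch below is an assumption about how such a guard could look: the state names are hypothetical (the real FSM is defined in the domain model), and allowing downloads only in the `clean` state is stricter than the checklist wording, which names only quarantined files.

```typescript
// Hypothetical file states; not taken from the service's FSM.
type FileState = "uploaded" | "scanning" | "clean" | "quarantined";

// Default-deny: only files that passed the scan may be downloaded.
function assertDownloadable(state: FileState): void {
  if (state !== "clean") {
    throw new Error(`file in state "${state}" cannot be downloaded`);
  }
}
```

With this shape, adding a new state later does not open a download path by accident, because anything other than `clean` is refused.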
7. Observability readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| O1 | /healthz and /readyz implemented | curl returns 200 | check |
| O2 | /metrics exposes documented Prometheus metrics | OBSERVABILITY §3 | scraped in dev |
| O3 | OpenTelemetry traces emitted with required attributes | OBSERVABILITY §4 | trace example |
| O4 | Structured logs (pino JSON) with required fields | sample log line | review |
| O5 | Dashboards published in Grafana | OBSERVABILITY §5 | dashboard URL |
| O6 | SLOs published and alert rules deployed | OBSERVABILITY §2, §7 | terraform applied |
| O7 | PagerDuty rotation configured | DEPLOYMENT_TOPOLOGY §6 | PagerDuty schedule |
| O8 | Audit logs export to BigQuery | OBSERVABILITY §6 | sample query |
8. Failure handling readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| F1 | Failure modes catalog complete | FAILURE_MODES exists | this doc |
| F2 | Each failure has a runbook URL | FAILURE_MODES table column | runbooks repo |
| F3 | Compensating actions for sagas implemented | APPLICATION_LOGIC §8 | code |
| F4 | DLQ alerts wired | OBSERVABILITY §7 alert FileStorage_DLQGrowth | alert active |
| F5 | Game day drill executed (region failover) | drill report | report |
| F6 | Quarantine override flow exercised | manual run-through | runbook |
| F7 | Erasure failure path exercised | replay test | report |
9. Sync (desktop) readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| SY1 | SYNC_CONTRACT exists and matches code | SYNC_CONTRACT.md | doc |
| SY2 | Read-through cache TTL behavior verified | desktop integration test | passes |
| SY3 | Offline outbox queue tested | low-bandwidth e2e | passes |
| SY4 | Resumable upload survives reconnect | desktop chaos test | passes |
| SY5 | Renderer never bypasses BFF auth | code review (no direct GCS bytes from renderer except resumable session URL it received from API) | review |
10. Testing readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| T1 | Unit coverage ≥ 85% lines / 85% branches in domain | coverage report | CI artifact |
| T2 | Integration tests cover all use cases | integration coverage > 80% | report |
| T3 | Mandatory tests present: tenant-isolation, outbox, inbox | TESTING_STRATEGY §3 | passes |
| T4 | Pact contracts in CI for all consumers | broker green | broker URL |
| T5 | E2E happy path covers upload→scan→optimize→download | e2e/upload-flow.spec.ts | passes |
| T6 | Performance baseline collected | perf/baseline.json checked in | file |
| T7 | Chaos test executed (region brownout) | report | report |
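
The T1 and T2 thresholds differ in one detail: T1 is inclusive (≥ 85%) and T2 is exclusive (> 80%). A minimal sketch of a gate that respects both, assuming a hypothetical `CoverageSummary` shape with percentages from 0 to 100:

```typescript
// Hypothetical summary shape; real coverage reports vary by tool.
interface CoverageSummary {
  lines: number;    // percent, 0..100
  branches: number; // percent, 0..100
}

// T1: at least 85% lines and 85% branches in the domain layer.
function meetsDomainUnitCoverage(c: CoverageSummary): boolean {
  return c.lines >= 85 && c.branches >= 85;
}

// T2: integration coverage strictly above 80%.
function meetsIntegrationCoverage(percent: number): boolean {
  return percent > 80;
}
```

A domain report of exactly 85% lines and 85% branches passes T1, while an integration figure of exactly 80% does not pass T2.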
11. Deployment readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| DE1 | IaC committed (terraform + helm) | infra/terraform/file-storage/ | repo |
| DE2 | Pipelines pass on develop and main | CI dashboard | green |
| DE3 | Image SBOM generated and signed (cosign) | provenance attestation | artifact |
| DE4 | Canary + auto-rollback enabled | DEPLOYMENT_TOPOLOGY §10 | config |
| DE5 | Blue/green capable for major changes | doc | doc |
| DE6 | Resource budgets set (CPU/memory limits) | k8s/Cloud Run config | config |
| DE7 | Network policies / VPC SC enforced | terraform | applied |
12. Compliance readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| CO1 | DPA template covers blob storage on GCS in EU | legal | DPA appendix |
| CO2 | DPIA performed for pii_id_scan scope | legal sign-off | DPIA doc |
| CO3 | GDPR DSR runbook exists | runbook | runbook |
| CO4 | Audit log retention 7 y enforced | bucket lock | screenshot |
| CO5 | Tax invoice retention 10 y enforced | retention policies in DB | dump |
| CO6 | Data residency stated and enforced | bucket region constraint | config |
13. Documentation readiness
| # | Item | Status check | Evidence |
|---|---|---|---|
| DC1 | All 17 service docs committed and current | services/file-storage-service/*.md | this directory |
| DC2 | Top-level summary in docs/03-microservices/file-storage-service.md | exists, ≥ 150 lines | file |
| DC3 | Runbooks linked from FAILURE_MODES exist | runbooks repo | review |
| DC4 | API consumer onboarding guide exists | docs/05-api-design.md includes example | doc |
| DC5 | ADRs reference file-storage where applicable | docs/architecture/ | search |
14. Sign-offs
| Role | Name | Date | Sign-off |
|---|---|---|---|
| Service tech lead | | | |
| Database reviewer | | | |
| Security reviewer | | | |
| SRE / on-call lead | | | |
| Product owner | | | |
| Compliance / DPO | | | |
| Platform architecture | | | |
A promotion (Pilot → GA) requires a sign-off in every row. Defects discovered after sign-off must either be fixed or be accepted in SERVICE_RISK_REGISTER with a remediation deadline.
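
The sign-off rule is simple enough to check mechanically. The sketch below lists the roles from the table and reports which ones still block promotion; the `SignOff` shape and the function name are assumptions.

```typescript
// Roles copied from the sign-off table; the SignOff shape is hypothetical.
const REQUIRED_ROLES = [
  "Service tech lead",
  "Database reviewer",
  "Security reviewer",
  "SRE / on-call lead",
  "Product owner",
  "Compliance / DPO",
  "Platform architecture",
] as const;

interface SignOff {
  role: string;
  name: string;
  date: string; // ISO date string
  signedOff: boolean;
}

// Returns the roles that still block promotion: row missing, name or date
// blank, or sign-off not given.
function missingSignOffs(rows: SignOff[]): string[] {
  return REQUIRED_ROLES.filter((role) => {
    const row = rows.find((r) => r.role === role);
    return !row || !row.signedOff || row.name.trim() === "" || row.date.trim() === "";
  });
}
```

An empty result means every row is complete; any returned role names the sign-off that is still outstanding.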
15. GA exit criteria (from Pilot)
In addition to the above:
- Pilot tenant has been live ≥ 4 weeks with no P1 incident attributable to file-storage.
- SLO error budget for the Pilot period is ≥ 50% unburned.
- All Pilot postmortem actions are closed or accepted with mitigation.
- Two unannounced game days passed (region failover, ClamAV outage).
- Cost per file (GCS + compute + AI) within ±20% of forecast.
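
The five criteria above are all numeric or countable, so they can be evaluated from a single report. This is a sketch under assumptions: the `PilotReport` field names are hypothetical, and the error budget is expressed as a fraction between 0 and 1.

```typescript
// Hypothetical report shape; field names are not from the service.
interface PilotReport {
  weeksLive: number;
  p1IncidentsAttributed: number;     // P1 incidents attributable to file-storage
  errorBudgetRemaining: number;      // fraction of the budget unburned, 0..1
  openPostmortemActions: number;     // neither closed nor accepted with mitigation
  unannouncedGameDaysPassed: number;
  costPerFileActual: number;
  costPerFileForecast: number;
}

// Returns one message per unmet criterion; an empty array means GA-ready.
function gaExitFailures(r: PilotReport): string[] {
  const failures: string[] = [];
  if (r.weeksLive < 4) failures.push("pilot live for less than 4 weeks");
  if (r.p1IncidentsAttributed > 0) failures.push("P1 incident attributable to file-storage");
  if (r.errorBudgetRemaining < 0.5) failures.push("less than 50% of error budget unburned");
  if (r.openPostmortemActions > 0) failures.push("open postmortem actions");
  if (r.unannouncedGameDaysPassed < 2) failures.push("fewer than 2 unannounced game days passed");
  if (r.costPerFileForecast <= 0) {
    failures.push("cost forecast missing or invalid");
  } else {
    const deviation = Math.abs(r.costPerFileActual - r.costPerFileForecast) / r.costPerFileForecast;
    if (deviation > 0.2) failures.push("cost per file outside ±20% of forecast");
  }
  return failures;
}
```

Returning the list of unmet criteria, rather than a single boolean, gives the sign-off meeting something concrete to act on.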
16. Recovery posture (always-on after L2)
| Property | Target |
|---|---|
| RTO (regional) | ≤ 30 min |
| RPO | ≤ 5 min |
| Backups verified | weekly restore drill into staging |
| Last DR drill | (filled at sign-off) |
| Last cross-tenant audit | (filled at sign-off) |
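
The recovery targets can be compared against drill results in the same way. In this sketch the `DrillResult` shape is hypothetical, and "weekly restore drill" is read as "the last verified restore is at most 7 days old", which is an interpretation rather than something the table states.

```typescript
// Targets from the recovery posture table.
const RTO_MINUTES = 30;
const RPO_MINUTES = 5;
const RESTORE_DRILL_MAX_AGE_DAYS = 7; // assumption for "weekly"

// Hypothetical drill record.
interface DrillResult {
  recoveryMinutes: number; // time until the service was restored
  dataLossMinutes: number; // window of writes that could not be recovered
  lastRestoreDrill: Date;  // last verified restore into staging
}

// Returns one message per missed target; empty means the posture holds.
function recoveryPostureGaps(d: DrillResult, now: Date): string[] {
  const gaps: string[] = [];
  if (d.recoveryMinutes > RTO_MINUTES) {
    gaps.push(`RTO exceeded: ${d.recoveryMinutes} min`);
  }
  if (d.dataLossMinutes > RPO_MINUTES) {
    gaps.push(`RPO exceeded: ${d.dataLossMinutes} min`);
  }
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageDays = (now.getTime() - d.lastRestoreDrill.getTime()) / msPerDay;
  if (ageDays > RESTORE_DRILL_MAX_AGE_DAYS) {
    gaps.push("restore drill older than 7 days");
  }
  return gaps;
}
```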
17. Continuous readiness
After GA, the service tech lead revalidates this document quarterly. Any "no" answer downgrades the readiness level until it is remediated, and a Linear issue is opened automatically, tagged service:file-storage,readiness.
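
The revalidation rule can be sketched as a pure function that returns the new level and a draft issue. Several points here are assumptions: the document does not say how far a downgrade goes (one level is assumed), the tag string is read as two labels, and the sketch only builds the issue data without calling any Linear API.

```typescript
// Hypothetical checklist answer; id is an item number such as "DA2" or "S5".
interface ChecklistAnswer {
  id: string;
  ok: boolean;
}

interface RevalidationResult {
  level: number; // 0..4, matching L0..L4
  issue: { title: string; labels: string[] } | null;
}

// Assumption: any failed item drops the level by exactly one step.
function revalidate(currentLevel: number, answers: ChecklistAnswer[]): RevalidationResult {
  const failed = answers.filter((a) => !a.ok).map((a) => a.id);
  if (failed.length === 0) {
    return { level: currentLevel, issue: null };
  }
  return {
    level: Math.max(0, currentLevel - 1),
    issue: {
      title: `Readiness revalidation failed: ${failed.join(", ")}`,
      labels: ["service:file-storage", "readiness"],
    },
  };
}
```

Keeping the function free of side effects leaves the choice of issue tracker client, and the exact downgrade policy, to the caller.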