Skip to main content

file-storage-service — SERVICE_READINESS

Companion: SERVICE_OVERVIEW · DEPLOYMENT_TOPOLOGY · SERVICE_RISK_REGISTER · DoD

This is the gate that decides whether the service is allowed into a given environment (dev, staging, pilot, GA). Each environment progressively raises the bar. This document is the checklist on the door: it must be satisfied, with evidence, before promotion.

1. Readiness levels

LevelAudienceStable allowed?SLO published?On-call?
L0 — Devengineers, integration testsnon/ano
L1 — Staginginternal demo, QAnoshadowbest-effort
L2 — Pilotfirst paying tenant(s)yes (limited)yesyes (24×5)
L3 — GAall tenantsyesyesyes (24×7)
L4 — Matureregulated workloads, multi-regionyestightenedyes (24×7 + DR drills)

Today, file-storage-service targets L2 → L3 transition for Phase 1 GA.

2. Domain readiness

#ItemStatus checkEvidence
D1Bounded context documentedSERVICE_OVERVIEW exists, DOMAIN_MODEL existsservices/file-storage-service/SERVICE_OVERVIEW.md, DOMAIN_MODEL.md
D2Aggregates and invariants enumeratedDOMAIN_MODEL §3-§8code references to invariant guards
D3Domain errors mapped to platform error codesDOMAIN_MODEL §10, ERROR_CODES.mdMELMASTOON.FILE.* registered
D4Tenant prefix invariant testedtenant-isolation.spec.ts includes ObjectKey checkunit + integration test green
D5Domain events versioned and registeredevents/file-storage/REGISTRY.md lists all .v1 topicsCI check passes

3. API readiness

#ItemStatus checkEvidence
A1OpenAPI committed and lint-cleanopenapi/v1.yaml, spectral 0 errorsCI artifact
A2All endpoints implementedAPI_CONTRACTS endpoints have unit + integration coveragecoverage report
A3Idempotency middleware appliedunit IdempotencyMiddleware.spec.tspasses
A4Tenant header enforcedTenantContextGuard.spec.tspasses
A5Pact contracts published for downstream consumersproperty-service, billing-service, notification-service, reservation-service have Pact filesPact broker green
A6Error responses follow MELMASTOON.<DOMAIN>.<CODE>error-response.spec.tspasses
A7Rate limit configuration documentedAPI_CONTRACTS §11per-endpoint table reviewed
A8Auth scopes and roles documentedAPI_CONTRACTS §3, SECURITY_MODEL §5reviewed

4. Event readiness

#ItemStatus checkEvidence
E1Outbox + relay deployedworker:relay job in helm charthelm get values
E2Inbox + dedupe verifiedoutbox.spec.ts, inbox.spec.tspasses
E3Event JSON Schemas committedevents/file-storage/*.json schemasrepo
E4Schema-compat CI gateevents:compatibility-checkpasses
E5DLQ subscriptions presentterraform pubsub_dlq.tf per topicapplied
E6Consumer expectations documentedEVENT_SCHEMAS §11reviewed
E7Consumed events handled idempotentlyinbox.spec covers each consumed topicpasses

5. Data readiness

#ItemStatus checkEvidence
DA1DDL committed under migrations/initial migration applied to stagingkysely_migrations_log
DA2RLS enabled and forced on all tenant tablestenant-isolation.spec.ts covers every tablepasses
DA3ID prefix conventions enforced via CHECKDDL includes CHECK (id LIKE 'med_%') etc.DDL review
DA4Indexes for hot queries presentDATA_MODEL §6EXPLAIN review
DA5Backups and PITR configuredCloud SQL automated backups + 7d PITRscreenshot in DR runbook
DA6BigQuery export wiredDatastream pipeline live, freshness < 5 mindashboard
DA7CMEK applied to private bucketStorage > Bucket > Encryption shows CMEKscreenshot

6. Security readiness

#ItemStatus checkEvidence
S1Secrets in Secret Manager onlygrep -r 'process.env.*SECRET' returns 0 hardcodedscan
S2mTLS enforced for service-to-servicemesh policy STRICTmesh config
S3Per-tenant prefix invariant has 3 layersDOMAIN_MODEL §6, persistence CHECK, GCS Conditionscode review
S4Signed URLs scoped + revocableSECURITY_MODEL §6code review
S5Quarantined files cannot be downloadedFsmGuard.spec.tspasses
S6DSR / GDPR erasure E2E testdsr-erasure.e2e.spec.tspasses
S7Security bounty hunter scan runreport attached to releasereport
S8Dependency scan clean (no high CVEs)pnpm audit --prod, Snyk greenreport
S9Threat model reviewed by security-reviewercomment on PR / Linear issuelink
S10EXIF scrub verifiedunit + e2epasses

7. Observability readiness

#ItemStatus checkEvidence
O1/healthz and /readyz implementedcurl returns 200check
O2/metrics exposes documented Prometheus metricsOBSERVABILITY §3scraped in dev
O3OpenTelemetry traces emitted with required attributesOBSERVABILITY §4trace example
O4Structured logs (pino JSON) with required fieldssample log linereview
O5Dashboards published in GrafanaOBSERVABILITY §5dashboard URL
O6SLOs published and alert rules deployedOBSERVABILITY §2, §7terraform applied
O7PagerDuty rotation configuredDEPLOYMENT_TOPOLOGY §6PagerDuty schedule
O8Audit logs export to BigQueryOBSERVABILITY §6sample query

8. Failure handling readiness

#ItemStatus checkEvidence
F1Failure modes catalog completeFAILURE_MODES existsthis doc
F2Each failure has a runbook URLFAILURE_MODES table columnrunbooks repo
F3Compensating actions for sagas implementedAPPLICATION_LOGIC §8code
F4DLQ alerts wiredOBSERVABILITY §7 alert FileStorage_DLQGrowthalert active
F5Game day drill executed (region failover)drill reportreport
F6Quarantine override flow exercisedmanual run-throughrunbook
F7Erasure failure path exercisedreplay testreport

9. Sync (desktop) readiness

#ItemStatus checkEvidence
SY1SYNC_CONTRACT exists and matches codeSYNC_CONTRACT.mddoc
SY2Read-through cache TTL behavior verifieddesktop integration testpasses
SY3Offline outbox queue testedlow-bandwidth e2epasses
SY4Resumable upload survives reconnectdesktop chaos testpasses
SY5Renderer never bypasses BFF authcode review (no direct GCS bytes from renderer except resumable session URL it received from API)review

10. Testing readiness

#ItemStatus checkEvidence
T1Unit coverage ≥ 85% lines / 85% branches in domaincoverage reportCI artifact
T2Integration tests cover all use casesintegration coverage > 80%report
T3Mandatory tests present: tenant-isolation, outbox, inboxTESTING_STRATEGY §3passes
T4Pact contracts in CI for all consumersbroker greenbroker URL
T5E2E happy path covers upload→scan→optimize→downloade2e/upload-flow.spec.tspasses
T6Performance baseline collectedperf/baseline.json checked infile
T7Chaos test executed (region brownout)reportreport

11. Deployment readiness

#ItemStatus checkEvidence
DE1IaC committed (terraform + helm)infra/terraform/file-storage/repo
DE2Pipelines pass on develop and mainCI dashboardgreen
DE3Image SBOM generated and signed (cosign)provenance attestationartifact
DE4Canary + auto-rollback enabledDEPLOYMENT_TOPOLOGY §10config
DE5Blue/green capable for major changesdocdoc
DE6Resource budgets set (CPU/memory limits)k8s/Cloud Run configconfig
DE7Network policies / VPC SC enforcedterraformapplied

12. Compliance readiness

#ItemStatus checkEvidence
CO1DPA template covers blob storage on GCS in EUlegalDPA appendix
CO2DPIA performed for pii_id_scan scopelegal sign-offDPIA doc
CO3GDPR DSR runbook existsrunbookrunbook
CO4Audit log retention 7 y enforcedbucket lockscreenshot
CO5Tax invoice retention 10 y enforcedretention policies in DBdump
CO6Data residency stated and enforcedbucket region constraintconfig

13. Documentation readiness

#ItemStatus checkEvidence
DC1All 17 service docs committed and currentservices/file-storage-service/*.mdthis directory
DC2Top-level summary in docs/03-microservices/file-storage-service.mdexists, ≥ 150 linesfile
DC3Runbooks linked from FAILURE_MODES existrunbooks reporeview
DC4API consumer onboarding guide existsdocs/05-api-design.md includes exampledoc
DC5ADRs reference file-storage where applicabledocs/architecture/search

14. Sign-offs

RoleNameDateSign-off
Service tech lead
Database reviewer
Security reviewer
SRE / on-call lead
Product owner
Compliance / DPO
Platform architecture

A promotion (Pilot → GA) requires all rows ticked. Defects discovered post-sign-off must either be fixed or accepted in SERVICE_RISK_REGISTER with a remediation deadline.

15. GA exit criteria (from Pilot)

In addition to the above:

  • Pilot tenant has been live ≥ 4 weeks with no P1 incident attributable to file-storage.
  • SLO error budget for the Pilot period is ≥ 50% unburned.
  • All Pilot postmortem actions are closed or accepted with mitigation.
  • 2 unannounced game-days passed (region failover, ClamAV outage).
  • Cost per file (GCS + compute + AI) within ±20% of forecast.

16. Recovery posture (always-on after L2)

PropertyTarget
RTO (regional)30 min
RPO≤ 5 min
Backups verifiedweekly restore drill into staging
Last DR drill(filled at sign-off)
Last cross-tenant audit(filled at sign-off)

17. Continuous readiness

After GA, this document is revalidated quarterly by the service tech lead. Any "no" answer downgrades the readiness level until remediated, and an automatic Linear issue is opened tagged service:file-storage,readiness.