file-storage-service — Service Overview
Companion: Summary · Service Template · Naming · 02 Enterprise Architecture · 05 API Design · 06 Data Models · 07 Security & Tenancy
1. One-paragraph mission
file-storage-service is the single platform API for binary content in Ghasi Melmastoon. It encapsulates Google Cloud Storage behind a tenant-scoped, signed-URL workflow with virus scanning, image optimization, retention enforcement, GDPR erasure, audited access, and per-tenant key-prefix isolation. Producers interact only through this API and never touch GCS directly: property-service (photos), notification-service (PDF attachments), billing-service (invoice PDFs and receipt scans), reservation-service (guest ID scans), lock-integration-service (vendor reports), tenant-service (logos), and theme-config-service (theme assets). Consumers retrieve content through CDN-fronted public URLs (low-sensitivity assets) or per-request signed URLs (private assets). The service does not generate documents, does not decide how files are used, and does not own product metadata; it owns bytes, metadata about bytes, and the policies that govern bytes.
2. Bounded Context Position
| Field | Value |
|---|---|
| Bounded Context | Storage |
| Domain Type | Generic / supporting |
| Strategic intent | Reliable, audited, compliance-ready file lifecycle so every other service can offload binary handling to one place |
| Upstream contexts | Identity (caller auth), Tenant (quota + plan), AI Orchestrator (image safety + alt text + EXIF redaction), Lock Integration (vendor report uploads) |
| Downstream contexts | Property, Notification, Billing, Reservation, Theme Config, BFFs (consumer / tenant-booking / backoffice), Search Aggregation (cover image URLs) |
| Pattern with downstream | Open Host Service (signed-URL REST + Published Language events) |
| Shared kernel | TenantId, MediaRef, Sha256, Locale, AIProvenance, RetentionClass from @ghasi/contracts-melmastoon |
3. Capabilities
| Capability | Notes |
|---|---|
| Initiate signed upload | Returns short-lived signed PUT URL bound to a tenant-prefixed object key; chunked / resumable for ≥ 50 MB |
| Confirm upload | Verifies the object exists in GCS, records SHA-256 + bytes + content-type, dispatches scan + optimization |
| Abort upload | Cleans up partial upload; emits upload.failed.v1 |
| Issue signed download URL | Per-request, scoped to a single object, default 5-minute TTL (max 1 h); audited |
| Virus scan | Pluggable adapter (ClamAV self-hosted; Cloud DLP for PII content); passed / failed / inconclusive verdicts; quarantine on fail |
| Image optimization pipeline | Cloud Run job triggered by melmastoon.file.upload.completed.v1; produces WebP + AVIF in thumb (320 px) / hero (1280 px) / full (1920 px) |
| Video transcoding | Phase 3; HLS + thumbnail extract for tutorial / vendor reports |
| Retention policies | Named policy applied at upload (pii_id_scan, tax_compliance, vendor_report_12m, theme_asset, default); enforced by sweeper |
| GDPR erasure | Cascading hard-delete by guest ID, tenant ID, or property scope; signed audit certificate emitted |
| Access audit | Every signed-URL issuance and every successful access are logged by joining Cloud Storage access logs with the access_grants table |
| Per-tenant quota | Bytes + object count caps per tenant; alert at 80 % / 95 %; block at 100 % |
| CDN cache integration | Cloud CDN in front of public bucket; invalidation on delete + variant publish |
| Duplicate detection | SHA-256 dedupe per (tenant_id, scope); second upload becomes an alias to the first object |
| Soft-delete + hard-delete pipeline | archived for 30 days, then purged unless retention policy says longer |
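The duplicate-detection capability above can be illustrated with a minimal in-memory sketch. It assumes the dedupe key is the tuple (tenant_id, scope, SHA-256 of the bytes); `DedupeIndex`, `StoredObject`, and the scope name `property_photo` are hypothetical names used only for illustration, and a real implementation would back the index with the database or cache rather than a `Map`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical result shape; the source only says the second upload
// "becomes an alias to the first object".
interface StoredObject {
  objectId: string;
  aliasOf?: string; // set when this upload duplicates an earlier object
}

// Hex-encoded SHA-256 digest of the uploaded bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

class DedupeIndex {
  // key: `${tenantId}|${scope}|${sha256}` -> id of the first object seen
  private readonly firstSeen = new Map<string, string>();

  register(
    tenantId: string,
    scope: string,
    objectId: string,
    bytes: Uint8Array,
  ): StoredObject {
    const key = `${tenantId}|${scope}|${sha256Hex(bytes)}`;
    const existing = this.firstSeen.get(key);
    if (existing !== undefined) {
      // Same bytes already stored for this tenant and scope: record an alias.
      return { objectId, aliasOf: existing };
    }
    this.firstSeen.set(key, objectId);
    return { objectId };
  }
}
```

Because the tenant ID is part of the key, identical bytes uploaded by two tenants never alias each other, which keeps dedupe from crossing the tenant boundary.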
4. Non-Capabilities (explicitly out of scope)
| Capability | Owned by |
|---|---|
| Generating invoices, PDFs, or notification bodies | billing-service, notification-service |
| Photo curation (alt text, ordering, hero selection) | property-service, theme-config-service |
| Theme tokens and brand kit | theme-config-service |
| Guest identity and ID document parsing | iam-service, reservation-service |
| Cross-tenant search / discoverability of files | search-aggregation-service |
| Live face / liveness detection on ID scans | future kyc-service |
| ClamAV / DLP cluster operation (the worker layer) | platform infra (this service holds the adapter, not the cluster) |
5. Architecture (Clean / Hexagonal)
file-storage-service/
└── src/
├── domain/ # pure: aggregates, VOs, domain events, invariants
│ ├── file-object/
│ │ ├── FileObject.ts # FileObject aggregate root
│ │ ├── FileStatus.ts # VO enum + transitions
│ │ ├── ContentType.ts # VO with allow-list per scope
│ │ ├── Sha256.ts # VO with format validation
│ │ ├── ByteSize.ts # VO with budget per scope
│ │ └── events/ # FileInitiated, FileUploaded, FileQuarantined, FileDeleted
│ ├── upload-session/
│ │ ├── UploadSession.ts
│ │ ├── UploadSessionStatus.ts
│ │ └── events/
│ ├── variant/
│ │ ├── Variant.ts # Image/video derivative
│ │ ├── VariantPreset.ts # 'thumb' | 'hero' | 'full' | 'hls_720p'
│ │ └── events/
│ ├── scan-result/
│ │ ├── ScanResult.ts
│ │ ├── ScanVerdict.ts # 'passed' | 'failed' | 'inconclusive'
│ │ └── events/
│ ├── retention-policy/
│ │ ├── RetentionPolicy.ts
│ │ └── RetentionClass.ts # 'operational' | 'regulated' | 'audit'
│ ├── bucket/
│ │ ├── Bucket.ts # logical bucket: { gcsBucket, prefix, dataClass }
│ │ └── DataClass.ts # 'public_media' | 'private' | 'archive'
│ ├── access-grant/
│ │ ├── AccessGrant.ts # signed-URL issuance audit row
│ │ └── events/
│ └── shared/
│ ├── FileObjectId.ts # branded `med_…`
│ ├── UploadSessionId.ts # `ups_…`
│ ├── VariantId.ts # `var_…`
│ ├── ScanResultId.ts # `scn_…`
│ ├── RetentionPolicyId.ts # `ret_…`
│ ├── BucketId.ts # `bkt_…`
│ ├── AccessGrantId.ts # `grt_…`
│ └── errors/
├── application/ # use cases, ports, CQRS handlers
│ ├── ports/
│ │ ├── BlobStoragePort.ts # signed URL, head, copy, delete, list
│ │ ├── ScanPort.ts # request virus scan
│ │ ├── ImageOptimizerPort.ts # request variant build
│ │ ├── VideoTranscoderPort.ts # phase 3
│ │ ├── CdnInvalidationPort.ts
│ │ ├── KmsPort.ts # CMEK lookup for private bucket
│ │ ├── EventPublisher.ts # outbox publisher
│ │ ├── FileObjectRepository.ts
│ │ ├── UploadSessionRepository.ts
│ │ ├── VariantRepository.ts
│ │ ├── ScanResultRepository.ts
│ │ ├── AccessGrantRepository.ts
│ │ ├── RetentionPolicyRepository.ts
│ │ ├── QuotaPort.ts # current usage + cap
│ │ ├── AIClient.ts # safety classify, alt text, OCR redact
│ │ └── Clock.ts
│ ├── commands/ # one file per command + handler
│ │ ├── initiate-upload.use-case.ts
│ │ ├── confirm-upload.use-case.ts
│ │ ├── abort-upload.use-case.ts
│ │ ├── issue-download-url.use-case.ts
│ │ ├── delete-file.use-case.ts
│ │ ├── restore-file.use-case.ts
│ │ ├── apply-retention.use-case.ts
│ │ ├── erase-by-guest.use-case.ts
│ │ └── erase-by-tenant.use-case.ts
│ ├── queries/
│ │ ├── get-file-metadata.query.ts
│ │ ├── list-variants.query.ts
│ │ ├── get-quota.query.ts
│ │ └── get-access-log.query.ts
│ └── policies/ # quota check, scan-passed gate, prefix invariant
├── infrastructure/ # adapters
│ ├── postgres/
│ │ ├── FileObjectRepositoryPg.ts
│ │ ├── UploadSessionRepositoryPg.ts
│ │ ├── VariantRepositoryPg.ts
│ │ ├── ScanResultRepositoryPg.ts
│ │ ├── AccessGrantRepositoryPg.ts
│ │ ├── RetentionPolicyRepositoryPg.ts
│ │ ├── QuotaRepositoryPg.ts
│ │ ├── OutboxRepositoryPg.ts
│ │ ├── InboxRepositoryPg.ts
│ │ └── tenant-context.ts # SET LOCAL app.tenant_id
│ ├── gcs/
│ │ └── GcsBlobStorageAdapter.ts # uses @google-cloud/storage; signed V4 URLs
│ ├── scan/
│ │ ├── ClamAvScanAdapter.ts # via internal HTTP scan worker
│ │ └── CloudDlpScanAdapter.ts # for PII / sensitive scopes
│ ├── optimizer/
│ │ ├── PubSubImageOptimizerAdapter.ts # publishes job to Cloud Run optimizer
│ │ └── PubSubVideoTranscoderAdapter.ts
│ ├── cdn/
│ │ └── CloudCdnInvalidationAdapter.ts
│ ├── kms/
│ │ └── GcpKmsAdapter.ts
│ ├── pubsub/
│ │ ├── EventPublisherPubSub.ts
│ │ └── consumers/ # tenant.deleted, guest.erasure_requested, property.photo.removed
│ ├── ai/
│ │ └── AIClientHttpAdapter.ts
│ └── cache/
│ └── SignedUrlCacheRedis.ts
└── presentation/ # controllers, DTOs, OpenAPI
├── http/
│ ├── UploadsController.ts
│ ├── FilesController.ts
│ ├── DownloadsController.ts
│ ├── QuotasController.ts
│ ├── ErasureController.ts
│ ├── InternalCallbacksController.ts # /internal/v1/files/scan-callback, optimize-callback
│ └── HealthController.ts
└── dto/
├── InitiateUploadDto.ts
├── ConfirmUploadDto.ts
├── DownloadUrlRequestDto.ts
├── FileObjectDto.ts
└── VariantDto.ts
Dependency rule: presentation → application → domain, and infrastructure → application (adapters implement ports). domain imports nothing outside itself except the shared-kernel VOs from @ghasi/contracts-melmastoon. The GCS SDK lives only in infrastructure/gcs/; the rest of the code talks to BlobStoragePort.
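A minimal sketch of the dependency rule, assuming a reduced `BlobStoragePort` with two illustrative methods (the real port is described only as "signed URL, head, copy, delete, list"). `InMemoryBlobStorageAdapter`, the method names, and the `storage.invalid` URL are hypothetical stand-ins; the point is that the use case sees the port and never the GCS SDK.

```typescript
// application/ports: the only blob-storage surface a use case depends on.
// Method names and signatures are assumptions for illustration.
interface BlobStoragePort {
  signedUploadUrl(objectKey: string, ttlSeconds: number): Promise<string>;
  head(objectKey: string): Promise<{ bytes: number } | null>;
}

// infrastructure: a fake adapter standing in for GcsBlobStorageAdapter.
class InMemoryBlobStorageAdapter implements BlobStoragePort {
  private readonly objects = new Map<string, number>();

  async signedUploadUrl(objectKey: string, ttlSeconds: number): Promise<string> {
    // Not a real signature; a placeholder URL that encodes the inputs.
    return `https://storage.invalid/${objectKey}?ttl=${ttlSeconds}`;
  }

  async head(objectKey: string): Promise<{ bytes: number } | null> {
    const bytes = this.objects.get(objectKey);
    return bytes === undefined ? null : { bytes };
  }
}

// application/commands: depends on the port, so any adapter can be injected.
class InitiateUpload {
  private readonly blobs: BlobStoragePort;

  constructor(blobs: BlobStoragePort) {
    this.blobs = blobs;
  }

  async execute(
    tenantId: string,
    fileName: string,
  ): Promise<{ objectKey: string; uploadUrl: string }> {
    const objectKey = `tenants/${tenantId}/${fileName}`; // mandatory tenant prefix
    const uploadUrl = await this.blobs.signedUploadUrl(objectKey, 10 * 60); // 10 min upload TTL
    return { objectKey, uploadUrl };
  }
}
```

Swapping the in-memory adapter for a GCS-backed one would require no change to `InitiateUpload`, which is the property the dependency rule is meant to protect.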
6. Tech Stack
| Layer | Choice |
|---|---|
| Language / runtime | TypeScript on Node 20 LTS |
| HTTP framework | NestJS (Fastify adapter) |
| ORM / DB driver | pg + kysely; migrations via node-pg-migrate |
| Validation | zod for DTOs and event payloads |
| Messaging | GCP Pub/Sub (transactional outbox) |
| Cache | Memorystore (Redis 7) — signed URL cache, dedupe lookup, quota counters |
| Blob store | Google Cloud Storage (signed V4 URLs, resumable uploads, CMEK on private bucket) |
| CDN | Google Cloud CDN |
| Image optimization | Cloud Run job (@ghasi/optimizer-worker) using sharp; triggered by Pub/Sub |
| Video transcoding (Phase 3) | Cloud Run + ffmpeg; HLS output |
| Virus scan | Self-hosted ClamAV cluster on GKE (HTTP wrapper) for general scopes; Cloud DLP for pii_* scopes |
| Crypto | KMS for CMEK key reference; crypto.subtle for SHA-256 verification |
| Logging | pino JSON |
| Tracing | OpenTelemetry → Cloud Trace |
| Metrics | OpenTelemetry → Cloud Monitoring |
7. SLOs
| SLI | Target |
|---|---|
| Initiate-upload p95 latency | ≤ 150 ms |
| Confirm-upload p95 latency | ≤ 250 ms |
| Issue-download-url p95 latency | ≤ 120 ms |
| Time from upload.completed.v1 → scan.passed.v1 (p95) | ≤ 15 s |
| Time from scan.passed.v1 → optimization.completed.v1 (p95) | ≤ 30 s for ≤ 5 MB images |
| CDN-fronted GET p95 (cache hit) | ≤ 80 ms |
| Availability (read) | 99.95 % monthly |
| Availability (write) | 99.9 % monthly |
| Cross-tenant access leak | 0 |
| Retention sweep miss (objects past TTL still readable) | 0 |
| Outbox publish lag (p95) | ≤ 2 s |
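To make the availability targets concrete, the sketch below converts a monthly availability percentage into an allowed-downtime budget. It assumes a 30-day month (the table does not define the measurement window beyond "monthly"), and `monthlyDowntimeBudgetMinutes` is a hypothetical helper name.

```typescript
// Allowed downtime per month for an availability target.
// The target is passed in basis points (99.95 % = 9995) so the subtraction
// is done on integers rather than on floating-point fractions.
function monthlyDowntimeBudgetMinutes(
  availabilityBp: number,
  daysInMonth: number = 30, // assumption: 30-day month
): number {
  const totalMinutes = daysInMonth * 24 * 60; // 43 200 for 30 days
  return (totalMinutes * (10_000 - availabilityBp)) / 10_000;
}

const readBudget = monthlyDowntimeBudgetMinutes(9995); // 21.6 minutes
const writeBudget = monthlyDowntimeBudgetMinutes(9990); // 43.2 minutes
```

Under that assumption the read path may be unavailable for about 21.6 minutes per month and the write path for about 43.2 minutes.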
8. Quotas / limits per tenant (defaults; overridable by plan)
| Resource | Default cap |
|---|---|
| Total bytes stored | 50 GB |
| Total objects | 100 000 |
| Max object size | 50 MB (chunked above 8 MB) |
| Max video size (Phase 3) | 500 MB |
| Per-photo MIME types | image/jpeg, image/png, image/webp, image/heic |
| Per-document MIME types | application/pdf |
| Per-ID-scan MIME types | image/jpeg, image/png, application/pdf |
| Signed-URL TTL (download) | 5 min default, 60 min max |
| Signed-URL TTL (upload) | 10 min |
| CDN cache TTL (public assets) | 8 h |
| Concurrent open upload sessions per user | 20 |
| Erasure request rate | 10 / min per tenant |
Plan-level overrides come from tenant-service via tenant.plan_changed.v1.
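The quota thresholds (alert at 80 % and 95 %, block at 100 %) and the download TTL limits (5 min default, 60 min max) can be sketched as two small pure functions. The names `quotaState` and `resolveDownloadTtl` are hypothetical, and clamping an over-long TTL request to the maximum (rather than rejecting it) is an assumption; the source does not say which behavior applies.

```typescript
type QuotaState = "ok" | "alert_80" | "alert_95" | "blocked";

// Classifies current usage against a cap (bytes or object count).
function quotaState(used: number, cap: number): QuotaState {
  if (cap <= 0) throw new RangeError("cap must be positive");
  if (used >= cap) return "blocked";
  // Cross-multiplied comparisons avoid fractional percentages.
  if (used * 100 >= cap * 95) return "alert_95";
  if (used * 100 >= cap * 80) return "alert_80";
  return "ok";
}

const DEFAULT_DOWNLOAD_TTL_S = 5 * 60; // 5 min default
const MAX_DOWNLOAD_TTL_S = 60 * 60; // 60 min max

// Resolves the TTL for a signed download URL.
// Assumption: requests above the maximum are clamped, not rejected.
function resolveDownloadTtl(requestedSeconds?: number): number {
  if (requestedSeconds === undefined) return DEFAULT_DOWNLOAD_TTL_S;
  if (!Number.isInteger(requestedSeconds) || requestedSeconds <= 0) {
    throw new RangeError("TTL must be a positive integer number of seconds");
  }
  return Math.min(requestedSeconds, MAX_DOWNLOAD_TTL_S);
}
```

For example, a tenant holding 40 GB against the default 50 GB cap sits exactly at the 80 % alert threshold, and a request for a two-hour download URL resolves to 3600 seconds.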
9. Risks Snapshot
| Risk | Mitigation |
|---|---|
| Cross-tenant signed URL leak | Mandatory tenant prefix in object key; signed URL bound to exact path; integration test asserts that a URL signed for one tenant cannot read another tenant's object |
| GCS object enumeration | Bucket uniformBucketLevelAccess=true; service account is the only IAM principal; no public list grants |
| Virus scanner bypass | Reads block on scan_pending; quarantine on fail; audit alert if scan SLO breached |
| Quarantine purge regret | Quarantined files retained 30 d for forensic export, then purged |
| Incomplete GDPR erasure (CDN-cached copies remain) | Erasure cascade includes synchronous CDN invalidation + audit certificate listing all object IDs purged |
| Long-tail of orphaned upload sessions | Cleanup job every 15 min |
| Optimizer worker hot-loop on a poison object | Per-object retry cap (5) → DLQ + alert |
| Retention sweeper falls behind | Lag SLO ≤ 1 h; alert fires at 30 min lag |
| CMEK key rotation breaks decrypts | Versioned KMS keys; old versions retained for ≥ retention horizon |
Full register: SERVICE_RISK_REGISTER.
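The tenant-prefix mitigation for cross-tenant leaks can be expressed as a guard that runs before any URL is signed. This is a sketch under stated assumptions: `tenantPrefix` and `assertKeyBelongsToTenant` are hypothetical names, and the allowed tenant-ID character set is an assumption chosen so that an ID can never contain a path separator.

```typescript
// Builds the mandatory key prefix `tenants/{tenantId}/`.
// Assumption: tenant IDs contain only letters, digits, '_' and '-'.
function tenantPrefix(tenantId: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) {
    throw new Error(`invalid tenant id: ${tenantId}`);
  }
  return `tenants/${tenantId}/`;
}

// Throws unless objectKey lives under the caller's tenant prefix and
// contains no empty, '.' or '..' path segments.
function assertKeyBelongsToTenant(tenantId: string, objectKey: string): void {
  const segments = objectKey.split("/");
  const hasBadSegment = segments.some((s) => s === "" || s === "." || s === "..");
  if (!objectKey.startsWith(tenantPrefix(tenantId)) || hasBadSegment) {
    throw new Error(`object key is outside tenant ${tenantId}: ${objectKey}`);
  }
}
```

The trailing slash in the prefix matters: without it, a key under `tenants/t10/` would pass a `startsWith` check for tenant `t1`.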
10. Definition of Ready / Done
- Ready (per story): AC, NFRs, OpenAPI delta, event schema delta, tenancy implications, retention class, AI provenance (if any), observability, runbook entry.
- Done (per story): tests in pyramid (unit, integration, contract, scan-bypass, prefix-isolation), tenant-isolation spec passes, outbox spec passes, OpenAPI lint passes, dashboards updated, ADR if cross-cutting, SECURITY review for any new bucket / IAM surface.
11. Glossary
| Term | Meaning |
|---|---|
| FileObject | Aggregate representing one logical file: GCS object pointer + metadata + status + retention class. |
| Bucket (logical) | A named pair of (gcsBucket, prefix) plus a DataClass. |
| UploadSession | Time-bound signed-URL grant, with chunk metadata for resumable uploads. |
| Variant | Derived artifact (e.g., hero.webp) produced by the optimizer. |
| ScanResult | Outcome of a virus / DLP scan on a FileObject. |
| RetentionPolicy | Named policy with class, minRetention, maxRetention, redactionAfter. |
| AccessGrant | Audit row recording a signed-URL issuance. |
| Quarantine | A FileObject in status quarantined; reads blocked, retained for forensic export then purged. |
| Tenant prefix | Mandatory tenants/{tenantId}/ GCS object key prefix; the tenant-isolation invariant. |
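As an illustration of how the RetentionPolicy entry interacts with the 30-day soft-delete window from the capabilities table, the sketch below computes the earliest moment a deleted object may be purged. It assumes that minRetention is expressed in days and counted from object creation; neither detail is stated in this document, and `earliestPurgeAt` is a hypothetical helper.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const SOFT_DELETE_WINDOW_DAYS = 30; // archived for 30 days before purge

// Earliest purge time: the later of (deletion + soft-delete window) and
// (creation + minimum retention).
// Assumption: minRetentionDays is measured from createdAt.
function earliestPurgeAt(
  createdAt: Date,
  deletedAt: Date,
  minRetentionDays: number,
): Date {
  const afterSoftDelete = deletedAt.getTime() + SOFT_DELETE_WINDOW_DAYS * DAY_MS;
  const afterRetention = createdAt.getTime() + minRetentionDays * DAY_MS;
  return new Date(Math.max(afterSoftDelete, afterRetention));
}
```

With no minimum retention, an object deleted on 2024-02-01 becomes purgeable on 2024-03-02; with a 365-day minimum on an object created 2024-01-01, the purge waits until 2024-12-31 regardless of when it was deleted.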