tenant-service — SERVICE_READINESS

Cross-cutting readiness conventions live in SERVICE_TEMPLATE. This file records production-readiness criteria for tenant-service.

tenant-service is a tier-1 dependency for every other service on the platform. Its readiness gates are stricter than those for non-PDP services: nothing ships to prod until all canonical gates are green.


1. Milestone Plan

| Milestone | Scope | Exit criteria |
|---|---|---|
| M0 — Skeleton | NestJS scaffold, domain primitives, OpenAPI scaffold, CI pipeline | /healthz + /readyz return 200 |
| M1 — Tenant + Config | Tenant and TenantConfig aggregates; provision, suspend, reactivate, close (no saga); REST + events; outbox/inbox | G1, G2, G3, G6 green |
| M2 — Memberships + Roles | Membership, Role, RoleAssignment, Invitation; system role seed; RoleEscalationGuard; OwnerProtectionService | G1-G6 green; two-tenant simulator green; ABAC fuzz green |
| M3 — Org Tree | OrganizationUnit with ltree; chain restructuring saga (MoveProperty); property-scope ABAC | G1-G6 green; saga test suite green |
| M4 — Operational Robustness | Cascade-delete saga; sync surface for desktop; feature flags; billing-contact; AI advisory hooks | G1-G8 green; load test passes; chaos drill green |
| M5 — GA | All canonical gates green; pen-test passed; runbooks complete; on-call trained | Production cutover signed off by platform team |

2. Canonical Gates (G1-G8)

The platform standards define eight canonical gates for every service. The evidence tenant-service must provide for each gate is listed below.

G1 — Domain

  • All aggregates from DOMAIN_MODEL implemented with invariants.
  • State machines covered by unit tests (≥ 1 per illegal transition).
  • OwnerProtectionService and RoleEscalationGuard tested.
  • Property-based tests for OrgUnit.move, Invitation.accept, RoleAssignment scope narrowing.
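The "one test per illegal transition" rule is easier to check when the legal transitions are expressed as data, because the illegal ones can then be enumerated. The following is a minimal sketch, assuming a three-state tenant lifecycle (`active`, `suspended`, `closed`) inferred from the suspend, reactivate, and close operations in M1. The authoritative states and transitions are in DOMAIN_MODEL; every name below is hypothetical.

```typescript
// Hypothetical lifecycle; the authoritative state machine lives in DOMAIN_MODEL.
type TenantState = "active" | "suspended" | "closed";
type TenantCommand = "suspend" | "reactivate" | "close";

const STATES: TenantState[] = ["active", "suspended", "closed"];
const COMMANDS: TenantCommand[] = ["suspend", "reactivate", "close"];

// Legal transitions as data: state -> command -> next state.
const LEGAL: Record<TenantState, Partial<Record<TenantCommand, TenantState>>> = {
  active: { suspend: "suspended", close: "closed" },
  suspended: { reactivate: "active", close: "closed" },
  closed: {},
};

// Applies a command, throwing when the transition is not listed in LEGAL.
function transition(state: TenantState, command: TenantCommand): TenantState {
  const next = LEGAL[state][command];
  if (next === undefined) {
    throw new Error(`Illegal transition: ${command} from ${state}`);
  }
  return next;
}

// Every (state, command) pair missing from LEGAL; each should have a unit test.
function illegalTransitions(): Array<[TenantState, TenantCommand]> {
  const pairs: Array<[TenantState, TenantCommand]> = [];
  for (const state of STATES) {
    for (const command of COMMANDS) {
      if (LEGAL[state][command] === undefined) {
        pairs.push([state, command]);
      }
    }
  }
  return pairs;
}
```

A test suite could iterate over `illegalTransitions()` and assert that `transition` throws for each pair, so adding a state or command without a matching test becomes visible.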

G2 — API

  • Every endpoint in API_CONTRACTS implemented.
  • OpenAPI spec emitted, committed, and validated against contract tests.
  • Pact provider tests pass for bff-backoffice-service, bff-tenant-booking-service, notification-service, billing-service.
  • Problem+JSON conformance for every error code in API_CONTRACTS §13.

G3 — Events

  • Every published event in EVENT_SCHEMAS has a JSON Schema, fixture, and contract test.
  • Outbox writes in same tx as domain mutation (verified by outbox.spec.ts).
  • Inbox dedup verified (inbox.spec.ts).
  • Subscribers for iam.user.registered.v1, iam.user.deleted.v1, billing.subscription.cancelled.v1, and billing.subscription.reactivated.v1 are implemented and idempotent.
  • DLQ topic configured per source topic; alerts routed.
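Inbox dedup and subscriber idempotency can be illustrated with a small in-memory sketch. It assumes each event carries a unique `eventId` (a hypothetical field name; the real envelope is defined in EVENT_SCHEMAS) and uses a `Set` in place of an inbox table. In a real service the claim and the handler's writes would presumably share one database transaction, which this sketch does not model.

```typescript
// Hypothetical envelope shape; the real fields are defined in EVENT_SCHEMAS.
interface EventEnvelope<T> {
  eventId: string;
  type: string;
  payload: T;
}

// In-memory stand-in for an inbox table keyed by event id.
class InMemoryInbox {
  private readonly seen = new Set<string>();

  // Returns true the first time an id is claimed, false on redelivery.
  claim(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }

  // Releases a claim so that a failed event can be redelivered.
  release(eventId: string): void {
    this.seen.delete(eventId);
  }
}

// Wraps a handler so that each event id is processed at most once.
// The returned function reports whether the handler actually ran.
function idempotent<T>(
  inbox: InMemoryInbox,
  handler: (event: EventEnvelope<T>) => void,
): (event: EventEnvelope<T>) => boolean {
  return (event: EventEnvelope<T>): boolean => {
    if (!inbox.claim(event.eventId)) return false; // duplicate: skip
    try {
      handler(event);
      return true;
    } catch (err) {
      inbox.release(event.eventId); // allow a later redelivery to retry
      throw err;
    }
  };
}
```

Delivering the same envelope twice runs the handler once; a handler that throws releases its claim, so a retry is still processed.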

G4 — Sync

  • Pull surface for tenant_config, org_units, my_membership, role_catalog, my_role_assignments, feature_flags works against the desktop fixture.
  • Push surface accepts the limited TenantConfig PATCH and FeatureFlag toggle; rejects role/membership writes with MELMASTOON.SYNC.ONLINE_REQUIRED.
  • Cursor-too-old returns 410 with full-resync hint.
  • Conflict policy lww+diff (last-write-wins plus diff) for feature flags is audited.
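The pull and push rules above reduce to two small decisions, sketched here under stated assumptions. Cursors are modelled as monotonically increasing numbers, and the result shapes, the `hint` value, and the entity names are all hypothetical. Only the 410 status and the MELMASTOON.SYNC.ONLINE_REQUIRED code come from this document.

```typescript
// Hypothetical result shapes; the real wire format is not specified here.
type PullResult =
  | { status: 200; fromCursor: number }
  | { status: 410; hint: "full_resync" };

// A cursor older than the oldest retained change cannot be served
// incrementally, so the client is told to resync from scratch.
function resolvePull(cursor: number, oldestRetainedCursor: number): PullResult {
  if (cursor < oldestRetainedCursor) {
    return { status: 410, hint: "full_resync" };
  }
  return { status: 200, fromCursor: cursor };
}

// Hypothetical entity names for pushed changes.
type PushEntity = "tenant_config" | "feature_flag" | "role_assignment" | "membership";

// Only these entities may be written through the sync push surface.
const OFFLINE_WRITABLE: ReadonlyArray<PushEntity> = ["tenant_config", "feature_flag"];

type PushResult = { accepted: true } | { accepted: false; code: string };

// Role and membership writes are rejected because they must happen online.
function resolvePush(entity: PushEntity): PushResult {
  if (OFFLINE_WRITABLE.indexOf(entity) !== -1) {
    return { accepted: true };
  }
  return { accepted: false, code: "MELMASTOON.SYNC.ONLINE_REQUIRED" };
}
```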

G5 — AI

  • AIClient adapter wired to ai-orchestrator-service.
  • Invite-abuse classifier integrated as advisory.
  • Bulk-removal anomaly detector wired.
  • Provenance recorded for every AI-influenced action.
  • Turning off the aiEnabled feature flag disables every AI surface.

G6 — Observability

  • Metrics from OBSERVABILITY §2 emitted and visible in Cloud Monitoring.
  • All six dashboards built and shared with the team.
  • All P1/P2 alerts wired with PagerDuty routing.
  • Synthetic monitor running every 60 s; two-tenant canary running nightly.
  • Runbooks committed in services/tenant-service/runbooks/ for every alert.

G7 — Performance

  • k6 scenarios from TESTING_STRATEGY §8 green at SLO budgets.
  • authz_check p95 ≤ 15 ms (cache hit), ≤ 50 ms (miss) at 5 000 rps.
  • tenant_config_read p95 ≤ 25 ms at 2 000 rps.
  • Outbox lag p95 ≤ 2 s at 1 000 events/s.
  • No memory leak across a 4-hour soak test.
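For readers less familiar with percentile budgets, the arithmetic behind a "p95 ≤ N ms" check can be sketched as follows. This uses the nearest-rank definition, which is one of several common percentile definitions and is not necessarily the one the load-test tooling applies; the function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical gate check: does the p95 of observed latencies fit the budget?
function meetsP95Budget(latenciesMs: number[], budgetMs: number): boolean {
  return percentile(latenciesMs, 95) <= budgetMs;
}
```

With 20 samples, one slow outlier does not break a p95 budget, but two do: the 19th-ranked sample is the p95, so at most 5 % of requests may exceed the budget.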

G8 — Security

  • Two-tenant simulator green on every PR + nightly canary.
  • ABAC fuzz green.
  • OWASP ASVS L2 self-assessment passed.
  • External pen-test report received and findings addressed (or accepted with explicit risk).
  • CMEK enforced on Cloud SQL; secrets in Secret Manager only.
  • pgaudit enabled on sensitive tables.
  • Audit-event sink to BigQuery verified.

3. SLOs (recap)

| SLI | Target |
|---|---|
| tenant.config availability | 99.95 % |
| authz.check availability | 99.99 % |
| authz.check p95 (cache hit) | ≤ 15 ms |
| tenant.config p95 (cache hit) | ≤ 25 ms |
| tenant.config PATCH p95 | ≤ 200 ms |
| Outbox dispatch lag p95 | ≤ 2 s |
| Two-tenant isolation | 100 % (zero cross-tenant reads) |
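The availability targets translate into a downtime allowance once a measurement window is chosen. The sketch below assumes a 30-day window, which this document does not state, and a hypothetical function name.

```typescript
// Allowed downtime, in minutes, for an availability target over a window.
// The 30-day default window is an assumption; the source does not state one.
function errorBudgetMinutes(targetPercent: number, windowDays: number = 30): number {
  const windowMinutes = windowDays * 24 * 60;
  return ((100 - targetPercent) / 100) * windowMinutes;
}

console.log(errorBudgetMinutes(99.95).toFixed(2)); // tenant.config: prints "21.60"
console.log(errorBudgetMinutes(99.99).toFixed(2)); // authz.check: prints "4.32"
```

Under that assumption, tenant.config may be unavailable for about 21.6 minutes per 30 days and authz.check for about 4.3 minutes.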

4. Definition of Done (per PR)

  • All checks in TESTING_STRATEGY §11 green.
  • OpenAPI spec regenerated and committed if the API changed.
  • Event JSON Schemas updated and fixtures regenerated if events changed.
  • Migration applied locally and, if it has a down script, rolled back cleanly.
  • Two-tenant simulator green.
  • At least one new test added if behavior changed.
  • Runbook updated if operational behavior changed.
  • Doc updated (this bundle) if any contract changed.

5. Release Checklist (per Cloud Run revision)

  • CI is green on the merge commit.
  • Image signed via Binary Authorization, scanned by Trivy with no high-severity findings.
  • Migration scripts applied in staging without errors; rollback path tested and documented.
  • Smoke tests against staging green.
  • Synthetic monitor on staging green for ≥ 15 min.
  • Canary at 5 % production traffic for ≥ 30 min with no SLO regression.
  • Canary at 50 % for ≥ 30 min.
  • Full rollout to 100 %.
  • Release tagged in Git; release notes posted to platform channel.
  • On-call notified.
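The canary steps in this checklist can be read as a small promotion rule. The sketch below encodes the 5 % and 50 % stages with their 30-minute minimums from the list above. Rolling back on any SLO regression, and applying that check at the 50 % stage as well, are assumptions of this sketch rather than documented policy; all names are hypothetical.

```typescript
// Stages taken from the checklist; minMinutes is the minimum soak time.
interface CanaryStage {
  trafficPercent: number;
  minMinutes: number;
}

const STAGES: CanaryStage[] = [
  { trafficPercent: 5, minMinutes: 30 },
  { trafficPercent: 50, minMinutes: 30 },
  { trafficPercent: 100, minMinutes: 0 },
];

type CanaryAction = "hold" | "promote" | "rollback" | "done";

// Decides what to do at a given stage. Rolling back on any SLO regression,
// including at 50 %, is an assumption of this sketch.
function nextAction(
  stageIndex: number,
  minutesAtStage: number,
  sloRegression: boolean,
): CanaryAction {
  if (sloRegression) return "rollback";
  if (stageIndex >= STAGES.length - 1) return "done"; // already at 100 %
  return minutesAtStage >= STAGES[stageIndex].minMinutes ? "promote" : "hold";
}
```

For example, `nextAction(0, 10, false)` holds at 5 % because the 30-minute minimum has not elapsed, while `nextAction(0, 30, false)` promotes to the 50 % stage.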

6. Production Cutover Sign-off (M5 → GA)

Required signatures (recorded in releases/v1.0.md):

  • Service tech lead
  • Platform tech lead
  • Security on-call
  • Compliance officer
  • Customer-success lead (for first chain customers)

Without all five, GA is not declared.


7. Post-GA Operating Rhythm

  • Monthly SLO review.
  • Quarterly chaos drill (Cloud SQL failover, region failover, outbox burst).
  • Quarterly security review of role/permission registry.
  • Quarterly bias review of AI invite-classifier outputs.
  • Annual external pen-test.