iam-service — Service Readiness
iam-service is T0 / platform-critical. The bar for "ready" is non-negotiable: any RED gate blocks merge to main and prevents promotion past staging.
1. Readiness Levels
| Level | Meaning | Required for |
|---|---|---|
| L0 — Sketch | Bundle directories exist; SERVICE_OVERVIEW drafted | Architecture sign-off |
| L1 — Designed | Domain model, API, events, data model, security model documented; OpenAPI/AsyncAPI lint clean | Story estimation |
| L2 — Implemented | Code matches docs; unit + integration tests green; coverage thresholds met | Internal alpha |
| L3 — Hardened | Security tests, load baseline, chaos scenarios, SLOs defined, runbooks complete | Closed beta |
| L4 — Production-ready | All gates green, canary verified, on-call rotation active, DR drilled | Public M0 launch |
| L5 — Multi-region | Active-active deployed, cross-region failover drilled | M2 launch |
Current target: L4 by M0 cutover.
2. Canonical Readiness Gates
Each gate has a state of GREEN / AMBER / RED tracked in infra/readiness/iam-service.yaml.
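The following is a minimal sketch of how CI could turn those states into the merge rule stated at the top of this document. It assumes the YAML has already been loaded into a flat `{gate_name: state}` mapping. `GateState`, `blocking_gates` and `may_merge` are hypothetical names, not the actual readiness tooling.

```python
from enum import Enum


class GateState(Enum):
    """The three states a readiness gate can be in."""

    GREEN = "GREEN"
    AMBER = "AMBER"
    RED = "RED"


def blocking_gates(states: dict[str, str]) -> list[str]:
    """Names of RED gates, sorted for stable CI output.

    Unknown state strings raise ValueError instead of silently passing.
    """
    return sorted(
        name for name, raw in states.items() if GateState(raw) is GateState.RED
    )


def may_merge(states: dict[str, str]) -> bool:
    """True only when no gate is RED; AMBER alone does not block a merge."""
    return not blocking_gates(states)
```

For example, `may_merge({"api.tracing": "GREEN", "data.rls_enabled": "RED"})` returns `False`. Only RED blocks here, which matches the merge rule. The M0 launch gate in section 8 is stricter because it requires every gate to be GREEN.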
2.1 Domain & Documentation
| Gate | Criteria |
|---|---|
| docs.bundle_complete | All 17 deep-bundle docs exist + summary doc + linked from 03-microservices |
| docs.cross_links_valid | No broken markdown links (CI: markdown-link-check) |
| docs.adr_aligned | All ADRs touching iam are linked from SERVICE_OVERVIEW |
| domain.aggregates_defined | All seven aggregates have invariants + domain events documented |
| domain.ubiquitous_language | Glossary exists in DOMAIN_MODEL |
2.2 API
| Gate | Criteria |
|---|---|
| api.openapi_present | openapi/iam-service.yaml parses, lints clean (Spectral) |
| api.error_envelope | Every endpoint returns application/problem+json per ERROR_CODES |
| api.error_codes_registered | Every MELMASTOON.IAM.* code is in the registry |
| api.versioning | /api/v1/* prefix; Sunset plan for breaking changes |
| api.idempotency | All POST mutations accept Idempotency-Key |
| api.tracing | traceparent propagated end-to-end |
| api.contract_tests | Pact provider verification green |
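To make `api.idempotency` concrete, here is a sketch of replay semantics for `Idempotency-Key`. It assumes keys are scoped per tenant and that reusing a key with a different request body is an error. The in-memory `IdempotencyStore` and `IdempotencyKeyReused` are hypothetical stand-ins for a persistent table.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StoredResponse:
    request_hash: str
    status: int
    body: dict[str, Any]


class IdempotencyKeyReused(Exception):
    """Raised when a key is replayed with a different request body."""


class IdempotencyStore:
    """In-memory stand-in for a persistent idempotency table (hypothetical)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], StoredResponse] = {}

    def execute(
        self,
        tenant_id: str,
        key: str,
        payload: dict[str, Any],
        handler: Callable[[dict[str, Any]], tuple[int, dict[str, Any]]],
    ) -> tuple[int, dict[str, Any]]:
        # Hash a canonical JSON form so key order in the payload does not matter.
        request_hash = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        scope = (tenant_id, key)  # assumption: keys are scoped per tenant
        stored = self._entries.get(scope)
        if stored is not None:
            if stored.request_hash != request_hash:
                raise IdempotencyKeyReused(key)
            return stored.status, stored.body  # replay; handler is not re-run
        status, body = handler(payload)
        self._entries[scope] = StoredResponse(request_hash, status, body)
        return status, body
```

A production version would need the lookup and the insert to be atomic, for example through a unique constraint on tenant and key. Otherwise two concurrent retries could both run the handler.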
2.3 Events
| Gate | Criteria |
|---|---|
| events.asyncapi_present | asyncapi/iam-service.yaml parses, lints clean |
| events.naming | All subjects use melmastoon.iam.<entity>.<verb>.vN |
| events.envelope | All events use the canonical envelope (id, time, source, subject, type, datacontenttype, data) |
| events.outbox | Transactional outbox in same Postgres tx as domain row |
| events.idempotent_consumers | Inbox dedup table + eventId unique constraint |
| events.schema_registry | Schemas published to registry; CI checks compatibility |
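The following sketch shows the checks behind `events.naming` and `events.envelope`. It assumes the naming rule applies to the envelope's `subject` field. It also assumes entity and verb are lowercase snake_case tokens, because neither token grammar is specified here. `SUBJECT_RE` and `envelope_problems` are illustrative names.

```python
import re
from typing import Any

# Assumed grammar: snake_case entity and verb, version v1, v2, ...
SUBJECT_RE = re.compile(
    r"melmastoon\.iam\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.v[1-9][0-9]*"
)

# Envelope fields required by events.envelope.
REQUIRED_FIELDS = ("id", "time", "source", "subject", "type", "datacontenttype", "data")


def envelope_problems(event: dict[str, Any]) -> list[str]:
    """List what is wrong with an event; an empty list means it passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    if "subject" in event:
        subject = event["subject"]
        if not (isinstance(subject, str) and SUBJECT_RE.fullmatch(subject)):
            problems.append(f"bad subject: {subject!r}")
    return problems
```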
2.4 Data
| Gate | Criteria |
|---|---|
| data.migrations_forward_only | All migrations are forward-compatible per MIGRATION_PLAN |
| data.rls_enabled | RLS on every tenant-scoped table; CI assertion |
| data.indexes_documented | Every index in DATA_MODEL exists in the DB and vice versa (drift check) |
| data.partitions_provisioned | audit_events monthly partitions exist for the next 6 months |
| data.backups_verified | PITR test succeeded within the last 30 d |
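`data.partitions_provisioned` reduces to a date calculation. This sketch assumes partitions are named `audit_events_YYYY_MM`. It also assumes "next 6 months" means the current month plus the six that follow. Both are assumptions, and so is where the `existing` set comes from.

```python
from datetime import date


def required_partitions(today: date, months_ahead: int = 6) -> list[str]:
    """Partition names for the current month plus `months_ahead` following months.

    The audit_events_YYYY_MM naming scheme is assumed for illustration.
    """
    names = []
    year, month = today.year, today.month
    for _ in range(months_ahead + 1):
        names.append(f"audit_events_{year:04d}_{month:02d}")
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


def missing_partitions(existing: set[str], today: date) -> list[str]:
    """Required partitions absent from `existing` (e.g. child tables listed via pg_inherits)."""
    return [p for p in required_partitions(today) if p not in existing]
```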
2.5 Sync
| Gate | Criteria |
|---|---|
| sync.minimal_surface | Only the Device entity is in client sync per SYNC_CONTRACT |
| sync.signed_deltas | All deltas signed; client verifies Ed25519 signature |
| sync.offline_refresh_works | E2E test: 7-d offline grace works with device cert |
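The boundary in `sync.offline_refresh_works` is easy to get off by one. The sketch below shows the check under three assumptions. First, the 7-day grace is measured from the device's last successful online check. Second, the boundary is inclusive. Third, the device cert must also be unexpired. `offline_refresh_allowed` is a hypothetical helper.

```python
from datetime import datetime, timedelta

OFFLINE_GRACE = timedelta(days=7)


def offline_refresh_allowed(
    last_online: datetime, cert_not_after: datetime, now: datetime
) -> bool:
    """Whether an offline device may refresh locally.

    Assumptions: grace counts from the last successful online check,
    the 7-day boundary is inclusive, and the device cert must be unexpired.
    """
    within_grace = now - last_online <= OFFLINE_GRACE
    cert_valid = now < cert_not_after
    return within_grace and cert_valid
```

The E2E test would then pin the instants exactly at the 7-day mark and just past it.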
2.6 AI
| Gate | Criteria |
|---|---|
| ai.through_orchestrator | iam never calls model endpoints directly; only via ai-orchestrator-service |
| ai.fallback_path | Rules-based fallback exists; covered by tests |
| ai.provenance | Every decision logged with runId and decisionId |
| ai.hitl_for_locks | AI-suggested locks > confidence threshold require admin confirmation |
| ai.bias_review | Quarterly review of false-positive rate by tenant geography |
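The sketch below shows one way the AI gates could fit together in code. A missing AI result falls back to rules. An AI-suggested lock above the threshold is routed to an admin instead of being applied. Every decision carries `run_id` and `decision_id`. Several details are assumptions: the 0.9 threshold, the action names, and treating sub-threshold AI results as no action. `decide_lock` is hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LockDecision:
    run_id: str       # provenance: which evaluation run produced this
    decision_id: str  # provenance: unique id for this decision
    source: str       # "ai" or "rules"
    action: str       # "require_admin_confirmation", "rules_lock" or "no_action"


def decide_lock(
    run_id: str,
    ai_confidence: Optional[float],
    rules_flagged: bool,
    threshold: float = 0.9,  # placeholder value, not the real threshold
) -> LockDecision:
    """Route a lock suggestion to an action.

    ai_confidence is None when ai-orchestrator-service is unavailable,
    in which case the rules-based fallback decides.
    """
    decision_id = str(uuid.uuid4())
    if ai_confidence is None:
        action = "rules_lock" if rules_flagged else "no_action"
        return LockDecision(run_id, decision_id, "rules", action)
    if ai_confidence > threshold:
        # HITL: an AI-suggested lock is never applied without an admin.
        return LockDecision(run_id, decision_id, "ai", "require_admin_confirmation")
    return LockDecision(run_id, decision_id, "ai", "no_action")  # assumption
```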
2.7 Observability
| Gate | Criteria |
|---|---|
| obs.logs | Structured logs; redaction enforced |
| obs.metrics | RED + auth-specific metrics exposed at /metrics |
| obs.traces | traceparent propagated; OTel exporter healthy |
| obs.dashboards | IAM SRE + IAM Security + Per-Tenant dashboards live |
| obs.slos | SLOs declared in error-budget service; burn alerts wired to PagerDuty |
| obs.runbooks | Every alert has a runbook; every failure mode has a runbook |
| obs.synthetic | Canary probes from ≥ 3 regions every 60 s |
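For `obs.logs`, redaction is easiest to enforce in one central place. The sketch below uses the standard library's `logging.Filter` and assumes free-text messages. The patterns are illustrative, not the service's actual deny-list. A structured JSON logger would redact named fields instead of regex-matching text.

```python
import logging
import re

# Illustrative patterns only; the real deny-list is an assumption here.
_PATTERNS = [
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    (re.compile(r'(?i)("?(?:password|refresh_token|otp)"?\s*[:=]\s*"?)[^",\s]+'), r"\1[REDACTED]"),
]


class RedactingFilter(logging.Filter):
    """Rewrites each record's message so secrets never reach a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args before redacting
        for pattern, replacement in _PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, None
        return True  # keep the record, now redacted
```

Attach the filter to handlers, for example `handler.addFilter(RedactingFilter())`, rather than to a single logger. Logger-level filters only see records created by that logger, so records propagated from child loggers would bypass them.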
2.8 Security
| Gate | Criteria |
|---|---|
| sec.threat_model | Threat model reviewed within the last 6 months |
| sec.crypto | Argon2id, Ed25519 (KMS), TLS 1.3 in place |
| sec.audit_logging | All actions in SECURITY_MODEL §12 emit audit rows |
| sec.gdpr_participation | Erasure saga end-to-end test green |
| sec.pen_test | Annual pen test report on file; criticals closed |
| sec.dependency_scan | Trivy/Snyk + Semgrep clean (no high/critical findings) |
| sec.secret_scan | Gitleaks clean |
| sec.waf_rules | Cloud Armor rules deployed and tested |
| sec.access_review | KMS + Secret Manager IAM reviewed quarterly |
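`sec.dependency_scan` passes only when there are no high or critical findings. The sketch below encodes that rule. It assumes findings from Trivy, Snyk and Semgrep have already been normalized into dicts with a `severity` key, since each tool's native report format differs and is not modeled here. Mapping a failure to RED is also an assumption.

```python
from collections import Counter
from typing import Any

# Severities that fail the gate, per "no high/critical".
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def dependency_scan_state(findings: list[dict[str, Any]]) -> tuple[str, Counter]:
    """Return (gate state, count of findings per upper-cased severity)."""
    counts = Counter(str(f["severity"]).upper() for f in findings)
    blocked = any(counts[s] > 0 for s in BLOCKING_SEVERITIES)
    return ("RED" if blocked else "GREEN"), counts
```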
2.9 Performance
| Gate | Criteria |
|---|---|
| perf.load_baseline_recorded | Per TESTING_STRATEGY §9 |
| perf.no_regression | < 20 % regression vs previous release |
| perf.cold_start | < 0.5 % error rate in first 60 s post-deploy |
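Both performance thresholds are plain ratios. The sketch below assumes `perf.no_regression` is measured on p99 latency, because the metric is not named here and TESTING_STRATEGY §9 defines the baseline. It also assumes a cold-start window with no traffic counts as unverified.

```python
def regression_pct(baseline: float, current: float) -> float:
    """Percent by which `current` is worse than `baseline` (positive = slower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) * 100.0 / baseline


def perf_gates(
    baseline_p99_ms: float,
    current_p99_ms: float,
    cold_start_errors: int,
    cold_start_requests: int,
) -> dict[str, bool]:
    """Evaluate perf.no_regression and perf.cold_start for one release."""
    # No traffic in the first 60 s means the cold-start gate cannot be verified.
    cold_start_ok = cold_start_requests > 0 and (
        cold_start_errors * 100.0 / cold_start_requests < 0.5
    )
    return {
        "perf.no_regression": regression_pct(baseline_p99_ms, current_p99_ms) < 20.0,
        "perf.cold_start": cold_start_ok,
    }
```

With a 100 ms baseline, a 119 ms p99 passes and a 120 ms p99 fails, because the threshold is strict. Four errors in 1,000 cold-start requests (0.4 %) pass.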
2.10 Operational
| Gate | Criteria |
|---|---|
| ops.on_call_rotation | PagerDuty rotation active; primary + secondary |
| ops.canary_verified | Last release canary passed all criteria |
| ops.dr_drill | DR drill within the last 90 d passed |
| ops.rollback_tested | Rollback path verified within the last 30 d |
| ops.cost_dashboard | Per-tenant cost dashboard exists; weekly review |
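Several gates are time-boxed: `data.backups_verified`, `ops.dr_drill` and `ops.rollback_tested`. The sketch below checks the day-based ones for staleness. It assumes the windows are inclusive and that last-pass dates come from some record of drill results. `stale_gates` and its input are hypothetical. Quarterly and 6-month gates are left out because months vary in length.

```python
from datetime import date, timedelta

# Maximum age of the last passing run for each day-based gate.
MAX_AGE = {
    "data.backups_verified": timedelta(days=30),
    "ops.dr_drill": timedelta(days=90),
    "ops.rollback_tested": timedelta(days=30),
}


def stale_gates(last_passed: dict[str, date], today: date) -> list[str]:
    """Gates whose last passing run is missing or older than its allowed age."""
    stale = []
    for gate, max_age in MAX_AGE.items():
        passed = last_passed.get(gate)
        if passed is None or today - passed > max_age:
            stale.append(gate)
    return sorted(stale)
```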
3. Gate Status (M0 target)
| Gate Group | Target | Owner |
|---|---|---|
| Documentation | GREEN | Architecture |
| API | GREEN | iam-team |
| Events | GREEN | iam-team |
| Data | GREEN | iam-team |
| Sync | GREEN | iam-team |
| AI | GREEN | iam-team + AI platform |
| Observability | GREEN | SRE |
| Security | GREEN | Security |
| Performance | GREEN | iam-team |
| Operational | GREEN | SRE |
CI publishes gate status badges in the service README.
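The table lists one status per group, while gates are tracked individually. The sketch below shows the rollup. It assumes a group takes the worst state of its gates and that groups are identified by gate-name prefix, with the `domain.*` gates folding into Documentation as in 2.1. `group_status` is a hypothetical helper, not the badge generator itself.

```python
# Gate-name prefix -> group name as used in the status table.
GROUP_BY_PREFIX = {
    "docs": "Documentation",
    "domain": "Documentation",
    "api": "API",
    "events": "Events",
    "data": "Data",
    "sync": "Sync",
    "ai": "AI",
    "obs": "Observability",
    "sec": "Security",
    "perf": "Performance",
    "ops": "Operational",
}

# Higher number = worse; a group shows its worst gate.
SEVERITY = {"GREEN": 0, "AMBER": 1, "RED": 2}


def group_status(states: dict[str, str]) -> dict[str, str]:
    """Roll per-gate states up to one state per group (worst state wins)."""
    rollup: dict[str, str] = {}
    for gate, state in states.items():
        group = GROUP_BY_PREFIX[gate.split(".", 1)[0]]
        rollup[group] = max(rollup.get(group, "GREEN"), state, key=SEVERITY.__getitem__)
    return rollup
```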
4. SLOs (committed)
| SLO | Target | Window |
|---|---|---|
| Auth availability | 99.99 % | 30 d |
| JWKS availability | 99.999 % | 30 d |
| Login latency p99 | < 800 ms | 30 d |
| Refresh latency p95 | < 100 ms | 30 d |
| MFA challenge success | > 99.5 % | 7 d |
| SSO callback success | > 99.5 % | 7 d |
| Outbox publish lag p95 | < 5 s | 30 d |
| Auth error rate (5xx) | < 0.1 % | 7 d |
Error budget burn alerts wired per OBSERVABILITY §6.
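Read as time-based budgets, the targets translate to small absolute numbers. A 30 d window is 43,200 min, so 99.99 % auth availability allows 4.32 min of full outage, and 99.999 % JWKS availability allows about 26 s. The sketch below shows that arithmetic plus the standard burn-rate ratio. The alert thresholds themselves belong to OBSERVABILITY §6 and are not reproduced.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Full-outage time an availability SLO (e.g. 0.9999) allows in `window`."""
    return window * (1 - slo)


def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """Budget consumption speed: 1.0 spends exactly the budget over the window."""
    return observed_error_ratio / (1 - slo)
```

At a burn rate of 10, a 30 d budget would be gone in 3 days.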
5. Definition of Done — Per Story
Every iam story merges only when all of:
6. Release Readiness Checklist
Before promoting to prod:
7. Owner Sign-Off Matrix
| Aspect | Sign-off owner |
|---|---|
| Domain & API | iam team lead |
| Events & data | iam team lead + DBRE |
| Security | Security lead |
| Observability | SRE on-call |
| Performance | iam team lead |
| Compliance / GDPR | Compliance lead |
| Cost posture | Finance ops |
All sign-offs recorded in the release ticket; readiness automation refuses promotion otherwise.
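The following sketch shows the promotion check over the matrix above. It assumes the release ticket's sign-offs can be read as a mapping from aspect to the set of owners who signed. The ticket format and `missing_signoffs` are hypothetical.

```python
# Required signers per aspect, transcribed from the sign-off matrix.
REQUIRED_SIGNOFFS: dict[str, set[str]] = {
    "Domain & API": {"iam team lead"},
    "Events & data": {"iam team lead", "DBRE"},
    "Security": {"Security lead"},
    "Observability": {"SRE on-call"},
    "Performance": {"iam team lead"},
    "Compliance / GDPR": {"Compliance lead"},
    "Cost posture": {"Finance ops"},
}


def missing_signoffs(recorded: dict[str, set[str]]) -> dict[str, set[str]]:
    """Aspects still lacking one or more required signers."""
    missing: dict[str, set[str]] = {}
    for aspect, owners in REQUIRED_SIGNOFFS.items():
        gap = owners - recorded.get(aspect, set())
        if gap:
            missing[aspect] = gap
    return missing


def may_promote(recorded: dict[str, set[str]]) -> bool:
    """Promotion is allowed only when every required sign-off is present."""
    return not missing_signoffs(recorded)
```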
8. M0 Launch Gate
For first production tenant onboarding:
| Gate | Status |
|---|---|
| All 17 docs complete | required |
| All canonical gates GREEN | required |
| Last DR drill passed | required |
| Pen test report on file | required |
| Tenant CA bootstrap automation tested | required |
| Offline-cert renewal tested at 7-d boundary | required |
| Tenant offboarding (tenant.deleted) end-to-end tested | required |
| GDPR erasure saga end-to-end tested | required |
| Migration plan reviewed (no legacy data scenario applies) | required |
9. Continuous Readiness
- Weekly: gate status reviewed in iam team standup; any AMBER/RED → action item.
- Monthly: SLO review; error-budget posture; incident review.
- Quarterly: threat model review; access reviews; bias/fairness review for AI-driven decisions.
- Annually: pen test; DR drill formal report; CA rotation.
Readiness is not a milestone — it's a steady state.