Operator Management Service — Service Readiness

Status: populated
Owner: Platform Engineering + SRE
Last updated: 2026-04-18

Service is production-ready only when EVERY box below is checked.

Docs

  • All 17 service docs complete (no stubs remain).
  • Vault bootstrap runbook linked from LOCAL_DEV_SETUP.

Code + Tests

  • TypeScript strict mode, zero compiler errors.
  • ESLint: no banned patterns (no password columns in PostgreSQL, no direct Vault SDK calls from controllers).
  • Unit coverage: aggregates/VOs ≥ 95%, domain services ≥ 90%, use cases ≥ 90%.
  • Mutation score on changed files ≥ 75%.
  • Integration tests pass: create-operator, duplicate-prevention, soft-delete, credentials-endpoint, vault-failure, health-ingest, routing-rules.
  • Contract tests green against the schema registry for all 4 produced events.
  • OpenAPI diff gate: no breaking change without major bump.
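
The unit coverage thresholds above could be enforced by a small gate in CI. The following is a minimal sketch, not the project's actual tooling: the layer names, the report shape, and the `checkCoverage` function are hypothetical, and only the percentages come from the checklist.

```typescript
// Hypothetical layer names; the thresholds are the ones listed in the checklist.
type Layer = "aggregates" | "domainServices" | "useCases";

const THRESHOLDS: Record<Layer, number> = {
  aggregates: 95, // aggregates/VOs
  domainServices: 90,
  useCases: 90,
};

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Compares measured line coverage (in percent) per layer with the required minimum.
function checkCoverage(report: Record<Layer, number>): GateResult {
  const failures: string[] = [];
  for (const layer of Object.keys(THRESHOLDS) as Layer[]) {
    const actual = report[layer];
    const required = THRESHOLDS[layer];
    if (actual < required) {
      failures.push(`${layer}: ${actual}% < ${required}%`);
    }
  }
  return { passed: failures.length === 0, failures };
}
```

A CI step would build the report from whatever coverage tool the repo uses and fail the job when `passed` is false.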

Security

  • security-reviewer agent run; zero critical or high findings.
  • Password never appears in API responses (automated test asserts absence of password key in GET responses).
  • Password never appears in NATS event payloads (enforced by the schema registry).
  • Vault policy scoped correctly (no access outside secret/ops/operators/*).
  • mTLS enforced on internal API (NetworkPolicy + PeerAuthentication in staging).
  • OWASP ZAP baseline passed against staging admin API.
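
The automated check that no password key appears in GET responses might look roughly like the sketch below. It assumes the response body has already been parsed from JSON; `findForbiddenKeys` is a hypothetical helper name, and the case-insensitive match is an assumption rather than something the checklist specifies.

```typescript
// Hypothetical helper: walks a parsed JSON value and returns the path of every
// property whose name matches the forbidden key (case-insensitive by assumption).
function findForbiddenKeys(value: unknown, forbidden = "password", path = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findForbiddenKeys(item, forbidden, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const childPath = `${path}.${key}`;
      if (key.toLowerCase() === forbidden.toLowerCase()) {
        hits.push(childPath);
      }
      // Keep descending so nested leaks are reported too.
      hits.push(...findForbiddenKeys(record[key], forbidden, childPath));
    }
  }
  return hits;
}
```

In an integration test, the parsed body of each GET response would be passed in and the result asserted to be an empty array. This only catches the key by name; a password leaked under a different property name would need a separate check.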

Observability

  • /metrics, /health/live, /health/ready up.
  • All metric families visible in Grafana dashboard.
  • OTel spans visible in the trace backend.
  • All 6 alerts have runbooks linked.
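
A post-deploy smoke check for the three endpoints above can be reduced to a pure function over observed status codes. This is a sketch under assumptions: the endpoint paths are from the checklist, but the input shape, the `failingEndpoints` name, and treating only HTTP 200 as "up" are illustrative choices. Fetching the endpoints is deliberately left to the caller.

```typescript
// Paths come from the checklist; everything else here is illustrative.
const REQUIRED_ENDPOINTS = ["/metrics", "/health/live", "/health/ready"] as const;

// Maps each probed path to the HTTP status observed, or undefined if no response.
type ProbeResults = Record<string, number | undefined>;

// Returns a description of every required endpoint that did not answer HTTP 200.
function failingEndpoints(results: ProbeResults): string[] {
  const failing: string[] = [];
  for (const path of REQUIRED_ENDPOINTS) {
    const status = results[path];
    if (status !== 200) {
      const reason = status === undefined ? "no response" : `HTTP ${status}`;
      failing.push(`${path}: ${reason}`);
    }
  }
  return failing;
}
```

An empty result means all three endpoints are up; anything else can be printed directly in the rollout log.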

Infra / Rollout

  • Helm chart + Terraform module committed.
  • Vault Kubernetes auth role configured in Terraform.
  • K8s PDB, HPA, resource limits per DEPLOYMENT_TOPOLOGY.
  • NetworkPolicy applied in staging and validated.
  • mTLS cert rotation tested (cert-manager).
  • On-call rotation assigned.

Data

  • PG migrations applied in staging.
  • pg_partman configured for ops.operator_health_log partitions.
  • Legacy operator migration script executed in staging; ops team validated config.

Sign-off

  • Tech lead ✅
  • SRE ✅
  • Security ✅
  • Carrier Relations (ops team) ✅