Platform Admin Service — User Stories

Status: populated | Owner: TBD | Last updated: 2026-04-18

Story index

| ID | Epic | Summary | Priority | Milestone |
|---|---|---|---|---|
| PLTADM-US-001 | PLTADM-EPIC-01 | Create and update platform config entries | Must | M0 |
| PLTADM-US-002 | PLTADM-EPIC-01 | Read and delete platform config entries | Must | M0 |
| PLTADM-US-003 | PLTADM-EPIC-02 | Create and update feature flags | Must | M0 |
| PLTADM-US-004 | PLTADM-EPIC-02 | Manage tenant overrides on feature flags | Must | M0 |
| PLTADM-US-005 | PLTADM-EPIC-02 | Evaluate feature flag for a tenant | Must | M0 |
| PLTADM-US-006 | PLTADM-EPIC-03 | View aggregate platform health | Must | M0 |
| PLTADM-US-007 | PLTADM-EPIC-03 | Register and update health sources dynamically | Must | M1 |
| PLTADM-US-008 | PLTADM-EPIC-04 | Verify service meets coverage and latency targets | Must | M1 |
| PLTADM-US-009 | PLTADM-EPIC-04 | Validate observability and audit trail completeness | Must | M1 |
| PLTADM-US-010 | PLTADM-EPIC-05 | Retrieve paginated config history | Should | M1 |
| PLTADM-US-011 | PLTADM-EPIC-05 | List feature flags visible to a tenant admin | Should | M1 |

PLTADM-US-001 — Create and update platform config entries

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform operator, I can create and update config entries so that I can govern platform-wide and tenant-scoped settings |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | config-module |
| Fix version | M0 |
| Epic link | PLTADM-EPIC-01 |
| FR references | FR-PLTADM-CFG-001, FR-PLTADM-CFG-002 |
| Legacy FR refs | FR-ADM-CFG-001, FR-ADM-CFG-002 |

User story:

As a platform operator, I want to create and update platform config entries via the admin API so that I can govern platform-wide settings and per-tenant overrides from a single governed store.

Acceptance criteria:

Scenario: Create a new PLATFORM-scoped config entry
Given I am authenticated as SUPER_ADMIN
And the config key "session.timeout_minutes" exists in the allow-list
When I POST /api/v1/admin/platform-config
with { "key": "session.timeout_minutes", "value": "60", "scope": "PLATFORM" }
Then the response is 201 Created
And the config entry is persisted with scope=PLATFORM
And event "platform_admin.config.updated.v1" is published to NATS

Scenario: Reject an unknown config key
Given I am authenticated as SUPER_ADMIN
When I POST /api/v1/admin/platform-config
with { "key": "unknown.key", "value": "x", "scope": "PLATFORM" }
Then the response is 400 Bad Request
And the error code is "ADM_CONFIG_KEY_UNKNOWN"
And no event is published

Scenario: Update an existing config entry
Given config entry "mfa.required" exists with value "false"
When I PATCH /api/v1/admin/platform-config/mfa.required
with { "value": "true" }
Then the response is 200 OK
And a history record is appended to config_history with previous_value="false" and new_value="true"

Scenario: Create a TENANT-scoped config entry
Given tenant "ten_DEMO001" exists and is ACTIVE
When I POST /api/v1/admin/platform-config
with { "key": "session.timeout_minutes", "value": "30", "scope": "TENANT", "tenantId": "ten_DEMO001" }
Then the response is 201 Created
And the config entry is stored with tenantId="ten_DEMO001"

Scenario: Reject config mutation from non-SUPER_ADMIN
Given I am authenticated as TENANT_ADMIN
When I POST /api/v1/admin/platform-config
Then the response is 403 Forbidden

Technical notes:

  • Allow-list enforced in ConfigAllowListPort; unknown keys return ADM_CONFIG_KEY_UNKNOWN (400).
  • scope enum: PLATFORM | TENANT. TENANT scope requires tenantId.
  • Type validation: each allow-listed key has a defined type (string | number | boolean | json); value is validated against the type before persist.
  • Keys of type secret: the value is stored as a Secrets Manager reference; GET returns ***REDACTED***.
  • Mutation emits platform_admin.config.updated.v1; downstream consumers (e.g., identity-service for session.timeout_minutes) subscribe.
  • History record written in same DB transaction as config update (transactional outbox).
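
The allow-list and type-validation notes above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the service's implementation: `ConfigWrite`, `validateConfigWrite`, and the error codes `ADM_CONFIG_TENANT_REQUIRED` and `ADM_CONFIG_VALUE_INVALID` are hypothetical names. Only `ADM_CONFIG_KEY_UNKNOWN`, the scope enum, and the four value types come from the notes. Values are assumed to arrive as strings, as in the scenarios.

```typescript
type ValueType = "string" | "number" | "boolean" | "json";
type Scope = "PLATFORM" | "TENANT";

interface ConfigWrite {
  key: string;
  value: string;
  scope: Scope;
  tenantId?: string;
}

type ValidationResult = { ok: true } | { ok: false; code: string };

// Returns true when the raw string value parses as the declared type.
function matchesType(value: string, type: ValueType): boolean {
  if (type === "string") return true;
  if (type === "number") return value.trim() !== "" && Number.isFinite(Number(value));
  if (type === "boolean") return value === "true" || value === "false";
  try {
    JSON.parse(value); // type === "json"
    return true;
  } catch {
    return false;
  }
}

function validateConfigWrite(
  write: ConfigWrite,
  allowList: ReadonlyMap<string, ValueType>,
): ValidationResult {
  const type = allowList.get(write.key);
  if (type === undefined) return { ok: false, code: "ADM_CONFIG_KEY_UNKNOWN" };
  // Hypothetical error codes: the source names only ADM_CONFIG_KEY_UNKNOWN.
  if (write.scope === "TENANT" && !write.tenantId) {
    return { ok: false, code: "ADM_CONFIG_TENANT_REQUIRED" };
  }
  if (!matchesType(write.value, type)) {
    return { ok: false, code: "ADM_CONFIG_VALUE_INVALID" };
  }
  return { ok: true };
}
```

Keeping the check free of I/O means it can run before the transaction that persists the entry and its history record is opened.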

Definition of done:

  • POST /api/v1/admin/platform-config and PATCH /api/v1/admin/platform-config/:key implemented and tested
  • Allow-list validation enforced; 23 seeded keys present in test fixtures
  • config_history row appended on every mutation
  • platform_admin.config.updated.v1 published via outbox
  • SUPER_ADMIN scope enforced; 403 returned for non-SUPER_ADMIN callers
  • Unit coverage ≥ 80% for config-module

PLTADM-US-002 — Read and delete platform config entries

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform operator, I can list and delete config entries so that I have full read/delete lifecycle control over governed settings |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | config-module |
| Fix version | M0 |
| Epic link | PLTADM-EPIC-01 |
| FR references | FR-PLTADM-CFG-003, FR-PLTADM-CFG-004 |
| Legacy FR refs | FR-ADM-CFG-003, FR-ADM-CFG-004 |

User story:

As a platform operator, I want to list all config entries and delete individual ones so that I can inspect the current configuration state and remove obsolete tenant overrides.

Acceptance criteria:

Scenario: List all config entries
Given 10 config entries exist (3 PLATFORM-scoped, 7 TENANT-scoped across 2 tenants)
When I GET /api/v1/admin/platform-config
Then the response is 200 OK
And the response contains all 10 entries
And secret-type entries show value "***REDACTED***"

Scenario: Get a single config entry by key and scope
Given config entry "mfa.required" exists with scope=PLATFORM
When I GET /api/v1/admin/platform-config/mfa.required?scope=PLATFORM
Then the response is 200 OK
And the response includes { "key": "mfa.required", "scope": "PLATFORM", "value": "true" }

Scenario: Delete a TENANT-scoped config entry
Given a TENANT-scoped config for key "session.timeout_minutes" exists for tenant "ten_DEMO001"
When I DELETE /api/v1/admin/platform-config/session.timeout_minutes?scope=TENANT&tenantId=ten_DEMO001
Then the response is 204 No Content
And the entry is removed from platform_configs

Scenario: Secret key value is redacted in list response
Given config entry "smtp.password" with type=secret exists
When I GET /api/v1/admin/platform-config
Then the entry for "smtp.password" shows value "***REDACTED***"

Technical notes:

  • GET /api/v1/admin/platform-config returns array; no pagination required at M0 (max ~200 entries at launch).
  • Delete does not write a history record (deletion is audited via the outbox event only).
  • Redis cache for config reads: TTL 5 min; cache invalidated on mutation event.
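
The redaction rule in the scenarios above could be applied as a final mapping step before a list or get response is serialized. A minimal sketch, assuming a hypothetical `ConfigEntry` shape that carries the key's declared type; the function name `redactSecrets` is also an assumption.

```typescript
interface ConfigEntry {
  key: string;
  scope: "PLATFORM" | "TENANT";
  tenantId?: string;
  type: "string" | "number" | "boolean" | "json" | "secret";
  value: string;
}

const REDACTED = "***REDACTED***";

// Returns copies of the entries with secret-type values masked.
// Non-secret entries are passed through unchanged.
function redactSecrets(entries: readonly ConfigEntry[]): ConfigEntry[] {
  return entries.map((entry) =>
    entry.type === "secret" ? { ...entry, value: REDACTED } : entry,
  );
}
```

Masking at the response boundary, rather than in the store, keeps the stored reference intact for consumers that are allowed to resolve it.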

Definition of done:

  • GET /api/v1/admin/platform-config and GET /api/v1/admin/platform-config/:key implemented
  • DELETE /api/v1/admin/platform-config/:key implemented with scope/tenantId query params
  • Secret-type redaction verified in list and get responses
  • SUPER_ADMIN scope enforced on all write/delete endpoints

PLTADM-US-003 — Create and update feature flags

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform engineer, I can create and update feature flags so that I can control feature rollout without code deployment |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | feature-flag-module |
| Fix version | M0 |
| Epic link | PLTADM-EPIC-02 |
| FR references | FR-PLTADM-FF-001, FR-PLTADM-FF-002, FR-PLTADM-FF-003 |
| Legacy FR refs | FR-ADM-FF-001, FR-ADM-FF-002, FR-ADM-FF-003 |

User story:

As a platform engineer, I want to create, update, and archive feature flags via the admin API so that I can safely control feature rollout across all tenants or specific tenants without a code deployment.

Acceptance criteria:

Scenario: Create a new feature flag
Given I am authenticated as SUPER_ADMIN
When I POST /api/v1/admin/flags
with { "key": "OFFLINE_SYNC_V2", "defaultEnabled": false, "description": "Enable offline sync v2" }
Then the response is 201 Created
And the flag is persisted with status=ACTIVE and defaultEnabled=false
And event "platform_admin.flag.created.v1" is published

Scenario: Reject duplicate flag key
Given flag "OFFLINE_SYNC_V2" already exists
When I POST /api/v1/admin/flags with the same key
Then the response is 409 Conflict
And the error code is "ADM_FLAG_KEY_DUPLICATE"

Scenario: Update flag defaultEnabled and description
Given flag "OFFLINE_SYNC_V2" exists with defaultEnabled=false
When I PATCH /api/v1/admin/flags/OFFLINE_SYNC_V2
with { "defaultEnabled": true, "description": "Enable offline sync v2 - GA" }
Then the response is 200 OK
And the flag defaultEnabled is updated to true
And event "platform_admin.flag.updated.v1" is published

Scenario: Archive a flag
Given flag "OFFLINE_SYNC_V1" exists with status=ACTIVE
When I DELETE /api/v1/admin/flags/OFFLINE_SYNC_V1
Then the response is 200 OK
And the flag status is set to ARCHIVED
And event "platform_admin.flag.archived.v1" is published
And subsequent evaluate calls for this flag return false

Scenario: Cannot update an archived flag
Given flag "OFFLINE_SYNC_V1" has status=ARCHIVED
When I PATCH /api/v1/admin/flags/OFFLINE_SYNC_V1
Then the response is 422 Unprocessable Entity
And the error code is "ADM_FLAG_ARCHIVED"

Technical notes:

  • key is globally unique; validated as UPPER_SNAKE_CASE.
  • Archive is terminal — no reactivation path.
  • Redis cache invalidated on create/update/archive via platform_admin.flag.*.v1 events.
  • Flag status enum: ACTIVE | ARCHIVED.
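
The key format and the terminal archive state described above might look like this at the domain level. This is a sketch, not the module's actual code: the exact regular expression is an assumption (the source only says UPPER_SNAKE_CASE), and `FeatureFlag`, `isValidFlagKey`, `updateFlag`, and `archiveFlag` are hypothetical names.

```typescript
type FlagStatus = "ACTIVE" | "ARCHIVED";

interface FeatureFlag {
  key: string;
  description: string;
  defaultEnabled: boolean;
  status: FlagStatus;
}

type UpdateResult =
  | { ok: true; flag: FeatureFlag }
  | { ok: false; code: "ADM_FLAG_ARCHIVED" };

// Assumed pattern: underscore-separated segments of uppercase letters and
// digits, starting with a letter.
const UPPER_SNAKE_CASE = /^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$/;

function isValidFlagKey(key: string): boolean {
  return UPPER_SNAKE_CASE.test(key);
}

// Archive is terminal: once a flag is ARCHIVED, every update is rejected.
function updateFlag(
  flag: FeatureFlag,
  patch: Partial<Pick<FeatureFlag, "defaultEnabled" | "description">>,
): UpdateResult {
  if (flag.status === "ARCHIVED") return { ok: false, code: "ADM_FLAG_ARCHIVED" };
  return { ok: true, flag: { ...flag, ...patch } };
}

function archiveFlag(flag: FeatureFlag): FeatureFlag {
  return { ...flag, status: "ARCHIVED" };
}
```

Because no function maps ARCHIVED back to ACTIVE, the absence of a reactivation path is enforced by construction.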

Definition of done:

  • POST /api/v1/admin/flags, PATCH /api/v1/admin/flags/:key, DELETE /api/v1/admin/flags/:key implemented
  • Duplicate key rejection with ADM_FLAG_KEY_DUPLICATE (409)
  • Archive terminal state enforced; archived flag updates return ADM_FLAG_ARCHIVED (422)
  • Events published via outbox for create/update/archive
  • Redis cache invalidated on mutation

PLTADM-US-004 — Manage tenant overrides on feature flags

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform engineer, I can set per-tenant overrides on feature flags so that I can enable or disable features for specific tenants independently |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | feature-flag-module |
| Fix version | M0 |
| Epic link | PLTADM-EPIC-02 |
| FR references | FR-PLTADM-FF-004, FR-PLTADM-FF-005 |
| Legacy FR refs | FR-ADM-FF-004, FR-ADM-FF-005 |

User story:

As a platform engineer, I want to add or remove per-tenant overrides on a feature flag so that I can enable a beta feature for specific tenants or block a feature for a problem tenant without affecting the global default.

Acceptance criteria:

Scenario: Enable a flag for a specific tenant (override)
Given flag "NEW_DASHBOARD" has defaultEnabled=false
When I POST /api/v1/admin/flags/NEW_DASHBOARD/overrides
with { "tenantId": "ten_DEMO001", "override": "ENABLED" }
Then the response is 200 OK
And ten_DEMO001 is added to enabledTenantIds
And evaluate(NEW_DASHBOARD, ten_DEMO001) returns true
And event "platform_admin.flag.updated.v1" is published

Scenario: Disable a flag for a specific tenant
Given flag "NEW_DASHBOARD" has defaultEnabled=true
When I POST /api/v1/admin/flags/NEW_DASHBOARD/overrides
with { "tenantId": "ten_DEMO001", "override": "DISABLED" }
Then ten_DEMO001 is added to disabledTenantIds
And evaluate(NEW_DASHBOARD, ten_DEMO001) returns false

Scenario: Remove a tenant override
Given ten_DEMO001 is in enabledTenantIds for "NEW_DASHBOARD"
When I DELETE /api/v1/admin/flags/NEW_DASHBOARD/overrides/ten_DEMO001
Then the tenant is removed from enabledTenantIds
And evaluate falls back to defaultEnabled

Scenario: Cannot set override on archived flag
Given flag "OLD_FEATURE" has status=ARCHIVED
When I POST /api/v1/admin/flags/OLD_FEATURE/overrides
Then the response is 422 Unprocessable Entity
And the error code is "ADM_FLAG_ARCHIVED"

Technical notes:

  • enabledTenantIds and disabledTenantIds are JSONB arrays on the feature_flags row.
  • A tenant can appear in at most one of the two arrays; adding to one removes from the other.
  • Redis cache key: flag:{key}:{tenantId} and flag:{key}:*; invalidate both on override change.
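
The mutual-exclusion rule for the two arrays can be sketched as follows. The function names and the `FlagOverrides` shape are assumptions for illustration; the array names, the ENABLED/DISABLED values, and the cache key patterns are taken from the notes above.

```typescript
type Override = "ENABLED" | "DISABLED";

interface FlagOverrides {
  enabledTenantIds: string[];
  disabledTenantIds: string[];
}

// Adds the tenant to one array and guarantees it is absent from the other.
// Re-applying the same override is a no-op (no duplicate ids).
function setOverride(
  current: FlagOverrides,
  tenantId: string,
  override: Override,
): FlagOverrides {
  const cleared = removeOverride(current, tenantId);
  if (override === "ENABLED") cleared.enabledTenantIds.push(tenantId);
  else cleared.disabledTenantIds.push(tenantId);
  return cleared;
}

// Removes the tenant from both arrays so evaluation falls back to defaultEnabled.
function removeOverride(current: FlagOverrides, tenantId: string): FlagOverrides {
  return {
    enabledTenantIds: current.enabledTenantIds.filter((id) => id !== tenantId),
    disabledTenantIds: current.disabledTenantIds.filter((id) => id !== tenantId),
  };
}

// Cache keys to invalidate after an override change, per the note above.
function cacheKeysToInvalidate(flagKey: string, tenantId: string): string[] {
  return [`flag:${flagKey}:${tenantId}`, `flag:${flagKey}:*`];
}
```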

Definition of done:

  • POST /api/v1/admin/flags/:key/overrides and DELETE /api/v1/admin/flags/:key/overrides/:tenantId implemented
  • Mutual exclusion of enabled/disabled arrays enforced at domain level
  • Archived flag override rejection (ADM_FLAG_ARCHIVED)
  • Cache invalidation on override change (scoped and global flag cache keys)

PLTADM-US-005 — Evaluate feature flag for a tenant

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a downstream service, I can call the internal evaluate endpoint so that I can determine feature availability for a tenant with p95 latency ≤ 120 ms |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | feature-flag-module |
| Fix version | M0 |
| Epic link | PLTADM-EPIC-02 |
| FR references | FR-PLTADM-FF-006, FR-PLTADM-ENH-003, FR-PLTADM-ENH-004 |
| Legacy FR refs | FR-ADM-FF-006, FR-ADM-ENH-003, FR-ADM-ENH-004 |

User story:

As a downstream service, I want to call GET /internal/admin/flags/:key/evaluate?tenantId=... so that I can determine whether a feature is enabled for a tenant with deterministic logic and p95 latency ≤ 120 ms.

Acceptance criteria:

Scenario: Evaluate archived flag returns false
Given flag "OLD_FEATURE" has status=ARCHIVED
When GET /internal/admin/flags/OLD_FEATURE/evaluate?tenantId=ten_DEMO001
Then the response is 200 OK
And { "enabled": false, "reason": "ARCHIVED" }

Scenario: Evaluate flag with disabled tenant override
Given flag "NEW_DASHBOARD" has defaultEnabled=true
And ten_DEMO001 is in disabledTenantIds
When GET /internal/admin/flags/NEW_DASHBOARD/evaluate?tenantId=ten_DEMO001
Then { "enabled": false, "reason": "TENANT_DISABLED" }

Scenario: Evaluate flag with enabled tenant override
Given flag "NEW_DASHBOARD" has defaultEnabled=false
And ten_DEMO001 is in enabledTenantIds
When GET /internal/admin/flags/NEW_DASHBOARD/evaluate?tenantId=ten_DEMO001
Then { "enabled": true, "reason": "TENANT_ENABLED" }

Scenario: Evaluate flag falls back to defaultEnabled
Given flag "NEW_DASHBOARD" has defaultEnabled=true
And ten_DEMO001 has no override
When GET /internal/admin/flags/NEW_DASHBOARD/evaluate?tenantId=ten_DEMO001
Then { "enabled": true, "reason": "DEFAULT" }

Scenario: Evaluate returns p95 <= 120ms under load
Given the Redis cache is warm
When 1000 concurrent evaluate calls are made
Then p95 response time is <= 120ms

Scenario: Bootstrap endpoint returns all flag evaluations for a tenant
Given 15 active flags, 2 with overrides for ten_DEMO001
When GET /internal/admin/flags/bootstrap?tenantId=ten_DEMO001
Then the response contains all 15 flags with their evaluated enabled values

Technical notes:

  • Evaluation order (first match wins): status ARCHIVED → false; tenant in disabledTenantIds → false; tenant in enabledTenantIds → true; otherwise defaultEnabled.
  • Redis cache: flag:{key}:{tenantId} TTL 60 s; warm on first call; invalidated via NATS event listener.
  • Bootstrap endpoint is used by service startup and client SDK hydration.
  • Internal endpoint is restricted to cluster-internal IPs (no JWT is required on the evaluate path; a network policy enforces the restriction).
  • Compatibility route: GET /api/platform/flags/:key/evaluate redirects to the internal path and returns Deprecation and Sunset response headers.
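
The four-step evaluation order can be written as a small pure function. This is a sketch under assumptions: the `FeatureFlag` shape and the name `evaluateFlag` are hypothetical, while the precedence and the reason strings follow the scenarios above. Caching is deliberately left out.

```typescript
type Reason = "ARCHIVED" | "TENANT_DISABLED" | "TENANT_ENABLED" | "DEFAULT";

interface FeatureFlag {
  key: string;
  status: "ACTIVE" | "ARCHIVED";
  defaultEnabled: boolean;
  enabledTenantIds: string[];
  disabledTenantIds: string[];
}

interface Evaluation {
  enabled: boolean;
  reason: Reason;
}

// First matching rule wins; the order of the checks is the precedence.
function evaluateFlag(flag: FeatureFlag, tenantId: string): Evaluation {
  if (flag.status === "ARCHIVED") return { enabled: false, reason: "ARCHIVED" };
  if (flag.disabledTenantIds.includes(tenantId)) {
    return { enabled: false, reason: "TENANT_DISABLED" };
  }
  if (flag.enabledTenantIds.includes(tenantId)) {
    return { enabled: true, reason: "TENANT_ENABLED" };
  }
  return { enabled: flag.defaultEnabled, reason: "DEFAULT" };
}
```

Since the result depends only on the flag row and the tenant id, it is safe to cache per `flag:{key}:{tenantId}` and to recompute after any invalidation event.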

Definition of done:

  • GET /internal/admin/flags/:key/evaluate implements 4-step logic deterministically
  • GET /internal/admin/flags/bootstrap returns all active flags for tenant
  • Redis cache warm + TTL 60 s verified
  • Event-driven cache invalidation wired (flag.updated + flag.archived events)
  • p95 ≤ 120 ms verified under 1000 RPS load test
  • Compatibility route present with Deprecation and Sunset response headers

PLTADM-US-006 — View aggregate platform health

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform operator, I can call the health aggregate endpoint so that I can triage incidents with a single view of all service statuses |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | health-module |
| Fix version | M0 |
| Epic link | PLTADM-EPIC-03 |
| FR references | FR-PLTADM-HLT-001, FR-PLTADM-HLT-002 |
| Legacy FR refs | FR-ADM-HLT-001, FR-ADM-HLT-002 |

User story:

As a platform operator, I want to call GET /api/v1/admin/health/aggregate so that I can see the overall platform health status and per-service breakdown to quickly triage incidents.

Acceptance criteria:

Scenario: Aggregate health returns overall UP when all services are healthy
Given 5 registered health sources, all returning healthy in the last poll
When GET /api/v1/admin/health/aggregate
Then the response is 200 OK
And { "overall": "UP", "services": [ { "name": "...", "status": "UP", "latencyMs": ... }, ... ] }

Scenario: Aggregate health returns DEGRADED when one service is unhealthy
Given service "notification-service" returned UNHEALTHY in last poll
When GET /api/v1/admin/health/aggregate
Then { "overall": "DEGRADED", "services": [ ..., { "name": "notification-service", "status": "DOWN" } ] }

Scenario: Response is served from 10s cache
Given the cache was populated 5 seconds ago
When two rapid successive GET /api/v1/admin/health/aggregate calls are made
Then both return 200 within 50ms (cache hit)
And no upstream health probes are triggered

Scenario: Response returns within 2 seconds
Given 27 registered health sources with staggered probe results
When GET /api/v1/admin/health/aggregate
Then the response time is <= 2000ms

Scenario: Non-authenticated request returns 401
Given no Authorization header
When GET /api/v1/admin/health/aggregate
Then 401 Unauthorized

Technical notes:

  • Response cached at 10 s TTL in Redis; HealthPollerJob probes each source every 15 s in background.
  • overall logic: UP if all sources healthy; DEGRADED if ≥1 down but <50%; DOWN if ≥50% down.
  • At M0 health sources are seeded statically; dynamic registration added in PLTADM-US-007 (M1).
  • Response size: 27 services × ~100 bytes = ~2.7 KB; no pagination needed.
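
The overall-status rule can be expressed in a few lines. A minimal sketch: the name `overallStatus` is hypothetical, and treating an empty source list as UP is an assumption the notes do not cover.

```typescript
type SourceStatus = "UP" | "DOWN";
type Overall = "UP" | "DEGRADED" | "DOWN";

// UP if no source is down; DOWN if at least half are down; otherwise DEGRADED.
// Assumption: an empty list counts as UP (no source reports a failure).
function overallStatus(statuses: readonly SourceStatus[]): Overall {
  const down = statuses.filter((s) => s === "DOWN").length;
  if (down === 0) return "UP";
  // Integer comparison avoids floating-point division at the 50% boundary.
  return down * 2 >= statuses.length ? "DOWN" : "DEGRADED";
}
```

For example, with 27 sources the result is DEGRADED for 1 to 13 sources down and DOWN from 14 upward.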

Definition of done:

  • GET /api/v1/admin/health/aggregate implemented with correct overall logic
  • 10 s Redis cache wired to HealthPollerJob
  • p99 response time ≤ 2 s verified under load
  • SUPER_ADMIN authentication enforced
  • Static seed list of health sources populates on service start

PLTADM-US-007 — Register and update health sources dynamically

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a service instance, I can register myself as a health source so that dynamic deployments are reflected in the aggregate health without hardcoded lists |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S1 |
| Components | health-module |
| Fix version | M1 |
| Epic link | PLTADM-EPIC-03 |
| FR references | FR-PLTADM-HLT-003, FR-PLTADM-HLT-004, FR-PLTADM-ENH-002 |
| Legacy FR refs | FR-ADM-HLT-003, FR-ADM-HLT-004, FR-ADM-ENH-002 |

User story:

As a Kubernetes service instance, I want to POST my health endpoint to /internal/admin/health/sources on startup so that I am automatically included in the aggregate health view without requiring a hardcoded list update.

Acceptance criteria:

Scenario: Register a new health source
Given service "new-service" has not previously registered
When POST /internal/admin/health/sources
with { "name": "new-service", "healthUrl": "http://new-service:3020/health" }
Then the response is 201 Created
And the source is stored in health_sources
And event "platform_admin.health_source.registered.v1" is published

Scenario: Re-registration updates heartbeat timestamp
Given source "identity-service" was last registered 30 seconds ago
When POST /internal/admin/health/sources again with same payload
Then the response is 200 OK
And lastRegisteredAt is updated to now

Scenario: Stale source is marked unhealthy
Given source "old-service" last registered 90 seconds ago (staleness threshold=60s)
When HealthPollerJob runs
Then "old-service" status is set to UNHEALTHY in health_sources
And aggregate health reflects the degraded status

Scenario: Stale source re-registers and recovers
Given "old-service" is marked UNHEALTHY due to staleness
When "old-service" POSTs to /internal/admin/health/sources
Then lastRegisteredAt is updated
And on next poll the source returns to HEALTHY if its health endpoint responds 200

Technical notes:

  • POST /internal/admin/health/sources is idempotent on name; upsert by name.
  • Staleness threshold configurable via PLTADM_HEALTH_STALENESS_S (default 60 s).
  • HealthPollerJob CronJob runs every 15 s; probes healthUrl; updates health_check_results.
  • Dynamic registration replaces static seed list at M1; static seed remains as fallback behind feature flag DYNAMIC_HEALTH_REGISTRATION.
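
The upsert-by-name and staleness rules could be sketched with an in-memory map standing in for the health_sources table. The names `registerSource` and `isStale` are hypothetical, and treating a source as stale only when its age strictly exceeds the threshold is an assumption; the source does not say what happens at exactly 60 s.

```typescript
interface HealthSource {
  name: string;
  healthUrl: string;
  lastRegisteredAt: number; // epoch milliseconds
}

// Upsert by name: the first registration creates the source (201 in the
// scenarios), a repeat registration refreshes the heartbeat (200).
function registerSource(
  sources: Map<string, HealthSource>,
  registration: { name: string; healthUrl: string },
  nowMs: number,
): "created" | "updated" {
  const existed = sources.has(registration.name);
  sources.set(registration.name, { ...registration, lastRegisteredAt: nowMs });
  return existed ? "updated" : "created";
}

// thresholdS mirrors PLTADM_HEALTH_STALENESS_S (default 60 s).
function isStale(source: HealthSource, nowMs: number, thresholdS = 60): boolean {
  return nowMs - source.lastRegisteredAt > thresholdS * 1000;
}
```

Passing `nowMs` in explicitly, rather than reading the clock inside the functions, keeps both rules deterministic in tests.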

Definition of done:

  • POST /internal/admin/health/sources upserts by name with heartbeat update
  • Staleness check in HealthPollerJob marks stale sources UNHEALTHY
  • platform_admin.health_source.registered.v1 published on new registration
  • Static seed list kept behind DYNAMIC_HEALTH_REGISTRATION flag for rollback
  • Integration test: register → probe → aggregate reflects new source

PLTADM-US-008 — Verify service meets coverage and latency targets

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform team lead, I can confirm test coverage ≥ 80% and flag evaluate p95 ≤ 120ms so that platform-admin-service meets quality gates before M1 sign-off |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | cross-cutting |
| Fix version | M1 |
| Epic link | PLTADM-EPIC-04 |
| FR references | FR-PLTADM-NFR-001, FR-PLTADM-NFR-002 |
| Legacy FR refs | NFR-ADM-001, NFR-ADM-002 |

User story:

As a platform team lead, I want to run the test suite and see coverage ≥ 80% with zero lint/typecheck errors, and confirm flag evaluate p95 ≤ 120 ms so that platform-admin-service can pass the quality gate and proceed to production.

Acceptance criteria:

Scenario: Unit and integration coverage threshold met
When pnpm test:cov is executed
Then overall statement coverage is >= 80%
And branch coverage is >= 80%
And the following test files exist:
- config-module.spec.ts (allow-list, type validation, history)
- feature-flag.spec.ts (CRUD, evaluation logic, cache)
- health-aggregate.spec.ts (aggregation, staleness)
- tenant-isolation.spec.ts
- outbox.spec.ts
- inbox.spec.ts

Scenario: ESLint and TypeScript type checks pass
When pnpm lint && pnpm typecheck is executed
Then exit code is 0 with zero errors

Scenario: Flag evaluate p95 latency target met
Given Redis cache warm
When 1000 concurrent evaluate requests are sent
Then p95 response time is <= 120ms
And p99 response time is <= 200ms

Scenario: Aggregate health p99 latency target met
Given 27 registered health sources with cached results
When 100 concurrent aggregate health requests are sent
Then p99 response time is <= 2000ms

Technical notes:

  • Load test script: k6 run tests/load/flag-evaluate.k6.js targeting 1000 RPS for 60 s.
  • Coverage report generated by Vitest with --coverage flag; threshold enforced in vitest.config.ts.
  • CI gate: coverage check runs in the test job; load test runs in a separate perf-test job on pre-prod.

Definition of done:

  • vitest.config.ts coverage thresholds set to 80% (statements, branches, functions, lines)
  • All 6 mandatory test files present and passing
  • pnpm lint && pnpm typecheck returns exit 0
  • k6 load test confirms p95 ≤ 120 ms at 1000 RPS (recorded in CI artefact)
  • Health aggregate p99 ≤ 2 s verified

PLTADM-US-009 — Validate observability and audit trail completeness

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform SRE, I can confirm OTel traces are visible, SLO burn alerts are configured, and config audit history is preserved for 7 years |
| Status | To Do |
| Priority | Must |
| Labels | service:platform-admin, domain:platform_admin, slice:S0 |
| Components | cross-cutting |
| Fix version | M1 |
| Epic link | PLTADM-EPIC-04 |
| FR references | FR-PLTADM-NFR-003 |
| Legacy FR refs | NFR-ADM-003 |

User story:

As a platform SRE, I want to confirm that OpenTelemetry traces flow through to the tracing backend, SLO burn-rate alerts are active, and config_history rows are retained for 7 years so that the service meets platform-wide observability and compliance requirements.

Acceptance criteria:

Scenario: OTel trace visible for flag evaluate
Given OTel exporter is configured and staging is running
When GET /internal/admin/flags/OFFLINE_SYNC_V2/evaluate?tenantId=ten_DEMO001
Then a trace is visible in the tracing backend
And span "flag.evaluate" includes attributes: flag_key, tenant_id, evaluation_result, cache_hit

Scenario: OTel trace visible for config mutation
When PATCH /api/v1/admin/platform-config/:key is called
Then a trace with span "config.update" includes: config_key, config_scope, actor_sub

Scenario: SLO burn-rate alert fires on latency regression
Given the SLO for flag evaluate p95 is 120ms
When flag evaluate p95 exceeds 120ms for 5 consecutive minutes
Then an alert fires to the SRE on-call channel

Scenario: Config audit history retained for 7 years
Given a config mutation happened 2 years ago
When querying config_history for that key
Then the record is still present
And the retention policy annotation confirms 7-year retention with S3 archive after 2 years

Technical notes:

  • OTel SDK: @opentelemetry/sdk-node; exporter: OTLP HTTP to OTEL_EXPORTER_OTLP_ENDPOINT.
  • Key span names: flag.evaluate, flag.cache.hit, flag.cache.miss, config.update, health.poll.
  • SLO burn-rate alert configured in Prometheus/Alertmanager: platform_admin_flag_evaluate_p95_ms > 120 for 5 min → page SRE.
  • config_history retention: 7-year PostgreSQL retention policy; rows older than 2 years archived to S3 via nightly job.

Definition of done:

  • OTel instrumentation in config-module, feature-flag-module, health-module (all key spans instrumented)
  • Traces visible in staging tracing backend
  • SLO burn-rate Prometheus alert rule deployed and tested (fire/resolve cycle)
  • config_history retention policy set via pg_partman or equivalent; S3 archive job configured
  • Compliance team sign-off on audit trail documented in SERVICE_READINESS.md

PLTADM-US-010 — Retrieve paginated config history

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a platform operator, I can retrieve paginated config change history so that I can audit who changed what and when |
| Status | To Do |
| Priority | Should |
| Labels | service:platform-admin, domain:platform_admin, slice:S1 |
| Components | config-module |
| Fix version | M1 |
| Epic link | PLTADM-EPIC-05 |
| FR references | FR-PLTADM-ENH-001 |
| Legacy FR refs | FR-ADM-ENH-001 |

User story:

As a platform operator, I want to call GET /api/v1/admin/platform-config/:key/history so that I can audit every change to a config entry with who made it and what values changed, with cursor-based pagination.

Acceptance criteria:

Scenario: Retrieve history for a config key
Given 50 change records exist for key "session.timeout_minutes"
When GET /api/v1/admin/platform-config/session.timeout_minutes/history?limit=20
Then the response is 200 OK
And the response contains 20 records sorted by changed_at DESC
And each record includes: id, key, previous_value, new_value, changed_by, changed_at
And a nextCursor is included in the response

Scenario: Cursor-based pagination yields consistent results
Given the first page returned nextCursor "cursor_abc"
When GET /api/v1/admin/platform-config/session.timeout_minutes/history?limit=20&cursor=cursor_abc
Then the next 20 records are returned without duplicates

Scenario: History for unknown key returns 404
When GET /api/v1/admin/platform-config/unknown.key/history
Then 404 Not Found with error code "ADM_CONFIG_KEY_UNKNOWN"

Scenario: Secret-type key history redacts values
Given key "smtp.password" has type=secret
When GET /api/v1/admin/platform-config/smtp.password/history
Then all previous_value and new_value fields show "***REDACTED***"

Technical notes:

  • Cursor: opaque base64-encoded { id, changed_at } for keyset pagination.
  • Default sort: changed_at DESC.
  • History endpoint is read-only; no write operations.
  • changed_by is the sub claim from the SUPER_ADMIN JWT that performed the mutation.
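
Keyset pagination with an opaque cursor can be sketched over an in-memory array. This is an illustration under assumptions, not the endpoint's implementation: `pageOf` and the `Page` shape are hypothetical, `changed_at` is assumed to be a UTC ISO 8601 string (so string comparison matches time order), `id` is assumed numeric and used as a tiebreaker, and `btoa`/`atob` stand in for whatever base64 helper the service uses.

```typescript
interface HistoryRecord {
  id: number;
  changed_at: string; // UTC ISO 8601, e.g. "2026-01-01T00:01:00.000Z"
}

interface Cursor {
  id: number;
  changed_at: string;
}

interface Page {
  items: HistoryRecord[];
  nextCursor?: string;
}

const encodeCursor = (c: Cursor): string => btoa(JSON.stringify(c));
const decodeCursor = (s: string): Cursor => JSON.parse(atob(s)) as Cursor;

// Newest first; id breaks ties between records with the same timestamp.
const byNewest = (a: HistoryRecord, b: HistoryRecord): number =>
  a.changed_at === b.changed_at ? b.id - a.id : a.changed_at < b.changed_at ? 1 : -1;

function pageOf(records: readonly HistoryRecord[], limit: number, cursor?: string): Page {
  const sorted = [...records].sort(byNewest);
  const after = cursor ? decodeCursor(cursor) : undefined;
  // Keep only records strictly older than the cursor position.
  const rest = after
    ? sorted.filter(
        (r) =>
          r.changed_at < after.changed_at ||
          (r.changed_at === after.changed_at && r.id < after.id),
      )
    : sorted;
  const items = rest.slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = rest.length > limit;
  return hasMore && last
    ? { items, nextCursor: encodeCursor({ id: last.id, changed_at: last.changed_at }) }
    : { items };
}
```

Because the cursor encodes a position rather than an offset, rows inserted between requests do not shift later pages, which is what keeps results free of duplicates. In a database the same predicate would typically be a comparison on (changed_at, id) with a matching ORDER BY.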

Definition of done:

  • GET /api/v1/admin/platform-config/:key/history implemented with cursor pagination
  • Sort order changed_at DESC enforced
  • Secret-type value redaction in history responses
  • changed_by populated from JWT sub on every mutation
  • Integration test: 50-record seed → paginate through all records in 3 pages

PLTADM-US-011 — List feature flags visible to a tenant admin

| Field | Value |
|---|---|
| Issue type | Story |
| Summary | As a tenant admin, I can list feature flags applicable to my tenant so that I can understand which features are available to my organization |
| Status | To Do |
| Priority | Should |
| Labels | service:platform-admin, domain:platform_admin, slice:S1 |
| Components | feature-flag-module |
| Fix version | M1 |
| Epic link | PLTADM-EPIC-05 |
| FR references | FR-PLTADM-ENH-003 |
| Legacy FR refs | FR-ADM-ENH-003 |

User story:

As a tenant admin, I want to call GET /api/v1/tenant/flags so that I can see all active feature flags and their evaluated status for my tenant, enabling me to understand which features my organization can use.

Acceptance criteria:

Scenario: Tenant admin lists flags for their tenant
Given I am authenticated as TENANT_ADMIN for tenant "ten_DEMO001"
And 12 active flags exist; 3 have specific overrides for ten_DEMO001
When GET /api/v1/tenant/flags
Then the response is 200 OK
And the response contains 12 flags (archived flags excluded)
And each flag shows: key, description, enabled (evaluated for ten_DEMO001), hasOverride

Scenario: Archived flags are excluded from tenant listing
Given 2 flags have status=ARCHIVED
When GET /api/v1/tenant/flags
Then the response does not include the 2 archived flags

Scenario: Tenant admin can only see their own tenant's flag evaluation
Given I am TENANT_ADMIN for "ten_DEMO001"
When GET /api/v1/tenant/flags
Then all enabled values are evaluated for tenantId="ten_DEMO001"
And I cannot see flags evaluated for any other tenant

Scenario: SUPER_ADMIN can query flags for any tenant
Given I am authenticated as SUPER_ADMIN
When GET /api/v1/tenant/flags?tenantId=ten_DEMO001
Then flags evaluated for ten_DEMO001 are returned

Technical notes:

  • tenantId resolved from JWT tenant_id claim for TENANT_ADMIN; can be overridden with query param for SUPER_ADMIN only.
  • Response uses cached evaluate results where available (60 s TTL); falls back to DB.
  • hasOverride: true if tenant appears in either enabledTenantIds or disabledTenantIds.
  • Archived flags filtered at query level (WHERE status = 'ACTIVE').
  • Compatibility route: GET /api/platform/flags redirects to this endpoint and returns a Deprecation response header.
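
Tenant resolution and the per-tenant listing could be sketched as two pure functions. The shapes and names here (`Claims`, `resolveTenantId`, `listFlagsForTenant`, and a `role` claim) are assumptions for illustration; only the `tenant_id` claim, the SUPER_ADMIN-only query override, the ACTIVE filter, and the `hasOverride` rule come from the notes above. The cache layer is omitted.

```typescript
type Role = "SUPER_ADMIN" | "TENANT_ADMIN";

interface Claims {
  role: Role; // hypothetical claim name
  tenant_id?: string;
}

interface FeatureFlag {
  key: string;
  description: string;
  status: "ACTIVE" | "ARCHIVED";
  defaultEnabled: boolean;
  enabledTenantIds: string[];
  disabledTenantIds: string[];
}

interface FlagView {
  key: string;
  description: string;
  enabled: boolean;
  hasOverride: boolean;
}

// Only SUPER_ADMIN may override the tenant via the query parameter;
// for everyone else the query parameter is ignored.
function resolveTenantId(claims: Claims, queryTenantId?: string): string | undefined {
  if (claims.role === "SUPER_ADMIN" && queryTenantId) return queryTenantId;
  return claims.tenant_id;
}

function listFlagsForTenant(flags: readonly FeatureFlag[], tenantId: string): FlagView[] {
  return flags
    .filter((f) => f.status === "ACTIVE") // archived flags are excluded
    .map((f) => {
      const disabled = f.disabledTenantIds.includes(tenantId);
      const enabled = f.enabledTenantIds.includes(tenantId);
      return {
        key: f.key,
        description: f.description,
        // A disabled override wins over an enabled one, then the default applies.
        enabled: disabled ? false : enabled ? true : f.defaultEnabled,
        hasOverride: disabled || enabled,
      };
    });
}
```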

Definition of done:

  • GET /api/v1/tenant/flags returns active flags with per-tenant evaluation
  • Archived flags excluded from response
  • TENANT_ADMIN scoped to their own tenantId; SUPER_ADMIN can override with query param
  • hasOverride field correctly populated
  • Compatibility route GET /api/platform/flags present with Deprecation response header
  • Unit test: 15 flags (3 archived, 3 with overrides) → correct subset returned with correct enabled values