Platform Admin Service — User Stories

Status: populated
Owner: TBD
Last updated: 2026-04-18

Story index

ID	Epic	Summary	Priority	Milestone
PLTADM-US-001	PLTADM-EPIC-01	Create and update platform config entries	Must	M0
PLTADM-US-002	PLTADM-EPIC-01	Read and delete platform config entries	Must	M0
PLTADM-US-003	PLTADM-EPIC-02	Create and update feature flags	Must	M0
PLTADM-US-004	PLTADM-EPIC-02	Manage tenant overrides on feature flags	Must	M0
PLTADM-US-005	PLTADM-EPIC-02	Evaluate feature flag for a tenant	Must	M0
PLTADM-US-006	PLTADM-EPIC-03	View aggregate platform health	Must	M0
PLTADM-US-007	PLTADM-EPIC-03	Register and update health sources dynamically	Must	M1
PLTADM-US-008	PLTADM-EPIC-04	Verify service meets coverage and latency targets	Must	M1
PLTADM-US-009	PLTADM-EPIC-04	Validate observability and audit trail completeness	Must	M1
PLTADM-US-010	PLTADM-EPIC-05	Retrieve paginated config history	Should	M1
PLTADM-US-011	PLTADM-EPIC-05	List feature flags visible to a tenant admin	Should	M1

PLTADM-US-001 — Create and update platform config entries

Field	Value
Issue type	Story
Summary	As a platform operator, I can create and update config entries so that I can govern platform-wide and tenant-scoped settings
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	config-module
Fix version	M0
Epic link	PLTADM-EPIC-01
FR references	FR-PLTADM-CFG-001, FR-PLTADM-CFG-002
Legacy FR refs	FR-ADM-CFG-001, FR-ADM-CFG-002

User story:

As a platform operator, I want to create and update platform config entries via the admin API so that I can govern platform-wide settings and per-tenant overrides from a single governed store.

Acceptance criteria:

Scenario: Create a new PLATFORM-scoped config entry
  Given I am authenticated as SUPER_ADMIN
  And the config key "session.timeout_minutes" exists in the allow-list
  When I POST /api/v1/admin/platform-config
    with { "key": "session.timeout_minutes", "value": "60", "scope": "PLATFORM" }
  Then the response is 201 Created
  And the config entry is persisted with scope=PLATFORM
  And event "platform_admin.config.updated.v1" is published to NATS

Scenario: Reject an unknown config key
  Given I am authenticated as SUPER_ADMIN
  When I POST /api/v1/admin/platform-config
    with { "key": "unknown.key", "value": "x", "scope": "PLATFORM" }
  Then the response is 400 Bad Request
  And the error code is "ADM_CONFIG_KEY_UNKNOWN"
  And no event is published

Scenario: Update an existing config entry
  Given config entry "mfa.required" exists with value "false"
  When I PATCH /api/v1/admin/platform-config/mfa.required
    with { "value": "true" }
  Then the response is 200 OK
  And a history record is appended to config_history with previous_value="false" and new_value="true"

Scenario: Create a TENANT-scoped config entry
  Given tenant "ten_DEMO001" exists and is ACTIVE
  When I POST /api/v1/admin/platform-config
    with { "key": "session.timeout_minutes", "value": "30", "scope": "TENANT", "tenantId": "ten_DEMO001" }
  Then the response is 201 Created
  And the config entry is stored with tenantId="ten_DEMO001"

Scenario: Reject config mutation from non-SUPER_ADMIN
  Given I am authenticated as TENANT_ADMIN
  When I POST /api/v1/admin/platform-config
  Then the response is 403 Forbidden

Technical notes:

Allow-list enforced in ConfigAllowListPort; unknown keys return ADM_CONFIG_KEY_UNKNOWN (400).
scope enum: PLATFORM | TENANT. TENANT scope requires tenantId.
Type validation: each allow-listed key has a defined type (string | number | boolean | json); value is validated against the type before persist.
Secret-type keys: the value is stored as a Secrets Manager reference; GET responses return ***REDACTED*** in place of the value.
Mutation emits platform_admin.config.updated.v1; downstream consumers (e.g., identity-service for session.timeout_minutes) subscribe.
History record written in same DB transaction as config update (transactional outbox).
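
The allow-list, scope, and type checks described above can be sketched as a single validation step. This is a minimal illustration, not the service's implementation: the ConfigRequest shape, the in-memory allowList map, the value types assigned to the two keys, and the error codes ADM_CONFIG_TENANT_ID_REQUIRED and ADM_CONFIG_VALUE_INVALID (and their 400 status) are all hypothetical. Only ADM_CONFIG_KEY_UNKNOWN comes from this document.

```typescript
// Hypothetical types: the real ConfigAllowListPort contract is not shown in this document.
type ConfigValueType = "string" | "number" | "boolean" | "json";
type Scope = "PLATFORM" | "TENANT";

interface ConfigRequest {
  key: string;
  value: string;
  scope: Scope;
  tenantId?: string;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; status: 400; code: string };

// Illustrative allow-list; the value types assigned to these keys are assumptions.
const allowList = new Map<string, ConfigValueType>([
  ["session.timeout_minutes", "number"],
  ["mfa.required", "boolean"],
]);

// Values arrive as strings (see the scenarios), so each type is checked by parsing.
function matchesType(value: string, type: ConfigValueType): boolean {
  switch (type) {
    case "string":
      return true;
    case "number":
      return value.trim() !== "" && Number.isFinite(Number(value));
    case "boolean":
      return value === "true" || value === "false";
    case "json":
      try {
        JSON.parse(value);
        return true;
      } catch {
        return false;
      }
  }
}

function validateConfigRequest(req: ConfigRequest): ValidationResult {
  const type = allowList.get(req.key);
  if (type === undefined) {
    return { ok: false, status: 400, code: "ADM_CONFIG_KEY_UNKNOWN" };
  }
  if (req.scope === "TENANT" && !req.tenantId) {
    // Hypothetical error code; the document does not name one for this case.
    return { ok: false, status: 400, code: "ADM_CONFIG_TENANT_ID_REQUIRED" };
  }
  if (!matchesType(req.value, type)) {
    // Hypothetical error code; the document does not name one for this case.
    return { ok: false, status: 400, code: "ADM_CONFIG_VALUE_INVALID" };
  }
  return { ok: true };
}
```

Running validation before any persistence or outbox write keeps the "no event is published" guarantee for rejected requests trivially true.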

Definition of done:

POST /api/v1/admin/platform-config and PATCH /api/v1/admin/platform-config/:key implemented and tested
Allow-list validation enforced; 23 seeded keys present in test fixtures
config_history row appended on every mutation
platform_admin.config.updated.v1 published via outbox
SUPER_ADMIN scope enforced; 403 returned for non-SUPER_ADMIN callers
Unit coverage ≥ 80% for config-module

PLTADM-US-002 — Read and delete platform config entries

Field	Value
Issue type	Story
Summary	As a platform operator, I can list and delete config entries so that I have full read/delete lifecycle control over governed settings
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	config-module
Fix version	M0
Epic link	PLTADM-EPIC-01
FR references	FR-PLTADM-CFG-003, FR-PLTADM-CFG-004
Legacy FR refs	FR-ADM-CFG-003, FR-ADM-CFG-004

User story:

As a platform operator, I want to list all config entries and delete individual ones so that I can inspect the current configuration state and remove obsolete tenant overrides.

Acceptance criteria:

Scenario: List all config entries
  Given 10 config entries exist (3 PLATFORM-scoped, 7 TENANT-scoped across 2 tenants)
  When I GET /api/v1/admin/platform-config
  Then the response is 200 OK
  And the response contains all 10 entries
  And secret-type entries show value "***REDACTED***"

Scenario: Get a single config entry by key and scope
  Given config entry "mfa.required" exists with scope=PLATFORM
  When I GET /api/v1/admin/platform-config/mfa.required?scope=PLATFORM
  Then the response is 200 OK
  And the response includes { "key": "mfa.required", "scope": "PLATFORM", "value": "true" }

Scenario: Delete a TENANT-scoped config entry
  Given a TENANT-scoped config for key "session.timeout_minutes" exists for tenant "ten_DEMO001"
  When I DELETE /api/v1/admin/platform-config/session.timeout_minutes?scope=TENANT&tenantId=ten_DEMO001
  Then the response is 204 No Content
  And the entry is removed from platform_configs

Scenario: Secret key value is redacted in list response
  Given config entry "smtp.password" with type=secret exists
  When I GET /api/v1/admin/platform-config
  Then the entry for "smtp.password" shows value "***REDACTED***"

Technical notes:

GET /api/v1/admin/platform-config returns array; no pagination required at M0 (max ~200 entries at launch).
Delete does not write a history record (deletion is audited via the outbox event only).
Redis cache for config reads: TTL 5 min; cache invalidated on mutation event.
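
Secret redaction in list and get responses can be sketched as a pure mapping applied just before serialization. The ConfigEntry shape and the toResponse name are hypothetical; the "secret" type and the ***REDACTED*** placeholder come from the scenarios above.

```typescript
// Hypothetical entry shape for illustration only.
interface ConfigEntry {
  key: string;
  scope: "PLATFORM" | "TENANT";
  type: "string" | "number" | "boolean" | "json" | "secret";
  value: string;
  tenantId?: string;
}

const REDACTED = "***REDACTED***";

// Returns copies with secret values masked; the input entries are not mutated.
function toResponse(entries: ConfigEntry[]): ConfigEntry[] {
  return entries.map((entry) =>
    entry.type === "secret" ? { ...entry, value: REDACTED } : entry,
  );
}
```

Applying the mask at the response boundary, rather than in the store, means cached and persisted entries never need to carry a second "display" value.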

Definition of done:

GET /api/v1/admin/platform-config and GET /api/v1/admin/platform-config/:key implemented
DELETE /api/v1/admin/platform-config/:key implemented with scope/tenantId query params
Secret-type redaction verified in list and get responses
SUPER_ADMIN scope enforced on all write/delete endpoints

PLTADM-US-003 — Create and update feature flags

Field	Value
Issue type	Story
Summary	As a platform engineer, I can create and update feature flags so that I can control feature rollout without code deployment
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	feature-flag-module
Fix version	M0
Epic link	PLTADM-EPIC-02
FR references	FR-PLTADM-FF-001, FR-PLTADM-FF-002, FR-PLTADM-FF-003
Legacy FR refs	FR-ADM-FF-001, FR-ADM-FF-002, FR-ADM-FF-003

User story:

As a platform engineer, I want to create, update, and archive feature flags via the admin API so that I can safely control feature rollout across all tenants or specific tenants without a code deployment.

Acceptance criteria:

Scenario: Create a new feature flag
  Given I am authenticated as SUPER_ADMIN
  When I POST /api/v1/admin/flags
    with { "key": "OFFLINE_SYNC_V2", "defaultEnabled": false, "description": "Enable offline sync v2" }
  Then the response is 201 Created
  And the flag is persisted with status=ACTIVE and defaultEnabled=false
  And event "platform_admin.flag.created.v1" is published

Scenario: Reject duplicate flag key
  Given flag "OFFLINE_SYNC_V2" already exists
  When I POST /api/v1/admin/flags with the same key
  Then the response is 409 Conflict
  And the error code is "ADM_FLAG_KEY_DUPLICATE"

Scenario: Update flag defaultEnabled and description
  Given flag "OFFLINE_SYNC_V2" exists with defaultEnabled=false
  When I PATCH /api/v1/admin/flags/OFFLINE_SYNC_V2
    with { "defaultEnabled": true, "description": "Enable offline sync v2 - GA" }
  Then the response is 200 OK
  And the flag defaultEnabled is updated to true
  And event "platform_admin.flag.updated.v1" is published

Scenario: Archive a flag
  Given flag "OFFLINE_SYNC_V1" exists with status=ACTIVE
  When I DELETE /api/v1/admin/flags/OFFLINE_SYNC_V1
  Then the response is 200 OK
  And the flag status is set to ARCHIVED
  And event "platform_admin.flag.archived.v1" is published
  And subsequent evaluate calls for this flag return false

Scenario: Cannot update an archived flag
  Given flag "OFFLINE_SYNC_V1" has status=ARCHIVED
  When I PATCH /api/v1/admin/flags/OFFLINE_SYNC_V1
  Then the response is 422 Unprocessable Entity
  And the error code is "ADM_FLAG_ARCHIVED"

Technical notes:

key is globally unique; validated as UPPER_SNAKE_CASE.
Archive is terminal — no reactivation path.
Redis cache invalidated on create/update/archive via platform_admin.flag.*.v1 events.
Flag status enum: ACTIVE | ARCHIVED.
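
The key format rule and the terminal archive state can be sketched as follows. The exact regular expression is an assumption (the document only says UPPER_SNAKE_CASE), and the FeatureFlag shape and function names are hypothetical.

```typescript
type FlagStatus = "ACTIVE" | "ARCHIVED";

// Hypothetical flag shape for illustration only.
interface FeatureFlag {
  key: string;
  status: FlagStatus;
  defaultEnabled: boolean;
  description: string;
}

// Assumed reading of UPPER_SNAKE_CASE: starts with a letter, segments joined by single underscores.
const UPPER_SNAKE_CASE = /^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$/;

function isValidFlagKey(key: string): boolean {
  return UPPER_SNAKE_CASE.test(key);
}

type UpdateResult =
  | { ok: true; flag: FeatureFlag }
  | { ok: false; status: 422; code: "ADM_FLAG_ARCHIVED" };

// Archived flags reject every update; there is no path back to ACTIVE.
function updateFlag(
  flag: FeatureFlag,
  patch: Partial<Pick<FeatureFlag, "defaultEnabled" | "description">>,
): UpdateResult {
  if (flag.status === "ARCHIVED") {
    return { ok: false, status: 422, code: "ADM_FLAG_ARCHIVED" };
  }
  return { ok: true, flag: { ...flag, ...patch } };
}

function archiveFlag(flag: FeatureFlag): FeatureFlag {
  return { ...flag, status: "ARCHIVED" };
}
```

Restricting the patch type to defaultEnabled and description keeps status out of reach of PATCH, so archiving can only happen through the dedicated DELETE path.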

Definition of done:

POST /api/v1/admin/flags, PATCH /api/v1/admin/flags/:key, DELETE /api/v1/admin/flags/:key implemented
Duplicate key rejection with ADM_FLAG_KEY_DUPLICATE (409)
Archive terminal state enforced; archived flag updates return ADM_FLAG_ARCHIVED (422)
Events published via outbox for create/update/archive
Redis cache invalidated on mutation

PLTADM-US-004 — Manage tenant overrides on feature flags

Field	Value
Issue type	Story
Summary	As a platform engineer, I can set per-tenant overrides on feature flags so that I can enable or disable features for specific tenants independently
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	feature-flag-module
Fix version	M0
Epic link	PLTADM-EPIC-02
FR references	FR-PLTADM-FF-004, FR-PLTADM-FF-005
Legacy FR refs	FR-ADM-FF-004, FR-ADM-FF-005

User story:

As a platform engineer, I want to add or remove per-tenant overrides on a feature flag so that I can enable a beta feature for specific tenants or block a feature for a problem tenant without affecting the global default.

Acceptance criteria:

Scenario: Enable a flag for a specific tenant (override)
  Given flag "NEW_DASHBOARD" has defaultEnabled=false
  When I POST /api/v1/admin/flags/NEW_DASHBOARD/overrides
    with { "tenantId": "ten_DEMO001", "override": "ENABLED" }
  Then the response is 200 OK
  And ten_DEMO001 is added to enabledTenantIds
  And evaluate(NEW_DASHBOARD, ten_DEMO001) returns true
  And event "platform_admin.flag.updated.v1" is published

Scenario: Disable a flag for a specific tenant
  Given flag "NEW_DASHBOARD" has defaultEnabled=true
  When I POST /api/v1/admin/flags/NEW_DASHBOARD/overrides
    with { "tenantId": "ten_DEMO001", "override": "DISABLED" }
  Then ten_DEMO001 is added to disabledTenantIds
  And evaluate(NEW_DASHBOARD, ten_DEMO001) returns false

Scenario: Remove a tenant override
  Given ten_DEMO001 is in enabledTenantIds for "NEW_DASHBOARD"
  When I DELETE /api/v1/admin/flags/NEW_DASHBOARD/overrides/ten_DEMO001
  Then the tenant is removed from enabledTenantIds
  And evaluate falls back to defaultEnabled

Scenario: Cannot set override on archived flag
  Given flag "OLD_FEATURE" has status=ARCHIVED
  When I POST /api/v1/admin/flags/OLD_FEATURE/overrides
  Then the response is 422 Unprocessable Entity
  And the error code is "ADM_FLAG_ARCHIVED"

Technical notes:

enabledTenantIds and disabledTenantIds are JSONB arrays on the feature_flags row.
A tenant can appear in at most one of the two arrays; adding to one removes from the other.
Redis cache keys: flag:{key}:{tenantId} (tenant-scoped) and flag:{key}:* (all tenants for the flag); invalidate both on override change.
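
The mutual-exclusion rule for the two override arrays can be sketched as a pure function over the flag's override state. The FlagOverrides shape and function names are hypothetical; the array names, override values, and cache key patterns come from this story.

```typescript
// Hypothetical view of the two JSONB arrays on the feature_flags row.
interface FlagOverrides {
  enabledTenantIds: string[];
  disabledTenantIds: string[];
}

type Override = "ENABLED" | "DISABLED";

// Removes the tenant from both arrays first, so it can end up in at most one of them.
function setOverride(flag: FlagOverrides, tenantId: string, override: Override): FlagOverrides {
  const enabled = flag.enabledTenantIds.filter((id) => id !== tenantId);
  const disabled = flag.disabledTenantIds.filter((id) => id !== tenantId);
  if (override === "ENABLED") {
    enabled.push(tenantId);
  } else {
    disabled.push(tenantId);
  }
  return { enabledTenantIds: enabled, disabledTenantIds: disabled };
}

// Removing an override makes evaluation fall back to defaultEnabled.
function removeOverride(flag: FlagOverrides, tenantId: string): FlagOverrides {
  return {
    enabledTenantIds: flag.enabledTenantIds.filter((id) => id !== tenantId),
    disabledTenantIds: flag.disabledTenantIds.filter((id) => id !== tenantId),
  };
}

// Cache keys to invalidate after any override change.
function cacheKeysToInvalidate(flagKey: string, tenantId: string): string[] {
  return [`flag:${flagKey}:${tenantId}`, `flag:${flagKey}:*`];
}
```

Because setOverride filters before pushing, repeating the same request is idempotent and never produces duplicate tenant IDs.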

Definition of done:

POST /api/v1/admin/flags/:key/overrides and DELETE /api/v1/admin/flags/:key/overrides/:tenantId implemented
Mutual exclusion of enabled/disabled arrays enforced at domain level
Archived flag override rejection (ADM_FLAG_ARCHIVED)
Cache invalidation on override change (scoped and global flag cache keys)

PLTADM-US-005 — Evaluate feature flag for a tenant

Field	Value
Issue type	Story
Summary	As a downstream service, I can call the internal evaluate endpoint so that I can determine feature availability for a tenant with p95 latency ≤ 120 ms
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	feature-flag-module
Fix version	M0
Epic link	PLTADM-EPIC-02
FR references	FR-PLTADM-FF-006, FR-PLTADM-ENH-003, FR-PLTADM-ENH-004
Legacy FR refs	FR-ADM-FF-006, FR-ADM-ENH-003, FR-ADM-ENH-004

User story:

As a downstream service, I want to call GET /internal/admin/flags/:key/evaluate?tenantId=... so that I can determine whether a feature is enabled for a tenant with deterministic logic and p95 latency ≤ 120 ms.

Acceptance criteria:

Scenario: Evaluate archived flag returns false
  Given flag "OLD_FEATURE" has status=ARCHIVED
  When GET /internal/admin/flags/OLD_FEATURE/evaluate?tenantId=ten_DEMO001
  Then the response is 200 OK
  And { "enabled": false, "reason": "ARCHIVED" }

Scenario: Evaluate flag with disabled tenant override
  Given flag "NEW_DASHBOARD" has defaultEnabled=true
  And ten_DEMO001 is in disabledTenantIds
  When GET /internal/admin/flags/NEW_DASHBOARD/evaluate?tenantId=ten_DEMO001
  Then { "enabled": false, "reason": "TENANT_DISABLED" }

Scenario: Evaluate flag with enabled tenant override
  Given flag "NEW_DASHBOARD" has defaultEnabled=false
  And ten_DEMO001 is in enabledTenantIds
  When GET /internal/admin/flags/NEW_DASHBOARD/evaluate?tenantId=ten_DEMO001
  Then { "enabled": true, "reason": "TENANT_ENABLED" }

Scenario: Evaluate flag falls back to defaultEnabled
  Given flag "NEW_DASHBOARD" has defaultEnabled=true
  And ten_DEMO001 has no override
  When GET /internal/admin/flags/NEW_DASHBOARD/evaluate?tenantId=ten_DEMO001
  Then { "enabled": true, "reason": "DEFAULT" }

Scenario: Evaluate returns p95 <= 120ms under load
  Given 1000 concurrent evaluate calls with Redis cache warm
  Then p95 response time is <= 120ms

Scenario: Bootstrap endpoint returns all flag evaluations for a tenant
  Given 15 active flags, 2 with overrides for ten_DEMO001
  When GET /internal/admin/flags/bootstrap?tenantId=ten_DEMO001
  Then the response contains all 15 flags with their evaluated enabled values

Technical notes:

Evaluation order (first match wins): status ARCHIVED → false; tenant in disabledTenantIds → false; tenant in enabledTenantIds → true; otherwise defaultEnabled.
Redis cache: flag:{key}:{tenantId} TTL 60 s; warm on first call; invalidated via NATS event listener.
Bootstrap endpoint is used by service startup and client SDK hydration.
Internal endpoint restricted to cluster-internal IPs (no JWT required at evaluate path; network policy enforces).
Compatibility route: GET /api/platform/flags/:key/evaluate redirects to the internal path and returns a deprecation header.
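
The four-step evaluation order can be sketched as a pure function. The Flag shape and function name are hypothetical; the reason values match the acceptance criteria above. Caching and transport are deliberately left out.

```typescript
type Reason = "ARCHIVED" | "TENANT_DISABLED" | "TENANT_ENABLED" | "DEFAULT";

// Hypothetical in-memory view of a feature_flags row.
interface Flag {
  status: "ACTIVE" | "ARCHIVED";
  defaultEnabled: boolean;
  enabledTenantIds: string[];
  disabledTenantIds: string[];
}

interface Evaluation {
  enabled: boolean;
  reason: Reason;
}

// First matching rule wins, in the order given in the technical notes.
function evaluateFlag(flag: Flag, tenantId: string): Evaluation {
  if (flag.status === "ARCHIVED") {
    return { enabled: false, reason: "ARCHIVED" };
  }
  if (flag.disabledTenantIds.includes(tenantId)) {
    return { enabled: false, reason: "TENANT_DISABLED" };
  }
  if (flag.enabledTenantIds.includes(tenantId)) {
    return { enabled: true, reason: "TENANT_ENABLED" };
  }
  return { enabled: flag.defaultEnabled, reason: "DEFAULT" };
}
```

Keeping the function free of I/O makes the result deterministic for a given row, which is what allows the evaluated value to be cached per flag:{key}:{tenantId}.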

Definition of done:

GET /internal/admin/flags/:key/evaluate implements 4-step logic deterministically
GET /internal/admin/flags/bootstrap returns all active flags for tenant
Redis cache warm + TTL 60 s verified
Event-driven cache invalidation wired (flag.updated + flag.archived events)
p95 ≤ 120 ms verified under 1000 RPS load test
Compatibility route present with Deprecation and Sunset response headers

PLTADM-US-006 — View aggregate platform health

Field	Value
Issue type	Story
Summary	As a platform operator, I can call the health aggregate endpoint so that I can triage incidents with a single view of all service statuses
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	health-module
Fix version	M0
Epic link	PLTADM-EPIC-03
FR references	FR-PLTADM-HLT-001, FR-PLTADM-HLT-002
Legacy FR refs	FR-ADM-HLT-001, FR-ADM-HLT-002

User story:

As a platform operator, I want to call GET /api/v1/admin/health/aggregate so that I can see the overall platform health status and per-service breakdown to quickly triage incidents.

Acceptance criteria:

Scenario: Aggregate health returns overall UP when all services are healthy
  Given 5 registered health sources, all returning healthy in the last poll
  When GET /api/v1/admin/health/aggregate
  Then the response is 200 OK
  And { "overall": "UP", "services": [ { "name": "...", "status": "UP", "latencyMs": ... }, ... ] }

Scenario: Aggregate health returns DEGRADED when one service is unhealthy
  Given service "notification-service" returned UNHEALTHY in last poll
  When GET /api/v1/admin/health/aggregate
  Then { "overall": "DEGRADED", "services": [ ..., { "name": "notification-service", "status": "DOWN" } ] }

Scenario: Response is served from 10s cache
  Given the cache was populated 5 seconds ago
  When two rapid successive GET /api/v1/admin/health/aggregate calls are made
  Then both return 200 within 50ms (cache hit)
  And no upstream health probes are triggered

Scenario: Response returns within 2 seconds
  Given 27 registered health sources with staggered probe results
  When GET /api/v1/admin/health/aggregate
  Then the response time is <= 2000ms

Scenario: Non-authenticated request returns 401
  Given no Authorization header
  When GET /api/v1/admin/health/aggregate
  Then 401 Unauthorized

Technical notes:

Response cached at 10 s TTL in Redis; HealthPollerJob probes each source every 15 s in background.
overall logic: UP if all sources healthy; DEGRADED if ≥1 down but <50%; DOWN if ≥50% down.
At M0 health sources are seeded statically; dynamic registration added in PLTADM-US-007 (M1).
Response size: 27 services × ~100 bytes = ~2.7 KB; no pagination needed.
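
The overall status rule can be sketched as a small function over per-source statuses. The type and function names are hypothetical, and returning UP for an empty source list is an assumption, since the document does not cover that case.

```typescript
type SourceStatus = "UP" | "DOWN";
type OverallStatus = "UP" | "DEGRADED" | "DOWN";

// UP if nothing is down; DOWN if at least half the sources are down; DEGRADED otherwise.
function overallStatus(statuses: SourceStatus[]): OverallStatus {
  const down = statuses.filter((s) => s === "DOWN").length;
  if (down === 0) {
    return "UP"; // Also covers an empty list (assumption).
  }
  // Integer comparison avoids floating-point division for the 50% threshold.
  return down * 2 >= statuses.length ? "DOWN" : "DEGRADED";
}
```

For example, one unhealthy source out of five yields DEGRADED, while three out of six yields DOWN.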

Definition of done:

GET /api/v1/admin/health/aggregate implemented with correct overall logic
10 s Redis cache wired to HealthPollerJob
p99 response time ≤ 2 s verified under load
SUPER_ADMIN authentication enforced
Static seed list of health sources populates on service start

PLTADM-US-007 — Register and update health sources dynamically

Field	Value
Issue type	Story
Summary	As a service instance, I can register myself as a health source so that dynamic deployments are reflected in the aggregate health without hardcoded lists
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S1
Components	health-module
Fix version	M1
Epic link	PLTADM-EPIC-03
FR references	FR-PLTADM-HLT-003, FR-PLTADM-HLT-004, FR-PLTADM-ENH-002
Legacy FR refs	FR-ADM-HLT-003, FR-ADM-HLT-004, FR-ADM-ENH-002

User story:

As a Kubernetes service instance, I want to POST my health endpoint to /internal/admin/health/sources on startup so that I am automatically included in the aggregate health view without requiring a hardcoded list update.

Acceptance criteria:

Scenario: Register a new health source
  Given service "new-service" has not previously registered
  When POST /internal/admin/health/sources
    with { "name": "new-service", "healthUrl": "http://new-service:3020/health" }
  Then the response is 201 Created
  And the source is stored in health_sources
  And event "platform_admin.health_source.registered.v1" is published

Scenario: Re-registration updates heartbeat timestamp
  Given source "identity-service" was last registered 30 seconds ago
  When POST /internal/admin/health/sources again with same payload
  Then the response is 200 OK
  And lastRegisteredAt is updated to now

Scenario: Stale source is marked unhealthy
  Given source "old-service" last registered 90 seconds ago (staleness threshold=60s)
  When HealthPollerJob runs
  Then "old-service" status is set to UNHEALTHY in health_sources
  And aggregate health reflects the degraded status

Scenario: Stale source re-registers and recovers
  Given "old-service" is marked UNHEALTHY due to staleness
  When "old-service" POSTs to /internal/admin/health/sources
  Then lastRegisteredAt is updated
  And on next poll the source returns to HEALTHY if its health endpoint responds 200

Technical notes:

POST /internal/admin/health/sources is idempotent on name; upsert by name.
Staleness threshold configurable via PLTADM_HEALTH_STALENESS_S (default 60 s).
HealthPollerJob CronJob runs every 15 s; probes healthUrl; updates health_check_results.
Dynamic registration replaces static seed list at M1; static seed remains as fallback behind feature flag DYNAMIC_HEALTH_REGISTRATION.
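
The upsert-by-name and staleness rules can be sketched with an in-memory registry. The HealthSource shape, the Map-based registry, and the function names are hypothetical; treating a source as stale only when its age is strictly greater than the threshold is also an assumption.

```typescript
// Hypothetical registry entry; the real health_sources columns are not listed here.
interface HealthSource {
  name: string;
  healthUrl: string;
  lastRegisteredAt: Date;
}

// Upsert by name: returns 201 for a first registration, 200 for a heartbeat refresh.
function registerSource(
  registry: Map<string, HealthSource>,
  name: string,
  healthUrl: string,
  now: Date,
): 200 | 201 {
  const existed = registry.has(name);
  registry.set(name, { name, healthUrl, lastRegisteredAt: now });
  return existed ? 200 : 201;
}

// thresholdS mirrors PLTADM_HEALTH_STALENESS_S (default 60 s).
function isStale(source: HealthSource, now: Date, thresholdS = 60): boolean {
  return now.getTime() - source.lastRegisteredAt.getTime() > thresholdS * 1000;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the functions, makes the 30-second and 90-second scenarios straightforward to test.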

Definition of done:

POST /internal/admin/health/sources upserts by name with heartbeat update
Staleness check in HealthPollerJob marks stale sources UNHEALTHY
platform_admin.health_source.registered.v1 published on new registration
Static seed list kept behind DYNAMIC_HEALTH_REGISTRATION flag for rollback
Integration test: register → probe → aggregate reflects new source

PLTADM-US-008 — Verify service meets coverage and latency targets

Field	Value
Issue type	Story
Summary	As a platform team lead, I can confirm test coverage ≥ 80% and flag evaluate p95 ≤ 120ms so that platform-admin-service meets quality gates before M1 sign-off
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	cross-cutting
Fix version	M1
Epic link	PLTADM-EPIC-04
FR references	FR-PLTADM-NFR-001, FR-PLTADM-NFR-002
Legacy FR refs	NFR-ADM-001, NFR-ADM-002

User story:

As a platform team lead, I want to run the test suite and see coverage ≥ 80% with zero lint/typecheck errors, and confirm flag evaluate p95 ≤ 120 ms so that platform-admin-service can pass the quality gate and proceed to production.

Acceptance criteria:

Scenario: Unit and integration coverage threshold met
  When pnpm test:cov is executed
  Then overall statement coverage is >= 80%
  And branch coverage is >= 80%
  And the following test files exist:
    - config-module.spec.ts (allow-list, type validation, history)
    - feature-flag.spec.ts (CRUD, evaluation logic, cache)
    - health-aggregate.spec.ts (aggregation, staleness)
    - tenant-isolation.spec.ts
    - outbox.spec.ts
    - inbox.spec.ts

Scenario: ESLint and TypeScript type checks pass
  When pnpm lint && pnpm typecheck is executed
  Then exit code is 0 with zero errors

Scenario: Flag evaluate p95 latency target met
  Given Redis cache warm
  When 1000 concurrent evaluate requests are sent
  Then p95 response time is <= 120ms
  And p99 response time is <= 200ms

Scenario: Aggregate health p99 latency target met
  Given 27 registered health sources with cached results
  When 100 concurrent aggregate health requests are sent
  Then p99 response time is <= 2000ms

Technical notes:

Load test script: k6 run tests/load/flag-evaluate.k6.js targeting 1000 RPS for 60 s.
Coverage report generated by Vitest with --coverage flag; threshold enforced in vitest.config.ts.
CI gate: coverage check runs in the test job; load test runs in a separate perf-test job on pre-prod.

Definition of done:

vitest.config.ts coverage thresholds set to 80% (statements, branches, functions, lines)
All 6 mandatory test files present and passing
pnpm lint && pnpm typecheck returns exit 0
k6 load test confirms p95 ≤ 120 ms at 1000 RPS (recorded in CI artefact)
Health aggregate p99 ≤ 2 s verified

PLTADM-US-009 — Validate observability and audit trail completeness

Field	Value
Issue type	Story
Summary	As a platform SRE, I can confirm OTel traces are visible, SLO burn alerts are configured, and config audit history is preserved for 7 years
Status	To Do
Priority	Must
Labels	service:platform-admin, domain:platform_admin, slice:S0
Components	cross-cutting
Fix version	M1
Epic link	PLTADM-EPIC-04
FR references	FR-PLTADM-NFR-003
Legacy FR refs	NFR-ADM-003

User story:

As a platform SRE, I want to confirm that OpenTelemetry traces flow through to the tracing backend, SLO burn-rate alerts are active, and config_history rows are retained for 7 years so that the service meets platform-wide observability and compliance requirements.

Acceptance criteria:

Scenario: OTel trace visible for flag evaluate
  Given OTel exporter is configured and staging is running
  When GET /internal/admin/flags/OFFLINE_SYNC_V2/evaluate?tenantId=ten_DEMO001
  Then a trace is visible in the tracing backend
  And span "flag.evaluate" includes attributes: flag_key, tenant_id, evaluation_result, cache_hit

Scenario: OTel trace visible for config mutation
  When PATCH /api/v1/admin/platform-config/:key is called
  Then a trace with span "config.update" includes: config_key, config_scope, actor_sub

Scenario: SLO burn-rate alert fires on latency regression
  Given the SLO for flag evaluate p95 is 120ms
  When flag evaluate p95 exceeds 120ms for 5 consecutive minutes
  Then an alert fires to the SRE on-call channel

Scenario: Config audit history retained for 7 years
  Given a config mutation happened 2 years ago
  When querying config_history for that key
  Then the record is still present
  And the retention policy annotation confirms 7-year retention with S3 archive after 2 years

Technical notes:

OTel SDK: @opentelemetry/sdk-node; exporter: OTLP HTTP to OTEL_EXPORTER_OTLP_ENDPOINT.
Key span names: flag.evaluate, flag.cache.hit, flag.cache.miss, config.update, health.poll.
SLO burn-rate alert configured in Prometheus/Alertmanager: platform_admin_flag_evaluate_p95_ms > 120 for 5 min → page SRE.
config_history retention: 7-year PostgreSQL retention policy; rows older than 2 years archived to S3 via nightly job.

Definition of done:

OTel instrumentation in config-module, feature-flag-module, health-module (all key spans instrumented)
Traces visible in staging tracing backend
SLO burn-rate Prometheus alert rule deployed and tested (fire/resolve cycle)
config_history retention policy set via pg_partman or equivalent; S3 archive job configured
Compliance team sign-off on audit trail documented in SERVICE_READINESS.md

PLTADM-US-010 — Retrieve paginated config history

Field	Value
Issue type	Story
Summary	As a platform operator, I can retrieve paginated config change history so that I can audit who changed what and when
Status	To Do
Priority	Should
Labels	service:platform-admin, domain:platform_admin, slice:S1
Components	config-module
Fix version	M1
Epic link	PLTADM-EPIC-05
FR references	FR-PLTADM-ENH-001
Legacy FR refs	FR-ADM-ENH-001

User story:

As a platform operator, I want to call GET /api/v1/admin/platform-config/:key/history so that I can audit every change to a config entry with who made it and what values changed, with cursor-based pagination.

Acceptance criteria:

Scenario: Retrieve history for a config key
  Given 50 change records exist for key "session.timeout_minutes"
  When GET /api/v1/admin/platform-config/session.timeout_minutes/history?limit=20
  Then the response is 200 OK
  And the response contains 20 records sorted by changed_at DESC
  And each record includes: id, key, previous_value, new_value, changed_by, changed_at
  And a nextCursor is included in the response

Scenario: Cursor-based pagination yields consistent results
  Given the first page returned nextCursor "cursor_abc"
  When GET /api/v1/admin/platform-config/session.timeout_minutes/history?limit=20&cursor=cursor_abc
  Then the next 20 records are returned without duplicates

Scenario: History for unknown key returns 404
  When GET /api/v1/admin/platform-config/unknown.key/history
  Then 404 Not Found with error code "ADM_CONFIG_KEY_UNKNOWN"

Scenario: Secret-type key history redacts values
  Given key "smtp.password" has type=secret
  When GET /api/v1/admin/platform-config/smtp.password/history
  Then all previous_value and new_value fields show "***REDACTED***"

Technical notes:

Cursor: opaque base64-encoded { id, changed_at } for keyset pagination.
Default sort: changed_at DESC.
History endpoint is read-only; no write operations.
changed_by is the sub claim from the SUPER_ADMIN JWT that performed the mutation.
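
Keyset pagination with an opaque cursor can be sketched over an in-memory list. This is an illustration under several assumptions: the HistoryRecord shape is trimmed to the two cursor fields, changed_at is an ISO 8601 UTC string (so string comparison matches time order), id is used as a tiebreaker, and the runtime provides the global btoa and atob functions. In the service the same predicate would be expressed in the SQL WHERE clause rather than by filtering in memory.

```typescript
// Trimmed, hypothetical record shape; real rows also carry key, values, and changed_by.
interface HistoryRecord {
  id: number;
  changed_at: string; // ISO 8601 UTC, e.g. "2026-01-01T00:00:00.000Z"
}

interface Cursor {
  id: number;
  changed_at: string;
}

function encodeCursor(cursor: Cursor): string {
  return btoa(JSON.stringify(cursor));
}

function decodeCursor(encoded: string): Cursor {
  return JSON.parse(atob(encoded)) as Cursor;
}

// In DESC order, "after the cursor" means strictly older, or same timestamp with a smaller id.
function isAfterCursor(record: HistoryRecord, cursor: Cursor): boolean {
  return (
    record.changed_at < cursor.changed_at ||
    (record.changed_at === cursor.changed_at && record.id < cursor.id)
  );
}

function getHistoryPage(
  all: HistoryRecord[],
  limit: number,
  cursor?: string,
): { records: HistoryRecord[]; nextCursor: string | undefined } {
  const sorted = [...all].sort((a, b) =>
    a.changed_at === b.changed_at ? b.id - a.id : a.changed_at < b.changed_at ? 1 : -1,
  );
  const start = cursor ? decodeCursor(cursor) : undefined;
  const remaining = start ? sorted.filter((r) => isAfterCursor(r, start)) : sorted;
  const records = remaining.slice(0, limit);
  const last = records[records.length - 1];
  const nextCursor =
    last && remaining.length > limit
      ? encodeCursor({ id: last.id, changed_at: last.changed_at })
      : undefined;
  return { records, nextCursor };
}
```

Because the cursor encodes the position of the last returned row rather than an offset, rows inserted between requests cannot shift later pages or cause duplicates.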

Definition of done:

GET /api/v1/admin/platform-config/:key/history implemented with cursor pagination
Sort order changed_at DESC enforced
Secret-type value redaction in history responses
changed_by populated from JWT sub on every mutation
Integration test: 50-record seed → paginate through all records in 3 pages

PLTADM-US-011 — List feature flags visible to a tenant admin

Field	Value
Issue type	Story
Summary	As a tenant admin, I can list feature flags applicable to my tenant so that I can understand which features are available to my organization
Status	To Do
Priority	Should
Labels	service:platform-admin, domain:platform_admin, slice:S1
Components	feature-flag-module
Fix version	M1
Epic link	PLTADM-EPIC-05
FR references	FR-PLTADM-ENH-003
Legacy FR refs	FR-ADM-ENH-003

User story:

As a tenant admin, I want to call GET /api/v1/tenant/flags so that I can see all active feature flags and their evaluated status for my tenant, enabling me to understand which features my organization can use.

Acceptance criteria:

Scenario: Tenant admin lists flags for their tenant
  Given I am authenticated as TENANT_ADMIN for tenant "ten_DEMO001"
  And 12 active flags exist; 3 have specific overrides for ten_DEMO001
  When GET /api/v1/tenant/flags
  Then the response is 200 OK
  And the response contains 12 flags (archived flags excluded)
  And each flag shows: key, description, enabled (evaluated for ten_DEMO001), hasOverride

Scenario: Archived flags are excluded from tenant listing
  Given 2 flags have status=ARCHIVED
  When GET /api/v1/tenant/flags
  Then the response does not include the 2 archived flags

Scenario: Tenant admin can only see their own tenant's flag evaluation
  Given I am TENANT_ADMIN for "ten_DEMO001"
  When GET /api/v1/tenant/flags
  Then all enabled values are evaluated for tenantId="ten_DEMO001"
  And I cannot see flags evaluated for any other tenant

Scenario: SUPER_ADMIN can query flags for any tenant
  Given I am authenticated as SUPER_ADMIN
  When GET /api/v1/tenant/flags?tenantId=ten_DEMO001
  Then flags evaluated for ten_DEMO001 are returned

Technical notes:

tenantId resolved from JWT tenant_id claim for TENANT_ADMIN; can be overridden with query param for SUPER_ADMIN only.
Response uses cached evaluate results where available (60 s TTL); falls back to DB.
hasOverride: true if tenant appears in either enabledTenantIds or disabledTenantIds.
Archived flags filtered at query level (WHERE status = 'ACTIVE').
Compatibility route: GET /api/platform/flags with deprecation header redirects to this endpoint.
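
Tenant resolution and the hasOverride field can be sketched as two small functions. The Claims shape and the role claim name are hypothetical; only the tenant_id claim is named in this document. Silently ignoring a tenantId query parameter from a TENANT_ADMIN, rather than rejecting the request, is also an assumption.

```typescript
type Role = "SUPER_ADMIN" | "TENANT_ADMIN";

// Hypothetical JWT claims shape; only tenant_id is named in the notes above.
interface Claims {
  role: Role;
  tenant_id?: string;
}

// SUPER_ADMIN may target any tenant via the query param; everyone else is pinned to their claim.
function resolveTenantId(claims: Claims, queryTenantId?: string): string | undefined {
  if (claims.role === "SUPER_ADMIN" && queryTenantId) {
    return queryTenantId;
  }
  return claims.tenant_id;
}

interface FlagOverrideLists {
  enabledTenantIds: string[];
  disabledTenantIds: string[];
}

// True if the tenant appears in either override array.
function hasOverride(flag: FlagOverrideLists, tenantId: string): boolean {
  return flag.enabledTenantIds.includes(tenantId) || flag.disabledTenantIds.includes(tenantId);
}
```

Resolving the tenant once at the edge of the request keeps the isolation rule in a single place instead of repeating the role check in every query.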

Definition of done:

GET /api/v1/tenant/flags returns active flags with per-tenant evaluation
Archived flags excluded from response
TENANT_ADMIN scoped to their own tenantId; SUPER_ADMIN can override with query param
hasOverride field correctly populated
Compatibility route GET /api/platform/flags present with Deprecation response header
Unit test: 15 flags (3 archived, 3 with overrides) → correct subset returned with correct enabled values
