Platform Admin Service — AI Integration

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · ai-gateway-service · 03 platform-services

1. Overview

The platform-admin-service has limited, advisory-only AI integration via ai-gateway-service. Core operations (configuration governance, feature flag evaluation, health status derivation) remain fully deterministic. AI is used exclusively for operational intelligence and operator assistance — never for automated remediation. All AI-surfaced findings require Human-in-the-Loop (HITL) review before any action is taken.

2. AI Calls Catalog

#	Feature	Purpose	Tier	HITL	Status
1	Cross-tenant health anomaly detection	Identify correlated degradation patterns across multiple registered services that rule-based threshold alerts miss	B (internal ops)	Required: SRE must confirm before any incident is opened	Planned (M3)
2	Natural-language incident query	Allow platform operators to query aggregated health and incident history in plain language (e.g., "Which services had elevated error rates last Tuesday?")	B	Required: operator reviews the AI-assembled query result before acting	Planned (M3)
3	Automated incident categorization	Classify incoming health anomalies by probable root-cause category (e.g., `db_latency`, `upstream_dependency`, `resource_exhaustion`) to pre-populate incident triage fields	B	Required: SRE confirms or overrides the category before the incident is filed	Planned (M3)

3. Integration Details

3.1 Cross-Tenant Health Anomaly Detection

Endpoint called: POST /v1/ai/completions on ai-gateway-service (Tier B — internal ops, non-patient-facing)

Trigger: Scheduled analysis job runs every 5 minutes against the aggregated health time-series collected by the platform-admin health poller.

Prompt template name: platform_health_anomaly_v1

Inputs: Aggregated service health status time series (last 30 min); error rate deltas per service; no PHI; no tenant-identifiable data beyond service names.
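A minimal sketch of how the per-service error rate deltas over the 30-minute window might be computed before they are placed in the prompt. The sample shape, the function name `error_rate_deltas`, and the definition of a delta as "latest sample minus earliest sample in the window" are assumptions for illustration; the source does not specify them.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical sample shape: (timestamp, error rate as a fraction of requests).
Sample = Tuple[datetime, float]


def error_rate_deltas(
    series: Dict[str, List[Sample]],
    now: datetime,
    window: timedelta = timedelta(minutes=30),
) -> Dict[str, float]:
    """Return, per service, the change in error rate across the window.

    Assumption: delta = latest sample minus earliest sample inside the window.
    Services with fewer than two samples in the window are omitted.
    """
    cutoff = now - window
    deltas: Dict[str, float] = {}
    for service, samples in series.items():
        # Keep only samples inside the window, ordered by timestamp.
        recent = sorted(s for s in samples if cutoff <= s[0] <= now)
        if len(recent) >= 2:
            deltas[service] = recent[-1][1] - recent[0][1]
    return deltas
```

Only service names and numeric aggregates appear in the result, which matches the stated input constraint of no PHI.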

Output: List of suspected correlated degradation patterns, each with a confidence score. Displayed in the Super Admin console. An SRE must acknowledge each finding and decide whether to open an incident.

HITL gate: No automated incident creation. The AI output is advisory only. All remediation actions (e.g., scaling a service, triggering a rollback) are initiated manually by the platform operator.
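The HITL gate described above can be sketched as a guard that refuses to open an incident until an SRE has acknowledged the finding. The `AnomalyFinding` type, its fields, and the function names are hypothetical and only illustrate the rule; they are not the service's actual model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnomalyFinding:
    """Advisory AI output (hypothetical shape)."""
    pattern: str
    services: List[str]
    confidence: float
    acknowledged_by: Optional[str] = None  # SRE identifier, set on review


def acknowledge(finding: AnomalyFinding, sre_id: str) -> None:
    """Record that a human reviewed the finding."""
    if not sre_id:
        raise ValueError("an SRE identifier is required")
    finding.acknowledged_by = sre_id


def open_incident(finding: AnomalyFinding) -> dict:
    """Open an incident only for findings a human has acknowledged."""
    if finding.acknowledged_by is None:
        raise PermissionError("finding has not been acknowledged by an SRE")
    return {
        "summary": finding.pattern,
        "services": list(finding.services),
        "opened_by": finding.acknowledged_by,
    }
```

There is deliberately no code path from a finding to an incident that bypasses `acknowledge`, so the AI output stays advisory.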

3.2 Natural-Language Incident Query

Endpoint called: POST /v1/ai/completions on ai-gateway-service

Prompt template name: platform_incident_query_v1

Inputs: Operator's natural-language question (max 500 chars); health event log summary (last 7 days, aggregated, no PHI).

Output: Structured query result assembled by the AI, displayed to the operator for review. The operator must confirm the interpretation before exporting or acting on the result.
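A rough sketch of how the request inputs might be validated and assembled before calling the gateway. Only the template name and the 500-character limit come from this document; the payload field names (`template`, `inputs`, `question`, `health_event_log_summary`) are assumptions, not the gateway's documented schema.

```python
MAX_QUESTION_CHARS = 500  # limit stated in the Inputs description


def build_incident_query_request(question: str, log_summary: str) -> dict:
    """Validate the operator's question and build a request body.

    The body layout is hypothetical; adapt it to the real gateway contract.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    if len(question) > MAX_QUESTION_CHARS:
        raise ValueError(
            f"question exceeds {MAX_QUESTION_CHARS} characters ({len(question)})"
        )
    return {
        "template": "platform_incident_query_v1",
        "inputs": {
            "question": question,
            # Aggregated, PHI-free summary of the last 7 days of health events.
            "health_event_log_summary": log_summary,
        },
    }
```

Rejecting over-length questions before the call keeps the limit enforced on the caller side rather than relying on the model or the gateway to truncate.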

3.3 Automated Incident Categorization

Prompt template name: platform_incident_categorize_v1

Inputs: Health event snapshot (error type, affected services, rate delta); no PHI.

Output: Suggested category label + confidence score. Pre-populates the incident triage form in the Super Admin console. SRE confirms or overrides before submission.
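The confirm-or-override step could look roughly like the sketch below, where the filed category is always the SRE's decision and the AI suggestion is kept only for reference. The type and function names are hypothetical, and the record layout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CategorySuggestion:
    """AI-suggested label with its confidence score (hypothetical shape)."""
    label: str
    confidence: float


@dataclass(frozen=True)
class TriageDecision:
    category: str      # value actually filed on the incident
    suggested: str     # what the AI proposed, retained for later review
    overridden: bool
    confirmed_by: str


def confirm_category(
    suggestion: CategorySuggestion,
    sre_id: str,
    override: Optional[str] = None,
) -> TriageDecision:
    """Produce the filed category from an SRE's confirm or override."""
    if not sre_id:
        raise ValueError("an SRE must confirm or override the category")
    category = override if override is not None else suggestion.label
    return TriageDecision(
        category=category,
        suggested=suggestion.label,
        overridden=category != suggestion.label,
        confirmed_by=sre_id,
    )
```

Keeping both the suggested and the filed label makes it possible to measure how often SREs override the suggestion.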

4. Constraints and Guardrails

Guardrail	Description
No automated remediation	AI findings never trigger configuration changes, feature flag toggles, or scaling actions without explicit operator confirmation
No PHI in prompts	Health metrics are anonymized aggregates; no patient identifiers or clinical data are used
No tenant PII in prompts	Tenant names are masked to `tenant_{hash}` in AI prompts
HITL mandatory	All three AI features require operator confirmation before any downstream action
Feature flag gated	All three features are gated behind the `ai.platform-ops` feature flag, which is disabled by default
Audit	All AI interactions are logged to the ai-gateway-service audit stream with `operatorId` and the prompt template version
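Three of these guardrails (tenant masking, flag gating, and audit logging) can be sketched together as follows. The hash algorithm and truncation length, the optional salt, the audit record key `promptTemplateVersion`, and the injected `send` callable are all assumptions; the document specifies only the `tenant_{hash}` format, the `ai.platform-ops` flag name, and that `operatorId` and the template version are logged.

```python
import hashlib
from typing import Callable, List, Mapping, Optional

AI_FLAG = "ai.platform-ops"


def mask_tenant(name: str, salt: str = "") -> str:
    """Mask a tenant name as tenant_{hash}.

    Assumption: SHA-256 truncated to 12 hex characters, with an optional salt.
    """
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    return f"tenant_{digest[:12]}"


def call_if_enabled(
    flags: Mapping[str, bool],
    operator_id: str,
    template_version: str,
    payload: dict,
    send: Callable[[dict], dict],
    audit_log: List[dict],
) -> Optional[dict]:
    """Call the gateway only when the flag is on, and audit each call."""
    # A missing flag counts as disabled, matching "disabled by default".
    if not flags.get(AI_FLAG, False):
        return None
    audit_log.append(
        {"operatorId": operator_id, "promptTemplateVersion": template_version}
    )
    return send(payload)
```

Passing `send` and `audit_log` in as parameters keeps the sketch free of network calls, so the gating logic can be exercised in isolation.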

5. Non-AI Operations (Deterministic)

The following operations are explicitly not AI-assisted and will remain deterministic:

Operation	Why not AI
Config governance / allow-list validation	Must be strictly typed and auditable; no probabilistic output
Feature flag evaluation	Stateless boolean logic; determinism required for Kong route decisions
Health status derivation (healthy/degraded/unhealthy)	Rule-based thresholds; SRE must trust the signal without AI variability
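For contrast with the AI-assisted features, rule-based health status derivation can be sketched as a pure function of the measured value. The use of error rate as the signal and the threshold values shown are placeholders, not the service's configured rules.

```python
def derive_health_status(
    error_rate: float,
    degraded_at: float = 0.01,   # placeholder threshold
    unhealthy_at: float = 0.05,  # placeholder threshold
) -> str:
    """Map an error rate to healthy/degraded/unhealthy using fixed thresholds.

    The same input always yields the same output; no model is involved.
    """
    if error_rate < 0:
        raise ValueError("error_rate must be non-negative")
    if error_rate >= unhealthy_at:
        return "unhealthy"
    if error_rate >= degraded_at:
        return "degraded"
    return "healthy"
```

Because the mapping is a plain comparison against fixed thresholds, an SRE can reproduce any status from the raw metric and the configured values alone.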
