Platform Admin Service — AI Integration
Status: populated | Owner: TBD | Last updated: 2026-04-18 | Companion: Service Template · ai-gateway-service · 03 platform-services
1. Overview
The platform-admin-service has limited, advisory-only AI integration via ai-gateway-service. Core operations (configuration governance, feature flag evaluation, health status derivation) remain fully deterministic. AI is used exclusively for operational intelligence and operator assistance — never for automated remediation. All AI-surfaced findings require Human-in-the-Loop (HITL) review before any action is taken.
2. AI Calls Catalog
| # | Feature | Purpose | Tier | HITL | Status |
|---|---|---|---|---|---|
| 1 | Cross-tenant health anomaly detection | Identify correlated degradation patterns across multiple registered services that rule-based threshold alerts miss | B (internal ops) | Required: SRE must confirm before any incident is opened | Planned (M3) |
| 2 | Natural-language incident query | Allow platform operators to query aggregated health and incident history in plain language (e.g., "Which services had elevated error rates last Tuesday?") | B (internal ops) | Required: operator reviews the AI-assembled query result before acting | Planned (M3) |
| 3 | Automated incident categorization | Classify incoming health anomalies by probable root-cause category (e.g., db_latency, upstream_dependency, resource_exhaustion) to pre-populate incident triage fields | B (internal ops) | Required: SRE confirms or overrides the category before the incident is filed | Planned (M3) |
3. Integration Details
3.1 Cross-Tenant Health Anomaly Detection
Endpoint called: POST /v1/ai/completions on ai-gateway-service (Tier B — internal ops, non-patient-facing)
Trigger: Scheduled analysis job runs every 5 minutes against the aggregated health time-series collected by the platform-admin health poller.
Prompt template name: platform_health_anomaly_v1
Inputs: Aggregated service health status time series (last 30 min); error rate deltas per service; no PHI; no tenant-identifiable data beyond service names.
Output: A list of suspected correlated degradation patterns, each with a confidence score. The list is displayed in the Super Admin console. An SRE must acknowledge each finding and decide whether to open an incident.
HITL gate: No automated incident creation. The AI output is advisory only. All remediation actions (e.g., scaling a service, triggering a rollback) are initiated manually by the platform operator.
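The flow in this subsection can be sketched roughly as follows. This is a minimal illustration, not the service's implementation: the request body shape, the field names (`template`, `inputs`, `health_series`, `error_rate_deltas`), and the `AnomalyFinding` type are all assumptions, since the ai-gateway-service contract is not specified here. Only the template name, the 30-minute window, and the acknowledge-before-incident rule come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

WINDOW_SECONDS = 30 * 60  # "last 30 min" input window from this section

Sample = Tuple[float, str]  # (epoch seconds, health status); hypothetical shape


def trim_window(samples: List[Sample], now: float) -> List[Sample]:
    """Keep only samples that fall inside the 30-minute analysis window."""
    return [s for s in samples if now - WINDOW_SECONDS <= s[0] <= now]


def build_anomaly_request(series: Dict[str, List[Sample]],
                          error_rate_deltas: Dict[str, float],
                          now: float) -> dict:
    """Assemble a hypothetical body for POST /v1/ai/completions."""
    return {
        "template": "platform_health_anomaly_v1",
        "inputs": {
            "health_series": {svc: trim_window(s, now) for svc, s in series.items()},
            "error_rate_deltas": error_rate_deltas,
        },
    }


@dataclass
class AnomalyFinding:
    """One suspected correlated degradation pattern (hypothetical type)."""
    services: List[str]
    confidence: float
    acknowledged_by: Optional[str] = None  # set only when an SRE acknowledges


def may_open_incident(finding: AnomalyFinding) -> bool:
    """HITL gate: an incident may be opened only after SRE acknowledgement."""
    return finding.acknowledged_by is not None
```

Keeping the gate as a separate predicate means the scheduled job itself never has a code path that opens an incident; only an operator action that sets `acknowledged_by` can unlock it.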
3.2 Natural-Language Incident Query
Endpoint called: POST /v1/ai/completions on ai-gateway-service
Prompt template name: platform_incident_query_v1
Inputs: Operator's natural-language question (max 500 chars); health event log summary (last 7 days, aggregated, no PHI).
Output: Structured query result assembled by the AI, displayed to the operator for review. The operator must confirm the interpretation before exporting or acting on the result.
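As an illustration of the input constraint above, the following sketch validates the operator's question before a request is assembled. The 500-character limit and the template name come from this section; the request body shape, the field names, and the decision to reject empty questions are assumptions made for the example.

```python
MAX_QUESTION_CHARS = 500  # limit stated in the Inputs line above


def build_incident_query_request(question: str, health_event_summary: str) -> dict:
    """Validate the question and assemble a hypothetical request body."""
    cleaned = question.strip()
    if not cleaned:
        # Assumption: an empty question is rejected rather than sent.
        raise ValueError("question must not be empty")
    if len(cleaned) > MAX_QUESTION_CHARS:
        raise ValueError(
            f"question exceeds {MAX_QUESTION_CHARS} characters ({len(cleaned)})"
        )
    return {
        "template": "platform_incident_query_v1",
        "inputs": {
            "question": cleaned,
            # Aggregated, PHI-free summary of the last 7 days of health events.
            "health_event_summary": health_event_summary,
        },
    }
```

Rejecting an over-long question, rather than silently truncating it, avoids sending the model a question the operator did not actually ask.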
3.3 Automated Incident Categorization
Prompt template name: platform_incident_categorize_v1
Inputs: Health event snapshot (error type, affected services, rate delta); no PHI.
Output: Suggested category label + confidence score. Pre-populates the incident triage form in the Super Admin console. SRE confirms or overrides before submission.
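The confirm-or-override step might look like the sketch below. The `CategorySuggestion` type, the `resolve_category` helper, and the returned field names are hypothetical; the only behavior taken from the text is that the AI label pre-populates the form and an SRE must confirm or override it before submission.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CategorySuggestion:
    """AI-suggested root-cause category with its confidence score."""
    label: str
    confidence: float


def resolve_category(suggestion: CategorySuggestion,
                     sre_id: str,
                     override: Optional[str] = None) -> dict:
    """Return the triage fields once an SRE has confirmed or overridden."""
    if not sre_id:
        # No SRE identity means no human reviewed the suggestion.
        raise PermissionError("an SRE must confirm or override before submission")
    final = override if override is not None else suggestion.label
    return {
        "category": final,
        "suggested_category": suggestion.label,
        "confidence": suggestion.confidence,
        "confirmed_by": sre_id,
        "overridden": final != suggestion.label,
    }
```

Recording both the suggested and the final category would make it possible to measure later how often SREs override the model.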
4. Constraints and Guardrails
| Guardrail | Description |
|---|---|
| No automated remediation | AI findings never trigger configuration changes, feature flag toggles, or scaling actions without explicit operator confirmation |
| No PHI in prompts | Health metrics are anonymized aggregates; no patient identifiers or clinical data are used |
| No tenant PII in prompts | Tenant names are masked to tenant_{hash} in AI prompts |
| HITL mandatory | All three AI features require operator confirmation before any downstream action |
| Feature flag gated | All three features are gated behind the ai.platform-ops feature flag, which is disabled by default |
| Audit | All AI interactions logged to ai-gateway-service audit stream with operatorId and prompt template version |
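
Three of the guardrails above (tenant masking, the feature flag gate, and the audit record) can be sketched together. This is an assumption-laden illustration: the document does not say which hash produces `tenant_{hash}`, so a salted SHA-256 truncated to 12 hex characters is used here as a stand-in, and the flag store and audit field names other than `operatorId` are hypothetical.

```python
import hashlib
from typing import Mapping

FEATURE_FLAG = "ai.platform-ops"  # flag name from the guardrail table


def mask_tenant(name: str, salt: bytes = b"") -> str:
    """Mask a tenant name as tenant_{hash}. Hash choice is an assumption."""
    digest = hashlib.sha256(salt + name.encode("utf-8")).hexdigest()
    return f"tenant_{digest[:12]}"


def ai_enabled(flags: Mapping[str, bool]) -> bool:
    """Disabled by default: a missing flag is treated as off."""
    return bool(flags.get(FEATURE_FLAG, False))


def audit_record(operator_id: str, template: str, template_version: str) -> dict:
    """Fields the audit guardrail names; the key spelling is hypothetical."""
    return {
        "operatorId": operator_id,
        "promptTemplate": template,
        "promptTemplateVersion": template_version,
    }
```

A deterministic mask keeps the same tenant recognizable across prompts without exposing its name; a secret salt would make the mapping harder to reverse by guessing tenant names.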
5. Non-AI Operations (Deterministic)
The following operations are explicitly not AI-assisted and will remain deterministic:
| Operation | Why not AI |
|---|---|
| Config governance / allow-list validation | Must be strictly typed and auditable; no probabilistic output |
| Feature flag evaluation | Stateless boolean logic; determinism required for Kong route decisions |
| Health status derivation (healthy/degraded/unhealthy) | Rule-based thresholds; SRE must trust the signal without AI variability |
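
For contrast with the AI features, a rule-based derivation of health status is a pure function of its inputs. The sketch below is illustrative only: the choice of error rate as the input and the threshold values (5% and 20%) are placeholders invented for the example, not the service's actual rules.

```python
def derive_status(error_rate: float,
                  degraded_at: float = 0.05,   # placeholder threshold
                  unhealthy_at: float = 0.20   # placeholder threshold
                  ) -> str:
    """Map an error rate in [0, 1] to healthy / degraded / unhealthy."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if error_rate >= unhealthy_at:
        return "unhealthy"
    if error_rate >= degraded_at:
        return "degraded"
    return "healthy"
```

The same input always yields the same status, which is the property the table above relies on.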