Platform Admin Service — AI Integration

Status: populated
Owner: TBD
Last updated: 2026-04-18
Companion: Service Template · ai-gateway-service · 03 platform-services

1. Overview

The platform-admin-service has limited, advisory-only AI integration via ai-gateway-service. Core operations (configuration governance, feature flag evaluation, health status derivation) remain fully deterministic. AI is used exclusively for operational intelligence and operator assistance — never for automated remediation. All AI-surfaced findings require Human-in-the-Loop (HITL) review before any action is taken.


2. AI Calls Catalog

| # | Feature | Purpose | Tier | HITL | Status |
|---|---------|---------|------|------|--------|
| 1 | Cross-tenant health anomaly detection | Identify correlated degradation patterns across multiple registered services that rule-based threshold alerts miss | B (internal ops) | Required: SRE must confirm before any incident is opened | Planned (M3) |
| 2 | Natural-language incident query | Allow platform operators to query aggregated health and incident history in plain language (e.g., "Which services had elevated error rates last Tuesday?") | B | HITL: operator reviews AI-assembled query result before acting | Planned (M3) |
| 3 | Automated incident categorization | Classify incoming health anomalies by probable root-cause category (e.g., db_latency, upstream_dependency, resource_exhaustion) to pre-populate incident triage fields | B | Required: SRE confirms or overrides category before incident is filed | Planned (M3) |

3. Integration Details

3.1 Cross-Tenant Health Anomaly Detection

Endpoint called: POST /v1/ai/completions on ai-gateway-service (Tier B — internal ops, non-patient-facing)

Trigger: Scheduled analysis job runs every 5 minutes against the aggregated health time-series collected by the platform-admin health poller.

Prompt template name: platform_health_anomaly_v1

Inputs: Aggregated service health status time series (last 30 min); error rate deltas per service; no PHI; no tenant-identifiable data beyond service names.

Output: A list of suspected correlated degradation patterns, each with a confidence score, displayed in the Super Admin console. An SRE must acknowledge the findings and decide whether to open an incident.

HITL gate: No automated incident creation. The AI output is advisory only. All remediation actions (e.g., scaling a service, triggering a rollback) are initiated manually by the platform operator.
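
The inputs and HITL gate described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the request body shape, the `AnomalyFinding` type, and the `build_anomaly_request` and `open_incident` helpers are all hypothetical names. Only the template name and the 30-minute window come from this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnomalyFinding:
    """One suspected degradation pattern returned by the AI (hypothetical shape)."""
    services: list[str]
    summary: str
    confidence: float                      # as reported by the model
    acknowledged_by: Optional[str] = None  # operatorId of the reviewing SRE


def build_anomaly_request(error_rates: dict[str, list[float]]) -> dict:
    """Assemble a request body for the platform_health_anomaly_v1 template.

    The body layout is an assumption. Only aggregated error rates keyed by
    service name go in, so no PHI can reach the prompt.
    """
    # Delta = last sample minus first sample in the 30-minute window.
    deltas = {
        name: round(series[-1] - series[0], 4)
        for name, series in error_rates.items()
        if series
    }
    return {
        "template": "platform_health_anomaly_v1",
        "inputs": {
            "window_minutes": 30,
            "error_rates": error_rates,
            "error_rate_deltas": deltas,
        },
    }


def open_incident(finding: AnomalyFinding) -> dict:
    """Create an incident record only after an SRE has acknowledged the finding."""
    if finding.acknowledged_by is None:
        raise PermissionError("HITL gate: an SRE must acknowledge the finding first")
    return {
        "services": finding.services,
        "summary": finding.summary,
        "opened_by": finding.acknowledged_by,
    }
```

In this sketch the gate lives in the one function that can create an incident, so a scheduled job that only holds unacknowledged findings has no path to opening one.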

3.2 Natural-Language Incident Query

Endpoint called: POST /v1/ai/completions on ai-gateway-service

Prompt template name: platform_incident_query_v1

Inputs: Operator's natural-language question (max 500 chars); health event log summary (last 7 days, aggregated, no PHI).

Output: Structured query result assembled by the AI, displayed to the operator for review. The operator must confirm the interpretation before exporting or acting on the result.
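
As a rough sketch of the input limit and the confirmation step above, the snippet below validates the operator's question and refuses to export an unconfirmed result. The request body layout, `QueryResult`, and both helper functions are illustrative assumptions. The 500-character limit, the 7-day window, and the template name are taken from this section.

```python
from dataclasses import dataclass

MAX_QUESTION_CHARS = 500  # limit stated in the Inputs line above


@dataclass
class QueryResult:
    """AI-assembled result awaiting operator review (hypothetical shape)."""
    interpretation: str      # how the AI read the question
    rows: list[dict]
    confirmed: bool = False  # set once the operator accepts the interpretation


def build_query_request(question: str, health_event_summary: str) -> dict:
    """Validate the question and assemble a request body (layout is assumed)."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    if len(question) > MAX_QUESTION_CHARS:
        raise ValueError(f"question exceeds {MAX_QUESTION_CHARS} characters")
    return {
        "template": "platform_incident_query_v1",
        "inputs": {
            "question": question,
            "health_event_summary": health_event_summary,
            "window_days": 7,
        },
    }


def export_result(result: QueryResult) -> list[dict]:
    """Release rows for export only after the operator confirmed the interpretation."""
    if not result.confirmed:
        raise PermissionError("operator must confirm the interpretation before export")
    return result.rows
```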

3.3 Automated Incident Categorization

Prompt template name: platform_incident_categorize_v1

Inputs: Health event snapshot (error type, affected services, rate delta); no PHI.

Output: Suggested category label + confidence score. Pre-populates the incident triage form in the Super Admin console. SRE confirms or overrides before submission.
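
The confirm-or-override flow could look roughly like the sketch below. The `TriageForm` fields, the `category` and `confidence` keys on the AI output, and the function names are assumptions made for illustration; the section above only specifies that a label and confidence score pre-populate the form and that an SRE confirms or overrides before submission.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageForm:
    """Incident triage form state (hypothetical fields)."""
    suggested_category: str
    confidence: float
    final_category: Optional[str] = None
    confirmed_by: Optional[str] = None


def prefill_form(ai_output: dict) -> TriageForm:
    """Pre-populate the form from the AI suggestion without finalizing anything."""
    return TriageForm(
        suggested_category=str(ai_output["category"]),
        confidence=float(ai_output["confidence"]),
    )


def confirm_or_override(
    form: TriageForm, operator_id: str, override: Optional[str] = None
) -> TriageForm:
    """Record the SRE's decision: keep the suggestion or replace it."""
    form.final_category = override if override is not None else form.suggested_category
    form.confirmed_by = operator_id
    return form


def submit(form: TriageForm) -> dict:
    """Build the incident payload; refuse if no SRE decision was recorded."""
    if form.confirmed_by is None or form.final_category is None:
        raise PermissionError("SRE must confirm or override the category first")
    return {
        "category": form.final_category,
        "confirmed_by": form.confirmed_by,
        "ai_suggested": form.suggested_category,
        "overridden": form.final_category != form.suggested_category,
    }
```

Keeping both the suggested and the final category in the payload is a design choice of this sketch: it would let the override rate be measured later without re-reading the audit stream.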


4. Constraints and Guardrails

| Guardrail | Description |
|-----------|-------------|
| No automated remediation | AI findings never trigger configuration changes, feature flag toggles, or scaling actions without explicit operator confirmation |
| No PHI in prompts | Health metrics are anonymized aggregates; no patient identifiers or clinical data are used |
| No tenant PII in prompts | Tenant names are masked to tenant_{hash} in AI prompts |
| HITL mandatory | All three AI features require operator confirmation before any downstream action |
| Feature flag gated | All three features gated behind ai.platform-ops feature flag; disabled by default |
| Audit | All AI interactions logged to ai-gateway-service audit stream with operatorId and prompt template version |
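
Three of the guardrails above (tenant masking, the feature flag gate, and the audit record) lend themselves to a short sketch. Everything beyond the `tenant_{hash}` format, the `ai.platform-ops` flag name, and the `operatorId` field is an assumption: the guardrails do not say which hash function or digest length is used, so SHA-256 truncated to 12 hex characters stands in here, and the audit field names other than `operatorId` are hypothetical.

```python
import hashlib
import re
from datetime import datetime, timezone


def mask_tenant(tenant_name: str, length: int = 12) -> str:
    """Map a tenant name to tenant_{hash}. SHA-256 and the length are assumptions."""
    digest = hashlib.sha256(tenant_name.encode("utf-8")).hexdigest()
    return f"tenant_{digest[:length]}"


def mask_prompt(text: str, tenant_names: list[str]) -> str:
    """Replace every known tenant name in the prompt text with its masked form."""
    names = [n for n in tenant_names if n]
    if not names:
        return text
    # Longest names first so "acme-health" wins over "acme"; a single pass
    # means already-masked output is never scanned again.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    return pattern.sub(lambda m: mask_tenant(m.group(0)), text)


def ai_enabled(flags: dict[str, bool]) -> bool:
    """The ai.platform-ops flag gates all three features and defaults to off."""
    return flags.get("ai.platform-ops", False)


def audit_record(operator_id: str, template: str, template_version: str) -> dict:
    """Build one audit entry; field names other than operatorId are hypothetical."""
    return {
        "operatorId": operator_id,
        "promptTemplate": template,
        "promptTemplateVersion": template_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A plain unsalted hash of a short, guessable tenant name can be reversed by brute force, so a real deployment would likely want a keyed hash such as HMAC with a secret; the guardrails above leave that choice open.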

5. Non-AI Operations (Deterministic)

The following operations are explicitly not AI-assisted and will remain deterministic:

| Operation | Why not AI |
|-----------|------------|
| Config governance / allow-list validation | Must be strictly typed and auditable; no probabilistic output |
| Feature flag evaluation | Stateless boolean logic; determinism required for Kong route decisions |
| Health status derivation (healthy/degraded/unhealthy) | Rule-based thresholds; SRE must trust the signal without AI variability |
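
To make the contrast with the AI features concrete, rule-based health derivation might look like the sketch below. The choice of metrics (error rate and p99 latency) and every threshold value are hypothetical placeholders, since this document does not list the actual rules. The point is only that the same inputs always yield the same status.

```python
def derive_health_status(
    error_rate: float,
    p99_latency_ms: float,
    *,
    degraded_error_rate: float = 0.01,     # hypothetical threshold
    unhealthy_error_rate: float = 0.05,    # hypothetical threshold
    degraded_latency_ms: float = 500.0,    # hypothetical threshold
    unhealthy_latency_ms: float = 2000.0,  # hypothetical threshold
) -> str:
    """Return healthy, degraded, or unhealthy from fixed thresholds.

    The worst matching rule wins, and no model call or randomness is involved.
    """
    if error_rate >= unhealthy_error_rate or p99_latency_ms >= unhealthy_latency_ms:
        return "unhealthy"
    if error_rate >= degraded_error_rate or p99_latency_ms >= degraded_latency_ms:
        return "degraded"
    return "healthy"
```

For example, with these placeholder thresholds `derive_health_status(0.02, 120.0)` evaluates to `"degraded"` on every call, which is the property an SRE relies on when acting on the signal.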