AI Integration

:::info Source
Sourced from `services/assignment-service/AI_INTEGRATION.md` in the documentation repo.
:::

Companion: 03 ai-gateway-service · SERVICE_OVERVIEW (Slice S5)

1. Scope

Assignment-service never talks to an LLM directly. All AI calls are routed through ai-gateway-service, which provides routing, prompt-registry lookup, cost accounting, PII redaction, provider fallback, and traceability.

Two AI-augmented features:

| Feature | Slice | Trigger |
| --- | --- | --- |
| Suggested Assignments | S5 | POST `/api/v1/assignments/suggest` |
| Explain Compliance Gap | S5 | Query-time augmentation of compliance report |

Both features are human-in-the-loop — no action is taken without an admin reviewing and accepting the proposal.

2. Prompt Registry

Prompts live in @ghasi/prompt-registry, keyed by promptId:

| promptId | Purpose | Version at M4 |
| --- | --- | --- |
| `assignment/suggest` | Given tenant context → propose assignment(s) | `1.0.0` |
| `assignment/explain_gap` | Explain why an OU has N open windows | `1.0.0` |
| `assignment/target_suggest` | Given course → propose target groups | `0.2.0` (experimental) |

Every prompt ships:

- System prompt (immutable per version)
- Few-shot examples
- Output JSON schema (mirrored by a matching Zod schema on our side)
- Minimum and maximum token limits, plus a temperature cap
- Permitted model tiers (typically standard or premium)
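To make the list above concrete, here is a minimal sketch of what a registry entry and a cap check could look like. The `PromptEntry` shape, the `capViolations` helper, and the temperature value are assumptions for illustration only; they are not the actual `@ghasi/prompt-registry` API.

```typescript
// Hypothetical entry shape; NOT the actual @ghasi/prompt-registry API.
interface PromptEntry {
  promptId: string;
  version: string;
  systemPrompt: string; // immutable per version
  fewShot: Array<{ input: string; output: string }>;
  minTokens: number;
  maxTokens: number;
  maxTemperature: number;
  permittedTiers: string[]; // e.g. ["standard", "premium"]
}

interface CallParams {
  maxOutputTokens: number;
  temperature: number;
  tier: string;
}

// Returns human-readable violations; an empty list means the call fits the entry's caps.
function capViolations(entry: PromptEntry, params: CallParams): string[] {
  const problems: string[] = [];
  if (params.maxOutputTokens < entry.minTokens || params.maxOutputTokens > entry.maxTokens) {
    problems.push(`maxOutputTokens must be within ${entry.minTokens}..${entry.maxTokens}`);
  }
  if (params.temperature < 0 || params.temperature > entry.maxTemperature) {
    problems.push(`temperature must be within 0..${entry.maxTemperature}`);
  }
  if (!entry.permittedTiers.includes(params.tier)) {
    problems.push(`tier "${params.tier}" is not permitted`);
  }
  return problems;
}

// Illustrative values only; the real caps live in the registry.
const suggestEntry: PromptEntry = {
  promptId: "assignment/suggest",
  version: "1.0.0",
  systemPrompt: "(elided)",
  fewShot: [],
  minTokens: 1,
  maxTokens: 2000,
  maxTemperature: 0.7,
  permittedTiers: ["standard", "premium"],
};
```

Returning a list of violations rather than throwing keeps the check easy to log next to the promptId and version.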

3. Feature: Suggested Assignments

3.1 Input

interface SuggestContext {
  tenantId: TenantId;
  orgUnitId?: OrgUnitId;
  roleId?: RoleId;
  complianceFramework?: 'JCI'|'HIPAA'|'GDPR'|'ISO27001'|'OSHA'|string;
  lookbackDays?: number;           // default 365
  priorCompletions?: CompletionSnapshot[];  // small synopsis
  availableCourses: Array<{
    courseId: CourseId;
    title: I18nString;
    tags: string[];
    duration: ISODuration;
  }>;
}

3.2 Call path

admin → POST /api/v1/assignments/suggest
  → AssignmentService.build_context_from_tenant_projection()
  → AI Gateway.invoke(promptId='assignment/suggest', input=ctx, model='auto')
  → Gateway applies: prompt caching, PII redaction, cost meter, fallback chain
  → returns { proposal, rationale } + AIProvenance
  → Assignment Service validates proposal via Zod schema
  → stores proposal under pending_ai_suggestion table (TTL 7d)
  → returns to admin UI
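The storage step with its 7-day TTL can be sketched as an in-memory store. This is only an illustration under stated assumptions: the real `pending_ai_suggestion` table is presumably database-backed, and `PendingSuggestionStore` and its method names are hypothetical.

```typescript
const SUGGESTION_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, per the call path above

interface PendingSuggestion<T> {
  id: string;
  tenantId: string;
  payload: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory stand-in for the pending_ai_suggestion table.
class PendingSuggestionStore<T> {
  private rows = new Map<string, PendingSuggestion<T>>();

  put(id: string, tenantId: string, payload: T, nowMs: number = Date.now()): PendingSuggestion<T> {
    const row: PendingSuggestion<T> = { id, tenantId, payload, expiresAt: nowMs + SUGGESTION_TTL_MS };
    this.rows.set(id, row);
    return row;
  }

  // Returns the row only if it exists, belongs to the tenant, and has not expired.
  get(id: string, tenantId: string, nowMs: number = Date.now()): PendingSuggestion<T> | undefined {
    const row = this.rows.get(id);
    if (!row || row.tenantId !== tenantId) return undefined;
    if (nowMs >= row.expiresAt) {
      this.rows.delete(id); // lazily evict expired suggestions
      return undefined;
    }
    return row;
  }
}
```

Passing the clock in as `nowMs` keeps expiry deterministic in tests.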

3.3 Output contract (strict)

interface SuggestOutput {
  proposal: CreateAssignmentCommand;     // MUST pass full Zod validation
  rationale: I18nString;
  confidence: number;                    // 0..1
  alternatives?: Array<{ proposal: CreateAssignmentCommand; rationale: I18nString }>;
}

If the AI returns a proposal that fails validation, we return HTTP 502 with error code ai/bad-proposal and log a regression sample.
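A minimal sketch of that validation gate follows. It uses a hand-written check so the example stays self-contained; the service itself validates with Zod. The `checkSuggestOutput` function and the injected `isValidProposal` callback are hypothetical stand-ins, and the real `CreateAssignmentCommand` schema is not reproduced here.

```typescript
type Validator = (value: unknown) => boolean;

type SuggestResult =
  | { status: 200; output: { proposal: unknown; rationale: unknown; confidence: number } }
  | { status: 502; code: "ai/bad-proposal" };

const BAD_PROPOSAL: SuggestResult = { status: 502, code: "ai/bad-proposal" };

// isValidProposal stands in for the full CreateAssignmentCommand validation.
function checkSuggestOutput(raw: unknown, isValidProposal: Validator): SuggestResult {
  if (typeof raw !== "object" || raw === null) return BAD_PROPOSAL;
  const o = raw as Record<string, unknown>;

  // confidence must be a number in 0..1; the negated form also rejects NaN.
  const confidence = o.confidence;
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) return BAD_PROPOSAL;

  // rationale must be present, and the proposal must pass validation.
  if (o.rationale == null || !isValidProposal(o.proposal)) return BAD_PROPOSAL;

  return { status: 200, output: { proposal: o.proposal, rationale: o.rationale, confidence } };
}
```

A caller would log the regression sample whenever the result carries status 502.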

3.4 HITL flow

No auto-activation. The admin:

1. Sees the proposal pre-filled in the authoring UI.
2. Adjusts any field.
3. Calls the standard POST /api/v1/assignments. We mark the resulting assignment aiSuggested=true and attach aiProvenance, looked up by the stored suggestion id.
4. Activates separately.

The link from pending_ai_suggestion to assignment.ai_provenance records the lineage.

4. Feature: Explain Compliance Gap

Query-time call, not saga-driven.

admin requests /compliance-report?explainGap=true
  → we produce structured compliance summary
  → AI Gateway.invoke(promptId='assignment/explain_gap', input=summary)
  → returns I18n explanation + suggested remediation actions
  → returned to admin alongside report

Outputs are informational only; they never create assignments automatically.

5. AIProvenance Capture

Every AI-touched entity carries AIProvenance:

{
  model: 'claude-sonnet-4-20250514',
  version: 'gw-routed',
  promptId: 'assignment/suggest',
  promptVersion: '1.0.0',
  traceId: '01HXYZ…',
  decisionId: 'dec_…',
  local: false,
  generatedAt: '2026-04-15T…',
  reviewedBy: 'usr_admin_42',
  reviewedAt: '2026-04-15T…',
  cost: { microUSD: 18200, tokens: { in: 1320, out: 642 } }
}

reviewedBy and reviewedAt are populated when the admin activates the suggestion.
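One way to type the record above and stamp the review fields is sketched below. The interface mirrors the example object; marking `reviewedBy` and `reviewedAt` as optional and the `markReviewed` helper are assumptions, not the service's actual code.

```typescript
interface AIProvenance {
  model: string;
  version: string;
  promptId: string;
  promptVersion: string;
  traceId: string;
  decisionId: string;
  local: boolean;
  generatedAt: string; // ISO 8601 timestamp
  reviewedBy?: string; // absent until an admin activates the suggestion
  reviewedAt?: string;
  cost: { microUSD: number; tokens: { in: number; out: number } };
}

// Hypothetical helper: returns a copy with the review fields set, leaving the input untouched.
function markReviewed(p: AIProvenance, adminId: string, at: Date): AIProvenance {
  return { ...p, reviewedBy: adminId, reviewedAt: at.toISOString() };
}
```

Returning a copy keeps the pre-review record available for audit comparisons.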

6. Cost & Rate

- Budget: ≤ $1.00 per successful suggest call at p95 (tracked by Gateway).
- Rate limit: 100 suggest calls / day / tenant (tunable).
- explain_gap cost: ≤ $0.05 at p95 (smaller context).
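The per-tenant daily limit could be enforced with a counter keyed by tenant and day, as in the sketch below. It assumes the "day" is a UTC calendar day (the text does not say whether the window is calendar-based or rolling), and `DailyRateLimiter` is a hypothetical in-memory stand-in for whatever shared store the service actually uses.

```typescript
class DailyRateLimiter {
  private counts = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number = 100) {
    this.limit = limit; // tunable; 100 suggest calls per day per tenant by default
  }

  // Returns true and records the call if the tenant is still under today's limit.
  tryAcquire(tenantId: string, now: Date = new Date()): boolean {
    // Assumption: the window is the UTC calendar day (YYYY-MM-DD).
    const key = `${tenantId}:${now.toISOString().slice(0, 10)}`;
    const used = this.counts.get(key) ?? 0;
    if (used >= this.limit) return false;
    this.counts.set(key, used + 1);
    return true;
  }
}
```

A production version would also need to expire old keys and share counts across service instances.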

7. Safety & Governance

- No PII in prompts. User IDs are hashed to stable per-tenant tokens. Names and emails are removed by the Gateway redactor.
- Bounded outputs. Max 10 proposed target groups, max 3 alternative proposals, max 2000 output tokens.
- Tenant opt-out. If tenant policy has aiFeatures.suggestionsEnabled=false, the endpoint returns 403 ai/tenant-disabled.
- Model allow-list. Per tenant, controlled by AI Gateway.
- Prompt-injection defense. All tenant text concatenated into the prompt is wrapped in delimiter markers, and the Gateway's injection classifier runs before the call.
- No chained autonomy. AI cannot call activate. Period.
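As an illustration of the delimiter wrapping mentioned above, the sketch below strips marker look-alikes from tenant text before wrapping it. The marker strings and the `wrapTenantText` name are hypothetical; the real markers are defined by the prompt templates, and this step complements the Gateway's injection classifier rather than replacing it.

```typescript
// Hypothetical marker strings, chosen for illustration only.
const OPEN_MARKER = "<<<TENANT_TEXT>>>";
const CLOSE_MARKER = "<<<END_TENANT_TEXT>>>";

function wrapTenantText(text: string): string {
  // Remove marker look-alikes so tenant text cannot close the block early.
  // The loop repeats because one removal can splice a new marker together;
  // it terminates because every pass shortens the string.
  let cleaned = text;
  while (cleaned.includes(OPEN_MARKER) || cleaned.includes(CLOSE_MARKER)) {
    cleaned = cleaned.split(OPEN_MARKER).join("").split(CLOSE_MARKER).join("");
  }
  return `${OPEN_MARKER}\n${cleaned}\n${CLOSE_MARKER}`;
}
```

The invariant is that each wrapped block contains exactly one opening and one closing marker, whatever the tenant supplied.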

8. On-Device AI (future)

Reserved for M6+: lightweight on-device models used for admin productivity hints when offline (e.g., "similar courses in your tenant"). Not in S5.

9. Evaluation

- Golden-set prompt eval in CI: ≥ 30 labelled (context, expected proposal) pairs. Pass criterion: 90% of outputs must produce a proposal that validates AND scores ≥ 0.7 cosine similarity against golden on title, targets, rrule.
- Regression harness runs weekly in staging.
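The pass criterion can be expressed as a small check. In this sketch, how title, targets, and rrule are turned into vectors and combined into one score is not specified by the text, so each case is assumed to arrive with a single precomputed similarity; `cosine`, `EvalCase`, and `goldenSetPasses` are illustrative names.

```typescript
// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface EvalCase {
  validates: boolean; // proposal passed schema validation
  similarity: number; // assumed: one aggregated cosine score against the golden proposal
}

// Passes when the set is large enough and at least minPassRate of cases
// both validate and reach the similarity threshold.
function goldenSetPasses(
  results: EvalCase[],
  minCases: number = 30,
  minSimilarity: number = 0.7,
  minPassRate: number = 0.9,
): boolean {
  if (results.length < minCases) return false;
  const passed = results.filter((r) => r.validates && r.similarity >= minSimilarity).length;
  return passed / results.length >= minPassRate;
}
```

With 30 cases, this means at most three outputs may fail validation or fall below the similarity threshold.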

10. Observability

Each AI call produces:

- OTel trace with ai.gateway.call span
- Metrics: assignment.ai.suggest.count, …latency, …cost_micro_usd, …validation_failure
- Log: structured with traceId, promptId, promptVersion, model, status

11. Failure Modes

| Failure | Handling |
| --- | --- |
| Gateway down | Return 503; admin UI degrades to manual authoring |
| Model returns malformed JSON | Gateway already validates; on pass-through failure → 502 with support id |
| Cost budget exceeded | Gateway rejects with 402; we surface friendly error |
| Prompt-injection detected | Gateway returns 422; we log + alert |
| Tenant AI disabled | 403 at our layer before calling Gateway |
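The table can be read as a small dispatch function, sketched below. The `GatewayFailure` labels, the `Handling` shape, and both function names are hypothetical. The sketch also assumes we relay the Gateway's 402 and 422 statuses unchanged, which the table does not state.

```typescript
interface Handling {
  status: number;
  code?: string;
  alert: boolean;            // raise an operational alert
  fallbackToManual: boolean; // admin UI degrades to manual authoring
}

interface TenantPolicy {
  aiFeatures?: { suggestionsEnabled?: boolean };
}

// Runs before any Gateway call: the tenant opt-out is enforced at our layer.
function preflight(policy: TenantPolicy): Handling | undefined {
  if (policy.aiFeatures?.suggestionsEnabled === false) {
    return { status: 403, code: "ai/tenant-disabled", alert: false, fallbackToManual: false };
  }
  return undefined; // no objection; proceed to the Gateway
}

// Hypothetical labels for the Gateway-side rows of the table.
type GatewayFailure = "unavailable" | "malformed_output" | "budget_exceeded" | "injection_detected";

function handleGatewayFailure(failure: GatewayFailure): Handling {
  switch (failure) {
    case "unavailable":
      return { status: 503, alert: false, fallbackToManual: true };
    case "malformed_output":
      return { status: 502, alert: false, fallbackToManual: false };
    case "budget_exceeded":
      // Assumption: the Gateway's 402 is relayed along with a friendly message.
      return { status: 402, alert: false, fallbackToManual: false };
    case "injection_detected":
      // Assumption: the Gateway's 422 is relayed; we log and alert.
      return { status: 422, alert: true, fallbackToManual: false };
  }
}
```

Keeping the mapping in one function makes it straightforward to unit-test each row of the table.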
