
AI Integration

:::info Source
Sourced from services/media-service/AI_INTEGRATION.md in the documentation repo.
:::

1. AI Capabilities

| Capability | Prompt | Classification |
| --- | --- | --- |
| Image generation | media.image.generate | Limited-risk |
| TTS (text-to-speech) | media.audio.tts | Limited-risk |
| Auto-captioning (STT) | media.stt.caption | Limited-risk |
| Transcript generation | media.stt.transcript | Limited-risk |
| Image alt-text | media.image.alt_text | Limited-risk |
| Content safety pre-scan | media.safety.classify | Safety |
| Image-to-image edit (M5+) | media.image.edit | Limited-risk |

All capabilities are invoked through the AIClient port, and every call carries provenance.
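
As an illustration, the capability table could be held as a small lookup keyed by prompt ID. This is a minimal sketch: the prompt IDs and classifications are copied from the table, while `Capability`, `Classification`, and `lookup` are hypothetical names that do not come from the media-service code.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    LIMITED_RISK = "Limited-risk"
    SAFETY = "Safety"


@dataclass(frozen=True)
class Capability:
    name: str
    prompt_id: str
    classification: Classification


# Rows copied from the capability table; the registry structure is an assumption.
CAPABILITIES = [
    Capability("Image generation", "media.image.generate", Classification.LIMITED_RISK),
    Capability("TTS (text-to-speech)", "media.audio.tts", Classification.LIMITED_RISK),
    Capability("Auto-captioning (STT)", "media.stt.caption", Classification.LIMITED_RISK),
    Capability("Transcript generation", "media.stt.transcript", Classification.LIMITED_RISK),
    Capability("Image alt-text", "media.image.alt_text", Classification.LIMITED_RISK),
    Capability("Content safety pre-scan", "media.safety.classify", Classification.SAFETY),
    Capability("Image-to-image edit (M5+)", "media.image.edit", Classification.LIMITED_RISK),
]

BY_PROMPT_ID = {c.prompt_id: c for c in CAPABILITIES}


def lookup(prompt_id: str) -> Capability:
    """Return the capability for a prompt ID, rejecting unknown IDs."""
    try:
        return BY_PROMPT_ID[prompt_id]
    except KeyError:
        raise ValueError(f"unknown prompt ID: {prompt_id}") from None
```

Rejecting unknown prompt IDs at the lookup keeps every call tied to a classified capability.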

2. Safety Pipeline

  • Pre-generate: prompt moderated by ai-gateway.
  • Post-generate: image moderated (NSFW, violence, CSAM).
  • On a moderation hit: the asset is quarantined, and CSAM hits are also reported to law enforcement.
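
The three steps above can be sketched as a single function. This is an assumption-laden outline, not the service's implementation: the moderation and generation callables are injected stand-ins for ai-gateway and the image provider, and the "rejected" status for a blocked prompt is a guess, since the source does not say what happens when pre-generate moderation fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Outcome:
    status: str                              # "rejected", "quarantined", or "released"
    category: Optional[str] = None           # moderation category that was hit, if any
    reported_to_law_enforcement: bool = False


def run_pipeline(
    prompt: str,
    moderate_prompt: Callable[[str], bool],            # True if the prompt is allowed
    generate: Callable[[str], bytes],                  # returns image bytes
    moderate_image: Callable[[bytes], Optional[str]],  # "NSFW", "violence", "CSAM", or None
) -> Outcome:
    # Pre-generate: the prompt is moderated before any image is produced.
    if not moderate_prompt(prompt):
        return Outcome("rejected")  # assumed behaviour, not stated in the source

    image = generate(prompt)

    # Post-generate: the image itself is moderated.
    hit = moderate_image(image)
    if hit is None:
        return Outcome("released")

    # Any hit quarantines the asset; only CSAM triggers law enforcement reporting.
    return Outcome("quarantined", hit, reported_to_law_enforcement=(hit == "CSAM"))
```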

3. Provenance

Every AI artifact has aiProvenance:

  • Model + prompt ID + version.
  • Traces back to requesting user + decision ID.
  • Visible in UI as "AI-generated" badge.
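
One possible shape for the aiProvenance record is shown below. The field names are hypothetical and chosen to mirror the bullets; the source lists "version" without saying whether it refers to the model or the prompt, so the sketch keeps it as a single unqualified field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIProvenance:
    model: str               # model that produced the artifact
    prompt_id: str           # e.g. "media.image.alt_text"
    version: str             # as listed in the source; its exact referent is not specified
    requesting_user_id: str  # traces back to the requesting user
    decision_id: str         # traces back to the decision


@dataclass(frozen=True)
class Artifact:
    asset_id: str
    ai_provenance: Optional[AIProvenance] = None


def ui_badge(artifact: Artifact) -> Optional[str]:
    """Return the badge text for AI artifacts, or None for non-AI assets."""
    return "AI-generated" if artifact.ai_provenance is not None else None
```

Deriving the badge from the presence of provenance, rather than from a separate flag, means the two cannot drift apart.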

4. Cost Controls

  • Per-tenant AI budget enforced at ai-gateway.
  • Image gen ~$0.02 per image.
  • STT ~$0.006 per minute.
  • Budget UI alerts at 80%.
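
The figures above lend themselves to a quick estimate. The sketch below uses the approximate unit costs and the 80% alert threshold from the list; the function names and the "blocked" status at 100% are assumptions, since the source says only that the budget is enforced at ai-gateway.

```python
from decimal import Decimal

IMAGE_COST_USD = Decimal("0.02")        # approximate cost per generated image
STT_COST_PER_MIN_USD = Decimal("0.006") # approximate cost per minute of STT
ALERT_THRESHOLD = Decimal("0.80")       # budget UI alerts at 80%


def estimated_spend(images: int, stt_minutes: int) -> Decimal:
    """Estimate spend in USD from image count and STT minutes."""
    return images * IMAGE_COST_USD + stt_minutes * STT_COST_PER_MIN_USD


def budget_status(spend: Decimal, budget: Decimal) -> str:
    """Classify spend against a per-tenant budget."""
    if spend >= budget:
        return "blocked"  # assumed enforcement behaviour
    if spend >= ALERT_THRESHOLD * budget:
        return "alert"
    return "ok"
```

For example, 1,000 images plus 500 minutes of STT comes to about $23 (1,000 × $0.02 + 500 × $0.006), which would trigger the alert on a $25 budget.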

5. Local vs Cloud

  • Image generation: cloud (quality).
  • TTS: cloud preferred; local for quick preview (lower quality).
  • STT: cloud + optional on-device for privacy-sensitive tenants.
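
The routing rules above could be expressed as a small decision function. The capability labels, flag names, and return values here are hypothetical; only the rules themselves come from the list.

```python
def choose_backend(
    capability: str,
    *,
    quick_preview: bool = False,
    on_device_opt_in: bool = False,
) -> str:
    """Pick where a request runs, following the local-vs-cloud rules."""
    if capability == "image":
        return "cloud"  # image generation always uses cloud for quality
    if capability == "tts":
        # Cloud is preferred; local is used only for quick, lower-quality previews.
        return "local" if quick_preview else "cloud"
    if capability == "stt":
        # On-device is optional and intended for privacy-sensitive tenants.
        return "on-device" if on_device_opt_in else "cloud"
    raise ValueError(f"unknown capability: {capability}")
```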

6. Data Privacy

  • Uploaded images are sent to AI for alt-text only with tenant consent, with redaction applied.
  • noTrain flag verified on all providers.
  • HIPAA tenants: restricted providers with BAA.
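
A provider filter along these lines would capture the noTrain and BAA rules. The `Provider` fields and the `eligible_providers` helper are hypothetical; how the service actually records noTrain verification or BAA status is not described in the source.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Provider:
    name: str
    no_train_verified: bool  # noTrain flag has been verified for this provider
    has_baa: bool            # a Business Associate Agreement is in place


def eligible_providers(providers: Iterable[Provider], *, hipaa_tenant: bool) -> List[Provider]:
    """Keep providers with a verified noTrain flag; HIPAA tenants also need a BAA."""
    return [
        p for p in providers
        if p.no_train_verified and (p.has_baa or not hipaa_tenant)
    ]
```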

7. Caching

  • Generation requests with the same (prompt hash, model, size) are cached for 30 days.
  • STT results for the same (asset sha256, model, language) are cached indefinitely (deterministic).
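
The two cache keys might be built as follows. The key layout and the `gen:` / `stt:` prefixes are illustrative, and using SHA-256 for the prompt hash is an assumption; the source specifies sha256 only for the asset.

```python
import hashlib
from datetime import timedelta
from typing import Optional

GENERATION_TTL: Optional[timedelta] = timedelta(days=30)  # 30-day cache
STT_TTL: Optional[timedelta] = None                        # None means cached indefinitely


def generation_cache_key(prompt: str, model: str, size: str) -> str:
    """Key on (prompt hash, model, size)."""
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()  # hash choice assumed
    return f"gen:{prompt_hash}:{model}:{size}"


def stt_cache_key(asset_bytes: bytes, model: str, language: str) -> str:
    """Key on (asset sha256, model, language)."""
    asset_hash = hashlib.sha256(asset_bytes).hexdigest()
    return f"stt:{asset_hash}:{model}:{language}"
```

Changing any element of the tuple, such as requesting a different size, yields a different key and therefore a cache miss.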

8. Right to Explanation

  • AI-generated captions show confidence score.
  • Alt-text shows what the model "saw" + suggests review.

9. Bias & Accessibility

  • Caption quality eval quarterly (accuracy on diverse speakers).
  • Image gen bias eval on demographic prompts.