AI Integration
:::info Source
Sourced from services/media-service/AI_INTEGRATION.md in the documentation repo.
:::
1. AI Capabilities
| Capability | Prompt ID | Classification |
|---|---|---|
| Image generation | media.image.generate | Limited-risk |
| TTS (text-to-speech) | media.audio.tts | Limited-risk |
| Auto-captioning (STT) | media.stt.caption | Limited-risk |
| Transcript generation | media.stt.transcript | Limited-risk |
| Image alt-text | media.image.alt_text | Limited-risk |
| Content safety pre-scan | media.safety.classify | Safety |
| Image-to-image edit (M5+) | media.image.edit | Limited-risk |
All capabilities are invoked through the AIClient port, and every call records provenance.
2. Safety Pipeline
- Pre-generate: prompt moderated by ai-gateway.
- Post-generate: image moderated (NSFW, violence, CSAM).
- On a hit: the asset is quarantined; CSAM hits are also reported to law enforcement.
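
The two moderation stages above can be sketched as a single function. This is a minimal illustration, not the service's implementation: `run_safety_pipeline`, the `Outcome` enum, the three callables, and the label strings are all hypothetical names. How a rejected prompt is handled is not stated in this document, so the early return is an assumption.

```python
from enum import Enum
from typing import Callable, Set


class Outcome(Enum):
    PROMPT_REJECTED = "prompt_rejected"  # assumption: flagged prompts never reach generation
    RELEASED = "released"
    QUARANTINED = "quarantined"
    QUARANTINED_AND_REPORTED = "quarantined_and_reported"


def run_safety_pipeline(
    prompt: str,
    moderate_prompt: Callable[[str], bool],       # hypothetical: True means the prompt is allowed
    generate: Callable[[str], bytes],             # hypothetical image generator
    moderate_image: Callable[[bytes], Set[str]],  # hypothetical: returns the labels that were hit
) -> Outcome:
    # Pre-generate: stop before any image is produced if the prompt is flagged.
    if not moderate_prompt(prompt):
        return Outcome.PROMPT_REJECTED

    image = generate(prompt)

    # Post-generate: an empty label set means no moderation hit.
    labels = moderate_image(image)
    if not labels:
        return Outcome.RELEASED
    # Any hit quarantines the asset; a CSAM hit additionally triggers reporting.
    if "csam" in labels:
        return Outcome.QUARANTINED_AND_REPORTED
    return Outcome.QUARANTINED
```

Passing the moderators in as callables keeps the sketch independent of any particular ai-gateway client.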
3. Provenance
Every AI artifact has aiProvenance:
- Model + prompt ID + version.
- Traces back to requesting user + decision ID.
- Visible in UI as "AI-generated" badge.
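
One possible shape for the provenance record is sketched below. Only the name aiProvenance and the listed contents come from this document; the Python class, the field names, and the `badge` property are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AiProvenance:
    """Hypothetical shape of an aiProvenance record."""

    model: str         # model that produced the artifact
    prompt_id: str     # e.g. "media.image.alt_text" from the capabilities table
    version: str       # version recorded alongside the model and prompt ID
    requested_by: str  # requesting user
    decision_id: str   # decision the artifact traces back to

    @property
    def badge(self) -> str:
        # Label the UI would show for any artifact carrying provenance.
        return "AI-generated"


record = AiProvenance(
    model="example-model",
    prompt_id="media.image.alt_text",
    version="1",
    requested_by="user-123",
    decision_id="decision-456",
)
metadata = asdict(record)  # plain dict, suitable for storing next to the asset
```

The record is frozen so that provenance cannot be altered after the artifact is created.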
4. Cost Controls
- Per-tenant AI budget enforced at ai-gateway.
- Image gen ~$0.02 per image.
- STT ~$0.006 per minute.
- Budget UI alerts when a tenant has used 80% of its budget.
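
As a worked example of the figures above, the sketch below estimates spend and checks it against the 80% alert level. The unit prices are the approximate values quoted in this section; the function names and the "exhausted" state are assumptions, since the document only says that the budget is enforced at ai-gateway.

```python
from decimal import Decimal

IMAGE_COST = Decimal("0.02")         # approximate cost per generated image
STT_COST_PER_MIN = Decimal("0.006")  # approximate cost per transcribed minute
ALERT_THRESHOLD = Decimal("0.80")    # budget UI alerts at 80%


def estimate_cost(images: int = 0, stt_minutes: int = 0) -> Decimal:
    """Estimated spend in dollars for a batch of work."""
    return images * IMAGE_COST + Decimal(str(stt_minutes)) * STT_COST_PER_MIN


def budget_status(spent: Decimal, budget: Decimal) -> str:
    """Return "ok", "alert", or "exhausted" (the last state is an assumption)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    if ratio >= 1:
        return "exhausted"
    if ratio >= ALERT_THRESHOLD:
        return "alert"
    return "ok"


# 100 images plus 50 minutes of STT: 100 * 0.02 + 50 * 0.006 = 2.30 dollars.
batch_cost = estimate_cost(images=100, stt_minutes=50)
```

`Decimal` is used instead of floats so that small per-unit prices add up without rounding drift.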
5. Local vs Cloud
- Image generation: cloud (quality).
- TTS: cloud preferred; local for quick preview (lower quality).
- STT: cloud + optional on-device for privacy-sensitive tenants.
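
The placement rules above amount to a small routing decision. The sketch below is one way to express it; `choose_backend`, its parameters, and the returned strings are hypothetical names, not the service's API.

```python
def choose_backend(
    capability: str,
    *,
    quick_preview: bool = False,     # hypothetical flag: caller wants a fast TTS preview
    on_device_opt_in: bool = False,  # hypothetical flag: privacy-sensitive tenant opted in
) -> str:
    """Pick where a request runs, following the rules listed above."""
    if capability == "image_generation":
        return "cloud"  # cloud is used for quality
    if capability == "tts":
        # Cloud is preferred; local trades quality for speed in previews.
        return "local" if quick_preview else "cloud"
    if capability == "stt":
        # On-device is optional and intended for privacy-sensitive tenants.
        return "on_device" if on_device_opt_in else "cloud"
    raise ValueError(f"unknown capability: {capability}")
```

For instance, `choose_backend("tts", quick_preview=True)` selects the local path, while the same call without the flag stays on cloud.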
6. Data Privacy
- Uploaded images are sent to AI for alt-text with tenant consent and with redaction applied.
- noTrain flag verified on all providers.
- HIPAA tenants: restricted providers with BAA.
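
The provider restrictions above can be read as a filter over the provider list. The sketch below assumes a hypothetical `Provider` record with `no_train` and `has_baa` fields; how these facts are actually stored or verified is not described in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Provider:
    name: str
    no_train: bool  # provider has the noTrain flag verified
    has_baa: bool   # a BAA is in place with this provider


def eligible_providers(providers: List[Provider], hipaa_tenant: bool) -> List[Provider]:
    """Keep providers with noTrain verified; HIPAA tenants also require a BAA."""
    return [
        p for p in providers
        if p.no_train and (p.has_baa or not hipaa_tenant)
    ]


# Illustrative data only; these are not real provider names.
catalog = [
    Provider("provider-a", no_train=True, has_baa=True),
    Provider("provider-b", no_train=True, has_baa=False),
    Provider("provider-c", no_train=False, has_baa=True),
]
```

With this catalog, a HIPAA tenant would be limited to provider-a, while other tenants could also use provider-b.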
7. Caching
- Same (prompt hash, model, size) → cache 30 days.
- STT of same (asset sha256, model, language) → cache indefinitely (deterministic).
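
The cache keys described above could be derived as follows. This is a sketch under stated assumptions: the key layout, the `img:` and `stt:` prefixes, and the function names are hypothetical, and SHA-256 is assumed for the prompt hash because the document does not name a hash function for it.

```python
import hashlib
from typing import Optional

IMAGE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days


def image_cache_key(prompt: str, model: str, size: str) -> str:
    # Hash the prompt so the key has a fixed length and does not embed raw prompt text.
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"img:{prompt_hash}:{model}:{size}"


def stt_cache_key(asset_sha256: str, model: str, language: str) -> str:
    # The asset is already identified by its sha256, so no further hashing is needed.
    return f"stt:{asset_sha256}:{model}:{language}"


def ttl_seconds(key: str) -> Optional[int]:
    """Image results expire after 30 days; STT results never expire (None)."""
    return IMAGE_TTL_SECONDS if key.startswith("img:") else None
```

Any change to the model, size, or language yields a different key, so results produced under one configuration are never served for another.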
8. Right to Explanation
- AI-generated captions show confidence score.
- Alt-text shows what the model "saw" + suggests review.
9. Bias & Accessibility
- Caption quality eval quarterly (accuracy on diverse speakers).
- Image gen bias eval on demographic prompts.