Campaign Service (campaign-service) — Service Overview
Version: 1.0
Status: Draft
Owner: Product
Last Updated: 2026-04-20
Companion: DOMAIN_MODEL · API_CONTRACTS · EVENT_SCHEMAS · AI_INTEGRATION
Related ADR: ADR-0004 National-Backbone Resilience
1. Purpose — Tenant-Facing Marketing & Notification Orchestration
The Campaign Service is the tenant-facing orchestration plane for high-volume, scheduled, segmented SMS programs — the equivalent of Twilio Engage, MessageBird Studio, or Infobip Moments. Where sms-orchestrator is the per-message API and pipeline, the Campaign Service is the per-program surface: define an audience, choose a template, schedule a window, throttle the firehose, run an A/B test, attach a kill-switch, and watch deliverability and conversion roll in.
It is the canonical home of:
- Campaign builder — segment query DSL over recipient profiles; schedule; throttle (TPS / per-MNO ceilings); A/B variants; kill-switch.
- Template catalog — versioned, approved tenant templates with merge fields, conditional content, multi-language variants (Pashto / Dari / English / Arabic).
- Approved-template workflow — submission → compliance review → approval → publish, paired with the EP-CE-13 trusted-tenant fast-path so verified senders skip per-message template re-review.
- Campaign reporting — deliverability, spend, opt-outs, conversion (URL-callback or pixel), pivot tables and CSV.
The service is not on the per-message data plane — it stages and submits batches into sms-orchestrator (or channel-router-service for multi-channel) and consults compliance-engine, consent-ledger-service, and analytics-service along the way.
2. Position in the Platform
                           Tenant Marketer / Ops
                                     │
                                     ▼  https://app.ghasi.af/campaigns
                         ┌────────────────────────┐
                         │    customer-portal     │  (UI only)
                         └───────────┬────────────┘
                                     │ HTTPS / mTLS
                                     ▼
                        ┌─────────────────────────────────────┐
                        │          campaign-service           │
                        │                                     │
                        │  ┌───────────┐  ┌────────────────┐  │
                        │  │ Builder   │  │ Template cat.  │  │
                        │  └───────────┘  └────────────────┘  │
                        │  ┌───────────┐  ┌────────────────┐  │
                        │  │ Scheduler │  │ Kill-switch    │  │
                        │  │ + throt   │  │ ≤ 5s stop      │  │
                        │  └───────────┘  └────────────────┘  │
                        │  ┌───────────┐  ┌────────────────┐  │
                        │  │ A/B alloc │  │ Reporting      │  │
                        │  └───────────┘  └────────────────┘  │
                        └────────────┬────────────────────────┘
                                     │
 ┌──────────────────┬────────────────┼────────────────┬─────────────────┐
 ▼                  ▼                ▼                ▼                 ▼
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ compl.-  │ │ consent-     │ │ channel-     │ │ sms-         │ │ analytics-   │
│ engine   │ │ ledger-      │ │ router-      │ │ orchestrator │ │ service      │
│ (templ.  │ │ service      │ │ service      │ │ (per-msg     │ │ (campaign    │
│ approv.  │ │ (opt-out     │ │ (multi-ch    │ │ ingest)      │ │ reporting)   │
│ + fast-  │ │ + DND)       │ │ fanout)      │ │              │ │              │
│ path)    │ │              │ │              │ │              │ │              │
└──────────┘ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
The campaign service composes the platform's existing services into a higher-level abstraction; it does not duplicate any of them.
3. Bounded Context
| Dimension | Value |
|---|---|
| Domain | Marketing / Bulk Notification / Tenant Productivity |
| Owner squad | Product |
| Deployment unit | Kubernetes Deployment — campaign-service (NestJS API + worker) |
| Communication style | Inbound: HTTPS (customer-portal, tenant API) · Outbound: HTTPS / gRPC to sms-orchestrator, channel-router-service, compliance-engine, consent-ledger-service, analytics-service · NATS for events |
| Storage | PostgreSQL schema campaign · Redis (kill-switch, dedupe, throttle counters) · ClickHouse via analytics-service for reporting reads |
| Failure mode | Fail-safe-stop — on dependency failure (compliance, consent), the campaign is paused with a tenant-visible reason; sends already queued behind the throttle continue to drain, but no new sends are queued. The data plane is unaffected. |
4. Responsibilities
| # | Responsibility |
|---|---|
| R1 | Provide a segment query DSL (JSON-DSL compiled to SQL against tenant recipient tables) to define audience for a campaign |
| R2 | Manage a template catalog with versioning, merge-field syntax (Mustache subset + ICU MessageFormat for plurals), conditional blocks, multi-language variants |
| R3 | Operate the approved-template workflow (submit → compliance review → approve / reject) and integrate EP-CE-13 trusted-tenant fast-path |
| R4 | Schedule campaigns (one-shot, recurring cron-like, time-zone aware) and dispatch into sms-orchestrator/channel-router-service honouring tenant throttle caps and per-MNO ceilings |
| R5 | Run A/B variant assignment with consistent hashing on recipient identifier so the same recipient always gets the same variant within a campaign |
| R6 | Provide a kill-switch that halts in-flight dispatch within 5 s end-to-end (P95), measured from operator click to last queued message dropped |
| R7 | Consult consent-ledger-service to drop opted-out recipients before send; record drop reason for audit |
| R8 | Emit campaign.* lifecycle events (submitted, approved, started, paused, killed, completed) and per-recipient send/skip events for reporting |
| R9 | Surface campaign reporting (deliverability %, spend, opt-out delta, conversion via webhook callback) sourced from analytics-service and the consent ledger |
| R10 | Enforce per-tenant campaign quotas (max active campaigns, max messages/day, segment size cap) |
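To make R1 concrete, here is a minimal sketch of compiling a JSON segment definition into parameterised SQL with a mandatory tenant predicate and a row cap. The AST shape, the `recipients` table, the `tenant_id` and `recipient_id` columns, and the field allow-list are all assumptions made for illustration; the real DSL grammar lives in the companion documents, and EXPLAIN validation is omitted here.

```typescript
// Hypothetical segment AST. The real DSL grammar is not specified in this overview.
type Segment =
  | { op: "and" | "or"; args: Segment[] }
  | { op: "eq" | "gt" | "lt"; field: string; value: string | number }
  | { op: "in"; field: string; values: (string | number)[] };

// Assumed allow-list of filterable columns; anything else is rejected.
const ALLOWED_FIELDS = new Set(["province", "language", "age"]);

interface CompiledQuery {
  text: string;
  params: (string | number)[];
}

function compileSegment(tenantId: string, segment: Segment, rowCap = 100000): CompiledQuery {
  // $1 is always the tenant id, so every compiled query is tenant-scoped.
  const params: (string | number)[] = [tenantId];
  const bind = (v: string | number): string => {
    params.push(v);
    return `$${params.length}`;
  };
  const column = (field: string): string => {
    if (!ALLOWED_FIELDS.has(field)) throw new Error(`unknown field: ${field}`);
    return field; // inlined only because it passed the allow-list
  };
  const walk = (node: Segment): string => {
    switch (node.op) {
      case "and":
      case "or": {
        if (node.args.length === 0) throw new Error(`empty ${node.op}`);
        const joiner = node.op === "and" ? " AND " : " OR ";
        return `(${node.args.map((n) => walk(n)).join(joiner)})`;
      }
      case "eq":
        return `${column(node.field)} = ${bind(node.value)}`;
      case "gt":
        return `${column(node.field)} > ${bind(node.value)}`;
      case "lt":
        return `${column(node.field)} < ${bind(node.value)}`;
      case "in": {
        if (node.values.length === 0) throw new Error("empty in-list");
        return `${column(node.field)} IN (${node.values.map((v) => bind(v)).join(", ")})`;
      }
    }
  };
  const where = walk(segment);
  const cap = bind(rowCap); // hard row cap is bound like any other value
  return {
    text: `SELECT recipient_id FROM recipients WHERE tenant_id = $1 AND ${where} LIMIT ${cap}`,
    params,
  };
}
```

User-supplied values only ever reach the query as bound parameters, and field names only after an allow-list check, which is what lets a JSON AST rule out the injection class that raw SQL would allow.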
5. Non-Responsibilities
- Does not dispatch SMS directly — submits to sms-orchestrator or channel-router-service.
- Does not evaluate per-message compliance — compliance-engine does that on every individual outbound message; the campaign service consults the engine only for template approval.
- Does not own consent records — consent-ledger-service is authoritative; campaign service is a read-mostly consumer.
- Does not own billing — billing-service meters spend; campaign service surfaces spend read-only via analytics-service.
- Does not own recipient profile data beyond metadata needed for segmentation (it queries the tenant's recipient store via a defined contract).
6. Upstream / Downstream Dependencies
| Direction | Service | Protocol | Purpose |
|---|---|---|---|
| Inbound user | customer-portal UI | HTTPS | Builder, catalog, reporting |
| Inbound machine | Tenant API | HTTPS (OAuth/API key) | Programmatic campaign create / start / kill |
| Outbound | sms-orchestrator | HTTPS POST /v1/sms/bulk | Batch SMS submit |
| Outbound | channel-router-service | gRPC | Multi-channel fanout (SMS / WA / Voice) |
| Outbound | compliance-engine | gRPC | Template approval workflow + trusted-tenant fast-path |
| Outbound | consent-ledger-service | gRPC BatchCheckConsent(recipients[]) | Drop opted-out + DND recipients pre-send |
| Outbound | analytics-service | gRPC | Campaign reporting reads |
| Outbound | billing-service | gRPC | Pre-flight spend estimate; budget cap check |
| Outbound events | NATS JetStream | TCP | campaign.* lifecycle + per-recipient events |
| Inbound events | NATS JetStream consent.events.opt_out.v1 | TCP | Live opt-out propagation; mid-flight drop |
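The last row above (live opt-out propagation) could be handled along the following lines. This is a sketch only: the `OptOutEvent` payload shape and the `MidFlightDropList` class are hypothetical, the NATS subscription wiring is left out, and a real worker fleet would presumably keep this set in shared storage rather than in process memory.

```typescript
// Hypothetical payload for consent.events.opt_out.v1; the real schema is in EVENT_SCHEMAS.
interface OptOutEvent {
  tenantId: string;
  recipientId: string;
}

class MidFlightDropList {
  private readonly optedOut = new Set<string>();

  // Called from the event consumer for every opt-out received while campaigns run.
  record(event: OptOutEvent): void {
    this.optedOut.add(`${event.tenantId}:${event.recipientId}`);
  }

  // Called just before a batch is submitted: splits it into recipients to send and to drop.
  partition(tenantId: string, recipientIds: readonly string[]): { send: string[]; dropped: string[] } {
    const send: string[] = [];
    const dropped: string[] = [];
    for (const id of recipientIds) {
      (this.optedOut.has(`${tenantId}:${id}`) ? dropped : send).push(id);
    }
    return { send, dropped };
  }
}
```

Keying on tenant plus recipient keeps one tenant's opt-out from suppressing the same number for another tenant, in line with the tenant isolation rules stated elsewhere in this document.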
7. High-Level Flow — Campaign Submit → Run → Complete
8. Key Design Decisions
| Decision | Rationale |
|---|---|
| Segment DSL is JSON-AST, not raw SQL | Auditable, sandboxable, compiled to parameterised SQL server-side; eliminates SQL injection class |
| Segment compiler emits EXPLAIN-validated SQL with mandatory tenant-id predicate and a hard row-cap | Prevents accidental cross-tenant scans; protects RDS from runaway queries |
| Template merge syntax = Mustache subset + ICU MessageFormat plurals (e.g. {count, plural, one {1 message} other {# messages}}) | Mustache is familiar; ICU handles linguistic plurals correctly across en/ps/fa/ar |
| A/B assignment via consistent hashing of recipientId (sha256(campaignId:recipientId) mod 100) | Same recipient always lands in the same variant for the campaign; deterministic without a join table |
| Kill-switch latency budget = 5 s end-to-end (P95) | Measured from kill click to last orchestrator submit; achieved by Redis-backed kill flag checked per-batch and worker poll loop ≤ 250 ms |
| Per-MNO throttle ceilings enforced (e.g. AWCC 100 TPS, Roshan 80 TPS) via token bucket with Redis backing | Protects MNO peers from tenant batch storms; respects interconnect SLAs |
| Approved-template workflow honours EP-CE-13 trusted-tenant fast-path | Established tenants don't get blocked at template approval gate; per-message compliance still runs |
| Pre-flight consent batch check in chunks of 1000 | Reduces opt-out leakage by checking immediately before send rather than at submission time |
| Live opt-out drop via consumed consent.events.opt_out.v1 events while a campaign runs | Mid-flight opt-outs are honoured within seconds, not at the next campaign |
| Reporting reads come from analytics-service (ClickHouse) | Keeps PG schema small and lets reporting aggregate across multiple sources (DLR, opt-out, conversion) |
| Conversion tracked via signed callback URL or 1×1 pixel — tenant-supplied | Doesn't require integrating a tracking pipeline inside the campaign service |
| Campaign cannot enter RUNNING without an approved template and a successful batch consent check | Prevents both regulator violations and pointless dispatch cost |
| Campaigns dispatching to > 1000 recipients require explicit operator confirmation in the UI ("type the campaign name to confirm") | Reduces ops error blast-radius |
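The A/B assignment rule in the table, sha256(campaignId:recipientId) mod 100, can be sketched as follows. Interpreting the digest as one big-endian integer is an assumption (the table does not say how the hash is reduced), and the `Variant` shape with integer `allocationPct` values summing to 100 is hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical variant config: integer percentages that must sum to 100.
interface Variant {
  id: string;
  allocationPct: number;
}

// Maps a (campaign, recipient) pair to a stable bucket in [0, 100).
function bucketOf(campaignId: string, recipientId: string): number {
  const hex = createHash("sha256").update(`${campaignId}:${recipientId}`).digest("hex");
  // Assumption: the whole 256-bit digest is read as one unsigned integer before the modulo.
  return Number(BigInt("0x" + hex) % BigInt(100));
}

// Walks the cumulative allocation ranges and returns the variant owning the bucket.
function assignVariant(campaignId: string, recipientId: string, variants: readonly Variant[]): string {
  const total = variants.reduce((sum, v) => sum + v.allocationPct, 0);
  if (total !== 100) throw new Error(`allocations must sum to 100, got ${total}`);
  const bucket = bucketOf(campaignId, recipientId);
  let upper = 0;
  for (const v of variants) {
    upper += v.allocationPct;
    if (bucket < upper) return v.id;
  }
  throw new Error("unreachable: bucket is always below 100");
}
```

Because the bucket depends only on the two identifiers, any worker can recompute the assignment without shared state, which is the "no join table" property the table describes. Including the campaign id in the hash also means the same recipient is not always placed in the first variant across different campaigns.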
9. Runtime Topology
| Component | Stack | Replicas (prod) | Notes |
|---|---|---|---|
| Campaign API (NestJS) | Node 22 / NestJS 11 | 3 | CRUD, builder, reporting |
| Campaign worker | Node 22 | 4 | Scheduler, throttler, dispatch loop, kill-switch poll |
| Approval worker | Node 22 | 2 | Template lifecycle, fast-path resolver |
| PostgreSQL | Postgres 16 | 1 primary + 2 replicas | campaign schema |
| Redis | Redis 7 cluster | 3 nodes | Kill-flag, throttle tokens, dedupe |
| ClickHouse | via analytics-service | n/a | Reporting reads |
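As an illustration of what the campaign worker's dispatch loop does with the Redis-held kill flag and throttle tokens, here is an in-memory sketch. It replaces Redis with plain objects and takes the clock as a parameter so the behaviour is easy to follow; the class and function names are hypothetical, and a production version would need the token update to be atomic across workers.

```typescript
// Token bucket: refills at ratePerSec up to capacity, one token per message.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly ratePerSec: number;
  private readonly capacity: number;

  constructor(ratePerSec: number, capacity: number, nowMs: number) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  tryTake(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

interface OutboundMessage {
  recipientId: string;
  mno: string;
}

interface DispatchResult {
  sent: OutboundMessage[];
  deferred: OutboundMessage[];
  dropped: OutboundMessage[];
}

// One pass over a batch: the kill flag is checked once per batch, then each
// message needs a token from its MNO's bucket or it is deferred to a later pass.
function dispatchBatch(
  batch: readonly OutboundMessage[],
  buckets: Map<string, TokenBucket>,
  isKilled: () => boolean,
  nowMs: number,
): DispatchResult {
  if (isKilled()) return { sent: [], deferred: [], dropped: [...batch] };
  const result: DispatchResult = { sent: [], deferred: [], dropped: [] };
  for (const msg of batch) {
    const bucket = buckets.get(msg.mno);
    // Assumption: an MNO without a configured ceiling is deferred rather than sent unthrottled.
    if (bucket !== undefined && bucket.tryTake(nowMs)) result.sent.push(msg);
    else result.deferred.push(msg);
  }
  return result;
}
```

With per-MNO buckets, a burst aimed at one operator exhausts only that operator's tokens, so traffic to other operators keeps flowing at its own ceiling.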
10. Aggregates Owned
- Campaign — lifecycle root: state machine (DRAFT → SCHEDULED → RUNNING → PAUSED|KILLED|COMPLETED), schedule, throttle, segment ref, template ref, A/B config
- CampaignSegment — JSON-DSL definition, compiled SQL fingerprint, last preview row count
- CampaignTemplate — versioned content per language, merge fields, approval state, links to compliance-engine template entity
- CampaignBatch — per-1000-message dispatch unit; tracks accepted/rejected/dropped counts
- CampaignVariant — A/B variant config (allocation %, content delta, success metric)
- TemplateApproval — submission → review → approve/reject lifecycle
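The Campaign state machine listed above, together with the rule from Key Design Decisions that a campaign cannot enter RUNNING without an approved template and a successful consent check, might be guarded as in this sketch. The transitions out of PAUSED (resume or kill) are an assumption, since the overview only lists the states, and the `Campaign` fields shown are hypothetical.

```typescript
type CampaignState = "DRAFT" | "SCHEDULED" | "RUNNING" | "PAUSED" | "KILLED" | "COMPLETED";

// Allowed next states per current state.
const TRANSITIONS: Record<CampaignState, readonly CampaignState[]> = {
  DRAFT: ["SCHEDULED"],
  SCHEDULED: ["RUNNING"],
  RUNNING: ["PAUSED", "KILLED", "COMPLETED"],
  PAUSED: ["RUNNING", "KILLED"], // assumption: resume and kill-from-pause are not stated in the overview
  KILLED: [],
  COMPLETED: [],
};

// Hypothetical slice of the Campaign aggregate, just enough to show the guard.
interface Campaign {
  id: string;
  state: CampaignState;
  templateApproved: boolean;
  consentCheckPassed: boolean;
}

// Returns a new Campaign in the requested state, or throws if the move is not allowed.
function transition(campaign: Campaign, next: CampaignState): Campaign {
  if (!TRANSITIONS[campaign.state].includes(next)) {
    throw new Error(`illegal transition ${campaign.state} -> ${next}`);
  }
  if (next === "RUNNING" && !(campaign.templateApproved && campaign.consentCheckPassed)) {
    throw new Error("RUNNING requires an approved template and a successful consent check");
  }
  return { ...campaign, state: next };
}
```

Returning a new object rather than mutating in place makes it straightforward to write each accepted transition to the append-only audit log before persisting it.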
11. Standards & Compliance
- GDPR / Afghan Data Protection — opt-out propagation < 24 h (target: seconds, via live event consumption)
- TCPA-like principles — no marketing to opted-out recipients, ever
- Tenant data isolation — RLS on every table by tenantId
- Audit log of every state transition, kill click, and template approval — append-only, partitioned monthly
12. Cross-Service Contracts (summary)
- Submits batches to sms-orchestrator.POST /v1/sms/bulk
- Calls compliance-engine.SubmitTemplate / ApproveTemplate / IsTrustedTenant
- Calls consent-ledger-service.BatchCheckConsent
- Emits campaign.created/submitted/started/paused/killed/completed/batch_dispatched
- Consumes consent.events.opt_out.v1 for live mid-flight drops
- Consumes compliance.template.approved/rejected events
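The BatchCheckConsent contract is used in chunks of 1000 immediately before send. A rough sketch of that pre-flight pass is shown here; the `ConsentClient` interface and its return type are hypothetical stand-ins for the generated gRPC client, and the reason codes are illustrative only.

```typescript
type DropReason = "OPT_OUT" | "DND";

// Hypothetical client shape; the real request/response types belong to API_CONTRACTS.
interface ConsentClient {
  // Resolves to the recipients in the request that must NOT be contacted, with a reason.
  batchCheckConsent(recipients: string[]): Promise<Map<string, DropReason>>;
}

interface ConsentResult {
  allowed: string[];
  dropped: { recipientId: string; reason: DropReason }[];
}

// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new Error("size must be a positive integer");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function preflightConsent(
  recipients: readonly string[],
  client: ConsentClient,
  chunkSize = 1000,
): Promise<ConsentResult> {
  const result: ConsentResult = { allowed: [], dropped: [] };
  for (const group of chunk(recipients, chunkSize)) {
    // A rejected call propagates to the caller, which is expected to pause the
    // campaign (the fail-safe-stop behaviour) rather than send unchecked.
    const blocked = await client.batchCheckConsent(group);
    for (const id of group) {
      const reason = blocked.get(id);
      if (reason !== undefined) result.dropped.push({ recipientId: id, reason });
      else result.allowed.push(id);
    }
  }
  return result;
}
```

Recording the reason next to each dropped recipient gives the audit trail that R7 asks for without a second lookup.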
13. Out-of-Scope (v1.0)
- AI-generated campaign content (deferred — see AI_INTEGRATION.md §2)
- Multi-step drip / nurture sequences (v1.1)
- Inbound MO conversational replies tied to campaign attribution (v1.1; will use channel-router-service.session-manager)
14. Glossary
| Term | Definition |
|---|---|
| Segment | Set of recipients defined by a JSON-DSL query against tenant recipient profiles |
| Throttle | Rate cap on messages submitted per second / per minute, optionally per MNO |
| A/B variant | Alternative content tested for performance; assignment is consistent per recipient |
| Kill-switch | Operator-triggered halt of in-flight dispatch within 5 s end-to-end (P95) |
| Trusted-tenant fast-path | EP-CE-13 mechanism that bypasses per-message template re-review for established senders |