maintenance-service
Bounded Context: Maintenance (Supporting) · Owner: PMS Operations · Phase: 0 (basic ticketing) → Phase 1 (preventive + assets) · Storage: Cloud SQL Postgres (shared schema + RLS) + transactional outbox · Bundle: services/maintenance-service/
maintenance-service owns work orders for everything a hotel breaks, wears out, or has to keep alive: HVAC compressors, plumbing, electrical, generators, water tanks, BLE/lock devices, linen, furniture, IT, and structural issues. It is the system of record for the WorkOrder lifecycle (open → assigned → in_progress → blocked → resolved → verified, plus terminal cancelled), the PreventiveSchedule cron (generator run-hour service, HVAC seasonal, water-tank cleaning, lock-battery checks), the Asset registry per property, light PartUsage tracking, and Vendor records (often phone-only, no email — many service providers in the operating markets reach the platform via WhatsApp/SMS bridges).
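The lifecycle above can be read as a small state machine. The following is a minimal sketch only: the `ALLOWED_TRANSITIONS` table and `canTransition` helper are hypothetical names, and the exact edges (in particular how a WO leaves `blocked`, and from which states cancellation is allowed) are assumptions rather than the service's confirmed rules. Re-assignment is not modelled here.

```typescript
// Hypothetical transition table for the WorkOrder lifecycle.
// The status names come from the overview; the edges are assumptions.
type WorkOrderStatus =
  | "open"
  | "assigned"
  | "in_progress"
  | "blocked"
  | "resolved"
  | "verified"
  | "cancelled";

const ALLOWED_TRANSITIONS: Record<WorkOrderStatus, readonly WorkOrderStatus[]> = {
  open: ["assigned", "cancelled"],
  assigned: ["in_progress", "cancelled"],
  in_progress: ["blocked", "resolved", "cancelled"],
  blocked: ["in_progress", "cancelled"], // assumed: unblocking resumes work
  resolved: ["verified"],
  verified: [], // terminal
  cancelled: [], // terminal
};

// Returns true when moving from `from` to `to` is listed in the table.
function canTransition(from: WorkOrderStatus, to: WorkOrderStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Keeping the edges in one data table, rather than scattered across handlers, makes it straightforward to reject an invalid command before any OCC check runs.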
It coordinates closely with housekeeping-service (housekeeping flags issues; maintenance owns them), property-service (rooms transitioned to OOO/OOS while a work order is active and severity warrants), lock-integration-service (auto-creates lock-maintenance work orders on device.health_alert.v1), and reservation-service (a high-severity WO on a room with an active reservation triggers a re-accommodation prompt).
Purpose
- Be the single authoritative aggregate for every "something needs fixing here" event in the hotel — internal report, guest complaint, housekeeping flag, lock health alert, or scheduled preventive task.
- Own the room-OOO contract: when severity warrants, a work order takes a room out of order (publishing `melmastoon.maintenance.work_order.room_blocked.v1`); `property-service` reacts and the BFF arrivals board reflects it. When the WO is verified, the room is returned to the housekeeping queue.
- Own the preventive cadence for safety-critical assets: generator run-hours (load-shedding markets in Afghanistan/Pakistan border areas push generators hard; a missed oil change is a fire risk), HVAC seasonal filter swaps in dust-heavy regions, quarterly water-tank chlorination, lock-device battery monitoring.
- Coordinate with vendors that may have no email: WhatsApp/SMS bridge via `notification-service`, callback windows captured as structured fields, and a manual-acknowledgement path so a vendor confirming "I'll be there at 5pm" via SMS lands as a structured event.
- Track cost per work order (labor minutes × rate, parts consumed × cost, vendor invoice) so finance can roll up cost-per-room-night and cost-per-property in `analytics-service`.
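The cost formula in the last bullet (labor minutes × rate, parts consumed × cost, vendor invoice) can be sketched as follows. This is an illustration under stated assumptions: the type and field names (`LaborLine`, `PartLine`, `rollUpCost`, the `*Minor` suffix) are hypothetical, the rate is treated as hourly, and amounts are held as integers in minor currency units to avoid floating-point drift.

```typescript
// Hypothetical shapes; not the service's actual schema.
interface LaborLine {
  minutes: number;
  hourlyRateMinor: number; // rate per hour, in minor currency units
}

interface PartLine {
  quantity: number;
  unitCostMinor: number;
}

interface CostRollup {
  laborMinor: number;
  partsMinor: number;
  vendorMinor: number;
  totalMinor: number;
}

// Sums labor, parts, and the vendor invoice into one rollup.
function rollUpCost(
  labor: LaborLine[],
  parts: PartLine[],
  vendorInvoiceMinor: number
): CostRollup {
  let laborMinor = 0;
  for (const line of labor) {
    // Round each line so the stored amount stays an integer.
    laborMinor += Math.round((line.minutes * line.hourlyRateMinor) / 60);
  }
  let partsMinor = 0;
  for (const line of parts) {
    partsMinor += line.quantity * line.unitCostMinor;
  }
  return {
    laborMinor: laborMinor,
    partsMinor: partsMinor,
    vendorMinor: vendorInvoiceMinor,
    totalMinor: laborMinor + partsMinor + vendorInvoiceMinor,
  };
}
```

For example, 90 minutes at an hourly rate of 1200, two parts at 350 each, and a vendor invoice of 5000 roll up to a total of 7500 minor units.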
The service does not own:
- Room state transitions themselves (`property-service` owns `Room.status`); we publish events that drive them.
- Folio postings for vendor invoices (`billing-service` owns posting; we publish a recorded-invoice event with line items).
- Lock device pairing or vendor credentials (`lock-integration-service` owns those; we only consume `health_alert.v1` and create a WO).
- Housekeeping turnover tasks (`housekeeping-service` owns those; we just emit `verified.v1` and they re-queue cleaning).
Key responsibilities
- Work order lifecycle: `open → assigned → in_progress → blocked → resolved → verified`, with first-class cancellation, re-assignment, and OCC-protected concurrency.
- Auto-creation paths: react to `housekeeping.room.maintenance_required.v1`, `lock.device.health_alert.v1`, `preventive.due.v1` (own subject), and `reservation.guest_complaint.recorded.v1` (Phase 2) by drafting work orders with appropriate severity/category and assignment hints.
- Severity-driven OOO: a WO with severity `high` or `critical` on a `room` asset publishes `work_order.room_blocked.v1`; `property-service` flips the room to `out_of_order`. On `verified.v1`, the block is lifted.
- Preventive schedules: recurring rules per asset class (e.g., "generator: every 250 run-hours OR every 90 days, whichever first"; "HVAC dust filter: every 30 days in `Asia/Kabul`"; "water tank: every 90 days"; "lock device battery: check every 30 days, replace at < 25%"). The scheduler emits `preventive.due.v1`, which we materialise as a draft WO.
- Vendor coordination: vendor records with optional email/phone; `notification-service` channel preference per vendor (`whatsapp | sms | email | call_only`); structured callback windows; manual-acknowledgement endpoint for staff to record "vendor confirmed by phone".
- Parts inventory (light): per-property part stock with `partNumber`, name, on-hand quantity, reorder threshold, last-purchased cost; `PartUsage` records consumption against a WO and emits `part_usage.recorded.v1`. Heavy inventory (rate, distributor, multi-warehouse) is out of scope.
- Cost tracking: labor (assignee × minutes × hourly rate) + parts (`PartUsage.cost`) + vendor invoice; rolled up onto the `WorkOrder`; emitted in `resolved.v1` for `analytics-service`.
- SLA tracking & auto-escalation: per category and severity, an `slaTargetMinutes` window. Breaches emit `sla_breached.v1` and trigger escalation per tenant policy (escalate to GM after N breaches, etc.).
- Asset registry: per property: HVAC units (with model, install date, last-service date, run-hours), generators (capacity, last-oil-change run-hours), water tanks (capacity, last-cleaned date), lock devices (mirrored from `lock-integration-service` for joins), linen sets, furniture lots. Each asset has a `healthIndex` (0–100) updated by AI on event signals.
- Re-accommodation handshake: when a `room` asset goes OOO and a confirmed reservation overlaps the OOO window, emit `work_order.relocation_required.v1` for `reservation-service` to fan into a `room_change` sub-saga.
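The "whichever first" preventive rule quoted above (for example, every 250 run-hours OR every 90 days) can be sketched as a simple predicate. This is a hedged illustration, not the scheduler's implementation: `PreventiveRule`, `AssetServiceState`, and `isPreventiveDue` are hypothetical names, and the day interval is measured as elapsed time rather than calendar days in the property's time zone.

```typescript
// Hypothetical rule and state shapes for illustration only.
interface PreventiveRule {
  everyRunHours?: number; // e.g. 250 for a generator
  everyDays?: number; // e.g. 90
}

interface AssetServiceState {
  lastServiceAt: Date;
  runHoursAtLastService: number;
  currentRunHours: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A schedule is due as soon as either threshold is reached.
function isPreventiveDue(
  rule: PreventiveRule,
  state: AssetServiceState,
  now: Date
): boolean {
  const runHoursSince = state.currentRunHours - state.runHoursAtLastService;
  const dueByRunHours =
    rule.everyRunHours !== undefined && runHoursSince >= rule.everyRunHours;

  const elapsedMs = now.getTime() - state.lastServiceAt.getTime();
  const dueByDays =
    rule.everyDays !== undefined && elapsedMs >= rule.everyDays * MS_PER_DAY;

  return dueByRunHours || dueByDays;
}
```

A rule with only one of the two fields set (such as the 90-day water-tank rule) falls out of the same predicate, since the missing threshold never fires.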
Aggregates owned
| Aggregate | Cardinality | Purpose | Identity prefix |
|---|---|---|---|
| `WorkOrder` | root, 1 per ticket | State machine, severity, category, asset link, cost rollup, SLA timers | `mnt_` |
| `MaintenanceTask` | 0..N per WO | Sub-step inside a WO (e.g., "drain tank", "refill", "test") used for multi-step jobs | `mtk_` |
| `PreventiveSchedule` | 1..N per Asset | Recurring rule producing draft WOs | `psch_` |
| `Asset` | 1..N per Property | Registered HVAC unit, generator, water tank, lock device, linen lot, furniture lot, etc. | `ast_` |
| `PartUsage` | 0..N per WO | Consumption row referencing a part stocked at the property | `pus_` |
| `Vendor` | 1..N per tenant | External contractor or supplier with channel preference | `vnd_` |
| `MaintenanceCategory` | 9 canonical entries (per-tenant override allowed) | plumbing, electrical, hvac, lock, generator, water, structural, it, other | `mcat_` |
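
The identity prefixes in the table make it possible to tell which aggregate an ID belongs to without a lookup. The sketch below is an assumption about how that could be used; `ID_PREFIXES` and `aggregateOf` are hypothetical names, and the format of the part after the prefix is not specified in this overview.

```typescript
// Prefixes copied from the aggregates table; the helper is hypothetical.
const ID_PREFIXES = {
  WorkOrder: "mnt_",
  MaintenanceTask: "mtk_",
  PreventiveSchedule: "psch_",
  Asset: "ast_",
  PartUsage: "pus_",
  Vendor: "vnd_",
  MaintenanceCategory: "mcat_",
} as const;

type AggregateName = keyof typeof ID_PREFIXES;

// Returns the aggregate an ID belongs to, or undefined when the prefix is
// unknown or nothing follows the prefix.
function aggregateOf(id: string): AggregateName | undefined {
  const names = Object.keys(ID_PREFIXES) as AggregateName[];
  for (const name of names) {
    const prefix: string = ID_PREFIXES[name];
    if (id.indexOf(prefix) === 0 && id.length > prefix.length) {
      return name;
    }
  }
  return undefined;
}
```

A guard like this can reject, for instance, a vendor ID passed where a work order ID is expected before any database round trip.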
Key APIs (REST, /api/v1/maintenance)
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/work-orders` | Create a work order (manual report or BFF-orchestrated auto path) |
| `GET` | `/work-orders/:id` | Read full work order |
| `GET` | `/work-orders` | List with filters (`status`, `severity`, `category`, `assetId`, `assigneeId`, `vendorId`, `propertyId`, `dateFrom`, `dateTo`) |
| `PATCH` | `/work-orders/:id` | Update narrow fields (description, severity, category) |
| `POST` | `/work-orders/:id/assign` | Assign to internal staff or vendor |
| `POST` | `/work-orders/:id/start` | Transition to `in_progress` |
| `POST` | `/work-orders/:id/block` | Mark blocked with reason and (optionally) ETA |
| `POST` | `/work-orders/:id/resolve` | Submit resolution + final cost lines + parts used |
| `POST` | `/work-orders/:id/verify` | GM/owner verifies the resolution; releases room OOO if applicable |
| `POST` | `/work-orders/:id/cancel` | Cancel with reason |
| `POST` | `/work-orders/:id/escalate` | Manual escalation hop |
| `POST` | `/work-orders/:id/parts-usage` | Append a `PartUsage` row |
| `POST` | `/work-orders/:id/vendor-acknowledged` | Staff records vendor verbal/SMS confirmation |
| `POST` | `/work-orders/:id/vendor-invoice` | Record vendor invoice (amount, due date, file ref) |
| `GET` / `POST` / `PATCH /:id` / `DELETE /:id` | `/preventive-schedules` | CRUD on schedules |
| `POST` | `/preventive-schedules/:id/trigger-now` | Manual fire (creates draft WO immediately) |
| `GET` / `POST` / `PATCH /:id` / `DELETE /:id` | `/assets` | Asset registry CRUD |
| `POST` | `/assets/:id/health-update` | Manual or sensor health data point |
| `GET` / `POST` / `PATCH /:id` / `DELETE /:id` | `/vendors` | Vendor CRUD |
| `GET` / `POST` / `PATCH /:id` | `/parts` | Light parts inventory |
Consumed by bff-backoffice-service (staff) and Pub/Sub push handlers for the auto-creation paths. Never called directly from bff-tenant-booking-service or bff-consumer-service.
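As an illustration of the list endpoint, the sketch below builds the request path for `GET /api/v1/maintenance/work-orders` from the filters named in the table. The filter names come from the table; the `WorkOrderListFilters` type, the `buildWorkOrderListPath` helper, and the assumption that filters travel as plain query-string parameters (with dates as ISO 8601 strings) are hypothetical.

```typescript
// Filter names follow the API table; value formats are assumptions.
interface WorkOrderListFilters {
  status?: string;
  severity?: string;
  category?: string;
  assetId?: string;
  assigneeId?: string;
  vendorId?: string;
  propertyId?: string;
  dateFrom?: string; // assumed ISO 8601
  dateTo?: string; // assumed ISO 8601
}

// Fixed key order keeps the generated path stable and cache-friendly.
const FILTER_KEYS: (keyof WorkOrderListFilters)[] = [
  "status", "severity", "category", "assetId", "assigneeId",
  "vendorId", "propertyId", "dateFrom", "dateTo",
];

const WORK_ORDERS_PATH = "/api/v1/maintenance/work-orders";

// Builds the path plus query string, skipping unset or empty filters.
function buildWorkOrderListPath(filters: WorkOrderListFilters): string {
  const pairs: string[] = [];
  for (const key of FILTER_KEYS) {
    const value = filters[key];
    if (value !== undefined && value !== "") {
      pairs.push(key + "=" + encodeURIComponent(value));
    }
  }
  return pairs.length > 0
    ? WORK_ORDERS_PATH + "?" + pairs.join("&")
    : WORK_ORDERS_PATH;
}
```

A caller such as `bff-backoffice-service` might pass `{ status: "open", severity: "high" }` to drive a "needs attention" board.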
Key events published
| Event | Trigger |
|---|---|
| `melmastoon.maintenance.work_order.created.v1` | A WO is created (manual, auto, or scheduled) |
| `melmastoon.maintenance.work_order.assigned.v1` | Assignee (staff or vendor) attached |
| `melmastoon.maintenance.work_order.started.v1` | `assigned → in_progress` |
| `melmastoon.maintenance.work_order.in_progress.v1` | Status note appended while in progress |
| `melmastoon.maintenance.work_order.blocked.v1` | Blocked with reason + optional ETA |
| `melmastoon.maintenance.work_order.resolved.v1` | Resolved with cost roll-up |
| `melmastoon.maintenance.work_order.verified.v1` | GM/owner sign-off; releases OOO; final |
| `melmastoon.maintenance.work_order.cancelled.v1` | Cancelled |
| `melmastoon.maintenance.work_order.escalated.v1` | Manual or auto escalation hop |
| `melmastoon.maintenance.work_order.sla_breached.v1` | SLA timer elapsed without target transition |
| `melmastoon.maintenance.work_order.room_blocked.v1` | Auto-OOO request to `property-service` |
| `melmastoon.maintenance.work_order.relocation_required.v1` | A confirmed reservation overlaps the OOO window |
| `melmastoon.maintenance.preventive.scheduled.v1` | New schedule created or modified |
| `melmastoon.maintenance.preventive.due.v1` | Schedule cron fired; draft WO materialised |
| `melmastoon.maintenance.preventive.completed.v1` | Resolved WO that originated from a schedule; updates next-due timestamp |
| `melmastoon.maintenance.asset.registered.v1` | Asset added |
| `melmastoon.maintenance.asset.health_changed.v1` | Health index changed (manual, AI, or event-driven) |
| `melmastoon.maintenance.vendor.assigned.v1` | A vendor was attached to a WO (subset of `assigned.v1`) |
| `melmastoon.maintenance.vendor.invoice_recorded.v1` | Vendor invoice recorded against a WO |
| `melmastoon.maintenance.part_usage.recorded.v1` | Part consumed against a WO |
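
The subject naming pattern in the table, together with the severity-driven OOO rule (a `high` or `critical` WO on a `room` asset publishes `room_blocked.v1`), can be sketched as below. This is a hedged illustration: `buildSubject`, `subjectsOnCreate`, and the `NewWorkOrder` shape are hypothetical, the `low` and `medium` severities are assumed (only `high` and `critical` are named in this overview), and whether `room_blocked.v1` is published at creation time or later is also an assumption.

```typescript
// "high" and "critical" come from the overview; "low"/"medium" are assumed.
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical minimal view of a freshly created work order.
interface NewWorkOrder {
  id: string;
  severity: Severity;
  assetKind: string; // e.g. "room", "generator"
}

const SUBJECT_PREFIX = "melmastoon.maintenance";

// Builds a subject such as melmastoon.maintenance.work_order.created.v1.
function buildSubject(entity: string, action: string, version: number): string {
  return SUBJECT_PREFIX + "." + entity + "." + action + ".v" + version;
}

// Lists the subjects to write to the outbox when a WO is created.
function subjectsOnCreate(wo: NewWorkOrder): string[] {
  const subjects = [buildSubject("work_order", "created", 1)];
  const severe = wo.severity === "high" || wo.severity === "critical";
  if (wo.assetKind === "room" && severe) {
    subjects.push(buildSubject("work_order", "room_blocked", 1));
  }
  return subjects;
}
```

Writing both subjects in the same outbox transaction as the WO row would keep the room block from being published without its originating work order.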
Key events consumed
| Event | Effect |
|---|---|
| `melmastoon.housekeeping.room.maintenance_required.v1` | Auto-create WO with `category=hvac |
| `melmastoon.lock_integration.device.health_alert.v1` | Auto-create WO in `category=lock`, asset linked to the lock device; severity per battery / online status |
| `melmastoon.property.room.taken_out_of_order.v1` | Find any active WOs on that room and link them so the OOO source-of-truth chain is intact |
| `melmastoon.staff.shift.started.v1` | Refresh in-memory roster of available technicians for assignment hints |
| `melmastoon.tenant.settings.changed.v1` | Refresh SLA targets, escalation rules, preventive defaults |
| `melmastoon.reservation.checked_in.v1` | Re-evaluate active WOs on the room/property for in-stay impact |
| `melmastoon.billing.vendor_invoice.posted.v1` | Mark our `Vendor.invoiceRecorded` as `posted_to_folio` for reconciliation |
Upstream / downstream
Upstream (we consume): housekeeping-service, lock-integration-service, property-service, staff-service, tenant-service, reservation-service (check-in re-evaluation; guest complaints in Phase 2), billing-service (vendor invoice posted to folio).
Downstream (we publish for): property-service (room OOO/OOS), reservation-service (relocation prompt), housekeeping-service (re-queue clean after verify), notification-service (vendor outbound, GM escalations), billing-service (vendor invoice posting), analytics-service, audit-service, search-aggregation-service (asset & vendor search), sync-service (desktop), bff-backoffice-service.
Non-functional requirements
| NFR | Target |
|---|---|
| WO create-to-assign p95 (auto-path) | < 5 s end-to-end including notification fan-out |
| WO list query p95 (filtered, < 5 k rows) | < 400 ms |
| Preventive scheduler lag | < 60 s beyond due time |
| SLA breach detection lag | < 60 s beyond breach moment |
| API availability | 99.9% monthly |
| Tenant isolation | RLS-enforced; tenant-isolation.spec.ts mandatory in CI |
| Sync footprint | All open WOs + last 30 days closed + asset registry replicated to desktop SQLite |
| Replicas | Min 2 Cloud Run instances; preventive cron as separate Cloud Run service every 60 s |
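
The SLA breach detection target above depends on a per-WO timer built from `slaTargetMinutes`. The following is a minimal sketch of the breach check only, assuming a timer that starts at some reference moment and is satisfied when the target transition happens; `SlaTimer`, `startedAt`, `targetReachedAt`, and `isSlaBreached` are hypothetical names, and which transition counts as the target per category and severity is tenant policy not shown here.

```typescript
// Hypothetical timer shape; only slaTargetMinutes is named in the overview.
interface SlaTimer {
  startedAt: Date;
  slaTargetMinutes: number;
  targetReachedAt?: Date; // set once the target transition has happened
}

const MS_PER_MINUTE = 60 * 1000;

// True when the deadline passed before the target transition occurred.
function isSlaBreached(timer: SlaTimer, now: Date): boolean {
  const deadline =
    timer.startedAt.getTime() + timer.slaTargetMinutes * MS_PER_MINUTE;
  // If the target was reached, judge by that moment; otherwise by "now".
  const effective = timer.targetReachedAt
    ? timer.targetReachedAt.getTime()
    : now.getTime();
  return effective > deadline;
}
```

A periodic sweep calling this check at least once a minute would be one way to stay within the 60 s detection lag, though the actual detection mechanism is not described in this overview.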
Where to go next
- Implementation-grade detail: `services/maintenance-service/SERVICE_OVERVIEW.md` and the rest of the 18-doc bundle.
- Room OOO interaction: `services/property-service/EVENT_SCHEMAS.md` (`room.taken_out_of_order.v1`) and `docs/04-event-driven-architecture.md` §housekeeping-maintenance choreography.
- Lock-device health alerts: `docs/09-lock-and-key-integration.md` §health monitoring.
- Conflict policy table for the desktop sync engine: `docs/02-enterprise-architecture.md` §8.2.
- Vendor SMS/WhatsApp bridge: `services/notification-service/API_CONTRACTS.md`.