maintenance-service

Bounded Context: Maintenance (Supporting) · Owner: PMS Operations · Phase: 0 (basic ticketing) → Phase 1 (preventive + assets) · Storage: Cloud SQL Postgres (shared schema + RLS) + transactional outbox · Bundle: services/maintenance-service/

maintenance-service owns work orders for everything a hotel breaks, wears out, or has to keep alive: HVAC compressors, plumbing, electrical, generators, water tanks, BLE/lock devices, linen, furniture, IT, and structural issues. It is the system of record for the WorkOrder lifecycle (open → assigned → in_progress → blocked → resolved → verified, plus terminal cancelled), the PreventiveSchedule cron (generator run-hour service, HVAC seasonal, water-tank cleaning, lock-battery checks), the Asset registry per property, light PartUsage tracking, and Vendor records (often phone-only, no email — many service providers in the operating markets reach the platform via WhatsApp/SMS bridges).
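
The lifecycle above can be sketched as a small transition table. The status names come from the text; the exact set of allowed edges (for example `blocked` returning to `in_progress`, the self-edge for re-assignment, or cancellation from any non-terminal state) is an assumption made for illustration, not the service's confirmed rules.

```typescript
// Hypothetical transition table for the WorkOrder lifecycle.
// Status names are from the document; the edges are assumptions.
type WorkOrderStatus =
  | "open"
  | "assigned"
  | "in_progress"
  | "blocked"
  | "resolved"
  | "verified"
  | "cancelled";

const TRANSITIONS: Record<WorkOrderStatus, WorkOrderStatus[]> = {
  open: ["assigned", "cancelled"],
  assigned: ["assigned", "in_progress", "cancelled"], // self-edge models re-assignment
  in_progress: ["blocked", "resolved", "cancelled"],
  blocked: ["in_progress", "resolved", "cancelled"],
  resolved: ["verified"],
  verified: [], // terminal
  cancelled: [], // terminal
};

function canTransition(from: WorkOrderStatus, to: WorkOrderStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the edge is not in the table.
function transition(from: WorkOrderStatus, to: WorkOrderStatus): WorkOrderStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal work order transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the edges in one table means the REST handlers (`/assign`, `/start`, `/block`, and so on) can share a single guard instead of each re-implementing its own precondition.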

It coordinates closely with housekeeping-service (housekeeping flags issues; maintenance owns them), property-service (rooms transitioned to OOO/OOS while a work order is active and severity warrants), lock-integration-service (auto-creates lock-maintenance work orders on device.health_alert.v1), and reservation-service (a high-severity WO on a room with an active reservation triggers a re-accommodation prompt).

Purpose

  • Be the single authoritative aggregate for every "something needs fixing here" event in the hotel — internal report, guest complaint, housekeeping flag, lock health alert, or scheduled preventive task.
  • Own the room-OOO contract: when severity warrants, a work order takes a room out of order (publishing melmastoon.maintenance.work_order.room_blocked.v1); property-service reacts and the BFF arrivals board reflects it. When the WO is verified, the room is returned to the housekeeping queue.
  • Own the preventive cadence for safety-critical assets — generator run-hours (load-shedding markets in Afghanistan/Pakistan border areas push generators hard; a missed oil change is a fire risk), HVAC seasonal filter swaps in dust-heavy regions, quarterly water-tank chlorination, lock-device battery monitoring.
  • Coordinate with vendors that may have no email — WhatsApp/SMS bridge via notification-service, callback windows captured as structured fields, and a manual-acknowledgement path so a vendor confirming "I'll be there at 5pm" via SMS lands as a structured event.
  • Track cost per work order (labor minutes × rate, parts consumed × cost, vendor invoice) so finance can roll up cost-per-room-night and cost-per-property in analytics-service.
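
The cost formula in the last bullet (labor minutes × rate, parts consumed × cost, vendor invoice) could be rolled up roughly as follows. This is a minimal sketch: the field names, the use of integer minor currency units, and per-line rounding are all assumptions, not the service's actual schema.

```typescript
// Hypothetical shapes; amounts are integers in minor currency units.
interface LaborLine {
  minutes: number;
  hourlyRateMinor: number; // rate per hour
}

interface PartLine {
  quantity: number;
  unitCostMinor: number;
}

interface WorkOrderCostInput {
  labor: LaborLine[];
  parts: PartLine[];
  vendorInvoiceMinor: number; // 0 when no vendor invoice was recorded
}

interface WorkOrderCost {
  laborMinor: number;
  partsMinor: number;
  vendorMinor: number;
  totalMinor: number;
}

function rollUpCost(input: WorkOrderCostInput): WorkOrderCost {
  // Labor: minutes x hourly rate, rounded per line (assumed rounding policy).
  const laborMinor = input.labor.reduce(
    (sum, l) => sum + Math.round((l.minutes * l.hourlyRateMinor) / 60),
    0
  );
  // Parts: quantity consumed x unit cost.
  const partsMinor = input.parts.reduce(
    (sum, p) => sum + p.quantity * p.unitCostMinor,
    0
  );
  const vendorMinor = input.vendorInvoiceMinor;
  return {
    laborMinor,
    partsMinor,
    vendorMinor,
    totalMinor: laborMinor + partsMinor + vendorMinor,
  };
}
```

For example, 90 minutes at 1200 per hour plus two parts at 450 each and a vendor invoice of 5000 rolls up to 1800 + 900 + 5000 = 7700.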

The service does not own:

  • Room state transitions themselves (property-service owns Room.status); we publish events that drive them.
  • Folio postings for vendor invoices (billing-service owns posting; we publish a recorded-invoice event with line items).
  • Lock device pairing or vendor credentials (lock-integration-service owns those; we only consume health_alert.v1 and create a WO).
  • Housekeeping turnover tasks (housekeeping-service owns those; we just emit verified.v1 and they re-queue cleaning).

Key responsibilities

  1. Work order lifecycle: open → assigned → in_progress → blocked → resolved → verified, with first-class cancellation, re-assignment, and OCC-protected concurrency.
  2. Auto-creation paths — react to housekeeping.room.maintenance_required.v1, lock.device.health_alert.v1, preventive.due.v1 (own subject), and reservation.guest_complaint.recorded.v1 (Phase 2) by drafting work orders with appropriate severity/category and assignment hints.
  3. Severity-driven OOO — a WO with severity high or critical on a room asset publishes work_order.room_blocked.v1; property-service flips the room to out_of_order. On verified.v1, the block is lifted.
  4. Preventive schedules — recurring rules per asset class (e.g., "generator: every 250 run-hours OR every 90 days, whichever first"; "HVAC dust filter: every 30 days in Asia/Kabul"; "water tank: every 90 days"; "lock device battery: check every 30 days, replace at < 25%"). The scheduler emits preventive.due.v1 which we materialise as a draft WO.
  5. Vendor coordination — vendor records with optional email/phone; notification-service channel preference per vendor (whatsapp | sms | email | call_only); structured callback windows; manual-acknowledgement endpoint for staff to record "vendor confirmed by phone".
  6. Parts inventory (light) — per-property part stock with partNumber, name, on-hand quantity, reorder threshold, last-purchased cost; PartUsage records consumption against a WO and emits part_usage.recorded.v1. Heavy inventory (rate, distributor, multi-warehouse) is out of scope.
  7. Cost tracking — labor (assignee × minutes × hourly rate) + parts (PartUsage.cost) + vendor invoice; rolled up onto the WorkOrder; emitted in resolved.v1 for analytics-service.
  8. SLA tracking & auto-escalation — per category and severity, an slaTargetMinutes window. Breaches emit sla_breached.v1 and trigger escalation per tenant policy (escalate to GM after N breaches, etc.).
  9. Asset registry — per property: HVAC units (with model, install date, last-service date, run-hours), generators (capacity, last-oil-change run-hours), water tanks (capacity, last-cleaned date), lock devices (mirrored from lock-integration-service for joins), linen sets, furniture lots. Each asset has a healthIndex (0–100) updated by AI on event signals.
  10. Re-accommodation handshake — when a room asset goes OOO and a confirmed reservation overlaps the OOO window, emit work_order.relocation_required.v1 for reservation-service to fan into a room_change sub-saga.
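
The "whichever first" rule from the preventive schedules item (for example, a generator serviced every 250 run-hours or every 90 days) can be illustrated with a due check like the one below. The interface and field names are hypothetical; only the thresholds in the usage example are taken from the text.

```typescript
// Hypothetical rule and asset-state shapes for a preventive schedule.
interface PreventiveRule {
  everyRunHours?: number; // e.g. 250 for a generator
  everyDays?: number; // e.g. 90
}

interface AssetServiceState {
  lastServiceAt: Date;
  lastServiceRunHours: number;
  currentRunHours: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A schedule is due as soon as either configured threshold is reached.
function isDue(rule: PreventiveRule, state: AssetServiceState, now: Date): boolean {
  const hoursDue =
    rule.everyRunHours !== undefined &&
    state.currentRunHours - state.lastServiceRunHours >= rule.everyRunHours;
  const daysDue =
    rule.everyDays !== undefined &&
    now.getTime() - state.lastServiceAt.getTime() >= rule.everyDays * MS_PER_DAY;
  return hoursDue || daysDue;
}
```

A generator last serviced at 1000 run-hours becomes due at 1250 run-hours even if only two weeks have passed, and becomes due after 90 days even if it has barely run.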

Aggregates owned

| Aggregate | Cardinality | Purpose | Identity prefix |
| --- | --- | --- | --- |
| WorkOrder | root, 1 per ticket | State machine, severity, category, asset link, cost rollup, SLA timers | mnt_ |
| MaintenanceTask | 0..N per WO | Sub-step inside a WO (e.g., "drain tank", "refill", "test") used for multi-step jobs | mtk_ |
| PreventiveSchedule | 1..N per Asset | Recurring rule producing draft WOs | psch_ |
| Asset | 1..N per Property | Registered HVAC unit, generator, water tank, lock device, linen lot, furniture lot, etc. | ast_ |
| PartUsage | 0..N per WO | Consumption row referencing a part stocked at the property | pus_ |
| Vendor | 1..N per tenant | External contractor or supplier with channel preference | vnd_ |
| MaintenanceCategory | 9 canonical entries (per tenant override allowed) | plumbing, electrical, hvac, lock, generator, water, structural, it, other | mcat_ |
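
One practical use of the identity prefixes is resolving which aggregate an opaque ID refers to, for instance when routing a search hit or validating a path parameter. The sketch below assumes IDs are the prefix followed by an arbitrary suffix; the suffix format is not specified in this document, and the function name is hypothetical.

```typescript
// Prefixes are taken from the table above; everything else is illustrative.
const ID_PREFIX = {
  WorkOrder: "mnt_",
  MaintenanceTask: "mtk_",
  PreventiveSchedule: "psch_",
  Asset: "ast_",
  PartUsage: "pus_",
  Vendor: "vnd_",
  MaintenanceCategory: "mcat_",
} as const;

type AggregateName = keyof typeof ID_PREFIX;

// Returns the aggregate whose prefix the ID starts with, or undefined.
function aggregateOf(id: string): AggregateName | undefined {
  const names = Object.keys(ID_PREFIX) as AggregateName[];
  for (const name of names) {
    if (id.indexOf(ID_PREFIX[name]) === 0) {
      return name;
    }
  }
  return undefined;
}
```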

Key APIs (REST, /api/v1/maintenance)

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /work-orders | Create a work order (manual report or BFF-orchestrated auto path) |
| GET | /work-orders/:id | Read full work order |
| GET | /work-orders | List with filters (status, severity, category, assetId, assigneeId, vendorId, propertyId, dateFrom, dateTo) |
| PATCH | /work-orders/:id | Update narrow fields (description, severity, category) |
| POST | /work-orders/:id/assign | Assign to internal staff or vendor |
| POST | /work-orders/:id/start | Transition to in_progress |
| POST | /work-orders/:id/block | Mark blocked with reason and (optionally) ETA |
| POST | /work-orders/:id/resolve | Submit resolution + final cost lines + parts used |
| POST | /work-orders/:id/verify | GM/owner verifies the resolution; releases room OOO if applicable |
| POST | /work-orders/:id/cancel | Cancel with reason |
| POST | /work-orders/:id/escalate | Manual escalation hop |
| POST | /work-orders/:id/parts-usage | Append a PartUsage row |
| POST | /work-orders/:id/vendor-acknowledged | Staff records vendor verbal/SMS confirmation |
| POST | /work-orders/:id/vendor-invoice | Record vendor invoice (amount, due date, file ref) |
| GET, POST, PATCH, DELETE | /preventive-schedules (GET, POST); /preventive-schedules/:id (PATCH, DELETE) | CRUD on schedules |
| POST | /preventive-schedules/:id/trigger-now | Manual fire (creates draft WO immediately) |
| GET, POST, PATCH, DELETE | /assets (GET, POST); /assets/:id (PATCH, DELETE) | Asset registry CRUD |
| POST | /assets/:id/health-update | Manual or sensor health data point |
| GET, POST, PATCH, DELETE | /vendors (GET, POST); /vendors/:id (PATCH, DELETE) | Vendor CRUD |
| GET, POST, PATCH | /parts (GET, POST); /parts/:id (PATCH) | Light parts inventory |

Consumed by bff-backoffice-service (staff) and Pub/Sub push handlers for the auto-creation paths. Never called directly from bff-tenant-booking-service or bff-consumer-service.
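
As an illustration of the list endpoint, a caller such as bff-backoffice-service might build the filtered path like this. The filter names and base path come from the table above; the value formats (single values, ISO dates) and the helper name are assumptions.

```typescript
// Filter keys are from the API table; value formats are assumed.
type ListFilterKey =
  | "status"
  | "severity"
  | "category"
  | "assetId"
  | "assigneeId"
  | "vendorId"
  | "propertyId"
  | "dateFrom"
  | "dateTo";

type ListFilters = Partial<Record<ListFilterKey, string>>;

const BASE_PATH = "/api/v1/maintenance";

// Builds the GET /work-orders path with URL-encoded query parameters.
function buildListWorkOrdersPath(filters: ListFilters): string {
  const keys = Object.keys(filters) as ListFilterKey[];
  const pairs: string[] = [];
  for (const key of keys) {
    const value = filters[key];
    if (value !== undefined && value !== "") {
      pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
    }
  }
  const query = pairs.length > 0 ? "?" + pairs.join("&") : "";
  return BASE_PATH + "/work-orders" + query;
}
```

Empty or missing filters are skipped, so calling the helper with no filters yields the bare collection path.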

Key events published

| Event | Trigger |
| --- | --- |
| melmastoon.maintenance.work_order.created.v1 | A WO is created (manual, auto, or scheduled) |
| melmastoon.maintenance.work_order.assigned.v1 | Assignee (staff or vendor) attached |
| melmastoon.maintenance.work_order.started.v1 | assigned → in_progress |
| melmastoon.maintenance.work_order.in_progress.v1 | Status note appended while in progress |
| melmastoon.maintenance.work_order.blocked.v1 | Blocked with reason + optional ETA |
| melmastoon.maintenance.work_order.resolved.v1 | Resolved with cost roll-up |
| melmastoon.maintenance.work_order.verified.v1 | GM/owner sign-off; releases OOO; final |
| melmastoon.maintenance.work_order.cancelled.v1 | Cancelled |
| melmastoon.maintenance.work_order.escalated.v1 | Manual or auto escalation hop |
| melmastoon.maintenance.work_order.sla_breached.v1 | SLA timer elapsed without target transition |
| melmastoon.maintenance.work_order.room_blocked.v1 | Auto-OOO request to property-service |
| melmastoon.maintenance.work_order.relocation_required.v1 | A confirmed reservation overlaps the OOO window |
| melmastoon.maintenance.preventive.scheduled.v1 | New schedule created or modified |
| melmastoon.maintenance.preventive.due.v1 | Schedule cron fired; draft WO materialised |
| melmastoon.maintenance.preventive.completed.v1 | Resolved WO that originated from a schedule; updates next-due timestamp |
| melmastoon.maintenance.asset.registered.v1 | Asset added |
| melmastoon.maintenance.asset.health_changed.v1 | Health index changed (manual, AI, or event-driven) |
| melmastoon.maintenance.vendor.assigned.v1 | A vendor was attached to a WO (subset of assigned.v1) |
| melmastoon.maintenance.vendor.invoice_recorded.v1 | Vendor invoice recorded against a WO |
| melmastoon.maintenance.part_usage.recorded.v1 | Part consumed against a WO |
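
To show how the severity-driven OOO rule maps onto these subjects, the sketch below picks the events to publish when a work order is created. The subjects and the "high or critical on a room" condition come from this document; the `low` and `medium` severity names, the field names, and the idea of deciding everything at creation time are assumptions.

```typescript
// "high" and "critical" are named in the document; "low"/"medium" are assumed.
type Severity = "low" | "medium" | "high" | "critical";

interface CreatedWorkOrder {
  id: string;
  severity: Severity;
  roomId?: string; // set when the affected asset is a room
  hasOverlappingConfirmedReservation: boolean;
}

const SUBJECT_PREFIX = "melmastoon.maintenance.work_order.";

// Returns the subjects to write to the outbox for a newly created WO.
function subjectsOnCreate(wo: CreatedWorkOrder): string[] {
  const subjects = [SUBJECT_PREFIX + "created.v1"];
  const blocksRoom =
    wo.roomId !== undefined &&
    (wo.severity === "high" || wo.severity === "critical");
  if (blocksRoom) {
    subjects.push(SUBJECT_PREFIX + "room_blocked.v1");
    if (wo.hasOverlappingConfirmedReservation) {
      subjects.push(SUBJECT_PREFIX + "relocation_required.v1");
    }
  }
  return subjects;
}
```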

Key events consumed

| Event | Effect |
| --- | --- |
| melmastoon.housekeeping.room.maintenance_required.v1 | Auto-create WO with `category=hvac |
| melmastoon.lock_integration.device.health_alert.v1 | Auto-create WO in category=lock, asset linked to the lock device; severity per battery / online status |
| melmastoon.property.room.taken_out_of_order.v1 | Find any active WOs on that room and link them so the OOO source-of-truth chain is intact |
| melmastoon.staff.shift.started.v1 | Refresh in-memory roster of available technicians for assignment hints |
| melmastoon.tenant.settings.changed.v1 | Refresh SLA targets, escalation rules, preventive defaults |
| melmastoon.reservation.checked_in.v1 | Re-evaluate active WOs on the room/property for in-stay impact |
| melmastoon.billing.vendor_invoice.posted.v1 | Mark our Vendor.invoiceRecorded as posted_to_folio for reconciliation |
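
The lock health alert path ("severity per battery / online status") might translate an incoming alert into a draft work order along these lines. The payload fields and the severity mapping are hypothetical; only `category=lock` and the 25% battery replacement threshold appear in this document.

```typescript
// Hypothetical payload of device.health_alert.v1; real fields may differ.
interface LockHealthAlert {
  deviceId: string;
  batteryPercent: number;
  online: boolean;
}

interface DraftWorkOrder {
  category: "lock";
  assetId: string;
  severity: "low" | "medium" | "high";
  description: string;
}

const BATTERY_REPLACE_BELOW = 25; // replacement threshold from the preventive rules

// Assumed mapping: offline is worst, then low battery, otherwise low severity.
function draftFromLockAlert(alert: LockHealthAlert): DraftWorkOrder {
  if (!alert.online) {
    return {
      category: "lock",
      assetId: alert.deviceId,
      severity: "high",
      description: "Lock device offline",
    };
  }
  if (alert.batteryPercent < BATTERY_REPLACE_BELOW) {
    return {
      category: "lock",
      assetId: alert.deviceId,
      severity: "medium",
      description: `Lock battery at ${alert.batteryPercent}%, replace`,
    };
  }
  return {
    category: "lock",
    assetId: alert.deviceId,
    severity: "low",
    description: "Lock health alert, inspect device",
  };
}
```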

Upstream / downstream

Upstream (we consume): housekeeping-service, lock-integration-service, property-service, staff-service, tenant-service, reservation-service (check-in; Phase 2 guest complaint), billing-service (vendor invoice posted).

Downstream (we publish for): property-service (room OOO/OOS), reservation-service (relocation prompt), housekeeping-service (re-queue clean after verify), notification-service (vendor outbound, GM escalations), billing-service (vendor invoice posting), analytics-service, audit-service, search-aggregation-service (asset & vendor search), sync-service (desktop), bff-backoffice-service.

Non-functional requirements

| NFR | Target |
| --- | --- |
| WO create-to-assign p95 (auto-path) | < 5 s end-to-end including notification fan-out |
| WO list query p95 (filtered, < 5 k rows) | < 400 ms |
| Preventive scheduler lag | < 60 s beyond due time |
| SLA breach detection lag | < 60 s beyond breach moment |
| API availability | 99.9% monthly |
| Tenant isolation | RLS-enforced; tenant-isolation.spec.ts mandatory in CI |
| Sync footprint | All open WOs + last 30 days closed + asset registry replicated to desktop SQLite |
| Replicas | Min 2 Cloud Run instances; preventive cron runs as a separate Cloud Run service every 60 s |
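
The SLA breach detection target can be met by a periodic sweep over open timers, comparing elapsed time against each work order's `slaTargetMinutes`. The following is a minimal sketch of such a sweep; the timer shape and function name are assumptions, and a real implementation would presumably query Postgres rather than filter an in-memory array.

```typescript
// Hypothetical SLA timer row; only slaTargetMinutes is named in the document.
interface SlaTimer {
  workOrderId: string;
  startedAt: Date;
  slaTargetMinutes: number;
  targetReachedAt?: Date; // set once the target transition happened
}

const MS_PER_MINUTE = 60 * 1000;

// Returns IDs of work orders whose SLA window elapsed without the target transition.
function findBreaches(timers: SlaTimer[], now: Date): string[] {
  return timers
    .filter(
      (t) =>
        t.targetReachedAt === undefined &&
        now.getTime() - t.startedAt.getTime() > t.slaTargetMinutes * MS_PER_MINUTE
    )
    .map((t) => t.workOrderId);
}
```

Running the sweep at least once a minute keeps detection lag within the 60 s target, provided each run finishes well inside that interval.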

Where to go next