feat: Uptime Monitoring Alarm Integration #328

dagangtj wants to merge 2 commits into databuddy-analytics:main
Conversation
- Add `alarms` and `alarm_trigger_history` tables to database schema
- Implement alarm trigger logic in uptime service
- Add state tracking for consecutive failures and status changes
- Integrate with `@databuddy/notifications` package for Slack/Discord
- Support configurable thresholds and notification channels
- Log alarm trigger history for audit trail
- Prevent duplicate notifications with smart state tracking

Closes databuddy-analytics#268
@dagangtj is attempting to deploy a commit to the Databuddy OSS Team on Vercel. A member of the Team first needs to authorize it.
Important: Review skipped. Auto reviews are disabled on this repository; check the settings in the CodeRabbit UI.
Greptile Summary

This PR implements uptime monitoring alarm integration, adding two new database tables (`alarms` and `alarm_trigger_history`).

Key changes:
- New `alarms` and `alarm_trigger_history` tables in the database schema
- Alarm trigger logic and consecutive-failure state tracking in the uptime service
- Slack/Discord notifications via `@databuddy/notifications`
Issues found:
- `downtimeDuration` always calculates as zero
- Consecutive-failures threshold fires on every poll once exceeded (notification spam)
- `Promise<unknown>[]` violates the project style guide
- In-memory state is lost on service restart
- `alarm_trigger_history.websiteId` lacks a foreign key constraint
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant QStash
    participant index.ts
    participant StateTracker
    participant alarms.ts
    participant DB
    participant Slack/Discord
    QStash->>index.ts: POST / (uptime check trigger)
    index.ts->>index.ts: lookupSchedule(scheduleId)
    index.ts->>index.ts: checkUptime(url)
    index.ts->>index.ts: sendUptimeEvent(result)
    alt websiteId present
        index.ts->>StateTracker: updateState(monitorId, status, timestamp)
        StateTracker-->>index.ts: { previousStatus, consecutiveFailures, downtimeDuration }
        index.ts->>alarms.ts: processUptimeAlarms(context)
        alarms.ts->>DB: getWebsiteAlarms(websiteId)
        DB-->>alarms.ts: alarms[]
        loop for each alarm
            alarms.ts->>alarms.ts: shouldTriggerAlarm(alarm, context)
            alt alarm triggers
                alarms.ts->>alarms.ts: buildDownNotification / buildUpNotification
                alarms.ts->>Slack/Discord: sendNotification (Promise.allSettled)
                alarms.ts->>DB: logAlarmTrigger (alarm_trigger_history)
            end
        end
    end
    index.ts-->>QStash: 200 OK
```
Last reviewed commit: 61fa51d
apps/uptime/src/state-tracker.ts (Outdated)
```typescript
if (statusChanged) {
  // Status changed
  consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
  lastStatusChange = timestamp;
} else if (currentStatus === MonitorStatus.DOWN) {
  // Still down, increment failures
  consecutiveFailures += 1;
} else {
  // Still up, reset failures
  consecutiveFailures = 0;
}

const downtimeDuration =
  statusChanged && currentStatus === MonitorStatus.UP
    ? timestamp - lastStatusChange
    : undefined;
```
downtimeDuration always calculates as zero
lastStatusChange is overwritten to timestamp inside the statusChanged block on line 56 — before being used in the downtime duration calculation on line 65. This means timestamp - lastStatusChange is always 0 when the site recovers, and the recovery notification will always show "0 minutes" for the downtime duration.
The original lastStatusChange value (from existing.lastStatusChange) must be captured before the local variable is reassigned:
Before:

```typescript
if (statusChanged) {
  // Status changed
  consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
  lastStatusChange = timestamp;
} else if (currentStatus === MonitorStatus.DOWN) {
  // Still down, increment failures
  consecutiveFailures += 1;
} else {
  // Still up, reset failures
  consecutiveFailures = 0;
}

const downtimeDuration =
  statusChanged && currentStatus === MonitorStatus.UP
    ? timestamp - lastStatusChange
    : undefined;
```

Suggested change:

```typescript
const previousLastStatusChange = existing.lastStatusChange;

if (statusChanged) {
  // Status changed
  consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
  lastStatusChange = timestamp;
} else if (currentStatus === MonitorStatus.DOWN) {
  // Still down, increment failures
  consecutiveFailures += 1;
} else {
  // Still up, reset failures
  consecutiveFailures = 0;
}

const downtimeDuration =
  statusChanged && currentStatus === MonitorStatus.UP
    ? timestamp - previousLastStatusChange
    : undefined;
```
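The corrected ordering can also be shown as a minimal, self-contained sketch. The `MonitorStatus` and `MonitorState` names follow the identifiers in the diff; the function shape and return type are illustrative, not the PR's exact implementation:

```typescript
// Minimal sketch of the corrected updateState ordering: capture the
// previous lastStatusChange before it is overwritten, so a recovery
// reports the real downtime instead of 0.
enum MonitorStatus {
  UP = "UP",
  DOWN = "DOWN",
}

interface MonitorState {
  status: MonitorStatus;
  consecutiveFailures: number;
  lastStatusChange: number;
}

function updateState(
  existing: MonitorState,
  currentStatus: MonitorStatus,
  timestamp: number
): { state: MonitorState; downtimeDuration?: number } {
  const statusChanged = existing.status !== currentStatus;
  // Capture BEFORE the local value is reassigned below.
  const previousLastStatusChange = existing.lastStatusChange;

  let consecutiveFailures = existing.consecutiveFailures;
  let lastStatusChange = existing.lastStatusChange;

  if (statusChanged) {
    consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
    lastStatusChange = timestamp;
  } else if (currentStatus === MonitorStatus.DOWN) {
    consecutiveFailures += 1;
  } else {
    consecutiveFailures = 0;
  }

  // Uses the captured value, so a DOWN -> UP transition yields the
  // actual elapsed downtime.
  const downtimeDuration =
    statusChanged && currentStatus === MonitorStatus.UP
      ? timestamp - previousLastStatusChange
      : undefined;

  return {
    state: { status: currentStatus, consecutiveFailures, lastStatusChange },
    downtimeDuration,
  };
}
```

With this ordering, a monitor that went down at t=1000 and recovers at t=61000 reports a 60000 ms downtime rather than 0.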
```typescript
// Check consecutive failures threshold
if (
  conditions?.consecutiveFailuresThreshold &&
  context.consecutiveFailures
) {
  if (context.consecutiveFailures >= conditions.consecutiveFailuresThreshold) {
    return true;
  }
}
```
Consecutive-failures threshold causes spam notifications
Once consecutiveFailures >= consecutiveFailuresThreshold, this condition is true on every subsequent check for as long as the site stays down. There is no "already triggered" guard for this path, so every uptime poll after the threshold is exceeded will fire a new notification — directly contradicting the "Duplicate Prevention" claim in the PR description.
The status-change path is fine (it only fires when previousStatus !== currentStatus), but this threshold path needs its own deduplication. One approach is to record a lastThresholdAlertAt (or a flag like thresholdAlerted: boolean) in MonitorState and reset it only when the site recovers. For example, in StateTracker:
```typescript
// In MonitorState
thresholdAlerted: boolean;

// In updateState – reset on recovery
if (statusChanged && currentStatus === MonitorStatus.UP) {
  thresholdAlerted = false;
}
```

Then gate the threshold check in shouldTriggerAlarm so it only fires once per downtime episode.
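A runnable sketch of that gating, assuming the `thresholdAlerted` flag described above is added to the monitor state (the helper names here are hypothetical, not from the PR):

```typescript
// Sketch: fire a threshold alarm at most once per downtime episode.
// `thresholdAlerted` is the assumed new field; the helper functions
// are illustrative names, not part of the PR.
interface ThresholdState {
  consecutiveFailures: number;
  thresholdAlerted: boolean;
}

function shouldFireThresholdAlarm(
  state: ThresholdState,
  threshold: number
): boolean {
  // Fire only the first time the threshold is crossed.
  if (state.consecutiveFailures >= threshold && !state.thresholdAlerted) {
    state.thresholdAlerted = true; // remember we already alerted
    return true;
  }
  return false;
}

// Called when the monitor recovers: arm the threshold alarm again.
function resetOnRecovery(state: ThresholdState): void {
  state.thresholdAlerted = false;
  state.consecutiveFailures = 0;
}
```

The flag makes the threshold path idempotent within one outage, matching the status-change path's behavior of firing only on transitions.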
apps/uptime/src/alarms.ts (Outdated)
```typescript
const channels = alarm.notificationChannels;
const promises: Promise<unknown>[] = [];
```
unknown type violates project style guide
The project's style guide explicitly prohibits using any, unknown, or never — prefer an explicit type. Since all notification helpers return void (they're fire-and-forget), Promise<void> is more accurate here.
```diff
  const channels = alarm.notificationChannels;
- const promises: Promise<unknown>[] = [];
+ const promises: Promise<void>[] = [];
```
```typescript
class StateTracker {
  private states = new Map<string, MonitorState>();
  // ...
}
```
In-memory state is lost on service restart
StateTracker stores all monitor state in a process-local Map. Any restart (deploy, crash, scale-down) wipes the entire state. After a restart:

- A site that is already down will be treated as a first-ever check (previousStatus: undefined), so the shouldTriggerAlarm status-change condition (previousStatus && status !== previousStatus) will be false and the alarm won't fire until the next check.
- The consecutiveFailures counter resets to 1, meaning a site that was already many failures into an outage will briefly stop triggering threshold-based alarms.
This is a known trade-off for in-memory state, but it should be documented in the class or flagged as a known limitation, especially since the PR advertises duplicate-prevention semantics that are violated after restarts.
If the service runs multiple replicas, each instance has a completely independent state — the same event may fire an alarm on one instance but be silently ignored by another, or fire redundantly on all.
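If persistence is added later (the author's reply below floats Redis as a follow-up), one low-friction path is to put the Map behind a small async interface so a shared store can be swapped in without touching the tracker logic. This is a hypothetical sketch; neither `StateStore` nor `InMemoryStore` exists in the PR:

```typescript
// Hypothetical sketch: abstract monitor-state storage behind an async
// interface so the in-memory Map can later be replaced by a shared
// store (e.g. Redis) shared across replicas. Names are illustrative.
interface StateStore<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

class InMemoryStore<T> implements StateStore<T> {
  private map = new Map<string, T>();

  async get(key: string): Promise<T | undefined> {
    return this.map.get(key);
  }

  async set(key: string, value: T): Promise<void> {
    this.map.set(key, value);
  }
}
```

The async signatures cost nothing for the in-memory case but mean a Redis-backed implementation can be dropped in later without changing callers.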
Additional Comments (1)

The alarm_trigger_history table has no foreign key constraint on websiteId, so history rows can be orphaned when a website is deleted. Consider adding:

```typescript
foreignKey({
  columns: [table.websiteId],
  foreignColumns: [websites.id],
  name: "alarm_trigger_history_website_id_fkey",
}).onDelete("set null"),
```
- Fix downtimeDuration calculation bug in state-tracker.ts (was always 0)
- Add deduplication for consecutive failure threshold alerts
- Add missing foreign key constraint on alarm_trigger_history.website_id with CASCADE DELETE
- Fix Promise type annotation in alarms.ts (Promise<void>[] instead of Promise<unknown>[])
Greptile Review Issues Fixed

Thanks for the detailed review! I've addressed all the issues mentioned:

1. ✅ Fixed `downtimeDuration` calculation bug (Critical)

Problem: `lastStatusChange` was being updated before calculating `downtimeDuration`, causing recovery notifications to always report 0 minutes of downtime.

Fix: Moved the duration calculation to happen BEFORE updating `lastStatusChange` in `state-tracker.ts` (lines 51-54).

2. ✅ Added deduplication for consecutive failure threshold (Significant)

Problem: Once the consecutive-failures count reached the threshold, the alarm would fire on every poll, causing notification spam.

Fix: Added a Map to track which alarms have already fired at specific failure counts. The alarm now only triggers once when the threshold is first reached, and resets on status change (lines 38, 75-90).

3. ✅ Added missing foreign key constraint (Notable)

Problem: `alarm_trigger_history.website_id` had no FK constraint, allowing orphaned history rows when websites are deleted.

Fix: Added a foreign key constraint with CASCADE DELETE to `websites.id` in `schema.ts` (lines 1074-1078).

4. ✅ Fixed type annotation

Changed `Promise<unknown>[]` to `Promise<void>[]` in `alarms.ts` to match the project style guide.

Note on in-memory state: The in-memory design is intentional for this initial implementation. For production deployments with multiple replicas, we can add Redis/database persistence in a follow-up PR if needed.

All fixes are now pushed to this branch.
Closes #268
Summary
Implements uptime monitoring alarm integration as requested in bounty #268.
Changes
Database Schema
- `alarms` table with support for multiple notification channels (Slack, Discord, Email, Webhook)
- `alarm_trigger_history` table for audit trail

Uptime Service Integration
Notification System
- `@databuddy/notifications` package for Slack and Discord webhooks

Features Implemented
✅ Uptime trigger integration (down/up events)
✅ Consecutive failures threshold support
✅ Response time threshold support (optional/stretch)
✅ Alarm assignment to websites via websiteId
✅ Duplicate notification prevention
✅ Alarm trigger history logging
✅ Multi-channel support (Slack, Discord ready; Email/Webhook structure in place)
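As an illustration of how these features fit together, an alarm record might look like the following. Only `notificationChannels` and `consecutiveFailuresThreshold` appear in the diffs above; every other field name here is an assumption:

```typescript
// Hypothetical shape of an alarm configuration. Only
// `notificationChannels` and `conditions.consecutiveFailuresThreshold`
// appear in the PR diffs; the remaining fields are illustrative.
type NotificationChannel = "slack" | "discord" | "email" | "webhook";

interface AlarmConfig {
  websiteId: string;
  notificationChannels: NotificationChannel[];
  conditions: {
    consecutiveFailuresThreshold?: number;
    responseTimeThresholdMs?: number; // optional/stretch, per the feature list
  };
}

const exampleAlarm: AlarmConfig = {
  websiteId: "site_123",
  notificationChannels: ["slack", "discord"],
  conditions: { consecutiveFailuresThreshold: 3 },
};
```

With this shape, an alarm fires after three consecutive failed checks and notifies both Slack and Discord.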
Technical Details
- `apps/uptime/`
- `@databuddy/notifications` package helpers

Files Changed

- `packages/db/src/drizzle/schema.ts` - Database schema for alarms
- `apps/uptime/src/alarms.ts` - Alarm trigger and notification logic
- `apps/uptime/src/state-tracker.ts` - State tracking for consecutive failures
- `apps/uptime/src/index.ts` - Integration into uptime service

Testing Notes
- Alarm processing runs only when `websiteId` is present in the schedule

Next Steps (UI - Not in Scope)
This PR provides the backend foundation. UI implementation would include:
Dependencies
Notes