feat: Uptime Monitoring Alarm Integration#328

Open
dagangtj wants to merge 2 commits into databuddy-analytics:main from dagangtj:feat/uptime-alarm-integration

Conversation

@dagangtj

Closes #268

Summary

Implements uptime monitoring alarm integration as requested in bounty #268.

Changes

Database Schema

  • Added alarms table with support for multiple notification channels (Slack, Discord, Email, Webhook)
  • Added alarm_trigger_history table for audit trail
  • Proper indexes on user_id, organization_id, website_id, and enabled fields
  • Foreign key constraints with cascade delete
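As a rough illustration, rows from the two tables could be modeled with shapes like these. This is a hedged sketch inferred from the bullets above, not the PR's actual Drizzle schema; column and field names are assumptions.

```typescript
// Illustrative row shapes only; the real definitions live in
// packages/db/src/drizzle/schema.ts and may differ.

type NotificationChannel = "slack" | "discord" | "email" | "webhook";

interface AlarmRow {
  id: string;
  userId: string;                // indexed
  organizationId: string | null; // indexed
  websiteId: string;             // indexed, FK -> websites.id (ON DELETE CASCADE)
  enabled: boolean;              // indexed
  notificationChannels: { type: NotificationChannel; target: string }[];
  conditions: {
    consecutiveFailuresThreshold?: number;
    responseTimeThresholdMs?: number;
  };
}

interface AlarmTriggerHistoryRow {
  id: string;
  alarmId: string; // FK -> alarms.id
  websiteId: string;
  event: "down" | "up";
  triggeredAt: Date;
}

// Example row, as the uptime service might read it:
const exampleAlarm: AlarmRow = {
  id: "alarm_1",
  userId: "user_1",
  organizationId: null,
  websiteId: "site_1",
  enabled: true,
  notificationChannels: [{ type: "slack", target: "https://hooks.example.com/x" }],
  conditions: { consecutiveFailuresThreshold: 3 },
};
```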

Uptime Service Integration

  • State Tracking: Implemented in-memory state tracker to monitor consecutive failures and status changes
  • Alarm Processing: Integrated alarm trigger logic into uptime check workflow
  • Smart Notifications: Only triggers on status changes (up ↔ down) or threshold breaches
  • Duplicate Prevention: State tracker prevents spam notifications
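The state-tracking behaviour described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation; the type and method names are assumptions.

```typescript
// Minimal sketch of the in-memory state tracker described above.
// Type and method names are assumptions, not the PR's exact API.

type Status = "up" | "down";

interface MonitorState {
  status: Status;
  consecutiveFailures: number;
  lastStatusChange: number; // epoch ms of the last up <-> down transition
}

interface StateUpdate {
  previousStatus?: Status;
  consecutiveFailures: number;
  downtimeDuration?: number; // set only when the monitor just recovered
}

class StateTracker {
  private states = new Map<string, MonitorState>();

  updateState(monitorId: string, status: Status, timestamp: number): StateUpdate {
    const prev = this.states.get(monitorId);
    const statusChanged = prev !== undefined && prev.status !== status;

    // Compute the downtime from the PREVIOUS transition time, before it
    // is overwritten below; otherwise recovery always reports 0 ms.
    const downtimeDuration =
      statusChanged && status === "up" && prev
        ? timestamp - prev.lastStatusChange
        : undefined;

    let consecutiveFailures = 0;
    if (status === "down") {
      consecutiveFailures = prev && !statusChanged ? prev.consecutiveFailures + 1 : 1;
    }

    let lastStatusChange = timestamp;
    if (prev && !statusChanged) {
      lastStatusChange = prev.lastStatusChange;
    }

    this.states.set(monitorId, { status, consecutiveFailures, lastStatusChange });
    return { previousStatus: prev?.status, consecutiveFailures, downtimeDuration };
  }
}
```

Status-change detection (`previousStatus !== status`) is what lets callers trigger only on up/down transitions rather than on every poll.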

Notification System

  • Uses @databuddy/notifications package for Slack and Discord webhooks
  • Down Notification: Includes URL, HTTP status, downtime start, consecutive failures, error details
  • Up Notification: Includes URL, recovery time, downtime duration, response time
  • Proper error handling - notification failures don't crash uptime service

Features Implemented

✅ Uptime trigger integration (down/up events)
✅ Consecutive failures threshold support
✅ Response time threshold support (optional/stretch)
✅ Alarm assignment to websites via websiteId
✅ Duplicate notification prevention
✅ Alarm trigger history logging
✅ Multi-channel support (Slack, Discord ready; Email/Webhook structure in place)

Technical Details

  • Follows existing codebase patterns in apps/uptime/
  • Uses @databuddy/notifications package helpers
  • TypeScript with strict types
  • Proper error handling with tracing integration
  • Non-blocking alarm processing (failures logged but don't affect uptime checks)

Files Changed

  • packages/db/src/drizzle/schema.ts - Database schema for alarms
  • apps/uptime/src/alarms.ts - Alarm trigger and notification logic
  • apps/uptime/src/state-tracker.ts - State tracking for consecutive failures
  • apps/uptime/src/index.ts - Integration into uptime service

Testing Notes

  • Alarm processing only runs when websiteId is present in schedule
  • State tracker maintains in-memory state for each monitor
  • Notifications sent via Promise.allSettled (failures don't block)
  • All errors captured via tracing system
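The non-blocking dispatch pattern described above can be sketched like this. `sendNotification` is a hypothetical stand-in for the real `@databuddy/notifications` helpers, which this sketch does not reproduce.

```typescript
// Sketch of non-blocking notification dispatch via Promise.allSettled.
// sendNotification is a hypothetical stand-in for the real webhook helpers.

interface Channel {
  type: "slack" | "discord";
  webhookUrl: string;
}

async function sendNotification(channel: Channel, message: string): Promise<void> {
  // Stand-in: a real implementation would POST `message` to channel.webhookUrl.
  if (!channel.webhookUrl) {
    throw new Error(`missing webhook URL for ${channel.type}: ${message}`);
  }
}

// Returns how many channels were notified; rejections are logged, never
// rethrown, so a failed notification cannot break the surrounding uptime check.
async function dispatchAll(channels: Channel[], message: string): Promise<number> {
  const results = await Promise.allSettled(
    channels.map((channel) => sendNotification(channel, message))
  );
  for (const result of results) {
    if (result.status === "rejected") {
      console.error("notification failed:", result.reason);
    }
  }
  return results.filter((result) => result.status === "fulfilled").length;
}
```

Unlike `Promise.all`, `Promise.allSettled` never rejects, so one failing channel does not suppress delivery to the others.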

Next Steps (UI - Not in Scope)

This PR provides the backend foundation. UI implementation would include:

  • Alarm management page in dashboard settings
  • Alarm assignment UI in uptime monitoring section
  • Test notification button
  • Alarm trigger history view

Dependencies

Notes

  • Email and custom webhook providers are structured but not fully implemented (awaiting email service configuration)
  • Focused on core uptime alarm functionality as requested in bounty

- Add alarms and alarm_trigger_history tables to database schema
- Implement alarm trigger logic in uptime service
- Add state tracking for consecutive failures and status changes
- Integrate with @databuddy/notifications package for Slack/Discord
- Support configurable thresholds and notification channels
- Log alarm trigger history for audit trail
- Prevent duplicate notifications with smart state tracking

Closes databuddy-analytics#268
@vercel

vercel bot commented Feb 26, 2026

@dagangtj is attempting to deploy a commit to the Databuddy OSS Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@dosubot

dosubot bot commented Feb 26, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.


@greptile-apps

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR implements uptime monitoring alarm integration, adding two new database tables (alarms, alarm_trigger_history), an in-memory state tracker for consecutive failure counting, and alarm/notification dispatch logic wired into the existing uptime check flow. The overall architecture is sound and follows codebase conventions, but there are two functional bugs that need to be fixed before merging.

Key changes:

  • packages/db/src/drizzle/schema.ts — New alarms and alarm_trigger_history tables with indexes and FK constraints (missing FK on alarm_trigger_history.website_id)
  • apps/uptime/src/state-tracker.ts — In-memory singleton tracking status changes and consecutive failure counts; contains a critical bug where downtimeDuration always evaluates to 0
  • apps/uptime/src/alarms.ts — Alarm query, condition evaluation, notification dispatch, and history logging; the consecutive-failures threshold path has no deduplication and will spam notifications once exceeded
  • apps/uptime/src/index.ts — Clean integration of state tracker + alarm processing as a non-blocking side-effect of each uptime check

Issues found:

  • Critical: downtimeDuration in state-tracker.ts always calculates as 0 because lastStatusChange is overwritten to timestamp before the duration formula runs, making every recovery notification report "0 minutes" of downtime
  • Significant: The consecutive-failures threshold in shouldTriggerAlarm fires on every poll once consecutiveFailures >= threshold with no "already alerted" guard, causing notification spam — contradicting the stated duplicate-prevention behaviour
  • Notable: The in-memory StateTracker loses all state on service restart or when multiple replicas run, which can cause missed or duplicate alarms; this should at minimum be documented
  • No database migration file was included alongside the schema changes

Confidence Score: 2/5

  • Not safe to merge — a critical calculation bug means recovery notifications always report 0 minutes of downtime, and the threshold path will spam users once triggered.
  • Two functional bugs require fixes before this is production-ready: the downtimeDuration always-zero bug makes recovery notifications misleading, and the missing deduplication on the consecutive-failures threshold directly contradicts the advertised spam-prevention feature. The integration layer itself is clean and non-breaking, but the core alarm logic cannot ship as-is.
  • apps/uptime/src/state-tracker.ts (critical downtimeDuration bug) and apps/uptime/src/alarms.ts (threshold deduplication gap) need the most attention.

Important Files Changed

Filename Overview
apps/uptime/src/state-tracker.ts Contains a critical bug where lastStatusChange is overwritten before being used to compute downtimeDuration, causing the recovery notification to always report 0 minutes of downtime. Also stores all state in-memory with no persistence, losing context on restart or across replicas.
apps/uptime/src/alarms.ts Alarm processing and notification dispatch logic is generally sound, but the consecutive-failures threshold path has no deduplication guard, meaning an alarm fires on every poll once the threshold is exceeded — contradicting the stated spam-prevention behaviour. Minor: uses Promise<unknown>[] in violation of the project style guide.
packages/db/src/drizzle/schema.ts New alarms and alarm_trigger_history tables added with proper indexes and most foreign keys; alarm_trigger_history.website_id is missing a FK constraint. No accompanying migration file was added.
apps/uptime/src/index.ts Integration of state tracker and alarm processing is clean and non-blocking; errors are captured and logged without affecting the primary uptime check flow.

Sequence Diagram

sequenceDiagram
    participant QStash
    participant index.ts
    participant StateTracker
    participant alarms.ts
    participant DB
    participant Slack/Discord

    QStash->>index.ts: POST / (uptime check trigger)
    index.ts->>index.ts: lookupSchedule(scheduleId)
    index.ts->>index.ts: checkUptime(url)
    index.ts->>index.ts: sendUptimeEvent(result)

    alt websiteId present
        index.ts->>StateTracker: updateState(monitorId, status, timestamp)
        StateTracker-->>index.ts: { previousStatus, consecutiveFailures, downtimeDuration }

        index.ts->>alarms.ts: processUptimeAlarms(context)
        alarms.ts->>DB: getWebsiteAlarms(websiteId)
        DB-->>alarms.ts: alarms[]

        loop for each alarm
            alarms.ts->>alarms.ts: shouldTriggerAlarm(alarm, context)
            alt alarm triggers
                alarms.ts->>alarms.ts: buildDownNotification / buildUpNotification
                alarms.ts->>Slack/Discord: sendNotification (Promise.allSettled)
                alarms.ts->>DB: logAlarmTrigger (alarm_trigger_history)
            end
        end
    end

    index.ts-->>QStash: 200 OK

Last reviewed commit: 61fa51d


@greptile-apps greptile-apps bot left a comment


4 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 53 to 68

    if (statusChanged) {
      // Status changed
      consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
      lastStatusChange = timestamp;
    } else if (currentStatus === MonitorStatus.DOWN) {
      // Still down, increment failures
      consecutiveFailures += 1;
    } else {
      // Still up, reset failures
      consecutiveFailures = 0;
    }

    const downtimeDuration =
      statusChanged && currentStatus === MonitorStatus.UP
        ? timestamp - lastStatusChange
        : undefined;


downtimeDuration always calculates as zero

lastStatusChange is overwritten to timestamp inside the statusChanged block on line 56 — before being used in the downtime duration calculation on line 65. This means timestamp - lastStatusChange is always 0 when the site recovers, and the recovery notification will always show "0 minutes" for the downtime duration.

The original lastStatusChange value (from existing.lastStatusChange) must be captured before the local variable is reassigned:

Suggested change — before:

    if (statusChanged) {
      // Status changed
      consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
      lastStatusChange = timestamp;
    } else if (currentStatus === MonitorStatus.DOWN) {
      // Still down, increment failures
      consecutiveFailures += 1;
    } else {
      // Still up, reset failures
      consecutiveFailures = 0;
    }
    const downtimeDuration =
      statusChanged && currentStatus === MonitorStatus.UP
        ? timestamp - lastStatusChange
        : undefined;

After:

    const previousLastStatusChange = existing.lastStatusChange;
    if (statusChanged) {
      // Status changed
      consecutiveFailures = currentStatus === MonitorStatus.DOWN ? 1 : 0;
      lastStatusChange = timestamp;
    } else if (currentStatus === MonitorStatus.DOWN) {
      // Still down, increment failures
      consecutiveFailures += 1;
    } else {
      // Still up, reset failures
      consecutiveFailures = 0;
    }
    const downtimeDuration =
      statusChanged && currentStatus === MonitorStatus.UP
        ? timestamp - previousLastStatusChange
        : undefined;

Comment on lines 77 to 85

    // Check consecutive failures threshold
    if (
      conditions?.consecutiveFailuresThreshold &&
      context.consecutiveFailures
    ) {
      if (context.consecutiveFailures >= conditions.consecutiveFailuresThreshold) {
        return true;
      }
    }

Consecutive-failures threshold causes spam notifications

Once consecutiveFailures >= consecutiveFailuresThreshold, this condition is true on every subsequent check for as long as the site stays down. There is no "already triggered" guard for this path, so every uptime poll after the threshold is exceeded will fire a new notification — directly contradicting the "Duplicate Prevention" claim in the PR description.

The status-change path is fine (it only fires when previousStatus !== currentStatus), but this threshold path needs its own deduplication. One approach is to record a lastThresholdAlertAt (or a flag like thresholdAlerted: boolean) in MonitorState and reset it only when the site recovers. For example, in StateTracker:

// In MonitorState
thresholdAlerted: boolean;

// In updateState – reset on recovery
if (statusChanged && currentStatus === MonitorStatus.UP) {
  thresholdAlerted = false;
}

Then gate the threshold check in shouldTriggerAlarm so it only fires once per downtime episode.
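Concretely, the gate could look like this — a sketch following the reviewer's proposed `thresholdAlerted` flag; the `MonitorState` shape and function name here are illustrative, not the PR's actual code.

```typescript
// Sketch of once-per-episode threshold alerting, per the suggestion above.
// MonitorState and field names are illustrative, not the PR's exact shapes.

interface MonitorState {
  status: "up" | "down";
  consecutiveFailures: number;
  thresholdAlerted: boolean; // set once a threshold alarm fires this episode
}

function shouldFireThresholdAlarm(state: MonitorState, threshold: number): boolean {
  if (state.status === "up") {
    state.thresholdAlerted = false; // recovery re-arms the alarm
    return false;
  }
  if (state.consecutiveFailures >= threshold && !state.thresholdAlerted) {
    state.thresholdAlerted = true; // fire once, then stay silent while down
    return true;
  }
  return false;
}
```

With this guard, a monitor that stays down keeps incrementing `consecutiveFailures` but produces exactly one threshold notification per downtime episode.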

Comment on lines 154 to 155

    const channels = alarm.notificationChannels;
    const promises: Promise<unknown>[] = [];

unknown type violates project style guide

The project's style guide explicitly prohibits using any, unknown, or never — prefer an explicit type. Since all notification helpers return void (they're fire-and-forget), Promise<void> is more accurate here.

Suggested change (only the second line changes):

      const channels = alarm.notificationChannels;
    - const promises: Promise<unknown>[] = [];
    + const promises: Promise<void>[] = [];

Context Used: Context from dashboard - Basic guidelines for the project so vibe coders don't fuck it up (source)


Comment on lines +14 to +15

    class StateTracker {
      private states = new Map<string, MonitorState>();

In-memory state is lost on service restart

StateTracker stores all monitor state in a process-local Map. Any restart (deploy, crash, scale-down) wipes the entire state. After a restart:

  • A site that is already down will be treated as a first-ever check (previousStatus: undefined), so the shouldTriggerAlarm status-change condition (previousStatus && status !== previousStatus) will be false and the alarm won't fire until the next check.
  • The consecutiveFailures counter resets to 1, meaning a site that was already at or past a configured failure threshold will briefly stop triggering threshold-based alarms.

This is a known trade-off for in-memory state, but it should be documented in the class or flagged as a known limitation, especially since the PR advertises duplicate-prevention semantics that are violated after restarts.

If the service runs multiple replicas, each instance has a completely independent state — the same event may fire an alarm on one instance but be silently ignored by another, or fire redundantly on all.

@greptile-apps

greptile-apps bot commented Feb 26, 2026

Additional Comments (1)

packages/db/src/drizzle/schema.ts
alarm_trigger_history.website_id has no foreign key constraint

The alarms table has a properly constrained websiteId with a CASCADE DELETE foreign key to websites.id. alarmTriggerHistory also carries a websiteId column but has no corresponding constraint — orphaned history rows will accumulate if a website is deleted. Given that the alarms FK already cascades, the corresponding history rows would survive the website deletion but their websiteId would point to a non-existent row.

Consider adding:

foreignKey({
  columns: [table.websiteId],
  foreignColumns: [websites.id],
  name: "alarm_trigger_history_website_id_fkey",
}).onDelete("set null"),

- Fix downtimeDuration calculation bug in state-tracker.ts (was always 0)
- Add deduplication for consecutive failure threshold alerts
- Add missing foreign key constraint on alarm_trigger_history.website_id with CASCADE DELETE
- Fix Promise type annotation in alarms.ts (Promise<void>[] instead of Promise<unknown>[])
@dagangtj

Greptile Review Issues Fixed

Thanks for the detailed review! I've addressed all the issues mentioned:

1. ✅ Fixed downtimeDuration calculation bug (Critical)

Problem: lastStatusChange was being updated before calculating downtimeDuration, causing recovery notifications to always report 0 minutes of downtime.

Fix: Moved the duration calculation to happen BEFORE updating lastStatusChange in state-tracker.ts (lines 51-54).

2. ✅ Added deduplication for consecutive failure threshold (Significant)

Problem: Once consecutiveFailures reached the configured threshold, the alarm would fire on every poll, causing notification spam.

Fix: Added a Map to track which alarms have already fired at specific failure counts. The alarm now only triggers once when the threshold is first reached, and resets on status change (line 38 and lines 75-90).

3. ✅ Added missing foreign key constraint (Notable)

Problem: alarm_trigger_history.website_id had no FK constraint, allowing orphaned history rows when websites are deleted.

Fix: Added a foreign key constraint with CASCADE DELETE from alarm_trigger_history.website_id to websites.id in packages/db/src/drizzle/schema.ts (lines 1074-1078).

4. ✅ Fixed type annotation

Changed Promise<unknown>[] to Promise<void>[] in alarms.ts to match the project style guide.


Note on in-memory state: The in-memory design is intentional for this initial implementation. For production deployments with multiple replicas, we can add Redis/database persistence in a follow-up PR if needed.

All fixes are now pushed to this branch.



Development

Successfully merging this pull request may close these issues.

🎯 Bounty: Uptime Monitoring Alarm Integration

2 participants