
feat(sentry): error tracking, performance monitoring, and session replay #280

Open
Just-Insane wants to merge 9 commits into dev from feat/sentry-pr

Just-Insane (Contributor) commented Mar 15, 2026

Adds comprehensive Sentry error tracking, performance monitoring, and session replay to Sable.

What's in this PR

Core integration (G1)

  • Error tracking: Automatic capture of unhandled exceptions + React render errors via ErrorBoundary; crash page shows the Sentry event ID with one-click feedback
  • Performance monitoring: Spans and custom metrics across all critical user paths — auth login, message send, timeline jump-load, per-event decryption, sliding sync cycles, key backup, store wipe, UTD failures, sync degradation, background notification clients, media uploads
  • Session replay: Full replay with privacy masking (text, media, inputs) — opt-in, off by default
  • Bug report integration: /bugreport modal sends to Sentry Feedback API; debug log attachment optional
  • Developer settings: Settings → Developer Tools → Error Tracking (Sentry) — breadcrumb category toggles, live error/warning counters, debug log export
  • CI: VITE_SENTRY_DSN + VITE_SENTRY_ENVIRONMENT passed in preview (100% sampling) and production (10%) builds; source maps uploaded via @sentry/vite-plugin

Sentry→GitHub Issues triage (G2)

New .github/workflows/sentry-preview-issues.yml: After every PR preview build, a GitHub Actions run queries Sentry for errors seen in that preview environment and posts a summary comment to the PR. Helps catch regressions before merge.

Data scrubbing + settings UI + privacy policy (G3)

beforeSend scrubbing in instrument.ts now covers:

  • Matrix user IDs (@user:server → @[USER_ID]), room IDs, event IDs
  • Percent-encoded sigils (%40, %21, %24, %23) and hybrid-encoded forms (decoded sigil + %3A colon)
  • Tokens, credentials, query params (access_token, next_batch, etc.)
  • All span data string values (not just http.url)
  • Profile URLs, key backup paths, media paths, preview_url params, exception values
  • 'none' sentinel for unset room tags
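As a rough illustration of the scrubbing rules listed above, here is a minimal sketch. The regular expressions, the `scrubString` and `scrubData` names, and the `[REDACTED]` placeholder are assumptions made for this example and are not taken from instrument.ts; the real patterns are likely broader. In an actual integration, functions like these would be called from Sentry's `beforeSend` hook.

```typescript
// Hypothetical patterns: the real ones in instrument.ts may differ.
const LOCALPART = "[A-Za-z0-9._=+-]+";
const SERVER = "[A-Za-z0-9.-]+(?::\\d+)?";
// The colon before the server name may be literal or percent-encoded,
// which also covers the "hybrid" form (decoded sigil + %3A).
const COLON = "(?::|%3A)";

const USER_ID = new RegExp(`(?:@|%40)${LOCALPART}${COLON}${SERVER}`, "gi");
const ROOM_ID = new RegExp(`(?:!|%21)${LOCALPART}${COLON}${SERVER}`, "gi");
const EVENT_ID = new RegExp(`(?:\\$|%24)[A-Za-z0-9_-]+(?:${COLON}${SERVER})?`, "gi");
const SECRET_PARAM = /\b(access_token|next_batch)=[^&\s#]+/gi;

// Replace secrets first, then Matrix identifiers.
function scrubString(input: string): string {
  return input
    .replace(SECRET_PARAM, (_match, key: string) => `${key}=[REDACTED]`)
    .replace(USER_ID, () => "@[USER_ID]")
    .replace(ROOM_ID, () => "![ROOM_ID]")
    .replace(EVENT_ID, () => "$[EVENT_ID]");
}

// Scrub every top-level string value of a span-data-like record,
// leaving non-string values untouched.
function scrubData(data: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(data)) {
    out[key] = typeof value === "string" ? scrubString(value) : value;
  }
  return out;
}
```

Applying `scrubData` to all keys rather than only `http.url` mirrors the "all span data string values" item in the list; nested objects are not handled in this sketch.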

Settings → General → Diagnostics & Privacy section with error reporting and session replay toggles, plus a link to docs/PRIVACY.md.

Timeline instrumentation + anomaly detection (G4)

RoomTimeline.tsx now emits structured breadcrumbs for every significant scroll/layout event:

  • ui.scroll: Every atBottom true→false transition, with rapid-flip anomaly detection (true→false within 200 ms flags a likely IntersectionObserver false positive from a DOM layout shift)
  • ui.scroll: Every scroll-to-bottom trigger; warns if fired while user is scrolled up without a pending reset (indicates a logic error)
  • ui.timeline: Virtual paginator window shifts (start/end changes)
  • timeline.events: Every eventsLength batch update — delta, batch size label (single/small/medium/large), range gap, atBottom; warns on large batch while liveTimelineLinked=true (sliding sync adaptive load)
  • sync.sliding: onFirstRoomData breadcrumb in slidingSync.ts with latency + event count; emits sable.sync.room_sub_event_count distribution metric
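The rapid-flip check described in the first bullet could look roughly like the sketch below. The class name, the verdict labels, and the reading of "within 200 ms" as "at most 200 ms after atBottom last became true" are assumptions for illustration; the actual logic in RoomTimeline.tsx may be structured differently.

```typescript
const RAPID_FLIP_WINDOW_MS = 200; // window taken from the description above

type FlipVerdict = "ignored" | "normal" | "rapid-flip";

// Hypothetical helper: tracks atBottom transitions and flags a true→false
// flip that happens shortly after atBottom became true.
class AtBottomFlipDetector {
  private atBottom: boolean;
  private becameTrueAt: number | null = null;

  constructor(initial: boolean = false) {
    this.atBottom = initial;
  }

  update(next: boolean, nowMs: number): FlipVerdict {
    if (next === this.atBottom) return "ignored"; // no transition
    this.atBottom = next;
    if (next) {
      this.becameTrueAt = nowMs; // remember when we reached the bottom
      return "normal";
    }
    const rapid =
      this.becameTrueAt !== null &&
      nowMs - this.becameTrueAt <= RAPID_FLIP_WINDOW_MS;
    return rapid ? "rapid-flip" : "normal";
  }
}
```

A caller would attach the verdict to the `ui.scroll` breadcrumb and only escalate to a warning on `"rapid-flip"`.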

Message interaction counters: sable.call.answered/declined, sable.message.delete/forward/report.*, sable.message.reaction.toggle.

Bug fixes found via Sentry Replay (G5)

Scroll-to-bottom after list→subscription timeline expansion
When a room with a single cached event (list subscription, timeline_limit=1) transitions to a full subscription, TimelineReset fires before any events land. The "stay at bottom" effect queued a scroll while the DOM was still empty — by the time events loaded, the scroll had fired against an empty container and was a no-op. Fix: the stay-at-bottom effect now re-queues a scroll via scrollToBottomRef.current.count += 1 after setTimeline(getInitialTimeline(room)). — src/app/features/room/RoomTimeline.tsx

TS2367 redundant phase guard in useCallSignaling
A phase !== undefined check was always true at that point in the control flow; TypeScript correctly flagged it as a comparison error. Dead branch removed. — src/app/hooks/useCallSignaling.ts


Setup

Sentry is opt-in: the integration is a no-op unless VITE_SENTRY_DSN is set at build time. Self-hosters who do not set this variable are unaffected.

| Variable | Required | Notes |
| --- | --- | --- |
| VITE_SENTRY_DSN | Yes | Sentry project DSN |
| VITE_SENTRY_ENVIRONMENT | No | production (10% sampling) or preview (100%) |
| SENTRY_AUTH_TOKEN | CI only | Source map upload |

See docs/SENTRY_INTEGRATION.md for full configuration details, self-hosting with Docker, and the complete metrics reference.
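To make the setup rules concrete, the sketch below resolves build-time variables into a settings object, returning `null` when no DSN is set. The `resolveSentrySettings` name, the `samplingRate` field, and the fallback to `production` when the environment is unset are assumptions for this example; which Sentry sampling option the rate feeds into is not stated here. In a Vite build the values would typically come from `import.meta.env`.

```typescript
interface BuildEnv {
  VITE_SENTRY_DSN?: string;
  VITE_SENTRY_ENVIRONMENT?: string;
}

interface SentrySettings {
  dsn: string;
  environment: string;
  samplingRate: number; // hypothetical field name
}

// Returns null when no DSN is configured, so the integration stays a no-op.
function resolveSentrySettings(env: BuildEnv): SentrySettings | null {
  const dsn = env.VITE_SENTRY_DSN?.trim();
  if (!dsn) return null;

  // Assumption: an unset environment is treated as production.
  const environment = env.VITE_SENTRY_ENVIRONMENT ?? "production";

  // production samples 10%; preview (and anything else) samples 100%.
  const samplingRate = environment === "production" ? 0.1 : 1.0;
  return { dsn, environment, samplingRate };
}
```

Defaulting to the lower rate when the environment is missing is the conservative choice for quota, but the real default may differ.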

github-actions bot (Contributor) commented Mar 15, 2026

OpenTofu plan for production

Plan: 2 to add, 0 to change, 2 to destroy.
OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

OpenTofu will perform the following actions:

  # cloudflare_worker_version.site must be replaced
-/+ resource "cloudflare_worker_version" "site" {
!~      annotations         = {
+           workers_message      = (known after apply)
+           workers_tag          = (known after apply)
!~          workers_triggered_by = "create_version_api" -> (known after apply)
        } -> (known after apply)
!~      assets              = { # forces replacement
!~          asset_manifest_sha256 = "b44bbf6a507cd60278d078f529f5e4b4bfe36580139ecb1ff4fc842de520526f" -> "b09b2cd338d7f1837f401a4e7ecfadd5c5dc08b21c56faf91effbe4bfc7f2547" # forces replacement
!~          directory             = "/home/runner/work/Sable/Sable/dist" -> "/github/workspace/dist"
#            (1 unchanged attribute hidden)
        }
+       compatibility_flags = (known after apply)
!~      created_on          = "2026-03-14T23:24:54Z" -> (known after apply)
!~      id                  = "************************************" -> (known after apply)
+       limits              = (known after apply)
+       main_script_base64  = (known after apply)
!~      number              = 112 -> (known after apply)
!~      source              = "terraform" -> (known after apply)
+       startup_time_ms     = (known after apply)
#        (4 unchanged attributes hidden)
    }

  # cloudflare_workers_deployment.site must be replaced
-/+ resource "cloudflare_workers_deployment" "site" {
!~      annotations  = { # forces replacement
!~          workers_message      = "b90555a27c7dac104a96bf0e2e5b66c2c815da2f" -> "feat(sentry): error tracking, performance monitoring, and session replay"
!~          workers_triggered_by = "deployment" -> (known after apply)
        }
!~      author_email = "cloudflare@sl.sable.moe" -> (known after apply)
!~      created_on   = "2026-03-14T23:24:55Z" -> (known after apply)
!~      id           = "************************************" -> (known after apply)
!~      source       = "terraform" -> (known after apply)
!~      versions     = [ # forces replacement
!~          {
!~              version_id = "************************************" -> (known after apply)
#                (1 unchanged attribute hidden)
            },
        ]
#        (3 unchanged attributes hidden)
    }

Plan: 2 to add, 0 to change, 2 to destroy.

Warning: Attribute Deprecated

  with cloudflare_workers_custom_domain.site,
  on main.tf line 41, in resource "cloudflare_workers_custom_domain" "site":
  41:   environment = "production"

This attribute is deprecated.

(and one more similar warning elsewhere)

📝 Plan generated in Cloudflare Infra #72

Just-Insane force-pushed the feat/sentry-pr branch 6 times, most recently from 660f8a4 to d9bffa6 (March 15, 2026 06:42)
Just-Insane marked this pull request as ready for review March 15, 2026 08:55
Just-Insane requested a review from a team March 15, 2026 08:55
…settings, and error capture

Adds Sentry error tracking, performance monitoring, and session replay:

- Environment-based sampling: production 10%, preview/dev 100%
- Session replay with full privacy masking (text, media, inputs)
- Hashed MXID as anonymous user ID with homeserver tag
- Error rate limiting (50 events/session) to protect quota
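
The per-session error cap could be implemented along the lines of the sketch below. The `createEventLimiter` name and its generic shape are assumptions for this example; the only detail taken from the commit message is the limit of 50 events per session. Returning `null` is how a Sentry `beforeSend` callback drops an event, so the returned function could be used there.

```typescript
const MAX_EVENTS_PER_SESSION = 50; // limit stated in the commit message

// Hypothetical helper: returns a function that passes events through until
// the cap is reached, then returns null for every further event.
function createEventLimiter<E>(
  maxEvents: number = MAX_EVENTS_PER_SESSION,
): (event: E) => E | null {
  let sent = 0; // lives for the lifetime of the page, i.e. one session
  return (event: E): E | null => {
    if (sent >= maxEvents) return null;
    sent += 1;
    return event;
  };
}
```

Because the counter is held in a closure, it resets on page reload, which is one plausible reading of "per session".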

Performance spans and metrics across all critical paths: auth login,
message send, timeline jump-load, per-event decryption, sliding sync
cycles, key backup, store wipe, UTD failures, sync degradation,
background notification clients, media uploads.

Developer settings panel (Settings → General → Diagnostics & Privacy):
error reporting toggle, session replay opt-in, breadcrumb category
controls, debug log export.

Crash page shows Sentry event ID with one-click feedback dialog.
Bug report modal sends structured reports to Sentry feedback API
with optional debug log attachment.

CI: Sentry env vars injected into preview (100% sampling) and
production (10% sampling) builds. Source maps uploaded via
@sentry/vite-plugin.
- Tag all Sentry events with PR number via VITE_SENTRY_PR env var
  (instrument.ts: setTag('pr', prNumber) on global scope)
- cloudflare-web-preview.yml: inject Sentry DSN, environment=preview,
  PR number, and source-map secrets into build env for PR runs
- New workflow sentry-preview-issues.yml: on every PR push, query Sentry
  for unresolved issues tagged with the PR number and environment=preview;
  create a GitHub issue per unique error (deduplicated by sentry-id marker),
  labelled 'sentry-preview' + 'pr-{N}'; maintain a sticky PR comment
  with a summary table; reopen closed issues on regression
- Labels allow filtering: -label:sentry-preview hides all automated issues
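
The deduplication and reopen behaviour described for sentry-preview-issues.yml can be sketched as a pure planning step. Everything here is hypothetical: the HTML-comment marker format, the type names, and the `planIssueActions` function are assumptions for illustration, and the real workflow may store and match the sentry-id marker differently.

```typescript
interface SentryIssue { id: string; title: string; }
interface GithubIssue { number: number; state: "open" | "closed"; body: string; }

type IssueAction =
  | { kind: "create"; sentryId: string }
  | { kind: "reopen"; issueNumber: number }
  | { kind: "skip"; issueNumber: number };

// Assumed marker format, embedded in the GitHub issue body.
function markerFor(sentryId: string): string {
  return `<!-- sentry-id: ${sentryId} -->`;
}

function extractSentryId(body: string): string | null {
  const match = /<!--\s*sentry-id:\s*([A-Za-z0-9_-]+)\s*-->/.exec(body);
  return match?.[1] ?? null;
}

// One action per Sentry issue: create if unseen, reopen if the matching
// GitHub issue was closed (regression), otherwise leave it alone.
function planIssueActions(
  sentryIssues: SentryIssue[],
  githubIssues: GithubIssue[],
): IssueAction[] {
  const bySentryId = new Map<string, GithubIssue>();
  for (const issue of githubIssues) {
    const id = extractSentryId(issue.body);
    if (id !== null) bySentryId.set(id, issue);
  }
  return sentryIssues.map((s): IssueAction => {
    const existing = bySentryId.get(s.id);
    if (!existing) return { kind: "create", sentryId: s.id };
    if (existing.state === "closed") return { kind: "reopen", issueNumber: existing.number };
    return { kind: "skip", issueNumber: existing.number };
  });
}
```

Keeping the planning step free of API calls makes it easy to test; the workflow would then execute each action against the GitHub API.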
…policy

Before-send scrubbing in instrument.ts covers:
- Matrix user IDs (@user:server → @[USER_ID])
- Room IDs (!room:server → ![ROOM_ID])
- Event IDs ($event → $[EVENT_ID])
- Percent-encoded sigils (%40, %21, %24, %23)
- Hybrid-encoded forms (decoded sigil + %3A colon)
- Tokens, credentials, and query params (access_token, next_batch, etc.)
- All span data string values (not just http.url)
- Profile URLs, key backup paths, media paths in span data
- preview_url query params and exception values
- 'none' sentinel for unset room tags

Privacy policy added to docs/PRIVACY.md (linked from General settings).
DiagnosticsAndPrivacy component added to General settings with error
reporting and session replay toggles.
TS2367 redundant phase guard removed from useCallSignaling.ts.
…h monitoring, and anomaly detection

RoomTimeline.tsx:
- atBottom transition breadcrumbs (ui.scroll) with rapid-flip detection:
  true→false within 200ms captures warning (IntersectionObserver false-positive)
- Virtual paginator window shift breadcrumbs (ui.timeline)
- scroll-to-bottom trigger breadcrumb; captures warning when fired while
  user is scrolled up without a pending reset
- eventsLength batch monitoring: breadcrumb on every batch with delta,
  batchSize label (single/small/medium/large), rangeGap, atBottom
- Large batch warning (>50 events, liveTimelineLinked=true) for sliding
  sync adaptive load detection
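
The batch labelling and large-batch warning above might be expressed as in the sketch below. Only the four label names and the >50 threshold with `liveTimelineLinked=true` come from the commit message; the small/medium boundary of 10, the reuse of 50 as the "large" label boundary, and the function names are assumptions.

```typescript
type BatchSize = "single" | "small" | "medium" | "large";

const LARGE_BATCH_THRESHOLD = 50; // warning threshold from the commit message
const SMALL_BATCH_MAX = 10; // hypothetical boundary between small and medium

// Map the number of events added in one update to a coarse label.
function classifyBatch(eventCount: number): BatchSize {
  if (eventCount <= 1) return "single";
  if (eventCount <= SMALL_BATCH_MAX) return "small";
  if (eventCount <= LARGE_BATCH_THRESHOLD) return "medium";
  return "large";
}

// A large batch only warrants a warning while the live timeline is linked.
function shouldWarnLargeBatch(eventCount: number, liveTimelineLinked: boolean): boolean {
  return liveTimelineLinked && eventCount > LARGE_BATCH_THRESHOLD;
}
```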

slidingSync.ts / initMatrix.ts:
- sync.sliding breadcrumb in onFirstRoomData with latency + event count
- sable.sync.room_sub_event_count distribution metric

Message interaction metrics (IncomingCallModal, MessageDelete, MessageForward,
MessageReport, RoomTimeline reaction handler):
- sable.call.answered/declined, sable.message.delete/forward/report/reaction.*
Just-Insane force-pushed the feat/sentry-pr branch 2 times, most recently from e17a427 to 793dfed (March 16, 2026 03:51)
…ption batch

The scroll queued at TimelineReset fires when range.end=0 (SDK fires Reset
before populating the fresh timeline), so the DOM is empty and the scroll
is a no-op. The stay-at-bottom effect then correctly expands the range but
never re-queues a scroll, leaving the user at the wrong position.

Fix: increment scrollToBottomRef.count in the stay-at-bottom effect before
calling setTimeline. The next render sees the new count, the useLayoutEffect
fires, and scroll-to-bottom runs against the now-populated DOM.

Also adds monitoring:
- Sentry breadcrumb (ui.scroll) recording each range expansion + scroll,
  with rangeGap, eventsLength, and wasReset fields for Replay analysis.
- slidingSync onFirstRoomData now records the live-timeline event count at
  subscription activation time as a Sentry metric and breadcrumb, making
  the list-to-subscription page size visible in dashboards.
- Flip Sentry from opt-out to opt-in: enabled only when sable_sentry_enabled === 'true'
- Add TelemetryConsentBanner: fixed bottom slide-up banner shown once on first login
  - 'Enable crash reports' reloads page so Sentry initialises immediately
  - 'No thanks' / dismiss X sets sable_sentry_enabled=false, no repeat prompts
  - Only shown when VITE_SENTRY_DSN is configured (self-hosters unaffected)
- Update DiagnosticsAndPrivacy toggle: enable path now sets 'true' instead of removeItem
- Update SentrySettings dev panel to match new opt-in read logic
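
The opt-in rules in this commit can be summarised with a small sketch. The `sable_sentry_enabled` key and its `'true'`/`'false'` values come from the commit message; the function names, the storage interface, and the assumption that the banner shows exactly when the key is unset are illustrative only. In the browser the store would presumably be `localStorage`.

```typescript
const CONSENT_KEY = "sable_sentry_enabled";

// Minimal subset of the Web Storage interface used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Sentry runs only with a configured DSN and an explicit 'true'.
function isSentryEnabled(store: KeyValueStore, dsn?: string): boolean {
  return Boolean(dsn) && store.getItem(CONSENT_KEY) === "true";
}

// Assumption: the banner appears only while no choice has been recorded.
function shouldShowConsentBanner(store: KeyValueStore, dsn?: string): boolean {
  return Boolean(dsn) && store.getItem(CONSENT_KEY) === null;
}

function recordConsent(store: KeyValueStore, accepted: boolean): void {
  store.setItem(CONSENT_KEY, accepted ? "true" : "false");
}

// In-memory store so the sketch can run outside a browser.
function createMemoryStore(): KeyValueStore {
  const values = new Map<string, string>();
  return {
    getItem: (key) => values.get(key) ?? null,
    setItem: (key, value) => { values.set(key, value); },
  };
}
```

Storing an explicit `'false'` on dismissal, rather than leaving the key unset, is what prevents the banner from reappearing.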
Add Sentry integration for error tracking and bug reporting
github-actions (Contributor)

Deploying with Cloudflare Workers

| Status | Preview URL | Commit | Alias | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! | https://pr-280-sable.raspy-dream-bb1d.workers.dev | 2abb63f | pr-280 | Mon, 16 Mar 2026 18:01:13 GMT |
