
Fix single-replica partition blocking forever during bootstrap/deactivation/disconnection #3204

Merged
justinlin-linkedin merged 1 commit into linkedin:master from justinlin-linkedin:justin/fixsingle on Mar 2, 2026

justinlin-linkedin (Collaborator)

Summary

When a partition has only one replica, there are no peers, so updateReplicaLagAndCheckSyncStatus is never called by the replication manager. This caused waitBootstrapCompleted (and the equivalent wait methods for deactivation/disconnection) to block forever.
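To make the failure mode concrete, here is a minimal sketch, assuming the wait is backed by a latch that only updateReplicaLagAndCheckSyncStatus can release. The `SyncUpTracker` class, its fields, the `lag == 0` "caught up" rule, and the timeout parameter are hypothetical; only the method names updateReplicaLagAndCheckSyncStatus and waitBootstrapCompleted come from this PR, and their signatures here are simplified guesses rather than Ambry's actual API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the pre-fix behavior; not Ambry's actual code.
public class SingleReplicaHangDemo {

  static class SyncUpTracker {
    private final CountDownLatch latch = new CountDownLatch(1);
    private final Set<String> caughtUpPeers = new HashSet<>();
    private final int catchupTarget;

    SyncUpTracker(List<String> peers, int catchupTarget) {
      // Assumed pre-fix behavior: the target ignores how many peers actually exist.
      this.catchupTarget = catchupTarget;
    }

    // Invoked by the replication manager after replicating from a peer.
    // A single-replica partition has no peers, so nothing ever invokes this.
    synchronized void updateReplicaLagAndCheckSyncStatus(String peer, long lag) {
      if (lag == 0) {
        caughtUpPeers.add(peer);
      }
      if (caughtUpPeers.size() >= catchupTarget) {
        latch.countDown();
      }
    }

    // Per the PR, the real wait can block forever; a timeout is added so the demo terminates.
    boolean waitBootstrapCompleted(long timeoutMs) throws InterruptedException {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Single-replica partition: zero peers, so no replication events ever arrive.
    SyncUpTracker tracker = new SyncUpTracker(List.of(), 1);
    boolean completed = tracker.waitBootstrapCompleted(200);
    System.out.println("completed=" + completed); // prints completed=false
  }
}
```

The latch is never counted down, so without the artificial timeout the caller would wait indefinitely, which matches the symptom described above.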

Two changes fix this:

  1. Cap catchupTarget at the total number of available peers so hasSyncedUpWithEnoughPeers() returns true when there are zero peers.
  2. Check isSyncUpComplete() right after initiation and immediately complete the operation if already satisfied, since no replication events will ever arrive to trigger completion.
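The two changes can be sketched on the same kind of hypothetical model. Here `initiate()` stands in for the bootstrap/deactivation/disconnection initiation step, `waitCompleted()` for the corresponding wait method, and `isSyncUpComplete()` is assumed to reduce to `hasSyncedUpWithEnoughPeers()`; the real checks in AmbryReplicaSyncUpManager may consider more conditions than this.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the two fixes; class and helper names are illustrative only.
public class SingleReplicaFixDemo {

  static class SyncUpTracker {
    private final CountDownLatch latch = new CountDownLatch(1);
    private final Set<String> caughtUpPeers = new HashSet<>();
    private final int catchupTarget;

    SyncUpTracker(List<String> peers, int configuredTarget) {
      // Fix 1: never require more caught-up peers than actually exist.
      this.catchupTarget = Math.min(configuredTarget, peers.size());
    }

    synchronized boolean hasSyncedUpWithEnoughPeers() {
      // With zero peers the target is 0, and 0 >= 0 is true.
      return caughtUpPeers.size() >= catchupTarget;
    }

    // Assumption: completion depends only on the peer count in this sketch.
    synchronized boolean isSyncUpComplete() {
      return hasSyncedUpWithEnoughPeers();
    }

    synchronized void updateReplicaLagAndCheckSyncStatus(String peer, long lag) {
      if (lag == 0) {
        caughtUpPeers.add(peer);
      }
      if (isSyncUpComplete()) {
        latch.countDown();
      }
    }

    // Fix 2: check right after initiation, since no replication event may ever arrive.
    void initiate() {
      if (isSyncUpComplete()) {
        latch.countDown();
      }
    }

    boolean waitCompleted(long timeoutMs) throws InterruptedException {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SyncUpTracker single = new SyncUpTracker(List.of(), 2);
    single.initiate();
    System.out.println("single replica completed=" + single.waitCompleted(200));

    SyncUpTracker multi = new SyncUpTracker(List.of("peerA", "peerB", "peerC"), 2);
    multi.initiate();
    multi.updateReplicaLagAndCheckSyncStatus("peerA", 0);
    multi.updateReplicaLagAndCheckSyncStatus("peerB", 0);
    System.out.println("multi replica completed=" + multi.waitCompleted(200));
  }
}
```

Both changes are needed together in this model: with only the cap, the target drops to 0 but nothing re-checks it because no replication events arrive; with only the post-initiation check, an uncapped target of 2 is still unreachable with zero peers. The multi-replica path is unchanged and still completes only after enough peers catch up.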

Testing Done

Added new unit tests


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov-commenter commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 86.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.81%. Comparing base (52ba813) to head (6fcb45d).
⚠️ Report is 348 commits behind head on master.

Files with missing lines                                Patch %   Lines
...ub/ambry/clustermap/AmbryReplicaSyncUpManager.java   86.66%    0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3204      +/-   ##
============================================
+ Coverage     64.24%   69.81%   +5.56%     
- Complexity    10398    12816    +2418     
============================================
  Files           840      930      +90     
  Lines         71755    79101    +7346     
  Branches       8611     9464     +853     
============================================
+ Hits          46099    55222    +9123     
+ Misses        23004    20931    -2073     
- Partials       2652     2948     +296     


justinlin-linkedin merged commit 49f8397 into linkedin:master on Mar 2, 2026
5 checks passed
justinlin-linkedin deleted the justin/fixsingle branch on Mar 2, 2026 at 22:00