Skip to content

Fix gc lock removals#6166

Open
ddanielr wants to merge 1 commit intoapache:2.1from
ddanielr:bugfix/fix-gc-lock-deletions
Open

Fix gc lock removals#6166
ddanielr wants to merge 1 commit intoapache:2.1from
ddanielr:bugfix/fix-gc-lock-deletions

Conversation

@ddanielr
Copy link
Contributor

@ddanielr ddanielr commented Mar 3, 2026

Does not delete the top level gc/lock/ path, instead it just deletes all locks underneath the path.

Added validation checks from other lock deletion code paths.
Updates ITs and the Admin utility to delete gc locks the same way.

Fixes IT failures caused by #6077

Does not delete the top level `gc/lock/` path, instead it just deletes
all locks underneath the path.

Updates ITs and the Admin utility to delete gc locks the same way.
@ddanielr ddanielr added this to the 2.1.5 milestone Mar 3, 2026
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test was failing? Was there a problem with the ZooZap code?

Comment on lines +832 to +834
if (!zoo.exists(path)) {
throw new IllegalStateException("Path " + path + " does not exist");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists check may not be needed. Seems like ServiceLock.getLockData() may call Zookeeper.getChildren() on the path and that may throw NoNodeException if it does not exists. Was a specific exception type needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, I had used this for some of my test debugging and it can go away.

The null checks are a pre-optimization as well in preparation for future refactoring. The current code shouldn't allow null objects to be passed in. I can also pull those out as well if desired.

@ddanielr
Copy link
Contributor Author

ddanielr commented Mar 4, 2026

Which test was failing?

BigRootTabletIT.java and AdvertiseAndBindIT.java were the two I noticed failing, but any test that stopped and started the cluster should have been failing.

Was there a problem with the ZooZap code?

The ZooZap code was incorrect. #6077 replaced the use of removeSingletonLock with a new method in ServiceLock.

removeSingletonLock calls zapDirectory first which takes a path and only calls recursiveDelete for each child of the path.

The new ServiceLock code was calling recursiveDelete directly with the top-level path. So it was removing all the gc locks, but then also removing the path node gc/lock so new gc processes could not find the gc/lock path node so all new locks were failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants