Standardize fenced IPC/corosync code#4005
Merged
nrwahl2 merged 25 commits intoClusterLabs:mainfrom Dec 12, 2025
Merged
Conversation
Contributor
It's absolutely on the list of things I'd like to do. I did a bunch of refactors in #3863 and #3866, but those are just the tip of the iceberg. That's when I saw what a mess it is. Lately I've been focusing on booth refactors while waiting for review of my libcrmcommon refactors. Booth isn't widely used, but I shudder to think of trying to fix anything there someday, given the current state of the code... |
bfc8b57 to
8bc78d0
Compare
nrwahl2
reviewed
Dec 11, 2025
Contributor
nrwahl2
left a comment
There was a problem hiding this comment.
Partial review, through "Don't call pcmk__assert in fenced handlers."
This brings it in line with how other daemons are organized. Note that unlike other daemons I've reorganized recently, fenced uses both corosync and IPC. This will require further reorganization and refactoring in later commits. For the moment, not everything IPC related is in this file.
* Add doxygen comments to explain what each function does. * Remove unnecessary whitespace in function headers. * Standardize log messages with what's happening in the same functions in attrd. * Standardize variable names (c becomes client, qbc becomes c, request becomes msg). * Make sure return values and comments match what libqb documents.
* Add fenced_ipc_init to register callback functions and add them to the mainloop. This means the callback struct can now be static. * Add fenced_ipc_cleanup to clean up. This includes client dropping and cleanup tasks that were previously not happening. I'm not sure if this was a bug or if it doesn't really matter, but it's still good to do this in all daemons.
This isn't all the corosync code - just the initial setup/teardown stuff. There are no code changes aside from renaming the cluster variable to fenced_cluster and defining it in fenced_corosync.c.
This brings them in line with what's happening in attrd. Note that stonith_peer_callback remains unchanged, but there's other things that need to be done to that function so it will be handled separately.
* Add doxygen comments to each. * Improve whitespace. * Rename groupName parameter to group_name. * Rearrange code in fenced_cpg_dispatch to be more in line with what's happening in attrd.
* Add doxygen comments to each. * Improve whitespace. * Rename groupName parameter to group_name.
This brings it closer to what is happening in attrd. For the moment it's still basically just a wrapper around stonith_command, but that will change in future patches.
And make it look a lot more like the version in attrd. It looks likely that there's always going to be daemon-specific code to these handle_request functions, but so far it looks like it will be pretty minimal.
The comment indicates the reply might be freed before we log the op, but I don't see anywhere that happens.
If the handler only deals with one type of message (IPC or CPG), it should first check for the error case and handle that. If it deals with both types of messages, it should first check for whichever one requires less code to process. This allows as much of the body of the function as possible to be indented as little as possible.
A couple of these handlers are only meant for IPC clients - if we get a CPG client, that's an unknown request. One and only one of ipc_client or peer must be non-NULL at a time, and in general we don't assert in other daemons like this.
* Add some additional checks in line with what is happening in other daemon dispatch functions. * Move a couple comments above the code they document for line length reasons. * Remove the redundant trace logging call. This is already done by call_api_dispatch.
To make this look as similar as possible to the dispatch functions in other daemons, I want to get rid of the stonith_command function. This will result in a little bit of code duplication for now, but I'm planning on eventually reducing that duplication across all daemons.
Instead, inline the portions of stonith_command that would be used and then get rid of that function entirely.
This does lose some of the existing error handling - if there's an OOM condition, this code will no longer abort and it will instead be up to whatever next allocates memory to fail. However, this is almost exactly what's happening in attrd and OOM should be extremely rare. It's expected that eventually all daemons will use identical code for this purpose, at which point we can handle these rare error cases better.
And move the definition of attrd_cluster into attrd_corosync.c where it belongs.
Some (but not all) daemons can handle both IPC and CPG message types, so the unknown request message should be able to print both. I'm doing this even on daemons that only handle one type in the interest of making these functions as similar as possible. Other daemons have a different style of unknown request function. Rather than sort that out right now (I think we'll need to figure out the ACK vs NACK thing first), I'm just removing the message type detail from the result.
8bc78d0 to
6bdf2de
Compare
nrwahl2
reviewed
Dec 12, 2025
nrwahl2
approved these changes
Dec 12, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Ugh, this one wanted to get out of control. There's a whole lot of code in the fence daemon. This is basically the same as the previous PRs, but I did not move code out of fenced_commands.c into fenced_messages.c for consistency with other daemons. There's just so much code in that file that it didn't seem worth it to introduce so much churn for no good reason. I also didn't update the header file blocks in the files I didn't touch for similar reasons.
There's a lot that could still be done just in fenced_commands.c:
This could probably keep one of us busy for a week or two, after which time maybe reorganizing the remaining code into fenced_messages.c would be easier and make more sense.