DAOS-18587 chk: handle report upcall failure - b26#17557

Open
Nasf-Fan wants to merge 1 commit into release/2.6 from Nasf-Fan/DAOS-18587_b26

@Nasf-Fan
Contributor

Whenever the DAOS engine needs input from the admin, it creates a new interaction record in the chk_instance::ci_pending_hdl tree and then triggers a dRPC upcall to the control plane. That upcall may fail. On failure, the dRPC sponsor must remove the record from the chk_instance::ci_pending_hdl tree before destroying it; otherwise a spurious assertion is triggered.
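
The failure path described above can be sketched in a few lines of C. This is an illustration only, not the patch itself: `pending_rec`, `pending_set`, `report_with_upcall`, and the two stub upcalls are hypothetical names, and a singly linked list stands in for the chk_instance::ci_pending_hdl tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interaction record (not a DAOS type). */
struct pending_rec {
	unsigned long       seq;
	bool                linked; /* true while the record is in the set */
	struct pending_rec *next;
};

/* Hypothetical stand-in for the ci_pending_hdl tree: a simple list. */
struct pending_set {
	struct pending_rec *head;
};

static struct pending_rec *
pending_add(struct pending_set *set, unsigned long seq)
{
	struct pending_rec *rec = calloc(1, sizeof(*rec));

	if (rec == NULL)
		return NULL;
	rec->seq    = seq;
	rec->linked = true;
	rec->next   = set->head;
	set->head   = rec;
	return rec;
}

static void
pending_del(struct pending_set *set, struct pending_rec *rec)
{
	struct pending_rec **pp;

	for (pp = &set->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == rec) {
			*pp         = rec->next;
			rec->next   = NULL;
			rec->linked = false;
			return;
		}
	}
}

static void
pending_destroy(struct pending_rec *rec)
{
	/* Models the assertion: a record must be unlinked before it is freed. */
	assert(!rec->linked);
	free(rec);
}

typedef int (*upcall_fn)(unsigned long seq);

/* Stub upcalls for illustration; a real upcall would go to the control plane. */
static int upcall_ok(unsigned long seq)   { (void)seq; return 0; }
static int upcall_fail(unsigned long seq) { (void)seq; return -1; }

/* Register the record, then make the upcall; undo the registration on failure. */
static int
report_with_upcall(struct pending_set *set, unsigned long seq, upcall_fn upcall)
{
	struct pending_rec *rec = pending_add(set, seq);
	int                 rc;

	if (rec == NULL)
		return -1;

	rc = upcall(seq);
	if (rc != 0) {
		pending_del(set, rec);   /* unlink first ... */
		pending_destroy(rec);    /* ... then destroy */
	}
	return rc;
}
```

In this sketch, skipping `pending_del()` on the error path would trip the assertion in `pending_destroy()`, which mirrors the ordering requirement stated above.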

The patch also fixes a container label check issue: if the label is passed as a d_iov_t instead of a string, the buffer may not be '\0'-terminated, so the check must take the buffer length into account.
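
A length-bounded label comparison might look like the following. This is a minimal sketch under stated assumptions: `label_iov` is a simplified, hypothetical stand-in for d_iov_t, and `label_len` and `label_equals` are illustrative helpers, not functions from the DAOS tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified buffer descriptor (not the real d_iov_t). */
struct label_iov {
	const void *buf;
	size_t      len; /* number of valid bytes in buf */
};

/* Label length inside the buffer, never reading past iov->len. */
static size_t
label_len(const struct label_iov *iov)
{
	const char *nul;

	assert(iov != NULL);
	if (iov->buf == NULL || iov->len == 0)
		return 0;

	/* memchr() is bounded by len, unlike strlen(). */
	nul = memchr(iov->buf, '\0', iov->len);
	return nul != NULL ? (size_t)(nul - (const char *)iov->buf) : iov->len;
}

/* Compare the buffer against a NUL-terminated expected label. */
static bool
label_equals(const struct label_iov *iov, const char *expected)
{
	size_t len = label_len(iov);

	if (len != strlen(expected))
		return false;
	return len == 0 || memcmp(iov->buf, expected, len) == 0;
}
```

Calling `strlen()` or `strcmp()` directly on such a buffer could read past its end when no terminator is present; bounding every access by the recorded length avoids that.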

Test-tag: recovery

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

Ticket title is 'CR - dmg check start causes engine crash on Aurora'
Status is 'In Progress'
Labels: 'catastrophic_recovery,test_2.8'
https://daosio.atlassian.net/browse/DAOS-18587

Whenever the DAOS engine needs input from the admin, it creates a
new interaction record in the chk_instance::ci_pending_hdl tree and
then triggers a dRPC upcall to the control plane. That upcall may
fail. On failure, the dRPC sponsor must remove the record from the
chk_instance::ci_pending_hdl tree before destroying it; otherwise a
spurious assertion is triggered.

The patch also fixes a container label check issue:
if the label is passed as a d_iov_t instead of a string, the buffer
may not be '\0'-terminated, so the check must take the buffer
length into account.

Test-tag: recovery

Signed-off-by: Fan Yong <fan.yong@hpe.com>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18587_b26 branch from d3329c4 to 9a1efff Compare February 13, 2026 15:24
@daosbuild3
Collaborator

@Nasf-Fan
Contributor Author

Nasf-Fan commented Feb 14, 2026

All required CI tests passed. The NLT failure is unrelated to the patch.

@Nasf-Fan Nasf-Fan marked this pull request as ready for review February 14, 2026 15:01