Skip to content

add resourceversion when updating objects#2203

Open
rahulait wants to merge 2 commits intoNVIDIA:mainfrom
rahulait:add-resourceversion
Open

add resourceversion when updating objects#2203
rahulait wants to merge 2 commits intoNVIDIA:mainfrom
rahulait:add-resourceversion

Conversation

@rahulait
Copy link
Contributor

@rahulait rahulait commented Mar 11, 2026

Description

  1. For optimistic concurrency control, k8s requires ResourceVersion to be set when updating a resource. Several resource update paths were calling n.client.Update(ctx, obj) on a freshly built object and that lacked ResourceVersion, which could cause updates to fail (not sure how updates were passing before if they were).

The fix follows the same pattern everywhere: before updating, fetch the existing resource and copy its ResourceVersion before updating.

  1. Instead of using create-then-update-if-exists, use a get-first-then-create-or-update pattern. Every reconcile, we were making create calls which would be failing and then it would be doing update call. We can get the resource first and if it exists, do an update. If it doesn't, then we can create the resource. This way, we reduce create call on every reconcile.

Checklist

  • No secrets, sensitive information, or unrelated changes
  • Lint checks passing (make lint)
  • Generated assets in-sync (make validate-generated-assets)
  • Go mod artifacts in-sync (make validate-modules)
  • Test cases are added for new code paths

Testing

Signed-off-by: Rahul Sharma <rahulsharm@nvidia.com>
@rahulait
Copy link
Contributor Author

rahulait commented Mar 11, 2026

Existing behavior/code kind of works for us because role, rolebinding, clusterrole, clusterrolebinding, configmap, serviceaccount, service, deployment, daemonset allow updates without resource version (unconditional update = true).

Ex:
https://github.com/kubernetes/kubernetes/blob/537ab630b23903686992b4cfb905ff7a412becbf/pkg/registry/rbac/role/strategy.go#L97-L104
https://github.com/kubernetes/kubernetes/blob/537ab630b23903686992b4cfb905ff7a412becbf/pkg/registry/apps/deployment/strategy.go#L148-L150

I am not sure what is the better approach here. Current change does adds additional Get() call to get revisionVersion and avoid concurrent updates (uses cache though). But we don't have concurrent updates for these resources as of today (maybe we will once we enable/support HA?). Also, with the new change, we are going GET before doing CREATE or UPDATE and that will help in avoiding all the CREATE calls we were making till today on every reconcile as the resource would be there and we would be just updating it after that.

Also, we can also use patch instead of update to reduce the amount of data sent to apiserver.

I would like to get others thoughts on tradeoffs and improvement and if we should get this change in.

This comment was marked as outdated.

Signed-off-by: Rahul Sharma <rahulsharm@nvidia.com>

This comment was marked as outdated.

@rajathagasthya
Copy link
Contributor

@tariq1890
Copy link
Contributor

Also, we can also use patch instead of update to reduce the amount of data sent to apiserver.

I was going to suggest this. If we don't strictly need update, can we just use patch? We had changed some update calls to patch for K8s node updates in our controller logic before. So far, they have worked fine and have also solved problems of node update race conditions when we encountered them in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants