Potential Vulnerability in Update Sequence During Slot Migration

In the `tryUpdateMigrationStatus` logic, after a successful migration, the controller first updates the topology in the kvrocks cluster via `SetSlot`, then updates the etcd data using `clusterStore.UpdateCluster`. However, there appears to be a potential vulnerability here:  
- If `SetSlot` succeeds but a network failure prevents `UpdateCluster` from completing (or worse, if `SetSlot` partially succeeds with some nodes updated and others failing), this could leave the kvrocks cluster version **higher** than the cluster version stored in etcd.  
- Even if another controller takes over from the failed one, it would be unable to execute new operations because subsequent "versions" issued by the controller would no longer exceed the version already in kvrocks.  

To address this, I suggest reversing the order: **update etcd first**, then propagate the new topology to kvrocks.  
As a side note, if this approach is adopted, the `clusterx set slot` command in kvrocks might become redundant, as we could fully update the post-migration topology using `setnodes` instead.  

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential Vulnerability in Update Sequence During Slot Migration #268

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Potential Vulnerability in Update Sequence During Slot Migration #268

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions