-
Notifications
You must be signed in to change notification settings - Fork 123
add AgentSessionStateDelta for state sync #1393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
| string agent_name = 3; | ||
| string metadata = 4; | ||
| bytes session_data = 5; | ||
| repeated AgentSessionStateDelta session_state = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two options for sync state from cloud to worker:
- always send the full db
- ask the worker what version you have and sync a partial of the delta to worker
for the second case we may also need a protocol to ask and answer the db version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can maintain sticky routing with state on the server that includes which state version a worker has.
in the TextWorkerRequest when we think the state already exists we can send the state version instead of session data.
the agent can optionally respond with an error instructing the server to retry with a state snapshot - this should be very uncommon.
chenghao-mou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
| optional string base_version = 1; | ||
| string new_version = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we keep this simple by making these uint64s and incrementing by 1 for every turn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it could be int
| string job_id = 1; | ||
| } | ||
|
|
||
| message AgentSessionStateDelta { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message AgentSessionState {
uint64 version = 1;
bytes snapshot = 2;
bytes delta = 3;
}request
- version + snapshot - cold start request for worker that is not expected to have the state already
- version - request for worker expected to have up to date state (recently served the same session)
response
- version + snapshot - initial response or resync after schema change
- version + delta - response to ongoing session
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there could be multiple deltas if the cloud doesn't combine them into a single one. maybe we can have only deltas and use the v1 as the base snapshot, what do you think
message AgentSessionState {
// the version of the latest delta, version_1 means the delta from empty state
uint64 version = 1;
repeated bytes deltas = 2;
}request
- deltas empty - request for worker to have up to date state, which is the same as version in the request
- deltas from 1 up to the version - cold start
response
- version 1 + delta - initial response or request to drop previous deltas in cloud if any
- version X + delta - response to ongoing session
- version 0 + error - ask for cold start (potentially return a version X that requests deltas after that version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on benchmarks merging the changesets is inexpensive enough that we can always deliver snapshots for a cold start request.
we shouldn't expect the server to maintain the full changeset history. the minimum requirement is changeset version n and snapshot n-1 (in case we have to cold start a worker for a client retry).
separating the snapshot and delta fields makes the api more explicit than relying on version number to interpret the meaning of the data field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean the cloud will merge the changesets into a single changeset, or merge the changesets to the snapshot? I think it makes a lot of sense if the cloud can merge the changesets to the snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's what i mean. we'll only ever send the agent a snapshot for cold start
No description provided.