# ADR: Python context compaction strategy #3802

## Conversation
> This ADR applies to two scenarios where the **client** constructs and manages the message list sent to the model:
>
> 1. **With local storage** (e.g., `InMemoryHistoryProvider`, Redis, Cosmos) — compaction is needed at all four points (post-load, pre-write, in-run, existing storage)
Will our OOB memory providers always read the full chat history by default?
Right now, there is no filtering.
> - **Truncation**: Keep only the last N messages or N tokens
> - **Summarization**: Replace older messages with an LLM-generated summary
> - **Selective removal**: Remove tool call/result pairs while keeping user/assistant turns
> - **Sliding window with anchor**: Keep system message + last N messages
We must keep the system message in all cases.
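To make that concrete, here is a minimal sketch of a sliding-window strategy that always preserves the system message. The `Message` dataclass is an illustrative stand-in for the framework's actual message type, and the strategy interface is assumed from the ADR's Option 1:

```python
from dataclasses import dataclass
from collections.abc import Sequence


@dataclass
class Message:
    """Stand-in for the framework's message type (illustrative only)."""
    role: str
    content: str


class SlidingWindowStrategy:
    """Keep the system message (if any) plus the last N non-system messages."""

    def __init__(self, max_messages: int = 50):
        self.max_messages = max_messages

    async def compact(self, messages: Sequence[Message]) -> list[Message]:
        system = [m for m in messages if m.role == "system"]
        rest = [m for m in messages if m.role != "system"]
        return system + rest[-self.max_messages:]
```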
> - **Summarization**: Replace older messages with an LLM-generated summary
> - **Selective removal**: Remove tool call/result pairs while keeping user/assistant turns
> - **Sliding window with anchor**: Keep system message + last N messages
> - **Token budget**: Remove oldest messages until under a token threshold
Is this the same as #1 (truncation) when using N tokens?
> - **Summarization**: Replace older messages with an LLM-generated summary
> - **Selective removal**: Remove tool call/result pairs while keeping user/assistant turns
> - **Sliding window with anchor**: Keep system message + last N messages
> - **Token budget**: Remove oldest messages until under a token threshold
We also want to allow for custom compaction strategies.
> A compaction strategy takes a list of messages and returns a (potentially shorter) list:
>
> - **Truncation**: Keep only the last N messages or N tokens
> - **Summarization**: Replace older messages with an LLM-generated summary
In SK we didn't replace older messages. Instead:

- Generate a summary of a range of messages
- Include the more recent summary
- Insert the summary into the chat history in the appropriate place
- Return recent messages up to the most recent summary

We don't want to lose any data from chat history.
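A rough sketch of that flow, assuming summaries are marked messages inserted into (not replacing) the history. `summarize` is a hypothetical async LLM call, the `is_summary` flag and the window size are illustrative assumptions:

```python
async def insert_summary(messages: list[Message], summarize) -> list[Message]:
    """Summarize older messages in place; return only the view sent to the model."""
    # Index of the most recent existing summary, or -1 if none.
    last_summary = max(
        (i for i, m in enumerate(messages) if getattr(m, "is_summary", False)),
        default=-1,
    )
    keep_recent = 10  # arbitrary recent-message window (assumption)
    window_start = max(len(messages) - keep_recent, last_summary + 1)
    to_summarize = messages[last_summary + 1 : window_start]
    if to_summarize:
        summary = await summarize(to_summarize)  # hypothetical LLM call
        messages.insert(window_start, summary)   # full history is preserved
        return messages[window_start:]           # summary + recent messages
    return messages[max(last_summary, 0):]       # most recent summary onward
```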
> For compaction on existing storage (and pre-write compaction that rewrites history), we need a way to overwrite rather than append. Two options:
>
> 1. **Add a `replace_messages()` method** to `HistoryProvider`:
Will this result in data loss?
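For illustration, one way a `replace_messages()` extension could avoid hard data loss is to archive the replaced span instead of dropping it. A hedged sketch, reusing the illustrative `Message` type from above; the archive behavior and the method itself are assumptions, not the actual `InMemoryHistoryProvider` implementation:

```python
class InMemoryHistoryProvider:
    """Sketch of an overwrite API that soft-deletes instead of destroying data."""

    def __init__(self) -> None:
        self._messages: dict[str, list[Message]] = {}
        self._archive: dict[str, list[Message]] = {}

    async def get_messages(self, session_id: str) -> list[Message]:
        return list(self._messages.get(session_id, []))

    async def replace_messages(self, session_id: str, messages: list[Message]) -> None:
        # Move the old history to an archive rather than deleting it outright.
        old = self._messages.get(session_id, [])
        self._archive.setdefault(session_id, []).extend(old)
        self._messages[session_id] = list(messages)
```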
> ```python
> # Inside the function invocation loop (e.g., in _try_execute_function_calls)
> messages.append(tool_result_message)
> if config.get("compaction_strategy"):
> ```
nit: I believe this code has a bug. Instead:

```python
compacted = await config["compaction_strategy"].compact(messages)
messages.clear()
messages.extend(compacted)
# or: messages[:] = compacted
```

The current code reassigns the local variable but doesn't mutate the list in place. If the calling code holds a reference to the original list (which it does — `prepped_messages` in `_tools.py`), the reassignment has no effect.
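A minimal demonstration of the aliasing behavior described in this comment:

```python
prepped_messages = ["a", "b", "c"]
messages = prepped_messages        # both names point at the same list

messages = ["a"]                   # rebinds the local name only
assert prepped_messages == ["a", "b", "c"]  # caller is unaffected

messages = prepped_messages
messages[:] = ["a"]                # mutates the shared list in place
assert prepped_messages == ["a"]   # caller sees the compacted result
```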
> ```python
> history = await self.post_load_compaction.compact(history)
> context.extend_messages(self.source_id, history)
>
> async def after_run(self, agent, session, context, state) -> None:
> ```
`save_messages` is append-only, and `_collect_messages` only returns new messages from this run. So pre-write compaction only compacts the current turn's messages before appending; it never touches the existing history. Is that the intent?

If you want to compact the full stored history on write, you'd need to read-compact-replace, which requires the `replace_messages` extension discussed separately, if I am not mistaken.

Can we clarify whether "pre-write" means:

- Compact only the new messages before appending (current code), or
- Compact the full history (existing + new) and replace

These are different behaviors.
Yeah, it is only compacting the response messages; compacting what's already in the store is separate.
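For contrast, the second interpretation (compact the full history and replace) would look roughly like this, assuming the `replace_messages` extension discussed above:

```python
async def compact_stored_history(provider, strategy, session_id: str,
                                 new_messages: list[Message]) -> None:
    """Compact the full history (existing + new) and overwrite storage."""
    existing = await provider.get_messages(session_id)
    compacted = await strategy.compact(list(existing) + new_messages)
    await provider.replace_messages(session_id, compacted)  # assumed extension
```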
> ### Open Questions
>
> 1. **Naming**: Should we use `CompactionStrategy`, `ChatReducer` (for .NET alignment), or `ContextReducer`?
> 2. **Trigger mechanism for in-run**: Should compaction run after every tool call, or only when a threshold is exceeded (e.g., token count, message count)?
I feel like this trigger-mechanism open question deserves more than a mention here. It's one of the most important design decisions for in-run compaction. Running `compact()` after every single tool call feels wasteful, because most runs won't need compaction, and summarization strategies involve an LLM call.

What if we make the trigger part of the strategy interface?

```python
from abc import ABC, abstractmethod


class CompactionStrategy(ABC):
    def should_compact(self, messages: Sequence[Message]) -> bool:
        """Check if compaction is needed. Called after each tool result."""
        return True  # Default: always compact (subclasses override)

    @abstractmethod
    async def compact(self, messages: Sequence[Message]) -> list[Message]:
        ...
```

This keeps the trigger logic co-located with the strategy (a token-budget strategy knows its own budget) and avoids the LLM call overhead when compaction isn't needed. Thoughts?
Makes sense.
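For illustration, a threshold-based strategy owning its own trigger might look like this, reusing the illustrative `Message` type from above. The character-based token estimate is a crude stand-in, not the framework's tokenizer:

```python
class TokenBudgetStrategy:
    """Only compacts once the estimated token count exceeds its budget."""

    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens

    def _estimate_tokens(self, messages: Sequence[Message]) -> int:
        return sum(len(m.content) // 4 for m in messages)  # ~4 chars per token

    def should_compact(self, messages: Sequence[Message]) -> bool:
        return self._estimate_tokens(messages) > self.max_tokens

    async def compact(self, messages: Sequence[Message]) -> list[Message]:
        msgs = list(messages)
        # Drop the oldest non-system messages until under budget.
        while self._estimate_tokens(msgs) > self.max_tokens:
            idx = next((i for i, m in enumerate(msgs) if m.role != "system"), None)
            if idx is None:
                break
            msgs.pop(idx)
        return msgs
```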
> A critical constraint for any compaction strategy: **tool calls and their results must be kept together**. LLM APIs (OpenAI, Azure, etc.) require that an assistant message containing `tool_calls` is always followed by corresponding `tool` result messages. A compaction strategy that removes one without the other will cause API errors.
>
> Strategies must treat `[assistant message with tool_calls] + [tool result messages]` as atomic groups — either keep the entire group or remove it entirely.
I get that `[assistant with tool_calls]` + `[tool results]` must be treated atomically, but the doc then leaves it to each strategy to implement correctly. Isn't this too error-prone? Every custom strategy would need to re-implement the same grouping logic.

What if we had a utility?

```python
def group_atomic_messages(messages: Sequence[Message]) -> list[list[Message]]:
    """Group messages into atomic units that must be kept or removed together."""
    ...
```

Or even better, make `compact()` operate on grouped messages by default, with a lower-level `compact_raw()` for strategies that need full control. This would prevent the most common class of bugs in custom strategies, I think.
You are right, but that's an implementation detail.
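Even as an implementation detail, the utility could stay small. A possible sketch, assuming OpenAI-style roles where `tool` result messages immediately follow the assistant message carrying `tool_calls`:

```python
def group_atomic_messages(messages: Sequence[Message]) -> list[list[Message]]:
    """Group messages into atomic units that must be kept or removed together."""
    groups: list[list[Message]] = []
    for m in messages:
        head = groups[-1][0] if groups else None
        if (
            m.role == "tool"
            and head is not None
            and head.role == "assistant"
            and getattr(head, "tool_calls", None)
        ):
            groups[-1].append(m)  # keep the tool result with its tool call
        else:
            groups.append([m])
    return groups
```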
> ```python
> history = await provider.get_messages(session_id)
> compacted = await strategy.compact(history)
> ```
What happens if `compact()` raises during the tool loop? I don't see this called out in the doc. For robustness, the loop should probably catch exceptions from compaction, continue with the uncompacted messages, and log a warning rather than failing the entire `agent.run()`. Do you agree? A truncation strategy should never fail, but a summarization strategy that calls an LLM absolutely can.
Good one, will add.
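One possible shape for that fail-open fallback (illustrative, not the final API):

```python
import logging

logger = logging.getLogger(__name__)


async def safe_compact(strategy, messages: list[Message]) -> list[Message]:
    """Fail open: on compaction errors, keep the uncompacted messages."""
    try:
        return await strategy.compact(messages)
    except Exception:
        logger.warning(
            "Compaction failed; continuing with uncompacted messages",
            exc_info=True,
        )
        return messages
```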
> ### Open Questions
>
> 1. **Naming**: Should we use `CompactionStrategy`, `ChatReducer` (for .NET alignment), or `ContextReducer`?
I'd keep `CompactionStrategy`. I think it's more descriptive of what is actually happening.
> 1. **Naming**: Should we use `CompactionStrategy`, `ChatReducer` (for .NET alignment), or `ContextReducer`?
> 2. **Trigger mechanism for in-run**: Should compaction run after every tool call, or only when a threshold is exceeded (e.g., token count, message count)?
> 3. **Chaining**: Should multiple strategies be chainable (e.g., summarize then truncate)?
I feel like this comes naturally from Option 1's design. It should be built-in.

```python
class ChainedStrategy(CompactionStrategy):
    def __init__(self, *strategies: CompactionStrategy):
        self.strategies = strategies

    async def compact(self, messages):
        for strategy in self.strategies:
            messages = await strategy.compact(messages)
        return messages
```
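Usage would then read naturally. `SummarizationStrategy` is hypothetical here; `SlidingWindowStrategy` is the sketch from earlier in this thread:

```python
strategy = ChainedStrategy(
    SummarizationStrategy(),                 # hypothetical: summarize older history first
    SlidingWindowStrategy(max_messages=50),  # then enforce a hard message cap
)
compacted = await strategy.compact(messages)
```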
> ## Pros and Cons of the Options
>
> ### Option 1: Standalone `CompactionStrategy` Object
A thought in terms of observability: long-running agents with compaction will be hard to debug without visibility into when compaction happened and what was removed. Should we consider whether the strategy should return metadata (like a `CompactionResult` with messages + removed_count + summary), or whether this should be handled via logging/events? Even a simple log line ("Compacted 150 messages to 50") could be helpful.
I need to do some work there. I think it might make sense to separate the messages that are sent to the model (with compaction) from the messages that are returned to the user/session. Good logging should indeed also be part of it.
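For illustration, the metadata idea could take a shape like this (the names are assumptions, not the proposed API):

```python
from dataclasses import dataclass


@dataclass
class CompactionResult:
    """Carries what a strategy kept plus diagnostics about what it removed."""
    messages: list[Message]
    removed_count: int = 0
    summary: str | None = None


# A strategy returning this could then emit the suggested log line, e.g.:
#   logger.info("Compacted %d messages to %d", before_count, len(result.messages))
```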
## Summary

Proposes a design for context compaction in the Python Agent Framework, extracted from the open discussion in ADR-0016.

## Problem

Long-running agents with many tool calls accumulate unbounded message lists. The current architecture cannot compact messages during the tool loop — stores are read once, and middleware only modifies copies.

## Options Proposed

1. `CompactionStrategy` object — composed into `HistoryProvider` and `FunctionInvocationConfiguration`
2. `CompactionStrategy` as a mixin for `HistoryProvider` subclasses
3. `CompactionProvider` as a `ContextProvider` subclass
4. `ChatMiddleware` — refactor copy semantics

## Key Design Points

- `source_id` attribution from ADR-0016
- `HistoryProvider.save_messages()` needs an overwrite mode for storage compaction