# ADR: Python context compaction strategy #3802

## Conversation
> This ADR applies to two scenarios where the **client** constructs and manages the message list sent to the model:
>
> 1. **With local storage** (e.g., `InMemoryHistoryProvider`, Redis, Cosmos) — compaction is needed at all four points (post-load, pre-write, in-run, existing storage)
Will our OOB memory providers always read the full chat history by default?
Right now, there is no filtering.
> - **Truncation**: Keep only the last N messages or N tokens
> - **Summarization**: Replace older messages with an LLM-generated summary
> - **Selective removal**: Remove tool call/result pairs while keeping user/assistant turns
> - **Sliding window with anchor**: Keep system message + last N messages
We must keep the system message in all cases.
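To make that concrete, here is a minimal sketch of a sliding-window strategy that always preserves the system message. The `Message` dataclass is an illustrative stand-in for the framework's actual message type, and the strategy interface is assumed from the ADR's Option 1:

```python
from dataclasses import dataclass
from collections.abc import Sequence


@dataclass
class Message:
    """Stand-in for the framework's message type (illustrative only)."""
    role: str
    content: str


class SlidingWindowStrategy:
    """Keep the system message (if any) plus the last N non-system messages."""

    def __init__(self, max_messages: int = 50):
        self.max_messages = max_messages

    async def compact(self, messages: Sequence[Message]) -> list[Message]:
        system = [m for m in messages if m.role == "system"]
        rest = [m for m in messages if m.role != "system"]
        return system + rest[-self.max_messages:]
```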
> - **Summarization**: Replace older messages with an LLM-generated summary
> - **Selective removal**: Remove tool call/result pairs while keeping user/assistant turns
> - **Sliding window with anchor**: Keep system message + last N messages
> - **Token budget**: Remove oldest messages until under a token threshold
Is this the same as #1 (truncation) when using N tokens?
> - **Summarization**: Replace older messages with an LLM-generated summary
> - **Selective removal**: Remove tool call/result pairs while keeping user/assistant turns
> - **Sliding window with anchor**: Keep system message + last N messages
> - **Token budget**: Remove oldest messages until under a token threshold
We also want to allow for custom compaction strategies.
> A compaction strategy takes a list of messages and returns a (potentially shorter) list:
>
> - **Truncation**: Keep only the last N messages or N tokens
> - **Summarization**: Replace older messages with an LLM-generated summary
In SK we didn't replace older messages. Instead:

- Generate a summary of a range of messages
- Include the more recent summary
- Insert the summary into the chat history in the appropriate place
- Return recent messages up to the most recent summary

We don't want to lose any data from chat history.
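A rough sketch of that flow, assuming summaries are marked messages inserted into (not replacing) the history. `summarize` is a hypothetical async LLM call, the `is_summary` flag and the window size are illustrative assumptions:

```python
async def insert_summary(messages: list[Message], summarize) -> list[Message]:
    """Summarize older messages in place; return only the view sent to the model."""
    # Index of the most recent existing summary, or -1 if none.
    last_summary = max(
        (i for i, m in enumerate(messages) if getattr(m, "is_summary", False)),
        default=-1,
    )
    keep_recent = 10  # arbitrary recent-message window (assumption)
    window_start = max(len(messages) - keep_recent, last_summary + 1)
    to_summarize = messages[last_summary + 1 : window_start]
    if to_summarize:
        summary = await summarize(to_summarize)  # hypothetical LLM call
        messages.insert(window_start, summary)   # full history is preserved
        return messages[window_start:]           # summary + recent messages
    return messages[max(last_summary, 0):]       # most recent summary onward
```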
> For compaction on existing storage (and pre-write compaction that rewrites history), we need a way to overwrite rather than append. Two options:
>
> 1. **Add a `replace_messages()` method** to `HistoryProvider`:
Will this result in data loss?
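For illustration, one way a `replace_messages()` extension could avoid hard data loss is to archive the replaced span instead of dropping it. A hedged sketch, reusing the illustrative `Message` type from above; the archive behavior and the method itself are assumptions, not the actual `InMemoryHistoryProvider` implementation:

```python
class InMemoryHistoryProvider:
    """Sketch of an overwrite API that soft-deletes instead of destroying data."""

    def __init__(self) -> None:
        self._messages: dict[str, list[Message]] = {}
        self._archive: dict[str, list[Message]] = {}

    async def get_messages(self, session_id: str) -> list[Message]:
        return list(self._messages.get(session_id, []))

    async def replace_messages(self, session_id: str, messages: list[Message]) -> None:
        # Move the old history to an archive rather than deleting it outright.
        old = self._messages.get(session_id, [])
        self._archive.setdefault(session_id, []).extend(old)
        self._messages[session_id] = list(messages)
```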
> ```python
> # Inside the function invocation loop (e.g., in _try_execute_function_calls)
> messages.append(tool_result_message)
> if config.get("compaction_strategy"):
> ```
nit: I believe this code has a bug. Instead:

```python
compacted = await config["compaction_strategy"].compact(messages)
messages.clear()
messages.extend(compacted)
# or: messages[:] = compacted
```

The current code reassigns the local variable but doesn't mutate the list in place. If the calling code holds a reference to the original list (which it does — `prepped_messages` in `_tools.py`), the reassignment has no effect.
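A minimal demonstration of the aliasing behavior described in this comment:

```python
prepped_messages = ["a", "b", "c"]
messages = prepped_messages        # both names point at the same list

messages = ["a"]                   # rebinds the local name only
assert prepped_messages == ["a", "b", "c"]  # caller is unaffected

messages = prepped_messages
messages[:] = ["a"]                # mutates the shared list in place
assert prepped_messages == ["a"]   # caller sees the compacted result
```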
> ```python
> history = await self.post_load_compaction.compact(history)
> context.extend_messages(self.source_id, history)
>
> async def after_run(self, agent, session, context, state) -> None:
> ```
`save_messages` is append-only, and `_collect_messages` only returns new messages from this run. So pre-write compaction only compacts the current turn's messages before appending; it never touches the existing history. Is that the intent?

If you want to compact the full stored history on write, you'd need to read-compact-replace, which requires the `replace_messages` extension discussed separately, if I am not mistaken.

Can we clarify whether "pre-write" means:

- Compact only the new messages before appending (current code), or
- Compact the full history (existing + new) and replace

These are different behaviors.
Yeah, it is only compacting the response messages; compacting what's already in the store is separate.
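For contrast, the second interpretation (compact the full history and replace) would look roughly like this, assuming the `replace_messages` extension discussed above:

```python
async def compact_stored_history(provider, strategy, session_id: str,
                                 new_messages: list[Message]) -> None:
    """Compact the full history (existing + new) and overwrite storage."""
    existing = await provider.get_messages(session_id)
    compacted = await strategy.compact(list(existing) + new_messages)
    await provider.replace_messages(session_id, compacted)  # assumed extension
```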
> ### Open Questions
>
> 1. **Naming**: Should we use `CompactionStrategy`, `ChatReducer` (for .NET alignment), or `ContextReducer`?
> 2. **Trigger mechanism for in-run**: Should compaction run after every tool call, or only when a threshold is exceeded (e.g., token count, message count)?
I feel like this trigger-mechanism open question deserves more than a mention here. It's one of the most important design decisions for in-run compaction. Running `compact()` after every single tool call feels wasteful, because most runs won't need compaction, and summarization strategies involve an LLM call.

What if we make the trigger part of the strategy interface?

```python
from abc import ABC, abstractmethod


class CompactionStrategy(ABC):
    def should_compact(self, messages: Sequence[Message]) -> bool:
        """Check if compaction is needed. Called after each tool result."""
        return True  # Default: always compact (subclasses override)

    @abstractmethod
    async def compact(self, messages: Sequence[Message]) -> list[Message]:
        ...
```

This keeps the trigger logic co-located with the strategy (a token-budget strategy knows its own budget) and avoids the LLM call overhead when compaction isn't needed. Thoughts?
Makes sense.
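For illustration, a threshold-based strategy owning its own trigger might look like this, reusing the illustrative `Message` type from above. The character-based token estimate is a crude stand-in, not the framework's tokenizer:

```python
class TokenBudgetStrategy:
    """Only compacts once the estimated token count exceeds its budget."""

    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens

    def _estimate_tokens(self, messages: Sequence[Message]) -> int:
        return sum(len(m.content) // 4 for m in messages)  # ~4 chars per token

    def should_compact(self, messages: Sequence[Message]) -> bool:
        return self._estimate_tokens(messages) > self.max_tokens

    async def compact(self, messages: Sequence[Message]) -> list[Message]:
        msgs = list(messages)
        # Drop the oldest non-system messages until under budget.
        while self._estimate_tokens(msgs) > self.max_tokens:
            idx = next((i for i, m in enumerate(msgs) if m.role != "system"), None)
            if idx is None:
                break
            msgs.pop(idx)
        return msgs
```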
> A critical constraint for any compaction strategy: **tool calls and their results must be kept together**. LLM APIs (OpenAI, Azure, etc.) require that an assistant message containing `tool_calls` is always followed by corresponding `tool` result messages. A compaction strategy that removes one without the other will cause API errors.
>
> Strategies must treat `[assistant message with tool_calls] + [tool result messages]` as atomic groups — either keep the entire group or remove it entirely.
I get that `[assistant with tool_calls]` + `[tool results]` must be treated atomically, but the doc then leaves it to each strategy to implement correctly. Isn't this too error-prone? Every custom strategy would need to re-implement the same grouping logic.

What if we had a utility?

```python
def group_atomic_messages(messages: Sequence[Message]) -> list[list[Message]]:
    """Group messages into atomic units that must be kept or removed together."""
    ...
```

Or even better, make `compact()` operate on grouped messages by default, with a lower-level `compact_raw()` for strategies that need full control. This would prevent the most common class of bugs in custom strategies, I think.
You are right, but that's an implementation detail.
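Even as an implementation detail, the utility could stay small. A possible sketch, assuming OpenAI-style roles where `tool` result messages immediately follow the assistant message carrying `tool_calls`:

```python
def group_atomic_messages(messages: Sequence[Message]) -> list[list[Message]]:
    """Group messages into atomic units that must be kept or removed together."""
    groups: list[list[Message]] = []
    for m in messages:
        head = groups[-1][0] if groups else None
        if (
            m.role == "tool"
            and head is not None
            and head.role == "assistant"
            and getattr(head, "tool_calls", None)
        ):
            groups[-1].append(m)  # keep the tool result with its tool call
        else:
            groups.append([m])
    return groups
```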
> ```python
> history = await provider.get_messages(session_id)
> compacted = await strategy.compact(history)
> ```
What happens if `compact()` raises during the tool loop? I don't see this called out in the doc. For robustness, the loop should probably catch exceptions from compaction, continue with the uncompacted messages, and log a warning rather than failing the entire `agent.run()`. Do you agree? A truncation strategy should never fail, but a summarization strategy that calls an LLM absolutely can.
Good one, will add.
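One possible shape for that fail-open fallback (illustrative, not the final API):

```python
import logging

logger = logging.getLogger(__name__)


async def safe_compact(strategy, messages: list[Message]) -> list[Message]:
    """Fail open: on compaction errors, keep the uncompacted messages."""
    try:
        return await strategy.compact(messages)
    except Exception:
        logger.warning(
            "Compaction failed; continuing with uncompacted messages",
            exc_info=True,
        )
        return messages
```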
> ### Open Questions
>
> 1. **Naming**: Should we use `CompactionStrategy`, `ChatReducer` (for .NET alignment), or `ContextReducer`?
I'd keep `CompactionStrategy`. I think it's more descriptive of what is actually happening.
> 1. **Naming**: Should we use `CompactionStrategy`, `ChatReducer` (for .NET alignment), or `ContextReducer`?
> 2. **Trigger mechanism for in-run**: Should compaction run after every tool call, or only when a threshold is exceeded (e.g., token count, message count)?
> 3. **Chaining**: Should multiple strategies be chainable (e.g., summarize then truncate)?
I feel like this comes naturally from Option 1's design. It should be built-in.

```python
class ChainedStrategy(CompactionStrategy):
    def __init__(self, *strategies: CompactionStrategy):
        self.strategies = strategies

    async def compact(self, messages):
        for strategy in self.strategies:
            messages = await strategy.compact(messages)
        return messages
```
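Usage would then read naturally. `SummarizationStrategy` is hypothetical here; `SlidingWindowStrategy` is the sketch from earlier in this thread:

```python
strategy = ChainedStrategy(
    SummarizationStrategy(),                 # hypothetical: summarize older history first
    SlidingWindowStrategy(max_messages=50),  # then enforce a hard message cap
)
compacted = await strategy.compact(messages)
```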
> ## Pros and Cons of the Options
>
> ### Option 1: Standalone `CompactionStrategy` Object
A thought in terms of observability: long-running agents with compaction will be hard to debug without visibility into when compaction happened and what was removed. Should we consider whether the strategy should return metadata (like a `CompactionResult` with messages + removed_count + summary), or whether this should be handled via logging/events? Even a simple log line ("Compacted 150 messages to 50") could be helpful.
I need to do some work there. I think it might make sense to separate the messages that are sent to the model (with compaction) from the messages that are returned to the user/session. Good logging should indeed also be part of it.
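For illustration, the metadata idea could take a shape like this (the names are assumptions, not the proposed API):

```python
from dataclasses import dataclass


@dataclass
class CompactionResult:
    """Carries what a strategy kept plus diagnostics about what it removed."""
    messages: list[Message]
    removed_count: int = 0
    summary: str | None = None


# A strategy returning this could then emit the suggested log line, e.g.:
#   logger.info("Compacted %d messages to %d", before_count, len(result.messages))
```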
## Summary

Proposes a design for context compaction in the Python Agent Framework, extracted from the open discussion in ADR-0016.

## Problem

Long-running agents with many tool calls accumulate unbounded message lists. The current architecture cannot compact messages during the tool loop — stores are read once, and middleware only modifies copies.

## Options Proposed

1. `CompactionStrategy` object — composed into `HistoryProvider` and `FunctionInvocationConfiguration`
2. `CompactionStrategy` as a mixin for `HistoryProvider` subclasses
3. `CompactionProvider` as a `ContextProvider` subclass
4. `ChatMiddleware` — refactor copy semantics

## Key Design Points

- `source_id` attribution from ADR-0016
- `HistoryProvider.save_messages()` needs an overwrite mode for storage compaction