
@dbschmigelski
Member

This PR proposes a decision record aligning with the outcome of the review of the following document.


Should new features implement HookProvider?

Problem Statement

We are introducing RetryStrategy as a first-class parameter in the Agent constructor. RetryStrategy determines whether and how to retry failed model invocations or tool executions. Like SessionManager and ConversationManager, RetryStrategy will leverage HookProvider to integrate with the agent lifecycle. The team has agreed that the public interface should be retry_strategy: RetryStrategy rather than the more generic retry_strategy: HookProvider. The open question is whether the RetryStrategy interface itself should extend HookProvider to perform retries via the hook system, or whether it should expose a simpler contract such as should_retry(exception: Exception) -> bool with the framework handling invocation internally.

While RetryStrategy provides the immediate context for this decision, the question applies broadly to all future interfaces we introduce. Should new capabilities like guardrails, rate limiters, cost trackers, or audit loggers extend HookProvider as SessionManager and ConversationManager do today? Or should we establish a different pattern where domain interfaces remain pure and the framework handles lifecycle integration internally? The decision we make for RetryStrategy will set precedent for the SDK's architectural direction.

This discussion assumes HookProvider is a valid internal mechanism for lifecycle coordination. Whether hooks are the right architectural choice for the framework is a separate question. The question here is narrower: when the framework uses hooks internally, should that implementation detail be exposed in user-facing interfaces?

Context: HookProvider as Internal Primitive

Today, both SessionManager and ConversationManager implement HookProvider. This architectural choice enabled rapid iteration because new capabilities could be built using the same extensibility mechanism available to external contributors. When we needed session persistence, we did not invent a new lifecycle integration pattern; we registered callbacks for AgentInitializedEvent, MessageAddedEvent, and AfterInvocationEvent. The hook system became our internal composition primitive.
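The registration pattern described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the event classes and HookRegistry below are simplified stand-ins with assumed fields, and InMemorySessionManager is a hypothetical name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


# Simplified stand-ins for the SDK's event types (assumed shapes, not the real ones).
@dataclass
class AgentInitializedEvent:
    agent_id: str


@dataclass
class MessageAddedEvent:
    message: str


@dataclass
class AfterInvocationEvent:
    agent_id: str


class HookRegistry:
    """Stand-in registry: stores callbacks per event type and runs them in order."""

    def __init__(self) -> None:
        self._callbacks: dict[type, list[Callable[[Any], None]]] = defaultdict(list)

    def add_callback(self, event_type: type, callback: Callable[[Any], None]) -> None:
        self._callbacks[event_type].append(callback)

    def invoke(self, event: Any) -> None:
        for callback in self._callbacks[type(event)]:
            callback(event)


class InMemorySessionManager:
    """Toy session manager whose only integration point is register_hooks."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -> None:
        registry.add_callback(AgentInitializedEvent, self._restore)
        registry.add_callback(MessageAddedEvent, self._append)
        registry.add_callback(AfterInvocationEvent, self._sync)

    def _restore(self, event: AgentInitializedEvent) -> None:
        self.log.append(f"restore:{event.agent_id}")

    def _append(self, event: MessageAddedEvent) -> None:
        self.log.append(f"append:{event.message}")

    def _sync(self, event: AfterInvocationEvent) -> None:
        self.log.append(f"sync:{event.agent_id}")


# The "agent" side: register once, then emit events as the lifecycle advances.
registry = HookRegistry()
manager = InMemorySessionManager()
manager.register_hooks(registry)
for event in (AgentInitializedEvent("a1"), MessageAddedEvent("hello"), AfterInvocationEvent("a1")):
    registry.invoke(event)
```

Note that nothing in the sketch calls a save method directly; persistence is driven entirely by which events the agent emits.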

This approach follows the principle of dogfooding. By building features through the public extensibility API, we validate that API continuously and ensure external developers have access to the same power we do. It also reinforces a key architectural property, inversion of control. Martin Fowler describes this as the distinction between a framework and a library: frameworks call your code rather than the other way around. Hooks give us that inversion uniformly, whether the code is internal or external.

The Case for Extending HookProvider

Consistency argues for HookProvider. If SessionManager persists state via hooks and ConversationManager reduces context via hooks, then RetryStrategy performing retries via hooks maintains a uniform mental model. Developers who understand one component understand the composition pattern of all components. This aligns with our tenet of composability: primitives are building blocks with each other, and each feature is developed with all other features in mind.

Under this approach, the interface would look like the following:

from abc import ABC, abstractmethod
from typing import Any

# HookProvider, HookRegistry, and AfterModelCallEvent are the SDK's hook types.


class RetryStrategy(HookProvider, ABC):
    """Base class for retry strategies that integrate via the hook system."""

    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -> None:
        # Subscribe to the event fired after every model call.
        registry.add_callback(AfterModelCallEvent, self._on_model_error)

    def _on_model_error(self, event: AfterModelCallEvent) -> None:
        # Setting event.retry asks the framework to repeat the model call.
        if self.should_retry_model(event.exception, event.attempt):
            event.retry = True

    @abstractmethod
    def should_retry_model(self, exception: Exception, attempt: int) -> bool:
        """Return True if the model call should be retried based on this exception."""
        ...

Implementing HookProvider also honors our commitment to being extensible by design. A RetryStrategy could respond to multiple events such as AfterModelCallEvent and AfterToolCallEvent, adjust its behavior based on telemetry events, or compose with other hook providers. The Open-Closed Principle suggests we should design for extension, and HookProvider provides that extension surface.

The Case Against: Leaky Abstractions and Implicit Behavior

The counterargument centers on abstraction boundaries. When a user configures session_manager=FileSessionManager(), they cannot easily reason about when persistence occurs. The session is not saved through an explicit session_manager.save() call; instead, persistence happens implicitly through hook callbacks responding to events the user may not know exist. This implicit behavior directly challenges our tenet that the obvious path is the happy path. If understanding when sessions persist requires knowledge of hook registration and event dispatch, we have made the non-obvious path the only path.

Joshua Bloch's guidance on API design emphasizes that interfaces should be easy to use correctly and hard to use incorrectly. A RetryStrategy with a should_retry method is self-documenting:

from abc import ABC, abstractmethod


class RetryStrategy(ABC):
    """Simple interface for retry decisions."""

    @abstractmethod
    def should_retry(self, exception: Exception, attempt: int) -> bool:
        """Return True if the operation should be retried."""
        ...

The framework calls should_retry, the strategy returns a boolean, and the framework acts accordingly. The user need not understand AfterModelCallEvent, hook registration, or callback ordering. This simplicity embodies our tenet that simple things should be simple. A retry strategy is conceptually simple; its interface should reflect that simplicity rather than exposing the machinery of lifecycle integration.
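A minimal sketch of that division of labor, assuming a framework-side loop named call_with_retry and a MaxAttemptsRetry strategy (both hypothetical, introduced only for illustration). The strategy decides; the loop acts.

```python
from abc import ABC, abstractmethod
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryStrategy(ABC):
    """The simple interface from the document."""

    @abstractmethod
    def should_retry(self, exception: Exception, attempt: int) -> bool:
        ...


class MaxAttemptsRetry(RetryStrategy):
    """Hypothetical strategy: retry timeouts until max_attempts calls have been made."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts

    def should_retry(self, exception: Exception, attempt: int) -> bool:
        return isinstance(exception, TimeoutError) and attempt < self.max_attempts


def call_with_retry(operation: Callable[[], T], strategy: RetryStrategy) -> T:
    """Assumed framework-side loop: the strategy only decides, it never acts."""
    attempt = 1
    while True:
        try:
            return operation()
        except Exception as exc:
            if not strategy.should_retry(exc, attempt):
                raise
            attempt += 1


# Usage: a model call that times out twice before succeeding.
calls = {"count": 0}


def flaky_model_call() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("model timed out")
    return "ok"


result = call_with_retry(flaky_model_call, MaxAttemptsRetry(max_attempts=3))
```

The strategy author never sees an event object or a registry; the only contract is the exception and the attempt number.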

The Deeper Question: Primitive or Leak?

The core tension is whether HookProvider is a base primitive upon which everything should be built, or whether its internal usage constitutes an abstraction leak that we have simply normalized.

The hook-based approach has enabled customers to solve problems the framework did not anticipate. Consider a real-world example where a team needed to run guardrail validation before session persistence. Because hook ordering is determined by registration order, they embedded guardrail registration inside their session manager subclass:

class SessionManagerWithGuardrails(BaseSessionManager):
    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -> None:
        # Register guardrail first so it runs before persistence
        registry.add_callback(MessageAddedEvent, self._guardrail.on_message_added)
        # Then let the base session manager add its callbacks
        super().register_hooks(registry, **kwargs)

The HookProvider abstraction gave them the power to solve their problem without waiting for framework changes. This is extensibility by design working as intended.

However, this flexibility comes with tradeoffs. The workaround violates separation of concerns: a session manager is an unexpected place for guardrail configuration. A reader must understand hook registration sequencing to understand why guardrails appear in a session manager class. If the base class changes its registration order, the workaround may silently break. The question is whether these tradeoffs are acceptable costs of extensibility, or whether they indicate the abstraction is leaking implementation details that force unrelated concerns to merge.
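The ordering dependency can be made concrete with a small sketch. The registry and event below are simplified stand-ins that assume callbacks run in registration order, and the redacting guardrail is hypothetical. The same two callbacks produce different persisted data depending only on the order in which they were registered, and nothing raises an error in the unguarded case.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class MessageAddedEvent:
    # Stand-in event; the real SDK event carries more fields.
    message: str


class HookRegistry:
    """Stand-in registry that invokes callbacks in registration order."""

    def __init__(self) -> None:
        self._callbacks: dict[type, list[Callable[[Any], None]]] = defaultdict(list)

    def add_callback(self, event_type: type, callback: Callable[[Any], None]) -> None:
        self._callbacks[event_type].append(callback)

    def invoke(self, event: Any) -> None:
        for callback in self._callbacks[type(event)]:
            callback(event)


def run(register_guardrail_first: bool) -> list[str]:
    """Register a guardrail and a persistence callback, then emit one message."""
    persisted: list[str] = []

    def guardrail(event: MessageAddedEvent) -> None:
        # Hypothetical guardrail: redact a sensitive word in place.
        event.message = event.message.replace("secret", "[redacted]")

    def persist(event: MessageAddedEvent) -> None:
        persisted.append(event.message)

    registry = HookRegistry()
    ordered = [guardrail, persist] if register_guardrail_first else [persist, guardrail]
    for callback in ordered:
        registry.add_callback(MessageAddedEvent, callback)

    registry.invoke(MessageAddedEvent("my secret"))
    return persisted


guarded = run(register_guardrail_first=True)     # guardrail runs before persistence
unguarded = run(register_guardrail_first=False)  # persistence sees the raw message
```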

Consider Java's Comparator interface. A comparator exposes compare(a, b) and the framework calls it at the appropriate time. The comparator knows nothing about sorting internals or when comparisons occur. A simple RetryStrategy follows the same pattern: expose should_retry and let the framework call it. Extending HookProvider inverts this relationship, requiring the user to understand the event system, register callbacks, and respond to event objects just to implement retry logic.

Exposing HookProvider in an interface is not inherently a leak; it depends on what the interface requires. SessionManager legitimately needs to respond to initialization, message addition, and invocation completion as separate events. If we chose not to have SessionManager implement HookProvider, we would introduce abstract methods like on_agent_initialized, on_message_added, and on_invocation_complete, which are strongly coupled to the same lifecycle events, just with extra indirection. RetryStrategy is different: it has a single decision point, so a simple should_retry method captures the entire domain without mirroring event structure. Exposing HookProvider becomes a leak when we apply it uniformly rather than asking what each interface actually requires.
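To illustrate the "extra indirection" point, here is a sketch of what a hook-free SessionManager might look like. The event classes are simplified stand-ins, and bind_session_manager and RecordingSessionManager are hypothetical names. The framework-side glue ends up being a one-to-one table from events to methods, which is the hook registration in another form.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable


# Simplified stand-in events (assumed fields).
@dataclass
class AgentInitializedEvent:
    agent_id: str


@dataclass
class MessageAddedEvent:
    message: str


@dataclass
class AfterInvocationEvent:
    agent_id: str


class SessionManager(ABC):
    """Hook-free variant: one abstract method per lifecycle event."""

    @abstractmethod
    def on_agent_initialized(self, agent_id: str) -> None: ...

    @abstractmethod
    def on_message_added(self, message: str) -> None: ...

    @abstractmethod
    def on_invocation_complete(self, agent_id: str) -> None: ...


def bind_session_manager(manager: SessionManager) -> dict[type, Callable[[Any], None]]:
    """Hypothetical framework glue: each event maps to exactly one method."""
    return {
        AgentInitializedEvent: lambda e: manager.on_agent_initialized(e.agent_id),
        MessageAddedEvent: lambda e: manager.on_message_added(e.message),
        AfterInvocationEvent: lambda e: manager.on_invocation_complete(e.agent_id),
    }


class RecordingSessionManager(SessionManager):
    """Toy implementation that records which methods were called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def on_agent_initialized(self, agent_id: str) -> None:
        self.calls.append("initialized")

    def on_message_added(self, message: str) -> None:
        self.calls.append("message")

    def on_invocation_complete(self, agent_id: str) -> None:
        self.calls.append("complete")


recorder = RecordingSessionManager()
dispatch = bind_session_manager(recorder)
for event in (AgentInitializedEvent("a1"), MessageAddedEvent("hi"), AfterInvocationEvent("a1")):
    dispatch[type(event)](event)
```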

Recommendation

While RetryStrategy is simple enough that a non-HookProvider interface would suffice, for consistency with SessionManager and ConversationManager, RetryStrategy will extend HookProvider. This maintains a uniform pattern across all agent constructor parameters that integrate with the lifecycle: users implementing any of these interfaces learn one composition model.

Decision Framework for Future Interfaces

When introducing a new internal interface, use the following criteria to determine whether it should extend HookProvider or expose a simple domain interface.

Consider a simple interface (not extending HookProvider) when:
  1. The interface has a single, well-defined responsibility that can be expressed as one or two methods with no lifecycle coordination.
  2. The interface does not need to respond to multiple lifecycle events. If the capability involves a single decision point rather than reacting to initialization, message flow, and invocation completion separately, HookProvider adds complexity without benefit.
  3. Consistency with existing interfaces is not a priority. If the new interface stands alone or establishes a new pattern, simplicity may outweigh uniformity.

Use HookProvider (or extend it) when:
  1. The capability inherently requires responding to multiple distinct lifecycle events. SessionManager must act on initialization, message addition, and invocation completion. This multi-event coordination is the domain, not an implementation detail.
  2. Users are expected to customize not just behavior but timing. If users need to decide which events to subscribe to, or need to register additional callbacks beyond what the base class provides, HookProvider exposure is appropriate.

When uncertain, prefer the simple interface. A simple interface can always be evolved to extend HookProvider if real use cases emerge. The opposite migration path breaks existing implementations.
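One way the simple-to-hook migration could work is an adapter that wraps an existing strategy, sketched below. The AfterModelCallEvent and HookRegistry here are simplified stand-ins using the fields from the earlier example, and RetryHookAdapter and RetryOnce are hypothetical names. Existing should_retry implementations keep working unchanged because the adapter, not the strategy, talks to the hook system.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AfterModelCallEvent:
    # Stand-in with the fields used in the document's hook-based example.
    exception: Optional[Exception]
    attempt: int
    retry: bool = False


class HookRegistry:
    """Stand-in registry that invokes callbacks in registration order."""

    def __init__(self) -> None:
        self._callbacks: dict[type, list[Callable[[Any], None]]] = defaultdict(list)

    def add_callback(self, event_type: type, callback: Callable[[Any], None]) -> None:
        self._callbacks[event_type].append(callback)

    def invoke(self, event: Any) -> None:
        for callback in self._callbacks[type(event)]:
            callback(event)


class RetryStrategy(ABC):
    """The simple interface: no knowledge of events or registries."""

    @abstractmethod
    def should_retry(self, exception: Exception, attempt: int) -> bool: ...


class RetryHookAdapter:
    """Hypothetical wrapper that gives a simple strategy hook-based behavior."""

    def __init__(self, strategy: RetryStrategy) -> None:
        self._strategy = strategy

    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -> None:
        registry.add_callback(AfterModelCallEvent, self._on_model_call)

    def _on_model_call(self, event: AfterModelCallEvent) -> None:
        # Only consult the strategy when the call actually failed.
        if event.exception is not None and self._strategy.should_retry(
            event.exception, event.attempt
        ):
            event.retry = True


class RetryOnce(RetryStrategy):
    """Toy strategy: allow a single retry after the first failed attempt."""

    def should_retry(self, exception: Exception, attempt: int) -> bool:
        return attempt < 2


registry = HookRegistry()
RetryHookAdapter(RetryOnce()).register_hooks(registry)

first_failure = AfterModelCallEvent(exception=RuntimeError("throttled"), attempt=1)
second_failure = AfterModelCallEvent(exception=RuntimeError("throttled"), attempt=2)
success = AfterModelCallEvent(exception=None, attempt=1)
for event in (first_failure, second_failure, success):
    registry.invoke(event)
```

Going the other way has no equivalent wrapper: a strategy written against register_hooks may subscribe to arbitrary events, so it cannot be reduced mechanically to a single should_retry call.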

@strands-agent
Contributor

Documentation Deployment Complete

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-457/

@dbschmigelski dbschmigelski merged commit 38a4c79 into strands-agents:main Jan 22, 2026
4 checks passed