Added API layer for tool calls #89

Daedaelius · 2025-12-30T15:58:11Z

Summary by CodeRabbit

New Features
- Direct tool execution without LLM orchestration, returning structured results with execution time and metadata.
- Public endpoints to list and run available tools, plus actions: click, type, scroll, navigate, hover.
- Accessibility-tree retrieval and OpenAI-compatible execution result formatting.
- Execution tracing/logging surfaced for diagnostics.
Chores
- Configurable upstream branch for browser operator builds and updated bundled agent asset.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

chatgpt-codex-connector · 2025-12-30T15:58:15Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

coderabbitai · 2025-12-30T15:59:00Z

📝 Walkthrough

Walkthrough

This PR adds direct tool-execution endpoints and handlers across backend and frontend: new /v1/tools and /actions HTTP routes, a BrowserAgentServer.executeToolDirect RPC path, frontend evaluation protocol/agent support for execute_tool messages, and a GRD bundling update for ActionAgentV1.js.

Changes

Cohort / File(s)	Summary
Backend API & Tool Execution `agent-server/nodejs/src/api-server.js`	New HTTP routes: `/v1/tools/execute`, `/v1/tools/list`, `/actions/*` and `/page/accessibility-tree`. Added executeToolDirect(payload), getAvailableTools(), action handlers (click/type/scroll/navigate/hover) and logging/tracing scaffolding; standardized execution result shape.
Backend RPC Integration `agent-server/nodejs/src/lib/BrowserAgentServer.js`	Added BrowserAgentServer.prototype.executeToolDirect(connection, tool, args, timeout = 30000) to invoke DevTools RPC (`execute_tool`), generate rpcId, log details, and return/rethrow RPC responses.
Frontend: Evaluation Protocols `front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts` `front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts`	Added ToolExecutionRequest/Params, ToolExecutionSuccessResponse, ToolExecutionErrorResponse, isToolExecutionRequest type guard, and helper constructors for success/error responses.
Frontend: Evaluation Agents `front_end/panels/ai_chat/evaluation/EvaluationAgent.ts` `front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts`	Routed execute_tool messages to new handleToolExecutionRequest; look up tools in ToolRegistry, enforce timeout, execute via executeToolWithTimeout, return success/error responses and log timings.
Build config / Bundled sources `config/gni/devtools_grd_files.gni`	Replaced bundled GRD entry `ActionAgentV0.js` → `ActionAgentV1.js` in `grd_files_bundled_sources`.
Docker build `docker/Dockerfile`	Added BROWSER_OPERATOR_BRANCH build ARG and switched upstream checkout to `upstream/${BROWSER_OPERATOR_BRANCH}`; added `gn gen out/Default` regeneration step.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant APIServer
    participant BrowserAgentServer
    participant DevToolsRPC

    Client->>APIServer: POST /v1/tools/execute {tool, args}
    activate APIServer
    APIServer->>APIServer: validate payload, resolve client/tab
    APIServer->>BrowserAgentServer: executeToolDirect(connection, tool, args, timeout)
    deactivate APIServer

    activate BrowserAgentServer
    BrowserAgentServer->>BrowserAgentServer: generate rpcId, log invocation
    BrowserAgentServer->>DevToolsRPC: callMethod("execute_tool", {tool, args, timeout})
    deactivate BrowserAgentServer

    activate DevToolsRPC
    DevToolsRPC->>DevToolsRPC: execute tool, produce result
    DevToolsRPC-->>BrowserAgentServer: {success, output, executionTime}
    deactivate DevToolsRPC

    activate BrowserAgentServer
    BrowserAgentServer-->>APIServer: RPC response (result or error)
    deactivate BrowserAgentServer

    activate APIServer
    APIServer->>Client: {clientId, tabId, tool, success, output, executionTime, timestamp}
    deactivate APIServer

sequenceDiagram
    actor Frontend
    participant EvaluationAgent
    participant ToolRegistry
    participant Tool

    Frontend->>EvaluationAgent: ToolExecutionRequest {tool, args, timeout}
    activate EvaluationAgent
    EvaluationAgent->>EvaluationAgent: isToolExecutionRequest() check
    EvaluationAgent->>ToolRegistry: lookup(tool)
    activate ToolRegistry
    ToolRegistry-->>EvaluationAgent: Tool or null
    deactivate ToolRegistry

    alt Tool Found
        EvaluationAgent->>Tool: executeToolWithTimeout(args, timeout)
        activate Tool
        Tool-->>EvaluationAgent: {success, output, executionTime}
        deactivate Tool
        EvaluationAgent-->>Frontend: ToolExecutionSuccessResponse
    else Tool Not Found
        EvaluationAgent-->>Frontend: ToolExecutionErrorResponse {code, message}
    end
    deactivate EvaluationAgent

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Remove duplicate files in build #80: Touches the same config/gni/devtools_grd_files.gni bundled sources list and agent toolset entries.
Local Dockerised Eval Server #52: Overlaps on frontend LLM/evaluation protocol surface changes and agent/tool execution paths.

Suggested reviewers

tysonthomas9

"I hop and tap the console bright,
Tools hum to life in morning light,
Clicks and scrolls with nimble cheer,
LLM-free actions springing near,
A rabbit's dance—direct and light!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding an API layer for direct tool execution across multiple endpoints and implementations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (7)

docker/Makefile (2)

48-54: Note: macOS-specific Chrome path.

The chrome target uses a hardcoded path to Chrome Canary on macOS. Users on Linux or Windows will need to modify this command for their platform. Consider adding a comment noting this is macOS-specific, or document alternative commands for other platforms in the help text.

4-4: Optional: Static analysis suggestions.

The checkmake linter suggests adding conventional phony targets (all, clean, test), but these are not essential for this orchestration-focused Makefile. The warnings about target body length are also pedantic for this use case where readability benefits from keeping help text inline.

Based on static analysis hints.
agent-server/nodejs/src/lib/BrowserAgentServer.js (1)
1421-1468: Unused variable in executeToolDirect method.

The rpcId variable at line 1431 is declared but never used. Consider removing it or using it for logging/tracking purposes.
🔎 Proposed fix
   async executeToolDirect(connection, tool, args, timeout = 30000) {
-    const rpcId = `tool-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`;
-
     logger.info('Executing tool directly', {
       clientId: connection.clientId,
       tool,
       timeout
     });
front_end/panels/ai_chat/LLM/LLMClient.ts (1)
212-242: Tracing metadata fallback logic is well-implemented.

The logic correctly prioritizes explicit request.tracingMetadata over the global tracing context, with appropriate null/empty checks. This ensures tracing data flows correctly even when async context is lost.

Consider using logger.debug instead of logger.info for the detailed tracing diagnostics to reduce log verbosity in production.
🔎 Suggested log level adjustment
-      logger.info('LLMClient.call() - Checking tracing context (fallback):', {
+      logger.debug('LLMClient.call() - Checking tracing context (fallback):', {
         hasContext: !!tracingContext,
         // ...
       });
-    logger.info('LLMClient.call() - Using explicit tracingMetadata from request:', {
+    logger.debug('LLMClient.call() - Using explicit tracingMetadata from request:', {
       // ...
     });
-      logger.info('Passing tracing metadata to provider:', tracingMetadata);
+      logger.debug('Passing tracing metadata to provider:', tracingMetadata);
-      logger.info('No tracing metadata available');
+      logger.debug('No tracing metadata available');
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

354-439: Tool execution protocol definitions.

The tool execution types, type guard, and helper functions are correctly implemented. Note that these definitions are duplicated between EvaluationProtocol.ts and remote/EvaluationProtocol.ts.

Consider extracting shared protocol types to a common module to reduce duplication and ensure consistency.

The tool execution protocol types (ToolExecutionRequest, ToolExecutionParams, ToolExecutionSuccessResponse, ToolExecutionErrorResponse) and their helper functions are duplicated across:

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts

This could lead to drift over time. Consider extracting these shared types to a common module that both files can import.
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (1)
1363-1399: Consider adding tracing support for direct tool execution.

Unlike handleEvaluationRequest, this handler passes undefined for tracingContext (line 1397). This means direct tool executions won't appear in Langfuse traces, which could make debugging harder. If the request contains tracing metadata (like evaluation requests can), consider creating a minimal tracing context.
🔎 Suggested enhancement for tracing support
  private async handleToolExecutionRequest(request: ToolExecutionRequest): Promise<void> {
    const { params, id } = request;
    const startTime = Date.now();

    logger.info('Received tool execution request', {
      tool: params.tool,
      hasArgs: !!params.args,
      timeout: params.timeout || 30000
    });

+   // Create minimal tracing context if tracing metadata provided
+   const requestTracing = (params as any).tracing || {};
+   let tracingContext: TracingContext | undefined;
+   if (requestTracing.trace_id || requestTracing.session_id) {
+     tracingContext = {
+       traceId: requestTracing.trace_id || `tool-${Date.now()}`,
+       sessionId: requestTracing.session_id || `tool-session-${Date.now()}`,
+       parentObservationId: undefined,
+       metadata: requestTracing
+     };
+   }

    try {
      // Get the tool from registry
      const tool = ToolRegistry.getRegisteredTool(params.tool);
      // ... rest of implementation
      const result = await this.executeToolWithTimeout(
        tool,
        params.args || {},
        timeout,
-       undefined,
+       tracingContext,
        params.tool
      );
agent-server/nodejs/src/api-server.js (1)
558-567: Consider adding defensive handling for malformed RPC results.

The response construction assumes result has a specific structure (result.result.success, result.result.output, etc.). If executeToolDirect throws or returns an unexpected structure, line 562's result?.result?.success ?? false handles it defensively, but the overall pattern could be cleaner with explicit structure validation.
🔎 Suggested defensive extraction
+   const output = result?.result ?? {};
    return {
      clientId: baseClientId,
      tabId,
      tool,
-     success: result?.result?.success ?? false,
-     output: result?.result?.output,
-     executionTime: result?.result?.executionTime,
+     success: output.success ?? false,
+     output: output.output,
+     executionTime: output.executionTime,
      error: result?.error?.message,
      timestamp: Date.now()
    };

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74d3fc1 and 7731989.

📒 Files selected for processing (18)

agent-server/nodejs/src/api-server.js
agent-server/nodejs/src/lib/BrowserAgentServer.js
docker/Dockerfile
docker/Makefile
docker/README.md
docker/docker-compose.lightweight.yml
front_end/panels/ai_chat/LLM/LLMClient.ts
front_end/panels/ai_chat/LLM/LLMTypes.ts
front_end/panels/ai_chat/LLM/LiteLLMProvider.ts
front_end/panels/ai_chat/agent_framework/AgentRunner.ts
front_end/panels/ai_chat/core/AgentNodes.ts
front_end/panels/ai_chat/core/AgentService.ts
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts
front_end/panels/ai_chat/tools/LLMTracingWrapper.ts
front_end/panels/ai_chat/tracing/TracingProvider.ts

🧰 Additional context used

📓 Path-based instructions (1)

agent-server/nodejs/src/api-server.js

📄 CodeRabbit inference engine (agent-server/nodejs/CLAUDE.md)

agent-server/nodejs/src/api-server.js: Expose REST endpoint POST /v1/responses that accepts task input, URL, timeout, and model configuration, and returns OpenAI-compatible response with metadata
Use formatResponse() method to convert agent responses to OpenAI-compatible format and include metadata with clientId and tabId for screenshot capture
Model configuration must use canonical nested format with main_model, mini_model, and nano_model tiers, each containing provider, model, and api_key fields
POST /page/screenshot endpoint must accept clientId and tabId, use CDP Page.captureScreenshot, and return base64-encoded PNG with metadata and timestamp
POST /page/content endpoint must accept clientId, tabId, format (html or text), and includeIframes parameters; recursively capture iframe content when includeIframes is true
POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value
Accept POST /v1/responses input as either string format (simple message) or conversation array format with role and content fields; enforce at least one user message and max 100 messages/10,000 characters per message

Files:

agent-server/nodejs/src/api-server.js

🧠 Learnings (8)

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : Use Chrome DevTools Protocol (CDP) for direct browser communication including screenshot capture via Page.captureScreenshot, page content via Runtime.evaluate, and tab management via Target.createTarget/closeTarget

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js
front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
agent-server/nodejs/src/api-server.js
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js
front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts
agent-server/nodejs/src/api-server.js
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : Implement WebSocket server for browser agent connections with client lifecycle management (connect, ready, disconnect)

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : CDP communication pattern: discover endpoint via http://localhost:9223/json/version, establish WebSocket per command, send JSON-RPC 2.0, attach for tab-level ops, detach, close WebSocket after response

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/rpc-client.js : Implement JSON-RPC 2.0 protocol for bidirectional communication with request/response correlation using unique IDs, timeout handling, and error conditions

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Expose REST endpoint POST /v1/responses that accepts task input, URL, timeout, and model configuration, and returns OpenAI-compatible response with metadata

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Use formatResponse() method to convert agent responses to OpenAI-compatible format and include metadata with clientId and tabId for screenshot capture

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Model configuration must use canonical nested format with main_model, mini_model, and nano_model tiers, each containing provider, model, and api_key fields

Applied to files:

agent-server/nodejs/src/api-server.js

🧬 Code graph analysis (7)

front_end/panels/ai_chat/LLM/LLMClient.ts (1)

front_end/panels/ai_chat/tracing/TracingConfig.ts (1)

getCurrentTracingContext (299-301)

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts (7)

ToolExecutionRequest (83-88)

ToolExecutionParams (90-94)

ToolExecutionSuccessResponse (96-105)

ToolExecutionErrorResponse (107-118)

isToolExecutionRequest (211-213)

createToolExecutionSuccessResponse (301-317)

createToolExecutionErrorResponse (319-338)

front_end/panels/ai_chat/core/AgentService.ts (1)

front_end/panels/ai_chat/core/BuildConfig.ts (1)

BUILD_CONFIG (11-20)

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts (1)

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (7)

ToolExecutionRequest (358-363)

ToolExecutionParams (365-369)

ToolExecutionSuccessResponse (371-380)

ToolExecutionErrorResponse (382-393)

isToolExecutionRequest (396-402)

createToolExecutionSuccessResponse (405-421)

createToolExecutionErrorResponse (423-439)

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (4)

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (5)

isToolExecutionRequest (396-402)

ToolExecutionRequest (358-363)

createToolExecutionErrorResponse (423-439)

ErrorCodes (161-175)

createToolExecutionSuccessResponse (405-421)

front_end/panels/ai_chat/tracing/TracingConfig.ts (3)

isTracingEnabled (141-143)

setTracingConfig (134-136)

createTracingProvider (225-227)

front_end/panels/ai_chat/tracing/TracingProvider.ts (1)

TracingContext (8-31)

front_end/panels/ai_chat/agent_framework/ConfigurableAgentTool.ts (1)

ToolRegistry (221-268)

agent-server/nodejs/src/api-server.js (3)

front_end/panels/ai_chat/core/AgentService.ts (1)

logger (420-431)

front_end/panels/ai_chat/LLM/LiteLLMProvider.ts (1)

getEndpoint (48-64)

front_end/panels/ai_chat/core/LLMConfigurationManager.ts (1)

getEndpoint (222-229)

front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (2)

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts (5)

isToolExecutionRequest (211-213)

ToolExecutionRequest (83-88)

createToolExecutionErrorResponse (319-338)

ErrorCodes (181-195)

createToolExecutionSuccessResponse (301-317)

front_end/panels/ai_chat/agent_framework/ConfigurableAgentTool.ts (1)

ToolRegistry (221-268)

🪛 checkmake (0.2.2)

docker/Makefile

[warning] 6-6: Target body for "help" exceeds allowed length of 5 (14).

(maxbodylength)

[warning] 30-30: Target body for "up" exceeds allowed length of 5 (6).

(maxbodylength)

[warning] 48-48: Target body for "chrome" exceeds allowed length of 5 (6).

(maxbodylength)

[warning] 4-4: Missing required phony target "all"

(minphony)

[warning] 4-4: Missing required phony target "clean"

(minphony)

[warning] 4-4: Missing required phony target "test"

(minphony)

🔇 Additional comments (34)

docker/README.md (1)

5-7: LGTM! Clear documentation enhancement.

The deployment type descriptions and quick reference link provide helpful context for users choosing between deployment options.

front_end/panels/ai_chat/agent_framework/AgentRunner.ts (1)

752-753: LGTM! Consistent tracing metadata propagation.

The explicit forwarding of tracingMetadata from the tracing context enables Langfuse integration for AgentRunner LLM calls. The optional chaining ensures graceful handling when tracing context is unavailable.

front_end/panels/ai_chat/core/AgentNodes.ts (1)

242-243: LGTM! Consistent tracing integration.

The tracing metadata is correctly sourced from the state context and forwarded to the LLM call, maintaining consistency with the tracing propagation pattern established in AgentRunner and other components.

front_end/panels/ai_chat/tools/LLMTracingWrapper.ts (1)

88-89: LGTM! Tracing metadata forwarding in wrapper.

The explicit forwarding of tracing metadata in the callLLMWithTracing wrapper ensures that tool-initiated LLM calls include Langfuse integration metadata, completing the tracing propagation chain.

front_end/panels/ai_chat/LLM/LiteLLMProvider.ts (1)

212-216: LGTM! Proper LiteLLM metadata propagation.

The conditional addition of tracing metadata to the request payload enables LiteLLM to forward this information to Langfuse callbacks. The placement after tool configuration and the conditional check ensure clean request construction.

front_end/panels/ai_chat/tracing/TracingProvider.ts (1)

21-30: LGTM! Well-structured tracing metadata foundation.

The addition of the metadata field to TracingContext provides a clean foundation for Langfuse integration. The optional fields align with Langfuse's evaluation framework, and the index signature allows for future extensibility without interface changes.

front_end/panels/ai_chat/LLM/LLMTypes.ts (1)

204-211: Well-structured tracing metadata interface.

The tracingMetadata field is properly typed with specific Langfuse properties while allowing flexibility via the index signature. This supports end-to-end tracing propagation.

agent-server/nodejs/src/lib/BrowserAgentServer.js (1)

733-744: Tracing metadata forwarding looks correct.

The tracing metadata is properly forwarded via request.tracing || {} with appropriate debug logging showing the session_id, trace_id, and other keys.

front_end/panels/ai_chat/LLM/LLMClient.ts (2)

20-20: Import for tracing context retrieval.

The import enables fallback to global tracing context when explicit metadata is not provided.

52-52: Added tracingMetadata to LLMCallRequest interface.

This enables explicit tracing metadata to be passed through the request object, aligning with the new tracing propagation pattern.

front_end/panels/ai_chat/core/AgentService.ts (2)

279-287: AUTOMATED_MODE API key bypass is correctly implemented.

The condition !apiKey && !BUILD_CONFIG.AUTOMATED_MODE correctly allows API keys to be provided dynamically in automated mode while maintaining validation in normal mode. The fallback to empty string for apiKey is appropriate for this use case.

600-602: Tracing metadata forwarding for Langfuse session grouping.

The metadata from the evaluation context is now correctly forwarded into the tracing context, enabling proper session grouping in Langfuse.

front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (4)

23-39: New imports for tool execution protocol.

The imports are correctly added to support the new direct tool execution pathway.

180-182: Tool execution request handling in message router.

The isToolExecutionRequest type guard correctly routes direct tool execution messages to the new handler.

333-350: Tracing metadata extraction and context creation.

The tracing metadata is correctly extracted from params.tracing and used to create a TracingContext with the full metadata payload for downstream LLM calls.

611-694: Direct tool execution handler implementation.

The implementation correctly:

Validates tool existence via ToolRegistry.getRegisteredTool

Executes tools without LLM orchestration

Returns proper JSON-RPC success/error responses

Uses undefined for tracing context (appropriate for direct API calls)

One consideration: Line 648 passes undefined for the tracing context in direct tool calls. If tracing is desired for direct tool executions in the future, you may want to optionally accept tracing metadata in ToolExecutionParams.

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts (3)

132-141: Tracing metadata extension for Langfuse integration.

The tracing field in EvaluationParams enables Langfuse session grouping with well-defined properties and flexibility via the index signature.

211-213: Type guard for tool execution requests.

The type guard correctly identifies tool execution requests by checking jsonrpc === '2.0' and method === 'execute_tool'.

300-337: Helper functions for tool execution responses.

Both createToolExecutionSuccessResponse and createToolExecutionErrorResponse correctly construct JSON-RPC 2.0 compliant response objects.

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (1)

107-121: Enhanced tracing metadata with Langfuse credentials.

The tracing field in this remote protocol version includes Langfuse credential fields (langfuse_endpoint, langfuse_public_key, langfuse_secret_key) for auto-configuration when tracing is not already enabled. This is a valuable addition for seamless Langfuse integration.

docker/Dockerfile (1)

50-59: LGTM - Conditional AUTOMATED_MODE build configuration.

The implementation correctly uses a build argument to conditionally modify BuildConfig.ts. The string comparison "$AUTOMATED_MODE" = "true" is appropriate for shell conditionals. The sed pattern AUTOMATED_MODE: false matches the actual file content and will substitute correctly.

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (6)

14-14: LGTM on import additions.

The new tracing configuration imports (setTracingConfig, refreshTracingProvider) align with the auto-configuration feature introduced in this PR.

32-48: LGTM on protocol imports.

The new tool execution protocol imports are correctly added to support the direct tool execution flow.

205-207: LGTM on tool execution request handling.

The new message handler for isToolExecutionRequest properly routes to the dedicated handler method.

447-472: Tracing auto-configuration only applies once per session.

The !isTracingEnabled() guard prevents re-configuration with different Langfuse credentials in subsequent requests. If the first request enables tracing, subsequent requests with different credentials will silently use the original configuration. This may be intentional, but consider whether requests should be able to override tracing config.

Verify whether this is the intended behavior for multi-request scenarios with varying Langfuse configurations.

1373-1389: Good error handling for missing tools.

The error response correctly uses INVALID_PARAMS error code and provides clear error messaging when a tool is not found in the registry.

1421-1440: Error handling is complete and consistent.

The catch block properly calculates execution time, logs the error with context, and sends a structured error response using the protocol helpers.

agent-server/nodejs/src/api-server.js (7)

166-229: New API routes are well-structured.

The new endpoints follow consistent patterns:

Method validation with 405 responses

JSON body parsing

Delegation to handler methods

The /page/accessibility-tree endpoint correctly supports both GET and POST methods.

578-704: Static tool list is clearly documented.

The comment explaining that the actual registry lives in the DevTools frontend is helpful. The tool schemas provide useful documentation for API consumers.

1008-1023: LiteLLM endpoint fallback logic is sound.

The getEndpoint helper correctly implements the fallback chain: tier-specific endpoint → top-level endpoint → environment variable for LiteLLM provider. Based on the coding guidelines, this aligns with the expected nested model configuration format.

1066-1074: API key fallback for LiteLLM provider.

Good addition of LITELLM_API_KEY environment variable fallback when the provider is LiteLLM and no explicit key is provided.

972-976: Tracing metadata extraction is straightforward.

The extraction of requestBody.metadata for tracing purposes is clean. This allows callers to pass Langfuse session/trace IDs through the API.

1191-1235: Tracing metadata correctly propagated to evaluation request.

The createDynamicRequestNested function now accepts and forwards tracing metadata to the DevTools-side evaluation agent, enabling end-to-end Langfuse integration.

500-568: Direct tool execution endpoint is well-designed.

The implementation follows the existing patterns in the codebase:

Proper parameter validation

Clear error messages for connection issues

Structured response format with execution metadata

Based on learnings, this correctly uses CDP for browser communication through the BrowserAgentServer.

agent-server/nodejs/src/api-server.js

docker/docker-compose.lightweight.yml

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts

claude · 2025-12-30T16:04:24Z

Claude finished @Daedaelius's task —— View job

Pull Request Review

I've completed a comprehensive review of this PR focusing on code quality, security, performance, and test coverage.

Review Checklist

Analyze code structure and architecture
Review security implications of new API endpoints
Check error handling and input validation
Assess performance considerations
Review test coverage gaps
Validate automated review feedback
Provide final recommendations

Architecture Review ✅

The PR introduces a well-structured direct tool execution API that bypasses LLM orchestration, along with comprehensive Langfuse tracing integration. The architecture follows good separation of concerns:

API Layer: New endpoints in api-server.js for tool execution and common actions
Execution Layer: BrowserAgentServer.executeToolDirect() method handles DevTools RPC communication
Tracing Integration: Metadata propagation through LLM calls and evaluation flows
Docker Configuration: Lightweight deployment option with proper service separation

Security Analysis ⚠️

Critical Issues:

Overly Permissive CORS (api-server.js:61)
```
res.setHeader('Access-Control-Allow-Origin', '*');
```
- Allows any domain to make requests to the API
- Risk: CSRF attacks, unauthorized cross-origin access
- Recommendation: Restrict to specific allowed origins or implement proper authentication
Missing Input Validation (api-server.js:713-826)
- Action handlers don't validate nodeId parameter before execution
- Risk: Tool execution with invalid parameters, unclear error messages
- Affected: handleActionClick, handleActionType, handleActionHover
API Key Exposure in Logs (api-server.js:924)
```
api_key: mk?.api_key ? `${String(mk.api_key).slice(0, 4)}...` : undefined
```
- Still logs partial API keys (good: truncated, bad: still in logs)
- Risk: Sensitive data in log files
- Recommendation: Consider removing API key logging entirely or use secure log levels

Positive Security Practices:

Proper timeout handling for tool execution
Environment variable fallbacks for API keys prevent hardcoding
Error message sanitization in most places

Error Handling & Input Validation ⚠️

Issues Found:

Inconsistent Validation Pattern
- handleActionNavigate validates url but other handlers don't validate nodeId
- handleActionType validates text but not nodeId
- Missing validation could lead to unclear error messages from tool execution
Timeout Handling
- Good: Consistent 30-second default timeout across all endpoints
- Good: Additional buffer time added in BrowserAgentServer.executeToolDirect (line 1449)
- Potential issue: No maximum timeout limit enforced
Error Response Structure
- Well-structured error responses with execution metadata
- Good defensive handling in executeToolDirect (line 562): result?.result?.success ?? false

Performance Considerations ✅

Good Practices:

Connection pooling and reuse in BrowserAgentServer
Proper timeout configuration prevents hanging requests
Efficient RPC communication pattern with DevTools
Minimal overhead for tracing metadata propagation

Potential Concerns:

Large api-server.js file (1,342 lines) could benefit from modularization
Static tool list instead of dynamic discovery (documented limitation)

Test Coverage Analysis ❌

Critical Gap: No tests found for new functionality

Searched for tests covering executeToolDirect, BrowserAgentServer, or API endpoints
Existing tool tests are limited to frontend DevTools tools (ListFilesTool, ReadFileTool, etc.)
Missing: Integration tests, API endpoint tests, error condition tests
Recommendation: Add test coverage for:
- Direct tool execution flow
- API endpoint validation
- Error handling scenarios
- Docker deployment configuration

Docker & Deployment ⚠️

Issue: Linux compatibility problem

docker-compose.lightweight.yml uses CDP_HOST=host.docker.internal
This hostname is not available by default on Linux Docker hosts
Solution: Add extra_hosts: ["host.docker.internal:host-gateway"] to the agent-server service

Automated Review Validation ✅

The CodeRabbit automated review identified several valid issues that I concur with:

Docstring coverage at 25% (requirement: 80%) - Valid concern
NodeId validation missing in action handlers - Confirmed
Docker Linux compatibility - Confirmed and detailed above
Unused variables in executeToolDirect - Minor but valid

Final Recommendations

Must Fix (Security):

Implement proper CORS policy or authentication
Add nodeId validation to action handlers
Consider removing API key logging

Should Fix (Quality):

Add comprehensive test coverage for new API endpoints
Fix Docker Linux compatibility issue
Address docstring coverage gap

Consider:

Modularize large api-server.js file
Implement rate limiting for API endpoints
Add API versioning strategy

Overall Assessment: This is a well-architected feature with good separation of concerns and comprehensive tracing integration. However, security and testing concerns need to be addressed before production deployment.

Job Run | PR Branch: feat/tool-calls-api

…owser

claude · 2025-12-30T23:00:23Z

Claude finished @Daedaelius's task —— View job

Pull Request Review

I've completed a comprehensive review of this PR focusing on code quality, security, performance, and test coverage.

Review Checklist

Analyze code structure and architecture
Review security implications of new API endpoints
Check error handling and input validation
Assess performance considerations
Review test coverage gaps
Validate automated review feedback
Provide final recommendations

Architecture Review ✅

The PR introduces a well-structured direct tool execution API that bypasses LLM orchestration, along with comprehensive Langfuse tracing integration. The architecture follows good separation of concerns:

API Layer: New endpoints in api-server.js for tool execution and common actions
Execution Layer: BrowserAgentServer.executeToolDirect() method handles DevTools RPC communication
Tracing Integration: Metadata propagation through LLM calls and evaluation flows
Docker Configuration: Lightweight deployment option with proper service separation

Security Analysis ⚠️

Critical Issues:

Overly Permissive CORS (api-server.js:61)
```
res.setHeader('Access-Control-Allow-Origin', '*');
```
- Allows any domain to make requests to the API
- Risk: CSRF attacks, unauthorized cross-origin access
- Recommendation: Restrict to specific allowed origins or implement proper authentication
Missing Input Validation (api-server.js:713-826)
- Action handlers don't validate nodeId parameter before execution
- Risk: Tool execution with invalid parameters, unclear error messages
- Affected: handleActionClick, handleActionType, handleActionHover
API Key Exposure in Logs (api-server.js:924)
```
api_key: mk?.api_key ? `${String(mk.api_key).slice(0, 4)}...` : undefined
```
- Still logs partial API keys (good: truncated, bad: still in logs)
- Risk: Sensitive data in log files
- Recommendation: Consider removing API key logging entirely or use secure log levels

Positive Security Practices:

Proper timeout handling for tool execution
Environment variable fallbacks for API keys prevent hardcoding
Error message sanitization in most places

Error Handling & Input Validation ⚠️

Issues Found:

Inconsistent Validation Pattern
- handleActionNavigate validates url but other handlers don't validate nodeId
- handleActionType validates text but not nodeId
- Missing validation could lead to unclear error messages from tool execution
Timeout Handling
- Good: Consistent 30-second default timeout across all endpoints
- Good: Additional buffer time added in BrowserAgentServer.executeToolDirect (line 1449)
- Potential issue: No maximum timeout limit enforced
Error Response Structure
- Well-structured error responses with execution metadata
- Good defensive handling in executeToolDirect (line 562): result?.result?.success ?? false

Performance Considerations ✅

Good Practices:

Connection pooling and reuse in BrowserAgentServer
Proper timeout configuration prevents hanging requests
Efficient RPC communication pattern with DevTools
Minimal overhead for tracing metadata propagation

Potential Concerns:

Large api-server.js file (1,342 lines) could benefit from modularization
Static tool list instead of dynamic discovery (documented limitation)

Test Coverage Analysis ❌

Critical Gap: No tests found for new functionality

Searched for tests covering executeToolDirect, BrowserAgentServer, or API endpoints
Existing tool tests are limited to frontend DevTools tools (ListFilesTool, ReadFileTool, etc.)
Missing: Integration tests, API endpoint tests, error condition tests
Recommendation: Add test coverage for:
- Direct tool execution flow
- API endpoint validation
- Error handling scenarios
- Docker deployment configuration

Docker & Deployment ⚠️

Issue: Linux compatibility problem

docker-compose.lightweight.yml uses CDP_HOST=host.docker.internal
This hostname is not available by default on Linux Docker hosts
Solution: Add extra_hosts: ["host.docker.internal:host-gateway"] to the agent-server service

Automated Review Validation ✅

The CodeRabbit automated review identified several valid issues that I concur with:

Docstring coverage at 25% (requirement: 80%) - Valid concern
NodeId validation missing in action handlers - Confirmed
Docker Linux compatibility - Confirmed and detailed above
Unused variables in executeToolDirect - Minor but valid

Final Recommendations

Must Fix (Security):

Implement proper CORS policy or authentication
Add nodeId validation to action handlers
Consider removing API key logging

Should Fix (Quality):

Add comprehensive test coverage for new API endpoints
Fix Docker Linux compatibility issue
Address docstring coverage gap

Consider:

Modularize large api-server.js file
Implement rate limiting for API endpoints
Add API versioning strategy

Overall Assessment: This is a well-architected feature with good separation of concerns and comprehensive tracing integration. However, security and testing concerns need to be addressed before production deployment.

Job Run | PR Branch: feat/tool-calls-api

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (3)

agent-server/nodejs/src/api-server.js (3)
716-730: Missing nodeId validation for click action.

The click action requires nodeId to identify the target element, but this isn't validated before calling executeToolDirect. A missing nodeId will cause tool execution to fail with a less clear error.
🔎 Suggested fix
  async handleActionClick(payload) {
    const { clientId, tabId, nodeId, timeout = 30000 } = payload;

+   if (nodeId === undefined || nodeId === null) {
+     throw new Error('nodeId is required for click action');
+   }
+
    return this.executeToolDirect({
      clientId,
      tabId,
738-757: Missing nodeId validation for type action.

Similar to click, the type action validates text but not nodeId. Both are required for the action to succeed.
🔎 Suggested fix
  async handleActionType(payload) {
    const { clientId, tabId, nodeId, text, timeout = 30000 } = payload;

+   if (nodeId === undefined || nodeId === null) {
+     throw new Error('nodeId is required for type action');
+   }
+
    if (!text) {
      throw new Error('Text is required for type action');
    }
812-826: Missing nodeId validation for hover action.

Consistent with the pattern, hover also needs nodeId validation.
🔎 Suggested fix
  async handleActionHover(payload) {
    const { clientId, tabId, nodeId, timeout = 30000 } = payload;

+   if (nodeId === undefined || nodeId === null) {
+     throw new Error('nodeId is required for hover action');
+   }
+
    return this.executeToolDirect({

🧹 Nitpick comments (2)

docker/Makefile (1)
30-36: Consider extracting hardcoded service URLs to variables.

The service URLs are hardcoded and could become stale if port mappings change in docker-compose.lightweight.yml. While not critical, extracting these to variables or documenting the dependency would improve maintainability.
🔎 Optional refactor
+# Service ports (must match docker-compose.lightweight.yml)
+DEVTOOLS_PORT=8000
+AGENT_API_PORT=8080
+AGENT_WS_PORT=8082
+
 up: ## Start DevTools + Agent Server (Type 2 - lightweight)
 	docker-compose -f docker-compose.lightweight.yml up -d
 	@echo ""
 	@echo "Services:"
-	@echo "  DevTools:   http://localhost:8000"
-	@echo "  Agent API:  http://localhost:8080"
-	@echo "  Agent WS:   ws://localhost:8082"
+	@echo "  DevTools:   http://localhost:$(DEVTOOLS_PORT)"
+	@echo "  Agent API:  http://localhost:$(AGENT_API_PORT)"
+	@echo "  Agent WS:   ws://localhost:$(AGENT_WS_PORT)"
agent-server/nodejs/src/api-server.js (1)
765-778: Consider validating scroll direction against allowed values.

The direction parameter has a default but isn't validated against the schema's enum ['up', 'down', 'left', 'right']. While the tool execution may handle invalid values gracefully, early validation would provide clearer error messages.
🔎 Optional validation
  async handleActionScroll(payload) {
    const { clientId, tabId, direction = 'down', amount, pages, timeout = 30000 } = payload;

+   const validDirections = ['up', 'down', 'left', 'right'];
+   if (!validDirections.includes(direction)) {
+     throw new Error(`Invalid direction "${direction}". Must be one of: ${validDirections.join(', ')}`);
+   }
+
    return this.executeToolDirect({

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7731989 and 742a6b6.

📒 Files selected for processing (8)

agent-server/nodejs/src/api-server.js
agent-server/nodejs/src/lib/BrowserAgentServer.js
docker/Makefile
docker/docker-compose.lightweight.yml
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts
front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts

🚧 Files skipped from review as they are similar to previous changes (3)

docker/docker-compose.lightweight.yml
front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts
front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts

🧰 Additional context used

📓 Path-based instructions (1)

agent-server/nodejs/src/api-server.js

📄 CodeRabbit inference engine (agent-server/nodejs/CLAUDE.md)

agent-server/nodejs/src/api-server.js: Expose REST endpoint POST /v1/responses that accepts task input, URL, timeout, and model configuration, and returns OpenAI-compatible response with metadata
Use formatResponse() method to convert agent responses to OpenAI-compatible format and include metadata with clientId and tabId for screenshot capture
Model configuration must use canonical nested format with main_model, mini_model, and nano_model tiers, each containing provider, model, and api_key fields
POST /page/screenshot endpoint must accept clientId and tabId, use CDP Page.captureScreenshot, and return base64-encoded PNG with metadata and timestamp
POST /page/content endpoint must accept clientId, tabId, format (html or text), and includeIframes parameters; recursively capture iframe content when includeIframes is true
POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value
Accept POST /v1/responses input as either string format (simple message) or conversation array format with role and content fields; enforce at least one user message and max 100 messages/10,000 characters per message

Files:

agent-server/nodejs/src/api-server.js

🧠 Learnings (9)

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : Use Chrome DevTools Protocol (CDP) for direct browser communication including screenshot capture via Page.captureScreenshot, page content via Runtime.evaluate, and tab management via Target.createTarget/closeTarget

Applied to files:

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
agent-server/nodejs/src/api-server.js
agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value

Applied to files:

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts
front_end/panels/ai_chat/evaluation/EvaluationAgent.ts
agent-server/nodejs/src/api-server.js
agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Use formatResponse() method to convert agent responses to OpenAI-compatible format and include metadata with clientId and tabId for screenshot capture

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Expose REST endpoint POST /v1/responses that accepts task input, URL, timeout, and model configuration, and returns OpenAI-compatible response with metadata

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : POST /page/screenshot endpoint must accept clientId and tabId, use CDP Page.captureScreenshot, and return base64-encoded PNG with metadata and timestamp

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/client-manager.js : Each browser tab must be identified by composite client ID in format baseClientId:tabId

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : Implement WebSocket server for browser agent connections with client lifecycle management (connect, ready, disconnect)

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/rpc-client.js : Implement JSON-RPC 2.0 protocol for bidirectional communication with request/response correlation using unique IDs, timeout handling, and error conditions

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : CDP communication pattern: discover endpoint via http://localhost:9223/json/version, establish WebSocket per command, send JSON-RPC 2.0, attach for tab-level ops, detach, close WebSocket after response

Applied to files:

agent-server/nodejs/src/lib/BrowserAgentServer.js

🧬 Code graph analysis (2)

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (4)

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (5)

isToolExecutionRequest (396-402)

ToolExecutionRequest (358-363)

createToolExecutionErrorResponse (423-439)

ErrorCodes (161-175)

createToolExecutionSuccessResponse (405-421)

front_end/panels/ai_chat/ui/EvaluationDialog.ts (1)

message (1612-1623)

front_end/panels/ai_chat/ui/AIChatPanel.ts (4)

logger (958-1032)

logger (1151-1164)

logger (1208-1211)

error (1544-1557)

front_end/panels/ai_chat/agent_framework/ConfigurableAgentTool.ts (1)

ToolRegistry (221-268)

front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (4)

front_end/panels/ai_chat/evaluation/EvaluationProtocol.ts (5)

isToolExecutionRequest (211-213)

ToolExecutionRequest (83-88)

createToolExecutionErrorResponse (319-338)

ErrorCodes (181-195)

createToolExecutionSuccessResponse (301-317)

front_end/panels/ai_chat/evaluation/remote/EvaluationProtocol.ts (5)

isToolExecutionRequest (396-402)

ToolExecutionRequest (358-363)

createToolExecutionErrorResponse (423-439)

ErrorCodes (161-175)

createToolExecutionSuccessResponse (405-421)

front_end/panels/ai_chat/ui/AIChatPanel.ts (4)

logger (958-1032)

logger (1151-1164)

logger (1208-1211)

error (1544-1557)

front_end/panels/ai_chat/agent_framework/ConfigurableAgentTool.ts (1)

ToolRegistry (221-268)

🪛 checkmake (0.2.2)

docker/Makefile

[warning] 6-6: Target body for "help" exceeds allowed length of 5 (14).

(maxbodylength)

[warning] 30-30: Target body for "up" exceeds allowed length of 5 (6).

(maxbodylength)

[warning] 48-48: Target body for "chrome" exceeds allowed length of 5 (6).

(maxbodylength)

[warning] 4-4: Missing required phony target "all"

(minphony)

[warning] 4-4: Missing required phony target "clean"

(minphony)

[warning] 4-4: Missing required phony target "test"

(minphony)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: claude-review

🔇 Additional comments (12)

docker/Makefile (2)

6-21: LGTM!

The help target provides clear deployment guidance. The static analysis warning about target body length is pedantic; help targets inherently require more lines to document usage properly.

22-23: LGTM!

The build, devtools-up, down, logs, and status targets are well-implemented:

The down and logs targets use good fallback patterns with error suppression (2>/dev/null || true), ensuring graceful handling when either stack isn't running.

The status target effectively filters containers by name.

All targets are simple, focused, and correct.

Also applies to: 25-28, 38-43, 45-46

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts (2)

32-32: LGTM: Tool execution protocol imports added correctly.

The imports for direct tool execution support are properly aligned with the protocol definitions.

Also applies to: 38-38, 47-48

205-207: LGTM: Message routing integrated correctly.

The routing for ToolExecutionRequest follows the established pattern for other message types.

front_end/panels/ai_chat/evaluation/EvaluationAgent.ts (3)

23-25: LGTM: Tool execution imports added correctly.

The imports for direct tool execution are properly structured and aligned with the protocol.

Also applies to: 30-30, 38-39

180-182: LGTM: Message routing follows established patterns.

The routing for ToolExecutionRequest is consistent with other message type handlers.

611-694: LGTM: Direct tool execution implemented correctly.

The method correctly:

Uses specific error codes (INVALID_TOOL, TOOL_EXECUTION_ERROR)

Reuses executeToolWithTimeout for consistent timeout handling

Intentionally skips tracing, navigation, and retries (per design comments)

Includes appropriate logging for debugging

The implementation is clean and follows the established patterns in the codebase.

agent-server/nodejs/src/api-server.js (5)

166-229: LGTM!

The new route definitions are well-structured with proper method validation, body parsing, and async handling.

513-568: LGTM!

The method has comprehensive validation of required fields, connection state, and provides clear error messages. The structured response format is appropriate.

578-704: LGTM!

The static tool registry is well-documented with comprehensive JSON Schema definitions. The comment appropriately notes that dynamic discovery would use the DevTools connection.

787-804: LGTM!

The navigate action properly validates the required url parameter and has appropriate error messaging.

834-854: LGTM!

The accessibility tree endpoint properly validates required fields and correctly handles the optional maxDepth parameter.

coderabbitai · 2025-12-30T23:05:55Z

agent-server/nodejs/src/lib/BrowserAgentServer.js

+  /**
+   * Execute a tool directly on a connected DevTools client
+   * This bypasses LLM orchestration and calls the tool directly
+   * @param {Object} connection - DevTools WebSocket connection
+   * @param {string} tool - Tool name (e.g., 'perform_action', 'navigate_url')
+   * @param {Object} args - Tool-specific arguments
+   * @param {number} timeout - Execution timeout in milliseconds
+   * @returns {Promise<Object>} Tool execution result
+   */
+  async executeToolDirect(connection, tool, args, timeout = 30000) {
+    const rpcId = `tool-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`;
+
+    logger.info('Executing tool directly', {
+      clientId: connection.clientId,
+      tool,
+      timeout
+    });
+
+    try {
+      // Prepare RPC request for execute_tool method
+      const response = await connection.rpcClient.callMethod(
+        connection.ws,
+        'execute_tool',
+        {
+          tool,
+          args,
+          timeout
+        },
+        timeout + 5000 // Add buffer for network overhead
+      );
+
+      logger.info('Tool execution completed', {
+        clientId: connection.clientId,
+        tool,
+        success: response?.result?.success
+      });
+
+      return response;
+
+    } catch (error) {
+      logger.error('Tool execution failed', {
+        clientId: connection.clientId,
+        tool,
+        error: error.message
+      });
+      throw error;
+    }
+  }


⚠️ Potential issue | 🟡 Minor

Verify connection readiness before tool execution.

The method doesn't validate that the connection is in a ready state before attempting RPC communication. If the connection isn't ready, the RPC call will fail with a less informative error.

🔎 Suggested validation

async executeToolDirect(connection, tool, args, timeout = 30000) { const rpcId = `tool-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`; + if (!connection.ready) { + throw new Error(`Connection for client ${connection.clientId} is not ready for tool execution`); + } + logger.info('Executing tool directly', { clientId: connection.clientId, tool, timeout });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/**

* Execute a tool directly on a connected DevTools client

* This bypasses LLM orchestration and calls the tool directly

* @param {Object} connection - DevTools WebSocket connection

* @param {string} tool - Tool name (e.g., 'perform_action', 'navigate_url')

* @param {Object} args - Tool-specific arguments

* @param {number} timeout - Execution timeout in milliseconds

* @returns {Promise<Object>} Tool execution result

*/

async executeToolDirect(connection, tool, args, timeout = 30000) {

const rpcId = `tool-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`;

logger.info('Executing tool directly', {

clientId: connection.clientId,

tool,

timeout

});

try {

// Prepare RPC request for execute_tool method

const response = await connection.rpcClient.callMethod(

connection.ws,

'execute_tool',

{

tool,

args,

timeout

},

timeout + 5000 // Add buffer for network overhead

);

logger.info('Tool execution completed', {

clientId: connection.clientId,

tool,

success: response?.result?.success

});

return response;

} catch (error) {

logger.error('Tool execution failed', {

clientId: connection.clientId,

tool,

error: error.message

});

throw error;

}

}

/**

* Execute a tool directly on a connected DevTools client

* This bypasses LLM orchestration and calls the tool directly

* @param {Object} connection - DevTools WebSocket connection

* @param {string} tool - Tool name (e.g., 'perform_action', 'navigate_url')

* @param {Object} args - Tool-specific arguments

* @param {number} timeout - Execution timeout in milliseconds

* @returns {Promise<Object>} Tool execution result

*/

async executeToolDirect(connection, tool, args, timeout = 30000) {

const rpcId = `tool-${Date.now()}-${Math.random().toString(36).substring(2, 9)}`;

if (!connection.ready) {

throw new Error(`Connection for client ${connection.clientId} is not ready for tool execution`);

}

logger.info('Executing tool directly', {

clientId: connection.clientId,

tool,

timeout

});

try {

// Prepare RPC request for execute_tool method

const response = await connection.rpcClient.callMethod(

connection.ws,

'execute_tool',

{

tool,

args,

timeout

},

timeout + 5000 // Add buffer for network overhead

);

logger.info('Tool execution completed', {

clientId: connection.clientId,

tool,

success: response?.result?.success

});

return response;

} catch (error) {

logger.error('Tool execution failed', {

clientId: connection.clientId,

tool,

error: error.message

});

throw error;

}

}

docker/Makefile

coderabbitai · 2025-12-30T23:05:55Z

front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts

+  /**
+   * Handle direct tool execution requests without LLM orchestration
+   */
+  private async handleToolExecutionRequest(request: ToolExecutionRequest): Promise<void> {
+    const { params, id } = request;
+    const startTime = Date.now();
+
+    logger.info('Received tool execution request', {
+      tool: params.tool,
+      hasArgs: !!params.args,
+      timeout: params.timeout || 30000
+    });
+
+    try {
+      // Get the tool from registry
+      const tool = ToolRegistry.getRegisteredTool(params.tool);
+      if (!tool) {
+        const errorResponse = createToolExecutionErrorResponse(
+          id,
+          ErrorCodes.INVALID_PARAMS,
+          `Tool not found: ${params.tool}`,
+          params.tool,
+          `Tool '${params.tool}' is not registered`
+        );
+        if (this.client) {
+          this.client.send(errorResponse);
+        }
+        logger.error('Tool not found', { tool: params.tool });
+        return;
+      }
+
+      // Execute the tool with timeout
+      const timeout = params.timeout || 30000;
+      const result = await this.executeToolWithTimeout(
+        tool,
+        params.args || {},
+        timeout,
+        undefined,
+        params.tool
+      );
+
+      const executionTime = Date.now() - startTime;
+
+      // Send success response
+      const successResponse = createToolExecutionSuccessResponse(
+        id,
+        params.tool,
+        result,
+        executionTime
+      );
+
+      if (this.client) {
+        this.client.send(successResponse);
+      }
+
+      logger.info('Tool execution completed', {
+        tool: params.tool,
+        executionTime,
+        hasResult: !!result
+      });
+
+    } catch (error) {
+      const executionTime = Date.now() - startTime;
+      logger.error('Tool execution failed', {
+        tool: params.tool,
+        error: error instanceof Error ? error.message : String(error),
+        executionTime
+      });
+
+      const errorResponse = createToolExecutionErrorResponse(
+        id,
+        ErrorCodes.INTERNAL_ERROR,
+        'Tool execution failed',
+        params.tool,
+        error instanceof Error ? error.message : String(error)
+      );
+
+      if (this.client) {
+        this.client.send(errorResponse);
+      }
+    }
+  }


⚠️ Potential issue | 🟡 Minor

Use more specific error codes for consistency.

The error responses use generic error codes instead of the more specific codes available:

Line 1380 uses ErrorCodes.INVALID_PARAMS but should use ErrorCodes.INVALID_TOOL (-32004)

Line 1432 uses ErrorCodes.INTERNAL_ERROR but should use ErrorCodes.TOOL_EXECUTION_ERROR (-32000)

The non-remote version (EvaluationAgent.ts lines 631, 684) correctly uses these specific codes. Using consistent, specific error codes helps API consumers distinguish between different failure modes.

🔎 Proposed fix

const tool = ToolRegistry.getRegisteredTool(params.tool); if (!tool) { const errorResponse = createToolExecutionErrorResponse( id, - ErrorCodes.INVALID_PARAMS, + ErrorCodes.INVALID_TOOL, `Tool not found: ${params.tool}`, params.tool, `Tool '${params.tool}' is not registered` );

const errorResponse = createToolExecutionErrorResponse( id, - ErrorCodes.INTERNAL_ERROR, + ErrorCodes.TOOL_EXECUTION_ERROR, 'Tool execution failed', params.tool, error instanceof Error ? error.message : String(error) );

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/**

* Handle direct tool execution requests without LLM orchestration

*/

private async handleToolExecutionRequest(request: ToolExecutionRequest): Promise<void> {

const { params, id } = request;

const startTime = Date.now();

logger.info('Received tool execution request', {

tool: params.tool,

hasArgs: !!params.args,

timeout: params.timeout || 30000

});

try {

// Get the tool from registry

const tool = ToolRegistry.getRegisteredTool(params.tool);

if (!tool) {

const errorResponse = createToolExecutionErrorResponse(

id,

ErrorCodes.INVALID_PARAMS,

`Tool not found: ${params.tool}`,

params.tool,

`Tool '${params.tool}' is not registered`

);

if (this.client) {

this.client.send(errorResponse);

}

logger.error('Tool not found', { tool: params.tool });

return;

}

// Execute the tool with timeout

const timeout = params.timeout || 30000;

const result = await this.executeToolWithTimeout(

tool,

params.args || {},

timeout,

undefined,

params.tool

);

const executionTime = Date.now() - startTime;

// Send success response

const successResponse = createToolExecutionSuccessResponse(

id,

params.tool,

result,

executionTime

);

if (this.client) {

this.client.send(successResponse);

}

logger.info('Tool execution completed', {

tool: params.tool,

executionTime,

hasResult: !!result

});

} catch (error) {

const executionTime = Date.now() - startTime;

logger.error('Tool execution failed', {

tool: params.tool,

error: error instanceof Error ? error.message : String(error),

executionTime

});

const errorResponse = createToolExecutionErrorResponse(

id,

ErrorCodes.INTERNAL_ERROR,

'Tool execution failed',

params.tool,

error instanceof Error ? error.message : String(error)

);

if (this.client) {

this.client.send(errorResponse);

}

}

}

/**

* Handle direct tool execution requests without LLM orchestration

*/

private async handleToolExecutionRequest(request: ToolExecutionRequest): Promise<void> {

const { params, id } = request;

const startTime = Date.now();

logger.info('Received tool execution request', {

tool: params.tool,

hasArgs: !!params.args,

timeout: params.timeout || 30000

});

try {

// Get the tool from registry

const tool = ToolRegistry.getRegisteredTool(params.tool);

if (!tool) {

const errorResponse = createToolExecutionErrorResponse(

id,

ErrorCodes.INVALID_TOOL,

`Tool not found: ${params.tool}`,

params.tool,

`Tool '${params.tool}' is not registered`

);

if (this.client) {

this.client.send(errorResponse);

}

logger.error('Tool not found', { tool: params.tool });

return;

}

// Execute the tool with timeout

const timeout = params.timeout || 30000;

const result = await this.executeToolWithTimeout(

tool,

params.args || {},

timeout,

undefined,

params.tool

);

const executionTime = Date.now() - startTime;

// Send success response

const successResponse = createToolExecutionSuccessResponse(

id,

params.tool,

result,

executionTime

);

if (this.client) {

this.client.send(successResponse);

}

logger.info('Tool execution completed', {

tool: params.tool,

executionTime,

hasResult: !!result

});

} catch (error) {

const executionTime = Date.now() - startTime;

logger.error('Tool execution failed', {

tool: params.tool,

error: error instanceof Error ? error.message : String(error),

executionTime

});

const errorResponse = createToolExecutionErrorResponse(

id,

ErrorCodes.TOOL_EXECUTION_ERROR,

'Tool execution failed',

params.tool,

error instanceof Error ? error.message : String(error)

);

if (this.client) {

this.client.send(errorResponse);

}

}

}

🤖 Prompt for AI Agents

In front_end/panels/ai_chat/evaluation/remote/EvaluationAgent.ts around lines 1361-1442, the error responses use generic codes; replace the INVALID_PARAMS code used when a tool is not found (around line ~1380) with ErrorCodes.INVALID_TOOL (-32004), and replace the INTERNAL_ERROR code used in the catch block (around line ~1432) with ErrorCodes.TOOL_EXECUTION_ERROR (-32000); update the createToolExecutionErrorResponse calls accordingly so the errorResponse for missing tools and for execution failures use the specific codes.

coderabbitai

Actionable comments posted: 1

Fix all issues with AI Agents 🤖

In @agent-server/nodejs/src/api-server.js:
- Around line 834-854: The getAccessibilityTree method is calling
executeToolDirect with the wrong tool name; update the tool argument in
getAccessibilityTree to 'get_accessibility_tree' (instead of 'get_page_content')
so executeToolDirect invokes the correct tool; keep the rest of the payload
(clientId, tabId, args.maxDepth, timeout) unchanged and ensure the change is
made inside the async getAccessibilityTree function where executeToolDirect is
invoked.

🧹 Nitpick comments (2)

agent-server/nodejs/src/api-server.js (2)
513-527: Consider adding validation for the args parameter.

While clientId, tabId, and tool are validated, the args parameter is not checked. Some tools may require specific arguments, and passing invalid or missing args could result in less clear error messages from the downstream tool execution.
🔎 Optional validation improvement
  async executeToolDirect(payload) {
    const { clientId, tabId, tool, args, timeout = 30000 } = payload;

    if (!clientId) {
      throw new Error('Client ID is required');
    }

    if (!tabId) {
      throw new Error('Tab ID is required');
    }

    if (!tool) {
      throw new Error('Tool name is required');
    }
+
+   // Validate args is an object if provided
+   if (args !== undefined && (typeof args !== 'object' || args === null)) {
+     throw new Error('Tool arguments must be an object');
+   }

    const baseClientId = clientId.split(':')[0];
765-779: Consider validating the direction parameter.

The scroll action defaults direction to 'down' but doesn't validate it against the allowed enum values ('up', 'down', 'left', 'right') defined in the tool schema at line 634. Invalid directions will fail during tool execution with potentially unclear error messages.
🔎 Optional validation improvement
  async handleActionScroll(payload) {
    const { clientId, tabId, direction = 'down', amount, pages, timeout = 30000 } = payload;

+   const validDirections = ['up', 'down', 'left', 'right'];
+   if (!validDirections.includes(direction)) {
+     throw new Error(`Invalid scroll direction: "${direction}". Must be one of: ${validDirections.join(', ')}`);
+   }
+
    return this.executeToolDirect({

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 742a6b6 and 49216cf.

📒 Files selected for processing (1)

agent-server/nodejs/src/api-server.js

🧰 Additional context used

📓 Path-based instructions (1)

agent-server/nodejs/src/api-server.js

📄 CodeRabbit inference engine (agent-server/nodejs/CLAUDE.md)

agent-server/nodejs/src/api-server.js: Expose REST endpoint POST /v1/responses that accepts task input, URL, timeout, and model configuration, and returns OpenAI-compatible response with metadata
Use formatResponse() method to convert agent responses to OpenAI-compatible format and include metadata with clientId and tabId for screenshot capture
Model configuration must use canonical nested format with main_model, mini_model, and nano_model tiers, each containing provider, model, and api_key fields
POST /page/screenshot endpoint must accept clientId and tabId, use CDP Page.captureScreenshot, and return base64-encoded PNG with metadata and timestamp
POST /page/content endpoint must accept clientId, tabId, format (html or text), and includeIframes parameters; recursively capture iframe content when includeIframes is true
POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value
Accept POST /v1/responses input as either string format (simple message) or conversation array format with role and content fields; enforce at least one user message and max 100 messages/10,000 characters per message

Files:

agent-server/nodejs/src/api-server.js

🧠 Learnings (7)

📓 Common learnings

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : Use Chrome DevTools Protocol (CDP) for direct browser communication including screenshot capture via Page.captureScreenshot, page content via Runtime.evaluate, and tab management via Target.createTarget/closeTarget

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Browser must be started with remote debugging enabled via --remote-debugging-port=9223 with CDP endpoint at http://localhost:9223/json/version

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : POST /page/execute endpoint must accept clientId, tabId, expression, returnByValue, and awaitPromise; use CDP Runtime.evaluate and return result with type and value

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/lib/EvalServer.js : Use Chrome DevTools Protocol (CDP) for direct browser communication including screenshot capture via Page.captureScreenshot, page content via Runtime.evaluate, and tab management via Target.createTarget/closeTarget

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : POST /page/screenshot endpoint must accept clientId and tabId, use CDP Page.captureScreenshot, and return base64-encoded PNG with metadata and timestamp

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/client-manager.js : Each browser tab must be identified by composite client ID in format baseClientId:tabId

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Expose REST endpoint POST /v1/responses that accepts task input, URL, timeout, and model configuration, and returns OpenAI-compatible response with metadata

Applied to files:

agent-server/nodejs/src/api-server.js

📚 Learning: 2025-12-07T00:27:56.465Z

Learnt from: CR
Repo: BrowserOperator/browser-operator-core PR: 0
File: agent-server/nodejs/CLAUDE.md:0-0
Timestamp: 2025-12-07T00:27:56.465Z
Learning: Applies to agent-server/nodejs/src/api-server.js : Use formatResponse() method to convert agent responses to OpenAI-compatible format and include metadata with clientId and tabId for screenshot capture

Applied to files:

agent-server/nodejs/src/api-server.js

🧬 Code graph analysis (1)

agent-server/nodejs/src/api-server.js (2)

front_end/legacy_test_runner/test_runner/TestRunner.js (1)

res (851-851)

front_end/panels/ai_chat/core/AgentService.ts (1)

logger (420-431)

🔇 Additional comments (5)

agent-server/nodejs/src/api-server.js (5)

166-229: LGTM on the new endpoint routing.

The new tool execution and action endpoints follow a consistent pattern with proper method validation and error handling. The accessibility tree endpoint correctly supports both GET and POST methods with appropriate parameter parsing.

578-704: LGTM on the static tool registry.

The static tool list with schemas provides a reasonable API surface for tool discovery. The comment appropriately notes that the authoritative registry lives in the DevTools frontend.

716-730: Missing nodeId validation for click action.

The click action requires nodeId but this isn't validated before calling executeToolDirect, leading to unclear failures downstream.

Likely an incorrect or invalid review comment.

812-826: Missing nodeId validation for hover action.

Consistent with the pattern, hover also needs nodeId validation before calling executeToolDirect.

Likely an incorrect or invalid review comment.

738-757: Missing nodeId validation for type action.

Similar to click, the type action validates text but not nodeId. Both are required for successful execution.

Likely an incorrect or invalid review comment.

coderabbitai · 2026-01-06T02:14:25Z

agent-server/nodejs/src/api-server.js

+  async getAccessibilityTree(payload) {
+    const { clientId, tabId, maxDepth } = payload;
+
+    if (!clientId) {
+      throw new Error('Client ID is required');
+    }
+
+    if (!tabId) {
+      throw new Error('Tab ID is required');
+    }
+
+    return this.executeToolDirect({
+      clientId,
+      tabId,
+      tool: 'get_page_content',
+      args: {
+        maxDepth
+      },
+      timeout: 30000
+    });
+  }


⚠️ Potential issue | 🔴 Critical

Critical: Wrong tool name used for accessibility tree.

Line 848 calls executeToolDirect with tool: 'get_page_content', but based on the method name, endpoint path, and the tool registry at line 650, it should use tool: 'get_accessibility_tree'. This will execute the wrong tool and return incorrect data.

🔎 Proposed fix

async getAccessibilityTree(payload) { const { clientId, tabId, maxDepth } = payload; if (!clientId) { throw new Error('Client ID is required'); } if (!tabId) { throw new Error('Tab ID is required'); } return this.executeToolDirect({ clientId, tabId, - tool: 'get_page_content', + tool: 'get_accessibility_tree', args: { maxDepth }, timeout: 30000 }); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

async getAccessibilityTree(payload) {

const { clientId, tabId, maxDepth } = payload;

if (!clientId) {

throw new Error('Client ID is required');

}

if (!tabId) {

throw new Error('Tab ID is required');

}

return this.executeToolDirect({

clientId,

tabId,

tool: 'get_page_content',

args: {

maxDepth

},

timeout: 30000

});

}

async getAccessibilityTree(payload) {

const { clientId, tabId, maxDepth } = payload;

if (!clientId) {

throw new Error('Client ID is required');

}

if (!tabId) {

throw new Error('Tab ID is required');

}

return this.executeToolDirect({

clientId,

tabId,

tool: 'get_accessibility_tree',

args: {

maxDepth

},

timeout: 30000

});

}

🤖 Prompt for AI Agents

In @agent-server/nodejs/src/api-server.js around lines 834-854, The getAccessibilityTree method is calling executeToolDirect with the wrong tool name; update the tool argument in getAccessibilityTree to 'get_accessibility_tree' (instead of 'get_page_content') so executeToolDirect invokes the correct tool; keep the rest of the payload (clientId, tabId, args.maxDepth, timeout) unchanged and ensure the change is made inside the async getAccessibilityTree function where executeToolDirect is invoked.

# Conflicts: # docker/Makefile # docker/docker-compose.lightweight.yml

claude · 2026-01-16T00:09:57Z

Claude finished @Daedaelius's task —— View job

I'll analyze this and get back to you.

The GRD file had incorrect references: - Removed: ActionAgentV0.js (file doesn't exist) - Added: ActionAgentV1.js (file exists but was missing from list) This fixes the GN build error: ERROR Input to target not generated by a dependency. The file: ActionAgentV0.js is listed but no targets generate that file. And the assert_grd error: File ActionAgentV1.js is not listed in devtools_grd_files.gni. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-17T14:41:17Z

Claude finished @Daedaelius's task —— View job

Pull Request Review

I've completed a comprehensive review of this PR focusing on code quality, security, performance, and test coverage.

Review Checklist

Analyze code structure and architecture
Review security implications of new API endpoints
Check error handling and input validation
Assess performance considerations
Review test coverage gaps
Validate automated review feedback
Provide final recommendations

Architecture Review ✅

The PR introduces a well-structured direct tool execution API that bypasses LLM orchestration, along with comprehensive Langfuse tracing integration. The architecture follows good separation of concerns:

API Layer: New endpoints in api-server.js for tool execution and common actions
Execution Layer: BrowserAgentServer.executeToolDirect() method handles DevTools RPC communication
Tracing Integration: Metadata propagation through LLM calls and evaluation flows
Docker Configuration: Lightweight deployment option with proper service separation

Security Analysis ⚠️

Critical Issues:

Overly Permissive CORS (api-server.js:61)
```
res.setHeader('Access-Control-Allow-Origin', '*');
```
- Allows any domain to make requests to the API
- Risk: CSRF attacks, unauthorized cross-origin access
- Recommendation: Restrict to specific allowed origins or implement proper authentication
Missing Input Validation (api-server.js:713-826)
- Action handlers don't validate nodeId parameter before execution
- Risk: Tool execution with invalid parameters, unclear error messages
- Affected: handleActionClick, handleActionType, handleActionHover
API Key Exposure in Logs (api-server.js:924)
```
api_key: mk?.api_key ? `${String(mk.api_key).slice(0, 4)}...` : undefined
```
- Still logs partial API keys (good: truncated, bad: still in logs)
- Risk: Sensitive data in log files
- Recommendation: Consider removing API key logging entirely or use secure log levels

Positive Security Practices:

Proper timeout handling for tool execution
Environment variable fallbacks for API keys prevent hardcoding
Error message sanitization in most places

Error Handling & Input Validation ⚠️

Issues Found:

Inconsistent Validation Pattern
- handleActionNavigate validates url but other handlers don't validate nodeId
- handleActionType validates text but not nodeId
- Missing validation could lead to unclear error messages from tool execution
Timeout Handling
- Good: Consistent 30-second default timeout across all endpoints
- Good: Additional buffer time added in BrowserAgentServer.executeToolDirect (line 1449)
- Potential issue: No maximum timeout limit enforced
Error Response Structure
- Well-structured error responses with execution metadata
- Good defensive handling in executeToolDirect (line 562): result?.result?.success ?? false

Performance Considerations ✅

Good Practices:

Connection pooling and reuse in BrowserAgentServer
Proper timeout configuration prevents hanging requests
Efficient RPC communication pattern with DevTools
Minimal overhead for tracing metadata propagation

Potential Concerns:

Large api-server.js file (1,342 lines) could benefit from modularization
Static tool list instead of dynamic discovery (documented limitation)

Test Coverage Analysis ❌

Critical Gap: No tests found for new functionality

Searched for tests covering executeToolDirect, BrowserAgentServer, or API endpoints
Existing tool tests are limited to frontend DevTools tools (ListFilesTool, ReadFileTool, etc.)
Missing: Integration tests, API endpoint tests, error condition tests
Recommendation: Add test coverage for:
- Direct tool execution flow
- API endpoint validation
- Error handling scenarios
- Docker deployment configuration

Docker & Deployment ⚠️

Issue: Linux compatibility problem

docker-compose.lightweight.yml uses CDP_HOST=host.docker.internal
This hostname is not available by default on Linux Docker hosts
Solution: Add extra_hosts: ["host.docker.internal:host-gateway"] to the agent-server service

Automated Review Validation ✅

The CodeRabbit automated review identified several valid issues that I concur with:

Docstring coverage at 25% (requirement: 80%) - Valid concern
NodeId validation missing in action handlers - Confirmed
Docker Linux compatibility - Confirmed and detailed above
Unused variables in executeToolDirect - Minor but valid

Final Recommendations

Must Fix (Security):

Implement proper CORS policy or authentication
Add nodeId validation to action handlers
Consider removing API key logging

Should Fix (Quality):

Add comprehensive test coverage for new API endpoints
Fix Docker Linux compatibility issue
Address docstring coverage gap

Consider:

Modularize large api-server.js file
Implement rate limiting for API endpoints
Add API versioning strategy

Overall Assessment: This is a well-architected feature with good separation of concerns and comprehensive tracing integration. However, security and testing concerns need to be addressed before production deployment.

Job Run | PR Branch: feat/tool-calls-api

…PI for better debugging from Claude Code

claude · 2026-01-21T16:00:37Z

Claude finished @Daedaelius's task —— View job

Pull Request Review

I've completed a comprehensive review of this PR focusing on code quality, security, performance, and test coverage.

Review Checklist

Analyze code structure and architecture
Review security implications of new API endpoints
Check error handling and input validation
Assess performance considerations
Review test coverage gaps
Validate automated review feedback
Provide final recommendations

Architecture Review ✅

The PR introduces a well-structured direct tool execution API that bypasses LLM orchestration, along with comprehensive Langfuse tracing integration. The architecture follows good separation of concerns:

API Layer: New endpoints in api-server.js for tool execution and common actions
Execution Layer: BrowserAgentServer.executeToolDirect() method handles DevTools RPC communication
Tracing Integration: Metadata propagation through LLM calls and evaluation flows
Docker Configuration: Lightweight deployment option with proper service separation

Security Analysis ⚠️

Critical Issues:

Overly Permissive CORS (api-server.js:61)
```
res.setHeader('Access-Control-Allow-Origin', '*');
```
- Allows any domain to make requests to the API
- Risk: CSRF attacks, unauthorized cross-origin access
- Recommendation: Restrict to specific allowed origins or implement proper authentication
Missing Input Validation (api-server.js:713-826)
- Action handlers don't validate nodeId parameter before execution
- Risk: Tool execution with invalid parameters, unclear error messages
- Affected: handleActionClick, handleActionType, handleActionHover
API Key Exposure in Logs (api-server.js:924)
```
api_key: mk?.api_key ? `${String(mk.api_key).slice(0, 4)}...` : undefined
```
- Still logs partial API keys (good: truncated, bad: still in logs)
- Risk: Sensitive data in log files
- Recommendation: Consider removing API key logging entirely or use secure log levels

Positive Security Practices:

Proper timeout handling for tool execution
Environment variable fallbacks for API keys prevent hardcoding
Error message sanitization in most places

Error Handling & Input Validation ⚠️

Issues Found:

Inconsistent Validation Pattern
- handleActionNavigate validates url but other handlers don't validate nodeId
- handleActionType validates text but not nodeId
- Missing validation could lead to unclear error messages from tool execution
Timeout Handling
- Good: Consistent 30-second default timeout across all endpoints
- Good: Additional buffer time added in BrowserAgentServer.executeToolDirect (line 1449)
- Potential issue: No maximum timeout limit enforced
Error Response Structure
- Well-structured error responses with execution metadata
- Good defensive handling in executeToolDirect (line 562): result?.result?.success ?? false

Performance Considerations ✅

Good Practices:

Connection pooling and reuse in BrowserAgentServer
Proper timeout configuration prevents hanging requests
Efficient RPC communication pattern with DevTools
Minimal overhead for tracing metadata propagation

Potential Concerns:

Large api-server.js file (1,342 lines) could benefit from modularization
Static tool list instead of dynamic discovery (documented limitation)

Test Coverage Analysis ❌

Critical Gap: No tests found for new functionality

Searched for tests covering executeToolDirect, BrowserAgentServer, or API endpoints
Existing tool tests are limited to frontend DevTools tools (ListFilesTool, ReadFileTool, etc.)
Missing: Integration tests, API endpoint tests, error condition tests
Recommendation: Add test coverage for:
- Direct tool execution flow
- API endpoint validation
- Error handling scenarios
- Docker deployment configuration

Docker & Deployment ⚠️

Issue: Linux compatibility problem

docker-compose.lightweight.yml uses CDP_HOST=host.docker.internal
This hostname is not available by default on Linux Docker hosts
Solution: Add extra_hosts: ["host.docker.internal:host-gateway"] to the agent-server service

Automated Review Validation ✅

The CodeRabbit automated review identified several valid issues that I concur with:

Docstring coverage at 25% (requirement: 80%) - Valid concern
NodeId validation missing in action handlers - Confirmed
Docker Linux compatibility - Confirmed and detailed above
Unused variables in executeToolDirect - Minor but valid

Final Recommendations

Must Fix (Security):

Implement proper CORS policy or authentication
Add nodeId validation to action handlers
Consider removing API key logging

Should Fix (Quality):

Add comprehensive test coverage for new API endpoints
Fix Docker Linux compatibility issue
Address docstring coverage gap

Consider:

Modularize large api-server.js file
Implement rate limiting for API endpoints
Add API versioning strategy

Overall Assessment: This is a well-architected feature with good separation of concerns and comprehensive tracing integration. However, security and testing concerns need to be addressed before production deployment.

Job Run | PR Branch: feat/tool-calls-api

coderabbitai bot reviewed Dec 30, 2025

View reviewed changes

Daedaelius changed the title ~~Feat/tool calls api~~ Added API layer for tool calls Dec 30, 2025

olesho and others added 3 commits December 30, 2025 16:59

Added tracing metadata for LiteLLM provider

49a6e24

Added lightweight deployment: dockerized devtools panel + external br…

356fb31

…owser

Tool calls expose as Agent Server API

742a6b6

Daedaelius force-pushed the feat/tool-calls-api branch from afb8d2a to 742a6b6 Compare December 30, 2025 22:59

coderabbitai bot reviewed Dec 30, 2025

View reviewed changes

Fix tool name

49216cf

coderabbitai bot reviewed Jan 6, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into feat/tool-calls-api

fccbc85

# Conflicts: # docker/Makefile # docker/docker-compose.lightweight.yml

Daedaelius and others added 2 commits January 16, 2026 22:46

Merge GRD fix into feat/tool-calls-api

b5c08ad

Fixed Docker image and agent server to support tool-calling through A…

3f204a2

…PI for better debugging from Claude Code

Added API layer for tool calls #89

Are you sure you want to change the base?

Added API layer for tool calls #89

Conversation

Daedaelius commented Dec 30, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

chatgpt-codex-connector bot commented Dec 30, 2025

Uh oh!

coderabbitai bot commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude bot commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Review

Review Checklist

Architecture Review ✅

Security Analysis ⚠️

Error Handling & Input Validation ⚠️

Performance Considerations ✅

Test Coverage Analysis ❌

Docker & Deployment ⚠️

Automated Review Validation ✅

Final Recommendations

Uh oh!

claude bot commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Review

Review Checklist

Architecture Review ✅

Security Analysis ⚠️

Error Handling & Input Validation ⚠️

Performance Considerations ✅

Test Coverage Analysis ❌

Docker & Deployment ⚠️

Automated Review Validation ✅

Final Recommendations

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Review

Review Checklist

Architecture Review ✅

Security Analysis ⚠️

Error Handling & Input Validation ⚠️

Performance Considerations ✅

Daedaelius commented Dec 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 30, 2025 •

edited

Loading

claude bot commented Dec 30, 2025 •

edited

Loading

claude bot commented Dec 30, 2025 •

edited

Loading

claude bot commented Jan 16, 2026 •

edited

Loading

claude bot commented Jan 17, 2026 •

edited

Loading

claude bot commented Jan 21, 2026 •

edited

Loading