
feat: support structured outputs (response_format) in chat completions#43

Open
giwaov wants to merge 1 commit into OpenGradient:main from giwaov:feat/structured-outputs

Conversation


@giwaov giwaov commented Mar 23, 2026

Summary

Implements OpenAI-compatible structured outputs support by wiring the response_format parameter through the chat completion pipeline, as requested in #14.

Changes

tee_gateway/controllers/chat_controller.py

  • Non-streaming path (_create_non_streaming_response): After tool binding, checks response_format. If the type is json_object or json_schema, binds it to the LangChain model via model.bind(response_format=...). The text type is a no-op (default behavior).
  • Streaming path (_create_streaming_response): Identical logic applied after tool binding.
  • TEE hash dict (_chat_request_to_dict): Includes response_format in the canonical serialized dict so the TEE signature covers the requested output format.

tests/test_structured_outputs.py

14 unit tests covering:

  • Parsing response_format from request dicts (text, json_object, json_schema, and absent)
  • Inclusion in the TEE hash dict (presence, absence, determinism, differentiation)
  • Model binding behavior (json_object binds, json_schema binds full schema, text does not bind, absent does not bind)
  • Interaction with tool calling (both bind_tools and bind(response_format=...) chain correctly)
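The hash-dict properties exercised by these tests (presence, absence, determinism, differentiation) can be sketched like this; `chat_request_to_dict` is a hypothetical stand-in for `_chat_request_to_dict` and the field set shown is illustrative:

```python
import hashlib
import json


def chat_request_to_dict(request: dict) -> dict:
    # Canonical dict; response_format is included only when present so
    # the TEE signature covers the requested output format.
    canonical = {"model": request["model"], "messages": request["messages"]}
    if "response_format" in request:
        canonical["response_format"] = request["response_format"]
    return canonical


def tee_hash(request: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps(chat_request_to_dict(request), sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```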

Design Decisions

  • No changes to llm_backend.py: The response_format is bound per-request via model.bind() after retrieving the cached model, following the same pattern already used for tool binding. This keeps the LRU cache clean (keyed only on model/temperature/max_tokens).
  • Pass-through approach: The response_format dict is forwarded as-is to LangChain, which handles provider-specific translation. This maintains OpenAI API compatibility and works with all supported providers (OpenAI, Anthropic, Google, xAI).
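A rough sketch of why per-request `bind()` keeps the cache clean: the `lru_cache` factory and stub model below are hypothetical (the real factory lives in `llm_backend.py`), but they show that binding never widens the cache key.

```python
from functools import lru_cache


class _StubModel:
    def bind(self, **kwargs):
        # bind() returns a new wrapper; the cached instance is unchanged.
        return ("bound", self, kwargs)


@lru_cache(maxsize=16)
def get_cached_model(model: str, temperature: float, max_tokens: int):
    # Cache key stays limited to model/temperature/max_tokens;
    # response_format never becomes part of it.
    return _StubModel()
```

Two requests with different `response_format` values still hit the same cached model instance and simply bind different kwargs onto it.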

Supported Formats

Per the OpenAPI spec already defined in the repo:

  • {type: text}: plain text (default, no-op)
  • {type: json_object}: JSON mode
  • {type: json_schema, json_schema: {name: ..., schema: {...}, strict: true}}: strict schema-constrained output
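As Python literals, the three formats look like the following; the schema name and body are illustrative, not taken from the repo:

```python
TEXT = {"type": "text"}  # default; no binding needed

JSON_OBJECT = {"type": "json_object"}  # JSON mode

JSON_SCHEMA = {  # strict schema-constrained output
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {"temp_c": {"type": "number"}},
            "required": ["temp_c"],
        },
        "strict": True,
    },
}
```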

Closes #14

Wire the OpenAI-compatible response_format parameter through the chat
completion pipeline:

- Bind response_format to LangChain model via model.bind() for
  json_object and json_schema types (text is a no-op)
- Apply to both streaming and non-streaming code paths
- Include response_format in the canonical request dict so TEE
  hashing covers the requested output format
- Add 14 unit tests covering parsing, hash-dict serialization,
  model binding, and interaction with tool calling

Closes OpenGradient#14
@adambalogh adambalogh requested a review from kylexqian March 23, 2026 21:36
