@Kaedeser Kaedeser commented Jan 9, 2026

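Fixes issue 719. The likely cause is that, during validation/import, the vectorizer config is read without its type (or as an empty config), so the framework falls back to constructing the base class. That triggers pyhocon's missing-key exception, which is ultimately wrapped as invalid vectorizer config: 'No configuration setting found for key name'. I have changed the code to accept both the vectorizer and vectorize_model spellings and to write the dimension back to both keys, so that later steps do not read an empty value.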

@Kaedeser Kaedeser changed the title Enhance vectorizer config handling in project.py Enhance vectorizer config handling in project.py#719 Jan 9, 2026
@Kaedeser Kaedeser changed the title Enhance vectorizer config handling in project.py#719 Enhance vectorizer config handling in project.py Jan 9, 2026
Collaborator

@whfcarter whfcarter left a comment

If both keys exist with different values, the code prioritizes vectorizer but does not warn users about the config conflict.
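One way to surface such a conflict is sketched below. It is only an illustration: the helper name pick_vectorizer_config and the logger are hypothetical and do not come from the PR, and the wording of the warning is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def pick_vectorizer_config(config: dict) -> dict:
    """Hypothetical helper: choose a vectorizer config and warn on conflict."""
    primary = config.get("vectorizer") or {}
    legacy = config.get("vectorize_model") or {}
    # Warn only when both keys are set and actually disagree.
    if primary and legacy and primary != legacy:
        logger.warning(
            "Both 'vectorizer' and 'vectorize_model' are set with different "
            "values; using 'vectorizer'."
        )
    # Keep the precedence described in the review: vectorizer wins.
    return primary or legacy
```

With this shape the precedence stays the same as in the PR, and the only added behavior is the log message.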

vectorize_model_config_checker = VectorizeModelConfigChecker()
llm_config = config.get("chat_llm", {})
vectorize_model_config = config.get("vectorizer", {})
vectorize_model_config = config.get("vectorizer", {}) or config.get(
Collaborator

An empty dict is falsy, so an empty vectorizer config triggers the second operand of 'or'.

"vectorize_model", {}
)
if "vectorizer" not in config and "vectorize_model" in config:
    config["vectorizer"] = config.get("vectorize_model", {})
Collaborator

The .get() call with a default is unnecessary here, since the key's existence was already checked.

Suggest changing to:

config["vectorizer"] = config["vectorize_model"]
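The two inline remarks above can be checked in isolation. The snippet below is a minimal sketch with made-up config values (the "type": "example" entry is an assumption, not taken from the project); it only demonstrates standard Python semantics for `or` and for dict indexing after a membership test.

```python
config = {"vectorizer": {}, "vectorize_model": {"type": "example"}}

# {} is falsy, so `or` evaluates and returns its second operand.
chosen = config.get("vectorizer", {}) or config.get("vectorize_model", {})
print(chosen)  # {'type': 'example'}

# Once membership has been checked, plain indexing cannot raise KeyError,
# so a .get() with a default adds nothing.
legacy_only = {"vectorize_model": {"type": "example"}}
if "vectorizer" not in legacy_only and "vectorize_model" in legacy_only:
    legacy_only["vectorizer"] = legacy_only["vectorize_model"]
print(legacy_only["vectorizer"] is legacy_only["vectorize_model"])  # True
```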

Author

@Kaedeser Kaedeser left a comment

I have made the fixes according to the suggestions and resubmitted.

Collaborator

@whfcarter whfcarter left a comment

Overall this is still somewhat redundant; the following may serve as a reference.

# Unified config retrieval and normalization logic
vectorizer_config = config.get("vectorizer", {})
vectorize_model_config = config.get("vectorize_model", {})

# Pick the non-empty config; if both are present, prefer vectorizer
if vectorizer_config:
    vectorize_model_config = vectorizer_config
elif vectorize_model_config:
    vectorizer_config = vectorize_model_config
else:
    vectorizer_config = vectorize_model_config = {}

# Write back to both keys (no write-back needed if both are empty)
if vectorizer_config or vectorize_model_config:
    config["vectorizer"] = vectorizer_config
    config["vectorize_model"] = vectorize_model_config

try:
    llm_config_checker.check(json.dumps(llm_config))
    dim = vectorize_model_config_checker.check(json.dumps(vectorizer_config))

    # Only write the dimension when the config exists and is a dict
    if isinstance(config.get("vectorizer"), dict):
        config["vectorizer"]["vector_dimensions"] = dim
    if isinstance(config.get("vectorize_model"), dict):
        config["vectorize_model"]["vector_dimensions"] = dim
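For experimenting with the normalization idea outside the project, a self-contained sketch might look like the following. StubVectorizeChecker is a hypothetical stand-in for VectorizeModelConfigChecker (whose real behavior is not shown in this thread), the 768 default is a made-up value, and skipping validation when neither key is set is an assumption; the actual code may prefer to fail in that case.

```python
import json


class StubVectorizeChecker:
    """Hypothetical stand-in for VectorizeModelConfigChecker."""

    def check(self, raw: str) -> int:
        cfg = json.loads(raw)
        if "type" not in cfg:
            raise ValueError("invalid vectorizer config: missing 'type'")
        # 768 is a made-up default used only for this sketch.
        return int(cfg.get("vector_dimensions", 768))


def normalize_vectorizer(config: dict, checker) -> dict:
    # Prefer "vectorizer"; fall back to "vectorize_model" when it is empty or absent.
    chosen = config.get("vectorizer") or config.get("vectorize_model") or {}
    if not isinstance(chosen, dict) or not chosen:
        return config  # assumption: nothing to validate or write back
    chosen["vector_dimensions"] = checker.check(json.dumps(chosen))
    # Both keys point at the same dict, so the dimension is visible under either name.
    config["vectorizer"] = config["vectorize_model"] = chosen
    return config


cfg = normalize_vectorizer({"vectorize_model": {"type": "example"}}, StubVectorizeChecker())
print(cfg["vectorizer"]["vector_dimensions"])  # 768
```

Because both keys end up referencing one dict, the dimension only has to be written once, which removes the need for two separate isinstance checks.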

Collaborator

@whfcarter whfcarter left a comment

LGTM
