-
Notifications
You must be signed in to change notification settings - Fork 655
Enhance vectorizer config handling in project.py #735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
解决issue719问题。原因应该是向量化器配置在校验/导入时没有拿到 type (或拿到的是空配置),于是框架退回去尝试按基类构造,触发 pyhocon 的缺键异常,最终包装成 invalid vectorizer config: 'No configuration setting found for key name' 。我已经做出修改同时兼容 vectorizer 和 vectorize_model 两种写法,并把维度写回两边,避免后续流程再读空:
whfcarter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both keys exist with different values, the code prioritizes vectorizer but doesn't warn users about config conflicts
knext/command/sub_command/project.py
Outdated
| vectorize_model_config_checker = VectorizeModelConfigChecker() | ||
| llm_config = config.get("chat_llm", {}) | ||
| vectorize_model_config = config.get("vectorizer", {}) | ||
| vectorize_model_config = config.get("vectorizer", {}) or config.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty dict triggers the second part of 'or'
knext/command/sub_command/project.py
Outdated
| "vectorize_model", {} | ||
| ) | ||
| if "vectorizer" not in config and "vectorize_model" in config: | ||
| config["vectorizer"] = config.get("vectorize_model", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is unnecessary since the key existence was already checked.
Suggest changing to
config["vectorizer"] = config["vectorize_model"]
Kaedeser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hav fixed according to the suggestions and resubmit.
whfcarter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
整体写的还是有点冗余,可以参考。
统一的配置获取和规范化逻辑
vectorizer_config = config.get("vectorizer", {})
vectorize_model_config = config.get("vectorize_model", {})
选择非空配置,如果都存在则优先使用 vectorizer
if vectorizer_config:
vectorize_model_config = vectorizer_config
elif vectorize_model_config:
vectorizer_config = vectorize_model_config
else:
vectorizer_config = vectorize_model_config = {}
统一写回(如果都为空就不需要写回)
if vectorizer_config or vectorize_model_config:
config["vectorizer"] = vectorizer_config
config["vectorize_model"] = vectorize_model_config
try:
llm_config_checker.check(json.dumps(llm_config))
dim = vectorize_model_config_checker.check(json.dumps(vectorizer_config))
# 只有在配置存在且为字典时才写入维度
if isinstance(config.get("vectorizer"), dict):
config["vectorizer"]["vector_dimensions"] = dim
if isinstance(config.get("vectorize_model"), dict):
config["vectorize_model"]["vector_dimensions"] = dim
whfcarter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
解决issue719问题。原因应该是向量化器配置在校验/导入时没有拿到 type (或拿到的是空配置),于是框架退回去尝试按基类构造,触发 pyhocon 的缺键异常,最终包装成 invalid vectorizer config: 'No configuration setting found for key name' 。我已经做出修改同时兼容 vectorizer 和 vectorize_model 两种写法,并把维度写回两边,避免后续流程再读空