fix: cypher prefix matching in embeddings query #301

Open
portwebdesign wants to merge 2 commits into vitali87:main from portwebdesign:fix/embeddings-query-prefix

@portwebdesign

This change avoids string concatenation inside the Cypher query by passing the fully-formed prefix from Python.
It prevents type errors like Invalid types: bool and string for '+' when m.qualified_name contains unexpected values.

Copilot AI review requested due to automatic review settings February 7, 2026 00:31
@gemini-code-assist

Summary of Changes

Hello @portwebdesign, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request resolves a bug in the Cypher query responsible for retrieving embeddings by adjusting how project name prefixes are handled. By eliminating an internal string concatenation within the Cypher query and instead relying on the Python application to provide the complete prefix, the change significantly improves the query's stability and prevents specific type-related errors, ensuring more dependable data retrieval.

Highlights

  • Cypher Query Refinement: The CYPHER_QUERY_EMBEDDINGS in codebase_rag/constants.py has been modified to remove the + '.' concatenation from the m.qualified_name STARTS WITH $project_name clause. The full prefix is now expected to be passed directly from Python.
  • Error Prevention: This change prevents potential type errors, such as "Invalid types: bool and string for '+'", which could arise from string concatenation within the Cypher query when m.qualified_name contains unexpected values.

Changelog

  • codebase_rag/constants.py
    • Removed the + '.' concatenation from the m.qualified_name STARTS WITH $project_name clause within the CYPHER_QUERY_EMBEDDINGS constant.

Activity

  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

@greptile-apps

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

  • Adjusts the Cypher embeddings query to avoid string concatenation in STARTS WITH by relying on a fully-formed $project_name prefix parameter.
  • Intended to prevent Cypher type errors when m.qualified_name contains unexpected values.
  • This query is consumed by GraphUpdater._generate_semantic_embeddings() to fetch functions/methods for embedding generation and storage.

Confidence Score: 2/5

  • Not safe to merge as-is due to a likely behavioral mismatch that can stop embeddings generation.
  • The Cypher query change alters how the module prefix is matched, but the only known caller still appends a '.' to the parameter; this inconsistent contract can cause the query to return no rows in normal operation. Aside from that, the change is small and localized.
  • Pay close attention to codebase_rag/constants.py, and verify that the call site in codebase_rag/graph_updater.py uses the intended prefix format.
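
The mismatch described above comes down to which side appends the dot. A minimal sketch, using Python's str.startswith as a stand-in for Cypher's STARTS WITH; the sample qualified names and the helper matching_modules are hypothetical and only illustrate the three possible outcomes, not the repository's actual data or code:

```python
# Hypothetical module qualified names; not taken from the repository.
QUALIFIED_NAMES = ["proj.api", "proj.core.utils", "proj_tools.cli", "project2.main"]


def matching_modules(prefix: str) -> list[str]:
    """Emulate `WHERE m.qualified_name STARTS WITH <prefix>` in plain Python."""
    return [name for name in QUALIFIED_NAMES if name.startswith(prefix)]


project_name = "proj"

# Exactly one side appends the dot: only modules of "proj" match.
one_dot = matching_modules(project_name + ".")

# Neither side appends the dot: other projects sharing the prefix also match.
no_dot = matching_modules(project_name)

# Both sides append the dot: the prefix becomes "proj.." and nothing matches.
two_dots = matching_modules(project_name + "." + ".")

print(one_dot)
print(no_dot)
print(two_dots)
```

Whichever side owns the dot, the query and its caller need to agree on it; the sketch only shows what each combination would return.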

Important Files Changed

| Filename | Overview |
|---|---|
| codebase_rag/constants.py | Updates CYPHER_QUERY_EMBEDDINGS to use $project_name directly for STARTS WITH, but the only caller still passes project_name + '.' so semantics shift and the query likely matches nothing. |

Sequence Diagram

sequenceDiagram
  participant GU as GraphUpdater
  participant CS as constants.py
  participant DB as Neo4j/Ingestor

  GU->>CS: uses CYPHER_QUERY_EMBEDDINGS
  GU->>DB: fetch_all(query, {project_name: project_name + "."})
  DB-->>DB: MATCH Module->DEFINES->Function/Method
  DB-->>DB: WHERE m.qualified_name STARTS WITH $project_name
  DB-->>GU: results rows
  GU-->>GU: embed_code() + store_embedding() for each row

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a type error in a Cypher query by removing string concatenation. While the intention is good, the change as-is could lead to incorrect query results by matching unintended projects. I've suggested a more robust query logic that handles edge cases correctly and prevents ambiguity. This proposed change relies on fixing the root cause of the type error in the calling Python code, which is the recommended approach.
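
The query Gemini suggested is not reproduced on this page, so the following is not that suggestion. It is only a rough sketch of what validating the prefix on the Python side might look like, so that a non-string value never reaches the query; build_module_prefix and the params dictionary are hypothetical names, not part of the repository:

```python
def build_module_prefix(project_name: object) -> str:
    """Return the value to pass as $project_name, ending in exactly one dot."""
    # Reject non-string values early instead of letting them reach the query.
    if not isinstance(project_name, str):
        raise TypeError(
            f"project_name must be str, got {type(project_name).__name__}"
        )
    # Drop surrounding whitespace and any trailing dots before re-adding one.
    cleaned = project_name.strip().rstrip(".")
    if not cleaned:
        raise ValueError("project_name must not be empty")
    return cleaned + "."


# Hypothetical parameter dictionary handed to the query call.
params = {"project_name": build_module_prefix("proj")}
print(params)
```

Normalizing to a single trailing dot also keeps a prefix such as "proj" from matching an unrelated project whose name merely begins with the same characters.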

Copilot AI left a comment

Pull request overview

Updates the embeddings Cypher query to avoid string concatenation inside STARTS WITH, preventing Cypher type/precedence errors and relying on Python to pass the complete module-qualified-name prefix.

Changes:

  • Remove Cypher-side concatenation ($project_name + '.') from CYPHER_QUERY_EMBEDDINGS.
  • Make STARTS WITH compare directly against the provided parameter value.
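
One possible reading of the "precedence" remark, offered as an assumption rather than a confirmed diagnosis: if the query engine groups STARTS WITH more tightly than +, then m.qualified_name STARTS WITH $project_name + '.' is evaluated as a boolean plus a string, which would be consistent with the reported "Invalid types: bool and string for '+'" message. Python rejects the same grouping in an analogous way, so the two groupings can be compared with plain strings (the values below are hypothetical):

```python
qualified_name = "proj.api"
project_name = "proj"

# Grouping 1: concatenate first, then compare. This is the intended meaning.
intended = qualified_name.startswith(project_name + ".")


def misgrouped() -> str:
    """Compare first, then try to append "." to the boolean result."""
    try:
        return qualified_name.startswith(project_name) + "."
    except TypeError as exc:
        # Python refuses to add a bool and a str, much like the reported error.
        return f"TypeError: {exc}"


print(intended)
print(misgrouped())
```

Under that assumption, either parenthesizing the concatenation in the query or passing the finished prefix as a parameter, as this pull request does, would avoid the ambiguous grouping.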

@portwebdesign portwebdesign force-pushed the fix/embeddings-query-prefix branch from 0281c40 to 0ba3b65 Compare February 7, 2026 00:51
@portwebdesign portwebdesign force-pushed the fix/embeddings-query-prefix branch from 9b88a0b to 8370545 Compare February 8, 2026 10:50