-
Notifications
You must be signed in to change notification settings - Fork 24
CREST Adapter #807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CREST Adapter #807
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a CREST adapter to generate transition state (TS) conformer guesses for H_Abstraction reactions, complementing the existing heuristic approach. The integration includes CREST path detection, conda environment setup, and job submission/monitoring workflows.
Key Changes:
- New CREST adapter module (
arc/job/adapters/ts/crest.py) integrating constrained TS conformer searches into the heuristics workflow
arc/job/adapters/ts/crest.py
Outdated
| atom for atom in closest_atoms if not atom.startswith("H") | ||
| ) | ||
| if_hydrogen_present = any( | ||
| atom | ||
| for atom in closest_atoms |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in get_h_abs_atoms is incorrect. Line 305-312 checks any(atom for atom in closest_atoms if not atom.startswith("H")) which iterates over keys of closest_atoms dict (strings like "H0", "C1"), not the values. It should check any(atom for atom in atom_neighbours if not atom.startswith("H")) to check the actual neighbors of the hydrogen. The same issue exists on lines 308-312 for if_hydrogen_present.
| atom for atom in closest_atoms if not atom.startswith("H") | |
| ) | |
| if_hydrogen_present = any( | |
| atom | |
| for atom in closest_atoms | |
| atom for atom in atom_neighbours if not atom.startswith("H") | |
| ) | |
| if_hydrogen_present = any( | |
| atom | |
| for atom in atom_neighbours |
| "H": extract_digits(best_occurrence["H"]), | ||
| "A": extract_digits(best_occurrence["A"]), | ||
| "B": extract_digits(best_occurrence["B"]), | ||
| } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When len(condition_occurrences) == 1, the function should return the single occurrence but doesn't. After the if len(condition_occurrences) > 1: block completes at line 341, control falls through to the else block at line 342. The single occurrence case needs to be handled explicitly, or the logic restructured to return condition_occurrences[0] when there's exactly one match.
| } | |
| } | |
| elif len(condition_occurrences) == 1: | |
| occurrence = condition_occurrences[0] | |
| return { | |
| "H": extract_digits(occurrence["H"]), | |
| "A": extract_digits(occurrence["A"]), | |
| "B": extract_digits(occurrence["B"]), | |
| } |
| Args: | ||
| xyz_str (str): The string xyz format to be converted. | ||
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is missing the units parameter which is part of the function signature. This parameter should be documented to specify the units of the input coordinates (either 'angstrom' or 'bohr').
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. | |
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. | |
| units (str, optional): The units of the input coordinates (either 'angstrom' or 'bohr'). |
| } | ||
| else: | ||
|
|
||
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling error: "Hyodrogen" should be "Hydrogen".
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate | |
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hydrogen atom has the lowest distance aggregate |
devtools/install_autotst.sh
Outdated
| autotst_on () { | ||
| export AUTOTST_ROOT="__AUTOTST_PATH__" | ||
| export AUTOTST_OLD_PATH="$PATH" | ||
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" | ||
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _strip_path function is defined inside the heredoc with double-quoted variables, but the function body uses single quotes for string operations. This creates a fragile quoting situation. Consider defining this as a regular shell function outside the heredoc, or ensure consistent quoting to avoid unexpected variable expansion issues.
| autotst_on () { | |
| export AUTOTST_ROOT="__AUTOTST_PATH__" | |
| export AUTOTST_OLD_PATH="$PATH" | |
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" | |
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } | |
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } | |
| autotst_on () { | |
| export AUTOTST_ROOT="__AUTOTST_PATH__" | |
| export AUTOTST_OLD_PATH="$PATH" | |
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" |
arc/job/adapters/ts/crest.py
Outdated
| distances = dataframe.loc[hydrogen_key, heavy_atoms].sum() | ||
| occurrence_distances.append((occurrence, distances)) | ||
| except KeyError as e: | ||
| print(f"Error accessing distances for occurrence {occurrence}: {e}") |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logger.error() instead of print() for consistency with the rest of the codebase. The logger is already imported and used elsewhere in this module.
| print(f"Error accessing distances for occurrence {occurrence}: {e}") | |
| logger.error(f"Error accessing distances for occurrence {occurrence}: {e}") |
arc/statmech/arkane.py
Outdated
|
|
||
| def _iter_level_keys_from_section(file_path: str, | ||
| section_start: str, | ||
| section_end: str) -> list[str]: |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hint list[str] requires Python 3.9+. Since this project supports Python 3.7+ (as seen in environment files), you need to either use List[str] from typing (which is already imported on line 9) or add from __future__ import annotations at the top of the file.
| section_end: str) -> list[str]: | |
| section_end: str) -> List[str]: |
arc/statmech/arkane.py
Outdated
| return method.lower().replace('-', '') | ||
|
|
||
|
|
||
| def _split_method_year(method_norm: str): |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type annotation is missing for _split_method_year. Based on the docstring examples and implementation, it should return Tuple[str, Optional[int]].
arc/job/adapters/ts/autotst_ts.py
Outdated
| # try: | ||
| # from autotst.reaction import Reaction as AutoTST_Reaction | ||
| # except (ImportError, ModuleNotFoundError): | ||
| # HAS_AUTOTST = False | ||
|
|
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # try: | |
| # from autotst.reaction import Reaction as AutoTST_Reaction | |
| # except (ImportError, ModuleNotFoundError): | |
| # HAS_AUTOTST = False |
|
|
||
| import datetime | ||
| import itertools | ||
| import os |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'os' is not used.
| import os |
d6a83b4 to
c231a43
Compare
| return None, None | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
In general, an unused global variable should either be removed, renamed to indicate it is intentionally unused, or explicitly marked as public API if it is meant to be used by other modules. For a settings/config module, globals are typically defined precisely so that other modules can import them, so the appropriate fix is usually to make that intention explicit.
The single best fix here, without changing existing functionality, is to add CREST_PATH (and, for consistency, CREST_ENV_PATH) to the module’s __all__ list. If __all__ already exists in arc/settings/settings.py, we would add these names to it; if it does not, we will introduce a new __all__ definition near the bottom of the file (after the assignment to CREST_PATH, CREST_ENV_PATH). This explicitly declares these globals as part of the public API and will generally cause CodeQL’s “unused global variable” check to accept them as intentional. No new imports or helper methods are needed; we only add a small __all__ declaration.
Concretely, in arc/settings/settings.py, just below line 553 where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, we will insert a __all__ list that includes these two names. We will show this as a replacement block that replaces that final assignment line with the same assignment followed by the __all__ declaration.
-
Copy modified lines R554-R558
| @@ -551,3 +551,8 @@ | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() | ||
|
|
||
| __all__ = [ | ||
| "CREST_PATH", | ||
| "CREST_ENV_PATH", | ||
| ] |
| return None, None | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
In general, for an unused global that is intentionally kept (for documentation or for potential future use), rename it to follow an "unused" naming convention so static analysis tools and human readers understand it is not a mistake. If the value has side effects on the right-hand side, keep the right-hand side intact and only change or remove the left-hand side variable name.
Here, find_crest_executable() must still be called so that CREST_PATH continues to be initialized exactly as before; we only need to make it clear that the second return value is not used. The simplest non‑breaking change is to bind the environment component to a clearly unused name such as _unused_crest_env_path. This preserves all side effects and behavior of find_crest_executable(), keeps CREST_PATH available exactly as before, and satisfies the convention for unused variables described in the background.
Concretely, in arc/settings/settings.py at the bottom where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, change CREST_ENV_PATH to _unused_crest_env_path. No imports or additional methods are required.
-
Copy modified line R553
| @@ -550,4 +550,4 @@ | ||
| return None, None | ||
|
|
||
|
|
||
| CREST_PATH, CREST_ENV_PATH = find_crest_executable() | ||
| CREST_PATH, _unused_crest_env_path = find_crest_executable() |
3e88c36 to
7674f5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
devtools/install_autotst.sh
Outdated
| # replace placeholder with actual path | ||
| sed -i "s#__AUTOTST_PATH__#$(pwd | sed 's#/#\\\\/#g')#" ~/.bashrc |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sed -i command used here is not portable across all Unix-like systems. On macOS, sed -i requires an extension argument (e.g., sed -i '' ...). Consider using a more portable approach such as creating a temporary file or using perl -i -pe instead.
| # replace placeholder with actual path | |
| sed -i "s#__AUTOTST_PATH__#$(pwd | sed 's#/#\\\\/#g')#" ~/.bashrc | |
| # replace placeholder with actual path (use portable sed without -i) | |
| AUTOTST_ESCAPED_PATH="$(pwd | sed 's#/#\\\\/#g')" | |
| tmp_bashrc="$(mktemp "${HOME}/.bashrc.autotst.XXXXXX")" | |
| sed "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc > "${tmp_bashrc}" | |
| mv "${tmp_bashrc}" ~/.bashrc |
arc/settings/settings.py
Outdated
|
|
||
| def find_highest_version_in_directory(directory, name_contains): | ||
| """ | ||
| Finds the file with the highest version in a directory containing a specific string. |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function find_highest_version_in_directory returns the path to the crest executable, but the docstring says it "Finds the file with the highest version". Consider updating the docstring to explicitly mention that it returns the path to the executable, not just any file, and document the return type more clearly (e.g., "Returns: Optional[str]: The full path to the crest executable with the highest version, or None if not found").
| Finds the file with the highest version in a directory containing a specific string. | |
| Find the crest executable with the highest version in a directory. | |
| This function searches the given directory for subdirectories whose names contain | |
| the specified string, interprets their names as versioned folders (e.g., "crest-3.0.2", | |
| "v212", "2.1"), and within each candidate folder looks for an executable named | |
| ``crest``. It returns the full path to the crest executable located in the folder | |
| with the highest parsed version. | |
| Args: | |
| directory (str): The path to the directory to search. | |
| name_contains (str): A substring that must be present in the folder name. | |
| Returns: | |
| Optional[str]: The full path to the crest executable with the highest version, | |
| or None if no suitable executable is found. |
arc/settings/settings.py
Outdated
| for crest_path in potential_env_paths: | ||
| if os.path.isfile(crest_path) and os.access(crest_path, os.X_OK): | ||
| # env_root = .../anaconda3 or .../miniforge3 or .../mambaforge etc. | ||
| env_root = crest_path.split("/envs/crest_env/")[0] |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string split operation crest_path.split("/envs/crest_env/")[0] assumes a Unix-style path separator and will fail on Windows systems. Consider using os.path.sep or pathlib.Path for cross-platform compatibility.
| env_root = crest_path.split("/envs/crest_env/")[0] | |
| env_marker = os.path.join("envs", "crest_env") + os.path.sep | |
| env_root = crest_path.split(env_marker)[0] |
arc/species/converter.py
Outdated
| lxyz = xyz_str.strip().split() | ||
|
|
||
| atom_first = False if is_str_float(lxyz[0]) else True | ||
| lxyz = xyz_str.strip().splitlines() | ||
|
|
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable lxyz is first assigned the result of xyz_str.strip().split() (line 103), which splits by whitespace and creates a list of all words. Then on line 106, it's reassigned to xyz_str.strip().splitlines(). The first assignment on line 103 is only used to check if the first element is a float (line 105), but this could be simplified to avoid confusion. Consider using a different variable name for the initial split or combining the logic more clearly.
| lxyz = xyz_str.strip().split() | |
| atom_first = False if is_str_float(lxyz[0]) else True | |
| lxyz = xyz_str.strip().splitlines() | |
| lxyz = xyz_str.strip().splitlines() | |
| # Determine whether the atom label appears first or last in each line | |
| first_line_tokens = lxyz[0].strip().split() | |
| atom_first = not is_str_float(first_line_tokens[0]) |
arc/species/converter.py
Outdated
| atom_first = False if is_str_float(lxyz[0]) else True | ||
| lxyz = xyz_str.strip().splitlines() | ||
|
|
||
| for idx, item in enumerate(lxyz): |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable idx is defined in the loop but never used. Consider removing it or using for item in lxyz: instead to improve code clarity.
| for idx, item in enumerate(lxyz): | |
| for item in lxyz: |
arc/job/adapters/ts/crest.py
Outdated
| except (ValueError, KeyError) as e: | ||
| logger.error(f"Could not determine the H abstraction atoms, got:\n{e}") |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handling here catches broad exceptions (ValueError, KeyError) which may hide unexpected errors. Consider being more specific about which operations might raise these exceptions and handle them separately, or at minimum, add more detailed logging about what went wrong to aid debugging.
arc/species/converter.py
Outdated
| BOHR_TO_ANGSTROM = 0.529177 | ||
| ANGSTROM_TO_BOHR = 1.8897259886 | ||
|
|
||
| if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom': | ||
| conversion_factor = 1 | ||
| elif units.lower() == 'bohr' and convert_to.lower() == 'bohr': | ||
| conversion_factor = 1 | ||
| elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr': | ||
| conversion_factor = ANGSTROM_TO_BOHR | ||
| elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom': | ||
| conversion_factor = BOHR_TO_ANGSTROM |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion factors are defined as local constants within this function, but they are well-established physical constants (Bohr to Angstrom conversion). Consider moving these to a module-level constants section or importing them from arc.constants to avoid duplication and improve maintainability. This would also ensure consistency if these values are used elsewhere.
| BOHR_TO_ANGSTROM = 0.529177 | |
| ANGSTROM_TO_BOHR = 1.8897259886 | |
| if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom': | |
| conversion_factor = 1 | |
| elif units.lower() == 'bohr' and convert_to.lower() == 'bohr': | |
| conversion_factor = 1 | |
| elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr': | |
| conversion_factor = ANGSTROM_TO_BOHR | |
| elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom': | |
| conversion_factor = BOHR_TO_ANGSTROM | |
| if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom': | |
| conversion_factor = 1 | |
| elif units.lower() == 'bohr' and convert_to.lower() == 'bohr': | |
| conversion_factor = 1 | |
| elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr': | |
| conversion_factor = constants.ANGSTROM_TO_BOHR | |
| elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom': | |
| conversion_factor = constants.BOHR_TO_ANGSTROM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
713209f to
9c25f3d
Compare
Adds a CREST TS search adapter for finding transition state conformers, leveraging heuristics-generated guesses and CREST's conformer search capabilities. This allows ARC to explore TS geometries using CREST. Adds method source to TS guesses from CREST Ensures CREST is recorded as a method source when a TS guess from CREST is deemed unique and considered as a possible TS guess. This change ensures proper provenance tracking for TS guesses. Improves error logging in CREST adapter Enhances error messages in the CREST adapter to provide more context when failing to determine H abstraction atoms. The error message now includes the reaction label and crest seed iteration number, which aids in debugging.
Adds a script to install CREST using a dedicated conda environment. Integrates CREST into the `install_all.sh` script and the help message. Also adds a `--no-rmg` flag in `install_all.sh` to optionally skip RMG-Py installation, and ensures PyRDL is installed only when ARC is installed. Corrects a problem with the AutoTST activation/deactivation hooks and ensures that RMG-Py's path is correctly removed from the PATH and PYTHONPATH. Also ensures TS-GCN can be correctly installed into its environment.
Adds the CREST adapter as an option for transition state (TS) guess generation, particularly for H_Abstraction reactions, improving the exploration of potential TS geometries. Includes CREST in the default list of in-core adapters and allows it to be used in heuristics. The heuristic adapter now passes the local path of the files to the h_abstraction method so the geometries are saved to the appropriate folders. Saves the method used to generate the TS guesses in the file name so the geometries generated with CREST are clearly labeled as such.
9c25f3d to
51368be
Compare
Adds a function to reorder XYZ strings between atom-first and coordinate-first formats, with optional unit conversion. Also introduces a backward compatible wrapper with a deprecation warning. Uses constants for unit conversion factors Replaces hardcoded unit conversion factors with values from the `constants` module. Also, determines atom ordering more robustly by inspecting only the first line. Corrects angstrom to bohr conversion factor The conversion factor between angstrom and bohr was inverted, which led to incorrect geometry conversions. This commit corrects the conversion factor to ensure proper unit conversion when dealing with xyz coordinates.
Removes input files and submit script related to a specific server time limit test case. The `err.txt` file is renamed to `wall_exceeded.txt` and moved to the `trsh` folder.
Enables the usage of CREST as a TS adapter. The implementation includes: - Adding 'crest' to the list of available TS adapters. - Implementing a function to automatically detect the CREST executable in various locations (standalone builds, conda environments, PATH). - Providing instructions for activating the CREST environment, if necessary. This allows ARC to leverage CREST for enhanced transition state search capabilities. Improves crest executable search logic Refactors the crest executable search to more reliably locate the correct version by normalizing path joining. Updates the documentation for clarity.
Adds a `method_sources` attribute to the `TSGuess` class to track all methods/sources that generated a particular transition state guess, allowing for more comprehensive provenance information. This improves the clustering of similar TS guesses. Normalizes method sources to a unique, ordered, lowercase list.
Ensures compatibility of the AutoTST installation script with both GNU and BSD versions of `sed`. It achieves this by conditionally including an empty string in the `sed -i` command for BSD systems, which require it.
Adds the angstrom to bohr conversion factor. This conversion factor is required for the CREST adapter.
Introduces a function to generate unique project names for parallel test execution using xdist. This prevents collisions when cleaning up project directories, ensuring reliable test execution.
Addition of CREST Adapter that complements the heuristic adapter.