Skip to content

Conversation

@calvinp0
Copy link
Member

Addition of CREST Adapter that complements the heuristic adapter.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a CREST adapter to generate transition state (TS) conformer guesses for H_Abstraction reactions, complementing the existing heuristic approach. The integration includes CREST path detection, conda environment setup, and job submission/monitoring workflows.

Key Changes:

  • New CREST adapter module (arc/job/adapters/ts/crest.py) integrating constrained TS conformer searches into the heuristics workflow

Comment on lines 306 to 310
atom for atom in closest_atoms if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in closest_atoms
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in get_h_abs_atoms is incorrect. Line 305-312 checks any(atom for atom in closest_atoms if not atom.startswith("H")) which iterates over keys of closest_atoms dict (strings like "H0", "C1"), not the values. It should check any(atom for atom in atom_neighbours if not atom.startswith("H")) to check the actual neighbors of the hydrogen. The same issue exists on lines 308-312 for if_hydrogen_present.

Suggested change
atom for atom in closest_atoms if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in closest_atoms
atom for atom in atom_neighbours if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in atom_neighbours

Copilot uses AI. Check for mistakes.
"H": extract_digits(best_occurrence["H"]),
"A": extract_digits(best_occurrence["A"]),
"B": extract_digits(best_occurrence["B"]),
}
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When len(condition_occurrences) == 1, the function should return the single occurrence but doesn't. After the if len(condition_occurrences) > 1: block completes at line 341, control falls through to the else block at line 342. The single occurrence case needs to be handled explicitly, or the logic restructured to return condition_occurrences[0] when there's exactly one match.

Suggested change
}
}
elif len(condition_occurrences) == 1:
occurrence = condition_occurrences[0]
return {
"H": extract_digits(occurrence["H"]),
"A": extract_digits(occurrence["A"]),
"B": extract_digits(occurrence["B"]),
}

Copilot uses AI. Check for mistakes.
Args:
xyz_str (str): The string xyz format to be converted.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is missing the units parameter which is part of the function signature. This parameter should be documented to specify the units of the input coordinates (either 'angstrom' or 'bohr').

Suggested change
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
units (str, optional): The units of the input coordinates (either 'angstrom' or 'bohr').

Copilot uses AI. Check for mistakes.
}
else:

# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "Hyodrogen" should be "Hydrogen".

Suggested change
# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate
# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hydrogen atom has the lowest distance aggregate

Copilot uses AI. Check for mistakes.
Comment on lines 143 to 148
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _strip_path function is defined inside the heredoc with double-quoted variables, but the function body uses single quotes for string operations. This creates a fragile quoting situation. Consider defining this as a regular shell function outside the heredoc, or ensure consistent quoting to avoid unexpected variable expansion issues.

Suggested change
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"

Copilot uses AI. Check for mistakes.
distances = dataframe.loc[hydrogen_key, heavy_atoms].sum()
occurrence_distances.append((occurrence, distances))
except KeyError as e:
print(f"Error accessing distances for occurrence {occurrence}: {e}")
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logger.error() instead of print() for consistency with the rest of the codebase. The logger is already imported and used elsewhere in this module.

Suggested change
print(f"Error accessing distances for occurrence {occurrence}: {e}")
logger.error(f"Error accessing distances for occurrence {occurrence}: {e}")

Copilot uses AI. Check for mistakes.

def _iter_level_keys_from_section(file_path: str,
section_start: str,
section_end: str) -> list[str]:
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint list[str] requires Python 3.9+. Since this project supports Python 3.7+ (as seen in environment files), you need to either use List[str] from typing (which is already imported on line 9) or add from __future__ import annotations at the top of the file.

Suggested change
section_end: str) -> list[str]:
section_end: str) -> List[str]:

Copilot uses AI. Check for mistakes.
return method.lower().replace('-', '')


def _split_method_year(method_norm: str):
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation is missing for _split_method_year. Based on the docstring examples and implementation, it should return Tuple[str, Optional[int]].

Copilot uses AI. Check for mistakes.
Comment on lines 23 to 27
# try:
# from autotst.reaction import Reaction as AutoTST_Reaction
# except (ImportError, ModuleNotFoundError):
# HAS_AUTOTST = False

Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# try:
# from autotst.reaction import Reaction as AutoTST_Reaction
# except (ImportError, ModuleNotFoundError):
# HAS_AUTOTST = False

Copilot uses AI. Check for mistakes.

import datetime
import itertools
import os
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.

This comment was marked as resolved.

return None, None


CREST_PATH, CREST_ENV_PATH = find_crest_executable()

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'CREST_PATH' is not used.

Copilot Autofix

AI 1 day ago

In general, an unused global variable should either be removed, renamed to indicate it is intentionally unused, or explicitly marked as public API if it is meant to be used by other modules. For a settings/config module, globals are typically defined precisely so that other modules can import them, so the appropriate fix is usually to make that intention explicit.

The single best fix here, without changing existing functionality, is to add CREST_PATH (and, for consistency, CREST_ENV_PATH) to the module’s __all__ list. If __all__ already exists in arc/settings/settings.py, we would add these names to it; if it does not, we will introduce a new __all__ definition near the bottom of the file (after the assignment to CREST_PATH, CREST_ENV_PATH). This explicitly declares these globals as part of the public API and will generally cause CodeQL’s “unused global variable” check to accept them as intentional. No new imports or helper methods are needed; we only add a small __all__ declaration.

Concretely, in arc/settings/settings.py, just below line 553 where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, we will insert a __all__ list that includes these two names. We will show this as a replacement block that replaces that final assignment line with the same assignment followed by the __all__ declaration.

Suggested changeset 1
arc/settings/settings.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/settings/settings.py b/arc/settings/settings.py
--- a/arc/settings/settings.py
+++ b/arc/settings/settings.py
@@ -551,3 +551,8 @@
 
 
 CREST_PATH, CREST_ENV_PATH = find_crest_executable()
+
+__all__ = [
+    "CREST_PATH",
+    "CREST_ENV_PATH",
+]
EOF
@@ -551,3 +551,8 @@


CREST_PATH, CREST_ENV_PATH = find_crest_executable()

__all__ = [
"CREST_PATH",
"CREST_ENV_PATH",
]
Copilot is powered by AI and may make mistakes. Always verify output.
return None, None


CREST_PATH, CREST_ENV_PATH = find_crest_executable()

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable 'CREST_ENV_PATH' is not used.

Copilot Autofix

AI 1 day ago

In general, for an unused global that is intentionally kept (for documentation or for potential future use), rename it to follow an "unused" naming convention so static analysis tools and human readers understand it is not a mistake. If the value has side effects on the right-hand side, keep the right-hand side intact and only change or remove the left-hand side variable name.

Here, find_crest_executable() must still be called so that CREST_PATH continues to be initialized exactly as before; we only need to make it clear that the second return value is not used. The simplest non‑breaking change is to bind the environment component to a clearly unused name such as _unused_crest_env_path. This preserves all side effects and behavior of find_crest_executable(), keeps CREST_PATH available exactly as before, and satisfies the convention for unused variables described in the background.

Concretely, in arc/settings/settings.py at the bottom where CREST_PATH, CREST_ENV_PATH = find_crest_executable() is defined, change CREST_ENV_PATH to _unused_crest_env_path. No imports or additional methods are required.

Suggested changeset 1
arc/settings/settings.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/settings/settings.py b/arc/settings/settings.py
--- a/arc/settings/settings.py
+++ b/arc/settings/settings.py
@@ -550,4 +550,4 @@
     return None, None
 
 
-CREST_PATH, CREST_ENV_PATH = find_crest_executable()
+CREST_PATH, _unused_crest_env_path = find_crest_executable()
EOF
@@ -550,4 +550,4 @@
return None, None


CREST_PATH, CREST_ENV_PATH = find_crest_executable()
CREST_PATH, _unused_crest_env_path = find_crest_executable()
Copilot is powered by AI and may make mistakes. Always verify output.
@calvinp0 calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 3e88c36 to 7674f5f Compare February 2, 2026 09:49
@calvinp0 calvinp0 requested a review from Copilot February 2, 2026 12:10
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 176 to 177
# replace placeholder with actual path
sed -i "s#__AUTOTST_PATH__#$(pwd | sed 's#/#\\\\/#g')#" ~/.bashrc
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed -i command used here is not portable across all Unix-like systems. On macOS, sed -i requires an extension argument (e.g., sed -i '' ...). Consider using a more portable approach such as creating a temporary file or using perl -i -pe instead.

Suggested change
# replace placeholder with actual path
sed -i "s#__AUTOTST_PATH__#$(pwd | sed 's#/#\\\\/#g')#" ~/.bashrc
# replace placeholder with actual path (use portable sed without -i)
AUTOTST_ESCAPED_PATH="$(pwd | sed 's#/#\\\\/#g')"
tmp_bashrc="$(mktemp "${HOME}/.bashrc.autotst.XXXXXX")"
sed "s#__AUTOTST_PATH__#${AUTOTST_ESCAPED_PATH}#" ~/.bashrc > "${tmp_bashrc}"
mv "${tmp_bashrc}" ~/.bashrc

Copilot uses AI. Check for mistakes.

def find_highest_version_in_directory(directory, name_contains):
"""
Finds the file with the highest version in a directory containing a specific string.
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function find_highest_version_in_directory returns the path to the crest executable, but the docstring says it "Finds the file with the highest version". Consider updating the docstring to explicitly mention that it returns the path to the executable, not just any file, and document the return type more clearly (e.g., "Returns: Optional[str]: The full path to the crest executable with the highest version, or None if not found").

Suggested change
Finds the file with the highest version in a directory containing a specific string.
Find the crest executable with the highest version in a directory.
This function searches the given directory for subdirectories whose names contain
the specified string, interprets their names as versioned folders (e.g., "crest-3.0.2",
"v212", "2.1"), and within each candidate folder looks for an executable named
``crest``. It returns the full path to the crest executable located in the folder
with the highest parsed version.
Args:
directory (str): The path to the directory to search.
name_contains (str): A substring that must be present in the folder name.
Returns:
Optional[str]: The full path to the crest executable with the highest version,
or None if no suitable executable is found.

Copilot uses AI. Check for mistakes.
for crest_path in potential_env_paths:
if os.path.isfile(crest_path) and os.access(crest_path, os.X_OK):
# env_root = .../anaconda3 or .../miniforge3 or .../mambaforge etc.
env_root = crest_path.split("/envs/crest_env/")[0]
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string split operation crest_path.split("/envs/crest_env/")[0] assumes a Unix-style path separator and will fail on Windows systems. Consider using os.path.sep or pathlib.Path for cross-platform compatibility.

Suggested change
env_root = crest_path.split("/envs/crest_env/")[0]
env_marker = os.path.join("envs", "crest_env") + os.path.sep
env_root = crest_path.split(env_marker)[0]

Copilot uses AI. Check for mistakes.
Comment on lines 103 to 107
lxyz = xyz_str.strip().split()

atom_first = False if is_str_float(lxyz[0]) else True
lxyz = xyz_str.strip().splitlines()

Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable lxyz is first assigned the result of xyz_str.strip().split() (line 103), which splits by whitespace and creates a list of all words. Then on line 106, it's reassigned to xyz_str.strip().splitlines(). The first assignment on line 103 is only used to check if the first element is a float (line 105), but this could be simplified to avoid confusion. Consider using a different variable name for the initial split or combining the logic more clearly.

Suggested change
lxyz = xyz_str.strip().split()
atom_first = False if is_str_float(lxyz[0]) else True
lxyz = xyz_str.strip().splitlines()
lxyz = xyz_str.strip().splitlines()
# Determine whether the atom label appears first or last in each line
first_line_tokens = lxyz[0].strip().split()
atom_first = not is_str_float(first_line_tokens[0])

Copilot uses AI. Check for mistakes.
atom_first = False if is_str_float(lxyz[0]) else True
lxyz = xyz_str.strip().splitlines()

for idx, item in enumerate(lxyz):
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable idx is defined in the loop but never used. Consider removing it or using for item in lxyz: instead to improve code clarity.

Suggested change
for idx, item in enumerate(lxyz):
for item in lxyz:

Copilot uses AI. Check for mistakes.
Comment on lines 221 to 222
except (ValueError, KeyError) as e:
logger.error(f"Could not determine the H abstraction atoms, got:\n{e}")
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling here catches broad exceptions (ValueError, KeyError) which may hide unexpected errors. Consider being more specific about which operations might raise these exceptions and handle them separately, or at minimum, add more detailed logging about what went wrong to aid debugging.

Copilot uses AI. Check for mistakes.
Comment on lines 87 to 97
BOHR_TO_ANGSTROM = 0.529177
ANGSTROM_TO_BOHR = 1.8897259886

if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom':
conversion_factor = 1
elif units.lower() == 'bohr' and convert_to.lower() == 'bohr':
conversion_factor = 1
elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr':
conversion_factor = ANGSTROM_TO_BOHR
elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom':
conversion_factor = BOHR_TO_ANGSTROM
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion factors are defined as local constants within this function, but they are well-established physical constants (Bohr to Angstrom conversion). Consider moving these to a module-level constants section or importing them from arc.constants to avoid duplication and improve maintainability. This would also ensure consistency if these values are used elsewhere.

Suggested change
BOHR_TO_ANGSTROM = 0.529177
ANGSTROM_TO_BOHR = 1.8897259886
if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom':
conversion_factor = 1
elif units.lower() == 'bohr' and convert_to.lower() == 'bohr':
conversion_factor = 1
elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr':
conversion_factor = ANGSTROM_TO_BOHR
elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom':
conversion_factor = BOHR_TO_ANGSTROM
if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom':
conversion_factor = 1
elif units.lower() == 'bohr' and convert_to.lower() == 'bohr':
conversion_factor = 1
elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr':
conversion_factor = constants.ANGSTROM_TO_BOHR
elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom':
conversion_factor = constants.BOHR_TO_ANGSTROM

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@calvinp0 calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 713209f to 9c25f3d Compare February 2, 2026 13:21
Adds a CREST TS search adapter for finding transition state conformers,
leveraging heuristics-generated guesses and CREST's conformer search capabilities.

This allows ARC to explore TS geometries using CREST.

Adds method source to TS guesses from CREST

Ensures CREST is recorded as a method source when a TS guess from CREST is deemed unique and considered as a possible TS guess.
This change ensures proper provenance tracking for TS guesses.

Improves error logging in CREST adapter

Enhances error messages in the CREST adapter to provide more context when failing to determine H abstraction atoms.
The error message now includes the reaction label and crest seed iteration number, which aids in debugging.
Adds a script to install CREST using a dedicated conda environment.
Integrates CREST into the `install_all.sh` script and the help message.
Also adds a `--no-rmg` flag in `install_all.sh` to optionally skip RMG-Py installation, and ensures PyRDL is installed only when ARC is installed.

Corrects a problem with the AutoTST activation/deactivation hooks and ensures that RMG-Py's path is correctly removed from the PATH and PYTHONPATH.
Also ensures TS-GCN can be correctly installed into its environment.
Adds the CREST adapter as an option for transition state (TS) guess generation, particularly for H_Abstraction reactions, improving the exploration of potential TS geometries.

Includes CREST in the default list of in-core adapters and allows it to be used in heuristics.

The heuristic adapter now passes the local path of the files to the h_abstraction method so the geometries are saved to the appropriate folders.
Saves the method used to generate the TS guesses in the file name so the geometries generated with CREST are clearly labeled as such.
Adds a function to reorder XYZ strings between atom-first and coordinate-first formats, with optional unit conversion.
Also introduces a backward compatible wrapper with a deprecation warning.

Uses constants for unit conversion factors

Replaces hardcoded unit conversion factors with values from the `constants` module. Also, determines atom ordering more robustly by inspecting only the first line.

Corrects angstrom to bohr conversion factor

The conversion factor between angstrom and bohr was inverted, which led to incorrect geometry conversions.

This commit corrects the conversion factor to ensure proper unit conversion when dealing with xyz coordinates.
Removes input files and submit script related to a specific server time limit test case.
The `err.txt` file is renamed to `wall_exceeded.txt` and moved to the `trsh` folder.
Enables the usage of CREST as a TS adapter.

The implementation includes:
- Adding 'crest' to the list of available TS adapters.
- Implementing a function to automatically detect the CREST executable in various locations (standalone builds, conda environments, PATH).
- Providing instructions for activating the CREST environment, if necessary.

This allows ARC to leverage CREST for enhanced transition state search capabilities.

Improves crest executable search logic

Refactors the crest executable search to more reliably locate the correct version by normalizing path joining.

Updates the documentation for clarity.
Adds a `method_sources` attribute to the `TSGuess` class to track all
methods/sources that generated a particular transition state guess, allowing
for more comprehensive provenance information. This improves the clustering
of similar TS guesses. Normalizes method sources to a unique, ordered,
lowercase list.
Ensures compatibility of the AutoTST installation script with both GNU and BSD versions of `sed`. It achieves this by conditionally including an empty string in the `sed -i` command for BSD systems, which require it.
Adds the angstrom to bohr conversion factor.

This conversion factor is required for the CREST adapter.
Introduces a function to generate unique project names for parallel test execution using xdist. This prevents collisions when cleaning up project directories, ensuring reliable test execution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants