Skip to content

Avoid real system calls#842

Merged
amaslenn merged 2 commits intomainfrom
am/slurm
Mar 20, 2026
Merged

Avoid real system calls#842
amaslenn merged 2 commits intomainfrom
am/slurm

Conversation

@amaslenn
Copy link
Contributor

Summary

Avoid real system calls. Most of them were related to supports_gpu_directives_cache == None. It slows down tests on systems with slurm controller and in agents' sandbox.

Test Plan

  1. CI

Additional Notes

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Added GPU-directives cache initialization to the SlurmSystem pytest fixture and updated SlurmSystem tests: refactored one test to avoid a fixture, adjusted mocks, initialized cache state in a test, and added a new test for probe-error fallback behavior and logging.

Changes

Cohort / File(s) Summary
Fixture change
tests/conftest.py
Set supports_gpu_directives_cache = True on the slurm_system pytest fixture.
Slurm system tests
tests/systems/slurm/test_system.py
Removed use of the grouped_nodes fixture in one test, replaced with local empty mapping and mocks for SlurmSystem.update and SlurmSystem.group_nodes_by_state; reset supports_gpu_directives_cache before asserting behavior; added test_supports_gpu_directives_defaults_to_true_on_probe_error which patches fetch_command_output, asserts fallback to True, cache set to True, warning log content, and that fetch_command_output was called with "scontrol show config".

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 I nudged the cache, a tiny tweak,
Tests hopped forward, brave not meek.
When probes grumble and logs complain,
I default True — hop on the train! 🥕✨

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Avoid real system calls' directly relates to the main objective of the pull request, which is to eliminate real system calls in tests, particularly those related to GPU directives caching.
Description check ✅ Passed The description explains that the PR avoids real system calls related to supports_gpu_directives_cache == None, which matches the changes shown in the raw summary where the fixture is configured with cache defaults and tests are mocked.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch am/slurm
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use your project's `ruff` configuration to improve the quality of Python code reviews.

Add a Ruff configuration file to your project to customize how CodeRabbit runs ruff.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR improves test isolation by eliminating real Slurm system calls that were firing during the test suite — especially on systems with an accessible Slurm controller or inside sandboxed CI agents. The root cause was that supports_gpu_directives_cache defaulted to None on the shared slurm_system fixture, causing the supports_gpu_directives property to trigger an actual scontrol show config invocation, and get_available_nodes_from_group would call update()sinfo/squeue.

Key changes:

  • tests/conftest.py: Pre-seeds supports_gpu_directives_cache = True in the shared slurm_system fixture so the cache is always warm; tests that specifically exercise the property logic now explicitly reset it to None.
  • tests/systems/slurm/test_system.py: Refactors test_get_available_nodes_exceeding_limit_no_callstack to patch SlurmSystem.update and SlurmSystem.group_nodes_by_state, converting it from a test that depended on live Slurm infrastructure into a fully isolated unit test with proper mock assertions. Adds a new test (test_supports_gpu_directives_defaults_to_true_on_probe_error) covering the existing error-handling path that defaults to True when scontrol returns a non-empty stderr. Also removes the legacy typing.Dict/typing.List imports in favour of built-in generic aliases.

Confidence Score: 5/5

  • This PR is safe to merge — it only modifies test infrastructure with no changes to production code.
  • All changes are confined to the test suite. The fixture pre-seeding is opt-out (tests that need live cache logic reset it to None explicitly). The new mocks correctly intercept the two methods that previously made real system calls. The new test adds useful coverage for an existing error-handling branch. The only minor concern is an overly permissive kwargs assertion, which is a style issue and not a correctness bug.
  • No files require special attention.

Important Files Changed

Filename Overview
tests/conftest.py Adds supports_gpu_directives_cache = True to the shared slurm_system fixture to prevent real scontrol show config calls in any test that consumes the fixture.
tests/systems/slurm/test_system.py Refactors test_get_available_nodes_exceeding_limit_no_callstack to mock update and group_nodes_by_state, removing the unused grouped_nodes parameter; resets the cache to None in the GPU-directive tests that need live property logic; adds a new test for the error-handling path; cleans up legacy typing imports.

Sequence Diagram

sequenceDiagram
    participant Test
    participant slurm_system fixture
    participant SlurmSystem
    participant OS (scontrol/sinfo/squeue)

    Note over Test,OS (scontrol/sinfo/squeue): BEFORE this PR
    Test->>slurm_system fixture: request fixture (cache = None)
    Test->>SlurmSystem: get_available_nodes_from_group(...)
    SlurmSystem->>OS (scontrol/sinfo/squeue): sinfo / squeue (real call)
    OS (scontrol/sinfo/squeue)-->>SlurmSystem: result
    Test->>SlurmSystem: supports_gpu_directives (cache miss)
    SlurmSystem->>OS (scontrol/sinfo/squeue): scontrol show config (real call)
    OS (scontrol/sinfo/squeue)-->>SlurmSystem: result

    Note over Test,OS (scontrol/sinfo/squeue): AFTER this PR
    Test->>slurm_system fixture: request fixture (cache = True)
    Test->>SlurmSystem: get_available_nodes_from_group(...)
    SlurmSystem->>SlurmSystem: mock_update (no-op)
    SlurmSystem->>SlurmSystem: mock_group_nodes_by_state (returns empty)
    Test->>SlurmSystem: supports_gpu_directives (cache hit → True)
    Note over OS (scontrol/sinfo/squeue): No real OS calls made
Loading

Comments Outside Diff (1)

  1. tests/systems/slurm/test_system.py, line 167-168 (link)

    Overly permissive kwargs assertion

    group_nodes_by_state currently has the signature (self, partition_name: str, group_name: str) — no exclude_nodes parameter exists. Because the production code calls it with only positional arguments, kwargs will always be {}. Allowing {"exclude_nodes": None} as an alternative makes the assertion vacuously pass any future call that passes exclude_nodes=None, silently masking a regression if the call site changes in an unexpected way.

    Tightening this to check for exactly {} would make the test a stricter guard:

    If exclude_nodes support is intentionally planned for a future change, a brief comment explaining this would help future readers.

Last reviewed commit: "address linting issu..."

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/systems/slurm/test_system.py (1)

19-19: ⚠️ Potential issue | 🟡 Minor

Fix CI lint failure by removing unused typing imports.

Line 19 currently keeps Dict/List unused, and CI already reports ruff-check auto-removing it. Please commit that cleanup so pre-commit passes.

Suggested diff
-from typing import Dict, List
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/systems/slurm/test_system.py` at line 19, Remove the unused typing
imports by deleting "Dict" and "List" from the typing import (the line "from
typing import Dict, List"); if no other names are needed from typing, remove the
entire import line, otherwise only keep the used symbols; save and run the
linter (ruff/ruff-check) to confirm the unused-import warning is resolved and
commit the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/systems/slurm/test_system.py`:
- Around line 163-169: The test currently calls
slurm_system.get_available_nodes_from_group(partition_name, group_name,
num_nodes) but discards its return; update the test to capture the return value
and assert it equals [] (empty list) to lock down fallback behavior — e.g., call
get_available_nodes_from_group(...) into a variable like result and assert
result == [] while keeping the existing mock and log assertions (refer to
get_available_nodes_from_group, partition_name, group_name, num_nodes).

---

Outside diff comments:
In `@tests/systems/slurm/test_system.py`:
- Line 19: Remove the unused typing imports by deleting "Dict" and "List" from
the typing import (the line "from typing import Dict, List"); if no other names
are needed from typing, remove the entire import line, otherwise only keep the
used symbols; save and run the linter (ruff/ruff-check) to confirm the
unused-import warning is resolved and commit the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 45a3e744-8c4b-44bb-9e08-c015ab3b1879

📥 Commits

Reviewing files that changed from the base of the PR and between ef83120 and 9929317.

📒 Files selected for processing (2)
  • tests/conftest.py
  • tests/systems/slurm/test_system.py

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/systems/slurm/test_system.py (1)

162-171: ⚠️ Potential issue | 🟡 Minor

Assert fallback return value in the no-callstack test.

Line 162 calls get_available_nodes_from_group(...) but discards the returned value. Capture it and assert [] so the fallback contract is explicitly locked.

Suggested diff
-        slurm_system.get_available_nodes_from_group(partition_name, group_name, num_nodes)
+        nodes = slurm_system.get_available_nodes_from_group(partition_name, group_name, num_nodes)
+        assert nodes == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/systems/slurm/test_system.py` around lines 162 - 171, Capture the
return value from the call to get_available_nodes_from_group(partition_name,
group_name, num_nodes) and assert it equals [] to lock the fallback contract;
e.g., assign the call to a variable (e.g., result) immediately after the
invocation and add assert result == [] while keeping the existing mock
assertions and log-message assertion intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/systems/slurm/test_system.py`:
- Around line 162-171: Capture the return value from the call to
get_available_nodes_from_group(partition_name, group_name, num_nodes) and assert
it equals [] to lock the fallback contract; e.g., assign the call to a variable
(e.g., result) immediately after the invocation and add assert result == []
while keeping the existing mock assertions and log-message assertion intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f54482bc-7047-4a12-b00d-ad4f2ea0fdf7

📥 Commits

Reviewing files that changed from the base of the PR and between 9929317 and 398c9ff.

📒 Files selected for processing (1)
  • tests/systems/slurm/test_system.py

@amaslenn amaslenn requested a review from podkidyshev March 20, 2026 10:43
@amaslenn amaslenn merged commit 9916d02 into main Mar 20, 2026
5 checks passed
@amaslenn amaslenn deleted the am/slurm branch March 20, 2026 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants