Fix CI cgroupv2 failures by enabling cgroupv2 controller subtree control#2327
delthas wants to merge 1 commit into development/2.13
Pull request overview
Adjusts the end-to-end deploy composite action to proactively enable cgroup v2 controllers on GitHub-hosted runners, addressing recent CI breakages caused by runner/Docker/containerd cgroupv2 behavior changes (ZENKO-5194, runner-images#13684).
Changes:
- Add a deploy step that attempts to enable cgroup v2 controllers (including
cpuset) via cgroup.subtree_control before bootstrapping the kind cluster.
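The mechanism described above can be sketched roughly as follows. This is an illustrative outline, not the code from the PR: the function names `has_controller` and `enable_controllers` are hypothetical, and the cgroup directory is passed as an argument so the check can be tried against any directory. It relies on the cgroup v2 interface, where `cgroup.controllers` lists the controllers available to a cgroup and writing `+name` to `cgroup.subtree_control` enables a controller for its children.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers, not part of the PR.

# Return 0 if controller $2 is listed in $1/cgroup.subtree_control, non-zero otherwise.
has_controller() {
  local dir="$1" name="$2"
  # -w matches whole words, so "cpu" does not match "cpuset".
  grep -qw -- "$name" "$dir/cgroup.subtree_control" 2>/dev/null
}

# Try to enable, for the children of cgroup $1, every controller
# listed in $1/cgroup.controllers. Individual failures are tolerated.
enable_controllers() {
  local dir="$1" c
  [ -r "$dir/cgroup.controllers" ] || return 0
  for c in $(cat "$dir/cgroup.controllers"); do
    # Discard tee's stdout echo; error messages on stderr stay visible.
    echo "+$c" | sudo tee "$dir/cgroup.subtree_control" >/dev/null || true
  done
}
```

A step could then call `enable_controllers /sys/fs/cgroup` followed by `has_controller /sys/fs/cgroup cpuset || echo "WARNING: cpuset not enabled" >&2`, so that a silent write failure does not go unnoticed.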
.github/actions/deploy/action.yaml
Outdated
| echo "+$c" | sudo tee /sys/fs/cgroup/system.slice/cgroup.subtree_control 2>/dev/null || true | ||
| done | ||
| fi | ||
| echo "Controllers enabled: $(cat /sys/fs/cgroup/cgroup.subtree_control)" |
This step suppresses all errors from enabling controllers (2>/dev/null || true) but then logs Controllers enabled: ..., which can be misleading if writes failed (e.g., permissions / cgroup rules). Consider verifying that the required controller(s) (at least cpuset) are present in cgroup.subtree_control after attempting to enable them, and if not, emit a clear warning or fail fast with an actionable message so CI failures remain diagnosable.
| echo "Controllers enabled: $(cat /sys/fs/cgroup/cgroup.subtree_control)" | |
| enabled_controllers="$(cat /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || echo "")" | |
| echo "Controllers enabled: ${enabled_controllers}" | |
| if ! printf '%s\n' "${enabled_controllers}" | grep -qw "cpuset"; then | |
| echo "ERROR: Failed to enable required 'cpuset' cgroup v2 controller in /sys/fs/cgroup/cgroup.subtree_control." >&2 | |
| echo "This can cause JVM (JDK 11) and ZooKeeper failures inside containers on GitHub Actions runners." >&2 | |
| echo "Please ensure the runner allows enabling the 'cpuset' controller (cgroup v2 delegation) or disable this step." >&2 | |
| exit 1 | |
| fi |
.github/actions/deploy/action.yaml
Outdated
| echo "+$c" | sudo tee /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || true | ||
| done | ||
| if [ -f /sys/fs/cgroup/system.slice/cgroup.subtree_control ]; then | ||
| for c in $(cat /sys/fs/cgroup/system.slice/cgroup.controllers); do | ||
| echo "+$c" | sudo tee /sys/fs/cgroup/system.slice/cgroup.subtree_control 2>/dev/null || true |
tee will echo each +$c back to stdout, which can add a lot of log noise in CI. Redirecting tee output to /dev/null (while keeping errors/warnings as appropriate) would keep logs cleaner and make failures easier to spot.
| echo "+$c" | sudo tee /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || true | |
| done | |
| if [ -f /sys/fs/cgroup/system.slice/cgroup.subtree_control ]; then | |
| for c in $(cat /sys/fs/cgroup/system.slice/cgroup.controllers); do | |
| echo "+$c" | sudo tee /sys/fs/cgroup/system.slice/cgroup.subtree_control 2>/dev/null || true | |
| echo "+$c" | sudo tee /sys/fs/cgroup/cgroup.subtree_control >/dev/null || true | |
| done | |
| if [ -f /sys/fs/cgroup/system.slice/cgroup.subtree_control ]; then | |
| for c in $(cat /sys/fs/cgroup/system.slice/cgroup.controllers); do | |
| echo "+$c" | sudo tee /sys/fs/cgroup/system.slice/cgroup.subtree_control >/dev/null || true |
0e3319b to 159cf07
159cf07 to 79158ea
Issue: ZENKO-5194
See: actions/runner-images#13684