-
Notifications
You must be signed in to change notification settings - Fork 11
ephemeral: Replace systemd.volatile=overlay with fine-grained mounts #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request replaces the broad systemd.volatile=overlay with more granular, fine-grained tmpfs and overlayfs mounts for /var and /etc respectively. This is a well-reasoned change that solves a real-world issue with nested overlayfs when using podman inside the ephemeral VM. The implementation using systemd credentials injected via SMBIOS is clean and effective. The addition of an integration test to verify the new mount layout is excellent. My feedback is minor, focusing on improving code consistency and fixing a small typo in a comment.
fe6c2e0 to
b3cd27b
Compare
67695e6 to
5516e29
Compare
Instead of using systemd.volatile=overlay which overlaid all of / with a single tmpfs-backed overlayfs, set up /etc and /var separately: - /etc: overlayfs with tmpfs upper (transient changes, lost on reboot) - /var: real tmpfs with content copied from image (not overlayfs) The key benefit is that /var is now a real tmpfs, allowing podman to use overlayfs for container storage inside /var/lib/containers. With the old approach, the nested overlayfs caused "too many levels of symbolic links" errors. Implementation: The initramfs units are embedded in a CPIO archive that gets appended to the existing initramfs. This uses the Linux kernel's ability to concatenate multiple CPIO archives. Services running in initramfs (before switch-root): - bcvk-etc-overlay.service: Sets up overlay on /sysroot/etc using a bind-mounted lowerdir to avoid self-referential mount issues on older kernels. Uses index=off,metacopy=off for virtiofs compat. - bcvk-var-ephemeral.service: Copies /sysroot/var to tmpfs and bind mounts it back. - bcvk-copy-units.service: Copies bcvk-journal-stream.service to /run/systemd/system/ for systemd <256 compatibility. The /run tmpfs is preserved across switch-root via MS_MOVE. For systemd 256+, the journal-stream unit is created via SMBIOS credentials (systemd.extra-unit.*). For older versions like CentOS Stream 9 (systemd 252), the unit is copied from the initramfs since credential-based unit creation isn't supported. The execute service target is changed from default.target to multi-user.target with ConditionPathExists=!/etc/initrd-release to ensure it runs after switch-root, not in the initramfs. This is Phase 1 of issue bootc-dev#22, making ephemeral VMs more bootc-like. SELinux is still disabled (selinux=0); Phase 2 will add composefs support to enable proper SELinux labeling. Closes: bootc-dev#22 (Phase 1) Assisted-by: OpenCode (Sonnet 4) Signed-off-by: Colin Walters <walters@verbum.org>
5516e29 to
cbec809
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the ephemeral VM setup to use fine-grained mounts for /etc and /var instead of a single systemd.volatile=overlay. This change is crucial for allowing Podman to use overlayfs for container storage inside /var/lib/containers, resolving previous 'too many levels of symbolic links' errors. The implementation introduces a new cpio module to embed systemd units directly into the initramfs, ensuring compatibility with older systemd versions. Integration tests have been added to verify the new mount layout. Overall, the changes are well-structured and address a significant functional limitation.
| let buf = write_directory(buf, "usr").unwrap(); | ||
| let buf = write_directory(buf, "usr/lib").unwrap(); | ||
| let buf = write_directory(buf, "usr/lib/systemd").unwrap(); | ||
| let buf = write_directory(buf, "usr/lib/systemd/system").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of unwrap() here can lead to a panic with a generic message if directory creation fails. While Vec<u8> as a writer should not fail, the io::Result<W> return type indicates a potential failure. Consider replacing unwrap() with expect() to provide more context in case of a panic, or implement more robust error handling if these operations could genuinely fail in a recoverable way.
For example: write_directory(buf, "usr").expect("Failed to write 'usr' directory to CPIO archive")
| let buf = write_directory(buf, "usr").unwrap(); | |
| let buf = write_directory(buf, "usr/lib").unwrap(); | |
| let buf = write_directory(buf, "usr/lib/systemd").unwrap(); | |
| let buf = write_directory(buf, "usr/lib/systemd/system").unwrap(); | |
| let buf = write_directory(buf, "usr").expect("Failed to write 'usr' directory to CPIO archive"); | |
| let buf = write_directory(buf, "usr/lib").expect("Failed to write 'usr/lib' directory to CPIO archive"); | |
| let buf = write_directory(buf, "usr/lib/systemd").expect("Failed to write 'usr/lib/systemd' directory to CPIO archive"); | |
| let buf = write_directory(buf, "usr/lib/systemd/system").expect("Failed to write 'usr/lib/systemd/system' directory to CPIO archive"); |
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/bcvk-etc-overlay.service", | ||
| etc_overlay_content.as_bytes(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/bcvk-var-ephemeral.service", | ||
| var_ephemeral_content.as_bytes(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/bcvk-copy-units.service", | ||
| copy_units_content.as_bytes(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/bcvk-journal-stream.service", | ||
| journal_stream_content.as_bytes(), | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| // Drop-in configs to pull units into initrd-fs.target | ||
| let buf = write_directory(buf, "usr/lib/systemd/system/initrd-fs.target.d").unwrap(); | ||
|
|
||
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/initrd-fs.target.d/bcvk-etc-overlay.conf", | ||
| b"[Unit]\nWants=bcvk-etc-overlay.service\n", | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/initrd-fs.target.d/bcvk-var-ephemeral.conf", | ||
| b"[Unit]\nWants=bcvk-var-ephemeral.service\n", | ||
| ) | ||
| .unwrap(); | ||
|
|
||
| let buf = write_file( | ||
| buf, | ||
| "usr/lib/systemd/system/initrd-fs.target.d/bcvk-copy-units.conf", | ||
| b"[Unit]\nWants=bcvk-copy-units.service\n", | ||
| ) | ||
| .unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of repetition in creating CPIO entries for directories and files. This block could be refactored into a helper function or by iterating over a list of paths and contents. This would improve readability and maintainability, especially if more units or directories need to be added in the future.
For example, you could define a list of (path, content_option) tuples and loop through them.
let entries = [
("usr/lib/systemd/system/bcvk-etc-overlay.service", Some(etc_overlay_content.as_bytes())),
("usr/lib/systemd/system/bcvk-var-ephemeral.service", Some(var_ephemeral_content.as_bytes())),
("usr/lib/systemd/system/bcvk-copy-units.service", Some(copy_units_content.as_bytes())),
("usr/lib/systemd/system/bcvk-journal-stream.service", Some(journal_stream_content.as_bytes())),
("usr/lib/systemd/system/initrd-fs.target.d/bcvk-etc-overlay.conf", Some(b"[Unit]\nWants=bcvk-etc-overlay.service\n")),
("usr/lib/systemd/system/initrd-fs.target.d/bcvk-var-ephemeral.conf", Some(b"[Unit]\nWants=bcvk-var-ephemeral.service\n")),
("usr/lib/systemd/system/initrd-fs.target.d/bcvk-copy-units.conf", Some(b"[Unit]\nWants=bcvk-copy-units.service\n")),
];
let mut buf = write_directory(buf, "usr").expect("Failed to write 'usr' directory");
buf = write_directory(buf, "usr/lib").expect("Failed to write 'usr/lib' directory");
buf = write_directory(buf, "usr/lib/systemd").expect("Failed to write 'usr/lib/systemd' directory");
buf = write_directory(buf, "usr/lib/systemd/system").expect("Failed to write 'usr/lib/systemd/system' directory");
buf = write_directory(buf, "usr/lib/systemd/system/initrd-fs.target.d").expect("Failed to write 'initrd-fs.target.d' directory");
for (path, content) in entries {
if let Some(c) = content {
buf = write_file(buf, path, c).expect(&format!("Failed to write file {} to CPIO archive", path));
} else {
buf = write_directory(buf, path).expect(&format!("Failed to write directory {} to CPIO archive", path));
}
}| let source_initramfs_path = initramfs_path | ||
| .ok_or_else(|| eyre!("No initramfs found in /run/source-image/usr/lib/modules"))?; | ||
|
|
||
| fs::File::create(&kernel_mount)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line creates an empty file at kernel_mount. Since the very next step is a mount --bind command that will mount the actual kernel file to this location, creating an empty file beforehand is redundant and unnecessary. The mount command will handle the creation of the mount point if it doesn't exist, or use the existing one.
Instead of using systemd.volatile=overlay which overlaid all of / with a single tmpfs-backed overlayfs, set up /etc and /var separately:
The key benefit is that /var is now a real tmpfs, allowing podman to use overlayfs for container storage inside /var/lib/containers. With the old approach, the nested overlayfs caused "too many levels of symbolic links" errors.
Implementation uses systemd credentials to inject units that run in the initramfs before switch-root:
Both units use ConditionPathExists=/etc/initrd-release to only run in the initramfs context.
This is Phase 1 of issue #22, making ephemeral VMs more bootc-like. SELinux is still disabled (selinux=0); Phase 2 will add composefs support to enable proper SELinux labeling.
xref: #22 (Phase 1)
Assisted-by: OpenCode (Sonnet 4)