storage, overlay: stop using symlinks for new layers#666
storage, overlay: stop using symlinks for new layers#666giuseppe wants to merge 2 commits intocontainers:mainfrom
Conversation
9827306 to
dff75cb
Compare
|
@mtrmac I believe you are going to like this PR :) |
|
At a glance, yes I do :) |
Luap99
left a comment
There was a problem hiding this comment.
I wonder, do we have a test case that mounts a 500 layer overlay? If not should we have one to ensure mounting always works?
|
Mounting with the new code gives always shorter paths since we use the fd number as the full path instead of a symlink that is still an absolute path. I think this code was left there just for historical reasons and there is no case where it is better |
kolyshkin
left a comment
There was a problem hiding this comment.
Maybe it makes sense to add comments marking the remaining "legacy" code paths (dealing with l/ symlinks) as such, so they can be removed in the future. Otherwise it's hard to grok for future readers as it is -- we don't create any links but we handle them.
One other thing is, I'm not sure if we need the entire fork/exec open lower fds. This is a further optimization but maybe we can just do it in the same process? I mean, the fork/exec was done because of chdir, but it looks like we no longer need it (yet it's left there, and that's another part of "legacy" code I guess). So, maybe we should call mountOverlayFrom when working with legacy l/ symlinks, and do a via-fds in the same process?
As for the tests, I think this is covered by existing @test "allow storing images with more than 127 layers" which creates 300 layers (although I haven't checked it if actually crosses the pagesize threshold).
storage/drivers/overlay/overlay.go
Outdated
|
|
||
| func (d *Driver) create(id, parent string, opts *graphdriver.CreateOpts, readOnly bool) (retErr error) { | ||
| dir, homedir, _ := d.dir2(id, readOnly) | ||
| dir, _, _ := d.dir2(id, readOnly) |
There was a problem hiding this comment.
BTW this removes the only user of the second value returned from dir2, so maybe it makes sense to drop it (perhaps in a separate subsequent commit).
There was a problem hiding this comment.
thanks for spotting it, fixed now!
We do have one for 300 layers, see storage/tests/layers.bats, although as I said I haven't checked we're actually crossing the pagesize threshold with it. |
|
@giuseppe do you spin up a PR in Podman to see how this does there? |
e3f0bdb to
3391780
Compare
fixed, and opened a test PR here: containers/podman#28164 |
|
I restarted the failing test here, it looked to be a time out flake. |
that failure is expected because we are not rebuilding Buildah. Buildah fails to find the symlink since we are not creating it anymore. |
|
@Luap99 @mtrmac @kolyshkin PTAL |
Well how do you expect us to get CI passing then? We should not merge this if it breaks CI tests. i.e. this test is using skopeo for example |
|
ok let's make it in two phases, I'll change this PR to not use the symlinks but still create them |
a1bf17b to
171a0bc
Compare
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
Podman tests pass now |
171a0bc to
b8f4b1f
Compare
kolyshkin
left a comment
There was a problem hiding this comment.
Assuming the new functionality is covered by existing tests, LGTM
b8f4b1f to
e345324
Compare
Write a new "lower-layers" file alongside the existing "lower" file. It stores layer IDs directly instead of l/ symlink references; the reading side appends "/diff" itself. When present, lower-layers is preferred over lower. The old lower file is still written so that older tools (buildah, skopeo) continue to work. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
e345324 to
823a7b8
Compare
this is the limit hardcoded in the kernel. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
|
I think this is not immediately urgent if this PR will keep creating the link files, but per #703 (and code linked from there), the That would need a re-think/re-design before we remove the |
Once we decide to drop completely the |
|
The problem is the other way: if there is no |
|
… and the way we detect which graph driver is “currently in use”. |
I see. Could we use |
|
That all works but those are untransparent implementation hacks. What are we doing here? If the directory presence matters it should be consciously managed as specific goal. Or move the information elsewhere, into a real metadata file, perhaps |
|
sure, we can handle that with an explicit file to prevent If there are no layers/containers, is the storage driver really used? Triggering again the heuristic to pick a driver is really the wrong thing to do? Upgrading from |
The overlay driver maintained a l/ directory of symlinks solely to shorten mount option strings below the kernel's page-size limit.
This is unnecessary since mountOverlayFrom() already handles long mount options via fd-based mounting.
New layers now store relative diff paths (e.g. "layer-id/diff") directly in the lower file instead of short symlink references.
Old-format layers with l/ symlinks continue to work.