Message ID | 20210225194702.6113-1-pvorel@suse.cz |
---|---|
State | New |
Headers | show |
Series | [RFC] Linux: Workaround seccomp() issue with faccessat2() | expand |
Hi, On Thu, Feb 25, 2021 at 08:47:02PM +0100, Petr Vorel wrote: > 3d3ab573a5 ("Linux: Use faccessat2 to implement faccessat (bug 18683)") > started to use faccessat2() which breaks docker/podman/... containers > with guest running glibc 2.33 running on host with older kernel and are > built with older libseccomp. > > See also: https://bugzilla.opensuse.org/show_bug.cgi?id=1182451#c17 > > Signed-off-by: Petr Vorel <pvorel@suse.cz> > --- > Hi, > > I admit that this is a very ugly workaround and wouldn't be surprised if > you just don't care about seccomp() incompatibilities. But it'd be nice > to have unified approach for this incompatibility, as it hits any distro > with glibc 2.33 (currently openSUSE Tumbleweed, Arch Linux, Fedora > rawhide). And after some time (when old LTS distros EOL) this crap could be removed. > > More info: > https://github.com/opencontainers/runc/pull/2750 > https://github.com/seccomp/libseccomp/issues/314 > > Kind regards, > Petr Petr, you must have missed the whole discussion on this subject [1][2], the consensus was that problematic container runtimes need to be fixed to make their seccomp filters return ENOSYS for unknown syscalls. [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119955.html [2] https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/T/#u
Hi Dmitry, > Petr, you must have missed the whole discussion on this subject [1][2], > the consensus was that problematic container runtimes need to be fixed > to make their seccomp filters return ENOSYS for unknown syscalls. Thanks for info and sorry for spam then. Kind regards, Petr > [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119955.html > [2] https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/T/#u
On 26 Feb 2021 05:11, Petr Vorel wrote: > Hi Dmitry, > > Petr, you must have missed the whole discussion on this subject [1][2], > > the consensus was that problematic container runtimes need to be fixed > > to make their seccomp filters return ENOSYS for unknown syscalls. > > Thanks for info and sorry for spam then. no need to apologize. can't expect everyone to read everything all the time. -mike
On 2021-02-26, Dmitry V. Levin <ldv@altlinux.org> wrote: > On Thu, Feb 25, 2021 at 08:47:02PM +0100, Petr Vorel wrote: > > 3d3ab573a5 ("Linux: Use faccessat2 to implement faccessat (bug 18683)") > > started to use faccessat2() which breaks docker/podman/... containers > > with guest running glibc 2.33 running on host with older kernel and are > > built with older libseccomp. > > > > See also: https://bugzilla.opensuse.org/show_bug.cgi?id=1182451#c17 > > > > Signed-off-by: Petr Vorel <pvorel@suse.cz> > > --- > > Hi, > > > > I admit that this is a very ugly workaround and wouldn't be surprised if > > you just don't care about seccomp() incompatibilities. But it'd be nice > > to have unified approach for this incompatibility, as it hits any distro > > with glibc 2.33 (currently openSUSE Tumbleweed, Arch Linux, Fedora > > rawhide). And after some time (when old LTS distros EOL) this crap could be removed. > > > > More info: > > https://github.com/opencontainers/runc/pull/2750 > > https://github.com/seccomp/libseccomp/issues/314 > > > > Kind regards, > > Petr > > Petr, you must have missed the whole discussion on this subject [1][2], > the consensus was that problematic container runtimes need to be fixed > to make their seccomp filters return ENOSYS for unknown syscalls. It should also be noted that we fixed this in runc a month ago[1], which means that it's up to distributions and cloud vendors to update their runc packages to the latest version or backport the patch. Docker's packaging hasn't been updated to use the latest runc yet (that'll happen in the next patch release), but distributions can ship newer runc versions -- that's what we do in openSUSE. [1]: https://github.com/opencontainers/runc/pull/2750
* Aleksa Sarai: > It should also be noted that we fixed this in runc a month ago[1], which > means that it's up to distributions and cloud vendors to update their > runc packages to the latest version or backport the patch. > > Docker's packaging hasn't been updated to use the latest runc yet > (that'll happen in the next patch release), but distributions can ship > newer runc versions -- that's what we do in openSUSE. > > [1]: https://github.com/opencontainers/runc/pull/2750 There are some indications that not all container runtimes will pick up the runc kludge (thanks for developing that by the way). So it's likely that the general issue will be with us for a while longer. Maybe the competitive pressure from other working container runtimes will encourage other re-evaluate their approach, I don't know. We still don't plan to throw in downstream-only glibc patches to paper over this (given that it's been rejected by kernel and glibc developers alike, I really think it's the wrong way to go). So far management isn't breathing down our necks. Thanks, Florian
Hi all, > There are some indications that not all container runtimes will pick up > the runc kludge (thanks for developing that by the way). So it's likely > that the general issue will be with us for a while longer. Maybe the > competitive pressure from other working container runtimes will > encourage other re-evaluate their approach, I don't know. Hopefully. > We still don't plan to throw in downstream-only glibc patches to paper > over this (given that it's been rejected by kernel and glibc developers > alike, I really think it's the wrong way to go). So far management > isn't breathing down our necks. As workaround exists (for openSUSE using podman with newest runc v1.0.0-rc93) I understand the reluctance to accept a workaround. It just reminds me occasional musl approach to be correct no matter what problems it brings to users. Kind regards, Petr > Thanks, > Florian
diff --git a/sysdeps/unix/sysv/linux/faccessat.c b/sysdeps/unix/sysv/linux/faccessat.c index 13160d3249..f01c59b6e7 100644 --- a/sysdeps/unix/sysv/linux/faccessat.c +++ b/sysdeps/unix/sysv/linux/faccessat.c @@ -30,9 +30,22 @@ __faccessat (int fd, const char *file, int mode, int flag) #if __ASSUME_FACCESSAT2 return ret; #else - if (ret == 0 || errno != ENOSYS) + if (ret == 0 || (errno != ENOSYS && errno != EPERM)) return ret; + /* + * Check seccomp() issue with faccessat2(). Additional EPERM means seccomp() + * in use, ENOSYS or EBADF real EPERM. + */ + if (errno == EPERM) { + int backup = errno; + INLINE_SYSCALL_CALL (faccessat2, -2, ".", 0, 0); + int err = errno; + errno = backup; + if (err != EPERM) + return ret; + } + if (flag & ~(AT_SYMLINK_NOFOLLOW | AT_EACCESS)) return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
3d3ab573a5 ("Linux: Use faccessat2 to implement faccessat (bug 18683)") started to use faccessat2() which breaks docker/podman/... containers with guest running glibc 2.33 running on host with older kernel and are built with older libseccomp. See also: https://bugzilla.opensuse.org/show_bug.cgi?id=1182451#c17 Signed-off-by: Petr Vorel <pvorel@suse.cz> --- Hi, I admit that this is a very ugly workaround and wouldn't be surprised if you just don't care about seccomp() incompatibilities. But it'd be nice to have unified approach for this incompatibility, as it hits any distro with glibc 2.33 (currently openSUSE Tumbleweed, Arch Linux, Fedora rawhide). And after some time (when old LTS distros EOL) this crap could be removed. More info: https://github.com/opencontainers/runc/pull/2750 https://github.com/seccomp/libseccomp/issues/314 Kind regards, Petr sysdeps/unix/sysv/linux/faccessat.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)