From patchwork Tue Aug 7 15:01:27 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Serge E. Hallyn"
X-Patchwork-Id: 175898
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from chlorine.canonical.com (chlorine.canonical.com [91.189.94.204])
	by ozlabs.org (Postfix) with ESMTP id E819D2C00A3
	for ; Wed, 8 Aug 2012 21:43:37 +1000 (EST)
Received: from localhost ([127.0.0.1] helo=chlorine.canonical.com)
	by chlorine.canonical.com with esmtp (Exim 4.71) (envelope-from )
	id 1Sz4fd-0006bM-NE; Wed, 08 Aug 2012 11:43:25 +0000
Received: from 50-56-35-84.static.cloud-ips.com ([50.56.35.84] helo=mail.hallyn.com)
	by chlorine.canonical.com with esmtp (Exim 4.71) (envelope-from )
	id 1SylGY-0003ee-7x for kernel-team@lists.ubuntu.com;
	Tue, 07 Aug 2012 15:00:14 +0000
Received: by mail.hallyn.com (Postfix, from userid 1000)
	id 4EFFBC80E1; Tue, 7 Aug 2012 15:01:27 +0000 (UTC)
Date: Tue, 7 Aug 2012 15:01:27 +0000
From: "Serge E. Hallyn"
To: kernel-team@lists.ubuntu.com
Subject: [berrange@redhat.com: [PATCH] Forbid invocation of kexec_load()
 outside initial PID namespace]
Message-ID: <20120807150127.GB5070@mail.hallyn.com>
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Mailman-Approved-At: Wed, 08 Aug 2012 11:43:24 +0000
Cc: james.hunt@canonical.com, stgraber@ubuntu.com
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Kernel team discussions
Sender: kernel-team-bounces@lists.ubuntu.com
Errors-To: kernel-team-bounces@lists.ubuntu.com

(Hopefully my unsubscribed account can email kernel-team)

Hi,

this patch will probably not hit upstream, because the 'proper' fix is
user namespaces. User namespaces however won't be ready until after quantal.
So I'd like this patch to be applied in precise and quantal if possible.

Problem: Containers are granted CAP_SYS_BOOT. The reboot path in the
kernel checks whether you are in the initial pidns, and, if not, sends
a signal to your parent indicating you were 'rebooted' or 'shut down'.
So there is no danger of a container rebooting the host. However,
CAP_SYS_BOOT also authorizes kexec, without the pidns check. Therefore,
containers are able to kexec a new kernel, which is obviously a bad
thing.

This patch prevents that by only allowing kexec from the initial pid
namespace. It is nacked by Eric Biederman (but acked by me) because he
feels this should be stopped by having the container in a private user
namespace, with the kexec cap_sys_boot check targeted to the initial
user namespace. As I said, that won't be doable during the quantal
timeframe.

thanks,
-serge

----- Forwarded message from "Daniel P. Berrange" -----

Date: Fri, 3 Aug 2012 11:53:04 +0100
From: "Daniel P. Berrange"
To: linux-kernel@vger.kernel.org
Cc: containers@lists.linux-foundation.org, "Daniel P. Berrange",
	Serge Hallyn, Daniel Lezcano, Michael Kerrisk,
	"Eric W. Biederman", Tejun Heo, Oleg Nesterov
Subject: [PATCH] Forbid invocation of kexec_load() outside initial PID
 namespace

From: "Daniel P. Berrange"

The following commit

    commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
    Author: Daniel Lezcano
    Date:   Wed Mar 28 14:42:51 2012 -0700

        pidns: add reboot_pid_ns() to handle the reboot syscall

introduced custom handling of the reboot() syscall when invoked from a
non-initial PID namespace. The intent was that a process in a container
can be allowed to keep CAP_SYS_BOOT and execute reboot() to shut down or
reboot just its private container, rather than the host.

Unfortunately the kexec_load() syscall also relies on the CAP_SYS_BOOT
capability. So by allowing a container to keep this capability to
safely invoke reboot(), they mistakenly also gain the ability to use
kexec_load().
The solution is to make kexec_load() return -EPERM if invoked from a
PID namespace that is not the initial namespace

Signed-off-by: Daniel P. Berrange
Cc: Serge Hallyn
Cc: Daniel Lezcano
Cc: Michael Kerrisk
Cc: "Eric W. Biederman"
Cc: Tejun Heo
Cc: Oleg Nesterov
---
 kernel/kexec.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 0668d58..b152bde 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
 	if (!capable(CAP_SYS_BOOT))
 		return -EPERM;
 
+	/* Processes in containers must not be allowed to load a new
+	 * kernel, even if they have CAP_SYS_BOOT */
+	if (task_active_pid_ns(current) != &init_pid_ns)
+		return -EPERM;
+
 	/*
 	 * Verify we have a legal set of flags
 	 * This leaves us room for future extensions.
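
For illustration only, the permission decision at the top of the patched
kexec_load() can be modelled in userspace C roughly as follows. This is a
sketch under stated assumptions, not kernel code: model_kexec_load_check()
and its two boolean parameters are hypothetical stand-ins for
capable(CAP_SYS_BOOT) and for the task_active_pid_ns(current) comparison
against &init_pid_ns shown in the hunk above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the two checks at the top of the
 * patched kexec_load(). The parameters stand in for kernel state:
 *   has_cap_sys_boot ~ capable(CAP_SYS_BOOT)
 *   in_init_pid_ns   ~ task_active_pid_ns(current) == &init_pid_ns
 * Returns 0 if the call may proceed, -EPERM otherwise.
 */
static int model_kexec_load_check(bool has_cap_sys_boot, bool in_init_pid_ns)
{
	/* Existing check: the caller must hold CAP_SYS_BOOT. */
	if (!has_cap_sys_boot)
		return -EPERM;

	/* Check added by the patch: the capability alone is not enough. */
	if (!in_init_pid_ns)
		return -EPERM;

	return 0;
}
```

A container that kept CAP_SYS_BOOT corresponds to
model_kexec_load_check(true, false), which yields -EPERM in this model.
Before the patch only the first check existed, so the same caller would
have been allowed through.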