diff mbox

[ebiederm@xmission.com:,[REVIEW,2/3] fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)]

Message ID 20131129220709.GA16889@sergelap
State New
Headers show

Commit Message

Serge E. Hallyn Nov. 29, 2013, 10:07 p.m. UTC
Hi,

this patch is in
https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git/log/?h=for-next
(and I assume in linux-next).  It's one of two patches needed in trusty
for containers.  This one enables lxc-attach to work in trusty.

----- Forwarded message from "Eric W. Biederman" <ebiederm@xmission.com> -----

Date: Tue, 26 Nov 2013 16:16:57 -0800
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>, Containers <containers@lists.linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, Aditya Kali <adityakali@google.com>, Oleg Nesterov
	<oleg@redhat.com>, Andy Lutomirski <luto@amacapital.net>
Subject: [REVIEW][PATCH 2/3] fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)


Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> Hi Oleg,
>
> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> breaks lxc-attach in 3.12.  That code forks a child which does
> setns() and then does a clone(CLONE_PARENT).  That way the
> grandchild can be in the right namespaces (which the child was
> not) and be a child of the original task, which is the monitor.
>
> lxc-attach in 3.11 was working fine with no side effects that I
> could see.  Is there a real danger in allowing CLONE_PARENT
> when current->nsproxy->pidns_for_children is not our pidns,
> or was this done out of an "over-abundance of caution"?  Can we
> safely revert that new extra check?

The two fundamental things I know we can not allow are:
- A shared signal queue aka CLONE_THREAD.  Because we compute the pid
  and uid of the signal when we place it in the queue.

- Changing the pid and by extention pid_namespace of an existing
  process.

>From a parents perspective there is nothing special about the pid
namespace, to deny CLONE_PARENT, because the parent simply won't know or
care.

>From the childs perspective all that is special really are shared signal
queues.

User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
in different pid namespaces is almost certainly going to break because
it is complicated.  But shared signal handlers can look at per thread
information to know which pid namespace a process is in, so I don't know
of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
at the kernel level.  It would be absolutely stupid to implement but
that is a different thing.

So hmm.

Because it can do no harm, and because it is a regression let's remove
the CLONE_PARENT check and send it stable.

Cc: stable@vger.kernel.org
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/fork.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Andy Whitcroft Dec. 2, 2013, 11:15 a.m. UTC | #1
On Fri, Nov 29, 2013 at 04:07:09PM -0600, Serge Hallyn wrote:
> Hi,
> 
> this patch is in
> https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git/log/?h=for-next
> (and I assume in linux-next).  It's one of two patches needed in trusty
> for containers.  This one enables lxc-attach to work in trusty.
> 
> ----- Forwarded message from "Eric W. Biederman" <ebiederm@xmission.com> -----
> 
> Date: Tue, 26 Nov 2013 16:16:57 -0800
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> To: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Gao feng <gaofeng@cn.fujitsu.com>, Containers <containers@lists.linux-foundation.org>,
> 	linux-fsdevel@vger.kernel.org, Aditya Kali <adityakali@google.com>, Oleg Nesterov
> 	<oleg@redhat.com>, Andy Lutomirski <luto@amacapital.net>
> Subject: [REVIEW][PATCH 2/3] fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)
> 
> 
> Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> > Hi Oleg,
> >
> > commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> > "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> > breaks lxc-attach in 3.12.  That code forks a child which does
> > setns() and then does a clone(CLONE_PARENT).  That way the
> > grandchild can be in the right namespaces (which the child was
> > not) and be a child of the original task, which is the monitor.
> >
> > lxc-attach in 3.11 was working fine with no side effects that I
> > could see.  Is there a real danger in allowing CLONE_PARENT
> > when current->nsproxy->pidns_for_children is not our pidns,
> > or was this done out of an "over-abundance of caution"?  Can we
> > safely revert that new extra check?
> 
> The two fundamental things I know we can not allow are:
> - A shared signal queue aka CLONE_THREAD.  Because we compute the pid
>   and uid of the signal when we place it in the queue.
> 
> - Changing the pid and by extention pid_namespace of an existing
>   process.
> 
> >From a parents perspective there is nothing special about the pid
> namespace, to deny CLONE_PARENT, because the parent simply won't know or
> care.
> 
> >From the childs perspective all that is special really are shared signal
> queues.
> 
> User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
> in different pid namespaces is almost certainly going to break because
> it is complicated.  But shared signal handlers can look at per thread
> information to know which pid namespace a process is in, so I don't know
> of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
> at the kernel level.  It would be absolutely stupid to implement but
> that is a different thing.
> 
> So hmm.
> 
> Because it can do no harm, and because it is a regression let's remove
> the CLONE_PARENT check and send it stable.
> 
> Cc: stable@vger.kernel.org
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Acked-by: Andy Lutomirski <luto@amacapital.net>
> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  kernel/fork.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 728d5be9548c..f82fa2ee7581 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1171,7 +1171,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  	 * do not allow it to share a thread group or signal handlers or
>  	 * parent with the forking task.
>  	 */
> -	if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
> +	if (clone_flags & CLONE_SIGHAND) {
>  		if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
>  		    (task_active_pid_ns(current) !=
>  				current->nsproxy->pid_ns_for_children))
> -- 

This patch is the upstream fix for which we are already carrying a
revert.  I will unrevert that and replace same with this patch.

Applied to Trusty unstable.

-apw
diff mbox

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index 728d5be9548c..f82fa2ee7581 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1171,7 +1171,7 @@  static struct task_struct *copy_process(unsigned long clone_flags,
 	 * do not allow it to share a thread group or signal handlers or
 	 * parent with the forking task.
 	 */
-	if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
+	if (clone_flags & CLONE_SIGHAND) {
 		if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
 		    (task_active_pid_ns(current) !=
 				current->nsproxy->pid_ns_for_children))