diff mbox

pppd service crash in linux-3.13.6

Message ID 20140313170622.GA31206@redhat.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Oleg Nesterov March 13, 2014, 5:06 p.m. UTC
On 03/10, Peter Hurley wrote:
>
> [ +cc Oleg Nesterov ]

Thanks.

> The NULL ptr dereference is from following the current->nsproxy ptr
> in ppp_register_channel().
>
> This was broken by
>
> commit 8aac62706adaaf0fab02c4327761561c8bda9448
> Author: Oleg Nesterov <oleg@redhat.com>
> Date:   Fri Jun 14 21:09:49 2013 +0200
>
>     move exit_task_namespaces() outside of exit_notify()
>
> which moved the exit_task_namespaces(tsk) before disassociate_ctty().

Heh. OK, we can move it down after disassociate_ctty(), the original
motivation for that commit was the problem which was also (hopefully)
fixed by e7b2c406925273 "fput: task_work_add() can fail if the caller
has passed exit_task_work()".

In fact I think that it makes sense to move it down after
exit_task_work() anyway. But this is almost off-topic and I'd like to
avoid this right now.

OTOH, why we should delay disassociate_ctty? IOW, do you see any
potential problem with the trivial patch below?

And it seems that it makes sense to move (at least) check_stack_usage()
down, but this is offtopic too.

Oleg.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Peter Hurley March 13, 2014, 5:55 p.m. UTC | #1
Hi Oleg,

On 03/13/2014 01:06 PM, Oleg Nesterov wrote:
> On 03/10, Peter Hurley wrote:
>>
>> [ +cc Oleg Nesterov ]
>
> Thanks.
>
>> The NULL ptr dereference is from following the current->nsproxy ptr
>> in ppp_register_channel().
>>
>> This was broken by
>>
>> commit 8aac62706adaaf0fab02c4327761561c8bda9448
>> Author: Oleg Nesterov <oleg@redhat.com>
>> Date:   Fri Jun 14 21:09:49 2013 +0200
>>
>>      move exit_task_namespaces() outside of exit_notify()
>>
>> which moved the exit_task_namespaces(tsk) before disassociate_ctty().
>
> Heh. OK, we can move it down after disassociate_ctty(), the original
> motivation for that commit was the problem which was also (hopefully)
> fixed by e7b2c406925273 "fput: task_work_add() can fail if the caller
> has passed exit_task_work()".

I didn't look into what motivated the change; I will now though.

> In fact I think that it makes sense to move it down after
> exit_task_work() anyway. But this is almost off-topic and I'd like to
> avoid this right now.
>
> OTOH, why we should delay disassociate_ctty? IOW, do you see any
> potential problem with the trivial patch below?

I have no idea what kind of dependencies might exist between
task works, cgroup_exit() and all the teardown that disassociate_ctty()
does. I'll look into though.

> And it seems that it makes sense to move (at least) check_stack_usage()
> down, but this is offtopic too.

I agree that it makes sense to check the stack _after_ teardown code
runs, but all the arch-dependent exit_thread() code would need to be
audited first.

> Oleg.
>
> --- x/kernel/exit.c
> +++ kernel/exit.c/
> @@ -784,6 +784,8 @@ void do_exit(long code)
>   	exit_shm(tsk);
>   	exit_files(tsk);
>   	exit_fs(tsk);
> +	if (group_dead)
> +		disassociate_ctty(1);
>   	exit_task_namespaces(tsk);
>   	exit_task_work(tsk);
>   	check_stack_usage();
> @@ -799,13 +801,9 @@ void do_exit(long code)
>
>   	cgroup_exit(tsk, 1);
>
> -	if (group_dead)
> -		disassociate_ctty(1);
> -
>   	module_put(task_thread_info(tsk)->exec_domain->module);
>
>   	proc_exit_connector(tsk);
> -
>   	/*
>   	 * FIXME: do that only when needed, using sched_exit tracepoint
>   	 */

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Hurley March 14, 2014, 2:19 p.m. UTC | #2
[ +cc Eric Biederman ]

On 03/13/2014 01:55 PM, Peter Hurley wrote:
> Hi Oleg,
>
> On 03/13/2014 01:06 PM, Oleg Nesterov wrote:
>> On 03/10, Peter Hurley wrote:
>>>
>>> [ +cc Oleg Nesterov ]
>>
>> Thanks.
>>
>>> The NULL ptr dereference is from following the current->nsproxy ptr
>>> in ppp_register_channel().
>>>
>>> This was broken by
>>>
>>> commit 8aac62706adaaf0fab02c4327761561c8bda9448
>>> Author: Oleg Nesterov <oleg@redhat.com>
>>> Date:   Fri Jun 14 21:09:49 2013 +0200
>>>
>>>      move exit_task_namespaces() outside of exit_notify()
>>>
>>> which moved the exit_task_namespaces(tsk) before disassociate_ctty().
>>
>> Heh. OK, we can move it down after disassociate_ctty(), the original
>> motivation for that commit was the problem which was also (hopefully)
>> fixed by e7b2c406925273 "fput: task_work_add() can fail if the caller
>> has passed exit_task_work()".

Thanks for fixing that!
tty teardown (hangup from disassociate_ctty) requires fput() to work :)

> I didn't look into what motivated the change; I will now though.
>
>> In fact I think that it makes sense to move it down after
>> exit_task_work() anyway. But this is almost off-topic and I'd like to
>> avoid this right now.
>>
>> OTOH, why we should delay disassociate_ctty? IOW, do you see any
>> potential problem with the trivial patch below?

Won't work.

> I have no idea what kind of dependencies might exist between
> task works, cgroup_exit() and all the teardown that disassociate_ctty()
> does. I'll look into though.

cgroup_exit() can exec a userspace process (the notify_on_exit() facility)
which requires both namespace and tty facilities.
Plus, proc_exit_connector() uses netlink which uses user_ns.

I think it's better to have task namespaces valid up to exit_notify().

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Hurley March 14, 2014, 3:02 p.m. UTC | #3
On 03/14/2014 10:19 AM, Peter Hurley wrote:
> Plus, proc_exit_connector() uses netlink which uses user_ns.

Actually, I think this can be safely ignored since if the
netlink socket was tied to a user_ns which has just been
freed in exit_task_namespace() then the socket could not
be open in another process (otherwise the net_ns would
still have an open reference).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 14, 2014, 7:23 p.m. UTC | #4
On 03/14, Peter Hurley wrote:
>
>> On 03/13/2014 01:06 PM, Oleg Nesterov wrote:
>>>
>>> OTOH, why we should delay disassociate_ctty? IOW, do you see any
>>> potential problem with the trivial patch below?
>
> Won't work.
>
> cgroup_exit() can exec a userspace process (the notify_on_exit() facility)
                                                  ^^^^^^^^^^^^^^^^

can't find anything named notify_on_exit, perhap you meant
cgroup_release_agent? Although I guess this should not matter.

> which requires both namespace and tty facilities.

Hmm... why?

The exiting task obviously can't exec. The only way to spawn a userspace
process is call_usermodehelper(), it should work just fine, no?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Hurley March 14, 2014, 8:28 p.m. UTC | #5
On 03/14/2014 03:23 PM, Oleg Nesterov wrote:
> On 03/14, Peter Hurley wrote:
>>
>>> On 03/13/2014 01:06 PM, Oleg Nesterov wrote:
>>>>
>>>> OTOH, why we should delay disassociate_ctty? IOW, do you see any
>>>> potential problem with the trivial patch below?
>>
>> Won't work.
>>
>> cgroup_exit() can exec a userspace process (the notify_on_exit() facility)
>                                                    ^^^^^^^^^^^^^^^^
>
> can't find anything named notify_on_exit, perhap you meant
> cgroup_release_agent? Although I guess this should not matter.

Sorry, I meant the notify_on_release() facility as discussed in the
function header comments of cgroup_exit().

Yes, cgroup_release_agent() is the work function that is scheduled.

>> which requires both namespace and tty facilities.
>
> Hmm... why?
>
> The exiting task obviously can't exec. The only way to spawn a userspace
> process is call_usermodehelper(), it should work just fine, no?

You're correct, in the immediate sense that the user command exec'd will
not inherit open file descriptors.

But what if it expects to be able to find the intact children of
the foreground process group, and can't because the controlling tty
has already been torn down and all the children already sent SIGHUP.

I know that's not what the user command is intended for, but
userspace can be enterprising for establishing dependencies on
kernel constructs.

Or what if the user command expects to find and join the user namespace
of the dying process but now it's already been freed?

Regards,
Peter Hurley
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 14, 2014, 9:04 p.m. UTC | #6
On 03/14, Peter Hurley wrote:
>
> On 03/14/2014 03:23 PM, Oleg Nesterov wrote:
>> On 03/14, Peter Hurley wrote:
>>>
> Yes, cgroup_release_agent() is the work function that is scheduled.
>
>>> which requires both namespace and tty facilities.
>>
>> Hmm... why?
>>
>> The exiting task obviously can't exec. The only way to spawn a userspace
>> process is call_usermodehelper(), it should work just fine, no?
>
> You're correct, in the immediate sense that the user command exec'd will
> not inherit open file descriptors.
>
> But what if it expects to be able to find the intact children of
> the foreground process group, and can't because the controlling tty
> has already been torn down and all the children already sent SIGHUP.

Which group/tty ? call_usermodehelper() asks the workqueue thread
to kthread_create/exec. See also below...

> Or what if the user command expects to find and join the user namespace
> of the dying process but now it's already been freed?

But it can't even know who called call_usermodehelper(). Besides,
cgroup_release_agent() uses UMH_WAIT_EXEC, so the caller can continue
and disappear completely before the usermode process has any chance
to do something.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Hurley March 15, 2014, 12:49 p.m. UTC | #7
On 03/14/2014 05:04 PM, Oleg Nesterov wrote:
> On 03/14, Peter Hurley wrote:
>> On 03/14/2014 03:23 PM, Oleg Nesterov wrote:
>>> On 03/14, Peter Hurley wrote:
>>>>
>> Yes, cgroup_release_agent() is the work function that is scheduled.
>>
>>>> which requires both namespace and tty facilities.
>>>
>>> Hmm... why?
>>>
>>> The exiting task obviously can't exec. The only way to spawn a userspace
>>> process is call_usermodehelper(), it should work just fine, no?
>>
>> You're correct, in the immediate sense that the user command exec'd will
>> not inherit open file descriptors.
>>
>> But what if it expects to be able to find the intact children of
>> the foreground process group, and can't because the controlling tty
>> has already been torn down and all the children already sent SIGHUP.
>
> Which group/tty ? call_usermodehelper() asks the workqueue thread
> to kthread_create/exec. See also below...
>
>> Or what if the user command expects to find and join the user namespace
>> of the dying process but now it's already been freed?
>
> But it can't even know who called call_usermodehelper(). Besides,
> cgroup_release_agent() uses UMH_WAIT_EXEC, so the caller can continue
> and disappear completely before the usermode process has any chance
> to do something.

I'm just hypothesizing potential breakage, since the order of teardown
is sensitive to changes, and I didn't do a complete audit of all the
possibilities.

If you feel strongly about moving disassociate_tty(), I won't object.

Regards,
Peter Hurley


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov March 17, 2014, 6:04 p.m. UTC | #8
On 03/15, Peter Hurley wrote:
>
> On 03/14/2014 05:04 PM, Oleg Nesterov wrote:
>>
>> But it can't even know who called call_usermodehelper(). Besides,
>> cgroup_release_agent() uses UMH_WAIT_EXEC, so the caller can continue
>> and disappear completely before the usermode process has any chance
>> to do something.
>
> I'm just hypothesizing potential breakage, since the order of teardown
> is sensitive to changes, and I didn't do a complete audit of all the
> possibilities.

Yes, I understand your concerns. Still I do not see how cgroup_exit()
can depend on tty/namespace.

> If you feel strongly about moving disassociate_tty(), I won't object.

It is not that I feel really strongly... just in looks better to me.
If nothing else:

	1. If we actually can not do disassociate_ctty() before, say,
	   cgroup_exit(), then we should understand and document the
	   reason.

	2. task_work_add() can have more users in drivers/tty which
	   can be triggered by disassociate_tty() paths. So I think
	   it would be nice to call it before task_work_exit().

2/2 is offtopic and hopefully trivial.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- x/kernel/exit.c
+++ kernel/exit.c/
@@ -784,6 +784,8 @@  void do_exit(long code)
 	exit_shm(tsk);
 	exit_files(tsk);
 	exit_fs(tsk);
+	if (group_dead)
+		disassociate_ctty(1);
 	exit_task_namespaces(tsk);
 	exit_task_work(tsk);
 	check_stack_usage();
@@ -799,13 +801,9 @@  void do_exit(long code)
 
 	cgroup_exit(tsk, 1);
 
-	if (group_dead)
-		disassociate_ctty(1);
-
 	module_put(task_thread_info(tsk)->exec_domain->module);
 
 	proc_exit_connector(tsk);
-
 	/*
 	 * FIXME: do that only when needed, using sched_exit tracepoint
 	 */