diff mbox

[nf,V2] netfilter: fix oops in nfqueue during netns error unwinding

Message ID 1462981273-21676-1-git-send-email-fw@strlen.de
State RFC
Delegated to: Pablo Neira
Headers show

Commit Message

Florian Westphal May 11, 2016, 3:41 p.m. UTC
Under full load (unshare() in loop -> OOM conditions) we can
get kernel panic:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
[..]
task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
[..]
Call Trace:
 [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
 [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
 [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
 [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
 [<ffffffff8141b652>] setup_net+0xc2/0x120
 [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
 [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
 [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
 [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
 [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47

Problem is that we call into the nfqueue backend to zap
packets that might be queued to userspace when we unregister a
netfilter hook.

However, this assumes that the backend was initialized and
net_generic(net, nfnl_queue_net_id) returns valid memory.

This is only true if the hook unregister happens in the netns exit path.
If it happens during error unwind because a netns init hook returned
an error condition (e.g. out of memory), then the result of
net_generic(net, nfnl_queue_net_id) is undefined.

Only do the cleanup for namespaces that were on the
net_namespace_list list (i.e., all netns ->init() functions were ok).

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Dale Whitfield <dale.4d@gmail.com>
Fixes: 8405a8fff3f ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 AFAICS this works fine as well -- if netns was never on the
 net_namespace_list no packets can be queued so we don't need
 to care if the nf_queue init hook got called or not.

Comments

Pablo Neira Ayuso May 12, 2016, 9:47 a.m. UTC | #1
On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> index 5baa8e2..9722819 100644
> --- a/net/netfilter/nf_queue.c
> +++ b/net/netfilter/nf_queue.c
> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
>  {
>  	const struct nf_queue_handler *qh;
>  
> +	/* netns wasn't initialized, error unwind in progress.
> +	 * Its possible that the nfq netns init function was not even
> +	 * called, in which case nfq pernetns data is in undefined state.
> +	 */
> +	if (!net->list.next)
> +		return;

Thanks Florian.

Question for the netns folks: Is this a stable assumption? My only
concern with this is that it relies on internal the netns
representation, so I'd like to make sure this thing doesn't break
again.

Let me know, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman May 12, 2016, 4:15 p.m. UTC | #2
Pablo Neira Ayuso <pablo@netfilter.org> writes:

> On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
>> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
>> index 5baa8e2..9722819 100644
>> --- a/net/netfilter/nf_queue.c
>> +++ b/net/netfilter/nf_queue.c
>> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
>>  {
>>  	const struct nf_queue_handler *qh;
>>  
>> +	/* netns wasn't initialized, error unwind in progress.
>> +	 * Its possible that the nfq netns init function was not even
>> +	 * called, in which case nfq pernetns data is in undefined state.
>> +	 */
>> +	if (!net->list.next)
>> +		return;
>
> Thanks Florian.
>
> Question for the netns folks: Is this a stable assumption? My only
> concern with this is that it relies on internal the netns
> representation, so I'd like to make sure this thing doesn't break
> again.
>
> Let me know, thanks.

Ugh. I got distracted before I finished replying.

I don't think the analysis is correct.

The unwind and the netns exit path should maintain the same constraints.
If they don't there is a bug, perhaps in initialization ordering.

Code should be able to count on net_generic() from the beginning of the
corresponding .init to the end of the corresponding .fini

Something like that is not happening and if we can I would like to track
down why.

Otherwise we need code like the above all over the place.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal May 12, 2016, 4:40 p.m. UTC | #3
Eric W. Biederman <ebiederm@xmission.com> wrote:
> > On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
> >> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> >> index 5baa8e2..9722819 100644
> >> --- a/net/netfilter/nf_queue.c
> >> +++ b/net/netfilter/nf_queue.c
> >> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
> >>  {
> >>  	const struct nf_queue_handler *qh;
> >>  
> >> +	/* netns wasn't initialized, error unwind in progress.
> >> +	 * Its possible that the nfq netns init function was not even
> >> +	 * called, in which case nfq pernetns data is in undefined state.
> >> +	 */
> >> +	if (!net->list.next)
> >> +		return;
> >
> > Thanks Florian.
> >
> > Question for the netns folks: Is this a stable assumption? My only
> > concern with this is that it relies on internal the netns
> > representation, so I'd like to make sure this thing doesn't break
> > again.
> >
> > Let me know, thanks.
> 
> Ugh. I got distracted before I finished replying.
> 
> I don't think the analysis is correct.

I am afraid it is, see below for example

> The unwind and the netns exit path should maintain the same constraints.
> If they don't there is a bug, perhaps in initialization ordering.

Since 8405a8fff3f8545c888a872d6e3c0c8eecd4d348 any call to
nf_unregister_hook invokes nf_queue_nf_hook_drop().

If the nfnetlink_queue module is loaded, that means we call
nfqnl_nf_hook_drop function via the nf_queue nf_queue_handler struct
in nf_queue.c .

And since nfqnl_nf_hook_drop() uses net_generic() ...

> Code should be able to count on net_generic() from the beginning of the
> corresponding .init to the end of the corresponding .fini

... we cannot count on this.

Example:
1. we try to create new netns
2. net_namespace.c:ops_init walks the netns op list and calls ->init()
3. some of these init hooks register netfilter hooks
4. we call init hook for nfnetlink_queue, but this yields -EFOO (alternatively, another ->init handler before this failed)
5. netns init code invokes the ->exit() handlers in reverse order for unwinding
6. hooks get unregistered, which makes a call into nf_queue_nf_hook_drop
7. nf_queue then calls q->nfqnl_nf_hook_drop(net)
8. oops as nfnetlink_queue wasn't setup in this net ns and net_generic
returns some undefined result

> Something like that is not happening and if we can I would like to track
> down why.
> 
> Otherwise we need code like the above all over the place.

AFAICS no other callers do something similar, but yes,
we'd need this all over the place if there are others.

Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
and making sure that net_generic() returns non-NULL only if the per
netns memory was allocated properly.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman May 13, 2016, 7:40 p.m. UTC | #4
Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
>> >> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
>> >> index 5baa8e2..9722819 100644
>> >> --- a/net/netfilter/nf_queue.c
>> >> +++ b/net/netfilter/nf_queue.c
>> >> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
>> >>  {
>> >>  	const struct nf_queue_handler *qh;
>> >>  
>> >> +	/* netns wasn't initialized, error unwind in progress.
>> >> +	 * Its possible that the nfq netns init function was not even
>> >> +	 * called, in which case nfq pernetns data is in undefined state.
>> >> +	 */
>> >> +	if (!net->list.next)
>> >> +		return;
>> >
>> > Thanks Florian.
>> >
>> > Question for the netns folks: Is this a stable assumption? My only
>> > concern with this is that it relies on internal the netns
>> > representation, so I'd like to make sure this thing doesn't break
>> > again.
>> >
>> > Let me know, thanks.
>> 
>> Ugh. I got distracted before I finished replying.
>> 
>> I don't think the analysis is correct.
>
> I am afraid it is, see below for example
>
>> The unwind and the netns exit path should maintain the same constraints.
>> If they don't there is a bug, perhaps in initialization ordering.
>
> Since 8405a8fff3f8545c888a872d6e3c0c8eecd4d348 any call to
> nf_unregister_hook invokes nf_queue_nf_hook_drop().
>
> If the nfnetlink_queue module is loaded, that means we call
> nfqnl_nf_hook_drop function via the nf_queue nf_queue_handler struct
> in nf_queue.c .
>
> And since nfqnl_nf_hook_drop() uses net_generic() ...
>
>> Code should be able to count on net_generic() from the beginning of the
>> corresponding .init to the end of the corresponding .fini
>
> ... we cannot count on this.

I guess what I meant was that we should see if we can fix this.

> Example:
> 1. we try to create new netns
> 2. net_namespace.c:ops_init walks the netns op list and calls ->init()
> 3. some of these init hooks register netfilter hooks
> 4. we call init hook for nfnetlink_queue, but this yields -EFOO (alternatively, another ->init handler before this failed)
> 5. netns init code invokes the ->exit() handlers in reverse order for unwinding
> 6. hooks get unregistered, which makes a call into nf_queue_nf_hook_drop
> 7. nf_queue then calls q->nfqnl_nf_hook_drop(net)
> 8. oops as nfnetlink_queue wasn't setup in this net ns and net_generic
> returns some undefined result
>
>> Something like that is not happening and if we can I would like to track
>> down why.
>> 
>> Otherwise we need code like the above all over the place.
>
> AFAICS no other callers do something similar, but yes,
> we'd need this all over the place if there are others.
>
> Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
> and making sure that net_generic() returns non-NULL only if the per
> netns memory was allocated properly.

As a first approximiation I am thinking we should fix by making
nf_queue_register_handler a per netns operation.

Either that or give nfnetlink_queue it's own dedicated place in
struct net for struct nfnl_queue_net.

Let's sort out the interface so we can't get this wrong.

Eric



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5baa8e2..9722819 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -102,6 +102,13 @@  void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
 {
 	const struct nf_queue_handler *qh;
 
+	/* netns wasn't initialized, error unwind in progress.
+	 * Its possible that the nfq netns init function was not even
+	 * called, in which case nfq pernetns data is in undefined state.
+	 */
+	if (!net->list.next)
+		return;
+
 	rcu_read_lock();
 	qh = rcu_dereference(queue_handler);
 	if (qh)