Patchwork [Oneiric/Natty,SRU,1/1] net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy

login
register
mail settings
Submitter Leann Ogasawara
Date Sept. 21, 2011, 7:28 p.m.
Message ID <1316633283.2176.14.camel@adamo>
Download mbox | patch
Permalink /patch/115845/
State New
Headers show

Comments

Leann Ogasawara - Sept. 21, 2011, 7:28 p.m.
Hi All,

BugLink: http://bugs.launchpad.net/bugs/843892

== SRU Justification ==
It's been reported that destroying a container causes a kernel Oops and
will hang the system.  The issue is reproducible.  The user has
successfully tested the patch against Oneiric and can confirm the Oops
no longer occurs when using a patched Oneiric kernel.  The patch has
been submitted upstream (CC'd upstream stable) and is currently queued
in the -mm tree.  It also appears it will hit the 3.2 merge window.
Please consider for SRU against Oneiric and Natty.

== Impact ==
The commit message of the patch notes that this will likely affect
2.6.26 and newer kernels, ie affects Lucid, Maverick, Natty, Oneiric.
However, due to the nature of our SRU process, the bug reporter is
likely only able to readily test Natty and Oneiric.  Thus I'm only
submitting this for SRU against Oneiric and Natty.

== Test Case ==
See reproducer in comment #6 of bug report

Thanks,
Leann

From ba1f54e39f8ec3a7db00dd2844dc29988638c5a5 Mon Sep 17 00:00:00 2001
From: Alex Bligh <alex@alex.org.uk>
Date: Wed, 14 Sep 2011 13:43:36 -0700
Subject: [PATCH] net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy

BugLink: http://bugs.launchpad.net/bugs/843892

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
clone(NS_NEWNET). I does not explicitly	set up netlink
explicitly set up netlink, but I presume it was set up else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nfconntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven than net->nfnl will never be NULL.

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(applied from -mm http://marc.info/?l=linux-mm-commits&m=131603308900694&w=2)

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
---
 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
Tim Gardner - Sept. 21, 2011, 7:48 p.m.
On 09/21/2011 01:28 PM, Leann Ogasawara wrote:
> Hi All,
>
> BugLink: http://bugs.launchpad.net/bugs/843892
>
> == SRU Justification ==
> It's been reported that destroying a container causes a kernel Oops and
> will hang the system.  The issue is reproducible.  The user has
> successfully tested the patch against Oneiric and can confirm the Oops
> no longer occurs when using a patched Oneiric kernel.  The patch has
> been submitted upstream (CC'd upstream stable) and is currently queued
> in the -mm tree.  It also appears it will hit the 3.2 merge window.
> Please consider for SRU against Oneiric and Natty.
>
> == Impact ==
> The commit message of the patch notes that this will likely affect
> 2.6.26 and newer kernels, ie affects Lucid, Maverick, Natty, Oneiric.
> However, due to the nature of our SRU process, the bug reporter is
> likely only able to readily test Natty and Oneiric.  Thus I'm only
> submitting this for SRU against Oneiric and Natty.
>
> == Test Case ==
> See reproducer in comment #6 of bug report
>
> Thanks,
> Leann
>
>  From ba1f54e39f8ec3a7db00dd2844dc29988638c5a5 Mon Sep 17 00:00:00 2001
> From: Alex Bligh<alex@alex.org.uk>
> Date: Wed, 14 Sep 2011 13:43:36 -0700
> Subject: [PATCH] net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
>
> BugLink: http://bugs.launchpad.net/bugs/843892
>
> Problem:
>
> A repeatable Oops can be caused if a container with networking
> unshared is destroyed when it has nf_conntrack entries yet to expire.
>
> A copy of the oops follows below. A perl program generating the oops
> repeatably is attached inline below.
>
> Analysis:
>
> The oops is called from cleanup_net when the namespace is
> destroyed. conntrack iterates through outstanding events and calls
> death_by_timeout on each of them, which in turn produces a call to
> ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
> oopses because net->nfnl is NULL.
>
> The perl program generates the container through fork() then
> clone(NS_NEWNET). I does not explicitly	set up netlink
> explicitly set up netlink, but I presume it was set up else net->nfnl
> would have been NULL earlier (i.e. when an earlier connection
> timed out). This would thus suggest that net->nfnl is made NULL
> during the destruction of the container, which I think is done by
> nfnetlink_net_exit_batch.
>
> I can see that the various subsystems are deinitialised in the opposite
> order to which the relevant register_pernet_subsys calls are called,
> and both nf_conntrack and nfnetlink_net_ops register their relevant
> subsystems. If nfnetlink_net_ops registered later than nfconntrack,
> then its exit routine would have been called first, which would cause
> the oops described. I am not sure there is anything to prevent this
> happening in a container environment.
>
> Whilst there's perhaps a more complex problem revolving around ordering
> of subsystem deinit, it seems to me that missing a netlink event on a
> container that is dying is not a disaster. An early check for net->nfnl
> being non-NULL in ctnetlink_conntrack_event appears to fix this. There
> may remain a potential race condition if it becomes NULL immediately
> after being checked (I am not sure any lock is held at this point or
> how synchronisation for subsystem deinitialization works).
>
> Patch:
>
> The patch attached should apply on everything from 2.6.26 (if not before)
> onwards; it appears to be a problem on all kernels. This was taken against
> Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
> with the above perl script for 15 minutes or so; the perl script hung the
> machine within 20 seconds without this patch.
>
> Applicability:
>
> If this is the right solution, it should be applied to all stable kernels
> as well as head. Apart from the minor overhead of checking one variable
> against NULL, it can never 'do the wrong thing', because if net->nfnl
> is NULL, an oops will inevitably result. Therefore, checking is a reasonable
> thing to do unless it can be proven than net->nfnl will never be NULL.
>
> Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
> container destroy
>
> Signed-off-by: Alex Bligh<alex@alex.org.uk>
> Cc: Patrick McHardy<kaber@trash.net>
> Cc: David Miller<davem@davemloft.net>
> Cc:<stable@kernel.org>
> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
> (applied from -mm http://marc.info/?l=linux-mm-commits&m=131603308900694&w=2)
>
> Signed-off-by: Leann Ogasawara<leann.ogasawara@canonical.com>
> ---
>   net/netfilter/nf_conntrack_netlink.c |    5 +++++
>   1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 482e90c..0790d0a 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
>   		return 0;
>
>   	net = nf_ct_net(ct);
> +
> +	/* container deinit, netlink may have died before death_by_timeout */
> +	if (!net->nfnl)
> +		return 0;
> +
>   	if (!item->report&&  !nfnetlink_has_listeners(net, group))
>   		return 0;
>

Acked-by: Tim Gardner <tim.gardner@canonical.com>
Leann Ogasawara - Sept. 22, 2011, 1:43 p.m.
Applied to Oneiric master-next and Natty master-next.

Thanks,
Leann

On Wed, 2011-09-21 at 12:28 -0700, Leann Ogasawara wrote:
> Hi All,
> 
> BugLink: http://bugs.launchpad.net/bugs/843892
> 
> == SRU Justification ==
> It's been reported that destroying a container causes a kernel Oops and
> will hang the system.  The issue is reproducible.  The user has
> successfully tested the patch against Oneiric and can confirm the Oops
> no longer occurs when using a patched Oneiric kernel.  The patch has
> been submitted upstream (CC'd upstream stable) and is currently queued
> in the -mm tree.  It also appears it will hit the 3.2 merge window.
> Please consider for SRU against Oneiric and Natty.
> 
> == Impact ==
> The commit message of the patch notes that this will likely affect
> 2.6.26 and newer kernels, ie affects Lucid, Maverick, Natty, Oneiric.
> However, due to the nature of our SRU process, the bug reporter is
> likely only able to readily test Natty and Oneiric.  Thus I'm only
> submitting this for SRU against Oneiric and Natty.
> 
> == Test Case ==
> See reproducer in comment #6 of bug report
> 
> Thanks,
> Leann
> 
> From ba1f54e39f8ec3a7db00dd2844dc29988638c5a5 Mon Sep 17 00:00:00 2001
> From: Alex Bligh <alex@alex.org.uk>
> Date: Wed, 14 Sep 2011 13:43:36 -0700
> Subject: [PATCH] net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
> 
> BugLink: http://bugs.launchpad.net/bugs/843892
> 
> Problem:
> 
> A repeatable Oops can be caused if a container with networking
> unshared is destroyed when it has nf_conntrack entries yet to expire.
> 
> A copy of the oops follows below. A perl program generating the oops
> repeatably is attached inline below.
> 
> Analysis:
> 
> The oops is called from cleanup_net when the namespace is
> destroyed. conntrack iterates through outstanding events and calls
> death_by_timeout on each of them, which in turn produces a call to
> ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
> oopses because net->nfnl is NULL.
> 
> The perl program generates the container through fork() then
> clone(NS_NEWNET). I does not explicitly	set up netlink
> explicitly set up netlink, but I presume it was set up else net->nfnl
> would have been NULL earlier (i.e. when an earlier connection
> timed out). This would thus suggest that net->nfnl is made NULL
> during the destruction of the container, which I think is done by
> nfnetlink_net_exit_batch.
> 
> I can see that the various subsystems are deinitialised in the opposite
> order to which the relevant register_pernet_subsys calls are called,
> and both nf_conntrack and nfnetlink_net_ops register their relevant
> subsystems. If nfnetlink_net_ops registered later than nfconntrack,
> then its exit routine would have been called first, which would cause
> the oops described. I am not sure there is anything to prevent this
> happening in a container environment.
> 
> Whilst there's perhaps a more complex problem revolving around ordering
> of subsystem deinit, it seems to me that missing a netlink event on a
> container that is dying is not a disaster. An early check for net->nfnl
> being non-NULL in ctnetlink_conntrack_event appears to fix this. There
> may remain a potential race condition if it becomes NULL immediately
> after being checked (I am not sure any lock is held at this point or
> how synchronisation for subsystem deinitialization works).
> 
> Patch:
> 
> The patch attached should apply on everything from 2.6.26 (if not before)
> onwards; it appears to be a problem on all kernels. This was taken against
> Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
> with the above perl script for 15 minutes or so; the perl script hung the
> machine within 20 seconds without this patch.
> 
> Applicability:
> 
> If this is the right solution, it should be applied to all stable kernels
> as well as head. Apart from the minor overhead of checking one variable
> against NULL, it can never 'do the wrong thing', because if net->nfnl
> is NULL, an oops will inevitably result. Therefore, checking is a reasonable
> thing to do unless it can be proven than net->nfnl will never be NULL.
> 
> Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
> container destroy
> 
> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: David Miller <davem@davemloft.net>
> Cc: <stable@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> (applied from -mm http://marc.info/?l=linux-mm-commits&m=131603308900694&w=2)
> 
> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
> ---
>  net/netfilter/nf_conntrack_netlink.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 482e90c..0790d0a 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
>  		return 0;
>  
>  	net = nf_ct_net(ct);
> +
> +	/* container deinit, netlink may have died before death_by_timeout */
> +	if (!net->nfnl)
> +		return 0;
> +
>  	if (!item->report && !nfnetlink_has_listeners(net, group))
>  		return 0;
>  
> -- 
> 1.7.4.1
> 
> 
> 
>

Patch

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..0790d0a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@  ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		return 0;
 
 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;