diff mbox

[nf] netfilter: conntrack: resched in nf_ct_iterate_cleanup

Message ID 1449682209-20330-1-git-send-email-fw@strlen.de
State Changes Requested
Delegated to: Pablo Neira
Headers show

Commit Message

Florian Westphal Dec. 9, 2015, 5:30 p.m. UTC
Ulrich reports soft lockup with following (shortened) callchain:

NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
__netif_receive_skb_core+0x6e4/0x774
process_backlog+0x94/0x160
net_rx_action+0x88/0x178
call_do_softirq+0x24/0x3c
do_softirq+0x54/0x6c
__local_bh_enable_ip+0x7c/0xbc
nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
atomic_notifier_call_chain+0x1c/0x2c
ipv6_del_addr+0x1bc/0x220 [ipv6]

Problem is that nf_ct_iterate_cleanup can run for a very long time
since it can be interrupted by softirq processing.
Moreover, atomic_notifier_call_chain runs with rcu readlock held.

So lets call cond_resched() in nf_ct_iterate_cleanup loop and defer
the call to a work queue for the atomic_notifier_call_chain case.

Reported-by: Ulrich Weber <uw@ocedo.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Patch also applies to nf-next tree in case you think nf tree isn't appropriate.

 net/ipv6/netfilter/nf_nat_masquerade_ipv6.c | 46 +++++++++++++++++++++++++++--
 net/netfilter/nf_conntrack_core.c           |  3 ++
 2 files changed, 46 insertions(+), 3 deletions(-)

Comments

Pablo Neira Ayuso Dec. 11, 2015, 11:42 a.m. UTC | #1
On Wed, Dec 09, 2015 at 06:30:09PM +0100, Florian Westphal wrote:
> Ulrich reports soft lockup with following (shortened) callchain:
> 
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
> __netif_receive_skb_core+0x6e4/0x774
> process_backlog+0x94/0x160
> net_rx_action+0x88/0x178
> call_do_softirq+0x24/0x3c
> do_softirq+0x54/0x6c
> __local_bh_enable_ip+0x7c/0xbc
> nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
> masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
> atomic_notifier_call_chain+0x1c/0x2c
> ipv6_del_addr+0x1bc/0x220 [ipv6]
> 
> Problem is that nf_ct_iterate_cleanup can run for a very long time
> since it can be interrupted by softirq processing.
> Moreover, atomic_notifier_call_chain runs with rcu readlock held.
> 
> So lets call cond_resched() in nf_ct_iterate_cleanup loop and defer
> the call to a work queue for the atomic_notifier_call_chain case.

Don't we potentially have the same problem in IPv4? If so, then it's
probably a good idea to add a nf_ct_iterate_cleanup_defered().

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Dec. 11, 2015, 11:53 a.m. UTC | #2
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Wed, Dec 09, 2015 at 06:30:09PM +0100, Florian Westphal wrote:
> > Ulrich reports soft lockup with following (shortened) callchain:
> > 
> > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
> > __netif_receive_skb_core+0x6e4/0x774
> > process_backlog+0x94/0x160
> > net_rx_action+0x88/0x178
> > call_do_softirq+0x24/0x3c
> > do_softirq+0x54/0x6c
> > __local_bh_enable_ip+0x7c/0xbc
> > nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
> > masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
> > atomic_notifier_call_chain+0x1c/0x2c
> > ipv6_del_addr+0x1bc/0x220 [ipv6]
> > 
> > Problem is that nf_ct_iterate_cleanup can run for a very long time
> > since it can be interrupted by softirq processing.
> > Moreover, atomic_notifier_call_chain runs with rcu readlock held.
> > 
> > So lets call cond_resched() in nf_ct_iterate_cleanup loop and defer
> > the call to a work queue for the atomic_notifier_call_chain case.
> 
> Don't we potentially have the same problem in IPv4?

No, the inet notifier appears to be fine (blocking notifier).

The only nf_ct_iterate_cleanup callsite that I found to be problematic
(i.e., not preemptible) is the ipv6 address
deletion notifier in ipv6 masquarading.

I also tried with nf-next + CONFIG_DEBUG_ATOMIC_SLEEP + this patch
and I saw no error on ipv4 address deletion w. ip4 masquerade
module loaded.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Dec. 11, 2015, 2:43 p.m. UTC | #3
Florian Westphal <fw@strlen.de> wrote:
> Ulrich reports soft lockup with following (shortened) callchain:
> 
> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
> __netif_receive_skb_core+0x6e4/0x774
> process_backlog+0x94/0x160
> net_rx_action+0x88/0x178
> call_do_softirq+0x24/0x3c
> do_softirq+0x54/0x6c
> __local_bh_enable_ip+0x7c/0xbc
> nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
> masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
> atomic_notifier_call_chain+0x1c/0x2c
> ipv6_del_addr+0x1bc/0x220 [ipv6]
> 
> Problem is that nf_ct_iterate_cleanup can run for a very long time
> since it can be interrupted by softirq processing.
> Moreover, atomic_notifier_call_chain runs with rcu readlock held.

Ulrich just reported another softlockup even with this patch applied.

One explanation would be non-matching iter(), in this case
get_next_corpse can take forever since it will walk the entire conntrack
table, rendering the cond_resched moot.

A V2 patch will be coming to also add a lock break + resched to
get_next_corpse.

I'll mark it as 'changes requested' in patchwork.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso Dec. 11, 2015, 5:16 p.m. UTC | #4
On Fri, Dec 11, 2015 at 03:43:13PM +0100, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > Ulrich reports soft lockup with following (shortened) callchain:
> > 
> > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
> > __netif_receive_skb_core+0x6e4/0x774
> > process_backlog+0x94/0x160
> > net_rx_action+0x88/0x178
> > call_do_softirq+0x24/0x3c
> > do_softirq+0x54/0x6c
> > __local_bh_enable_ip+0x7c/0xbc
> > nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
> > masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
> > atomic_notifier_call_chain+0x1c/0x2c
> > ipv6_del_addr+0x1bc/0x220 [ipv6]
> > 
> > Problem is that nf_ct_iterate_cleanup can run for a very long time
> > since it can be interrupted by softirq processing.
> > Moreover, atomic_notifier_call_chain runs with rcu readlock held.
> 
> Ulrich just reported another softlockup even with this patch applied.
> 
> One explanation would be non-matching iter(), in this case
> get_next_corpse can take forever since it will walk the entire conntrack
> table, rendering the cond_resched moot.

Probably another reincarnation of 0838aa7fcfcd? Is Ulrich using
conntrack templates?

> A V2 patch will be coming to also add a lock break + resched to
> get_next_corpse.

BTW, the atomic chain notifier in IPv6 seems to be there to handle
this update from the packet path:

 ndisc_rcv()
 ndisc_router_discovery()
 addrconf_prefix_rcv()
 manage_tempaddrs()
 ipv6_add_addr()
 inet6addr_notifier_call_chain()

Probably we can get Hannes have a look into this, I think we can
convert this chain to blocking one through workqueue since
addrconf_prefix_rcv() returns void. The remaining call sites of
inet6addr_notifier_call_chain() that I could tracked come from paths
where I can see ASSERT_RTNL(), so user context is guaranteed.

I'm telling this become I remember that we discussed in netconf'14
Chicago that it would be good to get rid of this kind og asymmetries
between IPv4 and IPv6.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Dec. 11, 2015, 5:21 p.m. UTC | #5
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> BTW, the atomic chain notifier in IPv6 seems to be there to handle
> this update from the packet path:
> 
>  ndisc_rcv()
>  ndisc_router_discovery()
>  addrconf_prefix_rcv()
>  manage_tempaddrs()
>  ipv6_add_addr()
>  inet6addr_notifier_call_chain()
> 
> Probably we can get Hannes have a look into this, I think we can
> convert this chain to blocking one through workqueue since
> addrconf_prefix_rcv() returns void. The remaining call sites of
> inet6addr_notifier_call_chain() that I could tracked come from paths
> where I can see ASSERT_RTNL(), so user context is guaranteed.
>
> I'm telling this become I remember that we discussed in netconf'14
> Chicago that it would be good to get rid of this kind og asymmetries
> between IPv4 and IPv6.

Ok, I agree, that would be a lot nicer.

I'll have a look.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
index 31ba7ca..a877dee 100644
--- a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
@@ -78,14 +78,54 @@  static struct notifier_block masq_dev_notifier = {
 	.notifier_call	= masq_device_event,
 };
 
+struct masq_dev_slow_work {
+	struct work_struct work;
+	struct net *net;
+	int ifindex;
+};
+
+static void iterate_cleanup_work(struct work_struct *work)
+{
+	struct masq_dev_slow_work *w;
+	struct net *net;
+	int ifindex;
+
+	w = container_of(work, struct masq_dev_slow_work, work);
+
+	net = w->net;
+	ifindex = w->ifindex;
+	kfree(w);
+
+	nf_ct_iterate_cleanup(net, device_cmp, (void *)(long)ifindex, 0, 0);
+
+	put_net(net);
+	module_put(THIS_MODULE);
+}
+
 static int masq_inet_event(struct notifier_block *this,
 			   unsigned long event, void *ptr)
 {
 	struct inet6_ifaddr *ifa = ptr;
-	struct netdev_notifier_info info;
+	struct masq_dev_slow_work *w;
+
+	if (event != NETDEV_DOWN || !try_module_get(THIS_MODULE))
+		return NOTIFY_DONE;
+
+	/* can't call nf_ct_iterate_cleanup in atomic context */
+	w = kmalloc(sizeof(*w), GFP_ATOMIC);
+	if (w) {
+		const struct net_device *dev = ifa->idev->dev;
 
-	netdev_notifier_info_init(&info, ifa->idev->dev);
-	return masq_device_event(this, event, &info);
+		INIT_WORK(&w->work, iterate_cleanup_work);
+
+		w->ifindex = dev->ifindex;
+		w->net = get_net(dev_net(dev));
+		schedule_work(&w->work);
+	} else {
+		module_put(THIS_MODULE);
+	}
+
+	return NOTIFY_DONE;
 }
 
 static struct notifier_block masq_inet_notifier = {
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3cb3cb8..cffeb68 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1422,6 +1422,8 @@  void nf_ct_iterate_cleanup(struct net *net,
 	struct nf_conn *ct;
 	unsigned int bucket = 0;
 
+	might_sleep();
+
 	while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
 		/* Time to push up daises... */
 		if (del_timer(&ct->timeout))
@@ -1430,6 +1432,7 @@  void nf_ct_iterate_cleanup(struct net *net,
 		/* ... else the timer will get him soon. */
 
 		nf_ct_put(ct);
+		cond_resched();
 	}
 }
 EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup);