Message ID | 1495322569-63361-1-git-send-email-zlpnobody@163.com |
---|---|
State | Accepted |
Delegated to: | Pablo Neira |
Liping Zhang <zlpnobody@163.com> wrote:
> From: Liping Zhang <zlpnobody@gmail.com>
>
> If nf_conntrack_htable_size was adjusted by the user during the ct
> dump operation, we may invoke nf_ct_put twice for the same ct, i.e.
> the "last" ct. This will cause the ct to be freed while still linked
> in the hash buckets.
>
> It's very easy to reproduce the problem with the following commands:
>   # while : ; do
>       echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
>     done
>   # while : ; do
>       conntrack -L
>     done
>   # iperf -s 127.0.0.1 &
>   # iperf -c 127.0.0.1 -P 60 -t 36000
>
> After a while, the system will hang like this:
>   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
>   NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]
>   ...
>
> So if, in the end, cb->args[1] is still equal to "last", a hash
> resize happened, and we can set cb->args[1] to 0 to fix the above
> issue.

Yes, you're right, seems this was added in
93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').

Your patch looks correct.

However, why do we bump refcnt of 'last' in the first place?

It's only the continuation marker, i.e. it's expected to reside
in the hash slot at cb->args[0], but after rehash this might not
be true either.

I think we should simplify this, just take the verbatim address,
and clear it right at start of ctnetlink_dump_table, i.e.

unsigned long last = cb->args[1];
cb->args[1] = 0;

for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
	...
	hlist_nulls_for_each_entry ... {
		...
		if (last) {
			if (last != (unsigned long)ct)
				continue;
			last = 0;
		}
		...
		dump();
	}
	last = 0; /* reset it, as it wasn't in args[0] slot */
}

Do you see any problem with that?

[ It might be better to take your patch for nf- though and do
  this no-refcnt thing in nf-next ... ]

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Florian,

2017-05-21 8:00 GMT+08:00 Florian Westphal <fw@strlen.de>:
[...]
> Yes, you're right, seems this was added in
> 93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').

I added some trace logs, and the issue happens when the hash size is
reduced, for example, from 60000 to 500.

Actually, hitting 'goto out' is not easy, so the issue has existed for a
very long time. Maybe commit 89f2e21883b5 ("[NETFILTER]: ctnetlink: change
table dumping not to require an unique ID") is to blame for it.

> Your patch looks correct.
>
> However, why do we bump refcnt of 'last' in the first place?
>
> It's only the continuation marker, i.e. it's expected to reside
> in the hash slot at cb->args[0], but after rehash this might not
> be true either.
>
> I think we should simplify this, just take the verbatim address,
> and clear it right at start of ctnetlink_dump_table, i.e.
>
> unsigned long last = cb->args[1];
> cb->args[1] = 0;
>
> for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
> 	...
> 	hlist_nulls_for_each_entry ... {
> 		...
> 		if (last) {
> 			if (last != (unsigned long)ct)
> 				continue;
> 			last = 0;
> 		}
> 		...
> 		dump();
> 	}
> 	last = 0; /* reset it, as it wasn't in args[0] slot */
> }
>
> Do you see any problem with that?

I think this will be better; it will make the code cleaner.
We can also clean up ctnetlink_exp_ct_dump_table.

>
> [ It might be better to take your patch for nf- though and do
>   this no-refcnt thing in nf-next ... ]
On Sun, May 21, 2017 at 08:59:45AM +0800, Liping Zhang wrote:
> Hi Florian,
>
> 2017-05-21 8:00 GMT+08:00 Florian Westphal <fw@strlen.de>:
> [...]
> > Yes, you're right, seems this was added in
> > 93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').
>
> I added some trace logs, and the issue happens when the hash size is
> reduced, for example, from 60000 to 500.
>
> Actually, hitting 'goto out' is not easy, so the issue has existed for a
> very long time. Maybe commit 89f2e21883b5 ("[NETFILTER]: ctnetlink: change
> table dumping not to require an unique ID") is to blame for it.
>
> > Your patch looks correct.
> >
> > However, why do we bump refcnt of 'last' in the first place?
> >
> > It's only the continuation marker, i.e. it's expected to reside
> > in the hash slot at cb->args[0], but after rehash this might not
> > be true either.
> >
> > I think we should simplify this, just take the verbatim address,
> > and clear it right at start of ctnetlink_dump_table, i.e.
> >
> > unsigned long last = cb->args[1];
> > cb->args[1] = 0;
> >
> > for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
> > 	...
> > 	hlist_nulls_for_each_entry ... {
> > 		...
> > 		if (last) {
> > 			if (last != (unsigned long)ct)
> > 				continue;
> > 			last = 0;
> > 		}
> > 		...
> > 		dump();
> > 	}
> > 	last = 0; /* reset it, as it wasn't in args[0] slot */
> > }
> >
> > Do you see any problem with that?
>
> I think this will be better; it will make the code cleaner.
> We can also clean up ctnetlink_exp_ct_dump_table.

@Florian, no objection then if I place this into nf.git?

I will append the Fixes: tag:

Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sun, May 21, 2017 at 08:59:45AM +0800, Liping Zhang wrote:
> > Hi Florian,
> >
> > 2017-05-21 8:00 GMT+08:00 Florian Westphal <fw@strlen.de>:
> > [...]
> > > Yes, you're right, seems this was added in
> > > 93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').
> >
> > I added some trace logs, and the issue happens when the hash size is
> > reduced, for example, from 60000 to 500.
> >
> > Actually, hitting 'goto out' is not easy, so the issue has existed for a
> > very long time. Maybe commit 89f2e21883b5 ("[NETFILTER]: ctnetlink: change
> > table dumping not to require an unique ID") is to blame for it.
> >
> > > Your patch looks correct.
> > >
> > > However, why do we bump refcnt of 'last' in the first place?
> > >
> > > It's only the continuation marker, i.e. it's expected to reside
> > > in the hash slot at cb->args[0], but after rehash this might not
> > > be true either.
> > >
> > > I think we should simplify this, just take the verbatim address,
> > > and clear it right at start of ctnetlink_dump_table, i.e.
> > >
> > > unsigned long last = cb->args[1];
> > > cb->args[1] = 0;
> > >
> > > for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
> > > 	...
> > > 	hlist_nulls_for_each_entry ... {
> > > 		...
> > > 		if (last) {
> > > 			if (last != (unsigned long)ct)
> > > 				continue;
> > > 			last = 0;
> > > 		}
> > > 		...
> > > 		dump();
> > > 	}
> > > 	last = 0; /* reset it, as it wasn't in args[0] slot */
> > > }
> > >
> > > Do you see any problem with that?
> >
> > I think this will be better; it will make the code cleaner.
> > We can also clean up ctnetlink_exp_ct_dump_table.
>
> @Florian, no objection then if I place this into nf.git?

No objection, thanks!

> I will append the Fixes: tag:
>
> Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")

That commit looks fine to me, it seems to make sure to put
"last" only once in all cases.

93bb0ceb75be2fdfa9fc0dd1 however adds a check on cb->args[0], and if
that is hit it will do a put() on last, and then, the "done" netlink
callback will do another put operation on cb->args[1] (i.e., last).
2017-05-24 6:28 GMT+08:00 Florian Westphal <fw@strlen.de>:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
>> I will append the Fixes: tag:
>>
>> Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")
>
> That commit looks fine to me, it seems to make sure to put
> "last" only once in all cases.
>
> 93bb0ceb75be2fdfa9fc0dd1 however adds a check on cb->args[0], and if
> that is hit it will do a put() on last, and then, the "done" netlink
> callback will do another put operation on cb->args[1] (i.e., last).

After a closer look, I think this patch should add:

Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")

After this commit, when the hash size is reduced, for example from
60000 to 600, we may put the "last" ct twice, because we may never
enter the iteration that clears cb->args[1], so:

1. nf_ct_put(last) by ctnetlink_dump_table, but cb->args[1] still
   points to "last"
2. nf_ct_put((struct nf_conn *)cb->args[1]) by ctnetlink_done
Liping Zhang <zlpnobody@gmail.com> wrote:
> 2017-05-24 6:28 GMT+08:00 Florian Westphal <fw@strlen.de>:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> [...]
> >> I will append the Fixes: tag:
> >>
> >> Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")
> >
> > That commit looks fine to me, it seems to make sure to put
> > "last" only once in all cases.
> >
> > 93bb0ceb75be2fdfa9fc0dd1 however adds a check on cb->args[0], and if
> > that is hit it will do a put() on last, and then, the "done" netlink
> > callback will do another put operation on cb->args[1] (i.e., last).
>
> After a closer look, I think this patch should add:
>
> Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")
>
> After this commit, when the hash size is reduced, for example from
> 60000 to 600, we may put the "last" ct twice, because we may never
> enter the iteration that clears cb->args[1], so:
>
> 1. nf_ct_put(last) by ctnetlink_dump_table, but cb->args[1] still
>    points to "last"
> 2. nf_ct_put((struct nf_conn *)cb->args[1]) by ctnetlink_done

You are right.
On Sun, May 21, 2017 at 07:22:49AM +0800, Liping Zhang wrote:
> From: Liping Zhang <zlpnobody@gmail.com>
>
> If nf_conntrack_htable_size was adjusted by the user during the ct
> dump operation, we may invoke nf_ct_put twice for the same ct, i.e.
> the "last" ct. This will cause the ct to be freed while still linked
> in the hash buckets.
>
> It's very easy to reproduce the problem with the following commands:
>   # while : ; do
>       echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
>     done
>   # while : ; do
>       conntrack -L
>     done
>   # iperf -s 127.0.0.1 &
>   # iperf -c 127.0.0.1 -P 60 -t 36000
>
> After a while, the system will hang like this:
>   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
>   NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]
>   ...
>
> So if, in the end, cb->args[1] is still equal to "last", a hash
> resize happened, and we can set cb->args[1] to 0 to fix the above
> issue.

Applied, thanks.

I have added:

Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index dcf561b..3b449e0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -888,8 +888,13 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 out:
 	local_bh_enable();
-	if (last)
+	if (last) {
+		/* nf ct hash resize happened, now clear the leftover. */
+		if ((struct nf_conn *)cb->args[1] == last)
+			cb->args[1] = 0;
+
 		nf_ct_put(last);
+	}
 
 	while (i) {
 		i--;