[nf] netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize

Message ID 1495322569-63361-1-git-send-email-zlpnobody@163.com
State Accepted
Delegated to: Pablo Neira

Commit Message

Liping Zhang May 20, 2017, 11:22 p.m. UTC
From: Liping Zhang <zlpnobody@gmail.com>

If nf_conntrack_htable_size is adjusted by the user during a ct dump
operation, we may invoke nf_ct_put twice for the same ct, i.e. the
"last" ct. As a result, the ct is freed while it is still linked in the
hash buckets.

It's very easy to reproduce the problem with the following commands:
  # while : ; do
  echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
  done
  # while : ; do
  conntrack -L
  done
  # iperf -s 127.0.0.1 &
  # iperf -c 127.0.0.1 -P 60 -t 36000

After a while, the system will hang like this:
  NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
  NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]
  ...

So at the end of the dump, if cb->args[1] is still equal to "last", a
hash resize has happened; in that case set cb->args[1] to 0 to fix the
above issue.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
---
 net/netfilter/nf_conntrack_netlink.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Florian Westphal May 21, 2017, midnight UTC | #1
Liping Zhang <zlpnobody@163.com> wrote:
> From: Liping Zhang <zlpnobody@gmail.com>
> 
> If nf_conntrack_htable_size was adjusted by the user during the ct
> dump operation, we may invoke nf_ct_put twice for the same ct, i.e.
> the "last" ct. This will cause the ct will be freed but still linked
> in hash buckets.
> 
> It's very easy to reproduce the problem by the following commands:
>   # while : ; do
>   echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
>   done
>   # while : ; do
>   conntrack -L
>   done
>   # iperf -s 127.0.0.1 &
>   # iperf -c 127.0.0.1 -P 60 -t 36000
> 
> After a while, the system will hang like this:
>   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
>   NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]
>   ...
> 
> So at last if we find cb->args[1] is equal to "last", this means hash
> resize happened, then we can set cb->args[1] to 0 to fix the above
> issue.

Yes, you're right, seems this was added in
93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').

Your patch looks correct.

However, why do we bump refcnt of 'last' in the first place?

It's only the continuation marker, i.e. it's expected to reside
in the hash slot at cb->args[0], but after a rehash this might not
be true either.

I think we should simplify this, just take the verbatim address,
and clear it right at start of ctnetlink_dump_table, i.e.

unsigned long last = cb->args[1];
cb->args[1] = 0;

for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
	...
	hlist_nulls_for_each_entry ... {
		...
		if (last) {
			if (last != (unsigned long)ct)
				continue;
			last = 0;
		}
		...
		dump();
	}
	last = 0; /* reset it, as it wasn't in args[0] slot */
}

Do you see any problem with that?

[ It might be better to take your patch for nf- though and do
  this no-refcnt thing in nf-next ... ]
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liping Zhang May 21, 2017, 12:59 a.m. UTC | #2
Hi Florian,

2017-05-21 8:00 GMT+08:00 Florian Westphal <fw@strlen.de>:
[...]
> Yes, you're right, seems this was added in
> 93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').

I added some trace logs: the issue happens when the hash size is reduced,
for example from 60000 to 500.

Actually, hitting 'goto out' is not easy, so the issue has existed for a
very long time. Maybe commit 89f2e21883b5 ("[NETFILTER]: ctnetlink: change
table dumping not to require an unique ID") is to blame for it.

> Your patch looks correct.
>
> However, why do we bump refcnt of 'last' in the first place?
>
> Its only the continuation marker, i.e. its expected to reside
> in the hash slot at cb->args[0], but after rehash this might not
> be true either.
>
> I think we should simplify this, just take the verbatim address,
> and clear it right at start of ctnetlink_dump_table, i.e.
>
> unsigned long last = cb->args[1];
> cb->args[1] = 0;
>
> for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
>          ...
>     hlist_nulls_for_each_entry ... {
>        ...
>        if (last) {
>             if (last != (unsigned long)ct))
>                cont;
>             last = 0;
>        }
>        ...
>        dump();
>     }
>     last = 0; /* reset it, as it wasn't in args[0] slot */
>  }
>
> Do you see any problem with that?

I think this is better; it makes the code cleaner.
We can clean up ctnetlink_exp_ct_dump_table as well.

>
> [ It might be better to take your patch for nf- though and do
>   this no-refcnt thing in nf-next ... ]
--
Pablo Neira Ayuso May 23, 2017, 9:34 p.m. UTC | #3
On Sun, May 21, 2017 at 08:59:45AM +0800, Liping Zhang wrote:
> Hi Florian,
> 
> 2017-05-21 8:00 GMT+08:00 Florian Westphal <fw@strlen.de>:
> [...]
> > Yes, you're right, seems this was added in
> > 93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').
> 
> I added some trace logs, and when the hash size reduced, for example,
> from 60000 to 500, then the issue would happen.
> 
> Actually, hitting 'goto out' is not easy, so the issue exists for a very long
> time. Maybe commit 89f2e21883b5("[NETFILTER]: ctnetlink: change
> table dumping not to require an unique ID") is to blame for it.
> 
> > Your patch looks correct.
> >
> > However, why do we bump refcnt of 'last' in the first place?
> >
> > Its only the continuation marker, i.e. its expected to reside
> > in the hash slot at cb->args[0], but after rehash this might not
> > be true either.
> >
> > I think we should simplify this, just take the verbatim address,
> > and clear it right at start of ctnetlink_dump_table, i.e.
> >
> > unsigned long last = cb->args[1];
> > cb->args[1] = 0;
> >
> > for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
> >          ...
> >     hlist_nulls_for_each_entry ... {
> >        ...
> >        if (last) {
> >             if (last != (unsigned long)ct))
> >                cont;
> >             last = 0;
> >        }
> >        ...
> >        dump();
> >     }
> >     last = 0; /* reset it, as it wasn't in args[0] slot */
> >  }
> >
> > Do you see any problem with that?
> 
> I think this will be better, this will make code more clean.
> Also we can clean up the ctnetlink_exp_ct_dump_table too.

@Florian, no objection then if I place this into nf.git?

I will append the Fixes: tag:

Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")
--
Florian Westphal May 23, 2017, 10:28 p.m. UTC | #4
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sun, May 21, 2017 at 08:59:45AM +0800, Liping Zhang wrote:
> > Hi Florian,
> > 
> > 2017-05-21 8:00 GMT+08:00 Florian Westphal <fw@strlen.de>:
> > [...]
> > > Yes, you're right, seems this was added in
> > > 93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c (it adds the 'goto out').
> > 
> > I added some trace logs, and when the hash size reduced, for example,
> > from 60000 to 500, then the issue would happen.
> > 
> > Actually, hitting 'goto out' is not easy, so the issue exists for a very long
> > time. Maybe commit 89f2e21883b5("[NETFILTER]: ctnetlink: change
> > table dumping not to require an unique ID") is to blame for it.
> > 
> > > Your patch looks correct.
> > >
> > > However, why do we bump refcnt of 'last' in the first place?
> > >
> > > Its only the continuation marker, i.e. its expected to reside
> > > in the hash slot at cb->args[0], but after rehash this might not
> > > be true either.
> > >
> > > I think we should simplify this, just take the verbatim address,
> > > and clear it right at start of ctnetlink_dump_table, i.e.
> > >
> > > unsigned long last = cb->args[1];
> > > cb->args[1] = 0;
> > >
> > > for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
> > >          ...
> > >     hlist_nulls_for_each_entry ... {
> > >        ...
> > >        if (last) {
> > >             if (last != (unsigned long)ct))
> > >                cont;
> > >             last = 0;
> > >        }
> > >        ...
> > >        dump();
> > >     }
> > >     last = 0; /* reset it, as it wasn't in args[0] slot */
> > >  }
> > >
> > > Do you see any problem with that?
> > 
> > I think this will be better, this will make code more clean.
> > Also we can clean up the ctnetlink_exp_ct_dump_table too.
> 
> @Florian, no objection then if I place this into nf.git?

No objection, thanks!

> I will append the Fixes: tag:
> 
> Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")

That commit looks fine to me, it seems to make sure to put
"last" only once in all cases.

93bb0ceb75be2fdfa9fc0dd1 however adds a check on cb->args[0], and if
that is hit it will do a put() on last, and then, the "done" netlink
callback will do another put operation on cb->args[1] (i.e., last).

--
Liping Zhang May 24, 2017, 12:52 a.m. UTC | #5
2017-05-24 6:28 GMT+08:00 Florian Westphal <fw@strlen.de>:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
[...]
>> I will append the Fixes: tag:
>>
>> Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")
>
> That commit looks fine to me, it seems to make sure to put
> "last" only once in all cases.
>
> 93bb0ceb75be2fdfa9fc0dd1 however adds a check on cb->args[0], and if
> that is hit it will do a put() on last, and then, the "done" netlink
> callback will do another put operation on cb->args[1] (i.e., last).

After taking a closer look, I think this patch should add:

Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")

After this commit, when the hash size is reduced, for example
from 60000 to 600, we may put the "last" ct twice, because we may
never enter the iteration that clears cb->args[1], so:

1. nf_ct_put(last) by ctnetlink_dump_table, but cb->args[1] still
points to "last"
2. nf_ct_put((struct nf_conn *)cb->args[1]) by ctnetlink_done
--
Florian Westphal May 24, 2017, 6:22 a.m. UTC | #6
Liping Zhang <zlpnobody@gmail.com> wrote:
> 2017-05-24 6:28 GMT+08:00 Florian Westphal <fw@strlen.de>:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> [...]
> >> I will append the Fixes: tag:
> >>
> >> Fixes: 89f2e21883b5 ("[NETFILTER]: ctnetlink: change table dumping not to require an unique ID")
> >
> > That commit looks fine to me, it seems to make sure to put
> > "last" only once in all cases.
> >
> > 93bb0ceb75be2fdfa9fc0dd1 however adds a check on cb->args[0], and if
> > that is hit it will do a put() on last, and then, the "done" netlink
> > callback will do another put operation on cb->args[1] (i.e., last).
> 
> After I have a closer look, I think this patch should add:
> 
> Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")
> 
> After this commit, when the hash size was reduced, for example,
> from 60000 to 600, then we may put the "last" ct twice, as we may
> fail to go into the iteration and clear the cb->args[1], so:
> 
> 1. nf_ct_put(last) by ctnetlink_dump_table, but cb->args[1] still
> point to the "last"
> 2. nf_ct_put((struct nf_conn *)cb->args[1]) by ctnetlink_done

You are right.
--
Pablo Neira Ayuso May 24, 2017, 10:24 a.m. UTC | #7
On Sun, May 21, 2017 at 07:22:49AM +0800, Liping Zhang wrote:
> From: Liping Zhang <zlpnobody@gmail.com>
> 
> If nf_conntrack_htable_size was adjusted by the user during the ct
> dump operation, we may invoke nf_ct_put twice for the same ct, i.e.
> the "last" ct. This will cause the ct will be freed but still linked
> in hash buckets.
> 
> It's very easy to reproduce the problem by the following commands:
>   # while : ; do
>   echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
>   done
>   # while : ; do
>   conntrack -L
>   done
>   # iperf -s 127.0.0.1 &
>   # iperf -c 127.0.0.1 -P 60 -t 36000
> 
> After a while, the system will hang like this:
>   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
>   NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]
>   ...
> 
> So at last if we find cb->args[1] is equal to "last", this means hash
> resize happened, then we can set cb->args[1] to 0 to fix the above
> issue.

Applied, thanks.

I have added:

Fixes: d205dc40798d ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")
--

Patch

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index dcf561b..3b449e0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -888,8 +888,13 @@  ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 out:
 	local_bh_enable();
-	if (last)
+	if (last) {
+		/* nf ct hash resize happened, now clear the leftover. */
+		if ((struct nf_conn *)cb->args[1] == last)
+			cb->args[1] = 0;
+
 		nf_ct_put(last);
+	}
 
 	while (i) {
 		i--;