
[nf] netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse

Message ID 20141106133648.2534.1403.stgit@dragon
State Accepted
Delegated to: Pablo Neira

Commit Message

Jesper Dangaard Brouer Nov. 6, 2014, 1:36 p.m. UTC
From: bill bonaparte <programme110@gmail.com>

After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().

The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with separate locking),
which sets the DYING bit.

Fix this race, in __nf_conntrack_confirm(), by removing the CT
from the unconfirmed list before checking the DYING bit.  In case
the race occurred, re-add the CT to the dying list.

Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 net/netfilter/nf_conntrack_core.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
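
For context, a rough sketch of the problematic interleaving before this fix
(illustrative comment-style pseudocode, not code taken from the tree):

	/*
	 * CPU 0: __nf_conntrack_confirm()          CPU 1: nf_ct_iterate_cleanup()
	 * ----------------------------------        -> get_next_corpse()
	 * if (nf_ct_is_dying(ct))  // still clear
	 *                                           spin_lock(&pcpu->lock);
	 *                                           // ct still on unconfirmed list
	 *                                           set_bit(IPS_DYING_BIT, &ct->status);
	 *                                           spin_unlock(&pcpu->lock);
	 * nf_ct_del_from_dying_or_unconfirmed_list(ct);
	 * __nf_conntrack_hash_insert(ct, ...);     // dying entry lands in the hash
	 */

With the unlink moved in front of the DYING check, the unlink is serialized
against the per-cpu unconfirmed list traversal: once the CT is off that list,
get_next_corpse() can no longer mark it, and a CT that did lose the race is
parked on the dying list instead of being inserted into the hash.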



Comments

Pablo Neira Ayuso Nov. 10, 2014, 4:54 p.m. UTC | #1
On Thu, Nov 06, 2014 at 02:36:48PM +0100, Jesper Dangaard Brouer wrote:
> From: bill bonaparte <programme110@gmail.com>
> 
> After removal of the central spinlock nf_conntrack_lock, in
> commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
> spinlock nf_conntrack_lock"), it is possible to race against
> get_next_corpse().
> 
> The race is against the get_next_corpse() cleanup on
> the "unconfirmed" list (a per-cpu list with seperate locking),
> which set the DYING bit.
> 
> Fix this race, in __nf_conntrack_confirm(), by removing the CT
> from unconfirmed list before checking the DYING bit.  In case
> race occured, re-add the CT to the dying list.

This seems correct to me; some side comments below.

> Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
> Reported-by: bill bonaparte <programme110@gmail.com>
> Signed-off-by: bill bonaparte <programme110@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> 
>  net/netfilter/nf_conntrack_core.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 5016a69..1072650 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -611,12 +611,15 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>  	 */
>  	NF_CT_ASSERT(!nf_ct_is_confirmed(ct));
>  	pr_debug("Confirming conntrack %p\n", ct);
> -	/* We have to check the DYING flag inside the lock to prevent
> +
> +	/* We have to check the DYING flag after unlink to prevent
>  	   a race against nf_ct_get_next_corpse() possibly called from
>  	   user context, else we insert an already 'dead' hash, blocking
>  	   further use of that particular connection -JM */

While at this, I think it would be good to fix comment style to:

        /* We have ...
         * ...
         */

I can fix this here, no need to resend, just let me know.

> +	nf_ct_del_from_dying_or_unconfirmed_list(ct);
>  
>  	if (unlikely(nf_ct_is_dying(ct))) {
> +		nf_ct_add_to_dying_list(ct);
>  		nf_conntrack_double_unlock(hash, reply_hash);
>  		local_bh_enable();
>  		return NF_ACCEPT;

Not directly related to your patch, but I don't find a good reason why
we're accepting this packet.

If the conntrack from the unconfirmed list is dying, then the object
will be released by the time the packet leaves the stack towards its
destination. With stateful filtering in place, the follow-up
packet in the reply direction will likely be considered invalid (if
TCP tracking is on). Fortunately for us, the origin will likely
retransmit the SYN, so the ct will be set up accordingly.

So, why should we allow this to go through?

This return verdict was introduced in: fc35077 ("netfilter:
nf_conntrack: fix a race in __nf_conntrack_confirm against
nf_ct_get_next_corpse()") btw.
Jesper Dangaard Brouer Nov. 12, 2014, 7:35 a.m. UTC | #2
On Mon, 10 Nov 2014 17:54:39 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Thu, Nov 06, 2014 at 02:36:48PM +0100, Jesper Dangaard Brouer wrote:
> > From: bill bonaparte <programme110@gmail.com>
> > 
> > After removal of the central spinlock nf_conntrack_lock, in
> > commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
> > spinlock nf_conntrack_lock"), it is possible to race against
> > get_next_corpse().
> > 
> > The race is against the get_next_corpse() cleanup on
> > the "unconfirmed" list (a per-cpu list with seperate locking),
> > which set the DYING bit.
> > 
> > Fix this race, in __nf_conntrack_confirm(), by removing the CT
> > from unconfirmed list before checking the DYING bit.  In case
> > race occured, re-add the CT to the dying list.
> 
> This seems correct to me, some side comments.
> 
> > Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
> > Reported-by: bill bonaparte <programme110@gmail.com>
> > Signed-off-by: bill bonaparte <programme110@gmail.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > 
> >  net/netfilter/nf_conntrack_core.c |    7 ++++---
> >  1 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 5016a69..1072650 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -611,12 +611,15 @@ __nf_conntrack_confirm(struct sk_buff *skb)
> >  	 */
> >  	NF_CT_ASSERT(!nf_ct_is_confirmed(ct));
> >  	pr_debug("Confirming conntrack %p\n", ct);
> > -	/* We have to check the DYING flag inside the lock to prevent
> > +
> > +	/* We have to check the DYING flag after unlink to prevent
> >  	   a race against nf_ct_get_next_corpse() possibly called from
> >  	   user context, else we insert an already 'dead' hash, blocking
> >  	   further use of that particular connection -JM */
> 
> While at this, I think it would be good to fix comment style to:
> 
>         /* We have ...
>          * ...
>          */
> 
> I can fix this here, no need to resend, just let me know.

Okay, I was just trying to keep the changes as minimal as possible in
case this should go into a stable kernel.  Your choice.


> > +	nf_ct_del_from_dying_or_unconfirmed_list(ct);
> >  
> >  	if (unlikely(nf_ct_is_dying(ct))) {
> > +		nf_ct_add_to_dying_list(ct);
> >  		nf_conntrack_double_unlock(hash, reply_hash);
> >  		local_bh_enable();
> >  		return NF_ACCEPT;
> 
> Not directly related to your patch, but I don't find a good reason why
> we're accepting this packet.
> 
> If the conntrack from the unconfirmed list is dying, then the object
> will be released by when the packet leaves the stack to its
> destination. With stateful filtering depending in place, the follow up
> packet in the reply direction will likely be considered invalid (if
> tcp tracking is on). Fortunately for us, the origin will likely
> retransmit the syn again, so the ct will be setup accordingly.
> 
> So, why should we allow this to go through?

True, it also seems strange to me that we accept this packet.

> This return verdict was introduced in: fc35077 ("netfilter:
> nf_conntrack: fix a race in __nf_conntrack_confirm against
> nf_ct_get_next_corpse()") btw.

And the commit does not argue why NF_ACCEPT was chosen...
Joerg Marx Nov. 12, 2014, 10:57 a.m. UTC | #3
On 12.11.2014 08:35, Jesper Dangaard Brouer wrote:

Hi,

I wrote the patch in 2010, so find some arguments below:

>>> > > +	nf_ct_del_from_dying_or_unconfirmed_list(ct);
>>> > >  
>>> > >  	if (unlikely(nf_ct_is_dying(ct))) {
>>> > > +		nf_ct_add_to_dying_list(ct);
>>> > >  		nf_conntrack_double_unlock(hash, reply_hash);
>>> > >  		local_bh_enable();
>>> > >  		return NF_ACCEPT;
>> > 
>> > Not directly related to your patch, but I don't find a good reason why
>> > we're accepting this packet.
>> > 
>> > If the conntrack from the unconfirmed list is dying, then the object
>> > will be released by when the packet leaves the stack to its
>> > destination. With stateful filtering depending in place, the follow up
>> > packet in the reply direction will likely be considered invalid (if
>> > tcp tracking is on). Fortunately for us, the origin will likely
>> > retransmit the syn again, so the ct will be setup accordingly.
>> > 
>> > So, why should we allow this to go through?
> True, it also seems strange to me that we accept this packet.

The race was triggered in a scenario where we tested high-load IPsec
tunnels and flushed the conntrack hashes from userspace.

For me there is little difference in choosing DROP or ACCEPT as the verdict.
The packet/skb belongs to a formerly allowed connection; most likely
this connection is still allowed (but the conntrack hash entry is about
to be removed because userspace is flushing the conntrack table).

To minimize the impact (lost packets -> retransmits) I decided to allow
the skb in flight, so there is no lost packet at this point.

When the connection is not allowed anymore (but was allowed up to now,
because the hash entry exists), the impact is one last packet 'slipping
through'.

Today I would still decide the way I did in 2010.

> 
>> > This return verdict was introduced in: fc35077 ("netfilter:
>> > nf_conntrack: fix a race in __nf_conntrack_confirm against
>> > nf_ct_get_next_corpse()") btw.
> And the commit does not argue why NF_ACCEPT was chosen...
> 
> -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel
> Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn:
> http://www.linkedin.com/in/brouer

best regards
Jörg Marx
Pablo Neira Ayuso Nov. 13, 2014, 12:08 p.m. UTC | #4
On Wed, Nov 12, 2014 at 11:57:31AM +0100, Jörg Marx wrote:
> On 12.11.2014 08:35, Jesper Dangaard Brouer wrote:
> 
> Hi,
> 
> I wrote the patch in 2010, so find some arguments below:
> 
> >>> > > +	nf_ct_del_from_dying_or_unconfirmed_list(ct);
> >>> > >  
> >>> > >  	if (unlikely(nf_ct_is_dying(ct))) {
> >>> > > +		nf_ct_add_to_dying_list(ct);
> >>> > >  		nf_conntrack_double_unlock(hash, reply_hash);
> >>> > >  		local_bh_enable();
> >>> > >  		return NF_ACCEPT;
> >> > 
> >> > Not directly related to your patch, but I don't find a good reason why
> >> > we're accepting this packet.
> >> > 
> >> > If the conntrack from the unconfirmed list is dying, then the object
> >> > will be released by when the packet leaves the stack to its
> >> > destination. With stateful filtering depending in place, the follow up
> >> > packet in the reply direction will likely be considered invalid (if
> >> > tcp tracking is on). Fortunately for us, the origin will likely
> >> > retransmit the syn again, so the ct will be setup accordingly.
> >> > 
> >> > So, why should we allow this to go through?
> > True, it also seems strange to me that we accept this packet.
> 
> The raise was triggered in a scenario when we tested high-load IPsec
> tunnels and flushed the conntrack hashs from userspace.
> 
> For me there is little difference in choosing DROP or ACCEPT as verdict.
> The packet/skb belongs to a formerly allowed connection, most likely
> this connection is still allowed (but the conntrack hash entry is about
> to be removed due to userspace is flushing the conntrack table).

__nf_conntrack_confirm() is only called for the first packet that we
see in a flow. If you just invoked the flush command once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you had already called flush recently).
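
For reference, the confirm step is gated on the confirmed bit in the inline
wrapper, roughly like this (a paraphrased sketch of that era's
include/net/netfilter/nf_conntrack_core.h helper; details may differ):

	static inline int nf_conntrack_confirm(struct sk_buff *skb)
	{
		struct nf_conn *ct = (struct nf_conn *)skb->nfct;
		int ret = NF_ACCEPT;

		if (ct && !nf_ct_is_untracked(ct)) {
			/* only unconfirmed conntracks take the slow path */
			if (!nf_ct_is_confirmed(ct))
				ret = __nf_conntrack_confirm(skb);
			if (likely(ret == NF_ACCEPT))
				nf_ct_deliver_cached_events(ct);
		}
		return ret;
	}

Only the first packet of a flow reaches __nf_conntrack_confirm(); later
packets find the conntrack already confirmed in the hash table.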

> To minimize the impact (lost packets -> retransmit) I decided to allow
> the skb in flight, so were is no lost packet at this place.

I understand your original motivation was to be conservative.

> When the connection is not allowed anymore (but was allowed up to now,
> because the hash entry exists), the impact is one last packet 'slipping
> through'.

The general policy in conntrack is to not drop packets, but in this
case we'll leave things in an inconsistent state (i.e. we will likely
receive a reply packet in response to the original packet that has no
conntrack yet).

Thanks.
Joerg Marx Nov. 13, 2014, 2:33 p.m. UTC | #5
On 13.11.2014 13:08, Pablo Neira Ayuso wrote:

>> For me there is little difference in choosing DROP or ACCEPT as verdict.
>> > The packet/skb belongs to a formerly allowed connection, most likely
>> > this connection is still allowed (but the conntrack hash entry is about
>> > to be removed due to userspace is flushing the conntrack table).
> __nf_conntrack_confirm() is only called for the first packet that we
> see in a flow. If you just invoked the flush command once (which
> should be the common case), then this is likely to be the first packet
> of the flow (unless you already called flush anytime soon in the
> past).
Yes, you are right. As far as I remember, it was very hard to trigger
the critical moment when the first packet triggered the insertion into
the hash table. But the test and production systems showed the strange
behaviour that no traffic was allowed to flow for exactly 600 seconds.

> 
>> > To minimize the impact (lost packets -> retransmit) I decided to allow
>> > the skb in flight, so were is no lost packet at this place.
> I understand your original motivation was to be conservative.
Yes.

> 
>> > When the connection is not allowed anymore (but was allowed up to now,
>> > because the hash entry exists), the impact is one last packet 'slipping
>> > through'.
Feel free to change the verdict; IMHO it doesn't matter at all as long
as the hash table is in a consistent state. The higher protocol layers
will deal with the missing packet.

> The general policy in conntrack is to not drop packets, but in this
> case we'll leave things in inconsistent state (ie. we will likely
> receive a reply packet in response to the original packet that has no
> conntrack yet).
Under heavy load this can happen anyway, I guess?

Thanks and best regards
Jörg
Pablo Neira Ayuso Nov. 14, 2014, 4:40 p.m. UTC | #6
On Wed, Nov 12, 2014 at 08:35:00AM +0100, Jesper Dangaard Brouer wrote:
> > > -	/* We have to check the DYING flag inside the lock to prevent
> > > +
> > > +	/* We have to check the DYING flag after unlink to prevent
> > >  	   a race against nf_ct_get_next_corpse() possibly called from
> > >  	   user context, else we insert an already 'dead' hash, blocking
> > >  	   further use of that particular connection -JM */
> > 
> > While at this, I think it would be good to fix comment style to:
> > 
> >         /* We have ...
> >          * ...
> >          */
> > 
> > I can fix this here, no need to resend, just let me know.
> 
> Okay, I was just trying to keep the changes as minimal as possible, if
> this should go into a stable-kernel.  Your choice.

I'm going to take this patch including the comment style fix; I would
like to avoid dedicated patches for coding style issues, and the first
line of this comment is being updated anyway. I think the patch will
still be small enough to fulfill the -stable rules.

I'll send a follow-up patch to change the return verdict to NF_DROP,
to avoid mixing up different things.

Thanks!
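
A minimal sketch of what such a follow-up could look like, assuming only the
verdict in the dying branch changes (hypothetical, not the patch that was
actually applied):

	if (unlikely(nf_ct_is_dying(ct))) {
		nf_ct_add_to_dying_list(ct);
		nf_conntrack_double_unlock(hash, reply_hash);
		local_bh_enable();
		return NF_DROP;	/* was NF_ACCEPT; the origin will retransmit */
	}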

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5016a69..1072650 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -611,12 +611,15 @@  __nf_conntrack_confirm(struct sk_buff *skb)
 	 */
 	NF_CT_ASSERT(!nf_ct_is_confirmed(ct));
 	pr_debug("Confirming conntrack %p\n", ct);
-	/* We have to check the DYING flag inside the lock to prevent
+
+	/* We have to check the DYING flag after unlink to prevent
 	   a race against nf_ct_get_next_corpse() possibly called from
 	   user context, else we insert an already 'dead' hash, blocking
 	   further use of that particular connection -JM */
+	nf_ct_del_from_dying_or_unconfirmed_list(ct);
 
 	if (unlikely(nf_ct_is_dying(ct))) {
+		nf_ct_add_to_dying_list(ct);
 		nf_conntrack_double_unlock(hash, reply_hash);
 		local_bh_enable();
 		return NF_ACCEPT;
@@ -636,8 +639,6 @@  __nf_conntrack_confirm(struct sk_buff *skb)
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
 
-	nf_ct_del_from_dying_or_unconfirmed_list(ct);
-
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
 	   weird delay cases. */