[1/3] netfilter: nf_tables: fix transaction race condition

Message ID 20150304170102.GA7589@salvia
State Accepted
Delegated to: Pablo Neira

Commit Message

Pablo Neira Ayuso March 4, 2015, 5:01 p.m. UTC
On Wed, Mar 04, 2015 at 04:26:13PM +0000, Patrick McHardy wrote:
> On 04.03, Pablo Neira Ayuso wrote:
> > On Tue, Mar 03, 2015 at 08:04:18PM +0000, Patrick McHardy wrote:
> > > A race condition exists in the rule transaction code for rules that
> > > get added and removed within the same transaction.
> > > 
> > > The new rule starts out as inactive in the current and active in the
> > > next generation and is inserted into the ruleset. When it is deleted,
> > > it is additionally set to inactive in the next generation as well.
> > > 
> > > On commit the next generation is begun, then the actions are finalized.
> > > For the new rule this would mean clearing out the inactive bit for
> > > the previously current, now next generation.
> > > 
> > > However nft_rule_clear() clears out the bits for *both* generations,
> > > activating the rule in the current generation, where it should be
> > > deactivated due to being deleted. The rule will thus be active until
> > > the deletion is finalized, removing the rule from the ruleset.
> > > 
> > > Similarly, when aborting a transaction for the same case, the undo
> > > of insertion will remove it from the RCU protected rule list, the
> > > deletion will clear out all bits. However until the next RCU
> > > synchronization after all operations have been undone, the rule is
> > > active on CPUs which can still see the rule on the list.
> > > 
> > > Generally, there may never be any modifications of the current
> > > generations' inactive bit since this defeats the entire purpose of
> > > atomicity. Change nft_rule_clear() to only touch the next generations
> > > bit to fix this.
> > 
> > I think we can get rid of the nft_rule_clear() call from the error
> > path of nf_tables_newrule() too.
> 
> I don't think so, we deactivate the old rule for NLM_F_REPLACE and
> need to undo that on error.

Right.

> Or are you talking about getting rid of the entire error handling
> for NLM_F_REPLACE and have it taken care of by the abort() path?

Yes, that error handling I think we can get rid of it. It's actually
not correct because it's deleting the old rule.
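
The race from the quoted patch description can be reproduced with a small model. This is only a sketch under stated assumptions, not the kernel code: it assumes one "inactive" bit per generation in a mask and a cursor that flips between 0 and 1 on commit. Every name in it (model_rule, gencursor, rule_clear_both, rule_clear_next, active_after_commit) is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model, not the kernel code: bit g of genmask set means
 * "inactive in generation g". All names here are hypothetical. */
struct model_rule {
	unsigned int genmask;
};

static unsigned int gencursor;	/* current generation: 0 or 1 */

static unsigned int gen_next(void)
{
	return gencursor ^ 1u;
}

static bool rule_is_active(const struct model_rule *r, unsigned int gen)
{
	assert(gen < 2);
	return (r->genmask & (1u << gen)) == 0;
}

/* Behaviour described as buggy: wipes the bits of both generations. */
static void rule_clear_both(struct model_rule *r)
{
	r->genmask = 0;
}

/* Behaviour proposed as the fix: touches only the next generation's bit. */
static void rule_clear_next(struct model_rule *r)
{
	r->genmask &= ~(1u << gen_next());
}

/* Add and delete a rule in one transaction, then commit up to the point
 * where the addition is finalized. Returns whether the rule is seen as
 * active in the new current generation. */
static bool active_after_commit(bool fixed)
{
	struct model_rule r = { 0 };

	gencursor = 0;
	r.genmask = 1u << gencursor;	/* add: inactive now, active next */
	r.genmask |= 1u << gen_next();	/* delete: inactive next as well */

	gencursor = gen_next();		/* commit: begin the next generation */
	if (fixed)
		rule_clear_next(&r);	/* finalize the addition */
	else
		rule_clear_both(&r);

	return rule_is_active(&r, gencursor);
}
```

In this model, active_after_commit(false) reports the deleted rule as active in the current generation, while active_after_commit(true) keeps it inactive until the deletion itself is finalized.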

Comments

Patrick McHardy March 4, 2015, 5:03 p.m. UTC | #1
On 04.03, Pablo Neira Ayuso wrote:
> > > I think we can get rid of the nft_rule_clear() call from the error
> > > path of nf_tables_newrule() too.
> > 
> > I don't think so, we deactivate the old rule for NLM_F_REPLACE and
> > need to undo that on error.
> 
> Right.
> 
> > Or are you talking about getting rid of the entire error handling
> > for NLM_F_REPLACE and have it taken care of by the abort() path?
> 
> Yes, that error handling I think we can get rid of it. It's actually
> not correct because it's deleting the old rule.

It does? All I can see is reactivating it?

> In general, if a transaction object is added to the list successfully,
> we can rely on the abort path to undo what we've done. This allows us to
> simplify the error handling of the rule replacement path in
> nf_tables_newrule().
> 
> This implicitly fixes an unnecessary removal of the old rule removal,
> which needs to be left in place if we fail to replace.

I agree on the simplification, but I don't see any problem with this.

>  err3:
>  	list_del_rcu(&rule->list);
> -	if (trans) {
> -		list_del_rcu(&nft_trans_rule(trans)->list);
> -		nft_rule_clear(net, nft_trans_rule(trans));
> -		nft_trans_destroy(trans);
> -		chain->use++;
> -	}
>  err2:
>  	nf_tables_rule_destroy(&ctx, rule);
>  err1:
> -- 
> 1.7.10.4
> 

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso March 4, 2015, 5:28 p.m. UTC | #2
On Wed, Mar 04, 2015 at 05:03:56PM +0000, Patrick McHardy wrote:
> On 04.03, Pablo Neira Ayuso wrote:
> > > > I think we can get rid of the nft_rule_clear() call from the error
> > > > path of nf_tables_newrule() too.
> > > 
> > > I don't think so, we deactivate the old rule for NLM_F_REPLACE and
> > > need to undo that on error.
> > 
> > Right.
> > 
> > > Or are you talking about getting rid of the entire error handling
> > > for NLM_F_REPLACE and have it taken care of by the abort() path?
> > 
> > Yes, that error handling I think we can get rid of it. It's actually
> > not correct because it's deleting the old rule.
> 
> It does? All I can see is reactivating it?
> 
> > In general, if a transaction object is added to the list successfully,
> > we can rely on the abort path to undo what we've done. This allows us to
> > simplify the error handling of the rule replacement path in
> > nf_tables_newrule().
> > 
> > This implicitly fixes an unnecessary removal of the old rule removal,
> > which needs to be left in place if we fail to replace.
> 
> I agree on the simplification, but I don't see any problem with this.

Let me see, from the replacement path:

       trans = nft_trans_rule_add(&ctx, NFT_MSG_DELRULE,
                                  old_rule);
       ...
       nft_rule_deactivate_next(net, old_rule);
       chain->use--;
       list_add_tail_rcu(&rule->list, &old_rule->list);

So we basically:

1) add transaction object to delete the old rule
2) deactivate the old rule
3) reduce the chain use counter
4) add the new rule after the old rule

Then, if we fail to add the transaction for the new rule, what we have
in err3 says:

1) We remove the new rule from the chain, we couldn't add a
   transaction object for this, so we have to manually undo this.
2) We remove the old rule (but it should actually be left there in
   place).
3) Clear the old rule generation bits, as it needs to be active in the
   next generation given that we failed (this undoes step2)
4) Release the transaction object.
5) Restore chain use counter.

#3, #4 and #5 can be handled from the abort path.

#2 should not be there I think.
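
The two step lists above can be played through in a toy model to compare the outcomes. This is a sketch only: the types and helpers (model_rule, model_chain, model_trans, replace_begin, err3_before_patch, err3_after_patch, abort_delrule) are hypothetical stand-ins, the RCU list is reduced to an on_chain flag, and the abort path is assumed to do exactly what #3, #4 and #5 describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the chain's rule list is reduced to a flag. */
struct model_rule { bool on_chain; bool inactive_next; };
struct model_chain { int use; };
struct model_trans { struct model_rule *rule; };	/* NULL: nothing queued */
struct outcome { bool old_on_chain; bool old_inactive_next; int use; };

/* Replacement path, steps 1-4 from the first list. */
static void replace_begin(struct model_chain *c, struct model_rule *old_rule,
			  struct model_rule *new_rule, struct model_trans *t)
{
	t->rule = old_rule;			/* 1) queue deletion of the old rule */
	old_rule->inactive_next = true;		/* 2) deactivate the old rule */
	c->use--;				/* 3) reduce the use counter */
	new_rule->on_chain = true;		/* 4) link the new rule */
}

/* err3 as it was: also unlinks the old rule (step 2 of the second list). */
static void err3_before_patch(struct model_chain *c,
			      struct model_rule *new_rule,
			      struct model_trans *t)
{
	new_rule->on_chain = false;
	if (t->rule) {
		t->rule->on_chain = false;	/* the questionable removal */
		t->rule->inactive_next = false;
		t->rule = NULL;
		c->use++;
	}
}

/* err3 with the block removed: only the new rule is unlinked. */
static void err3_after_patch(struct model_rule *new_rule)
{
	new_rule->on_chain = false;
}

/* Assumed abort handling for the queued deletion: #3, #4 and #5. */
static void abort_delrule(struct model_chain *c, struct model_trans *t)
{
	if (!t->rule)
		return;
	t->rule->inactive_next = false;
	t->rule = NULL;
	c->use++;
}

static struct outcome replace_failure(bool patched)
{
	struct model_rule old_rule = { .on_chain = true, .inactive_next = false };
	struct model_rule new_rule = { .on_chain = false, .inactive_next = false };
	struct model_chain chain = { .use = 1 };
	struct model_trans trans = { .rule = NULL };

	replace_begin(&chain, &old_rule, &new_rule, &trans);
	if (patched) {
		err3_after_patch(&new_rule);
		abort_delrule(&chain, &trans);
	} else {
		err3_before_patch(&chain, &new_rule, &trans);
	}
	assert(!new_rule.on_chain);
	return (struct outcome){ old_rule.on_chain, old_rule.inactive_next,
				 chain.use };
}
```

Under these assumptions, replace_failure(false) leaves the old rule off the chain even though the replacement failed, while replace_failure(true) leaves it linked, active and counted, which is the state expected after a failed replace.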

> >  err3:
> >  	list_del_rcu(&rule->list);
> > -	if (trans) {
> > -		list_del_rcu(&nft_trans_rule(trans)->list);
> > -		nft_rule_clear(net, nft_trans_rule(trans));
> > -		nft_trans_destroy(trans);
> > -		chain->use++;
> > -	}
> >  err2:
> >  	nf_tables_rule_destroy(&ctx, rule);
> >  err1:
> > -- 
> > 1.7.10.4
> > 
> 
Patrick McHardy March 4, 2015, 5:36 p.m. UTC | #3
On 04.03, Pablo Neira Ayuso wrote:
> On Wed, Mar 04, 2015 at 05:03:56PM +0000, Patrick McHardy wrote:
> > On 04.03, Pablo Neira Ayuso wrote:
> > > > > I think we can get rid of the nft_rule_clear() call from the error
> > > > > path of nf_tables_newrule() too.
> > > > 
> > > > I don't think so, we deactivate the old rule for NLM_F_REPLACE and
> > > > need to undo that on error.
> > > 
> > > Right.
> > > 
> > > > Or are you talking about getting rid of the entire error handling
> > > > for NLM_F_REPLACE and have it taken care of by the abort() path?
> > > 
> > > Yes, that error handling I think we can get rid of it. It's actually
> > > not correct because it's deleting the old rule.
> > 
> > It does? All I can see is reactivating it?
> > 
> > > In general, if a transaction object is added to the list successfully,
> > > we can rely on the abort path to undo what we've done. This allows us to
> > > simplify the error handling of the rule replacement path in
> > > nf_tables_newrule().
> > > 
> > > This implicitly fixes an unnecessary removal of the old rule removal,
> > > which needs to be left in place if we fail to replace.
> > 
> > I agree on the simplification, but I don't see any problem with this.
> 
> Let me see, from the replacement path:
> 
>        trans = nft_trans_rule_add(&ctx, NFT_MSG_DELRULE,
>                                   old_rule);
>        ...
>        nft_rule_deactivate_next(net, old_rule);
>        chain->use--;
>        list_add_tail_rcu(&rule->list, &old_rule->list);
> 
> So we basically:
> 
> 1) add transaction object to delete the old rule
> 2) deactivate the old rule
> 3) reduce the chain use counter
> 4) add the new rule after the old rule
> 
> Then, if we fail to add the transaction for the new rule, what we have
> in err3 says:
> 
> 1) We remove the new rule from the chain, we couldn't add a
>    transaction object for this, so we have to manually undo this.
> 2) We remove the old rule (but it should actually be left there in
>    place).
> 3) Clear the old rule generation bits, as it needs to be active in the
>    next generation given that we failed (this undoes step2)
> 4) Release the transaction object.
> 5) Restore chain use counter.
> 
> #3, #4 and #5 can be handled from the abort path.
> 
> #2 should not be there I think.

I still don't see where we remove the old rule. We activate it
and remove the transaction object, but that's it.

Where do you see this?
Patrick McHardy March 4, 2015, 5:38 p.m. UTC | #4
On 04.03, Patrick McHardy wrote:
> On 04.03, Pablo Neira Ayuso wrote:
> > On Wed, Mar 04, 2015 at 05:03:56PM +0000, Patrick McHardy wrote:
> > > 
> > > I agree on the simplification, but I don't see any problem with this.
> > 
> > Let me see, from the replacement path:
> > 
> >        trans = nft_trans_rule_add(&ctx, NFT_MSG_DELRULE,
> >                                   old_rule);
> >        ...
> >        nft_rule_deactivate_next(net, old_rule);
> >        chain->use--;
> >        list_add_tail_rcu(&rule->list, &old_rule->list);
> > 
> > So we basically:
> > 
> > 1) add transaction object to delete the old rule
> > 2) deactivate the old rule
> > 3) reduce the chain use counter
> > 4) add the new rule after the old rule
> > 
> > Then, if we fail to add the transaction for the new rule, what we have
> > in err3 says:
> > 
> > 1) We remove the new rule from the chain, we couldn't add a
> >    transaction object for this, so we have to manually undo this.
> > 2) We remove the old rule (but it should actually be left there in
> >    place).
> > 3) Clear the old rule generation bits, as it needs to be active in the
> >    next generation given that we failed (this undoes step2)
> > 4) Release the transaction object.
> > 5) Restore chain use counter.
> > 
> > #3, #4 and #5 can be handled from the abort path.
> > 
> > #2 should not be there I think.
> 
> I still don't see where we remove the old rule. We activate it
> and remove the transaction object, but that's it.
> 
> Where do you see this?

Ok, got it. I misread the code and thought we'd only delete the
transaction object. So I agree that this actually fixes a bug :)

Patch

From f4ab0cab91e2968652745dc883d46da61421f560 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 4 Mar 2015 17:55:27 +0100
Subject: [PATCH] netfilter: nf_tables: fix error handling of rule replacement

In general, if a transaction object is added to the list successfully,
we can rely on the abort path to undo what we've done. This allows us to
simplify the error handling of the rule replacement path in
nf_tables_newrule().

This implicitly fixes an unnecessary removal of the old rule,
which needs to be left in place if we fail to replace it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index a8c9462..6668adb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2031,12 +2031,6 @@  static int nf_tables_newrule(struct sock *nlsk, struct sk_buff *skb,
 
 err3:
 	list_del_rcu(&rule->list);
-	if (trans) {
-		list_del_rcu(&nft_trans_rule(trans)->list);
-		nft_rule_clear(net, nft_trans_rule(trans));
-		nft_trans_destroy(trans);
-		chain->use++;
-	}
 err2:
 	nf_tables_rule_destroy(&ctx, rule);
 err1:
-- 
1.7.10.4