diff mbox

[nf] netfilter: use RCU safe kfree for conntrack extensions

Message ID 20130911090900.0129AE8AFF@unicorn.suse.cz
State Accepted
Headers show

Commit Message

Michal Kubecek Sept. 11, 2013, 8:17 a.m. UTC
Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced
RCU protection for freeing extension data when reallocation
moves them to a new location. We need the same protection when
freeing them in nf_ct_ext_free() in order to prevent a
use-after-free by other threads referencing a NAT extension data
via bysource list.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 include/net/netfilter/nf_conntrack_extend.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Phil Oester Sept. 11, 2013, 2:57 p.m. UTC | #1
On Wed, Sep 11, 2013 at 10:17:27AM +0200, Michal Kubecek wrote:
> Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced
> RCU protection for freeing extension data when reallocation
> moves them to a new location. We need the same protection when
> freeing them in nf_ct_ext_free() in order to prevent a
> use-after-free by other threads referencing a NAT extension data
> via bysource list.

Hi Michal - 

coincidentally I've been looking into this area this week due to another
bug report (https://bugzilla.kernel.org/show_bug.cgi?id=60853).  Looking at
your proposed fix, the NAT extension data should have been cleaned
from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy)
before reaching the kfree.  Would you agree?

The reporter of #60853 suggested adding a synchronize_rcu to the end of the
nf_nat_cleanup_conntrack function, which seems sane.  I have been trying to 
reproduce the crash to test that theory.

Are you able to reproduce an OOPS in your testing?  Or is there a bug report
you are working from?

Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Sept. 11, 2013, 3:28 p.m. UTC | #2
On Wed, Sep 11, 2013 at 07:57:15AM -0700, Phil Oester wrote:
> On Wed, Sep 11, 2013 at 10:17:27AM +0200, Michal Kubecek wrote:
> > Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced
> > RCU protection for freeing extension data when reallocation
> > moves them to a new location. We need the same protection when
> > freeing them in nf_ct_ext_free() in order to prevent a
> > use-after-free by other threads referencing a NAT extension data
> > via bysource list.
> 
> Hi Michal - 
> 
> coincidentally I've been looking into this area this week due to another
> bug report (https://bugzilla.kernel.org/show_bug.cgi?id=60853).

Looking at the initial command, I would say this bug report is actually
of the same origin as ours.

> Looking at
> your proposed fix, the NAT extension data should have been cleaned
> from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy)
> before reaching the kfree.  Would you agree?

It is cleaned from the list but as it is an RCU list, other readers can
still be holding pointers to it. We have to wait for the RCU grace
period before we can reuse it.

> The reporter of #60853 suggested adding a synchronize_rcu to the end of the
> nf_nat_cleanup_conntrack function, which seems sane.

That was also my first idea. However, nf_nat_cleanup_conntrack() is
called from __nf_ct_ext_destroy() inside a rcu_read_lock() /
rcu_read_unlock() block. Even if this block is for a different RCU list,
we still cannot call synchronize_rcu() while inside it.

We could call synchronize_rcu() in __nf_ct_ext_destroy() after
rcu_read_unlock() but this would IMHO add an unnecessary delay so it is
more efficient and more appropriate to wait before the actual kfree()
which is the operation that needs to wait for RCU grace period.

> I have been trying to reproduce the crash to test that theory.
> Are you able to reproduce an OOPS in your testing?  Or is there a bug
> report you are working from?

No, it is a bugreport from our customer. And even that customer
encountered it only once so far. Which is not very surprising as to
reproduce it, you have to be (un)lucky twice: first to have someone
overwrite the area soon enough and second to have someone access the
area after it is overwritten. This is IMHO the reason why the reporter
cleared the block with memset() for testing purposes.

                                                     Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phil Oester Sept. 11, 2013, 5:09 p.m. UTC | #3
On Wed, Sep 11, 2013 at 05:28:05PM +0200, Michal Kubecek wrote:
> > Looking at
> > your proposed fix, the NAT extension data should have been cleaned
> > from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy)
> > before reaching the kfree.  Would you agree?
> 
> It is cleaned from the list but as it is an RCU list, other readers can
> still be holding pointers to it. We have to wait for the RCU grace
> period before we can reuse it.

Agreed - looks like your fix should work.  However, two nits:

1) normally RCU functions have _rcu suffixes.  So nf_ct_ext_free should
become nf_ct_ext_free_rcu.

2) kfree_rcu was not added to the kernel until 3.0.  All of the bug
reports I've been looking into (including the original in netfilter bugzilla
at http://bugzilla.netfilter.org/show_bug.cgi?id=714) have been reported in
2.6.32 or earlier kernels.  So a different fix would need to be backported for
-stable.  For that, we would probably export __nf_ct_ext_free_rcu from
nf_conntrack_extend.c and change the kfree call in nf_ct_ext_free_rcu to
call_rcu(&ct->ext->rcu, __nf_ct_ext_free_rcu). Of course the alternative
is just to use this fix for both old and new kernels for simplicity.

> No, it is a bugreport from our customer. And even that customer
> encountered it only once so far. Which is not very surprising as to
> reproduce it, you have to be (un)lucky twice: first to have someone
> overwrite the area soon enough and second to have someone access the
> area after it is overwritten.

Yes, hitting this seems dependent upon phase of the moon.

Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Sept. 11, 2013, 5:42 p.m. UTC | #4
On Wed, Sep 11, 2013 at 10:09:47AM -0700, Phil Oester wrote:
> 
> 1) normally RCU functions have _rcu suffixes.  So nf_ct_ext_free should
> become nf_ct_ext_free_rcu.

Right. I'll post updated version with the rename tomorrow.

> 2) kfree_rcu was not added to the kernel until 3.0.  All of the bug
> reports I've been looking into (including the original in netfilter bugzilla
> at http://bugzilla.netfilter.org/show_bug.cgi?id=714) have been reported in
> 2.6.32 or earlier kernels.  So a different fix would need to be backported for
> -stable.  For that, we would probably export __nf_ct_ext_free_rcu from
> nf_conntrack_extend.c and change the kfree call in nf_ct_ext_free_rcu to
> call_rcu(&ct->ext->rcu, __nf_ct_ext_free_rcu).

Yes, the patch submitted here is against current nf branch. For
SLES 11 SP1 (with 2.6.32 kernel), I'm going to use call_rcu() the way
original commit 68b80f11 does. IIRC the only pre-3.0 stable branch still
maintained is 2.6.32, all others are 3.0 or newer so they have
kfree_rcu() and also use it in __nf_ct_ext_add() since commit 1f8d36a1. 

                                                        Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso Sept. 11, 2013, 5:50 p.m. UTC | #5
On Wed, Sep 11, 2013 at 10:09:47AM -0700, Phil Oester wrote:
> On Wed, Sep 11, 2013 at 05:28:05PM +0200, Michal Kubecek wrote:
> > > Looking at
> > > your proposed fix, the NAT extension data should have been cleaned
> > > from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy)
> > > before reaching the kfree.  Would you agree?
> > 
> > It is cleaned from the list but as it is an RCU list, other readers can
> > still be holding pointers to it. We have to wait for the RCU grace
> > period before we can reuse it.
> 
> Agreed - looks like your fix should work.  However, two nits:
> 
> 1) normally RCU functions have _rcu suffixes.  So nf_ct_ext_free should
> become nf_ct_ext_free_rcu.

That postfix is there if the function requires to be called holding
rcu read lock, not this case. I'll take this patch.

> 2) kfree_rcu was not added to the kernel until 3.0.  All of the bug
> reports I've been looking into (including the original in netfilter bugzilla
> at http://bugzilla.netfilter.org/show_bug.cgi?id=714) have been reported in
> 2.6.32 or earlier kernels.  So a different fix would need to be backported for
> -stable.  For that, we would probably export __nf_ct_ext_free_rcu from
> nf_conntrack_extend.c and change the kfree call in nf_ct_ext_free_rcu to
> call_rcu(&ct->ext->rcu, __nf_ct_ext_free_rcu). Of course the alternative
> is just to use this fix for both old and new kernels for simplicity.

Either way, we need a specific backport for 2.6.x indeed.

Thanks for tracking up this issue.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 11, 2013, 5:54 p.m. UTC | #6
On Wed, 2013-09-11 at 19:42 +0200, Michal Kubecek wrote:

> Yes, the patch submitted here is against current nf branch. For
> SLES 11 SP1 (with 2.6.32 kernel), I'm going to use call_rcu() the way
> original commit 68b80f11 does. 

Well, please just submit a patch for current tree, using kfree_rcu()

When doing backport to stable branches, needed adaptation shall be done.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
index 977bc8a..3313108 100644
--- a/include/net/netfilter/nf_conntrack_extend.h
+++ b/include/net/netfilter/nf_conntrack_extend.h
@@ -80,7 +80,7 @@  static inline void nf_ct_ext_destroy(struct nf_conn *ct)
 static inline void nf_ct_ext_free(struct nf_conn *ct)
 {
 	if (ct->ext)
-		kfree(ct->ext);
+		kfree_rcu(ct->ext, rcu);
 }
 
 /* Add this type, returns pointer to data or NULL. */