Message ID | 20130911090900.0129AE8AFF@unicorn.suse.cz |
---|---|
State | Accepted |
Headers | show |
On Wed, Sep 11, 2013 at 10:17:27AM +0200, Michal Kubecek wrote: > Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced > RCU protection for freeing extension data when reallocation > moves them to a new location. We need the same protection when > freeing them in nf_ct_ext_free() in order to prevent a > use-after-free by other threads referencing a NAT extension data > via bysource list. Hi Michal - coincidentally I've been looking into this area this week due to another bug report (https://bugzilla.kernel.org/show_bug.cgi?id=60853). Looking at your proposed fix, the NAT extension data should have been cleaned from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy) before reaching the kfree. Would you agree? The reporter of #60853 suggested adding a synchronize_rcu to the end of the nf_nat_cleanup_conntrack function, which seems sane. I have been trying to reproduce the crash to test that theory. Are you able to reproduce an OOPS in your testing? Or is there a bug report you are working from? Phil -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 11, 2013 at 07:57:15AM -0700, Phil Oester wrote: > On Wed, Sep 11, 2013 at 10:17:27AM +0200, Michal Kubecek wrote: > > Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced > > RCU protection for freeing extension data when reallocation > > moves them to a new location. We need the same protection when > > freeing them in nf_ct_ext_free() in order to prevent a > > use-after-free by other threads referencing a NAT extension data > > via bysource list. > > Hi Michal - > > coincidentally I've been looking into this area this week due to another > bug report (https://bugzilla.kernel.org/show_bug.cgi?id=60853). Looking at the initial command, I would say this bug report is actually of the same origin as ours. > Looking at > your proposed fix, the NAT extension data should have been cleaned > from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy) > before reaching the kfree. Would you agree? It is cleaned from the list but as it is an RCU list, other readers can still be holding pointers to it. We have to wait for the RCU grace period before we can reuse it. > The reporter of #60853 suggested adding a synchronize_rcu to the end of the > nf_nat_cleanup_conntrack function, which seems sane. That was also my first idea. However, nf_nat_cleanup_conntrack() is called from __nf_ct_ext_destroy() inside a rcu_read_lock() / rcu_read_unlock() block. Even if this block is for a different RCU list, we still cannot call synchronize_rcu() while inside it. We could call synchronize_rcu() in __nf_ct_ext_destroy() after rcu_read_unlock() but this would IMHO add an unnecessary delay so it is more efficient and more appropriate to wait before the actual kfree() which is the operation that needs to wait for RCU grace period. > I have been trying to reproduce the crash to test that theory. > Are you able to reproduce an OOPS in your testing? Or is there a bug > report you are working from? No, it is a bugreport from our customer. And even that customer encountered it only once so far. Which is not very surprising as to reproduce it, you have to be (un)lucky twice: first to have someone overwrite the area soon enough and second to have someone access the area after it is overwritten. This is IMHO the reason why the reporter cleared the block with memset() for testing purposes. Michal Kubecek -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 11, 2013 at 05:28:05PM +0200, Michal Kubecek wrote: > > Looking at > > your proposed fix, the NAT extension data should have been cleaned > > from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy) > > before reaching the kfree. Would you agree? > > It is cleaned from the list but as it is an RCU list, other readers can > still be holding pointers to it. We have to wait for the RCU grace > period before we can reuse it. Agreed - looks like your fix should work. However, two nits: 1) normally RCU functions have _rcu suffixes. So nf_ct_ext_free should become nf_ct_ext_free_rcu. 2) kfree_rcu was not added to the kernel until 3.0. All of the bug reports I've been looking into (including the original in netfilter bugzilla at http://bugzilla.netfilter.org/show_bug.cgi?id=714) have been reported in 2.6.32 or earlier kernels. So a different fix would need to be backported for -stable. For that, we would probably export __nf_ct_ext_free_rcu from nf_conntrack_extend.c and change the kfree call in nf_ct_ext_free_rcu to call_rcu(&ct->ext->rcu, __nf_ct_ext_free_rcu). Of course the alternative is just to use this fix for both old and new kernels for simplicity. > No, it is a bugreport from our customer. And even that customer > encountered it only once so far. Which is not very surprising as to > reproduce it, you have to be (un)lucky twice: first to have someone > overwrite the area soon enough and second to have someone access the > area after it is overwritten. Yes, hitting this seems dependent upon phase of the moon. Phil -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 11, 2013 at 10:09:47AM -0700, Phil Oester wrote: > > 1) normally RCU functions have _rcu suffixes. So nf_ct_ext_free should > become nf_ct_ext_free_rcu. Right. I'll post updated version with the rename tomorrow. > 2) kfree_rcu was not added to the kernel until 3.0. All of the bug > reports I've been looking into (including the original in netfilter bugzilla > at http://bugzilla.netfilter.org/show_bug.cgi?id=714) have been reported in > 2.6.32 or earlier kernels. So a different fix would need to be backported for > -stable. For that, we would probably export __nf_ct_ext_free_rcu from > nf_conntrack_extend.c and change the kfree call in nf_ct_ext_free_rcu to > call_rcu(&ct->ext->rcu, __nf_ct_ext_free_rcu). Yes, the patch submitted here is against current nf branch. For SLES 11 SP1 (with 2.6.32 kernel), I'm going to use call_rcu() the way original commit 68b80f11 does. IIRC the only pre-3.0 stable branch still maintained is 2.6.32, all others are 3.0 or newer so they have kfree_rcu() and also use it in __nf_ct_ext_add() since commit 1f8d36a1. Michal Kubecek -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Sep 11, 2013 at 10:09:47AM -0700, Phil Oester wrote: > On Wed, Sep 11, 2013 at 05:28:05PM +0200, Michal Kubecek wrote: > > > Looking at > > > your proposed fix, the NAT extension data should have been cleaned > > > from the bysource list in nf_nat_cleanup_conntrack (via __nf_ct_ext_destroy) > > > before reaching the kfree. Would you agree? > > > > It is cleaned from the list but as it is an RCU list, other readers can > > still be holding pointers to it. We have to wait for the RCU grace > > period before we can reuse it. > > Agreed - looks like your fix should work. However, two nits: > > 1) normally RCU functions have _rcu suffixes. So nf_ct_ext_free should > become nf_ct_ext_free_rcu. That postfix is there if the function requires to be called holding rcu read lock, not this case. I'll take this patch. > 2) kfree_rcu was not added to the kernel until 3.0. All of the bug > reports I've been looking into (including the original in netfilter bugzilla > at http://bugzilla.netfilter.org/show_bug.cgi?id=714) have been reported in > 2.6.32 or earlier kernels. So a different fix would need to be backported for > -stable. For that, we would probably export __nf_ct_ext_free_rcu from > nf_conntrack_extend.c and change the kfree call in nf_ct_ext_free_rcu to > call_rcu(&ct->ext->rcu, __nf_ct_ext_free_rcu). Of course the alternative > is just to use this fix for both old and new kernels for simplicity. Either way, we need a specific backport for 2.6.x indeed. Thanks for tracking up this issue. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-09-11 at 19:42 +0200, Michal Kubecek wrote: > Yes, the patch submitted here is against current nf branch. For > SLES 11 SP1 (with 2.6.32 kernel), I'm going to use call_rcu() the way > original commit 68b80f11 does. Well, please just submit a patch for current tree, using kfree_rcu() When doing backport to stable branches, needed adaptation shall be done. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h index 977bc8a..3313108 100644 --- a/include/net/netfilter/nf_conntrack_extend.h +++ b/include/net/netfilter/nf_conntrack_extend.h @@ -80,7 +80,7 @@ static inline void nf_ct_ext_destroy(struct nf_conn *ct) static inline void nf_ct_ext_free(struct nf_conn *ct) { if (ct->ext) - kfree(ct->ext); + kfree_rcu(ct->ext, rcu); } /* Add this type, returns pointer to data or NULL. */
Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced RCU protection for freeing extension data when reallocation moves them to a new location. We need the same protection when freeing them in nf_ct_ext_free() in order to prevent a use-after-free by other threads referencing a NAT extension data via bysource list. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> --- include/net/netfilter/nf_conntrack_extend.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)