Message ID | 4A38D9BE.3020403@gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet wrote: > Patrick McHardy a écrit : >> Before the conntrack is confirmed, it is exclusively handled by a >> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT >> is visible before we add the conntrack to the hash table since the >> lookup is lockless, but simply moving the set_bit before the hash >> insertion should be fine I think. >> > > Hmm... now we could have the reverse case : > > __nf_conntrack_confirm() could be "interrupted" by __nf_ct_refresh_acct() > > index 5f72b94..22755fa 100644 > --- a/net/netfilter/nf_conntrack_core.c > +++ b/net/netfilter/nf_conntrack_core.c > @@ -425,6 +425,7 @@ __nf_conntrack_confirm(struct sk_buff *skb) > /* Remove from unconfirmed list */ > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); > > + set_bit(IPS_CONFIRMED_BIT, &ct->status); > __nf_conntrack_hash_insert(ct, hash, repl_hash); > /* Timer relative to confirmation time, not original > setting time, otherwise we'd get timer wrap in > @@ -432,7 +433,6 @@ __nf_conntrack_confirm(struct sk_buff *skb) > ct->timeout.expires += jiffies; > > << What happens if another packet is handled by __nf_ct_refresh_acct here >> > (seeing or not the IPS_CONFIRMED_BIT) >> > > add_timer(&ct->timeout); > > << or here ? >> > > > atomic_inc(&ct->ct_general.use); > - set_bit(IPS_CONFIRMED_BIT, &ct->status); > NF_CT_STAT_INC(net, insert); > spin_unlock_bh(&nf_conntrack_lock); > help = nfct_help(ct); > > Problem is timeout.expires is either a relative or absolute timeout, and changes happen > in __nf_conntrack_confirm() or __nf_ct_refresh_acct(). > > We must have a synchronization (an barriers), a single bit wont be enough. Please have a look at the second patch I just sent. It relies on the RCU barriers to make sure all stores are visible before other CPUs can find the conntrack. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy a écrit : > Eric Dumazet wrote: >> Patrick McHardy a écrit : >>> Before the conntrack is confirmed, it is exclusively handled by a >>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT >>> is visible before we add the conntrack to the hash table since the >>> lookup is lockless, but simply moving the set_bit before the hash >>> insertion should be fine I think. >>> >> >> >> Problem is timeout.expires is either a relative or absolute timeout, >> and changes happen >> in __nf_conntrack_confirm() or __nf_ct_refresh_acct(). >> >> We must have a synchronization (an barriers), a single bit wont be >> enough. > > Please have a look at the second patch I just sent. It relies > on the RCU barriers to make sure all stores are visible before > other CPUs can find the conntrack. > Sorry, I dont understand how your second patch corrects the problem. This (unconfirmed) conntrack is visible by another cpu. This other cpu can call __nf_ct_refresh_acct() while this cpu runs in __nf_conntrack_confirm() @@ -425,7 +425,6 @@ __nf_conntrack_confirm(struct sk_buff *skb) /* Remove from unconfirmed list */ hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); - __nf_conntrack_hash_insert(ct, hash, repl_hash); /* Timer relative to confirmation time, not original setting time, otherwise we'd get timer wrap in weird delay cases. */ @@ -433,8 +432,15 @@ __nf_conntrack_confirm(struct sk_buff *skb) add_timer(&ct->timeout); <<<< another cpu could here change timeout.expires (thinking its still relative) >>>> atomic_inc(&ct->ct_general.use); set_bit(IPS_CONFIRMED_BIT, &ct->status); + + /* Since the lookup is lockless, hash insertion must be after starting the + * timer and setting the CONFIRMED bit. The RCU barriers guarantee that no + * other CPU can find the conntrack before the above stores are visible. + */ + __nf_conntrack_hash_insert(ct, hash, repl_hash); NF_CT_STAT_INC(net, insert); spin_unlock_bh(&nf_conntrack_lock); help = nfct_help(ct); if (help && help->helper) nf_conntrack_event_cache(IPCT_HELPER, ct); -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Patrick McHardy a écrit : >> Eric Dumazet wrote: >>> Patrick McHardy a écrit : >>>> Before the conntrack is confirmed, it is exclusively handled by a >>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT >>>> is visible before we add the conntrack to the hash table since the >>>> lookup is lockless, but simply moving the set_bit before the hash >>>> insertion should be fine I think. >>>> >>> >>> Problem is timeout.expires is either a relative or absolute timeout, >>> and changes happen >>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct(). >>> >>> We must have a synchronization (an barriers), a single bit wont be >>> enough. >> Please have a look at the second patch I just sent. It relies >> on the RCU barriers to make sure all stores are visible before >> other CPUs can find the conntrack. >> > > Sorry, I dont understand how your second patch corrects the problem. > > This (unconfirmed) conntrack is visible by another cpu. No, before it is confirmed, its only visible to the CPU handling the initial packet of a connection. Confirmation is the step that makes it visible to other CPUs. > This other > cpu can call __nf_ct_refresh_acct() while this cpu runs > in __nf_conntrack_confirm() Not for the same conntrack, that would be a seperate bug. Does that explain what I'm trying to do? :) > > @@ -425,7 +425,6 @@ __nf_conntrack_confirm(struct sk_buff *skb) > /* Remove from unconfirmed list */ > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); > > - __nf_conntrack_hash_insert(ct, hash, repl_hash); > /* Timer relative to confirmation time, not original > setting time, otherwise we'd get timer wrap in > weird delay cases. */ > @@ -433,8 +432,15 @@ __nf_conntrack_confirm(struct sk_buff *skb) > add_timer(&ct->timeout); > > <<<< another cpu could here change timeout.expires (thinking its still relative) >>>> > > atomic_inc(&ct->ct_general.use); > set_bit(IPS_CONFIRMED_BIT, &ct->status); > + > + /* Since the lookup is lockless, hash insertion must be after starting the > + * timer and setting the CONFIRMED bit. The RCU barriers guarantee that no > + * other CPU can find the conntrack before the above stores are visible. > + */ > + __nf_conntrack_hash_insert(ct, hash, repl_hash); > NF_CT_STAT_INC(net, insert); > spin_unlock_bh(&nf_conntrack_lock); > help = nfct_help(ct); > if (help && help->helper) > nf_conntrack_event_cache(IPCT_HELPER, ct); > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy a écrit : > Eric Dumazet wrote: >> Patrick McHardy a écrit : >>> Eric Dumazet wrote: >>>> Patrick McHardy a écrit : >>>>> Before the conntrack is confirmed, it is exclusively handled by a >>>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT >>>>> is visible before we add the conntrack to the hash table since the >>>>> lookup is lockless, but simply moving the set_bit before the hash >>>>> insertion should be fine I think. >>>>> >>>> >>>> Problem is timeout.expires is either a relative or absolute timeout, >>>> and changes happen >>>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct(). >>>> >>>> We must have a synchronization (an barriers), a single bit wont be >>>> enough. >>> Please have a look at the second patch I just sent. It relies >>> on the RCU barriers to make sure all stores are visible before >>> other CPUs can find the conntrack. >>> >> >> Sorry, I dont understand how your second patch corrects the problem. >> >> This (unconfirmed) conntrack is visible by another cpu. > > No, before it is confirmed, its only visible to the CPU handling > the initial packet of a connection. Confirmation is the step that > makes it visible to other CPUs. Thanks Patrick, I missed this, and your patch seems fine now :) > >> This other >> cpu can call __nf_ct_refresh_acct() while this cpu runs >> in __nf_conntrack_confirm() > > Not for the same conntrack, that would be a seperate bug. > > Does that explain what I'm trying to do? :) Yes sure, thanks again. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Patrick McHardy a écrit : >> Eric Dumazet wrote: >>> Patrick McHardy a écrit : >>>> Eric Dumazet wrote: >>>>> Patrick McHardy a écrit : >>>>>> Before the conntrack is confirmed, it is exclusively handled by a >>>>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT >>>>>> is visible before we add the conntrack to the hash table since the >>>>>> lookup is lockless, but simply moving the set_bit before the hash >>>>>> insertion should be fine I think. >>>>>> >>>>> Problem is timeout.expires is either a relative or absolute timeout, >>>>> and changes happen >>>>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct(). >>>>> >>>>> We must have a synchronization (an barriers), a single bit wont be >>>>> enough. >>>> Please have a look at the second patch I just sent. It relies >>>> on the RCU barriers to make sure all stores are visible before >>>> other CPUs can find the conntrack. >>>> >>> Sorry, I dont understand how your second patch corrects the problem. >>> >>> This (unconfirmed) conntrack is visible by another cpu. >> No, before it is confirmed, its only visible to the CPU handling >> the initial packet of a connection. Confirmation is the step that >> makes it visible to other CPUs. > > Thanks Patrick, I missed this, and your patch seems fine now :) Thanks for your help, I'll send it to Dave later today. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -425,6 +425,7 @@ __nf_conntrack_confirm(struct sk_buff *skb) /* Remove from unconfirmed list */ hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); + set_bit(IPS_CONFIRMED_BIT, &ct->status); __nf_conntrack_hash_insert(ct, hash, repl_hash); /* Timer relative to confirmation time, not original setting time, otherwise we'd get timer wrap in @@ -432,7 +433,6 @@ __nf_conntrack_confirm(struct sk_buff *skb) ct->timeout.expires += jiffies; << What happens if another packet is handled by __nf_ct_refresh_acct here >> (seeing or not the IPS_CONFIRMED_BIT) >> add_timer(&ct->timeout);