Message ID | 4A38FC5A.70500@trash.net |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Patrick McHardy a écrit : > Patrick McHardy wrote: >> Eric Dumazet wrote: >>> Patrick McHardy a écrit : >>>> No, before it is confirmed, its only visible to the CPU handling >>>> the initial packet of a connection. Confirmation is the step that >>>> makes it visible to other CPUs. >>> >>> Thanks Patrick, I missed this, and your patch seems fine now :) >> >> Thanks for your help, I'll send it to Dave later today. > > I'm having some trouble figuring out the exact events that would > lead to the timer base corruption. Ingo, could you please test > this patch to make sure it also fixes the problem? > > ;) Event can be described as following : CPU1 CPU2 /* __nf_conntrack_confirm() */ __nf_conntrack_hash_insert(ct, hash, repl_hash); // now 'ct' is visible by other cpus // search conntrack and find ct // timeout.expires becomes absolute here ct->timeout.expires += jiffies; add_timer(&ct->timeout); /* __nf_ct_refresh_acct() */ if (!nf_ct_is_confirmed(ct)) { // we *believe* timeout.expires // is not yet in use by timer code // and is still a relative quantity. // We want to 'update' it but we should not ! ct->timeout.expires = extra_jiffies; << CORRUPTION >> } else { // too late :( set_bit(IPS_CONFIRMED_BIT, &ct->status); This is how I understood the problem, but I may be wrong ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Patrick McHardy a écrit : >> I'm having some trouble figuring out the exact events that would >> lead to the timer base corruption. Ingo, could you please test >> this patch to make sure it also fixes the problem? > > ;) > > Event can be described as following : > > CPU1 CPU2 > > /* __nf_conntrack_confirm() */ > __nf_conntrack_hash_insert(ct, hash, repl_hash); > // now 'ct' is visible by other cpus > // search conntrack and find ct > // timeout.expires becomes absolute here > ct->timeout.expires += jiffies; > add_timer(&ct->timeout); > > /* __nf_ct_refresh_acct() */ > if (!nf_ct_is_confirmed(ct)) { > // we *believe* timeout.expires > // is not yet in use by timer code > // and is still a relative quantity. > // We want to 'update' it but we should not ! > ct->timeout.expires = extra_jiffies; << CORRUPTION >> > } else { > // too late :( > set_bit(IPS_CONFIRMED_BIT, &ct->status); > > This is how I understood the problem, but I may be wrong ? Thats one case that can happen, but that wouldn't corrupt the timer base AFAICS. Also the callpath shows that it actually went into the mod_timer_pending() path *and* timer_pending() was true. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 5f72b94..9b20e58 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -425,7 +425,6 @@ __nf_conntrack_confirm(struct sk_buff *skb) /* Remove from unconfirmed list */ hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); - __nf_conntrack_hash_insert(ct, hash, repl_hash); /* Timer relative to confirmation time, not original setting time, otherwise we'd get timer wrap in weird delay cases. */ @@ -433,8 +432,16 @@ __nf_conntrack_confirm(struct sk_buff *skb) add_timer(&ct->timeout); atomic_inc(&ct->ct_general.use); set_bit(IPS_CONFIRMED_BIT, &ct->status); + + /* Since the lookup is lockless, hash insertion must be done after + * starting the timer and setting the CONFIRMED bit. The RCU barriers + * guarantee that no other CPU can find the conntrack before the above + * stores are visible. + */ + __nf_conntrack_hash_insert(ct, hash, repl_hash); NF_CT_STAT_INC(net, insert); spin_unlock_bh(&nf_conntrack_lock); + help = nfct_help(ct); if (help && help->helper) nf_conntrack_event_cache(IPCT_HELPER, ct);