From patchwork Tue Jul 5 12:28:03 2016
X-Patchwork-Submitter: Pablo Neira Ayuso
X-Patchwork-Id: 644722
X-Patchwork-Delegate: davem@davemloft.net
Date: Tue, 5 Jul 2016 14:28:03 +0200
From: Pablo Neira Ayuso
To: Marc Dionne
Cc: Florian Westphal, netdev, regressions@leemhuis.info,
	netfilter-devel@vger.kernel.org
Subject: Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653
Message-ID: <20160705122803.GA26862@salvia>
References: <20160627142238.GA10613@breakpoint.cc>
	<20160627153820.GB10613@breakpoint.cc>
X-Mailing-List: netdev@vger.kernel.org

Hi,

On Mon, Jul 04, 2016 at 09:35:28AM -0300, Marc Dionne wrote:
> If there is no quick fix, seems like a revert should be considered:
> - Looks to me like the commit attempts to fix a long standing bug
> (exists at least as far back as 3.5,
> https://bugzilla.kernel.org/show_bug.cgi?id=52991)
> - The above bug has a simple workaround (at least for us) that we
> implemented more than 3 years ago

I guess the workaround consists of using a rule to NOTRACK this
traffic. Or is there any custom patch that you've used on your side to
resolve this?
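
Something like this in the raw table is what I have in mind (just an
example, the udp dport 5000 match here is a placeholder for whatever
traffic your application generates):

# iptables -t raw -A PREROUTING -p udp --dport 5000 -j NOTRACK
# iptables -t raw -A OUTPUT -p udp --dport 5000 -j NOTRACK
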
> - The commit reverts cleanly, restoring the original behaviour
> - From that bug report, bind was one of the affected applications; I
> would suspect that this regression is likely to affect bind as well
>
> I'd be more than happy to test suggested fixes or give feedback with
> debugging patches, etc.

Could you monitor

# conntrack -S

or alternatively (if the conntrack utility is not available on your
system):

# cat /proc/net/stat/nf_conntrack

? Please watch for the insert_failed and drop statistics.

Are you observing any splat or just large packet drops? Could you
compile your kernel with lockdep enabled and retest?

Is there any chance I can get your test file that generates the UDP
client threads, so I can try to reproduce this here?

I'm also attaching a patch that drops the old ct that lost the race
outside of the hashtable locks, to avoid releasing the ct object while
holding the locks, although I couldn't come up with any interaction so
far that triggers the condition that you're observing.

Thanks.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 62c42e9..98a71f1 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -638,7 +638,8 @@ static void nf_ct_acct_merge(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 /* Resolve race on insertion if this protocol allows this. */
 static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 			       enum ip_conntrack_info ctinfo,
-			       struct nf_conntrack_tuple_hash *h)
+			       struct nf_conntrack_tuple_hash *h,
+			       struct nf_conn **old_ct)
 {
 	/* This is the conntrack entry already in hashes that won race. */
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
@@ -649,7 +650,7 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 	    !nf_ct_is_dying(ct) &&
 	    atomic_inc_not_zero(&ct->ct_general.use)) {
 		nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
-		nf_conntrack_put(skb->nfct);
+		*old_ct = (struct nf_conn *)skb->nfct;
 		/* Assign conntrack already in hashes to this skbuff. Don't
 		 * modify skb->nfctinfo to ensure consistent stateful filtering.
 		 */
@@ -667,7 +668,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	const struct nf_conntrack_zone *zone;
 	unsigned int hash, reply_hash;
 	struct nf_conntrack_tuple_hash *h;
-	struct nf_conn *ct;
+	struct nf_conn *ct, *old_ct = NULL;
 	struct nf_conn_help *help;
 	struct nf_conn_tstamp *tstamp;
 	struct hlist_nulls_node *n;
@@ -771,11 +772,14 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 
 out:
 	nf_ct_add_to_dying_list(ct);
-	ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
+	ret = nf_ct_resolve_clash(net, skb, ctinfo, h, &old_ct);
 dying:
 	nf_conntrack_double_unlock(hash, reply_hash);
 	NF_CT_STAT_INC(net, insert_failed);
 	local_bh_enable();
+	if (old_ct)
+		nf_ct_put(old_ct);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__nf_conntrack_confirm);
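
In case it is useful as a starting point until you can share your test
file, below is a rough, untested sketch of the kind of multi-threaded
UDP sender I would try here. The destination address and port are
placeholders, as are the thread and iteration counts. The idea is that
every iteration opens a fresh socket (hence a brand new conntrack
tuple) and several threads each send one datagram through it at the
same time, so the first packets of the flow race to confirm their
conntrack entry:

/* Rough, untested sketch: build with gcc -pthread. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define NTHREADS	8		/* placeholder */
#define NITER		10000		/* placeholder */
#define DST_ADDR	"192.0.2.1"	/* placeholder */
#define DST_PORT	5000		/* placeholder */

static int sock;
static pthread_barrier_t barrier;

static void *sender(void *arg)
{
	char payload = 'x';

	/* Wait until all threads are ready, then send "at once" so that
	 * several packets of the same tuple are in flight before the
	 * conntrack entry is confirmed.
	 */
	pthread_barrier_wait(&barrier);
	send(sock, &payload, sizeof(payload), 0);
	return NULL;
}

int main(void)
{
	struct sockaddr_in dst = {
		.sin_family	= AF_INET,
		.sin_port	= htons(DST_PORT),
	};
	pthread_t tid[NTHREADS];
	int i, iter;

	inet_pton(AF_INET, DST_ADDR, &dst.sin_addr);

	for (iter = 0; iter < NITER; iter++) {
		/* New socket, new source port, new conntrack entry. */
		sock = socket(AF_INET, SOCK_DGRAM, 0);
		if (sock < 0) {
			perror("socket");
			return 1;
		}
		if (connect(sock, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
			perror("connect");
			return 1;
		}

		pthread_barrier_init(&barrier, NULL, NTHREADS);
		for (i = 0; i < NTHREADS; i++)
			pthread_create(&tid[i], NULL, sender, NULL);
		for (i = 0; i < NTHREADS; i++)
			pthread_join(tid[i], NULL);
		pthread_barrier_destroy(&barrier);

		close(sock);
	}
	return 0;
}

While something like this runs, the insert_failed and drop counters
from conntrack -S should tell us whether the clash path is being hit.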