diff mbox

Multi-thread udp 4.7 regression, bisected to 71d8c47fc653

Message ID 20160627153820.GB10613@breakpoint.cc
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Florian Westphal June 27, 2016, 3:38 p.m. UTC
Marc Dionne <marc.c.dionne@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
> > Marc Dionne <marc.c.dionne@gmail.com> wrote:
> >> Hi,

> >         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
> >                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
> > -                                   zone, net))
> > -                       goto out;
> > +                                   zone, net)) {
> > +                       nf_ct_add_to_dying_list(ct);
> > +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
> > +                       goto dying;
> > +               }

This is bogus as h can be a reply too (key compare does not deal
with it).

Below is what I actually intended; I can't come up with a reason why
you experience this issue other than that we're getting confused over
reply/original direction.

If the patch doesn't help either, can you tell us what kind of iptables
rules are installed on the affected system or perhaps report perf drop
monitor stat when things go wrong?

Thanks!

Comments

Marc Dionne June 27, 2016, 5:21 p.m. UTC | #1
On Mon, Jun 27, 2016 at 12:38 PM, Florian Westphal <fw@strlen.de> wrote:
> Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
>> > Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> >> Hi,
>
>> >         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
>> >                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
>> > -                                   zone, net))
>> > -                       goto out;
>> > +                                   zone, net)) {
>> > +                       nf_ct_add_to_dying_list(ct);
>> > +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
>> > +                       goto dying;
>> > +               }
>
> This is bogus as h can be a reply too (key compare does not deal
> with it).
>
> Below is what I actually intended; I can't come up with a reason why
> you experience this issue other than that we're getting confused over
> reply/original direction.
>
> If the patch doesn't help either, can you tell us what kind of iptables
> rules are installed on the affected system or perhaps report perf drop
> monitor stat when things go wrong?
>
> Thanks!

The additional patch didn't help either.

I had a lot of iptables bloat, but I reverted to old simple iptables
and ip6tables configs (attached), and still see the problem.  Note
that the test normally uses ipv6, but the behaviour is the same with
ipv4.

Marc
Marc Dionne July 4, 2016, 12:35 p.m. UTC | #2
On Mon, Jun 27, 2016 at 2:21 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 12:38 PM, Florian Westphal <fw@strlen.de> wrote:
>> Marc Dionne <marc.c.dionne@gmail.com> wrote:
>>> On Mon, Jun 27, 2016 at 11:22 AM, Florian Westphal <fw@strlen.de> wrote:
>>> > Marc Dionne <marc.c.dionne@gmail.com> wrote:
>>> >> Hi,
>>
>>> >         hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
>>> >                 if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
>>> > -                                   zone, net))
>>> > -                       goto out;
>>> > +                                   zone, net)) {
>>> > +                       nf_ct_add_to_dying_list(ct);
>>> > +                       ret = nf_ct_resolve_clash(net, skb, ctinfo, h);
>>> > +                       goto dying;
>>> > +               }
>>
>> This is bogus as h can be a reply too (key compare does not deal
>> with it).
>>
>> Below is what I actually intended; I can't come up with a reason why
>> you experience this issue other than that we're getting confused over
>> reply/original direction.
>>
>> If the patch doesn't help either, can you tell us what kind of iptables
>> rules are installed on the affected system or perhaps report perf drop
>> monitor stat when things go wrong?
>>
>> Thanks!
>
> The additional patch didn't help either.
>
> I had a lot of iptables bloat, but I reverted to old simple iptables
> and ip6tables configs (attached), and still see the problem.  Note
> that the test normally uses ipv6, but the behaviour is the same with
> ipv4.
>
> Marc

Hi,

Any other ideas for this issue?  I would hate to see the 4.7 release
go out without a fix or revert, and have a kernel that we have to tell
our clients not to use.

If there is no quick fix, seems like a revert should be considered:
- Looks to me like the commit attempts to fix a long standing bug
(exists at least as far back as 3.5,
https://bugzilla.kernel.org/show_bug.cgi?id=52991)
- The above bug has a simple workaround (at least for us) that we
implemented more than 3 years ago
- The commit reverts cleanly, restoring the original behaviour
- From that bug report, bind was one of the affected applications; I
would suspect that this regression is likely to affect bind as well

I'd be more than happy to test suggested fixes or give feedback with
debugging patches, etc.

Thanks,
Marc
diff mbox

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -638,6 +638,12 @@  static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 	struct nf_conntrack_l4proto *l4proto;
 
+	/* skb being confirmed is always original dir; do not attach to
+	 * a reply tuple.
+	 */
+	if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
+		goto out;
+
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	if (l4proto->allow_clash &&
 	    !nf_ct_is_dying(ct) &&
@@ -650,6 +656,7 @@  static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 		skb->nfct = &ct->ct_general;
 		return NF_ACCEPT;
 	}
+ out:
 	NF_CT_STAT_INC(net, drop);
 	return NF_DROP;
 }