Patchwork 3.10.0-rc2 mlx4 not receiving packets for some multicast groups

login
register
mail settings
Submitter Shawn Bohrer
Date May 28, 2013, 8:15 p.m.
Message ID <20130528201508.GA6409@sbohrermbp13-local.rgmadvisors.com>
Download mbox | patch
Permalink /patch/246955/
State RFC
Delegated to: David Miller
Headers show

Comments

Shawn Bohrer - May 28, 2013, 8:15 p.m.
On Sat, May 25, 2013 at 10:13:47AM -0500, Shawn Bohrer wrote:
> On Sat, May 25, 2013 at 06:41:05AM +0300, Or Gerlitz wrote:
> > On Fri, May 24, 2013 at 7:34 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
> > > On Fri, May 24, 2013 at 10:49:31AM -0500, Shawn Bohrer wrote:
> > > > I just started testing the 3.10 kernel, previously we were on 3.4 so
> > > > there is a fairly large jump.  I've additionally applied the following
> > > > four patches to the 3.10.0-rc2 kernel that I'm testing:
> > > >
> > > > https://patchwork.kernel.org/patch/2484651/
> > > > https://patchwork.kernel.org/patch/2484671/
> > > > https://patchwork.kernel.org/patch/2484681/
> > > > https://patchwork.kernel.org/patch/2484641/
> > > >
> > >> I don't know if those patches are related to my issues or not but I
> > >> plan on trying to reproduce without them soon.
> > 
> > > I've reverted the four patches above from my test kernel and still see
> > > the issue so they don't appear to be the cause.
> > 
> > Hi Shawn,
> > 
> > So 3.4 works, 3.10-rc2 breaks? its indeed a fairly large gap, maybe
> > try to bisec that? just to make sure, did use touch any mlx4
> > non-default config? specifically did you turn DMFS (Device Managed
> > Flow Steering) on using the  set the mlx4_core module param of
> > log_num_mgm_entry_size or you were using B0 steering (the default)?
> 
> Initially my goal is to sanity check 3.10 before I start playing with
> the knobs, so I haven't explicitly changed any new mlx4 settings yet.
> We do however set some non-default values but I'm doing that on both
> kernels:
> 
> mlx4_core log_num_vlan=7
> mlx4_en pfctx=0xff pfcrx=0xff

Naturally I was wrong and we set more than the above non-default
values.  We additionally set high_rate_steer=1 on mlx4_core. As
you may know this parameter isn't currently available in the upstream
driver, so I've been carrying the following patch in my 3.4 and 3.10
trees:

---
 drivers/net/ethernet/mellanox/mlx4/main.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
Or Gerlitz - May 29, 2013, 1:55 p.m.
On Tue, May 28, 2013 at 11:15 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
> Naturally I was wrong and we set more than the above non-default
> values.  We additionally set high_rate_steer=1 on mlx4_core. As
> you may know this parameter isn't currently available in the upstream
> driver, so I've been carrying the following patch in my 3.4 and 3.10 trees:

[...]

> I've confirmed that with the above high_rate_steer patch and
> high_rate_steer=1 I receive data on 3.10.0-rc3 and with
> high_rate_steer=0 I only receive data on a small number of multicast
> addresses.  With 3.4 and the same patch I receive data in both cases.

[...]

Shawn, so end-in mind you want the NIC steering mode to be DMFS
(Device Managed Flow Steering) e.g for the processes bypassing the
kernel, correct? since the NIC steering mode is global, you will not
be able to use that non-upstream patch moving forward. So we need to
debug/bisect why without the patch (what you call high_rate_steer=0)
you don't get data on all groups. Can you bisect that on a single
node, e.g set the rest of the environment with 3.4 that works, and on
a given node see what is the commit that breaks that?

Or.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shawn Bohrer - May 30, 2013, 8:31 p.m.
On Wed, May 29, 2013 at 04:55:32PM +0300, Or Gerlitz wrote:
> On Tue, May 28, 2013 at 11:15 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
> > Naturally I was wrong and we set more than the above non-default
> > values.  We additionally set high_rate_steer=1 on mlx4_core. As
> > you may know this parameter isn't currently available in the upstream
> > driver, so I've been carrying the following patch in my 3.4 and 3.10 trees:
> 
> [...]
> 
> > I've confirmed that with the above high_rate_steer patch and
> > high_rate_steer=1 I receive data on 3.10.0-rc3 and with
> > high_rate_steer=0 I only receive data on a small number of multicast
> > addresses.  With 3.4 and the same patch I receive data in both cases.
> 
> [...]
> 
> Shawn, so end-in mind you want the NIC steering mode to be DMFS
> (Device Managed Flow Steering) e.g for the processes bypassing the
> kernel, correct? since the NIC steering mode is global, you will not
> be able to use that non-upstream patch moving forward.

Yes, end goal is to use DMFS.  However, we have some ConnectX-2 cards
which I guess do not support DMFS and naturally I'd like plain old UDP
multicast to continue to work at the same level as 3.4.  So I may
still want that high_rate_steer option upstreamed, but we'll see once
I get 3.10 into better shape.

> So we need to
> debug/bisect why without the patch (what you call high_rate_steer=0)
> you don't get data on all groups. Can you bisect that on a single
> node, e.g set the rest of the environment with 3.4 that works, and on
> a given node see what is the commit that breaks that?

Done. It appears that the patch that breaks receiving packets on many
different multicast groups/sockets is:

commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Mon Apr 15 09:54:25 2013 +0000

    net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api
    
    The current implementation of dev_uc_sync/unsync() assumes that there is
    a strict 1-to-1 relationship between the source and destination of the sync.
    In other words, once an address has been synced to a destination device, it
    will not be synced to any other device through the sync API.
    However, there are some virtual devices that aggreate a number of lower
    devices and need to sync addresses to all of them.  The current
    API falls short there.
    
    This patch introduces a new dev_uc_sync_multiple() api that can be called
    in the above circumstances and allows sync to work for every invocation.
    
    CC: Jiri Pirko <jiri@resnulli.us>
    Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

I've confirmed that reverting this patch on top of 3.10-rc3 allows me
to receive packets on all of my multicast groups without the Mellanox
high_rate_steer option set.

--
Shawn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz - May 30, 2013, 8:42 p.m.
On Thu, May 30, 2013 at 11:31 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
>> So we need to
>> debug/bisect why without the patch (what you call high_rate_steer=0)
>> you don't get data on all groups. Can you bisect that on a single
>> node, e.g set the rest of the environment with 3.4 that works, and on
>> a given node see what is the commit that breaks that?

> Done. It appears that the patch that breaks receiving packets on many
> different multicast groups/sockets is:
>
> commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3
> Author: Vlad Yasevich <vyasevic@redhat.com>
> Date:   Mon Apr 15 09:54:25 2013 +0000
>
>     net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api
>
>     The current implementation of dev_uc_sync/unsync() assumes that there is
>     a strict 1-to-1 relationship between the source and destination of the sync.
>     In other words, once an address has been synced to a destination device, it
>     will not be synced to any other device through the sync API.
>     However, there are some virtual devices that aggreate a number of lower
>     devices and need to sync addresses to all of them.  The current
>     API falls short there.
>
>     This patch introduces a new dev_uc_sync_multiple() api that can be called
>     in the above circumstances and allows sync to work for every invocation.
>
>     CC: Jiri Pirko <jiri@resnulli.us>
>     Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> I've confirmed that reverting this patch on top of 3.10-rc3 allows me
> to receive packets on all of my multicast groups without the Mellanox
> high_rate_steer option set.

OK, impressive debugging... so what do we do from here? Vlad, Shawn
observes a regression once this patch is used on a large scale setup
that uses many multicast groups (you can read the posts done earlier
on this thread), does this rings any bell w.r.t to the actual problem
in the patch?

Or.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich - May 30, 2013, 8:57 p.m.
On 05/30/2013 04:42 PM, Or Gerlitz wrote:
> On Thu, May 30, 2013 at 11:31 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
>>> So we need to
>>> debug/bisect why without the patch (what you call high_rate_steer=0)
>>> you don't get data on all groups. Can you bisect that on a single
>>> node, e.g set the rest of the environment with 3.4 that works, and on
>>> a given node see what is the commit that breaks that?
>
>> Done. It appears that the patch that breaks receiving packets on many
>> different multicast groups/sockets is:
>>
>> commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3
>> Author: Vlad Yasevich <vyasevic@redhat.com>
>> Date:   Mon Apr 15 09:54:25 2013 +0000
>>
>>      net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api
>>
>>      The current implementation of dev_uc_sync/unsync() assumes that there is
>>      a strict 1-to-1 relationship between the source and destination of the sync.
>>      In other words, once an address has been synced to a destination device, it
>>      will not be synced to any other device through the sync API.
>>      However, there are some virtual devices that aggreate a number of lower
>>      devices and need to sync addresses to all of them.  The current
>>      API falls short there.
>>
>>      This patch introduces a new dev_uc_sync_multiple() api that can be called
>>      in the above circumstances and allows sync to work for every invocation.
>>
>>      CC: Jiri Pirko <jiri@resnulli.us>
>>      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> I've confirmed that reverting this patch on top of 3.10-rc3 allows me
>> to receive packets on all of my multicast groups without the Mellanox
>> high_rate_steer option set.
>
> OK, impressive debugging... so what do we do from here? Vlad, Shawn
> observes a regression once this patch is used on a large scale setup
> that uses many multicast groups (you can read the posts done earlier
> on this thread), does this rings any bell w.r.t to the actual problem
> in the patch?

I haven't seen that, but I didn't test with that many multicast groups. 
  I had 20 groups working.

I'll take a look and see what might be going on.

Thanks
-vlad

>
> Or.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh - May 31, 2013, 12:23 a.m.
Vlad Yasevich <vyasevic@redhat.com> wrote:

>>>      CC: Jiri Pirko <jiri@resnulli.us>
>>>      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> I've confirmed that reverting this patch on top of 3.10-rc3 allows me
>>> to receive packets on all of my multicast groups without the Mellanox
>>> high_rate_steer option set.
>>
>> OK, impressive debugging... so what do we do from here? Vlad, Shawn
>> observes a regression once this patch is used on a large scale setup
>> that uses many multicast groups (you can read the posts done earlier
>> on this thread), does this rings any bell w.r.t to the actual problem
>> in the patch?
>
>I haven't seen that, but I didn't test with that many multicast groups. I
>had 20 groups working.
>
>I'll take a look and see what might be going on.

	I've actually been porting bonding to the dev_sync/unsync
system, and have a patch series of 4 fixes to various internals of
dev_sync/unsync; I'll post those under separate cover.  It may be that
one or more of those things are the source of this problem (or I might
have it all wrong).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shawn Bohrer - May 31, 2013, 3:17 p.m.
On Thu, May 30, 2013 at 05:23:20PM -0700, Jay Vosburgh wrote:
> Vlad Yasevich <vyasevic@redhat.com> wrote:
> 
> >>>      CC: Jiri Pirko <jiri@resnulli.us>
> >>>      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> >>>      Signed-off-by: David S. Miller <davem@davemloft.net>
> >>>
> >>> I've confirmed that reverting this patch on top of 3.10-rc3 allows me
> >>> to receive packets on all of my multicast groups without the Mellanox
> >>> high_rate_steer option set.
> >>
> >> OK, impressive debugging... so what do we do from here? Vlad, Shawn
> >> observes a regression once this patch is used on a large scale setup
> >> that uses many multicast groups (you can read the posts done earlier
> >> on this thread), does this rings any bell w.r.t to the actual problem
> >> in the patch?
> >
> >I haven't seen that, but I didn't test with that many multicast groups. I
> >had 20 groups working.
> >
> >I'll take a look and see what might be going on.
> 
> 	I've actually been porting bonding to the dev_sync/unsync
> system, and have a patch series of 4 fixes to various internals of
> dev_sync/unsync; I'll post those under separate cover.  It may be that
> one or more of those things are the source of this problem (or I might
> have it all wrong).

Thanks Jay,

I've tested your 4 patches on top of Linus' tree and they do solve the
multicast issue I was seeing in this thread.

--
Shawn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 0d32a82..7808e4a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -71,6 +71,11 @@  static int msi_x = 1;
 module_param(msi_x, int, 0444);
 MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");
 
+static int high_rate_steer;
+module_param(high_rate_steer, int, 0444);
+MODULE_PARM_DESC(high_rate_steer, "Enable steering mode for higher packet rate"
+                                  " (default off)");
+
 #else /* CONFIG_PCI_MSI */
 
 #define msi_x (0)
@@ -288,6 +293,11 @@  static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	if (mlx4_is_mfunc(dev))
 		dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
 
+	if (high_rate_steer && !mlx4_is_mfunc(dev)) {
+		dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_VEP_UC_STEER;
+		dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_VEP_MC_STEER;
+	}
+
 	dev->caps.log_num_macs  = log_num_mac;
 	dev->caps.log_num_vlans = MLX4_LOG_NUM_VLANS;
 	dev->caps.log_num_prios = use_prio ? 3 : 0;