Message ID | 20130528201508.GA6409@sbohrermbp13-local.rgmadvisors.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, May 28, 2013 at 11:15 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote: > Naturally I was wrong and we set more than the above non-default > values. We additionally set high_rate_steer=1 on mlx4_core. As > you may know this parameter isn't currently available in the upstream > driver, so I've been carrying the following patch in my 3.4 and 3.10 trees: [...] > I've confirmed that with the above high_rate_steer patch and > high_rate_steer=1 I receive data on 3.10.0-rc3 and with > high_rate_steer=0 I only receive data on a small number of multicast > addresses. With 3.4 and the same patch I receive data in both cases. [...] Shawn, so end-in mind you want the NIC steering mode to be DMFS (Device Managed Flow Steering) e.g for the processes bypassing the kernel, correct? since the NIC steering mode is global, you will not be able to use that non-upstream patch moving forward. So we need to debug/bisect why without the patch (what you call high_rate_steer=0) you don't get data on all groups. Can you bisect that on a single node, e.g set the rest of the environment with 3.4 that works, and on a given node see what is the commit that breaks that? Or. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 29, 2013 at 04:55:32PM +0300, Or Gerlitz wrote: > On Tue, May 28, 2013 at 11:15 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote: > > Naturally I was wrong and we set more than the above non-default > > values. We additionally set high_rate_steer=1 on mlx4_core. As > > you may know this parameter isn't currently available in the upstream > > driver, so I've been carrying the following patch in my 3.4 and 3.10 trees: > > [...] > > > I've confirmed that with the above high_rate_steer patch and > > high_rate_steer=1 I receive data on 3.10.0-rc3 and with > > high_rate_steer=0 I only receive data on a small number of multicast > > addresses. With 3.4 and the same patch I receive data in both cases. > > [...] > > Shawn, so end-in mind you want the NIC steering mode to be DMFS > (Device Managed Flow Steering) e.g for the processes bypassing the > kernel, correct? since the NIC steering mode is global, you will not > be able to use that non-upstream patch moving forward. Yes, end goal is to use DMFS. However, we have some ConnectX-2 cards which I guess do not support DMFS and naturally I'd like plain old UDP multicast to continue to work at the same level as 3.4. So I may still want that high_rate_steer option upstreamed, but we'll see once I get 3.10 into better shape. > So we need to > debug/bisect why without the patch (what you call high_rate_steer=0) > you don't get data on all groups. Can you bisect that on a single > node, e.g set the rest of the environment with 3.4 that works, and on > a given node see what is the commit that breaks that? Done. It appears that the patch that breaks receiving packets on many different multicast groups/sockets is: commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3 Author: Vlad Yasevich <vyasevic@redhat.com> Date: Mon Apr 15 09:54:25 2013 +0000 net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api The current implementation of dev_uc_sync/unsync() assumes that there is a strict 1-to-1 relationship between the source and destination of the sync. In other words, once an address has been synced to a destination device, it will not be synced to any other device through the sync API. However, there are some virtual devices that aggreate a number of lower devices and need to sync addresses to all of them. The current API falls short there. This patch introduces a new dev_uc_sync_multiple() api that can be called in the above circumstances and allows sync to work for every invocation. CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> I've confirmed that reverting this patch on top of 3.10-rc3 allows me to receive packets on all of my multicast groups without the Mellanox high_rate_steer option set. -- Shawn -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 11:31 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote: >> So we need to >> debug/bisect why without the patch (what you call high_rate_steer=0) >> you don't get data on all groups. Can you bisect that on a single >> node, e.g set the rest of the environment with 3.4 that works, and on >> a given node see what is the commit that breaks that? > Done. It appears that the patch that breaks receiving packets on many > different multicast groups/sockets is: > > commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3 > Author: Vlad Yasevich <vyasevic@redhat.com> > Date: Mon Apr 15 09:54:25 2013 +0000 > > net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api > > The current implementation of dev_uc_sync/unsync() assumes that there is > a strict 1-to-1 relationship between the source and destination of the sync. > In other words, once an address has been synced to a destination device, it > will not be synced to any other device through the sync API. > However, there are some virtual devices that aggreate a number of lower > devices and need to sync addresses to all of them. The current > API falls short there. > > This patch introduces a new dev_uc_sync_multiple() api that can be called > in the above circumstances and allows sync to work for every invocation. > > CC: Jiri Pirko <jiri@resnulli.us> > Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > I've confirmed that reverting this patch on top of 3.10-rc3 allows me > to receive packets on all of my multicast groups without the Mellanox > high_rate_steer option set. OK, impressive debugging... so what do we do from here? Vlad, Shawn observes a regression once this patch is used on a large scale setup that uses many multicast groups (you can read the posts done earlier on this thread), does this rings any bell w.r.t to the actual problem in the patch? Or. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/30/2013 04:42 PM, Or Gerlitz wrote: > On Thu, May 30, 2013 at 11:31 PM, Shawn Bohrer <shawn.bohrer@gmail.com> wrote: >>> So we need to >>> debug/bisect why without the patch (what you call high_rate_steer=0) >>> you don't get data on all groups. Can you bisect that on a single >>> node, e.g set the rest of the environment with 3.4 that works, and on >>> a given node see what is the commit that breaks that? > >> Done. It appears that the patch that breaks receiving packets on many >> different multicast groups/sockets is: >> >> commit 4cd729b04285b7330edaf5a7080aa795d6d15ff3 >> Author: Vlad Yasevich <vyasevic@redhat.com> >> Date: Mon Apr 15 09:54:25 2013 +0000 >> >> net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api >> >> The current implementation of dev_uc_sync/unsync() assumes that there is >> a strict 1-to-1 relationship between the source and destination of the sync. >> In other words, once an address has been synced to a destination device, it >> will not be synced to any other device through the sync API. >> However, there are some virtual devices that aggreate a number of lower >> devices and need to sync addresses to all of them. The current >> API falls short there. >> >> This patch introduces a new dev_uc_sync_multiple() api that can be called >> in the above circumstances and allows sync to work for every invocation. >> >> CC: Jiri Pirko <jiri@resnulli.us> >> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> >> Signed-off-by: David S. Miller <davem@davemloft.net> >> >> I've confirmed that reverting this patch on top of 3.10-rc3 allows me >> to receive packets on all of my multicast groups without the Mellanox >> high_rate_steer option set. > > OK, impressive debugging... so what do we do from here? Vlad, Shawn > observes a regression once this patch is used on a large scale setup > that uses many multicast groups (you can read the posts done earlier > on this thread), does this rings any bell w.r.t to the actual problem > in the patch? I haven't seen that, but I didn't test with that many multicast groups. I had 20 groups working. I'll take a look and see what might be going on. Thanks -vlad > > Or. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Vlad Yasevich <vyasevic@redhat.com> wrote: >>> CC: Jiri Pirko <jiri@resnulli.us> >>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> >>> Signed-off-by: David S. Miller <davem@davemloft.net> >>> >>> I've confirmed that reverting this patch on top of 3.10-rc3 allows me >>> to receive packets on all of my multicast groups without the Mellanox >>> high_rate_steer option set. >> >> OK, impressive debugging... so what do we do from here? Vlad, Shawn >> observes a regression once this patch is used on a large scale setup >> that uses many multicast groups (you can read the posts done earlier >> on this thread), does this rings any bell w.r.t to the actual problem >> in the patch? > >I haven't seen that, but I didn't test with that many multicast groups. I >had 20 groups working. > >I'll take a look and see what might be going on. I've actually been porting bonding to the dev_sync/unsync system, and have a patch series of 4 fixes to various internals of dev_sync/unsync; I'll post those under separate cover. It may be that one or more of those things are the source of this problem (or I might have it all wrong). -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 30, 2013 at 05:23:20PM -0700, Jay Vosburgh wrote: > Vlad Yasevich <vyasevic@redhat.com> wrote: > > >>> CC: Jiri Pirko <jiri@resnulli.us> > >>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> > >>> Signed-off-by: David S. Miller <davem@davemloft.net> > >>> > >>> I've confirmed that reverting this patch on top of 3.10-rc3 allows me > >>> to receive packets on all of my multicast groups without the Mellanox > >>> high_rate_steer option set. > >> > >> OK, impressive debugging... so what do we do from here? Vlad, Shawn > >> observes a regression once this patch is used on a large scale setup > >> that uses many multicast groups (you can read the posts done earlier > >> on this thread), does this rings any bell w.r.t to the actual problem > >> in the patch? > > > >I haven't seen that, but I didn't test with that many multicast groups. I > >had 20 groups working. > > > >I'll take a look and see what might be going on. > > I've actually been porting bonding to the dev_sync/unsync > system, and have a patch series of 4 fixes to various internals of > dev_sync/unsync; I'll post those under separate cover. It may be that > one or more of those things are the source of this problem (or I might > have it all wrong). Thanks Jay, I've tested your 4 patches on top of Linus' tree and they do solve the multicast issue I was seeing in this thread. -- Shawn -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index 0d32a82..7808e4a 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -71,6 +71,11 @@ static int msi_x = 1; module_param(msi_x, int, 0444); MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero"); +static int high_rate_steer; +module_param(high_rate_steer, int, 0444); +MODULE_PARM_DESC(high_rate_steer, "Enable steering mode for higher packet rate" + " (default off)"); + #else /* CONFIG_PCI_MSI */ #define msi_x (0) @@ -288,6 +293,11 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) if (mlx4_is_mfunc(dev)) dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_SENSE_SUPPORT; + if (high_rate_steer && !mlx4_is_mfunc(dev)) { + dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_VEP_UC_STEER; + dev->caps.flags &= ~MLX4_DEV_CAP_FLAG_VEP_MC_STEER; + } + dev->caps.log_num_macs = log_num_mac; dev->caps.log_num_vlans = MLX4_LOG_NUM_VLANS; dev->caps.log_num_prios = use_prio ? 3 : 0;