Message ID | 1329401112-9542-1-git-send-email-ariele@broadcom.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
On Thu, 2012-02-16 at 16:05 +0200, Ariel Elior wrote:
> In 57712 and 578xx the tx-switching module parameter allows the user to control
> whether outgoing traffic can be loopbacked into the device in case there is a
> relevant client for the data using the same device for rx.
> A classic example where this is necessary is for virtualization purposes, where
> one vm is transmitting data to another, while both use different pci functions of
> the same port of the same nic.
>
> In case there is a promiscuous client in the rx (which wants to receive all
> data) or if the traffic is broadcast, traffic may be sent on both the loopback
> channel and the physical wire.
>
> The reason tx-switching is controlled by a module parameter is twofold:
> 1. There is a certain performance penalty for tx-switching because:
> a. every packet must be compared against the receiver clients.
> b. duplicated traffic being loopbacked can consume a significant portion of
> the overall bandwidth, depending on the scenario.

So you really want the driver/firmware/hardware to know all the local
addresses (as John Fastabend was proposing).

> 2. Tx-switching doesn't make much sense as a per function parameter, but should
> rather be controlled uniformly for the entire device.
[...]

What if there are multiple such cards in the same system, and this is
only wanted for one of them?

Ben.
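The loopback rules in Ariel's description can be modeled as a small decision function. This is a user-space sketch under stated assumptions, not the 57712/578xx firmware logic: `struct rx_client`, `enum tx_dest` and `tx_switch_decide()` are hypothetical names, and a client is assumed to match a frame only by exact destination MAC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical model of one Rx client registered on the shared port. */
struct rx_client {
	unsigned char mac[MAC_LEN];	/* unicast MAC the client filters on */
	bool promisc;			/* client wants to receive all data */
};

/* Where a transmitted frame goes; TX_BOTH means it is duplicated. */
enum tx_dest {
	TX_WIRE_ONLY     = 1,
	TX_LOOPBACK_ONLY = 2,
	TX_BOTH          = 3,
};

static bool is_broadcast(const unsigned char *mac)
{
	static const unsigned char bcast[MAC_LEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};
	return memcmp(mac, bcast, MAC_LEN) == 0;
}

/*
 * Decide where a frame sent by clients[src] to 'dst' goes. With switching
 * enabled, every frame is compared against every other Rx client: this is
 * the per-packet cost that point 1a of the description refers to.
 */
enum tx_dest tx_switch_decide(bool tx_switching,
			      const struct rx_client *clients, size_t n,
			      size_t src, const unsigned char *dst)
{
	bool local_match = false, promisc_peer = false;
	size_t peers = 0;

	assert(src < n);
	if (!tx_switching)
		return TX_WIRE_ONLY;

	for (size_t i = 0; i < n; i++) {
		if (i == src)
			continue;	/* never loop a frame back to its sender */
		peers++;
		if (clients[i].promisc)
			promisc_peer = true;
		else if (memcmp(clients[i].mac, dst, MAC_LEN) == 0)
			local_match = true;
	}

	/* Broadcast, or a promiscuous peer: loopback copy plus wire copy. */
	if (promisc_peer || (peers > 0 && is_broadcast(dst)))
		return TX_BOTH;
	if (local_match)
		return TX_LOOPBACK_ONLY;
	return TX_WIRE_ONLY;
}
```

The `TX_BOTH` case is where point 1b's bandwidth cost comes from: the same frame occupies both the loopback path and the wire.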
On Thu, 16 Feb 2012 16:05:12 +0200
"Ariel Elior" <ariele@broadcom.com> wrote:

> In 57712 and 578xx the tx-switching module parameter allows the user to control
> whether outgoing traffic can be loopbacked into the device in case there is a
> relevant client for the data using the same device for rx.
> A classic example where this is necessary is for virtualization purposes, where
> one vm is transmitting data to another, while both use different pci functions of
> the same port of the same nic.
>
> In case there is a promiscuous client in the rx (which wants to receive all
> data) or if the traffic is broadcast, traffic may be sent on both the loopback
> channel and the physical wire.
>
> The reason tx-switching is controlled by a module parameter is twofold:
> 1. There is a certain performance penalty for tx-switching because:
> a. every packet must be compared against the receiver clients.
> b. duplicated traffic being loopbacked can consume a significant portion of
> the overall bandwidth, depending on the scenario.
> 2. Tx-switching doesn't make much sense as a per function parameter, but should
> rather be controlled uniformly for the entire device. The reason is that if one
> interface wants to be able to send data on the loopback it is not enough to
> enable tx-switching for that interface, as the target interface must also
> register its rx classification information where the transmitting interface can
> find it. One would still have to use the module parameter in each VM, though.
>
> Signed-off-by: Ariel Elior <ariele@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Module parameters are the hardware vendor's friend, but the system
integrator's nightmare. Although you think your hardware is special, it
isn't: some other vendor will have the same idea. How are users and
distributions supposed to control it?
-- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-02-16 at 17:20 +0000, Ben Hutchings wrote:
> On Thu, 2012-02-16 at 16:05 +0200, Ariel Elior wrote:
> > In 57712 and 578xx the tx-switching module parameter allows the user to control
> > whether outgoing traffic can be loopbacked into the device in case there is a
> > relevant client for the data using the same device for rx.
[...]
> > The reason tx-switching is controlled by a module parameter is twofold:
> > 1. There is a certain performance penalty for tx-switching because:
> > a. every packet must be compared against the receiver clients.
> > b. duplicated traffic being loopbacked can consume a significant portion of
> > the overall bandwidth, depending on the scenario.
>
> So you really want the driver/firmware/hardware to know all the local
> addresses (as John Fastabend was proposing).

We need the HW to know the MAC addresses only (L2 information), and each
PF configures only the addresses it owns (the same addresses that are
used for Rx filtering anyway).

> > 2. Tx-switching doesn't make much sense as a per function parameter, but should
> > rather be controlled uniformly for the entire device.
> [...]
>
> What if there are multiple such cards in the same system, and this is
> only wanted for one of them?

This parameter is most likely to be used with physical device assignment
to a VM. Having multiple devices where the rule should apply to some but
not all of them is possible, but less likely. A module parameter seems
more convenient than an ethtool private flag, which would force the
administrator to go over all PFs, since wanting it on every PF sounds
like the more common case.
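Eilon's answer, together with point 2 of the description (the target interface must register its Rx classification where the transmitting interface can find it), can be illustrated with one classification table shared by all PFs on a port. In this hypothetical sketch, `RULE_RX` and `RULE_RX_TX` loosely mirror the patch's `BNX2X_OBJ_TYPE_RX` / `BNX2X_OBJ_TYPE_RX_TX` choice (which the patch makes for the multicast object; unicast MACs are used here for brevity). The table layout and the names `pf_add_mac()` and `tx_lookup()` are assumptions, not the bnx2x API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6
#define PORT_MAX_RULES 16

/* Hypothetical analogue of the patch's BNX2X_OBJ_TYPE_RX / _RX_TX choice. */
enum rule_type { RULE_RX, RULE_RX_TX };

struct mac_rule {
	unsigned char mac[MAC_LEN];
	int owner_pf;		/* PF that registered (and owns) this MAC */
	enum rule_type type;
};

/* One classification table shared by every PF on the same port. */
struct port_table {
	struct mac_rule rules[PORT_MAX_RULES];
	size_t count;
};

/* A PF registers only the addresses it owns (the ones it filters on for
 * Rx). With tx switching on, the rule is also visible to Tx-side lookups.
 * Returns 0, or -1 if the table is full. */
int pf_add_mac(struct port_table *t, int pf, const unsigned char *mac,
	       bool tx_switching)
{
	assert(t != NULL && mac != NULL);
	if (t->count >= PORT_MAX_RULES)
		return -1;

	struct mac_rule *r = &t->rules[t->count++];
	memcpy(r->mac, mac, MAC_LEN);
	r->owner_pf = pf;
	r->type = tx_switching ? RULE_RX_TX : RULE_RX;
	return 0;
}

/* Tx-side lookup: which other PF should get a looped-back copy? Only
 * RX_TX rules are searched, so a target PF that registered its MAC as
 * RX-only stays invisible even if the sender has tx switching on.
 * Returns the owning PF, or -1 meaning "send to the wire". */
int tx_lookup(const struct port_table *t, int src_pf, const unsigned char *dst)
{
	for (size_t i = 0; i < t->count; i++) {
		const struct mac_rule *r = &t->rules[i];

		if (r->type == RULE_RX_TX && r->owner_pf != src_pf &&
		    memcmp(r->mac, dst, MAC_LEN) == 0)
			return r->owner_pf;
	}
	return -1;
}
```

With a device-wide module parameter every PF registers its rules the same way; the mixed case above (one PF RX_TX, one PF RX-only) is the situation Ariel's point 2 says per-function control would create.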
On Thu, 2012-02-16 at 09:49 -0800, Stephen Hemminger wrote:
> On Thu, 16 Feb 2012 16:05:12 +0200
> "Ariel Elior" <ariele@broadcom.com> wrote:
>
> > In 57712 and 578xx the tx-switching module parameter allows the user to control
> > whether outgoing traffic can be loopbacked into the device in case there is a
> > relevant client for the data using the same device for rx.
[...]
> Module parameters are the hardware vendor's friend, but the system
> integrator's nightmare. Although you think your hardware is special, it
> isn't: some other vendor will have the same idea. How are users and
> distributions supposed to control it?

Actually, module parameters require more explanation and cause more
questions than any standard mechanism, since they are unique to the
device, so we do prefer a standard way of doing things. In this case, we
looked at other drivers and scanned the mailing list history to see if we
had missed some discussion, but could not find anything. It is possible
that for some HW the cost of this internal switching is low, so it is
enabled by default, and it is possible that some HW does not support it
at all. This applies only to multi-function devices (more than one PF
sharing the same network port), and is usually required in VMs that use
physical device assignment, since most multi-function environments are
controlled by the switch, which loops the packets back.

But netdev is a great place to ask: are there other vendors out there
that require this control over internal switching? If so, we can define
a new ethtool command. The alternative of using ethtool private flags
seems just as inconvenient from the administrator's point of view, and
also seems less appropriate, since this configuration is more likely to
be the same for all PFs on the same machine.
On 2/16/2012 10:35 AM, Eilon Greenstein wrote:
> On Thu, 2012-02-16 at 09:49 -0800, Stephen Hemminger wrote:
[...]
> Actually, module parameters require more explanation and cause more
> questions than any standard mechanism, since they are unique to the
> device, so we do prefer a standard way of doing things. In this case, we
> looked at other drivers and scanned the mailing list history to see if we
> had missed some discussion, but could not find anything. It is possible
> that for some HW the cost of this internal switching is low, so it is
> enabled by default, and it is possible that some HW does not support it
> at all. This applies only to multi-function devices (more than one PF
> sharing the same network port), and is usually required in VMs that use
> physical device assignment, since most multi-function environments are
> controlled by the switch, which loops the packets back.
>

It should be relevant to any case where you're doing hardware switching,
and the mechanism to configure this should be independent of how you
expose multiple MAC services (mac/vlan pairs) realized as net devices in
Linux. Specifically, the mechanism should work for a PF and many VFs,
multiple PFs, or queue-based filtering mechanisms (Intel's VMDq).

The 82599 Intel devices support disabling loopback. This is needed to
support VEPA modes as defined in the 802.1Qbg standard, which should be
ratified shortly. Typically you would expect the peer to support hairpin
forwarding so that PF-VF, VF-VF, and PF-PF communication still works.

> But netdev is a great place to ask: are there other vendors out there
> that require this control over internal switching? If so, we can define
> a new ethtool command. The alternative of using ethtool private flags
> seems just as inconvenient from the administrator's point of view, and
> also seems less appropriate, since this configuration is more likely to
> be the same for all PFs on the same machine.
>

This needs to be configurable at runtime, because the 802.1Qbg spec
defines a protocol to learn which mode we should use, and we want to be
able to support this. 'lldpad' and 'libvirt' already have some support
for this. Also, macvlans may be stacked on top of the PF, and depending
on the macvlan mode (VEB or VEPA) you may need to configure the hardware
switch to be compatible.

My thought is that this should be a netlink command, because it will be
helpful in userspace to get events when this is changed. A module
parameter should be a non-starter here, because it would require any
management application to load and unload modules, which is a pain and
bounces the link. Ethtool is better than a modparam, but I would prefer
to get an event so that I can keep lldpad (or any other app for that
matter) in sync.

Thanks,
John
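John's point about VEPA and hairpin forwarding can be reduced to a two-mode model of how a frame from one function reaches another function on the same port. This is a simplified sketch, not the 802.1Qbg state machines: the enum and function names are hypothetical, and `peer_reflective_relay` stands for the adjacent bridge port being allowed to send a frame back out of the port it arrived on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two embedded-switch behaviours discussed. */
enum eswitch_mode {
	MODE_VEB,	/* embedded switch forwards local traffic itself */
	MODE_VEPA,	/* embedded switch sends everything to the adjacent bridge */
};

enum path {
	PATH_LOCAL_SWITCH,	/* looped back inside the NIC */
	PATH_HAIRPIN,		/* out to the wire, reflected back by the peer */
	PATH_NOT_RETURNED,	/* out to the wire; a standard bridge will not
				 * send a frame back out of its ingress port */
};

/*
 * Path taken by a frame whose destination is another function (PF or VF)
 * on the same physical port.
 */
enum path local_delivery(enum eswitch_mode mode, bool peer_reflective_relay)
{
	assert(mode == MODE_VEB || mode == MODE_VEPA);

	if (mode == MODE_VEB)
		return PATH_LOCAL_SWITCH;

	return peer_reflective_relay ? PATH_HAIRPIN : PATH_NOT_RETURNED;
}

/* True when the adjacent bridge sees the frame, which is what a VEPA user
 * (for example a macvlan in VEPA mode stacked on the PF) relies on. */
bool seen_by_adjacent_bridge(enum eswitch_mode mode, bool peer_reflective_relay)
{
	return local_delivery(mode, peer_reflective_relay) != PATH_LOCAL_SWITCH;
}
```

Disabling the embedded switch's loopback, as the 82599 allows, corresponds to `MODE_VEPA` here; without hairpin support on the peer, local PF-VF, VF-VF and PF-PF traffic no longer arrives.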
On Thu, 2012-02-16 at 11:38 -0800, John Fastabend wrote:
> On 2/16/2012 10:35 AM, Eilon Greenstein wrote:
> > On Thu, 2012-02-16 at 09:49 -0800, Stephen Hemminger wrote:
[...]
> This needs to be configurable at runtime, because the 802.1Qbg spec
> defines a protocol to learn which mode we should use, and we want to be
> able to support this. 'lldpad' and 'libvirt' already have some support
> for this. Also, macvlans may be stacked on top of the PF, and depending
> on the macvlan mode (VEB or VEPA) you may need to configure the hardware
> switch to be compatible.
>
> My thought is that this should be a netlink command, because it will be
> helpful in userspace to get events when this is changed. A module
> parameter should be a non-starter here, because it would require any
> management application to load and unload modules, which is a pain and
> bounces the link. Ethtool is better than a modparam, but I would prefer
> to get an event so that I can keep lldpad (or any other app for that
> matter) in sync.

OK, thanks John. Dave, please do not apply this patch. We need to look
at the alternatives suggested by John.

Thanks,
Eilon
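John's preference for a netlink interface is about notifications: management tools such as lldpad can subscribe to link changes instead of polling. The sketch below shows only the generic rtnetlink side, subscribing to `RTMGRP_LINK` and walking `RTM_NEWLINK` messages with the standard `NLMSG_*` / `RTA_*` macros. Which attribute would carry a switching mode was not settled in this thread, so the sketch extracts only the interface index and `IFLA_IFNAME`; `open_link_monitor()` and `parse_link_events()` are illustrative names.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open an rtnetlink socket subscribed to link notifications.
 * Returns the fd, or -1 on failure. */
int open_link_monitor(void)
{
	struct sockaddr_nl sa;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.nl_family = AF_NETLINK;
	sa.nl_groups = RTMGRP_LINK;	/* RTM_NEWLINK / RTM_DELLINK events */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Walk one buffer read from the socket. Returns how many RTM_NEWLINK
 * messages it holds; for the first one, stores the ifindex and the
 * IFLA_IFNAME attribute (empty string if absent) in the out parameters.
 */
int parse_link_events(void *buf, int len, int *ifindex,
		      char *name, size_t name_len)
{
	struct nlmsghdr *nh;
	int seen = 0;

	assert(ifindex != NULL && name != NULL && name_len > 0);
	for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
		if (nh->nlmsg_type == NLMSG_DONE)
			break;
		if (nh->nlmsg_type != RTM_NEWLINK ||
		    nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifinfomsg)))
			continue;

		struct ifinfomsg *ifi = NLMSG_DATA(nh);
		if (seen++ > 0)
			continue;	/* only the first message is decoded */

		*ifindex = ifi->ifi_index;
		name[0] = '\0';

		/* Attributes follow the aligned ifinfomsg header. */
		int alen = (int)NLMSG_PAYLOAD(nh, sizeof(*ifi));
		struct rtattr *rta = (struct rtattr *)((char *)ifi +
					NLMSG_ALIGN(sizeof(*ifi)));
		for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
			if (rta->rta_type != IFLA_IFNAME)
				continue;
			size_t n = (size_t)RTA_PAYLOAD(rta);
			if (n >= name_len)
				n = name_len - 1;
			memcpy(name, RTA_DATA(rta), n);
			name[n] = '\0';
			break;
		}
	}
	return seen;
}
```

A monitor would call `open_link_monitor()`, then loop on `recv()` and hand each buffer to `parse_link_events()`, re-reading whatever setting it tracks when an event arrives.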
On 2/16/2012 11:57 AM, Eilon Greenstein wrote:
>>>> "Ariel Elior" <ariele@broadcom.com> wrote:
>>>>
>>>>> In 57712 and 578xx the tx-switching module parameter allows the user to control
>>>>> whether outgoing traffic can be loopbacked into the device in case there is a
>>>>> relevant client for the data using the same device for rx.

[...]

>>> But netdev is a great place to ask: are there other vendors out there
>>> that require this control over internal switching? If so, we can define
>>> a new ethtool command. The alternative of using ethtool private flags
>>> seems just as inconvenient from the administrator's point of view, and
>>> also seems less appropriate, since this configuration is more likely to
>>> be the same for all PFs on the same machine.
>>>

[...]

> OK, thanks John. Dave, please do not apply this patch. We need to look
> at the alternatives suggested by John.
>
> Thanks,
> Eilon
>

Eilon, any progress with this? We need this to support VEPA modes in
macvlan running over SR-IOV devices or other embedded switches. The
problem is that if the stacked device expects the packet to be sent to
the reflective relay on the switch, in the SR-IOV case the embedded
switch may not actually do this.

Thanks,
John
On Sun, 2012-04-15 at 22:11 -0700, John Fastabend wrote:
> On 2/16/2012 11:57 AM, Eilon Greenstein wrote:
[...]
> Eilon, any progress with this? We need this to support VEPA modes in
> macvlan running over SR-IOV devices or other embedded switches. The
> problem is that if the stacked device expects the packet to be sent to
> the reflective relay on the switch, in the SR-IOV case the embedded
> switch may not actually do this.

John, honestly, this is not at the top of the priority list for us right
now. If you would like to proceed, please do so; we will have to adapt
to what you suggest when we get back to this issue.

Thanks,
Eilon
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index d60b5f0..4d359e9 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1275,6 +1275,7 @@ struct bnx2x {
 #define NO_ISCSI_FLAG			(1 << 14)
 #define NO_FCOE_FLAG			(1 << 15)
 #define BC_SUPPORTS_PFC_STATS		(1 << 17)
+#define TX_SWITCHING			(1 << 18)
 
 #define NO_ISCSI(bp)		((bp)->flags & NO_ISCSI_FLAG)
 #define NO_ISCSI_OOO(bp)	((bp)->flags & NO_ISCSI_OOO_FLAG)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index f978c6a..3e26a3f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1241,6 +1241,10 @@ static inline u8 bnx2x_get_path_func_num(struct bnx2x *bp)
 
 static inline void bnx2x_init_bp_objs(struct bnx2x *bp)
 {
+	/* mcast rules must be added to tx if tx switching is enabled */
+	bnx2x_obj_type o_type = bp->flags & TX_SWITCHING ?
+		BNX2X_OBJ_TYPE_RX_TX : BNX2X_OBJ_TYPE_RX;
+
 	/* RX_MODE controlling object */
 	bnx2x_init_rx_mode_obj(bp, &bp->rx_mode_obj);
 
@@ -1250,7 +1254,7 @@ static inline void bnx2x_init_bp_objs(struct bnx2x *bp)
 			     bnx2x_sp(bp, mcast_rdata),
 			     bnx2x_sp_mapping(bp, mcast_rdata),
 			     BNX2X_FILTER_MCAST_PENDING, &bp->sp_state,
-			     BNX2X_OBJ_TYPE_RX);
+			     o_type);
 
 	/* Setup CAM credit pools */
 	bnx2x_init_mac_credit_pool(bp, &bp->macs_pool, BP_FUNC(bp),
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 8e809c1..1e30bbd 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -129,7 +129,9 @@ static int debug;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, " Default debug msglevel");
 
-
+static uint tx_switching;
+module_param(tx_switching, uint, 0);
+MODULE_PARM_DESC(tx_switching, " Enable tx-switching");
 
 struct workqueue_struct *bnx2x_wq;
 
@@ -2686,6 +2688,11 @@ static inline unsigned long bnx2x_get_common_flags(struct bnx2x *bp,
 
 	if (zero_stats)
 		__set_bit(BNX2X_Q_FLG_ZERO_STATS, &flags);
+	/* tx only connections can support tx-switching, though their
+	 * COS-ness doesn't survive the loopback
+	 */
+	if (bp->flags & TX_SWITCHING)
+		__set_bit(BNX2X_Q_FLG_TX_SWITCH, &flags);
 
 	return flags;
 }
@@ -10186,6 +10193,10 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
 		bp->dev->features |= NETIF_F_LRO;
 	}
 
+	/* test tx switching module parameter */
+	if (tx_switching)
+		bp->flags |= TX_SWITCHING;
+
 	if (CHIP_IS_E1(bp))
 		bp->dropless_fc = 0;
 	else