Message ID | 20180612151322.86792.97587.stgit@ahduyck-green-test.jf.intel.com |
---|---|
Headers | show |
Series | Add support for L2 Fwd Offload w/o ndo_select_queue | expand |
On Tue, 12 Jun 2018 11:18:25 -0400 Alexander Duyck <alexander.h.duyck@intel.com> wrote: > This patch series is meant to allow support for the L2 forward offload, aka > MACVLAN offload without the need for using ndo_select_queue. > > The existing solution currently requires that we use ndo_select_queue in > the transmit path if we want to associate specific Tx queues with a given > MACVLAN interface. In order to get away from this we need to repurpose the > tc_to_txq array and XPS pointer for the MACVLAN interface and use those as > a means of accessing the queues on the lower device. As a result we cannot > offload a device that is configured as multiqueue, however it doesn't > really make sense to configure a macvlan interfaced as being multiqueue > anyway since it doesn't really have a qdisc of its own in the first place. > > I am submitting this as an RFC for the netdev mailing list, and officially > submitting it for testing to Jeff Kirsher's next-queue in order to validate > the ixgbe specific bits. > > The big changes in this set are: > Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN > Disable XPS for single queue devices > Replace accel_priv with sb_dev in ndo_select_queue > Add sb_dev parameter to fallback function for ndo_select_queue > Consolidated ndo_select_queue functions that appeared to be duplicates > > v2: Implement generic "select_queue" functions instead of "fallback" functions. > Tweak last two patches to account for changes in dev_pick_tx_xxx functions. > > --- > > Alexander Duyck (7): > net-sysfs: Drop support for XPS and traffic_class on single queue device > net: Add support for subordinate device traffic classes > ixgbe: Add code to populate and use macvlan tc to Tx queue map > net: Add support for subordinate traffic classes to netdev_pick_tx > net: Add generic ndo_select_queue functions > net: allow ndo_select_queue to pass netdev > net: allow fallback function to pass netdev > > > drivers/infiniband/hw/hfi1/vnic_main.c | 2 > drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 4 - > drivers/net/bonding/bond_main.c | 3 > drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 - > drivers/net/ethernet/broadcom/bcmsysport.c | 6 - > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 + > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 > drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 - > drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 - > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++++++-- > drivers/net/ethernet/lantiq_etop.c | 10 - > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 7 + > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 > drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 > drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 - > drivers/net/ethernet/renesas/ravb_main.c | 3 > drivers/net/ethernet/sun/ldmvsw.c | 3 > drivers/net/ethernet/sun/sunvnet.c | 3 > drivers/net/ethernet/ti/netcp_core.c | 9 - > drivers/net/hyperv/netvsc_drv.c | 6 - > drivers/net/macvlan.c | 10 - > drivers/net/net_failover.c | 7 + > drivers/net/team/team.c | 3 > drivers/net/tun.c | 3 > drivers/net/wireless/marvell/mwifiex/main.c | 3 > drivers/net/xen-netback/interface.c | 4 - > drivers/net/xen-netfront.c | 3 > drivers/staging/netlogic/xlr_net.c | 9 - > drivers/staging/rtl8188eu/os_dep/os_intfs.c | 3 > drivers/staging/rtl8723bs/os_dep/os_intfs.c | 7 - > include/linux/netdevice.h | 34 ++++- > net/core/dev.c | 156 ++++++++++++++++++--- > net/core/net-sysfs.c | 36 ++++- > net/mac80211/iface.c | 4 - > net/packet/af_packet.c | 7 + > 35 files changed, 312 insertions(+), 130 deletions(-) > > -- This makes sense. I thought you were hoping to get rid of select queue in future?
On 06/12/2018 08:18 AM, Alexander Duyck wrote: > This patch series is meant to allow support for the L2 forward offload, aka > MACVLAN offload without the need for using ndo_select_queue. > > The existing solution currently requires that we use ndo_select_queue in > the transmit path if we want to associate specific Tx queues with a given > MACVLAN interface. In order to get away from this we need to repurpose the > tc_to_txq array and XPS pointer for the MACVLAN interface and use those as > a means of accessing the queues on the lower device. As a result we cannot > offload a device that is configured as multiqueue, however it doesn't > really make sense to configure a macvlan interfaced as being multiqueue > anyway since it doesn't really have a qdisc of its own in the first place. Interesting, so at some point I had came up with the following for mapping queues between the DSA slave network devices and the DSA master network device (doing the actual transmission). The DSA master network device driver is just a normal network device driver. The set-up is as follows: 4 external Ethernet switch ports, each with 8 egress queues and the DSA master (bcmsysport.c), aka CPU Ethernet controller has 32 output queues, so you can do a 1:1 mapping of those, that's actually what we want. A subsequent hardware generation only provides 16 output queues, so we can still do 2:1 mapping. The implementation is done like this: - DSA slave network devices are always created after the DSA master network device so we can leverage that - a specific notifier is running from the DSA core and tells the DSA master about the switch position in the tree (position 0 = directly attached), and the switch port number and a pointer to the slave network device - we establish the mapping between the queues within the bcmsysport driver as a simple array - when transmitting, DSA slave network devices set a specific queue/port number within the 16-bits that skb->queue_mapping permits - this gets re-used by bcmsysport.c to extract the correct queue number during ndo_select_queue such that the appropriate queue number gets used and congestion works end-to-end. The reason why we do that is because there is some out of band HW that monitors the queue depth of the switch port's egress queue and back-pressure the Ethernet controller directly when trying to transmit to a congested queue. I had initially considered establishing the mapping using tc and some custom "bind" argument of some kind, but ended-up doing things the way they are which are more automatic though they leave less configuration to an user. This has a number of caveats though: - this is made generic within the context of DSA in that nothing is switch driver or Ethernet MAC driver specific and the notifier represents the contract between these two seemingly independent subsystems - the queue indicated between DSA slave and master is unfortunately switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE, BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE) What I like about your patchset is the mapping establishment, but as you will read from my reply in patch 2, I think the (upper) 1:N (lower) mapping might not work for my specific use case. Anyhow, not intended to be blocking this, as it seems to be going in the right direction anyway. > > I am submitting this as an RFC for the netdev mailing list, and officially > submitting it for testing to Jeff Kirsher's next-queue in order to validate > the ixgbe specific bits. > > The big changes in this set are: > Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN > Disable XPS for single queue devices > Replace accel_priv with sb_dev in ndo_select_queue > Add sb_dev parameter to fallback function for ndo_select_queue > Consolidated ndo_select_queue functions that appeared to be duplicates Interesting, turns out I had a possibly similar use case with DSA with the slave network devices need to select an outgoing queue number for > > v2: Implement generic "select_queue" functions instead of "fallback" functions. > Tweak last two patches to account for changes in dev_pick_tx_xxx functions. > > --- > > Alexander Duyck (7): > net-sysfs: Drop support for XPS and traffic_class on single queue device > net: Add support for subordinate device traffic classes > ixgbe: Add code to populate and use macvlan tc to Tx queue map > net: Add support for subordinate traffic classes to netdev_pick_tx > net: Add generic ndo_select_queue functions > net: allow ndo_select_queue to pass netdev > net: allow fallback function to pass netdev > > > drivers/infiniband/hw/hfi1/vnic_main.c | 2 > drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 4 - > drivers/net/bonding/bond_main.c | 3 > drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 - > drivers/net/ethernet/broadcom/bcmsysport.c | 6 - > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 + > drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 > drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 - > drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 - > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++++++-- > drivers/net/ethernet/lantiq_etop.c | 10 - > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 7 + > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 > drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 > drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 - > drivers/net/ethernet/renesas/ravb_main.c | 3 > drivers/net/ethernet/sun/ldmvsw.c | 3 > drivers/net/ethernet/sun/sunvnet.c | 3 > drivers/net/ethernet/ti/netcp_core.c | 9 - > drivers/net/hyperv/netvsc_drv.c | 6 - > drivers/net/macvlan.c | 10 - > drivers/net/net_failover.c | 7 + > drivers/net/team/team.c | 3 > drivers/net/tun.c | 3 > drivers/net/wireless/marvell/mwifiex/main.c | 3 > drivers/net/xen-netback/interface.c | 4 - > drivers/net/xen-netfront.c | 3 > drivers/staging/netlogic/xlr_net.c | 9 - > drivers/staging/rtl8188eu/os_dep/os_intfs.c | 3 > drivers/staging/rtl8723bs/os_dep/os_intfs.c | 7 - > include/linux/netdevice.h | 34 ++++- > net/core/dev.c | 156 ++++++++++++++++++--- > net/core/net-sysfs.c | 36 ++++- > net/mac80211/iface.c | 4 - > net/packet/af_packet.c | 7 + > 35 files changed, 312 insertions(+), 130 deletions(-) > > -- >
On Tue, Jun 12, 2018 at 10:56 AM, Florian Fainelli <f.fainelli@gmail.com> wrote: > On 06/12/2018 08:18 AM, Alexander Duyck wrote: >> This patch series is meant to allow support for the L2 forward offload, aka >> MACVLAN offload without the need for using ndo_select_queue. >> >> The existing solution currently requires that we use ndo_select_queue in >> the transmit path if we want to associate specific Tx queues with a given >> MACVLAN interface. In order to get away from this we need to repurpose the >> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as >> a means of accessing the queues on the lower device. As a result we cannot >> offload a device that is configured as multiqueue, however it doesn't >> really make sense to configure a macvlan interfaced as being multiqueue >> anyway since it doesn't really have a qdisc of its own in the first place. > > Interesting, so at some point I had came up with the following for > mapping queues between the DSA slave network devices and the DSA master > network device (doing the actual transmission). The DSA master network > device driver is just a normal network device driver. > > The set-up is as follows: 4 external Ethernet switch ports, each with 8 > egress queues and the DSA master (bcmsysport.c), aka CPU Ethernet > controller has 32 output queues, so you can do a 1:1 mapping of those, > that's actually what we want. A subsequent hardware generation only > provides 16 output queues, so we can still do 2:1 mapping. > > The implementation is done like this: > > - DSA slave network devices are always created after the DSA master > network device so we can leverage that > > - a specific notifier is running from the DSA core and tells the DSA > master about the switch position in the tree (position 0 = directly > attached), and the switch port number and a pointer to the slave network > device > > - we establish the mapping between the queues within the bcmsysport > driver as a simple array > > - when transmitting, DSA slave network devices set a specific queue/port > number within the 16-bits that skb->queue_mapping permits > > - this gets re-used by bcmsysport.c to extract the correct queue number > during ndo_select_queue such that the appropriate queue number gets used > and congestion works end-to-end. > > The reason why we do that is because there is some out of band HW that > monitors the queue depth of the switch port's egress queue and > back-pressure the Ethernet controller directly when trying to transmit > to a congested queue. > > I had initially considered establishing the mapping using tc and some > custom "bind" argument of some kind, but ended-up doing things the way > they are which are more automatic though they leave less configuration > to an user. This has a number of caveats though: > > - this is made generic within the context of DSA in that nothing is > switch driver or Ethernet MAC driver specific and the notifier > represents the contract between these two seemingly independent subsystems > > - the queue indicated between DSA slave and master is unfortunately > switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE, > BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE) > > What I like about your patchset is the mapping establishment, but as you > will read from my reply in patch 2, I think the (upper) 1:N (lower) > mapping might not work for my specific use case. > > Anyhow, not intended to be blocking this, as it seems to be going in the > right direction anyway. I think I am still not getting why the 1:N would be an issue. At least the way I have the code implemented here the lower queues all have a qdisc associated with them, just not the upper device. Generally I am using the macvlan as a bump in the wire to take care of filtering for the bridging mode. If I have to hairpin packets and send them back up on on of the the upper interfaces I want to do that in software rather than hardware so I try to take care of it there instead of routing it through the hardware. >> >> I am submitting this as an RFC for the netdev mailing list, and officially >> submitting it for testing to Jeff Kirsher's next-queue in order to validate >> the ixgbe specific bits. >> >> The big changes in this set are: >> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN >> Disable XPS for single queue devices >> Replace accel_priv with sb_dev in ndo_select_queue >> Add sb_dev parameter to fallback function for ndo_select_queue >> Consolidated ndo_select_queue functions that appeared to be duplicates > > Interesting, turns out I had a possibly similar use case with DSA with > the slave network devices need to select an outgoing queue number for I was kind of assuming this could be applied to a number of possible use cases. As it was I was wondering if maybe we should look at adding this as an option for just a standard VLAN as we could perform the same kind of filtering and just deliver the packet directly to the VLAN interface instead of requiring the extra trip through the stack after the tag has been stripped.
On Tue, Jun 12, 2018 at 10:50 AM, Stephen Hemminger <stephen@networkplumber.org> wrote: > On Tue, 12 Jun 2018 11:18:25 -0400 > Alexander Duyck <alexander.h.duyck@intel.com> wrote: > >> This patch series is meant to allow support for the L2 forward offload, aka >> MACVLAN offload without the need for using ndo_select_queue. >> >> The existing solution currently requires that we use ndo_select_queue in >> the transmit path if we want to associate specific Tx queues with a given >> MACVLAN interface. In order to get away from this we need to repurpose the >> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as >> a means of accessing the queues on the lower device. As a result we cannot >> offload a device that is configured as multiqueue, however it doesn't >> really make sense to configure a macvlan interfaced as being multiqueue >> anyway since it doesn't really have a qdisc of its own in the first place. >> >> I am submitting this as an RFC for the netdev mailing list, and officially >> submitting it for testing to Jeff Kirsher's next-queue in order to validate >> the ixgbe specific bits. >> >> The big changes in this set are: >> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN >> Disable XPS for single queue devices >> Replace accel_priv with sb_dev in ndo_select_queue >> Add sb_dev parameter to fallback function for ndo_select_queue >> Consolidated ndo_select_queue functions that appeared to be duplicates >> >> v2: Implement generic "select_queue" functions instead of "fallback" functions. >> Tweak last two patches to account for changes in dev_pick_tx_xxx functions. >> >> --- >> >> Alexander Duyck (7): >> net-sysfs: Drop support for XPS and traffic_class on single queue device >> net: Add support for subordinate device traffic classes >> ixgbe: Add code to populate and use macvlan tc to Tx queue map >> net: Add support for subordinate traffic classes to netdev_pick_tx >> net: Add generic ndo_select_queue functions >> net: allow ndo_select_queue to pass netdev >> net: allow fallback function to pass netdev >> >> >> drivers/infiniband/hw/hfi1/vnic_main.c | 2 >> drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 4 - >> drivers/net/bonding/bond_main.c | 3 >> drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 - >> drivers/net/ethernet/broadcom/bcmsysport.c | 6 - >> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 + >> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 >> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 - >> drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 - >> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++++++-- >> drivers/net/ethernet/lantiq_etop.c | 10 - >> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 7 + >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 >> drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 >> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 - >> drivers/net/ethernet/renesas/ravb_main.c | 3 >> drivers/net/ethernet/sun/ldmvsw.c | 3 >> drivers/net/ethernet/sun/sunvnet.c | 3 >> drivers/net/ethernet/ti/netcp_core.c | 9 - >> drivers/net/hyperv/netvsc_drv.c | 6 - >> drivers/net/macvlan.c | 10 - >> drivers/net/net_failover.c | 7 + >> drivers/net/team/team.c | 3 >> drivers/net/tun.c | 3 >> drivers/net/wireless/marvell/mwifiex/main.c | 3 >> drivers/net/xen-netback/interface.c | 4 - >> drivers/net/xen-netfront.c | 3 >> drivers/staging/netlogic/xlr_net.c | 9 - >> drivers/staging/rtl8188eu/os_dep/os_intfs.c | 3 >> drivers/staging/rtl8723bs/os_dep/os_intfs.c | 7 - >> include/linux/netdevice.h | 34 ++++- >> net/core/dev.c | 156 ++++++++++++++++++--- >> net/core/net-sysfs.c | 36 ++++- >> net/mac80211/iface.c | 4 - >> net/packet/af_packet.c | 7 + >> 35 files changed, 312 insertions(+), 130 deletions(-) >> >> -- > > This makes sense. I thought you were hoping to get rid of select queue in future? That would be nice, however there are still a bunch of corner cases that are not handled that have been dumped into select queue. For example in the case of ixgbe the issue is FCoE. There are a number of other places that are using it as well as I seem to recall netvsc and bonding both use it to store off the original Rx->Tx queue mapping when passing through the interface. For now I figure we can take this one hill at a time and I am just making it so we don't have to use ndo_select_queue in order to make vmdq work for macvlan offload. - Alex
On Tue, Jun 12, 2018 at 8:18 AM, Alexander Duyck <alexander.h.duyck@intel.com> wrote: > This patch series is meant to allow support for the L2 forward offload, aka > MACVLAN offload without the need for using ndo_select_queue. > > The existing solution currently requires that we use ndo_select_queue in > the transmit path if we want to associate specific Tx queues with a given > MACVLAN interface. In order to get away from this we need to repurpose the > tc_to_txq array and XPS pointer for the MACVLAN interface and use those as > a means of accessing the queues on the lower device. As a result we cannot > offload a device that is configured as multiqueue, however it doesn't > really make sense to configure a macvlan interfaced as being multiqueue > anyway since it doesn't really have a qdisc of its own in the first place. > > I am submitting this as an RFC for the netdev mailing list, and officially > submitting it for testing to Jeff Kirsher's next-queue in order to validate > the ixgbe specific bits. > > The big changes in this set are: > Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN > Disable XPS for single queue devices > Replace accel_priv with sb_dev in ndo_select_queue > Add sb_dev parameter to fallback function for ndo_select_queue > Consolidated ndo_select_queue functions that appeared to be duplicates > > v2: Implement generic "select_queue" functions instead of "fallback" functions. > Tweak last two patches to account for changes in dev_pick_tx_xxx functions. > > --- > > Alexander Duyck (7): > net-sysfs: Drop support for XPS and traffic_class on single queue device > net: Add support for subordinate device traffic classes > ixgbe: Add code to populate and use macvlan tc to Tx queue map > net: Add support for subordinate traffic classes to netdev_pick_tx > net: Add generic ndo_select_queue functions > net: allow ndo_select_queue to pass netdev > net: allow fallback function to pass netdev > Alex, there were recent changes to Dave's net-next which caused a conflict with one or more of your patches. So I have removed this series from my tree for now, and will work with you on updating the series to work with the latest net-next tree.