Message ID | 1388978467-2075-2-git-send-email-jasowang@redhat.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 2014-01-06 at 11:21 +0800, Jason Wang wrote: > Currently, the tx queue were selected implicitly in > ndo_dfwd_start_xmit(). The > will cause several issues: > > - NETIF_F_LLTX was forced for macvlan device in this case which lead > extra lock > contention. > - dev_hard_start_xmit() was called with NULL txq which bypasses the > net device > watchdog > - dev_hard_start_xmit() does not check txq everywhere which will lead > a crash > when tso is disabled for lower device. > > Fix this by explicitly introducing a select queue method just for l2 > forwarding > offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to > do the > queue selecting and transmitting for l2 forwarding. > > With this fixes, NETIF_F_LLTX could be preserved for macvlan and > there's no need > to check txq against NULL in dev_hard_start_xmit(). > > In the future, it was also required for macvtap l2 forwarding support > since it > provides a necessary synchronization method. > > Cc: John Fastabend <john.r.fastabend@intel.com> > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: e1000-devel@lists.sourceforge.net > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 15 +++++++++---- > drivers/net/macvlan.c | 3 +- > include/linux/netdevice.h | 11 +++++++++ > net/core/dev.c | 28 > ++++++++++++++++++++++++- > 4 files changed, 49 insertions(+), 8 deletions(-) Thanks Jason, I have added this to my queue since it has changes against ixgbe.
On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: > Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The > will cause several issues: > > - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock > contention. > - dev_hard_start_xmit() was called with NULL txq which bypasses the net device > watchdog > - dev_hard_start_xmit() does not check txq everywhere which will lead a crash > when tso is disabled for lower device. > > Fix this by explicitly introducing a select queue method just for l2 forwarding > offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the > queue selecting and transmitting for l2 forwarding. > > With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need > to check txq against NULL in dev_hard_start_xmit(). > > In the future, it was also required for macvtap l2 forwarding support since it > provides a necessary synchronization method. > > Cc: John Fastabend <john.r.fastabend@intel.com> > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: e1000-devel@lists.sourceforge.net > Signed-off-by: Jason Wang <jasowang@redhat.com> Instead of creating another operation here to do special queue selection, why not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? ndo_dfwd_start_xmit already knows which queue set to pick from (since their reserved for the device doing the transmitting). It seems more clear to me than creating a new netdevice operation. As for the crash issue, I'm not sure what you mean. Where in dev_hard_start_xmit would we need to check txq that we're not currently, and what crash results? Also, can you elaborate on what you mean by additional lock contention? What contention do you see that goes above and beyond the normal locking required by txq access? I suppose its extra locking above and beyond in the macvtap case, where you would otherwise never hit hardware, but that not the only use case, and I think the solution there is likely to add some code in the macvlan feature set handler so that NETIF_F_LLTX is cleared if you disable the hardware forwarding acceleration via ethtool. Regards Neil > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/06/2014 04:42 AM, Neil Horman wrote: > On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: >> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The >> will cause several issues: >> >> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock >> contention. >> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device >> watchdog >> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash >> when tso is disabled for lower device. >> >> Fix this by explicitly introducing a select queue method just for l2 forwarding >> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the >> queue selecting and transmitting for l2 forwarding. >> >> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need >> to check txq against NULL in dev_hard_start_xmit(). >> >> In the future, it was also required for macvtap l2 forwarding support since it >> provides a necessary synchronization method. >> >> Cc: John Fastabend <john.r.fastabend@intel.com> >> Cc: Neil Horman <nhorman@tuxdriver.com> >> Cc: e1000-devel@lists.sourceforge.net >> Signed-off-by: Jason Wang <jasowang@redhat.com> > > Instead of creating another operation here to do special queue selection, why > not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument > list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? > ndo_dfwd_start_xmit already knows which queue set to pick from (since their > reserved for the device doing the transmitting). It seems more clear to me than > creating a new netdevice operation. > > As for the crash issue, I'm not sure what you mean. Where in > dev_hard_start_xmit would we need to check txq that we're not currently, and > what crash results? > > Also, can you elaborate on what you mean by additional lock contention? What > contention do you see that goes above and beyond the normal locking required by > txq access? I suppose its extra locking above and beyond in the macvtap case, > where you would otherwise never hit hardware, but that not the only use case, > and I think the solution there is likely to add some code in the macvlan feature > set handler so that NETIF_F_LLTX is cleared if you disable the hardware > forwarding acceleration via ethtool. > NETIF_F_LLTX is cleared in macvlan_open() which should be used in the macvtap case. if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) { vlan->fwd_priv = lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev); /* If we get a NULL pointer back, or if we get an error * then we should just fall through to the non accelerated path */ if (IS_ERR_OR_NULL(vlan->fwd_priv)) { vlan->fwd_priv = NULL; } else { dev->features &= ~NETIF_F_LLTX; return 0; } } Thanks, John
On Mon, Jan 06, 2014 at 07:06:25AM -0800, John Fastabend wrote: > On 01/06/2014 04:42 AM, Neil Horman wrote: > >On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: > >>Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The > >>will cause several issues: > >> > >>- NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock > >> contention. > >>- dev_hard_start_xmit() was called with NULL txq which bypasses the net device > >> watchdog > >>- dev_hard_start_xmit() does not check txq everywhere which will lead a crash > >> when tso is disabled for lower device. > >> > >>Fix this by explicitly introducing a select queue method just for l2 forwarding > >>offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the > >>queue selecting and transmitting for l2 forwarding. > >> > >>With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need > >>to check txq against NULL in dev_hard_start_xmit(). > >> > >>In the future, it was also required for macvtap l2 forwarding support since it > >>provides a necessary synchronization method. > >> > >>Cc: John Fastabend <john.r.fastabend@intel.com> > >>Cc: Neil Horman <nhorman@tuxdriver.com> > >>Cc: e1000-devel@lists.sourceforge.net > >>Signed-off-by: Jason Wang <jasowang@redhat.com> > > > >Instead of creating another operation here to do special queue selection, why > >not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument > >list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? > >ndo_dfwd_start_xmit already knows which queue set to pick from (since their > >reserved for the device doing the transmitting). It seems more clear to me than > >creating a new netdevice operation. > > > >As for the crash issue, I'm not sure what you mean. Where in > >dev_hard_start_xmit would we need to check txq that we're not currently, and > >what crash results? > > > >Also, can you elaborate on what you mean by additional lock contention? What > >contention do you see that goes above and beyond the normal locking required by > >txq access? I suppose its extra locking above and beyond in the macvtap case, > >where you would otherwise never hit hardware, but that not the only use case, > >and I think the solution there is likely to add some code in the macvlan feature > >set handler so that NETIF_F_LLTX is cleared if you disable the hardware > >forwarding acceleration via ethtool. > > > > NETIF_F_LLTX is cleared in macvlan_open() which should be used in the > macvtap case. > Thats right, since accelerated hardware tx queue doesn't participate in the network stack queue locking, the upper device needs to do it. Thanks! Neil > if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) { > vlan->fwd_priv = > > lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev); > > /* If we get a NULL pointer back, or if we get an error > * then we should just fall through to the non > accelerated path > */ > if (IS_ERR_OR_NULL(vlan->fwd_priv)) { > vlan->fwd_priv = NULL; > } else { > dev->features &= ~NETIF_F_LLTX; > return 0; > } > } > > Thanks, > John > > > -- > John Fastabend Intel Corporation > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/06/2014 08:42 PM, Neil Horman wrote: > On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: >> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The >> will cause several issues: >> >> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock >> contention. >> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device >> watchdog >> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash >> when tso is disabled for lower device. >> >> Fix this by explicitly introducing a select queue method just for l2 forwarding >> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the >> queue selecting and transmitting for l2 forwarding. >> >> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need >> to check txq against NULL in dev_hard_start_xmit(). >> >> In the future, it was also required for macvtap l2 forwarding support since it >> provides a necessary synchronization method. >> >> Cc: John Fastabend <john.r.fastabend@intel.com> >> Cc: Neil Horman <nhorman@tuxdriver.com> >> Cc: e1000-devel@lists.sourceforge.net >> Signed-off-by: Jason Wang <jasowang@redhat.com> > Instead of creating another operation here to do special queue selection, why > not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument > list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? > ndo_dfwd_start_xmit already knows which queue set to pick from (since their > reserved for the device doing the transmitting). It seems more clear to me than > creating a new netdevice operation. See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless tx path"). The point is keep the tx path lockless to be efficient and simplicity for management. And macvtap multiqueue was also implemented with this assumption. The real contention should be done in the txq of lower device instead of macvlan itself. This is also needed for multiqueue macvtap. > > As for the crash issue, I'm not sure what you mean. Where in > dev_hard_start_xmit would we need to check txq that we're not currently, and > what crash results? Well, see current dev_hard_start_xmit(), if lower device does not support tso or tso is disabled, in gso path: gso: ... txq_trans_update(txq); if (unlikely(netif_xmit_stopped(txq) && skb->next)) There's an obvious NULL pointer dereference. > > Also, can you elaborate on what you mean by additional lock contention? If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the lock of device itself must be held before doing transmission. In the case, the macvlan txq lock contention is obvious unnecessary. > What > contention do you see that goes above and beyond the normal locking required by > txq access? As I said above, the point is keeping the lockess tx path and make the contention of txq for real device instead of macvlan itself. > I suppose its extra locking above and beyond in the macvtap case, > where you would otherwise never hit hardware, but that not the only use case, > and I think the solution there is likely to add some code in the macvlan feature > set handler so that NETIF_F_LLTX is cleared if you disable the hardware > forwarding acceleration via ethtool. > > Regards > Neil > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 1/5/2014 7:21 PM, Jason Wang wrote: > Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The > will cause several issues: > > - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock > contention. > - dev_hard_start_xmit() was called with NULL txq which bypasses the net device > watchdog > - dev_hard_start_xmit() does not check txq everywhere which will lead a crash > when tso is disabled for lower device. > > Fix this by explicitly introducing a select queue method just for l2 forwarding > offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the > queue selecting and transmitting for l2 forwarding. > > With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need > to check txq against NULL in dev_hard_start_xmit(). > > In the future, it was also required for macvtap l2 forwarding support since it > provides a necessary synchronization method. > > Cc: John Fastabend <john.r.fastabend@intel.com> > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: e1000-devel@lists.sourceforge.net > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- [...] > index 4fc1722..bc2b03f 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2538,6 +2538,32 @@ static inline int skb_needs_linearize(struct sk_buff *skb, > !(features & NETIF_F_SG))); > } > > +int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev, > + void *accel_priv) > +{ > + struct netdev_queue *txq; > + int ret = NETDEV_TX_BUSY; > + int index; > + > + BUG_ON(!dev->netdev_ops->ndo_dfwd_select_queue); > + index = dev->netdev_ops->ndo_dfwd_select_queue(dev, skb, > + accel_priv); > + > + local_bh_disable(); > + > + skb_set_queue_mapping(skb, index); How about replacing the index calculation and skb_set_queue_mapping with netdev_pick_tx(). Then we don't need to add a new op and the existing XPS, tx hash and select_queue() op works. > + txq = netdev_get_tx_queue(dev, index); > + > + HARD_TX_LOCK(dev, txq, smp_processor_id()); > + if (!netif_xmit_frozen_or_stopped(txq)) > + ret = dev_hard_start_xmit(skb, dev, txq, accel_priv); > + HARD_TX_UNLOCK(dev, txq); > + > + local_bh_enable(); > + return ret; > +} > +EXPORT_SYMBOL_GPL(dfwd_direct_xmit); > + > int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > struct netdev_queue *txq, void *accel_priv) > { > @@ -2611,7 +2637,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > rc = ops->ndo_start_xmit(skb, dev); > > trace_net_dev_xmit(skb, rc, dev, skb_len); > - if (rc == NETDEV_TX_OK && txq) > + if (rc == NETDEV_TX_OK) > txq_trans_update(txq); Removing the check here rather than adding more checks in the gso case as I suggested in the other thread seems cleaner. Thanks! John > return rc; > } > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[...] >> >> +int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev, >> + void *accel_priv) >> +{ >> + struct netdev_queue *txq; >> + int ret = NETDEV_TX_BUSY; >> + int index; >> + >> + BUG_ON(!dev->netdev_ops->ndo_dfwd_select_queue); >> + index = dev->netdev_ops->ndo_dfwd_select_queue(dev, skb, >> + accel_priv); >> + >> + local_bh_disable(); >> + >> + skb_set_queue_mapping(skb, index); > > How about replacing the index calculation and skb_set_queue_mapping with > netdev_pick_tx(). Then we don't need to add a new op and the existing > XPS, tx hash and select_queue() op works. > Sorry for the noise that was dumb it wouldn't work. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote: > On 01/06/2014 08:42 PM, Neil Horman wrote: > > On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: > >> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The > >> will cause several issues: > >> > >> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock > >> contention. > >> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device > >> watchdog > >> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash > >> when tso is disabled for lower device. > >> > >> Fix this by explicitly introducing a select queue method just for l2 forwarding > >> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the > >> queue selecting and transmitting for l2 forwarding. > >> > >> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need > >> to check txq against NULL in dev_hard_start_xmit(). > >> > >> In the future, it was also required for macvtap l2 forwarding support since it > >> provides a necessary synchronization method. > >> > >> Cc: John Fastabend <john.r.fastabend@intel.com> > >> Cc: Neil Horman <nhorman@tuxdriver.com> > >> Cc: e1000-devel@lists.sourceforge.net > >> Signed-off-by: Jason Wang <jasowang@redhat.com> > > Instead of creating another operation here to do special queue selection, why > > not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument > > list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? > > ndo_dfwd_start_xmit already knows which queue set to pick from (since their > > reserved for the device doing the transmitting). It seems more clear to me than > > creating a new netdevice operation. > > See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless > tx path"). The point is keep the tx path lockless to be efficient and > simplicity for management. And macvtap multiqueue was also implemented > with this assumption. The real contention should be done in the txq of > lower device instead of macvlan itself. This is also needed for > multiqueue macvtap. > > Ok, I see how you're preserving LLTX here, and thats great, but it doesn't really buy us anything that I can see. If a macvlan is using hardware acceleration, it needs to arbitrate access to that hardware. Weather thats done by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan itself is equivalent. The decision to use dfwd hardware acceleration is made on open, so its not like theres any traffic that can avoid the lock, as it all goes through the hardware. All I see that this has bought us is an extra net_device method (which isn't a big deal, but not necessecary as I see it). It seems like the right solution here is to use ethtool to disable the dfwd acceleration feature on the hardware if you don't want to incur the additional locking. > > As for the crash issue, I'm not sure what you mean. Where in > > dev_hard_start_xmit would we need to check txq that we're not currently, and > > what crash results? > > Well, see current dev_hard_start_xmit(), if lower device does not > support tso or tso is disabled, in gso path: > > gso: > ... > txq_trans_update(txq); > if (unlikely(netif_xmit_stopped(txq) && skb->next)) > > There's an obvious NULL pointer dereference. Yeah, I saw that after I wrote my note, sorry about that. However, it doesn't change what I said above. i don't think theres a need to add an additional select_queue method here, just add a pointer to a pointer arg to the dfwd xmit routine to fill in the queue that was selected on transmit. That should resolve all the potential NULL pointers without the need to ad additional methods to net_device_ops. > > > > Also, can you elaborate on what you mean by additional lock contention? > > If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the > lock of device itself must be held before doing transmission. In the > case, the macvlan txq lock contention is obvious unnecessary. Not in its current implementation. The lowerdevice's NETIF_F_LLTX are ignored during transmit because we're using a privately allocated set of queues assigned to the transmitting macvlan (i.e. there is not contention on the lowerdevice for any single given macvlan). Thats why we have to clear the NETIF_F_LLTX field on the macvlan itself. using dfwd acceleration is effectively giving a macvlan its own hardware, that needs arbitration between every context thats trying to transmit over it. That said, I did just notice that if the macvlan has multiple queues we will still use a single tx queue on transmit, we should map those queues to multiple hardware queues. I'll fix that shortly. > > What > > contention do you see that goes above and beyond the normal locking required by > > txq access? > > As I said above, the point is keeping the lockess tx path and make the > contention of txq for real device instead of macvlan itself. But you've not done that. You've just moved the locking down to the lowerdev, which is fine I suppose, but doesn't actually improve anything, since we will still lock exactly as many time as if we do the locking in the macvlan. Regards Neil > > I suppose its extra locking above and beyond in the macvtap case, > > where you would otherwise never hit hardware, but that not the only use case, > > and I think the solution there is likely to add some code in the macvlan feature > > set handler so that NETIF_F_LLTX is cleared if you disable the hardware > > forwarding acceleration via ethtool. > > > > Regards > > Neil > > > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/07/2014 09:17 PM, Neil Horman wrote: > On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote: >> On 01/06/2014 08:42 PM, Neil Horman wrote: >>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: >>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The >>>> will cause several issues: >>>> >>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock >>>> contention. >>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device >>>> watchdog >>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash >>>> when tso is disabled for lower device. >>>> >>>> Fix this by explicitly introducing a select queue method just for l2 forwarding >>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the >>>> queue selecting and transmitting for l2 forwarding. >>>> >>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need >>>> to check txq against NULL in dev_hard_start_xmit(). >>>> >>>> In the future, it was also required for macvtap l2 forwarding support since it >>>> provides a necessary synchronization method. >>>> >>>> Cc: John Fastabend <john.r.fastabend@intel.com> >>>> Cc: Neil Horman <nhorman@tuxdriver.com> >>>> Cc: e1000-devel@lists.sourceforge.net >>>> Signed-off-by: Jason Wang <jasowang@redhat.com> >>> Instead of creating another operation here to do special queue selection, why >>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument >>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? >>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their >>> reserved for the device doing the transmitting). It seems more clear to me than >>> creating a new netdevice operation. >> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless >> tx path"). The point is keep the tx path lockless to be efficient and >> simplicity for management. And macvtap multiqueue was also implemented >> with this assumption. The real contention should be done in the txq of >> lower device instead of macvlan itself. This is also needed for >> multiqueue macvtap. > Ok, I see how you're preserving LLTX here, and thats great, but it doesn't > really buy us anything that I can see. If a macvlan is using hardware > acceleration, it needs to arbitrate access to that hardware. Weather thats done > by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan > itself is equivalent. The decision to use dfwd hardware acceleration is made on > open, so its not like theres any traffic that can avoid the lock, as it all goes > through the hardware. All I see that this has bought us is an extra net_device > method (which isn't a big deal, but not necessecary as I see it). As I replied to patch 1/2, looking at the code itself again. The locking on the lowerdev's tx queue is really need since we need synchronize with other control path. Two examples are dev watchdog and ixgbe_down() both of which will try to hold tx lock to synchronize the with transmission. Without holding the lowerdev tx lock, we may have more serious issues. Also, it's a little strange for a net device has two modes. Future developers need to care about two different tx lock paths which is sub optimal. For the issue of an extra net_device method, if you don't like we can reuse the ndo_select_queue by also passing the accel_priv to that method. > > It seems like the right solution here is to use ethtool to disable the dfwd > acceleration feature on the hardware if you don't want to incur the additional > locking. > >>> As for the crash issue, I'm not sure what you mean. Where in >>> dev_hard_start_xmit would we need to check txq that we're not currently, and >>> what crash results? >> Well, see current dev_hard_start_xmit(), if lower device does not >> support tso or tso is disabled, in gso path: >> >> gso: >> ... >> txq_trans_update(txq); >> if (unlikely(netif_xmit_stopped(txq) && skb->next)) >> >> There's an obvious NULL pointer dereference. > Yeah, I saw that after I wrote my note, sorry about that. However, it doesn't > change what I said above. i don't think theres a need to add an additional > select_queue method here, just add a pointer to a pointer arg to the dfwd xmit > routine to fill in the queue that was selected on transmit. That should resolve > all the potential NULL pointers without the need to ad additional methods to > net_device_ops. > >>> Also, can you elaborate on what you mean by additional lock contention? >> If the lower device has NETIF_F_LLTX, then both macvlan txq lock and the >> lock of device itself must be held before doing transmission. In the >> case, the macvlan txq lock contention is obvious unnecessary. > Not in its current implementation. The lowerdevice's NETIF_F_LLTX are ignored > during transmit because we're using a privately allocated set of queues assigned > to the transmitting macvlan (i.e. there is not contention on the lowerdevice for > any single given macvlan). Thats why we have to clear the NETIF_F_LLTX field on > the macvlan itself. using dfwd acceleration is effectively giving a macvlan its > own hardware, that needs arbitration between every context thats trying to > transmit over it. > > > That said, I did just notice that if the macvlan has multiple queues we will > still use a single tx queue on transmit, we should map those queues to multiple > hardware queues. I'll fix that shortly. > >>> What >>> contention do you see that goes above and beyond the normal locking required by >>> txq access? >> As I said above, the point is keeping the lockess tx path and make the >> contention of txq for real device instead of macvlan itself. > But you've not done that. You've just moved the locking down to the lowerdev, > which is fine I suppose, but doesn't actually improve anything, since we will > still lock exactly as many time as if we do the locking in the macvlan. > > Regards > Neil > >>> I suppose its extra locking above and beyond in the macvtap case, >>> where you would otherwise never hit hardware, but that not the only use case, >>> and I think the solution there is likely to add some code in the macvlan feature >>> set handler so that NETIF_F_LLTX is cleared if you disable the hardware >>> forwarding acceleration via ethtool. >>> >>> Regards >>> Neil >>> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote: > On 01/07/2014 09:17 PM, Neil Horman wrote: > > On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote: > >> On 01/06/2014 08:42 PM, Neil Horman wrote: > >>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: > >>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The > >>>> will cause several issues: > >>>> > >>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock > >>>> contention. > >>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device > >>>> watchdog > >>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash > >>>> when tso is disabled for lower device. > >>>> > >>>> Fix this by explicitly introducing a select queue method just for l2 forwarding > >>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the > >>>> queue selecting and transmitting for l2 forwarding. > >>>> > >>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need > >>>> to check txq against NULL in dev_hard_start_xmit(). > >>>> > >>>> In the future, it was also required for macvtap l2 forwarding support since it > >>>> provides a necessary synchronization method. > >>>> > >>>> Cc: John Fastabend <john.r.fastabend@intel.com> > >>>> Cc: Neil Horman <nhorman@tuxdriver.com> > >>>> Cc: e1000-devel@lists.sourceforge.net > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com> > >>> Instead of creating another operation here to do special queue selection, why > >>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument > >>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? > >>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their > >>> reserved for the device doing the transmitting). It seems more clear to me than > >>> creating a new netdevice operation. > >> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless > >> tx path"). The point is keep the tx path lockless to be efficient and > >> simplicity for management. And macvtap multiqueue was also implemented > >> with this assumption. The real contention should be done in the txq of > >> lower device instead of macvlan itself. This is also needed for > >> multiqueue macvtap. > > Ok, I see how you're preserving LLTX here, and thats great, but it doesn't > > really buy us anything that I can see. If a macvlan is using hardware > > acceleration, it needs to arbitrate access to that hardware. Weather thats done > > by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan > > itself is equivalent. The decision to use dfwd hardware acceleration is made on > > open, so its not like theres any traffic that can avoid the lock, as it all goes > > through the hardware. All I see that this has bought us is an extra net_device > > method (which isn't a big deal, but not necessecary as I see it). > > As I replied to patch 1/2, looking at the code itself again. The locking > on the lowerdev's tx queue is really need since we need synchronize with > other control path. Two examples are dev watchdog and ixgbe_down() both > of which will try to hold tx lock to synchronize the with transmission. > Without holding the lowerdev tx lock, we may have more serious issues. > Also, it's a little strange for a net device has two modes. Future > developers need to care about two different tx lock paths which is sub > optimal. > Ok, having looked at this for a few hours, I agree, locking in the lowerdev has some definiate advantages in plugging the holes you've pointed out. > For the issue of an extra net_device method, if you don't like we can > reuse the ndo_select_queue by also passing the accel_priv to that method. I do, that actually simplifies things, since it lets us use the entire dev_hard_start_xmit path unmodified, which gives us the locking your looking for without having to create a new slimmed down variant of dev_hard_start_xmit. Regards Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/08/2014 10:40 PM, Neil Horman wrote: > On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote: >> On 01/07/2014 09:17 PM, Neil Horman wrote: >>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote: >>>> On 01/06/2014 08:42 PM, Neil Horman wrote: >>>>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: >>>>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The >>>>>> will cause several issues: >>>>>> >>>>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock >>>>>> contention. >>>>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device >>>>>> watchdog >>>>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash >>>>>> when tso is disabled for lower device. >>>>>> >>>>>> Fix this by explicitly introducing a select queue method just for l2 forwarding >>>>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the >>>>>> queue selecting and transmitting for l2 forwarding. >>>>>> >>>>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need >>>>>> to check txq against NULL in dev_hard_start_xmit(). >>>>>> >>>>>> In the future, it was also required for macvtap l2 forwarding support since it >>>>>> provides a necessary synchronization method. >>>>>> >>>>>> Cc: John Fastabend <john.r.fastabend@intel.com> >>>>>> Cc: Neil Horman <nhorman@tuxdriver.com> >>>>>> Cc: e1000-devel@lists.sourceforge.net >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com> >>>>> Instead of creating another operation here to do special queue selection, why >>>>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument >>>>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? >>>>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their >>>>> reserved for the device doing the transmitting). It seems more clear to me than >>>>> creating a new netdevice operation. >>>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless >>>> tx path"). The point is keep the tx path lockless to be efficient and >>>> simplicity for management. And macvtap multiqueue was also implemented >>>> with this assumption. The real contention should be done in the txq of >>>> lower device instead of macvlan itself. This is also needed for >>>> multiqueue macvtap. >>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't >>> really buy us anything that I can see. If a macvlan is using hardware >>> acceleration, it needs to arbitrate access to that hardware. Weather thats done >>> by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan >>> itself is equivalent. The decision to use dfwd hardware acceleration is made on >>> open, so its not like theres any traffic that can avoid the lock, as it all goes >>> through the hardware. All I see that this has bought us is an extra net_device >>> method (which isn't a big deal, but not necessecary as I see it). >> As I replied to patch 1/2, looking at the code itself again. The locking >> on the lowerdev's tx queue is really need since we need synchronize with >> other control path. Two examples are dev watchdog and ixgbe_down() both >> of which will try to hold tx lock to synchronize the with transmission. >> Without holding the lowerdev tx lock, we may have more serious issues. >> Also, it's a little strange for a net device has two modes. Future >> developers need to care about two different tx lock paths which is sub >> optimal. >> > Ok, having looked at this for a few hours, I agree, locking in the lowerdev has > some definiate advantages in plugging the holes you've pointed out. > >> For the issue of an extra net_device method, if you don't like we can >> reuse the ndo_select_queue by also passing the accel_priv to that method. > I do, that actually simplifies things, since it lets us use the entire > dev_hard_start_xmit path unmodified, which gives us the locking your looking for > without having to create a new slimmed down variant of dev_hard_start_xmit. > > Regards > Neil Right, will post V2. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 09, 2014 at 04:28:49PM +0800, Jason Wang wrote: > On 01/08/2014 10:40 PM, Neil Horman wrote: > > On Wed, Jan 08, 2014 at 11:21:21AM +0800, Jason Wang wrote: > >> On 01/07/2014 09:17 PM, Neil Horman wrote: > >>> On Tue, Jan 07, 2014 at 11:42:24AM +0800, Jason Wang wrote: > >>>> On 01/06/2014 08:42 PM, Neil Horman wrote: > >>>>> On Mon, Jan 06, 2014 at 11:21:07AM +0800, Jason Wang wrote: > >>>>>> Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The > >>>>>> will cause several issues: > >>>>>> > >>>>>> - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock > >>>>>> contention. > >>>>>> - dev_hard_start_xmit() was called with NULL txq which bypasses the net device > >>>>>> watchdog > >>>>>> - dev_hard_start_xmit() does not check txq everywhere which will lead a crash > >>>>>> when tso is disabled for lower device. > >>>>>> > >>>>>> Fix this by explicitly introducing a select queue method just for l2 forwarding > >>>>>> offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the > >>>>>> queue selecting and transmitting for l2 forwarding. > >>>>>> > >>>>>> With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need > >>>>>> to check txq against NULL in dev_hard_start_xmit(). > >>>>>> > >>>>>> In the future, it was also required for macvtap l2 forwarding support since it > >>>>>> provides a necessary synchronization method. > >>>>>> > >>>>>> Cc: John Fastabend <john.r.fastabend@intel.com> > >>>>>> Cc: Neil Horman <nhorman@tuxdriver.com> > >>>>>> Cc: e1000-devel@lists.sourceforge.net > >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com> > >>>>> Instead of creating another operation here to do special queue selection, why > >>>>> not just have ndo_dfwd_start_xmit include a pointer to a pointer in its argument > >>>>> list, so it can pass the txq it used back to the caller (dev_hard_start_xmit)? > >>>>> ndo_dfwd_start_xmit already knows which queue set to pick from (since their > >>>>> reserved for the device doing the transmitting). It seems more clear to me than > >>>>> creating a new netdevice operation. > >>>> See commit 8ffab51b3dfc54876f145f15b351c41f3f703195 ("macvlan: lockless > >>>> tx path"). The point is keep the tx path lockless to be efficient and > >>>> simplicity for management. And macvtap multiqueue was also implemented > >>>> with this assumption. The real contention should be done in the txq of > >>>> lower device instead of macvlan itself. This is also needed for > >>>> multiqueue macvtap. > >>> Ok, I see how you're preserving LLTX here, and thats great, but it doesn't > >>> really buy us anything that I can see. If a macvlan is using hardware > >>> acceleration, it needs to arbitrate access to that hardware. Weather thats done > >>> by locking the lowerdev's tx queue lock or by enforcing locking on the macvlan > >>> itself is equivalent. The decision to use dfwd hardware acceleration is made on > >>> open, so its not like theres any traffic that can avoid the lock, as it all goes > >>> through the hardware. All I see that this has bought us is an extra net_device > >>> method (which isn't a big deal, but not necessecary as I see it). > >> As I replied to patch 1/2, looking at the code itself again. The locking > >> on the lowerdev's tx queue is really need since we need synchronize with > >> other control path. Two examples are dev watchdog and ixgbe_down() both > >> of which will try to hold tx lock to synchronize the with transmission. > >> Without holding the lowerdev tx lock, we may have more serious issues. > >> Also, it's a little strange for a net device has two modes. Future > >> developers need to care about two different tx lock paths which is sub > >> optimal. > >> > > Ok, having looked at this for a few hours, I agree, locking in the lowerdev has > > some definiate advantages in plugging the holes you've pointed out. > > > >> For the issue of an extra net_device method, if you don't like we can > >> reuse the ndo_select_queue by also passing the accel_priv to that method. > > I do, that actually simplifies things, since it lets us use the entire > > dev_hard_start_xmit path unmodified, which gives us the locking your looking for > > without having to create a new slimmed down variant of dev_hard_start_xmit. > > > > Regards > > Neil > > Right, will post V2. > Thanks Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index cc06854..ee71cf7 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -7629,16 +7629,20 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv) kfree(fwd_adapter); } +static u16 ixgbe_fwd_select_queue(struct net_device *dev, struct sk_buff *skb, + void *priv) +{ + struct ixgbe_fwd_adapter *fwd_adapter = priv; + return skb->queue_mapping + fwd_adapter->tx_base_queue; +} + static netdev_tx_t ixgbe_fwd_xmit(struct sk_buff *skb, struct net_device *dev, void *priv) { struct ixgbe_fwd_adapter *fwd_adapter = priv; - unsigned int queue; - struct ixgbe_ring *tx_ring; - - queue = skb->queue_mapping + fwd_adapter->tx_base_queue; - tx_ring = fwd_adapter->real_adapter->tx_ring[queue]; + struct ixgbe_ring *tx_ring = + fwd_adapter->real_adapter->tx_ring[skb->queue_mapping]; return __ixgbe_xmit_frame(skb, dev, tx_ring); } @@ -7689,6 +7693,7 @@ static const struct net_device_ops ixgbe_netdev_ops = { .ndo_bridge_getlink = ixgbe_ndo_bridge_getlink, .ndo_dfwd_add_station = ixgbe_fwd_add, .ndo_dfwd_del_station = ixgbe_fwd_del, + .ndo_dfwd_select_queue = ixgbe_fwd_select_queue, .ndo_dfwd_start_xmit = ixgbe_fwd_xmit, }; diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 5360f73..2cbbce3 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -299,7 +299,7 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb, if (vlan->fwd_priv) { skb->dev = vlan->lowerdev; - ret = dev_hard_start_xmit(skb, skb->dev, NULL, vlan->fwd_priv); + ret = dfwd_direct_xmit(skb, skb->dev, vlan->fwd_priv); } else { ret = macvlan_queue_xmit(skb, dev); } @@ -366,7 +366,6 @@ static int macvlan_open(struct net_device *dev) if (IS_ERR_OR_NULL(vlan->fwd_priv)) { vlan->fwd_priv = NULL; } else { - dev->features &= ~NETIF_F_LLTX; return 0; } } diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d9a550b..dbfd476 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -975,6 +975,11 @@ struct netdev_phys_port_id { * by 'ndo_dfwd_add_station'. 'pdev' is the net device backing * the station and priv is the structure returned by the add * operation. + * u16 (*ndo_dfwd_select_queue)(struct net_device *dev, + * struct sk_buff *skb, + * void *priv); + * Called to decide which queue to xmit over the accelerated station when + * device supports multiple transmit queues. * netdev_tx_t (*ndo_dfwd_start_xmit)(struct sk_buff *skb, * struct net_device *dev, * void *priv); @@ -1123,6 +1128,10 @@ struct net_device_ops { void (*ndo_dfwd_del_station)(struct net_device *pdev, void *priv); + u16 (*ndo_dfwd_select_queue)(struct net_device *dev, + struct sk_buff *skb, + void *priv); + netdev_tx_t (*ndo_dfwd_start_xmit) (struct sk_buff *skb, struct net_device *dev, void *priv); @@ -2416,6 +2425,8 @@ int dev_set_mac_address(struct net_device *, struct sockaddr *); int dev_change_carrier(struct net_device *, bool new_carrier); int dev_get_phys_port_id(struct net_device *dev, struct netdev_phys_port_id *ppid); +int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev, + void *accel_priv); int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, void *accel_priv); int dev_forward_skb(struct net_device *dev, struct sk_buff *skb); diff --git a/net/core/dev.c b/net/core/dev.c index 4fc1722..bc2b03f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2538,6 +2538,32 @@ static inline int skb_needs_linearize(struct sk_buff *skb, !(features & NETIF_F_SG))); } +int dfwd_direct_xmit(struct sk_buff *skb, struct net_device *dev, + void *accel_priv) +{ + struct netdev_queue *txq; + int ret = NETDEV_TX_BUSY; + int index; + + BUG_ON(!dev->netdev_ops->ndo_dfwd_select_queue); + index = dev->netdev_ops->ndo_dfwd_select_queue(dev, skb, + accel_priv); + + local_bh_disable(); + + skb_set_queue_mapping(skb, index); + txq = netdev_get_tx_queue(dev, index); + + HARD_TX_LOCK(dev, txq, smp_processor_id()); + if (!netif_xmit_frozen_or_stopped(txq)) + ret = dev_hard_start_xmit(skb, dev, txq, accel_priv); + HARD_TX_UNLOCK(dev, txq); + + local_bh_enable(); + return ret; +} +EXPORT_SYMBOL_GPL(dfwd_direct_xmit); + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, void *accel_priv) { @@ -2611,7 +2637,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, rc = ops->ndo_start_xmit(skb, dev); trace_net_dev_xmit(skb, rc, dev, skb_len); - if (rc == NETDEV_TX_OK && txq) + if (rc == NETDEV_TX_OK) txq_trans_update(txq); return rc; }
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX was forced for macvlan device in this case which lead extra lock contention. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a select queue method just for l2 forwarding offload (ndo_dfwd_select_queue), and introducing dfwd_direct_xmit() to do the queue selecting and transmitting for l2 forwarding. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 15 +++++++++---- drivers/net/macvlan.c | 3 +- include/linux/netdevice.h | 11 +++++++++ net/core/dev.c | 28 ++++++++++++++++++++++++- 4 files changed, 49 insertions(+), 8 deletions(-)