Message ID | 20140812143531.GJ2404@oracle.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 12 Aug 2014 10:35:31 -0400

> @@ -489,28 +491,13 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
>  	return 0;
>  }
>
> -static void maybe_tx_wakeup(struct vnet *vp)
> +static void maybe_tx_wakeup(unsigned long param)
>  {
> +	struct vnet *vp = (struct vnet *)param;
>  	struct net_device *dev = vp->dev;
> ...
> -	}
> +	vnet_tx_timeout(dev);
>  	netif_tx_unlock(dev);
>  }
>
> @@ -764,7 +756,25 @@ out_dropped:
>
>  static void vnet_tx_timeout(struct net_device *dev)
>  {
> -	/* XXX Implement me XXX */
> +	struct vnet *vp = netdev_priv(dev);
> +

Please do not usurp vnet_tx_timeout() to handle queue waking.

It is hooked up to netdev_ops->ndo_tx_timeout(), which is invoked when
the qdisc watchdog triggers after dev->watchdog_timeout time elapses
after the queue is stopped without a subsequent wake.

This method is therefore executed after a hard timeout, meaning that
the device should probably be reset, whereas maybe_tx_wakeup() is
normal processing meant to drive the engine of TX flow control.

In the context of sunvnet, vnet_tx_timeout() should probably reset the
LDC channel(s) of the unresponsive peer(s).

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please do not usurp vnet_tx_timeout() to handle queue waking.
>
> It is hooked up to netdev_ops->ndo_tx_timeout() which is invoked when
> the qdisc watchdog triggers after dev->watchdog_timeout time elapses
> after the queue is stopped without a subsequent wake.
>
> This method is therefore executed after a hard timeout, meaning that
> the device should probably be reset, whereas maybe_tx_wakeup() is
> normal processing meant to drive the engine of TX flow control.
>
> In the context of sunvnet, vnet_tx_timeout() should probably reset the
> LDC channel(s) of the unresponsive peer(s).

yes, after a couple of days of reflection on the lingering issues in
http://www.spinics.net/lists/sparclinux/msg12360.html
I too was starting to arrive at similar conclusions about
maybe_tx_wakeup vs vnet_tx_timeout.

First, case 2 in the "todo" list (cannot send DRING_STOPPED):
if we cannot send a vnet_send_ack() we should invoke ldc_disconnect()
for this peer. The ldc_disconnect() will reset hs_state and flags
in the ldc_channel, so subsequent attempts to ldc_write()
(e.g., from vio_ldc_send()) will return -ENOTCONN. So, for
that case, a subsequent vnet_start_xmit() would *not* netif_stop_queue(),
but just drop the packet, reset the dring, and keep driving.

To recover the disconnected channel, the admin would need to (ldm)
unbind/bind the offender, identifiable by their mac address.
And none of the other channels should be affected (assuming
inter-vnet-links is set to on).

So for case 1 we could do something very similar - I haven't tried
this yet, but here's what I'm thinking: vnet_start_xmit() should not
do a netif_stop_queue. Instead it should do a (new function)
"vio_stop_queue()" which sets some (new) VIO_DR_FLOW_CONTROLLED state
in the vio_driver_state (or perhaps some flag bit in the ldc_channel?)
and also marks a (new) boolean is_flow_controlled state variable in
the struct vnet as TRUE.

The modified maybe_tx_wakeup would check is_flow_controlled on the
vnet, and call a new vnet_wake_queue() to reset VIO_DR_FLOW_CONTROLLED
if appropriate.

vnet_start_xmit should drop packets as long as VIO_DR_FLOW_CONTROLLED
is set.

The above sketch has a deficiency in that it doesn't avoid packet
drops at the net_device layer itself - packets that can't make it out
because of FLOW_CONTROLLED will get dropped - I think the Qdisc code
path deals better with the back-pressure. I don't know if there is a
way to leverage that path and still avoid the head-of-line blocking
due to one flow-controlled ldc_channel.

I'm a little hesitant to ldc_disconnect a peer from the
vnet_start_xmit path - if we have a super-efficient Tx path, we don't
want to reset things merely because the receiver can't keep up - we
just want to flow-control the Tx? Whereas the inability to transmit a
DRING_STOPPED from the Rx side is likely to be a more unusual
situation (it would need the peer to send a burst of packets and then
not be prepared to do even a single ldc_read()!)?

thoughts?

--Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 12 Aug 2014 21:58:17 -0400

>> Please do not usurp vnet_tx_timeout() to handle queue waking.
>>
>> It is hooked up to netdev_ops->ndo_tx_timeout() which is invoked when
>> the qdisc watchdog triggers after dev->watchdog_timeout time elapses
>> after the queue is stopped without a subsequent wake.
>>
>> This method is therefore executed after a hard timeout, meaning that
>> the device should probably be reset, whereas maybe_tx_wakeup() is
>> normal processing meant to drive the engine of TX flow control.
>>
>> In the context of sunvnet, vnet_tx_timeout() should probably reset the
>> LDC channel(s) of the unresponsive peer(s).
>
> yes, after a couple of days of reflection on the lingering issues in
> http://www.spinics.net/lists/sparclinux/msg12360.html
> I too was starting to arrive at similar conclusions about
> maybe_tx_wakeup vs vnet_tx_timeout.

However, don't get ahead of yourself. All I'm saying in the context
of reviewing this patch #3 is that you should leave vnet_tx_timeout()
alone and just put what you were putting into vnet_tx_timeout() into
maybe_tx_wakeup().

> First, case 2 in the "todo" list (cannot send DRING_STOPPED):
> if we cannot send a vnet_send_ack() we should invoke ldc_disconnect()
> for this peer. The ldc_disconnect() will reset hs_state and flags
> in the ldc_channel, so subsequent attempts to ldc_write()
> (e.g., from vio_ldc_send()) will return -ENOTCONN. So, for
> that case, a subsequent vnet_start_xmit() would *not* netif_stop_queue(),
> but just drop the packet, reset the dring, and keep driving.
>
> To recover the disconnected channel, the admin would need to (ldm)
> unbind/bind the offender, identifiable by their mac address.
> And none of the other channels should be affected (assuming
> inter-vnet-links is set to on).

This doesn't match my understanding.

My understanding is that when an LDC connection resets, the peer on
the other side should automatically try to re-bind or do whatever is
necessary to re-establish the LDC link.

It's supposed to be fault tolerant in as many situations as possible
without the need for explicit administrator intervention.

But besides that, an LDC reset is a big hammer. So I would say that it
should be deferred to the last possible point.

So initially we do the backoff spinning loop synchronously. If that
doesn't succeed, we schedule a workqueue that can poll the LDC channel
some more trying to do the write, in a sleepable context (so a loop
with delays and preemption points) until a much larger timeout. Only
at this second timeout do we declare things to be serious enough for
an LDC channel reset.

> So for case 1 we could do something very similar - I haven't tried
> this yet, but here's what I'm thinking: vnet_start_xmit() should not
> do a netif_stop_queue. Instead it should do a (new function)
> "vio_stop_queue()" which sets some (new) VIO_DR_FLOW_CONTROLLED state
> in the vio_driver_state (or perhaps some flag bit in the ldc_channel?)
> and also marks a (new) boolean is_flow_controlled state variable in
> the struct vnet as TRUE.
>
> The modified maybe_tx_wakeup would check is_flow_controlled on the
> vnet, and call a new vnet_wake_queue() to reset VIO_DR_FLOW_CONTROLLED
> if appropriate.
>
> vnet_start_xmit should drop packets as long as VIO_DR_FLOW_CONTROLLED
> is set.

Again, as I stated yesterday, you really cannot do this. Head of line
blocking when one peer gets wedged is absolutely necessary.

If a packet for a blocked peer is pending, you have to make the other
guys wait until either the blocked peer gets unstuck or you reset the
blocked peer's LDC channel on a hard timeout.

> I don't know if there is a way to leverage that path and still avoid
> the head-of-line blocking due to one flow-controlled ldc_channel.

Again, I don't think the semantics resulting from head-of-line
blocking avoidance are acceptable.
On Aug 12, 2014, at 6:58 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

>> Please do not usurp vnet_tx_timeout() to handle queue waking.
>>
>> It is hooked up to netdev_ops->ndo_tx_timeout() which is invoked when
>> the qdisc watchdog triggers after dev->watchdog_timeout time elapses
>> after the queue is stopped without a subsequent wake.
>>
>> This method is therefore executed after a hard timeout, meaning that
>> the device should probably be reset, whereas maybe_tx_wakeup() is
>> normal processing meant to drive the engine of TX flow control.
>>
>> In the context of sunvnet, vnet_tx_timeout() should probably reset the
>> LDC channel(s) of the unresponsive peer(s).
>
> yes, after a couple of days of reflection on the lingering issues in
> http://www.spinics.net/lists/sparclinux/msg12360.html
> I too was starting to arrive at similar conclusions about
> maybe_tx_wakeup vs vnet_tx_timeout.
>
> First, case 2 in the "todo" list (cannot send DRING_STOPPED):
> if we cannot send a vnet_send_ack() we should invoke ldc_disconnect()
> for this peer. The ldc_disconnect() will reset hs_state and flags
> in the ldc_channel, so subsequent attempts to ldc_write()
> (e.g., from vio_ldc_send()) will return -ENOTCONN. So, for
> that case, a subsequent vnet_start_xmit() would *not* netif_stop_queue(),
> but just drop the packet, reset the dring, and keep driving.
>
> To recover the disconnected channel, the admin would need to (ldm)
> unbind/bind the offender, identifiable by their mac address.
> And none of the other channels should be affected (assuming
> inter-vnet-links is set to on).

We do not consider a peer which couldn't process LDC messages an
offender, just a peer in a broken state. A peer could get into this
type of situation for many different reasons, for example a hang or
lockup in entirely unrelated code, being stuck in a debugger, or a bug
in the sunvnet driver itself. Such a guest should be able to recover
when it is rebooted.

Requiring an unbind/bind is not considered a solution; we don't expect
admins to perform unbind/bind unless they are really trying to destroy
the domain. Note that an unbind of a domain causes the vnet device
ports in all of the peer domains to be removed, and the bind brings
them back, which is a heavy hammer.

What we really want is an LDC reset which puts the LDC state in a
clean state to allow the peer to restart the handshake and establish
communication again. This can be accomplished either by rebooting the
peer or by unloading/loading the sunvnet driver in the peer domain. It
could also recover automatically. You can see this as something
similar to resetting a NIC when the NIC h/w is locked up.

From what I read of the ldc_disconnect() code, it seems like it is
reconfiguring the queues, so I assume a reset event would re-enable
the communication fine, but I don't know for sure without testing this
specific condition.

> So for case 1 we could do something very similar - I haven't tried
> this yet, but here's what I'm thinking: vnet_start_xmit() should not
> do a netif_stop_queue. Instead it should do a (new function)
> "vio_stop_queue()" which sets some (new) VIO_DR_FLOW_CONTROLLED state
> in the vio_driver_state (or perhaps some flag bit in the ldc_channel?)
> and also marks a (new) boolean is_flow_controlled state variable in
> the struct vnet as TRUE.
>
> The modified maybe_tx_wakeup would check is_flow_controlled on the
> vnet, and call a new vnet_wake_queue() to reset VIO_DR_FLOW_CONTROLLED
> if appropriate.
>
> vnet_start_xmit should drop packets as long as VIO_DR_FLOW_CONTROLLED
> is set.

One important point to keep in mind is that packets to different peers
shouldn't be blocked by one blocked peer. Any flow controlling or
dropping of the packets needs to be done on a per-port basis.

-Raghuram

> The above sketch has a deficiency in that it doesn't avoid packet
> drops at the net_device layer itself - packets that can't make it out
> because of FLOW_CONTROLLED will get dropped - I think the Qdisc code
> path deals better with the back-pressure. I don't know if there is a
> way to leverage that path and still avoid the head-of-line blocking
> due to one flow-controlled ldc_channel.
>
> I'm a little hesitant to ldc_disconnect a peer from the
> vnet_start_xmit path - if we have a super-efficient Tx path, we don't
> want to reset things merely because the receiver can't keep up - we
> just want to flow-control the Tx? Whereas the inability to transmit a
> DRING_STOPPED from the Rx side is likely to be a more unusual
> situation (it would need the peer to send a burst of packets and then
> not be prepared to do even a single ldc_read()!)?
>
> thoughts?
>
> --Sowmini
From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
Date: Tue, 12 Aug 2014 21:26:28 -0700

> One important point to keep in mind is that packets to different peers
> shouldn't be blocked by one blocked peer. Any flow controlling or
> dropping of the packets needs to be done on a per-port basis.

Until we use a big hammer and reset the LDC channel of that peer, we
_absolutely_ should not reorder traffic and send to non-stuck peers.

A networking stack should avoid packet reordering in all cases when it
has the ability to do so, and sending packets which came in later to
unstuck peers when there are older packets still pending to the stuck
peer violates that requirement.
On Aug 12, 2014, at 9:31 PM, David Miller <davem@davemloft.net> wrote:

> From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
> Date: Tue, 12 Aug 2014 21:26:28 -0700
>
>> One important point to keep in mind is that packets to different peers
>> shouldn't be blocked by one blocked peer. Any flow controlling or
>> dropping of the packets needs to be done on a per-port basis.
>
> Until we use a big hammer and reset the LDC channel of that peer, we
> _absolutely_ should not reorder traffic and send to non-stuck peers.

The packet ordering requirement applies only to packets destined to a
specific destination. The packets to different destinations can flow
as long as they do not cause ordering issues. In case of sunvnet, each
peer except the switch-port would be for a specific destination, so
sending packets to other ports would not result in re-ordering of the
packets.

Even if the peer-to-peer LDC is the one that is blocked, disconnecting
that LDC channel and sending the packets via the switch-port would not
be expected to change the order of the packets, as the LDC reset is
expected to drop all packets in the ring so far.

-Raghuram

> A networking stack should avoid packet reordering in all cases when it
> has the ability to do so, and sending packets which came in later to
> unstuck peers when there are older packets still pending to the stuck
> peer violates that requirement.
From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
Date: Tue, 12 Aug 2014 22:14:09 -0700

> On Aug 12, 2014, at 9:31 PM, David Miller <davem@davemloft.net> wrote:
>
>> Until we use a big hammer and reset the LDC channel of that peer, we
>> _absolutely_ should not reorder traffic and send to non-stuck peers.
>
> The packet ordering requirement applies only to packets destined to a
> specific destination. The packets to different destinations can flow
> as long as they do not cause ordering issues. In case of sunvnet, each
> peer except the switch-port would be for a specific destination, so
> sending packets to other ports would not result in re-ordering of the
> packets.
>
> Even if the peer-to-peer LDC is the one that is blocked, disconnecting
> that LDC channel and sending the packets via the switch-port would not
> be expected to change the order of the packets, as the LDC reset is
> expected to drop all packets in the ring so far.

Ok, that makes sense.

So the question is how to manage this on the driver side, and the most
natural way I see to do this would be to use multiple TX netdev queues
and a custom netdev_ops->ndo_select_queue() method which selects the
queue based upon the peer that would be selected.
On Aug 12, 2014, at 10:29 PM, David Miller <davem@davemloft.net> wrote:

> From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
> Date: Tue, 12 Aug 2014 22:14:09 -0700
>
>> The packet ordering requirement applies only to packets destined to a
>> specific destination. The packets to different destinations can flow
>> as long as they do not cause ordering issues. In case of sunvnet, each
>> peer except the switch-port would be for a specific destination, so
>> sending packets to other ports would not result in re-ordering of the
>> packets.
>>
>> Even if the peer-to-peer LDC is the one that is blocked, disconnecting
>> that LDC channel and sending the packets via the switch-port would not
>> be expected to change the order of the packets, as the LDC reset is
>> expected to drop all packets in the ring so far.
>
> Ok, that makes sense.
>
> So the question is how to manage this on the driver side, and the most
> natural way I see to do this would be to use multiple TX netdev queues
> and a custom netdev_ops->ndo_select_queue() method which selects the
> queue based upon the peer that would be selected.

Thanks, if there is a method to accomplish this with multiple Tx
netdev queues, that would be wonderful. We will research the custom
ndo_select_queue() method to direct traffic automatically at the
network stack level.

We probably also need to look for methods to increase parallelism to
boost performance; I assume multiple Tx queues would be a method to
accomplish that as well.

-Raghuram
From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
Date: Tue, 12 Aug 2014 22:44:27 -0700

> Thanks, if there is a method to accomplish this with multiple Tx
> netdev queues, that would be wonderful. We will research the custom
> ndo_select_queue() method to direct traffic automatically at the
> network stack level.
>
> We probably also need to look for methods to increase parallelism to
> boost performance; I assume multiple Tx queues would be a method to
> accomplish that as well.

It will introduce parallelism only when there is traffic to multiple
peers at the same time.
On Aug 12, 2014, at 11:11 PM, David Miller <davem@davemloft.net> wrote:

> From: Raghuram Kothakota <Raghuram.Kothakota@oracle.com>
> Date: Tue, 12 Aug 2014 22:44:27 -0700
>
>> Thanks, if there is a method to accomplish this with multiple Tx
>> netdev queues, that would be wonderful. We will research the custom
>> ndo_select_queue() method to direct traffic automatically at the
>> network stack level.
>> We probably also need to look for methods to increase parallelism to
>> boost performance; I assume multiple Tx queues would be a method to
>> accomplish that as well.
>
> It will introduce parallelism only when there is traffic to multiple
> peers at the same time.

Yes, I should not have related these two things, especially without
proper experimentation. In other implementations, we have used
techniques such as multiple Tx queues to increase parallelism, but
path selection is done within the driver to get the benefit of the
parallelism for all peers. We need to see what method suits sunvnet
best; we hope to know more once we investigate all available methods.

In case of a custom ndo_select_queue() method, we need to see if there
are methods to deal with ports being dynamically added/removed too.

-Raghuram
On (08/12/14 21:26), Raghuram Kothakota wrote:

> We do not consider a peer which couldn't process LDC messages an
> offender, just a peer in a broken state. A peer could get

So I was using "offender" in the same way that one typically talks
about sending ICMP errors for "offending IP packets" (nothing personal
:-)). I don't want to quibble over terminology here.

> From what I read of the ldc_disconnect() code, it seems like it is
> reconfiguring the queues, so I assume a reset event would re-enable
> the communication fine, but I don't know for sure without testing
> this specific condition.

No such events were generated by the code when I tested it.

I'll send a consolidated reply to the rest of that very long thread
in a bit.

--Sowmini
On (08/12/14 21:25), David Miller wrote:

> However, don't get ahead of yourself. All I'm saying in the context
> of reviewing this patch #3 is that you should leave vnet_tx_timeout()
> alone and just put what you were putting into vnet_tx_timeout() into
> maybe_tx_wakeup().

And just leave vnet_tx_timeout() as before (i.e., "XXX Implement
me")? Sure, I can generate that (as a stand-alone unified-diff
patch?); it would be the minimum fix needed to avoid the deadlock on
netif_tx_lock() itself.

> This doesn't match my understanding. My understanding is that when
> an LDC connection resets, the peer on the other side should
> automatically try to re-bind or do whatever is necessary to
> re-establish the LDC link.
>
> It's supposed to be fault tolerant in as many situations as possible
> without the need for explicit administrator intervention.

Ideally, yes. But that's not what I was observing when I tested this.

> But besides that, an LDC reset is a big hammer. So I would say that it
> should be deferred to the last possible point.

Correct, and that's why it feels ok (to me) to do this when the
DRING_STOPPED fails (so that we also fail subsequent ldc_writes from
vnet_start_xmit()) but not in any other case. Note that, as I pointed
out to Raghuram already, doing ldc_disconnect does *not* send any
reset events. The code may have other missing functionality.

> So initially we do the backoff spinning loop synchronously. If that
> doesn't succeed, we schedule a workqueue that can poll the LDC channel
> some more trying to do the write, in a sleepable context (so a loop
> with delays and preemption points) until a much larger timeout. Only
> at this second timeout do we declare things to be serious enough for
> an LDC channel reset.
>
>> So for case 1 we could do something very similar - I haven't tried
>> this yet, but here's what I'm thinking: vnet_start_xmit() should not
>> do a netif_stop_queue. Instead it should do a (new function)
>> "vio_stop_queue()" which sets some (new) VIO_DR_FLOW_CONTROLLED state
>> in the vio_driver_state (or perhaps some flag bit in the ldc_channel?)
>> and also marks a (new) boolean is_flow_controlled state variable in
>> the struct vnet as TRUE.
>>
>> The modified maybe_tx_wakeup would check is_flow_controlled on the
>> vnet, and call a new vnet_wake_queue() to reset VIO_DR_FLOW_CONTROLLED
>> if appropriate.
>>
>> vnet_start_xmit should drop packets as long as VIO_DR_FLOW_CONTROLLED
>> is set.
>
> Again, as I stated yesterday, you really cannot do this. Head of line
> blocking when one peer gets wedged is absolutely necessary.

In the light of the subsequent discussion about head-of-line blocking
(and its applicability per ldc_channel, instead of per net_device), do
we still think the above has issues, or is otherwise not worth doing?

Further in the thread, David Miller wrote:

> So the question is how to manage this on the driver side, and the most
> natural way I see to do this would be to use multiple TX netdev queues
> and a custom netdev_ops->ndo_select_queue() method which selects the
> queue based upon the peer that would be selected.

This might work well for the current sunvnet model (and even there, it
has limitations - if all the traffic is going out via the switchport
to a gateway, you are again back to the head-of-line blocking model).

But how generally useful is this model? For point-to-multipoint links
like ethernet, I don't think you actually track one channel per peer.

--Sowmini
On Aug 13, 2014, at 3:51 AM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

> On (08/12/14 21:26), Raghuram Kothakota wrote:
>>
>> We do not consider a peer which couldn't process LDC messages an
>> offender, just a peer in a broken state. A peer could get
>
> So I was using "offender" in the same way that one typically talks
> about sending ICMP errors for "offending IP packets" (nothing personal
> :-)). I don't want to quibble over terminology here.
>
>> From what I read of the ldc_disconnect() code, it seems like it is
>> reconfiguring the queues, so I assume a reset event would re-enable
>> the communication fine, but I don't know for sure without testing
>> this specific condition.
>
> No such events were generated by the code when I tested it.

The reset event is expected for the peer; if it is not happening then
we probably need to check why.

-Raghuram

> I'll send a consolidated reply to the rest of that very long thread
> in a bit.
>
> --Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Wed, 13 Aug 2014 07:20:10 -0400

> On (08/12/14 21:25), David Miller wrote:
>>
>> However, don't get ahead of yourself. All I'm saying in the context
>> of reviewing this patch #3 is that you should leave vnet_tx_timeout()
>> alone and just put what you were putting into vnet_tx_timeout() into
>> maybe_tx_wakeup().
>
> And just leave vnet_tx_timeout() as before (i.e., "XXX Implement me")?

Yes, and I see this is what you have done.

> Note that, as I pointed out to Raghuram already, doing ldc_disconnect
> does *not* send any reset events. The code may have other missing
> functionality.

For the caller of ldc_disconnect() the reset is implied, if this is
what you are talking about. I don't see any value in generating an
event for the user of the LDC channel who itself generated it.

>> So the question is how to manage this on the driver side, and the most
>> natural way I see to do this would be to use multiple TX netdev queues
>> and a custom netdev_ops->ndo_select_queue() method which selects the
>> queue based upon the peer that would be selected.
>
> This might work well for the current sunvnet model (and even there, it
> has limitations - if all the traffic is going out via the switchport
> to a gateway, you are again back to the head-of-line blocking model).
>
> But how generally useful is this model? For point-to-multipoint links
> like ethernet, I don't think you actually track one channel per peer.

I don't understand what your concern could be.

Ethernet is a broadcast medium, so all traffic (as far as the NIC is
concerned) goes via the same single queue.

But here we have independent queues leading up to different
destinations, so representing that as independent queues makes sense.

This way, if one non-switch peer goes down, all switch-based traffic
will still continue to flow, and vice versa.
diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 39c39bf..c75a638 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -32,6 +32,8 @@ MODULE_DESCRIPTION("Sun LDOM virtual network driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
+static void vnet_tx_timeout(struct net_device *dev);
+
 /* Heuristic for the number of times to exponentially backoff and
  * retry sending an LDC trigger when EAGAIN is encountered
  */
@@ -489,28 +491,13 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
 	return 0;
 }
 
-static void maybe_tx_wakeup(struct vnet *vp)
+static void maybe_tx_wakeup(unsigned long param)
 {
+	struct vnet *vp = (struct vnet *)param;
 	struct net_device *dev = vp->dev;
 
 	netif_tx_lock(dev);
-	if (likely(netif_queue_stopped(dev))) {
-		struct vnet_port *port;
-		int wake = 1;
-
-		list_for_each_entry(port, &vp->port_list, list) {
-			struct vio_dring_state *dr;
-
-			dr = &port->vio.drings[VIO_DRIVER_TX_RING];
-			if (vnet_tx_dring_avail(dr) <
-			    VNET_TX_WAKEUP_THRESH(dr)) {
-				wake = 0;
-				break;
-			}
-		}
-		if (wake)
-			netif_wake_queue(dev);
-	}
+	vnet_tx_timeout(dev);
 	netif_tx_unlock(dev);
 }
 
@@ -587,8 +574,13 @@ static void vnet_event(void *arg, int event)
 		break;
 	}
 	spin_unlock(&vio->lock);
+	/* Kick off a tasklet to wake the queue.  We cannot call
+	 * maybe_tx_wakeup directly here because we could deadlock on
+	 * netif_tx_lock() with dev_watchdog()
+	 */
 	if (unlikely(tx_wakeup && err != -ECONNRESET))
-		maybe_tx_wakeup(port->vp);
+		tasklet_schedule(&port->vp->vnet_tx_wakeup);
+
 	local_irq_restore(flags);
 }
 
@@ -764,7 +756,25 @@ out_dropped:
 
 static void vnet_tx_timeout(struct net_device *dev)
 {
-	/* XXX Implement me XXX */
+	struct vnet *vp = netdev_priv(dev);
+
+	if (likely(netif_queue_stopped(dev))) {
+		struct vnet_port *port;
+		int wake = 1;
+
+		list_for_each_entry(port, &vp->port_list, list) {
+			struct vio_dring_state *dr;
+
+			dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+			if (vnet_tx_dring_avail(dr) <
+			    VNET_TX_WAKEUP_THRESH(dr)) {
+				wake = 0;
+				break;
+			}
+		}
+		if (wake)
+			netif_wake_queue(dev);
+	}
 }
 
 static int vnet_open(struct net_device *dev)
@@ -1071,6 +1081,7 @@ static struct vnet *vnet_new(const u64 *local_mac)
 	vp = netdev_priv(dev);
 
 	spin_lock_init(&vp->lock);
+	tasklet_init(&vp->vnet_tx_wakeup, maybe_tx_wakeup, (unsigned long)vp);
 	vp->dev = dev;
 
 	INIT_LIST_HEAD(&vp->port_list);
@@ -1130,6 +1141,7 @@ static void vnet_cleanup(void)
 		vp = list_first_entry(&vnet_list, struct vnet, list);
 		list_del(&vp->list);
 		dev = vp->dev;
+		tasklet_kill(&vp->vnet_tx_wakeup);
 		/* vio_unregister_driver() should have cleaned up port_list */
 		BUG_ON(!list_empty(&vp->port_list));
 		unregister_netdev(dev);
diff --git a/drivers/net/ethernet/sun/sunvnet.h b/drivers/net/ethernet/sun/sunvnet.h
index d347a5b..0e872aa 100644
--- a/drivers/net/ethernet/sun/sunvnet.h
+++ b/drivers/net/ethernet/sun/sunvnet.h
@@ -1,6 +1,8 @@
 #ifndef _SUNVNET_H
 #define _SUNVNET_H
 
+#include <linux/interrupt.h>
+
 #define DESC_NCOOKIES(entry_size)	\
 	((entry_size) - sizeof(struct vio_net_desc))
 
@@ -78,6 +80,8 @@ struct vnet {
 	struct list_head	list;
 
 	u64			local_mac;
+
+	struct tasklet_struct	vnet_tx_wakeup;
 };
 
 #endif /* _SUNVNET_H */