Message ID | 1359023152-32576-1-git-send-email-bjorn@mork.no |
---|---|
State | Superseded, archived |
On Thursday 24 January 2013 11:25:52 Bjørn Mork wrote:
> The MBIM firmware for the Sierra Wireless MC7710 is a nice source
> of "interesting" device issues. One of the uglier ones is that
> it under certain conditions will start flooding us with frames
> having length 0 as fast as it can. And that is pretty fast...

If you can tell that those frames are bogus, why not count them
as opposed to generic qlen? Say, if you see 20 among the last 30
are of this type, throttle.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjørn Mork <bjorn@mork.no> writes:

> A device sending 0 length frames as fast as it can has been
> observed killing the host system due to the resulting memory
> pressure. We handle the done queue as fast as we can, so
> if this queue is filling up then that is an indication that we
> are under too heavy pressure. Refusing further allocations
> until the done queue is handled prevents the buggy device
> from taking the system down.
>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> Hello Oliver,
>
> The MBIM firmware for the Sierra Wireless MC7710 is a nice source
> of "interesting" device issues. One of the uglier ones is that
> it under certain conditions will start flooding us with frames
> having length 0 as fast as it can. And that is pretty fast...
>
> My older laptop dies immediately under this. It just cannot keep
> up with the infinite allocations usbnet will do when the done
> queue first starts growing beyond reason.
>
> I really do not have a clue how to handle this problem, but this
> patch seems to do the job for me without affecting normal devices.
> The queue limit is just a number which Works For Me, leaving the
> system running with the buggy device and not kicking in under
> normal load.
>
> What do you think? Is there some other way this should be solved?

To illustrate the problem, this is the start and stop debug output for such
a buggy device session *with* the RFC patch applied:

Jan 24 11:16:23 nemi kernel: [ 3187.624164] qmi_wwan 8-4:1.8 wwan0: open: enable queueing (rx 60, tx 60) mtu 1500 simple framing
Jan 24 11:16:38 nemi kernel: [ 3202.536921] qmi_wwan 8-4:1.8 wwan0: stop stats: rx/tx 1/11, errs 738980/0

I believe the stats tell the full story...

I do not have any logs without the throttling patch, as that takes down
everything on my laptop including the ahci driver and keyboard. Not even
the magic sysrq is working then.
If anyone is interested in the full debug log (211KB compressed) from
the above session, then I've put it on
http://www.mork.no/~bjorn/usbnet-zero-packet-fix.log.gz

It is mostly full of "rx length 0" lines, but with an occasional
sequence of

Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1025) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1026) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1027) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1028) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1029) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1030) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1031) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1032) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1033) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: rx length 0
Jan 24 11:16:23 nemi kernel: [ 3187.682669] qmi_wwan 8-4:1.8 wwan0: done queue filling up (1034) - throttling
Jan 24 11:16:23 nemi kernel: [ 3187.697826] qmi_wwan 8-4:1.8 wwan0: rxqlen 0 --> 10

showing that the throttling is kicking in and doing its job.


Bjørn
Oliver Neukum <oneukum@suse.de> writes:

> On Thursday 24 January 2013 11:25:52 Bjørn Mork wrote:
>> The MBIM firmware for the Sierra Wireless MC7710 is a nice source
>> of "interesting" device issues. One of the uglier ones is that
>> it under certain conditions will start flooding us with frames
>> having length 0 as fast as it can. And that is pretty fast...
>
> If you can tell that those frames are bogus, why not count them
> as opposed to generic qlen? Say, if you see 20 among the last 30
> are of this type, throttle.

Sounds like a good idea, but where do I add/get that statistics?


Bjørn
On Thursday 24 January 2013 11:52:22 Bjørn Mork wrote:
> Oliver Neukum <oneukum@suse.de> writes:
>
> > On Thursday 24 January 2013 11:25:52 Bjørn Mork wrote:
> >> The MBIM firmware for the Sierra Wireless MC7710 is a nice source
> >> of "interesting" device issues. One of the uglier ones is that
> >> it under certain conditions will start flooding us with frames
> >> having length 0 as fast as it can. And that is pretty fast...
> >
> > If you can tell that those frames are bogus, why not count them
> > as opposed to generic qlen? Say, if you see 20 among the last 30
> > are of this type, throttle.
>
> Sounds like a good idea, but where do I add/get that statistics?

rx_complete

It need not be very accurate.

	Regards
		Oliver
Oliver Neukum <oneukum@suse.de> writes:

> On Thursday 24 January 2013 11:52:22 Bjørn Mork wrote:
>> Oliver Neukum <oneukum@suse.de> writes:
>>
>> > On Thursday 24 January 2013 11:25:52 Bjørn Mork wrote:
>> >> The MBIM firmware for the Sierra Wireless MC7710 is a nice source
>> >> of "interesting" device issues. One of the uglier ones is that
>> >> it under certain conditions will start flooding us with frames
>> >> having length 0 as fast as it can. And that is pretty fast...
>> >
>> > If you can tell that those frames are bogus, why not count them
>> > as opposed to generic qlen? Say, if you see 20 among the last 30
>> > are of this type, throttle.
>>
>> Sounds like a good idea, but where do I add/get that statistics?
>
> rx_complete
>
> It need not be very accurate.

Sorry for being daft, but how do I code the "20 among the last 30" part
there?


Bjørn
On Thursday 24 January 2013 12:22:54 Bjørn Mork wrote:
> Oliver Neukum <oneukum@suse.de> writes:
>
> > On Thursday 24 January 2013 11:52:22 Bjørn Mork wrote:
> >> Oliver Neukum <oneukum@suse.de> writes:
> >>
> >> > On Thursday 24 January 2013 11:25:52 Bjørn Mork wrote:
> >> >> The MBIM firmware for the Sierra Wireless MC7710 is a nice source
> >> >> of "interesting" device issues. One of the uglier ones is that
> >> >> it under certain conditions will start flooding us with frames
> >> >> having length 0 as fast as it can. And that is pretty fast...
> >> >
> >> > If you can tell that those frames are bogus, why not count them
> >> > as opposed to generic qlen? Say, if you see 20 among the last 30
> >> > are of this type, throttle.
> >>
> >> Sounds like a good idea, but where do I add/get that statistics?
> >
> > rx_complete
> >
> > It need not be very accurate.
>
> Sorry for being daft, but how do I code the "20 among the last 30" part
> there?

Just by agreeing that you can live with false negatives but not false positives:

	if (++counter > 30) {
		counter = bogus = 0;
	} else {
		if (is_bogus(packet))
			bogus++;
		if (bogus > counter/2)
			throttle();
	}

	Regards
		Oliver
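A minimal userspace sketch of the counting heuristic discussed in this exchange, not usbnet code. The struct, the function names and the zero-length test for "bogus" are assumptions made for illustration. It also swaps in the fixed "20 among the last 30" threshold from the earlier mail instead of the `bogus > counter/2` test, because with integer division that test is already true for the first bogus frame of a window (1 > 1/2 == 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW       30	/* frames per observation window */
#define BOGUS_LIMIT  20	/* bogus frames per window that trigger throttling */

/* Hypothetical per-device state; not a real usbnet structure. */
struct flood_detector {
	unsigned int counter;	/* frames seen in the current window */
	unsigned int bogus;	/* zero-length frames among them */
};

/* Assumption: a frame counts as bogus when its length is 0. */
static bool is_bogus(size_t frame_len)
{
	return frame_len == 0;
}

/*
 * Account for one received frame. Returns true when the caller should
 * throttle. The window is restarted every WINDOW frames, so a burst that
 * straddles two windows can be missed (a false negative), which the
 * thread accepts as the cheaper kind of error.
 */
static bool flood_detector_rx(struct flood_detector *fd, size_t frame_len)
{
	assert(fd);

	if (fd->counter == WINDOW) {
		fd->counter = 0;
		fd->bogus = 0;
	}
	fd->counter++;
	if (is_bogus(frame_len))
		fd->bogus++;

	return fd->bogus >= BOGUS_LIMIT;
}
```

In a driver the call would sit in the receive completion path, as suggested above with rx_complete, with the state kept per device. A stream that alternates good and zero-length frames never reaches the limit here, while 20 consecutive zero-length frames do.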
Hello.

On 24-01-2013 14:25, Bjørn Mork wrote:

> A device sending 0 length frames as fast as it can has been
> observed killing the host system due to the resulting memory
> pressure. We handle the done queue as fast as we can, so
> if this queue is filling up then that is an indication that we
> are under too heavy pressure. Refusing further allocations
> until the done queue is handled prevents the buggy device
> from taking the system down.
>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
[...]
>  drivers/net/usb/usbnet.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)

> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index f34b2eb..85c7ffd 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -380,6 +380,14 @@ static int rx_submit (struct usbnet *dev, struct urb *urb, gfp_t flags)
>  	unsigned long		lockflags;
>  	size_t			size = dev->rx_urb_size;
>
> +	/* Do not let a device flood us to death! */
> +	if (dev->done.qlen > 1024) {
> +		netif_dbg(dev, rx_err, dev->net, "done queue filling up (%u) - throttling\n", dev->done.qlen);
> +		usbnet_defer_kevent (dev, EVENT_RX_MEMORY);
> +		usb_free_urb (urb);

Run your patch thru scripts/checkpatch.pl please -- spaces before parens
are not allowed.

WBR, Sergei
On Thu, 2013-01-24 at 16:39 +0400, Sergei Shtylyov wrote:
> On 24-01-2013 14:25, Bjørn Mork wrote:
> > A device sending 0 length frames as fast as it can has been
> > observed killing the host system due to the resulting memory
> > pressure.
[]
> > diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
[]
> > +	/* Do not let a device flood us to death! */
> > +	if (dev->done.qlen > 1024) {
> > +		netif_dbg(dev, rx_err, dev->net, "done queue filling up (%u) - throttling\n", dev->done.qlen);
> > +		usbnet_defer_kevent (dev, EVENT_RX_MEMORY);
> > +		usb_free_urb (urb);
>
> Run your patch thru scripts/checkpatch.pl please

And maybe ratelimit the netif_dbg
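To make the ratelimiting suggestion concrete, here is a small userspace model of an interval/burst limiter, loosely in the spirit of the kernel's ratelimit helpers. It is only a sketch: the struct, its fields and `log_ratelimit_ok()` are hypothetical names, and the caller passes in the current time so the behaviour is easy to check. In the driver itself one would reach for an existing kernel helper rather than hand-rolling this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: allow at most 'burst' messages per interval. */
struct log_ratelimit {
	uint64_t interval_ms;		/* length of one window */
	unsigned int burst;		/* messages allowed per window */
	uint64_t window_start_ms;	/* start of the current window */
	unsigned int emitted;		/* messages let through in this window */
	unsigned int suppressed;	/* messages dropped so far */
};

/*
 * Returns true if a message may be logged at time now_ms.
 * now_ms is assumed to be monotonically non-decreasing.
 */
static bool log_ratelimit_ok(struct log_ratelimit *rl, uint64_t now_ms)
{
	assert(rl);

	/* Start a fresh window once the interval has elapsed. */
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->emitted = 0;
	}

	if (rl->emitted < rl->burst) {
		rl->emitted++;
		return true;
	}

	rl->suppressed++;
	return false;
}
```

With a 5 second interval and a burst of 10, a flood of throttling messages that all arrive at the same timestamp produces ten log lines and a suppressed count for the rest, instead of one line per refused buffer.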
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index f34b2eb..85c7ffd 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -380,6 +380,14 @@ static int rx_submit (struct usbnet *dev, struct urb *urb, gfp_t flags)
 	unsigned long		lockflags;
 	size_t			size = dev->rx_urb_size;
 
+	/* Do not let a device flood us to death! */
+	if (dev->done.qlen > 1024) {
+		netif_dbg(dev, rx_err, dev->net, "done queue filling up (%u) - throttling\n", dev->done.qlen);
+		usbnet_defer_kevent (dev, EVENT_RX_MEMORY);
+		usb_free_urb (urb);
+		return -ENOMEM;
+	}
+
 	skb = __netdev_alloc_skb_ip_align(dev->net, size, flags);
 	if (!skb) {
 		netif_dbg(dev, rx_err, dev->net, "no rx skb\n");
A device sending 0 length frames as fast as it can has been
observed killing the host system due to the resulting memory
pressure. We handle the done queue as fast as we can, so
if this queue is filling up then that is an indication that we
are under too heavy pressure. Refusing further allocations
until the done queue is handled prevents the buggy device
from taking the system down.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
Hello Oliver,

The MBIM firmware for the Sierra Wireless MC7710 is a nice source
of "interesting" device issues. One of the uglier ones is that
it under certain conditions will start flooding us with frames
having length 0 as fast as it can. And that is pretty fast...

My older laptop dies immediately under this. It just cannot keep
up with the infinite allocations usbnet will do when the done
queue first starts growing beyond reason.

I really do not have a clue how to handle this problem, but this
patch seems to do the job for me without affecting normal devices.
The queue limit is just a number which Works For Me, leaving the
system running with the buggy device and not kicking in under
normal load.

What do you think? Is there some other way this should be solved?


Bjørn

 drivers/net/usb/usbnet.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)
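The effect of the queue-length check can be illustrated with a toy userspace model. This is a deliberately simplified sketch, not the driver: it assumes every submitted buffer is completed immediately by the flooding device, and `struct rx_model`, `model_rx_submit()` and `model_process_done()` are hypothetical names. Only the 1024 threshold is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define DONE_QLEN_LIMIT 1024	/* threshold used in the patch */

/* Hypothetical model state; stands in for the driver's done queue. */
struct rx_model {
	unsigned int done_qlen;	/* completed buffers awaiting processing */
	unsigned int refused;	/* submissions refused by the throttle */
};

/*
 * Models the check added to rx_submit(): refuse to allocate another
 * buffer while the done queue is above the limit (the patch returns
 * -ENOMEM at this point).
 */
static bool model_rx_submit(struct rx_model *m)
{
	assert(m);

	if (m->done_qlen > DONE_QLEN_LIMIT) {
		m->refused++;
		return false;
	}

	/* Assumption: the device completes the buffer at once, length 0. */
	m->done_qlen++;
	return true;
}

/* Models the bottom half handling up to 'budget' done-queue entries. */
static void model_process_done(struct rx_model *m, unsigned int budget)
{
	assert(m);

	m->done_qlen -= (budget < m->done_qlen) ? budget : m->done_qlen;
}
```

However many submissions the flood triggers, the modelled queue stops growing at 1025 entries, which matches the first "done queue filling up (1025)" line in the debug log quoted earlier in the thread. Once the done queue has been processed, submissions are accepted again.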