diff mbox

usbnet: fix 100% CPU use on suspended device

Message ID AANLkTi=pSkOuiFGAK4inPdm-KbLuVrz51e2T6eV51ts0@mail.gmail.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Elly Jones July 21, 2010, 6:51 p.m. UTC
Subject: [patch] usbnet: fix 100% CPU use on suspended device
From: Elly Jones <ellyjones@google.com>

This patch causes the usbnet module not to attempt to submit URBs to the device
if the device is not ready to accept them. This fixes a misbehavior trigged by
the Qualcomm Gobi driver (released under GPL through their Code Aurora
initiative) which causes the usbnet core to consume 100% of CPU attempting and
failing to submit URBs. This patch is against Linus's 2.6 repo commit
a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
Signed-off-by: Elizabeth Jones <ellyjones@google.com>
---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 26, 2010, 4:57 a.m. UTC | #1
From: Elly Jones <ellyjones@google.com>
Date: Wed, 21 Jul 2010 14:51:48 -0400

> Subject: [patch] usbnet: fix 100% CPU use on suspended device
> From: Elly Jones <ellyjones@google.com>
> 
> This patch causes the usbnet module not to attempt to submit URBs to the device
> if the device is not ready to accept them. This fixes a misbehavior trigged by
> the Qualcomm Gobi driver (released under GPL through their Code Aurora
> initiative) which causes the usbnet core to consume 100% of CPU attempting and
> failing to submit URBs. This patch is against Linus's 2.6 repo commit
> a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
> Signed-off-by: Elizabeth Jones <ellyjones@google.com>

If the Qualcomm Gobi driver is the only one where this happens, maybe the
real problem is in that driver.

I'm not applying this until a USB person looks more deeply into this.

> ---
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 81c76ad..df7e72e 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1172,6 +1172,7 @@ static void usbnet_bh (unsigned long param)
>  	// or are we maybe short a few urbs?
>  	} else if (netif_running (dev->net) &&
>  		   netif_device_present (dev->net) &&
> +		   dev->udev->can_submit &&
>  		   !timer_pending (&dev->delay) &&
>  		   !test_bit (EVENT_RX_HALT, &dev->flags)) {
>  		int	temp = dev->rxq.qlen;
> --
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Elly Jones July 26, 2010, 1:26 p.m. UTC | #2
On Mon, Jul 26, 2010 at 12:57 AM, David Miller <davem@davemloft.net> wrote:
> From: Elly Jones <ellyjones@google.com>
> Date: Wed, 21 Jul 2010 14:51:48 -0400
>
>> Subject: [patch] usbnet: fix 100% CPU use on suspended device
>> From: Elly Jones <ellyjones@google.com>
>>
>> This patch causes the usbnet module not to attempt to submit URBs to the device
>> if the device is not ready to accept them. This fixes a misbehavior trigged by
>> the Qualcomm Gobi driver (released under GPL through their Code Aurora
>> initiative) which causes the usbnet core to consume 100% of CPU attempting and
>> failing to submit URBs. This patch is against Linus's 2.6 repo commit
>> a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
>> Signed-off-by: Elizabeth Jones <ellyjones@google.com>
>
> If the Qualcomm Gobi driver is the only one where this happens, maybe the
> real problem is in that driver.

The member in question (dev->udev->can_submit) is documented as 'URBs
may be submitted'. The existing code doesn't honor that flag. It is
somewhat puzzling that (so far) only the Gobi driver seems to use that
flag, but I don't think the bug lies in their driver here.

> I'm not applying this until a USB person looks more deeply into this.

Alright. Can you suggest a particular USB person to bother?

>> ---
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index 81c76ad..df7e72e 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -1172,6 +1172,7 @@ static void usbnet_bh (unsigned long param)
>>       // or are we maybe short a few urbs?
>>       } else if (netif_running (dev->net) &&
>>                  netif_device_present (dev->net) &&
>> +                dev->udev->can_submit &&
>>                  !timer_pending (&dev->delay) &&
>>                  !test_bit (EVENT_RX_HALT, &dev->flags)) {
>>               int     temp = dev->rxq.qlen;
>> --
>

-- Elly
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 26, 2010, 2:36 p.m. UTC | #3
On Mon, 26 Jul 2010, Elly Jones wrote:

> On Mon, Jul 26, 2010 at 12:57 AM, David Miller <davem@davemloft.net> wrote:
> > From: Elly Jones <ellyjones@google.com>
> > Date: Wed, 21 Jul 2010 14:51:48 -0400
> >
> >> Subject: [patch] usbnet: fix 100% CPU use on suspended device
> >> From: Elly Jones <ellyjones@google.com>
> >>
> >> This patch causes the usbnet module not to attempt to submit URBs to the device
> >> if the device is not ready to accept them. This fixes a misbehavior trigged by
> >> the Qualcomm Gobi driver (released under GPL through their Code Aurora
> >> initiative) which causes the usbnet core to consume 100% of CPU attempting and
> >> failing to submit URBs. This patch is against Linus's 2.6 repo commit
> >> a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
> >> Signed-off-by: Elizabeth Jones <ellyjones@google.com>
> >
> > If the Qualcomm Gobi driver is the only one where this happens, maybe the
> > real problem is in that driver.
> 
> The member in question (dev->udev->can_submit) is documented as 'URBs
> may be submitted'. The existing code doesn't honor that flag. It is
> somewhat puzzling that (so far) only the Gobi driver seems to use that
> flag, but I don't think the bug lies in their driver here.

That flag is not intended for use by drivers but by the USB core.  (It
gets cleared only when the device is suspended.)  usbnet and Gobi
shouldn't need to look at it.

> > I'm not applying this until a USB person looks more deeply into this.
> 
> Alright. Can you suggest a particular USB person to bother?
> 
> >> ---
> >> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> >> index 81c76ad..df7e72e 100644
> >> --- a/drivers/net/usb/usbnet.c
> >> +++ b/drivers/net/usb/usbnet.c
> >> @@ -1172,6 +1172,7 @@ static void usbnet_bh (unsigned long param)
> >>       // or are we maybe short a few urbs?
> >>       } else if (netif_running (dev->net) &&
> >>                  netif_device_present (dev->net) &&
> >> +                dev->udev->can_submit &&
> >>                  !timer_pending (&dev->delay) &&
> >>                  !test_bit (EVENT_RX_HALT, &dev->flags)) {
> >>               int     temp = dev->rxq.qlen;

This isn't right.  The problem should be fixed some other way.  Under
what circumstances are URBs submitted incorrectly?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Elly Jones July 26, 2010, 2:47 p.m. UTC | #4
On Mon, Jul 26, 2010 at 10:36 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 26 Jul 2010, Elly Jones wrote:
>
>> On Mon, Jul 26, 2010 at 12:57 AM, David Miller <davem@davemloft.net> wrote:
>> > From: Elly Jones <ellyjones@google.com>
>> > Date: Wed, 21 Jul 2010 14:51:48 -0400
>> >
>> >> Subject: [patch] usbnet: fix 100% CPU use on suspended device
>> >> From: Elly Jones <ellyjones@google.com>
>> >>
>> >> This patch causes the usbnet module not to attempt to submit URBs to the device
>> >> if the device is not ready to accept them. This fixes a misbehavior trigged by
>> >> the Qualcomm Gobi driver (released under GPL through their Code Aurora
>> >> initiative) which causes the usbnet core to consume 100% of CPU attempting and
>> >> failing to submit URBs. This patch is against Linus's 2.6 repo commit
>> >> a9f7f2e74ae0e6a801a2433dc8e9124d73da0cb4.
>> >> Signed-off-by: Elizabeth Jones <ellyjones@google.com>
>> >
>> > If the Qualcomm Gobi driver is the only one where this happens, maybe the
>> > real problem is in that driver.
>>
>> The member in question (dev->udev->can_submit) is documented as 'URBs
>> may be submitted'. The existing code doesn't honor that flag. It is
>> somewhat puzzling that (so far) only the Gobi driver seems to use that
>> flag, but I don't think the bug lies in their driver here.
>
> That flag is not intended for use by drivers but by the USB core.  (It
> gets cleared only when the device is suspended.)  usbnet and Gobi
> shouldn't need to look at it.

Aha!

>> > I'm not applying this until a USB person looks more deeply into this.
>>
>> Alright. Can you suggest a particular USB person to bother?
>>
>> >> ---
>> >> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> >> index 81c76ad..df7e72e 100644
>> >> --- a/drivers/net/usb/usbnet.c
>> >> +++ b/drivers/net/usb/usbnet.c
>> >> @@ -1172,6 +1172,7 @@ static void usbnet_bh (unsigned long param)
>> >>       // or are we maybe short a few urbs?
>> >>       } else if (netif_running (dev->net) &&
>> >>                  netif_device_present (dev->net) &&
>> >> +                dev->udev->can_submit &&
>> >>                  !timer_pending (&dev->delay) &&
>> >>                  !test_bit (EVENT_RX_HALT, &dev->flags)) {
>> >>               int     temp = dev->rxq.qlen;
>
> This isn't right.  The problem should be fixed some other way.  Under
> what circumstances are URBs submitted incorrectly?

When the device is autosuspended. What is the proper thing for a
device to do here?

> Alan Stern

-- Elly
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern July 26, 2010, 3:13 p.m. UTC | #5
On Mon, 26 Jul 2010, Elly Jones wrote:

> > This isn't right.  The problem should be fixed some other way.  Under
> > what circumstances are URBs submitted incorrectly?
> 
> When the device is autosuspended. What is the proper thing for a
> device to do here?

From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
not an expert on usbnet; we should ask someone who is, like Oliver.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum July 26, 2010, 4:21 p.m. UTC | #6
Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
> On Mon, 26 Jul 2010, Elly Jones wrote:
> 
> > > This isn't right.  The problem should be fixed some other way.  Under
> > > what circumstances are URBs submitted incorrectly?
> > 
> > When the device is autosuspended. What is the proper thing for a
> > device to do here?
> 
> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
> not an expert on usbnet; we should ask someone who is, like Oliver.

Sorry, I didn't notice this thread.

The correct way to check for autosuspend in usbnet is to look
at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
uses rx_submit() which does the correct check. The bug seems to be
a lack of error handling in usbnet_bh() regarding the return of rx_submit()

	Regards
		Oliver
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Elly Jones Aug. 2, 2010, 1:31 p.m. UTC | #7
On Mon, Jul 26, 2010 at 12:21 PM, Oliver Neukum <oliver@neukum.org> wrote:
> Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
>> On Mon, 26 Jul 2010, Elly Jones wrote:
>>
>> > > This isn't right.  The problem should be fixed some other way.  Under
>> > > what circumstances are URBs submitted incorrectly?
>> >
>> > When the device is autosuspended. What is the proper thing for a
>> > device to do here?
>>
>> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
>> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
>> not an expert on usbnet; we should ask someone who is, like Oliver.
>
> Sorry, I didn't notice this thread.
>
> The correct way to check for autosuspend in usbnet is to look
> at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
> uses rx_submit() which does the correct check. The bug seems to be
> a lack of error handling in usbnet_bh() regarding the return of rx_submit()

If rx_submit() fails, should usbnet_bh() just not tasklet_schedule() itself?

>        Regards
>                Oliver

-- Elly
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum Aug. 3, 2010, 2:39 p.m. UTC | #8
Am Montag, 2. August 2010, 15:31:33 schrieb Elly Jones:
> On Mon, Jul 26, 2010 at 12:21 PM, Oliver Neukum <oliver@neukum.org> wrote:
> > Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
> >> On Mon, 26 Jul 2010, Elly Jones wrote:
> >>
> >> > > This isn't right.  The problem should be fixed some other way.  Under
> >> > > what circumstances are URBs submitted incorrectly?
> >> >
> >> > When the device is autosuspended. What is the proper thing for a
> >> > device to do here?
> >>
> >> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
> >> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
> >> not an expert on usbnet; we should ask someone who is, like Oliver.
> >
> > Sorry, I didn't notice this thread.
> >
> > The correct way to check for autosuspend in usbnet is to look
> > at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
> > uses rx_submit() which does the correct check. The bug seems to be
> > a lack of error handling in usbnet_bh() regarding the return of rx_submit()
> 
> If rx_submit() fails, should usbnet_bh() just not tasklet_schedule() itself?

That would not work unless the cause of the failure would be removed.
If you get -ENOLINK the sane option seems to me to give up.

	Regards
		Oliver
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Elly Jones Aug. 4, 2010, 2:04 p.m. UTC | #9
On Tue, Aug 3, 2010 at 10:39 AM, Oliver Neukum <oliver@neukum.org> wrote:
> Am Montag, 2. August 2010, 15:31:33 schrieb Elly Jones:
>> On Mon, Jul 26, 2010 at 12:21 PM, Oliver Neukum <oliver@neukum.org> wrote:
>> > Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
>> >> On Mon, 26 Jul 2010, Elly Jones wrote:
>> >>
>> >> > > This isn't right.  The problem should be fixed some other way.  Under
>> >> > > what circumstances are URBs submitted incorrectly?
>> >> >
>> >> > When the device is autosuspended. What is the proper thing for a
>> >> > device to do here?
>> >>
>> >> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
>> >> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
>> >> not an expert on usbnet; we should ask someone who is, like Oliver.
>> >
>> > Sorry, I didn't notice this thread.
>> >
>> > The correct way to check for autosuspend in usbnet is to look
>> > at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
>> > uses rx_submit() which does the correct check. The bug seems to be
>> > a lack of error handling in usbnet_bh() regarding the return of rx_submit()
>>
>> If rx_submit() fails, should usbnet_bh() just not tasklet_schedule() itself?
>
> That would not work unless the cause of the failure would be removed.
> If you get -ENOLINK the sane option seems to me to give up.

'Give up' meaning what? If we reschedule the tasklet, it'll just try
again (and fail again), won't it?

>        Regards
>                Oliver

-- Elly
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum Aug. 4, 2010, 2:08 p.m. UTC | #10
Am Mittwoch, 4. August 2010, 16:04:48 schrieben Sie:
> On Tue, Aug 3, 2010 at 10:39 AM, Oliver Neukum <oliver@neukum.org> wrote:
> > Am Montag, 2. August 2010, 15:31:33 schrieb Elly Jones:
> >> On Mon, Jul 26, 2010 at 12:21 PM, Oliver Neukum <oliver@neukum.org> wrote:
> >> > Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
> >> >> On Mon, 26 Jul 2010, Elly Jones wrote:
> >> >>
> >> >> > > This isn't right.  The problem should be fixed some other way.  Under
> >> >> > > what circumstances are URBs submitted incorrectly?
> >> >> >
> >> >> > When the device is autosuspended. What is the proper thing for a
> >> >> > device to do here?
> >> >>
> >> >> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
> >> >> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
> >> >> not an expert on usbnet; we should ask someone who is, like Oliver.
> >> >
> >> > Sorry, I didn't notice this thread.
> >> >
> >> > The correct way to check for autosuspend in usbnet is to look
> >> > at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
> >> > uses rx_submit() which does the correct check. The bug seems to be
> >> > a lack of error handling in usbnet_bh() regarding the return of rx_submit()
> >>
> >> If rx_submit() fails, should usbnet_bh() just not tasklet_schedule() itself?
> >
> > That would not work unless the cause of the failure would be removed.
> > If you get -ENOLINK the sane option seems to me to give up.
> 
> 'Give up' meaning what? If we reschedule the tasklet, it'll just try
> again (and fail again), won't it?

Yes, exactly. If the tasklet runs after the interface has been suspended,
it cannot replenish the rx URBs. That will be the job of resume()
Just stop trying and do nothing.

	Regards
		Oliver
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 81c76ad..df7e72e 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1172,6 +1172,7 @@  static void usbnet_bh (unsigned long param)
 	// or are we maybe short a few urbs?
 	} else if (netif_running (dev->net) &&
 		   netif_device_present (dev->net) &&
+		   dev->udev->can_submit &&
 		   !timer_pending (&dev->delay) &&
 		   !test_bit (EVENT_RX_HALT, &dev->flags)) {
 		int	temp = dev->rxq.qlen;