Patchwork [01/12] usbnet: introduce usbnet 3 command helpers

login
register
mail settings
Submitter Ming Lei
Date Oct. 2, 2012, 6:51 a.m.
Message ID <1349160684-6627-2-git-send-email-ming.lei@canonical.com>
Download mbox | patch
Permalink /patch/188395/
State Deferred
Delegated to: David Miller
Headers show

Comments

Ming Lei - Oct. 2, 2012, 6:51 a.m.
This patch introduces the below 3 usb command helpers:

	usbnet_read_cmd / usbnet_write_cmd / usbnet_write_cmd_async

so that each low level driver doesn't need to implement them
by itself, and the dma buffer allocation for usb transfer and
runtime PM things can be handled just in one place.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/net/usb/usbnet.c   |  133 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/usb/usbnet.h |    6 ++
 2 files changed, 139 insertions(+)
Oliver Neukum - Oct. 9, 2012, 8:47 a.m.
On Tuesday 02 October 2012 14:51:12 Ming Lei wrote:
> This patch introduces the below 3 usb command helpers:
> 
> 	usbnet_read_cmd / usbnet_write_cmd / usbnet_write_cmd_async
> 
> so that each low level driver doesn't need to implement them
> by itself, and the dma buffer allocation for usb transfer and
> runtime PM things can be handled just in one place.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  drivers/net/usb/usbnet.c   |  133 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/usb/usbnet.h |    6 ++
>  2 files changed, 139 insertions(+)
> 
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index fc9f578..3b51554 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -1592,6 +1592,139 @@ int usbnet_resume (struct usb_interface *intf)
>  }
>  EXPORT_SYMBOL_GPL(usbnet_resume);
>  
> +/*-------------------------------------------------------------------------*/
> +int usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
> +		    u16 value, u16 index, void *data, u16 size)
> +{
> +	void *buf = NULL;
> +	int err = -ENOMEM;
> +
> +	netdev_dbg(dev->net, "usbnet_read_cmd cmd=0x%02x reqtype=%02x"
> +		   " value=0x%04x index=0x%04x size=%d\n",
> +		   cmd, reqtype, value, index, size);
> +
> +	if (data) {
> +		buf = kmalloc(size, GFP_KERNEL);

Using GFP_KERNEL you preclude using those in resume() and error handling.
Please pass a gfp_t parameter.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 10, 2012, 3:19 a.m.
On Tue, Oct 9, 2012 at 4:47 PM, Oliver Neukum <oneukum@suse.de> wrote:
>
> Using GFP_KERNEL you preclude using those in resume() and error handling.
> Please pass a gfp_t parameter.

IMO, it is not a big deal because generally only several bytes are to be
allocated inside these helpers.

If you still think the problem should be considered, another candidate fix
is to take GFP_NOIO during system suspend/resume, and take GFP_KERNEL
in other situations.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 10, 2012, 5:51 a.m.
On Wednesday 10 October 2012 11:19:09 Ming Lei wrote:
> On Tue, Oct 9, 2012 at 4:47 PM, Oliver Neukum <oneukum@suse.de> wrote:
> >
> > Using GFP_KERNEL you preclude using those in resume() and error handling.
> > Please pass a gfp_t parameter.
> 
> IMO, it is not a big deal because generally only several bytes are to be
> allocated inside these helpers.

Any allocation can do it, if the VM layer decides to block.

> If you still think the problem should be considered, another candidate fix
> is to take GFP_NOIO during system suspend/resume, and take GFP_KERNEL
> in other situations.

No, the problem is autoresume.

Suppose we have a device with two interface. Interface A be usbnet; interface B
something you page on. Now consider that you can only resume both interfaces
and this is (and needs to be) done synchronously.

Now we can have this code path:

autoresume of device -> resume() -> kmalloc(..., GFP_KERNEL) ->
VM layer decides to start paging out -> IO to interface B -> autoresume of device
--> DEADLOCK

We need to use GFP_NOIO in situations the helper cannot know about.
Please add a gfp_t parameter. Then the caller will solve that.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 10, 2012, 5:56 a.m.
On Wed, Oct 10, 2012 at 11:19 AM, Ming Lei <ming.lei@canonical.com> wrote:
> On Tue, Oct 9, 2012 at 4:47 PM, Oliver Neukum <oneukum@suse.de> wrote:
>>
>> Using GFP_KERNEL you preclude using those in resume() and error handling.
>> Please pass a gfp_t parameter.
>
> IMO, it is not a big deal because generally only several bytes are to be
> allocated inside these helpers.

Also pm_restrict_gfp_mask()/pm_restore_gfp_mask() have been introduced
to address the problem for 2 years, looks the gfp_t isn't needed, doesn't it?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 10, 2012, 8:17 a.m.
On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
>
> No, the problem is autoresume.
>
> Suppose we have a device with two interface. Interface A be usbnet; interface B
> something you page on. Now consider that you can only resume both interfaces
> and this is (and needs to be) done synchronously.
>
> Now we can have this code path:
>
> autoresume of device -> resume() -> kmalloc(..., GFP_KERNEL) ->
> VM layer decides to start paging out -> IO to interface B -> autoresume of device
> --> DEADLOCK

OK, thanks for your detailed explanation.

> We need to use GFP_NOIO in situations the helper cannot know about.
> Please add a gfp_t parameter. Then the caller will solve that.

Considered that most of drivers call the helpers in different context, I think
it is better to switch the gpf_t flag runtime inside helpers, like below:

           if (dev->power.runtime_status == RPM_RESUMING)
                   gfp = GFP_NOIO;
          else
                   gfp = GFP_KERNEL;

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 10, 2012, 8:24 a.m.
On Wednesday 10 October 2012 13:56:16 Ming Lei wrote:
> On Wed, Oct 10, 2012 at 11:19 AM, Ming Lei <ming.lei@canonical.com> wrote:
> > On Tue, Oct 9, 2012 at 4:47 PM, Oliver Neukum <oneukum@suse.de> wrote:
> >>
> >> Using GFP_KERNEL you preclude using those in resume() and error handling.
> >> Please pass a gfp_t parameter.
> >
> > IMO, it is not a big deal because generally only several bytes are to be
> > allocated inside these helpers.
> 
> Also pm_restrict_gfp_mask()/pm_restore_gfp_mask() have been introduced
> to address the problem for 2 years, looks the gfp_t isn't needed, doesn't it?

No, absolutely not. Introducing them was a mistake and is hiding errors.
Those helpers solve the problem only for the case of _system_ suspend/resume.
However the runtime case has the same problem. So in addition to not solving
the problem, we now have two code paths.
Frankly, those functions should be removed.

Secondly, in this case a similar deadlock exists with error handling.
Again take a device with network and storage functions (a.k.a. cell phone).
The storage function does a reset. And the deadlock happens like this:

reset storage -> pre_reset() -> physical reset -> post_reset() -> net interface
does a control message -> kmalloc(..., GFP_KERNEL) -> VM layer decide
to page out -> IO to storage function -> SCSI layer waits for error handler --> DEADLOCK

Believe me, you won't find a fancy solution for this. Just pass the gfp_t.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 10, 2012, 8:39 a.m.
On Wednesday 10 October 2012 16:17:25 Ming Lei wrote:
> On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:

> > We need to use GFP_NOIO in situations the helper cannot know about.
> > Please add a gfp_t parameter. Then the caller will solve that.
> 
> Considered that most of drivers call the helpers in different context, I think
> it is better to switch the gpf_t flag runtime inside helpers, like below:
> 
>            if (dev->power.runtime_status == RPM_RESUMING)
>                    gfp = GFP_NOIO;
>           else
>                    gfp = GFP_KERNEL;

You are admirably persistent ;-)

If you extended the check to RPM_SUSPENDING it might work,
but still the problem with error handling exists.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 10, 2012, 9:48 a.m.
On Wed, Oct 10, 2012 at 4:39 PM, Oliver Neukum <oneukum@suse.de> wrote:
> On Wednesday 10 October 2012 16:17:25 Ming Lei wrote:
>> On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
>
>> > We need to use GFP_NOIO in situations the helper cannot know about.
>> > Please add a gfp_t parameter. Then the caller will solve that.
>>
>> Considered that most of drivers call the helpers in different context, I think
>> it is better to switch the gpf_t flag runtime inside helpers, like below:
>>
>>            if (dev->power.runtime_status == RPM_RESUMING)
>>                    gfp = GFP_NOIO;
>>           else
>>                    gfp = GFP_KERNEL;
>
> You are admirably persistent ;-)

I am only trying to solve the problem more generally, :-)

> If you extended the check to RPM_SUSPENDING it might work,
> but still the problem with error handling exists.

Could you describe the error handling case in a bit detail so
that callers of these helpers can know when GFP_KERNEL
is to be passed and when GFP_NOIO is taken if the gfp
patamerer has to be added?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 10, 2012, 10:08 a.m.
On Wednesday 10 October 2012 17:48:54 Ming Lei wrote:
> On Wed, Oct 10, 2012 at 4:39 PM, Oliver Neukum <oneukum@suse.de> wrote:
> > On Wednesday 10 October 2012 16:17:25 Ming Lei wrote:
> >> On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
> >
> >> > We need to use GFP_NOIO in situations the helper cannot know about.
> >> > Please add a gfp_t parameter. Then the caller will solve that.
> >>
> >> Considered that most of drivers call the helpers in different context, I think
> >> it is better to switch the gpf_t flag runtime inside helpers, like below:
> >>
> >>            if (dev->power.runtime_status == RPM_RESUMING)
> >>                    gfp = GFP_NOIO;
> >>           else
> >>                    gfp = GFP_KERNEL;
> >
> > You are admirably persistent ;-)
> 
> I am only trying to solve the problem more generally, :-)

The most generic solution is passing the parameter.

> > If you extended the check to RPM_SUSPENDING it might work,
> > but still the problem with error handling exists.
> 
> Could you describe the error handling case in a bit detail so
> that callers of these helpers can know when GFP_KERNEL
> is to be passed and when GFP_NOIO is taken if the gfp
> patamerer has to be added?

A reset always applies to the whole device. Resets are used in error
handling of block devices (storage and uas). If you reset a device,
pre_reset() and post_reset() of all interfaces need to be called. So they
are part of the SCSI error handler. SCSI error handlers can allocate memory
only with GFP_NOIO (or GFP_ATOMIC) because any IO for paging
can cause the SCSI layer to wait for the error handling to finish. The error
handling can only finish when pre/post_reset() have finished. Catch-22

So any control messages in block error handling need to use GFP_NOIO.
If you look at the control message helpers in usbcore you will find a lot
of GFP_NOIO. That is the reason.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 10, 2012, 11:02 a.m.
On Wed, Oct 10, 2012 at 6:08 PM, Oliver Neukum <oneukum@suse.de> wrote:

>
> A reset always applies to the whole device. Resets are used in error
> handling of block devices (storage and uas). If you reset a device,
> pre_reset() and post_reset() of all interfaces need to be called. So they
> are part of the SCSI error handler. SCSI error handlers can allocate memory
> only with GFP_NOIO (or GFP_ATOMIC) because any IO for paging
> can cause the SCSI layer to wait for the error handling to finish. The error
> handling can only finish when pre/post_reset() have finished. Catch-22

IMO, it is not practical to obey the rule for drivers, because driver may
call many other kernel component API which may allocate memory
via GFP_KERNEL in the path easily.

Same with runtime PM case.

>
> So any control messages in block error handling need to use GFP_NOIO.
> If you look at the control message helpers in usbcore you will find a lot
> of GFP_NOIO. That is the reason.

If one driver has no .pre_reset or .post_reset, usb_reset_device() will
try to unbind&bind the interface, so you mean these usb drivers have
to use GFP_NOIO to allocate memory in its probe() and disconnect()?
Unfortunately, that is not true, and no way to make sure all memory
allocations in the path via GFP_NOIO, IMO.

Also, only very few drivers have implemented .pre_reset and .post_reset.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight - Oct. 10, 2012, 11:25 a.m.
> On Wed, Oct 10, 2012 at 6:08 PM, Oliver Neukum <oneukum@suse.de> wrote:
> 
> > A reset always applies to the whole device. Resets are used in error
> > handling of block devices (storage and uas). If you reset a device,
> > pre_reset() and post_reset() of all interfaces need to be called. So they
> > are part of the SCSI error handler. SCSI error handlers can allocate memory
> > only with GFP_NOIO (or GFP_ATOMIC) because any IO for paging
> > can cause the SCSI layer to wait for the error handling to finish. The error
> > handling can only finish when pre/post_reset() have finished. Catch-22
> 
> IMO, it is not practical to obey the rule for drivers, because driver may
> call many other kernel component API which may allocate memory
> via GFP_KERNEL in the path easily.

What about the error handler/sleep/resume code calling into the
memory allocator to indicate that all allocates be GFP_NOIO until
it calls back to indicate that the restricted path is complete.
Might be a per-cpu count?

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 10, 2012, 11:39 a.m.
On Wednesday 10 October 2012 12:25:58 David Laight wrote:
> > On Wed, Oct 10, 2012 at 6:08 PM, Oliver Neukum <oneukum@suse.de> wrote:
> > 
> > > A reset always applies to the whole device. Resets are used in error
> > > handling of block devices (storage and uas). If you reset a device,
> > > pre_reset() and post_reset() of all interfaces need to be called. So they
> > > are part of the SCSI error handler. SCSI error handlers can allocate memory
> > > only with GFP_NOIO (or GFP_ATOMIC) because any IO for paging
> > > can cause the SCSI layer to wait for the error handling to finish. The error
> > > handling can only finish when pre/post_reset() have finished. Catch-22
> > 
> > IMO, it is not practical to obey the rule for drivers, because driver may
> > call many other kernel component API which may allocate memory
> > via GFP_KERNEL in the path easily.
> 
> What about the error handler/sleep/resume code calling into the
> memory allocator to indicate that all allocates be GFP_NOIO until
> it calls back to indicate that the restricted path is complete.

This seems to be a very complex scheme.

> Might be a per-cpu count?

No. The handlers may sleep and switch CPUs.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 10, 2012, 11:45 a.m.
On Wed, Oct 10, 2012 at 7:25 PM, David Laight <David.Laight@aculab.com> wrote:

>
> What about the error handler/sleep/resume code calling into the
> memory allocator to indicate that all allocates be GFP_NOIO until
> it calls back to indicate that the restricted path is complete.
> Might be a per-cpu count?

IMO, it might be a per-task variable, but unfortunately no such
mechanism is provided by current kernel.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 11, 2012, 3:18 a.m.
On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
>
> No, the problem is autoresume.
>
> Suppose we have a device with two interface. Interface A be usbnet; interface B
> something you page on. Now consider that you can only resume both interfaces
> and this is (and needs to be) done synchronously.
>
> Now we can have this code path:
>
> autoresume of device -> resume() -> kmalloc(..., GFP_KERNEL) ->
> VM layer decides to start paging out -> IO to interface B -> autoresume of device
> --> DEADLOCK

Currently scsi disk can only be runtime suspended when the device is not
opened, so are you sure that the paging out above can cause IO on a suspend
usb mass storage disk which is not mounted or opened by utility now?


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 11, 2012, 4:11 a.m.
On Thursday 11 October 2012 11:18:22 Ming Lei wrote:
> On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
> >
> > No, the problem is autoresume.
> >
> > Suppose we have a device with two interface. Interface A be usbnet; interface B
> > something you page on. Now consider that you can only resume both interfaces
> > and this is (and needs to be) done synchronously.
> >
> > Now we can have this code path:
> >
> > autoresume of device -> resume() -> kmalloc(..., GFP_KERNEL) ->
> > VM layer decides to start paging out -> IO to interface B -> autoresume of device
> > --> DEADLOCK
> 
> Currently scsi disk can only be runtime suspended when the device is not
> opened, so are you sure that the paging out above can cause IO on a suspend
> usb mass storage disk which is not mounted or opened by utility now?

We definitely do not wish to keep it that way. People at Intel are currently working
on better power management for sd, which would allow full autosuspend.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 11, 2012, 8:14 a.m.
On Thu, Oct 11, 2012 at 12:11 PM, Oliver Neukum <oneukum@suse.de> wrote:
> On Thursday 11 October 2012 11:18:22 Ming Lei wrote:

>> Currently scsi disk can only be runtime suspended when the device is not
>> opened, so are you sure that the paging out above can cause IO on a suspend
>> usb mass storage disk which is not mounted or opened by utility now?
>
> We definitely do not wish to keep it that way. People at Intel are currently working
> on better power management for sd, which would allow full autosuspend.

OK, got it.

For auto-resume situation, it can be solved with switching the gpf_t flag
runtime inside helper, but I think it is better to do it after the sd's
full autosuspend is seen in -next tree.

For error handling case, it is inevitably for usbnet to allocate memory
with GFP_KERNEL because no usbnet drivers have implemented
.pre_reset and .post_reset callback, and no such actual problems
have been reported until now, so it should be OK to not consider the
case now.

So could we merge the patch set[1-11] first?

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 11, 2012, 9:05 a.m.
On Thursday 11 October 2012 16:14:02 Ming Lei wrote:
> On Thu, Oct 11, 2012 at 12:11 PM, Oliver Neukum <oneukum@suse.de> wrote:
> > On Thursday 11 October 2012 11:18:22 Ming Lei wrote:
> 
> >> Currently scsi disk can only be runtime suspended when the device is not
> >> opened, so are you sure that the paging out above can cause IO on a suspend
> >> usb mass storage disk which is not mounted or opened by utility now?
> >
> > We definitely do not wish to keep it that way. People at Intel are currently working
> > on better power management for sd, which would allow full autosuspend.
> 
> OK, got it.
> 
> For auto-resume situation, it can be solved with switching the gpf_t flag
> runtime inside helper, but I think it is better to do it after the sd's
> full autosuspend is seen in -next tree.

That depends on whether an API change would be necessary.
Changing the code only when necessary is no problem. But the
API I want to do right from the beginning if that is possible.

This depends on whether you get in your extension to task_struct.
If you do, we can certainly use it also for the suspend case.

> For error handling case, it is inevitably for usbnet to allocate memory
> with GFP_KERNEL because no usbnet drivers have implemented
> .pre_reset and .post_reset callback, and no such actual problems
> have been reported until now, so it should be OK to not consider the
> case now.
> 
> So could we merge the patch set[1-11] first?

Given the solution for error handling you came up with, yes, we can.
Would you resubmit?

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 11, 2012, 11:29 a.m.
On Thu, Oct 11, 2012 at 5:05 PM, Oliver Neukum <oneukum@suse.de> wrote:
> That depends on whether an API change would be necessary.
> Changing the code only when necessary is no problem. But the
> API I want to do right from the beginning if that is possible.

For the auto-resume situation, the current API is OK even without
changes to task_struct.

>
> This depends on whether you get in your extension to task_struct.
> If you do, we can certainly use it also for the suspend case.

I will do it.

>
>> For error handling case, it is inevitably for usbnet to allocate memory
>> with GFP_KERNEL because no usbnet drivers have implemented
>> .pre_reset and .post_reset callback, and no such actual problems
>> have been reported until now, so it should be OK to not consider the
>> case now.
>>
>> So could we merge the patch set[1-11] first?
>
> Given the solution for error handling you came up with, yes, we can.
> Would you resubmit?

OK, will do it.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Oct. 11, 2012, 2:36 p.m.
On Thu, 11 Oct 2012, Oliver Neukum wrote:

> On Thursday 11 October 2012 11:18:22 Ming Lei wrote:
> > On Wed, Oct 10, 2012 at 1:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
> > >
> > > No, the problem is autoresume.
> > >
> > > Suppose we have a device with two interface. Interface A be usbnet; interface B
> > > something you page on. Now consider that you can only resume both interfaces
> > > and this is (and needs to be) done synchronously.
> > >
> > > Now we can have this code path:
> > >
> > > autoresume of device -> resume() -> kmalloc(..., GFP_KERNEL) ->
> > > VM layer decides to start paging out -> IO to interface B -> autoresume of device
> > > --> DEADLOCK
> > 
> > Currently scsi disk can only be runtime suspended when the device is not
> > opened, so are you sure that the paging out above can cause IO on a suspend
> > usb mass storage disk which is not mounted or opened by utility now?
> 
> We definitely do not wish to keep it that way. People at Intel are currently working
> on better power management for sd, which would allow full autosuspend.

It's worse than you may realize.  When a SCSI disk is suspended, all of
its ancestor devices may be suspended too.  Pages can't be read in from
the drive until all those ancestors are resumed.  This means that all
runtime resume code paths for all drivers that could be bound to an
ancestor of a block device must avoid GFP_KERNEL.  In practice it's
probably easiest for the runtime PM core to use tsk_set_allowd_gfp()
before calling any runtime_resume method.

Or at least, this will be true when sd supports nontrivial autosuspend.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 12, 2012, 1:43 a.m.
On Thu, Oct 11, 2012 at 10:36 PM, Alan Stern <stern@rowland.harvard.edu> wrote:

>
> It's worse than you may realize.  When a SCSI disk is suspended, all of
> its ancestor devices may be suspended too.  Pages can't be read in from
> the drive until all those ancestors are resumed.  This means that all
> runtime resume code paths for all drivers that could be bound to an
> ancestor of a block device must avoid GFP_KERNEL.  In practice it's

Exactly, so several subsystems(for example, pci, usb, scsi) will be involved,
and converting GFP_KERNEL in runtime PM path to GFP_NOIO becomes
more difficult.

> probably easiest for the runtime PM core to use tsk_set_allowd_gfp()
> before calling any runtime_resume method.

Yes, it might be done in usb runtime resume context because all
usb device might include a mass storage interface. But, in fact,
we can find if there is one mass storage interface on the current
configuration easily inside usb_runtime_resume().

Also, we can loose the constraint in runtime PM core, before calling
runtime_resume callback for one device, the current context is marked
as ~GFP_IOFS only if it is a block device or there is one block device
descendant. But the approach becomes a bit complicated because
device tree traversing is involved.


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 12, 2012, 1:51 p.m.
On Thursday 11 October 2012 10:36:22 Alan Stern wrote:

> It's worse than you may realize.  When a SCSI disk is suspended, all of
> its ancestor devices may be suspended too.  Pages can't be read in from
> the drive until all those ancestors are resumed.  This means that all
> runtime resume code paths for all drivers that could be bound to an
> ancestor of a block device must avoid GFP_KERNEL.  In practice it's
> probably easiest for the runtime PM core to use tsk_set_allowd_gfp()
> before calling any runtime_resume method.
> 
> Or at least, this will be true when sd supports nontrivial autosuspend.

Up to now, I've found three driver for which tsk_set_allowd_gfp() wouldn't
do the job. They boil down into two types of errors. That is surprisingly good.

First we have workqueues. bas-gigaset is a good example.
The driver kills a scheduled work in pre_reset(). If this is done synchronously
the driver may need to wait for a memory allocation inside the work.
In principle we could provide a workqueue limited to GFP_NOIO. Is that worth
it, or do we just check?

Second there is a problem just like priority inversion with realtime tasks.
usb-skeleton and ati_remote2
They take mutexes which are also taken in other code paths. So the error
handler may need to wait for a mutex to be dropped which can only happen
if a memory allocation succeeds, which is waiting for the error handler.

usb-skeleton is even worse, as it does copy_to_user(). I guess copy_to/from_user
must simply not be done under such a mutex.

I am afraid there is no generic solution in the last two cases. What do you think?

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 12, 2012, 3:17 p.m.
On Fri, Oct 12, 2012 at 9:51 PM, Oliver Neukum <oneukum@suse.de> wrote:
> On Thursday 11 October 2012 10:36:22 Alan Stern wrote:
>
>> It's worse than you may realize.  When a SCSI disk is suspended, all of
>> its ancestor devices may be suspended too.  Pages can't be read in from
>> the drive until all those ancestors are resumed.  This means that all
>> runtime resume code paths for all drivers that could be bound to an
>> ancestor of a block device must avoid GFP_KERNEL.  In practice it's
>> probably easiest for the runtime PM core to use tsk_set_allowd_gfp()
>> before calling any runtime_resume method.
>>
>> Or at least, this will be true when sd supports nontrivial autosuspend.
>
> Up to now, I've found three driver for which tsk_set_allowd_gfp() wouldn't
> do the job. They boil down into two types of errors. That is surprisingly good.

Looks all are very good examples, :-)

>
> First we have workqueues. bas-gigaset is a good example.
> The driver kills a scheduled work in pre_reset(). If this is done synchronously
> the driver may need to wait for a memory allocation inside the work.
> In principle we could provide a workqueue limited to GFP_NOIO. Is that worth
> it, or do we just check?

The easiest way is to always call tsk_set_allowd_gfp(~GFP_IOFS) in the
start of work function under the situation, and restore the flag in the end
of the work function.

>
> Second there is a problem just like priority inversion with realtime tasks.
> usb-skeleton and ati_remote2
> They take mutexes which are also taken in other code paths. So the error
> handler may need to wait for a mutex to be dropped which can only happen
> if a memory allocation succeeds, which is waiting for the error handler.

Suppose mutex_lock(A) is called in pre_reset(), one solution is that
always calling tsk_set_allowd_gfp(~GFP_IOFS) before each mutex_lock(A).
We can do it only for devices with storage interface in current
configuration.


Thanks´╝î
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Oct. 12, 2012, 3:18 p.m.
On Fri, 12 Oct 2012, Ming Lei wrote:

> > probably easiest for the runtime PM core to use tsk_set_allowd_gfp()
> > before calling any runtime_resume method.
> 
> Yes, it might be done in usb runtime resume context because all
> usb device might include a mass storage interface. But, in fact,
> we can find if there is one mass storage interface on the current
> configuration easily inside usb_runtime_resume().
> 
> Also, we can loose the constraint in runtime PM core, before calling
> runtime_resume callback for one device, the current context is marked
> as ~GFP_IOFS only if it is a block device or there is one block device
> descendant. But the approach becomes a bit complicated because
> device tree traversing is involved.

Exactly.  It's much easier just to do it always.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Oct. 12, 2012, 3:29 p.m.
On Fri, 12 Oct 2012, Oliver Neukum wrote:

> On Thursday 11 October 2012 10:36:22 Alan Stern wrote:
> 
> > It's worse than you may realize.  When a SCSI disk is suspended, all of
> > its ancestor devices may be suspended too.  Pages can't be read in from
> > the drive until all those ancestors are resumed.  This means that all
> > runtime resume code paths for all drivers that could be bound to an
> > ancestor of a block device must avoid GFP_KERNEL.  In practice it's
> > probably easiest for the runtime PM core to use tsk_set_allowd_gfp()
> > before calling any runtime_resume method.
> > 
> > Or at least, this will be true when sd supports nontrivial autosuspend.
> 
> Up to now, I've found three driver for which tsk_set_allowd_gfp() wouldn't
> do the job. They boil down into two types of errors. That is surprisingly good.
> 
> First we have workqueues. bas-gigaset is a good example.
> The driver kills a scheduled work in pre_reset(). If this is done synchronously
> the driver may need to wait for a memory allocation inside the work.
> In principle we could provide a workqueue limited to GFP_NOIO. Is that worth
> it, or do we just check?

The work routine could set the GFP mask upon entry and exit.  Then a 
separate workqueue wouldn't be needed.

> Second there is a problem just like priority inversion with realtime tasks.
> usb-skeleton and ati_remote2
> They take mutexes which are also taken in other code paths. So the error
> handler may need to wait for a mutex to be dropped which can only happen
> if a memory allocation succeeds, which is waiting for the error handler.
> 
> usb-skeleton is even worse, as it does copy_to_user(). I guess copy_to/from_user
> must simply not be done under such a mutex.

Right.

> I am afraid there is no generic solution in the last two cases. What do you think?

The other contexts must also set the GFP mask.  Unfortunately, this has 
to be done case-by-case.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei - Oct. 12, 2012, 3:33 p.m.
On Fri, Oct 12, 2012 at 11:17 PM, Ming Lei <ming.lei@canonical.com> wrote:

> Suppose mutex_lock(A) is called in pre_reset(), one solution is that
> always calling tsk_set_allowd_gfp(~GFP_IOFS) before each mutex_lock(A).
> We can do it only for devices with storage interface in current
> configuration.

The problem will become quite complicated if a 3rd, even 4th,... context
is involved in the task dependency.

But I am wondering if there are such practical examples wrt. usb
mass storage.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum - Oct. 15, 2012, 10:04 a.m.
On Friday 12 October 2012 11:29:49 Alan Stern wrote:
> On Fri, 12 Oct 2012, Oliver Neukum wrote:

> > First we have workqueues. bas-gigaset is a good example.
> > The driver kills a scheduled work in pre_reset(). If this is done synchronously
> > the driver may need to wait for a memory allocation inside the work.
> > In principle we could provide a workqueue limited to GFP_NOIO. Is that worth
> > it, or do we just check?
> 
> The work routine could set the GFP mask upon entry and exit.  Then a 
> separate workqueue wouldn't be needed.

Well, yes. But if we have to touch the code we might just as well use GFP-NOIO

> > I am afraid there is no generic solution in the last two cases. What do you think?
> 
> The other contexts must also set the GFP mask.  Unfortunately, this has 
> to be done case-by-case.

This raises a question. If we do the port-power-off stuff, does reset_resume() of every
device under the depowered port have to be called? If so, we cannot exclude vendor
specific drivers from the audit, can we?

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Oct. 15, 2012, 2:27 p.m.
On Mon, 15 Oct 2012, Oliver Neukum wrote:

> On Friday 12 October 2012 11:29:49 Alan Stern wrote:
> > On Fri, 12 Oct 2012, Oliver Neukum wrote:
> 
> > > First we have workqueues. bas-gigaset is a good example.
> > > The driver kills a scheduled work in pre_reset(). If this is done synchronously
> > > the driver may need to wait for a memory allocation inside the work.
> > > In principle we could provide a workqueue limited to GFP_NOIO. Is that worth
> > > it, or do we just check?
> > 
> > The work routine could set the GFP mask upon entry and exit.  Then a 
> > separate workqueue wouldn't be needed.
> 
> Well, yes. But if we have to touch the code we might just as well use GFP-NOIO

Depends on what the code does.  If it does very little requiring memory 
allocation then yes, you could simply use GFP_NOIO.  But if it calls 
lots of other routines that all do their own allocation, setting the 
mask will be better.

> > > I am afraid there is no generic solution in the last two cases. What do you think?
> > 
> > The other contexts must also set the GFP mask.  Unfortunately, this has 
> > to be done case-by-case.
> 
> This raises a question. If we do the port-power-off stuff, does reset_resume() of every
> device under the depowered port have to be called?

Certainly.

>  If so, we cannot exclude vendor
> specific drivers from the audit, can we?

True.  But hasn't that always been the case?  A device could need a 
vendor-specific driver for one interface while another interface uses a 
standard mass-storage protocol.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index fc9f578..3b51554 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1592,6 +1592,139 @@  int usbnet_resume (struct usb_interface *intf)
 }
 EXPORT_SYMBOL_GPL(usbnet_resume);
 
+/*-------------------------------------------------------------------------*/
+int usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
+		    u16 value, u16 index, void *data, u16 size)
+{
+	void *buf = NULL;
+	int err = -ENOMEM;
+
+	netdev_dbg(dev->net, "usbnet_read_cmd cmd=0x%02x reqtype=%02x"
+		   " value=0x%04x index=0x%04x size=%d\n",
+		   cmd, reqtype, value, index, size);
+
+	if (data) {
+		buf = kmalloc(size, GFP_KERNEL);
+		if (!buf)
+			goto out;
+	}
+
+	err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
+			      cmd, reqtype, value, index, buf, size,
+			      USB_CTRL_GET_TIMEOUT);
+	if (err > 0 && err <= size)
+		memcpy(data, buf, err);
+	kfree(buf);
+out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(usbnet_read_cmd);
+
+int usbnet_write_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
+		     u16 value, u16 index, const void *data, u16 size)
+{
+	void *buf = NULL;
+	int err = -ENOMEM;
+
+	netdev_dbg(dev->net, "usbnet_write_cmd cmd=0x%02x reqtype=%02x"
+		   " value=0x%04x index=0x%04x size=%d\n",
+		   cmd, reqtype, value, index, size);
+
+	if (data) {
+		buf = kmemdup(data, size, GFP_KERNEL);
+		if (!buf)
+			goto out;
+	}
+
+	err = usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0),
+			      cmd, reqtype, value, index, buf, size,
+			      USB_CTRL_SET_TIMEOUT);
+	kfree(buf);
+
+out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(usbnet_write_cmd);
+
+static void usbnet_async_cmd_cb(struct urb *urb)
+{
+	struct usb_ctrlrequest *req = (struct usb_ctrlrequest *)urb->context;
+	int status = urb->status;
+
+	if (status < 0)
+		dev_dbg(&urb->dev->dev, "%s failed with %d",
+			__func__, status);
+
+	kfree(req);
+	usb_free_urb(urb);
+}
+
+int usbnet_write_cmd_async(struct usbnet *dev, u8 cmd, u8 reqtype,
+			   u16 value, u16 index, const void *data, u16 size)
+{
+	struct usb_ctrlrequest *req = NULL;
+	struct urb *urb;
+	int err = -ENOMEM;
+	void *buf = NULL;
+
+	netdev_dbg(dev->net, "usbnet_write_cmd cmd=0x%02x reqtype=%02x"
+		   " value=0x%04x index=0x%04x size=%d\n",
+		   cmd, reqtype, value, index, size);
+
+	urb = usb_alloc_urb(0, GFP_ATOMIC);
+	if (!urb) {
+		netdev_err(dev->net, "Error allocating URB in"
+			   " %s!\n", __func__);
+		goto fail;
+	}
+
+	if (data) {
+		buf = kmemdup(data, size, GFP_ATOMIC);
+		if (!buf) {
+			netdev_err(dev->net, "Error allocating buffer"
+				   " in %s!\n", __func__);
+			goto fail_free;
+		}
+	}
+
+	req = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
+	if (!req) {
+		netdev_err(dev->net, "Failed to allocate memory for %s\n",
+			   __func__);
+		goto fail_free_buf;
+	}
+
+	req->bRequestType = reqtype;
+	req->bRequest = cmd;
+	req->wValue = cpu_to_le16(value);
+	req->wIndex = cpu_to_le16(index);
+	req->wLength = cpu_to_le16(size);
+
+	usb_fill_control_urb(urb, dev->udev,
+			     usb_sndctrlpipe(dev->udev, 0),
+			     (void *)req, buf, size,
+			     usbnet_async_cmd_cb, req);
+	urb->transfer_flags |= URB_FREE_BUFFER;
+
+	err = usb_submit_urb(urb, GFP_ATOMIC);
+	if (err < 0) {
+		netdev_err(dev->net, "Error submitting the control"
+			   " message: status=%d\n", err);
+		goto fail_free;
+	}
+	return 0;
+
+fail_free_buf:
+	kfree(buf);
+fail_free:
+	kfree(req);
+	usb_free_urb(urb);
+fail:
+	return err;
+
+}
+EXPORT_SYMBOL_GPL(usbnet_write_cmd_async);
+
 
 /*-------------------------------------------------------------------------*/
 
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index f87cf62..32a57dd 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -161,6 +161,12 @@  extern int usbnet_suspend(struct usb_interface *, pm_message_t);
 extern int usbnet_resume(struct usb_interface *);
 extern void usbnet_disconnect(struct usb_interface *);
 
+extern int usbnet_read_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
+		    u16 value, u16 index, void *data, u16 size);
+extern int usbnet_write_cmd(struct usbnet *dev, u8 cmd, u8 reqtype,
+		    u16 value, u16 index, const void *data, u16 size);
+extern int usbnet_write_cmd_async(struct usbnet *dev, u8 cmd, u8 reqtype,
+		    u16 value, u16 index, const void *data, u16 size);
 
 /* Drivers that reuse some of the standard USB CDC infrastructure
  * (notably, using multiple interfaces according to the CDC