diff mbox

Make e100 suspend handler support PCI cards lacking PM capability

Message ID 200906141606.29754.rjw@sisk.pl
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Rafael J. Wysocki June 14, 2009, 2:06 p.m. UTC
On Sunday 14 June 2009, Andreas Mohr wrote:
> Hi,
> 
> On Sun, Jun 14, 2009 at 12:28:15AM +0200, Rafael J. Wysocki wrote:
> > On Saturday 13 June 2009, Andreas Mohr wrote:
> > > Hi all,
> > > 
> > > after having added non-MII PHY card support to e100, I noticed that
> > > the suspend handler rejects power-management non-capable PCI cards,
> > 
> > Well, that means we have a bug somewhere in the PCI PM core.
> 
> I don't know. I had shortly investigated the same thing,
> but it very much seemed that this is by design, pci_set_power_state()
> is documented to reject non-PM cards (in power/pci.txt, and in
> pci.c, too). Thus I didn't work in this area.
> 
> And from a cleanliness point of view pci_set_power_state()
> acting on a non-PM card with no special non-PM ACPI support _should_
> return an error status I guess.
> (especially since docs say that pci_set_power_state() should
> be used for PM cards)
> 
> > > causing a S2R request to immediately get back up to the desktop,
> > > losing network access in the process (rtnl mutex deadlock).
> > 
> > That's bad.
> 
> Indeed, and I have no idea what the problem was.
> rtnl_is_locked() always was false within suspend/resume,
> thus it had to be a userspace-triggered effect sometime
> _after_ (non-)resume happened
> (probably due to the network controller being down and thus inoperable
> after .suspend).
> 
> BTW, after that failed .suspend, .resume was not called. I assume this to
> be correct behaviour.
> 
> > >  static int __e100_power_off(struct pci_dev *pdev, bool wake)
> > >  {
> > > +	/* some older devices don't support PCI PM
> > > +	 * (e.g. mac_82557_D100_B combo card with 80c24 PHY)
> > > +	 * - skip those! (they most likely won't support WoL either)
> > > +	 */
> > > +	if (!pci_find_capability(pdev, PCI_CAP_ID_PM))
> > > +		return 0;
> > 
> > Devices without PCI_CAP_ID_PM may still be power-manageable by ACPI, so
> > returning 0 at this point is not a general solution.
> 
> Oh, interesting.
> 
> BTW, any idea why we have several drivers doing some seemingly useless
> 
>         /* Find power-management capability. */
>         pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
>         if (pm_cap == 0) {
>                 printk(KERN_ERR PFX "Cannot find PowerManagement capability, "
>                        "aborting.\n");
>                 err = -EIO;
>                 goto err_out_free_res;
>         }
> 
> ?
> 
> - it's code bloat
> - it needlessly rejects non-PM cards
> - it annoys the hell out of users of a non-PM card

No idea.

> > > +
> > >  	if (wake) {
> > >  		return pci_prepare_to_sleep(pdev);
> > 
> > pci_prepare_to_sleep() is supposed to return 0 for your device.  I'll have a
> > look at it.
> 
> No, wake is false for my card, thus that's not the branch to
> investigate.

Ah.  The problem is, then, that we try to put the device into D3, which it
cannot do and error code is correctly returned from pci_set_power_state().

I would use the appended patch in that case and the patch sent previously
is necessary for the 'wake = true' case.

Thanks,
Rafael

---
 drivers/net/e100.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andreas Mohr June 14, 2009, 4:46 p.m. UTC | #1
Hi,

On Sun, Jun 14, 2009 at 04:06:29PM +0200, Rafael J. Wysocki wrote:
> On Sunday 14 June 2009, Andreas Mohr wrote:
> > Hi,
> > 
> > On Sun, Jun 14, 2009 at 12:28:15AM +0200, Rafael J. Wysocki wrote:
> > > On Saturday 13 June 2009, Andreas Mohr wrote:
> > > > +
> > > >  	if (wake) {
> > > >  		return pci_prepare_to_sleep(pdev);
> > > 
> > > pci_prepare_to_sleep() is supposed to return 0 for your device.  I'll have a
> > > look at it.
> > 
> > No, wake is false for my card, thus that's not the branch to
> > investigate.
> 
> Ah.  The problem is, then, that we try to put the device into D3, which it
> cannot do and error code is correctly returned from pci_set_power_state().
> 
> I would use the appended patch in that case and the patch sent previously
> is necessary for the 'wake = true' case.

OK, as said I cannot test this right now, but I'm _damn_ sure it would
work. Thus I'd say your equivalent patch posted a bit later can be
committed already.

But what about the wake = true case?
I'm not affected by this since my card doesn't have any wake capa,
thus it's your call of whether that pci core code part really was broken
and needed fixing.

So, for the patch in your next mail:
Acked-by: Andreas Mohr <andi@lisas.de>


BTW, that patch was (pasted):

 static int __e100_power_off(struct pci_dev *pdev, bool wake)
 {
-       if (wake) {
+       if (wake)
                return pci_prepare_to_sleep(pdev);
-       } else {
-               pci_wake_from_d3(pdev, false);
-               return pci_set_power_state(pdev, PCI_D3hot);
-       }
+
+       pci_wake_from_d3(pdev, false);
+       pci_set_power_state(pdev, PCI_D3hot);
+
+       return 0;
 }


Couple questions still:
- why do we call pci_wake_from_d3(...false) only!?
  Wouldn't this break WoL after one iteration back and forth,
  due to missing 'true' case?
- why do we call netif_device_detach() _after_ doing hardware shutdown
  of the network controller? I'd guess this can cause huge issues?
  Someone told me he had rtnl lock issues upon S2D with e100
  (very similar to my rtnl issues during aborted .suspend),
  and that might possibly be the reason?

So few lines of code, so many questions...

Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 14, 2009, 5:09 p.m. UTC | #2
On Sunday 14 June 2009, Andreas Mohr wrote:
> Hi,
> 
> On Sun, Jun 14, 2009 at 04:06:29PM +0200, Rafael J. Wysocki wrote:
> > On Sunday 14 June 2009, Andreas Mohr wrote:
> > > Hi,
> > > 
> > > On Sun, Jun 14, 2009 at 12:28:15AM +0200, Rafael J. Wysocki wrote:
> > > > On Saturday 13 June 2009, Andreas Mohr wrote:
> > > > > +
> > > > >  	if (wake) {
> > > > >  		return pci_prepare_to_sleep(pdev);
> > > > 
> > > > pci_prepare_to_sleep() is supposed to return 0 for your device.  I'll have a
> > > > look at it.
> > > 
> > > No, wake is false for my card, thus that's not the branch to
> > > investigate.
> > 
> > Ah.  The problem is, then, that we try to put the device into D3, which it
> > cannot do and error code is correctly returned from pci_set_power_state().
> > 
> > I would use the appended patch in that case and the patch sent previously
> > is necessary for the 'wake = true' case.
> 
> OK, as said I cannot test this right now, but I'm _damn_ sure it would
> work. Thus I'd say your equivalent patch posted a bit later can be
> committed already.
> 
> But what about the wake = true case?
> I'm not affected by this since my card doesn't have any wake capa,
> thus it's your call of whether that pci core code part really was broken
> and needed fixing.

I think it needs fixing.

> So, for the patch in your next mail:
> Acked-by: Andreas Mohr <andi@lisas.de>
> 
> 
> BTW, that patch was (pasted):
> 
>  static int __e100_power_off(struct pci_dev *pdev, bool wake)
>  {
> -       if (wake) {
> +       if (wake)
>                 return pci_prepare_to_sleep(pdev);
> -       } else {
> -               pci_wake_from_d3(pdev, false);
> -               return pci_set_power_state(pdev, PCI_D3hot);
> -       }
> +
> +       pci_wake_from_d3(pdev, false);
> +       pci_set_power_state(pdev, PCI_D3hot);
> +
> +       return 0;
>  }
> 
> 
> Couple questions still:
> - why do we call pci_wake_from_d3(...false) only!?
>   Wouldn't this break WoL after one iteration back and forth,
>   due to missing 'true' case?

The 'true' case is the 'wake = true' one.

> - why do we call netif_device_detach() _after_ doing hardware shutdown
>   of the network controller? I'd guess this can cause huge issues?
>   Someone told me he had rtnl lock issues upon S2D with e100
>   (very similar to my rtnl issues during aborted .suspend),
>   and that might possibly be the reason?

I think you're right, but I'm not a network driver expert.

Perhaps you can change the ordering and see if that fixes the rtnl issue
(since you're able to reproduce it without my patch, that should be easy to
verify).

Best,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Mohr June 14, 2009, 5:20 p.m. UTC | #3
Hi,

On Sun, Jun 14, 2009 at 07:09:45PM +0200, Rafael J. Wysocki wrote:
> On Sunday 14 June 2009, Andreas Mohr wrote:
> > Couple questions still:
> > - why do we call pci_wake_from_d3(...false) only!?
> >   Wouldn't this break WoL after one iteration back and forth,
> >   due to missing 'true' case?
> 
> The 'true' case is the 'wake = true' one.

OK, so it wasn't an explicit pci_wake_from_d3(...true), but the
operations done there are the equivalent of it probably.

> > - why do we call netif_device_detach() _after_ doing hardware shutdown
> >   of the network controller? I'd guess this can cause huge issues?
> >   Someone told me he had rtnl lock issues upon S2D with e100
> >   (very similar to my rtnl issues during aborted .suspend),
> >   and that might possibly be the reason?
> 
> I think you're right, but I'm not a network driver expert.
> 
> Perhaps you can change the ordering and see if that fixes the rtnl issue
> (since you're able to reproduce it without my patch, that should be easy to
> verify).

I'll test this - later.

Thanks a lot,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Mohr June 19, 2009, 8 a.m. UTC | #4
Hi,

On Sun, Jun 14, 2009 at 07:09:45PM +0200, Rafael J. Wysocki wrote:
> On Sunday 14 June 2009, Andreas Mohr wrote:
> > - why do we call netif_device_detach() _after_ doing hardware shutdown
> >   of the network controller? I'd guess this can cause huge issues?
> >   Someone told me he had rtnl lock issues upon S2D with e100
> >   (very similar to my rtnl issues during aborted .suspend),
> >   and that might possibly be the reason?
> 
> I think you're right, but I'm not a network driver expert.
> 
> Perhaps you can change the ordering and see if that fixes the rtnl issue
> (since you're able to reproduce it without my patch, that should be easy to
> verify).

Well, I just moved netif_device_detach() above netif_running() check,
but this didn't fix my network issues in case of a rejecting .suspend
handler: after resume when unloading e100, that hangs, and I get tons of
rtnl timeouts and locked rtnl mutex.
This is most likely because upon e100 unload, a backtrace showed that I
was hanging in e100_down -> msleep (somewhere at the very beginning of e100_down),
which is most definitely the inlined napi_disable() call there:

static inline void napi_disable(struct napi_struct *n)
{
        set_bit(NAPI_STATE_DISABLE, &n->state);
        while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
                msleep(1);
        clear_bit(NAPI_STATE_DISABLE, &n->state);
}

IOW the .suspend seems to keep NAPI layer active, yet due to .suspend failure
there's no .resume called, thus card is in an _inoperable_ state and
NAPI cannot be processed any further, thus napi_disable() on driver unload
locks up.


BTW, in include/linux/napi.h, shouldn't napi_disable() make use of
napi_synchronize() instead of C&P?
(simply move napi_synchronize() above napi_disable() and use it there)
Oh wait, there's the CONFIG_SMP complication:
napi_synchronize() is implemented for SMP only, whereas napi_disable()
checks the same thing _always_.
(or is it a BUG that napi_disable() does the same check for non-SMP,
too??)

Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/net/e100.c
===================================================================
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2763,8 +2763,9 @@  static int __e100_power_off(struct pci_d
 		return pci_prepare_to_sleep(pdev);
 	} else {
 		pci_wake_from_d3(pdev, false);
-		return pci_set_power_state(pdev, PCI_D3hot);
+		pci_set_power_state(pdev, PCI_D3hot);
 	}
+	return 0;
 }
 
 #ifdef CONFIG_PM