Patchwork 2.6.29-rc3: tg3 dead after resume

Submitter: Rafael J. Wysocki
Date: Jan. 30, 2009, 11:31 p.m.
Message ID: <200901310031.37660.rjw@sisk.pl>
Permalink: /patch/21288/
State: RFC
Delegated to: David Miller

Comments

Rafael J. Wysocki - Jan. 30, 2009, 11:31 p.m.
On Saturday 31 January 2009, Parag Warudkar wrote:
> 
> On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
> 
> > On Friday 30 January 2009, Parag Warudkar wrote:
> > > 
> > > On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
> > > 
> > > > 
> > > > I'm still interested in whether it makes any difference for Parag.
> > > 
> > > No difference - tg3 is still dead after resume.
> > 
> > Thanks for testing.
> > 
> > Well, I'm not sure if tg3 is at fault, really.
> > 
> > What happens if you unload tg3 before suspend and load it back after the
> > resume?
> 
> This time it fails with a different error when loading after the suspend/resume
> cycle:
> 
> [ 1196.873608] tg3 0000:0e:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [ 1196.873620] tg3 0000:0e:00.0: setting latency timer to 64
> [ 1196.880017] tg3 0000:0e:00.0: PME# disabled
> [ 1196.996270] tg3: (0000:0e:00.0) phy probe failed, err -19
> [ 1197.508033] tg3: Problem fetching invariants of chip, aborting.
> [ 1197.508048] tg3 0000:0e:00.0: PCI INT A disabled

It seems like something between the tg3 chip and the host CPU doesn't work
correctly after resume; Linus is right.

I wonder if this change makes any difference:



Rafael
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds - Jan. 30, 2009, 11:51 p.m.
On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> 
> I wonder if this change makes any difference:
> 
> --- linux-2.6.orig/drivers/pci/pci-driver.c
> +++ linux-2.6/drivers/pci/pci-driver.c
> @@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend(dev, PMSG_SUSPEND);
>  
> +	if (!drv || !drv->pm)
> +		return 0;
> +
>  	if (drv && drv->pm && drv->pm->suspend) {
>  		error = drv->pm->suspend(dev);
>  		suspend_report_result(drv->pm->suspend, error);

I don't think that's right. Now you don't end up calling 
pci_pm_default_suspend_generic() at all, and thus no pci_save_state().

But I think it could easily be the call to pci_disable_enabled_device(). 
It does that

	if (atomic_read(&dev->enable_cnt))
		do_pci_disable_device(dev);

and that ends up disabling PCI_COMMAND_MASTER and then calling 
pcibios_disable_device().

Any device we have ever done pci_enable_device() on would trigger this, 
which includes PCIe bridges, for example. And while the pcie driver does 
that

	pcie_portdrv_restore_config ->
		pci_enable_device(dev);

thing to re-enable it, that's a no-op since the enable_count is already 
non-zero.

And we do try to restore it (pci_restore_standard_config() will call 
pci_restore_state()), but since we've done the 
pci_disable_enabled_device() _before_ we did the pci_save_state(), we now 
restore a non-working setup. 

I think. The rules are too damn subtle there.  Rafael, can you look around 
a bit?

		Linus
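A minimal toy model of the ordering problem described above, written in plain C rather than against the kernel API. Everything named `toy_*` is hypothetical; only the two command-register bit values (memory decode 0x2, bus master 0x4) come from the PCI specification. It assumes, as the quoted code suggests, that the disable step clears bus mastering without dropping `enable_cnt`, and that enabling is reference-counted so a second enable does nothing. It illustrates the hypothesis only and is not a diagnosis of the tg3 failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT kernel code. Bit values match
 * PCI_COMMAND_MEMORY (0x2) and PCI_COMMAND_MASTER (0x4). */
#define TOY_CMD_MEMORY 0x2u
#define TOY_CMD_MASTER 0x4u

struct toy_pci_dev {
	int enable_cnt;        /* plays the role of dev->enable_cnt */
	unsigned int command;  /* live PCI_COMMAND register */
	unsigned int saved;    /* snapshot taken by toy_save_state() */
	bool state_saved;
};

/* Reference-counted enable: only the first caller touches the hardware;
 * later callers just bump the count and return. */
static void toy_enable_device(struct toy_pci_dev *dev)
{
	if (++dev->enable_cnt > 1)
		return; /* already enabled: no-op */
	dev->command |= TOY_CMD_MEMORY;
}

static void toy_set_master(struct toy_pci_dev *dev)
{
	dev->command |= TOY_CMD_MASTER;
}

/* Mirrors the quoted check: disable if enabled, but leave enable_cnt alone. */
static void toy_disable_enabled_device(struct toy_pci_dev *dev)
{
	if (dev->enable_cnt)
		dev->command &= ~TOY_CMD_MASTER;
}

static void toy_save_state(struct toy_pci_dev *dev)
{
	dev->saved = dev->command;
	dev->state_saved = true;
}

static void toy_restore_state(struct toy_pci_dev *dev)
{
	assert(dev->state_saved);
	dev->command = dev->saved;
}

/* Runs probe -> suspend -> resume and reports whether bus mastering
 * survives. save_first picks the order of save vs. disable at suspend. */
bool toy_master_survives_resume(bool save_first)
{
	struct toy_pci_dev dev = { 0 };

	/* probe */
	toy_enable_device(&dev);
	toy_set_master(&dev);

	/* suspend */
	if (save_first) {
		toy_save_state(&dev);
		toy_disable_enabled_device(&dev);
	} else {
		toy_disable_enabled_device(&dev);
		toy_save_state(&dev);
	}

	/* resume: restore config space, then a driver "re-enables" the device */
	toy_restore_state(&dev);
	toy_enable_device(&dev); /* enable_cnt goes 1 -> 2, so this is a no-op */

	return (dev.command & TOY_CMD_MASTER) != 0;
}
```

With `save_first` true, bus mastering comes back with the restored snapshot; with it false, the snapshot already lacks the bit and the later re-enable cannot put it back.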
Rafael J. Wysocki - Jan. 31, 2009, 12:07 a.m.
On Saturday 31 January 2009, Linus Torvalds wrote:
> 
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> > 
> > I wonder if this change makes any difference:
> > 
> > --- linux-2.6.orig/drivers/pci/pci-driver.c
> > +++ linux-2.6/drivers/pci/pci-driver.c
> > @@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
> >  	if (pci_has_legacy_pm_support(pci_dev))
> >  		return pci_legacy_suspend(dev, PMSG_SUSPEND);
> >  
> > +	if (!drv || !drv->pm)
> > +		return 0;
> > +
> >  	if (drv && drv->pm && drv->pm->suspend) {
> >  		error = drv->pm->suspend(dev);
> >  		suspend_report_result(drv->pm->suspend, error);
> 
> I don't think that's right. Now you don't end up calling 
> pci_pm_default_suspend_generic() at all, and thus no pci_save_state().
> 
> But I think it could easily be the call to pci_disable_enabled_device(). 
> It does that
> 
> 	if (atomic_read(&dev->enable_cnt))
> 		do_pci_disable_device(dev);
> 
> and that ends up disabling PCI_COMMAND_MASTER and then calling 
> pcibios_disable_device().
> 
> Any device we have ever done pci_enable_device() on would trigger this, 
> which includes PCIe bridges, for example. And while the pcie driver does 
> that
> 
> 	pcie_portdrv_restore_config ->
> 		pci_enable_device(dev);
> 
> thing to re-enable it, that's a no-op since the enable_count is already 
> non-zero.
> 
> And we do try to restore it (pci_restore_standard_config() will call 
> pci_restore_state()), but since we've done the 
> pci_disable_enabled_device() _before_ we did the pci_save_state(), we now 
> restore a non-working setup. 
> 
> I think. The rules are too damn subtle there.  Rafael, can you look around 
> a bit?

Sure, I'm looking at it right now.

Rafael
Rafael J. Wysocki - Jan. 31, 2009, 12:34 a.m.
On Saturday 31 January 2009, Linus Torvalds wrote:
> 
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> > 
> > I wonder if this change makes any difference:
> > 
> > --- linux-2.6.orig/drivers/pci/pci-driver.c
> > +++ linux-2.6/drivers/pci/pci-driver.c
> > @@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
> >  	if (pci_has_legacy_pm_support(pci_dev))
> >  		return pci_legacy_suspend(dev, PMSG_SUSPEND);
> >  
> > +	if (!drv || !drv->pm)
> > +		return 0;
> > +
> >  	if (drv && drv->pm && drv->pm->suspend) {
> >  		error = drv->pm->suspend(dev);
> >  		suspend_report_result(drv->pm->suspend, error);
> 
> I don't think that's right. Now you don't end up calling 
> pci_pm_default_suspend_generic() at all, and thus no pci_save_state().
> 
> But I think it could easily be the call to pci_disable_enabled_device(). 
> It does that
> 
> 	if (atomic_read(&dev->enable_cnt))
> 		do_pci_disable_device(dev);
> 
> and that ends up disabling PCI_COMMAND_MASTER and then calling 
> pcibios_disable_device().

pci_disable_enabled_device() is not called for the PCIe port driver, because
that driver has legacy PM support.

What happens is

pci_pm_suspend(port) ->
	pci_legacy_suspend(port) ->
		pcie_portdrv_suspend(port) [this doesn't save the state]
		pci_save_state(port)

and then, with interrupts off

pci_pm_suspend_noirq(port) ->
	pci_legacy_suspend_late(port) ->
		pcie_portdrv_suspend_late(port) ->
			pci_save_state(port)

and I suspect this last pci_save_state() breaks things.  I'm not sure why,
though.

Thanks,
Rafael
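The call trace above leaves open why the second pci_save_state() would hurt. A small, hypothetical sketch of the one thing that does follow from calling it twice: each call overwrites the single saved copy of config space, so resume restores whatever config space held at the last save. The `toy_*` names and the one-integer "config space" are simplifications; whether anything actually changes the port's config space between the two saves is exactly the open question, so `changed` below is an assumed input, not an observed value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT kernel code: one integer stands in for the
 * port's whole config space, and one slot for its saved copy. */
struct toy_port {
	unsigned int config;  /* live config-space contents */
	unsigned int saved;   /* the single saved snapshot */
	bool state_saved;
};

/* Every call overwrites the previous snapshot; nothing is accumulated. */
static void toy_save_state(struct toy_port *p)
{
	p->saved = p->config;
	p->state_saved = true;
}

static void toy_restore_state(struct toy_port *p)
{
	assert(p->state_saved);
	p->config = p->saved;
}

/* Follows the order in the trace: one save on the pci_pm_suspend() legacy
 * path, a second one from the suspend_late callback with interrupts off,
 * then a restore on resume. 'changed' is an ASSUMED value that config
 * space might hold by the time of the second save. */
unsigned int toy_port_config_after_resume(unsigned int good, unsigned int changed)
{
	struct toy_port port = { .config = good };

	toy_save_state(&port);     /* pci_legacy_suspend() -> pci_save_state() */
	port.config = changed;     /* hypothetical change between the two saves */
	toy_save_state(&port);     /* pcie_portdrv_suspend_late() -> pci_save_state() */

	toy_restore_state(&port);  /* resume */
	return port.config;
}
```

If config space is unchanged between the two saves, the second save is harmless; if it differs, the earlier snapshot is lost and the later one is what gets restored.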

Patch

--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -501,6 +501,9 @@  static int pci_pm_suspend(struct device
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_SUSPEND);
 
+	if (!drv || !drv->pm)
+		return 0;
+
 	if (drv && drv->pm && drv->pm->suspend) {
 		error = drv->pm->suspend(dev);
 		suspend_report_result(drv->pm->suspend, error);
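A hypothetical control-flow model of the hunk above, covering only what the thread states: the legacy-PM branch returns before the changed lines in both versions, and without the patch a driver with no `drv->pm` still reaches pci_pm_default_suspend_generic() (and with it pci_save_state()), while the added early return skips it. The rest of pci_pm_suspend() is not shown in the hunk, so treating the default path as running only when the driver callback did not fail is an assumption; all `toy_*` names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the hunk's control flow, NOT the kernel's
 * pci_pm_suspend(). The legacy-PM branch is omitted because it returns
 * before the changed lines in both versions. */
struct toy_pm_ops {
	int (*suspend)(void);
};

struct toy_driver {
	const struct toy_pm_ops *pm;  /* NULL: driver has no PM ops */
};

struct toy_trace {
	bool driver_cb_called;
	bool default_path_called;     /* stands in for pci_pm_default_suspend_generic() */
};

struct toy_trace toy_pm_suspend(const struct toy_driver *drv, bool patched)
{
	struct toy_trace t = { false, false };
	int error = 0;

	/* The RFC's early return for drivers without PM ops. */
	if (patched && (!drv || !drv->pm))
		return t;

	if (drv && drv->pm && drv->pm->suspend) {
		error = drv->pm->suspend();
		t.driver_cb_called = true;
	}

	/* ASSUMPTION: the unshown tail of the function runs the default
	 * (generic) suspend path only when the driver callback succeeded. */
	if (!error)
		t.default_path_called = true;

	return t;
}
```

For a driver whose `pm` pointer is NULL, `toy_pm_suspend(&drv, false)` reports that the default path ran, while `toy_pm_suspend(&drv, true)` returns before reaching it, which is the gap Linus points out.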