Patchwork 2.6.29-rc3: tg3 dead after resume

Submitter: Parag Warudkar
Date: Jan. 29, 2009, 1:49 a.m.
Message ID: <alpine.DEB.2.00.0901282049040.5026@parag-desktop>
Permalink: /patch/20920/
State: Not Applicable
Delegated to: David Miller

Comments

Parag Warudkar - Jan. 29, 2009, 1:49 a.m.
On Wed, 28 Jan 2009, Linus Torvalds wrote:
 
> For example, if we get the "dev->current_state" cache wrong, then we may 
> not actually end up changing it when we should, because we think we 
> already match the target state. I don't _think_ that is it, but that's the 
> kind of thing that could happen.
> 
> Can you do a
> 
> 	lspci -vvxxx -s [tg3-device]
> 
> before-and-after suspend? Is there some state that looks like it got 
> corrupted?
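A minimal toy model of the cache problem Linus describes: if the cached state already matches the target, a set-state routine that trusts its cache never touches the hardware. The names `model_dev`, `model_set_power_state` and the two-value enum are hypothetical stand-ins for `dev->current_state` and the real PCI power-management code. This is a sketch of the failure mode, not the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical two-state model; the values echo D0 and D3hot, but this
 * is not the kernel's pci_power_t. */
enum model_power { MODEL_D0 = 0, MODEL_D3HOT = 3 };

/* 'cached' stands in for dev->current_state; 'hw' is the state the
 * (modelled) hardware is really in. */
struct model_dev {
	enum model_power cached;
	enum model_power hw;
};

/* Toy set-state routine that trusts its cache: if the cached state
 * already equals the target it returns early without touching the
 * hardware. Returns true only if the hardware was reprogrammed. */
static bool model_set_power_state(struct model_dev *dev,
				  enum model_power target)
{
	if (dev->cached == target)
		return false;	/* "already there": skip the write */
	dev->hw = target;	/* pretend to write the PM control register */
	dev->cached = target;
	return true;
}
```

With an accurate cache (`{ MODEL_D3HOT, MODEL_D3HOT }`) a request for `MODEL_D0` reprograms the hardware. With a stale one (`{ MODEL_D0, MODEL_D3HOT }`) the same request returns `false` and the hardware stays in D3hot.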

Sure, diff -u below. There are differences but not sure if they are 
abnormal or expected.

Also, BTW, reverting the only tg3-specific commit:
commit 9e9fd12dc0679643c191fc9795a3021807e77de4
Author: Matt Carlson <mcarlson@broadcom.com>
Date:   Mon Jan 19 16:57:45 2009 -0800

    tg3: Fix firmware loading

did not help.

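parag@parag-desktop:~$ diff -u lspci-pre-suspend lspci-post-suspend
--- lspci-pre-suspend   2009-01-28 20:35:37.070584068 -0500
+++ lspci-post-suspend  2009-01-28 20:36:56.922471408 -0500
@@ -12,7 +12,7 @@
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Vendor Specific Information <?>
        Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
-               Address: 00000000fee0f00c  Data: 41c9
+               Address: 00000000fee0f00c  Data: 41d1
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-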
@@ -36,15 +36,15 @@
 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
 30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
 40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
-50: 03 58 fc 00 00 00 00 78 09 e8 78 00 7d c9 08 78
-60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
-70: f2 10 00 00 c0 00 00 00 2c 00 00 00 00 00 00 00
-80: 3c 10 07 13 00 00 00 00 34 00 13 04 82 70 08 fc
-90: 19 be 00 01 00 00 00 b7 00 00 00 00 14 00 00 00
-a0: 00 00 00 00 4c 01 00 00 00 00 00 00 3e 01 00 00
-b0: 00 00 00 00 00 00 00 36 00 00 00 00 00 00 00 00
+50: 03 58 fc 00 00 00 00 78 09 e8 78 00 7e cb 08 a8
+60: 00 00 00 00 00 00 00 00 9a 02 02 a0 00 00 00 10
+70: 72 10 00 00 c0 00 00 00 2c 00 00 00 00 00 00 00
+80: 3c 10 07 13 00 00 00 00 00 00 00 00 fe 70 08 fc
+90: 11 be 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 00 80 00 00 0e 00 00 00 00 00 00 00
 d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
 e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
-f0: 00 00 00 00 c9 41 00 00 00 00 00 00 00 00 00 00
+f0: 00 00 00 00 d1 41 00 00 00 00 00 00 00 00 00 00
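lspci's decoded output shows the MSI `Data` word changing from 41c9 to 41d1, and the raw `f0:` row shows the matching byte change (`c9 41` to `d1 41` at offsets 0xf4-0xf5). A small sketch that makes the link explicit: it parses `lspci -xxx` rows into a 256-byte config-space image and decodes a 64-bit MSI capability at 0xe8, the offset lspci reports for this device. The helper names (`parse_lspci_row`, `decode_msi64`, `rd16`, `rd32`) are hypothetical; the field offsets follow the PCI MSI capability layout (control at +0x02, address at +0x04/+0x08, data at +0x0c for a 64-bit capability).

```c
#include <stdint.h>
#include <stdio.h>

/* Parse one "OO: b0 b1 ... b15" row of `lspci -xxx` output into 'cfg',
 * a 256-byte image of config space. Returns 0 on success, -1 if the
 * row is malformed. */
static int parse_lspci_row(const char *s, uint8_t cfg[256])
{
	unsigned base, v;
	int n = 0;

	if (sscanf(s, "%x:%n", &base, &n) != 1 || n == 0 ||
	    base > 0xf0 || base % 16 != 0)
		return -1;
	s += n;
	for (int i = 0; i < 16; i++) {
		if (sscanf(s, "%x%n", &v, &n) != 1 || v > 0xff)
			return -1;
		cfg[base + i] = (uint8_t)v;
		s += n;
	}
	return 0;
}

/* PCI config space is little-endian. */
static uint16_t rd16(const uint8_t *cfg, unsigned off)
{
	return (uint16_t)(cfg[off] | cfg[off + 1] << 8);
}

static uint32_t rd32(const uint8_t *cfg, unsigned off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

struct msi64 {
	uint16_t control;	/* Message Control, cap + 0x02 */
	uint64_t address;	/* Message Address, cap + 0x04 (low), + 0x08 (high) */
	uint16_t data;		/* Message Data, cap + 0x0c for a 64-bit capability */
};

/* Decode a 64-bit MSI capability at offset 'cap'.
 * Returns 0 on success, -1 if 'cap' is not such a capability. */
static int decode_msi64(const uint8_t cfg[256], unsigned cap,
			struct msi64 *out)
{
	if (cap > 256 - 0x0e || cfg[cap] != 0x05)	/* 0x05: MSI capability ID */
		return -1;
	out->control = rd16(cfg, cap + 0x02);
	if (!(out->control & 0x0080))			/* bit 7: 64-bit address */
		return -1;
	out->address = (uint64_t)rd32(cfg, cap + 0x08) << 32 |
		       rd32(cfg, cap + 0x04);
	out->data = rd16(cfg, cap + 0x0c);
	return 0;
}
```

Fed the unchanged `e0:` row plus the pre- and post-suspend `f0:` rows from the diff, both images decode to address 0xfee0f00c (matching lspci), with data 0x41c9 before suspend and 0x41d1 after.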


Parag
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds - Jan. 29, 2009, 2:10 a.m.
On Wed, 28 Jan 2009, Parag Warudkar wrote:
> 
> Sure, diff -u below. There are differences but not sure if they are 
> abnormal or expected.

Well, they're all in the "extended set", i.e. not the basic registers that 
the PCI layer saves. The PCI layer normally just saves the low 16 dwords, 
along with the PCI[EX] capability thing.

None of the PCI save/restore routines have ever saved the extended state 
(well, "ever" is a strong word - I think we long ago used to pass in how 
many bytes we wanted saved, but got rid of it), and it certainly didn't 
change with the recent PCI suspend/resume changes.
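Linus's point can be checked mechanically: the low 16 dwords are 16 × 4 = 64 bytes, offsets 0x00-0x3f, and every row that changed in Parag's diff starts at 0x50 or later. The sketch below counts changed bytes in a pre/post row pair and how many of them land inside that 64-byte window. It deliberately ignores the capability registers the PCI core also saves, and `parse_row`/`count_changes` are hypothetical helpers, not kernel or pciutils APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* "The low 16 dwords" saved by the PCI core, per the thread: 64 bytes. */
#define STD_SAVED_BYTES (16 * 4)

/* Parse "OO: b0 .. b15" into a base offset and 16 bytes; 0 on success. */
static int parse_row(const char *s, unsigned *base, uint8_t bytes[16])
{
	unsigned v;
	int n = 0;

	if (sscanf(s, "%x:%n", base, &n) != 1 || n == 0)
		return -1;
	s += n;
	for (int i = 0; i < 16; i++) {
		if (sscanf(s, "%x%n", &v, &n) != 1 || v > 0xff)
			return -1;
		bytes[i] = (uint8_t)v;
		s += n;
	}
	return 0;
}

/* Compare a pre-suspend row with the post-suspend row at the same offset.
 * Adds the number of differing bytes to *changed, and the number of those
 * that fall inside the 64-byte saved header to *in_saved.
 * Returns -1 on a parse error or mismatched row offsets. */
static int count_changes(const char *pre, const char *post,
			 unsigned *changed, unsigned *in_saved)
{
	unsigned b_pre, b_post;
	uint8_t x[16], y[16];

	if (parse_row(pre, &b_pre, x) || parse_row(post, &b_post, y) ||
	    b_pre != b_post)
		return -1;
	for (unsigned i = 0; i < 16; i++) {
		if (x[i] == y[i])
			continue;
		(*changed)++;
		if (b_pre + i < STD_SAVED_BYTES)
			(*in_saved)++;
	}
	return 0;
}
```

Since all eight changed rows in the diff (0x50 through 0xb0, plus 0xf0) start at or above 0x40, running them through `count_changes` reports zero bytes inside the 64-byte window, consistent with Linus's reading.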

I get the feeling that it's some odd tg3 issue. That tg3 driver does have 
that special

        /* Make sure register accesses (indirect or otherwise)
         * will function correctly.
         */
        pci_write_config_dword(tp->pdev,
                               TG3PCI_MISC_HOST_CTRL,
                               tp->misc_host_ctrl);

in its own version of setting the power state, and maybe that really 
_must_ happen before we actually set the state back to PCI_D0. That sounds 
very odd, but hey...

I added Matt Carlson to the cc, since he seems to be the main tg3 
authority here.

Matt: the whole discussion is on netdev and the kernel mailing list, but 
the short version is that -rc3 suspends and resumes for Parag again 
(unlike -rc2), but tg3 doesn't appear to resume properly. The generic PCI 
layer now does more at resume time (very early, when interrupts are still 
off), see

 - pci_pm_resume_noirq ->
     pci_pm_default_resume_noirq() ->
       pci_restore_standard_config()

for more of the details (basically it always does that 
"pci_restore_state()" and tries to bring the device back to PCI_D0).
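The ordering question can be sketched as a toy trace. Per Linus's description, tg3's own helper writes `TG3PCI_MISC_HOST_CTRL` and only then moves to D0, while in -rc3 the PCI core has already brought the device back to D0 during the noirq phase, before the driver's resume runs. Everything here (`enum step`, the `model_*` functions, the trace) is hypothetical and models only the order of events; it touches no hardware and does not reproduce the real kernel code. The "driver-only" path is an assumed baseline in which the core leaves the power state alone.

```c
#include <stdbool.h>

/* Hypothetical events; only their order matters in this model. */
enum step { STEP_SET_D0, STEP_WRITE_MISC_HOST_CTRL };

struct trace {
	enum step s[8];
	int n;
};

static void log_step(struct trace *t, enum step s)
{
	if (t->n < 8)
		t->s[t->n++] = s;
}

/* The driver's own power-up order, per the snippet quoted above:
 * fix up register access first, then go to D0. */
static void model_driver_set_d0(struct trace *t)
{
	log_step(t, STEP_WRITE_MISC_HOST_CTRL);
	log_step(t, STEP_SET_D0);
}

/* Assumed baseline: the core leaves the power state alone and the
 * driver's resume does the whole power-up. */
static void model_driver_only_resume(struct trace *t)
{
	model_driver_set_d0(t);
}

/* -rc3 as described: pci_pm_resume_noirq() -> ... ->
 * pci_restore_standard_config() already returns the device to D0,
 * and the driver's resume runs afterwards. */
static void model_rc3_resume(struct trace *t)
{
	log_step(t, STEP_SET_D0);	/* core, interrupts still off */
	model_driver_set_d0(t);		/* driver resume, later */
}

/* True if the first MISC_HOST_CTRL write precedes the first move to D0. */
static bool misc_before_d0(const struct trace *t)
{
	for (int i = 0; i < t->n; i++) {
		if (t->s[i] == STEP_WRITE_MISC_HOST_CTRL)
			return true;
		if (t->s[i] == STEP_SET_D0)
			return false;
	}
	return false;
}
```

In the -rc3 trace the device reaches D0 before the MISC_HOST_CTRL write, which is the ordering Linus suspects might matter; whether it actually does is what the thread is trying to establish.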

			Linus
Matt Carlson - Jan. 29, 2009, 2:19 a.m.
On Wed, Jan 28, 2009 at 06:10:37PM -0800, Linus Torvalds wrote:
> [...]

Thanks Linus.  I'm looking over the diffs Parag sent and I already see
some suspicious register settings.  Let me think about this some more
and then I'll jump into the discussion.

Rafael J. Wysocki - Jan. 29, 2009, 10:22 p.m.
On Thursday 29 January 2009, Matt Carlson wrote:
> [...]
> Thanks Linus.  I'm looking over the diffs Parag sent and I already see
> some suspicious register settings.  Let me think about this some more
> and then I'll jump into the discussion.

FWIW, I can't reproduce the problem with tg3 on my testbox.  Suspend to RAM
and resume seem to work correctly on it.

Thanks,
Rafael

Patch

--- lspci-pre-suspend   2009-01-28 20:35:37.070584068 -0500
+++ lspci-post-suspend  2009-01-28 20:36:56.922471408 -0500
@@ -12,7 +12,7 @@ 
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Vendor Specific Information <?>
        Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
-               Address: 00000000fee0f00c  Data: 41c9
+               Address: 00000000fee0f00c  Data: 41d1
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-