
3.0-rc1: powerpc hangs at Kernel virtual memory layout

Message ID 1306983467.29297.51.camel@pasglop (mailing list archive)
State Not Applicable

Commit Message

Benjamin Herrenschmidt June 2, 2011, 2:57 a.m. UTC
On Wed, 2011-06-01 at 17:16 -0700, Christian Kujau wrote:
> On Tue, 31 May 2011 at 16:50, Christian Kujau wrote:
> > trying to boot 3.0-rc1 on powerpc32 only progresses until:
> > 
> >   > Kernel virtual memory layout:
> >   >   * 0xfffcf000..0xfffff000  : fixmap
> 
> After hours (and hours!) of git-bisecting, it said:
> 
> -----------------------
> ccc7c28af205888798b51b6cbc0b557ac1170a49 is the first bad commit
> commit ccc7c28af205888798b51b6cbc0b557ac1170a49
> Author: Rafał Miłecki <zajec5@gmail.com>
> Date:   Fri Apr 1 13:26:52 2011 +0200
> 
>     ssb: pci: implement serdes workaround
>     
>     Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
>     Signed-off-by: John W. Linville <linville@tuxdriver.com>
> -----------------------

Ok, thanks a lot. It looks rather trivial actually: that new workaround
is PCIe specific but is called unconditionally, and will do bad things
on non-PCIe implementations.

John, care to send the patch below to Linus ASAP ? I could reproduce and
verify it fixes it. Thanks !

ssb: pci: Don't call PCIe specific workarounds on PCI cores

Otherwise it can/will crash....

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Comments

Christian Kujau June 2, 2011, 3:06 a.m. UTC | #1
On Thu, 2 Jun 2011 at 12:57, Benjamin Herrenschmidt wrote:
> Ok, thanks a lot. It looks rather trivial actually: that new workaround
> is PCIe specific but is called unconditionally, and will do bad things
> on non-PCIe implementations.

Indeed. This PowerBook G4 does not have PCIe, yet the whole SSB thingy gets
enabled in my .config somehow. Thanks for the quick fix, I tried to revert 
ccc7c28af2... from Linus' current tree, but I had to rip out some more to 
make it compile.

I'll try your fix in a minute and get back to you with those cdrom init 
problems as well.

Thanks,
Christian.
Christian Kujau June 2, 2011, 4:27 a.m. UTC | #2
On Thu, 2 Jun 2011 at 12:57, Benjamin Herrenschmidt wrote:
> Ok, thanks a lot. It looks rather trivial actually: that new workaround
> is PCIe specific but is called unconditionally, and will do bad things
> on non-PCIe implementations.

OK, with your patch applied to Linus' latest git tree the machine 
continues to boot. Also, with the latest tree, the "machine is stuck after 
ide-cd init" problem[0] went away.

For this particular problem and patch, feel free to add:

Tested-by: Christian Kujau <lists@nerdbynature.de>

However, shortly after boot and logging in to the box remotely, the box did
not respond any more. I'm not sure if these are related to those SSB/PCIe 
changes, but somehow I hope they are - bisecting those would take much 
longer, as it's not an "instant" death:

 * http://nerdbynature.de/bits/3.0-rc1/linux-3.0-rc1_stuck1.jpg
 * http://nerdbynature.de/bits/3.0-rc1/linux-3.0-rc1_stuck2.jpg

This is what an OCR program made of it:

irq euent stamp: 185804850
hardirqs last enabled at (185904849): [<c04005b0>] _raw_spin_unlock_irqrestore+0x40/0x?e
hardirqs last disabled at (185904850): [<c00120b8>] reenable_mmu+0x24/0x78
Softirqs last enabled at (185892414): [<c000fe8c>] call_do_softirq+0x14/0x24
softirqs last disabled at (18589240?): [<c000fe8c>] call_do_softirq+0x14/0x24
NIP: e04005b4 LR: e04005b0 CTR: 00000000
REGS: ef92be10 TRHP: 0901 Not tainted (3.0.0-rel-00049-g1fa?b6a-dirtg)
MSB: 00009032 <EE.ME.IR.DR> CR: 42002084
TRSK = ef8d0000[38B] ’kuorker/0:2’ THREAD:
GPR00: c04005b0 ef92bec0 efBd0000 00000001
GPR08: 00000000 0b14aed0 0049a306 00030600
HIP [c01005b1] _rau_spin_unlock_irqrestore+0x44/0x?c
LR [c04005b0] _rau_spin_unlock_irqrestore+0x40/0x?c
Call Trace:
[ef92bec0] [c04005b0] _raw_spin_unlock_irqrestore+0x40/0x?c (unreliable)
[ef92bed0] [c029c504] flush_tu_ldisc+0x121/0x230
[ef92bf10] [c001c86c] process_one_uork+0x1c1/0x4cB
[ef92bfS0] [c004efac] worker_thread+0x1?8/0x3c1
[ef92bf90] [c0051148] kthread+0x81/0x88
[ef92hff0] [c0810390] kernel_thread+0x1c/0x68

XER: 20000000
ef92a000 ef8d0660 00000006 00000000 18614000 22002088
Instruction dump:
??? 93e1060c ?c9f23?B 38800001 90010011 4bc6e9a9 ?fc3i`3?8 4be61a69
?3e08080 11820021 1bc6b515 ?fe00124
B8c16008 ?c0803a6 83c1000c

Well, the picture is way better :-\

Thanks,
Christian.

[0] http://nerdbynature.de/bits/3.0-rc1/linux-3.0-rc1-cdrom.jpg
Benjamin Herrenschmidt June 2, 2011, 7:33 a.m. UTC | #3
On Wed, 2011-06-01 at 21:27 -0700, Christian Kujau wrote:
> On Thu, 2 Jun 2011 at 12:57, Benjamin Herrenschmidt wrote:
> > Ok, thanks a lot. It looks rather trivial actually: that new workaround
> > is PCIe specific but is called unconditionally, and will do bad things
> > on non-PCIe implementations.
> 
> OK, with your patch applied to Linus' latest git tree the machine 
> continues to boot. Also, with the latest tree, the "machine is stuck after 
> ide-cd init" problem[0] went away.
> 
> For this particular problem and patch, feel free to add:
> 
> Tested-by: Christian Kujau <lists@nerdbynature.de>
> 
> However, shortly after boot and logging in to the box remotely, the box did
> not respond any more. I'm not sure if these are related to those SSB/PCIe 
> changes, but somehow I hope they are - bisecting those would take much 
> longer, as it's not an "instant" death:
> 
>  * http://nerdbynature.de/bits/3.0-rc1/linux-3.0-rc1_stuck1.jpg
>  * http://nerdbynature.de/bits/3.0-rc1/linux-3.0-rc1_stuck2.jpg
> 
> This is what an OCR program made of it:

I think this is another problem that I'm in the middle of trying to
figure out.

It -looks- to me that something goes wrong in the tty code when a large
file is piped through a pty, causing the kernel to hang for minutes in
the workqueue / ldisc flush code. I've just sent an initial report to
Alan Cox about it and am currently bisecting it.

Cheers,
Ben.

> irq euent stamp: 185804850
> hardirqs last enabled at (185904849): [<c04005b0>] _raw_spin_unlock_irqrestore+0x40/0x?e
> hardirqs last disabled at (185904850): [<c00120b8>] reenable_mmu+0x24/0x78
> Softirqs last enabled at (185892414): [<c000fe8c>] call_do_softirq+0x14/0x24
> softirqs last disabled at (18589240?): [<c000fe8c>] call_do_softirq+0x14/0x24
> NIP: e04005b4 LR: e04005b0 CTR: 00000000
> REGS: ef92be10 TRHP: 0901 Not tainted (3.0.0-rel-00049-g1fa?b6a-dirtg)
> MSB: 00009032 <EE.ME.IR.DR> CR: 42002084
> TRSK = ef8d0000[38B] ’kuorker/0:2’ THREAD:
> GPR00: c04005b0 ef92bec0 efBd0000 00000001
> GPR08: 00000000 0b14aed0 0049a306 00030600
> HIP [c01005b1] _rau_spin_unlock_irqrestore+0x44/0x?c
> LR [c04005b0] _rau_spin_unlock_irqrestore+0x40/0x?c
> Call Trace:
> [ef92bec0] [c04005b0] _raw_spin_unlock_irqrestore+0x40/0x?c (unreliable)
> [ef92bed0] [c029c504] flush_tu_ldisc+0x121/0x230
> [ef92bf10] [c001c86c] process_one_uork+0x1c1/0x4cB
> [ef92bfS0] [c004efac] worker_thread+0x1?8/0x3c1
> [ef92bf90] [c0051148] kthread+0x81/0x88
> [ef92hff0] [c0810390] kernel_thread+0x1c/0x68
> 
> XER: 20000000
> ef92a000 ef8d0660 00000006 00000000 18614000 22002088
> Instruction dump:
> ??? 93e1060c ?c9f23?B 38800001 90010011 4bc6e9a9 ?fc3i`3?8 4be61a69
> ?3e08080 11820021 1bc6b515 ?fe00124
> B8c16008 ?c0803a6 83c1000c
> 
> Well, the picture is way better :-\
> 
> Thanks,
> Christian.
> 
> [0] http://nerdbynature.de/bits/3.0-rc1/linux-3.0-rc1-cdrom.jpg
Christian Kujau June 6, 2011, 2:11 a.m. UTC | #4
On Thu, 2 Jun 2011 at 17:33, Benjamin Herrenschmidt wrote:
> It -looks- to me that something goes wrong in the tty code when a large
> file is piped through a pty, causing the kernel to hang for minutes in
> the workqueue / ldisc flush code. I've just sent an initial report to
> Alan Cox about it and am currently bisecting it.

This was the "tty vs workqueue oddities" thread, right? FWIW, 
55db4c64eddf37 ("Revert "tty: make receive_buf() return the amout of bytes 
received"") seems to have fixed it on this powerpc machine as well.

With your "ssb: pci: Don't call PCIe specific workarounds on PCI cores" 
patch applied, powerpc32 seems to be quite happy with 3.0-rc1+

Thanks,
Christian.
Benjamin Herrenschmidt June 6, 2011, 3:46 a.m. UTC | #5
On Sun, 2011-06-05 at 19:11 -0700, Christian Kujau wrote:
> On Thu, 2 Jun 2011 at 17:33, Benjamin Herrenschmidt wrote:
> > It -looks- to me that something goes wrong in the tty code when a large
> > file is piped through a pty, causing the kernel to hang for minutes in
> > the workqueue / ldisc flush code. I've just sent an initial report to
> > Alan Cox about it and am currently bisecting it.
> 
> This was the "tty vs workqueue oddities" thread, right? FWIW, 
> 55db4c64eddf37 ("Revert "tty: make receive_buf() return the amout of bytes 
> received"") seems to have fixed it on this powerpc machine as well.

Yup.

> With your "ssb: pci: Don't call PCIe specific workarounds on PCI cores" 
> patch applied, powerpc32 seems to be quite happy with 3.0-rc1+

Good :-)

Cheers,
Ben.
Christian Kujau June 10, 2011, 10:54 p.m. UTC | #6
On Thu, 2 Jun 2011 at 12:57, Benjamin Herrenschmidt wrote:
> John, care to send the patch below to Linus ASAP ? I could reproduce and
> verify it fixes it. Thanks !
> 
> ssb: pci: Don't call PCIe specific workarounds on PCI cores
> 
> Otherwise it can/will crash....

The patch did not make it into -rc2, it's not in today's git tree either, 
AFAICS. Can anyone push this, please?

Thanks,
Christian.

> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> 
> diff --git a/drivers/ssb/driver_pcicore.c b/drivers/ssb/driver_pcicore.c
> index 82feb34..eddf1b9 100644
> --- a/drivers/ssb/driver_pcicore.c
> +++ b/drivers/ssb/driver_pcicore.c
> @@ -540,7 +540,8 @@ void ssb_pcicore_init(struct ssb_pcicore *pc)
>  		ssb_pcicore_init_clientmode(pc);
>  
>  	/* Additional always once-executed workarounds */
> -	ssb_pcicore_serdes_workaround(pc);
> +	if (dev->id.coreid == SSB_DEV_PCIE)
> +		ssb_pcicore_serdes_workaround(pc);
>  	/* TODO: ASPM */
>  	/* TODO: Clock Request Update */
>  }
>
Rafał Miłecki June 10, 2011, 10:59 p.m. UTC | #7
2011/6/11 Christian Kujau <lists@nerdbynature.de>:
> On Thu, 2 Jun 2011 at 12:57, Benjamin Herrenschmidt wrote:
>> John, care to send the patch below to Linus ASAP ? I could reproduce and
>> verify it fixes it. Thanks !
>>
>> ssb: pci: Don't call PCIe specific workarounds on PCI cores
>>
>> Otherwise it can/will crash....
>
> The patch did not make it into -rc2, it's not in today's git tree either,
> AFAICS. Can anyone push this, please?

Yeah, I noticed it wasn't in the pull for rc2. I pinged John, he told
me to just wait.

Patch was taken with the recent pull, it should go into rc3.

Patch

diff --git a/drivers/ssb/driver_pcicore.c b/drivers/ssb/driver_pcicore.c
index 82feb34..eddf1b9 100644
--- a/drivers/ssb/driver_pcicore.c
+++ b/drivers/ssb/driver_pcicore.c
@@ -540,7 +540,8 @@  void ssb_pcicore_init(struct ssb_pcicore *pc)
 		ssb_pcicore_init_clientmode(pc);
 
 	/* Additional always once-executed workarounds */
-	ssb_pcicore_serdes_workaround(pc);
+	if (dev->id.coreid == SSB_DEV_PCIE)
+		ssb_pcicore_serdes_workaround(pc);
 	/* TODO: ASPM */
 	/* TODO: Clock Request Update */
 }
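
The shape of the fix can be modeled outside the kernel as a small standalone
sketch. Everything below is hypothetical and heavily simplified: the
model_* names, the struct layout and the core ID values are stand-ins chosen
for illustration, not the kernel's actual SSB definitions. The only point is
that a core-specific workaround is gated on the core type instead of being
called unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in core IDs for this sketch (assumption: the real constants
 * are defined in the kernel's SSB headers and are not reproduced here). */
enum model_coreid {
	MODEL_DEV_PCI  = 0x804,
	MODEL_DEV_PCIE = 0x820,
};

/* Hypothetical, simplified model of a PCI or PCIe core. */
struct model_pcicore {
	uint16_t coreid;
	bool serdes_workaround_applied;
};

/* Stands in for a PCIe-only register sequence. In this model it only
 * records that it ran, so the effect of the guard can be observed. */
static void model_serdes_workaround(struct model_pcicore *pc)
{
	pc->serdes_workaround_applied = true;
}

/* Mirrors the structure of the patch: the PCIe-only workaround runs
 * only when the core identifies itself as PCIe. */
static void model_pcicore_init(struct model_pcicore *pc)
{
	if (pc->coreid == MODEL_DEV_PCIE)
		model_serdes_workaround(pc);
}
```

Calling model_pcicore_init() on a core whose coreid is MODEL_DEV_PCI leaves
serdes_workaround_applied false, while a MODEL_DEV_PCIE core gets the
workaround. That matches the behaviour the patch is after: plain PCI cores
never reach the PCIe-specific code.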