diff mbox

Aw: A500 boot crash in 4.2.0-rc3-00246-g763e326

Message ID 55D38BE6.6080302@gmx.de
State Not Applicable
Headers show

Commit Message

Helge Deller Aug. 18, 2015, 7:47 p.m. UTC
On 18.08.2015 20:44, Meelis Roos wrote:
> (CC-s added)
>
>>>> Tried 4.2.0-rc3-00246-g763e326 on A500 but it crashes on boot. This is
>>>> still present in todays 4.2.0-rc6+git. 4.1 was fine, I Will bisect but
>>>> it takes time.
>>>>
>>>> PDC Stable Storage facility v0.30
>>>> STI GSC/PCI core graphics driver Version 0.9b
>>>> Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
>>>> serial 0000:00:04.0: enabling device (0146 -> 0147)
>>>> console [ttyS0] disabled
>>>> 0000:00:04.0: ttyS0 at MMIO 0xfffffffff8000000 (irq = 21, base_baud = 115200) is a 16550A
>>>> console [ttyS0] enabled
>>>> console [ttyS0] enabled
>>>> bootconsole [ttyB0] disabled
>>>> bootconsole [ttyB0] disabled
>>>> 0000:00:04.0: ttyS1 at MMIO 0xfffffffff8000008 (irq = 21, base_baud = 115200) is a 16550A
>>>> 0000:00:04.0: ttyS2 at MMIO 0xfffffffff8000010 (irq = 21, base_baud = 115200) is a 16550A
>>>> serial 0000:00:05.0: enabling device (0000 -> 0003)
>>>> serial 0000:00:05.0: enabling SERR and PARITY (0003 -> 0143)
>>>> 0000:00:05.0: ttyS3 at MMIO 0xfffffffff8003000 (irq = 22, base_baud = 115200) is a 16550A
>>>> serial 0000:00:05.0: Couldn't register serial port 0, irq 22, type 2, error -28
>>>> sym53c8xx 0000:00:01.0: enabling device (0000 -> 0003)
>>>> sym53c8xx 0000:00:01.0: enabling SERR and PARITY (0003 -> 0143)
>>>
>>> I'm seeing the same problem on my rp5470 with Kernel 4.2-rc7. My machine just hangs though and doesn't crash.
>>
>> I did a bisect and commit 3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d ("PCI: Add pci_bus_addr_t") seems to be the culprit:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d
>>
>> Since the patch has "CC: stable@vger.kernel.org  # v3.19+", it might be the reason why you see kernel 3.19 failing too...
>
> I just found the same commit breaking my A500 parisc machine by
> bisecting.
>
> CC: patch author and linux-pci.

I think this is the problem:



but the real problem is probably, that the sym53c8xx driver or maybe the parisc PCI core code isn't 64bit clean?

I did some more debugging, and on parisc the first hang happens in function sym_check_raid() [called from sym2_probe()]
indrivers/scsi/sym53c8xx_2/sym_glue.c while trying to call readl():
   ram_val = readl(device->s.ramaddr + ram_size - 16);

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yinghai Lu Aug. 18, 2015, 9:24 p.m. UTC | #1
On Tue, Aug 18, 2015 at 12:47 PM, Helge Deller <deller@gmx.de> wrote:
> On 18.08.2015 20:44, Meelis Roos wrote:
>>> I did a bisect and commit 3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d ("PCI:
>>> Add pci_bus_addr_t") seems to be the culprit:
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d
>>
>> CC: patch author and linux-pci.
>
>
> I think this is the problem:
>
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -1,6 +1,10 @@
>  #
>  # PCI configuration
>  #
> +config PCI_BUS_ADDR_T_64BIT
> +       def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
> +       depends on PCI

Thanks for bisecting.

Then we should change to

config PCI_BUS_ADDR_T_64BIT
       def_bool y if (ARCH_DMA_ADDR_T_64BIT || SPARC64)
       depends on PCI
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Meelis Roos Aug. 19, 2015, 4:48 a.m. UTC | #2
> On Tue, Aug 18, 2015 at 12:47 PM, Helge Deller <deller@gmx.de> wrote:
> > On 18.08.2015 20:44, Meelis Roos wrote:
> >>> I did a bisect and commit 3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d ("PCI:
> >>> Add pci_bus_addr_t") seems to be the culprit:
> >>>
> >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d
> >>
> >> CC: patch author and linux-pci.
> >
> >
> > I think this is the problem:
> >
> > --- a/drivers/pci/Kconfig
> > +++ b/drivers/pci/Kconfig
> > @@ -1,6 +1,10 @@
> >  #
> >  # PCI configuration
> >  #
> > +config PCI_BUS_ADDR_T_64BIT
> > +       def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
> > +       depends on PCI
> 
> Thanks for bisecting.
> 
> Then we should change to
> 
> config PCI_BUS_ADDR_T_64BIT
>        def_bool y if (ARCH_DMA_ADDR_T_64BIT || SPARC64)
>        depends on PCI

Why SPARC64? The problem happened on parisc.
Yinghai Lu Aug. 19, 2015, 5:30 a.m. UTC | #3
On Tue, Aug 18, 2015 at 9:48 PM, Meelis Roos <mroos@linux.ee> wrote:
>> On Tue, Aug 18, 2015 at 12:47 PM, Helge Deller <deller@gmx.de> wrote:
>>
>> Then we should change to
>>
>> config PCI_BUS_ADDR_T_64BIT
>>        def_bool y if (ARCH_DMA_ADDR_T_64BIT || SPARC64)
>>        depends on PCI
>
> Why SPARC64? The problem happened on parisc.
>

so will not set PCI_BUS_ADDR_T_64BIT for parisc.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Meelis Roos Aug. 19, 2015, 10:40 a.m. UTC | #4
> >>> I did a bisect and commit 3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d ("PCI:
> >>> Add pci_bus_addr_t") seems to be the culprit:
> >>>
> >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3a9ad0b4fdcd57f775d3615004c8c64c021a9e7d
> >>
> >> CC: patch author and linux-pci.
> >
> >
> > I think this is the problem:
> >
> > --- a/drivers/pci/Kconfig
> > +++ b/drivers/pci/Kconfig
> > @@ -1,6 +1,10 @@
> >  #
> >  # PCI configuration
> >  #
> > +config PCI_BUS_ADDR_T_64BIT
> > +       def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
> > +       depends on PCI
> 
> Thanks for bisecting.
> 
> Then we should change to
> 
> config PCI_BUS_ADDR_T_64BIT
>        def_bool y if (ARCH_DMA_ADDR_T_64BIT || SPARC64)
>        depends on PCI

This works for my A500.
John David Anglin Aug. 19, 2015, 11:25 a.m. UTC | #5
On 2015-08-19, at 1:30 AM, Yinghai Lu wrote:

> On Tue, Aug 18, 2015 at 9:48 PM, Meelis Roos <mroos@linux.ee> wrote:
>>> On Tue, Aug 18, 2015 at 12:47 PM, Helge Deller <deller@gmx.de> wrote:
>>> 
>>> Then we should change to
>>> 
>>> config PCI_BUS_ADDR_T_64BIT
>>>       def_bool y if (ARCH_DMA_ADDR_T_64BIT || SPARC64)
>>>       depends on PCI
>> 
>> Why SPARC64? The problem happened on parisc.
>> 
> 
> so will not set PCI_BUS_ADDR_T_64BIT for parisc.

I'm not sure this is optimal.  While the A500 may only have 32-bit PCI, c8000 appears to have a mix
of 32 and 64-bit, and rp34XX is all 64-bit.

Dave
--
John David Anglin	dave.anglin@bell.net



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller Aug. 19, 2015, 2:06 p.m. UTC | #6
> On 2015-08-19, at 1:30 AM, Yinghai Lu wrote:
> > On Tue, Aug 18, 2015 at 9:48 PM, Meelis Roos <mroos@linux.ee> wrote:
> >>> On Tue, Aug 18, 2015 at 12:47 PM, Helge Deller <deller@gmx.de> wrote:
> >>> 
> >>> Then we should change to
> >>> 
> >>> config PCI_BUS_ADDR_T_64BIT
> >>>       def_bool y if (ARCH_DMA_ADDR_T_64BIT || SPARC64)
> >>>       depends on PCI
> >> 
> >> Why SPARC64? The problem happened on parisc.
> > so will not set PCI_BUS_ADDR_T_64BIT for parisc.

I think given the current time frame it's probably the best approach to fix this problem for kernel 4.2.
It reverts the behaviour back to how it was before (for all arches beside SPARC64).
I'm still wondering if/why parisc is the only arch (in the sym53c8xx driver only!) which broke by this change...

> I'm not sure this is optimal.  While the A500 may only have 32-bit PCI, c8000 appears to have a mix
> of 32 and 64-bit, and rp34XX is all 64-bit.

True, but probably nobody of us noticed that we only used the 32-bit PCI interface even with 64bit kernel on parisc up to now?
pci_bus_alloc_resource() in drivers/pci/bus.c just disabled (flag=0) all 64bit resources for us.
But I agree, I think we need to fix drivers/parisc/lba_pci.c to correctly cope with 64bit pci addresses.

In the meantime I did some more debugging on sym_iomap_device() in drivers/scsi/sym53c8xx_2/sym_glue.c:
   pcibios_resource_to_bus(pdev->bus, &bus_addr,
                          &pdev->resource[i]);
   ram_base = bus_addr.start;

with (working) 32bit PCI addresses I get:
ressource = [mem 0xffffffff80002000-0xffffffff80003fff 64bit]
and pcibios_resource_to_bus() returning: ram_base = 0x80002000 

and with (failing) 64bit PCI addresses I get:
ressource = [mem 0xfffffff004000000-0xfffffff004001fff 64bit]
and pcibios_resource_to_bus() returning: ram_base = 0xfffffff004000000 

It seems the resource window doesn't get initialized correctly for 64bit...

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -1,6 +1,10 @@ 
  #
  # PCI configuration
  #
+config PCI_BUS_ADDR_T_64BIT
+       def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
+       depends on PCI


CONFIG_PCI_BUS_ADDR_T_64BIT gets now defined on all 64bit arches.
Then if CONFIG_PCI_BUS_ADDR_T_64BIT is set, in pci_bus_alloc_resource()
64bit address spaces (IORESOURCE_MEM_64) will be enabled which weren't enabled before.

This trivial/temporary hack fixes the problem:

--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -200,7 +200,7 @@  int pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
                                           resource_size_t),
                 void *alignf_data)
  {
-#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
+#if defined(CONFIG_PCI_BUS_ADDR_T_64BIT) && !defined(CONFIG_PARISC)
         int rc;
  
         if (res->flags & IORESOURCE_MEM_64) {