Message ID | 4DC27253-67FC-4A55-8C78-7782D9D0CF53@servergy.com (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Benjamin Herrenschmidt |
Headers | show |
On Tue, 2012-06-05 at 23:50 -0400, Ben Collins wrote: > The commit introducing pcibios_io_space_offset() was ignoring 32-bit to > 64-bit sign extention, which is the case on ppc32 with 64-bit resource > addresses. This only seems to have shown up while running under QEMU for > e500mc target. It may or may be suboptimal that QEMU has an IO base > address > 32-bits for the e500-pci implementation, but 1) it's still a > regression and 2) it's more correct to handle things this way. See Bjorn, we both did end up getting it wrong :-) > Signed-off-by: Ben Collins <bcollins@ubuntu.com> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > arch/powerpc/kernel/pci-common.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c > index 8e78e93..be9ced7 100644 > --- a/arch/powerpc/kernel/pci-common.c > +++ b/arch/powerpc/kernel/pci-common.c > @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) > return pci_enable_resources(dev, mask); > } > > +/* Before assuming too much here, take care to realize that we need sign > + * extension from 32-bit pointers to 64-bit resource addresses to work. > + */ > resource_size_t pcibios_io_space_offset(struct pci_controller *hose) > { > - return (unsigned long) hose->io_base_virt - _IO_BASE; > + long vbase = (long)hose->io_base_virt; > + long io_base = _IO_BASE; > + > + return (resource_size_t)(vbase - io_base); > } > > static void __devinit pcibios_setup_phb_resources(struct pci_controller *hose, struct list_head *resources)
On 06/05/2012 10:50 PM, Ben Collins wrote: > The commit introducing pcibios_io_space_offset() was ignoring 32-bit to > 64-bit sign extention, which is the case on ppc32 with 64-bit resource > addresses. This only seems to have shown up while running under QEMU for > e500mc target. It may or may be suboptimal that QEMU has an IO base > address > 32-bits for the e500-pci implementation, but 1) it's still a > regression and 2) it's more correct to handle things this way. Where do you see addresses over 32 bits in QEMU's e500-pci, at least with current mainline QEMU and the mpc8544ds model? I/O space should be at 0xe1000000. I'm also not sure what this has to do with the virtual address returned by ioremap(). > Signed-off-by: Ben Collins <bcollins@ubuntu.com> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > --- > arch/powerpc/kernel/pci-common.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c > index 8e78e93..be9ced7 100644 > --- a/arch/powerpc/kernel/pci-common.c > +++ b/arch/powerpc/kernel/pci-common.c > @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) > return pci_enable_resources(dev, mask); > } > > +/* Before assuming too much here, take care to realize that we need sign > + * extension from 32-bit pointers to 64-bit resource addresses to work. > + */ > resource_size_t pcibios_io_space_offset(struct pci_controller *hose) > { > - return (unsigned long) hose->io_base_virt - _IO_BASE; > + long vbase = (long)hose->io_base_virt; > + long io_base = _IO_BASE; > + > + return (resource_size_t)(vbase - io_base); Why do we want sign extension here? If we do want it, there are a lot of other places in this file where the same calculation is done. -Scott
On Wed, 2012-06-06 at 16:15 -0500, Scott Wood wrote: > On 06/05/2012 10:50 PM, Ben Collins wrote: > > The commit introducing pcibios_io_space_offset() was ignoring 32-bit to > > 64-bit sign extention, which is the case on ppc32 with 64-bit resource > > addresses. This only seems to have shown up while running under QEMU for > > e500mc target. It may or may be suboptimal that QEMU has an IO base > > address > 32-bits for the e500-pci implementation, but 1) it's still a > > regression and 2) it's more correct to handle things this way. > > Where do you see addresses over 32 bits in QEMU's e500-pci, at least > with current mainline QEMU and the mpc8544ds model? > > I/O space should be at 0xe1000000. > > I'm also not sure what this has to do with the virtual address returned > by ioremap(). This is due to how we calculate IO offsets on ppc32, see below > > Signed-off-by: Ben Collins <bcollins@ubuntu.com> > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > --- > > arch/powerpc/kernel/pci-common.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c > > index 8e78e93..be9ced7 100644 > > --- a/arch/powerpc/kernel/pci-common.c > > +++ b/arch/powerpc/kernel/pci-common.c > > @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) > > return pci_enable_resources(dev, mask); > > } > > > > +/* Before assuming too much here, take care to realize that we need sign > > + * extension from 32-bit pointers to 64-bit resource addresses to work. > > + */ > > resource_size_t pcibios_io_space_offset(struct pci_controller *hose) > > { > > - return (unsigned long) hose->io_base_virt - _IO_BASE; > > + long vbase = (long)hose->io_base_virt; > > + long io_base = _IO_BASE; > > + > > + return (resource_size_t)(vbase - io_base); > > Why do we want sign extension here? > > If we do want it, there are a lot of other places in this file where the > same calculation is done. We should probably as much as possible factor it, but basically what happens is that to access IO space, we turn: oub(port) into out_8(_IO_BASE + port) With _IO_BASE being a global. Now what happens when you have several PHBs ? Well, we make _IO_BASE be the result of ioremap'ing the IO space window of the first one, minus the bus address corresponding to the beginning of that window. Then for each device, we offset devices with the offset calculated above. Now that means that we can end up with funky arithmetic in a couple of cases: - If the bus address of the IO space is larger than the virtual address returned by ioremap (it's a bit silly to use large IO addresses but it's technically possible, normally IO windows start at 0 bus-side though). In fact I wouldn't be surprised if we have various other bugs if IO windows don't start at 0 (you may want to double check your dts setup here). - If the ioremap'ed address of the IO space of another domain is lower than the ioremap'ed address of the first domain, in which case the calculation: host->io_base_virt - _IO_BASE results in a negative offset. Thus we need to make sure that this offset is fully sign extended so that things work properly when applied to a resource_size_t which can be 64-bit. On ppc64 we do things differently, we have a single linear region that has all IO spaces and _IO_BASE is the beginning of it so offsets are never negative, we can do that because we don't care wasting address space there. Cheers, Ben.
On Jun 6, 2012, at 5:15 PM, Scott Wood wrote: > On 06/05/2012 10:50 PM, Ben Collins wrote: >> The commit introducing pcibios_io_space_offset() was ignoring 32-bit to >> 64-bit sign extention, which is the case on ppc32 with 64-bit resource >> addresses. This only seems to have shown up while running under QEMU for >> e500mc target. It may or may be suboptimal that QEMU has an IO base >> address > 32-bits for the e500-pci implementation, but 1) it's still a >> regression and 2) it's more correct to handle things this way. > > Where do you see addresses over 32 bits in QEMU's e500-pci, at least > with current mainline QEMU and the mpc8544ds model? > > I/O space should be at 0xe1000000. The problem is this: pci_bus 0000:00: root bus resource [io 0xffbed000-0xffbfcfff] (bus address [0x100000000-0x10000ffff]) Without the fix that I sent, it ends up looking like: pci_bus 0000:00: root bus resource [io 0xffbed000-0xffbfcfff] (bus address [0x0000-0xffff]) And that's when some devices fail to be assigned valid bar 0's and the kernel complains because of it. > I'm also not sure what this has to do with the virtual address returned > by ioremap(). > >> Signed-off-by: Ben Collins <bcollins@ubuntu.com> >> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> >> --- >> arch/powerpc/kernel/pci-common.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c >> index 8e78e93..be9ced7 100644 >> --- a/arch/powerpc/kernel/pci-common.c >> +++ b/arch/powerpc/kernel/pci-common.c >> @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) >> return pci_enable_resources(dev, mask); >> } >> >> +/* Before assuming too much here, take care to realize that we need sign >> + * extension from 32-bit pointers to 64-bit resource addresses to work. >> + */ >> resource_size_t pcibios_io_space_offset(struct pci_controller *hose) >> { >> - return (unsigned long) hose->io_base_virt - _IO_BASE; >> + long vbase = (long)hose->io_base_virt; >> + long io_base = _IO_BASE; >> + >> + return (resource_size_t)(vbase - io_base); > > Why do we want sign extension here? > > If we do want it, there are a lot of other places in this file where the > same calculation is done. > > -Scott > -- Ben Collins Servergy, Inc. (757) 243-7557 CONFIDENTIALITY NOTICE: This communication contains privileged and/or confidential information; and should be maintained with the strictest confidence. It is intended solely for the use of the person or entity in which it is addressed. If you are not the intended recipient, you are STRICTLY PROHIBITED from disclosing, copying, distributing or using any of this information. If you received this communication in error, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy.
On 06/06/2012 05:21 PM, Benjamin Herrenschmidt wrote: > Now that means that we can end up with funky arithmetic in a couple of > cases: > > - If the bus address of the IO space is larger than the virtual address > returned by ioremap (it's a bit silly to use large IO addresses but it's > technically possible, normally IO windows start at 0 bus-side though). > In fact I wouldn't be surprised if we have various other bugs if IO > windows don't start at 0 (you may want to double check your dts setup > here). The dts does show the I/O beginning at bus address zero: ranges = <0x2000000 0x0 0xc0000000 0xc0000000 0x0 0x20000000 0x1000000 0x0 0x0 0xe1000000 0x0 0x10000>; > - If the ioremap'ed address of the IO space of another domain is lower > than the ioremap'ed address of the first domain, in which case the > calculation: > > host->io_base_virt - _IO_BASE > > results in a negative offset. There should have been only one PCI domain in the QEMU case. -Scott
On Wed, 2012-06-06 at 19:35 -0400, Ben Collins wrote: > > pci_bus 0000:00: root bus resource [io 0xffbed000-0xffbfcfff] (bus > address [0x100000000-0x10000ffff]) > > Without the fix that I sent, it ends up looking like: > > pci_bus 0000:00: root bus resource [io 0xffbed000-0xffbfcfff] (bus > address [0x0000-0xffff]) > > And that's when some devices fail to be assigned valid bar 0's and the > kernel complains because of it. Note that oddly, the second range of bus addresses looks -more- correct than the first one... Cheers, Ben.
On Jun 7, 2012, at 5:30 AM, Benjamin Herrenschmidt wrote: > On Wed, 2012-06-06 at 19:35 -0400, Ben Collins wrote: >> >> pci_bus 0000:00: root bus resource [io 0xffbed000-0xffbfcfff] (bus >> address [0x100000000-0x10000ffff]) >> >> Without the fix that I sent, it ends up looking like: >> >> pci_bus 0000:00: root bus resource [io 0xffbed000-0xffbfcfff] (bus >> address [0x0000-0xffff]) >> >> And that's when some devices fail to be assigned valid bar 0's and the >> kernel complains because of it. > > Note that oddly, the second range of bus addresses looks -more- correct > than the first one... Except that the first one is exactly the same bus address as my bare metal system: pci_bus 0000:00: root bus resource [io 0xffbeb000-0xffbfafff] (bus address [0x100000000-0x10000ffff]) I only have one PCIe RAID card on the bare metal system. Not surprising I never noticed the problem on it directly. > Cheers, > Ben. > > -- Ben Collins Servergy, Inc. (757) 243-7557 CONFIDENTIALITY NOTICE: This communication contains privileged and/or confidential information; and should be maintained with the strictest confidence. It is intended solely for the use of the person or entity in which it is addressed. If you are not the intended recipient, you are STRICTLY PROHIBITED from disclosing, copying, distributing or using any of this information. If you received this communication in error, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy.
On Thu, 2012-06-07 at 11:38 -0400, Ben Collins wrote: > > Note that oddly, the second range of bus addresses looks -more- > correct > > than the first one... > > Except that the first one is exactly the same bus address as my bare > metal system: > > pci_bus 0000:00: root bus resource [io 0xffbeb000-0xffbfafff] (bus > address [0x100000000-0x10000ffff]) > > I only have one PCIe RAID card on the bare metal system. Not > surprising I never noticed the problem on it directly. Can you show me the device-tree node for that PCI host bridge ? Cheers, Ben.
On Jun 7, 2012, at 5:32 PM, Benjamin Herrenschmidt wrote: > On Thu, 2012-06-07 at 11:38 -0400, Ben Collins wrote: >>> Note that oddly, the second range of bus addresses looks -more- >> correct >>> than the first one... >> >> Except that the first one is exactly the same bus address as my bare >> metal system: >> >> pci_bus 0000:00: root bus resource [io 0xffbeb000-0xffbfafff] (bus >> address [0x100000000-0x10000ffff]) >> >> I only have one PCIe RAID card on the bare metal system. Not >> surprising I never noticed the problem on it directly. > > Can you show me the device-tree node for that PCI host bridge ? It's a p4080ds, so it's in arch/powerpc/boot/ And that means that this bug affects a real hardware platform, so I think it makes it more valid to include it. The only reason it didn't affect me directly is because my only PCIe card doesn't have io, just mem BARs. -- Bluecherry: http://www.bluecherrydvr.com/ SwissDisk : http://www.swissdisk.com/ Ubuntu : http://www.ubuntu.com/ My Blog : http://ben-collins.blogspot.com/
On Fri, 2012-06-08 at 14:38 -0400, Ben Collins wrote: > >> pci_bus 0000:00: root bus resource [io 0xffbeb000-0xffbfafff] (bus > >> address [0x100000000-0x10000ffff]) > >> > >> I only have one PCIe RAID card on the bare metal system. Not > >> surprising I never noticed the problem on it directly. > > > > Can you show me the device-tree node for that PCI host bridge ? > > > It's a p4080ds, so it's in arch/powerpc/boot/ > > And that means that this bug affects a real hardware platform, so I > think it makes it more valid to include it. The only reason it didn't > affect me directly is because my only PCIe card doesn't have io, just > mem BARs. Something doesn't make sense. The bus address printed above are clearly not right, the .dts contains for the "ranges" entries for IO space of all 3 bridges: 0x01000000 0 0x00000000 0xf 0xf8000000 0x0 0x00010000 0x01000000 0 0x00000000 0xf 0xf8010000 0x0 0x00010000 0x01000000 0 0x00000000 0xf 0xf8020000 0x0 0x00010000 So something is wrong with the printing of the bus address, there's a stale top bit (overflow from the 64-bit substraction somewhere ?) Cheers, Ben.
On Tue, Jun 5, 2012 at 11:15 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Tue, 2012-06-05 at 23:50 -0400, Ben Collins wrote: >> The commit introducing pcibios_io_space_offset() was ignoring 32-bit to >> 64-bit sign extention, which is the case on ppc32 with 64-bit resource >> addresses. This only seems to have shown up while running under QEMU for >> e500mc target. It may or may be suboptimal that QEMU has an IO base >> address > 32-bits for the e500-pci implementation, but 1) it's still a >> regression and 2) it's more correct to handle things this way. > > See Bjorn, we both did end up getting it wrong :-) Ooh, sorry about that. Who's going to push this fix? I don't see it in Linus' tree yet. If you want me to, please forward me the original patch. >> Signed-off-by: Ben Collins <bcollins@ubuntu.com> >> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > >> arch/powerpc/kernel/pci-common.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c >> index 8e78e93..be9ced7 100644 >> --- a/arch/powerpc/kernel/pci-common.c >> +++ b/arch/powerpc/kernel/pci-common.c >> @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) >> return pci_enable_resources(dev, mask); >> } >> >> +/* Before assuming too much here, take care to realize that we need sign >> + * extension from 32-bit pointers to 64-bit resource addresses to work. >> + */ >> resource_size_t pcibios_io_space_offset(struct pci_controller *hose) >> { >> - return (unsigned long) hose->io_base_virt - _IO_BASE; >> + long vbase = (long)hose->io_base_virt; >> + long io_base = _IO_BASE; >> + >> + return (resource_size_t)(vbase - io_base); >> } >> >> static void __devinit pcibios_setup_phb_resources(struct pci_controller *hose, struct list_head *resources) > >
On Mon, 2012-06-18 at 09:23 -0600, Bjorn Helgaas wrote: > Ooh, sorry about that. Who's going to push this fix? I don't see it > in Linus' tree yet. If you want me to, please forward me the original > patch. I want to double check something. We are still printing something wrong in the kernel log, so it looks like we have some incorrect carry in some arithmetic happening somewhere. Cheers, Ben.
On Jun 18, 2012, at 3:39 PM, Benjamin Herrenschmidt wrote: > On Mon, 2012-06-18 at 09:23 -0600, Bjorn Helgaas wrote: >> Ooh, sorry about that. Who's going to push this fix? I don't see it >> in Linus' tree yet. If you want me to, please forward me the original >> patch. > > I want to double check something. We are still printing something wrong > in the kernel log, so it looks like we have some incorrect carry in some > arithmetic happening somewhere. Can we at least get this patch in to get things working again? It's still a regression either way. > Cheers, > Ben. > > -- Ben Collins Servergy, Inc. (757) 243-7557 CONFIDENTIALITY NOTICE: This communication contains privileged and/or confidential information; and should be maintained with the strictest confidence. It is intended solely for the use of the person or entity in which it is addressed. If you are not the intended recipient, you are STRICTLY PROHIBITED from disclosing, copying, distributing or using any of this information. If you received this communication in error, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy.
On Mon, 2012-06-18 at 16:04 -0500, Ben Collins wrote: > On Jun 18, 2012, at 3:39 PM, Benjamin Herrenschmidt wrote: > > > On Mon, 2012-06-18 at 09:23 -0600, Bjorn Helgaas wrote: > >> Ooh, sorry about that. Who's going to push this fix? I don't see it > >> in Linus' tree yet. If you want me to, please forward me the original > >> patch. > > > > I want to double check something. We are still printing something wrong > > in the kernel log, so it looks like we have some incorrect carry in some > > arithmetic happening somewhere. > > Can we at least get this patch in to get things working again? It's still a regression either way. Sure, I'm only just back from my sick leave, I haven't had a chance to go through the pending patches yet. Cheers, Ben. > > Cheers, > > Ben. > > > > > > -- > Ben Collins > Servergy, Inc. > (757) 243-7557 > > CONFIDENTIALITY NOTICE: This communication contains privileged and/or confidential information; and should be maintained with the strictest confidence. It is intended solely for the use of the person or entity in which it is addressed. If you are not the intended recipient, you are STRICTLY PROHIBITED from disclosing, copying, distributing or using any of this information. If you received this communication in error, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy.
On Mon, Jun 18, 2012 at 2:39 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > On Mon, 2012-06-18 at 09:23 -0600, Bjorn Helgaas wrote: >> Ooh, sorry about that. Who's going to push this fix? I don't see it >> in Linus' tree yet. If you want me to, please forward me the original >> patch. > > I want to double check something. We are still printing something wrong > in the kernel log, so it looks like we have some incorrect carry in some > arithmetic happening somewhere. Probably not related to whatever you're seeing, but just FYI, I think the core has a sign-extension bug related to P2P bridge I/O windows: http://marc.info/?l=linux-pci&m=134023801524895&w=2
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 8e78e93..be9ced7 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask) return pci_enable_resources(dev, mask); } +/* Before assuming too much here, take care to realize that we need sign + * extension from 32-bit pointers to 64-bit resource addresses to work. + */ resource_size_t pcibios_io_space_offset(struct pci_controller *hose) { - return (unsigned long) hose->io_base_virt - _IO_BASE; + long vbase = (long)hose->io_base_virt; + long io_base = _IO_BASE; + + return (resource_size_t)(vbase - io_base); } static void __devinit pcibios_setup_phb_resources(struct pci_controller *hose, struct list_head *resources)
The commit introducing pcibios_io_space_offset() was ignoring 32-bit to 64-bit sign extention, which is the case on ppc32 with 64-bit resource addresses. This only seems to have shown up while running under QEMU for e500mc target. It may or may be suboptimal that QEMU has an IO base address > 32-bits for the e500-pci implementation, but 1) it's still a regression and 2) it's more correct to handle things this way. Signed-off-by: Ben Collins <bcollins@ubuntu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- arch/powerpc/kernel/pci-common.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)