Message ID | 20110118224718.GA19039@us.ibm.com (mailing list archive) |
---|---|
State | Not Applicable |
On 18.01.2011 [14:47:18 -0800], Nishanth Aravamudan wrote:
> On 18.01.2011 [12:31:52 +1100], Anton Blanchard wrote:
> > Hi,
> >
> > I was testing 2.6.37-git17 on a POWER7 with virtual IO and hit this:
> >
> > Trying to unpack rootfs image as initramfs...
> > Freeing initrd memory: 7446k freed
> > vio 30000000: Warning: IOMMU dma not supported: mask
> > 0xffffffffffffffff, table unavailable
> > vio 4000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > vio 4001: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > vio 4002: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > vio 4004: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > audit: initializing netlink socket (disabled)
> >
> > Haven't had a chance to look closer yet.
>
> After debugging a bit, this would appear to be due to the second hunk of
> b3c73856ae47d43d0d181f9de1c1c6c0820c4515.
>
> diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> index b265405..1b695fd 100644
> --- a/arch/powerpc/kernel/vio.c
> +++ b/arch/powerpc/kernel/vio.c
> @@ -1257,6 +1257,10 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
>  	viodev->dev.parent = &vio_bus_device.dev;
>  	viodev->dev.bus = &vio_bus_type;
>  	viodev->dev.release = vio_dev_release;
> +	/* needed to ensure proper operation of coherent allocations
> +	 * later, in case driver doesn't set it explicitly */
> +	dma_set_mask(&viodev->dev, DMA_BIT_MASK(64));
> +	dma_set_coherent_mask(&viodev->dev, DMA_BIT_MASK(64));
>
>  	/* register with generic device framework */
>  	if (device_register(&viodev->dev)) {
>
> Milton, Sonny, any thoughts?

A bit more detail after trying a few more kernels on the box that
originally showed the error:

1) This doesn't actually prevent booting, afaict. I think it "just"
   disables DMA, which is bad, but not a boot fail, technically.

2) Reverting the above commit definitely prevents those messages.

3) I'm seeing a separate issue with 2.6.37-git17 (that's not present in
   2.6.37):

   sd 0:4:2:0: [sda] Aborting command: 2A
   sd 0:4:2:0: Abort timed out. Resetting bus.

   At which point the box locks up :) So testing fixes is a bit of a
   challenge right now.

Ben, if you're ok with waiting to see if Milton or Sonny have any ideas,
I'd like to hold off on asking for a revert. In the case they do, I'll
be able to test and send out any proposed fix rapidly.

Thanks,
Nish
On Tue, 2011-01-18 at 14:47 -0800, Nishanth Aravamudan wrote:
> On 18.01.2011 [12:31:52 +1100], Anton Blanchard wrote:
> > Hi,
> >
> > I was testing 2.6.37-git17 on a POWER7 with virtual IO and hit this:
> >
> > Trying to unpack rootfs image as initramfs...
> > Freeing initrd memory: 7446k freed
> > vio 30000000: Warning: IOMMU dma not supported: mask
> > 0xffffffffffffffff, table unavailable
> > vio 4000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > vio 4001: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > vio 4002: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > vio 4004: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > table unavailable
> > audit: initializing netlink socket (disabled)
> >
> > Haven't had a chance to look closer yet.

Well, this causes messages for vdevices that don't do DMA at all (such
as vterm etc...) and don't have the necessary properties. However, it
didn't -break- anything for me in my tests so far, just spurious
messages.

Not sure what's up with Anton's setup. Anton, can you hack the printk
to display the OF path to the device so we see what devices are
complaining? It could be a different issue that prevents booting.

Cheers,
Ben.

> After debugging a bit, this would appear to be due to the second hunk of
> b3c73856ae47d43d0d181f9de1c1c6c0820c4515.
>
> diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> index b265405..1b695fd 100644
> --- a/arch/powerpc/kernel/vio.c
> +++ b/arch/powerpc/kernel/vio.c
> @@ -1257,6 +1257,10 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
>  	viodev->dev.parent = &vio_bus_device.dev;
>  	viodev->dev.bus = &vio_bus_type;
>  	viodev->dev.release = vio_dev_release;
> +	/* needed to ensure proper operation of coherent allocations
> +	 * later, in case driver doesn't set it explicitly */
> +	dma_set_mask(&viodev->dev, DMA_BIT_MASK(64));
> +	dma_set_coherent_mask(&viodev->dev, DMA_BIT_MASK(64));
>
>  	/* register with generic device framework */
>  	if (device_register(&viodev->dev)) {
>
> Milton, Sonny, any thoughts?
>
> Thanks,
> Nish
>
On 19.01.2011 [15:06:20 +1100], Benjamin Herrenschmidt wrote:
> On Tue, 2011-01-18 at 14:47 -0800, Nishanth Aravamudan wrote:
> > On 18.01.2011 [12:31:52 +1100], Anton Blanchard wrote:
> > > Hi,
> > >
> > > I was testing 2.6.37-git17 on a POWER7 with virtual IO and hit this:
> > >
> > > Trying to unpack rootfs image as initramfs...
> > > Freeing initrd memory: 7446k freed
> > > vio 30000000: Warning: IOMMU dma not supported: mask
> > > 0xffffffffffffffff, table unavailable
> > > vio 4000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > > table unavailable
> > > vio 4001: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > > table unavailable
> > > vio 4002: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > > table unavailable
> > > vio 4004: Warning: IOMMU dma not supported: mask 0xffffffffffffffff,
> > > table unavailable
> > > audit: initializing netlink socket (disabled)
> > >
> > > Haven't had a chance to look closer yet.
>
> Well, this causes messages for vdevices that don't do DMA at all (such
> as vterm etc...) and don't have the necessary properties. However, it
> didn't -break- anything for me in my tests so far, just spurious
> messages. Not sure what's up with Anton's setup. Anton, can you hack the
> printk to display the OF path to the device so we see what devices are
> complaining? It could be a different issue that prevents booting.

Is this what you were looking for?

vio 30000000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
vio 30000000: Path: /vdevice/vty@30000000
vio 4000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
vio 4000: Path: /vdevice/IBM,sp@4000
vio 4001: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
vio 4001: Path: /vdevice/rtc@4001
vio 4002: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
vio 4002: Path: /vdevice/nvram@4002
vio 4004: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
vio 4004: Path: /vdevice/gscsi@4004

FWIW, I looked at Anton's logs, and I don't think the boot failed, per
se. I think it may have timed out (but not positive on that). I was able
to boot 2.6.37-git17 on the exact same box, albeit it locks up at a
later point (the sd abort I e-mailed about in a follow-up).

>
> Cheers,
> Ben.
>
> > After debugging a bit, this would appear to be due to the second hunk of
> > b3c73856ae47d43d0d181f9de1c1c6c0820c4515.
> >
> > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> > index b265405..1b695fd 100644
> > --- a/arch/powerpc/kernel/vio.c
> > +++ b/arch/powerpc/kernel/vio.c
> > @@ -1257,6 +1257,10 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
> >  	viodev->dev.parent = &vio_bus_device.dev;
> >  	viodev->dev.bus = &vio_bus_type;
> >  	viodev->dev.release = vio_dev_release;
> > +	/* needed to ensure proper operation of coherent allocations
> > +	 * later, in case driver doesn't set it explicitly */
> > +	dma_set_mask(&viodev->dev, DMA_BIT_MASK(64));
> > +	dma_set_coherent_mask(&viodev->dev, DMA_BIT_MASK(64));
> >
> >  	/* register with generic device framework */
> >  	if (device_register(&viodev->dev)) {
> >
> > Milton, Sonny, any thoughts?
> >
> > Thanks,
> > Nish
> >
> >
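For readers wondering what the "Path:" lines above correspond to, here is a minimal userspace sketch of the extra line of output. The thread does not show the actual debugging change, so everything here is an assumption: `struct mock_dev` and `format_path_line` are hypothetical names, and the in-kernel equivalent mentioned in the comment is a guess at the shape of such a hack, not the code that was run.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a VIO device: just the two strings that
 * appear in the log lines quoted in this thread. */
struct mock_dev {
	const char *name;    /* bus id, e.g. "4000" */
	const char *of_path; /* Open Firmware path, e.g. "/vdevice/rtc@4001" */
};

/*
 * Format the extra debug line. In the kernel this would presumably be a
 * one-line printk next to the existing warning, along the lines of
 *     dev_info(dev, "Path: %s\n", dev->of_node->full_name);
 * (assumption: the real change is not shown in the thread).
 * Returns the number of characters snprintf would have written.
 */
static int format_path_line(const struct mock_dev *dev, char *buf, size_t len)
{
	assert(dev && buf);
	return snprintf(buf, len, "vio %s: Path: %s", dev->name, dev->of_path);
}
```

Pairing each warning with the device-tree path is what lets the devices be identified by node name (vty, IBM,sp, rtc, nvram, gscsi) rather than by bus address alone.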
On Tue, 2011-01-18 at 20:37 -0800, Nishanth Aravamudan wrote:
> Is this what you were looking for?
>
> vio 30000000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
> vio 30000000: Path: /vdevice/vty@30000000
> vio 4000: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
> vio 4000: Path: /vdevice/IBM,sp@4000
> vio 4001: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
> vio 4001: Path: /vdevice/rtc@4001
> vio 4002: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
> vio 4002: Path: /vdevice/nvram@4002
> vio 4004: Warning: IOMMU dma not supported: mask 0xffffffffffffffff, table unavailable
> vio 4004: Path: /vdevice/gscsi@4004

Ok, so they are all harmless (none of those devices do DMA, apart maybe
from gscsi, I have no idea what it is :-)

> FWIW, I looked at Anton's logs, and I don't think the boot failed, per
> se. I think it may have timed out (but not positive on that). I was able
> to boot 2.6.37-git17 on the exact same box, albeit it locks up at a
> later point (the sd abort I e-mailed about in a follow-up).

I haven't seen your email. I'll dig. Have to run now.

Cheers,
Ben.

> >
> >
> > Cheers,
> > Ben.
> >
> > > After debugging a bit, this would appear to be due to the second hunk of
> > > b3c73856ae47d43d0d181f9de1c1c6c0820c4515.
> > >
> > > diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
> > > index b265405..1b695fd 100644
> > > --- a/arch/powerpc/kernel/vio.c
> > > +++ b/arch/powerpc/kernel/vio.c
> > > @@ -1257,6 +1257,10 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
> > >  	viodev->dev.parent = &vio_bus_device.dev;
> > >  	viodev->dev.bus = &vio_bus_type;
> > >  	viodev->dev.release = vio_dev_release;
> > > +	/* needed to ensure proper operation of coherent allocations
> > > +	 * later, in case driver doesn't set it explicitly */
> > > +	dma_set_mask(&viodev->dev, DMA_BIT_MASK(64));
> > > +	dma_set_coherent_mask(&viodev->dev, DMA_BIT_MASK(64));
> > >
> > >  	/* register with generic device framework */
> > >  	if (device_register(&viodev->dev)) {
> > >
> > > Milton, Sonny, any thoughts?
> > >
> > > Thanks,
> > > Nish
> > >
> > >
> >
On Tue, 2011-01-18 at 16:48 -0800, Nishanth Aravamudan wrote:
>
> Ben, if you're ok with waiting to see if Milton or Sonny have any ideas,
> I'd like to hold off on asking for a revert. In the case they do, I'll
> be able to test and send out any proposed fix rapidly.

I don't believe this specific error is causing the lockup, I think we
only hit a spurious message on devices that don't have DMA
capabilities in the first place. (But I may be wrong, I'll wait for
you guys to dig more or I'll have a look myself tomorrow if I manage
to get out of meetings).

So there's another problem with SCSI tho it -could- also be a DMA issue,
hard to tell at this point.

BTW. I'm not too happy with those defaults set to 64-bit. Probably not
an issue until your other patches go in, but some devices like veth
cannot do 64-bit DMA. I think we should default to 32-bit in the VIO
base code and explicitly enable 64-bit DMA from drivers that support it
(in theory vscsi but I haven't verified the implementation).

Cheers,
Ben.
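The split Ben proposes (a conservative default in the bus code, with an explicit opt-in from capable drivers) can be sketched in userspace as follows. This is an illustration under stated assumptions, not kernel code: `struct mock_vio_dev`, `mock_vio_register` and `mock_vscsi_probe` are hypothetical names, and whether vscsi really supports 64-bit DMA is unverified in the thread. Only the `DMA_BIT_MASK` definition mirrors the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(); 64 is special-cased because a
 * shift by the full width of the type is undefined in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for struct vio_dev and its embedded device. */
struct mock_vio_dev {
	const char *name;
	uint64_t dma_mask;
	uint64_t coherent_dma_mask;
};

/* Bus code: pick a default that every DMA-capable device can honour. */
static void mock_vio_register(struct mock_vio_dev *dev, const char *name)
{
	assert(dev && name);
	dev->name = name;
	dev->dma_mask = DMA_BIT_MASK(32);
	dev->coherent_dma_mask = DMA_BIT_MASK(32);
}

/* Driver probe (hypothetical): only a driver that knows its device can
 * handle 64-bit addresses widens the masks. */
static void mock_vscsi_probe(struct mock_vio_dev *dev)
{
	assert(dev);
	dev->dma_mask = DMA_BIT_MASK(64);
	dev->coherent_dma_mask = DMA_BIT_MASK(64);
}
```

With this arrangement a device such as veth, whose driver never asks for more, keeps the 32-bit masks it was registered with, while an opting-in driver ends up with 64-bit masks.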
On 19.01.2011 [17:06:18 +1100], Benjamin Herrenschmidt wrote:
> On Tue, 2011-01-18 at 16:48 -0800, Nishanth Aravamudan wrote:
> >
> > Ben, if you're ok with waiting to see if Milton or Sonny have any ideas,
> > I'd like to hold off on asking for a revert. In the case they do, I'll
> > be able to test and send out any proposed fix rapidly.
>
> I don't believe this specific error is causing the lockup, I think we
> only hit a spurious message on devices that don't have DMA
> capabilities in the first place. (But I may be wrong, I'll wait for
> you guys to dig more or I'll have a look myself tomorrow if I manage
> to get out of meetings).

Yes, this seems accurate. Like I mentioned elsewhere, this box came up
ok even with these messages and seemed ok (up until the disk locked up).

> So there's another problem with SCSI tho it -could- also be a DMA issue,
> hard to tell at this point.

Right, I'm not sure how to determine that. I did see the lockup, though,
with both my patches reverted (the patches for vio, I mean, after
2.6.37).

> BTW. I'm not too happy with those defaults set to 64-bit. Probably not
> an issue until your other patches go in, but some devices like veth
> cannot do 64-bit DMA. I think we should default to 32-bit in the VIO
> base code and explicitly enable 64-bit DMA from drivers that support it
> (in theory vscsi but I haven't verified the implementation).

Ok, so change the bit-mask to 32-bit? Or would it be appropriate to
attempt 64-bit and, if that fails, fall back to 32-bit? That seems to
be a common pattern throughout the DMA bit-setting callers.

Thanks,
Nish
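The "attempt 64-bit, fall back to 32-bit" pattern Nish refers to can be sketched in userspace like this. It is a mock under stated assumptions: `struct mock_dev`, `mock_dma_set_mask` and `mock_setup_dma` are hypothetical, and the acceptance rule (reject any mask wider than `supported_mask`) is a simplification; real platforms decide this in their own DMA support hooks. The only parts taken from the kernel are the `DMA_BIT_MASK` definition and the convention that `dma_set_mask()` returns 0 on success and a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device model for this sketch only. */
struct mock_dev {
	uint64_t supported_mask; /* widest mask the mock platform accepts */
	uint64_t dma_mask;       /* mask currently in effect, 0 if none */
};

/* Mock of dma_set_mask(): 0 on success, negative errno on failure. */
static int mock_dma_set_mask(struct mock_dev *dev, uint64_t mask)
{
	assert(dev);
	if (mask & ~dev->supported_mask)
		return -EIO;
	dev->dma_mask = mask;
	return 0;
}

/* Try the wide mask first; if it is rejected, settle for 32-bit. */
static int mock_setup_dma(struct mock_dev *dev)
{
	if (mock_dma_set_mask(dev, DMA_BIT_MASK(64)) == 0)
		return 0;
	return mock_dma_set_mask(dev, DMA_BIT_MASK(32));
}
```

Note the difference from the patch under discussion, which ignores the return values of both `dma_set_mask()` and `dma_set_coherent_mask()`: a fallback only works if the caller checks them.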
Hi,

> FWIW, I looked at Anton's logs, and I don't think the boot failed, per
> se. I think it may have timed out (but not positive on that). I was
> able to boot 2.6.37-git17 on the exact same box, albeit it locks up
> at a later point (the sd abort I e-mailed about in a follow-up).

This fail bisects down to the VPHN (shared processor affinity) patch.
I've got some fixes on the way.

Anton
diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c
index b265405..1b695fd 100644
--- a/arch/powerpc/kernel/vio.c
+++ b/arch/powerpc/kernel/vio.c
@@ -1257,6 +1257,10 @@ struct vio_dev *vio_register_device_node(struct device_node *of_node)
 	viodev->dev.parent = &vio_bus_device.dev;
 	viodev->dev.bus = &vio_bus_type;
 	viodev->dev.release = vio_dev_release;
+	/* needed to ensure proper operation of coherent allocations
+	 * later, in case driver doesn't set it explicitly */
+	dma_set_mask(&viodev->dev, DMA_BIT_MASK(64));
+	dma_set_coherent_mask(&viodev->dev, DMA_BIT_MASK(64));
 
 	/* register with generic device framework */
 	if (device_register(&viodev->dev)) {
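As a cross-check between the hunk and the log output, the mask value printed in the warnings (0xffffffffffffffff) is what `DMA_BIT_MASK(64)` expands to. A minimal userspace sketch, assuming a C11 compiler: the macro body mirrors the kernel's definition, while `mask_covers` is a hypothetical helper added purely for illustration and is not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(); n == 64 is special-cased because
 * shifting a 64-bit value by 64 is undefined behaviour in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Compile-time checks of the two masks discussed in this thread. */
static_assert(DMA_BIT_MASK(64) == 0xffffffffffffffffULL,
	      "64-bit mask is all ones, as printed in the warnings");
static_assert(DMA_BIT_MASK(32) == 0x00000000ffffffffULL,
	      "32-bit mask covers only the low 4GB");

/* Hypothetical helper: true if a bus address is reachable under mask. */
static bool mask_covers(uint64_t mask, uint64_t addr)
{
	return (addr & ~mask) == 0;
}
```

Under a 32-bit mask any address at or above 4GB is out of reach, which is the practical concern behind defaulting every VIO device to 64-bit when some devices cannot address that range.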