Message ID | 1465563831-6565-1-git-send-email-mauricfo@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Rejected |
On Fri, 2016-06-10 at 10:03 -0300, Mauricio Faria de Oliveira wrote:
> This prevents flooding the logs with 'iommu_alloc failed' messages
> while I/O is performed (normally) to very fast devices (e.g. NVMe).
>
> That error is not necessarily a problem; device drivers can retry
> later / reschedule the requests for which the allocation failed,
> and handle things gracefully for the caller stack on top of them.
>
> This helps at least with NVMe devices in scenarios without a
> "64-bit"/direct DMA window (e.g., systems with more than a few
> terabytes of memory, on which DDW cannot currently be enabled),
> where just a 'dd' command can trigger errors.

I'm not a fan of this. This is a very useful message for diagnosing
why, for example, your network adapter is not working properly.

A lot of drivers don't deal well with IOMMU errors.

The fact that NVMe triggers these is a problem that needs to be solved
differently.

Cheers,
Ben.

>
> # dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=512k
> <...>
> # echo $?
> 0
>
> # dmesg
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000151c90000 npages 16
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000151c90000 npages 16
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000151c90000 npages 16
> <...>
> ppc_iommu_map_sg: 8186 callbacks suppressed
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000fa5c0000 npages 16
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000100440000 npages 16
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000100440000 npages 16
> <...>
> ppc_iommu_map_sg: 5707 callbacks suppressed
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000b5f50000 npages 16
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000b5c60000 npages 16
> nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000b4b30000 npages 16
> <...>
>
> Tested on next-20160609.
>
> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/iommu.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index a8e3490..b585bdc 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -479,10 +479,9 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>
>  		/* Handle failure */
>  		if (unlikely(entry == DMA_ERROR_CODE)) {
> -			if (printk_ratelimit())
> -				dev_info(dev, "iommu_alloc failed, tbl %p "
> -					 "vaddr %lx npages %lu\n", tbl, vaddr,
> -					 npages);
> +			dev_dbg_ratelimited(dev, "iommu_alloc failed, tbl %p "
> +					    "vaddr %lx npages %lu\n", tbl, vaddr,
> +					    npages);
>  			goto failure;
>  		}
>
> @@ -776,11 +775,9 @@ dma_addr_t iommu_map_page(struct device *dev, struct iommu_table *tbl,
>  					      mask >> tbl->it_page_shift, align,
>  					      attrs);
>  		if (dma_handle == DMA_ERROR_CODE) {
> -			if (printk_ratelimit()) {
> -				dev_info(dev, "iommu_alloc failed, tbl %p "
> -					 "vaddr %p npages %d\n", tbl, vaddr,
> -					 npages);
> -			}
> +			dev_dbg_ratelimited(dev, "iommu_alloc failed, tbl %p "
> +					    "vaddr %p npages %d\n", tbl, vaddr,
> +					    npages);
>  		} else
>  			dma_handle |= (uaddr & ~IOMMU_PAGE_MASK(tbl));
>  	}
Hi Ben,

On 06/11/2016 08:02 PM, Benjamin Herrenschmidt wrote:
> I'm not a fan of this. This is a very useful message for diagnosing
> why, for example, your network adapter is not working properly.
>
> A lot of drivers don't deal well with IOMMU errors.
>
> The fact that NVMe triggers these is a problem that needs to be solved
> differently.

Ok, I understand your points.

Thanks for the review -- it helps in setting another direction.

Kind regards,
On Mon, 2016-06-13 at 10:27 -0300, Mauricio Faria de Oliveira wrote:
> Hi Ben,
>
> On 06/11/2016 08:02 PM, Benjamin Herrenschmidt wrote:
> > I'm not a fan of this. This is a very useful message for diagnosing
> > why, for example, your network adapter is not working properly.
> >
> > A lot of drivers don't deal well with IOMMU errors.
> >
> > The fact that NVMe triggers these is a problem that needs to be
> > solved differently.
>
> Ok, I understand your points.
>
> Thanks for the review -- it helps in setting another direction.

I've been thinking about this a bit... it might be worthwhile adding
a dma_* call to query the approximate size of the IOMMU window, as
a way for the device to adjust its requirements dynamically.

Another option would be to use a dma_attr for silencing mapping errors,
which NVMe could use provided it does handle them gracefully...

Cheers,
Ben.
Hi Ben,

On 06/13/2016 06:26 PM, Benjamin Herrenschmidt wrote:
> I've been thinking about this a bit... it might be worthwhile adding
> a dma_* call to query the approximate size of the IOMMU window, as
> a way for the device to adjust its requirements dynamically.

Ok, cool; something like that was one of the options being discussed here.

What do you mean by 'approximate'? Maybe the size of the 'free regions'
in the pools? I'm not sure, because if I understand it correctly the
window size is static (2 GiB), so I didn't get why (or of what) to
provide an approximation.

> Another option would be to use a dma_attr for silencing mapping errors,
> which NVMe could use provided it does handle them gracefully...

Ah, that's new. Interesting. Thanks for the suggestion!
On Mon, 2016-06-13 at 18:43 -0300, Mauricio Faria de Oliveira wrote:
> Hi Ben,
>
> On 06/13/2016 06:26 PM, Benjamin Herrenschmidt wrote:
> > I've been thinking about this a bit... it might be worthwhile adding
> > a dma_* call to query the approximate size of the IOMMU window, as
> > a way for the device to adjust its requirements dynamically.
>
> Ok, cool; something like that was one of the options being discussed here.
>
> What do you mean by 'approximate'? Maybe the size of the 'free regions'
> in the pools? I'm not sure, because if I understand it correctly the
> window size is static (2 GiB), so I didn't get why (or of what) to
> provide an approximation.

'Approximate' wasn't a great choice of word, but what I meant is:

 - The size doesn't mean you can do an allocation of that size (pool
   layout, etc.)
 - And it might be shared with another device (though that is less
   likely these days).

> > Another option would be to use a dma_attr for silencing mapping
> > errors, which NVMe could use provided it does handle them
> > gracefully...
>
> Ah, that's new. Interesting. Thanks for the suggestion!

Cheers,
Ben.
On 06/13/2016 06:51 PM, Benjamin Herrenschmidt wrote:
> 'Approximate' wasn't a great choice of word, but what I meant is:
>
>  - The size doesn't mean you can do an allocation of that size (pool
>    layout, etc.)
>  - And it might be shared with another device (though that is less
>    likely these days).

Ah, yup - ok, now I get it; thanks.
Hi Ben,

On 06/13/2016 06:26 PM, Benjamin Herrenschmidt wrote:
> Another option would be to use a dma_attr for silencing mapping errors,
> which NVMe could use provided it does handle them gracefully...

I recently submitted patches that implement your suggestion [1].
Could you please review them and comment on whether they're OK with you?

Thanks!

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-August/146850.html
On Wed, 2016-08-03 at 16:39 -0300, Mauricio Faria de Oliveira wrote:
> Hi Ben,
>
> On 06/13/2016 06:26 PM, Benjamin Herrenschmidt wrote:
> > Another option would be to use a dma_attr for silencing mapping
> > errors, which NVMe could use provided it does handle them
> > gracefully...
>
> I recently submitted patches that implement your suggestion [1].
> Could you please review them and comment on whether they're OK with you?

I think this is best done by the relevant community maintainer;
I just threw out an idea, but I'm not that familiar with the details :-)

Did you send them to the lkml list?

> Thanks!
>
> [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-August/146850.html
On 08/03/2016 06:34 PM, Benjamin Herrenschmidt wrote:
> I think this is best done by the relevant community maintainer;
> I just threw out an idea, but I'm not that familiar with the details :-)

Ok, sure; got it.

> Did you send them to the lkml list?

Yup, plus a few other lists from get_maintainer.pl, IIRC.

Mailing list archive links:

 - linux-kernel: http://marc.info/?l=linux-kernel&m=146798084822100&w=2
 - linux-doc: http://marc.info/?l=linux-doc&m=146798085522104&w=2
 - linux-nvme: http://lists.infradead.org/pipermail/linux-nvme/2016-July/005349.html
 - linuxppc-dev: https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-July/145624.html

Thanks,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..b585bdc 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,10 +479,9 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,

 		/* Handle failure */
 		if (unlikely(entry == DMA_ERROR_CODE)) {
-			if (printk_ratelimit())
-				dev_info(dev, "iommu_alloc failed, tbl %p "
-					 "vaddr %lx npages %lu\n", tbl, vaddr,
-					 npages);
+			dev_dbg_ratelimited(dev, "iommu_alloc failed, tbl %p "
+					    "vaddr %lx npages %lu\n", tbl, vaddr,
+					    npages);
 			goto failure;
 		}

@@ -776,11 +775,9 @@ dma_addr_t iommu_map_page(struct device *dev, struct iommu_table *tbl,
 					      mask >> tbl->it_page_shift, align,
 					      attrs);
 		if (dma_handle == DMA_ERROR_CODE) {
-			if (printk_ratelimit()) {
-				dev_info(dev, "iommu_alloc failed, tbl %p "
-					 "vaddr %p npages %d\n", tbl, vaddr,
-					 npages);
-			}
+			dev_dbg_ratelimited(dev, "iommu_alloc failed, tbl %p "
+					    "vaddr %p npages %d\n", tbl, vaddr,
+					    npages);
 		} else
 			dma_handle |= (uaddr & ~IOMMU_PAGE_MASK(tbl));
 	}
This prevents flooding the logs with 'iommu_alloc failed' messages
while I/O is performed (normally) to very fast devices (e.g. NVMe).

That error is not necessarily a problem; device drivers can retry
later / reschedule the requests for which the allocation failed,
and handle things gracefully for the caller stack on top of them.

This helps at least with NVMe devices in scenarios without a
"64-bit"/direct DMA window (e.g., systems with more than a few
terabytes of memory, on which DDW cannot currently be enabled),
where just a 'dd' command can trigger errors.

  # dd if=/dev/zero of=/dev/nvme0n1p1 bs=64k count=512k
  <...>
  # echo $?
  0

  # dmesg
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000151c90000 npages 16
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000151c90000 npages 16
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000151c90000 npages 16
  <...>
  ppc_iommu_map_sg: 8186 callbacks suppressed
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000fa5c0000 npages 16
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000100440000 npages 16
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c000000100440000 npages 16
  <...>
  ppc_iommu_map_sg: 5707 callbacks suppressed
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000b5f50000 npages 16
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000b5c60000 npages 16
  nvme 0000:00:06.0: iommu_alloc failed, tbl c0000001fa67c520 vaddr c0000000b4b30000 npages 16
  <...>

Tested on next-20160609.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/iommu.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)