diff mbox

[2/2] drivers/nvme: default to the IOMMU page size on Power

Message ID 20151002172313.GC41011@linux.vnet.ibm.com (mailing list archive)
State Superseded
Headers show

Commit Message

Nishanth Aravamudan Oct. 2, 2015, 5:23 p.m. UTC
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).

The NVMe device driver, though, assumes that the DMA alignment for the
PRP entries will match the device's page size, and that the DMA aligment
matches the kernel's page aligment. On Power, the the IOMMU page size,
as mentioned above, can be 4K, while the device can have a page size of
8K, while the kernel has a page size of 64K. This eventually trips the
BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple
of 4K but not 8K (e.g., 0xF000).

In this particular case, and generally, we want to use the IOMMU's page
size for the default device page size, rather than the kernel's page
size.

With this patch, a NVMe device survives our internal hardware
exerciser; the kernel BUGs within a few seconds without the patch.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>

Comments

Christoph Hellwig Oct. 2, 2015, 5:25 p.m. UTC | #1
Hi Nishanth,

please expose this value through the generic DMA API instead of adding
architecture specific hacks to drivers.
Nishanth Aravamudan Oct. 2, 2015, 5:39 p.m. UTC | #2
On 02.10.2015 [10:25:44 -0700], Christoph Hellwig wrote:
> Hi Nishanth,
> 
> please expose this value through the generic DMA API instead of adding
> architecture specific hacks to drivers.

Ok, I'm happy to do that instead -- what I struggled with is that I
don't have enough knowledge of the various architectures to provide the
right default implementation. It should be sufficient for the default to
return PAGE_SHIFT, and on Power just override that to return the IOMMU
table's page size? Since the only user will be the NVMe driver
currently, that should be fine?

Sorry for the less-than-ideal patch!

-Nish
Christoph Hellwig Oct. 2, 2015, 5:41 p.m. UTC | #3
On Fri, Oct 02, 2015 at 10:39:47AM -0700, Nishanth Aravamudan wrote:
> Ok, I'm happy to do that instead -- what I struggled with is that I
> don't have enough knowledge of the various architectures to provide the
> right default implementation. It should be sufficient for the default to
> return PAGE_SHIFT, and on Power just override that to return the IOMMU
> table's page size? Since the only user will be the NVMe driver
> currently, that should be fine?

I think that's fine.

> Sorry for the less-than-ideal patch!

Np, it's a reasonable first attempt that we just need to refine.
kernel test robot Oct. 2, 2015, 6:57 p.m. UTC | #4
Hi Nishanth,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

config: arm64-allmodconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All error/warnings (new ones prefixed by >>):

>> drivers/block/nvme-core.c:45:23: fatal error: asm/iommu.h: No such file or directory
    #include <asm/iommu.h>
                          ^
   compilation terminated.

vim +45 drivers/block/nvme-core.c

    39	#include <linux/sched.h>
    40	#include <linux/slab.h>
    41	#include <linux/t10-pi.h>
    42	#include <linux/types.h>
    43	#include <scsi/sg.h>
    44	#include <asm-generic/io-64-nonatomic-lo-hi.h>
  > 45	#include <asm/iommu.h>
    46	
    47	#define NVME_MINORS		(1U << MINORBITS)
    48	#define NVME_Q_DEPTH		1024

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
diff mbox

Patch

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 7920c27..969a95e 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -42,6 +42,7 @@ 
 #include <linux/types.h>
 #include <scsi/sg.h>
 #include <asm-generic/io-64-nonatomic-lo-hi.h>
+#include <asm/iommu.h>
 
 #define NVME_MINORS		(1U << MINORBITS)
 #define NVME_Q_DEPTH		1024
@@ -1680,6 +1681,11 @@  static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	unsigned page_shift = PAGE_SHIFT;
 	unsigned dev_page_min = NVME_CAP_MPSMIN(cap) + 12;
 	unsigned dev_page_max = NVME_CAP_MPSMAX(cap) + 12;
+#ifdef CONFIG_PPC64
+	struct iommu_table *tbl = get_iommu_table_base(dev->dev);
+	if (tbl)
+		page_shift = IOMMU_PAGE_SHIFT(tbl);
+#endif
 
 	if (page_shift < dev_page_min) {
 		dev_err(dev->dev,