Patchwork Re[4]: [PATCH] katmai.dts: extend DMA ranges; add dma/sysace nodes

login
register
mail settings
Submitter Yuri Tikhonov
Date Nov. 27, 2008, 12:26 a.m.
Message ID <1351427808.20081127032633@emcraft.com>
Download mbox | patch
Permalink /patch/11075/
State Changes Requested
Delegated to: Benjamin Herrenschmidt
Headers show

Comments

Yuri Tikhonov - Nov. 27, 2008, 12:26 a.m.
Hello Ben,

On Friday, November 14, 2008 you wrote:

>>  My understanding was that the dma-ranges property is responsible for 
>> setting up the inbound ranges of RAM's physical addresses, where PCI 
>> could DMA to/from. As regarding the outbound property, this patch 
>> doesn't change this, and there we have the PCI space split (2 GB of 
>> memory, and 64K of I/O spaces mapped from the 64-bit physical 
>> addresses into 32-bit PCI address space). Am I missing something ?

> The PCI memory space doesn't differenciate inbound and outbound.

> For example, if you take the example of a PCI 2.0 or PCI-X setup (I'm
> leaving PCI-E alone in the discussion because it's a bit special in that
> area, but the problem is fundamentally the same), and you have on the
> PCI bus of that thing a device X configured to respond to MMIO at
> address, for example, 0x80000000.

> Now, what happens if another device, let's call it Y, tries to DMA to
> that same address ? Who will pick up the memory write at address
> 0x80000000 ? The host controller or device X ? 

> There is no concept of "inbound" here.. it's just a memory cycle to
> address A that gets decoded by whoever tries to decode it on that bus
> segment.

> Thus DMA ranges must -not- overlap areas of memory used for MMIO, it
> just won't work.

 Understood. Thanks a lot for the exhaustive comment.

 Speaking about inbound/outbound above I assumed the traffic direction 
through the PLB(Processor Local Bus)-PCI bridge controller of 
ppc440spe.

>>  With the default 2GB dma-ranges we just get the following on Katmai 
>> with 4GB of SDRAM installed:

> Because that is simply not supported at the moment unless we add what
> I've talked about earlier, ie basically, swiotlb support (which Becky is
> working on at the moment).

(1):
> You might be able to work around it somewhat if your PCI device supports
> 64-bit BARs in which case you can set your outbound regions above the
> 32-bit space. Note that I haven't verified whether the 32-bit linux
> kernel supports that tho. There used to be issues with >32-bit PCI BARs
> on 32-bit kernel, at least it wouldn't assign them but that may have
> been fixed.

(2):
> Another option you can try if you device only does 32-bit BARs but
> supports 64-bit DMA, is to move the DMA window away from the 32-bit MMIO
> space. Stick it at 1T or so in PCI space. You might need a little bit of
> kernel tweaking to make the dma-ops pick that up properly but that
> shouldnt be too hard.


 OK. Then I see the following major difference between the two 
approaches you suggest above:

(1) means that I should move MMIO ranges to the high PCI addresses 
(after 4GB), and thus we'll be able to extend dma-ranges up to 4GB. 
So, with work-around (1), any 32-bit PCI card will not work on my 
(Katmai) board.

(2) means that I should move the PCI address of the host RAM (DMA 
window) away from 32-bit PCI address space, and thus we'll be able to 
increase the window size up to necessary 4GB. So, with work-around 
(2), only that PCI cards will not work on Katmai, which do DMA but 
can't do the 64-bit DMA.

 Hence, the item (2) looks less harmful, and more preferable.

 I've implemented (2) (the code is below), and it works. But, 
admittedly, this (working) looks strange to me because of the 
following:
 To be able to use 64-bit PCI mapping on PPC32 I had to replace the
'unsigned long' type of pci_dram_offset with 'resource_size_t', which 
on ppc440spe is 'u64'. So, in dma_alloc_coherent() I put the 64-bit 
value into the 'dma_addr_t' handle. I use 2.6.27 kernel for testing, 
which has sizeof(dma_addr_t) == sizeof(u32). Thus, 
dma_alloc_coherent() cuts the upper 32 bits of PCI address, and returns 
only low 32-bit part of PCI address to its caller. And, regardless of 
this fact, the PCI device does operate somehow (this is the PCI-E LSI 
disk controller served by the drivers/message/fusion/mptbase.c + 
mptsas.c drivers).

 I've verified that ppc440spe PCI-E bridge's BARs (PECFGn_BAR0L,H) are 
configured with the new, 1TB, address value:


 Any ideas ?

> If you really need 32-bit DMA support, you'll have to wait for swiotlb
> from Becky or work with her in bringing it to powerpc so that we can do
> bounce buffering for those devices.

> Ben.


 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com
Benjamin Herrenschmidt - Nov. 27, 2008, 12:42 a.m.
>  I've implemented (2) (the code is below), and it works. But, 
> admittedly, this (working) looks strange to me because of the 
> following:
>  To be able to use 64-bit PCI mapping on PPC32 I had to replace the
> 'unsigned long' type of pci_dram_offset with 'resource_size_t', which 
> on ppc440spe is 'u64'. So, in dma_alloc_coherent() I put the 64-bit 
> value into the 'dma_addr_t' handle. I use 2.6.27 kernel for testing, 
> which has sizeof(dma_addr_t) == sizeof(u32). Thus, 
> dma_alloc_coherent() cuts the upper 32 bits of PCI address, and returns 
> only low 32-bit part of PCI address to its caller. And, regardless of 
> this fact, the PCI device does operate somehow (this is the PCI-E LSI 
> disk controller served by the drivers/message/fusion/mptbase.c + 
> mptsas.c drivers).
> 
>  I've verified that ppc440spe PCI-E bridge's BARs (PECFGn_BAR0L,H) are 
> configured with the new, 1TB, address value:

Strange... when I look at pci4xx_parse_dma_ranges() I see it
specifically avoiding PCI addresses above 4G ... That needs fixing.

To implement that trick you definitely need to make dma_addr_t 64 bits.

Cheers,
Ben.

Patch

--- a/arch/powerpc/sysdev/ppc4xx_pci.c
+++ b/arch/powerpc/sysdev/ppc4xx_pci.c
@@ -1439,6 +1446,8 @@  static void __init ppc4xx_configure_pciex_PIMs(struct ppc4xx_pciex_port *port,
 
                out_le32(mbase + PCI_BASE_ADDRESS_0, RES_TO_U32_LOW(res->start));
                out_le32(mbase + PCI_BASE_ADDRESS_1, RES_TO_U32_HIGH(res->start));
+               printk("<<<< base PCI_BASE_ADDRESS_0 0x%08x. PCI_BASE_ADDRESS_1 0x%08x\n",
+                      in_le32(mbase + PCI_BASE_ADDRESS_0), in_le32(mbase + PCI_BASE_ADDRESS_1));

 resulting to the following during boot:

<<<< base PCI_BASE_ADDRESS_0 0x00000008. PCI_BASE_ADDRESS_1 0x00000100


 Also I verified that the LSI device driver passes the DMA mapped 
addresses (which turn out to be cut because of types mismatching) to 
the PCI device indeed, and the latter manages to write there!:

--- a/arch/powerpc/lib/dma-noncoherent.c
+++ b/arch/powerpc/lib/dma-noncoherent.c
@@ -209,6 +209,8 @@  __dma_alloc_coherent(size_t size, dma_addr_t *handle, gfp_t gfp)
                 * Set the "dma handle"
                 */
                *handle = page_to_bus(page);
+               printk("\n### cut 0x%016llx to 0x%08x ###\n", page_to_bus(page), *handle);
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c

@@ -396,8 +397,8 @@  mpt_reply(MPT_ADAPTER *ioc, u32 pa)
        cb_idx = mr->u.frame.hwhdr.msgctxu.fld.cb_idx;
        mf = MPT_INDEX_2_MFPTR(ioc, req_idx);
 
-       dmfprintk(ioc, printk(MYIOC_s_DEBUG_FMT "Got non-TURBO reply=%p req_idx=
-                       ioc->name, mr, req_idx, cb_idx, mr->u.hdr.Function));
+       printk("Got non-TURBO reply=%p req_idx=%x cb_idx=%x Function=%x\n",
+               mr, req_idx, cb_idx, mr->u.hdr.Function);
        DBG_DUMP_REPLY_FRAME(ioc, (u32 *)mr);
 
         /*  Check/log IOC log info
@@ -4271,13 +4273,16 @@  PrimeIocFifos(MPT_ADAPTER *ioc)
        /* Post Reply frames to FIFO
         */
        alloc_dma = ioc->alloc_dma;
+       mem = ioc->alloc;
        dinitprintk(ioc, printk(MYIOC_s_DEBUG_FMT "ReplyBuffers @ %p[%p]\n",
                ioc->name, ioc->reply_frames, (void *)(ulong)alloc_dma));
 
        for (i = 0; i < ioc->reply_depth; i++) {
                /*  Write each address to the IOC!  */
                CHIPREG_WRITE32(&ioc->chip->ReplyFifo, alloc_dma);
+               printk("post 0x%08x(0x%p)\n", alloc_dma, mem);
                alloc_dma += ioc->reply_sz;
+               mem = (u8*)mem + ioc->reply_sz;
        }
@@ -466,6 +467,7 @@  mpt_interrupt(int irq, void *bus_id)
         *  Drain the reply FIFO!
         */
        do {
+               printk("%s: %08x\n", __FUNCTION__, le32_to_cpu(pa));
                if (pa & MPI_ADDRESS_REPLY_A_BIT)
                        mpt_reply(ioc, pa);
                else

 resulting to the following:

...
ioc0: LSISAS1068E B3: Capabilities={Initiator}
### cut 0x00000100288c0000 to 0x288c0000 ###
### cut 0x00000100288f0000 to 0x288f0000 ###
post 0x288c0000(0xffe10000)
post 0x288c0050(0xffe10050)
post 0x288c00a0(0xffe100a0)
...
mpt_interrupt: 00004694
Got non-TURBO reply=ffe10000 req_idx=0 cb_idx=f Function=7
mpt_interrupt: 28004694
Got non-TURBO reply=ffe10050 req_idx=1 cb_idx=f Function=4



 So, probably the following patch for work-around (2) is still
missing something, since cutting the high bits of 64-bit PCI address 
doesn't break anything:

diff --git a/arch/powerpc/boot/dts/katmai.dts b/arch/powerpc/boot/dts/katmai.dts
index 983272e..690ae4b 100644
--- a/arch/powerpc/boot/dts/katmai.dts
+++ b/arch/powerpc/boot/dts/katmai.dts
@@ -245,8 +245,8 @@ 
                        ranges = <0x02000000 0x00000000 0x80000000 0x0000000d 0x80000000 0x00000000 0x80000000
                                  0x01000000 0x00000000 0x00000000 0x0000000c 0x08000000 0x00000000 0x00010000>;
 
-                       /* Inbound 4GB range starting at 0 */
-                       dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x1 0x00000000>;
+                       /* Inbound 4GB range starting at 1T */
+                       dma-ranges = <0x42000000 0x100 0x0 0x0 0x0 0x1 0x00000000>;
 
                        /* This drives busses 0 to 0xf */
                        bus-range = <0x0 0xf>;
@@ -289,8 +289,8 @@ 
                        ranges = <0x02000000 0x00000000 0x80000000 0x0000000e 0x00000000 0x00000000 0x80000000
                                  0x01000000 0x00000000 0x00000000 0x0000000f 0x80000000 0x00000000 0x00010000>;
 
-                       /* Inbound 4GB range starting at 0 */
-                       dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x1 0x00000000>;
+                       /* Inbound 4GB range starting at 1T */
+                       dma-ranges = <0x42000000 0x100 0x0 0x0 0x0 0x1 0x00000000>;
 
                        /* This drives busses 10 to 0x1f */
                        bus-range = <0x10 0x1f>;
@@ -330,8 +330,8 @@ 
                        ranges = <0x02000000 0x00000000 0x80000000 0x0000000e 0x80000000 0x00000000 0x80000000
                                  0x01000000 0x00000000 0x00000000 0x0000000f 0x80010000 0x00000000 0x00010000>;
 
-                       /* Inbound 4GB range starting at 0 */
-                       dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x1 0x00000000>;
+                       /* Inbound 4GB range starting at 1T */
+                       dma-ranges = <0x42000000 0x100 0x0 0x0 0x0 0x1 0x00000000>;
 
                        /* This drives busses 10 to 0x1f */
                        bus-range = <0x20 0x2f>;
@@ -371,8 +371,8 @@ 
                        ranges = <0x02000000 0x00000000 0x80000000 0x0000000f 0x00000000 0x00000000 0x80000000
                                  0x01000000 0x00000000 0x00000000 0x0000000f 0x80020000 0x00000000 0x00010000>;
 
-                       /* Inbound 4GB range starting at 0 */
-                       dma-ranges = <0x42000000 0x0 0x0 0x0 0x0 0x1 0x00000000>;
+                       /* Inbound 4GB range starting at 1T */
+                       dma-ranges = <0x42000000 0x100 0x0 0x0 0x0 0x1 0x00000000>;
 
                        /* This drives busses 10 to 0x1f */
                        bus-range = <0x30 0x3f>;
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 77c7fa0..adbeb19 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -59,7 +59,7 @@  extern int check_legacy_ioport(unsigned long base_port);
 
 extern unsigned long isa_io_base;
 extern unsigned long pci_io_base;
-extern unsigned long pci_dram_offset;
+extern resource_size_t pci_dram_offset;
 
 extern resource_size_t isa_mem_base;
 
@@ -728,14 +728,14 @@  static inline void * phys_to_virt(unsigned long address)
  */
 #ifdef CONFIG_PPC32
 
-static inline unsigned long virt_to_bus(volatile void * address)
+static inline resource_size_t virt_to_bus(volatile void * address)
 {
         if (address == NULL)
                return 0;
         return __pa(address) + PCI_DRAM_OFFSET;
 }
 
-static inline void * bus_to_virt(unsigned long address)
+static inline void * bus_to_virt(resource_size_t address)
 {
         if (address == 0)
                return NULL;
diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 88db4ff..5855937 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -33,7 +33,7 @@ 
 #endif
 
 unsigned long isa_io_base     = 0;
-unsigned long pci_dram_offset = 0;
+resource_size_t pci_dram_offset = 0;
 int pcibios_assign_bus_offset = 1;
 
 void pcibios_make_OF_bus_map(void);
diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c b/arch/powerpc/sysdev/ppc4xx_pci.c
index afbdd48..f748c5b 100644
--- a/arch/powerpc/sysdev/ppc4xx_pci.c
+++ b/arch/powerpc/sysdev/ppc4xx_pci.c
@@ -126,10 +126,8 @@  static int __init ppc4xx_parse_dma_ranges(struct pci_controller *hose,
                if ((pci_space & 0x03000000) != 0x02000000)
                        continue;

-               /* We currently only support memory at 0, and pci_addr
-                * within 32 bits space
-                */
-               if (cpu_addr != 0 || pci_addr > 0xffffffff) {
+               /* We currently only support memory at 0 */
+               if (cpu_addr != 0) {
                        printk(KERN_WARNING "%s: Ignored unsupported dma range"
                               " 0x%016llx...0x%016llx -> 0x%016llx\n",
                               hose->dn->full_name,
@@ -179,18 +177,12 @@  static int __init ppc4xx_parse_dma_ranges(struct pci_controller *hose,
                return -ENXIO;
        }

-       /* Check that we are fully contained within 32 bits space */
-       if (res->end > 0xffffffff) {
-               printk(KERN_ERR "%s: dma-ranges outside of 32 bits space\n",
-                      hose->dn->full_name);
-               return -ENXIO;
-       }
  out:
        dma_offset_set = 1;
        pci_dram_offset = res->start;

-       printk(KERN_INFO "4xx PCI DMA offset set to 0x%08lx\n",
-              pci_dram_offset);
+       printk(KERN_INFO "4xx PCI DMA offset set to 0x%016llx\n",
+              (unsigned long long)pci_dram_offset);
        return 0;
 }