diff mbox

fsldma: use PCI Read Multiple command

Message ID 20090424183517.GB23140@ovro.caltech.edu (mailing list archive)
State Not Applicable, archived
Delegated to: Kumar Gala
Headers show

Commit Message

Ira Snyder April 24, 2009, 6:35 p.m. UTC
By default, the Freescale 83xx DMA controller uses the PCI Read Line
command when reading data over the PCI bus. Setting the controller to use
the PCI Read Multiple command instead allows the controller to read much
larger bursts of data, which provides a drastic speed increase.

The slowdown due to using PCI Read Line was only observed when a PCI-to-PCI
bridge was between the devices trying to communicate.

A simple test driver showed an increase from 4MB/sec to 116MB/sec when
performing DMA over the PCI bus. Using DMA to transfer between blocks of
local SDRAM showed no change in performance with this patch. The dmatest
driver was also used to verify the correctness of the transfers, and showed
no errors.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |    5 +++--
 drivers/dma/fsldma.h |    1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

Comments

Yang Li April 27, 2009, 7:48 a.m. UTC | #1
On Sat, Apr 25, 2009 at 2:35 AM, Ira Snyder <iws@ovro.caltech.edu> wrote:
> By default, the Freescale 83xx DMA controller uses the PCI Read Line
> command when reading data over the PCI bus. Setting the controller to use
> the PCI Read Multiple command instead allows the controller to read much
> larger bursts of data, which provides a drastic speed increase.
>
> The slowdown due to using PCI Read Line was only observed when a PCI-to-PCI
> bridge was between the devices trying to communicate.
>
> A simple test driver showed an increase from 4MB/sec to 116MB/sec when
> performing DMA over the PCI bus. Using DMA to transfer between blocks of
> local SDRAM showed no change in performance with this patch. The dmatest
> driver was also used to verify the correctness of the transfers, and showed
> no errors.
>
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>

Acked-by: Li Yang <leoli@freescale.com>

Thanks,
Leo
Dave Liu April 27, 2009, 9:09 a.m. UTC | #2
> By default, the Freescale 83xx DMA controller uses the PCI Read Line
> command when reading data over the PCI bus. Setting the 
> controller to use the PCI Read Multiple command instead allows the
> controller to read much larger bursts of data, which provides a
drastic
> speed increase.

IIRC, the default for 83xx DMA controller uses the PCI mem read
command, not mem read line.

You are assuming the PCI memory space is prefetchable( no side effect)
for DMA.
Is it possible that DMA is from non-prefetchable memory space?

> The slowdown due to using PCI Read Line was only observed 
> when a PCI-to-PCI bridge was between the devices trying to
communicate.
> 
> A simple test driver showed an increase from 4MB/sec to 116MB/sec when
> performing DMA over the PCI bus. Using DMA to transfer between blocks
> of local SDRAM showed no change in performance with this patch. 
> The dmatest driver was also used to verify the correctness of the
transfers,
> and showed no errors.
Yang Li April 27, 2009, 10:16 a.m. UTC | #3
On Mon, Apr 27, 2009 at 5:09 PM, Liu Dave-R63238 <DaveLiu@freescale.com> wrote:
>> By default, the Freescale 83xx DMA controller uses the PCI Read Line
>> command when reading data over the PCI bus. Setting the
>> controller to use the PCI Read Multiple command instead allows the
>> controller to read much larger bursts of data, which provides a
> drastic
>> speed increase.
>
> IIRC, the default for 83xx DMA controller uses the PCI mem read
> command, not mem read line.
>
> You are assuming the PCI memory space is prefetchable( no side effect)
> for DMA.
> Is it possible that DMA is from non-prefetchable memory space?

I guess it's not reasonable to use DMA from non-prefetchable memory.
So it's up to the driver which uses the DMA engine to prevent from
using DMA API on non-prefetchable memory.

- Leo
Timur Tabi April 27, 2009, 2:31 p.m. UTC | #4
On Mon, Apr 27, 2009 at 4:09 AM, Liu Dave-R63238 <DaveLiu@freescale.com> wrote:

> You are assuming the PCI memory space is prefetchable( no side effect)
> for DMA.
> Is it possible that DMA is from non-prefetchable memory space?

This should be a safe assumption for this driver.  Remember, this
driver just does offload memcpy, from one region to another.  So the
PCI memory that you are reading from should be just a buffer of data,
and there should be side-effect of reading it.

However, I would like to see a comment at the top of the file warning
people that copying from PCI memory will result in prefetched reads.
David Hawkins April 27, 2009, 7:34 p.m. UTC | #5
Hi all,

>> You are assuming the PCI memory space is prefetchable( no side effect)
>> for DMA. Is it possible that DMA is from non-prefetchable memory space?
> 
> This should be a safe assumption for this driver.  Remember, this
> driver just does offload memcpy, from one region to another.  So the
> PCI memory that you are reading from should be just a buffer of data,
> and there should be side-effect of reading it.
> 
> However, I would like to see a comment at the top of the file warning
> people that copying from PCI memory will result in prefetched reads.
> 

Here's a few results from DMA tests between two
MPC8349EA boards housed in a CPCI chassis.

1. DMA mode register (DMAMRn)
    PCI read command (PRC, bits 11:10)

    a) DMAMRn[PRC] = 00 = PCI Read

       The PCI read command is 6h on the PCI bus.
       For DMA lengths of less than 1 cache line (32-bytes)
       the DMA controller will generate a PCI 6h command.
       However, for lengths of 32-bytes and higher, the
       DMA controller actually generates a PCI Read Line (Eh)
       command.

       Freescale indicated that this 'change of PCI command code'
       functionality is an undocumented 'feature', there to
       improve performance for longer read lengths.

    b) DMAMRn[PRC] = 01 = PCI Read Line

       Generated the PCI command code for PCI read line (Eh),
       regardless of DMA length.

    c) DMAMRn[PRC] = 10 = PCI Read Multiple

       Generated the PCI command code for PCI Read Multiple (Ch),
       regardless of DMA length.

2. DMA from areas marked as non-prefetchable.

    We setup two test cases:

    a) Two boards in the same PCI segment with no intervening
       bridges.

    b) Two boards in separate PCI segments with intervening
       bridges (in this case, there was two bridges on SBS
       CPCI-to-CPI backplane bridging cards).

    The PCI window to the IMMR registers on the boards is marked
    as non-prefetchable. DMA from that area (reads) appeared on the
    PCI bus as single 32-bit read transactions, i.e., the target
    device effectively ignores the PCI read command, and the target
    returns read data as single reads, i.e., the target acts as
    as non-prefetchable memory.

    We tested with the PCI Read Multiple command, and no data
    was ever prefetched from the target. The bridges did not
    prefetch and discard data, eg. we tried 36-bytes, and there
    were 9 separate 4-byte transactions (we were using a 32-bit
    PCI host for this test). Hence, at least for this example
    target device, there are no side effects to using PCI
    read multiple on the bus master, to a target PCI memory
    region marked as non-prefetchable.

    So even though the bridges were seeing transactions that
    indicated read multiple, only single read cycles were
    seen to be used.

Would you like some sort of summary of this info for a commit
message?

Would you like us to check any other transaction/register combos?

Cheers,
Dave
Timur Tabi April 27, 2009, 7:40 p.m. UTC | #6
David Hawkins wrote:

> Would you like some sort of summary of this info for a commit
> message?

That's probably overkill.  I just want a sentence or two that tells
someone looking at the code casually that the behavior of reading PCI
memory might be different than what they expect.

> Would you like us to check any other transaction/register combos?

Yes, could you try this on non-PCI memory?
David Hawkins April 27, 2009, 7:48 p.m. UTC | #7
>> Would you like some sort of summary of this info for a commit
>> message?
> 
> That's probably overkill.  I just want a sentence or two that tells
> someone looking at the code casually that the behavior of reading PCI
> memory might be different than what they expect.

Ok, will-do.

>> Would you like us to check any other transaction/register combos?
> 
> Yes, could you try this on non-PCI memory?

We've been using it to DMA between the x86 host main memory and
the MPC8349EA boards (PCI targets). The reason we changed to
Read Multiple was that it had a dramatic improvement in
efficiency through bridges. However, the x86 host memory
is prefetchable, so is consistent with the use of Read Multiple.

Can you give me an example of non-PCI memory that would be
non-prefetchable that you'd like us to try? We can see if our
host CPUs have an area like that ... we just need to know
what device to look for first :)

Cheers,
Dave
Kumar Gala April 27, 2009, 7:54 p.m. UTC | #8
On Apr 27, 2009, at 2:48 PM, David Hawkins wrote:

> Can you give me an example of non-PCI memory that would be
> non-prefetchable that you'd like us to try? We can see if our
> host CPUs have an area like that ... we just need to know
> what device to look for first :)

You can mark the pci inbound window on the 83xx as non-prefetchable  
(assuming 83xx is host). On a x86 host I doubt there is any easy way  
to get non-prefetchable memory.

- k
David Hawkins April 27, 2009, 8 p.m. UTC | #9
>> Can you give me an example of non-PCI memory that would be
>> non-prefetchable that you'd like us to try? We can see if our
>> host CPUs have an area like that ... we just need to know
>> what device to look for first :)
> 
> You can mark the pci inbound window on the 83xx as non-prefetchable 
> (assuming 83xx is host). On a x86 host I doubt there is any easy way to 
> get non-prefetchable memory.

Yep, we were going to do that, but chose to use the
1MB region already setup for the IMMRs since its already
marked as non-prefetchable. We were only doing reads, so
it wasn't going to hurt anything.

I doubt that marking one of the other BAR regions
as non-prefetchable will give a different result.
However, we're more than happy to double-check if
you'd like.

Cheers,
Dave
David Hawkins April 27, 2009, 8:01 p.m. UTC | #10
>> You can mark the pci inbound window on the 83xx as non-prefetchable 
>> (assuming 83xx is host). On a x86 host I doubt there is any easy way 
>> to get non-prefetchable memory.

One more note; we don't have access to a host-mode MPC8349EA,
our boards are all targets.

Cheers,
Dave
Kumar Gala April 27, 2009, 8:02 p.m. UTC | #11
On Apr 27, 2009, at 3:00 PM, David Hawkins wrote:

>
>>> Can you give me an example of non-PCI memory that would be
>>> non-prefetchable that you'd like us to try? We can see if our
>>> host CPUs have an area like that ... we just need to know
>>> what device to look for first :)
>> You can mark the pci inbound window on the 83xx as non-prefetchable  
>> (assuming 83xx is host). On a x86 host I doubt there is any easy  
>> way to get non-prefetchable memory.
>
> Yep, we were going to do that, but chose to use the
> 1MB region already setup for the IMMRs since its already
> marked as non-prefetchable. We were only doing reads, so
> it wasn't going to hurt anything.
>
> I doubt that marking one of the other BAR regions
> as non-prefetchable will give a different result.
> However, we're more than happy to double-check if
> you'd like.

Its possible you'll get a different result since IMMR is a register  
space internal and thats normally a completely different bus than  
memory would be (internal to the 83xx).  I'd suggest double-checking w/ 
a BAR marked non-prefetch pointing to real memory.

- k
Timur Tabi April 27, 2009, 8:04 p.m. UTC | #12
David Hawkins wrote:

> Can you give me an example of non-PCI memory that would be
> non-prefetchable that you'd like us to try? We can see if our
> host CPUs have an area like that ... we just need to know
> what device to look for first :)

Hmmmm.... I was going to say any SOC device in the IMMR, but I don't see
anything there that would constitute a memory buffer.

I test this change on an 8610 and DMA to a register I/O, where this bit
isn't even defined, and it made no difference.  So I guess this change
is okay.

Acked-by: Timur Tabi <timur@freescale.com>
David Hawkins April 27, 2009, 8:12 p.m. UTC | #13
>>>> Can you give me an example of non-PCI memory that would be
>>>> non-prefetchable that you'd like us to try? We can see if our
>>>> host CPUs have an area like that ... we just need to know
>>>> what device to look for first :)
>>> You can mark the pci inbound window on the 83xx as non-prefetchable 
>>> (assuming 83xx is host). On a x86 host I doubt there is any easy way 
>>> to get non-prefetchable memory.
>>
>> Yep, we were going to do that, but chose to use the
>> 1MB region already setup for the IMMRs since its already
>> marked as non-prefetchable. We were only doing reads, so
>> it wasn't going to hurt anything.
>>
>> I doubt that marking one of the other BAR regions
>> as non-prefetchable will give a different result.
>> However, we're more than happy to double-check if
>> you'd like.
> 
> Its possible you'll get a different result since IMMR is a register 
> space internal and thats normally a completely different bus than memory 
> would be (internal to the 83xx).  I'd suggest double-checking w/a BAR 
> marked non-prefetch pointing to real memory.

We had a 4k BAR1 setup to point to DDR memory.

With prefetchable set, a 36-byte transfer generated a
burst-of-8 32-bit words followed by a single transaction.

With non-prefetchable set, the transfers were all single.

So it works like we'd expect.

Cheers,
Dave
David Hawkins April 27, 2009, 8:22 p.m. UTC | #14
Hi Timur,

>> Would you like some sort of summary of this info for a commit
>> message?
> 
> That's probably overkill.  I just want a sentence or two that tells
> someone looking at the code casually that the behavior of reading PCI
> memory might be different than what they expect.

Could you help us with the wording you'd like to see in the code.
Did you want to see something in the header comments, or something
near the register settings?

How about something like this in place of the existing PCI_RM
comment:

PRC_RM - PCI read multiple
   The default PCI read command used by the DMA controller is
   PCI Read (PCI command 6h). When the burst length is 32-bytes
   or longer, PCI Read Line (PCI command Eh) is used (undocumented
   feature of the controller). Using PCI read multiple
   (PCI command Ch) results in high-performance across PCI
   bridges. DMA transfers to non-prefetchable PCI registers
   should not result in prefetched reads, even when using
   the PCI read multiple command.


Cheers,
Dave
Timur Tabi April 27, 2009, 8:26 p.m. UTC | #15
David Hawkins wrote:

> PRC_RM - PCI read multiple
>    The default PCI read command used by the DMA controller is
>    PCI Read (PCI command 6h). When the burst length is 32-bytes
>    or longer, PCI Read Line (PCI command Eh) is used (undocumented
>    feature of the controller). Using PCI read multiple
>    (PCI command Ch) results in high-performance across PCI
>    bridges. DMA transfers to non-prefetchable PCI registers
>    should not result in prefetched reads, even when using
>    the PCI read multiple command.

I was thinking more along the lines of:

"This driver tells the DMA controller to use the PCI Read Multiple
command, instead of the PCI Read Line command, for PCI read operations.
 Please be aware that this setting may result in read pre-fetching on
some platforms."
David Hawkins April 27, 2009, 8:41 p.m. UTC | #16
Hi Timur,

>> PRC_RM - PCI read multiple
>>    The default PCI read command used by the DMA controller is
>>    PCI Read (PCI command 6h). When the burst length is 32-bytes
>>    or longer, PCI Read Line (PCI command Eh) is used (undocumented
>>    feature of the controller). Using PCI read multiple
>>    (PCI command Ch) results in high-performance across PCI
>>    bridges. DMA transfers to non-prefetchable PCI registers
>>    should not result in prefetched reads, even when using
>>    the PCI read multiple command.
> 
> I was thinking more along the lines of:
> 
> "This driver tells the DMA controller to use the PCI Read Multiple
> command, instead of the PCI Read Line command, for PCI read operations.
>  Please be aware that this setting may result in read pre-fetching on
> some platforms."

Ok, thanks.

Ira will add your comment to the body of the code near
the PRC_RM command and submit a new patch.

Cheers,
Dave
Timur Tabi April 27, 2009, 8:42 p.m. UTC | #17
David Hawkins wrote:

> Ira will add your comment to the body of the code near
> the PRC_RM command and submit a new patch.

I'd rather have it near the top where people can see it.
David Hawkins April 27, 2009, 8:44 p.m. UTC | #18
Timur Tabi wrote:
> David Hawkins wrote:
> 
>> Ira will add your comment to the body of the code near
>> the PRC_RM command and submit a new patch.
> 
> I'd rather have it near the top where people can see it.

Looks like Ira had the same thought :)

Dave
Scott Wood April 27, 2009, 8:47 p.m. UTC | #19
On Mon, Apr 27, 2009 at 03:04:49PM -0500, Timur Tabi wrote:
> David Hawkins wrote:
> 
> > Can you give me an example of non-PCI memory that would be
> > non-prefetchable that you'd like us to try? We can see if our
> > host CPUs have an area like that ... we just need to know
> > what device to look for first :)
> 
> Hmmmm.... I was going to say any SOC device in the IMMR, but I don't see
> anything there that would constitute a memory buffer.
> 
> I test this change on an 8610 and DMA to a register I/O, where this bit
> isn't even defined, and it made no difference.  So I guess this change
> is okay.

I thought the driver only used the bit if the device tree indicated it
was an 83xx-era DMA controller?

That said, the bits are documented as specifically for PCI, so it would
be surprising if it had any effect elsewhere.

-Scott
Timur Tabi April 27, 2009, 8:49 p.m. UTC | #20
Scott Wood wrote:

> I thought the driver only used the bit if the device tree indicated it
> was an 83xx-era DMA controller?

I just wanted to make sure it didn't do anything weird.  It was the only
test I could think of that didn't involve PCI.

> That said, the bits are documented as specifically for PCI, so it would
> be surprising if it had any effect elsewhere.

Surely you wouldn't really be surprised by incorrect documentation of
our parts. :-)
Dave Liu April 27, 2009, 9:59 p.m. UTC | #21
>> You are assuming the PCI memory space is prefetchable( no side effect)
>> for DMA.
>> Is it possible that DMA is from non-prefetchable memory space?

> This should be a safe assumption for this driver.  Remember, this
> driver just does offload memcpy, from one region to another.  So the
> PCI memory that you are reading from should be just a buffer of data,
> and there should be side-effect of reading it.

How about one FIFO buffer?
Dave Liu April 28, 2009, 1:31 a.m. UTC | #22
> > You are assuming the PCI memory space is prefetchable( no 
> > side effect) for DMA.
> > Is it possible that DMA is from non-prefetchable memory space?
> 
> This should be a safe assumption for this driver.  Remember, this
> driver just does offload memcpy, from one region to another.  So the
> PCI memory that you are reading from should be just a buffer of data,
> and there should be side-effect of reading it.
> 
> However, I would like to see a comment at the top of the file warning
> people that copying from PCI memory will result in prefetched reads.

How about FIFO RAM case?
David Hawkins April 28, 2009, 1:36 a.m. UTC | #23
> How about FIFO RAM case?

If the FIFO has a fixed address, then according to
the user guide, the DMA controller won't generate
a burst transaction to it.

We can try confirming this if you'd like.

Cheers,
Dave
Dave Liu April 28, 2009, 1:48 a.m. UTC | #24
> Here's a few results from DMA tests between two
> MPC8349EA boards housed in a CPCI chassis.
> 
> 1. DMA mode register (DMAMRn)
>     PCI read command (PRC, bits 11:10)
> 
>     a) DMAMRn[PRC] = 00 = PCI Read
> 
>        The PCI read command is 6h on the PCI bus.
>        For DMA lengths of less than 1 cache line (32-bytes)
>        the DMA controller will generate a PCI 6h command.
>        However, for lengths of 32-bytes and higher, the
>        DMA controller actually generates a PCI Read Line (Eh)
>        command.
> 
>        Freescale indicated that this 'change of PCI command code'
>        functionality is an undocumented 'feature', there to
>        improve performance for longer read lengths.
> 
>     b) DMAMRn[PRC] = 01 = PCI Read Line
> 
>        Generated the PCI command code for PCI read line (Eh),
>        regardless of DMA length.
> 
>     c) DMAMRn[PRC] = 10 = PCI Read Multiple
> 
>        Generated the PCI command code for PCI Read Multiple (Ch),
>        regardless of DMA length.

Good summary!

For the DMA PCI read/line/multi-line is outbound transaction.
So according to your experiment, the 8349 PCI controller(as master)
attemp to streaming/combining the outbound transaction(treated as
prefetchable
space). 
IIRC, the early 8349EUM has the bit[2]-SE in the POCMRn
register, and is removed now. Not sure if it does function.
Dave Liu April 28, 2009, 2:06 a.m. UTC | #25
> >> You can mark the pci inbound window on the 83xx as 
> >> non-prefetchable(assuming 83xx is host). On a x86 host
> >> I doubt there is any easy way to get non-prefetchable memory.

> One more note; we don't have access to a host-mode MPC8349EA,
> our boards are all targets.

Actually we also can use the inbound window(w/wo prefetchable)
in the agent mode. It doesn't care if it is host or agent.
David Hawkins April 28, 2009, 2:08 a.m. UTC | #26
Hi Dave,

> For the DMA PCI read/line/multi-line is outbound transaction.
> So according to your experiment, the 8349 PCI controller(as master)
> attemp to streaming/combining the outbound transaction(treated as
> prefetchable space).

Yep, with the MPC8349EA configured as a PCI Target,
and operating as a Bus Master, DMA transfers between
two MPC8349EA targets to prefetchable memory on the
slave will burst at pretty much full-speed over the
PCI bus. The same goes for DMA to the host memory.
However, reading from the host is slower, as the bursts
get chopped up more than they do between two target
boards. At some point I'll get a bunch of screen
captures and put them in a document.

> IIRC, the early 8349EUM has the bit[2]-SE in the POCMRn
> register, and is removed now. Not sure if it does function.

Hey yeah, I looked in the 8349E manual, and it is defined.
I'm not sure why it would be defined there though. I can't
think of why the master would want to disable streaming
based on a bit setting; it should burst until the IOS
says its full. Anyway, the bit is gone now, so we'll
just ignore the fact it existed :)

Cheers,
Dave
Timur Tabi April 28, 2009, 1:43 p.m. UTC | #27
David Hawkins wrote:
>> How about FIFO RAM case?
> 
> If the FIFO has a fixed address, then according to
> the user guide, the DMA controller won't generate
> a burst transaction to it.
> 
> We can try confirming this if you'd like.

Like I said earlier, this driver does not support copying data to/from a
single register address.  It's just a off-loaded memcpy.  So we don't
need to worry about what happens to a fixed address.
diff mbox

Patch

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index da8a8ed..638d00e 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -49,9 +49,10 @@  static void dma_init(struct fsl_dma_chan *fsl_chan)
 	case FSL_DMA_IP_83XX:
 		/* Set the channel to below modes:
 		 * EOTIE - End-of-transfer interrupt enable
+		 * PRC_RM - PCI read multiple
 		 */
-		DMA_OUT(fsl_chan, &fsl_chan->reg_base->mr, FSL_DMA_MR_EOTIE,
-				32);
+		DMA_OUT(fsl_chan, &fsl_chan->reg_base->mr, FSL_DMA_MR_EOTIE
+				| FSL_DMA_MR_PRC_RM, 32);
 		break;
 	}
 
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 4f21a51..dc7f268 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -38,6 +38,7 @@ 
 
 /* Special MR definition for MPC8349 */
 #define FSL_DMA_MR_EOTIE	0x00000080
+#define FSL_DMA_MR_PRC_RM	0x00000800
 
 #define FSL_DMA_SR_CH		0x00000020
 #define FSL_DMA_SR_PE		0x00000010