Message ID | 20090424183517.GB23140@ovro.caltech.edu (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | Kumar Gala |
Headers | show |
On Sat, Apr 25, 2009 at 2:35 AM, Ira Snyder <iws@ovro.caltech.edu> wrote: > By default, the Freescale 83xx DMA controller uses the PCI Read Line > command when reading data over the PCI bus. Setting the controller to use > the PCI Read Multiple command instead allows the controller to read much > larger bursts of data, which provides a drastic speed increase. > > The slowdown due to using PCI Read Line was only observed when a PCI-to-PCI > bridge was between the devices trying to communicate. > > A simple test driver showed an increase from 4MB/sec to 116MB/sec when > performing DMA over the PCI bus. Using DMA to transfer between blocks of > local SDRAM showed no change in performance with this patch. The dmatest > driver was also used to verify the correctness of the transfers, and showed > no errors. > > Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Acked-by: Li Yang <leoli@freescale.com> Thanks, Leo
> By default, the Freescale 83xx DMA controller uses the PCI Read Line > command when reading data over the PCI bus. Setting the > controller to use the PCI Read Multiple command instead allows the > controller to read much larger bursts of data, which provides a drastic > speed increase. IIRC, the default for 83xx DMA controller uses the PCI mem read command, not mem read line. You are assuming the PCI memory space is prefetchable( no side effect) for DMA. Is it possible that DMA is from non-prefetchable memory space? > The slowdown due to using PCI Read Line was only observed > when a PCI-to-PCI bridge was between the devices trying to communicate. > > A simple test driver showed an increase from 4MB/sec to 116MB/sec when > performing DMA over the PCI bus. Using DMA to transfer between blocks > of local SDRAM showed no change in performance with this patch. > The dmatest driver was also used to verify the correctness of the transfers, > and showed no errors.
On Mon, Apr 27, 2009 at 5:09 PM, Liu Dave-R63238 <DaveLiu@freescale.com> wrote: >> By default, the Freescale 83xx DMA controller uses the PCI Read Line >> command when reading data over the PCI bus. Setting the >> controller to use the PCI Read Multiple command instead allows the >> controller to read much larger bursts of data, which provides a > drastic >> speed increase. > > IIRC, the default for 83xx DMA controller uses the PCI mem read > command, not mem read line. > > You are assuming the PCI memory space is prefetchable( no side effect) > for DMA. > Is it possible that DMA is from non-prefetchable memory space? I guess it's not reasonable to use DMA from non-prefetchable memory. So it's up to the driver which uses the DMA engine to prevent from using DMA API on non-prefetchable memory. - Leo
On Mon, Apr 27, 2009 at 4:09 AM, Liu Dave-R63238 <DaveLiu@freescale.com> wrote: > You are assuming the PCI memory space is prefetchable( no side effect) > for DMA. > Is it possible that DMA is from non-prefetchable memory space? This should be a safe assumption for this driver. Remember, this driver just does offload memcpy, from one region to another. So the PCI memory that you are reading from should be just a buffer of data, and there should be side-effect of reading it. However, I would like to see a comment at the top of the file warning people that copying from PCI memory will result in prefetched reads.
Hi all, >> You are assuming the PCI memory space is prefetchable( no side effect) >> for DMA. Is it possible that DMA is from non-prefetchable memory space? > > This should be a safe assumption for this driver. Remember, this > driver just does offload memcpy, from one region to another. So the > PCI memory that you are reading from should be just a buffer of data, > and there should be side-effect of reading it. > > However, I would like to see a comment at the top of the file warning > people that copying from PCI memory will result in prefetched reads. > Here's a few results from DMA tests between two MPC8349EA boards housed in a CPCI chassis. 1. DMA mode register (DMAMRn) PCI read command (PRC, bits 11:10) a) DMAMRn[PRC] = 00 = PCI Read The PCI read command is 6h on the PCI bus. For DMA lengths of less than 1 cache line (32-bytes) the DMA controller will generate a PCI 6h command. However, for lengths of 32-bytes and higher, the DMA controller actually generates a PCI Read Line (Eh) command. Freescale indicated that this 'change of PCI command code' functionality is an undocumented 'feature', there to improve performance for longer read lengths. b) DMAMRn[PRC] = 01 = PCI Read Line Generated the PCI command code for PCI read line (Eh), regardless of DMA length. c) DMAMRn[PRC] = 10 = PCI Read Multiple Generated the PCI command code for PCI Read Multiple (Ch), regardless of DMA length. 2. DMA from areas marked as non-prefetchable. We setup two test cases: a) Two boards in the same PCI segment with no intervening bridges. b) Two boards in separate PCI segments with intervening bridges (in this case, there was two bridges on SBS CPCI-to-CPI backplane bridging cards). The PCI window to the IMMR registers on the boards is marked as non-prefetchable. DMA from that area (reads) appeared on the PCI bus as single 32-bit read transactions, i.e., the target device effectively ignores the PCI read command, and the target returns read data as single reads, i.e., the target acts as as non-prefetchable memory. We tested with the PCI Read Multiple command, and no data was ever prefetched from the target. The bridges did not prefetch and discard data, eg. we tried 36-bytes, and there were 9 separate 4-byte transactions (we were using a 32-bit PCI host for this test). Hence, at least for this example target device, there are no side effects to using PCI read multiple on the bus master, to a target PCI memory region marked as non-prefetchable. So even though the bridges were seeing transactions that indicated read multiple, only single read cycles were seen to be used. Would you like some sort of summary of this info for a commit message? Would you like us to check any other transaction/register combos? Cheers, Dave
David Hawkins wrote: > Would you like some sort of summary of this info for a commit > message? That's probably overkill. I just want a sentence or two that tells someone looking at the code casually that the behavior of reading PCI memory might be different than what they expect. > Would you like us to check any other transaction/register combos? Yes, could you try this on non-PCI memory?
>> Would you like some sort of summary of this info for a commit >> message? > > That's probably overkill. I just want a sentence or two that tells > someone looking at the code casually that the behavior of reading PCI > memory might be different than what they expect. Ok, will-do. >> Would you like us to check any other transaction/register combos? > > Yes, could you try this on non-PCI memory? We've been using it to DMA between the x86 host main memory and the MPC8349EA boards (PCI targets). The reason we changed to Read Multiple was that it had a dramatic improvement in efficiency through bridges. However, the x86 host memory is prefetchable, so is consistent with the use of Read Multiple. Can you give me an example of non-PCI memory that would be non-prefetchable that you'd like us to try? We can see if our host CPUs have an area like that ... we just need to know what device to look for first :) Cheers, Dave
On Apr 27, 2009, at 2:48 PM, David Hawkins wrote: > Can you give me an example of non-PCI memory that would be > non-prefetchable that you'd like us to try? We can see if our > host CPUs have an area like that ... we just need to know > what device to look for first :) You can mark the pci inbound window on the 83xx as non-prefetchable (assuming 83xx is host). On a x86 host I doubt there is any easy way to get non-prefetchable memory. - k
>> Can you give me an example of non-PCI memory that would be >> non-prefetchable that you'd like us to try? We can see if our >> host CPUs have an area like that ... we just need to know >> what device to look for first :) > > You can mark the pci inbound window on the 83xx as non-prefetchable > (assuming 83xx is host). On a x86 host I doubt there is any easy way to > get non-prefetchable memory. Yep, we were going to do that, but chose to use the 1MB region already setup for the IMMRs since its already marked as non-prefetchable. We were only doing reads, so it wasn't going to hurt anything. I doubt that marking one of the other BAR regions as non-prefetchable will give a different result. However, we're more than happy to double-check if you'd like. Cheers, Dave
>> You can mark the pci inbound window on the 83xx as non-prefetchable >> (assuming 83xx is host). On a x86 host I doubt there is any easy way >> to get non-prefetchable memory. One more note; we don't have access to a host-mode MPC8349EA, our boards are all targets. Cheers, Dave
On Apr 27, 2009, at 3:00 PM, David Hawkins wrote: > >>> Can you give me an example of non-PCI memory that would be >>> non-prefetchable that you'd like us to try? We can see if our >>> host CPUs have an area like that ... we just need to know >>> what device to look for first :) >> You can mark the pci inbound window on the 83xx as non-prefetchable >> (assuming 83xx is host). On a x86 host I doubt there is any easy >> way to get non-prefetchable memory. > > Yep, we were going to do that, but chose to use the > 1MB region already setup for the IMMRs since its already > marked as non-prefetchable. We were only doing reads, so > it wasn't going to hurt anything. > > I doubt that marking one of the other BAR regions > as non-prefetchable will give a different result. > However, we're more than happy to double-check if > you'd like. Its possible you'll get a different result since IMMR is a register space internal and thats normally a completely different bus than memory would be (internal to the 83xx). I'd suggest double-checking w/ a BAR marked non-prefetch pointing to real memory. - k
David Hawkins wrote: > Can you give me an example of non-PCI memory that would be > non-prefetchable that you'd like us to try? We can see if our > host CPUs have an area like that ... we just need to know > what device to look for first :) Hmmmm.... I was going to say any SOC device in the IMMR, but I don't see anything there that would constitute a memory buffer. I test this change on an 8610 and DMA to a register I/O, where this bit isn't even defined, and it made no difference. So I guess this change is okay. Acked-by: Timur Tabi <timur@freescale.com>
>>>> Can you give me an example of non-PCI memory that would be >>>> non-prefetchable that you'd like us to try? We can see if our >>>> host CPUs have an area like that ... we just need to know >>>> what device to look for first :) >>> You can mark the pci inbound window on the 83xx as non-prefetchable >>> (assuming 83xx is host). On a x86 host I doubt there is any easy way >>> to get non-prefetchable memory. >> >> Yep, we were going to do that, but chose to use the >> 1MB region already setup for the IMMRs since its already >> marked as non-prefetchable. We were only doing reads, so >> it wasn't going to hurt anything. >> >> I doubt that marking one of the other BAR regions >> as non-prefetchable will give a different result. >> However, we're more than happy to double-check if >> you'd like. > > Its possible you'll get a different result since IMMR is a register > space internal and thats normally a completely different bus than memory > would be (internal to the 83xx). I'd suggest double-checking w/a BAR > marked non-prefetch pointing to real memory. We had a 4k BAR1 setup to point to DDR memory. With prefetchable set, a 36-byte transfer generated a burst-of-8 32-bit words followed by a single transaction. With non-prefetchable set, the transfers were all single. So it works like we'd expect. Cheers, Dave
Hi Timur, >> Would you like some sort of summary of this info for a commit >> message? > > That's probably overkill. I just want a sentence or two that tells > someone looking at the code casually that the behavior of reading PCI > memory might be different than what they expect. Could you help us with the wording you'd like to see in the code. Did you want to see something in the header comments, or something near the register settings? How about something like this in place of the existing PCI_RM comment: PRC_RM - PCI read multiple The default PCI read command used by the DMA controller is PCI Read (PCI command 6h). When the burst length is 32-bytes or longer, PCI Read Line (PCI command Eh) is used (undocumented feature of the controller). Using PCI read multiple (PCI command Ch) results in high-performance across PCI bridges. DMA transfers to non-prefetchable PCI registers should not result in prefetched reads, even when using the PCI read multiple command. Cheers, Dave
David Hawkins wrote: > PRC_RM - PCI read multiple > The default PCI read command used by the DMA controller is > PCI Read (PCI command 6h). When the burst length is 32-bytes > or longer, PCI Read Line (PCI command Eh) is used (undocumented > feature of the controller). Using PCI read multiple > (PCI command Ch) results in high-performance across PCI > bridges. DMA transfers to non-prefetchable PCI registers > should not result in prefetched reads, even when using > the PCI read multiple command. I was thinking more along the lines of: "This driver tells the DMA controller to use the PCI Read Multiple command, instead of the PCI Read Line command, for PCI read operations. Please be aware that this setting may result in read pre-fetching on some platforms."
Hi Timur, >> PRC_RM - PCI read multiple >> The default PCI read command used by the DMA controller is >> PCI Read (PCI command 6h). When the burst length is 32-bytes >> or longer, PCI Read Line (PCI command Eh) is used (undocumented >> feature of the controller). Using PCI read multiple >> (PCI command Ch) results in high-performance across PCI >> bridges. DMA transfers to non-prefetchable PCI registers >> should not result in prefetched reads, even when using >> the PCI read multiple command. > > I was thinking more along the lines of: > > "This driver tells the DMA controller to use the PCI Read Multiple > command, instead of the PCI Read Line command, for PCI read operations. > Please be aware that this setting may result in read pre-fetching on > some platforms." Ok, thanks. Ira will add your comment to the body of the code near the PRC_RM command and submit a new patch. Cheers, Dave
David Hawkins wrote: > Ira will add your comment to the body of the code near > the PRC_RM command and submit a new patch. I'd rather have it near the top where people can see it.
Timur Tabi wrote: > David Hawkins wrote: > >> Ira will add your comment to the body of the code near >> the PRC_RM command and submit a new patch. > > I'd rather have it near the top where people can see it. Looks like Ira had the same thought :) Dave
On Mon, Apr 27, 2009 at 03:04:49PM -0500, Timur Tabi wrote: > David Hawkins wrote: > > > Can you give me an example of non-PCI memory that would be > > non-prefetchable that you'd like us to try? We can see if our > > host CPUs have an area like that ... we just need to know > > what device to look for first :) > > Hmmmm.... I was going to say any SOC device in the IMMR, but I don't see > anything there that would constitute a memory buffer. > > I test this change on an 8610 and DMA to a register I/O, where this bit > isn't even defined, and it made no difference. So I guess this change > is okay. I thought the driver only used the bit if the device tree indicated it was an 83xx-era DMA controller? That said, the bits are documented as specifically for PCI, so it would be surprising if it had any effect elsewhere. -Scott
Scott Wood wrote: > I thought the driver only used the bit if the device tree indicated it > was an 83xx-era DMA controller? I just wanted to make sure it didn't do anything weird. It was the only test I could think of that didn't involve PCI. > That said, the bits are documented as specifically for PCI, so it would > be surprising if it had any effect elsewhere. Surely you wouldn't really be surprised by incorrect documentation of our parts. :-)
>> You are assuming the PCI memory space is prefetchable( no side effect) >> for DMA. >> Is it possible that DMA is from non-prefetchable memory space? > This should be a safe assumption for this driver. Remember, this > driver just does offload memcpy, from one region to another. So the > PCI memory that you are reading from should be just a buffer of data, > and there should be side-effect of reading it. How about one FIFO buffer?
> > You are assuming the PCI memory space is prefetchable( no > > side effect) for DMA. > > Is it possible that DMA is from non-prefetchable memory space? > > This should be a safe assumption for this driver. Remember, this > driver just does offload memcpy, from one region to another. So the > PCI memory that you are reading from should be just a buffer of data, > and there should be side-effect of reading it. > > However, I would like to see a comment at the top of the file warning > people that copying from PCI memory will result in prefetched reads. How about FIFO RAM case?
> How about FIFO RAM case?
If the FIFO has a fixed address, then according to
the user guide, the DMA controller won't generate
a burst transaction to it.
We can try confirming this if you'd like.
Cheers,
Dave
> Here's a few results from DMA tests between two > MPC8349EA boards housed in a CPCI chassis. > > 1. DMA mode register (DMAMRn) > PCI read command (PRC, bits 11:10) > > a) DMAMRn[PRC] = 00 = PCI Read > > The PCI read command is 6h on the PCI bus. > For DMA lengths of less than 1 cache line (32-bytes) > the DMA controller will generate a PCI 6h command. > However, for lengths of 32-bytes and higher, the > DMA controller actually generates a PCI Read Line (Eh) > command. > > Freescale indicated that this 'change of PCI command code' > functionality is an undocumented 'feature', there to > improve performance for longer read lengths. > > b) DMAMRn[PRC] = 01 = PCI Read Line > > Generated the PCI command code for PCI read line (Eh), > regardless of DMA length. > > c) DMAMRn[PRC] = 10 = PCI Read Multiple > > Generated the PCI command code for PCI Read Multiple (Ch), > regardless of DMA length. Good summary! For the DMA PCI read/line/multi-line is outbound transaction. So according to your experiment, the 8349 PCI controller(as master) attemp to streaming/combining the outbound transaction(treated as prefetchable space). IIRC, the early 8349EUM has the bit[2]-SE in the POCMRn register, and is removed now. Not sure if it does function.
> >> You can mark the pci inbound window on the 83xx as > >> non-prefetchable(assuming 83xx is host). On a x86 host > >> I doubt there is any easy way to get non-prefetchable memory. > One more note; we don't have access to a host-mode MPC8349EA, > our boards are all targets. Actually we also can use the inbound window(w/wo prefetchable) in the agent mode. It doesn't care if it is host or agent.
Hi Dave, > For the DMA PCI read/line/multi-line is outbound transaction. > So according to your experiment, the 8349 PCI controller(as master) > attemp to streaming/combining the outbound transaction(treated as > prefetchable space). Yep, with the MPC8349EA configured as a PCI Target, and operating as a Bus Master, DMA transfers between two MPC8349EA targets to prefetchable memory on the slave will burst at pretty much full-speed over the PCI bus. The same goes for DMA to the host memory. However, reading from the host is slower, as the bursts get chopped up more than they do between two target boards. At some point I'll get a bunch of screen captures and put them in a document. > IIRC, the early 8349EUM has the bit[2]-SE in the POCMRn > register, and is removed now. Not sure if it does function. Hey yeah, I looked in the 8349E manual, and it is defined. I'm not sure why it would be defined there though. I can't think of why the master would want to disable streaming based on a bit setting; it should burst until the IOS says its full. Anyway, the bit is gone now, so we'll just ignore the fact it existed :) Cheers, Dave
David Hawkins wrote: >> How about FIFO RAM case? > > If the FIFO has a fixed address, then according to > the user guide, the DMA controller won't generate > a burst transaction to it. > > We can try confirming this if you'd like. Like I said earlier, this driver does not support copying data to/from a single register address. It's just a off-loaded memcpy. So we don't need to worry about what happens to a fixed address.
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index da8a8ed..638d00e 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -49,9 +49,10 @@ static void dma_init(struct fsl_dma_chan *fsl_chan) case FSL_DMA_IP_83XX: /* Set the channel to below modes: * EOTIE - End-of-transfer interrupt enable + * PRC_RM - PCI read multiple */ - DMA_OUT(fsl_chan, &fsl_chan->reg_base->mr, FSL_DMA_MR_EOTIE, - 32); + DMA_OUT(fsl_chan, &fsl_chan->reg_base->mr, FSL_DMA_MR_EOTIE + | FSL_DMA_MR_PRC_RM, 32); break; } diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h index 4f21a51..dc7f268 100644 --- a/drivers/dma/fsldma.h +++ b/drivers/dma/fsldma.h @@ -38,6 +38,7 @@ /* Special MR definition for MPC8349 */ #define FSL_DMA_MR_EOTIE 0x00000080 +#define FSL_DMA_MR_PRC_RM 0x00000800 #define FSL_DMA_SR_CH 0x00000020 #define FSL_DMA_SR_PE 0x00000010
By default, the Freescale 83xx DMA controller uses the PCI Read Line command when reading data over the PCI bus. Setting the controller to use the PCI Read Multiple command instead allows the controller to read much larger bursts of data, which provides a drastic speed increase. The slowdown due to using PCI Read Line was only observed when a PCI-to-PCI bridge was between the devices trying to communicate. A simple test driver showed an increase from 4MB/sec to 116MB/sec when performing DMA over the PCI bus. Using DMA to transfer between blocks of local SDRAM showed no change in performance with this patch. The dmatest driver was also used to verify the correctness of the transfers, and showed no errors. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> --- drivers/dma/fsldma.c | 5 +++-- drivers/dma/fsldma.h | 1 + 2 files changed, 4 insertions(+), 2 deletions(-)