[v3,2/2] dmaengine: tegra210-adma: Add memcpy support

Submitted by Nicole Otsuka on Sept. 6, 2016, 6:42 p.m.

Details

Message ID c730136b708404e1257eb28bb1dc2cbf13bf6023.1473186743.git.nicoleotsuka@gmail.com
State New
Headers show

Commit Message

Nicole Otsuka Sept. 6, 2016, 6:42 p.m.
ADMA supports non-flow controlled Memory-to-Memory direction
transactions. So this patch just adds an initial support for
that. It passed a simple dmatest:
        echo dma1chan0 > /sys/module/dmatest/parameters/channel
	echo 1024 > /sys/module/dmatest/parameters/iterations
	echo 0 > /sys/module/dmatest/parameters/dmatest
	echo 1 > /sys/module/dmatest/parameters/run
	dmesg | grep dmatest
Started 1 threads using dma1chan0
dma1chan0-copy0: summary 1024 tests, 0 failures 2054 iops 16520 KB/s (0)

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
---
 drivers/dma/tegra210-adma.c | 105 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 93 insertions(+), 12 deletions(-)

Comments

Jon Hunter Sept. 8, 2016, 2:08 p.m.
On 06/09/16 19:42, Nicolin Chen wrote:
> ADMA supports non-flow controlled Memory-to-Memory direction
> transactions. So this patch just adds an initial support for
> that. It passed a simple dmatest:
>         echo dma1chan0 > /sys/module/dmatest/parameters/channel
> 	echo 1024 > /sys/module/dmatest/parameters/iterations
> 	echo 0 > /sys/module/dmatest/parameters/dmatest
> 	echo 1 > /sys/module/dmatest/parameters/run
> 	dmesg | grep dmatest
> Started 1 threads using dma1chan0
> dma1chan0-copy0: summary 1024 tests, 0 failures 2054 iops 16520 KB/s (0)

What board and kernel did you try this on?

I have tried this on a tegra210-jetson-tx1 and I get :

[  202.569204] dmatest: Started 1 threads using dma1chan0
[  205.620318] dmatest: dma1chan0-copy0: result #1: 'test timed out' with src_off=0x86c dst_off=0xc80 len=0x307c (0)
[  208.692315] dmatest: dma1chan0-copy0: result #2: 'test timed out' with src_off=0x3288 dst_off=0x2720 len=0xa1c (0)
[  211.764323] dmatest: dma1chan0-copy0: result #3: 'test timed out' with src_off=0x2f44 dst_off=0x3164 len=0x3a4 (0)
[  214.836309] dmatest: dma1chan0-copy0: result #4: 'test timed out' with src_off=0x17d0 dst_off=0x2b10 len=0xe2c (0)
[  217.908305] dmatest: dma1chan0-copy0: result #5: 'test timed out' with src_off=0x23d8 dst_off=0xe90 len=0xb7c (0)
...

I also tried a tegra210-smaug and I get:

[  167.508828] dmatest: Started 1 threads using dma1chan0
[  167.508870] dmatest: dma1chan0-copy0: dstbuf[0x0] not copied! Expected db, got 09
[  167.508873] dmatest: dma1chan0-copy0: dstbuf[0x1] not copied! Expected da, got 05
[  167.508875] dmatest: dma1chan0-copy0: dstbuf[0x2] not copied! Expected d9, got 26
[  167.508876] dmatest: dma1chan0-copy0: dstbuf[0x3] not copied! Expected d8, got 16
...

I am testing with your patches on next-20160905.

Cheers
Jon
Jon Hunter Sept. 8, 2016, 2:19 p.m.
On 08/09/16 15:08, Jon Hunter wrote:
> 
> On 06/09/16 19:42, Nicolin Chen wrote:
>> ADMA supports non-flow controlled Memory-to-Memory direction
>> transactions. So this patch just adds an initial support for
>> that. It passed a simple dmatest:
>>         echo dma1chan0 > /sys/module/dmatest/parameters/channel
>> 	echo 1024 > /sys/module/dmatest/parameters/iterations
>> 	echo 0 > /sys/module/dmatest/parameters/dmatest
>> 	echo 1 > /sys/module/dmatest/parameters/run
>> 	dmesg | grep dmatest
>> Started 1 threads using dma1chan0
>> dma1chan0-copy0: summary 1024 tests, 0 failures 2054 iops 16520 KB/s (0)
> 
> What board and kernel did you try this on?
> 
> I have tried this on a tegra210-jetson-tx1 and I get :
> 
> [  202.569204] dmatest: Started 1 threads using dma1chan0
> [  205.620318] dmatest: dma1chan0-copy0: result #1: 'test timed out' with src_off=0x86c dst_off=0xc80 len=0x307c (0)
> [  208.692315] dmatest: dma1chan0-copy0: result #2: 'test timed out' with src_off=0x3288 dst_off=0x2720 len=0xa1c (0)
> [  211.764323] dmatest: dma1chan0-copy0: result #3: 'test timed out' with src_off=0x2f44 dst_off=0x3164 len=0x3a4 (0)
> [  214.836309] dmatest: dma1chan0-copy0: result #4: 'test timed out' with src_off=0x17d0 dst_off=0x2b10 len=0xe2c (0)
> [  217.908305] dmatest: dma1chan0-copy0: result #5: 'test timed out' with src_off=0x23d8 dst_off=0xe90 len=0xb7c (0)
> ...
> 
> I also tried a tegra210-smaug and I get:
> 
> [  167.508828] dmatest: Started 1 threads using dma1chan0
> [  167.508870] dmatest: dma1chan0-copy0: dstbuf[0x0] not copied! Expected db, got 09
> [  167.508873] dmatest: dma1chan0-copy0: dstbuf[0x1] not copied! Expected da, got 05
> [  167.508875] dmatest: dma1chan0-copy0: dstbuf[0x2] not copied! Expected d9, got 26
> [  167.508876] dmatest: dma1chan0-copy0: dstbuf[0x3] not copied! Expected d8, got 16
> ...

I tried this again on my audio testing branch for tegra210 and it worked ...

[   36.427210] dmatest: Started 1 threads using dma1chan0
[   37.036948] dmatest: dma1chan0-copy0: summary 1024 tests, 0 failures 2410 iops 19419 KB/s (0)

Do you have other patches in your branch? If so it would be good to understand the dependency. May be we are missing a clock somewhere?

Cheers
Jon
Nicole Otsuka Sept. 8, 2016, 5:31 p.m.
On Thu, Sep 08, 2016 at 03:19:57PM +0100, Jon Hunter wrote:

> I tried this again on my audio testing branch for tegra210 and it worked ...
> 
> [   36.427210] dmatest: Started 1 threads using dma1chan0
> [   37.036948] dmatest: dma1chan0-copy0: summary 1024 tests, 0 failures 2410 iops 19419 KB/s (0)
> 
> Do you have other patches in your branch? If so it would be good to
> understand the dependency. May be we are missing a clock somewhere?

Sorry. I forgot to mention that the TEGRA210_CLK_APE_SLCG_OVR
clock is required for the tests. So I cherry-picked 2 patches
from your audio branch to the linux-next:
	clk: tegra210: Add SLCG override gate clocks
	ARM64: tegra: DT: Add SLCG clock for AUD

Besides these two, there'e merely some straightforward changes
to enable the ADMA in the defconfig and DT.

And it seems that you've submitted that patch once but it got
hold because it wasn't so useful at that time?

Thanks
Nic
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter Sept. 12, 2016, 2:34 p.m.
On 08/09/16 18:31, Nicolin Chen wrote:
> On Thu, Sep 08, 2016 at 03:19:57PM +0100, Jon Hunter wrote:
> 
>> I tried this again on my audio testing branch for tegra210 and it worked ...
>>
>> [   36.427210] dmatest: Started 1 threads using dma1chan0
>> [   37.036948] dmatest: dma1chan0-copy0: summary 1024 tests, 0 failures 2410 iops 19419 KB/s (0)
>>
>> Do you have other patches in your branch? If so it would be good to
>> understand the dependency. May be we are missing a clock somewhere?
> 
> Sorry. I forgot to mention that the TEGRA210_CLK_APE_SLCG_OVR
> clock is required for the tests. So I cherry-picked 2 patches
> from your audio branch to the linux-next:
> 	clk: tegra210: Add SLCG override gate clocks
> 	ARM64: tegra: DT: Add SLCG clock for AUD
> 
> Besides these two, there'e merely some straightforward changes
> to enable the ADMA in the defconfig and DT.
> 
> And it seems that you've submitted that patch once but it got
> hold because it wasn't so useful at that time?

Yes it was not being used at the time. It is on my list of things to do
and we need to revisit it. There was some discussion on the best way to
handle these clocks from a client perspective. I am not sure we came to
a conclusion on this. I need to find some time to look at this.

Jon
Nicole Otsuka Sept. 12, 2016, 8:50 p.m.
On Mon, Sep 12, 2016 at 03:34:08PM +0100, Jon Hunter wrote:

> > Sorry. I forgot to mention that the TEGRA210_CLK_APE_SLCG_OVR
> > clock is required for the tests. So I cherry-picked 2 patches
> > from your audio branch to the linux-next:
> > 	clk: tegra210: Add SLCG override gate clocks
> > 	ARM64: tegra: DT: Add SLCG clock for AUD

> > And it seems that you've submitted that patch once but it got
> > hold because it wasn't so useful at that time?

> Yes it was not being used at the time. It is on my list of things to do
> and we need to revisit it. There was some discussion on the best way to
> handle these clocks from a client perspective. I am not sure we came to
> a conclusion on this. I need to find some time to look at this.

I may also take a look to speed it up. Yet, putting that clock
aside, how about this patch then? I think we don't need to wait
for that clock patch in order to announce that we support this
now on a specific SoC but can just treat it as a new feature of
a DMA controller, which sounds quite plausible to me since the
ADMA module is now being disabled in all dts files of existing
SoCs -- There have to be some local changes in any way so as to
test it with the mainline code.

Thanks
Nic
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter Sept. 13, 2016, 8:52 a.m.
On 12/09/16 21:50, Nicolin Chen wrote:
> On Mon, Sep 12, 2016 at 03:34:08PM +0100, Jon Hunter wrote:
> 
>>> Sorry. I forgot to mention that the TEGRA210_CLK_APE_SLCG_OVR
>>> clock is required for the tests. So I cherry-picked 2 patches
>>> from your audio branch to the linux-next:
>>> 	clk: tegra210: Add SLCG override gate clocks
>>> 	ARM64: tegra: DT: Add SLCG clock for AUD
> 
>>> And it seems that you've submitted that patch once but it got
>>> hold because it wasn't so useful at that time?
> 
>> Yes it was not being used at the time. It is on my list of things to do
>> and we need to revisit it. There was some discussion on the best way to
>> handle these clocks from a client perspective. I am not sure we came to
>> a conclusion on this. I need to find some time to look at this.
> 
> I may also take a look to speed it up. Yet, putting that clock
> aside, how about this patch then? I think we don't need to wait
> for that clock patch in order to announce that we support this
> now on a specific SoC but can just treat it as a new feature of
> a DMA controller, which sounds quite plausible to me since the
> ADMA module is now being disabled in all dts files of existing
> SoCs -- There have to be some local changes in any way so as to
> test it with the mainline code.

I am fine with the changes. However, I am wondering if we should sort
out this clock business first just in case someone tries to use this.

Cheers
Jon
Koul, Vinod Sept. 14, 2016, 1:38 p.m.
On Tue, Sep 13, 2016 at 09:52:43AM +0100, Jon Hunter wrote:
> 
> On 12/09/16 21:50, Nicolin Chen wrote:
> > On Mon, Sep 12, 2016 at 03:34:08PM +0100, Jon Hunter wrote:
> > 
> >>> Sorry. I forgot to mention that the TEGRA210_CLK_APE_SLCG_OVR
> >>> clock is required for the tests. So I cherry-picked 2 patches
> >>> from your audio branch to the linux-next:
> >>> 	clk: tegra210: Add SLCG override gate clocks
> >>> 	ARM64: tegra: DT: Add SLCG clock for AUD
> > 
> >>> And it seems that you've submitted that patch once but it got
> >>> hold because it wasn't so useful at that time?
> > 
> >> Yes it was not being used at the time. It is on my list of things to do
> >> and we need to revisit it. There was some discussion on the best way to
> >> handle these clocks from a client perspective. I am not sure we came to
> >> a conclusion on this. I need to find some time to look at this.
> > 
> > I may also take a look to speed it up. Yet, putting that clock
> > aside, how about this patch then? I think we don't need to wait
> > for that clock patch in order to announce that we support this
> > now on a specific SoC but can just treat it as a new feature of
> > a DMA controller, which sounds quite plausible to me since the
> > ADMA module is now being disabled in all dts files of existing
> > SoCs -- There have to be some local changes in any way so as to
> > test it with the mainline code.
> 
> I am fine with the changes. However, I am wondering if we should sort
> out this clock business first just in case someone tries to use this.

I think that is better, so am dropping this series.

Patch hide | download patch | download mbox

diff --git a/drivers/dma/tegra210-adma.c b/drivers/dma/tegra210-adma.c
index 5b5d298..1a7148d 100644
--- a/drivers/dma/tegra210-adma.c
+++ b/drivers/dma/tegra210-adma.c
@@ -42,9 +42,14 @@ 
 #define ADMA_CH_CTRL_RX_REQ(val)			(((val) & 0xf) << 24)
 #define ADMA_CH_CTRL_RX_REQ_MAX				10
 #define ADMA_CH_CTRL_DIR(val)				(((val) & 0xf) << 12)
+#define ADMA_CH_CTRL_DIR_MEM2MEM			1
 #define ADMA_CH_CTRL_DIR_AHUB2MEM			2
 #define ADMA_CH_CTRL_DIR_MEM2AHUB			4
-#define ADMA_CH_CTRL_MODE_CONTINUOUS			(2 << 8)
+#define ADMA_CH_CTRL_DIR_AHUB2AHUB			8
+#define ADMA_CH_CTRL_MODE(val)				(((val) & 0x7) << 8)
+#define ADMA_CH_CTRL_MODE_ONCE				1
+#define ADMA_CH_CTRL_MODE_CONTINUOUS			2
+#define ADMA_CH_CTRL_MODE_LINKED_LIST			4
 #define ADMA_CH_CTRL_FLOWCTRL_EN			BIT(1)
 
 #define ADMA_CH_CONFIG					0x28
@@ -264,6 +269,9 @@  static int tegra_adma_request_alloc(struct tegra_adma_chan *tdc,
 		}
 		break;
 
+	case DMA_MEM_TO_MEM:
+		break;
+
 	default:
 		dev_WARN(tdma->dev, "channel %s has invalid transfer type\n",
 			 dma_chan_name(&tdc->vc.chan));
@@ -292,6 +300,9 @@  static void tegra_adma_request_free(struct tegra_adma_chan *tdc)
 		clear_bit(tdc->sreq_index, &tdma->rx_requests_reserved);
 		break;
 
+	case DMA_MEM_TO_MEM:
+		break;
+
 	default:
 		dev_WARN(tdma->dev, "channel %s has invalid transfer type\n",
 			 dma_chan_name(&tdc->vc.chan));
@@ -409,8 +420,14 @@  static irqreturn_t tegra_adma_isr(int irq, void *dev_id)
 		return IRQ_NONE;
 	}
 
-	if (tdc->desc->cyclic)
+	if (tdc->desc->cyclic) {
 		vchan_cyclic_callback(&tdc->desc->vd);
+	} else {
+		/* Disable the channel */
+		tdma_ch_write(tdc, ADMA_CH_CMD, 0);
+		vchan_cookie_complete(&tdc->desc->vd);
+		tdc->desc = NULL;
+	}
 
 	spin_unlock_irqrestore(&tdc->vc.lock, flags);
 
@@ -488,42 +505,59 @@  static enum dma_status tegra_adma_tx_status(struct dma_chan *dc,
 static int tegra_adma_set_xfer_params(struct tegra_adma_chan *tdc,
 				      struct tegra_adma_desc *desc,
 				      dma_addr_t buf_addr,
+				      dma_addr_t buf_addr2,
 				      enum dma_transfer_direction direction)
 {
 	struct tegra_adma_chan_regs *ch_regs = &desc->ch_regs;
-	unsigned int burst_size, adma_dir;
+	unsigned int num_periods = desc->num_periods;
+	unsigned int burst_size, adma_dir, adma_mode;
 
-	if (desc->num_periods > ADMA_CH_CONFIG_MAX_BUFS)
+	if (num_periods > ADMA_CH_CONFIG_MAX_BUFS)
 		return -EINVAL;
 
 	switch (direction) {
 	case DMA_MEM_TO_DEV:
 		adma_dir = ADMA_CH_CTRL_DIR_MEM2AHUB;
 		burst_size = fls(tdc->sconfig.dst_maxburst);
-		ch_regs->config = ADMA_CH_CONFIG_SRC_BUF(desc->num_periods - 1);
-		ch_regs->ctrl = ADMA_CH_CTRL_TX_REQ(tdc->sreq_index);
+		ch_regs->config = ADMA_CH_CONFIG_SRC_BUF(num_periods - 1);
+		ch_regs->ctrl = ADMA_CH_CTRL_TX_REQ(tdc->sreq_index) |
+				ADMA_CH_CTRL_FLOWCTRL_EN;
 		ch_regs->src_addr = buf_addr;
 		break;
 
 	case DMA_DEV_TO_MEM:
 		adma_dir = ADMA_CH_CTRL_DIR_AHUB2MEM;
 		burst_size = fls(tdc->sconfig.src_maxburst);
-		ch_regs->config = ADMA_CH_CONFIG_TRG_BUF(desc->num_periods - 1);
-		ch_regs->ctrl = ADMA_CH_CTRL_RX_REQ(tdc->sreq_index);
+		ch_regs->config = ADMA_CH_CONFIG_TRG_BUF(num_periods - 1);
+		ch_regs->ctrl = ADMA_CH_CTRL_RX_REQ(tdc->sreq_index) |
+				ADMA_CH_CTRL_FLOWCTRL_EN;
 		ch_regs->trg_addr = buf_addr;
 		break;
 
+	case DMA_MEM_TO_MEM:
+		adma_dir = ADMA_CH_CTRL_DIR_MEM2MEM;
+		burst_size = ADMA_CH_CONFIG_BURST_16;
+		ch_regs->config = ADMA_CH_CONFIG_SRC_BUF(num_periods - 1) |
+				  ADMA_CH_CONFIG_TRG_BUF(num_periods - 1);
+		ch_regs->src_addr = buf_addr;
+		ch_regs->trg_addr = buf_addr2;
+		break;
+
 	default:
 		dev_err(tdc2dev(tdc), "DMA direction is not supported\n");
 		return -EINVAL;
 	}
 
+	if (desc->cyclic)
+		adma_mode = ADMA_CH_CTRL_MODE_CONTINUOUS;
+	else
+		adma_mode = ADMA_CH_CTRL_MODE_ONCE;
+
 	if (!burst_size || burst_size > ADMA_CH_CONFIG_BURST_16)
 		burst_size = ADMA_CH_CONFIG_BURST_16;
 
 	ch_regs->ctrl |= ADMA_CH_CTRL_DIR(adma_dir) |
-			 ADMA_CH_CTRL_MODE_CONTINUOUS |
-			 ADMA_CH_CTRL_FLOWCTRL_EN;
+			 ADMA_CH_CTRL_MODE(adma_mode);
 	ch_regs->config |= ADMA_CH_CONFIG_BURST_SIZE(burst_size);
 	ch_regs->config |= ADMA_CH_CONFIG_WEIGHT_FOR_WRR(1);
 	ch_regs->fifo_ctrl = ADMA_CH_FIFO_CTRL_DEFAULT;
@@ -564,7 +598,49 @@  static struct dma_async_tx_descriptor *tegra_adma_prep_dma_cyclic(
 	desc->period_len = period_len;
 	desc->num_periods = buf_len / period_len;
 
-	if (tegra_adma_set_xfer_params(tdc, desc, buf_addr, direction)) {
+	if (tegra_adma_set_xfer_params(tdc, desc, buf_addr, 0, direction)) {
+		kfree(desc);
+		return NULL;
+	}
+
+	return vchan_tx_prep(&tdc->vc, &desc->vd, flags);
+}
+
+static struct dma_async_tx_descriptor *tegra_adma_prep_dma_memcpy(
+	struct dma_chan *dc, dma_addr_t dest, dma_addr_t src,
+	size_t buf_len, unsigned long flags)
+{
+	struct tegra_adma_chan *tdc = to_tegra_adma_chan(dc);
+	struct device *dev = dc->device->dev;
+	struct tegra_adma_desc *desc = NULL;
+
+	dev_dbg(dev, "%s channel: %d src=0x%llx dst=0x%llx len=%zu\n",
+		__func__, dc->chan_id, (unsigned long long)src,
+		(unsigned long long)dest, buf_len);
+
+	if (unlikely(!tdc || !buf_len))
+		return NULL;
+
+	/*
+	 * ADMA supports up to 8 periods but it should be sufficient to use
+	 * one period for now which already allows us to transfer up to 1GB
+	 * (28-bit word aligned transfer size). We may add multiple periods
+	 * support to extend the limitation later.
+	 */
+	if (buf_len > ADMA_CH_TC_COUNT_MASK) {
+		dev_err(dev, "only supports up to 1GB transfer size\n");
+		return NULL;
+	}
+
+	desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
+	if (!desc)
+		return NULL;
+
+	desc->num_periods = 1;
+	desc->buf_len = buf_len;
+	desc->period_len = buf_len;
+
+	if (tegra_adma_set_xfer_params(tdc, desc, src, dest, DMA_MEM_TO_MEM)) {
 		kfree(desc);
 		return NULL;
 	}
@@ -741,6 +817,7 @@  static int tegra_adma_probe(struct platform_device *pdev)
 	dma_cap_set(DMA_SLAVE, tdma->dma_dev.cap_mask);
 	dma_cap_set(DMA_PRIVATE, tdma->dma_dev.cap_mask);
 	dma_cap_set(DMA_CYCLIC, tdma->dma_dev.cap_mask);
+	dma_cap_set(DMA_MEMCPY, tdma->dma_dev.cap_mask);
 
 	tdma->dma_dev.dev = &pdev->dev;
 	tdma->dma_dev.device_alloc_chan_resources =
@@ -749,14 +826,18 @@  static int tegra_adma_probe(struct platform_device *pdev)
 					tegra_adma_free_chan_resources;
 	tdma->dma_dev.device_issue_pending = tegra_adma_issue_pending;
 	tdma->dma_dev.device_prep_dma_cyclic = tegra_adma_prep_dma_cyclic;
+	tdma->dma_dev.device_prep_dma_memcpy = tegra_adma_prep_dma_memcpy;
 	tdma->dma_dev.device_config = tegra_adma_slave_config;
 	tdma->dma_dev.device_tx_status = tegra_adma_tx_status;
 	tdma->dma_dev.device_terminate_all = tegra_adma_terminate_all;
 	tdma->dma_dev.src_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
 	tdma->dma_dev.dst_addr_widths = BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
-	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
+	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV) |
+				   BIT(DMA_MEM_TO_MEM);
 	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
 
+	tdma->dma_dev.copy_align = DMAENGINE_ALIGN_4_BYTES;
+
 	ret = dma_async_device_register(&tdma->dma_dev);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "ADMA registration failed: %d\n", ret);