
[net-next,v5,00/10] net: axienet: Introduce dmaengine

Message ID 1691387509-2113129-1-git-send-email-radhey.shyam.pandey@amd.com

Message

Pandey, Radhey Shyam Aug. 7, 2023, 5:51 a.m. UTC
The axiethernet driver can use the dmaengine framework to communicate
with the Xilinx DMA engine drivers (AXIDMA, MCDMA). The inspiration
behind this dmaengine adoption is to reuse the in-kernel Xilinx DMA
engine driver [1] and remove the redundant DMA programming sequence [2]
from the ethernet driver. This simplifies the ethernet driver and also
makes it generic enough to be hooked to any compliant DMA IP, i.e.
AXIDMA or MCDMA, without any modification.

The dmaengine framework was extended with metadata API support during
the axidma RFC [3] discussion. However, it still needs further
enhancements to make it well suited for ethernet use cases.

Comments, suggestions, thoughts to implement remaining functional
features are very welcome!

[1]: https://github.com/torvalds/linux/blob/master/drivers/dma/xilinx/xilinx_dma.c
[2]: https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/xilinx/xilinx_axienet_main.c#L238
[3]: http://lkml.iu.edu/hypermail/linux/kernel/1804.0/00367.html
[4]: https://lore.kernel.org/all/20221124102745.2620370-1-sarath.babu.naidu.gaddam@amd.com


Changes in v5:
- Fix git am failure on net-next
- Addressed DT binding review comments, i.e. modified the commit
  description to remove dmaengine framework references and instead
  describe how the axiethernet IP uses DMA channels.
- Fix "^[tr]x_chan[0-9]|1[0-5]$" -> "^[tr]x_chan([0-9]|1[0-5])$"
- Drop generic dmas description.
- Fix kmem_cache resource leak.
- Merge Xilinx DMA enhancements and optimization[4] into this series.

Changes in V4:
- Updated commit description about tx/rx channels name(1/3).
- Removed "dt-bindings" and "dmaengine" strings in subject(1/3).
- Extended dmas and dma-names to support MCDMA channel names(1/3).
- Rename has_dmas to use_dmaengine(2/3).
- Remove the AXIENET_USE_DMA(2/3).
- Remove the AXIENET_USE_DMA(3/3).
- Add dev_err_probe for dma_request_chan error handling(3/3).
- Add kmem_cache_destroy for create in axienet_setup_dma_chan(3/3).

Changes in V3:
- Moved RFC to PATCH.
- Removed ethtool get/set coalesce, will be added later.
- Added backward compatible support.
- Split the dmaengine support patch of V2 into two patches(2/3 and 3/3).
https://lore.kernel.org/all/20220920055703.13246-4-sarath.babu.naidu.gaddam@amd.com/

Changes in V2:
- Add ethtool get/set coalesce and DMA reset using DMAengine framework.
- Add performance numbers.
- Remove .txt and change the name of file to xlnx,axiethernet.yaml.
- Fix DT check warning ('device_type' does not match any of the
   regexes: 'pinctrl-[0-9]+', from schema: Documentation/devicetree/
   bindings/net/xilinx_axienet.yaml).

Radhey Shyam Pandey (9):
  dt-bindings: dmaengine: xilinx_dma:Add xlnx,axistream-connected
    property
  dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property
  dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client
  dmaengine: xilinx_dma: Increase AXI DMA transaction segment count
  dmaengine: xilinx_dma: Freeup active list based on descriptor
    completion bit
  dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical
    usecase
  dmaengine: xilinx_dma: Program interrupt delay timeout
  dt-bindings: net: xlnx,axi-ethernet: Introduce DMA support
  net: axienet: Introduce dmaengine support

Sarath Babu Naidu Gaddam (1):
  net: axienet: Preparatory changes for dmaengine support

 .../bindings/dma/xilinx/xilinx_dma.txt        |   6 +
 .../bindings/net/xlnx,axi-ethernet.yaml       |  16 +
 drivers/dma/xilinx/xilinx_dma.c               |  70 +-
 drivers/net/ethernet/xilinx/Kconfig           |   1 +
 drivers/net/ethernet/xilinx/xilinx_axienet.h  |  10 +
 .../net/ethernet/xilinx/xilinx_axienet_main.c | 622 ++++++++++++++----
 6 files changed, 591 insertions(+), 134 deletions(-)

Comments

Jakub Kicinski Aug. 8, 2023, 10:48 p.m. UTC | #1
On Mon, 7 Aug 2023 11:21:49 +0530 Radhey Shyam Pandey wrote:
> +struct axi_skbuff {
> +	struct scatterlist sgl[MAX_SKB_FRAGS + 1];
> +	struct dma_async_tx_descriptor *desc;
> +	dma_addr_t dma_address;
> +	struct sk_buff *skb;
> +	struct list_head lh;
> +	int sg_len;
> +} __packed;

Why __packed?

> +static netdev_tx_t
> +axienet_start_xmit_dmaengine(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct dma_async_tx_descriptor *dma_tx_desc = NULL;
> +	struct axienet_local *lp = netdev_priv(ndev);
> +	u32 app[DMA_NUM_APP_WORDS] = {0};
> +	struct axi_skbuff *axi_skb;
> +	u32 csum_start_off;
> +	u32 csum_index_off;
> +	int sg_len;
> +	int ret;
> +
> +	sg_len = skb_shinfo(skb)->nr_frags + 1;
> +	axi_skb = kmem_cache_zalloc(lp->skb_cache, GFP_KERNEL);
> +	if (!axi_skb)
> +		return NETDEV_TX_BUSY;

Drop on error, you're not stopping the queue correctly, just drop,
return OK and avoid bugs.

> +static inline int axienet_init_dmaengine(struct net_device *ndev)

Why inline? Please remove the spurious inline annotations.

> +{
> +	struct axienet_local *lp = netdev_priv(ndev);
> +	int i, ret;
> +
> +	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> +	if (IS_ERR(lp->tx_chan)) {
> +		ret = PTR_ERR(lp->tx_chan);
> +		return dev_err_probe(lp->dev, ret, "No Ethernet DMA (TX) channel found\n");
> +	}
> +
> +	lp->rx_chan = dma_request_chan(lp->dev, "rx_chan0");
> +	if (IS_ERR(lp->rx_chan)) {
> +		ret = PTR_ERR(lp->rx_chan);
> +		dev_err_probe(lp->dev, ret, "No Ethernet DMA (RX) channel found\n");
> +		goto err_dma_request_rx;

name labels after the target, not the source. Makes it a ton easier 
to maintain sanity when changing the code later.
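For illustration, a minimal userspace sketch of the convention being asked for: each label is named after the cleanup it performs (the target), not after the call that failed (the source). The struct and names are invented for the example; in the driver the labels would wrap dma_release_channel() and friends.

```c
#include <stdlib.h>

/* Two-step init with target-named error labels.  malloc() stands in
 * for dma_request_chan(); the labels describe the cleanup performed. */
struct ctx {
	void *tx;
	void *rx;
};

int ctx_init(struct ctx *c)
{
	c->tx = malloc(64);		/* first resource */
	if (!c->tx)
		return -1;

	c->rx = malloc(64);		/* second resource */
	if (!c->rx)
		goto err_free_tx;	/* label names the cleanup target */

	return 0;

err_free_tx:
	free(c->tx);
	c->tx = NULL;
	return -1;
}
```

With source-named labels (`err_dma_request_rx`), adding a new allocation between the two forces renaming; target-named labels stay correct as the code grows.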

> +	}
> +
> +	lp->skb_cache = kmem_cache_create("ethernet", sizeof(struct axi_skbuff),
> +					  0, 0, NULL);

Why create a cache ?
Isn't it cleaner to create a fake ring buffer of sgl? Most packets will
not have MAX_SKB_FRAGS of memory. On a ring buffer you can use only as
many sg entries as the packet requires. Also no need to alloc/free.

> +	if (!lp->skb_cache) {
> +		ret =  -ENOMEM;

double space, please fix everywhere

> +		goto err_kmem;

> @@ -2140,6 +2432,31 @@ static int axienet_probe(struct platform_device *pdev)
>  		}
>  		netif_napi_add(ndev, &lp->napi_rx, axienet_rx_poll);
>  		netif_napi_add(ndev, &lp->napi_tx, axienet_tx_poll);
> +	} else {
> +		struct xilinx_vdma_config cfg;
> +		struct dma_chan *tx_chan;
> +
> +		lp->eth_irq = platform_get_irq_optional(pdev, 0);

This can't fail?

> +		tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> +
> +		if (IS_ERR(tx_chan)) {

no empty lines between call and its error check, please fix throughout

> +			ret = PTR_ERR(tx_chan);
> +			dev_err_probe(lp->dev, ret, "No Ethernet DMA (TX) channel found\n");
> +			goto cleanup_clk;
> +		}
Jakub Kicinski Aug. 8, 2023, 10:53 p.m. UTC | #2
On Mon, 7 Aug 2023 11:21:39 +0530 Radhey Shyam Pandey wrote:
> The axiethernet driver can use the dmaengine framework to communicate
> with the xilinx DMAengine driver(AXIDMA, MCDMA). The inspiration behind
> this dmaengine adoption is to reuse the in-kernel xilinx dma engine
> driver[1] and remove redundant dma programming sequence[2] from the
> ethernet driver. This simplifies the ethernet driver and also makes
> it generic to be hooked to any compliant dma IP i.e AXIDMA, MCDMA
> without any modification.
> 
> The dmaengine framework was extended for metadata API support during
> the axidma RFC[3] discussion. However, it still needs further
> enhancements to make it well suited for ethernet usecases.
> 
> Comments, suggestions, thoughts to implement remaining functional
> features are very welcome!

Vinod, any preference on how this gets merged?
Since we're already at -rc5 if the dmaengine parts look good to you 
taking those in for 6.6 and delaying the networking bits until 6.7
could be on the table? Possibly?
Pandey, Radhey Shyam Aug. 12, 2023, 3:27 p.m. UTC | #3
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, August 9, 2023 4:19 AM
> To: Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>
> Cc: vkoul@kernel.org; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; Simek, Michal
> <michal.simek@amd.com>; davem@davemloft.net; edumazet@google.com;
> pabeni@redhat.com; linux@armlinux.org.uk; dmaengine@vger.kernel.org;
> devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org; git (AMD-Xilinx)
> <git@amd.com>
> Subject: Re: [PATCH net-next v5 10/10] net: axienet: Introduce dmaengine
> support
> 
> On Mon, 7 Aug 2023 11:21:49 +0530 Radhey Shyam Pandey wrote:
> > +struct axi_skbuff {
> > +	struct scatterlist sgl[MAX_SKB_FRAGS + 1];
> > +	struct dma_async_tx_descriptor *desc;
> > +	dma_addr_t dma_address;
> > +	struct sk_buff *skb;
> > +	struct list_head lh;
> > +	int sg_len;
> > +} __packed;
> 
> Why __packed?

It was added with size optimization in mind, but I see it may have 
performance overheads. Will remove it in the next version.

> 
> > +static netdev_tx_t
> > +axienet_start_xmit_dmaengine(struct sk_buff *skb, struct net_device
> > +*ndev) {
> > +	struct dma_async_tx_descriptor *dma_tx_desc = NULL;
> > +	struct axienet_local *lp = netdev_priv(ndev);
> > +	u32 app[DMA_NUM_APP_WORDS] = {0};
> > +	struct axi_skbuff *axi_skb;
> > +	u32 csum_start_off;
> > +	u32 csum_index_off;
> > +	int sg_len;
> > +	int ret;
> > +
> > +	sg_len = skb_shinfo(skb)->nr_frags + 1;
> > +	axi_skb = kmem_cache_zalloc(lp->skb_cache, GFP_KERNEL);
> > +	if (!axi_skb)
> > +		return NETDEV_TX_BUSY;
> 
> Drop on error, you're not stopping the queue correctly, just drop, return OK
> and avoid bugs.

As I understand it, returning NETDEV_TX_OK means the driver took care
of the packet. So, in line with the non-dmaengine xmit
(axienet_start_xmit_legacy), should we stop the queue and return TX_BUSY?

> 
> > +static inline int axienet_init_dmaengine(struct net_device *ndev)
> 
> Why inline? Please remove the spurious inline annotations.

Ok will fix it in next version

> 
> > +{
> > +	struct axienet_local *lp = netdev_priv(ndev);
> > +	int i, ret;
> > +
> > +	lp->tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> > +	if (IS_ERR(lp->tx_chan)) {
> > +		ret = PTR_ERR(lp->tx_chan);
> > +		return dev_err_probe(lp->dev, ret, "No Ethernet DMA (TX)
> channel found\n");
> > +	}
> > +
> > +	lp->rx_chan = dma_request_chan(lp->dev, "rx_chan0");
> > +	if (IS_ERR(lp->rx_chan)) {
> > +		ret = PTR_ERR(lp->rx_chan);
> > +		dev_err_probe(lp->dev, ret, "No Ethernet DMA (RX) channel
> found\n");
> > +		goto err_dma_request_rx;
> 
> name labels after the target, not the source. Makes it a ton easier to
> maintain sanity when changing the code later.

Ok, will rename labels after the target, i.e. goto err_dma_free_tx, and fix the others as well.

> 
> > +	}
> > +
> > +	lp->skb_cache = kmem_cache_create("ethernet", sizeof(struct
> axi_skbuff),
> > +					  0, 0, NULL);
> 
> Why create a cache ?
> Isn't it cleaner to create a fake ring buffer of sgl? Most packets will not have
> MAX_SKB_FRAGS of memory. On a ring buffer you can use only as many sg
> entries as the packet requires. Also no need to alloc/free.


The kmem_cache is used with the intent to use the slab cache interface
and make use of object reuse in the kernel. The slab cache maintains a 
cache of objects. When we free an object, instead of
deallocating it, it gives it back to the cache. Next time, if we
want to create a new object, the slab cache gives us one object from
the cache.

If we maintain a custom circular ring buffer (struct circ_buf), we
have to create two such ring buffers, one for TX and the other for RX.
For multichannel this will multiply by the number of queues. Also we
have to ensure proper occupancy checks and head/tail pointer updates.

With a kmem_cache pool we are offloading queue maintenance ops to the
framework, with the benefit of optimized alloc/dealloc. Let me know if
it looks functionally fine and can be retained for this baseline
dmaengine support version?
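To make the trade-off concrete, here is a userspace sketch of the object-cache idea behind kmem_cache: freed objects go onto a free list instead of back to the allocator, so the next allocation reuses a warm object. This illustrates the concept only, not the kernel API; the struct size roughly mirrors the ~512-byte axi_skbuff being discussed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fixed-size object with an intrusive free-list link. */
struct obj {
	struct obj *next;
	char payload[504];
};

static struct obj *free_list;

struct obj *cache_alloc(void)
{
	struct obj *o = free_list;

	if (o) {
		free_list = o->next;	/* cache hit: reuse warm object */
		return o;
	}
	return calloc(1, sizeof(*o));	/* cache miss: really allocate */
}

void cache_free(struct obj *o)
{
	o->next = free_list;		/* give it back to the cache */
	free_list = o;
}
```

Note the full object is always allocated, regardless of how much of it the caller needs; that is the per-packet overhead the ring-buffer suggestion avoids.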

> 
> > +	if (!lp->skb_cache) {
> > +		ret =  -ENOMEM;
> 
> double space, please fix everywhere

Will fix it in next version.

> 
> > +		goto err_kmem;
> 
> > @@ -2140,6 +2432,31 @@ static int axienet_probe(struct platform_device
> *pdev)
> >  		}
> >  		netif_napi_add(ndev, &lp->napi_rx, axienet_rx_poll);
> >  		netif_napi_add(ndev, &lp->napi_tx, axienet_tx_poll);
> > +	} else {
> > +		struct xilinx_vdma_config cfg;
> > +		struct dma_chan *tx_chan;
> > +
> > +		lp->eth_irq = platform_get_irq_optional(pdev, 0);
> 
> This can't fail?

I will add a check for lp->eth_irq != -ENXIO and add its error handling.
> 
> > +		tx_chan = dma_request_chan(lp->dev, "tx_chan0");
> > +
> > +		if (IS_ERR(tx_chan)) {
> 
> no empty lines between call and its error check, please fix throughout

Ok will fix and cross-check for all other references.
> 
> > +			ret = PTR_ERR(tx_chan);
> > +			dev_err_probe(lp->dev, ret, "No Ethernet DMA (TX)
> channel found\n");
> > +			goto cleanup_clk;
> > +		}
> --
> pw-bot: cr
Jakub Kicinski Aug. 14, 2023, 3:29 p.m. UTC | #4
On Sat, 12 Aug 2023 15:27:19 +0000 Pandey, Radhey Shyam wrote:
> > Drop on error, you're not stopping the queue correctly, just drop, return OK
> > and avoid bugs.  
> 
> As I understand it, returning NETDEV_TX_OK means the driver took care
> of the packet. So, in line with the non-dmaengine xmit
> (axienet_start_xmit_legacy), should we stop the queue and return TX_BUSY?

You should only return BUSY if there is no space. All other errors
should lead to drops, and increment of tx_error. Otherwise problem
with handling a single packet may stall the NIC forever.
It is somewhat confusing that we return TX_OK in that case but it
is what it is.
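The policy described above can be sketched as a small decision function. This is an illustration only: the NETDEV_TX_* values mirror include/linux/netdevice.h, but the struct and helper are invented stand-ins, not driver code.

```c
#include <stdbool.h>

/* Return values as defined in include/linux/netdevice.h. */
enum { NETDEV_TX_OK = 0x00, NETDEV_TX_BUSY = 0x10 };

struct txq {
	unsigned int space;		/* free descriptor slots */
	unsigned long tx_errors;	/* dropped-packet counter */
};

/* BUSY only when the ring is out of space (the queue would also be
 * stopped); any other failure drops the packet, bumps tx_errors and
 * still returns OK so one bad packet cannot stall the queue. */
int xmit_policy(struct txq *q, bool alloc_failed, unsigned int needed)
{
	if (q->space < needed)
		return NETDEV_TX_BUSY;	/* no room: retry later */

	if (alloc_failed) {
		q->tx_errors++;		/* drop and count the error */
		return NETDEV_TX_OK;
	}

	q->space -= needed;		/* normal transmit path */
	return NETDEV_TX_OK;
}
```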

> > Why create a cache ?
> > Isn't it cleaner to create a fake ring buffer of sgl? Most packets will not have
> > MAX_SKB_FRAGS of memory. On a ring buffer you can use only as many sg
> > entries as the packet requires. Also no need to alloc/free.  
> 
> The kmem_cache is used with intent to use slab cache interface and
> make use of reusing objects in the kernel. slab cache maintains a 
> cache of objects. When we free an object, instead of
> deallocating it, it gives it back to the cache. Next time, if we
> want to create a new object, slab cache gives us one object from the
> slab cache.
> 
> If we maintain custom circular buffer (struct circ_buf) ring buffer 
> we have to create two such ring buffers one for TX and other for RX.
> For multichannel this will multiply to * no of queues. Also we have to
> ensure proper occupancy checks and head/tail pointer updates.
> 
> With kmem_cache pool we are offloading queue maintenance ops to
> framework with a benefit of optimized alloc/dealloc. Let me know if it 
> looks functionally fine and can retain it for this baseline dmaengine 
> support version?

The kmemcache is not the worst possible option but note that the
objects you're allocating (with zeroing) are 512+ bytes. That's
pretty large, when most packets will not have full 16 fragments.
A ring buffer would allow better matching of the allocation size to
the packets. Not to mention that it can be done fully locklessly.
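A userspace sketch of that suggestion: a ring of scatterlist slots where each packet reserves only as many entries as it has fragments, instead of a fixed MAX_SKB_FRAGS-sized allocation per packet. Sizes and names are invented for the example; a real ring would hold struct scatterlist entries and per-packet metadata.

```c
#include <stdbool.h>

#define RING_SIZE 32u			/* power of two: cheap masking */

struct sg_ring {
	unsigned int head;		/* producer index (xmit path) */
	unsigned int tail;		/* consumer index (tx completion) */
	int slot[RING_SIZE];		/* stand-in for sg entries */
};

/* Free-running indices: unsigned wraparound makes this correct. */
static unsigned int ring_space(const struct sg_ring *r)
{
	return RING_SIZE - (r->head - r->tail);
}

/* Reserve nents slots (nr_frags + 1 for a packet); false means "ring
 * full", i.e. the one case where the driver should stop the queue. */
bool ring_produce(struct sg_ring *r, unsigned int nents)
{
	unsigned int i;

	if (ring_space(r) < nents)
		return false;
	for (i = 0; i < nents; i++)
		r->slot[(r->head + i) & (RING_SIZE - 1)] = 1;
	r->head += nents;
	return true;
}

void ring_consume(struct sg_ring *r, unsigned int nents)
{
	r->tail += nents;		/* completion frees the entries */
}
```

With a single producer (xmit) and single consumer (completion), head and tail are each written by only one side, which is what makes the lockless variant possible.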
Vinod Koul Aug. 21, 2023, 1:11 p.m. UTC | #5
On 08-08-23, 15:53, Jakub Kicinski wrote:
> On Mon, 7 Aug 2023 11:21:39 +0530 Radhey Shyam Pandey wrote:
> > The axiethernet driver can use the dmaengine framework to communicate
> > with the xilinx DMAengine driver(AXIDMA, MCDMA). The inspiration behind
> > this dmaengine adoption is to reuse the in-kernel xilinx dma engine
> > driver[1] and remove redundant dma programming sequence[2] from the
> > ethernet driver. This simplifies the ethernet driver and also makes
> > it generic to be hooked to any compliant dma IP i.e AXIDMA, MCDMA
> > without any modification.
> > 
> > The dmaengine framework was extended for metadata API support during
> > the axidma RFC[3] discussion. However, it still needs further
> > enhancements to make it well suited for ethernet usecases.
> > 
> > Comments, suggestions, thoughts to implement remaining functional
> > features are very welcome!
> 
> Vinod, any preference on how this gets merged?
> Since we're already at -rc5 if the dmaengine parts look good to you 
> taking those in for 6.6 and delaying the networking bits until 6.7
> could be on the table? Possibly?

Yep, I am picking the dmaengine bits
Vinod Koul Aug. 21, 2023, 1:52 p.m. UTC | #6
On Mon, 07 Aug 2023 11:21:39 +0530, Radhey Shyam Pandey wrote:
> The axiethernet driver can use the dmaengine framework to communicate
> with the xilinx DMAengine driver(AXIDMA, MCDMA). The inspiration behind
> this dmaengine adoption is to reuse the in-kernel xilinx dma engine
> driver[1] and remove redundant dma programming sequence[2] from the
> ethernet driver. This simplifies the ethernet driver and also makes
> it generic to be hooked to any compliant dma IP i.e AXIDMA, MCDMA
> without any modification.
> 
> [...]

Applied, thanks!

[01/10] dt-bindings: dmaengine: xilinx_dma:Add xlnx,axistream-connected property
        commit: 94afcfb819b3a07e55d463c29e2d594316f40b4a
[02/10] dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property
        commit: e8cfa385054c6aa7ae8dd743d8ea980039a0fc0b
[03/10] dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client
        commit: d8a3f65f6c1de1028b9af6ca31d9dd3738fda97e
[04/10] dmaengine: xilinx_dma: Increase AXI DMA transaction segment count
        commit: 491e9d409629964457d094ac2b99e319d428dd1d
[05/10] dmaengine: xilinx_dma: Freeup active list based on descriptor completion bit
        commit: 7bcdaa65810212c999d21e5c3019d03da37b3be3
[06/10] dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical usecase
        commit: c77d4c5081aa6508623be876afebff003a2e5875
[07/10] dmaengine: xilinx_dma: Program interrupt delay timeout
        commit: 84b798fedf3fa8f0ab0c096593ba817abc454fe5

Best regards,
Pandey, Radhey Shyam Aug. 23, 2023, 5:38 p.m. UTC | #7
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, August 14, 2023 9:00 PM
> To: Pandey, Radhey Shyam <radhey.shyam.pandey@amd.com>
> Cc: vkoul@kernel.org; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org; Simek, Michal
> <michal.simek@amd.com>; davem@davemloft.net; edumazet@google.com;
> pabeni@redhat.com; linux@armlinux.org.uk; dmaengine@vger.kernel.org;
> devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org; git (AMD-Xilinx)
> <git@amd.com>
> Subject: Re: [PATCH net-next v5 10/10] net: axienet: Introduce dmaengine
> support
> 
> On Sat, 12 Aug 2023 15:27:19 +0000 Pandey, Radhey Shyam wrote:
> > > Drop on error, you're not stopping the queue correctly, just drop, return
> OK
> > > and avoid bugs.
> >
> > As I understand it, returning NETDEV_TX_OK means the driver took care
> > of the packet. So, in line with the non-dmaengine xmit
> > (axienet_start_xmit_legacy), should we stop the queue and return TX_BUSY?
> 
> You should only return BUSY if there is no space. All other errors
> should lead to drops, and increment of tx_error. Otherwise problem
> with handling a single packet may stall the NIC forever.
> It is somewhat confusing that we return TX_OK in that case but it
> is what it is.
> 
> > > Why create a cache ?
> > > Isn't it cleaner to create a fake ring buffer of sgl? Most packets will not
> have
> > > MAX_SKB_FRAGS of memory. On a ring buffer you can use only as many
> sg
> > > entries as the packet requires. Also no need to alloc/free.
> >
> > The kmem_cache is used with intent to use slab cache interface and
> > make use of reusing objects in the kernel. slab cache maintains a
> > cache of objects. When we free an object, instead of
> > deallocating it, it gives it back to the cache. Next time, if we
> > want to create a new object, slab cache gives us one object from the
> > slab cache.
> >
> > If we maintain custom circular buffer (struct circ_buf) ring buffer
> > we have to create two such ring buffers one for TX and other for RX.
> > For multichannel this will multiply to * no of queues. Also we have to
> > ensure proper occupancy checks and head/tail pointer updates.
> >
> > With kmem_cache pool we are offloading queue maintenance ops to
> > framework with a benefit of optimized alloc/dealloc. Let me know if it
> > looks functionally fine and can retain it for this baseline dmaengine
> > support version?
> 
> The kmemcache is not the worst possible option but note that the
> objects you're allocating (with zeroing) are 512+ bytes. That's
> pretty large, when most packets will not have full 16 fragments.
> Ring buffer would allow to better match the allocation size to
> the packets. Not to mention that it can be done fully locklessly.

I modified the implementation to use a circular ring buffer for TX
and RX. It seems to be working in initial testing, and perf tests are
now running.

Just had one question on when to submit v6: should I wait till the
dmaengine patches ([01/10]-[07/10]) are part of net-next, or can I
send it now?

Thanks,
Radhey
Jakub Kicinski Aug. 24, 2023, 1:10 a.m. UTC | #8
On Wed, 23 Aug 2023 17:38:58 +0000 Pandey, Radhey Shyam wrote:
> > The kmemcache is not the worst possible option but note that the
> > objects you're allocating (with zeroing) are 512+ bytes. That's
> > pretty large, when most packets will not have full 16 fragments.
> > Ring buffer would allow to better match the allocation size to
> > the packets. Not to mention that it can be done fully locklessly.  
> 
> I modified the implementation to use a circular ring buffer for TX
> and RX. It seems to be working in initial testing and now running 
> perf tests.
> 
> Just had one question on when to submit v6 ? Wait till dmaengine
> patches([01/10-[07/10] is part of net-next? Or can I send it now also.

Assuming Linus cuts final this Sunday - after Sept 10th.