
[net-next,v2,0/9] net: ethernet: ti: add networking support for k3 am65x/j721e soc

Message ID 20200306234734.15014-1-grygorii.strashko@ti.com

Message

Grygorii Strashko March 6, 2020, 11:47 p.m. UTC
Hi

This v2 series adds basic networking support for the TI K3 AM654x/J721E SoCs, which
have a Gigabit Ethernet MAC (Media Access Controller) integrated into the device MCU
domain and named MCU_CPSW0 (CPSW2G NUSS).

Formally, the TRMs refer to CPSW2G NUSS as a two-port Gigabit Ethernet Switch
subsystem, with port 0 being the CPPI DMA host port and port 1 being the external
Ethernet port. But for a single-external-port device it's just Port 0 <-> ALE <-> Port 1,
so it's a device with HW filtering capabilities rather than an actual switching device.
Similar devices with more external ports are expected.

The new Host port 0 Communications Port Programming Interface (CPPI5) is
operated by the TI AM654x/J721E NAVSS Unified DMA Peripheral Root Complex
(UDMA-P) controller [1].

The CPSW2G contains the following modules, for which existing code is re-used:
 - MAC SL: cpsw_sl.c
 - Address Lookup Engine (ALE): cpsw_ale.c, basically compatible with K2 66AK2E/G
 - Management Data Input/Output interface (MDIO): davinci_mdio.c, fully 
   compatible with TI AM3/4/5 devices

Basic features supported by CPSW2G NUSS driver:
 - 802.1Q-compliant VLAN support: auto-add port VLAN for untagged frames on
   ingress, auto VLAN removal on egress, and auto-padding to minimum frame size
 - multicast filtering
 - promisc mode
 - TX multiq support in Round Robin or Fixed priority modes
 - RX checksum offload for non-fragmented IPv4/IPv6 TCP/UDP packets
 - TX checksum offload support for IPv4/IPv6 TCP/UDP packets (J721E only).

Features under development:
 - Support for IEEE 1588 Clock Synchronization. The CPSW2G NUSS includes a new
   version of the Common Platform Time Sync (CPTS) module
 - tc-mqprio: priority level Quality Of Service (QOS) support (802.1p)
 - tc-cbs: Support for Audio/Video Bridging (P802.1Qav/D6.0) HW shapers
 - tc-taprio: IEEE 802.1Qbv/D2.2 Enhancements for Scheduled Traffic
 - frame preemption: IEEE P802.3br/D2.0 Interspersing Express Traffic, 802.1Qbu
 - extended ALE features: classifier/policers, auto-aging

This series depends on:
 [for-next PATCH 0/5] phy: ti: gmii-sel: add support for am654x/j721e soc
 https://lkml.org/lkml/2020/2/22/100

Patches 1-5 are intended for netdev; Patches 6-9 are intended for the TI K3 platform
tree and are provided here for testing purposes.

Changes:
 - fixed DT yaml definition
 - fixed comments from David Miller

v1: https://lwn.net/Articles/813087/

TRMs:
 AM654: http://www.ti.com/lit/ug/spruid7e/spruid7e.pdf
 J721E: http://www.ti.com/lit/ug/spruil1a/spruil1a.pdf

Preliminary documentation can be found at:
 http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/K3_CPSW2g.html

[1] https://lwn.net/Articles/808030/

Grygorii Strashko (9):
  net: ethernet: ti: ale: fix seeing unreg mcast packets with promisc
    and allmulti disabled
  net: ethernet: ti: ale: add support for mac-only mode
  net: ethernet: ti: ale: am65: add support for default thread cfg
  dt-binding: ti: am65x: document mcu cpsw nuss
  net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver
  arm64: dts: ti: k3-am65-mcu: add cpsw nuss node
  arm64: dts: k3-am654-base-board: add mcu cpsw nuss pinmux and phy defs
  arm64: dts: ti: k3-j721e-mcu: add mcu cpsw nuss node
  arm64: dts: ti: k3-j721e-common-proc-board: add mcu cpsw nuss pinmux
    and phy defs

 .../bindings/net/ti,k3-am654-cpsw-nuss.yaml   |  225 ++
 arch/arm64/boot/dts/ti/k3-am65-mcu.dtsi       |   49 +
 arch/arm64/boot/dts/ti/k3-am65.dtsi           |    1 +
 .../arm64/boot/dts/ti/k3-am654-base-board.dts |   42 +
 .../dts/ti/k3-j721e-common-proc-board.dts     |   43 +
 .../boot/dts/ti/k3-j721e-mcu-wakeup.dtsi      |   49 +
 arch/arm64/boot/dts/ti/k3-j721e.dtsi          |    1 +
 drivers/net/ethernet/ti/Kconfig               |   19 +-
 drivers/net/ethernet/ti/Makefile              |    3 +
 drivers/net/ethernet/ti/am65-cpsw-ethtool.c   |  763 +++++++
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      | 1991 +++++++++++++++++
 drivers/net/ethernet/ti/am65-cpsw-nuss.h      |  143 ++
 drivers/net/ethernet/ti/cpsw_ale.c            |   38 +
 drivers/net/ethernet/ti/cpsw_ale.h            |    4 +
 drivers/net/ethernet/ti/k3-udma-desc-pool.c   |  126 ++
 drivers/net/ethernet/ti/k3-udma-desc-pool.h   |   30 +
 16 files changed, 3525 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml
 create mode 100644 drivers/net/ethernet/ti/am65-cpsw-ethtool.c
 create mode 100644 drivers/net/ethernet/ti/am65-cpsw-nuss.c
 create mode 100644 drivers/net/ethernet/ti/am65-cpsw-nuss.h
 create mode 100644 drivers/net/ethernet/ti/k3-udma-desc-pool.c
 create mode 100644 drivers/net/ethernet/ti/k3-udma-desc-pool.h

Comments

Jakub Kicinski March 7, 2020, 1:20 a.m. UTC | #1
> +static void am65_cpsw_get_drvinfo(struct net_device *ndev,
> +				  struct ethtool_drvinfo *info)
> +{
> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +
> +	strlcpy(info->driver, dev_driver_string(common->dev),
> +		sizeof(info->driver));
> +	strlcpy(info->version, AM65_CPSW_DRV_VER, sizeof(info->version));

Please remove the driver version, use of driver versions is being deprecated upstream.

> +	strlcpy(info->bus_info, dev_name(common->dev), sizeof(info->bus_info));
> +}

> +static void am65_cpsw_get_channels(struct net_device *ndev,
> +				   struct ethtool_channels *ch)
> +{
> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +
> +	ch->max_combined = 0;

no need to zero fields

> +	ch->max_rx = AM65_CPSW_MAX_RX_QUEUES;
> +	ch->max_tx = AM65_CPSW_MAX_TX_QUEUES;
> +	ch->max_other = 0;
> +	ch->other_count = 0;
> +	ch->rx_count = AM65_CPSW_MAX_RX_QUEUES;
> +	ch->tx_count = common->tx_ch_num;
> +	ch->combined_count = 0;
> +}
> +
> +static int am65_cpsw_set_channels(struct net_device *ndev,
> +				  struct ethtool_channels *chs)
> +{
> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +
> +	if (chs->combined_count)
> +		return -EINVAL;

core will check if its larger than max_combined

> +	if (!chs->rx_count || !chs->tx_count)
> +		return -EINVAL;
> +
> +	if (chs->rx_count != 1 ||
> +	    chs->tx_count > AM65_CPSW_MAX_TX_QUEUES)
> +		return -EINVAL;

ditto

> +	/* Check if interface is up. Can change the num queues when
> +	 * the interface is down.
> +	 */
> +	if (netif_running(ndev))
> +		return -EBUSY;
> +
> +	am65_cpsw_nuss_remove_tx_chns(common);
> +
> +	return am65_cpsw_nuss_update_tx_chns(common, chs->tx_count);
> +}
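The core-side validation the reviewer refers to can be sketched in plain C. This is a hypothetical simplification (not the actual ethtool core code; `struct chans` and `core_check` are illustrative names): requested counts are compared against the maxima the driver reported from `.get_channels`, so the driver only needs the checks the core cannot make, such as requiring the interface to be down.

```c
/* Toy model of the ethtool-core check the reviewer mentions: requested
 * channel counts are rejected if they exceed the driver-reported maxima,
 * so per-driver range checks duplicate work the core already does. */
struct chans {
	unsigned int rx, tx, combined;
};

static int core_check(const struct chans *req, const struct chans *max)
{
	if (req->rx > max->rx || req->tx > max->tx ||
	    req->combined > max->combined)
		return -1;	/* stands in for -EINVAL */
	return 0;
}
```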
> +
> +static void am65_cpsw_get_ringparam(struct net_device *ndev,
> +				    struct ethtool_ringparam *ering)
> +{
> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +
> +	/* not supported */
> +	ering->tx_max_pending = 0;

no need to zero fields

> +	ering->tx_pending = common->tx_chns[0].descs_num;
> +	ering->rx_max_pending = 0;
> +	ering->rx_pending = common->rx_chns.descs_num;
> +}

> +EXPORT_SYMBOL_GPL(am65_cpsw_nuss_adjust_link);

Why does this need to be exported?

> +	psdata = cppi5_hdesc_get_psdata(desc_rx);
> +	csum_info = psdata[2];
> +	dev_dbg(dev, "%s rx csum_info:%#x\n", __func__, csum_info);
> +
> +	dma_unmap_single(dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
> +
> +	k3_udma_desc_pool_free(rx_chn->desc_pool, desc_rx);
> +
> +	if (unlikely(!netif_running(skb->dev))) {

This is strange, does am65_cpsw_nuss_ndo_slave_stop() not stop RX?

> +		dev_kfree_skb_any(skb);
> +		return 0;
> +	}
> +
> +	new_skb = netdev_alloc_skb_ip_align(ndev, AM65_CPSW_MAX_PACKET_SIZE);
> +	if (new_skb) {
> +		skb_put(skb, pkt_len);
> +		skb->protocol = eth_type_trans(skb, ndev);
> +		am65_cpsw_nuss_rx_csum(skb, csum_info);
> +		napi_gro_receive(&common->napi_rx, skb);
> +
> +		ndev_priv = netdev_priv(ndev);
> +		stats = this_cpu_ptr(ndev_priv->stats);
> +
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->rx_packets++;
> +		stats->rx_bytes += pkt_len;
> +		u64_stats_update_end(&stats->syncp);
> +		kmemleak_not_leak(new_skb);
> +	} else {
> +		ndev->stats.rx_dropped++;
> +		new_skb = skb;
> +	}
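The refill pattern in the quoted RX path can be modelled in userspace C. This is only a sketch (buffer and stats types are stand-ins, not kernel API): the replacement buffer is allocated first, and only if that succeeds is the filled buffer handed up; otherwise the packet is dropped and its buffer recycled, so the ring never loses a slot.

```c
#include <stdlib.h>

/* Stand-ins for the skb and per-cpu stats in the quoted code. */
struct buf { char data[2048]; };
struct rx_stats { unsigned long rx_packets, rx_dropped; };

/* Returns the buffer to refill the ring slot with: a fresh one on
 * success, or the old (recycled) one if allocation failed. */
struct buf *rx_refill_or_drop(struct buf *filled, struct rx_stats *st,
			      void (*deliver)(struct buf *))
{
	struct buf *fresh = malloc(sizeof(*fresh));

	if (fresh) {
		deliver(filled);	/* hand the filled buffer up */
		st->rx_packets++;
		return fresh;
	}
	/* allocation failed: drop the packet, reuse its buffer */
	st->rx_dropped++;
	return filled;
}
```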

> +static int am65_cpsw_nuss_tx_poll(struct napi_struct *napi_tx, int budget)
> +{
> +	struct am65_cpsw_tx_chn *tx_chn = am65_cpsw_napi_to_tx_chn(napi_tx);
> +	int num_tx;
> +
> +	num_tx = am65_cpsw_nuss_tx_compl_packets(tx_chn->common, tx_chn->id,
> +						 budget);
> +	if (num_tx < budget) {

The budget is for RX, you can just complete all TX on a NAPI poll.

> +		napi_complete(napi_tx);
> +		enable_irq(tx_chn->irq);
> +	}
> +
> +	return num_tx;
> +}

> +static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
> +						 struct net_device *ndev)
> +{
> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +	struct cppi5_host_desc_t *first_desc, *next_desc, *cur_desc;
> +	struct am65_cpsw_port *port = am65_ndev_to_port(ndev);
> +	struct device *dev = common->dev;
> +	struct am65_cpsw_tx_chn *tx_chn;
> +	struct netdev_queue *netif_txq;
> +	dma_addr_t desc_dma, buf_dma;
> +	int ret, q_idx, i;
> +	void **swdata;
> +	u32 *psdata;
> +	u32 pkt_len;
> +
> +	/* frag list based linkage is not supported for now. */
> +	if (skb_shinfo(skb)->frag_list) {
> +		dev_err_ratelimited(dev, "NETIF_F_FRAGLIST not supported\n");
> +		ret = -EINVAL;
> +		goto drop_free_skb;
> +	}

You don't advertise the feature, there is no need for this check.

> +	/* padding enabled in hw */
> +	pkt_len = skb_headlen(skb);
> +
> +	q_idx = skb_get_queue_mapping(skb);
> +	dev_dbg(dev, "%s skb_queue:%d\n", __func__, q_idx);
> +	q_idx = q_idx % common->tx_ch_num;

You should never see a packet for ring above your ring count, this
modulo is unnecessary.

> +	tx_chn = &common->tx_chns[q_idx];
> +	netif_txq = netdev_get_tx_queue(ndev, q_idx);
> +
> +	/* Map the linear buffer */
> +	buf_dma = dma_map_single(dev, skb->data, pkt_len,
> +				 DMA_TO_DEVICE);
> +	if (unlikely(dma_mapping_error(dev, buf_dma))) {
> +		dev_err(dev, "Failed to map tx skb buffer\n");

You probably don't want to print errors when memory gets depleted.
Counter is enough

> +		ret = -EINVAL;

EINVAL is not a valid netdev_tx_t..

> +		ndev->stats.tx_errors++;
> +		goto drop_stop_q;

Why stop queue on memory mapping error? What will re-enable it?

> +	}
> +
> +	first_desc = k3_udma_desc_pool_alloc(tx_chn->desc_pool);
> +	if (!first_desc) {
> +		dev_dbg(dev, "Failed to allocate descriptor\n");

ret not set

> +		dma_unmap_single(dev, buf_dma, pkt_len, DMA_TO_DEVICE);
> +		goto drop_stop_q_busy;
> +	}

> +done_tx:
> +	skb_tx_timestamp(skb);
> +
> +	/* report bql before sending packet */
> +	netdev_tx_sent_queue(netif_txq, pkt_len);
> +
> +	cppi5_hdesc_set_pktlen(first_desc, pkt_len);
> +	desc_dma = k3_udma_desc_pool_virt2dma(tx_chn->desc_pool, first_desc);
> +	ret = k3_udma_glue_push_tx_chn(tx_chn->tx_chn, first_desc, desc_dma);
> +	if (ret) {
> +		dev_err(dev, "can't push desc %d\n", ret);
> +		ndev->stats.tx_errors++;
> +		goto drop_free_descs;

BQL already counted this frame.
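The BQL point can be illustrated with a toy counter model (illustrative only, not the kernel's BQL implementation): every byte reported as sent must eventually be reported as completed, so counting a frame as sent and then dropping it on a failed push leaves the in-flight counter permanently inflated.

```c
/* Toy model of BQL accounting: netdev_tx_sent_queue() and
 * netdev_tx_completed_queue() must pair up byte-for-byte. */
struct bql { long inflight; };

static void bql_sent(struct bql *b, long bytes)      { b->inflight += bytes; }
static void bql_completed(struct bql *b, long bytes) { b->inflight -= bytes; }
```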

> +	}
> +
> +	if (k3_udma_desc_pool_avail(tx_chn->desc_pool) < MAX_SKB_FRAGS) {
> +		netif_tx_stop_queue(netif_txq);
> +		/* Barrier, so that stop_queue visible to other cpus */
> +		smp_mb__after_atomic();
> +		dev_err(dev, "netif_tx_stop_queue %d\n", q_idx);

How many descriptors are there if it's okay to print an error when
descriptors run out? :o

> +		/* re-check for smp */
> +		if (k3_udma_desc_pool_avail(tx_chn->desc_pool) >=
> +		    MAX_SKB_FRAGS) {
> +			netif_tx_wake_queue(netif_txq);
> +			dev_err(dev, "netif_tx_wake_queue %d\n", q_idx);
> +		}
> +	}
> +
> +	return NETDEV_TX_OK;
> +
> +drop_free_descs:
> +	am65_cpsw_nuss_xmit_free(tx_chn, dev, first_desc);
> +drop_stop_q:
> +	netif_tx_stop_queue(netif_txq);
> +drop_free_skb:
> +	ndev->stats.tx_dropped++;
> +	dev_kfree_skb_any(skb);
> +	return ret;

return NETDEV_TX_OK;

> +
> +drop_stop_q_busy:
> +	netif_tx_stop_queue(netif_txq);
> +	return NETDEV_TX_BUSY;
> +}
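The stop/re-check sequence in the quoted xmit path is the classic producer/consumer race guard. A userspace C11 sketch of the same logic (plain atomics standing in for the kernel primitives; names and the watermark are illustrative): without the re-check, a completion that frees descriptors between the availability check and the stop could leave the queue stopped forever, because the completion side only wakes a queue it observes as stopped.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LOW_WATER 16	/* illustrative; the driver uses MAX_SKB_FRAGS */

struct txq {
	atomic_int avail;	/* free descriptors */
	atomic_bool stopped;
};

static void tx_maybe_stop(struct txq *q)
{
	if (atomic_load(&q->avail) < LOW_WATER) {
		atomic_store(&q->stopped, true);
		/* seq_cst atomics order the store and re-load here; the
		 * kernel code needs the explicit smp_mb__after_atomic() */
		if (atomic_load(&q->avail) >= LOW_WATER)
			atomic_store(&q->stopped, false); /* lost race: re-wake */
	}
}
```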

> +static int am65_cpsw_nuss_ndev_add_napi_2g(struct am65_cpsw_common *common)
> +{
> +	struct device *dev = common->dev;
> +	struct am65_cpsw_port *port;
> +	int i, ret = 0;
> +
> +	port = am65_common_get_port(common, 1);
> +
> +	for (i = 0; i < common->tx_ch_num; i++) {
> +		struct am65_cpsw_tx_chn	*tx_chn = &common->tx_chns[i];
> +
> +		netif_tx_napi_add(port->ndev, &tx_chn->napi_tx,
> +				  am65_cpsw_nuss_tx_poll, NAPI_POLL_WEIGHT);
> +
> +		ret = devm_request_irq(dev, tx_chn->irq,
> +				       am65_cpsw_nuss_tx_irq,
> +				       0, tx_chn->tx_chn_name, tx_chn);
> +		if (ret) {
> +			dev_err(dev, "failure requesting tx%u irq %u, %d\n",
> +				tx_chn->id, tx_chn->irq, ret);
> +			goto err;

If this fails half way through the loop is there something that'd call 
netif_tx_napi_del()?

> +		}
> +	}
> +
> +err:
> +	return ret;
> +}

> +	/* register devres action here, so dev will be disabled
> +	 * at right moment. The dev will be enabled in .remove() callback
> +	 * unconditionally.
> +	 */
> +	ret = devm_add_action_or_reset(dev, am65_cpsw_pm_runtime_free, dev);
> +	if (ret) {
> +		dev_err(dev, "failed to add pm put reset action %d", ret);
> +		return ret;
> +	}

Could you explain why you need this? Why can't remove disable PM?

In general looks to me like this driver abuses devm_ infra in ways
which make it more complex than it needs to be :(

> +	ret = devm_of_platform_populate(dev);
> +	/* We do not want to force this, as in some cases may not have child */
> +	if (ret)
> +		dev_warn(dev, "populating child nodes err:%d\n", ret);
> +
> +	am65_cpsw_nuss_get_ver(common);
Grygorii Strashko March 7, 2020, 5:19 a.m. UTC | #2
Hi Jakub,

Thank you for your review.

On 07/03/2020 03:20, Jakub Kicinski wrote:
>> +static void am65_cpsw_get_drvinfo(struct net_device *ndev,
>> +				  struct ethtool_drvinfo *info)
>> +{
>> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +
>> +	strlcpy(info->driver, dev_driver_string(common->dev),
>> +		sizeof(info->driver));
>> +	strlcpy(info->version, AM65_CPSW_DRV_VER, sizeof(info->version));
> 
> Please remove the driver version, use of driver versions is being deprecated upstream.

Hm. I can remove it, np. But how would I or anybody else know that it's deprecated?

  * @get_drvinfo: Report driver/device information.  Should only set the
  *	@driver, @version, @fw_version and @bus_info fields.  If not
  *	implemented, the @driver and @bus_info fields will be filled in
  *	according to the netdev's parent device.

  * struct ethtool_drvinfo - general driver and device information
..
  * @version: Driver version string; may be an empty string

It seems not marked as deprecated.

> 
>> +	strlcpy(info->bus_info, dev_name(common->dev), sizeof(info->bus_info));
>> +}
> 
>> +static void am65_cpsw_get_channels(struct net_device *ndev,
>> +				   struct ethtool_channels *ch)
>> +{
>> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +
>> +	ch->max_combined = 0;
> 
> no need to zero fields

[...]

> 
>> +	psdata = cppi5_hdesc_get_psdata(desc_rx);
>> +	csum_info = psdata[2];
>> +	dev_dbg(dev, "%s rx csum_info:%#x\n", __func__, csum_info);
>> +
>> +	dma_unmap_single(dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
>> +
>> +	k3_udma_desc_pool_free(rx_chn->desc_pool, desc_rx);
>> +
>> +	if (unlikely(!netif_running(skb->dev))) {
> 
> This is strange, does am65_cpsw_nuss_ndo_slave_stop() not stop RX?

The net core will set __LINK_STATE_START = 0 before calling .ndo_stop(), and there could be
some time gap between clearing __LINK_STATE_START and actually disabling RX.
If NAPI is in progress, this just allows the current NAPI cycle to complete by discarding
unwanted packets without updating statistics.


> 
>> +		dev_kfree_skb_any(skb);
>> +		return 0;
>> +	}
>> +
>> +	new_skb = netdev_alloc_skb_ip_align(ndev, AM65_CPSW_MAX_PACKET_SIZE);
>> +	if (new_skb) {
>> +		skb_put(skb, pkt_len);
>> +		skb->protocol = eth_type_trans(skb, ndev);
>> +		am65_cpsw_nuss_rx_csum(skb, csum_info);
>> +		napi_gro_receive(&common->napi_rx, skb);
>> +
>> +		ndev_priv = netdev_priv(ndev);
>> +		stats = this_cpu_ptr(ndev_priv->stats);
>> +
>> +		u64_stats_update_begin(&stats->syncp);
>> +		stats->rx_packets++;
>> +		stats->rx_bytes += pkt_len;
>> +		u64_stats_update_end(&stats->syncp);
>> +		kmemleak_not_leak(new_skb);
>> +	} else {
>> +		ndev->stats.rx_dropped++;
>> +		new_skb = skb;
>> +	}
> 
>> +static int am65_cpsw_nuss_tx_poll(struct napi_struct *napi_tx, int budget)
>> +{
>> +	struct am65_cpsw_tx_chn *tx_chn = am65_cpsw_napi_to_tx_chn(napi_tx);
>> +	int num_tx;
>> +
>> +	num_tx = am65_cpsw_nuss_tx_compl_packets(tx_chn->common, tx_chn->id,
>> +						 budget);
>> +	if (num_tx < budget) {
> 
> The budget is for RX, you can just complete all TX on a NAPI poll.

Then TX will block RX, right? These are soft IRQs, which are executed one by one.

> 
>> +		napi_complete(napi_tx);
>> +		enable_irq(tx_chn->irq);
>> +	}
>> +
>> +	return num_tx;
>> +}
> 
>> +static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
>> +						 struct net_device *ndev)
>> +{
>> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +	struct cppi5_host_desc_t *first_desc, *next_desc, *cur_desc;
>> +	struct am65_cpsw_port *port = am65_ndev_to_port(ndev);
>> +	struct device *dev = common->dev;
>> +	struct am65_cpsw_tx_chn *tx_chn;
>> +	struct netdev_queue *netif_txq;
>> +	dma_addr_t desc_dma, buf_dma;
>> +	int ret, q_idx, i;
>> +	void **swdata;
>> +	u32 *psdata;
>> +	u32 pkt_len;
>> +
>> +	/* frag list based linkage is not supported for now. */
>> +	if (skb_shinfo(skb)->frag_list) {
>> +		dev_err_ratelimited(dev, "NETIF_F_FRAGLIST not supported\n");
>> +		ret = -EINVAL;
>> +		goto drop_free_skb;
>> +	}
> 
> You don't advertise the feature, there is no need for this check.
> 
>> +	/* padding enabled in hw */
>> +	pkt_len = skb_headlen(skb);
>> +
>> +	q_idx = skb_get_queue_mapping(skb);
>> +	dev_dbg(dev, "%s skb_queue:%d\n", __func__, q_idx);
>> +	q_idx = q_idx % common->tx_ch_num;
> 
> You should never see a packet for ring above your ring count, this
> modulo is unnecessary.
> 
>> +	tx_chn = &common->tx_chns[q_idx];
>> +	netif_txq = netdev_get_tx_queue(ndev, q_idx);
>> +
>> +	/* Map the linear buffer */
>> +	buf_dma = dma_map_single(dev, skb->data, pkt_len,
>> +				 DMA_TO_DEVICE);
>> +	if (unlikely(dma_mapping_error(dev, buf_dma))) {
>> +		dev_err(dev, "Failed to map tx skb buffer\n");
> 
> You probably don't want to print errors when memory gets depleted.
> Counter is enough

Could you please help me understand what the relation is between "memory depletion"
and dma_mapping_error()?

> 
>> +		ret = -EINVAL;
> 
> EINVAL is not a valid netdev_tx_t..

Considering dev_xmit_complete() - this part was always "black magic" to me :(
Will fix.

> 
>> +		ndev->stats.tx_errors++;
>> +		goto drop_stop_q;
> 
> Why stop queue on memory mapping error? What will re-enable it?

It will not. I'm considering it critical - no recovery.

> 
>> +	}
>> +
>> +	first_desc = k3_udma_desc_pool_alloc(tx_chn->desc_pool);
>> +	if (!first_desc) {
>> +		dev_dbg(dev, "Failed to allocate descriptor\n");
> 
> ret not set

It will return NETDEV_TX_BUSY in this case - below.

> 
>> +		dma_unmap_single(dev, buf_dma, pkt_len, DMA_TO_DEVICE);
>> +		goto drop_stop_q_busy;
>> +	}
> 

[...]

> 
>> +static int am65_cpsw_nuss_ndev_add_napi_2g(struct am65_cpsw_common *common)
>> +{
>> +	struct device *dev = common->dev;
>> +	struct am65_cpsw_port *port;
>> +	int i, ret = 0;
>> +
>> +	port = am65_common_get_port(common, 1);
>> +
>> +	for (i = 0; i < common->tx_ch_num; i++) {
>> +		struct am65_cpsw_tx_chn	*tx_chn = &common->tx_chns[i];
>> +
>> +		netif_tx_napi_add(port->ndev, &tx_chn->napi_tx,
>> +				  am65_cpsw_nuss_tx_poll, NAPI_POLL_WEIGHT);
>> +
>> +		ret = devm_request_irq(dev, tx_chn->irq,
>> +				       am65_cpsw_nuss_tx_irq,
>> +				       0, tx_chn->tx_chn_name, tx_chn);
>> +		if (ret) {
>> +			dev_err(dev, "failure requesting tx%u irq %u, %d\n",
>> +				tx_chn->id, tx_chn->irq, ret);
>> +			goto err;
> 
> If this fails half way through the loop is there something that'd call
> netif_tx_napi_del()?

free_netdev()

> 
>> +		}
>> +	}
>> +
>> +err:
>> +	return ret;
>> +}
> 
>> +	/* register devres action here, so dev will be disabled
>> +	 * at right moment. The dev will be enabled in .remove() callback
>> +	 * unconditionally.
>> +	 */
>> +	ret = devm_add_action_or_reset(dev, am65_cpsw_pm_runtime_free, dev);
>> +	if (ret) {
>> +		dev_err(dev, "failed to add pm put reset action %d", ret);
>> +		return ret;
>> +	}
> 
> Could you explain why you need this? Why can't remove disable PM?
> 
> In general looks to me like this driver abuses devm_ infra in ways
> which make it more complex than it needs to be :(

Sry, can't agree here. This allows me to keep the release sequence sane and get
rid of almost all gotos for failure cases (which are a constant source of errors for
complex initialization sequences) by using a standard framework.

Regarding PM runtime:
  - many Linux core frameworks provide devm_ APIs, and this and other
drivers are happy to use them.
  - but there is a problem: the DD release sequence is

	drv->remove(dev);

	devres_release_all(dev);

and there is no devm_ API for PM runtime. So, if some initialization step is done with a devm_ API
and depends on the device being active, there is no way to solve it easily in the .remove() callback.
For example, devm_of_platform_populate().

With above code I ensure that:

drv->remove(dev);
  |- pm_runtime_get()

devres_release_all(dev);
  |- devm_of_platform_populate_release()
  |- pm_runtime_put()
  |- pm_runtime_disable()
Jakub Kicinski March 9, 2020, 7:36 p.m. UTC | #3
On Sat, 7 Mar 2020 07:19:17 +0200 Grygorii Strashko wrote:
> Hi Jakub,
> 
> Thank you for your review.
> 
> On 07/03/2020 03:20, Jakub Kicinski wrote:
> >> +static void am65_cpsw_get_drvinfo(struct net_device *ndev,
> >> +				  struct ethtool_drvinfo *info)
> >> +{
> >> +	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> >> +
> >> +	strlcpy(info->driver, dev_driver_string(common->dev),
> >> +		sizeof(info->driver));
> >> +	strlcpy(info->version, AM65_CPSW_DRV_VER, sizeof(info->version));  
> > 
> > Please remove the driver version, use of driver versions is being deprecated upstream.  
> 
> Hm. I can remove it np. But how do I or anybody else can know that it's deprecated
> 
>   * @get_drvinfo: Report driver/device information.  Should only set the
>   *	@driver, @version, @fw_version and @bus_info fields.  If not
>   *	implemented, the @driver and @bus_info fields will be filled in
>   *	according to the netdev's parent device.
> 
>   * struct ethtool_drvinfo - general driver and device information
> ..
>   * @version: Driver version string; may be an empty string
> 
> It seems not marked as deprecated.

Thanks, it's _being_ deprecated, by which I mean we are slowly removing
the use, and will mark it as deprecated afterwards. I take your point
that we should have started with marking..

> >> +	psdata = cppi5_hdesc_get_psdata(desc_rx);
> >> +	csum_info = psdata[2];
> >> +	dev_dbg(dev, "%s rx csum_info:%#x\n", __func__, csum_info);
> >> +
> >> +	dma_unmap_single(dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
> >> +
> >> +	k3_udma_desc_pool_free(rx_chn->desc_pool, desc_rx);
> >> +
> >> +	if (unlikely(!netif_running(skb->dev))) {  
> > 
> > This is strange, does am65_cpsw_nuss_ndo_slave_stop() not stop RX?  
> 
> Net core will set __LINK_STATE_START = 0 before calling .ndo_stop() and there could some time gap
> between clearing __LINK_STATE_START and actually disabling RX.
> if NAPI is in progress it will just allow to complete current NAPI cycle by discarding unwanted packets
> and without statistic update.

That's fine, let the core discard those packets if it wants to.
Disabling the interface while the traffic is flowing is a rare
occurrence, it's a waste of cycles to check for every packet.

> >> +		dev_kfree_skb_any(skb);
> >> +		return 0;
> >> +	}
> >> +
> >> +	new_skb = netdev_alloc_skb_ip_align(ndev, AM65_CPSW_MAX_PACKET_SIZE);
> >> +	if (new_skb) {
> >> +		skb_put(skb, pkt_len);
> >> +		skb->protocol = eth_type_trans(skb, ndev);
> >> +		am65_cpsw_nuss_rx_csum(skb, csum_info);
> >> +		napi_gro_receive(&common->napi_rx, skb);
> >> +
> >> +		ndev_priv = netdev_priv(ndev);
> >> +		stats = this_cpu_ptr(ndev_priv->stats);
> >> +
> >> +		u64_stats_update_begin(&stats->syncp);
> >> +		stats->rx_packets++;
> >> +		stats->rx_bytes += pkt_len;
> >> +		u64_stats_update_end(&stats->syncp);
> >> +		kmemleak_not_leak(new_skb);
> >> +	} else {
> >> +		ndev->stats.rx_dropped++;
> >> +		new_skb = skb;
> >> +	}  
> >   
> >> +static int am65_cpsw_nuss_tx_poll(struct napi_struct *napi_tx, int budget)
> >> +{
> >> +	struct am65_cpsw_tx_chn *tx_chn = am65_cpsw_napi_to_tx_chn(napi_tx);
> >> +	int num_tx;
> >> +
> >> +	num_tx = am65_cpsw_nuss_tx_compl_packets(tx_chn->common, tx_chn->id,
> >> +						 budget);
> >> +	if (num_tx < budget) {  
> > 
> > The budget is for RX, you can just complete all TX on a NAPI poll.  
> 
> Then TX will block RX. Right? this is soft IRQs which are executed one by one.

If anything, completing all TX frames makes it more likely that RX will have
free memory to allocate from. AFAIK live lock by TX completions is unheard of.

> >> +	tx_chn = &common->tx_chns[q_idx];
> >> +	netif_txq = netdev_get_tx_queue(ndev, q_idx);
> >> +
> >> +	/* Map the linear buffer */
> >> +	buf_dma = dma_map_single(dev, skb->data, pkt_len,
> >> +				 DMA_TO_DEVICE);
> >> +	if (unlikely(dma_mapping_error(dev, buf_dma))) {
> >> +		dev_err(dev, "Failed to map tx skb buffer\n");  
> > 
> > You probably don't want to print errors when memory gets depleted.
> > Counter is enough  
> 
> Could you please help me understand what is the relation between "memory depletion"
> and dma_mapping_error()?

I don't know your platform, so the comment may not be accurate, but
usually DMA mappings fail if the device has some memory addressing
constraints, and all memory which can satisfy those is used. Or the
memory situation is extremely dire and IOMMU driver can't allocate
meta data.

> >> +		ret = -EINVAL;  
> > 
> > EINVAL is not a valid netdev_tx_t..  
> 
> Considering dev_xmit_complete() - this part was always "black magic" to me :(
> Will fix.
> 
> >> +		ndev->stats.tx_errors++;
> >> +		goto drop_stop_q;  
> > 
> > Why stop queue on memory mapping error? What will re-enable it?  
> 
> it will not. I'm considering it as critical - no recovery.

Oh, I see. If you're sure this can never happen (tm) on your platform
I'd throw in a WARN_ON_ONCE() in this path so users know what's going
on. That said it doesn't seem too hard to recover from, so normal error
handling would be best.

> >> +	}
> >> +
> >> +	first_desc = k3_udma_desc_pool_alloc(tx_chn->desc_pool);
> >> +	if (!first_desc) {
> >> +		dev_dbg(dev, "Failed to allocate descriptor\n");  
> > 
> > ret not set  
> 
> It will return NETDEV_TX_BUSY in this  case - below.
>
> >> +		dma_unmap_single(dev, buf_dma, pkt_len, DMA_TO_DEVICE);
> >> +		goto drop_stop_q_busy;
> >> +	}  

> >> +		}
> >> +	}
> >> +
> >> +err:
> >> +	return ret;
> >> +}  
> >   
> >> +	/* register devres action here, so dev will be disabled
> >> +	 * at right moment. The dev will be enabled in .remove() callback
> >> +	 * unconditionally.
> >> +	 */
> >> +	ret = devm_add_action_or_reset(dev, am65_cpsw_pm_runtime_free, dev);
> >> +	if (ret) {
> >> +		dev_err(dev, "failed to add pm put reset action %d", ret);
> >> +		return ret;
> >> +	}  
> > 
> > Could you explain why you need this? Why can't remove disable PM?
> > 
> > In general looks to me like this driver abuses devm_ infra in ways
> > which make it more complex than it needs to be :(  
> 
> Sry, can't agree here. This allows me to keep release sequence in sane way and get
> rid of mostly all goto for fail cases (which are constant source of errors for complex
> initialization sequences) by using standard framework.

As a reviewer I can tell you with absolute certainty that using devm
anywhere but in the probe function makes the driver a lot harder to
review.

IMHO the API to remove registered callbacks should be avoided like 
the plague.

> Regarding PM runtime
>   -  many Linux core framework provide devm_ APIs this and other
> drivers are happy to use them.
>   - but, there is the problem: DD release sequence is
> 
> 	drv->remove(dev);
> 
> 	devres_release_all(dev);
> 
> and there is no devm_ API for PM runtime. So, if some initialization step is done with devm_ API and
> It depends on device to be active - no way to solve it in .remove() callback easily.
> For example, devm_of_platform_populate().
> 
> With above code I ensure that:
> 
> drv->remove(dev);
>   |- pm_runtime_get()
> 
> devres_release_all(dev);
>   |- devm_of_platform_populate_release()
>   |- pm_runtime_put()
>   |- pm_runtime_disable()

Add devm_pm_* helpers for PM then?  That'd be the preferred solution
upstream.
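The devres ordering Grygorii relies on can be modelled with a tiny LIFO action stack (names here are illustrative, not kernel API): release actions run in reverse order of registration, which is why a PM put/disable action registered before devm_of_platform_populate() runs after the children are removed, while the device is still active.

```c
/* Toy model of devres: actions registered first are released last,
 * mirroring devres_release_all() walking its list in LIFO order. */
#define MAX_ACTIONS 8

struct devres_stack {
	void (*action[MAX_ACTIONS])(void);
	int n;
};

static void devres_add(struct devres_stack *s, void (*fn)(void))
{
	s->action[s->n++] = fn;
}

static void devres_release_all(struct devres_stack *s)
{
	while (s->n)
		s->action[--s->n]();	/* LIFO: last registered runs first */
}
```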