[v3,0/5] ARM: Add GXP UMAC Support

Message ID 20230816215220.114118-1-nick.hawkins@hpe.com

Message

Hawkins, Nick Aug. 16, 2023, 9:52 p.m. UTC
From: Nick Hawkins <nick.hawkins@hpe.com>

The GXP contains two Ethernet MACs that can be
connected externally to several physical devices. From an external
interface perspective the BMC provides two SERDES interface connections
capable of either SGMII or 1000Base-X operation. The BMC also provides
an RMII interface for sideband connections to external Ethernet controllers.

The primary MAC (umac0) can be mapped to either SGMII/1000Base-X
SERDES interface. The secondary MAC (umac1) can be mapped only to
the second SGMII/1000Base-X SERDES interface, or it can be mapped to
the RMII sideband interface.

The MDIO (mdio0) interface from the primary MAC (umac0) is used for
external PHY status. The MDIO (mdio1) interface from
the secondary MAC (umac1) is routed to the SGMII/1000Base-X IP blocks
on the two SERDES interface connections. In most cases the internal
PHY connects directly to the external PHY.

---

Changes since v2:
 *Removed PHY Configuration from MAC driver to Bootloader
 *Fixed several issues with the Kconfig and Makefile for both
  the mdio and umac driver.
 *Fixed code alignment where applicable
 *Removed MDIO from MAC yaml.
 *Added description to explain the use-ncsi use case.
 *Removed multiple unnecessary functions and function calls
  from MAC driver

Changes since v1:
 *Corrected improper descriptions and use of | in yaml files
 *Used reverse Christmas tree format for network drivers
 *Moved gxp-umac-mdio.c to /mdio/
 *Fixed dependencies on both Kconfigs
 *Added COMPILE_TEST to both Kconfigs
 *Used devm_ functions where possible in both drivers
 *Moved mac-address to inside of port in yaml files
 *Exchanged listing individual yaml files for hpe,gxp*
 *Restricted use of le32

Nick Hawkins (5):
  dt-bindings: net: Add HPE GXP UMAC MDIO
  net: hpe: Add GXP UMAC MDIO
  dt-bindings: net: Add HPE GXP UMAC
  net: hpe: Add GXP UMAC Driver
  MAINTAINERS: HPE: Add GXP UMAC Networking Files

 .../bindings/net/hpe,gxp-umac-mdio.yaml       |  50 ++
 .../devicetree/bindings/net/hpe,gxp-umac.yaml |  97 +++
 MAINTAINERS                                   |   2 +
 drivers/net/ethernet/Kconfig                  |   1 +
 drivers/net/ethernet/Makefile                 |   1 +
 drivers/net/ethernet/hpe/Kconfig              |  32 +
 drivers/net/ethernet/hpe/Makefile             |   1 +
 drivers/net/ethernet/hpe/gxp-umac.c           | 759 ++++++++++++++++++
 drivers/net/ethernet/hpe/gxp-umac.h           |  89 ++
 drivers/net/mdio/Kconfig                      |  13 +
 drivers/net/mdio/Makefile                     |   1 +
 drivers/net/mdio/mdio-gxp-umac.c              | 142 ++++
 12 files changed, 1188 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hpe,gxp-umac-mdio.yaml
 create mode 100644 Documentation/devicetree/bindings/net/hpe,gxp-umac.yaml
 create mode 100644 drivers/net/ethernet/hpe/Kconfig
 create mode 100644 drivers/net/ethernet/hpe/Makefile
 create mode 100644 drivers/net/ethernet/hpe/gxp-umac.c
 create mode 100644 drivers/net/ethernet/hpe/gxp-umac.h
 create mode 100644 drivers/net/mdio/mdio-gxp-umac.c

Comments

Andrew Lunn Aug. 17, 2023, 1:08 a.m. UTC | #1
> Changes since v2:
>  *Removed PHY Configuration from MAC driver to Bootloader

By PHY, you mean PCS?

Can you still dynamically swap between 1000BaseX and SGMII?
Is there an interface to determine if the PCS has link?

Is the PHY code upstream in uboot and barebox? Can you detect when it
is missing because the vendor bootloader has been replaced, and report
useful error messages?

       Andrew
Andrew Lunn Aug. 17, 2023, 1:19 a.m. UTC | #2
On Wed, Aug 16, 2023 at 04:52:17PM -0500, nick.hawkins@hpe.com wrote:
> From: Nick Hawkins <nick.hawkins@hpe.com>
> 
> The GXP contains two Universal Ethernet MACs that can be
> connected externally to several physical devices. From an external
> interface perspective the BMC provides two SERDES interface connections
> capable of either SGMII or 1000Base-X operation. The BMC also provides
> a RMII interface for sideband connections to external Ethernet controllers.
> 
> The primary MAC (umac0) can be mapped to either SGMII/1000-BaseX
> SERDES interface.  The secondary MAC (umac1) can be mapped to only
> the second SGMII/1000-Base X Serdes interface or it can be mapped for
> RMII sideband.
> 
> The MDIO(mdio0) interface from the primary MAC (umac0) is used for
> external PHY status and configuration. The MDIO(mdio1) interface from
> the secondary MAC (umac1) is routed to the SGMII/100Base-X IP blocks

I think that is a typo. 100BaseX does not exist, the nearest is 100BaseFX.

> +config GXP_UMAC_MDIO
> +	tristate "GXP UMAC mdio support"
> +	depends on ARCH_HPE || COMPILE_TEST
> +	depends on OF_MDIO && HAS_IOMEM
> +	depends on MDIO_DEVRES
> +	help
> +	  Say y here to support the GXP UMAC MDIO bus. The
> +	  MDIO (mdio0) interface from the primary MAC (umac0)
> +	  is used for external PHY status and configuration.
> +	  The MDIO (mdio1) interface from the secondary MAC
> +	  (umac1) is routed to the SGMII/100Base-X IP blocks

Same here.

> --- a/drivers/net/mdio/Makefile
> +++ b/drivers/net/mdio/Makefile
> @@ -11,6 +11,7 @@ obj-$(CONFIG_MDIO_BCM_UNIMAC)		+= mdio-bcm-unimac.o
>  obj-$(CONFIG_MDIO_BITBANG)		+= mdio-bitbang.o
>  obj-$(CONFIG_MDIO_CAVIUM)		+= mdio-cavium.o
>  obj-$(CONFIG_MDIO_GPIO)			+= mdio-gpio.o
> +obj-$(CONFIG_GXP_UMAC_MDIO)		+= mdio-gxp-umac.o
>  obj-$(CONFIG_MDIO_HISI_FEMAC)		+= mdio-hisi-femac.o

Don't you think this looks out of place? It is the only entry whose symbol does not start with CONFIG_MDIO_.

> +static int umac_mdio_write(struct mii_bus *bus, int phy_id, int reg, u16 value)
> +{
> +	struct umac_mdio_priv *umac_mdio = bus->priv;
> +	unsigned int status;
> +	int ret;
> +
> +	writel(value, umac_mdio->base + UMAC_MII_DATA);

...

> +	if (ret)
> +		dev_err(bus->parent, "mdio read time out\n");

cut/paste error.


    Andrew

---
pw-bot: cr
Andrew Lunn Aug. 17, 2023, 1:46 a.m. UTC | #3
> +static int umac_ioctl(struct net_device *ndev, struct ifreq *ifr, int cmd)
> +{
> +	if (!netif_running(ndev))
> +		return -EINVAL;
> +
> +	if (!ndev->phydev)
> +		return -ENODEV;
> +
> +	return phy_mii_ioctl(ndev->phydev, ifr, cmd);

Sometimes power management does not allow it, but can you use phy_do_ioctl()?
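
For reference, a sketch of what that substitution looks like if the generic
helper does fit (the other ndo callbacks and the umac_* names here are
illustrative; only ndo_start_xmit is confirmed by the posted driver):

static const struct net_device_ops umac_netdev_ops = {
	.ndo_open	= umac_open,
	.ndo_stop	= umac_stop,
	.ndo_start_xmit	= umac_start_xmit,
	/* phy_do_ioctl_running() checks netif_running() and ndev->phydev
	 * before calling phy_mii_ioctl(), which is what the open-coded
	 * handler above does.
	 */
	.ndo_eth_ioctl	= phy_do_ioctl_running,
};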

> +static int umac_init_hw(struct net_device *ndev)


> +{
> +	} else {
> +		value |= UMAC_CFG_FULL_DUPLEX;
> +
> +		if (ndev->phydev->speed == SPEED_1000) {

I'm pretty sure i pointed this out once before. It is not safe to
access phydev members outside of the adjust link callback.
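
The usual shape of that callback, for illustration (the umac->speed/duplex
fields and the umac_update_link_config() helper are assumed names, not taken
from the posted driver):

static void umac_adjust_link(struct net_device *ndev)
{
	struct phy_device *phydev = ndev->phydev;
	struct umac_priv *umac = netdev_priv(ndev);

	if (phydev->link) {
		/* Snapshot the negotiated parameters here, where they are
		 * stable, and reprogram the MAC from the cached copy
		 * instead of reading phydev members later.
		 */
		umac->speed = phydev->speed;
		umac->duplex = phydev->duplex;
		umac_update_link_config(umac);
	}

	phy_print_status(phydev);
}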

> +static int umac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct umac_priv *umac = netdev_priv(ndev);
> +	struct umac_tx_desc_entry *ptxdesc;
> +	unsigned int length;
> +	u8 *pframe;
> +
> +	ptxdesc = &umac->tx_descs->entrylist[umac->tx_cur];
> +	pframe = umac->tx_descs->framelist[umac->tx_cur];
> +
> +	length = skb->len;
> +	if (length > 1514) {
> +		netdev_err(ndev, "send data %d bytes > 1514, clamp it to 1514\n",
> +			   skb->len);

That should be rate limited.

Also, if you chop the end of the packet, it is going to be useless. It
is better to drop it, to improve your goodput.

> +		length = 1514;
> +	}
> +
> +	memset(pframe, 0, UMAC_MAX_FRAME_SIZE);
> +	memcpy(pframe, skb->data, length);

Is this cached or uncached memory? Uncached is expensive, so you want
to avoid touching it twice. Depending on how busy your cache is,
touching it twice might cause it to be expelled from L1 on the first
write, so you could be writing to L2 twice for no reason. Do the math
and calculate the tail space you need to zero.
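
A sketch of that, assuming the hardware transmits 'length' bytes from the
descriptor and only short frames need padding up to the 60 byte minimum:

	memcpy(pframe, skb->data, length);
	if (length < ETH_ZLEN) {
		/* Zero only the padding the MAC will actually send. */
		memset(pframe + length, 0, ETH_ZLEN - length);
		length = ETH_ZLEN;
	}

Alternatively, skb_put_padto(skb, ETH_ZLEN) pads the skb itself before the
copy.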

I would also suggest you look at the page pool code and use that for
all your buffer handling. It is likely to be more efficient than what
you have here.

> +static int umac_setup_phy(struct net_device *ndev)
> +{
> +	struct umac_priv *umac = netdev_priv(ndev);
> +	struct platform_device *pdev = umac->pdev;
> +	struct device_node *phy_handle;
> +	phy_interface_t interface;
> +	struct device_node *eth_ports_np;
> +	struct device_node *port_np;
> +	int ret;
> +	int i;
> +
> +	/* Get child node ethernet-ports. */
> +	eth_ports_np = of_get_child_by_name(pdev->dev.of_node, "ethernet-ports");
> +	if (!eth_ports_np) {
> +		dev_err(&pdev->dev, "No ethernet-ports child node found!\n");
> +		return -ENODEV;
> +	}
> +
> +	for (i = 0; i < NUMBER_OF_PORTS; i++) {
> +		/* Get port@i of node ethernet-ports */
> +		port_np = gxp_umac_get_eth_child_node(eth_ports_np, i);
> +		if (!port_np)
> +			break;
> +
> +		if (i == INTERNAL_PORT) {
> +			phy_handle = of_parse_phandle(port_np, "phy-handle", 0);
> +			if (phy_handle) {
> +				umac->int_phy_dev = of_phy_find_device(phy_handle);
> +				if (!umac->int_phy_dev)
> +					return -ENODEV;

You appear to find the PHY, and then do nothing with it?

    Andrew
Andrew Lunn Aug. 17, 2023, 1:47 a.m. UTC | #4
On Wed, Aug 16, 2023 at 04:52:20PM -0500, nick.hawkins@hpe.com wrote:
> From: Nick Hawkins <nick.hawkins@hpe.com>
> 
> List the files added for supporting the UMAC networking on GXP.
> 
> Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Simon Horman Aug. 17, 2023, 7:45 a.m. UTC | #5
On Wed, Aug 16, 2023 at 04:52:19PM -0500, nick.hawkins@hpe.com wrote:
> From: Nick Hawkins <nick.hawkins@hpe.com>
> 
> The GXP contains two Ethernet MACs that can be connected externally
> to several physical devices. From an external interface perspective
> the BMC provides two SERDES interface connections capable of either
> SGMII or 1000Base-X operation. The BMC also provides a RMII interface
> for sideband connections to external Ethernet controllers.
> 
> The primary MAC (umac0) can be mapped to either SGMII/1000-BaseX
> SERDES interface.  The secondary MAC (umac1) can be mapped to only
> the second SGMII/1000-Base X Serdes interface or it can be mapped for
> RMII sideband.
> 
> Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com>

...

> diff --git a/drivers/net/ethernet/hpe/gxp-umac.c b/drivers/net/ethernet/hpe/gxp-umac.c

...

> +static int umac_init_mac_address(struct net_device *ndev)
> +{
> +	struct umac_priv *umac = netdev_priv(ndev);
> +	struct platform_device *pdev = umac->pdev;
> +	char addr[ETH_ALEN];
> +	int err;
> +
> +	err = of_get_mac_address(pdev->dev.of_node, addr);
> +	if (err)
> +		netdev_err(ndev, "Failed to get address from device-tree: %d\n",
> +			   err);
> +		return -EINVAL;

Hi Nick,

it looks like there should be some {} involved in the condition above,
else the function will return -EINVAL unconditionally.

Flagged by W=1 builds with clang-16 and gcc-13.

> +
> +	if (is_valid_ether_addr(addr)) {
> +		dev_addr_set(ndev, addr);
> +		netdev_dbg(ndev,
> +			   "Read MAC address %pM from DTB\n", ndev->dev_addr);
> +	} else {
> +		netdev_err(ndev, "Mac Address is Invalid");
> +		return -EINVAL;
> +	}
> +
> +	dev_addr_set(ndev, addr);
> +	umac_set_mac_address(ndev, addr);
> +
> +	return 0;
> +}

...
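
For clarity, the braced form being pointed at would read:

	err = of_get_mac_address(pdev->dev.of_node, addr);
	if (err) {
		netdev_err(ndev, "Failed to get address from device-tree: %d\n",
			   err);
		return -EINVAL;
	}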
Hawkins, Nick Aug. 17, 2023, 7:19 p.m. UTC | #6
Hi Andrew,

> > +static int umac_init_hw(struct net_device *ndev)


> > +{
> > +	} else {
> > +		value |= UMAC_CFG_FULL_DUPLEX;
> > +
> > +		if (ndev->phydev->speed == SPEED_1000) {

> I'm pretty sure i pointed this out once before. It is not safe to
> access phydev members outside of the adjust link callback.

My mistake, I missed this phydev member access. I will rely on
the adjust_link callback to configure it.

> > +static int umac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> > +{
> > +	struct umac_priv *umac = netdev_priv(ndev);
> > +	struct umac_tx_desc_entry *ptxdesc;
> > +	unsigned int length;
> > +	u8 *pframe;
> > +
> > +	ptxdesc = &umac->tx_descs->entrylist[umac->tx_cur];
> > +	pframe = umac->tx_descs->framelist[umac->tx_cur];
> > +
> > +	length = skb->len;
> > +	if (length > 1514) {
> > +		netdev_err(ndev, "send data %d bytes > 1514, clamp it to 1514\n",
> > +			   skb->len);

> That should be rate limited.

How is this done? Is there a particular function to call that
will handle it in the backend?
I thought rate limiting was already happening in this
function, below at:

/* if current tx ring buffer is full, stop the queue */
        ptxdesc = &umac->tx_descs->entrylist[umac->tx_cur];
        if (ptxdesc->status & UMAC_RING_ENTRY_HW_OWN)
                netif_stop_queue(ndev);

> Also, if you chop the end of the packet, it is going to be useless. It
> is better to drop it, to improve your goodput.

I will drop it.

Thanks,

-Nick Hawkins
Andrew Lunn Aug. 17, 2023, 7:44 p.m. UTC | #7
> > > +	if (length > 1514) {
> > > +		netdev_err(ndev, "send data %d bytes > 1514, clamp it to 1514\n",
> > > +			   skb->len);
> 
> > That should be rate limited.
> 
> How is this done? Is there a particular function to call that
> will handle it in the backend?

Sorry, i was ambiguous. I meant the netdev_err() should be rate
limited, otherwise some user space application could DOS your system
by sending big packets at line rate flooding your logs.

   Andrew
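
Putting both points together, a sketch of the start of umac_start_xmit()
(the tx_dropped accounting is an assumption, not from the posted driver):

	if (skb->len > 1514) {
		/* Drop oversized frames instead of truncating them, and
		 * rate limit the message so user space cannot flood the
		 * log by sending large packets at line rate.
		 */
		if (net_ratelimit())
			netdev_err(ndev, "dropping oversized %u byte frame\n",
				   skb->len);
		dev_kfree_skb_any(skb);
		ndev->stats.tx_dropped++;
		return NETDEV_TX_OK;
	}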
Hawkins, Nick Aug. 18, 2023, 8:11 p.m. UTC | #8
Hi Andrew,

> > + length = 1514;
> > + }
> > +
> > + memset(pframe, 0, UMAC_MAX_FRAME_SIZE);
> > + memcpy(pframe, skb->data, length);


> Is this cached or uncached memory? Uncached is expensive, so you want
> to avoid touching it twice. Depending on how busy your cache is,
> touching it twice might cause it to be expelled from L1 on the first
> write, so you could be writing to L2 twice for no reason. Do the math
> and calculate the tail space you need to zero.

> I would also suggest you look at the page pool code and use that for
> all your buffer handling. It is likely to be more efficient than what
> you have here.

Would this be the #include <linux/dmapool.h> library?

Thank you for the assistance,

-Nick Hawkins
Andrew Lunn Aug. 18, 2023, 9:15 p.m. UTC | #9
> Would this be the #include <linux/dmapool.h> library?

<include/net/page_pool/helpers.h>

Take a look at drivers/net/ethernet/freescale/fec_main.c. That
driver/device is of similar complexity to yours. It had a recent
change from its own buffer management to page pool. It
started with

commit 95698ff6177b5f1f13f251da60e7348413046ae4
Author: Shenwei Wang <shenwei.wang@nxp.com>
Date:   Fri Sep 30 15:44:27 2022 -0500

    net: fec: using page pool to manage RX buffers

but there are additional patches later.

    Andrew
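
A minimal sketch of the RX side of such a conversion, modelled loosely on
fec_main.c (the ring size, the umac->rx_pool field and the surrounding error
handling are illustrative, not from the posted driver):

#include <net/page_pool/helpers.h>

static int umac_create_rx_pool(struct umac_priv *umac, struct device *dev)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP,	/* pool maps pages for DMA */
		.order		= 0,			/* one page per RX buffer */
		.pool_size	= 256,			/* example ring size */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	umac->rx_pool = page_pool_create(&pp_params);
	if (IS_ERR(umac->rx_pool))
		return PTR_ERR(umac->rx_pool);

	return 0;
}

RX refill then allocates buffers with page_pool_dev_alloc_pages() and
page_pool_get_dma_addr(), buffers are returned with
page_pool_put_full_page(), and the pool is torn down with
page_pool_destroy().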
Hawkins, Nick Aug. 22, 2023, 7 p.m. UTC | #10
> <include/net/page_pool/helpers.h>

Hi Andrew,

I can't seem to find this file in linux master. Where is it?

> Take a look at drivers/net/ethernet/freescale/fec_main.c. That
> driver/device is of similar complexity to yours. It had a recent
> change from its own buffer management to page pool. It
> started with

I have looked over this driver and have a couple of questions
about the pages in general.

How do I determine what the correct pool size should be for the
RX and TX?

I must admit I am not familiar with XDP.
Is it required for the page pool library? Does it require any kind
of hardware enablement? Or is it just purely code?

Thanks for the feedback and assistance,

-Nick Hawkins
Andrew Lunn Aug. 22, 2023, 8 p.m. UTC | #11
On Tue, Aug 22, 2023 at 07:00:49PM +0000, Hawkins, Nick wrote:
> 
> > <include/net/page_pool/helpers.h>
> 
> Hi Andrew,
> 
> I can't seem to find this file in linux master. Where is it?

~/linux$ ls include/net/page_pool/helpers.h
include/net/page_pool/helpers.h

When you say master, do you mean net-next/main? This is a network
driver, so you should be based on top of that tree.

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html#netdev-faq

> > Take a look at drivers/net/ethernet/freescale/fec_main.c. That
> > driver/device is of similar complexity to yours. It had a recent
> > change from its own buffer management to page pool. It
> > started with
> 
> I have looked over this driver and have a couple questions
> about the pages in general.
> 
> How do I determine what the correct pool size should be for the
> RX and TX?

There has been some recent discussion about that. Search the netdev
list over the last couple of weeks.

> I must admit I am not familiar with XDP.
> Is it required for the page pool library?

Nope, not required at all. The FEC driver was first converted to page
pool, and then XDP support added. The conversion to page pool made the
driver faster, it could handle more packets per second. That is why i
suggested using it, plus it means less driver code, which means less
bugs.

	Andrew
Hawkins, Nick Aug. 25, 2023, 6:54 p.m. UTC | #12
Hi Andrew,

Thank you for pointing me at the correct repo to use. I was using the
incorrect one.

> Nope, not required at all. The FEC driver was first converted to page
> pool, and then XDP support added. The conversion to page pool made the
> driver faster, it could handle more packets per second. That is why i
> suggested using it, plus it means less driver code, which means less
> bugs.

I have been trying to figure out how exactly I can translate the current code
over to using the page pool API over the past week. It seems like it is quite
a complex change. As the driver seems to be keeping up with our
performance requirements, would it be acceptable to mark this as a
TODO for a future enhancement?

Thank you for the assistance,

-Nick Hawkins
Andrew Lunn Sept. 4, 2023, 3:58 a.m. UTC | #13
> I have been trying to figure out how exactly I can translate the current code
> over to using the page pool API over the past week. It seems like it is quite
> a complex change. As the driver seems to be keeping up with our
> performance requirements, would it be acceptable to mark this as a

It's not just performance. Using well debugged and shared core code
means less bugs in your driver. It makes maintenance simpler since
there are more people who understand the page pool code than what you
have in your driver and it makes your driver more like other drivers.

So overall you will end up with a better quality driver by adapting
the page pool code.

    Andrew
Hawkins, Nick Sept. 11, 2023, 9:12 p.m. UTC | #14
> > I have been trying to figure out how exactly I can translate the current code
> > over to using the page pool API over the past week. It seems like it is quite
> > a complex change. As the driver seems to be keeping up with our
> > performance requirements, would it be acceptable to mark this as a


> It's not just performance. Using well debugged and shared core code
> means less bugs in your driver. It makes maintenance simpler since
> there are more people who understand the page pool code than what you
> have in your driver and it makes your driver more like other drivers.


> So overall you will end up with a better quality driver by adapting
> the page pool code.

Greetings Andrew,

In that case I will continue trying to adopt the page pool
API. In all the examples with page pool HW rings it appears they are
using alloc_etherdev_mqs. Are there any HW requirements to use this
library? If there are none, what is the typical number of RX and TX
queues?

Thank you for the assistance,

-Nick Hawkins
Andrew Lunn Sept. 12, 2023, 11:52 p.m. UTC | #15
> Greetings Andrew,
> 
> In that case I will continue trying to adopt the page pool
> API. In all the examples with page pool HW rings it appears they are
> using alloc_etherdev_mqs. Are there any HW requirements to use this
> library? If there are none, what is the typical number of RX and TX
> queues?

There are no hardware requirements as far as i understand it. If your
hardware only has one RX queue and one TX queue, define it as
1. Having more allows you to spread the load over multiple CPUs, with
each queue typically having its own interrupt, and interrupts are then
mapped to a single CPU. But if you don't have any of that, it should
not be a hindrance.

    Andrew
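
For illustration, with a single ring in each direction the allocation in
probe would just pass explicit counts (the queue numbers here are examples):

	/* 1 TX queue, 1 RX queue */
	ndev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(struct umac_priv),
				       1, 1);
	if (!ndev)
		return -ENOMEM;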