diff mbox series

[1/7] net/dsa: configure autoneg for CPU port

Message ID 20190910154238.9155-2-bob.beckett@collabora.com
State Changes Requested
Delegated to: David Miller
Headers show
Series net: dsa: mv88e6xxx: features to handle network storms | expand

Commit Message

Robert Beckett Sept. 10, 2019, 3:41 p.m. UTC
Configure autoneg for phy connected CPU ports.
This allows us to use autoneg between the CPU port's phy and the link
partner's phy.
This enables us to negoatiate pause frame transmission to prioritise
packet delivery over throughput.

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 net/dsa/port.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Vivien Didelot Sept. 10, 2019, 4:14 p.m. UTC | #1
Hi Robert,

On Tue, 10 Sep 2019 16:41:47 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> Configure autoneg for phy connected CPU ports.
> This allows us to use autoneg between the CPU port's phy and the link
> partner's phy.
> This enables us to negoatiate pause frame transmission to prioritise
> packet delivery over throughput.
> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  net/dsa/port.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index f071acf2842b..1b6832eac2c5 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
>  		return PTR_ERR(phydev);
>  
>  	if (enable) {
> +		phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII |
> +				    SUPPORTED_AUI | SUPPORTED_FIBRE |
> +				    SUPPORTED_BNC | SUPPORTED_Pause |
> +				    SUPPORTED_Asym_Pause;
> +		phydev->advertising = phydev->supported;
> +

This seems a bit intruisive to me. I'll get back to you.

>  		err = genphy_config_init(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
>  
> +		err = genphy_config_aneg(phydev);
> +		if (err < 0)
> +			goto err_put_dev;
> +
>  		err = genphy_resume(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
Florian Fainelli Sept. 10, 2019, 4:56 p.m. UTC | #2
On 9/10/19 8:41 AM, Robert Beckett wrote:
> Configure autoneg for phy connected CPU ports.
> This allows us to use autoneg between the CPU port's phy and the link
> partner's phy.
> This enables us to negoatiate pause frame transmission to prioritise
> packet delivery over throughput.

s/autoneg/auto-negotiation/
s/phy/PHY/
s/negoatiate/negotiate/
s/prioritise/prioritize/ (maybe the latter is just my US english
dictionary tripping up)

Also the subject should be net: dsa: Configure auto-negotiation for CPU
port to match previous submissions done to that file.

Fixing up that code path sounds reasonable, but are you not hitting the
PHYLINK code path instead?

> 
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  net/dsa/port.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index f071acf2842b..1b6832eac2c5 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
>  		return PTR_ERR(phydev);
>  
>  	if (enable) {
> +		phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII |
> +				    SUPPORTED_AUI | SUPPORTED_FIBRE |
> +				    SUPPORTED_BNC | SUPPORTED_Pause |
> +				    SUPPORTED_Asym_Pause;
> +		phydev->advertising = phydev->supported;
> +
>  		err = genphy_config_init(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
>  
> +		err = genphy_config_aneg(phydev);
> +		if (err < 0)
> +			goto err_put_dev;
> +
>  		err = genphy_resume(phydev);
>  		if (err < 0)
>  			goto err_put_dev;
>
Andrew Lunn Sept. 10, 2019, 6:26 p.m. UTC | #3
On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote:
> This enables us to negoatiate pause frame transmission to prioritise
> packet delivery over throughput.

I don't think we can unconditionally enable this. It is a big
behaviour change, and it is likely to break running systems. It has
affects on QoS, packet prioritisation, etc.

I think there needs to be a configuration knob. But unfortunately, i
don't know of a good place to put this knob. The switch CPU port is
not visible in any way.

    Andrew
Florian Fainelli Sept. 10, 2019, 6:29 p.m. UTC | #4
On 9/10/19 11:26 AM, Andrew Lunn wrote:
> On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote:
>> This enables us to negoatiate pause frame transmission to prioritise
>> packet delivery over throughput.
> 
> I don't think we can unconditionally enable this. It is a big
> behaviour change, and it is likely to break running systems. It has
> affects on QoS, packet prioritisation, etc.
> 
> I think there needs to be a configuration knob. But unfortunately, i
> don't know of a good place to put this knob. The switch CPU port is
> not visible in any way.

Broadcast storm suppression is to be solved at ingress, not on the CPU
port, once this lands on the CPU port, it's game over already.
Robert Beckett Sept. 11, 2019, 9:16 a.m. UTC | #5
On Tue, 2019-09-10 at 11:29 -0700, Florian Fainelli wrote:
> On 9/10/19 11:26 AM, Andrew Lunn wrote:
> > On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote:
> > > This enables us to negoatiate pause frame transmission to
> > > prioritise
> > > packet delivery over throughput.
> > 
> > I don't think we can unconditionally enable this. It is a big
> > behaviour change, and it is likely to break running systems. It has
> > affects on QoS, packet prioritisation, etc.
> > 
> > I think there needs to be a configuration knob. But unfortunately,
> > i
> > don't know of a good place to put this knob. The switch CPU port is
> > not visible in any way.
> 
> Broadcast storm suppression is to be solved at ingress, not on the
> CPU
> port, once this lands on the CPU port, it's game over already.

It is not just for broadcast storm protection. The original issue that
made me look in to all of this turned out to be rx descritor ring
buffer exhaustion due to the CPU not being able to keep up with packet
reception.

Although the simple repro case for it is a broadcast storm, this could
happen with many legitimate small packets, and the correct way to
handle it seems to be pause frames, though I am not traditionally a
network programmer, so my knowledge may be incorrect. Please advise if
you know of a better way to handle that.

Fundamentally, with a phy to phy CPU connection, the CPU MAC may well
wish to enable pause frames for various reasons, so we should strive to
handle that I think.
Robert Beckett Sept. 11, 2019, 9:54 a.m. UTC | #6
On Wed, 2019-09-11 at 10:16 +0100, Robert Beckett wrote:
> On Tue, 2019-09-10 at 11:29 -0700, Florian Fainelli wrote:
> > On 9/10/19 11:26 AM, Andrew Lunn wrote:
> > > On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote:
> > > > This enables us to negoatiate pause frame transmission to
> > > > prioritise
> > > > packet delivery over throughput.
> > > 
> > > I don't think we can unconditionally enable this. It is a big
> > > behaviour change, and it is likely to break running systems. It
> > > has
> > > affects on QoS, packet prioritisation, etc.
> > > 
> > > I think there needs to be a configuration knob. But
> > > unfortunately,
> > > i
> > > don't know of a good place to put this knob. The switch CPU port
> > > is
> > > not visible in any way.
> > 
> > Broadcast storm suppression is to be solved at ingress, not on the
> > CPU
> > port, once this lands on the CPU port, it's game over already.
> 
> It is not just for broadcast storm protection. The original issue
> that
> made me look in to all of this turned out to be rx descritor ring
> buffer exhaustion due to the CPU not being able to keep up with
> packet
> reception.
> 
> Although the simple repro case for it is a broadcast storm, this
> could
> happen with many legitimate small packets, and the correct way to
> handle it seems to be pause frames, though I am not traditionally a
> network programmer, so my knowledge may be incorrect. Please advise
> if
> you know of a better way to handle that.
> 
> Fundamentally, with a phy to phy CPU connection, the CPU MAC may well
> wish to enable pause frames for various reasons, so we should strive
> to
> handle that I think.
> 

As an aside, do any of you have experience of trying to enable PIRL on
the Marvell switches? The first thing I tried was configuring it for
packet number based (rather than byte count based) input rate limiting,
but it never seemed to have any effect even at extreme values that
should in theory have greatly limited the number of packets allowed to
ingress.

After investigating the root cause and finding it was due to the CPU's
inability to process the received packets quickly enough, pause frames
and port prioritization seemed like the correct fix anyway.
Andrew Lunn Sept. 11, 2019, 10:52 p.m. UTC | #7
> It is not just for broadcast storm protection. The original issue that
> made me look in to all of this turned out to be rx descritor ring
> buffer exhaustion due to the CPU not being able to keep up with packet
> reception.

Pause frames does not really solve this problem. The switch will at
some point fill its buffers, and start throwing packets away. Or it
needs to send pause packets it its peers. And then your whole switch
throughput goes down. Packets will always get thrown away, so you need
QoS in your network to give the network hints about which frames is
should throw away first.

..

> Fundamentally, with a phy to phy CPU connection, the CPU MAC may well
> wish to enable pause frames for various reasons, so we should strive to
> handle that I think.

It actually has nothing to do with PHY to PHY connections. You can use
pause frames with direct MAC to MAC connections. PHY auto-negotiation
is one way to indicate both ends support it, but there are also other
ways. e.g.

ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]

on the SoC you could do

ethtool --pause eth0 autoneg off rx on tx on

to force the SoC to send and process pause frames. Ideally i would
prefer a solution like this, since it is not a change of behaviour for
everybody else.

   Andrew
Robert Beckett Sept. 12, 2019, 10:14 a.m. UTC | #8
On Thu, 2019-09-12 at 00:52 +0200, Andrew Lunn wrote:
> > It is not just for broadcast storm protection. The original issue
> > that
> > made me look in to all of this turned out to be rx descritor ring
> > buffer exhaustion due to the CPU not being able to keep up with
> > packet
> > reception.
> 
> Pause frames does not really solve this problem. The switch will at
> some point fill its buffers, and start throwing packets away. Or it
> needs to send pause packets it its peers. And then your whole switch
> throughput goes down. Packets will always get thrown away, so you
> need
> QoS in your network to give the network hints about which frames is
> should throw away first.
> 

Indeed. This is the understanding I was working with.
This patch series enables pause frames, output queue prriority and
strict scheduling to egress the high priority queues first.
This means that when the switch starts dropping frames, it drops from
the lowest priority as the highest ones are delivered at line speed
without issue.

> ..
> 
> > Fundamentally, with a phy to phy CPU connection, the CPU MAC may
> > well
> > wish to enable pause frames for various reasons, so we should
> > strive to
> > handle that I think.
> 
> It actually has nothing to do with PHY to PHY connections. You can
> use
> pause frames with direct MAC to MAC connections. PHY auto-negotiation
> is one way to indicate both ends support it, but there are also other
> ways. e.g.
> 
> ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
> 
> on the SoC you could do
> 
> ethtool --pause eth0 autoneg off rx on tx on
> 
> to force the SoC to send and process pause frames. Ideally i would
> prefer a solution like this, since it is not a change of behaviour
> for
> everybody else.

Good point, well made.
The reason for using autoneg in this series was due to having no netdev
to run ethtool against for the CPU port.
If we go down the route of creating a netdev for the CPU port, then we
could indeed force pause frames at both ends.

However, given that the phy on the marvell switch is capable of autoneg
, is it not reasonable to setup the advertisement and let autoneg take
care of it if using phy to phy connection?

> 
>    Andrew
Andrew Lunn Sept. 12, 2019, 10:43 a.m. UTC | #9
> > It actually has nothing to do with PHY to PHY connections. You can
> > use
> > pause frames with direct MAC to MAC connections. PHY auto-negotiation
> > is one way to indicate both ends support it, but there are also other
> > ways. e.g.
> > 
> > ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
> > 
> > on the SoC you could do
> > 
> > ethtool --pause eth0 autoneg off rx on tx on
> > 
> > to force the SoC to send and process pause frames. Ideally i would
> > prefer a solution like this, since it is not a change of behaviour
> > for
> > everybody else.
> 
> Good point, well made.
> The reason for using autoneg in this series was due to having no netdev
> to run ethtool against for the CPU port.

Do you need one? It is the IMX which is the bottle neck. It is the one
which needs to send pause frames. You have a netdev for that. Have you
checked if the switch will react on pause frames without your
change. Play with the command i give above on the master interface. It
looks like the FEC driver fully supports synchronous pause
configuration.

> However, given that the phy on the marvell switch is capable of
> autoneg , is it not reasonable to setup the advertisement and let
> autoneg take care of it if using phy to phy connection?

Most designs don't use back to back PHYs for the CPU port. They save
the cost and connect MACs back to back using RGMII, or maybe SERDES.
If we are going for a method which can configure pause between the CPU
and the switch, it needs to be generic and work for both setups.

    Andrew
diff mbox series

Patch

diff --git a/net/dsa/port.c b/net/dsa/port.c
index f071acf2842b..1b6832eac2c5 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -538,10 +538,20 @@  static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
 		return PTR_ERR(phydev);
 
 	if (enable) {
+		phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII |
+				    SUPPORTED_AUI | SUPPORTED_FIBRE |
+				    SUPPORTED_BNC | SUPPORTED_Pause |
+				    SUPPORTED_Asym_Pause;
+		phydev->advertising = phydev->supported;
+
 		err = genphy_config_init(phydev);
 		if (err < 0)
 			goto err_put_dev;
 
+		err = genphy_config_aneg(phydev);
+		if (err < 0)
+			goto err_put_dev;
+
 		err = genphy_resume(phydev);
 		if (err < 0)
 			goto err_put_dev;