Message ID | 20190910154238.9155-2-bob.beckett@collabora.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | net: dsa: mv88e6xxx: features to handle network storms | expand |
Hi Robert, On Tue, 10 Sep 2019 16:41:47 +0100, Robert Beckett <bob.beckett@collabora.com> wrote: > Configure autoneg for phy connected CPU ports. > This allows us to use autoneg between the CPU port's phy and the link > partner's phy. > This enables us to negoatiate pause frame transmission to prioritise > packet delivery over throughput. > > Signed-off-by: Robert Beckett <bob.beckett@collabora.com> > --- > net/dsa/port.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/net/dsa/port.c b/net/dsa/port.c > index f071acf2842b..1b6832eac2c5 100644 > --- a/net/dsa/port.c > +++ b/net/dsa/port.c > @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable) > return PTR_ERR(phydev); > > if (enable) { > + phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII | > + SUPPORTED_AUI | SUPPORTED_FIBRE | > + SUPPORTED_BNC | SUPPORTED_Pause | > + SUPPORTED_Asym_Pause; > + phydev->advertising = phydev->supported; > + This seems a bit intruisive to me. I'll get back to you. > err = genphy_config_init(phydev); > if (err < 0) > goto err_put_dev; > > + err = genphy_config_aneg(phydev); > + if (err < 0) > + goto err_put_dev; > + > err = genphy_resume(phydev); > if (err < 0) > goto err_put_dev;
On 9/10/19 8:41 AM, Robert Beckett wrote: > Configure autoneg for phy connected CPU ports. > This allows us to use autoneg between the CPU port's phy and the link > partner's phy. > This enables us to negoatiate pause frame transmission to prioritise > packet delivery over throughput. s/autoneg/auto-negotiation/ s/phy/PHY/ s/negoatiate/negotiate/ s/prioritise/prioritize/ (maybe the latter is just my US english dictionary tripping up) Also the subject should be net: dsa: Configure auto-negotiation for CPU port to match previous submissions done to that file. Fixing up that code path sounds reasonable, but are you not hitting the PHYLINK code path instead? > > Signed-off-by: Robert Beckett <bob.beckett@collabora.com> > --- > net/dsa/port.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/net/dsa/port.c b/net/dsa/port.c > index f071acf2842b..1b6832eac2c5 100644 > --- a/net/dsa/port.c > +++ b/net/dsa/port.c > @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable) > return PTR_ERR(phydev); > > if (enable) { > + phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII | > + SUPPORTED_AUI | SUPPORTED_FIBRE | > + SUPPORTED_BNC | SUPPORTED_Pause | > + SUPPORTED_Asym_Pause; > + phydev->advertising = phydev->supported; > + > err = genphy_config_init(phydev); > if (err < 0) > goto err_put_dev; > > + err = genphy_config_aneg(phydev); > + if (err < 0) > + goto err_put_dev; > + > err = genphy_resume(phydev); > if (err < 0) > goto err_put_dev; >
On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote: > This enables us to negoatiate pause frame transmission to prioritise > packet delivery over throughput. I don't think we can unconditionally enable this. It is a big behaviour change, and it is likely to break running systems. It has affects on QoS, packet prioritisation, etc. I think there needs to be a configuration knob. But unfortunately, i don't know of a good place to put this knob. The switch CPU port is not visible in any way. Andrew
On 9/10/19 11:26 AM, Andrew Lunn wrote: > On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote: >> This enables us to negoatiate pause frame transmission to prioritise >> packet delivery over throughput. > > I don't think we can unconditionally enable this. It is a big > behaviour change, and it is likely to break running systems. It has > affects on QoS, packet prioritisation, etc. > > I think there needs to be a configuration knob. But unfortunately, i > don't know of a good place to put this knob. The switch CPU port is > not visible in any way. Broadcast storm suppression is to be solved at ingress, not on the CPU port, once this lands on the CPU port, it's game over already.
On Tue, 2019-09-10 at 11:29 -0700, Florian Fainelli wrote: > On 9/10/19 11:26 AM, Andrew Lunn wrote: > > On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote: > > > This enables us to negoatiate pause frame transmission to > > > prioritise > > > packet delivery over throughput. > > > > I don't think we can unconditionally enable this. It is a big > > behaviour change, and it is likely to break running systems. It has > > affects on QoS, packet prioritisation, etc. > > > > I think there needs to be a configuration knob. But unfortunately, > > i > > don't know of a good place to put this knob. The switch CPU port is > > not visible in any way. > > Broadcast storm suppression is to be solved at ingress, not on the > CPU > port, once this lands on the CPU port, it's game over already. It is not just for broadcast storm protection. The original issue that made me look in to all of this turned out to be rx descritor ring buffer exhaustion due to the CPU not being able to keep up with packet reception. Although the simple repro case for it is a broadcast storm, this could happen with many legitimate small packets, and the correct way to handle it seems to be pause frames, though I am not traditionally a network programmer, so my knowledge may be incorrect. Please advise if you know of a better way to handle that. Fundamentally, with a phy to phy CPU connection, the CPU MAC may well wish to enable pause frames for various reasons, so we should strive to handle that I think.
On Wed, 2019-09-11 at 10:16 +0100, Robert Beckett wrote: > On Tue, 2019-09-10 at 11:29 -0700, Florian Fainelli wrote: > > On 9/10/19 11:26 AM, Andrew Lunn wrote: > > > On Tue, Sep 10, 2019 at 04:41:47PM +0100, Robert Beckett wrote: > > > > This enables us to negoatiate pause frame transmission to > > > > prioritise > > > > packet delivery over throughput. > > > > > > I don't think we can unconditionally enable this. It is a big > > > behaviour change, and it is likely to break running systems. It > > > has > > > affects on QoS, packet prioritisation, etc. > > > > > > I think there needs to be a configuration knob. But > > > unfortunately, > > > i > > > don't know of a good place to put this knob. The switch CPU port > > > is > > > not visible in any way. > > > > Broadcast storm suppression is to be solved at ingress, not on the > > CPU > > port, once this lands on the CPU port, it's game over already. > > It is not just for broadcast storm protection. The original issue > that > made me look in to all of this turned out to be rx descritor ring > buffer exhaustion due to the CPU not being able to keep up with > packet > reception. > > Although the simple repro case for it is a broadcast storm, this > could > happen with many legitimate small packets, and the correct way to > handle it seems to be pause frames, though I am not traditionally a > network programmer, so my knowledge may be incorrect. Please advise > if > you know of a better way to handle that. > > Fundamentally, with a phy to phy CPU connection, the CPU MAC may well > wish to enable pause frames for various reasons, so we should strive > to > handle that I think. > As an aside, do any of you have experience of trying to enable PIRL on the Marvell switches? The first thing I tried was configuring it for packet number based (rather than byte count based) input rate limiting, but it never seemed to have any effect even at extreme values that should in theory have greatly limited the number of packets allowed to ingress. After investigating the root cause and finding it was due to the CPU's inability to process the received packets quickly enough, pause frames and port prioritization seemed like the correct fix anyway.
> It is not just for broadcast storm protection. The original issue that > made me look in to all of this turned out to be rx descritor ring > buffer exhaustion due to the CPU not being able to keep up with packet > reception. Pause frames does not really solve this problem. The switch will at some point fill its buffers, and start throwing packets away. Or it needs to send pause packets it its peers. And then your whole switch throughput goes down. Packets will always get thrown away, so you need QoS in your network to give the network hints about which frames is should throw away first. .. > Fundamentally, with a phy to phy CPU connection, the CPU MAC may well > wish to enable pause frames for various reasons, so we should strive to > handle that I think. It actually has nothing to do with PHY to PHY connections. You can use pause frames with direct MAC to MAC connections. PHY auto-negotiation is one way to indicate both ends support it, but there are also other ways. e.g. ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off] on the SoC you could do ethtool --pause eth0 autoneg off rx on tx on to force the SoC to send and process pause frames. Ideally i would prefer a solution like this, since it is not a change of behaviour for everybody else. Andrew
On Thu, 2019-09-12 at 00:52 +0200, Andrew Lunn wrote: > > It is not just for broadcast storm protection. The original issue > > that > > made me look in to all of this turned out to be rx descritor ring > > buffer exhaustion due to the CPU not being able to keep up with > > packet > > reception. > > Pause frames does not really solve this problem. The switch will at > some point fill its buffers, and start throwing packets away. Or it > needs to send pause packets it its peers. And then your whole switch > throughput goes down. Packets will always get thrown away, so you > need > QoS in your network to give the network hints about which frames is > should throw away first. > Indeed. This is the understanding I was working with. This patch series enables pause frames, output queue prriority and strict scheduling to egress the high priority queues first. This means that when the switch starts dropping frames, it drops from the lowest priority as the highest ones are delivered at line speed without issue. > .. > > > Fundamentally, with a phy to phy CPU connection, the CPU MAC may > > well > > wish to enable pause frames for various reasons, so we should > > strive to > > handle that I think. > > It actually has nothing to do with PHY to PHY connections. You can > use > pause frames with direct MAC to MAC connections. PHY auto-negotiation > is one way to indicate both ends support it, but there are also other > ways. e.g. > > ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off] > > on the SoC you could do > > ethtool --pause eth0 autoneg off rx on tx on > > to force the SoC to send and process pause frames. Ideally i would > prefer a solution like this, since it is not a change of behaviour > for > everybody else. Good point, well made. The reason for using autoneg in this series was due to having no netdev to run ethtool against for the CPU port. If we go down the route of creating a netdev for the CPU port, then we could indeed force pause frames at both ends. However, given that the phy on the marvell switch is capable of autoneg , is it not reasonable to setup the advertisement and let autoneg take care of it if using phy to phy connection? > > Andrew
> > It actually has nothing to do with PHY to PHY connections. You can > > use > > pause frames with direct MAC to MAC connections. PHY auto-negotiation > > is one way to indicate both ends support it, but there are also other > > ways. e.g. > > > > ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off] > > > > on the SoC you could do > > > > ethtool --pause eth0 autoneg off rx on tx on > > > > to force the SoC to send and process pause frames. Ideally i would > > prefer a solution like this, since it is not a change of behaviour > > for > > everybody else. > > Good point, well made. > The reason for using autoneg in this series was due to having no netdev > to run ethtool against for the CPU port. Do you need one? It is the IMX which is the bottle neck. It is the one which needs to send pause frames. You have a netdev for that. Have you checked if the switch will react on pause frames without your change. Play with the command i give above on the master interface. It looks like the FEC driver fully supports synchronous pause configuration. > However, given that the phy on the marvell switch is capable of > autoneg , is it not reasonable to setup the advertisement and let > autoneg take care of it if using phy to phy connection? Most designs don't use back to back PHYs for the CPU port. They save the cost and connect MACs back to back using RGMII, or maybe SERDES. If we are going for a method which can configure pause between the CPU and the switch, it needs to be generic and work for both setups. Andrew
diff --git a/net/dsa/port.c b/net/dsa/port.c index f071acf2842b..1b6832eac2c5 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -538,10 +538,20 @@ static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable) return PTR_ERR(phydev); if (enable) { + phydev->supported = PHY_GBIT_FEATURES | SUPPORTED_MII | + SUPPORTED_AUI | SUPPORTED_FIBRE | + SUPPORTED_BNC | SUPPORTED_Pause | + SUPPORTED_Asym_Pause; + phydev->advertising = phydev->supported; + err = genphy_config_init(phydev); if (err < 0) goto err_put_dev; + err = genphy_config_aneg(phydev); + if (err < 0) + goto err_put_dev; + err = genphy_resume(phydev); if (err < 0) goto err_put_dev;
Configure autoneg for phy connected CPU ports. This allows us to use autoneg between the CPU port's phy and the link partner's phy. This enables us to negoatiate pause frame transmission to prioritise packet delivery over throughput. Signed-off-by: Robert Beckett <bob.beckett@collabora.com> --- net/dsa/port.c | 10 ++++++++++ 1 file changed, 10 insertions(+)