mbox series

[RFC,net-next,0/5] net: dsa: LAG support

Message ID 20171001194639.8647-1-f.fainelli@gmail.com
Headers show
Series net: dsa: LAG support | expand

Message

Florian Fainelli Oct. 1, 2017, 7:46 p.m. UTC
Hi all,

This patch series is sent as RFC since I have only been able to test LAG
with dsa-loop and not with real HW yet (that should be tomorrow). I also
looked at how the Marvell DSDT API is defined for adding ports to "trunk"
groups and the API proposed here should work there too. Can't speak about
QCA, Mediatek or KSZ switches though.

Few open questions that may need solving now or later:

- on Broadcom switches, we should allow enslaving a port as a LAG group
  member if its speed does not match that of the other members of the group

- not sure what to do with a switch fabric, naively, if adding two ports
  of two distinct switches as a LAG group, we may have to propagate that
  to "dsa" cross-chip interfaces as well

Thanks!

Florian Fainelli (5):
  net: dsa: Add infrastructure to support LAG
  net: dsa: b53: Define MAC trunking/bonding registers
  net: dsa: b53: Add support for LAG
  net: dsa: bcm_sf2: Add support for LAG
  net: dsa: loop: Add support for LAG

 drivers/net/dsa/b53/b53_common.c |  94 ++++++++++++++++++++++-
 drivers/net/dsa/b53/b53_priv.h   |   6 ++
 drivers/net/dsa/b53/b53_regs.h   |  18 +++++
 drivers/net/dsa/bcm_sf2.c        |   3 +
 drivers/net/dsa/dsa_loop.c       |  54 +++++++++++++-
 include/net/dsa.h                |  25 +++++++
 net/dsa/dsa2.c                   |  12 +++
 net/dsa/dsa_priv.h               |   7 ++
 net/dsa/port.c                   |  92 +++++++++++++++++++++++
 net/dsa/slave.c                  | 157 ++++++++++++++++++++++++++++++++++++---
 net/dsa/switch.c                 |  30 ++++++++
 11 files changed, 484 insertions(+), 14 deletions(-)

Comments

Ido Schimmel Oct. 2, 2017, 6:50 a.m. UTC | #1
Hi Florian,

On Sun, Oct 01, 2017 at 12:46:34PM -0700, Florian Fainelli wrote:
> Hi all,
> 
> This patch series is sent as RFC since I have only been able to test LAG
> with dsa-loop and not with real HW yet (that should be tomorrow). I also
> looked at how the Marvell DSDT API is defined for adding ports to "trunk"
> groups and the API proposed here should work there too. Can't speak about
> QCA, Mediatek or KSZ switches though.

Thanks for working on this. I've yet to look at the patches, but I
thought I'll mention a few issues we bumped into with LAG devices:

1) It is possible for users to stack devices on top of the LAG and only
then enslave your port. This means that the underlying driver might not
be aware of all the necessary configuration. It's quite a complicated
problem to solve properly, so we currently forbid enslavements to
devices that already have uppers.

There's also an issue with IP addresses and routes configured on top of
the LAG, but I hope to fix that soon. I don't think you support L3 in
DSA yet, so it shouldn't be a problem for you.

2) Similarly, you're no longer guaranteed to have the bridge do proper
clean up in case you pull a port out of a bridged LAG, so you'll need to
handle that. Any context you store for the bridge port needs to be
destroyed upon the removal of the last port from the LAG.

> Few open questions that may need solving now or later:
> 
> - on Broadcom switches, we should allow enslaving a port as a LAG group
>   member if its speed does not match that of the other members of the group
> 
> - not sure what to do with a switch fabric, naively, if adding two ports
>   of two distinct switches as a LAG group, we may have to propagate that
>   to "dsa" cross-chip interfaces as well

At least in mlxsw case, enslaving switch and non-switch ports to the
same LAG doesn't make sense. Any traffic routed by the switch will only
be load-balanced between the switch ports. One way to solve that is to
forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
devices in the adjacency list of the LAG don't belong to the same
switch.

Note that such configurations are bound to fail anyway, as the
non-switch ports will not have `switchdev_ops` configured and thus fail
during __switchdev_port_obj_add() / __switchdev_port_attr_set().
Andrew Lunn Oct. 2, 2017, 12:51 p.m. UTC | #2
> - not sure what to do with a switch fabric, naively, if adding two ports
>   of two distinct switches as a LAG group, we may have to propagate that
>   to "dsa" cross-chip interfaces as well

Hi Florian

Marvell switches do support this. If i remember correctly, it requires
some setup for forwarding over the DSA ports.

But for a first implementation, i would be tempted to disallow such
setups. Force the LAG members to be on the same switch.

	Andrew
Andrew Lunn Oct. 2, 2017, 12:59 p.m. UTC | #3
> > - not sure what to do with a switch fabric, naively, if adding two ports
> >   of two distinct switches as a LAG group, we may have to propagate that
> >   to "dsa" cross-chip interfaces as well
> 
> At least in mlxsw case, enslaving switch and non-switch ports to the
> same LAG doesn't make sense. Any traffic routed by the switch will only
> be load-balanced between the switch ports. One way to solve that is to
> forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
> devices in the adjacency list of the LAG don't belong to the same
> switch.
> 
> Note that such configurations are bound to fail anyway, as the
> non-switch ports will not have `switchdev_ops` configured and thus fail
> during __switchdev_port_obj_add() / __switchdev_port_attr_set().

Hi Ido

Here Florian is thinking about the D in DSA. Marvell switches have the
capabilities of building a switch fabric out of multiple
interconnected switches. To switchdev, they appear as a single switch.
switchdev has no idea of the mapping of interfaces to switches, nor
the routing of frames between switches. This all happens in the layers
bellow. The hardware does support LAG members on different switches
within the same fabric. But it requires some additional setup for the
ports which link switches together. We have the same issues with MDB,
where additional setup is required for group members spread over the
switch fabric.

      Andrew
Ido Schimmel Oct. 2, 2017, 1:05 p.m. UTC | #4
Hi Andrew,

On Mon, Oct 02, 2017 at 02:59:32PM +0200, Andrew Lunn wrote:
> > > - not sure what to do with a switch fabric, naively, if adding two ports
> > >   of two distinct switches as a LAG group, we may have to propagate that
> > >   to "dsa" cross-chip interfaces as well
> > 
> > At least in mlxsw case, enslaving switch and non-switch ports to the
> > same LAG doesn't make sense. Any traffic routed by the switch will only
> > be load-balanced between the switch ports. One way to solve that is to
> > forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
> > devices in the adjacency list of the LAG don't belong to the same
> > switch.
> > 
> > Note that such configurations are bound to fail anyway, as the
> > non-switch ports will not have `switchdev_ops` configured and thus fail
> > during __switchdev_port_obj_add() / __switchdev_port_attr_set().
> 
> Hi Ido
> 
> Here Florian is thinking about the D in DSA. Marvell switches have the
> capabilities of building a switch fabric out of multiple
> interconnected switches. To switchdev, they appear as a single switch.
> switchdev has no idea of the mapping of interfaces to switches, nor
> the routing of frames between switches. This all happens in the layers
> bellow. The hardware does support LAG members on different switches
> within the same fabric. But it requires some additional setup for the
> ports which link switches together. We have the same issues with MDB,
> where additional setup is required for group members spread over the
> switch fabric.

Yes, I understood that. I was simply referring to the more general
problem of any two net devices and how to solve it. Not currently
implemented in mlxsw, but should be necessary for DSA as well.

Agree with your previous mail about keeping it simple for the first
implementation.