diff mbox series

[RFC,net-next,v3,09/10] net: bridge: mrp: Integrate MRP into the bridge

Message ID 20200124161828.12206-10-horatiu.vultur@microchip.com
State RFC
Delegated to: David Miller
Headers show
Series net: bridge: mrp: Add support for Media Redundancy Protocol (MRP) | expand

Commit Message

Horatiu Vultur Jan. 24, 2020, 4:18 p.m. UTC
To integrate MRP into the bridge, the bridge needs to do the following:
- initialized and destroy the generic netlink used by MRP
- detect if the MRP frame was received on a port that is part of a MRP ring. In
  case it was not, then forward the frame as usual, otherwise redirect the frame
  to the upper layer.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 net/bridge/br.c         | 11 +++++++++++
 net/bridge/br_device.c  |  3 +++
 net/bridge/br_if.c      |  6 ++++++
 net/bridge/br_input.c   | 14 ++++++++++++++
 net/bridge/br_private.h | 14 ++++++++++++++
 5 files changed, 48 insertions(+)

Comments

Andrew Lunn Jan. 25, 2020, 3:42 p.m. UTC | #1
On Fri, Jan 24, 2020 at 05:18:27PM +0100, Horatiu Vultur wrote:
> To integrate MRP into the bridge, the bridge needs to do the following:
> - initialized and destroy the generic netlink used by MRP
> - detect if the MRP frame was received on a port that is part of a MRP ring. In
>   case it was not, then forward the frame as usual, otherwise redirect the frame
>   to the upper layer.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
>  net/bridge/br.c         | 11 +++++++++++
>  net/bridge/br_device.c  |  3 +++
>  net/bridge/br_if.c      |  6 ++++++
>  net/bridge/br_input.c   | 14 ++++++++++++++
>  net/bridge/br_private.h | 14 ++++++++++++++
>  5 files changed, 48 insertions(+)
> 
> diff --git a/net/bridge/br.c b/net/bridge/br.c
> index b6fe30e3768f..d5e556eed4ba 100644
> --- a/net/bridge/br.c
> +++ b/net/bridge/br.c
> @@ -344,6 +344,12 @@ static int __init br_init(void)
>  	if (err)
>  		goto err_out5;
>  
> +#ifdef CONFIG_BRIDGE_MRP
> +	err = br_mrp_netlink_init();
> +	if (err)
> +		goto err_out6;
> +#endif

Please try to avoid #ifdef's like this in C code. Add a stub function
to br_private_mrp.h.

If you really cannot avoid #ifdef, please use #if IS_ENABLED(CONFIG_BRIDGE_MRP).
That expands to

	if (0) {

        }

So the compiler will compile it and then optimize it out. That gives
us added benefit of build testing, we don't suddenly find the code no
longer compiles when we enable the option.

> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -21,6 +21,9 @@
>  #include <linux/rculist.h>
>  #include "br_private.h"
>  #include "br_private_tunnel.h"
> +#ifdef CONFIG_BRIDGE_MRP
> +#include "br_private_mrp.h"
> +#endif

It should always be safe to include a header file.

   Andrew
Andrew Lunn Jan. 25, 2020, 4:16 p.m. UTC | #2
>  br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
> @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
>  			return RX_HANDLER_CONSUMED;
>  		}
>  	}
> +#ifdef CONFIG_BRIDGE_MRP
> +	/* If there is no MRP instance do normal forwarding */
> +	if (!p->mrp_aware)
> +		goto forward;
> +
> +	if (skb->protocol == htons(ETH_P_MRP))
> +		return RX_HANDLER_PASS;

What MAC address is used for these MRP frames? It would make sense to
use a L2 link local destination address, since i assume they are not
supposed to be forwarded by the bridge. If so, you could extend the
if (unlikely(is_link_local_ether_addr(dest))) condition.

> +
> +	if (p->state == BR_STATE_BLOCKING)
> +		goto drop;
> +#endif

Is this needed? The next block of code is a switch statement on
p->state. The default case, which BR_STATE_BLOCKING should hit, is
drop.

This function is on the hot path. So we should try to optimize it as
much as possible.

     Andrew
Horatiu Vultur Jan. 26, 2020, 12:49 p.m. UTC | #3
The 01/25/2020 16:42, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Fri, Jan 24, 2020 at 05:18:27PM +0100, Horatiu Vultur wrote:
> > To integrate MRP into the bridge, the bridge needs to do the following:
> > - initialized and destroy the generic netlink used by MRP
> > - detect if the MRP frame was received on a port that is part of a MRP ring. In
> >   case it was not, then forward the frame as usual, otherwise redirect the frame
> >   to the upper layer.
> >
> > Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> > ---
> >  net/bridge/br.c         | 11 +++++++++++
> >  net/bridge/br_device.c  |  3 +++
> >  net/bridge/br_if.c      |  6 ++++++
> >  net/bridge/br_input.c   | 14 ++++++++++++++
> >  net/bridge/br_private.h | 14 ++++++++++++++
> >  5 files changed, 48 insertions(+)
> >
> > diff --git a/net/bridge/br.c b/net/bridge/br.c
> > index b6fe30e3768f..d5e556eed4ba 100644
> > --- a/net/bridge/br.c
> > +++ b/net/bridge/br.c
> > @@ -344,6 +344,12 @@ static int __init br_init(void)
> >       if (err)
> >               goto err_out5;
> >
> > +#ifdef CONFIG_BRIDGE_MRP
> > +     err = br_mrp_netlink_init();
> > +     if (err)
> > +             goto err_out6;
> > +#endif
> 
> Please try to avoid #ifdef's like this in C code. Add a stub function
> to br_private_mrp.h.
> 
> If you really cannot avoid #ifdef, please use #if IS_ENABLED(CONFIG_BRIDGE_MRP).
> That expands to
> 
>         if (0) {
> 
>         }
> 
> So the compiler will compile it and then optimize it out. That gives
> us added benefit of build testing, we don't suddenly find the code no
> longer compiles when we enable the option.
> 
> > --- a/net/bridge/br_input.c
> > +++ b/net/bridge/br_input.c
> > @@ -21,6 +21,9 @@
> >  #include <linux/rculist.h>
> >  #include "br_private.h"
> >  #include "br_private_tunnel.h"
> > +#ifdef CONFIG_BRIDGE_MRP
> > +#include "br_private_mrp.h"
> > +#endif
> 
> It should always be safe to include a header file.
> 
>    Andrew

Thanks for pointing out these mistakes. I will try to avoid all these
#ifdef's in the next patch series.
Horatiu Vultur Jan. 26, 2020, 1:01 p.m. UTC | #4
The 01/25/2020 17:16, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> >  br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
> > @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> >                       return RX_HANDLER_CONSUMED;
> >               }
> >       }
> > +#ifdef CONFIG_BRIDGE_MRP
> > +     /* If there is no MRP instance do normal forwarding */
> > +     if (!p->mrp_aware)
> > +             goto forward;
> > +
> > +     if (skb->protocol == htons(ETH_P_MRP))
> > +             return RX_HANDLER_PASS;
> 
> What MAC address is used for these MRP frames? It would make sense to
> use a L2 link local destination address, since i assume they are not
> supposed to be forwarded by the bridge. If so, you could extend the
> if (unlikely(is_link_local_ether_addr(dest))) condition.

The MAC addresses used by MRP frames are:
0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 - used by MRP_Test frames
0x1, 0x15, 0x4e, 0x0, 0x0, 0x2 - used by the rest of MRP frames.

If we will add support also for MIM/MIC. These requires 2 more MAC
addresses:
0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 - used by MRP_InTest frames.
0x1, 0x15, 0x4e, 0x0, 0x0, 0x4 - used by the other MRP interconnect
frames.

Then maybe I shoukd change the check to be something like:
if (unlikely(skb->protocol == htons(ETH_P_MRP)))

> 
> > +
> > +     if (p->state == BR_STATE_BLOCKING)
> > +             goto drop;
> > +#endif
> 
> Is this needed? The next block of code is a switch statement on
> p->state. The default case, which BR_STATE_BLOCKING should hit, is
> drop.

Yes you are rigth, it is not needed anymore.

> 
> This function is on the hot path. So we should try to optimize it as
> much as possible.
> 
>      Andrew
Andrew Lunn Jan. 26, 2020, 5:12 p.m. UTC | #5
On Sun, Jan 26, 2020 at 02:01:11PM +0100, Horatiu Vultur wrote:
> The 01/25/2020 17:16, Andrew Lunn wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > >  br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
> > > @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> > >                       return RX_HANDLER_CONSUMED;
> > >               }
> > >       }
> > > +#ifdef CONFIG_BRIDGE_MRP
> > > +     /* If there is no MRP instance do normal forwarding */
> > > +     if (!p->mrp_aware)
> > > +             goto forward;
> > > +
> > > +     if (skb->protocol == htons(ETH_P_MRP))
> > > +             return RX_HANDLER_PASS;
> > 
> > What MAC address is used for these MRP frames? It would make sense to
> > use a L2 link local destination address, since i assume they are not
> > supposed to be forwarded by the bridge. If so, you could extend the
> > if (unlikely(is_link_local_ether_addr(dest))) condition.
> 
> The MAC addresses used by MRP frames are:
> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 - used by MRP_Test frames
> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x2 - used by the rest of MRP frames.
> 
> If we will add support also for MIM/MIC. These requires 2 more MAC
> addresses:
> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 - used by MRP_InTest frames.
> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x4 - used by the other MRP interconnect
> frames.

Hi Horatiu

I made the wrong guess about how this protocol worked when i said L2
link local. These MAC addresses are L2 multicast.

And you are using a raw socket to receive them into userspace when
needed.

'Thinking allowed' here.

    +------------------------------------------+
    |                                          |
    +-->|H1|<---------->|H2|<---------->|H3|<--+
    eth0    eth1    eth0    eth1    eth0    eth1
     ^
     |
  Blocked


There are three major classes of user case here:

1) Pure software solution

You need the software bridge in the client to forward these frames
from the left side to the right side. (Does the standard give these
two ports names)? In the master, the left port is blocked, so the
bridge drops them anyway. You have a RAW socket open on both eth0 and
eth1, so you get to see the frames, even if the bridge drops them.

2) Hardware offload to an MRP unaware switch.

I'm thinking about a plain switch supported by DSA, Marvell, Broadcom,
etc. It has no special knowledge of MRP.

Ideally, you want the switch to forward MRP_Test frames left to right
for a client. In a master, i think you have a problem, since the port
is blocked. The hardware is unlikely to recognise these frames as
special, since they are not in the 01-80-C2-XX-XX-XX block, and let
them through. So your raw socket is never going to see them, and you
cannot detect open/closed ring.

I don't know how realistic it is to support MRP in this case, and i
also don't think you can fall back to a pure software solution,
because the software bridge is going to offload the basic bridge
operation to the hardware. It would be nice if you could detect this,
and return -EOPNOTSUPP.

3) Hardware offload to an MRP aware switch.

For a client, you tell it which port is left, which is right, and
assume it forwards the frames. For a master, you again tell it which
is left, which is right, and ask it send MRP_Test frames out right,
and report changes in open/closed on the right port. You don't need
the CPU to see the MRP_Test frames, so the switch has no need to
forward them to the CPU.

We should think about the general case of a bridge with many ports,
and many pairs of ports using MRP. This makes the forwarding of these
frames interesting. Given that they are multicast, the default action
of the software bridge is that it will flood them. Does the protocol
handle seeing MRP_Test from some other loop? Do we need to avoid this?
You could avoid this by adding MDB entries to the bridge. However,
this does not scale to more then one ring. I don't think an MDB is
associated to an ingress port. So you cannot say 

0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port1 egress port2
0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port3 egress port4

The best you can say is

0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 egress port2, port4

I'm sure there are other issues i'm missing, but it is interesting to
think about all this.

Andrew
Allan W. Nielsen Jan. 27, 2020, 10:57 a.m. UTC | #6
On 26.01.2020 18:12, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Sun, Jan 26, 2020 at 02:01:11PM +0100, Horatiu Vultur wrote:
> > The 01/25/2020 17:16, Andrew Lunn wrote:
> > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > >
> > > >  br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
> > > > @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> > > >                       return RX_HANDLER_CONSUMED;
> > > >               }
> > > >       }
> > > > +#ifdef CONFIG_BRIDGE_MRP
> > > > +     /* If there is no MRP instance do normal forwarding */
> > > > +     if (!p->mrp_aware)
> > > > +             goto forward;
> > > > +
> > > > +     if (skb->protocol == htons(ETH_P_MRP))
> > > > +             return RX_HANDLER_PASS;
> > >
> > > What MAC address is used for these MRP frames? It would make sense to
> > > use a L2 link local destination address, since i assume they are not
> > > supposed to be forwarded by the bridge. If so, you could extend the
> > > if (unlikely(is_link_local_ether_addr(dest))) condition.
> >
> > The MAC addresses used by MRP frames are:
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 - used by MRP_Test frames
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x2 - used by the rest of MRP frames.
> >
> > If we will add support also for MIM/MIC. These requires 2 more MAC
> > addresses:
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 - used by MRP_InTest frames.
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x4 - used by the other MRP interconnect
> > frames.
> 
> Hi Horatiu
> 
> I made the wrong guess about how this protocol worked when i said L2
> link local. These MAC addresses are L2 multicast.
> 
> And you are using a raw socket to receive them into userspace when
> needed.
> 
> 'Thinking allowed' here.
> 
>     +------------------------------------------+
>     |                                          |
>     +-->|H1|<---------->|H2|<---------->|H3|<--+
>     eth0    eth1    eth0    eth1    eth0    eth1
>      ^
>      |
>   Blocked
> 
> 
> There are three major classes of user case here:
> 
> 1) Pure software solution
> You need the software bridge in the client to forward these frames
> from the left side to the right side.
As far as I understand it is not the bridge which forward these frames -
it is the user-space tool. This was to put as much functionality in
user-space and only use the kernel to configure the HW. We can (and
should) discuss if this is the right decision.

> (Does the standard give these two ports names)?
Horatiu?

> In the master, the left port is blocked, so the bridge drops them
> anyway. You have a RAW socket open on both eth0 and eth1, so you get
> to see the frames, even if the bridge drops them.
Yes, in the current patch-set such frames are forwarded by the
user-space daemon.

We would properly have better performance if we do this in kernel-space.


> 2) Hardware offload to an MRP unaware switch.
> 
> I'm thinking about a plain switch supported by DSA, Marvell, Broadcom,
> etc. It has no special knowledge of MRP.
We have implemented this on Ocelot - which is not MRP aware at all. Not
sure what facilities Marvell and Broadcom has, but it is not a lot which
is needed.

> Ideally, you want the switch to forward MRP_Test frames left to right
> for a client.
Yes. If we have only 1 ring, then we can do that with a MAC table entry.
If we have more than 1 ring, then we will need a TCAM rule of some kind.

In the what we have today on Ocelot, we do not do this is HW, we do the
forwarding in SW.

BTW: It is not only from left to right, it is also from right to left.
The MRM will inject packets on both ring ports, and monitor both. This
is to detect asymmetrical link down or similar. The two ports are
treated the same. But you can set a priority (the primary/secondary) to
state your preference on what port to use if both are up and the ring is
closed.

> In a master, i think you have a problem, since the port
> is blocked. The hardware is unlikely to recognise these frames as
> special, since they are not in the 01-80-C2-XX-XX-XX block, and let
> them through. So your raw socket is never going to see them, and you
> cannot detect open/closed ring.
Again, I do not know how other HW is designed, but all the SOC's we are
working with, does allow us to add a TCAM rule which can redirect these
frames to the CPU even on a blocked port.

> I don't know how realistic it is to support MRP in this case, and i
> also don't think you can fall back to a pure software solution,
> because the software bridge is going to offload the basic bridge
> operation to the hardware. It would be nice if you could detect this,
> and return -EOPNOTSUPP.
We do want to support this on Ocelot, but you are right, if the current
running bridge, cannot block a port, and still get the MRP frames on
that port, then it cannot support MRM. And this we need to detect in
some way.

> 3) Hardware offload to an MRP aware switch.
> 
> For a client, you tell it which port is left, which is right, and
> assume it forwards the frames. For a master, you again tell it which
> is left, which is right, and ask it send MRP_Test frames out right,
> and report changes in open/closed on the right port. You don't need
> the CPU to see the MRP_Test frames, so the switch has no need to
> forward them to the CPU.
> 
> We should think about the general case of a bridge with many ports,
> and many pairs of ports using MRP. This makes the forwarding of these
> frames interesting. Given that they are multicast, the default action
> of the software bridge is that it will flood them. Does the protocol
> handle seeing MRP_Test from some other loop? Do we need to avoid this?
Yes, we need to avoid. We cannot "just" do normal flooding.

> You could avoid this by adding MDB entries to the bridge. However,
> this does not scale to more then one ring.
I would prefer a solution where the individual drivers can do what is
best on the given HW.

- If we have a 2 ported switch, then flooding seems like a perfect valid
   approach. There will be only 1 ring.
- If we have a many ported switch, then we could use MAC-table entry -
   if the user only configure 1 ring.
   - When adding more rings, it either needs to return error, or use
     other HW facilities.

> I don't think an MDB is associated to an ingress port. So you cannot
> say
Agree.

> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port1 egress port2
> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port3 egress port4
> 
> The best you can say is
> 
> 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 egress port2, port4
> 
> I'm sure there are other issues i'm missing, but it is interesting to
> think about all this.
Yes, the solution Horatiu has chosen, is not to forward MRP frames,
received in MRP ring ports at all. This is done by the user-space tool.

Again, not sure if this is the right way to do it, but it is what patch
v3 does.

The alternative to this would be to learn the bridge how to forward MRP
frames when it is a MRC. The user-space tool then never needs to do
this, it know that the kernel will take care of this part (either in SW
or in HW).

/Allan
Horatiu Vultur Jan. 27, 2020, 1:02 p.m. UTC | #7
The 01/27/2020 11:57, Allan W. Nielsen wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 26.01.2020 18:12, Andrew Lunn wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > On Sun, Jan 26, 2020 at 02:01:11PM +0100, Horatiu Vultur wrote:
> > > The 01/25/2020 17:16, Andrew Lunn wrote:
> > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > > >
> > > > >  br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
> > > > > @@ -338,6 +341,17 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> > > > >                       return RX_HANDLER_CONSUMED;
> > > > >               }
> > > > >       }
> > > > > +#ifdef CONFIG_BRIDGE_MRP
> > > > > +     /* If there is no MRP instance do normal forwarding */
> > > > > +     if (!p->mrp_aware)
> > > > > +             goto forward;
> > > > > +
> > > > > +     if (skb->protocol == htons(ETH_P_MRP))
> > > > > +             return RX_HANDLER_PASS;
> > > >
> > > > What MAC address is used for these MRP frames? It would make sense to
> > > > use a L2 link local destination address, since i assume they are not
> > > > supposed to be forwarded by the bridge. If so, you could extend the
> > > > if (unlikely(is_link_local_ether_addr(dest))) condition.
> > >
> > > The MAC addresses used by MRP frames are:
> > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 - used by MRP_Test frames
> > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x2 - used by the rest of MRP frames.
> > >
> > > If we will add support also for MIM/MIC. These requires 2 more MAC
> > > addresses:
> > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 - used by MRP_InTest frames.
> > > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x4 - used by the other MRP interconnect
> > > frames.
> > 
> > Hi Horatiu
> > 
> > I made the wrong guess about how this protocol worked when i said L2
> > link local. These MAC addresses are L2 multicast.
> > 
> > And you are using a raw socket to receive them into userspace when
> > needed.
> > 
> > 'Thinking allowed' here.
> > 
> >     +------------------------------------------+
> >     |                                          |
> >     +-->|H1|<---------->|H2|<---------->|H3|<--+
> >     eth0    eth1    eth0    eth1    eth0    eth1
> >      ^
> >      |
> >   Blocked
> > 
> > 
> > There are three major classes of user case here:
> > 
> > 1) Pure software solution
> > You need the software bridge in the client to forward these frames
> > from the left side to the right side.
> As far as I understand it is not the bridge which forward these frames -
> it is the user-space tool. This was to put as much functionality in
> user-space and only use the kernel to configure the HW. We can (and
> should) discuss if this is the right decision.
> 
> > (Does the standard give these two ports names)?
> Horatiu?

They don't have a specific name, the standard names them as "ring
ports". And to differentiate between them, they have roles: primary or
secondary. These roles are used to know which port needs to be
blocked(the secondary) and which needs to forward the frames. One
observation, these roles are not fix for the entire time. The ring ports
can interchange their roles. For example if eth0 is the primary port and
eth1 is the secondary port and then the eth0 link goes down then eth0
will have the secondary role and eth1 will become primary port.

> 
> > In the master, the left port is blocked, so the bridge drops them
> > anyway. You have a RAW socket open on both eth0 and eth1, so you get
> > to see the frames, even if the bridge drops them.
> Yes, in the current patch-set such frames are forwarded by the
> user-space daemon.
> 
> We would properly have better performance if we do this in kernel-space.
> 
> 
> > 2) Hardware offload to an MRP unaware switch.
> > 
> > I'm thinking about a plain switch supported by DSA, Marvell, Broadcom,
> > etc. It has no special knowledge of MRP.
> We have implemented this on Ocelot - which is not MRP aware at all. Not
> sure what facilities Marvell and Broadcom has, but it is not a lot which
> is needed.

Here is a small confusion. The implementation that we have done on
Ocelot doesn't have at all HW offload(I have hacked the network driver
to remove this support, so basically is just 4 NICs). Therefor all the
non-MRP frames switching were done by the SW bridge and forwarding of
MRP frames were done in the userspace.

> 
> > Ideally, you want the switch to forward MRP_Test frames left to right
> > for a client.
> Yes. If we have only 1 ring, then we can do that with a MAC table entry.
> If we have more than 1 ring, then we will need a TCAM rule of some kind.
> 
> In the what we have today on Ocelot, we do not do this is HW, we do the
> forwarding in SW.
> 
> BTW: It is not only from left to right, it is also from right to left.
> The MRM will inject packets on both ring ports, and monitor both. This
> is to detect asymmetrical link down or similar. The two ports are
> treated the same. But you can set a priority (the primary/secondary) to
> state your preference on what port to use if both are up and the ring is
> closed.

A small observation, the primary/secondary are defined in the standard
as roles and not priority. And yes it uses this role(primary/secondary)
to decide which port to block.

> 
> > In a master, i think you have a problem, since the port
> > is blocked. The hardware is unlikely to recognise these frames as
> > special, since they are not in the 01-80-C2-XX-XX-XX block, and let
> > them through. So your raw socket is never going to see them, and you
> > cannot detect open/closed ring.
> Again, I do not know how other HW is designed, but all the SOC's we are
> working with, does allow us to add a TCAM rule which can redirect these
> frames to the CPU even on a blocked port.
> 
> > I don't know how realistic it is to support MRP in this case, and i
> > also don't think you can fall back to a pure software solution,
> > because the software bridge is going to offload the basic bridge
> > operation to the hardware. It would be nice if you could detect this,
> > and return -EOPNOTSUPP.
> We do want to support this on Ocelot, but you are right, if the current
> running bridge, cannot block a port, and still get the MRP frames on
> that port, then it cannot support MRM. And this we need to detect in
> some way.
> 
> > 3) Hardware offload to an MRP aware switch.
> > 
> > For a client, you tell it which port is left, which is right, and
> > assume it forwards the frames. For a master, you again tell it which
> > is left, which is right, and ask it send MRP_Test frames out right,
> > and report changes in open/closed on the right port. You don't need
> > the CPU to see the MRP_Test frames, so the switch has no need to
> > forward them to the CPU.
> > 
> > We should think about the general case of a bridge with many ports,
> > and many pairs of ports using MRP. This makes the forwarding of these
> > frames interesting. Given that they are multicast, the default action
> > of the software bridge is that it will flood them. Does the protocol
> > handle seeing MRP_Test from some other loop? Do we need to avoid this?
> Yes, we need to avoid. We cannot "just" do normal flooding.
> 
> > You could avoid this by adding MDB entries to the bridge. However,
> > this does not scale to more then one ring.
> I would prefer a solution where the individual drivers can do what is
> best on the given HW.
> 
> - If we have a 2 ported switch, then flooding seems like a perfect valid
>   approach. There will be only 1 ring.
> - If we have a many ported switch, then we could use MAC-table entry -
>   if the user only configure 1 ring.
>   - When adding more rings, it either needs to return error, or use
>     other HW facilities.
> 
> > I don't think an MDB is associated to an ingress port. So you cannot
> > say
> Agree.
> 
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port1 egress port2
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 ingress port3 egress port4
> > 
> > The best you can say is
> > 
> > 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 egress port2, port4
> > 
> > I'm sure there are other issues i'm missing, but it is interesting to
> > think about all this.
> Yes, the solution Horatiu has chosen, is not to forward MRP frames,
> received in MRP ring ports at all. This is done by the user-space tool.
> 
> Again, not sure if this is the right way to do it, but it is what patch
> v3 does.
> 
> The alternative to this would be to learn the bridge how to forward MRP
> frames when it is a MRC. The user-space tool then never needs to do
> this, it know that the kernel will take care of this part (either in SW
> or in HW).
> 
> /Allan
Andrew Lunn Jan. 27, 2020, 1:40 p.m. UTC | #8
> > 'Thinking allowed' here.
> > 
> >     +------------------------------------------+
> >     |                                          |
> >     +-->|H1|<---------->|H2|<---------->|H3|<--+
> >     eth0    eth1    eth0    eth1    eth0    eth1
> >      ^
> >      |
> >   Blocked
> > 
> > 
> > There are three major classes of user case here:
> > 
> > 1) Pure software solution
> > You need the software bridge in the client to forward these frames
> > from the left side to the right side.

> As far as I understand it is not the bridge which forward these frames -
> it is the user-space tool. This was to put as much functionality in
> user-space and only use the kernel to configure the HW. We can (and
> should) discuss if this is the right decision.

So i need to flip the point around. How does the software switch know
not to forward the frames? Are you adding an MDB?

> We would properly have better performance if we do this in kernel-space.

Yes, that is what i think. And if you can do it without any additional
code, using the forwarding tables, so much the better.

> BTW: It is not only from left to right, it is also from right to left.
> The MRM will inject packets on both ring ports, and monitor both.

Using the same MAC address in both directions? I need to think what
that implies for MDB entries. It probably just works, since you never
flood back out the ingress port.

> Again, I do not know how other HW is designed, but all the SOC's we are
> working with, does allow us to add a TCAM rule which can redirect these
> frames to the CPU even on a blocked port.

It is not in scope for what you are doing, but i wonder how we
describe this in a generic Linux way? And then how we push it down to
the hardware?

For the Marvell Switches, it might be possible to do this without the
TCAM. You can add forwarding DB entries marked as Management. It is
unclear if this overrides the blocked state, but it would be a bit odd
if it did not.

> > You could avoid this by adding MDB entries to the bridge. However,
> > this does not scale to more then one ring.
> I would prefer a solution where the individual drivers can do what is
> best on the given HW.

The nice thing about adding MDB is that it is making use of the
software bridge facilities. In general, the software bridge and
hardware bridges are pretty similar. If you can solve the problem
using generic software bridge features, not additional special cases
in code, you have good chance of being able to offload it to a
hardware bridge which is not MRP aware. The switchdev API for MRP
specific features should then allow you to make use of any additional
features the hardware might have.

> Yes, the solution Horatiu has chosen, is not to forward MRP frames,
> received in MRP ring ports at all. This is done by the user-space tool.
> 
> Again, not sure if this is the right way to do it, but it is what patch
> v3 does.
> 
> The alternative to this would be to learn the bridge how to forward MRP
> frames when it is a MRC. The user-space tool then never needs to do
> this, it know that the kernel will take care of this part (either in SW
> or in HW).

I think that should be considered. I'm not saying it is the best way,
just that some thought should be put into it to figure out what it
actually implies.

	 Andrew
Jürgen Lambrecht Jan. 28, 2020, 9:56 a.m. UTC | #9
On 1/27/20 2:40 PM, Andrew Lunn wrote:
>> Again, I do not know how other HW is designed, but all the SOC's we are
>> working with, does allow us to add a TCAM rule which can redirect these
>> frames to the CPU even on a blocked port.
> It is not in scope for what you are doing, but i wonder how we
> describe this in a generic Linux way? And then how we push it down to
> the hardware?
>
> For the Marvell Switches, it might be possible to do this without the
> TCAM. You can add forwarding DB entries marked as Management. It is
> unclear if this overrides the blocked state, but it would be a bit odd
> if it did not.
A MGMT frame does override the blocked state according the the datasheet.
And any MAC address can be loaded, not only 01:80:C2:00:00:0x (802.1D) and 01:80:C2:00:00:2x (GARP). Then the ATU is used instead of something specialized.
(referring to Andrew's email of 20200126 6:12 PM)
(I only checked again 88E6250/88E6220/88E6071/88E6070/88E6020 Functional Specification)


Kind regards,

Jürgen
Allan W. Nielsen Jan. 28, 2020, 10:17 a.m. UTC | #10
On 27.01.2020 14:40, Andrew Lunn wrote:
>EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
>> > 'Thinking allowed' here.
>> >
>> >     +------------------------------------------+
>> >     |                                          |
>> >     +-->|H1|<---------->|H2|<---------->|H3|<--+
>> >     eth0    eth1    eth0    eth1    eth0    eth1
>> >      ^
>> >      |
>> >   Blocked
>> >
>> >
>> > There are three major classes of user case here:
>> >
>> > 1) Pure software solution
>> > You need the software bridge in the client to forward these frames
>> > from the left side to the right side.
>
>> As far as I understand it is not the bridge which forward these frames -
>> it is the user-space tool. This was to put as much functionality in
>> user-space and only use the kernel to configure the HW. We can (and
>> should) discuss if this is the right decision.
>
>So i need to flip the point around. How does the software switch know
>not to forward the frames? Are you adding an MDB?
In the current implementation (patch v3) this is done here:
https://github.com/microchip-ung/mrp/blob/patch-v3/kernel-patches/v3-0009-net-bridge-mrp-Integrate-MRP-into-the-bridge.patch#L112

We simply ask the bridge not to forward any MRP frames, on MRP enabled
ports, and let "someone" else do that.

>> We would properly have better performance if we do this in kernel-space.
>
>Yes, that is what i think. And if you can do it without any additional
>code, using the forwarding tables, so much the better.
I understand the motivation of using the existing forwarding mechanism,
but I do not think we have all the hooks needed. But we can certainly
limit the impact on the existing code as much as possible.

>> BTW: It is not only from left to right, it is also from right to left.
>> The MRM will inject packets on both ring ports, and monitor both.
>
>Using the same MAC address in both directions? I need to think what
>that implies for MDB entries. It probably just works, since you never
>flood back out the ingress port.
Seems to work fine :-D

>> Again, I do not know how other HW is designed, but all the SOC's we are
>> working with, does allow us to add a TCAM rule which can redirect these
>> frames to the CPU even on a blocked port.
>
>It is not in scope for what you are doing, but i wonder how we
>describe this in a generic Linux way? And then how we push it down to
>the hardware?
>
>For the Marvell Switches, it might be possible to do this without the
>TCAM. You can add forwarding DB entries marked as Management. It is
>unclear if this overrides the blocked state, but it would be a bit odd
>if it did not.
Based on this, and also on the input from Jürgen, I think there is a
good chnage we can make this work for existing silicon from several
vendors.

>> > You could avoid this by adding MDB entries to the bridge. However,
>> > this does not scale to more then one ring.
>> I would prefer a solution where the individual drivers can do what is
>> best on the given HW.
>The nice thing about adding MDB is that it is making use of the
>software bridge facilities. In general, the software bridge and
>hardware bridges are pretty similar. If you can solve the problem
>using generic software bridge features, not additional special cases
>in code, you have good chance of being able to offload it to a
>hardware bridge which is not MRP aware. The switchdev API for MRP
>specific features should then allow you to make use of any additional
>features the hardware might have.
Yes, but the issues in using the MDB API for this is that it does not
allow to look at source ports, and it does not allow to update the
priority of the frames.

>> Yes, the solution Horatiu has chosen, is not to forward MRP frames,
>> received in MRP ring ports at all. This is done by the user-space tool.
>>
>> Again, not sure if this is the right way to do it, but it is what patch
>> v3 does.
>>
>> The alternative to this would be to learn the bridge how to forward MRP
>> frames when it is a MRC. The user-space tool then never needs to do
>> this, it know that the kernel will take care of this part (either in SW
>> or in HW).
>I think that should be considered. I'm not saying it is the best way,
>just that some thought should be put into it to figure out what it
>actually implies.
Sounds good - I will try to explain and illustrate this a bit better,
such that we all have the same understanding of the problem we need to
solve.

/Allan
diff mbox series

Patch

diff --git a/net/bridge/br.c b/net/bridge/br.c
index b6fe30e3768f..d5e556eed4ba 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -344,6 +344,12 @@  static int __init br_init(void)
 	if (err)
 		goto err_out5;
 
+#ifdef CONFIG_BRIDGE_MRP
+	err = br_mrp_netlink_init();
+	if (err)
+		goto err_out6;
+#endif
+
 	brioctl_set(br_ioctl_deviceless_stub);
 
 #if IS_ENABLED(CONFIG_ATM_LANE)
@@ -358,6 +364,11 @@  static int __init br_init(void)
 
 	return 0;
 
+#ifdef CONFIG_BRIDGE_MRP
+err_out6:
+	br_netlink_fini();
+#endif
+
 err_out5:
 	unregister_switchdev_notifier(&br_switchdev_notifier);
 err_out4:
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index fb38add21b37..29966754d86a 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -464,6 +464,9 @@  void br_dev_setup(struct net_device *dev)
 	spin_lock_init(&br->lock);
 	INIT_LIST_HEAD(&br->port_list);
 	INIT_HLIST_HEAD(&br->fdb_list);
+#ifdef CONFIG_BRIDGE_MRP
+	INIT_LIST_HEAD(&br->mrp_list);
+#endif
 	spin_lock_init(&br->hash_lock);
 
 	br->bridge_id.prio[0] = 0x80;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 4fe30b182ee7..9b8bb41c0574 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -331,6 +331,9 @@  static void del_nbp(struct net_bridge_port *p)
 
 	spin_lock_bh(&br->lock);
 	br_stp_disable_port(p);
+#ifdef CONFIG_BRIDGE_MRP
+	p->mrp_aware = false;
+#endif
 	spin_unlock_bh(&br->lock);
 
 	br_ifinfo_notify(RTM_DELLINK, NULL, p);
@@ -427,6 +430,9 @@  static struct net_bridge_port *new_nbp(struct net_bridge *br,
 	p->port_no = index;
 	p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
 	br_init_port(p);
+#ifdef CONFIG_BRIDGE_MRP
+	p->mrp_aware = false;
+#endif
 	br_set_state(p, BR_STATE_DISABLED);
 	br_stp_port_timer_init(p);
 	err = br_multicast_add_port(p);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 8944ceb47fe9..de7066b077e2 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -21,6 +21,9 @@ 
 #include <linux/rculist.h>
 #include "br_private.h"
 #include "br_private_tunnel.h"
+#ifdef CONFIG_BRIDGE_MRP
+#include "br_private_mrp.h"
+#endif
 
 static int
 br_netif_receive_skb(struct net *net, struct sock *sk, struct sk_buff *skb)
@@ -338,6 +341,17 @@  rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 			return RX_HANDLER_CONSUMED;
 		}
 	}
+#ifdef CONFIG_BRIDGE_MRP
+	/* If there is no MRP instance do normal forwarding */
+	if (!p->mrp_aware)
+		goto forward;
+
+	if (skb->protocol == htons(ETH_P_MRP))
+		return RX_HANDLER_PASS;
+
+	if (p->state == BR_STATE_BLOCKING)
+		goto drop;
+#endif
 
 forward:
 	switch (p->state) {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f540f3bdf294..a5d01a394f54 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -285,6 +285,10 @@  struct net_bridge_port {
 	u16				backup_redirected_cnt;
 
 	struct bridge_stp_xstats	stp_xstats;
+
+#ifdef CONFIG_BRIDGE_MRP
+	bool				mrp_aware;
+#endif
 };
 
 #define kobj_to_brport(obj)	container_of(obj, struct net_bridge_port, kobj)
@@ -424,6 +428,10 @@  struct net_bridge {
 	int offload_fwd_mark;
 #endif
 	struct hlist_head		fdb_list;
+
+#ifdef CONFIG_BRIDGE_MRP
+	struct list_head		mrp_list;
+#endif
 };
 
 struct br_input_skb_cb {
@@ -1165,6 +1173,12 @@  unsigned long br_timer_value(const struct timer_list *timer);
 extern int (*br_fdb_test_addr_hook)(struct net_device *dev, unsigned char *addr);
 #endif
 
+/* br_mrp.c */
+#ifdef CONFIG_BRIDGE_MRP
+int br_mrp_netlink_init(void);
+void br_mrp_netlink_uninit(void);
+#endif
+
 /* br_netlink.c */
 extern struct rtnl_link_ops br_link_ops;
 int br_netlink_init(void);