[RFC,net-next,v3,00/10] net: bridge: mrp: Add support for Media Redundancy Protocol (MRP)

Message ID 20200124161828.12206-1-horatiu.vultur@microchip.com

Message

Horatiu Vultur Jan. 24, 2020, 4:18 p.m. UTC
Media Redundancy Protocol is a data network protocol standardized by the
International Electrotechnical Commission as IEC 62439-2. It allows rings of
Ethernet switches to overcome any single failure with recovery time faster than
STP. It is primarily used in Industrial Ethernet applications.

Based on the previous RFC[1][2], the MRP state machine and all the
timers were moved to userspace. A generic netlink interface is added to
allow configuring the HW, and logic is added to implement the MRP
specific forwarding rules.

The userspace application that is using the new netlink can be found here[3].

The current implementation, both in kernel and userspace, supports only 2 roles:

  MRM - this one is responsible for sending MRP_Test and MRP_Topo frames on both
  ring ports. It needs to process MRP_Test frames to know if the ring is open or
  closed. It is desirable to offload this operation to the HW because it
  requires generating and processing up to 4000 frames per second. Whenever it
  detects that the ring is open, it sends MRP_Topo frames to notify all MRCs
  about changes in the topology. The MRM also needs to process MRP_LinkChange
  frames, which are generated by the MRCs. When the ring is open, both ports
  forward frames; when the ring is closed, the secondary port is blocked.

  MRC - this one is responsible for forwarding MRP frames between the ring
  ports. If one of the ring ports gets a link down or up, the MRC generates
  MRP_LinkChange frames. This node should also process MRP_Topo frames and
  clear its FDB when it receives one.
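
As an illustration only (not code from this series), the MRM behaviour
described above can be modelled in a few lines of Python. The class name, the
max_missed threshold and the counter are hypothetical; this is not the state
machine from IEC 62439-2. The text above only mentions MRP_Topo for the
"ring open" case, so sending it on the closing transition as well is an
assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class PortState(Enum):
    FORWARDING = "forwarding"
    BLOCKED = "blocked"


@dataclass
class MrmSketch:
    """Toy model of an MRM; names and thresholds are illustrative only."""

    max_missed: int = 3  # hypothetical: missed MRP_Test intervals before "open"
    missed: int = 0
    ring_open: bool = False
    primary: PortState = PortState.FORWARDING
    secondary: PortState = PortState.BLOCKED
    topo_frames_sent: int = 0

    def test_interval_elapsed(self, test_frame_returned: bool) -> None:
        """Call once per test interval with whether our MRP_Test came back."""
        if test_frame_returned:
            self.missed = 0
            self._set_ring(is_open=False)
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self._set_ring(is_open=True)

    def _set_ring(self, is_open: bool) -> None:
        if is_open == self.ring_open:
            return  # no transition, nothing to do
        self.ring_open = is_open
        # Open ring: both ports forward. Closed ring: secondary port is blocked.
        self.secondary = PortState.FORWARDING if is_open else PortState.BLOCKED
        # Stand-in for sending MRP_Topo so that the MRCs clear their FDBs.
        self.topo_frames_sent += 1
```

Feeding the model three missed intervals in a row opens the ring and unblocks
the secondary port; one returned test frame closes it again.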

 Userspace
               Daemon +----------+ Client
                +
                |
 +--------------|-----------------------------------------+
  Kernel        |
                + Netlink

                |                              + Interrupt
                |                              |
 +--------------|------------------------------|----------+
  HW            | Switchdev                    |
                +                              |

The user interacts using the client (called 'mrp'). The client talks to the
daemon (called 'mrp_server'), which talks with the kernel using netlink. The
kernel will try to offload the requests to the HW via the switchdev API. For
this, a new generic netlink interface was added to the bridge.

If the kernel cannot offload MRP to HW (maybe it does not have a switchdev
driver, or it is just not supported), then all the netlink calls will return
-EOPNOTSUPP. In this case the userspace daemon falls back to a SW-only
implementation.
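
A minimal sketch of that fallback decision, seen from the daemon side. The
KernelMrp class and its set_ring_role() method are hypothetical stand-ins for
the netlink calls; they are not the API of mrp_server or of this series. Only
the use of EOPNOTSUPP as the "cannot offload" signal comes from the text above.

```python
import errno


class KernelMrp:
    """Hypothetical stand-in for the netlink interface to the bridge."""

    def __init__(self, can_offload: bool) -> None:
        self.can_offload = can_offload

    def set_ring_role(self, role: str) -> None:
        # Mimic a netlink call failing with -EOPNOTSUPP.
        if not self.can_offload:
            raise OSError(errno.EOPNOTSUPP, "MRP offload not supported")


def start_ring(kernel: KernelMrp, role: str) -> str:
    """Return which implementation ends up running the role."""
    try:
        kernel.set_ring_role(role)
        return "hw-offload"
    except OSError as err:
        if err.errno != errno.EOPNOTSUPP:
            raise  # a real failure, not just a missing feature
        return "sw-only"
```

Any error other than EOPNOTSUPP is re-raised, so a genuine failure is not
silently treated as "no offload available".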

Some changes to the SW bridge are required to be able to run MRP. First, the
bridge needs to initialize the netlink interface. Second, it needs to know if
an MRP frame was received on an MRP ring port. If it was, the SW bridge should
not forward the frame; it needs to redirect it to the upper layers. If the
frame was not received on a ring port, the bridge just forwards it as usual.
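
The receive-path rule above can be written down as a small decision function.
This is a sketch, not the code added to br_input.c. The function name and
return strings are made up, and the EtherType value 0x88E3 is my understanding
of the MRP EtherType; treat it as an assumption.

```python
from typing import Set

ETH_P_MRP = 0x88E3  # assumed EtherType for MRP frames


def bridge_rx_action(ethertype: int, rx_port: str, ring_ports: Set[str]) -> str:
    """Decide what the SW bridge does with a received frame."""
    if ethertype == ETH_P_MRP and rx_port in ring_ports:
        # MRP frame on a ring port: do not bridge it, hand it to upper layers.
        return "to-upper-layers"
    # Everything else, including MRP frames on non-ring ports: forward as usual.
    return "forward"
```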

To be able to offload this to the HW, it was required to extend the switchdev
API.

If this is accepted, then in the future the netlink interface can be
extended with multiple attributes which are required by different roles of
MRP, like Media Redundancy Automanager (MRA), Media Interconnect Manager (MIM)
and Media Interconnect Client (MIC).

[1] https://www.spinics.net/lists/netdev/msg623647.html
[2] https://www.spinics.net/lists/netdev/msg624378.html
[3] https://github.com/microchip-ung/mrp/tree/patch-v3

Horatiu Vultur (10):
  net: bridge: mrp: Expose mrp attributes.
  net: bridge: mrp: Expose function br_mrp_port_open
  net: bridge: mrp: Add MRP interface used by netlink
  net: bridge: mrp: Add generic netlink interface to configure MRP
  net: bridge: mrp: Update MRP interface to add switchdev support
  net: bridge: mrp: switchdev: Extend switchdev API to offload MRP
  net: bridge: mrp: switchdev: Implement MRP API for switchdev
  net: bridge: mrp: Connect MRP api with the switchev API
  net: bridge: mrp: Integrate MRP into the bridge
  net: bridge: mrp: Update Kconfig and Makefile

 include/linux/mrp_bridge.h      |  25 ++
 include/net/switchdev.h         |  51 +++
 include/uapi/linux/if_ether.h   |   1 +
 include/uapi/linux/mrp_bridge.h | 118 ++++++
 net/bridge/Kconfig              |  12 +
 net/bridge/Makefile             |   2 +
 net/bridge/br.c                 |  11 +
 net/bridge/br_device.c          |   3 +
 net/bridge/br_if.c              |   6 +
 net/bridge/br_input.c           |  14 +
 net/bridge/br_mrp.c             | 193 ++++++++++
 net/bridge/br_mrp_netlink.c     | 655 ++++++++++++++++++++++++++++++++
 net/bridge/br_mrp_switchdev.c   | 147 +++++++
 net/bridge/br_private.h         |  14 +
 net/bridge/br_private_mrp.h     |  58 +++
 15 files changed, 1310 insertions(+)
 create mode 100644 include/linux/mrp_bridge.h
 create mode 100644 include/uapi/linux/mrp_bridge.h
 create mode 100644 net/bridge/br_mrp.c
 create mode 100644 net/bridge/br_mrp_netlink.c
 create mode 100644 net/bridge/br_mrp_switchdev.c
 create mode 100644 net/bridge/br_private_mrp.h

Comments

Allan W. Nielsen Jan. 24, 2020, 8:34 p.m. UTC | #1
On 24.01.2020 17:18, Horatiu Vultur wrote:
>If the kernel cannot offload MRP to HW (maybe it does not have a switchdev
>driver, or it is just not supported), then all the netlink calls will return
>-EOPNOTSUPP. In this case the user-space daemon falls back to a SW-only
>implementation.
Horatiu and I have spent a bit of time discussing what would be best here.
An alternative to this would be to do the SW fallback in the kernel,
instead of user-land. This would mean that the user application does not
need to know if the function is offloaded (or partly offloaded) to HW.

We went with this approach to make the kernel part as simple as possible.
The alternative would still be much simpler than the first version
posted - but it would require a bit more code in the kernel.

Both options have pros and cons, and we are looking forward to the
community's view on this.

/Allan
Vinicius Costa Gomes Jan. 24, 2020, 9:05 p.m. UTC | #2
Hi,

"Allan W. Nielsen" <allan.nielsen@microchip.com> writes:

> On 24.01.2020 17:18, Horatiu Vultur wrote:
>>If the kernel cannot offload MRP to HW (maybe it does not have a switchdev
>>driver, or it is just not supported), then all the netlink calls will return
>>-EOPNOTSUPP. In this case the user-space daemon falls back to a SW-only
>>implementation.
> Horatiu and I have spent a bit of time discussing what would be best here.
> An alternative to this would be to do the SW fallback in the kernel,
> instead of user-land. This would mean that the user application does not
> need to know if the function is offloaded (or partly offloaded) to HW.
>
> We went with this approach to make the kernel part as simple as possible.
> The alternative would still be much simpler than the first version
> posted - but it would require a bit more.
>
> Both options have pros and cons, and we are looking forward to the
> community's view on this.

I have one idea and one question.

The idea is:

'net/hsr' already has a software implementation of the HSR replication
tag (and some of the handling necessary). So what came to mind is to
add the necessary switchdev functions to the master HSR device. If
that's done, then it sounds like the rest will mostly work.

For the user the flow would be something like:

 - User takes two (or more) interfaces and sets them as slaves of the HSR
   master device, say 'hsr0';

 - 'hsr0' implements some of the switchdev functionality so we can use
   the MRP userspace components on it;

Does it look like something that could work?

The question that I have is: what's the relation of IEC 62439-2 to IEEE
802.1CB? 


Cheers,
--
Vinicius
Allan W. Nielsen Jan. 25, 2020, 9:44 a.m. UTC | #3
Hi Vinicius,

On 24.01.2020 13:05, Vinicius Costa Gomes wrote:
>I have one idea and one question.

Let me answer the question before discussing the idea.

>The question that I have is: what's the relation of IEC 62439-2 to IEEE
>802.1CB?
HSR and 802.1CB (often called FRER - Frame Replication and Elimination
for Reliability) share a lot of functionality. It has been a while since I
read the 802.1CB standard, and I have only skimmed the HSR standard, but
as far as I understand 802.1CB is a superset of HSR. Also, I have not
studied the HSR implementation.

Both HSR and 802.1CB replicate the frame and eliminate the additional
copies. If just 1 of the replicated frames arrives, then higher layer
applications will not see any traffic loss.

MRP is different: it is a ring protocol, much more like ERPS defined in
G.8032 by ITU. Also, MRP only makes sense in switches; it does not make
sense in a host (like HSR does).

In MRP, the higher layer application frames are not replicated. They are
sent on either 1 port or the other.

Consider this example, with 3 nodes creating a ring. All nodes have a br0
device which includes the 2 NICs.

     +------------------------------------------+
     |                                          |
     +-->|H1|<---------->|H2|<---------->|H3|<--+
     eth0    eth1    eth0    eth1    eth0    eth1

Let's say that H1 is the manager (MRM), and H2 + H3 are the clients (MRC).

The MRM will now block one of the ports, let's say eth0, to prevent a
loop:

     +------------------------------------------+
     |                                          |
     +-->|H1|<---------->|H2|<---------->|H3|<--+
     eth0    eth1    eth0    eth1    eth0    eth1
      ^
      |
   Blocked


This means that H1 can reach H2 and H3 via eth1
This means that H2 can reach H1 via eth0
This means that H2 can reach H3 via eth1
This means that H3 can reach H1 and H2 via eth0

This is normal forwarding, done by the MAC table.

Let's say that the link between H1 and H2 goes down:

     +------------------------------------------+
     |                                          |
     +-->|H1|<---  / --->|H2|<---------->|H3|<--+
     eth0    eth1    eth0    eth1    eth0    eth1

H1 will now observe that the test packets it sends on eth1 are not
received on eth0, meaning that the ring is open. It will unblock the
eth0 device and send a message to all the nodes that they need to flush
their MAC tables.

This means that H1 can reach H2 and H3 via eth0
This means that H2 can reach H1 and H3 via eth1
This means that H3 can reach H2 via eth0
This means that H3 can reach H1 via eth1

In all cases, higher layer applications will use the br0 device to send
and receive frames. These higher layer applications will not see any
interruption (except during the few milliseconds it takes to unblock and
flush the MAC tables).
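
The two reachability lists can be double-checked with a small model of the
3-node ring. This is only an illustration: the LINKS table encodes the cabling
from the diagrams, egress_ports() is a made-up helper, and a blocked port is
modelled simply as removing its link for data traffic.

```python
from collections import deque

# (host, port) pairs joined by a cable, as in the diagrams.
LINKS = [
    (("H1", "eth1"), ("H2", "eth0")),
    (("H2", "eth1"), ("H3", "eth0")),
    (("H3", "eth1"), ("H1", "eth0")),
]


def egress_ports(links, blocked=frozenset()):
    """Map each (src, dst) host pair to the port src sends on to reach dst."""
    neighbours = {}
    for a, b in links:
        if a in blocked or b in blocked:
            continue  # a blocked port carries no data traffic
        neighbours.setdefault(a[0], []).append((a[1], b[0]))
        neighbours.setdefault(b[0], []).append((b[1], a[0]))

    result = {}
    for src in neighbours:
        # Breadth-first search that remembers the first-hop port.
        seen = {src}
        queue = deque((peer, port) for port, peer in neighbours[src])
        while queue:
            host, first_port = queue.popleft()
            if host in seen:
                continue
            seen.add(host)
            result[(src, host)] = first_port
            queue.extend((peer, first_port) for _, peer in neighbours[host])
    return result


# Ring closed: the MRM (H1) blocks eth0.
closed = egress_ports(LINKS, blocked={("H1", "eth0")})
# Link H1-H2 down: H1 has unblocked eth0, so nothing is blocked.
broken = egress_ports(LINKS[1:])
```

For example, closed[("H3", "H1")] is "eth0" (H3 goes through H2), while
broken[("H3", "H1")] is "eth1" (H3 now uses the direct link to H1).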

Sorry for the long explanation, but it is important to understand this
when discussing the design.

>The idea is:
>
>'net/hsr' already has a software implementation of the HSR replication
>tag (and some of the handling necessary). So what came to mind is to
>add the necessary switchdev functions to the master HSR device. If
>that's done, then it sounds that the rest will mostly work.
Maybe something could be done here, but it will not help MRP, as they do
not really share any functionality ;-)

>For the user the flow would be something like:
> - User takes two (or more interfaces) and set them as slaves of the HSR
>   master device, say 'hsr0';
> - 'hsr0' implements some of the switchdev functionality so we can use
>   the MRP userspace components on it;
For MRP to work, it really needs the bridge interface, and the higher
layer applications need to use the br0 device.

>Does it look like something that could work?
It would make much more sense if we discussed implementing 802.1CB in
some form (which we might get to).

/Allan
Andrew Lunn Jan. 25, 2020, 4:23 p.m. UTC | #4
> Lets say that the link between H1 and H2 goes down:
> 
>     +------------------------------------------+
>     |                                          |
>     +-->|H1|<---  / --->|H2|<---------->|H3|<--+
>     eth0    eth1    eth0    eth1    eth0    eth1
> 
> H1 will now observe that the test packets it sends on eth1, is not
> received in eth0, meaning that the ring is open

Hi Allan

Is H1 the only device sending test packets? It is assumed that H2 and
H3 will forward them? Or does each device send test packets, and when
it stops hearing these packets from a neighbour, it assumes the link
is open?

   Andrew
Allan W. Nielsen Jan. 25, 2020, 7:12 p.m. UTC | #5
On 25.01.2020 17:23, Andrew Lunn wrote:
>> Lets say that the link between H1 and H2 goes down:
>>
>>     +------------------------------------------+
>>     |                                          |
>>     +-->|H1|<---  / --->|H2|<---------->|H3|<--+
>>     eth0    eth1    eth0    eth1    eth0    eth1
>>
>> H1 will now observe that the test packets it sends on eth1, is not
>> received in eth0, meaning that the ring is open
>
>Is H1 the only device sending test packets? It is assumed that H2 and
>H3 will forward them?
Horatiu, please correct me if I'm wrong, you have been spending more
time with the standard.

It is only the manager/MRM (in this case H1) which sends test frames.
The other nodes (the MRCs) must forward the MRP test frames, but only
on the ports which are part of the ring.

> Or does each device send test packets, and when it stops hearing these
> packets from a neighbour, it assumes the link is open?
No.

This also means that most non-MRP aware switches can properly act as
MRC with HW offload. It is good to have that in mind when reviewing the
netlink interface.

It is worth mentioning that a client shall send a frame if it sees a
link up/down on one of the ring ports. This is to allow the manager to
react faster.

Also, in this first patch we have only defined the MRM and MRC roles. In
a future version we would also like to support an MRA (auto manager), where
the clients monitor the test frames, and if there is no manager
(it disappears, or they are all MRAs) then they can negotiate who should
take the role of manager.

/Allan
Vinicius Costa Gomes Jan. 25, 2020, 9:18 p.m. UTC | #6
Hi,

"Allan W. Nielsen" <allan.nielsen@microchip.com> writes:

> Hi Vinicius,
>
> On 24.01.2020 13:05, Vinicius Costa Gomes wrote:
>>I have one idea and one question.
>
> Sorry for the long explanation, but it is important to understand this
> when discussion the design.

Not at all, thanks a lot. Now it's clear to me that MRP and 802.1CB are
really different beasts, with different use cases/limitations:

 - MRP: now that we have a ring, let's break the loop, and use the
   redundancy provided by the ring to detect the problem and "repair" the
   network if something bad happens;

 - 802.1CB: now that we have a ring, let's send packets through
   two different paths, and find a way to discard duplicated ones, so
   even if something bad happens the packet will reach its destination;

(I know that it's more complicated than that in reality :-)

>
>>The idea is:
>>
>>'net/hsr' already has a software implementation of the HSR replication
>>tag (and some of the handling necessary). So what came to mind is to
>>add the necessary switchdev functions to the master HSR device. If
>>that's done, then it sounds that the rest will mostly work.
> Maybe something could be done here, but it will not help MRP, as they do
> not really share any functionality ;-)
>
>>For the user the flow would be something like:
>> - User takes two (or more interfaces) and set them as slaves of the HSR
>>   master device, say 'hsr0';
>> - 'hsr0' implements some of the switchdev functionality so we can use
>>   the MRP userspace components on it;
> For MRP to work, it really needs the bridge interface, and the higher
> layer applications need to use the br0 device.
>
>>Does it look like something that could work?
> It would make much more sense if we discussed implementing 802.1CB in
> some form (which we might get to).

I see. Agreed.

>
> /Allan


Cheers,
--
Vinicius
Jürgen Lambrecht Jan. 28, 2020, 10:35 a.m. UTC | #7
On 1/25/20 10:18 PM, Vinicius Costa Gomes wrote:
> Hi,
>
> "Allan W. Nielsen" <allan.nielsen@microchip.com> writes:
>
>> Hi Vinicius,
>>
>> On 24.01.2020 13:05, Vinicius Costa Gomes wrote:
>>> I have one idea and one question.
>> Let me answer the question before discussing the idea.
>>
>>> The question that I have is: what's the relation of IEC 62439-2 to IEEE
>>> 802.1CB?
>> HSR and 802.1CB (often called FRER - Frame Replication and Elimination
>> for Reliability) share a lot of functionality. It has been a while since I
>> read the 802.1CB standard, and I have only skimmed the HSR standard, but
>> as far as I understand 802.1CB is a superset of HSR. Also, I have not
>> studied the HSR implementation.
>> Both HSR and 802.1CB replicate the frame and eliminate the additional
>> copies. If just 1 of the replicated frames arrives, then higher layer
>> applications will not see any traffic loss.
>>
>> MRP is different: it is a ring protocol, much more like ERPS defined in
>> G.8032 by ITU. Also, MRP only makes sense in switches; it does not make
>> sense in a host (like HSR does).
>>
>> [snip MRP explanation]
>>
>> Sorry for the long explanation, but it is important to understand this
>> when discussing the design.
> Not at all, thanks a lot. Now it's clear to me that MRP and 802.1CB are
> really different beasts, with different use cases/limitations:
>
>  - MRP: now that we have a ring, let's break the loop, and use the
>    redundancy provided by the ring to detect the problem and "repair" the
>    network if something bad happens;
indeed. MRP is IEC 62439-2
>
>  - 802.1CB: now that we have a ring, let's send packets through
>    two different paths, and find a way to discard duplicated ones, so
>    even if something bad happens the packet will reach its destination;

Not exactly, 802.1CB is independent of the network layout, according to the abstract on https://ieeexplore.ieee.org/document/8091139.

The IEC 62439-3 standard mentions 2 network layouts: 2 parallel paths and a ring:

- IEC 62439-3.4 Parallel Redundancy Protocol (PRP): this runs on 2 separated parallel paths in the network

- IEC 62439-3.5 HSR (High availability seamless redundancy): this runs on a ring: each host sends all data in 2 directions, and when it receives its own data back, it discards it (to avoid a loop).

(and it is better to implement IEEE, because the standard costs only 151$, and the IEC ones cost 2x410$)

Kind regards,

Jürgen
Allan W. Nielsen Feb. 18, 2020, 12:18 p.m. UTC | #8
Hi All,

It's been a while since posting this series. We got some good and very
specific comments, but there has not been much discussion on the overall
architecture.

Here is the list of items we have noted to be fixed in the next version:
- The headless chicken (it keeps sending test frames if the user-space
   daemon dies)
- Avoid loops when bringing up the network - meaning we need to let MRP
   do its work before the br0 device is set to up, and we need to
   preserve that state.
- Unnecessary ifdef on the include.
- Extend the existing mac-table flush instead of adding
   BR_MRP_GENL_FLUSH
- Further optimize the changes in br_handle_frame

In v1 & v2 we had the entire protocol implemented in kernel-space.
Everybody told us this is a bad idea, and in v3 we have moved as much as
possible to user-space, and only kept the HW offload facilities in
kernel-space. The protocol is then implemented in user-space.

This is nice because it simplifies the code in the kernel and moves it
to user-space where such complexity is easier to handle. The downside of
this is that it makes the netlink interface more specific to our HW.

The way v3 is implemented, the netlink API returns an error if a given
operation cannot be HW offloaded. If the netlink calls return OK,
user-space will trust that the HW does the offloading as requested. If the
netlink calls return an error, it will implement all the functionality
in user-space.

This works in at least 2 scenarios: the HW we have with full MRP offload
capabilities, and a pure SW bridge.

But we should try to make sure this also works in a backwards compatible
way with future MRP aware HW, and with existing (and future) SwitchDev
offloaded HW. At the very least we want to make this run on Ocelot, HW
offloading the MRC role, but doing the MRM in SW (as the HW is not capable
of this).

If we use the kernel to abstract the MRP forwarding (not the entire
protocol like we did in v1/v2, not just the HW like we did in v3) then
we will have more flexibility to support other HW with a different set of
offload facilities, we can most likely achieve better performance, and
it would be a cleaner design.

This will mean that if user-space asks for MRP frames to be generated,
the kernel should make sure it will happen. The kernel can try to
offload this via the switchdev API, or it can do it in kernel-space.
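
To make the proposal concrete, here is a rough model of such a per-request
fallback. It is a sketch of the idea only: make_dispatcher() and the feature
names are hypothetical, and nothing here reflects actual switchdev objects or
a decided design.

```python
from typing import Callable, Set


def make_dispatcher(hw_features: Set[str]) -> Callable[[str], str]:
    """Build a request handler for a device with the given offload features."""

    def handle(request: str) -> str:
        if request in hw_features:
            return f"{request}: offloaded via switchdev"
        # The kernel, not user-space, picks up whatever the HW cannot do.
        return f"{request}: handled in kernel software"

    return handle


# Hypothetical feature names. The split mirrors the Ocelot case mentioned
# above: the MRC role can be offloaded, the MRM role cannot.
ocelot_like = make_dispatcher({"mrc-forwarding"})
```

With this shape, user-space issues the same request on every platform, and
ocelot_like("mrm-test-frames") is served in software while
ocelot_like("mrc-forwarding") is offloaded.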

Again, it will mean putting back some code into kernel space, but I
think it is worth it.

What do you think, what is the right design?

/Allan


Jakub Kicinski Feb. 18, 2020, 4:55 p.m. UTC | #9
On Tue, 18 Feb 2020 13:18:11 +0100 Allan W. Nielsen wrote:
> But we should try to make sure this also works in a backwards compatible
> way with future MRP aware HW, and with existing (and future) SwitchDev
> offloaded HW. At the very least we want to make this run on Ocelot, HW
> offload the MRC role, but do the MRM in SW (as the HW is not capable of
> this).
> 
> If we use the kernel to abstract the MRP forwarding (not the entire
> protocol like we did in v1/v2, not just the HW like we did in v3) then
> we will have more flexibility to support other HW with a different set of
> offload facilities, we can most likely achieve better performance, and
> it would be a cleaner design.
> 
> This will mean that if user-space asks for MRP frames to be generated,
> the kernel should make sure it will happen. The kernel can try to
> offload this via the switchdev API, or it can do it in kernel-space.
> 
> Again, it will mean putting back some code into kernel space, but I
> think it is worth it.

FWIW having the guarantee that the kernel can always perform requested
service/operation (either thru offload or in SW in kernel space) seems
appealing and in line with what we've been doing for other offloads.

IOW it'd be nice to have a kernel space software fallback for the
offloaded operations.
Nikolay Aleksandrov Feb. 20, 2020, 10:48 a.m. UTC | #10
On 18/02/2020 14:18, Allan W. Nielsen wrote:
> 
> Hi All,
> 
> It's been a while since posting this series. We got some good and very
> specific comments, but there has not been much discussion on the overall
> architecture.
> 
> Here is the list of items we have noted to be fixed in the next version:
> - The headless chicken (it keeps sending test frames if user-space
>   daemon dies)
> - Avoid loops when bringing up the network - meaning we need to let MRP
>   do its work before the br0 device is set to up, and we need to
>   preserve that state.
> - Unnecessary ifdef on the include.
> - Extend the existing mac-table flush instead of adding
>   BR_MRP_GENL_FLUSH
> - Further optimize the changes in br_handle_frame
> 
> In v1 & v2 we had the entire protocol implemented in kernel-space.
> Everybody told us this is a bad idea, and in v3 we have moved as much as
> possible to user-space, and only kept the HW offload facilities in
> kernel-space. The protocol is then implemented in user-space.
> 
> This is nice because it simplifies the code in the kernel and moves it
> to user-space where such complexity is easier to handle. The downside of
> this is that it makes the netlink interface more specific to our HW.
> 
> The way v3 is implemented, the netlink API returns an error if a given
> operation cannot be HW offloaded. If the netlink calls return OK,
> user-space will trust that the HW does the offloading as requested. If the
> netlink calls return an error, it will implement all the functionality
> in user-space.
> 
> This works in at least 2 scenarios: the HW we have with full MRP offload
> capabilities, and a pure SW bridge.
> 
> But we should try to make sure this also works in a backwards compatible
> way with future MRP aware HW, and with existing (and future) SwitchDev
> offloaded HW. At the very least we want to make this run on Ocelot, HW
> offload the MRC role, but do the MRM in SW (as the HW is not capable of
> this).
> 
> If we use the kernel to abstract the MRP forwarding (not the entire
> protocol like we did in v1/v2, not just the HW like we did in v3) then
> we will have more flexibility to support other HW with a different set of
> offload facilities, we can most likely achieve better performance, and
> it would be a cleaner design.
> 
> This will mean that if user-space asks for MRP frames to be generated,
> the kernel should make sure it will happen. The kernel can try to
> offload this via the switchdev API, or it can do it in kernel-space.
> 
> Again, it will mean putting back some code into kernel space, but I
> think it is worth it.
> 
> What do you think? What is the right design?
> 
> /Allan
> 
> 

In light of all the discussions and details that were explained, and as you've
noted above, I think more code should be put in kernel space. At the very least,
the performance/latency critical parts would benefit from being executed in the
kernel (a kind of control/data-plane separation). From the switchdev calls it seems
there's a minimal working set of state which defines the behaviour and can be used to
make decisions (similar to STP) in the kernel, while the complex logic for how to set it
can be executed in user-space. That means these hw-offload calls could perhaps have a
simple SW fallback based on their current values. I think it is worth considering whether
this can be achieved before going to a full in-kernel implementation of the state machine.
Since you intend to hw-offload it, putting in some SW fallback logic would be good for
when the HW can't offload everything. What you suggest above also sounds good to me, and
I think you'll have to extend the mdb and the multicast code to do it, but IIRC you already
have code to do that based on previous discussions.

As was already suggested, you can put the MRP options in the bridge's options and
process them from the bridge netlink code; that should simplify your code. You could
also make the port's "mrp_aware" bool into an internal port flag (use net_bridge_port's
flags field) so it can be tested quickly and sits in a hot cache line.
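
A minimal sketch of that idea, modelled outside the kernel: the bool becomes
one bit in the port's existing flags word, so the test is a single mask on a
field that is already read in the receive path. The struct, the bit position
and the helper names below are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit in the port flags word; the position is arbitrary. */
#define PORT_FLAG_MRP_AWARE (1UL << 0)

/* Stand-in for net_bridge_port, reduced to the flags field. */
struct port_model {
	unsigned long flags;
};

static inline void port_set_mrp_aware(struct port_model *p, bool on)
{
	if (on)
		p->flags |= PORT_FLAG_MRP_AWARE;
	else
		p->flags &= ~PORT_FLAG_MRP_AWARE;
}

/* Cheap check suitable for a per-frame path: one load and one mask. */
static inline bool port_is_mrp_aware(const struct port_model *p)
{
	return (p->flags & PORT_FLAG_MRP_AWARE) != 0;
}
```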

Allan W. Nielsen Feb. 20, 2020, 12:58 p.m. UTC | #11
Hi Nik,

On 20.02.2020 12:48, Nikolay Aleksandrov wrote:
> In light of all the discussions and details that were explained, and as you've
> noted above, I think more code should be put in kernel space. At the very least,
> the performance/latency critical parts would benefit from being executed in the
> kernel (a kind of control/data-plane separation). From the switchdev calls it seems
> there's a minimal working set of state which defines the behaviour and can be used to
> make decisions (similar to STP) in the kernel, while the complex logic for how to set it
> can be executed in user-space. That means these hw-offload calls could perhaps have a
> simple SW fallback based on their current values. I think it is worth considering whether
> this can be achieved before going to a full in-kernel implementation of the state machine.
> Since you intend to hw-offload it, putting in some SW fallback logic would be good for
> when the HW can't offload everything. What you suggest above also sounds good to me, and
> I think you'll have to extend the mdb and the multicast code to do it, but IIRC you already
> have code to do that based on previous discussions.
Sounds good. We will continue working on defining a good control/data-plane separation
and only keep the data-plane in the kernel. Also it seems that we agree that the SW fallback
of the data-plane should stay in the kernel - we will do that.

> As was already suggested, you can put the MRP options in the bridge's options and
> process them from the bridge netlink code; that should simplify your code.
I'm okay with this.

The main argument I see for creating a separate MRP netlink interface
instead of extending the bridge is that MRP is probably not the last
bridge protocol we will want to work on.

To complete the MRP-2018 implementation, we will also need some CFM
support (defined in 802.1Qag or the latest version of 802.1Q). And
further out we will probably want to implement the full CFM protocol.

DLR may also be relevant at some point, and there may be others.

My main point is that at some point we will probably want to do
separate NETLINK interfaces for this. Not sure if that is now or later.

> You could also make the port's "mrp_aware" bool into an internal port
> flag (use net_bridge_port's flags field) so it can be tested quickly
> and sits in a hot cache line.
Good point, we will do that.

/Allan