mbox series

[ovs-dev,v7,0/5] Implement MTU Path Discovery for multichassis ports

Message ID 20230523215537.1167155-1-ihrachys@redhat.com
Headers show
Series Implement MTU Path Discovery for multichassis ports | expand

Message

Ihar Hrachyshka May 23, 2023, 9:55 p.m. UTC
This series fixes a non-optimal behavior with some multichassis ports.

Specifically,

- when a multichassis port belongs to a switch that also has a localnet
  port,
- because ingress and egress traffic for the port is funnelled through
  tunnels to guarantee delivery of packets to all chassis that host the
  port,
- because tunnels add overhead to each funnelled packet,
- leaving less space for actual packets,

... the effective MTU of the path for multichassis ports is reduced by
the size that takes the tunnel to wrap a packet around. (This depends on
the type of tunnel, also on IP version of the tunnel.)

When this happens, the port and its peers are not notified about the
reduced MTU for the path to the port, effectively blackholing some of
the larger packets.

This patch series makes OVN detect these scenarios and handle them as
follows:

- for all multichassis ports, 4 flows are added - for inport and
  outport matches - to detect "too-big" IP packets;
- for all "too-big" packets, 2 flows are added to produce a ICMP
  Fragmentation Needed (for IPv4) or Too Big (for IPv6) packet and
  return it to the offending sender.

The proposed implementation is isolated in ovn-controller. An
alternative would be to implement flows producing ICMP errors via the
existing Logical_Flow action icmp4_error, as is done for router ports.
In this case, the 4 flows that detect "too-big" packets would still have
to reside in ovn-controller because of limitations of the current OVN
port model, namely lack of `mtu` as an attribute of OVN logical port.

Caveats: this works for IP traffic only. The party receiving the ICMP
error should handle it by temporarily lowering the MTU used to reach the
port. Behavior may depend on protocol implementation of the offending
peer as well as its networking stack configuration.

This patch series was successfully tested with Linux kernel 6.0.8.

This patch series is designed to simplify backporting process. It is
split in logical pieces to simplify cherry-picking. I consider the
original behavior described at the start of the cover letter a bug and
worth a backport.

---
v1: - initial version.
v2: - more system test cases adjusted to new table numbers for egress
      pipeline;
    - switch from xxd to coreutils (tr, head) tools to avoid a new
      dependency;
    - adjusted a comment in the new test case to avoid "migrator"
      terminology that makes no sense outside live migration context -
      which is not the only potential use case for multichassis ports;
    - guard the new test case with HAVE_SCAPY;
    - nit: rewrapped some lines in patch 6/7 to avoid unnecessary line
      changes;
    - added a NEWS entry for patch 6/7 that implements the core of the
      feature;
    - rebased to latest main.
v3: - expose and reuse lib/actions.c:encode_[start|finish]_controller_op
      functions;
    - reuse ETH_HEADER_LEN, IP_HEADER_LEN, and IPV6_HEADER_LEN from
      ovs:lib/packets.h;
    - add a comment for REGBIT_PKT_LARGER that mentions that the
      register is also used in OFTABLE_OUTPUT_LARGE_PKT_DETECT table;
    - squash the patch that makes if-status-mgr track changes to
      interface mtu into the patch that introduces the mtu field in the
      manager;
    - doc: don't describe path discovery flows in the patch that
      introduces new tables (only in the patch that actually implements
      the flows);
    - nit: use MAX to shave off several lines of code;
    - nit: remove redundant if_status_mgr_iface_get_mtu declaration from
      a header file;
    - rebased to latest main;
    - updated acks based on latest review comments.
v4: - fix compilation issue in the middle of the series due to previous
      commit rearrangement.
v5: - introduce a new if-status-mgr input to pflow_output, then process
      OVS interface updates inside if-status-mgr handlers.
    - if-status-mgr: remove set_mtu public function, instead other
      modules should call to if_status_mgr_iface_update. This allows the
      callers to not care which particular fields if-status-mgr would
      like to persist from the OVS record, keeping concerns separated.
v6: - update more tests in ovn-controller.at to reflect new table
      numbers.
v7: - remove ovs submodule bump that creeped in by mistake.
---

Ihar Hrachyshka (5):
  Track ip version of tunnel in chassis_tunnel struct
  Track interface MTU in if-status-mgr
  if-status: track interfaces for additional chassis
  Add new egress tables to accommodate for too-big packets handling
  Implement MTU Path Discovery for multichassis ports

 NEWS                        |   6 +
 controller/binding.c        |  48 +--
 controller/binding.h        |   4 +
 controller/if-status.c      |  48 ++-
 controller/if-status.h      |   8 +-
 controller/lflow.c          |   4 +-
 controller/lflow.h          |  49 +--
 controller/local_data.c     |   2 +
 controller/local_data.h     |   1 +
 controller/ovn-controller.c | 122 +++++++
 controller/ovsport.c        |   9 +
 controller/ovsport.h        |   2 +
 controller/physical.c       | 337 ++++++++++++++++++--
 controller/physical.h       |   2 +
 controller/pinctrl.c        |   8 +-
 include/ovn/actions.h       |   3 +
 lib/actions.c               |   4 +-
 lib/ovn-util.h              |   7 +
 northd/northd.c             |   2 +
 ovn-architecture.7.xml      |  76 ++---
 tests/ovn-controller.at     | 298 +++++++++---------
 tests/ovn.at                | 611 +++++++++++++++++++++++++++---------
 tests/system-ovn-kmod.at    |   2 +-
 tests/system-ovn.at         |   8 +-
 24 files changed, 1241 insertions(+), 420 deletions(-)

Comments

Ales Musil May 25, 2023, 8:51 a.m. UTC | #1
On Tue, May 23, 2023 at 11:56 PM Ihar Hrachyshka <ihrachys@redhat.com>
wrote:

> This series fixes a non-optimal behavior with some multichassis ports.
>
> Specifically,
>
> - when a multichassis port belongs to a switch that also has a localnet
>   port,
> - because ingress and egress traffic for the port is funnelled through
>   tunnels to guarantee delivery of packets to all chassis that host the
>   port,
> - because tunnels add overhead to each funnelled packet,
> - leaving less space for actual packets,
>
> ... the effective MTU of the path for multichassis ports is reduced by
> the size that takes the tunnel to wrap a packet around. (This depends on
> the type of tunnel, also on IP version of the tunnel.)
>
> When this happens, the port and its peers are not notified about the
> reduced MTU for the path to the port, effectively blackholing some of
> the larger packets.
>
> This patch series makes OVN detect these scenarios and handle them as
> follows:
>
> - for all multichassis ports, 4 flows are added - for inport and
>   outport matches - to detect "too-big" IP packets;
> - for all "too-big" packets, 2 flows are added to produce a ICMP
>   Fragmentation Needed (for IPv4) or Too Big (for IPv6) packet and
>   return it to the offending sender.
>
> The proposed implementation is isolated in ovn-controller. An
> alternative would be to implement flows producing ICMP errors via the
> existing Logical_Flow action icmp4_error, as is done for router ports.
> In this case, the 4 flows that detect "too-big" packets would still have
> to reside in ovn-controller because of limitations of the current OVN
> port model, namely lack of `mtu` as an attribute of OVN logical port.
>
> Caveats: this works for IP traffic only. The party receiving the ICMP
> error should handle it by temporarily lowering the MTU used to reach the
> port. Behavior may depend on protocol implementation of the offending
> peer as well as its networking stack configuration.
>
> This patch series was successfully tested with Linux kernel 6.0.8.
>
> This patch series is designed to simplify backporting process. It is
> split in logical pieces to simplify cherry-picking. I consider the
> original behavior described at the start of the cover letter a bug and
> worth a backport.
>
> ---
> v1: - initial version.
> v2: - more system test cases adjusted to new table numbers for egress
>       pipeline;
>     - switch from xxd to coreutils (tr, head) tools to avoid a new
>       dependency;
>     - adjusted a comment in the new test case to avoid "migrator"
>       terminology that makes no sense outside live migration context -
>       which is not the only potential use case for multichassis ports;
>     - guard the new test case with HAVE_SCAPY;
>     - nit: rewrapped some lines in patch 6/7 to avoid unnecessary line
>       changes;
>     - added a NEWS entry for patch 6/7 that implements the core of the
>       feature;
>     - rebased to latest main.
> v3: - expose and reuse lib/actions.c:encode_[start|finish]_controller_op
>       functions;
>     - reuse ETH_HEADER_LEN, IP_HEADER_LEN, and IPV6_HEADER_LEN from
>       ovs:lib/packets.h;
>     - add a comment for REGBIT_PKT_LARGER that mentions that the
>       register is also used in OFTABLE_OUTPUT_LARGE_PKT_DETECT table;
>     - squash the patch that makes if-status-mgr track changes to
>       interface mtu into the patch that introduces the mtu field in the
>       manager;
>     - doc: don't describe path discovery flows in the patch that
>       introduces new tables (only in the patch that actually implements
>       the flows);
>     - nit: use MAX to shave off several lines of code;
>     - nit: remove redundant if_status_mgr_iface_get_mtu declaration from
>       a header file;
>     - rebased to latest main;
>     - updated acks based on latest review comments.
> v4: - fix compilation issue in the middle of the series due to previous
>       commit rearrangement.
> v5: - introduce a new if-status-mgr input to pflow_output, then process
>       OVS interface updates inside if-status-mgr handlers.
>     - if-status-mgr: remove set_mtu public function, instead other
>       modules should call to if_status_mgr_iface_update. This allows the
>       callers to not care which particular fields if-status-mgr would
>       like to persist from the OVS record, keeping concerns separated.
> v6: - update more tests in ovn-controller.at to reflect new table
>       numbers.
> v7: - remove ovs submodule bump that creeped in by mistake.
> ---
>
> Ihar Hrachyshka (5):
>   Track ip version of tunnel in chassis_tunnel struct
>   Track interface MTU in if-status-mgr
>   if-status: track interfaces for additional chassis
>   Add new egress tables to accommodate for too-big packets handling
>   Implement MTU Path Discovery for multichassis ports
>
>  NEWS                        |   6 +
>  controller/binding.c        |  48 +--
>  controller/binding.h        |   4 +
>  controller/if-status.c      |  48 ++-
>  controller/if-status.h      |   8 +-
>  controller/lflow.c          |   4 +-
>  controller/lflow.h          |  49 +--
>  controller/local_data.c     |   2 +
>  controller/local_data.h     |   1 +
>  controller/ovn-controller.c | 122 +++++++
>  controller/ovsport.c        |   9 +
>  controller/ovsport.h        |   2 +
>  controller/physical.c       | 337 ++++++++++++++++++--
>  controller/physical.h       |   2 +
>  controller/pinctrl.c        |   8 +-
>  include/ovn/actions.h       |   3 +
>  lib/actions.c               |   4 +-
>  lib/ovn-util.h              |   7 +
>  northd/northd.c             |   2 +
>  ovn-architecture.7.xml      |  76 ++---
>  tests/ovn-controller.at     | 298 +++++++++---------
>  tests/ovn.at                | 611 +++++++++++++++++++++++++++---------
>  tests/system-ovn-kmod.at    |   2 +-
>  tests/system-ovn.at         |   8 +-
>  24 files changed, 1241 insertions(+), 420 deletions(-)
>
> --
> 2.38.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
The whole series looks good to me, thanks.

Reviewed-by: Ales Musil <amusil@redhat.com>
Mark Michelson May 30, 2023, 6:27 p.m. UTC | #2
Thanks Ihar and Dumitru.

I applied this series to main and branch-23.06. I encountered conflicts 
when trying to apply this to branch-23.03, specifically with patch 4. I 
suspect a lot of this is due to the fact that the tiered ACL changes are 
not present in branch-23.03, so

* The table numbers don't work out the same in some cases.
* ACL flow actions are different prior to 23.06.

If you can post backport versions of the patches for 23.03 and below, 
that would be great, Ihar.

Thanks,
Mark Michelson

On 5/23/23 17:55, Ihar Hrachyshka wrote:
> This series fixes a non-optimal behavior with some multichassis ports.
> 
> Specifically,
> 
> - when a multichassis port belongs to a switch that also has a localnet
>    port,
> - because ingress and egress traffic for the port is funnelled through
>    tunnels to guarantee delivery of packets to all chassis that host the
>    port,
> - because tunnels add overhead to each funnelled packet,
> - leaving less space for actual packets,
> 
> ... the effective MTU of the path for multichassis ports is reduced by
> the size that takes the tunnel to wrap a packet around. (This depends on
> the type of tunnel, also on IP version of the tunnel.)
> 
> When this happens, the port and its peers are not notified about the
> reduced MTU for the path to the port, effectively blackholing some of
> the larger packets.
> 
> This patch series makes OVN detect these scenarios and handle them as
> follows:
> 
> - for all multichassis ports, 4 flows are added - for inport and
>    outport matches - to detect "too-big" IP packets;
> - for all "too-big" packets, 2 flows are added to produce a ICMP
>    Fragmentation Needed (for IPv4) or Too Big (for IPv6) packet and
>    return it to the offending sender.
> 
> The proposed implementation is isolated in ovn-controller. An
> alternative would be to implement flows producing ICMP errors via the
> existing Logical_Flow action icmp4_error, as is done for router ports.
> In this case, the 4 flows that detect "too-big" packets would still have
> to reside in ovn-controller because of limitations of the current OVN
> port model, namely lack of `mtu` as an attribute of OVN logical port.
> 
> Caveats: this works for IP traffic only. The party receiving the ICMP
> error should handle it by temporarily lowering the MTU used to reach the
> port. Behavior may depend on protocol implementation of the offending
> peer as well as its networking stack configuration.
> 
> This patch series was successfully tested with Linux kernel 6.0.8.
> 
> This patch series is designed to simplify backporting process. It is
> split in logical pieces to simplify cherry-picking. I consider the
> original behavior described at the start of the cover letter a bug and
> worth a backport.
> 
> ---
> v1: - initial version.
> v2: - more system test cases adjusted to new table numbers for egress
>        pipeline;
>      - switch from xxd to coreutils (tr, head) tools to avoid a new
>        dependency;
>      - adjusted a comment in the new test case to avoid "migrator"
>        terminology that makes no sense outside live migration context -
>        which is not the only potential use case for multichassis ports;
>      - guard the new test case with HAVE_SCAPY;
>      - nit: rewrapped some lines in patch 6/7 to avoid unnecessary line
>        changes;
>      - added a NEWS entry for patch 6/7 that implements the core of the
>        feature;
>      - rebased to latest main.
> v3: - expose and reuse lib/actions.c:encode_[start|finish]_controller_op
>        functions;
>      - reuse ETH_HEADER_LEN, IP_HEADER_LEN, and IPV6_HEADER_LEN from
>        ovs:lib/packets.h;
>      - add a comment for REGBIT_PKT_LARGER that mentions that the
>        register is also used in OFTABLE_OUTPUT_LARGE_PKT_DETECT table;
>      - squash the patch that makes if-status-mgr track changes to
>        interface mtu into the patch that introduces the mtu field in the
>        manager;
>      - doc: don't describe path discovery flows in the patch that
>        introduces new tables (only in the patch that actually implements
>        the flows);
>      - nit: use MAX to shave off several lines of code;
>      - nit: remove redundant if_status_mgr_iface_get_mtu declaration from
>        a header file;
>      - rebased to latest main;
>      - updated acks based on latest review comments.
> v4: - fix compilation issue in the middle of the series due to previous
>        commit rearrangement.
> v5: - introduce a new if-status-mgr input to pflow_output, then process
>        OVS interface updates inside if-status-mgr handlers.
>      - if-status-mgr: remove set_mtu public function, instead other
>        modules should call to if_status_mgr_iface_update. This allows the
>        callers to not care which particular fields if-status-mgr would
>        like to persist from the OVS record, keeping concerns separated.
> v6: - update more tests in ovn-controller.at to reflect new table
>        numbers.
> v7: - remove ovs submodule bump that creeped in by mistake.
> ---
> 
> Ihar Hrachyshka (5):
>    Track ip version of tunnel in chassis_tunnel struct
>    Track interface MTU in if-status-mgr
>    if-status: track interfaces for additional chassis
>    Add new egress tables to accommodate for too-big packets handling
>    Implement MTU Path Discovery for multichassis ports
> 
>   NEWS                        |   6 +
>   controller/binding.c        |  48 +--
>   controller/binding.h        |   4 +
>   controller/if-status.c      |  48 ++-
>   controller/if-status.h      |   8 +-
>   controller/lflow.c          |   4 +-
>   controller/lflow.h          |  49 +--
>   controller/local_data.c     |   2 +
>   controller/local_data.h     |   1 +
>   controller/ovn-controller.c | 122 +++++++
>   controller/ovsport.c        |   9 +
>   controller/ovsport.h        |   2 +
>   controller/physical.c       | 337 ++++++++++++++++++--
>   controller/physical.h       |   2 +
>   controller/pinctrl.c        |   8 +-
>   include/ovn/actions.h       |   3 +
>   lib/actions.c               |   4 +-
>   lib/ovn-util.h              |   7 +
>   northd/northd.c             |   2 +
>   ovn-architecture.7.xml      |  76 ++---
>   tests/ovn-controller.at     | 298 +++++++++---------
>   tests/ovn.at                | 611 +++++++++++++++++++++++++++---------
>   tests/system-ovn-kmod.at    |   2 +-
>   tests/system-ovn.at         |   8 +-
>   24 files changed, 1241 insertions(+), 420 deletions(-)
>
Dumitru Ceara May 30, 2023, 8:52 p.m. UTC | #3
On 5/30/23 20:27, Mark Michelson wrote:
> Thanks Ihar and Dumitru.
> 
> I applied this series to main and branch-23.06. I encountered conflicts

Thanks, Mark!  Unfortunately, I had missed the fact that a new test had
been added in the meantime and that we had to update the table numbers
for that one too.  CI was failing because of that.

I posted a fix for it:

https://patchwork.ozlabs.org/project/ovn/patch/20230530205146.960485-1-dceara@redhat.com/

Regards,
Dumitru

> when trying to apply this to branch-23.03, specifically with patch 4. I
> suspect a lot of this is due to the fact that the tiered ACL changes are
> not present in branch-23.03, so
> 
> * The table numbers don't work out the same in some cases.
> * ACL flow actions are different prior to 23.06.
> 
> If you can post backport versions of the patches for 23.03 and below,
> that would be great, Ihar.
> 
> Thanks,
> Mark Michelson
> 
> On 5/23/23 17:55, Ihar Hrachyshka wrote:
>> This series fixes a non-optimal behavior with some multichassis ports.
>>
>> Specifically,
>>
>> - when a multichassis port belongs to a switch that also has a localnet
>>    port,
>> - because ingress and egress traffic for the port is funnelled through
>>    tunnels to guarantee delivery of packets to all chassis that host the
>>    port,
>> - because tunnels add overhead to each funnelled packet,
>> - leaving less space for actual packets,
>>
>> ... the effective MTU of the path for multichassis ports is reduced by
>> the size that takes the tunnel to wrap a packet around. (This depends on
>> the type of tunnel, also on IP version of the tunnel.)
>>
>> When this happens, the port and its peers are not notified about the
>> reduced MTU for the path to the port, effectively blackholing some of
>> the larger packets.
>>
>> This patch series makes OVN detect these scenarios and handle them as
>> follows:
>>
>> - for all multichassis ports, 4 flows are added - for inport and
>>    outport matches - to detect "too-big" IP packets;
>> - for all "too-big" packets, 2 flows are added to produce a ICMP
>>    Fragmentation Needed (for IPv4) or Too Big (for IPv6) packet and
>>    return it to the offending sender.
>>
>> The proposed implementation is isolated in ovn-controller. An
>> alternative would be to implement flows producing ICMP errors via the
>> existing Logical_Flow action icmp4_error, as is done for router ports.
>> In this case, the 4 flows that detect "too-big" packets would still have
>> to reside in ovn-controller because of limitations of the current OVN
>> port model, namely lack of `mtu` as an attribute of OVN logical port.
>>
>> Caveats: this works for IP traffic only. The party receiving the ICMP
>> error should handle it by temporarily lowering the MTU used to reach the
>> port. Behavior may depend on protocol implementation of the offending
>> peer as well as its networking stack configuration.
>>
>> This patch series was successfully tested with Linux kernel 6.0.8.
>>
>> This patch series is designed to simplify backporting process. It is
>> split in logical pieces to simplify cherry-picking. I consider the
>> original behavior described at the start of the cover letter a bug and
>> worth a backport.
>>
>> ---
>> v1: - initial version.
>> v2: - more system test cases adjusted to new table numbers for egress
>>        pipeline;
>>      - switch from xxd to coreutils (tr, head) tools to avoid a new
>>        dependency;
>>      - adjusted a comment in the new test case to avoid "migrator"
>>        terminology that makes no sense outside live migration context -
>>        which is not the only potential use case for multichassis ports;
>>      - guard the new test case with HAVE_SCAPY;
>>      - nit: rewrapped some lines in patch 6/7 to avoid unnecessary line
>>        changes;
>>      - added a NEWS entry for patch 6/7 that implements the core of the
>>        feature;
>>      - rebased to latest main.
>> v3: - expose and reuse lib/actions.c:encode_[start|finish]_controller_op
>>        functions;
>>      - reuse ETH_HEADER_LEN, IP_HEADER_LEN, and IPV6_HEADER_LEN from
>>        ovs:lib/packets.h;
>>      - add a comment for REGBIT_PKT_LARGER that mentions that the
>>        register is also used in OFTABLE_OUTPUT_LARGE_PKT_DETECT table;
>>      - squash the patch that makes if-status-mgr track changes to
>>        interface mtu into the patch that introduces the mtu field in the
>>        manager;
>>      - doc: don't describe path discovery flows in the patch that
>>        introduces new tables (only in the patch that actually implements
>>        the flows);
>>      - nit: use MAX to shave off several lines of code;
>>      - nit: remove redundant if_status_mgr_iface_get_mtu declaration from
>>        a header file;
>>      - rebased to latest main;
>>      - updated acks based on latest review comments.
>> v4: - fix compilation issue in the middle of the series due to previous
>>        commit rearrangement.
>> v5: - introduce a new if-status-mgr input to pflow_output, then process
>>        OVS interface updates inside if-status-mgr handlers.
>>      - if-status-mgr: remove set_mtu public function, instead other
>>        modules should call to if_status_mgr_iface_update. This allows the
>>        callers to not care which particular fields if-status-mgr would
>>        like to persist from the OVS record, keeping concerns separated.
>> v6: - update more tests in ovn-controller.at to reflect new table
>>        numbers.
>> v7: - remove ovs submodule bump that creeped in by mistake.
>> ---
>>
>> Ihar Hrachyshka (5):
>>    Track ip version of tunnel in chassis_tunnel struct
>>    Track interface MTU in if-status-mgr
>>    if-status: track interfaces for additional chassis
>>    Add new egress tables to accommodate for too-big packets handling
>>    Implement MTU Path Discovery for multichassis ports
>>
>>   NEWS                        |   6 +
>>   controller/binding.c        |  48 +--
>>   controller/binding.h        |   4 +
>>   controller/if-status.c      |  48 ++-
>>   controller/if-status.h      |   8 +-
>>   controller/lflow.c          |   4 +-
>>   controller/lflow.h          |  49 +--
>>   controller/local_data.c     |   2 +
>>   controller/local_data.h     |   1 +
>>   controller/ovn-controller.c | 122 +++++++
>>   controller/ovsport.c        |   9 +
>>   controller/ovsport.h        |   2 +
>>   controller/physical.c       | 337 ++++++++++++++++++--
>>   controller/physical.h       |   2 +
>>   controller/pinctrl.c        |   8 +-
>>   include/ovn/actions.h       |   3 +
>>   lib/actions.c               |   4 +-
>>   lib/ovn-util.h              |   7 +
>>   northd/northd.c             |   2 +
>>   ovn-architecture.7.xml      |  76 ++---
>>   tests/ovn-controller.at     | 298 +++++++++---------
>>   tests/ovn.at                | 611 +++++++++++++++++++++++++++---------
>>   tests/system-ovn-kmod.at    |   2 +-
>>   tests/system-ovn.at         |   8 +-
>>   24 files changed, 1241 insertions(+), 420 deletions(-)
>>
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Ihar Hrachyshka May 30, 2023, 9:06 p.m. UTC | #4
On Tue, May 30, 2023 at 2:28 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Thanks Ihar and Dumitru.
>
> I applied this series to main and branch-23.06. I encountered conflicts
> when trying to apply this to branch-23.03, specifically with patch 4. I
> suspect a lot of this is due to the fact that the tiered ACL changes are
> not present in branch-23.03, so
>
> * The table numbers don't work out the same in some cases.
> * ACL flow actions are different prior to 23.06.
>
> If you can post backport versions of the patches for 23.03 and below,
> that would be great, Ihar.

Should I backport down to LTS (22.03)?

>
> Thanks,
> Mark Michelson
>
> On 5/23/23 17:55, Ihar Hrachyshka wrote:
> > This series fixes a non-optimal behavior with some multichassis ports.
> >
> > Specifically,
> >
> > - when a multichassis port belongs to a switch that also has a localnet
> >    port,
> > - because ingress and egress traffic for the port is funnelled through
> >    tunnels to guarantee delivery of packets to all chassis that host the
> >    port,
> > - because tunnels add overhead to each funnelled packet,
> > - leaving less space for actual packets,
> >
> > ... the effective MTU of the path for multichassis ports is reduced by
> > the size that takes the tunnel to wrap a packet around. (This depends on
> > the type of tunnel, also on IP version of the tunnel.)
> >
> > When this happens, the port and its peers are not notified about the
> > reduced MTU for the path to the port, effectively blackholing some of
> > the larger packets.
> >
> > This patch series makes OVN detect these scenarios and handle them as
> > follows:
> >
> > - for all multichassis ports, 4 flows are added - for inport and
> >    outport matches - to detect "too-big" IP packets;
> > - for all "too-big" packets, 2 flows are added to produce a ICMP
> >    Fragmentation Needed (for IPv4) or Too Big (for IPv6) packet and
> >    return it to the offending sender.
> >
> > The proposed implementation is isolated in ovn-controller. An
> > alternative would be to implement flows producing ICMP errors via the
> > existing Logical_Flow action icmp4_error, as is done for router ports.
> > In this case, the 4 flows that detect "too-big" packets would still have
> > to reside in ovn-controller because of limitations of the current OVN
> > port model, namely lack of `mtu` as an attribute of OVN logical port.
> >
> > Caveats: this works for IP traffic only. The party receiving the ICMP
> > error should handle it by temporarily lowering the MTU used to reach the
> > port. Behavior may depend on protocol implementation of the offending
> > peer as well as its networking stack configuration.
> >
> > This patch series was successfully tested with Linux kernel 6.0.8.
> >
> > This patch series is designed to simplify backporting process. It is
> > split in logical pieces to simplify cherry-picking. I consider the
> > original behavior described at the start of the cover letter a bug and
> > worth a backport.
> >
> > ---
> > v1: - initial version.
> > v2: - more system test cases adjusted to new table numbers for egress
> >        pipeline;
> >      - switch from xxd to coreutils (tr, head) tools to avoid a new
> >        dependency;
> >      - adjusted a comment in the new test case to avoid "migrator"
> >        terminology that makes no sense outside live migration context -
> >        which is not the only potential use case for multichassis ports;
> >      - guard the new test case with HAVE_SCAPY;
> >      - nit: rewrapped some lines in patch 6/7 to avoid unnecessary line
> >        changes;
> >      - added a NEWS entry for patch 6/7 that implements the core of the
> >        feature;
> >      - rebased to latest main.
> > v3: - expose and reuse lib/actions.c:encode_[start|finish]_controller_op
> >        functions;
> >      - reuse ETH_HEADER_LEN, IP_HEADER_LEN, and IPV6_HEADER_LEN from
> >        ovs:lib/packets.h;
> >      - add a comment for REGBIT_PKT_LARGER that mentions that the
> >        register is also used in OFTABLE_OUTPUT_LARGE_PKT_DETECT table;
> >      - squash the patch that makes if-status-mgr track changes to
> >        interface mtu into the patch that introduces the mtu field in the
> >        manager;
> >      - doc: don't describe path discovery flows in the patch that
> >        introduces new tables (only in the patch that actually implements
> >        the flows);
> >      - nit: use MAX to shave off several lines of code;
> >      - nit: remove redundant if_status_mgr_iface_get_mtu declaration from
> >        a header file;
> >      - rebased to latest main;
> >      - updated acks based on latest review comments.
> > v4: - fix compilation issue in the middle of the series due to previous
> >        commit rearrangement.
> > v5: - introduce a new if-status-mgr input to pflow_output, then process
> >        OVS interface updates inside if-status-mgr handlers.
> >      - if-status-mgr: remove set_mtu public function, instead other
> >        modules should call to if_status_mgr_iface_update. This allows the
> >        callers to not care which particular fields if-status-mgr would
> >        like to persist from the OVS record, keeping concerns separated.
> > v6: - update more tests in ovn-controller.at to reflect new table
> >        numbers.
> > v7: - remove ovs submodule bump that creeped in by mistake.
> > ---
> >
> > Ihar Hrachyshka (5):
> >    Track ip version of tunnel in chassis_tunnel struct
> >    Track interface MTU in if-status-mgr
> >    if-status: track interfaces for additional chassis
> >    Add new egress tables to accommodate for too-big packets handling
> >    Implement MTU Path Discovery for multichassis ports
> >
> >   NEWS                        |   6 +
> >   controller/binding.c        |  48 +--
> >   controller/binding.h        |   4 +
> >   controller/if-status.c      |  48 ++-
> >   controller/if-status.h      |   8 +-
> >   controller/lflow.c          |   4 +-
> >   controller/lflow.h          |  49 +--
> >   controller/local_data.c     |   2 +
> >   controller/local_data.h     |   1 +
> >   controller/ovn-controller.c | 122 +++++++
> >   controller/ovsport.c        |   9 +
> >   controller/ovsport.h        |   2 +
> >   controller/physical.c       | 337 ++++++++++++++++++--
> >   controller/physical.h       |   2 +
> >   controller/pinctrl.c        |   8 +-
> >   include/ovn/actions.h       |   3 +
> >   lib/actions.c               |   4 +-
> >   lib/ovn-util.h              |   7 +
> >   northd/northd.c             |   2 +
> >   ovn-architecture.7.xml      |  76 ++---
> >   tests/ovn-controller.at     | 298 +++++++++---------
> >   tests/ovn.at                | 611 +++++++++++++++++++++++++++---------
> >   tests/system-ovn-kmod.at    |   2 +-
> >   tests/system-ovn.at         |   8 +-
> >   24 files changed, 1241 insertions(+), 420 deletions(-)
> >
>
Ihar Hrachyshka May 30, 2023, 9:53 p.m. UTC | #5
On Tue, May 30, 2023 at 5:06 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>
> On Tue, May 30, 2023 at 2:28 PM Mark Michelson <mmichels@redhat.com> wrote:
> >
> > Thanks Ihar and Dumitru.
> >
> > I applied this series to main and branch-23.06. I encountered conflicts
> > when trying to apply this to branch-23.03, specifically with patch 4. I
> > suspect a lot of this is due to the fact that the tiered ACL changes are
> > not present in branch-23.03, so
> >
> > * The table numbers don't work out the same in some cases.
> > * ACL flow actions are different prior to 23.06.
> >
> > If you can post backport versions of the patches for 23.03 and below,
> > that would be great, Ihar.
>
> Should I backport down to LTS (22.03)?
>

Sorry, I mean down to 22.06 (non-LTS) because the first release with
multichassis ports is 22.06.

> >
> > Thanks,
> > Mark Michelson
> >
> > On 5/23/23 17:55, Ihar Hrachyshka wrote:
> > > This series fixes a non-optimal behavior with some multichassis ports.
> > >
> > > Specifically,
> > >
> > > - when a multichassis port belongs to a switch that also has a localnet
> > >    port,
> > > - because ingress and egress traffic for the port is funnelled through
> > >    tunnels to guarantee delivery of packets to all chassis that host the
> > >    port,
> > > - because tunnels add overhead to each funnelled packet,
> > > - leaving less space for actual packets,
> > >
> > > ... the effective MTU of the path for multichassis ports is reduced by
> > > the size that takes the tunnel to wrap a packet around. (This depends on
> > > the type of tunnel, also on IP version of the tunnel.)
> > >
> > > When this happens, the port and its peers are not notified about the
> > > reduced MTU for the path to the port, effectively blackholing some of
> > > the larger packets.
> > >
> > > This patch series makes OVN detect these scenarios and handle them as
> > > follows:
> > >
> > > - for all multichassis ports, 4 flows are added - for inport and
> > >    outport matches - to detect "too-big" IP packets;
> > > - for all "too-big" packets, 2 flows are added to produce a ICMP
> > >    Fragmentation Needed (for IPv4) or Too Big (for IPv6) packet and
> > >    return it to the offending sender.
> > >
> > > The proposed implementation is isolated in ovn-controller. An
> > > alternative would be to implement flows producing ICMP errors via the
> > > existing Logical_Flow action icmp4_error, as is done for router ports.
> > > In this case, the 4 flows that detect "too-big" packets would still have
> > > to reside in ovn-controller because of limitations of the current OVN
> > > port model, namely lack of `mtu` as an attribute of OVN logical port.
> > >
> > > Caveats: this works for IP traffic only. The party receiving the ICMP
> > > error should handle it by temporarily lowering the MTU used to reach the
> > > port. Behavior may depend on protocol implementation of the offending
> > > peer as well as its networking stack configuration.
> > >
> > > This patch series was successfully tested with Linux kernel 6.0.8.
> > >
> > > This patch series is designed to simplify backporting process. It is
> > > split in logical pieces to simplify cherry-picking. I consider the
> > > original behavior described at the start of the cover letter a bug and
> > > worth a backport.
> > >
> > > ---
> > > v1: - initial version.
> > > v2: - more system test cases adjusted to new table numbers for egress
> > >        pipeline;
> > >      - switch from xxd to coreutils (tr, head) tools to avoid a new
> > >        dependency;
> > >      - adjusted a comment in the new test case to avoid "migrator"
> > >        terminology that makes no sense outside live migration context -
> > >        which is not the only potential use case for multichassis ports;
> > >      - guard the new test case with HAVE_SCAPY;
> > >      - nit: rewrapped some lines in patch 6/7 to avoid unnecessary line
> > >        changes;
> > >      - added a NEWS entry for patch 6/7 that implements the core of the
> > >        feature;
> > >      - rebased to latest main.
> > > v3: - expose and reuse lib/actions.c:encode_[start|finish]_controller_op
> > >        functions;
> > >      - reuse ETH_HEADER_LEN, IP_HEADER_LEN, and IPV6_HEADER_LEN from
> > >        ovs:lib/packets.h;
> > >      - add a comment for REGBIT_PKT_LARGER that mentions that the
> > >        register is also used in OFTABLE_OUTPUT_LARGE_PKT_DETECT table;
> > >      - squash the patch that makes if-status-mgr track changes to
> > >        interface mtu into the patch that introduces the mtu field in the
> > >        manager;
> > >      - doc: don't describe path discovery flows in the patch that
> > >        introduces new tables (only in the patch that actually implements
> > >        the flows);
> > >      - nit: use MAX to shave off several lines of code;
> > >      - nit: remove redundant if_status_mgr_iface_get_mtu declaration from
> > >        a header file;
> > >      - rebased to latest main;
> > >      - updated acks based on latest review comments.
> > > v4: - fix compilation issue in the middle of the series due to previous
> > >        commit rearrangement.
> > > v5: - introduce a new if-status-mgr input to pflow_output, then process
> > >        OVS interface updates inside if-status-mgr handlers.
> > >      - if-status-mgr: remove set_mtu public function, instead other
> > >        modules should call to if_status_mgr_iface_update. This allows the
> > >        callers to not care which particular fields if-status-mgr would
> > >        like to persist from the OVS record, keeping concerns separated.
> > > v6: - update more tests in ovn-controller.at to reflect new table
> > >        numbers.
> > > v7: - remove ovs submodule bump that creeped in by mistake.
> > > ---
> > >
> > > Ihar Hrachyshka (5):
> > >    Track ip version of tunnel in chassis_tunnel struct
> > >    Track interface MTU in if-status-mgr
> > >    if-status: track interfaces for additional chassis
> > >    Add new egress tables to accommodate for too-big packets handling
> > >    Implement MTU Path Discovery for multichassis ports
> > >
> > >   NEWS                        |   6 +
> > >   controller/binding.c        |  48 +--
> > >   controller/binding.h        |   4 +
> > >   controller/if-status.c      |  48 ++-
> > >   controller/if-status.h      |   8 +-
> > >   controller/lflow.c          |   4 +-
> > >   controller/lflow.h          |  49 +--
> > >   controller/local_data.c     |   2 +
> > >   controller/local_data.h     |   1 +
> > >   controller/ovn-controller.c | 122 +++++++
> > >   controller/ovsport.c        |   9 +
> > >   controller/ovsport.h        |   2 +
> > >   controller/physical.c       | 337 ++++++++++++++++++--
> > >   controller/physical.h       |   2 +
> > >   controller/pinctrl.c        |   8 +-
> > >   include/ovn/actions.h       |   3 +
> > >   lib/actions.c               |   4 +-
> > >   lib/ovn-util.h              |   7 +
> > >   northd/northd.c             |   2 +
> > >   ovn-architecture.7.xml      |  76 ++---
> > >   tests/ovn-controller.at     | 298 +++++++++---------
> > >   tests/ovn.at                | 611 +++++++++++++++++++++++++++---------
> > >   tests/system-ovn-kmod.at    |   2 +-
> > >   tests/system-ovn.at         |   8 +-
> > >   24 files changed, 1241 insertions(+), 420 deletions(-)
> > >
> >
Ihar Hrachyshka May 31, 2023, 8:13 p.m. UTC | #6
On Tue, May 30, 2023 at 5:53 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>
> On Tue, May 30, 2023 at 5:06 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> >
> > On Tue, May 30, 2023 at 2:28 PM Mark Michelson <mmichels@redhat.com> wrote:
> > >
> > > Thanks Ihar and Dumitru.
> > >
> > > I applied this series to main and branch-23.06. I encountered conflicts
> > > when trying to apply this to branch-23.03, specifically with patch 4. I
> > > suspect a lot of this is due to the fact that the tiered ACL changes are
> > > not present in branch-23.03, so
> > >
> > > * The table numbers don't work out the same in some cases.
> > > * ACL flow actions are different prior to 23.06.
> > >
> > > If you can post backport versions of the patches for 23.03 and below,
> > > that would be great, Ihar.
> >
> > Should I backport down to LTS (22.03)?
> >
>
> Sorry, I mean down to 22.06 (non-LTS) because the first release with
> multichassis ports is 22.06.
>

UPD I've posted backports down to 22.09. As for 22.06, this branch
seems to lack a number of other bug fixes that make the backporting
process more complex. I will assume that it's not required to
backported to all the branches, and I will leave this last 22.06
branch without the backport. Unless someone confirms this is required.
(Note that LTS is not affected because it didn't have the feature, so
it's not necessary to backport to 22.06 just for the sake of transit
cherry-picking to LTS.)

Ihar