
[ovs-dev] northd: Allow Load balancer to be attached to router with multiple GW ports

Message ID 20240717090334.3957-1-priyankar.jain@nutanix.com
State New
Series [ovs-dev] northd: Allow Load balancer to be attached to router with multiple GW ports

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Priyankar Jain July 17, 2024, 9:03 a.m. UTC
This commit removes the restriction that load balancers are supported
only on routers with at most one distributed gateway port.
Datapath and logical flow validation test cases are added.

Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
---
 northd/en-lr-stateful.c |  12 ---
 northd/northd.c         |  12 +--
 tests/ovn-northd.at     |  86 +++++++++++++++++++++
 tests/ovn.at            | 167 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 260 insertions(+), 17 deletions(-)

Comments

0-day Robot July 17, 2024, 9:21 a.m. UTC | #1
References:  <20240717090334.3957-1-priyankar.jain@nutanix.com>
 

Bleep bloop.  Greetings Priyankar Jain, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject, '<area>: <summary>', is over 70 characters, i.e., 75.
WARNING: The subject summary should end with a dot.
Subject: northd: Allow Load balancer to be attached to router with multiple GW ports
Lines checked: 352, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Numan Siddique July 30, 2024, 11:16 p.m. UTC | #2
On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
<priyankar.jain@nutanix.com> wrote:
>
> This commit removes the restriction of support LB for router with only
> <= 1 Distributed gateway ports.
> Added datapath and logical flows validation cases.
>
> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>

Hi Priyankar,

Thanks for the patch.

Can you please tell me the use case?

Let's say a logical router lr1 has sw1 (10.0.0.0/24), sw2
(20.0.0.0/24) and sw3 (30.0.0.0/24), and the corresponding
gw ports are bound to chassis ch1, ch2 and ch3 respectively.

Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
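
(For reference, the setup above could be created with something along
these lines.  This is only a sketch: the MAC addresses, port names and
chassis names are made up, and the switch-side router-type ports are
omitted.)

    ovn-nbctl lr-add lr1
    ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:01:01 10.0.0.1/24
    ovn-nbctl lrp-add lr1 lr1-sw2 00:00:00:00:02:01 20.0.0.1/24
    ovn-nbctl lrp-add lr1 lr1-sw3 00:00:00:00:03:01 30.0.0.1/24

    # One gateway chassis per router port, i.e. three distributed
    # gateway ports on lr1.
    ovn-nbctl lrp-set-gateway-chassis lr1-sw1 ch1
    ovn-nbctl lrp-set-gateway-chassis lr1-sw2 ch2
    ovn-nbctl lrp-set-gateway-chassis lr1-sw3 ch3

    # Router load balancers, one VIP per subnet.
    ovn-nbctl lb-add lb1 10.0.0.40:80 20.0.0.40:80 tcp
    ovn-nbctl lb-add lb2 20.0.0.50:80 30.0.0.50:80 tcp
    ovn-nbctl lb-add lb3 30.0.0.60:80 10.0.0.60:80 tcp
    ovn-nbctl lr-lb-add lr1 lb1
    ovn-nbctl lr-lb-add lr1 lb2
    ovn-nbctl lr-lb-add lr1 lb3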

If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
the chassis ch1 and so on for other switches.

This patch generates the below logical flows in the router pipeline
for load balancer lb1, lb2 and lb3

  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw1")),
action=(ct_lb_mark(backends=20.0.0.40:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw2")),
action=(ct_lb_mark(backends=20.0.0.40:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw3")),
action=(ct_lb_mark(backends=20.0.0.40:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw1")),
action=(ct_lb_mark(backends=30.0.0.50:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw2")),
action=(ct_lb_mark(backends=30.0.0.50:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw3")),
action=(ct_lb_mark(backends=30.0.0.50:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw1")),
action=(ct_lb_mark(backends=10.0.0.60:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw2")),
action=(ct_lb_mark(backends=10.0.0.60:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw3")),
action=(ct_lb_mark(backends=10.0.0.60:80);)

If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
then ideally the routing will be handled on ch1, since lr1-sw1 is
bound on ch1.

Now with the above logical flows, load balancing could happen on the
source chassis itself. Is that ok?
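
(One way to see which lr_in_dnat flow such a request would hit is to
trace it, roughly as below; the port name, MACs and source IP are made
up for illustration, and --ct=new sets the ct.new flag these flows
match on.)

    ovn-trace --ct=new sw2 'inport == "sw2-port1" &&
        eth.src == 50:54:00:00:00:02 && eth.dst == 00:00:00:00:02:01 &&
        ip4.src == 20.0.0.4 && ip4.dst == 10.0.0.40 && ip.ttl == 64 &&
        tcp && tcp.dst == 80'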

Shouldn't this patch generate the flows like this for lb1, lb2 and lb3 ?

  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw1")),
action=(ct_lb_mark(backends=20.0.0.40:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw2")),
action=(ct_lb_mark(backends=30.0.0.50:80);)
  table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
!ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
is_chassis_resident("cr-lr0-sw3")),
action=(ct_lb_mark(backends=10.0.0.60:80);)

So that the LB NAT happens on the chassis where the distributed
gateway port for the LB VIP subnet resides?

Thanks
Numan





> ---
>  northd/en-lr-stateful.c |  12 ---
>  northd/northd.c         |  12 +--
>  tests/ovn-northd.at     |  86 +++++++++++++++++++++
>  tests/ovn.at            | 167 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 260 insertions(+), 17 deletions(-)
>
> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> index baf1bd2f8..f09691af6 100644
> --- a/northd/en-lr-stateful.c
> +++ b/northd/en-lr-stateful.c
> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct lr_stateful_table *table,
>
>      table->array[od->index] = lr_stateful_rec;
>
> -    /* Load balancers are not supported (yet) if a logical router has multiple
> -     * distributed gateway port.  Log a warning. */
> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
> -                     "router %s, which has %"PRIuSIZE" distributed "
> -                     "gateway ports. Load-balancer is not supported "
> -                     "yet when there is more than one distributed "
> -                     "gateway port on the router.",
> -                     od->nbr->name, od->n_l3dgw_ports);
> -    }
> -
>      return lr_stateful_rec;
>  }
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 6898daa00..e6f53f361 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -11026,10 +11026,9 @@ static void
>  build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
>                                       enum lrouter_nat_lb_flow_type type,
>                                       struct ovn_datapath *od,
> +                                     struct ovn_port *dgp,
>                                       struct lflow_ref *lflow_ref)
>  {
> -    struct ovn_port *dgp = od->l3dgw_ports[0];
> -
>      const char *undnat_action;
>
>      switch (type) {
> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
>
>      if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
>          ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
> -                      od->l3dgw_ports[0]->cr_port->json_key);
> +                      dgp->cr_port->json_key);
>      }
>
>      ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT, ctx->prio,
> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
>          if (!od->n_l3dgw_ports) {
>              bitmap_set1(gw_dp_bitmap[type], index);
>          } else {
> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> -                                                 lb_dps->lflow_ref);
> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
> +                struct ovn_port *dgp = od->l3dgw_ports[i];
> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od, dgp,
> +                                                     lb_dps->lflow_ref);
> +            }
>          }
>
>          if (lb->affinity_timeout) {
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index a389d1988..5be48f49e 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d
>
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
> +ovn_start
> +
> +# Logical network:
> +# 1 Logical Router, 3 bridged Logical Switches,
> +# 1 gateway chassis attached to each corresponding LRP.
> +# LB added attached to DR
> +#
> +#                | S1 (gw1)
> +#                |
> +#      ls  ----  DR -- S3 (gw3)
> +# (20.0.0.0/24)  |
> +#                | S2 (gw2)
> +#
> +# Validate basic LB logical flows.
> +
> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
> +
> +check ovn-nbctl lr-add DR
> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
> +
> +check ovn-nbctl ls-add S1
> +check ovn-nbctl lsp-add S1 S1-DR
> +check ovn-nbctl lsp-set-type S1-DR router
> +check ovn-nbctl lsp-set-addresses S1-DR router
> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
> +
> +check ovn-nbctl ls-add S2
> +check ovn-nbctl lsp-add S2 S2-DR
> +check ovn-nbctl lsp-set-type S2-DR router
> +check ovn-nbctl lsp-set-addresses S2-DR router
> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
> +
> +check ovn-nbctl ls-add S3
> +check ovn-nbctl lsp-add S3 S3-DR
> +check ovn-nbctl lsp-set-type S3-DR router
> +check ovn-nbctl lsp-set-addresses S3-DR router
> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
> +
> +check ovn-nbctl ls-add  ls
> +check ovn-nbctl lsp-add ls ls-DR
> +check ovn-nbctl lsp-set-type ls-DR router
> +check ovn-nbctl lsp-set-addresses ls-DR router
> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
> +
> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
> +
> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
> +check ovn-nbctl lr-lb-add DR lb-1
> +
> +check ovn-nbctl --wait=sb sync
> +
> +ovn-sbctl dump-flows DR > lrflows
> +AT_CAPTURE_FILE([lrflows])
> +
> +# Check the flows in lr_in_dnat stage
> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR | ovn_strip_lflows], [0], [dnl
> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
> +])
> +# Check the flows in lr_in_gw_redirect stage
> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep cr-DR | ovn_strip_lflows], [0], [dnl
> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
> +])
> +# Check the flows in lr_out_undnat stage
> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR | ovn_strip_lflows], [0], [dnl
> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S1" || outport == "DR-S1") && is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S2" || outport == "DR-S2") && is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S3" || outport == "DR-S3") && is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
> +])
> +
> +AT_CLEANUP
> +])
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 185ba4a21..8e8c102c0 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
>
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([Multiple DGP and LB traffic])
> +AT_KEYWORDS([dgp-lb])
> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> +ovn_start
> +
> +# Logical network:
> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
> +# 1 gateway chassis attached to each corresponding LRP.
> +# LB added attached to DR
> +#
> +#                | public (gw1) (172.168.0.0/24)
> +#                |
> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
> +# (10.0.0.0/24)
> +#
> +# Routes (lr0):
> +#
> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
> +#   default      ----> 172.168.0.1 (public)
> +#
> +#
> +# Validate Traffic from public to LB and its response.
> +# Validate traffic from public2 to LB and its response.
> +
> +test_ip_req_packet() {
> +    local src_mac="$1"
> +    local dst_mac="$2"
> +    local src_ip="$3"
> +    local dst_ip="$4"
> +    local sport=$5
> +    local iface=$6
> +
> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> +                    UDP(sport=${sport}, dport=4369)")
> +
> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
> +}
> +
> +test_ip_rep_packet() {
> +    local src_mac="$1"
> +    local dst_mac="$2"
> +    local src_ip="$3"
> +    local dst_ip="$4"
> +    local dport=$5
> +
> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> +                    UDP(sport=4369, dport=${dport})")
> +
> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +}
> +
> +net_add n
> +
> +sim_add hv1
> +as hv1
> +check ovs-vsctl add-br br-phys
> +ovn_attach n br-phys 192.168.0.1
> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
> +    options:tx_pcap=hv1/vif1-tx.pcap \
> +    options:rxq_pcap=hv1/vif1-rx.pcap
> +
> +sim_add hv2
> +as hv2
> +check ovs-vsctl add-br br-phys
> +ovn_attach n br-phys 192.168.0.2
> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
> +    options:tx_pcap=hv2/vif1-tx.pcap \
> +    options:rxq_pcap=hv2/vif1-rx.pcap
> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
> +    options:tx_pcap=hv2/vif2-tx.pcap \
> +    options:rxq_pcap=hv2/vif2-rx.pcap
> +
> +check ovn-nbctl ls-add sw0
> +check ovn-nbctl lsp-add sw0 sw0-port1
> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.2"
> +
> +check ovn-nbctl ls-add public
> +check ovn-nbctl lsp-add public ln-public
> +check ovn-nbctl lsp-set-type ln-public localnet
> +check ovn-nbctl lsp-set-addresses ln-public unknown
> +check ovn-nbctl lsp-set-options ln-public network_name=phys
> +check ovn-nbctl lsp-add public public-port1
> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88 172.168.0.200"
> +
> +check ovn-nbctl ls-add public2
> +check ovn-nbctl lsp-add public2 ln-public2
> +check ovn-nbctl lsp-set-type ln-public2 localnet
> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
> +check ovn-nbctl lsp-add public2 public2-port1
> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99 173.168.0.200"
> +
> +check ovn-nbctl lr-add lr0
> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> +check ovn-nbctl lsp-add sw0 sw0-lr0
> +check ovn-nbctl lsp-set-type sw0-lr0 router
> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> +
> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.1/24
> +check ovn-nbctl lsp-add public public-lr0
> +check ovn-nbctl lsp-set-type public-lr0 router
> +check ovn-nbctl lsp-set-addresses public-lr0 router
> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
> +
> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14 173.168.0.1/24
> +check ovn-nbctl lsp-add public2 public2-lr0
> +check ovn-nbctl lsp-set-type public2-lr0 router
> +check ovn-nbctl lsp-set-addresses public2-lr0 router
> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
> +
> +
> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
> +
> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
> +
> +wait_for_ports_up
> +OVN_POPULATE_ARP
> +
> +
> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
> +check ovn-nbctl lr-lb-add lr0 lb0
> +
> +ovn-sbctl lflow-list lr0
> +
> +# send UDP request to the load-balancer VIP from public switch 1
> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13" "172.168.0.200" "172.168.0.10" 53 hv2-vif1
> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | wc -l) -ge 1])
> +# send UDP reply from sw0-port1
> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2" "172.168.0.200" 53
> +# packet sent by the load balancer VIP
> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88', src='00:00:20:20:12:13')/
> +                      IP(dst='172.168.0.200', src='172.168.0.10', ttl=63)/ \
> +                      UDP(sport=4369, dport=53)")
> +echo $packet > expected
> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
> +
> +# send UDP request to the load-balancer VIP from public switch 2
> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14" "173.168.0.200" "172.168.0.10" 54 hv2-vif2
> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | wc -l) -ge 1])
> +# send UDP reply from sw0-port1
> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2" "173.168.0.200" 54
> +# packet sent by the load balancer VIP
> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99', src='00:00:20:20:12:14')/
> +                      IP(dst='173.168.0.200', src='172.168.0.10', ttl=63)/ \
> +                      UDP(sport=4369, dport=54)")
> +echo $packet > expected
> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
> +
> +
> +OVN_CLEANUP([hv1],[hv2])
> +AT_CLEANUP
> +])
> --
> 2.39.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Priyankar Jain Aug. 6, 2024, 10:57 a.m. UTC | #3
Hi,
Please find my reply inline.

Thanks,
Priyankar

On 31/07/24 4:46 am, Numan Siddique wrote:
> On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
> <priyankar.jain@nutanix.com> wrote:
>> This commit removes the restriction of support LB for router with only
>> <= 1 Distributed gateway ports.
>> Added datapath and logical flows validation cases.
>>
>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
> Hi Priyankar,
>
> Thanks for the patch.
>
> Can you please tell me the use case ?
>
> Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
> (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
> gw ports are bound to chassis ch1, ch2 and ch3 respectively.
>
> Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
> 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
> lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
>
> If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
> the chassis ch1 and so on for other switches.
>
> This patch generates the below logical flows in the router pipeline
> for load balancer lb1, lb2 and lb3
>
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw1")),
> action=(ct_lb_mark(backends=20.0.0.40:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw2")),
> action=(ct_lb_mark(backends=20.0.0.40:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw3")),
> action=(ct_lb_mark(backends=20.0.0.40:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw1")),
> action=(ct_lb_mark(backends=30.0.0.50:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw2")),
> action=(ct_lb_mark(backends=30.0.0.50:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw3")),
> action=(ct_lb_mark(backends=30.0.0.50:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw1")),
> action=(ct_lb_mark(backends=10.0.0.60:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw2")),
> action=(ct_lb_mark(backends=10.0.0.60:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw3")),
> action=(ct_lb_mark(backends=10.0.0.60:80);)
>
> If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
> then ideally  the routing will be handled in ch1 since lr1-sw1 is
> bound on ch1.
>
> Now with the above logical flows,  load balancing could happen in the
> source chassis itself. Is that ok ?
In our use case, sw2 is not expected to handle load balancing for
the 10.0.0.0/24 CIDR; it is expected to handle LB only for 20.0.0.0/24.
(Note: the LB is not attached to the logical switch itself, since the
same LS can be connected to multiple such LRs.)

                  | S1 (gw1) [10.0.0.0/24]
                  |
       ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
  (40.0.0.0/24)

LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80

Routes:
10.0.0.0/24 next-hop via S1
default next-hop via S2


In our use case, backends are only present on ls1;
S1 and S2 do not have any backends. Similarly,
S1 needs to handle traffic only for LB VIP 10.0.0.40 and
S2 needs to handle traffic only for LB VIP 30.0.0.40.

Hence, ingress coming from S2 happens on gw2 for the
request, and egress happens on gw2 again (due to the
routes above) for the response.

The patch does not handle all the use cases, but the
existing OVN implementation does not program any LB rules
at all in this case, so this patch achieves the restricted
use case above. Let me know if I need to add some logs etc.
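
Concretely, the above can be expressed roughly as below (a sketch; the
router/LB names follow the diagram and 20.0.0.254 is a placeholder
next hop on S2):

    ovn-nbctl lb-add LB1 10.0.0.40:80 40.0.0.10:80,40.0.0.20:80 tcp
    ovn-nbctl lb-add LB2 30.0.0.40:80 40.0.0.30:80,40.0.0.40:80 tcp
    ovn-nbctl lr-lb-add LR LB1
    ovn-nbctl lr-lb-add LR LB2

    # 10.0.0.0/24 is directly connected through S1; everything else
    # (including responses to 30.0.0.40 clients) leaves through S2.
    ovn-nbctl lr-route-add LR 0.0.0.0/0 20.0.0.254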

>
> Shouldn't this patch generate the flows like this for lb1, lb2 and lb3 ?
>
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw1")),
> action=(ct_lb_mark(backends=20.0.0.40:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw2")),
> action=(ct_lb_mark(backends=30.0.0.50:80);)
>    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-lr0-sw3")),
> action=(ct_lb_mark(backends=10.0.0.60:80);)
>
> So that the LB NAT happens on the chassis where the distributed
> gateway port for the LB vip subnet resides ?
The problem is how we would provide a mapping between an LB VIP and a
DGP: the VIP need not belong to any connected subnet's CIDR.
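
For example (addresses made up), a VIP that is outside every connected
subnet gives northd nothing to match a DGP against:

    # 192.0.2.10 is not part of 10.0.0.0/24, 20.0.0.0/24 or 40.0.0.0/24,
    # so no connected subnet ties this VIP to a particular DGP.
    ovn-nbctl lb-add lb-ext 192.0.2.10:80 40.0.0.10:80 tcp
    ovn-nbctl lr-lb-add LR lb-ext
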
>
> Thanks
> Numan
>
>
>
>
>
>> ---
>>   northd/en-lr-stateful.c |  12 ---
>>   northd/northd.c         |  12 +--
>>   tests/ovn-northd.at     |  86 +++++++++++++++++++++
>>   tests/ovn.at            | 167 ++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 260 insertions(+), 17 deletions(-)
>>
>> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
>> index baf1bd2f8..f09691af6 100644
>> --- a/northd/en-lr-stateful.c
>> +++ b/northd/en-lr-stateful.c
>> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct lr_stateful_table *table,
>>
>>       table->array[od->index] = lr_stateful_rec;
>>
>> -    /* Load balancers are not supported (yet) if a logical router has multiple
>> -     * distributed gateway port.  Log a warning. */
>> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
>> -                     "router %s, which has %"PRIuSIZE" distributed "
>> -                     "gateway ports. Load-balancer is not supported "
>> -                     "yet when there is more than one distributed "
>> -                     "gateway port on the router.",
>> -                     od->nbr->name, od->n_l3dgw_ports);
>> -    }
>> -
>>       return lr_stateful_rec;
>>   }
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 6898daa00..e6f53f361 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -11026,10 +11026,9 @@ static void
>>   build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
>>                                        enum lrouter_nat_lb_flow_type type,
>>                                        struct ovn_datapath *od,
>> +                                     struct ovn_port *dgp,
>>                                        struct lflow_ref *lflow_ref)
>>   {
>> -    struct ovn_port *dgp = od->l3dgw_ports[0];
>> -
>>       const char *undnat_action;
>>
>>       switch (type) {
>> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
>>
>>       if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
>>           ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
>> -                      od->l3dgw_ports[0]->cr_port->json_key);
>> +                      dgp->cr_port->json_key);
>>       }
>>
>>       ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT, ctx->prio,
>> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
>>           if (!od->n_l3dgw_ports) {
>>               bitmap_set1(gw_dp_bitmap[type], index);
>>           } else {
>> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>> -                                                 lb_dps->lflow_ref);
>> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
>> +                struct ovn_port *dgp = od->l3dgw_ports[i];
>> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od, dgp,
>> +                                                     lb_dps->lflow_ref);
>> +            }
>>           }
>>
>>           if (lb->affinity_timeout) {
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index a389d1988..5be48f49e 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d
>>
>>   AT_CLEANUP
>>   ])
>> +
>> +OVN_FOR_EACH_NORTHD_NO_HV([
>> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
>> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
>> +ovn_start
>> +
>> +# Logical network:
>> +# 1 Logical Router, 3 bridged Logical Switches,
>> +# 1 gateway chassis attached to each corresponding LRP.
>> +# LB added attached to DR
>> +#
>> +#                | S1 (gw1)
>> +#                |
>> +#      ls  ----  DR -- S3 (gw3)
>> +# (20.0.0.0/24)  |
>> +#                | S2 (gw2)
>> +#
>> +# Validate basic LB logical flows.
>> +
>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>> +
>> +check ovn-nbctl lr-add DR
>> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
>> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
>> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
>> +
>> +check ovn-nbctl ls-add S1
>> +check ovn-nbctl lsp-add S1 S1-DR
>> +check ovn-nbctl lsp-set-type S1-DR router
>> +check ovn-nbctl lsp-set-addresses S1-DR router
>> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
>> +
>> +check ovn-nbctl ls-add S2
>> +check ovn-nbctl lsp-add S2 S2-DR
>> +check ovn-nbctl lsp-set-type S2-DR router
>> +check ovn-nbctl lsp-set-addresses S2-DR router
>> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
>> +
>> +check ovn-nbctl ls-add S3
>> +check ovn-nbctl lsp-add S3 S3-DR
>> +check ovn-nbctl lsp-set-type S3-DR router
>> +check ovn-nbctl lsp-set-addresses S3-DR router
>> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
>> +
>> +check ovn-nbctl ls-add  ls
>> +check ovn-nbctl lsp-add ls ls-DR
>> +check ovn-nbctl lsp-set-type ls-DR router
>> +check ovn-nbctl lsp-set-addresses ls-DR router
>> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
>> +
>> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
>> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
>> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
>> +
>> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
>> +check ovn-nbctl lr-lb-add DR lb-1
>> +
>> +check ovn-nbctl --wait=sb sync
>> +
>> +ovn-sbctl dump-flows DR > lrflows
>> +AT_CAPTURE_FILE([lrflows])
>> +
>> +# Check the flows in lr_in_dnat stage
>> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR | ovn_strip_lflows], [0], [dnl
>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
>> +])
>> +# Check the flows in lr_in_gw_redirect stage
>> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep cr-DR | ovn_strip_lflows], [0], [dnl
>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
>> +])
>> +# Check the flows in lr_out_undnat stage
>> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR | ovn_strip_lflows], [0], [dnl
>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S1" || outport == "DR-S1") && is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S2" || outport == "DR-S2") && is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S3" || outport == "DR-S3") && is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
>> +])
>> +
>> +AT_CLEANUP
>> +])
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index 185ba4a21..8e8c102c0 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
>>
>>   AT_CLEANUP
>>   ])
>> +
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([Multiple DGP and LB traffic])
>> +AT_KEYWORDS([dgp-lb])
>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
>> +ovn_start
>> +
>> +# Logical network:
>> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
>> +# 1 gateway chassis attached to each corresponding LRP.
>> +# LB added attached to DR
>> +#
>> +#                | public (gw1) (172.168.0.0/24)
>> +#                |
>> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
>> +# (10.0.0.0/24)
>> +#
>> +# Routes (lr0):
>> +#
>> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
>> +#   default      ----> 172.168.0.1 (public)
>> +#
>> +#
>> +# Validate Traffic from public to LB and its response.
>> +# Validate traffic from public2 to LB and its response.
>> +
>> +test_ip_req_packet() {
>> +    local src_mac="$1"
>> +    local dst_mac="$2"
>> +    local src_ip="$3"
>> +    local dst_ip="$4"
>> +    local sport=$5
>> +    local iface=$6
>> +
>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>> +                    UDP(sport=${sport}, dport=4369)")
>> +
>> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
>> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
>> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
>> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
>> +}
>> +
>> +test_ip_rep_packet() {
>> +    local src_mac="$1"
>> +    local dst_mac="$2"
>> +    local src_ip="$3"
>> +    local dst_ip="$4"
>> +    local dport=$5
>> +
>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>> +                    UDP(sport=4369, dport=${dport})")
>> +
>> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +}
>> +
>> +net_add n
>> +
>> +sim_add hv1
>> +as hv1
>> +check ovs-vsctl add-br br-phys
>> +ovn_attach n br-phys 192.168.0.1
>> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>> +    options:rxq_pcap=hv1/vif1-rx.pcap
>> +
>> +sim_add hv2
>> +as hv2
>> +check ovs-vsctl add-br br-phys
>> +ovn_attach n br-phys 192.168.0.2
>> +check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
>> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
>> +    options:tx_pcap=hv2/vif1-tx.pcap \
>> +    options:rxq_pcap=hv2/vif1-rx.pcap
>> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
>> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
>> +    options:tx_pcap=hv2/vif2-tx.pcap \
>> +    options:rxq_pcap=hv2/vif2-rx.pcap
>> +
>> +check ovn-nbctl ls-add sw0
>> +check ovn-nbctl lsp-add sw0 sw0-port1
>> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.2"
>> +
>> +check ovn-nbctl ls-add public
>> +check ovn-nbctl lsp-add public ln-public
>> +check ovn-nbctl lsp-set-type ln-public localnet
>> +check ovn-nbctl lsp-set-addresses ln-public unknown
>> +check ovn-nbctl lsp-set-options ln-public network_name=phys
>> +check ovn-nbctl lsp-add public public-port1
>> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88 172.168.0.200"
>> +
>> +check ovn-nbctl ls-add public2
>> +check ovn-nbctl lsp-add public2 ln-public2
>> +check ovn-nbctl lsp-set-type ln-public2 localnet
>> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
>> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
>> +check ovn-nbctl lsp-add public2 public2-port1
>> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99 173.168.0.200"
>> +
>> +check ovn-nbctl lr-add lr0
>> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
>> +check ovn-nbctl lsp-add sw0 sw0-lr0
>> +check ovn-nbctl lsp-set-type sw0-lr0 router
>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
>> +
>> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.1/24
>> +check ovn-nbctl lsp-add public public-lr0
>> +check ovn-nbctl lsp-set-type public-lr0 router
>> +check ovn-nbctl lsp-set-addresses public-lr0 router
>> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
>> +
>> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14 173.168.0.1/24
>> +check ovn-nbctl lsp-add public2 public2-lr0
>> +check ovn-nbctl lsp-set-type public2-lr0 router
>> +check ovn-nbctl lsp-set-addresses public2-lr0 router
>> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
>> +
>> +
>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
>> +
>> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
>> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
>> +
>> +wait_for_ports_up
>> +OVN_POPULATE_ARP
>> +
>> +
>> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
>> +check ovn-nbctl lr-lb-add lr0 lb0
>> +
>> +ovn-sbctl lflow-list lr0
>> +
>> +# send UDP request to the load-balancer VIP from public switch 1
>> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13" "172.168.0.200" "172.168.0.10" 53 hv2-vif1
>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | wc -l) -ge 1])
>> +# send UDP reply from sw0-port1
>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2" "172.168.0.200" 53
>> +# packet sent by the load balancer VIP
>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88', src='00:00:20:20:12:13')/
>> +                      IP(dst='172.168.0.200', src='172.168.0.10', ttl=63)/ \
>> +                      UDP(sport=4369, dport=53)")
>> +echo $packet > expected
>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
>> +
>> +# send UDP request to the load-balancer VIP from public switch 2
>> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14" "173.168.0.200" "172.168.0.10" 54 hv2-vif2
>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | wc -l) -ge 1])
>> +# send UDP reply from sw0-port1
>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2" "173.168.0.200" 54
>> +# packet sent by the load balancer VIP
>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99', src='00:00:20:20:12:14')/
>> +                      IP(dst='173.168.0.200', src='172.168.0.10', ttl=63)/ \
>> +                      UDP(sport=4369, dport=54)")
>> +echo $packet > expected
>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
>> +
>> +
>> +OVN_CLEANUP([hv1],[hv2])
>> +AT_CLEANUP
>> +])
>> --
>> 2.39.2
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
Roberto Bartzen Acosta Aug. 6, 2024, 12:45 p.m. UTC | #4
Hi Priyankar,

So, what is the difference between this patch and the one already proposed
[1]?

The change you are proposing has already been proposed before, and in
discussion with Mark we came to the conclusion that it breaks the DNAT
case when using multiple chassis. So, I pushed v4 [2][3] with a stateless
NAT implementation to solve this problem.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411981.html
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415977.html
[3]
https://github.com/ovsrobot/ovn/commit/9379c81152df2bcd0626269988efb8ac36d4f624

Thanks,
Roberto

On Tue, Aug 6, 2024 at 07:57 Priyankar Jain <
priyankar.jain@nutanix.com> wrote:

> Hi,
> Please find my reply inline.
>
> Thanks,
> Priyankar
>
> On 31/07/24 4:46 am, Numan Siddique wrote:
> > On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
> > <priyankar.jain@nutanix.com> wrote:
> >> This commit removes the restriction of support LB for router with only
> >> <= 1 Distributed gateway ports.
> >> Added datapath and logical flows validation cases.
> >>
> >> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
> > Hi Priyankar,
> >
> > Thanks for the patch.
> >
> > Can you please tell me the use case ?
> >
> > Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
> > (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
> > gw ports are bound to chassis ch1, ch2 and ch3 respectively.
> >
> > Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
> > 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
> > lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
> >
> > If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
> > the chassis ch1 and so on for other switches.
> >
> > This patch generates the below logical flows in the router pipeline
> > for load balancer lb1, lb2 and lb3
> >
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw1")),
> > action=(ct_lb_mark(backends=20.0.0.40:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw2")),
> > action=(ct_lb_mark(backends=20.0.0.40:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw3")),
> > action=(ct_lb_mark(backends=20.0.0.40:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw1")),
> > action=(ct_lb_mark(backends=30.0.0.50:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw2")),
> > action=(ct_lb_mark(backends=30.0.0.50:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw3")),
> > action=(ct_lb_mark(backends=30.0.0.50:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw1")),
> > action=(ct_lb_mark(backends=10.0.0.60:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw2")),
> > action=(ct_lb_mark(backends=10.0.0.60:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw3")),
> > action=(ct_lb_mark(backends=10.0.0.60:80);)
> >
> > If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
> > then ideally  the routing will be handled in ch1 since lr1-sw1 is
> > bound on ch1.
> >
> > Now with the above logical flows,  load balancing could happen in the
> > source chassis itself. Is that ok ?
> In our usecase, sw2 is not expected to handle load balancing for
> 10.0.0.0/24 CIDR. it is expected to handle LB only for 20.0.0.0/24
> (Note: LB is not attached to the logical switch itself, since the
> same LS can be connected to multiple such LRs.)
>
>                   | S1 (gw1) [10.0.0.0/24]
>                   |
>        ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
>   (40.0.0.0/24)
>
> LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
> LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80
>
> Routes:
> 10.0.0.0/24 next-hop via S1
> default next-hop via S2
>
>
> In our usecase, backends are only present on ls1.
> S1 and S2 does not have any backends. Similarly,
> S1 needs to handle the traffic only for LB: 10.0.0.40
> S2 needs to handle the traffic only for LB: 30.0.0.40
>
> Hence, ingress coming from S2 happens on gw2 during the
> request part. and egress again happens on gw2 (due to
> above routes) for the response.
>
> Although, the patch does not handle all the usecases, but with
> existing OVN impl, it does not even program any LB rules.
> Hence, this patch acheives the above restricted use-case.
> Let me know, if i need to add some logs etc..
>
> >
> > Shouldn't this patch generate the flows like this for lb1, lb2 and lb3 ?
> >
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw1")),
> > action=(ct_lb_mark(backends=20.0.0.40:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw2")),
> > action=(ct_lb_mark(backends=30.0.0.50:80);)
> >    table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> > !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> > is_chassis_resident("cr-lr0-sw3")),
> > action=(ct_lb_mark(backends=10.0.0.60:80);)
> >
> > So that the LB NAT happens on the chassis where the distributed
> > gateway port for the LB vip subnet resides ?
> The problem is how do we provide mapping of LB VIP and DGP? VIP need
> not to belong to subnet's CIDR.
> >
> > Thanks
> > Numan
> >
> >
> >
> >
> >
> >> ---
> >>   northd/en-lr-stateful.c |  12 ---
> >>   northd/northd.c         |  12 +--
> >>   tests/ovn-northd.at     |  86 +++++++++++++++++++++
> >>   tests/ovn.at            | 167
> ++++++++++++++++++++++++++++++++++++++++
> >>   4 files changed, 260 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> >> index baf1bd2f8..f09691af6 100644
> >> --- a/northd/en-lr-stateful.c
> >> +++ b/northd/en-lr-stateful.c
> >> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct lr_stateful_table
> *table,
> >>
> >>       table->array[od->index] = lr_stateful_rec;
> >>
> >> -    /* Load balancers are not supported (yet) if a logical router has
> multiple
> >> -     * distributed gateway port.  Log a warning. */
> >> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
> >> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> >> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
> >> -                     "router %s, which has %"PRIuSIZE" distributed "
> >> -                     "gateway ports. Load-balancer is not supported "
> >> -                     "yet when there is more than one distributed "
> >> -                     "gateway port on the router.",
> >> -                     od->nbr->name, od->n_l3dgw_ports);
> >> -    }
> >> -
> >>       return lr_stateful_rec;
> >>   }
> >>
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index 6898daa00..e6f53f361 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -11026,10 +11026,9 @@ static void
> >>   build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx
> *ctx,
> >>                                        enum lrouter_nat_lb_flow_type
> type,
> >>                                        struct ovn_datapath *od,
> >> +                                     struct ovn_port *dgp,
> >>                                        struct lflow_ref *lflow_ref)
> >>   {
> >> -    struct ovn_port *dgp = od->l3dgw_ports[0];
> >> -
> >>       const char *undnat_action;
> >>
> >>       switch (type) {
> >> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct
> lrouter_nat_lb_flows_ctx *ctx,
> >>
> >>       if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
> >>           ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
> >> -                      od->l3dgw_ports[0]->cr_port->json_key);
> >> +                      dgp->cr_port->json_key);
> >>       }
> >>
> >>       ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT,
> ctx->prio,
> >> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
> >>           if (!od->n_l3dgw_ports) {
> >>               bitmap_set1(gw_dp_bitmap[type], index);
> >>           } else {
> >> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> >> -                                                 lb_dps->lflow_ref);
> >> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
> >> +                struct ovn_port *dgp = od->l3dgw_ports[i];
> >> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> dgp,
> >> +
>  lb_dps->lflow_ref);
> >> +            }
> >>           }
> >>
> >>           if (lb->affinity_timeout) {
> >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >> index a389d1988..5be48f49e 100644
> >> --- a/tests/ovn-northd.at
> >> +++ b/tests/ovn-northd.at
> >> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep
> lr_in_dnat | ovn_strip_lflows], [0], [d
> >>
> >>   AT_CLEANUP
> >>   ])
> >> +
> >> +OVN_FOR_EACH_NORTHD_NO_HV([
> >> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
> >> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
> >> +ovn_start
> >> +
> >> +# Logical network:
> >> +# 1 Logical Router, 3 bridged Logical Switches,
> >> +# 1 gateway chassis attached to each corresponding LRP.
> >> +# LB added attached to DR
> >> +#
> >> +#                | S1 (gw1)
> >> +#                |
> >> +#      ls  ----  DR -- S3 (gw3)
> >> +# (20.0.0.0/24)  |
> >> +#                | S2 (gw2)
> >> +#
> >> +# Validate basic LB logical flows.
> >> +
> >> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
> >> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
> >> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
> >> +
> >> +check ovn-nbctl lr-add DR
> >> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
> >> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
> >> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
> >> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
> >> +
> >> +check ovn-nbctl ls-add S1
> >> +check ovn-nbctl lsp-add S1 S1-DR
> >> +check ovn-nbctl lsp-set-type S1-DR router
> >> +check ovn-nbctl lsp-set-addresses S1-DR router
> >> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
> >> +
> >> +check ovn-nbctl ls-add S2
> >> +check ovn-nbctl lsp-add S2 S2-DR
> >> +check ovn-nbctl lsp-set-type S2-DR router
> >> +check ovn-nbctl lsp-set-addresses S2-DR router
> >> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
> >> +
> >> +check ovn-nbctl ls-add S3
> >> +check ovn-nbctl lsp-add S3 S3-DR
> >> +check ovn-nbctl lsp-set-type S3-DR router
> >> +check ovn-nbctl lsp-set-addresses S3-DR router
> >> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
> >> +
> >> +check ovn-nbctl ls-add  ls
> >> +check ovn-nbctl lsp-add ls ls-DR
> >> +check ovn-nbctl lsp-set-type ls-DR router
> >> +check ovn-nbctl lsp-set-addresses ls-DR router
> >> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
> >> +
> >> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
> >> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
> >> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
> >> +
> >> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
> >> +check ovn-nbctl lr-lb-add DR lb-1
> >> +
> >> +check ovn-nbctl --wait=sb sync
> >> +
> >> +ovn-sbctl dump-flows DR > lrflows
> >> +AT_CAPTURE_FILE([lrflows])
> >> +
> >> +# Check the flows in lr_in_dnat stage
> >> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR |
> ovn_strip_lflows], [0], [dnl
> >> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80
> ,20.0.0.9:80);)
> >> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80
> ,20.0.0.9:80);)
> >> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80
> ,20.0.0.9:80);)
> >> +])
> >> +# Check the flows in lr_in_gw_redirect stage
> >> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep
> cr-DR | ovn_strip_lflows], [0], [dnl
> >> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
> == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
> >> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
> == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
> >> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
> == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
> >> +])
> >> +# Check the flows in lr_out_undnat stage
> >> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR
> | ovn_strip_lflows], [0], [dnl
> >> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
> == 80)) && (inport == "DR-S1" || outport == "DR-S1") &&
> is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
> >> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
> == 80)) && (inport == "DR-S2" || outport == "DR-S2") &&
> is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
> >> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
> == 80)) && (inport == "DR-S3" || outport == "DR-S3") &&
> is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
> >> +])
> >> +
> >> +AT_CLEANUP
> >> +])
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index 185ba4a21..8e8c102c0 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
> >>
> >>   AT_CLEANUP
> >>   ])
> >> +
> >> +OVN_FOR_EACH_NORTHD([
> >> +AT_SETUP([Multiple DGP and LB traffic])
> >> +AT_KEYWORDS([dgp-lb])
> >> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> >> +ovn_start
> >> +
> >> +# Logical network:
> >> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
> >> +# 1 gateway chassis attached to each corresponding LRP.
> >> +# LB added attached to DR
> >> +#
> >> +#                | public (gw1) (172.168.0.0/24)
> >> +#                |
> >> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
> >> +# (10.0.0.0/24)
> >> +#
> >> +# Routes (lr0):
> >> +#
> >> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
> >> +#   default      ----> 172.168.0.1 (public)
> >> +#
> >> +#
> >> +# Validate Traffic from public to LB and its response.
> >> +# Validate traffic from public2 to LB and its response.
> >> +
> >> +test_ip_req_packet() {
> >> +    local src_mac="$1"
> >> +    local dst_mac="$2"
> >> +    local src_ip="$3"
> >> +    local dst_ip="$4"
> >> +    local sport=$5
> >> +    local iface=$6
> >> +
> >> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
> >> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> >> +                    UDP(sport=${sport}, dport=4369)")
> >> +
> >> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
> >> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
> >> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
> >> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
> >> +}
> >> +
> >> +test_ip_rep_packet() {
> >> +    local src_mac="$1"
> >> +    local dst_mac="$2"
> >> +    local src_ip="$3"
> >> +    local dst_ip="$4"
> >> +    local dport=$5
> >> +
> >> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
> >> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> >> +                    UDP(sport=4369, dport=${dport})")
> >> +
> >> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> >> +}
> >> +
> >> +net_add n
> >> +
> >> +sim_add hv1
> >> +as hv1
> >> +check ovs-vsctl add-br br-phys
> >> +ovn_attach n br-phys 192.168.0.1
> >> +check ovs-vsctl set open .
> external-ids:ovn-bridge-mappings=phys:br-phys
> >> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
> >> +    options:tx_pcap=hv1/vif1-tx.pcap \
> >> +    options:rxq_pcap=hv1/vif1-rx.pcap
> >> +
> >> +sim_add hv2
> >> +as hv2
> >> +check ovs-vsctl add-br br-phys
> >> +ovn_attach n br-phys 192.168.0.2
> >> +check ovs-vsctl set open .
> external-ids:ovn-bridge-mappings=phys:br-phys
> >> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
> >> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
> >> +    options:tx_pcap=hv2/vif1-tx.pcap \
> >> +    options:rxq_pcap=hv2/vif1-rx.pcap
> >> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
> >> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
> >> +    options:tx_pcap=hv2/vif2-tx.pcap \
> >> +    options:rxq_pcap=hv2/vif2-rx.pcap
> >> +
> >> +check ovn-nbctl ls-add sw0
> >> +check ovn-nbctl lsp-add sw0 sw0-port1
> >> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01
> 10.0.0.2"
> >> +
> >> +check ovn-nbctl ls-add public
> >> +check ovn-nbctl lsp-add public ln-public
> >> +check ovn-nbctl lsp-set-type ln-public localnet
> >> +check ovn-nbctl lsp-set-addresses ln-public unknown
> >> +check ovn-nbctl lsp-set-options ln-public network_name=phys
> >> +check ovn-nbctl lsp-add public public-port1
> >> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88
> 172.168.0.200"
> >> +
> >> +check ovn-nbctl ls-add public2
> >> +check ovn-nbctl lsp-add public2 ln-public2
> >> +check ovn-nbctl lsp-set-type ln-public2 localnet
> >> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
> >> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
> >> +check ovn-nbctl lsp-add public2 public2-port1
> >> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99
> 173.168.0.200"
> >> +
> >> +check ovn-nbctl lr-add lr0
> >> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> >> +check ovn-nbctl lsp-add sw0 sw0-lr0
> >> +check ovn-nbctl lsp-set-type sw0-lr0 router
> >> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
> >> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> >> +
> >> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13
> 172.168.0.1/24
> >> +check ovn-nbctl lsp-add public public-lr0
> >> +check ovn-nbctl lsp-set-type public-lr0 router
> >> +check ovn-nbctl lsp-set-addresses public-lr0 router
> >> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
> >> +
> >> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14
> 173.168.0.1/24
> >> +check ovn-nbctl lsp-add public2 public2-lr0
> >> +check ovn-nbctl lsp-set-type public2-lr0 router
> >> +check ovn-nbctl lsp-set-addresses public2-lr0 router
> >> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
> >> +
> >> +
> >> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
> >> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
> >> +
> >> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
> >> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
> >> +
> >> +wait_for_ports_up
> >> +OVN_POPULATE_ARP
> >> +
> >> +
> >> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
> >> +check ovn-nbctl lr-lb-add lr0 lb0
> >> +
> >> +ovn-sbctl lflow-list lr0
> >> +
> >> +# send UDP request to the load-balancer VIP from public switch 1
> >> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13"
> "172.168.0.200" "172.168.0.10" 53 hv2-vif1
> >> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
> hv1/vif1-tx.pcap | wc -l) -ge 1])
> >> +# send UDP reply from sw0-port1
> >> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
> "172.168.0.200" 53
> >> +# packet sent by the load balancer VIP
> >> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88',
> src='00:00:20:20:12:13')/
> >> +                      IP(dst='172.168.0.200', src='172.168.0.10',
> ttl=63)/ \
> >> +                      UDP(sport=4369, dport=53)")
> >> +echo $packet > expected
> >> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
> >> +
> >> +# send UDP request to the load-balancer VIP from public switch 2
> >> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14"
> "173.168.0.200" "172.168.0.10" 54 hv2-vif2
> >> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
> hv1/vif1-tx.pcap | wc -l) -ge 1])
> >> +# send UDP reply from sw0-port1
> >> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
> "173.168.0.200" 54
> >> +# packet sent by the load balancer VIP
> >> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99',
> src='00:00:20:20:12:14')/
> >> +                      IP(dst='173.168.0.200', src='172.168.0.10',
> ttl=63)/ \
> >> +                      UDP(sport=4369, dport=54)")
> >> +echo $packet > expected
> >> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
> >> +
> >> +
> >> +OVN_CLEANUP([hv1],[hv2])
> >> +AT_CLEANUP
> >> +])
> >> --
> >> 2.39.2
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=MK6dYyDJXEqoWiI5_CHJlLMBqkveZAbXAkd1nboKnOQttNKUVr7bMurJRHckcEve&s=Z4TvyW108zN_uD5A6r4R0AcyVSdmfR3GVvWLssPYumY&e=
> >>
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Priyankar Jain Aug. 6, 2024, 3:35 p.m. UTC | #5
Hi,

On 06/08/24 6:15 pm, Roberto Bartzen Acosta wrote:
> Hi Priyankar,
>
> So, what is the difference between the patch and the one already proposed
> [1]?
Ah, I just came across this patch. It looks the same to me.
> The change you are proposing has already been proposed before, and in
> discussion with Mark we came to the conclusion that this breaks the DNAT
> case when using multiple chassis. So, I pushed the v4 [2][3] with Stateless
> NAT implementation to solve this problem.
I reviewed this patch. If we are using stateless NAT, does that mean the
packet won't be passed through CT in the egress direction? If so, wouldn't
this break the TCP state machine with respect to CT (with the CT table
marking the packets as invalid)? Say the SYN gets committed to CT (by the
ct_lb_mark action in the ingress pipeline); the SYN-ACK won't be committed,
since only the egress pipeline runs on the GW node, and then the ACK would
be committed again. So, with respect to sequence numbers and window
scaling, are we sure we are safe as far as the TCP CT state machine is
concerned here?
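
One way to settle this would be to watch the conntrack table on the gateway
chassis while the handshake is in flight -- a minimal sketch, assuming the
hv2/172.168.0.10 names from the traffic test in this patch (which would need
a TCP VIP instead of the UDP one for this particular check):

    # Dump the conntrack table on the gateway chassis and look at the entry
    # created for the LB VIP: a TCP entry should only move from SYN_SENT to
    # ESTABLISHED if the reply direction is also seen by CT.
    as hv2 ovs-appctl dpctl/dump-conntrack | grep 172.168.0.10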

Also, as I mentioned in my previous reply, our use case is much more
restricted: the routing ensures that the ingress for the request (from
outside to the LR) and the egress (for the un-DNAT of the reply) both run
on the same gateway node.
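
Concretely, the configuration I have in mind is roughly the following (a
sketch only; the router/port names are illustrative and taken from the
diagram in my earlier reply, not from our actual deployment):

    # One router (LR) with two DGPs, each bound to its own gateway chassis.
    ovn-nbctl lrp-set-gateway-chassis LR-S1 gw1
    ovn-nbctl lrp-set-gateway-chassis LR-S2 gw2

    # One VIP per external side, both backed only by ports on ls1
    # (40.0.0.0/24).  Routing makes 10.0.0.40 reachable only via LR-S1 (gw1)
    # and 30.0.0.40 only via the default route through LR-S2 (gw2), so the
    # un-DNAT of the reply always runs on the chassis that did the DNAT.
    ovn-nbctl lb-add lb1 10.0.0.40:80 40.0.0.10:80,40.0.0.20:80 tcp
    ovn-nbctl lb-add lb2 30.0.0.40:80 40.0.0.30:80,40.0.0.40:80 tcp
    ovn-nbctl lr-lb-add LR lb1
    ovn-nbctl lr-lb-add LR lb2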

Thanks,
Priyankar
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DFebruary_411981.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=JR1CKXLiNrDGVt85mXYTutzBqhpCpiABOutl2TMPWEE&e=
> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DJuly_415977.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=QA8-pREOG2X_f-Hk8zDEeXxyV6jOoNx3pRxuKAIhsCk&e=
> [3]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovsrobot_ovn_commit_9379c81152df2bcd0626269988efb8ac36d4f624&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=axc06cz9XfMSiEKgrD6QmmoYwrBVxBnm1T2gEx-_byY&e=
>
> Thanks,
> Roberto
>
> On Tue, Aug 6, 2024 at 07:57, Priyankar Jain <
> priyankar.jain@nutanix.com> wrote:
>
>> Hi,
>> Please find my reply inline.
>>
>> Thanks,
>> Priyankar
>>
>> On 31/07/24 4:46 am, Numan Siddique wrote:
>>> On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
>>> <priyankar.jain@nutanix.com> wrote:
>>>> This commit removes the restriction of support LB for router with only
>>>> <= 1 Distributed gateway ports.
>>>> Added datapath and logical flows validation cases.
>>>>
>>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
>>> Hi Priyankar,
>>>
>>> Thanks for the patch.
>>>
>>> Can you please tell me the use case ?
>>>
>>> Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
>>> (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
>>> gw ports are bound to chassis ch1, ch2 and ch3 respectively.
>>>
>>> Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
>>> 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
>>> lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
>>>
>>> If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
>>> the chassis ch1 and so on for other switches.
>>>
>>> This patch generates the below logical flows in the router pipeline
>>> for load balancer lb1, lb2 and lb3
>>>
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw1")),
>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw2")),
>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw3")),
>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw1")),
>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw2")),
>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw3")),
>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw1")),
>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw2")),
>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw3")),
>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>
>>> If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
>>> then ideally  the routing will be handled in ch1 since lr1-sw1 is
>>> bound on ch1.
>>>
>>> Now with the above logical flows,  load balancing could happen in the
>>> source chassis itself. Is that ok ?
>> In our usecase, sw2 is not expected to handle load balancing for
>> 10.0.0.0/24 CIDR. it is expected to handle LB only for 20.0.0.0/24
>> (Note: LB is not attached to the logical switch itself, since the
>> same LS can be connected to multiple such LRs.)
>>
>>                    | S1 (gw1) [10.0.0.0/24]
>>                    |
>>         ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
>>    (40.0.0.0/24)
>>
>> LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
>> LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80
>>
>> Routes:
>> 10.0.0.0/24 next-hop via S1
>> default next-hop via S2
>>
>>
>> In our usecase, backends are only present on ls1.
>> S1 and S2 does not have any backends. Similarly,
>> S1 needs to handle the traffic only for LB: 10.0.0.40
>> S2 needs to handle the traffic only for LB: 30.0.0.40
>>
>> Hence, ingress coming from S2 happens on gw2 during the
>> request part. and egress again happens on gw2 (due to
>> above routes) for the response.
>>
>> Although, the patch does not handle all the usecases, but with
>> existing OVN impl, it does not even program any LB rules.
>> Hence, this patch acheives the above restricted use-case.
>> Let me know, if i need to add some logs etc..
>>
>>> Shouldn't this patch generate the flows like this for lb1, lb2 and lb3 ?
>>>
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw1")),
>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw2")),
>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>> is_chassis_resident("cr-lr0-sw3")),
>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>
>>> So that the LB NAT happens on the chassis where the distributed
>>> gateway port for the LB vip subnet resides ?
>> The problem is how do we provide mapping of LB VIP and DGP? VIP need
>> not to belong to subnet's CIDR.
>>> Thanks
>>> Numan
>>>
>>>
>>>
>>>
>>>
>>>> ---
>>>>    northd/en-lr-stateful.c |  12 ---
>>>>    northd/northd.c         |  12 +--
>>>>    tests/ovn-northd.at     |  86 +++++++++++++++++++++
>>>>    tests/ovn.at            | 167
>> ++++++++++++++++++++++++++++++++++++++++
>>>>    4 files changed, 260 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
>>>> index baf1bd2f8..f09691af6 100644
>>>> --- a/northd/en-lr-stateful.c
>>>> +++ b/northd/en-lr-stateful.c
>>>> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct lr_stateful_table
>> *table,
>>>>        table->array[od->index] = lr_stateful_rec;
>>>>
>>>> -    /* Load balancers are not supported (yet) if a logical router has
>> multiple
>>>> -     * distributed gateway port.  Log a warning. */
>>>> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
>>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>>> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
>>>> -                     "router %s, which has %"PRIuSIZE" distributed "
>>>> -                     "gateway ports. Load-balancer is not supported "
>>>> -                     "yet when there is more than one distributed "
>>>> -                     "gateway port on the router.",
>>>> -                     od->nbr->name, od->n_l3dgw_ports);
>>>> -    }
>>>> -
>>>>        return lr_stateful_rec;
>>>>    }
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 6898daa00..e6f53f361 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -11026,10 +11026,9 @@ static void
>>>>    build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx
>> *ctx,
>>>>                                         enum lrouter_nat_lb_flow_type
>> type,
>>>>                                         struct ovn_datapath *od,
>>>> +                                     struct ovn_port *dgp,
>>>>                                         struct lflow_ref *lflow_ref)
>>>>    {
>>>> -    struct ovn_port *dgp = od->l3dgw_ports[0];
>>>> -
>>>>        const char *undnat_action;
>>>>
>>>>        switch (type) {
>>>> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct
>> lrouter_nat_lb_flows_ctx *ctx,
>>>>        if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
>>>>            ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
>>>> -                      od->l3dgw_ports[0]->cr_port->json_key);
>>>> +                      dgp->cr_port->json_key);
>>>>        }
>>>>
>>>>        ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT,
>> ctx->prio,
>>>> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
>>>>            if (!od->n_l3dgw_ports) {
>>>>                bitmap_set1(gw_dp_bitmap[type], index);
>>>>            } else {
>>>> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>>>> -                                                 lb_dps->lflow_ref);
>>>> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
>>>> +                struct ovn_port *dgp = od->l3dgw_ports[i];
>>>> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>> dgp,
>>>> +
>>   lb_dps->lflow_ref);
>>>> +            }
>>>>            }
>>>>
>>>>            if (lb->affinity_timeout) {
>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>>> index a389d1988..5be48f49e 100644
>>>> --- a/tests/ovn-northd.at
>>>> +++ b/tests/ovn-northd.at
>>>> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep
>> lr_in_dnat | ovn_strip_lflows], [0], [d
>>>>    AT_CLEANUP
>>>>    ])
>>>> +
>>>> +OVN_FOR_EACH_NORTHD_NO_HV([
>>>> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
>>>> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
>>>> +ovn_start
>>>> +
>>>> +# Logical network:
>>>> +# 1 Logical Router, 3 bridged Logical Switches,
>>>> +# 1 gateway chassis attached to each corresponding LRP.
>>>> +# LB added attached to DR
>>>> +#
>>>> +#                | S1 (gw1)
>>>> +#                |
>>>> +#      ls  ----  DR -- S3 (gw3)
>>>> +# (20.0.0.0/24)  |
>>>> +#                | S2 (gw2)
>>>> +#
>>>> +# Validate basic LB logical flows.
>>>> +
>>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>>>> +
>>>> +check ovn-nbctl lr-add DR
>>>> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>>> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
>>>> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
>>>> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
>>>> +
>>>> +check ovn-nbctl ls-add S1
>>>> +check ovn-nbctl lsp-add S1 S1-DR
>>>> +check ovn-nbctl lsp-set-type S1-DR router
>>>> +check ovn-nbctl lsp-set-addresses S1-DR router
>>>> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
>>>> +
>>>> +check ovn-nbctl ls-add S2
>>>> +check ovn-nbctl lsp-add S2 S2-DR
>>>> +check ovn-nbctl lsp-set-type S2-DR router
>>>> +check ovn-nbctl lsp-set-addresses S2-DR router
>>>> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
>>>> +
>>>> +check ovn-nbctl ls-add S3
>>>> +check ovn-nbctl lsp-add S3 S3-DR
>>>> +check ovn-nbctl lsp-set-type S3-DR router
>>>> +check ovn-nbctl lsp-set-addresses S3-DR router
>>>> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
>>>> +
>>>> +check ovn-nbctl ls-add  ls
>>>> +check ovn-nbctl lsp-add ls ls-DR
>>>> +check ovn-nbctl lsp-set-type ls-DR router
>>>> +check ovn-nbctl lsp-set-addresses ls-DR router
>>>> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
>>>> +
>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
>>>> +
>>>> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
>>>> +check ovn-nbctl lr-lb-add DR lb-1
>>>> +
>>>> +check ovn-nbctl --wait=sb sync
>>>> +
>>>> +ovn-sbctl dump-flows DR > lrflows
>>>> +AT_CAPTURE_FILE([lrflows])
>>>> +
>>>> +# Check the flows in lr_in_dnat stage
>>>> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR |
>> ovn_strip_lflows], [0], [dnl
>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>> is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80
>> ,20.0.0.9:80);)
>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>> is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80
>> ,20.0.0.9:80);)
>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>> is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80
>> ,20.0.0.9:80);)
>>>> +])
>>>> +# Check the flows in lr_in_gw_redirect stage
>>>> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep
>> cr-DR | ovn_strip_lflows], [0], [dnl
>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
>> == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
>> == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
>> == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
>>>> +])
>>>> +# Check the flows in lr_out_undnat stage
>>>> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR
>> | ovn_strip_lflows], [0], [dnl
>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
>> == 80)) && (inport == "DR-S1" || outport == "DR-S1") &&
>> is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
>> == 80)) && (inport == "DR-S2" || outport == "DR-S2") &&
>> is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src
>> == 80)) && (inport == "DR-S3" || outport == "DR-S3") &&
>> is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
>>>> +])
>>>> +
>>>> +AT_CLEANUP
>>>> +])
>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>> index 185ba4a21..8e8c102c0 100644
>>>> --- a/tests/ovn.at
>>>> +++ b/tests/ovn.at
>>>> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
>>>>
>>>>    AT_CLEANUP
>>>>    ])
>>>> +
>>>> +OVN_FOR_EACH_NORTHD([
>>>> +AT_SETUP([Multiple DGP and LB traffic])
>>>> +AT_KEYWORDS([dgp-lb])
>>>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
>>>> +ovn_start
>>>> +
>>>> +# Logical network:
>>>> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
>>>> +# 1 gateway chassis attached to each corresponding LRP.
>>>> +# LB added attached to DR
>>>> +#
>>>> +#                | public (gw1) (172.168.0.0/24)
>>>> +#                |
>>>> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
>>>> +# (10.0.0.0/24)
>>>> +#
>>>> +# Routes (lr0):
>>>> +#
>>>> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
>>>> +#   default      ----> 172.168.0.1 (public)
>>>> +#
>>>> +#
>>>> +# Validate Traffic from public to LB and its response.
>>>> +# Validate traffic from public2 to LB and its response.
>>>> +
>>>> +test_ip_req_packet() {
>>>> +    local src_mac="$1"
>>>> +    local dst_mac="$2"
>>>> +    local src_ip="$3"
>>>> +    local dst_ip="$4"
>>>> +    local sport=$5
>>>> +    local iface=$6
>>>> +
>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>>>> +                    UDP(sport=${sport}, dport=4369)")
>>>> +
>>>> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
>>>> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
>>>> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
>>>> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
>>>> +}
>>>> +
>>>> +test_ip_rep_packet() {
>>>> +    local src_mac="$1"
>>>> +    local dst_mac="$2"
>>>> +    local src_ip="$3"
>>>> +    local dst_ip="$4"
>>>> +    local dport=$5
>>>> +
>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>>>> +                    UDP(sport=4369, dport=${dport})")
>>>> +
>>>> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>>>> +}
>>>> +
>>>> +net_add n
>>>> +
>>>> +sim_add hv1
>>>> +as hv1
>>>> +check ovs-vsctl add-br br-phys
>>>> +ovn_attach n br-phys 192.168.0.1
>>>> +check ovs-vsctl set open .
>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>>> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
>>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>>>> +    options:rxq_pcap=hv1/vif1-rx.pcap
>>>> +
>>>> +sim_add hv2
>>>> +as hv2
>>>> +check ovs-vsctl add-br br-phys
>>>> +ovn_attach n br-phys 192.168.0.2
>>>> +check ovs-vsctl set open .
>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
>>>> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
>>>> +    options:tx_pcap=hv2/vif1-tx.pcap \
>>>> +    options:rxq_pcap=hv2/vif1-rx.pcap
>>>> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
>>>> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
>>>> +    options:tx_pcap=hv2/vif2-tx.pcap \
>>>> +    options:rxq_pcap=hv2/vif2-rx.pcap
>>>> +
>>>> +check ovn-nbctl ls-add sw0
>>>> +check ovn-nbctl lsp-add sw0 sw0-port1
>>>> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01
>> 10.0.0.2"
>>>> +
>>>> +check ovn-nbctl ls-add public
>>>> +check ovn-nbctl lsp-add public ln-public
>>>> +check ovn-nbctl lsp-set-type ln-public localnet
>>>> +check ovn-nbctl lsp-set-addresses ln-public unknown
>>>> +check ovn-nbctl lsp-set-options ln-public network_name=phys
>>>> +check ovn-nbctl lsp-add public public-port1
>>>> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88
>> 172.168.0.200"
>>>> +
>>>> +check ovn-nbctl ls-add public2
>>>> +check ovn-nbctl lsp-add public2 ln-public2
>>>> +check ovn-nbctl lsp-set-type ln-public2 localnet
>>>> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
>>>> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
>>>> +check ovn-nbctl lsp-add public2 public2-port1
>>>> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99
>> 173.168.0.200"
>>>> +
>>>> +check ovn-nbctl lr-add lr0
>>>> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
>>>> +check ovn-nbctl lsp-add sw0 sw0-lr0
>>>> +check ovn-nbctl lsp-set-type sw0-lr0 router
>>>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
>>>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
>>>> +
>>>> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13
>> 172.168.0.1/24
>>>> +check ovn-nbctl lsp-add public public-lr0
>>>> +check ovn-nbctl lsp-set-type public-lr0 router
>>>> +check ovn-nbctl lsp-set-addresses public-lr0 router
>>>> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
>>>> +
>>>> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14
>> 173.168.0.1/24
>>>> +check ovn-nbctl lsp-add public2 public2-lr0
>>>> +check ovn-nbctl lsp-set-type public2-lr0 router
>>>> +check ovn-nbctl lsp-set-addresses public2-lr0 router
>>>> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
>>>> +
>>>> +
>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
>>>> +
>>>> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
>>>> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
>>>> +
>>>> +wait_for_ports_up
>>>> +OVN_POPULATE_ARP
>>>> +
>>>> +
>>>> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
>>>> +check ovn-nbctl lr-lb-add lr0 lb0
>>>> +
>>>> +ovn-sbctl lflow-list lr0
>>>> +
>>>> +# send UDP request to the load-balancer VIP from public switch 1
>>>> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13"
>> "172.168.0.200" "172.168.0.10" 53 hv2-vif1
>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
>> hv1/vif1-tx.pcap | wc -l) -ge 1])
>>>> +# send UDP reply from sw0-port1
>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
>> "172.168.0.200" 53
>>>> +# packet sent by the load balancer VIP
>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88',
>> src='00:00:20:20:12:13')/
>>>> +                      IP(dst='172.168.0.200', src='172.168.0.10',
>> ttl=63)/ \
>>>> +                      UDP(sport=4369, dport=53)")
>>>> +echo $packet > expected
>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
>>>> +
>>>> +# send UDP request to the load-balancer VIP from public switch 2
>>>> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14"
>> "173.168.0.200" "172.168.0.10" 54 hv2-vif2
>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
>> hv1/vif1-tx.pcap | wc -l) -ge 1])
>>>> +# send UDP reply from sw0-port1
>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
>> "173.168.0.200" 54
>>>> +# packet sent by the load balancer VIP
>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99',
>> src='00:00:20:20:12:14')/
>>>> +                      IP(dst='173.168.0.200', src='172.168.0.10',
>> ttl=63)/ \
>>>> +                      UDP(sport=4369, dport=54)")
>>>> +echo $packet > expected
>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
>>>> +
>>>> +
>>>> +OVN_CLEANUP([hv1],[hv2])
>>>> +AT_CLEANUP
>>>> +])
>>>> --
>>>> 2.39.2
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=MK6dYyDJXEqoWiI5_CHJlLMBqkveZAbXAkd1nboKnOQttNKUVr7bMurJRHckcEve&s=Z4TvyW108zN_uD5A6r4R0AcyVSdmfR3GVvWLssPYumY&e=
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=gd4MU4YDCxFHxMDPFWsES-e_pK1HDm28jFCnXhb4bHU&e=
>>
Roberto Bartzen Acosta Aug. 6, 2024, 5:26 p.m. UTC | #6
Hi Priyankar,

On Tue, Aug 6, 2024 at 12:35, Priyankar Jain <
priyankar.jain@nutanix.com> wrote:

> Hi,
>
> On 06/08/24 6:15 pm, Roberto Bartzen Acosta wrote:
> > Hi Priyankar,
> >
> > So, what is the difference between the patch and the one already proposed
> > [1]?
> Ah, I just came across this patch. It looks the same to me.
> > The change you are proposing has already been proposed before, and in
> > discussion with Mark we came to the conclusion that this breaks the DNAT
> > case when using multiple chassis. So, I pushed the v4 [2][3] with
> Stateless
> > NAT implementation to solve this problem.
> I reviewed this patch. If we are using stateless NAT, does that mean the
> packet won't be passed through CT in the egress direction? If so, wouldn't
> this break the TCP state machine with respect to CT (with the CT table
> marking the packets as invalid)? Say the SYN gets committed to CT (by the
> ct_lb_mark action in the ingress pipeline); the SYN-ACK won't be committed,
> since only the egress pipeline runs on the GW node, and then the ACK would
> be committed again. So, with respect to sequence numbers and window
> scaling, are we sure we are safe as far as the TCP CT state machine is
> concerned here?
>

My proposal does not affect LB egress, since the output from the backends
is still handled as part of the ls_in_lb pipeline, and the DGP egress flow
depends on how you configure the routing/external NAT (e.g. OVN-IC uses
direct routing for this).

In addition, the v4 patch adds the ability to receive packets in the
lr_in_dnat pipeline via different chassis-resident ports and to deliver the
packet to ct_lb_mark, so that the CT state is processed in the LB pipeline
before the packet is delivered to the backend IPs (regardless of which
chassis receives the traffic).
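
If it helps to compare the two approaches side by side, the quickest check
is probably to dump the DNAT/un-DNAT stages of the router pipeline with each
patch applied -- a sketch, assuming a router named lr0 as in the traffic
test from your patch:

    # Show which chassis-resident port each DNAT/un-DNAT flow is gated on,
    # and whether the action goes through ct_lb_mark/ct_dnat or not.
    ovn-sbctl dump-flows lr0 | grep -E 'lr_in_dnat|lr_out_undnat' \
        | grep is_chassis_resident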


>
> Also, as I mentioned in my previous reply, our use case is much more
> restricted: the routing ensures that the ingress for the request (from
> outside to the LR) and the egress (for the un-DNAT of the reply) both run
> on the same gateway node.
>
>
Have you tried applying patch v4 to your use case? I ran the test cases you
wrote in your patch against my v4, and they passed without any issues. What
do you think?


## ------------------------ ##
## ovn 24.03.90 test suite. ##
## ------------------------ ##

OVN end-to-end tests

533: Multiple DGP and LB traffic -- parallelization=yes --
ovn_monitor_all=yes ok
534: Multiple DGP and LB traffic -- parallelization=yes --
ovn_monitor_all=no ok
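
For reference, those two cases can be re-run in isolation by selecting their
AT_KEYWORDS tag from the in-tree testsuite (assuming a built source tree):

    # Run only the "Multiple DGP and LB traffic" tests added by the patch.
    make check TESTSUITEFLAGS="-k dgp-lb"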


Thanks,
Roberto



> Thanks,
> Priyankar
> > [1]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DFebruary_411981.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=JR1CKXLiNrDGVt85mXYTutzBqhpCpiABOutl2TMPWEE&e=
> > [2]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DJuly_415977.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=QA8-pREOG2X_f-Hk8zDEeXxyV6jOoNx3pRxuKAIhsCk&e=
> > [3]
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovsrobot_ovn_commit_9379c81152df2bcd0626269988efb8ac36d4f624&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=axc06cz9XfMSiEKgrD6QmmoYwrBVxBnm1T2gEx-_byY&e=
> >
> > Thanks,
> > Roberto
> >
> > On Tue, Aug 6, 2024 at 07:57, Priyankar Jain <
> > priyankar.jain@nutanix.com> wrote:
> >
> >> Hi,
> >> Please find my reply inline.
> >>
> >> Thanks,
> >> Priyankar
> >>
> >> On 31/07/24 4:46 am, Numan Siddique wrote:
> >>> On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
> >>> <priyankar.jain@nutanix.com> wrote:
> >>>> This commit removes the restriction of support LB for router with only
> >>>> <= 1 Distributed gateway ports.
> >>>> Added datapath and logical flows validation cases.
> >>>>
> >>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
> >>> Hi Priyankar,
> >>>
> >>> Thanks for the patch.
> >>>
> >>> Can you please tell me the use case ?
> >>>
> >>> Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
> >>> (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
> >>> gw ports are bound to chassis ch1, ch2 and ch3 respectively.
> >>>
> >>> Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
> >>> 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
> >>> lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
> >>>
> >>> If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
> >>> the chassis ch1 and so on for other switches.
> >>>
> >>> This patch generates the below logical flows in the router pipeline
> >>> for load balancer lb1, lb2 and lb3
> >>>
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw1")),
> >>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw2")),
> >>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw3")),
> >>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw1")),
> >>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw2")),
> >>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw3")),
> >>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw1")),
> >>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw2")),
> >>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw3")),
> >>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>
> >>> If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
> >>> then ideally  the routing will be handled in ch1 since lr1-sw1 is
> >>> bound on ch1.
> >>>
> >>> Now with the above logical flows,  load balancing could happen in the
> >>> source chassis itself. Is that ok ?
> >> In our usecase, sw2 is not expected to handle load balancing for
> >> 10.0.0.0/24 CIDR. it is expected to handle LB only for 20.0.0.0/24
> >> (Note: LB is not attached to the logical switch itself, since the
> >> same LS can be connected to multiple such LRs.)
> >>
> >>                    | S1 (gw1) [10.0.0.0/24]
> >>                    |
> >>         ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
> >>    (40.0.0.0/24)
> >>
> >> LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
> >> LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80
> >>
> >> Routes:
> >> 10.0.0.0/24 next-hop via S1
> >> default next-hop via S2
> >>
> >>
> >> In our usecase, backends are only present on ls1.
> >> S1 and S2 does not have any backends. Similarly,
> >> S1 needs to handle the traffic only for LB: 10.0.0.40
> >> S2 needs to handle the traffic only for LB: 30.0.0.40
> >>
> >> Hence, ingress coming from S2 happens on gw2 during the
> >> request part. and egress again happens on gw2 (due to
> >> above routes) for the response.
> >>
> >> Although, the patch does not handle all the usecases, but with
> >> existing OVN impl, it does not even program any LB rules.
> >> Hence, this patch acheives the above restricted use-case.
> >> Let me know, if i need to add some logs etc..
> >>
> >>> Shouldn't this patch generate the flows like this for lb1, lb2 and lb3
> ?
> >>>
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw1")),
> >>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw2")),
> >>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>     table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>> is_chassis_resident("cr-lr0-sw3")),
> >>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>
> >>> So that the LB NAT happens on the chassis where the distributed
> >>> gateway port for the LB vip subnet resides ?
> >> The problem is how do we provide mapping of LB VIP and DGP? VIP need
> >> not to belong to subnet's CIDR.
> >>> Thanks
> >>> Numan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> ---
> >>>>    northd/en-lr-stateful.c |  12 ---
> >>>>    northd/northd.c         |  12 +--
> >>>>    tests/ovn-northd.at     |  86 +++++++++++++++++++++
> >>>>    tests/ovn.at            | 167
> >> ++++++++++++++++++++++++++++++++++++++++
> >>>>    4 files changed, 260 insertions(+), 17 deletions(-)
> >>>>
> >>>> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> >>>> index baf1bd2f8..f09691af6 100644
> >>>> --- a/northd/en-lr-stateful.c
> >>>> +++ b/northd/en-lr-stateful.c
> >>>> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct
> lr_stateful_table
> >> *table,
> >>>>        table->array[od->index] = lr_stateful_rec;
> >>>>
> >>>> -    /* Load balancers are not supported (yet) if a logical router has
> >> multiple
> >>>> -     * distributed gateway port.  Log a warning. */
> >>>> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od))
> {
> >>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
> 1);
> >>>> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
> >>>> -                     "router %s, which has %"PRIuSIZE" distributed "
> >>>> -                     "gateway ports. Load-balancer is not supported "
> >>>> -                     "yet when there is more than one distributed "
> >>>> -                     "gateway port on the router.",
> >>>> -                     od->nbr->name, od->n_l3dgw_ports);
> >>>> -    }
> >>>> -
> >>>>        return lr_stateful_rec;
> >>>>    }
> >>>>
> >>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>> index 6898daa00..e6f53f361 100644
> >>>> --- a/northd/northd.c
> >>>> +++ b/northd/northd.c
> >>>> @@ -11026,10 +11026,9 @@ static void
> >>>>    build_distr_lrouter_nat_flows_for_lb(struct
> lrouter_nat_lb_flows_ctx
> >> *ctx,
> >>>>                                         enum lrouter_nat_lb_flow_type
> >> type,
> >>>>                                         struct ovn_datapath *od,
> >>>> +                                     struct ovn_port *dgp,
> >>>>                                         struct lflow_ref *lflow_ref)
> >>>>    {
> >>>> -    struct ovn_port *dgp = od->l3dgw_ports[0];
> >>>> -
> >>>>        const char *undnat_action;
> >>>>
> >>>>        switch (type) {
> >>>> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct
> >> lrouter_nat_lb_flows_ctx *ctx,
> >>>>        if (ctx->lb_vip->n_backends ||
> !ctx->lb_vip->empty_backend_rej) {
> >>>>            ds_put_format(ctx->new_match, " &&
> is_chassis_resident(%s)",
> >>>> -                      od->l3dgw_ports[0]->cr_port->json_key);
> >>>> +                      dgp->cr_port->json_key);
> >>>>        }
> >>>>
> >>>>        ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT,
> >> ctx->prio,
> >>>> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
> >>>>            if (!od->n_l3dgw_ports) {
> >>>>                bitmap_set1(gw_dp_bitmap[type], index);
> >>>>            } else {
> >>>> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> >>>> -                                                 lb_dps->lflow_ref);
> >>>> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
> >>>> +                struct ovn_port *dgp = od->l3dgw_ports[i];
> >>>> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> >> dgp,
> >>>> +
> >>   lb_dps->lflow_ref);
> >>>> +            }
> >>>>            }
> >>>>
> >>>>            if (lb->affinity_timeout) {
> >>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >>>> index a389d1988..5be48f49e 100644
> >>>> --- a/tests/ovn-northd.at
> >>>> +++ b/tests/ovn-northd.at
> >>>> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep
> >> lr_in_dnat | ovn_strip_lflows], [0], [d
> >>>>    AT_CLEANUP
> >>>>    ])
> >>>> +
> >>>> +OVN_FOR_EACH_NORTHD_NO_HV([
> >>>> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
> >>>> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
> >>>> +ovn_start
> >>>> +
> >>>> +# Logical network:
> >>>> +# 1 Logical Router, 3 bridged Logical Switches,
> >>>> +# 1 gateway chassis attached to each corresponding LRP.
> >>>> +# LB added attached to DR
> >>>> +#
> >>>> +#                | S1 (gw1)
> >>>> +#                |
> >>>> +#      ls  ----  DR -- S3 (gw3)
> >>>> +# (20.0.0.0/24)  |
> >>>> +#                | S2 (gw2)
> >>>> +#
> >>>> +# Validate basic LB logical flows.
> >>>> +
> >>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
> >>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
> >>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
> >>>> +
> >>>> +check ovn-nbctl lr-add DR
> >>>> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
> >>>> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
> >>>> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
> >>>> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
> >>>> +
> >>>> +check ovn-nbctl ls-add S1
> >>>> +check ovn-nbctl lsp-add S1 S1-DR
> >>>> +check ovn-nbctl lsp-set-type S1-DR router
> >>>> +check ovn-nbctl lsp-set-addresses S1-DR router
> >>>> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
> >>>> +
> >>>> +check ovn-nbctl ls-add S2
> >>>> +check ovn-nbctl lsp-add S2 S2-DR
> >>>> +check ovn-nbctl lsp-set-type S2-DR router
> >>>> +check ovn-nbctl lsp-set-addresses S2-DR router
> >>>> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
> >>>> +
> >>>> +check ovn-nbctl ls-add S3
> >>>> +check ovn-nbctl lsp-add S3 S3-DR
> >>>> +check ovn-nbctl lsp-set-type S3-DR router
> >>>> +check ovn-nbctl lsp-set-addresses S3-DR router
> >>>> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
> >>>> +
> >>>> +check ovn-nbctl ls-add  ls
> >>>> +check ovn-nbctl lsp-add ls ls-DR
> >>>> +check ovn-nbctl lsp-set-type ls-DR router
> >>>> +check ovn-nbctl lsp-set-addresses ls-DR router
> >>>> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
> >>>> +
> >>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
> >>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
> >>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
> >>>> +
> >>>> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
> >>>> +check ovn-nbctl lr-lb-add DR lb-1
> >>>> +
> >>>> +check ovn-nbctl --wait=sb sync
> >>>> +
> >>>> +ovn-sbctl dump-flows DR > lrflows
> >>>> +AT_CAPTURE_FILE([lrflows])
> >>>> +
> >>>> +# Check the flows in lr_in_dnat stage
> >>>> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR |
> >> ovn_strip_lflows], [0], [dnl
> >>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> >> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> >> is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80
> >> ,20.0.0.9:80);)
> >>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> >> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> >> is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80
> >> ,20.0.0.9:80);)
> >>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> >> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> >> is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80
> >> ,20.0.0.9:80);)
> >>>> +])
> >>>> +# Check the flows in lr_in_gw_redirect stage
> >>>> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep
> >> cr-DR | ovn_strip_lflows], [0], [dnl
> >>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> >> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> tcp.src
> >> == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
> >>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> >> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> tcp.src
> >> == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
> >>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> >> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> tcp.src
> >> == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
> >>>> +])
> >>>> +# Check the flows in lr_out_undnat stage
> >>>> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR
> >> | ovn_strip_lflows], [0], [dnl
> >>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> >> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> tcp.src
> >> == 80)) && (inport == "DR-S1" || outport == "DR-S1") &&
> >> is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
> >>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> >> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> tcp.src
> >> == 80)) && (inport == "DR-S2" || outport == "DR-S2") &&
> >> is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
> >>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> >> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> tcp.src
> >> == 80)) && (inport == "DR-S3" || outport == "DR-S3") &&
> >> is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
> >>>> +])
> >>>> +
> >>>> +AT_CLEANUP
> >>>> +])
> >>>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>>> index 185ba4a21..8e8c102c0 100644
> >>>> --- a/tests/ovn.at
> >>>> +++ b/tests/ovn.at
> >>>> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
> >>>>
> >>>>    AT_CLEANUP
> >>>>    ])
> >>>> +
> >>>> +OVN_FOR_EACH_NORTHD([
> >>>> +AT_SETUP([Multiple DGP and LB traffic])
> >>>> +AT_KEYWORDS([dgp-lb])
> >>>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> >>>> +ovn_start
> >>>> +
> >>>> +# Logical network:
> >>>> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
> >>>> +# 1 gateway chassis attached to each corresponding LRP.
> >>>> +# LB added attached to DR
> >>>> +#
> >>>> +#                | public (gw1) (172.168.0.0/24)
> >>>> +#                |
> >>>> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
> >>>> +# (10.0.0.0/24)
> >>>> +#
> >>>> +# Routes (lr0):
> >>>> +#
> >>>> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
> >>>> +#   default      ----> 172.168.0.1 (public)
> >>>> +#
> >>>> +#
> >>>> +# Validate Traffic from public to LB and its response.
> >>>> +# Validate traffic from public2 to LB and its response.
> >>>> +
> >>>> +test_ip_req_packet() {
> >>>> +    local src_mac="$1"
> >>>> +    local dst_mac="$2"
> >>>> +    local src_ip="$3"
> >>>> +    local dst_ip="$4"
> >>>> +    local sport=$5
> >>>> +    local iface=$6
> >>>> +
> >>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
> src='${src_mac}')/
> >>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> >>>> +                    UDP(sport=${sport}, dport=4369)")
> >>>> +
> >>>> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
> >>>> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
> >>>> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
> >>>> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
> >>>> +}
> >>>> +
> >>>> +test_ip_rep_packet() {
> >>>> +    local src_mac="$1"
> >>>> +    local dst_mac="$2"
> >>>> +    local src_ip="$3"
> >>>> +    local dst_ip="$4"
> >>>> +    local dport=$5
> >>>> +
> >>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
> src='${src_mac}')/
> >>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> >>>> +                    UDP(sport=4369, dport=${dport})")
> >>>> +
> >>>> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> >>>> +}
> >>>> +
> >>>> +net_add n
> >>>> +
> >>>> +sim_add hv1
> >>>> +as hv1
> >>>> +check ovs-vsctl add-br br-phys
> >>>> +ovn_attach n br-phys 192.168.0.1
> >>>> +check ovs-vsctl set open .
> >> external-ids:ovn-bridge-mappings=phys:br-phys
> >>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >>>> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
> >>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
> >>>> +    options:rxq_pcap=hv1/vif1-rx.pcap
> >>>> +
> >>>> +sim_add hv2
> >>>> +as hv2
> >>>> +check ovs-vsctl add-br br-phys
> >>>> +ovn_attach n br-phys 192.168.0.2
> >>>> +check ovs-vsctl set open .
> >> external-ids:ovn-bridge-mappings=phys:br-phys
> >>>> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
> >>>> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
> >>>> +    options:tx_pcap=hv2/vif1-tx.pcap \
> >>>> +    options:rxq_pcap=hv2/vif1-rx.pcap
> >>>> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
> >>>> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
> >>>> +    options:tx_pcap=hv2/vif2-tx.pcap \
> >>>> +    options:rxq_pcap=hv2/vif2-rx.pcap
> >>>> +
> >>>> +check ovn-nbctl ls-add sw0
> >>>> +check ovn-nbctl lsp-add sw0 sw0-port1
> >>>> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01
> >> 10.0.0.2"
> >>>> +
> >>>> +check ovn-nbctl ls-add public
> >>>> +check ovn-nbctl lsp-add public ln-public
> >>>> +check ovn-nbctl lsp-set-type ln-public localnet
> >>>> +check ovn-nbctl lsp-set-addresses ln-public unknown
> >>>> +check ovn-nbctl lsp-set-options ln-public network_name=phys
> >>>> +check ovn-nbctl lsp-add public public-port1
> >>>> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88
> >> 172.168.0.200"
> >>>> +
> >>>> +check ovn-nbctl ls-add public2
> >>>> +check ovn-nbctl lsp-add public2 ln-public2
> >>>> +check ovn-nbctl lsp-set-type ln-public2 localnet
> >>>> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
> >>>> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
> >>>> +check ovn-nbctl lsp-add public2 public2-port1
> >>>> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99
> >> 173.168.0.200"
> >>>> +
> >>>> +check ovn-nbctl lr-add lr0
> >>>> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> >>>> +check ovn-nbctl lsp-add sw0 sw0-lr0
> >>>> +check ovn-nbctl lsp-set-type sw0-lr0 router
> >>>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
> >>>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> >>>> +
> >>>> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13
> >> 172.168.0.1/24
> >>>> +check ovn-nbctl lsp-add public public-lr0
> >>>> +check ovn-nbctl lsp-set-type public-lr0 router
> >>>> +check ovn-nbctl lsp-set-addresses public-lr0 router
> >>>> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
> >>>> +
> >>>> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14 173.168.0.1/24
> >>>> +check ovn-nbctl lsp-add public2 public2-lr0
> >>>> +check ovn-nbctl lsp-set-type public2-lr0 router
> >>>> +check ovn-nbctl lsp-set-addresses public2-lr0 router
> >>>> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
> >>>> +
> >>>> +
> >>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
> >>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
> >>>> +
> >>>> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
> >>>> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
> >>>> +
> >>>> +wait_for_ports_up
> >>>> +OVN_POPULATE_ARP
> >>>> +
> >>>> +
> >>>> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
> >>>> +check ovn-nbctl lr-lb-add lr0 lb0
> >>>> +
> >>>> +ovn-sbctl lflow-list lr0
> >>>> +
> >>>> +# send UDP request to the load-balancer VIP from public switch 1
> >>>> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13"
> >> "172.168.0.200" "172.168.0.10" 53 hv2-vif1
> >>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
> >> hv1/vif1-tx.pcap | wc -l) -ge 1])
> >>>> +# send UDP reply from sw0-port1
> >>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
> >> "172.168.0.200" 53
> >>>> +# packet sent by the load balancer VIP
> >>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88',
> >> src='00:00:20:20:12:13')/
> >>>> +                      IP(dst='172.168.0.200', src='172.168.0.10',
> >> ttl=63)/ \
> >>>> +                      UDP(sport=4369, dport=53)")
> >>>> +echo $packet > expected
> >>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
> >>>> +
> >>>> +# send UDP request to the load-balancer VIP from public switch 2
> >>>> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14"
> >> "173.168.0.200" "172.168.0.10" 54 hv2-vif2
> >>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
> >> hv1/vif1-tx.pcap | wc -l) -ge 1])
> >>>> +# send UDP reply from sw0-port1
> >>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
> >> "173.168.0.200" 54
> >>>> +# packet sent by the load balancer VIP
> >>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99',
> >> src='00:00:20:20:12:14')/
> >>>> +                      IP(dst='173.168.0.200', src='172.168.0.10',
> >> ttl=63)/ \
> >>>> +                      UDP(sport=4369, dport=54)")
> >>>> +echo $packet > expected
> >>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
> >>>> +
> >>>> +
> >>>> +OVN_CLEANUP([hv1],[hv2])
> >>>> +AT_CLEANUP
> >>>> +])
> >>>> --
> >>>> 2.39.2
> >>>>
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> dev@openvswitch.org
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
>
>
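As a quick illustration of what the change means for the system test quoted above, the per-DGP DNAT flows for lb0 could be inspected roughly as follows. This is only a sketch, not part of the patch; the exact table number, action (ct_lb vs ct_lb_mark) and match text depend on the OVN version, so the output shown is indicative only.

    # With the patch, one lr_in_dnat flow per distributed gateway port is
    # expected for the VIP of lb0 (illustrative, abbreviated output):
    ovn-sbctl dump-flows lr0 | grep lr_in_dnat | grep 172.168.0.10
    #   ... ip4.dst == 172.168.0.10 && udp && udp.dst == 4369 &&
    #       is_chassis_resident("cr-lr0-public") ...
    #   ... ip4.dst == 172.168.0.10 && udp && udp.dst == 4369 &&
    #       is_chassis_resident("cr-lr0-public2") ...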
Mark Michelson Aug. 6, 2024, 8:41 p.m. UTC | #7
Hi I just wanted to weigh in.

On 8/6/24 13:26, Roberto Bartzen Acosta via dev wrote:
> Hi Priyankar,
> 
> Em ter., 6 de ago. de 2024 às 12:35, Priyankar Jain <
> priyankar.jain@nutanix.com> escreveu:
> 
>> Hi,
>>
>> On 06/08/24 6:15 pm, Roberto Bartzen Acosta wrote:
>>> Hi Priyankar,
>>>
> >>> So, what is the difference between this patch and the one already
> >>> proposed [1]?
> >> Ah, I just got to know about this patch. It looks the same to me.
> >>> The change you are proposing has already been proposed before, and in
> >>> discussion with Mark we came to the conclusion that this breaks the DNAT
> >>> case when using multiple chassis. So, I pushed v4 [2][3] with a stateless
> >>> NAT implementation to solve this problem.

I agree that this patch looks a lot like v1 of Roberto's series. I can't 
comment on v4 of the patch yet since I haven't had a chance to look at 
it yet.

I have it on my list to get reviewed before the end of the week.

> >> I reviewed this patch. If we are using stateless NAT, does that mean the
> >> packet won't be passed through CT in the egress direction? If so, wouldn't
> >> this break the TCP state machine with respect to CT (with the CT table
> >> marking the packets as invalid)? Say the SYN gets committed to CT (by the
> >> ct_lb_mark action in the ingress pipeline); the SYN-ACK won't be committed
> >> since only the egress pipeline runs on the GW node, and then the ACK would
> >> be committed again. So are we sure that, with respect to sequence numbers
> >> and window scaling, we are safe regarding the TCP CT state machine here?
>>
> 
> > My proposal does not affect LB egress, since the output from the backends
> > is still handled as part of the ls_in_lb pipeline, and the DGP egress flow
> > depends on how you configure the routing/external NAT (e.g. OVN-IC uses
> > direct routing for this).
> >
> > In addition, the v4 patch adds the ability to receive packets in the
> > lr_in_dnat pipeline via different chassis_resident matches and to deliver
> > the packet to ct_lb_mark, so that the CT state is processed in the LB
> > pipeline before the packet is delivered to the backend IPs (regardless of
> > the chassis that receives the traffic).
> 
> 
>>
> >> Also, as I mentioned in my previous reply, our use case is much more
> >> restricted: routing ensures that both the ingress of the request (from
> >> outside to the LR) and the egress (for undnat) run on the gateway node.
>>

The problem with this is that there's nothing in the attached patch to
enforce this restricted scenario. It's possible for people to shoot
themselves in the foot.

>>
> Have you tried applying the v4 patch to your use case? Considering the test
> cases you wrote in your patch, the tests passed with my v4 version without
> any issues. What do you think?
> 
> 
> ## ------------------------ ##
> ## ovn 24.03.90 test suite. ##
> ## ------------------------ ##
> 
> OVN end-to-end tests
> 
> 533: Multiple DGP and LB traffic -- parallelization=yes -- ovn_monitor_all=yes ok
> 534: Multiple DGP and LB traffic -- parallelization=yes -- ovn_monitor_all=no ok
> 
> 
> Thanks,
> Roberto
> 
> 
> 
>> Thanks,
>> Priyankar
>>> [1]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DFebruary_411981.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=JR1CKXLiNrDGVt85mXYTutzBqhpCpiABOutl2TMPWEE&e=
>>> [2]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DJuly_415977.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=QA8-pREOG2X_f-Hk8zDEeXxyV6jOoNx3pRxuKAIhsCk&e=
>>> [3]
>>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovsrobot_ovn_commit_9379c81152df2bcd0626269988efb8ac36d4f624&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=axc06cz9XfMSiEKgrD6QmmoYwrBVxBnm1T2gEx-_byY&e=
>>>
>>> Thanks,
>>> Roberto
>>>
>>> Em ter., 6 de ago. de 2024 às 07:57, Priyankar Jain <
>>> priyankar.jain@nutanix.com> escreveu:
>>>
>>>> Hi,
>>>> Please find my reply inline.
>>>>
>>>> Thanks,
>>>> Priyankar
>>>>
>>>> On 31/07/24 4:46 am, Numan Siddique wrote:
>>>>> On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
>>>>> <priyankar.jain@nutanix.com> wrote:
>>>>>> This commit removes the restriction of support LB for router with only
>>>>>> <= 1 Distributed gateway ports.
>>>>>> Added datapath and logical flows validation cases.
>>>>>>
>>>>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
>>>>> Hi Priyankar,
>>>>>
>>>>> Thanks for the patch.
>>>>>
>>>>> Can you please tell me the use case ?
>>>>>
>>>>> Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
>>>>> (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
>>>>> gw ports are bound to chassis ch1, ch2 and ch3 respectively.
>>>>>
>>>>> Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
>>>>> 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
>>>>> lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
>>>>>
>>>>> If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
>>>>> the chassis ch1 and so on for other switches.
>>>>>
>>>>> This patch generates the below logical flows in the router pipeline
>>>>> for load balancer lb1, lb2 and lb3
>>>>>
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>
>>>>> If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
>>>>> then ideally  the routing will be handled in ch1 since lr1-sw1 is
>>>>> bound on ch1.
>>>>>
>>>>> Now with the above logical flows,  load balancing could happen in the
>>>>> source chassis itself. Is that ok ?
> >>>> In our use case, sw2 is not expected to handle load balancing for the
> >>>> 10.0.0.0/24 CIDR; it is expected to handle LB only for 20.0.0.0/24.
> >>>> (Note: the LB is not attached to the logical switch itself, since the
> >>>> same LS can be connected to multiple such LRs.)
>>>>
>>>>                     | S1 (gw1) [10.0.0.0/24]
>>>>                     |
>>>>          ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
>>>>     (40.0.0.0/24)
>>>>
>>>> LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
>>>> LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80
>>>>
>>>> Routes:
>>>> 10.0.0.0/24 next-hop via S1
>>>> default next-hop via S2
>>>>
>>>>
> >>>> In our use case, backends are only present on ls1;
> >>>> S1 and S2 do not have any backends. Similarly,
> >>>> S1 needs to handle traffic only for LB 10.0.0.40 and
> >>>> S2 needs to handle traffic only for LB 30.0.0.40.
> >>>>
> >>>> Hence, for a request coming in from S2, the ingress runs on gw2,
> >>>> and the egress (for the response) again runs on gw2 due to the
> >>>> routes above.
> >>>>
> >>>> The patch does not handle all the use cases, but the existing OVN
> >>>> implementation does not even program any LB rules. Hence, this patch
> >>>> achieves the above restricted use case.
> >>>> Let me know if I need to add some logs etc.
>>>>
> >>>>> Shouldn't this patch generate the flows like this for lb1, lb2 and lb3?
>>>>>
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>
>>>>> So that the LB NAT happens on the chassis where the distributed
>>>>> gateway port for the LB vip subnet resides ?
> >>>> The problem is how we would provide a mapping between LB VIP and DGP;
> >>>> the VIP need not belong to the subnet's CIDR.
>>>>> Thanks
>>>>> Numan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> ---
>>>>>>     northd/en-lr-stateful.c |  12 ---
>>>>>>     northd/northd.c         |  12 +--
>>>>>>     tests/ovn-northd.at     |  86 +++++++++++++++++++++
>>>>>>     tests/ovn.at            | 167
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>     4 files changed, 260 insertions(+), 17 deletions(-)
>>>>>>
>>>>>> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
>>>>>> index baf1bd2f8..f09691af6 100644
>>>>>> --- a/northd/en-lr-stateful.c
>>>>>> +++ b/northd/en-lr-stateful.c
>>>>>> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct
>> lr_stateful_table
>>>> *table,
>>>>>>         table->array[od->index] = lr_stateful_rec;
>>>>>>
>>>>>> -    /* Load balancers are not supported (yet) if a logical router has
>>>> multiple
>>>>>> -     * distributed gateway port.  Log a warning. */
>>>>>> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od))
>> {
>>>>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
>> 1);
>>>>>> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
>>>>>> -                     "router %s, which has %"PRIuSIZE" distributed "
>>>>>> -                     "gateway ports. Load-balancer is not supported "
>>>>>> -                     "yet when there is more than one distributed "
>>>>>> -                     "gateway port on the router.",
>>>>>> -                     od->nbr->name, od->n_l3dgw_ports);
>>>>>> -    }
>>>>>> -
>>>>>>         return lr_stateful_rec;
>>>>>>     }
>>>>>>
>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>> index 6898daa00..e6f53f361 100644
>>>>>> --- a/northd/northd.c
>>>>>> +++ b/northd/northd.c
>>>>>> @@ -11026,10 +11026,9 @@ static void
>>>>>>     build_distr_lrouter_nat_flows_for_lb(struct
>> lrouter_nat_lb_flows_ctx
>>>> *ctx,
>>>>>>                                          enum lrouter_nat_lb_flow_type
>>>> type,
>>>>>>                                          struct ovn_datapath *od,
>>>>>> +                                     struct ovn_port *dgp,
>>>>>>                                          struct lflow_ref *lflow_ref)
>>>>>>     {
>>>>>> -    struct ovn_port *dgp = od->l3dgw_ports[0];
>>>>>> -
>>>>>>         const char *undnat_action;
>>>>>>
>>>>>>         switch (type) {
>>>>>> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct
>>>> lrouter_nat_lb_flows_ctx *ctx,
>>>>>>         if (ctx->lb_vip->n_backends ||
>> !ctx->lb_vip->empty_backend_rej) {
>>>>>>             ds_put_format(ctx->new_match, " &&
>> is_chassis_resident(%s)",
>>>>>> -                      od->l3dgw_ports[0]->cr_port->json_key);
>>>>>> +                      dgp->cr_port->json_key);
>>>>>>         }
>>>>>>
>>>>>>         ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT,
>>>> ctx->prio,
>>>>>> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
>>>>>>             if (!od->n_l3dgw_ports) {
>>>>>>                 bitmap_set1(gw_dp_bitmap[type], index);
>>>>>>             } else {
>>>>>> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>>>>>> -                                                 lb_dps->lflow_ref);
>>>>>> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
>>>>>> +                struct ovn_port *dgp = od->l3dgw_ports[i];
>>>>>> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>>>> dgp,
>>>>>> +
>>>>    lb_dps->lflow_ref);
>>>>>> +            }
>>>>>>             }
>>>>>>
>>>>>>             if (lb->affinity_timeout) {
>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>>>>> index a389d1988..5be48f49e 100644
>>>>>> --- a/tests/ovn-northd.at
>>>>>> +++ b/tests/ovn-northd.at
>>>>>> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep
>>>> lr_in_dnat | ovn_strip_lflows], [0], [d
>>>>>>     AT_CLEANUP
>>>>>>     ])
>>>>>> +
>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([
>>>>>> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
>>>>>> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
>>>>>> +ovn_start
>>>>>> +
>>>>>> +# Logical network:
>>>>>> +# 1 Logical Router, 3 bridged Logical Switches,
>>>>>> +# 1 gateway chassis attached to each corresponding LRP.
>>>>>> +# LB added attached to DR
>>>>>> +#
>>>>>> +#                | S1 (gw1)
>>>>>> +#                |
>>>>>> +#      ls  ----  DR -- S3 (gw3)
>>>>>> +# (20.0.0.0/24)  |
>>>>>> +#                | S2 (gw2)
>>>>>> +#
>>>>>> +# Validate basic LB logical flows.
>>>>>> +
>>>>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>>>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>>>>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>>>>>> +
>>>>>> +check ovn-nbctl lr-add DR
>>>>>> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>>>>> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
>>>>>> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
>>>>>> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
>>>>>> +
>>>>>> +check ovn-nbctl ls-add S1
>>>>>> +check ovn-nbctl lsp-add S1 S1-DR
>>>>>> +check ovn-nbctl lsp-set-type S1-DR router
>>>>>> +check ovn-nbctl lsp-set-addresses S1-DR router
>>>>>> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
>>>>>> +
>>>>>> +check ovn-nbctl ls-add S2
>>>>>> +check ovn-nbctl lsp-add S2 S2-DR
>>>>>> +check ovn-nbctl lsp-set-type S2-DR router
>>>>>> +check ovn-nbctl lsp-set-addresses S2-DR router
>>>>>> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
>>>>>> +
>>>>>> +check ovn-nbctl ls-add S3
>>>>>> +check ovn-nbctl lsp-add S3 S3-DR
>>>>>> +check ovn-nbctl lsp-set-type S3-DR router
>>>>>> +check ovn-nbctl lsp-set-addresses S3-DR router
>>>>>> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
>>>>>> +
>>>>>> +check ovn-nbctl ls-add  ls
>>>>>> +check ovn-nbctl lsp-add ls ls-DR
>>>>>> +check ovn-nbctl lsp-set-type ls-DR router
>>>>>> +check ovn-nbctl lsp-set-addresses ls-DR router
>>>>>> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
>>>>>> +
>>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
>>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
>>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
>>>>>> +
>>>>>> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
>>>>>> +check ovn-nbctl lr-lb-add DR lb-1
>>>>>> +
>>>>>> +check ovn-nbctl --wait=sb sync
>>>>>> +
>>>>>> +ovn-sbctl dump-flows DR > lrflows
>>>>>> +AT_CAPTURE_FILE([lrflows])
>>>>>> +
>>>>>> +# Check the flows in lr_in_dnat stage
>>>>>> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR |
>>>> ovn_strip_lflows], [0], [dnl
>>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>>>> is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80
>>>> ,20.0.0.9:80);)
>>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>>>> is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80
>>>> ,20.0.0.9:80);)
>>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>>>> is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80
>>>> ,20.0.0.9:80);)
>>>>>> +])
>>>>>> +# Check the flows in lr_in_gw_redirect stage
>>>>>> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep
>>>> cr-DR | ovn_strip_lflows], [0], [dnl
>>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>> tcp.src
>>>> == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
>>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>> tcp.src
>>>> == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
>>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>> tcp.src
>>>> == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
>>>>>> +])
>>>>>> +# Check the flows in lr_out_undnat stage
>>>>>> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR
>>>> | ovn_strip_lflows], [0], [dnl
>>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>> tcp.src
>>>> == 80)) && (inport == "DR-S1" || outport == "DR-S1") &&
>>>> is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
>>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>> tcp.src
>>>> == 80)) && (inport == "DR-S2" || outport == "DR-S2") &&
>>>> is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
>>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>> tcp.src
>>>> == 80)) && (inport == "DR-S3" || outport == "DR-S3") &&
>>>> is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
>>>>>> +])
>>>>>> +
>>>>>> +AT_CLEANUP
>>>>>> +])
>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>> index 185ba4a21..8e8c102c0 100644
>>>>>> --- a/tests/ovn.at
>>>>>> +++ b/tests/ovn.at
>>>>>> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
>>>>>>
>>>>>>     AT_CLEANUP
>>>>>>     ])
>>>>>> +
>>>>>> +OVN_FOR_EACH_NORTHD([
>>>>>> +AT_SETUP([Multiple DGP and LB traffic])
>>>>>> +AT_KEYWORDS([dgp-lb])
>>>>>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
>>>>>> +ovn_start
>>>>>> +
>>>>>> +# Logical network:
>>>>>> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
>>>>>> +# 1 gateway chassis attached to each corresponding LRP.
>>>>>> +# LB added attached to DR
>>>>>> +#
>>>>>> +#                | public (gw1) (172.168.0.0/24)
>>>>>> +#                |
>>>>>> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
>>>>>> +# (10.0.0.0/24)
>>>>>> +#
>>>>>> +# Routes (lr0):
>>>>>> +#
>>>>>> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
>>>>>> +#   default      ----> 172.168.0.1 (public)
>>>>>> +#
>>>>>> +#
>>>>>> +# Validate Traffic from public to LB and its response.
>>>>>> +# Validate traffic from public2 to LB and its response.
>>>>>> +
>>>>>> +test_ip_req_packet() {
>>>>>> +    local src_mac="$1"
>>>>>> +    local dst_mac="$2"
>>>>>> +    local src_ip="$3"
>>>>>> +    local dst_ip="$4"
>>>>>> +    local sport=$5
>>>>>> +    local iface=$6
>>>>>> +
>>>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
>> src='${src_mac}')/
>>>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>>>>>> +                    UDP(sport=${sport}, dport=4369)")
>>>>>> +
>>>>>> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
>>>>>> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
>>>>>> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
>>>>>> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
>>>>>> +}
>>>>>> +
>>>>>> +test_ip_rep_packet() {
>>>>>> +    local src_mac="$1"
>>>>>> +    local dst_mac="$2"
>>>>>> +    local src_ip="$3"
>>>>>> +    local dst_ip="$4"
>>>>>> +    local dport=$5
>>>>>> +
>>>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
>> src='${src_mac}')/
>>>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>>>>>> +                    UDP(sport=4369, dport=${dport})")
>>>>>> +
>>>>>> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>>>>>> +}
>>>>>> +
>>>>>> +net_add n
>>>>>> +
>>>>>> +sim_add hv1
>>>>>> +as hv1
>>>>>> +check ovs-vsctl add-br br-phys
>>>>>> +ovn_attach n br-phys 192.168.0.1
>>>>>> +check ovs-vsctl set open .
>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>>>>> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
>>>>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>>>>>> +    options:rxq_pcap=hv1/vif1-rx.pcap
>>>>>> +
>>>>>> +sim_add hv2
>>>>>> +as hv2
>>>>>> +check ovs-vsctl add-br br-phys
>>>>>> +ovn_attach n br-phys 192.168.0.2
>>>>>> +check ovs-vsctl set open .
>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
>>>>>> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
>>>>>> +    options:tx_pcap=hv2/vif1-tx.pcap \
>>>>>> +    options:rxq_pcap=hv2/vif1-rx.pcap
>>>>>> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
>>>>>> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
>>>>>> +    options:tx_pcap=hv2/vif2-tx.pcap \
>>>>>> +    options:rxq_pcap=hv2/vif2-rx.pcap
>>>>>> +
>>>>>> +check ovn-nbctl ls-add sw0
>>>>>> +check ovn-nbctl lsp-add sw0 sw0-port1
>>>>>> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01
>>>> 10.0.0.2"
>>>>>> +
>>>>>> +check ovn-nbctl ls-add public
>>>>>> +check ovn-nbctl lsp-add public ln-public
>>>>>> +check ovn-nbctl lsp-set-type ln-public localnet
>>>>>> +check ovn-nbctl lsp-set-addresses ln-public unknown
>>>>>> +check ovn-nbctl lsp-set-options ln-public network_name=phys
>>>>>> +check ovn-nbctl lsp-add public public-port1
>>>>>> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88
>>>> 172.168.0.200"
>>>>>> +
>>>>>> +check ovn-nbctl ls-add public2
>>>>>> +check ovn-nbctl lsp-add public2 ln-public2
>>>>>> +check ovn-nbctl lsp-set-type ln-public2 localnet
>>>>>> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
>>>>>> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
>>>>>> +check ovn-nbctl lsp-add public2 public2-port1
>>>>>> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99
>>>> 173.168.0.200"
>>>>>> +
>>>>>> +check ovn-nbctl lr-add lr0
>>>>>> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
>>>>>> +check ovn-nbctl lsp-add sw0 sw0-lr0
>>>>>> +check ovn-nbctl lsp-set-type sw0-lr0 router
>>>>>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
>>>>>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
>>>>>> +
>>>>>> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13
>>>> 172.168.0.1/24
>>>>>> +check ovn-nbctl lsp-add public public-lr0
>>>>>> +check ovn-nbctl lsp-set-type public-lr0 router
>>>>>> +check ovn-nbctl lsp-set-addresses public-lr0 router
>>>>>> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
>>>>>> +
>>>>>> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14
>>>> 173.168.0.1/24
>>>>>> +check ovn-nbctl lsp-add public2 public2-lr0
>>>>>> +check ovn-nbctl lsp-set-type public2-lr0 router
>>>>>> +check ovn-nbctl lsp-set-addresses public2-lr0 router
>>>>>> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
>>>>>> +
>>>>>> +
>>>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
>>>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
>>>>>> +
>>>>>> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
>>>>>> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
>>>>>> +
>>>>>> +wait_for_ports_up
>>>>>> +OVN_POPULATE_ARP
>>>>>> +
>>>>>> +
>>>>>> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
>>>>>> +check ovn-nbctl lr-lb-add lr0 lb0
>>>>>> +
>>>>>> +ovn-sbctl lflow-list lr0
>>>>>> +
>>>>>> +# send UDP request to the load-balancer VIP from public switch 1
>>>>>> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13"
>>>> "172.168.0.200" "172.168.0.10" 53 hv2-vif1
>>>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
>>>> hv1/vif1-tx.pcap | wc -l) -ge 1])
>>>>>> +# send UDP reply from sw0-port1
>>>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
>>>> "172.168.0.200" 53
>>>>>> +# packet sent by the load balancer VIP
>>>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88',
>>>> src='00:00:20:20:12:13')/
>>>>>> +                      IP(dst='172.168.0.200', src='172.168.0.10',
>>>> ttl=63)/ \
>>>>>> +                      UDP(sport=4369, dport=53)")
>>>>>> +echo $packet > expected
>>>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
>>>>>> +
>>>>>> +# send UDP request to the load-balancer VIP from public switch 2
>>>>>> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14"
>>>> "173.168.0.200" "172.168.0.10" 54 hv2-vif2
>>>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
>>>> hv1/vif1-tx.pcap | wc -l) -ge 1])
>>>>>> +# send UDP reply from sw0-port1
>>>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
>>>> "173.168.0.200" 54
>>>>>> +# packet sent by the load balancer VIP
>>>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99',
>>>> src='00:00:20:20:12:14')/
>>>>>> +                      IP(dst='173.168.0.200', src='172.168.0.10',
>>>> ttl=63)/ \
>>>>>> +                      UDP(sport=4369, dport=54)")
>>>>>> +echo $packet > expected
>>>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
>>>>>> +
>>>>>> +
>>>>>> +OVN_CLEANUP([hv1],[hv2])
>>>>>> +AT_CLEANUP
>>>>>> +])
>>>>>> --
>>>>>> 2.39.2
>>>>>>
> >>>>>> _______________________________________________
> >>>>>> dev mailing list
> >>>>>> dev@openvswitch.org
> >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> dev@openvswitch.org
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>
>>
>
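For reference, a minimal ovn-nbctl sketch of the restricted topology Priyankar describes above (backends only on ls1, S1/S2 as gateway networks pinned to gw1/gw2). The logical switches and their router-type ports are omitted for brevity; all names, MAC addresses and next-hop IPs here are illustrative assumptions, not taken from the patch.

    ovn-nbctl lr-add LR
    ovn-nbctl lrp-add LR LR-ls1 00:00:00:00:01:01 40.0.0.1/24
    ovn-nbctl lrp-add LR LR-S1  00:00:00:00:01:02 10.0.0.1/24
    ovn-nbctl lrp-add LR LR-S2  00:00:00:00:01:03 20.0.0.1/24
    # Each gateway LRP is pinned to its own gateway chassis.
    ovn-nbctl lrp-set-gateway-chassis LR-S1 gw1
    ovn-nbctl lrp-set-gateway-chassis LR-S2 gw2
    # Both VIPs resolve to backends on ls1 only.
    ovn-nbctl lb-add lb1 10.0.0.40:80 40.0.0.10:80,40.0.0.20:80 tcp
    ovn-nbctl lb-add lb2 30.0.0.40:80 40.0.0.30:80,40.0.0.40:80 tcp
    ovn-nbctl lr-lb-add LR lb1
    ovn-nbctl lr-lb-add LR lb2
    # Routing keeps request and reply on the same gateway: 10.0.0.0/24
    # egresses via S1, everything else via S2 (the next hops are assumed
    # upstream routers on those networks).
    ovn-nbctl lr-route-add LR 10.0.0.0/24 10.0.0.254
    ovn-nbctl lr-route-add LR 0.0.0.0/0 20.0.0.254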
Numan Siddique Aug. 6, 2024, 9:03 p.m. UTC | #8
On Tue, Aug 6, 2024 at 4:42 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Hi I just wanted to weigh in.
>
> On 8/6/24 13:26, Roberto Bartzen Acosta via dev wrote:
> > Hi Priyankar,
> >
> > Em ter., 6 de ago. de 2024 às 12:35, Priyankar Jain <
> > priyankar.jain@nutanix.com> escreveu:
> >
> >> Hi,
> >>
> >> On 06/08/24 6:15 pm, Roberto Bartzen Acosta wrote:
> >>> Hi Priyankar,
> >>>
> >>> So, what is the difference between this patch and the one already
> >>> proposed [1]?
> >> Ah, I just got to know about this patch. It looks the same to me.
> >>> The change you are proposing has already been proposed before, and in
> >>> discussion with Mark we came to the conclusion that this breaks the DNAT
> >>> case when using multiple chassis. So, I pushed v4 [2][3] with a stateless
> >>> NAT implementation to solve this problem.
>
> I agree that this patch looks a lot like v1 of Roberto's series. I can't
> comment on v4 of the patch yet since I haven't had a chance to look at
> it yet.
>
> I have it on my list to get reviewed before the end of the week.
>
> >> I reviewed this patch. If we are using stateless NAT, does that mean the
> >> packet won't be passed through CT in the egress direction? If so, wouldn't
> >> this break the TCP state machine with respect to CT (with the CT table
> >> marking the packets as invalid)? Say the SYN gets committed to CT (by the
> >> ct_lb_mark action in the ingress pipeline); the SYN-ACK won't be committed
> >> since only the egress pipeline runs on the GW node, and then the ACK would
> >> be committed again. So are we sure that, with respect to sequence numbers
> >> and window scaling, we are safe regarding the TCP CT state machine here?
> >>
> >
> > My proposal does not affect LB egress, since the output from the backends
> > is still handled as part of the ls_in_lb pipeline, and the DGP egress flow
> > depends on how you configure the routing/external NAT (e.g. OVN-IC uses
> > direct routing for this).
> >
> > In addition, the v4 patch adds the ability to receive packets in the
> > lr_in_dnat pipeline via different chassis_resident matches and to deliver
> > the packet to ct_lb_mark, so that the CT state is processed in the LB
> > pipeline before the packet is delivered to the backend IPs (regardless of
> > the chassis that receives the traffic).
> >
> >
> >>
> >> Also, as I mentioned in my previous reply, our use case is much more
> >> restricted: routing ensures that both the ingress of the request (from
> >> outside to the LR) and the egress (for undnat) run on the gateway node.
> >>
>
> The problem with this is that there's nothing in the attached patch to
> enforce this restricted scenario. It's possible for people to shoot
> themselves in the foot.

In the current OVN main code, when a logical router with multiple DGPs is
configured with load balancers, ovn-northd logs a warning, but it still
generates logical flows for the VIPs, and all of the LB-related logical flows
in the router pipeline carry the match
"is_chassis_resident(<first_found_dgp_port_name>)". Looks like users can
shoot themselves in the foot anyway.

Maybe we should either not generate these logical flows at all, or just accept
this patch while documenting the limitations, i.e. let the user know that they
may shoot themselves in the foot.
@Mark Michelson, wdyt?

Thanks
Numan
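To make the difference concrete, a rough sketch of the DNAT flows involved, reusing the DR / cr-DR-S* names from the northd test in the patch (flow text abbreviated and illustrative only):

    ovn-sbctl dump-flows DR | grep lr_in_dnat | grep 20.0.0.10
    # Current main: a single flow, pinned to the first DGP found:
    #   ... ip4.dst == 20.0.0.10 ... && is_chassis_resident("cr-DR-S1") ...
    # With this patch: one flow per DGP:
    #   ... is_chassis_resident("cr-DR-S1") ...
    #   ... is_chassis_resident("cr-DR-S2") ...
    #   ... is_chassis_resident("cr-DR-S3") ...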


>
> >>
> > Have you tried applying the v4 patch to your use case? Considering the test
> > cases you wrote in your patch, the tests passed with my v4 version without
> > any issues. What do you think?
> >
> >
> > ## ------------------------ ##
> > ## ovn 24.03.90 test suite. ##
> > ## ------------------------ ##
> >
> > OVN end-to-end tests
> >
> > 533: Multiple DGP and LB traffic -- parallelization=yes -- ovn_monitor_all=yes ok
> > 534: Multiple DGP and LB traffic -- parallelization=yes -- ovn_monitor_all=no ok
> >
> >
> > Thanks,
> > Roberto
> >
> >
> >
> >> Thanks,
> >> Priyankar
> >>> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411981.html
> >>> [2] https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415977.html
> >>> [3] https://github.com/ovsrobot/ovn/commit/9379c81152df2bcd0626269988efb8ac36d4f624
> >>>
> >>> Thanks,
> >>> Roberto
> >>>
> >>> Em ter., 6 de ago. de 2024 às 07:57, Priyankar Jain <
> >>> priyankar.jain@nutanix.com> escreveu:
> >>>
> >>>> Hi,
> >>>> Please find my reply inline.
> >>>>
> >>>> Thanks,
> >>>> Priyankar
> >>>>
> >>>> On 31/07/24 4:46 am, Numan Siddique wrote:
> >>>>> On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
> >>>>> <priyankar.jain@nutanix.com> wrote:
> >>>>>> This commit removes the restriction of support LB for router with only
> >>>>>> <= 1 Distributed gateway ports.
> >>>>>> Added datapath and logical flows validation cases.
> >>>>>>
> >>>>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
> >>>>> Hi Priyankar,
> >>>>>
> >>>>> Thanks for the patch.
> >>>>>
> >>>>> Can you please tell me the use case ?
> >>>>>
> >>>>> Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
> >>>>> (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
> >>>>> gw ports are bound to chassis ch1, ch2 and ch3 respectively.
> >>>>>
> >>>>> Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
> >>>>> 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
> >>>>> lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
> >>>>>
> >>>>> If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
> >>>>> the chassis ch1 and so on for other switches.
> >>>>>
> >>>>> This patch generates the below logical flows in the router pipeline
> >>>>> for load balancer lb1, lb2 and lb3
> >>>>>
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw1")),
> >>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw2")),
> >>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw3")),
> >>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw1")),
> >>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw2")),
> >>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw3")),
> >>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw1")),
> >>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw2")),
> >>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw3")),
> >>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>>>
> >>>>> If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
> >>>>> then ideally  the routing will be handled in ch1 since lr1-sw1 is
> >>>>> bound on ch1.
> >>>>>
> >>>>> Now with the above logical flows,  load balancing could happen in the
> >>>>> source chassis itself. Is that ok ?
> >>>> In our use case, sw2 is not expected to handle load balancing for the
> >>>> 10.0.0.0/24 CIDR; it is expected to handle LB only for 20.0.0.0/24.
> >>>> (Note: the LB is not attached to the logical switch itself, since the
> >>>> same LS can be connected to multiple such LRs.)
> >>>>
> >>>>                     | S1 (gw1) [10.0.0.0/24]
> >>>>                     |
> >>>>          ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
> >>>>     (40.0.0.0/24)
> >>>>
> >>>> LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
> >>>> LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80
> >>>>
> >>>> Routes:
> >>>> 10.0.0.0/24 next-hop via S1
> >>>> default next-hop via S2
> >>>>
> >>>>
> >>>> In our use case, backends are only present on ls1;
> >>>> S1 and S2 do not have any backends. Similarly,
> >>>> S1 needs to handle traffic only for LB 10.0.0.40 and
> >>>> S2 needs to handle traffic only for LB 30.0.0.40.
> >>>>
> >>>> Hence, for a request coming in from S2, the ingress runs on gw2,
> >>>> and the egress (for the response) again runs on gw2 due to the
> >>>> routes above.
> >>>>
> >>>> The patch does not handle all the use cases, but the existing OVN
> >>>> implementation does not even program any LB rules. Hence, this patch
> >>>> achieves the above restricted use case.
> >>>> Let me know if I need to add some logs etc.
> >>>>
> >>>>> Shouldn't this patch generate the flows like this for lb1, lb2 and lb3?
> >>>>>
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw1")),
> >>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw2")),
> >>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
> >>>>>      table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
> >>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
> >>>>> is_chassis_resident("cr-lr0-sw3")),
> >>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
> >>>>>
> >>>>> So that the LB NAT happens on the chassis where the distributed
> >>>>> gateway port for the LB vip subnet resides ?
> >>>> The problem is how we would provide a mapping between LB VIP and DGP;
> >>>> the VIP need not belong to the subnet's CIDR.
> >>>>> Thanks
> >>>>> Numan
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> ---
> >>>>>>     northd/en-lr-stateful.c |  12 ---
> >>>>>>     northd/northd.c         |  12 +--
> >>>>>>     tests/ovn-northd.at     |  86 +++++++++++++++++++++
> >>>>>>     tests/ovn.at            | 167
> >>>> ++++++++++++++++++++++++++++++++++++++++
> >>>>>>     4 files changed, 260 insertions(+), 17 deletions(-)
> >>>>>>
> >>>>>> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
> >>>>>> index baf1bd2f8..f09691af6 100644
> >>>>>> --- a/northd/en-lr-stateful.c
> >>>>>> +++ b/northd/en-lr-stateful.c
> >>>>>> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct
> >> lr_stateful_table
> >>>> *table,
> >>>>>>         table->array[od->index] = lr_stateful_rec;
> >>>>>>
> >>>>>> -    /* Load balancers are not supported (yet) if a logical router has
> >>>> multiple
> >>>>>> -     * distributed gateway port.  Log a warning. */
> >>>>>> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od))
> >> {
> >>>>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
> >> 1);
> >>>>>> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
> >>>>>> -                     "router %s, which has %"PRIuSIZE" distributed "
> >>>>>> -                     "gateway ports. Load-balancer is not supported "
> >>>>>> -                     "yet when there is more than one distributed "
> >>>>>> -                     "gateway port on the router.",
> >>>>>> -                     od->nbr->name, od->n_l3dgw_ports);
> >>>>>> -    }
> >>>>>> -
> >>>>>>         return lr_stateful_rec;
> >>>>>>     }
> >>>>>>
> >>>>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>>>> index 6898daa00..e6f53f361 100644
> >>>>>> --- a/northd/northd.c
> >>>>>> +++ b/northd/northd.c
> >>>>>> @@ -11026,10 +11026,9 @@ static void
> >>>>>>     build_distr_lrouter_nat_flows_for_lb(struct
> >> lrouter_nat_lb_flows_ctx
> >>>> *ctx,
> >>>>>>                                          enum lrouter_nat_lb_flow_type
> >>>> type,
> >>>>>>                                          struct ovn_datapath *od,
> >>>>>> +                                     struct ovn_port *dgp,
> >>>>>>                                          struct lflow_ref *lflow_ref)
> >>>>>>     {
> >>>>>> -    struct ovn_port *dgp = od->l3dgw_ports[0];
> >>>>>> -
> >>>>>>         const char *undnat_action;
> >>>>>>
> >>>>>>         switch (type) {
> >>>>>> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct
> >>>> lrouter_nat_lb_flows_ctx *ctx,
> >>>>>>         if (ctx->lb_vip->n_backends ||
> >> !ctx->lb_vip->empty_backend_rej) {
> >>>>>>             ds_put_format(ctx->new_match, " &&
> >> is_chassis_resident(%s)",
> >>>>>> -                      od->l3dgw_ports[0]->cr_port->json_key);
> >>>>>> +                      dgp->cr_port->json_key);
> >>>>>>         }
> >>>>>>
> >>>>>>         ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT,
> >>>> ctx->prio,
> >>>>>> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
> >>>>>>             if (!od->n_l3dgw_ports) {
> >>>>>>                 bitmap_set1(gw_dp_bitmap[type], index);
> >>>>>>             } else {
> >>>>>> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> >>>>>> -                                                 lb_dps->lflow_ref);
> >>>>>> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
> >>>>>> +                struct ovn_port *dgp = od->l3dgw_ports[i];
> >>>>>> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
> >>>> dgp,
> >>>>>> +
> >>>>    lb_dps->lflow_ref);
> >>>>>> +            }
> >>>>>>             }
> >>>>>>
> >>>>>>             if (lb->affinity_timeout) {
> >>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >>>>>> index a389d1988..5be48f49e 100644
> >>>>>> --- a/tests/ovn-northd.at
> >>>>>> +++ b/tests/ovn-northd.at
> >>>>>> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep
> >>>> lr_in_dnat | ovn_strip_lflows], [0], [d
> >>>>>>     AT_CLEANUP
> >>>>>>     ])
> >>>>>> +
> >>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([
> >>>>>> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
> >>>>>> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
> >>>>>> +ovn_start
> >>>>>> +
> >>>>>> +# Logical network:
> >>>>>> +# 1 Logical Router, 3 bridged Logical Switches,
> >>>>>> +# 1 gateway chassis attached to each corresponding LRP.
> >>>>>> +# LB added attached to DR
> >>>>>> +#
> >>>>>> +#                | S1 (gw1)
> >>>>>> +#                |
> >>>>>> +#      ls  ----  DR -- S3 (gw3)
> >>>>>> +# (20.0.0.0/24)  |
> >>>>>> +#                | S2 (gw2)
> >>>>>> +#
> >>>>>> +# Validate basic LB logical flows.
> >>>>>> +
> >>>>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
> >>>>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
> >>>>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
> >>>>>> +
> >>>>>> +check ovn-nbctl lr-add DR
> >>>>>> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
> >>>>>> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
> >>>>>> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
> >>>>>> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add S1
> >>>>>> +check ovn-nbctl lsp-add S1 S1-DR
> >>>>>> +check ovn-nbctl lsp-set-type S1-DR router
> >>>>>> +check ovn-nbctl lsp-set-addresses S1-DR router
> >>>>>> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add S2
> >>>>>> +check ovn-nbctl lsp-add S2 S2-DR
> >>>>>> +check ovn-nbctl lsp-set-type S2-DR router
> >>>>>> +check ovn-nbctl lsp-set-addresses S2-DR router
> >>>>>> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add S3
> >>>>>> +check ovn-nbctl lsp-add S3 S3-DR
> >>>>>> +check ovn-nbctl lsp-set-type S3-DR router
> >>>>>> +check ovn-nbctl lsp-set-addresses S3-DR router
> >>>>>> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add  ls
> >>>>>> +check ovn-nbctl lsp-add ls ls-DR
> >>>>>> +check ovn-nbctl lsp-set-type ls-DR router
> >>>>>> +check ovn-nbctl lsp-set-addresses ls-DR router
> >>>>>> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
> >>>>>> +
> >>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
> >>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
> >>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
> >>>>>> +
> >>>>>> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
> >>>>>> +check ovn-nbctl lr-lb-add DR lb-1
> >>>>>> +
> >>>>>> +check ovn-nbctl --wait=sb sync
> >>>>>> +
> >>>>>> +ovn-sbctl dump-flows DR > lrflows
> >>>>>> +AT_CAPTURE_FILE([lrflows])
> >>>>>> +
> >>>>>> +# Check the flows in lr_in_dnat stage
> >>>>>> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR |
> >>>> ovn_strip_lflows], [0], [dnl
> >>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> >>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> >>>> is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80
> >>>> ,20.0.0.9:80);)
> >>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> >>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> >>>> is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80
> >>>> ,20.0.0.9:80);)
> >>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
> >>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
> >>>> is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80
> >>>> ,20.0.0.9:80);)
> >>>>>> +])
> >>>>>> +# Check the flows in lr_in_gw_redirect stage
> >>>>>> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep
> >>>> cr-DR | ovn_strip_lflows], [0], [dnl
> >>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> >>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> >> tcp.src
> >>>> == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
> >>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> >>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> >> tcp.src
> >>>> == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
> >>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
> >>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> >> tcp.src
> >>>> == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
> >>>>>> +])
> >>>>>> +# Check the flows in lr_out_undnat stage
> >>>>>> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR
> >>>> | ovn_strip_lflows], [0], [dnl
> >>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> >>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> >> tcp.src
> >>>> == 80)) && (inport == "DR-S1" || outport == "DR-S1") &&
> >>>> is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
> >>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> >>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> >> tcp.src
> >>>> == 80)) && (inport == "DR-S2" || outport == "DR-S2") &&
> >>>> is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
> >>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
> >>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
> >> tcp.src
> >>>> == 80)) && (inport == "DR-S3" || outport == "DR-S3") &&
> >>>> is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
> >>>>>> +])
> >>>>>> +
> >>>>>> +AT_CLEANUP
> >>>>>> +])
> >>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>>>>> index 185ba4a21..8e8c102c0 100644
> >>>>>> --- a/tests/ovn.at
> >>>>>> +++ b/tests/ovn.at
> >>>>>> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
> >>>>>>
> >>>>>>     AT_CLEANUP
> >>>>>>     ])
> >>>>>> +
> >>>>>> +OVN_FOR_EACH_NORTHD([
> >>>>>> +AT_SETUP([Multiple DGP and LB traffic])
> >>>>>> +AT_KEYWORDS([dgp-lb])
> >>>>>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
> >>>>>> +ovn_start
> >>>>>> +
> >>>>>> +# Logical network:
> >>>>>> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
> >>>>>> +# 1 gateway chassis attached to each corresponding LRP.
> >>>>>> +# LB added attached to DR
> >>>>>> +#
> >>>>>> +#                | public (gw1) (172.168.0.0/24)
> >>>>>> +#                |
> >>>>>> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
> >>>>>> +# (10.0.0.0/24)
> >>>>>> +#
> >>>>>> +# Routes (lr0):
> >>>>>> +#
> >>>>>> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
> >>>>>> +#   default      ----> 172.168.0.1 (public)
> >>>>>> +#
> >>>>>> +#
> >>>>>> +# Validate Traffic from public to LB and its response.
> >>>>>> +# Validate traffic from public2 to LB and its response.
> >>>>>> +
> >>>>>> +test_ip_req_packet() {
> >>>>>> +    local src_mac="$1"
> >>>>>> +    local dst_mac="$2"
> >>>>>> +    local src_ip="$3"
> >>>>>> +    local dst_ip="$4"
> >>>>>> +    local sport=$5
> >>>>>> +    local iface=$6
> >>>>>> +
> >>>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
> >> src='${src_mac}')/
> >>>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> >>>>>> +                    UDP(sport=${sport}, dport=4369)")
> >>>>>> +
> >>>>>> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
> >>>>>> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
> >>>>>> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
> >>>>>> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
> >>>>>> +}
> >>>>>> +
> >>>>>> +test_ip_rep_packet() {
> >>>>>> +    local src_mac="$1"
> >>>>>> +    local dst_mac="$2"
> >>>>>> +    local src_ip="$3"
> >>>>>> +    local dst_ip="$4"
> >>>>>> +    local dport=$5
> >>>>>> +
> >>>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
> >> src='${src_mac}')/
> >>>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
> >>>>>> +                    UDP(sport=4369, dport=${dport})")
> >>>>>> +
> >>>>>> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> >>>>>> +}
> >>>>>> +
> >>>>>> +net_add n
> >>>>>> +
> >>>>>> +sim_add hv1
> >>>>>> +as hv1
> >>>>>> +check ovs-vsctl add-br br-phys
> >>>>>> +ovn_attach n br-phys 192.168.0.1
> >>>>>> +check ovs-vsctl set open .
> >>>> external-ids:ovn-bridge-mappings=phys:br-phys
> >>>>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >>>>>> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
> >>>>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
> >>>>>> +    options:rxq_pcap=hv1/vif1-rx.pcap
> >>>>>> +
> >>>>>> +sim_add hv2
> >>>>>> +as hv2
> >>>>>> +check ovs-vsctl add-br br-phys
> >>>>>> +ovn_attach n br-phys 192.168.0.2
> >>>>>> +check ovs-vsctl set open .
> >>>> external-ids:ovn-bridge-mappings=phys:br-phys
> >>>>>> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
> >>>>>> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
> >>>>>> +    options:tx_pcap=hv2/vif1-tx.pcap \
> >>>>>> +    options:rxq_pcap=hv2/vif1-rx.pcap
> >>>>>> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
> >>>>>> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
> >>>>>> +    options:tx_pcap=hv2/vif2-tx.pcap \
> >>>>>> +    options:rxq_pcap=hv2/vif2-rx.pcap
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add sw0
> >>>>>> +check ovn-nbctl lsp-add sw0 sw0-port1
> >>>>>> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01
> >>>> 10.0.0.2"
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add public
> >>>>>> +check ovn-nbctl lsp-add public ln-public
> >>>>>> +check ovn-nbctl lsp-set-type ln-public localnet
> >>>>>> +check ovn-nbctl lsp-set-addresses ln-public unknown
> >>>>>> +check ovn-nbctl lsp-set-options ln-public network_name=phys
> >>>>>> +check ovn-nbctl lsp-add public public-port1
> >>>>>> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88
> >>>> 172.168.0.200"
> >>>>>> +
> >>>>>> +check ovn-nbctl ls-add public2
> >>>>>> +check ovn-nbctl lsp-add public2 ln-public2
> >>>>>> +check ovn-nbctl lsp-set-type ln-public2 localnet
> >>>>>> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
> >>>>>> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
> >>>>>> +check ovn-nbctl lsp-add public2 public2-port1
> >>>>>> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99
> >>>> 173.168.0.200"
> >>>>>> +
> >>>>>> +check ovn-nbctl lr-add lr0
> >>>>>> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> >>>>>> +check ovn-nbctl lsp-add sw0 sw0-lr0
> >>>>>> +check ovn-nbctl lsp-set-type sw0-lr0 router
> >>>>>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
> >>>>>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
> >>>>>> +
> >>>>>> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13
> >>>> 172.168.0.1/24
> >>>>>> +check ovn-nbctl lsp-add public public-lr0
> >>>>>> +check ovn-nbctl lsp-set-type public-lr0 router
> >>>>>> +check ovn-nbctl lsp-set-addresses public-lr0 router
> >>>>>> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
> >>>>>> +
> >>>>>> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14
> >>>> 173.168.0.1/24
> >>>>>> +check ovn-nbctl lsp-add public2 public2-lr0
> >>>>>> +check ovn-nbctl lsp-set-type public2-lr0 router
> >>>>>> +check ovn-nbctl lsp-set-addresses public2-lr0 router
> >>>>>> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
> >>>>>> +
> >>>>>> +
> >>>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
> >>>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
> >>>>>> +
> >>>>>> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
> >>>>>> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
> >>>>>> +
> >>>>>> +wait_for_ports_up
> >>>>>> +OVN_POPULATE_ARP
> >>>>>> +
> >>>>>> +
> >>>>>> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
> >>>>>> +check ovn-nbctl lr-lb-add lr0 lb0
> >>>>>> +
> >>>>>> +ovn-sbctl lflow-list lr0
> >>>>>> +
> >>>>>> +# send UDP request to the load-balancer VIP from public switch 1
> >>>>>> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13"
> >>>> "172.168.0.200" "172.168.0.10" 53 hv2-vif1
> >>>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
> >>>> hv1/vif1-tx.pcap | wc -l) -ge 1])
> >>>>>> +# send UDP reply from sw0-port1
> >>>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
> >>>> "172.168.0.200" 53
> >>>>>> +# packet sent by the load balancer VIP
> >>>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88',
> >>>> src='00:00:20:20:12:13')/
> >>>>>> +                      IP(dst='172.168.0.200', src='172.168.0.10',
> >>>> ttl=63)/ \
> >>>>>> +                      UDP(sport=4369, dport=53)")
> >>>>>> +echo $packet > expected
> >>>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
> >>>>>> +
> >>>>>> +# send UDP request to the load-balancer VIP from public switch 2
> >>>>>> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14"
> >>>> "173.168.0.200" "172.168.0.10" 54 hv2-vif2
> >>>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
> >>>> hv1/vif1-tx.pcap | wc -l) -ge 1])
> >>>>>> +# send UDP reply from sw0-port1
> >>>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
> >>>> "173.168.0.200" 54
> >>>>>> +# packet sent by the load balancer VIP
> >>>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99',
> >>>> src='00:00:20:20:12:14')/
> >>>>>> +                      IP(dst='173.168.0.200', src='172.168.0.10',
> >>>> ttl=63)/ \
> >>>>>> +                      UDP(sport=4369, dport=54)")
> >>>>>> +echo $packet > expected
> >>>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
> >>>>>> +
> >>>>>> +
> >>>>>> +OVN_CLEANUP([hv1],[hv2])
> >>>>>> +AT_CLEANUP
> >>>>>> +])
> >>>>>> --
> >>>>>> 2.39.2
> >>>>>>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Priyankar Jain Aug. 13, 2024, 5:09 p.m. UTC | #9
On 07/08/24 2:33 am, Numan Siddique wrote:
>
> On Tue, Aug 6, 2024 at 4:42 PM Mark Michelson <mmichels@redhat.com> wrote:
>> Hi I just wanted to weigh in.
>>
>> On 8/6/24 13:26, Roberto Bartzen Acosta via dev wrote:
>>> Hi Priyankar,
>>>
> >>> On Tue, Aug 6, 2024 at 12:35, Priyankar Jain <
> >>> priyankar.jain@nutanix.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 06/08/24 6:15 pm, Roberto Bartzen Acosta wrote:
>>>>> Hi Priyankar,
>>>>>
>>>>> So, what is the difference between the patch and the one already proposed
>>>>> [1]?
> >>>> Ah, I just got to know about this patch. It looks the same to me.
> >>>>> The change you are proposing has already been proposed before, and in
> >>>>> discussion with Mark we came to the conclusion that this breaks the DNAT
> >>>>> case when using multiple chassis. So, I pushed the v4 [2][3] with
> >>>>> Stateless NAT implementation to solve this problem.
>> I agree that this patch looks a lot like v1 of Roberto's series. I can't
>> comment on v4 of the patch yet since I haven't had a chance to look at
>> it yet.
>>
>> I have it on my list to get reviewed before the end of the week.
>>
> >>>> I reviewed this patch. If we are using stateless NAT, does that mean the
> >>>> packet won't be passed through CT in the egress direction? If so,
> >>>> wouldn't this cause the TCP state machine to break wrt CT (and the CT
> >>>> table marking the packets as invalid)? Say SYN gets committed to CT
> >>>> (ingress pipeline ct_lb_mark action); the SYN-ACK won't be committed, as
> >>>> only egress will run on the GW node. Then the ACK would again be
> >>>> committed. So are we sure, wrt sequence numbers/window scaling, that we
> >>>> are safe wrt the TCP CT state machine here?
>>>>
>>> My proposal does not affect LB egress since the output from the backends is
>>> still done integrated with the ls_in_lb pipeline, and the DGP egress flow
>>> depends on how you configure the routing/external NAT (e.g. OVN-IC uses
>>> direct routing for this).
>>>
>>> In addition, the v4 patch adds the ability to receive packets in the
>>> lr_in_dnat pipeline via different chassis_resident and, additionally,
>>> deliver the packet to ct_lb_mark to process the CT state in the LB pipeline
>>> before delivering it to the backend IPs (regardless of the chassis that
>>> receives the traffic).
>>>
>>>
> >>>> Also, as I mentioned in my previous reply, our use case is much more
> >>>> restricted, and routing ensures that ingress during the request (from
> >>>> outside to the LR) and egress (for undnat) run on the gw node.
>>>>
> >> The problem with this is that there's nothing in the attached patch to
> >> enforce this restricted scenario. It's possible for people to shoot
> >> themselves in the foot.
> In the current OVN main code, when a logical router with multiple DGPs is
> configured with load balancers, ovn-northd logs a warning, but it also
> generates logical flows for the VIPs, and all generated logical flows
> related to the LBs in the router pipeline have a match -
> "is_chassis_resident(<first_found_dgp_port_name>)". Looks like they are
> shooting themselves in the foot anyway.
>
> Maybe we should not generate these logical flows OR we just accept this patch
> documenting the limitations.  Let the user know that they may shoot
> themselves in the foot.
> @Mark Michelson, wdyt?
>
> Thanks
> Numan
>

Hi, can you please let me know the next steps here? Is there any other
direction I should consider for solving the above use case?
In case we want to accept this patch, we can document the use cases where
it would work and where it won't.
Also, I am not very sure about the stateless unDNAT for the reply direction
that was introduced in the other patch series, and its implications on
conntrack.
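
For reference, this is the stateful reply-direction flow this patch keeps
(copied from the lr_out_undnat expectations in the ovn-northd.at test below);
the question above is what happens to the conntrack state machine if this
ct_dnat is replaced by a stateless rewrite:

  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S1" || outport == "DR-S1") && is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)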

Regards,
Priyankar

> >>> Have you ever applied the v4 patch to your use case? Considering the test
> >>> cases you wrote in your patch, the tests passed with my v4 version without
> >>> any issues. What do you think?
>>>
>>>
>>> ## ------------------------ ##
>>> ## ovn 24.03.90 test suite. ##
>>> ## ------------------------ ##
>>>
>>> OVN end-to-end tests
>>>
>>> 533: Multiple DGP and LB traffic -- parallelization=yes --
>>> ovn_monitor_all=yes ok
>>> 534: Multiple DGP and LB traffic -- parallelization=yes --
>>> ovn_monitor_all=no ok
>>>
>>>
>>> Thanks,
>>> Roberto
>>>
>>>
>>>
>>>> Thanks,
>>>> Priyankar
>>>>> [1]
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DFebruary_411981.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=JR1CKXLiNrDGVt85mXYTutzBqhpCpiABOutl2TMPWEE&e=
>>>>> [2]
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2024-2DJuly_415977.html&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=QA8-pREOG2X_f-Hk8zDEeXxyV6jOoNx3pRxuKAIhsCk&e=
>>>>> [3]
>>>>>
>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ovsrobot_ovn_commit_9379c81152df2bcd0626269988efb8ac36d4f624&d=DwIFaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=WfKrYrekTgMkBBiSsurQ7bYA31uPfqnE-Ybpq06XApWiO9Ij2MlCi8dO02I-kZf0&s=axc06cz9XfMSiEKgrD6QmmoYwrBVxBnm1T2gEx-_byY&e=
>>>>> Thanks,
>>>>> Roberto
>>>>>
> >>>>> On Tue, Aug 6, 2024 at 07:57, Priyankar Jain <
> >>>>> priyankar.jain@nutanix.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Please find my reply inline.
>>>>>>
>>>>>> Thanks,
>>>>>> Priyankar
>>>>>>
>>>>>> On 31/07/24 4:46 am, Numan Siddique wrote:
>>>>>>> On Wed, Jul 17, 2024 at 5:04 AM Priyankar Jain
>>>>>>> <priyankar.jain@nutanix.com> wrote:
>>>>>>>> This commit removes the restriction of support LB for router with only
>>>>>>>> <= 1 Distributed gateway ports.
>>>>>>>> Added datapath and logical flows validation cases.
>>>>>>>>
>>>>>>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
>>>>>>> Hi Priyankar,
>>>>>>>
>>>>>>> Thanks for the patch.
>>>>>>>
>>>>>>> Can you please tell me the use case ?
>>>>>>>
>>>>>>> Lets say a logical router lr1 has sw1 (10.0.0.0/24) , sw2
>>>>>>> (20.0.0.0/24) and sw3 (30.0.0.0/24) and the corresponding
>>>>>>> gw ports are bound to chassis ch1, ch2 and ch3 respectively.
>>>>>>>
>>>>>>> Let's say we attach load balancers - lb1 (10.0.0.40:80 ->
>>>>>>> 20.0.0.40:80), lb2 (20.0.0.50:80 -> 30.0.0.50:80) and
>>>>>>> lb3 (30.0.0.60:80 -> 10.0.0.60:80) to lr1.
>>>>>>>
>>>>>>> If I understand correctly, routing for sw1 (10.0.0.0/24) is handled on
>>>>>>> the chassis ch1 and so on for other switches.
>>>>>>>
>>>>>>> This patch generates the below logical flows in the router pipeline
>>>>>>> for load balancer lb1, lb2 and lb3
>>>>>>>
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>>>
>>>>>>> If a logical port of sw2 (20.0.0.4) sends any packet to 10.0.0.0/24,
>>>>>>> then ideally  the routing will be handled in ch1 since lr1-sw1 is
>>>>>>> bound on ch1.
>>>>>>>
>>>>>>> Now with the above logical flows,  load balancing could happen in the
>>>>>>> source chassis itself. Is that ok ?
> >>>>>> In our use case, sw2 is not expected to handle load balancing for the
> >>>>>> 10.0.0.0/24 CIDR; it is expected to handle LB only for 20.0.0.0/24.
> >>>>>> (Note: the LB is not attached to the logical switch itself, since the
> >>>>>> same LS can be connected to multiple such LRs.)
>>>>>>
>>>>>>                      | S1 (gw1) [10.0.0.0/24]
>>>>>>                      |
>>>>>>           ls1  ----  LR -- S2 (gw2) [20.0.0.0/24]
>>>>>>      (40.0.0.0/24)
>>>>>>
>>>>>> LB1: 10.0.0.40:80 --> 40.0.0.10:80, 40.0.0.20:80
>>>>>> LB2: 30.0.0.40:80 --> 40.0.0.30:80, 40.0.0.40:80
>>>>>>
>>>>>> Routes:
>>>>>> 10.0.0.0/24 next-hop via S1
>>>>>> default next-hop via S2
>>>>>>
>>>>>>
> >>>>>> In our use case, backends are only present on ls1.
> >>>>>> S1 and S2 do not have any backends. Similarly,
> >>>>>> S1 needs to handle the traffic only for LB 10.0.0.40 and
> >>>>>> S2 needs to handle the traffic only for LB 30.0.0.40.
> >>>>>>
> >>>>>> Hence, ingress coming from S2 happens on gw2 during the
> >>>>>> request, and egress again happens on gw2 (due to the
> >>>>>> above routes) for the response.
> >>>>>>
> >>>>>> Although the patch does not handle all the use cases, with the
> >>>>>> existing OVN implementation no LB rules are programmed at all, so
> >>>>>> this patch at least achieves the above restricted use case.
> >>>>>> Let me know if I need to add some logs etc.
>>>>>>
>>>>>>> Shouldn't this patch generate the flows like this for lb1, lb2 and lb3
>>>> ?
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 10.0.0.40 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw1")),
>>>>>>> action=(ct_lb_mark(backends=20.0.0.40:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.50 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw2")),
>>>>>>> action=(ct_lb_mark(backends=30.0.0.50:80);)
>>>>>>>       table=8 (lr_in_dnat         ), priority=110  , match=(ct.new &&
>>>>>>> !ct.rel && ip4 && ip4.dst == 30.0.0.60 && tcp && tcp.dst == 80 &&
>>>>>>> is_chassis_resident("cr-lr0-sw3")),
>>>>>>> action=(ct_lb_mark(backends=10.0.0.60:80);)
>>>>>>>
>>>>>>> So that the LB NAT happens on the chassis where the distributed
>>>>>>> gateway port for the LB vip subnet resides ?
> >>>>>> The problem is how do we provide a mapping between the LB VIP and a
> >>>>>> DGP? The VIP need not belong to the subnet's CIDR.
>>>>>>> Thanks
>>>>>>> Numan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> ---
>>>>>>>>      northd/en-lr-stateful.c |  12 ---
>>>>>>>>      northd/northd.c         |  12 +--
>>>>>>>>      tests/ovn-northd.at     |  86 +++++++++++++++++++++
>>>>>>>>      tests/ovn.at            | 167
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>>>      4 files changed, 260 insertions(+), 17 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
>>>>>>>> index baf1bd2f8..f09691af6 100644
>>>>>>>> --- a/northd/en-lr-stateful.c
>>>>>>>> +++ b/northd/en-lr-stateful.c
>>>>>>>> @@ -516,18 +516,6 @@ lr_stateful_record_create(struct
>>>> lr_stateful_table
>>>>>> *table,
>>>>>>>>          table->array[od->index] = lr_stateful_rec;
>>>>>>>>
>>>>>>>> -    /* Load balancers are not supported (yet) if a logical router has
>>>>>> multiple
>>>>>>>> -     * distributed gateway port.  Log a warning. */
>>>>>>>> -    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od))
>>>> {
>>>>>>>> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
>>>> 1);
>>>>>>>> -        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
>>>>>>>> -                     "router %s, which has %"PRIuSIZE" distributed "
>>>>>>>> -                     "gateway ports. Load-balancer is not supported "
>>>>>>>> -                     "yet when there is more than one distributed "
>>>>>>>> -                     "gateway port on the router.",
>>>>>>>> -                     od->nbr->name, od->n_l3dgw_ports);
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>>          return lr_stateful_rec;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>>> index 6898daa00..e6f53f361 100644
>>>>>>>> --- a/northd/northd.c
>>>>>>>> +++ b/northd/northd.c
>>>>>>>> @@ -11026,10 +11026,9 @@ static void
>>>>>>>>      build_distr_lrouter_nat_flows_for_lb(struct
>>>> lrouter_nat_lb_flows_ctx
>>>>>> *ctx,
>>>>>>>>                                           enum lrouter_nat_lb_flow_type
>>>>>> type,
>>>>>>>>                                           struct ovn_datapath *od,
>>>>>>>> +                                     struct ovn_port *dgp,
>>>>>>>>                                           struct lflow_ref *lflow_ref)
>>>>>>>>      {
>>>>>>>> -    struct ovn_port *dgp = od->l3dgw_ports[0];
>>>>>>>> -
>>>>>>>>          const char *undnat_action;
>>>>>>>>
>>>>>>>>          switch (type) {
>>>>>>>> @@ -11060,7 +11059,7 @@ build_distr_lrouter_nat_flows_for_lb(struct
>>>>>> lrouter_nat_lb_flows_ctx *ctx,
>>>>>>>>          if (ctx->lb_vip->n_backends ||
>>>> !ctx->lb_vip->empty_backend_rej) {
>>>>>>>>              ds_put_format(ctx->new_match, " &&
>>>> is_chassis_resident(%s)",
>>>>>>>> -                      od->l3dgw_ports[0]->cr_port->json_key);
>>>>>>>> +                      dgp->cr_port->json_key);
>>>>>>>>          }
>>>>>>>>
>>>>>>>>          ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT,
>>>>>> ctx->prio,
>>>>>>>> @@ -11263,8 +11262,11 @@ build_lrouter_nat_flows_for_lb(
>>>>>>>>              if (!od->n_l3dgw_ports) {
>>>>>>>>                  bitmap_set1(gw_dp_bitmap[type], index);
>>>>>>>>              } else {
>>>>>>>> -            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>>>>>>>> -                                                 lb_dps->lflow_ref);
>>>>>>>> +            for (int i = 0; i < od->n_l3dgw_ports; i++) {
>>>>>>>> +                struct ovn_port *dgp = od->l3dgw_ports[i];
>>>>>>>> +                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
>>>>>> dgp,
>>>>>>>> +
>>>>>>     lb_dps->lflow_ref);
>>>>>>>> +            }
>>>>>>>>              }
>>>>>>>>
>>>>>>>>              if (lb->affinity_timeout) {
>>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>>>>>>> index a389d1988..5be48f49e 100644
>>>>>>>> --- a/tests/ovn-northd.at
>>>>>>>> +++ b/tests/ovn-northd.at
>>>>>>>> @@ -12721,3 +12721,89 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep
>>>>>> lr_in_dnat | ovn_strip_lflows], [0], [d
>>>>>>>>      AT_CLEANUP
>>>>>>>>      ])
>>>>>>>> +
>>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([
>>>>>>>> +AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
>>>>>>>> +AT_KEYWORDS([lb-multiple-l3dgw-ports])
>>>>>>>> +ovn_start
>>>>>>>> +
>>>>>>>> +# Logical network:
>>>>>>>> +# 1 Logical Router, 3 bridged Logical Switches,
>>>>>>>> +# 1 gateway chassis attached to each corresponding LRP.
>>>>>>>> +# LB added attached to DR
>>>>>>>> +#
>>>>>>>> +#                | S1 (gw1)
>>>>>>>> +#                |
>>>>>>>> +#      ls  ----  DR -- S3 (gw3)
>>>>>>>> +# (20.0.0.0/24)  |
>>>>>>>> +#                | S2 (gw2)
>>>>>>>> +#
>>>>>>>> +# Validate basic LB logical flows.
>>>>>>>> +
>>>>>>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>>>>>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>>>>>>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lr-add DR
>>>>>>>> +check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>>>>>>> +check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
>>>>>>>> +check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
>>>>>>>> +check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add S1
>>>>>>>> +check ovn-nbctl lsp-add S1 S1-DR
>>>>>>>> +check ovn-nbctl lsp-set-type S1-DR router
>>>>>>>> +check ovn-nbctl lsp-set-addresses S1-DR router
>>>>>>>> +check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add S2
>>>>>>>> +check ovn-nbctl lsp-add S2 S2-DR
>>>>>>>> +check ovn-nbctl lsp-set-type S2-DR router
>>>>>>>> +check ovn-nbctl lsp-set-addresses S2-DR router
>>>>>>>> +check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add S3
>>>>>>>> +check ovn-nbctl lsp-add S3 S3-DR
>>>>>>>> +check ovn-nbctl lsp-set-type S3-DR router
>>>>>>>> +check ovn-nbctl lsp-set-addresses S3-DR router
>>>>>>>> +check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add  ls
>>>>>>>> +check ovn-nbctl lsp-add ls ls-DR
>>>>>>>> +check ovn-nbctl lsp-set-type ls-DR router
>>>>>>>> +check ovn-nbctl lsp-set-addresses ls-DR router
>>>>>>>> +check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
>>>>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
>>>>>>>> +check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
>>>>>>>> +check ovn-nbctl lr-lb-add DR lb-1
>>>>>>>> +
>>>>>>>> +check ovn-nbctl --wait=sb sync
>>>>>>>> +
>>>>>>>> +ovn-sbctl dump-flows DR > lrflows
>>>>>>>> +AT_CAPTURE_FILE([lrflows])
>>>>>>>> +
>>>>>>>> +# Check the flows in lr_in_dnat stage
>>>>>>>> +AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR |
>>>>>> ovn_strip_lflows], [0], [dnl
>>>>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>>>>>> is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80
>>>>>> ,20.0.0.9:80);)
>>>>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>>>>>> is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80
>>>>>> ,20.0.0.9:80);)
>>>>>>>> +  table=??(lr_in_dnat         ), priority=120  , match=(ct.new &&
>>>>>> !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 &&
>>>>>> is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80
>>>>>> ,20.0.0.9:80);)
>>>>>>>> +])
>>>>>>>> +# Check the flows in lr_in_gw_redirect stage
>>>>>>>> +AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep
>>>>>> cr-DR | ovn_strip_lflows], [0], [dnl
>>>>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>>>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>>>> tcp.src
>>>>>> == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
>>>>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>>>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>>>> tcp.src
>>>>>> == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
>>>>>>>> +  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 &&
>>>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>>>> tcp.src
>>>>>> == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
>>>>>>>> +])
>>>>>>>> +# Check the flows in lr_out_undnat stage
>>>>>>>> +AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR
>>>>>> | ovn_strip_lflows], [0], [dnl
>>>>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>>>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>>>> tcp.src
>>>>>> == 80)) && (inport == "DR-S1" || outport == "DR-S1") &&
>>>>>> is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
>>>>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>>>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>>>> tcp.src
>>>>>> == 80)) && (inport == "DR-S2" || outport == "DR-S2") &&
>>>>>> is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
>>>>>>>> +  table=??(lr_out_undnat      ), priority=120  , match=(ip4 &&
>>>>>> ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 &&
>>>> tcp.src
>>>>>> == 80)) && (inport == "DR-S3" || outport == "DR-S3") &&
>>>>>> is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
>>>>>>>> +])
>>>>>>>> +
>>>>>>>> +AT_CLEANUP
>>>>>>>> +])
>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>>>> index 185ba4a21..8e8c102c0 100644
>>>>>>>> --- a/tests/ovn.at
>>>>>>>> +++ b/tests/ovn.at
>>>>>>>> @@ -38426,3 +38426,170 @@ OVN_CLEANUP([hv1],[hv2])
>>>>>>>>
>>>>>>>>      AT_CLEANUP
>>>>>>>>      ])
>>>>>>>> +
>>>>>>>> +OVN_FOR_EACH_NORTHD([
>>>>>>>> +AT_SETUP([Multiple DGP and LB traffic])
>>>>>>>> +AT_KEYWORDS([dgp-lb])
>>>>>>>> +AT_SKIP_IF([test $HAVE_SCAPY = no])
>>>>>>>> +ovn_start
>>>>>>>> +
>>>>>>>> +# Logical network:
>>>>>>>> +# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
>>>>>>>> +# 1 gateway chassis attached to each corresponding LRP.
>>>>>>>> +# LB added attached to DR
>>>>>>>> +#
>>>>>>>> +#                | public (gw1) (172.168.0.0/24)
>>>>>>>> +#                |
>>>>>>>> +#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0./24)
>>>>>>>> +# (10.0.0.0/24)
>>>>>>>> +#
>>>>>>>> +# Routes (lr0):
>>>>>>>> +#
>>>>>>>> +#   173.0.0.0/24 ----> 173.168.0.1 (public2)
>>>>>>>> +#   default      ----> 172.168.0.1 (public)
>>>>>>>> +#
>>>>>>>> +#
>>>>>>>> +# Validate Traffic from public to LB and its response.
>>>>>>>> +# Validate traffic from public2 to LB and its response.
>>>>>>>> +
>>>>>>>> +test_ip_req_packet() {
>>>>>>>> +    local src_mac="$1"
>>>>>>>> +    local dst_mac="$2"
>>>>>>>> +    local src_ip="$3"
>>>>>>>> +    local dst_ip="$4"
>>>>>>>> +    local sport=$5
>>>>>>>> +    local iface=$6
>>>>>>>> +
>>>>>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
>>>> src='${src_mac}')/
>>>>>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>>>>>>>> +                    UDP(sport=${sport}, dport=4369)")
>>>>>>>> +
>>>>>>>> +    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
>>>>>>>> +    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
>>>>>>>> +    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
>>>>>>>> +    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +test_ip_rep_packet() {
>>>>>>>> +    local src_mac="$1"
>>>>>>>> +    local dst_mac="$2"
>>>>>>>> +    local src_ip="$3"
>>>>>>>> +    local dst_ip="$4"
>>>>>>>> +    local dport=$5
>>>>>>>> +
>>>>>>>> +    local packet=$(fmt_pkt "Ether(dst='${dst_mac}',
>>>> src='${src_mac}')/
>>>>>>>> +                    IP(dst='${dst_ip}', src='${src_ip}')/ \
>>>>>>>> +                    UDP(sport=4369, dport=${dport})")
>>>>>>>> +
>>>>>>>> +    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +net_add n
>>>>>>>> +
>>>>>>>> +sim_add hv1
>>>>>>>> +as hv1
>>>>>>>> +check ovs-vsctl add-br br-phys
>>>>>>>> +ovn_attach n br-phys 192.168.0.1
>>>>>>>> +check ovs-vsctl set open .
>>>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>>>>>>> +    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
>>>>>>>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>>>>>>>> +    options:rxq_pcap=hv1/vif1-rx.pcap
>>>>>>>> +
>>>>>>>> +sim_add hv2
>>>>>>>> +as hv2
>>>>>>>> +check ovs-vsctl add-br br-phys
>>>>>>>> +ovn_attach n br-phys 192.168.0.2
>>>>>>>> +check ovs-vsctl set open .
>>>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>>>> +check ovs-vsctl -- add-port br-int hv2-vif1 -- \
>>>>>>>> +    set interface hv2-vif1 external-ids:iface-id=public-port1 \
>>>>>>>> +    options:tx_pcap=hv2/vif1-tx.pcap \
>>>>>>>> +    options:rxq_pcap=hv2/vif1-rx.pcap
>>>>>>>> +check ovs-vsctl -- add-port br-int hv2-vif2 -- \
>>>>>>>> +    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
>>>>>>>> +    options:tx_pcap=hv2/vif2-tx.pcap \
>>>>>>>> +    options:rxq_pcap=hv2/vif2-rx.pcap
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add sw0
>>>>>>>> +check ovn-nbctl lsp-add sw0 sw0-port1
>>>>>>>> +check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01
>>>>>> 10.0.0.2"
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add public
>>>>>>>> +check ovn-nbctl lsp-add public ln-public
>>>>>>>> +check ovn-nbctl lsp-set-type ln-public localnet
>>>>>>>> +check ovn-nbctl lsp-set-addresses ln-public unknown
>>>>>>>> +check ovn-nbctl lsp-set-options ln-public network_name=phys
>>>>>>>> +check ovn-nbctl lsp-add public public-port1
>>>>>>>> +check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88
>>>>>> 172.168.0.200"
>>>>>>>> +
>>>>>>>> +check ovn-nbctl ls-add public2
>>>>>>>> +check ovn-nbctl lsp-add public2 ln-public2
>>>>>>>> +check ovn-nbctl lsp-set-type ln-public2 localnet
>>>>>>>> +check ovn-nbctl lsp-set-addresses ln-public2 unknown
>>>>>>>> +check ovn-nbctl lsp-set-options ln-public2 network_name=phys
>>>>>>>> +check ovn-nbctl lsp-add public2 public2-port1
>>>>>>>> +check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99
>>>>>> 173.168.0.200"
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lr-add lr0
>>>>>>>> +check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
>>>>>>>> +check ovn-nbctl lsp-add sw0 sw0-lr0
>>>>>>>> +check ovn-nbctl lsp-set-type sw0-lr0 router
>>>>>>>> +check ovn-nbctl lsp-set-addresses sw0-lr0 router
>>>>>>>> +check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13
>>>>>> 172.168.0.1/24
>>>>>>>> +check ovn-nbctl lsp-add public public-lr0
>>>>>>>> +check ovn-nbctl lsp-set-type public-lr0 router
>>>>>>>> +check ovn-nbctl lsp-set-addresses public-lr0 router
>>>>>>>> +check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14
>>>>>> 173.168.0.1/24
>>>>>>>> +check ovn-nbctl lsp-add public2 public2-lr0
>>>>>>>> +check ovn-nbctl lsp-set-type public2-lr0 router
>>>>>>>> +check ovn-nbctl lsp-set-addresses public2-lr0 router
>>>>>>>> +check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
>>>>>>>> +check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
>>>>>>>> +check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
>>>>>>>> +
>>>>>>>> +wait_for_ports_up
>>>>>>>> +OVN_POPULATE_ARP
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
>>>>>>>> +check ovn-nbctl lr-lb-add lr0 lb0
>>>>>>>> +
>>>>>>>> +ovn-sbctl lflow-list lr0
>>>>>>>> +
>>>>>>>> +# send UDP request to the load-balancer VIP from public switch 1
>>>>>>>> +test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13"
>>>>>> "172.168.0.200" "172.168.0.10" 53 hv2-vif1
>>>>>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
>>>>>> hv1/vif1-tx.pcap | wc -l) -ge 1])
>>>>>>>> +# send UDP reply from sw0-port1
>>>>>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
>>>>>> "172.168.0.200" 53
>>>>>>>> +# packet sent by the load balancer VIP
>>>>>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88',
>>>>>> src='00:00:20:20:12:13')/
>>>>>>>> +                      IP(dst='172.168.0.200', src='172.168.0.10',
>>>>>> ttl=63)/ \
>>>>>>>> +                      UDP(sport=4369, dport=53)")
>>>>>>>> +echo $packet > expected
>>>>>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
>>>>>>>> +
>>>>>>>> +# send UDP request to the load-balancer VIP from public switch 2
>>>>>>>> +test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14"
>>>>>> "173.168.0.200" "172.168.0.10" 54 hv2-vif2
>>>>>>>> +OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
>>>>>> hv1/vif1-tx.pcap | wc -l) -ge 1])
>>>>>>>> +# send UDP reply from sw0-port1
>>>>>>>> +test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2"
>>>>>> "173.168.0.200" 54
>>>>>>>> +# packet sent by the load balancer VIP
>>>>>>>> +packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99',
>>>>>> src='00:00:20:20:12:14')/
>>>>>>>> +                      IP(dst='173.168.0.200', src='172.168.0.10',
>>>>>> ttl=63)/ \
>>>>>>>> +                      UDP(sport=4369, dport=54)")
>>>>>>>> +echo $packet > expected
>>>>>>>> +OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +OVN_CLEANUP([hv1],[hv2])
>>>>>>>> +AT_CLEANUP
>>>>>>>> +])
>>>>>>>> --
>>>>>>>> 2.39.2
>>>>>>>>
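
A minimal sketch, assuming the DR/lb-1 topology from the ovn-northd.at test
added below, of how the per-DGP flows generated by this change can be
inspected; one lr_in_dnat flow per distributed gateway port is expected:

  ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
  ovn-nbctl lr-lb-add DR lb-1
  ovn-nbctl --wait=sb sync
  ovn-sbctl dump-flows DR | grep lr_in_dnat | grep is_chassis_resident
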
diff mbox series

Patch

diff --git a/northd/en-lr-stateful.c b/northd/en-lr-stateful.c
index baf1bd2f8..f09691af6 100644
--- a/northd/en-lr-stateful.c
+++ b/northd/en-lr-stateful.c
@@ -516,18 +516,6 @@  lr_stateful_record_create(struct lr_stateful_table *table,
 
     table->array[od->index] = lr_stateful_rec;
 
-    /* Load balancers are not supported (yet) if a logical router has multiple
-     * distributed gateway port.  Log a warning. */
-    if (lr_stateful_rec->has_lb_vip && lr_has_multiple_gw_ports(od)) {
-        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
-        VLOG_WARN_RL(&rl, "Load-balancers are configured on logical "
-                     "router %s, which has %"PRIuSIZE" distributed "
-                     "gateway ports. Load-balancer is not supported "
-                     "yet when there is more than one distributed "
-                     "gateway port on the router.",
-                     od->nbr->name, od->n_l3dgw_ports);
-    }
-
     return lr_stateful_rec;
 }
 
diff --git a/northd/northd.c b/northd/northd.c
index 6898daa00..e6f53f361 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -11026,10 +11026,9 @@  static void
 build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
                                      enum lrouter_nat_lb_flow_type type,
                                      struct ovn_datapath *od,
+                                     struct ovn_port *dgp,
                                      struct lflow_ref *lflow_ref)
 {
-    struct ovn_port *dgp = od->l3dgw_ports[0];
-
     const char *undnat_action;
 
     switch (type) {
@@ -11060,7 +11059,7 @@  build_distr_lrouter_nat_flows_for_lb(struct lrouter_nat_lb_flows_ctx *ctx,
 
     if (ctx->lb_vip->n_backends || !ctx->lb_vip->empty_backend_rej) {
         ds_put_format(ctx->new_match, " && is_chassis_resident(%s)",
-                      od->l3dgw_ports[0]->cr_port->json_key);
+                      dgp->cr_port->json_key);
     }
 
     ovn_lflow_add_with_hint__(ctx->lflows, od, S_ROUTER_IN_DNAT, ctx->prio,
@@ -11263,8 +11262,11 @@  build_lrouter_nat_flows_for_lb(
         if (!od->n_l3dgw_ports) {
             bitmap_set1(gw_dp_bitmap[type], index);
         } else {
-            build_distr_lrouter_nat_flows_for_lb(&ctx, type, od,
-                                                 lb_dps->lflow_ref);
+            for (int i = 0; i < od->n_l3dgw_ports; i++) {
+                struct ovn_port *dgp = od->l3dgw_ports[i];
+                build_distr_lrouter_nat_flows_for_lb(&ctx, type, od, dgp,
+                                                     lb_dps->lflow_ref);
+            }
         }
 
         if (lb->affinity_timeout) {
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index a389d1988..5be48f49e 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -12721,3 +12721,89 @@  AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d
 
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([ovn-northd -- LB on Lr with multiple gw ports])
+AT_KEYWORDS([lb-multiple-l3dgw-ports])
+ovn_start
+
+# Logical network:
+# 1 Logical Router, 3 bridged Logical Switches,
+# 1 gateway chassis attached to each corresponding LRP.
+# LB attached to DR
+#
+#                | S1 (gw1)
+#                |
+#      ls  ----  DR -- S3 (gw3)
+# (20.0.0.0/24)  |
+#                | S2 (gw2)
+#
+# Validate basic LB logical flows.
+
+check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
+check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
+check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
+
+check ovn-nbctl lr-add DR
+check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
+check ovn-nbctl lrp-add DR DR-S2 03:ac:10:01:00:01 172.16.2.1/24
+check ovn-nbctl lrp-add DR DR-S3 04:ac:10:01:00:01 172.16.3.1/24
+check ovn-nbctl lrp-add DR DR-ls 05:ac:10:01:00:01 20.0.0.1/24
+
+check ovn-nbctl ls-add S1
+check ovn-nbctl lsp-add S1 S1-DR
+check ovn-nbctl lsp-set-type S1-DR router
+check ovn-nbctl lsp-set-addresses S1-DR router
+check ovn-nbctl --wait=sb lsp-set-options S1-DR router-port=DR-S1
+
+check ovn-nbctl ls-add S2
+check ovn-nbctl lsp-add S2 S2-DR
+check ovn-nbctl lsp-set-type S2-DR router
+check ovn-nbctl lsp-set-addresses S2-DR router
+check ovn-nbctl --wait=sb lsp-set-options S2-DR router-port=DR-S2
+
+check ovn-nbctl ls-add S3
+check ovn-nbctl lsp-add S3 S3-DR
+check ovn-nbctl lsp-set-type S3-DR router
+check ovn-nbctl lsp-set-addresses S3-DR router
+check ovn-nbctl --wait=sb lsp-set-options S3-DR router-port=DR-S3
+
+check ovn-nbctl ls-add  ls
+check ovn-nbctl lsp-add ls ls-DR
+check ovn-nbctl lsp-set-type ls-DR router
+check ovn-nbctl lsp-set-addresses ls-DR router
+check ovn-nbctl --wait=sb lsp-set-options ls-DR router-port=DR-ls
+
+check ovn-nbctl lrp-set-gateway-chassis DR-S1 gw1
+check ovn-nbctl lrp-set-gateway-chassis DR-S2 gw2
+check ovn-nbctl lrp-set-gateway-chassis DR-S3 gw3
+
+check ovn-nbctl lb-add lb-1 20.0.0.10:80 20.0.0.8:80,20.0.0.9:80 tcp
+check ovn-nbctl lr-lb-add DR lb-1
+
+check ovn-nbctl --wait=sb sync
+
+ovn-sbctl dump-flows DR > lrflows
+AT_CAPTURE_FILE([lrflows])
+
+# Check the flows in lr_in_dnat stage
+AT_CHECK([grep lr_in_dnat lrflows | grep priority=120 | grep cr-DR | ovn_strip_lflows], [0], [dnl
+  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S1")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
+  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S2")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
+  table=??(lr_in_dnat         ), priority=120  , match=(ct.new && !ct.rel && ip4 && ip4.dst == 20.0.0.10 && tcp && tcp.dst == 80 && is_chassis_resident("cr-DR-S3")), action=(ct_lb(backends=20.0.0.8:80,20.0.0.9:80);)
+])
+# Check the flows in lr_in_gw_redirect stage
+AT_CHECK([grep lr_in_gw_redirect lrflows | grep priority=200 | grep cr-DR | ovn_strip_lflows], [0], [dnl
+  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S1"), action=(outport = "cr-DR-S1"; next;)
+  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S2"), action=(outport = "cr-DR-S2"; next;)
+  table=??(lr_in_gw_redirect  ), priority=200  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && outport == "DR-S3"), action=(outport = "cr-DR-S3"; next;)
+])
+# Check the flows in lr_out_undnat stage
+AT_CHECK([grep lr_out_undnat lrflows | grep priority=120 | grep cr-DR | ovn_strip_lflows], [0], [dnl
+  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S1" || outport == "DR-S1") && is_chassis_resident("cr-DR-S1")), action=(ct_dnat;)
+  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S2" || outport == "DR-S2") && is_chassis_resident("cr-DR-S2")), action=(ct_dnat;)
+  table=??(lr_out_undnat      ), priority=120  , match=(ip4 && ((ip4.src == 20.0.0.8 && tcp.src == 80) || (ip4.src == 20.0.0.9 && tcp.src == 80)) && (inport == "DR-S3" || outport == "DR-S3") && is_chassis_resident("cr-DR-S3")), action=(ct_dnat;)
+])
+
+AT_CLEANUP
+])
diff --git a/tests/ovn.at b/tests/ovn.at
index 185ba4a21..8e8c102c0 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -38426,3 +38426,170 @@  OVN_CLEANUP([hv1],[hv2])
 
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([Multiple DGP and LB traffic])
+AT_KEYWORDS([dgp-lb])
+AT_SKIP_IF([test $HAVE_SCAPY = no])
+ovn_start
+
+# Logical network:
+# 1 Logical Router, 2 bridged Logical Switches, 1 Logical switch
+# 1 gateway chassis attached to each corresponding LRP.
+# LB attached to lr0
+#
+#                | public (gw1) (172.168.0.0/24)
+#                |
+#      sw0  --- lr0 --- public2 (gw2) (173.168.0.0/24)
+# (10.0.0.0/24)
+#
+# Routes (lr0):
+#
+#   173.168.0.0/24 ----> 173.168.0.1 (public2)
+#   default        ----> 172.168.0.1 (public)
+#
+#
+# Validate Traffic from public to LB and its response.
+# Validate traffic from public2 to LB and its response.
+
+test_ip_req_packet() {
+    local src_mac="$1"
+    local dst_mac="$2"
+    local src_ip="$3"
+    local dst_ip="$4"
+    local sport=$5
+    local iface=$6
+
+    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
+                    IP(dst='${dst_ip}', src='${src_ip}')/ \
+                    UDP(sport=${sport}, dport=4369)")
+
+    as hv1 reset_pcap_file hv1-vif1 hv1/vif1
+    as hv2 reset_pcap_file hv2-vif1 hv2/vif1
+    as hv2 reset_pcap_file hv2-vif2 hv2/vif2
+    check as hv2 ovs-appctl netdev-dummy/receive $iface $packet
+}
+
+test_ip_rep_packet() {
+    local src_mac="$1"
+    local dst_mac="$2"
+    local src_ip="$3"
+    local dst_ip="$4"
+    local dport=$5
+
+    local packet=$(fmt_pkt "Ether(dst='${dst_mac}', src='${src_mac}')/
+                    IP(dst='${dst_ip}', src='${src_ip}')/ \
+                    UDP(sport=4369, dport=${dport})")
+
+    check as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+}
+
+net_add n
+
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n br-phys 192.168.0.1
+check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+check ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=sw0-port1 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap
+
+sim_add hv2
+as hv2
+check ovs-vsctl add-br br-phys
+ovn_attach n br-phys 192.168.0.2
+check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+check ovs-vsctl -- add-port br-int hv2-vif1 -- \
+    set interface hv2-vif1 external-ids:iface-id=public-port1 \
+    options:tx_pcap=hv2/vif1-tx.pcap \
+    options:rxq_pcap=hv2/vif1-rx.pcap
+check ovs-vsctl -- add-port br-int hv2-vif2 -- \
+    set interface hv2-vif2 external-ids:iface-id=public2-port1 \
+    options:tx_pcap=hv2/vif2-tx.pcap \
+    options:rxq_pcap=hv2/vif2-rx.pcap
+
+check ovn-nbctl ls-add sw0
+check ovn-nbctl lsp-add sw0 sw0-port1
+check ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.2"
+
+check ovn-nbctl ls-add public
+check ovn-nbctl lsp-add public ln-public
+check ovn-nbctl lsp-set-type ln-public localnet
+check ovn-nbctl lsp-set-addresses ln-public unknown
+check ovn-nbctl lsp-set-options ln-public network_name=phys
+check ovn-nbctl lsp-add public public-port1
+check ovn-nbctl lsp-set-addresses public-port1 "50:54:00:00:00:88 172.168.0.200"
+
+check ovn-nbctl ls-add public2
+check ovn-nbctl lsp-add public2 ln-public2
+check ovn-nbctl lsp-set-type ln-public2 localnet
+check ovn-nbctl lsp-set-addresses ln-public2 unknown
+check ovn-nbctl lsp-set-options ln-public2 network_name=phys
+check ovn-nbctl lsp-add public2 public2-port1
+check ovn-nbctl lsp-set-addresses public2-port1 "50:54:00:00:00:99 173.168.0.200"
+
+check ovn-nbctl lr-add lr0
+check ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
+check ovn-nbctl lsp-add sw0 sw0-lr0
+check ovn-nbctl lsp-set-type sw0-lr0 router
+check ovn-nbctl lsp-set-addresses sw0-lr0 router
+check ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0
+
+check ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.1/24
+check ovn-nbctl lsp-add public public-lr0
+check ovn-nbctl lsp-set-type public-lr0 router
+check ovn-nbctl lsp-set-addresses public-lr0 router
+check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
+
+check ovn-nbctl lrp-add lr0 lr0-public2 00:00:20:20:12:14 173.168.0.1/24
+check ovn-nbctl lsp-add public2 public2-lr0
+check ovn-nbctl lsp-set-type public2-lr0 router
+check ovn-nbctl lsp-set-addresses public2-lr0 router
+check ovn-nbctl lsp-set-options public2-lr0 router-port=lr0-public2
+
+
+check ovn-nbctl lrp-set-gateway-chassis lr0-public hv2 20
+check ovn-nbctl lrp-set-gateway-chassis lr0-public2 hv2 20
+
+check ovn-nbctl lr-route-add lr0 173.168.0.0/24 173.168.0.1
+check ovn-nbctl lr-route-add lr0 0.0.0.0/0 172.168.0.1
+
+wait_for_ports_up
+OVN_POPULATE_ARP
+
+
+check ovn-nbctl lb-add lb0 172.168.0.10:4369 10.0.0.2:4369 udp
+check ovn-nbctl lr-lb-add lr0 lb0
+
+ovn-sbctl lflow-list lr0
+
+# send UDP request to the load-balancer VIP from public switch 1
+test_ip_req_packet "50:54:00:00:00:88" "00:00:20:20:12:13" "172.168.0.200" "172.168.0.10" 53 hv2-vif1
+OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | wc -l) -ge 1])
+# send UDP reply from sw0-port1
+test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2" "172.168.0.200" 53
+# packet sent by the load balancer VIP
+packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:88', src='00:00:20:20:12:13')/
+                      IP(dst='172.168.0.200', src='172.168.0.10', ttl=63)/ \
+                      UDP(sport=4369, dport=53)")
+echo $packet > expected
+OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif1-tx.pcap], [expected])
+
+# send UDP request to the load-balancer VIP from public switch 2
+test_ip_req_packet "50:54:00:00:00:99" "00:00:20:20:12:14" "173.168.0.200" "172.168.0.10" 54 hv2-vif2
+OVS_WAIT_UNTIL([test $($PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | wc -l) -ge 1])
+# send UDP reply from sw0-port1
+test_ip_rep_packet "50:54:00:00:00:01" "00:00:00:00:ff:01" "10.0.0.2" "173.168.0.200" 54
+# packet sent by the load balancer VIP
+packet=$(fmt_pkt "Ether(dst='50:54:00:00:00:99', src='00:00:20:20:12:14')/
+                      IP(dst='173.168.0.200', src='172.168.0.10', ttl=63)/ \
+                      UDP(sport=4369, dport=54)")
+echo $packet > expected
+OVN_CHECK_PACKETS_REMOVE_BROADCAST([hv2/vif2-tx.pcap], [expected])
+
+
+OVN_CLEANUP([hv1],[hv2])
+AT_CLEANUP
+])
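
A self-contained sketch (plain C and libc only, not OVN code; the port names
and VIPs are taken from the examples in the thread above) of the VIP-to-DGP
selection heuristic discussed there: prefer the gateway port whose configured
subnet contains the VIP, and fall back to all gateway ports when the VIP does
not belong to any DGP subnet, which is the case raised against that heuristic.
With the patch as posted, the fallback branch is effectively what happens for
every VIP.

/* Standalone sketch; names are hypothetical and only illustrate the
 * "map the VIP to the gateway port whose subnet contains it" idea. */
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct dgp {
    const char *cr_port;   /* e.g. "cr-lr0-sw1" */
    const char *addr;      /* LRP address, e.g. "10.0.0.1" */
    unsigned int plen;     /* prefix length */
};

/* Returns true if 'vip' falls inside the /plen network of 'dgp'. */
static bool
dgp_owns_vip(const struct dgp *dgp, const char *vip)
{
    struct in_addr net, ip;
    if (inet_pton(AF_INET, dgp->addr, &net) != 1 ||
        inet_pton(AF_INET, vip, &ip) != 1) {
        return false;
    }
    uint32_t mask = dgp->plen ? htonl(~0u << (32 - dgp->plen)) : 0;
    return (net.s_addr & mask) == (ip.s_addr & mask);
}

int
main(void)
{
    const struct dgp dgps[] = {
        { "cr-lr0-sw1", "10.0.0.1", 24 },
        { "cr-lr0-sw2", "20.0.0.1", 24 },
        { "cr-lr0-sw3", "30.0.0.1", 24 },
    };
    const char *vips[] = { "10.0.0.40", "30.0.0.60", "172.168.0.10" };
    const size_t n_dgps = sizeof dgps / sizeof dgps[0];
    const size_t n_vips = sizeof vips / sizeof vips[0];

    for (size_t v = 0; v < n_vips; v++) {
        size_t matches = 0;
        for (size_t i = 0; i < n_dgps; i++) {
            if (dgp_owns_vip(&dgps[i], vips[v])) {
                printf("VIP %s -> is_chassis_resident(\"%s\") only\n",
                       vips[v], dgps[i].cr_port);
                matches++;
            }
        }
        if (!matches) {
            /* VIP outside every DGP subnet: no obvious owner, so flows
             * would still be emitted for all gateway ports, as the patch
             * above does unconditionally. */
            printf("VIP %s -> flows on all %zu gateway ports\n",
                   vips[v], n_dgps);
        }
    }
    return 0;
}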