
[ovs-dev,ovn] ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline

Message ID 20200326113247.526346-1-numans@ovn.org
State Superseded
Series [ovs-dev,ovn] ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline

Commit Message

Numan Siddique March 26, 2020, 11:32 a.m. UTC
From: Numan Siddique <numans@ovn.org>

Suppose there is the below NAT entry with external_ip = 172.168.0.100

nat <UUID>
    external ip: "172.168.0.100"
    logical ip: "10.0.0.0/24"
    type: "snat"

And a load balancer with the VIP - 172.168.0.100

_uuid               : <UUID>
external_ids        : {}
name                : lb1
protocol            : tcp
vips                : {"172.168.0.100:8080"="10.0.0.4:8080"}

And if these are associated with a gateway logical router

Then we will see the below lflows in the router pipeline

...
table=5 (lr_in_unsnat       ), priority=90   , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
...
table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
table=6 (lr_in_dnat         ), priority=120  , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)

When a new-connection packet destined for the lb vip 172.168.0.100 with tcp.dst == 8080
is received, the ct.new flow in lr_in_dnat is hit and the packet's ip4.dst is
dnatted to 10.0.0.4 in the dnat conntrack zone.

But for subsequent packets destined to the vip, the ct.est lflow in the lr_in_dnat
stage doesn't get hit. Such a packet first hits the lr_in_unsnat priority-90 flow
mentioned above, with the action ct_snat. Although ct_snat should have no effect here,
there appears to be an issue in either ovs-vswitchd or the datapath, and the ct flags
are not set properly.

In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp,
neither of the above lflows in the lr_in_dnat stage is hit.

This needs to be investigated further. But we can avoid this scenario in OVN
by adding the below lflow.

table=5 (lr_in_unsnat       ), priority=120  , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)

This patch adds the above lflow if the lb vip also has an entry in the NAT table.

This patch is also required to support sctp load balancers in OVN.

Reported-by: Tim Rozet <trozet@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
Signed-off-by: Numan Siddique <numans@ovn.org>
---
 northd/ovn-northd.8.xml | 14 ++++++++++
 northd/ovn-northd.c     | 58 +++++++++++++++++++++++++++++++----------
 tests/ovn-northd.at     | 40 ++++++++++++++++++++++++++++
 tests/system-ovn.at     | 51 +++++++++++++++++++++++++++++++++++-
 4 files changed, 148 insertions(+), 15 deletions(-)

Comments

Mark Michelson March 26, 2020, 12:35 p.m. UTC | #1
The gist of this change is that the configured SNAT external address and 
the load balancer VIP are the same. This means that we end up with both 
SNAT and DNAT occurring for the same external address. Would we see a 
similar problem if there were no load balancer, but we had configured 
the nat type to be "dnat_and_snat"? In this case, we wouldn't have the 
same conntrack-based lb flows in table 6. However, would we still run 
into the same issue with the conntrack state not being "est" because of 
the table 5 ct_snat flow? Or is there some safeguard in place to ensure 
this does not happen?

See inline below for more

On 3/26/20 7:32 AM, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> Suppose there is below NAT entry with external_ip = 172.168.0.100
> 
> nat <UUID>
>      external ip: "172.168.0.100"
>      logical ip: "10.0.0.0/24"
>      type: "snat"
> 
> And a load balancer with the VIP - 172.168.0.100
> 
> _uuid               : <UUID>
> external_ids        : {}
> name                : lb1
> protocol            : tcp
> vips                : {"172.168.0.100:8080"="10.0.0.4:8080"}
> 
> And if these are associated to a gateway logical router
> 
> Then we will see the below lflows in the router pipeline
> 
> ...
> table=5 (lr_in_unsnat       ), priority=90   , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
> ...
> table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
> table=6 (lr_in_dnat         ), priority=120  , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
> 
> When a new-connection packet destined for the lb vip 172.168.0.100 with tcp.dst == 8080
> is received, the ct.new flow in lr_in_dnat is hit and the packet's ip4.dst is
> dnatted to 10.0.0.4 in the dnat conntrack zone.
> 
> But for subsequent packets destined to the vip, the ct.est lflow in the lr_in_dnat
> stage doesn't get hit. Such a packet first hits the lr_in_unsnat priority-90 flow
> mentioned above, with the action ct_snat. Although ct_snat should have no effect here,
> there appears to be an issue in either ovs-vswitchd or the datapath, and the ct flags
> are not set properly.
> 
> In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp,
> neither of the above lflows in the lr_in_dnat stage is hit.
> 
> This needs to be investigated further. But we can avoid this scenario in OVN
> by adding the below lflow.
> 
> table=5 (lr_in_unsnat       ), priority=120  , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
> 
> This patch adds the above lflow if the lb vip also has an entry in the NAT table.
> 
> This patch is also required to support sctp load balancers in OVN.
> 
> Reported-by: Tim Rozet <trozet@redhat.com>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
>   northd/ovn-northd.8.xml | 14 ++++++++++
>   northd/ovn-northd.c     | 58 +++++++++++++++++++++++++++++++----------
>   tests/ovn-northd.at     | 40 ++++++++++++++++++++++++++++
>   tests/system-ovn.at     | 51 +++++++++++++++++++++++++++++++++++-
>   4 files changed, 148 insertions(+), 15 deletions(-)
> 
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 1e0993e07..0682b8b79 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -2138,6 +2138,20 @@ icmp6 {
>             <code>redirect-chassis</code>.
>           </p>
>   
> +        <p>
> +          For each IPv4 address <var>A</var> defined as a load balancer
> +          VIP (associated with the logical router) with the protocol
> +          <var>P</var> that is also present as an
> +          <code>external_ip</code> in the NAT table, a priority-120
> +          logical flow is added with the match
> +          <code>ip4 &amp;&amp; ip4.dst == <var>A</var></code> and the
> +          action <code>next;</code> to advance the packet to the next
> +          table. If the load balancer VIP also has a protocol port
> +          <var>B</var> defined, then the match also includes
> +          <code><var>P</var>.dst == <var>B</var></code>.
> +          The same flows are added for IPv6 load balancers too.
> +        </p>
> +
>           <p>
>             A priority-0 logical flow with match <code>1</code> has actions
>             <code>next;</code>.
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index b76df0554..18fbbc7f1 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>                      struct ds *match, struct ds *actions, int priority,
>                      const char *lb_force_snat_ip, struct lb_vip *lb_vip,
>                      bool is_udp, struct nbrec_load_balancer *lb,
> -                   struct shash *meter_groups)
> +                   struct shash *meter_groups, struct sset *nat_entries)
>   {
>       build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
>                                 meter_groups);
> @@ -7607,6 +7607,39 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>       free(new_match);
>       free(est_match);
>   
> +    const char *ip_match = NULL;
> +    if (lb_vip->addr_family == AF_INET) {
> +        ip_match = "ip4";
> +    } else {
> +        ip_match = "ip6";
> +    }
> +
> +    if (sset_contains(nat_entries, lb_vip->vip)) {
> +        /* The load balancer vip is also present in the NAT entries.
> +         * So add a high priority lflow to advance the packet
> +         * destined to the vip (and the vip port if defined)
> +         * in the S_ROUTER_IN_UNSNAT stage.
> +         * There seems to be an issue with ovs-vswitchd. When the new
> +         * connection packet destined for the lb vip is received,
> +         * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
> +         * conntrack zone. For the next packet, if it goes through
> +         * unsnat stage, the conntrack flags are not set properly, and
> +         * it doesn't hit the established state flows in
> +         * S_ROUTER_IN_DNAT stage. */
> +        struct ds unsnat_match = DS_EMPTY_INITIALIZER;
> +        ds_put_format(&unsnat_match, "%s && %s.dst == %s",
> +                      ip_match, ip_match, lb_vip->vip);

I think you should add the load balancer L4 protocol here. Consider that 
you have the described NAT setup with a UDP load balancer that has no VIP 
port. TCP traffic would hit this flow intended for load balancer traffic 
instead of the lower-priority flow intended for UNSNAT.

> +        if (lb_vip->vip_port) {
> +            ds_put_format(&unsnat_match, " && %s.dst == %d",
> +                          is_udp ? "udp" : "tcp", lb_vip->vip_port);
> +        }
> +
> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
> +                                ds_cstr(&unsnat_match), "next;", &lb->header_);
> +
> +        ds_destroy(&unsnat_match);
> +    }
> +
>       if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
>           return;
>       }
> @@ -7616,19 +7649,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>        * router has a gateway router port associated.
>        */
>       struct ds undnat_match = DS_EMPTY_INITIALIZER;
> -    if (lb_vip->addr_family == AF_INET) {
> -        ds_put_cstr(&undnat_match, "ip4 && (");
> -    } else {
> -        ds_put_cstr(&undnat_match, "ip6 && (");
> -    }
> +    ds_put_format(&undnat_match, "%s && (", ip_match);
>   
>       for (size_t i = 0; i < lb_vip->n_backends; i++) {
>           struct lb_vip_backend *backend = &lb_vip->backends[i];
> -        if (backend->addr_family == AF_INET) {
> -            ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
> -        } else {
> -            ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
> -        }
> +        ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
>   
>           if (backend->port) {
>               ds_put_format(&undnat_match, " && %s.src == %d) || ",
> @@ -8879,6 +8904,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                                               &nat->header_);
>                       sset_add(&nat_entries, nat->external_ip);
>                   }
> +            } else {
> +                /* Add the NAT external_ip to the nat_entries even for
> +                 * gateway routers. This is required for adding load balancer
> +                 * flows.*/
> +                sset_add(&nat_entries, nat->external_ip);
>               }
>   
>               /* Egress UNDNAT table: It is for already established connections'
> @@ -9059,8 +9089,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>               }
>           }
>   
> -        sset_destroy(&nat_entries);
> -
>           /* Handle force SNAT options set in the gateway router. */
>           if (dnat_force_snat_ip && !od->l3dgw_port) {
>               /* If a packet with destination IP address as that of the
> @@ -9121,6 +9149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>           /* Load balancing and packet defrag are only valid on
>            * Gateway routers or router with gateway port. */
>           if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
> +            sset_destroy(&nat_entries);
>               continue;
>           }
>   
> @@ -9195,10 +9224,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                   }
>                   add_router_lb_flow(lflows, od, &match, &actions, prio,
>                                      lb_force_snat_ip, lb_vip, is_udp,
> -                                   nb_lb, meter_groups);
> +                                   nb_lb, meter_groups, &nat_entries);
>               }
>           }
>           sset_destroy(&all_ips);
> +        sset_destroy(&nat_entries);
>       }
>   
>       /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index a2989e78e..c783ea2ac 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1288,3 +1288,43 @@ ovn-nbctl --wait=sb lb-del lb2
>   OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor |  wc -l`])
>   
>   AT_CLEANUP
> +
> +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +ovn-nbctl lr-add lr0
> +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
> +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
> +
> +ovn-nbctl set logical_router lr0 options:chassis=ch1
> +
> +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
> +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
> +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
> +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
> +
> +ovn-nbctl lr-lb-add lr0 lb1
> +ovn-nbctl lr-lb-add lr0 lb2
> +ovn-nbctl lr-lb-add lr0 lb3
> +ovn-nbctl lr-lb-add lr0 lb4
> +
> +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
> +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
> +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
> +
> +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
> +
> +ovn-sbctl dump-flows lr0
Is this left over from debugging?

> +
> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
> +
> +AT_CLEANUP
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 3b3379840..f1ae69b20 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
>   ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
>       external_ip=30.0.0.2 -- add logical_router R2 nat @nat
>   
> -
>   # Wait for ovn-controller to catch up.
>   ovn-nbctl --wait=hv sync
>   OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
> @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
>   tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>   ])
>   
> +check_est_flows () {
> +    n=$(ovs-ofctl dump-flows br-int table=14 | grep \
> +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
> +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
> +
> +    echo "n_packets=$n"
> +    test "$n" != 0
> +}
> +
> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> +
> +
> +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
> +
> +# Destroy the load balancer and create it again. ovn-controller will
> +# clear the OF flows and re-add them, which resets the n_packets
> +# counters for these flows.
> +ovn-nbctl destroy load_balancer $uuid
> +uuid=`ovn-nbctl  create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
> +ovn-nbctl set logical_router R2 load_balancer=$uuid
> +
> +# Config OVN load-balancer with another VIP (this time with ports).
> +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
> +
> +ovn-nbctl list load_balancer
> +ovn-sbctl dump-flows R2
> +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
> +grep 'nat(src=20.0.0.2)'])
> +
> +dnl Test load-balancing that includes L4 ports in NAT.
> +for i in `seq 1 20`; do
> +    echo Request $i
> +    NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> +done
> +
> +dnl Each server should have at least one connection.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +])
> +
> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> +
>   OVS_APP_EXIT_AND_WAIT([ovn-controller])
>   
>   as ovn-sb
>
Numan Siddique March 26, 2020, 2:17 p.m. UTC | #2
On Thu, Mar 26, 2020 at 6:06 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> The gist of this change is that the configured SNAT external address and
> the load balancer VIP are the same. This means that we end up with both
> SNAT and DNAT occurring for the same external address. Would we see a
> similar problem if there were no load balancer, but we had configured
> the nat type to be "dnat_and_snat"? In this case, we wouldn't have the
> same conntrack-based lb flows in table 6. However, would we still run
> into the same issue with the conntrack state not being "est" because of
> the table 5 ct_snat flow? Or is there some safeguard in place to ensure
> this does not happen?

I don't think it's required in the case of dnat_and_snat. That's
because we don't check ct flags for
dnat_and_snat entries in lr_in_dnat.

Let's say we have the below dnat_and_snat entry

***
nat c34ff396-80d2-42aa-a4cd-747150e4b1ab
        external ip: "172.168.0.110"
        logical ip: "10.0.0.3"
        type: "dnat_and_snat"
****
Then the flow is
 table=6 (lr_in_dnat         ), priority=100  , match=(ip && ip4.dst
== 172.168.0.110 && inport == "lr0-public"),
action=(ct_dnat(10.0.0.3);)

And the corresponding OF flow is

table=14, n_packets=0, n_bytes=0, idle_age=523,
priority=100,ip,reg14=0x3,metadata=0x3,nw_dst=172.168.0.110
actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=10.0.0.3))

We may have to revisit the logical flows and see why ct
flags are not used in the case of dnat_and_snat entries.

>
> See inline below for more
>
> On 3/26/20 7:32 AM, numans@ovn.org wrote:
> > From: Numan Siddique <numans@ovn.org>
> >
> > Suppose there is below NAT entry with external_ip = 172.168.0.100
> >
> > nat <UUID>
> >      external ip: "172.168.0.100"
> >      logical ip: "10.0.0.0/24"
> >      type: "snat"
> >
> > And a load balancer with the VIP - 172.168.0.100
> >
> > _uuid               : <UUID>
> > external_ids        : {}
> > name                : lb1
> > protocol            : tcp
> > vips                : {"172.168.0.100:8080"="10.0.0.4:8080"}
> >
> > And if these are associated to a gateway logical router
> >
> > Then we will see the below lflows in the router pipeline
> >
> > ...
> > table=5 (lr_in_unsnat       ), priority=90   , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
> > ...
> > table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
> > table=6 (lr_in_dnat         ), priority=120  , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
> >
> > When a new-connection packet destined for the lb vip 172.168.0.100 with tcp.dst == 8080
> > is received, the ct.new flow in lr_in_dnat is hit and the packet's ip4.dst is
> > dnatted to 10.0.0.4 in the dnat conntrack zone.
> >
> > But for subsequent packets destined to the vip, the ct.est lflow in the lr_in_dnat
> > stage doesn't get hit. Such a packet first hits the lr_in_unsnat priority-90 flow
> > mentioned above, with the action ct_snat. Although ct_snat should have no effect here,
> > there appears to be an issue in either ovs-vswitchd or the datapath, and the ct flags
> > are not set properly.
> >
> > In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp,
> > neither of the above lflows in the lr_in_dnat stage is hit.
> >
> > This needs to be investigated further. But we can avoid this scenario in OVN
> > by adding the below lflow.
> >
> > table=5 (lr_in_unsnat       ), priority=120  , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
> >
> > This patch adds the above lflow if the lb vip also has an entry in the NAT table.
> >
> > This patch is also required to support sctp load balancers in OVN.
> >
> > Reported-by: Tim Rozet <trozet@redhat.com>
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
> > Signed-off-by: Numan Siddique <numans@ovn.org>
> > ---
> >   northd/ovn-northd.8.xml | 14 ++++++++++
> >   northd/ovn-northd.c     | 58 +++++++++++++++++++++++++++++++----------
> >   tests/ovn-northd.at     | 40 ++++++++++++++++++++++++++++
> >   tests/system-ovn.at     | 51 +++++++++++++++++++++++++++++++++++-
> >   4 files changed, 148 insertions(+), 15 deletions(-)
> >
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index 1e0993e07..0682b8b79 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -2138,6 +2138,20 @@ icmp6 {
> >             <code>redirect-chassis</code>.
> >           </p>
> >
> > +        <p>
> > +          For each IPv4 address <var>A</var> defined as a load balancer
> > +          VIP (associated with the logical router) with the protocol
> > +          <var>P</var> that is also present as an
> > +          <code>external_ip</code> in the NAT table, a priority-120
> > +          logical flow is added with the match
> > +          <code>ip4 &amp;&amp; ip4.dst == <var>A</var></code> and the
> > +          action <code>next;</code> to advance the packet to the next
> > +          table. If the load balancer VIP also has a protocol port
> > +          <var>B</var> defined, then the match also includes
> > +          <code><var>P</var>.dst == <var>B</var></code>.
> > +          The same flows are added for IPv6 load balancers too.
> > +        </p>
> > +
> >           <p>
> >             A priority-0 logical flow with match <code>1</code> has actions
> >             <code>next;</code>.
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index b76df0554..18fbbc7f1 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> >                      struct ds *match, struct ds *actions, int priority,
> >                      const char *lb_force_snat_ip, struct lb_vip *lb_vip,
> >                      bool is_udp, struct nbrec_load_balancer *lb,
> > -                   struct shash *meter_groups)
> > +                   struct shash *meter_groups, struct sset *nat_entries)
> >   {
> >       build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
> >                                 meter_groups);
> > @@ -7607,6 +7607,39 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> >       free(new_match);
> >       free(est_match);
> >
> > +    const char *ip_match = NULL;
> > +    if (lb_vip->addr_family == AF_INET) {
> > +        ip_match = "ip4";
> > +    } else {
> > +        ip_match = "ip6";
> > +    }
> > +
> > +    if (sset_contains(nat_entries, lb_vip->vip)) {
> > +        /* The load balancer vip is also present in the NAT entries.
> > +         * So add a high priority lflow to advance the packet
> > +         * destined to the vip (and the vip port if defined)
> > +         * in the S_ROUTER_IN_UNSNAT stage.
> > +         * There seems to be an issue with ovs-vswitchd. When the new
> > +         * connection packet destined for the lb vip is received,
> > +         * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
> > +         * conntrack zone. For the next packet, if it goes through
> > +         * unsnat stage, the conntrack flags are not set properly, and
> > +         * it doesn't hit the established state flows in
> > +         * S_ROUTER_IN_DNAT stage. */
> > +        struct ds unsnat_match = DS_EMPTY_INITIALIZER;
> > +        ds_put_format(&unsnat_match, "%s && %s.dst == %s",
> > +                      ip_match, ip_match, lb_vip->vip);
>
> I think you should add the load balancer l4 protocol here. Consider that
> you have the described NAT setup with a UDP load balancer with no VIP
> port. TCP traffic would hit this flow intended for load balancer traffic
> instead of the lower priority flow intended to UNSNAT.

Agree. I'll do in v2.


>
> > +        if (lb_vip->vip_port) {
> > +            ds_put_format(&unsnat_match, " && %s.dst == %d",
> > +                          is_udp ? "udp" : "tcp", lb_vip->vip_port);
> > +        }
> > +
> > +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
> > +                                ds_cstr(&unsnat_match), "next;", &lb->header_);
> > +
> > +        ds_destroy(&unsnat_match);
> > +    }
> > +
> >       if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
> >           return;
> >       }
> > @@ -7616,19 +7649,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> >        * router has a gateway router port associated.
> >        */
> >       struct ds undnat_match = DS_EMPTY_INITIALIZER;
> > -    if (lb_vip->addr_family == AF_INET) {
> > -        ds_put_cstr(&undnat_match, "ip4 && (");
> > -    } else {
> > -        ds_put_cstr(&undnat_match, "ip6 && (");
> > -    }
> > +    ds_put_format(&undnat_match, "%s && (", ip_match);
> >
> >       for (size_t i = 0; i < lb_vip->n_backends; i++) {
> >           struct lb_vip_backend *backend = &lb_vip->backends[i];
> > -        if (backend->addr_family == AF_INET) {
> > -            ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
> > -        } else {
> > -            ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
> > -        }
> > +        ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
> >
> >           if (backend->port) {
> >               ds_put_format(&undnat_match, " && %s.src == %d) || ",
> > @@ -8879,6 +8904,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >                                               &nat->header_);
> >                       sset_add(&nat_entries, nat->external_ip);
> >                   }
> > +            } else {
> > +                /* Add the NAT external_ip to the nat_entries even for
> > +                 * gateway routers. This is required for adding load balancer
> > +                 * flows.*/
> > +                sset_add(&nat_entries, nat->external_ip);
> >               }
> >
> >               /* Egress UNDNAT table: It is for already established connections'
> > @@ -9059,8 +9089,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >               }
> >           }
> >
> > -        sset_destroy(&nat_entries);
> > -
> >           /* Handle force SNAT options set in the gateway router. */
> >           if (dnat_force_snat_ip && !od->l3dgw_port) {
> >               /* If a packet with destination IP address as that of the
> > @@ -9121,6 +9149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >           /* Load balancing and packet defrag are only valid on
> >            * Gateway routers or router with gateway port. */
> >           if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
> > +            sset_destroy(&nat_entries);
> >               continue;
> >           }
> >
> > @@ -9195,10 +9224,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >                   }
> >                   add_router_lb_flow(lflows, od, &match, &actions, prio,
> >                                      lb_force_snat_ip, lb_vip, is_udp,
> > -                                   nb_lb, meter_groups);
> > +                                   nb_lb, meter_groups, &nat_entries);
> >               }
> >           }
> >           sset_destroy(&all_ips);
> > +        sset_destroy(&nat_entries);
> >       }
> >
> >       /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index a2989e78e..c783ea2ac 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -1288,3 +1288,43 @@ ovn-nbctl --wait=sb lb-del lb2
> >   OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor |  wc -l`])
> >
> >   AT_CLEANUP
> > +
> > +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +ovn-nbctl lr-add lr0
> > +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
> > +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
> > +
> > +ovn-nbctl set logical_router lr0 options:chassis=ch1
> > +
> > +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
> > +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
> > +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
> > +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
> > +
> > +ovn-nbctl lr-lb-add lr0 lb1
> > +ovn-nbctl lr-lb-add lr0 lb2
> > +ovn-nbctl lr-lb-add lr0 lb3
> > +ovn-nbctl lr-lb-add lr0 lb4
> > +
> > +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
> > +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
> > +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
> > +
> > +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
> > +
> > +ovn-sbctl dump-flows lr0
> Is this left over from debugging?

Yes. I'll clean it in v2.

Thanks
Numan

>
> > +
> > +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CLEANUP
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > index 3b3379840..f1ae69b20 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
> >   ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
> >       external_ip=30.0.0.2 -- add logical_router R2 nat @nat
> >
> > -
> >   # Wait for ovn-controller to catch up.
> >   ovn-nbctl --wait=hv sync
> >   OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
> > @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
> >   tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> >   ])
> >
> > +check_est_flows () {
> > +    n=$(ovs-ofctl dump-flows br-int table=14 | grep \
> > +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
> > +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
> > +
> > +    echo "n_packets=$n"
> > +    test "$n" != 0
> > +}
> > +
> > +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> > +
> > +
> > +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
> > +
> > +# Destroy the load balancer and create it again. ovn-controller will
> > +# clear the OF flows and re-add them, which resets the n_packets
> > +# counts for these flows.
> > +ovn-nbctl destroy load_balancer $uuid
> > +uuid=`ovn-nbctl  create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
> > +ovn-nbctl set logical_router R2 load_balancer=$uuid
> > +
> > +# Config OVN load-balancer with another VIP (this time with ports).
> > +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
> > +
> > +ovn-nbctl list load_balancer
> > +ovn-sbctl dump-flows R2
> > +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
> > +grep 'nat(src=20.0.0.2)'])
> > +
> > +dnl Test load-balancing that includes L4 ports in NAT.
> > +for i in `seq 1 20`; do
> > +    echo Request $i
> > +    NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> > +done
> > +
> > +dnl Each server should have at least one connection.
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> > +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> > +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +])
> > +
> > +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> > +
> >   OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >
> >   as ovn-sb
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Mark Michelson March 26, 2020, 2:49 p.m. UTC | #3
On 3/26/20 10:17 AM, Numan Siddique wrote:
> On Thu, Mar 26, 2020 at 6:06 PM Mark Michelson <mmichels@redhat.com> wrote:
>>
>> The gist of this change is that the configured SNAT external address and
>> the load balancer VIP are the same. This means that we end up with both
>> SNAT and DNAT occurring for the same external address. Would we see a
>> similar problem if there were no load balancer, but we had configured
>> the nat type to be "dnat_and_snat"? In this case, we wouldn't have the
>> same conntrack-based lb flows in table 6. However, would we still run
>> into the same issue with the conntrack state not being "est" because of
>> the table 5 ct_snat flow? Or is there some safeguard in place to ensure
>> this does not happen?
> 
> I don't think it's required in the case of dnat_and_snat. That's
> because we don't check ct flags for
> dnat_and_snat entries in lr_in_dnat.
> 
> Let's say we have below dnat_and_snat entry
> 
> ***
> nat c34ff396-80d2-42aa-a4cd-747150e4b1ab
>          external ip: "172.168.0.110"
>          logical ip: "10.0.0.3"
>          type: "dnat_and_snat"
> ****
> Then the flow is
>   table=6 (lr_in_dnat         ), priority=100  , match=(ip && ip4.dst
> == 172.168.0.110 && inport == "lr0-public"),
> action=(ct_dnat(10.0.0.3);)
> 
> And the corresponding OF flow is
> 
> table=14, n_packets=0, n_bytes=0, idle_age=523,
> priority=100,ip,reg14=0x3,metadata=0x3,nw_dst=172.168.0.110
> actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=10.0.0.3))
> 
> We probably need to revisit the logical flows and see why ct
> flags are not used in the case of dnat_and_snat entries.

Sorry, I don't think I made my concern clear. I know that table 6 (aka 
OF table 14) won't have any ct-related flows if there are no load 
balancers. What I was concerned about is table 5 resulting in ct state 
becoming invalid in the case where dnat_and_snat-type nat is used. While 
that won't have issues in table 6, I'm worried it might cause problems 
in other tables where conntrack state is used in the match.

Doing a quick browse of the northd code, though, it doesn't look like ct 
state is used in any other router flows. So maybe this isn't an urgent 
concern. ct state is used extensively for ACL flows in logical switches, 
but those should be different conntrack entries/zones than the router, 
so they shouldn't be affected.

> 
>>
>> See inline below for more
>>
>> On 3/26/20 7:32 AM, numans@ovn.org wrote:
>>> From: Numan Siddique <numans@ovn.org>
>>>
>>> Suppose there is below NAT entry with external_ip = 172.168.0.100
>>>
>>> nat <UUID>
>>>       external ip: "172.168.0.100"
>>>       logical ip: "10.0.0.0/24"
>>>       type: "snat"
>>>
>>> And a load balancer with the VIP - 172.168.0.100
>>>
>>> _uuid               : <UUID>
>>> external_ids        : {}
>>> name                : lb1
>>> protocol            : tcp
>>> vips                : {"172.168.0.100:8080"="10.0.0.4:8080"}
>>>
>>> And if these are associated to a gateway logical router
>>>
>>> Then we will see the below lflows in the router pipeline
>>>
>>> ...
>>> table=5 (lr_in_unsnat       ), priority=90   , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
>>> ...
>>> table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
>>> table=6 (lr_in_dnat         ), priority=120  , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
>>>
>>> When a new connection packet destined for the lb vip 172.168.0.100 and tcp.dst = 8080
>>> is received, the ct.new flow in the lr_in_dnat is hit and the packet's ip4.dst is
>>> dnatted to 10.0.0.4 in the dnat conntrack zone.
>>>
>>> But for the subsequent packet destined to the vip, the ct.est lflow in the lr_in_dnat
>>> stage doesn't get hit. In this case, the packet first hits the lr_in_unsnat pri 90 flow
>>> as mentioned above with the action ct_snat. Even though ct_snat should have no effect,
>>> it looks like there is some issue in either ovs-vswitchd or in the datapath, and the
>>> ct flags are not set properly.
>>>
>>> In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp,
>>> neither of the above lflows in the lr_in_dnat stage is hit.
>>>
>>> This needs to be investigated further. But we can avoid this scenario in OVN
>>> by adding the below lflow.
>>>
>>> table=5 (lr_in_unsnat       ), priority=120  , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
>>>
>>> This patch adds the above lflow if the lb vip also has an entry in the NAT table.
>>>
>>> This patch is also required to support sctp load balancers in OVN.
>>>
>>> Reported-by: Tim Rozet <trozet@redhat.com>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
>>> Signed-off-by: Numan Siddique <numans@ovn.org>
>>> ---
>>>    northd/ovn-northd.8.xml | 14 ++++++++++
>>>    northd/ovn-northd.c     | 58 +++++++++++++++++++++++++++++++----------
>>>    tests/ovn-northd.at     | 40 ++++++++++++++++++++++++++++
>>>    tests/system-ovn.at     | 51 +++++++++++++++++++++++++++++++++++-
>>>    4 files changed, 148 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>> index 1e0993e07..0682b8b79 100644
>>> --- a/northd/ovn-northd.8.xml
>>> +++ b/northd/ovn-northd.8.xml
>>> @@ -2138,6 +2138,20 @@ icmp6 {
>>>              <code>redirect-chassis</code>.
>>>            </p>
>>>
>>> +        <p>
>>> +          For each IPv4 address <var>A</var> defined as a load balancer
>>> +          VIP (associated with the logical router) with protocol
>>> +          <var>P</var> that is also present as an
>>> +          <code>external_ip</code> in the NAT table, a priority-120
>>> +          logical flow is added with the match
>>> +          <code>ip4 &amp;&amp; ip4.dst == <var>A</var></code> and the
>>> +          actions <code>next;</code> to advance the packet to the next
>>> +          table. If the load balancer also has a protocol port
>>> +          <var>B</var> defined, then the match also includes
>>> +          <code><var>P</var>.dst == <var>B</var></code>. The same
>>> +          flows are added for IPv6 load balancers.
>>> +        </p>
>>> +
>>>            <p>
>>>              A priority-0 logical flow with match <code>1</code> has actions
>>>              <code>next;</code>.
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index b76df0554..18fbbc7f1 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>>>                       struct ds *match, struct ds *actions, int priority,
>>>                       const char *lb_force_snat_ip, struct lb_vip *lb_vip,
>>>                       bool is_udp, struct nbrec_load_balancer *lb,
>>> -                   struct shash *meter_groups)
>>> +                   struct shash *meter_groups, struct sset *nat_entries)
>>>    {
>>>        build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
>>>                                  meter_groups);
>>> @@ -7607,6 +7607,39 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>>>        free(new_match);
>>>        free(est_match);
>>>
>>> +    const char *ip_match = NULL;
>>> +    if (lb_vip->addr_family == AF_INET) {
>>> +        ip_match = "ip4";
>>> +    } else {
>>> +        ip_match = "ip6";
>>> +    }
>>> +
>>> +    if (sset_contains(nat_entries, lb_vip->vip)) {
>>> +        /* The load balancer vip is also present in the NAT entries.
>>> +         * So add a high priority lflow to advance the packet
>>> +         * destined to the vip (and the vip port if defined)
>>> +         * in the S_ROUTER_IN_UNSNAT stage.
>>> +         * There seems to be an issue with ovs-vswitchd. When the new
>>> +         * connection packet destined for the lb vip is received,
>>> +         * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
>>> +         * conntrack zone. For the next packet, if it goes through
>>> +         * unsnat stage, the conntrack flags are not set properly, and
>>> +         * it doesn't hit the established state flows in
>>> +         * S_ROUTER_IN_DNAT stage. */
>>> +        struct ds unsnat_match = DS_EMPTY_INITIALIZER;
>>> +        ds_put_format(&unsnat_match, "%s && %s.dst == %s",
>>> +                      ip_match, ip_match, lb_vip->vip);
>>
>> I think you should add the load balancer l4 protocol here. Consider that
>> you have the described NAT setup with a UDP load balancer with no VIP
>> port. TCP traffic would hit this flow intended for load balancer traffic
>> instead of the lower priority flow intended to UNSNAT.
> 
> Agree. I'll do in v2.
> 
> 
>>
>>> +        if (lb_vip->vip_port) {
>>> +            ds_put_format(&unsnat_match, " && %s.dst == %d",
>>> +                          is_udp ? "udp" : "tcp", lb_vip->vip_port);
>>> +        }
>>> +
>>> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
>>> +                                ds_cstr(&unsnat_match), "next;", &lb->header_);
>>> +
>>> +        ds_destroy(&unsnat_match);
>>> +    }
>>> +
>>>        if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
>>>            return;
>>>        }
>>> @@ -7616,19 +7649,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>>>         * router has a gateway router port associated.
>>>         */
>>>        struct ds undnat_match = DS_EMPTY_INITIALIZER;
>>> -    if (lb_vip->addr_family == AF_INET) {
>>> -        ds_put_cstr(&undnat_match, "ip4 && (");
>>> -    } else {
>>> -        ds_put_cstr(&undnat_match, "ip6 && (");
>>> -    }
>>> +    ds_put_format(&undnat_match, "%s && (", ip_match);
>>>
>>>        for (size_t i = 0; i < lb_vip->n_backends; i++) {
>>>            struct lb_vip_backend *backend = &lb_vip->backends[i];
>>> -        if (backend->addr_family == AF_INET) {
>>> -            ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
>>> -        } else {
>>> -            ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
>>> -        }
>>> +        ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
>>>
>>>            if (backend->port) {
>>>                ds_put_format(&undnat_match, " && %s.src == %d) || ",
>>> @@ -8879,6 +8904,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>                                                &nat->header_);
>>>                        sset_add(&nat_entries, nat->external_ip);
>>>                    }
>>> +            } else {
>>> +                /* Add the NAT external_ip to the nat_entries even for
>>> +                 * gateway routers. This is required for adding load balancer
>>> +                 * flows. */
>>> +                sset_add(&nat_entries, nat->external_ip);
>>>                }
>>>
>>>                /* Egress UNDNAT table: It is for already established connections'
>>> @@ -9059,8 +9089,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>                }
>>>            }
>>>
>>> -        sset_destroy(&nat_entries);
>>> -
>>>            /* Handle force SNAT options set in the gateway router. */
>>>            if (dnat_force_snat_ip && !od->l3dgw_port) {
>>>                /* If a packet with destination IP address as that of the
>>> @@ -9121,6 +9149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>            /* Load balancing and packet defrag are only valid on
>>>             * Gateway routers or router with gateway port. */
>>>            if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
>>> +            sset_destroy(&nat_entries);
>>>                continue;
>>>            }
>>>
>>> @@ -9195,10 +9224,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>                    }
>>>                    add_router_lb_flow(lflows, od, &match, &actions, prio,
>>>                                       lb_force_snat_ip, lb_vip, is_udp,
>>> -                                   nb_lb, meter_groups);
>>> +                                   nb_lb, meter_groups, &nat_entries);
>>>                }
>>>            }
>>>            sset_destroy(&all_ips);
>>> +        sset_destroy(&nat_entries);
>>>        }
>>>
>>>        /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>> index a2989e78e..c783ea2ac 100644
>>> --- a/tests/ovn-northd.at
>>> +++ b/tests/ovn-northd.at
>>> @@ -1288,3 +1288,43 @@ ovn-nbctl --wait=sb lb-del lb2
>>>    OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor |  wc -l`])
>>>
>>>    AT_CLEANUP
>>> +
>>> +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
>>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>>> +ovn_start
>>> +
>>> +ovn-nbctl lr-add lr0
>>> +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
>>> +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
>>> +
>>> +ovn-nbctl set logical_router lr0 options:chassis=ch1
>>> +
>>> +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
>>> +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
>>> +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
>>> +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
>>> +
>>> +ovn-nbctl lr-lb-add lr0 lb1
>>> +ovn-nbctl lr-lb-add lr0 lb2
>>> +ovn-nbctl lr-lb-add lr0 lb3
>>> +ovn-nbctl lr-lb-add lr0 lb4
>>> +
>>> +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
>>> +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
>>> +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
>>> +
>>> +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
>>> +grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
>>> +
>>> +ovn-sbctl dump-flows lr0
>> Is this left over from debugging?
> 
> Yes. I'll clean it in v2.
> 
> Thanks
> Numan
> 
>>
>>> +
>>> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
>>> +grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
>>> +
>>> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
>>> +grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
>>> +
>>> +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
>>> +grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
>>> +
>>> +AT_CLEANUP
>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>> index 3b3379840..f1ae69b20 100644
>>> --- a/tests/system-ovn.at
>>> +++ b/tests/system-ovn.at
>>> @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
>>>    ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
>>>        external_ip=30.0.0.2 -- add logical_router R2 nat @nat
>>>
>>> -
>>>    # Wait for ovn-controller to catch up.
>>>    ovn-nbctl --wait=hv sync
>>>    OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
>>> @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
>>>    tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>>>    ])
>>>
>>> +check_est_flows () {
>>> +    n=$(ovs-ofctl dump-flows br-int table=14 | grep \
>>> +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
>>> +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
>>> +
>>> +    echo "n_packets=$n"
>>> +    test "$n" != 0
>>> +}
>>> +
>>> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
>>> +
>>> +
>>> +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
>>> +
>>> +# Destroy the load balancer and create it again. ovn-controller will
>>> +# clear the OF flows and re-add them, which resets the n_packets
>>> +# counts for these flows.
>>> +ovn-nbctl destroy load_balancer $uuid
>>> +uuid=`ovn-nbctl  create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
>>> +ovn-nbctl set logical_router R2 load_balancer=$uuid
>>> +
>>> +# Config OVN load-balancer with another VIP (this time with ports).
>>> +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
>>> +
>>> +ovn-nbctl list load_balancer
>>> +ovn-sbctl dump-flows R2
>>> +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
>>> +grep 'nat(src=20.0.0.2)'])
>>> +
>>> +dnl Test load-balancing that includes L4 ports in NAT.
>>> +for i in `seq 1 20`; do
>>> +    echo Request $i
>>> +    NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
>>> +done
>>> +
>>> +dnl Each server should have at least one connection.
>>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
>>> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>>> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
>>> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>> +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>>> +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>>> +])
>>> +
>>> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
>>> +
>>>    OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>>
>>>    as ovn-sb
>>>
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>

Patch

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 1e0993e07..0682b8b79 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -2138,6 +2138,20 @@  icmp6 {
           <code>redirect-chassis</code>.
         </p>
 
+        <p>
+          For each IPv4 address <var>A</var> defined as a load balancer
+          VIP (associated with the logical router) with protocol
+          <var>P</var> that is also present as an
+          <code>external_ip</code> in the NAT table, a priority-120
+          logical flow is added with the match
+          <code>ip4 &amp;&amp; ip4.dst == <var>A</var></code> and the
+          actions <code>next;</code> to advance the packet to the next
+          table. If the load balancer also has a protocol port
+          <var>B</var> defined, then the match also includes
+          <code><var>P</var>.dst == <var>B</var></code>. The same
+          flows are added for IPv6 load balancers.
+        </p>
+
         <p>
           A priority-0 logical flow with match <code>1</code> has actions
           <code>next;</code>.
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index b76df0554..18fbbc7f1 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -7574,7 +7574,7 @@  add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
                    struct ds *match, struct ds *actions, int priority,
                    const char *lb_force_snat_ip, struct lb_vip *lb_vip,
                    bool is_udp, struct nbrec_load_balancer *lb,
-                   struct shash *meter_groups)
+                   struct shash *meter_groups, struct sset *nat_entries)
 {
     build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
                               meter_groups);
@@ -7607,6 +7607,39 @@  add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
     free(new_match);
     free(est_match);
 
+    const char *ip_match = NULL;
+    if (lb_vip->addr_family == AF_INET) {
+        ip_match = "ip4";
+    } else {
+        ip_match = "ip6";
+    }
+
+    if (sset_contains(nat_entries, lb_vip->vip)) {
+        /* The load balancer vip is also present in the NAT entries.
+         * So add a high priority lflow to advance the packet
+         * destined to the vip (and the vip port if defined)
+         * in the S_ROUTER_IN_UNSNAT stage.
+         * There seems to be an issue with ovs-vswitchd. When the new
+         * connection packet destined for the lb vip is received,
+         * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
+         * conntrack zone. For the next packet, if it goes through
+         * unsnat stage, the conntrack flags are not set properly, and
+         * it doesn't hit the established state flows in
+         * S_ROUTER_IN_DNAT stage. */
+        struct ds unsnat_match = DS_EMPTY_INITIALIZER;
+        ds_put_format(&unsnat_match, "%s && %s.dst == %s",
+                      ip_match, ip_match, lb_vip->vip);
+        if (lb_vip->vip_port) {
+            ds_put_format(&unsnat_match, " && %s.dst == %d",
+                          is_udp ? "udp" : "tcp", lb_vip->vip_port);
+        }
+
+        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
+                                ds_cstr(&unsnat_match), "next;", &lb->header_);
+
+        ds_destroy(&unsnat_match);
+    }
+
     if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
         return;
     }
@@ -7616,19 +7649,11 @@  add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
      * router has a gateway router port associated.
      */
     struct ds undnat_match = DS_EMPTY_INITIALIZER;
-    if (lb_vip->addr_family == AF_INET) {
-        ds_put_cstr(&undnat_match, "ip4 && (");
-    } else {
-        ds_put_cstr(&undnat_match, "ip6 && (");
-    }
+    ds_put_format(&undnat_match, "%s && (", ip_match);
 
     for (size_t i = 0; i < lb_vip->n_backends; i++) {
         struct lb_vip_backend *backend = &lb_vip->backends[i];
-        if (backend->addr_family == AF_INET) {
-            ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
-        } else {
-            ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
-        }
+        ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
 
         if (backend->port) {
             ds_put_format(&undnat_match, " && %s.src == %d) || ",
@@ -8879,6 +8904,11 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                                             &nat->header_);
                     sset_add(&nat_entries, nat->external_ip);
                 }
+            } else {
+                /* Add the NAT external_ip to the nat_entries even for
+                 * gateway routers. This is required for adding load balancer
+                 * flows. */
+                sset_add(&nat_entries, nat->external_ip);
             }
 
             /* Egress UNDNAT table: It is for already established connections'
@@ -9059,8 +9089,6 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             }
         }
 
-        sset_destroy(&nat_entries);
-
         /* Handle force SNAT options set in the gateway router. */
         if (dnat_force_snat_ip && !od->l3dgw_port) {
             /* If a packet with destination IP address as that of the
@@ -9121,6 +9149,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         /* Load balancing and packet defrag are only valid on
          * Gateway routers or router with gateway port. */
         if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
+            sset_destroy(&nat_entries);
             continue;
         }
 
@@ -9195,10 +9224,11 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 }
                 add_router_lb_flow(lflows, od, &match, &actions, prio,
                                    lb_force_snat_ip, lb_vip, is_udp,
-                                   nb_lb, meter_groups);
+                                   nb_lb, meter_groups, &nat_entries);
             }
         }
         sset_destroy(&all_ips);
+        sset_destroy(&nat_entries);
     }
 
     /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index a2989e78e..c783ea2ac 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1288,3 +1288,43 @@  ovn-nbctl --wait=sb lb-del lb2
 OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor |  wc -l`])
 
 AT_CLEANUP
+
+AT_SETUP([ovn -- Load balancer VIP in NAT entries])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+ovn-nbctl lr-add lr0
+ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
+ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
+
+ovn-nbctl set logical_router lr0 options:chassis=ch1
+
+ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
+ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
+ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
+ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
+
+ovn-nbctl lr-lb-add lr0 lb1
+ovn-nbctl lr-lb-add lr0 lb2
+ovn-nbctl lr-lb-add lr0 lb3
+ovn-nbctl lr-lb-add lr0 lb4
+
+ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
+ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
+ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
+
+OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
+grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
+
+ovn-sbctl dump-flows lr0
+
+AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
+grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
+
+AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
+grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
+
+AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
+grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
+
+AT_CLEANUP
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 3b3379840..f1ae69b20 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -1635,7 +1635,6 @@  ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
 ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
     external_ip=30.0.0.2 -- add logical_router R2 nat @nat
 
-
 # Wait for ovn-controller to catch up.
 ovn-nbctl --wait=hv sync
 OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
@@ -1671,6 +1670,56 @@  tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
 tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
 ])
 
+check_est_flows () {
+    n=$(ovs-ofctl dump-flows br-int table=14 | grep \
+"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
+| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
+
+    echo "n_packets=$n"
+    test "$n" != 0
+}
+
+OVS_WAIT_UNTIL([check_est_flows], [check established flows])
+
+
+ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
+
+# Destroy the load balancer and create it again. ovn-controller will
+# clear the OF flows and re-add them, which resets the n_packets
+# counts for these flows.
+ovn-nbctl destroy load_balancer $uuid
+uuid=`ovn-nbctl  create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
+ovn-nbctl set logical_router R2 load_balancer=$uuid
+
+# Config OVN load-balancer with another VIP (this time with ports).
+ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
+
+ovn-nbctl list load_balancer
+ovn-sbctl dump-flows R2
+OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
+grep 'nat(src=20.0.0.2)'])
+
+dnl Test load-balancing that includes L4 ports in NAT.
+for i in `seq 1 20`; do
+    echo Request $i
+    NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
+done
+
+dnl Each server should have at least one connection.
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
+tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
+tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+])
+
+OVS_WAIT_UNTIL([check_est_flows], [check established flows])
+
 OVS_APP_EXIT_AND_WAIT([ovn-controller])
 
 as ovn-sb