
[ovs-dev] northd: Use separate SNAT for already-DNATted traffic.

Message ID: 20220309193810.246221-1-mmichels@redhat.com
State: Changes Requested
Series: [ovs-dev] northd: Use separate SNAT for already-DNATted traffic.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Mark Michelson March 9, 2022, 7:38 p.m. UTC
Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
in ovn-northd to use a single zone for SNAT and DNAT when possible. It
accounts for certain situations, such as hairpinned traffic, where we
still need a separate SNAT and DNAT zone.

However, one situation it did not account for is when traffic traverses
a logical router and is DNATted as a result of a load balancer, and
then needs to be SNATted when it egresses the router. In this
situation, we try to use the same CT zone for the SNAT as for the load
balancer DNAT, which does not work.

This commit fixes the issue by installing a flow in the OUT_SNAT stage
of the logical router egress pipeline for each SNAT. This flow will fall
back to using a separate CT zone for SNAT if the ct_label.natted flag is
set.
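
For example, with an SNAT entry mapping logical IP 192.168.0.1 to
external IP 10.0.0.1 on gateway port r0p1 (the setup used in the
ovn-northd.at test below), the lr_out_snat stage ends up with three
flows, the highest-priority one covering the already-natted case:

  table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && ct_label.natted == 1), action=(ct_snat(10.0.0.1);)
  table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[4] == 1), action=(reg9[4] = 0; ct_snat(10.0.0.1);)
  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1")), action=(ct_snat_in_czone(10.0.0.1);)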

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127

Signed-off-by: Mark Michelson <mmichels@redhat.com>
---
Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
fail on some systems, and at this point it's not clear what causes it.
On my development system running Fedora 33 and kernel 5.14.18-100, the
test fails because the pings fail. In a container running Fedora 33 and
kernel 5.10.19-200, the test succeeds. It's unknown if it's the kernel
version or some other setting that has a bearing on the success or
failure of the test. The OpenFlow that ovn-controller generates is
identical in both the success and failure scenarios.

If this inconsistency causes issues with CI, then this patch may need to
be re-examined to determine if there are other parameters that may be
needed to correct the referenced issue.
---
 northd/northd.c     |  53 ++++++++++++++------
 tests/ovn-northd.at |  36 ++++++++++++++
 tests/system-ovn.at | 117 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 190 insertions(+), 16 deletions(-)

Comments

Numan Siddique March 18, 2022, 8:34 p.m. UTC | #1
On Wed, Mar 9, 2022 at 2:38 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
> accounts for certain situations, such as hairpinned traffic, where we
> still need a separate SNAT and DNAT zone.
>
> However, one situation it did not account for is when traffic traverses
> a logical router and is DNATted as a result of a load balancer, and
> then needs to be SNATted when it egresses the router. In this
> situation, we try to use the same CT zone for the SNAT as for the load
> balancer DNAT, which does not work.
>
> This commit fixes the issue by installing a flow in the OUT_SNAT stage
> of the logical router egress pipeline for each SNAT. This flow will fall
> back to using a separate CT zone for SNAT if the ct_label.natted flag is
> set.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127
>
> Signed-off-by: Mark Michelson <mmichels@redhat.com>


Hi Mark,

A few comments:

 - There are a few test cases failing with this patch.

 - I'm pretty sure the issue will not be seen if the load balancer is
also associated with the logical switch.  In that case, DNAT would
happen in the logical switch pipeline.
   But we should still address this issue, as it's a regression
introduced by commit 4deac4509abb.
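
   For illustration, that association would be something like the
following (reusing the s1 and r1_lb names from the system-ovn.at test
in this patch):

    ovn-nbctl ls-lb-add s1 r1_lb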

 - The only small concern I have with this patch is the usage of the
ct_label.natted field in the egress pipeline.  In the ingress pipeline
of the router we send the packet to conntrack and then do NAT.
   I'm just worried about what happens if the ct states/labels are not
valid, or if they get cleared as the packet moves from the ingress
pipeline to the egress pipeline.  It looks like we don't clear the
conntrack state now, but future patches could.  I think we should
document this somewhere.

-  This patch needs to add the relevant documentation in
ovn-northd.8.xml for the newly added flows.

-  Instead of adding the flows in the "lr_out_snat" stage, I'd
suggest adding the flow below in the "lr_out_chk_dnat_local" stage:

    table=0 (lr_out_chk_dnat_local), priority=50   , match=(ip &&
ct_label.natted == 1),  action=(reg9[4] = 1; next;)

We already have flows in the "lr_out_snat" stage that use the
"ct_snat" action instead of "ct_snat_in_czone" when reg9[4] is set.
Can you please see if this suggestion is possible?
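
For reference, with that approach the already-natted case would be
handled by the existing reg9[4] flows; using the SNAT from the
ovn-northd.at test in this patch, the pair would look roughly like this
(a sketch, not verified output):

    table=0 (lr_out_chk_dnat_local), priority=50   , match=(ip && ct_label.natted == 1), action=(reg9[4] = 1; next;)
    table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[4] == 1), action=(reg9[4] = 0; ct_snat(10.0.0.1);)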


Thanks
Numan


> ---
> Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
> fail on some systems, and at this point it's not clear what causes it.
> On my development system running Fedora 33 and kernel 5.14.18-100, the
> test fails because the pings fail. In a container running Fedora 33 and
> kernel 5.10.19-200, the test succeeds. It's unknown if it's the kernel
> version or some other setting that has a bearing on the success or
> failure of the test. The OpenFlow that ovn-controller generates is
> identical in both the success and failure scenarios.
>
> If this inconsistency causes issues with CI, then this patch may need to
> be re-examined to determine if there are other parameters that may be
> needed to correct the referenced issue.
> ---
>  northd/northd.c     |  53 ++++++++++++++------
>  tests/ovn-northd.at |  36 ++++++++++++++
>  tests/system-ovn.at | 117 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 190 insertions(+), 16 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index b264fb850..866a81310 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -12961,23 +12961,44 @@ build_lrouter_out_snat_flow(struct hmap *lflows, struct ovn_datapath *od,
>                                  priority, ds_cstr(match),
>                                  ds_cstr(actions), &nat->header_);
>
> -        if (!stateless) {
> -            ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
> -            ds_clear(actions);
> -            if (distributed) {
> -                ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
> -                              ETH_ADDR_ARGS(mac));
> -            }
> -            ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; ct_snat(%s",
> -                          nat->external_ip);
> -            if (nat->external_port_range[0]) {
> -                ds_put_format(actions, ",%s", nat->external_port_range);
> -            }
> -            ds_put_format(actions, ");");
> -            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> -                                    priority + 1, ds_cstr(match),
> -                                    ds_cstr(actions), &nat->header_);
> +        if (stateless) {
> +            return;
> +        }
> +
> +        struct ds match_natted = DS_EMPTY_INITIALIZER;
> +        struct ds actions_natted = DS_EMPTY_INITIALIZER;
> +
> +        ds_clone(&match_natted, match);
> +
> +        ds_clear(actions);
> +
> +        if (distributed) {
> +            ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
> +                          ETH_ADDR_ARGS(mac));
>          }
> +
> +        ds_clone(&actions_natted, actions);
> +
> +        ds_put_format(&actions_natted, "ct_snat(%s", nat->external_ip);
> +        if (nat->external_port_range[0]) {
> +            ds_put_format(&actions_natted, ",%s", nat->external_port_range);
> +        }
> +        ds_put_format(&actions_natted, ");");
> +
> +        ds_put_cstr(&match_natted, " && ct_label.natted == 1");
> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> +                                priority + 2, ds_cstr(&match_natted),
> +                                ds_cstr(&actions_natted), &nat->header_);
> +
> +        ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
> +        ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; %s",
> +                      ds_cstr(&actions_natted));
> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> +                                priority + 1, ds_cstr(match),
> +                                ds_cstr(actions), &nat->header_);
> +
> +        ds_destroy(&match_natted);
> +        ds_destroy(&actions_natted);
>      }
>  }
>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 60d91a771..d9657bcb8 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -5974,6 +5974,42 @@ AT_CHECK([grep -e "(lr_in_ip_routing   ).*outport" lr0flows | sed 's/table=../ta
>    table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;)
>  ])
>
> +AT_CLEANUP
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([nat -- SNAT flows for DNAT traffic])
> +# This test makes sure that SNAT flows are added that ensure a unique
> +# SNAT zone is used for traffic that is natted. We don't need an elaborate
> +# setup just to ensure the flow is present. We just need to create the router
> +# and add an SNAT to it.
> +
> +ovn_start
> +
> +check ovn-nbctl lr-add r0
> +check ovn-nbctl lrp-add r0 r0p1 00:00:00:00:00:01 10.0.0.1
> +check ovn-nbctl lrp-set-gateway-chassis r0p1 hv1
> +check ovn-nbctl lr-nat-add r0 snat 10.0.0.1 192.168.0.1
> +check ovn-nbctl --wait=sb sync
> +
> +ovn-sbctl lflow-list r0 | grep lr_out_snat | sed 's/table=../table=??/' > snat_flows
> +
> +# We expect three SNAT flows for the NAT we created.
> +# Priority-161 flow that uses CT zone stored in reg11
> +# Priority-162 flow that uses CT zone stored in reg12 if local dnat is detected
> +# Priority-163 flow that uses CT zone stored in reg12 if traffic is natted
> +# Since priorities are unique, we can grep based on that and check the resulting
> +# flows are as expected.
> +
> +AT_CHECK([grep priority=163 snat_flows], [0], [dnl
> +  table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && ct_label.natted == 1), action=(ct_snat(10.0.0.1);)
> +])
> +AT_CHECK([grep priority=162 snat_flows], [0], [dnl
> +  table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[[4]] == 1), action=(reg9[[4]] = 0; ct_snat(10.0.0.1);)
> +])
> +AT_CHECK([grep priority=161 snat_flows], [0], [dnl
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1")), action=(ct_snat_in_czone(10.0.0.1);)
> +])
> +
>  AT_CLEANUP
>  ])
>
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index d4c22e7e9..0b29becc8 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -7975,3 +7975,120 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([SNAT in separate zone from DNAT])
> +
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_NAT()
> +ovn_start
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-int])
> +
> +# Set external-ids in br-int needed for ovn-controller
> +ovs-vsctl \
> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> +
> +# The goal of this test is to ensure that when traffic is first DNATted
> +# (by way of a load balancer), and then SNATted, the SNAT happens in a
> +# separate conntrack zone from the DNAT.
> +
> +start_daemon ovn-controller
> +
> +check ovn-nbctl ls-add public
> +
> +check ovn-nbctl lr-add r1
> +check ovn-nbctl lrp-add r1 r1_public 00:de:ad:ff:00:01 172.16.0.1/16
> +check ovn-nbctl lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24
> +check ovn-nbctl lrp-set-gateway-chassis r1_public hv1
> +
> +check ovn-nbctl lb-add r1_lb 30.0.0.1 172.16.0.102
> +check ovn-nbctl lr-lb-add r1 r1_lb
> +
> +check ovn-nbctl ls-add s1
> +check ovn-nbctl lsp-add s1 s1_r1
> +check ovn-nbctl lsp-set-type s1_r1 router
> +check ovn-nbctl lsp-set-addresses s1_r1 router
> +check ovn-nbctl lsp-set-options s1_r1 router-port=r1_s1
> +
> +check ovn-nbctl lsp-add s1 vm1
> +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2"
> +
> +check ovn-nbctl lsp-add public public_r1
> +check ovn-nbctl lsp-set-type public_r1 router
> +check ovn-nbctl lsp-set-addresses public_r1 router
> +check ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
> +
> +check ovn-nbctl lr-add r2
> +check ovn-nbctl lrp-add r2 r2_public 00:de:ad:ff:00:02 172.16.0.2/16
> +check ovn-nbctl lrp-add r2 r2_s2 00:de:ad:fe:00:02 173.0.2.1/24
> +check ovn-nbctl lr-nat-add r2 dnat_and_snat 172.16.0.102 173.0.2.2
> +check ovn-nbctl lrp-set-gateway-chassis r2_public hv1
> +
> +check ovn-nbctl ls-add s2
> +check ovn-nbctl lsp-add s2 s2_r2
> +check ovn-nbctl lsp-set-type s2_r2 router
> +check ovn-nbctl lsp-set-addresses s2_r2 router
> +check ovn-nbctl lsp-set-options s2_r2 router-port=r2_s2
> +
> +check ovn-nbctl lsp-add s2 vm2
> +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2"
> +
> +check ovn-nbctl lsp-add public public_r2
> +check ovn-nbctl lsp-set-type public_r2 router
> +check ovn-nbctl lsp-set-addresses public_r2 router
> +check ovn-nbctl lsp-set-options public_r2 router-port=r2_public nat-addresses=router
> +
> +ADD_NAMESPACES(vm1)
> +ADD_VETH(vm1, vm1, br-int, "173.0.1.2/24", "00:de:ad:01:00:01", \
> +         "173.0.1.1")
> +ADD_NAMESPACES(vm2)
> +ADD_VETH(vm2, vm2, br-int, "173.0.2.2/24", "00:de:ad:01:00:02", \
> +         "173.0.2.1")
> +
> +check ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
> +check ovn-nbctl --wait=hv sync
> +
> +# Next, make sure that a ping works as expected
> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.1 | FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +# Finally, make sure that conntrack shows two separate zones being used for
> +# DNAT and SNAT
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=173.0.1.2,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x2
> +])
> +
> +# The final two entries appear identical here. That is because FORMAT_CT
> +# scrubs the zone numbers. In actuality, the zone numbers are different,
> +# which is why there are two entries.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.102) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +icmp,orig=(src=172.16.0.101,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=173.0.2.2,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> +])
> +
> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> +
> +as ovn-sb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as ovn-nb
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as northd
> +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
> +
> +as
> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> +/connection dropped.*/d"])
> +AT_CLEANUP
> +])
> --
> 2.31.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Mark Michelson March 24, 2022, 8:46 p.m. UTC | #2
Hi Numan,

This message got buried in my inbox, so sorry about only replying now. I
have some replies inline below.

On 3/18/22 16:34, Numan Siddique wrote:
> On Wed, Mar 9, 2022 at 2:38 PM Mark Michelson <mmichels@redhat.com> wrote:
>>
>> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
>> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
>> accounts for certain situations, such as hairpinned traffic, where we
>> still need a separate SNAT and DNAT zone.
>>
>> However, one situation it did not account for is when traffic traverses
>> a logical router and is DNATted as a result of a load balancer, and
>> then needs to be SNATted when it egresses the router. In this
>> situation, we try to use the same CT zone for the SNAT as for the load
>> balancer DNAT, which does not work.
>>
>> This commit fixes the issue by installing a flow in the OUT_SNAT stage
>> of the logical router egress pipeline for each SNAT. This flow will fall
>> back to using a separate CT zone for SNAT if the ct_label.natted flag is
>> set.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127
>>
>> Signed-off-by: Mark Michelson <mmichels@redhat.com>
> 
> 
> Hi Mark,
> 
> A few comments:
> 
>   - There are a few test cases failing with this patch.

Interesting. I'll have to double-check that, but I hadn't noticed
it happening locally.

> 
>   - I'm pretty sure the issue will not be seen if the load balancer is
> also associated with the logical switch.  In that case, DNAT would
> happen in the logical switch pipeline.
>     But we should still address this issue, as it's a regression
> introduced by commit 4deac4509abb.
> 
>   - The only small concern I have with this patch is the usage of the
> ct_label.natted field in the egress pipeline.  In the ingress pipeline
> of the router we send the packet to conntrack and then do NAT.
>     I'm just worried about what happens if the ct states/labels are not
> valid, or if they get cleared as the packet moves from the ingress
> pipeline to the egress pipeline.  It looks like we don't clear the
> conntrack state now, but future patches could.  I think we should
> document this somewhere.

The tricky thing here is that I don't know of any other reliable way to
tell in the egress pipeline whether the packet was natted in
the ingress pipeline. AFAIK, there is no scenario where the router 
egress pipeline can run on a different node than the router ingress 
pipeline, so at least the way it currently works should be reliable.

Where is the best place to document this?

> 
> -  This patch needs to add the relevant documentation in
> ovn-northd.8.xml for the newly added flows.
> 
> -  Instead of adding the flows in the "lr_out_snat" stage, I'd
> suggest adding the flow below in the "lr_out_chk_dnat_local" stage:
> 
>      table=0 (lr_out_chk_dnat_local), priority=50   , match=(ip &&
> ct_label.natted == 1),  action=(reg9[4] = 1; next;)
> 
> We already have flows in the "lr_out_snat" stage that use the
> "ct_snat" action instead of "ct_snat_in_czone" when reg9[4] is set.
> Can you please see if this suggestion is possible?

That sounds perfectly reasonable. I'll test it out and see if it works.

> 
> 
> Thanks
> Numan


> 
> 
>> ---
>> Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
>> fail on some systems, and at this point it's not clear what causes it.
>> On my development system running Fedora 33 and kernel 5.14.18-100, the
>> test fails because the pings fail. In a container running Fedora 33 and
>> kernel 5.10.19-200, the test succeeds. It's unknown if it's the kernel
>> version or some other setting that has a bearing on the success or
>> failure of the test. The OpenFlow that ovn-controller generates is
>> identical in both the success and failure scenarios.
>>
>> If this inconsistency causes issues with CI, then this patch may need to
>> be re-examined to determine if there are other parameters that may be
>> needed to correct the referenced issue.
>> ---
>>   northd/northd.c     |  53 ++++++++++++++------
>>   tests/ovn-northd.at |  36 ++++++++++++++
>>   tests/system-ovn.at | 117 ++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 190 insertions(+), 16 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index b264fb850..866a81310 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -12961,23 +12961,44 @@ build_lrouter_out_snat_flow(struct hmap *lflows, struct ovn_datapath *od,
>>                                   priority, ds_cstr(match),
>>                                   ds_cstr(actions), &nat->header_);
>>
>> -        if (!stateless) {
>> -            ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
>> -            ds_clear(actions);
>> -            if (distributed) {
>> -                ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
>> -                              ETH_ADDR_ARGS(mac));
>> -            }
>> -            ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; ct_snat(%s",
>> -                          nat->external_ip);
>> -            if (nat->external_port_range[0]) {
>> -                ds_put_format(actions, ",%s", nat->external_port_range);
>> -            }
>> -            ds_put_format(actions, ");");
>> -            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
>> -                                    priority + 1, ds_cstr(match),
>> -                                    ds_cstr(actions), &nat->header_);
>> +        if (stateless) {
>> +            return;
>> +        }
>> +
>> +        struct ds match_natted = DS_EMPTY_INITIALIZER;
>> +        struct ds actions_natted = DS_EMPTY_INITIALIZER;
>> +
>> +        ds_clone(&match_natted, match);
>> +
>> +        ds_clear(actions);
>> +
>> +        if (distributed) {
>> +            ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
>> +                          ETH_ADDR_ARGS(mac));
>>           }
>> +
>> +        ds_clone(&actions_natted, actions);
>> +
>> +        ds_put_format(&actions_natted, "ct_snat(%s", nat->external_ip);
>> +        if (nat->external_port_range[0]) {
>> +            ds_put_format(&actions_natted, ",%s", nat->external_port_range);
>> +        }
>> +        ds_put_format(&actions_natted, ");");
>> +
>> +        ds_put_cstr(&match_natted, " && ct_label.natted == 1");
>> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
>> +                                priority + 2, ds_cstr(&match_natted),
>> +                                ds_cstr(&actions_natted), &nat->header_);
>> +
>> +        ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
>> +        ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; %s",
>> +                      ds_cstr(&actions_natted));
>> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
>> +                                priority + 1, ds_cstr(match),
>> +                                ds_cstr(actions), &nat->header_);
>> +
>> +        ds_destroy(&match_natted);
>> +        ds_destroy(&actions_natted);
>>       }
>>   }
>>
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index 60d91a771..d9657bcb8 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -5974,6 +5974,42 @@ AT_CHECK([grep -e "(lr_in_ip_routing   ).*outport" lr0flows | sed 's/table=../ta
>>     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;)
>>   ])
>>
>> +AT_CLEANUP
>> +
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([nat -- SNAT flows for DNAT traffic])
>> +# This test makes sure that SNAT flows are added that ensure a unique
>> +# SNAT zone is used for traffic that is natted. We don't need an elaborate
>> +# setup just to ensure the flow is present. We just need to create the router
>> +# and add an SNAT to it.
>> +
>> +ovn_start
>> +
>> +check ovn-nbctl lr-add r0
>> +check ovn-nbctl lrp-add r0 r0p1 00:00:00:00:00:01 10.0.0.1
>> +check ovn-nbctl lrp-set-gateway-chassis r0p1 hv1
>> +check ovn-nbctl lr-nat-add r0 snat 10.0.0.1 192.168.0.1
>> +check ovn-nbctl --wait=sb sync
>> +
>> +ovn-sbctl lflow-list r0 | grep lr_out_snat | sed 's/table=../table=??/' > snat_flows
>> +
>> +# We expect three SNAT flows for the NAT we created.
>> +# Priority-161 flow that uses CT zone stored in reg11
>> +# Priority-162 flow that uses CT zone stored in reg12 if local dnat is detected
>> +# Priority-163 flow that uses CT zone stored in reg12 if traffic is natted
>> +# Since priorities are unique, we can grep based on that and check the resulting
>> +# flows are as expected.
>> +
>> +AT_CHECK([grep priority=163 snat_flows], [0], [dnl
>> +  table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && ct_label.natted == 1), action=(ct_snat(10.0.0.1);)
>> +])
>> +AT_CHECK([grep priority=162 snat_flows], [0], [dnl
>> +  table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[[4]] == 1), action=(reg9[[4]] = 0; ct_snat(10.0.0.1);)
>> +])
>> +AT_CHECK([grep priority=161 snat_flows], [0], [dnl
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1")), action=(ct_snat_in_czone(10.0.0.1);)
>> +])
>> +
>>   AT_CLEANUP
>>   ])
>>
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>> index d4c22e7e9..0b29becc8 100644
>> --- a/tests/system-ovn.at
>> +++ b/tests/system-ovn.at
>> @@ -7975,3 +7975,120 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>
>>   AT_CLEANUP
>>   ])
>> +
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([SNAT in separate zone from DNAT])
>> +
>> +CHECK_CONNTRACK()
>> +CHECK_CONNTRACK_NAT()
>> +ovn_start
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +ADD_BR([br-int])
>> +
>> +# Set external-ids in br-int needed for ovn-controller
>> +ovs-vsctl \
>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
>> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
>> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
>> +
>> +# The goal of this test is to ensure that when traffic is first DNATted
>> +# (by way of a load balancer), and then SNATted, the SNAT happens in a
>> +# separate conntrack zone from the DNAT.
>> +
>> +start_daemon ovn-controller
>> +
>> +check ovn-nbctl ls-add public
>> +
>> +check ovn-nbctl lr-add r1
>> +check ovn-nbctl lrp-add r1 r1_public 00:de:ad:ff:00:01 172.16.0.1/16
>> +check ovn-nbctl lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24
>> +check ovn-nbctl lrp-set-gateway-chassis r1_public hv1
>> +
>> +check ovn-nbctl lb-add r1_lb 30.0.0.1 172.16.0.102
>> +check ovn-nbctl lr-lb-add r1 r1_lb
>> +
>> +check ovn-nbctl ls-add s1
>> +check ovn-nbctl lsp-add s1 s1_r1
>> +check ovn-nbctl lsp-set-type s1_r1 router
>> +check ovn-nbctl lsp-set-addresses s1_r1 router
>> +check ovn-nbctl lsp-set-options s1_r1 router-port=r1_s1
>> +
>> +check ovn-nbctl lsp-add s1 vm1
>> +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2"
>> +
>> +check ovn-nbctl lsp-add public public_r1
>> +check ovn-nbctl lsp-set-type public_r1 router
>> +check ovn-nbctl lsp-set-addresses public_r1 router
>> +check ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
>> +
>> +check ovn-nbctl lr-add r2
>> +check ovn-nbctl lrp-add r2 r2_public 00:de:ad:ff:00:02 172.16.0.2/16
>> +check ovn-nbctl lrp-add r2 r2_s2 00:de:ad:fe:00:02 173.0.2.1/24
>> +check ovn-nbctl lr-nat-add r2 dnat_and_snat 172.16.0.102 173.0.2.2
>> +check ovn-nbctl lrp-set-gateway-chassis r2_public hv1
>> +
>> +check ovn-nbctl ls-add s2
>> +check ovn-nbctl lsp-add s2 s2_r2
>> +check ovn-nbctl lsp-set-type s2_r2 router
>> +check ovn-nbctl lsp-set-addresses s2_r2 router
>> +check ovn-nbctl lsp-set-options s2_r2 router-port=r2_s2
>> +
>> +check ovn-nbctl lsp-add s2 vm2
>> +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2"
>> +
>> +check ovn-nbctl lsp-add public public_r2
>> +check ovn-nbctl lsp-set-type public_r2 router
>> +check ovn-nbctl lsp-set-addresses public_r2 router
>> +check ovn-nbctl lsp-set-options public_r2 router-port=r2_public nat-addresses=router
>> +
>> +ADD_NAMESPACES(vm1)
>> +ADD_VETH(vm1, vm1, br-int, "173.0.1.2/24", "00:de:ad:01:00:01", \
>> +         "173.0.1.1")
>> +ADD_NAMESPACES(vm2)
>> +ADD_VETH(vm2, vm2, br-int, "173.0.2.2/24", "00:de:ad:01:00:02", \
>> +         "173.0.2.1")
>> +
>> +check ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
>> +check ovn-nbctl --wait=hv sync
>> +
>> +# Next, make sure that a ping works as expected
>> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.1 | FORMAT_PING], \
>> +[0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +# Finally, make sure that conntrack shows two separate zones being used for
>> +# DNAT and SNAT
>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
>> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>> +icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=173.0.1.2,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x2
>> +])
>> +
>> +# The final two entries appear identical here. That is because FORMAT_CT
>> +# scrubs the zone numbers. In actuality, the zone numbers are different,
>> +# which is why there are two entries.
>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.102) | \
>> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>> +icmp,orig=(src=172.16.0.101,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=173.0.2.2,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
>> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
>> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
>> +])
>> +
>> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
>> +
>> +as ovn-sb
>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> +
>> +as ovn-nb
>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> +
>> +as northd
>> +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
>> +
>> +as
>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>> +/connection dropped.*/d"])
>> +AT_CLEANUP
>> +])
>> --
>> 2.31.1
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Numan Siddique March 24, 2022, 9:17 p.m. UTC | #3
On Thu, Mar 24, 2022 at 4:47 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Hi Numan,
>
> This message got buried in my inbox, so sorry about only replying now. I
> have some replies in-line below
>
> On 3/18/22 16:34, Numan Siddique wrote:
> > On Wed, Mar 9, 2022 at 2:38 PM Mark Michelson <mmichels@redhat.com> wrote:
> >>
> >> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
> >> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
> >> accounts for certain situations, such as hairpinned traffic, where we
> >> still need a separate SNAT and DNAT zone.
> >>
> >> However, one situation it did not account for is when traffic traverses
> >> a logical router and is DNATted as a result of a load balancer, and
> >> then needs to be SNATted when it egresses the router. In this
> >> situation, we try to use the same CT zone for the SNAT as for the load
> >> balancer DNAT, which does not work.
> >>
> >> This commit fixes the issue by installing a flow in the OUT_SNAT stage
> >> of the logical router egress pipeline for each SNAT. This flow will fall
> >> back to using a separate CT zone for SNAT if the ct_label.natted flag is
> >> set.
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127
> >>
> >> Signed-off-by: Mark Michelson <mmichels@redhat.com>
> >
> >
> > Hi Mark,
> >
> > A few comments:
> >
> >   - There are a few test cases failing with this patch.
>
> Interesting. I'll have to double-check that, but I hadn't noticed
> it happening locally.
>
> >
> >   - I'm pretty sure the issue will not be seen if the load balancer is
> > also associated with the logical switch.  In that case, DNAT would
> > happen in the logical switch pipeline.
> >     But we should still address this issue, as it's a regression
> > introduced by commit 4deac4509abb.
> >
> >   - The only small concern I have with this patch is the usage of the
> > ct_label.natted field in the egress pipeline.  In the ingress pipeline
> > of the router we send the packet to conntrack and then do NAT.
> >     I'm just worried about what happens if the ct states/labels are not
> > valid, or if they get cleared as the packet moves from the ingress
> > pipeline to the egress pipeline.  It looks like we don't clear the
> > conntrack state now, but future patches could.  I think we should
> > document this somewhere.
>
> The tricky thing here is that I don't know of any other reliable way to
> tell in the egress pipeline whether the packet was natted in
> the ingress pipeline. AFAIK, there is no scenario where the router
> egress pipeline can run on a different node than the router ingress
> pipeline, so at least the way it currently works should be reliable.
>
> Where is the best place to document this?

I guess a code comment in northd.c would be helpful.  I'd suggest seeing
if we can document it in ovn-northd.8.xml while updating that file with
the newly added logical flow(s).  I don't have any strong preference, so
please see if it makes sense to document it in ovn-northd.8.xml too.
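
For example, something along these lines near
build_lrouter_out_snat_flow() in northd.c (just a sketch of the
wording):

    /* The egress pipeline matches on ct_label.natted, which is set when
     * the packet passes through conntrack in the router ingress
     * pipeline.  This works because the router ingress and egress
     * pipelines always run on the same node, and nothing currently
     * clears the conntrack state between them.  If a future change
     * clears CT state between the pipelines, these flows must be
     * revisited. */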

Numan

>
> >
> > -  This patch needs to add the relevant documentation in
> > ovn-northd.8.xml for the newly added flows.
> >
> > -  Instead of adding the flows in the "lr_out_snat" stage, I'd
> > suggest adding the flow below in the "lr_out_chk_dnat_local" stage:
> >
> >      table=0 (lr_out_chk_dnat_local), priority=50   , match=(ip &&
> > ct_label.natted == 1),  action=(reg9[4] = 1; next;)
> >
> > We already have flows in the "lr_out_snat" stage that use the
> > "ct_snat" action instead of "ct_snat_in_czone" when reg9[4] is set.
> > Can you please see if this suggestion is possible?
>
> That sounds perfectly reasonable. I'll test it out and see if it works.
>
> >
> >
> > Thanks
> > Numan
>
>
> >
> >
> >> ---
> >> Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
> >> fail on some systems, and at this point it's not clear what causes it.
> >> On my development system running Fedora 33 and kernel 5.14.18-100, the
> >> test fails because the pings fail. In a container running Fedora 33 and
> >> kernel 5.10.19-200, the test succeeds. It's unknown if it's the kernel
> >> version or some other setting that has a bearing on the success or
> >> failure of the test. The OpenFlow that ovn-controller generates is
> >> identical in both the success and failure scenarios.
> >>
> >> If this inconsistency causes issues with CI, then this patch may need to
> >> be re-examined to determine if there are other parameters that may be
> >> needed to correct the referenced issue.
> >> ---
> >>   northd/northd.c     |  53 ++++++++++++++------
> >>   tests/ovn-northd.at |  36 ++++++++++++++
> >>   tests/system-ovn.at | 117 ++++++++++++++++++++++++++++++++++++++++++++
> >>   3 files changed, 190 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index b264fb850..866a81310 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -12961,23 +12961,44 @@ build_lrouter_out_snat_flow(struct hmap *lflows, struct ovn_datapath *od,
> >>                                   priority, ds_cstr(match),
> >>                                   ds_cstr(actions), &nat->header_);
> >>
> >> -        if (!stateless) {
> >> -            ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
> >> -            ds_clear(actions);
> >> -            if (distributed) {
> >> -                ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
> >> -                              ETH_ADDR_ARGS(mac));
> >> -            }
> >> -            ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; ct_snat(%s",
> >> -                          nat->external_ip);
> >> -            if (nat->external_port_range[0]) {
> >> -                ds_put_format(actions, ",%s", nat->external_port_range);
> >> -            }
> >> -            ds_put_format(actions, ");");
> >> -            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> >> -                                    priority + 1, ds_cstr(match),
> >> -                                    ds_cstr(actions), &nat->header_);
> >> +        if (stateless) {
> >> +            return;
> >> +        }
> >> +
> >> +        struct ds match_natted = DS_EMPTY_INITIALIZER;
> >> +        struct ds actions_natted = DS_EMPTY_INITIALIZER;
> >> +
> >> +        ds_clone(&match_natted, match);
> >> +
> >> +        ds_clear(actions);
> >> +
> >> +        if (distributed) {
> >> +            ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
> >> +                          ETH_ADDR_ARGS(mac));
> >>           }
> >> +
> >> +        ds_clone(&actions_natted, actions);
> >> +
> >> +        ds_put_format(&actions_natted, "ct_snat(%s", nat->external_ip);
> >> +        if (nat->external_port_range[0]) {
> >> +            ds_put_format(&actions_natted, ",%s", nat->external_port_range);
> >> +        }
> >> +        ds_put_format(&actions_natted, ");");
> >> +
> >> +        ds_put_cstr(&match_natted, " && ct_label.natted == 1");
> >> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> >> +                                priority + 2, ds_cstr(&match_natted),
> >> +                                ds_cstr(&actions_natted), &nat->header_);
> >> +
> >> +        ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
> >> +        ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; %s",
> >> +                      ds_cstr(&actions_natted));
> >> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
> >> +                                priority + 1, ds_cstr(match),
> >> +                                ds_cstr(actions), &nat->header_);
> >> +
> >> +        ds_destroy(&match_natted);
> >> +        ds_destroy(&actions_natted);
> >>       }
> >>   }
> >>
> >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >> index 60d91a771..d9657bcb8 100644
> >> --- a/tests/ovn-northd.at
> >> +++ b/tests/ovn-northd.at
> >> @@ -5974,6 +5974,42 @@ AT_CHECK([grep -e "(lr_in_ip_routing   ).*outport" lr0flows | sed 's/table=../ta
> >>     table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;)
> >>   ])
> >>
> >> +AT_CLEANUP
> >> +
> >> +OVN_FOR_EACH_NORTHD([
> >> +AT_SETUP([nat -- SNAT flows for DNAT traffic])
> >> +# This test makes sure that SNAT flows are added that ensure a unique
> >> +# SNAT zone is used for traffic that is natted. We don't need an elaborate
> >> +# setup just to ensure the flow is present. We just need to create the router
> >> +# and add an SNAT to it.
> >> +
> >> +ovn_start
> >> +
> >> +check ovn-nbctl lr-add r0
> >> +check ovn-nbctl lrp-add r0 r0p1 00:00:00:00:00:01 10.0.0.1
> >> +check ovn-nbctl lrp-set-gateway-chassis r0p1 hv1
> >> +check ovn-nbctl lr-nat-add r0 snat 10.0.0.1 192.168.0.1
> >> +check ovn-nbctl --wait=sb sync
> >> +
> >> +ovn-sbctl lflow-list r0 | grep lr_out_snat | sed 's/table=../table=??/' > snat_flows
> >> +
> >> +# We expect three SNAT flows for the NAT we created.
> >> +# Priority-161 flow that uses CT zone stored in reg11
> >> +# Priority-162 flow that uses CT zone stored in reg12 if local dnat is detected
> >> +# Priority-163 flow that uses CT zone stored in reg12 if traffic is natted
> >> +# Since priorities are unique, we can grep based on that and check the resulting
> >> +# flows are as expected.
> >> +
> >> +AT_CHECK([grep priority=163 snat_flows], [0], [dnl
> >> +  table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && ct_label.natted == 1), action=(ct_snat(10.0.0.1);)
> >> +])
> >> +AT_CHECK([grep priority=162 snat_flows], [0], [dnl
> >> +  table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[[4]] == 1), action=(reg9[[4]] = 0; ct_snat(10.0.0.1);)
> >> +])
> >> +AT_CHECK([grep priority=161 snat_flows], [0], [dnl
> >> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1")), action=(ct_snat_in_czone(10.0.0.1);)
> >> +])
> >> +
> >>   AT_CLEANUP
> >>   ])
> >>
> >> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> >> index d4c22e7e9..0b29becc8 100644
> >> --- a/tests/system-ovn.at
> >> +++ b/tests/system-ovn.at
> >> @@ -7975,3 +7975,120 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> >>
> >>   AT_CLEANUP
> >>   ])
> >> +
> >> +OVN_FOR_EACH_NORTHD([
> >> +AT_SETUP([SNAT in separate zone from DNAT])
> >> +
> >> +CHECK_CONNTRACK()
> >> +CHECK_CONNTRACK_NAT()
> >> +ovn_start
> >> +OVS_TRAFFIC_VSWITCHD_START()
> >> +ADD_BR([br-int])
> >> +
> >> +# Set external-ids in br-int needed for ovn-controller
> >> +ovs-vsctl \
> >> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
> >> +        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> >> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
> >> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
> >> +        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> >> +
> >> +# The goal of this test is to ensure that when traffic is first DNATted
> >> +# (by way of a load balancer), and then SNATted, the SNAT happens in a
> >> +# separate conntrack zone from the DNAT.
> >> +
> >> +start_daemon ovn-controller
> >> +
> >> +check ovn-nbctl ls-add public
> >> +
> >> +check ovn-nbctl lr-add r1
> >> +check ovn-nbctl lrp-add r1 r1_public 00:de:ad:ff:00:01 172.16.0.1/16
> >> +check ovn-nbctl lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24
> >> +check ovn-nbctl lrp-set-gateway-chassis r1_public hv1
> >> +
> >> +check ovn-nbctl lb-add r1_lb 30.0.0.1 172.16.0.102
> >> +check ovn-nbctl lr-lb-add r1 r1_lb
> >> +
> >> +check ovn-nbctl ls-add s1
> >> +check ovn-nbctl lsp-add s1 s1_r1
> >> +check ovn-nbctl lsp-set-type s1_r1 router
> >> +check ovn-nbctl lsp-set-addresses s1_r1 router
> >> +check ovn-nbctl lsp-set-options s1_r1 router-port=r1_s1
> >> +
> >> +check ovn-nbctl lsp-add s1 vm1
> >> +check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2"
> >> +
> >> +check ovn-nbctl lsp-add public public_r1
> >> +check ovn-nbctl lsp-set-type public_r1 router
> >> +check ovn-nbctl lsp-set-addresses public_r1 router
> >> +check ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
> >> +
> >> +check ovn-nbctl lr-add r2
> >> +check ovn-nbctl lrp-add r2 r2_public 00:de:ad:ff:00:02 172.16.0.2/16
> >> +check ovn-nbctl lrp-add r2 r2_s2 00:de:ad:fe:00:02 173.0.2.1/24
> >> +check ovn-nbctl lr-nat-add r2 dnat_and_snat 172.16.0.102 173.0.2.2
> >> +check ovn-nbctl lrp-set-gateway-chassis r2_public hv1
> >> +
> >> +check ovn-nbctl ls-add s2
> >> +check ovn-nbctl lsp-add s2 s2_r2
> >> +check ovn-nbctl lsp-set-type s2_r2 router
> >> +check ovn-nbctl lsp-set-addresses s2_r2 router
> >> +check ovn-nbctl lsp-set-options s2_r2 router-port=r2_s2
> >> +
> >> +check ovn-nbctl lsp-add s2 vm2
> >> +check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2"
> >> +
> >> +check ovn-nbctl lsp-add public public_r2
> >> +check ovn-nbctl lsp-set-type public_r2 router
> >> +check ovn-nbctl lsp-set-addresses public_r2 router
> >> +check ovn-nbctl lsp-set-options public_r2 router-port=r2_public nat-addresses=router
> >> +
> >> +ADD_NAMESPACES(vm1)
> >> +ADD_VETH(vm1, vm1, br-int, "173.0.1.2/24", "00:de:ad:01:00:01", \
> >> +         "173.0.1.1")
> >> +ADD_NAMESPACES(vm2)
> >> +ADD_VETH(vm2, vm2, br-int, "173.0.2.2/24", "00:de:ad:01:00:02", \
> >> +         "173.0.2.1")
> >> +
> >> +check ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
> >> +check ovn-nbctl --wait=hv sync
> >> +
> >> +# Next, make sure that a ping works as expected
> >> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.1 | FORMAT_PING], \
> >> +[0], [dnl
> >> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >> +])
> >> +
> >> +# Finally, make sure that conntrack shows two separate zones being used for
> >> +# DNAT and SNAT
> >> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
> >> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> >> +icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=173.0.1.2,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x2
> >> +])
> >> +
> >> +# The final two entries appear identical here. That is because FORMAT_CT
> >> +# scrubs the zone numbers. In actuality, the zone numbers are different,
> >> +# which is why there are two entries.
> >> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.102) | \
> >> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> >> +icmp,orig=(src=172.16.0.101,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=173.0.2.2,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> >> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> >> +icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
> >> +])
> >> +
> >> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >> +
> >> +as ovn-sb
> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >> +
> >> +as ovn-nb
> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >> +
> >> +as northd
> >> +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
> >> +
> >> +as
> >> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
> >> +/connection dropped.*/d"])
> >> +AT_CLEANUP
> >> +])
> >> --
> >> 2.31.1
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Patch

diff --git a/northd/northd.c b/northd/northd.c
index b264fb850..866a81310 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -12961,23 +12961,44 @@  build_lrouter_out_snat_flow(struct hmap *lflows, struct ovn_datapath *od,
                                 priority, ds_cstr(match),
                                 ds_cstr(actions), &nat->header_);
 
-        if (!stateless) {
-            ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
-            ds_clear(actions);
-            if (distributed) {
-                ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
-                              ETH_ADDR_ARGS(mac));
-            }
-            ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; ct_snat(%s",
-                          nat->external_ip);
-            if (nat->external_port_range[0]) {
-                ds_put_format(actions, ",%s", nat->external_port_range);
-            }
-            ds_put_format(actions, ");");
-            ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
-                                    priority + 1, ds_cstr(match),
-                                    ds_cstr(actions), &nat->header_);
+        if (stateless) {
+            return;
+        }
+
+        struct ds match_natted = DS_EMPTY_INITIALIZER;
+        struct ds actions_natted = DS_EMPTY_INITIALIZER;
+
+        ds_clone(&match_natted, match);
+
+        ds_clear(actions);
+
+        if (distributed) {
+            ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
+                          ETH_ADDR_ARGS(mac));
         }
+
+        ds_clone(&actions_natted, actions);
+
+        ds_put_format(&actions_natted, "ct_snat(%s", nat->external_ip);
+        if (nat->external_port_range[0]) {
+            ds_put_format(&actions_natted, ",%s", nat->external_port_range);
+        }
+        ds_put_format(&actions_natted, ");");
+
+        ds_put_cstr(&match_natted, " && ct_label.natted == 1");
+        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
+                                priority + 2, ds_cstr(&match_natted),
+                                ds_cstr(&actions_natted), &nat->header_);
+
+        ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
+        ds_put_format(actions,  REGBIT_DST_NAT_IP_LOCAL" = 0; %s",
+                      ds_cstr(&actions_natted));
+        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_OUT_SNAT,
+                                priority + 1, ds_cstr(match),
+                                ds_cstr(actions), &nat->header_);
+
+        ds_destroy(&match_natted);
+        ds_destroy(&actions_natted);
     }
 }
 
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 60d91a771..d9657bcb8 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -5974,6 +5974,42 @@  AT_CHECK([grep -e "(lr_in_ip_routing   ).*outport" lr0flows | sed 's/table=../ta
   table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;)
 ])
 
+AT_CLEANUP
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([nat -- SNAT flows for DNAT traffic])
+# This test makes sure that SNAT flows are added that ensure a unique
+# SNAT zone is used for traffic that is natted. We don't need an elaborate
+# setup just to ensure the flow is present. We just need to create the router
+# and add an SNAT to it.
+
+ovn_start
+
+check ovn-nbctl lr-add r0
+check ovn-nbctl lrp-add r0 r0p1 00:00:00:00:00:01 10.0.0.1
+check ovn-nbctl lrp-set-gateway-chassis r0p1 hv1
+check ovn-nbctl lr-nat-add r0 snat 10.0.0.1 192.168.0.1
+check ovn-nbctl --wait=sb sync
+
+ovn-sbctl lflow-list r0 | grep lr_out_snat | sed 's/table=../table=??/' > snat_flows
+
+# We expect three SNAT flows for the NAT we created.
+# Priority-161 flow that uses CT zone stored in reg11
+# Priority-162 flow that uses CT zone stored in reg12 if local dnat is detected
+# Priority-163 flow that uses CT zone stored in reg12 if traffic is natted
+# Since priorities are unique, we can grep based on that and check the resulting
+# flows are as expected.
+
+AT_CHECK([grep priority=163 snat_flows], [0], [dnl
+  table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && ct_label.natted == 1), action=(ct_snat(10.0.0.1);)
+])
+AT_CHECK([grep priority=162 snat_flows], [0], [dnl
+  table=??(lr_out_snat        ), priority=162  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1") && reg9[[4]] == 1), action=(reg9[[4]] = 0; ct_snat(10.0.0.1);)
+])
+AT_CHECK([grep priority=161 snat_flows], [0], [dnl
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 192.168.0.1 && outport == "r0p1" && is_chassis_resident("cr-r0p1")), action=(ct_snat_in_czone(10.0.0.1);)
+])
+
 AT_CLEANUP
 ])
 
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index d4c22e7e9..0b29becc8 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -7975,3 +7975,120 @@  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
 
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([SNAT in separate zone from DNAT])
+
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+ovn_start
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-int])
+
+# Set external-ids in br-int needed for ovn-controller
+ovs-vsctl \
+        -- set Open_vSwitch . external-ids:system-id=hv1 \
+        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# The goal of this test is to ensure that when traffic is first DNATted
+# (by way of a load balancer), and then SNATted, the SNAT happens in a
+# separate conntrack zone from the DNAT.
+
+start_daemon ovn-controller
+
+check ovn-nbctl ls-add public
+
+check ovn-nbctl lr-add r1
+check ovn-nbctl lrp-add r1 r1_public 00:de:ad:ff:00:01 172.16.0.1/16
+check ovn-nbctl lrp-add r1 r1_s1 00:de:ad:fe:00:01 173.0.1.1/24
+check ovn-nbctl lrp-set-gateway-chassis r1_public hv1
+
+check ovn-nbctl lb-add r1_lb 30.0.0.1 172.16.0.102
+check ovn-nbctl lr-lb-add r1 r1_lb
+
+check ovn-nbctl ls-add s1
+check ovn-nbctl lsp-add s1 s1_r1
+check ovn-nbctl lsp-set-type s1_r1 router
+check ovn-nbctl lsp-set-addresses s1_r1 router
+check ovn-nbctl lsp-set-options s1_r1 router-port=r1_s1
+
+check ovn-nbctl lsp-add s1 vm1
+check ovn-nbctl lsp-set-addresses vm1 "00:de:ad:01:00:01 173.0.1.2"
+
+check ovn-nbctl lsp-add public public_r1
+check ovn-nbctl lsp-set-type public_r1 router
+check ovn-nbctl lsp-set-addresses public_r1 router
+check ovn-nbctl lsp-set-options public_r1 router-port=r1_public nat-addresses=router
+
+check ovn-nbctl lr-add r2
+check ovn-nbctl lrp-add r2 r2_public 00:de:ad:ff:00:02 172.16.0.2/16
+check ovn-nbctl lrp-add r2 r2_s2 00:de:ad:fe:00:02 173.0.2.1/24
+check ovn-nbctl lr-nat-add r2 dnat_and_snat 172.16.0.102 173.0.2.2
+check ovn-nbctl lrp-set-gateway-chassis r2_public hv1
+
+check ovn-nbctl ls-add s2
+check ovn-nbctl lsp-add s2 s2_r2
+check ovn-nbctl lsp-set-type s2_r2 router
+check ovn-nbctl lsp-set-addresses s2_r2 router
+check ovn-nbctl lsp-set-options s2_r2 router-port=r2_s2
+
+check ovn-nbctl lsp-add s2 vm2
+check ovn-nbctl lsp-set-addresses vm2 "00:de:ad:01:00:02 173.0.2.2"
+
+check ovn-nbctl lsp-add public public_r2
+check ovn-nbctl lsp-set-type public_r2 router
+check ovn-nbctl lsp-set-addresses public_r2 router
+check ovn-nbctl lsp-set-options public_r2 router-port=r2_public nat-addresses=router
+
+ADD_NAMESPACES(vm1)
+ADD_VETH(vm1, vm1, br-int, "173.0.1.2/24", "00:de:ad:01:00:01", \
+         "173.0.1.1")
+ADD_NAMESPACES(vm2)
+ADD_VETH(vm2, vm2, br-int, "173.0.2.2/24", "00:de:ad:01:00:02", \
+         "173.0.2.1")
+
+check ovn-nbctl lr-nat-add r1 dnat_and_snat 172.16.0.101 173.0.1.2 vm1 00:00:00:01:02:03
+check ovn-nbctl --wait=hv sync
+
+# Next, make sure that a ping works as expected
+NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 30.0.0.1 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+# Finally, make sure that conntrack shows two separate zones being used for
+# DNAT and SNAT
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | \
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
+icmp,orig=(src=173.0.1.2,dst=30.0.0.1,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=173.0.1.2,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x2
+])
+
+# The final two entries appear identical here. That is because FORMAT_CT
+# scrubs the zone numbers. In actuality, the zone numbers are different,
+# which is why there are two entries.
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.102) | \
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
+icmp,orig=(src=172.16.0.101,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=173.0.2.2,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
+icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
+icmp,orig=(src=173.0.1.2,dst=172.16.0.102,id=<cleared>,type=8,code=0),reply=(src=172.16.0.102,dst=172.16.0.101,id=<cleared>,type=0,code=0),zone=<cleared>
+])
+
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+as ovn-sb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as ovn-nb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as northd
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
+
+as
+OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
+/connection dropped.*/d"])
+AT_CLEANUP
+])