diff mbox series

[ovs-dev,v2,2/2] northd: Fix direct access to SNAT network on DR.

Message ID 20240312194426.88840-2-martin.kalcok@canonical.com
State Superseded
Delegated to: Dumitru Ceara
Headers show
Series [ovs-dev,v2,1/2] actions: Enable specifying zone for ct_commit. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Martin Kalcok March 12, 2024, 7:44 p.m. UTC
This change fixes bug that breaks ability of machines from external
networks to communicate with machines in SNATed networks (specifically
when using a Distributed router).

Currently when a machine (S1) on an external network tries to talk
over TCP with a machine (A1) in a network that has enabled SNAT, the
connection is established successfully. However after the three-way
handshake, any packets that come from the A1 machine will have their
source address translated by the Distributed router, breaking the
communication.

Existing rule in `build_lrouter_out_snat_flow` that decides which
packets should be SNATed already tries to avoid SNATing packets in
reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
in the distributed LR egress pipeline do not initiate the CT state.

Additionally we need to commit new connections that originate from
external networks into CT, so that the packets in the reply direction
can be properly identified.

Rationale:

In my original RFC [0], there were questions about the motivation for
fixing this issue. I'll try to summarize why I think this is a bug
that should be fixed.

1. Current implementation for Distributed router already tries to
   avoid SNATing packets in the reply direction, it's just missing
   initialized CT states to make proper decisions.

2. This same scenario works with Gateway Router. I tested with
   following setup:

    foo -- R1 -- join - R3 -- alice
                  |
    bar ----------R2

    R1 is a Distributed router with SNAT for foo. R2 is a Gateway
    router with SNAT for bar. R3 is a Gateway router with no SNAT.
    Using 'alice1' as a client I was able to talk over TCP with
    'bar1' but connection with 'foo1' failed.

3. Regarding security and "leaking" of internal IPs. Reading through
   RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
   specifications do not seem to mandate that SNAT implementations
   must filter incoming traffic destined directly to the internal
   network. Sections like "5. Filtering Behavior" in 4787 and
   "4.3 Externally Initiated Connections" in 5382 describe only
   behavior for traffic destined to external IP/Port associated
   with NAT on the device that performs NAT.

   Besides, with the current implementation, it's already possible
   to scan the internal network with pings and TCP syn scanning.

4. We do have customers/clouds that depend on this functionality.
   This is a scenario that used to work in Openstack with ML2/OVS
   and migrating those clouds to ML2/OVN would break it.

[0]https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
[1]https://datatracker.ietf.org/doc/html/rfc4787
[2]https://datatracker.ietf.org/doc/html/rfc5382
[3]https://datatracker.ietf.org/doc/html/rfc7857

Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
---
 northd/northd.c         | 68 ++++++++++++++++++++++++++++++++--------
 northd/ovn-northd.8.xml | 29 +++++++++++++++++
 tests/ovn-northd.at     | 33 ++++++++++++++++----
 tests/system-ovn.at     | 69 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 180 insertions(+), 19 deletions(-)

Comments

Martin Kalcok March 13, 2024, 9:17 a.m. UTC | #1
Regarding the failed unstable test in the CI, I suspect that this is not
related to the code change, I've had couple successful CI runs in my branch
[0].

Martin.

[0] https://github.com/mkalcok/ovn/actions/runs/8256539983

On Tue, Mar 12, 2024 at 8:45 PM Martin Kalcok <martin.kalcok@canonical.com>
wrote:

> This change fixes bug that breaks ability of machines from external
> networks to communicate with machines in SNATed networks (specifically
> when using a Distributed router).
>
> Currently when a machine (S1) on an external network tries to talk
> over TCP with a machine (A1) in a network that has enabled SNAT, the
> connection is established successfully. However after the three-way
> handshake, any packets that come from the A1 machine will have their
> source address translated by the Distributed router, breaking the
> communication.
>
> Existing rule in `build_lrouter_out_snat_flow` that decides which
> packets should be SNATed already tries to avoid SNATing packets in
> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
> in the distributed LR egress pipeline do not initiate the CT state.
>
> Additionally we need to commit new connections that originate from
> external networks into CT, so that the packets in the reply direction
> can be properly identified.
>
> Rationale:
>
> In my original RFC [0], there were questions about the motivation for
> fixing this issue. I'll try to summarize why I think this is a bug
> that should be fixed.
>
> 1. Current implementation for Distributed router already tries to
>    avoid SNATing packets in the reply direction, it's just missing
>    initialized CT states to make proper decisions.
>
> 2. This same scenario works with Gateway Router. I tested with
>    following setup:
>
>     foo -- R1 -- join - R3 -- alice
>                   |
>     bar ----------R2
>
>     R1 is a Distributed router with SNAT for foo. R2 is a Gateway
>     router with SNAT for bar. R3 is a Gateway router with no SNAT.
>     Using 'alice1' as a client I was able to talk over TCP with
>     'bar1' but connection with 'foo1' failed.
>
> 3. Regarding security and "leaking" of internal IPs. Reading through
>    RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
>    specifications do not seem to mandate that SNAT implementations
>    must filter incoming traffic destined directly to the internal
>    network. Sections like "5. Filtering Behavior" in 4787 and
>    "4.3 Externally Initiated Connections" in 5382 describe only
>    behavior for traffic destined to external IP/Port associated
>    with NAT on the device that performs NAT.
>
>    Besides, with the current implementation, it's already possible
>    to scan the internal network with pings and TCP syn scanning.
>
> 4. We do have customers/clouds that depend on this functionality.
>    This is a scenario that used to work in Openstack with ML2/OVS
>    and migrating those clouds to ML2/OVN would break it.
>
> [0]
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
> [1]https://datatracker.ietf.org/doc/html/rfc4787
> [2]https://datatracker.ietf.org/doc/html/rfc5382
> [3]https://datatracker.ietf.org/doc/html/rfc7857
>
> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
> ---
>  northd/northd.c         | 68 ++++++++++++++++++++++++++++++++--------
>  northd/ovn-northd.8.xml | 29 +++++++++++++++++
>  tests/ovn-northd.at     | 33 ++++++++++++++++----
>  tests/system-ovn.at     | 69 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 180 insertions(+), 19 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 2c3560ce2..25af52d5a 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -14438,20 +14438,27 @@ build_lrouter_out_is_dnat_local(struct
> lflow_table *lflows,
>
>  static void
>  build_lrouter_out_snat_match(struct lflow_table *lflows,
> -                             const struct ovn_datapath *od,
> -                             const struct nbrec_nat *nat, struct ds
> *match,
> -                             bool distributed_nat, int cidr_bits, bool
> is_v6,
> -                             struct ovn_port *l3dgw_port,
> -                             struct lflow_ref *lflow_ref)
> +                                         const struct ovn_datapath *od,
> +                                         const struct nbrec_nat *nat,
> +                                         struct ds *match,
> +                                         bool distributed_nat, int
> cidr_bits,
> +                                         bool is_v6,
> +                                         struct ovn_port *l3dgw_port,
> +                                         struct lflow_ref *lflow_ref,
> +                                         bool is_reverse)
>  {
>      ds_clear(match);
>
> -    ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4',
> +    ds_put_format(match, "ip && ip%c.%s == %s",
> +                  is_v6 ? '6' : '4',
> +                  is_reverse ? "dst" : "src",
>                    nat->logical_ip);
>
>      if (!od->is_gw_router) {
>          /* Distributed router. */
> -        ds_put_format(match, " && outport == %s", l3dgw_port->json_key);
> +        ds_put_format(match, " && %s == %s",
> +                      is_reverse ? "inport" : "outport",
> +                      l3dgw_port->json_key);
>          if (od->n_l3dgw_ports) {
>              ds_put_format(match, " && is_chassis_resident(\"%s\")",
>                            distributed_nat
> @@ -14462,7 +14469,7 @@ build_lrouter_out_snat_match(struct lflow_table
> *lflows,
>
>      if (nat->allowed_ext_ips || nat->exempted_ext_ips) {
>          lrouter_nat_add_ext_ip_match(od, lflows, match, nat,
> -                                     is_v6, false, cidr_bits,
> +                                     is_v6, is_reverse, cidr_bits,
>                                       lflow_ref);
>      }
>  }
> @@ -14489,7 +14496,8 @@ build_lrouter_out_snat_stateless_flow(struct
> lflow_table *lflows,
>      uint16_t priority = cidr_bits + 1;
>
>      build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
> -                                 cidr_bits, is_v6, l3dgw_port, lflow_ref);
> +                                 cidr_bits, is_v6, l3dgw_port, lflow_ref,
> +                                 false);
>
>      if (!od->is_gw_router) {
>          /* Distributed router. */
> @@ -14536,7 +14544,7 @@ build_lrouter_out_snat_in_czone_flow(struct
> lflow_table *lflows,
>
>      build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
>                                   cidr_bits, is_v6, l3dgw_port,
> -                                 lflow_ref);
> +                                 lflow_ref, false);
>
>      if (od->n_l3dgw_ports) {
>          priority += 128;
> @@ -14585,12 +14593,14 @@ build_lrouter_out_snat_flow(struct lflow_table
> *lflows,
>                              struct ds *actions, bool distributed_nat,
>                              struct eth_addr mac, int cidr_bits, bool
> is_v6,
>                              struct ovn_port *l3dgw_port,
> -                            struct lflow_ref *lflow_ref)
> +                            struct lflow_ref *lflow_ref,
> +                            const struct chassis_features *features)
>  {
>      if (strcmp(nat->type, "snat") && strcmp(nat->type, "dnat_and_snat")) {
>          return;
>      }
>
> +    struct ds match_all_from_snat = DS_EMPTY_INITIALIZER;
>      ds_clear(actions);
>
>      /* The priority here is calculated such that the
> @@ -14599,7 +14609,9 @@ build_lrouter_out_snat_flow(struct lflow_table
> *lflows,
>      uint16_t priority = cidr_bits + 1;
>
>      build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
> -                                 cidr_bits, is_v6, l3dgw_port, lflow_ref);
> +                                 cidr_bits, is_v6, l3dgw_port, lflow_ref,
> +                                 false);
> +    ds_clone(&match_all_from_snat, match);
>
>      if (!od->is_gw_router) {
>          /* Distributed router. */
> @@ -14624,6 +14636,36 @@ build_lrouter_out_snat_flow(struct lflow_table
> *lflows,
>                              priority, ds_cstr(match),
>                              ds_cstr(actions), &nat->header_,
>                              lflow_ref);
> +
> +    /* For the SNAT networks, we need to make sure that connections are
> +     * properly tracked so we can decide whether to perform SNAT on
> traffic
> +     * exiting the network. */
> +    if (features->ct_commit_to_zone && !strcmp(nat->type, "snat") &&
> +        !od->is_gw_router) {
> +        /* For traffic that comes from SNAT network, initiate CT state
> before
> +         * entering S_ROUTER_OUT_SNAT to allow matching on various CT
> states.
> +         */
> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70,
> +                      ds_cstr(&match_all_from_snat), "ct_snat; ",
> +                      lflow_ref);
> +
> +        build_lrouter_out_snat_match(lflows, od, nat, match,
> +                                     distributed_nat, cidr_bits, is_v6,
> +                                     l3dgw_port, lflow_ref, true);
> +
> +        /* New traffic that goes into SNAT network is committed to CT to
> avoid
> +         * SNAT-ing replies.*/
> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority,
> +                      ds_cstr(match), "ct_snat;",
> +                      lflow_ref);
> +
> +        ds_put_cstr(match, "&& ct.new");
> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority,
> +                      ds_cstr(match), "ct_commit(snat); next; ",
> +                      lflow_ref);
> +    }
> +
> +    ds_destroy(&match_all_from_snat);
>  }
>
>  static void
> @@ -15167,7 +15209,7 @@ build_lrouter_nat_defrag_and_lb(
>          } else {
>              build_lrouter_out_snat_flow(lflows, od, nat, match, actions,
>                                          distributed_nat, mac, cidr_bits,
> is_v6,
> -                                        l3dgw_port, lflow_ref);
> +                                        l3dgw_port, lflow_ref, features);
>          }
>
>          /* S_ROUTER_IN_ADMISSION - S_ROUTER_IN_IP_INPUT */
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 17b414144..ce6fd1270 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -4869,6 +4869,13 @@ nd_ns {
>
>      <p>
>        <ul>
> +        <li>
> +          A priority-70 logical flow is added that initiates CT state for
> +          traffic that is configured to be SNATed on Distributed routers.
> +          This allows the next table, <code>lr_out_snat</code>, to
> +          effectively match on various CT states.
> +        </li>
> +
>          <li>
>            A priority-50 logical flow is added that commits any untracked
> flows
>            from the previous table <code>lr_out_undnat</code> for Gateway
> @@ -5061,6 +5068,18 @@ nd_ns {
>
>        </li>
>
> +      <li>
> +        An additional flow is added for traffic that goes in opposite
> +        direction (i.e. it enters a network with configured SNAT). Where
> the
> +        flow above matched on <code>ip4.src == <var>A</var> &amp;&amp;
> outport
> +        == <var>GW</var></code>, this flow matches on <code> ip4.dst ==
> +        <var>A</var> &amp;&amp; inport == <var>GW</var></code>. A CT
> state is
> +        initiated for this traffic so that the following table, <code>
> +        lr_out_post_snat</code>, can identify whether the traffic flow was
> +        initiated from the internal or external network.
> +
> +      </li>
> +
>        <li>
>          A priority-0 logical flow with match <code>1</code> has actions
>          <code>next;</code>.
> @@ -5072,6 +5091,16 @@ nd_ns {
>      <p>
>        Packets reaching this table are processed according to the flows
> below:
>        <ul>
> +        <li>
> +          Traffic that goes directly into a network configured with SNAT
> on
> +          Distributed routers, and was initiated from an external network
> (i.e.
> +          it matches <code>ct.new</code>), is committed to the SNAT CT
> zone.
> +          This ensures that replies returning from the SNATed network do
> not
> +          have their source address translated. For details about match
> rules
> +          and priority see section "Egress Table 3: SNAT on Distributed
> +          Routers".
> +        </li>
> +
>          <li>
>            A priority-0 logical flow that matches all packets not already
>            handled (match <code>1</code>) and action <code>next;</code>.
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 0732486f3..d4a1212d0 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1107,7 +1107,8 @@ ovn_start
>  #
>  # DR is connected to S1 and CR is connected to S2
>
> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>
>  check ovn-nbctl lr-add DR
>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
> @@ -1135,6 +1136,7 @@ echo "CR-LRP UUID is: " $uuid
>  check ovn-nbctl set Logical_Router $cr_uuid options:chassis=gw1
>  check ovn-nbctl --wait=sb sync
>
> +
>  ovn-nbctl create Address_Set name=allowed_range addresses=\"1.1.1.1\"
>  ovn-nbctl create Address_Set name=disallowed_range addresses=\"2.2.2.2\"
>
> @@ -1155,6 +1157,7 @@ AT_CAPTURE_FILE([crflows])
>  AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
> action=(next;)
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
> ip4.src == $allowed_range), action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)),
> action=(ct_snat(172.16.1.1);)
>  ])
>
> @@ -1185,6 +1188,7 @@ AT_CAPTURE_FILE([crflows2])
>  AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl
>    table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
> action=(next;)
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
> action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);)
>    table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src ==
> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
> ip4.dst == $disallowed_range), action=(next;)
>  ])
> @@ -1279,7 +1283,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 |
> ovn_strip_lflows], [0], [dnl
>    table=??(lr_out_snat        ), priority=33   , match=(ip && ip4.src ==
> 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; next;)
>  ])
>
> -# Stateful FIP with DISALLOWED_IPs
> +# Stateless FIP with DISALLOWED_IPs
>  ovn-nbctl lr-nat-del DR dnat_and_snat  172.16.1.2
>  ovn-nbctl lr-nat-del CR dnat_and_snat  172.16.1.2
>
> @@ -5489,7 +5493,8 @@ AT_CHECK([grep "lr_out_snat" lr0flows |
> ovn_strip_lflows], [0], [dnl
>
>  check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>    -- set chassis gw1 other_config:ct-no-masked-label="true" \
> -  -- set chassis gw1 other_config:ovn-ct-lb-related="true"
> +  -- set chassis gw1 other_config:ovn-ct-lb-related="true" \
> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>
>  # Create a distributed gw port on lr0
>  check ovn-nbctl ls-add public
> @@ -5590,12 +5595,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
> ovn_strip_lflows], [0], [dnl
>
>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
> [dnl
>    table=??(lr_out_post_undnat ), priority=0    , match=(1), action=(next;)
> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
> 10.0.0.0/24 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
> 10.0.0.10 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>  ])
>
>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
> action=(next;)
> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst ==
> 10.0.0.0/24 && inport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src ==
> 10.0.0.0/24 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
> action=(ct_snat(172.168.0.10);)
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 10.0.0.10 && inport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 10.0.0.10 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
> action=(ct_snat(172.168.0.30);)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")
> && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);)
>  ])
> @@ -5738,12 +5747,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
> ovn_strip_lflows], [0], [dnl
>
>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
> [dnl
>    table=??(lr_out_post_undnat ), priority=0    , match=(1), action=(next;)
> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
> 10.0.0.0/24 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
> 10.0.0.10 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>  ])
>
>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
> action=(next;)
> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst ==
> 10.0.0.0/24 && inport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src ==
> 10.0.0.0/24 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
> action=(ct_snat(172.168.0.10);)
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 10.0.0.10 && inport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 10.0.0.10 && outport == "lr0-public" &&
> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
> action=(ct_snat(172.168.0.30);)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")
> && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);)
>  ])
> @@ -7576,9 +7589,14 @@ ovn_start
>  # Validate SNAT, DNAT and DNAT_AND_SNAT behavior with multiple
>  # distributed gateway LRPs.
>
> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
> -check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
> -check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
> +
> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1 \
> +  -- set chassis gw2 other_config:ct-commit-to-zone="true"
> +
> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1 \
> +  -- set chassis gw3 other_config:ct-commit-to-zone="true"
>
>  check ovn-nbctl lr-add DR
>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
> @@ -7643,6 +7661,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat |
> ovn_strip_lflows], [0], [dn
>  ])
>
>  AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows],
> [0], [dnl
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
> action=(ct_snat;)
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")),
> action=(ct_snat;)
> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
> 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")),
> action=(ct_snat;)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") &&
> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);)
>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") &&
> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);)
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index c22c7882f..80e9b7d0b 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2
> 10.0.0.1 | FORMAT_PING], \
>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>  ])
>
> +
> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
> +# icmp, udp and tcp connectivity from external network to the 'vm' on
> +# the specified 'ip'.
> +test_connectivity_from_ext() {
> +    local vm=$1; shift
> +    local ip=$1; shift
> +
> +    # Start listening daemons for UDP and TCP connections
> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
> +
> +    # Ensure that vm can be pinged on the specified IP
> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING],
> \
> +    [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +    # Perform two consecutive UDP connections to the specified IP
> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
> +
> +    # Send data over TCP connection to the specified IP
> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
> +}
> +
> +# Test access from external network to the internal IP of a VM that
> +# has also configured DNAT
> +test_connectivity_from_ext foo1 192.168.1.2
> +
> +# Test access from external network to the internal IP of a VM that
> +# does not have DNAT
> +test_connectivity_from_ext bar1 192.168.2.2
> +
>  OVS_WAIT_UNTIL([
>      total_pkts=$(cat ext-net.pcap | wc -l)
>      test "${total_pkts}" = "3"
> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>
>  icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>  ])
>
> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
> +# icmp, udp and tcp connectivity from external network to the 'vm' on
> +# the specified 'ip'.
> +test_connectivity_from_ext() {
> +    local vm=$1; shift
> +    local ip=$1; shift
> +
> +    # Start listening daemons for UDP and TCP connections
> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
> +
> +    # Ensure that vm can be pinged on the specified IP
> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING],
> \
> +    [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +    # Perform two consecutive UDP connections to the specified IP
> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
> +
> +    # Send data over TCP connection to the specified IP
> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
> +}
> +
> +# Test access from external network to the internal IP of a VM that
> +# has also configured DNAT
> +test_connectivity_from_ext foo1 fd11::2
> +
> +# Test access from external network to the internal IP of a VM that
> +# does not have DNAT
> +test_connectivity_from_ext bar1 fd12::2
> +
>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
>
>  as ovn-sb
> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
> 172.16.1.4 | FORMAT_PING], \
>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp |
> FORMAT_CT(172.16.1.1) | \
>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>
>  icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>
> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>
>  icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>  ])
>
> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
> fd20::4 | FORMAT_PING], \
>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \
>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>
>  icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>
> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>
>  icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>  ])
>
> --
> 2.40.1
>
>
Martin Kalcok March 14, 2024, 1:12 p.m. UTC | #2
Hello all,
I have one more follow-up regarding the comments in v1. @amusil, you were
concerned about the impact this change would have on the performance so I
ran some tests to try to gauge it. I used following setup with two physical
hosts connected over switch:

       |    Physical ext. | Hypervisor1
       |       network    |
alice1-|----- switch -----|-- DLR --- foo1
       |                  |

* alice1 acts as external device.
  * It doesn't run OVN
  * It's connected to regular physical network.
* Hypervisor1 runs OVN chassis
* foo1 is a container behind Distributed LR with SNAT enabled.

Goal of this test was to measure whether the proposed patch affects
throughput of the traffic that flows from "foo1" to external network. I
used iperf3 and ran 3x 5 minute measurements with 10 streams ("iperf3 -i 0
-t 300 -P 10 -c alice1"). I used 5 minute for each measurement to smooth
out possible jitter and I ran each measurement 3 times to get feel for the
variance in the network throughput over longer period. I also did some
trial and error testing that showed that 10 parallel streams reached
highest throughput.

Following are the three (summary) results with this patch applied:

# run 1 (with patch)
[SUM]   0.00-300.00 sec   296 GBytes  8.48 Gbits/sec  sender
[SUM]   0.00-300.00 sec   296 GBytes  8.48 Gbits/sec  receiver

# run 2 (with patch)
[SUM]   0.00-300.00 sec   288 GBytes  8.24 Gbits/sec sender
[SUM]   0.00-300.00 sec   288 GBytes  8.23 Gbits/sec receiver

# run 3 (with patch)
[SUM]   0.00-300.00 sec   299 GBytes  8.55 Gbits/sec sender
[SUM]   0.00-300.00 sec   299 GBytes  8.55 Gbits/sec receiver

Next thing I did was to rebuild ovn-controller without these patches,
replaced the ovn-controller binary on the Hypervisor1 and restarted
ovn-controller daemon. I verified that the feature flag "ct-commit-to-zone"
was removed from the chassis (in the SB database), which forced the
recompute, and then I reran the tests:

# run 1 (clean)
[SUM]   0.00-300.00 sec   300 GBytes  8.59 Gbits/sec sender
[SUM]   0.00-300.00 sec   300 GBytes  8.59 Gbits/sec receiver

# run 2 (clean)
[SUM]   0.00-300.00 sec   285 GBytes  8.15 Gbits/sec sender
[SUM]   0.00-300.00 sec   285 GBytes  8.15 Gbits/sec receiver

# run 3 (clean)
[SUM]   0.00-300.00 sec   295 GBytes  8.46 Gbits/sec sender
[SUM]   0.00-300.00 sec   295 GBytes  8.46 Gbits/sec receiver

These tests show that while there was some fluctuation between each
individual test, when comparing patched and clean version, they come out
about the same. I'm happy to run more tests/scenarios if you have something
else in mind that should be tested.


On Wed, Mar 13, 2024 at 10:17 AM Martin Kalcok <martin.kalcok@canonical.com>
wrote:

> Regarding the failed unstable test in the CI, I suspect that this is not
> related to the code change, I've had couple successful CI runs in my branch
> [0].
>
> Martin.
>
> [0] https://github.com/mkalcok/ovn/actions/runs/8256539983
>
> On Tue, Mar 12, 2024 at 8:45 PM Martin Kalcok <martin.kalcok@canonical.com>
> wrote:
>
>> This change fixes bug that breaks ability of machines from external
>> networks to communicate with machines in SNATed networks (specifically
>> when using a Distributed router).
>>
>> Currently when a machine (S1) on an external network tries to talk
>> over TCP with a machine (A1) in a network that has enabled SNAT, the
>> connection is established successfully. However after the three-way
>> handshake, any packets that come from the A1 machine will have their
>> source address translated by the Distributed router, breaking the
>> communication.
>>
>> Existing rule in `build_lrouter_out_snat_flow` that decides which
>> packets should be SNATed already tries to avoid SNATing packets in
>> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
>> in the distributed LR egress pipeline do not initiate the CT state.
>>
>> Additionally we need to commit new connections that originate from
>> external networks into CT, so that the packets in the reply direction
>> can be properly identified.
>>
>> Rationale:
>>
>> In my original RFC [0], there were questions about the motivation for
>> fixing this issue. I'll try to summarize why I think this is a bug
>> that should be fixed.
>>
>> 1. Current implementation for Distributed router already tries to
>>    avoid SNATing packets in the reply direction, it's just missing
>>    initialized CT states to make proper decisions.
>>
>> 2. This same scenario works with Gateway Router. I tested with
>>    following setup:
>>
>>     foo -- R1 -- join - R3 -- alice
>>                   |
>>     bar ----------R2
>>
>>     R1 is a Distributed router with SNAT for foo. R2 is a Gateway
>>     router with SNAT for bar. R3 is a Gateway router with no SNAT.
>>     Using 'alice1' as a client I was able to talk over TCP with
>>     'bar1' but connection with 'foo1' failed.
>>
>> 3. Regarding security and "leaking" of internal IPs. Reading through
>>    RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
>>    specifications do not seem to mandate that SNAT implementations
>>    must filter incoming traffic destined directly to the internal
>>    network. Sections like "5. Filtering Behavior" in 4787 and
>>    "4.3 Externally Initiated Connections" in 5382 describe only
>>    behavior for traffic destined to external IP/Port associated
>>    with NAT on the device that performs NAT.
>>
>>    Besides, with the current implementation, it's already possible
>>    to scan the internal network with pings and TCP syn scanning.
>>
>> 4. We do have customers/clouds that depend on this functionality.
>>    This is a scenario that used to work in Openstack with ML2/OVS
>>    and migrating those clouds to ML2/OVN would break it.
>>
>> [0]
>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
>> [1]https://datatracker.ietf.org/doc/html/rfc4787
>> [2]https://datatracker.ietf.org/doc/html/rfc5382
>> [3]https://datatracker.ietf.org/doc/html/rfc7857
>>
>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
>> ---
>>  northd/northd.c         | 68 ++++++++++++++++++++++++++++++++--------
>>  northd/ovn-northd.8.xml | 29 +++++++++++++++++
>>  tests/ovn-northd.at     | 33 ++++++++++++++++----
>>  tests/system-ovn.at     | 69 +++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 180 insertions(+), 19 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 2c3560ce2..25af52d5a 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -14438,20 +14438,27 @@ build_lrouter_out_is_dnat_local(struct
>> lflow_table *lflows,
>>
>>  static void
>>  build_lrouter_out_snat_match(struct lflow_table *lflows,
>> -                             const struct ovn_datapath *od,
>> -                             const struct nbrec_nat *nat, struct ds
>> *match,
>> -                             bool distributed_nat, int cidr_bits, bool
>> is_v6,
>> -                             struct ovn_port *l3dgw_port,
>> -                             struct lflow_ref *lflow_ref)
>> +                                         const struct ovn_datapath *od,
>> +                                         const struct nbrec_nat *nat,
>> +                                         struct ds *match,
>> +                                         bool distributed_nat, int
>> cidr_bits,
>> +                                         bool is_v6,
>> +                                         struct ovn_port *l3dgw_port,
>> +                                         struct lflow_ref *lflow_ref,
>> +                                         bool is_reverse)
>>  {
>>      ds_clear(match);
>>
>> -    ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4',
>> +    ds_put_format(match, "ip && ip%c.%s == %s",
>> +                  is_v6 ? '6' : '4',
>> +                  is_reverse ? "dst" : "src",
>>                    nat->logical_ip);
>>
>>      if (!od->is_gw_router) {
>>          /* Distributed router. */
>> -        ds_put_format(match, " && outport == %s", l3dgw_port->json_key);
>> +        ds_put_format(match, " && %s == %s",
>> +                      is_reverse ? "inport" : "outport",
>> +                      l3dgw_port->json_key);
>>          if (od->n_l3dgw_ports) {
>>              ds_put_format(match, " && is_chassis_resident(\"%s\")",
>>                            distributed_nat
>> @@ -14462,7 +14469,7 @@ build_lrouter_out_snat_match(struct lflow_table
>> *lflows,
>>
>>      if (nat->allowed_ext_ips || nat->exempted_ext_ips) {
>>          lrouter_nat_add_ext_ip_match(od, lflows, match, nat,
>> -                                     is_v6, false, cidr_bits,
>> +                                     is_v6, is_reverse, cidr_bits,
>>                                       lflow_ref);
>>      }
>>  }
>> @@ -14489,7 +14496,8 @@ build_lrouter_out_snat_stateless_flow(struct
>> lflow_table *lflows,
>>      uint16_t priority = cidr_bits + 1;
>>
>>      build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
>> -                                 cidr_bits, is_v6, l3dgw_port,
>> lflow_ref);
>> +                                 cidr_bits, is_v6, l3dgw_port, lflow_ref,
>> +                                 false);
>>
>>      if (!od->is_gw_router) {
>>          /* Distributed router. */
>> @@ -14536,7 +14544,7 @@ build_lrouter_out_snat_in_czone_flow(struct
>> lflow_table *lflows,
>>
>>      build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
>>                                   cidr_bits, is_v6, l3dgw_port,
>> -                                 lflow_ref);
>> +                                 lflow_ref, false);
>>
>>      if (od->n_l3dgw_ports) {
>>          priority += 128;
>> @@ -14585,12 +14593,14 @@ build_lrouter_out_snat_flow(struct lflow_table
>> *lflows,
>>                              struct ds *actions, bool distributed_nat,
>>                              struct eth_addr mac, int cidr_bits, bool
>> is_v6,
>>                              struct ovn_port *l3dgw_port,
>> -                            struct lflow_ref *lflow_ref)
>> +                            struct lflow_ref *lflow_ref,
>> +                            const struct chassis_features *features)
>>  {
>>      if (strcmp(nat->type, "snat") && strcmp(nat->type, "dnat_and_snat"))
>> {
>>          return;
>>      }
>>
>> +    struct ds match_all_from_snat = DS_EMPTY_INITIALIZER;
>>      ds_clear(actions);
>>
>>      /* The priority here is calculated such that the
>> @@ -14599,7 +14609,9 @@ build_lrouter_out_snat_flow(struct lflow_table
>> *lflows,
>>      uint16_t priority = cidr_bits + 1;
>>
>>      build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
>> -                                 cidr_bits, is_v6, l3dgw_port,
>> lflow_ref);
>> +                                 cidr_bits, is_v6, l3dgw_port, lflow_ref,
>> +                                 false);
>> +    ds_clone(&match_all_from_snat, match);
>>
>>      if (!od->is_gw_router) {
>>          /* Distributed router. */
>> @@ -14624,6 +14636,36 @@ build_lrouter_out_snat_flow(struct lflow_table
>> *lflows,
>>                              priority, ds_cstr(match),
>>                              ds_cstr(actions), &nat->header_,
>>                              lflow_ref);
>> +
>> +    /* For the SNAT networks, we need to make sure that connections are
>> +     * properly tracked so we can decide whether to perform SNAT on
>> traffic
>> +     * exiting the network. */
>> +    if (features->ct_commit_to_zone && !strcmp(nat->type, "snat") &&
>> +        !od->is_gw_router) {
>> +        /* For traffic that comes from SNAT network, initiate CT state
>> before
>> +         * entering S_ROUTER_OUT_SNAT to allow matching on various CT
>> states.
>> +         */
>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70,
>> +                      ds_cstr(&match_all_from_snat), "ct_snat; ",
>> +                      lflow_ref);
>> +
>> +        build_lrouter_out_snat_match(lflows, od, nat, match,
>> +                                     distributed_nat, cidr_bits, is_v6,
>> +                                     l3dgw_port, lflow_ref, true);
>> +
>> +        /* New traffic that goes into SNAT network is committed to CT to
>> avoid
>> +         * SNAT-ing replies.*/
>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority,
>> +                      ds_cstr(match), "ct_snat;",
>> +                      lflow_ref);
>> +
>> +        ds_put_cstr(match, "&& ct.new");
>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority,
>> +                      ds_cstr(match), "ct_commit(snat); next; ",
>> +                      lflow_ref);
>> +    }
>> +
>> +    ds_destroy(&match_all_from_snat);
>>  }
>>
>>  static void
>> @@ -15167,7 +15209,7 @@ build_lrouter_nat_defrag_and_lb(
>>          } else {
>>              build_lrouter_out_snat_flow(lflows, od, nat, match, actions,
>>                                          distributed_nat, mac, cidr_bits,
>> is_v6,
>> -                                        l3dgw_port, lflow_ref);
>> +                                        l3dgw_port, lflow_ref, features);
>>          }
>>
>>          /* S_ROUTER_IN_ADMISSION - S_ROUTER_IN_IP_INPUT */
>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>> index 17b414144..ce6fd1270 100644
>> --- a/northd/ovn-northd.8.xml
>> +++ b/northd/ovn-northd.8.xml
>> @@ -4869,6 +4869,13 @@ nd_ns {
>>
>>      <p>
>>        <ul>
>> +        <li>
>> +          A priority-70 logical flow is added that initiates CT state for
>> +          traffic that is configured to be SNATed on Distributed routers.
>> +          This allows the next table, <code>lr_out_snat</code>, to
>> +          effectively match on various CT states.
>> +        </li>
>> +
>>          <li>
>>            A priority-50 logical flow is added that commits any untracked
>> flows
>>            from the previous table <code>lr_out_undnat</code> for Gateway
>> @@ -5061,6 +5068,18 @@ nd_ns {
>>
>>        </li>
>>
>> +      <li>
>> +        An additional flow is added for traffic that goes in opposite
>> +        direction (i.e. it enters a network with configured SNAT). Where
>> the
>> +        flow above matched on <code>ip4.src == <var>A</var> &amp;&amp;
>> outport
>> +        == <var>GW</var></code>, this flow matches on <code> ip4.dst ==
>> +        <var>A</var> &amp;&amp; inport == <var>GW</var></code>. A CT
>> state is
>> +        initiated for this traffic so that the following table, <code>
>> +        lr_out_post_snat</code>, can identify whether the traffic flow
>> was
>> +        initiated from the internal or external network.
>> +
>> +      </li>
>> +
>>        <li>
>>          A priority-0 logical flow with match <code>1</code> has actions
>>          <code>next;</code>.
>> @@ -5072,6 +5091,16 @@ nd_ns {
>>      <p>
>>        Packets reaching this table are processed according to the flows
>> below:
>>        <ul>
>> +        <li>
>> +          Traffic that goes directly into a network configured with SNAT
>> on
>> +          Distributed routers, and was initiated from an external
>> network (i.e.
>> +          it matches <code>ct.new</code>), is committed to the SNAT CT
>> zone.
>> +          This ensures that replies returning from the SNATed network do
>> not
>> +          have their source address translated. For details about match
>> rules
>> +          and priority see section "Egress Table 3: SNAT on Distributed
>> +          Routers".
>> +        </li>
>> +
>>          <li>
>>            A priority-0 logical flow that matches all packets not already
>>            handled (match <code>1</code>) and action <code>next;</code>.
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index 0732486f3..d4a1212d0 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -1107,7 +1107,8 @@ ovn_start
>>  #
>>  # DR is connected to S1 and CR is connected to S2
>>
>> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>
>>  check ovn-nbctl lr-add DR
>>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>> @@ -1135,6 +1136,7 @@ echo "CR-LRP UUID is: " $uuid
>>  check ovn-nbctl set Logical_Router $cr_uuid options:chassis=gw1
>>  check ovn-nbctl --wait=sb sync
>>
>> +
>>  ovn-nbctl create Address_Set name=allowed_range addresses=\"1.1.1.1\"
>>  ovn-nbctl create Address_Set name=disallowed_range addresses=\"2.2.2.2\"
>>
>> @@ -1155,6 +1157,7 @@ AT_CAPTURE_FILE([crflows])
>>  AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl
>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>> action=(next;)
>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>> action=(next;)
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>> ip4.src == $allowed_range), action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)),
>> action=(ct_snat(172.16.1.1);)
>>  ])
>>
>> @@ -1185,6 +1188,7 @@ AT_CAPTURE_FILE([crflows2])
>>  AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl
>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>> action=(next;)
>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>> action=(next;)
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
>> action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);)
>>    table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src ==
>> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>> ip4.dst == $disallowed_range), action=(next;)
>>  ])
>> @@ -1279,7 +1283,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 |
>> ovn_strip_lflows], [0], [dnl
>>    table=??(lr_out_snat        ), priority=33   , match=(ip && ip4.src ==
>> 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; next;)
>>  ])
>>
>> -# Stateful FIP with DISALLOWED_IPs
>> +# Stateless FIP with DISALLOWED_IPs
>>  ovn-nbctl lr-nat-del DR dnat_and_snat  172.16.1.2
>>  ovn-nbctl lr-nat-del CR dnat_and_snat  172.16.1.2
>>
>> @@ -5489,7 +5493,8 @@ AT_CHECK([grep "lr_out_snat" lr0flows |
>> ovn_strip_lflows], [0], [dnl
>>
>>  check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>    -- set chassis gw1 other_config:ct-no-masked-label="true" \
>> -  -- set chassis gw1 other_config:ovn-ct-lb-related="true"
>> +  -- set chassis gw1 other_config:ovn-ct-lb-related="true" \
>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>
>>  # Create a distributed gw port on lr0
>>  check ovn-nbctl ls-add public
>> @@ -5590,12 +5595,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
>> ovn_strip_lflows], [0], [dnl
>>
>>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
>> [dnl
>>    table=??(lr_out_post_undnat ), priority=0    , match=(1),
>> action=(next;)
>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
>> 10.0.0.0/24 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
>> 10.0.0.10 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>  ])
>>
>>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>> action=(next;)
>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>> action=(next;)
>> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst ==
>> 10.0.0.0/24 && inport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src ==
>> 10.0.0.0/24 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>> action=(ct_snat(172.168.0.10);)
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 10.0.0.10 && inport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 10.0.0.10 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>> action=(ct_snat(172.168.0.30);)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")
>> && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);)
>>  ])
>> @@ -5738,12 +5747,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
>> ovn_strip_lflows], [0], [dnl
>>
>>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
>> [dnl
>>    table=??(lr_out_post_undnat ), priority=0    , match=(1),
>> action=(next;)
>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
>> 10.0.0.0/24 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src ==
>> 10.0.0.10 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>  ])
>>
>>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>> action=(next;)
>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>> action=(next;)
>> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst ==
>> 10.0.0.0/24 && inport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src ==
>> 10.0.0.0/24 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>> action=(ct_snat(172.168.0.10);)
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 10.0.0.10 && inport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 10.0.0.10 && outport == "lr0-public" &&
>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>> action=(ct_snat(172.168.0.30);)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")
>> && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);)
>>  ])
>> @@ -7576,9 +7589,14 @@ ovn_start
>>  # Validate SNAT, DNAT and DNAT_AND_SNAT behavior with multiple
>>  # distributed gateway LRPs.
>>
>> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>> -check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>> -check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>> +
>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1 \
>> +  -- set chassis gw2 other_config:ct-commit-to-zone="true"
>> +
>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1 \
>> +  -- set chassis gw3 other_config:ct-commit-to-zone="true"
>>
>>  check ovn-nbctl lr-add DR
>>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>> @@ -7643,6 +7661,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat
>> | ovn_strip_lflows], [0], [dn
>>  ])
>>
>>  AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows],
>> [0], [dnl
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
>> action=(ct_snat;)
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")),
>> action=(ct_snat;)
>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst ==
>> 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")),
>> action=(ct_snat;)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") &&
>> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);)
>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
>> 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") &&
>> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);)
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>> index c22c7882f..80e9b7d0b 100644
>> --- a/tests/system-ovn.at
>> +++ b/tests/system-ovn.at
>> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2
>> 10.0.0.1 | FORMAT_PING], \
>>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>  ])
>>
>> +
>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
>> +# icmp, udp and tcp connectivity from external network to the 'vm' on
>> +# the specified 'ip'.
>> +test_connectivity_from_ext() {
>> +    local vm=$1; shift
>> +    local ip=$1; shift
>> +
>> +    # Start listening daemons for UDP and TCP connections
>> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
>> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
>> +
>> +    # Ensure that vm can be pinged on the specified IP
>> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip |
>> FORMAT_PING], \
>> +    [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +    # Perform two consecutive UDP connections to the specified IP
>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>> +
>> +    # Send data over TCP connection to the specified IP
>> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
>> +}
>> +
>> +# Test access from external network to the internal IP of a VM that
>> +# has also configured DNAT
>> +test_connectivity_from_ext foo1 192.168.1.2
>> +
>> +# Test access from external network to the internal IP of a VM that
>> +# does not have DNAT
>> +test_connectivity_from_ext bar1 192.168.2.2
>> +
>>  OVS_WAIT_UNTIL([
>>      total_pkts=$(cat ext-net.pcap | wc -l)
>>      test "${total_pkts}" = "3"
>> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0],
>> [dnl
>>
>>  icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>  ])
>>
>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
>> +# icmp, udp and tcp connectivity from external network to the 'vm' on
>> +# the specified 'ip'.
>> +test_connectivity_from_ext() {
>> +    local vm=$1; shift
>> +    local ip=$1; shift
>> +
>> +    # Start listening daemons for UDP and TCP connections
>> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
>> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
>> +
>> +    # Ensure that vm can be pinged on the specified IP
>> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip |
>> FORMAT_PING], \
>> +    [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +
>> +    # Perform two consecutive UDP connections to the specified IP
>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>> +
>> +    # Send data over TCP connection to the specified IP
>> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
>> +}
>> +
>> +# Test access from external network to the internal IP of a VM that
>> +# has also configured DNAT
>> +test_connectivity_from_ext foo1 fd11::2
>> +
>> +# Test access from external network to the internal IP of a VM that
>> +# does not have DNAT
>> +test_connectivity_from_ext bar1 fd12::2
>> +
>>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>
>>  as ovn-sb
>> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
>> 172.16.1.4 | FORMAT_PING], \
>>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp |
>> FORMAT_CT(172.16.1.1) | \
>>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>
>>  icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>
>> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>
>>  icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>  ])
>>
>> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
>> fd20::4 | FORMAT_PING], \
>>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \
>>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>
>>  icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>
>> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>
>>  icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>  ])
>>
>> --
>> 2.40.1
>>
>>
>
> --
> Best Regards,
> Martin Kalcok.
>
Ales Musil March 20, 2024, 11:33 a.m. UTC | #3
On Thu, Mar 14, 2024 at 2:13 PM Martin Kalcok <martin.kalcok@canonical.com>
wrote:

> Hello all,
> I have one more follow-up regarding the comments in v1. @amusil, you were
> concerned about the impact this change would have on the performance so I
> ran some tests to try to gauge it. I used following setup with two physical
> hosts connected over switch:
>
>        |    Physical ext. | Hypervisor1
>        |       network    |
> alice1-|----- switch -----|-- DLR --- foo1
>        |                  |
>
> * alice1 acts as external device.
>   * It doesn't run OVN
>   * It's connected to regular physical network.
> * Hypervisor1 runs OVN chassis
> * foo1 is a container behind Distributed LR with SNAT enabled.
>
> Goal of this test was to measure whether the proposed patch affects
> throughput of the traffic that flows from "foo1" to external network. I
> used iperf3 and ran 3x 5 minute measurements with 10 streams ("iperf3 -i 0
> -t 300 -P 10 -c alice1"). I used 5 minute for each measurement to smooth
> out possible jitter and I ran each measurement 3 times to get feel for the
> variance in the network throughput over longer period. I also did some
> trial and error testing that showed that 10 parallel streams reached
> highest throughput.
>
> Following are the three (summary) results with this patch applied:
>
> # run 1 (with patch)
> [SUM]   0.00-300.00 sec   296 GBytes  8.48 Gbits/sec  sender
> [SUM]   0.00-300.00 sec   296 GBytes  8.48 Gbits/sec  receiver
>
> # run 2 (with patch)
> [SUM]   0.00-300.00 sec   288 GBytes  8.24 Gbits/sec sender
> [SUM]   0.00-300.00 sec   288 GBytes  8.23 Gbits/sec receiver
>
> # run 3 (with patch)
> [SUM]   0.00-300.00 sec   299 GBytes  8.55 Gbits/sec sender
> [SUM]   0.00-300.00 sec   299 GBytes  8.55 Gbits/sec receiver
>
> Next thing I did was to rebuild ovn-controller without these patches,
> replaced the ovn-controller binary on the Hypervisor1 and restarted
> ovn-controller daemon. I verified that the feature flag "ct-commit-to-zone"
> was removed from the chassis (in the SB database), which forced the
> recompute, and then I reran the tests:
>
> # run 1 (clean)
> [SUM]   0.00-300.00 sec   300 GBytes  8.59 Gbits/sec sender
> [SUM]   0.00-300.00 sec   300 GBytes  8.59 Gbits/sec receiver
>
> # run 2 (clean)
> [SUM]   0.00-300.00 sec   285 GBytes  8.15 Gbits/sec sender
> [SUM]   0.00-300.00 sec   285 GBytes  8.15 Gbits/sec receiver
>
> # run 3 (clean)
> [SUM]   0.00-300.00 sec   295 GBytes  8.46 Gbits/sec sender
> [SUM]   0.00-300.00 sec   295 GBytes  8.46 Gbits/sec receiver
>
> These tests show that while there was some fluctuation between each
> individual test, when comparing patched and clean version, they come out
> about the same. I'm happy to run more tests/scenarios if you have something
> else in mind that should be tested.
>
>
>
Hi Martin,

thank you for the performance numbers, it looks fine to me. I have some
comments, see down below.


> On Wed, Mar 13, 2024 at 10:17 AM Martin Kalcok <
> martin.kalcok@canonical.com> wrote:
>
>> Regarding the failed unstable test in the CI, I suspect that this is not
>> related to the code change, I've had couple successful CI runs in my branch
>> [0].
>>
>> Martin.
>>
>> [0] https://github.com/mkalcok/ovn/actions/runs/8256539983
>>
>> On Tue, Mar 12, 2024 at 8:45 PM Martin Kalcok <
>> martin.kalcok@canonical.com> wrote:
>>
>>> This change fixes bug that breaks ability of machines from external
>>> networks to communicate with machines in SNATed networks (specifically
>>> when using a Distributed router).
>>>
>>> Currently when a machine (S1) on an external network tries to talk
>>> over TCP with a machine (A1) in a network that has enabled SNAT, the
>>> connection is established successfully. However after the three-way
>>> handshake, any packets that come from the A1 machine will have their
>>> source address translated by the Distributed router, breaking the
>>> communication.
>>>
>>> Existing rule in `build_lrouter_out_snat_flow` that decides which
>>> packets should be SNATed already tries to avoid SNATing packets in
>>> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
>>> in the distributed LR egress pipeline do not initiate the CT state.
>>>
>>> Additionally we need to commit new connections that originate from
>>> external networks into CT, so that the packets in the reply direction
>>> can be properly identified.
>>>
>>> Rationale:
>>>
>>> In my original RFC [0], there were questions about the motivation for
>>> fixing this issue. I'll try to summarize why I think this is a bug
>>> that should be fixed.
>>>
>>> 1. Current implementation for Distributed router already tries to
>>>    avoid SNATing packets in the reply direction, it's just missing
>>>    initialized CT states to make proper decisions.
>>>
>>> 2. This same scenario works with Gateway Router. I tested with
>>>    following setup:
>>>
>>>     foo -- R1 -- join - R3 -- alice
>>>                   |
>>>     bar ----------R2
>>>
>>>     R1 is a Distributed router with SNAT for foo. R2 is a Gateway
>>>     router with SNAT for bar. R3 is a Gateway router with no SNAT.
>>>     Using 'alice1' as a client I was able to talk over TCP with
>>>     'bar1' but connection with 'foo1' failed.
>>>
>>> 3. Regarding security and "leaking" of internal IPs. Reading through
>>>    RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
>>>    specifications do not seem to mandate that SNAT implementations
>>>    must filter incoming traffic destined directly to the internal
>>>    network. Sections like "5. Filtering Behavior" in 4787 and
>>>    "4.3 Externally Initiated Connections" in 5382 describe only
>>>    behavior for traffic destined to external IP/Port associated
>>>    with NAT on the device that performs NAT.
>>>
>>>    Besides, with the current implementation, it's already possible
>>>    to scan the internal network with pings and TCP syn scanning.
>>>
>>> 4. We do have customers/clouds that depend on this functionality.
>>>    This is a scenario that used to work in Openstack with ML2/OVS
>>>    and migrating those clouds to ML2/OVN would break it.
>>>
>>> [0]
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
>>> [1]https://datatracker.ietf.org/doc/html/rfc4787
>>> [2]https://datatracker.ietf.org/doc/html/rfc5382
>>> [3]https://datatracker.ietf.org/doc/html/rfc7857
>>>
>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
>>> ---
>>>  northd/northd.c         | 68 ++++++++++++++++++++++++++++++++--------
>>>  northd/ovn-northd.8.xml | 29 +++++++++++++++++
>>>  tests/ovn-northd.at     | 33 ++++++++++++++++----
>>>  tests/system-ovn.at     | 69 +++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 180 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 2c3560ce2..25af52d5a 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -14438,20 +14438,27 @@ build_lrouter_out_is_dnat_local(struct
>>> lflow_table *lflows,
>>>
>>>  static void
>>>  build_lrouter_out_snat_match(struct lflow_table *lflows,
>>> -                             const struct ovn_datapath *od,
>>> -                             const struct nbrec_nat *nat, struct ds
>>> *match,
>>> -                             bool distributed_nat, int cidr_bits, bool
>>> is_v6,
>>> -                             struct ovn_port *l3dgw_port,
>>> -                             struct lflow_ref *lflow_ref)
>>> +                                         const struct ovn_datapath *od,
>>> +                                         const struct nbrec_nat *nat,
>>> +                                         struct ds *match,
>>> +                                         bool distributed_nat, int
>>> cidr_bits,
>>> +                                         bool is_v6,
>>> +                                         struct ovn_port *l3dgw_port,
>>> +                                         struct lflow_ref *lflow_ref,
>>> +                                         bool is_reverse)
>>>  {
>>>      ds_clear(match);
>>>
>>> -    ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4',
>>> +    ds_put_format(match, "ip && ip%c.%s == %s",
>>> +                  is_v6 ? '6' : '4',
>>> +                  is_reverse ? "dst" : "src",
>>>                    nat->logical_ip);
>>>
>>>      if (!od->is_gw_router) {
>>>          /* Distributed router. */
>>> -        ds_put_format(match, " && outport == %s", l3dgw_port->json_key);
>>> +        ds_put_format(match, " && %s == %s",
>>> +                      is_reverse ? "inport" : "outport",
>>> +                      l3dgw_port->json_key);
>>>          if (od->n_l3dgw_ports) {
>>>              ds_put_format(match, " && is_chassis_resident(\"%s\")",
>>>                            distributed_nat
>>> @@ -14462,7 +14469,7 @@ build_lrouter_out_snat_match(struct lflow_table
>>> *lflows,
>>>
>>>      if (nat->allowed_ext_ips || nat->exempted_ext_ips) {
>>>          lrouter_nat_add_ext_ip_match(od, lflows, match, nat,
>>> -                                     is_v6, false, cidr_bits,
>>> +                                     is_v6, is_reverse, cidr_bits,
>>>                                       lflow_ref);
>>>      }
>>>  }
>>> @@ -14489,7 +14496,8 @@ build_lrouter_out_snat_stateless_flow(struct
>>> lflow_table *lflows,
>>>      uint16_t priority = cidr_bits + 1;
>>>
>>>      build_lrouter_out_snat_match(lflows, od, nat, match,
>>> distributed_nat,
>>> -                                 cidr_bits, is_v6, l3dgw_port,
>>> lflow_ref);
>>> +                                 cidr_bits, is_v6, l3dgw_port,
>>> lflow_ref,
>>> +                                 false);
>>>
>>>      if (!od->is_gw_router) {
>>>          /* Distributed router. */
>>> @@ -14536,7 +14544,7 @@ build_lrouter_out_snat_in_czone_flow(struct
>>> lflow_table *lflows,
>>>
>>>      build_lrouter_out_snat_match(lflows, od, nat, match,
>>> distributed_nat,
>>>                                   cidr_bits, is_v6, l3dgw_port,
>>> -                                 lflow_ref);
>>> +                                 lflow_ref, false);
>>>
>>>      if (od->n_l3dgw_ports) {
>>>          priority += 128;
>>> @@ -14585,12 +14593,14 @@ build_lrouter_out_snat_flow(struct lflow_table
>>> *lflows,
>>>                              struct ds *actions, bool distributed_nat,
>>>                              struct eth_addr mac, int cidr_bits, bool
>>> is_v6,
>>>                              struct ovn_port *l3dgw_port,
>>> -                            struct lflow_ref *lflow_ref)
>>> +                            struct lflow_ref *lflow_ref,
>>> +                            const struct chassis_features *features)
>>>  {
>>>      if (strcmp(nat->type, "snat") && strcmp(nat->type,
>>> "dnat_and_snat")) {
>>>          return;
>>>      }
>>>
>>> +    struct ds match_all_from_snat = DS_EMPTY_INITIALIZER;
>>>      ds_clear(actions);
>>>
>>>      /* The priority here is calculated such that the
>>> @@ -14599,7 +14609,9 @@ build_lrouter_out_snat_flow(struct lflow_table
>>> *lflows,
>>>      uint16_t priority = cidr_bits + 1;
>>>
>>>      build_lrouter_out_snat_match(lflows, od, nat, match,
>>> distributed_nat,
>>> -                                 cidr_bits, is_v6, l3dgw_port,
>>> lflow_ref);
>>> +                                 cidr_bits, is_v6, l3dgw_port,
>>> lflow_ref,
>>> +                                 false);
>>> +    ds_clone(&match_all_from_snat, match);
>>>
>>
We don't have to clone the match, you can just store the match length e.g.
size_t match_len = match->len;


>
>>>      if (!od->is_gw_router) {
>>>          /* Distributed router. */
>>> @@ -14624,6 +14636,36 @@ build_lrouter_out_snat_flow(struct lflow_table
>>> *lflows,
>>>                              priority, ds_cstr(match),
>>>                              ds_cstr(actions), &nat->header_,
>>>                              lflow_ref);
>>> +
>>> +    /* For the SNAT networks, we need to make sure that connections are
>>> +     * properly tracked so we can decide whether to perform SNAT on
>>> traffic
>>> +     * exiting the network. */
>>> +    if (features->ct_commit_to_zone && !strcmp(nat->type, "snat") &&
>>> +        !od->is_gw_router) {
>>>
>>
Here you can just truncate the ds to the state before the previous if:
ds_truncate(match, match_len);


> +        /* For traffic that comes from SNAT network, initiate CT state
>>> before
>>> +         * entering S_ROUTER_OUT_SNAT to allow matching on various CT
>>> states.
>>> +         */
>>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70,
>>> +                      ds_cstr(&match_all_from_snat), "ct_snat; ",
>>> +                      lflow_ref);
>>> +
>>> +        build_lrouter_out_snat_match(lflows, od, nat, match,
>>> +                                     distributed_nat, cidr_bits, is_v6,
>>> +                                     l3dgw_port, lflow_ref, true);
>>> +
>>> +        /* New traffic that goes into SNAT network is committed to CT
>>> to avoid
>>> +         * SNAT-ing replies.*/
>>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority,
>>> +                      ds_cstr(match), "ct_snat;",
>>> +                      lflow_ref);
>>> +
>>> +        ds_put_cstr(match, "&& ct.new");
>>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority,
>>> +                      ds_cstr(match), "ct_commit(snat); next; ",
>>> +                      lflow_ref);
>>> +    }
>>> +
>>> +    ds_destroy(&match_all_from_snat);
>>>  }
>>>
>>>  static void
>>> @@ -15167,7 +15209,7 @@ build_lrouter_nat_defrag_and_lb(
>>>          } else {
>>>              build_lrouter_out_snat_flow(lflows, od, nat, match, actions,
>>>                                          distributed_nat, mac,
>>> cidr_bits, is_v6,
>>> -                                        l3dgw_port, lflow_ref);
>>> +                                        l3dgw_port, lflow_ref,
>>> features);
>>>          }
>>>
>>>          /* S_ROUTER_IN_ADMISSION - S_ROUTER_IN_IP_INPUT */
>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>> index 17b414144..ce6fd1270 100644
>>> --- a/northd/ovn-northd.8.xml
>>> +++ b/northd/ovn-northd.8.xml
>>> @@ -4869,6 +4869,13 @@ nd_ns {
>>>
>>>      <p>
>>>        <ul>
>>> +        <li>
>>> +          A priority-70 logical flow is added that initiates CT state
>>> for
>>> +          traffic that is configured to be SNATed on Distributed
>>> routers.
>>> +          This allows the next table, <code>lr_out_snat</code>, to
>>> +          effectively match on various CT states.
>>> +        </li>
>>> +
>>>          <li>
>>>            A priority-50 logical flow is added that commits any
>>> untracked flows
>>>            from the previous table <code>lr_out_undnat</code> for Gateway
>>> @@ -5061,6 +5068,18 @@ nd_ns {
>>>
>>>        </li>
>>>
>>> +      <li>
>>> +        An additional flow is added for traffic that goes in opposite
>>> +        direction (i.e. it enters a network with configured SNAT).
>>> Where the
>>> +        flow above matched on <code>ip4.src == <var>A</var> &amp;&amp;
>>> outport
>>> +        == <var>GW</var></code>, this flow matches on <code> ip4.dst ==
>>> +        <var>A</var> &amp;&amp; inport == <var>GW</var></code>. A CT
>>> state is
>>> +        initiated for this traffic so that the following table, <code>
>>> +        lr_out_post_snat</code>, can identify whether the traffic flow
>>> was
>>> +        initiated from the internal or external network.
>>> +
>>> +      </li>
>>> +
>>>        <li>
>>>          A priority-0 logical flow with match <code>1</code> has actions
>>>          <code>next;</code>.
>>> @@ -5072,6 +5091,16 @@ nd_ns {
>>>      <p>
>>>        Packets reaching this table are processed according to the flows
>>> below:
>>>        <ul>
>>> +        <li>
>>> +          Traffic that goes directly into a network configured with
>>> SNAT on
>>> +          Distributed routers, and was initiated from an external
>>> network (i.e.
>>> +          it matches <code>ct.new</code>), is committed to the SNAT CT
>>> zone.
>>> +          This ensures that replies returning from the SNATed network
>>> do not
>>> +          have their source address translated. For details about match
>>> rules
>>> +          and priority see section "Egress Table 3: SNAT on Distributed
>>> +          Routers".
>>> +        </li>
>>> +
>>>          <li>
>>>            A priority-0 logical flow that matches all packets not already
>>>            handled (match <code>1</code>) and action <code>next;</code>.
>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>> index 0732486f3..d4a1212d0 100644
>>> --- a/tests/ovn-northd.at
>>> +++ b/tests/ovn-northd.at
>>> @@ -1107,7 +1107,8 @@ ovn_start
>>>  #
>>>  # DR is connected to S1 and CR is connected to S2
>>>
>>> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>>
>>>  check ovn-nbctl lr-add DR
>>>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>> @@ -1135,6 +1136,7 @@ echo "CR-LRP UUID is: " $uuid
>>>  check ovn-nbctl set Logical_Router $cr_uuid options:chassis=gw1
>>>  check ovn-nbctl --wait=sb sync
>>>
>>> +
>>>
>>
nit: Unrelated

 ovn-nbctl create Address_Set name=allowed_range addresses=\"1.1.1.1\"
>>>  ovn-nbctl create Address_Set name=disallowed_range addresses=\"2.2.2.2\"
>>>
>>> @@ -1155,6 +1157,7 @@ AT_CAPTURE_FILE([crflows])
>>>  AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl
>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>> action=(next;)
>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>> action=(next;)
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>> ip4.src == $allowed_range), action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.16.1.1);)
>>>  ])
>>>
>>> @@ -1185,6 +1188,7 @@ AT_CAPTURE_FILE([crflows2])
>>>  AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl
>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>> action=(next;)
>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>> action=(next;)
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
>>> action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);)
>>>    table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src
>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>> ip4.dst == $disallowed_range), action=(next;)
>>>  ])
>>> @@ -1279,7 +1283,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 |
>>> ovn_strip_lflows], [0], [dnl
>>>    table=??(lr_out_snat        ), priority=33   , match=(ip && ip4.src
>>> == 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2;
>>> next;)
>>>  ])
>>>
>>> -# Stateful FIP with DISALLOWED_IPs
>>> +# Stateless FIP with DISALLOWED_IPs
>>>  ovn-nbctl lr-nat-del DR dnat_and_snat  172.16.1.2
>>>  ovn-nbctl lr-nat-del CR dnat_and_snat  172.16.1.2
>>>
>>> @@ -5489,7 +5493,8 @@ AT_CHECK([grep "lr_out_snat" lr0flows |
>>> ovn_strip_lflows], [0], [dnl
>>>
>>>  check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>>    -- set chassis gw1 other_config:ct-no-masked-label="true" \
>>> -  -- set chassis gw1 other_config:ovn-ct-lb-related="true"
>>> +  -- set chassis gw1 other_config:ovn-ct-lb-related="true" \
>>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>>
>>>  # Create a distributed gw port on lr0
>>>  check ovn-nbctl ls-add public
>>> @@ -5590,12 +5595,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
>>> ovn_strip_lflows], [0], [dnl
>>>
>>>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
>>> [dnl
>>>    table=??(lr_out_post_undnat ), priority=0    , match=(1),
>>> action=(next;)
>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>> == 10.0.0.10 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>>  ])
>>>
>>>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>> action=(next;)
>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>> action=(next;)
>>> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst
>>> == 10.0.0.0/24 && inport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src
>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.168.0.10);)
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 10.0.0.10 && inport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 10.0.0.10 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.168.0.30);)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 10.0.0.3 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.168.0.20);)
>>>  ])
>>> @@ -5738,12 +5747,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
>>> ovn_strip_lflows], [0], [dnl
>>>
>>>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
>>> [dnl
>>>    table=??(lr_out_post_undnat ), priority=0    , match=(1),
>>> action=(next;)
>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>> == 10.0.0.10 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>>  ])
>>>
>>>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>> action=(next;)
>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>> action=(next;)
>>> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst
>>> == 10.0.0.0/24 && inport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src
>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.168.0.10);)
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 10.0.0.10 && inport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 10.0.0.10 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.168.0.30);)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 10.0.0.3 && outport == "lr0-public" &&
>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>> action=(ct_snat(172.168.0.20);)
>>>  ])
>>>
>> @@ -7576,9 +7589,14 @@ ovn_start
>>>  # Validate SNAT, DNAT and DNAT_AND_SNAT behavior with multiple
>>>  # distributed gateway LRPs.
>>>
>>> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>> -check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>>> -check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>> +
>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1 \
>>> +  -- set chassis gw2 other_config:ct-commit-to-zone="true"
>>> +
>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1 \
>>> +  -- set chassis gw3 other_config:ct-commit-to-zone="true"
>>>
>>>  check ovn-nbctl lr-add DR
>>>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>> @@ -7643,6 +7661,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat
>>> | ovn_strip_lflows], [0], [dn
>>>  ])
>>>
>>>  AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows],
>>> [0], [dnl
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
>>> action=(ct_snat;)
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")),
>>> action=(ct_snat;)
>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>> == 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")),
>>> action=(ct_snat;)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") &&
>>> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);)
>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>> == 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") &&
>>> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);)
>>>
>>
I would actually expect one of the tests in ovn-northd.at to also check the
flow that does the ct_commit(snat).


> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>> index c22c7882f..80e9b7d0b 100644
>>> --- a/tests/system-ovn.at
>>> +++ b/tests/system-ovn.at
>>> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2
>>> 10.0.0.1 | FORMAT_PING], \
>>>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>  ])
>>>
>>> +
>>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
>>> +# icmp, udp and tcp connectivity from external network to the 'vm' on
>>> +# the specified 'ip'.
>>> +test_connectivity_from_ext() {
>>> +    local vm=$1; shift
>>> +    local ip=$1; shift
>>> +
>>> +    # Start listening daemons for UDP and TCP connections
>>> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
>>> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
>>> +
>>> +    # Ensure that vm can be pinged on the specified IP
>>> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip |
>>> FORMAT_PING], \
>>> +    [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +
>>> +    # Perform two consecutive UDP connections to the specified IP
>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>> +
>>> +    # Send data over TCP connection to the specified IP
>>> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
>>>
>>
Do we actually need to send data? I think -z is also enough for TCP to
prove that the connection works.


> +}
>>> +
>>> +# Test access from external network to the internal IP of a VM that
>>> +# has also configured DNAT
>>> +test_connectivity_from_ext foo1 192.168.1.2
>>> +
>>> +# Test access from external network to the internal IP of a VM that
>>> +# does not have DNAT
>>> +test_connectivity_from_ext bar1 192.168.2.2
>>> +
>>>  OVS_WAIT_UNTIL([
>>>      total_pkts=$(cat ext-net.pcap | wc -l)
>>>      test "${total_pkts}" = "3"
>>> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0],
>>> [dnl
>>>
>>>  icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>  ])
>>>
>>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
>>> +# icmp, udp and tcp connectivity from external network to the 'vm' on
>>> +# the specified 'ip'.
>>> +test_connectivity_from_ext() {
>>> +    local vm=$1; shift
>>> +    local ip=$1; shift
>>> +
>>> +    # Start listening daemons for UDP and TCP connections
>>> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
>>> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
>>> +
>>> +    # Ensure that vm can be pinged on the specified IP
>>> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip |
>>> FORMAT_PING], \
>>> +    [0], [dnl
>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>> +])
>>> +
>>> +    # Perform two consecutive UDP connections to the specified IP
>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>> +
>>> +    # Send data over TCP connection to the specified IP
>>> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
>>>
>>

Same as above.

+}
>>> +
>>> +# Test access from external network to the internal IP of a VM that
>>> +# has also configured DNAT
>>> +test_connectivity_from_ext foo1 fd11::2
>>> +
>>> +# Test access from external network to the internal IP of a VM that
>>> +# does not have DNAT
>>> +test_connectivity_from_ext bar1 fd12::2
>>> +
>>>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>>
>>>  as ovn-sb
>>> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
>>> 172.16.1.4 | FORMAT_PING], \
>>>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp |
>>> FORMAT_CT(172.16.1.1) | \
>>>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>>
>>>  icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>>
>>> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>>
>>>  icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>>  ])
>>>
>>> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
>>> fd20::4 | FORMAT_PING], \
>>>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \
>>>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>>
>>>  icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>
>>> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>
>>>  icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>  ])
>>>
>>> --
>>> 2.40.1
>>>
>>>
>>
>> --
>> Best Regards,
>> Martin Kalcok.
>>
>
>
> --
> Best Regards,
> Martin Kalcok.
>

Thanks,
Ales
Martin Kalcok April 9, 2024, 11:48 a.m. UTC | #4
Hi Ales, thanks for another round of review.

On Wed, Mar 20, 2024 at 12:34 PM Ales Musil <amusil@redhat.com> wrote:

>
>
> On Thu, Mar 14, 2024 at 2:13 PM Martin Kalcok <martin.kalcok@canonical.com>
> wrote:
>
>> Hello all,
>> I have one more follow-up regarding the comments in v1. @amusil, you were
>> concerned about the impact this change would have on the performance so I
>> ran some tests to try to gauge it. I used following setup with two physical
>> hosts connected over switch:
>>
>>        |    Physical ext. | Hypervisor1
>>        |       network    |
>> alice1-|----- switch -----|-- DLR --- foo1
>>        |                  |
>>
>> * alice1 acts as external device.
>>   * It doesn't run OVN
>>   * It's connected to regular physical network.
>> * Hypervisor1 runs OVN chassis
>> * foo1 is a container behind Distributed LR with SNAT enabled.
>>
>> Goal of this test was to measure whether the proposed patch affects
>> throughput of the traffic that flows from "foo1" to external network. I
>> used iperf3 and ran 3x 5 minute measurements with 10 streams ("iperf3 -i 0
>> -t 300 -P 10 -c alice1"). I used 5 minute for each measurement to smooth
>> out possible jitter and I ran each measurement 3 times to get feel for the
>> variance in the network throughput over longer period. I also did some
>> trial and error testing that showed that 10 parallel streams reached
>> highest throughput.
>>
>> Following are the three (summary) results with this patch applied:
>>
>> # run 1 (with patch)
>> [SUM]   0.00-300.00 sec   296 GBytes  8.48 Gbits/sec  sender
>> [SUM]   0.00-300.00 sec   296 GBytes  8.48 Gbits/sec  receiver
>>
>> # run 2 (with patch)
>> [SUM]   0.00-300.00 sec   288 GBytes  8.24 Gbits/sec sender
>> [SUM]   0.00-300.00 sec   288 GBytes  8.23 Gbits/sec receiver
>>
>> # run 3 (with patch)
>> [SUM]   0.00-300.00 sec   299 GBytes  8.55 Gbits/sec sender
>> [SUM]   0.00-300.00 sec   299 GBytes  8.55 Gbits/sec receiver
>>
>> Next thing I did was to rebuild ovn-controller without these patches,
>> replaced the ovn-controller binary on the Hypervisor1 and restarted
>> ovn-controller daemon. I verified that the feature flag "ct-commit-to-zone"
>> was removed from the chassis (in the SB database), which forced the
>> recompute, and then I reran the tests:
>>
>> # run 1 (clean)
>> [SUM]   0.00-300.00 sec   300 GBytes  8.59 Gbits/sec sender
>> [SUM]   0.00-300.00 sec   300 GBytes  8.59 Gbits/sec receiver
>>
>> # run 2 (clean)
>> [SUM]   0.00-300.00 sec   285 GBytes  8.15 Gbits/sec sender
>> [SUM]   0.00-300.00 sec   285 GBytes  8.15 Gbits/sec receiver
>>
>> # run 3 (clean)
>> [SUM]   0.00-300.00 sec   295 GBytes  8.46 Gbits/sec sender
>> [SUM]   0.00-300.00 sec   295 GBytes  8.46 Gbits/sec receiver
>>
>> These tests show that while there was some fluctuation between each
>> individual test, when comparing patched and clean version, they come out
>> about the same. I'm happy to run more tests/scenarios if you have something
>> else in mind that should be tested.
>>
>>
>>
> Hi Martin,
>
> thank you for the performance numbers, it looks fine to me. I have some
> comments, see down below.
>
>
>> On Wed, Mar 13, 2024 at 10:17 AM Martin Kalcok <
>> martin.kalcok@canonical.com> wrote:
>>
>>> Regarding the failed unstable test in the CI, I suspect that this is not
>>> related to the code change, I've had couple successful CI runs in my branch
>>> [0].
>>>
>>> Martin.
>>>
>>> [0] https://github.com/mkalcok/ovn/actions/runs/8256539983
>>>
>>> On Tue, Mar 12, 2024 at 8:45 PM Martin Kalcok <
>>> martin.kalcok@canonical.com> wrote:
>>>
>>>> This change fixes bug that breaks ability of machines from external
>>>> networks to communicate with machines in SNATed networks (specifically
>>>> when using a Distributed router).
>>>>
>>>> Currently when a machine (S1) on an external network tries to talk
>>>> over TCP with a machine (A1) in a network that has enabled SNAT, the
>>>> connection is established successfully. However after the three-way
>>>> handshake, any packets that come from the A1 machine will have their
>>>> source address translated by the Distributed router, breaking the
>>>> communication.
>>>>
>>>> Existing rule in `build_lrouter_out_snat_flow` that decides which
>>>> packets should be SNATed already tries to avoid SNATing packets in
>>>> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
>>>> in the distributed LR egress pipeline do not initiate the CT state.
>>>>
>>>> Additionally we need to commit new connections that originate from
>>>> external networks into CT, so that the packets in the reply direction
>>>> can be properly identified.
>>>>
>>>> Rationale:
>>>>
>>>> In my original RFC [0], there were questions about the motivation for
>>>> fixing this issue. I'll try to summarize why I think this is a bug
>>>> that should be fixed.
>>>>
>>>> 1. Current implementation for Distributed router already tries to
>>>>    avoid SNATing packets in the reply direction, it's just missing
>>>>    initialized CT states to make proper decisions.
>>>>
>>>> 2. This same scenario works with Gateway Router. I tested with
>>>>    following setup:
>>>>
>>>>     foo -- R1 -- join - R3 -- alice
>>>>                   |
>>>>     bar ----------R2
>>>>
>>>>     R1 is a Distributed router with SNAT for foo. R2 is a Gateway
>>>>     router with SNAT for bar. R3 is a Gateway router with no SNAT.
>>>>     Using 'alice1' as a client I was able to talk over TCP with
>>>>     'bar1' but connection with 'foo1' failed.
>>>>
>>>> 3. Regarding security and "leaking" of internal IPs. Reading through
>>>>    RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
>>>>    specifications do not seem to mandate that SNAT implementations
>>>>    must filter incoming traffic destined directly to the internal
>>>>    network. Sections like "5. Filtering Behavior" in 4787 and
>>>>    "4.3 Externally Initiated Connections" in 5382 describe only
>>>>    behavior for traffic destined to external IP/Port associated
>>>>    with NAT on the device that performs NAT.
>>>>
>>>>    Besides, with the current implementation, it's already possible
>>>>    to scan the internal network with pings and TCP syn scanning.
>>>>
>>>> 4. We do have customers/clouds that depend on this functionality.
>>>>    This is a scenario that used to work in Openstack with ML2/OVS
>>>>    and migrating those clouds to ML2/OVN would break it.
>>>>
>>>> [0]
>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
>>>> [1]https://datatracker.ietf.org/doc/html/rfc4787
>>>> [2]https://datatracker.ietf.org/doc/html/rfc5382
>>>> [3]https://datatracker.ietf.org/doc/html/rfc7857
>>>>
>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com>
>>>> ---
>>>>  northd/northd.c         | 68 ++++++++++++++++++++++++++++++++--------
>>>>  northd/ovn-northd.8.xml | 29 +++++++++++++++++
>>>>  tests/ovn-northd.at     | 33 ++++++++++++++++----
>>>>  tests/system-ovn.at     | 69 +++++++++++++++++++++++++++++++++++++++++
>>>>  4 files changed, 180 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 2c3560ce2..25af52d5a 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -14438,20 +14438,27 @@ build_lrouter_out_is_dnat_local(struct
>>>> lflow_table *lflows,
>>>>
>>>>  static void
>>>>  build_lrouter_out_snat_match(struct lflow_table *lflows,
>>>> -                             const struct ovn_datapath *od,
>>>> -                             const struct nbrec_nat *nat, struct ds
>>>> *match,
>>>> -                             bool distributed_nat, int cidr_bits, bool
>>>> is_v6,
>>>> -                             struct ovn_port *l3dgw_port,
>>>> -                             struct lflow_ref *lflow_ref)
>>>> +                                         const struct ovn_datapath *od,
>>>> +                                         const struct nbrec_nat *nat,
>>>> +                                         struct ds *match,
>>>> +                                         bool distributed_nat, int
>>>> cidr_bits,
>>>> +                                         bool is_v6,
>>>> +                                         struct ovn_port *l3dgw_port,
>>>> +                                         struct lflow_ref *lflow_ref,
>>>> +                                         bool is_reverse)
>>>>  {
>>>>      ds_clear(match);
>>>>
>>>> -    ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4',
>>>> +    ds_put_format(match, "ip && ip%c.%s == %s",
>>>> +                  is_v6 ? '6' : '4',
>>>> +                  is_reverse ? "dst" : "src",
>>>>                    nat->logical_ip);
>>>>
>>>>      if (!od->is_gw_router) {
>>>>          /* Distributed router. */
>>>> -        ds_put_format(match, " && outport == %s",
>>>> l3dgw_port->json_key);
>>>> +        ds_put_format(match, " && %s == %s",
>>>> +                      is_reverse ? "inport" : "outport",
>>>> +                      l3dgw_port->json_key);
>>>>          if (od->n_l3dgw_ports) {
>>>>              ds_put_format(match, " && is_chassis_resident(\"%s\")",
>>>>                            distributed_nat
>>>> @@ -14462,7 +14469,7 @@ build_lrouter_out_snat_match(struct lflow_table
>>>> *lflows,
>>>>
>>>>      if (nat->allowed_ext_ips || nat->exempted_ext_ips) {
>>>>          lrouter_nat_add_ext_ip_match(od, lflows, match, nat,
>>>> -                                     is_v6, false, cidr_bits,
>>>> +                                     is_v6, is_reverse, cidr_bits,
>>>>                                       lflow_ref);
>>>>      }
>>>>  }
>>>> @@ -14489,7 +14496,8 @@ build_lrouter_out_snat_stateless_flow(struct
>>>> lflow_table *lflows,
>>>>      uint16_t priority = cidr_bits + 1;
>>>>
>>>>      build_lrouter_out_snat_match(lflows, od, nat, match,
>>>> distributed_nat,
>>>> -                                 cidr_bits, is_v6, l3dgw_port,
>>>> lflow_ref);
>>>> +                                 cidr_bits, is_v6, l3dgw_port,
>>>> lflow_ref,
>>>> +                                 false);
>>>>
>>>>      if (!od->is_gw_router) {
>>>>          /* Distributed router. */
>>>> @@ -14536,7 +14544,7 @@ build_lrouter_out_snat_in_czone_flow(struct
>>>> lflow_table *lflows,
>>>>
>>>>      build_lrouter_out_snat_match(lflows, od, nat, match,
>>>> distributed_nat,
>>>>                                   cidr_bits, is_v6, l3dgw_port,
>>>> -                                 lflow_ref);
>>>> +                                 lflow_ref, false);
>>>>
>>>>      if (od->n_l3dgw_ports) {
>>>>          priority += 128;
>>>> @@ -14585,12 +14593,14 @@ build_lrouter_out_snat_flow(struct
>>>> lflow_table *lflows,
>>>>                              struct ds *actions, bool distributed_nat,
>>>>                              struct eth_addr mac, int cidr_bits, bool
>>>> is_v6,
>>>>                              struct ovn_port *l3dgw_port,
>>>> -                            struct lflow_ref *lflow_ref)
>>>> +                            struct lflow_ref *lflow_ref,
>>>> +                            const struct chassis_features *features)
>>>>  {
>>>>      if (strcmp(nat->type, "snat") && strcmp(nat->type,
>>>> "dnat_and_snat")) {
>>>>          return;
>>>>      }
>>>>
>>>> +    struct ds match_all_from_snat = DS_EMPTY_INITIALIZER;
>>>>      ds_clear(actions);
>>>>
>>>>      /* The priority here is calculated such that the
>>>> @@ -14599,7 +14609,9 @@ build_lrouter_out_snat_flow(struct lflow_table
>>>> *lflows,
>>>>      uint16_t priority = cidr_bits + 1;
>>>>
>>>>      build_lrouter_out_snat_match(lflows, od, nat, match,
>>>> distributed_nat,
>>>> -                                 cidr_bits, is_v6, l3dgw_port,
>>>> lflow_ref);
>>>> +                                 cidr_bits, is_v6, l3dgw_port,
>>>> lflow_ref,
>>>> +                                 false);
>>>> +    ds_clone(&match_all_from_snat, match);
>>>>
>>>
> We don't have to clone the match, you can just store the match length e.g.
> size_t match_len = match->len;
>

Oh, cool approach. I didn't think about it this way, thanks.


>
>
>>
>>>>      if (!od->is_gw_router) {
>>>>          /* Distributed router. */
>>>> @@ -14624,6 +14636,36 @@ build_lrouter_out_snat_flow(struct lflow_table
>>>> *lflows,
>>>>                              priority, ds_cstr(match),
>>>>                              ds_cstr(actions), &nat->header_,
>>>>                              lflow_ref);
>>>> +
>>>> +    /* For the SNAT networks, we need to make sure that connections are
>>>> +     * properly tracked so we can decide whether to perform SNAT on
>>>> traffic
>>>> +     * exiting the network. */
>>>> +    if (features->ct_commit_to_zone && !strcmp(nat->type, "snat") &&
>>>> +        !od->is_gw_router) {
>>>>
>>>
> Here you can just truncate the ds to the state before the previous if:
> ds_truncate(match, match_len);
>
>
>> +        /* For traffic that comes from SNAT network, initiate CT state
>>>> before
>>>> +         * entering S_ROUTER_OUT_SNAT to allow matching on various CT
>>>> states.
>>>> +         */
>>>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70,
>>>> +                      ds_cstr(&match_all_from_snat), "ct_snat; ",
>>>> +                      lflow_ref);
>>>> +
>>>> +        build_lrouter_out_snat_match(lflows, od, nat, match,
>>>> +                                     distributed_nat, cidr_bits, is_v6,
>>>> +                                     l3dgw_port, lflow_ref, true);
>>>> +
>>>> +        /* New traffic that goes into SNAT network is committed to CT
>>>> to avoid
>>>> +         * SNAT-ing replies.*/
>>>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority,
>>>> +                      ds_cstr(match), "ct_snat;",
>>>> +                      lflow_ref);
>>>> +
>>>> +        ds_put_cstr(match, "&& ct.new");
>>>> +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority,
>>>> +                      ds_cstr(match), "ct_commit(snat); next; ",
>>>> +                      lflow_ref);
>>>> +    }
>>>> +
>>>> +    ds_destroy(&match_all_from_snat);
>>>>  }
>>>>
>>>>  static void
>>>> @@ -15167,7 +15209,7 @@ build_lrouter_nat_defrag_and_lb(
>>>>          } else {
>>>>              build_lrouter_out_snat_flow(lflows, od, nat, match,
>>>> actions,
>>>>                                          distributed_nat, mac,
>>>> cidr_bits, is_v6,
>>>> -                                        l3dgw_port, lflow_ref);
>>>> +                                        l3dgw_port, lflow_ref,
>>>> features);
>>>>          }
>>>>
>>>>          /* S_ROUTER_IN_ADMISSION - S_ROUTER_IN_IP_INPUT */
>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>>> index 17b414144..ce6fd1270 100644
>>>> --- a/northd/ovn-northd.8.xml
>>>> +++ b/northd/ovn-northd.8.xml
>>>> @@ -4869,6 +4869,13 @@ nd_ns {
>>>>
>>>>      <p>
>>>>        <ul>
>>>> +        <li>
>>>> +          A priority-70 logical flow is added that initiates CT state
>>>> for
>>>> +          traffic that is configured to be SNATed on Distributed
>>>> routers.
>>>> +          This allows the next table, <code>lr_out_snat</code>, to
>>>> +          effectively match on various CT states.
>>>> +        </li>
>>>> +
>>>>          <li>
>>>>            A priority-50 logical flow is added that commits any
>>>> untracked flows
>>>>            from the previous table <code>lr_out_undnat</code> for
>>>> Gateway
>>>> @@ -5061,6 +5068,18 @@ nd_ns {
>>>>
>>>>        </li>
>>>>
>>>> +      <li>
>>>> +        An additional flow is added for traffic that goes in opposite
>>>> +        direction (i.e. it enters a network with configured SNAT).
>>>> Where the
>>>> +        flow above matched on <code>ip4.src == <var>A</var> &amp;&amp;
>>>> outport
>>>> +        == <var>GW</var></code>, this flow matches on <code> ip4.dst ==
>>>> +        <var>A</var> &amp;&amp; inport == <var>GW</var></code>. A CT
>>>> state is
>>>> +        initiated for this traffic so that the following table, <code>
>>>> +        lr_out_post_snat</code>, can identify whether the traffic flow
>>>> was
>>>> +        initiated from the internal or external network.
>>>> +
>>>> +      </li>
>>>> +
>>>>        <li>
>>>>          A priority-0 logical flow with match <code>1</code> has actions
>>>>          <code>next;</code>.
>>>> @@ -5072,6 +5091,16 @@ nd_ns {
>>>>      <p>
>>>>        Packets reaching this table are processed according to the flows
>>>> below:
>>>>        <ul>
>>>> +        <li>
>>>> +          Traffic that goes directly into a network configured with
>>>> SNAT on
>>>> +          Distributed routers, and was initiated from an external
>>>> network (i.e.
>>>> +          it matches <code>ct.new</code>), is committed to the SNAT CT
>>>> zone.
>>>> +          This ensures that replies returning from the SNATed network
>>>> do not
>>>> +          have their source address translated. For details about
>>>> match rules
>>>> +          and priority see section "Egress Table 3: SNAT on Distributed
>>>> +          Routers".
>>>> +        </li>
>>>> +
>>>>          <li>
>>>>            A priority-0 logical flow that matches all packets not
>>>> already
>>>>            handled (match <code>1</code>) and action <code>next;</code>.
>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>>>> index 0732486f3..d4a1212d0 100644
>>>> --- a/tests/ovn-northd.at
>>>> +++ b/tests/ovn-northd.at
>>>> @@ -1107,7 +1107,8 @@ ovn_start
>>>>  #
>>>>  # DR is connected to S1 and CR is connected to S2
>>>>
>>>> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>>>
>>>>  check ovn-nbctl lr-add DR
>>>>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>>> @@ -1135,6 +1136,7 @@ echo "CR-LRP UUID is: " $uuid
>>>>  check ovn-nbctl set Logical_Router $cr_uuid options:chassis=gw1
>>>>  check ovn-nbctl --wait=sb sync
>>>>
>>>> +
>>>>
>>>
> nit: Unrelated
>

You mean the unnecessary extra line, right? ack.


>
>  ovn-nbctl create Address_Set name=allowed_range addresses=\"1.1.1.1\"
>>>>  ovn-nbctl create Address_Set name=disallowed_range
>>>> addresses=\"2.2.2.2\"
>>>>
>>>> @@ -1155,6 +1157,7 @@ AT_CAPTURE_FILE([crflows])
>>>>  AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl
>>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>>> action=(next;)
>>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>>> action=(next;)
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>>> ip4.src == $allowed_range), action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>>> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.16.1.1);)
>>>>  ])
>>>>
>>>> @@ -1185,6 +1188,7 @@ AT_CAPTURE_FILE([crflows2])
>>>>  AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl
>>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>>> action=(next;)
>>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>>> action=(next;)
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
>>>> action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);)
>>>>    table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src
>>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>>> ip4.dst == $disallowed_range), action=(next;)
>>>>  ])
>>>> @@ -1279,7 +1283,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 |
>>>> ovn_strip_lflows], [0], [dnl
>>>>    table=??(lr_out_snat        ), priority=33   , match=(ip && ip4.src
>>>> == 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2;
>>>> next;)
>>>>  ])
>>>>
>>>> -# Stateful FIP with DISALLOWED_IPs
>>>> +# Stateless FIP with DISALLOWED_IPs
>>>>  ovn-nbctl lr-nat-del DR dnat_and_snat  172.16.1.2
>>>>  ovn-nbctl lr-nat-del CR dnat_and_snat  172.16.1.2
>>>>
>>>> @@ -5489,7 +5493,8 @@ AT_CHECK([grep "lr_out_snat" lr0flows |
>>>> ovn_strip_lflows], [0], [dnl
>>>>
>>>>  check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>>>    -- set chassis gw1 other_config:ct-no-masked-label="true" \
>>>> -  -- set chassis gw1 other_config:ovn-ct-lb-related="true"
>>>> +  -- set chassis gw1 other_config:ovn-ct-lb-related="true" \
>>>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>>>
>>>>  # Create a distributed gw port on lr0
>>>>  check ovn-nbctl ls-add public
>>>> @@ -5590,12 +5595,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
>>>> ovn_strip_lflows], [0], [dnl
>>>>
>>>>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
>>>> [dnl
>>>>    table=??(lr_out_post_undnat ), priority=0    , match=(1),
>>>> action=(next;)
>>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>>> == 10.0.0.10 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>>>  ])
>>>>
>>>>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>>> action=(next;)
>>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>>> action=(next;)
>>>> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst
>>>> == 10.0.0.0/24 && inport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src
>>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.168.0.10);)
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 10.0.0.10 && inport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 10.0.0.10 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.168.0.30);)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 10.0.0.3 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.168.0.20);)
>>>>  ])
>>>> @@ -5738,12 +5747,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows |
>>>> ovn_strip_lflows], [0], [dnl
>>>>
>>>>  AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0],
>>>> [dnl
>>>>    table=??(lr_out_post_undnat ), priority=0    , match=(1),
>>>> action=(next;)
>>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>>> +  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src
>>>> == 10.0.0.10 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
>>>>  ])
>>>>
>>>>  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
>>>>    table=??(lr_out_snat        ), priority=0    , match=(1),
>>>> action=(next;)
>>>>    table=??(lr_out_snat        ), priority=120  , match=(nd_ns),
>>>> action=(next;)
>>>> +  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst
>>>> == 10.0.0.0/24 && inport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src
>>>> == 10.0.0.0/24 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.168.0.10);)
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 10.0.0.10 && inport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 10.0.0.10 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.168.0.30);)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 10.0.0.3 && outport == "lr0-public" &&
>>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)),
>>>> action=(ct_snat(172.168.0.20);)
>>>>  ])
>>>>
>>> @@ -7576,9 +7589,14 @@ ovn_start
>>>>  # Validate SNAT, DNAT and DNAT_AND_SNAT behavior with multiple
>>>>  # distributed gateway LRPs.
>>>>
>>>> -check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
>>>> -check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
>>>> -check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
>>>> +check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
>>>> +  -- set chassis gw1 other_config:ct-commit-to-zone="true"
>>>> +
>>>> +check ovn-sbctl chassis-add gw2 geneve 128.0.0.1 \
>>>> +  -- set chassis gw2 other_config:ct-commit-to-zone="true"
>>>> +
>>>> +check ovn-sbctl chassis-add gw3 geneve 129.0.0.1 \
>>>> +  -- set chassis gw3 other_config:ct-commit-to-zone="true"
>>>>
>>>>  check ovn-nbctl lr-add DR
>>>>  check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
>>>> @@ -7643,6 +7661,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep
>>>> ct_snat | ovn_strip_lflows], [0], [dn
>>>>  ])
>>>>
>>>>  AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows],
>>>> [0], [dnl
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")),
>>>> action=(ct_snat;)
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")),
>>>> action=(ct_snat;)
>>>> +  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst
>>>> == 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")),
>>>> action=(ct_snat;)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") &&
>>>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") &&
>>>> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);)
>>>>    table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src
>>>> == 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") &&
>>>> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);)
>>>>
>>>
> I would actually expect one of the tests in ovn-northd.at to also check
> the flow that does the ct_commit(snat).
>

ack.


>
>
>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>>> index c22c7882f..80e9b7d0b 100644
>>>> --- a/tests/system-ovn.at
>>>> +++ b/tests/system-ovn.at
>>>> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2
>>>> 10.0.0.1 | FORMAT_PING], \
>>>>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>>  ])
>>>>
>>>> +
>>>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
>>>> +# icmp, udp and tcp connectivity from external network to the 'vm' on
>>>> +# the specified 'ip'.
>>>> +test_connectivity_from_ext() {
>>>> +    local vm=$1; shift
>>>> +    local ip=$1; shift
>>>> +
>>>> +    # Start listening daemons for UDP and TCP connections
>>>> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
>>>> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
>>>> +
>>>> +    # Ensure that vm can be pinged on the specified IP
>>>> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip |
>>>> FORMAT_PING], \
>>>> +    [0], [dnl
>>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>> +])
>>>> +
>>>> +    # Perform two consecutive UDP connections to the specified IP
>>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>>> +
>>>> +    # Send data over TCP connection to the specified IP
>>>> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip
>>>> 1235])
>>>>
>>>
> Do we actually need to send data? I think -z is also enough for TCP to
> prove that the connection works.
>

Ordinarily nc with -z would be enough. However in this case even the
original code would pass the test without sending any payload. As mentioned
in the my ovs-discuss thread about this issue [0], TCP handshake worked
fine. The problem manifests only after the actual data was sent.

[0]
https://mail.openvswitch.org/pipermail/ovs-discuss/2024-January/052901.html


>
>
>> +}
>>>> +
>>>> +# Test access from external network to the internal IP of a VM that
>>>> +# has also configured DNAT
>>>> +test_connectivity_from_ext foo1 192.168.1.2
>>>> +
>>>> +# Test access from external network to the internal IP of a VM that
>>>> +# does not have DNAT
>>>> +test_connectivity_from_ext bar1 192.168.2.2
>>>> +
>>>>  OVS_WAIT_UNTIL([
>>>>      total_pkts=$(cat ext-net.pcap | wc -l)
>>>>      test "${total_pkts}" = "3"
>>>> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0],
>>>> [dnl
>>>>
>>>>  icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>>  ])
>>>>
>>>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
>>>> +# icmp, udp and tcp connectivity from external network to the 'vm' on
>>>> +# the specified 'ip'.
>>>> +test_connectivity_from_ext() {
>>>> +    local vm=$1; shift
>>>> +    local ip=$1; shift
>>>> +
>>>> +    # Start listening daemons for UDP and TCP connections
>>>> +    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
>>>> +    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
>>>> +
>>>> +    # Ensure that vm can be pinged on the specified IP
>>>> +    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip |
>>>> FORMAT_PING], \
>>>> +    [0], [dnl
>>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>> +])
>>>> +
>>>> +    # Perform two consecutive UDP connections to the specified IP
>>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>>> +    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
>>>> +
>>>> +    # Send data over TCP connection to the specified IP
>>>> +    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip
>>>> 1235])
>>>>
>>>
>
> Same as above.
>
> +}
>>>> +
>>>> +# Test access from external network to the internal IP of a VM that
>>>> +# has also configured DNAT
>>>> +test_connectivity_from_ext foo1 fd11::2
>>>> +
>>>> +# Test access from external network to the internal IP of a VM that
>>>> +# does not have DNAT
>>>> +test_connectivity_from_ext bar1 fd12::2
>>>> +
>>>>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>>>
>>>>  as ovn-sb
>>>> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
>>>> 172.16.1.4 | FORMAT_PING], \
>>>>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp |
>>>> FORMAT_CT(172.16.1.1) | \
>>>>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>>>
>>>>  icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>>>
>>>> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>>>
>>>>  icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
>>>>  ])
>>>>
>>>> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
>>>> fd20::4 | FORMAT_PING], \
>>>>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \
>>>>  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
>>>>
>>>>  icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>>
>>>> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>>
>>>>  icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
>>>>  ])
>>>>
>>>> --
>>>> 2.40.1
>>>>
>>>>
>>>
>>> --
>>> Best Regards,
>>> Martin Kalcok.
>>>
>>
>>
>> --
>> Best Regards,
>> Martin Kalcok.
>>
>
> Thanks,
> Ales
> --
>
> Ales Musil
>
> Senior Software Engineer - OVN Core
>
> Red Hat EMEA <https://www.redhat.com>
>
> amusil@redhat.com
> <https://red.ht/sig>
>
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index 2c3560ce2..25af52d5a 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -14438,20 +14438,27 @@  build_lrouter_out_is_dnat_local(struct lflow_table *lflows,
 
 static void
 build_lrouter_out_snat_match(struct lflow_table *lflows,
-                             const struct ovn_datapath *od,
-                             const struct nbrec_nat *nat, struct ds *match,
-                             bool distributed_nat, int cidr_bits, bool is_v6,
-                             struct ovn_port *l3dgw_port,
-                             struct lflow_ref *lflow_ref)
+                                         const struct ovn_datapath *od,
+                                         const struct nbrec_nat *nat,
+                                         struct ds *match,
+                                         bool distributed_nat, int cidr_bits,
+                                         bool is_v6,
+                                         struct ovn_port *l3dgw_port,
+                                         struct lflow_ref *lflow_ref,
+                                         bool is_reverse)
 {
     ds_clear(match);
 
-    ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4',
+    ds_put_format(match, "ip && ip%c.%s == %s",
+                  is_v6 ? '6' : '4',
+                  is_reverse ? "dst" : "src",
                   nat->logical_ip);
 
     if (!od->is_gw_router) {
         /* Distributed router. */
-        ds_put_format(match, " && outport == %s", l3dgw_port->json_key);
+        ds_put_format(match, " && %s == %s",
+                      is_reverse ? "inport" : "outport",
+                      l3dgw_port->json_key);
         if (od->n_l3dgw_ports) {
             ds_put_format(match, " && is_chassis_resident(\"%s\")",
                           distributed_nat
@@ -14462,7 +14469,7 @@  build_lrouter_out_snat_match(struct lflow_table *lflows,
 
     if (nat->allowed_ext_ips || nat->exempted_ext_ips) {
         lrouter_nat_add_ext_ip_match(od, lflows, match, nat,
-                                     is_v6, false, cidr_bits,
+                                     is_v6, is_reverse, cidr_bits,
                                      lflow_ref);
     }
 }
@@ -14489,7 +14496,8 @@  build_lrouter_out_snat_stateless_flow(struct lflow_table *lflows,
     uint16_t priority = cidr_bits + 1;
 
     build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
-                                 cidr_bits, is_v6, l3dgw_port, lflow_ref);
+                                 cidr_bits, is_v6, l3dgw_port, lflow_ref,
+                                 false);
 
     if (!od->is_gw_router) {
         /* Distributed router. */
@@ -14536,7 +14544,7 @@  build_lrouter_out_snat_in_czone_flow(struct lflow_table *lflows,
 
     build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
                                  cidr_bits, is_v6, l3dgw_port,
-                                 lflow_ref);
+                                 lflow_ref, false);
 
     if (od->n_l3dgw_ports) {
         priority += 128;
@@ -14585,12 +14593,14 @@  build_lrouter_out_snat_flow(struct lflow_table *lflows,
                             struct ds *actions, bool distributed_nat,
                             struct eth_addr mac, int cidr_bits, bool is_v6,
                             struct ovn_port *l3dgw_port,
-                            struct lflow_ref *lflow_ref)
+                            struct lflow_ref *lflow_ref,
+                            const struct chassis_features *features)
 {
     if (strcmp(nat->type, "snat") && strcmp(nat->type, "dnat_and_snat")) {
         return;
     }
 
+    struct ds match_all_from_snat = DS_EMPTY_INITIALIZER;
     ds_clear(actions);
 
     /* The priority here is calculated such that the
@@ -14599,7 +14609,9 @@  build_lrouter_out_snat_flow(struct lflow_table *lflows,
     uint16_t priority = cidr_bits + 1;
 
     build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat,
-                                 cidr_bits, is_v6, l3dgw_port, lflow_ref);
+                                 cidr_bits, is_v6, l3dgw_port, lflow_ref,
+                                 false);
+    ds_clone(&match_all_from_snat, match);
 
     if (!od->is_gw_router) {
         /* Distributed router. */
@@ -14624,6 +14636,36 @@  build_lrouter_out_snat_flow(struct lflow_table *lflows,
                             priority, ds_cstr(match),
                             ds_cstr(actions), &nat->header_,
                             lflow_ref);
+
+    /* For the SNAT networks, we need to make sure that connections are
+     * properly tracked so we can decide whether to perform SNAT on traffic
+     * exiting the network. */
+    if (features->ct_commit_to_zone && !strcmp(nat->type, "snat") &&
+        !od->is_gw_router) {
+        /* For traffic that comes from SNAT network, initiate CT state before
+         * entering S_ROUTER_OUT_SNAT to allow matching on various CT states.
+         */
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70,
+                      ds_cstr(&match_all_from_snat), "ct_snat; ",
+                      lflow_ref);
+
+        build_lrouter_out_snat_match(lflows, od, nat, match,
+                                     distributed_nat, cidr_bits, is_v6,
+                                     l3dgw_port, lflow_ref, true);
+
+        /* New traffic that goes into SNAT network is committed to CT to avoid
+         * SNAT-ing replies.*/
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority,
+                      ds_cstr(match), "ct_snat;",
+                      lflow_ref);
+
+        ds_put_cstr(match, "&& ct.new");
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority,
+                      ds_cstr(match), "ct_commit(snat); next; ",
+                      lflow_ref);
+    }
+
+    ds_destroy(&match_all_from_snat);
 }
 
 static void
@@ -15167,7 +15209,7 @@  build_lrouter_nat_defrag_and_lb(
         } else {
             build_lrouter_out_snat_flow(lflows, od, nat, match, actions,
                                         distributed_nat, mac, cidr_bits, is_v6,
-                                        l3dgw_port, lflow_ref);
+                                        l3dgw_port, lflow_ref, features);
         }
 
         /* S_ROUTER_IN_ADMISSION - S_ROUTER_IN_IP_INPUT */
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 17b414144..ce6fd1270 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -4869,6 +4869,13 @@  nd_ns {
 
     <p>
       <ul>
+        <li>
+          A priority-70 logical flow is added that initiates CT state for
+          traffic that is configured to be SNATed on Distributed routers.
+          This allows the next table, <code>lr_out_snat</code>, to
+          effectively match on various CT states.
+        </li>
+
         <li>
           A priority-50 logical flow is added that commits any untracked flows
           from the previous table <code>lr_out_undnat</code> for Gateway
@@ -5061,6 +5068,18 @@  nd_ns {
 
       </li>
 
+      <li>
+        An additional flow is added for traffic that goes in opposite
+        direction (i.e. it enters a network with configured SNAT). Where the
+        flow above matched on <code>ip4.src == <var>A</var> &amp;&amp; outport
+        == <var>GW</var></code>, this flow matches on <code> ip4.dst ==
+        <var>A</var> &amp;&amp; inport == <var>GW</var></code>. A CT state is
+        initiated for this traffic so that the following table, <code>
+        lr_out_post_snat</code>, can identify whether the traffic flow was
+        initiated from the internal or external network.
+
+      </li>
+
       <li>
         A priority-0 logical flow with match <code>1</code> has actions
         <code>next;</code>.
@@ -5072,6 +5091,16 @@  nd_ns {
     <p>
       Packets reaching this table are processed according to the flows below:
       <ul>
+        <li>
+          Traffic that goes directly into a network configured with SNAT on
+          Distributed routers, and was initiated from an external network (i.e.
+          it matches <code>ct.new</code>), is committed to the SNAT CT zone.
+          This ensures that replies returning from the SNATed network do not
+          have their source address translated. For details about match rules
+          and priority see section "Egress Table 3: SNAT on Distributed
+          Routers".
+        </li>
+
         <li>
           A priority-0 logical flow that matches all packets not already
           handled (match <code>1</code>) and action <code>next;</code>.
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 0732486f3..d4a1212d0 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1107,7 +1107,8 @@  ovn_start
 #
 # DR is connected to S1 and CR is connected to S2
 
-check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
+check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
+  -- set chassis gw1 other_config:ct-commit-to-zone="true"
 
 check ovn-nbctl lr-add DR
 check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
@@ -1135,6 +1136,7 @@  echo "CR-LRP UUID is: " $uuid
 check ovn-nbctl set Logical_Router $cr_uuid options:chassis=gw1
 check ovn-nbctl --wait=sb sync
 
+
 ovn-nbctl create Address_Set name=allowed_range addresses=\"1.1.1.1\"
 ovn-nbctl create Address_Set name=disallowed_range addresses=\"2.2.2.2\"
 
@@ -1155,6 +1157,7 @@  AT_CAPTURE_FILE([crflows])
 AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
   table=??(lr_out_snat        ), priority=120  , match=(nd_ns), action=(next;)
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") && ip4.src == $allowed_range), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);)
 ])
 
@@ -1185,6 +1188,7 @@  AT_CAPTURE_FILE([crflows2])
 AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
   table=??(lr_out_snat        ), priority=120  , match=(nd_ns), action=(next;)
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);)
   table=??(lr_out_snat        ), priority=163  , match=(ip && ip4.src == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && ip4.dst == $disallowed_range), action=(next;)
 ])
@@ -1279,7 +1283,7 @@  AT_CHECK([grep -e "lr_out_snat" crflows5 | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_snat        ), priority=33   , match=(ip && ip4.src == 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; next;)
 ])
 
-# Stateful FIP with DISALLOWED_IPs
+# Stateless FIP with DISALLOWED_IPs
 ovn-nbctl lr-nat-del DR dnat_and_snat  172.16.1.2
 ovn-nbctl lr-nat-del CR dnat_and_snat  172.16.1.2
 
@@ -5489,7 +5493,8 @@  AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
 
 check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
   -- set chassis gw1 other_config:ct-no-masked-label="true" \
-  -- set chassis gw1 other_config:ovn-ct-lb-related="true"
+  -- set chassis gw1 other_config:ovn-ct-lb-related="true" \
+  -- set chassis gw1 other_config:ct-commit-to-zone="true"
 
 # Create a distributed gw port on lr0
 check ovn-nbctl ls-add public
@@ -5590,12 +5595,16 @@  AT_CHECK([grep "lr_out_undnat" lr0flows | ovn_strip_lflows], [0], [dnl
 
 AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_post_undnat ), priority=0    , match=(1), action=(next;)
+  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
+  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
 ])
 
 AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
   table=??(lr_out_snat        ), priority=120  , match=(nd_ns), action=(next;)
+  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst == 10.0.0.0/24 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.10);)
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 10.0.0.10 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.30);)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);)
 ])
@@ -5738,12 +5747,16 @@  AT_CHECK([grep "lr_out_undnat" lr0flows | ovn_strip_lflows], [0], [dnl
 
 AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_post_undnat ), priority=0    , match=(1), action=(next;)
+  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
+  table=??(lr_out_post_undnat ), priority=70   , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; )
 ])
 
 AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_out_snat        ), priority=0    , match=(1), action=(next;)
   table=??(lr_out_snat        ), priority=120  , match=(nd_ns), action=(next;)
+  table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.dst == 10.0.0.0/24 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=153  , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.10);)
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 10.0.0.10 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.30);)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);)
 ])
@@ -7576,9 +7589,14 @@  ovn_start
 # Validate SNAT, DNAT and DNAT_AND_SNAT behavior with multiple
 # distributed gateway LRPs.
 
-check ovn-sbctl chassis-add gw1 geneve 127.0.0.1
-check ovn-sbctl chassis-add gw2 geneve 128.0.0.1
-check ovn-sbctl chassis-add gw3 geneve 129.0.0.1
+check ovn-sbctl chassis-add gw1 geneve 127.0.0.1 \
+  -- set chassis gw1 other_config:ct-commit-to-zone="true"
+
+check ovn-sbctl chassis-add gw2 geneve 128.0.0.1 \
+  -- set chassis gw2 other_config:ct-commit-to-zone="true"
+
+check ovn-sbctl chassis-add gw3 geneve 129.0.0.1 \
+  -- set chassis gw3 other_config:ct-commit-to-zone="true"
 
 check ovn-nbctl lr-add DR
 check ovn-nbctl lrp-add DR DR-S1 02:ac:10:01:00:01 172.16.1.1/24
@@ -7643,6 +7661,9 @@  AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat | ovn_strip_lflows], [0], [dn
 ])
 
 AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows], [0], [dnl
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), action=(ct_snat;)
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")), action=(ct_snat;)
+  table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.dst == 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")), action=(ct_snat;)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") && (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);)
   table=??(lr_out_snat        ), priority=161  , match=(ip && ip4.src == 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") && (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);)
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index c22c7882f..80e9b7d0b 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -3572,6 +3572,40 @@  NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.1 | FORMAT_PING], \
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
 ])
 
+
+# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
+# icmp, udp and tcp connectivity from external network to the 'vm' on
+# the specified 'ip'.
+test_connectivity_from_ext() {
+    local vm=$1; shift
+    local ip=$1; shift
+
+    # Start listening daemons for UDP and TCP connections
+    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
+    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
+
+    # Ensure that vm can be pinged on the specified IP
+    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING], \
+    [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+    # Perform two consecutive UDP connections to the specified IP
+    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
+    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
+
+    # Send data over TCP connection to the specified IP
+    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
+}
+
+# Test access from external network to the internal IP of a VM that
+# has also configured DNAT
+test_connectivity_from_ext foo1 192.168.1.2
+
+# Test access from external network to the internal IP of a VM that
+# does not have DNAT
+test_connectivity_from_ext bar1 192.168.2.2
+
 OVS_WAIT_UNTIL([
     total_pkts=$(cat ext-net.pcap | wc -l)
     test "${total_pkts}" = "3"
@@ -3738,6 +3772,39 @@  sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
 icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
 ])
 
+# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests
+# icmp, udp and tcp connectivity from external network to the 'vm' on
+# the specified 'ip'.
+test_connectivity_from_ext() {
+    local vm=$1; shift
+    local ip=$1; shift
+
+    # Start listening daemons for UDP and TCP connections
+    NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid])
+    NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid])
+
+    # Ensure that vm can be pinged on the specified IP
+    NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING], \
+    [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+    # Perform two consecutive UDP connections to the specified IP
+    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
+    NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z])
+
+    # Send data over TCP connection to the specified IP
+    NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235])
+}
+
+# Test access from external network to the internal IP of a VM that
+# has also configured DNAT
+test_connectivity_from_ext foo1 fd11::2
+
+# Test access from external network to the internal IP of a VM that
+# does not have DNAT
+test_connectivity_from_ext bar1 fd12::2
+
 OVS_APP_EXIT_AND_WAIT([ovn-controller])
 
 as ovn-sb
@@ -3918,6 +3985,7 @@  NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 172.16.1.4 | FORMAT_PING], \
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp | FORMAT_CT(172.16.1.1) | \
 sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
 icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
+icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
 icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared>
 ])
 
@@ -4086,6 +4154,7 @@  NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 fd20::4 | FORMAT_PING], \
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \
 sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
 icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
+icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
 icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared>
 ])