diff mbox series

[ovs-dev,v9] ovn-northd: introduce new allow-stateless ACL verb

Message ID 20210506143446.638103-1-ihrachys@redhat.com
State Superseded, archived
Headers show
Series [ovs-dev,v9] ovn-northd: introduce new allow-stateless ACL verb | expand

Commit Message

Ihar Hrachyshka May 6, 2021, 2:34 p.m. UTC
For allow-stateless ACLs, bypass connection tracking by avoiding
setting ct hints for matching traffic. Avoid sending all traffic to ct
when a stateful ACL is present.

===

Reusing an existing 'allow' verb for stateless matching would have its
drawbacks, specifically, when it comes to backwards incompatibility of
the new behavior with existing environments. When using "allow" ACLs
in mixed allow/allow-related environment, we still commit "allow"
traffic to conntrack. This unnecessarily hits performance when mixed
ACL action types were used for the same datapath. This is why we
introduce a new action verb to describe stateless behavior.

Another complexity to consider is the fact that with stateless
matching, one would not be able to rely on 'related' magic that
guarantees that reply traffic is passed through. Instead, the user
would have to accurately define matching rules both for request and
reply directions of a protocol session. Specifically, when allowing
ICMP for a specific peer host, one has to define 'allow-stateless'
rules that would match against ip.dst for request direction and ip.src
for reply direction. Other protocols and scenarios will require their
own fine grained matching approaches implemented by the user.

===

For performance measurements, qperf was used. Tests were executed on two
setups:

1) ovn-fake-multinode virtual environment;
2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute
   nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
   SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.

1) ovn-fake-multinode:

Performance measured between two virtual nodes, two ports
that belong to different LSs connected via router. Using qperf,
performance was measured for UDP, TCP, SCTP protocols (using
<proto>_lat and <proto>_bw tests). The qperf version used:
0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
averages compared.

Tests were executed with `allow-stateless` rules for the tested
protocol and `allow-related` for another protocol set for both ports,
both directions, e.g. for TCP scenario, the following ACLs were
defined:

ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless

ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related

In this particular environment, improvement was seen in send_bw,
latency, and msg_rate measurements, where applicable, for all three
protocols under test.

for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
         latency: 16 us => 14.08 us (-12%)
         msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)

for TCP, latency: 18.6 us => 14.88 us (-20%)
         msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)

for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
          msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)

Interestingly, some performance improvement was also seen for the same
scenarios with no ACLs set at all, albeit significantly more
negligible.

for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
         latency: 13.74 us => 12.88 us (-6.68%)
         msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)

for TCP, latency: 15.62 us => 14.26 us (-9.54%)
         msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)

for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
          msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)

2) Supermicro RH-OSP cluster:

Two scenarios executed:
- connectivity between two ports on the same switch
- connectivity between two ports on different switches connected via
  router

For the same switch, improvements are as follows:
- TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
- TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%

For different switches, improvements are as follows:
- TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
- TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%

The effect is more noticeable in same switch scenario.

Comparable numbers can be captured with iperf. It may be useful to run
more tests in a more elaborate (bare metal) environment.

===

The patch takes inspiration from a now abandoned patch:

"ovn-northd: Support mixing stateless/stateful ACLs with
Stateless_Filter." by Dumitru Ceara.

The original patch assumed CMS doesn't require full flexibility of
matching rules for stateless matching (for example, to be used by
OpenShift). But other CMS interfaces may require the same
customizability for stateless as well as stateful matching, like in
OpenStack Neutron API. Which is why this patch reuses existing ACL
object type to describe stateless rules.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>

---

v1: initial version.
v2: rebased after conflict.
v3: added ddlog implementation.
v3: apply stateless skip-hint-set rules to appropriate direction only.
v3: added more background as to implementation in commit message.
v3: test stateless scenarios with ddlog too.
v3: rebased after conflict.
v4: introduce a separate allow-stateless ACL match verb.
v5: rebased.
v6: updated docs for new allow-stateless approach.
v6: removed no longer valid comments.
v6: removed acl_is_stateless.
v7: bump north db schema version to 5.31.0 -> 5.32.0.
v8: fixed checkpatch failure on a too long line in dbschema.
v8: added perf data on baremetal lab testing.
v9: fixed ovsdb checksum.
---
 NEWS                    |   2 +
 northd/ovn-northd.8.xml |   8 +-
 northd/ovn-northd.c     |  65 ++++++++-
 northd/ovn_northd.dl    |  31 ++++
 ovn-nb.ovsschema        |   9 +-
 ovn-nb.xml              |   9 +-
 tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
 utilities/ovn-nbctl.c   |   6 +-
 8 files changed, 428 insertions(+), 11 deletions(-)

Comments

Han Zhou May 7, 2021, 5:35 a.m. UTC | #1
Hi Ihar,

Thanks for the patch, and it's great to see the significant performance
improvement!
Please see my comments below.

On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>
> For allow-stateless ACLs, bypass connection tracking by avoiding
> setting ct hints for matching traffic. Avoid sending all traffic to ct
> when a stateful ACL is present.
>
> ===
>
> Reusing an existing 'allow' verb for stateless matching would have its
> drawbacks, specifically, when it comes to backwards incompatibility of
> the new behavior with existing environments. When using "allow" ACLs
> in mixed allow/allow-related environment, we still commit "allow"
> traffic to conntrack. This unnecessarily hits performance when mixed
> ACL action types were used for the same datapath. This is why we
> introduce a new action verb to describe stateless behavior.
>
> Another complexity to consider is the fact that with stateless
> matching, one would not be able to rely on 'related' magic that
> guarantees that reply traffic is passed through. Instead, the user
> would have to accurately define matching rules both for request and
> reply directions of a protocol session. Specifically, when allowing
> ICMP for a specific peer host, one has to define 'allow-stateless'
> rules that would match against ip.dst for request direction and ip.src
> for reply direction. Other protocols and scenarios will require their
> own fine grained matching approaches implemented by the user.
>
> ===
>
> For performance measurements, qperf was used. Tests were executed on two
> setups:
>
> 1) ovn-fake-multinode virtual environment;
> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute
>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
>
> 1) ovn-fake-multinode:
>
> Performance measured between two virtual nodes, two ports
> that belong to different LSs connected via router. Using qperf,
> performance was measured for UDP, TCP, SCTP protocols (using
> <proto>_lat and <proto>_bw tests). The qperf version used:
> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> averages compared.
>
> Tests were executed with `allow-stateless` rules for the tested
> protocol and `allow-related` for another protocol set for both ports,
> both directions, e.g. for TCP scenario, the following ACLs were
> defined:
>
> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
>
> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
>
> In this particular environment, improvement was seen in send_bw,
> latency, and msg_rate measurements, where applicable, for all three
> protocols under test.
>
> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
>          latency: 16 us => 14.08 us (-12%)
>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
>
> for TCP, latency: 18.6 us => 14.88 us (-20%)
>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
>
> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
>
> Interestingly, some performance improvement was also seen for the same
> scenarios with no ACLs set at all, albeit significantly more
> negligible.
>
> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
>          latency: 13.74 us => 12.88 us (-6.68%)
>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
>
> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
>
> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
>
> 2) Supermicro RH-OSP cluster:
>
> Two scenarios executed:
> - connectivity between two ports on the same switch
> - connectivity between two ports on different switches connected via
>   router
>
> For the same switch, improvements are as follows:
> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
>
> For different switches, improvements are as follows:
> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
>
> The effect is more noticeable in same switch scenario.

This is interesting. Any idea why with different switches the result is so
different? I'd expect similar improvement just like in same switch, because
in this kind of test the flow should be cached in kernel datapath
(regardless of same/different logical switches) and in both cases the cost
of traversing conntrack is saved.

>
> Comparable numbers can be captured with iperf. It may be useful to run
> more tests in a more elaborate (bare metal) environment.
>
> ===
>
> The patch takes inspiration from a now abandoned patch:
>
> "ovn-northd: Support mixing stateless/stateful ACLs with
> Stateless_Filter." by Dumitru Ceara.
>
> The original patch assumed CMS doesn't require full flexibility of
> matching rules for stateless matching (for example, to be used by
> OpenShift). But other CMS interfaces may require the same
> customizability for stateless as well as stateful matching, like in
> OpenStack Neutron API. Which is why this patch reuses existing ACL
> object type to describe stateless rules.
>
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
>
> ---
>
> v1: initial version.
> v2: rebased after conflict.
> v3: added ddlog implementation.
> v3: apply stateless skip-hint-set rules to appropriate direction only.
> v3: added more background as to implementation in commit message.
> v3: test stateless scenarios with ddlog too.
> v3: rebased after conflict.
> v4: introduce a separate allow-stateless ACL match verb.
> v5: rebased.
> v6: updated docs for new allow-stateless approach.
> v6: removed no longer valid comments.
> v6: removed acl_is_stateless.
> v7: bump north db schema version to 5.31.0 -> 5.32.0.
> v8: fixed checkpatch failure on a too long line in dbschema.
> v8: added perf data on baremetal lab testing.
> v9: fixed ovsdb checksum.
> ---
>  NEWS                    |   2 +
>  northd/ovn-northd.8.xml |   8 +-
>  northd/ovn-northd.c     |  65 ++++++++-
>  northd/ovn_northd.dl    |  31 ++++
>  ovn-nb.ovsschema        |   9 +-
>  ovn-nb.xml              |   9 +-
>  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
>  utilities/ovn-nbctl.c   |   6 +-
>  8 files changed, 428 insertions(+), 11 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 1ddde15f8..d72139e76 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -11,6 +11,8 @@ Post-v21.03.0
>      'use_parallel_build' to enable it.  It is disabled by default.
>    - Support vlan-passthru mode for tag=0 localnet ports.
>    - Support custom 802.11ad EthType for localnet ports.
> +  - Introduce a new "allow-stateless" ACL verb to always bypass
connection
> +    tracking. The existing "allow" verb behavior is left intact.
>
>  OVN v21.03.0 - 12 Mar 2021
>  -------------------------
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 54e88d3fa..08c4ea852 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -419,7 +419,9 @@
>        before eventually advancing to ingress table <code>ACLs</code>. If
>        special ports such as route ports or localnet ports can't use
ct(), a
>        priority-110 flow is added to skip over stateful ACLs. IPv6
Neighbor
> -      Discovery and MLD traffic also skips stateful ACLs.
> +      Discovery and MLD traffic also skips stateful ACLs. For
"allow-stateless"
> +      ACLs, a flow is added to bypass setting the hint for connection
tracker
> +      processing.

This approach clearly achieves the goal of bypassing conntrack and there is
great dataplane performance gain, but it also doubles the ACL logical flows
which in turn could significantly increase the control plane cost,
especially in ovn-controller. In large scale environments where lots of
ACLs referencing large port-group/address-sets are used, it is already a
bottleneck of ovn-controller flow computing. It would be better if we could
avoid doubling this cost. Instead of directly add the ACL match here, can
we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
pre-stateful add a higher priority flow, if this hint is matched then
directly "next" without going through CT? I hope this can achieve the same
goal but not increasing control plane cost. What do you think?

>      </p>
>
>      <p>
> @@ -614,6 +616,10 @@
>          for new connections and <code>reg0[1] = 1; next;</code> for
existing
>          connections.
>        </li>
> +      <li>
> +        <code>allow-stateless</code> ACLs translate into logical
> +        flows with the <code>next;</code> action.
> +      </li>
>        <li>
>          <code>reject</code> ACLs translate into logical
>          flows with the
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 94fae5648..4032f1441 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath *od,
struct ovn_port *op,
>  }
>
>  static void
> -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
> +build_stateless_filter(struct ovn_datapath *od,
> +                       const struct nbrec_acl *acl,
> +                       struct hmap *lflows)
> +{
> +    if (!strcmp(acl->direction, "from-lport")) {
> +        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL,
> +                                acl->priority + OVN_ACL_PRI_OFFSET,
> +                                acl->match,
> +                                "next;",
> +                                &acl->header_);
> +    } else {
> +        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL,
> +                                acl->priority + OVN_ACL_PRI_OFFSET,
> +                                acl->match,
> +                                "next;",
> +                                &acl->header_);
> +    }
> +}
> +
> +static void
> +build_stateless_filters(struct ovn_datapath *od, struct hmap
*port_groups,
> +                        struct hmap *lflows)
> +{
> +    for (size_t i = 0; i < od->nbs->n_acls; i++) {
> +        const struct nbrec_acl *acl = od->nbs->acls[i];
> +        if (!strcmp(acl->action, "allow-stateless")) {
> +            build_stateless_filter(od, acl, lflows);
> +        }
> +    }
> +
> +    struct ovn_port_group *pg;
> +    HMAP_FOR_EACH (pg, key_node, port_groups) {
> +        if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) {
> +            for (size_t i = 0; i < pg->nb_pg->n_acls; i++) {
> +                const struct nbrec_acl *acl = pg->nb_pg->acls[i];
> +                if (!strcmp(acl->action, "allow-stateless")) {
> +                    build_stateless_filter(od, acl, lflows);
> +                }
> +            }
> +        }
> +    }
> +}
> +
> +static void
> +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups,
> +               struct hmap *lflows)
>  {
>      /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are
>       * allowed by default. */
> @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct hmap
*lflows)
>                                       110, lflows);
>          }
>

nit: there is a comment here several lines above this that needs to be
updated:

|     /* If there are any stateful ACL rules in this datapath, we must

|      * send all IP packets through the conntrack action, which handles

|      * defragmentation, in order to match L4 headers. */

> +        build_stateless_filters(od, port_groups, lflows);
> +
>          /* Ingress and Egress Pre-ACL Table (Priority 110).
>           *
>           * Not to do conntrack on ND and ICMP destination
> @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct
nbrec_acl *acl,
>      } else if (!strcmp(acl->action, "reject")) {
>          ds_put_cstr(actions, "verdict=reject, ");
>      } else if (!strcmp(acl->action, "allow")
> -        || !strcmp(acl->action, "allow-related")) {
> +        || !strcmp(acl->action, "allow-related")
> +        || !strcmp(acl->action, "allow-stateless")) {
>          ds_put_cstr(actions, "verdict=allow, ");
>      }
>
> @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct
ovn_datapath *od,
>      bool ingress = !strcmp(acl->direction, "from-lport") ? true :false;
>      enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL;
>
> -    if (!strcmp(acl->action, "allow")
> +    if (!strcmp(acl->action, "allow-stateless")) {
> +        struct ds actions = DS_EMPTY_INITIALIZER;
> +        build_acl_log(&actions, acl, meter_groups);
> +        ds_put_cstr(&actions, "next;");
> +        ovn_lflow_add_with_hint(lflows, od, stage,
> +                                acl->priority + OVN_ACL_PRI_OFFSET,
> +                                acl->match, ds_cstr(&actions),
> +                                &acl->header_);
> +        ds_destroy(&actions);
> +    } else if (!strcmp(acl->action, "allow")
>          || !strcmp(acl->action, "allow-related")) {
>          /* If there are any stateful flows, we must even commit "allow"
>           * actions.  This is because, while the initiater's
> @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct
ovn_datapath *od,
>          od->has_stateful_acl = ls_has_stateful_acl(od);
>          od->has_lb_vip = ls_has_lb_vip(od);
>
> -        build_pre_acls(od, lflows);
> +        build_pre_acls(od, port_groups, lflows);
>          build_pre_lb(od, lflows, meter_groups, lbs);
>          build_pre_stateful(od, lflows);
>          build_acl_hints(od, lflows);
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 7953325aa..5ae52b3e8 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) {
>           .external_ids     = map_empty())
>  }
>
> +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter
= fair_meter)) {
> +    if (sw.has_stateful_acl) {
> +        if (acl.action == "allow-stateless") {
> +            if (acl.direction == "from-lport") {
> +                Flow(.logical_datapath = ls._uuid,
> +                     .stage            = s_SWITCH_IN_PRE_ACL(),
> +                     .priority         = acl.priority +
oVN_ACL_PRI_OFFSET(),
> +                     .__match          = acl.__match,
> +                     .actions          = "next;",
> +                     .external_ids     = stage_hint(acl._uuid))
> +            } else {
> +                Flow(.logical_datapath = ls._uuid,
> +                     .stage            = s_SWITCH_OUT_PRE_ACL(),
> +                     .priority         = acl.priority +
oVN_ACL_PRI_OFFSET(),
> +                     .__match          = acl.__match,
> +                     .actions          = "next;",
> +                     .external_ids     = stage_hint(acl._uuid))
> +            }
> +        }
> +    }
> +}
>
>  /* If there are any stateful ACL rules in this datapath, we must
>   * send all IP packets through the conntrack action, which handles
> @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL, fair_meter:
bool): string =
>              "allow-related" -> {
>                  strs.push("verdict=allow")
>              },
> +            "allow-stateless" -> {
> +                strs.push("verdict=allow")
> +            },
>              _ -> ()
>          };
>          match (acl.meter) {
> @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl =
&acl, .has_fair_meter = fair_
>                   .actions          = "${acl_log}next;",
>                   .external_ids     = stage_hint)
>          }
> +    } else if (acl.action == "allow-stateless") {
> +        Flow(.logical_datapath = ls._uuid,
> +             .stage            = stage,
> +             .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
> +             .__match          = acl.__match,
> +             .actions          = "${acl_log}next;",
> +             .external_ids     = stage_hint)
>      } else if (acl.action == "drop" or acl.action == "reject") {
>          /* The implementation of "drop" differs if stateful ACLs are in
>           * use for this datapath.  In that case, the actions differ
> diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema
> index 29019809c..faf619a1c 100644
> --- a/ovn-nb.ovsschema
> +++ b/ovn-nb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Northbound",
> -    "version": "5.31.0",
> -    "cksum": "2352750632 28701",
> +    "version": "5.32.0",
> +    "cksum": "204590300 28863",
>      "tables": {
>          "NB_Global": {
>              "columns": {
> @@ -221,7 +221,10 @@
>                                              "enum": ["set",
["from-lport", "to-lport"]]}}},
>                  "match": {"type": "string"},
>                  "action": {"type": {"key": {"type": "string",
> -                                            "enum": ["set", ["allow",
"allow-related", "drop", "reject"]]}}},
> +                                            "enum": ["set",
> +                                               ["allow", "allow-related",
> +                                                "allow-stateless",
"drop",
> +                                                "reject"]]}}},
>                  "log": {"type": "boolean"},
>                  "severity": {"type": {"key": {"type": "string",
>                                                "enum": ["set",
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index feb38b5d3..5386c8fef 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -1791,7 +1791,9 @@
>        <p>The action to take when the ACL rule matches:</p>
>        <ul>
>          <li>
> -          <code>allow</code>: Forward the packet.
> +          <code>allow</code>: Forward the packet. It will also send the
> +          packets through connection tracking when
> +          <code>allow-related</code> rules exist on the logical switch.
>          </li>

Thanks for updating the documentation. It is really helpful for
understanding the difference between "allow" and "allow-stateless". I think
it could be even more clear if we just say something like: when there is no
"allow-related" rules exist on the logical switch, "allow" is equivalent to
"allow-stateless". Otherwise, it is equivalent to "allow-related". (And the
order of "allow-stateless" and "allow" may be switched)

>
>          <li>
> @@ -1799,6 +1801,11 @@
>            (e.g. inbound replies to an outbound connection).
>          </li>
>
> +        <li>
> +          <code>allow-stateless</code>: Always forward the packet in
stateless
> +          manner. May require defining additional rules for inbound
replies.
> +        </li>
> +
It would be better to clarify more on the "inbound replies" part. The
commit message has more details, but it is better to make sure the users
understand it by just reading this text. We may also emphasise that the
packets will not go through conntrack table, even if there are other
"allow-related" rules co-exist.

Thanks,
Han

>          <li>
>            <code>drop</code>: Silently drop the packet.
>          </li>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 32afb4fa8..f62526fd3 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] ==
[[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0],
>  AT_CLEANUP
>  ])
>
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch])
> +ovn_start
> +
> +ovn-nbctl ls-add ls
> +ovn-nbctl lsp-add ls lsp1
> +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> +ovn-nbctl lsp-add ls lsp2
> +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> +
> +for direction in from to; do
> +    ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related
> +    ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related
> +    ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop
> +done
> +ovn-nbctl --wait=sb sync
> +
> +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
> +flow_tcp='tcp && tcp.dst == 80'
> +flow_udp='udp && udp.dst == 80'
> +
> +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> +
> +# TCP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +    ct_next(ct_state=new|trk) {
> +        output("lsp2");
> +    };
> +};
> +])
> +
> +# UDP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_next(ct_state=new|trk) {
> +        output("lsp2");
> +    };
> +};
> +])
> +
> +# Allow stateless for TCP.
> +for direction in from to; do
> +    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> +done
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should not go to conntrack anymore.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +output("lsp2");
> +])
> +
> +# UDP packets still go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_next(ct_state=new|trk) {
> +        output("lsp2");
> +    };
> +};
> +])
> +
> +# Add a load balancer.
> +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
> +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
> +ovn-nbctl ls-lb-add ls lb-tcp
> +ovn-nbctl ls-lb-add ls lb-udp
> +
> +# Remove stateless for TCP.
> +ovn-nbctl acl-del ls
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +# UDP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +# Allow stateless for TCP.
> +for direction in from to; do
> +    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> +done
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should go to conntrack for load balancing.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +# UDP packets still go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +AT_CLEANUP
> +])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group])
> +ovn_start
> +
> +ovn-nbctl ls-add ls
> +ovn-nbctl lsp-add ls lsp1
> +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> +ovn-nbctl lsp-add ls lsp2
> +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> +
> +ovn-nbctl pg-add pg lsp1 lsp2
> +
> +for direction in from to; do
> +    ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related
> +    ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related
> +    ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop
> +done
> +ovn-nbctl --wait=sb sync
> +
> +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> +echo $lsp1_inport
> +
> +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
> +flow_tcp='tcp && tcp.dst == 80'
> +flow_udp='udp && udp.dst == 80'
> +
> +# TCP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +    ct_next(ct_state=new|trk) {
> +        output("lsp2");
> +    };
> +};
> +])
> +
> +# UDP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_next(ct_state=new|trk) {
> +        output("lsp2");
> +    };
> +};
> +])
> +
> +# Allow stateless for TCP.
> +for direction in from to; do
> +    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
> +done
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should not go to conntrack anymore.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +output("lsp2");
> +])
> +
> +# UDP packets still go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_next(ct_state=new|trk) {
> +        output("lsp2");
> +    };
> +};
> +])
> +
> +# Add a load balancer.
> +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
> +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
> +ovn-nbctl ls-lb-add ls lb-tcp
> +ovn-nbctl ls-lb-add ls lb-udp
> +
> +# Remove stateless for TCP.
> +ovn-nbctl acl-del pg
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +# UDP packets should go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +# Allow stateless for TCP.
> +for direction in from to; do
> +    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
> +done
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should go to conntrack for load balancing.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +# UDP packets still go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> +ct_next(ct_state=new|trk) {
> +    ct_lb {
> +        reg0[[6]] = 0;
> +        *** chk_lb_hairpin_reply action not implemented;
> +        reg0[[12]] = 0;
> +        ct_next(ct_state=new|trk) {
> +            output("lsp2");
> +        };
> +    };
> +};
> +])
> +
> +AT_CLEANUP
> +])
> +
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([ovn -- check BFD config propagation to SBDB])
>  AT_KEYWORDS([northd-bfd])
> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
> index 042c21002..48fd0b7ee 100644
> --- a/utilities/ovn-nbctl.c
> +++ b/utilities/ovn-nbctl.c
> @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>
>      /* Validate action. */
>      if (strcmp(action, "allow") && strcmp(action, "allow-related")
> -        && strcmp(action, "drop") && strcmp(action, "reject")) {
> +        && strcmp(action, "allow-stateless") && strcmp(action, "drop")
> +        && strcmp(action, "reject")) {
>          ctl_error(ctx, "%s: action must be one of \"allow\", "
> -                  "\"allow-related\", \"drop\", and \"reject\"", action);
> +                  "\"allow-related\", \"allow-stateless\", \"drop\", "
> +                  "and \"reject\"", action);
>          return;
>      }
>
> --
> 2.31.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Dumitru Ceara May 7, 2021, 7:54 a.m. UTC | #2
On 5/7/21 7:35 AM, Han Zhou wrote:
> Hi Ihar,
> 
> Thanks for the patch, and it's great to see the significant performance
> improvement!
> Please see my comments below.
> 
> On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>>
>> For allow-stateless ACLs, bypass connection tracking by avoiding
>> setting ct hints for matching traffic. Avoid sending all traffic to ct
>> when a stateful ACL is present.
>>
>> ===
>>
>> Reusing an existing 'allow' verb for stateless matching would have its
>> drawbacks, specifically, when it comes to backwards incompatibility of
>> the new behavior with existing environments. When using "allow" ACLs
>> in mixed allow/allow-related environment, we still commit "allow"
>> traffic to conntrack. This unnecessarily hits performance when mixed
>> ACL action types were used for the same datapath. This is why we
>> introduce a new action verb to describe stateless behavior.
>>
>> Another complexity to consider is the fact that with stateless
>> matching, one would not be able to rely on 'related' magic that
>> guarantees that reply traffic is passed through. Instead, the user
>> would have to accurately define matching rules both for request and
>> reply directions of a protocol session. Specifically, when allowing
>> ICMP for a specific peer host, one has to define 'allow-stateless'
>> rules that would match against ip.dst for request direction and ip.src
>> for reply direction. Other protocols and scenarios will require their
>> own fine grained matching approaches implemented by the user.
>>
>> ===
>>
>> For performance measurements, qperf was used. Tests were executed on two
>> setups:
>>
>> 1) ovn-fake-multinode virtual environment;
>> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute
>>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
>>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
>>
>> 1) ovn-fake-multinode:
>>
>> Performance measured between two virtual nodes, two ports
>> that belong to different LSs connected via router. Using qperf,
>> performance was measured for UDP, TCP, SCTP protocols (using
>> <proto>_lat and <proto>_bw tests). The qperf version used:
>> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
>> averages compared.
>>
>> Tests were executed with `allow-stateless` rules for the tested
>> protocol and `allow-related` for another protocol set for both ports,
>> both directions, e.g. for TCP scenario, the following ACLs were
>> defined:
>>
>> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
>> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
>> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
>> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
>>
>> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
>> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
>> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
>> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
>>
>> In this particular environment, improvement was seen in send_bw,
>> latency, and msg_rate measurements, where applicable, for all three
>> protocols under test.
>>
>> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
>>          latency: 16 us => 14.08 us (-12%)
>>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
>>
>> for TCP, latency: 18.6 us => 14.88 us (-20%)
>>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
>>
>> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
>>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
>>
>> Interestingly, some performance improvement was also seen for the same
>> scenarios with no ACLs set at all, albeit significantly more
>> negligible.
>>
>> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
>>          latency: 13.74 us => 12.88 us (-6.68%)
>>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
>>
>> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
>>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
>>
>> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
>>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
>>
>> 2) Supermicro RH-OSP cluster:
>>
>> Two scenarios executed:
>> - connectivity between two ports on the same switch
>> - connectivity between two ports on different switches connected via
>>   router
>>
>> For the same switch, improvements are as follows:
>> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
>> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
>>
>> For different switches, improvements are as follows:
>> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
>> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
>>
>> The effect is more noticeable in same switch scenario.
> 
> This is interesting. Any idea why with different switches the result is so
> different? I'd expect similar improvement just like in same switch, because
> in this kind of test the flow should be cached in kernel datapath
> (regardless of same/different logical switches) and in both cases the cost
> of traversing conntrack is saved.
> 
>>
>> Comparable numbers can be captured with iperf. It may be useful to run
>> more tests in a more elaborate (bare metal) environment.
>>
>> ===
>>
>> The patch takes inspiration from a now abandoned patch:
>>
>> "ovn-northd: Support mixing stateless/stateful ACLs with
>> Stateless_Filter." by Dumitru Ceara.
>>
>> The original patch assumed CMS doesn't require full flexibility of
>> matching rules for stateless matching (for example, to be used by
>> OpenShift). But other CMS interfaces may require the same
>> customizability for stateless as well as stateful matching, like in
>> OpenStack Neutron API. Which is why this patch reuses existing ACL
>> object type to describe stateless rules.
>>
>> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
>>
>> ---
>>
>> v1: initial version.
>> v2: rebased after conflict.
>> v3: added ddlog implementation.
>> v3: apply stateless skip-hint-set rules to appropriate direction only.
>> v3: added more background as to implementation in commit message.
>> v3: test stateless scenarios with ddlog too.
>> v3: rebased after conflict.
>> v4: introduce a separate allow-stateless ACL match verb.
>> v5: rebased.
>> v6: updated docs for new allow-stateless approach.
>> v6: removed no longer valid comments.
>> v6: removed acl_is_stateless.
>> v7: bump north db schema version to 5.31.0 -> 5.32.0.
>> v8: fixed checkpatch failure on a too long line in dbschema.
>> v8: added perf data on baremetal lab testing.
>> v9: fixed ovsdb checksum.
>> ---
>>  NEWS                    |   2 +
>>  northd/ovn-northd.8.xml |   8 +-
>>  northd/ovn-northd.c     |  65 ++++++++-
>>  northd/ovn_northd.dl    |  31 ++++
>>  ovn-nb.ovsschema        |   9 +-
>>  ovn-nb.xml              |   9 +-
>>  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
>>  utilities/ovn-nbctl.c   |   6 +-
>>  8 files changed, 428 insertions(+), 11 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 1ddde15f8..d72139e76 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -11,6 +11,8 @@ Post-v21.03.0
>>      'use_parallel_build' to enable it.  It is disabled by default.
>>    - Support vlan-passthru mode for tag=0 localnet ports.
>>    - Support custom 802.11ad EthType for localnet ports.
>> +  - Introduce a new "allow-stateless" ACL verb to always bypass
> connection
>> +    tracking. The existing "allow" verb behavior is left intact.
>>
>>  OVN v21.03.0 - 12 Mar 2021
>>  -------------------------
>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>> index 54e88d3fa..08c4ea852 100644
>> --- a/northd/ovn-northd.8.xml
>> +++ b/northd/ovn-northd.8.xml
>> @@ -419,7 +419,9 @@
>>        before eventually advancing to ingress table <code>ACLs</code>. If
>>        special ports such as route ports or localnet ports can't use
> ct(), a
>>        priority-110 flow is added to skip over stateful ACLs. IPv6
> Neighbor
>> -      Discovery and MLD traffic also skips stateful ACLs.
>> +      Discovery and MLD traffic also skips stateful ACLs. For
> "allow-stateless"
>> +      ACLs, a flow is added to bypass setting the hint for connection
> tracker
>> +      processing.
> 
> This approach clearly achieves the goal of bypassing conntrack and there is
> great dataplane performance gain, but it also doubles the ACL logical flows
> which in turn could significantly increase the control plane cost,
> especially in ovn-controller. In large scale environments where lots of
> ACLs referencing large port-group/address-sets are used, it is already a
> bottleneck of ovn-controller flow computing. It would be better if we could
> avoid doubling this cost. Instead of directly add the ACL match here, can
> we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
> pre-stateful add a higher priority flow, if this hint is matched then
> directly "next" without going through CT? I hope this can achieve the same
> goal but not increasing control plane cost. What do you think?
> 

Wouldn't this break ACL priority?  E.g., with the following ACLs:

ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless
ACL2: from-lport, prio=50, ip, drop

If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic
will be dropped by the lflow added for ACL2.

On the other hand, IIUC, this feature is designed to be used in
combination with "allow-related" ACLs defined on the same switch,
replacing "allow" ACLs in that case.

Before this change an "allow" ACL would've been treated as
"allow-related" if there was at least one other "allow-related" ACL
applied on the logical switch.  Essentially generating 2 logical flows
per "allow" ACL:

lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then
"REGBIT_CONNTRACK_COMMIT=1, next"
lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next"

If I'm not wrong, changing this ACL to "allow-stateless" would also
generate 2 logical flows, so at least from a number of logical flows
perspective, the result is the same.

Regards,
Dumitru
Han Zhou May 7, 2021, 11:05 p.m. UTC | #3
On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 5/7/21 7:35 AM, Han Zhou wrote:
> > Hi Ihar,
> >
> > Thanks for the patch, and it's great to see the significant performance
> > improvement!
> > Please see my comments below.
> >
> > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com>
wrote:
> >>
> >> For allow-stateless ACLs, bypass connection tracking by avoiding
> >> setting ct hints for matching traffic. Avoid sending all traffic to ct
> >> when a stateful ACL is present.
> >>
> >> ===
> >>
> >> Reusing an existing 'allow' verb for stateless matching would have its
> >> drawbacks, specifically, when it comes to backwards incompatibility of
> >> the new behavior with existing environments. When using "allow" ACLs
> >> in mixed allow/allow-related environment, we still commit "allow"
> >> traffic to conntrack. This unnecessarily hits performance when mixed
> >> ACL action types were used for the same datapath. This is why we
> >> introduce a new action verb to describe stateless behavior.
> >>
> >> Another complexity to consider is the fact that with stateless
> >> matching, one would not be able to rely on 'related' magic that
> >> guarantees that reply traffic is passed through. Instead, the user
> >> would have to accurately define matching rules both for request and
> >> reply directions of a protocol session. Specifically, when allowing
> >> ICMP for a specific peer host, one has to define 'allow-stateless'
> >> rules that would match against ip.dst for request direction and ip.src
> >> for reply direction. Other protocols and scenarios will require their
> >> own fine grained matching approaches implemented by the user.
> >>
> >> ===
> >>
> >> For performance measurements, qperf was used. Tests were executed on
two
> >> setups:
> >>
> >> 1) ovn-fake-multinode virtual environment;
> >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2
compute
> >>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
> >>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> >>
> >> 1) ovn-fake-multinode:
> >>
> >> Performance measured between two virtual nodes, two ports
> >> that belong to different LSs connected via router. Using qperf,
> >> performance was measured for UDP, TCP, SCTP protocols (using
> >> <proto>_lat and <proto>_bw tests). The qperf version used:
> >> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> >> averages compared.
> >>
> >> Tests were executed with `allow-stateless` rules for the tested
> >> protocol and `allow-related` for another protocol set for both ports,
> >> both directions, e.g. for TCP scenario, the following ACLs were
> >> defined:
> >>
> >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> >>
> >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> >>
> >> In this particular environment, improvement was seen in send_bw,
> >> latency, and msg_rate measurements, where applicable, for all three
> >> protocols under test.
> >>
> >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> >>          latency: 16 us => 14.08 us (-12%)
> >>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> >>
> >> for TCP, latency: 18.6 us => 14.88 us (-20%)
> >>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> >>
> >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> >>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> >>
> >> Interestingly, some performance improvement was also seen for the same
> >> scenarios with no ACLs set at all, albeit significantly more
> >> negligible.
> >>
> >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> >>          latency: 13.74 us => 12.88 us (-6.68%)
> >>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> >>
> >> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> >>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> >>
> >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> >>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> >>
> >> 2) Supermicro RH-OSP cluster:
> >>
> >> Two scenarios executed:
> >> - connectivity between two ports on the same switch
> >> - connectivity between two ports on different switches connected via
> >>   router
> >>
> >> For the same switch, improvements are as follows:
> >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> >>
> >> For different switches, improvements are as follows:
> >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> >>
> >> The effect is more noticeable in same switch scenario.
> >
> > This is interesting. Any idea why with different switches the result is
so
> > different? I'd expect similar improvement just like in same switch,
because
> > in this kind of test the flow should be cached in kernel datapath
> > (regardless of same/different logical switches) and in both cases the
cost
> > of traversing conntrack is saved.
> >
> >>
> >> Comparable numbers can be captured with iperf. It may be useful to run
> >> more tests in a more elaborate (bare metal) environment.
> >>
> >> ===
> >>
> >> The patch takes inspiration from a now abandoned patch:
> >>
> >> "ovn-northd: Support mixing stateless/stateful ACLs with
> >> Stateless_Filter." by Dumitru Ceara.
> >>
> >> The original patch assumed CMS doesn't require full flexibility of
> >> matching rules for stateless matching (for example, to be used by
> >> OpenShift). But other CMS interfaces may require the same
> >> customizability for stateless as well as stateful matching, like in
> >> OpenStack Neutron API. Which is why this patch reuses existing ACL
> >> object type to describe stateless rules.
> >>
> >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> >>
> >> ---
> >>
> >> v1: initial version.
> >> v2: rebased after conflict.
> >> v3: added ddlog implementation.
> >> v3: apply stateless skip-hint-set rules to appropriate direction only.
> >> v3: added more background as to implementation in commit message.
> >> v3: test stateless scenarios with ddlog too.
> >> v3: rebased after conflict.
> >> v4: introduce a separate allow-stateless ACL match verb.
> >> v5: rebased.
> >> v6: updated docs for new allow-stateless approach.
> >> v6: removed no longer valid comments.
> >> v6: removed acl_is_stateless.
> >> v7: bump north db schema version to 5.31.0 -> 5.32.0.
> >> v8: fixed checkpatch failure on a too long line in dbschema.
> >> v8: added perf data on baremetal lab testing.
> >> v9: fixed ovsdb checksum.
> >> ---
> >>  NEWS                    |   2 +
> >>  northd/ovn-northd.8.xml |   8 +-
> >>  northd/ovn-northd.c     |  65 ++++++++-
> >>  northd/ovn_northd.dl    |  31 ++++
> >>  ovn-nb.ovsschema        |   9 +-
> >>  ovn-nb.xml              |   9 +-
> >>  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
> >>  utilities/ovn-nbctl.c   |   6 +-
> >>  8 files changed, 428 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 1ddde15f8..d72139e76 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -11,6 +11,8 @@ Post-v21.03.0
> >>      'use_parallel_build' to enable it.  It is disabled by default.
> >>    - Support vlan-passthru mode for tag=0 localnet ports.
> >>    - Support custom 802.11ad EthType for localnet ports.
> >> +  - Introduce a new "allow-stateless" ACL verb to always bypass
> > connection
> >> +    tracking. The existing "allow" verb behavior is left intact.
> >>
> >>  OVN v21.03.0 - 12 Mar 2021
> >>  -------------------------
> >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> >> index 54e88d3fa..08c4ea852 100644
> >> --- a/northd/ovn-northd.8.xml
> >> +++ b/northd/ovn-northd.8.xml
> >> @@ -419,7 +419,9 @@
> >>        before eventually advancing to ingress table <code>ACLs</code>.
If
> >>        special ports such as route ports or localnet ports can't use
> > ct(), a
> >>        priority-110 flow is added to skip over stateful ACLs. IPv6
> > Neighbor
> >> -      Discovery and MLD traffic also skips stateful ACLs.
> >> +      Discovery and MLD traffic also skips stateful ACLs. For
> > "allow-stateless"
> >> +      ACLs, a flow is added to bypass setting the hint for connection
> > tracker
> >> +      processing.
> >
> > This approach clearly achieves the goal of bypassing conntrack and
there is
> > great dataplane performance gain, but it also doubles the ACL logical
flows
> > which in turn could significantly increase the control plane cost,
> > especially in ovn-controller. In large scale environments where lots of
> > ACLs referencing large port-group/address-sets are used, it is already a
> > bottleneck of ovn-controller flow computing. It would be better if we
could
> > avoid doubling this cost. Instead of directly add the ACL match here,
can
> > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
> > pre-stateful add a higher priority flow, if this hint is matched then
> > directly "next" without going through CT? I hope this can achieve the
same
> > goal but not increasing control plane cost. What do you think?
> >
>
> Wouldn't this break ACL priority?  E.g., with the following ACLs:
>
> ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless
> ACL2: from-lport, prio=50, ip, drop
>
> If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic
> will be dropped by the lflow added for ACL2.
>
You are right. Actually I was thinking about matching the hint again in acl
stage and just move next, but it would skip all the stateful ACLs that
could have a higher priority. This would work only if the stateless ACLs
can always be treated as higher priority than stateful ACLs, but it seems
unreasonable. It seems all the ACL matches have to be evaluated at the same
stage to ensure the priorities are followed. So, forget about this proposal.

> On the other hand, IIUC, this feature is designed to be used in
> combination with "allow-related" ACLs defined on the same switch,
> replacing "allow" ACLs in that case.
>
> Before this change an "allow" ACL would've been treated as
> "allow-related" if there was at least one other "allow-related" ACL
> applied on the logical switch.  Essentially generating 2 logical flows
> per "allow" ACL:
>
> lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then
> "REGBIT_CONNTRACK_COMMIT=1, next"
> lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next"
>
> If I'm not wrong, changing this ACL to "allow-stateless" would also
> generate 2 logical flows, so at least from a number of logical flows
> perspective, the result is the same.
>
You are right. But this also means that when there is no stateful rules,
"allow-stateless" is worse than "allow" because "allow" would generate only
a single lflow in that case, while "allow-stateless" always generates
doubled lflows. So, I think at least we could do the simple improvement to
make "allow-stateless" perform the same as "allow" when there are NO
stateful ACLs. (Otherwise, my comment on the documentation regarding
'"allow-stateless" is equivalent to "allow" when there are no stateful
ACLs' would be wrong). We can move the logic of adding the stateless
matching flows of pre-acl stage into a if-block (only when has_stateful is
true). Would this work?

Thanks,
Han
Ihar Hrachyshka May 10, 2021, 1:28 a.m. UTC | #4
Hi Han,

On Fri, May 7, 2021 at 1:58 AM Han Zhou <hzhou@ovn.org> wrote:
>
> Hi Ihar,
>
> Thanks for the patch, and it's great to see the significant performance improvement!
> Please see my comments below.
>
> On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> >
> > For allow-stateless ACLs, bypass connection tracking by avoiding
> > setting ct hints for matching traffic. Avoid sending all traffic to ct
> > when a stateful ACL is present.
> >
> > ===
> >
> > Reusing an existing 'allow' verb for stateless matching would have its
> > drawbacks, specifically, when it comes to backwards incompatibility of
> > the new behavior with existing environments. When using "allow" ACLs
> > in mixed allow/allow-related environment, we still commit "allow"
> > traffic to conntrack. This unnecessarily hits performance when mixed
> > ACL action types were used for the same datapath. This is why we
> > introduce a new action verb to describe stateless behavior.
> >
> > Another complexity to consider is the fact that with stateless
> > matching, one would not be able to rely on 'related' magic that
> > guarantees that reply traffic is passed through. Instead, the user
> > would have to accurately define matching rules both for request and
> > reply directions of a protocol session. Specifically, when allowing
> > ICMP for a specific peer host, one has to define 'allow-stateless'
> > rules that would match against ip.dst for request direction and ip.src
> > for reply direction. Other protocols and scenarios will require their
> > own fine grained matching approaches implemented by the user.
> >
> > ===
> >
> > For performance measurements, qperf was used. Tests were executed on two
> > setups:
> >
> > 1) ovn-fake-multinode virtual environment;
> > 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute
> >    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
> >    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> >
> > 1) ovn-fake-multinode:
> >
> > Performance measured between two virtual nodes, two ports
> > that belong to different LSs connected via router. Using qperf,
> > performance was measured for UDP, TCP, SCTP protocols (using
> > <proto>_lat and <proto>_bw tests). The qperf version used:
> > 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> > averages compared.
> >
> > Tests were executed with `allow-stateless` rules for the tested
> > protocol and `allow-related` for another protocol set for both ports,
> > both directions, e.g. for TCP scenario, the following ACLs were
> > defined:
> >
> > ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> > ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> > ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> > ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> >
> > ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> > ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> > ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> > ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> >
> > In this particular environment, improvement was seen in send_bw,
> > latency, and msg_rate measurements, where applicable, for all three
> > protocols under test.
> >
> > for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> >          latency: 16 us => 14.08 us (-12%)
> >          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> >
> > for TCP, latency: 18.6 us => 14.88 us (-20%)
> >          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> >
> > for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> >           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> >
> > Interestingly, some performance improvement was also seen for the same
> > scenarios with no ACLs set at all, albeit significantly more
> > negligible.
> >
> > for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> >          latency: 13.74 us => 12.88 us (-6.68%)
> >          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> >
> > for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> >          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> >
> > for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> >           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> >
> > 2) Supermicro RH-OSP cluster:
> >
> > Two scenarios executed:
> > - connectivity between two ports on the same switch
> > - connectivity between two ports on different switches connected via
> >   router
> >
> > For the same switch, improvements are as follows:
> > - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> > - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> >
> > For different switches, improvements are as follows:
> > - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> > - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> >
> > The effect is more noticeable in same switch scenario.
>
> This is interesting. Any idea why with different switches the result is so different? I'd expect similar improvement just like in same switch, because in this kind of test the flow should be cached in kernel datapath (regardless of same/different logical switches) and in both cases the cost of traversing conntrack is saved.
>

Thanks for bringing this up. I've made some repeated measurements with
more test iterations to avoid flukes, also noticed that sometimes
qperf server loses connectivity to the client and reports incorrect
data, so avoided spurious runs in repeated tests. After doing all
that, I get numbers that are a lot closer for both scenarios, though
still I see some reduction in latency (f.e. 13.9% improvement for UDP
latency goes slightly down to 11.9%). One other thing I notice is that
TCP, for some reason, gains a lot less from the changes than UDP. My
knowledge is not enough to explain the difference though. :(

> >
> > Comparable numbers can be captured with iperf. It may be useful to run
> > more tests in a more elaborate (bare metal) environment.
> >
> > ===
> >
> > The patch takes inspiration from a now abandoned patch:
> >
> > "ovn-northd: Support mixing stateless/stateful ACLs with
> > Stateless_Filter." by Dumitru Ceara.
> >
> > The original patch assumed CMS doesn't require full flexibility of
> > matching rules for stateless matching (for example, to be used by
> > OpenShift). But other CMS interfaces may require the same
> > customizability for stateless as well as stateful matching, like in
> > OpenStack Neutron API. Which is why this patch reuses existing ACL
> > object type to describe stateless rules.
> >
> > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> >
> > ---
> >
> > v1: initial version.
> > v2: rebased after conflict.
> > v3: added ddlog implementation.
> > v3: apply stateless skip-hint-set rules to appropriate direction only.
> > v3: added more background as to implementation in commit message.
> > v3: test stateless scenarios with ddlog too.
> > v3: rebased after conflict.
> > v4: introduce a separate allow-stateless ACL match verb.
> > v5: rebased.
> > v6: updated docs for new allow-stateless approach.
> > v6: removed no longer valid comments.
> > v6: removed acl_is_stateless.
> > v7: bump north db schema version to 5.31.0 -> 5.32.0.
> > v8: fixed checkpatch failure on a too long line in dbschema.
> > v8: added perf data on baremetal lab testing.
> > v9: fixed ovsdb checksum.
> > ---
> >  NEWS                    |   2 +
> >  northd/ovn-northd.8.xml |   8 +-
> >  northd/ovn-northd.c     |  65 ++++++++-
> >  northd/ovn_northd.dl    |  31 ++++
> >  ovn-nb.ovsschema        |   9 +-
> >  ovn-nb.xml              |   9 +-
> >  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
> >  utilities/ovn-nbctl.c   |   6 +-
> >  8 files changed, 428 insertions(+), 11 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 1ddde15f8..d72139e76 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -11,6 +11,8 @@ Post-v21.03.0
> >      'use_parallel_build' to enable it.  It is disabled by default.
> >    - Support vlan-passthru mode for tag=0 localnet ports.
> >    - Support custom 802.11ad EthType for localnet ports.
> > +  - Introduce a new "allow-stateless" ACL verb to always bypass connection
> > +    tracking. The existing "allow" verb behavior is left intact.
> >
> >  OVN v21.03.0 - 12 Mar 2021
> >  -------------------------
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index 54e88d3fa..08c4ea852 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -419,7 +419,9 @@
> >        before eventually advancing to ingress table <code>ACLs</code>. If
> >        special ports such as route ports or localnet ports can't use ct(), a
> >        priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor
> > -      Discovery and MLD traffic also skips stateful ACLs.
> > +      Discovery and MLD traffic also skips stateful ACLs. For "allow-stateless"
> > +      ACLs, a flow is added to bypass setting the hint for connection tracker
> > +      processing.
>
> This approach clearly achieves the goal of bypassing conntrack and there is great dataplane performance gain, but it also doubles the ACL logical flows which in turn could significantly increase the control plane cost, especially in ovn-controller. In large scale environments where lots of ACLs referencing large port-group/address-sets are used, it is already a bottleneck of ovn-controller flow computing. It would be better if we could avoid doubling this cost. Instead of directly add the ACL match here, can we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage pre-stateful add a higher priority flow, if this hint is matched then directly "next" without going through CT? I hope this can achieve the same goal but not increasing control plane cost. What do you think?

I believe this is already addressed somewhere else in the thread.
There should be no difference in the number of flows in general case,
though I see your point about a potential optimization where we would
treat allow-stateless as allow when no allow-related ACLs present. If
possible, I would prefer to make it a follow-up for this patch because
of close release. (I'll send the patch to implement the proposal
tomorrow.)

>
> >      </p>
> >
> >      <p>
> > @@ -614,6 +616,10 @@
> >          for new connections and <code>reg0[1] = 1; next;</code> for existing
> >          connections.
> >        </li>
> > +      <li>
> > +        <code>allow-stateless</code> ACLs translate into logical
> > +        flows with the <code>next;</code> action.
> > +      </li>
> >        <li>
> >          <code>reject</code> ACLs translate into logical
> >          flows with the
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 94fae5648..4032f1441 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath *od, struct ovn_port *op,
> >  }
> >
> >  static void
> > -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
> > +build_stateless_filter(struct ovn_datapath *od,
> > +                       const struct nbrec_acl *acl,
> > +                       struct hmap *lflows)
> > +{
> > +    if (!strcmp(acl->direction, "from-lport")) {
> > +        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL,
> > +                                acl->priority + OVN_ACL_PRI_OFFSET,
> > +                                acl->match,
> > +                                "next;",
> > +                                &acl->header_);
> > +    } else {
> > +        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL,
> > +                                acl->priority + OVN_ACL_PRI_OFFSET,
> > +                                acl->match,
> > +                                "next;",
> > +                                &acl->header_);
> > +    }
> > +}
> > +
> > +static void
> > +build_stateless_filters(struct ovn_datapath *od, struct hmap *port_groups,
> > +                        struct hmap *lflows)
> > +{
> > +    for (size_t i = 0; i < od->nbs->n_acls; i++) {
> > +        const struct nbrec_acl *acl = od->nbs->acls[i];
> > +        if (!strcmp(acl->action, "allow-stateless")) {
> > +            build_stateless_filter(od, acl, lflows);
> > +        }
> > +    }
> > +
> > +    struct ovn_port_group *pg;
> > +    HMAP_FOR_EACH (pg, key_node, port_groups) {
> > +        if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) {
> > +            for (size_t i = 0; i < pg->nb_pg->n_acls; i++) {
> > +                const struct nbrec_acl *acl = pg->nb_pg->acls[i];
> > +                if (!strcmp(acl->action, "allow-stateless")) {
> > +                    build_stateless_filter(od, acl, lflows);
> > +                }
> > +            }
> > +        }
> > +    }
> > +}
> > +
> > +static void
> > +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups,
> > +               struct hmap *lflows)
> >  {
> >      /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are
> >       * allowed by default. */
> > @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
> >                                       110, lflows);
> >          }
> >
>
> nit: there is a comment here several lines above this that needs to be updated:

thanks, adjusted it somewhat

>
> |     /* If there are any stateful ACL rules in this datapath, we must
> |      * send all IP packets through the conntrack action, which handles
> |      * defragmentation, in order to match L4 headers. */
>
> > +        build_stateless_filters(od, port_groups, lflows);
> > +
> >          /* Ingress and Egress Pre-ACL Table (Priority 110).
> >           *
> >           * Not to do conntrack on ND and ICMP destination
> > @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct nbrec_acl *acl,
> >      } else if (!strcmp(acl->action, "reject")) {
> >          ds_put_cstr(actions, "verdict=reject, ");
> >      } else if (!strcmp(acl->action, "allow")
> > -        || !strcmp(acl->action, "allow-related")) {
> > +        || !strcmp(acl->action, "allow-related")
> > +        || !strcmp(acl->action, "allow-stateless")) {
> >          ds_put_cstr(actions, "verdict=allow, ");
> >      }
> >
> > @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
> >      bool ingress = !strcmp(acl->direction, "from-lport") ? true :false;
> >      enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL;
> >
> > -    if (!strcmp(acl->action, "allow")
> > +    if (!strcmp(acl->action, "allow-stateless")) {
> > +        struct ds actions = DS_EMPTY_INITIALIZER;
> > +        build_acl_log(&actions, acl, meter_groups);
> > +        ds_put_cstr(&actions, "next;");
> > +        ovn_lflow_add_with_hint(lflows, od, stage,
> > +                                acl->priority + OVN_ACL_PRI_OFFSET,
> > +                                acl->match, ds_cstr(&actions),
> > +                                &acl->header_);
> > +        ds_destroy(&actions);
> > +    } else if (!strcmp(acl->action, "allow")
> >          || !strcmp(acl->action, "allow-related")) {
> >          /* If there are any stateful flows, we must even commit "allow"
> >           * actions.  This is because, while the initiater's
> > @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct ovn_datapath *od,
> >          od->has_stateful_acl = ls_has_stateful_acl(od);
> >          od->has_lb_vip = ls_has_lb_vip(od);
> >
> > -        build_pre_acls(od, lflows);
> > +        build_pre_acls(od, port_groups, lflows);
> >          build_pre_lb(od, lflows, meter_groups, lbs);
> >          build_pre_stateful(od, lflows);
> >          build_acl_hints(od, lflows);
> > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> > index 7953325aa..5ae52b3e8 100644
> > --- a/northd/ovn_northd.dl
> > +++ b/northd/ovn_northd.dl
> > @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) {
> >           .external_ids     = map_empty())
> >  }
> >
> > +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_meter)) {
> > +    if (sw.has_stateful_acl) {
> > +        if (acl.action == "allow-stateless") {
> > +            if (acl.direction == "from-lport") {
> > +                Flow(.logical_datapath = ls._uuid,
> > +                     .stage            = s_SWITCH_IN_PRE_ACL(),
> > +                     .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
> > +                     .__match          = acl.__match,
> > +                     .actions          = "next;",
> > +                     .external_ids     = stage_hint(acl._uuid))
> > +            } else {
> > +                Flow(.logical_datapath = ls._uuid,
> > +                     .stage            = s_SWITCH_OUT_PRE_ACL(),
> > +                     .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
> > +                     .__match          = acl.__match,
> > +                     .actions          = "next;",
> > +                     .external_ids     = stage_hint(acl._uuid))
> > +            }
> > +        }
> > +    }
> > +}
> >
> >  /* If there are any stateful ACL rules in this datapath, we must
> >   * send all IP packets through the conntrack action, which handles
> > @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL, fair_meter: bool): string =
> >              "allow-related" -> {
> >                  strs.push("verdict=allow")
> >              },
> > +            "allow-stateless" -> {
> > +                strs.push("verdict=allow")
> > +            },
> >              _ -> ()
> >          };
> >          match (acl.meter) {
> > @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_
> >                   .actions          = "${acl_log}next;",
> >                   .external_ids     = stage_hint)
> >          }
> > +    } else if (acl.action == "allow-stateless") {
> > +        Flow(.logical_datapath = ls._uuid,
> > +             .stage            = stage,
> > +             .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
> > +             .__match          = acl.__match,
> > +             .actions          = "${acl_log}next;",
> > +             .external_ids     = stage_hint)
> >      } else if (acl.action == "drop" or acl.action == "reject") {
> >          /* The implementation of "drop" differs if stateful ACLs are in
> >           * use for this datapath.  In that case, the actions differ
> > diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema
> > index 29019809c..faf619a1c 100644
> > --- a/ovn-nb.ovsschema
> > +++ b/ovn-nb.ovsschema
> > @@ -1,7 +1,7 @@
> >  {
> >      "name": "OVN_Northbound",
> > -    "version": "5.31.0",
> > -    "cksum": "2352750632 28701",
> > +    "version": "5.32.0",
> > +    "cksum": "204590300 28863",
> >      "tables": {
> >          "NB_Global": {
> >              "columns": {
> > @@ -221,7 +221,10 @@
> >                                              "enum": ["set", ["from-lport", "to-lport"]]}}},
> >                  "match": {"type": "string"},
> >                  "action": {"type": {"key": {"type": "string",
> > -                                            "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}},
> > +                                            "enum": ["set",
> > +                                               ["allow", "allow-related",
> > +                                                "allow-stateless", "drop",
> > +                                                "reject"]]}}},
> >                  "log": {"type": "boolean"},
> >                  "severity": {"type": {"key": {"type": "string",
> >                                                "enum": ["set",
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index feb38b5d3..5386c8fef 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -1791,7 +1791,9 @@
> >        <p>The action to take when the ACL rule matches:</p>
> >        <ul>
> >          <li>
> > -          <code>allow</code>: Forward the packet.
> > +          <code>allow</code>: Forward the packet. It will also send the
> > +          packets through connection tracking when
> > +          <code>allow-related</code> rules exist on the logical switch.
> >          </li>
>
> Thanks for updating the documentation. It is really helpful for understanding the difference between "allow" and "allow-stateless". I think it could be even more clear if we just say something like: when there is no "allow-related" rules exist on the logical switch, "allow" is equivalent to "allow-stateless". Otherwise, it is equivalent to "allow-related". (And the order of "allow-stateless" and "allow" may be switched)

I did something similar though not verbatim in the latest upload.
Please take a look if it's better.

>
> >
> >          <li>
> > @@ -1799,6 +1801,11 @@
> >            (e.g. inbound replies to an outbound connection).
> >          </li>
> >
> > +        <li>
> > +          <code>allow-stateless</code>: Always forward the packet in stateless
> > +          manner. May require defining additional rules for inbound replies.
> > +        </li>
> > +
> It would be better to clarify more on the "inbound replies" part. The commit message has more details, but it is better to make sure the users understand it by just reading this text. We may also emphasise that the packets will not go through conntrack table, even if there are other "allow-related" rules co-exist.

You are right, I've expanded a bit about matching peer rules. Also
added on contrack.

>
> Thanks,
> Han
>
> >          <li>
> >            <code>drop</code>: Silently drop the packet.
> >          </li>
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index 32afb4fa8..f62526fd3 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] == [[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0],
> >  AT_CLEANUP
> >  ])
> >
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch])
> > +ovn_start
> > +
> > +ovn-nbctl ls-add ls
> > +ovn-nbctl lsp-add ls lsp1
> > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> > +ovn-nbctl lsp-add ls lsp2
> > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> > +
> > +for direction in from to; do
> > +    ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related
> > +    ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related
> > +    ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop
> > +done
> > +ovn-nbctl --wait=sb sync
> > +
> > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
> > +flow_tcp='tcp && tcp.dst == 80'
> > +flow_udp='udp && udp.dst == 80'
> > +
> > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> > +
> > +# TCP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +ct_next(ct_state=new|trk) {
> > +    ct_next(ct_state=new|trk) {
> > +        output("lsp2");
> > +    };
> > +};
> > +])
> > +
> > +# UDP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_next(ct_state=new|trk) {
> > +        output("lsp2");
> > +    };
> > +};
> > +])
> > +
> > +# Allow stateless for TCP.
> > +for direction in from to; do
> > +    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> > +done
> > +ovn-nbctl --wait=sb sync
> > +
> > +# TCP packets should not go to conntrack anymore.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +output("lsp2");
> > +])
> > +
> > +# UDP packets still go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_next(ct_state=new|trk) {
> > +        output("lsp2");
> > +    };
> > +};
> > +])
> > +
> > +# Add a load balancer.
> > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
> > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
> > +ovn-nbctl ls-lb-add ls lb-tcp
> > +ovn-nbctl ls-lb-add ls lb-udp
> > +
> > +# Remove stateless for TCP.
> > +ovn-nbctl acl-del ls
> > +ovn-nbctl --wait=sb sync
> > +
> > +# TCP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +# UDP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +# Allow stateless for TCP.
> > +for direction in from to; do
> > +    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> > +done
> > +ovn-nbctl --wait=sb sync
> > +
> > +# TCP packets should go to conntrack for load balancing.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +# UDP packets still go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +AT_CLEANUP
> > +])
> > +
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group])
> > +ovn_start
> > +
> > +ovn-nbctl ls-add ls
> > +ovn-nbctl lsp-add ls lsp1
> > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> > +ovn-nbctl lsp-add ls lsp2
> > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> > +
> > +ovn-nbctl pg-add pg lsp1 lsp2
> > +
> > +for direction in from to; do
> > +    ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related
> > +    ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related
> > +    ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop
> > +done
> > +ovn-nbctl --wait=sb sync
> > +
> > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> > +echo $lsp1_inport
> > +
> > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
> > +flow_tcp='tcp && tcp.dst == 80'
> > +flow_udp='udp && udp.dst == 80'
> > +
> > +# TCP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +ct_next(ct_state=new|trk) {
> > +    ct_next(ct_state=new|trk) {
> > +        output("lsp2");
> > +    };
> > +};
> > +])
> > +
> > +# UDP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_next(ct_state=new|trk) {
> > +        output("lsp2");
> > +    };
> > +};
> > +])
> > +
> > +# Allow stateless for TCP.
> > +for direction in from to; do
> > +    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
> > +done
> > +ovn-nbctl --wait=sb sync
> > +
> > +# TCP packets should not go to conntrack anymore.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +output("lsp2");
> > +])
> > +
> > +# UDP packets still go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_next(ct_state=new|trk) {
> > +        output("lsp2");
> > +    };
> > +};
> > +])
> > +
> > +# Add a load balancer.
> > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
> > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
> > +ovn-nbctl ls-lb-add ls lb-tcp
> > +ovn-nbctl ls-lb-add ls lb-udp
> > +
> > +# Remove stateless for TCP.
> > +ovn-nbctl acl-del pg
> > +ovn-nbctl --wait=sb sync
> > +
> > +# TCP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +# UDP packets should go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +# Allow stateless for TCP.
> > +for direction in from to; do
> > +    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
> > +done
> > +ovn-nbctl --wait=sb sync
> > +
> > +# TCP packets should go to conntrack for load balancing.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +# UDP packets still go to conntrack.
> > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
> > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > +ct_next(ct_state=new|trk) {
> > +    ct_lb {
> > +        reg0[[6]] = 0;
> > +        *** chk_lb_hairpin_reply action not implemented;
> > +        reg0[[12]] = 0;
> > +        ct_next(ct_state=new|trk) {
> > +            output("lsp2");
> > +        };
> > +    };
> > +};
> > +])
> > +
> > +AT_CLEANUP
> > +])
> > +
> >  OVN_FOR_EACH_NORTHD([
> >  AT_SETUP([ovn -- check BFD config propagation to SBDB])
> >  AT_KEYWORDS([northd-bfd])
> > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
> > index 042c21002..48fd0b7ee 100644
> > --- a/utilities/ovn-nbctl.c
> > +++ b/utilities/ovn-nbctl.c
> > @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx)
> >
> >      /* Validate action. */
> >      if (strcmp(action, "allow") && strcmp(action, "allow-related")
> > -        && strcmp(action, "drop") && strcmp(action, "reject")) {
> > +        && strcmp(action, "allow-stateless") && strcmp(action, "drop")
> > +        && strcmp(action, "reject")) {
> >          ctl_error(ctx, "%s: action must be one of \"allow\", "
> > -                  "\"allow-related\", \"drop\", and \"reject\"", action);
> > +                  "\"allow-related\", \"allow-stateless\", \"drop\", "
> > +                  "and \"reject\"", action);
> >          return;
> >      }
> >
> > --
> > 2.31.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou May 10, 2021, 6:43 a.m. UTC | #5
On Sun, May 9, 2021 at 6:28 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>
> Hi Han,
>
> On Fri, May 7, 2021 at 1:58 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> > Hi Ihar,
> >
> > Thanks for the patch, and it's great to see the significant performance
improvement!
> > Please see my comments below.
> >
> > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com>
wrote:
> > >
> > > For allow-stateless ACLs, bypass connection tracking by avoiding
> > > setting ct hints for matching traffic. Avoid sending all traffic to ct
> > > when a stateful ACL is present.
> > >
> > > ===
> > >
> > > Reusing an existing 'allow' verb for stateless matching would have its
> > > drawbacks, specifically, when it comes to backwards incompatibility of
> > > the new behavior with existing environments. When using "allow" ACLs
> > > in mixed allow/allow-related environment, we still commit "allow"
> > > traffic to conntrack. This unnecessarily hits performance when mixed
> > > ACL action types were used for the same datapath. This is why we
> > > introduce a new action verb to describe stateless behavior.
> > >
> > > Another complexity to consider is the fact that with stateless
> > > matching, one would not be able to rely on 'related' magic that
> > > guarantees that reply traffic is passed through. Instead, the user
> > > would have to accurately define matching rules both for request and
> > > reply directions of a protocol session. Specifically, when allowing
> > > ICMP for a specific peer host, one has to define 'allow-stateless'
> > > rules that would match against ip.dst for request direction and ip.src
> > > for reply direction. Other protocols and scenarios will require their
> > > own fine grained matching approaches implemented by the user.
> > >
> > > ===
> > >
> > > For performance measurements, qperf was used. Tests were executed on
two
> > > setups:
> > >
> > > 1) ovn-fake-multinode virtual environment;
> > > 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2
compute
> > >    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
> > >    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> > >
> > > 1) ovn-fake-multinode:
> > >
> > > Performance measured between two virtual nodes, two ports
> > > that belong to different LSs connected via router. Using qperf,
> > > performance was measured for UDP, TCP, SCTP protocols (using
> > > <proto>_lat and <proto>_bw tests). The qperf version used:
> > > 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> > > averages compared.
> > >
> > > Tests were executed with `allow-stateless` rules for the tested
> > > protocol and `allow-related` for another protocol set for both ports,
> > > both directions, e.g. for TCP scenario, the following ACLs were
> > > defined:
> > >
> > > ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> > > ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> > > ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> > > ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> > >
> > > ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> > > ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> > > ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> > > ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> > >
> > > In this particular environment, improvement was seen in send_bw,
> > > latency, and msg_rate measurements, where applicable, for all three
> > > protocols under test.
> > >
> > > for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> > >          latency: 16 us => 14.08 us (-12%)
> > >          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> > >
> > > for TCP, latency: 18.6 us => 14.88 us (-20%)
> > >          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> > >
> > > for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> > >           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> > >
> > > Interestingly, some performance improvement was also seen for the same
> > > scenarios with no ACLs set at all, albeit significantly more
> > > negligible.
> > >
> > > for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> > >          latency: 13.74 us => 12.88 us (-6.68%)
> > >          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> > >
> > > for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> > >          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> > >
> > > for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> > >           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> > >
> > > 2) Supermicro RH-OSP cluster:
> > >
> > > Two scenarios executed:
> > > - connectivity between two ports on the same switch
> > > - connectivity between two ports on different switches connected via
> > >   router
> > >
> > > For the same switch, improvements are as follows:
> > > - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> > > - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> > >
> > > For different switches, improvements are as follows:
> > > - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> > > - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> > >
> > > The effect is more noticeable in same switch scenario.
> >
> > This is interesting. Any idea why with different switches the result is
so different? I'd expect similar improvement just like in same switch,
because in this kind of test the flow should be cached in kernel datapath
(regardless of same/different logical switches) and in both cases the cost
of traversing conntrack is saved.
> >
>
> Thanks for bringing this up. I've made some repeated measurements with
> more test iterations to avoid flukes, also noticed that sometimes
> qperf server loses connectivity to the client and reports incorrect
> data, so avoided spurious runs in repeated tests. After doing all
> that, I get numbers that are a lot closer for both scenarios, though
> still I see some reduction in latency (f.e. 13.9% improvement for UDP
> latency goes slightly down to 11.9%). One other thing I notice is that
> TCP, for some reason, gains a lot less from the changes than UDP. My
> knowledge is not enough to explain the difference though. :(
>

Thanks for retesting and I checked the new data in your v10 patch, which
looks more consistent.
Here is my guess to why TCP shows less gains. Since TCP itself requires
states on both source and destination's stack, so the conntrack cost
introduced by OVN contributes less significant part of the e2e cost
comparing with UDP. So when this OVN conntrack is removed, TCP shows less
gains than UDP. TCP still shows obvious improvement in latency, because
there is absolutely a saving of conntrack cost from the e2e datapath, but
it shows almost no improvement in bandwidth, probably because this
conntrack wasn't the bottleneck between the source and destination, so when
throughput is considered it is almost the same, and I think this is ok and
nothing wrong about it. Does this sound reasonable?

> > >
> > > Comparable numbers can be captured with iperf. It may be useful to run
> > > more tests in a more elaborate (bare metal) environment.
> > >
> > > ===
> > >
> > > The patch takes inspiration from a now abandoned patch:
> > >
> > > "ovn-northd: Support mixing stateless/stateful ACLs with
> > > Stateless_Filter." by Dumitru Ceara.
> > >
> > > The original patch assumed CMS doesn't require full flexibility of
> > > matching rules for stateless matching (for example, to be used by
> > > OpenShift). But other CMS interfaces may require the same
> > > customizability for stateless as well as stateful matching, like in
> > > OpenStack Neutron API. Which is why this patch reuses existing ACL
> > > object type to describe stateless rules.
> > >
> > > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> > >
> > > ---
> > >
> > > v1: initial version.
> > > v2: rebased after conflict.
> > > v3: added ddlog implementation.
> > > v3: apply stateless skip-hint-set rules to appropriate direction only.
> > > v3: added more background as to implementation in commit message.
> > > v3: test stateless scenarios with ddlog too.
> > > v3: rebased after conflict.
> > > v4: introduce a separate allow-stateless ACL match verb.
> > > v5: rebased.
> > > v6: updated docs for new allow-stateless approach.
> > > v6: removed no longer valid comments.
> > > v6: removed acl_is_stateless.
> > > v7: bump north db schema version to 5.31.0 -> 5.32.0.
> > > v8: fixed checkpatch failure on a too long line in dbschema.
> > > v8: added perf data on baremetal lab testing.
> > > v9: fixed ovsdb checksum.
> > > ---
> > >  NEWS                    |   2 +
> > >  northd/ovn-northd.8.xml |   8 +-
> > >  northd/ovn-northd.c     |  65 ++++++++-
> > >  northd/ovn_northd.dl    |  31 ++++
> > >  ovn-nb.ovsschema        |   9 +-
> > >  ovn-nb.xml              |   9 +-
> > >  tests/ovn-northd.at     | 309
++++++++++++++++++++++++++++++++++++++++
> > >  utilities/ovn-nbctl.c   |   6 +-
> > >  8 files changed, 428 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 1ddde15f8..d72139e76 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -11,6 +11,8 @@ Post-v21.03.0
> > >      'use_parallel_build' to enable it.  It is disabled by default.
> > >    - Support vlan-passthru mode for tag=0 localnet ports.
> > >    - Support custom 802.11ad EthType for localnet ports.
> > > +  - Introduce a new "allow-stateless" ACL verb to always bypass
connection
> > > +    tracking. The existing "allow" verb behavior is left intact.
> > >
> > >  OVN v21.03.0 - 12 Mar 2021
> > >  -------------------------
> > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > > index 54e88d3fa..08c4ea852 100644
> > > --- a/northd/ovn-northd.8.xml
> > > +++ b/northd/ovn-northd.8.xml
> > > @@ -419,7 +419,9 @@
> > >        before eventually advancing to ingress table
<code>ACLs</code>. If
> > >        special ports such as route ports or localnet ports can't use
ct(), a
> > >        priority-110 flow is added to skip over stateful ACLs. IPv6
Neighbor
> > > -      Discovery and MLD traffic also skips stateful ACLs.
> > > +      Discovery and MLD traffic also skips stateful ACLs. For
"allow-stateless"
> > > +      ACLs, a flow is added to bypass setting the hint for
connection tracker
> > > +      processing.
> >
> > This approach clearly achieves the goal of bypassing conntrack and
there is great dataplane performance gain, but it also doubles the ACL
logical flows which in turn could significantly increase the control plane
cost, especially in ovn-controller. In large scale environments where lots
of ACLs referencing large port-group/address-sets are used, it is already a
bottleneck of ovn-controller flow computing. It would be better if we could
avoid doubling this cost. Instead of directly add the ACL match here, can
we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
pre-stateful add a higher priority flow, if this hint is matched then
directly "next" without going through CT? I hope this can achieve the same
goal but not increasing control plane cost. What do you think?
>
> I believe this is already addressed somewhere else in the thread.
> There should be no difference in the number of flows in general case,
> though I see your point about a potential optimization where we would
> treat allow-stateless as allow when no allow-related ACLs present. If
> possible, I would prefer to make it a follow-up for this patch because
> of close release. (I'll send the patch to implement the proposal
> tomorrow.)
>

I am ok with a follow-up patch. (but don't worry about the release, because
this patch was sent before the soft freeze date so it should have secured
its position in this release :))

> >
> > >      </p>
> > >
> > >      <p>
> > > @@ -614,6 +616,10 @@
> > >          for new connections and <code>reg0[1] = 1; next;</code> for
existing
> > >          connections.
> > >        </li>
> > > +      <li>
> > > +        <code>allow-stateless</code> ACLs translate into logical
> > > +        flows with the <code>next;</code> action.
> > > +      </li>
> > >        <li>
> > >          <code>reject</code> ACLs translate into logical
> > >          flows with the
> > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > index 94fae5648..4032f1441 100644
> > > --- a/northd/ovn-northd.c
> > > +++ b/northd/ovn-northd.c
> > > @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath
*od, struct ovn_port *op,
> > >  }
> > >
> > >  static void
> > > -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
> > > +build_stateless_filter(struct ovn_datapath *od,
> > > +                       const struct nbrec_acl *acl,
> > > +                       struct hmap *lflows)
> > > +{
> > > +    if (!strcmp(acl->direction, "from-lport")) {
> > > +        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL,
> > > +                                acl->priority + OVN_ACL_PRI_OFFSET,
> > > +                                acl->match,
> > > +                                "next;",
> > > +                                &acl->header_);
> > > +    } else {
> > > +        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL,
> > > +                                acl->priority + OVN_ACL_PRI_OFFSET,
> > > +                                acl->match,
> > > +                                "next;",
> > > +                                &acl->header_);
> > > +    }
> > > +}
> > > +
> > > +static void
> > > +build_stateless_filters(struct ovn_datapath *od, struct hmap
*port_groups,
> > > +                        struct hmap *lflows)
> > > +{
> > > +    for (size_t i = 0; i < od->nbs->n_acls; i++) {
> > > +        const struct nbrec_acl *acl = od->nbs->acls[i];
> > > +        if (!strcmp(acl->action, "allow-stateless")) {
> > > +            build_stateless_filter(od, acl, lflows);
> > > +        }
> > > +    }
> > > +
> > > +    struct ovn_port_group *pg;
> > > +    HMAP_FOR_EACH (pg, key_node, port_groups) {
> > > +        if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) {
> > > +            for (size_t i = 0; i < pg->nb_pg->n_acls; i++) {
> > > +                const struct nbrec_acl *acl = pg->nb_pg->acls[i];
> > > +                if (!strcmp(acl->action, "allow-stateless")) {
> > > +                    build_stateless_filter(od, acl, lflows);
> > > +                }
> > > +            }
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +static void
> > > +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups,
> > > +               struct hmap *lflows)
> > >  {
> > >      /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are
> > >       * allowed by default. */
> > > @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct
hmap *lflows)
> > >                                       110, lflows);
> > >          }
> > >
> >
> > nit: there is a comment here several lines above this that needs to be
updated:
>
> thanks, adjusted it somewhat
>
> >
> > |     /* If there are any stateful ACL rules in this datapath, we must
> > |      * send all IP packets through the conntrack action, which handles
> > |      * defragmentation, in order to match L4 headers. */
> >
> > > +        build_stateless_filters(od, port_groups, lflows);
> > > +
> > >          /* Ingress and Egress Pre-ACL Table (Priority 110).
> > >           *
> > >           * Not to do conntrack on ND and ICMP destination
> > > @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct
nbrec_acl *acl,
> > >      } else if (!strcmp(acl->action, "reject")) {
> > >          ds_put_cstr(actions, "verdict=reject, ");
> > >      } else if (!strcmp(acl->action, "allow")
> > > -        || !strcmp(acl->action, "allow-related")) {
> > > +        || !strcmp(acl->action, "allow-related")
> > > +        || !strcmp(acl->action, "allow-stateless")) {
> > >          ds_put_cstr(actions, "verdict=allow, ");
> > >      }
> > >
> > > @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct
ovn_datapath *od,
> > >      bool ingress = !strcmp(acl->direction, "from-lport") ? true
:false;
> > >      enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL :
S_SWITCH_OUT_ACL;
> > >
> > > -    if (!strcmp(acl->action, "allow")
> > > +    if (!strcmp(acl->action, "allow-stateless")) {
> > > +        struct ds actions = DS_EMPTY_INITIALIZER;
> > > +        build_acl_log(&actions, acl, meter_groups);
> > > +        ds_put_cstr(&actions, "next;");
> > > +        ovn_lflow_add_with_hint(lflows, od, stage,
> > > +                                acl->priority + OVN_ACL_PRI_OFFSET,
> > > +                                acl->match, ds_cstr(&actions),
> > > +                                &acl->header_);
> > > +        ds_destroy(&actions);
> > > +    } else if (!strcmp(acl->action, "allow")
> > >          || !strcmp(acl->action, "allow-related")) {
> > >          /* If there are any stateful flows, we must even commit
"allow"
> > >           * actions.  This is because, while the initiater's
> > > @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct
ovn_datapath *od,
> > >          od->has_stateful_acl = ls_has_stateful_acl(od);
> > >          od->has_lb_vip = ls_has_lb_vip(od);
> > >
> > > -        build_pre_acls(od, lflows);
> > > +        build_pre_acls(od, port_groups, lflows);
> > >          build_pre_lb(od, lflows, meter_groups, lbs);
> > >          build_pre_stateful(od, lflows);
> > >          build_acl_hints(od, lflows);
> > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> > > index 7953325aa..5ae52b3e8 100644
> > > --- a/northd/ovn_northd.dl
> > > +++ b/northd/ovn_northd.dl
> > > @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) {
> > >           .external_ids     = map_empty())
> > >  }
> > >
> > > +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl,
.has_fair_meter = fair_meter)) {
> > > +    if (sw.has_stateful_acl) {
> > > +        if (acl.action == "allow-stateless") {
> > > +            if (acl.direction == "from-lport") {
> > > +                Flow(.logical_datapath = ls._uuid,
> > > +                     .stage            = s_SWITCH_IN_PRE_ACL(),
> > > +                     .priority         = acl.priority +
oVN_ACL_PRI_OFFSET(),
> > > +                     .__match          = acl.__match,
> > > +                     .actions          = "next;",
> > > +                     .external_ids     = stage_hint(acl._uuid))
> > > +            } else {
> > > +                Flow(.logical_datapath = ls._uuid,
> > > +                     .stage            = s_SWITCH_OUT_PRE_ACL(),
> > > +                     .priority         = acl.priority +
oVN_ACL_PRI_OFFSET(),
> > > +                     .__match          = acl.__match,
> > > +                     .actions          = "next;",
> > > +                     .external_ids     = stage_hint(acl._uuid))
> > > +            }
> > > +        }
> > > +    }
> > > +}
> > >
> > >  /* If there are any stateful ACL rules in this datapath, we must
> > >   * send all IP packets through the conntrack action, which handles
> > > @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL,
fair_meter: bool): string =
> > >              "allow-related" -> {
> > >                  strs.push("verdict=allow")
> > >              },
> > > +            "allow-stateless" -> {
> > > +                strs.push("verdict=allow")
> > > +            },
> > >              _ -> ()
> > >          };
> > >          match (acl.meter) {
> > > @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls},
.acl = &acl, .has_fair_meter = fair_
> > >                   .actions          = "${acl_log}next;",
> > >                   .external_ids     = stage_hint)
> > >          }
> > > +    } else if (acl.action == "allow-stateless") {
> > > +        Flow(.logical_datapath = ls._uuid,
> > > +             .stage            = stage,
> > > +             .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
> > > +             .__match          = acl.__match,
> > > +             .actions          = "${acl_log}next;",
> > > +             .external_ids     = stage_hint)
> > >      } else if (acl.action == "drop" or acl.action == "reject") {
> > >          /* The implementation of "drop" differs if stateful ACLs are
in
> > >           * use for this datapath.  In that case, the actions differ
> > > diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema
> > > index 29019809c..faf619a1c 100644
> > > --- a/ovn-nb.ovsschema
> > > +++ b/ovn-nb.ovsschema
> > > @@ -1,7 +1,7 @@
> > >  {
> > >      "name": "OVN_Northbound",
> > > -    "version": "5.31.0",
> > > -    "cksum": "2352750632 28701",
> > > +    "version": "5.32.0",
> > > +    "cksum": "204590300 28863",
> > >      "tables": {
> > >          "NB_Global": {
> > >              "columns": {
> > > @@ -221,7 +221,10 @@
> > >                                              "enum": ["set",
["from-lport", "to-lport"]]}}},
> > >                  "match": {"type": "string"},
> > >                  "action": {"type": {"key": {"type": "string",
> > > -                                            "enum": ["set",
["allow", "allow-related", "drop", "reject"]]}}},
> > > +                                            "enum": ["set",
> > > +                                               ["allow",
"allow-related",
> > > +                                                "allow-stateless",
"drop",
> > > +                                                "reject"]]}}},
> > >                  "log": {"type": "boolean"},
> > >                  "severity": {"type": {"key": {"type": "string",
> > >                                                "enum": ["set",
> > > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > > index feb38b5d3..5386c8fef 100644
> > > --- a/ovn-nb.xml
> > > +++ b/ovn-nb.xml
> > > @@ -1791,7 +1791,9 @@
> > >        <p>The action to take when the ACL rule matches:</p>
> > >        <ul>
> > >          <li>
> > > -          <code>allow</code>: Forward the packet.
> > > +          <code>allow</code>: Forward the packet. It will also send
the
> > > +          packets through connection tracking when
> > > +          <code>allow-related</code> rules exist on the logical
switch.
> > >          </li>
> >
> > Thanks for updating the documentation. It is really helpful for
understanding the difference between "allow" and "allow-stateless". I think
it could be even more clear if we just say something like: when there is no
"allow-related" rules exist on the logical switch, "allow" is equivalent to
"allow-stateless". Otherwise, it is equivalent to "allow-related". (And the
order of "allow-stateless" and "allow" may be switched)
>
> I did something similar though not verbatim in the latest upload.
> Please take a look if it's better.
>
> >
> > >
> > >          <li>
> > > @@ -1799,6 +1801,11 @@
> > >            (e.g. inbound replies to an outbound connection).
> > >          </li>
> > >
> > > +        <li>
> > > +          <code>allow-stateless</code>: Always forward the packet in
stateless
> > > +          manner. May require defining additional rules for inbound
replies.
> > > +        </li>
> > > +
> > It would be better to clarify more on the "inbound replies" part. The
commit message has more details, but it is better to make sure the users
understand it by just reading this text. We may also emphasise that the
packets will not go through conntrack table, even if there are other
"allow-related" rules co-exist.
>
> You are right, I've expanded a bit about matching peer rules. Also
> added on contrack.
>
> >
> > Thanks,
> > Han
> >
> > >          <li>
> > >            <code>drop</code>: Silently drop the packet.
> > >          </li>
> > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > > index 32afb4fa8..f62526fd3 100644
> > > --- a/tests/ovn-northd.at
> > > +++ b/tests/ovn-northd.at
> > > @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] ==
[[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0],
> > >  AT_CLEANUP
> > >  ])
> > >
> > > +OVN_FOR_EACH_NORTHD([
> > > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack -
Logical_Switch])
> > > +ovn_start
> > > +
> > > +ovn-nbctl ls-add ls
> > > +ovn-nbctl lsp-add ls lsp1
> > > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> > > +ovn-nbctl lsp-add ls lsp2
> > > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> > > +
> > > +for direction in from to; do
> > > +    ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related
> > > +    ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related
> > > +    ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop
> > > +done
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst ==
00:00:00:00:00:02'
> > > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst ==
66.66.66.66'
> > > +flow_tcp='tcp && tcp.dst == 80'
> > > +flow_udp='udp && udp.dst == 80'
> > > +
> > > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> > > +
> > > +# TCP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_next(ct_state=new|trk) {
> > > +        output("lsp2");
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# UDP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_next(ct_state=new|trk) {
> > > +        output("lsp2");
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# Allow stateless for TCP.
> > > +for direction in from to; do
> > > +    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> > > +done
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +# TCP packets should not go to conntrack anymore.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +output("lsp2");
> > > +])
> > > +
> > > +# UDP packets still go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_next(ct_state=new|trk) {
> > > +        output("lsp2");
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# Add a load balancer.
> > > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
> > > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
> > > +ovn-nbctl ls-lb-add ls lb-tcp
> > > +ovn-nbctl ls-lb-add ls lb-udp
> > > +
> > > +# Remove stateless for TCP.
> > > +ovn-nbctl acl-del ls
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +# TCP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# UDP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# Allow stateless for TCP.
> > > +for direction in from to; do
> > > +    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> > > +done
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +# TCP packets should go to conntrack for load balancing.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# UDP packets still go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +AT_CLEANUP
> > > +])
> > > +
> > > +OVN_FOR_EACH_NORTHD([
> > > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group])
> > > +ovn_start
> > > +
> > > +ovn-nbctl ls-add ls
> > > +ovn-nbctl lsp-add ls lsp1
> > > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> > > +ovn-nbctl lsp-add ls lsp2
> > > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> > > +
> > > +ovn-nbctl pg-add pg lsp1 lsp2
> > > +
> > > +for direction in from to; do
> > > +    ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related
> > > +    ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related
> > > +    ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop
> > > +done
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> > > +echo $lsp1_inport
> > > +
> > > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst ==
00:00:00:00:00:02'
> > > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst ==
66.66.66.66'
> > > +flow_tcp='tcp && tcp.dst == 80'
> > > +flow_udp='udp && udp.dst == 80'
> > > +
> > > +# TCP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_next(ct_state=new|trk) {
> > > +        output("lsp2");
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# UDP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_next(ct_state=new|trk) {
> > > +        output("lsp2");
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# Allow stateless for TCP.
> > > +for direction in from to; do
> > > +    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
> > > +done
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +# TCP packets should not go to conntrack anymore.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +output("lsp2");
> > > +])
> > > +
> > > +# UDP packets still go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_next(ct_state=new|trk) {
> > > +        output("lsp2");
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# Add a load balancer.
> > > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
> > > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
> > > +ovn-nbctl ls-lb-add ls lb-tcp
> > > +ovn-nbctl ls-lb-add ls lb-udp
> > > +
> > > +# Remove stateless for TCP.
> > > +ovn-nbctl acl-del pg
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +# TCP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# UDP packets should go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# Allow stateless for TCP.
> > > +for direction in from to; do
> > > +    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
> > > +done
> > > +ovn-nbctl --wait=sb sync
> > > +
> > > +# TCP packets should go to conntrack for load balancing.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +# UDP packets still go to conntrack.
> > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
> > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls
"${flow}"], [0], [dnl
> > > +#
udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
> > > +ct_next(ct_state=new|trk) {
> > > +    ct_lb {
> > > +        reg0[[6]] = 0;
> > > +        *** chk_lb_hairpin_reply action not implemented;
> > > +        reg0[[12]] = 0;
> > > +        ct_next(ct_state=new|trk) {
> > > +            output("lsp2");
> > > +        };
> > > +    };
> > > +};
> > > +])
> > > +
> > > +AT_CLEANUP
> > > +])
> > > +
> > >  OVN_FOR_EACH_NORTHD([
> > >  AT_SETUP([ovn -- check BFD config propagation to SBDB])
> > >  AT_KEYWORDS([northd-bfd])
> > > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
> > > index 042c21002..48fd0b7ee 100644
> > > --- a/utilities/ovn-nbctl.c
> > > +++ b/utilities/ovn-nbctl.c
> > > @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx)
> > >
> > >      /* Validate action. */
> > >      if (strcmp(action, "allow") && strcmp(action, "allow-related")
> > > -        && strcmp(action, "drop") && strcmp(action, "reject")) {
> > > +        && strcmp(action, "allow-stateless") && strcmp(action,
"drop")
> > > +        && strcmp(action, "reject")) {
> > >          ctl_error(ctx, "%s: action must be one of \"allow\", "
> > > -                  "\"allow-related\", \"drop\", and \"reject\"",
action);
> > > +                  "\"allow-related\", \"allow-stateless\", \"drop\",
"
> > > +                  "and \"reject\"", action);
> > >          return;
> > >      }
> > >
> > > --
> > > 2.31.1
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Ihar Hrachyshka May 17, 2021, 9:27 p.m. UTC | #6
On Fri, May 7, 2021 at 7:11 PM Han Zhou <hzhou@ovn.org> wrote:
>
>
>
> On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >
> > On 5/7/21 7:35 AM, Han Zhou wrote:
> > > Hi Ihar,
> > >
> > > Thanks for the patch, and it's great to see the significant performance
> > > improvement!
> > > Please see my comments below.
> > >
> > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> > >>
> > >> For allow-stateless ACLs, bypass connection tracking by avoiding
> > >> setting ct hints for matching traffic. Avoid sending all traffic to ct
> > >> when a stateful ACL is present.
> > >>
> > >> ===
> > >>
> > >> Reusing an existing 'allow' verb for stateless matching would have its
> > >> drawbacks, specifically, when it comes to backwards incompatibility of
> > >> the new behavior with existing environments. When using "allow" ACLs
> > >> in mixed allow/allow-related environment, we still commit "allow"
> > >> traffic to conntrack. This unnecessarily hits performance when mixed
> > >> ACL action types were used for the same datapath. This is why we
> > >> introduce a new action verb to describe stateless behavior.
> > >>
> > >> Another complexity to consider is the fact that with stateless
> > >> matching, one would not be able to rely on 'related' magic that
> > >> guarantees that reply traffic is passed through. Instead, the user
> > >> would have to accurately define matching rules both for request and
> > >> reply directions of a protocol session. Specifically, when allowing
> > >> ICMP for a specific peer host, one has to define 'allow-stateless'
> > >> rules that would match against ip.dst for request direction and ip.src
> > >> for reply direction. Other protocols and scenarios will require their
> > >> own fine grained matching approaches implemented by the user.
> > >>
> > >> ===
> > >>
> > >> For performance measurements, qperf was used. Tests were executed on two
> > >> setups:
> > >>
> > >> 1) ovn-fake-multinode virtual environment;
> > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute
> > >>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
> > >>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> > >>
> > >> 1) ovn-fake-multinode:
> > >>
> > >> Performance measured between two virtual nodes, two ports
> > >> that belong to different LSs connected via router. Using qperf,
> > >> performance was measured for UDP, TCP, SCTP protocols (using
> > >> <proto>_lat and <proto>_bw tests). The qperf version used:
> > >> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> > >> averages compared.
> > >>
> > >> Tests were executed with `allow-stateless` rules for the tested
> > >> protocol and `allow-related` for another protocol set for both ports,
> > >> both directions, e.g. for TCP scenario, the following ACLs were
> > >> defined:
> > >>
> > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> > >>
> > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> > >>
> > >> In this particular environment, improvement was seen in send_bw,
> > >> latency, and msg_rate measurements, where applicable, for all three
> > >> protocols under test.
> > >>
> > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> > >>          latency: 16 us => 14.08 us (-12%)
> > >>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> > >>
> > >> for TCP, latency: 18.6 us => 14.88 us (-20%)
> > >>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> > >>
> > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> > >>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> > >>
> > >> Interestingly, some performance improvement was also seen for the same
> > >> scenarios with no ACLs set at all, albeit significantly more
> > >> negligible.
> > >>
> > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> > >>          latency: 13.74 us => 12.88 us (-6.68%)
> > >>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> > >>
> > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> > >>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> > >>
> > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> > >>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> > >>
> > >> 2) Supermicro RH-OSP cluster:
> > >>
> > >> Two scenarios executed:
> > >> - connectivity between two ports on the same switch
> > >> - connectivity between two ports on different switches connected via
> > >>   router
> > >>
> > >> For the same switch, improvements are as follows:
> > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> > >>
> > >> For different switches, improvements are as follows:
> > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> > >>
> > >> The effect is more noticeable in same switch scenario.
> > >
> > > This is interesting. Any idea why with different switches the result is so
> > > different? I'd expect similar improvement just like in same switch, because
> > > in this kind of test the flow should be cached in kernel datapath
> > > (regardless of same/different logical switches) and in both cases the cost
> > > of traversing conntrack is saved.
> > >
> > >>
> > >> Comparable numbers can be captured with iperf. It may be useful to run
> > >> more tests in a more elaborate (bare metal) environment.
> > >>
> > >> ===
> > >>
> > >> The patch takes inspiration from a now abandoned patch:
> > >>
> > >> "ovn-northd: Support mixing stateless/stateful ACLs with
> > >> Stateless_Filter." by Dumitru Ceara.
> > >>
> > >> The original patch assumed CMS doesn't require full flexibility of
> > >> matching rules for stateless matching (for example, to be used by
> > >> OpenShift). But other CMS interfaces may require the same
> > >> customizability for stateless as well as stateful matching, like in
> > >> OpenStack Neutron API. Which is why this patch reuses existing ACL
> > >> object type to describe stateless rules.
> > >>
> > >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> > >>
> > >> ---
> > >>
> > >> v1: initial version.
> > >> v2: rebased after conflict.
> > >> v3: added ddlog implementation.
> > >> v3: apply stateless skip-hint-set rules to appropriate direction only.
> > >> v3: added more background as to implementation in commit message.
> > >> v3: test stateless scenarios with ddlog too.
> > >> v3: rebased after conflict.
> > >> v4: introduce a separate allow-stateless ACL match verb.
> > >> v5: rebased.
> > >> v6: updated docs for new allow-stateless approach.
> > >> v6: removed no longer valid comments.
> > >> v6: removed acl_is_stateless.
> > >> v7: bump north db schema version to 5.31.0 -> 5.32.0.
> > >> v8: fixed checkpatch failure on a too long line in dbschema.
> > >> v8: added perf data on baremetal lab testing.
> > >> v9: fixed ovsdb checksum.
> > >> ---
> > >>  NEWS                    |   2 +
> > >>  northd/ovn-northd.8.xml |   8 +-
> > >>  northd/ovn-northd.c     |  65 ++++++++-
> > >>  northd/ovn_northd.dl    |  31 ++++
> > >>  ovn-nb.ovsschema        |   9 +-
> > >>  ovn-nb.xml              |   9 +-
> > >>  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
> > >>  utilities/ovn-nbctl.c   |   6 +-
> > >>  8 files changed, 428 insertions(+), 11 deletions(-)
> > >>
> > >> diff --git a/NEWS b/NEWS
> > >> index 1ddde15f8..d72139e76 100644
> > >> --- a/NEWS
> > >> +++ b/NEWS
> > >> @@ -11,6 +11,8 @@ Post-v21.03.0
> > >>      'use_parallel_build' to enable it.  It is disabled by default.
> > >>    - Support vlan-passthru mode for tag=0 localnet ports.
> > >>    - Support custom 802.11ad EthType for localnet ports.
> > >> +  - Introduce a new "allow-stateless" ACL verb to always bypass
> > > connection
> > >> +    tracking. The existing "allow" verb behavior is left intact.
> > >>
> > >>  OVN v21.03.0 - 12 Mar 2021
> > >>  -------------------------
> > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > >> index 54e88d3fa..08c4ea852 100644
> > >> --- a/northd/ovn-northd.8.xml
> > >> +++ b/northd/ovn-northd.8.xml
> > >> @@ -419,7 +419,9 @@
> > >>        before eventually advancing to ingress table <code>ACLs</code>. If
> > >>        special ports such as route ports or localnet ports can't use
> > > ct(), a
> > >>        priority-110 flow is added to skip over stateful ACLs. IPv6
> > > Neighbor
> > >> -      Discovery and MLD traffic also skips stateful ACLs.
> > >> +      Discovery and MLD traffic also skips stateful ACLs. For
> > > "allow-stateless"
> > >> +      ACLs, a flow is added to bypass setting the hint for connection
> > > tracker
> > >> +      processing.
> > >
> > > This approach clearly achieves the goal of bypassing conntrack and there is
> > > great dataplane performance gain, but it also doubles the ACL logical flows
> > > which in turn could significantly increase the control plane cost,
> > > especially in ovn-controller. In large scale environments where lots of
> > > ACLs referencing large port-group/address-sets are used, it is already a
> > > bottleneck of ovn-controller flow computing. It would be better if we could
> > > avoid doubling this cost. Instead of directly add the ACL match here, can
> > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
> > > pre-stateful add a higher priority flow, if this hint is matched then
> > > directly "next" without going through CT? I hope this can achieve the same
> > > goal but not increasing control plane cost. What do you think?
> > >
> >
> > Wouldn't this break ACL priority?  E.g., with the following ACLs:
> >
> > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless
> > ACL2: from-lport, prio=50, ip, drop
> >
> > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic
> > will be dropped by the lflow added for ACL2.
> >
> You are right. Actually I was thinking about matching the hint again in acl stage and just move next, but it would skip all the stateful ACLs that could have a higher priority. This would work only if the stateless ACLs can always be treated as higher priority than stateful ACLs, but it seems unreasonable. It seems all the ACL matches have to be evaluated at the same stage to ensure the priorities are followed. So, forget about this proposal.
>
> > On the other hand, IIUC, this feature is designed to be used in
> > combination with "allow-related" ACLs defined on the same switch,
> > replacing "allow" ACLs in that case.
> >
> > Before this change an "allow" ACL would've been treated as
> > "allow-related" if there was at least one other "allow-related" ACL
> > applied on the logical switch.  Essentially generating 2 logical flows
> > per "allow" ACL:
> >
> > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then
> > "REGBIT_CONNTRACK_COMMIT=1, next"
> > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next"
> >
> > If I'm not wrong, changing this ACL to "allow-stateless" would also
> > generate 2 logical flows, so at least from a number of logical flows
> > perspective, the result is the same.
> >
> You are right. But this also means that when there is no stateful rules, "allow-stateless" is worse than "allow" because "allow" would generate only a single lflow in that case, while "allow-stateless" always generates doubled lflows. So, I think at least we could do the simple improvement to make "allow-stateless" perform the same as "allow" when there are NO stateful ACLs. (Otherwise, my comment on the documentation regarding '"allow-stateless" is equivalent to "allow" when there are no stateful ACLs' would be wrong). We can move the logic of adding the stateless matching flows of pre-acl stage into a if-block (only when has_stateful is true). Would this work?

For the record, I looked again into that, and I don't think double
lflows concern doesn't apply because we already check for
od->has_stateful_acl before introducing any flows that set DEFRAG bit,
as well as the new flows that omit doing it. So I don't think there is
any optimization as described applicable. Let me know if that's not
correct.

Ihar
Han Zhou May 17, 2021, 10:17 p.m. UTC | #7
On Mon, May 17, 2021 at 2:27 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>
> On Fri, May 7, 2021 at 7:11 PM Han Zhou <hzhou@ovn.org> wrote:
> >
> >
> >
> > On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara@redhat.com> wrote:
> > >
> > > On 5/7/21 7:35 AM, Han Zhou wrote:
> > > > Hi Ihar,
> > > >
> > > > Thanks for the patch, and it's great to see the significant
performance
> > > > improvement!
> > > > Please see my comments below.
> > > >
> > > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com>
wrote:
> > > >>
> > > >> For allow-stateless ACLs, bypass connection tracking by avoiding
> > > >> setting ct hints for matching traffic. Avoid sending all traffic
to ct
> > > >> when a stateful ACL is present.
> > > >>
> > > >> ===
> > > >>
> > > >> Reusing an existing 'allow' verb for stateless matching would have
its
> > > >> drawbacks, specifically, when it comes to backwards
incompatibility of
> > > >> the new behavior with existing environments. When using "allow"
ACLs
> > > >> in mixed allow/allow-related environment, we still commit "allow"
> > > >> traffic to conntrack. This unnecessarily hits performance when
mixed
> > > >> ACL action types were used for the same datapath. This is why we
> > > >> introduce a new action verb to describe stateless behavior.
> > > >>
> > > >> Another complexity to consider is the fact that with stateless
> > > >> matching, one would not be able to rely on 'related' magic that
> > > >> guarantees that reply traffic is passed through. Instead, the user
> > > >> would have to accurately define matching rules both for request and
> > > >> reply directions of a protocol session. Specifically, when allowing
> > > >> ICMP for a specific peer host, one has to define 'allow-stateless'
> > > >> rules that would match against ip.dst for request direction and
ip.src
> > > >> for reply direction. Other protocols and scenarios will require
their
> > > >> own fine grained matching approaches implemented by the user.
> > > >>
> > > >> ===
> > > >>
> > > >> For performance measurements, qperf was used. Tests were executed
on two
> > > >> setups:
> > > >>
> > > >> 1) ovn-fake-multinode virtual environment;
> > > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2
compute
> > > >>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for
25GbE
> > > >>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> > > >>
> > > >> 1) ovn-fake-multinode:
> > > >>
> > > >> Performance measured between two virtual nodes, two ports
> > > >> that belong to different LSs connected via router. Using qperf,
> > > >> performance was measured for UDP, TCP, SCTP protocols (using
> > > >> <proto>_lat and <proto>_bw tests). The qperf version used:
> > > >> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times
and
> > > >> averages compared.
> > > >>
> > > >> Tests were executed with `allow-stateless` rules for the tested
> > > >> protocol and `allow-related` for another protocol set for both
ports,
> > > >> both directions, e.g. for TCP scenario, the following ACLs were
> > > >> defined:
> > > >>
> > > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> > > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> > > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> > > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> > > >>
> > > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> > > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> > > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> > > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> > > >>
> > > >> In this particular environment, improvement was seen in send_bw,
> > > >> latency, and msg_rate measurements, where applicable, for all three
> > > >> protocols under test.
> > > >>
> > > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> > > >>          latency: 16 us => 14.08 us (-12%)
> > > >>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> > > >>
> > > >> for TCP, latency: 18.6 us => 14.88 us (-20%)
> > > >>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> > > >>
> > > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> > > >>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> > > >>
> > > >> Interestingly, some performance improvement was also seen for the
same
> > > >> scenarios with no ACLs set at all, albeit significantly more
> > > >> negligible.
> > > >>
> > > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> > > >>          latency: 13.74 us => 12.88 us (-6.68%)
> > > >>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> > > >>
> > > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> > > >>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> > > >>
> > > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> > > >>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> > > >>
> > > >> 2) Supermicro RH-OSP cluster:
> > > >>
> > > >> Two scenarios executed:
> > > >> - connectivity between two ports on the same switch
> > > >> - connectivity between two ports on different switches connected
via
> > > >>   router
> > > >>
> > > >> For the same switch, improvements are as follows:
> > > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> > > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> > > >>
> > > >> For different switches, improvements are as follows:
> > > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> > > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> > > >>
> > > >> The effect is more noticeable in same switch scenario.
> > > >
> > > > This is interesting. Any idea why with different switches the
result is so
> > > > different? I'd expect similar improvement just like in same switch,
because
> > > > in this kind of test the flow should be cached in kernel datapath
> > > > (regardless of same/different logical switches) and in both cases
the cost
> > > > of traversing conntrack is saved.
> > > >
> > > >>
> > > >> Comparable numbers can be captured with iperf. It may be useful to
run
> > > >> more tests in a more elaborate (bare metal) environment.
> > > >>
> > > >> ===
> > > >>
> > > >> The patch takes inspiration from a now abandoned patch:
> > > >>
> > > >> "ovn-northd: Support mixing stateless/stateful ACLs with
> > > >> Stateless_Filter." by Dumitru Ceara.
> > > >>
> > > >> The original patch assumed CMS doesn't require full flexibility of
> > > >> matching rules for stateless matching (for example, to be used by
> > > >> OpenShift). But other CMS interfaces may require the same
> > > >> customizability for stateless as well as stateful matching, like in
> > > >> OpenStack Neutron API. Which is why this patch reuses existing ACL
> > > >> object type to describe stateless rules.
> > > >>
> > > >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> > > >>
> > > >> ---
> > > >>
> > > >> v1: initial version.
> > > >> v2: rebased after conflict.
> > > >> v3: added ddlog implementation.
> > > >> v3: apply stateless skip-hint-set rules to appropriate direction
only.
> > > >> v3: added more background as to implementation in commit message.
> > > >> v3: test stateless scenarios with ddlog too.
> > > >> v3: rebased after conflict.
> > > >> v4: introduce a separate allow-stateless ACL match verb.
> > > >> v5: rebased.
> > > >> v6: updated docs for new allow-stateless approach.
> > > >> v6: removed no longer valid comments.
> > > >> v6: removed acl_is_stateless.
> > > >> v7: bump north db schema version to 5.31.0 -> 5.32.0.
> > > >> v8: fixed checkpatch failure on a too long line in dbschema.
> > > >> v8: added perf data on baremetal lab testing.
> > > >> v9: fixed ovsdb checksum.
> > > >> ---
> > > >>  NEWS                    |   2 +
> > > >>  northd/ovn-northd.8.xml |   8 +-
> > > >>  northd/ovn-northd.c     |  65 ++++++++-
> > > >>  northd/ovn_northd.dl    |  31 ++++
> > > >>  ovn-nb.ovsschema        |   9 +-
> > > >>  ovn-nb.xml              |   9 +-
> > > >>  tests/ovn-northd.at     | 309
++++++++++++++++++++++++++++++++++++++++
> > > >>  utilities/ovn-nbctl.c   |   6 +-
> > > >>  8 files changed, 428 insertions(+), 11 deletions(-)
> > > >>
> > > >> diff --git a/NEWS b/NEWS
> > > >> index 1ddde15f8..d72139e76 100644
> > > >> --- a/NEWS
> > > >> +++ b/NEWS
> > > >> @@ -11,6 +11,8 @@ Post-v21.03.0
> > > >>      'use_parallel_build' to enable it.  It is disabled by default.
> > > >>    - Support vlan-passthru mode for tag=0 localnet ports.
> > > >>    - Support custom 802.11ad EthType for localnet ports.
> > > >> +  - Introduce a new "allow-stateless" ACL verb to always bypass
> > > > connection
> > > >> +    tracking. The existing "allow" verb behavior is left intact.
> > > >>
> > > >>  OVN v21.03.0 - 12 Mar 2021
> > > >>  -------------------------
> > > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > > >> index 54e88d3fa..08c4ea852 100644
> > > >> --- a/northd/ovn-northd.8.xml
> > > >> +++ b/northd/ovn-northd.8.xml
> > > >> @@ -419,7 +419,9 @@
> > > >>        before eventually advancing to ingress table
<code>ACLs</code>. If
> > > >>        special ports such as route ports or localnet ports can't
use
> > > > ct(), a
> > > >>        priority-110 flow is added to skip over stateful ACLs. IPv6
> > > > Neighbor
> > > >> -      Discovery and MLD traffic also skips stateful ACLs.
> > > >> +      Discovery and MLD traffic also skips stateful ACLs. For
> > > > "allow-stateless"
> > > >> +      ACLs, a flow is added to bypass setting the hint for
connection
> > > > tracker
> > > >> +      processing.
> > > >
> > > > This approach clearly achieves the goal of bypassing conntrack and
there is
> > > > great dataplane performance gain, but it also doubles the ACL
logical flows
> > > > which in turn could significantly increase the control plane cost,
> > > > especially in ovn-controller. In large scale environments where
lots of
> > > > ACLs referencing large port-group/address-sets are used, it is
already a
> > > > bottleneck of ovn-controller flow computing. It would be better if
we could
> > > > avoid doubling this cost. Instead of directly add the ACL match
here, can
> > > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later
stage
> > > > pre-stateful add a higher priority flow, if this hint is matched
then
> > > > directly "next" without going through CT? I hope this can achieve
the same
> > > > goal but not increasing control plane cost. What do you think?
> > > >
> > >
> > > Wouldn't this break ACL priority?  E.g., with the following ACLs:
> > >
> > > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless
> > > ACL2: from-lport, prio=50, ip, drop
> > >
> > > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic
> > > will be dropped by the lflow added for ACL2.
> > >
> > You are right. Actually I was thinking about matching the hint again in
acl stage and just move next, but it would skip all the stateful ACLs that
could have a higher priority. This would work only if the stateless ACLs
can always be treated as higher priority than stateful ACLs, but it seems
unreasonable. It seems all the ACL matches have to be evaluated at the same
stage to ensure the priorities are followed. So, forget about this proposal.
> >
> > > On the other hand, IIUC, this feature is designed to be used in
> > > combination with "allow-related" ACLs defined on the same switch,
> > > replacing "allow" ACLs in that case.
> > >
> > > Before this change an "allow" ACL would've been treated as
> > > "allow-related" if there was at least one other "allow-related" ACL
> > > applied on the logical switch.  Essentially generating 2 logical flows
> > > per "allow" ACL:
> > >
> > > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then
> > > "REGBIT_CONNTRACK_COMMIT=1, next"
> > > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next"
> > >
> > > If I'm not wrong, changing this ACL to "allow-stateless" would also
> > > generate 2 logical flows, so at least from a number of logical flows
> > > perspective, the result is the same.
> > >
> > You are right. But this also means that when there is no stateful
rules, "allow-stateless" is worse than "allow" because "allow" would
generate only a single lflow in that case, while "allow-stateless" always
generates doubled lflows. So, I think at least we could do the simple
improvement to make "allow-stateless" perform the same as "allow" when
there are NO stateful ACLs. (Otherwise, my comment on the documentation
regarding '"allow-stateless" is equivalent to "allow" when there are no
stateful ACLs' would be wrong). We can move the logic of adding the
stateless matching flows of pre-acl stage into a if-block (only when
has_stateful is true). Would this work?
>
> For the record, I looked again into that, and I don't think double
> lflows concern doesn't apply because we already check for
> od->has_stateful_acl before introducing any flows that set DEFRAG bit,
> as well as the new flows that omit doing it. So I don't think there is
> any optimization as described applicable. Let me know if that's not
> correct.
>
> Ihar
>

Yes you are right. Sorry that I should have noticed that the caller of the
function build_stateless_filters() is already in the if-block. Forget about
my comment :)
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 1ddde15f8..d72139e76 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,8 @@  Post-v21.03.0
     'use_parallel_build' to enable it.  It is disabled by default.
   - Support vlan-passthru mode for tag=0 localnet ports.
   - Support custom 802.11ad EthType for localnet ports.
+  - Introduce a new "allow-stateless" ACL verb to always bypass connection
+    tracking. The existing "allow" verb behavior is left intact.
 
 OVN v21.03.0 - 12 Mar 2021
 -------------------------
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 54e88d3fa..08c4ea852 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -419,7 +419,9 @@ 
       before eventually advancing to ingress table <code>ACLs</code>. If
       special ports such as route ports or localnet ports can't use ct(), a
       priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor
-      Discovery and MLD traffic also skips stateful ACLs.
+      Discovery and MLD traffic also skips stateful ACLs. For "allow-stateless"
+      ACLs, a flow is added to bypass setting the hint for connection tracker
+      processing.
     </p>
 
     <p>
@@ -614,6 +616,10 @@ 
         for new connections and <code>reg0[1] = 1; next;</code> for existing
         connections.
       </li>
+      <li>
+        <code>allow-stateless</code> ACLs translate into logical
+        flows with the <code>next;</code> action.
+      </li>
       <li>
         <code>reject</code> ACLs translate into logical
         flows with the
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 94fae5648..4032f1441 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4974,7 +4974,52 @@  skip_port_from_conntrack(struct ovn_datapath *od, struct ovn_port *op,
 }
 
 static void
-build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
+build_stateless_filter(struct ovn_datapath *od,
+                       const struct nbrec_acl *acl,
+                       struct hmap *lflows)
+{
+    if (!strcmp(acl->direction, "from-lport")) {
+        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL,
+                                acl->priority + OVN_ACL_PRI_OFFSET,
+                                acl->match,
+                                "next;",
+                                &acl->header_);
+    } else {
+        ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL,
+                                acl->priority + OVN_ACL_PRI_OFFSET,
+                                acl->match,
+                                "next;",
+                                &acl->header_);
+    }
+}
+
+static void
+build_stateless_filters(struct ovn_datapath *od, struct hmap *port_groups,
+                        struct hmap *lflows)
+{
+    for (size_t i = 0; i < od->nbs->n_acls; i++) {
+        const struct nbrec_acl *acl = od->nbs->acls[i];
+        if (!strcmp(acl->action, "allow-stateless")) {
+            build_stateless_filter(od, acl, lflows);
+        }
+    }
+
+    struct ovn_port_group *pg;
+    HMAP_FOR_EACH (pg, key_node, port_groups) {
+        if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) {
+            for (size_t i = 0; i < pg->nb_pg->n_acls; i++) {
+                const struct nbrec_acl *acl = pg->nb_pg->acls[i];
+                if (!strcmp(acl->action, "allow-stateless")) {
+                    build_stateless_filter(od, acl, lflows);
+                }
+            }
+        }
+    }
+}
+
+static void
+build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups,
+               struct hmap *lflows)
 {
     /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are
      * allowed by default. */
@@ -5002,6 +5047,8 @@  build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
                                      110, lflows);
         }
 
+        build_stateless_filters(od, port_groups, lflows);
+
         /* Ingress and Egress Pre-ACL Table (Priority 110).
          *
          * Not to do conntrack on ND and ICMP destination
@@ -5369,7 +5416,8 @@  build_acl_log(struct ds *actions, const struct nbrec_acl *acl,
     } else if (!strcmp(acl->action, "reject")) {
         ds_put_cstr(actions, "verdict=reject, ");
     } else if (!strcmp(acl->action, "allow")
-        || !strcmp(acl->action, "allow-related")) {
+        || !strcmp(acl->action, "allow-related")
+        || !strcmp(acl->action, "allow-stateless")) {
         ds_put_cstr(actions, "verdict=allow, ");
     }
 
@@ -5428,7 +5476,16 @@  consider_acl(struct hmap *lflows, struct ovn_datapath *od,
     bool ingress = !strcmp(acl->direction, "from-lport") ? true :false;
     enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL;
 
-    if (!strcmp(acl->action, "allow")
+    if (!strcmp(acl->action, "allow-stateless")) {
+        struct ds actions = DS_EMPTY_INITIALIZER;
+        build_acl_log(&actions, acl, meter_groups);
+        ds_put_cstr(&actions, "next;");
+        ovn_lflow_add_with_hint(lflows, od, stage,
+                                acl->priority + OVN_ACL_PRI_OFFSET,
+                                acl->match, ds_cstr(&actions),
+                                &acl->header_);
+        ds_destroy(&actions);
+    } else if (!strcmp(acl->action, "allow")
         || !strcmp(acl->action, "allow-related")) {
         /* If there are any stateful flows, we must even commit "allow"
          * actions.  This is because, while the initiater's
@@ -6828,7 +6885,7 @@  build_lswitch_lflows_pre_acl_and_acl(struct ovn_datapath *od,
         od->has_stateful_acl = ls_has_stateful_acl(od);
         od->has_lb_vip = ls_has_lb_vip(od);
 
-        build_pre_acls(od, lflows);
+        build_pre_acls(od, port_groups, lflows);
         build_pre_lb(od, lflows, meter_groups, lbs);
         build_pre_stateful(od, lflows);
         build_acl_hints(od, lflows);
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 7953325aa..5ae52b3e8 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1824,6 +1824,27 @@  for (&Switch(.ls =ls)) {
          .external_ids     = map_empty())
 }
 
+for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_meter)) {
+    if (sw.has_stateful_acl) {
+        if (acl.action == "allow-stateless") {
+            if (acl.direction == "from-lport") {
+                Flow(.logical_datapath = ls._uuid,
+                     .stage            = s_SWITCH_IN_PRE_ACL(),
+                     .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
+                     .__match          = acl.__match,
+                     .actions          = "next;",
+                     .external_ids     = stage_hint(acl._uuid))
+            } else {
+                Flow(.logical_datapath = ls._uuid,
+                     .stage            = s_SWITCH_OUT_PRE_ACL(),
+                     .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
+                     .__match          = acl.__match,
+                     .actions          = "next;",
+                     .external_ids     = stage_hint(acl._uuid))
+            }
+        }
+    }
+}
 
 /* If there are any stateful ACL rules in this datapath, we must
  * send all IP packets through the conntrack action, which handles
@@ -2168,6 +2189,9 @@  function build_acl_log(acl: nb::ACL, fair_meter: bool): string =
             "allow-related" -> {
                 strs.push("verdict=allow")
             },
+            "allow-stateless" -> {
+                strs.push("verdict=allow")
+            },
             _ -> ()
         };
         match (acl.meter) {
@@ -2546,6 +2570,13 @@  for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_
                  .actions          = "${acl_log}next;",
                  .external_ids     = stage_hint)
         }
+    } else if (acl.action == "allow-stateless") {
+        Flow(.logical_datapath = ls._uuid,
+             .stage            = stage,
+             .priority         = acl.priority + oVN_ACL_PRI_OFFSET(),
+             .__match          = acl.__match,
+             .actions          = "${acl_log}next;",
+             .external_ids     = stage_hint)
     } else if (acl.action == "drop" or acl.action == "reject") {
         /* The implementation of "drop" differs if stateful ACLs are in
          * use for this datapath.  In that case, the actions differ
diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema
index 29019809c..faf619a1c 100644
--- a/ovn-nb.ovsschema
+++ b/ovn-nb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Northbound",
-    "version": "5.31.0",
-    "cksum": "2352750632 28701",
+    "version": "5.32.0",
+    "cksum": "204590300 28863",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -221,7 +221,10 @@ 
                                             "enum": ["set", ["from-lport", "to-lport"]]}}},
                 "match": {"type": "string"},
                 "action": {"type": {"key": {"type": "string",
-                                            "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}},
+                                            "enum": ["set",
+                                               ["allow", "allow-related",
+                                                "allow-stateless", "drop",
+                                                "reject"]]}}},
                 "log": {"type": "boolean"},
                 "severity": {"type": {"key": {"type": "string",
                                               "enum": ["set",
diff --git a/ovn-nb.xml b/ovn-nb.xml
index feb38b5d3..5386c8fef 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1791,7 +1791,9 @@ 
       <p>The action to take when the ACL rule matches:</p>
       <ul>
         <li>
-          <code>allow</code>: Forward the packet.
+          <code>allow</code>: Forward the packet. It will also send the
+          packets through connection tracking when
+          <code>allow-related</code> rules exist on the logical switch.
         </li>
 
         <li>
@@ -1799,6 +1801,11 @@ 
           (e.g. inbound replies to an outbound connection).
         </li>
 
+        <li>
+          <code>allow-stateless</code>: Always forward the packet in stateless
+          manner. May require defining additional rules for inbound replies.
+        </li>
+
         <li>
           <code>drop</code>: Silently drop the packet.
         </li>
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 32afb4fa8..f62526fd3 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -2592,6 +2592,315 @@  sed 's/reg8\[[0..15\]] == [[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0],
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch])
+ovn_start
+
+ovn-nbctl ls-add ls
+ovn-nbctl lsp-add ls lsp1
+ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
+ovn-nbctl lsp-add ls lsp2
+ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
+
+for direction in from to; do
+    ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related
+    ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related
+    ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop
+done
+ovn-nbctl --wait=sb sync
+
+flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
+flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
+flow_tcp='tcp && tcp.dst == 80'
+flow_udp='udp && udp.dst == 80'
+
+lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
+
+# TCP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+ct_next(ct_state=new|trk) {
+    ct_next(ct_state=new|trk) {
+        output("lsp2");
+    };
+};
+])
+
+# UDP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_next(ct_state=new|trk) {
+        output("lsp2");
+    };
+};
+])
+
+# Allow stateless for TCP.
+for direction in from to; do
+    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
+done
+ovn-nbctl --wait=sb sync
+
+# TCP packets should not go to conntrack anymore.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+output("lsp2");
+])
+
+# UDP packets still go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_next(ct_state=new|trk) {
+        output("lsp2");
+    };
+};
+])
+
+# Add a load balancer.
+ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
+ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
+ovn-nbctl ls-lb-add ls lb-tcp
+ovn-nbctl ls-lb-add ls lb-udp
+
+# Remove stateless for TCP.
+ovn-nbctl acl-del ls
+ovn-nbctl --wait=sb sync
+
+# TCP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+# UDP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+# Allow stateless for TCP.
+for direction in from to; do
+    ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
+done
+ovn-nbctl --wait=sb sync
+
+# TCP packets should go to conntrack for load balancing.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+# UDP packets still go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+AT_CLEANUP
+])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group])
+ovn_start
+
+ovn-nbctl ls-add ls
+ovn-nbctl lsp-add ls lsp1
+ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
+ovn-nbctl lsp-add ls lsp2
+ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
+
+ovn-nbctl pg-add pg lsp1 lsp2
+
+for direction in from to; do
+    ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related
+    ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related
+    ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop
+done
+ovn-nbctl --wait=sb sync
+
+lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
+echo $lsp1_inport
+
+flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
+flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
+flow_tcp='tcp && tcp.dst == 80'
+flow_udp='udp && udp.dst == 80'
+
+# TCP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+ct_next(ct_state=new|trk) {
+    ct_next(ct_state=new|trk) {
+        output("lsp2");
+    };
+};
+])
+
+# UDP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_next(ct_state=new|trk) {
+        output("lsp2");
+    };
+};
+])
+
+# Allow stateless for TCP.
+for direction in from to; do
+    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
+done
+ovn-nbctl --wait=sb sync
+
+# TCP packets should not go to conntrack anymore.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+output("lsp2");
+])
+
+# UDP packets still go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_next(ct_state=new|trk) {
+        output("lsp2");
+    };
+};
+])
+
+# Add a load balancer.
+ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp
+ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp
+ovn-nbctl ls-lb-add ls lb-tcp
+ovn-nbctl ls-lb-add ls lb-udp
+
+# Remove stateless for TCP.
+ovn-nbctl acl-del pg
+ovn-nbctl --wait=sb sync
+
+# TCP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+# UDP packets should go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+# Allow stateless for TCP.
+for direction in from to; do
+    ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless
+done
+ovn-nbctl --wait=sb sync
+
+# TCP packets should go to conntrack for load balancing.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+# UDP packets still go to conntrack.
+flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}"
+AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl
+# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80
+ct_next(ct_state=new|trk) {
+    ct_lb {
+        reg0[[6]] = 0;
+        *** chk_lb_hairpin_reply action not implemented;
+        reg0[[12]] = 0;
+        ct_next(ct_state=new|trk) {
+            output("lsp2");
+        };
+    };
+};
+])
+
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn -- check BFD config propagation to SBDB])
 AT_KEYWORDS([northd-bfd])
diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
index 042c21002..48fd0b7ee 100644
--- a/utilities/ovn-nbctl.c
+++ b/utilities/ovn-nbctl.c
@@ -2303,9 +2303,11 @@  nbctl_acl_add(struct ctl_context *ctx)
 
     /* Validate action. */
     if (strcmp(action, "allow") && strcmp(action, "allow-related")
-        && strcmp(action, "drop") && strcmp(action, "reject")) {
+        && strcmp(action, "allow-stateless") && strcmp(action, "drop")
+        && strcmp(action, "reject")) {
         ctl_error(ctx, "%s: action must be one of \"allow\", "
-                  "\"allow-related\", \"drop\", and \"reject\"", action);
+                  "\"allow-related\", \"allow-stateless\", \"drop\", "
+                  "and \"reject\"", action);
         return;
     }