diff mbox series

[ovs-dev,v4,2/2] northd: Provide the option to not use ct.inv in lflows.

Message ID 20210422132502.2367928-1-numans@ovn.org
State Accepted
Headers show
Series [ovs-dev,v4,1/2] northd: Optimize ct nat for load balancer traffic. | expand

Commit Message

Numan Siddique April 22, 2021, 1:25 p.m. UTC
From: Numan Siddique <numans@ovn.org>

Presently we add 65535 priority lflows in the stages -
'ls_in_acl' and 'ls_out_acl' to drop packets which
match on 'ct.inv'.

This patch provides an option to not use this field in the
logical flow matches for the following
reasons:

 • Some of the smart NICs which support offloading datapath
   flows may not support this field.  In such cases, almost
   all the datapath flows (matching on the ct_state field +inv
   or -inv cannot be offloaded if ACLs/load balancers
   are configured on the logical switch datapath.

 • A recent commit in kernel ovs datapath sets the committed
   connection tracking entry to be liberal for out-of-window
   tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
   packets will not be marked as invalid.

There are still certain scenarios where a packet can be marked
invalid.  Like
 • If the tcp flags are not correct.

 • If there are checksum errors.

In such cases, the packet will be delivered to the destination
port.  So CMS should evaluate their usecases before enabling
this option.

Signed-off-by: Numan Siddique <numans@ovn.org>
Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
v3 -> v4
---
  * Moved the static variable 'use_ct_inv_match' declaration up in the
    file.

v2 -> v3
----
  * Updated the documentation in ovn-nb.xml as per Mark G's suggestion.

v1 -> v2
----
  * Adopted the code changes shared by Ben Pfaff for the ddlog code.

 NEWS                 |   7 +-
 northd/lswitch.dl    |   7 +
 northd/ovn-northd.c  |  42 +++---
 northd/ovn_northd.dl | 304 ++++++++++++++++++++++---------------------
 ovn-nb.xml           |  15 +++
 5 files changed, 208 insertions(+), 167 deletions(-)

Comments

Mark Gray April 23, 2021, 3:17 p.m. UTC | #1
On 22/04/2021 14:25, numans@ovn.org wrote:
> From: Numan Siddique <numans@ovn.org>
> 
> Presently we add 65535 priority lflows in the stages -
> 'ls_in_acl' and 'ls_out_acl' to drop packets which
> match on 'ct.inv'.
> 
> This patch provides an option to not use this field in the
> logical flow matches for the following
> reasons:
> 
>  • Some of the smart NICs which support offloading datapath
>    flows may not support this field.  In such cases, almost
>    all the datapath flows (matching on the ct_state field +inv
>    or -inv cannot be offloaded if ACLs/load balancers
>    are configured on the logical switch datapath.
> 
>  • A recent commit in kernel ovs datapath sets the committed
>    connection tracking entry to be liberal for out-of-window
>    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
>    packets will not be marked as invalid.
> 
> There are still certain scenarios where a packet can be marked
> invalid.  Like
>  • If the tcp flags are not correct.
> 
>  • If there are checksum errors.
> 
> In such cases, the packet will be delivered to the destination
> port.  So CMS should evaluate their usecases before enabling
> this option.
> 
> Signed-off-by: Numan Siddique <numans@ovn.org>
> Co-authored-by: Ben Pfaff <blp@ovn.org>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
> v3 -> v4
> ---
>   * Moved the static variable 'use_ct_inv_match' declaration up in the
>     file.
> 
> v2 -> v3
> ----
>   * Updated the documentation in ovn-nb.xml as per Mark G's suggestion.
> 
> v1 -> v2
> ----
>   * Adopted the code changes shared by Ben Pfaff for the ddlog code.
> 
>  NEWS                 |   7 +-
>  northd/lswitch.dl    |   7 +
>  northd/ovn-northd.c  |  42 +++---
>  northd/ovn_northd.dl | 304 ++++++++++++++++++++++---------------------
>  ovn-nb.xml           |  15 +++
>  5 files changed, 208 insertions(+), 167 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 1ddde15f86..5f1365775d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -6,11 +6,16 @@ Post-v21.03.0
>      expected to scale better than the C implementation, for large deployments.
>      (This may take testing and tuning to be effective.)  This version of OVN
>      requires DDLog 0.36.
> -  - Introduce ovn-controller incremetal processing engine statistics
> +  - Introduce ovn-controller incremental processing engine statistics
>    - Introduced parallel processing in ovn-northd with the NB_Global config option
>      'use_parallel_build' to enable it.  It is disabled by default.
>    - Support vlan-passthru mode for tag=0 localnet ports.
>    - Support custom 802.11ad EthType for localnet ports.
> +  - Add a new NB Global option - use_ct_inv_match with the default value of true.
> +    If this is set to false, then the logical field - "ct.inv" will not be
> +    used in the logical flow matches.  CMS can consider setting this to false,
> +    if they want to use smart NICs which don't support offloading datapath flows
> +    with this field used.
>  
>  OVN v21.03.0 - 12 Mar 2021
>  -------------------------
> diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> index 47c497e0cf..27ac3cadc5 100644
> --- a/northd/lswitch.dl
> +++ b/northd/lswitch.dl
> @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options: Map<string,string>,
>  relation SvcMonitorMac(mac: eth_addr)
>  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
>      nb::NB_Global(._uuid = uuid, .options = options).
> +
> +relation UseCtInvMatch[bool]
> +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
> +    nb::NB_Global(.options = options).
> +UseCtInvMatch[true] :-
> +    Unit(),
> +    not nb in nb::NB_Global().
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index de1aa896d3..5f6c0a6922 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
>  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
>  static struct eth_addr svc_monitor_mac_ea;
>  
> +/* If this option is 'true' northd will make use of ct.inv match fields.
> + * Otherwise, it will avoid using it.  The default is true. */
> +static bool use_ct_inv_match = true;
> +
>  /* Default probe interval for NB and SB DB connections. */
>  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>  static int northd_probe_interval_nb = 0;
> @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool shared,
>      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
>  }
>  
> -
>  /* Adds a row with the specified contents to the Logical_Flow table. */
>  static void
>  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
> @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows,
>           * for deletion (bit 0 of ct_label is set).
>           *
>           * This is enforced at a higher priority than ACLs can be defined. */
> +        char *match = xasprintf("%s(ct.est && ct.rpl && ct_label.blocked == 1)",
> +                                use_ct_inv_match ? "ct.inv || " : "");
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> -                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
> -                      "drop;");
> +                      match, "drop;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> -                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
> -                      "drop;");
> +                      match, "drop;");
> +        free(match);
>  
>          /* Ingress and Egress ACL Table (Priority 65535).
>           *
> @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows,
>           * direction to hit the currently defined policy from ACLs.
>           *
>           * This is enforced at a higher priority than ACLs can be defined. */
> +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
> +                          "ct.rpl && ct_label.blocked == 0",
> +                          use_ct_inv_match ? " && !ct.inv" : "");
> +
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> -                      "&& ct.rpl && ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> -                      "&& ct.rpl && ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
> +        free(match);
>  
>          /* Ingress and Egress ACL Table (Priority 65535).
>           *
> @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows,
>           * a dynamically negotiated FTP data channel), but will allow
>           * related traffic such as an ICMP Port Unreachable through
>           * that's generated from a non-listening UDP port.  */
> +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
> +                          "ct_label.blocked == 0",
> +                          use_ct_inv_match ? " && !ct.inv" : "");
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> -                      "&& ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> -                      "&& ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
> +        free(match);
>  
>          /* Ingress and Egress ACL Table (Priority 65535).
>           *
> @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
>  
>      use_logical_dp_groups = smap_get_bool(&nb->options,
>                                            "use_logical_dp_groups", false);
> +    use_ct_inv_match = smap_get_bool(&nb->options,
> +                                     "use_ct_inv_match", true);
> +
>      /* deprecated, use --event instead */
>      controller_event_en = smap_get_bool(&nb->options,
>                                          "controller_event", false);
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 2f5d36c599..a0f7944103 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl, fair_meter, extra_match_, extra_action
>  }
>  
>  /* build_acls */
> -for (sw in &Switch(.ls = ls))
> -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> -{
> -    /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
> -     * default.  A related rule at priority 1 is added below if there
> -     * are any stateful ACLs in this datapath. */
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_IN_ACL(),
> -         .priority         = 0,
> -         .__match          = "1",
> -         .actions          = "next;",
> -         .external_ids     = map_empty());
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_OUT_ACL(),
> -         .priority         = 0,
> -         .__match          = "1",
> -         .actions          = "next;",
> -         .external_ids     = map_empty());
> -
> -    if (has_stateful) {
> -        /* Ingress and Egress ACL Table (Priority 1).
> -         *
> -         * By default, traffic is allowed.  This is partially handled by
> -         * the Priority 0 ACL flows added earlier, but we also need to
> -         * commit IP flows.  This is because, while the initiater's
> -         * direction may not have any stateful rules, the server's may
> -         * and then its return traffic would not have an associated
> -         * conntrack entry and would return "+invalid".
> -         *
> -         * We use "ct_commit" for a connection that is not already known
> -         * by the connection tracker.  Once a connection is committed,
> -         * subsequent packets will hit the flow at priority 0 that just
> -         * uses "next;"
> -         *
> -         * We also check for established connections that have ct_label.blocked
> -         * set on them.  That's a connection that was disallowed, but is
> -         * now allowed by policy again since it hit this default-allow flow.
> -         * We need to set ct_label.blocked=0 to let the connection continue,
> -         * which will be done by ct_commit() in the "stateful" stage.
> -         * Subsequent packets will hit the flow at priority 0 that just
> -         * uses "next;". */
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 1,
> -             .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
> -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
> -             .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 1,
> -             .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
> -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
> -             .external_ids     = map_empty());
> -
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Always drop traffic that's in an invalid state.  Also drop
> -         * reply direction packets for connections that have been marked
> -         * for deletion (bit 0 of ct_label is set).
> -         *
> -         * This is enforced at a higher priority than ACLs can be defined. */
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
> -             .actions          = "drop;",
> -             .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
> -             .actions          = "drop;",
> -             .external_ids     = map_empty());
> -
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Allow reply traffic that is part of an established
> -         * conntrack entry that has not been marked for deletion
> -         * (bit 0 of ct_label).  We only match traffic in the
> -         * reply direction because we want traffic in the request
> -         * direction to hit the currently defined policy from ACLs.
> -         *
> -         * This is enforced at a higher priority than ACLs can be defined. */
> +for (UseCtInvMatch[use_ct_inv_match]) {
> +    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
> +        true -> ("ct.inv || ", "&& !ct.inv "),
> +        false -> ("", ""),
> +    } in
> +    for (sw in &Switch(.ls = ls))
> +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> +    {
> +        /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
> +         * default.  A related rule at priority 1 is added below if there
> +         * are any stateful ACLs in this datapath. */
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.est && !ct.rel && !ct.new && !ct.inv "
> -                                 "&& ct.rpl && ct_label.blocked == 0",
> +             .priority         = 0,
> +             .__match          = "1",
>               .actions          = "next;",
>               .external_ids     = map_empty());
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.est && !ct.rel && !ct.new && !ct.inv "
> -                                 "&& ct.rpl && ct_label.blocked == 0",
> +             .priority         = 0,
> +             .__match          = "1",
>               .actions          = "next;",
>               .external_ids     = map_empty());
>  
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Allow traffic that is related to an existing conntrack entry that
> -         * has not been marked for deletion (bit 0 of ct_label).
> -         *
> -         * This is enforced at a higher priority than ACLs can be defined.
> -         *
> -         * NOTE: This does not support related data sessions (eg,
> -         * a dynamically negotiated FTP data channel), but will allow
> -         * related traffic such as an ICMP Port Unreachable through
> -         * that's generated from a non-listening UDP port.  */
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "!ct.est && ct.rel && !ct.new && !ct.inv "
> -                                 "&& ct_label.blocked == 0",
> -             .actions          = "next;",
> -             .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "!ct.est && ct.rel && !ct.new && !ct.inv "
> -                                 "&& ct_label.blocked == 0",
> -             .actions          = "next;",
> -             .external_ids     = map_empty());
> +        if (has_stateful) {
> +            /* Ingress and Egress ACL Table (Priority 1).
> +             *
> +             * By default, traffic is allowed.  This is partially handled by
> +             * the Priority 0 ACL flows added earlier, but we also need to
> +             * commit IP flows.  This is because, while the initiater's
> +             * direction may not have any stateful rules, the server's may
> +             * and then its return traffic would not have an associated
> +             * conntrack entry and would return "+invalid".
> +             *
> +             * We use "ct_commit" for a connection that is not already known
> +             * by the connection tracker.  Once a connection is committed,
> +             * subsequent packets will hit the flow at priority 0 that just
> +             * uses "next;"
> +             *
> +             * We also check for established connections that have ct_label.blocked
> +             * set on them.  That's a connection that was disallowed, but is
> +             * now allowed by policy again since it hit this default-allow flow.
> +             * We need to set ct_label.blocked=0 to let the connection continue,
> +             * which will be done by ct_commit() in the "stateful" stage.
> +             * Subsequent packets will hit the flow at priority 0 that just
> +             * uses "next;". */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 1,
> +                 .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
> +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 1,
> +                 .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
> +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
> +                 .external_ids     = map_empty());
>  
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Not to do conntrack on ND packets. */
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Always drop traffic that's in an invalid state.  Also drop
> +             * reply direction packets for connections that have been marked
> +             * for deletion (bit 0 of ct_label is set).
> +             *
> +             * This is enforced at a higher priority than ACLs can be defined. */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl && ct_label.blocked == 1)",
> +                 .actions          = "drop;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl && ct_label.blocked == 1)",
> +                 .actions          = "drop;",
> +                 .external_ids     = map_empty());
> +
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Allow reply traffic that is part of an established
> +             * conntrack entry that has not been marked for deletion
> +             * (bit 0 of ct_label).  We only match traffic in the
> +             * reply direction because we want traffic in the request
> +             * direction to hit the currently defined policy from ACLs.
> +             *
> +             * This is enforced at a higher priority than ACLs can be defined. */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "ct.est && !ct.rel && !ct.new " ++ and_not_ct_inv ++
> +                                     "&& ct.rpl && ct_label.blocked == 0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "ct.est && !ct.rel && !ct.new " ++ and_not_ct_inv ++
> +                                     "&& ct.rpl && ct_label.blocked == 0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Allow traffic that is related to an existing conntrack entry that
> +             * has not been marked for deletion (bit 0 of ct_label).
> +             *
> +             * This is enforced at a higher priority than ACLs can be defined.
> +             *
> +             * NOTE: This does not support related data sessions (eg,
> +             * a dynamically negotiated FTP data channel), but will allow
> +             * related traffic such as an ICMP Port Unreachable through
> +             * that's generated from a non-listening UDP port.  */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "!ct.est && ct.rel && !ct.new " ++ and_not_ct_inv ++
> +                                     "&& ct_label.blocked == 0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "!ct.est && ct.rel && !ct.new " ++ and_not_ct_inv ++
> +                                     "&& ct_label.blocked == 0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Not to do conntrack on ND packets. */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty())
> +        };
> +
> +        /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
> +         * if the CMS has configured DNS records for the datapath.
> +         */
> +        if (sw.has_dns_records) {
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 34000,
> +                 .__match          = "udp.src == 53",
> +                 .actions          = if has_stateful "ct_commit; next;" else "next;",
> +                 .external_ids     = map_empty())
> +        };
> +
> +        /* Add a 34000 priority flow to advance the service monitor reply
> +         * packets to skip applying ingress ACLs. */
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
> +             .priority         = 34000,
> +             .__match          = "eth.dst == $svc_monitor_mac",
>               .actions          = "next;",
>               .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
> -             .actions          = "next;",
> -             .external_ids     = map_empty())
> -    };
> -
> -    /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
> -     * if the CMS has configured DNS records for the datapath.
> -     */
> -    if (sw.has_dns_records) {
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_OUT_ACL(),
>               .priority         = 34000,
> -             .__match          = "udp.src == 53",
> -             .actions          = if has_stateful "ct_commit; next;" else "next;",
> +             .__match          = "eth.src == $svc_monitor_mac",
> +             .actions          = "next;",
>               .external_ids     = map_empty())
> -    };
> -
> -    /* Add a 34000 priority flow to advance the service monitor reply
> -     * packets to skip applying ingress ACLs. */
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_IN_ACL(),
> -         .priority         = 34000,
> -         .__match          = "eth.dst == $svc_monitor_mac",
> -         .actions          = "next;",
> -         .external_ids     = map_empty());
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_OUT_ACL(),
> -         .priority         = 34000,
> -         .__match          = "eth.src == $svc_monitor_mac",
> -         .actions          = "next;",
> -         .external_ids     = map_empty())
> +    }
>  }
>  
>  /* This stage builds hints for the IN/OUT_ACL stage. Based on various
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index feb38b5d31..e337be4eb3 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -240,6 +240,21 @@
>          </p>
>        </column>
>  
> +      <column name="options" key="use_ct_inv_match">
> +        <p>
> +          If set to false, <code>ovn-northd</code> will not use the
> +          <code>ct.inv</code> field in any of the logical flow matches.
> +          The default value is true.  If the NIC supports offloading
> +          OVS datapath flows but doesn't support offloading ct_state
> +          <code>inv</code> flag, then the datapath flows matching on this flag
> +          (either <code>+inv</code> or <code>-inv</code>) will not be
> +          offloaded.  CMS should consider setting <code>use_ct_inv_match</code>
> +          to <code>false</code> in such cases.  This results in a side effect
> +          of the invalid packets getting delivered to the destination VIF, which
> +          otherwise would have been dropped by <code>OVN</code>.
> +        </p>
> +      </column>
> +
>        <group title="Options for configuring interconnection route advertisement">
>          <p>
>            These options control how routes are advertised between OVN
> 

Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Han Zhou May 6, 2021, 6:33 a.m. UTC | #2
On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
>
> From: Numan Siddique <numans@ovn.org>
>
> Presently we add 65535 priority lflows in the stages -
> 'ls_in_acl' and 'ls_out_acl' to drop packets which
> match on 'ct.inv'.
>
> This patch provides an option to not use this field in the
> logical flow matches for the following
> reasons:
>
>  • Some of the smart NICs which support offloading datapath
>    flows may not support this field.  In such cases, almost
>    all the datapath flows (matching on the ct_state field +inv
>    or -inv cannot be offloaded if ACLs/load balancers
>    are configured on the logical switch datapath.
>
>  • A recent commit in kernel ovs datapath sets the committed
>    connection tracking entry to be liberal for out-of-window
>    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
>    packets will not be marked as invalid.
>
> There are still certain scenarios where a packet can be marked
> invalid.  Like
>  • If the tcp flags are not correct.
>
>  • If there are checksum errors.
>
> In such cases, the packet will be delivered to the destination
> port.  So CMS should evaluate their usecases before enabling
> this option.
>
> Signed-off-by: Numan Siddique <numans@ovn.org>
> Co-authored-by: Ben Pfaff <blp@ovn.org>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
> v3 -> v4
> ---
>   * Moved the static variable 'use_ct_inv_match' declaration up in the
>     file.
>
> v2 -> v3
> ----
>   * Updated the documentation in ovn-nb.xml as per Mark G's suggestion.
>
> v1 -> v2
> ----
>   * Adopted the code changes shared by Ben Pfaff for the ddlog code.
>
>  NEWS                 |   7 +-
>  northd/lswitch.dl    |   7 +
>  northd/ovn-northd.c  |  42 +++---
>  northd/ovn_northd.dl | 304 ++++++++++++++++++++++---------------------
>  ovn-nb.xml           |  15 +++
>  5 files changed, 208 insertions(+), 167 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 1ddde15f86..5f1365775d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -6,11 +6,16 @@ Post-v21.03.0
>      expected to scale better than the C implementation, for large
deployments.
>      (This may take testing and tuning to be effective.)  This version of
OVN
>      requires DDLog 0.36.
> -  - Introduce ovn-controller incremetal processing engine statistics
> +  - Introduce ovn-controller incremental processing engine statistics
>    - Introduced parallel processing in ovn-northd with the NB_Global
config option
>      'use_parallel_build' to enable it.  It is disabled by default.
>    - Support vlan-passthru mode for tag=0 localnet ports.
>    - Support custom 802.11ad EthType for localnet ports.
> +  - Add a new NB Global option - use_ct_inv_match with the default value
of true.
> +    If this is set to false, then the logical field - "ct.inv" will not
be
> +    used in the logical flow matches.  CMS can consider setting this to
false,
> +    if they want to use smart NICs which don't support offloading
datapath flows
> +    with this field used.
>
>  OVN v21.03.0 - 12 Mar 2021
>  -------------------------
> diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> index 47c497e0cf..27ac3cadc5 100644
> --- a/northd/lswitch.dl
> +++ b/northd/lswitch.dl
> @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
Map<string,string>,
>  relation SvcMonitorMac(mac: eth_addr)
>  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
>      nb::NB_Global(._uuid = uuid, .options = options).
> +
> +relation UseCtInvMatch[bool]
> +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
> +    nb::NB_Global(.options = options).
> +UseCtInvMatch[true] :-
> +    Unit(),
> +    not nb in nb::NB_Global().
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index de1aa896d3..5f6c0a6922 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
>  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
>  static struct eth_addr svc_monitor_mac_ea;
>
> +/* If this option is 'true' northd will make use of ct.inv match fields.
> + * Otherwise, it will avoid using it.  The default is true. */
> +static bool use_ct_inv_match = true;
> +
>  /* Default probe interval for NB and SB DB connections. */
>  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>  static int northd_probe_interval_nb = 0;
> @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool
shared,
>      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
>  }
>
> -
>  /* Adds a row with the specified contents to the Logical_Flow table. */
>  static void
>  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
> @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct hmap
*lflows,
>           * for deletion (bit 0 of ct_label is set).
>           *
>           * This is enforced at a higher priority than ACLs can be
defined. */
> +        char *match = xasprintf("%s(ct.est && ct.rpl && ct_label.blocked
== 1)",
> +                                use_ct_inv_match ? "ct.inv || " : "");
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> -                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked
== 1)",
> -                      "drop;");
> +                      match, "drop;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> -                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked
== 1)",
> -                      "drop;");
> +                      match, "drop;");
> +        free(match);
>
>          /* Ingress and Egress ACL Table (Priority 65535).
>           *
> @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct hmap
*lflows,
>           * direction to hit the currently defined policy from ACLs.
>           *
>           * This is enforced at a higher priority than ACLs can be
defined. */
> +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
> +                          "ct.rpl && ct_label.blocked == 0",
> +                          use_ct_inv_match ? " && !ct.inv" : "");
> +
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> -                      "&& ct.rpl && ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> -                      "&& ct.rpl && ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
> +        free(match);
>
>          /* Ingress and Egress ACL Table (Priority 65535).
>           *
> @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct hmap
*lflows,
>           * a dynamically negotiated FTP data channel), but will allow
>           * related traffic such as an ICMP Port Unreachable through
>           * that's generated from a non-listening UDP port.  */
> +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
> +                          "ct_label.blocked == 0",
> +                          use_ct_inv_match ? " && !ct.inv" : "");
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> -                      "&& ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
>          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> -                      "&& ct_label.blocked == 0",
> -                      "next;");
> +                      match, "next;");
> +        free(match);
>
>          /* Ingress and Egress ACL Table (Priority 65535).
>           *
> @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
>
>      use_logical_dp_groups = smap_get_bool(&nb->options,
>                                            "use_logical_dp_groups",
false);
> +    use_ct_inv_match = smap_get_bool(&nb->options,
> +                                     "use_ct_inv_match", true);
> +
>      /* deprecated, use --event instead */
>      controller_event_en = smap_get_bool(&nb->options,
>                                          "controller_event", false);
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 2f5d36c599..a0f7944103 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl,
fair_meter, extra_match_, extra_action
>  }
>
>  /* build_acls */
> -for (sw in &Switch(.ls = ls))
> -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> -{
> -    /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
> -     * default.  A related rule at priority 1 is added below if there
> -     * are any stateful ACLs in this datapath. */
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_IN_ACL(),
> -         .priority         = 0,
> -         .__match          = "1",
> -         .actions          = "next;",
> -         .external_ids     = map_empty());
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_OUT_ACL(),
> -         .priority         = 0,
> -         .__match          = "1",
> -         .actions          = "next;",
> -         .external_ids     = map_empty());
> -
> -    if (has_stateful) {
> -        /* Ingress and Egress ACL Table (Priority 1).
> -         *
> -         * By default, traffic is allowed.  This is partially handled by
> -         * the Priority 0 ACL flows added earlier, but we also need to
> -         * commit IP flows.  This is because, while the initiater's
> -         * direction may not have any stateful rules, the server's may
> -         * and then its return traffic would not have an associated
> -         * conntrack entry and would return "+invalid".
> -         *
> -         * We use "ct_commit" for a connection that is not already known
> -         * by the connection tracker.  Once a connection is committed,
> -         * subsequent packets will hit the flow at priority 0 that just
> -         * uses "next;"
> -         *
> -         * We also check for established connections that have
ct_label.blocked
> -         * set on them.  That's a connection that was disallowed, but is
> -         * now allowed by policy again since it hit this default-allow
flow.
> -         * We need to set ct_label.blocked=0 to let the connection
continue,
> -         * which will be done by ct_commit() in the "stateful" stage.
> -         * Subsequent packets will hit the flow at priority 0 that just
> -         * uses "next;". */
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 1,
> -             .__match          = "ip && (!ct.est || (ct.est &&
ct_label.blocked == 1))",
> -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
next;",
> -             .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 1,
> -             .__match          = "ip && (!ct.est || (ct.est &&
ct_label.blocked == 1))",
> -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
next;",
> -             .external_ids     = map_empty());
> -
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Always drop traffic that's in an invalid state.  Also drop
> -         * reply direction packets for connections that have been marked
> -         * for deletion (bit 0 of ct_label is set).
> -         *
> -         * This is enforced at a higher priority than ACLs can be
defined. */
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.inv || (ct.est && ct.rpl &&
ct_label.blocked == 1)",
> -             .actions          = "drop;",
> -             .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.inv || (ct.est && ct.rpl &&
ct_label.blocked == 1)",
> -             .actions          = "drop;",
> -             .external_ids     = map_empty());
> -
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Allow reply traffic that is part of an established
> -         * conntrack entry that has not been marked for deletion
> -         * (bit 0 of ct_label).  We only match traffic in the
> -         * reply direction because we want traffic in the request
> -         * direction to hit the currently defined policy from ACLs.
> -         *
> -         * This is enforced at a higher priority than ACLs can be
defined. */
> +for (UseCtInvMatch[use_ct_inv_match]) {
> +    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
> +        true -> ("ct.inv || ", "&& !ct.inv "),
> +        false -> ("", ""),
> +    } in
> +    for (sw in &Switch(.ls = ls))
> +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> +    {
> +        /* Ingress and Egress ACL Table (Priority 0): Packets are
allowed by
> +         * default.  A related rule at priority 1 is added below if there
> +         * are any stateful ACLs in this datapath. */
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.est && !ct.rel && !ct.new &&
!ct.inv "
> -                                 "&& ct.rpl && ct_label.blocked == 0",
> +             .priority         = 0,
> +             .__match          = "1",
>               .actions          = "next;",
>               .external_ids     = map_empty());
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "ct.est && !ct.rel && !ct.new &&
!ct.inv "
> -                                 "&& ct.rpl && ct_label.blocked == 0",
> +             .priority         = 0,
> +             .__match          = "1",
>               .actions          = "next;",
>               .external_ids     = map_empty());
>
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Allow traffic that is related to an existing conntrack entry
that
> -         * has not been marked for deletion (bit 0 of ct_label).
> -         *
> -         * This is enforced at a higher priority than ACLs can be
defined.
> -         *
> -         * NOTE: This does not support related data sessions (eg,
> -         * a dynamically negotiated FTP data channel), but will allow
> -         * related traffic such as an ICMP Port Unreachable through
> -         * that's generated from a non-listening UDP port.  */
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "!ct.est && ct.rel && !ct.new &&
!ct.inv "
> -                                 "&& ct_label.blocked == 0",
> -             .actions          = "next;",
> -             .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "!ct.est && ct.rel && !ct.new &&
!ct.inv "
> -                                 "&& ct_label.blocked == 0",
> -             .actions          = "next;",
> -             .external_ids     = map_empty());
> +        if (has_stateful) {
> +            /* Ingress and Egress ACL Table (Priority 1).
> +             *
> +             * By default, traffic is allowed.  This is partially
handled by
> +             * the Priority 0 ACL flows added earlier, but we also need
to
> +             * commit IP flows.  This is because, while the initiater's
> +             * direction may not have any stateful rules, the server's
may
> +             * and then its return traffic would not have an associated
> +             * conntrack entry and would return "+invalid".
> +             *
> +             * We use "ct_commit" for a connection that is not already
known
> +             * by the connection tracker.  Once a connection is
committed,
> +             * subsequent packets will hit the flow at priority 0 that
just
> +             * uses "next;"
> +             *
> +             * We also check for established connections that have
ct_label.blocked
> +             * set on them.  That's a connection that was disallowed,
but is
> +             * now allowed by policy again since it hit this
default-allow flow.
> +             * We need to set ct_label.blocked=0 to let the connection
continue,
> +             * which will be done by ct_commit() in the "stateful" stage.
> +             * Subsequent packets will hit the flow at priority 0 that
just
> +             * uses "next;". */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 1,
> +                 .__match          = "ip && (!ct.est || (ct.est &&
ct_label.blocked == 1))",
> +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 1,
> +                 .__match          = "ip && (!ct.est || (ct.est &&
ct_label.blocked == 1))",
> +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
next;",
> +                 .external_ids     = map_empty());
>
> -        /* Ingress and Egress ACL Table (Priority 65535).
> -         *
> -         * Not to do conntrack on ND packets. */
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Always drop traffic that's in an invalid state.  Also drop
> +             * reply direction packets for connections that have been
marked
> +             * for deletion (bit 0 of ct_label is set).
> +             *
> +             * This is enforced at a higher priority than ACLs can be
defined. */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl &&
ct_label.blocked == 1)",
> +                 .actions          = "drop;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl &&
ct_label.blocked == 1)",
> +                 .actions          = "drop;",
> +                 .external_ids     = map_empty());
> +
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Allow reply traffic that is part of an established
> +             * conntrack entry that has not been marked for deletion
> +             * (bit 0 of ct_label).  We only match traffic in the
> +             * reply direction because we want traffic in the request
> +             * direction to hit the currently defined policy from ACLs.
> +             *
> +             * This is enforced at a higher priority than ACLs can be
defined. */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "ct.est && !ct.rel && !ct.new " ++
and_not_ct_inv ++
> +                                     "&& ct.rpl && ct_label.blocked ==
0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "ct.est && !ct.rel && !ct.new " ++
and_not_ct_inv ++
> +                                     "&& ct.rpl && ct_label.blocked ==
0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Allow traffic that is related to an existing conntrack
entry that
> +             * has not been marked for deletion (bit 0 of ct_label).
> +             *
> +             * This is enforced at a higher priority than ACLs can be
defined.
> +             *
> +             * NOTE: This does not support related data sessions (eg,
> +             * a dynamically negotiated FTP data channel), but will allow
> +             * related traffic such as an ICMP Port Unreachable through
> +             * that's generated from a non-listening UDP port.  */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "!ct.est && ct.rel && !ct.new " ++
and_not_ct_inv ++
> +                                     "&& ct_label.blocked == 0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "!ct.est && ct.rel && !ct.new " ++
and_not_ct_inv ++
> +                                     "&& ct_label.blocked == 0",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +
> +            /* Ingress and Egress ACL Table (Priority 65535).
> +             *
> +             * Not to do conntrack on ND packets. */
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_IN_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "nd || nd_ra || nd_rs || mldv1 ||
mldv2",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty());
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 65535,
> +                 .__match          = "nd || nd_ra || nd_rs || mldv1 ||
mldv2",
> +                 .actions          = "next;",
> +                 .external_ids     = map_empty())
> +        };
> +
> +        /* Add a 34000 priority flow to advance the DNS reply from
ovn-controller,
> +         * if the CMS has configured DNS records for the datapath.
> +         */
> +        if (sw.has_dns_records) {
> +            Flow(.logical_datapath = ls._uuid,
> +                 .stage            = s_SWITCH_OUT_ACL(),
> +                 .priority         = 34000,
> +                 .__match          = "udp.src == 53",
> +                 .actions          = if has_stateful "ct_commit; next;"
else "next;",
> +                 .external_ids     = map_empty())
> +        };
> +
> +        /* Add a 34000 priority flow to advance the service monitor reply
> +         * packets to skip applying ingress ACLs. */
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_IN_ACL(),
> -             .priority         = 65535,
> -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
mldv2",
> +             .priority         = 34000,
> +             .__match          = "eth.dst == $svc_monitor_mac",
>               .actions          = "next;",
>               .external_ids     = map_empty());
> -        Flow(.logical_datapath = ls._uuid,
> -             .stage            = s_SWITCH_OUT_ACL(),
> -             .priority         = 65535,
> -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
mldv2",
> -             .actions          = "next;",
> -             .external_ids     = map_empty())
> -    };
> -
> -    /* Add a 34000 priority flow to advance the DNS reply from
ovn-controller,
> -     * if the CMS has configured DNS records for the datapath.
> -     */
> -    if (sw.has_dns_records) {
>          Flow(.logical_datapath = ls._uuid,
>               .stage            = s_SWITCH_OUT_ACL(),
>               .priority         = 34000,
> -             .__match          = "udp.src == 53",
> -             .actions          = if has_stateful "ct_commit; next;" else
"next;",
> +             .__match          = "eth.src == $svc_monitor_mac",
> +             .actions          = "next;",
>               .external_ids     = map_empty())
> -    };
> -
> -    /* Add a 34000 priority flow to advance the service monitor reply
> -     * packets to skip applying ingress ACLs. */
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_IN_ACL(),
> -         .priority         = 34000,
> -         .__match          = "eth.dst == $svc_monitor_mac",
> -         .actions          = "next;",
> -         .external_ids     = map_empty());
> -    Flow(.logical_datapath = ls._uuid,
> -         .stage            = s_SWITCH_OUT_ACL(),
> -         .priority         = 34000,
> -         .__match          = "eth.src == $svc_monitor_mac",
> -         .actions          = "next;",
> -         .external_ids     = map_empty())
> +    }
>  }
>
>  /* This stage builds hints for the IN/OUT_ACL stage. Based on various
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index feb38b5d31..e337be4eb3 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -240,6 +240,21 @@
>          </p>
>        </column>
>
> +      <column name="options" key="use_ct_inv_match">
> +        <p>
> +          If set to false, <code>ovn-northd</code> will not use the
> +          <code>ct.inv</code> field in any of the logical flow matches.
> +          The default value is true.  If the NIC supports offloading
> +          OVS datapath flows but doesn't support offloading ct_state
> +          <code>inv</code> flag, then the datapath flows matching on
this flag
> +          (either <code>+inv</code> or <code>-inv</code>) will not be
> +          offloaded.  CMS should consider setting
<code>use_ct_inv_match</code>
> +          to <code>false</code> in such cases.  This results in a side
effect
> +          of the invalid packets getting delivered to the destination
VIF, which
> +          otherwise would have been dropped by <code>OVN</code>.
> +        </p>
> +      </column>
> +
>        <group title="Options for configuring interconnection route
advertisement">
>          <p>
>            These options control how routes are advertised between OVN
> --
> 2.29.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Acked-by: Han Zhou <hzhou@ovn.org>
Numan Siddique May 6, 2021, 6:41 p.m. UTC | #3
On Thu, May 6, 2021 at 2:35 AM Han Zhou <hzhou@ovn.org> wrote:
>
> On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
> >
> > From: Numan Siddique <numans@ovn.org>
> >
> > Presently we add 65535 priority lflows in the stages -
> > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> > match on 'ct.inv'.
> >
> > This patch provides an option to not use this field in the
> > logical flow matches for the following
> > reasons:
> >
> >  • Some of the smart NICs which support offloading datapath
> >    flows may not support this field.  In such cases, almost
> >    all the datapath flows (matching on the ct_state field +inv
> >    or -inv cannot be offloaded if ACLs/load balancers
> >    are configured on the logical switch datapath.
> >
> >  • A recent commit in kernel ovs datapath sets the committed
> >    connection tracking entry to be liberal for out-of-window
> >    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> >    packets will not be marked as invalid.
> >
> > There are still certain scenarios where a packet can be marked
> > invalid.  Like
> >  • If the tcp flags are not correct.
> >
> >  • If there are checksum errors.
> >
> > In such cases, the packet will be delivered to the destination
> > port.  So CMS should evaluate their usecases before enabling
> > this option.
> >
> > Signed-off-by: Numan Siddique <numans@ovn.org>
> > Co-authored-by: Ben Pfaff <blp@ovn.org>
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> > v3 -> v4
> > ---
> >   * Moved the static variable 'use_ct_inv_match' declaration up in the
> >     file.
> >
> > v2 -> v3
> > ----
> >   * Updated the documentation in ovn-nb.xml as per Mark G's suggestion.
> >
> > v1 -> v2
> > ----
> >   * Adopted the code changes shared by Ben Pfaff for the ddlog code.
> >
> >  NEWS                 |   7 +-
> >  northd/lswitch.dl    |   7 +
> >  northd/ovn-northd.c  |  42 +++---
> >  northd/ovn_northd.dl | 304 ++++++++++++++++++++++---------------------
> >  ovn-nb.xml           |  15 +++
> >  5 files changed, 208 insertions(+), 167 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 1ddde15f86..5f1365775d 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -6,11 +6,16 @@ Post-v21.03.0
> >      expected to scale better than the C implementation, for large
> deployments.
> >      (This may take testing and tuning to be effective.)  This version of
> OVN
> >      requires DDLog 0.36.
> > -  - Introduce ovn-controller incremetal processing engine statistics
> > +  - Introduce ovn-controller incremental processing engine statistics
> >    - Introduced parallel processing in ovn-northd with the NB_Global
> config option
> >      'use_parallel_build' to enable it.  It is disabled by default.
> >    - Support vlan-passthru mode for tag=0 localnet ports.
> >    - Support custom 802.11ad EthType for localnet ports.
> > +  - Add a new NB Global option - use_ct_inv_match with the default value
> of true.
> > +    If this is set to false, then the logical field - "ct.inv" will not
> be
> > +    used in the logical flow matches.  CMS can consider setting this to
> false,
> > +    if they want to use smart NICs which don't support offloading
> datapath flows
> > +    with this field used.
> >
> >  OVN v21.03.0 - 12 Mar 2021
> >  -------------------------
> > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> > index 47c497e0cf..27ac3cadc5 100644
> > --- a/northd/lswitch.dl
> > +++ b/northd/lswitch.dl
> > @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
> Map<string,string>,
> >  relation SvcMonitorMac(mac: eth_addr)
> >  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
> >      nb::NB_Global(._uuid = uuid, .options = options).
> > +
> > +relation UseCtInvMatch[bool]
> > +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
> > +    nb::NB_Global(.options = options).
> > +UseCtInvMatch[true] :-
> > +    Unit(),
> > +    not nb in nb::NB_Global().
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index de1aa896d3..5f6c0a6922 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
> >  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
> >  static struct eth_addr svc_monitor_mac_ea;
> >
> > +/* If this option is 'true' northd will make use of ct.inv match fields.
> > + * Otherwise, it will avoid using it.  The default is true. */
> > +static bool use_ct_inv_match = true;
> > +
> >  /* Default probe interval for NB and SB DB connections. */
> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> >  static int northd_probe_interval_nb = 0;
> > @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool
> shared,
> >      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> >  }
> >
> > -
> >  /* Adds a row with the specified contents to the Logical_Flow table. */
> >  static void
> >  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
> > @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct hmap
> *lflows,
> >           * for deletion (bit 0 of ct_label is set).
> >           *
> >           * This is enforced at a higher priority than ACLs can be
> defined. */
> > +        char *match = xasprintf("%s(ct.est && ct.rpl && ct_label.blocked
> == 1)",
> > +                                use_ct_inv_match ? "ct.inv || " : "");
> >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > -                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked
> == 1)",
> > -                      "drop;");
> > +                      match, "drop;");
> >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > -                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked
> == 1)",
> > -                      "drop;");
> > +                      match, "drop;");
> > +        free(match);
> >
> >          /* Ingress and Egress ACL Table (Priority 65535).
> >           *
> > @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct hmap
> *lflows,
> >           * direction to hit the currently defined policy from ACLs.
> >           *
> >           * This is enforced at a higher priority than ACLs can be
> defined. */
> > +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
> > +                          "ct.rpl && ct_label.blocked == 0",
> > +                          use_ct_inv_match ? " && !ct.inv" : "");
> > +
> >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> > -                      "&& ct.rpl && ct_label.blocked == 0",
> > -                      "next;");
> > +                      match, "next;");
> >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> > -                      "&& ct.rpl && ct_label.blocked == 0",
> > -                      "next;");
> > +                      match, "next;");
> > +        free(match);
> >
> >          /* Ingress and Egress ACL Table (Priority 65535).
> >           *
> > @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct hmap
> *lflows,
> >           * a dynamically negotiated FTP data channel), but will allow
> >           * related traffic such as an ICMP Port Unreachable through
> >           * that's generated from a non-listening UDP port.  */
> > +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
> > +                          "ct_label.blocked == 0",
> > +                          use_ct_inv_match ? " && !ct.inv" : "");
> >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> > -                      "&& ct_label.blocked == 0",
> > -                      "next;");
> > +                      match, "next;");
> >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> > -                      "&& ct_label.blocked == 0",
> > -                      "next;");
> > +                      match, "next;");
> > +        free(match);
> >
> >          /* Ingress and Egress ACL Table (Priority 65535).
> >           *
> > @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
> >
> >      use_logical_dp_groups = smap_get_bool(&nb->options,
> >                                            "use_logical_dp_groups",
> false);
> > +    use_ct_inv_match = smap_get_bool(&nb->options,
> > +                                     "use_ct_inv_match", true);
> > +
> >      /* deprecated, use --event instead */
> >      controller_event_en = smap_get_bool(&nb->options,
> >                                          "controller_event", false);
> > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> > index 2f5d36c599..a0f7944103 100644
> > --- a/northd/ovn_northd.dl
> > +++ b/northd/ovn_northd.dl
> > @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl,
> fair_meter, extra_match_, extra_action
> >  }
> >
> >  /* build_acls */
> > -for (sw in &Switch(.ls = ls))
> > -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> > -{
> > -    /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
> > -     * default.  A related rule at priority 1 is added below if there
> > -     * are any stateful ACLs in this datapath. */
> > -    Flow(.logical_datapath = ls._uuid,
> > -         .stage            = s_SWITCH_IN_ACL(),
> > -         .priority         = 0,
> > -         .__match          = "1",
> > -         .actions          = "next;",
> > -         .external_ids     = map_empty());
> > -    Flow(.logical_datapath = ls._uuid,
> > -         .stage            = s_SWITCH_OUT_ACL(),
> > -         .priority         = 0,
> > -         .__match          = "1",
> > -         .actions          = "next;",
> > -         .external_ids     = map_empty());
> > -
> > -    if (has_stateful) {
> > -        /* Ingress and Egress ACL Table (Priority 1).
> > -         *
> > -         * By default, traffic is allowed.  This is partially handled by
> > -         * the Priority 0 ACL flows added earlier, but we also need to
> > -         * commit IP flows.  This is because, while the initiater's
> > -         * direction may not have any stateful rules, the server's may
> > -         * and then its return traffic would not have an associated
> > -         * conntrack entry and would return "+invalid".
> > -         *
> > -         * We use "ct_commit" for a connection that is not already known
> > -         * by the connection tracker.  Once a connection is committed,
> > -         * subsequent packets will hit the flow at priority 0 that just
> > -         * uses "next;"
> > -         *
> > -         * We also check for established connections that have
> ct_label.blocked
> > -         * set on them.  That's a connection that was disallowed, but is
> > -         * now allowed by policy again since it hit this default-allow
> flow.
> > -         * We need to set ct_label.blocked=0 to let the connection
> continue,
> > -         * which will be done by ct_commit() in the "stateful" stage.
> > -         * Subsequent packets will hit the flow at priority 0 that just
> > -         * uses "next;". */
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_IN_ACL(),
> > -             .priority         = 1,
> > -             .__match          = "ip && (!ct.est || (ct.est &&
> ct_label.blocked == 1))",
> > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> next;",
> > -             .external_ids     = map_empty());
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_OUT_ACL(),
> > -             .priority         = 1,
> > -             .__match          = "ip && (!ct.est || (ct.est &&
> ct_label.blocked == 1))",
> > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> next;",
> > -             .external_ids     = map_empty());
> > -
> > -        /* Ingress and Egress ACL Table (Priority 65535).
> > -         *
> > -         * Always drop traffic that's in an invalid state.  Also drop
> > -         * reply direction packets for connections that have been marked
> > -         * for deletion (bit 0 of ct_label is set).
> > -         *
> > -         * This is enforced at a higher priority than ACLs can be
> defined. */
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_IN_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> ct_label.blocked == 1)",
> > -             .actions          = "drop;",
> > -             .external_ids     = map_empty());
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_OUT_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> ct_label.blocked == 1)",
> > -             .actions          = "drop;",
> > -             .external_ids     = map_empty());
> > -
> > -        /* Ingress and Egress ACL Table (Priority 65535).
> > -         *
> > -         * Allow reply traffic that is part of an established
> > -         * conntrack entry that has not been marked for deletion
> > -         * (bit 0 of ct_label).  We only match traffic in the
> > -         * reply direction because we want traffic in the request
> > -         * direction to hit the currently defined policy from ACLs.
> > -         *
> > -         * This is enforced at a higher priority than ACLs can be
> defined. */
> > +for (UseCtInvMatch[use_ct_inv_match]) {
> > +    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
> > +        true -> ("ct.inv || ", "&& !ct.inv "),
> > +        false -> ("", ""),
> > +    } in
> > +    for (sw in &Switch(.ls = ls))
> > +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> > +    {
> > +        /* Ingress and Egress ACL Table (Priority 0): Packets are
> allowed by
> > +         * default.  A related rule at priority 1 is added below if there
> > +         * are any stateful ACLs in this datapath. */
> >          Flow(.logical_datapath = ls._uuid,
> >               .stage            = s_SWITCH_IN_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> !ct.inv "
> > -                                 "&& ct.rpl && ct_label.blocked == 0",
> > +             .priority         = 0,
> > +             .__match          = "1",
> >               .actions          = "next;",
> >               .external_ids     = map_empty());
> >          Flow(.logical_datapath = ls._uuid,
> >               .stage            = s_SWITCH_OUT_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> !ct.inv "
> > -                                 "&& ct.rpl && ct_label.blocked == 0",
> > +             .priority         = 0,
> > +             .__match          = "1",
> >               .actions          = "next;",
> >               .external_ids     = map_empty());
> >
> > -        /* Ingress and Egress ACL Table (Priority 65535).
> > -         *
> > -         * Allow traffic that is related to an existing conntrack entry
> that
> > -         * has not been marked for deletion (bit 0 of ct_label).
> > -         *
> > -         * This is enforced at a higher priority than ACLs can be
> defined.
> > -         *
> > -         * NOTE: This does not support related data sessions (eg,
> > -         * a dynamically negotiated FTP data channel), but will allow
> > -         * related traffic such as an ICMP Port Unreachable through
> > -         * that's generated from a non-listening UDP port.  */
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_IN_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> !ct.inv "
> > -                                 "&& ct_label.blocked == 0",
> > -             .actions          = "next;",
> > -             .external_ids     = map_empty());
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_OUT_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> !ct.inv "
> > -                                 "&& ct_label.blocked == 0",
> > -             .actions          = "next;",
> > -             .external_ids     = map_empty());
> > +        if (has_stateful) {
> > +            /* Ingress and Egress ACL Table (Priority 1).
> > +             *
> > +             * By default, traffic is allowed.  This is partially
> handled by
> > +             * the Priority 0 ACL flows added earlier, but we also need
> to
> > +             * commit IP flows.  This is because, while the initiater's
> > +             * direction may not have any stateful rules, the server's
> may
> > +             * and then its return traffic would not have an associated
> > +             * conntrack entry and would return "+invalid".
> > +             *
> > +             * We use "ct_commit" for a connection that is not already
> known
> > +             * by the connection tracker.  Once a connection is
> committed,
> > +             * subsequent packets will hit the flow at priority 0 that
> just
> > +             * uses "next;"
> > +             *
> > +             * We also check for established connections that have
> ct_label.blocked
> > +             * set on them.  That's a connection that was disallowed,
> but is
> > +             * now allowed by policy again since it hit this
> default-allow flow.
> > +             * We need to set ct_label.blocked=0 to let the connection
> continue,
> > +             * which will be done by ct_commit() in the "stateful" stage.
> > +             * Subsequent packets will hit the flow at priority 0 that
> just
> > +             * uses "next;". */
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_IN_ACL(),
> > +                 .priority         = 1,
> > +                 .__match          = "ip && (!ct.est || (ct.est &&
> ct_label.blocked == 1))",
> > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> next;",
> > +                 .external_ids     = map_empty());
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_OUT_ACL(),
> > +                 .priority         = 1,
> > +                 .__match          = "ip && (!ct.est || (ct.est &&
> ct_label.blocked == 1))",
> > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> next;",
> > +                 .external_ids     = map_empty());
> >
> > -        /* Ingress and Egress ACL Table (Priority 65535).
> > -         *
> > -         * Not to do conntrack on ND packets. */
> > +            /* Ingress and Egress ACL Table (Priority 65535).
> > +             *
> > +             * Always drop traffic that's in an invalid state.  Also drop
> > +             * reply direction packets for connections that have been
> marked
> > +             * for deletion (bit 0 of ct_label is set).
> > +             *
> > +             * This is enforced at a higher priority than ACLs can be
> defined. */
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_IN_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl &&
> ct_label.blocked == 1)",
> > +                 .actions          = "drop;",
> > +                 .external_ids     = map_empty());
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_OUT_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl &&
> ct_label.blocked == 1)",
> > +                 .actions          = "drop;",
> > +                 .external_ids     = map_empty());
> > +
> > +            /* Ingress and Egress ACL Table (Priority 65535).
> > +             *
> > +             * Allow reply traffic that is part of an established
> > +             * conntrack entry that has not been marked for deletion
> > +             * (bit 0 of ct_label).  We only match traffic in the
> > +             * reply direction because we want traffic in the request
> > +             * direction to hit the currently defined policy from ACLs.
> > +             *
> > +             * This is enforced at a higher priority than ACLs can be
> defined. */
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_IN_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = "ct.est && !ct.rel && !ct.new " ++
> and_not_ct_inv ++
> > +                                     "&& ct.rpl && ct_label.blocked ==
> 0",
> > +                 .actions          = "next;",
> > +                 .external_ids     = map_empty());
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_OUT_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = "ct.est && !ct.rel && !ct.new " ++
> and_not_ct_inv ++
> > +                                     "&& ct.rpl && ct_label.blocked ==
> 0",
> > +                 .actions          = "next;",
> > +                 .external_ids     = map_empty());
> > +
> > +            /* Ingress and Egress ACL Table (Priority 65535).
> > +             *
> > +             * Allow traffic that is related to an existing conntrack
> entry that
> > +             * has not been marked for deletion (bit 0 of ct_label).
> > +             *
> > +             * This is enforced at a higher priority than ACLs can be
> defined.
> > +             *
> > +             * NOTE: This does not support related data sessions (eg,
> > +             * a dynamically negotiated FTP data channel), but will allow
> > +             * related traffic such as an ICMP Port Unreachable through
> > +             * that's generated from a non-listening UDP port.  */
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_IN_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = "!ct.est && ct.rel && !ct.new " ++
> and_not_ct_inv ++
> > +                                     "&& ct_label.blocked == 0",
> > +                 .actions          = "next;",
> > +                 .external_ids     = map_empty());
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_OUT_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = "!ct.est && ct.rel && !ct.new " ++
> and_not_ct_inv ++
> > +                                     "&& ct_label.blocked == 0",
> > +                 .actions          = "next;",
> > +                 .external_ids     = map_empty());
> > +
> > +            /* Ingress and Egress ACL Table (Priority 65535).
> > +             *
> > +             * Not to do conntrack on ND packets. */
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_IN_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> mldv2",
> > +                 .actions          = "next;",
> > +                 .external_ids     = map_empty());
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_OUT_ACL(),
> > +                 .priority         = 65535,
> > +                 .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> mldv2",
> > +                 .actions          = "next;",
> > +                 .external_ids     = map_empty())
> > +        };
> > +
> > +        /* Add a 34000 priority flow to advance the DNS reply from
> ovn-controller,
> > +         * if the CMS has configured DNS records for the datapath.
> > +         */
> > +        if (sw.has_dns_records) {
> > +            Flow(.logical_datapath = ls._uuid,
> > +                 .stage            = s_SWITCH_OUT_ACL(),
> > +                 .priority         = 34000,
> > +                 .__match          = "udp.src == 53",
> > +                 .actions          = if has_stateful "ct_commit; next;"
> else "next;",
> > +                 .external_ids     = map_empty())
> > +        };
> > +
> > +        /* Add a 34000 priority flow to advance the service monitor reply
> > +         * packets to skip applying ingress ACLs. */
> >          Flow(.logical_datapath = ls._uuid,
> >               .stage            = s_SWITCH_IN_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> mldv2",
> > +             .priority         = 34000,
> > +             .__match          = "eth.dst == $svc_monitor_mac",
> >               .actions          = "next;",
> >               .external_ids     = map_empty());
> > -        Flow(.logical_datapath = ls._uuid,
> > -             .stage            = s_SWITCH_OUT_ACL(),
> > -             .priority         = 65535,
> > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> mldv2",
> > -             .actions          = "next;",
> > -             .external_ids     = map_empty())
> > -    };
> > -
> > -    /* Add a 34000 priority flow to advance the DNS reply from
> ovn-controller,
> > -     * if the CMS has configured DNS records for the datapath.
> > -     */
> > -    if (sw.has_dns_records) {
> >          Flow(.logical_datapath = ls._uuid,
> >               .stage            = s_SWITCH_OUT_ACL(),
> >               .priority         = 34000,
> > -             .__match          = "udp.src == 53",
> > -             .actions          = if has_stateful "ct_commit; next;" else
> "next;",
> > +             .__match          = "eth.src == $svc_monitor_mac",
> > +             .actions          = "next;",
> >               .external_ids     = map_empty())
> > -    };
> > -
> > -    /* Add a 34000 priority flow to advance the service monitor reply
> > -     * packets to skip applying ingress ACLs. */
> > -    Flow(.logical_datapath = ls._uuid,
> > -         .stage            = s_SWITCH_IN_ACL(),
> > -         .priority         = 34000,
> > -         .__match          = "eth.dst == $svc_monitor_mac",
> > -         .actions          = "next;",
> > -         .external_ids     = map_empty());
> > -    Flow(.logical_datapath = ls._uuid,
> > -         .stage            = s_SWITCH_OUT_ACL(),
> > -         .priority         = 34000,
> > -         .__match          = "eth.src == $svc_monitor_mac",
> > -         .actions          = "next;",
> > -         .external_ids     = map_empty())
> > +    }
> >  }
> >
> >  /* This stage builds hints for the IN/OUT_ACL stage. Based on various
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index feb38b5d31..e337be4eb3 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -240,6 +240,21 @@
> >          </p>
> >        </column>
> >
> > +      <column name="options" key="use_ct_inv_match">
> > +        <p>
> > +          If set to false, <code>ovn-northd</code> will not use the
> > +          <code>ct.inv</code> field in any of the logical flow matches.
> > +          The default value is true.  If the NIC supports offloading
> > +          OVS datapath flows but doesn't support offloading ct_state
> > +          <code>inv</code> flag, then the datapath flows matching on
> this flag
> > +          (either <code>+inv</code> or <code>-inv</code>) will not be
> > +          offloaded.  CMS should consider setting
> <code>use_ct_inv_match</code>
> > +          to <code>false</code> in such cases.  This results in a side
> effect
> > +          of the invalid packets getting delivered to the destination
> VIF, which
> > +          otherwise would have been dropped by <code>OVN</code>.
> > +        </p>
> > +      </column>
> > +
> >        <group title="Options for configuring interconnection route
> advertisement">
> >          <p>
> >            These options control how routes are advertised between OVN
> > --
> > 2.29.2
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
> Acked-by: Han Zhou <hzhou@ovn.org>

Thanks for the reviews Mark G and Han.

I applied both the patches to the main branch addressing Han's comment in p1.
When I submitted this patch, I had a test case which somehow got removed
in the later versions.  I added the test case before applying.

Regards
Numan

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou May 6, 2021, 11 p.m. UTC | #4
On Thu, May 6, 2021 at 11:41 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Thu, May 6, 2021 at 2:35 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> > On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
> > >
> > > From: Numan Siddique <numans@ovn.org>
> > >
> > > Presently we add 65535 priority lflows in the stages -
> > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> > > match on 'ct.inv'.
> > >
> > > This patch provides an option to not use this field in the
> > > logical flow matches for the following
> > > reasons:
> > >
> > >  • Some of the smart NICs which support offloading datapath
> > >    flows may not support this field.  In such cases, almost
> > >    all the datapath flows (matching on the ct_state field +inv
> > >    or -inv cannot be offloaded if ACLs/load balancers
> > >    are configured on the logical switch datapath.
> > >
> > >  • A recent commit in kernel ovs datapath sets the committed
> > >    connection tracking entry to be liberal for out-of-window
> > >    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> > >    packets will not be marked as invalid.
> > >
> > > There are still certain scenarios where a packet can be marked
> > > invalid.  Like
> > >  • If the tcp flags are not correct.
> > >
> > >  • If there are checksum errors.
> > >
> > > In such cases, the packet will be delivered to the destination
> > > port.  So CMS should evaluate their usecases before enabling
> > > this option.
> > >
> > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > > Co-authored-by: Ben Pfaff <blp@ovn.org>
> > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > ---
> > > v3 -> v4
> > > ---
> > >   * Moved the static variable 'use_ct_inv_match' declaration up in the
> > >     file.
> > >
> > > v2 -> v3
> > > ----
> > >   * Updated the documentation in ovn-nb.xml as per Mark G's
suggestion.
> > >
> > > v1 -> v2
> > > ----
> > >   * Adopted the code changes shared by Ben Pfaff for the ddlog code.
> > >
> > >  NEWS                 |   7 +-
> > >  northd/lswitch.dl    |   7 +
> > >  northd/ovn-northd.c  |  42 +++---
> > >  northd/ovn_northd.dl | 304
++++++++++++++++++++++---------------------
> > >  ovn-nb.xml           |  15 +++
> > >  5 files changed, 208 insertions(+), 167 deletions(-)
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 1ddde15f86..5f1365775d 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -6,11 +6,16 @@ Post-v21.03.0
> > >      expected to scale better than the C implementation, for large
> > deployments.
> > >      (This may take testing and tuning to be effective.)  This
version of
> > OVN
> > >      requires DDLog 0.36.
> > > -  - Introduce ovn-controller incremetal processing engine statistics
> > > +  - Introduce ovn-controller incremental processing engine statistics
> > >    - Introduced parallel processing in ovn-northd with the NB_Global
> > config option
> > >      'use_parallel_build' to enable it.  It is disabled by default.
> > >    - Support vlan-passthru mode for tag=0 localnet ports.
> > >    - Support custom 802.11ad EthType for localnet ports.
> > > +  - Add a new NB Global option - use_ct_inv_match with the default
value
> > of true.
> > > +    If this is set to false, then the logical field - "ct.inv" will
not
> > be
> > > +    used in the logical flow matches.  CMS can consider setting this
to
> > false,
> > > +    if they want to use smart NICs which don't support offloading
> > datapath flows
> > > +    with this field used.
> > >
> > >  OVN v21.03.0 - 12 Mar 2021
> > >  -------------------------
> > > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> > > index 47c497e0cf..27ac3cadc5 100644
> > > --- a/northd/lswitch.dl
> > > +++ b/northd/lswitch.dl
> > > @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
> > Map<string,string>,
> > >  relation SvcMonitorMac(mac: eth_addr)
> > >  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
> > >      nb::NB_Global(._uuid = uuid, .options = options).
> > > +
> > > +relation UseCtInvMatch[bool]
> > > +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
> > > +    nb::NB_Global(.options = options).
> > > +UseCtInvMatch[true] :-
> > > +    Unit(),
> > > +    not nb in nb::NB_Global().
> > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > index de1aa896d3..5f6c0a6922 100644
> > > --- a/northd/ovn-northd.c
> > > +++ b/northd/ovn-northd.c
> > > @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
> > >  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
> > >  static struct eth_addr svc_monitor_mac_ea;
> > >
> > > +/* If this option is 'true' northd will make use of ct.inv match
fields.
> > > + * Otherwise, it will avoid using it.  The default is true. */
> > > +static bool use_ct_inv_match = true;
> > > +
> > >  /* Default probe interval for NB and SB DB connections. */
> > >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > >  static int northd_probe_interval_nb = 0;
> > > @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool
> > shared,
> > >      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> > >  }
> > >
> > > -
> > >  /* Adds a row with the specified contents to the Logical_Flow table.
*/
> > >  static void
> > >  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
> > > @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct
hmap
> > *lflows,
> > >           * for deletion (bit 0 of ct_label is set).
> > >           *
> > >           * This is enforced at a higher priority than ACLs can be
> > defined. */
> > > +        char *match = xasprintf("%s(ct.est && ct.rpl &&
ct_label.blocked
> > == 1)",
> > > +                                use_ct_inv_match ? "ct.inv || " :
"");
> > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > > -                      "ct.inv || (ct.est && ct.rpl &&
ct_label.blocked
> > == 1)",
> > > -                      "drop;");
> > > +                      match, "drop;");
> > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > > -                      "ct.inv || (ct.est && ct.rpl &&
ct_label.blocked
> > == 1)",
> > > -                      "drop;");
> > > +                      match, "drop;");
> > > +        free(match);
> > >
> > >          /* Ingress and Egress ACL Table (Priority 65535).
> > >           *
> > > @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct
hmap
> > *lflows,
> > >           * direction to hit the currently defined policy from ACLs.
> > >           *
> > >           * This is enforced at a higher priority than ACLs can be
> > defined. */
> > > +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
> > > +                          "ct.rpl && ct_label.blocked == 0",
> > > +                          use_ct_inv_match ? " && !ct.inv" : "");
> > > +
> > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> > > -                      "&& ct.rpl && ct_label.blocked == 0",
> > > -                      "next;");
> > > +                      match, "next;");
> > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> > > -                      "&& ct.rpl && ct_label.blocked == 0",
> > > -                      "next;");
> > > +                      match, "next;");
> > > +        free(match);
> > >
> > >          /* Ingress and Egress ACL Table (Priority 65535).
> > >           *
> > > @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct
hmap
> > *lflows,
> > >           * a dynamically negotiated FTP data channel), but will allow
> > >           * related traffic such as an ICMP Port Unreachable through
> > >           * that's generated from a non-listening UDP port.  */
> > > +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
> > > +                          "ct_label.blocked == 0",
> > > +                          use_ct_inv_match ? " && !ct.inv" : "");
> > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> > > -                      "&& ct_label.blocked == 0",
> > > -                      "next;");
> > > +                      match, "next;");
> > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> > > -                      "&& ct_label.blocked == 0",
> > > -                      "next;");
> > > +                      match, "next;");
> > > +        free(match);
> > >
> > >          /* Ingress and Egress ACL Table (Priority 65535).
> > >           *
> > > @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
> > >
> > >      use_logical_dp_groups = smap_get_bool(&nb->options,
> > >                                            "use_logical_dp_groups",
> > false);
> > > +    use_ct_inv_match = smap_get_bool(&nb->options,
> > > +                                     "use_ct_inv_match", true);
> > > +
> > >      /* deprecated, use --event instead */
> > >      controller_event_en = smap_get_bool(&nb->options,
> > >                                          "controller_event", false);
> > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> > > index 2f5d36c599..a0f7944103 100644
> > > --- a/northd/ovn_northd.dl
> > > +++ b/northd/ovn_northd.dl
> > > @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl,
> > fair_meter, extra_match_, extra_action
> > >  }
> > >
> > >  /* build_acls */
> > > -for (sw in &Switch(.ls = ls))
> > > -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> > > -{
> > > -    /* Ingress and Egress ACL Table (Priority 0): Packets are
allowed by
> > > -     * default.  A related rule at priority 1 is added below if there
> > > -     * are any stateful ACLs in this datapath. */
> > > -    Flow(.logical_datapath = ls._uuid,
> > > -         .stage            = s_SWITCH_IN_ACL(),
> > > -         .priority         = 0,
> > > -         .__match          = "1",
> > > -         .actions          = "next;",
> > > -         .external_ids     = map_empty());
> > > -    Flow(.logical_datapath = ls._uuid,
> > > -         .stage            = s_SWITCH_OUT_ACL(),
> > > -         .priority         = 0,
> > > -         .__match          = "1",
> > > -         .actions          = "next;",
> > > -         .external_ids     = map_empty());
> > > -
> > > -    if (has_stateful) {
> > > -        /* Ingress and Egress ACL Table (Priority 1).
> > > -         *
> > > -         * By default, traffic is allowed.  This is partially
handled by
> > > -         * the Priority 0 ACL flows added earlier, but we also need
to
> > > -         * commit IP flows.  This is because, while the initiater's
> > > -         * direction may not have any stateful rules, the server's
may
> > > -         * and then its return traffic would not have an associated
> > > -         * conntrack entry and would return "+invalid".
> > > -         *
> > > -         * We use "ct_commit" for a connection that is not already
known
> > > -         * by the connection tracker.  Once a connection is
committed,
> > > -         * subsequent packets will hit the flow at priority 0 that
just
> > > -         * uses "next;"
> > > -         *
> > > -         * We also check for established connections that have
> > ct_label.blocked
> > > -         * set on them.  That's a connection that was disallowed,
but is
> > > -         * now allowed by policy again since it hit this
default-allow
> > flow.
> > > -         * We need to set ct_label.blocked=0 to let the connection
> > continue,
> > > -         * which will be done by ct_commit() in the "stateful" stage.
> > > -         * Subsequent packets will hit the flow at priority 0 that
just
> > > -         * uses "next;". */
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_IN_ACL(),
> > > -             .priority         = 1,
> > > -             .__match          = "ip && (!ct.est || (ct.est &&
> > ct_label.blocked == 1))",
> > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> > next;",
> > > -             .external_ids     = map_empty());
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > -             .priority         = 1,
> > > -             .__match          = "ip && (!ct.est || (ct.est &&
> > ct_label.blocked == 1))",
> > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> > next;",
> > > -             .external_ids     = map_empty());
> > > -
> > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > -         *
> > > -         * Always drop traffic that's in an invalid state.  Also drop
> > > -         * reply direction packets for connections that have been
marked
> > > -         * for deletion (bit 0 of ct_label is set).
> > > -         *
> > > -         * This is enforced at a higher priority than ACLs can be
> > defined. */
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_IN_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> > ct_label.blocked == 1)",
> > > -             .actions          = "drop;",
> > > -             .external_ids     = map_empty());
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> > ct_label.blocked == 1)",
> > > -             .actions          = "drop;",
> > > -             .external_ids     = map_empty());
> > > -
> > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > -         *
> > > -         * Allow reply traffic that is part of an established
> > > -         * conntrack entry that has not been marked for deletion
> > > -         * (bit 0 of ct_label).  We only match traffic in the
> > > -         * reply direction because we want traffic in the request
> > > -         * direction to hit the currently defined policy from ACLs.
> > > -         *
> > > -         * This is enforced at a higher priority than ACLs can be
> > defined. */
> > > +for (UseCtInvMatch[use_ct_inv_match]) {
> > > +    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
> > > +        true -> ("ct.inv || ", "&& !ct.inv "),
> > > +        false -> ("", ""),
> > > +    } in
> > > +    for (sw in &Switch(.ls = ls))
> > > +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> > > +    {
> > > +        /* Ingress and Egress ACL Table (Priority 0): Packets are
> > allowed by
> > > +         * default.  A related rule at priority 1 is added below if
there
> > > +         * are any stateful ACLs in this datapath. */
> > >          Flow(.logical_datapath = ls._uuid,
> > >               .stage            = s_SWITCH_IN_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> > !ct.inv "
> > > -                                 "&& ct.rpl && ct_label.blocked ==
0",
> > > +             .priority         = 0,
> > > +             .__match          = "1",
> > >               .actions          = "next;",
> > >               .external_ids     = map_empty());
> > >          Flow(.logical_datapath = ls._uuid,
> > >               .stage            = s_SWITCH_OUT_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> > !ct.inv "
> > > -                                 "&& ct.rpl && ct_label.blocked ==
0",
> > > +             .priority         = 0,
> > > +             .__match          = "1",
> > >               .actions          = "next;",
> > >               .external_ids     = map_empty());
> > >
> > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > -         *
> > > -         * Allow traffic that is related to an existing conntrack
entry
> > that
> > > -         * has not been marked for deletion (bit 0 of ct_label).
> > > -         *
> > > -         * This is enforced at a higher priority than ACLs can be
> > defined.
> > > -         *
> > > -         * NOTE: This does not support related data sessions (eg,
> > > -         * a dynamically negotiated FTP data channel), but will allow
> > > -         * related traffic such as an ICMP Port Unreachable through
> > > -         * that's generated from a non-listening UDP port.  */
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_IN_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> > !ct.inv "
> > > -                                 "&& ct_label.blocked == 0",
> > > -             .actions          = "next;",
> > > -             .external_ids     = map_empty());
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> > !ct.inv "
> > > -                                 "&& ct_label.blocked == 0",
> > > -             .actions          = "next;",
> > > -             .external_ids     = map_empty());
> > > +        if (has_stateful) {
> > > +            /* Ingress and Egress ACL Table (Priority 1).
> > > +             *
> > > +             * By default, traffic is allowed.  This is partially
> > handled by
> > > +             * the Priority 0 ACL flows added earlier, but we also
need
> > to
> > > +             * commit IP flows.  This is because, while the
initiater's
> > > +             * direction may not have any stateful rules, the
server's
> > may
> > > +             * and then its return traffic would not have an
associated
> > > +             * conntrack entry and would return "+invalid".
> > > +             *
> > > +             * We use "ct_commit" for a connection that is not
already
> > known
> > > +             * by the connection tracker.  Once a connection is
> > committed,
> > > +             * subsequent packets will hit the flow at priority 0
that
> > just
> > > +             * uses "next;"
> > > +             *
> > > +             * We also check for established connections that have
> > ct_label.blocked
> > > +             * set on them.  That's a connection that was disallowed,
> > but is
> > > +             * now allowed by policy again since it hit this
> > default-allow flow.
> > > +             * We need to set ct_label.blocked=0 to let the
connection
> > continue,
> > > +             * which will be done by ct_commit() in the "stateful"
stage.
> > > +             * Subsequent packets will hit the flow at priority 0
that
> > just
> > > +             * uses "next;". */
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > +                 .priority         = 1,
> > > +                 .__match          = "ip && (!ct.est || (ct.est &&
> > ct_label.blocked == 1))",
> > > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} =
1;
> > next;",
> > > +                 .external_ids     = map_empty());
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > +                 .priority         = 1,
> > > +                 .__match          = "ip && (!ct.est || (ct.est &&
> > ct_label.blocked == 1))",
> > > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} =
1;
> > next;",
> > > +                 .external_ids     = map_empty());
> > >
> > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > -         *
> > > -         * Not to do conntrack on ND packets. */
> > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > +             *
> > > +             * Always drop traffic that's in an invalid state.  Also
drop
> > > +             * reply direction packets for connections that have been
> > marked
> > > +             * for deletion (bit 0 of ct_label is set).
> > > +             *
> > > +             * This is enforced at a higher priority than ACLs can be
> > defined. */
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl
&&
> > ct_label.blocked == 1)",
> > > +                 .actions          = "drop;",
> > > +                 .external_ids     = map_empty());
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl
&&
> > ct_label.blocked == 1)",
> > > +                 .actions          = "drop;",
> > > +                 .external_ids     = map_empty());
> > > +
> > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > +             *
> > > +             * Allow reply traffic that is part of an established
> > > +             * conntrack entry that has not been marked for deletion
> > > +             * (bit 0 of ct_label).  We only match traffic in the
> > > +             * reply direction because we want traffic in the request
> > > +             * direction to hit the currently defined policy from
ACLs.
> > > +             *
> > > +             * This is enforced at a higher priority than ACLs can be
> > defined. */
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = "ct.est && !ct.rel && !ct.new "
++
> > and_not_ct_inv ++
> > > +                                     "&& ct.rpl && ct_label.blocked
==
> > 0",
> > > +                 .actions          = "next;",
> > > +                 .external_ids     = map_empty());
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = "ct.est && !ct.rel && !ct.new "
++
> > and_not_ct_inv ++
> > > +                                     "&& ct.rpl && ct_label.blocked
==
> > 0",
> > > +                 .actions          = "next;",
> > > +                 .external_ids     = map_empty());
> > > +
> > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > +             *
> > > +             * Allow traffic that is related to an existing conntrack
> > entry that
> > > +             * has not been marked for deletion (bit 0 of ct_label).
> > > +             *
> > > +             * This is enforced at a higher priority than ACLs can be
> > defined.
> > > +             *
> > > +             * NOTE: This does not support related data sessions (eg,
> > > +             * a dynamically negotiated FTP data channel), but will
allow
> > > +             * related traffic such as an ICMP Port Unreachable
through
> > > +             * that's generated from a non-listening UDP port.  */
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = "!ct.est && ct.rel && !ct.new "
++
> > and_not_ct_inv ++
> > > +                                     "&& ct_label.blocked == 0",
> > > +                 .actions          = "next;",
> > > +                 .external_ids     = map_empty());
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = "!ct.est && ct.rel && !ct.new "
++
> > and_not_ct_inv ++
> > > +                                     "&& ct_label.blocked == 0",
> > > +                 .actions          = "next;",
> > > +                 .external_ids     = map_empty());
> > > +
> > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > +             *
> > > +             * Not to do conntrack on ND packets. */
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = "nd || nd_ra || nd_rs || mldv1
||
> > mldv2",
> > > +                 .actions          = "next;",
> > > +                 .external_ids     = map_empty());
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > +                 .priority         = 65535,
> > > +                 .__match          = "nd || nd_ra || nd_rs || mldv1
||
> > mldv2",
> > > +                 .actions          = "next;",
> > > +                 .external_ids     = map_empty())
> > > +        };
> > > +
> > > +        /* Add a 34000 priority flow to advance the DNS reply from
> > ovn-controller,
> > > +         * if the CMS has configured DNS records for the datapath.
> > > +         */
> > > +        if (sw.has_dns_records) {
> > > +            Flow(.logical_datapath = ls._uuid,
> > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > +                 .priority         = 34000,
> > > +                 .__match          = "udp.src == 53",
> > > +                 .actions          = if has_stateful "ct_commit;
next;"
> > else "next;",
> > > +                 .external_ids     = map_empty())
> > > +        };
> > > +
> > > +        /* Add a 34000 priority flow to advance the service monitor
reply
> > > +         * packets to skip applying ingress ACLs. */
> > >          Flow(.logical_datapath = ls._uuid,
> > >               .stage            = s_SWITCH_IN_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> > mldv2",
> > > +             .priority         = 34000,
> > > +             .__match          = "eth.dst == $svc_monitor_mac",
> > >               .actions          = "next;",
> > >               .external_ids     = map_empty());
> > > -        Flow(.logical_datapath = ls._uuid,
> > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > -             .priority         = 65535,
> > > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> > mldv2",
> > > -             .actions          = "next;",
> > > -             .external_ids     = map_empty())
> > > -    };
> > > -
> > > -    /* Add a 34000 priority flow to advance the DNS reply from
> > ovn-controller,
> > > -     * if the CMS has configured DNS records for the datapath.
> > > -     */
> > > -    if (sw.has_dns_records) {
> > >          Flow(.logical_datapath = ls._uuid,
> > >               .stage            = s_SWITCH_OUT_ACL(),
> > >               .priority         = 34000,
> > > -             .__match          = "udp.src == 53",
> > > -             .actions          = if has_stateful "ct_commit; next;"
else
> > "next;",
> > > +             .__match          = "eth.src == $svc_monitor_mac",
> > > +             .actions          = "next;",
> > >               .external_ids     = map_empty())
> > > -    };
> > > -
> > > -    /* Add a 34000 priority flow to advance the service monitor reply
> > > -     * packets to skip applying ingress ACLs. */
> > > -    Flow(.logical_datapath = ls._uuid,
> > > -         .stage            = s_SWITCH_IN_ACL(),
> > > -         .priority         = 34000,
> > > -         .__match          = "eth.dst == $svc_monitor_mac",
> > > -         .actions          = "next;",
> > > -         .external_ids     = map_empty());
> > > -    Flow(.logical_datapath = ls._uuid,
> > > -         .stage            = s_SWITCH_OUT_ACL(),
> > > -         .priority         = 34000,
> > > -         .__match          = "eth.src == $svc_monitor_mac",
> > > -         .actions          = "next;",
> > > -         .external_ids     = map_empty())
> > > +    }
> > >  }
> > >
> > >  /* This stage builds hints for the IN/OUT_ACL stage. Based on various
> > > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > > index feb38b5d31..e337be4eb3 100644
> > > --- a/ovn-nb.xml
> > > +++ b/ovn-nb.xml
> > > @@ -240,6 +240,21 @@
> > >          </p>
> > >        </column>
> > >
> > > +      <column name="options" key="use_ct_inv_match">
> > > +        <p>
> > > +          If set to false, <code>ovn-northd</code> will not use the
> > > +          <code>ct.inv</code> field in any of the logical flow
matches.
> > > +          The default value is true.  If the NIC supports offloading
> > > +          OVS datapath flows but doesn't support offloading ct_state
> > > +          <code>inv</code> flag, then the datapath flows matching on
> > this flag
> > > +          (either <code>+inv</code> or <code>-inv</code>) will not be
> > > +          offloaded.  CMS should consider setting
> > <code>use_ct_inv_match</code>
> > > +          to <code>false</code> in such cases.  This results in a
side
> > effect
> > > +          of the invalid packets getting delivered to the destination
> > VIF, which
> > > +          otherwise would have been dropped by <code>OVN</code>.
> > > +        </p>
> > > +      </column>
> > > +
> > >        <group title="Options for configuring interconnection route
> > advertisement">
> > >          <p>
> > >            These options control how routes are advertised between OVN
> > > --
> > > 2.29.2
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> > Acked-by: Han Zhou <hzhou@ovn.org>
>
> Thanks for the reviews Mark G and Han.
>
> I applied both the patches to the main branch addressing Han's comment in
p1.
> When I submitted this patch, I had a test case which somehow got removed
> in the later versions.  I added the test case before applying.
>
> Regards
> Numan

Hi Numan,

Seems this merge breaks the CI. All these vtep related cases are failing:
898: ovn-controller-vtep - chassis FAILED (ovn-controller-vtep.at:110)
899: ovn-controller-vtep - binding 1 FAILED (ovn-controller-vtep.at:178)
900: ovn-controller-vtep - binding 2 FAILED (ovn-controller-vtep.at:252)
901: ovn-controller-vtep - vtep-lswitch FAILED (ovn-controller-vtep.at:291)
902: ovn-controller-vtep - vtep-macs 1 FAILED (ovn-controller-vtep.at:343)
903: ovn-controller-vtep - vtep-macs 2 FAILED (ovn-controller-vtep.at:431)

All these tests are successful when I run them locally. I haven't checked
more details yet.

Thanks,
Han
Numan Siddique May 7, 2021, 1:12 a.m. UTC | #5
On Thu, May 6, 2021, 7:00 PM Han Zhou <hzhou@ovn.org> wrote:

> On Thu, May 6, 2021 at 11:41 AM Numan Siddique <numans@ovn.org> wrote:
> >
> > On Thu, May 6, 2021 at 2:35 AM Han Zhou <hzhou@ovn.org> wrote:
> > >
> > > On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
> > > >
> > > > From: Numan Siddique <numans@ovn.org>
> > > >
> > > > Presently we add 65535 priority lflows in the stages -
> > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> > > > match on 'ct.inv'.
> > > >
> > > > This patch provides an option to not use this field in the
> > > > logical flow matches for the following
> > > > reasons:
> > > >
> > > >  • Some of the smart NICs which support offloading datapath
> > > >    flows may not support this field.  In such cases, almost
> > > >    all the datapath flows (matching on the ct_state field +inv
> > > >    or -inv cannot be offloaded if ACLs/load balancers
> > > >    are configured on the logical switch datapath.
> > > >
> > > >  • A recent commit in kernel ovs datapath sets the committed
> > > >    connection tracking entry to be liberal for out-of-window
> > > >    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> > > >    packets will not be marked as invalid.
> > > >
> > > > There are still certain scenarios where a packet can be marked
> > > > invalid.  Like
> > > >  • If the tcp flags are not correct.
> > > >
> > > >  • If there are checksum errors.
> > > >
> > > > In such cases, the packet will be delivered to the destination
> > > > port.  So CMS should evaluate their usecases before enabling
> > > > this option.
> > > >
> > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > > > Co-authored-by: Ben Pfaff <blp@ovn.org>
> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > > ---
> > > > v3 -> v4
> > > > ---
> > > >   * Moved the static variable 'use_ct_inv_match' declaration up in
> the
> > > >     file.
> > > >
> > > > v2 -> v3
> > > > ----
> > > >   * Updated the documentation in ovn-nb.xml as per Mark G's
> suggestion.
> > > >
> > > > v1 -> v2
> > > > ----
> > > >   * Adopted the code changes shared by Ben Pfaff for the ddlog code.
> > > >
> > > >  NEWS                 |   7 +-
> > > >  northd/lswitch.dl    |   7 +
> > > >  northd/ovn-northd.c  |  42 +++---
> > > >  northd/ovn_northd.dl | 304
> ++++++++++++++++++++++---------------------
> > > >  ovn-nb.xml           |  15 +++
> > > >  5 files changed, 208 insertions(+), 167 deletions(-)
> > > >
> > > > diff --git a/NEWS b/NEWS
> > > > index 1ddde15f86..5f1365775d 100644
> > > > --- a/NEWS
> > > > +++ b/NEWS
> > > > @@ -6,11 +6,16 @@ Post-v21.03.0
> > > >      expected to scale better than the C implementation, for large
> > > deployments.
> > > >      (This may take testing and tuning to be effective.)  This
> version of
> > > OVN
> > > >      requires DDLog 0.36.
> > > > -  - Introduce ovn-controller incremetal processing engine statistics
> > > > +  - Introduce ovn-controller incremental processing engine
> statistics
> > > >    - Introduced parallel processing in ovn-northd with the NB_Global
> > > config option
> > > >      'use_parallel_build' to enable it.  It is disabled by default.
> > > >    - Support vlan-passthru mode for tag=0 localnet ports.
> > > >    - Support custom 802.11ad EthType for localnet ports.
> > > > +  - Add a new NB Global option - use_ct_inv_match with the default
> value
> > > of true.
> > > > +    If this is set to false, then the logical field - "ct.inv" will
> not
> > > be
> > > > +    used in the logical flow matches.  CMS can consider setting this
> to
> > > false,
> > > > +    if they want to use smart NICs which don't support offloading
> > > datapath flows
> > > > +    with this field used.
> > > >
> > > >  OVN v21.03.0 - 12 Mar 2021
> > > >  -------------------------
> > > > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> > > > index 47c497e0cf..27ac3cadc5 100644
> > > > --- a/northd/lswitch.dl
> > > > +++ b/northd/lswitch.dl
> > > > @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
> > > Map<string,string>,
> > > >  relation SvcMonitorMac(mac: eth_addr)
> > > >  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
> > > >      nb::NB_Global(._uuid = uuid, .options = options).
> > > > +
> > > > +relation UseCtInvMatch[bool]
> > > > +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
> > > > +    nb::NB_Global(.options = options).
> > > > +UseCtInvMatch[true] :-
> > > > +    Unit(),
> > > > +    not nb in nb::NB_Global().
> > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > > index de1aa896d3..5f6c0a6922 100644
> > > > --- a/northd/ovn-northd.c
> > > > +++ b/northd/ovn-northd.c
> > > > @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
> > > >  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
> > > >  static struct eth_addr svc_monitor_mac_ea;
> > > >
> > > > +/* If this option is 'true' northd will make use of ct.inv match
> fields.
> > > > + * Otherwise, it will avoid using it.  The default is true. */
> > > > +static bool use_ct_inv_match = true;
> > > > +
> > > >  /* Default probe interval for NB and SB DB connections. */
> > > >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > > >  static int northd_probe_interval_nb = 0;
> > > > @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool
> > > shared,
> > > >      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> > > >  }
> > > >
> > > > -
> > > >  /* Adds a row with the specified contents to the Logical_Flow table.
> */
> > > >  static void
> > > >  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
> > > > @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct
> hmap
> > > *lflows,
> > > >           * for deletion (bit 0 of ct_label is set).
> > > >           *
> > > >           * This is enforced at a higher priority than ACLs can be
> > > defined. */
> > > > +        char *match = xasprintf("%s(ct.est && ct.rpl &&
> ct_label.blocked
> > > == 1)",
> > > > +                                use_ct_inv_match ? "ct.inv || " :
> "");
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > > > -                      "ct.inv || (ct.est && ct.rpl &&
> ct_label.blocked
> > > == 1)",
> > > > -                      "drop;");
> > > > +                      match, "drop;");
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > > > -                      "ct.inv || (ct.est && ct.rpl &&
> ct_label.blocked
> > > == 1)",
> > > > -                      "drop;");
> > > > +                      match, "drop;");
> > > > +        free(match);
> > > >
> > > >          /* Ingress and Egress ACL Table (Priority 65535).
> > > >           *
> > > > @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct
> hmap
> > > *lflows,
> > > >           * direction to hit the currently defined policy from ACLs.
> > > >           *
> > > >           * This is enforced at a higher priority than ACLs can be
> > > defined. */
> > > > +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
> > > > +                          "ct.rpl && ct_label.blocked == 0",
> > > > +                          use_ct_inv_match ? " && !ct.inv" : "");
> > > > +
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
> > > > -                      "next;");
> > > > +                      match, "next;");
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
> > > > -                      "next;");
> > > > +                      match, "next;");
> > > > +        free(match);
> > > >
> > > >          /* Ingress and Egress ACL Table (Priority 65535).
> > > >           *
> > > > @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct
> hmap
> > > *lflows,
> > > >           * a dynamically negotiated FTP data channel), but will
> allow
> > > >           * related traffic such as an ICMP Port Unreachable through
> > > >           * that's generated from a non-listening UDP port.  */
> > > > +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
> > > > +                          "ct_label.blocked == 0",
> > > > +                          use_ct_inv_match ? " && !ct.inv" : "");
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> > > > -                      "&& ct_label.blocked == 0",
> > > > -                      "next;");
> > > > +                      match, "next;");
> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> > > > -                      "&& ct_label.blocked == 0",
> > > > -                      "next;");
> > > > +                      match, "next;");
> > > > +        free(match);
> > > >
> > > >          /* Ingress and Egress ACL Table (Priority 65535).
> > > >           *
> > > > @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
> > > >
> > > >      use_logical_dp_groups = smap_get_bool(&nb->options,
> > > >                                            "use_logical_dp_groups",
> > > false);
> > > > +    use_ct_inv_match = smap_get_bool(&nb->options,
> > > > +                                     "use_ct_inv_match", true);
> > > > +
> > > >      /* deprecated, use --event instead */
> > > >      controller_event_en = smap_get_bool(&nb->options,
> > > >                                          "controller_event", false);
> > > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> > > > index 2f5d36c599..a0f7944103 100644
> > > > --- a/northd/ovn_northd.dl
> > > > +++ b/northd/ovn_northd.dl
> > > > @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl,
> > > fair_meter, extra_match_, extra_action
> > > >  }
> > > >
> > > >  /* build_acls */
> > > > -for (sw in &Switch(.ls = ls))
> > > > -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> > > > -{
> > > > -    /* Ingress and Egress ACL Table (Priority 0): Packets are
> allowed by
> > > > -     * default.  A related rule at priority 1 is added below if
> there
> > > > -     * are any stateful ACLs in this datapath. */
> > > > -    Flow(.logical_datapath = ls._uuid,
> > > > -         .stage            = s_SWITCH_IN_ACL(),
> > > > -         .priority         = 0,
> > > > -         .__match          = "1",
> > > > -         .actions          = "next;",
> > > > -         .external_ids     = map_empty());
> > > > -    Flow(.logical_datapath = ls._uuid,
> > > > -         .stage            = s_SWITCH_OUT_ACL(),
> > > > -         .priority         = 0,
> > > > -         .__match          = "1",
> > > > -         .actions          = "next;",
> > > > -         .external_ids     = map_empty());
> > > > -
> > > > -    if (has_stateful) {
> > > > -        /* Ingress and Egress ACL Table (Priority 1).
> > > > -         *
> > > > -         * By default, traffic is allowed.  This is partially
> handled by
> > > > -         * the Priority 0 ACL flows added earlier, but we also need
> to
> > > > -         * commit IP flows.  This is because, while the initiater's
> > > > -         * direction may not have any stateful rules, the server's
> may
> > > > -         * and then its return traffic would not have an associated
> > > > -         * conntrack entry and would return "+invalid".
> > > > -         *
> > > > -         * We use "ct_commit" for a connection that is not already
> known
> > > > -         * by the connection tracker.  Once a connection is
> committed,
> > > > -         * subsequent packets will hit the flow at priority 0 that
> just
> > > > -         * uses "next;"
> > > > -         *
> > > > -         * We also check for established connections that have
> > > ct_label.blocked
> > > > -         * set on them.  That's a connection that was disallowed,
> but is
> > > > -         * now allowed by policy again since it hit this
> default-allow
> > > flow.
> > > > -         * We need to set ct_label.blocked=0 to let the connection
> > > continue,
> > > > -         * which will be done by ct_commit() in the "stateful"
> stage.
> > > > -         * Subsequent packets will hit the flow at priority 0 that
> just
> > > > -         * uses "next;". */
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_IN_ACL(),
> > > > -             .priority         = 1,
> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
> > > ct_label.blocked == 1))",
> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> > > next;",
> > > > -             .external_ids     = map_empty());
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > > -             .priority         = 1,
> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
> > > ct_label.blocked == 1))",
> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
> > > next;",
> > > > -             .external_ids     = map_empty());
> > > > -
> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > > -         *
> > > > -         * Always drop traffic that's in an invalid state.  Also
> drop
> > > > -         * reply direction packets for connections that have been
> marked
> > > > -         * for deletion (bit 0 of ct_label is set).
> > > > -         *
> > > > -         * This is enforced at a higher priority than ACLs can be
> > > defined. */
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_IN_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> > > ct_label.blocked == 1)",
> > > > -             .actions          = "drop;",
> > > > -             .external_ids     = map_empty());
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> > > ct_label.blocked == 1)",
> > > > -             .actions          = "drop;",
> > > > -             .external_ids     = map_empty());
> > > > -
> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > > -         *
> > > > -         * Allow reply traffic that is part of an established
> > > > -         * conntrack entry that has not been marked for deletion
> > > > -         * (bit 0 of ct_label).  We only match traffic in the
> > > > -         * reply direction because we want traffic in the request
> > > > -         * direction to hit the currently defined policy from ACLs.
> > > > -         *
> > > > -         * This is enforced at a higher priority than ACLs can be
> > > defined. */
> > > > +for (UseCtInvMatch[use_ct_inv_match]) {
> > > > +    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
> > > > +        true -> ("ct.inv || ", "&& !ct.inv "),
> > > > +        false -> ("", ""),
> > > > +    } in
> > > > +    for (sw in &Switch(.ls = ls))
> > > > +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> > > > +    {
> > > > +        /* Ingress and Egress ACL Table (Priority 0): Packets are
> > > allowed by
> > > > +         * default.  A related rule at priority 1 is added below if
> there
> > > > +         * are any stateful ACLs in this datapath. */
> > > >          Flow(.logical_datapath = ls._uuid,
> > > >               .stage            = s_SWITCH_IN_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> > > !ct.inv "
> > > > -                                 "&& ct.rpl && ct_label.blocked ==
> 0",
> > > > +             .priority         = 0,
> > > > +             .__match          = "1",
> > > >               .actions          = "next;",
> > > >               .external_ids     = map_empty());
> > > >          Flow(.logical_datapath = ls._uuid,
> > > >               .stage            = s_SWITCH_OUT_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> > > !ct.inv "
> > > > -                                 "&& ct.rpl && ct_label.blocked ==
> 0",
> > > > +             .priority         = 0,
> > > > +             .__match          = "1",
> > > >               .actions          = "next;",
> > > >               .external_ids     = map_empty());
> > > >
> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > > -         *
> > > > -         * Allow traffic that is related to an existing conntrack
> entry
> > > that
> > > > -         * has not been marked for deletion (bit 0 of ct_label).
> > > > -         *
> > > > -         * This is enforced at a higher priority than ACLs can be
> > > defined.
> > > > -         *
> > > > -         * NOTE: This does not support related data sessions (eg,
> > > > -         * a dynamically negotiated FTP data channel), but will
> allow
> > > > -         * related traffic such as an ICMP Port Unreachable through
> > > > -         * that's generated from a non-listening UDP port.  */
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_IN_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> > > !ct.inv "
> > > > -                                 "&& ct_label.blocked == 0",
> > > > -             .actions          = "next;",
> > > > -             .external_ids     = map_empty());
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> > > !ct.inv "
> > > > -                                 "&& ct_label.blocked == 0",
> > > > -             .actions          = "next;",
> > > > -             .external_ids     = map_empty());
> > > > +        if (has_stateful) {
> > > > +            /* Ingress and Egress ACL Table (Priority 1).
> > > > +             *
> > > > +             * By default, traffic is allowed.  This is partially
> > > handled by
> > > > +             * the Priority 0 ACL flows added earlier, but we also
> need
> > > to
> > > > +             * commit IP flows.  This is because, while the
> initiater's
> > > > +             * direction may not have any stateful rules, the
> server's
> > > may
> > > > +             * and then its return traffic would not have an
> associated
> > > > +             * conntrack entry and would return "+invalid".
> > > > +             *
> > > > +             * We use "ct_commit" for a connection that is not
> already
> > > known
> > > > +             * by the connection tracker.  Once a connection is
> > > committed,
> > > > +             * subsequent packets will hit the flow at priority 0
> that
> > > just
> > > > +             * uses "next;"
> > > > +             *
> > > > +             * We also check for established connections that have
> > > ct_label.blocked
> > > > +             * set on them.  That's a connection that was
> disallowed,
> > > but is
> > > > +             * now allowed by policy again since it hit this
> > > default-allow flow.
> > > > +             * We need to set ct_label.blocked=0 to let the
> connection
> > > continue,
> > > > +             * which will be done by ct_commit() in the "stateful"
> stage.
> > > > +             * Subsequent packets will hit the flow at priority 0
> that
> > > just
> > > > +             * uses "next;". */
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > > +                 .priority         = 1,
> > > > +                 .__match          = "ip && (!ct.est || (ct.est &&
> > > ct_label.blocked == 1))",
> > > > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} =
> 1;
> > > next;",
> > > > +                 .external_ids     = map_empty());
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > > +                 .priority         = 1,
> > > > +                 .__match          = "ip && (!ct.est || (ct.est &&
> > > ct_label.blocked == 1))",
> > > > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} =
> 1;
> > > next;",
> > > > +                 .external_ids     = map_empty());
> > > >
> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> > > > -         *
> > > > -         * Not to do conntrack on ND packets. */
> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > > +             *
> > > > +             * Always drop traffic that's in an invalid state.  Also
> drop
> > > > +             * reply direction packets for connections that have
> been
> > > marked
> > > > +             * for deletion (bit 0 of ct_label is set).
> > > > +             *
> > > > +             * This is enforced at a higher priority than ACLs can
> be
> > > defined. */
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl
> &&
> > > ct_label.blocked == 1)",
> > > > +                 .actions          = "drop;",
> > > > +                 .external_ids     = map_empty());
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl
> &&
> > > ct_label.blocked == 1)",
> > > > +                 .actions          = "drop;",
> > > > +                 .external_ids     = map_empty());
> > > > +
> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > > +             *
> > > > +             * Allow reply traffic that is part of an established
> > > > +             * conntrack entry that has not been marked for deletion
> > > > +             * (bit 0 of ct_label).  We only match traffic in the
> > > > +             * reply direction because we want traffic in the
> request
> > > > +             * direction to hit the currently defined policy from
> ACLs.
> > > > +             *
> > > > +             * This is enforced at a higher priority than ACLs can
> be
> > > defined. */
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = "ct.est && !ct.rel && !ct.new "
> ++
> > > and_not_ct_inv ++
> > > > +                                     "&& ct.rpl && ct_label.blocked
> ==
> > > 0",
> > > > +                 .actions          = "next;",
> > > > +                 .external_ids     = map_empty());
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = "ct.est && !ct.rel && !ct.new "
> ++
> > > and_not_ct_inv ++
> > > > +                                     "&& ct.rpl && ct_label.blocked
> ==
> > > 0",
> > > > +                 .actions          = "next;",
> > > > +                 .external_ids     = map_empty());
> > > > +
> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > > +             *
> > > > +             * Allow traffic that is related to an existing
> conntrack
> > > entry that
> > > > +             * has not been marked for deletion (bit 0 of ct_label).
> > > > +             *
> > > > +             * This is enforced at a higher priority than ACLs can
> be
> > > defined.
> > > > +             *
> > > > +             * NOTE: This does not support related data sessions
> (eg,
> > > > +             * a dynamically negotiated FTP data channel), but will
> allow
> > > > +             * related traffic such as an ICMP Port Unreachable
> through
> > > > +             * that's generated from a non-listening UDP port.  */
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = "!ct.est && ct.rel && !ct.new "
> ++
> > > and_not_ct_inv ++
> > > > +                                     "&& ct_label.blocked == 0",
> > > > +                 .actions          = "next;",
> > > > +                 .external_ids     = map_empty());
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = "!ct.est && ct.rel && !ct.new "
> ++
> > > and_not_ct_inv ++
> > > > +                                     "&& ct_label.blocked == 0",
> > > > +                 .actions          = "next;",
> > > > +                 .external_ids     = map_empty());
> > > > +
> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> > > > +             *
> > > > +             * Not to do conntrack on ND packets. */
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = "nd || nd_ra || nd_rs || mldv1
> ||
> > > mldv2",
> > > > +                 .actions          = "next;",
> > > > +                 .external_ids     = map_empty());
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > > +                 .priority         = 65535,
> > > > +                 .__match          = "nd || nd_ra || nd_rs || mldv1
> ||
> > > mldv2",
> > > > +                 .actions          = "next;",
> > > > +                 .external_ids     = map_empty())
> > > > +        };
> > > > +
> > > > +        /* Add a 34000 priority flow to advance the DNS reply from
> > > ovn-controller,
> > > > +         * if the CMS has configured DNS records for the datapath.
> > > > +         */
> > > > +        if (sw.has_dns_records) {
> > > > +            Flow(.logical_datapath = ls._uuid,
> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> > > > +                 .priority         = 34000,
> > > > +                 .__match          = "udp.src == 53",
> > > > +                 .actions          = if has_stateful "ct_commit;
> next;"
> > > else "next;",
> > > > +                 .external_ids     = map_empty())
> > > > +        };
> > > > +
> > > > +        /* Add a 34000 priority flow to advance the service monitor
> reply
> > > > +         * packets to skip applying ingress ACLs. */
> > > >          Flow(.logical_datapath = ls._uuid,
> > > >               .stage            = s_SWITCH_IN_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> > > mldv2",
> > > > +             .priority         = 34000,
> > > > +             .__match          = "eth.dst == $svc_monitor_mac",
> > > >               .actions          = "next;",
> > > >               .external_ids     = map_empty());
> > > > -        Flow(.logical_datapath = ls._uuid,
> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> > > > -             .priority         = 65535,
> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
> > > mldv2",
> > > > -             .actions          = "next;",
> > > > -             .external_ids     = map_empty())
> > > > -    };
> > > > -
> > > > -    /* Add a 34000 priority flow to advance the DNS reply from
> > > ovn-controller,
> > > > -     * if the CMS has configured DNS records for the datapath.
> > > > -     */
> > > > -    if (sw.has_dns_records) {
> > > >          Flow(.logical_datapath = ls._uuid,
> > > >               .stage            = s_SWITCH_OUT_ACL(),
> > > >               .priority         = 34000,
> > > > -             .__match          = "udp.src == 53",
> > > > -             .actions          = if has_stateful "ct_commit; next;"
> else
> > > "next;",
> > > > +             .__match          = "eth.src == $svc_monitor_mac",
> > > > +             .actions          = "next;",
> > > >               .external_ids     = map_empty())
> > > > -    };
> > > > -
> > > > -    /* Add a 34000 priority flow to advance the service monitor
> reply
> > > > -     * packets to skip applying ingress ACLs. */
> > > > -    Flow(.logical_datapath = ls._uuid,
> > > > -         .stage            = s_SWITCH_IN_ACL(),
> > > > -         .priority         = 34000,
> > > > -         .__match          = "eth.dst == $svc_monitor_mac",
> > > > -         .actions          = "next;",
> > > > -         .external_ids     = map_empty());
> > > > -    Flow(.logical_datapath = ls._uuid,
> > > > -         .stage            = s_SWITCH_OUT_ACL(),
> > > > -         .priority         = 34000,
> > > > -         .__match          = "eth.src == $svc_monitor_mac",
> > > > -         .actions          = "next;",
> > > > -         .external_ids     = map_empty())
> > > > +    }
> > > >  }
> > > >
> > > >  /* This stage builds hints for the IN/OUT_ACL stage. Based on
> various
> > > > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > > > index feb38b5d31..e337be4eb3 100644
> > > > --- a/ovn-nb.xml
> > > > +++ b/ovn-nb.xml
> > > > @@ -240,6 +240,21 @@
> > > >          </p>
> > > >        </column>
> > > >
> > > > +      <column name="options" key="use_ct_inv_match">
> > > > +        <p>
> > > > +          If set to false, <code>ovn-northd</code> will not use the
> > > > +          <code>ct.inv</code> field in any of the logical flow
> matches.
> > > > +          The default value is true.  If the NIC supports offloading
> > > > +          OVS datapath flows but doesn't support offloading ct_state
> > > > +          <code>inv</code> flag, then the datapath flows matching on
> > > this flag
> > > > +          (either <code>+inv</code> or <code>-inv</code>) will not
> be
> > > > +          offloaded.  CMS should consider setting
> > > <code>use_ct_inv_match</code>
> > > > +          to <code>false</code> in such cases.  This results in a
> side
> > > effect
> > > > +          of the invalid packets getting delivered to the
> destination
> > > VIF, which
> > > > +          otherwise would have been dropped by <code>OVN</code>.
> > > > +        </p>
> > > > +      </column>
> > > > +
> > > >        <group title="Options for configuring interconnection route
> > > advertisement">
> > > >          <p>
> > > >            These options control how routes are advertised between
> OVN
> > > > --
> > > > 2.29.2
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > dev@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> > > Acked-by: Han Zhou <hzhou@ovn.org>
> >
> > Thanks for the reviews Mark G and Han.
> >
> > I applied both the patches to the main branch addressing Han's comment in
> p1.
> > When I submitted this patch, I had a test case which somehow got removed
> > in the later versions.  I added the test case before applying.
> >
> > Regards
> > Numan
>
> Hi Numan,
>
> Seems this merge breaks the CI. All these vtep related cases are failing:
> 898: ovn-controller-vtep - chassis FAILED (ovn-controller-vtep.at:110)
> 899: ovn-controller-vtep - binding 1 FAILED (ovn-controller-vtep.at:178)
> 900: ovn-controller-vtep - binding 2 FAILED (ovn-controller-vtep.at:252)
> 901: ovn-controller-vtep - vtep-lswitch FAILED (ovn-controller-vtep.at:291
> )
> 902: ovn-controller-vtep - vtep-macs 1 FAILED (ovn-controller-vtep.at:343)
> 903: ovn-controller-vtep - vtep-macs 2 FAILED (ovn-controller-vtep.at:431)
>
> All these tests are successful when I run them locally. I haven't checked
> more details yet.
>

Hi Han,

I noticed that. In fact it's failing without these patches too.

I quickly looked into it and looks like it's failing because of some
warning message related to dns_resolve in ovsdb-server log.

I think wen need to white list this warning message in vtep tests.


Thanks
Numan



>
>
> Thanks,
> Han
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou May 7, 2021, 1:16 a.m. UTC | #6
On Thu, May 6, 2021 at 6:13 PM Numan Siddique <numans@ovn.org> wrote:

>
>
> On Thu, May 6, 2021, 7:00 PM Han Zhou <hzhou@ovn.org> wrote:
>
>> On Thu, May 6, 2021 at 11:41 AM Numan Siddique <numans@ovn.org> wrote:
>> >
>> > On Thu, May 6, 2021 at 2:35 AM Han Zhou <hzhou@ovn.org> wrote:
>> > >
>> > > On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
>> > > >
>> > > > From: Numan Siddique <numans@ovn.org>
>> > > >
>> > > > Presently we add 65535 priority lflows in the stages -
>> > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
>> > > > match on 'ct.inv'.
>> > > >
>> > > > This patch provides an option to not use this field in the
>> > > > logical flow matches for the following
>> > > > reasons:
>> > > >
>> > > >  • Some of the smart NICs which support offloading datapath
>> > > >    flows may not support this field.  In such cases, almost
>> > > >    all the datapath flows (matching on the ct_state field +inv
>> > > >    or -inv cannot be offloaded if ACLs/load balancers
>> > > >    are configured on the logical switch datapath.
>> > > >
>> > > >  • A recent commit in kernel ovs datapath sets the committed
>> > > >    connection tracking entry to be liberal for out-of-window
>> > > >    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
>> > > >    packets will not be marked as invalid.
>> > > >
>> > > > There are still certain scenarios where a packet can be marked
>> > > > invalid.  Like
>> > > >  • If the tcp flags are not correct.
>> > > >
>> > > >  • If there are checksum errors.
>> > > >
>> > > > In such cases, the packet will be delivered to the destination
>> > > > port.  So CMS should evaluate their usecases before enabling
>> > > > this option.
>> > > >
>> > > > Signed-off-by: Numan Siddique <numans@ovn.org>
>> > > > Co-authored-by: Ben Pfaff <blp@ovn.org>
>> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
>> > > > ---
>> > > > v3 -> v4
>> > > > ---
>> > > >   * Moved the static variable 'use_ct_inv_match' declaration up in
>> the
>> > > >     file.
>> > > >
>> > > > v2 -> v3
>> > > > ----
>> > > >   * Updated the documentation in ovn-nb.xml as per Mark G's
>> suggestion.
>> > > >
>> > > > v1 -> v2
>> > > > ----
>> > > >   * Adopted the code changes shared by Ben Pfaff for the ddlog code.
>> > > >
>> > > >  NEWS                 |   7 +-
>> > > >  northd/lswitch.dl    |   7 +
>> > > >  northd/ovn-northd.c  |  42 +++---
>> > > >  northd/ovn_northd.dl | 304
>> ++++++++++++++++++++++---------------------
>> > > >  ovn-nb.xml           |  15 +++
>> > > >  5 files changed, 208 insertions(+), 167 deletions(-)
>> > > >
>> > > > diff --git a/NEWS b/NEWS
>> > > > index 1ddde15f86..5f1365775d 100644
>> > > > --- a/NEWS
>> > > > +++ b/NEWS
>> > > > @@ -6,11 +6,16 @@ Post-v21.03.0
>> > > >      expected to scale better than the C implementation, for large
>> > > deployments.
>> > > >      (This may take testing and tuning to be effective.)  This
>> version of
>> > > OVN
>> > > >      requires DDLog 0.36.
>> > > > -  - Introduce ovn-controller incremetal processing engine
>> statistics
>> > > > +  - Introduce ovn-controller incremental processing engine
>> statistics
>> > > >    - Introduced parallel processing in ovn-northd with the NB_Global
>> > > config option
>> > > >      'use_parallel_build' to enable it.  It is disabled by default.
>> > > >    - Support vlan-passthru mode for tag=0 localnet ports.
>> > > >    - Support custom 802.11ad EthType for localnet ports.
>> > > > +  - Add a new NB Global option - use_ct_inv_match with the default
>> value
>> > > of true.
>> > > > +    If this is set to false, then the logical field - "ct.inv" will
>> not
>> > > be
>> > > > +    used in the logical flow matches.  CMS can consider setting
>> this
>> to
>> > > false,
>> > > > +    if they want to use smart NICs which don't support offloading
>> > > datapath flows
>> > > > +    with this field used.
>> > > >
>> > > >  OVN v21.03.0 - 12 Mar 2021
>> > > >  -------------------------
>> > > > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
>> > > > index 47c497e0cf..27ac3cadc5 100644
>> > > > --- a/northd/lswitch.dl
>> > > > +++ b/northd/lswitch.dl
>> > > > @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
>> > > Map<string,string>,
>> > > >  relation SvcMonitorMac(mac: eth_addr)
>> > > >  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
>> > > >      nb::NB_Global(._uuid = uuid, .options = options).
>> > > > +
>> > > > +relation UseCtInvMatch[bool]
>> > > > +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
>> > > > +    nb::NB_Global(.options = options).
>> > > > +UseCtInvMatch[true] :-
>> > > > +    Unit(),
>> > > > +    not nb in nb::NB_Global().
>> > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> > > > index de1aa896d3..5f6c0a6922 100644
>> > > > --- a/northd/ovn-northd.c
>> > > > +++ b/northd/ovn-northd.c
>> > > > @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
>> > > >  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
>> > > >  static struct eth_addr svc_monitor_mac_ea;
>> > > >
>> > > > +/* If this option is 'true' northd will make use of ct.inv match
>> fields.
>> > > > + * Otherwise, it will avoid using it.  The default is true. */
>> > > > +static bool use_ct_inv_match = true;
>> > > > +
>> > > >  /* Default probe interval for NB and SB DB connections. */
>> > > >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>> > > >  static int northd_probe_interval_nb = 0;
>> > > > @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool
>> > > shared,
>> > > >      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
>> > > >  }
>> > > >
>> > > > -
>> > > >  /* Adds a row with the specified contents to the Logical_Flow
>> table.
>> */
>> > > >  static void
>> > > >  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
>> > > > @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct
>> hmap
>> > > *lflows,
>> > > >           * for deletion (bit 0 of ct_label is set).
>> > > >           *
>> > > >           * This is enforced at a higher priority than ACLs can be
>> > > defined. */
>> > > > +        char *match = xasprintf("%s(ct.est && ct.rpl &&
>> ct_label.blocked
>> > > == 1)",
>> > > > +                                use_ct_inv_match ? "ct.inv || " :
>> "");
>> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
>> > > > -                      "ct.inv || (ct.est && ct.rpl &&
>> ct_label.blocked
>> > > == 1)",
>> > > > -                      "drop;");
>> > > > +                      match, "drop;");
>> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
>> > > > -                      "ct.inv || (ct.est && ct.rpl &&
>> ct_label.blocked
>> > > == 1)",
>> > > > -                      "drop;");
>> > > > +                      match, "drop;");
>> > > > +        free(match);
>> > > >
>> > > >          /* Ingress and Egress ACL Table (Priority 65535).
>> > > >           *
>> > > > @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct
>> hmap
>> > > *lflows,
>> > > >           * direction to hit the currently defined policy from ACLs.
>> > > >           *
>> > > >           * This is enforced at a higher priority than ACLs can be
>> > > defined. */
>> > > > +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
>> > > > +                          "ct.rpl && ct_label.blocked == 0",
>> > > > +                          use_ct_inv_match ? " && !ct.inv" : "");
>> > > > +
>> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
>> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
>> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
>> > > > -                      "next;");
>> > > > +                      match, "next;");
>> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
>> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
>> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
>> > > > -                      "next;");
>> > > > +                      match, "next;");
>> > > > +        free(match);
>> > > >
>> > > >          /* Ingress and Egress ACL Table (Priority 65535).
>> > > >           *
>> > > > @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct
>> hmap
>> > > *lflows,
>> > > >           * a dynamically negotiated FTP data channel), but will
>> allow
>> > > >           * related traffic such as an ICMP Port Unreachable through
>> > > >           * that's generated from a non-listening UDP port.  */
>> > > > +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
>> > > > +                          "ct_label.blocked == 0",
>> > > > +                          use_ct_inv_match ? " && !ct.inv" : "");
>> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
>> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
>> > > > -                      "&& ct_label.blocked == 0",
>> > > > -                      "next;");
>> > > > +                      match, "next;");
>> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
>> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
>> > > > -                      "&& ct_label.blocked == 0",
>> > > > -                      "next;");
>> > > > +                      match, "next;");
>> > > > +        free(match);
>> > > >
>> > > >          /* Ingress and Egress ACL Table (Priority 65535).
>> > > >           *
>> > > > @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
>> > > >
>> > > >      use_logical_dp_groups = smap_get_bool(&nb->options,
>> > > >                                            "use_logical_dp_groups",
>> > > false);
>> > > > +    use_ct_inv_match = smap_get_bool(&nb->options,
>> > > > +                                     "use_ct_inv_match", true);
>> > > > +
>> > > >      /* deprecated, use --event instead */
>> > > >      controller_event_en = smap_get_bool(&nb->options,
>> > > >                                          "controller_event", false);
>> > > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
>> > > > index 2f5d36c599..a0f7944103 100644
>> > > > --- a/northd/ovn_northd.dl
>> > > > +++ b/northd/ovn_northd.dl
>> > > > @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl,
>> > > fair_meter, extra_match_, extra_action
>> > > >  }
>> > > >
>> > > >  /* build_acls */
>> > > > -for (sw in &Switch(.ls = ls))
>> > > > -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
>> > > > -{
>> > > > -    /* Ingress and Egress ACL Table (Priority 0): Packets are
>> allowed by
>> > > > -     * default.  A related rule at priority 1 is added below if
>> there
>> > > > -     * are any stateful ACLs in this datapath. */
>> > > > -    Flow(.logical_datapath = ls._uuid,
>> > > > -         .stage            = s_SWITCH_IN_ACL(),
>> > > > -         .priority         = 0,
>> > > > -         .__match          = "1",
>> > > > -         .actions          = "next;",
>> > > > -         .external_ids     = map_empty());
>> > > > -    Flow(.logical_datapath = ls._uuid,
>> > > > -         .stage            = s_SWITCH_OUT_ACL(),
>> > > > -         .priority         = 0,
>> > > > -         .__match          = "1",
>> > > > -         .actions          = "next;",
>> > > > -         .external_ids     = map_empty());
>> > > > -
>> > > > -    if (has_stateful) {
>> > > > -        /* Ingress and Egress ACL Table (Priority 1).
>> > > > -         *
>> > > > -         * By default, traffic is allowed.  This is partially
>> handled by
>> > > > -         * the Priority 0 ACL flows added earlier, but we also need
>> to
>> > > > -         * commit IP flows.  This is because, while the initiater's
>> > > > -         * direction may not have any stateful rules, the server's
>> may
>> > > > -         * and then its return traffic would not have an associated
>> > > > -         * conntrack entry and would return "+invalid".
>> > > > -         *
>> > > > -         * We use "ct_commit" for a connection that is not already
>> known
>> > > > -         * by the connection tracker.  Once a connection is
>> committed,
>> > > > -         * subsequent packets will hit the flow at priority 0 that
>> just
>> > > > -         * uses "next;"
>> > > > -         *
>> > > > -         * We also check for established connections that have
>> > > ct_label.blocked
>> > > > -         * set on them.  That's a connection that was disallowed,
>> but is
>> > > > -         * now allowed by policy again since it hit this
>> default-allow
>> > > flow.
>> > > > -         * We need to set ct_label.blocked=0 to let the connection
>> > > continue,
>> > > > -         * which will be done by ct_commit() in the "stateful"
>> stage.
>> > > > -         * Subsequent packets will hit the flow at priority 0 that
>> just
>> > > > -         * uses "next;". */
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_IN_ACL(),
>> > > > -             .priority         = 1,
>> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
>> > > ct_label.blocked == 1))",
>> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
>> > > next;",
>> > > > -             .external_ids     = map_empty());
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> > > > -             .priority         = 1,
>> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
>> > > ct_label.blocked == 1))",
>> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1;
>> > > next;",
>> > > > -             .external_ids     = map_empty());
>> > > > -
>> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> > > > -         *
>> > > > -         * Always drop traffic that's in an invalid state.  Also
>> drop
>> > > > -         * reply direction packets for connections that have been
>> marked
>> > > > -         * for deletion (bit 0 of ct_label is set).
>> > > > -         *
>> > > > -         * This is enforced at a higher priority than ACLs can be
>> > > defined. */
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_IN_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
>> > > ct_label.blocked == 1)",
>> > > > -             .actions          = "drop;",
>> > > > -             .external_ids     = map_empty());
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
>> > > ct_label.blocked == 1)",
>> > > > -             .actions          = "drop;",
>> > > > -             .external_ids     = map_empty());
>> > > > -
>> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> > > > -         *
>> > > > -         * Allow reply traffic that is part of an established
>> > > > -         * conntrack entry that has not been marked for deletion
>> > > > -         * (bit 0 of ct_label).  We only match traffic in the
>> > > > -         * reply direction because we want traffic in the request
>> > > > -         * direction to hit the currently defined policy from ACLs.
>> > > > -         *
>> > > > -         * This is enforced at a higher priority than ACLs can be
>> > > defined. */
>> > > > +for (UseCtInvMatch[use_ct_inv_match]) {
>> > > > +    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match)
>> {
>> > > > +        true -> ("ct.inv || ", "&& !ct.inv "),
>> > > > +        false -> ("", ""),
>> > > > +    } in
>> > > > +    for (sw in &Switch(.ls = ls))
>> > > > +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
>> > > > +    {
>> > > > +        /* Ingress and Egress ACL Table (Priority 0): Packets are
>> > > allowed by
>> > > > +         * default.  A related rule at priority 1 is added below if
>> there
>> > > > +         * are any stateful ACLs in this datapath. */
>> > > >          Flow(.logical_datapath = ls._uuid,
>> > > >               .stage            = s_SWITCH_IN_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
>> > > !ct.inv "
>> > > > -                                 "&& ct.rpl && ct_label.blocked ==
>> 0",
>> > > > +             .priority         = 0,
>> > > > +             .__match          = "1",
>> > > >               .actions          = "next;",
>> > > >               .external_ids     = map_empty());
>> > > >          Flow(.logical_datapath = ls._uuid,
>> > > >               .stage            = s_SWITCH_OUT_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
>> > > !ct.inv "
>> > > > -                                 "&& ct.rpl && ct_label.blocked ==
>> 0",
>> > > > +             .priority         = 0,
>> > > > +             .__match          = "1",
>> > > >               .actions          = "next;",
>> > > >               .external_ids     = map_empty());
>> > > >
>> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> > > > -         *
>> > > > -         * Allow traffic that is related to an existing conntrack
>> entry
>> > > that
>> > > > -         * has not been marked for deletion (bit 0 of ct_label).
>> > > > -         *
>> > > > -         * This is enforced at a higher priority than ACLs can be
>> > > defined.
>> > > > -         *
>> > > > -         * NOTE: This does not support related data sessions (eg,
>> > > > -         * a dynamically negotiated FTP data channel), but will
>> allow
>> > > > -         * related traffic such as an ICMP Port Unreachable through
>> > > > -         * that's generated from a non-listening UDP port.  */
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_IN_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
>> > > !ct.inv "
>> > > > -                                 "&& ct_label.blocked == 0",
>> > > > -             .actions          = "next;",
>> > > > -             .external_ids     = map_empty());
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
>> > > !ct.inv "
>> > > > -                                 "&& ct_label.blocked == 0",
>> > > > -             .actions          = "next;",
>> > > > -             .external_ids     = map_empty());
>> > > > +        if (has_stateful) {
>> > > > +            /* Ingress and Egress ACL Table (Priority 1).
>> > > > +             *
>> > > > +             * By default, traffic is allowed.  This is partially
>> > > handled by
>> > > > +             * the Priority 0 ACL flows added earlier, but we also
>> need
>> > > to
>> > > > +             * commit IP flows.  This is because, while the
>> initiater's
>> > > > +             * direction may not have any stateful rules, the
>> server's
>> > > may
>> > > > +             * and then its return traffic would not have an
>> associated
>> > > > +             * conntrack entry and would return "+invalid".
>> > > > +             *
>> > > > +             * We use "ct_commit" for a connection that is not
>> already
>> > > known
>> > > > +             * by the connection tracker.  Once a connection is
>> > > committed,
>> > > > +             * subsequent packets will hit the flow at priority 0
>> that
>> > > just
>> > > > +             * uses "next;"
>> > > > +             *
>> > > > +             * We also check for established connections that have
>> > > ct_label.blocked
>> > > > +             * set on them.  That's a connection that was
>> disallowed,
>> > > but is
>> > > > +             * now allowed by policy again since it hit this
>> > > default-allow flow.
>> > > > +             * We need to set ct_label.blocked=0 to let the
>> connection
>> > > continue,
>> > > > +             * which will be done by ct_commit() in the "stateful"
>> stage.
>> > > > +             * Subsequent packets will hit the flow at priority 0
>> that
>> > > just
>> > > > +             * uses "next;". */
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> > > > +                 .priority         = 1,
>> > > > +                 .__match          = "ip && (!ct.est || (ct.est &&
>> > > ct_label.blocked == 1))",
>> > > > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()}
>> =
>> 1;
>> > > next;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> > > > +                 .priority         = 1,
>> > > > +                 .__match          = "ip && (!ct.est || (ct.est &&
>> > > ct_label.blocked == 1))",
>> > > > +                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()}
>> =
>> 1;
>> > > next;",
>> > > > +                 .external_ids     = map_empty());
>> > > >
>> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> > > > -         *
>> > > > -         * Not to do conntrack on ND packets. */
>> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> > > > +             *
>> > > > +             * Always drop traffic that's in an invalid state.
>> Also
>> drop
>> > > > +             * reply direction packets for connections that have
>> been
>> > > marked
>> > > > +             * for deletion (bit 0 of ct_label is set).
>> > > > +             *
>> > > > +             * This is enforced at a higher priority than ACLs can
>> be
>> > > defined. */
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = ct_inv_or ++ "(ct.est &&
>> ct.rpl
>> &&
>> > > ct_label.blocked == 1)",
>> > > > +                 .actions          = "drop;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = ct_inv_or ++ "(ct.est &&
>> ct.rpl
>> &&
>> > > ct_label.blocked == 1)",
>> > > > +                 .actions          = "drop;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +
>> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> > > > +             *
>> > > > +             * Allow reply traffic that is part of an established
>> > > > +             * conntrack entry that has not been marked for
>> deletion
>> > > > +             * (bit 0 of ct_label).  We only match traffic in the
>> > > > +             * reply direction because we want traffic in the
>> request
>> > > > +             * direction to hit the currently defined policy from
>> ACLs.
>> > > > +             *
>> > > > +             * This is enforced at a higher priority than ACLs can
>> be
>> > > defined. */
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = "ct.est && !ct.rel && !ct.new
>> "
>> ++
>> > > and_not_ct_inv ++
>> > > > +                                     "&& ct.rpl && ct_label.blocked
>> ==
>> > > 0",
>> > > > +                 .actions          = "next;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = "ct.est && !ct.rel && !ct.new
>> "
>> ++
>> > > and_not_ct_inv ++
>> > > > +                                     "&& ct.rpl && ct_label.blocked
>> ==
>> > > 0",
>> > > > +                 .actions          = "next;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +
>> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> > > > +             *
>> > > > +             * Allow traffic that is related to an existing
>> conntrack
>> > > entry that
>> > > > +             * has not been marked for deletion (bit 0 of
>> ct_label).
>> > > > +             *
>> > > > +             * This is enforced at a higher priority than ACLs can
>> be
>> > > defined.
>> > > > +             *
>> > > > +             * NOTE: This does not support related data sessions
>> (eg,
>> > > > +             * a dynamically negotiated FTP data channel), but will
>> allow
>> > > > +             * related traffic such as an ICMP Port Unreachable
>> through
>> > > > +             * that's generated from a non-listening UDP port.  */
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = "!ct.est && ct.rel && !ct.new
>> "
>> ++
>> > > and_not_ct_inv ++
>> > > > +                                     "&& ct_label.blocked == 0",
>> > > > +                 .actions          = "next;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = "!ct.est && ct.rel && !ct.new
>> "
>> ++
>> > > and_not_ct_inv ++
>> > > > +                                     "&& ct_label.blocked == 0",
>> > > > +                 .actions          = "next;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +
>> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> > > > +             *
>> > > > +             * Not to do conntrack on ND packets. */
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = "nd || nd_ra || nd_rs || mldv1
>> ||
>> > > mldv2",
>> > > > +                 .actions          = "next;",
>> > > > +                 .external_ids     = map_empty());
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> > > > +                 .priority         = 65535,
>> > > > +                 .__match          = "nd || nd_ra || nd_rs || mldv1
>> ||
>> > > mldv2",
>> > > > +                 .actions          = "next;",
>> > > > +                 .external_ids     = map_empty())
>> > > > +        };
>> > > > +
>> > > > +        /* Add a 34000 priority flow to advance the DNS reply from
>> > > ovn-controller,
>> > > > +         * if the CMS has configured DNS records for the datapath.
>> > > > +         */
>> > > > +        if (sw.has_dns_records) {
>> > > > +            Flow(.logical_datapath = ls._uuid,
>> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> > > > +                 .priority         = 34000,
>> > > > +                 .__match          = "udp.src == 53",
>> > > > +                 .actions          = if has_stateful "ct_commit;
>> next;"
>> > > else "next;",
>> > > > +                 .external_ids     = map_empty())
>> > > > +        };
>> > > > +
>> > > > +        /* Add a 34000 priority flow to advance the service monitor
>> reply
>> > > > +         * packets to skip applying ingress ACLs. */
>> > > >          Flow(.logical_datapath = ls._uuid,
>> > > >               .stage            = s_SWITCH_IN_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
>> > > mldv2",
>> > > > +             .priority         = 34000,
>> > > > +             .__match          = "eth.dst == $svc_monitor_mac",
>> > > >               .actions          = "next;",
>> > > >               .external_ids     = map_empty());
>> > > > -        Flow(.logical_datapath = ls._uuid,
>> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> > > > -             .priority         = 65535,
>> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1 ||
>> > > mldv2",
>> > > > -             .actions          = "next;",
>> > > > -             .external_ids     = map_empty())
>> > > > -    };
>> > > > -
>> > > > -    /* Add a 34000 priority flow to advance the DNS reply from
>> > > ovn-controller,
>> > > > -     * if the CMS has configured DNS records for the datapath.
>> > > > -     */
>> > > > -    if (sw.has_dns_records) {
>> > > >          Flow(.logical_datapath = ls._uuid,
>> > > >               .stage            = s_SWITCH_OUT_ACL(),
>> > > >               .priority         = 34000,
>> > > > -             .__match          = "udp.src == 53",
>> > > > -             .actions          = if has_stateful "ct_commit; next;"
>> else
>> > > "next;",
>> > > > +             .__match          = "eth.src == $svc_monitor_mac",
>> > > > +             .actions          = "next;",
>> > > >               .external_ids     = map_empty())
>> > > > -    };
>> > > > -
>> > > > -    /* Add a 34000 priority flow to advance the service monitor
>> reply
>> > > > -     * packets to skip applying ingress ACLs. */
>> > > > -    Flow(.logical_datapath = ls._uuid,
>> > > > -         .stage            = s_SWITCH_IN_ACL(),
>> > > > -         .priority         = 34000,
>> > > > -         .__match          = "eth.dst == $svc_monitor_mac",
>> > > > -         .actions          = "next;",
>> > > > -         .external_ids     = map_empty());
>> > > > -    Flow(.logical_datapath = ls._uuid,
>> > > > -         .stage            = s_SWITCH_OUT_ACL(),
>> > > > -         .priority         = 34000,
>> > > > -         .__match          = "eth.src == $svc_monitor_mac",
>> > > > -         .actions          = "next;",
>> > > > -         .external_ids     = map_empty())
>> > > > +    }
>> > > >  }
>> > > >
>> > > >  /* This stage builds hints for the IN/OUT_ACL stage. Based on
>> various
>> > > > diff --git a/ovn-nb.xml b/ovn-nb.xml
>> > > > index feb38b5d31..e337be4eb3 100644
>> > > > --- a/ovn-nb.xml
>> > > > +++ b/ovn-nb.xml
>> > > > @@ -240,6 +240,21 @@
>> > > >          </p>
>> > > >        </column>
>> > > >
>> > > > +      <column name="options" key="use_ct_inv_match">
>> > > > +        <p>
>> > > > +          If set to false, <code>ovn-northd</code> will not use the
>> > > > +          <code>ct.inv</code> field in any of the logical flow
>> matches.
>> > > > +          The default value is true.  If the NIC supports
>> offloading
>> > > > +          OVS datapath flows but doesn't support offloading
>> ct_state
>> > > > +          <code>inv</code> flag, then the datapath flows matching
>> on
>> > > this flag
>> > > > +          (either <code>+inv</code> or <code>-inv</code>) will not
>> be
>> > > > +          offloaded.  CMS should consider setting
>> > > <code>use_ct_inv_match</code>
>> > > > +          to <code>false</code> in such cases.  This results in a
>> side
>> > > effect
>> > > > +          of the invalid packets getting delivered to the
>> destination
>> > > VIF, which
>> > > > +          otherwise would have been dropped by <code>OVN</code>.
>> > > > +        </p>
>> > > > +      </column>
>> > > > +
>> > > >        <group title="Options for configuring interconnection route
>> > > advertisement">
>> > > >          <p>
>> > > >            These options control how routes are advertised between
>> OVN
>> > > > --
>> > > > 2.29.2
>> > > >
>> > > > _______________________________________________
>> > > > dev mailing list
>> > > > dev@openvswitch.org
>> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> > >
>> > > Acked-by: Han Zhou <hzhou@ovn.org>
>> >
>> > Thanks for the reviews Mark G and Han.
>> >
>> > I applied both the patches to the main branch addressing Han's comment
>> in
>> p1.
>> > When I submitted this patch, I had a test case which somehow got removed
>> > in the later versions.  I added the test case before applying.
>> >
>> > Regards
>> > Numan
>>
>> Hi Numan,
>>
>> Seems this merge breaks the CI. All these vtep related cases are failing:
>> 898: ovn-controller-vtep - chassis FAILED (ovn-controller-vtep.at:110)
>> 899: ovn-controller-vtep - binding 1 FAILED (ovn-controller-vtep.at:178)
>> 900: ovn-controller-vtep - binding 2 FAILED (ovn-controller-vtep.at:252)
>> 901: ovn-controller-vtep - vtep-lswitch FAILED (
>> ovn-controller-vtep.at:291)
>> 902: ovn-controller-vtep - vtep-macs 1 FAILED (ovn-controller-vtep.at:343
>> )
>> 903: ovn-controller-vtep - vtep-macs 2 FAILED (ovn-controller-vtep.at:431
>> )
>>
>> All these tests are successful when I run them locally. I haven't checked
>> more details yet.
>>
>
> Hi Han,
>
> I noticed that. In fact it's failing without these patches too.
>
> I quickly looked into it and looks like it's failing because of some
> warning message related to dns_resolve in ovsdb-server log.
>
> I think wen need to white list this warning message in vtep tests.
>
>
> Thanks
> Numan
>
> ok. Thanks for checking! Do you happen to know what change introduced
this? Also I wonder why I don't see these failures locally. Is it test
server's problem (regarding DNS?)

>
>
>>
>>
>> Thanks,
>> Han
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Numan Siddique May 7, 2021, 2:20 a.m. UTC | #7
On Thu, May 6, 2021, 9:17 PM Han Zhou <hzhou@ovn.org> wrote:

> On Thu, May 6, 2021 at 6:13 PM Numan Siddique <numans@ovn.org> wrote:
>
> >
> >
> > On Thu, May 6, 2021, 7:00 PM Han Zhou <hzhou@ovn.org> wrote:
> >
> >> On Thu, May 6, 2021 at 11:41 AM Numan Siddique <numans@ovn.org> wrote:
> >> >
> >> > On Thu, May 6, 2021 at 2:35 AM Han Zhou <hzhou@ovn.org> wrote:
> >> > >
> >> > > On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
> >> > > >
> >> > > > From: Numan Siddique <numans@ovn.org>
> >> > > >
> >> > > > Presently we add 65535 priority lflows in the stages -
> >> > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
> >> > > > match on 'ct.inv'.
> >> > > >
> >> > > > This patch provides an option to not use this field in the
> >> > > > logical flow matches for the following
> >> > > > reasons:
> >> > > >
> >> > > >  • Some of the smart NICs which support offloading datapath
> >> > > >    flows may not support this field.  In such cases, almost
> >> > > >    all the datapath flows (matching on the ct_state field +inv
> >> > > >    or -inv cannot be offloaded if ACLs/load balancers
> >> > > >    are configured on the logical switch datapath.
> >> > > >
> >> > > >  • A recent commit in kernel ovs datapath sets the committed
> >> > > >    connection tracking entry to be liberal for out-of-window
> >> > > >    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
> >> > > >    packets will not be marked as invalid.
> >> > > >
> >> > > > There are still certain scenarios where a packet can be marked
> >> > > > invalid.  Like
> >> > > >  • If the tcp flags are not correct.
> >> > > >
> >> > > >  • If there are checksum errors.
> >> > > >
> >> > > > In such cases, the packet will be delivered to the destination
> >> > > > port.  So CMS should evaluate their usecases before enabling
> >> > > > this option.
> >> > > >
> >> > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> >> > > > Co-authored-by: Ben Pfaff <blp@ovn.org>
> >> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> >> > > > ---
> >> > > > v3 -> v4
> >> > > > ---
> >> > > >   * Moved the static variable 'use_ct_inv_match' declaration up in
> >> the
> >> > > >     file.
> >> > > >
> >> > > > v2 -> v3
> >> > > > ----
> >> > > >   * Updated the documentation in ovn-nb.xml as per Mark G's
> >> suggestion.
> >> > > >
> >> > > > v1 -> v2
> >> > > > ----
> >> > > >   * Adopted the code changes shared by Ben Pfaff for the ddlog
> code.
> >> > > >
> >> > > >  NEWS                 |   7 +-
> >> > > >  northd/lswitch.dl    |   7 +
> >> > > >  northd/ovn-northd.c  |  42 +++---
> >> > > >  northd/ovn_northd.dl | 304
> >> ++++++++++++++++++++++---------------------
> >> > > >  ovn-nb.xml           |  15 +++
> >> > > >  5 files changed, 208 insertions(+), 167 deletions(-)
> >> > > >
> >> > > > diff --git a/NEWS b/NEWS
> >> > > > index 1ddde15f86..5f1365775d 100644
> >> > > > --- a/NEWS
> >> > > > +++ b/NEWS
> >> > > > @@ -6,11 +6,16 @@ Post-v21.03.0
> >> > > >      expected to scale better than the C implementation, for large
> >> > > deployments.
> >> > > >      (This may take testing and tuning to be effective.)  This
> >> version of
> >> > > OVN
> >> > > >      requires DDLog 0.36.
> >> > > > -  - Introduce ovn-controller incremetal processing engine
> >> statistics
> >> > > > +  - Introduce ovn-controller incremental processing engine
> >> statistics
> >> > > >    - Introduced parallel processing in ovn-northd with the
> NB_Global
> >> > > config option
> >> > > >      'use_parallel_build' to enable it.  It is disabled by
> default.
> >> > > >    - Support vlan-passthru mode for tag=0 localnet ports.
> >> > > >    - Support custom 802.11ad EthType for localnet ports.
> >> > > > +  - Add a new NB Global option - use_ct_inv_match with the
> default
> >> value
> >> > > of true.
> >> > > > +    If this is set to false, then the logical field - "ct.inv"
> will
> >> not
> >> > > be
> >> > > > +    used in the logical flow matches.  CMS can consider setting
> >> this
> >> to
> >> > > false,
> >> > > > +    if they want to use smart NICs which don't support offloading
> >> > > datapath flows
> >> > > > +    with this field used.
> >> > > >
> >> > > >  OVN v21.03.0 - 12 Mar 2021
> >> > > >  -------------------------
> >> > > > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> >> > > > index 47c497e0cf..27ac3cadc5 100644
> >> > > > --- a/northd/lswitch.dl
> >> > > > +++ b/northd/lswitch.dl
> >> > > > @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
> >> > > Map<string,string>,
> >> > > >  relation SvcMonitorMac(mac: eth_addr)
> >> > > >  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
> >> > > >      nb::NB_Global(._uuid = uuid, .options = options).
> >> > > > +
> >> > > > +relation UseCtInvMatch[bool]
> >> > > > +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
> >> > > > +    nb::NB_Global(.options = options).
> >> > > > +UseCtInvMatch[true] :-
> >> > > > +    Unit(),
> >> > > > +    not nb in nb::NB_Global().
> >> > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >> > > > index de1aa896d3..5f6c0a6922 100644
> >> > > > --- a/northd/ovn-northd.c
> >> > > > +++ b/northd/ovn-northd.c
> >> > > > @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
> >> > > >  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
> >> > > >  static struct eth_addr svc_monitor_mac_ea;
> >> > > >
> >> > > > +/* If this option is 'true' northd will make use of ct.inv match
> >> fields.
> >> > > > + * Otherwise, it will avoid using it.  The default is true. */
> >> > > > +static bool use_ct_inv_match = true;
> >> > > > +
> >> > > >  /* Default probe interval for NB and SB DB connections. */
> >> > > >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> >> > > >  static int northd_probe_interval_nb = 0;
> >> > > > @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map,
> bool
> >> > > shared,
> >> > > >      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
> >> > > >  }
> >> > > >
> >> > > > -
> >> > > >  /* Adds a row with the specified contents to the Logical_Flow
> >> table.
> >> */
> >> > > >  static void
> >> > > >  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
> >> > > > @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct
> >> hmap
> >> > > *lflows,
> >> > > >           * for deletion (bit 0 of ct_label is set).
> >> > > >           *
> >> > > >           * This is enforced at a higher priority than ACLs can be
> >> > > defined. */
> >> > > > +        char *match = xasprintf("%s(ct.est && ct.rpl &&
> >> ct_label.blocked
> >> > > == 1)",
> >> > > > +                                use_ct_inv_match ? "ct.inv || " :
> >> "");
> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> >> > > > -                      "ct.inv || (ct.est && ct.rpl &&
> >> ct_label.blocked
> >> > > == 1)",
> >> > > > -                      "drop;");
> >> > > > +                      match, "drop;");
> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> >> > > > -                      "ct.inv || (ct.est && ct.rpl &&
> >> ct_label.blocked
> >> > > == 1)",
> >> > > > -                      "drop;");
> >> > > > +                      match, "drop;");
> >> > > > +        free(match);
> >> > > >
> >> > > >          /* Ingress and Egress ACL Table (Priority 65535).
> >> > > >           *
> >> > > > @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct
> >> hmap
> >> > > *lflows,
> >> > > >           * direction to hit the currently defined policy from
> ACLs.
> >> > > >           *
> >> > > >           * This is enforced at a higher priority than ACLs can be
> >> > > defined. */
> >> > > > +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
> >> > > > +                          "ct.rpl && ct_label.blocked == 0",
> >> > > > +                          use_ct_inv_match ? " && !ct.inv" : "");
> >> > > > +
> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> >> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> >> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
> >> > > > -                      "next;");
> >> > > > +                      match, "next;");
> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> >> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
> >> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
> >> > > > -                      "next;");
> >> > > > +                      match, "next;");
> >> > > > +        free(match);
> >> > > >
> >> > > >          /* Ingress and Egress ACL Table (Priority 65535).
> >> > > >           *
> >> > > > @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct
> >> hmap
> >> > > *lflows,
> >> > > >           * a dynamically negotiated FTP data channel), but will
> >> allow
> >> > > >           * related traffic such as an ICMP Port Unreachable
> through
> >> > > >           * that's generated from a non-listening UDP port.  */
> >> > > > +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
> >> > > > +                          "ct_label.blocked == 0",
> >> > > > +                          use_ct_inv_match ? " && !ct.inv" : "");
> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
> >> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> >> > > > -                      "&& ct_label.blocked == 0",
> >> > > > -                      "next;");
> >> > > > +                      match, "next;");
> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
> >> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
> >> > > > -                      "&& ct_label.blocked == 0",
> >> > > > -                      "next;");
> >> > > > +                      match, "next;");
> >> > > > +        free(match);
> >> > > >
> >> > > >          /* Ingress and Egress ACL Table (Priority 65535).
> >> > > >           *
> >> > > > @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
> >> > > >
> >> > > >      use_logical_dp_groups = smap_get_bool(&nb->options,
> >> > > >
> "use_logical_dp_groups",
> >> > > false);
> >> > > > +    use_ct_inv_match = smap_get_bool(&nb->options,
> >> > > > +                                     "use_ct_inv_match", true);
> >> > > > +
> >> > > >      /* deprecated, use --event instead */
> >> > > >      controller_event_en = smap_get_bool(&nb->options,
> >> > > >                                          "controller_event",
> false);
> >> > > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> >> > > > index 2f5d36c599..a0f7944103 100644
> >> > > > --- a/northd/ovn_northd.dl
> >> > > > +++ b/northd/ovn_northd.dl
> >> > > > @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage,
> acl,
> >> > > fair_meter, extra_match_, extra_action
> >> > > >  }
> >> > > >
> >> > > >  /* build_acls */
> >> > > > -for (sw in &Switch(.ls = ls))
> >> > > > -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> >> > > > -{
> >> > > > -    /* Ingress and Egress ACL Table (Priority 0): Packets are
> >> allowed by
> >> > > > -     * default.  A related rule at priority 1 is added below if
> >> there
> >> > > > -     * are any stateful ACLs in this datapath. */
> >> > > > -    Flow(.logical_datapath = ls._uuid,
> >> > > > -         .stage            = s_SWITCH_IN_ACL(),
> >> > > > -         .priority         = 0,
> >> > > > -         .__match          = "1",
> >> > > > -         .actions          = "next;",
> >> > > > -         .external_ids     = map_empty());
> >> > > > -    Flow(.logical_datapath = ls._uuid,
> >> > > > -         .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -         .priority         = 0,
> >> > > > -         .__match          = "1",
> >> > > > -         .actions          = "next;",
> >> > > > -         .external_ids     = map_empty());
> >> > > > -
> >> > > > -    if (has_stateful) {
> >> > > > -        /* Ingress and Egress ACL Table (Priority 1).
> >> > > > -         *
> >> > > > -         * By default, traffic is allowed.  This is partially
> >> handled by
> >> > > > -         * the Priority 0 ACL flows added earlier, but we also
> need
> >> to
> >> > > > -         * commit IP flows.  This is because, while the
> initiater's
> >> > > > -         * direction may not have any stateful rules, the
> server's
> >> may
> >> > > > -         * and then its return traffic would not have an
> associated
> >> > > > -         * conntrack entry and would return "+invalid".
> >> > > > -         *
> >> > > > -         * We use "ct_commit" for a connection that is not
> already
> >> known
> >> > > > -         * by the connection tracker.  Once a connection is
> >> committed,
> >> > > > -         * subsequent packets will hit the flow at priority 0
> that
> >> just
> >> > > > -         * uses "next;"
> >> > > > -         *
> >> > > > -         * We also check for established connections that have
> >> > > ct_label.blocked
> >> > > > -         * set on them.  That's a connection that was disallowed,
> >> but is
> >> > > > -         * now allowed by policy again since it hit this
> >> default-allow
> >> > > flow.
> >> > > > -         * We need to set ct_label.blocked=0 to let the
> connection
> >> > > continue,
> >> > > > -         * which will be done by ct_commit() in the "stateful"
> >> stage.
> >> > > > -         * Subsequent packets will hit the flow at priority 0
> that
> >> just
> >> > > > -         * uses "next;". */
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_IN_ACL(),
> >> > > > -             .priority         = 1,
> >> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
> >> > > ct_label.blocked == 1))",
> >> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} =
> 1;
> >> > > next;",
> >> > > > -             .external_ids     = map_empty());
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -             .priority         = 1,
> >> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
> >> > > ct_label.blocked == 1))",
> >> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} =
> 1;
> >> > > next;",
> >> > > > -             .external_ids     = map_empty());
> >> > > > -
> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > -         *
> >> > > > -         * Always drop traffic that's in an invalid state.  Also
> >> drop
> >> > > > -         * reply direction packets for connections that have been
> >> marked
> >> > > > -         * for deletion (bit 0 of ct_label is set).
> >> > > > -         *
> >> > > > -         * This is enforced at a higher priority than ACLs can be
> >> > > defined. */
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_IN_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> >> > > ct_label.blocked == 1)",
> >> > > > -             .actions          = "drop;",
> >> > > > -             .external_ids     = map_empty());
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
> >> > > ct_label.blocked == 1)",
> >> > > > -             .actions          = "drop;",
> >> > > > -             .external_ids     = map_empty());
> >> > > > -
> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > -         *
> >> > > > -         * Allow reply traffic that is part of an established
> >> > > > -         * conntrack entry that has not been marked for deletion
> >> > > > -         * (bit 0 of ct_label).  We only match traffic in the
> >> > > > -         * reply direction because we want traffic in the request
> >> > > > -         * direction to hit the currently defined policy from
> ACLs.
> >> > > > -         *
> >> > > > -         * This is enforced at a higher priority than ACLs can be
> >> > > defined. */
> >> > > > +for (UseCtInvMatch[use_ct_inv_match]) {
> >> > > > +    (var ct_inv_or, var and_not_ct_inv) = match
> (use_ct_inv_match)
> >> {
> >> > > > +        true -> ("ct.inv || ", "&& !ct.inv "),
> >> > > > +        false -> ("", ""),
> >> > > > +    } in
> >> > > > +    for (sw in &Switch(.ls = ls))
> >> > > > +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
> >> > > > +    {
> >> > > > +        /* Ingress and Egress ACL Table (Priority 0): Packets are
> >> > > allowed by
> >> > > > +         * default.  A related rule at priority 1 is added below
> if
> >> there
> >> > > > +         * are any stateful ACLs in this datapath. */
> >> > > >          Flow(.logical_datapath = ls._uuid,
> >> > > >               .stage            = s_SWITCH_IN_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> >> > > !ct.inv "
> >> > > > -                                 "&& ct.rpl && ct_label.blocked
> ==
> >> 0",
> >> > > > +             .priority         = 0,
> >> > > > +             .__match          = "1",
> >> > > >               .actions          = "next;",
> >> > > >               .external_ids     = map_empty());
> >> > > >          Flow(.logical_datapath = ls._uuid,
> >> > > >               .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "ct.est && !ct.rel && !ct.new &&
> >> > > !ct.inv "
> >> > > > -                                 "&& ct.rpl && ct_label.blocked
> ==
> >> 0",
> >> > > > +             .priority         = 0,
> >> > > > +             .__match          = "1",
> >> > > >               .actions          = "next;",
> >> > > >               .external_ids     = map_empty());
> >> > > >
> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > -         *
> >> > > > -         * Allow traffic that is related to an existing conntrack
> >> entry
> >> > > that
> >> > > > -         * has not been marked for deletion (bit 0 of ct_label).
> >> > > > -         *
> >> > > > -         * This is enforced at a higher priority than ACLs can be
> >> > > defined.
> >> > > > -         *
> >> > > > -         * NOTE: This does not support related data sessions (eg,
> >> > > > -         * a dynamically negotiated FTP data channel), but will
> >> allow
> >> > > > -         * related traffic such as an ICMP Port Unreachable
> through
> >> > > > -         * that's generated from a non-listening UDP port.  */
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_IN_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> >> > > !ct.inv "
> >> > > > -                                 "&& ct_label.blocked == 0",
> >> > > > -             .actions          = "next;",
> >> > > > -             .external_ids     = map_empty());
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "!ct.est && ct.rel && !ct.new &&
> >> > > !ct.inv "
> >> > > > -                                 "&& ct_label.blocked == 0",
> >> > > > -             .actions          = "next;",
> >> > > > -             .external_ids     = map_empty());
> >> > > > +        if (has_stateful) {
> >> > > > +            /* Ingress and Egress ACL Table (Priority 1).
> >> > > > +             *
> >> > > > +             * By default, traffic is allowed.  This is partially
> >> > > handled by
> >> > > > +             * the Priority 0 ACL flows added earlier, but we
> also
> >> need
> >> > > to
> >> > > > +             * commit IP flows.  This is because, while the
> >> initiater's
> >> > > > +             * direction may not have any stateful rules, the
> >> server's
> >> > > may
> >> > > > +             * and then its return traffic would not have an
> >> associated
> >> > > > +             * conntrack entry and would return "+invalid".
> >> > > > +             *
> >> > > > +             * We use "ct_commit" for a connection that is not
> >> already
> >> > > known
> >> > > > +             * by the connection tracker.  Once a connection is
> >> > > committed,
> >> > > > +             * subsequent packets will hit the flow at priority 0
> >> that
> >> > > just
> >> > > > +             * uses "next;"
> >> > > > +             *
> >> > > > +             * We also check for established connections that
> have
> >> > > ct_label.blocked
> >> > > > +             * set on them.  That's a connection that was
> >> disallowed,
> >> > > but is
> >> > > > +             * now allowed by policy again since it hit this
> >> > > default-allow flow.
> >> > > > +             * We need to set ct_label.blocked=0 to let the
> >> connection
> >> > > continue,
> >> > > > +             * which will be done by ct_commit() in the
> "stateful"
> >> stage.
> >> > > > +             * Subsequent packets will hit the flow at priority 0
> >> that
> >> > > just
> >> > > > +             * uses "next;". */
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> >> > > > +                 .priority         = 1,
> >> > > > +                 .__match          = "ip && (!ct.est || (ct.est
> &&
> >> > > ct_label.blocked == 1))",
> >> > > > +                 .actions          =
> "${rEGBIT_CONNTRACK_COMMIT()}
> >> =
> >> 1;
> >> > > next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> >> > > > +                 .priority         = 1,
> >> > > > +                 .__match          = "ip && (!ct.est || (ct.est
> &&
> >> > > ct_label.blocked == 1))",
> >> > > > +                 .actions          =
> "${rEGBIT_CONNTRACK_COMMIT()}
> >> =
> >> 1;
> >> > > next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > >
> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > -         *
> >> > > > -         * Not to do conntrack on ND packets. */
> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > +             *
> >> > > > +             * Always drop traffic that's in an invalid state.
> >> Also
> >> drop
> >> > > > +             * reply direction packets for connections that have
> >> been
> >> > > marked
> >> > > > +             * for deletion (bit 0 of ct_label is set).
> >> > > > +             *
> >> > > > +             * This is enforced at a higher priority than ACLs
> can
> >> be
> >> > > defined. */
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = ct_inv_or ++ "(ct.est &&
> >> ct.rpl
> >> &&
> >> > > ct_label.blocked == 1)",
> >> > > > +                 .actions          = "drop;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = ct_inv_or ++ "(ct.est &&
> >> ct.rpl
> >> &&
> >> > > ct_label.blocked == 1)",
> >> > > > +                 .actions          = "drop;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +
> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > +             *
> >> > > > +             * Allow reply traffic that is part of an established
> >> > > > +             * conntrack entry that has not been marked for
> >> deletion
> >> > > > +             * (bit 0 of ct_label).  We only match traffic in the
> >> > > > +             * reply direction because we want traffic in the
> >> request
> >> > > > +             * direction to hit the currently defined policy from
> >> ACLs.
> >> > > > +             *
> >> > > > +             * This is enforced at a higher priority than ACLs
> can
> >> be
> >> > > defined. */
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = "ct.est && !ct.rel &&
> !ct.new
> >> "
> >> ++
> >> > > and_not_ct_inv ++
> >> > > > +                                     "&& ct.rpl &&
> ct_label.blocked
> >> ==
> >> > > 0",
> >> > > > +                 .actions          = "next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = "ct.est && !ct.rel &&
> !ct.new
> >> "
> >> ++
> >> > > and_not_ct_inv ++
> >> > > > +                                     "&& ct.rpl &&
> ct_label.blocked
> >> ==
> >> > > 0",
> >> > > > +                 .actions          = "next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +
> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > +             *
> >> > > > +             * Allow traffic that is related to an existing
> >> conntrack
> >> > > entry that
> >> > > > +             * has not been marked for deletion (bit 0 of
> >> ct_label).
> >> > > > +             *
> >> > > > +             * This is enforced at a higher priority than ACLs
> can
> >> be
> >> > > defined.
> >> > > > +             *
> >> > > > +             * NOTE: This does not support related data sessions
> >> (eg,
> >> > > > +             * a dynamically negotiated FTP data channel), but
> will
> >> allow
> >> > > > +             * related traffic such as an ICMP Port Unreachable
> >> through
> >> > > > +             * that's generated from a non-listening UDP port.
> */
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = "!ct.est && ct.rel &&
> !ct.new
> >> "
> >> ++
> >> > > and_not_ct_inv ++
> >> > > > +                                     "&& ct_label.blocked == 0",
> >> > > > +                 .actions          = "next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = "!ct.est && ct.rel &&
> !ct.new
> >> "
> >> ++
> >> > > and_not_ct_inv ++
> >> > > > +                                     "&& ct_label.blocked == 0",
> >> > > > +                 .actions          = "next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +
> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
> >> > > > +             *
> >> > > > +             * Not to do conntrack on ND packets. */
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = "nd || nd_ra || nd_rs ||
> mldv1
> >> ||
> >> > > mldv2",
> >> > > > +                 .actions          = "next;",
> >> > > > +                 .external_ids     = map_empty());
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> >> > > > +                 .priority         = 65535,
> >> > > > +                 .__match          = "nd || nd_ra || nd_rs ||
> mldv1
> >> ||
> >> > > mldv2",
> >> > > > +                 .actions          = "next;",
> >> > > > +                 .external_ids     = map_empty())
> >> > > > +        };
> >> > > > +
> >> > > > +        /* Add a 34000 priority flow to advance the DNS reply
> from
> >> > > ovn-controller,
> >> > > > +         * if the CMS has configured DNS records for the
> datapath.
> >> > > > +         */
> >> > > > +        if (sw.has_dns_records) {
> >> > > > +            Flow(.logical_datapath = ls._uuid,
> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
> >> > > > +                 .priority         = 34000,
> >> > > > +                 .__match          = "udp.src == 53",
> >> > > > +                 .actions          = if has_stateful "ct_commit;
> >> next;"
> >> > > else "next;",
> >> > > > +                 .external_ids     = map_empty())
> >> > > > +        };
> >> > > > +
> >> > > > +        /* Add a 34000 priority flow to advance the service
> monitor
> >> reply
> >> > > > +         * packets to skip applying ingress ACLs. */
> >> > > >          Flow(.logical_datapath = ls._uuid,
> >> > > >               .stage            = s_SWITCH_IN_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1
> ||
> >> > > mldv2",
> >> > > > +             .priority         = 34000,
> >> > > > +             .__match          = "eth.dst == $svc_monitor_mac",
> >> > > >               .actions          = "next;",
> >> > > >               .external_ids     = map_empty());
> >> > > > -        Flow(.logical_datapath = ls._uuid,
> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -             .priority         = 65535,
> >> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1
> ||
> >> > > mldv2",
> >> > > > -             .actions          = "next;",
> >> > > > -             .external_ids     = map_empty())
> >> > > > -    };
> >> > > > -
> >> > > > -    /* Add a 34000 priority flow to advance the DNS reply from
> >> > > ovn-controller,
> >> > > > -     * if the CMS has configured DNS records for the datapath.
> >> > > > -     */
> >> > > > -    if (sw.has_dns_records) {
> >> > > >          Flow(.logical_datapath = ls._uuid,
> >> > > >               .stage            = s_SWITCH_OUT_ACL(),
> >> > > >               .priority         = 34000,
> >> > > > -             .__match          = "udp.src == 53",
> >> > > > -             .actions          = if has_stateful "ct_commit;
> next;"
> >> else
> >> > > "next;",
> >> > > > +             .__match          = "eth.src == $svc_monitor_mac",
> >> > > > +             .actions          = "next;",
> >> > > >               .external_ids     = map_empty())
> >> > > > -    };
> >> > > > -
> >> > > > -    /* Add a 34000 priority flow to advance the service monitor
> >> reply
> >> > > > -     * packets to skip applying ingress ACLs. */
> >> > > > -    Flow(.logical_datapath = ls._uuid,
> >> > > > -         .stage            = s_SWITCH_IN_ACL(),
> >> > > > -         .priority         = 34000,
> >> > > > -         .__match          = "eth.dst == $svc_monitor_mac",
> >> > > > -         .actions          = "next;",
> >> > > > -         .external_ids     = map_empty());
> >> > > > -    Flow(.logical_datapath = ls._uuid,
> >> > > > -         .stage            = s_SWITCH_OUT_ACL(),
> >> > > > -         .priority         = 34000,
> >> > > > -         .__match          = "eth.src == $svc_monitor_mac",
> >> > > > -         .actions          = "next;",
> >> > > > -         .external_ids     = map_empty())
> >> > > > +    }
> >> > > >  }
> >> > > >
> >> > > >  /* This stage builds hints for the IN/OUT_ACL stage. Based on
> >> various
> >> > > > diff --git a/ovn-nb.xml b/ovn-nb.xml
> >> > > > index feb38b5d31..e337be4eb3 100644
> >> > > > --- a/ovn-nb.xml
> >> > > > +++ b/ovn-nb.xml
> >> > > > @@ -240,6 +240,21 @@
> >> > > >          </p>
> >> > > >        </column>
> >> > > >
> >> > > > +      <column name="options" key="use_ct_inv_match">
> >> > > > +        <p>
> >> > > > +          If set to false, <code>ovn-northd</code> will not use
> the
> >> > > > +          <code>ct.inv</code> field in any of the logical flow
> >> matches.
> >> > > > +          The default value is true.  If the NIC supports
> >> offloading
> >> > > > +          OVS datapath flows but doesn't support offloading
> >> ct_state
> >> > > > +          <code>inv</code> flag, then the datapath flows matching
> >> on
> >> > > this flag
> >> > > > +          (either <code>+inv</code> or <code>-inv</code>) will
> not
> >> be
> >> > > > +          offloaded.  CMS should consider setting
> >> > > <code>use_ct_inv_match</code>
> >> > > > +          to <code>false</code> in such cases.  This results in a
> >> side
> >> > > effect
> >> > > > +          of the invalid packets getting delivered to the
> >> destination
> >> > > VIF, which
> >> > > > +          otherwise would have been dropped by <code>OVN</code>.
> >> > > > +        </p>
> >> > > > +      </column>
> >> > > > +
> >> > > >        <group title="Options for configuring interconnection route
> >> > > advertisement">
> >> > > >          <p>
> >> > > >            These options control how routes are advertised between
> >> OVN
> >> > > > --
> >> > > > 2.29.2
> >> > > >
> >> > > > _______________________________________________
> >> > > > dev mailing list
> >> > > > dev@openvswitch.org
> >> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >> > >
> >> > > Acked-by: Han Zhou <hzhou@ovn.org>
> >> >
> >> > Thanks for the reviews Mark G and Han.
> >> >
> >> > I applied both the patches to the main branch addressing Han's comment
> >> in
> >> p1.
> >> > When I submitted this patch, I had a test case which somehow got
> removed
> >> > in the later versions.  I added the test case before applying.
> >> >
> >> > Regards
> >> > Numan
> >>
> >> Hi Numan,
> >>
> >> Seems this merge breaks the CI. All these vtep related cases are
> failing:
> >> 898: ovn-controller-vtep - chassis FAILED (ovn-controller-vtep.at:110)
> >> 899: ovn-controller-vtep - binding 1 FAILED (ovn-controller-vtep.at:178
> )
> >> 900: ovn-controller-vtep - binding 2 FAILED (ovn-controller-vtep.at:252
> )
> >> 901: ovn-controller-vtep - vtep-lswitch FAILED (
> >> ovn-controller-vtep.at:291)
> >> 902: ovn-controller-vtep - vtep-macs 1 FAILED (
> ovn-controller-vtep.at:343
> >> )
> >> 903: ovn-controller-vtep - vtep-macs 2 FAILED (
> ovn-controller-vtep.at:431
> >> )
> >>
> >> All these tests are successful when I run them locally. I haven't
> checked
> >> more details yet.
> >>
> >
> > Hi Han,
> >
> > I noticed that. In fact it's failing without these patches too.
> >
> > I quickly looked into it and looks like it's failing because of some
> > warning message related to dns_resolve in ovsdb-server log.
> >
> > I think wen need to white list this warning message in vtep tests.
> >
> >
> > Thanks
> > Numan
> >
> > ok. Thanks for checking! Do you happen to know what change introduced
> this? Also I wonder why I don't see these failures locally. Is it test
> server's problem (regarding DNS?)
>

I didn't get a chance to look further.  If you're planning to look into it
then please do so.

Thanks
Numan


> >
> >
> >>
> >>
> >> Thanks,
> >> Han
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou May 7, 2021, 6:11 a.m. UTC | #8
On Thu, May 6, 2021 at 7:21 PM Numan Siddique <numans@ovn.org> wrote:

>
>
> On Thu, May 6, 2021, 9:17 PM Han Zhou <hzhou@ovn.org> wrote:
>
>> On Thu, May 6, 2021 at 6:13 PM Numan Siddique <numans@ovn.org> wrote:
>>
>> >
>> >
>> > On Thu, May 6, 2021, 7:00 PM Han Zhou <hzhou@ovn.org> wrote:
>> >
>> >> On Thu, May 6, 2021 at 11:41 AM Numan Siddique <numans@ovn.org> wrote:
>> >> >
>> >> > On Thu, May 6, 2021 at 2:35 AM Han Zhou <hzhou@ovn.org> wrote:
>> >> > >
>> >> > > On Thu, Apr 22, 2021 at 6:25 AM <numans@ovn.org> wrote:
>> >> > > >
>> >> > > > From: Numan Siddique <numans@ovn.org>
>> >> > > >
>> >> > > > Presently we add 65535 priority lflows in the stages -
>> >> > > > 'ls_in_acl' and 'ls_out_acl' to drop packets which
>> >> > > > match on 'ct.inv'.
>> >> > > >
>> >> > > > This patch provides an option to not use this field in the
>> >> > > > logical flow matches for the following
>> >> > > > reasons:
>> >> > > >
>> >> > > >  • Some of the smart NICs which support offloading datapath
>> >> > > >    flows may not support this field.  In such cases, almost
>> >> > > >    all the datapath flows (matching on the ct_state field +inv
>> >> > > >    or -inv cannot be offloaded if ACLs/load balancers
>> >> > > >    are configured on the logical switch datapath.
>> >> > > >
>> >> > > >  • A recent commit in kernel ovs datapath sets the committed
>> >> > > >    connection tracking entry to be liberal for out-of-window
>> >> > > >    tcp packets (nf_ct_set_tcp_be_liberal()).  Such TCP
>> >> > > >    packets will not be marked as invalid.
>> >> > > >
>> >> > > > There are still certain scenarios where a packet can be marked
>> >> > > > invalid.  Like
>> >> > > >  • If the tcp flags are not correct.
>> >> > > >
>> >> > > >  • If there are checksum errors.
>> >> > > >
>> >> > > > In such cases, the packet will be delivered to the destination
>> >> > > > port.  So CMS should evaluate their usecases before enabling
>> >> > > > this option.
>> >> > > >
>> >> > > > Signed-off-by: Numan Siddique <numans@ovn.org>
>> >> > > > Co-authored-by: Ben Pfaff <blp@ovn.org>
>> >> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
>> >> > > > ---
>> >> > > > v3 -> v4
>> >> > > > ---
>> >> > > >   * Moved the static variable 'use_ct_inv_match' declaration up
>> in
>> >> the
>> >> > > >     file.
>> >> > > >
>> >> > > > v2 -> v3
>> >> > > > ----
>> >> > > >   * Updated the documentation in ovn-nb.xml as per Mark G's
>> >> suggestion.
>> >> > > >
>> >> > > > v1 -> v2
>> >> > > > ----
>> >> > > >   * Adopted the code changes shared by Ben Pfaff for the ddlog
>> code.
>> >> > > >
>> >> > > >  NEWS                 |   7 +-
>> >> > > >  northd/lswitch.dl    |   7 +
>> >> > > >  northd/ovn-northd.c  |  42 +++---
>> >> > > >  northd/ovn_northd.dl | 304
>> >> ++++++++++++++++++++++---------------------
>> >> > > >  ovn-nb.xml           |  15 +++
>> >> > > >  5 files changed, 208 insertions(+), 167 deletions(-)
>> >> > > >
>> >> > > > diff --git a/NEWS b/NEWS
>> >> > > > index 1ddde15f86..5f1365775d 100644
>> >> > > > --- a/NEWS
>> >> > > > +++ b/NEWS
>> >> > > > @@ -6,11 +6,16 @@ Post-v21.03.0
>> >> > > >      expected to scale better than the C implementation, for
>> large
>> >> > > deployments.
>> >> > > >      (This may take testing and tuning to be effective.)  This
>> >> version of
>> >> > > OVN
>> >> > > >      requires DDLog 0.36.
>> >> > > > -  - Introduce ovn-controller incremetal processing engine
>> >> statistics
>> >> > > > +  - Introduce ovn-controller incremental processing engine
>> >> statistics
>> >> > > >    - Introduced parallel processing in ovn-northd with the
>> NB_Global
>> >> > > config option
>> >> > > >      'use_parallel_build' to enable it.  It is disabled by
>> default.
>> >> > > >    - Support vlan-passthru mode for tag=0 localnet ports.
>> >> > > >    - Support custom 802.11ad EthType for localnet ports.
>> >> > > > +  - Add a new NB Global option - use_ct_inv_match with the
>> default
>> >> value
>> >> > > of true.
>> >> > > > +    If this is set to false, then the logical field - "ct.inv"
>> will
>> >> not
>> >> > > be
>> >> > > > +    used in the logical flow matches.  CMS can consider setting
>> >> this
>> >> to
>> >> > > false,
>> >> > > > +    if they want to use smart NICs which don't support
>> offloading
>> >> > > datapath flows
>> >> > > > +    with this field used.
>> >> > > >
>> >> > > >  OVN v21.03.0 - 12 Mar 2021
>> >> > > >  -------------------------
>> >> > > > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
>> >> > > > index 47c497e0cf..27ac3cadc5 100644
>> >> > > > --- a/northd/lswitch.dl
>> >> > > > +++ b/northd/lswitch.dl
>> >> > > > @@ -742,3 +742,10 @@ function put_svc_monitor_mac(options:
>> >> > > Map<string,string>,
>> >> > > >  relation SvcMonitorMac(mac: eth_addr)
>> >> > > >  SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
>> >> > > >      nb::NB_Global(._uuid = uuid, .options = options).
>> >> > > > +
>> >> > > > +relation UseCtInvMatch[bool]
>> >> > > > +UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
>> >> > > > +    nb::NB_Global(.options = options).
>> >> > > > +UseCtInvMatch[true] :-
>> >> > > > +    Unit(),
>> >> > > > +    not nb in nb::NB_Global().
>> >> > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> >> > > > index de1aa896d3..5f6c0a6922 100644
>> >> > > > --- a/northd/ovn-northd.c
>> >> > > > +++ b/northd/ovn-northd.c
>> >> > > > @@ -98,6 +98,10 @@ static bool check_lsp_is_up;
>> >> > > >  static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
>> >> > > >  static struct eth_addr svc_monitor_mac_ea;
>> >> > > >
>> >> > > > +/* If this option is 'true' northd will make use of ct.inv match
>> >> fields.
>> >> > > > + * Otherwise, it will avoid using it.  The default is true. */
>> >> > > > +static bool use_ct_inv_match = true;
>> >> > > > +
>> >> > > >  /* Default probe interval for NB and SB DB connections. */
>> >> > > >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>> >> > > >  static int northd_probe_interval_nb = 0;
>> >> > > > @@ -4099,7 +4103,6 @@ do_ovn_lflow_add(struct hmap *lflow_map,
>> bool
>> >> > > shared,
>> >> > > >      hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
>> >> > > >  }
>> >> > > >
>> >> > > > -
>> >> > > >  /* Adds a row with the specified contents to the Logical_Flow
>> >> table.
>> >> */
>> >> > > >  static void
>> >> > > >  ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath
>> *od,
>> >> > > > @@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od,
>> struct
>> >> hmap
>> >> > > *lflows,
>> >> > > >           * for deletion (bit 0 of ct_label is set).
>> >> > > >           *
>> >> > > >           * This is enforced at a higher priority than ACLs can
>> be
>> >> > > defined. */
>> >> > > > +        char *match = xasprintf("%s(ct.est && ct.rpl &&
>> >> ct_label.blocked
>> >> > > == 1)",
>> >> > > > +                                use_ct_inv_match ? "ct.inv || "
>> :
>> >> "");
>> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
>> >> > > > -                      "ct.inv || (ct.est && ct.rpl &&
>> >> ct_label.blocked
>> >> > > == 1)",
>> >> > > > -                      "drop;");
>> >> > > > +                      match, "drop;");
>> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
>> >> > > > -                      "ct.inv || (ct.est && ct.rpl &&
>> >> ct_label.blocked
>> >> > > == 1)",
>> >> > > > -                      "drop;");
>> >> > > > +                      match, "drop;");
>> >> > > > +        free(match);
>> >> > > >
>> >> > > >          /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > >           *
>> >> > > > @@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od,
>> struct
>> >> hmap
>> >> > > *lflows,
>> >> > > >           * direction to hit the currently defined policy from
>> ACLs.
>> >> > > >           *
>> >> > > >           * This is enforced at a higher priority than ACLs can
>> be
>> >> > > defined. */
>> >> > > > +        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
>> >> > > > +                          "ct.rpl && ct_label.blocked == 0",
>> >> > > > +                          use_ct_inv_match ? " && !ct.inv" :
>> "");
>> >> > > > +
>> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
>> >> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
>> >> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
>> >> > > > -                      "next;");
>> >> > > > +                      match, "next;");
>> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
>> >> > > > -                      "ct.est && !ct.rel && !ct.new && !ct.inv "
>> >> > > > -                      "&& ct.rpl && ct_label.blocked == 0",
>> >> > > > -                      "next;");
>> >> > > > +                      match, "next;");
>> >> > > > +        free(match);
>> >> > > >
>> >> > > >          /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > >           *
>> >> > > > @@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od,
>> struct
>> >> hmap
>> >> > > *lflows,
>> >> > > >           * a dynamically negotiated FTP data channel), but will
>> >> allow
>> >> > > >           * related traffic such as an ICMP Port Unreachable
>> through
>> >> > > >           * that's generated from a non-listening UDP port.  */
>> >> > > > +        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
>> >> > > > +                          "ct_label.blocked == 0",
>> >> > > > +                          use_ct_inv_match ? " && !ct.inv" :
>> "");
>> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
>> >> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
>> >> > > > -                      "&& ct_label.blocked == 0",
>> >> > > > -                      "next;");
>> >> > > > +                      match, "next;");
>> >> > > >          ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
>> >> > > > -                      "!ct.est && ct.rel && !ct.new && !ct.inv "
>> >> > > > -                      "&& ct_label.blocked == 0",
>> >> > > > -                      "next;");
>> >> > > > +                      match, "next;");
>> >> > > > +        free(match);
>> >> > > >
>> >> > > >          /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > >           *
>> >> > > > @@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
>> >> > > >
>> >> > > >      use_logical_dp_groups = smap_get_bool(&nb->options,
>> >> > > >
>> "use_logical_dp_groups",
>> >> > > false);
>> >> > > > +    use_ct_inv_match = smap_get_bool(&nb->options,
>> >> > > > +                                     "use_ct_inv_match", true);
>> >> > > > +
>> >> > > >      /* deprecated, use --event instead */
>> >> > > >      controller_event_en = smap_get_bool(&nb->options,
>> >> > > >                                          "controller_event",
>> false);
>> >> > > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
>> >> > > > index 2f5d36c599..a0f7944103 100644
>> >> > > > --- a/northd/ovn_northd.dl
>> >> > > > +++ b/northd/ovn_northd.dl
>> >> > > > @@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage,
>> acl,
>> >> > > fair_meter, extra_match_, extra_action
>> >> > > >  }
>> >> > > >
>> >> > > >  /* build_acls */
>> >> > > > -for (sw in &Switch(.ls = ls))
>> >> > > > -var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
>> >> > > > -{
>> >> > > > -    /* Ingress and Egress ACL Table (Priority 0): Packets are
>> >> allowed by
>> >> > > > -     * default.  A related rule at priority 1 is added below if
>> >> there
>> >> > > > -     * are any stateful ACLs in this datapath. */
>> >> > > > -    Flow(.logical_datapath = ls._uuid,
>> >> > > > -         .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -         .priority         = 0,
>> >> > > > -         .__match          = "1",
>> >> > > > -         .actions          = "next;",
>> >> > > > -         .external_ids     = map_empty());
>> >> > > > -    Flow(.logical_datapath = ls._uuid,
>> >> > > > -         .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -         .priority         = 0,
>> >> > > > -         .__match          = "1",
>> >> > > > -         .actions          = "next;",
>> >> > > > -         .external_ids     = map_empty());
>> >> > > > -
>> >> > > > -    if (has_stateful) {
>> >> > > > -        /* Ingress and Egress ACL Table (Priority 1).
>> >> > > > -         *
>> >> > > > -         * By default, traffic is allowed.  This is partially
>> >> handled by
>> >> > > > -         * the Priority 0 ACL flows added earlier, but we also
>> need
>> >> to
>> >> > > > -         * commit IP flows.  This is because, while the
>> initiater's
>> >> > > > -         * direction may not have any stateful rules, the
>> server's
>> >> may
>> >> > > > -         * and then its return traffic would not have an
>> associated
>> >> > > > -         * conntrack entry and would return "+invalid".
>> >> > > > -         *
>> >> > > > -         * We use "ct_commit" for a connection that is not
>> already
>> >> known
>> >> > > > -         * by the connection tracker.  Once a connection is
>> >> committed,
>> >> > > > -         * subsequent packets will hit the flow at priority 0
>> that
>> >> just
>> >> > > > -         * uses "next;"
>> >> > > > -         *
>> >> > > > -         * We also check for established connections that have
>> >> > > ct_label.blocked
>> >> > > > -         * set on them.  That's a connection that was
>> disallowed,
>> >> but is
>> >> > > > -         * now allowed by policy again since it hit this
>> >> default-allow
>> >> > > flow.
>> >> > > > -         * We need to set ct_label.blocked=0 to let the
>> connection
>> >> > > continue,
>> >> > > > -         * which will be done by ct_commit() in the "stateful"
>> >> stage.
>> >> > > > -         * Subsequent packets will hit the flow at priority 0
>> that
>> >> just
>> >> > > > -         * uses "next;". */
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -             .priority         = 1,
>> >> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
>> >> > > ct_label.blocked == 1))",
>> >> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()}
>> = 1;
>> >> > > next;",
>> >> > > > -             .external_ids     = map_empty());
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -             .priority         = 1,
>> >> > > > -             .__match          = "ip && (!ct.est || (ct.est &&
>> >> > > ct_label.blocked == 1))",
>> >> > > > -             .actions          = "${rEGBIT_CONNTRACK_COMMIT()}
>> = 1;
>> >> > > next;",
>> >> > > > -             .external_ids     = map_empty());
>> >> > > > -
>> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > -         *
>> >> > > > -         * Always drop traffic that's in an invalid state.  Also
>> >> drop
>> >> > > > -         * reply direction packets for connections that have
>> been
>> >> marked
>> >> > > > -         * for deletion (bit 0 of ct_label is set).
>> >> > > > -         *
>> >> > > > -         * This is enforced at a higher priority than ACLs can
>> be
>> >> > > defined. */
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
>> >> > > ct_label.blocked == 1)",
>> >> > > > -             .actions          = "drop;",
>> >> > > > -             .external_ids     = map_empty());
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "ct.inv || (ct.est && ct.rpl &&
>> >> > > ct_label.blocked == 1)",
>> >> > > > -             .actions          = "drop;",
>> >> > > > -             .external_ids     = map_empty());
>> >> > > > -
>> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > -         *
>> >> > > > -         * Allow reply traffic that is part of an established
>> >> > > > -         * conntrack entry that has not been marked for deletion
>> >> > > > -         * (bit 0 of ct_label).  We only match traffic in the
>> >> > > > -         * reply direction because we want traffic in the
>> request
>> >> > > > -         * direction to hit the currently defined policy from
>> ACLs.
>> >> > > > -         *
>> >> > > > -         * This is enforced at a higher priority than ACLs can
>> be
>> >> > > defined. */
>> >> > > > +for (UseCtInvMatch[use_ct_inv_match]) {
>> >> > > > +    (var ct_inv_or, var and_not_ct_inv) = match
>> (use_ct_inv_match)
>> >> {
>> >> > > > +        true -> ("ct.inv || ", "&& !ct.inv "),
>> >> > > > +        false -> ("", ""),
>> >> > > > +    } in
>> >> > > > +    for (sw in &Switch(.ls = ls))
>> >> > > > +    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
>> >> > > > +    {
>> >> > > > +        /* Ingress and Egress ACL Table (Priority 0): Packets
>> are
>> >> > > allowed by
>> >> > > > +         * default.  A related rule at priority 1 is added
>> below if
>> >> there
>> >> > > > +         * are any stateful ACLs in this datapath. */
>> >> > > >          Flow(.logical_datapath = ls._uuid,
>> >> > > >               .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "ct.est && !ct.rel && !ct.new
>> &&
>> >> > > !ct.inv "
>> >> > > > -                                 "&& ct.rpl && ct_label.blocked
>> ==
>> >> 0",
>> >> > > > +             .priority         = 0,
>> >> > > > +             .__match          = "1",
>> >> > > >               .actions          = "next;",
>> >> > > >               .external_ids     = map_empty());
>> >> > > >          Flow(.logical_datapath = ls._uuid,
>> >> > > >               .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "ct.est && !ct.rel && !ct.new
>> &&
>> >> > > !ct.inv "
>> >> > > > -                                 "&& ct.rpl && ct_label.blocked
>> ==
>> >> 0",
>> >> > > > +             .priority         = 0,
>> >> > > > +             .__match          = "1",
>> >> > > >               .actions          = "next;",
>> >> > > >               .external_ids     = map_empty());
>> >> > > >
>> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > -         *
>> >> > > > -         * Allow traffic that is related to an existing
>> conntrack
>> >> entry
>> >> > > that
>> >> > > > -         * has not been marked for deletion (bit 0 of ct_label).
>> >> > > > -         *
>> >> > > > -         * This is enforced at a higher priority than ACLs can
>> be
>> >> > > defined.
>> >> > > > -         *
>> >> > > > -         * NOTE: This does not support related data sessions
>> (eg,
>> >> > > > -         * a dynamically negotiated FTP data channel), but will
>> >> allow
>> >> > > > -         * related traffic such as an ICMP Port Unreachable
>> through
>> >> > > > -         * that's generated from a non-listening UDP port.  */
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "!ct.est && ct.rel && !ct.new
>> &&
>> >> > > !ct.inv "
>> >> > > > -                                 "&& ct_label.blocked == 0",
>> >> > > > -             .actions          = "next;",
>> >> > > > -             .external_ids     = map_empty());
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "!ct.est && ct.rel && !ct.new
>> &&
>> >> > > !ct.inv "
>> >> > > > -                                 "&& ct_label.blocked == 0",
>> >> > > > -             .actions          = "next;",
>> >> > > > -             .external_ids     = map_empty());
>> >> > > > +        if (has_stateful) {
>> >> > > > +            /* Ingress and Egress ACL Table (Priority 1).
>> >> > > > +             *
>> >> > > > +             * By default, traffic is allowed.  This is
>> partially
>> >> > > handled by
>> >> > > > +             * the Priority 0 ACL flows added earlier, but we
>> also
>> >> need
>> >> > > to
>> >> > > > +             * commit IP flows.  This is because, while the
>> >> initiater's
>> >> > > > +             * direction may not have any stateful rules, the
>> >> server's
>> >> > > may
>> >> > > > +             * and then its return traffic would not have an
>> >> associated
>> >> > > > +             * conntrack entry and would return "+invalid".
>> >> > > > +             *
>> >> > > > +             * We use "ct_commit" for a connection that is not
>> >> already
>> >> > > known
>> >> > > > +             * by the connection tracker.  Once a connection is
>> >> > > committed,
>> >> > > > +             * subsequent packets will hit the flow at priority
>> 0
>> >> that
>> >> > > just
>> >> > > > +             * uses "next;"
>> >> > > > +             *
>> >> > > > +             * We also check for established connections that
>> have
>> >> > > ct_label.blocked
>> >> > > > +             * set on them.  That's a connection that was
>> >> disallowed,
>> >> > > but is
>> >> > > > +             * now allowed by policy again since it hit this
>> >> > > default-allow flow.
>> >> > > > +             * We need to set ct_label.blocked=0 to let the
>> >> connection
>> >> > > continue,
>> >> > > > +             * which will be done by ct_commit() in the
>> "stateful"
>> >> stage.
>> >> > > > +             * Subsequent packets will hit the flow at priority
>> 0
>> >> that
>> >> > > just
>> >> > > > +             * uses "next;". */
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> >> > > > +                 .priority         = 1,
>> >> > > > +                 .__match          = "ip && (!ct.est || (ct.est
>> &&
>> >> > > ct_label.blocked == 1))",
>> >> > > > +                 .actions          =
>> "${rEGBIT_CONNTRACK_COMMIT()}
>> >> =
>> >> 1;
>> >> > > next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > +                 .priority         = 1,
>> >> > > > +                 .__match          = "ip && (!ct.est || (ct.est
>> &&
>> >> > > ct_label.blocked == 1))",
>> >> > > > +                 .actions          =
>> "${rEGBIT_CONNTRACK_COMMIT()}
>> >> =
>> >> 1;
>> >> > > next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > >
>> >> > > > -        /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > -         *
>> >> > > > -         * Not to do conntrack on ND packets. */
>> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > +             *
>> >> > > > +             * Always drop traffic that's in an invalid state.
>> >> Also
>> >> drop
>> >> > > > +             * reply direction packets for connections that have
>> >> been
>> >> > > marked
>> >> > > > +             * for deletion (bit 0 of ct_label is set).
>> >> > > > +             *
>> >> > > > +             * This is enforced at a higher priority than ACLs
>> can
>> >> be
>> >> > > defined. */
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = ct_inv_or ++ "(ct.est &&
>> >> ct.rpl
>> >> &&
>> >> > > ct_label.blocked == 1)",
>> >> > > > +                 .actions          = "drop;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = ct_inv_or ++ "(ct.est &&
>> >> ct.rpl
>> >> &&
>> >> > > ct_label.blocked == 1)",
>> >> > > > +                 .actions          = "drop;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +
>> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > +             *
>> >> > > > +             * Allow reply traffic that is part of an
>> established
>> >> > > > +             * conntrack entry that has not been marked for
>> >> deletion
>> >> > > > +             * (bit 0 of ct_label).  We only match traffic in
>> the
>> >> > > > +             * reply direction because we want traffic in the
>> >> request
>> >> > > > +             * direction to hit the currently defined policy
>> from
>> >> ACLs.
>> >> > > > +             *
>> >> > > > +             * This is enforced at a higher priority than ACLs
>> can
>> >> be
>> >> > > defined. */
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = "ct.est && !ct.rel &&
>> !ct.new
>> >> "
>> >> ++
>> >> > > and_not_ct_inv ++
>> >> > > > +                                     "&& ct.rpl &&
>> ct_label.blocked
>> >> ==
>> >> > > 0",
>> >> > > > +                 .actions          = "next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = "ct.est && !ct.rel &&
>> !ct.new
>> >> "
>> >> ++
>> >> > > and_not_ct_inv ++
>> >> > > > +                                     "&& ct.rpl &&
>> ct_label.blocked
>> >> ==
>> >> > > 0",
>> >> > > > +                 .actions          = "next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +
>> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > +             *
>> >> > > > +             * Allow traffic that is related to an existing
>> >> conntrack
>> >> > > entry that
>> >> > > > +             * has not been marked for deletion (bit 0 of
>> >> ct_label).
>> >> > > > +             *
>> >> > > > +             * This is enforced at a higher priority than ACLs
>> can
>> >> be
>> >> > > defined.
>> >> > > > +             *
>> >> > > > +             * NOTE: This does not support related data sessions
>> >> (eg,
>> >> > > > +             * a dynamically negotiated FTP data channel), but
>> will
>> >> allow
>> >> > > > +             * related traffic such as an ICMP Port Unreachable
>> >> through
>> >> > > > +             * that's generated from a non-listening UDP port.
>> */
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = "!ct.est && ct.rel &&
>> !ct.new
>> >> "
>> >> ++
>> >> > > and_not_ct_inv ++
>> >> > > > +                                     "&& ct_label.blocked == 0",
>> >> > > > +                 .actions          = "next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = "!ct.est && ct.rel &&
>> !ct.new
>> >> "
>> >> ++
>> >> > > and_not_ct_inv ++
>> >> > > > +                                     "&& ct_label.blocked == 0",
>> >> > > > +                 .actions          = "next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +
>> >> > > > +            /* Ingress and Egress ACL Table (Priority 65535).
>> >> > > > +             *
>> >> > > > +             * Not to do conntrack on ND packets. */
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_IN_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = "nd || nd_ra || nd_rs ||
>> mldv1
>> >> ||
>> >> > > mldv2",
>> >> > > > +                 .actions          = "next;",
>> >> > > > +                 .external_ids     = map_empty());
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > +                 .priority         = 65535,
>> >> > > > +                 .__match          = "nd || nd_ra || nd_rs ||
>> mldv1
>> >> ||
>> >> > > mldv2",
>> >> > > > +                 .actions          = "next;",
>> >> > > > +                 .external_ids     = map_empty())
>> >> > > > +        };
>> >> > > > +
>> >> > > > +        /* Add a 34000 priority flow to advance the DNS reply
>> from
>> >> > > ovn-controller,
>> >> > > > +         * if the CMS has configured DNS records for the
>> datapath.
>> >> > > > +         */
>> >> > > > +        if (sw.has_dns_records) {
>> >> > > > +            Flow(.logical_datapath = ls._uuid,
>> >> > > > +                 .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > +                 .priority         = 34000,
>> >> > > > +                 .__match          = "udp.src == 53",
>> >> > > > +                 .actions          = if has_stateful "ct_commit;
>> >> next;"
>> >> > > else "next;",
>> >> > > > +                 .external_ids     = map_empty())
>> >> > > > +        };
>> >> > > > +
>> >> > > > +        /* Add a 34000 priority flow to advance the service
>> monitor
>> >> reply
>> >> > > > +         * packets to skip applying ingress ACLs. */
>> >> > > >          Flow(.logical_datapath = ls._uuid,
>> >> > > >               .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1
>> ||
>> >> > > mldv2",
>> >> > > > +             .priority         = 34000,
>> >> > > > +             .__match          = "eth.dst == $svc_monitor_mac",
>> >> > > >               .actions          = "next;",
>> >> > > >               .external_ids     = map_empty());
>> >> > > > -        Flow(.logical_datapath = ls._uuid,
>> >> > > > -             .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -             .priority         = 65535,
>> >> > > > -             .__match          = "nd || nd_ra || nd_rs || mldv1
>> ||
>> >> > > mldv2",
>> >> > > > -             .actions          = "next;",
>> >> > > > -             .external_ids     = map_empty())
>> >> > > > -    };
>> >> > > > -
>> >> > > > -    /* Add a 34000 priority flow to advance the DNS reply from
>> >> > > ovn-controller,
>> >> > > > -     * if the CMS has configured DNS records for the datapath.
>> >> > > > -     */
>> >> > > > -    if (sw.has_dns_records) {
>> >> > > >          Flow(.logical_datapath = ls._uuid,
>> >> > > >               .stage            = s_SWITCH_OUT_ACL(),
>> >> > > >               .priority         = 34000,
>> >> > > > -             .__match          = "udp.src == 53",
>> >> > > > -             .actions          = if has_stateful "ct_commit;
>> next;"
>> >> else
>> >> > > "next;",
>> >> > > > +             .__match          = "eth.src == $svc_monitor_mac",
>> >> > > > +             .actions          = "next;",
>> >> > > >               .external_ids     = map_empty())
>> >> > > > -    };
>> >> > > > -
>> >> > > > -    /* Add a 34000 priority flow to advance the service monitor
>> >> reply
>> >> > > > -     * packets to skip applying ingress ACLs. */
>> >> > > > -    Flow(.logical_datapath = ls._uuid,
>> >> > > > -         .stage            = s_SWITCH_IN_ACL(),
>> >> > > > -         .priority         = 34000,
>> >> > > > -         .__match          = "eth.dst == $svc_monitor_mac",
>> >> > > > -         .actions          = "next;",
>> >> > > > -         .external_ids     = map_empty());
>> >> > > > -    Flow(.logical_datapath = ls._uuid,
>> >> > > > -         .stage            = s_SWITCH_OUT_ACL(),
>> >> > > > -         .priority         = 34000,
>> >> > > > -         .__match          = "eth.src == $svc_monitor_mac",
>> >> > > > -         .actions          = "next;",
>> >> > > > -         .external_ids     = map_empty())
>> >> > > > +    }
>> >> > > >  }
>> >> > > >
>> >> > > >  /* This stage builds hints for the IN/OUT_ACL stage. Based on
>> >> various
>> >> > > > diff --git a/ovn-nb.xml b/ovn-nb.xml
>> >> > > > index feb38b5d31..e337be4eb3 100644
>> >> > > > --- a/ovn-nb.xml
>> >> > > > +++ b/ovn-nb.xml
>> >> > > > @@ -240,6 +240,21 @@
>> >> > > >          </p>
>> >> > > >        </column>
>> >> > > >
>> >> > > > +      <column name="options" key="use_ct_inv_match">
>> >> > > > +        <p>
>> >> > > > +          If set to false, <code>ovn-northd</code> will not use
>> the
>> >> > > > +          <code>ct.inv</code> field in any of the logical flow
>> >> matches.
>> >> > > > +          The default value is true.  If the NIC supports
>> >> offloading
>> >> > > > +          OVS datapath flows but doesn't support offloading
>> >> ct_state
>> >> > > > +          <code>inv</code> flag, then the datapath flows
>> matching
>> >> on
>> >> > > this flag
>> >> > > > +          (either <code>+inv</code> or <code>-inv</code>) will
>> not
>> >> be
>> >> > > > +          offloaded.  CMS should consider setting
>> >> > > <code>use_ct_inv_match</code>
>> >> > > > +          to <code>false</code> in such cases.  This results in
>> a
>> >> side
>> >> > > effect
>> >> > > > +          of the invalid packets getting delivered to the
>> >> destination
>> >> > > VIF, which
>> >> > > > +          otherwise would have been dropped by <code>OVN</code>.
>> >> > > > +        </p>
>> >> > > > +      </column>
>> >> > > > +
>> >> > > >        <group title="Options for configuring interconnection
>> route
>> >> > > advertisement">
>> >> > > >          <p>
>> >> > > >            These options control how routes are advertised
>> between
>> >> OVN
>> >> > > > --
>> >> > > > 2.29.2
>> >> > > >
>> >> > > > _______________________________________________
>> >> > > > dev mailing list
>> >> > > > dev@openvswitch.org
>> >> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >> > >
>> >> > > Acked-by: Han Zhou <hzhou@ovn.org>
>> >> >
>> >> > Thanks for the reviews Mark G and Han.
>> >> >
>> >> > I applied both the patches to the main branch addressing Han's
>> comment
>> >> in
>> >> p1.
>> >> > When I submitted this patch, I had a test case which somehow got
>> removed
>> >> > in the later versions.  I added the test case before applying.
>> >> >
>> >> > Regards
>> >> > Numan
>> >>
>> >> Hi Numan,
>> >>
>> >> Seems this merge breaks the CI. All these vtep related cases are
>> failing:
>> >> 898: ovn-controller-vtep - chassis FAILED (ovn-controller-vtep.at:110)
>> >> 899: ovn-controller-vtep - binding 1 FAILED (
>> ovn-controller-vtep.at:178)
>> >> 900: ovn-controller-vtep - binding 2 FAILED (
>> ovn-controller-vtep.at:252)
>> >> 901: ovn-controller-vtep - vtep-lswitch FAILED (
>> >> ovn-controller-vtep.at:291)
>> >> 902: ovn-controller-vtep - vtep-macs 1 FAILED (
>> ovn-controller-vtep.at:343
>> >> )
>> >> 903: ovn-controller-vtep - vtep-macs 2 FAILED (
>> ovn-controller-vtep.at:431
>> >> )
>> >>
>> >> All these tests are successful when I run them locally. I haven't
>> checked
>> >> more details yet.
>> >>
>> >
>> > Hi Han,
>> >
>> > I noticed that. In fact it's failing without these patches too.
>> >
>> > I quickly looked into it and looks like it's failing because of some
>> > warning message related to dns_resolve in ovsdb-server log.
>> >
>> > I think wen need to white list this warning message in vtep tests.
>> >
>> >
>> > Thanks
>> > Numan
>> >
>> > ok. Thanks for checking! Do you happen to know what change introduced
>> this? Also I wonder why I don't see these failures locally. Is it test
>> server's problem (regarding DNS?)
>>
>
> I didn't get a chance to look further.  If you're planning to look into it
> then please do so.
>
>
I simply tried pushing the same branch to my repo, and the jobs all passed
without any issues, but retrying the job on ovn-org repo always failed. It
is likely related to the git servers (at least not related to any of the
recent commits).
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 1ddde15f86..5f1365775d 100644
--- a/NEWS
+++ b/NEWS
@@ -6,11 +6,16 @@  Post-v21.03.0
     expected to scale better than the C implementation, for large deployments.
     (This may take testing and tuning to be effective.)  This version of OVN
     requires DDLog 0.36.
-  - Introduce ovn-controller incremetal processing engine statistics
+  - Introduce ovn-controller incremental processing engine statistics
   - Introduced parallel processing in ovn-northd with the NB_Global config option
     'use_parallel_build' to enable it.  It is disabled by default.
   - Support vlan-passthru mode for tag=0 localnet ports.
   - Support custom 802.11ad EthType for localnet ports.
+  - Add a new NB Global option - use_ct_inv_match with the default value of true.
+    If this is set to false, then the logical field - "ct.inv" will not be
+    used in the logical flow matches.  CMS can consider setting this to false,
+    if they want to use smart NICs which don't support offloading datapath flows
+    with this field used.
 
 OVN v21.03.0 - 12 Mar 2021
 -------------------------
diff --git a/northd/lswitch.dl b/northd/lswitch.dl
index 47c497e0cf..27ac3cadc5 100644
--- a/northd/lswitch.dl
+++ b/northd/lswitch.dl
@@ -742,3 +742,10 @@  function put_svc_monitor_mac(options: Map<string,string>,
 relation SvcMonitorMac(mac: eth_addr)
 SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
     nb::NB_Global(._uuid = uuid, .options = options).
+
+relation UseCtInvMatch[bool]
+UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
+    nb::NB_Global(.options = options).
+UseCtInvMatch[true] :-
+    Unit(),
+    not nb in nb::NB_Global().
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index de1aa896d3..5f6c0a6922 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -98,6 +98,10 @@  static bool check_lsp_is_up;
 static char svc_monitor_mac[ETH_ADDR_STRLEN + 1];
 static struct eth_addr svc_monitor_mac_ea;
 
+/* If this option is 'true' northd will make use of ct.inv match fields.
+ * Otherwise, it will avoid using it.  The default is true. */
+static bool use_ct_inv_match = true;
+
 /* Default probe interval for NB and SB DB connections. */
 #define DEFAULT_PROBE_INTERVAL_MSEC 5000
 static int northd_probe_interval_nb = 0;
@@ -4099,7 +4103,6 @@  do_ovn_lflow_add(struct hmap *lflow_map, bool shared,
     hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
 }
 
-
 /* Adds a row with the specified contents to the Logical_Flow table. */
 static void
 ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
@@ -5712,12 +5715,13 @@  build_acls(struct ovn_datapath *od, struct hmap *lflows,
          * for deletion (bit 0 of ct_label is set).
          *
          * This is enforced at a higher priority than ACLs can be defined. */
+        char *match = xasprintf("%s(ct.est && ct.rpl && ct_label.blocked == 1)",
+                                use_ct_inv_match ? "ct.inv || " : "");
         ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
-                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
-                      "drop;");
+                      match, "drop;");
         ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
-                      "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
-                      "drop;");
+                      match, "drop;");
+        free(match);
 
         /* Ingress and Egress ACL Table (Priority 65535).
          *
@@ -5728,14 +5732,15 @@  build_acls(struct ovn_datapath *od, struct hmap *lflows,
          * direction to hit the currently defined policy from ACLs.
          *
          * This is enforced at a higher priority than ACLs can be defined. */
+        match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
+                          "ct.rpl && ct_label.blocked == 0",
+                          use_ct_inv_match ? " && !ct.inv" : "");
+
         ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
-                      "ct.est && !ct.rel && !ct.new && !ct.inv "
-                      "&& ct.rpl && ct_label.blocked == 0",
-                      "next;");
+                      match, "next;");
         ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
-                      "ct.est && !ct.rel && !ct.new && !ct.inv "
-                      "&& ct.rpl && ct_label.blocked == 0",
-                      "next;");
+                      match, "next;");
+        free(match);
 
         /* Ingress and Egress ACL Table (Priority 65535).
          *
@@ -5748,14 +5753,14 @@  build_acls(struct ovn_datapath *od, struct hmap *lflows,
          * a dynamically negotiated FTP data channel), but will allow
          * related traffic such as an ICMP Port Unreachable through
          * that's generated from a non-listening UDP port.  */
+        match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
+                          "ct_label.blocked == 0",
+                          use_ct_inv_match ? " && !ct.inv" : "");
         ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
-                      "!ct.est && ct.rel && !ct.new && !ct.inv "
-                      "&& ct_label.blocked == 0",
-                      "next;");
+                      match, "next;");
         ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
-                      "!ct.est && ct.rel && !ct.new && !ct.inv "
-                      "&& ct_label.blocked == 0",
-                      "next;");
+                      match, "next;");
+        free(match);
 
         /* Ingress and Egress ACL Table (Priority 65535).
          *
@@ -13195,6 +13200,9 @@  ovnnb_db_run(struct northd_context *ctx,
 
     use_logical_dp_groups = smap_get_bool(&nb->options,
                                           "use_logical_dp_groups", false);
+    use_ct_inv_match = smap_get_bool(&nb->options,
+                                     "use_ct_inv_match", true);
+
     /* deprecated, use --event instead */
     controller_event_en = smap_get_bool(&nb->options,
                                         "controller_event", false);
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 2f5d36c599..a0f7944103 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -2280,173 +2280,179 @@  for (Reject(lsuuid, pipeline, stage, acl, fair_meter, extra_match_, extra_action
 }
 
 /* build_acls */
-for (sw in &Switch(.ls = ls))
-var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
-{
-    /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
-     * default.  A related rule at priority 1 is added below if there
-     * are any stateful ACLs in this datapath. */
-    Flow(.logical_datapath = ls._uuid,
-         .stage            = s_SWITCH_IN_ACL(),
-         .priority         = 0,
-         .__match          = "1",
-         .actions          = "next;",
-         .external_ids     = map_empty());
-    Flow(.logical_datapath = ls._uuid,
-         .stage            = s_SWITCH_OUT_ACL(),
-         .priority         = 0,
-         .__match          = "1",
-         .actions          = "next;",
-         .external_ids     = map_empty());
-
-    if (has_stateful) {
-        /* Ingress and Egress ACL Table (Priority 1).
-         *
-         * By default, traffic is allowed.  This is partially handled by
-         * the Priority 0 ACL flows added earlier, but we also need to
-         * commit IP flows.  This is because, while the initiater's
-         * direction may not have any stateful rules, the server's may
-         * and then its return traffic would not have an associated
-         * conntrack entry and would return "+invalid".
-         *
-         * We use "ct_commit" for a connection that is not already known
-         * by the connection tracker.  Once a connection is committed,
-         * subsequent packets will hit the flow at priority 0 that just
-         * uses "next;"
-         *
-         * We also check for established connections that have ct_label.blocked
-         * set on them.  That's a connection that was disallowed, but is
-         * now allowed by policy again since it hit this default-allow flow.
-         * We need to set ct_label.blocked=0 to let the connection continue,
-         * which will be done by ct_commit() in the "stateful" stage.
-         * Subsequent packets will hit the flow at priority 0 that just
-         * uses "next;". */
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_IN_ACL(),
-             .priority         = 1,
-             .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
-             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
-             .external_ids     = map_empty());
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_OUT_ACL(),
-             .priority         = 1,
-             .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
-             .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
-             .external_ids     = map_empty());
-
-        /* Ingress and Egress ACL Table (Priority 65535).
-         *
-         * Always drop traffic that's in an invalid state.  Also drop
-         * reply direction packets for connections that have been marked
-         * for deletion (bit 0 of ct_label is set).
-         *
-         * This is enforced at a higher priority than ACLs can be defined. */
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_IN_ACL(),
-             .priority         = 65535,
-             .__match          = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
-             .actions          = "drop;",
-             .external_ids     = map_empty());
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_OUT_ACL(),
-             .priority         = 65535,
-             .__match          = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
-             .actions          = "drop;",
-             .external_ids     = map_empty());
-
-        /* Ingress and Egress ACL Table (Priority 65535).
-         *
-         * Allow reply traffic that is part of an established
-         * conntrack entry that has not been marked for deletion
-         * (bit 0 of ct_label).  We only match traffic in the
-         * reply direction because we want traffic in the request
-         * direction to hit the currently defined policy from ACLs.
-         *
-         * This is enforced at a higher priority than ACLs can be defined. */
+for (UseCtInvMatch[use_ct_inv_match]) {
+    (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
+        true -> ("ct.inv || ", "&& !ct.inv "),
+        false -> ("", ""),
+    } in
+    for (sw in &Switch(.ls = ls))
+    var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
+    {
+        /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
+         * default.  A related rule at priority 1 is added below if there
+         * are any stateful ACLs in this datapath. */
         Flow(.logical_datapath = ls._uuid,
              .stage            = s_SWITCH_IN_ACL(),
-             .priority         = 65535,
-             .__match          = "ct.est && !ct.rel && !ct.new && !ct.inv "
-                                 "&& ct.rpl && ct_label.blocked == 0",
+             .priority         = 0,
+             .__match          = "1",
              .actions          = "next;",
              .external_ids     = map_empty());
         Flow(.logical_datapath = ls._uuid,
              .stage            = s_SWITCH_OUT_ACL(),
-             .priority         = 65535,
-             .__match          = "ct.est && !ct.rel && !ct.new && !ct.inv "
-                                 "&& ct.rpl && ct_label.blocked == 0",
+             .priority         = 0,
+             .__match          = "1",
              .actions          = "next;",
              .external_ids     = map_empty());
 
-        /* Ingress and Egress ACL Table (Priority 65535).
-         *
-         * Allow traffic that is related to an existing conntrack entry that
-         * has not been marked for deletion (bit 0 of ct_label).
-         *
-         * This is enforced at a higher priority than ACLs can be defined.
-         *
-         * NOTE: This does not support related data sessions (eg,
-         * a dynamically negotiated FTP data channel), but will allow
-         * related traffic such as an ICMP Port Unreachable through
-         * that's generated from a non-listening UDP port.  */
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_IN_ACL(),
-             .priority         = 65535,
-             .__match          = "!ct.est && ct.rel && !ct.new && !ct.inv "
-                                 "&& ct_label.blocked == 0",
-             .actions          = "next;",
-             .external_ids     = map_empty());
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_OUT_ACL(),
-             .priority         = 65535,
-             .__match          = "!ct.est && ct.rel && !ct.new && !ct.inv "
-                                 "&& ct_label.blocked == 0",
-             .actions          = "next;",
-             .external_ids     = map_empty());
+        if (has_stateful) {
+            /* Ingress and Egress ACL Table (Priority 1).
+             *
+             * By default, traffic is allowed.  This is partially handled by
+             * the Priority 0 ACL flows added earlier, but we also need to
+             * commit IP flows.  This is because, while the initiater's
+             * direction may not have any stateful rules, the server's may
+             * and then its return traffic would not have an associated
+             * conntrack entry and would return "+invalid".
+             *
+             * We use "ct_commit" for a connection that is not already known
+             * by the connection tracker.  Once a connection is committed,
+             * subsequent packets will hit the flow at priority 0 that just
+             * uses "next;"
+             *
+             * We also check for established connections that have ct_label.blocked
+             * set on them.  That's a connection that was disallowed, but is
+             * now allowed by policy again since it hit this default-allow flow.
+             * We need to set ct_label.blocked=0 to let the connection continue,
+             * which will be done by ct_commit() in the "stateful" stage.
+             * Subsequent packets will hit the flow at priority 0 that just
+             * uses "next;". */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_IN_ACL(),
+                 .priority         = 1,
+                 .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
+                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
+                 .external_ids     = map_empty());
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_OUT_ACL(),
+                 .priority         = 1,
+                 .__match          = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
+                 .actions          = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
+                 .external_ids     = map_empty());
 
-        /* Ingress and Egress ACL Table (Priority 65535).
-         *
-         * Not to do conntrack on ND packets. */
+            /* Ingress and Egress ACL Table (Priority 65535).
+             *
+             * Always drop traffic that's in an invalid state.  Also drop
+             * reply direction packets for connections that have been marked
+             * for deletion (bit 0 of ct_label is set).
+             *
+             * This is enforced at a higher priority than ACLs can be defined. */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_IN_ACL(),
+                 .priority         = 65535,
+                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl && ct_label.blocked == 1)",
+                 .actions          = "drop;",
+                 .external_ids     = map_empty());
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_OUT_ACL(),
+                 .priority         = 65535,
+                 .__match          = ct_inv_or ++ "(ct.est && ct.rpl && ct_label.blocked == 1)",
+                 .actions          = "drop;",
+                 .external_ids     = map_empty());
+
+            /* Ingress and Egress ACL Table (Priority 65535).
+             *
+             * Allow reply traffic that is part of an established
+             * conntrack entry that has not been marked for deletion
+             * (bit 0 of ct_label).  We only match traffic in the
+             * reply direction because we want traffic in the request
+             * direction to hit the currently defined policy from ACLs.
+             *
+             * This is enforced at a higher priority than ACLs can be defined. */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_IN_ACL(),
+                 .priority         = 65535,
+                 .__match          = "ct.est && !ct.rel && !ct.new " ++ and_not_ct_inv ++
+                                     "&& ct.rpl && ct_label.blocked == 0",
+                 .actions          = "next;",
+                 .external_ids     = map_empty());
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_OUT_ACL(),
+                 .priority         = 65535,
+                 .__match          = "ct.est && !ct.rel && !ct.new " ++ and_not_ct_inv ++
+                                     "&& ct.rpl && ct_label.blocked == 0",
+                 .actions          = "next;",
+                 .external_ids     = map_empty());
+
+            /* Ingress and Egress ACL Table (Priority 65535).
+             *
+             * Allow traffic that is related to an existing conntrack entry that
+             * has not been marked for deletion (bit 0 of ct_label).
+             *
+             * This is enforced at a higher priority than ACLs can be defined.
+             *
+             * NOTE: This does not support related data sessions (eg,
+             * a dynamically negotiated FTP data channel), but will allow
+             * related traffic such as an ICMP Port Unreachable through
+             * that's generated from a non-listening UDP port.  */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_IN_ACL(),
+                 .priority         = 65535,
+                 .__match          = "!ct.est && ct.rel && !ct.new " ++ and_not_ct_inv ++
+                                     "&& ct_label.blocked == 0",
+                 .actions          = "next;",
+                 .external_ids     = map_empty());
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_OUT_ACL(),
+                 .priority         = 65535,
+                 .__match          = "!ct.est && ct.rel && !ct.new " ++ and_not_ct_inv ++
+                                     "&& ct_label.blocked == 0",
+                 .actions          = "next;",
+                 .external_ids     = map_empty());
+
+            /* Ingress and Egress ACL Table (Priority 65535).
+             *
+             * Not to do conntrack on ND packets. */
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_IN_ACL(),
+                 .priority         = 65535,
+                 .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+                 .actions          = "next;",
+                 .external_ids     = map_empty());
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_OUT_ACL(),
+                 .priority         = 65535,
+                 .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+                 .actions          = "next;",
+                 .external_ids     = map_empty())
+        };
+
+        /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
+         * if the CMS has configured DNS records for the datapath.
+         */
+        if (sw.has_dns_records) {
+            Flow(.logical_datapath = ls._uuid,
+                 .stage            = s_SWITCH_OUT_ACL(),
+                 .priority         = 34000,
+                 .__match          = "udp.src == 53",
+                 .actions          = if has_stateful "ct_commit; next;" else "next;",
+                 .external_ids     = map_empty())
+        };
+
+        /* Add a 34000 priority flow to advance the service monitor reply
+         * packets to skip applying ingress ACLs. */
         Flow(.logical_datapath = ls._uuid,
              .stage            = s_SWITCH_IN_ACL(),
-             .priority         = 65535,
-             .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+             .priority         = 34000,
+             .__match          = "eth.dst == $svc_monitor_mac",
              .actions          = "next;",
              .external_ids     = map_empty());
-        Flow(.logical_datapath = ls._uuid,
-             .stage            = s_SWITCH_OUT_ACL(),
-             .priority         = 65535,
-             .__match          = "nd || nd_ra || nd_rs || mldv1 || mldv2",
-             .actions          = "next;",
-             .external_ids     = map_empty())
-    };
-
-    /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
-     * if the CMS has configured DNS records for the datapath.
-     */
-    if (sw.has_dns_records) {
         Flow(.logical_datapath = ls._uuid,
              .stage            = s_SWITCH_OUT_ACL(),
              .priority         = 34000,
-             .__match          = "udp.src == 53",
-             .actions          = if has_stateful "ct_commit; next;" else "next;",
+             .__match          = "eth.src == $svc_monitor_mac",
+             .actions          = "next;",
              .external_ids     = map_empty())
-    };
-
-    /* Add a 34000 priority flow to advance the service monitor reply
-     * packets to skip applying ingress ACLs. */
-    Flow(.logical_datapath = ls._uuid,
-         .stage            = s_SWITCH_IN_ACL(),
-         .priority         = 34000,
-         .__match          = "eth.dst == $svc_monitor_mac",
-         .actions          = "next;",
-         .external_ids     = map_empty());
-    Flow(.logical_datapath = ls._uuid,
-         .stage            = s_SWITCH_OUT_ACL(),
-         .priority         = 34000,
-         .__match          = "eth.src == $svc_monitor_mac",
-         .actions          = "next;",
-         .external_ids     = map_empty())
+    }
 }
 
 /* This stage builds hints for the IN/OUT_ACL stage. Based on various
diff --git a/ovn-nb.xml b/ovn-nb.xml
index feb38b5d31..e337be4eb3 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -240,6 +240,21 @@ 
         </p>
       </column>
 
+      <column name="options" key="use_ct_inv_match">
+        <p>
+          If set to false, <code>ovn-northd</code> will not use the
+          <code>ct.inv</code> field in any of the logical flow matches.
+          The default value is true.  If the NIC supports offloading
+          OVS datapath flows but doesn't support offloading ct_state
+          <code>inv</code> flag, then the datapath flows matching on this flag
+          (either <code>+inv</code> or <code>-inv</code>) will not be
+          offloaded.  CMS should consider setting <code>use_ct_inv_match</code>
+          to <code>false</code> in such cases.  This results in a side effect
+          of the invalid packets getting delivered to the destination VIF, which
+          otherwise would have been dropped by <code>OVN</code>.
+        </p>
+      </column>
+
       <group title="Options for configuring interconnection route advertisement">
         <p>
           These options control how routes are advertised between OVN