diff mbox series

[ovs-dev,RFC] northd, controller: Commit flows dropped by ACLs in a separate CT zone

Message ID 20221018153342.164530-1-sangana.abhiram@nutanix.com
State RFC
Headers show
Series [ovs-dev,RFC] northd, controller: Commit flows dropped by ACLs in a separate CT zone | expand

Commit Message

Abhiram Sangana Oct. 18, 2022, 3:33 p.m. UTC
To identify connections dropped by ACLs, users can enable logging for ACLs
but this approach does not scale. ACL logging uses "controller" action
which causes a significant spike in the CPU usage of ovs-vswitchd (and
ovn-controller to a lesser extent) even with metering enabled (observed
65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
approach is to use drop sampling (patch by Adrian Moreno currently in
review) but we might miss specific connections of interest with this
approach.

This patch commits connections dropped by ACLs to the connection tracking
table with a specific ACL label that was introduced in 0e0228be (
northd: Add ACL label). The dropped connections are committed in a separate
CT zone so that they can be managed independently.

Each logical port is assigned a new zone for committing dropped flows.
The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.

A new lflow action "ct_commit_drop" is introduced that commits flows
to connection tracking table in a zone identified by
MFF_LOG_ACL_DROP_ZONE register.

An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
instead of silently dropping the packet.

Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
---
 controller/ovn-controller.c  | 23 ++++++++++++++---
 controller/physical.c        | 32 +++++++++++++++++++++--
 include/ovn/actions.h        |  1 +
 include/ovn/logical-fields.h |  1 +
 lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
 lib/ovn-util.c               |  4 +--
 lib/ovn-util.h               |  2 +-
 northd/northd.c              | 14 ++++++++--
 northd/ovn-northd.8.xml      | 14 ++++++++--
 ovn-sb.xml                   | 17 ++++++++++++
 ovs                          |  2 +-
 utilities/ovn-nbctl.c        |  7 ++---
 12 files changed, 151 insertions(+), 16 deletions(-)

Comments

Dumitru Ceara Oct. 19, 2022, 1:09 p.m. UTC | #1
Hi Abhiram,

Thanks for the patch!  I only skimmed the changes so this is not a full
review but more of a discussion starter.

On 10/18/22 17:33, Abhiram Sangana wrote:
> To identify connections dropped by ACLs, users can enable logging for ACLs
> but this approach does not scale. ACL logging uses "controller" action
> which causes a significant spike in the CPU usage of ovs-vswitchd (and
> ovn-controller to a lesser extent) even with metering enabled (observed
> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
> approach is to use drop sampling (patch by Adrian Moreno currently in
> review) but we might miss specific connections of interest with this
> approach.
> 
> This patch commits connections dropped by ACLs to the connection tracking
> table with a specific ACL label that was introduced in 0e0228be (
> northd: Add ACL label). The dropped connections are committed in a separate
> CT zone so that they can be managed independently.
> 

I'm not sure I understand how the CMS can manage this.  How is this
better than sampling?  Committed connections are going to time out at
some point (30 sec by default for udp/icmp with the kernel datapath).
So the CMS will have to continuously monitor the contents of the
conntrack zone?  Aren't we just moving the CPU load somewhere else with
this?  Even so, there's a chance an entry is missed.

> Each logical port is assigned a new zone for committing dropped flows.
> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
> 
> A new lflow action "ct_commit_drop" is introduced that commits flows
> to connection tracking table in a zone identified by
> MFF_LOG_ACL_DROP_ZONE register.
> 
> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
> instead of silently dropping the packet.
> 
> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
> ---
>  controller/ovn-controller.c  | 23 ++++++++++++++---
>  controller/physical.c        | 32 +++++++++++++++++++++--
>  include/ovn/actions.h        |  1 +
>  include/ovn/logical-fields.h |  1 +
>  lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>  lib/ovn-util.c               |  4 +--
>  lib/ovn-util.h               |  2 +-
>  northd/northd.c              | 14 ++++++++--
>  northd/ovn-northd.8.xml      | 14 ++++++++--
>  ovn-sb.xml                   | 17 ++++++++++++
>  ovs                          |  2 +-

This shouldn't need to change the OVS submodule.

Thanks,
Dumitru

>  utilities/ovn-nbctl.c        |  7 ++---
>  12 files changed, 151 insertions(+), 16 deletions(-)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index c97744d57..1ad20fe55 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>      unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>  
>      struct shash_node *shash_node;
> +    const struct binding_lport *lport;
>      SHASH_FOR_EACH (shash_node, binding_lports) {
>          sset_add(&all_users, shash_node->name);
> +
> +        /* Zone for committing dropped connections of a vNIC. */
> +        lport = shash_node->data;
> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
> +        sset_add(&all_users, drop_zone);
> +        free(drop_zone);
>      }
>  
>      /* Local patched datapath (gateway routers) need zones assigned. */
> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>          /* XXX Add method to limit zone assignment to logical router
>           * datapaths with NAT */
> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>          sset_add(&all_users, dnat);
>          sset_add(&all_users, snat);
>          shash_add(&all_lds, dnat, ld);
> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>          /* Check if the requested snat zone has changed for the datapath
>           * or not.  If so, then fall back to full recompute of
>           * ct_zone engine. */
> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>          struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>                                                     snat_dp_zone_key);
>          free(snat_dp_zone_key);
> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>                                          &ct_zones_data->pending);
>                      updated = true;
>                  }
> +                char *drop_zone = alloc_ct_zone_key(
> +                    &t_lport->pb->header_.uuid, "drop");
> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
> +                    alloc_id_to_ct_zone(drop_zone,
> +                                        &ct_zones_data->current,
> +                                        ct_zones_data->bitmap, &scan_start,
> +                                        &ct_zones_data->pending);
> +                    updated = true;
> +                }
> +                free(drop_zone);
>              } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>                  struct simap_node *ct_zone =
>                      simap_find(&ct_zones_data->current,
> diff --git a/controller/physical.c b/controller/physical.c
> index f3c8bddce..fc46669c1 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -60,6 +60,7 @@ struct zone_ids {
>      int ct;                     /* MFF_LOG_CT_ZONE. */
>      int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>      int snat;                   /* MFF_LOG_SNAT_ZONE. */
> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>  };
>  
>  struct tunnel {
> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>  
>      const struct uuid *key = &binding->datapath->header_.uuid;
>  
> -    char *dnat = alloc_nat_zone_key(key, "dnat");
> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>      zone_ids.dnat = simap_get(ct_zones, dnat);
>      free(dnat);
>  
> -    char *snat = alloc_nat_zone_key(key, "snat");
> +    char *snat = alloc_ct_zone_key(key, "snat");
>      zone_ids.snat = simap_get(ct_zones, snat);
>      free(snat);
>  
> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
> +    free(drop_zone);
> +
>      return zone_ids;
>  }
>  
> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>          if (zone_ids->snat) {
>              put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>          }
> +        if (zone_ids->drop) {
> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
> +        }
>      }
>  }
>  
> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>                      pb->header_.uuid.parts[0], &match, ofpacts_p,
>                      &pb->header_.uuid);
>  
> +    if (zone_ids->drop) {
> +        /* Table 39, Priority 1.
> +         * =======================
> +         *
> +         * Clear the logical registers (for consistent behavior with packets
> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
> +        match_init_catchall(&match);
> +        ofpbuf_clear(ofpacts_p);
> +        match_set_metadata(&match, htonll(dp_key));
> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
> +            }
> +        }
> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
> +                        pb->datapath->header_.uuid.parts[0], &match,
> +                        ofpacts_p, &pb->datapath->header_.uuid);
> +    }
> +
>      /* Table 39, Priority 100.
>       * =======================
>       *
> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
> index d7ee84dac..6424250a6 100644
> --- a/include/ovn/actions.h
> +++ b/include/ovn/actions.h
> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>      OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>      OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>      OVNACT(CHK_ECMP_NH,       ovnact_result)          \
> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>  
>  /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>  enum OVS_PACKED_ENUM ovnact_type {
> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
> index 3db7265e4..889f5f9e3 100644
> --- a/include/ovn/logical-fields.h
> +++ b/include/ovn/logical-fields.h
> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>  #define MFF_LOG_REG0             MFF_REG0
>  #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>  #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>  
>  #define MFF_LOG_XXREG0           MFF_XXREG0
>  #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
> diff --git a/lib/actions.c b/lib/actions.c
> index adbb42db4..dfff3e793 100644
> --- a/lib/actions.c
> +++ b/lib/actions.c
> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>                             MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>  }
>  
> +static void
> +parse_ct_commit_drop(struct action_context *ctx)
> +{
> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
> +}
> +
> +static void
> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
> +{
> +    format_nested_action(on, "ct_commit_drop", s);
> +}
> +
> +static void
> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
> +                      struct ofpbuf *ofpacts)
> +{
> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
> +    ct->flags = NX_CT_F_COMMIT;
> +    ct->recirc_table = NX_CT_RECIRC_NONE;
> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
> +    ct->zone_src.ofs = 0;
> +    ct->zone_src.n_bits = 16;
> +
> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
> +     * collisions at commit time between NATed and firewalled-only sessions.
> +     */
> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
> +        size_t nat_offset = ofpacts->size;
> +        ofpbuf_pull(ofpacts, nat_offset);
> +
> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
> +        nat->flags = 0;
> +        nat->range_af = AF_UNSPEC;
> +        nat->flags |= NX_NAT_F_SRC;
> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
> +        ct = ofpacts->header;
> +    }
> +
> +    size_t set_field_offset = ofpacts->size;
> +    ofpbuf_pull(ofpacts, set_field_offset);
> +
> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
> +    ct = ofpacts->header;
> +    ofpact_finish(ofpacts, &ct->ofpact);
> +}
> +
>  /* Parses an assignment or exchange or put_dhcp_opts action. */
>  static void
>  parse_set_action(struct action_context *ctx)
> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>          parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>      } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>          parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
> +        parse_ct_commit_drop(ctx);
>      } else {
>          lexer_syntax_error(ctx->lexer, "expecting action");
>      }
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index 616999eab..a72533a9e 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>      destroy_lport_addresses(&laddrs);
>  }
>  
> -/* Allocates a key for NAT conntrack zone allocation for a provided
> +/* Allocates a key for conntrack zone allocation for a provided
>   * 'key' record and a 'type'.
>   *
>   * It is the caller's responsibility to free the allocated memory. */
>  char *
> -alloc_nat_zone_key(const struct uuid *key, const char *type)
> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>  {
>      return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>  }
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index b3905ef7b..e0855f19e 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>  void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>                       struct svec *ipv6_addrs);
>  
> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>  
>  const char *default_nb_db(void);
>  const char *default_sb_db(void);
> diff --git a/northd/northd.c b/northd/northd.c
> index 7e2681865..2aff3458c 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -287,7 +287,7 @@ enum ovn_stage {
>   * +----+----------------------------------------------+ G |                  |
>   * | R7 |                   UNUSED                     | 1 |                  |
>   * +----+----------------------------------------------+---+------------------+
> - * | R8 |                   UNUSED                     |
> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>   * +----+----------------------------------------------+
>   * | R9 |                   UNUSED                     |
>   * +----+----------------------------------------------+
> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>              } else {
>                  ds_put_format(match, " && (%s)", acl->match);
>                  build_acl_log(actions, acl, meter_groups);
> +                if (acl->label) {
> +                    ds_put_format(actions, "ct_commit_drop { "
> +                                  "ct_label.label = %"PRId64"; }; ",
> +                                  acl->label);
> +                }
>                  ds_put_cstr(actions, "/* drop */");
>                  ovn_lflow_add_with_hint(lflows, od, stage,
>                                          acl->priority + OVN_ACL_PRI_OFFSET,
> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>              ds_clear(match);
>              ds_clear(actions);
>              ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>                            ct_blocked_match);
> +            if (acl->label) {
> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
> +                              acl->label);
> +            }
> +            ds_put_cstr(actions, "}; ");
>              if (!strcmp(acl->action, "reject")) {
>                  build_reject_acl_rules(od, lflows, stage, acl, match,
>                                         actions, &acl->header_, meter_groups);
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index b8f871394..af653be4d 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -699,7 +699,12 @@
>          connections and <code>ct_commit(ct_label=1/1);</code> for known
>          connections.  Setting <code>ct_label</code> marks a connection
>          as one that was previously allowed, but should no longer be
> -        allowed due to a policy change.
> +        allowed due to a policy change. If the ACL has a <code>label</code>,
> +        then it translates to
> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
> +        untracked connections and
> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
> +        for known connections.
>        </li>
>      </ul>
>  
> @@ -969,7 +974,12 @@
>          or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>          known connections.  Setting <code>ct_label</code> marks a connection
>          as one that was previously allowed, but should no longer be
> -        allowed due to a policy change.
> +        allowed due to a policy change. If the ACL has a <code>label</code>,
> +        then it translates to
> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
> +        untracked connections and
> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
> +        for known connections.
>        </li>
>      </ul>
>  
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 37a709f83..4e294a212 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -1377,6 +1377,23 @@
>            </p>
>          </dd>
>  
> +        <dt><code>ct_commit_drop { };</code></dt>
> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
> +        <dd>
> +          <p>
> +            This action is identical to <code>ct_commit</code> except that the
> +            connection tracking entry is committed in a different zone.
> +          </p>
> +
> +          <p>
> +            This action was added to store connections dropped by ACLs in a
> +            separate zone that is managed independently of the
> +            <code>ct_commit</code> zone, for debugging.
> +          </p>
> +        </dd>
> +
>          <dt><code>ct_dnat;</code></dt>
>          <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>          <dd>
> diff --git a/ovs b/ovs
> index 6f24c2bc7..d94cd0d3e 160000
> --- a/ovs
> +++ b/ovs
> @@ -1 +1 @@
> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
> index 3bbdbd998..07509f488 100644
> --- a/utilities/ovn-nbctl.c
> +++ b/utilities/ovn-nbctl.c
> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>      /* Set the ACL label */
>      const char *label = shash_find_data(&ctx->options, "--label");
>      if (label) {
> -      /* Ensure that the action is either allow or allow-related */
> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
> +      /* Ensure that the action is either allow or allow-related or drop */
> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
> +          strcmp(action, "drop")) {
>          ctl_error(ctx, "label can only be set with actions \"allow\" or "
> -                  "\"allow-related\"");
> +                  "\"allow-related\" or \"drop\"");
>          return;
>        }
>
Abhiram Sangana Oct. 20, 2022, 1:49 p.m. UTC | #2
Hi Dumitru,

Thanks for reviewing the patch.

> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> Hi Abhiram,
> 
> Thanks for the patch!  I only skimmed the changes so this is not a full
> review but more of a discussion starter.
> 
> On 10/18/22 17:33, Abhiram Sangana wrote:
>> To identify connections dropped by ACLs, users can enable logging for ACLs
>> but this approach does not scale. ACL logging uses "controller" action
>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>> ovn-controller to a lesser extent) even with metering enabled (observed
>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>> approach is to use drop sampling (patch by Adrian Moreno currently in
>> review) but we might miss specific connections of interest with this
>> approach.
>> 
>> This patch commits connections dropped by ACLs to the connection tracking
>> table with a specific ACL label that was introduced in 0e0228be (
>> northd: Add ACL label). The dropped connections are committed in a separate
>> CT zone so that they can be managed independently.
>> 
> 
> I'm not sure I understand how the CMS can manage this.  How is this
> better than sampling?  Committed connections are going to time out at
> some point (30 sec by default for udp/icmp with the kernel datapath).
> So the CMS will have to continuously monitor the contents of the
> conntrack zone?  Aren't we just moving the CPU load somewhere else with
> this?  Even so, there's a chance an entry is missed.

Linux nf_conntrack module supports sending connection tracking events
to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
parameter). So, CMS can parse the stream of new connection events from
conntrack and log the packets based on CT label.

An issue with sampling is that if there are a large number of packets
for a particular connection(s), packets of other connections might not
get sampled and we miss information about these connections. With the
conntrack approach, we get a single event for each connection (until
they time out), so there is lesser load on the CMS/collector and lesser
likelihood of missing connections.

> 
>> Each logical port is assigned a new zone for committing dropped flows.
>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>> 
>> A new lflow action "ct_commit_drop" is introduced that commits flows
>> to connection tracking table in a zone identified by
>> MFF_LOG_ACL_DROP_ZONE register.
>> 
>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>> instead of silently dropping the packet.
>> 
>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>> ---
>> controller/ovn-controller.c  | 23 ++++++++++++++---
>> controller/physical.c        | 32 +++++++++++++++++++++--
>> include/ovn/actions.h        |  1 +
>> include/ovn/logical-fields.h |  1 +
>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>> lib/ovn-util.c               |  4 +--
>> lib/ovn-util.h               |  2 +-
>> northd/northd.c              | 14 ++++++++--
>> northd/ovn-northd.8.xml      | 14 ++++++++--
>> ovn-sb.xml                   | 17 ++++++++++++
>> ovs                          |  2 +-
> 
> This shouldn't need to change the OVS submodule.
> 
My bad. Will fix this.

Thanks,
Abhiram Sangana

> Thanks,
> Dumitru
> 
>> utilities/ovn-nbctl.c        |  7 ++---
>> 12 files changed, 151 insertions(+), 16 deletions(-)
>> 
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index c97744d57..1ad20fe55 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>     unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>> 
>>     struct shash_node *shash_node;
>> +    const struct binding_lport *lport;
>>     SHASH_FOR_EACH (shash_node, binding_lports) {
>>         sset_add(&all_users, shash_node->name);
>> +
>> +        /* Zone for committing dropped connections of a vNIC. */
>> +        lport = shash_node->data;
>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>> +        sset_add(&all_users, drop_zone);
>> +        free(drop_zone);
>>     }
>> 
>>     /* Local patched datapath (gateway routers) need zones assigned. */
>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>     HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>         /* XXX Add method to limit zone assignment to logical router
>>          * datapaths with NAT */
>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>         sset_add(&all_users, dnat);
>>         sset_add(&all_users, snat);
>>         shash_add(&all_lds, dnat, ld);
>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>         /* Check if the requested snat zone has changed for the datapath
>>          * or not.  If so, then fall back to full recompute of
>>          * ct_zone engine. */
>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>         struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>                                                    snat_dp_zone_key);
>>         free(snat_dp_zone_key);
>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>                                         &ct_zones_data->pending);
>>                     updated = true;
>>                 }
>> +                char *drop_zone = alloc_ct_zone_key(
>> +                    &t_lport->pb->header_.uuid, "drop");
>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>> +                    alloc_id_to_ct_zone(drop_zone,
>> +                                        &ct_zones_data->current,
>> +                                        ct_zones_data->bitmap, &scan_start,
>> +                                        &ct_zones_data->pending);
>> +                    updated = true;
>> +                }
>> +                free(drop_zone);
>>             } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>                 struct simap_node *ct_zone =
>>                     simap_find(&ct_zones_data->current,
>> diff --git a/controller/physical.c b/controller/physical.c
>> index f3c8bddce..fc46669c1 100644
>> --- a/controller/physical.c
>> +++ b/controller/physical.c
>> @@ -60,6 +60,7 @@ struct zone_ids {
>>     int ct;                     /* MFF_LOG_CT_ZONE. */
>>     int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>     int snat;                   /* MFF_LOG_SNAT_ZONE. */
>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>> };
>> 
>> struct tunnel {
>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>> 
>>     const struct uuid *key = &binding->datapath->header_.uuid;
>> 
>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>     zone_ids.dnat = simap_get(ct_zones, dnat);
>>     free(dnat);
>> 
>> -    char *snat = alloc_nat_zone_key(key, "snat");
>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>     zone_ids.snat = simap_get(ct_zones, snat);
>>     free(snat);
>> 
>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>> +    free(drop_zone);
>> +
>>     return zone_ids;
>> }
>> 
>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>         if (zone_ids->snat) {
>>             put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>         }
>> +        if (zone_ids->drop) {
>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>> +        }
>>     }
>> }
>> 
>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>                     pb->header_.uuid.parts[0], &match, ofpacts_p,
>>                     &pb->header_.uuid);
>> 
>> +    if (zone_ids->drop) {
>> +        /* Table 39, Priority 1.
>> +         * =======================
>> +         *
>> +         * Clear the logical registers (for consistent behavior with packets
>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>> +        match_init_catchall(&match);
>> +        ofpbuf_clear(ofpacts_p);
>> +        match_set_metadata(&match, htonll(dp_key));
>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>> +            }
>> +        }
>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>> +                        pb->datapath->header_.uuid.parts[0], &match,
>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>> +    }
>> +
>>     /* Table 39, Priority 100.
>>      * =======================
>>      *
>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>> index d7ee84dac..6424250a6 100644
>> --- a/include/ovn/actions.h
>> +++ b/include/ovn/actions.h
>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>     OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>     OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>     OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>> 
>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>> enum OVS_PACKED_ENUM ovnact_type {
>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>> index 3db7265e4..889f5f9e3 100644
>> --- a/include/ovn/logical-fields.h
>> +++ b/include/ovn/logical-fields.h
>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>> #define MFF_LOG_REG0             MFF_REG0
>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>> 
>> #define MFF_LOG_XXREG0           MFF_XXREG0
>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>> diff --git a/lib/actions.c b/lib/actions.c
>> index adbb42db4..dfff3e793 100644
>> --- a/lib/actions.c
>> +++ b/lib/actions.c
>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>                            MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>> }
>> 
>> +static void
>> +parse_ct_commit_drop(struct action_context *ctx)
>> +{
>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>> +}
>> +
>> +static void
>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>> +{
>> +    format_nested_action(on, "ct_commit_drop", s);
>> +}
>> +
>> +static void
>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>> +                      struct ofpbuf *ofpacts)
>> +{
>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>> +    ct->flags = NX_CT_F_COMMIT;
>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>> +    ct->zone_src.ofs = 0;
>> +    ct->zone_src.n_bits = 16;
>> +
>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>> +     * collisions at commit time between NATed and firewalled-only sessions.
>> +     */
>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>> +        size_t nat_offset = ofpacts->size;
>> +        ofpbuf_pull(ofpacts, nat_offset);
>> +
>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>> +        nat->flags = 0;
>> +        nat->range_af = AF_UNSPEC;
>> +        nat->flags |= NX_NAT_F_SRC;
>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>> +        ct = ofpacts->header;
>> +    }
>> +
>> +    size_t set_field_offset = ofpacts->size;
>> +    ofpbuf_pull(ofpacts, set_field_offset);
>> +
>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>> +    ct = ofpacts->header;
>> +    ofpact_finish(ofpacts, &ct->ofpact);
>> +}
>> +
>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>> static void
>> parse_set_action(struct action_context *ctx)
>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>         parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>     } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>         parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>> +        parse_ct_commit_drop(ctx);
>>     } else {
>>         lexer_syntax_error(ctx->lexer, "expecting action");
>>     }
>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>> index 616999eab..a72533a9e 100644
>> --- a/lib/ovn-util.c
>> +++ b/lib/ovn-util.c
>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>     destroy_lport_addresses(&laddrs);
>> }
>> 
>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>> +/* Allocates a key for conntrack zone allocation for a provided
>>  * 'key' record and a 'type'.
>>  *
>>  * It is the caller's responsibility to free the allocated memory. */
>> char *
>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>> {
>>     return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>> }
>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>> index b3905ef7b..e0855f19e 100644
>> --- a/lib/ovn-util.h
>> +++ b/lib/ovn-util.h
>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>                      struct svec *ipv6_addrs);
>> 
>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>> 
>> const char *default_nb_db(void);
>> const char *default_sb_db(void);
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 7e2681865..2aff3458c 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>  * +----+----------------------------------------------+ G |                  |
>>  * | R7 |                   UNUSED                     | 1 |                  |
>>  * +----+----------------------------------------------+---+------------------+
>> - * | R8 |                   UNUSED                     |
>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>  * +----+----------------------------------------------+
>>  * | R9 |                   UNUSED                     |
>>  * +----+----------------------------------------------+
>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>             } else {
>>                 ds_put_format(match, " && (%s)", acl->match);
>>                 build_acl_log(actions, acl, meter_groups);
>> +                if (acl->label) {
>> +                    ds_put_format(actions, "ct_commit_drop { "
>> +                                  "ct_label.label = %"PRId64"; }; ",
>> +                                  acl->label);
>> +                }
>>                 ds_put_cstr(actions, "/* drop */");
>>                 ovn_lflow_add_with_hint(lflows, od, stage,
>>                                         acl->priority + OVN_ACL_PRI_OFFSET,
>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>             ds_clear(match);
>>             ds_clear(actions);
>>             ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>                           ct_blocked_match);
>> +            if (acl->label) {
>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>> +                              acl->label);
>> +            }
>> +            ds_put_cstr(actions, "}; ");
>>             if (!strcmp(acl->action, "reject")) {
>>                 build_reject_acl_rules(od, lflows, stage, acl, match,
>>                                        actions, &acl->header_, meter_groups);
>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>> index b8f871394..af653be4d 100644
>> --- a/northd/ovn-northd.8.xml
>> +++ b/northd/ovn-northd.8.xml
>> @@ -699,7 +699,12 @@
>>         connections and <code>ct_commit(ct_label=1/1);</code> for known
>>         connections.  Setting <code>ct_label</code> marks a connection
>>         as one that was previously allowed, but should no longer be
>> -        allowed due to a policy change.
>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>> +        then it translates to
>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>> +        untracked connections and
>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>> +        for known connections.
>>       </li>
>>     </ul>
>> 
>> @@ -969,7 +974,12 @@
>>         or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>         known connections.  Setting <code>ct_label</code> marks a connection
>>         as one that was previously allowed, but should no longer be
>> -        allowed due to a policy change.
>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>> +        then it translates to
>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>> +        untracked connections and
>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>> +        for known connections.
>>       </li>
>>     </ul>
>> 
>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>> index 37a709f83..4e294a212 100644
>> --- a/ovn-sb.xml
>> +++ b/ovn-sb.xml
>> @@ -1377,6 +1377,23 @@
>>           </p>
>>         </dd>
>> 
>> +        <dt><code>ct_commit_drop { };</code></dt>
>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>> +        <dd>
>> +          <p>
>> +            This action is identical to <code>ct_commit</code> except that the
>> +            connection tracking entry is committed in a different zone.
>> +          </p>
>> +
>> +          <p>
>> +            This action was added to store connections dropped by ACLs in a
>> +            separate zone that is managed independently of the
>> +            <code>ct_commit</code> zone, for debugging.
>> +          </p>
>> +        </dd>
>> +
>>         <dt><code>ct_dnat;</code></dt>
>>         <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>         <dd>
>> diff --git a/ovs b/ovs
>> index 6f24c2bc7..d94cd0d3e 160000
>> --- a/ovs
>> +++ b/ovs
>> @@ -1 +1 @@
>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>> index 3bbdbd998..07509f488 100644
>> --- a/utilities/ovn-nbctl.c
>> +++ b/utilities/ovn-nbctl.c
>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>     /* Set the ACL label */
>>     const char *label = shash_find_data(&ctx->options, "--label");
>>     if (label) {
>> -      /* Ensure that the action is either allow or allow-related */
>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>> +      /* Ensure that the action is either allow or allow-related or drop */
>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>> +          strcmp(action, "drop")) {
>>         ctl_error(ctx, "label can only be set with actions \"allow\" or "
>> -                  "\"allow-related\"");
>> +                  "\"allow-related\" or \"drop\"");
>>         return;
>>       }
Dumitru Ceara Oct. 20, 2022, 2:18 p.m. UTC | #3
On 10/20/22 15:49, Abhiram Sangana wrote:
> Hi Dumitru,
> 
> Thanks for reviewing the patch.
> 
>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> Hi Abhiram,
>>
>> Thanks for the patch!  I only skimmed the changes so this is not a full
>> review but more of a discussion starter.
>>
>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>> but this approach does not scale. ACL logging uses "controller" action
>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>> review) but we might miss specific connections of interest with this
>>> approach.
>>>
>>> This patch commits connections dropped by ACLs to the connection tracking
>>> table with a specific ACL label that was introduced in 0e0228be (
>>> northd: Add ACL label). The dropped connections are committed in a separate
>>> CT zone so that they can be managed independently.
>>>
>>
>> I'm not sure I understand how the CMS can manage this.  How is this
>> better than sampling?  Committed connections are going to time out at
>> some point (30 sec by default for udp/icmp with the kernel datapath).
>> So the CMS will have to continuously monitor the contents of the
>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>> this?  Even so, there's a chance an entry is missed.
> 
> Linux nf_conntrack module supports sending connection tracking events
> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
> parameter). So, CMS can parse the stream of new connection events from
> conntrack and log the packets based on CT label.
> 

Ack.

> An issue with sampling is that if there are a large number of packets
> for a particular connection(s), packets of other connections might not
> get sampled and we miss information about these connections. With the
> conntrack approach, we get a single event for each connection (until
> they time out), so there is lesser load on the CMS/collector and lesser
> likelihood of missing connections.
> 

I hadn't considered this advantage, thanks for pointing it out!

>>
>>> Each logical port is assigned a new zone for committing dropped flows.
>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>
>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>> to connection tracking table in a zone identified by
>>> MFF_LOG_ACL_DROP_ZONE register.
>>>
>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>> instead of silently dropping the packet.
>>>
>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>> ---
>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>> include/ovn/actions.h        |  1 +
>>> include/ovn/logical-fields.h |  1 +
>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>> lib/ovn-util.c               |  4 +--
>>> lib/ovn-util.h               |  2 +-
>>> northd/northd.c              | 14 ++++++++--
>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>> ovn-sb.xml                   | 17 ++++++++++++
>>> ovs                          |  2 +-
>>
>> This shouldn't need to change the OVS submodule.
>>
> My bad. Will fix this.
> 

Cool, thanks, looking forward for v1!

Regards,
Dumitru

> Thanks,
> Abhiram Sangana
> 
>> Thanks,
>> Dumitru
>>
>>> utilities/ovn-nbctl.c        |  7 ++---
>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index c97744d57..1ad20fe55 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>     unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>
>>>     struct shash_node *shash_node;
>>> +    const struct binding_lport *lport;
>>>     SHASH_FOR_EACH (shash_node, binding_lports) {
>>>         sset_add(&all_users, shash_node->name);
>>> +
>>> +        /* Zone for committing dropped connections of a vNIC. */
>>> +        lport = shash_node->data;
>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>> +        sset_add(&all_users, drop_zone);
>>> +        free(drop_zone);
>>>     }
>>>
>>>     /* Local patched datapath (gateway routers) need zones assigned. */
>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>     HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>         /* XXX Add method to limit zone assignment to logical router
>>>          * datapaths with NAT */
>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>         sset_add(&all_users, dnat);
>>>         sset_add(&all_users, snat);
>>>         shash_add(&all_lds, dnat, ld);
>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>         /* Check if the requested snat zone has changed for the datapath
>>>          * or not.  If so, then fall back to full recompute of
>>>          * ct_zone engine. */
>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>         struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>                                                    snat_dp_zone_key);
>>>         free(snat_dp_zone_key);
>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>                                         &ct_zones_data->pending);
>>>                     updated = true;
>>>                 }
>>> +                char *drop_zone = alloc_ct_zone_key(
>>> +                    &t_lport->pb->header_.uuid, "drop");
>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>> +                    alloc_id_to_ct_zone(drop_zone,
>>> +                                        &ct_zones_data->current,
>>> +                                        ct_zones_data->bitmap, &scan_start,
>>> +                                        &ct_zones_data->pending);
>>> +                    updated = true;
>>> +                }
>>> +                free(drop_zone);
>>>             } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>                 struct simap_node *ct_zone =
>>>                     simap_find(&ct_zones_data->current,
>>> diff --git a/controller/physical.c b/controller/physical.c
>>> index f3c8bddce..fc46669c1 100644
>>> --- a/controller/physical.c
>>> +++ b/controller/physical.c
>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>     int ct;                     /* MFF_LOG_CT_ZONE. */
>>>     int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>     int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>> };
>>>
>>> struct tunnel {
>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>
>>>     const struct uuid *key = &binding->datapath->header_.uuid;
>>>
>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>     zone_ids.dnat = simap_get(ct_zones, dnat);
>>>     free(dnat);
>>>
>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>     zone_ids.snat = simap_get(ct_zones, snat);
>>>     free(snat);
>>>
>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>> +    free(drop_zone);
>>> +
>>>     return zone_ids;
>>> }
>>>
>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>         if (zone_ids->snat) {
>>>             put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>         }
>>> +        if (zone_ids->drop) {
>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>> +        }
>>>     }
>>> }
>>>
>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>                     pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>                     &pb->header_.uuid);
>>>
>>> +    if (zone_ids->drop) {
>>> +        /* Table 39, Priority 1.
>>> +         * =======================
>>> +         *
>>> +         * Clear the logical registers (for consistent behavior with packets
>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>> +        match_init_catchall(&match);
>>> +        ofpbuf_clear(ofpacts_p);
>>> +        match_set_metadata(&match, htonll(dp_key));
>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>> +            }
>>> +        }
>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>> +    }
>>> +
>>>     /* Table 39, Priority 100.
>>>      * =======================
>>>      *
>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>> index d7ee84dac..6424250a6 100644
>>> --- a/include/ovn/actions.h
>>> +++ b/include/ovn/actions.h
>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>     OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>     OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>     OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>
>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>> enum OVS_PACKED_ENUM ovnact_type {
>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>> index 3db7265e4..889f5f9e3 100644
>>> --- a/include/ovn/logical-fields.h
>>> +++ b/include/ovn/logical-fields.h
>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>> #define MFF_LOG_REG0             MFF_REG0
>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>
>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>> diff --git a/lib/actions.c b/lib/actions.c
>>> index adbb42db4..dfff3e793 100644
>>> --- a/lib/actions.c
>>> +++ b/lib/actions.c
>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>                            MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>> }
>>>
>>> +static void
>>> +parse_ct_commit_drop(struct action_context *ctx)
>>> +{
>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>> +}
>>> +
>>> +static void
>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>> +{
>>> +    format_nested_action(on, "ct_commit_drop", s);
>>> +}
>>> +
>>> +static void
>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>> +                      struct ofpbuf *ofpacts)
>>> +{
>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>> +    ct->flags = NX_CT_F_COMMIT;
>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>> +    ct->zone_src.ofs = 0;
>>> +    ct->zone_src.n_bits = 16;
>>> +
>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>> +     */
>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>> +        size_t nat_offset = ofpacts->size;
>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>> +
>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>> +        nat->flags = 0;
>>> +        nat->range_af = AF_UNSPEC;
>>> +        nat->flags |= NX_NAT_F_SRC;
>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>> +        ct = ofpacts->header;
>>> +    }
>>> +
>>> +    size_t set_field_offset = ofpacts->size;
>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>> +
>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>> +    ct = ofpacts->header;
>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>> +}
>>> +
>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>> static void
>>> parse_set_action(struct action_context *ctx)
>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>         parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>     } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>         parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>> +        parse_ct_commit_drop(ctx);
>>>     } else {
>>>         lexer_syntax_error(ctx->lexer, "expecting action");
>>>     }
>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>> index 616999eab..a72533a9e 100644
>>> --- a/lib/ovn-util.c
>>> +++ b/lib/ovn-util.c
>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>     destroy_lport_addresses(&laddrs);
>>> }
>>>
>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>  * 'key' record and a 'type'.
>>>  *
>>>  * It is the caller's responsibility to free the allocated memory. */
>>> char *
>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>> {
>>>     return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>> }
>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>> index b3905ef7b..e0855f19e 100644
>>> --- a/lib/ovn-util.h
>>> +++ b/lib/ovn-util.h
>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>                      struct svec *ipv6_addrs);
>>>
>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>
>>> const char *default_nb_db(void);
>>> const char *default_sb_db(void);
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 7e2681865..2aff3458c 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>  * +----+----------------------------------------------+ G |                  |
>>>  * | R7 |                   UNUSED                     | 1 |                  |
>>>  * +----+----------------------------------------------+---+------------------+
>>> - * | R8 |                   UNUSED                     |
>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>  * +----+----------------------------------------------+
>>>  * | R9 |                   UNUSED                     |
>>>  * +----+----------------------------------------------+
>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>             } else {
>>>                 ds_put_format(match, " && (%s)", acl->match);
>>>                 build_acl_log(actions, acl, meter_groups);
>>> +                if (acl->label) {
>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>> +                                  acl->label);
>>> +                }
>>>                 ds_put_cstr(actions, "/* drop */");
>>>                 ovn_lflow_add_with_hint(lflows, od, stage,
>>>                                         acl->priority + OVN_ACL_PRI_OFFSET,
>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>             ds_clear(match);
>>>             ds_clear(actions);
>>>             ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>                           ct_blocked_match);
>>> +            if (acl->label) {
>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>> +                              acl->label);
>>> +            }
>>> +            ds_put_cstr(actions, "}; ");
>>>             if (!strcmp(acl->action, "reject")) {
>>>                 build_reject_acl_rules(od, lflows, stage, acl, match,
>>>                                        actions, &acl->header_, meter_groups);
>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>> index b8f871394..af653be4d 100644
>>> --- a/northd/ovn-northd.8.xml
>>> +++ b/northd/ovn-northd.8.xml
>>> @@ -699,7 +699,12 @@
>>>         connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>         connections.  Setting <code>ct_label</code> marks a connection
>>>         as one that was previously allowed, but should no longer be
>>> -        allowed due to a policy change.
>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>> +        then it translates to
>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>> +        untracked connections and
>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>> +        for known connections.
>>>       </li>
>>>     </ul>
>>>
>>> @@ -969,7 +974,12 @@
>>>         or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>         known connections.  Setting <code>ct_label</code> marks a connection
>>>         as one that was previously allowed, but should no longer be
>>> -        allowed due to a policy change.
>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>> +        then it translates to
>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>> +        untracked connections and
>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>> +        for known connections.
>>>       </li>
>>>     </ul>
>>>
>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>> index 37a709f83..4e294a212 100644
>>> --- a/ovn-sb.xml
>>> +++ b/ovn-sb.xml
>>> @@ -1377,6 +1377,23 @@
>>>           </p>
>>>         </dd>
>>>
>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>> +        <dd>
>>> +          <p>
>>> +            This action is identical to <code>ct_commit</code> except that the
>>> +            connection tracking entry is committed in a different zone.
>>> +          </p>
>>> +
>>> +          <p>
>>> +            This action was added to store connections dropped by ACLs in a
>>> +            separate zone that is managed independently of the
>>> +            <code>ct_commit</code> zone, for debugging.
>>> +          </p>
>>> +        </dd>
>>> +
>>>         <dt><code>ct_dnat;</code></dt>
>>>         <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>         <dd>
>>> diff --git a/ovs b/ovs
>>> index 6f24c2bc7..d94cd0d3e 160000
>>> --- a/ovs
>>> +++ b/ovs
>>> @@ -1 +1 @@
>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>> index 3bbdbd998..07509f488 100644
>>> --- a/utilities/ovn-nbctl.c
>>> +++ b/utilities/ovn-nbctl.c
>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>     /* Set the ACL label */
>>>     const char *label = shash_find_data(&ctx->options, "--label");
>>>     if (label) {
>>> -      /* Ensure that the action is either allow or allow-related */
>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>> +          strcmp(action, "drop")) {
>>>         ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>> -                  "\"allow-related\"");
>>> +                  "\"allow-related\" or \"drop\"");
>>>         return;
>>>       }
>
Abhiram Sangana Oct. 20, 2022, 3:34 p.m. UTC | #4
Hi Dumitru,

Can you please check if the implementation for the proposal looks ok?
Will send out v1 with the review comments and tests.
Also, any ideas how we can selectively create new drop zones for ports.
Currently, I am assigning a new zone for dropped connections to every
port even if its parent LS doesn’t have drop ACLs with labels.

Thanks,
Abhiram Sangana

> On 20 Oct 2022, at 15:18, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 10/20/22 15:49, Abhiram Sangana wrote:
>> Hi Dumitru,
>> 
>> Thanks for reviewing the patch.
>> 
>>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> Hi Abhiram,
>>> 
>>> Thanks for the patch!  I only skimmed the changes so this is not a full
>>> review but more of a discussion starter.
>>> 
>>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>>> but this approach does not scale. ACL logging uses "controller" action
>>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>>> review) but we might miss specific connections of interest with this
>>>> approach.
>>>> 
>>>> This patch commits connections dropped by ACLs to the connection tracking
>>>> table with a specific ACL label that was introduced in 0e0228be (
>>>> northd: Add ACL label). The dropped connections are committed in a separate
>>>> CT zone so that they can be managed independently.
>>>> 
>>> 
>>> I'm not sure I understand how the CMS can manage this.  How is this
>>> better than sampling?  Committed connections are going to time out at
>>> some point (30 sec by default for udp/icmp with the kernel datapath).
>>> So the CMS will have to continuously monitor the contents of the
>>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>>> this?  Even so, there's a chance an entry is missed.
>> 
>> Linux nf_conntrack module supports sending connection tracking events
>> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
>> parameter). So, CMS can parse the stream of new connection events from
>> conntrack and log the packets based on CT label.
>> 
> 
> Ack.
> 
>> An issue with sampling is that if there are a large number of packets
>> for a particular connection(s), packets of other connections might not
>> get sampled and we miss information about these connections. With the
>> conntrack approach, we get a single event for each connection (until
>> they time out), so there is lesser load on the CMS/collector and lesser
>> likelihood of missing connections.
>> 
> 
> I hadn't considered this advantage, thanks for pointing it out!
> 
>>> 
>>>> Each logical port is assigned a new zone for committing dropped flows.
>>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>> 
>>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>>> to connection tracking table in a zone identified by
>>>> MFF_LOG_ACL_DROP_ZONE register.
>>>> 
>>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>>> instead of silently dropping the packet.
>>>> 
>>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>>> ---
>>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>>> include/ovn/actions.h        |  1 +
>>>> include/ovn/logical-fields.h |  1 +
>>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>>> lib/ovn-util.c               |  4 +--
>>>> lib/ovn-util.h               |  2 +-
>>>> northd/northd.c              | 14 ++++++++--
>>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>>> ovn-sb.xml                   | 17 ++++++++++++
>>>> ovs                          |  2 +-
>>> 
>>> This shouldn't need to change the OVS submodule.
>>> 
>> My bad. Will fix this.
>> 
> 
> Cool, thanks, looking forward for v1!
> 
> Regards,
> Dumitru
> 
>> Thanks,
>> Abhiram Sangana
>> 
>>> Thanks,
>>> Dumitru
>>> 
>>>> utilities/ovn-nbctl.c        |  7 ++---
>>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>> 
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index c97744d57..1ad20fe55 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>>    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>> 
>>>>    struct shash_node *shash_node;
>>>> +    const struct binding_lport *lport;
>>>>    SHASH_FOR_EACH (shash_node, binding_lports) {
>>>>        sset_add(&all_users, shash_node->name);
>>>> +
>>>> +        /* Zone for committing dropped connections of a vNIC. */
>>>> +        lport = shash_node->data;
>>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>>> +        sset_add(&all_users, drop_zone);
>>>> +        free(drop_zone);
>>>>    }
>>>> 
>>>>    /* Local patched datapath (gateway routers) need zones assigned. */
>>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>>    HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>>        /* XXX Add method to limit zone assignment to logical router
>>>>         * datapaths with NAT */
>>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>        sset_add(&all_users, dnat);
>>>>        sset_add(&all_users, snat);
>>>>        shash_add(&all_lds, dnat, ld);
>>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>>        /* Check if the requested snat zone has changed for the datapath
>>>>         * or not.  If so, then fall back to full recompute of
>>>>         * ct_zone engine. */
>>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>>        struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>>                                                   snat_dp_zone_key);
>>>>        free(snat_dp_zone_key);
>>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>>                                        &ct_zones_data->pending);
>>>>                    updated = true;
>>>>                }
>>>> +                char *drop_zone = alloc_ct_zone_key(
>>>> +                    &t_lport->pb->header_.uuid, "drop");
>>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>>> +                    alloc_id_to_ct_zone(drop_zone,
>>>> +                                        &ct_zones_data->current,
>>>> +                                        ct_zones_data->bitmap, &scan_start,
>>>> +                                        &ct_zones_data->pending);
>>>> +                    updated = true;
>>>> +                }
>>>> +                free(drop_zone);
>>>>            } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>>                struct simap_node *ct_zone =
>>>>                    simap_find(&ct_zones_data->current,
>>>> diff --git a/controller/physical.c b/controller/physical.c
>>>> index f3c8bddce..fc46669c1 100644
>>>> --- a/controller/physical.c
>>>> +++ b/controller/physical.c
>>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>>    int ct;                     /* MFF_LOG_CT_ZONE. */
>>>>    int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>>    int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>>> };
>>>> 
>>>> struct tunnel {
>>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>> 
>>>>    const struct uuid *key = &binding->datapath->header_.uuid;
>>>> 
>>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>>    zone_ids.dnat = simap_get(ct_zones, dnat);
>>>>    free(dnat);
>>>> 
>>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>>    zone_ids.snat = simap_get(ct_zones, snat);
>>>>    free(snat);
>>>> 
>>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>>> +    free(drop_zone);
>>>> +
>>>>    return zone_ids;
>>>> }
>>>> 
>>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>>        if (zone_ids->snat) {
>>>>            put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>>        }
>>>> +        if (zone_ids->drop) {
>>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>>> +        }
>>>>    }
>>>> }
>>>> 
>>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>>                    pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>>                    &pb->header_.uuid);
>>>> 
>>>> +    if (zone_ids->drop) {
>>>> +        /* Table 39, Priority 1.
>>>> +         * =======================
>>>> +         *
>>>> +         * Clear the logical registers (for consistent behavior with packets
>>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>>> +        match_init_catchall(&match);
>>>> +        ofpbuf_clear(ofpacts_p);
>>>> +        match_set_metadata(&match, htonll(dp_key));
>>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>>> +            }
>>>> +        }
>>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>>> +    }
>>>> +
>>>>    /* Table 39, Priority 100.
>>>>     * =======================
>>>>     *
>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>> index d7ee84dac..6424250a6 100644
>>>> --- a/include/ovn/actions.h
>>>> +++ b/include/ovn/actions.h
>>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>>    OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>>    OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>>    OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>> 
>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>>> index 3db7265e4..889f5f9e3 100644
>>>> --- a/include/ovn/logical-fields.h
>>>> +++ b/include/ovn/logical-fields.h
>>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>>> #define MFF_LOG_REG0             MFF_REG0
>>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>> 
>>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>> index adbb42db4..dfff3e793 100644
>>>> --- a/lib/actions.c
>>>> +++ b/lib/actions.c
>>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>>                           MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>>> }
>>>> 
>>>> +static void
>>>> +parse_ct_commit_drop(struct action_context *ctx)
>>>> +{
>>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>>> +}
>>>> +
>>>> +static void
>>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>>> +{
>>>> +    format_nested_action(on, "ct_commit_drop", s);
>>>> +}
>>>> +
>>>> +static void
>>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>>> +                      struct ofpbuf *ofpacts)
>>>> +{
>>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>>> +    ct->flags = NX_CT_F_COMMIT;
>>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>>> +    ct->zone_src.ofs = 0;
>>>> +    ct->zone_src.n_bits = 16;
>>>> +
>>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>>> +     */
>>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>>> +        size_t nat_offset = ofpacts->size;
>>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>>> +
>>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>>> +        nat->flags = 0;
>>>> +        nat->range_af = AF_UNSPEC;
>>>> +        nat->flags |= NX_NAT_F_SRC;
>>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>>> +        ct = ofpacts->header;
>>>> +    }
>>>> +
>>>> +    size_t set_field_offset = ofpacts->size;
>>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>>> +
>>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>>> +    ct = ofpacts->header;
>>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>>> +}
>>>> +
>>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>>> static void
>>>> parse_set_action(struct action_context *ctx)
>>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>>        parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>>    } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>>        parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>>> +        parse_ct_commit_drop(ctx);
>>>>    } else {
>>>>        lexer_syntax_error(ctx->lexer, "expecting action");
>>>>    }
>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>> index 616999eab..a72533a9e 100644
>>>> --- a/lib/ovn-util.c
>>>> +++ b/lib/ovn-util.c
>>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>    destroy_lport_addresses(&laddrs);
>>>> }
>>>> 
>>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>> * 'key' record and a 'type'.
>>>> *
>>>> * It is the caller's responsibility to free the allocated memory. */
>>>> char *
>>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>>> {
>>>>    return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>>> }
>>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>>> index b3905ef7b..e0855f19e 100644
>>>> --- a/lib/ovn-util.h
>>>> +++ b/lib/ovn-util.h
>>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>                     struct svec *ipv6_addrs);
>>>> 
>>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>> 
>>>> const char *default_nb_db(void);
>>>> const char *default_sb_db(void);
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 7e2681865..2aff3458c 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>> * +----+----------------------------------------------+ G |                  |
>>>> * | R7 |                   UNUSED                     | 1 |                  |
>>>> * +----+----------------------------------------------+---+------------------+
>>>> - * | R8 |                   UNUSED                     |
>>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>> * +----+----------------------------------------------+
>>>> * | R9 |                   UNUSED                     |
>>>> * +----+----------------------------------------------+
>>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>            } else {
>>>>                ds_put_format(match, " && (%s)", acl->match);
>>>>                build_acl_log(actions, acl, meter_groups);
>>>> +                if (acl->label) {
>>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>>> +                                  acl->label);
>>>> +                }
>>>>                ds_put_cstr(actions, "/* drop */");
>>>>                ovn_lflow_add_with_hint(lflows, od, stage,
>>>>                                        acl->priority + OVN_ACL_PRI_OFFSET,
>>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>            ds_clear(match);
>>>>            ds_clear(actions);
>>>>            ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>>                          ct_blocked_match);
>>>> +            if (acl->label) {
>>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>>> +                              acl->label);
>>>> +            }
>>>> +            ds_put_cstr(actions, "}; ");
>>>>            if (!strcmp(acl->action, "reject")) {
>>>>                build_reject_acl_rules(od, lflows, stage, acl, match,
>>>>                                       actions, &acl->header_, meter_groups);
>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>>> index b8f871394..af653be4d 100644
>>>> --- a/northd/ovn-northd.8.xml
>>>> +++ b/northd/ovn-northd.8.xml
>>>> @@ -699,7 +699,12 @@
>>>>        connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>>        connections.  Setting <code>ct_label</code> marks a connection
>>>>        as one that was previously allowed, but should no longer be
>>>> -        allowed due to a policy change.
>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>> +        then it translates to
>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>> +        untracked connections and
>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>> +        for known connections.
>>>>      </li>
>>>>    </ul>
>>>> 
>>>> @@ -969,7 +974,12 @@
>>>>        or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>>        known connections.  Setting <code>ct_label</code> marks a connection
>>>>        as one that was previously allowed, but should no longer be
>>>> -        allowed due to a policy change.
>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>> +        then it translates to
>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>> +        untracked connections and
>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>> +        for known connections.
>>>>      </li>
>>>>    </ul>
>>>> 
>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>>> index 37a709f83..4e294a212 100644
>>>> --- a/ovn-sb.xml
>>>> +++ b/ovn-sb.xml
>>>> @@ -1377,6 +1377,23 @@
>>>>          </p>
>>>>        </dd>
>>>> 
>>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>>> +        <dd>
>>>> +          <p>
>>>> +            This action is identical to <code>ct_commit</code> except that the
>>>> +            connection tracking entry is committed in a different zone.
>>>> +          </p>
>>>> +
>>>> +          <p>
>>>> +            This action was added to store connections dropped by ACLs in a
>>>> +            separate zone that is managed independently of the
>>>> +            <code>ct_commit</code> zone, for debugging.
>>>> +          </p>
>>>> +        </dd>
>>>> +
>>>>        <dt><code>ct_dnat;</code></dt>
>>>>        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>>        <dd>
>>>> diff --git a/ovs b/ovs
>>>> index 6f24c2bc7..d94cd0d3e 160000
>>>> --- a/ovs
>>>> +++ b/ovs
>>>> @@ -1 +1 @@
>>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>>> index 3bbdbd998..07509f488 100644
>>>> --- a/utilities/ovn-nbctl.c
>>>> +++ b/utilities/ovn-nbctl.c
>>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>>    /* Set the ACL label */
>>>>    const char *label = shash_find_data(&ctx->options, "--label");
>>>>    if (label) {
>>>> -      /* Ensure that the action is either allow or allow-related */
>>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>>> +          strcmp(action, "drop")) {
>>>>        ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>>> -                  "\"allow-related\"");
>>>> +                  "\"allow-related\" or \"drop\"");
>>>>        return;
>>>>      }
Dumitru Ceara Oct. 21, 2022, 9:58 a.m. UTC | #5
On 10/20/22 17:34, Abhiram Sangana wrote:
> Hi Dumitru,
> 
> Can you please check if the implementation for the proposal looks ok?
> Will send out v1 with the review comments and tests.
> Also, any ideas how we can selectively create new drop zones for ports.
> Currently, I am assigning a new zone for dropped connections to every
> port even if its parent LS doesn’t have drop ACLs with labels.
> 

Within a logical switch do we really need a drop zone per port?  Isn't
it actually enough if we add a "from-lport-drop" and "to-lport-drop"
zone for the whole logical switch?  That should simplify zone allocation.

> Thanks,
> Abhiram Sangana
> 
>> On 20 Oct 2022, at 15:18, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 10/20/22 15:49, Abhiram Sangana wrote:
>>> Hi Dumitru,
>>>
>>> Thanks for reviewing the patch.
>>>
>>>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> Hi Abhiram,
>>>>
>>>> Thanks for the patch!  I only skimmed the changes so this is not a full
>>>> review but more of a discussion starter.
>>>>
>>>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>>>> but this approach does not scale. ACL logging uses "controller" action
>>>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>>>> review) but we might miss specific connections of interest with this
>>>>> approach.
>>>>>
>>>>> This patch commits connections dropped by ACLs to the connection tracking
>>>>> table with a specific ACL label that was introduced in 0e0228be (
>>>>> northd: Add ACL label). The dropped connections are committed in a separate
>>>>> CT zone so that they can be managed independently.
>>>>>
>>>>
>>>> I'm not sure I understand how the CMS can manage this.  How is this
>>>> better than sampling?  Committed connections are going to time out at
>>>> some point (30 sec by default for udp/icmp with the kernel datapath).
>>>> So the CMS will have to continuously monitor the contents of the
>>>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>>>> this?  Even so, there's a chance an entry is missed.
>>>
>>> Linux nf_conntrack module supports sending connection tracking events
>>> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
>>> parameter). So, CMS can parse the stream of new connection events from
>>> conntrack and log the packets based on CT label.
>>>
>>
>> Ack.
>>
>>> An issue with sampling is that if there are a large number of packets
>>> for a particular connection(s), packets of other connections might not
>>> get sampled and we miss information about these connections. With the
>>> conntrack approach, we get a single event for each connection (until
>>> they time out), so there is lesser load on the CMS/collector and lesser
>>> likelihood of missing connections.
>>>
>>
>> I hadn't considered this advantage, thanks for pointing it out!
>>
>>>>
>>>>> Each logical port is assigned a new zone for committing dropped flows.
>>>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>>>
>>>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>>>> to connection tracking table in a zone identified by
>>>>> MFF_LOG_ACL_DROP_ZONE register.
>>>>>
>>>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>>>> instead of silently dropping the packet.
>>>>>
>>>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>>>> ---
>>>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>>>> include/ovn/actions.h        |  1 +
>>>>> include/ovn/logical-fields.h |  1 +
>>>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>>>> lib/ovn-util.c               |  4 +--
>>>>> lib/ovn-util.h               |  2 +-
>>>>> northd/northd.c              | 14 ++++++++--
>>>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>>>> ovn-sb.xml                   | 17 ++++++++++++
>>>>> ovs                          |  2 +-
>>>>
>>>> This shouldn't need to change the OVS submodule.
>>>>
>>> My bad. Will fix this.
>>>
>>
>> Cool, thanks, looking forward for v1!
>>
>> Regards,
>> Dumitru
>>
>>> Thanks,
>>> Abhiram Sangana
>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>> utilities/ovn-nbctl.c        |  7 ++---
>>>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>> index c97744d57..1ad20fe55 100644
>>>>> --- a/controller/ovn-controller.c
>>>>> +++ b/controller/ovn-controller.c
>>>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>>>    unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>>>
>>>>>    struct shash_node *shash_node;
>>>>> +    const struct binding_lport *lport;
>>>>>    SHASH_FOR_EACH (shash_node, binding_lports) {
>>>>>        sset_add(&all_users, shash_node->name);
>>>>> +
>>>>> +        /* Zone for committing dropped connections of a vNIC. */
>>>>> +        lport = shash_node->data;
>>>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>>>> +        sset_add(&all_users, drop_zone);
>>>>> +        free(drop_zone);
>>>>>    }
>>>>>
>>>>>    /* Local patched datapath (gateway routers) need zones assigned. */
>>>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>>>    HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>>>        /* XXX Add method to limit zone assignment to logical router
>>>>>         * datapaths with NAT */
>>>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>>        sset_add(&all_users, dnat);
>>>>>        sset_add(&all_users, snat);
>>>>>        shash_add(&all_lds, dnat, ld);
>>>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>>>        /* Check if the requested snat zone has changed for the datapath
>>>>>         * or not.  If so, then fall back to full recompute of
>>>>>         * ct_zone engine. */
>>>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>>>        struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>>>                                                   snat_dp_zone_key);
>>>>>        free(snat_dp_zone_key);
>>>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>>>                                        &ct_zones_data->pending);
>>>>>                    updated = true;
>>>>>                }
>>>>> +                char *drop_zone = alloc_ct_zone_key(
>>>>> +                    &t_lport->pb->header_.uuid, "drop");
>>>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>>>> +                    alloc_id_to_ct_zone(drop_zone,
>>>>> +                                        &ct_zones_data->current,
>>>>> +                                        ct_zones_data->bitmap, &scan_start,
>>>>> +                                        &ct_zones_data->pending);
>>>>> +                    updated = true;
>>>>> +                }
>>>>> +                free(drop_zone);
>>>>>            } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>>>                struct simap_node *ct_zone =
>>>>>                    simap_find(&ct_zones_data->current,
>>>>> diff --git a/controller/physical.c b/controller/physical.c
>>>>> index f3c8bddce..fc46669c1 100644
>>>>> --- a/controller/physical.c
>>>>> +++ b/controller/physical.c
>>>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>>>    int ct;                     /* MFF_LOG_CT_ZONE. */
>>>>>    int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>>>    int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>>>> };
>>>>>
>>>>> struct tunnel {
>>>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>>>
>>>>>    const struct uuid *key = &binding->datapath->header_.uuid;
>>>>>
>>>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>>>    zone_ids.dnat = simap_get(ct_zones, dnat);
>>>>>    free(dnat);
>>>>>
>>>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>>>    zone_ids.snat = simap_get(ct_zones, snat);
>>>>>    free(snat);
>>>>>
>>>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>>>> +    free(drop_zone);
>>>>> +
>>>>>    return zone_ids;
>>>>> }
>>>>>
>>>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>>>        if (zone_ids->snat) {
>>>>>            put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>>>        }
>>>>> +        if (zone_ids->drop) {
>>>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>>>> +        }
>>>>>    }
>>>>> }
>>>>>
>>>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>>>                    pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>>>                    &pb->header_.uuid);
>>>>>
>>>>> +    if (zone_ids->drop) {
>>>>> +        /* Table 39, Priority 1.
>>>>> +         * =======================
>>>>> +         *
>>>>> +         * Clear the logical registers (for consistent behavior with packets
>>>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>>>> +        match_init_catchall(&match);
>>>>> +        ofpbuf_clear(ofpacts_p);
>>>>> +        match_set_metadata(&match, htonll(dp_key));
>>>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>>>> +            }
>>>>> +        }
>>>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>>>> +    }
>>>>> +
>>>>>    /* Table 39, Priority 100.
>>>>>     * =======================
>>>>>     *
>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>> index d7ee84dac..6424250a6 100644
>>>>> --- a/include/ovn/actions.h
>>>>> +++ b/include/ovn/actions.h
>>>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>>>    OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>>>    OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>>>    OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>>>
>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>>>> index 3db7265e4..889f5f9e3 100644
>>>>> --- a/include/ovn/logical-fields.h
>>>>> +++ b/include/ovn/logical-fields.h
>>>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>>>> #define MFF_LOG_REG0             MFF_REG0
>>>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>>>
>>>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>> index adbb42db4..dfff3e793 100644
>>>>> --- a/lib/actions.c
>>>>> +++ b/lib/actions.c
>>>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>>>                           MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>>>> }
>>>>>
>>>>> +static void
>>>>> +parse_ct_commit_drop(struct action_context *ctx)
>>>>> +{
>>>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>>>> +{
>>>>> +    format_nested_action(on, "ct_commit_drop", s);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>>>> +                      struct ofpbuf *ofpacts)
>>>>> +{
>>>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>>>> +    ct->flags = NX_CT_F_COMMIT;
>>>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>>>> +    ct->zone_src.ofs = 0;
>>>>> +    ct->zone_src.n_bits = 16;
>>>>> +
>>>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>>>> +     */
>>>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>>>> +        size_t nat_offset = ofpacts->size;
>>>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>>>> +
>>>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>>>> +        nat->flags = 0;
>>>>> +        nat->range_af = AF_UNSPEC;
>>>>> +        nat->flags |= NX_NAT_F_SRC;
>>>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>>>> +        ct = ofpacts->header;
>>>>> +    }
>>>>> +
>>>>> +    size_t set_field_offset = ofpacts->size;
>>>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>>>> +
>>>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>>>> +    ct = ofpacts->header;
>>>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>>>> +}
>>>>> +
>>>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>>>> static void
>>>>> parse_set_action(struct action_context *ctx)
>>>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>>>        parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>>>    } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>>>        parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>>>> +        parse_ct_commit_drop(ctx);
>>>>>    } else {
>>>>>        lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>    }
>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>> index 616999eab..a72533a9e 100644
>>>>> --- a/lib/ovn-util.c
>>>>> +++ b/lib/ovn-util.c
>>>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>>    destroy_lport_addresses(&laddrs);
>>>>> }
>>>>>
>>>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>>> * 'key' record and a 'type'.
>>>>> *
>>>>> * It is the caller's responsibility to free the allocated memory. */
>>>>> char *
>>>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>>>> {
>>>>>    return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>>>> }
>>>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>>>> index b3905ef7b..e0855f19e 100644
>>>>> --- a/lib/ovn-util.h
>>>>> +++ b/lib/ovn-util.h
>>>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>>                     struct svec *ipv6_addrs);
>>>>>
>>>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>>>
>>>>> const char *default_nb_db(void);
>>>>> const char *default_sb_db(void);
>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>> index 7e2681865..2aff3458c 100644
>>>>> --- a/northd/northd.c
>>>>> +++ b/northd/northd.c
>>>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>>> * +----+----------------------------------------------+ G |                  |
>>>>> * | R7 |                   UNUSED                     | 1 |                  |
>>>>> * +----+----------------------------------------------+---+------------------+
>>>>> - * | R8 |                   UNUSED                     |
>>>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>>> * +----+----------------------------------------------+
>>>>> * | R9 |                   UNUSED                     |
>>>>> * +----+----------------------------------------------+
>>>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>>            } else {
>>>>>                ds_put_format(match, " && (%s)", acl->match);
>>>>>                build_acl_log(actions, acl, meter_groups);
>>>>> +                if (acl->label) {
>>>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>>>> +                                  acl->label);
>>>>> +                }
>>>>>                ds_put_cstr(actions, "/* drop */");
>>>>>                ovn_lflow_add_with_hint(lflows, od, stage,
>>>>>                                        acl->priority + OVN_ACL_PRI_OFFSET,
>>>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>>            ds_clear(match);
>>>>>            ds_clear(actions);
>>>>>            ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>>>                          ct_blocked_match);
>>>>> +            if (acl->label) {
>>>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>>>> +                              acl->label);
>>>>> +            }
>>>>> +            ds_put_cstr(actions, "}; ");
>>>>>            if (!strcmp(acl->action, "reject")) {
>>>>>                build_reject_acl_rules(od, lflows, stage, acl, match,
>>>>>                                       actions, &acl->header_, meter_groups);
>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>>>> index b8f871394..af653be4d 100644
>>>>> --- a/northd/ovn-northd.8.xml
>>>>> +++ b/northd/ovn-northd.8.xml
>>>>> @@ -699,7 +699,12 @@
>>>>>        connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>>>        connections.  Setting <code>ct_label</code> marks a connection
>>>>>        as one that was previously allowed, but should no longer be
>>>>> -        allowed due to a policy change.
>>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>>> +        then it translates to
>>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>>> +        untracked connections and
>>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>>> +        for known connections.
>>>>>      </li>
>>>>>    </ul>
>>>>>
>>>>> @@ -969,7 +974,12 @@
>>>>>        or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>>>        known connections.  Setting <code>ct_label</code> marks a connection
>>>>>        as one that was previously allowed, but should no longer be
>>>>> -        allowed due to a policy change.
>>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>>> +        then it translates to
>>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>>> +        untracked connections and
>>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>>> +        for known connections.
>>>>>      </li>
>>>>>    </ul>
>>>>>
>>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>>>> index 37a709f83..4e294a212 100644
>>>>> --- a/ovn-sb.xml
>>>>> +++ b/ovn-sb.xml
>>>>> @@ -1377,6 +1377,23 @@
>>>>>          </p>
>>>>>        </dd>
>>>>>
>>>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>>>> +        <dd>
>>>>> +          <p>
>>>>> +            This action is identical to <code>ct_commit</code> except that the
>>>>> +            connection tracking entry is committed in a different zone.
>>>>> +          </p>
>>>>> +
>>>>> +          <p>
>>>>> +            This action was added to store connections dropped by ACLs in a
>>>>> +            separate zone that is managed independently of the
>>>>> +            <code>ct_commit</code> zone, for debugging.
>>>>> +          </p>
>>>>> +        </dd>
>>>>> +
>>>>>        <dt><code>ct_dnat;</code></dt>
>>>>>        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>>>        <dd>
>>>>> diff --git a/ovs b/ovs
>>>>> index 6f24c2bc7..d94cd0d3e 160000
>>>>> --- a/ovs
>>>>> +++ b/ovs
>>>>> @@ -1 +1 @@
>>>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>>>> index 3bbdbd998..07509f488 100644
>>>>> --- a/utilities/ovn-nbctl.c
>>>>> +++ b/utilities/ovn-nbctl.c
>>>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>>>    /* Set the ACL label */
>>>>>    const char *label = shash_find_data(&ctx->options, "--label");
>>>>>    if (label) {
>>>>> -      /* Ensure that the action is either allow or allow-related */
>>>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>>>> +          strcmp(action, "drop")) {
>>>>>        ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>>>> -                  "\"allow-related\"");
>>>>> +                  "\"allow-related\" or \"drop\"");
>>>>>        return;
>>>>>      }
>
Abhiram Sangana Nov. 1, 2022, 11:40 a.m. UTC | #6
> On 21 Oct 2022, at 10:58, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 10/20/22 17:34, Abhiram Sangana wrote:
>> Hi Dumitru,
>> 
>> Can you please check if the implementation for the proposal looks ok?
>> Will send out v1 with the review comments and tests.
>> Also, any ideas how we can selectively create new drop zones for ports.
>> Currently, I am assigning a new zone for dropped connections to every
>> port even if its parent LS doesn’t have drop ACLs with labels.
>> 
> 
> Within a logical switch do we really need a drop zone per port?  Isn't
> it actually enough if we add a "from-lport-drop" and "to-lport-drop"
> zone for the whole logical switch?  That should simplify zone allocation.
> 
Given that we are committing dropped connections to CT table, a DDOS
attack can potentially fill up the CT table. We are planning to send
another patch that limits the number of entries for a given CT zone.
It is easier to manage the size of CT zones when there is a CT zone
per port rather than per Logical_Switch.

>> Thanks,
>> Abhiram Sangana
>> 
>>> On 20 Oct 2022, at 15:18, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> On 10/20/22 15:49, Abhiram Sangana wrote:
>>>> Hi Dumitru,
>>>> 
>>>> Thanks for reviewing the patch.
>>>> 
>>>>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>> 
>>>>> Hi Abhiram,
>>>>> 
>>>>> Thanks for the patch!  I only skimmed the changes so this is not a full
>>>>> review but more of a discussion starter.
>>>>> 
>>>>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>>>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>>>>> but this approach does not scale. ACL logging uses "controller" action
>>>>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>>>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>>>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>>>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>>>>> review) but we might miss specific connections of interest with this
>>>>>> approach.
>>>>>> 
>>>>>> This patch commits connections dropped by ACLs to the connection tracking
>>>>>> table with a specific ACL label that was introduced in 0e0228be (
>>>>>> northd: Add ACL label). The dropped connections are committed in a separate
>>>>>> CT zone so that they can be managed independently.
>>>>>> 
>>>>> 
>>>>> I'm not sure I understand how the CMS can manage this.  How is this
>>>>> better than sampling?  Committed connections are going to time out at
>>>>> some point (30 sec by default for udp/icmp with the kernel datapath).
>>>>> So the CMS will have to continuously monitor the contents of the
>>>>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>>>>> this?  Even so, there's a chance an entry is missed.
>>>> 
>>>> Linux nf_conntrack module supports sending connection tracking events
>>>> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
>>>> parameter). So, CMS can parse the stream of new connection events from
>>>> conntrack and log the packets based on CT label.
>>>> 
>>> 
>>> Ack.
>>> 
>>>> An issue with sampling is that if there are a large number of packets
>>>> for a particular connection(s), packets of other connections might not
>>>> get sampled and we miss information about these connections. With the
>>>> conntrack approach, we get a single event for each connection (until
>>>> they time out), so there is lesser load on the CMS/collector and lesser
>>>> likelihood of missing connections.
>>>> 
>>> 
>>> I hadn't considered this advantage, thanks for pointing it out!
>>> 
>>>>> 
>>>>>> Each logical port is assigned a new zone for committing dropped flows.
>>>>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>>>> 
>>>>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>>>>> to connection tracking table in a zone identified by
>>>>>> MFF_LOG_ACL_DROP_ZONE register.
>>>>>> 
>>>>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>>>>> instead of silently dropping the packet.
>>>>>> 
>>>>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>>>>> ---
>>>>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>>>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>>>>> include/ovn/actions.h        |  1 +
>>>>>> include/ovn/logical-fields.h |  1 +
>>>>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>>>>> lib/ovn-util.c               |  4 +--
>>>>>> lib/ovn-util.h               |  2 +-
>>>>>> northd/northd.c              | 14 ++++++++--
>>>>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>>>>> ovn-sb.xml                   | 17 ++++++++++++
>>>>>> ovs                          |  2 +-
>>>>> 
>>>>> This shouldn't need to change the OVS submodule.
>>>>> 
>>>> My bad. Will fix this.
>>>> 
>>> 
>>> Cool, thanks, looking forward for v1!
>>> 
>>> Regards,
>>> Dumitru
>>> 
>>>> Thanks,
>>>> Abhiram Sangana
>>>> 
>>>>> Thanks,
>>>>> Dumitru
>>>>> 
>>>>>> utilities/ovn-nbctl.c        |  7 ++---
>>>>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>>>> 
>>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>>> index c97744d57..1ad20fe55 100644
>>>>>> --- a/controller/ovn-controller.c
>>>>>> +++ b/controller/ovn-controller.c
>>>>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>>>>   unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>>>> 
>>>>>>   struct shash_node *shash_node;
>>>>>> +    const struct binding_lport *lport;
>>>>>>   SHASH_FOR_EACH (shash_node, binding_lports) {
>>>>>>       sset_add(&all_users, shash_node->name);
>>>>>> +
>>>>>> +        /* Zone for committing dropped connections of a vNIC. */
>>>>>> +        lport = shash_node->data;
>>>>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>>>>> +        sset_add(&all_users, drop_zone);
>>>>>> +        free(drop_zone);
>>>>>>   }
>>>>>> 
>>>>>>   /* Local patched datapath (gateway routers) need zones assigned. */
>>>>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>>>>   HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>>>>       /* XXX Add method to limit zone assignment to logical router
>>>>>>        * datapaths with NAT */
>>>>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>>>       sset_add(&all_users, dnat);
>>>>>>       sset_add(&all_users, snat);
>>>>>>       shash_add(&all_lds, dnat, ld);
>>>>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>>>>       /* Check if the requested snat zone has changed for the datapath
>>>>>>        * or not.  If so, then fall back to full recompute of
>>>>>>        * ct_zone engine. */
>>>>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>>>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>>>>       struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>>>>                                                  snat_dp_zone_key);
>>>>>>       free(snat_dp_zone_key);
>>>>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>>>>                                       &ct_zones_data->pending);
>>>>>>                   updated = true;
>>>>>>               }
>>>>>> +                char *drop_zone = alloc_ct_zone_key(
>>>>>> +                    &t_lport->pb->header_.uuid, "drop");
>>>>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>>>>> +                    alloc_id_to_ct_zone(drop_zone,
>>>>>> +                                        &ct_zones_data->current,
>>>>>> +                                        ct_zones_data->bitmap, &scan_start,
>>>>>> +                                        &ct_zones_data->pending);
>>>>>> +                    updated = true;
>>>>>> +                }
>>>>>> +                free(drop_zone);
>>>>>>           } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>>>>               struct simap_node *ct_zone =
>>>>>>                   simap_find(&ct_zones_data->current,
>>>>>> diff --git a/controller/physical.c b/controller/physical.c
>>>>>> index f3c8bddce..fc46669c1 100644
>>>>>> --- a/controller/physical.c
>>>>>> +++ b/controller/physical.c
>>>>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>>>>   int ct;                     /* MFF_LOG_CT_ZONE. */
>>>>>>   int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>>>>   int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>>>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>>>>> };
>>>>>> 
>>>>>> struct tunnel {
>>>>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>>>> 
>>>>>>   const struct uuid *key = &binding->datapath->header_.uuid;
>>>>>> 
>>>>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>>>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>>>>   zone_ids.dnat = simap_get(ct_zones, dnat);
>>>>>>   free(dnat);
>>>>>> 
>>>>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>>>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>>>>   zone_ids.snat = simap_get(ct_zones, snat);
>>>>>>   free(snat);
>>>>>> 
>>>>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>>>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>>>>> +    free(drop_zone);
>>>>>> +
>>>>>>   return zone_ids;
>>>>>> }
>>>>>> 
>>>>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>>>>       if (zone_ids->snat) {
>>>>>>           put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>>>>       }
>>>>>> +        if (zone_ids->drop) {
>>>>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>>>>> +        }
>>>>>>   }
>>>>>> }
>>>>>> 
>>>>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>>>>                   pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>>>>                   &pb->header_.uuid);
>>>>>> 
>>>>>> +    if (zone_ids->drop) {
>>>>>> +        /* Table 39, Priority 1.
>>>>>> +         * =======================
>>>>>> +         *
>>>>>> +         * Clear the logical registers (for consistent behavior with packets
>>>>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>>>>> +        match_init_catchall(&match);
>>>>>> +        ofpbuf_clear(ofpacts_p);
>>>>>> +        match_set_metadata(&match, htonll(dp_key));
>>>>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>>>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>>>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>>>>> +            }
>>>>>> +        }
>>>>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>>>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>>>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>>>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>>>>> +    }
>>>>>> +
>>>>>>   /* Table 39, Priority 100.
>>>>>>    * =======================
>>>>>>    *
>>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>>> index d7ee84dac..6424250a6 100644
>>>>>> --- a/include/ovn/actions.h
>>>>>> +++ b/include/ovn/actions.h
>>>>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>>>>   OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>>>>   OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>>>>   OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>>>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>>>> 
>>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>>>>> index 3db7265e4..889f5f9e3 100644
>>>>>> --- a/include/ovn/logical-fields.h
>>>>>> +++ b/include/ovn/logical-fields.h
>>>>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>>>>> #define MFF_LOG_REG0             MFF_REG0
>>>>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>>>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>>>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>>>> 
>>>>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>>>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>>> index adbb42db4..dfff3e793 100644
>>>>>> --- a/lib/actions.c
>>>>>> +++ b/lib/actions.c
>>>>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>>>>                          MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>>>>> }
>>>>>> 
>>>>>> +static void
>>>>>> +parse_ct_commit_drop(struct action_context *ctx)
>>>>>> +{
>>>>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>>>>> +{
>>>>>> +    format_nested_action(on, "ct_commit_drop", s);
>>>>>> +}
>>>>>> +
>>>>>> +static void
>>>>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>>>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>>>>> +                      struct ofpbuf *ofpacts)
>>>>>> +{
>>>>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>>>>> +    ct->flags = NX_CT_F_COMMIT;
>>>>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>>>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>>>>> +    ct->zone_src.ofs = 0;
>>>>>> +    ct->zone_src.n_bits = 16;
>>>>>> +
>>>>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>>>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>>>>> +     */
>>>>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>>>>> +        size_t nat_offset = ofpacts->size;
>>>>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>>>>> +
>>>>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>>>>> +        nat->flags = 0;
>>>>>> +        nat->range_af = AF_UNSPEC;
>>>>>> +        nat->flags |= NX_NAT_F_SRC;
>>>>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>>>>> +        ct = ofpacts->header;
>>>>>> +    }
>>>>>> +
>>>>>> +    size_t set_field_offset = ofpacts->size;
>>>>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>>>>> +
>>>>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>>>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>>>>> +    ct = ofpacts->header;
>>>>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>>>>> +}
>>>>>> +
>>>>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>>>>> static void
>>>>>> parse_set_action(struct action_context *ctx)
>>>>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>>>>       parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>>>>   } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>>>>       parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>>>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>>>>> +        parse_ct_commit_drop(ctx);
>>>>>>   } else {
>>>>>>       lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>>   }
>>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>>> index 616999eab..a72533a9e 100644
>>>>>> --- a/lib/ovn-util.c
>>>>>> +++ b/lib/ovn-util.c
>>>>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>>>   destroy_lport_addresses(&laddrs);
>>>>>> }
>>>>>> 
>>>>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>>>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>>>> * 'key' record and a 'type'.
>>>>>> *
>>>>>> * It is the caller's responsibility to free the allocated memory. */
>>>>>> char *
>>>>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>>>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>>>>> {
>>>>>>   return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>>>>> }
>>>>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>>>>> index b3905ef7b..e0855f19e 100644
>>>>>> --- a/lib/ovn-util.h
>>>>>> +++ b/lib/ovn-util.h
>>>>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>>>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>>>                    struct svec *ipv6_addrs);
>>>>>> 
>>>>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>>>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>>>> 
>>>>>> const char *default_nb_db(void);
>>>>>> const char *default_sb_db(void);
>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>> index 7e2681865..2aff3458c 100644
>>>>>> --- a/northd/northd.c
>>>>>> +++ b/northd/northd.c
>>>>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>>>> * +----+----------------------------------------------+ G |                  |
>>>>>> * | R7 |                   UNUSED                     | 1 |                  |
>>>>>> * +----+----------------------------------------------+---+------------------+
>>>>>> - * | R8 |                   UNUSED                     |
>>>>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>>>> * +----+----------------------------------------------+
>>>>>> * | R9 |                   UNUSED                     |
>>>>>> * +----+----------------------------------------------+
>>>>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>>>           } else {
>>>>>>               ds_put_format(match, " && (%s)", acl->match);
>>>>>>               build_acl_log(actions, acl, meter_groups);
>>>>>> +                if (acl->label) {
>>>>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>>>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>>>>> +                                  acl->label);
>>>>>> +                }
>>>>>>               ds_put_cstr(actions, "/* drop */");
>>>>>>               ovn_lflow_add_with_hint(lflows, od, stage,
>>>>>>                                       acl->priority + OVN_ACL_PRI_OFFSET,
>>>>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>>>           ds_clear(match);
>>>>>>           ds_clear(actions);
>>>>>>           ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>>>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>>>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>>>>                         ct_blocked_match);
>>>>>> +            if (acl->label) {
>>>>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>>>>> +                              acl->label);
>>>>>> +            }
>>>>>> +            ds_put_cstr(actions, "}; ");
>>>>>>           if (!strcmp(acl->action, "reject")) {
>>>>>>               build_reject_acl_rules(od, lflows, stage, acl, match,
>>>>>>                                      actions, &acl->header_, meter_groups);
>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>>>>> index b8f871394..af653be4d 100644
>>>>>> --- a/northd/ovn-northd.8.xml
>>>>>> +++ b/northd/ovn-northd.8.xml
>>>>>> @@ -699,7 +699,12 @@
>>>>>>       connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>>>>       connections.  Setting <code>ct_label</code> marks a connection
>>>>>>       as one that was previously allowed, but should no longer be
>>>>>> -        allowed due to a policy change.
>>>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>>>> +        then it translates to
>>>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>>>> +        untracked connections and
>>>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>>>> +        for known connections.
>>>>>>     </li>
>>>>>>   </ul>
>>>>>> 
>>>>>> @@ -969,7 +974,12 @@
>>>>>>       or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>>>>       known connections.  Setting <code>ct_label</code> marks a connection
>>>>>>       as one that was previously allowed, but should no longer be
>>>>>> -        allowed due to a policy change.
>>>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>>>> +        then it translates to
>>>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>>>> +        untracked connections and
>>>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>>>> +        for known connections.
>>>>>>     </li>
>>>>>>   </ul>
>>>>>> 
>>>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>>>>> index 37a709f83..4e294a212 100644
>>>>>> --- a/ovn-sb.xml
>>>>>> +++ b/ovn-sb.xml
>>>>>> @@ -1377,6 +1377,23 @@
>>>>>>         </p>
>>>>>>       </dd>
>>>>>> 
>>>>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>>>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>>>>> +        <dd>
>>>>>> +          <p>
>>>>>> +            This action is identical to <code>ct_commit</code> except that the
>>>>>> +            connection tracking entry is committed in a different zone.
>>>>>> +          </p>
>>>>>> +
>>>>>> +          <p>
>>>>>> +            This action was added to store connections dropped by ACLs in a
>>>>>> +            separate zone that is managed independently of the
>>>>>> +            <code>ct_commit</code> zone, for debugging.
>>>>>> +          </p>
>>>>>> +        </dd>
>>>>>> +
>>>>>>       <dt><code>ct_dnat;</code></dt>
>>>>>>       <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>>>>       <dd>
>>>>>> diff --git a/ovs b/ovs
>>>>>> index 6f24c2bc7..d94cd0d3e 160000
>>>>>> --- a/ovs
>>>>>> +++ b/ovs
>>>>>> @@ -1 +1 @@
>>>>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>>>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>>>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>>>>> index 3bbdbd998..07509f488 100644
>>>>>> --- a/utilities/ovn-nbctl.c
>>>>>> +++ b/utilities/ovn-nbctl.c
>>>>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>>>>   /* Set the ACL label */
>>>>>>   const char *label = shash_find_data(&ctx->options, "--label");
>>>>>>   if (label) {
>>>>>> -      /* Ensure that the action is either allow or allow-related */
>>>>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>>>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>>>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>>>>> +          strcmp(action, "drop")) {
>>>>>>       ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>>>>> -                  "\"allow-related\"");
>>>>>> +                  "\"allow-related\" or \"drop\"");
>>>>>>       return;
>>>>>>     }
>> 
>
Dumitru Ceara Nov. 1, 2022, 12:35 p.m. UTC | #7
On 11/1/22 12:40, Abhiram Sangana wrote:
> 
> 
>> On 21 Oct 2022, at 10:58, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 10/20/22 17:34, Abhiram Sangana wrote:
>>> Hi Dumitru,
>>>
>>> Can you please check if the implementation for the proposal looks ok?
>>> Will send out v1 with the review comments and tests.
>>> Also, any ideas how we can selectively create new drop zones for ports.
>>> Currently, I am assigning a new zone for dropped connections to every
>>> port even if its parent LS doesn’t have drop ACLs with labels.
>>>
>>
>> Within a logical switch do we really need a drop zone per port?  Isn't
>> it actually enough if we add a "from-lport-drop" and "to-lport-drop"
>> zone for the whole logical switch?  That should simplify zone allocation.
>>
> Given that we are committing dropped connections to CT table, a DDOS
> attack can potentially fill up the CT table. We are planning to send
> another patch that limits the number of entries for a given CT zone.
> It is easier to manage the size of CT zones when there is a CT zone
> per port rather than per Logical_Switch.
> 

Ok.  To answer your previous question, we could mark the SB.Port_Binding
with an option if there are ACLs applied to the switch containing it and
at least one of those needs drop zones.  Even better would be to mark
the the SB.Datapath_Binding but we don't have options there, we'd have
to update the schema.

I'm not sure if that works for you, what do you think?

Thanks,
Dumitru
Abhiram Sangana Nov. 1, 2022, 4:52 p.m. UTC | #8
> On 1 Nov 2022, at 12:35, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 11/1/22 12:40, Abhiram Sangana wrote:
>> 
>> 
>>> On 21 Oct 2022, at 10:58, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> On 10/20/22 17:34, Abhiram Sangana wrote:
>>>> Hi Dumitru,
>>>> 
>>>> Can you please check if the implementation for the proposal looks ok?
>>>> Will send out v1 with the review comments and tests.
>>>> Also, any ideas how we can selectively create new drop zones for ports.
>>>> Currently, I am assigning a new zone for dropped connections to every
>>>> port even if its parent LS doesn’t have drop ACLs with labels.
>>>> 
>>> 
>>> Within a logical switch do we really need a drop zone per port?  Isn't
>>> it actually enough if we add a "from-lport-drop" and "to-lport-drop"
>>> zone for the whole logical switch?  That should simplify zone allocation.
>>> 
>> Given that we are committing dropped connections to CT table, a DDOS
>> attack can potentially fill up the CT table. We are planning to send
>> another patch that limits the number of entries for a given CT zone.
>> It is easier to manage the size of CT zones when there is a CT zone
>> per port rather than per Logical_Switch.
>> 
> 
> Ok.  To answer your previous question, we could mark the SB.Port_Binding
> with an option if there are ACLs applied to the switch containing it and
> at least one of those needs drop zones.  Even better would be to mark
> the the SB.Datapath_Binding but we don't have options there, we'd have
> to update the schema.
> 
> I'm not sure if that works for you, what do you think?
> 
> Thanks,
> Dumitru

Yes, I think that should work. I will send out a v1 patch with these changes.
Thank you, Dumitru.

Thanks,
Abhiram
Adrian Moreno Nov. 14, 2022, 1:05 p.m. UTC | #9
Hi,

On 10/20/22 15:49, Abhiram Sangana wrote:
> Hi Dumitru,
> 
> Thanks for reviewing the patch.
> 
>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> Hi Abhiram,
>>
>> Thanks for the patch!  I only skimmed the changes so this is not a full
>> review but more of a discussion starter.
>>
>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>> but this approach does not scale. ACL logging uses "controller" action
>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>> review) but we might miss specific connections of interest with this
>>> approach.
>>>
>>> This patch commits connections dropped by ACLs to the connection tracking
>>> table with a specific ACL label that was introduced in 0e0228be (
>>> northd: Add ACL label). The dropped connections are committed in a separate
>>> CT zone so that they can be managed independently.
>>>
>>
>> I'm not sure I understand how the CMS can manage this.  How is this
>> better than sampling?  Committed connections are going to time out at
>> some point (30 sec by default for udp/icmp with the kernel datapath).
>> So the CMS will have to continuously monitor the contents of the
>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>> this?  Even so, there's a chance an entry is missed.
> 
> Linux nf_conntrack module supports sending connection tracking events
> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
> parameter). So, CMS can parse the stream of new connection events from
> conntrack and log the packets based on CT label.
> 

Isn't this datapath-specific? this won't be available for the netdev datapath, 
right?

> An issue with sampling is that if there are a large number of packets
> for a particular connection(s), packets of other connections might not
> get sampled and we miss information about these connections. With the
> conntrack approach, we get a single event for each connection (until
> they time out), so there is lesser load on the CMS/collector and lesser
> likelihood of missing connections.

I don't see why sampling would inherently miss packets. The current RFC for ACL 
sampling (different from the generic drop sampling one) allows the user to 
specify a probability per ACL so 100% of packets can be sampled in connection 
establishment drops while we sample N% of accepted traffic having a lot of 
flexibility in the inevitable performance vs accuracy tradeoff.

I'd like to better understand what are the limitations of the current approach 
to see if it can be improved in any way.

Thanks,
--
Adrián

> 
>>
>>> Each logical port is assigned a new zone for committing dropped flows.
>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>
>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>> to connection tracking table in a zone identified by
>>> MFF_LOG_ACL_DROP_ZONE register.
>>>
>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>> instead of silently dropping the packet.
>>>
>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>> ---
>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>> include/ovn/actions.h        |  1 +
>>> include/ovn/logical-fields.h |  1 +
>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>> lib/ovn-util.c               |  4 +--
>>> lib/ovn-util.h               |  2 +-
>>> northd/northd.c              | 14 ++++++++--
>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>> ovn-sb.xml                   | 17 ++++++++++++
>>> ovs                          |  2 +-
>>
>> This shouldn't need to change the OVS submodule.
>>
> My bad. Will fix this.
> 
> Thanks,
> Abhiram Sangana
> 
>> Thanks,
>> Dumitru
>>
>>> utilities/ovn-nbctl.c        |  7 ++---
>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index c97744d57..1ad20fe55 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>      unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>
>>>      struct shash_node *shash_node;
>>> +    const struct binding_lport *lport;
>>>      SHASH_FOR_EACH (shash_node, binding_lports) {
>>>          sset_add(&all_users, shash_node->name);
>>> +
>>> +        /* Zone for committing dropped connections of a vNIC. */
>>> +        lport = shash_node->data;
>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>> +        sset_add(&all_users, drop_zone);
>>> +        free(drop_zone);
>>>      }
>>>
>>>      /* Local patched datapath (gateway routers) need zones assigned. */
>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>          /* XXX Add method to limit zone assignment to logical router
>>>           * datapaths with NAT */
>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>          sset_add(&all_users, dnat);
>>>          sset_add(&all_users, snat);
>>>          shash_add(&all_lds, dnat, ld);
>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>          /* Check if the requested snat zone has changed for the datapath
>>>           * or not.  If so, then fall back to full recompute of
>>>           * ct_zone engine. */
>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>          struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>                                                     snat_dp_zone_key);
>>>          free(snat_dp_zone_key);
>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>                                          &ct_zones_data->pending);
>>>                      updated = true;
>>>                  }
>>> +                char *drop_zone = alloc_ct_zone_key(
>>> +                    &t_lport->pb->header_.uuid, "drop");
>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>> +                    alloc_id_to_ct_zone(drop_zone,
>>> +                                        &ct_zones_data->current,
>>> +                                        ct_zones_data->bitmap, &scan_start,
>>> +                                        &ct_zones_data->pending);
>>> +                    updated = true;
>>> +                }
>>> +                free(drop_zone);
>>>              } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>                  struct simap_node *ct_zone =
>>>                      simap_find(&ct_zones_data->current,
>>> diff --git a/controller/physical.c b/controller/physical.c
>>> index f3c8bddce..fc46669c1 100644
>>> --- a/controller/physical.c
>>> +++ b/controller/physical.c
>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>      int ct;                     /* MFF_LOG_CT_ZONE. */
>>>      int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>      int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>> };
>>>
>>> struct tunnel {
>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>
>>>      const struct uuid *key = &binding->datapath->header_.uuid;
>>>
>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>      zone_ids.dnat = simap_get(ct_zones, dnat);
>>>      free(dnat);
>>>
>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>      zone_ids.snat = simap_get(ct_zones, snat);
>>>      free(snat);
>>>
>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>> +    free(drop_zone);
>>> +
>>>      return zone_ids;
>>> }
>>>
>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>          if (zone_ids->snat) {
>>>              put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>          }
>>> +        if (zone_ids->drop) {
>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>> +        }
>>>      }
>>> }
>>>
>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>                      pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>                      &pb->header_.uuid);
>>>
>>> +    if (zone_ids->drop) {
>>> +        /* Table 39, Priority 1.
>>> +         * =======================
>>> +         *
>>> +         * Clear the logical registers (for consistent behavior with packets
>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>> +        match_init_catchall(&match);
>>> +        ofpbuf_clear(ofpacts_p);
>>> +        match_set_metadata(&match, htonll(dp_key));
>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>> +            }
>>> +        }
>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>> +    }
>>> +
>>>      /* Table 39, Priority 100.
>>>       * =======================
>>>       *
>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>> index d7ee84dac..6424250a6 100644
>>> --- a/include/ovn/actions.h
>>> +++ b/include/ovn/actions.h
>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>      OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>      OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>      OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>
>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>> enum OVS_PACKED_ENUM ovnact_type {
>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>> index 3db7265e4..889f5f9e3 100644
>>> --- a/include/ovn/logical-fields.h
>>> +++ b/include/ovn/logical-fields.h
>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>> #define MFF_LOG_REG0             MFF_REG0
>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>
>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>> diff --git a/lib/actions.c b/lib/actions.c
>>> index adbb42db4..dfff3e793 100644
>>> --- a/lib/actions.c
>>> +++ b/lib/actions.c
>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>                             MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>> }
>>>
>>> +static void
>>> +parse_ct_commit_drop(struct action_context *ctx)
>>> +{
>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>> +}
>>> +
>>> +static void
>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>> +{
>>> +    format_nested_action(on, "ct_commit_drop", s);
>>> +}
>>> +
>>> +static void
>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>> +                      struct ofpbuf *ofpacts)
>>> +{
>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>> +    ct->flags = NX_CT_F_COMMIT;
>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>> +    ct->zone_src.ofs = 0;
>>> +    ct->zone_src.n_bits = 16;
>>> +
>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>> +     */
>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>> +        size_t nat_offset = ofpacts->size;
>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>> +
>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>> +        nat->flags = 0;
>>> +        nat->range_af = AF_UNSPEC;
>>> +        nat->flags |= NX_NAT_F_SRC;
>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>> +        ct = ofpacts->header;
>>> +    }
>>> +
>>> +    size_t set_field_offset = ofpacts->size;
>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>> +
>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>> +    ct = ofpacts->header;
>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>> +}
>>> +
>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>> static void
>>> parse_set_action(struct action_context *ctx)
>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>          parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>      } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>          parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>> +        parse_ct_commit_drop(ctx);
>>>      } else {
>>>          lexer_syntax_error(ctx->lexer, "expecting action");
>>>      }
>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>> index 616999eab..a72533a9e 100644
>>> --- a/lib/ovn-util.c
>>> +++ b/lib/ovn-util.c
>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>      destroy_lport_addresses(&laddrs);
>>> }
>>>
>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>   * 'key' record and a 'type'.
>>>   *
>>>   * It is the caller's responsibility to free the allocated memory. */
>>> char *
>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>> {
>>>      return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>> }
>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>> index b3905ef7b..e0855f19e 100644
>>> --- a/lib/ovn-util.h
>>> +++ b/lib/ovn-util.h
>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>                       struct svec *ipv6_addrs);
>>>
>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>
>>> const char *default_nb_db(void);
>>> const char *default_sb_db(void);
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 7e2681865..2aff3458c 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>   * +----+----------------------------------------------+ G |                  |
>>>   * | R7 |                   UNUSED                     | 1 |                  |
>>>   * +----+----------------------------------------------+---+------------------+
>>> - * | R8 |                   UNUSED                     |
>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>   * +----+----------------------------------------------+
>>>   * | R9 |                   UNUSED                     |
>>>   * +----+----------------------------------------------+
>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>              } else {
>>>                  ds_put_format(match, " && (%s)", acl->match);
>>>                  build_acl_log(actions, acl, meter_groups);
>>> +                if (acl->label) {
>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>> +                                  acl->label);
>>> +                }
>>>                  ds_put_cstr(actions, "/* drop */");
>>>                  ovn_lflow_add_with_hint(lflows, od, stage,
>>>                                          acl->priority + OVN_ACL_PRI_OFFSET,
>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>              ds_clear(match);
>>>              ds_clear(actions);
>>>              ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>                            ct_blocked_match);
>>> +            if (acl->label) {
>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>> +                              acl->label);
>>> +            }
>>> +            ds_put_cstr(actions, "}; ");
>>>              if (!strcmp(acl->action, "reject")) {
>>>                  build_reject_acl_rules(od, lflows, stage, acl, match,
>>>                                         actions, &acl->header_, meter_groups);
>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>> index b8f871394..af653be4d 100644
>>> --- a/northd/ovn-northd.8.xml
>>> +++ b/northd/ovn-northd.8.xml
>>> @@ -699,7 +699,12 @@
>>>          connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>          connections.  Setting <code>ct_label</code> marks a connection
>>>          as one that was previously allowed, but should no longer be
>>> -        allowed due to a policy change.
>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>> +        then it translates to
>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>> +        untracked connections and
>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>> +        for known connections.
>>>        </li>
>>>      </ul>
>>>
>>> @@ -969,7 +974,12 @@
>>>          or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>          known connections.  Setting <code>ct_label</code> marks a connection
>>>          as one that was previously allowed, but should no longer be
>>> -        allowed due to a policy change.
>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>> +        then it translates to
>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>> +        untracked connections and
>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>> +        for known connections.
>>>        </li>
>>>      </ul>
>>>
>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>> index 37a709f83..4e294a212 100644
>>> --- a/ovn-sb.xml
>>> +++ b/ovn-sb.xml
>>> @@ -1377,6 +1377,23 @@
>>>            </p>
>>>          </dd>
>>>
>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>> +        <dd>
>>> +          <p>
>>> +            This action is identical to <code>ct_commit</code> except that the
>>> +            connection tracking entry is committed in a different zone.
>>> +          </p>
>>> +
>>> +          <p>
>>> +            This action was added to store connections dropped by ACLs in a
>>> +            separate zone that is managed independently of the
>>> +            <code>ct_commit</code> zone, for debugging.
>>> +          </p>
>>> +        </dd>
>>> +
>>>          <dt><code>ct_dnat;</code></dt>
>>>          <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>          <dd>
>>> diff --git a/ovs b/ovs
>>> index 6f24c2bc7..d94cd0d3e 160000
>>> --- a/ovs
>>> +++ b/ovs
>>> @@ -1 +1 @@
>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>> index 3bbdbd998..07509f488 100644
>>> --- a/utilities/ovn-nbctl.c
>>> +++ b/utilities/ovn-nbctl.c
>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>      /* Set the ACL label */
>>>      const char *label = shash_find_data(&ctx->options, "--label");
>>>      if (label) {
>>> -      /* Ensure that the action is either allow or allow-related */
>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>> +          strcmp(action, "drop")) {
>>>          ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>> -                  "\"allow-related\"");
>>> +                  "\"allow-related\" or \"drop\"");
>>>          return;
>>>        }
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Abhiram Sangana Nov. 21, 2022, 1:56 p.m. UTC | #10
Hi Adrian,

I apologise for the delay in replying back.

> On 14 Nov 2022, at 13:05, Adrian Moreno <amorenoz@redhat.com> wrote:
> 
> Hi,
> 
> On 10/20/22 15:49, Abhiram Sangana wrote:
>> Hi Dumitru,
>> Thanks for reviewing the patch.
>>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> Hi Abhiram,
>>> 
>>> Thanks for the patch!  I only skimmed the changes so this is not a full
>>> review but more of a discussion starter.
>>> 
>>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>>> but this approach does not scale. ACL logging uses "controller" action
>>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>>> review) but we might miss specific connections of interest with this
>>>> approach.
>>>> 
>>>> This patch commits connections dropped by ACLs to the connection tracking
>>>> table with a specific ACL label that was introduced in 0e0228be (
>>>> northd: Add ACL label). The dropped connections are committed in a separate
>>>> CT zone so that they can be managed independently.
>>>> 
>>> 
>>> I'm not sure I understand how the CMS can manage this.  How is this
>>> better than sampling?  Committed connections are going to time out at
>>> some point (30 sec by default for udp/icmp with the kernel datapath).
>>> So the CMS will have to continuously monitor the contents of the
>>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>>> this?  Even so, there's a chance an entry is missed.
>> Linux nf_conntrack module supports sending connection tracking events
>> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
>> parameter). So, CMS can parse the stream of new connection events from
>> conntrack and log the packets based on CT label.
> 
> Isn't this datapath-specific? this won't be available for the netdev datapath, right?

Yes, this approach is datapath specific - I haven’t checked how to send connection tracking events for netdev datapath.
> 
>> An issue with sampling is that if there are a large number of packets
>> for a particular connection(s), packets of other connections might not
>> get sampled and we miss information about these connections. With the
>> conntrack approach, we get a single event for each connection (until
>> they time out), so there is lesser load on the CMS/collector and lesser
>> likelihood of missing connections.
> 
> I don't see why sampling would inherently miss packets. The current RFC for ACL sampling (different from the generic drop sampling one) allows the user to specify a probability per ACL so 100% of packets can be sampled in connection establishment drops while we sample N% of accepted traffic having a lot of flexibility in the inevitable performance vs accuracy tradeoff.
> 
> I'd like to better understand what are the limitations of the current approach to see if it can be improved in any way.

I haven’t experimented with flow-based IPFIX sampling but I noticed high CPU usage of ovs-vswitchd while trying to export Netflow records (which I think is similar to 100% sampling) with large number of connections in OVN bridge. I was expecting a similar cost with respect to upcalls if we use 100% sampling rate for drop ACLs when there are multiple connection establishment packets in the bridge.

Thanks,
Abhiram

> Thanks,
> --
> Adrián
> 
>>> 
>>>> Each logical port is assigned a new zone for committing dropped flows.
>>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>> 
>>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>>> to connection tracking table in a zone identified by
>>>> MFF_LOG_ACL_DROP_ZONE register.
>>>> 
>>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>>> instead of silently dropping the packet.
>>>> 
>>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>>> ---
>>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>>> include/ovn/actions.h        |  1 +
>>>> include/ovn/logical-fields.h |  1 +
>>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>>> lib/ovn-util.c               |  4 +--
>>>> lib/ovn-util.h               |  2 +-
>>>> northd/northd.c              | 14 ++++++++--
>>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>>> ovn-sb.xml                   | 17 ++++++++++++
>>>> ovs                          |  2 +-
>>> 
>>> This shouldn't need to change the OVS submodule.
>>> 
>> My bad. Will fix this.
>> Thanks,
>> Abhiram Sangana
>>> Thanks,
>>> Dumitru
>>> 
>>>> utilities/ovn-nbctl.c        |  7 ++---
>>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>> 
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index c97744d57..1ad20fe55 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>>     unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>> 
>>>>     struct shash_node *shash_node;
>>>> +    const struct binding_lport *lport;
>>>>     SHASH_FOR_EACH (shash_node, binding_lports) {
>>>>         sset_add(&all_users, shash_node->name);
>>>> +
>>>> +        /* Zone for committing dropped connections of a vNIC. */
>>>> +        lport = shash_node->data;
>>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>>> +        sset_add(&all_users, drop_zone);
>>>> +        free(drop_zone);
>>>>     }
>>>> 
>>>>     /* Local patched datapath (gateway routers) need zones assigned. */
>>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>>     HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>>         /* XXX Add method to limit zone assignment to logical router
>>>>          * datapaths with NAT */
>>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>         sset_add(&all_users, dnat);
>>>>         sset_add(&all_users, snat);
>>>>         shash_add(&all_lds, dnat, ld);
>>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>>         /* Check if the requested snat zone has changed for the datapath
>>>>          * or not.  If so, then fall back to full recompute of
>>>>          * ct_zone engine. */
>>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>>         struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>>                                                    snat_dp_zone_key);
>>>>         free(snat_dp_zone_key);
>>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>>                                         &ct_zones_data->pending);
>>>>                     updated = true;
>>>>                 }
>>>> +                char *drop_zone = alloc_ct_zone_key(
>>>> +                    &t_lport->pb->header_.uuid, "drop");
>>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>>> +                    alloc_id_to_ct_zone(drop_zone,
>>>> +                                        &ct_zones_data->current,
>>>> +                                        ct_zones_data->bitmap, &scan_start,
>>>> +                                        &ct_zones_data->pending);
>>>> +                    updated = true;
>>>> +                }
>>>> +                free(drop_zone);
>>>>             } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>>                 struct simap_node *ct_zone =
>>>>                     simap_find(&ct_zones_data->current,
>>>> diff --git a/controller/physical.c b/controller/physical.c
>>>> index f3c8bddce..fc46669c1 100644
>>>> --- a/controller/physical.c
>>>> +++ b/controller/physical.c
>>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>>     int ct;                     /* MFF_LOG_CT_ZONE. */
>>>>     int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>>     int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>>> };
>>>> 
>>>> struct tunnel {
>>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>> 
>>>>     const struct uuid *key = &binding->datapath->header_.uuid;
>>>> 
>>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>>     zone_ids.dnat = simap_get(ct_zones, dnat);
>>>>     free(dnat);
>>>> 
>>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>>     zone_ids.snat = simap_get(ct_zones, snat);
>>>>     free(snat);
>>>> 
>>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>>> +    free(drop_zone);
>>>> +
>>>>     return zone_ids;
>>>> }
>>>> 
>>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>>         if (zone_ids->snat) {
>>>>             put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>>         }
>>>> +        if (zone_ids->drop) {
>>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>>> +        }
>>>>     }
>>>> }
>>>> 
>>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>>                     pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>>                     &pb->header_.uuid);
>>>> 
>>>> +    if (zone_ids->drop) {
>>>> +        /* Table 39, Priority 1.
>>>> +         * =======================
>>>> +         *
>>>> +         * Clear the logical registers (for consistent behavior with packets
>>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>>> +        match_init_catchall(&match);
>>>> +        ofpbuf_clear(ofpacts_p);
>>>> +        match_set_metadata(&match, htonll(dp_key));
>>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>>> +            }
>>>> +        }
>>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>>> +    }
>>>> +
>>>>     /* Table 39, Priority 100.
>>>>      * =======================
>>>>      *
>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>> index d7ee84dac..6424250a6 100644
>>>> --- a/include/ovn/actions.h
>>>> +++ b/include/ovn/actions.h
>>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>>     OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>>     OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>>     OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>> 
>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>>> index 3db7265e4..889f5f9e3 100644
>>>> --- a/include/ovn/logical-fields.h
>>>> +++ b/include/ovn/logical-fields.h
>>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>>> #define MFF_LOG_REG0             MFF_REG0
>>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>> 
>>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>> index adbb42db4..dfff3e793 100644
>>>> --- a/lib/actions.c
>>>> +++ b/lib/actions.c
>>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>>                            MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>>> }
>>>> 
>>>> +static void
>>>> +parse_ct_commit_drop(struct action_context *ctx)
>>>> +{
>>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>>> +}
>>>> +
>>>> +static void
>>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>>> +{
>>>> +    format_nested_action(on, "ct_commit_drop", s);
>>>> +}
>>>> +
>>>> +static void
>>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>>> +                      struct ofpbuf *ofpacts)
>>>> +{
>>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>>> +    ct->flags = NX_CT_F_COMMIT;
>>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>>> +    ct->zone_src.ofs = 0;
>>>> +    ct->zone_src.n_bits = 16;
>>>> +
>>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>>> +     */
>>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>>> +        size_t nat_offset = ofpacts->size;
>>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>>> +
>>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>>> +        nat->flags = 0;
>>>> +        nat->range_af = AF_UNSPEC;
>>>> +        nat->flags |= NX_NAT_F_SRC;
>>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>>> +        ct = ofpacts->header;
>>>> +    }
>>>> +
>>>> +    size_t set_field_offset = ofpacts->size;
>>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>>> +
>>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>>> +    ct = ofpacts->header;
>>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>>> +}
>>>> +
>>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>>> static void
>>>> parse_set_action(struct action_context *ctx)
>>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>>         parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>>     } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>>         parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>>> +        parse_ct_commit_drop(ctx);
>>>>     } else {
>>>>         lexer_syntax_error(ctx->lexer, "expecting action");
>>>>     }
>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>> index 616999eab..a72533a9e 100644
>>>> --- a/lib/ovn-util.c
>>>> +++ b/lib/ovn-util.c
>>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>     destroy_lport_addresses(&laddrs);
>>>> }
>>>> 
>>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>>  * 'key' record and a 'type'.
>>>>  *
>>>>  * It is the caller's responsibility to free the allocated memory. */
>>>> char *
>>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>>> {
>>>>     return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>>> }
>>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>>> index b3905ef7b..e0855f19e 100644
>>>> --- a/lib/ovn-util.h
>>>> +++ b/lib/ovn-util.h
>>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>                      struct svec *ipv6_addrs);
>>>> 
>>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>> 
>>>> const char *default_nb_db(void);
>>>> const char *default_sb_db(void);
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 7e2681865..2aff3458c 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>>  * +----+----------------------------------------------+ G |                  |
>>>>  * | R7 |                   UNUSED                     | 1 |                  |
>>>>  * +----+----------------------------------------------+---+------------------+
>>>> - * | R8 |                   UNUSED                     |
>>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>>  * +----+----------------------------------------------+
>>>>  * | R9 |                   UNUSED                     |
>>>>  * +----+----------------------------------------------+
>>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>             } else {
>>>>                 ds_put_format(match, " && (%s)", acl->match);
>>>>                 build_acl_log(actions, acl, meter_groups);
>>>> +                if (acl->label) {
>>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>>> +                                  acl->label);
>>>> +                }
>>>>                 ds_put_cstr(actions, "/* drop */");
>>>>                 ovn_lflow_add_with_hint(lflows, od, stage,
>>>>                                         acl->priority + OVN_ACL_PRI_OFFSET,
>>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>             ds_clear(match);
>>>>             ds_clear(actions);
>>>>             ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>>                           ct_blocked_match);
>>>> +            if (acl->label) {
>>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>>> +                              acl->label);
>>>> +            }
>>>> +            ds_put_cstr(actions, "}; ");
>>>>             if (!strcmp(acl->action, "reject")) {
>>>>                 build_reject_acl_rules(od, lflows, stage, acl, match,
>>>>                                        actions, &acl->header_, meter_groups);
>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>>> index b8f871394..af653be4d 100644
>>>> --- a/northd/ovn-northd.8.xml
>>>> +++ b/northd/ovn-northd.8.xml
>>>> @@ -699,7 +699,12 @@
>>>>         connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>>         connections.  Setting <code>ct_label</code> marks a connection
>>>>         as one that was previously allowed, but should no longer be
>>>> -        allowed due to a policy change.
>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>> +        then it translates to
>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>> +        untracked connections and
>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>> +        for known connections.
>>>>       </li>
>>>>     </ul>
>>>> 
>>>> @@ -969,7 +974,12 @@
>>>>         or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>>         known connections.  Setting <code>ct_label</code> marks a connection
>>>>         as one that was previously allowed, but should no longer be
>>>> -        allowed due to a policy change.
>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>> +        then it translates to
>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>> +        untracked connections and
>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>> +        for known connections.
>>>>       </li>
>>>>     </ul>
>>>> 
>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>>> index 37a709f83..4e294a212 100644
>>>> --- a/ovn-sb.xml
>>>> +++ b/ovn-sb.xml
>>>> @@ -1377,6 +1377,23 @@
>>>>           </p>
>>>>         </dd>
>>>> 
>>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>>> +        <dd>
>>>> +          <p>
>>>> +            This action is identical to <code>ct_commit</code> except that the
>>>> +            connection tracking entry is committed in a different zone.
>>>> +          </p>
>>>> +
>>>> +          <p>
>>>> +            This action was added to store connections dropped by ACLs in a
>>>> +            separate zone that is managed independently of the
>>>> +            <code>ct_commit</code> zone, for debugging.
>>>> +          </p>
>>>> +        </dd>
>>>> +
>>>>         <dt><code>ct_dnat;</code></dt>
>>>>         <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>>         <dd>
>>>> diff --git a/ovs b/ovs
>>>> index 6f24c2bc7..d94cd0d3e 160000
>>>> --- a/ovs
>>>> +++ b/ovs
>>>> @@ -1 +1 @@
>>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>>> index 3bbdbd998..07509f488 100644
>>>> --- a/utilities/ovn-nbctl.c
>>>> +++ b/utilities/ovn-nbctl.c
>>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>>     /* Set the ACL label */
>>>>     const char *label = shash_find_data(&ctx->options, "--label");
>>>>     if (label) {
>>>> -      /* Ensure that the action is either allow or allow-related */
>>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>>> +          strcmp(action, "drop")) {
>>>>         ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>>> -                  "\"allow-related\"");
>>>> +                  "\"allow-related\" or \"drop\"");
>>>>         return;
>>>>       }
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=RotwPXnAckhnfEzdWgnpPR0nnZ46Y0RYo-uUGaS4vXY&m=WQYamQH3q3qJlZ_bwkX04zqpfnIY7eXsmSvIG1UJSdL7jMEKAhhdsjfeHeWjf2pn&s=Wee-wqGguGK5QdONDVISvN7VU9ppa6t4xH_k8v3x2gM&e=
Adrian Moreno Nov. 25, 2022, 12:30 p.m. UTC | #11
On 11/21/22 14:56, Abhiram Sangana wrote:
> Hi Adrian,
> 
> I apologise for the delay in replying back.
> 
>> On 14 Nov 2022, at 13:05, Adrian Moreno <amorenoz@redhat.com> wrote:
>>
>> Hi,
>>
>> On 10/20/22 15:49, Abhiram Sangana wrote:
>>> Hi Dumitru,
>>> Thanks for reviewing the patch.
>>>> On 19 Oct 2022, at 14:09, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> Hi Abhiram,
>>>>
>>>> Thanks for the patch!  I only skimmed the changes so this is not a full
>>>> review but more of a discussion starter.
>>>>
>>>> On 10/18/22 17:33, Abhiram Sangana wrote:
>>>>> To identify connections dropped by ACLs, users can enable logging for ACLs
>>>>> but this approach does not scale. ACL logging uses "controller" action
>>>>> which causes a significant spike in the CPU usage of ovs-vswitchd (and
>>>>> ovn-controller to a lesser extent) even with metering enabled (observed
>>>>> 65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
>>>>> approach is to use drop sampling (patch by Adrian Moreno currently in
>>>>> review) but we might miss specific connections of interest with this
>>>>> approach.
>>>>>
>>>>> This patch commits connections dropped by ACLs to the connection tracking
>>>>> table with a specific ACL label that was introduced in 0e0228be (
>>>>> northd: Add ACL label). The dropped connections are committed in a separate
>>>>> CT zone so that they can be managed independently.
>>>>>
>>>>
>>>> I'm not sure I understand how the CMS can manage this.  How is this
>>>> better than sampling?  Committed connections are going to time out at
>>>> some point (30 sec by default for udp/icmp with the kernel datapath).
>>>> So the CMS will have to continuously monitor the contents of the
>>>> conntrack zone?  Aren't we just moving the CPU load somewhere else with
>>>> this?  Even so, there's a chance an entry is missed.
>>> Linux nf_conntrack module supports sending connection tracking events
>>> to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
>>> parameter). So, CMS can parse the stream of new connection events from
>>> conntrack and log the packets based on CT label.
>>
>> Isn't this datapath-specific? this won't be available for the netdev datapath, right?
> 
> Yes, this approach is datapath specific - I haven’t checked how to send connection tracking events for netdev datapath.
>>
>>> An issue with sampling is that if there are a large number of packets
>>> for a particular connection(s), packets of other connections might not
>>> get sampled and we miss information about these connections. With the
>>> conntrack approach, we get a single event for each connection (until
>>> they time out), so there is lesser load on the CMS/collector and lesser
>>> likelihood of missing connections.
>>
>> I don't see why sampling would inherently miss packets. The current RFC for ACL sampling (different from the generic drop sampling one) allows the user to specify a probability per ACL so 100% of packets can be sampled in connection establishment drops while we sample N% of accepted traffic having a lot of flexibility in the inevitable performance vs accuracy tradeoff.
>>
>> I'd like to better understand what are the limitations of the current approach to see if it can be improved in any way.
> 
> I haven’t experimented with flow-based IPFIX sampling but I noticed high CPU usage of ovs-vswitchd while trying to export Netflow records (which I think is similar to 100% sampling) with large number of connections in OVN bridge. I was expecting a similar cost with respect to upcalls if we use 100% sampling rate for drop ACLs when there are multiple connection establishment packets in the bridge.
> 

I have also found that per-bridge sampling at 100% not very usable. Besides, by 
default both ingress and egress traffic is sampled so that's pretty much 
generating 2 IPFIX messages per packet.

For ACL-rejection sampling, however, it shouldn't be that much. Besides, we can 
always make use of the flow cache to avoid keeping hold of the handler threads. 
I plan to run some performance benchmarking of this.


> Thanks,
> Abhiram
> 
>> Thanks,
>> --
>> Adrián
>>
>>>>
>>>>> Each logical port is assigned a new zone for committing dropped flows.
>>>>> The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.
>>>>>
>>>>> A new lflow action "ct_commit_drop" is introduced that commits flows
>>>>> to connection tracking table in a zone identified by
>>>>> MFF_LOG_ACL_DROP_ZONE register.
>>>>>
>>>>> An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
>>>>> instead of silently dropping the packet.
>>>>>
>>>>> Signed-off-by: Abhiram Sangana <sangana.abhiram@nutanix.com>
>>>>> ---
>>>>> controller/ovn-controller.c  | 23 ++++++++++++++---
>>>>> controller/physical.c        | 32 +++++++++++++++++++++--
>>>>> include/ovn/actions.h        |  1 +
>>>>> include/ovn/logical-fields.h |  1 +
>>>>> lib/actions.c                | 50 ++++++++++++++++++++++++++++++++++++
>>>>> lib/ovn-util.c               |  4 +--
>>>>> lib/ovn-util.h               |  2 +-
>>>>> northd/northd.c              | 14 ++++++++--
>>>>> northd/ovn-northd.8.xml      | 14 ++++++++--
>>>>> ovn-sb.xml                   | 17 ++++++++++++
>>>>> ovs                          |  2 +-
>>>>
>>>> This shouldn't need to change the OVS submodule.
>>>>
>>> My bad. Will fix this.
>>> Thanks,
>>> Abhiram Sangana
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>> utilities/ovn-nbctl.c        |  7 ++---
>>>>> 12 files changed, 151 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>> index c97744d57..1ad20fe55 100644
>>>>> --- a/controller/ovn-controller.c
>>>>> +++ b/controller/ovn-controller.c
>>>>> @@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
>>>>>      unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
>>>>>
>>>>>      struct shash_node *shash_node;
>>>>> +    const struct binding_lport *lport;
>>>>>      SHASH_FOR_EACH (shash_node, binding_lports) {
>>>>>          sset_add(&all_users, shash_node->name);
>>>>> +
>>>>> +        /* Zone for committing dropped connections of a vNIC. */
>>>>> +        lport = shash_node->data;
>>>>> +        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
>>>>> +        sset_add(&all_users, drop_zone);
>>>>> +        free(drop_zone);
>>>>>      }
>>>>>
>>>>>      /* Local patched datapath (gateway routers) need zones assigned. */
>>>>> @@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
>>>>>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>>>>>          /* XXX Add method to limit zone assignment to logical router
>>>>>           * datapaths with NAT */
>>>>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>> +        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
>>>>> +        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
>>>>>          sset_add(&all_users, dnat);
>>>>>          sset_add(&all_users, snat);
>>>>>          shash_add(&all_lds, dnat, ld);
>>>>> @@ -2090,7 +2097,7 @@ ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
>>>>>          /* Check if the requested snat zone has changed for the datapath
>>>>>           * or not.  If so, then fall back to full recompute of
>>>>>           * ct_zone engine. */
>>>>> -        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
>>>>> +        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
>>>>>          struct simap_node *simap_node = simap_find(&ct_zones_data->current,
>>>>>                                                     snat_dp_zone_key);
>>>>>          free(snat_dp_zone_key);
>>>>> @@ -2148,6 +2155,16 @@ ct_zones_runtime_data_handler(struct engine_node *node, void *data)
>>>>>                                          &ct_zones_data->pending);
>>>>>                      updated = true;
>>>>>                  }
>>>>> +                char *drop_zone = alloc_ct_zone_key(
>>>>> +                    &t_lport->pb->header_.uuid, "drop");
>>>>> +                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
>>>>> +                    alloc_id_to_ct_zone(drop_zone,
>>>>> +                                        &ct_zones_data->current,
>>>>> +                                        ct_zones_data->bitmap, &scan_start,
>>>>> +                                        &ct_zones_data->pending);
>>>>> +                    updated = true;
>>>>> +                }
>>>>> +                free(drop_zone);
>>>>>              } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
>>>>>                  struct simap_node *ct_zone =
>>>>>                      simap_find(&ct_zones_data->current,
>>>>> diff --git a/controller/physical.c b/controller/physical.c
>>>>> index f3c8bddce..fc46669c1 100644
>>>>> --- a/controller/physical.c
>>>>> +++ b/controller/physical.c
>>>>> @@ -60,6 +60,7 @@ struct zone_ids {
>>>>>      int ct;                     /* MFF_LOG_CT_ZONE. */
>>>>>      int dnat;                   /* MFF_LOG_DNAT_ZONE. */
>>>>>      int snat;                   /* MFF_LOG_SNAT_ZONE. */
>>>>> +    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
>>>>> };
>>>>>
>>>>> struct tunnel {
>>>>> @@ -204,14 +205,18 @@ get_zone_ids(const struct sbrec_port_binding *binding,
>>>>>
>>>>>      const struct uuid *key = &binding->datapath->header_.uuid;
>>>>>
>>>>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>>>>> +    char *dnat = alloc_ct_zone_key(key, "dnat");
>>>>>      zone_ids.dnat = simap_get(ct_zones, dnat);
>>>>>      free(dnat);
>>>>>
>>>>> -    char *snat = alloc_nat_zone_key(key, "snat");
>>>>> +    char *snat = alloc_ct_zone_key(key, "snat");
>>>>>      zone_ids.snat = simap_get(ct_zones, snat);
>>>>>      free(snat);
>>>>>
>>>>> +    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
>>>>> +    zone_ids.drop = simap_get(ct_zones, drop_zone);
>>>>> +    free(drop_zone);
>>>>> +
>>>>>      return zone_ids;
>>>>> }
>>>>>
>>>>> @@ -822,6 +827,9 @@ put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>>>>>          if (zone_ids->snat) {
>>>>>              put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
>>>>>          }
>>>>> +        if (zone_ids->drop) {
>>>>> +            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
>>>>> +        }
>>>>>      }
>>>>> }
>>>>>
>>>>> @@ -858,6 +866,26 @@ put_local_common_flows(uint32_t dp_key,
>>>>>                      pb->header_.uuid.parts[0], &match, ofpacts_p,
>>>>>                      &pb->header_.uuid);
>>>>>
>>>>> +    if (zone_ids->drop) {
>>>>> +        /* Table 39, Priority 1.
>>>>> +         * =======================
>>>>> +         *
>>>>> +         * Clear the logical registers (for consistent behavior with packets
>>>>> +         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
>>>>> +        match_init_catchall(&match);
>>>>> +        ofpbuf_clear(ofpacts_p);
>>>>> +        match_set_metadata(&match, htonll(dp_key));
>>>>> +        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
>>>>> +            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
>>>>> +                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
>>>>> +            }
>>>>> +        }
>>>>> +        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
>>>>> +        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
>>>>> +                        pb->datapath->header_.uuid.parts[0], &match,
>>>>> +                        ofpacts_p, &pb->datapath->header_.uuid);
>>>>> +    }
>>>>> +
>>>>>      /* Table 39, Priority 100.
>>>>>       * =======================
>>>>>       *
>>>>> diff --git a/include/ovn/actions.h b/include/ovn/actions.h
>>>>> index d7ee84dac..6424250a6 100644
>>>>> --- a/include/ovn/actions.h
>>>>> +++ b/include/ovn/actions.h
>>>>> @@ -121,6 +121,7 @@ struct ovn_extend_table;
>>>>>      OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
>>>>>      OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
>>>>>      OVNACT(CHK_ECMP_NH,       ovnact_result)          \
>>>>> +    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
>>>>>
>>>>> /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
>>>>> enum OVS_PACKED_ENUM ovnact_type {
>>>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>>>> index 3db7265e4..889f5f9e3 100644
>>>>> --- a/include/ovn/logical-fields.h
>>>>> +++ b/include/ovn/logical-fields.h
>>>>> @@ -47,6 +47,7 @@ enum ovn_controller_event {
>>>>> #define MFF_LOG_REG0             MFF_REG0
>>>>> #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
>>>>> #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
>>>>> +#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
>>>>>
>>>>> #define MFF_LOG_XXREG0           MFF_XXREG0
>>>>> #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
>>>>> diff --git a/lib/actions.c b/lib/actions.c
>>>>> index adbb42db4..dfff3e793 100644
>>>>> --- a/lib/actions.c
>>>>> +++ b/lib/actions.c
>>>>> @@ -4600,6 +4600,54 @@ encode_CHK_ECMP_NH(const struct ovnact_result *res,
>>>>>                             MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
>>>>> }
>>>>>
>>>>> +static void
>>>>> +parse_ct_commit_drop(struct action_context *ctx)
>>>>> +{
>>>>> +    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
>>>>> +{
>>>>> +    format_nested_action(on, "ct_commit_drop", s);
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
>>>>> +                      const struct ovnact_encode_params *ep OVS_UNUSED,
>>>>> +                      struct ofpbuf *ofpacts)
>>>>> +{
>>>>> +    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
>>>>> +    ct->flags = NX_CT_F_COMMIT;
>>>>> +    ct->recirc_table = NX_CT_RECIRC_NONE;
>>>>> +    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
>>>>> +    ct->zone_src.ofs = 0;
>>>>> +    ct->zone_src.n_bits = 16;
>>>>> +
>>>>> +    /* If the datapath supports all-zero SNAT then use it to avoid tuple
>>>>> +     * collisions at commit time between NATed and firewalled-only sessions.
>>>>> +     */
>>>>> +    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
>>>>> +        size_t nat_offset = ofpacts->size;
>>>>> +        ofpbuf_pull(ofpacts, nat_offset);
>>>>> +
>>>>> +        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
>>>>> +        nat->flags = 0;
>>>>> +        nat->range_af = AF_UNSPEC;
>>>>> +        nat->flags |= NX_NAT_F_SRC;
>>>>> +        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
>>>>> +        ct = ofpacts->header;
>>>>> +    }
>>>>> +
>>>>> +    size_t set_field_offset = ofpacts->size;
>>>>> +    ofpbuf_pull(ofpacts, set_field_offset);
>>>>> +
>>>>> +    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
>>>>> +    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
>>>>> +    ct = ofpacts->header;
>>>>> +    ofpact_finish(ofpacts, &ct->ofpact);
>>>>> +}
>>>>> +
>>>>> /* Parses an assignment or exchange or put_dhcp_opts action. */
>>>>> static void
>>>>> parse_set_action(struct action_context *ctx)
>>>>> @@ -4790,6 +4838,8 @@ parse_action(struct action_context *ctx)
>>>>>          parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
>>>>>      } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
>>>>>          parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
>>>>> +    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
>>>>> +        parse_ct_commit_drop(ctx);
>>>>>      } else {
>>>>>          lexer_syntax_error(ctx->lexer, "expecting action");
>>>>>      }
>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>> index 616999eab..a72533a9e 100644
>>>>> --- a/lib/ovn-util.c
>>>>> +++ b/lib/ovn-util.c
>>>>> @@ -443,12 +443,12 @@ split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>>      destroy_lport_addresses(&laddrs);
>>>>> }
>>>>>
>>>>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>>>>> +/* Allocates a key for conntrack zone allocation for a provided
>>>>>   * 'key' record and a 'type'.
>>>>>   *
>>>>>   * It is the caller's responsibility to free the allocated memory. */
>>>>> char *
>>>>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>>>>> +alloc_ct_zone_key(const struct uuid *key, const char *type)
>>>>> {
>>>>>      return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>>>>> }
>>>>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>>>>> index b3905ef7b..e0855f19e 100644
>>>>> --- a/lib/ovn-util.h
>>>>> +++ b/lib/ovn-util.h
>>>>> @@ -92,7 +92,7 @@ const char *find_lport_address(const struct lport_addresses *laddrs,
>>>>> void split_addresses(const char *addresses, struct svec *ipv4_addrs,
>>>>>                       struct svec *ipv6_addrs);
>>>>>
>>>>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>>>>> +char *alloc_ct_zone_key(const struct uuid *key, const char *type);
>>>>>
>>>>> const char *default_nb_db(void);
>>>>> const char *default_sb_db(void);
>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>> index 7e2681865..2aff3458c 100644
>>>>> --- a/northd/northd.c
>>>>> +++ b/northd/northd.c
>>>>> @@ -287,7 +287,7 @@ enum ovn_stage {
>>>>>   * +----+----------------------------------------------+ G |                  |
>>>>>   * | R7 |                   UNUSED                     | 1 |                  |
>>>>>   * +----+----------------------------------------------+---+------------------+
>>>>> - * | R8 |                   UNUSED                     |
>>>>> + * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
>>>>>   * +----+----------------------------------------------+
>>>>>   * | R9 |                   UNUSED                     |
>>>>>   * +----+----------------------------------------------+
>>>>> @@ -6343,6 +6343,11 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>>              } else {
>>>>>                  ds_put_format(match, " && (%s)", acl->match);
>>>>>                  build_acl_log(actions, acl, meter_groups);
>>>>> +                if (acl->label) {
>>>>> +                    ds_put_format(actions, "ct_commit_drop { "
>>>>> +                                  "ct_label.label = %"PRId64"; }; ",
>>>>> +                                  acl->label);
>>>>> +                }
>>>>>                  ds_put_cstr(actions, "/* drop */");
>>>>>                  ovn_lflow_add_with_hint(lflows, od, stage,
>>>>>                                          acl->priority + OVN_ACL_PRI_OFFSET,
>>>>> @@ -6363,8 +6368,13 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od,
>>>>>              ds_clear(match);
>>>>>              ds_clear(actions);
>>>>>              ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
>>>>> -            ds_put_format(actions, "ct_commit { %s = 1; }; ",
>>>>> +            ds_put_format(actions, "ct_commit { %s = 1; ",
>>>>>                            ct_blocked_match);
>>>>> +            if (acl->label) {
>>>>> +                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
>>>>> +                              acl->label);
>>>>> +            }
>>>>> +            ds_put_cstr(actions, "}; ");
>>>>>              if (!strcmp(acl->action, "reject")) {
>>>>>                  build_reject_acl_rules(od, lflows, stage, acl, match,
>>>>>                                         actions, &acl->header_, meter_groups);
>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
>>>>> index b8f871394..af653be4d 100644
>>>>> --- a/northd/ovn-northd.8.xml
>>>>> +++ b/northd/ovn-northd.8.xml
>>>>> @@ -699,7 +699,12 @@
>>>>>          connections and <code>ct_commit(ct_label=1/1);</code> for known
>>>>>          connections.  Setting <code>ct_label</code> marks a connection
>>>>>          as one that was previously allowed, but should no longer be
>>>>> -        allowed due to a policy change.
>>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>>> +        then it translates to
>>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>>> +        untracked connections and
>>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>>> +        for known connections.
>>>>>        </li>
>>>>>      </ul>
>>>>>
>>>>> @@ -969,7 +974,12 @@
>>>>>          or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
>>>>>          known connections.  Setting <code>ct_label</code> marks a connection
>>>>>          as one that was previously allowed, but should no longer be
>>>>> -        allowed due to a policy change.
>>>>> +        allowed due to a policy change. If the ACL has a <code>label</code>,
>>>>> +        then it translates to
>>>>> +        <code>ct_commit_drop(ct_label.label=label)</code> for new and
>>>>> +        untracked connections and
>>>>> +        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
>>>>> +        for known connections.
>>>>>        </li>
>>>>>      </ul>
>>>>>
>>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>>>>> index 37a709f83..4e294a212 100644
>>>>> --- a/ovn-sb.xml
>>>>> +++ b/ovn-sb.xml
>>>>> @@ -1377,6 +1377,23 @@
>>>>>            </p>
>>>>>          </dd>
>>>>>
>>>>> +        <dt><code>ct_commit_drop { };</code></dt>
>>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
>>>>> +        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
>>>>> +        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
>>>>> +        <dd>
>>>>> +          <p>
>>>>> +            This action is identical to <code>ct_commit</code> except that the
>>>>> +            connection tracking entry is committed in a different zone.
>>>>> +          </p>
>>>>> +
>>>>> +          <p>
>>>>> +            This action was added to store connections dropped by ACLs in a
>>>>> +            separate zone that is managed independently of the
>>>>> +            <code>ct_commit</code> zone, for debugging.
>>>>> +          </p>
>>>>> +        </dd>
>>>>> +
>>>>>          <dt><code>ct_dnat;</code></dt>
>>>>>          <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>>>>>          <dd>
>>>>> diff --git a/ovs b/ovs
>>>>> index 6f24c2bc7..d94cd0d3e 160000
>>>>> --- a/ovs
>>>>> +++ b/ovs
>>>>> @@ -1 +1 @@
>>>>> -Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
>>>>> +Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
>>>>> diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
>>>>> index 3bbdbd998..07509f488 100644
>>>>> --- a/utilities/ovn-nbctl.c
>>>>> +++ b/utilities/ovn-nbctl.c
>>>>> @@ -2225,10 +2225,11 @@ nbctl_acl_add(struct ctl_context *ctx)
>>>>>      /* Set the ACL label */
>>>>>      const char *label = shash_find_data(&ctx->options, "--label");
>>>>>      if (label) {
>>>>> -      /* Ensure that the action is either allow or allow-related */
>>>>> -      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
>>>>> +      /* Ensure that the action is either allow or allow-related or drop */
>>>>> +      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
>>>>> +          strcmp(action, "drop")) {
>>>>>          ctl_error(ctx, "label can only be set with actions \"allow\" or "
>>>>> -                  "\"allow-related\"");
>>>>> +                  "\"allow-related\" or \"drop\"");
>>>>>          return;
>>>>>        }
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=RotwPXnAckhnfEzdWgnpPR0nnZ46Y0RYo-uUGaS4vXY&m=WQYamQH3q3qJlZ_bwkX04zqpfnIY7eXsmSvIG1UJSdL7jMEKAhhdsjfeHeWjf2pn&s=Wee-wqGguGK5QdONDVISvN7VU9ppa6t4xH_k8v3x2gM&e=
>
diff mbox series

Patch

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index c97744d57..1ad20fe55 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -660,8 +660,15 @@  update_ct_zones(const struct shash *binding_lports,
     unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];
 
     struct shash_node *shash_node;
+    const struct binding_lport *lport;
     SHASH_FOR_EACH (shash_node, binding_lports) {
         sset_add(&all_users, shash_node->name);
+
+        /* Zone for committing dropped connections of a vNIC. */
+        lport = shash_node->data;
+        char *drop_zone = alloc_ct_zone_key(&lport->pb->header_.uuid, "drop");
+        sset_add(&all_users, drop_zone);
+        free(drop_zone);
     }
 
     /* Local patched datapath (gateway routers) need zones assigned. */
@@ -670,8 +677,8 @@  update_ct_zones(const struct shash *binding_lports,
     HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
         /* XXX Add method to limit zone assignment to logical router
          * datapaths with NAT */
-        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
-        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
+        char *dnat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "dnat");
+        char *snat = alloc_ct_zone_key(&ld->datapath->header_.uuid, "snat");
         sset_add(&all_users, dnat);
         sset_add(&all_users, snat);
         shash_add(&all_lds, dnat, ld);
@@ -2090,7 +2097,7 @@  ct_zones_datapath_binding_handler(struct engine_node *node, void *data)
         /* Check if the requested snat zone has changed for the datapath
          * or not.  If so, then fall back to full recompute of
          * ct_zone engine. */
-        char *snat_dp_zone_key = alloc_nat_zone_key(&dp->header_.uuid, "snat");
+        char *snat_dp_zone_key = alloc_ct_zone_key(&dp->header_.uuid, "snat");
         struct simap_node *simap_node = simap_find(&ct_zones_data->current,
                                                    snat_dp_zone_key);
         free(snat_dp_zone_key);
@@ -2148,6 +2155,16 @@  ct_zones_runtime_data_handler(struct engine_node *node, void *data)
                                         &ct_zones_data->pending);
                     updated = true;
                 }
+                char *drop_zone = alloc_ct_zone_key(
+                    &t_lport->pb->header_.uuid, "drop");
+                if (!simap_contains(&ct_zones_data->current, drop_zone)) {
+                    alloc_id_to_ct_zone(drop_zone,
+                                        &ct_zones_data->current,
+                                        ct_zones_data->bitmap, &scan_start,
+                                        &ct_zones_data->pending);
+                    updated = true;
+                }
+                free(drop_zone);
             } else if (t_lport->tracked_type == TRACKED_RESOURCE_REMOVED) {
                 struct simap_node *ct_zone =
                     simap_find(&ct_zones_data->current,
diff --git a/controller/physical.c b/controller/physical.c
index f3c8bddce..fc46669c1 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -60,6 +60,7 @@  struct zone_ids {
     int ct;                     /* MFF_LOG_CT_ZONE. */
     int dnat;                   /* MFF_LOG_DNAT_ZONE. */
     int snat;                   /* MFF_LOG_SNAT_ZONE. */
+    int drop;                   /* MFF_LOG_ACL_DROP_ZONE. */
 };
 
 struct tunnel {
@@ -204,14 +205,18 @@  get_zone_ids(const struct sbrec_port_binding *binding,
 
     const struct uuid *key = &binding->datapath->header_.uuid;
 
-    char *dnat = alloc_nat_zone_key(key, "dnat");
+    char *dnat = alloc_ct_zone_key(key, "dnat");
     zone_ids.dnat = simap_get(ct_zones, dnat);
     free(dnat);
 
-    char *snat = alloc_nat_zone_key(key, "snat");
+    char *snat = alloc_ct_zone_key(key, "snat");
     zone_ids.snat = simap_get(ct_zones, snat);
     free(snat);
 
+    char *drop_zone = alloc_ct_zone_key(&binding->header_.uuid, "drop");
+    zone_ids.drop = simap_get(ct_zones, drop_zone);
+    free(drop_zone);
+
     return zone_ids;
 }
 
@@ -822,6 +827,9 @@  put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
         if (zone_ids->snat) {
             put_load(zone_ids->snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
         }
+        if (zone_ids->drop) {
+            put_load(zone_ids->drop, MFF_LOG_ACL_DROP_ZONE, 0, 32, ofpacts_p);
+        }
     }
 }
 
@@ -858,6 +866,26 @@  put_local_common_flows(uint32_t dp_key,
                     pb->header_.uuid.parts[0], &match, ofpacts_p,
                     &pb->header_.uuid);
 
+    if (zone_ids->drop) {
+        /* Table 39, Priority 1.
+         * =======================
+         *
+         * Clear the logical registers (for consistent behavior with packets
+         * that get tunneled) except MFF_LOG_ACL_DROP_ZONE. */
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
+        match_set_metadata(&match, htonll(dp_key));
+        for (int i = 0; i < MFF_N_LOG_REGS; i++) {
+            if ((MFF_REG0 + i) != MFF_LOG_ACL_DROP_ZONE) {
+                put_load(0, MFF_REG0 + i, 0, 32, ofpacts_p);
+            }
+        }
+        put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, ofpacts_p);
+        ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 1,
+                        pb->datapath->header_.uuid.parts[0], &match,
+                        ofpacts_p, &pb->datapath->header_.uuid);
+    }
+
     /* Table 39, Priority 100.
      * =======================
      *
diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index d7ee84dac..6424250a6 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -121,6 +121,7 @@  struct ovn_extend_table;
     OVNACT(COMMIT_ECMP_NH,    ovnact_commit_ecmp_nh)  \
     OVNACT(CHK_ECMP_NH_MAC,   ovnact_result)          \
     OVNACT(CHK_ECMP_NH,       ovnact_result)          \
+    OVNACT(CT_COMMIT_DROP,    ovnact_nest)            \
 
 /* enum ovnact_type, with a member OVNACT_<ENUM> for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index 3db7265e4..889f5f9e3 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -47,6 +47,7 @@  enum ovn_controller_event {
 #define MFF_LOG_REG0             MFF_REG0
 #define MFF_LOG_LB_ORIG_DIP_IPV4 MFF_REG1
 #define MFF_LOG_LB_ORIG_TP_DPORT MFF_REG2
+#define MFF_LOG_ACL_DROP_ZONE    MFF_REG8
 
 #define MFF_LOG_XXREG0           MFF_XXREG0
 #define MFF_LOG_LB_ORIG_DIP_IPV6 MFF_XXREG1
diff --git a/lib/actions.c b/lib/actions.c
index adbb42db4..dfff3e793 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -4600,6 +4600,54 @@  encode_CHK_ECMP_NH(const struct ovnact_result *res,
                            MLF_LOOKUP_COMMIT_ECMP_NH_BIT, ofpacts);
 }
 
+static void
+parse_ct_commit_drop(struct action_context *ctx)
+{
+    parse_nested_action(ctx, OVNACT_CT_COMMIT_DROP, "ip", WR_CT_COMMIT);
+}
+
+static void
+format_CT_COMMIT_DROP(const struct ovnact_nest *on, struct ds *s)
+{
+    format_nested_action(on, "ct_commit_drop", s);
+}
+
+static void
+encode_CT_COMMIT_DROP(const struct ovnact_nest *on,
+                      const struct ovnact_encode_params *ep OVS_UNUSED,
+                      struct ofpbuf *ofpacts)
+{
+    struct ofpact_conntrack *ct = ofpact_put_CT(ofpacts);
+    ct->flags = NX_CT_F_COMMIT;
+    ct->recirc_table = NX_CT_RECIRC_NONE;
+    ct->zone_src.field = mf_from_id(MFF_LOG_ACL_DROP_ZONE);
+    ct->zone_src.ofs = 0;
+    ct->zone_src.n_bits = 16;
+
+    /* If the datapath supports all-zero SNAT then use it to avoid tuple
+     * collisions at commit time between NATed and firewalled-only sessions.
+     */
+    if (ovs_feature_is_supported(OVS_CT_ZERO_SNAT_SUPPORT)) {
+        size_t nat_offset = ofpacts->size;
+        ofpbuf_pull(ofpacts, nat_offset);
+
+        struct ofpact_nat *nat = ofpact_put_NAT(ofpacts);
+        nat->flags = 0;
+        nat->range_af = AF_UNSPEC;
+        nat->flags |= NX_NAT_F_SRC;
+        ofpacts->header = ofpbuf_push_uninit(ofpacts, nat_offset);
+        ct = ofpacts->header;
+    }
+
+    size_t set_field_offset = ofpacts->size;
+    ofpbuf_pull(ofpacts, set_field_offset);
+
+    ovnacts_encode(on->nested, on->nested_len, ep, ofpacts);
+    ofpacts->header = ofpbuf_push_uninit(ofpacts, set_field_offset);
+    ct = ofpacts->header;
+    ofpact_finish(ofpacts, &ct->ofpact);
+}
+
 /* Parses an assignment or exchange or put_dhcp_opts action. */
 static void
 parse_set_action(struct action_context *ctx)
@@ -4790,6 +4838,8 @@  parse_action(struct action_context *ctx)
         parse_put_fdb(ctx, ovnact_put_PUT_FDB(ctx->ovnacts));
     } else if (lexer_match_id(ctx->lexer, "commit_ecmp_nh")) {
         parse_commit_ecmp_nh(ctx, ovnact_put_COMMIT_ECMP_NH(ctx->ovnacts));
+    } else if (lexer_match_id(ctx->lexer, "ct_commit_drop")) {
+        parse_ct_commit_drop(ctx);
     } else {
         lexer_syntax_error(ctx->lexer, "expecting action");
     }
diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 616999eab..a72533a9e 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -443,12 +443,12 @@  split_addresses(const char *addresses, struct svec *ipv4_addrs,
     destroy_lport_addresses(&laddrs);
 }
 
-/* Allocates a key for NAT conntrack zone allocation for a provided
+/* Allocates a key for conntrack zone allocation for a provided
  * 'key' record and a 'type'.
  *
  * It is the caller's responsibility to free the allocated memory. */
 char *
-alloc_nat_zone_key(const struct uuid *key, const char *type)
+alloc_ct_zone_key(const struct uuid *key, const char *type)
 {
     return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
 }
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index b3905ef7b..e0855f19e 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -92,7 +92,7 @@  const char *find_lport_address(const struct lport_addresses *laddrs,
 void split_addresses(const char *addresses, struct svec *ipv4_addrs,
                      struct svec *ipv6_addrs);
 
-char *alloc_nat_zone_key(const struct uuid *key, const char *type);
+char *alloc_ct_zone_key(const struct uuid *key, const char *type);
 
 const char *default_nb_db(void);
 const char *default_sb_db(void);
diff --git a/northd/northd.c b/northd/northd.c
index 7e2681865..2aff3458c 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -287,7 +287,7 @@  enum ovn_stage {
  * +----+----------------------------------------------+ G |                  |
  * | R7 |                   UNUSED                     | 1 |                  |
  * +----+----------------------------------------------+---+------------------+
- * | R8 |                   UNUSED                     |
+ * | R8 |        DROP_CT_ZONE (<= IN(/OUT)_ACL         |
  * +----+----------------------------------------------+
  * | R9 |                   UNUSED                     |
  * +----+----------------------------------------------+
@@ -6343,6 +6343,11 @@  consider_acl(struct hmap *lflows, struct ovn_datapath *od,
             } else {
                 ds_put_format(match, " && (%s)", acl->match);
                 build_acl_log(actions, acl, meter_groups);
+                if (acl->label) {
+                    ds_put_format(actions, "ct_commit_drop { "
+                                  "ct_label.label = %"PRId64"; }; ",
+                                  acl->label);
+                }
                 ds_put_cstr(actions, "/* drop */");
                 ovn_lflow_add_with_hint(lflows, od, stage,
                                         acl->priority + OVN_ACL_PRI_OFFSET,
@@ -6363,8 +6368,13 @@  consider_acl(struct hmap *lflows, struct ovn_datapath *od,
             ds_clear(match);
             ds_clear(actions);
             ds_put_cstr(match, REGBIT_ACL_HINT_BLOCK " == 1");
-            ds_put_format(actions, "ct_commit { %s = 1; }; ",
+            ds_put_format(actions, "ct_commit { %s = 1; ",
                           ct_blocked_match);
+            if (acl->label) {
+                ds_put_format(actions, "ct_label.label = %"PRId64"; ",
+                              acl->label);
+            }
+            ds_put_cstr(actions, "}; ");
             if (!strcmp(acl->action, "reject")) {
                 build_reject_acl_rules(od, lflows, stage, acl, match,
                                        actions, &acl->header_, meter_groups);
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index b8f871394..af653be4d 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -699,7 +699,12 @@ 
         connections and <code>ct_commit(ct_label=1/1);</code> for known
         connections.  Setting <code>ct_label</code> marks a connection
         as one that was previously allowed, but should no longer be
-        allowed due to a policy change.
+        allowed due to a policy change. If the ACL has a <code>label</code>,
+        then it translates to
+        <code>ct_commit_drop(ct_label.label=label)</code> for new and
+        untracked connections and
+        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
+        for known connections.
       </li>
     </ul>
 
@@ -969,7 +974,12 @@ 
         or untracked connections and <code>ct_commit(ct_label=1/1);</code> for
         known connections.  Setting <code>ct_label</code> marks a connection
         as one that was previously allowed, but should no longer be
-        allowed due to a policy change.
+        allowed due to a policy change. If the ACL has a <code>label</code>,
+        then it translates to
+        <code>ct_commit_drop(ct_label.label=label)</code> for new and
+        untracked connections and
+        <code>ct_commit(ct_label.blocked=1; ct_label.label=label);</code>
+        for known connections.
       </li>
     </ul>
 
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 37a709f83..4e294a212 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -1377,6 +1377,23 @@ 
           </p>
         </dd>
 
+        <dt><code>ct_commit_drop { };</code></dt>
+        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; };</code></dt>
+        <dt><code>ct_commit_drop { ct_label=<var>value[/mask]</var>; };</code></dt>
+        <dt><code>ct_commit_drop { ct_mark=<var>value[/mask]</var>; ct_label=<var>value[/mask]</var>; };</code></dt>
+        <dd>
+          <p>
+            This action is identical to <code>ct_commit</code> except that the
+            connection tracking entry is committed in a different zone.
+          </p>
+
+          <p>
+            This action was added to store connections dropped by ACLs in a
+            separate zone that is managed independently of the
+            <code>ct_commit</code> zone, for debugging.
+          </p>
+        </dd>
+
         <dt><code>ct_dnat;</code></dt>
         <dt><code>ct_dnat(<var>IP</var>);</code></dt>
         <dd>
diff --git a/ovs b/ovs
index 6f24c2bc7..d94cd0d3e 160000
--- a/ovs
+++ b/ovs
@@ -1 +1 @@ 
-Subproject commit 6f24c2bc769afde0a390ce344de1a7d9c592e5a6
+Subproject commit d94cd0d3eec33e4290d7ca81918f5ac61444886e
diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c
index 3bbdbd998..07509f488 100644
--- a/utilities/ovn-nbctl.c
+++ b/utilities/ovn-nbctl.c
@@ -2225,10 +2225,11 @@  nbctl_acl_add(struct ctl_context *ctx)
     /* Set the ACL label */
     const char *label = shash_find_data(&ctx->options, "--label");
     if (label) {
-      /* Ensure that the action is either allow or allow-related */
-      if (strcmp(action, "allow") && strcmp(action, "allow-related")) {
+      /* Ensure that the action is either allow or allow-related or drop */
+      if (strcmp(action, "allow") && strcmp(action, "allow-related") &&
+          strcmp(action, "drop")) {
         ctl_error(ctx, "label can only be set with actions \"allow\" or "
-                  "\"allow-related\"");
+                  "\"allow-related\" or \"drop\"");
         return;
       }