diff mbox series

[ovs-dev,v5,1/2] ofp, dpif: Allow CT flush based on partial match

Message ID 20221216123048.973406-2-amusil@redhat.com
State Changes Requested
Headers show
Series Add extension for partial CT flush | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ales Musil Dec. 16, 2022, 12:30 p.m. UTC
Currently, the CT can be flushed by dpctl only be specifying
the whole 5-tuple. This is not very convenient when there are
only some fields known to the user of CT flush. Add new struct
ofputil_ct_match which represents the generic filtering that can
be done for CT flush. The match is done only on fields that are
non-zero with exception to the icmp fields.

This allows the filtering just within dpctl, however
it is a preparation for OpenFlow extension.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
---
v5: Add ack from Paolo.
v4: Fix a flush all scenario.
v3: Rebase on top of master.
    Address the C99 comment and missing dpif_close call.
v2: Rebase on top of master.
    Address comments from Paolo.
---
 NEWS                           |   2 +
 include/openvswitch/ofp-util.h |  28 +++
 lib/automake.mk                |   2 +
 lib/ct-dpif.c                  | 208 +++++++++------------
 lib/ct-dpif.h                  |   4 +-
 lib/dpctl.c                    |  42 ++---
 lib/dpctl.man                  |   3 +-
 lib/ofp-ct-util.c              | 326 +++++++++++++++++++++++++++++++++
 lib/ofp-ct-util.h              |  36 ++++
 tests/system-traffic.at        |  82 ++++++++-
 10 files changed, 586 insertions(+), 147 deletions(-)
 create mode 100644 lib/ofp-ct-util.c
 create mode 100644 lib/ofp-ct-util.h

Comments

Ilya Maximets Jan. 10, 2023, 10:08 p.m. UTC | #1
On 12/16/22 13:30, Ales Musil wrote:
> Currently, the CT can be flushed by dpctl only be specifying

*by

> the whole 5-tuple. This is not very convenient when there are
> only some fields known to the user of CT flush. Add new struct
> ofputil_ct_match which represents the generic filtering that can
> be done for CT flush. The match is done only on fields that are
> non-zero with exception to the icmp fields.
> 
> This allows the filtering just within dpctl, however
> it is a preparation for OpenFlow extension.
> 
> Reported-at: https://bugzilla.redhat.com/2120546
> Signed-off-by: Ales Musil <amusil@redhat.com>
> Acked-by: Paolo Valerio <pvalerio@redhat.com>
> ---
> v5: Add ack from Paolo.
> v4: Fix a flush all scenario.
> v3: Rebase on top of master.
>     Address the C99 comment and missing dpif_close call.
> v2: Rebase on top of master.
>     Address comments from Paolo.
> ---

Hi, Ales.  Thanks for working on this change.
Sorry for review taking so long.  I initially though there
will be mostly a few cosmetic/style changes needed.  But
longer I look at this code, more questions I have.  See below.

Dumitru, maybe you can also help answer some of them?

Best regards, Ilya Maximets.

>  NEWS                           |   2 +
>  include/openvswitch/ofp-util.h |  28 +++
>  lib/automake.mk                |   2 +
>  lib/ct-dpif.c                  | 208 +++++++++------------
>  lib/ct-dpif.h                  |   4 +-
>  lib/dpctl.c                    |  42 ++---
>  lib/dpctl.man                  |   3 +-
>  lib/ofp-ct-util.c              | 326 +++++++++++++++++++++++++++++++++
>  lib/ofp-ct-util.h              |  36 ++++
>  tests/system-traffic.at        |  82 ++++++++-
>  10 files changed, 586 insertions(+), 147 deletions(-)
>  create mode 100644 lib/ofp-ct-util.c
>  create mode 100644 lib/ofp-ct-util.h
> 
> diff --git a/NEWS b/NEWS
> index 265375e1c..ff8904b02 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,8 @@ Post-v3.0.0
>       10 Gbps link speed by default in case the actual link speed cannot be
>       determined.  Previously it was 10 Mbps.  Values can still be overridden
>       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> +   - ovs-dpctl and related ovs-appctl commands:

"ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.

> +     * "flush-conntrack" is capable of handling partial 5-tuple.
>  
>  
>  v3.0.0 - 15 Aug 2022
> diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
> index 091a09cad..84937ae26 100644
> --- a/include/openvswitch/ofp-util.h
> +++ b/include/openvswitch/ofp-util.h
> @@ -19,6 +19,9 @@
>  
>  #include <stdbool.h>
>  #include <stdint.h>
> +#include <sys/types.h>
> +#include <netinet/in.h>
> +
>  #include "openvswitch/ofp-protocol.h"
>  
>  struct ofp_header;
> @@ -27,6 +30,31 @@ struct ofp_header;
>  extern "C" {
>  #endif
>  
> +struct ofputil_ct_tuple {
> +    struct in6_addr src;
> +    struct in6_addr dst;
> +
> +    union {
> +        ovs_be16 src_port;
> +        ovs_be16 icmp_id;
> +    };
> +    union {
> +        ovs_be16 dst_port;
> +        struct {
> +            uint8_t icmp_code;
> +            uint8_t icmp_type;
> +        };
> +    };
> +};
> +
> +struct ofputil_ct_match {
> +    uint8_t ip_proto;
> +    uint16_t l3_type;
> +
> +    struct ofputil_ct_tuple tuple_orig;
> +    struct ofputil_ct_tuple tuple_reply;
> +};

These structures are very similar to existing ct_dpif_tuple.
We can keep them for now, but I think, we'll need to replace
ct_dpif_tuple with the new ofputil_ct_tuple in the future.
We may need some changes in ofputil_ct_tuple for that though.


OTOH, I'm not sure I understand why we need both tuples here.
Can we have a tuple and a flag that indicates a direction
instead?

There are several issues with having both original and reply
tuples, IIUC:

1. None of the command line interfaces support specifying
   reply tuple.

2. If the reply tuple will be specified via OpenFlow, it
   will be ignored if the original direction tuple is
   fully specified.

3. Netfilter kernel interface allows specifying only one
   tuple, not both, AFAICT.

I'm not a conntrack expert, so I have a question: Is it possible
that addresses and ports in original and reply directions are
not mirrored copies?  Or do we care about these cases?

If we want to flush all entries related to the same IP:PORT,
we will not be able to do that in a single OpenFlow request,
because we'll need to supply the same IP and PORT as src
and dst, or as src in both original and reply directions.
None of these can fit into a single request.

At this point is there actually any value in flushing basing
on reply direction?  Shouldn't we be able to flush the same
entries by flushing by dst in original direction, if we will
need to issue multiple requests anyway?

Did you start working on the OVN side of things?  Which
commands OVN will actually use?

> +
>  bool ofputil_decode_hello(const struct ofp_header *,
>                            uint32_t *allowed_versions);
>  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
> diff --git a/lib/automake.mk b/lib/automake.mk
> index a0fabe38f..37135f118 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/ofp-actions.c \
>  	lib/ofp-bundle.c \
>  	lib/ofp-connection.c \
> +	lib/ofp-ct-util.c \
> +	lib/ofp-ct-util.h \
>  	lib/ofp-ed-props.c \
>  	lib/ofp-errors.c \
>  	lib/ofp-flow.c \
> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> index 6f17a26b5..9b324ff79 100644
> --- a/lib/ct-dpif.c
> +++ b/lib/ct-dpif.c
> @@ -20,6 +20,7 @@
>  #include <errno.h>
>  
>  #include "ct-dpif.h"
> +#include "ofp-ct-util.h"
>  #include "openvswitch/ofp-parse.h"
>  #include "openvswitch/vlog.h"
>  
> @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct ct_dpif_dump_state **dump,
>      return err;
>  }
>  
> +static void
> +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple *ofp_tuple,
> +                                    struct ct_dpif_tuple *tuple,
> +                                    uint16_t l3_type, uint8_t ip_proto)
> +{
> +    if (l3_type == AF_INET) {
> +        tuple->src.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->src);
> +        tuple->dst.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->dst);
> +    } else {
> +        tuple->src.in6 = ofp_tuple->src;
> +        tuple->dst.in6 = ofp_tuple->dst;
> +    }
> +
> +    tuple->l3_type = l3_type;
> +    tuple->ip_proto = ip_proto;
> +    tuple->src_port = ofp_tuple->src_port;
> +
> +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> +        tuple->icmp_code = ofp_tuple->icmp_code;
> +        tuple->icmp_type = ofp_tuple->icmp_type;
> +    } else {
> +        tuple->dst_port = ofp_tuple->dst_port;
> +    }
> +}
> +
>  /* Dump one connection from a tracker, and put it in 'entry'.
>   *
>   * 'dump' should have been initialized by ct_dpif_dump_start().
> @@ -100,25 +126,77 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
>              ? dpif->dpif_class->ct_dump_done(dpif, dump)
>              : EOPNOTSUPP);
>  }
> -
> +
> +static int
> +ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
> +                    const struct ofputil_ct_match *match) {
> +    struct ct_dpif_dump_state *dump;
> +    struct ct_dpif_entry cte;
> +    int error;
> +    int tot_bkts;
> +
> +    if (!dpif->dpif_class->ct_flush) {
> +        return EOPNOTSUPP;
> +    }
> +
> +    if (VLOG_IS_DBG_ENABLED()) {
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        ofputil_ct_match_format(&ds, match);
> +        VLOG_DBG("%s: ct_flush:%s in zone %d", dpif_name(dpif), ds_cstr(&ds),
> +                 zone ? *zone : 0);
> +        ds_destroy(&ds);
> +    }
> +
> +    /* If we have full five tuple in orig just do the flush over that
> +     * tuple directly. */
> +    if (ofputil_ct_tuple_is_five_tuple(&match->tuple_orig, match->ip_proto)) {
> +        struct ct_dpif_tuple tuple;
> +        ct_dpif_tuple_from_ofputil_ct_tuple(&match->tuple_orig, &tuple,
> +                                            match->l3_type, match->ip_proto);
> +        return dpif->dpif_class->ct_flush(dpif, zone, &tuple);

Here we just fully ignored the reply tuple, even if it was
actually specified.  What if it didn't match whatever we're
oing to remove?  It is user-specified, so can be anything.

> +    }
> +
> +    error = ct_dpif_dump_start(dpif, &dump, zone, &tot_bkts);
> +    if (error) {
> +        return error;
> +    }
> +
> +    while (!(error = ct_dpif_dump_next(dump, &cte))) {
> +        if (zone && *zone != cte.zone) {
> +            continue;
> +        }
> +
> +        if (ofputil_ct_match_cmp(match, &cte)) {
> +            error = dpif->dpif_class->ct_flush(dpif, &cte.zone,
> +                                               &cte.tuple_orig);

We assume here that original direction tuple uniquely identifies
the conntrack entry.

> +            if (error) {
> +                break;
> +            }
> +        }
> +    }
> +    if (error == EOF) {
> +        error = 0;
> +    }
> +
> +    ct_dpif_dump_done(dump);
> +    return error;
> +}
> +
>  /* Flush the entries in the connection tracker used by 'dpif'.  The
>   * arguments have the following behavior:
>   *
> - *   - If both 'zone' and 'tuple' are NULL, flush all the conntrack entries.
> - *   - If 'zone' is not NULL, and 'tuple' is NULL, flush all the conntrack
> + *   - If both 'zone' and 'match' are NULL, or match is zero,

It's not clear that 'or' applies to the 'match' part only.

> + *     flush all the conntrack entries.
> + *   - If 'zone' is not NULL, and 'match' is NULL, flush all the conntrack
>   *     entries in '*zone'.
> - *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'tuple'
> + *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'match'

s/tuple/match/

>   *     in '*zone'. If 'zone' is NULL, use the default zone (zone 0). */
>  int
>  ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
> -              const struct ct_dpif_tuple *tuple)
> +              const struct ofputil_ct_match *match)
>  {
> -    if (tuple) {
> -        struct ds ds = DS_EMPTY_INITIALIZER;
> -        ct_dpif_format_tuple(&ds, tuple);
> -        VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif), ds_cstr(&ds),
> -                                                zone ? *zone : 0);
> -        ds_destroy(&ds);
> +    if (match && !ofputil_is_ct_match_zero(match)) {
> +        return ct_dpif_flush_tuple(dpif, zone, match);
>      } else if (zone) {
>          VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif), *zone);
>      } else {
> @@ -126,7 +204,7 @@ ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
>      }
>  
>      return (dpif->dpif_class->ct_flush
> -            ? dpif->dpif_class->ct_flush(dpif, zone, tuple)
> +            ? dpif->dpif_class->ct_flush(dpif, zone, NULL)
>              : EOPNOTSUPP);
>  }
>  
> @@ -583,112 +661,6 @@ ct_dpif_format_tcp_stat(struct ds * ds, int tcp_state, int conn_per_state)
>      ds_put_format(ds, "=%u", conn_per_state);
>  }
>  
> -/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
> - * Returns true on success.  Otherwise, returns false and puts the error
> - * message in 'ds'. */
> -bool
> -ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s, struct ds *ds)
> -{
> -    char *pos, *key, *value, *copy;
> -    memset(tuple, 0, sizeof *tuple);
> -
> -    pos = copy = xstrdup(s);
> -    while (ofputil_parse_key_value(&pos, &key, &value)) {
> -        if (!*value) {
> -            ds_put_format(ds, "field %s missing value", key);
> -            goto error;
> -        }
> -
> -        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")) {
> -            if (tuple->l3_type && tuple->l3_type != AF_INET) {
> -                ds_put_cstr(ds, "L3 type set multiple times");
> -                goto error;
> -            } else {
> -                tuple->l3_type = AF_INET;
> -            }
> -            if (!ip_parse(value, key[6] == 's' ? &tuple->src.ip :
> -                                                 &tuple->dst.ip)) {
> -                goto error_with_msg;
> -            }
> -        } else if (!strcmp(key, "ct_ipv6_src") ||
> -                   !strcmp(key, "ct_ipv6_dst")) {
> -            if (tuple->l3_type && tuple->l3_type != AF_INET6) {
> -                ds_put_cstr(ds, "L3 type set multiple times");
> -                goto error;
> -            } else {
> -                tuple->l3_type = AF_INET6;
> -            }
> -            if (!ipv6_parse(value, key[8] == 's' ? &tuple->src.in6 :
> -                                                   &tuple->dst.in6)) {
> -                goto error_with_msg;
> -            }
> -        } else if (!strcmp(key, "ct_nw_proto")) {
> -            char *err = str_to_u8(value, key, &tuple->ip_proto);
> -            if (err) {
> -                free(err);
> -                goto error_with_msg;
> -            }
> -        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key,"ct_tp_dst")) {
> -            uint16_t port;
> -            char *err = str_to_u16(value, key, &port);
> -            if (err) {
> -                free(err);
> -                goto error_with_msg;
> -            }
> -            if (key[6] == 's') {
> -                tuple->src_port = htons(port);
> -            } else {
> -                tuple->dst_port = htons(port);
> -            }
> -        } else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
> -                   !strcmp(key, "icmp_id") ) {
> -            if (tuple->ip_proto != IPPROTO_ICMP &&
> -                tuple->ip_proto != IPPROTO_ICMPV6) {
> -                ds_put_cstr(ds, "invalid L4 fields");
> -                goto error;
> -            }
> -            uint16_t icmp_id;
> -            char *err;
> -            if (key[5] == 't') {
> -                err = str_to_u8(value, key, &tuple->icmp_type);
> -            } else if (key[5] == 'c') {
> -                err = str_to_u8(value, key, &tuple->icmp_code);
> -            } else {
> -                err = str_to_u16(value, key, &icmp_id);
> -                tuple->icmp_id = htons(icmp_id);
> -            }
> -            if (err) {
> -                free(err);
> -                goto error_with_msg;
> -            }
> -        } else {
> -            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
> -            goto error;
> -        }
> -    }
> -
> -    if (ipv6_is_zero(&tuple->src.in6) || ipv6_is_zero(&tuple->dst.in6) ||
> -        !tuple->ip_proto) {
> -        /* icmp_type, icmp_code, and icmp_id can be 0. */
> -        if (tuple->ip_proto != IPPROTO_ICMP &&
> -            tuple->ip_proto != IPPROTO_ICMPV6) {
> -            if (!tuple->src_port || !tuple->dst_port) {
> -                ds_put_cstr(ds, "at least one of the conntrack 5-tuple fields "
> -                                "is missing.");
> -                goto error;
> -            }
> -        }
> -    }
> -
> -    free(copy);
> -    return true;
> -
> -error_with_msg:
> -    ds_put_format(ds, "failed to parse field %s", key);
> -error:
> -    free(copy);
> -    return false;
> -}
>  
>  void
>  ct_dpif_push_zone_limit(struct ovs_list *zone_limits, uint16_t zone,
> diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
> index 2848549b0..337c99dd4 100644
> --- a/lib/ct-dpif.h
> +++ b/lib/ct-dpif.h
> @@ -17,6 +17,7 @@
>  #ifndef CT_DPIF_H
>  #define CT_DPIF_H
>  
> +#include "openvswitch/ofp-util.h"
>  #include "openvswitch/types.h"
>  #include "packets.h"
>  
> @@ -285,7 +286,7 @@ int ct_dpif_dump_start(struct dpif *, struct ct_dpif_dump_state **,
>  int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *);
>  int ct_dpif_dump_done(struct ct_dpif_dump_state *);
>  int ct_dpif_flush(struct dpif *, const uint16_t *zone,
> -                  const struct ct_dpif_tuple *);
> +                  const struct ofputil_ct_match *);
>  int ct_dpif_set_maxconns(struct dpif *dpif, uint32_t maxconns);
>  int ct_dpif_get_maxconns(struct dpif *dpif, uint32_t *maxconns);
>  int ct_dpif_get_nconns(struct dpif *dpif, uint32_t *nconns);
> @@ -311,7 +312,6 @@ void ct_dpif_format_ipproto(struct ds *ds, uint16_t ipproto);
>  void ct_dpif_format_tuple(struct ds *, const struct ct_dpif_tuple *);
>  uint8_t ct_dpif_coalesce_tcp_state(uint8_t state);
>  void ct_dpif_format_tcp_stat(struct ds *, int, int);
> -bool ct_dpif_parse_tuple(struct ct_dpif_tuple *, const char *s, struct ds *);
>  void ct_dpif_push_zone_limit(struct ovs_list *, uint16_t zone, uint32_t limit,
>                               uint32_t count);
>  void ct_dpif_free_zone_limits(struct ovs_list *);
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index 29041fa3e..e6e350aba 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -40,6 +40,7 @@
>  #include "netdev.h"
>  #include "netlink.h"
>  #include "odp-util.h"
> +#include "ofp-ct-util.h"
>  #include "openvswitch/ofpbuf.h"
>  #include "packets.h"
>  #include "openvswitch/shash.h"
> @@ -1707,47 +1708,38 @@ dpctl_flush_conntrack(int argc, const char *argv[],
>                        struct dpctl_params *dpctl_p)
>  {
>      struct dpif *dpif = NULL;
> -    struct ct_dpif_tuple tuple, *ptuple = NULL;
> -    struct ds ds = DS_EMPTY_INITIALIZER;
> -    uint16_t zone, *pzone = NULL;
> -    int error;
> +    struct ofputil_ct_match match = {0};
> +    uint16_t zone;
> +    bool with_zone = false;
>      int args = argc - 1;
>  
>      /* Parse ct tuple */
> -    if (args && ct_dpif_parse_tuple(&tuple, argv[args], &ds)) {
> -        ptuple = &tuple;
> +    if (args) {
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        if (!ofputil_ct_match_parse(&match, argv[args], &ds)) {
> +            dpctl_error(dpctl_p, EINVAL, "%s", ds_cstr(&ds));
> +            ds_destroy(&ds);
> +            return EINVAL;
> +        }
>          args--;
>      }
>  
>      /* Parse zone */
>      if (args && ovs_scan(argv[args], "zone=%"SCNu16, &zone)) {
> -        pzone = &zone;
> -        args--;
> -    }
> -
> -    /* Report error if there are more than one unparsed argument. */
> -    if (args > 1) {
> -        ds_put_cstr(&ds, "invalid arguments");
> -        error = EINVAL;
> -        goto error;
> +        with_zone = true;
>      }
>  
> -    error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
> +    int error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
>      if (error) {
> +        dpctl_error(dpctl_p, error, "Cannot open dpif");
>          return error;
>      }
>  
> -    error = ct_dpif_flush(dpif, pzone, ptuple);
> -    if (!error) {
> -        dpif_close(dpif);
> -        return 0;
> -    } else {
> -        ds_put_cstr(&ds, "failed to flush conntrack");
> +    error = ct_dpif_flush(dpif, with_zone ? &zone : NULL, &match);
> +    if (error) {
> +        dpctl_error(dpctl_p, error, "Failed to flush conntrack");
>      }
>  
> -error:
> -    dpctl_error(dpctl_p, error, "%s", ds_cstr(&ds));
> -    ds_destroy(&ds);
>      dpif_close(dpif);
>      return error;
>  }
> diff --git a/lib/dpctl.man b/lib/dpctl.man
> index 87ea8087b..b0cabe05d 100644
> --- a/lib/dpctl.man
> +++ b/lib/dpctl.man
> @@ -312,7 +312,8 @@ If \fBzone\fR=\fIzone\fR is specified, only flushes the connections in
>  If \fIct-tuple\fR is provided, flushes the connection entry specified by
>  \fIct-tuple\fR in \fIzone\fR. The zone defaults to 0 if it is not provided.
>  The userspace connection tracker requires flushing with the original pre-NATed
> -tuple and a warning log will be otherwise generated.
> +tuple and a warning log will be otherwise generated. The tuple can be partial
> +and will remove all connections that are matching on the specified fields.
>  An example of an IPv4 ICMP \fIct-tuple\fR:
>  .IP
>  "ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=1,icmp_type=8,icmp_code=0,icmp_id=10"
> diff --git a/lib/ofp-ct-util.c b/lib/ofp-ct-util.c
> new file mode 100644
> index 000000000..a10607590
> --- /dev/null
> +++ b/lib/ofp-ct-util.c
> @@ -0,0 +1,326 @@
> +
> +/* Copyright (c) 2022, Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <sys/types.h>
> +#include <netinet/in.h>
> +#include <netinet/icmp6.h>
> +
> +#include "ct-dpif.h"
> +#include "ofp-ct-util.h"
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/ofp-parse.h"
> +#include "openvswitch/ofp-util.h"
> +#include "openvswitch/packets.h"
> +
> +static inline bool
> +ofputil_ct_inet_addr_cmp_partial(const struct in6_addr *partial,
> +                                 const union ct_dpif_inet_addr *addr,
> +                                 const uint16_t l3_type)
> +{
> +    if (ipv6_is_zero(partial)) {
> +        return true;
> +    }
> +
> +    if (l3_type == AF_INET && in6_addr_get_mapped_ipv4(partial) != addr->ip) {
> +        return false;
> +    }
> +
> +    if (l3_type == AF_INET6 && !ipv6_addr_equals(partial, &addr->in6)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static inline bool
> +ofputil_ct_tuple_ip_cmp_partial(const struct ofputil_ct_tuple *partial,
> +                                const struct ct_dpif_tuple *tuple,
> +                                const uint16_t l3_type, const uint8_t ip_proto)
> +{
> +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->src,
> +                                          &tuple->src, l3_type)) {
> +        return false;
> +    }
> +
> +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->dst,
> +                                          &tuple->dst, l3_type)) {
> +        return false;
> +    }
> +
> +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> +        if (partial->icmp_id != tuple->icmp_id) {
> +            return false;
> +        }
> +
> +        if (partial->icmp_type != tuple->icmp_type) {
> +            return false;
> +        }
> +
> +        if (partial->icmp_code != tuple->icmp_code) {
> +            return false;
> +        }
> +    } else {
> +        if (partial->src_port && partial->src_port != tuple->src_port) {
> +            return false;
> +        }
> +
> +        if (partial->dst_port && partial->dst_port != tuple->dst_port) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
> +/* Compares the non-zero members if they match. This is useful for clearing
> + * up all connections specified by a partial tuples for orig/reply. */
> +bool
> +ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
> +                     const struct ct_dpif_entry *entry)
> +{
> +    if (match->l3_type && match->l3_type != entry->tuple_orig.l3_type) {
> +        return false;
> +    }
> +
> +    if (match->ip_proto && match->ip_proto != entry->tuple_orig.ip_proto) {
> +        return false;
> +    }
> +
> +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_orig,
> +                                         &entry->tuple_orig,
> +                                         match->l3_type, match->ip_proto)) {
> +        return false;
> +    }
> +
> +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_reply,
> +                                         &entry->tuple_reply,
> +                                         match->l3_type, match->ip_proto)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static void
> +ofputil_ct_tuple_format(struct ds *ds, const struct ofputil_ct_tuple *tuple,
> +                        uint8_t ip_proto)
> +{
> +    ds_put_cstr(ds, "src=");
> +    ipv6_format_mapped(&tuple->src, ds);
> +    ds_put_cstr(ds, ",dst=");
> +    ipv6_format_mapped(&tuple->dst, ds);
> +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> +        ds_put_format(ds, ",icmp_id=%u,icmp_type=%u,icmp_code=%u",
> +                      ntohs(tuple->icmp_id), tuple->icmp_type,
> +                      tuple->icmp_code);
> +
> +    } else {
> +        ds_put_format(ds, ",src_port=%u,dst_port=%u", ntohs(tuple->src_port),
> +                      ntohs(tuple->dst_port));
> +    }
> +}
> +
> +static bool
> +ofputil_is_tuple_zero(const struct ofputil_ct_tuple *tuple)
> +{
> +    return ipv6_is_zero(&tuple->src) && ipv6_is_zero(&tuple->dst) &&
> +           !tuple->src_port && !tuple->dst_port;
> +}
> +
> +bool
> +ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
> +                               uint8_t ip_proto)
> +{
> +    /* First check if we have address. */
> +    bool five_tuple = !ipv6_is_zero(&tuple->src) && !ipv6_is_zero(&tuple->dst);
> +
> +    if (!(ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6)) {
> +        five_tuple = five_tuple && tuple->src_port && tuple->dst_port;
> +    }
> +
> +    return five_tuple;
> +}
> +
> +bool
> +ofputil_is_ct_match_zero(const struct ofputil_ct_match *match)
> +{
> +    return !match->ip_proto && !match->l3_type &&
> +           ofputil_is_tuple_zero(&match->tuple_orig) &&
> +           ofputil_is_tuple_zero(&match->tuple_reply);
> +}
> +
> +void
> +ofputil_ct_match_format(struct ds *ds, const struct ofputil_ct_match *match)
> +{
> +    ds_put_format(ds, " l3_type=%u,ip_proto=%u", match->l3_type,
> +                  match->ip_proto);
> +    ds_put_cstr(ds, ",orig=(");
> +    ofputil_ct_tuple_format(ds, &match->tuple_orig, match->ip_proto);
> +    ds_put_cstr(ds, "),reply=(");
> +    ofputil_ct_tuple_format(ds, &match->tuple_reply, match->ip_proto);
> +    ds_put_cstr(ds, ")");

This syntax is very different from what is used as cmdline arguments,
that is not good.  This means you cannot feed back what OVS prints out.

Also, OVS usually uses 'nw_proto' instead of 'ip_proto', especially
on the OpenFlow side of things.  And it might be better to print
a string 'ip'/'ipv6' instead of 'l3_type', which is not great on its
own.  See also comments on patch 2/2.
They can also be combined into a single 'icmp'/'icmp6'/'udp'/'udp6'/etc.
And there is already a function like this.

> +}
> +
> +static bool
> +ofputil_ct_tuple_ip_parse(struct in6_addr *addr, char *value, uint16_t l3_type)
> +{
> +    if (!ipv6_is_zero(addr)) {
> +        return false;
> +    }
> +
> +    if (l3_type == AF_INET) {
> +        ovs_be32 ip = 0;
> +
> +        ip_parse(value, &ip);
> +        *addr = in6_addr_mapped_ipv4(ip);
> +    } else {
> +        ipv6_parse(value, addr);

Both ip_parse and ipv6_parse can fail.
There is also lookup_any() function that can be exported and reused.

> +    }
> +
> +    return true;
> +}
> +
> +/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
> + * Returns true on success.  Otherwise, returns false and puts the error
> + * message in 'ds'. */
> +bool
> +ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
> +                       struct ds *ds)
> +{
> +    char *pos, *key, *value, *copy;
> +
> +    memset(match, 0, sizeof *match);
> +    struct ofputil_ct_tuple *tuple = &match->tuple_orig;
> +
> +    pos = copy = xstrdup(s);
> +    while (ofputil_parse_key_value(&pos, &key, &value)) {
> +        if (!*value) {
> +            ds_put_format(ds, "field %s missing value", key);
> +            goto error_with_msg;
> +        }
> +
> +        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")
> +            || !strcmp(key, "ct_ipv6_src") || !strcmp(key, "ct_ipv6_dst")) {
> +            match->l3_type = key[6] == '6' ? AF_INET6 : AF_INET;

Original code made sure that user didn't specify keys from
different families.  This version does not.

> +            uint8_t index = key[6] == '6' ? 8 : 6;
> +            struct in6_addr *addr = key[index] == 's'
> +                ? &tuple->src : &tuple->dst;
> +
> +            if (!ofputil_ct_tuple_ip_parse(addr, value, match->l3_type)) {
> +                ds_put_format(ds, "%s is set multiple times", key);
> +                goto error;
> +            }
> +        } else if (!strcmp(key, "ct_nw_proto")) {
> +            char *err = str_to_u8(value, key, &match->ip_proto);
> +
> +            if (err) {
> +                free(err);
> +                goto error_with_msg;
> +            }
> +        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key, "ct_tp_dst")) {
> +            uint16_t port;
> +            char *err = str_to_u16(value, key, &port);
> +
> +            if (err) {
> +                free(err);
> +                goto error_with_msg;
> +            }
> +            if (key[6] == 's') {
> +                tuple->src_port = htons(port);
> +            } else {
> +                tuple->dst_port = htons(port);
> +            }
> +        } else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
> +                   !strcmp(key, "icmp_id")) {
> +            if (match->ip_proto != IPPROTO_ICMP &&
> +                match->ip_proto != IPPROTO_ICMPV6) {
> +                ds_put_cstr(ds, "invalid L4 fields");
> +                goto error;
> +            }
> +            uint16_t icmp_id;
> +            char *err;
> +
> +            if (key[5] == 't') {
> +                err = str_to_u8(value, key, &tuple->icmp_type);
> +            } else if (key[5] == 'c') {
> +                err = str_to_u8(value, key, &tuple->icmp_code);
> +            } else {
> +                err = str_to_u16(value, key, &icmp_id);
> +                tuple->icmp_id = htons(icmp_id);
> +            }
> +            if (err) {
> +                free(err);
> +                goto error_with_msg;
> +            }
> +        } else {
> +            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
> +            goto error;
> +        }
> +    }
> +
> +    if (!match->ip_proto && (tuple->src_port || tuple->dst_port)) {
> +        ds_put_cstr(ds, "port is set without protocol");
> +        goto error;
> +    }
> +
> +    /* For the filtering to work with icmp we need to fill the reply direction
> +     * with correct information. */
> +    if (match->ip_proto == IPPROTO_ICMP) {
> +        switch (match->tuple_orig.icmp_type) {
> +        case ICMP4_ECHO_REQUEST:
> +            match->tuple_reply.icmp_type = ICMP4_ECHO_REPLY;
> +            break;

An empty line after the 'break;'.

> +        case ICMP4_ECHO_REPLY:
> +            match->tuple_reply.icmp_type = ICMP4_ECHO_REQUEST;
> +            break;
> +        case ICMP4_TIMESTAMP:
> +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMPREPLY;
> +            break;
> +        case ICMP4_TIMESTAMPREPLY:
> +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMP;
> +            break;
> +        case ICMP4_INFOREQUEST:
> +            match->tuple_reply.icmp_type = ICMP4_INFOREPLY;
> +            break;
> +        case ICMP4_INFOREPLY:
> +            match->tuple_reply.icmp_type = ICMP4_INFOREQUEST;
> +            break;
> +        }
> +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
> +    } else if (match->ip_proto == IPPROTO_ICMPV6) {
> +        switch (match->tuple_orig.icmp_type) {
> +        case ICMP6_ECHO_REQUEST:
> +            match->tuple_reply.icmp_type = ICMP6_ECHO_REPLY;
> +            break;
> +        case ICMP6_ECHO_REPLY:
> +            match->tuple_reply.icmp_type = ICMP6_ECHO_REQUEST;
> +            break;
> +        }
> +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
> +    }
> +
> +    free(copy);
> +    return true;
> +
> +error_with_msg:
> +    ds_put_format(ds, "failed to parse field %s", key);
> +error:
> +    free(copy);
> +    return false;
> +}
> diff --git a/lib/ofp-ct-util.h b/lib/ofp-ct-util.h
> new file mode 100644
> index 000000000..5a5eaf920
> --- /dev/null
> +++ b/lib/ofp-ct-util.h
> @@ -0,0 +1,36 @@
> +/* Copyright (c) 2022, Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef OVS_OFP_CT_UTIL_H
> +#define OVS_OFP_CT_UTIL_H
> +
> +#include "ct-dpif.h"
> +#include "openvswitch/ofp-util.h"
> +
> +bool ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
> +                                 const struct ct_dpif_entry *entry);
> +
> +bool ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
> +                                    uint8_t ip_proto);
> +
> +void ofputil_ct_match_format(struct ds *ds,
> +                             const struct ofputil_ct_match *match);
> +
> +bool ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
> +                            struct ds *ds);
> +
> +bool ofputil_is_ct_match_zero(const struct ofputil_ct_match *match);
> +
> +#endif /* lib/ofp-ct-util.h */
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index e5403519f..51903a658 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -2230,7 +2230,7 @@ udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> -AT_SETUP([conntrack - ct flush by 5-tuple])
> +AT_SETUP([conntrack - ct flush])
>  CHECK_CONNTRACK()
>  OVS_TRAFFIC_VSWITCHD_START()
>  
> @@ -2291,6 +2291,86 @@ AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
>  ])
>  
> +dnl Test UDP from port 1 and 2, partial flush by src port
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=1'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=2'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> +
> +dnl Test UDP from port 1 and 2, partial flush by dst port
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_dst=2'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_dst=1'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> +
> +dnl Test UDP from port 1 and 2, partial flush by src address
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.1'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> +
> +dnl Test UDP from port 1 and 2, partial flush by dst address
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.2'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.1'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> +
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
Ales Musil Jan. 11, 2023, 6:53 a.m. UTC | #2
On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 12/16/22 13:30, Ales Musil wrote:
> > Currently, the CT can be flushed by dpctl only be specifying
>
> *by
>
> > the whole 5-tuple. This is not very convenient when there are
> > only some fields known to the user of CT flush. Add new struct
> > ofputil_ct_match which represents the generic filtering that can
> > be done for CT flush. The match is done only on fields that are
> > non-zero with exception to the icmp fields.
> >
> > This allows the filtering just within dpctl, however
> > it is a preparation for OpenFlow extension.
> >
> > Reported-at: https://bugzilla.redhat.com/2120546
> > Signed-off-by: Ales Musil <amusil@redhat.com>
> > Acked-by: Paolo Valerio <pvalerio@redhat.com>
> > ---
> > v5: Add ack from Paolo.
> > v4: Fix a flush all scenario.
> > v3: Rebase on top of master.
> >     Address the C99 comment and missing dpif_close call.
> > v2: Rebase on top of master.
> >     Address comments from Paolo.
> > ---
>
> Hi, Ales.  Thanks for working on this change.
> Sorry for review taking so long.  I initially though there
> will be mostly a few cosmetic/style changes needed.  But
> longer I look at this code, more questions I have.  See below.
>
> Dumitru, maybe you can also help answer some of them?
>
> Best regards, Ilya Maximets.
>

Hi Ilya,

thank you for the review. Before proceeding I tried to answer your
concerns/questions.
Let me know what you think.

Thanks,
Ales


>
> >  NEWS                           |   2 +
> >  include/openvswitch/ofp-util.h |  28 +++
> >  lib/automake.mk                |   2 +
> >  lib/ct-dpif.c                  | 208 +++++++++------------
> >  lib/ct-dpif.h                  |   4 +-
> >  lib/dpctl.c                    |  42 ++---
> >  lib/dpctl.man                  |   3 +-
> >  lib/ofp-ct-util.c              | 326 +++++++++++++++++++++++++++++++++
> >  lib/ofp-ct-util.h              |  36 ++++
> >  tests/system-traffic.at        |  82 ++++++++-
> >  10 files changed, 586 insertions(+), 147 deletions(-)
> >  create mode 100644 lib/ofp-ct-util.c
> >  create mode 100644 lib/ofp-ct-util.h
> >
> > diff --git a/NEWS b/NEWS
> > index 265375e1c..ff8904b02 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -14,6 +14,8 @@ Post-v3.0.0
> >       10 Gbps link speed by default in case the actual link speed cannot
> be
> >       determined.  Previously it was 10 Mbps.  Values can still be
> overridden
> >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> > +   - ovs-dpctl and related ovs-appctl commands:
>
> "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
>
> > +     * "flush-conntrack" is capable of handling partial 5-tuple.
> >
> >
> >  v3.0.0 - 15 Aug 2022
> > diff --git a/include/openvswitch/ofp-util.h
> b/include/openvswitch/ofp-util.h
> > index 091a09cad..84937ae26 100644
> > --- a/include/openvswitch/ofp-util.h
> > +++ b/include/openvswitch/ofp-util.h
> > @@ -19,6 +19,9 @@
> >
> >  #include <stdbool.h>
> >  #include <stdint.h>
> > +#include <sys/types.h>
> > +#include <netinet/in.h>
> > +
> >  #include "openvswitch/ofp-protocol.h"
> >
> >  struct ofp_header;
> > @@ -27,6 +30,31 @@ struct ofp_header;
> >  extern "C" {
> >  #endif
> >
> > +struct ofputil_ct_tuple {
> > +    struct in6_addr src;
> > +    struct in6_addr dst;
> > +
> > +    union {
> > +        ovs_be16 src_port;
> > +        ovs_be16 icmp_id;
> > +    };
> > +    union {
> > +        ovs_be16 dst_port;
> > +        struct {
> > +            uint8_t icmp_code;
> > +            uint8_t icmp_type;
> > +        };
> > +    };
> > +};
> > +
> > +struct ofputil_ct_match {
> > +    uint8_t ip_proto;
> > +    uint16_t l3_type;
> > +
> > +    struct ofputil_ct_tuple tuple_orig;
> > +    struct ofputil_ct_tuple tuple_reply;
> > +};
>
> These structures are very similar to existing ct_dpif_tuple.
> We can keep them for now, but I think, we'll need to replace
> ct_dpif_tuple with the new ofputil_ct_tuple in the future.
> We may need some changes in ofputil_ct_tuple for that though.
>

I was thinking about replacing them, however it would require a much
bigger change, which was out of scope of this RFE. Also I was not entirely
sure if there wasn't any specific reason for the ct_dpif_tuple specifically
the address part. It seems like optimization to me.

Anyway it should be possible to unify those if needed.


>
>
> OTOH, I'm not sure I understand why we need both tuples here.
> Can we have a tuple and a flag that indicates a direction
> instead?
>

That is also a possibility. However that wouldn't fit too that nicely in the
whole match struct, we could probably duplicate the proto field and get rid
of the
l3_type. That way we could supply just two tuples instead, getting rid of
the match completely.


>
> There are several issues with having both original and reply
> tuples, IIUC:
>
> 1. None of the command line interfaces support specifying
>    reply tuple.
>

That was the intention of the original command AFAICT, we could modify the
commands
to support that. However I'm not sure if there is any benefit in that.


>
> 2. If the reply tuple will be specified via OpenFlow, it
>    will be ignored if the original direction tuple is
>    fully specified.
>

That is ok IMO, because a fully specified original should be enough to
determine the one
entry that needs to be erased. So the reply direction in that would be an
extra.


>
> 3. Netfilter kernel interface allows specifying only one
>    tuple, not both, AFAICT.
>

Right, that's the reason why, when we match, we use the original tuple to
erase that
entry.


>
> I'm not a conntrack expert, so I have a question: Is it possible
> that addresses and ports in original and reply directions are
> not mirrored copies?  Or do we care about these cases?
>

Not sure if it's possible, but even if it would be, the current
implementation
would account for that, with the exception of specifying fully the original
direction.


>
> If we want to flush all entries related to the same IP:PORT,
> we will not be able to do that in a single OpenFlow request,
> because we'll need to supply the same IP and PORT as src
> and dst, or as src in both original and reply directions.
> None of these can fit into a single request.
>

Is that really a problem? You could do that in two, one for the original
one for reply. This is also very similar to the interface supported by
conntrack-tools,
with those you couldn't remove all entries in a single call as well. We
could modify
the interface that would allow matching on tuple without direction
specification,
that would translate to both directions, however there is still the
question if that's needed.


>
> At this point is there actually any value in flushing basing
> on reply direction?  Shouldn't we be able to flush the same
> entries by flushing by dst in original direction, if we will
> need to issue multiple requests anyway?
>

That assumption would be fine when you can specify a full tuple for the
original direction.
The problem is that for example OVN we know only the original
DST_IP:DST_PORT (VIP)
and the reply SRC_IP:SRC_PORT (backend). That means we cannot fully specify
the original direction
without doing the filtering ourselves. It would be possible with extra
steps from the user. If this interface
is too generic/complicated for that purpose we could say that the flush
extension supports only full orig tuple
and leave the filtering up to the user. That would definitely work, it
wouldn't be that elegant for the user however
from the code change perspective much simpler.

 I can see a certain benefit of keeping it in OvS:

1) The filtering complexity is hidden from the user.
2) The data transfer between OvS and the user is much smaller (even if you
need to send two matches).
3) Once implemented it's just ready to be used without the need of
additional thoughts going into it.




>
> Did you start working on the OVN side of things?  Which
> commands OVN will actually use?
>

Yes, I did. The plan is that OVN will use the extension.


>
> > +
> >  bool ofputil_decode_hello(const struct ofp_header *,
> >                            uint32_t *allowed_versions);
> >  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index a0fabe38f..37135f118 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
> >       lib/ofp-actions.c \
> >       lib/ofp-bundle.c \
> >       lib/ofp-connection.c \
> > +     lib/ofp-ct-util.c \
> > +     lib/ofp-ct-util.h \
> >       lib/ofp-ed-props.c \
> >       lib/ofp-errors.c \
> >       lib/ofp-flow.c \
> > diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> > index 6f17a26b5..9b324ff79 100644
> > --- a/lib/ct-dpif.c
> > +++ b/lib/ct-dpif.c
> > @@ -20,6 +20,7 @@
> >  #include <errno.h>
> >
> >  #include "ct-dpif.h"
> > +#include "ofp-ct-util.h"
> >  #include "openvswitch/ofp-parse.h"
> >  #include "openvswitch/vlog.h"
> >
> > @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct
> ct_dpif_dump_state **dump,
> >      return err;
> >  }
> >
> > +static void
> > +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple
> *ofp_tuple,
> > +                                    struct ct_dpif_tuple *tuple,
> > +                                    uint16_t l3_type, uint8_t ip_proto)
> > +{
> > +    if (l3_type == AF_INET) {
> > +        tuple->src.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->src);
> > +        tuple->dst.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->dst);
> > +    } else {
> > +        tuple->src.in6 = ofp_tuple->src;
> > +        tuple->dst.in6 = ofp_tuple->dst;
> > +    }
> > +
> > +    tuple->l3_type = l3_type;
> > +    tuple->ip_proto = ip_proto;
> > +    tuple->src_port = ofp_tuple->src_port;
> > +
> > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> > +        tuple->icmp_code = ofp_tuple->icmp_code;
> > +        tuple->icmp_type = ofp_tuple->icmp_type;
> > +    } else {
> > +        tuple->dst_port = ofp_tuple->dst_port;
> > +    }
> > +}
> > +
> >  /* Dump one connection from a tracker, and put it in 'entry'.
> >   *
> >   * 'dump' should have been initialized by ct_dpif_dump_start().
> > @@ -100,25 +126,77 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
> >              ? dpif->dpif_class->ct_dump_done(dpif, dump)
> >              : EOPNOTSUPP);
> >  }
> > -
> > +
> > +static int
> > +ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
> > +                    const struct ofputil_ct_match *match) {
> > +    struct ct_dpif_dump_state *dump;
> > +    struct ct_dpif_entry cte;
> > +    int error;
> > +    int tot_bkts;
> > +
> > +    if (!dpif->dpif_class->ct_flush) {
> > +        return EOPNOTSUPP;
> > +    }
> > +
> > +    if (VLOG_IS_DBG_ENABLED()) {
> > +        struct ds ds = DS_EMPTY_INITIALIZER;
> > +        ofputil_ct_match_format(&ds, match);
> > +        VLOG_DBG("%s: ct_flush:%s in zone %d", dpif_name(dpif),
> ds_cstr(&ds),
> > +                 zone ? *zone : 0);
> > +        ds_destroy(&ds);
> > +    }
> > +
> > +    /* If we have full five tuple in orig just do the flush over that
> > +     * tuple directly. */
> > +    if (ofputil_ct_tuple_is_five_tuple(&match->tuple_orig,
> match->ip_proto)) {
> > +        struct ct_dpif_tuple tuple;
> > +        ct_dpif_tuple_from_ofputil_ct_tuple(&match->tuple_orig, &tuple,
> > +                                            match->l3_type,
> match->ip_proto);
> > +        return dpif->dpif_class->ct_flush(dpif, zone, &tuple);
>
> Here we just fully ignored the reply tuple, even if it was
> actually specified.  What if it didn't match whatever we're
> oing to remove?  It is user-specified, so can be anything.
>

This is a shortcut for a fully specified tuple. If user fully specifies
original
tuple there shouldn't be any need to actually match on reply as original
should be
unique enough. It's up to the user to provide the right tuple, also it
wouldn't help if we tried
to match later on because we would fail in the original direction
nevertheless.


>
> > +    }
> > +
> > +    error = ct_dpif_dump_start(dpif, &dump, zone, &tot_bkts);
> > +    if (error) {
> > +        return error;
> > +    }
> > +
> > +    while (!(error = ct_dpif_dump_next(dump, &cte))) {
> > +        if (zone && *zone != cte.zone) {
> > +            continue;
> > +        }
> > +
> > +        if (ofputil_ct_match_cmp(match, &cte)) {
> > +            error = dpif->dpif_class->ct_flush(dpif, &cte.zone,
> > +                                               &cte.tuple_orig);
>
> We assume here that original direction tuple uniquely identifies
> the conntrack entry.
>

Which should be the case. Note not only the original direction but also the
zone.



>
> > +            if (error) {
> > +                break;
> > +            }
> > +        }
> > +    }
> > +    if (error == EOF) {
> > +        error = 0;
> > +    }
> > +
> > +    ct_dpif_dump_done(dump);
> > +    return error;
> > +}
> > +
> >  /* Flush the entries in the connection tracker used by 'dpif'.  The
> >   * arguments have the following behavior:
> >   *
> > - *   - If both 'zone' and 'tuple' are NULL, flush all the conntrack
> entries.
> > - *   - If 'zone' is not NULL, and 'tuple' is NULL, flush all the
> conntrack
> > + *   - If both 'zone' and 'match' are NULL, or match is zero,
>
> It's not clear that 'or' applies to the 'match' part only.
>
> > + *     flush all the conntrack entries.
> > + *   - If 'zone' is not NULL, and 'match' is NULL, flush all the
> conntrack
> >   *     entries in '*zone'.
> > - *   - If 'tuple' is not NULL, flush the conntrack entry specified by
> 'tuple'
> > + *   - If 'tuple' is not NULL, flush the conntrack entry specified by
> 'match'
>
> s/tuple/match/
>
> >   *     in '*zone'. If 'zone' is NULL, use the default zone (zone 0). */
> >  int
> >  ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
> > -              const struct ct_dpif_tuple *tuple)
> > +              const struct ofputil_ct_match *match)
> >  {
> > -    if (tuple) {
> > -        struct ds ds = DS_EMPTY_INITIALIZER;
> > -        ct_dpif_format_tuple(&ds, tuple);
> > -        VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif),
> ds_cstr(&ds),
> > -                                                zone ? *zone : 0);
> > -        ds_destroy(&ds);
> > +    if (match && !ofputil_is_ct_match_zero(match)) {
> > +        return ct_dpif_flush_tuple(dpif, zone, match);
> >      } else if (zone) {
> >          VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif), *zone);
> >      } else {
> > @@ -126,7 +204,7 @@ ct_dpif_flush(struct dpif *dpif, const uint16_t
> *zone,
> >      }
> >
> >      return (dpif->dpif_class->ct_flush
> > -            ? dpif->dpif_class->ct_flush(dpif, zone, tuple)
> > +            ? dpif->dpif_class->ct_flush(dpif, zone, NULL)
> >              : EOPNOTSUPP);
> >  }
> >
> > @@ -583,112 +661,6 @@ ct_dpif_format_tcp_stat(struct ds * ds, int
> tcp_state, int conn_per_state)
> >      ds_put_format(ds, "=%u", conn_per_state);
> >  }
> >
> > -/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
> > - * Returns true on success.  Otherwise, returns false and puts the error
> > - * message in 'ds'. */
> > -bool
> > -ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s, struct
> ds *ds)
> > -{
> > -    char *pos, *key, *value, *copy;
> > -    memset(tuple, 0, sizeof *tuple);
> > -
> > -    pos = copy = xstrdup(s);
> > -    while (ofputil_parse_key_value(&pos, &key, &value)) {
> > -        if (!*value) {
> > -            ds_put_format(ds, "field %s missing value", key);
> > -            goto error;
> > -        }
> > -
> > -        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")) {
> > -            if (tuple->l3_type && tuple->l3_type != AF_INET) {
> > -                ds_put_cstr(ds, "L3 type set multiple times");
> > -                goto error;
> > -            } else {
> > -                tuple->l3_type = AF_INET;
> > -            }
> > -            if (!ip_parse(value, key[6] == 's' ? &tuple->src.ip :
> > -                                                 &tuple->dst.ip)) {
> > -                goto error_with_msg;
> > -            }
> > -        } else if (!strcmp(key, "ct_ipv6_src") ||
> > -                   !strcmp(key, "ct_ipv6_dst")) {
> > -            if (tuple->l3_type && tuple->l3_type != AF_INET6) {
> > -                ds_put_cstr(ds, "L3 type set multiple times");
> > -                goto error;
> > -            } else {
> > -                tuple->l3_type = AF_INET6;
> > -            }
> > -            if (!ipv6_parse(value, key[8] == 's' ? &tuple->src.in6 :
> > -                                                   &tuple->dst.in6)) {
> > -                goto error_with_msg;
> > -            }
> > -        } else if (!strcmp(key, "ct_nw_proto")) {
> > -            char *err = str_to_u8(value, key, &tuple->ip_proto);
> > -            if (err) {
> > -                free(err);
> > -                goto error_with_msg;
> > -            }
> > -        } else if (!strcmp(key, "ct_tp_src") ||
> !strcmp(key,"ct_tp_dst")) {
> > -            uint16_t port;
> > -            char *err = str_to_u16(value, key, &port);
> > -            if (err) {
> > -                free(err);
> > -                goto error_with_msg;
> > -            }
> > -            if (key[6] == 's') {
> > -                tuple->src_port = htons(port);
> > -            } else {
> > -                tuple->dst_port = htons(port);
> > -            }
> > -        } else if (!strcmp(key, "icmp_type") || !strcmp(key,
> "icmp_code") ||
> > -                   !strcmp(key, "icmp_id") ) {
> > -            if (tuple->ip_proto != IPPROTO_ICMP &&
> > -                tuple->ip_proto != IPPROTO_ICMPV6) {
> > -                ds_put_cstr(ds, "invalid L4 fields");
> > -                goto error;
> > -            }
> > -            uint16_t icmp_id;
> > -            char *err;
> > -            if (key[5] == 't') {
> > -                err = str_to_u8(value, key, &tuple->icmp_type);
> > -            } else if (key[5] == 'c') {
> > -                err = str_to_u8(value, key, &tuple->icmp_code);
> > -            } else {
> > -                err = str_to_u16(value, key, &icmp_id);
> > -                tuple->icmp_id = htons(icmp_id);
> > -            }
> > -            if (err) {
> > -                free(err);
> > -                goto error_with_msg;
> > -            }
> > -        } else {
> > -            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
> > -            goto error;
> > -        }
> > -    }
> > -
> > -    if (ipv6_is_zero(&tuple->src.in6) || ipv6_is_zero(&tuple->dst.in6)
> ||
> > -        !tuple->ip_proto) {
> > -        /* icmp_type, icmp_code, and icmp_id can be 0. */
> > -        if (tuple->ip_proto != IPPROTO_ICMP &&
> > -            tuple->ip_proto != IPPROTO_ICMPV6) {
> > -            if (!tuple->src_port || !tuple->dst_port) {
> > -                ds_put_cstr(ds, "at least one of the conntrack 5-tuple
> fields "
> > -                                "is missing.");
> > -                goto error;
> > -            }
> > -        }
> > -    }
> > -
> > -    free(copy);
> > -    return true;
> > -
> > -error_with_msg:
> > -    ds_put_format(ds, "failed to parse field %s", key);
> > -error:
> > -    free(copy);
> > -    return false;
> > -}
> >
> >  void
> >  ct_dpif_push_zone_limit(struct ovs_list *zone_limits, uint16_t zone,
> > diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
> > index 2848549b0..337c99dd4 100644
> > --- a/lib/ct-dpif.h
> > +++ b/lib/ct-dpif.h
> > @@ -17,6 +17,7 @@
> >  #ifndef CT_DPIF_H
> >  #define CT_DPIF_H
> >
> > +#include "openvswitch/ofp-util.h"
> >  #include "openvswitch/types.h"
> >  #include "packets.h"
> >
> > @@ -285,7 +286,7 @@ int ct_dpif_dump_start(struct dpif *, struct
> ct_dpif_dump_state **,
> >  int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry
> *);
> >  int ct_dpif_dump_done(struct ct_dpif_dump_state *);
> >  int ct_dpif_flush(struct dpif *, const uint16_t *zone,
> > -                  const struct ct_dpif_tuple *);
> > +                  const struct ofputil_ct_match *);
> >  int ct_dpif_set_maxconns(struct dpif *dpif, uint32_t maxconns);
> >  int ct_dpif_get_maxconns(struct dpif *dpif, uint32_t *maxconns);
> >  int ct_dpif_get_nconns(struct dpif *dpif, uint32_t *nconns);
> > @@ -311,7 +312,6 @@ void ct_dpif_format_ipproto(struct ds *ds, uint16_t
> ipproto);
> >  void ct_dpif_format_tuple(struct ds *, const struct ct_dpif_tuple *);
> >  uint8_t ct_dpif_coalesce_tcp_state(uint8_t state);
> >  void ct_dpif_format_tcp_stat(struct ds *, int, int);
> > -bool ct_dpif_parse_tuple(struct ct_dpif_tuple *, const char *s, struct
> ds *);
> >  void ct_dpif_push_zone_limit(struct ovs_list *, uint16_t zone, uint32_t
> limit,
> >                               uint32_t count);
> >  void ct_dpif_free_zone_limits(struct ovs_list *);
> > diff --git a/lib/dpctl.c b/lib/dpctl.c
> > index 29041fa3e..e6e350aba 100644
> > --- a/lib/dpctl.c
> > +++ b/lib/dpctl.c
> > @@ -40,6 +40,7 @@
> >  #include "netdev.h"
> >  #include "netlink.h"
> >  #include "odp-util.h"
> > +#include "ofp-ct-util.h"
> >  #include "openvswitch/ofpbuf.h"
> >  #include "packets.h"
> >  #include "openvswitch/shash.h"
> > @@ -1707,47 +1708,38 @@ dpctl_flush_conntrack(int argc, const char
> *argv[],
> >                        struct dpctl_params *dpctl_p)
> >  {
> >      struct dpif *dpif = NULL;
> > -    struct ct_dpif_tuple tuple, *ptuple = NULL;
> > -    struct ds ds = DS_EMPTY_INITIALIZER;
> > -    uint16_t zone, *pzone = NULL;
> > -    int error;
> > +    struct ofputil_ct_match match = {0};
> > +    uint16_t zone;
> > +    bool with_zone = false;
> >      int args = argc - 1;
> >
> >      /* Parse ct tuple */
> > -    if (args && ct_dpif_parse_tuple(&tuple, argv[args], &ds)) {
> > -        ptuple = &tuple;
> > +    if (args) {
> > +        struct ds ds = DS_EMPTY_INITIALIZER;
> > +        if (!ofputil_ct_match_parse(&match, argv[args], &ds)) {
> > +            dpctl_error(dpctl_p, EINVAL, "%s", ds_cstr(&ds));
> > +            ds_destroy(&ds);
> > +            return EINVAL;
> > +        }
> >          args--;
> >      }
> >
> >      /* Parse zone */
> >      if (args && ovs_scan(argv[args], "zone=%"SCNu16, &zone)) {
> > -        pzone = &zone;
> > -        args--;
> > -    }
> > -
> > -    /* Report error if there are more than one unparsed argument. */
> > -    if (args > 1) {
> > -        ds_put_cstr(&ds, "invalid arguments");
> > -        error = EINVAL;
> > -        goto error;
> > +        with_zone = true;
> >      }
> >
> > -    error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
> > +    int error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
> >      if (error) {
> > +        dpctl_error(dpctl_p, error, "Cannot open dpif");
> >          return error;
> >      }
> >
> > -    error = ct_dpif_flush(dpif, pzone, ptuple);
> > -    if (!error) {
> > -        dpif_close(dpif);
> > -        return 0;
> > -    } else {
> > -        ds_put_cstr(&ds, "failed to flush conntrack");
> > +    error = ct_dpif_flush(dpif, with_zone ? &zone : NULL, &match);
> > +    if (error) {
> > +        dpctl_error(dpctl_p, error, "Failed to flush conntrack");
> >      }
> >
> > -error:
> > -    dpctl_error(dpctl_p, error, "%s", ds_cstr(&ds));
> > -    ds_destroy(&ds);
> >      dpif_close(dpif);
> >      return error;
> >  }
> > diff --git a/lib/dpctl.man b/lib/dpctl.man
> > index 87ea8087b..b0cabe05d 100644
> > --- a/lib/dpctl.man
> > +++ b/lib/dpctl.man
> > @@ -312,7 +312,8 @@ If \fBzone\fR=\fIzone\fR is specified, only flushes
> the connections in
> >  If \fIct-tuple\fR is provided, flushes the connection entry specified by
> >  \fIct-tuple\fR in \fIzone\fR. The zone defaults to 0 if it is not
> provided.
> >  The userspace connection tracker requires flushing with the original
> pre-NATed
> > -tuple and a warning log will be otherwise generated.
> > +tuple and a warning log will be otherwise generated. The tuple can be
> partial
> > +and will remove all connections that are matching on the specified
> fields.
> >  An example of an IPv4 ICMP \fIct-tuple\fR:
> >  .IP
> >
> "ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=1,icmp_type=8,icmp_code=0,icmp_id=10"
> > diff --git a/lib/ofp-ct-util.c b/lib/ofp-ct-util.c
> > new file mode 100644
> > index 000000000..a10607590
> > --- /dev/null
> > +++ b/lib/ofp-ct-util.c
> > @@ -0,0 +1,326 @@
> > +
> > +/* Copyright (c) 2022, Red Hat, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <sys/types.h>
> > +#include <netinet/in.h>
> > +#include <netinet/icmp6.h>
> > +
> > +#include "ct-dpif.h"
> > +#include "ofp-ct-util.h"
> > +#include "openvswitch/dynamic-string.h"
> > +#include "openvswitch/ofp-parse.h"
> > +#include "openvswitch/ofp-util.h"
> > +#include "openvswitch/packets.h"
> > +
> > +static inline bool
> > +ofputil_ct_inet_addr_cmp_partial(const struct in6_addr *partial,
> > +                                 const union ct_dpif_inet_addr *addr,
> > +                                 const uint16_t l3_type)
> > +{
> > +    if (ipv6_is_zero(partial)) {
> > +        return true;
> > +    }
> > +
> > +    if (l3_type == AF_INET && in6_addr_get_mapped_ipv4(partial) !=
> addr->ip) {
> > +        return false;
> > +    }
> > +
> > +    if (l3_type == AF_INET6 && !ipv6_addr_equals(partial, &addr->in6)) {
> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +static inline bool
> > +ofputil_ct_tuple_ip_cmp_partial(const struct ofputil_ct_tuple *partial,
> > +                                const struct ct_dpif_tuple *tuple,
> > +                                const uint16_t l3_type, const uint8_t
> ip_proto)
> > +{
> > +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->src,
> > +                                          &tuple->src, l3_type)) {
> > +        return false;
> > +    }
> > +
> > +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->dst,
> > +                                          &tuple->dst, l3_type)) {
> > +        return false;
> > +    }
> > +
> > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> > +        if (partial->icmp_id != tuple->icmp_id) {
> > +            return false;
> > +        }
> > +
> > +        if (partial->icmp_type != tuple->icmp_type) {
> > +            return false;
> > +        }
> > +
> > +        if (partial->icmp_code != tuple->icmp_code) {
> > +            return false;
> > +        }
> > +    } else {
> > +        if (partial->src_port && partial->src_port != tuple->src_port) {
> > +            return false;
> > +        }
> > +
> > +        if (partial->dst_port && partial->dst_port != tuple->dst_port) {
> > +            return false;
> > +        }
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +/* Compares the non-zero members if they match. This is useful for
> clearing
> > + * up all connections specified by a partial tuples for orig/reply. */
> > +bool
> > +ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
> > +                     const struct ct_dpif_entry *entry)
> > +{
> > +    if (match->l3_type && match->l3_type != entry->tuple_orig.l3_type) {
> > +        return false;
> > +    }
> > +
> > +    if (match->ip_proto && match->ip_proto !=
> entry->tuple_orig.ip_proto) {
> > +        return false;
> > +    }
> > +
> > +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_orig,
> > +                                         &entry->tuple_orig,
> > +                                         match->l3_type,
> match->ip_proto)) {
> > +        return false;
> > +    }
> > +
> > +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_reply,
> > +                                         &entry->tuple_reply,
> > +                                         match->l3_type,
> match->ip_proto)) {
> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +static void
> > +ofputil_ct_tuple_format(struct ds *ds, const struct ofputil_ct_tuple
> *tuple,
> > +                        uint8_t ip_proto)
> > +{
> > +    ds_put_cstr(ds, "src=");
> > +    ipv6_format_mapped(&tuple->src, ds);
> > +    ds_put_cstr(ds, ",dst=");
> > +    ipv6_format_mapped(&tuple->dst, ds);
> > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> > +        ds_put_format(ds, ",icmp_id=%u,icmp_type=%u,icmp_code=%u",
> > +                      ntohs(tuple->icmp_id), tuple->icmp_type,
> > +                      tuple->icmp_code);
> > +
> > +    } else {
> > +        ds_put_format(ds, ",src_port=%u,dst_port=%u",
> ntohs(tuple->src_port),
> > +                      ntohs(tuple->dst_port));
> > +    }
> > +}
> > +
> > +static bool
> > +ofputil_is_tuple_zero(const struct ofputil_ct_tuple *tuple)
> > +{
> > +    return ipv6_is_zero(&tuple->src) && ipv6_is_zero(&tuple->dst) &&
> > +           !tuple->src_port && !tuple->dst_port;
> > +}
> > +
> > +bool
> > +ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
> > +                               uint8_t ip_proto)
> > +{
> > +    /* First check if we have address. */
> > +    bool five_tuple = !ipv6_is_zero(&tuple->src) &&
> !ipv6_is_zero(&tuple->dst);
> > +
> > +    if (!(ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6)) {
> > +        five_tuple = five_tuple && tuple->src_port && tuple->dst_port;
> > +    }
> > +
> > +    return five_tuple;
> > +}
> > +
> > +bool
> > +ofputil_is_ct_match_zero(const struct ofputil_ct_match *match)
> > +{
> > +    return !match->ip_proto && !match->l3_type &&
> > +           ofputil_is_tuple_zero(&match->tuple_orig) &&
> > +           ofputil_is_tuple_zero(&match->tuple_reply);
> > +}
> > +
> > +void
> > +ofputil_ct_match_format(struct ds *ds, const struct ofputil_ct_match
> *match)
> > +{
> > +    ds_put_format(ds, " l3_type=%u,ip_proto=%u", match->l3_type,
> > +                  match->ip_proto);
> > +    ds_put_cstr(ds, ",orig=(");
> > +    ofputil_ct_tuple_format(ds, &match->tuple_orig, match->ip_proto);
> > +    ds_put_cstr(ds, "),reply=(");
> > +    ofputil_ct_tuple_format(ds, &match->tuple_reply, match->ip_proto);
> > +    ds_put_cstr(ds, ")");
>
> This syntax is very different from what is used as cmdline arguments,
> that is not good.  This means you cannot feed back what OVS prints out.
>
> Also, OVS usually uses 'nw_proto' instead of 'ip_proto', especially
> on the OpenFlow side of things.  And it might be better to print
> a string 'ip'/'ipv6' instead of 'l3_type', which is not great on its
> own.  See also comments on patch 2/2.
> They can also be combined into a single 'icmp'/'icmp6'/'udp'/'udp6'/etc.
> And there is already a function like this.
>
> > +}
> > +
> > +static bool
> > +ofputil_ct_tuple_ip_parse(struct in6_addr *addr, char *value, uint16_t
> l3_type)
> > +{
> > +    if (!ipv6_is_zero(addr)) {
> > +        return false;
> > +    }
> > +
> > +    if (l3_type == AF_INET) {
> > +        ovs_be32 ip = 0;
> > +
> > +        ip_parse(value, &ip);
> > +        *addr = in6_addr_mapped_ipv4(ip);
> > +    } else {
> > +        ipv6_parse(value, addr);
>
> Both ip_parse and ipv6_parse can fail.
> There is also lookup_any() function that can be exported and reused.
>
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
> > + * Returns true on success.  Otherwise, returns false and puts the error
> > + * message in 'ds'. */
> > +bool
> > +ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
> > +                       struct ds *ds)
> > +{
> > +    char *pos, *key, *value, *copy;
> > +
> > +    memset(match, 0, sizeof *match);
> > +    struct ofputil_ct_tuple *tuple = &match->tuple_orig;
> > +
> > +    pos = copy = xstrdup(s);
> > +    while (ofputil_parse_key_value(&pos, &key, &value)) {
> > +        if (!*value) {
> > +            ds_put_format(ds, "field %s missing value", key);
> > +            goto error_with_msg;
> > +        }
> > +
> > +        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")
> > +            || !strcmp(key, "ct_ipv6_src") || !strcmp(key,
> "ct_ipv6_dst")) {
> > +            match->l3_type = key[6] == '6' ? AF_INET6 : AF_INET;
>
> Original code made sure that user didn't specify keys from
> different families.  This version does not.
>

> > +            uint8_t index = key[6] == '6' ? 8 : 6;
> > +            struct in6_addr *addr = key[index] == 's'
> > +                ? &tuple->src : &tuple->dst;
> > +
> > +            if (!ofputil_ct_tuple_ip_parse(addr, value,
> match->l3_type)) {
> > +                ds_put_format(ds, "%s is set multiple times", key);
> > +                goto error;
> > +            }
> > +        } else if (!strcmp(key, "ct_nw_proto")) {
> > +            char *err = str_to_u8(value, key, &match->ip_proto);
> > +
> > +            if (err) {
> > +                free(err);
> > +                goto error_with_msg;
> > +            }
> > +        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key,
> "ct_tp_dst")) {
> > +            uint16_t port;
> > +            char *err = str_to_u16(value, key, &port);
> > +
> > +            if (err) {
> > +                free(err);
> > +                goto error_with_msg;
> > +            }
> > +            if (key[6] == 's') {
> > +                tuple->src_port = htons(port);
> > +            } else {
> > +                tuple->dst_port = htons(port);
> > +            }
> > +        } else if (!strcmp(key, "icmp_type") || !strcmp(key,
> "icmp_code") ||
> > +                   !strcmp(key, "icmp_id")) {
> > +            if (match->ip_proto != IPPROTO_ICMP &&
> > +                match->ip_proto != IPPROTO_ICMPV6) {
> > +                ds_put_cstr(ds, "invalid L4 fields");
> > +                goto error;
> > +            }
> > +            uint16_t icmp_id;
> > +            char *err;
> > +
> > +            if (key[5] == 't') {
> > +                err = str_to_u8(value, key, &tuple->icmp_type);
> > +            } else if (key[5] == 'c') {
> > +                err = str_to_u8(value, key, &tuple->icmp_code);
> > +            } else {
> > +                err = str_to_u16(value, key, &icmp_id);
> > +                tuple->icmp_id = htons(icmp_id);
> > +            }
> > +            if (err) {
> > +                free(err);
> > +                goto error_with_msg;
> > +            }
> > +        } else {
> > +            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
> > +            goto error;
> > +        }
> > +    }
> > +
> > +    if (!match->ip_proto && (tuple->src_port || tuple->dst_port)) {
> > +        ds_put_cstr(ds, "port is set without protocol");
> > +        goto error;
> > +    }
> > +
> > +    /* For the filtering to work with icmp we need to fill the reply
> direction
> > +     * with correct information. */
> > +    if (match->ip_proto == IPPROTO_ICMP) {
> > +        switch (match->tuple_orig.icmp_type) {
> > +        case ICMP4_ECHO_REQUEST:
> > +            match->tuple_reply.icmp_type = ICMP4_ECHO_REPLY;
> > +            break;
>
> An empty line after the 'break;'.
>
> > +        case ICMP4_ECHO_REPLY:
> > +            match->tuple_reply.icmp_type = ICMP4_ECHO_REQUEST;
> > +            break;
> > +        case ICMP4_TIMESTAMP:
> > +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMPREPLY;
> > +            break;
> > +        case ICMP4_TIMESTAMPREPLY:
> > +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMP;
> > +            break;
> > +        case ICMP4_INFOREQUEST:
> > +            match->tuple_reply.icmp_type = ICMP4_INFOREPLY;
> > +            break;
> > +        case ICMP4_INFOREPLY:
> > +            match->tuple_reply.icmp_type = ICMP4_INFOREQUEST;
> > +            break;
> > +        }
> > +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
> > +    } else if (match->ip_proto == IPPROTO_ICMPV6) {
> > +        switch (match->tuple_orig.icmp_type) {
> > +        case ICMP6_ECHO_REQUEST:
> > +            match->tuple_reply.icmp_type = ICMP6_ECHO_REPLY;
> > +            break;
> > +        case ICMP6_ECHO_REPLY:
> > +            match->tuple_reply.icmp_type = ICMP6_ECHO_REQUEST;
> > +            break;
> > +        }
> > +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
> > +    }
> > +
> > +    free(copy);
> > +    return true;
> > +
> > +error_with_msg:
> > +    ds_put_format(ds, "failed to parse field %s", key);
> > +error:
> > +    free(copy);
> > +    return false;
> > +}
> > diff --git a/lib/ofp-ct-util.h b/lib/ofp-ct-util.h
> > new file mode 100644
> > index 000000000..5a5eaf920
> > --- /dev/null
> > +++ b/lib/ofp-ct-util.h
> > @@ -0,0 +1,36 @@
> > +/* Copyright (c) 2022, Red Hat, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef OVS_OFP_CT_UTIL_H
> > +#define OVS_OFP_CT_UTIL_H
> > +
> > +#include "ct-dpif.h"
> > +#include "openvswitch/ofp-util.h"
> > +
> > +bool ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
> > +                                 const struct ct_dpif_entry *entry);
> > +
> > +bool ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple
> *tuple,
> > +                                    uint8_t ip_proto);
> > +
> > +void ofputil_ct_match_format(struct ds *ds,
> > +                             const struct ofputil_ct_match *match);
> > +
> > +bool ofputil_ct_match_parse(struct ofputil_ct_match *match, const char
> *s,
> > +                            struct ds *ds);
> > +
> > +bool ofputil_is_ct_match_zero(const struct ofputil_ct_match *match);
> > +
> > +#endif /* lib/ofp-ct-util.h */
> > diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> > index e5403519f..51903a658 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -2230,7 +2230,7 @@
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
> > -AT_SETUP([conntrack - ct flush by 5-tuple])
> > +AT_SETUP([conntrack - ct flush])
> >  CHECK_CONNTRACK()
> >  OVS_TRAFFIC_VSWITCHD_START()
> >
> > @@ -2291,6 +2291,86 @@ AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5
> $ICMP_TUPLE])
> >  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [1], [dnl
> >  ])
> >
> > +dnl Test UDP from port 1 and 2, partial flush by src port
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort],
> [0], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_src=1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0],
> [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_src=2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> > +
> > +dnl Test UDP from port 1 and 2, partial flush by dst port
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort],
> [0], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_dst=2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0],
> [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_dst=1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> > +
> > +dnl Test UDP from port 1 and 2, partial flush by src address
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort],
> [0], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0],
> [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> > +
> > +dnl Test UDP from port 1 and 2, partial flush by dst address
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort],
> [0], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0],
> [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
> > +
> >  OVS_TRAFFIC_VSWITCHD_STOP
> >  AT_CLEANUP
> >
>
>
Ilya Maximets Jan. 11, 2023, 12:34 p.m. UTC | #3
On 1/11/23 07:53, Ales Musil wrote:
> 
> 
> On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> 
>     On 12/16/22 13:30, Ales Musil wrote:
>     > Currently, the CT can be flushed by dpctl only be specifying
> 
>     *by
> 
>     > the whole 5-tuple. This is not very convenient when there are
>     > only some fields known to the user of CT flush. Add new struct
>     > ofputil_ct_match which represents the generic filtering that can
>     > be done for CT flush. The match is done only on fields that are
>     > non-zero with exception to the icmp fields.
>     >
>     > This allows the filtering just within dpctl, however
>     > it is a preparation for OpenFlow extension.
>     >
>     > Reported-at: https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546>
>     > Signed-off-by: Ales Musil <amusil@redhat.com <mailto:amusil@redhat.com>>
>     > Acked-by: Paolo Valerio <pvalerio@redhat.com <mailto:pvalerio@redhat.com>>
>     > ---
>     > v5: Add ack from Paolo.
>     > v4: Fix a flush all scenario.
>     > v3: Rebase on top of master.
>     >     Address the C99 comment and missing dpif_close call.
>     > v2: Rebase on top of master.
>     >     Address comments from Paolo.
>     > ---
> 
>     Hi, Ales.  Thanks for working on this change.
>     Sorry for review taking so long.  I initially though there
>     will be mostly a few cosmetic/style changes needed.  But
>     longer I look at this code, more questions I have.  See below.
> 
>     Dumitru, maybe you can also help answer some of them?
> 
>     Best regards, Ilya Maximets.
> 
> 
> Hi Ilya,
> 
> thank you for the review. Before proceeding I tried to answer your concerns/questions.
> Let me know what you think.
> 
> Thanks,
> Ales
>  
> 
> 
>     >  NEWS                           |   2 +
>     >  include/openvswitch/ofp-util.h |  28 +++
>     >  lib/automake.mk <http://automake.mk>                |   2 +
>     >  lib/ct-dpif.c                  | 208 +++++++++------------
>     >  lib/ct-dpif.h                  |   4 +-
>     >  lib/dpctl.c                    |  42 ++---
>     >  lib/dpctl.man                  |   3 +-
>     >  lib/ofp-ct-util.c              | 326 +++++++++++++++++++++++++++++++++
>     >  lib/ofp-ct-util.h              |  36 ++++
>     >  tests/system-traffic.at <http://system-traffic.at>        |  82 ++++++++-
>     >  10 files changed, 586 insertions(+), 147 deletions(-)
>     >  create mode 100644 lib/ofp-ct-util.c
>     >  create mode 100644 lib/ofp-ct-util.h
>     >
>     > diff --git a/NEWS b/NEWS
>     > index 265375e1c..ff8904b02 100644
>     > --- a/NEWS
>     > +++ b/NEWS
>     > @@ -14,6 +14,8 @@ Post-v3.0.0
>     >       10 Gbps link speed by default in case the actual link speed cannot be
>     >       determined.  Previously it was 10 Mbps.  Values can still be overridden
>     >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
>     > +   - ovs-dpctl and related ovs-appctl commands:
> 
>     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
> 
>     > +     * "flush-conntrack" is capable of handling partial 5-tuple.
>     > 
>     > 
>     >  v3.0.0 - 15 Aug 2022
>     > diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
>     > index 091a09cad..84937ae26 100644
>     > --- a/include/openvswitch/ofp-util.h
>     > +++ b/include/openvswitch/ofp-util.h
>     > @@ -19,6 +19,9 @@
>     > 
>     >  #include <stdbool.h>
>     >  #include <stdint.h>
>     > +#include <sys/types.h>
>     > +#include <netinet/in.h>
>     > +
>     >  #include "openvswitch/ofp-protocol.h"
>     > 
>     >  struct ofp_header;
>     > @@ -27,6 +30,31 @@ struct ofp_header;
>     >  extern "C" {
>     >  #endif
>     > 
>     > +struct ofputil_ct_tuple {
>     > +    struct in6_addr src;
>     > +    struct in6_addr dst;
>     > +
>     > +    union {
>     > +        ovs_be16 src_port;
>     > +        ovs_be16 icmp_id;
>     > +    };
>     > +    union {
>     > +        ovs_be16 dst_port;
>     > +        struct {
>     > +            uint8_t icmp_code;
>     > +            uint8_t icmp_type;
>     > +        };
>     > +    };
>     > +};
>     > +
>     > +struct ofputil_ct_match {
>     > +    uint8_t ip_proto;
>     > +    uint16_t l3_type;
>     > +
>     > +    struct ofputil_ct_tuple tuple_orig;
>     > +    struct ofputil_ct_tuple tuple_reply;
>     > +};
> 
>     These structures are very similar to existing ct_dpif_tuple.
>     We can keep them for now, but I think, we'll need to replace
>     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
>     We may need some changes in ofputil_ct_tuple for that though.
> 
> 
> I was thinking about replacing them, however it would require a much
> bigger change, which was out of scope of this RFE. Also I was not entirely
> sure if there wasn't any specific reason for the ct_dpif_tuple specifically
> the address part. It seems like optimization to me.
> 
> Anyway it should be possible to unify those if needed.
>  
> 
> 
> 
>     OTOH, I'm not sure I understand why we need both tuples here.
>     Can we have a tuple and a flag that indicates a direction
>     instead?
> 
> 
> That is also a possibility. However that wouldn't fit too that nicely in the
> whole match struct, we could probably duplicate the proto field and get rid of the
> l3_type. That way we could supply just two tuples instead, getting rid of the match completely.

The point was to have only one tuple.  See below.

>  
> 
> 
>     There are several issues with having both original and reply
>     tuples, IIUC:
> 
>     1. None of the command line interfaces support specifying
>        reply tuple.
> 
> 
> That was the intention of the original command AFAICT, we could modify the commands
> to support that. However I'm not sure if there is any benefit in that.

ovs-ofctl supposed to mimic the OpenFlow request, so it should
support everything that OpenFlow request can have.

>  
> 
> 
>     2. If the reply tuple will be specified via OpenFlow, it
>        will be ignored if the original direction tuple is
>        fully specified.
> 
> 
> That is ok IMO, because a fully specified original should be enough to determine the one
> entry that needs to be erased. So the reply direction in that would be an extra.

That supports the point that we don't need two tuples in the request.

>  
> 
> 
>     3. Netfilter kernel interface allows specifying only one
>        tuple, not both, AFAICT.
> 
> 
> Right, that's the reason why, when we match, we use the original tuple to erase that
> entry.
>  
> 
> 
>     I'm not a conntrack expert, so I have a question: Is it possible
>     that addresses and ports in original and reply directions are
>     not mirrored copies?  Or do we care about these cases?
> 
> 
> Not sure if it's possible, but even if it would be, the current implementation
> would account for that, with the exception of specifying fully the original
> direction.
>  
> 
> 
>     If we want to flush all entries related to the same IP:PORT,
>     we will not be able to do that in a single OpenFlow request,
>     because we'll need to supply the same IP and PORT as src
>     and dst, or as src in both original and reply directions.
>     None of these can fit into a single request.
> 
> 
> Is that really a problem? You could do that in two, one for the original
> one for reply. This is also very similar to the interface supported by conntrack-tools,
> with those you couldn't remove all entries in a single call as well. We could modify
> the interface that would allow matching on tuple without direction specification,
> that would translate to both directions, however there is still the question if that's needed.


It's not a problem, but another supporting point for the interface
being redundant.  We don't seem to need two tuples in the request,
because users will have to send two different requests anyway.

So, the request should have:

1. zone
2. nw_proto
3. tuple
4. tuple type ('original' or 'reply'), in case we want to allow
   users to flush based on reply tuple.  But this seems a bit
   redundant as well.

Something like:

struct ofputil_ct_tuple {
}

struct ofputil_ct_match {
    uint8_t ip_proto;
    uint16_t l3_type;
    // enum DIRECTION direction;
    struct ofputil_ct_tuple tuple;
};


And the OpenFlow part:

/* CT flush available TLVs. */
enum nx_ct_flush_tlv_type {
    NXT_CT_ZONE_ID,           /* u8: CT zone id. */
    NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. */
};

enum nx_ct_flush_tuple_type {
    NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4 address. */
    NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped IPv4 address. */
    NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
    NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
    NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
    NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
    NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
    // NXT_CT_TUPLE_DIRECTION,   /* enum nx_ct_flush_tuple_direction: CT tuple direction. */
};

//enum nx_ct_flush_tuple_direction {
//    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
//    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
//}

struct nx_ct_flush {
    uint8_t nw_proto;          /* IP protocol. */
    uint8_t pad[7];            /* Align to 64-bit (must be zero). */
    /* Followed by ... */
};


Such interface will also allow us in to have both direction specified
if we would want that in the future.  For now we should check that
each TLV type is specified once, i.e. only one tuple is in the request.

We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add it
in later releases if necessary.  Tuples without that TLV can be treated
as original direction.

>  
> 
> 
>     At this point is there actually any value in flushing basing
>     on reply direction?  Shouldn't we be able to flush the same
>     entries by flushing by dst in original direction, if we will
>     need to issue multiple requests anyway?
> 
> 
> That assumption would be fine when you can specify a full tuple for the original direction.
> The problem is that for example OVN we know only the original DST_IP:DST_PORT (VIP)
> and the reply SRC_IP:SRC_PORT (backend).

Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?

> That means we cannot fully specify the original direction
> without doing the filtering ourselves. It would be possible with extra steps from the user. If this interface
> is too generic/complicated for that purpose we could say that the flush extension supports only full orig tuple
> and leave the filtering up to the user. That would definitely work, it wouldn't be that elegant for the user however
> from the code change perspective much simpler.

I'm not sure I understand that part.  I'm not arguing aginst partial
matches.  Or do you mean something else?

> 
>  I can see a certain benefit of keeping it in OvS:
> 
> 1) The filtering complexity is hidden from the user.
> 2) The data transfer between OvS and the user is much smaller (even if you need to send two matches).
> 3) Once implemented it's just ready to be used without the need of additional thoughts going into it.
> 
> 
>  
> 
> 
>     Did you start working on the OVN side of things?  Which
>     commands OVN will actually use?
> 
> 
> Yes, I did. The plan is that OVN will use the extension.

I understand that.  The question was how the actual request will
look like.

>  
> 
> 
>     > +
>     >  bool ofputil_decode_hello(const struct ofp_header *,
>     >                            uint32_t *allowed_versions);
>     >  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
>     > diff --git a/lib/automake.mk <http://automake.mk> b/lib/automake.mk <http://automake.mk>
>     > index a0fabe38f..37135f118 100644
>     > --- a/lib/automake.mk <http://automake.mk>
>     > +++ b/lib/automake.mk <http://automake.mk>
>     > @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
>     >       lib/ofp-actions.c \
>     >       lib/ofp-bundle.c \
>     >       lib/ofp-connection.c \
>     > +     lib/ofp-ct-util.c \
>     > +     lib/ofp-ct-util.h \
>     >       lib/ofp-ed-props.c \
>     >       lib/ofp-errors.c \
>     >       lib/ofp-flow.c \
>     > diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
>     > index 6f17a26b5..9b324ff79 100644
>     > --- a/lib/ct-dpif.c
>     > +++ b/lib/ct-dpif.c
>     > @@ -20,6 +20,7 @@
>     >  #include <errno.h>
>     > 
>     >  #include "ct-dpif.h"
>     > +#include "ofp-ct-util.h"
>     >  #include "openvswitch/ofp-parse.h"
>     >  #include "openvswitch/vlog.h"
>     > 
>     > @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct ct_dpif_dump_state **dump,
>     >      return err;
>     >  }
>     > 
>     > +static void
>     > +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple *ofp_tuple,
>     > +                                    struct ct_dpif_tuple *tuple,
>     > +                                    uint16_t l3_type, uint8_t ip_proto)
>     > +{
>     > +    if (l3_type == AF_INET) {
>     > +        tuple->src.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->src);
>     > +        tuple->dst.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->dst);
>     > +    } else {
>     > +        tuple->src.in6 = ofp_tuple->src;
>     > +        tuple->dst.in6 = ofp_tuple->dst;
>     > +    }
>     > +
>     > +    tuple->l3_type = l3_type;
>     > +    tuple->ip_proto = ip_proto;
>     > +    tuple->src_port = ofp_tuple->src_port;
>     > +
>     > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
>     > +        tuple->icmp_code = ofp_tuple->icmp_code;
>     > +        tuple->icmp_type = ofp_tuple->icmp_type;
>     > +    } else {
>     > +        tuple->dst_port = ofp_tuple->dst_port;
>     > +    }
>     > +}
>     > +
>     >  /* Dump one connection from a tracker, and put it in 'entry'.
>     >   *
>     >   * 'dump' should have been initialized by ct_dpif_dump_start().
>     > @@ -100,25 +126,77 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
>     >              ? dpif->dpif_class->ct_dump_done(dpif, dump)
>     >              : EOPNOTSUPP);
>     >  }
>     > -
>     > +
>     > +static int
>     > +ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
>     > +                    const struct ofputil_ct_match *match) {
>     > +    struct ct_dpif_dump_state *dump;
>     > +    struct ct_dpif_entry cte;
>     > +    int error;
>     > +    int tot_bkts;
>     > +
>     > +    if (!dpif->dpif_class->ct_flush) {
>     > +        return EOPNOTSUPP;
>     > +    }
>     > +
>     > +    if (VLOG_IS_DBG_ENABLED()) {
>     > +        struct ds ds = DS_EMPTY_INITIALIZER;
>     > +        ofputil_ct_match_format(&ds, match);
>     > +        VLOG_DBG("%s: ct_flush:%s in zone %d", dpif_name(dpif), ds_cstr(&ds),
>     > +                 zone ? *zone : 0);
>     > +        ds_destroy(&ds);
>     > +    }
>     > +
>     > +    /* If we have full five tuple in orig just do the flush over that
>     > +     * tuple directly. */
>     > +    if (ofputil_ct_tuple_is_five_tuple(&match->tuple_orig, match->ip_proto)) {
>     > +        struct ct_dpif_tuple tuple;
>     > +        ct_dpif_tuple_from_ofputil_ct_tuple(&match->tuple_orig, &tuple,
>     > +                                            match->l3_type, match->ip_proto);
>     > +        return dpif->dpif_class->ct_flush(dpif, zone, &tuple);
> 
>     Here we just fully ignored the reply tuple, even if it was
>     actually specified.  What if it didn't match whatever we're
>     oing to remove?  It is user-specified, so can be anything.
> 
> 
> This is a shortcut for a fully specified tuple. If user fully specifies original
> tuple there shouldn't be any need to actually match on reply as original should be
> unique enough. It's up to the user to provide the right tuple, also it wouldn't help if we tried
> to match later on because we would fail in the original direction nevertheless.

The point was that if the user supplied full orig+reply, we
should not delete entries that do not fully match on both.
It is unlikley, but user may specify reply information that
doesn't match orig tuple.  In this case it would be strange
that OVS removed entries that do not match the request.

>  
> 
> 
>     > +    }
>     > +
>     > +    error = ct_dpif_dump_start(dpif, &dump, zone, &tot_bkts);
>     > +    if (error) {
>     > +        return error;
>     > +    }
>     > +
>     > +    while (!(error = ct_dpif_dump_next(dump, &cte))) {
>     > +        if (zone && *zone != cte.zone) {
>     > +            continue;
>     > +        }
>     > +
>     > +        if (ofputil_ct_match_cmp(match, &cte)) {
>     > +            error = dpif->dpif_class->ct_flush(dpif, &cte.zone,
>     > +                                               &cte.tuple_orig);
> 
>     We assume here that original direction tuple uniquely identifies
>     the conntrack entry.
> 
> 
> Which should be the case. Note not only the original direction but also the zone.
> 
>  
> 
> 
>     > +            if (error) {
>     > +                break;
>     > +            }
>     > +        }
>     > +    }
>     > +    if (error == EOF) {
>     > +        error = 0;
>     > +    }
>     > +
>     > +    ct_dpif_dump_done(dump);
>     > +    return error;
>     > +}
>     > +
>     >  /* Flush the entries in the connection tracker used by 'dpif'.  The
>     >   * arguments have the following behavior:
>     >   *
>     > - *   - If both 'zone' and 'tuple' are NULL, flush all the conntrack entries.
>     > - *   - If 'zone' is not NULL, and 'tuple' is NULL, flush all the conntrack
>     > + *   - If both 'zone' and 'match' are NULL, or match is zero,
> 
>     It's not clear that 'or' applies to the 'match' part only.
> 
>     > + *     flush all the conntrack entries.
>     > + *   - If 'zone' is not NULL, and 'match' is NULL, flush all the conntrack
>     >   *     entries in '*zone'.
>     > - *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'tuple'
>     > + *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'match'
> 
>     s/tuple/match/
> 
>     >   *     in '*zone'. If 'zone' is NULL, use the default zone (zone 0). */
>     >  int
>     >  ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
>     > -              const struct ct_dpif_tuple *tuple)
>     > +              const struct ofputil_ct_match *match)
>     >  {
>     > -    if (tuple) {
>     > -        struct ds ds = DS_EMPTY_INITIALIZER;
>     > -        ct_dpif_format_tuple(&ds, tuple);
>     > -        VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif), ds_cstr(&ds),
>     > -                                                zone ? *zone : 0);
>     > -        ds_destroy(&ds);
>     > +    if (match && !ofputil_is_ct_match_zero(match)) {
>     > +        return ct_dpif_flush_tuple(dpif, zone, match);
>     >      } else if (zone) {
>     >          VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif), *zone);
>     >      } else {
>     > @@ -126,7 +204,7 @@ ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
>     >      }
>     > 
>     >      return (dpif->dpif_class->ct_flush
>     > -            ? dpif->dpif_class->ct_flush(dpif, zone, tuple)
>     > +            ? dpif->dpif_class->ct_flush(dpif, zone, NULL)
>     >              : EOPNOTSUPP);
>     >  }
>     > 
>     > @@ -583,112 +661,6 @@ ct_dpif_format_tcp_stat(struct ds * ds, int tcp_state, int conn_per_state)
>     >      ds_put_format(ds, "=%u", conn_per_state);
>     >  }
>     > 
>     > -/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
>     > - * Returns true on success.  Otherwise, returns false and puts the error
>     > - * message in 'ds'. */
>     > -bool
>     > -ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s, struct ds *ds)
>     > -{
>     > -    char *pos, *key, *value, *copy;
>     > -    memset(tuple, 0, sizeof *tuple);
>     > -
>     > -    pos = copy = xstrdup(s);
>     > -    while (ofputil_parse_key_value(&pos, &key, &value)) {
>     > -        if (!*value) {
>     > -            ds_put_format(ds, "field %s missing value", key);
>     > -            goto error;
>     > -        }
>     > -
>     > -        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")) {
>     > -            if (tuple->l3_type && tuple->l3_type != AF_INET) {
>     > -                ds_put_cstr(ds, "L3 type set multiple times");
>     > -                goto error;
>     > -            } else {
>     > -                tuple->l3_type = AF_INET;
>     > -            }
>     > -            if (!ip_parse(value, key[6] == 's' ? &tuple->src.ip :
>     > -                                                 &tuple->dst.ip)) {
>     > -                goto error_with_msg;
>     > -            }
>     > -        } else if (!strcmp(key, "ct_ipv6_src") ||
>     > -                   !strcmp(key, "ct_ipv6_dst")) {
>     > -            if (tuple->l3_type && tuple->l3_type != AF_INET6) {
>     > -                ds_put_cstr(ds, "L3 type set multiple times");
>     > -                goto error;
>     > -            } else {
>     > -                tuple->l3_type = AF_INET6;
>     > -            }
>     > -            if (!ipv6_parse(value, key[8] == 's' ? &tuple->src.in6 :
>     > -                                                   &tuple->dst.in6)) {
>     > -                goto error_with_msg;
>     > -            }
>     > -        } else if (!strcmp(key, "ct_nw_proto")) {
>     > -            char *err = str_to_u8(value, key, &tuple->ip_proto);
>     > -            if (err) {
>     > -                free(err);
>     > -                goto error_with_msg;
>     > -            }
>     > -        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key,"ct_tp_dst")) {
>     > -            uint16_t port;
>     > -            char *err = str_to_u16(value, key, &port);
>     > -            if (err) {
>     > -                free(err);
>     > -                goto error_with_msg;
>     > -            }
>     > -            if (key[6] == 's') {
>     > -                tuple->src_port = htons(port);
>     > -            } else {
>     > -                tuple->dst_port = htons(port);
>     > -            }
>     > -        } else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
>     > -                   !strcmp(key, "icmp_id") ) {
>     > -            if (tuple->ip_proto != IPPROTO_ICMP &&
>     > -                tuple->ip_proto != IPPROTO_ICMPV6) {
>     > -                ds_put_cstr(ds, "invalid L4 fields");
>     > -                goto error;
>     > -            }
>     > -            uint16_t icmp_id;
>     > -            char *err;
>     > -            if (key[5] == 't') {
>     > -                err = str_to_u8(value, key, &tuple->icmp_type);
>     > -            } else if (key[5] == 'c') {
>     > -                err = str_to_u8(value, key, &tuple->icmp_code);
>     > -            } else {
>     > -                err = str_to_u16(value, key, &icmp_id);
>     > -                tuple->icmp_id = htons(icmp_id);
>     > -            }
>     > -            if (err) {
>     > -                free(err);
>     > -                goto error_with_msg;
>     > -            }
>     > -        } else {
>     > -            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
>     > -            goto error;
>     > -        }
>     > -    }
>     > -
>     > -    if (ipv6_is_zero(&tuple->src.in6) || ipv6_is_zero(&tuple->dst.in6) ||
>     > -        !tuple->ip_proto) {
>     > -        /* icmp_type, icmp_code, and icmp_id can be 0. */
>     > -        if (tuple->ip_proto != IPPROTO_ICMP &&
>     > -            tuple->ip_proto != IPPROTO_ICMPV6) {
>     > -            if (!tuple->src_port || !tuple->dst_port) {
>     > -                ds_put_cstr(ds, "at least one of the conntrack 5-tuple fields "
>     > -                                "is missing.");
>     > -                goto error;
>     > -            }
>     > -        }
>     > -    }
>     > -
>     > -    free(copy);
>     > -    return true;
>     > -
>     > -error_with_msg:
>     > -    ds_put_format(ds, "failed to parse field %s", key);
>     > -error:
>     > -    free(copy);
>     > -    return false;
>     > -}
>     > 
>     >  void
>     >  ct_dpif_push_zone_limit(struct ovs_list *zone_limits, uint16_t zone,
>     > diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
>     > index 2848549b0..337c99dd4 100644
>     > --- a/lib/ct-dpif.h
>     > +++ b/lib/ct-dpif.h
>     > @@ -17,6 +17,7 @@
>     >  #ifndef CT_DPIF_H
>     >  #define CT_DPIF_H
>     > 
>     > +#include "openvswitch/ofp-util.h"
>     >  #include "openvswitch/types.h"
>     >  #include "packets.h"
>     > 
>     > @@ -285,7 +286,7 @@ int ct_dpif_dump_start(struct dpif *, struct ct_dpif_dump_state **,
>     >  int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *);
>     >  int ct_dpif_dump_done(struct ct_dpif_dump_state *);
>     >  int ct_dpif_flush(struct dpif *, const uint16_t *zone,
>     > -                  const struct ct_dpif_tuple *);
>     > +                  const struct ofputil_ct_match *);
>     >  int ct_dpif_set_maxconns(struct dpif *dpif, uint32_t maxconns);
>     >  int ct_dpif_get_maxconns(struct dpif *dpif, uint32_t *maxconns);
>     >  int ct_dpif_get_nconns(struct dpif *dpif, uint32_t *nconns);
>     > @@ -311,7 +312,6 @@ void ct_dpif_format_ipproto(struct ds *ds, uint16_t ipproto);
>     >  void ct_dpif_format_tuple(struct ds *, const struct ct_dpif_tuple *);
>     >  uint8_t ct_dpif_coalesce_tcp_state(uint8_t state);
>     >  void ct_dpif_format_tcp_stat(struct ds *, int, int);
>     > -bool ct_dpif_parse_tuple(struct ct_dpif_tuple *, const char *s, struct ds *);
>     >  void ct_dpif_push_zone_limit(struct ovs_list *, uint16_t zone, uint32_t limit,
>     >                               uint32_t count);
>     >  void ct_dpif_free_zone_limits(struct ovs_list *);
>     > diff --git a/lib/dpctl.c b/lib/dpctl.c
>     > index 29041fa3e..e6e350aba 100644
>     > --- a/lib/dpctl.c
>     > +++ b/lib/dpctl.c
>     > @@ -40,6 +40,7 @@
>     >  #include "netdev.h"
>     >  #include "netlink.h"
>     >  #include "odp-util.h"
>     > +#include "ofp-ct-util.h"
>     >  #include "openvswitch/ofpbuf.h"
>     >  #include "packets.h"
>     >  #include "openvswitch/shash.h"
>     > @@ -1707,47 +1708,38 @@ dpctl_flush_conntrack(int argc, const char *argv[],
>     >                        struct dpctl_params *dpctl_p)
>     >  {
>     >      struct dpif *dpif = NULL;
>     > -    struct ct_dpif_tuple tuple, *ptuple = NULL;
>     > -    struct ds ds = DS_EMPTY_INITIALIZER;
>     > -    uint16_t zone, *pzone = NULL;
>     > -    int error;
>     > +    struct ofputil_ct_match match = {0};
>     > +    uint16_t zone;
>     > +    bool with_zone = false;
>     >      int args = argc - 1;
>     > 
>     >      /* Parse ct tuple */
>     > -    if (args && ct_dpif_parse_tuple(&tuple, argv[args], &ds)) {
>     > -        ptuple = &tuple;
>     > +    if (args) {
>     > +        struct ds ds = DS_EMPTY_INITIALIZER;
>     > +        if (!ofputil_ct_match_parse(&match, argv[args], &ds)) {
>     > +            dpctl_error(dpctl_p, EINVAL, "%s", ds_cstr(&ds));
>     > +            ds_destroy(&ds);
>     > +            return EINVAL;
>     > +        }
>     >          args--;
>     >      }
>     > 
>     >      /* Parse zone */
>     >      if (args && ovs_scan(argv[args], "zone=%"SCNu16, &zone)) {
>     > -        pzone = &zone;
>     > -        args--;
>     > -    }
>     > -
>     > -    /* Report error if there are more than one unparsed argument. */
>     > -    if (args > 1) {
>     > -        ds_put_cstr(&ds, "invalid arguments");
>     > -        error = EINVAL;
>     > -        goto error;
>     > +        with_zone = true;
>     >      }
>     > 
>     > -    error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
>     > +    int error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
>     >      if (error) {
>     > +        dpctl_error(dpctl_p, error, "Cannot open dpif");
>     >          return error;
>     >      }
>     > 
>     > -    error = ct_dpif_flush(dpif, pzone, ptuple);
>     > -    if (!error) {
>     > -        dpif_close(dpif);
>     > -        return 0;
>     > -    } else {
>     > -        ds_put_cstr(&ds, "failed to flush conntrack");
>     > +    error = ct_dpif_flush(dpif, with_zone ? &zone : NULL, &match);
>     > +    if (error) {
>     > +        dpctl_error(dpctl_p, error, "Failed to flush conntrack");
>     >      }
>     > 
>     > -error:
>     > -    dpctl_error(dpctl_p, error, "%s", ds_cstr(&ds));
>     > -    ds_destroy(&ds);
>     >      dpif_close(dpif);
>     >      return error;
>     >  }
>     > diff --git a/lib/dpctl.man b/lib/dpctl.man
>     > index 87ea8087b..b0cabe05d 100644
>     > --- a/lib/dpctl.man
>     > +++ b/lib/dpctl.man
>     > @@ -312,7 +312,8 @@ If \fBzone\fR=\fIzone\fR is specified, only flushes the connections in
>     >  If \fIct-tuple\fR is provided, flushes the connection entry specified by
>     >  \fIct-tuple\fR in \fIzone\fR. The zone defaults to 0 if it is not provided.
>     >  The userspace connection tracker requires flushing with the original pre-NATed
>     > -tuple and a warning log will be otherwise generated.
>     > +tuple and a warning log will be otherwise generated. The tuple can be partial
>     > +and will remove all connections that are matching on the specified fields.
>     >  An example of an IPv4 ICMP \fIct-tuple\fR:
>     >  .IP
>     >  "ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=1,icmp_type=8,icmp_code=0,icmp_id=10"
>     > diff --git a/lib/ofp-ct-util.c b/lib/ofp-ct-util.c
>     > new file mode 100644
>     > index 000000000..a10607590
>     > --- /dev/null
>     > +++ b/lib/ofp-ct-util.c
>     > @@ -0,0 +1,326 @@
>     > +
>     > +/* Copyright (c) 2022, Red Hat, Inc.
>     > + *
>     > + * Licensed under the Apache License, Version 2.0 (the "License");
>     > + * you may not use this file except in compliance with the License.
>     > + * You may obtain a copy of the License at:
>     > + *
>     > + *     http://www.apache.org/licenses/LICENSE-2.0 <http://www.apache.org/licenses/LICENSE-2.0>
>     > + *
>     > + * Unless required by applicable law or agreed to in writing, software
>     > + * distributed under the License is distributed on an "AS IS" BASIS,
>     > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>     > + * See the License for the specific language governing permissions and
>     > + * limitations under the License.
>     > + */
>     > +
>     > +#include <config.h>
>     > +#include <stdbool.h>
>     > +#include <stdint.h>
>     > +#include <sys/types.h>
>     > +#include <netinet/in.h>
>     > +#include <netinet/icmp6.h>
>     > +
>     > +#include "ct-dpif.h"
>     > +#include "ofp-ct-util.h"
>     > +#include "openvswitch/dynamic-string.h"
>     > +#include "openvswitch/ofp-parse.h"
>     > +#include "openvswitch/ofp-util.h"
>     > +#include "openvswitch/packets.h"
>     > +
>     > +static inline bool
>     > +ofputil_ct_inet_addr_cmp_partial(const struct in6_addr *partial,
>     > +                                 const union ct_dpif_inet_addr *addr,
>     > +                                 const uint16_t l3_type)
>     > +{
>     > +    if (ipv6_is_zero(partial)) {
>     > +        return true;
>     > +    }
>     > +
>     > +    if (l3_type == AF_INET && in6_addr_get_mapped_ipv4(partial) != addr->ip) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (l3_type == AF_INET6 && !ipv6_addr_equals(partial, &addr->in6)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    return true;
>     > +}
>     > +
>     > +static inline bool
>     > +ofputil_ct_tuple_ip_cmp_partial(const struct ofputil_ct_tuple *partial,
>     > +                                const struct ct_dpif_tuple *tuple,
>     > +                                const uint16_t l3_type, const uint8_t ip_proto)
>     > +{
>     > +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->src,
>     > +                                          &tuple->src, l3_type)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->dst,
>     > +                                          &tuple->dst, l3_type)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
>     > +        if (partial->icmp_id != tuple->icmp_id) {
>     > +            return false;
>     > +        }
>     > +
>     > +        if (partial->icmp_type != tuple->icmp_type) {
>     > +            return false;
>     > +        }
>     > +
>     > +        if (partial->icmp_code != tuple->icmp_code) {
>     > +            return false;
>     > +        }
>     > +    } else {
>     > +        if (partial->src_port && partial->src_port != tuple->src_port) {
>     > +            return false;
>     > +        }
>     > +
>     > +        if (partial->dst_port && partial->dst_port != tuple->dst_port) {
>     > +            return false;
>     > +        }
>     > +    }
>     > +
>     > +    return true;
>     > +}
>     > +
>     > +/* Compares the non-zero members if they match. This is useful for clearing
>     > + * up all connections specified by a partial tuples for orig/reply. */
>     > +bool
>     > +ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
>     > +                     const struct ct_dpif_entry *entry)
>     > +{
>     > +    if (match->l3_type && match->l3_type != entry->tuple_orig.l3_type) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (match->ip_proto && match->ip_proto != entry->tuple_orig.ip_proto) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_orig,
>     > +                                         &entry->tuple_orig,
>     > +                                         match->l3_type, match->ip_proto)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_reply,
>     > +                                         &entry->tuple_reply,
>     > +                                         match->l3_type, match->ip_proto)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    return true;
>     > +}
>     > +
>     > +static void
>     > +ofputil_ct_tuple_format(struct ds *ds, const struct ofputil_ct_tuple *tuple,
>     > +                        uint8_t ip_proto)
>     > +{
>     > +    ds_put_cstr(ds, "src=");
>     > +    ipv6_format_mapped(&tuple->src, ds);
>     > +    ds_put_cstr(ds, ",dst=");
>     > +    ipv6_format_mapped(&tuple->dst, ds);
>     > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
>     > +        ds_put_format(ds, ",icmp_id=%u,icmp_type=%u,icmp_code=%u",
>     > +                      ntohs(tuple->icmp_id), tuple->icmp_type,
>     > +                      tuple->icmp_code);
>     > +
>     > +    } else {
>     > +        ds_put_format(ds, ",src_port=%u,dst_port=%u", ntohs(tuple->src_port),
>     > +                      ntohs(tuple->dst_port));
>     > +    }
>     > +}
>     > +
>     > +static bool
>     > +ofputil_is_tuple_zero(const struct ofputil_ct_tuple *tuple)
>     > +{
>     > +    return ipv6_is_zero(&tuple->src) && ipv6_is_zero(&tuple->dst) &&
>     > +           !tuple->src_port && !tuple->dst_port;
>     > +}
>     > +
>     > +bool
>     > +ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
>     > +                               uint8_t ip_proto)
>     > +{
>     > +    /* First check if we have address. */
>     > +    bool five_tuple = !ipv6_is_zero(&tuple->src) && !ipv6_is_zero(&tuple->dst);
>     > +
>     > +    if (!(ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6)) {
>     > +        five_tuple = five_tuple && tuple->src_port && tuple->dst_port;
>     > +    }
>     > +
>     > +    return five_tuple;
>     > +}
>     > +
>     > +bool
>     > +ofputil_is_ct_match_zero(const struct ofputil_ct_match *match)
>     > +{
>     > +    return !match->ip_proto && !match->l3_type &&
>     > +           ofputil_is_tuple_zero(&match->tuple_orig) &&
>     > +           ofputil_is_tuple_zero(&match->tuple_reply);
>     > +}
>     > +
>     > +void
>     > +ofputil_ct_match_format(struct ds *ds, const struct ofputil_ct_match *match)
>     > +{
>     > +    ds_put_format(ds, " l3_type=%u,ip_proto=%u", match->l3_type,
>     > +                  match->ip_proto);
>     > +    ds_put_cstr(ds, ",orig=(");
>     > +    ofputil_ct_tuple_format(ds, &match->tuple_orig, match->ip_proto);
>     > +    ds_put_cstr(ds, "),reply=(");
>     > +    ofputil_ct_tuple_format(ds, &match->tuple_reply, match->ip_proto);
>     > +    ds_put_cstr(ds, ")");
> 
>     This syntax is very different from what is used as cmdline arguments,
>     that is not good.  This means you cannot feed back what OVS prints out.
> 
>     Also, OVS usually uses 'nw_proto' instead of 'ip_proto', especially
>     on the OpenFlow side of things.  And it might be better to print
>     a string 'ip'/'ipv6' instead of 'l3_type', which is not great on its
>     own.  See also comments on patch 2/2.
>     They can also be combined into a single 'icmp'/'icmp6'/'udp'/'udp6'/etc.
>     And there is already a function like this.
> 
>     > +}
>     > +
>     > +static bool
>     > +ofputil_ct_tuple_ip_parse(struct in6_addr *addr, char *value, uint16_t l3_type)
>     > +{
>     > +    if (!ipv6_is_zero(addr)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    if (l3_type == AF_INET) {
>     > +        ovs_be32 ip = 0;
>     > +
>     > +        ip_parse(value, &ip);
>     > +        *addr = in6_addr_mapped_ipv4(ip);
>     > +    } else {
>     > +        ipv6_parse(value, addr);
> 
>     Both ip_parse and ipv6_parse can fail.
>     There is also lookup_any() function that can be exported and reused.
> 
>     > +    }
>     > +
>     > +    return true;
>     > +}
>     > +
>     > +/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
>     > + * Returns true on success.  Otherwise, returns false and puts the error
>     > + * message in 'ds'. */
>     > +bool
>     > +ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
>     > +                       struct ds *ds)
>     > +{
>     > +    char *pos, *key, *value, *copy;
>     > +
>     > +    memset(match, 0, sizeof *match);
>     > +    struct ofputil_ct_tuple *tuple = &match->tuple_orig;
>     > +
>     > +    pos = copy = xstrdup(s);
>     > +    while (ofputil_parse_key_value(&pos, &key, &value)) {
>     > +        if (!*value) {
>     > +            ds_put_format(ds, "field %s missing value", key);
>     > +            goto error_with_msg;
>     > +        }
>     > +
>     > +        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")
>     > +            || !strcmp(key, "ct_ipv6_src") || !strcmp(key, "ct_ipv6_dst")) {
>     > +            match->l3_type = key[6] == '6' ? AF_INET6 : AF_INET;
> 
>     Original code made sure that user didn't specify keys from
>     different families.  This version does not.
> 
> 
>     > +            uint8_t index = key[6] == '6' ? 8 : 6;
>     > +            struct in6_addr *addr = key[index] == 's'
>     > +                ? &tuple->src : &tuple->dst;
>     > +
>     > +            if (!ofputil_ct_tuple_ip_parse(addr, value, match->l3_type)) {
>     > +                ds_put_format(ds, "%s is set multiple times", key);
>     > +                goto error;
>     > +            }
>     > +        } else if (!strcmp(key, "ct_nw_proto")) {
>     > +            char *err = str_to_u8(value, key, &match->ip_proto);
>     > +
>     > +            if (err) {
>     > +                free(err);
>     > +                goto error_with_msg;
>     > +            }
>     > +        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key, "ct_tp_dst")) {
>     > +            uint16_t port;
>     > +            char *err = str_to_u16(value, key, &port);
>     > +
>     > +            if (err) {
>     > +                free(err);
>     > +                goto error_with_msg;
>     > +            }
>     > +            if (key[6] == 's') {
>     > +                tuple->src_port = htons(port);
>     > +            } else {
>     > +                tuple->dst_port = htons(port);
>     > +            }
>     > +        } else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
>     > +                   !strcmp(key, "icmp_id")) {
>     > +            if (match->ip_proto != IPPROTO_ICMP &&
>     > +                match->ip_proto != IPPROTO_ICMPV6) {
>     > +                ds_put_cstr(ds, "invalid L4 fields");
>     > +                goto error;
>     > +            }
>     > +            uint16_t icmp_id;
>     > +            char *err;
>     > +
>     > +            if (key[5] == 't') {
>     > +                err = str_to_u8(value, key, &tuple->icmp_type);
>     > +            } else if (key[5] == 'c') {
>     > +                err = str_to_u8(value, key, &tuple->icmp_code);
>     > +            } else {
>     > +                err = str_to_u16(value, key, &icmp_id);
>     > +                tuple->icmp_id = htons(icmp_id);
>     > +            }
>     > +            if (err) {
>     > +                free(err);
>     > +                goto error_with_msg;
>     > +            }
>     > +        } else {
>     > +            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
>     > +            goto error;
>     > +        }
>     > +    }
>     > +
>     > +    if (!match->ip_proto && (tuple->src_port || tuple->dst_port)) {
>     > +        ds_put_cstr(ds, "port is set without protocol");
>     > +        goto error;
>     > +    }
>     > +
>     > +    /* For the filtering to work with icmp we need to fill the reply direction
>     > +     * with correct information. */
>     > +    if (match->ip_proto == IPPROTO_ICMP) {
>     > +        switch (match->tuple_orig.icmp_type) {
>     > +        case ICMP4_ECHO_REQUEST:
>     > +            match->tuple_reply.icmp_type = ICMP4_ECHO_REPLY;
>     > +            break;
> 
>     An empty line after the 'break;'.
> 
>     > +        case ICMP4_ECHO_REPLY:
>     > +            match->tuple_reply.icmp_type = ICMP4_ECHO_REQUEST;
>     > +            break;
>     > +        case ICMP4_TIMESTAMP:
>     > +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMPREPLY;
>     > +            break;
>     > +        case ICMP4_TIMESTAMPREPLY:
>     > +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMP;
>     > +            break;
>     > +        case ICMP4_INFOREQUEST:
>     > +            match->tuple_reply.icmp_type = ICMP4_INFOREPLY;
>     > +            break;
>     > +        case ICMP4_INFOREPLY:
>     > +            match->tuple_reply.icmp_type = ICMP4_INFOREQUEST;
>     > +            break;
>     > +        }
>     > +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
>     > +    } else if (match->ip_proto == IPPROTO_ICMPV6) {
>     > +        switch (match->tuple_orig.icmp_type) {
>     > +        case ICMP6_ECHO_REQUEST:
>     > +            match->tuple_reply.icmp_type = ICMP6_ECHO_REPLY;
>     > +            break;
>     > +        case ICMP6_ECHO_REPLY:
>     > +            match->tuple_reply.icmp_type = ICMP6_ECHO_REQUEST;
>     > +            break;
>     > +        }
>     > +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
>     > +    }
>     > +
>     > +    free(copy);
>     > +    return true;
>     > +
>     > +error_with_msg:
>     > +    ds_put_format(ds, "failed to parse field %s", key);
>     > +error:
>     > +    free(copy);
>     > +    return false;
>     > +}
>     > diff --git a/lib/ofp-ct-util.h b/lib/ofp-ct-util.h
>     > new file mode 100644
>     > index 000000000..5a5eaf920
>     > --- /dev/null
>     > +++ b/lib/ofp-ct-util.h
>     > @@ -0,0 +1,36 @@
>     > +/* Copyright (c) 2022, Red Hat, Inc.
>     > + *
>     > + * Licensed under the Apache License, Version 2.0 (the "License");
>     > + * you may not use this file except in compliance with the License.
>     > + * You may obtain a copy of the License at:
>     > + *
>     > + *     http://www.apache.org/licenses/LICENSE-2.0 <http://www.apache.org/licenses/LICENSE-2.0>
>     > + *
>     > + * Unless required by applicable law or agreed to in writing, software
>     > + * distributed under the License is distributed on an "AS IS" BASIS,
>     > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>     > + * See the License for the specific language governing permissions and
>     > + * limitations under the License.
>     > + */
>     > +
>     > +#ifndef OVS_OFP_CT_UTIL_H
>     > +#define OVS_OFP_CT_UTIL_H
>     > +
>     > +#include "ct-dpif.h"
>     > +#include "openvswitch/ofp-util.h"
>     > +
>     > +bool ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
>     > +                                 const struct ct_dpif_entry *entry);
>     > +
>     > +bool ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
>     > +                                    uint8_t ip_proto);
>     > +
>     > +void ofputil_ct_match_format(struct ds *ds,
>     > +                             const struct ofputil_ct_match *match);
>     > +
>     > +bool ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
>     > +                            struct ds *ds);
>     > +
>     > +bool ofputil_is_ct_match_zero(const struct ofputil_ct_match *match);
>     > +
>     > +#endif /* lib/ofp-ct-util.h */
>     > diff --git a/tests/system-traffic.at <http://system-traffic.at> b/tests/system-traffic.at <http://system-traffic.at>
>     > index e5403519f..51903a658 100644
>     > --- a/tests/system-traffic.at <http://system-traffic.at>
>     > +++ b/tests/system-traffic.at <http://system-traffic.at>
>     > @@ -2230,7 +2230,7 @@ udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.
>     >  OVS_TRAFFIC_VSWITCHD_STOP
>     >  AT_CLEANUP
>     > 
>     > -AT_SETUP([conntrack - ct flush by 5-tuple])
>     > +AT_SETUP([conntrack - ct flush])
>     >  CHECK_CONNTRACK()
>     >  OVS_TRAFFIC_VSWITCHD_START()
>     > 
>     > @@ -2291,6 +2291,86 @@ AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
>     >  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
>     >  ])
>     > 
>     > +dnl Test UDP from port 1 and 2, partial flush by src port
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
>     > +
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
>     > +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=1'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=2'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
>     > +
>     > +dnl Test UDP from port 1 and 2, partial flush by dst port
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
>     > +
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
>     > +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_dst=2'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_dst=1'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
>     > +
>     > +dnl Test UDP from port 1 and 2, partial flush by src address
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
>     > +
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
>     > +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.1'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
>     > +
>     > +dnl Test UDP from port 1 and 2, partial flush by dst address
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
>     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
>     > +
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
>     > +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.2'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
>     > +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
>     > +])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.1'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
>     > +
>     >  OVS_TRAFFIC_VSWITCHD_STOP
>     >  AT_CLEANUP
>     > 
> 
> 
> 
> -- 
> 
> Ales Musil
> 
> Senior Software Engineer - OVN Core
> 
> Red Hat EMEA <https://www.redhat.com>
> 
> amusil@redhat.com <mailto:amusil@redhat.com>    IM: amusil
> 
> <https://red.ht/sig>
>
Ales Musil Jan. 11, 2023, 1:09 p.m. UTC | #4
On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 1/11/23 07:53, Ales Musil wrote:
> >
> >
> > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <i.maximets@ovn.org
> <mailto:i.maximets@ovn.org>> wrote:
> >
> >     On 12/16/22 13:30, Ales Musil wrote:
> >     > Currently, the CT can be flushed by dpctl only be specifying
> >
> >     *by
> >
> >     > the whole 5-tuple. This is not very convenient when there are
> >     > only some fields known to the user of CT flush. Add new struct
> >     > ofputil_ct_match which represents the generic filtering that can
> >     > be done for CT flush. The match is done only on fields that are
> >     > non-zero with exception to the icmp fields.
> >     >
> >     > This allows the filtering just within dpctl, however
> >     > it is a preparation for OpenFlow extension.
> >     >
> >     > Reported-at: https://bugzilla.redhat.com/2120546 <
> https://bugzilla.redhat.com/2120546>
> >     > Signed-off-by: Ales Musil <amusil@redhat.com <mailto:
> amusil@redhat.com>>
> >     > Acked-by: Paolo Valerio <pvalerio@redhat.com <mailto:
> pvalerio@redhat.com>>
> >     > ---
> >     > v5: Add ack from Paolo.
> >     > v4: Fix a flush all scenario.
> >     > v3: Rebase on top of master.
> >     >     Address the C99 comment and missing dpif_close call.
> >     > v2: Rebase on top of master.
> >     >     Address comments from Paolo.
> >     > ---
> >
> >     Hi, Ales.  Thanks for working on this change.
> >     Sorry for review taking so long.  I initially though there
> >     will be mostly a few cosmetic/style changes needed.  But
> >     longer I look at this code, more questions I have.  See below.
> >
> >     Dumitru, maybe you can also help answer some of them?
> >
> >     Best regards, Ilya Maximets.
> >
> >
> > Hi Ilya,
> >
> > thank you for the review. Before proceeding I tried to answer your
> concerns/questions.
> > Let me know what you think.
> >
> > Thanks,
> > Ales
> >
> >
> >
> >     >  NEWS                           |   2 +
> >     >  include/openvswitch/ofp-util.h |  28 +++
> >     >  lib/automake.mk <http://automake.mk>                |   2 +
> >     >  lib/ct-dpif.c                  | 208 +++++++++------------
> >     >  lib/ct-dpif.h                  |   4 +-
> >     >  lib/dpctl.c                    |  42 ++---
> >     >  lib/dpctl.man                  |   3 +-
> >     >  lib/ofp-ct-util.c              | 326
> +++++++++++++++++++++++++++++++++
> >     >  lib/ofp-ct-util.h              |  36 ++++
> >     >  tests/system-traffic.at <http://system-traffic.at>        |  82
> ++++++++-
> >     >  10 files changed, 586 insertions(+), 147 deletions(-)
> >     >  create mode 100644 lib/ofp-ct-util.c
> >     >  create mode 100644 lib/ofp-ct-util.h
> >     >
> >     > diff --git a/NEWS b/NEWS
> >     > index 265375e1c..ff8904b02 100644
> >     > --- a/NEWS
> >     > +++ b/NEWS
> >     > @@ -14,6 +14,8 @@ Post-v3.0.0
> >     >       10 Gbps link speed by default in case the actual link speed
> cannot be
> >     >       determined.  Previously it was 10 Mbps.  Values can still be
> overridden
> >     >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
> >     > +   - ovs-dpctl and related ovs-appctl commands:
> >
> >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
> >
> >     > +     * "flush-conntrack" is capable of handling partial 5-tuple.
> >     >
> >     >
> >     >  v3.0.0 - 15 Aug 2022
> >     > diff --git a/include/openvswitch/ofp-util.h
> b/include/openvswitch/ofp-util.h
> >     > index 091a09cad..84937ae26 100644
> >     > --- a/include/openvswitch/ofp-util.h
> >     > +++ b/include/openvswitch/ofp-util.h
> >     > @@ -19,6 +19,9 @@
> >     >
> >     >  #include <stdbool.h>
> >     >  #include <stdint.h>
> >     > +#include <sys/types.h>
> >     > +#include <netinet/in.h>
> >     > +
> >     >  #include "openvswitch/ofp-protocol.h"
> >     >
> >     >  struct ofp_header;
> >     > @@ -27,6 +30,31 @@ struct ofp_header;
> >     >  extern "C" {
> >     >  #endif
> >     >
> >     > +struct ofputil_ct_tuple {
> >     > +    struct in6_addr src;
> >     > +    struct in6_addr dst;
> >     > +
> >     > +    union {
> >     > +        ovs_be16 src_port;
> >     > +        ovs_be16 icmp_id;
> >     > +    };
> >     > +    union {
> >     > +        ovs_be16 dst_port;
> >     > +        struct {
> >     > +            uint8_t icmp_code;
> >     > +            uint8_t icmp_type;
> >     > +        };
> >     > +    };
> >     > +};
> >     > +
> >     > +struct ofputil_ct_match {
> >     > +    uint8_t ip_proto;
> >     > +    uint16_t l3_type;
> >     > +
> >     > +    struct ofputil_ct_tuple tuple_orig;
> >     > +    struct ofputil_ct_tuple tuple_reply;
> >     > +};
> >
> >     These structures are very similar to existing ct_dpif_tuple.
> >     We can keep them for now, but I think, we'll need to replace
> >     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
> >     We may need some changes in ofputil_ct_tuple for that though.
> >
> >
> > I was thinking about replacing them, however it would require a much
> > bigger change, which was out of scope of this RFE. Also I was not
> entirely
> > sure if there wasn't any specific reason for the ct_dpif_tuple
> specifically
> > the address part. It seems like optimization to me.
> >
> > Anyway it should be possible to unify those if needed.
> >
> >
> >
> >
> >     OTOH, I'm not sure I understand why we need both tuples here.
> >     Can we have a tuple and a flag that indicates a direction
> >     instead?
> >
> >
> > That is also a possibility. However that wouldn't fit too that nicely in
> the
> > whole match struct, we could probably duplicate the proto field and get
> rid of the
> > l3_type. That way we could supply just two tuples instead, getting rid
> of the match completely.
>
> The point was to have only one tuple.  See below.
>
> >
> >
> >
> >     There are several issues with having both original and reply
> >     tuples, IIUC:
> >
> >     1. None of the command line interfaces support specifying
> >        reply tuple.
> >
> >
> > That was the intention of the original command AFAICT, we could modify
> the commands
> > to support that. However I'm not sure if there is any benefit in that.
>
> ovs-ofctl supposed to mimic the OpenFlow request, so it should
> support everything that OpenFlow request can have.
>

I see, sure that can definitely be done, however we should probably think
how this can
be done in a more universal way that could be used also for dpctl.


> >
> >
> >
> >     2. If the reply tuple will be specified via OpenFlow, it
> >        will be ignored if the original direction tuple is
> >        fully specified.
> >
> >
> > That is ok IMO, because a fully specified original should be enough to
> determine the one
> > entry that needs to be erased. So the reply direction in that would be
> an extra.
>
> That supports the point that we don't need two tuples in the request.
>

Right but only if we are able to supply the full original direction. See
below.


>
> >
> >
> >
> >     3. Netfilter kernel interface allows specifying only one
> >        tuple, not both, AFAICT.
> >
> >
> > Right, that's the reason why, when we match, we use the original tuple
> to erase that
> > entry.
> >
> >
> >
> >     I'm not a conntrack expert, so I have a question: Is it possible
> >     that addresses and ports in original and reply directions are
> >     not mirrored copies?  Or do we care about these cases?
> >
> >
> > Not sure if it's possible, but even if it would be, the current
> implementation
> > would account for that, with the exception of specifying fully the
> original
> > direction.
> >
> >
> >
> >     If we want to flush all entries related to the same IP:PORT,
> >     we will not be able to do that in a single OpenFlow request,
> >     because we'll need to supply the same IP and PORT as src
> >     and dst, or as src in both original and reply directions.
> >     None of these can fit into a single request.
> >
> >
> > Is that really a problem? You could do that in two, one for the original
> > one for reply. This is also very similar to the interface supported by
> conntrack-tools,
> > with those you couldn't remove all entries in a single call as well. We
> could modify
> > the interface that would allow matching on tuple without direction
> specification,
> > that would translate to both directions, however there is still the
> question if that's needed.
>
>
> It's not a problem, but another supporting point for the interface
> being redundant.  We don't seem to need two tuples in the request,
> because users will have to send two different requests anyway.
>
> So, the request should have:
>
> 1. zone
> 2. nw_proto
> 3. tuple
> 4. tuple type ('original' or 'reply'), in case we want to allow
>    users to flush based on reply tuple.  But this seems a bit
>    redundant as well.
>
> Something like:
>
> struct ofputil_ct_tuple {
> }
>
> struct ofputil_ct_match {
>     uint8_t ip_proto;
>     uint16_t l3_type;
>     // enum DIRECTION direction;
>     struct ofputil_ct_tuple tuple;
> };
>
>
> And the OpenFlow part:
>
> /* CT flush available TLVs. */
> enum nx_ct_flush_tlv_type {
>     NXT_CT_ZONE_ID,           /* u8: CT zone id. */
>     NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. */
> };
>
> enum nx_ct_flush_tuple_type {
>     NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4
> address. */
>     NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped IPv4
> address. */
>     NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
>     NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
>     NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
>     NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
>     NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
>     // NXT_CT_TUPLE_DIRECTION,   /* enum nx_ct_flush_tuple_direction: CT
> tuple direction. */
> };
>
> //enum nx_ct_flush_tuple_direction {
> //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
> //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
> //}
>
> struct nx_ct_flush {
>     uint8_t nw_proto;          /* IP protocol. */
>     uint8_t pad[7];            /* Align to 64-bit (must be zero). */
>     /* Followed by ... */
> };
>
>
> Such interface will also allow us in to have both direction specified
> if we would want that in the future.  For now we should check that
> each TLV type is specified once, i.e. only one tuple is in the request.
>
> We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add it
> in later releases if necessary.  Tuples without that TLV can be treated
> as original direction.
>

The original idea was something similar but as stated below that wouldn't
really help
us in the OVN case. And there is probably no significant benefit in having
partial match
with only single direction at time. I'm not sure if I'm not missing
something obvious.


> >
> >
> >
> >     At this point is there actually any value in flushing basing
> >     on reply direction?  Shouldn't we be able to flush the same
> >     entries by flushing by dst in original direction, if we will
> >     need to issue multiple requests anyway?
> >
> >
> > That assumption would be fine when you can specify a full tuple for the
> original direction.
> > The problem is that for example OVN we know only the original
> DST_IP:DST_PORT (VIP)
> > and the reply SRC_IP:SRC_PORT (backend).
>
> Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
>

Unfortunately no, it's DNATed so for example
'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
will have CT entries as follows:

tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2

We cannot simply filter on the original direction, as we cannot infer the
SRC part (without filtering all entries etc.).



>
> > That means we cannot fully specify the original direction
> > without doing the filtering ourselves. It would be possible with extra
> steps from the user. If this interface
> > is too generic/complicated for that purpose we could say that the flush
> extension supports only full orig tuple
> > and leave the filtering up to the user. That would definitely work, it
> wouldn't be that elegant for the user however
> > from the code change perspective much simpler.
>
> I'm not sure I understand that part.  I'm not arguing aginst partial
> matches.  Or do you mean something else?
>
> >
> >  I can see a certain benefit of keeping it in OvS:
> >
> > 1) The filtering complexity is hidden from the user.
> > 2) The data transfer between OvS and the user is much smaller (even if
> you need to send two matches).
> > 3) Once implemented it's just ready to be used without the need of
> additional thoughts going into it.
> >
> >
> >
> >
> >
> >     Did you start working on the OVN side of things?  Which
> >     commands OVN will actually use?
> >
> >
> > Yes, I did. The plan is that OVN will use the extension.
>
> I understand that.  The question was how the actual request will
> look like.
>

The testing version used something like this:

    memset(match, 0, sizeof *match);

    struct ofputil_ct_tuple *orig = &match->tuple_orig;
    struct ofputil_ct_tuple *reply = &match->tuple_reply;

    if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
        match->l3_type = AF_INET;
    } else {
        match->l3_type = AF_INET6;
    }

    match->ip_proto = tuple->proto;

    orig->dst = tuple->vip_ip;
    orig->dst_port = htons(tuple->vip_port);

    reply->src = tuple->backend_ip;
    reply->src_port = htons(tuple->backend_port);

tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND,
BACKEND_PORT, PROTO.
match is the "struct ofputil_ct_match".

This is with older version of this RFE but it should illustrate the point.


>
> >
> >
> >
> >     > +
> >     >  bool ofputil_decode_hello(const struct ofp_header *,
> >     >                            uint32_t *allowed_versions);
> >     >  struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
> >     > diff --git a/lib/automake.mk <http://automake.mk> b/lib/
> automake.mk <http://automake.mk>
> >     > index a0fabe38f..37135f118 100644
> >     > --- a/lib/automake.mk <http://automake.mk>
> >     > +++ b/lib/automake.mk <http://automake.mk>
> >     > @@ -227,6 +227,8 @@ lib_libopenvswitch_la_SOURCES = \
> >     >       lib/ofp-actions.c \
> >     >       lib/ofp-bundle.c \
> >     >       lib/ofp-connection.c \
> >     > +     lib/ofp-ct-util.c \
> >     > +     lib/ofp-ct-util.h \
> >     >       lib/ofp-ed-props.c \
> >     >       lib/ofp-errors.c \
> >     >       lib/ofp-flow.c \
> >     > diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> >     > index 6f17a26b5..9b324ff79 100644
> >     > --- a/lib/ct-dpif.c
> >     > +++ b/lib/ct-dpif.c
> >     > @@ -20,6 +20,7 @@
> >     >  #include <errno.h>
> >     >
> >     >  #include "ct-dpif.h"
> >     > +#include "ofp-ct-util.h"
> >     >  #include "openvswitch/ofp-parse.h"
> >     >  #include "openvswitch/vlog.h"
> >     >
> >     > @@ -71,6 +72,31 @@ ct_dpif_dump_start(struct dpif *dpif, struct
> ct_dpif_dump_state **dump,
> >     >      return err;
> >     >  }
> >     >
> >     > +static void
> >     > +ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple
> *ofp_tuple,
> >     > +                                    struct ct_dpif_tuple *tuple,
> >     > +                                    uint16_t l3_type, uint8_t
> ip_proto)
> >     > +{
> >     > +    if (l3_type == AF_INET) {
> >     > +        tuple->src.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->src);
> >     > +        tuple->dst.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->dst);
> >     > +    } else {
> >     > +        tuple->src.in6 = ofp_tuple->src;
> >     > +        tuple->dst.in6 = ofp_tuple->dst;
> >     > +    }
> >     > +
> >     > +    tuple->l3_type = l3_type;
> >     > +    tuple->ip_proto = ip_proto;
> >     > +    tuple->src_port = ofp_tuple->src_port;
> >     > +
> >     > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> >     > +        tuple->icmp_code = ofp_tuple->icmp_code;
> >     > +        tuple->icmp_type = ofp_tuple->icmp_type;
> >     > +    } else {
> >     > +        tuple->dst_port = ofp_tuple->dst_port;
> >     > +    }
> >     > +}
> >     > +
> >     >  /* Dump one connection from a tracker, and put it in 'entry'.
> >     >   *
> >     >   * 'dump' should have been initialized by ct_dpif_dump_start().
> >     > @@ -100,25 +126,77 @@ ct_dpif_dump_done(struct ct_dpif_dump_state
> *dump)
> >     >              ? dpif->dpif_class->ct_dump_done(dpif, dump)
> >     >              : EOPNOTSUPP);
> >     >  }
> >     > -
> >     > +
> >     > +static int
> >     > +ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
> >     > +                    const struct ofputil_ct_match *match) {
> >     > +    struct ct_dpif_dump_state *dump;
> >     > +    struct ct_dpif_entry cte;
> >     > +    int error;
> >     > +    int tot_bkts;
> >     > +
> >     > +    if (!dpif->dpif_class->ct_flush) {
> >     > +        return EOPNOTSUPP;
> >     > +    }
> >     > +
> >     > +    if (VLOG_IS_DBG_ENABLED()) {
> >     > +        struct ds ds = DS_EMPTY_INITIALIZER;
> >     > +        ofputil_ct_match_format(&ds, match);
> >     > +        VLOG_DBG("%s: ct_flush:%s in zone %d", dpif_name(dpif),
> ds_cstr(&ds),
> >     > +                 zone ? *zone : 0);
> >     > +        ds_destroy(&ds);
> >     > +    }
> >     > +
> >     > +    /* If we have full five tuple in orig just do the flush over
> that
> >     > +     * tuple directly. */
> >     > +    if (ofputil_ct_tuple_is_five_tuple(&match->tuple_orig,
> match->ip_proto)) {
> >     > +        struct ct_dpif_tuple tuple;
> >     > +        ct_dpif_tuple_from_ofputil_ct_tuple(&match->tuple_orig,
> &tuple,
> >     > +                                            match->l3_type,
> match->ip_proto);
> >     > +        return dpif->dpif_class->ct_flush(dpif, zone, &tuple);
> >
> >     Here we just fully ignored the reply tuple, even if it was
> >     actually specified.  What if it didn't match whatever we're
> >     oing to remove?  It is user-specified, so can be anything.
> >
> >
> > This is a shortcut for a fully specified tuple. If user fully specifies
> original
> > tuple there shouldn't be any need to actually match on reply as original
> should be
> > unique enough. It's up to the user to provide the right tuple, also it
> wouldn't help if we tried
> > to match later on because we would fail in the original direction
> nevertheless.
>
> The point was that if the user supplied full orig+reply, we
> should not delete entries that do not fully match on both.
> It is unlikley, but user may specify reply information that
> doesn't match orig tuple.  In this case it would be strange
> that OVS removed entries that do not match the request.
>

There could be additional checks if the reply direction can be matched
or is just empty in which case the shortcut would apply.


>
> >
> >
> >
> >     > +    }
> >     > +
> >     > +    error = ct_dpif_dump_start(dpif, &dump, zone, &tot_bkts);
> >     > +    if (error) {
> >     > +        return error;
> >     > +    }
> >     > +
> >     > +    while (!(error = ct_dpif_dump_next(dump, &cte))) {
> >     > +        if (zone && *zone != cte.zone) {
> >     > +            continue;
> >     > +        }
> >     > +
> >     > +        if (ofputil_ct_match_cmp(match, &cte)) {
> >     > +            error = dpif->dpif_class->ct_flush(dpif, &cte.zone,
> >     > +                                               &cte.tuple_orig);
> >
> >     We assume here that original direction tuple uniquely identifies
> >     the conntrack entry.
> >
> >
> > Which should be the case. Note not only the original direction but also
> the zone.
> >
> >
> >
> >
> >     > +            if (error) {
> >     > +                break;
> >     > +            }
> >     > +        }
> >     > +    }
> >     > +    if (error == EOF) {
> >     > +        error = 0;
> >     > +    }
> >     > +
> >     > +    ct_dpif_dump_done(dump);
> >     > +    return error;
> >     > +}
> >     > +
> >     >  /* Flush the entries in the connection tracker used by 'dpif'.
> The
> >     >   * arguments have the following behavior:
> >     >   *
> >     > - *   - If both 'zone' and 'tuple' are NULL, flush all the
> conntrack entries.
> >     > - *   - If 'zone' is not NULL, and 'tuple' is NULL, flush all the
> conntrack
> >     > + *   - If both 'zone' and 'match' are NULL, or match is zero,
> >
> >     It's not clear that 'or' applies to the 'match' part only.
> >
> >     > + *     flush all the conntrack entries.
> >     > + *   - If 'zone' is not NULL, and 'match' is NULL, flush all the
> conntrack
> >     >   *     entries in '*zone'.
> >     > - *   - If 'tuple' is not NULL, flush the conntrack entry
> specified by 'tuple'
> >     > + *   - If 'tuple' is not NULL, flush the conntrack entry
> specified by 'match'
> >
> >     s/tuple/match/
> >
> >     >   *     in '*zone'. If 'zone' is NULL, use the default zone (zone
> 0). */
> >     >  int
> >     >  ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
> >     > -              const struct ct_dpif_tuple *tuple)
> >     > +              const struct ofputil_ct_match *match)
> >     >  {
> >     > -    if (tuple) {
> >     > -        struct ds ds = DS_EMPTY_INITIALIZER;
> >     > -        ct_dpif_format_tuple(&ds, tuple);
> >     > -        VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif),
> ds_cstr(&ds),
> >     > -                                                zone ? *zone : 0);
> >     > -        ds_destroy(&ds);
> >     > +    if (match && !ofputil_is_ct_match_zero(match)) {
> >     > +        return ct_dpif_flush_tuple(dpif, zone, match);
> >     >      } else if (zone) {
> >     >          VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif),
> *zone);
> >     >      } else {
> >     > @@ -126,7 +204,7 @@ ct_dpif_flush(struct dpif *dpif, const
> uint16_t *zone,
> >     >      }
> >     >
> >     >      return (dpif->dpif_class->ct_flush
> >     > -            ? dpif->dpif_class->ct_flush(dpif, zone, tuple)
> >     > +            ? dpif->dpif_class->ct_flush(dpif, zone, NULL)
> >     >              : EOPNOTSUPP);
> >     >  }
> >     >
> >     > @@ -583,112 +661,6 @@ ct_dpif_format_tcp_stat(struct ds * ds, int
> tcp_state, int conn_per_state)
> >     >      ds_put_format(ds, "=%u", conn_per_state);
> >     >  }
> >     >
> >     > -/* Parses a specification of a conntrack 5-tuple from 's' into
> 'tuple'.
> >     > - * Returns true on success.  Otherwise, returns false and puts
> the error
> >     > - * message in 'ds'. */
> >     > -bool
> >     > -ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s,
> struct ds *ds)
> >     > -{
> >     > -    char *pos, *key, *value, *copy;
> >     > -    memset(tuple, 0, sizeof *tuple);
> >     > -
> >     > -    pos = copy = xstrdup(s);
> >     > -    while (ofputil_parse_key_value(&pos, &key, &value)) {
> >     > -        if (!*value) {
> >     > -            ds_put_format(ds, "field %s missing value", key);
> >     > -            goto error;
> >     > -        }
> >     > -
> >     > -        if (!strcmp(key, "ct_nw_src") || !strcmp(key,
> "ct_nw_dst")) {
> >     > -            if (tuple->l3_type && tuple->l3_type != AF_INET) {
> >     > -                ds_put_cstr(ds, "L3 type set multiple times");
> >     > -                goto error;
> >     > -            } else {
> >     > -                tuple->l3_type = AF_INET;
> >     > -            }
> >     > -            if (!ip_parse(value, key[6] == 's' ? &tuple->src.ip :
> >     > -                                                 &tuple->dst.ip))
> {
> >     > -                goto error_with_msg;
> >     > -            }
> >     > -        } else if (!strcmp(key, "ct_ipv6_src") ||
> >     > -                   !strcmp(key, "ct_ipv6_dst")) {
> >     > -            if (tuple->l3_type && tuple->l3_type != AF_INET6) {
> >     > -                ds_put_cstr(ds, "L3 type set multiple times");
> >     > -                goto error;
> >     > -            } else {
> >     > -                tuple->l3_type = AF_INET6;
> >     > -            }
> >     > -            if (!ipv6_parse(value, key[8] == 's' ?
> &tuple->src.in6 :
> >     > -
>  &tuple->dst.in6)) {
> >     > -                goto error_with_msg;
> >     > -            }
> >     > -        } else if (!strcmp(key, "ct_nw_proto")) {
> >     > -            char *err = str_to_u8(value, key, &tuple->ip_proto);
> >     > -            if (err) {
> >     > -                free(err);
> >     > -                goto error_with_msg;
> >     > -            }
> >     > -        } else if (!strcmp(key, "ct_tp_src") ||
> !strcmp(key,"ct_tp_dst")) {
> >     > -            uint16_t port;
> >     > -            char *err = str_to_u16(value, key, &port);
> >     > -            if (err) {
> >     > -                free(err);
> >     > -                goto error_with_msg;
> >     > -            }
> >     > -            if (key[6] == 's') {
> >     > -                tuple->src_port = htons(port);
> >     > -            } else {
> >     > -                tuple->dst_port = htons(port);
> >     > -            }
> >     > -        } else if (!strcmp(key, "icmp_type") || !strcmp(key,
> "icmp_code") ||
> >     > -                   !strcmp(key, "icmp_id") ) {
> >     > -            if (tuple->ip_proto != IPPROTO_ICMP &&
> >     > -                tuple->ip_proto != IPPROTO_ICMPV6) {
> >     > -                ds_put_cstr(ds, "invalid L4 fields");
> >     > -                goto error;
> >     > -            }
> >     > -            uint16_t icmp_id;
> >     > -            char *err;
> >     > -            if (key[5] == 't') {
> >     > -                err = str_to_u8(value, key, &tuple->icmp_type);
> >     > -            } else if (key[5] == 'c') {
> >     > -                err = str_to_u8(value, key, &tuple->icmp_code);
> >     > -            } else {
> >     > -                err = str_to_u16(value, key, &icmp_id);
> >     > -                tuple->icmp_id = htons(icmp_id);
> >     > -            }
> >     > -            if (err) {
> >     > -                free(err);
> >     > -                goto error_with_msg;
> >     > -            }
> >     > -        } else {
> >     > -            ds_put_format(ds, "invalid conntrack tuple field:
> %s", key);
> >     > -            goto error;
> >     > -        }
> >     > -    }
> >     > -
> >     > -    if (ipv6_is_zero(&tuple->src.in6) ||
> ipv6_is_zero(&tuple->dst.in6) ||
> >     > -        !tuple->ip_proto) {
> >     > -        /* icmp_type, icmp_code, and icmp_id can be 0. */
> >     > -        if (tuple->ip_proto != IPPROTO_ICMP &&
> >     > -            tuple->ip_proto != IPPROTO_ICMPV6) {
> >     > -            if (!tuple->src_port || !tuple->dst_port) {
> >     > -                ds_put_cstr(ds, "at least one of the conntrack
> 5-tuple fields "
> >     > -                                "is missing.");
> >     > -                goto error;
> >     > -            }
> >     > -        }
> >     > -    }
> >     > -
> >     > -    free(copy);
> >     > -    return true;
> >     > -
> >     > -error_with_msg:
> >     > -    ds_put_format(ds, "failed to parse field %s", key);
> >     > -error:
> >     > -    free(copy);
> >     > -    return false;
> >     > -}
> >     >
> >     >  void
> >     >  ct_dpif_push_zone_limit(struct ovs_list *zone_limits, uint16_t
> zone,
> >     > diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
> >     > index 2848549b0..337c99dd4 100644
> >     > --- a/lib/ct-dpif.h
> >     > +++ b/lib/ct-dpif.h
> >     > @@ -17,6 +17,7 @@
> >     >  #ifndef CT_DPIF_H
> >     >  #define CT_DPIF_H
> >     >
> >     > +#include "openvswitch/ofp-util.h"
> >     >  #include "openvswitch/types.h"
> >     >  #include "packets.h"
> >     >
> >     > @@ -285,7 +286,7 @@ int ct_dpif_dump_start(struct dpif *, struct
> ct_dpif_dump_state **,
> >     >  int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct
> ct_dpif_entry *);
> >     >  int ct_dpif_dump_done(struct ct_dpif_dump_state *);
> >     >  int ct_dpif_flush(struct dpif *, const uint16_t *zone,
> >     > -                  const struct ct_dpif_tuple *);
> >     > +                  const struct ofputil_ct_match *);
> >     >  int ct_dpif_set_maxconns(struct dpif *dpif, uint32_t maxconns);
> >     >  int ct_dpif_get_maxconns(struct dpif *dpif, uint32_t *maxconns);
> >     >  int ct_dpif_get_nconns(struct dpif *dpif, uint32_t *nconns);
> >     > @@ -311,7 +312,6 @@ void ct_dpif_format_ipproto(struct ds *ds,
> uint16_t ipproto);
> >     >  void ct_dpif_format_tuple(struct ds *, const struct ct_dpif_tuple
> *);
> >     >  uint8_t ct_dpif_coalesce_tcp_state(uint8_t state);
> >     >  void ct_dpif_format_tcp_stat(struct ds *, int, int);
> >     > -bool ct_dpif_parse_tuple(struct ct_dpif_tuple *, const char *s,
> struct ds *);
> >     >  void ct_dpif_push_zone_limit(struct ovs_list *, uint16_t zone,
> uint32_t limit,
> >     >                               uint32_t count);
> >     >  void ct_dpif_free_zone_limits(struct ovs_list *);
> >     > diff --git a/lib/dpctl.c b/lib/dpctl.c
> >     > index 29041fa3e..e6e350aba 100644
> >     > --- a/lib/dpctl.c
> >     > +++ b/lib/dpctl.c
> >     > @@ -40,6 +40,7 @@
> >     >  #include "netdev.h"
> >     >  #include "netlink.h"
> >     >  #include "odp-util.h"
> >     > +#include "ofp-ct-util.h"
> >     >  #include "openvswitch/ofpbuf.h"
> >     >  #include "packets.h"
> >     >  #include "openvswitch/shash.h"
> >     > @@ -1707,47 +1708,38 @@ dpctl_flush_conntrack(int argc, const char
> *argv[],
> >     >                        struct dpctl_params *dpctl_p)
> >     >  {
> >     >      struct dpif *dpif = NULL;
> >     > -    struct ct_dpif_tuple tuple, *ptuple = NULL;
> >     > -    struct ds ds = DS_EMPTY_INITIALIZER;
> >     > -    uint16_t zone, *pzone = NULL;
> >     > -    int error;
> >     > +    struct ofputil_ct_match match = {0};
> >     > +    uint16_t zone;
> >     > +    bool with_zone = false;
> >     >      int args = argc - 1;
> >     >
> >     >      /* Parse ct tuple */
> >     > -    if (args && ct_dpif_parse_tuple(&tuple, argv[args], &ds)) {
> >     > -        ptuple = &tuple;
> >     > +    if (args) {
> >     > +        struct ds ds = DS_EMPTY_INITIALIZER;
> >     > +        if (!ofputil_ct_match_parse(&match, argv[args], &ds)) {
> >     > +            dpctl_error(dpctl_p, EINVAL, "%s", ds_cstr(&ds));
> >     > +            ds_destroy(&ds);
> >     > +            return EINVAL;
> >     > +        }
> >     >          args--;
> >     >      }
> >     >
> >     >      /* Parse zone */
> >     >      if (args && ovs_scan(argv[args], "zone=%"SCNu16, &zone)) {
> >     > -        pzone = &zone;
> >     > -        args--;
> >     > -    }
> >     > -
> >     > -    /* Report error if there are more than one unparsed argument.
> */
> >     > -    if (args > 1) {
> >     > -        ds_put_cstr(&ds, "invalid arguments");
> >     > -        error = EINVAL;
> >     > -        goto error;
> >     > +        with_zone = true;
> >     >      }
> >     >
> >     > -    error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
> >     > +    int error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
> >     >      if (error) {
> >     > +        dpctl_error(dpctl_p, error, "Cannot open dpif");
> >     >          return error;
> >     >      }
> >     >
> >     > -    error = ct_dpif_flush(dpif, pzone, ptuple);
> >     > -    if (!error) {
> >     > -        dpif_close(dpif);
> >     > -        return 0;
> >     > -    } else {
> >     > -        ds_put_cstr(&ds, "failed to flush conntrack");
> >     > +    error = ct_dpif_flush(dpif, with_zone ? &zone : NULL, &match);
> >     > +    if (error) {
> >     > +        dpctl_error(dpctl_p, error, "Failed to flush conntrack");
> >     >      }
> >     >
> >     > -error:
> >     > -    dpctl_error(dpctl_p, error, "%s", ds_cstr(&ds));
> >     > -    ds_destroy(&ds);
> >     >      dpif_close(dpif);
> >     >      return error;
> >     >  }
> >     > diff --git a/lib/dpctl.man b/lib/dpctl.man
> >     > index 87ea8087b..b0cabe05d 100644
> >     > --- a/lib/dpctl.man
> >     > +++ b/lib/dpctl.man
> >     > @@ -312,7 +312,8 @@ If \fBzone\fR=\fIzone\fR is specified, only
> flushes the connections in
> >     >  If \fIct-tuple\fR is provided, flushes the connection entry
> specified by
> >     >  \fIct-tuple\fR in \fIzone\fR. The zone defaults to 0 if it is not
> provided.
> >     >  The userspace connection tracker requires flushing with the
> original pre-NATed
> >     > -tuple and a warning log will be otherwise generated.
> >     > +tuple and a warning log will be otherwise generated. The tuple
> can be partial
> >     > +and will remove all connections that are matching on the
> specified fields.
> >     >  An example of an IPv4 ICMP \fIct-tuple\fR:
> >     >  .IP
> >     >
> "ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=1,icmp_type=8,icmp_code=0,icmp_id=10"
> >     > diff --git a/lib/ofp-ct-util.c b/lib/ofp-ct-util.c
> >     > new file mode 100644
> >     > index 000000000..a10607590
> >     > --- /dev/null
> >     > +++ b/lib/ofp-ct-util.c
> >     > @@ -0,0 +1,326 @@
> >     > +
> >     > +/* Copyright (c) 2022, Red Hat, Inc.
> >     > + *
> >     > + * Licensed under the Apache License, Version 2.0 (the "License");
> >     > + * you may not use this file except in compliance with the
> License.
> >     > + * You may obtain a copy of the License at:
> >     > + *
> >     > + *     http://www.apache.org/licenses/LICENSE-2.0 <
> http://www.apache.org/licenses/LICENSE-2.0>
> >     > + *
> >     > + * Unless required by applicable law or agreed to in writing,
> software
> >     > + * distributed under the License is distributed on an "AS IS"
> BASIS,
> >     > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
> or implied.
> >     > + * See the License for the specific language governing
> permissions and
> >     > + * limitations under the License.
> >     > + */
> >     > +
> >     > +#include <config.h>
> >     > +#include <stdbool.h>
> >     > +#include <stdint.h>
> >     > +#include <sys/types.h>
> >     > +#include <netinet/in.h>
> >     > +#include <netinet/icmp6.h>
> >     > +
> >     > +#include "ct-dpif.h"
> >     > +#include "ofp-ct-util.h"
> >     > +#include "openvswitch/dynamic-string.h"
> >     > +#include "openvswitch/ofp-parse.h"
> >     > +#include "openvswitch/ofp-util.h"
> >     > +#include "openvswitch/packets.h"
> >     > +
> >     > +static inline bool
> >     > +ofputil_ct_inet_addr_cmp_partial(const struct in6_addr *partial,
> >     > +                                 const union ct_dpif_inet_addr
> *addr,
> >     > +                                 const uint16_t l3_type)
> >     > +{
> >     > +    if (ipv6_is_zero(partial)) {
> >     > +        return true;
> >     > +    }
> >     > +
> >     > +    if (l3_type == AF_INET && in6_addr_get_mapped_ipv4(partial)
> != addr->ip) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (l3_type == AF_INET6 && !ipv6_addr_equals(partial,
> &addr->in6)) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    return true;
> >     > +}
> >     > +
> >     > +static inline bool
> >     > +ofputil_ct_tuple_ip_cmp_partial(const struct ofputil_ct_tuple
> *partial,
> >     > +                                const struct ct_dpif_tuple *tuple,
> >     > +                                const uint16_t l3_type, const
> uint8_t ip_proto)
> >     > +{
> >     > +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->src,
> >     > +                                          &tuple->src, l3_type)) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (!ofputil_ct_inet_addr_cmp_partial(&partial->dst,
> >     > +                                          &tuple->dst, l3_type)) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> >     > +        if (partial->icmp_id != tuple->icmp_id) {
> >     > +            return false;
> >     > +        }
> >     > +
> >     > +        if (partial->icmp_type != tuple->icmp_type) {
> >     > +            return false;
> >     > +        }
> >     > +
> >     > +        if (partial->icmp_code != tuple->icmp_code) {
> >     > +            return false;
> >     > +        }
> >     > +    } else {
> >     > +        if (partial->src_port && partial->src_port !=
> tuple->src_port) {
> >     > +            return false;
> >     > +        }
> >     > +
> >     > +        if (partial->dst_port && partial->dst_port !=
> tuple->dst_port) {
> >     > +            return false;
> >     > +        }
> >     > +    }
> >     > +
> >     > +    return true;
> >     > +}
> >     > +
> >     > +/* Compares the non-zero members if they match. This is useful
> for clearing
> >     > + * up all connections specified by a partial tuples for
> orig/reply. */
> >     > +bool
> >     > +ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
> >     > +                     const struct ct_dpif_entry *entry)
> >     > +{
> >     > +    if (match->l3_type && match->l3_type !=
> entry->tuple_orig.l3_type) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (match->ip_proto && match->ip_proto !=
> entry->tuple_orig.ip_proto) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_orig,
> >     > +                                         &entry->tuple_orig,
> >     > +                                         match->l3_type,
> match->ip_proto)) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_reply,
> >     > +                                         &entry->tuple_reply,
> >     > +                                         match->l3_type,
> match->ip_proto)) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    return true;
> >     > +}
> >     > +
> >     > +static void
> >     > +ofputil_ct_tuple_format(struct ds *ds, const struct
> ofputil_ct_tuple *tuple,
> >     > +                        uint8_t ip_proto)
> >     > +{
> >     > +    ds_put_cstr(ds, "src=");
> >     > +    ipv6_format_mapped(&tuple->src, ds);
> >     > +    ds_put_cstr(ds, ",dst=");
> >     > +    ipv6_format_mapped(&tuple->dst, ds);
> >     > +    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> >     > +        ds_put_format(ds, ",icmp_id=%u,icmp_type=%u,icmp_code=%u",
> >     > +                      ntohs(tuple->icmp_id), tuple->icmp_type,
> >     > +                      tuple->icmp_code);
> >     > +
> >     > +    } else {
> >     > +        ds_put_format(ds, ",src_port=%u,dst_port=%u",
> ntohs(tuple->src_port),
> >     > +                      ntohs(tuple->dst_port));
> >     > +    }
> >     > +}
> >     > +
> >     > +static bool
> >     > +ofputil_is_tuple_zero(const struct ofputil_ct_tuple *tuple)
> >     > +{
> >     > +    return ipv6_is_zero(&tuple->src) && ipv6_is_zero(&tuple->dst)
> &&
> >     > +           !tuple->src_port && !tuple->dst_port;
> >     > +}
> >     > +
> >     > +bool
> >     > +ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple
> *tuple,
> >     > +                               uint8_t ip_proto)
> >     > +{
> >     > +    /* First check if we have address. */
> >     > +    bool five_tuple = !ipv6_is_zero(&tuple->src) &&
> !ipv6_is_zero(&tuple->dst);
> >     > +
> >     > +    if (!(ip_proto == IPPROTO_ICMP || ip_proto ==
> IPPROTO_ICMPV6)) {
> >     > +        five_tuple = five_tuple && tuple->src_port &&
> tuple->dst_port;
> >     > +    }
> >     > +
> >     > +    return five_tuple;
> >     > +}
> >     > +
> >     > +bool
> >     > +ofputil_is_ct_match_zero(const struct ofputil_ct_match *match)
> >     > +{
> >     > +    return !match->ip_proto && !match->l3_type &&
> >     > +           ofputil_is_tuple_zero(&match->tuple_orig) &&
> >     > +           ofputil_is_tuple_zero(&match->tuple_reply);
> >     > +}
> >     > +
> >     > +void
> >     > +ofputil_ct_match_format(struct ds *ds, const struct
> ofputil_ct_match *match)
> >     > +{
> >     > +    ds_put_format(ds, " l3_type=%u,ip_proto=%u", match->l3_type,
> >     > +                  match->ip_proto);
> >     > +    ds_put_cstr(ds, ",orig=(");
> >     > +    ofputil_ct_tuple_format(ds, &match->tuple_orig,
> match->ip_proto);
> >     > +    ds_put_cstr(ds, "),reply=(");
> >     > +    ofputil_ct_tuple_format(ds, &match->tuple_reply,
> match->ip_proto);
> >     > +    ds_put_cstr(ds, ")");
> >
> >     This syntax is very different from what is used as cmdline arguments,
> >     that is not good.  This means you cannot feed back what OVS prints
> out.
> >
> >     Also, OVS usually uses 'nw_proto' instead of 'ip_proto', especially
> >     on the OpenFlow side of things.  And it might be better to print
> >     a string 'ip'/'ipv6' instead of 'l3_type', which is not great on its
> >     own.  See also comments on patch 2/2.
> >     They can also be combined into a single
> 'icmp'/'icmp6'/'udp'/'udp6'/etc.
> >     And there is already a function like this.
> >
> >     > +}
> >     > +
> >     > +static bool
> >     > +ofputil_ct_tuple_ip_parse(struct in6_addr *addr, char *value,
> uint16_t l3_type)
> >     > +{
> >     > +    if (!ipv6_is_zero(addr)) {
> >     > +        return false;
> >     > +    }
> >     > +
> >     > +    if (l3_type == AF_INET) {
> >     > +        ovs_be32 ip = 0;
> >     > +
> >     > +        ip_parse(value, &ip);
> >     > +        *addr = in6_addr_mapped_ipv4(ip);
> >     > +    } else {
> >     > +        ipv6_parse(value, addr);
> >
> >     Both ip_parse and ipv6_parse can fail.
> >     There is also lookup_any() function that can be exported and reused.
> >
> >     > +    }
> >     > +
> >     > +    return true;
> >     > +}
> >     > +
> >     > +/* Parses a specification of a conntrack 5-tuple from 's' into
> 'tuple'.
> >     > + * Returns true on success.  Otherwise, returns false and puts
> the error
> >     > + * message in 'ds'. */
> >     > +bool
> >     > +ofputil_ct_match_parse(struct ofputil_ct_match *match, const char
> *s,
> >     > +                       struct ds *ds)
> >     > +{
> >     > +    char *pos, *key, *value, *copy;
> >     > +
> >     > +    memset(match, 0, sizeof *match);
> >     > +    struct ofputil_ct_tuple *tuple = &match->tuple_orig;
> >     > +
> >     > +    pos = copy = xstrdup(s);
> >     > +    while (ofputil_parse_key_value(&pos, &key, &value)) {
> >     > +        if (!*value) {
> >     > +            ds_put_format(ds, "field %s missing value", key);
> >     > +            goto error_with_msg;
> >     > +        }
> >     > +
> >     > +        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")
> >     > +            || !strcmp(key, "ct_ipv6_src") || !strcmp(key,
> "ct_ipv6_dst")) {
> >     > +            match->l3_type = key[6] == '6' ? AF_INET6 : AF_INET;
> >
> >     Original code made sure that user didn't specify keys from
> >     different families.  This version does not.
> >
> >
> >     > +            uint8_t index = key[6] == '6' ? 8 : 6;
> >     > +            struct in6_addr *addr = key[index] == 's'
> >     > +                ? &tuple->src : &tuple->dst;
> >     > +
> >     > +            if (!ofputil_ct_tuple_ip_parse(addr, value,
> match->l3_type)) {
> >     > +                ds_put_format(ds, "%s is set multiple times",
> key);
> >     > +                goto error;
> >     > +            }
> >     > +        } else if (!strcmp(key, "ct_nw_proto")) {
> >     > +            char *err = str_to_u8(value, key, &match->ip_proto);
> >     > +
> >     > +            if (err) {
> >     > +                free(err);
> >     > +                goto error_with_msg;
> >     > +            }
> >     > +        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key,
> "ct_tp_dst")) {
> >     > +            uint16_t port;
> >     > +            char *err = str_to_u16(value, key, &port);
> >     > +
> >     > +            if (err) {
> >     > +                free(err);
> >     > +                goto error_with_msg;
> >     > +            }
> >     > +            if (key[6] == 's') {
> >     > +                tuple->src_port = htons(port);
> >     > +            } else {
> >     > +                tuple->dst_port = htons(port);
> >     > +            }
> >     > +        } else if (!strcmp(key, "icmp_type") || !strcmp(key,
> "icmp_code") ||
> >     > +                   !strcmp(key, "icmp_id")) {
> >     > +            if (match->ip_proto != IPPROTO_ICMP &&
> >     > +                match->ip_proto != IPPROTO_ICMPV6) {
> >     > +                ds_put_cstr(ds, "invalid L4 fields");
> >     > +                goto error;
> >     > +            }
> >     > +            uint16_t icmp_id;
> >     > +            char *err;
> >     > +
> >     > +            if (key[5] == 't') {
> >     > +                err = str_to_u8(value, key, &tuple->icmp_type);
> >     > +            } else if (key[5] == 'c') {
> >     > +                err = str_to_u8(value, key, &tuple->icmp_code);
> >     > +            } else {
> >     > +                err = str_to_u16(value, key, &icmp_id);
> >     > +                tuple->icmp_id = htons(icmp_id);
> >     > +            }
> >     > +            if (err) {
> >     > +                free(err);
> >     > +                goto error_with_msg;
> >     > +            }
> >     > +        } else {
> >     > +            ds_put_format(ds, "invalid conntrack tuple field:
> %s", key);
> >     > +            goto error;
> >     > +        }
> >     > +    }
> >     > +
> >     > +    if (!match->ip_proto && (tuple->src_port || tuple->dst_port))
> {
> >     > +        ds_put_cstr(ds, "port is set without protocol");
> >     > +        goto error;
> >     > +    }
> >     > +
> >     > +    /* For the filtering to work with icmp we need to fill the
> reply direction
> >     > +     * with correct information. */
> >     > +    if (match->ip_proto == IPPROTO_ICMP) {
> >     > +        switch (match->tuple_orig.icmp_type) {
> >     > +        case ICMP4_ECHO_REQUEST:
> >     > +            match->tuple_reply.icmp_type = ICMP4_ECHO_REPLY;
> >     > +            break;
> >
> >     An empty line after the 'break;'.
> >
> >     > +        case ICMP4_ECHO_REPLY:
> >     > +            match->tuple_reply.icmp_type = ICMP4_ECHO_REQUEST;
> >     > +            break;
> >     > +        case ICMP4_TIMESTAMP:
> >     > +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMPREPLY;
> >     > +            break;
> >     > +        case ICMP4_TIMESTAMPREPLY:
> >     > +            match->tuple_reply.icmp_type = ICMP4_TIMESTAMP;
> >     > +            break;
> >     > +        case ICMP4_INFOREQUEST:
> >     > +            match->tuple_reply.icmp_type = ICMP4_INFOREPLY;
> >     > +            break;
> >     > +        case ICMP4_INFOREPLY:
> >     > +            match->tuple_reply.icmp_type = ICMP4_INFOREQUEST;
> >     > +            break;
> >     > +        }
> >     > +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
> >     > +    } else if (match->ip_proto == IPPROTO_ICMPV6) {
> >     > +        switch (match->tuple_orig.icmp_type) {
> >     > +        case ICMP6_ECHO_REQUEST:
> >     > +            match->tuple_reply.icmp_type = ICMP6_ECHO_REPLY;
> >     > +            break;
> >     > +        case ICMP6_ECHO_REPLY:
> >     > +            match->tuple_reply.icmp_type = ICMP6_ECHO_REQUEST;
> >     > +            break;
> >     > +        }
> >     > +        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
> >     > +    }
> >     > +
> >     > +    free(copy);
> >     > +    return true;
> >     > +
> >     > +error_with_msg:
> >     > +    ds_put_format(ds, "failed to parse field %s", key);
> >     > +error:
> >     > +    free(copy);
> >     > +    return false;
> >     > +}
> >     > diff --git a/lib/ofp-ct-util.h b/lib/ofp-ct-util.h
> >     > new file mode 100644
> >     > index 000000000..5a5eaf920
> >     > --- /dev/null
> >     > +++ b/lib/ofp-ct-util.h
> >     > @@ -0,0 +1,36 @@
> >     > +/* Copyright (c) 2022, Red Hat, Inc.
> >     > + *
> >     > + * Licensed under the Apache License, Version 2.0 (the "License");
> >     > + * you may not use this file except in compliance with the
> License.
> >     > + * You may obtain a copy of the License at:
> >     > + *
> >     > + *     http://www.apache.org/licenses/LICENSE-2.0 <
> http://www.apache.org/licenses/LICENSE-2.0>
> >     > + *
> >     > + * Unless required by applicable law or agreed to in writing,
> software
> >     > + * distributed under the License is distributed on an "AS IS"
> BASIS,
> >     > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
> or implied.
> >     > + * See the License for the specific language governing
> permissions and
> >     > + * limitations under the License.
> >     > + */
> >     > +
> >     > +#ifndef OVS_OFP_CT_UTIL_H
> >     > +#define OVS_OFP_CT_UTIL_H
> >     > +
> >     > +#include "ct-dpif.h"
> >     > +#include "openvswitch/ofp-util.h"
> >     > +
> >     > +bool ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
> >     > +                                 const struct ct_dpif_entry
> *entry);
> >     > +
> >     > +bool ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple
> *tuple,
> >     > +                                    uint8_t ip_proto);
> >     > +
> >     > +void ofputil_ct_match_format(struct ds *ds,
> >     > +                             const struct ofputil_ct_match
> *match);
> >     > +
> >     > +bool ofputil_ct_match_parse(struct ofputil_ct_match *match, const
> char *s,
> >     > +                            struct ds *ds);
> >     > +
> >     > +bool ofputil_is_ct_match_zero(const struct ofputil_ct_match
> *match);
> >     > +
> >     > +#endif /* lib/ofp-ct-util.h */
> >     > diff --git a/tests/system-traffic.at <http://system-traffic.at>
> b/tests/system-traffic.at <http://system-traffic.at>
> >     > index e5403519f..51903a658 100644
> >     > --- a/tests/system-traffic.at <http://system-traffic.at>
> >     > +++ b/tests/system-traffic.at <http://system-traffic.at>
> >     > @@ -2230,7 +2230,7 @@
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.
> >     >  OVS_TRAFFIC_VSWITCHD_STOP
> >     >  AT_CLEANUP
> >     >
> >     > -AT_SETUP([conntrack - ct flush by 5-tuple])
> >     > +AT_SETUP([conntrack - ct flush])
> >     >  CHECK_CONNTRACK()
> >     >  OVS_TRAFFIC_VSWITCHD_START()
> >     >
> >     > @@ -2291,6 +2291,86 @@ AT_CHECK([ovs-appctl dpctl/flush-conntrack
> zone=5 $ICMP_TUPLE])
> >     >  AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [1], [dnl
> >     >  ])
> >     >
> >     > +dnl Test UDP from port 1 and 2, partial flush by src port
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> >     > +
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" |
> sort], [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_src=1'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_src=2'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [1])
> >     > +
> >     > +dnl Test UDP from port 1 and 2, partial flush by dst port
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> >     > +
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" |
> sort], [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_dst=2'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_proto=17,ct_tp_dst=1'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [1])
> >     > +
> >     > +dnl Test UDP from port 1 and 2, partial flush by src address
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> >     > +
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" |
> sort], [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.1'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [1])
> >     > +
> >     > +dnl Test UDP from port 1 and 2, partial flush by dst address
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> >     > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> >     > +
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" |
> sort], [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.2'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [0], [dnl
> >     >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> >     > +])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.1'])
> >     > +
> >     > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"],
> [1])
> >     > +
> >     >  OVS_TRAFFIC_VSWITCHD_STOP
> >     >  AT_CLEANUP
> >     >
> >
> >
> >
> > --
> >
> > Ales Musil
> >
> > Senior Software Engineer - OVN Core
> >
> > Red Hat EMEA <https://www.redhat.com>
> >
> > amusil@redhat.com <mailto:amusil@redhat.com>    IM: amusil
> >
> > <https://red.ht/sig>
> >
>
>
Ilya Maximets Jan. 11, 2023, 2:52 p.m. UTC | #5
On 1/11/23 14:09, Ales Musil wrote:
> 
> 
> On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> 
>     On 1/11/23 07:53, Ales Musil wrote:
>     >
>     >
>     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:i.maximets@ovn.org>>> wrote:
>     >
>     >     On 12/16/22 13:30, Ales Musil wrote:
>     >     > Currently, the CT can be flushed by dpctl only be specifying
>     >
>     >     *by
>     >
>     >     > the whole 5-tuple. This is not very convenient when there are
>     >     > only some fields known to the user of CT flush. Add new struct
>     >     > ofputil_ct_match which represents the generic filtering that can
>     >     > be done for CT flush. The match is done only on fields that are
>     >     > non-zero with exception to the icmp fields.
>     >     >
>     >     > This allows the filtering just within dpctl, however
>     >     > it is a preparation for OpenFlow extension.
>     >     >
>     >     > Reported-at: https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546>>
>     >     > Signed-off-by: Ales Musil <amusil@redhat.com <mailto:amusil@redhat.com> <mailto:amusil@redhat.com <mailto:amusil@redhat.com>>>
>     >     > Acked-by: Paolo Valerio <pvalerio@redhat.com <mailto:pvalerio@redhat.com> <mailto:pvalerio@redhat.com <mailto:pvalerio@redhat.com>>>
>     >     > ---
>     >     > v5: Add ack from Paolo.
>     >     > v4: Fix a flush all scenario.
>     >     > v3: Rebase on top of master.
>     >     >     Address the C99 comment and missing dpif_close call.
>     >     > v2: Rebase on top of master.
>     >     >     Address comments from Paolo.
>     >     > ---
>     >
>     >     Hi, Ales.  Thanks for working on this change.
>     >     Sorry for review taking so long.  I initially though there
>     >     will be mostly a few cosmetic/style changes needed.  But
>     >     longer I look at this code, more questions I have.  See below.
>     >
>     >     Dumitru, maybe you can also help answer some of them?
>     >
>     >     Best regards, Ilya Maximets.
>     >
>     >
>     > Hi Ilya,
>     >
>     > thank you for the review. Before proceeding I tried to answer your concerns/questions.
>     > Let me know what you think.
>     >
>     > Thanks,
>     > Ales
>     >  
>     >
>     >
>     >     >  NEWS                           |   2 +
>     >     >  include/openvswitch/ofp-util.h |  28 +++
>     >     >  lib/automake.mk <http://automake.mk> <http://automake.mk <http://automake.mk>>                |   2 +
>     >     >  lib/ct-dpif.c                  | 208 +++++++++------------
>     >     >  lib/ct-dpif.h                  |   4 +-
>     >     >  lib/dpctl.c                    |  42 ++---
>     >     >  lib/dpctl.man                  |   3 +-
>     >     >  lib/ofp-ct-util.c              | 326 +++++++++++++++++++++++++++++++++
>     >     >  lib/ofp-ct-util.h              |  36 ++++
>     >     >  tests/system-traffic.at <http://system-traffic.at> <http://system-traffic.at <http://system-traffic.at>>        |  82 ++++++++-
>     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
>     >     >  create mode 100644 lib/ofp-ct-util.c
>     >     >  create mode 100644 lib/ofp-ct-util.h
>     >     >
>     >     > diff --git a/NEWS b/NEWS
>     >     > index 265375e1c..ff8904b02 100644
>     >     > --- a/NEWS
>     >     > +++ b/NEWS
>     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
>     >     >       10 Gbps link speed by default in case the actual link speed cannot be
>     >     >       determined.  Previously it was 10 Mbps.  Values can still be overridden
>     >     >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
>     >     > +   - ovs-dpctl and related ovs-appctl commands:
>     >
>     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
>     >
>     >     > +     * "flush-conntrack" is capable of handling partial 5-tuple.
>     >     > 
>     >     > 
>     >     >  v3.0.0 - 15 Aug 2022
>     >     > diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
>     >     > index 091a09cad..84937ae26 100644
>     >     > --- a/include/openvswitch/ofp-util.h
>     >     > +++ b/include/openvswitch/ofp-util.h
>     >     > @@ -19,6 +19,9 @@
>     >     > 
>     >     >  #include <stdbool.h>
>     >     >  #include <stdint.h>
>     >     > +#include <sys/types.h>
>     >     > +#include <netinet/in.h>
>     >     > +
>     >     >  #include "openvswitch/ofp-protocol.h"
>     >     > 
>     >     >  struct ofp_header;
>     >     > @@ -27,6 +30,31 @@ struct ofp_header;
>     >     >  extern "C" {
>     >     >  #endif
>     >     > 
>     >     > +struct ofputil_ct_tuple {
>     >     > +    struct in6_addr src;
>     >     > +    struct in6_addr dst;
>     >     > +
>     >     > +    union {
>     >     > +        ovs_be16 src_port;
>     >     > +        ovs_be16 icmp_id;
>     >     > +    };
>     >     > +    union {
>     >     > +        ovs_be16 dst_port;
>     >     > +        struct {
>     >     > +            uint8_t icmp_code;
>     >     > +            uint8_t icmp_type;
>     >     > +        };
>     >     > +    };
>     >     > +};
>     >     > +
>     >     > +struct ofputil_ct_match {
>     >     > +    uint8_t ip_proto;
>     >     > +    uint16_t l3_type;
>     >     > +
>     >     > +    struct ofputil_ct_tuple tuple_orig;
>     >     > +    struct ofputil_ct_tuple tuple_reply;
>     >     > +};
>     >
>     >     These structures are very similar to existing ct_dpif_tuple.
>     >     We can keep them for now, but I think, we'll need to replace
>     >     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
>     >     We may need some changes in ofputil_ct_tuple for that though.
>     >
>     >
>     > I was thinking about replacing them, however it would require a much
>     > bigger change, which was out of scope of this RFE. Also I was not entirely
>     > sure if there wasn't any specific reason for the ct_dpif_tuple specifically
>     > the address part. It seems like optimization to me.
>     >
>     > Anyway it should be possible to unify those if needed.
>     >  
>     >
>     >
>     >
>     >     OTOH, I'm not sure I understand why we need both tuples here.
>     >     Can we have a tuple and a flag that indicates a direction
>     >     instead?
>     >
>     >
>     > That is also a possibility. However that wouldn't fit too that nicely in the
>     > whole match struct, we could probably duplicate the proto field and get rid of the
>     > l3_type. That way we could supply just two tuples instead, getting rid of the match completely.
> 
>     The point was to have only one tuple.  See below.
> 
>     >  
>     >
>     >
>     >     There are several issues with having both original and reply
>     >     tuples, IIUC:
>     >
>     >     1. None of the command line interfaces support specifying
>     >        reply tuple.
>     >
>     >
>     > That was the intention of the original command AFAICT, we could modify the commands
>     > to support that. However I'm not sure if there is any benefit in that.
> 
>     ovs-ofctl supposed to mimic the OpenFlow request, so it should
>     support everything that OpenFlow request can have.
> 
> 
> I see, sure that can definitely be done, however we should probably think how this can
> be done in a more universal way that could be used also for dpctl.
> 
> 
>     >  
>     >
>     >
>     >     2. If the reply tuple will be specified via OpenFlow, it
>     >        will be ignored if the original direction tuple is
>     >        fully specified.
>     >
>     >
>     > That is ok IMO, because a fully specified original should be enough to determine the one
>     > entry that needs to be erased. So the reply direction in that would be an extra.
> 
>     That supports the point that we don't need two tuples in the request.
> 
> 
> Right but only if we are able to supply the full original direction. See below.
>  
> 
> 
>     >  
>     >
>     >
>     >     3. Netfilter kernel interface allows specifying only one
>     >        tuple, not both, AFAICT.
>     >
>     >
>     > Right, that's the reason why, when we match, we use the original tuple to erase that
>     > entry.
>     >  
>     >
>     >
>     >     I'm not a conntrack expert, so I have a question: Is it possible
>     >     that addresses and ports in original and reply directions are
>     >     not mirrored copies?  Or do we care about these cases?
>     >
>     >
>     > Not sure if it's possible, but even if it would be, the current implementation
>     > would account for that, with the exception of specifying fully the original
>     > direction.
>     >  
>     >
>     >
>     >     If we want to flush all entries related to the same IP:PORT,
>     >     we will not be able to do that in a single OpenFlow request,
>     >     because we'll need to supply the same IP and PORT as src
>     >     and dst, or as src in both original and reply directions.
>     >     None of these can fit into a single request.
>     >
>     >
>     > Is that really a problem? You could do that in two, one for the original
>     > one for reply. This is also very similar to the interface supported by conntrack-tools,
>     > with those you couldn't remove all entries in a single call as well. We could modify
>     > the interface that would allow matching on tuple without direction specification,
>     > that would translate to both directions, however there is still the question if that's needed.
> 
> 
>     It's not a problem, but another supporting point for the interface
>     being redundant.  We don't seem to need two tuples in the request,
>     because users will have to send two different requests anyway.
> 
>     So, the request should have:
> 
>     1. zone
>     2. nw_proto
>     3. tuple
>     4. tuple type ('original' or 'reply'), in case we want to allow
>        users to flush based on reply tuple.  But this seems a bit
>        redundant as well.
> 
>     Something like:
> 
>     struct ofputil_ct_tuple {
>     }
> 
>     struct ofputil_ct_match {
>         uint8_t ip_proto;
>         uint16_t l3_type;
>         // enum DIRECTION direction;
>         struct ofputil_ct_tuple tuple;
>     };
> 
> 
>     And the OpenFlow part:
> 
>     /* CT flush available TLVs. */
>     enum nx_ct_flush_tlv_type {
>         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
>         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. */
>     };
> 
>     enum nx_ct_flush_tuple_type {
>         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4 address. */
>         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped IPv4 address. */
>         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
>         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
>         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
>         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
>         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
>         // NXT_CT_TUPLE_DIRECTION,   /* enum nx_ct_flush_tuple_direction: CT tuple direction. */
>     };
> 
>     //enum nx_ct_flush_tuple_direction {
>     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
>     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
>     //}
> 
>     struct nx_ct_flush {
>         uint8_t nw_proto;          /* IP protocol. */
>         uint8_t pad[7];            /* Align to 64-bit (must be zero). */
>         /* Followed by ... */
>     };
> 
> 
>     Such interface will also allow us in to have both direction specified
>     if we would want that in the future.  For now we should check that
>     each TLV type is specified once, i.e. only one tuple is in the request.
> 
>     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add it
>     in later releases if necessary.  Tuples without that TLV can be treated
>     as original direction.
> 
> 
> The original idea was something similar but as stated below that wouldn't really help
> us in the OVN case. And there is probably no significant benefit in having partial match
> with only single direction at time. I'm not sure if I'm not missing something obvious.
> 
> 
>     >  
>     >
>     >
>     >     At this point is there actually any value in flushing basing
>     >     on reply direction?  Shouldn't we be able to flush the same
>     >     entries by flushing by dst in original direction, if we will
>     >     need to issue multiple requests anyway?
>     >
>     >
>     > That assumption would be fine when you can specify a full tuple for the original direction.
>     > The problem is that for example OVN we know only the original DST_IP:DST_PORT (VIP)
>     > and the reply SRC_IP:SRC_PORT (backend).
> 
>     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
> 
> 
> Unfortunately no, it's DNATed so for example
> 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
> will have CT entries as follows:
> 
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
> 
> We cannot simply filter on the original direction, as we cannot infer the SRC part (without filtering all entries etc.).

OK.  This is what I was asking above and you said
'Not sure if it's possible'.  So, it is possible,
I see.

> 
>  
> 
> 
>     > That means we cannot fully specify the original direction
>     > without doing the filtering ourselves. It would be possible with extra steps from the user. If this interface
>     > is too generic/complicated for that purpose we could say that the flush extension supports only full orig tuple
>     > and leave the filtering up to the user. That would definitely work, it wouldn't be that elegant for the user however
>     > from the code change perspective much simpler.
> 
>     I'm not sure I understand that part.  I'm not arguing aginst partial
>     matches.  Or do you mean something else?
> 
>     >
>     >  I can see a certain benefit of keeping it in OvS:
>     >
>     > 1) The filtering complexity is hidden from the user.
>     > 2) The data transfer between OvS and the user is much smaller (even if you need to send two matches).
>     > 3) Once implemented it's just ready to be used without the need of additional thoughts going into it.
>     >
>     >
>     >  
>     >
>     >
>     >     Did you start working on the OVN side of things?  Which
>     >     commands OVN will actually use?
>     >
>     >
>     > Yes, I did. The plan is that OVN will use the extension.
> 
>     I understand that.  The question was how the actual request will
>     look like.
> 
> 
> The testing version used something like this:
> 
>     memset(match, 0, sizeof *match);
> 
>     struct ofputil_ct_tuple *orig = &match->tuple_orig;
>     struct ofputil_ct_tuple *reply = &match->tuple_reply;
> 
>     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
>         match->l3_type = AF_INET;
>     } else {
>         match->l3_type = AF_INET6;
>     }
> 
>     match->ip_proto = tuple->proto;
> 
>     orig->dst = tuple->vip_ip;
>     orig->dst_port = htons(tuple->vip_port);
> 
>     reply->src = tuple->backend_ip;
>     reply->src_port = htons(tuple->backend_port);
> 
> tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND, BACKEND_PORT, PROTO.
> match is the "struct ofputil_ct_match".
> 
> This is with older version of this RFE but it should illustrate the point.

Assuming the current v5 implementation of the extension, will the end
result of the request be different if we remove the assignment of
'orig=>dst' and 'orig->dst_port' from the code above?
Ales Musil Jan. 11, 2023, 3:09 p.m. UTC | #6
On Wed, Jan 11, 2023 at 3:52 PM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 1/11/23 14:09, Ales Musil wrote:
> >
> >
> > On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <i.maximets@ovn.org
> <mailto:i.maximets@ovn.org>> wrote:
> >
> >     On 1/11/23 07:53, Ales Musil wrote:
> >     >
> >     >
> >     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <i.maximets@ovn.org
> <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:
> i.maximets@ovn.org>>> wrote:
> >     >
> >     >     On 12/16/22 13:30, Ales Musil wrote:
> >     >     > Currently, the CT can be flushed by dpctl only be specifying
> >     >
> >     >     *by
> >     >
> >     >     > the whole 5-tuple. This is not very convenient when there are
> >     >     > only some fields known to the user of CT flush. Add new
> struct
> >     >     > ofputil_ct_match which represents the generic filtering that
> can
> >     >     > be done for CT flush. The match is done only on fields that
> are
> >     >     > non-zero with exception to the icmp fields.
> >     >     >
> >     >     > This allows the filtering just within dpctl, however
> >     >     > it is a preparation for OpenFlow extension.
> >     >     >
> >     >     > Reported-at: https://bugzilla.redhat.com/2120546 <
> https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546
> <https://bugzilla.redhat.com/2120546>>
> >     >     > Signed-off-by: Ales Musil <amusil@redhat.com <mailto:
> amusil@redhat.com> <mailto:amusil@redhat.com <mailto:amusil@redhat.com>>>
> >     >     > Acked-by: Paolo Valerio <pvalerio@redhat.com <mailto:
> pvalerio@redhat.com> <mailto:pvalerio@redhat.com <mailto:
> pvalerio@redhat.com>>>
> >     >     > ---
> >     >     > v5: Add ack from Paolo.
> >     >     > v4: Fix a flush all scenario.
> >     >     > v3: Rebase on top of master.
> >     >     >     Address the C99 comment and missing dpif_close call.
> >     >     > v2: Rebase on top of master.
> >     >     >     Address comments from Paolo.
> >     >     > ---
> >     >
> >     >     Hi, Ales.  Thanks for working on this change.
> >     >     Sorry for review taking so long.  I initially though there
> >     >     will be mostly a few cosmetic/style changes needed.  But
> >     >     longer I look at this code, more questions I have.  See below.
> >     >
> >     >     Dumitru, maybe you can also help answer some of them?
> >     >
> >     >     Best regards, Ilya Maximets.
> >     >
> >     >
> >     > Hi Ilya,
> >     >
> >     > thank you for the review. Before proceeding I tried to answer your
> concerns/questions.
> >     > Let me know what you think.
> >     >
> >     > Thanks,
> >     > Ales
> >     >
> >     >
> >     >
> >     >     >  NEWS                           |   2 +
> >     >     >  include/openvswitch/ofp-util.h |  28 +++
> >     >     >  lib/automake.mk <http://automake.mk> <http://automake.mk <
> http://automake.mk>>                |   2 +
> >     >     >  lib/ct-dpif.c                  | 208 +++++++++------------
> >     >     >  lib/ct-dpif.h                  |   4 +-
> >     >     >  lib/dpctl.c                    |  42 ++---
> >     >     >  lib/dpctl.man                  |   3 +-
> >     >     >  lib/ofp-ct-util.c              | 326
> +++++++++++++++++++++++++++++++++
> >     >     >  lib/ofp-ct-util.h              |  36 ++++
> >     >     >  tests/system-traffic.at <http://system-traffic.at> <
> http://system-traffic.at <http://system-traffic.at>>        |  82
> ++++++++-
> >     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
> >     >     >  create mode 100644 lib/ofp-ct-util.c
> >     >     >  create mode 100644 lib/ofp-ct-util.h
> >     >     >
> >     >     > diff --git a/NEWS b/NEWS
> >     >     > index 265375e1c..ff8904b02 100644
> >     >     > --- a/NEWS
> >     >     > +++ b/NEWS
> >     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
> >     >     >       10 Gbps link speed by default in case the actual link
> speed cannot be
> >     >     >       determined.  Previously it was 10 Mbps.  Values can
> still be overridden
> >     >     >       by specifying 'max-rate' or '[r]stp-path-cost'
> accordingly.
> >     >     > +   - ovs-dpctl and related ovs-appctl commands:
> >     >
> >     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
> >     >
> >     >     > +     * "flush-conntrack" is capable of handling partial
> 5-tuple.
> >     >     >
> >     >     >
> >     >     >  v3.0.0 - 15 Aug 2022
> >     >     > diff --git a/include/openvswitch/ofp-util.h
> b/include/openvswitch/ofp-util.h
> >     >     > index 091a09cad..84937ae26 100644
> >     >     > --- a/include/openvswitch/ofp-util.h
> >     >     > +++ b/include/openvswitch/ofp-util.h
> >     >     > @@ -19,6 +19,9 @@
> >     >     >
> >     >     >  #include <stdbool.h>
> >     >     >  #include <stdint.h>
> >     >     > +#include <sys/types.h>
> >     >     > +#include <netinet/in.h>
> >     >     > +
> >     >     >  #include "openvswitch/ofp-protocol.h"
> >     >     >
> >     >     >  struct ofp_header;
> >     >     > @@ -27,6 +30,31 @@ struct ofp_header;
> >     >     >  extern "C" {
> >     >     >  #endif
> >     >     >
> >     >     > +struct ofputil_ct_tuple {
> >     >     > +    struct in6_addr src;
> >     >     > +    struct in6_addr dst;
> >     >     > +
> >     >     > +    union {
> >     >     > +        ovs_be16 src_port;
> >     >     > +        ovs_be16 icmp_id;
> >     >     > +    };
> >     >     > +    union {
> >     >     > +        ovs_be16 dst_port;
> >     >     > +        struct {
> >     >     > +            uint8_t icmp_code;
> >     >     > +            uint8_t icmp_type;
> >     >     > +        };
> >     >     > +    };
> >     >     > +};
> >     >     > +
> >     >     > +struct ofputil_ct_match {
> >     >     > +    uint8_t ip_proto;
> >     >     > +    uint16_t l3_type;
> >     >     > +
> >     >     > +    struct ofputil_ct_tuple tuple_orig;
> >     >     > +    struct ofputil_ct_tuple tuple_reply;
> >     >     > +};
> >     >
> >     >     These structures are very similar to existing ct_dpif_tuple.
> >     >     We can keep them for now, but I think, we'll need to replace
> >     >     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
> >     >     We may need some changes in ofputil_ct_tuple for that though.
> >     >
> >     >
> >     > I was thinking about replacing them, however it would require a
> much
> >     > bigger change, which was out of scope of this RFE. Also I was not
> entirely
> >     > sure if there wasn't any specific reason for the ct_dpif_tuple
> specifically
> >     > the address part. It seems like optimization to me.
> >     >
> >     > Anyway it should be possible to unify those if needed.
> >     >
> >     >
> >     >
> >     >
> >     >     OTOH, I'm not sure I understand why we need both tuples here.
> >     >     Can we have a tuple and a flag that indicates a direction
> >     >     instead?
> >     >
> >     >
> >     > That is also a possibility. However that wouldn't fit too that
> nicely in the
> >     > whole match struct, we could probably duplicate the proto field
> and get rid of the
> >     > l3_type. That way we could supply just two tuples instead, getting
> rid of the match completely.
> >
> >     The point was to have only one tuple.  See below.
> >
> >     >
> >     >
> >     >
> >     >     There are several issues with having both original and reply
> >     >     tuples, IIUC:
> >     >
> >     >     1. None of the command line interfaces support specifying
> >     >        reply tuple.
> >     >
> >     >
> >     > That was the intention of the original command AFAICT, we could
> modify the commands
> >     > to support that. However I'm not sure if there is any benefit in
> that.
> >
> >     ovs-ofctl supposed to mimic the OpenFlow request, so it should
> >     support everything that OpenFlow request can have.
> >
> >
> > I see, sure that can definitely be done, however we should probably
> think how this can
> > be done in a more universal way that could be used also for dpctl.
> >
> >
> >     >
> >     >
> >     >
> >     >     2. If the reply tuple will be specified via OpenFlow, it
> >     >        will be ignored if the original direction tuple is
> >     >        fully specified.
> >     >
> >     >
> >     > That is ok IMO, because a fully specified original should be
> enough to determine the one
> >     > entry that needs to be erased. So the reply direction in that
> would be an extra.
> >
> >     That supports the point that we don't need two tuples in the request.
> >
> >
> > Right but only if we are able to supply the full original direction. See
> below.
> >
> >
> >
> >     >
> >     >
> >     >
> >     >     3. Netfilter kernel interface allows specifying only one
> >     >        tuple, not both, AFAICT.
> >     >
> >     >
> >     > Right, that's the reason why, when we match, we use the original
> tuple to erase that
> >     > entry.
> >     >
> >     >
> >     >
> >     >     I'm not a conntrack expert, so I have a question: Is it
> possible
> >     >     that addresses and ports in original and reply directions are
> >     >     not mirrored copies?  Or do we care about these cases?
> >     >
> >     >
> >     > Not sure if it's possible, but even if it would be, the current
> implementation
> >     > would account for that, with the exception of specifying fully the
> original
> >     > direction.
> >     >
> >     >
> >     >
> >     >     If we want to flush all entries related to the same IP:PORT,
> >     >     we will not be able to do that in a single OpenFlow request,
> >     >     because we'll need to supply the same IP and PORT as src
> >     >     and dst, or as src in both original and reply directions.
> >     >     None of these can fit into a single request.
> >     >
> >     >
> >     > Is that really a problem? You could do that in two, one for the
> original
> >     > one for reply. This is also very similar to the interface
> supported by conntrack-tools,
> >     > with those you couldn't remove all entries in a single call as
> well. We could modify
> >     > the interface that would allow matching on tuple without direction
> specification,
> >     > that would translate to both directions, however there is still
> the question if that's needed.
> >
> >
> >     It's not a problem, but another supporting point for the interface
> >     being redundant.  We don't seem to need two tuples in the request,
> >     because users will have to send two different requests anyway.
> >
> >     So, the request should have:
> >
> >     1. zone
> >     2. nw_proto
> >     3. tuple
> >     4. tuple type ('original' or 'reply'), in case we want to allow
> >        users to flush based on reply tuple.  But this seems a bit
> >        redundant as well.
> >
> >     Something like:
> >
> >     struct ofputil_ct_tuple {
> >     }
> >
> >     struct ofputil_ct_match {
> >         uint8_t ip_proto;
> >         uint16_t l3_type;
> >         // enum DIRECTION direction;
> >         struct ofputil_ct_tuple tuple;
> >     };
> >
> >
> >     And the OpenFlow part:
> >
> >     /* CT flush available TLVs. */
> >     enum nx_ct_flush_tlv_type {
> >         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
> >         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. */
> >     };
> >
> >     enum nx_ct_flush_tuple_type {
> >         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4
> address. */
> >         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped
> IPv4 address. */
> >         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
> >         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
> >         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
> >         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
> >         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
> >         // NXT_CT_TUPLE_DIRECTION,   /* enum
> nx_ct_flush_tuple_direction: CT tuple direction. */
> >     };
> >
> >     //enum nx_ct_flush_tuple_direction {
> >     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
> >     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
> >     //}
> >
> >     struct nx_ct_flush {
> >         uint8_t nw_proto;          /* IP protocol. */
> >         uint8_t pad[7];            /* Align to 64-bit (must be zero). */
> >         /* Followed by ... */
> >     };
> >
> >
> >     Such interface will also allow us in to have both direction specified
> >     if we would want that in the future.  For now we should check that
> >     each TLV type is specified once, i.e. only one tuple is in the
> request.
> >
> >     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add it
> >     in later releases if necessary.  Tuples without that TLV can be
> treated
> >     as original direction.
> >
> >
> > The original idea was something similar but as stated below that
> wouldn't really help
> > us in the OVN case. And there is probably no significant benefit in
> having partial match
> > with only single direction at time. I'm not sure if I'm not missing
> something obvious.
> >
> >
> >     >
> >     >
> >     >
> >     >     At this point is there actually any value in flushing basing
> >     >     on reply direction?  Shouldn't we be able to flush the same
> >     >     entries by flushing by dst in original direction, if we will
> >     >     need to issue multiple requests anyway?
> >     >
> >     >
> >     > That assumption would be fine when you can specify a full tuple
> for the original direction.
> >     > The problem is that for example OVN we know only the original
> DST_IP:DST_PORT (VIP)
> >     > and the reply SRC_IP:SRC_PORT (backend).
> >
> >     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
> >
> >
> > Unfortunately no, it's DNATed so for example
> > 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
> > will have CT entries as follows:
> >
> >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
> >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
> >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
> >
> > We cannot simply filter on the original direction, as we cannot infer
> the SRC part (without filtering all entries etc.).
>
> OK.  This is what I was asking above and you said
> 'Not sure if it's possible'.  So, it is possible,
> I see.
>

Yeah my bad I did not connect the dots right away.


>
> >
> >
> >
> >
> >     > That means we cannot fully specify the original direction
> >     > without doing the filtering ourselves. It would be possible with
> extra steps from the user. If this interface
> >     > is too generic/complicated for that purpose we could say that the
> flush extension supports only full orig tuple
> >     > and leave the filtering up to the user. That would definitely
> work, it wouldn't be that elegant for the user however
> >     > from the code change perspective much simpler.
> >
> >     I'm not sure I understand that part.  I'm not arguing aginst partial
> >     matches.  Or do you mean something else?
> >
> >     >
> >     >  I can see a certain benefit of keeping it in OvS:
> >     >
> >     > 1) The filtering complexity is hidden from the user.
> >     > 2) The data transfer between OvS and the user is much smaller
> (even if you need to send two matches).
> >     > 3) Once implemented it's just ready to be used without the need of
> additional thoughts going into it.
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >     Did you start working on the OVN side of things?  Which
> >     >     commands OVN will actually use?
> >     >
> >     >
> >     > Yes, I did. The plan is that OVN will use the extension.
> >
> >     I understand that.  The question was how the actual request will
> >     look like.
> >
> >
> > The testing version used something like this:
> >
> >     memset(match, 0, sizeof *match);
> >
> >     struct ofputil_ct_tuple *orig = &match->tuple_orig;
> >     struct ofputil_ct_tuple *reply = &match->tuple_reply;
> >
> >     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
> >         match->l3_type = AF_INET;
> >     } else {
> >         match->l3_type = AF_INET6;
> >     }
> >
> >     match->ip_proto = tuple->proto;
> >
> >     orig->dst = tuple->vip_ip;
> >     orig->dst_port = htons(tuple->vip_port);
> >
> >     reply->src = tuple->backend_ip;
> >     reply->src_port = htons(tuple->backend_port);
> >
> > tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND,
> BACKEND_PORT, PROTO.
> > match is the "struct ofputil_ct_match".
> >
> > This is with older version of this RFE but it should illustrate the
> point.
>
> Assuming the current v5 implementation of the extension, will the end
> result of the request be different if we remove the assignment of
> 'orig=>dst' and 'orig->dst_port' from the code above?
>
>
It might delete more than we actually want to. Without specifying the LBs
5-tuple
we might always remove more entries than we should. As we can have multiple
backends
for the same VIP (that's the whole point of LB), however nothing prevents
us from two VIPs
pointing to the same backend. Which would cause an issue if we will go only
with the reply part.
Ilya Maximets Jan. 11, 2023, 3:27 p.m. UTC | #7
On 1/11/23 16:09, Ales Musil wrote:
> 
> 
> On Wed, Jan 11, 2023 at 3:52 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> 
>     On 1/11/23 14:09, Ales Musil wrote:
>     >
>     >
>     > On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:i.maximets@ovn.org>>> wrote:
>     >
>     >     On 1/11/23 07:53, Ales Musil wrote:
>     >     >
>     >     >
>     >     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:i.maximets@ovn.org>> <mailto:i.maximets@ovn.org <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:i.maximets@ovn.org>>>> wrote:
>     >     >
>     >     >     On 12/16/22 13:30, Ales Musil wrote:
>     >     >     > Currently, the CT can be flushed by dpctl only be specifying
>     >     >
>     >     >     *by
>     >     >
>     >     >     > the whole 5-tuple. This is not very convenient when there are
>     >     >     > only some fields known to the user of CT flush. Add new struct
>     >     >     > ofputil_ct_match which represents the generic filtering that can
>     >     >     > be done for CT flush. The match is done only on fields that are
>     >     >     > non-zero with exception to the icmp fields.
>     >     >     >
>     >     >     > This allows the filtering just within dpctl, however
>     >     >     > it is a preparation for OpenFlow extension.
>     >     >     >
>     >     >     > Reported-at: https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546>> <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546>>>
>     >     >     > Signed-off-by: Ales Musil <amusil@redhat.com <mailto:amusil@redhat.com> <mailto:amusil@redhat.com <mailto:amusil@redhat.com>> <mailto:amusil@redhat.com <mailto:amusil@redhat.com> <mailto:amusil@redhat.com <mailto:amusil@redhat.com>>>>
>     >     >     > Acked-by: Paolo Valerio <pvalerio@redhat.com <mailto:pvalerio@redhat.com> <mailto:pvalerio@redhat.com <mailto:pvalerio@redhat.com>> <mailto:pvalerio@redhat.com <mailto:pvalerio@redhat.com> <mailto:pvalerio@redhat.com <mailto:pvalerio@redhat.com>>>>
>     >     >     > ---
>     >     >     > v5: Add ack from Paolo.
>     >     >     > v4: Fix a flush all scenario.
>     >     >     > v3: Rebase on top of master.
>     >     >     >     Address the C99 comment and missing dpif_close call.
>     >     >     > v2: Rebase on top of master.
>     >     >     >     Address comments from Paolo.
>     >     >     > ---
>     >     >
>     >     >     Hi, Ales.  Thanks for working on this change.
>     >     >     Sorry for review taking so long.  I initially though there
>     >     >     will be mostly a few cosmetic/style changes needed.  But
>     >     >     longer I look at this code, more questions I have.  See below.
>     >     >
>     >     >     Dumitru, maybe you can also help answer some of them?
>     >     >
>     >     >     Best regards, Ilya Maximets.
>     >     >
>     >     >
>     >     > Hi Ilya,
>     >     >
>     >     > thank you for the review. Before proceeding I tried to answer your concerns/questions.
>     >     > Let me know what you think.
>     >     >
>     >     > Thanks,
>     >     > Ales
>     >     >  
>     >     >
>     >     >
>     >     >     >  NEWS                           |   2 +
>     >     >     >  include/openvswitch/ofp-util.h |  28 +++
>     >     >     >  lib/automake.mk <http://automake.mk> <http://automake.mk <http://automake.mk>> <http://automake.mk <http://automake.mk> <http://automake.mk <http://automake.mk>>>                |   2 +
>     >     >     >  lib/ct-dpif.c                  | 208 +++++++++------------
>     >     >     >  lib/ct-dpif.h                  |   4 +-
>     >     >     >  lib/dpctl.c                    |  42 ++---
>     >     >     >  lib/dpctl.man                  |   3 +-
>     >     >     >  lib/ofp-ct-util.c              | 326 +++++++++++++++++++++++++++++++++
>     >     >     >  lib/ofp-ct-util.h              |  36 ++++
>     >     >     >  tests/system-traffic.at <http://system-traffic.at> <http://system-traffic.at <http://system-traffic.at>> <http://system-traffic.at <http://system-traffic.at> <http://system-traffic.at <http://system-traffic.at>>>        |  82 ++++++++-
>     >     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
>     >     >     >  create mode 100644 lib/ofp-ct-util.c
>     >     >     >  create mode 100644 lib/ofp-ct-util.h
>     >     >     >
>     >     >     > diff --git a/NEWS b/NEWS
>     >     >     > index 265375e1c..ff8904b02 100644
>     >     >     > --- a/NEWS
>     >     >     > +++ b/NEWS
>     >     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
>     >     >     >       10 Gbps link speed by default in case the actual link speed cannot be
>     >     >     >       determined.  Previously it was 10 Mbps.  Values can still be overridden
>     >     >     >       by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
>     >     >     > +   - ovs-dpctl and related ovs-appctl commands:
>     >     >
>     >     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better header.
>     >     >
>     >     >     > +     * "flush-conntrack" is capable of handling partial 5-tuple.
>     >     >     > 
>     >     >     > 
>     >     >     >  v3.0.0 - 15 Aug 2022
>     >     >     > diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
>     >     >     > index 091a09cad..84937ae26 100644
>     >     >     > --- a/include/openvswitch/ofp-util.h
>     >     >     > +++ b/include/openvswitch/ofp-util.h
>     >     >     > @@ -19,6 +19,9 @@
>     >     >     > 
>     >     >     >  #include <stdbool.h>
>     >     >     >  #include <stdint.h>
>     >     >     > +#include <sys/types.h>
>     >     >     > +#include <netinet/in.h>
>     >     >     > +
>     >     >     >  #include "openvswitch/ofp-protocol.h"
>     >     >     > 
>     >     >     >  struct ofp_header;
>     >     >     > @@ -27,6 +30,31 @@ struct ofp_header;
>     >     >     >  extern "C" {
>     >     >     >  #endif
>     >     >     > 
>     >     >     > +struct ofputil_ct_tuple {
>     >     >     > +    struct in6_addr src;
>     >     >     > +    struct in6_addr dst;
>     >     >     > +
>     >     >     > +    union {
>     >     >     > +        ovs_be16 src_port;
>     >     >     > +        ovs_be16 icmp_id;
>     >     >     > +    };
>     >     >     > +    union {
>     >     >     > +        ovs_be16 dst_port;
>     >     >     > +        struct {
>     >     >     > +            uint8_t icmp_code;
>     >     >     > +            uint8_t icmp_type;
>     >     >     > +        };
>     >     >     > +    };
>     >     >     > +};
>     >     >     > +
>     >     >     > +struct ofputil_ct_match {
>     >     >     > +    uint8_t ip_proto;
>     >     >     > +    uint16_t l3_type;
>     >     >     > +
>     >     >     > +    struct ofputil_ct_tuple tuple_orig;
>     >     >     > +    struct ofputil_ct_tuple tuple_reply;
>     >     >     > +};
>     >     >
>     >     >     These structures are very similar to existing ct_dpif_tuple.
>     >     >     We can keep them for now, but I think, we'll need to replace
>     >     >     ct_dpif_tuple with the new ofputil_ct_tuple in the future.
>     >     >     We may need some changes in ofputil_ct_tuple for that though.
>     >     >
>     >     >
>     >     > I was thinking about replacing them, however it would require a much
>     >     > bigger change, which was out of scope of this RFE. Also I was not entirely
>     >     > sure if there wasn't any specific reason for the ct_dpif_tuple specifically
>     >     > the address part. It seems like optimization to me.
>     >     >
>     >     > Anyway it should be possible to unify those if needed.
>     >     >  
>     >     >
>     >     >
>     >     >
>     >     >     OTOH, I'm not sure I understand why we need both tuples here.
>     >     >     Can we have a tuple and a flag that indicates a direction
>     >     >     instead?
>     >     >
>     >     >
>     >     > That is also a possibility. However that wouldn't fit too that nicely in the
>     >     > whole match struct, we could probably duplicate the proto field and get rid of the
>     >     > l3_type. That way we could supply just two tuples instead, getting rid of the match completely.
>     >
>     >     The point was to have only one tuple.  See below.
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     There are several issues with having both original and reply
>     >     >     tuples, IIUC:
>     >     >
>     >     >     1. None of the command line interfaces support specifying
>     >     >        reply tuple.
>     >     >
>     >     >
>     >     > That was the intention of the original command AFAICT, we could modify the commands
>     >     > to support that. However I'm not sure if there is any benefit in that.
>     >
>     >     ovs-ofctl supposed to mimic the OpenFlow request, so it should
>     >     support everything that OpenFlow request can have.
>     >
>     >
>     > I see, sure that can definitely be done, however we should probably think how this can
>     > be done in a more universal way that could be used also for dpctl.
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     2. If the reply tuple will be specified via OpenFlow, it
>     >     >        will be ignored if the original direction tuple is
>     >     >        fully specified.
>     >     >
>     >     >
>     >     > That is ok IMO, because a fully specified original should be enough to determine the one
>     >     > entry that needs to be erased. So the reply direction in that would be an extra.
>     >
>     >     That supports the point that we don't need two tuples in the request.
>     >
>     >
>     > Right but only if we are able to supply the full original direction. See below.
>     >  
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     3. Netfilter kernel interface allows specifying only one
>     >     >        tuple, not both, AFAICT.
>     >     >
>     >     >
>     >     > Right, that's the reason why, when we match, we use the original tuple to erase that
>     >     > entry.
>     >     >  
>     >     >
>     >     >
>     >     >     I'm not a conntrack expert, so I have a question: Is it possible
>     >     >     that addresses and ports in original and reply directions are
>     >     >     not mirrored copies?  Or do we care about these cases?
>     >     >
>     >     >
>     >     > Not sure if it's possible, but even if it would be, the current implementation
>     >     > would account for that, with the exception of specifying fully the original
>     >     > direction.
>     >     >  
>     >     >
>     >     >
>     >     >     If we want to flush all entries related to the same IP:PORT,
>     >     >     we will not be able to do that in a single OpenFlow request,
>     >     >     because we'll need to supply the same IP and PORT as src
>     >     >     and dst, or as src in both original and reply directions.
>     >     >     None of these can fit into a single request.
>     >     >
>     >     >
>     >     > Is that really a problem? You could do that in two, one for the original
>     >     > one for reply. This is also very similar to the interface supported by conntrack-tools,
>     >     > with those you couldn't remove all entries in a single call as well. We could modify
>     >     > the interface that would allow matching on tuple without direction specification,
>     >     > that would translate to both directions, however there is still the question if that's needed.
>     >
>     >
>     >     It's not a problem, but another supporting point for the interface
>     >     being redundant.  We don't seem to need two tuples in the request,
>     >     because users will have to send two different requests anyway.
>     >
>     >     So, the request should have:
>     >
>     >     1. zone
>     >     2. nw_proto
>     >     3. tuple
>     >     4. tuple type ('original' or 'reply'), in case we want to allow
>     >        users to flush based on reply tuple.  But this seems a bit
>     >        redundant as well.
>     >
>     >     Something like:
>     >
>     >     struct ofputil_ct_tuple {
>     >     }
>     >
>     >     struct ofputil_ct_match {
>     >         uint8_t ip_proto;
>     >         uint16_t l3_type;
>     >         // enum DIRECTION direction;
>     >         struct ofputil_ct_tuple tuple;
>     >     };
>     >
>     >
>     >     And the OpenFlow part:
>     >
>     >     /* CT flush available TLVs. */
>     >     enum nx_ct_flush_tlv_type {
>     >         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
>     >         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_* attributes. */
>     >     };
>     >
>     >     enum nx_ct_flush_tuple_type {
>     >         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped IPv4 address. */
>     >         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or mapped IPv4 address. */
>     >         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
>     >         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
>     >         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
>     >         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
>     >         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
>     >         // NXT_CT_TUPLE_DIRECTION,   /* enum nx_ct_flush_tuple_direction: CT tuple direction. */
>     >     };
>     >
>     >     //enum nx_ct_flush_tuple_direction {
>     >     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
>     >     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
>     >     //}
>     >
>     >     struct nx_ct_flush {
>     >         uint8_t nw_proto;          /* IP protocol. */
>     >         uint8_t pad[7];            /* Align to 64-bit (must be zero). */
>     >         /* Followed by ... */
>     >     };
>     >
>     >
>     >     Such interface will also allow us in to have both direction specified
>     >     if we would want that in the future.  For now we should check that
>     >     each TLV type is specified once, i.e. only one tuple is in the request.
>     >
>     >     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can add it
>     >     in later releases if necessary.  Tuples without that TLV can be treated
>     >     as original direction.
>     >
>     >
>     > The original idea was something similar but as stated below that wouldn't really help
>     > us in the OVN case. And there is probably no significant benefit in having partial match
>     > with only single direction at time. I'm not sure if I'm not missing something obvious.
>     >
>     >
>     >     >  
>     >     >
>     >     >
>     >     >     At this point is there actually any value in flushing basing
>     >     >     on reply direction?  Shouldn't we be able to flush the same
>     >     >     entries by flushing by dst in original direction, if we will
>     >     >     need to issue multiple requests anyway?
>     >     >
>     >     >
>     >     > That assumption would be fine when you can specify a full tuple for the original direction.
>     >     > The problem is that for example OVN we know only the original DST_IP:DST_PORT (VIP)
>     >     > and the reply SRC_IP:SRC_PORT (backend).
>     >
>     >     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
>     >
>     >
>     > Unfortunately no, it's DNATed so for example
>     > 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
>     > will have CT entries as follows:
>     >
>     > tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
>     > tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
>     > tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
>     >
>     > We cannot simply filter on the original direction, as we cannot infer the SRC part (without filtering all entries etc.).
> 
>     OK.  This is what I was asking above and you said
>     'Not sure if it's possible'.  So, it is possible,
>     I see.
> 
> 
> Yeah my bad I did not connect the dots right away.
>  
> 
> 
>     >
>     >  
>     >
>     >
>     >     > That means we cannot fully specify the original direction
>     >     > without doing the filtering ourselves. It would be possible with extra steps from the user. If this interface
>     >     > is too generic/complicated for that purpose we could say that the flush extension supports only full orig tuple
>     >     > and leave the filtering up to the user. That would definitely work, it wouldn't be that elegant for the user however
>     >     > from the code change perspective much simpler.
>     >
>     >     I'm not sure I understand that part.  I'm not arguing aginst partial
>     >     matches.  Or do you mean something else?
>     >
>     >     >
>     >     >  I can see a certain benefit of keeping it in OvS:
>     >     >
>     >     > 1) The filtering complexity is hidden from the user.
>     >     > 2) The data transfer between OvS and the user is much smaller (even if you need to send two matches).
>     >     > 3) Once implemented it's just ready to be used without the need of additional thoughts going into it.
>     >     >
>     >     >
>     >     >  
>     >     >
>     >     >
>     >     >     Did you start working on the OVN side of things?  Which
>     >     >     commands OVN will actually use?
>     >     >
>     >     >
>     >     > Yes, I did. The plan is that OVN will use the extension.
>     >
>     >     I understand that.  The question was how the actual request will
>     >     look like.
>     >
>     >
>     > The testing version used something like this:
>     >
>     >     memset(match, 0, sizeof *match);
>     >
>     >     struct ofputil_ct_tuple *orig = &match->tuple_orig;
>     >     struct ofputil_ct_tuple *reply = &match->tuple_reply;
>     >
>     >     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
>     >         match->l3_type = AF_INET;
>     >     } else {
>     >         match->l3_type = AF_INET6;
>     >     }
>     >
>     >     match->ip_proto = tuple->proto;
>     >
>     >     orig->dst = tuple->vip_ip;
>     >     orig->dst_port = htons(tuple->vip_port);
>     >
>     >     reply->src = tuple->backend_ip;
>     >     reply->src_port = htons(tuple->backend_port);
>     >
>     > tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND, BACKEND_PORT, PROTO.
>     > match is the "struct ofputil_ct_match".
>     >
>     > This is with older version of this RFE but it should illustrate the point.
> 
>     Assuming the current v5 implementation of the extension, will the end
>     result of the request be different if we remove the assignment of
>     'orig=>dst' and 'orig->dst_port' from the code above?
> 
> 
> It might delete more than we actually want to. Without specifying the LBs 5-tuple
> we might always remove more entries than we should. As we can have multiple backends
> for the same VIP (that's the whole point of LB), however nothing prevents us from two VIPs
> pointing to the same backend. Which would cause an issue if we will go only with the reply part.


OK.  That makes sense.  Let's keep both tuples then.

But we need to allow ovs-ofctl to accept both tuples.

And we should use a "shortcut" only if reply tuple is not specified,
because if I'm asking to delete:

 orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=1)

I'm not expecting the following entry to be deleted:

 orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1)

Even though these entries probably cannot exist at the same time,
we should not delete the entry that we wasn't asked to delete.

Best regards, Ilya Maximets.
Ales Musil Jan. 11, 2023, 3:31 p.m. UTC | #8
On Wed, Jan 11, 2023 at 4:27 PM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 1/11/23 16:09, Ales Musil wrote:
> >
> >
> > On Wed, Jan 11, 2023 at 3:52 PM Ilya Maximets <i.maximets@ovn.org
> <mailto:i.maximets@ovn.org>> wrote:
> >
> >     On 1/11/23 14:09, Ales Musil wrote:
> >     >
> >     >
> >     > On Wed, Jan 11, 2023 at 1:34 PM Ilya Maximets <i.maximets@ovn.org
> <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:
> i.maximets@ovn.org>>> wrote:
> >     >
> >     >     On 1/11/23 07:53, Ales Musil wrote:
> >     >     >
> >     >     >
> >     >     > On Tue, Jan 10, 2023 at 11:08 PM Ilya Maximets <
> i.maximets@ovn.org <mailto:i.maximets@ovn.org> <mailto:i.maximets@ovn.org
> <mailto:i.maximets@ovn.org>> <mailto:i.maximets@ovn.org <mailto:
> i.maximets@ovn.org> <mailto:i.maximets@ovn.org <mailto:i.maximets@ovn.org>>>>
> wrote:
> >     >     >
> >     >     >     On 12/16/22 13:30, Ales Musil wrote:
> >     >     >     > Currently, the CT can be flushed by dpctl only be
> specifying
> >     >     >
> >     >     >     *by
> >     >     >
> >     >     >     > the whole 5-tuple. This is not very convenient when
> there are
> >     >     >     > only some fields known to the user of CT flush. Add
> new struct
> >     >     >     > ofputil_ct_match which represents the generic
> filtering that can
> >     >     >     > be done for CT flush. The match is done only on fields
> that are
> >     >     >     > non-zero with exception to the icmp fields.
> >     >     >     >
> >     >     >     > This allows the filtering just within dpctl, however
> >     >     >     > it is a preparation for OpenFlow extension.
> >     >     >     >
> >     >     >     > Reported-at: https://bugzilla.redhat.com/2120546 <
> https://bugzilla.redhat.com/2120546> <https://bugzilla.redhat.com/2120546
> <https://bugzilla.redhat.com/2120546>> <
> https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546>
> <https://bugzilla.redhat.com/2120546 <https://bugzilla.redhat.com/2120546
> >>>
> >     >     >     > Signed-off-by: Ales Musil <amusil@redhat.com <mailto:
> amusil@redhat.com> <mailto:amusil@redhat.com <mailto:amusil@redhat.com>>
> <mailto:amusil@redhat.com <mailto:amusil@redhat.com> <mailto:
> amusil@redhat.com <mailto:amusil@redhat.com>>>>
> >     >     >     > Acked-by: Paolo Valerio <pvalerio@redhat.com <mailto:
> pvalerio@redhat.com> <mailto:pvalerio@redhat.com <mailto:
> pvalerio@redhat.com>> <mailto:pvalerio@redhat.com <mailto:
> pvalerio@redhat.com> <mailto:pvalerio@redhat.com <mailto:
> pvalerio@redhat.com>>>>
> >     >     >     > ---
> >     >     >     > v5: Add ack from Paolo.
> >     >     >     > v4: Fix a flush all scenario.
> >     >     >     > v3: Rebase on top of master.
> >     >     >     >     Address the C99 comment and missing dpif_close
> call.
> >     >     >     > v2: Rebase on top of master.
> >     >     >     >     Address comments from Paolo.
> >     >     >     > ---
> >     >     >
> >     >     >     Hi, Ales.  Thanks for working on this change.
> >     >     >     Sorry for review taking so long.  I initially though
> there
> >     >     >     will be mostly a few cosmetic/style changes needed.  But
> >     >     >     longer I look at this code, more questions I have.  See
> below.
> >     >     >
> >     >     >     Dumitru, maybe you can also help answer some of them?
> >     >     >
> >     >     >     Best regards, Ilya Maximets.
> >     >     >
> >     >     >
> >     >     > Hi Ilya,
> >     >     >
> >     >     > thank you for the review. Before proceeding I tried to
> answer your concerns/questions.
> >     >     > Let me know what you think.
> >     >     >
> >     >     > Thanks,
> >     >     > Ales
> >     >     >
> >     >     >
> >     >     >
> >     >     >     >  NEWS                           |   2 +
> >     >     >     >  include/openvswitch/ofp-util.h |  28 +++
> >     >     >     >  lib/automake.mk <http://automake.mk> <
> http://automake.mk <http://automake.mk>> <http://automake.mk <
> http://automake.mk> <http://automake.mk <http://automake.mk>>>
>     |   2 +
> >     >     >     >  lib/ct-dpif.c                  | 208
> +++++++++------------
> >     >     >     >  lib/ct-dpif.h                  |   4 +-
> >     >     >     >  lib/dpctl.c                    |  42 ++---
> >     >     >     >  lib/dpctl.man                  |   3 +-
> >     >     >     >  lib/ofp-ct-util.c              | 326
> +++++++++++++++++++++++++++++++++
> >     >     >     >  lib/ofp-ct-util.h              |  36 ++++
> >     >     >     >  tests/system-traffic.at <http://system-traffic.at> <
> http://system-traffic.at <http://system-traffic.at>> <
> http://system-traffic.at <http://system-traffic.at> <
> http://system-traffic.at <http://system-traffic.at>>>        |  82
> ++++++++-
> >     >     >     >  10 files changed, 586 insertions(+), 147 deletions(-)
> >     >     >     >  create mode 100644 lib/ofp-ct-util.c
> >     >     >     >  create mode 100644 lib/ofp-ct-util.h
> >     >     >     >
> >     >     >     > diff --git a/NEWS b/NEWS
> >     >     >     > index 265375e1c..ff8904b02 100644
> >     >     >     > --- a/NEWS
> >     >     >     > +++ b/NEWS
> >     >     >     > @@ -14,6 +14,8 @@ Post-v3.0.0
> >     >     >     >       10 Gbps link speed by default in case the actual
> link speed cannot be
> >     >     >     >       determined.  Previously it was 10 Mbps.  Values
> can still be overridden
> >     >     >     >       by specifying 'max-rate' or '[r]stp-path-cost'
> accordingly.
> >     >     >     > +   - ovs-dpctl and related ovs-appctl commands:
> >     >     >
> >     >     >     "ovs-dpctl and 'ovs-appctl dpctl/'" might be a better
> header.
> >     >     >
> >     >     >     > +     * "flush-conntrack" is capable of handling
> partial 5-tuple.
> >     >     >     >
> >     >     >     >
> >     >     >     >  v3.0.0 - 15 Aug 2022
> >     >     >     > diff --git a/include/openvswitch/ofp-util.h
> b/include/openvswitch/ofp-util.h
> >     >     >     > index 091a09cad..84937ae26 100644
> >     >     >     > --- a/include/openvswitch/ofp-util.h
> >     >     >     > +++ b/include/openvswitch/ofp-util.h
> >     >     >     > @@ -19,6 +19,9 @@
> >     >     >     >
> >     >     >     >  #include <stdbool.h>
> >     >     >     >  #include <stdint.h>
> >     >     >     > +#include <sys/types.h>
> >     >     >     > +#include <netinet/in.h>
> >     >     >     > +
> >     >     >     >  #include "openvswitch/ofp-protocol.h"
> >     >     >     >
> >     >     >     >  struct ofp_header;
> >     >     >     > @@ -27,6 +30,31 @@ struct ofp_header;
> >     >     >     >  extern "C" {
> >     >     >     >  #endif
> >     >     >     >
> >     >     >     > +struct ofputil_ct_tuple {
> >     >     >     > +    struct in6_addr src;
> >     >     >     > +    struct in6_addr dst;
> >     >     >     > +
> >     >     >     > +    union {
> >     >     >     > +        ovs_be16 src_port;
> >     >     >     > +        ovs_be16 icmp_id;
> >     >     >     > +    };
> >     >     >     > +    union {
> >     >     >     > +        ovs_be16 dst_port;
> >     >     >     > +        struct {
> >     >     >     > +            uint8_t icmp_code;
> >     >     >     > +            uint8_t icmp_type;
> >     >     >     > +        };
> >     >     >     > +    };
> >     >     >     > +};
> >     >     >     > +
> >     >     >     > +struct ofputil_ct_match {
> >     >     >     > +    uint8_t ip_proto;
> >     >     >     > +    uint16_t l3_type;
> >     >     >     > +
> >     >     >     > +    struct ofputil_ct_tuple tuple_orig;
> >     >     >     > +    struct ofputil_ct_tuple tuple_reply;
> >     >     >     > +};
> >     >     >
> >     >     >     These structures are very similar to existing
> ct_dpif_tuple.
> >     >     >     We can keep them for now, but I think, we'll need to
> replace
> >     >     >     ct_dpif_tuple with the new ofputil_ct_tuple in the
> future.
> >     >     >     We may need some changes in ofputil_ct_tuple for that
> though.
> >     >     >
> >     >     >
> >     >     > I was thinking about replacing them, however it would
> require a much
> >     >     > bigger change, which was out of scope of this RFE. Also I
> was not entirely
> >     >     > sure if there wasn't any specific reason for the
> ct_dpif_tuple specifically
> >     >     > the address part. It seems like optimization to me.
> >     >     >
> >     >     > Anyway it should be possible to unify those if needed.
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     OTOH, I'm not sure I understand why we need both tuples
> here.
> >     >     >     Can we have a tuple and a flag that indicates a direction
> >     >     >     instead?
> >     >     >
> >     >     >
> >     >     > That is also a possibility. However that wouldn't fit too
> that nicely in the
> >     >     > whole match struct, we could probably duplicate the proto
> field and get rid of the
> >     >     > l3_type. That way we could supply just two tuples instead,
> getting rid of the match completely.
> >     >
> >     >     The point was to have only one tuple.  See below.
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     There are several issues with having both original and
> reply
> >     >     >     tuples, IIUC:
> >     >     >
> >     >     >     1. None of the command line interfaces support specifying
> >     >     >        reply tuple.
> >     >     >
> >     >     >
> >     >     > That was the intention of the original command AFAICT, we
> could modify the commands
> >     >     > to support that. However I'm not sure if there is any
> benefit in that.
> >     >
> >     >     ovs-ofctl supposed to mimic the OpenFlow request, so it should
> >     >     support everything that OpenFlow request can have.
> >     >
> >     >
> >     > I see, sure that can definitely be done, however we should
> probably think how this can
> >     > be done in a more universal way that could be used also for dpctl.
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     2. If the reply tuple will be specified via OpenFlow, it
> >     >     >        will be ignored if the original direction tuple is
> >     >     >        fully specified.
> >     >     >
> >     >     >
> >     >     > That is ok IMO, because a fully specified original should be
> enough to determine the one
> >     >     > entry that needs to be erased. So the reply direction in
> that would be an extra.
> >     >
> >     >     That supports the point that we don't need two tuples in the
> request.
> >     >
> >     >
> >     > Right but only if we are able to supply the full original
> direction. See below.
> >     >
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     3. Netfilter kernel interface allows specifying only one
> >     >     >        tuple, not both, AFAICT.
> >     >     >
> >     >     >
> >     >     > Right, that's the reason why, when we match, we use the
> original tuple to erase that
> >     >     > entry.
> >     >     >
> >     >     >
> >     >     >
> >     >     >     I'm not a conntrack expert, so I have a question: Is it
> possible
> >     >     >     that addresses and ports in original and reply
> directions are
> >     >     >     not mirrored copies?  Or do we care about these cases?
> >     >     >
> >     >     >
> >     >     > Not sure if it's possible, but even if it would be, the
> current implementation
> >     >     > would account for that, with the exception of specifying
> fully the original
> >     >     > direction.
> >     >     >
> >     >     >
> >     >     >
> >     >     >     If we want to flush all entries related to the same
> IP:PORT,
> >     >     >     we will not be able to do that in a single OpenFlow
> request,
> >     >     >     because we'll need to supply the same IP and PORT as src
> >     >     >     and dst, or as src in both original and reply directions.
> >     >     >     None of these can fit into a single request.
> >     >     >
> >     >     >
> >     >     > Is that really a problem? You could do that in two, one for
> the original
> >     >     > one for reply. This is also very similar to the interface
> supported by conntrack-tools,
> >     >     > with those you couldn't remove all entries in a single call
> as well. We could modify
> >     >     > the interface that would allow matching on tuple without
> direction specification,
> >     >     > that would translate to both directions, however there is
> still the question if that's needed.
> >     >
> >     >
> >     >     It's not a problem, but another supporting point for the
> interface
> >     >     being redundant.  We don't seem to need two tuples in the
> request,
> >     >     because users will have to send two different requests anyway.
> >     >
> >     >     So, the request should have:
> >     >
> >     >     1. zone
> >     >     2. nw_proto
> >     >     3. tuple
> >     >     4. tuple type ('original' or 'reply'), in case we want to allow
> >     >        users to flush based on reply tuple.  But this seems a bit
> >     >        redundant as well.
> >     >
> >     >     Something like:
> >     >
> >     >     struct ofputil_ct_tuple {
> >     >     }
> >     >
> >     >     struct ofputil_ct_match {
> >     >         uint8_t ip_proto;
> >     >         uint16_t l3_type;
> >     >         // enum DIRECTION direction;
> >     >         struct ofputil_ct_tuple tuple;
> >     >     };
> >     >
> >     >
> >     >     And the OpenFlow part:
> >     >
> >     >     /* CT flush available TLVs. */
> >     >     enum nx_ct_flush_tlv_type {
> >     >         NXT_CT_ZONE_ID,           /* u8: CT zone id. */
> >     >         NXT_CT_TUPLE,             /* Nested NXT_CT_TUPLE_*
> attributes. */
> >     >     };
> >     >
> >     >     enum nx_ct_flush_tuple_type {
> >     >         NXT_CT_TUPLE_SRC,         /* be128: source IPv6 or mapped
> IPv4 address. */
> >     >         NXT_CT_TUPLE_DST,         /* be128: destination IPv6 or
> mapped IPv4 address. */
> >     >         NXT_CT_TUPLE_SRC_PORT,    /* be16: source port. */
> >     >         NXT_CT_TUPLE_DST_PORT,    /* be16: destination port. */
> >     >         NXT_CT_TUPLE_ICMP_ID,     /* be16: ICMP id. */
> >     >         NXT_CT_TUPLE_ICMP_TYPE,   /* u8: ICMP type. */
> >     >         NXT_CT_TUPLE_ICMP_CODE,   /* u8: ICMP code. */
> >     >         // NXT_CT_TUPLE_DIRECTION,   /* enum
> nx_ct_flush_tuple_direction: CT tuple direction. */
> >     >     };
> >     >
> >     >     //enum nx_ct_flush_tuple_direction {
> >     >     //    NXT_CT_TUPLE_DIRECTION_ORIG = 1,
> >     >     //    NXT_CT_TUPLE_DIRECTION_REPLY = 2,
> >     >     //}
> >     >
> >     >     struct nx_ct_flush {
> >     >         uint8_t nw_proto;          /* IP protocol. */
> >     >         uint8_t pad[7];            /* Align to 64-bit (must be
> zero). */
> >     >         /* Followed by ... */
> >     >     };
> >     >
> >     >
> >     >     Such interface will also allow us in to have both direction
> specified
> >     >     if we would want that in the future.  For now we should check
> that
> >     >     each TLV type is specified once, i.e. only one tuple is in the
> request.
> >     >
> >     >     We may or may not add NXT_CT_TUPLE_DIRECTION support now.  Can
> add it
> >     >     in later releases if necessary.  Tuples without that TLV can
> be treated
> >     >     as original direction.
> >     >
> >     >
> >     > The original idea was something similar but as stated below that
> wouldn't really help
> >     > us in the OVN case. And there is probably no significant benefit
> in having partial match
> >     > with only single direction at time. I'm not sure if I'm not
> missing something obvious.
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     At this point is there actually any value in flushing
> basing
> >     >     >     on reply direction?  Shouldn't we be able to flush the
> same
> >     >     >     entries by flushing by dst in original direction, if we
> will
> >     >     >     need to issue multiple requests anyway?
> >     >     >
> >     >     >
> >     >     > That assumption would be fine when you can specify a full
> tuple for the original direction.
> >     >     > The problem is that for example OVN we know only the
> original DST_IP:DST_PORT (VIP)
> >     >     > and the reply SRC_IP:SRC_PORT (backend).
> >     >
> >     >     Isn't original DST_IP:DST_PORT equal to reply SRC_IP:SRC_PORT ?
> >     >
> >     >
> >     > Unfortunately no, it's DNATed so for example
> >     > 'ovn-nbctl lb-add lb1 30.0.0.1 "172.16.1.2,172.16.1.3,172.16.1.4"'
> >     > will have CT entries as follows:
> >     >
> >     >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1),zone=0,mark=2
> >     >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=2,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=2),zone=0,mark=2
> >     >
> tcp,orig=(src=192.168.1.2,dst=30.0.0.1,sport=3,dport=80),reply=(src=172.16.1.4,dst=192.168.1.2,sport=80,dport=3),zone=0,mark=2
> >     >
> >     > We cannot simply filter on the original direction, as we cannot
> infer the SRC part (without filtering all entries etc.).
> >
> >     OK.  This is what I was asking above and you said
> >     'Not sure if it's possible'.  So, it is possible,
> >     I see.
> >
> >
> > Yeah my bad I did not connect the dots right away.
> >
> >
> >
> >     >
> >     >
> >     >
> >     >
> >     >     > That means we cannot fully specify the original direction
> >     >     > without doing the filtering ourselves. It would be possible
> with extra steps from the user. If this interface
> >     >     > is too generic/complicated for that purpose we could say
> that the flush extension supports only full orig tuple
> >     >     > and leave the filtering up to the user. That would
> definitely work, it wouldn't be that elegant for the user however
> >     >     > from the code change perspective much simpler.
> >     >
> >     >     I'm not sure I understand that part.  I'm not arguing aginst
> partial
> >     >     matches.  Or do you mean something else?
> >     >
> >     >     >
> >     >     >  I can see a certain benefit of keeping it in OvS:
> >     >     >
> >     >     > 1) The filtering complexity is hidden from the user.
> >     >     > 2) The data transfer between OvS and the user is much
> smaller (even if you need to send two matches).
> >     >     > 3) Once implemented it's just ready to be used without the
> need of additional thoughts going into it.
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >     Did you start working on the OVN side of things?  Which
> >     >     >     commands OVN will actually use?
> >     >     >
> >     >     >
> >     >     > Yes, I did. The plan is that OVN will use the extension.
> >     >
> >     >     I understand that.  The question was how the actual request
> will
> >     >     look like.
> >     >
> >     >
> >     > The testing version used something like this:
> >     >
> >     >     memset(match, 0, sizeof *match);
> >     >
> >     >     struct ofputil_ct_tuple *orig = &match->tuple_orig;
> >     >     struct ofputil_ct_tuple *reply = &match->tuple_reply;
> >     >
> >     >     if (IN6_IS_ADDR_V4MAPPED(&tuple->vip_ip)) {
> >     >         match->l3_type = AF_INET;
> >     >     } else {
> >     >         match->l3_type = AF_INET6;
> >     >     }
> >     >
> >     >     match->ip_proto = tuple->proto;
> >     >
> >     >     orig->dst = tuple->vip_ip;
> >     >     orig->dst_port = htons(tuple->vip_port);
> >     >
> >     >     reply->src = tuple->backend_ip;
> >     >     reply->src_port = htons(tuple->backend_port);
> >     >
> >     > tuple is the struct containing lb info, so VIP, VIP_PORT, BACKEND,
> BACKEND_PORT, PROTO.
> >     > match is the "struct ofputil_ct_match".
> >     >
> >     > This is with older version of this RFE but it should illustrate
> the point.
> >
> >     Assuming the current v5 implementation of the extension, will the end
> >     result of the request be different if we remove the assignment of
> >     'orig=>dst' and 'orig->dst_port' from the code above?
> >
> >
> > It might delete more than we actually want to. Without specifying the
> LBs 5-tuple
> > we might always remove more entries than we should. As we can have
> multiple backends
> > for the same VIP (that's the whole point of LB), however nothing
> prevents us from two VIPs
> > pointing to the same backend. Which would cause an issue if we will go
> only with the reply part.
>
>
> OK.  That makes sense.  Let's keep both tuples then.
>
> But we need to allow ovs-ofctl to accept both tuples.
>
> And we should use a "shortcut" only if reply tuple is not specified,
> because if I'm asking to delete:
>
>
>  orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.3,dst=192.168.1.2,sport=80,dport=1)
>
> I'm not expecting the following entry to be deleted:
>
>
>  orig=(src=192.168.1.2,dst=30.0.0.1,sport=1,dport=80),reply=(src=172.16.1.2,dst=192.168.1.2,sport=80,dport=1)
>
> Even though these entries probably cannot exist at the same time,
> we should not delete the entry that we wasn't asked to delete.
>
> Best regards, Ilya Maximets.
>
>
That makes sense. I'll work on that + the smaller comments.

Thanks,
Ales
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 265375e1c..ff8904b02 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,8 @@  Post-v3.0.0
      10 Gbps link speed by default in case the actual link speed cannot be
      determined.  Previously it was 10 Mbps.  Values can still be overridden
      by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
+   - ovs-dpctl and related ovs-appctl commands:
+     * "flush-conntrack" is capable of handling partial 5-tuple.
 
 
 v3.0.0 - 15 Aug 2022
diff --git a/include/openvswitch/ofp-util.h b/include/openvswitch/ofp-util.h
index 091a09cad..84937ae26 100644
--- a/include/openvswitch/ofp-util.h
+++ b/include/openvswitch/ofp-util.h
@@ -19,6 +19,9 @@ 
 
 #include <stdbool.h>
 #include <stdint.h>
+#include <sys/types.h>
+#include <netinet/in.h>
+
 #include "openvswitch/ofp-protocol.h"
 
 struct ofp_header;
@@ -27,6 +30,31 @@  struct ofp_header;
 extern "C" {
 #endif
 
+struct ofputil_ct_tuple {
+    struct in6_addr src;
+    struct in6_addr dst;
+
+    union {
+        ovs_be16 src_port;
+        ovs_be16 icmp_id;
+    };
+    union {
+        ovs_be16 dst_port;
+        struct {
+            uint8_t icmp_code;
+            uint8_t icmp_type;
+        };
+    };
+};
+
+struct ofputil_ct_match {
+    uint8_t ip_proto;
+    uint16_t l3_type;
+
+    struct ofputil_ct_tuple tuple_orig;
+    struct ofputil_ct_tuple tuple_reply;
+};
+
 bool ofputil_decode_hello(const struct ofp_header *,
                           uint32_t *allowed_versions);
 struct ofpbuf *ofputil_encode_hello(uint32_t version_bitmap);
diff --git a/lib/automake.mk b/lib/automake.mk
index a0fabe38f..37135f118 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -227,6 +227,8 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/ofp-actions.c \
 	lib/ofp-bundle.c \
 	lib/ofp-connection.c \
+	lib/ofp-ct-util.c \
+	lib/ofp-ct-util.h \
 	lib/ofp-ed-props.c \
 	lib/ofp-errors.c \
 	lib/ofp-flow.c \
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 6f17a26b5..9b324ff79 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -20,6 +20,7 @@ 
 #include <errno.h>
 
 #include "ct-dpif.h"
+#include "ofp-ct-util.h"
 #include "openvswitch/ofp-parse.h"
 #include "openvswitch/vlog.h"
 
@@ -71,6 +72,31 @@  ct_dpif_dump_start(struct dpif *dpif, struct ct_dpif_dump_state **dump,
     return err;
 }
 
+static void
+ct_dpif_tuple_from_ofputil_ct_tuple(const struct ofputil_ct_tuple *ofp_tuple,
+                                    struct ct_dpif_tuple *tuple,
+                                    uint16_t l3_type, uint8_t ip_proto)
+{
+    if (l3_type == AF_INET) {
+        tuple->src.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->src);
+        tuple->dst.ip = in6_addr_get_mapped_ipv4(&ofp_tuple->dst);
+    } else {
+        tuple->src.in6 = ofp_tuple->src;
+        tuple->dst.in6 = ofp_tuple->dst;
+    }
+
+    tuple->l3_type = l3_type;
+    tuple->ip_proto = ip_proto;
+    tuple->src_port = ofp_tuple->src_port;
+
+    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+        tuple->icmp_code = ofp_tuple->icmp_code;
+        tuple->icmp_type = ofp_tuple->icmp_type;
+    } else {
+        tuple->dst_port = ofp_tuple->dst_port;
+    }
+}
+
 /* Dump one connection from a tracker, and put it in 'entry'.
  *
  * 'dump' should have been initialized by ct_dpif_dump_start().
@@ -100,25 +126,77 @@  ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
             ? dpif->dpif_class->ct_dump_done(dpif, dump)
             : EOPNOTSUPP);
 }
-
+
+static int
+ct_dpif_flush_tuple(struct dpif *dpif, const uint16_t *zone,
+                    const struct ofputil_ct_match *match) {
+    struct ct_dpif_dump_state *dump;
+    struct ct_dpif_entry cte;
+    int error;
+    int tot_bkts;
+
+    if (!dpif->dpif_class->ct_flush) {
+        return EOPNOTSUPP;
+    }
+
+    if (VLOG_IS_DBG_ENABLED()) {
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        ofputil_ct_match_format(&ds, match);
+        VLOG_DBG("%s: ct_flush:%s in zone %d", dpif_name(dpif), ds_cstr(&ds),
+                 zone ? *zone : 0);
+        ds_destroy(&ds);
+    }
+
+    /* If we have full five tuple in orig just do the flush over that
+     * tuple directly. */
+    if (ofputil_ct_tuple_is_five_tuple(&match->tuple_orig, match->ip_proto)) {
+        struct ct_dpif_tuple tuple;
+        ct_dpif_tuple_from_ofputil_ct_tuple(&match->tuple_orig, &tuple,
+                                            match->l3_type, match->ip_proto);
+        return dpif->dpif_class->ct_flush(dpif, zone, &tuple);
+    }
+
+    error = ct_dpif_dump_start(dpif, &dump, zone, &tot_bkts);
+    if (error) {
+        return error;
+    }
+
+    while (!(error = ct_dpif_dump_next(dump, &cte))) {
+        if (zone && *zone != cte.zone) {
+            continue;
+        }
+
+        if (ofputil_ct_match_cmp(match, &cte)) {
+            error = dpif->dpif_class->ct_flush(dpif, &cte.zone,
+                                               &cte.tuple_orig);
+            if (error) {
+                break;
+            }
+        }
+    }
+    if (error == EOF) {
+        error = 0;
+    }
+
+    ct_dpif_dump_done(dump);
+    return error;
+}
+
 /* Flush the entries in the connection tracker used by 'dpif'.  The
  * arguments have the following behavior:
  *
- *   - If both 'zone' and 'tuple' are NULL, flush all the conntrack entries.
- *   - If 'zone' is not NULL, and 'tuple' is NULL, flush all the conntrack
+ *   - If both 'zone' and 'match' are NULL, or match is zero,
+ *     flush all the conntrack entries.
+ *   - If 'zone' is not NULL, and 'match' is NULL, flush all the conntrack
  *     entries in '*zone'.
- *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'tuple'
+ *   - If 'tuple' is not NULL, flush the conntrack entry specified by 'match'
  *     in '*zone'. If 'zone' is NULL, use the default zone (zone 0). */
 int
 ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
-              const struct ct_dpif_tuple *tuple)
+              const struct ofputil_ct_match *match)
 {
-    if (tuple) {
-        struct ds ds = DS_EMPTY_INITIALIZER;
-        ct_dpif_format_tuple(&ds, tuple);
-        VLOG_DBG("%s: ct_flush: %s in zone %d", dpif_name(dpif), ds_cstr(&ds),
-                                                zone ? *zone : 0);
-        ds_destroy(&ds);
+    if (match && !ofputil_is_ct_match_zero(match)) {
+        return ct_dpif_flush_tuple(dpif, zone, match);
     } else if (zone) {
         VLOG_DBG("%s: ct_flush: zone %"PRIu16, dpif_name(dpif), *zone);
     } else {
@@ -126,7 +204,7 @@  ct_dpif_flush(struct dpif *dpif, const uint16_t *zone,
     }
 
     return (dpif->dpif_class->ct_flush
-            ? dpif->dpif_class->ct_flush(dpif, zone, tuple)
+            ? dpif->dpif_class->ct_flush(dpif, zone, NULL)
             : EOPNOTSUPP);
 }
 
@@ -583,112 +661,6 @@  ct_dpif_format_tcp_stat(struct ds * ds, int tcp_state, int conn_per_state)
     ds_put_format(ds, "=%u", conn_per_state);
 }
 
-/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
- * Returns true on success.  Otherwise, returns false and puts the error
- * message in 'ds'. */
-bool
-ct_dpif_parse_tuple(struct ct_dpif_tuple *tuple, const char *s, struct ds *ds)
-{
-    char *pos, *key, *value, *copy;
-    memset(tuple, 0, sizeof *tuple);
-
-    pos = copy = xstrdup(s);
-    while (ofputil_parse_key_value(&pos, &key, &value)) {
-        if (!*value) {
-            ds_put_format(ds, "field %s missing value", key);
-            goto error;
-        }
-
-        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")) {
-            if (tuple->l3_type && tuple->l3_type != AF_INET) {
-                ds_put_cstr(ds, "L3 type set multiple times");
-                goto error;
-            } else {
-                tuple->l3_type = AF_INET;
-            }
-            if (!ip_parse(value, key[6] == 's' ? &tuple->src.ip :
-                                                 &tuple->dst.ip)) {
-                goto error_with_msg;
-            }
-        } else if (!strcmp(key, "ct_ipv6_src") ||
-                   !strcmp(key, "ct_ipv6_dst")) {
-            if (tuple->l3_type && tuple->l3_type != AF_INET6) {
-                ds_put_cstr(ds, "L3 type set multiple times");
-                goto error;
-            } else {
-                tuple->l3_type = AF_INET6;
-            }
-            if (!ipv6_parse(value, key[8] == 's' ? &tuple->src.in6 :
-                                                   &tuple->dst.in6)) {
-                goto error_with_msg;
-            }
-        } else if (!strcmp(key, "ct_nw_proto")) {
-            char *err = str_to_u8(value, key, &tuple->ip_proto);
-            if (err) {
-                free(err);
-                goto error_with_msg;
-            }
-        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key,"ct_tp_dst")) {
-            uint16_t port;
-            char *err = str_to_u16(value, key, &port);
-            if (err) {
-                free(err);
-                goto error_with_msg;
-            }
-            if (key[6] == 's') {
-                tuple->src_port = htons(port);
-            } else {
-                tuple->dst_port = htons(port);
-            }
-        } else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
-                   !strcmp(key, "icmp_id") ) {
-            if (tuple->ip_proto != IPPROTO_ICMP &&
-                tuple->ip_proto != IPPROTO_ICMPV6) {
-                ds_put_cstr(ds, "invalid L4 fields");
-                goto error;
-            }
-            uint16_t icmp_id;
-            char *err;
-            if (key[5] == 't') {
-                err = str_to_u8(value, key, &tuple->icmp_type);
-            } else if (key[5] == 'c') {
-                err = str_to_u8(value, key, &tuple->icmp_code);
-            } else {
-                err = str_to_u16(value, key, &icmp_id);
-                tuple->icmp_id = htons(icmp_id);
-            }
-            if (err) {
-                free(err);
-                goto error_with_msg;
-            }
-        } else {
-            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
-            goto error;
-        }
-    }
-
-    if (ipv6_is_zero(&tuple->src.in6) || ipv6_is_zero(&tuple->dst.in6) ||
-        !tuple->ip_proto) {
-        /* icmp_type, icmp_code, and icmp_id can be 0. */
-        if (tuple->ip_proto != IPPROTO_ICMP &&
-            tuple->ip_proto != IPPROTO_ICMPV6) {
-            if (!tuple->src_port || !tuple->dst_port) {
-                ds_put_cstr(ds, "at least one of the conntrack 5-tuple fields "
-                                "is missing.");
-                goto error;
-            }
-        }
-    }
-
-    free(copy);
-    return true;
-
-error_with_msg:
-    ds_put_format(ds, "failed to parse field %s", key);
-error:
-    free(copy);
-    return false;
-}
 
 void
 ct_dpif_push_zone_limit(struct ovs_list *zone_limits, uint16_t zone,
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 2848549b0..337c99dd4 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -17,6 +17,7 @@ 
 #ifndef CT_DPIF_H
 #define CT_DPIF_H
 
+#include "openvswitch/ofp-util.h"
 #include "openvswitch/types.h"
 #include "packets.h"
 
@@ -285,7 +286,7 @@  int ct_dpif_dump_start(struct dpif *, struct ct_dpif_dump_state **,
 int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *);
 int ct_dpif_dump_done(struct ct_dpif_dump_state *);
 int ct_dpif_flush(struct dpif *, const uint16_t *zone,
-                  const struct ct_dpif_tuple *);
+                  const struct ofputil_ct_match *);
 int ct_dpif_set_maxconns(struct dpif *dpif, uint32_t maxconns);
 int ct_dpif_get_maxconns(struct dpif *dpif, uint32_t *maxconns);
 int ct_dpif_get_nconns(struct dpif *dpif, uint32_t *nconns);
@@ -311,7 +312,6 @@  void ct_dpif_format_ipproto(struct ds *ds, uint16_t ipproto);
 void ct_dpif_format_tuple(struct ds *, const struct ct_dpif_tuple *);
 uint8_t ct_dpif_coalesce_tcp_state(uint8_t state);
 void ct_dpif_format_tcp_stat(struct ds *, int, int);
-bool ct_dpif_parse_tuple(struct ct_dpif_tuple *, const char *s, struct ds *);
 void ct_dpif_push_zone_limit(struct ovs_list *, uint16_t zone, uint32_t limit,
                              uint32_t count);
 void ct_dpif_free_zone_limits(struct ovs_list *);
diff --git a/lib/dpctl.c b/lib/dpctl.c
index 29041fa3e..e6e350aba 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -40,6 +40,7 @@ 
 #include "netdev.h"
 #include "netlink.h"
 #include "odp-util.h"
+#include "ofp-ct-util.h"
 #include "openvswitch/ofpbuf.h"
 #include "packets.h"
 #include "openvswitch/shash.h"
@@ -1707,47 +1708,38 @@  dpctl_flush_conntrack(int argc, const char *argv[],
                       struct dpctl_params *dpctl_p)
 {
     struct dpif *dpif = NULL;
-    struct ct_dpif_tuple tuple, *ptuple = NULL;
-    struct ds ds = DS_EMPTY_INITIALIZER;
-    uint16_t zone, *pzone = NULL;
-    int error;
+    struct ofputil_ct_match match = {0};
+    uint16_t zone;
+    bool with_zone = false;
     int args = argc - 1;
 
     /* Parse ct tuple */
-    if (args && ct_dpif_parse_tuple(&tuple, argv[args], &ds)) {
-        ptuple = &tuple;
+    if (args) {
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        if (!ofputil_ct_match_parse(&match, argv[args], &ds)) {
+            dpctl_error(dpctl_p, EINVAL, "%s", ds_cstr(&ds));
+            ds_destroy(&ds);
+            return EINVAL;
+        }
         args--;
     }
 
     /* Parse zone */
     if (args && ovs_scan(argv[args], "zone=%"SCNu16, &zone)) {
-        pzone = &zone;
-        args--;
-    }
-
-    /* Report error if there are more than one unparsed argument. */
-    if (args > 1) {
-        ds_put_cstr(&ds, "invalid arguments");
-        error = EINVAL;
-        goto error;
+        with_zone = true;
     }
 
-    error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
+    int error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
     if (error) {
+        dpctl_error(dpctl_p, error, "Cannot open dpif");
         return error;
     }
 
-    error = ct_dpif_flush(dpif, pzone, ptuple);
-    if (!error) {
-        dpif_close(dpif);
-        return 0;
-    } else {
-        ds_put_cstr(&ds, "failed to flush conntrack");
+    error = ct_dpif_flush(dpif, with_zone ? &zone : NULL, &match);
+    if (error) {
+        dpctl_error(dpctl_p, error, "Failed to flush conntrack");
     }
 
-error:
-    dpctl_error(dpctl_p, error, "%s", ds_cstr(&ds));
-    ds_destroy(&ds);
     dpif_close(dpif);
     return error;
 }
diff --git a/lib/dpctl.man b/lib/dpctl.man
index 87ea8087b..b0cabe05d 100644
--- a/lib/dpctl.man
+++ b/lib/dpctl.man
@@ -312,7 +312,8 @@  If \fBzone\fR=\fIzone\fR is specified, only flushes the connections in
 If \fIct-tuple\fR is provided, flushes the connection entry specified by
 \fIct-tuple\fR in \fIzone\fR. The zone defaults to 0 if it is not provided.
 The userspace connection tracker requires flushing with the original pre-NATed
-tuple and a warning log will be otherwise generated.
+tuple and a warning log will be otherwise generated. The tuple can be partial
+and will remove all connections that are matching on the specified fields.
 An example of an IPv4 ICMP \fIct-tuple\fR:
 .IP
 "ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=1,icmp_type=8,icmp_code=0,icmp_id=10"
diff --git a/lib/ofp-ct-util.c b/lib/ofp-ct-util.c
new file mode 100644
index 000000000..a10607590
--- /dev/null
+++ b/lib/ofp-ct-util.c
@@ -0,0 +1,326 @@ 
+
+/* Copyright (c) 2022, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <sys/types.h>
+#include <netinet/in.h>
+#include <netinet/icmp6.h>
+
+#include "ct-dpif.h"
+#include "ofp-ct-util.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/ofp-parse.h"
+#include "openvswitch/ofp-util.h"
+#include "openvswitch/packets.h"
+
+static inline bool
+ofputil_ct_inet_addr_cmp_partial(const struct in6_addr *partial,
+                                 const union ct_dpif_inet_addr *addr,
+                                 const uint16_t l3_type)
+{
+    if (ipv6_is_zero(partial)) {
+        return true;
+    }
+
+    if (l3_type == AF_INET && in6_addr_get_mapped_ipv4(partial) != addr->ip) {
+        return false;
+    }
+
+    if (l3_type == AF_INET6 && !ipv6_addr_equals(partial, &addr->in6)) {
+        return false;
+    }
+
+    return true;
+}
+
+static inline bool
+ofputil_ct_tuple_ip_cmp_partial(const struct ofputil_ct_tuple *partial,
+                                const struct ct_dpif_tuple *tuple,
+                                const uint16_t l3_type, const uint8_t ip_proto)
+{
+    if (!ofputil_ct_inet_addr_cmp_partial(&partial->src,
+                                          &tuple->src, l3_type)) {
+        return false;
+    }
+
+    if (!ofputil_ct_inet_addr_cmp_partial(&partial->dst,
+                                          &tuple->dst, l3_type)) {
+        return false;
+    }
+
+    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+        if (partial->icmp_id != tuple->icmp_id) {
+            return false;
+        }
+
+        if (partial->icmp_type != tuple->icmp_type) {
+            return false;
+        }
+
+        if (partial->icmp_code != tuple->icmp_code) {
+            return false;
+        }
+    } else {
+        if (partial->src_port && partial->src_port != tuple->src_port) {
+            return false;
+        }
+
+        if (partial->dst_port && partial->dst_port != tuple->dst_port) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+/* Compares the non-zero members if they match. This is useful for clearing
+ * up all connections specified by a partial tuples for orig/reply. */
+bool
+ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
+                     const struct ct_dpif_entry *entry)
+{
+    if (match->l3_type && match->l3_type != entry->tuple_orig.l3_type) {
+        return false;
+    }
+
+    if (match->ip_proto && match->ip_proto != entry->tuple_orig.ip_proto) {
+        return false;
+    }
+
+    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_orig,
+                                         &entry->tuple_orig,
+                                         match->l3_type, match->ip_proto)) {
+        return false;
+    }
+
+    if (!ofputil_ct_tuple_ip_cmp_partial(&match->tuple_reply,
+                                         &entry->tuple_reply,
+                                         match->l3_type, match->ip_proto)) {
+        return false;
+    }
+
+    return true;
+}
+
+static void
+ofputil_ct_tuple_format(struct ds *ds, const struct ofputil_ct_tuple *tuple,
+                        uint8_t ip_proto)
+{
+    ds_put_cstr(ds, "src=");
+    ipv6_format_mapped(&tuple->src, ds);
+    ds_put_cstr(ds, ",dst=");
+    ipv6_format_mapped(&tuple->dst, ds);
+    if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+        ds_put_format(ds, ",icmp_id=%u,icmp_type=%u,icmp_code=%u",
+                      ntohs(tuple->icmp_id), tuple->icmp_type,
+                      tuple->icmp_code);
+
+    } else {
+        ds_put_format(ds, ",src_port=%u,dst_port=%u", ntohs(tuple->src_port),
+                      ntohs(tuple->dst_port));
+    }
+}
+
+static bool
+ofputil_is_tuple_zero(const struct ofputil_ct_tuple *tuple)
+{
+    return ipv6_is_zero(&tuple->src) && ipv6_is_zero(&tuple->dst) &&
+           !tuple->src_port && !tuple->dst_port;
+}
+
+bool
+ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
+                               uint8_t ip_proto)
+{
+    /* First check if we have address. */
+    bool five_tuple = !ipv6_is_zero(&tuple->src) && !ipv6_is_zero(&tuple->dst);
+
+    if (!(ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6)) {
+        five_tuple = five_tuple && tuple->src_port && tuple->dst_port;
+    }
+
+    return five_tuple;
+}
+
+bool
+ofputil_is_ct_match_zero(const struct ofputil_ct_match *match)
+{
+    return !match->ip_proto && !match->l3_type &&
+           ofputil_is_tuple_zero(&match->tuple_orig) &&
+           ofputil_is_tuple_zero(&match->tuple_reply);
+}
+
+void
+ofputil_ct_match_format(struct ds *ds, const struct ofputil_ct_match *match)
+{
+    ds_put_format(ds, " l3_type=%u,ip_proto=%u", match->l3_type,
+                  match->ip_proto);
+    ds_put_cstr(ds, ",orig=(");
+    ofputil_ct_tuple_format(ds, &match->tuple_orig, match->ip_proto);
+    ds_put_cstr(ds, "),reply=(");
+    ofputil_ct_tuple_format(ds, &match->tuple_reply, match->ip_proto);
+    ds_put_cstr(ds, ")");
+}
+
+static bool
+ofputil_ct_tuple_ip_parse(struct in6_addr *addr, char *value, uint16_t l3_type)
+{
+    if (!ipv6_is_zero(addr)) {
+        return false;
+    }
+
+    if (l3_type == AF_INET) {
+        ovs_be32 ip = 0;
+
+        ip_parse(value, &ip);
+        *addr = in6_addr_mapped_ipv4(ip);
+    } else {
+        ipv6_parse(value, addr);
+    }
+
+    return true;
+}
+
+/* Parses a specification of a conntrack 5-tuple from 's' into 'tuple'.
+ * Returns true on success.  Otherwise, returns false and puts the error
+ * message in 'ds'. */
+bool
+ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
+                       struct ds *ds)
+{
+    char *pos, *key, *value, *copy;
+
+    memset(match, 0, sizeof *match);
+    struct ofputil_ct_tuple *tuple = &match->tuple_orig;
+
+    pos = copy = xstrdup(s);
+    while (ofputil_parse_key_value(&pos, &key, &value)) {
+        if (!*value) {
+            ds_put_format(ds, "field %s missing value", key);
+            goto error_with_msg;
+        }
+
+        if (!strcmp(key, "ct_nw_src") || !strcmp(key, "ct_nw_dst")
+            || !strcmp(key, "ct_ipv6_src") || !strcmp(key, "ct_ipv6_dst")) {
+            match->l3_type = key[6] == '6' ? AF_INET6 : AF_INET;
+            uint8_t index = key[6] == '6' ? 8 : 6;
+            struct in6_addr *addr = key[index] == 's'
+                ? &tuple->src : &tuple->dst;
+
+            if (!ofputil_ct_tuple_ip_parse(addr, value, match->l3_type)) {
+                ds_put_format(ds, "%s is set multiple times", key);
+                goto error;
+            }
+        } else if (!strcmp(key, "ct_nw_proto")) {
+            char *err = str_to_u8(value, key, &match->ip_proto);
+
+            if (err) {
+                free(err);
+                goto error_with_msg;
+            }
+        } else if (!strcmp(key, "ct_tp_src") || !strcmp(key, "ct_tp_dst")) {
+            uint16_t port;
+            char *err = str_to_u16(value, key, &port);
+
+            if (err) {
+                free(err);
+                goto error_with_msg;
+            }
+            if (key[6] == 's') {
+                tuple->src_port = htons(port);
+            } else {
+                tuple->dst_port = htons(port);
+            }
+        } else if (!strcmp(key, "icmp_type") || !strcmp(key, "icmp_code") ||
+                   !strcmp(key, "icmp_id")) {
+            if (match->ip_proto != IPPROTO_ICMP &&
+                match->ip_proto != IPPROTO_ICMPV6) {
+                ds_put_cstr(ds, "invalid L4 fields");
+                goto error;
+            }
+            uint16_t icmp_id;
+            char *err;
+
+            if (key[5] == 't') {
+                err = str_to_u8(value, key, &tuple->icmp_type);
+            } else if (key[5] == 'c') {
+                err = str_to_u8(value, key, &tuple->icmp_code);
+            } else {
+                err = str_to_u16(value, key, &icmp_id);
+                tuple->icmp_id = htons(icmp_id);
+            }
+            if (err) {
+                free(err);
+                goto error_with_msg;
+            }
+        } else {
+            ds_put_format(ds, "invalid conntrack tuple field: %s", key);
+            goto error;
+        }
+    }
+
+    if (!match->ip_proto && (tuple->src_port || tuple->dst_port)) {
+        ds_put_cstr(ds, "port is set without protocol");
+        goto error;
+    }
+
+    /* For the filtering to work with icmp we need to fill the reply direction
+     * with correct information. */
+    if (match->ip_proto == IPPROTO_ICMP) {
+        switch (match->tuple_orig.icmp_type) {
+        case ICMP4_ECHO_REQUEST:
+            match->tuple_reply.icmp_type = ICMP4_ECHO_REPLY;
+            break;
+        case ICMP4_ECHO_REPLY:
+            match->tuple_reply.icmp_type = ICMP4_ECHO_REQUEST;
+            break;
+        case ICMP4_TIMESTAMP:
+            match->tuple_reply.icmp_type = ICMP4_TIMESTAMPREPLY;
+            break;
+        case ICMP4_TIMESTAMPREPLY:
+            match->tuple_reply.icmp_type = ICMP4_TIMESTAMP;
+            break;
+        case ICMP4_INFOREQUEST:
+            match->tuple_reply.icmp_type = ICMP4_INFOREPLY;
+            break;
+        case ICMP4_INFOREPLY:
+            match->tuple_reply.icmp_type = ICMP4_INFOREQUEST;
+            break;
+        }
+        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
+    } else if (match->ip_proto == IPPROTO_ICMPV6) {
+        switch (match->tuple_orig.icmp_type) {
+        case ICMP6_ECHO_REQUEST:
+            match->tuple_reply.icmp_type = ICMP6_ECHO_REPLY;
+            break;
+        case ICMP6_ECHO_REPLY:
+            match->tuple_reply.icmp_type = ICMP6_ECHO_REQUEST;
+            break;
+        }
+        match->tuple_reply.icmp_id = match->tuple_orig.icmp_id;
+    }
+
+    free(copy);
+    return true;
+
+error_with_msg:
+    ds_put_format(ds, "failed to parse field %s", key);
+error:
+    free(copy);
+    return false;
+}
diff --git a/lib/ofp-ct-util.h b/lib/ofp-ct-util.h
new file mode 100644
index 000000000..5a5eaf920
--- /dev/null
+++ b/lib/ofp-ct-util.h
@@ -0,0 +1,36 @@ 
+/* Copyright (c) 2022, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef OVS_OFP_CT_UTIL_H
+#define OVS_OFP_CT_UTIL_H
+
+#include "ct-dpif.h"
+#include "openvswitch/ofp-util.h"
+
+bool ofputil_ct_match_cmp(const struct ofputil_ct_match *match,
+                                 const struct ct_dpif_entry *entry);
+
+bool ofputil_ct_tuple_is_five_tuple(const struct ofputil_ct_tuple *tuple,
+                                    uint8_t ip_proto);
+
+void ofputil_ct_match_format(struct ds *ds,
+                             const struct ofputil_ct_match *match);
+
+bool ofputil_ct_match_parse(struct ofputil_ct_match *match, const char *s,
+                            struct ds *ds);
+
+bool ofputil_is_ct_match_zero(const struct ofputil_ct_match *match);
+
+#endif /* lib/ofp-ct-util.h */
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index e5403519f..51903a658 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2230,7 +2230,7 @@  udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
-AT_SETUP([conntrack - ct flush by 5-tuple])
+AT_SETUP([conntrack - ct flush])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
@@ -2291,6 +2291,86 @@  AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
 AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
 ])
 
+dnl Test UDP from port 1 and 2, partial flush by src port
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_src=2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
+
+dnl Test UDP from port 1 and 2, partial flush by dst port
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_dst=2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_proto=17,ct_tp_dst=1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
+
+dnl Test UDP from port 1 and 2, partial flush by src address
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
+
+dnl Test UDP from port 1 and 2, partial flush by dst address
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1" | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_dst=10.1.1.1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "10\.1\.1\.1"], [1])
+
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP