[net-next,2/2] openvswitch: Support conntrack zone limit

Message ID 1523902550-10767-3-git-send-email-yihung.wei@gmail.com
State Changes Requested
Delegated to: David Miller
Series
  • openvswitch: Support conntrack zone limit

Commit Message

Yi-Hung Wei April 16, 2018, 6:15 p.m.
Currently, nf_conntrack_max is used to limit the maximum number of
conntrack entries in the conntrack table for every network namespace.
For the VMs and containers that reside in the same namespace,
they share the same conntrack table, and the total number of conntrack entries
for all the VMs and containers is limited by nf_conntrack_max.  In this
case, if one of the VMs/containers abuses the usage of the conntrack entries,
it blocks the others to commit valid conntrack entry into the conntrack
table.  Even if we can put the VMs in different network namespaces,
the current nf_conntrack_max configuration is too rigid: we cannot
give different VMs/containers different limits on the number of
conntrack entries.

To address this issue, this patch proposes a fine-grained mechanism
that further limits the number of conntrack entries per zone.  For
example, we can assign a different zone to each VM and set a conntrack
limit on each zone.  With this isolation, a misbehaving VM only
consumes the conntrack entries in its own zone, and it does not affect
other well-behaved VMs.  Moreover, users can set different conntrack
limits on different zones based on their preference.

The proposed implementation utilizes Netfilter's nf_conncount backend
to count the number of connections in a particular zone.  If the number of
connections is above the configured limit, ovs will return ENOMEM to
userspace.  If userspace does not configure the zone limit, the limit
defaults to zero, which means no limit; this is backward compatible with
the behavior without this patch.
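
The decision rule described above can be sketched in plain userspace C.
This is only an illustration, not the kernel code: the names
CT_LIMIT_UNLIMITED and check_zone_limit() are hypothetical, and the
sketch assumes the connection count passed in already includes the
connection being committed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical name for this sketch: a limit of 0 means "no limit". */
#define CT_LIMIT_UNLIMITED 0u

/*
 * Decide whether a connection may be committed in a zone.
 * `limit` is the zone's configured limit (or the default limit),
 * `connections` is the zone's current count, assumed to include the
 * candidate connection.  Returns 0 on success, -ENOMEM when over limit.
 */
static int check_zone_limit(uint32_t limit, uint32_t connections)
{
	if (limit == CT_LIMIT_UNLIMITED)
		return 0;	/* unconfigured zones keep the old behavior */

	return connections > limit ? -ENOMEM : 0;
}
```

With a limit of 2, the first two connections in a zone are accepted and
the third is rejected with -ENOMEM; with a limit of 0, nothing is ever
rejected by this check.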

The following high-level APIs are provided to userspace:
  - OVS_CT_LIMIT_CMD_SET:
    * set default connection limit for all zones
    * set the connection limit for a particular zone
  - OVS_CT_LIMIT_CMD_DEL:
    * remove the connection limit for a particular zone
  - OVS_CT_LIMIT_CMD_GET:
    * get the default connection limit for all zones
    * get the connection limit for a particular zone
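
The semantics of the three commands can be modeled with a small
userspace table.  This is a simplified sketch under stated assumptions:
the names limit_table, limit_set(), limit_del() and limit_get() are
hypothetical, there is no locking or RCU, and the netlink encoding is
left out entirely.  It only shows that GET falls back to the default
limit once a per-zone limit has been deleted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BUCKETS 512	/* power of two, so masking selects a bucket */

struct zone_limit {
	struct zone_limit *next;
	uint16_t zone;
	uint32_t limit;
};

struct limit_table {
	uint32_t default_limit;		/* 0 means unlimited */
	struct zone_limit *buckets[BUCKETS];
};

static struct zone_limit **bucket_for(struct limit_table *t, uint16_t zone)
{
	return &t->buckets[zone & (BUCKETS - 1)];
}

/* SET: insert a per-zone limit or overwrite an existing one.
 * Returns 0 on success, -1 if allocation fails. */
static int limit_set(struct limit_table *t, uint16_t zone, uint32_t limit)
{
	struct zone_limit *e;

	for (e = *bucket_for(t, zone); e; e = e->next) {
		if (e->zone == zone) {
			e->limit = limit;
			return 0;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->zone = zone;
	e->limit = limit;
	e->next = *bucket_for(t, zone);
	*bucket_for(t, zone) = e;
	return 0;
}

/* DEL: remove the per-zone limit, if any; the zone then uses the default. */
static void limit_del(struct limit_table *t, uint16_t zone)
{
	struct zone_limit **pp = bucket_for(t, zone);

	while (*pp) {
		if ((*pp)->zone == zone) {
			struct zone_limit *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;
		}
		pp = &(*pp)->next;
	}
}

/* GET: per-zone limit if configured, otherwise the default limit. */
static uint32_t limit_get(const struct limit_table *t, uint16_t zone)
{
	const struct zone_limit *e;

	for (e = t->buckets[zone & (BUCKETS - 1)]; e; e = e->next) {
		if (e->zone == zone)
			return e->limit;
	}
	return t->default_limit;
}
```

Zones 5 and 517 land in the same bucket here (517 & 511 == 5), so they
make a convenient pair for exercising the chained lookup.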

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
 net/openvswitch/Kconfig     |   3 +-
 net/openvswitch/conntrack.c | 497 +++++++++++++++++++++++++++++++++++++++++++-
 net/openvswitch/conntrack.h |   9 +-
 net/openvswitch/datapath.c  |   7 +-
 net/openvswitch/datapath.h  |   1 +
 5 files changed, 511 insertions(+), 6 deletions(-)

Comments

Gregory Rose April 17, 2018, midnight | #1
On 4/16/2018 11:15 AM, Yi-Hung Wei wrote:
> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total number of conntrack entries
> for all the VMs and containers is limited by nf_conntrack_max.  In this
> case, if one of the VMs/containers abuses the usage of the conntrack entries,
> it blocks the others to commit valid conntrack entry into the conntrack

s/to commit/from committing/
s/entry/entries/

> table.  Even if we can put the VMs in different network namespaces,
> the current nf_conntrack_max configuration is too rigid: we cannot
> give different VMs/containers different limits on the number of
> conntrack entries.
>
> To address this issue, this patch proposes a fine-grained mechanism
> that further limits the number of conntrack entries per zone.  For
> example, we can assign a different zone to each VM and set a conntrack
> limit on each zone.  With this isolation, a misbehaving VM only
> consumes the conntrack entries in its own zone, and it does not affect
> other well-behaved VMs.  Moreover, users can set different conntrack
> limits on different zones based on their preference.
>
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone.  If the number of
> connections is above the configured limit, ovs will return ENOMEM to
> userspace.  If userspace does not configure the zone limit, the limit
> defaults to zero, which means no limit; this is backward compatible with
> the behavior without this patch.
>
> The following high-level APIs are provided to userspace:
>    - OVS_CT_LIMIT_CMD_SET:
>      * set default connection limit for all zones
>      * set the connection limit for a particular zone
>    - OVS_CT_LIMIT_CMD_DEL:
>      * remove the connection limit for a particular zone
>    - OVS_CT_LIMIT_CMD_GET:
>      * get the default connection limit for all zones
>      * get the connection limit for a particular zone
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

I think this is a great idea but I suggest porting to the iproute2 package
so everyone can use it.  Then get rid of the OVS-specific prefixes.
Presuming of course that the conntrack connection limit backend works
there as well, I guess.  If it doesn't, then I'd suggest extending it.
This is a nice feature for all users in my opinion and then OVS
can take advantage of it as well.

Thanks!

- Greg

> ---
>   net/openvswitch/Kconfig     |   3 +-
>   net/openvswitch/conntrack.c | 497 +++++++++++++++++++++++++++++++++++++++++++-
>   net/openvswitch/conntrack.h |   9 +-
>   net/openvswitch/datapath.c  |   7 +-
>   net/openvswitch/datapath.h  |   1 +
>   5 files changed, 511 insertions(+), 6 deletions(-)
>
> diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
> index 2650205cdaf9..89da9512ec1e 100644
> --- a/net/openvswitch/Kconfig
> +++ b/net/openvswitch/Kconfig
> @@ -9,7 +9,8 @@ config OPENVSWITCH
>   		   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
>   				     (!NF_NAT || NF_NAT) && \
>   				     (!NF_NAT_IPV4 || NF_NAT_IPV4) && \
> -				     (!NF_NAT_IPV6 || NF_NAT_IPV6)))
> +				     (!NF_NAT_IPV6 || NF_NAT_IPV6) && \
> +				     (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
>   	select LIBCRC32C
>   	select MPLS
>   	select NET_MPLS_GSO
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index c5904f629091..2f51da91d056 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -17,7 +17,9 @@
>   #include <linux/udp.h>
>   #include <linux/sctp.h>
>   #include <net/ip.h>
> +#include <net/genetlink.h>
>   #include <net/netfilter/nf_conntrack_core.h>
> +#include <net/netfilter/nf_conntrack_count.h>
>   #include <net/netfilter/nf_conntrack_helper.h>
>   #include <net/netfilter/nf_conntrack_labels.h>
>   #include <net/netfilter/nf_conntrack_seqadj.h>
> @@ -76,6 +78,38 @@ struct ovs_conntrack_info {
>   #endif
>   };
>   
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +#define OVS_CT_LIMIT_UNLIMITED	0
> +#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
> +#define CT_LIMIT_HASH_BUCKETS 512
> +
> +struct ovs_ct_limit {
> +	/* Elements in ovs_ct_limit_info->limits hash table */
> +	struct hlist_node hlist_node;
> +	struct rcu_head rcu;
> +	u16 zone;
> +	u32 limit;
> +};
> +
> +struct ovs_ct_limit_info {
> +	u32 default_limit;
> +	struct hlist_head *limits;
> +	struct nf_conncount_data *data __aligned(8);
> +};
> +
> +static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = {
> +	[OVS_CT_LIMIT_ATTR_OPTION] = { .type = NLA_NESTED, },
> +};
> +
> +static const struct nla_policy
> +	ct_zone_limit_policy[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1] = {
> +		[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT] = { .type = NLA_U32, },
> +		[OVS_CT_ZONE_LIMIT_ATTR_ZONE] = { .type = NLA_U16, },
> +		[OVS_CT_ZONE_LIMIT_ATTR_LIMIT] = { .type = NLA_U32, },
> +		[OVS_CT_ZONE_LIMIT_ATTR_COUNT] = { .type = NLA_U32, },
> +};
> +#endif
> +
>   static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
>   
>   static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
> @@ -1036,6 +1070,94 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
>   	return false;
>   }
>   
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +static struct hlist_head *ct_limit_hash_bucket(
> +	const struct ovs_ct_limit_info *info, u16 zone)
> +{
> +	return &info->limits[zone & (CT_LIMIT_HASH_BUCKETS - 1)];
> +}
> +
> +/* Call with ovs_mutex */
> +static void ct_limit_set(const struct ovs_ct_limit_info *info,
> +			 struct ovs_ct_limit *new_ct_limit)
> +{
> +	struct ovs_ct_limit *ct_limit;
> +	struct hlist_head *head;
> +
> +	head = ct_limit_hash_bucket(info, new_ct_limit->zone);
> +	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
> +		if (ct_limit->zone == new_ct_limit->zone) {
> +			hlist_replace_rcu(&ct_limit->hlist_node,
> +					  &new_ct_limit->hlist_node);
> +			kfree_rcu(ct_limit, rcu);
> +			return;
> +		}
> +	}
> +
> +	hlist_add_head_rcu(&new_ct_limit->hlist_node, head);
> +}
> +
> +/* Call with ovs_mutex */
> +static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone)
> +{
> +	struct ovs_ct_limit *ct_limit;
> +	struct hlist_head *head;
> +
> +	head = ct_limit_hash_bucket(info, zone);
> +	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
> +		if (ct_limit->zone == zone) {
> +			hlist_del_rcu(&ct_limit->hlist_node);
> +			kfree_rcu(ct_limit, rcu);
> +			return;
> +		}
> +	}
> +}
> +
> +/* Call with RCU read lock */
> +static u32 ct_limit_get(const struct ovs_ct_limit_info *info, u16 zone)
> +{
> +	struct ovs_ct_limit *ct_limit;
> +	struct hlist_head *head;
> +
> +	head = ct_limit_hash_bucket(info, zone);
> +	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
> +		if (ct_limit->zone == zone)
> +			return ct_limit->limit;
> +	}
> +
> +	return info->default_limit;
> +}
> +
> +static int ovs_ct_check_limit(struct net *net,
> +			      const struct ovs_conntrack_info *info,
> +			      const struct nf_conntrack_tuple *tuple)
> +{
> +	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
> +	const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +	u32 per_zone_limit, connections;
> +	u32 conncount_key[5];
> +
> +	conncount_key[0] = info->zone.id;
> +
> +	rcu_read_lock();
> +	per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
> +	if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED) {
> +		rcu_read_unlock();
> +		return 0;
> +	}
> +
> +	connections = nf_conncount_count(net, ct_limit_info->data,
> +					 conncount_key, tuple, &info->zone);
> +	if (connections > per_zone_limit) {
> +		rcu_read_unlock();
> +		return -ENOMEM;
> +	}
> +
> +	rcu_read_unlock();
> +	return 0;
> +}
> +#endif
> +
>   /* Lookup connection and confirm if unconfirmed. */
>   static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>   			 const struct ovs_conntrack_info *info,
> @@ -1054,6 +1176,13 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>   	if (!ct)
>   		return 0;
>   
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +	err = ovs_ct_check_limit(net, info,
> +				 &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> +	if (err)
> +		return err;
> +#endif
> +
>   	/* Set the conntrack event mask if given.  NEW and DELETE events have
>   	 * their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
>   	 * typically would receive many kinds of updates.  Setting the event
> @@ -1655,7 +1784,363 @@ static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info)
>   		nf_ct_tmpl_free(ct_info->ct);
>   }
>   
> -void ovs_ct_init(struct net *net)
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +static int ovs_ct_limit_init(struct net *net, struct ovs_net *ovs_net)
> +{
> +	int i;
> +
> +	ovs_net->ct_limit_info = kmalloc(sizeof *ovs_net->ct_limit_info,
> +					 GFP_KERNEL);
> +	if (!ovs_net->ct_limit_info)
> +		return -ENOMEM;
> +
> +	ovs_net->ct_limit_info->default_limit = OVS_CT_LIMIT_DEFAULT;
> +	ovs_net->ct_limit_info->limits =
> +		kmalloc_array(CT_LIMIT_HASH_BUCKETS, sizeof(struct hlist_head),
> +			      GFP_KERNEL);
> +	if (!ovs_net->ct_limit_info->limits) {
> +		kfree(ovs_net->ct_limit_info);
> +		return -ENOMEM;
> +	}
> +
> +	for (i = 0; i < CT_LIMIT_HASH_BUCKETS; i++)
> +		INIT_HLIST_HEAD(&ovs_net->ct_limit_info->limits[i]);
> +
> +	ovs_net->ct_limit_info->data =
> +		nf_conncount_init(net, NFPROTO_INET, sizeof(u32));
> +
> +	if (IS_ERR(ovs_net->ct_limit_info->data)) {
> +		kfree(ovs_net->ct_limit_info->limits);
> +		kfree(ovs_net->ct_limit_info);
> +		return PTR_ERR(ovs_net->ct_limit_info->data);
> +	}
> +	return 0;
> +}
> +
> +static void ovs_ct_limit_exit(struct net *net, struct ovs_net *ovs_net)
> +{
> +	const struct ovs_ct_limit_info *info = ovs_net->ct_limit_info;
> +	int i;
> +
> +	nf_conncount_destroy(net, NFPROTO_INET, info->data);
> +	for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) {
> +		struct hlist_head *head = &info->limits[i];
> +		struct ovs_ct_limit *ct_limit;
> +
> +		hlist_for_each_entry_rcu(ct_limit, head, hlist_node)
> +			kfree_rcu(ct_limit, rcu);
> +	}
> +	kfree(ovs_net->ct_limit_info->limits);
> +	kfree(ovs_net->ct_limit_info);
> +}
> +
> +static struct sk_buff *
> +ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
> +			     struct ovs_header **ovs_reply_header)
> +{
> +	struct sk_buff *skb;
> +	struct ovs_header *ovs_header = info->userhdr;
> +
> +	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return ERR_PTR(-ENOMEM);
> +
> +	*ovs_reply_header = genlmsg_put(skb, info->snd_portid,
> +					info->snd_seq,
> +					&dp_ct_limit_genl_family, 0, cmd);
> +
> +	if (!*ovs_reply_header) {
> +		nlmsg_free(skb);
> +		return ERR_PTR(-EMSGSIZE);
> +	}
> +	(*ovs_reply_header)->dp_ifindex = ovs_header->dp_ifindex;
> +
> +	return skb;
> +}
> +
> +static int ovs_ct_limit_set_zone_limit(struct nlattr *nla_zone_limit,
> +				       struct ovs_ct_limit_info *info)
> +{
> +	struct nlattr *nla;
> +	int rem, err;
> +
> +	nla_for_each_nested(nla, nla_zone_limit, rem) {
> +		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
> +		struct ovs_ct_limit *ct_limit;
> +
> +		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_SET_REQ)
> +			return  -EINVAL;
> +
> +		err = nla_parse((struct nlattr **)&attr,
> +				OVS_CT_ZONE_LIMIT_ATTR_MAX, nla_data(nla),
> +				nla_len(nla), ct_zone_limit_policy, NULL);
> +		if (err)
> +			return err;
> +
> +		if (attr[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT]) {
> +			u32 default_limit = nla_get_u32(
> +				attr[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT]);
> +			ovs_lock();
> +			info->default_limit = default_limit;
> +			ovs_unlock();
> +		} else {
> +			if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE] ||
> +			    !attr[OVS_CT_ZONE_LIMIT_ATTR_LIMIT]) {
> +				return -EINVAL;
> +			}
> +
> +			ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL);
> +			if (!ct_limit)
> +				return -ENOMEM;
> +
> +			ct_limit->zone = nla_get_u16(
> +				attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
> +			ct_limit->limit = nla_get_u32(
> +				attr[OVS_CT_ZONE_LIMIT_ATTR_LIMIT]);
> +
> +			ovs_lock();
> +			ct_limit_set(info, ct_limit);
> +			ovs_unlock();
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit,
> +				       struct ovs_ct_limit_info *info)
> +{
> +	struct nlattr *nla;
> +	int rem, err;
> +
> +	nla_for_each_nested(nla, nla_zone_limit, rem) {
> +		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
> +		u16 zone;
> +
> +		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ)
> +			return  -EINVAL;
> +
> +		err = nla_parse((struct nlattr **)&attr,
> +				OVS_CT_ZONE_LIMIT_ATTR_MAX, nla_data(nla),
> +				nla_len(nla), ct_zone_limit_policy, NULL);
> +		if (err)
> +			return err;
> +
> +		if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE])
> +			return -EINVAL;
> +
> +		zone = nla_get_u16(attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
> +
> +		ovs_lock();
> +		ct_limit_del(info, zone);
> +		ovs_unlock();
> +	}
> +	return 0;
> +}
> +
> +static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
> +					  struct sk_buff *reply)
> +{
> +	int err;
> +	struct nlattr *nla_nested;
> +
> +	nla_nested = nla_nest_start(reply, OVS_CT_ZONE_LIMIT_ATTR_GET_RLY);
> +
> +	err = nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT,
> +			  info->default_limit);
> +	if (err)
> +		return err;
> +
> +	nla_nest_end(reply, nla_nested);
> +	return 0;
> +}
> +
> +static int ovs_ct_limit_get_zone_limit(struct net *net,
> +				       struct nlattr *nla_zone_limit,
> +				       struct ovs_ct_limit_info *info,
> +				       struct sk_buff *reply)
> +{
> +	struct nlattr *nla, *nla_nested;
> +	int rem, err;
> +	u16 zone;
> +	u32 limit, count, conncount_key[5];
> +	struct nf_conntrack_zone ct_zone;
> +
> +	nla_for_each_nested(nla, nla_zone_limit, rem) {
> +		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
> +
> +		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_GET_REQ)
> +			return -EINVAL;
> +
> +		err = nla_parse((struct nlattr **)&attr,
> +				OVS_CT_ZONE_LIMIT_ATTR_MAX, nla_data(nla),
> +				nla_len(nla), ct_zone_limit_policy, NULL);
> +		if (err)
> +			return err;
> +
> +		if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE])
> +			return -EINVAL;
> +
> +		zone = nla_get_u16(attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
> +		nf_ct_zone_init(&ct_zone, zone, NF_CT_DEFAULT_ZONE_DIR, 0);
> +		rcu_read_lock();
> +		limit = ct_limit_get(info, zone);
> +		rcu_read_unlock();
> +
> +		conncount_key[0] = zone;
> +		count = nf_conncount_count(net, info->data, conncount_key,
> +					   NULL, &ct_zone);
> +
> +		nla_nested = nla_nest_start(reply,
> +					    OVS_CT_ZONE_LIMIT_ATTR_GET_RLY);
> +		if (nla_put_u16(reply, OVS_CT_ZONE_LIMIT_ATTR_ZONE, zone) ||
> +		    nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_LIMIT, limit) ||
> +		    nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_COUNT, count))
> +			return -EMSGSIZE;
> +		nla_nest_end(reply, nla_nested);
> +	}
> +
> +	return 0;
> +}
> +
> +static int ovs_ct_limit_cmd_set(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr **a = info->attrs;
> +	struct sk_buff *reply;
> +	struct ovs_header *ovs_reply_header;
> +	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
> +	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +	int err;
> +
> +	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_SET,
> +					     &ovs_reply_header);
> +	if (IS_ERR(reply))
> +		return PTR_ERR(reply);
> +
> +	if (!a[OVS_CT_LIMIT_ATTR_OPTION])
> +		return -EINVAL;
> +
> +	err = ovs_ct_limit_set_zone_limit(a[OVS_CT_LIMIT_ATTR_OPTION],
> +					  ct_limit_info);
> +	if (err)
> +		goto exit_err;
> +
> +	genlmsg_end(reply, ovs_reply_header);
> +	return genlmsg_reply(reply, info);
> +
> +exit_err:
> +	nlmsg_free(reply);
> +	return err;
> +}
> +
> +static int ovs_ct_limit_cmd_del(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr **a = info->attrs;
> +	struct sk_buff *reply;
> +	struct ovs_header *ovs_reply_header;
> +	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
> +	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +	int err;
> +
> +	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_DEL,
> +					     &ovs_reply_header);
> +	if (IS_ERR(reply))
> +		return PTR_ERR(reply);
> +
> +	if (!a[OVS_CT_LIMIT_ATTR_OPTION])
> +		return -EINVAL;
> +
> +	err = ovs_ct_limit_del_zone_limit(a[OVS_CT_LIMIT_ATTR_OPTION],
> +					  ct_limit_info);
> +	if (err)
> +		goto exit_err;
> +
> +	genlmsg_end(reply, ovs_reply_header);
> +	return genlmsg_reply(reply, info);
> +
> +exit_err:
> +	nlmsg_free(reply);
> +	return err;
> +}
> +
> +static int ovs_ct_limit_cmd_get(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct nlattr **a = info->attrs;
> +	struct nlattr *nla_reply;
> +	struct sk_buff *reply;
> +	struct ovs_header *ovs_reply_header;
> +	struct net *net = sock_net(skb->sk);
> +	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
> +	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +	int err;
> +
> +	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_GET,
> +					     &ovs_reply_header);
> +	if (IS_ERR(reply))
> +		return PTR_ERR(reply);
> +
> +	nla_reply = nla_nest_start(reply, OVS_CT_LIMIT_ATTR_OPTION);
> +
> +	err = ovs_ct_limit_get_default_limit(ct_limit_info, reply);
> +	if (err)
> +		goto exit_err;
> +
> +	if (a[OVS_CT_LIMIT_ATTR_OPTION]) {
> +		err = ovs_ct_limit_get_zone_limit(
> +			net, a[OVS_CT_LIMIT_ATTR_OPTION], ct_limit_info,
> +			reply);
> +		if (err)
> +			goto exit_err;
> +	}
> +
> +	nla_nest_end(reply, nla_reply);
> +	genlmsg_end(reply, ovs_reply_header);
> +	return genlmsg_reply(reply, info);
> +
> +exit_err:
> +	nlmsg_free(reply);
> +	return err;
> +}
> +
> +static struct genl_ops ct_limit_genl_ops[] = {
> +	{ .cmd = OVS_CT_LIMIT_CMD_SET,
> +		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
> +					   * privilege. */
> +		.policy = ct_limit_policy,
> +		.doit = ovs_ct_limit_cmd_set,
> +	},
> +	{ .cmd = OVS_CT_LIMIT_CMD_DEL,
> +		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
> +					   * privilege. */
> +		.policy = ct_limit_policy,
> +		.doit = ovs_ct_limit_cmd_del,
> +	},
> +	{ .cmd = OVS_CT_LIMIT_CMD_GET,
> +		.flags = 0,		  /* OK for unprivileged users. */
> +		.policy = ct_limit_policy,
> +		.doit = ovs_ct_limit_cmd_get,
> +	},
> +};
> +
> +static const struct genl_multicast_group ovs_ct_limit_multicast_group = {
> +	.name = OVS_CT_LIMIT_MCGROUP,
> +};
> +
> +struct genl_family dp_ct_limit_genl_family __ro_after_init = {
> +	.hdrsize = sizeof(struct ovs_header),
> +	.name = OVS_CT_LIMIT_FAMILY,
> +	.version = OVS_CT_LIMIT_VERSION,
> +	.maxattr = OVS_CT_LIMIT_ATTR_MAX,
> +	.netnsok = true,
> +	.parallel_ops = true,
> +	.ops = ct_limit_genl_ops,
> +	.n_ops = ARRAY_SIZE(ct_limit_genl_ops),
> +	.mcgrps = &ovs_ct_limit_multicast_group,
> +	.n_mcgrps = 1,
> +	.module = THIS_MODULE,
> +};
> +#endif
> +
> +int ovs_ct_init(struct net *net)
>   {
>   	unsigned int n_bits = sizeof(struct ovs_key_ct_labels) * BITS_PER_BYTE;
>   	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
> @@ -1666,12 +2151,22 @@ void ovs_ct_init(struct net *net)
>   	} else {
>   		ovs_net->xt_label = true;
>   	}
> +
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +	return ovs_ct_limit_init(net, ovs_net);
> +#else
> +	return 0;
> +#endif
>   }
>   
>   void ovs_ct_exit(struct net *net)
>   {
>   	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
>   
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +	ovs_ct_limit_exit(net, ovs_net);
> +#endif
> +
>   	if (ovs_net->xt_label)
>   		nf_connlabels_put(net);
>   }
> diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
> index 399dfdd2c4f9..900dadd70974 100644
> --- a/net/openvswitch/conntrack.h
> +++ b/net/openvswitch/conntrack.h
> @@ -17,10 +17,11 @@
>   #include "flow.h"
>   
>   struct ovs_conntrack_info;
> +struct ovs_ct_limit_info;
>   enum ovs_key_attr;
>   
>   #if IS_ENABLED(CONFIG_NF_CONNTRACK)
> -void ovs_ct_init(struct net *);
> +int ovs_ct_init(struct net *);
>   void ovs_ct_exit(struct net *);
>   bool ovs_ct_verify(struct net *, enum ovs_key_attr attr);
>   int ovs_ct_copy_action(struct net *, const struct nlattr *,
> @@ -44,7 +45,7 @@ void ovs_ct_free_action(const struct nlattr *a);
>   #else
>   #include <linux/errno.h>
>   
> -static inline void ovs_ct_init(struct net *net) { }
> +static inline int ovs_ct_init(struct net *net) { return 0; }
>   
>   static inline void ovs_ct_exit(struct net *net) { }
>   
> @@ -104,4 +105,8 @@ static inline void ovs_ct_free_action(const struct nlattr *a) { }
>   
>   #define CT_SUPPORTED_MASK 0
>   #endif /* CONFIG_NF_CONNTRACK */
> +
> +#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +extern struct genl_family dp_ct_limit_genl_family;
> +#endif
>   #endif /* ovs_conntrack.h */
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 015e24e08909..a61818e94396 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -2288,6 +2288,9 @@ static struct genl_family * const dp_genl_families[] = {
>   	&dp_flow_genl_family,
>   	&dp_packet_genl_family,
>   	&dp_meter_genl_family,
> +#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +	&dp_ct_limit_genl_family,
> +#endif
>   };
>   
>   static void dp_unregister_genl(int n_families)
> @@ -2323,8 +2326,7 @@ static int __net_init ovs_init_net(struct net *net)
>   
>   	INIT_LIST_HEAD(&ovs_net->dps);
>   	INIT_WORK(&ovs_net->dp_notify_work, ovs_dp_notify_wq);
> -	ovs_ct_init(net);
> -	return 0;
> +	return ovs_ct_init(net);
>   }
>   
>   static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
> @@ -2469,3 +2471,4 @@ MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
>   MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
>   MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
>   MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
> +MODULE_ALIAS_GENL_FAMILY(OVS_CT_LIMIT_FAMILY);
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 523d65526766..51bd4dcb6c8b 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -144,6 +144,7 @@ struct dp_upcall_info {
>   struct ovs_net {
>   	struct list_head dps;
>   	struct work_struct dp_notify_work;
> +	struct ovs_ct_limit_info *ct_limit_info;
>   
>   	/* Module reference for configuring conntrack. */
>   	bool xt_label;

Yi-Hung Wei April 18, 2018, 12:30 a.m. | #2
> s/to commit/from committing/
> s/entry/entries/

Thanks, will fix that in both patches in v2.


> I think this is a great idea but I suggest porting to the iproute2 package
> so everyone can use it.  Then get rid of the OVS-specific prefixes.
> Presuming of course that the conntrack connection
> limit backend works there as well I guess.  If it doesn't, then I'd suggest
> extending
> it.  This is a nice feature for all users in my opinion and then OVS
> can take advantage of it as well.

Thanks for the comment.  And yes, I think iptables' connlimit
extension currently supports limiting the # of connections.  Users
need to configure the zone properly, and the iptables connlimit
extension already uses netfilter's nf_conncount backend.

The main goal of this patch is to utilize the netfilter backend
(nf_conncount) to count and limit the number of connections. OVS needs
the proposed OVS_CT_LIMIT netlink API and the corresponding bookkeeping
data structure because the current nf_conncount backend only counts
the # of connections; it does not keep track of the connection
limit.

Thanks,

-Yi-Hung

Gregory Rose April 18, 2018, 3:05 p.m. | #3
On 4/17/2018 5:30 PM, Yi-Hung Wei wrote:
>> s/to commit/from committing/
>> s/entry/entries/
> Thanks, will fix that in both patches in v2.
>
>
>> I think this is a great idea but I suggest porting to the iproute2 package
>> so everyone can use it.  Then get rid of the OVS-specific prefixes.
>> Presuming of course that the conntrack connection
>> limit backend works there as well I guess.  If it doesn't, then I'd suggest
>> extending
>> it.  This is a nice feature for all users in my opinion and then OVS
>> can take advantage of it as well.
> Thanks for the comment.  And yes, I think iptables' connlimit
> extension currently supports limiting the # of connections.  Users
> need to configure the zone properly, and the iptables connlimit
> extension already uses netfilter's nf_conncount backend.
>
> The main goal of this patch is to utilize the netfilter backend
> (nf_conncount) to count and limit the number of connections. OVS needs
> the proposed OVS_CT_LIMIT netlink API and the corresponding bookkeeping
> data structure because the current nf_conncount backend only counts
> the # of connections; it does not keep track of the connection
> limit.
>
> Thanks,
>
> -Yi-Hung

Thanks Yi-Hung, I figured I was just missing something there.  I
appreciate the explanation.

- Greg

Patch

diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 2650205cdaf9..89da9512ec1e 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -9,7 +9,8 @@  config OPENVSWITCH
 		   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
 				     (!NF_NAT || NF_NAT) && \
 				     (!NF_NAT_IPV4 || NF_NAT_IPV4) && \
-				     (!NF_NAT_IPV6 || NF_NAT_IPV6)))
+				     (!NF_NAT_IPV6 || NF_NAT_IPV6) && \
+				     (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
 	select LIBCRC32C
 	select MPLS
 	select NET_MPLS_GSO
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c5904f629091..2f51da91d056 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -17,7 +17,9 @@ 
 #include <linux/udp.h>
 #include <linux/sctp.h>
 #include <net/ip.h>
+#include <net/genetlink.h>
 #include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_count.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_labels.h>
 #include <net/netfilter/nf_conntrack_seqadj.h>
@@ -76,6 +78,38 @@  struct ovs_conntrack_info {
 #endif
 };
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+#define OVS_CT_LIMIT_UNLIMITED	0
+#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
+#define CT_LIMIT_HASH_BUCKETS 512
+
+struct ovs_ct_limit {
+	/* Elements in ovs_ct_limit_info->limits hash table */
+	struct hlist_node hlist_node;
+	struct rcu_head rcu;
+	u16 zone;
+	u32 limit;
+};
+
+struct ovs_ct_limit_info {
+	u32 default_limit;
+	struct hlist_head *limits;
+	struct nf_conncount_data *data __aligned(8);
+};
+
+static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = {
+	[OVS_CT_LIMIT_ATTR_OPTION] = { .type = NLA_NESTED, },
+};
+
+static const struct nla_policy
+	ct_zone_limit_policy[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1] = {
+		[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT] = { .type = NLA_U32, },
+		[OVS_CT_ZONE_LIMIT_ATTR_ZONE] = { .type = NLA_U16, },
+		[OVS_CT_ZONE_LIMIT_ATTR_LIMIT] = { .type = NLA_U32, },
+		[OVS_CT_ZONE_LIMIT_ATTR_COUNT] = { .type = NLA_U32, },
+};
+#endif
+
 static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
 
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
@@ -1036,6 +1070,94 @@  static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 	return false;
 }
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+static struct hlist_head *ct_limit_hash_bucket(
+	const struct ovs_ct_limit_info *info, u16 zone)
+{
+	return &info->limits[zone & (CT_LIMIT_HASH_BUCKETS - 1)];
+}
+
+/* Call with ovs_mutex */
+static void ct_limit_set(const struct ovs_ct_limit_info *info,
+			 struct ovs_ct_limit *new_ct_limit)
+{
+	struct ovs_ct_limit *ct_limit;
+	struct hlist_head *head;
+
+	head = ct_limit_hash_bucket(info, new_ct_limit->zone);
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+		if (ct_limit->zone == new_ct_limit->zone) {
+			hlist_replace_rcu(&ct_limit->hlist_node,
+					  &new_ct_limit->hlist_node);
+			kfree_rcu(ct_limit, rcu);
+			return;
+		}
+	}
+
+	hlist_add_head_rcu(&new_ct_limit->hlist_node, head);
+}
+
+/* Call with ovs_mutex */
+static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone)
+{
+	struct ovs_ct_limit *ct_limit;
+	struct hlist_head *head;
+
+	head = ct_limit_hash_bucket(info, zone);
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+		if (ct_limit->zone == zone) {
+			hlist_del_rcu(&ct_limit->hlist_node);
+			kfree_rcu(ct_limit, rcu);
+			return;
+		}
+	}
+}
+
+/* Call with RCU read lock */
+static u32 ct_limit_get(const struct ovs_ct_limit_info *info, u16 zone)
+{
+	struct ovs_ct_limit *ct_limit;
+	struct hlist_head *head;
+
+	head = ct_limit_hash_bucket(info, zone);
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+		if (ct_limit->zone == zone)
+			return ct_limit->limit;
+	}
+
+	return info->default_limit;
+}
+
+static int ovs_ct_check_limit(struct net *net,
+			      const struct ovs_conntrack_info *info,
+			      const struct nf_conntrack_tuple *tuple)
+{
+	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+	const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	u32 per_zone_limit, connections;
+	u32 conncount_key[5];
+
+	conncount_key[0] = info->zone.id;
+
+	rcu_read_lock();
+	per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
+	if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	connections = nf_conncount_count(net, ct_limit_info->data,
+					 conncount_key, tuple, &info->zone);
+	if (connections > per_zone_limit) {
+		rcu_read_unlock();
+		return -ENOMEM;
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+#endif
+
 /* Lookup connection and confirm if unconfirmed. */
 static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
 			 const struct ovs_conntrack_info *info,
@@ -1054,6 +1176,13 @@  static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
 	if (!ct)
 		return 0;
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	err = ovs_ct_check_limit(net, info,
+				 &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+	if (err)
+		return err;
+#endif
+
 	/* Set the conntrack event mask if given.  NEW and DELETE events have
 	 * their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
 	 * typically would receive many kinds of updates.  Setting the event
@@ -1655,7 +1784,363 @@  static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info)
 		nf_ct_tmpl_free(ct_info->ct);
 }
 
-void ovs_ct_init(struct net *net)
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+static int ovs_ct_limit_init(struct net *net, struct ovs_net *ovs_net)
+{
+	int i, err;
+
+	ovs_net->ct_limit_info = kmalloc(sizeof(*ovs_net->ct_limit_info),
+					 GFP_KERNEL);
+	if (!ovs_net->ct_limit_info)
+		return -ENOMEM;
+
+	ovs_net->ct_limit_info->default_limit = OVS_CT_LIMIT_DEFAULT;
+	ovs_net->ct_limit_info->limits =
+		kmalloc_array(CT_LIMIT_HASH_BUCKETS, sizeof(struct hlist_head),
+			      GFP_KERNEL);
+	if (!ovs_net->ct_limit_info->limits) {
+		kfree(ovs_net->ct_limit_info);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < CT_LIMIT_HASH_BUCKETS; i++)
+		INIT_HLIST_HEAD(&ovs_net->ct_limit_info->limits[i]);
+
+	ovs_net->ct_limit_info->data =
+		nf_conncount_init(net, NFPROTO_INET, sizeof(u32));
+	if (IS_ERR(ovs_net->ct_limit_info->data)) {
+		err = PTR_ERR(ovs_net->ct_limit_info->data);
+		kfree(ovs_net->ct_limit_info->limits);
+		kfree(ovs_net->ct_limit_info);
+		return err;
+	}
+	return 0;
+}
+
+static void ovs_ct_limit_exit(struct net *net, struct ovs_net *ovs_net)
+{
+	const struct ovs_ct_limit_info *info = ovs_net->ct_limit_info;
+	int i;
+
+	nf_conncount_destroy(net, NFPROTO_INET, info->data);
+	for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) {
+		struct hlist_head *head = &info->limits[i];
+		struct ovs_ct_limit *ct_limit;
+
+		hlist_for_each_entry_rcu(ct_limit, head, hlist_node)
+			kfree_rcu(ct_limit, rcu);
+	}
+	kfree(ovs_net->ct_limit_info->limits);
+	kfree(ovs_net->ct_limit_info);
+}
+
+static struct sk_buff *
+ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
+			     struct ovs_header **ovs_reply_header)
+{
+	struct sk_buff *skb;
+	struct ovs_header *ovs_header = info->userhdr;
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOMEM);
+
+	*ovs_reply_header = genlmsg_put(skb, info->snd_portid,
+					info->snd_seq,
+					&dp_ct_limit_genl_family, 0, cmd);
+
+	if (!*ovs_reply_header) {
+		nlmsg_free(skb);
+		return ERR_PTR(-EMSGSIZE);
+	}
+	(*ovs_reply_header)->dp_ifindex = ovs_header->dp_ifindex;
+
+	return skb;
+}
+
+static int ovs_ct_limit_set_zone_limit(struct nlattr *nla_zone_limit,
+				       struct ovs_ct_limit_info *info)
+{
+	struct nlattr *nla;
+	int rem, err;
+
+	nla_for_each_nested(nla, nla_zone_limit, rem) {
+		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
+		struct ovs_ct_limit *ct_limit;
+
+		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_SET_REQ)
+			return -EINVAL;
+
+		err = nla_parse(attr, OVS_CT_ZONE_LIMIT_ATTR_MAX,
+				nla_data(nla), nla_len(nla),
+				ct_zone_limit_policy, NULL);
+		if (err)
+			return err;
+
+		if (attr[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT]) {
+			u32 default_limit = nla_get_u32(
+				attr[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT]);
+			ovs_lock();
+			info->default_limit = default_limit;
+			ovs_unlock();
+		} else {
+			if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE] ||
+			    !attr[OVS_CT_ZONE_LIMIT_ATTR_LIMIT]) {
+				return -EINVAL;
+			}
+
+			ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL);
+			if (!ct_limit)
+				return -ENOMEM;
+
+			ct_limit->zone = nla_get_u16(
+				attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
+			ct_limit->limit = nla_get_u32(
+				attr[OVS_CT_ZONE_LIMIT_ATTR_LIMIT]);
+
+			ovs_lock();
+			ct_limit_set(info, ct_limit);
+			ovs_unlock();
+		}
+	}
+	return 0;
+}
+
+static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit,
+				       struct ovs_ct_limit_info *info)
+{
+	struct nlattr *nla;
+	int rem, err;
+
+	nla_for_each_nested(nla, nla_zone_limit, rem) {
+		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
+		u16 zone;
+
+		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ)
+			return -EINVAL;
+
+		err = nla_parse(attr, OVS_CT_ZONE_LIMIT_ATTR_MAX,
+				nla_data(nla), nla_len(nla),
+				ct_zone_limit_policy, NULL);
+		if (err)
+			return err;
+
+		if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE])
+			return -EINVAL;
+
+		zone = nla_get_u16(attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
+
+		ovs_lock();
+		ct_limit_del(info, zone);
+		ovs_unlock();
+	}
+	return 0;
+}
+
+static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
+					  struct sk_buff *reply)
+{
+	int err;
+	struct nlattr *nla_nested;
+
+	nla_nested = nla_nest_start(reply, OVS_CT_ZONE_LIMIT_ATTR_GET_RLY);
+
+	err = nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT,
+			  info->default_limit);
+	if (err)
+		return err;
+
+	nla_nest_end(reply, nla_nested);
+	return 0;
+}
+
+static int ovs_ct_limit_get_zone_limit(struct net *net,
+				       struct nlattr *nla_zone_limit,
+				       struct ovs_ct_limit_info *info,
+				       struct sk_buff *reply)
+{
+	struct nlattr *nla, *nla_nested;
+	int rem, err;
+	u16 zone;
+	u32 limit, count, conncount_key[5];
+	struct nf_conntrack_zone ct_zone;
+
+	nla_for_each_nested(nla, nla_zone_limit, rem) {
+		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
+
+		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_GET_REQ)
+			return -EINVAL;
+
+		err = nla_parse(attr, OVS_CT_ZONE_LIMIT_ATTR_MAX,
+				nla_data(nla), nla_len(nla),
+				ct_zone_limit_policy, NULL);
+		if (err)
+			return err;
+
+		if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE])
+			return -EINVAL;
+
+		zone = nla_get_u16(attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
+		nf_ct_zone_init(&ct_zone, zone, NF_CT_DEFAULT_ZONE_DIR, 0);
+		rcu_read_lock();
+		limit = ct_limit_get(info, zone);
+		rcu_read_unlock();
+
+		conncount_key[0] = zone;
+		count = nf_conncount_count(net, info->data, conncount_key,
+					   NULL, &ct_zone);
+
+		nla_nested = nla_nest_start(reply,
+					    OVS_CT_ZONE_LIMIT_ATTR_GET_RLY);
+		if (nla_put_u16(reply, OVS_CT_ZONE_LIMIT_ATTR_ZONE, zone) ||
+		    nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_LIMIT, limit) ||
+		    nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_COUNT, count))
+			return -EMSGSIZE;
+		nla_nest_end(reply, nla_nested);
+	}
+
+	return 0;
+}
+
+static int ovs_ct_limit_cmd_set(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
+	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	int err;
+
+	if (!a[OVS_CT_LIMIT_ATTR_OPTION])
+		return -EINVAL;
+
+	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_SET,
+					     &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	err = ovs_ct_limit_set_zone_limit(a[OVS_CT_LIMIT_ATTR_OPTION],
+					  ct_limit_info);
+	if (err)
+		goto exit_err;
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_err:
+	nlmsg_free(reply);
+	return err;
+}
+
+static int ovs_ct_limit_cmd_del(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
+	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	int err;
+
+	if (!a[OVS_CT_LIMIT_ATTR_OPTION])
+		return -EINVAL;
+
+	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_DEL,
+					     &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	err = ovs_ct_limit_del_zone_limit(a[OVS_CT_LIMIT_ATTR_OPTION],
+					  ct_limit_info);
+	if (err)
+		goto exit_err;
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_err:
+	nlmsg_free(reply);
+	return err;
+}
+
+static int ovs_ct_limit_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct nlattr *nla_reply;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct net *net = sock_net(skb->sk);
+	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	int err;
+
+	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_GET,
+					     &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	nla_reply = nla_nest_start(reply, OVS_CT_LIMIT_ATTR_OPTION);
+
+	err = ovs_ct_limit_get_default_limit(ct_limit_info, reply);
+	if (err)
+		goto exit_err;
+
+	if (a[OVS_CT_LIMIT_ATTR_OPTION]) {
+		err = ovs_ct_limit_get_zone_limit(
+			net, a[OVS_CT_LIMIT_ATTR_OPTION], ct_limit_info,
+			reply);
+		if (err)
+			goto exit_err;
+	}
+
+	nla_nest_end(reply, nla_reply);
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_err:
+	nlmsg_free(reply);
+	return err;
+}
+
+static const struct genl_ops ct_limit_genl_ops[] = {
+	{ .cmd = OVS_CT_LIMIT_CMD_SET,
+		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+					   * privilege. */
+		.policy = ct_limit_policy,
+		.doit = ovs_ct_limit_cmd_set,
+	},
+	{ .cmd = OVS_CT_LIMIT_CMD_DEL,
+		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+					   * privilege. */
+		.policy = ct_limit_policy,
+		.doit = ovs_ct_limit_cmd_del,
+	},
+	{ .cmd = OVS_CT_LIMIT_CMD_GET,
+		.flags = 0,		  /* OK for unprivileged users. */
+		.policy = ct_limit_policy,
+		.doit = ovs_ct_limit_cmd_get,
+	},
+};
+
+static const struct genl_multicast_group ovs_ct_limit_multicast_group = {
+	.name = OVS_CT_LIMIT_MCGROUP,
+};
+
+struct genl_family dp_ct_limit_genl_family __ro_after_init = {
+	.hdrsize = sizeof(struct ovs_header),
+	.name = OVS_CT_LIMIT_FAMILY,
+	.version = OVS_CT_LIMIT_VERSION,
+	.maxattr = OVS_CT_LIMIT_ATTR_MAX,
+	.netnsok = true,
+	.parallel_ops = true,
+	.ops = ct_limit_genl_ops,
+	.n_ops = ARRAY_SIZE(ct_limit_genl_ops),
+	.mcgrps = &ovs_ct_limit_multicast_group,
+	.n_mcgrps = 1,
+	.module = THIS_MODULE,
+};
+#endif
+
+int ovs_ct_init(struct net *net)
 {
 	unsigned int n_bits = sizeof(struct ovs_key_ct_labels) * BITS_PER_BYTE;
 	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
@@ -1666,12 +2151,22 @@  void ovs_ct_init(struct net *net)
 	} else {
 		ovs_net->xt_label = true;
 	}
+
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	return ovs_ct_limit_init(net, ovs_net);
+#else
+	return 0;
+#endif
 }
 
 void ovs_ct_exit(struct net *net)
 {
 	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	ovs_ct_limit_exit(net, ovs_net);
+#endif
+
 	if (ovs_net->xt_label)
 		nf_connlabels_put(net);
 }
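
Note on the conntrack.c changes above: the per-zone lookup and the limit
check can be modelled in user space to make the intended semantics easier
to review. The following is only a minimal sketch under stated assumptions,
not the patch's implementation. All sketch_* names are hypothetical, the
bucket count of 512 is an assumed value (the mask in ct_limit_hash_bucket()
only works for a power of two), the "unlimited" value of 0 follows the
commit message, and plain singly linked lists stand in for the RCU hlists.
The connection count is passed in directly instead of coming from
nf_conncount_count().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the patch defines its own constants. */
#define SKETCH_BUCKETS   512u	/* must be a power of two for the mask */
#define SKETCH_UNLIMITED 0u	/* a limit of zero means "no limit" */

struct sketch_limit {
	uint16_t zone;
	uint32_t limit;
	struct sketch_limit *next;
};

struct sketch_info {
	struct sketch_limit *buckets[SKETCH_BUCKETS];
	uint32_t default_limit;
};

/* Same bucket selection as ct_limit_hash_bucket(): mask the zone id. */
static size_t sketch_bucket_index(uint16_t zone)
{
	return zone & (SKETCH_BUCKETS - 1);
}

/* Add a limit for a zone, or overwrite the limit if the zone exists.
 * Unlike the kernel code, this updates in place; there is no RCU here.
 */
static void sketch_set(struct sketch_info *info, struct sketch_limit *node)
{
	size_t idx = sketch_bucket_index(node->zone);
	struct sketch_limit *cur;

	for (cur = info->buckets[idx]; cur; cur = cur->next) {
		if (cur->zone == node->zone) {
			cur->limit = node->limit;
			return;
		}
	}
	node->next = info->buckets[idx];
	info->buckets[idx] = node;
}

/* Return the zone's own limit, falling back to the default limit. */
static uint32_t sketch_get(const struct sketch_info *info, uint16_t zone)
{
	const struct sketch_limit *cur;

	for (cur = info->buckets[sketch_bucket_index(zone)]; cur;
	     cur = cur->next) {
		if (cur->zone == zone)
			return cur->limit;
	}
	return info->default_limit;
}

/* Mirror of the decision in ovs_ct_check_limit(): 0 or -ENOMEM. */
static int sketch_check(const struct sketch_info *info, uint16_t zone,
			uint32_t connections)
{
	uint32_t limit = sketch_get(info, zone);

	if (limit == SKETCH_UNLIMITED)
		return 0;
	return connections > limit ? -ENOMEM : 0;
}
```

In this model, two zones that differ by a multiple of the bucket count land
in the same bucket and are told apart by the zone comparison in the list
walk. A zone with no entry inherits the default limit, so with a default of
zero it is never rejected.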
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 399dfdd2c4f9..900dadd70974 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -17,10 +17,11 @@ 
 #include "flow.h"
 
 struct ovs_conntrack_info;
+struct ovs_ct_limit_info;
 enum ovs_key_attr;
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-void ovs_ct_init(struct net *);
+int ovs_ct_init(struct net *);
 void ovs_ct_exit(struct net *);
 bool ovs_ct_verify(struct net *, enum ovs_key_attr attr);
 int ovs_ct_copy_action(struct net *, const struct nlattr *,
@@ -44,7 +45,7 @@  void ovs_ct_free_action(const struct nlattr *a);
 #else
 #include <linux/errno.h>
 
-static inline void ovs_ct_init(struct net *net) { }
+static inline int ovs_ct_init(struct net *net) { return 0; }
 
 static inline void ovs_ct_exit(struct net *net) { }
 
@@ -104,4 +105,8 @@  static inline void ovs_ct_free_action(const struct nlattr *a) { }
 
 #define CT_SUPPORTED_MASK 0
 #endif /* CONFIG_NF_CONNTRACK */
+
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+extern struct genl_family dp_ct_limit_genl_family;
+#endif
 #endif /* ovs_conntrack.h */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 015e24e08909..a61818e94396 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2288,6 +2288,9 @@  static struct genl_family * const dp_genl_families[] = {
 	&dp_flow_genl_family,
 	&dp_packet_genl_family,
 	&dp_meter_genl_family,
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	&dp_ct_limit_genl_family,
+#endif
 };
 
 static void dp_unregister_genl(int n_families)
@@ -2323,8 +2326,7 @@  static int __net_init ovs_init_net(struct net *net)
 
 	INIT_LIST_HEAD(&ovs_net->dps);
 	INIT_WORK(&ovs_net->dp_notify_work, ovs_dp_notify_wq);
-	ovs_ct_init(net);
-	return 0;
+	return ovs_ct_init(net);
 }
 
 static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
@@ -2469,3 +2471,4 @@  MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
+MODULE_ALIAS_GENL_FAMILY(OVS_CT_LIMIT_FAMILY);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 523d65526766..51bd4dcb6c8b 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -144,6 +144,7 @@  struct dp_upcall_info {
 struct ovs_net {
 	struct list_head dps;
 	struct work_struct dp_notify_work;
+	struct ovs_ct_limit_info *ct_limit_info;
 
 	/* Module reference for configuring conntrack. */
 	bool xt_label;