[ovs-dev,branch-2.7,24/25] compat: nf_ct_delete compat.

Submitted by Jarno Rajahalme on March 15, 2017, 11:31 p.m.

Details

Message ID 1489620689-122370-25-git-send-email-jarno@ovn.org
State Superseded

Commit Message

Upstream commit:

    commit f330a7fdbe1611104622faff7e614a246a7d20f0
    Author: Florian Westphal <fw@strlen.de>
    Date:   Thu Aug 25 15:33:31 2016 +0200

    netfilter: conntrack: get rid of conntrack timer

    With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
    Eric Dumazet pointed out during netfilter workshop 2016.

    Eric also says: "Another reason was the fact that Thomas was about to
    change max timer range [..]" (500462a9de657f8, 'timers: Switch to
    a non-cascading wheel').

    Remove the timer and use a 32bit jiffies value containing timestamp until
    entry is valid.

    During conntrack lookup, even before doing tuple comparison, check
    the timeout value and evict the entry in case it is too old.

    The dying bit is used as a synchronization point to avoid races where
    multiple cpus try to evict the same entry.

    Because lookup is always lockless, we need to bump the refcnt once
    when we evict, else we could try to evict already-dead entry that
    is being recycled.

    This is the standard/expected way when conntrack entries are destroyed.

    Followup patches will introduce garbage collection via work queue
    and further places where we can reap obsoleted entries (e.g. during
    netlink dumps), this is needed to avoid expired conntracks from hanging
    around for too long when lookup rate is low after a busy period.

    Signed-off-by: Florian Westphal <fw@strlen.de>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
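
A minimal user-space sketch of the expiry check that the upstream commit
message above describes (a 32-bit tick value holding the time until
which the entry is valid).  This is not the kernel's code: the name
entry_expired() and the plain uint32_t arguments are assumptions made
for illustration.  The signed difference keeps the comparison correct
across counter wraparound, provided the two values are less than 2^31
ticks apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'timeout' is the absolute 32-bit tick at which the
 * entry stops being valid, 'now' is the current 32-bit tick.
 */
static bool entry_expired(uint32_t timeout, uint32_t now)
{
	/* Unsigned subtraction wraps; the signed view gives the distance. */
	return (int32_t)(timeout - now) <= 0;
}

/* A few self-checks, including cases that straddle the wraparound point. */
static void entry_expired_self_check(void)
{
	assert(!entry_expired(100u, 99u));        /* still valid */
	assert(entry_expired(100u, 100u));        /* expires exactly now */
	assert(!entry_expired(5u, 0xFFFFFFF0u));  /* timeout is after the wrap */
	assert(entry_expired(0xFFFFFFF0u, 10u));  /* timeout was before the wrap */
}
```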

Upstream commit f330a7fdbe16 ("netfilter: conntrack: get rid of
conntrack timer") changes the way nf_ct_delete() is called.  Prior to
that commit, the call pattern was:

       if (del_timer(&ct->timeout))
               nf_ct_delete(ct, ...);

After this change nf_ct_delete() is called directly:

       nf_ct_delete(ct, ...);

This patch provides a replacement implementation of nf_ct_delete()
that first calls del_timer().  The replacement is only used if
struct nf_conn has a member 'timeout' of type 'struct timer_list'.
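
The wrapper pattern described above can be sketched in user space as
follows.  Every name here (mock_timer, mock_conn, mock_del_timer(),
mock_ct_delete(), compat_ct_delete()) is a hypothetical stand-in rather
than a kernel definition; the only behavior carried over is that the
delete step runs only for the caller that actually deactivated a
pending timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct timer_list and struct nf_conn. */
struct mock_timer {
	bool pending;
};

struct mock_conn {
	struct mock_timer timeout;
	int deletions;	/* how many times the delete step ran */
};

/* Mimics del_timer(): true only if a pending timer was deactivated. */
static bool mock_del_timer(struct mock_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Stand-in for an older nf_ct_delete() that expects the timer to be gone. */
static bool mock_ct_delete(struct mock_conn *ct)
{
	ct->deletions++;
	return true;
}

/* Callers use the new direct-call convention; the wrapper adds the guard. */
static bool compat_ct_delete(struct mock_conn *ct)
{
	assert(ct != NULL);
	if (mock_del_timer(&ct->timeout))
		return mock_ct_delete(ct);
	return false;
}
```

In this model a second call on the same entry returns false without
running the delete step again, because the timer is no longer pending.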

The following patch introduces the first caller of nf_ct_delete() in
the OVS kernel module.

Linux kernels older than 3.12 do not have nf_ct_delete() at all, so we
inline it if it does not exist.  The inlined code is from the 3.11
death_by_timeout(), which in later versions simply calls nf_ct_delete().
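
The control flow of the inlined fallback, once the timer has been
deleted, can be modelled roughly as below.  This is a user-space sketch
under stated assumptions, not the kernel code: struct mock_conn and its
fields are hypothetical, 'event_delivered' stands in for
nf_conntrack_event(IPCT_DESTROY, ct) returning a non-negative value, and
'retry_armed' stands in for nf_ct_dying_timeout(), which is assumed here
to schedule a later retry of the destroy event.  Timestamp handling is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a conntrack entry. */
struct mock_conn {
	bool dying;		/* stands in for IPS_DYING_BIT */
	bool in_lists;		/* entry is still linked in the mock tables */
	bool retry_armed;	/* stands in for nf_ct_dying_timeout() */
	int refcnt;
};

/*
 * Returns true when the entry was fully deleted, false when the destroy
 * event could not be delivered and deletion was deferred.
 */
static bool mock_delete_after_timer(struct mock_conn *ct, bool event_delivered)
{
	assert(ct != NULL);

	/* The event is only attempted if the entry is not already dying. */
	if (!ct->dying && !event_delivered) {
		ct->in_lists = false;
		ct->retry_armed = true;
		return false;
	}

	ct->dying = true;
	ct->in_lists = false;
	ct->refcnt--;	/* stands in for nf_ct_put() */
	return true;
}
```

An entry that is already marked dying skips the event step and is
released directly, which matches the short-circuit order of the
condition in the patch.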

Upstream commit 02982c27ba1e1bd9f9d4747214e19ca83aa88d0e introduced
nf_ct_delete() in Linux 3.12.  This commit has the original code that
is being inlined here.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
---
 acinclude.m4                                       |  6 ++++
 .../include/net/netfilter/nf_conntrack_core.h      | 37 ++++++++++++++++++++++
 2 files changed, 43 insertions(+)

Patch

diff --git a/acinclude.m4 b/acinclude.m4
index 751dc63..f7bc53f 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -525,6 +525,12 @@  AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
   OVS_FIND_FIELD_IFELSE([$KSRC/include/linux/netfilter_ipv6.h], [nf_ipv6_ops],
                         [fragment.*sock], [OVS_DEFINE([HAVE_NF_IPV6_OPS_FRAGMENT])])
 
+  OVS_FIND_FIELD_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+                        [nf_conn], [struct timer_list[[ \t]]*timeout],
+                        [OVS_DEFINE([HAVE_NF_CONN_TIMER])])
+  OVS_GREP_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
+                  [nf_ct_delete(], [OVS_DEFINE([HAVE_NF_CT_DELETE])])
+
   OVS_FIND_PARAM_IFELSE([$KSRC/include/net/netfilter/nf_conntrack.h],
                   [nf_ct_tmpl_alloc], [nf_conntrack_zone],
                   [OVS_DEFINE([HAVE_NF_CT_TMPL_ALLOC_TAKES_STRUCT_ZONE])])
diff --git a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
index 16b57a6..715e1d5 100644
--- a/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
+++ b/datapath/linux/compat/include/net/netfilter/nf_conntrack_core.h
@@ -88,4 +88,41 @@  static unsigned int rpl_nf_conntrack_in(struct net *net, u_int8_t pf,
 #define nf_conntrack_in rpl_nf_conntrack_in
 #endif /* < 4.10 */
 
+#ifdef HAVE_NF_CONN_TIMER
+
+#ifndef HAVE_NF_CT_DELETE
+#include <net/netfilter/nf_conntrack_timestamp.h>
+#endif
+
+static inline bool rpl_nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
+{
+	if (del_timer(&ct->timeout))
+#ifdef HAVE_NF_CT_DELETE
+		return nf_ct_delete(ct, portid, report);
+#else
+	{
+		struct nf_conn_tstamp *tstamp;
+
+		tstamp = nf_conn_tstamp_find(ct);
+		if (tstamp && tstamp->stop == 0)
+			tstamp->stop = ktime_to_ns(ktime_get_real());
+
+		if (!test_bit(IPS_DYING_BIT, &ct->status) &&
+		    unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
+			/* destroy event was not delivered */
+			nf_ct_delete_from_lists(ct);
+			nf_ct_dying_timeout(ct);
+			return false;
+		}
+		set_bit(IPS_DYING_BIT, &ct->status);
+		nf_ct_delete_from_lists(ct);
+		nf_ct_put(ct);
+		return true;
+	}
+#endif
+	return false;
+}
+#define nf_ct_delete rpl_nf_ct_delete
+#endif /* HAVE_NF_CONN_TIMER */
+
 #endif /* _NF_CONNTRACK_CORE_WRAPPER_H */