[ovs-dev] 答复: [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack
diff mbox series

Message ID b4da3f3d3a3e43539fd6bc72d6815725@jd.com
State New
Headers show
Series
  • [ovs-dev] 答复: [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack
Related show

Commit Message

姜立东 June 6, 2019, 1:42 a.m. UTC
Thanks Darrell.
So glad to make the requirement clear by this discussion, expecting to see the latest patch.

发件人: Darrell Ball <dlu998@gmail.com>
发送时间: 2019年6月6日 1:12
收件人: 姜立东 <jianglidong3@jd.com>
抄送: dev@openvswitch.org; 王志克 <wangzhike@jd.com>
主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack



On Wed, Jun 5, 2019 at 9:35 AM Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>> wrote:


On Tue, Jun 4, 2019 at 11:41 PM 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>> wrote:
comments inline.

发件人: Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>>
发送时间: 2019年6月5日 13:31
收件人: 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
抄送: dev@openvswitch.org<mailto:dev@openvswitch.org>; 王志克 <wangzhike@jd.com<mailto:wangzhike@jd.com>>
主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack



On Tue, Jun 4, 2019 at 7:13 PM 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>> wrote:
Please check comments inline.

BR, Lidong

发件人: Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>>
发送时间: 2019年6月4日 23:33
收件人: 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
抄送: dev@openvswitch.org<mailto:dev@openvswitch.org>; 王志克 <wangzhike@jd.com<mailto:wangzhike@jd.com>>
主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack



On Mon, Jun 3, 2019 at 7:59 PM 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>> wrote:
Hi Darrell,

The “tighter checking” on TCP_RST in kernel, do you mean processes under “TCP_CONNTRACK_CLOSE” case?
There is only one situation(when th->seq is less than peer td_maxack, as mentioned) may leads to explicitly –NF_ACCEPT, kernel implementation just lets most TCP_RST packets pass when TCP liberal mode is enabled.

yep, this is the tighter checking the kernel does for RSTs in general
Userspace code has a tighter check for RSTs
Why not follow kernel tight checking then? ☺

Meanwhile, I think the purpose of TCP liberal mode is to skip TCP SEQ check in conntrack, when SEQ out of sync is normal situation in some application scenarios.
For example, TCP_RST packets can have very different SEQ than what is captured in conntrack in hardware-offloading scenario, because most of packets are forwarded by hardware.

TCP resets are used in selected valid scenarios, but are more often associated with DOS scenarios, so they are a special case
Removing most or all protection from reset attacks seems irresponsible.

TCP reset happens in few scenarios in theory, but it can happen frequently in big scale applicant scenario with huge connections, some software clients use sensible shorter wait time, they may close socket before server response.
So it depends on TCP RST function to free server resource, it is necessary requirement just as what we can observer in our network every day, that is why we need to keep TCP_RST function working either in kernel conntrack and in user space conntrack.

That is common; however, the question is why not pass through the RST in offload mode in this scenario and let conntrack do cleanup on reuse, if/when necessary ?

If pass through TCP_RST in hardware, it leaves garbage information in OVS conntrack, while this connection has been cleared in both endpoints. When/if same port is reused, this garbage information may cause other problems.

'Reuse' is a pre-existing capability that handles this kind of case and the summary behavior in this use case will be similar.

In retrospect, the reuse capability may not meet your requirements in all cases.
Anyways, your second point below is sufficient.

I will modify the patch for 'no-tcp-seq-chk'



Also such solution makes OVS conntrack connection information less valuable, once flow is offloaded, it’s information is meaningless.

From this point in the lifecycle of the connection, there is just a little more to do anyways, 'normally'.

That being said, there can be added value features that may come into play here in the non-offload path.



Meanwhile, kernel implementation of this feature has been there years, we don’t see serious security issue is involved.

:-)

Why don’t we do less limitation, leave the decision to user of this mode, considering TCP liberal is mostly related to special deployment scenario.

'no-tcp-seq-chk' mode is a related feature and can be easily incrementally added or in lieu of 'liberal' mode; this will do as you require.




Hardware offload implementations can be done well or poorly; catering to poor implementations at the cost to every other use case
may not be the best approach.

No matter good or bad design, TCP RST is needed to let conntrack know that this connection is closing, and recycle this connection in short time.

In this scenario, TCP_RST can work normally with kernel TCP liberal mode because it is passed to destination, but won’t work with user space conntrack with current change due to being dropped.

I think tighter check is good for normal situation, but may be conflict with the purpose of TCP liberal mode by putting more limitations.

BR, Lidong

发件人: Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>>
发送时间: 2019年6月3日 23:53
收件人: 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
抄送: dev@openvswitch.org<mailto:dev@openvswitch.org>; 王志克 <wangzhike@jd.com<mailto:wangzhike@jd.com>>
主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack



On Mon, Jun 3, 2019 at 12:21 AM 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>> wrote:
Hi Darrell,

By checking latest patch, we notice it has inconsistent behavior on TCP_RST packet with
kernel tcp_liberal option.
In kernel, TCP_RST packet is only checked for TCP_CONNTRACK_CLOSE state,

TCP_CONNTRACK_CLOSE is the 'next state' on receiving RST


TCP_RST
packet is dropped if its seq is less than peer acked.
In other cases, TCP_RST packets are processed in  tcp_in_window() as other packets, if
tcp_liberal is enabled, they always are accepted.

On receipt of a RST in any present state, tighter checking is done; this checking is done
before tcp_in_window<https://elixir.bootlin.com/linux/v4.9.12/ident/tcp_in_window>() is executed.

Do you mean “tighter checking” is processes under “TCP_CONNTRACK_CLOSE” case? There is

Please refer to tcp_packet() and tcp_in_window() in kernel conntrack.

Current implementation may cause TCP_RST packet is marked as INVALID by conntrack
due to old seq info in conntrack in hardware-offloading scenario.
So that we suggest to move conntrack_get_tcp_liberal(ct) up in this check to include
TCP_RST packet as below,

I think the tighter checking for RSTs  I am proposing is generally consistent, although BSD and
Linux are somewhat different.




-    if (SEQ_GEQ(src->seqhi, end)

-        /* Last octet inside other's window space */

-        && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))

-        /* Retrans: not more than one window back */

-        && (ackskew >= -MAXACKWINDOW)

-        /* Acking not more than one reassembled fragment backwards */

-        && (ackskew <= (MAXACKWINDOW << sws))

-        /* Acking not more than one window forward */

+    if ((SEQ_GEQ(src->seqhi, end)

+          /* Last octet inside other's window space */

+          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))

+          /* Retrans: not more than one window back */

+          && ackskew >= -MAXACKWINDOW

+          /* Acking not more than one reassembled fragment backwards */

+          && ackskew <= (MAXACKWINDOW << sws))

+          /* Acking not more than one window forward */

         && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo

-            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {

+             || (orig_seq == src->seqlo + 1)

+             || (orig_seq + 1 == src->seqlo)))

+        || conntrack_get_tcp_liberal(ct)) {




Thanks,
Lidong

发件人: Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>>
发送时间: 2019年6月1日 2:11
收件人: 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
抄送: dev@openvswitch.org<mailto:dev@openvswitch.org>; 王志克 <wangzhike@jd.com<mailto:wangzhike@jd.com>>
主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack



On Fri, May 31, 2019 at 9:40 AM Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>> wrote:


On Fri, May 31, 2019 at 2:42 AM 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>> wrote:
Hi Darrell,

Thanks for prompt action.

I think there are 2 main differences in your change,
1. Replace Open_vswitch.other_config by dpctl CLI command.
2. One more case to check tcp_liberal as below.
@@ -333,10 +335,12 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
     } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT
                 || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2
                 || src->state >= CT_DPIF_TCPS_FIN_WAIT_2)
-               && SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
-               /* Within a window forward of the originating packet */
-               && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) {
-               /* Within a window backward of the originating packet */
+               && ((SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
+                    /* Within a window forward of the originating packet */
+                    && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW))
+                    /* Within a window backward of the originating packet */
+                   || (conntrack_get_tcp_liberal(ct)
+                       && (tcp_flags & TCP_RST) == 0)))
For item 1, it is fine for us.
For item 2, we didn’t actually hit into this conditional branch in our application, but considering conntrack is in the middle of this path,
it is reasonable to add this check and leave SEQ validation  to endpoints’ local TCP state machine. So it looks good to us.

Lidong

I just noticed you were referring to the "else if" condition above
This change had been dropped in v2 since it is not needed, here:

https://patchwork.ozlabs.org/patch/1107761/

The change in the 'if' condition check is the main algorithm change; you might want to
compare it to the proposed change that you had sent out.

Thanks Darrell




diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c

index 397aca1..e95d2f3 100644

--- a/lib/conntrack-tcp.c

+++ b/lib/conntrack-tcp.c

@@ -272,16 +272,18 @@  tcp_conn_update(struct conntrack *ct, struct conn *conn_,



     int ackskew = check_ackskew ? dst->seqlo - ack : 0;

 #define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge factor */

-    if (SEQ_GEQ(src->seqhi, end)

-        /* Last octet inside other's window space */

-        && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))

-        /* Retrans: not more than one window back */

-        && (ackskew >= -MAXACKWINDOW)

-        /* Acking not more than one reassembled fragment backwards */

-        && (ackskew <= (MAXACKWINDOW << sws))

-        /* Acking not more than one window forward */

+    if (((SEQ_GEQ(src->seqhi, end)

+          /* Last octet inside other's window space */

+          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))

+          /* Retrans: not more than one window back */

+          && ackskew >= -MAXACKWINDOW

+          /* Acking not more than one reassembled fragment backwards */

+          && ackskew <= (MAXACKWINDOW << sws))

+         || conntrack_get_tcp_liberal(ct))

+          /* Acking not more than one window forward */

         && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo

-            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {

+             || (orig_seq == src->seqlo + 1)

+             || (orig_seq + 1 == src->seqlo))) {

         /* Require an exact/+1 sequence match on resets when possible */



         /* update max window */





Thanks Lidong

The algorithm I used adheres to the liberal mode definition.

For future reference, another important aspect is test support.

Darrell


BR,
Lidong

发件人: Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>>
发送时间: 2019年5月30日 14:51
收件人: 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
抄送: dev@openvswitch.org<mailto:dev@openvswitch.org>; 王志克 <wangzhike@jd.com<mailto:wangzhike@jd.com>>
主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack

Hi Lidong/Zhike

I sent an alternative implementation: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359364.html

Pls take a look and check if it meets your requirements.

Thanks Darrell

On Fri, May 24, 2019 at 10:11 AM Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>> wrote:
Thanks for the patch Lidong/Zhike

I took an initial look and will send a response next week.

Darrell

On Tue, May 21, 2019 at 8:53 PM 姜立东 <jianglidong3@jd.com<mailto:jianglidong3@jd.com>> wrote:
From 3ce112684921bca74839e109fda91848aa024a54 Mon Sep 17 00:00:00 2001
From: Jiang Lidong <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
Date: Wed, 22 May 2019 11:21:34 +0800
Subject: [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack

Adding similar cp_in_liberal option in userspace conntrack as
kernel conntrack does to skip seq check on tcp connection.
It prevents packet is marked as INVALID by stable seq
info in conntrack connection.

This option can help to make traffic survive in hardware
offloading cases, especially when traffic is being moved
back to software path from hardware forwarding engine.

Signed-off-by: Lidong Jiang <jianglidong3@jd.com<mailto:jianglidong3@jd.com>>
Signed-off-by: Zhike Wang <wangzhike@jd.com<mailto:wangzhike@jd.com>>

---
lib/conntrack-private.h | 2 ++
lib/conntrack-tcp.c     | 5 +++--
lib/conntrack.c         | 6 ++++++
lib/conntrack.h         | 4 +++-
lib/dpif-netdev.c       | 6 ++++++
5 files changed, 20 insertions(+), 3 deletions(-)

Patch
diff mbox series

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 51b7d7f..9bc99cd 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -172,6 +172,8 @@  struct conntrack {
     /* Fragmentation handling context. */
     struct ipf *ipf;
+
+    bool tcp_be_liberal;
};
 /* Lock acquisition order:
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 397aca1..61abafb 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -272,7 +272,7 @@  tcp_conn_update(struct conntrack *ct, struct conn *conn_,
     int ackskew = check_ackskew ? dst->seqlo - ack : 0;
#define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge factor */
-    if (SEQ_GEQ(src->seqhi, end)
+    if ((SEQ_GEQ(src->seqhi, end)
         /* Last octet inside other's window space */
         && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
         /* Retrans: not more than one window back */
@@ -281,7 +281,8 @@  tcp_conn_update(struct conntrack *ct, struct conn *conn_,
         && (ackskew <= (MAXACKWINDOW << sws))
         /* Acking not more than one window forward */
         && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
-            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
+            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo)))
+            || (ct->tcp_be_liberal)) {
         /* Require an exact/+1 sequence match on resets when possible */
         /* update max window */
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 6711f5e..bd92710 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2282,6 +2282,12 @@  conntrack_ipf_ctx(struct conntrack *ct)
     return ct->ipf;
}
+void
+conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled)
+{
+    ct->tcp_be_liberal = enabled;
+}
+
int
conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump,
                      const uint16_t *pzone, int *ptot_bkts)
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 2012150..b8d799d 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -119,5 +119,7 @@  int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns);
int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
-
+
+void conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled);
+
#endif /* conntrack.h */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5a6f2ab..ae6a18e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3823,6 +3823,12 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
     uint32_t tx_flush_interval, cur_tx_flush_interval;
     uint64_t rebalance_intvl;
+    bool tcp_be_liberal = smap_get_bool(other_config,
+                                        "conntrack_tcp_be_liberal",
+                                        false);
+
+    conntrack_set_tcp_be_liberal(&dp->conntrack, tcp_be_liberal);
+
     tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
                                      DEFAULT_TX_FLUSH_INTERVAL);
     atomic_read_relaxed(&dp->tx_flush_interval, &cur_tx_flush_interval);
--
1.8.3.1