From patchwork Wed Jun 5 06:41:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?5aec56uL5Lic?= X-Patchwork-Id: 1110332 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=jd.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 45JfMV64l9z9s4Y for ; Wed, 5 Jun 2019 16:41:25 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id B4A78C00; Wed, 5 Jun 2019 06:41:22 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 889F7B59 for ; Wed, 5 Jun 2019 06:41:20 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from smtp3.jd.com (smtp3.jd.com [59.151.64.88]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id C953F34F for ; Wed, 5 Jun 2019 06:41:15 +0000 (UTC) Received: from BJMAILD1MBX89.360buyAD.local (172.31.0.89) by BJMAILD1MBX44.360buyAD.local (172.31.0.44) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1415.2; Wed, 5 Jun 2019 14:41:10 +0800 Received: from BJMAILD1MBX70.360buyAD.local (172.31.0.70) by BJMAILD1MBX89.360buyAD.local (172.31.0.89) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1415.2; Wed, 5 Jun 2019 14:41:10 +0800 Received: from BJMAILD1MBX70.360buyAD.local ([fe80::e137:8c01:c5ea:adea]) by BJMAILD1MBX70.360buyAD.local ([fe80::e137:8c01:c5ea:adea%18]) with mapi id 15.01.1415.002; Wed, 5 Jun 2019 14:41:10 +0800 From: =?utf-8?b?5aec56uL5Lic?= To: Darrell Ball Thread-Topic: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack Thread-Index: AdUQUYwDyRpr0wdcTK+oOQzsS2j16ABvxzkAARgRFoAAOTI98AANrw4AAAMpbwAAj9Y/oAACOiEAACYJHsAAC40JgAAmQzRQ//+4FQD//2sBYA== Date: Wed, 5 Jun 2019 06:41:10 +0000 Message-ID: References: <52f59b3d7ed6405e89a4970e912569c4@jd.com> <0bddbceafb734ada85a047815a533112@jd.com> <1e7a02d906e946d79355c6ccbe2e3d24@jd.com> <0fd5a076b6cc41eab247d9b892061af5@jd.com> <2cb6f2dd5e664aada825501d554d488c@jd.com> In-Reply-To: Accept-Language: zh-CN, en-US Content-Language: zh-CN X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.31.14.12] MIME-Version: 1.0 X-Spam-Status: No, score=-0.9 required=5.0 tests=BAYES_00, FROM_EXCESS_BASE64, HTML_MESSAGE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Content-Filtered-By: Mailman/MimeDel 2.1.12 Cc: "dev@openvswitch.org" , =?utf-8?b?546L5b+X5YWL?= Subject: [ovs-dev] =?utf-8?b?562U5aSNOiAgW1BBVENIXSBjb25udHJhY2s6IGFkZCB0?= =?utf-8?q?cp=5Fin=5Fliberal_option_in_userspace_conntrack?= X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org comments inline. 发件人: Darrell Ball 发送时间: 2019年6月5日 13:31 收件人: 姜立东 抄送: dev@openvswitch.org; 王志克 主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack On Tue, Jun 4, 2019 at 7:13 PM 姜立东 > wrote: Please check comments inline. BR, Lidong 发件人: Darrell Ball > 发送时间: 2019年6月4日 23:33 收件人: 姜立东 > 抄送: dev@openvswitch.org; 王志克 > 主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack On Mon, Jun 3, 2019 at 7:59 PM 姜立东 > wrote: Hi Darrell, The “tighter checking” on TCP_RST in kernel, do you mean processes under “TCP_CONNTRACK_CLOSE” case? There is only one situation(when th->seq is less than peer td_maxack, as mentioned) may leads to explicitly –NF_ACCEPT, kernel implementation just lets most TCP_RST packets pass when TCP liberal mode is enabled. yep, this is the tighter checking the kernel does for RSTs in general Userspace code has a tighter check for RSTs Why not follow kernel tight checking then? ☺ Meanwhile, I think the purpose of TCP liberal mode is to skip TCP SEQ check in conntrack, when SEQ out of sync is normal situation in some application scenarios. For example, TCP_RST packets can have very different SEQ than what is captured in conntrack in hardware-offloading scenario, because most of packets are forwarded by hardware. TCP resets are used in selected valid scenarios, but are more often associated with DOS scenarios, so they are a special case Removing most or all protection from reset attacks seems irresponsible. TCP reset happens in few scenarios in theory, but it can happen frequently in big scale applicant scenario with huge connections, some software clients use sensible shorter wait time, they may close socket before server response. So it depends on TCP RST function to free server resource, it is necessary requirement just as what we can observer in our network every day, that is why we need to keep TCP_RST function working either in kernel conntrack and in user space conntrack. That is common; however, the question is why not pass through the RST in offload mode in this scenario and let conntrack do cleanup on reuse, if/when necessary ? If pass through TCP_RST in hardware, it leaves garbage information in OVS conntrack, while this connection has been cleared in both endpoints. When/if same port is reused, this garbage information may cause other problems. Also such solution makes OVS conntrack connection information less valuable, once flow is offloaded, it’s information is meaningless. Meanwhile, kernel implementation of this feature has been there years, we don’t see serious security issue is involved. :-) Why don’t we do less limitation, leave the decision to user of this mode, considering TCP liberal is mostly related to special deployment scenario. 'no-tcp-seq-chk' mode is a related feature and can be easily incrementally added or in lieu of 'liberal' mode; this will do as you require. Hardware offload implementations can be done well or poorly; catering to poor implementations at the cost to every other use case may not be the best approach. No matter good or bad design, TCP RST is needed to let conntrack know that this connection is closing, and recycle this connection in short time. In this scenario, TCP_RST can work normally with kernel TCP liberal mode because it is passed to destination, but won’t work with user space conntrack with current change due to being dropped. I think tighter check is good for normal situation, but may be conflict with the purpose of TCP liberal mode by putting more limitations. BR, Lidong 发件人: Darrell Ball > 发送时间: 2019年6月3日 23:53 收件人: 姜立东 > 抄送: dev@openvswitch.org; 王志克 > 主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack On Mon, Jun 3, 2019 at 12:21 AM 姜立东 > wrote: Hi Darrell, By checking latest patch, we notice it has inconsistent behavior on TCP_RST packet with kernel tcp_liberal option. In kernel, TCP_RST packet is only checked for TCP_CONNTRACK_CLOSE state, TCP_CONNTRACK_CLOSE is the 'next state' on receiving RST TCP_RST packet is dropped if its seq is less than peer acked. In other cases, TCP_RST packets are processed in tcp_in_window() as other packets, if tcp_liberal is enabled, they always are accepted. On receipt of a RST in any present state, tighter checking is done; this checking is done before tcp_in_window() is executed. Do you mean “tighter checking” is processes under “TCP_CONNTRACK_CLOSE” case? There is Please refer to tcp_packet() and tcp_in_window() in kernel conntrack. Current implementation may cause TCP_RST packet is marked as INVALID by conntrack due to old seq info in conntrack in hardware-offloading scenario. So that we suggest to move conntrack_get_tcp_liberal(ct) up in this check to include TCP_RST packet as below, I think the tighter checking for RSTs I am proposing is generally consistent, although BSD and Linux are somewhat different. - if (SEQ_GEQ(src->seqhi, end) - /* Last octet inside other's window space */ - && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) - /* Retrans: not more than one window back */ - && (ackskew >= -MAXACKWINDOW) - /* Acking not more than one reassembled fragment backwards */ - && (ackskew <= (MAXACKWINDOW << sws)) - /* Acking not more than one window forward */ + if ((SEQ_GEQ(src->seqhi, end) + /* Last octet inside other's window space */ + && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) + /* Retrans: not more than one window back */ + && ackskew >= -MAXACKWINDOW + /* Acking not more than one reassembled fragment backwards */ + && ackskew <= (MAXACKWINDOW << sws)) + /* Acking not more than one window forward */ && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo - || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) { + || (orig_seq == src->seqlo + 1) + || (orig_seq + 1 == src->seqlo))) + || conntrack_get_tcp_liberal(ct)) { Thanks, Lidong 发件人: Darrell Ball > 发送时间: 2019年6月1日 2:11 收件人: 姜立东 > 抄送: dev@openvswitch.org; 王志克 > 主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack On Fri, May 31, 2019 at 9:40 AM Darrell Ball > wrote: On Fri, May 31, 2019 at 2:42 AM 姜立东 > wrote: Hi Darrell, Thanks for prompt action. I think there are 2 main differences in your change, 1. Replace Open_vswitch.other_config by dpctl CLI command. 2. One more case to check tcp_liberal as below. @@ -333,10 +335,12 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 || src->state >= CT_DPIF_TCPS_FIN_WAIT_2) - && SEQ_GEQ(src->seqhi + MAXACKWINDOW, end) - /* Within a window forward of the originating packet */ - && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) { - /* Within a window backward of the originating packet */ + && ((SEQ_GEQ(src->seqhi + MAXACKWINDOW, end) + /* Within a window forward of the originating packet */ + && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) + /* Within a window backward of the originating packet */ + || (conntrack_get_tcp_liberal(ct) + && (tcp_flags & TCP_RST) == 0))) For item 1, it is fine for us. For item 2, we didn’t actually hit into this conditional branch in our application, but considering conntrack is in the middle of this path, it is reasonable to add this check and leave SEQ validation to endpoints’ local TCP state machine. So it looks good to us. Lidong I just noticed you were referring to the "else if" condition above This change had been dropped in v2 since it is not needed, here: https://patchwork.ozlabs.org/patch/1107761/ The change in the 'if' condition check is the main algorithm change; you might want to compare it to the proposed change that you had sent out. Thanks Darrell diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 397aca1..e95d2f3 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -272,16 +272,18 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, int ackskew = check_ackskew ? dst->seqlo - ack : 0; #define MAXACKWINDOW (0xffff + 1500) /* 1500 is an arbitrary fudge factor */ - if (SEQ_GEQ(src->seqhi, end) - /* Last octet inside other's window space */ - && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) - /* Retrans: not more than one window back */ - && (ackskew >= -MAXACKWINDOW) - /* Acking not more than one reassembled fragment backwards */ - && (ackskew <= (MAXACKWINDOW << sws)) - /* Acking not more than one window forward */ + if (((SEQ_GEQ(src->seqhi, end) + /* Last octet inside other's window space */ + && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) + /* Retrans: not more than one window back */ + && ackskew >= -MAXACKWINDOW + /* Acking not more than one reassembled fragment backwards */ + && ackskew <= (MAXACKWINDOW << sws)) + || conntrack_get_tcp_liberal(ct)) + /* Acking not more than one window forward */ && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo - || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) { + || (orig_seq == src->seqlo + 1) + || (orig_seq + 1 == src->seqlo))) { /* Require an exact/+1 sequence match on resets when possible */ /* update max window */ Thanks Lidong The algorithm I used adheres to the liberal mode definition. For future reference, another important aspect is test support. Darrell BR, Lidong 发件人: Darrell Ball > 发送时间: 2019年5月30日 14:51 收件人: 姜立东 > 抄送: dev@openvswitch.org; 王志克 > 主题: Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack Hi Lidong/Zhike I sent an alternative implementation: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359364.html Pls take a look and check if it meets your requirements. Thanks Darrell On Fri, May 24, 2019 at 10:11 AM Darrell Ball > wrote: Thanks for the patch Lidong/Zhike I took an initial look and will send a response next week. Darrell On Tue, May 21, 2019 at 8:53 PM 姜立东 > wrote: From 3ce112684921bca74839e109fda91848aa024a54 Mon Sep 17 00:00:00 2001 From: Jiang Lidong > Date: Wed, 22 May 2019 11:21:34 +0800 Subject: [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack Adding similar cp_in_liberal option in userspace conntrack as kernel conntrack does to skip seq check on tcp connection. It prevents packet is marked as INVALID by stable seq info in conntrack connection. This option can help to make traffic survive in hardware offloading cases, especially when traffic is being moved back to software path from hardware forwarding engine. Signed-off-by: Lidong Jiang > Signed-off-by: Zhike Wang > --- lib/conntrack-private.h | 2 ++ lib/conntrack-tcp.c | 5 +++-- lib/conntrack.c | 6 ++++++ lib/conntrack.h | 4 +++- lib/dpif-netdev.c | 6 ++++++ 5 files changed, 20 insertions(+), 3 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 51b7d7f..9bc99cd 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -172,6 +172,8 @@ struct conntrack { /* Fragmentation handling context. */ struct ipf *ipf; + + bool tcp_be_liberal; }; /* Lock acquisition order: diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 397aca1..61abafb 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -272,7 +272,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, int ackskew = check_ackskew ? dst->seqlo - ack : 0; #define MAXACKWINDOW (0xffff + 1500) /* 1500 is an arbitrary fudge factor */ - if (SEQ_GEQ(src->seqhi, end) + if ((SEQ_GEQ(src->seqhi, end) /* Last octet inside other's window space */ && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) /* Retrans: not more than one window back */ @@ -281,7 +281,8 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, && (ackskew <= (MAXACKWINDOW << sws)) /* Acking not more than one window forward */ && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo - || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) { + || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) + || (ct->tcp_be_liberal)) { /* Require an exact/+1 sequence match on resets when possible */ /* update max window */ diff --git a/lib/conntrack.c b/lib/conntrack.c index 6711f5e..bd92710 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2282,6 +2282,12 @@ conntrack_ipf_ctx(struct conntrack *ct) return ct->ipf; } +void +conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled) +{ + ct->tcp_be_liberal = enabled; +} + int conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump, const uint16_t *pzone, int *ptot_bkts) diff --git a/lib/conntrack.h b/lib/conntrack.h index 2012150..b8d799d 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -119,5 +119,7 @@ int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns); int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns); int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns); struct ipf *conntrack_ipf_ctx(struct conntrack *ct); - + +void conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled); + #endif /* conntrack.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5a6f2ab..ae6a18e 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3823,6 +3823,12 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) uint32_t tx_flush_interval, cur_tx_flush_interval; uint64_t rebalance_intvl; + bool tcp_be_liberal = smap_get_bool(other_config, + "conntrack_tcp_be_liberal", + false); + + conntrack_set_tcp_be_liberal(&dp->conntrack, tcp_be_liberal); + tx_flush_interval = smap_get_int(other_config, "tx-flush-interval", DEFAULT_TX_FLUSH_INTERVAL); atomic_read_relaxed(&dp->tx_flush_interval, &cur_tx_flush_interval); -- 1.8.3.1