From patchwork Tue Feb 13 14:14:12 2018
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 872860
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <1518531252.3715.178.camel@gmail.com>
Subject: [PATCH net-next] tcp: try to keep packet if SYN_RCV race is lost
From: Eric Dumazet
To: David Miller
Cc: netdev, 배석진
Date: Tue, 13 Feb 2018 06:14:12 -0800
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

배석진 reported that in some situations, packets for a given 5-tuple end up being processed by different CPUs. This involves RPS and fragmentation.

배석진 is seeing packet drops when a SYN_RECV request socket is moved into ESTABLISHED state. Other states are protected by the socket lock.

This is caused by a CPU losing the race, and simply not caring enough.

Since this seems to occur frequently, we can do better and perform a second lookup.
Note that all needed memory barriers are already in the existing code, thanks to the spin_lock()/spin_unlock() pair in inet_ehash_insert() and reqsk_put(). The second lookup must find the new socket, unless it has already been accepted and closed by another cpu.

Note that the fragmentation could be avoided in the first place by use of a correct TCP MSS option in the SYN{ACK} packet, but this does not mean we cannot be more robust.

Many thanks to 배석진 for a very detailed analysis.

Reported-by: 배석진
Signed-off-by: Eric Dumazet
---
 include/net/tcp.h        |  3 ++-
 net/ipv4/tcp_input.c     |  4 +++-
 net/ipv4/tcp_ipv4.c      | 13 ++++++++++++-
 net/ipv4/tcp_minisocks.c |  3 ++-
 net/ipv6/tcp_ipv6.c      | 13 ++++++++++++-
 5 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e3fc667f9ac2601d8f9cb50261a7948c41709664..92b06c6e7732ad7c61b580427fc085fa0dff1063 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -374,7 +374,8 @@ enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
 						struct sk_buff *skb,
 						const struct tcphdr *th);
 struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
-			   struct request_sock *req, bool fastopen);
+			   struct request_sock *req, bool fastopen,
+			   bool *lost_race);
 int tcp_child_process(struct sock *parent, struct sock *child,
 		      struct sk_buff *skb);
 void tcp_enter_loss(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 575d3c1fb6e835e225834ca45f58b74ea29e000b..a6b48f6253e3f91d396bf6b03f06be285ba1006c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5870,10 +5870,12 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 	tp->rx_opt.saw_tstamp = 0;
 	req = tp->fastopen_rsk;
 	if (req) {
+		bool req_stolen;
+
 		WARN_ON_ONCE(sk->sk_state != TCP_SYN_RECV &&
 		    sk->sk_state != TCP_FIN_WAIT1);
-		if (!tcp_check_req(sk, skb, req, true))
+		if (!tcp_check_req(sk, skb, req, true, &req_stolen))
 			goto discard;
 	}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f8ad397e285e9b8db0b04f8abc30a42f22294ef9..6d7e0c061dae14cdea66af73a77b5b7232085cd3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1672,6 +1672,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 
 	if (sk->sk_state == TCP_NEW_SYN_RECV) {
 		struct request_sock *req = inet_reqsk(sk);
+		bool req_stolen = false;
 		struct sock *nsk;
 
 		sk = req->rsk_listener;
@@ -1694,10 +1695,20 @@ int tcp_v4_rcv(struct sk_buff *skb)
 		th = (const struct tcphdr *)skb->data;
 		iph = ip_hdr(skb);
 		tcp_v4_fill_cb(skb, iph, th);
-		nsk = tcp_check_req(sk, skb, req, false);
+		nsk = tcp_check_req(sk, skb, req, false, &req_stolen);
 	}
 	if (!nsk) {
 		reqsk_put(req);
+		if (req_stolen) {
+			/* Another cpu got exclusive access to req
+			 * and created a full blown socket.
+			 * Try to feed this packet to this socket
+			 * instead of discarding it.
+			 */
+			tcp_v4_restore_cb(skb);
+			sock_put(sk);
+			goto lookup;
+		}
 		goto discard_and_relse;
 	}
 	if (nsk == sk) {
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index a8384b0c11f8fa589e2ed5311899b62c80a269f8..e7e36433cdb5d9aabb5d194ef6395f3c9e415d56 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -578,7 +578,7 @@ EXPORT_SYMBOL(tcp_create_openreq_child);
 
 struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 			   struct request_sock *req,
-			   bool fastopen)
+			   bool fastopen, bool *req_stolen)
 {
 	struct tcp_options_received tmp_opt;
 	struct sock *child;
@@ -785,6 +785,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	sock_rps_save_rxhash(child, skb);
 	tcp_synack_rtt_meas(child, req);
+	*req_stolen = !own_req;
 	return inet_csk_complete_hashdance(sk, child, req, own_req);
 
 listen_overflow:
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 412139f4eccd96923daaea064cd9fb8be13f5916..883df0ad5bfe9d5373c0f7ed37107cdc57959569 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1451,6 +1451,7 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	if (sk->sk_state == TCP_NEW_SYN_RECV) {
 		struct request_sock *req = inet_reqsk(sk);
+		bool req_stolen = false;
 		struct sock *nsk;
 
 		sk = req->rsk_listener;
@@ -1470,10 +1471,20 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 		th = (const struct tcphdr *)skb->data;
 		hdr = ipv6_hdr(skb);
 		tcp_v6_fill_cb(skb, hdr, th);
-		nsk = tcp_check_req(sk, skb, req, false);
+		nsk = tcp_check_req(sk, skb, req, false, &req_stolen);
 	}
 	if (!nsk) {
 		reqsk_put(req);
+		if (req_stolen) {
+			/* Another cpu got exclusive access to req
+			 * and created a full blown socket.
+			 * Try to feed this packet to this socket
+			 * instead of discarding it.
+			 */
+			tcp_v6_restore_cb(skb);
+			sock_put(sk);
+			goto lookup;
+		}
 		goto discard_and_relse;
 	}
 	if (nsk == sk) {