From patchwork Wed Jul 18 20:56:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuchung Cheng X-Patchwork-Id: 945909 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="ia8JH7Gr"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41W8b33GWKz9s4s for ; Thu, 19 Jul 2018 06:56:47 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729594AbeGRVgY (ORCPT ); Wed, 18 Jul 2018 17:36:24 -0400 Received: from mail-pl0-f66.google.com ([209.85.160.66]:43329 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726633AbeGRVgX (ORCPT ); Wed, 18 Jul 2018 17:36:23 -0400 Received: by mail-pl0-f66.google.com with SMTP id o7-v6so2563190plk.10 for ; Wed, 18 Jul 2018 13:56:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=w0369p8DlE0DHVYXoKLeATzaWg853ZGV8eUVWdS4SM0=; b=ia8JH7Grpz+tpWf0mLPfkFmrcz3b1hm6TI8c/qnRlf9SY8VwpP9LAjkdjNc6XQDeUZ FnIZL9FAjeBDc4Rj/Cik4yxudBTVB/64cUQST4qDXnN6gFC/6dXWzX5YI2KXPQWrmhL1 YDy8SXWfJ46ZDp+6DlJstIQ4voqnSRHCasJt9+kDMja4O9TWnh/sBruls7kotAXECJcu AG72JCh2k1NpYyotV7KWDs3p1+Sa531aFSBmVG2VxPXypUAnmBIV8UmFzUN2qEe6PDCS 8S6b1tUmRML1kzPvssk+VPy11B7oKcVHpEWixLTJihjHdnWWSJNsR8Hs7Hfw9+gwjBNs S5Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=w0369p8DlE0DHVYXoKLeATzaWg853ZGV8eUVWdS4SM0=; b=f1OZhAwsGy9gyBYCd3QIiv0YoW4IPpb6NdhZN7ZFDUBfG5szrOwtbFj0ouZrLZlW+N twbe0266d6+O2vxSfwKuIL+9B5LUGKyLGwjo/QbJ5bt9ST2htRfKf5rst+FN4fJCbgZS /f9wTrcWUcnUVdQYHgcG7TXFnLzfUoQ0GRC0VI3pb5LdL41R7WURSTXrsuTtw/Aidzeb s8VXMy4HTQU3U9GxT9nkgmm5ffhhmRZldzLjnjLIFbUwvsUN3ejoXY7qaYMt9tAcEAgC jX+NjPlt7HHWMvWUDCaCw0tYsd35Gbo3KZlGU4SAkjDVgIw2+kFsAjsinRfOL7f9WxeY ZPMQ== X-Gm-Message-State: AOUpUlG4AY/AOIuBjo92erLRYf5q4r+BdDy/F2NhOe+szHu373MV5Kmx RMxIW+RmNxHXPaH6k/hq6KlcZw== X-Google-Smtp-Source: AAOMgpdg1bA58kAukKEfJJXWq64RGDDWuwnNst4rI1MegCYvcCNGxT9SymBN8Z9bcHE/BJabFJuh1w== X-Received: by 2002:a17:902:b594:: with SMTP id a20-v6mr7445604pls.140.1531947403866; Wed, 18 Jul 2018 13:56:43 -0700 (PDT) Received: from ycheng2.svl.corp.google.com ([2620:15c:2c4:201:d660:6c0b:8a4f:4c77]) by smtp.gmail.com with ESMTPSA id r19-v6sm9328167pgo.68.2018.07.18.13.56.42 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 18 Jul 2018 13:56:42 -0700 (PDT) From: Yuchung Cheng To: davem@davemloft.net Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, brakmo@fb.com, ysseung@google.com, Yuchung Cheng Subject: [PATCH net 2/3] tcp: do not cancel delay-AcK on DCTCP special ACK Date: Wed, 18 Jul 2018 13:56:35 -0700 Message-Id: <20180718205636.210731-3-ycheng@google.com> X-Mailer: git-send-email 2.18.0.203.gfac676dfb9-goog In-Reply-To: <20180718205636.210731-1-ycheng@google.com> References: <20180718205636.210731-1-ycheng@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Currently when a DCTCP receiver delays an ACK and receive a data packet with a different CE mark from the previous one's, it sends two immediate ACKs acking previous and latest sequences respectly (for ECN accounting). Previously sending the first ACK may mark off the delayed ACK timer (tcp_event_ack_sent). This may subsequently prevent sending the second ACK to acknowledge the latest sequence (tcp_ack_snd_check). The culprit is that tcp_send_ack() assumes it always acknowleges the latest sequence, which is not true for the first special ACK. The fix is to not make the assumption in tcp_send_ack and check the actual ack sequence before cancelling the delayed ACK. Further it's safer to pass the ack sequence number as a local variable into tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid future bugs like this. Reported-by: Neal Cardwell Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: Eric Dumazet --- include/net/tcp.h | 1 + net/ipv4/tcp_dctcp.c | 34 ++++------------------------------ net/ipv4/tcp_output.c | 10 +++++++--- 3 files changed, 12 insertions(+), 33 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 3482d13d655b..a08de496d1b2 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -539,6 +539,7 @@ void tcp_send_fin(struct sock *sk); void tcp_send_active_reset(struct sock *sk, gfp_t priority); int tcp_send_synack(struct sock *); void tcp_push_one(struct sock *, unsigned int mss_now); +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt); void tcp_send_ack(struct sock *sk); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 5869f89ca656..078328afbfe3 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -133,21 +133,8 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) * ACK has not sent yet. */ if (!ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=0. */ - tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; - } + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; @@ -164,21 +151,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk) * ACK has not sent yet. */ if (ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=1. */ - tp->ecn_flags |= TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; - } + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 0; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ee1b0705321d..c4172c1fb198 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -160,7 +160,8 @@ static void tcp_event_data_sent(struct tcp_sock *tp, } /* Account for an ACK we sent. */ -static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts) +static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts, + u32 rcv_nxt) { struct tcp_sock *tp = tcp_sk(sk); @@ -171,6 +172,9 @@ static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts) if (hrtimer_try_to_cancel(&tp->compressed_ack_timer) == 1) __sock_put(sk); } + + if (unlikely(rcv_nxt != tp->rcv_nxt)) + return; /* Special ACK sent by DCTCP to reflect ECN */ tcp_dec_quickack_mode(sk, pkts); inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); } @@ -1141,7 +1145,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, icsk->icsk_af_ops->send_check(sk, skb); if (likely(tcb->tcp_flags & TCPHDR_ACK)) - tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); + tcp_event_ack_sent(sk, tcp_skb_pcount(skb), rcv_nxt); if (skb->len != tcp_header_size) { tcp_event_data_sent(tp, sk); @@ -3613,12 +3617,12 @@ void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) /* Send it off, this clears delayed acks for us. */ __tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0, rcv_nxt); } +EXPORT_SYMBOL_GPL(__tcp_send_ack); void tcp_send_ack(struct sock *sk) { __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); } -EXPORT_SYMBOL_GPL(tcp_send_ack); /* This routine sends a packet with an out of date sequence * number. It assumes the other end will try to ack it.