From patchwork Thu Jan 14 23:43:29 2016
From: Yuchung Cheng <ycheng@google.com>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, Yuchung Cheng, Neal Cardwell, Eric Dumazet
Subject: [PATCH net-next 1/6] tcp: retransmit after recovery processing and congestion control
Date: Thu, 14 Jan 2016 15:43:29 -0800
Message-Id: <1452815014-601-2-git-send-email-ycheng@google.com>
In-Reply-To: <1452815014-601-1-git-send-email-ycheng@google.com>
References: <1452815014-601-1-git-send-email-ycheng@google.com>
X-Patchwork-Id: 567829
X-Patchwork-Delegate: davem@davemloft.net
X-Mailing-List: netdev@vger.kernel.org

The retransmission and F-RTO transmission currently happen inside
recovery state processing (tcp_fastretrans_alert) but before congestion
control. This refactoring moves the logic after both, so that we can
determine how much to send (cwnd) before deciding what to send.
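For illustration, a minimal, self-contained C sketch of the ordering this
patch establishes. All names are invented for the example (only the
REXMIT_* values mirror the patch): recovery processing merely records
*what* to send, congestion control then sets *how much* may be sent
(cwnd), and only the final step transmits.

#include <stdio.h>

#define REXMIT_NONE 0 /* no loss recovery to do */
#define REXMIT_LOST 1 /* retransmit packets marked lost */
#define REXMIT_NEW  2 /* FRTO-style transmit of unsent/new packets */

struct conn {
	unsigned int cwnd;	/* congestion window, in packets */
	unsigned int lost;	/* packets currently marked lost */
};

/* Recovery step: decide what to send, but do not send yet. */
static int process_recovery(const struct conn *c)
{
	return c->lost ? REXMIT_LOST : REXMIT_NONE;
}

/* Congestion control step: decide how much may be sent. */
static void update_cwnd(struct conn *c, unsigned int acked)
{
	c->cwnd += acked;	/* toy growth rule, purely illustrative */
}

/* Transmit step: runs last, so it sees the fresh cwnd. */
static void xmit_recovery(const struct conn *c, int rexmit)
{
	if (rexmit == REXMIT_NONE)
		return;
	printf("send up to %u packets (mode %d)\n", c->cwnd, rexmit);
}

int main(void)
{
	struct conn c = { .cwnd = 10, .lost = 3 };
	int rexmit = process_recovery(&c);	/* what to send */
	update_cwnd(&c, 2);			/* how much to send */
	xmit_recovery(&c, rexmit);		/* transmit last */
	return 0;
}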
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell
Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_input.c | 58 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0003d40..482c0b4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -126,6 +126,10 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH)
 #define TCP_HP_BITS (~(TCP_RESERVED_BITS|TCP_FLAG_PSH))
 
+#define REXMIT_NONE	0 /* no loss recovery to do */
+#define REXMIT_LOST	1 /* retransmit packets marked lost */
+#define REXMIT_NEW	2 /* FRTO-style transmit of unsent/new packets */
+
 /* Adapt the MSS value used to make delayed ack decision to the
  * real world.
  */
@@ -2664,7 +2668,8 @@ static void tcp_enter_recovery(struct sock *sk, bool ece_ack)
 /* Process an ACK in CA_Loss state. Move to CA_Open if lost data are
  * recovered or spurious. Otherwise retransmits more on partial ACKs.
  */
-static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)
+static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack,
+			     int *rexmit)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	bool recovered = !before(tp->snd_una, tp->high_seq);
@@ -2686,10 +2691,15 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)
 				tp->frto = 0; /* Step 3.a. loss was real */
 		} else if (flag & FLAG_SND_UNA_ADVANCED && !recovered) {
 			tp->high_seq = tp->snd_nxt;
-			__tcp_push_pending_frames(sk, tcp_current_mss(sk),
-						  TCP_NAGLE_OFF);
-			if (after(tp->snd_nxt, tp->high_seq))
-				return; /* Step 2.b */
+			/* Step 2.b. Try to send new data (but deferred until
+			 * cwnd is updated in tcp_ack()). Otherwise fall back
+			 * to the conventional recovery.
+			 */
+			if (tcp_send_head(sk) &&
+			    after(tcp_wnd_end(tp), tp->snd_nxt)) {
+				*rexmit = REXMIT_NEW;
+				return;
+			}
 			tp->frto = 0;
 		}
 	}
@@ -2708,7 +2718,7 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack)
 		else if (flag & FLAG_SND_UNA_ADVANCED)
 			tcp_reset_reno_sack(tp);
 	}
-	tcp_xmit_retransmit_queue(sk);
+	*rexmit = REXMIT_LOST;
 }
 
 /* Undo during fast recovery after partial ACK. */
@@ -2758,7 +2768,7 @@ static bool tcp_try_undo_partial(struct sock *sk, const int acked,
  */
 static void tcp_fastretrans_alert(struct sock *sk, const int acked,
 				  const int prior_unsacked,
-				  bool is_dupack, int flag)
+				  bool is_dupack, int flag, int *rexmit)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -2833,7 +2843,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
 		}
 		break;
 	case TCP_CA_Loss:
-		tcp_process_loss(sk, flag, is_dupack);
+		tcp_process_loss(sk, flag, is_dupack, rexmit);
 		if (icsk->icsk_ca_state != TCP_CA_Open &&
 		    !(flag & FLAG_LOST_RETRANS))
 			return;
@@ -2873,7 +2883,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const int acked,
 	if (do_lost)
 		tcp_update_scoreboard(sk, fast_rexmit);
 	tcp_cwnd_reduction(sk, prior_unsacked, fast_rexmit, flag);
-	tcp_xmit_retransmit_queue(sk);
+	*rexmit = REXMIT_LOST;
 }
 
 /* Kathleen Nichols' algorithm for tracking the minimum value of
@@ -3508,6 +3518,27 @@ static inline void tcp_in_ack_event(struct sock *sk, u32 flags)
 		icsk->icsk_ca_ops->in_ack_event(sk, flags);
 }
 
+/* Congestion control has updated the cwnd already. So now, if we are
+ * in loss recovery, do any new sends (for FRTO) or retransmits (for
+ * CA_Loss or CA_Recovery) that make sense.
+ */
+static void tcp_xmit_recovery(struct sock *sk, int rexmit)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (rexmit == REXMIT_NONE)
+		return;
+
+	if (unlikely(rexmit == REXMIT_NEW)) {
+		__tcp_push_pending_frames(sk, tcp_current_mss(sk),
+					  TCP_NAGLE_OFF);
+		if (after(tp->snd_nxt, tp->high_seq))
+			return;
+		tp->frto = 0;
+	}
+	tcp_xmit_retransmit_queue(sk);
+}
+
 /* This routine deals with incoming acks, but not outgoing ones. */
 static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 {
@@ -3522,6 +3553,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	int prior_packets = tp->packets_out;
 	const int prior_unsacked = tp->packets_out - tp->sacked_out;
 	int acked = 0; /* Number of packets newly acked */
+	int rexmit = REXMIT_NONE; /* Flag to (re)transmit to recover losses */
 
 	sack_state.first_sackt.v64 = 0;
 
@@ -3618,7 +3650,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (tcp_ack_is_dubious(sk, flag)) {
 		is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP));
 		tcp_fastretrans_alert(sk, acked, prior_unsacked,
-				      is_dupack, flag);
+				      is_dupack, flag, &rexmit);
 	}
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
@@ -3636,13 +3668,14 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
 		tcp_schedule_loss_probe(sk);
 	tcp_update_pacing_rate(sk);
+	tcp_xmit_recovery(sk, rexmit);
 	return 1;
 
 no_queue:
 	/* If data was DSACKed, see if we can undo a cwnd reduction. */
 	if (flag & FLAG_DSACKING_ACK)
 		tcp_fastretrans_alert(sk, acked, prior_unsacked,
-				      is_dupack, flag);
+				      is_dupack, flag, &rexmit);
 	/* If this ack opens up a zero window, clear backoff.  It was
 	 * being used to time the probes, and is probably far higher than
 	 * it needs to be for normal retransmission.
@@ -3666,7 +3699,8 @@ old_ack:
 		flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
 						&sack_state);
 		tcp_fastretrans_alert(sk, acked, prior_unsacked,
-				      is_dupack, flag);
+				      is_dupack, flag, &rexmit);
+		tcp_xmit_recovery(sk, rexmit);
 	}
 
 	SOCK_DEBUG(sk, "Ack %u before %u:%u\n", ack, tp->snd_una, tp->snd_nxt);
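
A note on the REXMIT_NEW path in tcp_xmit_recovery() above: it first
tries to push never-sent data (FRTO step 2.b) and falls back to
retransmitting lost packets only if nothing new went out (snd_nxt did
not advance past high_seq). Below is a self-contained sketch of that
fallback, not kernel code; try_send_new() and retransmit_lost() are
invented stand-ins for __tcp_push_pending_frames() and
tcp_xmit_retransmit_queue().

#include <stdbool.h>
#include <stdio.h>

#define REXMIT_NONE 0
#define REXMIT_LOST 1
#define REXMIT_NEW  2

/* Stand-in: returns true if any previously unsent data went out. */
static bool try_send_new(void)
{
	return false;	/* pretend cwnd/receive window allowed nothing new */
}

static void retransmit_lost(void)
{
	printf("retransmitting packets marked lost\n");
}

static void xmit_recovery(int rexmit)
{
	if (rexmit == REXMIT_NONE)
		return;
	if (rexmit == REXMIT_NEW) {
		if (try_send_new())
			return;	/* FRTO probe sent; wait for its ACK */
		/* Nothing new could be sent: give up on FRTO (the kernel
		 * clears tp->frto here) and fall through to conventional
		 * retransmission of lost packets.
		 */
	}
	retransmit_lost();
}

int main(void)
{
	xmit_recovery(REXMIT_NEW);	/* falls back to retransmission */
	return 0;
}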