diff mbox series

[net,5/5] tcp: send real dupACKs by locking advertized window for non-SACK flows

Message ID 1520427569-14365-6-git-send-email-ilpo.jarvinen@helsinki.fi
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series tcp: fixes to non-SACK TCP | expand

Commit Message

Ilpo Järvinen March 7, 2018, 12:59 p.m. UTC
Currently, the TCP code is overly eager to update window on
every ACK. It makes some of the ACKs that the receiver should
sent as dupACKs look like they update window update that are
not considered real dupACKs by the non-SACK sender-side code.

Make sure that when an ofo segment is received, no change to
window is applied if we are going to send a dupACK. It's ok
to change the window for non-dupACKs (such as the first ACK
after ofo arrivals start if that ACK was using delayed ACKs).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/linux/tcp.h   |  3 ++-
 net/ipv4/tcp_input.c  |  5 ++++-
 net/ipv4/tcp_output.c | 43 +++++++++++++++++++++++++------------------
 3 files changed, 31 insertions(+), 20 deletions(-)

Comments

Eric Dumazet March 7, 2018, 3:58 p.m. UTC | #1
On Wed, 2018-03-07 at 14:59 +0200, Ilpo Järvinen wrote:
> Currently, the TCP code is overly eager to update window on
> every ACK. It makes some of the ACKs that the receiver should
> sent as dupACKs look like they update window update that are
> not considered real dupACKs by the non-SACK sender-side code.
> 
> Make sure that when an ofo segment is received, no change to
> window is applied if we are going to send a dupACK. It's ok
> to change the window for non-dupACKs (such as the first ACK
> after ofo arrivals start if that ACK was using delayed ACKs).

This looks dangerous to me.

We certainly want to lower the window at some point, or risk expensive
pruning and/or drops.

It is not clear by reading your changelog/patch, but it looks like some
malicious peers could hurt us.

By current standards, non TCP sack flows are not worth any potential
issues.
Ilpo Järvinen March 7, 2018, 8:09 p.m. UTC | #2
Thanks for taking a look.

On Wed, 7 Mar 2018, Eric Dumazet wrote:
> On Wed, 2018-03-07 at 14:59 +0200, Ilpo Järvinen wrote:
> > Currently, the TCP code is overly eager to update window on
> > every ACK. It makes some of the ACKs that the receiver should
> > sent as dupACKs look like they update window update that are
> > not considered real dupACKs by the non-SACK sender-side code.
> > 
> > Make sure that when an ofo segment is received, no change to
> > window is applied if we are going to send a dupACK. It's ok
> > to change the window for non-dupACKs (such as the first ACK
> > after ofo arrivals start if that ACK was using delayed ACKs).
> 
> This looks dangerous to me.
> 
> We certainly want to lower the window at some point, or risk expensive
> pruning and/or drops.

I see you're conspiring for "treason" (if you recall those charmy
times) ;-).

> It is not clear by reading your changelog/patch, but it looks like some
> malicious peers could hurt us.

The malicious peers can certainly send out-of-window data already at will 
(with all of such packets being dropped regardless of that being 
expensive or not) so I don't see there's a big change for malicious case 
really. The window is only locked for what we've already promised for in 
an earlier ACK! ...Previously, reneging that promise (advertising smaller 
than the previous value) was called "treason" by us (unfortunately, the 
message is no longer as charmy).

Even with this change, the window is free to change when the ack field is 
updated which for legimate senders occurs typically once per RTT.

> By current standards, non TCP sack flows are not worth any potential
> issues.

Currently non-SACK senders cannot identify almost any duplicate ACKs 
because the window keeps updating for almost all ACKs. As a result, 
non-SACK senders end up doing loss recovery only with RTO. RTO recovery 
without SACK is quite annoying because it potentially sends 
large number of unnecessary rexmits.
Eric Dumazet March 7, 2018, 8:13 p.m. UTC | #3
On Wed, 2018-03-07 at 22:09 +0200, Ilpo Järvinen wrote:
> Thanks for taking a look.
> 
> On Wed, 7 Mar 2018, Eric Dumazet wrote:
> > On Wed, 2018-03-07 at 14:59 +0200, Ilpo Järvinen wrote:
> > > Currently, the TCP code is overly eager to update window on
> > > every ACK. It makes some of the ACKs that the receiver should
> > > sent as dupACKs look like they update window update that are
> > > not considered real dupACKs by the non-SACK sender-side code.
> > > 
> > > Make sure that when an ofo segment is received, no change to
> > > window is applied if we are going to send a dupACK. It's ok
> > > to change the window for non-dupACKs (such as the first ACK
> > > after ofo arrivals start if that ACK was using delayed ACKs).
> > 
> > This looks dangerous to me.
> > 
> > We certainly want to lower the window at some point, or risk
> > expensive
> > pruning and/or drops.
> 
> I see you're conspiring for "treason" (if you recall those charmy
> times) ;-).
> 
> > It is not clear by reading your changelog/patch, but it looks like
> > some
> > malicious peers could hurt us.
> 
> The malicious peers can certainly send out-of-window data already at
> will 
> (with all of such packets being dropped regardless of that being 
> expensive or not) so I don't see there's a big change for malicious
> case 
> really. The window is only locked for what we've already promised for
> in 
> an earlier ACK! ...Previously, reneging that promise (advertising
> smaller 
> than the previous value) was called "treason" by us (unfortunately,
> the 
> message is no longer as charmy).
> 
> Even with this change, the window is free to change when the ack
> field is 
> updated which for legimate senders occurs typically once per RTT.
> 
> > By current standards, non TCP sack flows are not worth any
> > potential
> > issues.
> 
> Currently non-SACK senders cannot identify almost any duplicate ACKs 
> because the window keeps updating for almost all ACKs. As a result, 
> non-SACK senders end up doing loss recovery only with RTO. RTO
> recovery 
> without SACK is quite annoying because it potentially sends 
> large number of unnecessary rexmits.

I get that, but switching from "always" to "never" sounds dangerous.

May I suggest you refine your patch, to maybe allow win reductions, in
a logarithmic fashion maybe ?

This way, when you send 1000 DUPACK, only few of them would actually
have to lower the window, and 99% of them would be considered as DUPACK
by these prehistoric TCP stacks.
Ilpo Järvinen March 7, 2018, 9:39 p.m. UTC | #4
On Wed, 7 Mar 2018, Eric Dumazet wrote:

> > Currently non-SACK senders cannot identify almost any duplicate ACKs 
> > because the window keeps updating for almost all ACKs. As a result, 
> > non-SACK senders end up doing loss recovery only with RTO. RTO
> > recovery 
> > without SACK is quite annoying because it potentially sends 
> > large number of unnecessary rexmits.
> 
> I get that, but switching from "always" to "never" sounds dangerous.
> 
> May I suggest you refine your patch, to maybe allow win reductions, in
> a logarithmic fashion maybe ?
> 
> This way, when you send 1000 DUPACK, only few of them would actually
> have to lower the window, and 99% of them would be considered as DUPACK
> by these prehistoric TCP stacks.

The problem I'm trying to fix is due to window increasing (of course, 
dupacks could also be masked because of decrease but that's something
I've never seen to occur in practice) so I tried to make you happy and 
change my fix to do "never increase, always decrease". However, it turns 
out that even pre-fixed code implements "never decrease" not "always 
decrease" like you thought:

static u16 tcp_select_window(struct sock *sk)
{
	...
	/* Never shrink the offered window */
        if (new_win < cur_win) {
		...
		new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);

...Thus, I don't see my fix making the case you're worried about any
worse, AFAICT.
Eric Dumazet March 7, 2018, 10:01 p.m. UTC | #5
On Wed, 2018-03-07 at 23:39 +0200, Ilpo Järvinen wrote:
> 
> The problem I'm trying to fix is due to window increasing (of
> course, 
> dupacks could also be masked because of decrease but that's something
> I've never seen to occur in practice) so I tried to make you happy
> and 
> change my fix to do "never increase, always decrease". However, it
> turns 
> out that even pre-fixed code implements "never decrease" not "always 
> decrease" like you thought:

If this was the case, your patch would not be needed.

So really try to add more information in the changelog so that your
patch makes sense to me, and really explains why it is safe.

This kind of information should be recorded in git history, not in mail
archives.

Thanks !
diff mbox series

Patch

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 8f4c549..e239662 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -225,7 +225,8 @@  struct tcp_sock {
 		fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
 		fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */
 		is_sack_reneg:1,    /* in recovery from loss with SACK reneg? */
-		unused:2;
+		dupack_wnd_lock :1, /* Non-SACK constant rwnd dupacks needed? */
+		unused:1;
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
 		unused1	    : 1,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b689915..77a289f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4633,6 +4633,7 @@  int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	bool fragstolen;
 	int eaten;
 
@@ -4676,7 +4677,7 @@  static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			 * gap in queue is filled.
 			 */
 			if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
-				inet_csk(sk)->icsk_ack.pingpong = 0;
+				icsk->icsk_ack.pingpong = 0;
 		}
 
 		if (tp->rx_opt.num_sacks)
@@ -4726,6 +4727,8 @@  static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		goto queue_and_out;
 	}
 
+	if (tcp_is_reno(tp) && !(icsk->icsk_ack.pending & ICSK_ACK_TIMER))
+		tp->dupack_wnd_lock = 1;
 	tcp_data_queue_ofo(sk, skb);
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6818042..45fbe92 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -249,25 +249,32 @@  static u16 tcp_select_window(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 old_win = tp->rcv_wnd;
-	u32 cur_win = tcp_receive_window(tp);
-	u32 new_win = __tcp_select_window(sk);
-
-	/* Never shrink the offered window */
-	if (new_win < cur_win) {
-		/* Danger Will Robinson!
-		 * Don't update rcv_wup/rcv_wnd here or else
-		 * we will not be able to advertise a zero
-		 * window in time.  --DaveM
-		 *
-		 * Relax Will Robinson.
-		 */
-		if (new_win == 0)
-			NET_INC_STATS(sock_net(sk),
-				      LINUX_MIB_TCPWANTZEROWINDOWADV);
-		new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
+	u32 cur_win;
+	u32 new_win;
+
+	if (tp->dupack_wnd_lock) {
+		new_win = old_win;
+		tp->dupack_wnd_lock = 0;
+	} else {
+		cur_win = tcp_receive_window(tp);
+		new_win = __tcp_select_window(sk);
+		/* Never shrink the offered window */
+		if (new_win < cur_win) {
+			/* Danger Will Robinson!
+			 * Don't update rcv_wup/rcv_wnd here or else
+			 * we will not be able to advertise a zero
+			 * window in time.  --DaveM
+			 *
+			 * Relax Will Robinson.
+			 */
+			if (new_win == 0)
+				NET_INC_STATS(sock_net(sk),
+					      LINUX_MIB_TCPWANTZEROWINDOWADV);
+			new_win = ALIGN(cur_win, 1 << tp->rx_opt.rcv_wscale);
+		}
+		tp->rcv_wnd = new_win;
+		tp->rcv_wup = tp->rcv_nxt;
 	}
-	tp->rcv_wnd = new_win;
-	tp->rcv_wup = tp->rcv_nxt;
 
 	/* Make sure we do not exceed the maximum possible
 	 * scaled window.