A bug report for Linux TCP congestion control algorithms

Message ID CAMN46fP_a_Jti_rcLze=7yoaDroQ7qn8kek9HMKtFEQpbm7R3A@mail.gmail.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Wei Sun May 26, 2017, 6:08 a.m. UTC
Hi there,

We found a special case in the Linux TCP undo logic where
tp->snd_cwnd can become extremely large (e.g., 4294967294) after two
consecutive cwnd undo operations when using the
reno/veno/vegas/highspeed/HTCP/yeah/westwood/hybla/illinois/scalable/lp
congestion control algorithms in the latest long-term kernel, 4.9.

For example, a simple trace of the sender-side TCP state variables:
cwnd          ssthresh      ca_state
10            2147483647    0
10            2147483647    0
1             5             4
11            2147483647    4           first undo operation
4294967294    2147483647    0           second undo operation
4294967294    2147483647    0


By debugging the code, we found that the second undo operation was
triggered by the F-RTO mechanism without checking the current
tp->undo_marker.
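
A minimal userspace sketch of the arithmetic behind the trace may help here.
It is a simplified model, not kernel code: struct model_tp and every model_*
function are hypothetical stand-ins, the undo rule (cwnd restored to at least
ssthresh << 1, then ssthresh restored from prior_ssthresh, then undo_marker
cleared) is assumed to match the reno-style default in 4.9, and the first undo
is assumed to go through the normal marker-checked path. Everything else the
ACK path does is left out, so the intermediate cwnd is 10 in the model where
the trace shows 11.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as the kernel's TCP_INFINITE_SSTHRESH (0x7fffffff). */
#define MODEL_INFINITE_SSTHRESH 0x7fffffffu

/* Hypothetical, pared-down stand-in for the fields of struct tcp_sock
 * that matter for this case. */
struct model_tp {
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
	uint32_t prior_ssthresh;
	uint32_t undo_marker;	/* non-zero while an undo is still pending */
};

/* Assumed reno-style undo: raise cwnd to at least 2 * ssthresh, restore
 * ssthresh from prior_ssthresh, then clear the marker. */
static void model_undo_cwnd_reduction(struct model_tp *tp)
{
	if (tp->prior_ssthresh) {
		/* 0x7fffffff << 1 is 0xfffffffe: it still fits in 32 bits,
		 * so nothing wraps, the result is just enormous. */
		uint32_t doubled = tp->snd_ssthresh << 1;

		if (doubled > tp->snd_cwnd)
			tp->snd_cwnd = doubled;
		if (tp->prior_ssthresh > tp->snd_ssthresh)
			tp->snd_ssthresh = tp->prior_ssthresh;
	}
	tp->undo_marker = 0;
}

/* Simplified undo check: undo when forced by F-RTO or when the marker is
 * still set (the real check has further conditions, omitted here). */
static bool model_try_undo_loss(struct model_tp *tp, bool frto_undo)
{
	if (frto_undo || tp->undo_marker != 0) {
		model_undo_cwnd_reduction(tp);
		return true;
	}
	return false;
}

/* Replay two undo attempts starting from the loss state in the trace
 * (cwnd 1, ssthresh 5) and return the final cwnd. With patched == true
 * the second attempt passes the marker instead of an unconditional true. */
static uint32_t model_two_undos(bool patched)
{
	struct model_tp tp = {
		.snd_cwnd = 1,
		.snd_ssthresh = 5,
		.prior_ssthresh = MODEL_INFINITE_SSTHRESH,
		.undo_marker = 1,
	};

	model_try_undo_loss(&tp, false);	/* first undo */
	model_try_undo_loss(&tp, patched ? tp.undo_marker != 0 : true);
	return tp.snd_cwnd;
}
```

In this model, model_two_undos(false) ends with cwnd 4294967294, because the
second undo doubles an ssthresh that has already been restored to 2147483647,
while model_two_undos(true) leaves cwnd at 10, because the marker was cleared
by the first undo.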

The case should exist in all kernel versions that rely on these F-RTO
internals (i.e., the bug exists in kernels 4.10 and earlier).

Just letting you know in case this leads to vulnerabilities, as it is not
hard to trigger this specific case.

Attached are a simple script for Google's packetdrill to trigger it and a
possible patch to fix it. Thanks.

Comments

Yuchung Cheng May 26, 2017, 4:51 p.m. UTC | #1
On Thu, May 25, 2017 at 11:08 PM, Wei Sun <wsun@cse.unl.edu> wrote:
>
> Hi there,
>
> We found a special case in the Linux TCP undo logic where
> tp->snd_cwnd can become extremely large (e.g., 4294967294) after two
> consecutive cwnd undo operations when using the
> reno/veno/vegas/highspeed/HTCP/yeah/westwood/hybla/illinois/scalable/lp
> congestion control algorithms in the latest long-term kernel, 4.9.
>
> For example, a simple trace of the sender-side TCP state variables:
> cwnd          ssthresh      ca_state
> 10            2147483647    0
> 10            2147483647    0
> 1             5             4
> 11            2147483647    4           first undo operation
> 4294967294    2147483647    0           second undo operation
> 4294967294    2147483647    0
>
>
> By debugging the code, we found that the second undo operation was
> triggered by the F-RTO mechanism without checking the current
> tp->undo_marker.
>
> The case should exist in all kernel versions that rely on these F-RTO
> internals (i.e., the bug exists in kernels 4.10 and earlier).
Thanks for discovering that. Note that this issue is addressed by

commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427
Author: Yuchung Cheng <ycheng@google.com>
Date:   Thu Jan 12 22:11:37 2017 -0800

    tcp: extend F-RTO to catch more spurious timeouts


>
>
> Just letting you know in case this leads to vulnerabilities, as it is not
> hard to trigger this specific case.
>
> Attached are a simple script for Google's packetdrill to trigger it and a
> possible patch to fix it. Thanks.

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7727ffe..da23221 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2733,7 +2733,7 @@  static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack,
 		 * lost, i.e., never-retransmitted data are (s)acked.
 		 */
 		if ((flag & FLAG_ORIG_SACK_ACKED) &&
-		    tcp_try_undo_loss(sk, true))
+		    tcp_try_undo_loss(sk, tp->undo_marker))
 			return;
 
 		if (after(tp->snd_nxt, tp->high_seq)) {