diff mbox

[net-next,V2] tcp: Fix bug when gap in rcv sequence is filled

Message ID 1333733810-2381-1-git-send-email-subramanian.vijay@gmail.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Vijay Subramanian April 6, 2012, 5:36 p.m. UTC
As per RFC2581 and the newer RFC5681, "the receiver SHOULD send an immediate ACK
when it receives a data segment that fills in all or part of a gap in the
sequence space." When TCP receiver gets the next in-sequence packet, we move
data from ofo queue to receive queue. At this point, we should send an immediate
ack by entering quickack mode.  In the current code, instead of entering
quickack mode upon requeing packets from ofo queue to receive_queue, we enter
quickack mode only when ofo queue becomes empty after requeuing.  This ignores
the possibility that there may be further packets left in ofo queue. This patch
fixes this behavior and enters quickack mode whenever packets are moved from ofo
queue to receive queue.

Also, comment has been updated to reflect that RFC5681 obsoletes RFC2581.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
Changes from V1: 
-- Removed bool variable to simplify code. (chetan loke <loke.chetan@gmail.com>)

 net/ipv4/tcp_input.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

Comments

chetan L April 6, 2012, 6:06 p.m. UTC | #1
On Fri, Apr 6, 2012 at 1:36 PM, Vijay Subramanian
<subramanian.vijay@gmail.com> wrote:

> +       int requeued = 0; /*1 if an skb is requeued to receive_queue*/

Vijay - minor change: missing leading/trailing space in the comment.
checkpatch.pl ?


Chetan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vijay Subramanian April 6, 2012, 6:27 p.m. UTC | #2
>> +       int requeued = 0; /*1 if an skb is requeued to receive_queue*/
>
> Vijay - minor change: missing leading/trailing space in the comment.
> checkpatch.pl ?

Thanks for the reviews Chetan. checkpatch.pl reported no errors and no
warnings however.
I will wait for additional feedback from TCP folks and send version V3
if needed.

Thanks,
Vijay
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neal Cardwell April 6, 2012, 10:17 p.m. UTC | #3
On Fri, Apr 6, 2012 at 10:36 AM, Vijay Subramanian
<subramanian.vijay@gmail.com> wrote:
> As per RFC2581 and the newer RFC5681, "the receiver SHOULD send an immediate ACK
> when it receives a data segment that fills in all or part of a gap in the
> sequence space." When TCP receiver gets the next in-sequence packet, we move
> data from ofo queue to receive queue. At this point, we should send an immediate
> ack by entering quickack mode.  In the current code, instead of entering
> quickack mode upon requeing packets from ofo queue to receive_queue, we enter
> quickack mode only when ofo queue becomes empty after requeuing.  This ignores
> the possibility that there may be further packets left in ofo queue. This patch
> fixes this behavior and enters quickack mode whenever packets are moved from ofo
> queue to receive queue.

If you have some packet traces documenting that there is actually a
bug here, can you please share them?

AFAICT the code correctly generates immediate ACKs in these cases by:

(a) having __tcp_ack_snd_check() call tcp_send_ack() to send an
immediate ack if out_of_order_queue is non-empty.

(b) having tcp_data_queue set icsk_ack.pingpong to 0 when
out_of_order_queue transitions from non-empty to empty

My testing also shows that Linux TCP correctly generates an immediate
ACK when we move data from the out_of_order_queue to the
sk_receive_queue but the out_of_order_queue is still non-empty. Here
is a condensed packet trace showing the behavior I see in tests (with
a trace taken on machine A):

.798 B>A . 50001:60001(10000) ack 1001 win 20000
.838 A>B . ack 60001 win 10000
.848 B>A . 61001:62001(1000) ack 1001 win 20000
.848 A>B . ack 60001 win 10000 <nop,nop,sack 1 {61001:62001}>
.858 B>A . 63001:64001(1000) ack 1001 win 20000
.858 A>B . ack 60001 win 10000 <nop,nop,sack 2 {63001:64001}{61001:62001}>
.868 B>A . 60001:61001(1000) ack 1001 win 20000
.868 A>B . ack 62001 win 10000 <nop,nop,sack 1 {63001:64001}>

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vijay Subramanian April 7, 2012, 6:58 p.m. UTC | #4
Thanks for the review Neal.

>
> If you have some packet traces documenting that there is actually a
> bug here, can you please share them?
>
No. My claim was based only on an audit of the code.

> AFAICT the code correctly generates immediate ACKs in these cases by:
>
> (a) having __tcp_ack_snd_check() call tcp_send_ack() to send an
> immediate ack if out_of_order_queue is non-empty.
>
> (b) having tcp_data_queue set icsk_ack.pingpong to 0 when
> out_of_order_queue transitions from non-empty to empty
>

I noted that  icsk_ack.pingpong is set to 0 in event (b) and also when
data is added to ofo queue
but I completely missed the check in __tcp_ack_snd_check().


>
> .798 B>A . 50001:60001(10000) ack 1001 win 20000
> .838 A>B . ack 60001 win 10000
> .848 B>A . 61001:62001(1000) ack 1001 win 20000
> .848 A>B . ack 60001 win 10000 <nop,nop,sack 1 {61001:62001}>
> .858 B>A . 63001:64001(1000) ack 1001 win 20000
> .858 A>B . ack 60001 win 10000 <nop,nop,sack 2 {63001:64001}{61001:62001}>
> .868 B>A . 60001:61001(1000) ack 1001 win 20000
> .868 A>B . ack 62001 win 10000 <nop,nop,sack 1 {63001:64001}>
>

For your trace it is clear acks are sent correctly and there is no bug.
I should have checked the behavior on the wire. Sorry for the noise.

Thanks,
Vijay
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e886e2f..f4561f9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4388,12 +4388,14 @@  static void tcp_sack_remove(struct tcp_sock *tp)
 
 /* This one checks to see if we can put data from the
  * out_of_order queue into the receive_queue.
+ * Return 1 if data is moved from ofo queue to receive_queue and 0 otherwise.
  */
-static void tcp_ofo_queue(struct sock *sk)
+static int tcp_ofo_queue(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	__u32 dsack_high = tp->rcv_nxt;
 	struct sk_buff *skb;
+	int requeued = 0; /*1 if an skb is requeued to receive_queue*/
 
 	while ((skb = skb_peek(&tp->out_of_order_queue)) != NULL) {
 		if (after(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
@@ -4418,10 +4420,12 @@  static void tcp_ofo_queue(struct sock *sk)
 
 		__skb_unlink(skb, &tp->out_of_order_queue);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
+		requeued = 1;
 		tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
 		if (tcp_hdr(skb)->fin)
 			tcp_fin(sk);
 	}
+	return requeued;
 }
 
 static int tcp_prune_ofo_queue(struct sock *sk);
@@ -4636,12 +4640,11 @@  queue_and_out:
 			tcp_fin(sk);
 
 		if (!skb_queue_empty(&tp->out_of_order_queue)) {
-			tcp_ofo_queue(sk);
 
-			/* RFC2581. 4.2. SHOULD send immediate ACK, when
-			 * gap in queue is filled.
+			/* RFC5681 (which obsoletes RFC2581.) 4.2. SHOULD send
+			 * immediate ACK, when gap in queue is filled.
 			 */
-			if (skb_queue_empty(&tp->out_of_order_queue))
+			if (tcp_ofo_queue(sk))
 				inet_csk(sk)->icsk_ack.pingpong = 0;
 		}