diff mbox

Bug in computing data_len in tcp_sendmsg?

Message ID 1322840663.2762.26.camel@edumazet-laptop
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Dec. 2, 2011, 3:44 p.m. UTC
Le vendredi 02 décembre 2011 à 12:59 +0100, Eric Dumazet a écrit :

> Thanks for this detailed explanation !
> 
> And yes, you're probably right.
> 
> Are you willing to submit a patch to fix this ?
> 
> (If not, I can do it myself of course)
> 

[PATCH net-next] tcp: fix tcp_trim_head()

commit f07d960df3 (tcp: avoid frag allocation for small frames)
breaked assumption in tcp stack that skb is either linear (data_len ==
0), either fully fragged (data_len == len)

Thanks to Vijay for providing a very detailed explanation.

Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp_output.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Dec. 2, 2011, 4:05 p.m. UTC | #1
Le vendredi 02 décembre 2011 à 16:44 +0100, Eric Dumazet a écrit :

> [PATCH net-next] tcp: fix tcp_trim_head()
> 
> commit f07d960df3 (tcp: avoid frag allocation for small frames)
> breaked assumption in tcp stack that skb is either linear (data_len ==
> 0), either fully fragged (data_len == len)
> 
> Thanks to Vijay for providing a very detailed explanation.
> 
> Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Another problem is the possible misalignement of skb->data if/when we
receive an ACK of an odd (not a 4 multiple) tcp sequence.

So when/if packet is restransmited, tcp header (and IP header as well)
could be misaligned.

Hmm, I probably miss something obvious, since this problem could happen
for linear skbs before my patch ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 2, 2011, 6:13 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Dec 2011 17:05:35 +0100

> Le vendredi 02 décembre 2011 à 16:44 +0100, Eric Dumazet a écrit :
> 
>> [PATCH net-next] tcp: fix tcp_trim_head()
>> 
>> commit f07d960df3 (tcp: avoid frag allocation for small frames)
>> breaked assumption in tcp stack that skb is either linear (data_len ==
>> 0), either fully fragged (data_len == len)
>> 
>> Thanks to Vijay for providing a very detailed explanation.
>> 
>> Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> ---
> 
> Another problem is the possible misalignement of skb->data if/when we
> receive an ACK of an odd (not a 4 multiple) tcp sequence.
> 
> So when/if packet is restransmited, tcp header (and IP header as well)
> could be misaligned.
> 
> Hmm, I probably miss something obvious, since this problem could happen
> for linear skbs before my patch ?

Unfortunately, even if netfilter or the packet scheduler pulled data, this
misalignment wouldn't happen before your change.

Maybe we should revert your frag allocation avoidance change until we can
sort this out.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 2, 2011, 6:36 p.m. UTC | #3
Le vendredi 02 décembre 2011 à 13:13 -0500, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 02 Dec 2011 17:05:35 +0100
> 
> > Le vendredi 02 décembre 2011 à 16:44 +0100, Eric Dumazet a écrit :
> > 
> >> [PATCH net-next] tcp: fix tcp_trim_head()
> >> 
> >> commit f07d960df3 (tcp: avoid frag allocation for small frames)
> >> breaked assumption in tcp stack that skb is either linear (data_len ==
> >> 0), either fully fragged (data_len == len)
> >> 
> >> Thanks to Vijay for providing a very detailed explanation.
> >> 
> >> Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> >> ---
> > 
> > Another problem is the possible misalignement of skb->data if/when we
> > receive an ACK of an odd (not a 4 multiple) tcp sequence.
> > 
> > So when/if packet is restransmited, tcp header (and IP header as well)
> > could be misaligned.
> > 
> > Hmm, I probably miss something obvious, since this problem could happen
> > for linear skbs before my patch ?
> 
> Unfortunately, even if netfilter or the packet scheduler pulled data, this
> misalignment wouldn't happen before your change.
> 
> Maybe we should revert your frag allocation avoidance change until we can
> sort this out.

The patch I posted should solve the problem Vijay spotted.
(It really does, I tested it, allowing mtu probing)

What I ask now is following problem (even prior to my frag allocation
patch) :

1) We allocate a linear skb (SG being off) to cook a tcp frame of length
XXX bytes.

2) We send it.

3) We receive an ACK for first 31 bytes.

4) We trim 31 bytes from the head of skb.  (skb_pull(skb, 31))
    skb->data is now not anymore aligned to a 4 bytes boundary.

5) Later, we need to retransmit skb (XXX minus 31 bytes already ACKed)
   We push TCP header, and skb->data is not aligned.
   TCP header is not aligned anymore. x86 doesnt care, but what about
other arches with misalign traps ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 2, 2011, 6:40 p.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Dec 2011 19:36:29 +0100

> What I ask now is following problem (even prior to my frag allocation
> patch) :
> 
> 1) We allocate a linear skb (SG being off) to cook a tcp frame of length
> XXX bytes.
> 
> 2) We send it.
> 
> 3) We receive an ACK for first 31 bytes.
> 
> 4) We trim 31 bytes from the head of skb.  (skb_pull(skb, 31))
>     skb->data is now not anymore aligned to a 4 bytes boundary.
> 
> 5) Later, we need to retransmit skb (XXX minus 31 bytes already ACKed)
>    We push TCP header, and skb->data is not aligned.
>    TCP header is not aligned anymore. x86 doesnt care, but what about
> other arches with misalign traps ?

Yes, for non-SG this always was technically possible.

But now you've made it possible for SG too.

Frankly, I think we should:

1) Revert your patch, so it's not possible for SG once more.

2) Add code to reallocate the SKB linear data when this happens in
   the non-SG case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 58f69ac..4a0f54a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1093,6 +1093,13 @@  static void __pskb_trim_head(struct sk_buff *skb, int len)
 {
 	int i, k, eat;
 
+	eat = min_t(int, len, skb_headlen(skb));
+	if (eat) {
+		__skb_pull(skb, eat);
+		len -= eat;
+		if (!len)
+			return;
+	}
 	eat = len;
 	k = 0;
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
@@ -1124,11 +1131,7 @@  int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
 	if (skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
 		return -ENOMEM;
 
-	/* If len == headlen, we avoid __skb_pull to preserve alignment. */
-	if (unlikely(len < skb_headlen(skb)))
-		__skb_pull(skb, len);
-	else
-		__pskb_trim_head(skb, len - skb_headlen(skb));
+	__pskb_trim_head(skb, len);
 
 	TCP_SKB_CB(skb)->seq += len;
 	skb->ip_summed = CHECKSUM_PARTIAL;