diff mbox

Bug in computing data_len in tcp_sendmsg?

Message ID 1322858756.2762.68.camel@edumazet-laptop
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Dec. 2, 2011, 8:45 p.m. UTC
Le vendredi 02 décembre 2011 à 15:24 -0500, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 02 Dec 2011 21:22:49 +0100
> 
> > Le vendredi 02 décembre 2011 à 13:40 -0500, David Miller a écrit :
> > 
> >> Yes, for non-SG this always was technically possible.
> >> 
> > 
> > Yes this can, I reproduced it very easily.
> > 
> > I find this hard to believe...
> 
> Grrr :-)
> 
> Ok, I'll think about this some more.

Maybe a quick fix would be to trim not len bytes but (len & ~3) bytes ?

This avoid reallocations and complex code for a 'should never happen in
normal circumstances'...

Retransmits could transmits 3 bytes already ACKed, is it a big deal ?

patch on top of linux/net tree :



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Dec. 2, 2011, 9:30 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Dec 2011 21:45:56 +0100

> Retransmits could transmits 3 bytes already ACKed, is it a big deal ?

Unfortunately this kind of adjustment doesn't work.

When we trim the head in response to ACK'd data, the stack assumes that
the first byte sitting at the front of the retransmit queue is ->snd_una.

So if you just back align the pull, and don't make amends for the setting
of ->snd_una, we'll retransmit the wrong bytes.  The send queue will be
out of sync with the sequence number state of the socket.

This has implications for SACK tagging state bit in the transmit queue
as well.

In fact, this is a real dangerous road to go down, I think :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 3, 2011, 3:09 p.m. UTC | #2
Le vendredi 02 décembre 2011 à 16:30 -0500, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 02 Dec 2011 21:45:56 +0100
> 
> > Retransmits could transmits 3 bytes already ACKed, is it a big deal ?
> 
> Unfortunately this kind of adjustment doesn't work.
> 
> When we trim the head in response to ACK'd data, the stack assumes that
> the first byte sitting at the front of the retransmit queue is ->snd_una.
> 
> So if you just back align the pull, and don't make amends for the setting
> of ->snd_una, we'll retransmit the wrong bytes.  The send queue will be
> out of sync with the sequence number state of the socket.
> 
> This has implications for SACK tagging state bit in the transmit queue
> as well.
> 
> In fact, this is a real dangerous road to go down, I think :-)


Yeah, it was not a good idea :)

My plan is to add a third parameter to pskb_copy(struct sk_buff *skb,
gfp_t gfp_mask, int reserve)

and use pskb_copy() in tcp_retransmit_skb() if it appears we need
between 1 and 3 bytes to re-align skb head before calling
tcp_transmit_skb(), (if NET_IP_ALIGN is not null)

[ In the unlikely case we had to allocate a new skb with pskb_copy(), we
will pass clone_it=0 to tcp_transmit_skb() ]

I'll send a fully tested patch before the end of week end.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 63170e2..4e108c5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1126,7 +1126,7 @@  int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
 
 	/* If len == headlen, we avoid __skb_pull to preserve alignment. */
 	if (unlikely(len < skb_headlen(skb)))
-		__skb_pull(skb, len);
+		__skb_pull(skb, len & ~3); /* preserve alignement of tcp/ip headers */
 	else
 		__pskb_trim_head(skb, len - skb_headlen(skb));