Message ID | 1322840663.2762.26.camel@edumazet-laptop |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
Le vendredi 02 décembre 2011 à 16:44 +0100, Eric Dumazet a écrit : > [PATCH net-next] tcp: fix tcp_trim_head() > > commit f07d960df3 (tcp: avoid frag allocation for small frames) > breaked assumption in tcp stack that skb is either linear (data_len == > 0), either fully fragged (data_len == len) > > Thanks to Vijay for providing a very detailed explanation. > > Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- Another problem is the possible misalignement of skb->data if/when we receive an ACK of an odd (not a 4 multiple) tcp sequence. So when/if packet is restransmited, tcp header (and IP header as well) could be misaligned. Hmm, I probably miss something obvious, since this problem could happen for linear skbs before my patch ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 02 Dec 2011 17:05:35 +0100 > Le vendredi 02 décembre 2011 à 16:44 +0100, Eric Dumazet a écrit : > >> [PATCH net-next] tcp: fix tcp_trim_head() >> >> commit f07d960df3 (tcp: avoid frag allocation for small frames) >> breaked assumption in tcp stack that skb is either linear (data_len == >> 0), either fully fragged (data_len == len) >> >> Thanks to Vijay for providing a very detailed explanation. >> >> Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> >> --- > > Another problem is the possible misalignement of skb->data if/when we > receive an ACK of an odd (not a 4 multiple) tcp sequence. > > So when/if packet is restransmited, tcp header (and IP header as well) > could be misaligned. > > Hmm, I probably miss something obvious, since this problem could happen > for linear skbs before my patch ? Unfortunately, even if netfilter or the packet scheduler pulled data, this misalignment wouldn't happen before your change. Maybe we should revert your frag allocation avoidance change until we can sort this out. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 02 décembre 2011 à 13:13 -0500, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Fri, 02 Dec 2011 17:05:35 +0100 > > > Le vendredi 02 décembre 2011 à 16:44 +0100, Eric Dumazet a écrit : > > > >> [PATCH net-next] tcp: fix tcp_trim_head() > >> > >> commit f07d960df3 (tcp: avoid frag allocation for small frames) > >> breaked assumption in tcp stack that skb is either linear (data_len == > >> 0), either fully fragged (data_len == len) > >> > >> Thanks to Vijay for providing a very detailed explanation. > >> > >> Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com> > >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > >> --- > > > > Another problem is the possible misalignement of skb->data if/when we > > receive an ACK of an odd (not a 4 multiple) tcp sequence. > > > > So when/if packet is restransmited, tcp header (and IP header as well) > > could be misaligned. > > > > Hmm, I probably miss something obvious, since this problem could happen > > for linear skbs before my patch ? > > Unfortunately, even if netfilter or the packet scheduler pulled data, this > misalignment wouldn't happen before your change. > > Maybe we should revert your frag allocation avoidance change until we can > sort this out. The patch I posted should solve the problem Vijay spotted. (It really does, I tested it, allowing mtu probing) What I ask now is following problem (even prior to my frag allocation patch) : 1) We allocate a linear skb (SG being off) to cook a tcp frame of length XXX bytes. 2) We send it. 3) We receive an ACK for first 31 bytes. 4) We trim 31 bytes from the head of skb. (skb_pull(skb, 31)) skb->data is now not anymore aligned to a 4 bytes boundary. 5) Later, we need to retransmit skb (XXX minus 31 bytes already ACKed) We push TCP header, and skb->data is not aligned. TCP header is not aligned anymore. x86 doesnt care, but what about other arches with misalign traps ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 02 Dec 2011 19:36:29 +0100 > What I ask now is following problem (even prior to my frag allocation > patch) : > > 1) We allocate a linear skb (SG being off) to cook a tcp frame of length > XXX bytes. > > 2) We send it. > > 3) We receive an ACK for first 31 bytes. > > 4) We trim 31 bytes from the head of skb. (skb_pull(skb, 31)) > skb->data is now not anymore aligned to a 4 bytes boundary. > > 5) Later, we need to retransmit skb (XXX minus 31 bytes already ACKed) > We push TCP header, and skb->data is not aligned. > TCP header is not aligned anymore. x86 doesnt care, but what about > other arches with misalign traps ? Yes, for non-SG this always was technically possible. But now you've made it possible for SG too. Frankly, I think we should: 1) Revert your patch, so it's not possible for SG once more. 2) Add code to reallocate the SKB linear data when this happens in the non-SG case. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 58f69ac..4a0f54a 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1093,6 +1093,13 @@ static void __pskb_trim_head(struct sk_buff *skb, int len) { int i, k, eat; + eat = min_t(int, len, skb_headlen(skb)); + if (eat) { + __skb_pull(skb, eat); + len -= eat; + if (!len) + return; + } eat = len; k = 0; for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { @@ -1124,11 +1131,7 @@ int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len) if (skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) return -ENOMEM; - /* If len == headlen, we avoid __skb_pull to preserve alignment. */ - if (unlikely(len < skb_headlen(skb))) - __skb_pull(skb, len); - else - __pskb_trim_head(skb, len - skb_headlen(skb)); + __pskb_trim_head(skb, len); TCP_SKB_CB(skb)->seq += len; skb->ip_summed = CHECKSUM_PARTIAL;