Message ID | 1403140503-9689-1-git-send-email-ncardwell@google.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On Wed, 2014-06-18 at 21:15 -0400, Neal Cardwell wrote:
> If there is an MSS change (or misbehaving receiver) that causes a SACK
> to arrive that covers the end of an skb but is less than one MSS, then
> tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
> the skb ("Round if necessary..."), then chopping all bytes off the skb
> and creating a zero-byte skb in the write queue.
>
> This was visible now because the recently simplified TLP logic in
> bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
> skb at the end of the write queue, and now that we do not check that
> skb's length we could send it as a TLP probe.
>
> Consider the following example scenario:
>
>   mss: 1000
>   skb: seq: 0 end_seq: 4000 len: 4000
>   SACK: start_seq: 3999 end_seq: 4000
>
> The tcp_match_skb_to_sack() code will compute:
>
>   in_sack = false
>   pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
>   new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
>   new_len += mss = 4000
>
> Previously we would find the new_len > skb->len check failing, so we
> would fall through and set pkt_len = new_len = 4000 and chop off
> pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
> afterward in the write queue.
>
> With this new commit, we notice that the new new_len >= skb->len check
> succeeds, so that we return without trying to fragment.
>
> Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
> ---

Thanks Neal!

Acked-by: Eric Dumazet <edumazet@google.com>

CC Per Hurtig, as we discovered this minor issue while backporting and
fully testing bef1909ee3ed1ca39231b260a8d3b4544ecd0c8f
("tcp: fixing TLP's FIN recovery")

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Neal Cardwell <ncardwell@google.com>
Date: Wed, 18 Jun 2014 21:15:03 -0400

> If there is an MSS change (or misbehaving receiver) that causes a SACK
> to arrive that covers the end of an skb but is less than one MSS, then
> tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
> the skb ("Round if necessary..."), then chopping all bytes off the skb
> and creating a zero-byte skb in the write queue.
>
> This was visible now because the recently simplified TLP logic in
> bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
> skb at the end of the write queue, and now that we do not check that
> skb's length we could send it as a TLP probe.
>
> Consider the following example scenario:
>
>   mss: 1000
>   skb: seq: 0 end_seq: 4000 len: 4000
>   SACK: start_seq: 3999 end_seq: 4000
>
> The tcp_match_skb_to_sack() code will compute:
>
>   in_sack = false
>   pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
>   new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
>   new_len += mss = 4000
>
> Previously we would find the new_len > skb->len check failing, so we
> would fall through and set pkt_len = new_len = 4000 and chop off
> pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
> afterward in the write queue.
>
> With this new commit, we notice that the new new_len >= skb->len check
> succeeds, so that we return without trying to fragment.
>
> Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Applied and queued up for -stable, thanks Neal.
On Wed, 18 Jun 2014, Neal Cardwell wrote:
> If there is an MSS change (or misbehaving receiver) that causes a SACK
> to arrive that covers the end of an skb but is less than one MSS, then
> tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
> the skb ("Round if necessary..."), then chopping all bytes off the skb
> and creating a zero-byte skb in the write queue.
>
> This was visible now because the recently simplified TLP logic in
> bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
> skb at the end of the write queue, and now that we do not check that
> skb's length we could send it as a TLP probe.
>
> Consider the following example scenario:
>
>   mss: 1000
>   skb: seq: 0 end_seq: 4000 len: 4000
>   SACK: start_seq: 3999 end_seq: 4000
>
> The tcp_match_skb_to_sack() code will compute:
>
>   in_sack = false
>   pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
>   new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
>   new_len += mss = 4000
>
> Previously we would find the new_len > skb->len check failing, so we
> would fall through and set pkt_len = new_len = 4000 and chop off
> pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
> afterward in the write queue.
>
> With this new commit, we notice that the new new_len >= skb->len check
> succeeds, so that we return without trying to fragment.
>
> Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
> ---
>  net/ipv4/tcp_input.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 40661fc..b5c2375 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1162,7 +1162,7 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
>  			unsigned int new_len = (pkt_len / mss) * mss;
>  			if (!in_sack && new_len < pkt_len) {
>  				new_len += mss;
> -				if (new_len > skb->len)
> +				if (new_len >= skb->len)

Any idea whether the WARN_ON(len > skb->len) in tcp_fragment() could
similarly be made stricter to include equality? Or might a SACK for a
super skb that includes a FIN not covered by the SACK block still need
the equal case, so that an equality check there wouldn't work?
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 40661fc..b5c2375 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1162,7 +1162,7 @@ static int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
 			unsigned int new_len = (pkt_len / mss) * mss;
 			if (!in_sack && new_len < pkt_len) {
 				new_len += mss;
-				if (new_len > skb->len)
+				if (new_len >= skb->len)
 					return 0;
 			}
 			pkt_len = new_len;
If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3ed1c ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

  mss: 1000
  skb: seq: 0 end_seq: 4000 len: 4000
  SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

  in_sack = false
  pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
  new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
  new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)