Message ID | 1453759313-22681-1-git-send-email-ycheng@google.com
---|---
State | Accepted, archived
Delegated to: | David Miller
From: Yuchung Cheng <ycheng@google.com>
Date: Mon, 25 Jan 2016 14:01:53 -0800

> From: Neal Cardwell <ncardwell@google.com>
>
> This commit fixes a corner case in tcp_mark_head_lost() which was
> causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
>
> tcp_mark_head_lost() was assuming that if a packet has
> tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
> M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
> maintained, this is not always true.
>
> For example, suppose the sender sends 4 one-byte packets and has the
> last 3 packets sacked. It will merge the last 3 packets in the write
> queue into an skb with pcount = 3 and len = 3 bytes. If another
> recovery happens after a sack reneging event, tcp_mark_head_lost()
> may attempt to split the skb assuming it has more than 2*MSS bytes.
>
> This sounds very counterintuitive, but as the commit description for
> the related commit c0638c247f55 ("tcp: don't fragment SACKed skbs in
> tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
> coalesces adjacent regions of SACKed skbs, and when doing this it
> preserves the sum of their packet counts in order to reflect the
> real-world dynamics on the wire. The c0638c247f55 commit tried to
> avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
> where the non-proportionality between pcount and skb->len/mss is known
> to be possible. However, that commit did not handle the case where,
> during a reneging event, one of these weird SACKed skbs becomes an
> un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
>
> The fix is to simply mark the entire skb lost when this happens.
> This makes the recovery slightly more aggressive in such corner
> cases before we detect reordering. But once we detect reordering,
> this code path is bypassed because FACK is disabled.
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks everyone.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0003d40..d2ad433 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2164,8 +2164,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
-	int cnt, oldcnt;
-	int err;
+	int cnt, oldcnt, lost;
 	unsigned int mss;
 	/* Use SACK to deduce losses of new sequences sent during recovery */
 	const u32 loss_high = tcp_is_sack(tp) ?  tp->snd_nxt : tp->high_seq;
@@ -2205,9 +2204,10 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
 				break;

 			mss = tcp_skb_mss(skb);
-			err = tcp_fragment(sk, skb, (packets - oldcnt) * mss,
-					   mss, GFP_ATOMIC);
-			if (err < 0)
+			/* If needed, chop off the prefix to mark as lost. */
+			lost = (packets - oldcnt) * mss;
+			if (lost < skb->len &&
+			    tcp_fragment(sk, skb, lost, mss, GFP_ATOMIC) < 0)
 				break;
 			cnt = packets;
 		}