Message ID | 1332104867.3597.1.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On Sun, Mar 18, 2012 at 5:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> With increasing receive window sizes, but speed of light not improved
> that much, out of order queue can contain a huge number of skbs, waiting
> to be moved to receive_queue when missing packets can fill the holes.
>
> Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
> sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
> probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
> many cases.
>
> When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
> latency killer and cpu cache blower.
>
> Doing the coalescing attempt each time we add a frame in ofo queue
> permits to keep memory use tight and in many cases avoid the
> tcp_collapse() thing later.
>
> Tested on various wireless setups (b43, ath9k, ...) known to use big skb
> truesize, this patch removed the "packets collapsed in receive queue due
> to low socket buffer" I had before.
>
> This also reduced average memory used by tcp sockets.
>
> With help from Neal Cardwell.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 18 Mar 2012 14:07:47 -0700

> With increasing receive window sizes, but speed of light not improved
> that much, out of order queue can contain a huge number of skbs, waiting
> to be moved to receive_queue when missing packets can fill the holes.
>
> Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
> sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
> probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
> many cases.
>
> When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
> latency killer and cpu cache blower.
>
> Doing the coalescing attempt each time we add a frame in ofo queue
> permits to keep memory use tight and in many cases avoid the
> tcp_collapse() thing later.
>
> Tested on various wireless setups (b43, ath9k, ...) known to use big skb
> truesize, this patch removed the "packets collapsed in receive queue due
> to low socket buffer" I had before.
>
> This also reduced average memory used by tcp sockets.
>
> With help from Neal Cardwell.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index 8ee8af4..2e68f5b 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -233,6 +233,7 @@ enum
 	LINUX_MIB_TCPREQQFULLDOCOOKIES,		/* TCPReqQFullDoCookies */
 	LINUX_MIB_TCPREQQFULLDROP,		/* TCPReqQFullDrop */
 	LINUX_MIB_TCPRETRANSFAIL,		/* TCPRetransFail */
+	LINUX_MIB_TCPRCVCOALESCE,		/* TCPRcvCoalesce */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 02d6107..8af0d44 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -257,6 +257,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPReqQFullDoCookies", LINUX_MIB_TCPREQQFULLDOCOOKIES),
 	SNMP_MIB_ITEM("TCPReqQFullDrop", LINUX_MIB_TCPREQQFULLDROP),
 	SNMP_MIB_ITEM("TCPRetransFail", LINUX_MIB_TCPRETRANSFAIL),
+	SNMP_MIB_ITEM("TCPRcvCoalesce", LINUX_MIB_TCPRCVCOALESCE),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa7de12..e886e2f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4484,7 +4484,24 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	end_seq = TCP_SKB_CB(skb)->end_seq;
 
 	if (seq == TCP_SKB_CB(skb1)->end_seq) {
-		__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
+		/* Packets in ofo can stay in queue a long time.
+		 * Better try to coalesce them right now
+		 * to avoid future tcp_collapse_ofo_queue(),
+		 * probably the most expensive function in tcp stack.
+		 */
+		if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) {
+			NET_INC_STATS_BH(sock_net(sk),
+					 LINUX_MIB_TCPRCVCOALESCE);
+			BUG_ON(skb_copy_bits(skb, 0,
+					     skb_put(skb1, skb->len),
+					     skb->len));
+			TCP_SKB_CB(skb1)->end_seq = end_seq;
+			TCP_SKB_CB(skb1)->ack_seq = TCP_SKB_CB(skb)->ack_seq;
+			__kfree_skb(skb);
+			skb = NULL;
+		} else {
+			__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
+		}
 
 		if (!tp->rx_opt.num_sacks ||
 		    tp->selective_acks[0].end_seq != seq)
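For anyone wanting to watch the new counter from userspace: entries in snmp4_net_list are exported in /proc/net/netstat as a "TcpExt:" line of names followed by a "TcpExt:" line of values. The following is only a rough sketch of how TCPRcvCoalesce could be read back; the helper names (next_token, netstat_lookup, read_tcpext_counter) and the 8192-byte line buffers are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Skip separators and return the next token in s, or NULL at end of line. */
static const char *next_token(const char *s, size_t *len)
{
	s += strspn(s, " \n");
	*len = strcspn(s, " \n");
	return *len ? s : NULL;
}

/*
 * Walk a line of counter names and the matching line of values in
 * lockstep and return the value that sits under 'wanted'.
 * Returns 0 on success, -1 if the name is not present.
 */
int netstat_lookup(const char *names, const char *values,
		   const char *wanted, unsigned long long *out)
{
	size_t nlen, vlen, wlen;
	const char *n, *v;

	assert(names && values && wanted && out);
	wlen = strlen(wanted);
	n = next_token(names, &nlen);
	v = next_token(values, &vlen);

	while (n && v) {
		if (nlen == wlen && memcmp(n, wanted, wlen) == 0) {
			*out = strtoull(v, NULL, 10);
			return 0;
		}
		n = next_token(n + nlen, &nlen);
		v = next_token(v + vlen, &vlen);
	}
	return -1;
}

/*
 * Read one TcpExt counter from /proc/net/netstat.
 * Assumes each line fits in 8192 bytes, which is a guess.
 */
int read_tcpext_counter(const char *wanted, unsigned long long *out)
{
	char names[8192], values[8192];
	FILE *f = fopen("/proc/net/netstat", "r");
	int ret = -1;

	if (!f)
		return -1;
	/* The file is made of (names line, values line) pairs. */
	while (fgets(names, sizeof(names), f) &&
	       fgets(values, sizeof(values), f)) {
		if (strncmp(names, "TcpExt:", 7) == 0) {
			ret = netstat_lookup(names, values, wanted, out);
			break;
		}
	}
	fclose(f);
	return ret;
}
```

Calling read_tcpext_counter("TCPRcvCoalesce", &v) before and after a lossy transfer should show whether the coalescing path is being taken; on a kernel without this patch the lookup simply returns -1.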
With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.

When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.

Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.

Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.

This also reduced average memory used by tcp sockets.

With help from Neal Cardwell.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
V2: rebase after tcp_data_queue_ofo() introduction.

 include/linux/snmp.h | 1 +
 net/ipv4/proc.c      | 1 +
 net/ipv4/tcp_input.c | 19 ++++++++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)
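To make the coalescing test easier to follow outside the kernel, here is a simplified userspace model of it. This is a sketch, not the kernel code: struct seg, seg_tailroom() and seg_try_coalesce() are hypothetical stand-ins for sk_buff, skb_tailroom() and the new branch in tcp_data_queue_ofo(), and BUF_CAPACITY of 4096 is an assumed data area that ignores headers and skb_shared_info.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_CAPACITY 4096	/* assumed data area of a "fat" buffer */

/* Hypothetical stand-in for an sk_buff holding one TCP segment. */
struct seg {
	uint32_t seq;		/* first sequence number in the buffer */
	uint32_t end_seq;	/* one past the last sequence number */
	int fin;		/* segment carried a FIN flag */
	size_t len;		/* bytes of payload currently stored */
	unsigned char data[BUF_CAPACITY];
};

/* Free space left after the payload already stored in s. */
static size_t seg_tailroom(const struct seg *s)
{
	return BUF_CAPACITY - s->len;
}

/*
 * Try to append 'next' to 'prev'. Mirrors the three conditions in the
 * patch: the segments are contiguous, the new payload fits in the
 * tailroom of the previous buffer, and the new segment carries no FIN.
 * Returns 1 if 'next' was absorbed (the caller may free it), 0 if it
 * has to be queued as a separate buffer.
 */
int seg_try_coalesce(struct seg *prev, const struct seg *next)
{
	assert(prev->len <= BUF_CAPACITY && next->len <= BUF_CAPACITY);

	if (next->seq != prev->end_seq)
		return 0;
	if (next->len > seg_tailroom(prev) || next->fin)
		return 0;

	memcpy(prev->data + prev->len, next->data, next->len);
	prev->len += next->len;
	prev->end_seq = next->end_seq;
	return 1;
}
```

With 1448-byte payloads, two consecutive segments fit in one 4096-byte area and a third does not, so in this model every second out-of-order frame would be absorbed and its buffer released at once rather than sitting in the queue until tcp_collapse_ofo_queue() runs.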