Message ID: 1510444452.2849.149.camel@edumazet-glaptop3.roam.corp.google.com
State: Accepted, archived
Delegated to: David Miller
Series: [v2,net-next] tcp: allow drivers to tweak TSQ logic
On Sat, Nov 11, 2017 at 6:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> I had many reports that TSQ logic breaks wifi aggregation.
>
> Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> and drivers queues.
>
> But Wifi aggregation needs a bigger budget to allow bigger rates to
> be discovered by various TCP Congestion Controls algorithms.
>
> This patch adds an extra socket field, allowing wifi drivers to select
> another log scale to derive TCP Small Queue credit from current pacing
> rate.
>
> Initial value is 10, meaning that this patch does not change current
> behavior.
>
> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)
>
> They would have to use following template :
>
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
>
> Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Johannes Berg <johannes.berg@intel.com>
> Cc: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: Kir Kolyshkin <kir@openvz.org>
> ---

Nice. Thanks, Eric!

Acked-by: Neal Cardwell <ncardwell@google.com>

neal
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 11 Nov 2017 15:54:12 -0800

> From: Eric Dumazet <edumazet@google.com>
>
> I had many reports that TSQ logic breaks wifi aggregation.
>
> Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> and drivers queues.
>
> But Wifi aggregation needs a bigger budget to allow bigger rates to
> be discovered by various TCP Congestion Controls algorithms.
>
> This patch adds an extra socket field, allowing wifi drivers to select
> another log scale to derive TCP Small Queue credit from current pacing
> rate.
>
> Initial value is 10, meaning that this patch does not change current
> behavior.
>
> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)
>
> They would have to use following template :
>
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
>
> Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Johannes Berg <johannes.berg@intel.com>
> Cc: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: Kir Kolyshkin <kir@openvz.org>
> ---
> v2: added kernel-doc comment, based on Johannes feedback.

Applied, thanks Eric.
On 2017-11-12 00:54, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> I had many reports that TSQ logic breaks wifi aggregation.
>
> Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> and drivers queues.
>
> But Wifi aggregation needs a bigger budget to allow bigger rates to
> be discovered by various TCP Congestion Controls algorithms.
>
> This patch adds an extra socket field, allowing wifi drivers to select
> another log scale to derive TCP Small Queue credit from current pacing
> rate.
>
> Initial value is 10, meaning that this patch does not change current
> behavior.
>
> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)
>
> They would have to use following template :
>
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

I did some experiments with this approach (with your patch backported to
a 4.9 kernel), and I got some crashes.
After looking at the crashes and code some more, it seems that this
would need some extra checks to ensure that skb->sk is a full struct
sock, instead of just a struct request_sock.
Should this be done by checking for skb->sk->sk_state ==
TCP_ESTABLISHED? It seems to me that this might introduce some extra
overhead.

- Felix
On Tue, 2017-11-28 at 14:10 +0100, Felix Fietkau wrote:
> On 2017-11-12 00:54, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > I had many reports that TSQ logic breaks wifi aggregation.
> >
> > Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> > and drivers queues.
> >
> > But Wifi aggregation needs a bigger budget to allow bigger rates to
> > be discovered by various TCP Congestion Controls algorithms.
> >
> > This patch adds an extra socket field, allowing wifi drivers to select
> > another log scale to derive TCP Small Queue credit from current pacing
> > rate.
> >
> > Initial value is 10, meaning that this patch does not change current
> > behavior.
> >
> > We expect wifi drivers to set this field to smaller values (tests have
> > been done with values from 6 to 9)
> >
> > They would have to use following template :
> >
> > if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
> >     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
>
> I did some experiments with this approach (with your patch backported
> to a 4.9 kernel), and I got some crashes.
> After looking at the crashes and code some more, it seems that this
> would need some extra checks to ensure that skb->sk is a full struct
> sock, instead of just a struct request_sock.
> Should this be done by checking for skb->sk->sk_state ==
> TCP_ESTABLISHED? It seems to me that this might introduce some extra
> overhead.

Hi Felix

Answer is in the question, the pseudo code in the changelog was not
100% correct.

I will add following helper to net-next I guess :

void sk_pacing_shift_update(struct sock *sk, int val)
{
	if (!sk || !sk_fullsock(sk) || sk->sk_pacing_shift == val)
		return;
	sk->sk_pacing_shift = val;
}

Then you might use it like that :

sk_pacing_shift_update(skb->sk, 7);

Thanks.
diff --git a/include/net/sock.h b/include/net/sock.h
index 688a823dccc306bd21f47da167c6922161af5a6a..f8715c5af37d4e598770dbe5c5f83246241f18d5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -267,6 +267,7 @@ struct sock_common {
  * @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
  * @sk_gso_max_size: Maximum GSO segment size to build
  * @sk_gso_max_segs: Maximum number of GSO segments
+ * @sk_pacing_shift: scaling factor for TCP Small Queues
  * @sk_lingertime: %SO_LINGER l_linger setting
  * @sk_backlog: always used with the per-socket spinlock held
  * @sk_callback_lock: used with the callbacks in the end of this struct
@@ -451,6 +452,7 @@ struct sock {
 	kmemcheck_bitfield_end(flags);
 
 	u16			sk_gso_max_segs;
+	u8			sk_pacing_shift;
 	unsigned long		sk_lingertime;
 	struct proto		*sk_prot_creator;
 	rwlock_t		sk_callback_lock;
diff --git a/net/core/sock.c b/net/core/sock.c
index 57bbd6040eb6a3c072ce4e024687786079552ddf..13719af7b4e35d2050ccba51d44c7f691a889b37 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2746,6 +2746,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_max_pacing_rate = ~0U;
 	sk->sk_pacing_rate = ~0U;
+	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0256f7a410417d93c9edab9d25a3ce5a81c2b296..76dbe884f2469660028684a46fc19afa000a1353 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1720,7 +1720,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 {
 	u32 bytes, segs;
 
-	bytes = min(sk->sk_pacing_rate >> 10,
+	bytes = min(sk->sk_pacing_rate >> sk->sk_pacing_shift,
 		    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
 
 	/* Goal is to send at least one packet per ms,
@@ -2198,7 +2198,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
 	unsigned int limit;
 
-	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> sk->sk_pacing_shift);
 	limit = min_t(u32, limit, sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
 	limit <<= factor;