diff mbox series

[v2,net-next] tcp: allow drivers to tweak TSQ logic

Message ID 1510444452.2849.149.camel@edumazet-glaptop3.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [v2,net-next] tcp: allow drivers to tweak TSQ logic | expand

Commit Message

Eric Dumazet Nov. 11, 2017, 11:54 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

I had many reports that TSQ logic breaks wifi aggregation.

Current logic is to allow up to 1 ms of bytes to be queued into qdisc
and drivers queues.

But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by various TCP Congestion Controls algorithms.

This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.

Initial value is 10, meaning that this patch does not change current
behavior.

We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9)

They would have to use following template :

if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;


Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Kir Kolyshkin <kir@openvz.org>
---
v2: added kernel-doc comment, based on Johannes feedback.

 include/net/sock.h    |    2 ++
 net/core/sock.c       |    1 +
 net/ipv4/tcp_output.c |    4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

Comments

Neal Cardwell Nov. 12, 2017, 1:39 p.m. UTC | #1
On Sat, Nov 11, 2017 at 6:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> I had many reports that TSQ logic breaks wifi aggregation.
>
> Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> and drivers queues.
>
> But Wifi aggregation needs a bigger budget to allow bigger rates to
> be discovered by various TCP Congestion Controls algorithms.
>
> This patch adds an extra socket field, allowing wifi drivers to select
> another log scale to derive TCP Small Queue credit from current pacing
> rate.
>
> Initial value is 10, meaning that this patch does not change current
> behavior.
>
> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)
>
> They would have to use following template :
>
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
>
>
> Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Johannes Berg <johannes.berg@intel.com>
> Cc: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: Kir Kolyshkin <kir@openvz.org>
> ---

Nice. Thanks, Eric!

Acked-by: Neal Cardwell <ncardwell@google.com>

neal
David Miller Nov. 14, 2017, 7:18 a.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 11 Nov 2017 15:54:12 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> I had many reports that TSQ logic breaks wifi aggregation.
> 
> Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> and drivers queues.
> 
> But Wifi aggregation needs a bigger budget to allow bigger rates to
> be discovered by various TCP Congestion Controls algorithms.
> 
> This patch adds an extra socket field, allowing wifi drivers to select
> another log scale to derive TCP Small Queue credit from current pacing
> rate.
> 
> Initial value is 10, meaning that this patch does not change current
> behavior.
> 
> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)
> 
> They would have to use following template :
> 
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
> 
> 
> Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Johannes Berg <johannes.berg@intel.com>
> Cc: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: Kir Kolyshkin <kir@openvz.org>
> ---
> v2: added kernel-doc comment, based on Johannes feedback.

Applied, thanks Eric.
Felix Fietkau Nov. 28, 2017, 1:10 p.m. UTC | #3
On 2017-11-12 00:54, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> I had many reports that TSQ logic breaks wifi aggregation.
> 
> Current logic is to allow up to 1 ms of bytes to be queued into qdisc
> and drivers queues.
> 
> But Wifi aggregation needs a bigger budget to allow bigger rates to
> be discovered by various TCP Congestion Controls algorithms.
> 
> This patch adds an extra socket field, allowing wifi drivers to select
> another log scale to derive TCP Small Queue credit from current pacing
> rate.
> 
> Initial value is 10, meaning that this patch does not change current
> behavior.
> 
> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)
> 
> They would have to use following template :
> 
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
I did some experiments with this approach (with your patch backported to
a 4.9 kernel), and I got some crashes.
After looking at the crashes and code some more, it seems that this
would need some extra checks to ensure that skb->sk is a full struct
sock, instead of just a struct request_sock.
Should this be done by checking for skb->sk->sk_state ==
TCP_ESTABLISHED? It seems to me that this might introduce some extra
overhead.

- Felix
Eric Dumazet Nov. 28, 2017, 5:33 p.m. UTC | #4
On Tue, 2017-11-28 at 14:10 +0100, Felix Fietkau wrote:
> On 2017-11-12 00:54, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > I had many reports that TSQ logic breaks wifi aggregation.
> > 
> > Current logic is to allow up to 1 ms of bytes to be queued into
> > qdisc
> > and drivers queues.
> > 
> > But Wifi aggregation needs a bigger budget to allow bigger rates to
> > be discovered by various TCP Congestion Controls algorithms.
> > 
> > This patch adds an extra socket field, allowing wifi drivers to
> > select
> > another log scale to derive TCP Small Queue credit from current
> > pacing
> > rate.
> > 
> > Initial value is 10, meaning that this patch does not change
> > current
> > behavior.
> > 
> > We expect wifi drivers to set this field to smaller values (tests
> > have
> > been done with values from 6 to 9)
> > 
> > They would have to use following template :
> > 
> > if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
> >      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
> 
> I did some experiments with this approach (with your patch backported
> to
> a 4.9 kernel), and I got some crashes.
> After looking at the crashes and code some more, it seems that this
> would need some extra checks to ensure that skb->sk is a full struct
> sock, instead of just a struct request_sock.
> Should this be done by checking for skb->sk->sk_state ==
> TCP_ESTABLISHED? It seems to me that this might introduce some extra
> overhead.
> 
Hi Felix.

Answer is in the question, the pseudo code in the changelog was not
100% correct.

I will add following helper to net-next I guess :

void sk_pacing_shift_update(struct sock *sk, int val)
{
	if (!sk || !sk_fullsock(sk) || sk->sk_pacing_shift == val)
		return;
	sk->sk_pacing_shift = val;
}


Then you might use it like that :

	sk_pacing_shift_update(skb->sk, 7);

Thanks.
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 688a823dccc306bd21f47da167c6922161af5a6a..f8715c5af37d4e598770dbe5c5f83246241f18d5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -267,6 +267,7 @@  struct sock_common {
   *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *	@sk_gso_max_size: Maximum GSO segment size to build
   *	@sk_gso_max_segs: Maximum number of GSO segments
+  *	@sk_pacing_shift: scaling factor for TCP Small Queues
   *	@sk_lingertime: %SO_LINGER l_linger setting
   *	@sk_backlog: always used with the per-socket spinlock held
   *	@sk_callback_lock: used with the callbacks in the end of this struct
@@ -451,6 +452,7 @@  struct sock {
 	kmemcheck_bitfield_end(flags);
 
 	u16			sk_gso_max_segs;
+	u8			sk_pacing_shift;
 	unsigned long	        sk_lingertime;
 	struct proto		*sk_prot_creator;
 	rwlock_t		sk_callback_lock;
diff --git a/net/core/sock.c b/net/core/sock.c
index 57bbd6040eb6a3c072ce4e024687786079552ddf..13719af7b4e35d2050ccba51d44c7f691a889b37 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2746,6 +2746,7 @@  void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_max_pacing_rate = ~0U;
 	sk->sk_pacing_rate = ~0U;
+	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0256f7a410417d93c9edab9d25a3ce5a81c2b296..76dbe884f2469660028684a46fc19afa000a1353 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1720,7 +1720,7 @@  u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 {
 	u32 bytes, segs;
 
-	bytes = min(sk->sk_pacing_rate >> 10,
+	bytes = min(sk->sk_pacing_rate >> sk->sk_pacing_shift,
 		    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
 
 	/* Goal is to send at least one packet per ms,
@@ -2198,7 +2198,7 @@  static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
 	unsigned int limit;
 
-	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> sk->sk_pacing_shift);
 	limit = min_t(u32, limit,
 		      sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
 	limit <<= factor;