diff mbox series

[net-next] tcp: allow drivers to tweak TSQ logic

Message ID 1510281664.2849.143.camel@edumazet-glaptop3.roam.corp.google.com
State Superseded, archived
Delegated to: David Miller
Headers show
Series [net-next] tcp: allow drivers to tweak TSQ logic | expand

Commit Message

Eric Dumazet Nov. 10, 2017, 2:41 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

I had many reports that TSQ logic breaks wifi aggregation.

Current logic is to allow up to 1 ms of bytes to be queued into qdisc
and drivers queues.

But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by various TCP Congestion Controls algorithms.

This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.

Initial value is 10, meaning that this patch does not change current
behavior.

We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9)

They would have to use following template :

if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;


Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Kir Kolyshkin <kir@openvz.org>
---
 include/net/sock.h    |    1 +
 net/core/sock.c       |    1 +
 net/ipv4/tcp_output.c |    4 ++--
 3 files changed, 4 insertions(+), 2 deletions(-)

Comments

Johannes Berg Nov. 11, 2017, 2:27 p.m. UTC | #1
Thanks Eric!

> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)

I suppose we should test each driver or so.

> They would have to use following template :
> 
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Hm. I wish we wouldn't have to do this on every skb, but perhaps it
doesn't matter that much.


>  	u16			sk_gso_max_segs;
> +	u8			sk_pacing_shift;

I guess you tried to fill a hole, but weren't we saying that it would
be better in the same cacheline? Then again, perhaps both cachelines
are resident anyway, haven't looked at this now.

Unrelated to that, I think this is missing a documentation update since
the struct has kernel-doc comments.

johannes
Eric Dumazet Nov. 11, 2017, 11:38 p.m. UTC | #2
On Sat, 2017-11-11 at 15:27 +0100, Johannes Berg wrote:
> Thanks Eric!
> 
> > We expect wifi drivers to set this field to smaller values (tests have
> > been done with values from 6 to 9)
> 
> I suppose we should test each driver or so.
> 
> > They would have to use following template :
> > 
> > if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
> >      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
> 
> Hm. I wish we wouldn't have to do this on every skb, but perhaps it
> doesn't matter that much.

Yes, it does not matter, even at 40Gbit ;)

> 
> 
> >  	u16			sk_gso_max_segs;
> > +	u8			sk_pacing_shift;
> 
> I guess you tried to fill a hole, but weren't we saying that it would
> be better in the same cacheline? Then again, perhaps both cachelines
> are resident anyway, haven't looked at this now.

Same cache line already ;)

	u32                        sk_pacing_rate;       /* 0x1c0   0x4 */
	u32                        sk_max_pacing_rate;   /* 0x1c4   0x4 */
	struct page_frag           sk_frag;              /* 0x1c8  0x10 */
	netdev_features_t          sk_route_caps;        /* 0x1d8   0x8 */
	netdev_features_t          sk_route_nocaps;      /* 0x1e0   0x8 */
	int                        sk_gso_type;          /* 0x1e8   0x4 */
	unsigned int               sk_gso_max_size;      /* 0x1ec   0x4 */
	gfp_t                      sk_allocation;        /* 0x1f0   0x4 */
	__u32                      sk_txhash;            /* 0x1f4   0x4 */
	unsigned int               __sk_flags_offset[0]; /* 0x1f8     0 */
	unsigned int               sk_padding:1;         /* 0x1f8:0x1f 0x4 */
	unsigned int               sk_kern_sock:1;       /* 0x1f8:0x1e 0x4 */
	unsigned int               sk_no_check_tx:1;     /* 0x1f8:0x1d 0x4 */
	unsigned int               sk_no_check_rx:1;     /* 0x1f8:0x1c 0x4 */
	unsigned int               sk_userlocks:4;       /* 0x1f8:0x18 0x4 */
	unsigned int               sk_protocol:8;        /* 0x1f8:0x10 0x4 */
	unsigned int               sk_type:16;           /* 0x1f8: 0 0x4 */
	u16                        sk_gso_max_segs;      /* 0x1fc   0x2 */
	u8			   sk_pacing_shift;	 /* 0x1fe   0x1 */






> 
> Unrelated to that, I think this is missing a documentation update since
> the struct has kernel-doc comments.

Yeah, I believe these kernel-doc on gigantic struct sock are useless and
we should remove them, they have zero useful info.
Toke Høiland-Jørgensen Nov. 12, 2017, 2:35 p.m. UTC | #3
Johannes Berg <johannes@sipsolutions.net> writes:

> Thanks Eric!
>
>> We expect wifi drivers to set this field to smaller values (tests have
>> been done with values from 6 to 9)
>
> I suppose we should test each driver or so.

I think that for TXQ-enabled drivers we could probably set this in
tx_dequeue()?

Will test it when I get back to my testbed...

-Toke
Johannes Berg Nov. 13, 2017, 9:21 a.m. UTC | #4
On Sat, 2017-11-11 at 15:38 -0800, Eric Dumazet wrote:

> > Hm. I wish we wouldn't have to do this on every skb, but perhaps it
> > doesn't matter that much.
> 
> Yes, it does not matter, even at 40Gbit ;)

:)

> Same cache line already ;)

Hah. It seemed so far away, but actually looking at the layout helps :)

> > Unrelated to that, I think this is missing a documentation update since
> > the struct has kernel-doc comments.
> 
> Yeah, I believe these kernel-doc on gigantic struct sock are useless and
> we should remove them, they have zero useful info.

No argument from me, but while it's there ... anyway I see you resent
this, thanks for doing this.

I'll try to get people to run some tests for our driver.

johannes
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 688a823dccc306bd21f47da167c6922161af5a6a..fb0e5194a3bce61fac00fc234d2a5d1bb3c60f35 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -451,6 +451,7 @@  struct sock {
 	kmemcheck_bitfield_end(flags);
 
 	u16			sk_gso_max_segs;
+	u8			sk_pacing_shift;
 	unsigned long	        sk_lingertime;
 	struct proto		*sk_prot_creator;
 	rwlock_t		sk_callback_lock;
diff --git a/net/core/sock.c b/net/core/sock.c
index c59bcf90d90536fedc7809e397f6bd414781b529..2811ff8322d4a5f68e3e745cf585564e1ec5d809 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2746,6 +2746,7 @@  void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_max_pacing_rate = ~0U;
 	sk->sk_pacing_rate = ~0U;
+	sk->sk_pacing_shift = 10;
 	sk->sk_incoming_cpu = -1;
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9b98d35aa0d8d0a829e4a41985d805d4e2895a8e..fa5e7b81b5ec12039b1347474f5183b1d9c87887 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1737,7 +1737,7 @@  u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 {
 	u32 bytes, segs;
 
-	bytes = min(sk->sk_pacing_rate >> 10,
+	bytes = min(sk->sk_pacing_rate >> sk->sk_pacing_shift,
 		    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
 
 	/* Goal is to send at least one packet per ms,
@@ -2215,7 +2215,7 @@  static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
 	unsigned int limit;
 
-	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+	limit = max(2 * skb->truesize, sk->sk_pacing_rate >> sk->sk_pacing_shift);
 	limit = min_t(u32, limit,
 		      sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
 	limit <<= factor;