diff mbox

[net-next] tcp: tcp_tso_autosize() minimum is one packet

Message ID 1432655728.4060.257.camel@edumazet-glaptop2.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet May 26, 2015, 3:55 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

By making sure sk->sk_gso_max_segs minimal value is one,
and sysctl_tcp_min_tso_segs minimal value is one as well,
tcp_tso_autosize() will return a non zero value.

We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
("tcp: Do not apply TSO segment limit to non-TSO packets")
and save few cpu cycles in fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 net/core/sock.c            |    5 ++++-
 net/ipv4/sysctl_net_ipv4.c |    2 +-
 net/ipv4/tcp_output.c      |    4 ++--
 3 files changed, 7 insertions(+), 4 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Neal Cardwell May 26, 2015, 4:03 p.m. UTC | #1
On Tue, May 26, 2015 at 11:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> By making sure sk->sk_gso_max_segs minimal value is one,
> and sysctl_tcp_min_tso_segs minimal value is one as well,
> tcp_tso_autosize() will return a non zero value.
>
> We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
> ("tcp: Do not apply TSO segment limit to non-TSO packets")
> and save few cpu cycles in fast path.

Thanks, Eric! This is a nice clean-up/simplification.

Acked-by: Neal Cardwell <ncardwell@google.com>

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu May 27, 2015, 12:03 a.m. UTC | #2
Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> -               if (tso_segs == 1 || !max_segs) {
> +               if (tso_segs == 1) {

What we're testing for here with max_segs is actually the question
"is TSO enabled".  So with your patch we will be taking the TSO
code path even when TSO is disabled.  Now this may or may not
trigger the original bug that I was trying to fix but it still
feels unsafe.

So please convince me that it is totally safe to take the TSO
code path with TSO disabled, e.g., when PMTU causes tso_segs
to be greater than one.

Thanks,
Eric Dumazet May 27, 2015, 12:36 a.m. UTC | #3
On Wed, 2015-05-27 at 08:03 +0800, Herbert Xu wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > -               if (tso_segs == 1 || !max_segs) {
> > +               if (tso_segs == 1) {
> 
> What we're testing for here with max_segs is actually the question
> "is TSO enabled".  So with your patch we will be taking the TSO
> code path even when TSO is disabled.  Now this may or may not
> trigger the original bug that I was trying to fix but it still
> feels unsafe.
> 
> So please convince me that it is totally safe to take the TSO
> code path with TSO disabled, e.g., when PMTU causes tso_segs
> to be greater than one.

tldr:  "TSO with max_segs==1"  <is the same than> "no TSO/GSO"


So worth thing that will happen is that the call to
tcp_mss_split_point() / tso_fragment() will nicely split the (TSO/GSO)
packet in a nice non GSO packet of one MSS before transmit.

Right now, one can "ethtool -K eth0 tso off gso off" in the middle of a
TCP session and  tcp_write_xmit() automatically falls back to splitting
too big skbs that were cooked at the time GSO/TSO was considered true in
sendmsg().
  
Not sure what you need to be convinced ;)

Thanks !

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu May 27, 2015, 1:38 a.m. UTC | #4
On Tue, May 26, 2015 at 05:36:34PM -0700, Eric Dumazet wrote:
> 
> tldr:  "TSO with max_segs==1"  <is the same than> "no TSO/GSO"

Not really.  They're not identical.  For example, before your
patch a packet greater than MSS with TSO disabled would call
tcp_nagle_test, with your patch it will call tcp_tso_should_defer
instead.

Maybe this is OK but it is far from obvious.

Cheers,
Herbert Xu May 27, 2015, 2:01 a.m. UTC | #5
On Wed, May 27, 2015 at 09:38:40AM +0800, Herbert Xu wrote:
> 
> Not really.  They're not identical.  For example, before your
> patch a packet greater than MSS with TSO disabled would call
> tcp_nagle_test, with your patch it will call tcp_tso_should_defer
> instead.
> 
> Maybe this is OK but it is far from obvious.

Funny enough it was you who originally introduced this change
in behaviour:

commit 8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 15 12:24:54 2013 -0700

    tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()

Now you're trying to do it again :)

The problem is back at the very beginning, tso_segs > 1 was the
same as TSO enabled.  So we either need to restore this identity
or find a new way of indiciating that TSO is enabled.

Cheers,
Eric Dumazet May 27, 2015, 2:14 a.m. UTC | #6
On Wed, 2015-05-27 at 09:38 +0800, Herbert Xu wrote:
> On Tue, May 26, 2015 at 05:36:34PM -0700, Eric Dumazet wrote:
> > 
> > tldr:  "TSO with max_segs==1"  <is the same than> "no TSO/GSO"
> 
> Not really.  They're not identical.  For example, before your
> patch a packet greater than MSS with TSO disabled would call
> tcp_nagle_test, with your patch it will call tcp_tso_should_defer
> instead.

Well, given that a device can set gso_max_segs to one, if there is a bug
here we'll need to fix it asap.

Fact that Nagle or tso should defer applies in this corner case is not
very important here, unless you have a specific case in mind ?
Anyway I double checked and I believes it is fine.

We normally deal with dynamic MSS changes, even for non GSO cases.

A non GSO packet temporarily becomes a GSO one in tcp_init_tso_segs()
(because its skb->len is bigger than cur_mss)

Then we split it.

Nagle or tso should defer would take same decision : send one full MSS.

By the time the last 'segment' (possibly smaller than mss) will be
considered, Nagle might apply there.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 27, 2015, 2:17 a.m. UTC | #7
On Wed, 2015-05-27 at 10:01 +0800, Herbert Xu wrote:
> On Wed, May 27, 2015 at 09:38:40AM +0800, Herbert Xu wrote:
> > 
> > Not really.  They're not identical.  For example, before your
> > patch a packet greater than MSS with TSO disabled would call
> > tcp_nagle_test, with your patch it will call tcp_tso_should_defer
> > instead.
> > 
> > Maybe this is OK but it is far from obvious.
> 
> Funny enough it was you who originally introduced this change
> in behaviour:
> 
> commit 8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Oct 15 12:24:54 2013 -0700
> 
>     tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()
> 
> Now you're trying to do it again :)
> 
> The problem is back at the very beginning, tso_segs > 1 was the
> same as TSO enabled.  So we either need to restore this identity
> or find a new way of indiciating that TSO is enabled.
> 
> Cheers,


I have no idea of what you are trying to tell me.

I want to remove duplicate tests, not pushing new ones.

I do not want to add back sk_can_gso()

I do not find something funny here. Sorry.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu May 27, 2015, 2:52 a.m. UTC | #8
On Tue, May 26, 2015 at 07:14:41PM -0700, Eric Dumazet wrote:
>
> Fact that Nagle or tso should defer applies in this corner case is not
> very important here, unless you have a specific case in mind ?
> Anyway I double checked and I believes it is fine.

Yes it does appear to be fine on a second look, for the case where
skb->len > mss.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
David Miller May 27, 2015, 3:52 a.m. UTC | #9
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 26 May 2015 08:55:28 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> By making sure sk->sk_gso_max_segs minimal value is one,
> and sysctl_tcp_min_tso_segs minimal value is one as well,
> tcp_tso_autosize() will return a non zero value.
> 
> We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
> ("tcp: Do not apply TSO segment limit to non-TSO packets")
> and save few cpu cycles in fast path.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 27, 2015, 4:07 a.m. UTC | #10
On Tue, 2015-05-26 at 23:52 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 26 May 2015 08:55:28 -0700
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > By making sure sk->sk_gso_max_segs minimal value is one,
> > and sysctl_tcp_min_tso_segs minimal value is one as well,
> > tcp_tso_autosize() will return a non zero value.
> > 
> > We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
> > ("tcp: Do not apply TSO segment limit to non-TSO packets")
> > and save few cpu cycles in fast path.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied, thanks Eric.

Thanks David, and many thanks to Herbert and Neal for the reviews !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 29124fcdc42a6367c20aea2e2811ec4e72558d7e..e72633c346b197b9dd6c94800fa0c55110aedfb2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1581,6 +1581,8 @@  EXPORT_SYMBOL_GPL(sk_clone_lock);
 
 void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 {
+	u32 max_segs = 1;
+
 	__sk_dst_set(sk, dst);
 	sk->sk_route_caps = dst->dev->features;
 	if (sk->sk_route_caps & NETIF_F_GSO)
@@ -1592,9 +1594,10 @@  void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 		} else {
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 			sk->sk_gso_max_size = dst->dev->gso_max_size;
-			sk->sk_gso_max_segs = dst->dev->gso_max_segs;
+			max_segs = max_t(u32, dst->dev->gso_max_segs, 1);
 		}
 	}
+	sk->sk_gso_max_segs = max_segs;
 }
 EXPORT_SYMBOL_GPL(sk_setup_caps);
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 841de32f1fee4e6faa55ebf855df7168ee996fc3..e64892769607a6089e10d5c9e63dc3564c983341 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -702,7 +702,7 @@  static struct ctl_table ipv4_table[] = {
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &zero,
+		.extra1		= &one,
 		.extra2		= &gso_max_segs,
 	},
 	{
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 534e5fdb04c11152bae36f47a786e8b10b823cd3..190538a2a88ce061ca1352c6c993b7008e759a52 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2088,7 +2088,7 @@  static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
 			break;
 
-		if (tso_segs == 1 || !max_segs) {
+		if (tso_segs == 1) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
 						     (tcp_skb_is_last(sk, skb) ?
 						      nonagle : TCP_NAGLE_PUSH))))
@@ -2101,7 +2101,7 @@  static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		}
 
 		limit = mss_now;
-		if (tso_segs > 1 && max_segs && !tcp_urg_mode(tp))
+		if (tso_segs > 1 && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,
 						    min_t(unsigned int,
 							  cwnd_quota,