diff mbox series

[net] tcp_bbr: better deal with suboptimal GSO

Message ID 1519224183.55655.40.camel@gmail.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net] tcp_bbr: better deal with suboptimal GSO | expand

Commit Message

Eric Dumazet Feb. 21, 2018, 2:43 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

BBR uses tcp_tso_autosize() in an attempt to probe what would be the
burst sizes and to adjust cwnd in bbr_target_cwnd() with following
gold formula :

/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;

But GSO can be lacking or be constrained to very small
units (ip link set dev ... gso_max_segs 2)

What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.

So in the case GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO was fully operational, so
that GRO can hopefully be working efficiently.

To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs

Only tcp_tso_segs() has to enforce the gso_max_segs limit.

Tested:

ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast

Before patch:
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
    691  (ss -temoi shows cwnd is stuck around 6 )
    667
    651
    631
    517

After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
   1733 (ss -temoi shows cwnd is around 386 )
   1778
   1746
   1781
   1718

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
---
 net/ipv4/tcp_output.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Paolo Abeni Feb. 21, 2018, 3:01 p.m. UTC | #1
Hi,

On Wed, 2018-02-21 at 06:43 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> gold formula :
> 
> /* Allow enough full-sized skbs in flight to utilize end systems. */
> cwnd += 3 * bbr->tso_segs_goal;
> 
> But GSO can be lacking or be constrained to very small
> units (ip link set dev ... gso_max_segs 2)
> 
> What we really want is to have enough packets in flight so that both
> GSO and GRO are efficient.
> 
> So in the case GSO is off or downgraded, we still want to have the same
> number of packets in flight as if GSO/TSO was fully operational, so
> that GRO can hopefully be working efficiently.
> 
> To fix this issue, we make tcp_tso_autosize() unaware of
> sk->sk_gso_max_segs
> 
> Only tcp_tso_segs() has to enforce the gso_max_segs limit.
> 
> Tested:
> 
> ethtool -K eth0 tso off gso off
> tc qd replace dev eth0 root pfifo_fast
> 
> Before patch:
> for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>     691  (ss -temoi shows cwnd is stuck around 6 )
>     667
>     651
>     631
>     517
> 
> After patch :
> # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>    1733 (ss -temoi shows cwnd is around 386 )
>    1778
>    1746
>    1781
>    1718
> 
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  net/ipv4/tcp_output.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
>  	 */
>  	segs = max_t(u32, bytes / mss_now, min_tso_segs);
>  
> -	return min_t(u32, segs, sk->sk_gso_max_segs);
> +	return segs;
>  }
>  EXPORT_SYMBOL(tcp_tso_autosize);
>  

Very minor nit, why don't:

	return max_t(u32, bytes / mss_now, min_tso_segs);

and drop the 'segs' local variable?

Thanks,

Paolo
Eric Dumazet Feb. 21, 2018, 3:09 p.m. UTC | #2
On Wed, Feb 21, 2018 at 7:01 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Very minor nit, why don't:
>
>         return max_t(u32, bytes / mss_now, min_tso_segs);
>
> and drop the 'segs' local variable?

Simply to ease backports.

We had some constant changes in this function in the past.
Neal Cardwell Feb. 21, 2018, 3:14 p.m. UTC | #3
On Wed, Feb 21, 2018 at 9:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> gold formula :
>
> /* Allow enough full-sized skbs in flight to utilize end systems. */
> cwnd += 3 * bbr->tso_segs_goal;
>
> But GSO can be lacking or be constrained to very small
> units (ip link set dev ... gso_max_segs 2)
>
> What we really want is to have enough packets in flight so that both
> GSO and GRO are efficient.
>
> So in the case GSO is off or downgraded, we still want to have the same
> number of packets in flight as if GSO/TSO was fully operational, so
> that GRO can hopefully be working efficiently.
>
> To fix this issue, we make tcp_tso_autosize() unaware of
> sk->sk_gso_max_segs
>
> Only tcp_tso_segs() has to enforce the gso_max_segs limit.
>
> Tested:
>
> ethtool -K eth0 tso off gso off
> tc qd replace dev eth0 root pfifo_fast
>
> Before patch:
> for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>     691  (ss -temoi shows cwnd is stuck around 6 )
>     667
>     651
>     631
>     517
>
> After patch :
> # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>    1733 (ss -temoi shows cwnd is around 386 )
>    1778
>    1746
>    1781
>    1718
>
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  net/ipv4/tcp_output.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal
Soheil Hassas Yeganeh Feb. 21, 2018, 3:18 p.m. UTC | #4
On Wed, Feb 21, 2018 at 10:14 AM Neal Cardwell <ncardwell@google.com> wrote:

> On Wed, Feb 21, 2018 at 9:43 AM, Eric Dumazet <eric.dumazet@gmail.com>
wrote:
> >
> > From: Eric Dumazet <edumazet@google.com>
> >
> > BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> > burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> > gold formula :
> >
> > /* Allow enough full-sized skbs in flight to utilize end systems. */
> > cwnd += 3 * bbr->tso_segs_goal;
> >
> > But GSO can be lacking or be constrained to very small
> > units (ip link set dev ... gso_max_segs 2)
> >
> > What we really want is to have enough packets in flight so that both
> > GSO and GRO are efficient.
> >
> > So in the case GSO is off or downgraded, we still want to have the same
> > number of packets in flight as if GSO/TSO was fully operational, so
> > that GRO can hopefully be working efficiently.
> >
> > To fix this issue, we make tcp_tso_autosize() unaware of
> > sk->sk_gso_max_segs
> >
> > Only tcp_tso_segs() has to enforce the gso_max_segs limit.
> >
> > Tested:
> >
> > ethtool -K eth0 tso off gso off
> > tc qd replace dev eth0 root pfifo_fast
> >
> > Before patch:
> > for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
> >     691  (ss -temoi shows cwnd is stuck around 6 )
> >     667
> >     651
> >     631
> >     517
> >
> > After patch :
> > # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
> >    1733 (ss -temoi shows cwnd is around 386 )
> >    1778
> >    1746
> >    1781
> >    1718
> >
> > Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > ---
> >  net/ipv4/tcp_output.c |    9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)

> Acked-by: Neal Cardwell <ncardwell@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you Eric for the nice patch!
Paolo Abeni Feb. 21, 2018, 3:55 p.m. UTC | #5
On Wed, 2018-02-21 at 07:09 -0800, Eric Dumazet wrote:
> On Wed, Feb 21, 2018 at 7:01 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Very minor nit, why don't:
> > 
> >         return max_t(u32, bytes / mss_now, min_tso_segs);
> > 
> > and drop the 'segs' local variable?
> 
> Simply to ease backports.
> 
> We had some constant changes in this function in the past.

Ok, thank you for the explanation. No objections on my side.

Cheers,

Paolo
David Miller Feb. 22, 2018, 7:16 p.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Feb 2018 06:43:03 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> gold formula :
> 
> /* Allow enough full-sized skbs in flight to utilize end systems. */
> cwnd += 3 * bbr->tso_segs_goal;
> 
> But GSO can be lacking or be constrained to very small
> units (ip link set dev ... gso_max_segs 2)
> 
> What we really want is to have enough packets in flight so that both
> GSO and GRO are efficient.
> 
> So in the case GSO is off or downgraded, we still want to have the same
> number of packets in flight as if GSO/TSO was fully operational, so
> that GRO can hopefully be working efficiently.
> 
> To fix this issue, we make tcp_tso_autosize() unaware of
> sk->sk_gso_max_segs
> 
> Only tcp_tso_segs() has to enforce the gso_max_segs limit.
 . ..
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>


Applied and queued up for -stable, thanks Eric.
diff mbox series

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1730,7 +1730,7 @@  u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 */
 	segs = max_t(u32, bytes / mss_now, min_tso_segs);
 
-	return min_t(u32, segs, sk->sk_gso_max_segs);
+	return segs;
 }
 EXPORT_SYMBOL(tcp_tso_autosize);
 
@@ -1742,9 +1742,10 @@  static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
 
-	return tso_segs ? :
-		tcp_tso_autosize(sk, mss_now,
-				 sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	if (!tso_segs)
+		tso_segs = tcp_tso_autosize(sk, mss_now,
+				sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	return min_t(u32, tso_segs, sk->sk_gso_max_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */