diff mbox

[REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero

Message ID CADVnQykPOVf7PoK8-6Mvbmbcpe1eYY=cfKHgiNGDNSFfCDPb-A@mail.gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Neal Cardwell Jan. 10, 2016, 5:29 p.m. UTC
On Sun, Jan 10, 2016 at 9:57 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> I use YeAH. But YeAH code wasn't touched between 4.2 and 4.3.

Oh, interesting. Looks like tcp_yeah_ssthresh() has a bug where its
intended reduction can be bigger than tp->snd_cwnd, leading to it
return a zero ssthresh (or even an ssthresh that underflows to ~4
billion). If tcp_yeah_ssthresh() returns an ssthresh of 0 then PRR
will try to pull the cwnd down to 0.

Can you please leave ECN and Yeah enabled and run something like the
following patch, to verify this conjecture? If the conjecture is
right, then the tcp_yeah warning should fire but not the new
tcp_cwnd_reduction() warning:

-----------
-----------

If that works, then we may just want a version of this patch without
the warning.

Thanks!
neal

Comments

Oleksandr Natalenko Jan. 10, 2016, 5:50 p.m. UTC | #1
Haven't you missed "return ssthresh;" statement?

On неділя, 10 січня 2016 р. 12:29:17 EET Neal Cardwell wrote:
> On Sun, Jan 10, 2016 at 9:57 AM, Oleksandr Natalenko
> 
> <oleksandr@natalenko.name> wrote:
> > I use YeAH. But YeAH code wasn't touched between 4.2 and 4.3.
> 
> Oh, interesting. Looks like tcp_yeah_ssthresh() has a bug where its
> intended reduction can be bigger than tp->snd_cwnd, leading to it
> return a zero ssthresh (or even an ssthresh that underflows to ~4
> billion). If tcp_yeah_ssthresh() returns an ssthresh of 0 then PRR
> will try to pull the cwnd down to 0.
> 
> Can you please leave ECN and Yeah enabled and run something like the
> following patch, to verify this conjecture? If the conjecture is
> right, then the tcp_yeah warning should fire but not the new
> tcp_cwnd_reduction() warning:
> 
> -----------
> diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c
> index 17d3566..ef60cba 100644
> --- a/net/ipv4/tcp_yeah.c
> +++ b/net/ipv4/tcp_yeah.c
> @@ -206,6 +206,7 @@ static u32 tcp_yeah_ssthresh(struct sock *sk)
>         const struct tcp_sock *tp = tcp_sk(sk);
>         struct yeah *yeah = inet_csk_ca(sk);
>         u32 reduction;
> +       s32 ssthresh;
> 
>         if (yeah->doing_reno_now < TCP_YEAH_RHO) {
>                 reduction = yeah->lastQ;
> @@ -219,7 +220,9 @@ static u32 tcp_yeah_ssthresh(struct sock *sk)
>         yeah->fast_count = 0;
>         yeah->reno_count = max(yeah->reno_count>>1, 2U);
> 
> -       return tp->snd_cwnd - reduction;
> +       ssthresh = tp->snd_cwnd - reduction;
> +       if (WARN_ON_ONCE(ssthresh <= 0))
> +               ssthresh = 1;
>  }
> 
>  static struct tcp_congestion_ops tcp_yeah __read_mostly = {
> -----------
> 
> If that works, then we may just want a version of this patch without
> the warning.
> 
> Thanks!
> neal
Neal Cardwell Jan. 10, 2016, 6 p.m. UTC | #2
On Sun, Jan 10, 2016 at 12:50 PM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Haven't you missed "return ssthresh;" statement?

Yes, sorry. You'd clearly need to return ssthresh as well. :-)

neal
Oleksandr Natalenko Jan. 10, 2016, 9:56 p.m. UTC | #3
OK, it seems the assumption about YeAH is correct. Here is stacktrace fired by 
WARN_ON_ONCE():

https://gist.github.com/851cedcfca60d6120035

Is there sufficient info for you to prepare upstream patch?

On неділя, 10 січня 2016 р. 12:29:17 EET Neal Cardwell wrote:
> On Sun, Jan 10, 2016 at 9:57 AM, Oleksandr Natalenko
> 
> <oleksandr@natalenko.name> wrote:
> > I use YeAH. But YeAH code wasn't touched between 4.2 and 4.3.
> 
> Oh, interesting. Looks like tcp_yeah_ssthresh() has a bug where its
> intended reduction can be bigger than tp->snd_cwnd, leading to it
> return a zero ssthresh (or even an ssthresh that underflows to ~4
> billion). If tcp_yeah_ssthresh() returns an ssthresh of 0 then PRR
> will try to pull the cwnd down to 0.
> 
> Can you please leave ECN and Yeah enabled and run something like the
> following patch, to verify this conjecture? If the conjecture is
> right, then the tcp_yeah warning should fire but not the new
> tcp_cwnd_reduction() warning:
> 
> -----------
> diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c
> index 17d3566..ef60cba 100644
> --- a/net/ipv4/tcp_yeah.c
> +++ b/net/ipv4/tcp_yeah.c
> @@ -206,6 +206,7 @@ static u32 tcp_yeah_ssthresh(struct sock *sk)
>         const struct tcp_sock *tp = tcp_sk(sk);
>         struct yeah *yeah = inet_csk_ca(sk);
>         u32 reduction;
> +       s32 ssthresh;
> 
>         if (yeah->doing_reno_now < TCP_YEAH_RHO) {
>                 reduction = yeah->lastQ;
> @@ -219,7 +220,9 @@ static u32 tcp_yeah_ssthresh(struct sock *sk)
>         yeah->fast_count = 0;
>         yeah->reno_count = max(yeah->reno_count>>1, 2U);
> 
> -       return tp->snd_cwnd - reduction;
> +       ssthresh = tp->snd_cwnd - reduction;
> +       if (WARN_ON_ONCE(ssthresh <= 0))
> +               ssthresh = 1;
>  }
> 
>  static struct tcp_congestion_ops tcp_yeah __read_mostly = {
> -----------
> 
> If that works, then we may just want a version of this patch without
> the warning.
> 
> Thanks!
> neal
Neal Cardwell Jan. 11, 2016, 6:47 p.m. UTC | #4
On Sun, Jan 10, 2016 at 4:56 PM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> OK, it seems the assumption about YeAH is correct. Here is stacktrace fired by
> WARN_ON_ONCE():
>
> https://gist.github.com/851cedcfca60d6120035
>
> Is there sufficient info for you to prepare upstream patch?

Great. Thanks for running that test! Yes, I think that's enough info.
We sent a proposed upstream patch here:

  http://patchwork.ozlabs.org/patch/566072/
  tcp_yeah: don't set ssthresh below 2

Feel free to add your "Tested-By" to the thread if you have a chance
to test that patch.

Thanks,
neal
Oleksandr Natalenko Jan. 11, 2016, 11:26 p.m. UTC | #5
Compiled and booted OK, no issues so far, but I'd prefer to give it a several 
days test.

Thanks!

On понеділок, 11 січня 2016 р. 13:47:08 EET Neal Cardwell wrote:
> Great. Thanks for running that test! Yes, I think that's enough info.
> We sent a proposed upstream patch here:
> 
>   http://patchwork.ozlabs.org/patch/566072/
>   tcp_yeah: don't set ssthresh below 2
> 
> Feel free to add your "Tested-By" to the thread if you have a chance
> to test that patch.
diff mbox

Patch

diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c
index 17d3566..ef60cba 100644
--- a/net/ipv4/tcp_yeah.c
+++ b/net/ipv4/tcp_yeah.c
@@ -206,6 +206,7 @@  static u32 tcp_yeah_ssthresh(struct sock *sk)
        const struct tcp_sock *tp = tcp_sk(sk);
        struct yeah *yeah = inet_csk_ca(sk);
        u32 reduction;
+       s32 ssthresh;

        if (yeah->doing_reno_now < TCP_YEAH_RHO) {
                reduction = yeah->lastQ;
@@ -219,7 +220,9 @@  static u32 tcp_yeah_ssthresh(struct sock *sk)
        yeah->fast_count = 0;
        yeah->reno_count = max(yeah->reno_count>>1, 2U);

-       return tp->snd_cwnd - reduction;
+       ssthresh = tp->snd_cwnd - reduction;
+       if (WARN_ON_ONCE(ssthresh <= 0))
+               ssthresh = 1;
 }

 static struct tcp_congestion_ops tcp_yeah __read_mostly = {