A buggy behavior for Linux TCP Reno and HTCP

Message ID: CADVnQymwOwWVr_Ln5+04pyiGWChf-K0XcXYP5s-+db+q1zGB9w@mail.gmail.com
State: RFC, archived
Delegated to: David Miller

Commit Message

Neal Cardwell July 24, 2017, 6:29 p.m. UTC
On Mon, Jul 24, 2017 at 2:17 PM, Yuchung Cheng <ycheng@google.com> wrote:
> On Sun, Jul 23, 2017 at 7:37 PM, Neal Cardwell <ncardwell@google.com> wrote:
>> On Sun, Jul 23, 2017 at 10:36 PM, Neal Cardwell <ncardwell@google.com> wrote:
...
>>> What if we call the field tp->prior_cwnd? Then at least we'd have some
>>> nice symmetry:
>>>
>>> - tp->snd_cwnd,  which is saved in tp->prior_cwnd  (and restored upon undo)
>>> - tp->snd_ssthresh,  which is saved in tp->prior_ssthresh  (and
>>> restored upon undo)
>>>
>>> That sounds appealing to me. WDYT?
>>
>> And, I should add, if we go with the tp->prior_cwnd approach, then we
>> can have a single "default"/CUBIC-style undo function, instead of 15
>> separate but identical implementations...
> you mean all CC modules share one ca_ops->undo_cwnd function? sounds like
> nice consolidation work.

Yes, exactly.

Right now we have 9 modules that have identical tcp_foo_cwnd_undo functions:

tcp_bic.c:188:  return max(tp->snd_cwnd, ca->loss_cwnd);
tcp_cubic.c:378:        return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
tcp_dctcp.c:318:        return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
tcp_highspeed.c:165:    return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
tcp_illinois.c:309:     return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
tcp_nv.c:190:   return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
tcp_scalable.c:50:      return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
tcp_veno.c:210: return max(tcp_sk(sk)->snd_cwnd, veno->loss_cwnd);
tcp_yeah.c:232: return max(tcp_sk(sk)->snd_cwnd, yeah->loss_cwnd);
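
All nine are essentially the same function; for reference, the CUBIC
instance looks roughly like this:

static u32 bictcp_undo_cwnd(struct sock *sk)
{
	struct bictcp *ca = inet_csk_ca(sk);

	/* revert to the cwnd saved when the last loss episode began */
	return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
}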

And if we fix this bug in tcp_reno_undo_cwnd() by referring to a
module-private ca->loss_cwnd, then we will add another 6 like this.

So my proposal would be:

- tp->snd_cwnd,  which is saved in tp->prior_cwnd  (and restored upon undo)
- tp->snd_ssthresh,  which is saved in tp->prior_ssthresh  (and
   restored upon undo)

Actually, now that I re-read the code, we already do have a
prior_cwnd, which is used by the PRR code and is already set upon
entering CA_Recovery. So if we set prior_cwnd for CA_Loss too, perhaps
we can do something like:

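diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index fde983f6376b..c2b174469645 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -456,7 +456,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk)
 {
        const struct tcp_sock *tp = tcp_sk(sk);

-       return max(tp->snd_cwnd, tp->snd_ssthresh << 1);
+       return max(tp->snd_cwnd, tp->prior_cwnd);
 }
 EXPORT_SYMBOL_GPL(tcp_reno_undo_cwnd);

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2920e0cb09f8..ae790a84302d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1951,6 +1951,7 @@ void tcp_enter_loss(struct sock *sk)
            !after(tp->high_seq, tp->snd_una) ||
            (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
                tp->prior_ssthresh = tcp_current_ssthresh(sk);
+               tp->prior_cwnd = tp->snd_cwnd;
                tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
                tcp_ca_event(sk, CA_EVENT_LOSS);
                tcp_init_undo(tp);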

And then change all the CC modules but BBR to use the
tcp_reno_undo_cwnd() instead of their own custom undo code.

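For one module the change could be as small as this sketch (taking
tcp_highspeed.c as the example, and assuming its private loss_cwnd
field and undo helper are removed):

static struct tcp_congestion_ops tcp_highspeed __read_mostly = {
	.init		= hstcp_init,
	.ssthresh	= hstcp_ssthresh,
	.undo_cwnd	= tcp_reno_undo_cwnd,	/* shared default undo */
	.cong_avoid	= hstcp_cong_avoid,
	.owner		= THIS_MODULE,
	.name		= "highspeed"
};
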
WDYT?

neal

Comments

Yuchung Cheng July 24, 2017, 11:41 p.m. UTC | #1
On Mon, Jul 24, 2017 at 11:29 AM, Neal Cardwell <ncardwell@google.com> wrote:
> ...
> WDYT?
Looks reasonable. But we might want to eventually refactor the TCP undo
code: the state changes (prior_ssthresh, prior_cwnd, undo_marker,
undo_retrans) are scattered across different helpers, making the code hard
to audit.
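
One possible shape for such a refactor, purely as a hypothetical sketch
(a tcp_save_undo_state() helper does not exist today), would be to
gather the bookkeeping in one place:

static void tcp_save_undo_state(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);

	tp->prior_ssthresh = tcp_current_ssthresh(sk);	/* restored on undo */
	tp->prior_cwnd = tp->snd_cwnd;			/* restored on undo */
	tcp_init_undo(tp);	/* sets undo_marker and undo_retrans */
}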

Stephen Hemminger July 25, 2017, 4:19 a.m. UTC | #2
On Mon, 24 Jul 2017 16:41:12 -0700
Yuchung Cheng <ycheng@google.com> wrote:

> On Mon, Jul 24, 2017 at 11:29 AM, Neal Cardwell <ncardwell@google.com> wrote:
> > ...
> Looks reasonable. But we might want to eventually refactor the TCP undo
> code: the state changes (prior_ssthresh, prior_cwnd, undo_marker,
> undo_retrans) are scattered across different helpers, making the code hard
> to audit.

I like having common code as much as possible; having per-CC undo
functions means more variation and more sources of error.

Patch

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index fde983f6376b..c2b174469645 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -456,7 +456,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk)
 {
        const struct tcp_sock *tp = tcp_sk(sk);

-       return max(tp->snd_cwnd, tp->snd_ssthresh << 1);
+       return max(tp->snd_cwnd, tp->prior_cwnd);
 }
 EXPORT_SYMBOL_GPL(tcp_reno_undo_cwnd);

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2920e0cb09f8..ae790a84302d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1951,6 +1951,7 @@ void tcp_enter_loss(struct sock *sk)
            !after(tp->high_seq, tp->snd_una) ||
            (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
                tp->prior_ssthresh = tcp_current_ssthresh(sk);
+               tp->prior_cwnd = tp->snd_cwnd;
                tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
                tcp_ca_event(sk, CA_EVENT_LOSS);
                tcp_init_undo(tp);
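
To see why the old expression misbehaves, note that snd_ssthresh << 1
only equals the pre-loss cwnd when ssthresh is still exactly half of
it. That does not hold for HTCP, whose adaptive beta is bounded between
0.5 and 0.8 in tcp_htcp.c. As an illustrative example: with a pre-loss
cwnd of 100 packets and an effective beta of 0.8, snd_ssthresh becomes
80, so snd_ssthresh << 1 would "restore" cwnd to 160, well above the
pre-loss value, while tp->prior_cwnd restores exactly 100.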