diff mbox series

[v3,net,2/5] tcp: prevent bogus FRTO undos with non-SACK flows

Message ID 1520936711-16784-3-git-send-email-ilpo.jarvinen@helsinki.fi
State Deferred, archived
Delegated to: David Miller
Headers show
Series tcp: fixes to non-SACK TCP | expand

Commit Message

Ilpo Järvinen March 13, 2018, 10:25 a.m. UTC
If SACK is not enabled and the first cumulative ACK after the RTO
retransmission covers more than the retransmitted skb, a spurious
FRTO undo will trigger (assuming FRTO is enabled for that RTO).
The reason is that any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case so the check for that bit won't help like it does with SACK).
Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
in tcp_process_loss.

We need to use more strict condition for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted
to prove that progress is due to original transmissions. Only then
keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
non-SACK case.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Yuchung Cheng March 28, 2018, 10:26 p.m. UTC | #1
On Tue, Mar 13, 2018 at 3:25 AM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:
>
> If SACK is not enabled and the first cumulative ACK after the RTO
> retransmission covers more than the retransmitted skb, a spurious
> FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> The reason is that any non-retransmitted segment acknowledged will
> set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> no indication that it would have been delivered for real (the
> scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> case so the check for that bit won't help like it does with SACK).
> Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> in tcp_process_loss.
>
> We need to use more strict condition for non-SACK case and check
> that none of the cumulatively ACKed segments were retransmitted
> to prove that progress is due to original transmissions. Only then
> keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> non-SACK case.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> ---
>  net/ipv4/tcp_input.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4a26c09..c60745c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3166,6 +3166,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
>                                 pkts_acked = rexmit_acked + newdata_acked;
>
>                         tcp_remove_reno_sacks(sk, pkts_acked);
> +
> +                       /* If any of the cumulatively ACKed segments was
> +                        * retransmitted, non-SACK case cannot confirm that
> +                        * progress was due to original transmission due to
> +                        * lack of TCPCB_SACKED_ACKED bits even if some of
> +                        * the packets may have been never retransmitted.
> +                        */
> +                       if (flag & FLAG_RETRANS_DATA_ACKED)
> +                               flag &= ~FLAG_ORIG_SACK_ACKED;

How about keeping your excellent comment but move the fix to F-RTO
code directly so it's more clear? this way the flag remains clear that
indicates some never-retransmitted data are acked/sacked.

// pseudo code for illustration

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8d480542aa07..f7f3357de618 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2629,8 +2629,15 @@ static void tcp_process_loss(struct sock *sk,
int flag, bool is_dupack,
        if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */
                /* Step 3.b. A timeout is spurious if not all data are
                 * lost, i.e., never-retransmitted data are (s)acked.
+                *
+                * If any of the cumulatively ACKed segments was
+                * retransmitted, non-SACK case cannot confirm that
+                * progress was due to original transmission due to
+                * lack of TCPCB_SACKED_ACKED bits even if some of
+                * the packets may have been never retransmitted.
                 */
                if ((flag & FLAG_ORIG_SACK_ACKED) &&
+                   (tcp_is_sack(tp) || !FLAG_RETRANS_DATA_ACKED) &&
                    tcp_try_undo_loss(sk, true))
                        return;





>                 } else {
>                         int delta;
>
> --
> 2.7.4
>
Ilpo Järvinen April 4, 2018, 10:35 a.m. UTC | #2
On Wed, 28 Mar 2018, Yuchung Cheng wrote:

> On Tue, Mar 13, 2018 at 3:25 AM, Ilpo Järvinen
> <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > If SACK is not enabled and the first cumulative ACK after the RTO
> > retransmission covers more than the retransmitted skb, a spurious
> > FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> > The reason is that any non-retransmitted segment acknowledged will
> > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > no indication that it would have been delivered for real (the
> > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > case so the check for that bit won't help like it does with SACK).
> > Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> > in tcp_process_loss.
> >
> > We need to use more strict condition for non-SACK case and check
> > that none of the cumulatively ACKed segments were retransmitted
> > to prove that progress is due to original transmissions. Only then
> > keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> > non-SACK case.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > ---
> >  net/ipv4/tcp_input.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 4a26c09..c60745c 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -3166,6 +3166,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
> >                                 pkts_acked = rexmit_acked + newdata_acked;
> >
> >                         tcp_remove_reno_sacks(sk, pkts_acked);
> > +
> > +                       /* If any of the cumulatively ACKed segments was
> > +                        * retransmitted, non-SACK case cannot confirm that
> > +                        * progress was due to original transmission due to
> > +                        * lack of TCPCB_SACKED_ACKED bits even if some of
> > +                        * the packets may have been never retransmitted.
> > +                        */
> > +                       if (flag & FLAG_RETRANS_DATA_ACKED)
> > +                               flag &= ~FLAG_ORIG_SACK_ACKED;
> 
> How about keeping your excellent comment but move the fix to F-RTO
> code directly so it's more clear? this way the flag remains clear that
> indicates some never-retransmitted data are acked/sacked.
> 
> // pseudo code for illustration
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 8d480542aa07..f7f3357de618 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2629,8 +2629,15 @@ static void tcp_process_loss(struct sock *sk,
> int flag, bool is_dupack,
>         if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */
>                 /* Step 3.b. A timeout is spurious if not all data are
>                  * lost, i.e., never-retransmitted data are (s)acked.
> +                *
> +                * If any of the cumulatively ACKed segments was
> +                * retransmitted, non-SACK case cannot confirm that
> +                * progress was due to original transmission due to
> +                * lack of TCPCB_SACKED_ACKED bits even if some of
> +                * the packets may have been never retransmitted.
>                  */
>                 if ((flag & FLAG_ORIG_SACK_ACKED) &&
> +                   (tcp_is_sack(tp) || !FLAG_RETRANS_DATA_ACKED) &&
>                     tcp_try_undo_loss(sk, true))
>                         return;

Of course I could put the back there but I really like the new place more 
(which was a result of your suggestion to place the code elsewhere).
IMHO, it makes more sense to have it in tcp_clean_rtx_queue() because we 
weren't successful in proving (there in tcp_clean_rtx_queue) that progress 
was due original transmission and thus I would not want falsely indicate 
it with that flag. And there's the non-SACK related block anyway already 
there so it keeps the non-SACK "pollution" off from the SACK code paths.

(In addition, I'd actually also like to rename FLAG_ORIG_SACK_ACKED to 
FLAG_ORIG_PROGRESS, the latter is more descriptive about the condition 
we're after regardless of SACK and less ambiguous in non-SACK case).
Neal Cardwell April 4, 2018, 4:33 p.m. UTC | #3
On Wed, Apr 4, 2018 at 6:35 AM Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
wrote:

> On Wed, 28 Mar 2018, Yuchung Cheng wrote:

> > On Tue, Mar 13, 2018 at 3:25 AM, Ilpo Järvinen
> > <ilpo.jarvinen@helsinki.fi> wrote:
> > >
> > > If SACK is not enabled and the first cumulative ACK after the RTO
> > > retransmission covers more than the retransmitted skb, a spurious
> > > FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> > > The reason is that any non-retransmitted segment acknowledged will
> > > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > > no indication that it would have been delivered for real (the
> > > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > > case so the check for that bit won't help like it does with SACK).
> > > Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> > > in tcp_process_loss.
> > >
> > > We need to use more strict condition for non-SACK case and check
> > > that none of the cumulatively ACKed segments were retransmitted
> > > to prove that progress is due to original transmissions. Only then
> > > keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> > > non-SACK case.
> > >
> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > > ---
> > >  net/ipv4/tcp_input.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 4a26c09..c60745c 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -3166,6 +3166,15 @@ static int tcp_clean_rtx_queue(struct sock
*sk, u32 prior_fack,
> > >                                 pkts_acked = rexmit_acked +
newdata_acked;
> > >
> > >                         tcp_remove_reno_sacks(sk, pkts_acked);
> > > +
> > > +                       /* If any of the cumulatively ACKed segments
was
> > > +                        * retransmitted, non-SACK case cannot
confirm that
> > > +                        * progress was due to original transmission
due to
> > > +                        * lack of TCPCB_SACKED_ACKED bits even if
some of
> > > +                        * the packets may have been never
retransmitted.
> > > +                        */
> > > +                       if (flag & FLAG_RETRANS_DATA_ACKED)
> > > +                               flag &= ~FLAG_ORIG_SACK_ACKED;

FWIW I'd vote for this version.

> Of course I could put the back there but I really like the new place more
> (which was a result of your suggestion to place the code elsewhere).
> IMHO, it makes more sense to have it in tcp_clean_rtx_queue() because we
> weren't successful in proving (there in tcp_clean_rtx_queue) that progress
> was due original transmission and thus I would not want falsely indicate
> it with that flag. And there's the non-SACK related block anyway already
> there so it keeps the non-SACK "pollution" off from the SACK code paths.

I think that's a compelling argument. In particular, in terms of long-term
maintenance it seems risky to allow an unsound/buggy FLAG_ORIG_SACK_ACKED
bit be returned by tcp_clean_rtx_queue(). If we return an
incorrect/imcomplete FLAG_ORIG_SACK_ACKED bit then I worry that one day we
will forget that for non-SACK flows that bit is incorrect/imcomplete, and
we will add code using that bit but forgetting to check (tcp_is_sack(tp) ||
!FLAG_RETRANS_DATA_ACKED).

> (In addition, I'd actually also like to rename FLAG_ORIG_SACK_ACKED to
> FLAG_ORIG_PROGRESS, the latter is more descriptive about the condition
> we're after regardless of SACK and less ambiguous in non-SACK case).

I'm neutral on this. Not sure if the extra clarity is worth the code churn.

cheers,
neal
Yuchung Cheng April 4, 2018, 5:12 p.m. UTC | #4
On Wed, Apr 4, 2018 at 9:33 AM, Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Apr 4, 2018 at 6:35 AM Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> wrote:
>
> > On Wed, 28 Mar 2018, Yuchung Cheng wrote:
>
> > > On Tue, Mar 13, 2018 at 3:25 AM, Ilpo Järvinen
> > > <ilpo.jarvinen@helsinki.fi> wrote:
> > > >
> > > > If SACK is not enabled and the first cumulative ACK after the RTO
> > > > retransmission covers more than the retransmitted skb, a spurious
> > > > FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> > > > The reason is that any non-retransmitted segment acknowledged will
> > > > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > > > no indication that it would have been delivered for real (the
> > > > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > > > case so the check for that bit won't help like it does with SACK).
> > > > Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
> > > > in tcp_process_loss.
> > > >
> > > > We need to use more strict condition for non-SACK case and check
> > > > that none of the cumulatively ACKed segments were retransmitted
> > > > to prove that progress is due to original transmissions. Only then
> > > > keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
> > > > non-SACK case.
> > > >
> > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > > > ---
> > > >  net/ipv4/tcp_input.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index 4a26c09..c60745c 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -3166,6 +3166,15 @@ static int tcp_clean_rtx_queue(struct sock
> *sk, u32 prior_fack,
> > > >                                 pkts_acked = rexmit_acked +
> newdata_acked;
> > > >
> > > >                         tcp_remove_reno_sacks(sk, pkts_acked);
> > > > +
> > > > +                       /* If any of the cumulatively ACKed segments
> was
> > > > +                        * retransmitted, non-SACK case cannot
> confirm that
> > > > +                        * progress was due to original transmission
> due to
> > > > +                        * lack of TCPCB_SACKED_ACKED bits even if
> some of
> > > > +                        * the packets may have been never
> retransmitted.
> > > > +                        */
> > > > +                       if (flag & FLAG_RETRANS_DATA_ACKED)
> > > > +                               flag &= ~FLAG_ORIG_SACK_ACKED;
>
> FWIW I'd vote for this version.
>
> > Of course I could put the back there but I really like the new place more
> > (which was a result of your suggestion to place the code elsewhere).
> > IMHO, it makes more sense to have it in tcp_clean_rtx_queue() because we
> > weren't successful in proving (there in tcp_clean_rtx_queue) that progress
> > was due original transmission and thus I would not want falsely indicate
> > it with that flag. And there's the non-SACK related block anyway already
> > there so it keeps the non-SACK "pollution" off from the SACK code paths.
>
> I think that's a compelling argument. In particular, in terms of long-term
> maintenance it seems risky to allow an unsound/buggy FLAG_ORIG_SACK_ACKED
> bit be returned by tcp_clean_rtx_queue(). If we return an
> incorrect/imcomplete FLAG_ORIG_SACK_ACKED bit then I worry that one day we
> will forget that for non-SACK flows that bit is incorrect/imcomplete, and
> we will add code using that bit but forgetting to check (tcp_is_sack(tp) ||
> !FLAG_RETRANS_DATA_ACKED).
Agreed. That's a good point. And I would much preferred to rename that
to FLAG_ORIG_PROGRESS (w/ updated comment).

so I think we're in agreement to use existing patch w/ the new name
FLAG_ORIG_PROGRESS

>
> > (In addition, I'd actually also like to rename FLAG_ORIG_SACK_ACKED to
> > FLAG_ORIG_PROGRESS, the latter is more descriptive about the condition
> > we're after regardless of SACK and less ambiguous in non-SACK case).
>
> I'm neutral on this. Not sure if the extra clarity is worth the code churn.
>
> cheers,
> neal
Neal Cardwell April 4, 2018, 5:22 p.m. UTC | #5
n Wed, Apr 4, 2018 at 1:13 PM Yuchung Cheng <ycheng@google.com> wrote:
> Agreed. That's a good point. And I would much preferred to rename that
> to FLAG_ORIG_PROGRESS (w/ updated comment).

> so I think we're in agreement to use existing patch w/ the new name
> FLAG_ORIG_PROGRESS

Yes, SGTM.

I guess this "prevent bogus FRTO undos" patch would go to "net" branch and
the s/FLAG_ORIG_SACK_ACKED/FLAG_ORIG_PROGRESS/ would go in "net-next"
branch?

neal
Yuchung Cheng April 4, 2018, 5:40 p.m. UTC | #6
On Wed, Apr 4, 2018 at 10:22 AM, Neal Cardwell <ncardwell@google.com> wrote:
> n Wed, Apr 4, 2018 at 1:13 PM Yuchung Cheng <ycheng@google.com> wrote:
>> Agreed. That's a good point. And I would much preferred to rename that
>> to FLAG_ORIG_PROGRESS (w/ updated comment).
>
>> so I think we're in agreement to use existing patch w/ the new name
>> FLAG_ORIG_PROGRESS
>
> Yes, SGTM.
>
> I guess this "prevent bogus FRTO undos" patch would go to "net" branch and
> the s/FLAG_ORIG_SACK_ACKED/FLAG_ORIG_PROGRESS/ would go in "net-next"
> branch?
huh? why not one patch ... this is getting close to patch-split paralyses.

>
> neal
Neal Cardwell April 4, 2018, 5:44 p.m. UTC | #7
On Wed, Apr 4, 2018 at 1:41 PM Yuchung Cheng <ycheng@google.com> wrote:

> On Wed, Apr 4, 2018 at 10:22 AM, Neal Cardwell <ncardwell@google.com>
wrote:
> > n Wed, Apr 4, 2018 at 1:13 PM Yuchung Cheng <ycheng@google.com> wrote:
> >> Agreed. That's a good point. And I would much preferred to rename that
> >> to FLAG_ORIG_PROGRESS (w/ updated comment).
> >
> >> so I think we're in agreement to use existing patch w/ the new name
> >> FLAG_ORIG_PROGRESS
> >
> > Yes, SGTM.
> >
> > I guess this "prevent bogus FRTO undos" patch would go to "net" branch
and
> > the s/FLAG_ORIG_SACK_ACKED/FLAG_ORIG_PROGRESS/ would go in "net-next"
> > branch?
> huh? why not one patch ... this is getting close to patch-split paralyses.

The flag rename seemed like a cosmetic issue that was not needed for the
fix. Smelled like net-next to me. But I don't feel strongly. However you
guys want to package it is fine with me. :-)

neal
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4a26c09..c60745c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3166,6 +3166,15 @@  static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
 				pkts_acked = rexmit_acked + newdata_acked;
 
 			tcp_remove_reno_sacks(sk, pkts_acked);
+
+			/* If any of the cumulatively ACKed segments was
+			 * retransmitted, non-SACK case cannot confirm that
+			 * progress was due to original transmission due to
+			 * lack of TCPCB_SACKED_ACKED bits even if some of
+			 * the packets may have been never retransmitted.
+			 */
+			if (flag & FLAG_RETRANS_DATA_ACKED)
+				flag &= ~FLAG_ORIG_SACK_ACKED;
 		} else {
 			int delta;