diff mbox

[net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission.

Message ID 1500971735-21852-1-git-send-email-maowenan@huawei.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

maowenan July 25, 2017, 8:35 a.m. UTC
If there is one TLP probe went out(TLP use the write_queue_tail packet
as TLP probe, we assume this first TLP probe named A), and this TLP
probe was not acked by receive side.

Then the transmit side sent the next two packetes out(named B,C), but
unfortunately these two packets are also not acked by receive side.

And then there is one data packet with ack_seq A arrive, in tcp_ack() 
will call tcp_schedule_loss_probe() to rearm PTO, the handler 
tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is 
one outstanding TLP named A,tp->tlp_high_seq is not zero),
so the new TLP probe can't be went out and need to rearm the RTO
timer(timeout is relative to the transmit time of the write queue head).

After this, another data packet with ack_seq A is received,
if the tlp_time_stamp is after rto_time_stamp, it will reset the
TLP timeout with delta value, which is before previous RTO timeout, 
so PTO is rearm and previous RTO is cleared. This TLP probe also can't 
be sent out because of tp->tlp_high_seq != 0, so there is no way(or need 
very long time)to retransmit the packet because of TLP A is lost.

This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe()
when TLP PTO is after RTO, It is no need to reschedule PTO when there
is one outstanding TLP retransmission, so if the TLP A is lost then RTO can
retransmit that packet, and tp->tlp_high_seq will be set to 0. After this TLP
will go to the normal work process.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 net/ipv4/tcp_output.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

maowenan July 25, 2017, 9:19 a.m. UTC | #1
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Mao Wenan
> Sent: Tuesday, July 25, 2017 4:36 PM
> To: netdev@vger.kernel.org; davem@davemloft.net; weiyongjun (A);
> Chenweilong
> Subject: [PATCH net-next] TLP: Don't reschedule PTO when there's one
> outstanding TLP retransmission.
> 
> If there is one TLP probe went out(TLP use the write_queue_tail packet as TLP
> probe, we assume this first TLP probe named A), and this TLP probe was not
> acked by receive side.
> 
> Then the transmit side sent the next two packetes out(named B,C), but
> unfortunately these two packets are also not acked by receive side.
> 
> And then there is one data packet with ack_seq A arrive, in tcp_ack() will call
> tcp_schedule_loss_probe() to rearm PTO, the handler
> tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one
> outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP probe
> can't be went out and need to rearm the RTO timer(timeout is relative to the
> transmit time of the write queue head).
> 
> After this, another data packet with ack_seq A is received, if the
> tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout with
> delta value, which is before previous RTO timeout, so PTO is rearm and
> previous RTO is cleared. This TLP probe also can't be sent out because of
> tp->tlp_high_seq != 0, so there is no way(or need very long time)to retransmit
> the packet because of TLP A is lost.
> 
> This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe()
> when TLP PTO is after RTO, It is no need to reschedule PTO when there is one
> outstanding TLP retransmission, so if the TLP A is lost then RTO can retransmit
> that packet, and tp->tlp_high_seq will be set to 0. After this TLP will go to the
> normal work process.
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  net/ipv4/tcp_output.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> 886d874..0c8da1c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  	tlp_time_stamp = tcp_jiffies32 + timeout;
>  	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
>  	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
> +		/*It is no need to reschedule PTO when there is one outstanding TLP
> retransmission*/
> +		if (tp->tlp_high_seq) {
> +			return false;
> +		}
>  		s32 delta = rto_time_stamp - tcp_jiffies32;
>  		if (delta > 0)
>  			timeout = delta;
> --
> 2.5.0
> 
Add ycheng@google.com; kuznet@ms2.inr.ac.ru; linux-kernel@vger.kernel.org in mail loop.
Sergei Shtylyov July 25, 2017, 9:54 a.m. UTC | #2
Hello!

On 7/25/2017 11:35 AM, Mao Wenan wrote:

> If there is one TLP probe went out(TLP use the write_queue_tail packet
> as TLP probe, we assume this first TLP probe named A), and this TLP
> probe was not acked by receive side.
> 
> Then the transmit side sent the next two packetes out(named B,C), but
> unfortunately these two packets are also not acked by receive side.
> 
> And then there is one data packet with ack_seq A arrive, in tcp_ack()
> will call tcp_schedule_loss_probe() to rearm PTO, the handler
> tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is
> one outstanding TLP named A,tp->tlp_high_seq is not zero),
> so the new TLP probe can't be went out and need to rearm the RTO
> timer(timeout is relative to the transmit time of the write queue head).
> 
> After this, another data packet with ack_seq A is received,
> if the tlp_time_stamp is after rto_time_stamp, it will reset the
> TLP timeout with delta value, which is before previous RTO timeout,
> so PTO is rearm and previous RTO is cleared. This TLP probe also can't
> be sent out because of tp->tlp_high_seq != 0, so there is no way(or need
> very long time)to retransmit the packet because of TLP A is lost.
> 
> This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe()
> when TLP PTO is after RTO, It is no need to reschedule PTO when there
> is one outstanding TLP retransmission, so if the TLP A is lost then RTO can
> retransmit that packet, and tp->tlp_high_seq will be set to 0. After this TLP
> will go to the normal work process.
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>   net/ipv4/tcp_output.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 886d874..0c8da1c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>   	tlp_time_stamp = tcp_jiffies32 + timeout;
>   	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
>   	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
> +		/*It is no need to reschedule PTO when there is one outstanding TLP retransmission*/

    Please add space after /* and before */

> +		if (tp->tlp_high_seq) {
> +			return false;
> +		}

    {} not needed.

[...]

MBR, Sergei
Neal Cardwell July 25, 2017, 1:29 p.m. UTC | #3
On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com> wrote:
> If there is one TLP probe went out(TLP use the write_queue_tail packet
> as TLP probe, we assume this first TLP probe named A), and this TLP
> probe was not acked by receive side.
>
> Then the transmit side sent the next two packetes out(named B,C), but
> unfortunately these two packets are also not acked by receive side.
>
> And then there is one data packet with ack_seq A arrive, in tcp_ack()
> will call tcp_schedule_loss_probe() to rearm PTO, the handler
> tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is
> one outstanding TLP named A,tp->tlp_high_seq is not zero),
> so the new TLP probe can't be went out and need to rearm the RTO
> timer(timeout is relative to the transmit time of the write queue head).
>
> After this, another data packet with ack_seq A is received,
> if the tlp_time_stamp is after rto_time_stamp, it will reset the
> TLP timeout with delta value, which is before previous RTO timeout,
> so PTO is rearm and previous RTO is cleared. This TLP probe also can't
> be sent out because of tp->tlp_high_seq != 0, so there is no way(or need
> very long time)to retransmit the packet because of TLP A is lost.
>
> This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe()
> when TLP PTO is after RTO, It is no need to reschedule PTO when there
> is one outstanding TLP retransmission, so if the TLP A is lost then RTO can
> retransmit that packet, and tp->tlp_high_seq will be set to 0. After this TLP
> will go to the normal work process.
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>

Thanks for posting this. This is a pretty involved scenario. To help
document/test precisely what the behavior is before and after your
patch, would you be able to post a packetdrill (
https://github.com/google/packetdrill ) test case for this scenario?

Can I ask if you saw this scenario in an actual trace, or noticed this
by inspection?

thanks,
neal
maowenan July 26, 2017, 1:26 a.m. UTC | #4
> -----Original Message-----

> From: Sergei Shtylyov [mailto:sergei.shtylyov@cogentembedded.com]

> Sent: Tuesday, July 25, 2017 5:55 PM

> To: maowenan; netdev@vger.kernel.org; davem@davemloft.net; weiyongjun

> (A); Chenweilong

> Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one

> outstanding TLP retransmission.

> 

> Hello!

> 

> On 7/25/2017 11:35 AM, Mao Wenan wrote:

> 

> > If there is one TLP probe went out(TLP use the write_queue_tail packet

> > as TLP probe, we assume this first TLP probe named A), and this TLP

> > probe was not acked by receive side.

> >

> > Then the transmit side sent the next two packetes out(named B,C), but

> > unfortunately these two packets are also not acked by receive side.

> >

> > And then there is one data packet with ack_seq A arrive, in tcp_ack()

> > will call tcp_schedule_loss_probe() to rearm PTO, the handler

> > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one

> > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP

> > probe can't be went out and need to rearm the RTO timer(timeout is

> > relative to the transmit time of the write queue head).

> >

> > After this, another data packet with ack_seq A is received, if the

> > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout

> > with delta value, which is before previous RTO timeout, so PTO is

> > rearm and previous RTO is cleared. This TLP probe also can't be sent

> > out because of tp->tlp_high_seq != 0, so there is no way(or need very

> > long time)to retransmit the packet because of TLP A is lost.

> >

> > This fix is not to pass the if(tp->tlp_high_seq) in

> > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to

> > reschedule PTO when there is one outstanding TLP retransmission, so if

> > the TLP A is lost then RTO can retransmit that packet, and

> > tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work

> process.

> >

> > Signed-off-by: Mao Wenan <maowenan@huawei.com>

> > ---

> >   net/ipv4/tcp_output.c | 4 ++++

> >   1 file changed, 4 insertions(+)

> >

> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index

> > 886d874..0c8da1c 100644

> > --- a/net/ipv4/tcp_output.c

> > +++ b/net/ipv4/tcp_output.c

> > @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk)

> >   	tlp_time_stamp = tcp_jiffies32 + timeout;

> >   	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;

> >   	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {

> > +		/*It is no need to reschedule PTO when there is one outstanding TLP

> > +retransmission*/

> 

>     Please add space after /* and before */

Ok, thank you.
> 

> > +		if (tp->tlp_high_seq) {

> > +			return false;

> > +		}

> 

>     {} not needed.

Ok, I will send the V2.

> 

> [...]

> 

> MBR, Sergei
maowenan July 26, 2017, 2:12 a.m. UTC | #5
> -----Original Message-----

> From: Neal Cardwell [mailto:ncardwell@google.com]

> Sent: Tuesday, July 25, 2017 9:30 PM

> To: maowenan

> Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung Cheng;

> Nandita Dukkipati

> Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one

> outstanding TLP retransmission.

> 

> On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com>

> wrote:

> > If there is one TLP probe went out(TLP use the write_queue_tail packet

> > as TLP probe, we assume this first TLP probe named A), and this TLP

> > probe was not acked by receive side.

> >

> > Then the transmit side sent the next two packetes out(named B,C), but

> > unfortunately these two packets are also not acked by receive side.

> >

> > And then there is one data packet with ack_seq A arrive, in tcp_ack()

> > will call tcp_schedule_loss_probe() to rearm PTO, the handler

> > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one

> > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP

> > probe can't be went out and need to rearm the RTO timer(timeout is

> > relative to the transmit time of the write queue head).

> >

> > After this, another data packet with ack_seq A is received, if the

> > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout

> > with delta value, which is before previous RTO timeout, so PTO is

> > rearm and previous RTO is cleared. This TLP probe also can't be sent

> > out because of tp->tlp_high_seq != 0, so there is no way(or need very

> > long time)to retransmit the packet because of TLP A is lost.

> >

> > This fix is not to pass the if(tp->tlp_high_seq) in

> > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to

> > reschedule PTO when there is one outstanding TLP retransmission, so if

> > the TLP A is lost then RTO can retransmit that packet, and

> > tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work

> process.

> >

> > Signed-off-by: Mao Wenan <maowenan@huawei.com>

> 

> Thanks for posting this. This is a pretty involved scenario. To help

> document/test precisely what the behavior is before and after your patch,

> would you be able to post a packetdrill ( https://github.com/google/packetdrill )

> test case for this scenario?

> 

> Can I ask if you saw this scenario in an actual trace, or noticed this by

> inspection?

[Mao Wenan] It is my actual product scenario, from some debug trace info 
I found that PTO is always rescheduled before RTO, this is PTO is trigged 
by receiving one tcp_ack packets. After this patch, the product works well and 
there is no previous issue happen. The packet can retransmit with RTO if TLP probe 
is lost. 
I will say sorry that my local test can't reproduce because i can't build successfully the same 
complex scenario, I don't have mature test case to reproduce now but I will do my best to try
in local test environment. 
Thank you.
> 

> thanks,

> neal
kernel test robot July 26, 2017, 9:35 a.m. UTC | #6
Hi Mao,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Mao-Wenan/TLP-Don-t-reschedule-PTO-when-there-s-one-outstanding-TLP-retransmission/20170726-172222
config: x86_64-randconfig-x000-201730 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net//ipv4/tcp_output.c: In function 'tcp_schedule_loss_probe':
>> net//ipv4/tcp_output.c:2430:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
      s32 delta = rto_time_stamp - tcp_jiffies32;
      ^~~

vim +2430 net//ipv4/tcp_output.c

6ba8a3b1 Nandita Dukkipati 2013-03-11  2374  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2375  bool tcp_schedule_loss_probe(struct sock *sk)
6ba8a3b1 Nandita Dukkipati 2013-03-11  2376  {
6ba8a3b1 Nandita Dukkipati 2013-03-11  2377  	struct inet_connection_sock *icsk = inet_csk(sk);
6ba8a3b1 Nandita Dukkipati 2013-03-11  2378  	struct tcp_sock *tp = tcp_sk(sk);
6ba8a3b1 Nandita Dukkipati 2013-03-11  2379  	u32 timeout, tlp_time_stamp, rto_time_stamp;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2380  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2381  	/* No consecutive loss probes. */
6ba8a3b1 Nandita Dukkipati 2013-03-11  2382  	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
6ba8a3b1 Nandita Dukkipati 2013-03-11  2383  		tcp_rearm_rto(sk);
6ba8a3b1 Nandita Dukkipati 2013-03-11  2384  		return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2385  	}
6ba8a3b1 Nandita Dukkipati 2013-03-11  2386  	/* Don't do any loss probe on a Fast Open connection before 3WHS
6ba8a3b1 Nandita Dukkipati 2013-03-11  2387  	 * finishes.
6ba8a3b1 Nandita Dukkipati 2013-03-11  2388  	 */
f9b99582 Yuchung Cheng     2015-09-18  2389  	if (tp->fastopen_rsk)
6ba8a3b1 Nandita Dukkipati 2013-03-11  2390  		return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2391  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2392  	/* TLP is only scheduled when next timer event is RTO. */
6ba8a3b1 Nandita Dukkipati 2013-03-11  2393  	if (icsk->icsk_pending != ICSK_TIME_RETRANS)
6ba8a3b1 Nandita Dukkipati 2013-03-11  2394  		return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2395  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2396  	/* Schedule a loss probe in 2*RTT for SACK capable connections
6ba8a3b1 Nandita Dukkipati 2013-03-11  2397  	 * in Open state, that are either limited by cwnd or application.
6ba8a3b1 Nandita Dukkipati 2013-03-11  2398  	 */
bec41a11 Yuchung Cheng     2017-01-12  2399  	if ((sysctl_tcp_early_retrans != 3 && sysctl_tcp_early_retrans != 4) ||
bec41a11 Yuchung Cheng     2017-01-12  2400  	    !tp->packets_out || !tcp_is_sack(tp) ||
bec41a11 Yuchung Cheng     2017-01-12  2401  	    icsk->icsk_ca_state != TCP_CA_Open)
6ba8a3b1 Nandita Dukkipati 2013-03-11  2402  		return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2403  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2404  	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&
6ba8a3b1 Nandita Dukkipati 2013-03-11  2405  	     tcp_send_head(sk))
6ba8a3b1 Nandita Dukkipati 2013-03-11  2406  		return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2407  
bb4d991a Yuchung Cheng     2017-07-19  2408  	/* Probe timeout is 2*rtt. Add minimum RTO to account
f9b99582 Yuchung Cheng     2015-09-18  2409  	 * for delayed ack when there's one outstanding packet. If no RTT
f9b99582 Yuchung Cheng     2015-09-18  2410  	 * sample is available then probe after TCP_TIMEOUT_INIT.
6ba8a3b1 Nandita Dukkipati 2013-03-11  2411  	 */
bb4d991a Yuchung Cheng     2017-07-19  2412  	if (tp->srtt_us) {
bb4d991a Yuchung Cheng     2017-07-19  2413  		timeout = usecs_to_jiffies(tp->srtt_us >> 2);
6ba8a3b1 Nandita Dukkipati 2013-03-11  2414  		if (tp->packets_out == 1)
bb4d991a Yuchung Cheng     2017-07-19  2415  			timeout += TCP_RTO_MIN;
bb4d991a Yuchung Cheng     2017-07-19  2416  		else
bb4d991a Yuchung Cheng     2017-07-19  2417  			timeout += TCP_TIMEOUT_MIN;
bb4d991a Yuchung Cheng     2017-07-19  2418  	} else {
bb4d991a Yuchung Cheng     2017-07-19  2419  		timeout = TCP_TIMEOUT_INIT;
bb4d991a Yuchung Cheng     2017-07-19  2420  	}
6ba8a3b1 Nandita Dukkipati 2013-03-11  2421  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2422  	/* If RTO is shorter, just schedule TLP in its place. */
ac9517fc Eric Dumazet      2017-05-16  2423  	tlp_time_stamp = tcp_jiffies32 + timeout;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2424  	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2425  	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
e41572cd Mao Wenan         2017-07-25  2426  		/*It is no need to reschedule PTO when there is one outstanding TLP retransmission*/
e41572cd Mao Wenan         2017-07-25  2427  		if (tp->tlp_high_seq) {
e41572cd Mao Wenan         2017-07-25  2428  			return false;
e41572cd Mao Wenan         2017-07-25  2429  		}
ac9517fc Eric Dumazet      2017-05-16 @2430  		s32 delta = rto_time_stamp - tcp_jiffies32;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2431  		if (delta > 0)
6ba8a3b1 Nandita Dukkipati 2013-03-11  2432  			timeout = delta;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2433  	}
6ba8a3b1 Nandita Dukkipati 2013-03-11  2434  
6ba8a3b1 Nandita Dukkipati 2013-03-11  2435  	inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
6ba8a3b1 Nandita Dukkipati 2013-03-11  2436  				  TCP_RTO_MAX);
6ba8a3b1 Nandita Dukkipati 2013-03-11  2437  	return true;
6ba8a3b1 Nandita Dukkipati 2013-03-11  2438  }
6ba8a3b1 Nandita Dukkipati 2013-03-11  2439  

:::::: The code at line 2430 was first introduced by commit
:::::: ac9517fcf310327fa3e3b0d8366e4b11236b1b4b tcp: replace misc tcp_time_stamp to tcp_jiffies32

:::::: TO: Eric Dumazet <edumazet@google.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Yuchung Cheng July 26, 2017, 4:45 p.m. UTC | #7
On Tue, Jul 25, 2017 at 7:12 PM, maowenan <maowenan@huawei.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Neal Cardwell [mailto:ncardwell@google.com]
> > Sent: Tuesday, July 25, 2017 9:30 PM
> > To: maowenan
> > Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung Cheng;
> > Nandita Dukkipati
> > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one
> > outstanding TLP retransmission.
> >
> > On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com>
> > wrote:
> > > If there is one TLP probe went out(TLP use the write_queue_tail packet
> > > as TLP probe, we assume this first TLP probe named A), and this TLP
> > > probe was not acked by receive side.
> > >
> > > Then the transmit side sent the next two packetes out(named B,C), but
> > > unfortunately these two packets are also not acked by receive side.
> > >
> > > And then there is one data packet with ack_seq A arrive, in tcp_ack()
> > > will call tcp_schedule_loss_probe() to rearm PTO, the handler
> > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one
> > > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP
> > > probe can't be went out and need to rearm the RTO timer(timeout is
> > > relative to the transmit time of the write queue head).
> > >
> > > After this, another data packet with ack_seq A is received, if the
> > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout
> > > with delta value, which is before previous RTO timeout, so PTO is
> > > rearm and previous RTO is cleared. This TLP probe also can't be sent
> > > out because of tp->tlp_high_seq != 0, so there is no way(or need very
> > > long time)to retransmit the packet because of TLP A is lost.
> > >
> > > This fix is not to pass the if(tp->tlp_high_seq) in
> > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to
> > > reschedule PTO when there is one outstanding TLP retransmission, so if
> > > the TLP A is lost then RTO can retransmit that packet, and
> > > tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work
> > process.
> > >
> > > Signed-off-by: Mao Wenan <maowenan@huawei.com>
> >
> > Thanks for posting this. This is a pretty involved scenario. To help
> > document/test precisely what the behavior is before and after your patch,
> > would you be able to post a packetdrill ( https://github.com/google/packetdrill )
> > test case for this scenario?
> >
> > Can I ask if you saw this scenario in an actual trace, or noticed this by
> > inspection?
> [Mao Wenan] It is my actual product scenario, from some debug trace info
> I found that PTO is always rescheduled before RTO, this is PTO is trigged
> by receiving one tcp_ack packets. After this patch, the product works well and
> there is no previous issue happen. The packet can retransmit with RTO if TLP probe
> is lost.
> I will say sorry that my local test can't reproduce because i can't build successfully the same
> complex scenario, I don't have mature test case to reproduce now but I will do my best to try
> in local test environment.
tcpdump output of an example trace also helps. Feel free to remove IP
address, ports, or payload information. We only need the timing and
TCP header info.


> Thank you.
> >
> > thanks,
> > neal
maowenan July 27, 2017, 1:28 a.m. UTC | #8
> -----Original Message-----

> From: Yuchung Cheng [mailto:ycheng@google.com]

> Sent: Thursday, July 27, 2017 12:45 AM

> To: maowenan

> Cc: Neal Cardwell; Netdev; David Miller; weiyongjun (A); Chenweilong; Nandita

> Dukkipati; Wangkefeng (Kevin)

> Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one

> outstanding TLP retransmission.

> 

> On Tue, Jul 25, 2017 at 7:12 PM, maowenan <maowenan@huawei.com>

> wrote:

> >

> >

> >

> > > -----Original Message-----

> > > From: Neal Cardwell [mailto:ncardwell@google.com]

> > > Sent: Tuesday, July 25, 2017 9:30 PM

> > > To: maowenan

> > > Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung

> > > Cheng; Nandita Dukkipati

> > > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's

> > > one outstanding TLP retransmission.

> > >

> > > On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com>

> > > wrote:

> > > > If there is one TLP probe went out(TLP use the write_queue_tail

> > > > packet as TLP probe, we assume this first TLP probe named A), and

> > > > this TLP probe was not acked by receive side.

> > > >

> > > > Then the transmit side sent the next two packetes out(named B,C),

> > > > but unfortunately these two packets are also not acked by receive side.

> > > >

> > > > And then there is one data packet with ack_seq A arrive, in

> > > > tcp_ack() will call tcp_schedule_loss_probe() to rearm PTO, the

> > > > handler

> > > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is

> > > > one outstanding TLP named A,tp->tlp_high_seq is not zero), so the

> > > > new TLP probe can't be went out and need to rearm the RTO

> > > > timer(timeout is relative to the transmit time of the write queue head).

> > > >

> > > > After this, another data packet with ack_seq A is received, if the

> > > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP

> > > > timeout with delta value, which is before previous RTO timeout, so

> > > > PTO is rearm and previous RTO is cleared. This TLP probe also

> > > > can't be sent out because of tp->tlp_high_seq != 0, so there is no

> > > > way(or need very long time)to retransmit the packet because of TLP A is

> lost.

> > > >

> > > > This fix is not to pass the if(tp->tlp_high_seq) in

> > > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need

> > > > to reschedule PTO when there is one outstanding TLP

> > > > retransmission, so if the TLP A is lost then RTO can retransmit

> > > > that packet, and

> > > > tp->tlp_high_seq will be set to 0. After this TLP will go to the

> > > > tp->normal work

> > > process.

> > > >

> > > > Signed-off-by: Mao Wenan <maowenan@huawei.com>

> > >

> > > Thanks for posting this. This is a pretty involved scenario. To help

> > > document/test precisely what the behavior is before and after your

> > > patch, would you be able to post a packetdrill (

> > > https://github.com/google/packetdrill ) test case for this scenario?

> > >

> > > Can I ask if you saw this scenario in an actual trace, or noticed

> > > this by inspection?

> > [Mao Wenan] It is my actual product scenario, from some debug trace

> > info I found that PTO is always rescheduled before RTO, this is PTO is

> > trigged by receiving one tcp_ack packets. After this patch, the

> > product works well and there is no previous issue happen. The packet

> > can retransmit with RTO if TLP probe is lost.

> > I will say sorry that my local test can't reproduce because i can't

> > build successfully the same complex scenario, I don't have mature test

> > case to reproduce now but I will do my best to try in local test environment.

> tcpdump output of an example trace also helps. Feel free to remove IP address,

> ports, or payload information. We only need the timing and TCP header info.

Ok, please check as below. Server: 192.169.0.18, client: 192.169.0.17
......
20:36:12.239562 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216544:501216616(72) ack 2308880351 win 92
20:36:12.240034 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216616:501216688(72) ack 2308880351 win 92
20:36:12.260764 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216616:501216688(72) ack 2308880351 win 92     /* first TLP probe A, seq: 501216616 */ 
20:36:13.018020 IP 192.169.0.17.56246 > 192.169.0.18.1234: . ack 501216616 win 1970                         
20:36:13.018386 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216688:501216760(72) ack 2308880351 win 92     /* server send packet B */
20:36:13.022448 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216760:501216832(72) ack 2308880351 win 92     /* server send packet C */
20:36:13.033813 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880351:2308880437(86) ack 501216616 win 1970  /* receive client data packet ack 501216616 */
20:36:13.034213 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880437:2308880523(86) ack 501216616 win 1970  /* receive client data packet ack 501216616 */
20:36:13.034633 IP 192.169.0.18.1234 > 192.169.0.17.56246: . ack 2308880523 win 92                            /* server pure ack client */
20:36:13.039512 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880523:2308880659(136) ack 501216616 win 1970
20:36:13.040083 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880659:2308880795(136) ack 501216616 win 1970
20:36:13.040491 IP 192.169.0.18.1234 > 192.169.0.17.56246: . ack 2308880795 win 92
20:36:13.041161 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880795:2308880931(136) ack 501216616 win 1970
20:36:13.042828 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880931:2308881032(101) ack 501216616 win 1970
20:36:13.043198 IP 192.169.0.18.1234 > 192.169.0.17.56246: . ack 2308881032 win 92
......
> 

> 

> > Thank you.

> > >

> > > thanks,

> > > neal
Neal Cardwell July 28, 2017, 10:57 p.m. UTC | #9
On Wed, Jul 26, 2017 at 9:28 PM, maowenan <maowenan@huawei.com> wrote:
> Ok, please check as below. Server: 192.169.0.18, client: 192.169.0.17
> ......

Thanks for this trace! Here is a condensed version that I believe
should be equivalent, for those who are working on parsing what's
going on:

2.239562 srv > cli: P 544:616(72) ack 351 win 92
2.240034 srv > cli: P 616:688(72) ack 351 win 92
2.260764 srv > cli: P 616:688(72) ack 351 win 92    # TLP
3.018020 cli > srv: . ack 616 win 1970
3.018386 srv > cli: P 688:760(72) ack 351 win 92    # srv sends B
3.022448 srv > cli: P 760:832(72) ack 351 win 92    # srv sends C
3.033813 cli > srv: P 351:0437(86) ack 616 win 1970 # data/ACK 616
3.034213 cli > srv: P 437:0523(86) ack 616 win 1970 # data/ACK 616
3.034633 srv > cli: . ack 523 win 92
3.039512 cli > srv: P 523:0659(136) ack 616 win 1970
3.040083 cli > srv: P 659:0795(136) ack 616 win 1970
3.040491 srv > cli: . ack 795 win 92
3.041161 cli > srv: P 795:0931(136) ack 616 win 1970
3.042828 cli > srv: P 931:1032(101) ack 616 win 1970
3.043198 srv > cli: . ack 1032 win 92

For the record, I posted my diagnosis of this issue in your thread for
the v2 version of the patch.

thanks,
neal
diff mbox

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 886d874..0c8da1c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2423,6 +2423,10 @@  bool tcp_schedule_loss_probe(struct sock *sk)
 	tlp_time_stamp = tcp_jiffies32 + timeout;
 	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
 	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
+		/*It is no need to reschedule PTO when there is one outstanding TLP retransmission*/
+		if (tp->tlp_high_seq) {
+			return false;
+		}
 		s32 delta = rto_time_stamp - tcp_jiffies32;
 		if (delta > 0)
 			timeout = delta;