Message ID | 1500971735-21852-1-git-send-email-maowenan@huawei.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
> -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] > On Behalf Of Mao Wenan > Sent: Tuesday, July 25, 2017 4:36 PM > To: netdev@vger.kernel.org; davem@davemloft.net; weiyongjun (A); > Chenweilong > Subject: [PATCH net-next] TLP: Don't reschedule PTO when there's one > outstanding TLP retransmission. > > If there is one TLP probe went out(TLP use the write_queue_tail packet as TLP > probe, we assume this first TLP probe named A), and this TLP probe was not > acked by receive side. > > Then the transmit side sent the next two packetes out(named B,C), but > unfortunately these two packets are also not acked by receive side. > > And then there is one data packet with ack_seq A arrive, in tcp_ack() will call > tcp_schedule_loss_probe() to rearm PTO, the handler > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP probe > can't be went out and need to rearm the RTO timer(timeout is relative to the > transmit time of the write queue head). > > After this, another data packet with ack_seq A is received, if the > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout with > delta value, which is before previous RTO timeout, so PTO is rearm and > previous RTO is cleared. This TLP probe also can't be sent out because of > tp->tlp_high_seq != 0, so there is no way(or need very long time)to retransmit > the packet because of TLP A is lost. > > This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe() > when TLP PTO is after RTO, It is no need to reschedule PTO when there is one > outstanding TLP retransmission, so if the TLP A is lost then RTO can retransmit > that packet, and tp->tlp_high_seq will be set to 0. After this TLP will go to the > normal work process. > > Signed-off-by: Mao Wenan <maowenan@huawei.com> > --- > net/ipv4/tcp_output.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index > 886d874..0c8da1c 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk) > tlp_time_stamp = tcp_jiffies32 + timeout; > rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout; > if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) { > + /*It is no need to reschedule PTO when there is one outstanding TLP > retransmission*/ > + if (tp->tlp_high_seq) { > + return false; > + } > s32 delta = rto_time_stamp - tcp_jiffies32; > if (delta > 0) > timeout = delta; > -- > 2.5.0 > Add ycheng@google.com; kuznet@ms2.inr.ac.ru; linux-kernel@vger.kernel.org in mail loop.
Hello! On 7/25/2017 11:35 AM, Mao Wenan wrote: > If there is one TLP probe went out(TLP use the write_queue_tail packet > as TLP probe, we assume this first TLP probe named A), and this TLP > probe was not acked by receive side. > > Then the transmit side sent the next two packetes out(named B,C), but > unfortunately these two packets are also not acked by receive side. > > And then there is one data packet with ack_seq A arrive, in tcp_ack() > will call tcp_schedule_loss_probe() to rearm PTO, the handler > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is > one outstanding TLP named A,tp->tlp_high_seq is not zero), > so the new TLP probe can't be went out and need to rearm the RTO > timer(timeout is relative to the transmit time of the write queue head). > > After this, another data packet with ack_seq A is received, > if the tlp_time_stamp is after rto_time_stamp, it will reset the > TLP timeout with delta value, which is before previous RTO timeout, > so PTO is rearm and previous RTO is cleared. This TLP probe also can't > be sent out because of tp->tlp_high_seq != 0, so there is no way(or need > very long time)to retransmit the packet because of TLP A is lost. > > This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe() > when TLP PTO is after RTO, It is no need to reschedule PTO when there > is one outstanding TLP retransmission, so if the TLP A is lost then RTO can > retransmit that packet, and tp->tlp_high_seq will be set to 0. After this TLP > will go to the normal work process. > > Signed-off-by: Mao Wenan <maowenan@huawei.com> > --- > net/ipv4/tcp_output.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 886d874..0c8da1c 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk) > tlp_time_stamp = tcp_jiffies32 + timeout; > rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout; > if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) { > + /*It is no need to reschedule PTO when there is one outstanding TLP retransmission*/ Please add space after /* and before */ > + if (tp->tlp_high_seq) { > + return false; > + } {} not needed. [...] MBR, Sergei
On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com> wrote: > If there is one TLP probe went out(TLP use the write_queue_tail packet > as TLP probe, we assume this first TLP probe named A), and this TLP > probe was not acked by receive side. > > Then the transmit side sent the next two packetes out(named B,C), but > unfortunately these two packets are also not acked by receive side. > > And then there is one data packet with ack_seq A arrive, in tcp_ack() > will call tcp_schedule_loss_probe() to rearm PTO, the handler > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is > one outstanding TLP named A,tp->tlp_high_seq is not zero), > so the new TLP probe can't be went out and need to rearm the RTO > timer(timeout is relative to the transmit time of the write queue head). > > After this, another data packet with ack_seq A is received, > if the tlp_time_stamp is after rto_time_stamp, it will reset the > TLP timeout with delta value, which is before previous RTO timeout, > so PTO is rearm and previous RTO is cleared. This TLP probe also can't > be sent out because of tp->tlp_high_seq != 0, so there is no way(or need > very long time)to retransmit the packet because of TLP A is lost. > > This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe() > when TLP PTO is after RTO, It is no need to reschedule PTO when there > is one outstanding TLP retransmission, so if the TLP A is lost then RTO can > retransmit that packet, and tp->tlp_high_seq will be set to 0. After this TLP > will go to the normal work process. > > Signed-off-by: Mao Wenan <maowenan@huawei.com> Thanks for posting this. This is a pretty involved scenario. To help document/test precisely what the behavior is before and after your patch, would you be able to post a packetdrill ( https://github.com/google/packetdrill ) test case for this scenario? Can I ask if you saw this scenario in an actual trace, or noticed this by inspection? thanks, neal
> -----Original Message----- > From: Sergei Shtylyov [mailto:sergei.shtylyov@cogentembedded.com] > Sent: Tuesday, July 25, 2017 5:55 PM > To: maowenan; netdev@vger.kernel.org; davem@davemloft.net; weiyongjun > (A); Chenweilong > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one > outstanding TLP retransmission. > > Hello! > > On 7/25/2017 11:35 AM, Mao Wenan wrote: > > > If there is one TLP probe went out(TLP use the write_queue_tail packet > > as TLP probe, we assume this first TLP probe named A), and this TLP > > probe was not acked by receive side. > > > > Then the transmit side sent the next two packetes out(named B,C), but > > unfortunately these two packets are also not acked by receive side. > > > > And then there is one data packet with ack_seq A arrive, in tcp_ack() > > will call tcp_schedule_loss_probe() to rearm PTO, the handler > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one > > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP > > probe can't be went out and need to rearm the RTO timer(timeout is > > relative to the transmit time of the write queue head). > > > > After this, another data packet with ack_seq A is received, if the > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout > > with delta value, which is before previous RTO timeout, so PTO is > > rearm and previous RTO is cleared. This TLP probe also can't be sent > > out because of tp->tlp_high_seq != 0, so there is no way(or need very > > long time)to retransmit the packet because of TLP A is lost. > > > > This fix is not to pass the if(tp->tlp_high_seq) in > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to > > reschedule PTO when there is one outstanding TLP retransmission, so if > > the TLP A is lost then RTO can retransmit that packet, and > > tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work > process. > > > > Signed-off-by: Mao Wenan <maowenan@huawei.com> > > --- > > net/ipv4/tcp_output.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index > > 886d874..0c8da1c 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk) > > tlp_time_stamp = tcp_jiffies32 + timeout; > > rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout; > > if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) { > > + /*It is no need to reschedule PTO when there is one outstanding TLP > > +retransmission*/ > > Please add space after /* and before */ Ok, thank you. > > > + if (tp->tlp_high_seq) { > > + return false; > > + } > > {} not needed. Ok, I will send the V2. > > [...] > > MBR, Sergei
> -----Original Message----- > From: Neal Cardwell [mailto:ncardwell@google.com] > Sent: Tuesday, July 25, 2017 9:30 PM > To: maowenan > Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung Cheng; > Nandita Dukkipati > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one > outstanding TLP retransmission. > > On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com> > wrote: > > If there is one TLP probe went out(TLP use the write_queue_tail packet > > as TLP probe, we assume this first TLP probe named A), and this TLP > > probe was not acked by receive side. > > > > Then the transmit side sent the next two packetes out(named B,C), but > > unfortunately these two packets are also not acked by receive side. > > > > And then there is one data packet with ack_seq A arrive, in tcp_ack() > > will call tcp_schedule_loss_probe() to rearm PTO, the handler > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one > > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP > > probe can't be went out and need to rearm the RTO timer(timeout is > > relative to the transmit time of the write queue head). > > > > After this, another data packet with ack_seq A is received, if the > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout > > with delta value, which is before previous RTO timeout, so PTO is > > rearm and previous RTO is cleared. This TLP probe also can't be sent > > out because of tp->tlp_high_seq != 0, so there is no way(or need very > > long time)to retransmit the packet because of TLP A is lost. > > > > This fix is not to pass the if(tp->tlp_high_seq) in > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to > > reschedule PTO when there is one outstanding TLP retransmission, so if > > the TLP A is lost then RTO can retransmit that packet, and > > tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work > process. > > > > Signed-off-by: Mao Wenan <maowenan@huawei.com> > > Thanks for posting this. This is a pretty involved scenario. To help > document/test precisely what the behavior is before and after your patch, > would you be able to post a packetdrill ( https://github.com/google/packetdrill ) > test case for this scenario? > > Can I ask if you saw this scenario in an actual trace, or noticed this by > inspection? [Mao Wenan] It is my actual product scenario, from some debug trace info I found that PTO is always rescheduled before RTO, this is PTO is trigged by receiving one tcp_ack packets. After this patch, the product works well and there is no previous issue happen. The packet can retransmit with RTO if TLP probe is lost. I will say sorry that my local test can't reproduce because i can't build successfully the same complex scenario, I don't have mature test case to reproduce now but I will do my best to try in local test environment. Thank you. > > thanks, > neal
Hi Mao,
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Mao-Wenan/TLP-Don-t-reschedule-PTO-when-there-s-one-outstanding-TLP-retransmission/20170726-172222
config: x86_64-randconfig-x000-201730 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
net//ipv4/tcp_output.c: In function 'tcp_schedule_loss_probe':
>> net//ipv4/tcp_output.c:2430:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
s32 delta = rto_time_stamp - tcp_jiffies32;
^~~
vim +2430 net//ipv4/tcp_output.c
6ba8a3b1 Nandita Dukkipati 2013-03-11 2374
6ba8a3b1 Nandita Dukkipati 2013-03-11 2375 bool tcp_schedule_loss_probe(struct sock *sk)
6ba8a3b1 Nandita Dukkipati 2013-03-11 2376 {
6ba8a3b1 Nandita Dukkipati 2013-03-11 2377 struct inet_connection_sock *icsk = inet_csk(sk);
6ba8a3b1 Nandita Dukkipati 2013-03-11 2378 struct tcp_sock *tp = tcp_sk(sk);
6ba8a3b1 Nandita Dukkipati 2013-03-11 2379 u32 timeout, tlp_time_stamp, rto_time_stamp;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2380
6ba8a3b1 Nandita Dukkipati 2013-03-11 2381 /* No consecutive loss probes. */
6ba8a3b1 Nandita Dukkipati 2013-03-11 2382 if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
6ba8a3b1 Nandita Dukkipati 2013-03-11 2383 tcp_rearm_rto(sk);
6ba8a3b1 Nandita Dukkipati 2013-03-11 2384 return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2385 }
6ba8a3b1 Nandita Dukkipati 2013-03-11 2386 /* Don't do any loss probe on a Fast Open connection before 3WHS
6ba8a3b1 Nandita Dukkipati 2013-03-11 2387 * finishes.
6ba8a3b1 Nandita Dukkipati 2013-03-11 2388 */
f9b99582 Yuchung Cheng 2015-09-18 2389 if (tp->fastopen_rsk)
6ba8a3b1 Nandita Dukkipati 2013-03-11 2390 return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2391
6ba8a3b1 Nandita Dukkipati 2013-03-11 2392 /* TLP is only scheduled when next timer event is RTO. */
6ba8a3b1 Nandita Dukkipati 2013-03-11 2393 if (icsk->icsk_pending != ICSK_TIME_RETRANS)
6ba8a3b1 Nandita Dukkipati 2013-03-11 2394 return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2395
6ba8a3b1 Nandita Dukkipati 2013-03-11 2396 /* Schedule a loss probe in 2*RTT for SACK capable connections
6ba8a3b1 Nandita Dukkipati 2013-03-11 2397 * in Open state, that are either limited by cwnd or application.
6ba8a3b1 Nandita Dukkipati 2013-03-11 2398 */
bec41a11 Yuchung Cheng 2017-01-12 2399 if ((sysctl_tcp_early_retrans != 3 && sysctl_tcp_early_retrans != 4) ||
bec41a11 Yuchung Cheng 2017-01-12 2400 !tp->packets_out || !tcp_is_sack(tp) ||
bec41a11 Yuchung Cheng 2017-01-12 2401 icsk->icsk_ca_state != TCP_CA_Open)
6ba8a3b1 Nandita Dukkipati 2013-03-11 2402 return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2403
6ba8a3b1 Nandita Dukkipati 2013-03-11 2404 if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&
6ba8a3b1 Nandita Dukkipati 2013-03-11 2405 tcp_send_head(sk))
6ba8a3b1 Nandita Dukkipati 2013-03-11 2406 return false;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2407
bb4d991a Yuchung Cheng 2017-07-19 2408 /* Probe timeout is 2*rtt. Add minimum RTO to account
f9b99582 Yuchung Cheng 2015-09-18 2409 * for delayed ack when there's one outstanding packet. If no RTT
f9b99582 Yuchung Cheng 2015-09-18 2410 * sample is available then probe after TCP_TIMEOUT_INIT.
6ba8a3b1 Nandita Dukkipati 2013-03-11 2411 */
bb4d991a Yuchung Cheng 2017-07-19 2412 if (tp->srtt_us) {
bb4d991a Yuchung Cheng 2017-07-19 2413 timeout = usecs_to_jiffies(tp->srtt_us >> 2);
6ba8a3b1 Nandita Dukkipati 2013-03-11 2414 if (tp->packets_out == 1)
bb4d991a Yuchung Cheng 2017-07-19 2415 timeout += TCP_RTO_MIN;
bb4d991a Yuchung Cheng 2017-07-19 2416 else
bb4d991a Yuchung Cheng 2017-07-19 2417 timeout += TCP_TIMEOUT_MIN;
bb4d991a Yuchung Cheng 2017-07-19 2418 } else {
bb4d991a Yuchung Cheng 2017-07-19 2419 timeout = TCP_TIMEOUT_INIT;
bb4d991a Yuchung Cheng 2017-07-19 2420 }
6ba8a3b1 Nandita Dukkipati 2013-03-11 2421
6ba8a3b1 Nandita Dukkipati 2013-03-11 2422 /* If RTO is shorter, just schedule TLP in its place. */
ac9517fc Eric Dumazet 2017-05-16 2423 tlp_time_stamp = tcp_jiffies32 + timeout;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2424 rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2425 if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
e41572cd Mao Wenan 2017-07-25 2426 /*It is no need to reschedule PTO when there is one outstanding TLP retransmission*/
e41572cd Mao Wenan 2017-07-25 2427 if (tp->tlp_high_seq) {
e41572cd Mao Wenan 2017-07-25 2428 return false;
e41572cd Mao Wenan 2017-07-25 2429 }
ac9517fc Eric Dumazet 2017-05-16 @2430 s32 delta = rto_time_stamp - tcp_jiffies32;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2431 if (delta > 0)
6ba8a3b1 Nandita Dukkipati 2013-03-11 2432 timeout = delta;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2433 }
6ba8a3b1 Nandita Dukkipati 2013-03-11 2434
6ba8a3b1 Nandita Dukkipati 2013-03-11 2435 inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
6ba8a3b1 Nandita Dukkipati 2013-03-11 2436 TCP_RTO_MAX);
6ba8a3b1 Nandita Dukkipati 2013-03-11 2437 return true;
6ba8a3b1 Nandita Dukkipati 2013-03-11 2438 }
6ba8a3b1 Nandita Dukkipati 2013-03-11 2439
:::::: The code at line 2430 was first introduced by commit
:::::: ac9517fcf310327fa3e3b0d8366e4b11236b1b4b tcp: replace misc tcp_time_stamp to tcp_jiffies32
:::::: TO: Eric Dumazet <edumazet@google.com>
:::::: CC: David S. Miller <davem@davemloft.net>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Tue, Jul 25, 2017 at 7:12 PM, maowenan <maowenan@huawei.com> wrote: > > > > > -----Original Message----- > > From: Neal Cardwell [mailto:ncardwell@google.com] > > Sent: Tuesday, July 25, 2017 9:30 PM > > To: maowenan > > Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung Cheng; > > Nandita Dukkipati > > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one > > outstanding TLP retransmission. > > > > On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com> > > wrote: > > > If there is one TLP probe went out(TLP use the write_queue_tail packet > > > as TLP probe, we assume this first TLP probe named A), and this TLP > > > probe was not acked by receive side. > > > > > > Then the transmit side sent the next two packetes out(named B,C), but > > > unfortunately these two packets are also not acked by receive side. > > > > > > And then there is one data packet with ack_seq A arrive, in tcp_ack() > > > will call tcp_schedule_loss_probe() to rearm PTO, the handler > > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one > > > outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP > > > probe can't be went out and need to rearm the RTO timer(timeout is > > > relative to the transmit time of the write queue head). > > > > > > After this, another data packet with ack_seq A is received, if the > > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout > > > with delta value, which is before previous RTO timeout, so PTO is > > > rearm and previous RTO is cleared. This TLP probe also can't be sent > > > out because of tp->tlp_high_seq != 0, so there is no way(or need very > > > long time)to retransmit the packet because of TLP A is lost. > > > > > > This fix is not to pass the if(tp->tlp_high_seq) in > > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to > > > reschedule PTO when there is one outstanding TLP retransmission, so if > > > the TLP A is lost then RTO can retransmit that packet, and > > > tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work > > process. > > > > > > Signed-off-by: Mao Wenan <maowenan@huawei.com> > > > > Thanks for posting this. This is a pretty involved scenario. To help > > document/test precisely what the behavior is before and after your patch, > > would you be able to post a packetdrill ( https://github.com/google/packetdrill ) > > test case for this scenario? > > > > Can I ask if you saw this scenario in an actual trace, or noticed this by > > inspection? > [Mao Wenan] It is my actual product scenario, from some debug trace info > I found that PTO is always rescheduled before RTO, this is PTO is trigged > by receiving one tcp_ack packets. After this patch, the product works well and > there is no previous issue happen. The packet can retransmit with RTO if TLP probe > is lost. > I will say sorry that my local test can't reproduce because i can't build successfully the same > complex scenario, I don't have mature test case to reproduce now but I will do my best to try > in local test environment. tcpdump output of an example trace also helps. Feel free to remove IP address, ports, or payload information. We only need the timing and TCP header info. > Thank you. > > > > thanks, > > neal
> -----Original Message----- > From: Yuchung Cheng [mailto:ycheng@google.com] > Sent: Thursday, July 27, 2017 12:45 AM > To: maowenan > Cc: Neal Cardwell; Netdev; David Miller; weiyongjun (A); Chenweilong; Nandita > Dukkipati; Wangkefeng (Kevin) > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's one > outstanding TLP retransmission. > > On Tue, Jul 25, 2017 at 7:12 PM, maowenan <maowenan@huawei.com> > wrote: > > > > > > > > > -----Original Message----- > > > From: Neal Cardwell [mailto:ncardwell@google.com] > > > Sent: Tuesday, July 25, 2017 9:30 PM > > > To: maowenan > > > Cc: Netdev; David Miller; weiyongjun (A); Chenweilong; Yuchung > > > Cheng; Nandita Dukkipati > > > Subject: Re: [PATCH net-next] TLP: Don't reschedule PTO when there's > > > one outstanding TLP retransmission. > > > > > > On Tue, Jul 25, 2017 at 4:35 AM, Mao Wenan <maowenan@huawei.com> > > > wrote: > > > > If there is one TLP probe went out(TLP use the write_queue_tail > > > > packet as TLP probe, we assume this first TLP probe named A), and > > > > this TLP probe was not acked by receive side. > > > > > > > > Then the transmit side sent the next two packetes out(named B,C), > > > > but unfortunately these two packets are also not acked by receive side. > > > > > > > > And then there is one data packet with ack_seq A arrive, in > > > > tcp_ack() will call tcp_schedule_loss_probe() to rearm PTO, the > > > > handler > > > > tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is > > > > one outstanding TLP named A,tp->tlp_high_seq is not zero), so the > > > > new TLP probe can't be went out and need to rearm the RTO > > > > timer(timeout is relative to the transmit time of the write queue head). > > > > > > > > After this, another data packet with ack_seq A is received, if the > > > > tlp_time_stamp is after rto_time_stamp, it will reset the TLP > > > > timeout with delta value, which is before previous RTO timeout, so > > > > PTO is rearm and previous RTO is cleared. This TLP probe also > > > > can't be sent out because of tp->tlp_high_seq != 0, so there is no > > > > way(or need very long time)to retransmit the packet because of TLP A is > lost. > > > > > > > > This fix is not to pass the if(tp->tlp_high_seq) in > > > > tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need > > > > to reschedule PTO when there is one outstanding TLP > > > > retransmission, so if the TLP A is lost then RTO can retransmit > > > > that packet, and > > > > tp->tlp_high_seq will be set to 0. After this TLP will go to the > > > > tp->normal work > > > process. > > > > > > > > Signed-off-by: Mao Wenan <maowenan@huawei.com> > > > > > > Thanks for posting this. This is a pretty involved scenario. To help > > > document/test precisely what the behavior is before and after your > > > patch, would you be able to post a packetdrill ( > > > https://github.com/google/packetdrill ) test case for this scenario? > > > > > > Can I ask if you saw this scenario in an actual trace, or noticed > > > this by inspection? > > [Mao Wenan] It is my actual product scenario, from some debug trace > > info I found that PTO is always rescheduled before RTO, this is PTO is > > trigged by receiving one tcp_ack packets. After this patch, the > > product works well and there is no previous issue happen. The packet > > can retransmit with RTO if TLP probe is lost. > > I will say sorry that my local test can't reproduce because i can't > > build successfully the same complex scenario, I don't have mature test > > case to reproduce now but I will do my best to try in local test environment. > tcpdump output of an example trace also helps. Feel free to remove IP address, > ports, or payload information. We only need the timing and TCP header info. Ok, please check as below. Server: 192.169.0.18, client: 192.169.0.17 ...... 20:36:12.239562 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216544:501216616(72) ack 2308880351 win 92 20:36:12.240034 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216616:501216688(72) ack 2308880351 win 92 20:36:12.260764 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216616:501216688(72) ack 2308880351 win 92 /* first TLP probe A, seq: 501216616 */ 20:36:13.018020 IP 192.169.0.17.56246 > 192.169.0.18.1234: . ack 501216616 win 1970 20:36:13.018386 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216688:501216760(72) ack 2308880351 win 92 /* server send packet B */ 20:36:13.022448 IP 192.169.0.18.1234 > 192.169.0.17.56246: P 501216760:501216832(72) ack 2308880351 win 92 /* server send packet C */ 20:36:13.033813 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880351:2308880437(86) ack 501216616 win 1970 /* receive client data packet ack 501216616 */ 20:36:13.034213 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880437:2308880523(86) ack 501216616 win 1970 /* receive client data packet ack 501216616 */ 20:36:13.034633 IP 192.169.0.18.1234 > 192.169.0.17.56246: . ack 2308880523 win 92 /* server pure ack client */ 20:36:13.039512 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880523:2308880659(136) ack 501216616 win 1970 20:36:13.040083 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880659:2308880795(136) ack 501216616 win 1970 20:36:13.040491 IP 192.169.0.18.1234 > 192.169.0.17.56246: . ack 2308880795 win 92 20:36:13.041161 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880795:2308880931(136) ack 501216616 win 1970 20:36:13.042828 IP 192.169.0.17.56246 > 192.169.0.18.1234: P 2308880931:2308881032(101) ack 501216616 win 1970 20:36:13.043198 IP 192.169.0.18.1234 > 192.169.0.17.56246: . ack 2308881032 win 92 ...... > > > > Thank you. > > > > > > thanks, > > > neal
On Wed, Jul 26, 2017 at 9:28 PM, maowenan <maowenan@huawei.com> wrote: > Ok, please check as below. Server: 192.169.0.18, client: 192.169.0.17 > ...... Thanks for this trace! Here is a condensed version that I believe should be equivalent, for those who are working on parsing what's going on: 2.239562 srv > cli: P 544:616(72) ack 351 win 92 2.240034 srv > cli: P 616:688(72) ack 351 win 92 2.260764 srv > cli: P 616:688(72) ack 351 win 92 # TLP 3.018020 cli > srv: . ack 616 win 1970 3.018386 srv > cli: P 688:760(72) ack 351 win 92 # srv sends B 3.022448 srv > cli: P 760:832(72) ack 351 win 92 # srv sends C 3.033813 cli > srv: P 351:0437(86) ack 616 win 1970 # data/ACK 616 3.034213 cli > srv: P 437:0523(86) ack 616 win 1970 # data/ACK 616 3.034633 srv > cli: . ack 523 win 92 3.039512 cli > srv: P 523:0659(136) ack 616 win 1970 3.040083 cli > srv: P 659:0795(136) ack 616 win 1970 3.040491 srv > cli: . ack 795 win 92 3.041161 cli > srv: P 795:0931(136) ack 616 win 1970 3.042828 cli > srv: P 931:1032(101) ack 616 win 1970 3.043198 srv > cli: . ack 1032 win 92 For the record, I posted my diagnosis of this issue in your thread for the v2 version of the patch. thanks, neal
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 886d874..0c8da1c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2423,6 +2423,10 @@ bool tcp_schedule_loss_probe(struct sock *sk) tlp_time_stamp = tcp_jiffies32 + timeout; rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout; if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) { + /*It is no need to reschedule PTO when there is one outstanding TLP retransmission*/ + if (tp->tlp_high_seq) { + return false; + } s32 delta = rto_time_stamp - tcp_jiffies32; if (delta > 0) timeout = delta;
If there is one TLP probe went out(TLP use the write_queue_tail packet as TLP probe, we assume this first TLP probe named A), and this TLP probe was not acked by receive side. Then the transmit side sent the next two packetes out(named B,C), but unfortunately these two packets are also not acked by receive side. And then there is one data packet with ack_seq A arrive, in tcp_ack() will call tcp_schedule_loss_probe() to rearm PTO, the handler tcp_send_loss_probe() pass if(tp->tlp_high_seq)(because there is one outstanding TLP named A,tp->tlp_high_seq is not zero), so the new TLP probe can't be went out and need to rearm the RTO timer(timeout is relative to the transmit time of the write queue head). After this, another data packet with ack_seq A is received, if the tlp_time_stamp is after rto_time_stamp, it will reset the TLP timeout with delta value, which is before previous RTO timeout, so PTO is rearm and previous RTO is cleared. This TLP probe also can't be sent out because of tp->tlp_high_seq != 0, so there is no way(or need very long time)to retransmit the packet because of TLP A is lost. This fix is not to pass the if(tp->tlp_high_seq) in tcp_schedule_loss_probe() when TLP PTO is after RTO, It is no need to reschedule PTO when there is one outstanding TLP retransmission, so if the TLP A is lost then RTO can retransmit that packet, and tp->tlp_high_seq will be set to 0. After this TLP will go to the normal work process. Signed-off-by: Mao Wenan <maowenan@huawei.com> --- net/ipv4/tcp_output.c | 4 ++++ 1 file changed, 4 insertions(+)