diff mbox series

tcp: TCP_USER_TIMEOUT can not work in tcp_probe_timer()

Message ID 1505111262-12620-1-git-send-email-liujian56@huawei.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series tcp: TCP_USER_TIMEOUT can not work in tcp_probe_timer() | expand

Commit Message

Liu Jian Sept. 11, 2017, 6:27 a.m. UTC
From: liujian <liujian56@huawei.com>

After the tcp socket go to ESTABLISHED stat, change IP address (server
side),
then the tcp socket will go tcp_probe_timer process.

[root@localhost net]# netstat -toe
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      Timer
tcp        0   1104 9.81.254:personal-agent 9.84.201.213:23597      ESTABLISHED root       12819      probe (4.36/0/7)
[root@localhost net]# cat /proc/net/tcp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
   3: B1FE5109:15B3 D5C95409:5C2D 01 00000495:00000000 04:0000005E 00000000     0        7 12819 2 ffff95cdcf45a000 20 4 1 10 -1 

In my test case, tcp_write_queue_head(sk) and tcp_send_head(sk) is  same
SKB.
And ((s32)(tcp_time_stamp(tp) - start_ts) >
jiffies_to_msecs(icsk->icsk_user_timeout))
 always is false.
 Here use keepalive_time_elapsed(tp) to do the compare as
tcp_keepalive_timer do.

Signed-off-by: liujian <liujian56@huawei.com>
---
 net/ipv4/tcp_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet Sept. 11, 2017, 2:49 p.m. UTC | #1
On Mon, 2017-09-11 at 14:27 +0800, liujian56@huawei.com wrote:
> From: liujian <liujian56@huawei.com>
> 
> After the tcp socket go to ESTABLISHED stat, change IP address (server
> side),
> then the tcp socket will go tcp_probe_timer process.
> 
> [root@localhost net]# netstat -toe
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      Timer
> tcp        0   1104 9.81.254:personal-agent 9.84.201.213:23597      ESTABLISHED root       12819      probe (4.36/0/7)
> [root@localhost net]# cat /proc/net/tcp
>   sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
>    3: B1FE5109:15B3 D5C95409:5C2D 01 00000495:00000000 04:0000005E 00000000     0        7 12819 2 ffff95cdcf45a000 20 4 1 10 -1 
> 
> In my test case, tcp_write_queue_head(sk) and tcp_send_head(sk) is  same
> SKB.
> And ((s32)(tcp_time_stamp(tp) - start_ts) >
> jiffies_to_msecs(icsk->icsk_user_timeout))
>  always is false.

Interesting.

>  Here use keepalive_time_elapsed(tp) to do the compare as
> tcp_keepalive_timer do.


But zero window probe and TCP_USER_TIMEOUT can be used without
keepalives...


A packetdrill test would help, I will write one.

Thanks.
Eric Dumazet Sept. 11, 2017, 3:13 p.m. UTC | #2
On Mon, 2017-09-11 at 07:49 -0700, Eric Dumazet wrote:
> On Mon, 2017-09-11 at 14:27 +0800, liujian56@huawei.com wrote:
> > From: liujian <liujian56@huawei.com>
> > 
> > After the tcp socket go to ESTABLISHED stat, change IP address (server
> > side),
> > then the tcp socket will go tcp_probe_timer process.
> > 
> > [root@localhost net]# netstat -toe
> > Active Internet connections (w/o servers)
> > Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      Timer
> > tcp        0   1104 9.81.254:personal-agent 9.84.201.213:23597      ESTABLISHED root       12819      probe (4.36/0/7)
> > [root@localhost net]# cat /proc/net/tcp
> >   sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
> >    3: B1FE5109:15B3 D5C95409:5C2D 01 00000495:00000000 04:0000005E 00000000     0        7 12819 2 ffff95cdcf45a000 20 4 1 10 -1 
> > 
> > In my test case, tcp_write_queue_head(sk) and tcp_send_head(sk) is  same
> > SKB.
> > And ((s32)(tcp_time_stamp(tp) - start_ts) >
> > jiffies_to_msecs(icsk->icsk_user_timeout))
> >  always is false.
> 
> Interesting.
> 
> >  Here use keepalive_time_elapsed(tp) to do the compare as
> > tcp_keepalive_timer do.
> 
> 
> But zero window probe and TCP_USER_TIMEOUT can be used without
> keepalives...
> 
> 
> A packetdrill test would help, I will write one.

So existing code seems to work :

# cat window-probe-without-data-user-timeout.pkt

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>

// Client advertises a zero receive window, so we can't send.
  +.1 < . 1:1(0) ack 1 win 0
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
   +0 write(4, ..., 2920) = 2920

// Window probes are scheduled just like RTOs.
  +.3~+.31 > . 0:0(0) ack 1
  +.6~+.62 > . 0:0(0) ack 1
 +1.2~+1.24 > . 0:0(0) ack 1
 +2.4~+2.48 > . 0:0(0) ack 1

# ./packetdrill window-probe-without-data-user_timeout.pkt
08:10:39.306137 IP 192.0.2.1.58149 > 192.168.79.31.8080: Flags [S], seq 0, win 0, options [mss 1460,sackOK,nop,nop,nop,wscale 7], length 0
08:10:39.306166 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [S.], seq 3982794529, ack 1, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 8], length 0
08:10:39.406296 IP 192.0.2.1.58149 > 192.168.79.31.8080: Flags [.], ack 1, win 0, length 0
08:10:39.716004 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [.], ack 1, win 115, length 0
08:10:40.327133 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [.], ack 1, win 115, length 0
08:10:41.540243 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [.], ack 1, win 115, length 0

You can see we got only 3 probes, not 4.
Eric Dumazet Sept. 11, 2017, 3:22 p.m. UTC | #3
On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote:

> You can see we got only 3 probes, not 4.

Here is complete packetdrill test showing that code behaves as expected.

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>

// Client advertises a zero receive window, so we can't send.
  +.1 < . 1:1(0) ack 1 win 0
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
   +0 write(4, ..., 2920) = 2920

// Window probes are scheduled just like RTOs.
  +.3~+.31 > . 0:0(0) ack 1
  +.6~+.62 > . 0:0(0) ack 1
 +1.2~+1.24 > . 0:0(0) ack 1

// Peer opens its window too late !
   +3 < . 1:1(0) ack 1 win 1000
   +0 > R 1:1(0)
Liu Jian Sept. 12, 2017, 6:08 a.m. UTC | #4
Hi,

In the scenario, tcp server side IP changed, and at that memont,
userspace application still send data continuously;
tcp_send_head(sk)'s timestamp always be refreshed.

Here is the packetdrill script:

   0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>

  +.1 < . 1:1(0) ack 1 win 65530
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
   +0 write(4, ..., 24) = 24
   +0 > P. 1:25(24) ack 1 win 229
   +.1 < . 1:1(0) ack 25 win 65530

//change the ipaddress
   +1 `ifconfig tun0 192.168.0.10/16`

   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24

   +0 `ifconfig tun0 192.168.0.1/16`
   +0 < . 1:1(0) ack 1 win 1000
   +0 write(4, ..., 24) = -1


[root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt
test.pkt:50: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory)

real	1m11.364s
user	0m0.028s
sys	0m0.106s

[root@localhost ~]# netstat -toen
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      Timer
tcp        0    504 192.168.0.1:8080        192.0.2.1:33993         ESTABLISHED 0          45453      probe (22.38/0/7)

since the script didn't wait for enough time, here only got 7 probes.

在 2017/9/11 23:22, Eric Dumazet 写道:
> On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote:
> 
>> You can see we got only 3 probes, not 4.
> 
> Here is complete packetdrill test showing that code behaves as expected.
> 
>     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
> 
>    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
> 
> // Client advertises a zero receive window, so we can't send.
>   +.1 < . 1:1(0) ack 1 win 0
>    +0 accept(3, ..., ...) = 4
> 
>    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
>    +0 write(4, ..., 2920) = 2920
> 
> // Window probes are scheduled just like RTOs.
>   +.3~+.31 > . 0:0(0) ack 1
>   +.6~+.62 > . 0:0(0) ack 1
>  +1.2~+1.24 > . 0:0(0) ack 1
> 
> // Peer opens its window too late !
>    +3 < . 1:1(0) ack 1 win 1000
>    +0 > R 1:1(0)
> 
> 
> 
> .
>
Eric Dumazet Sept. 12, 2017, 3:05 p.m. UTC | #5
On Tue, 2017-09-12 at 14:08 +0800, liujian wrote:
> Hi,
> 
> In the scenario, tcp server side IP changed, and at that memont,
> userspace application still send data continuously;
> tcp_send_head(sk)'s timestamp always be refreshed.
> 
> Here is the packetdrill script:
> 
>    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
> 
>    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
> 
>   +.1 < . 1:1(0) ack 1 win 65530
>    +0 accept(3, ..., ...) = 4
> 
>    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
>    +0 write(4, ..., 24) = 24
>    +0 > P. 1:25(24) ack 1 win 229
>    +.1 < . 1:1(0) ack 25 win 65530
> 
> //change the ipaddress
>    +1 `ifconfig tun0 192.168.0.10/16`
> 
>    +1 write(4, ..., 24) = 24
>    +1 write(4, ..., 24) = 24
>    +1 write(4, ..., 24) = 24
>    +1 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
>    +3 write(4, ..., 24) = 24
> 
>    +0 `ifconfig tun0 192.168.0.1/16`
>    +0 < . 1:1(0) ack 1 win 1000
>    +0 write(4, ..., 24) = -1
> 
> 

This has nothing to do with the code patch you have changed.

How have you tested your patch exactly ?


> [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt
> test.pkt:50: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory)
> 
> real	1m11.364s
> user	0m0.028s
> sys	0m0.106s
> 
> [root@localhost ~]# netstat -toen
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      Timer
> tcp        0    504 192.168.0.1:8080        192.0.2.1:33993         ESTABLISHED 0          45453      probe (22.38/0/7)
> 
> since the script didn't wait for enough time, here only got 7 probes.
> 
> 在 2017/9/11 23:22, Eric Dumazet 写道:
> > On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote:
> > 
> >> You can see we got only 3 probes, not 4.
> > 
> > Here is complete packetdrill test showing that code behaves as expected.
> > 
> >     0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> >    +0 bind(3, ..., ...) = 0
> >    +0 listen(3, 1) = 0
> > 
> >    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
> >    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
> > 
> > // Client advertises a zero receive window, so we can't send.
> >   +.1 < . 1:1(0) ack 1 win 0
> >    +0 accept(3, ..., ...) = 4
> > 
> >    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
> >    +0 write(4, ..., 2920) = 2920
> > 
> > // Window probes are scheduled just like RTOs.
> >   +.3~+.31 > . 0:0(0) ack 1
> >   +.6~+.62 > . 0:0(0) ack 1
> >  +1.2~+1.24 > . 0:0(0) ack 1
> > 
> > // Peer opens its window too late !
> >    +3 < . 1:1(0) ack 1 win 1000
> >    +0 > R 1:1(0)
> > 
> > 
> > 
> > .
> > 
>
Eric Dumazet Sept. 12, 2017, 3:38 p.m. UTC | #6
On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote:
> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote:
> > Hi,
> > 
> > In the scenario, tcp server side IP changed, and at that memont,
> > userspace application still send data continuously;
> > tcp_send_head(sk)'s timestamp always be refreshed.
> > 
> > Here is the packetdrill script:
> > 
> >    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> >    +0 bind(3, ..., ...) = 0
> >    +0 listen(3, 1) = 0
> > 
> >    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
> >    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
> > 
> >   +.1 < . 1:1(0) ack 1 win 65530
> >    +0 accept(3, ..., ...) = 4
> > 
> >    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
> >    +0 write(4, ..., 24) = 24
> >    +0 > P. 1:25(24) ack 1 win 229
> >    +.1 < . 1:1(0) ack 25 win 65530
> > 
> > //change the ipaddress
> >    +1 `ifconfig tun0 192.168.0.10/16`
> > 
> >    +1 write(4, ..., 24) = 24
> >    +1 write(4, ..., 24) = 24
> >    +1 write(4, ..., 24) = 24
> >    +1 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> >    +3 write(4, ..., 24) = 24
> > 
> >    +0 `ifconfig tun0 192.168.0.1/16`
> >    +0 < . 1:1(0) ack 1 win 1000
> >    +0 write(4, ..., 24) = -1
> > 
> > 
> 
> This has nothing to do with the code patch you have changed.
> 
> How have you tested your patch exactly ?
> 


lpaa23:~# ss -toenmi src :8080
State      Recv-Q Send-Q Local Address:Port               Peer
Address:Port              
ESTAB      0      144    192.168.134.161:8080
192.0.2.1:51165               timer:(persist,8.262ms,5) ino:1
82083 sk:3 <->
	 skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic
wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 
mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12
segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l
astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps
app_limited busy:11346ms rcv_space:29200 notsent:1
44 minrtt:100.043

This is the typical RTO timer, not zero window probe.
Liu Jian Sept. 13, 2017, 6:56 a.m. UTC | #7
在 2017/9/12 23:38, Eric Dumazet 写道:
> On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote:
>> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote:
>>> Hi,
>>>
>>> In the scenario, tcp server side IP changed, and at that memont,
>>> userspace application still send data continuously;
>>> tcp_send_head(sk)'s timestamp always be refreshed.
>>>
>>> Here is the packetdrill script:
>>>
>>>    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>>>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>>    +0 bind(3, ..., ...) = 0
>>>    +0 listen(3, 1) = 0
>>>
>>>    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
>>>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
>>>
>>>   +.1 < . 1:1(0) ack 1 win 65530
>>>    +0 accept(3, ..., ...) = 4
>>>
>>>    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
>>>    +0 write(4, ..., 24) = 24
>>>    +0 > P. 1:25(24) ack 1 win 229
>>>    +.1 < . 1:1(0) ack 25 win 65530
>>>
>>> //change the ipaddress
>>>    +1 `ifconfig tun0 192.168.0.10/16`
>>>
>>>    +1 write(4, ..., 24) = 24
>>>    +1 write(4, ..., 24) = 24
>>>    +1 write(4, ..., 24) = 24
>>>    +1 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>    +3 write(4, ..., 24) = 24
>>>
>>>    +0 `ifconfig tun0 192.168.0.1/16`
>>>    +0 < . 1:1(0) ack 1 win 1000
>>>    +0 write(4, ..., 24) = -1
>>>
>>>
>>
>> This has nothing to do with the code patch you have changed.
>>
>> How have you tested your patch exactly ?
>>
I tested the patch, it can work.

[root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt
test.pkt:24: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out)

real	0m5.356s
user	0m0.026s
sys	0m0.104s

[root@localhost ~]# ss -toenmi src :8080
State       Recv-Q Send-Q               Local Address:Port                              Peer Address:Port
ESTAB       0      48                     192.168.0.1:8080                                 192.0.2.1:39559               timer:(persist,186ms,2) ino:37178 sk:6 <->
	 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:2 rtt:100.253/37.643 mss:1460 cwnd:10 bytes_acked:24 segs_out:5 segs_in:3 send 1.2Mbps lastsnd:976 lastrcv:4082 lastack:3982 pacing_rate 2.3Mbps rcv_space:29200


if change the TCP_USER_TIMEOUT to 30s, test result as below:

[root@localhost ~]#   time ./gtests/net/packetdrill/packetdrill test.pkt
test.pkt:37: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out)

real	0m44.362s
user	0m0.018s
sys	0m0.110s

ESTAB       0      360                         192.168.0.1:8080                                      192.0.2.1:47577     timer:(persist,516ms,6) ino:18806 sk:5 <->
	 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:6 rtt:100.228/37.623 mss:10 bytes_acked:24 segs_out:22 segs_in:3 send 1.2Mbps lastsnd:2343 lastrcv:40450 lastack:40350 pacing_rate 2.3Mbps rcv_sp

> 
> lpaa23:~# ss -toenmi src :8080
> State      Recv-Q Send-Q Local Address:Port               Peer
> Address:Port              
> ESTAB      0      144    192.168.134.161:8080
> 192.0.2.1:51165               timer:(persist,8.262ms,5) ino:1
> 82083 sk:3 <->
> 	 skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic
> wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 
> mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12
> segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l
> astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps
> app_limited busy:11346ms rcv_space:29200 notsent:1
> 44 minrtt:100.043userspace application still send data continuously
> 
> This is the typical RTO timer, not zero window probe.
> 
with the script, it is not zero window; but the code enter tcp_probe_timer.
->tcp_sendmsg
-->tcp_push
---->__tcp_push_pending_frames
------>tcp_write_xmit
-------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed)
---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH)
-------->tcp_transmit_skb (send fail, do not advance send head)
------->tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer)

Here userspace application still send data continuously,
it triggered ICSK_TIME_PROBE0 timer and tcp_send_head(sk)'s timestamp refresh.
Liu Jian Sept. 13, 2017, 7:15 a.m. UTC | #8
在 2017/9/13 14:56, liujian 写道:
> 
> 
> 在 2017/9/12 23:38, Eric Dumazet 写道:
>> On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote:
>>> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote:
>>>> Hi,
>>>>
>>>> In the scenario, tcp server side IP changed, and at that memont,
>>>> userspace application still send data continuously;
>>>> tcp_send_head(sk)'s timestamp always be refreshed.
>>>>
>>>> Here is the packetdrill script:
>>>>
>>>>    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>>>>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>>>    +0 bind(3, ..., ...) = 0
>>>>    +0 listen(3, 1) = 0
>>>>
>>>>    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
>>>>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
>>>>
>>>>   +.1 < . 1:1(0) ack 1 win 65530
>>>>    +0 accept(3, ..., ...) = 4
>>>>
>>>>    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
>>>>    +0 write(4, ..., 24) = 24
>>>>    +0 > P. 1:25(24) ack 1 win 229
>>>>    +.1 < . 1:1(0) ack 25 win 65530
>>>>
>>>> //change the ipaddress
>>>>    +1 `ifconfig tun0 192.168.0.10/16`
>>>>
>>>>    +1 write(4, ..., 24) = 24
>>>>    +1 write(4, ..., 24) = 24
>>>>    +1 write(4, ..., 24) = 24
>>>>    +1 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>    +3 write(4, ..., 24) = 24
>>>>
>>>>    +0 `ifconfig tun0 192.168.0.1/16`
>>>>    +0 < . 1:1(0) ack 1 win 1000
>>>>    +0 write(4, ..., 24) = -1
>>>>
>>>>
>>>
>>> This has nothing to do with the code patch you have changed.
>>>
>>> How have you tested your patch exactly ?
>>>
> I tested the patch, it can work.
> 
> [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt
> test.pkt:24: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out)
> 
> real	0m5.356s
> user	0m0.026s
> sys	0m0.104s
> 
> [root@localhost ~]# ss -toenmi src :8080
> State       Recv-Q Send-Q               Local Address:Port                              Peer Address:Port
> ESTAB       0      48                     192.168.0.1:8080                                 192.0.2.1:39559               timer:(persist,186ms,2) ino:37178 sk:6 <->
> 	 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:2 rtt:100.253/37.643 mss:1460 cwnd:10 bytes_acked:24 segs_out:5 segs_in:3 send 1.2Mbps lastsnd:976 lastrcv:4082 lastack:3982 pacing_rate 2.3Mbps rcv_space:29200
> 
> 
> if change the TCP_USER_TIMEOUT to 30s, test result as below:
> 
> [root@localhost ~]#   time ./gtests/net/packetdrill/packetdrill test.pkt
> test.pkt:37: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out)
> 
> real	0m44.362s
> user	0m0.018s
> sys	0m0.110s
> 
> ESTAB       0      360                         192.168.0.1:8080                                      192.0.2.1:47577     timer:(persist,516ms,6) ino:18806 sk:5 <->
> 	 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:6 rtt:100.228/37.623 mss:10 bytes_acked:24 segs_out:22 segs_in:3 send 1.2Mbps lastsnd:2343 lastrcv:40450 lastack:40350 pacing_rate 2.3Mbps rcv_sp
> 
>>
>> lpaa23:~# ss -toenmi src :8080
>> State      Recv-Q Send-Q Local Address:Port               Peer
>> Address:Port              
>> ESTAB      0      144    192.168.134.161:8080
>> 192.0.2.1:51165               timer:(persist,8.262ms,5) ino:1
>> 82083 sk:3 <->
>> 	 skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic
>> wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 
>> mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12
>> segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l
>> astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps
>> app_limited busy:11346ms rcv_space:29200 notsent:1
>> 44 minrtt:100.043userspace application still send data continuously
>>
>> This is the typical RTO timer, not zero window probe.
>>
> with the script, it is not zero window; but the code enter tcp_probe_timer.
> ->tcp_sendmsg
> -->tcp_push
> ---->__tcp_push_pending_frames
> ------>tcp_write_xmit
> -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed)
> ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH)
> -------->tcp_transmit_skb (send fail, do not advance send head)
> ------->tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer)
sorry, change to
->tcp_sendmsg
-->tcp_push
---->__tcp_push_pending_frames
------>tcp_write_xmit
-------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed)
---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH)
------>tcp_write_xmit (send fail, do not advance send head)
------>tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer)

> Here userspace application still send data continuously,
> it triggered ICSK_TIME_PROBE0 timer and tcp_send_head(sk)'s timestamp refresh.
> 
>
Eric Dumazet Sept. 13, 2017, 2:40 p.m. UTC | #9
On Wed, 2017-09-13 at 15:15 +0800, liujian wrote:
> 
> 在 2017/9/13 14:56, liujian 写道:
> > 
> > 
> > 在 2017/9/12 23:38, Eric Dumazet 写道:
> >> On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote:
> >>> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote:
> >>>> Hi,
> >>>>
> >>>> In the scenario, tcp server side IP changed, and at that memont,
> >>>> userspace application still send data continuously;
> >>>> tcp_send_head(sk)'s timestamp always be refreshed.
> >>>>
> >>>> Here is the packetdrill script:
> >>>>
> >>>>    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >>>>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> >>>>    +0 bind(3, ..., ...) = 0
> >>>>    +0 listen(3, 1) = 0
> >>>>
> >>>>    +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
> >>>>    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
> >>>>
> >>>>   +.1 < . 1:1(0) ack 1 win 65530
> >>>>    +0 accept(3, ..., ...) = 4
> >>>>
> >>>>    +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
> >>>>    +0 write(4, ..., 24) = 24
> >>>>    +0 > P. 1:25(24) ack 1 win 229
> >>>>    +.1 < . 1:1(0) ack 25 win 65530
> >>>>
> >>>> //change the ipaddress
> >>>>    +1 `ifconfig tun0 192.168.0.10/16`
> >>>>
> >>>>    +1 write(4, ..., 24) = 24
> >>>>    +1 write(4, ..., 24) = 24
> >>>>    +1 write(4, ..., 24) = 24
> >>>>    +1 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>    +3 write(4, ..., 24) = 24
> >>>>
> >>>>    +0 `ifconfig tun0 192.168.0.1/16`
> >>>>    +0 < . 1:1(0) ack 1 win 1000
> >>>>    +0 write(4, ..., 24) = -1
> >>>>
> >>>>
> >>>
> >>> This has nothing to do with the code patch you have changed.
> >>>
> >>> How have you tested your patch exactly ?
> >>>
> > I tested the patch, it can work.
> > 
> > [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt
> > test.pkt:24: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out)
> > 
> > real	0m5.356s
> > user	0m0.026s
> > sys	0m0.104s
> > 
> > [root@localhost ~]# ss -toenmi src :8080
> > State       Recv-Q Send-Q               Local Address:Port                              Peer Address:Port
> > ESTAB       0      48                     192.168.0.1:8080                                 192.0.2.1:39559               timer:(persist,186ms,2) ino:37178 sk:6 <->
> > 	 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:2 rtt:100.253/37.643 mss:1460 cwnd:10 bytes_acked:24 segs_out:5 segs_in:3 send 1.2Mbps lastsnd:976 lastrcv:4082 lastack:3982 pacing_rate 2.3Mbps rcv_space:29200
> > 
> > 
> > if change the TCP_USER_TIMEOUT to 30s, test result as below:
> > 
> > [root@localhost ~]#   time ./gtests/net/packetdrill/packetdrill test.pkt
> > test.pkt:37: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out)
> > 
> > real	0m44.362s
> > user	0m0.018s
> > sys	0m0.110s
> > 
> > ESTAB       0      360                         192.168.0.1:8080                                      192.0.2.1:47577     timer:(persist,516ms,6) ino:18806 sk:5 <->
> > 	 skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:6 rtt:100.228/37.623 mss:10 bytes_acked:24 segs_out:22 segs_in:3 send 1.2Mbps lastsnd:2343 lastrcv:40450 lastack:40350 pacing_rate 2.3Mbps rcv_sp
> > 
> >>
> >> lpaa23:~# ss -toenmi src :8080
> >> State      Recv-Q Send-Q Local Address:Port               Peer
> >> Address:Port              
> >> ESTAB      0      144    192.168.134.161:8080
> >> 192.0.2.1:51165               timer:(persist,8.262ms,5) ino:1
> >> 82083 sk:3 <->
> >> 	 skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic
> >> wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 
> >> mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12
> >> segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l
> >> astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps
> >> app_limited busy:11346ms rcv_space:29200 notsent:1
> >> 44 minrtt:100.043userspace application still send data continuously
> >>
> >> This is the typical RTO timer, not zero window probe.
> >>
> > with the script, it is not zero window; but the code enter tcp_probe_timer.
> > ->tcp_sendmsg
> > -->tcp_push
> > ---->__tcp_push_pending_frames
> > ------>tcp_write_xmit
> > -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed)
> > ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH)
> > -------->tcp_transmit_skb (send fail, do not advance send head)
> > ------->tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer)
> sorry, change to
> ->tcp_sendmsg
> -->tcp_push
> ---->__tcp_push_pending_frames
> ------>tcp_write_xmit
> -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed)
> ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH)
> ------>tcp_write_xmit (send fail, do not advance send head)
> ------>tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer)


OK but :

1) keepalive_time_elapsed() returns jiffies, and you compare it to a ms
value. It will break on kernels with HZ=100


2) keepalive_time_elapsed() is reset every time a packet is received.

With following packetdrill your patch (even if we fix 1)) breaks again


# ./packetdrill --local_ip 192.168.102.64 user_timeout.pkt
# cat user_timeout.pkt 

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`

   +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>

  +.1 < . 1:1(0) ack 1 win 65530
   +0 accept(3, ..., ...) = 4

   +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
   +0 write(4, ..., 24) = 24
   +0 > P. 1:25(24) ack 1 win 115
   +.1 < . 1:1(0) ack 25 win 65530

//change the ipaddress
   +1 `ifconfig tun0 192.168.0.10/16`

   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24
   +1 write(4, ..., 24) = 24

   +0 `ifconfig tun0 192.168.102.64/16`
   +0 < . 1:2(1) ack 25 win 65530
   +0 `ifconfig tun0 192.168.0.10/16`

   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24
   +3 write(4, ..., 24) = 24

   +0 `ifconfig tun0 192.168.0.1/16`
   +0 < . 1:1(0) ack 1 win 1000
   +0 write(4, ..., 24) = -1


I would instead try to keep a meaningful value in
tcp_send_head(sk)->skb_mstamp if the transmit failed.

Or more exactly change skb->skb_mstamp only on (successfully)
transmitted skb.

I will provide a patch ( fixing net/ipv4/tcp_output.c )
diff mbox series

Patch

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 655dd8d..2a28a03 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -325,7 +325,7 @@  static void tcp_probe_timer(struct sock *sk)
 	if (!start_ts)
 		tcp_send_head(sk)->skb_mstamp = tp->tcp_mstamp;
 	else if (icsk->icsk_user_timeout &&
-		 (s32)(tcp_time_stamp(tp) - start_ts) >
+		 keepalive_time_elapsed(tp) >=
 		 jiffies_to_msecs(icsk->icsk_user_timeout))
 		goto abort;