diff mbox

NFS/TCP/IPv6 acting strangely in 4.2

Message ID 1441976691.4619.58.camel@edumazet-glaptop2.roam.corp.google.com
State New
Headers show

Commit Message

Eric Dumazet Sept. 11, 2015, 1:04 p.m. UTC
On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> I have a recent Marvell Armada 388 board here which uses the mvneta
> driver.  I'm seeing some weird effects with NFS with it acting as a
> client.  Unfortunately, the board does not have a functional RTC.
> 
> The NFS server is the same NFS server that I've used for years with
> multiple other clients (it's my main NFS server) and it continues to
> support all other clients without any ill effect.  The NFS server
> kernel is rather old now, being a 3.x kernel.
> 
> The NFS client appears to connect using TCP/IPv6 and initially is
> accessible.  Everything appears to work normally, and then the NFS
> server appears to stop responding.
> 
> Running tcpdump on the NFS server, and then dumping the captured
> packets
> with tshark (because tcpdump appears not to understand IPv6 SYNs on
> the
> NFS port) shows:
> 
>   3   0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106
> Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128

>   4   0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870
> Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1
>  TSval=818169117 TSecr=892366 WS=64

>   5   0.030493 armada388 -> n2100 TCP [TCP Port numbers reused]
> 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1
> TSval=892366 TSecr=0 WS=128

Yes, this packet looks very wrong. Like two simultaneous connect with
same source port. It is not a retransmit (Seq numbers differ)

>   6   0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0
> Ack=936803634 Win=0 Len=0
>   7   0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107
> Win=0 Len=0

I suspect port reuse in NFS being broken.

Have you tried to apply

commit 099392048cd443349c50310f7fdc96070e40f4e7
Author: Trond Myklebust <trond.myklebust@primarydata.com>
Date:   Sat Aug 29 19:11:21 2015 -0700

    SUNRPC: Prevent SYN+SYNACK+RST storms
    
    Add a shutdown() call before we release the socket in order to ensure the
    reset is sent before we try to reconnect.
    
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

Comments

Russell King - ARM Linux Sept. 11, 2015, 1:18 p.m. UTC | #1
On Fri, Sep 11, 2015 at 06:04:51AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver.  I'm seeing some weird effects with NFS with it acting as a
> > client.  Unfortunately, the board does not have a functional RTC.
> > 
> > The NFS server is the same NFS server that I've used for years with
> > multiple other clients (it's my main NFS server) and it continues to
> > support all other clients without any ill effect.  The NFS server
> > kernel is rather old now, being a 3.x kernel.
> > 
> > The NFS client appears to connect using TCP/IPv6 and initially is
> > accessible.  Everything appears to work normally, and then the NFS
> > server appears to stop responding.
> > 
> > Running tcpdump on the NFS server, and then dumping the captured
> > packets
> > with tshark (because tcpdump appears not to understand IPv6 SYNs on
> > the
> > NFS port) shows:
> > 
> >   3   0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106
> > Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128
> 
> >   4   0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870
> > Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> >  TSval=818169117 TSecr=892366 WS=64
> 
> >   5   0.030493 armada388 -> n2100 TCP [TCP Port numbers reused]
> > 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1
> > TSval=892366 TSecr=0 WS=128
> 
> Yes, this packet looks very wrong. Like two simultaneous connect with
> same source port. It is not a retransmit (Seq numbers differ)
> 
> >   6   0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0
> > Ack=936803634 Win=0 Len=0
> >   7   0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107
> > Win=0 Len=0
> 
> I suspect port reuse in NFS being broken.
> 
> Have you tried to apply

Thanks, I'll give this a go and report back.

I've been trying to debug this unsuccessfully.  It's really not helpful
to enable rpc debugging, and then be greeted with this from printk:

[ 2760.156924] RPC:       state 4 conn 1 dead 0 zapped 1 sk_shutdown 3
[ 2760.163218] RPC:       xs_tcp_state_change client ee240800...
[ 2760.168976] RPC:       state 5 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 2760.175273] RPC:       xs_tcp_state_change client ee240800...
[ 2760.181032] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 2760.187315] RPC:       disconnected transport ee240800
[ 2760.192467] RPC:       xs_tcp_state_change client ee240800...
[ 2760.198224] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 2760.204505] RPC:       disconnected transport ee240800
[ 2760.209653] RPC:       xs_tcp_data_ready...
[ 2760.215001] RPC:       wake_up_first(ee240918 "xprt_sending")
lspci -vvx |less
[ 2805.912330] RPC:       looking up Generic cred
[ 2805.916791] RPC:       looking up Generic cred
[ 2805.921259] RPC:       looking up Generic cred
[ 2805.921784] RPC:       looking up Generic cred
[ 2805.921787] RPC:       looking up Generic cred
[ 2805.921788] RPC:       looking up Generic cred
[ 2805.921792] RPC:       new task initialized, procpid 327
[ 2805.921794] RPC:       allocated task ee358300
[ 2805.921797] RPC:  1335 __rpc_execute flags=0x80
[ 2805.921800] RPC:  1335 call_start nfs3 proc ACCESS (sync)
[ 2805.921801] RPC:  1335 call_reserve (status 0)
[ 2805.921805] RPC:  1335 reserved req ee8a2900 xid af74741b
** 168 printk messages dropped ** [ 2805.928207] RPC:       read reply XID b174741b
** 366 printk messages dropped ** [ 2805.941242] RPC:  1343 setting alarm for 60000 ms
** 310 printk messages dropped ** [ 2805.958274] RPC:       xprt = ee240800, tcp_copied = 120, tcp_offset = 120, tcp_reclen = 120
** 89 printk messages dropped ** [ 2805.959613] RPC:  1350 reserved req ee8a2900 xid be74741b
[ 2805.959614] RPC:       wake_up_first(ee240918 "xprt_sending")

which is just where we want to see the printk messages.  This kind'a
makes printk and the rpc debug stuff rather useless. :(
Russell King - ARM Linux Sept. 11, 2015, 2:33 p.m. UTC | #2
On Fri, Sep 11, 2015 at 06:04:51AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > Running tcpdump on the NFS server, and then dumping the captured
> > packets
> > with tshark (because tcpdump appears not to understand IPv6 SYNs on
> > the
> > NFS port) shows:
> > 
> >   3   0.030036 armada388 -> n2100 TCP 843→nfs [SYN] Seq=936803106
> > Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=892366 TSecr=0 WS=128
> 
> >   4   0.030409 n2100 -> armada388 TCP nfs→843 [SYN, ACK] Seq=409465870
> > Ack=936803107 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> >  TSval=818169117 TSecr=892366 WS=64
> 
> >   5   0.030493 armada388 -> n2100 TCP [TCP Port numbers reused]
> > 843→nfs [SYN] Seq=936803633 Win=28800 Len=0 MSS=1440 SACK_PERM=1
> > TSval=892366 TSecr=0 WS=128
> 
> Yes, this packet looks very wrong. Like two simultaneous connect with
> same source port. It is not a retransmit (Seq numbers differ)
> 
> >   6   0.030699 n2100 -> armada388 TCP nfs→843 [RST, ACK] Seq=0
> > Ack=936803634 Win=0 Len=0
> >   7   0.030766 armada388 -> n2100 TCP 843→nfs [RST] Seq=936803107
> > Win=0 Len=0
> 
> I suspect port reuse in NFS being broken.
> 
> Have you tried to apply
> 
> commit 099392048cd443349c50310f7fdc96070e40f4e7
> Author: Trond Myklebust <trond.myklebust@primarydata.com>
> Date:   Sat Aug 29 19:11:21 2015 -0700
> 
>     SUNRPC: Prevent SYN+SYNACK+RST storms
>     
>     Add a shutdown() call before we release the socket in order to ensure the
>     reset is sent before we try to reconnect.
>     
>     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

Applying this patch changes the behaviour slightly, but doesn't fix the
problem:

  1   0.000000 armada388 -> n2100 TCP hcp-wismar→nfs [SYN] Seq=3766378552 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
  2   0.000431 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1297069321 Ack=3766378553 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361932 TSecr=240384 WS=64
  3   0.000518 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766379468 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
  4   0.000740 n2100 -> armada388 TCP nfs→hcp-wismar [RST, ACK] Seq=0 Ack=3766379469 Win=0 Len=0
  5   0.000812 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766379980 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
  6   0.001051 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1297080280 Ack=3766379981 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361932 TSecr=240384 WS=64
  7   0.001137 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766380519 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
  8   0.001352 n2100 -> armada388 TCP nfs→hcp-wismar [RST, ACK] Seq=0 Ack=3766380520 Win=0 Len=0

Notice the lack of second reset, but the syns trying to re-use the same
port but with different sequence numbers is still there.

Eventually, we get to a storm of SYNs:

468   0.084816 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766488832 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
469   0.085046 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1298392849 Ack=3766488833 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361940 TSecr=240384 WS=64
470   0.085136 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766489333 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
471   0.085391 n2100 -> armada388 TCP nfs→hcp-wismar [RST, ACK] Seq=0 Ack=3766489334 Win=0 Len=0
472   0.085482 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766489762 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
473   0.085777 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766490263 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
474   0.086023 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766490694 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
...
526   0.104063 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766514610 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
527   0.104308 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766515182 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
528   0.104560 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766515621 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
529   0.104799 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766516122 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
530   0.105052 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766516556 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
...
550   0.110487 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3766525755 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=240384 TSecr=0 WS=128
551   0.114016 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1298403514 Ack=3766489763 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869361940 TSecr=240384 WS=64

followed by a storm of SYN,ACKs and resets.  This seems to happen several
times in the packet log.

At some point something starts going really crazy:

9341   2.668325 n2100 -> armada388 TCP nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
9342   2.668413 n2100 -> armada388 TCP [TCP Dup ACK 9341#1] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
9343   2.668423 n2100 -> armada388 TCP [TCP Dup ACK 9341#2] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
9344   2.668433 n2100 -> armada388 TCP [TCP Dup ACK 9341#3] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362196 TSecr=240615
...
9419   2.669586 n2100 -> armada388 TCP [TCP Dup ACK 9341#78] nfs→hcp-wismar [ACK] Seq=1337671519 Ack=3802565165 Win=28560 Len=0 TSval=869362198 TSecr=240615

There's no other packets logged between any of these duplicated acks and
I don't seem to have anything preceeding that with a sequence number of
380256516*, so I assume tcpdump dropped some packets.

Eventually, we get to:

31213   8.247815 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3892006879 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=241188 TSecr=0 WS=128
...
31337   8.297654 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891941177 Win=0 Len=0
31338   8.297753 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891942022 Win=0 Len=0
31339   8.297798 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891942867 Win=0 Len=0
31340   8.297835 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891943715 Win=0 Len=0
31341   8.297872 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891944560 Win=0 Len=0
31342   8.298076 armada388 -> n2100 TCP hcp-wismar→nfs [RST] Seq=3891945416 Win=0 Len=0
31343   8.300232 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1425940018 Ack=3892006880 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869362756 TSecr=241188 WS=64

31344   9.050382 armada388 -> n2100 TCP [TCP Port numbers reused] hcp-wismar→nfs [SYN] Seq=3892315104 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=241289 TSecr=0 WS=128
31345   9.050760 n2100 -> armada388 TCP nfs→hcp-wismar [SYN, ACK] Seq=1438481320 Ack=3892315105 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869362837 TSecr=241289 WS=64
31346   9.051115 armada388 -> n2100 TCP hcp-wismar→nfs [ACK] Seq=3892315105 Ack=1438481321 Win=28800 Len=0 TSval=241289 TSecr=869362837
31347   9.051624 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
31348   9.051878 n2100 -> armada388 TCP nfs→hcp-wismar [ACK] Seq=1438481321 Ack=3892315225 Win=28608 Len=0 TSval=869362837 TSecr=241289
31349   9.052164 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]

where things start working again.

It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
too, but I don't see that solving the multiple _concurrent_ connection
attempts with the same port number - presumably it's somehow trying to
make the same socket repeatedly connect despite a previous connection
being in progress, which would have nothing to do with cleaning up a
previous attempt.

To me, it looks like the sunrpc code isn't tracking whether a TCP
connection is already in progress, and waiting for the TCP connection
to either finish connecting, or fail to connect before attempting
another connection.
Russell King - ARM Linux Sept. 11, 2015, 3:06 p.m. UTC | #3
On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> too, but I don't see that solving the multiple _concurrent_ connection
> attempts with the same port number - presumably it's somehow trying to
> make the same socket repeatedly connect despite a previous connection
> being in progress, which would have nothing to do with cleaning up a
> previous attempt.

As I suspected, applying the above commit in addition does not solve the
problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
SYNACK SYN RSTACK, and eventual SYN storms.

I do have this captured as well:

2558   0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
2559   0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
2560   0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2561   0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001

The packet at 2561 looks wrong to me - this doesn't follow what I know
would be the standard TCP setup of syn, synack, ack, because that final
ack is in the wrong direction.

2562   0.834965 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228969 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2563   0.835156 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
2564   0.835218 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054229397 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2565   0.835420 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
2566   0.835488 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054229824 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2567   0.835681 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
2568   0.835753 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054230261 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2569   0.835971 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
...
2701   0.858226 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622247 TSecr=60001
2702   0.858270 n2100 -> armada388 TCP [TCP Dup ACK 2701#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622247 TSecr=60001
2703   0.858283 n2100 -> armada388 TCP [TCP Dup ACK 2701#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622247 TSecr=60001
...
2772   0.859373 n2100 -> armada388 TCP [TCP Dup ACK 2701#71] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622248 TSecr=60001
2773   0.892349 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054273361 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
2774   0.892725 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622252 TSecr=60001
...
2960   0.979230 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622259 TSecr=60001
2961   0.979278 n2100 -> armada388 TCP [TCP Dup ACK 2960#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622259 TSecr=60001
2962   0.979291 n2100 -> armada388 TCP [TCP Dup ACK 2960#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622259 TSecr=60001
...
3030   0.980329 n2100 -> armada388 TCP [TCP Dup ACK 2960#70] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622260 TSecr=60001
3031   0.980343 n2100 -> armada388 TCP [TCP Dup ACK 2960#71] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622260 TSecr=60001
3032   0.989005 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1056293352 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60018 TSecr=0 WS=128
3033   0.989462 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622262 TSecr=60001
...
3206   1.011270 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1056917791 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60022 TSecr=0 WS=128
3207   1.011721 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622263 TSecr=60001
3208   1.011765 n2100 -> armada388 TCP [TCP Dup ACK 3207#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622263 TSecr=60001
3209   1.011779 n2100 -> armada388 TCP [TCP Dup ACK 3207#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622263 TSecr=60001
...
3247   1.012250 n2100 -> armada388 TCP [TCP Dup ACK 3207#40] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622264 TSecr=60001
3248   1.012528 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1056918216 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60022 TSecr=0 WS=128
3249   1.012881 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622264 TSecr=60001
... eventually ...
3483   1.096726 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1057880967 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60028 TSecr=0 WS=128
3484   1.099249 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622271 TSecr=60001
3485   1.099368 n2100 -> armada388 TCP [TCP Dup ACK 3484#1] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622271 TSecr=60001
3486   1.099401 n2100 -> armada388 TCP [TCP Dup ACK 3484#2] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622271 TSecr=60001
...
3541   1.100250 n2100 -> armada388 TCP [TCP Dup ACK 3484#57] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622272 TSecr=60001
3542   1.100261 n2100 -> armada388 TCP [TCP Dup ACK 3484#58] nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622272 TSecr=60001
3543   1.109405 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1057881400 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60028 TSecr=0 WS=128
3544   1.109778 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622274 TSecr=60001
...
3555   1.111223 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1057883994 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60028 TSecr=0 WS=128
3556   1.111403 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622274 TSecr=60001
...
3604   1.190099 armada388 -> n2100 TCP rpasswd→nfs [RST] Seq=1053655488 Win=0 Len=0

It looks like the SYNs for the new connections provoke ACKs from an
existing connection but obviously different sequence number.  That
looks weird, shouldn't they be provoking resets or being ignored?

What if these SYN packets were spoofed from half-way across the net?
It seems like it could be used to provoke ACK traffic (with valid
sequence numbers) against a potential victim.  However, that's the
3.x kernel which is responding with those ACKs - let's put that to
one side for the time being until there's an answer for the NFS
problem.
Eric Dumazet Sept. 11, 2015, 3:18 p.m. UTC | #4
On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > too, but I don't see that solving the multiple _concurrent_ connection
> > attempts with the same port number - presumably it's somehow trying to
> > make the same socket repeatedly connect despite a previous connection
> > being in progress, which would have nothing to do with cleaning up a
> > previous attempt.
> 
> As I suspected, applying the above commit in addition does not solve the
> problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> SYNACK SYN RSTACK, and eventual SYN storms.
> 
> I do have this captured as well:
> 
> 2558   0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> 2559   0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> 2560   0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> 2561   0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
> 
> The packet at 2561 looks wrong to me - this doesn't follow what I know
> would be the standard TCP setup of syn, synack, ack, because that final
> ack is in the wrong direction.
> 

This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
created by packet 2558.

It receives a SYN packet (2560) that it interprets as a packet slightly
out of sequence (1054228544 being above 1053655487) for this SYN_RECV

The wrong packet is 2560, not 2561
Russell King - ARM Linux Sept. 11, 2015, 4:24 p.m. UTC | #5
On Fri, Sep 11, 2015 at 08:18:43AM -0700, Eric Dumazet wrote:
> On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > > too, but I don't see that solving the multiple _concurrent_ connection
> > > attempts with the same port number - presumably it's somehow trying to
> > > make the same socket repeatedly connect despite a previous connection
> > > being in progress, which would have nothing to do with cleaning up a
> > > previous attempt.
> > 
> > As I suspected, applying the above commit in addition does not solve the
> > problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> > SYNACK SYN RSTACK, and eventual SYN storms.
> > 
> > I do have this captured as well:
> > 
> > 2558   0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> > 2559   0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> > 2560   0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> > 2561   0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
> > 
> > The packet at 2561 looks wrong to me - this doesn't follow what I know
> > would be the standard TCP setup of syn, synack, ack, because that final
> > ack is in the wrong direction.
> > 
> 
> This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
> created by packet 2558.
> 
> It receives a SYN packet (2560) that it interprets as a packet slightly
> out of sequence (1054228544 being above 1053655487) for this SYN_RECV
> 
> The wrong packet is 2560, not 2561

Ok.

Looking deeper at the XPRT sunrpc code, I have to wonder about the
sanity of this:

void xprt_connect(struct rpc_task *task)
{
...
        if (!xprt_connected(xprt)) {
...
                if (test_bit(XPRT_CLOSING, &xprt->state))
                        return;
                if (xprt_test_and_set_connecting(xprt))
                        return;
                xprt->stat.connect_start = jiffies;
                xprt->ops->connect(xprt, task);

That calls into xs_connect(), which schedules a workqueue to do the
connection.  The workqueue will call xs_tcp_setup_socket().

xs_tcp_setup_socket() creates a socket if one didn't exist, otherwise
re-using the previously obtained socket (which'll be why its using the
same socket) and then goes on to call xs_tcp_finish_connecting().

xs_tcp_finish_connecting() calls kernel_connect(), which will return
-EINPROGRESS.  We seem to treat EINPROGRESS as if the connection was
successful:

        case 0:
        case -EINPROGRESS:
        case -EALREADY:
                xprt_unlock_connect(xprt, transport);
                xprt_clear_connecting(xprt);
                return;

and the xprt_clear_connecting() results in this whole path being
re-opened: the socket is not yet connected, so xprt_connected() will
return false, and despite the socket connection still being mid-way
through being connected, we clear the "connecting" status, causing
xprt_test_and_set_connecting() to return false.

That allows us to re-call xprt->ops->connect, re-queue the connect
worker, and re-run the call to kernel_connect() for a socket which is
already mid-way through being connected.

Shouldn't the "connecting" status only be cleared when either the socket
has _finished_ connecting, or when the connection has _failed_ to connect,
and not when it's mid-way through connecting?

I've not been able to prove this: I've set rpc_debug to 129 to log
just xprt and trans RPC facilities, and that's sufficient to change
the timing such that this doesn't happen.
diff mbox

Patch

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 214ca9dfb14e..7be90bc1a7c2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -822,6 +822,8 @@  static void xs_reset_transport(struct sock_xprt *transport)
        if (atomic_read(&transport->xprt.swapper))
                sk_clear_memalloc(sk);
 
+       kernel_sock_shutdown(sock, SHUT_RDWR);
+
        write_lock_bh(&sk->sk_callback_lock);
        transport->inet = NULL;
        transport->sock = NULL;