diff mbox

Bug in tcp timestamp option ? TSecr in SYN-ACK != TSval in SYN

Message ID 1432733597.4060.375.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet May 27, 2015, 1:33 p.m. UTC
On Wed, 2015-05-27 at 00:09 -0700, Gopakumar Choorakkot Edakkunni wrote:
> Hi Eric,
> 
> Thanks a lot for the response, and sorry about the 3-times-email, was
> not sure whether majordomo accepted my subscription or not and hence
> the retx :)
> 
> So if a sequence happens as below
> 
> 1. Client sends the first SYN with some TSval X
> 2. Server responds SYN-ACK with TSecr X
> 3. The SYN-ACK just gets dropped on the way back to the client
> 4. Client sends a SYN retry after N seconds with a new TSval Y
> 5. Server responds SYN-ACK with TSecr X

This not what happens.

Here is the problem I think :

1. Client sends the first SYN with some TSval X

Then application canceled the connect(), like doing a close() or exit()
or core dump.

RST packet is sent. But lost by the network.

<~20 seconds later>  port is reused by client doing another
socket()/connect(same target). We have only ~30000 available source
port, so they are going to be reused at some point, depending on the
number of ports in use.

1. Client sends another SYN with Tsval X+20000.


2. Server responds SYN-ACK with TSecr X because it did not forget about
original SYN.
3. The SYN-ACK is dropped by client because of PAWS (RFC 7323)

> 
> And if there is some firewall in between in the amazon environment
> where the firewall expects to see the SYN-ACK with TSecr Y, then I
> guess it matches the problem I saw ? In my case clearly the SYN-ACKs
> never reached the client no matter how many times they were
> retransmitted. So this would mean that if there is such a wierd
> firewall in between, then one missing SYN-ACK can cause the tcp
> connection to eventually timeout ! This of course is just guesswork
> based on what we saw as the behaviour from tcpdump on server and
> client side when the timeouts were happening. Does this sound like a
> possibility - has anyone come across "interesting" firewalls like this
> ?
> 
> And about your question: "Are you establishing many active sessions
> per minute to this particular target ?" - in my particular case there
> were not more than three linux client boxes sitting behind a NAT
> (sharing the same public IP) and talking to the same server. And each
> client box opens a tcp socket once in 30 seconds to the server. So the
> number of active sessions per 30 seconds is not more than 3 sessions.
> Now if the NAT device had some bug and ended up NAT-ing more than one
> client SYN packet to the same source port, then of course thats
> another theory for why this TSecr/TSval mismatch can happen (other
> than the SYN-ACK drop theory above).

I really not think a NAT is  the problem here.

The problem is in linux code itself. Please try the patch I sent ?
(On the client)




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet May 27, 2015, 1:45 p.m. UTC | #1
On Wed, 2015-05-27 at 06:33 -0700, Eric Dumazet wrote:

> The problem is in linux code itself. Please try the patch I sent ?
> (On the client)

On the server, sorry for the typo.

> 
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index df7fe3c31162e77b96f81399ef7d893485ab3d91..70db6572d241e132c28c381dfc1155b150c9557b 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -588,6 +588,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>  	if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
>  	    flg == TCP_FLAG_SYN &&
>  	    !paws_reject) {
> +		if (tmp_opt.saw_tstamp &&
> +		    after(tmp_opt.rcv_tsval, req->ts_recent))
> +			req->ts_recent = tmp_opt.rcv_tsval;
>  		/*
>  		 * RFC793 draws (Incorrectly! It was fixed in RFC1122)
>  		 * this case on figure 6 and figure 8, but formal


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gopakumar Choorakkot Edakkunni May 27, 2015, 5:29 p.m. UTC | #2
Thanks Eric. I will give this a spin. The issue doesnt happen all the
time, I can keep the server running with this patch for a while and
observe if the issue resurfaces or not

Rgds,
Gopa.

On Wed, May 27, 2015 at 6:45 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-05-27 at 06:33 -0700, Eric Dumazet wrote:
>
>> The problem is in linux code itself. Please try the patch I sent ?
>> (On the client)
>
> On the server, sorry for the typo.
>
>>
>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
>> index df7fe3c31162e77b96f81399ef7d893485ab3d91..70db6572d241e132c28c381dfc1155b150c9557b 100644
>> --- a/net/ipv4/tcp_minisocks.c
>> +++ b/net/ipv4/tcp_minisocks.c
>> @@ -588,6 +588,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>>       if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
>>           flg == TCP_FLAG_SYN &&
>>           !paws_reject) {
>> +             if (tmp_opt.saw_tstamp &&
>> +                 after(tmp_opt.rcv_tsval, req->ts_recent))
>> +                     req->ts_recent = tmp_opt.rcv_tsval;
>>               /*
>>                * RFC793 draws (Incorrectly! It was fixed in RFC1122)
>>                * this case on figure 6 and figure 8, but formal
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 27, 2015, 5:57 p.m. UTC | #3
On Wed, 2015-05-27 at 10:29 -0700, Gopakumar Choorakkot Edakkunni wrote:
> Thanks Eric. I will give this a spin. The issue doesnt happen all the
> time, I can keep the server running with this patch for a while and
> observe if the issue resurfaces or not

Note that if the traffic on the server is low, you might want to keep a
tcpdump running (capturing only headers for TCP port 443) for further
analysis.

Thanks !



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 27, 2015, 6:02 p.m. UTC | #4
On Wed, 2015-05-27 at 10:57 -0700, Eric Dumazet wrote:
> On Wed, 2015-05-27 at 10:29 -0700, Gopakumar Choorakkot Edakkunni wrote:
> > Thanks Eric. I will give this a spin. The issue doesnt happen all the
> > time, I can keep the server running with this patch for a while and
> > observe if the issue resurfaces or not
> 
> Note that if the traffic on the server is low, you might want to keep a
> tcpdump running (capturing only headers for TCP port 443) for further
> analysis.

We wonder if this could be a time-wait issue

Could you check on the server :

sysctl net.ipv4.tcp_tw_recycle
sysctl net.ipv4.tcp_tw_reuse

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gopakumar Choorakkot Edakkunni May 27, 2015, 6:15 p.m. UTC | #5
Doesnt seem so. This is the output from one of the servers I have
where I periodically hit this TSval != TSecr condition.

ubuntu@server:~$ sudo su
root@server:/home/ubuntu# sysctl net.ipv4.tcp_tw_recycle
net.ipv4.tcp_tw_recycle = 0
root@server:/home/ubuntu#  sysctl net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_reuse = 0

Also I dont think your theory about port-reuse is what is happening
either - because as I mentioned, its a very very lightly loaded
server, and the client is also very very light weight in terms of tcp
connections (makes one connection every 30 seconds to this one server
and no one else). But the reason I mentioned the SYN-ACK-lost +
SYN-retry theory is because the client<-->server is over internet via
standard broadband links shared across multiple people and hence can
be quite lossy.

At any rate, whatever is the cause behind this, I guess what you
mentioned still holds good - that the tcp stack needs to update the
saved TSval to that of the latest SYN, right ?

Rgds,
Gopa.


On Wed, May 27, 2015 at 11:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-05-27 at 10:57 -0700, Eric Dumazet wrote:
>> On Wed, 2015-05-27 at 10:29 -0700, Gopakumar Choorakkot Edakkunni wrote:
>> > Thanks Eric. I will give this a spin. The issue doesnt happen all the
>> > time, I can keep the server running with this patch for a while and
>> > observe if the issue resurfaces or not
>>
>> Note that if the traffic on the server is low, you might want to keep a
>> tcpdump running (capturing only headers for TCP port 443) for further
>> analysis.
>
> We wonder if this could be a time-wait issue
>
> Could you check on the server :
>
> sysctl net.ipv4.tcp_tw_recycle
> sysctl net.ipv4.tcp_tw_reuse
>
> Thanks
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 27, 2015, 9:05 p.m. UTC | #6
On Wed, 2015-05-27 at 11:15 -0700, Gopakumar Choorakkot Edakkunni wrote:
> Doesnt seem so. This is the output from one of the servers I have
> where I periodically hit this TSval != TSecr condition.
> 
> ubuntu@server:~$ sudo su
> root@server:/home/ubuntu# sysctl net.ipv4.tcp_tw_recycle
> net.ipv4.tcp_tw_recycle = 0
> root@server:/home/ubuntu#  sysctl net.ipv4.tcp_tw_reuse
> net.ipv4.tcp_tw_reuse = 0
> 
> Also I dont think your theory about port-reuse is what is happening
> either - because as I mentioned, its a very very lightly loaded
> server, and the client is also very very light weight in terms of tcp
> connections (makes one connection every 30 seconds to this one server
> and no one else). But the reason I mentioned the SYN-ACK-lost +
> SYN-retry theory is because the client<-->server is over internet via
> standard broadband links shared across multiple people and hence can
> be quite lossy.
> 
> At any rate, whatever is the cause behind this, I guess what you
> mentioned still holds good - that the tcp stack needs to update the
> saved TSval to that of the latest SYN, right ?

Well, considering ISN is the same, I really doubt these are different
sessions.

tcpdump traces were taken on the server ?

So far, nothing really calls for SYNACK rtx carrying different TSecr,
RFC says nothing about this case, so either choice is valid.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index df7fe3c31162e77b96f81399ef7d893485ab3d91..70db6572d241e132c28c381dfc1155b150c9557b 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -588,6 +588,9 @@  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
 	    flg == TCP_FLAG_SYN &&
 	    !paws_reject) {
+		if (tmp_opt.saw_tstamp &&
+		    after(tmp_opt.rcv_tsval, req->ts_recent))
+			req->ts_recent = tmp_opt.rcv_tsval;
 		/*
 		 * RFC793 draws (Incorrectly! It was fixed in RFC1122)
 		 * this case on figure 6 and figure 8, but formal