Message ID | 1467385815-6357-1-git-send-email-jbaron@akamai.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |

> On Mac OSX 10.11.5 (latest version), we have found that when tcp
> connections are abruptly terminated (via ^C), a FIN is sent followed
> by an RST packet. The RST is sent with the same sequence number as the
> FIN, and thus dropped since the stack only accepts RST packets matching
> rcv_nxt (RFC 5961).

The Linux behaviour appears to be correct, and accepting a broken RST is
most definitely a bad idea; it has a very different effect to a FIN
because there can be data going the other direction and a FIN is a
one-way close.

Alan
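Alan's point rests on RFC 5961 section 3.2: an RST resets the connection only if its sequence number exactly equals RCV.NXT, an in-window but inexact RST draws a challenge ACK, and an out-of-window RST is silently dropped. Below is a minimal sketch of that classification, assuming a plain receive window [rcv_nxt, rcv_nxt + rcv_wnd); `rst_action`, `seq_before()` and `rfc5961_rst_action()` are hypothetical names, and this models the RFC rather than Linux's `tcp_validate_incoming()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: a model of RFC 5961 section 3.2, not Linux code. */
enum rst_action { RST_RESET, RST_CHALLENGE_ACK, RST_DROP };

/* Serial-number comparison (same idea as the kernel's before()):
 * true if a precedes b modulo 2^32. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static enum rst_action rfc5961_rst_action(uint32_t seg_seq, uint32_t rcv_nxt,
					   uint32_t rcv_wnd)
{
	/* Serial comparison is only meaningful for windows below 2^31. */
	assert(rcv_wnd < 0x80000000u);

	if (seg_seq == rcv_nxt)
		return RST_RESET;         /* exact match: reset the connection */
	if (!seq_before(seg_seq, rcv_nxt) &&
	    seq_before(seg_seq, rcv_nxt + rcv_wnd))
		return RST_CHALLENGE_ACK; /* in window but inexact: challenge ACK */
	return RST_DROP;                  /* outside the window: silently drop */
}
```

Because a FIN consumes one sequence number, receiving a FIN at, say, sequence 1000 moves RCV.NXT to 1001. An RST that still carries 1000 then sits just left of the window, so `rfc5961_rst_action(1000, 1001, 65535)` falls into the drop case, which is the situation described in this thread.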

On 07/01/2016 08:10 AM, Jason Baron wrote:
> I'm wondering if anybody else has run into this...
>
> On Mac OSX 10.11.5 (latest version), we have found that when tcp
> connections are abruptly terminated (via ^C), a FIN is sent followed
> by an RST packet.

That just seems, well, silly. If the client application wants to use
abortive close (sigh..) it should do so; there shouldn't be this
little-bit-pregnant, correct close initiation (FIN) followed by a RST.

> The RST is sent with the same sequence number as the
> FIN, and thus dropped since the stack only accepts RST packets matching
> rcv_nxt (RFC 5961). This could also be resolved if Mac OSX replied with
> an RST on the closed socket, but it appears that it does not.
>
> The workaround here is then to reset the connection, if the RST is
> equal to rcv_nxt - 1, if we have already received a FIN.
>
> The RST attack surface is limited b/c we only accept the RST after we've
> accepted a FIN and have not previously sent a FIN and received back the
> corresponding ACK. In other words RST is only accepted in the tcp
> states: TCP_CLOSE_WAIT, TCP_LAST_ACK, and TCP_CLOSING.
>
> I'm interested if anybody else has run into this issue. It's problematic
> since it takes up server resources for sockets sitting in TCP_CLOSE_WAIT.

Isn't the server application expected to act on the read return of zero
(which is supposed to be) triggered by the receipt of the FIN segment?

rick jones

> We are also in the process of contacting Apple to see what can be done
> here...workaround patch is below.
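Rick's suggestion is that the server treat `read()` returning 0 as the peer's FIN and act on it. A minimal POSIX sketch of that pattern follows, assuming a blocking stream socket; `drain_until_eof()` is a hypothetical helper. It waits with `POLLIN` only, because end-of-file also makes a socket readable; on Linux, the `POLLRDHUP` event Jason mentions can additionally be requested (it needs `_GNU_SOURCE`) to detect the half-close without reading.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h> /* socketpair()/shutdown(), handy for exercising the helper */
#include <unistd.h>

/*
 * Hypothetical helper: wait for input on a stream socket and consume it
 * until read() reports end-of-file (the peer's FIN) or no new input
 * arrives within timeout_ms. *nread receives the bytes consumed.
 * Returns 1 on EOF, 0 on timeout before EOF, -1 on error.
 */
static int drain_until_eof(int fd, int timeout_ms, size_t *nread)
{
	char buf[4096];

	assert(nread != NULL);
	*nread = 0;
	for (;;) {
		/* EOF makes the socket readable, so POLLIN is enough here. */
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int n = poll(&pfd, 1, timeout_ms);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return 0; /* timed out; the peer has not closed yet */

		ssize_t r = read(fd, buf, sizeof(buf));
		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (r == 0)
			return 1; /* peer sent FIN: finish any writes, then close() */
		*nread += (size_t)r;
	}
}
```

With `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)` and `shutdown(sv[0], SHUT_WR)` standing in for the client's FIN, `drain_until_eof(sv[1], 1000, &n)` consumes the buffered bytes and returns 1. Note that EOF only closes the peer's direction: as Jason points out in his reply, a server with more to send may keep writing, and it is the peer's RST that is expected to tear the connection down after that.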

On 07/01/2016 01:08 PM, Rick Jones wrote:
> On 07/01/2016 08:10 AM, Jason Baron wrote:
>> I'm wondering if anybody else has run into this...
>>
>> On Mac OSX 10.11.5 (latest version), we have found that when tcp
>> connections are abruptly terminated (via ^C), a FIN is sent followed
>> by an RST packet.
>
> That just seems, well, silly. If the client application wants to use
> abortive close (sigh..) it should do so, there shouldn't be this
> little-bit-pregnant, correct close initiation (FIN) followed by a RST.
>
>> The RST is sent with the same sequence number as the
>> FIN, and thus dropped since the stack only accepts RST packets matching
>> rcv_nxt (RFC 5961). This could also be resolved if Mac OSX replied with
>> an RST on the closed socket, but it appears that it does not.
>>
>> The workaround here is then to reset the connection, if the RST is
>> equal to rcv_nxt - 1, if we have already received a FIN.
>>
>> The RST attack surface is limited b/c we only accept the RST after we've
>> accepted a FIN and have not previously sent a FIN and received back the
>> corresponding ACK. In other words RST is only accepted in the tcp
>> states: TCP_CLOSE_WAIT, TCP_LAST_ACK, and TCP_CLOSING.
>>
>> I'm interested if anybody else has run into this issue. It's problematic
>> since it takes up server resources for sockets sitting in TCP_CLOSE_WAIT.
>
> Isn't the server application expected to act on the read return of zero
> (which is supposed to be) triggered by the receipt of the FIN segment?
>

yes, we do in fact see a POLLRDHUP from the FIN in this case and
read of zero, but we still have more data to write to the socket, and
b/c the RST is dropped here, the socket stays in TIME_WAIT until
things eventually time out...

Thanks,

-Jason

> rick jones
>
>> We are also in the process of contacting Apple to see what can be done
>> here...workaround patch is below.

> yes, we do in fact see a POLLRDHUP from the FIN in this case and
> read of zero, but we still have more data to write to the socket, and
> b/c the RST is dropped here, the socket stays in TIME_WAIT until
> things eventually time out...

After the FIN when you send/retransmit your next segment do you then get
a valid RST back from the Mac end?

Alan

On 07/01/2016 02:16 PM, One Thousand Gnomes wrote:
>> yes, we do in fact see a POLLRDHUP from the FIN in this case and
>> read of zero, but we still have more data to write to the socket, and
>> b/c the RST is dropped here, the socket stays in TIME_WAIT until
>> things eventually time out...
>
> After the FIN when you send/retransmit your next segment do you then get
> a valid RST back from the Mac end?
>
> Alan
>

No, we only get the single RST after the FIN from the Mac side, which is
dropped. I would have expected the RST from the Mac after the retransmits,
but we don't see any further transmits from the Mac. And the Linux socket
stays in CLOSE-WAIT (I mistakenly said TIME_WAIT above).

For reference, I put the packet exchange in my initial mail.

Thanks,

-Jason

Hi,

After looking at this further we found that there is actually a rate
limit on RST packets sent by OSX on a closed socket. It's set to 250 per
second and controlled via: net.inet.icmp.icmplim. Increasing that limit
resolves the issue, but the default is apparently 250.

Thanks,

-Jason

On 07/01/2016 02:16 PM, One Thousand Gnomes wrote:
>> yes, we do in fact see a POLLRDHUP from the FIN in this case and
>> read of zero, but we still have more data to write to the socket, and
>> b/c the RST is dropped here, the socket stays in TIME_WAIT until
>> things eventually time out...
> After the FIN when you send/retransmit your next segment do you then get
> a valid RST back from the Mac end?
>
> Alan
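Jason's finding accounts for the missing RSTs: with a per-second cap on RSTs for closed sockets, any beyond the cap within the same second are silently suppressed, so a busy server's retransmissions may go unanswered. As an illustration only, here is a generic fixed-window counter; `struct rst_limiter` and `rst_limiter_allow()` are hypothetical, and this is not how XNU implements `net.inet.icmp.icmplim` internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical fixed-window limiter, used only to illustrate a per-second
 * cap such as the reported default of 250 RSTs per second. Not XNU's code.
 */
struct rst_limiter {
	uint32_t limit;        /* max RSTs per one-second window */
	uint64_t window_start; /* start of the current window, in ms */
	uint32_t sent;         /* RSTs emitted in the current window */
};

/* Returns true if an RST may be sent at now_ms, false if it is suppressed. */
static bool rst_limiter_allow(struct rst_limiter *rl, uint64_t now_ms)
{
	assert(rl != NULL);

	/* Start a fresh window once a full second has elapsed. */
	if (now_ms - rl->window_start >= 1000) {
		rl->window_start = now_ms;
		rl->sent = 0;
	}
	if (rl->sent >= rl->limit)
		return false; /* suppressed: the peer never sees this RST */
	rl->sent++;
	return true;
}
```

With `limit = 250`, the 251st and later calls inside one window return false, and sending resumes once the window rolls over. Raising the limit, as Jason reports doing through the sysctl, only moves the point at which RSTs start being dropped.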

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 94d4aff97523..b3c55b91140c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5155,6 +5155,25 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 	return err;
 }
 
+/*
+ * Mac OSX 10.11.5 can send a FIN followed by a RST where the RST
+ * has the same sequence number as the FIN. This is not compliant
+ * with RFC 5961, but ends up in a number of sockets tied up mostly
+ * in TCP_CLOSE_WAIT. The rst attack surface is limited b/c we only
+ * accept the RST after we've accepted a FIN and have not previously
+ * sent a FIN and received back the corresponding ACK.
+ */
+static bool tcp_fin_rst_check(struct sock *sk, struct sk_buff *skb)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	return unlikely((TCP_SKB_CB(skb)->seq == (tp->rcv_nxt - 1)) &&
+			(TCP_SKB_CB(skb)->end_seq == (tp->rcv_nxt - 1)) &&
+			(sk->sk_state == TCP_CLOSE_WAIT ||
+			 sk->sk_state == TCP_LAST_ACK ||
+			 sk->sk_state == TCP_CLOSING));
+}
+
 /* Does PAWS and seqno based validation of an incoming segment, flags will
  * play significant role here.
  */
@@ -5193,7 +5212,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 						  LINUX_MIB_TCPACKSKIPPEDSEQ,
 						  &tp->last_oow_ack_time))
 				tcp_send_dupack(sk, skb);
-		}
+		} else if (tcp_fin_rst_check(sk, skb))
+			tcp_reset(sk);
 		goto discard;
 	}
 
@@ -5206,7 +5226,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 		 * else
 		 *     Send a challenge ACK
 		 */
-		if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
+		if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt ||
+		    tcp_fin_rst_check(sk, skb)) {
 			rst_seq_match = true;
 		} else if (tcp_is_sack(tp) && tp->rx_opt.num_sacks > 0) {
 			struct tcp_sack_block *sp = &tp->selective_acks[0];
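To see exactly which segments the patch's `tcp_fin_rst_check()` lets through, its predicate can be lifted into user space. This is a sketch, not kernel code: `enum tcp_state_model` and `fin_rst_check_model()` are hypothetical stand-ins for `sk->sk_state` and `TCP_SKB_CB(skb)->seq`/`end_seq`, and only the comparison logic is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's TCP state constants. */
enum tcp_state_model {
	MODEL_ESTABLISHED,
	MODEL_FIN_WAIT1,
	MODEL_CLOSE_WAIT,
	MODEL_LAST_ACK,
	MODEL_CLOSING,
};

/*
 * Mirrors the predicate in tcp_fin_rst_check(): a zero-length RST
 * (seq == end_seq) carrying the FIN's sequence number, rcv_nxt - 1, is
 * accepted only after the peer's FIN was received and before our own
 * FIN, if any, has been ACKed. uint32_t arithmetic wraps modulo 2^32,
 * like the kernel's u32 sequence numbers.
 */
static bool fin_rst_check_model(uint32_t seg_seq, uint32_t seg_end_seq,
				uint32_t rcv_nxt, enum tcp_state_model state)
{
	uint32_t fin_seq = rcv_nxt - 1;

	return seg_seq == fin_seq && seg_end_seq == fin_seq &&
	       (state == MODEL_CLOSE_WAIT ||
		state == MODEL_LAST_ACK ||
		state == MODEL_CLOSING);
}
```

For example, if the Mac's FIN occupies sequence number 1000, rcv_nxt becomes 1001 and the trailing RST at 1000 satisfies the predicate in TCP_CLOSE_WAIT. In the patch, the same predicate is consulted on both paths through `tcp_validate_incoming()`: in the branch that discards out-of-window segments, and in the RST exact-match test.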