Avoid potentially erroneous RST drop.

Message ID 20210430093601.zibczc4cjnwx3qwn@Fryzen495
State Changes Requested
Delegated to: Pablo Neira

Commit Message

Ali Abdallah April 30, 2021, 9:36 a.m. UTC
In ignore state, we let the SYN go through in the original direction.
The server might respond with RST/ACK, and that RST packet is
erroneously dropped because the flag IP_CT_TCP_FLAG_MAXACK_SET is
already set. So we reset the flag in this case.

Unfortunately that might not be enough: an out-of-order ACK in the
original direction might set the flag again, and we might once more
end up dropping a valid RST when the server responds with RST SEQ=0.

The patch therefore also disables the RST check when we are not in
established state and we receive an RST with SEQ=0, which is most
likely a response to a SYN we had let through.

Signed-off-by: Ali Abdallah <aabdallah@suse.de>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
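
To make the two steps in the commit message easier to follow, here is a
minimal user-space sketch of the check as this patch changes it. It is
not the kernel code: every name prefixed with "model_" is hypothetical,
the peer structure is a stand-in for ct->proto.tcp.seen[], and a result
of MODEL_ACCEPT only means "not rejected by this particular check" (the
real function goes on to run further checks).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; they do not exist in the kernel. */
enum model_verdict { MODEL_ACCEPT, MODEL_INVALID };

struct model_peer {
	bool     maxack_set; /* stands in for IP_CT_TCP_FLAG_MAXACK_SET */
	uint32_t td_maxack;  /* highest ACK recorded for this direction */
};

/* Wrap-safe "a comes before b" on 32-bit sequence numbers. */
static bool model_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* First hunk: a SYN let through in ignore state clears the flag
 * for the direction that sent it.
 */
static void model_syn_ignored(struct model_peer *sender)
{
	sender->maxack_set = false;
}

/* Second hunk: RST check against the opposite direction's state. */
static enum model_verdict model_check_rst(const struct model_peer *other,
					  uint32_t seq, bool established)
{
	if (!other->maxack_set)
		return MODEL_ACCEPT;   /* no max ACK recorded, nothing to compare */

	if (seq == 0 && !established)
		return MODEL_ACCEPT;   /* most likely the answer to an ignored SYN */

	if (model_before(seq, other->td_maxack))
		return MODEL_INVALID;  /* RST is older than the highest ACK seen */

	return MODEL_ACCEPT;
}
```

With a client state of { maxack_set = true, td_maxack = 1000 }, an RST
with SEQ=0 passes while the connection is not established, but an RST
with SEQ=500 is still rejected until model_syn_ignored() clears the flag.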

Comments

Florian Westphal May 5, 2021, 7:53 p.m. UTC | #1
Ali Abdallah <ali.abdallah@suse.com> wrote:
> In ignore state, we let SYN goes in original, the server might respond
> with RST/ACK, and that RST packet is erroneously dropped because of the
> flag IP_CT_TCP_FLAG_MAXACK_SET being already set. So we reset the flag
> in this case.
> 
> Unfortunately that might not be enough, an out of order ACK in origin
> might reset it back, and we might end up again dropping a valid RST when
> the server responds with RST SEQ=0.
> 
> The patch disables also the RST check when we are not in established
> state and we receive an RST with SEQ=0 that is most likely a response to
> a SYN we had let it go through.

Ali, sorry for coming back to this again and again.

What do you think of this change?

It's an incremental change on top of your patch.

The only real change is that this will skip the RST sequence check if
conntrack thinks the connection is closing already.

In addition, the TCP window check is skipped in that case.

This is supposed to expedite conntrack eviction in case of tuple reuse
by some nat/pat middlebox, or a peer that has lower timeouts than
conntrack before a port is re-used.

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -834,6 +834,22 @@ static noinline bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
 	return true;
 }
 
+static bool tcp_can_early_drop(const struct nf_conn *ct)
+{
+	switch (ct->proto.tcp.state) {
+	case TCP_CONNTRACK_FIN_WAIT:
+	case TCP_CONNTRACK_LAST_ACK:
+	case TCP_CONNTRACK_TIME_WAIT:
+	case TCP_CONNTRACK_CLOSE:
+	case TCP_CONNTRACK_CLOSE_WAIT:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 /* Returns verdict for packet, or -1 for invalid. */
 int nf_conntrack_tcp_packet(struct nf_conn *ct,
 			    struct sk_buff *skb,
@@ -1053,8 +1069,16 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 			/* If we are not in established state, and an RST is
 			 * observed with SEQ=0, this is most likely an answer
 			 * to a SYN we had let go through above.
+			 *
+			 * Also expedite conntrack destruction: If we were already
+			 * closing, peer or NAT/PAT might already have reused tuple.
 			 */
-			if (seq == 0 && !nf_conntrack_tcp_established(ct))
+			if (!nf_conntrack_tcp_established(ct)) {
+				if (seq == 0 || tcp_can_early_drop(ct))
+					goto in_window;
+			}
+
+			if (seq == ct->proto.tcp.seen[!dir].td_maxack)
 				break;
 
 			if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
@@ -1066,10 +1090,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 				return -NF_ACCEPT;
 			}
 
-			if (!nf_conntrack_tcp_established(ct) ||
-			    seq == ct->proto.tcp.seen[!dir].td_maxack)
-				break;
-
 			/* Check if rst is part of train, such as
 			 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
 			 *   foo:80 > bar:4379: R, 235946602:235946602(0)  ack 42
@@ -1181,22 +1201,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	return NF_ACCEPT;
 }
 
-static bool tcp_can_early_drop(const struct nf_conn *ct)
-{
-	switch (ct->proto.tcp.state) {
-	case TCP_CONNTRACK_FIN_WAIT:
-	case TCP_CONNTRACK_LAST_ACK:
-	case TCP_CONNTRACK_TIME_WAIT:
-	case TCP_CONNTRACK_CLOSE:
-	case TCP_CONNTRACK_CLOSE_WAIT:
-		return true;
-	default:
-		break;
-	}
-
-	return false;
-}
-
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 
 #include <linux/netfilter/nfnetlink.h>
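
The control flow of the revised RST hunk above can be summarised with a
small user-space sketch. It is only a model under stated assumptions:
all "model_" names are hypothetical, the caller is assumed to have
already found IP_CT_TCP_FLAG_MAXACK_SET set for the other direction, and
"established" is reduced to a plain state comparison, which may be
simpler than what nf_conntrack_tcp_established() actually tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum model_state {
	MODEL_SYN_SENT,
	MODEL_ESTABLISHED,
	MODEL_FIN_WAIT,
	MODEL_CLOSE_WAIT,
	MODEL_LAST_ACK,
	MODEL_TIME_WAIT,
	MODEL_CLOSE,
};

enum model_rst_path {
	MODEL_SKIP_WINDOW, /* "goto in_window": no sequence or window check */
	MODEL_NORMAL,      /* "break": continue with regular processing */
	MODEL_INVALID,     /* RST rejected as invalid */
	MODEL_MORE_CHECKS, /* falls through to the "part of train" check */
};

/* Wrap-safe "a comes before b" on 32-bit sequence numbers. */
static bool model_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Mirrors the state list of tcp_can_early_drop() in the diff. */
static bool model_closing(enum model_state s)
{
	switch (s) {
	case MODEL_FIN_WAIT:
	case MODEL_LAST_ACK:
	case MODEL_TIME_WAIT:
	case MODEL_CLOSE:
	case MODEL_CLOSE_WAIT:
		return true;
	default:
		return false;
	}
}

/* Decide which path an RST takes, in the same order as the revised hunk. */
static enum model_rst_path model_classify_rst(enum model_state s,
					      uint32_t seq,
					      uint32_t td_maxack)
{
	if (s != MODEL_ESTABLISHED) {
		if (seq == 0 || model_closing(s))
			return MODEL_SKIP_WINDOW;
	}

	if (seq == td_maxack)
		return MODEL_NORMAL;

	if (model_before(seq, td_maxack))
		return MODEL_INVALID;

	return MODEL_MORE_CHECKS;
}
```

For example, model_classify_rst(MODEL_TIME_WAIT, 5, 1000) takes the
in_window shortcut, whereas the same RST on an established connection is
classified as invalid because its sequence number is before td_maxack.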
Ali Abdallah May 6, 2021, 7:33 a.m. UTC | #2
On 05.05.2021 21:53, Florian Westphal wrote:
> Ali, sorry for coming back to this again and again.
> 
> What do you think of this change?
> 
> Its an incremental change on top of your patch.
> 
> The only real change is that this will skip window check if
> conntrack thinks connection is closing already.
> 
> In addition, tcp window check is skipped in that case.
> 
> This is supposed to expedite conntrack eviction in case of tuple reuse
> by some nat/pat middlebox, or a peer that has lower timeouts than
> conntrack before a port is re-used.

Thanks Florian, this looks sane to me. I will give it a try and report
back here.
Ali Abdallah May 19, 2021, 12:07 p.m. UTC | #3
On 05.05.2021 21:53, Florian Westphal wrote:
> Ali, sorry for coming back to this again and again.
> 
> What do you think of this change?

Hi Florian, I tested your patch and it solved the issue, no more NFS
hangs due to dropped RSTs. Please include it, together with the
following two patches I previously sent:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/

Thanks a lot!

Florian Westphal May 19, 2021, 12:23 p.m. UTC | #4
Ali Abdallah <ali.abdallah@suse.com> wrote:
> On 05.05.2021 21:53, Florian Westphal wrote:
> > Ali, sorry for coming back to this again and again.
> > 
> > What do you think of this change?
> 
> Hi Florian, I tested your patch and it solved the issue, no more NFS
> hangs due to dropped RSTs. Please include it, together with the
> following two patches I previously sent:
> 
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/

Do we still need this one after this revised patch?
If we do, the help text has to be fixed: after your patch, be-liberal
turns off all sequence number/window checks.  The revised text implies
it only has to do with RSTs.

An alternative would be to add another sysctl, or to turn the existing
sysctl into an integer (0: off; 1: current behaviour, sequence check on
for RST only; 2: off for everything).

> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/

I will send this patch for inclusion tomorrow or later today.

Pablo, please mark both patches as "Changes Requested".

I will deal with the 2nd patch and will resend it, with the more liberal
handling of RST when the conntrack entry is closing.

Ali, if you still think the first patch is required please submit a new
version with at least a revised help text.
Pablo Neira Ayuso May 19, 2021, 10:16 p.m. UTC | #5
On Wed, May 19, 2021 at 02:23:32PM +0200, Florian Westphal wrote:
> Ali Abdallah <ali.abdallah@suse.com> wrote:
[...]
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/
> 
> I will send this patch for inclusion tomorrow or later today.
> 
> Pablo, please mark both patches as "Changes Requested".

Done.
Ali Abdallah May 21, 2021, 8 a.m. UTC | #6
On 19.05.2021 14:23, Florian Westphal wrote:
> > Hi Florian, I tested your patch and it solved the issue, no more NFS
> > hangs due to dropped RSTs. Please include it, together with the
> > following two patches I previously sent:
> > 
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
> 
> Do we still need this one after this revised patch?
> If we do, the help text has to be fixed, after your patch, be-liberal
> turns off all sequence number/window checks.  The revised text implies
> it only has to do with RSTs.
> 
> Alternative would be to add another sysctl, or turn the existing sysctl
> into integer (0, off, 1 current behaviour (sequence check on for rst
> only, 2 off for everything).

I would still like to make the RST sequence number check optional.  I
think it is a good idea to use 0, 1, and > 1 (off for everything), which
keeps the current behaviour when tcp_be_liberal is set to 1.

I will send another patch, also with revised help text.

Many thanks.
Ali
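
For reference, the integer semantics discussed above could be sketched
as follows. This only illustrates the proposal as worded in this thread,
not an existing or merged implementation; the helper names are
hypothetical, and treating negative values like 0 is an assumption.

```c
#include <stdbool.h>

/* Hypothetical helpers modelling the proposed tcp_be_liberal values:
 *   0   - strict: window checks and RST sequence check both on
 *   1   - current liberal behaviour: only the RST sequence check stays on
 *   > 1 - everything off, including the RST sequence check
 */

/* General sequence/window checking applies only in strict mode. */
static bool model_window_check_enabled(int be_liberal)
{
	return be_liberal <= 0;
}

/* The RST sequence check survives in modes 0 and 1. */
static bool model_rst_seq_check_enabled(int be_liberal)
{
	return be_liberal <= 1;
}
```

With this mapping, a setup that currently uses tcp_be_liberal=1 keeps
its behaviour, and only an explicit value of 2 or more gives up the RST
sequence check as well.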

Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 318b8f723349..e958fde8cf9b 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -949,6 +949,10 @@  int nf_conntrack_tcp_packet(struct nf_conn *ct,
 
 			ct->proto.tcp.last_flags =
 			ct->proto.tcp.last_wscale = 0;
+			/* Reset the max ack flag so in case the server replies
+			 * with RST/ACK it will not be marked as an invalid rst.
+			 */
+			ct->proto.tcp.seen[dir].flags &= ~IP_CT_TCP_FLAG_MAXACK_SET;
 			tcp_options(skb, dataoff, th, &seen);
 			if (seen.flags & IP_CT_TCP_FLAG_WINDOW_SCALE) {
 				ct->proto.tcp.last_flags |=
@@ -1030,6 +1034,13 @@  int nf_conntrack_tcp_packet(struct nf_conn *ct,
 		if (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET) {
 			u32 seq = ntohl(th->seq);
 
+			/* If we are not in established state, and an RST is
+			 * observed with SEQ=0, this is most likely an answer
+			 * to a SYN we had let go through above.
+			 */
+			if (seq == 0 && !nf_conntrack_tcp_established(ct))
+				break;
+
 			if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
 				/* Invalid RST  */
 				spin_unlock_bh(&ct->lock);