Message ID | 20210430093601.zibczc4cjnwx3qwn@Fryzen495 |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Series | Avoid potentially erroneos RST drop. |
Ali Abdallah <ali.abdallah@suse.com> wrote:
> In ignore state, we let SYN goes in original, the server might respond
> with RST/ACK, and that RST packet is erroneously dropped because of the
> flag IP_CT_TCP_FLAG_MAXACK_SET being already set. So we reset the flag
> in this case.
>
> Unfortunately that might not be enough, an out of order ACK in origin
> might reset it back, and we might end up again dropping a valid RST when
> the server responds with RST SEQ=0.
>
> The patch disables also the RST check when we are not in established
> state and we receive an RST with SEQ=0 that is most likely a response to
> a SYN we had let it go through.

Ali, sorry for coming back to this again and again.

What do you think of this change?

It's an incremental change on top of your patch.

The only real change is that this will skip the window check if
conntrack thinks the connection is closing already.

In addition, the TCP window check is skipped in that case.

This is supposed to expedite conntrack eviction in case of tuple reuse
by some NAT/PAT middlebox, or a peer that has lower timeouts than
conntrack before a port is re-used.

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -834,6 +834,22 @@ static noinline bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
 	return true;
 }
 
+static bool tcp_can_early_drop(const struct nf_conn *ct)
+{
+	switch (ct->proto.tcp.state) {
+	case TCP_CONNTRACK_FIN_WAIT:
+	case TCP_CONNTRACK_LAST_ACK:
+	case TCP_CONNTRACK_TIME_WAIT:
+	case TCP_CONNTRACK_CLOSE:
+	case TCP_CONNTRACK_CLOSE_WAIT:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 /* Returns verdict for packet, or -1 for invalid. */
 int nf_conntrack_tcp_packet(struct nf_conn *ct,
 			    struct sk_buff *skb,
@@ -1053,8 +1069,16 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 			/* If we are not in established state, and an RST is
 			 * observed with SEQ=0, this is most likely an answer
 			 * to a SYN we had let go through above.
+			 *
+			 * Also expedite conntrack destruction: If we were already
+			 * closing, peer or NAT/PAT might already have reused tuple.
 			 */
-			if (seq == 0 && !nf_conntrack_tcp_established(ct))
+			if (!nf_conntrack_tcp_established(ct)) {
+				if (seq == 0 || tcp_can_early_drop(ct))
+					goto in_window;
+			}
+
+			if (seq == ct->proto.tcp.seen[!dir].td_maxack)
 				break;
 
 			if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
@@ -1066,10 +1090,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 				return -NF_ACCEPT;
 			}
 
-			if (!nf_conntrack_tcp_established(ct) ||
-			    seq == ct->proto.tcp.seen[!dir].td_maxack)
-				break;
-
 			/* Check if rst is part of train, such as
 			 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
 			 *   foo:80 > bar:4379: R, 235946602:235946602(0)  ack 42
@@ -1181,22 +1201,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	return NF_ACCEPT;
 }
 
-static bool tcp_can_early_drop(const struct nf_conn *ct)
-{
-	switch (ct->proto.tcp.state) {
-	case TCP_CONNTRACK_FIN_WAIT:
-	case TCP_CONNTRACK_LAST_ACK:
-	case TCP_CONNTRACK_TIME_WAIT:
-	case TCP_CONNTRACK_CLOSE:
-	case TCP_CONNTRACK_CLOSE_WAIT:
-		return true;
-	default:
-		break;
-	}
-
-	return false;
-}
-
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 
 #include <linux/netfilter/nfnetlink.h>
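For readers following the control flow, the order of the RST checks after this incremental change can be modelled in userspace roughly as below. This is only a sketch: `enum ct_state`, `enum rst_path`, `classify_rst()` and the `assured` parameter are hypothetical stand-ins, and `is_established()` assumes that nf_conntrack_tcp_established() means "ESTABLISHED state and assured". The real code operates on struct nf_conn under ct->lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TCP_CONNTRACK_* states; values are arbitrary. */
enum ct_state {
	CT_SYN_SENT, CT_SYN_RECV, CT_ESTABLISHED, CT_FIN_WAIT,
	CT_CLOSE_WAIT, CT_LAST_ACK, CT_TIME_WAIT, CT_CLOSE,
};

/* Which branch of the RST handling a packet would take in this model. */
enum rst_path {
	RST_SKIP_WINDOW_CHECK,	/* "goto in_window" */
	RST_NORMAL_PATH,	/* "break": continue with regular processing */
	RST_INVALID,		/* "return -NF_ACCEPT" */
	RST_CHECK_TRAIN,	/* fall through to the "rst is part of train" check */
};

/* Wraparound-safe "a is before b" on 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Mirrors tcp_can_early_drop() from the diff: connection is already closing. */
static bool can_early_drop(enum ct_state s)
{
	switch (s) {
	case CT_FIN_WAIT:
	case CT_LAST_ACK:
	case CT_TIME_WAIT:
	case CT_CLOSE:
	case CT_CLOSE_WAIT:
		return true;
	default:
		return false;
	}
}

/* Assumption: established == ESTABLISHED state plus the assured bit. */
static bool is_established(enum ct_state s, bool assured)
{
	return s == CT_ESTABLISHED && assured;
}

/* Only meaningful when MAXACK_SET is set for the opposite direction. */
enum rst_path classify_rst(enum ct_state state, bool assured,
			   uint32_t seq, uint32_t td_maxack)
{
	if (!is_established(state, assured) &&
	    (seq == 0 || can_early_drop(state)))
		return RST_SKIP_WINDOW_CHECK;

	if (seq == td_maxack)
		return RST_NORMAL_PATH;

	if (seq_before(seq, td_maxack))
		return RST_INVALID;

	return RST_CHECK_TRAIN;
}
```

In this model an RST with SEQ=0 on an established, assured connection is still rejected when it lies before td_maxack; only the non-established and closing cases bypass the check.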
On 05.05.2021 21:53, Florian Westphal wrote:
> Ali, sorry for coming back to this again and again.
>
> What do you think of this change?
>
> Its an incremental change on top of your patch.
>
> The only real change is that this will skip window check if
> conntrack thinks connection is closing already.
>
> In addition, tcp window check is skipped in that case.
>
> This is supposed to expedite conntrack eviction in case of tuple reuse
> by some nat/pat middlebox, or a peer that has lower timeouts than
> conntrack before a port is re-used.

Thanks Florian, this looks sane to me. I will give it a try and report
back here.
On 05.05.2021 21:53, Florian Westphal wrote:
> Ali, sorry for coming back to this again and again.
>
> What do you think of this change?

Hi Florian, I tested your patch and it solved the issue, no more NFS
hangs due to dropped RSTs. Please include it, together with the
following two patches I previously sent:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/

Thanks a lot!

> Its an incremental change on top of your patch.
>
> The only real change is that this will skip window check if
> conntrack thinks connection is closing already.
>
> In addition, tcp window check is skipped in that case.
>
> This is supposed to expedite conntrack eviction in case of tuple reuse
> by some nat/pat middlebox, or a peer that has lower timeouts than
> conntrack before a port is re-used.
>
> [...]
Ali Abdallah <ali.abdallah@suse.com> wrote:
> On 05.05.2021 21:53, Florian Westphal wrote:
> > Ali, sorry for coming back to this again and again.
> >
> > What do you think of this change?
>
> Hi Florian, I tested your patch and it solved the issue, no more NFS
> hangs due to dropped RSTs. Please include it, together with the
> following two patches I previously sent:
>
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/

Do we still need this one after this revised patch?
If we do, the help text has to be fixed: after your patch, be-liberal
turns off all sequence number/window checks. The revised text implies
it only has to do with RSTs.

An alternative would be to add another sysctl, or to turn the existing
sysctl into an integer (0: off; 1: current behaviour, sequence check on
for RST only; 2: off for everything).

> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/

I will send this patch for inclusion tomorrow or later today.

Pablo, please mark both patches as "Changes Requested".

I will deal with the 2nd patch and will resend it, with the more liberal
handling of RST when the conntrack entry is closing.

Ali, if you still think the first patch is required, please submit a new
version with at least a revised help text.
On Wed, May 19, 2021 at 02:23:32PM +0200, Florian Westphal wrote:
> Ali Abdallah <ali.abdallah@suse.com> wrote:
[...]
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/
>
> I will send this patch for inclusion tomorrow or later today.
>
> Pablo, please mark both patches as "Changes Requested".

Done.
On 19.05.2021 14:23, Florian Westphal wrote:
> > Hi Florian, I tested your patch and it solved the issue, no more NFS
> > hangs due to dropped RSTs. Please include it, together with the
> > following two patches I previously sent:
> >
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
>
> Do we still need this one after this revised patch?
> If we do, the help text has to be fixed, after your patch, be-liberal
> turns off all sequence number/window checks. The revised text implies
> it only has to do with RSTs.
>
> Alternative would be to add another sysctl, or turn the existing sysctl
> into integer (0, off, 1 current behaviour (sequence check on for rst
> only, 2 off for everything).

I would still like to make the RST sequence number check optional. I
think it is a good idea to use 0, 1 and > 1 (off for everything), which
keeps the current behaviour when tcp_be_liberal is set to 1.

I will send another patch, also with a revised help text.

Many thanks.
Ali
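The integer sysctl discussed above could be read roughly as in the sketch below. It is purely illustrative and not taken from any posted patch: `struct seq_checks` and `checks_for_mode()` are hypothetical names, and the mapping (0 keeps all checks, 1 keeps only the RST sequence check, anything above 1 disables everything) follows the wording in this thread.

```c
#include <stdbool.h>

/* Hypothetical view of which checks remain active for a given setting. */
struct seq_checks {
	bool window_check;	/* window tracking for non-RST segments */
	bool rst_seq_check;	/* sequence number check applied to RSTs */
};

/* be_liberal: 0 = off, 1 = current behaviour, > 1 = off for everything. */
struct seq_checks checks_for_mode(int be_liberal)
{
	struct seq_checks c = { .window_check = true, .rst_seq_check = true };

	if (be_liberal >= 1)
		c.window_check = false;	/* liberal, but RSTs still checked */
	if (be_liberal > 1)
		c.rst_seq_check = false;	/* RST check disabled as well */

	return c;
}
```

Treating every value above 1 the same way keeps a setting of 1 backward compatible, which is the property asked for in the reply above.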
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 318b8f723349..e958fde8cf9b 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -949,6 +949,10 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 
 				ct->proto.tcp.last_flags =
 				ct->proto.tcp.last_wscale = 0;
+				/* Reset the max ack flag so in case the server replies
+				 * with RST/ACK it will not be marked as an invalid rst.
+				 */
+				ct->proto.tcp.seen[dir].flags &= ~IP_CT_TCP_FLAG_MAXACK_SET;
 				tcp_options(skb, dataoff, th, &seen);
 				if (seen.flags & IP_CT_TCP_FLAG_WINDOW_SCALE) {
 					ct->proto.tcp.last_flags |=
@@ -1030,6 +1034,13 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 		if (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET) {
 			u32 seq = ntohl(th->seq);
 
+			/* If we are not in established state, and an RST is
+			 * observed with SEQ=0, this is most likely an answer
+			 * to a SYN we had let go through above.
+			 */
+			if (seq == 0 && !nf_conntrack_tcp_established(ct))
+				break;
+
 			if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
 				/* Invalid RST  */
 				spin_unlock_bh(&ct->lock);
In ignore state, we let the SYN through in the original direction. The
server might respond with RST/ACK, and that RST packet is erroneously
dropped because the flag IP_CT_TCP_FLAG_MAXACK_SET is already set. So we
reset the flag in this case.

Unfortunately that might not be enough: an out-of-order ACK in the
original direction might set the flag again, and we might again end up
dropping a valid RST when the server responds with RST SEQ=0.

The patch therefore also disables the RST check when we are not in
established state and we receive an RST with SEQ=0, which is most likely
a response to a SYN we had let through.

Signed-off-by: Ali Abdallah <aabdallah@suse.de>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
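To illustrate why an RST with SEQ=0 can trip the check described in the commit message, here is a small userspace sketch of the max-ack test. `struct peer_state`, `rst_rejected()`, `on_ignored_syn()` and the value of `FLAG_MAXACK_SET` are stand-ins chosen for this example; `seq_before()` is assumed to behave like the kernel's wraparound-aware before() helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_MAXACK_SET 0x20u	/* stand-in for IP_CT_TCP_FLAG_MAXACK_SET */

/* Per-direction state, reduced to the two fields this check looks at. */
struct peer_state {
	uint8_t flags;
	uint32_t td_maxack;	/* highest ACK seen from this side */
};

/* Wraparound-safe comparison: true if a precedes b in sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* An RST is treated as invalid if the other side's max ack is recorded
 * and the RST's sequence number lies before it.
 */
bool rst_rejected(const struct peer_state *other, uint32_t rst_seq)
{
	if (!(other->flags & FLAG_MAXACK_SET))
		return false;
	return seq_before(rst_seq, other->td_maxack);
}

/* First half of the patch: forget the recorded max ack for the SYN sender
 * when an unexpected SYN is let through.
 */
void on_ignored_syn(struct peer_state *syn_sender)
{
	syn_sender->flags &= (uint8_t)~FLAG_MAXACK_SET;
}
```

With td_maxack = 1000, rst_rejected() is true for SEQ=0 until on_ignored_syn() clears the flag. If a later ACK sets the flag again, the RST would be rejected once more, which is the case the SEQ=0 exception in the second hunk is meant to cover.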