
[nf,v2] netfilter: conntrack: connection timeout after re-register

Message ID 20201007193252.7009D95C169C@us180.sjc.aristanetworks.com
State Accepted
Delegated to: Pablo Neira
Series [nf,v2] netfilter: conntrack: connection timeout after re-register

Commit Message

Francesco Ruggeri Oct. 7, 2020, 7:32 p.m. UTC
If the first packet conntrack sees after a re-register is an outgoing
keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
SND.NXT-1.
When the peer correctly acknowledges SND.NXT, tcp_in_window fails
check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
returns false, which cascades into nf_conntrack_in setting
skb->_nfct = 0 and later conntrack iptables rules not matching.
In cases where iptables rules drop packets that do not match
conntrack rules, this can cause idle tcp connections to time out.

v2: adjust td_end when getting the reply rather than when sending out
    the keepalive packet.

Fixes: f94e63801ab2 ("netfilter: conntrack: reset tcp maxwin on re-register")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
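
The failure described in the commit message can be traced with a few lines of
arithmetic. What follows is a minimal sketch in Python, not the kernel code: it
models only check III as quoted above, ignores 32-bit sequence number
wraparound, and assumes the tracked end of the client's data is taken directly
from the keepalive probe. The function name passes_check_iii and the sequence
numbers are made up for illustration.

```python
# Toy model of check III from tcp_in_window(): sack <= receiver.td_end.
# Names and numbers are illustrative assumptions, not kernel code.

def passes_check_iii(sack, receiver_td_end):
    """Upper bound for a valid (s)ack, as quoted in the commit message."""
    return sack <= receiver_td_end

SND_NXT = 1000                  # hypothetical next sequence number of the client

# First packet seen after the re-register: a keepalive probe with no data,
# SEG.SEQ = SND.NXT - 1, so the tracked end of the client's data is one short.
keepalive_seq = SND_NXT - 1
client_td_end = keepalive_seq   # end = seq + payload length (0 bytes)

# The peer correctly acknowledges SND.NXT.
peer_ack = SND_NXT

# For the reply direction, the "receiver" is the client.
print(passes_check_iii(peer_ack, client_td_end))   # False: reply is out of window
```

An ack of SND.NXT-1 would pass the same check, which is why only the reply to
the keepalive trips it.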

Comments

Francesco Ruggeri Oct. 8, 2020, 11:41 p.m. UTC | #1
On Wed, Oct 7, 2020 at 12:32 PM Francesco Ruggeri <fruggeri@arista.com> wrote:
>
> If the first packet conntrack sees after a re-register is an outgoing
> keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
> SND.NXT-1.
> When the peer correctly acknowledges SND.NXT, tcp_in_window fails
> check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
> returns false, which cascades into nf_conntrack_in setting
> skb->_nfct = 0 and in later conntrack iptables rules not matching.
> In cases where iptables are dropping packets that do not match
> conntrack rules this can result in idle tcp connections to time out.
>
> v2: adjust td_end when getting the reply rather than when sending out
>     the keepalive packet.
>

Any comments?
Here is a simple reproducer.
The idea is to show that keepalive packets in an idle tcp
connection will be dropped (and the connection will time out)
if conntrack hooks are de-registered and then re-registered.
The reproducer has two files.
client_server.py creates both ends of a tcp connection, bounces
a few packets back and forth, and then blocks on a recv on the
client side. The client's keepalive is configured to time out in
20 seconds. This connection should not time out.
test is a bash script that creates a net namespace where it sets
iptables rules for the connection, starts client_server.py, and
then clears and restores the iptables rules (which causes
conntrack hooks to be de-registered and re-registered).

================ file client_server.py
#!/usr/bin/python
# Runs under both Python 2.6+ and Python 3.

import socket

PORT = 4446

# create server socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('localhost', PORT))
sock.listen(1)

# create client socket: first keepalive probe after 2s idle, then one
# probe every 2s, giving up after 10 unanswered probes
cl_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cl_sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 2)
cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 10)
cl_sock.connect(('localhost', PORT))

srv_sock, _ = sock.accept()

# Bounce a packet back and forth a few times
buf = b'aaaaaaaaaaaa'
for i in range(5):
   cl_sock.send(buf)
   buf = srv_sock.recv(100)
   srv_sock.send(buf)
   buf = cl_sock.recv(100)
   print(buf)

# idle the connection; recv only returns if the keepalive times out
try:
   buf = cl_sock.recv(100)
except socket.error as e:
   print("Error: %s" % e)

sock.close()
cl_sock.close()
srv_sock.close()

============== file test
#!/bin/bash

ip netns add dummy
ip netns exec dummy ip link set lo up
echo "Created namespace"

ip netns exec dummy iptables-restore <<END
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 4446 -j ACCEPT
COMMIT
END
echo "Installed iptables rules"

ip netns exec dummy ./client_server.py &
echo "Created tcp connection"
sleep 2

ip netns exec dummy iptables-restore << END
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
END
echo "Cleared iptables rules"
sleep 4

ip netns exec dummy iptables-restore << END
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 4446 -j ACCEPT
COMMIT
END
echo "Restored original iptables rules"

wait
ip netns del dummy
exit 0
Jozsef Kadlecsik Oct. 9, 2020, 6:52 a.m. UTC | #2
Hi Francesco,

On Thu, 8 Oct 2020, Francesco Ruggeri wrote:

> On Wed, Oct 7, 2020 at 12:32 PM Francesco Ruggeri <fruggeri@arista.com> wrote:
> >
> > If the first packet conntrack sees after a re-register is an outgoing 
> > keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to 
> > SND.NXT-1. When the peer correctly acknowledges SND.NXT, tcp_in_window 
> > fails check III (Upper bound for valid (s)ack: sack <= 
> > receiver.td_end) and returns false, which cascades into 
> > nf_conntrack_in setting skb->_nfct = 0 and in later conntrack iptables 
> > rules not matching. In cases where iptables are dropping packets that 
> > do not match conntrack rules this can result in idle tcp connections 
> > to time out.
> >
> > v2: adjust td_end when getting the reply rather than when sending out
> >     the keepalive packet.
> >
> 
> Any comments?
> Here is a simple reproducer. The idea is to show that keepalive packets 
> in an idle tcp connection will be dropped (and the connection will time 
> out) if conntrack hooks are de-registered and then re-registered. The 
> reproducer has two files. client_server.py creates both ends of a tcp 
> connection, bounces a few packets back and forth, and then blocks on a 
> recv on the client side. The client's keepalive is configured to time 
> out in 20 seconds. This connection should not time out. test is a bash 
> script that creates a net namespace where it sets iptables rules for the 
> connection, starts client_server.py, and then clears and restores the 
> iptables rules (which causes conntrack hooks to be de-registered and 
> re-registered).

In my opinion an iptables restore should not cause conntrack hooks to be 
de-registered and re-registered, because important TCP initialization 
parameters cannot be "restored" later from the packets. Therefore the 
proper fix would be to prevent that from happening. Otherwise your patch 
looks OK to handle the case when conntrack is intentionally restarted.

Best regards,
Jozsef
 

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary
Florian Westphal Oct. 9, 2020, 11:03 a.m. UTC | #3
Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > Any comments?
> > Here is a simple reproducer. The idea is to show that keepalive packets 
> > in an idle tcp connection will be dropped (and the connection will time 
> > out) if conntrack hooks are de-registered and then re-registered. The 
> > reproducer has two files. client_server.py creates both ends of a tcp 
> > connection, bounces a few packets back and forth, and then blocks on a 
> > recv on the client side. The client's keepalive is configured to time 
> > out in 20 seconds. This connection should not time out. test is a bash 
> > script that creates a net namespace where it sets iptables rules for the 
> > connection, starts client_server.py, and then clears and restores the 
> > iptables rules (which causes conntrack hooks to be de-registered and 
> > re-registered).
> 
> In my opinion an iptables restore should not cause conntrack hooks to be 
> de-registered and re-registered, because important TCP initialization 
> parameters cannot be "restored" later from the packets. Therefore the 
> proper fix would be to prevent it to happen. Otherwise your patch looks OK 
> to handle the case when conntrack is intentionally restarted.

The repro clears all rules, waits 4 seconds, then restores the ruleset.
Using iptables-restore < FOO; sleep 4; iptables-restore < FOO will
not result in any unregister ops.

We could make the kernel defer unregister via some work queue, but I don't
see what this would help/accomplish (and it's questionable how long it
should wait).

We could disallow unregister, but that seems silly (forces reboot...).

I think the patch is fine.
Jozsef Kadlecsik Oct. 9, 2020, 6:48 p.m. UTC | #4
Hi Florian,

On Fri, 9 Oct 2020, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > The reproducer has two files. client_server.py creates both ends of 
> > > a tcp connection, bounces a few packets back and forth, and then 
> > > blocks on a recv on the client side. The client's keepalive is 
> > > configured to time out in 20 seconds. This connection should not 
> > > time out. test is a bash script that creates a net namespace where 
> > > it sets iptables rules for the connection, starts client_server.py, 
> > > and then clears and restores the iptables rules (which causes 
> > > conntrack hooks to be de-registered and re-registered).
> > 
> > In my opinion an iptables restore should not cause conntrack hooks to be 
> > de-registered and re-registered, because important TCP initialization 
> > parameters cannot be "restored" later from the packets. Therefore the 
> > proper fix would be to prevent it to happen. Otherwise your patch looks OK 
> > to handle the case when conntrack is intentionally restarted.
> 
> The repro clears all rules, waits 4 seconds, then restores the ruleset. 
> using iptables-restore < FOO; sleep 4; iptables-restore < FOO will not 
> result in any unregister ops.
>
> We could make kernel defer unregister via some work queue but i don't
> see what this would help/accomplish (and its questionable of how long it
> should wait).

Sorry, I can't put together the two paragraphs above: in the first you 
wrote that no (hook) unregister-register happens and in the second one 
that those could be deferred.

> We could disallow unregister, but that seems silly (forces reboot...).
> 
> I think the patch is fine.

The patch is fine, but why are the packets handled by conntrack (after the 
first restore and during the 4s sleep? And then again after the second 
restore?) as if all conntrack entries were removed?
 
Best regards,
Jozsef
Florian Westphal Oct. 9, 2020, 6:55 p.m. UTC | #5
Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > The repro clears all rules, waits 4 seconds, then restores the ruleset. 
> > using iptables-restore < FOO; sleep 4; iptables-restore < FOO will not 
> > result in any unregister ops.
> >
> > We could make kernel defer unregister via some work queue but i don't
> > see what this would help/accomplish (and its questionable of how long it
> > should wait).
> 
> Sorry, I can't put together the two paragraphs above: in the first you 
> wrote that no (hook) unregister-register happens and in the second one 
> that those could be derefed.

Sorry, my reply is confusing indeed.

Matches/targets that need conntrack increment a refcount.
So, when all rules are flushed, refcount goes down to 0 and conntrack is
disabled because the hooks get removed.

Just doing iptables-restore doesn't unregister as long as both the old
and new rulesets need conntrack.

The "delay unregister" remark was wrt. the "all rules were deleted"
case, i.e. add a "grace period" rather than acting right away when
the conntrack use count hits 0.
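
The use-count behaviour can be pictured with a toy model. This is a sketch in
Python, not the kernel implementation: the class ConntrackHooks, its get/put
methods and the restore helper are hypothetical names, and the sketch assumes
that a ruleset replacement takes the references for the new rules before
dropping those of the old rules.

```python
# Toy model of the conntrack use count described above; not kernel code.
# All names here are hypothetical and exist only for illustration.

class ConntrackHooks:
    def __init__(self):
        self.refcount = 0
        self.registered = False
        self.events = []            # log of register/unregister transitions

    def get(self):
        """A rule with a conntrack match/target is added."""
        if self.refcount == 0:
            self.registered = True
            self.events.append("register")
        self.refcount += 1

    def put(self):
        """A rule with a conntrack match/target is destroyed."""
        self.refcount -= 1
        if self.refcount == 0:
            self.registered = False
            self.events.append("unregister")


def restore(hooks, old_rules, new_rules):
    """Replace a ruleset. Assumption: new rules are set up before old ones go."""
    for _ in range(new_rules):
        hooks.get()
    for _ in range(old_rules):
        hooks.put()


# Case 1: restore a ruleset over an equivalent one (one conntrack rule each).
a = ConntrackHooks()
restore(a, old_rules=0, new_rules=1)
restore(a, old_rules=1, new_rules=1)
print(a.events)                     # ['register']

# Case 2: flush to an empty ruleset, then restore (what the reproducer does).
b = ConntrackHooks()
restore(b, old_rules=0, new_rules=1)
restore(b, old_rules=1, new_rules=0)
restore(b, old_rules=0, new_rules=1)
print(b.events)                     # ['register', 'unregister', 'register']
```

Only the second sequence drives the count to 0, so only there do the hooks go
away and come back.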

> > We could disallow unregister, but that seems silly (forces reboot...).
> > 
> > I think the patch is fine.
> 
> The patch is fine, but why the packets are handled by conntrack (after the 
> first restore and during the 4s sleep? And then again after the second 
> restore?) as if all conntrack entries were removed?

Conntrack entries are not removed, only the base hooks get unregistered.
This is a problem for tcp window tracking.

When re-register occurs, kernel is supposed to switch the existing
entries to "loose" mode so window tracking won't flag packets as
invalid, but apparently this isn't enough to handle keepalive case.
Jozsef Kadlecsik Oct. 9, 2020, 7:49 p.m. UTC | #6
On Fri, 9 Oct 2020, Florian Westphal wrote:

> Matches/targets that need conntrack increment a refcount. So, when all 
> rules are flushed, refcount goes down to 0 and conntrack is disabled 
> because the hooks get removed..
> 
> Just doing iptables-restore doesn't unregister as long as both the old
> and new rulesets need conntrack.
> 
> The "delay unregister" remark was wrt. the "all rules were deleted"
> case, i.e. add a "grace period" rather than acting right away when
> conntrack use count did hit 0.

Now I understand it, thanks really. The hooks are removed, so conntrack 
cannot "see" the packets and the entries become stale. 

What is the rationale behind "remove the conntrack hooks when there are no 
rules left referring to conntrack"? Performance optimization? But then the 
content of the whole conntrack table could be deleted too... ;-)
 
> Conntrack entries are not removed, only the base hooks get unregistered. 
> This is a problem for tcp window tracking.
> 
> When re-register occurs, kernel is supposed to switch the existing 
> entries to "loose" mode so window tracking won't flag packets as 
> invalid, but apparently this isn't enough to handle keepalive case.

"loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
enables/disables picking up already established connections. 

nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
for non RST packets.

But both seem to be modified only via the proc entries.
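
For reference, the two settings can be inspected from user space. The sketch
below is in Python and assumes they are exposed under /proc/sys/net/netfilter
as nf_conntrack_tcp_loose and nf_conntrack_tcp_be_liberal, which is where the
sysctls normally live when the conntrack module is loaded. The helper name
read_tcp_knobs is made up for this example.

```python
import os

# Assumed location of the netfilter sysctls on Linux.
SYSCTL_DIR = "/proc/sys/net/netfilter"


def read_tcp_knobs(base=SYSCTL_DIR):
    """Return the current value of each knob, or None where it is absent."""
    knobs = {}
    for name in ("nf_conntrack_tcp_loose", "nf_conntrack_tcp_be_liberal"):
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                knobs[name] = int(f.read().strip())
        except OSError:
            # Module not loaded, or not running on Linux.
            knobs[name] = None
    return knobs


print(read_tcp_knobs())
```

Reading the files needs no privileges; changing them does, and applies to the
current network namespace.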

Best regards,
Jozsef
Francesco Ruggeri Oct. 9, 2020, 8 p.m. UTC | #7
On Fri, Oct 9, 2020 at 12:49 PM Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> What is the rationale behind "remove the conntrack hooks when there are no
> rule left referring to conntrack"? Performance optimization?

That seems to be the case. See commit 4d3a57f23dec ("netfilter: conntrack:
do not enable connection tracking unless needed").

Francesco
Florian Westphal Oct. 9, 2020, 8:05 p.m. UTC | #8
Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > The "delay unregister" remark was wrt. the "all rules were deleted"
> > case, i.e. add a "grace period" rather than acting right away when
> > conntrack use count did hit 0.
> 
> Now I understand it, thanks really. The hooks are removed, so conntrack 
> cannot "see" the packets and the entries become stale. 

Yes.

> What is the rationale behind "remove the conntrack hooks when there are no 
> rule left referring to conntrack"? Performance optimization? But then the 
> content of the whole conntrack table could be deleted too... ;-)

Yes, this isn't the case at the moment -- only hooks are removed,
entries will eventually time out.

> > Conntrack entries are not removed, only the base hooks get unregistered. 
> > This is a problem for tcp window tracking.
> > 
> > When re-register occurs, kernel is supposed to switch the existing 
> > entries to "loose" mode so window tracking won't flag packets as 
> > invalid, but apparently this isn't enough to handle keepalive case.
> 
> "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> enables/disables picking up already established connections. 
> 
> nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> for non RST packets.

You are right, mixup on my part.

> But both seems to be modified only via the proc entries.

Yes, we iterate table on re-register and modify the existing entries.
Pablo Neira Ayuso Oct. 14, 2020, 12:06 a.m. UTC | #9
On Fri, Oct 09, 2020 at 10:05:48PM +0200, Florian Westphal wrote:
> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > The "delay unregister" remark was wrt. the "all rules were deleted"
> > > case, i.e. add a "grace period" rather than acting right away when
> > > conntrack use count did hit 0.
> > 
> > Now I understand it, thanks really. The hooks are removed, so conntrack 
> > cannot "see" the packets and the entries become stale. 
> 
> Yes.
> 
> > What is the rationale behind "remove the conntrack hooks when there are no 
> > rule left referring to conntrack"? Performance optimization? But then the 
> > content of the whole conntrack table could be deleted too... ;-)
> 
> Yes, this isn't the case at the moment -- only hooks are removed,
> entries will eventually time out.
> 
> > > Conntrack entries are not removed, only the base hooks get unregistered. 
> > > This is a problem for tcp window tracking.
> > > 
> > > When re-register occurs, kernel is supposed to switch the existing 
> > > entries to "loose" mode so window tracking won't flag packets as 
> > > invalid, but apparently this isn't enough to handle keepalive case.
> > 
> > "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> > enables/disables picking up already established connections. 
> > 
> > nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> > for non RST packets.
> 
> You are right, mixup on my part.
> 
> > But both seems to be modified only via the proc entries.
> 
> Yes, we iterate table on re-register and modify the existing entries.

For iptables-nft, it might be possible to avoid this deregister +
register ct hooks in the same transaction: Maybe add something like
nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
before starting the transaction processing, then call
nf_ct_netns_put_all(), which decrements refcounters and unregisters
hooks if they reach 0.

The only problem with this approach is that this pulls in the
conntrack module. To solve that, struct nf_ct_hook in
net/netfilter/core.c could be used to store the reference to
->netns_get_all and ->net_put_all.

Legacy would still be flawed though.
Pablo Neira Ayuso Oct. 14, 2020, 8:11 a.m. UTC | #10
On Wed, Oct 14, 2020 at 02:06:28AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Oct 09, 2020 at 10:05:48PM +0200, Florian Westphal wrote:
> > Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > > The "delay unregister" remark was wrt. the "all rules were deleted"
> > > > case, i.e. add a "grace period" rather than acting right away when
> > > > conntrack use count did hit 0.
> > > 
> > > Now I understand it, thanks really. The hooks are removed, so conntrack 
> > > cannot "see" the packets and the entries become stale. 
> > 
> > Yes.
> > 
> > > What is the rationale behind "remove the conntrack hooks when there are no 
> > > rule left referring to conntrack"? Performance optimization? But then the 
> > > content of the whole conntrack table could be deleted too... ;-)
> > 
> > Yes, this isn't the case at the moment -- only hooks are removed,
> > entries will eventually time out.
> > 
> > > > Conntrack entries are not removed, only the base hooks get unregistered. 
> > > > This is a problem for tcp window tracking.
> > > > 
> > > > When re-register occurs, kernel is supposed to switch the existing 
> > > > entries to "loose" mode so window tracking won't flag packets as 
> > > > invalid, but apparently this isn't enough to handle keepalive case.
> > > 
> > > "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> > > enables/disables picking up already established connections. 
> > > 
> > > nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> > > for non RST packets.
> > 
> > You are right, mixup on my part.
> > 
> > > But both seems to be modified only via the proc entries.
> > 
> > Yes, we iterate table on re-register and modify the existing entries.
> 
> For iptables-nft, it might be possible to avoid this deregister +
> register ct hooks in the same transaction: Maybe add something like
> nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
> before starting the transaction processing, then call
> nf_ct_netns_put_all() which decrements refcounters and unregister
> hooks if they reach 0.

Hm, scratch that, put_all() would create an imbalance with this
conditional increment.

> The only problem with this approach is that this pulls in the
> conntrack module, to solve that, struct nf_ct_hook in
> net/netfilter/core.c could be used to store the reference to
> ->netns_get_all and ->net_put_all.
> 
> Legacy would still be flawed though.
Florian Westphal Oct. 14, 2020, 8:23 a.m. UTC | #11
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Yes, we iterate table on re-register and modify the existing entries.
> 
> For iptables-nft, it might be possible to avoid this deregister +
> register ct hooks in the same transaction: Maybe add something like
> nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
> before starting the transaction processing, then call
> nf_ct_netns_put_all() which decrements refcounters and unregister
> hooks if they reach 0.

No need, it's already fine.  Decrement happens from the destroy path,
so new rules are already in place.

> The only problem with this approach is that this pulls in the
> conntrack module, to solve that, struct nf_ct_hook in
> net/netfilter/core.c could be used to store the reference to
> ->netns_get_all and ->net_put_all.
> 
> Legacy would still be flawed though.

It's fine too: the new rule blob gets handled (and match/target checkentry
called) before old one is dismantled.

We only have a 0 refcount + hook unregister when rules get
flushed/removed explicitly.
Francesco Ruggeri Oct. 14, 2020, 6:42 p.m. UTC | #12
On Wed, Oct 14, 2020 at 1:23 AM Florian Westphal <fw@strlen.de> wrote:
>
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Legacy would still be flawed though.
>
> Its fine too, new rule blob gets handled (and match/target checkentry
> called) before old one is dismantled.
>
> We only have a 0 refcount + hook unregister when rules get
> flushed/removed explicitly.

Should the patch be used in the meantime while this gets
worked out?

Francesco
Florian Westphal Oct. 14, 2020, 7:35 p.m. UTC | #13
Francesco Ruggeri <fruggeri@arista.com> wrote:
> On Wed, Oct 14, 2020 at 1:23 AM Florian Westphal <fw@strlen.de> wrote:
> >
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Legacy would still be flawed though.
> >
> > Its fine too, new rule blob gets handled (and match/target checkentry
> > called) before old one is dismantled.
> >
> > We only have a 0 refcount + hook unregister when rules get
> > flushed/removed explicitly.
> 
> Should the patch be used in the meantime while this gets
> worked out?

I think the patch is correct, and I do NOT see a better solution.
Pablo Neira Ayuso Oct. 20, 2020, 3:21 p.m. UTC | #14
On Wed, Oct 07, 2020 at 12:32:52PM -0700, Francesco Ruggeri wrote:
> If the first packet conntrack sees after a re-register is an outgoing
> keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
> SND.NXT-1.
> When the peer correctly acknowledges SND.NXT, tcp_in_window fails
> check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
> returns false, which cascades into nf_conntrack_in setting
> skb->_nfct = 0 and in later conntrack iptables rules not matching.
> In cases where iptables are dropping packets that do not match
> conntrack rules this can result in idle tcp connections to time out.

Applied, thanks.

Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index e8c86ee4c1c4..c8fb2187ad4b 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -541,13 +541,20 @@  static bool tcp_in_window(const struct nf_conn *ct,
 			swin = win << sender->td_scale;
 			sender->td_maxwin = (swin == 0 ? 1 : swin);
 			sender->td_maxend = end + sender->td_maxwin;
-			/*
-			 * We haven't seen traffic in the other direction yet
-			 * but we have to tweak window tracking to pass III
-			 * and IV until that happens.
-			 */
-			if (receiver->td_maxwin == 0)
+			if (receiver->td_maxwin == 0) {
+				/* We haven't seen traffic in the other
+				 * direction yet but we have to tweak window
+				 * tracking to pass III and IV until that
+				 * happens.
+				 */
 				receiver->td_end = receiver->td_maxend = sack;
+			} else if (sack == receiver->td_end + 1) {
+				/* Likely a reply to a keepalive.
+				 * Needed for III.
+				 */
+				receiver->td_end++;
+			}
+
 		}
 	} else if (((state->state == TCP_CONNTRACK_SYN_SENT
 		     && dir == IP_CT_DIR_ORIGINAL)
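
The effect of the added else-if branch can be checked with a small model. This
is a sketch in Python that mirrors only the two branches touched by the hunk
above: the conditions surrounding them in tcp_in_window() are outside the hunk
and are not modeled, sequence number wraparound is ignored, and the Dir class,
the adjust_receiver helper and all numbers are assumptions made for
illustration.

```python
# Toy model of the patched branch in tcp_in_window(); not kernel code.
from dataclasses import dataclass


@dataclass
class Dir:
    """Per-direction window state, reduced to the fields the hunk touches."""
    td_end: int = 0
    td_maxend: int = 0
    td_maxwin: int = 0


def adjust_receiver(receiver, sack):
    """Mirror of the if / else-if pair added by the patch."""
    if receiver.td_maxwin == 0:
        # No traffic seen in the other direction yet: tweak window
        # tracking so checks III and IV pass until that happens.
        receiver.td_end = receiver.td_maxend = sack
    elif sack == receiver.td_end + 1:
        # Likely a reply to a keepalive. Needed for check III.
        receiver.td_end += 1


# State learned from a keepalive probe sent with SEG.SEQ = SND.NXT - 1 = 999.
client = Dir(td_end=999, td_maxend=999 + 512, td_maxwin=512)

# The peer acknowledges SND.NXT = 1000.
adjust_receiver(client, sack=1000)
print(client.td_end, 1000 <= client.td_end)   # 1000 True
```

With td_end bumped by one, the upper-bound test sack <= receiver.td_end holds
for the keepalive reply.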