
document danger of '-j REJECT'ing of '-m state INVALID' packets

Message ID 20200509052235.150348-1-zenczykowski@gmail.com
State Not Applicable
Delegated to: David Miller

Commit Message

Maciej Żenczykowski May 9, 2020, 5:22 a.m. UTC
From: Maciej Żenczykowski <maze@google.com>

This appears to be a common, but hard to debug, misconfiguration.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 extensions/libip6t_REJECT.man | 15 +++++++++++++++
 extensions/libipt_REJECT.man  | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Jan Engelhardt May 9, 2020, 10:52 a.m. UTC | #1
On Saturday 2020-05-09 07:22, Maciej Żenczykowski wrote:
>diff --git a/extensions/libip6t_REJECT.man b/extensions/libip6t_REJECT.man
>index 0030a51f..b6474811 100644
>--- a/extensions/libip6t_REJECT.man
>+++ b/extensions/libip6t_REJECT.man
>@@ -30,3 +30,18 @@ TCP RST packet to be sent back.  This is mainly useful for blocking
> hosts (which won't accept your mail otherwise).
> \fBtcp\-reset\fP
> can only be used with kernel versions 2.6.14 or later.
>+.PP
>+\fIWarning:\fP if you are using connection tracking and \fBACCEPT\fP'ing
>+\fBESTABLISHED\fP (and possibly \fBRELATED\fP) state packets, do not
>+indiscriminately \fBREJECT\fP (especially with \fITCP RST\fP) \fBINVALID\fP
>+state packets.  Sometimes naturally occurring packet reordering will result
>+in packets being considered \fBINVALID\fP and the generated \fITCP RST\fP
>+will abort an otherwise healthy connection.

I fail to understand the problem here.

1. Because ESTABLISHED and INVALID are mutually exclusive, there is no ordering
dependency between two rules of the kind {EST=>ACCEPT, INV=>REJ},
and thus their order plays no role.

2. Given packets D,R (data, rst) leads to state(ct(D))=EST, state(ct(R))=EST in
the normal case. When this gets reordered to R,D, then we end up with
state(ct(R))=EST, state(ct(D))=INV. Though the outcome of nfct changes,
I do not think that will be of consequence, because in the absence of
filtering, the tcp layer should be discarding/rejecting D.

3. Natural reordering of D1,D2 to D2,D1 should not cause nfct to drop the ct
at reception of D1 and turn the state to INV. Reordering can happen at any
time, and we'd be having more reports of problems if it did, wouldn't we...

Maciej Żenczykowski May 9, 2020, 5:45 p.m. UTC | #2
So I've never tried to figure out how things break, just observed that
they do - first many many years ago (close to 15ish) - between my wifi
connected laptop at home and my university server in the same city.
I've kept an INVALID->DROP rule in all my firewalls since then and not
had problems.  I vaguely recall seeing delayed packets when I debugged
it back then.

See for example: https://github.com/moby/libnetwork/issues/1090 for
others running into this.

Now we've hit an issue at work where a network misconfiguration has
asymmetric one-way pathing, with the result that some packets were
getting *massively* delayed, and it's been causing user firewalls to
generate tcp resets for 'too old', 'already ack'ed' packets (i.e. dups).

While this is of course a misconfig, and it shouldn't happen, in
practice it sometimes simply does.
All it takes is for a packet to get into a long queue, and the network
path to shift (immediately after it) to a less congested path.
Due to bufferbloat those long queues can take seconds to drain and
exceed path rtt by orders of magnitude.
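
To put rough numbers on the queueing claim, here is a back-of-the-envelope sketch. The queue size and link rate are hypothetical values chosen for illustration, not measurements from this thread, and `drain_ms` is a helper name introduced here.

```shell
#!/bin/sh
# Time in milliseconds to drain a full queue: bytes * 8 bits / link rate.
# The arguments below are illustrative assumptions, not measured values.
drain_ms() {
    queue_bytes=$1   # bytes queued ahead of the delayed packet
    link_bps=$2      # bottleneck link rate in bits per second
    echo $(( queue_bytes * 8 * 1000 / link_bps ))
}

# Hypothetical case: 1 MB queued on a 2 Mbit/s bottleneck.
drain_ms 1000000 2000000   # prints 4000
```

Under these assumed numbers the packet waits about 4 seconds, which is a couple of hundred times longer than a path rtt of around 20 ms.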

I *think* what happens is:

A non-final tcp packet gets massively delayed. The packet after it
makes it through to the receiver and triggers an ACK with SACK, which
makes it back to the sender and triggers a retransmit, so the
connection keeps on making forward progress. Eventually the delayed
packet arrives, is no longer considered valid, and triggers a tcp
reset. What counts as "massively" of course depends on the rtt and
retransmit aggressiveness.
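
One way to check this hypothesis on a live system is to log INVALID packets just before dropping them. The following is a minimal sketch, assuming the LOG target is available in the kernel; the `run` and `invalid_log_rules` helpers, the `DRY_RUN` variable, and the log prefix are hypothetical names introduced here.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Accept tracked traffic, then log and drop anything conntrack marks INVALID,
# so that late segments show up in the kernel log instead of causing a reset.
invalid_log_rules() {
    run iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    run iptables -A INPUT -m state --state INVALID -j LOG --log-prefix ct-invalid:
    run iptables -A INPUT -m state --state INVALID -j DROP
}
```

`DRY_RUN=1 invalid_log_rules` only prints the three commands. Calling `invalid_log_rules` as root applies them, after which each INVALID packet is logged before it is dropped.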

Here's my attempt to demonstrate what I believe the problem to be:

(on a freshly booted clean/empty/idle fedora 31 vm)

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state INVALID -j DROP
modprobe ifb
ip link set dev ifb0 up
tc qdisc add dev ifb0 root netem reorder 99% 0% delay 10s
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress u32 match u32 0 0 action mirred egress redirect dev ifb0
wget -O /dev/null https://git.kernel.org/torvalds/t/linux-5.7-rc4.tar.gz
iptables-save -c

...
/dev/null    [ <=> ] 169.58M  2.93MB/s    in 45s
2020-05-09 10:35:44 (3.81 MB/s) - ‘/dev/null’ saved [177819073]
...
[31750:181080717] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[244:1403178] -A INPUT -m state --state INVALID -j DROP
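
To make counter listings like the one above easier to compare between runs, the per-rule packet counts can be summarized. This is a sketch only: `invalid_share` is a hypothetical helper name, and it simply sums the per-rule counters of every rule line it is given.

```shell
#!/bin/sh
# Read `iptables-save -c` output on stdin and print how many of the counted
# packets hit a rule matching "--state INVALID".
invalid_share() {
    awk '
        /^\[[0-9]+:[0-9]+\] / {
            # $1 looks like "[packets:bytes]"; strip the brackets and split.
            split(substr($1, 2, length($1) - 2), c, ":")
            total += c[1]
            if ($0 ~ /--state INVALID/) invalid += c[1]
        }
        END {
            if (total > 0)
                printf "%d of %d packets INVALID (%.2f%%)\n", invalid, total, 100 * invalid / total
        }
    '
}
```

Used as `iptables-save -c | invalid_share`. Fed the two counter lines above, it prints `244 of 31994 packets INVALID (0.76%)`.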


Now if I reboot, and run the same script, except instead of the
INVALID/DROP rule I do
  iptables -A INPUT -p tcp -j REJECT --reject-with tcp-reset
then the download never finishes (it hangs after 15MB @ 2MB/s and
eventually times out).

[4170:16758894] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[37:147454] -A INPUT -p tcp -j REJECT --reject-with tcp-reset

(arguably since this is a VM, and thus NAT'ed by my host, and then
again by the real ipv4 NAT, the setup isn't entirely clear, but I hope
it makes my point: INVALID state needs to be dropped, not rejected)

Maciej Żenczykowski May 9, 2020, 6:02 p.m. UTC | #3
Side note, it doesn't have to be nearly as aggressive as the above.

With just:
  tc qdisc replace dev ifb0 root netem reorder 99.9% 0% delay 1s
I still see 169.58M @ 7.02MB/s in 26s:
  [24263:180667450] -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
  [27:174654] -A INPUT -m state --state INVALID -j DROP
  [0:0] -A INPUT -p tcp -j REJECT --reject-with tcp-reset

And the connection still freezes without the INVALID/DROP rule (after
43MiB this time)
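
The runs above reboot the VM between experiments. As an alternative, the setup can be undone by hand. The sketch below assumes the interface names (`eth0`, `ifb0`) and the rules from the script earlier in the thread; `netem_teardown` is a hypothetical helper name.

```shell
#!/bin/sh
# Undo the netem/ifb experiment setup. Pass "echo" as the first argument
# to print the commands instead of running them.
netem_teardown() {
    pre=${1:-}
    $pre iptables -D INPUT -m state --state INVALID -j DROP
    $pre iptables -D INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    $pre tc qdisc del dev eth0 clsact      # also removes the ingress filter
    $pre tc qdisc del dev ifb0 root        # removes the netem qdisc
    $pre ip link set dev ifb0 down
    $pre modprobe -r ifb
}
```

`netem_teardown echo` lists the six commands for review. `netem_teardown` with no argument runs them and needs root.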

Patch

diff --git a/extensions/libip6t_REJECT.man b/extensions/libip6t_REJECT.man
index 0030a51f..b6474811 100644
--- a/extensions/libip6t_REJECT.man
+++ b/extensions/libip6t_REJECT.man
@@ -30,3 +30,18 @@  TCP RST packet to be sent back.  This is mainly useful for blocking
 hosts (which won't accept your mail otherwise).
 \fBtcp\-reset\fP
 can only be used with kernel versions 2.6.14 or later.
+.PP
+\fIWarning:\fP if you are using connection tracking and \fBACCEPT\fP'ing
+\fBESTABLISHED\fP (and possibly \fBRELATED\fP) state packets, do not
+indiscriminately \fBREJECT\fP (especially with \fITCP RST\fP) \fBINVALID\fP
+state packets.  Sometimes naturally occurring packet reordering will result
+in packets being considered \fBINVALID\fP and the generated \fITCP RST\fP
+will abort an otherwise healthy connection.
+.P
+Suggested use:
+.br
+    -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+.br
+    -A INPUT -m state --state INVALID -j DROP
+.br
+(and -j REJECT rules go here at the end)
diff --git a/extensions/libipt_REJECT.man b/extensions/libipt_REJECT.man
index 8a360ce7..d0f0f19b 100644
--- a/extensions/libipt_REJECT.man
+++ b/extensions/libipt_REJECT.man
@@ -30,3 +30,18 @@  TCP RST packet to be sent back.  This is mainly useful for blocking
 hosts (which won't accept your mail otherwise).
 .IP
 (*) Using icmp\-admin\-prohibited with kernels that do not support it will result in a plain DROP instead of REJECT
+.PP
+\fIWarning:\fP if you are using connection tracking and \fBACCEPT\fP'ing
+\fBESTABLISHED\fP (and possibly \fBRELATED\fP) state packets, do not
+indiscriminately \fBREJECT\fP (especially with \fITCP RST\fP) \fBINVALID\fP
+state packets.  Sometimes naturally occurring packet reordering will result
+in packets being considered \fBINVALID\fP and the generated \fITCP RST\fP
+will abort an otherwise healthy connection.
+.P
+Suggested use:
+.br
+    -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+.br
+    -A INPUT -m state --state INVALID -j DROP
+.br
+(and -j REJECT rules go here at the end)