Message ID | 20220409094402.22567-1-toiwoton@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Headers | show |
Series | doc: Document that kernel may accept unimplemented expressions | expand |
Topi Miettinen <toiwoton@gmail.com> wrote: > Kernel silently accepts input chain filters using meta skuid, meta > skgid, meta cgroup or socket cgroupv2 expressions but they don't work > yet. Warn the users of this possibility. > > Signed-off-by: Topi Miettinen <toiwoton@gmail.com> > --- > doc/nft.txt | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/doc/nft.txt b/doc/nft.txt > index f7a53ac9..4820b4ae 100644 > --- a/doc/nft.txt > +++ b/doc/nft.txt > @@ -932,6 +932,11 @@ filter output oif wlan0 > ^^^^^^^^^^^^^^^^^^^^^^^ > --------------------------------- > > +Note that the kernel may accept expressions without errors even if it > +doesn't implement the feature. For example, input chain filters using > +*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* > +expressions are silently accepted but they don't work yet. Thats not correct. Those expressions load values from skb->sk, i.e. the socket associated with the packet. In input, such socket may exist, either because of tproxy rules, early demux, or bpf programs.
On 9.4.2022 12.51, Florian Westphal wrote: > Topi Miettinen <toiwoton@gmail.com> wrote: >> Kernel silently accepts input chain filters using meta skuid, meta >> skgid, meta cgroup or socket cgroupv2 expressions but they don't work >> yet. Warn the users of this possibility. >> >> Signed-off-by: Topi Miettinen <toiwoton@gmail.com> >> --- >> doc/nft.txt | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/doc/nft.txt b/doc/nft.txt >> index f7a53ac9..4820b4ae 100644 >> --- a/doc/nft.txt >> +++ b/doc/nft.txt >> @@ -932,6 +932,11 @@ filter output oif wlan0 >> ^^^^^^^^^^^^^^^^^^^^^^^ >> --------------------------------- >> >> +Note that the kernel may accept expressions without errors even if it >> +doesn't implement the feature. For example, input chain filters using >> +*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* >> +expressions are silently accepted but they don't work yet. > > Thats not correct. > > Those expressions load values from skb->sk, i.e. the socket associated > with the packet. > > In input, such socket may exist, either because of tproxy rules, early > demux, or bpf programs. Thanks. How about: Note that the kernel may accept expressions without errors even if it doesn't implement the feature. For example, input chain filters using *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* expressions are silently accepted but they don't work yet, except when used with *tproxy* rules, early demultiplexing or BPF programs. Could you please explain this 'early demux' concept? Is this something that can be triggered with NFT rules, kernel configuration etc? I can't find any documentation. -Topi
Topi Miettinen <toiwoton@gmail.com> wrote: > Note that the kernel may accept expressions without errors even if it > doesn't implement the feature. For example, input chain filters using > *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* > expressions are silently accepted but they don't work yet, except when used > with *tproxy* rules, early demultiplexing or BPF programs. This is what iptables-extensions(8) says: IMPORTANT: when being used in the INPUT chain, the cgroup matcher is currently only of limited functionality, meaning it will only match on packets that are processed for local sockets through early socket demuxing. Therefore, general usage on the INPUT chain is not advised unless the implications are well understood. For nftables, this is true for all meta types that use skb->sk internally, such as skuid, skgid, cgroup, ... > Could you please explain this 'early demux' concept? Is this something that > can be triggered with NFT rules, kernel configuration etc? I can't find any > documentation. sysctl. net.ipv4.ip_early_demux = 1 net.ipv4.tcp_early_demux = 1 net.ipv4.udp_early_demux = 1 This is a performance optimization, tcp edemux only works for sockets in established state, udp demux has restrictions as well. So, no guarantee that this will set the socket reliably, hence the paragraph in the iptables-extensions manpage.
On 9.4.2022 13.22, Florian Westphal wrote: > Topi Miettinen <toiwoton@gmail.com> wrote: >> Note that the kernel may accept expressions without errors even if it >> doesn't implement the feature. For example, input chain filters using >> *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* >> expressions are silently accepted but they don't work yet, except when used >> with *tproxy* rules, early demultiplexing or BPF programs. > > This is what iptables-extensions(8) says: > > IMPORTANT: when being used in the INPUT chain, the cgroup matcher is currently only of limited functionality, meaning it will only match on packets that are processed for local sockets through early socket demuxing. Therefore, general usage on the INPUT chain is not advised unless the implications > are well understood. > > For nftables, this is true for all meta types that use skb->sk internally, > such as skuid, skgid, cgroup, ... > >> Could you please explain this 'early demux' concept? Is this something that >> can be triggered with NFT rules, kernel configuration etc? I can't find any >> documentation. > > sysctl. > net.ipv4.ip_early_demux = 1 > net.ipv4.tcp_early_demux = 1 > net.ipv4.udp_early_demux = 1 > > This is a performance optimization, tcp edemux only works for > sockets in established state, udp demux has restrictions as well. > > So, no guarantee that this will set the socket reliably, hence the > paragraph in the iptables-extensions manpage. Thanks. From this blog post I suppose the problem is that NFT rules aren't checked after final demux: https://www.privateinternetaccess.com/blog/linux-networking-stack-from-the-ground-up-part-4-2/ Would it be possible to add such checks in the future? What about: Note that the kernel may accept expressions without errors even if it doesn't implement the feature. For example, input chain filters using expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* are silently accepted but they don't work reliably yet, except when used with *tproxy* rules, early demultiplexing (available only for TCP for sockets in established state and UDP demux has restrictions as well) or BPF programs. -Topi
Topi Miettinen <toiwoton@gmail.com> wrote: > Would it be possible to add such checks in the future? We could add socket skuid, socket skgid, its not hard. > Note that the kernel may accept expressions without errors even if it > doesn't implement the feature. For example, input chain filters using > expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or Those can not be made to work. > *socket cgroupv2* are silently accepted but they don't work reliably socket should work, at least for tcp and udp. The cgroupv2 is buggy. I sent a patch, feel free to test it.
On 9.4.2022 14.42, Florian Westphal wrote: > Topi Miettinen <toiwoton@gmail.com> wrote: >> Would it be possible to add such checks in the future? > > We could add socket skuid, socket skgid, its not hard. That would be nice. Could the syntax still remain 'meta skuid' even though the credentials come from a socket for compatibility? >> Note that the kernel may accept expressions without errors even if it >> doesn't implement the feature. For example, input chain filters using >> expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or > > Those can not be made to work. > >> *socket cgroupv2* are silently accepted but they don't work reliably > > socket should work, at least for tcp and udp. > The cgroupv2 is buggy. I sent a patch, feel free to test it. Once the patch is applied, the warnings in manual page wrt. cgroupv2 would only apply to old kernels. How about the following: Note that different kernel versions may accept expressions without errors even if they don't implement the feature. For example, input chain filters using expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* are silently accepted but they may not work reliably or at all. -Topi
On Sat, Apr 09, 2022 at 04:01:48PM +0300, Topi Miettinen wrote: > On 9.4.2022 14.42, Florian Westphal wrote: > > Topi Miettinen <toiwoton@gmail.com> wrote: > > > Would it be possible to add such checks in the future? > > > > We could add socket skuid, socket skgid, its not hard. > > That would be nice. Could the syntax still remain 'meta skuid' even though > the credentials come from a socket for compatibility? > > > > Note that the kernel may accept expressions without errors even if it > > > doesn't implement the feature. For example, input chain filters using > > > expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or > > > > Those can not be made to work. > > > > > *socket cgroupv2* are silently accepted but they don't work reliably > > > > socket should work, at least for tcp and udp. > > The cgroupv2 is buggy. I sent a patch, feel free to test it. > > Once the patch is applied, the warnings in manual page wrt. cgroupv2 would > only apply to old kernels. How about the following: > > Note that different kernel versions may accept expressions without errors > even if they don't implement the feature. For example, input chain filters > using expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or > *socket cgroupv2* are silently accepted but they may not work reliably or at > all. Wrt this fix, it will be passed to -stable. Regarding general use of socket match from input: Probably more documentation on what kind of sockets early demux is actually being attached to might help understand how this is working.
diff --git a/doc/nft.txt b/doc/nft.txt index f7a53ac9..4820b4ae 100644 --- a/doc/nft.txt +++ b/doc/nft.txt @@ -932,6 +932,11 @@ filter output oif wlan0 ^^^^^^^^^^^^^^^^^^^^^^^ --------------------------------- +Note that the kernel may accept expressions without errors even if it +doesn't implement the feature. For example, input chain filters using +*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2* +expressions are silently accepted but they don't work yet. + EXIT STATUS ----------- On success, nft exits with a status of 0. Unspecified errors cause it to exit
Kernel silently accepts input chain filters using meta skuid, meta skgid, meta cgroup or socket cgroupv2 expressions but they don't work yet. Warn the users of this possibility. Signed-off-by: Topi Miettinen <toiwoton@gmail.com> --- doc/nft.txt | 5 +++++ 1 file changed, 5 insertions(+) base-commit: 6fa4ff56385831f01bd9d993178969a4eddbcdbf