diff mbox series

doc: Document that kernel may accept unimplemented expressions

Message ID 20220409094402.22567-1-toiwoton@gmail.com
State Changes Requested
Delegated to: Pablo Neira
Headers show
Series doc: Document that kernel may accept unimplemented expressions | expand

Commit Message

Topi Miettinen April 9, 2022, 9:44 a.m. UTC
Kernel silently accepts input chain filters using meta skuid, meta
skgid, meta cgroup or socket cgroupv2 expressions but they don't work
yet. Warn the users of this possibility.

Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
 doc/nft.txt | 5 +++++
 1 file changed, 5 insertions(+)


base-commit: 6fa4ff56385831f01bd9d993178969a4eddbcdbf

Comments

Florian Westphal April 9, 2022, 9:51 a.m. UTC | #1
Topi Miettinen <toiwoton@gmail.com> wrote:
> Kernel silently accepts input chain filters using meta skuid, meta
> skgid, meta cgroup or socket cgroupv2 expressions but they don't work
> yet. Warn the users of this possibility.
> 
> Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
> ---
>  doc/nft.txt | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/nft.txt b/doc/nft.txt
> index f7a53ac9..4820b4ae 100644
> --- a/doc/nft.txt
> +++ b/doc/nft.txt
> @@ -932,6 +932,11 @@ filter output oif wlan0
>  ^^^^^^^^^^^^^^^^^^^^^^^
>  ---------------------------------
>  
> +Note that the kernel may accept expressions without errors even if it
> +doesn't implement the feature. For example, input chain filters using
> +*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
> +expressions are silently accepted but they don't work yet.

Thats not correct.

Those expressions load values from skb->sk, i.e. the socket associated
with the packet.

In input, such socket may exist, either because of tproxy rules, early
demux, or bpf programs.
Topi Miettinen April 9, 2022, 10:10 a.m. UTC | #2
On 9.4.2022 12.51, Florian Westphal wrote:
> Topi Miettinen <toiwoton@gmail.com> wrote:
>> Kernel silently accepts input chain filters using meta skuid, meta
>> skgid, meta cgroup or socket cgroupv2 expressions but they don't work
>> yet. Warn the users of this possibility.
>>
>> Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
>> ---
>>   doc/nft.txt | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/doc/nft.txt b/doc/nft.txt
>> index f7a53ac9..4820b4ae 100644
>> --- a/doc/nft.txt
>> +++ b/doc/nft.txt
>> @@ -932,6 +932,11 @@ filter output oif wlan0
>>   ^^^^^^^^^^^^^^^^^^^^^^^
>>   ---------------------------------
>>   
>> +Note that the kernel may accept expressions without errors even if it
>> +doesn't implement the feature. For example, input chain filters using
>> +*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
>> +expressions are silently accepted but they don't work yet.
> 
> Thats not correct.
> 
> Those expressions load values from skb->sk, i.e. the socket associated
> with the packet.
> 
> In input, such socket may exist, either because of tproxy rules, early
> demux, or bpf programs.

Thanks. How about:
Note that the kernel may accept expressions without errors even if it
doesn't implement the feature. For example, input chain filters using
*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
expressions are silently accepted but they don't work yet, except when 
used with *tproxy* rules, early demultiplexing or BPF programs.

Could you please explain this 'early demux' concept? Is this something 
that can be triggered with NFT rules, kernel configuration etc? I can't 
find any documentation.

-Topi
Florian Westphal April 9, 2022, 10:22 a.m. UTC | #3
Topi Miettinen <toiwoton@gmail.com> wrote:
> Note that the kernel may accept expressions without errors even if it
> doesn't implement the feature. For example, input chain filters using
> *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
> expressions are silently accepted but they don't work yet, except when used
> with *tproxy* rules, early demultiplexing or BPF programs.

This is what iptables-extensions(8) says:

IMPORTANT:  when  being  used in the INPUT chain, the cgroup matcher is currently only of limited functionality, meaning it will only match on packets that are processed for local sockets through early socket demuxing. Therefore, general usage on the INPUT chain is not advised unless the implications
are well understood.

For nftables, this is true for all meta types that use skb->sk internally,
such as skuid, skgid, cgroup, ...

> Could you please explain this 'early demux' concept? Is this something that
> can be triggered with NFT rules, kernel configuration etc? I can't find any
> documentation.

sysctl.
net.ipv4.ip_early_demux = 1
net.ipv4.tcp_early_demux = 1
net.ipv4.udp_early_demux = 1

This is a performance optimization, tcp edemux only works for
sockets in established state, udp demux has restrictions as well.

So, no guarantee that this will set the socket reliably, hence the
paragraph in the iptables-extensions manpage.
Topi Miettinen April 9, 2022, 10:43 a.m. UTC | #4
On 9.4.2022 13.22, Florian Westphal wrote:
> Topi Miettinen <toiwoton@gmail.com> wrote:
>> Note that the kernel may accept expressions without errors even if it
>> doesn't implement the feature. For example, input chain filters using
>> *meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
>> expressions are silently accepted but they don't work yet, except when used
>> with *tproxy* rules, early demultiplexing or BPF programs.
> 
> This is what iptables-extensions(8) says:
> 
> IMPORTANT:  when  being  used in the INPUT chain, the cgroup matcher is currently only of limited functionality, meaning it will only match on packets that are processed for local sockets through early socket demuxing. Therefore, general usage on the INPUT chain is not advised unless the implications
> are well understood.
> 
> For nftables, this is true for all meta types that use skb->sk internally,
> such as skuid, skgid, cgroup, ...
> 
>> Could you please explain this 'early demux' concept? Is this something that
>> can be triggered with NFT rules, kernel configuration etc? I can't find any
>> documentation.
> 
> sysctl.
> net.ipv4.ip_early_demux = 1
> net.ipv4.tcp_early_demux = 1
> net.ipv4.udp_early_demux = 1
> 
> This is a performance optimization, tcp edemux only works for
> sockets in established state, udp demux has restrictions as well.
> 
> So, no guarantee that this will set the socket reliably, hence the
> paragraph in the iptables-extensions manpage.


Thanks. From this blog post I suppose the problem is that NFT rules 
aren't checked after final demux:
https://www.privateinternetaccess.com/blog/linux-networking-stack-from-the-ground-up-part-4-2/

Would it be possible to add such checks in the future?

What about:

Note that the kernel may accept expressions without errors even if it
doesn't implement the feature. For example, input chain filters using
expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or
*socket cgroupv2* are silently accepted but they don't work reliably
yet, except when used with *tproxy* rules, early demultiplexing
(available only for TCP for sockets in established state and UDP demux
has restrictions as well) or BPF programs.

-Topi
Florian Westphal April 9, 2022, 11:42 a.m. UTC | #5
Topi Miettinen <toiwoton@gmail.com> wrote:
> Would it be possible to add such checks in the future?

We could add socket skuid, socket skgid, its not hard.

> Note that the kernel may accept expressions without errors even if it
> doesn't implement the feature. For example, input chain filters using
> expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or

Those can not be made to work.

> *socket cgroupv2* are silently accepted but they don't work reliably

socket should work, at least for tcp and udp.
The cgroupv2 is buggy.  I sent a patch, feel free to test it.
Topi Miettinen April 9, 2022, 1:01 p.m. UTC | #6
On 9.4.2022 14.42, Florian Westphal wrote:
> Topi Miettinen <toiwoton@gmail.com> wrote:
>> Would it be possible to add such checks in the future?
> 
> We could add socket skuid, socket skgid, its not hard.

That would be nice. Could the syntax still remain 'meta skuid' even 
though the credentials come from a socket for compatibility?

>> Note that the kernel may accept expressions without errors even if it
>> doesn't implement the feature. For example, input chain filters using
>> expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or
> 
> Those can not be made to work.
> 
>> *socket cgroupv2* are silently accepted but they don't work reliably
> 
> socket should work, at least for tcp and udp.
> The cgroupv2 is buggy.  I sent a patch, feel free to test it.

Once the patch is applied, the warnings in manual page wrt. cgroupv2 
would only apply to old kernels. How about the following:

Note that different kernel versions may accept expressions without 
errors even if they don't implement the feature. For example, input 
chain filters using expressions such as *meta skuid*, *meta skgid*, 
*meta cgroup* or *socket cgroupv2* are silently accepted but they may 
not work reliably or at all.

-Topi
Pablo Neira Ayuso April 10, 2022, 3:16 p.m. UTC | #7
On Sat, Apr 09, 2022 at 04:01:48PM +0300, Topi Miettinen wrote:
> On 9.4.2022 14.42, Florian Westphal wrote:
> > Topi Miettinen <toiwoton@gmail.com> wrote:
> > > Would it be possible to add such checks in the future?
> > 
> > We could add socket skuid, socket skgid, its not hard.
> 
> That would be nice. Could the syntax still remain 'meta skuid' even though
> the credentials come from a socket for compatibility?
> 
> > > Note that the kernel may accept expressions without errors even if it
> > > doesn't implement the feature. For example, input chain filters using
> > > expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or
> > 
> > Those can not be made to work.
> > 
> > > *socket cgroupv2* are silently accepted but they don't work reliably
> > 
> > socket should work, at least for tcp and udp.
> > The cgroupv2 is buggy.  I sent a patch, feel free to test it.
> 
> Once the patch is applied, the warnings in manual page wrt. cgroupv2 would
> only apply to old kernels. How about the following:
> 
> Note that different kernel versions may accept expressions without errors
> even if they don't implement the feature. For example, input chain filters
> using expressions such as *meta skuid*, *meta skgid*, *meta cgroup* or
> *socket cgroupv2* are silently accepted but they may not work reliably or at
> all.

Wrt this fix, it will be passed to -stable.

Regarding general use of socket match from input: Probably more
documentation on what kind of sockets early demux is actually being
attached to might help understand how this is working.
diff mbox series

Patch

diff --git a/doc/nft.txt b/doc/nft.txt
index f7a53ac9..4820b4ae 100644
--- a/doc/nft.txt
+++ b/doc/nft.txt
@@ -932,6 +932,11 @@  filter output oif wlan0
 ^^^^^^^^^^^^^^^^^^^^^^^
 ---------------------------------
 
+Note that the kernel may accept expressions without errors even if it
+doesn't implement the feature. For example, input chain filters using
+*meta skuid*, *meta skgid*, *meta cgroup* or *socket cgroupv2*
+expressions are silently accepted but they don't work yet.
+
 EXIT STATUS
 -----------
 On success, nft exits with a status of 0. Unspecified errors cause it to exit