diff mbox series

[net-next,v4,4/4] netfilter: nfnetlink: Handle ACK flags for batch messages

Message ID 20240418104737.77914-5-donald.hunter@gmail.com
State Awaiting Upstream
Headers show
Series netlink: Add nftables spec w/ multi messages | expand

Commit Message

Donald Hunter April 18, 2024, 10:47 a.m. UTC
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
messages. This is a problem for ynl which wants to receive an ack for
every message it sends, not just the commands in between the begin/end
messages.

Add processing for ACKs for begin/end messages and provide responses
when requested.

I have checked that iproute2, pyroute2 and systemd are unaffected by
this change since none of them use NLM_F_ACK for batch begin/end.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
---
 net/netfilter/nfnetlink.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Pablo Neira Ayuso April 18, 2024, 4:30 p.m. UTC | #1
Hi Donald,

Apologies for a bit late jumping back on this, it took me a while.

On Thu, Apr 18, 2024 at 11:47:37AM +0100, Donald Hunter wrote:
> The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
> messages. This is a problem for ynl which wants to receive an ack for
> every message it sends, not just the commands in between the begin/end
> messages.

Just a side note: Turning on NLM_F_ACK for every message fills up the
receiver buffer very quickly, leading to ENOBUFS. Netlink, in general,
supports batching (with non-atomic semantics), I did not look at ynl
in detail, if it does send() + recv() for each message for other
subsystem then fine, but if it uses batching to amortize the cost of
the syscall then this explicit ACK could be an issue with very large
batches.

Out of curiosity: Why does the tool need an explicit ack for each
command? As mentioned above, this consumes a lot netlink bandwidth.

> Add processing for ACKs for begin/end messages and provide responses
> when requested.
> 
> I have checked that iproute2, pyroute2 and systemd are unaffected by
> this change since none of them use NLM_F_ACK for batch begin/end.

nitpick: Quick grep shows me iproute2 does not use the nfnetlink
subsystem? If I am correct, maybe remove this.

Thanks!

One more comment below.

> Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
> ---
>  net/netfilter/nfnetlink.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index c9fbe0f707b5..4abf660c7baf 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -427,6 +427,9 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
>  
>  	nfnl_unlock(subsys_id);
>  
> +	if (nlh->nlmsg_flags & NLM_F_ACK)
> +		nfnl_err_add(&err_list, nlh, 0, &extack);
> +
>  	while (skb->len >= nlmsg_total_size(0)) {
>  		int msglen, type;
>  
> @@ -573,6 +576,8 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
>  		} else if (err) {
>  			ss->abort(net, oskb, NFNL_ABORT_NONE);
>  			netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL);
> +		} else if (nlh->nlmsg_flags & NLM_F_ACK) {
> +			nfnl_err_add(&err_list, nlh, 0, &extack);
>  		}
>  	} else {
>  		enum nfnl_abort_action abort_action;
> -- 
> 2.44.0
>
Jakub Kicinski April 18, 2024, 4:51 p.m. UTC | #2
On Thu, 18 Apr 2024 18:30:55 +0200 Pablo Neira Ayuso wrote:
> Out of curiosity: Why does the tool need an explicit ack for each
> command? As mentioned above, this consumes a lot netlink bandwidth.

I think that the tool is sort of besides the point, it's just a PoC.
The point is that we're trying to describe netlink protocols in machine
readable fashion. Which in turn makes it possible to write netlink
binding generators in any language, like modern RPC frameworks.
For that to work we need protocol basics to be followed.

That's not to say that we're going to force all netlink families to
change to follow extra new rules. Just those that want to be accessed
via the bindings.
Donald Hunter April 19, 2024, 11:20 a.m. UTC | #3
Pablo Neira Ayuso <pablo@netfilter.org> writes:

> Hi Donald,
>
> Apologies for a bit late jumping back on this, it took me a while.
>
> On Thu, Apr 18, 2024 at 11:47:37AM +0100, Donald Hunter wrote:
>> The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
>> messages. This is a problem for ynl which wants to receive an ack for
>> every message it sends, not just the commands in between the begin/end
>> messages.
>
> Just a side note: Turning on NLM_F_ACK for every message fills up the
> receiver buffer very quickly, leading to ENOBUFS. Netlink, in general,
> supports batching (with non-atomic semantics), I did not look at ynl
> in detail, if it does send() + recv() for each message for other
> subsystem then fine, but if it uses batching to amortize the cost of
> the syscall then this explicit ACK could be an issue with very large
> batches.

ynl is batching the messages for send() and will accept batches from
recv() but nfnetlink is sending each ack separately. It is using
netlink_ack() which uses a new skb for each message, for example:

sudo strace -e sendto,recvfrom ./tools/net/ynl/cli.py \
     --spec Documentation/netlink/specs/nftables.yaml \
     --multi batch-begin '{"res-id": 10}' \
     --multi newtable '{"name": "test", "nfgen-family": 1}' \
     --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
     --multi batch-end '{"res-id": 10}'
sendto(3, [[{nlmsg_len=20, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28254, nlmsg_pid=0}, "\x00\x00\x00\x0a"], [{nlmsg_len=32, nlmsg_type=0xa00 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28255, nlmsg_pid=0}, "\x01\x00\x00\x00\x09\x00\x01\x00\x74\x65\x73\x74\x00\x00\x00\x00"], [{nlmsg_len=44, nlmsg_type=0xa03 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28256, nlmsg_pid=0}, "\x01\x00\x00\x00\x0a\x00\x03\x00\x63\x68\x61\x69\x6e\x00\x00\x00\x09\x00\x01\x00\x74\x65\x73\x74\x00\x00\x00\x00"], [{nlmsg_len=20, nlmsg_type=0x11 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28257, nlmsg_pid=0}, "\x00\x00\x00\x0a"]], 116, 0, NULL, 0) = 116
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28254, nlmsg_pid=997}, {error=0, msg={nlmsg_len=20, nlmsg_type=NFNL_MSG_BATCH_BEGIN, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28254, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28255, nlmsg_pid=997}, {error=0, msg={nlmsg_len=32, nlmsg_type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWTABLE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28255, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28256, nlmsg_pid=997}, {error=0, msg={nlmsg_len=44, nlmsg_type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28256, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28257, nlmsg_pid=997}, {error=0, msg={nlmsg_len=20, nlmsg_type=NFNL_MSG_BATCH_END, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28257, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36

> Out of curiosity: Why does the tool need an explicit ack for each
> command? As mentioned above, this consumes a lot netlink bandwidth.

For the ynl python library, I guess it was a design choice to request an
ack for each command.

Since the Netlink API allows a user to request acks, it seems necessary
to be consistent about providing them. Otherwise we'd need to extend the
netlink message specs to say which messages are ack capable and which
are not.

>> Add processing for ACKs for begin/end messages and provide responses
>> when requested.
>> 
>> I have checked that iproute2, pyroute2 and systemd are unaffected by
>> this change since none of them use NLM_F_ACK for batch begin/end.
>
> nitpick: Quick grep shows me iproute2 does not use the nfnetlink
> subsystem? If I am correct, maybe remove this.

Yeah, my mistake. I did check iproute2 but didn't mean to add it to the
list. For nft, NFNL_MSG_BATCH_* usage is contained in libnftnl from what
I can see. I'll update the patch.

> Thanks!
>
> One more comment below.

Did you miss adding a comment?

>
>> Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
>> ---
>>  net/netfilter/nfnetlink.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>> 
>> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
>> index c9fbe0f707b5..4abf660c7baf 100644
>> --- a/net/netfilter/nfnetlink.c
>> +++ b/net/netfilter/nfnetlink.c
>> @@ -427,6 +427,9 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
>>  
>>  	nfnl_unlock(subsys_id);
>>  
>> +	if (nlh->nlmsg_flags & NLM_F_ACK)
>> +		nfnl_err_add(&err_list, nlh, 0, &extack);
>> +
>>  	while (skb->len >= nlmsg_total_size(0)) {
>>  		int msglen, type;
>>  
>> @@ -573,6 +576,8 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
>>  		} else if (err) {
>>  			ss->abort(net, oskb, NFNL_ABORT_NONE);
>>  			netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL);
>> +		} else if (nlh->nlmsg_flags & NLM_F_ACK) {
>> +			nfnl_err_add(&err_list, nlh, 0, &extack);
>>  		}
>>  	} else {
>>  		enum nfnl_abort_action abort_action;
>> -- 
>> 2.44.0
>>
Jakub Kicinski April 22, 2024, 8:33 p.m. UTC | #4
On Thu, 18 Apr 2024 09:51:53 -0700 Jakub Kicinski wrote:
> On Thu, 18 Apr 2024 18:30:55 +0200 Pablo Neira Ayuso wrote:
> > Out of curiosity: Why does the tool need an explicit ack for each
> > command? As mentioned above, this consumes a lot netlink bandwidth.  
> 
> I think that the tool is sort of besides the point, it's just a PoC.
> The point is that we're trying to describe netlink protocols in machine
> readable fashion. Which in turn makes it possible to write netlink
> binding generators in any language, like modern RPC frameworks.
> For that to work we need protocol basics to be followed.
> 
> That's not to say that we're going to force all netlink families to
> change to follow extra new rules. Just those that want to be accessed
> via the bindings.

Pablo, any thoughts? Convinced? Given this touches YNL in significant
ways I'd prefer to merge it to net-next.
Pablo Neira Ayuso April 22, 2024, 10:55 p.m. UTC | #5
On Fri, Apr 19, 2024 at 12:20:25PM +0100, Donald Hunter wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> writes:
> 
> > Hi Donald,
> >
> > Apologies for a bit late jumping back on this, it took me a while.
> >
> > On Thu, Apr 18, 2024 at 11:47:37AM +0100, Donald Hunter wrote:
> >> The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
> >> messages. This is a problem for ynl which wants to receive an ack for
> >> every message it sends, not just the commands in between the begin/end
> >> messages.
> >
> > Just a side note: Turning on NLM_F_ACK for every message fills up the
> > receiver buffer very quickly, leading to ENOBUFS. Netlink, in general,
> > supports batching (with non-atomic semantics), I did not look at ynl
> > in detail, if it does send() + recv() for each message for other
> > subsystem then fine, but if it uses batching to amortize the cost of
> > the syscall then this explicit ACK could be an issue with very large
> > batches.
> 
> ynl is batching the messages for send() and will accept batches from
> recv() but nfnetlink is sending each ack separately.

Yes, like it happens with other existing netlink interfaces when
batching is used, nfnetlink is no different in such case.

> It is using netlink_ack() which uses a new skb for each message, for
> example:
> 
> sudo strace -e sendto,recvfrom ./tools/net/ynl/cli.py \
>      --spec Documentation/netlink/specs/nftables.yaml \
>      --multi batch-begin '{"res-id": 10}' \
>      --multi newtable '{"name": "test", "nfgen-family": 1}' \
>      --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
>      --multi batch-end '{"res-id": 10}'
> sendto(3, [[{nlmsg_len=20, nlmsg_type=0x10 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28254, nlmsg_pid=0}, "\x00\x00\x00\x0a"], [{nlmsg_len=32, nlmsg_type=0xa00 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28255, nlmsg_pid=0}, "\x01\x00\x00\x00\x09\x00\x01\x00\x74\x65\x73\x74\x00\x00\x00\x00"], [{nlmsg_len=44, nlmsg_type=0xa03 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28256, nlmsg_pid=0}, "\x01\x00\x00\x00\x0a\x00\x03\x00\x63\x68\x61\x69\x6e\x00\x00\x00\x09\x00\x01\x00\x74\x65\x73\x74\x00\x00\x00\x00"], [{nlmsg_len=20, nlmsg_type=0x11 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28257, nlmsg_pid=0}, "\x00\x00\x00\x0a"]], 116, 0, NULL, 0) = 116
> recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28254, nlmsg_pid=997}, {error=0, msg={nlmsg_len=20, nlmsg_type=NFNL_MSG_BATCH_BEGIN, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28254, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
> recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28255, nlmsg_pid=997}, {error=0, msg={nlmsg_len=32, nlmsg_type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWTABLE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28255, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
> recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28256, nlmsg_pid=997}, {error=0, msg={nlmsg_len=44, nlmsg_type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_NEWCHAIN, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28256, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
> recvfrom(3, [{nlmsg_len=36, nlmsg_type=NLMSG_ERROR, nlmsg_flags=NLM_F_CAPPED, nlmsg_seq=28257, nlmsg_pid=997}, {error=0, msg={nlmsg_len=20, nlmsg_type=NFNL_MSG_BATCH_END, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, nlmsg_seq=28257, nlmsg_pid=0}}], 131072, 0, NULL, NULL) = 36
> 
> > Out of curiosity: Why does the tool need an explicit ack for each
> > command? As mentioned above, this consumes a lot netlink bandwidth.
> 
> For the ynl python library, I guess it was a design choice to request an
> ack for each command.
> 
> Since the Netlink API allows a user to request acks, it seems necessary
> to be consistent about providing them. Otherwise we'd need to extend the
> netlink message specs to say which messages are ack capable and which
> are not.

I see.

> >> Add processing for ACKs for begin/end messages and provide responses
> >> when requested.
> >> 
> >> I have checked that iproute2, pyroute2 and systemd are unaffected by
> >> this change since none of them use NLM_F_ACK for batch begin/end.
> >
> > nitpick: Quick grep shows me iproute2 does not use the nfnetlink
> > subsystem? If I am correct, maybe remove this.
> 
> Yeah, my mistake. I did check iproute2 but didn't mean to add it to the
> list. For nft, NFNL_MSG_BATCH_* usage is contained in libnftnl from what
> I can see. I'll update the patch.
> 
> > Thanks!
> >
> > One more comment below.
> 
> Did you miss adding a comment?

No more comments, thanks.
Pablo Neira Ayuso April 22, 2024, 10:56 p.m. UTC | #6
On Mon, Apr 22, 2024 at 01:33:28PM -0700, Jakub Kicinski wrote:
> On Thu, 18 Apr 2024 09:51:53 -0700 Jakub Kicinski wrote:
> > On Thu, 18 Apr 2024 18:30:55 +0200 Pablo Neira Ayuso wrote:
> > > Out of curiosity: Why does the tool need an explicit ack for each
> > > command? As mentioned above, this consumes a lot netlink bandwidth.  
> > 
> > I think that the tool is sort of besides the point, it's just a PoC.
> > The point is that we're trying to describe netlink protocols in machine
> > readable fashion. Which in turn makes it possible to write netlink
> > binding generators in any language, like modern RPC frameworks.
> > For that to work we need protocol basics to be followed.
> > 
> > That's not to say that we're going to force all netlink families to
> > change to follow extra new rules. Just those that want to be accessed
> > via the bindings.
> 
> Pablo, any thoughts? Convinced? Given this touches YNL in significant
> ways I'd prefer to merge it to net-next.

No objections, thanks for asking.
diff mbox series

Patch

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index c9fbe0f707b5..4abf660c7baf 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -427,6 +427,9 @@  static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	nfnl_unlock(subsys_id);
 
+	if (nlh->nlmsg_flags & NLM_F_ACK)
+		nfnl_err_add(&err_list, nlh, 0, &extack);
+
 	while (skb->len >= nlmsg_total_size(0)) {
 		int msglen, type;
 
@@ -573,6 +576,8 @@  static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
 		} else if (err) {
 			ss->abort(net, oskb, NFNL_ABORT_NONE);
 			netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL);
+		} else if (nlh->nlmsg_flags & NLM_F_ACK) {
+			nfnl_err_add(&err_list, nlh, 0, &extack);
 		}
 	} else {
 		enum nfnl_abort_action abort_action;