diff mbox series

[v3] af_netlink: ensure that NLMSG_DONE never fails in dumps

Message ID 20171109014218.20562-1-Jason@zx2c4.com
State Superseded, archived
Delegated to: David Miller
Headers show
Series [v3] af_netlink: ensure that NLMSG_DONE never fails in dumps | expand

Commit Message

Jason A. Donenfeld Nov. 9, 2017, 1:42 a.m. UTC
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued
up for stable.

Changes v2->v3:
  - Johannes didn't like the subject line of the patch, so the only
    thing that's changed in this version is the new subject line.

 net/netlink/af_netlink.c | 14 ++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

Comments

Johannes Berg Nov. 9, 2017, 2:02 a.m. UTC | #1
On Thu, 2017-11-09 at 10:42 +0900, Jason A. Donenfeld wrote:
> +++ b/net/netlink/af_netlink.c
> @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
>  	struct sk_buff *skb = NULL;
>  	struct nlmsghdr *nlh;
>  	struct module *module;
> -	int len, err = -ENOBUFS;
> +	int err = -ENOBUFS;
>  	int alloc_min_size;
>  	int alloc_size;
>  
> @@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
>  	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
>  	netlink_skb_set_owner_r(skb, sk);
>  
> -	len = cb->dump(skb, cb);
> +	if (nlk->dump_done_errno > 0)
> +		nlk->dump_done_errno = cb->dump(skb, cb);
>  
> -	if (len > 0) {
> +	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
>  		mutex_unlock(nlk->cb_mutex);

nit: I think your line got a little long here :)
 
> -	memcpy(nlmsg_data(nlh), &len, sizeof(len));
> +	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));

and here

> +	nlk->dump_done_errno = INT_MAX;

I guess positive values aren't really returned from dump?

johannes
Jason A. Donenfeld Nov. 9, 2017, 2:57 a.m. UTC | #2
On Thu, Nov 9, 2017 at 11:02 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> nit: I think your line got a little long here :)

Ack. For v4.

> and here

Ack. For v4.

>
>> +     nlk->dump_done_errno = INT_MAX;
>
> I guess positive values aren't really returned from dump?

When a positive value is returned, the API uses this to signify that
the dump is not done, and it should be called again. When 0 or
negative is returned, it means the dump is complete and the return
value should be returned in DONE. We therefore initialize
dump_done_errno to some positive value, so that we can make calling
->dump() conditional on it being positive. When it becomes zero or
negative, it's time to return. This mirrors the semantics of the
actual ->dump() function precisely.
diff mbox series

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@  static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,10 @@  static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@  static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2274,7 @@  int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@  struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;