Message ID | 20170408202434.12018-2-johannes@sipsolutions.net |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On 4/8/17 2:24 PM, Johannes Berg wrote: > @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err) > NLMSG_ERROR, payload, 0); > errmsg = nlmsg_data(rep); > errmsg->error = err; > - memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : sizeof(*nlh)); > + memcpy(&errmsg->msg, nlh, > + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len > + : sizeof(*nlh)); > + generically this makes userspace parsing more problematic: the parsing layer may not know if the socket option has been set to precisely know the size of errmsg->msg and how much data needs to be skipped to get to the new attributes.
On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote: > On 4/8/17 2:24 PM, Johannes Berg wrote: > > @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, > > struct nlmsghdr *nlh, int err) > > NLMSG_ERROR, payload, 0); > > errmsg = nlmsg_data(rep); > > errmsg->error = err; > > - memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh- > > >nlmsg_len : sizeof(*nlh)); > > + memcpy(&errmsg->msg, nlh, > > + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len > > + : sizeof(*nlh)); > > + > > generically this makes userspace parsing more problematic: the > parsing layer may not know if the socket option has been set to > precisely know the size of errmsg->msg and how much data needs to be > skipped to get to the new attributes. Yes, I know. I'd hope that userspace can remember that per socket - I don't see a good other way to do this. If we insert the TLVs in front of, or instead of (with a TLV containing it), the request message then at least libnl's debugging will need to be changed. As it is, I can assume that libnl will not set the CAP setting, and everything works fine even if I don't change libnl, which makes things easier. Do you have any better ideas? johannes
On 4/10/17 9:30 AM, Johannes Berg wrote: > On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote: >> On 4/8/17 2:24 PM, Johannes Berg wrote: >>> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, >>> struct nlmsghdr *nlh, int err) >>> NLMSG_ERROR, payload, 0); >>> errmsg = nlmsg_data(rep); >>> errmsg->error = err; >>> - memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh- >>>> nlmsg_len : sizeof(*nlh)); >>> + memcpy(&errmsg->msg, nlh, >>> + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len >>> + : sizeof(*nlh)); >>> + >> >> generically this makes userspace parsing more problematic: the >> parsing layer may not know if the socket option has been set to >> precisely know the size of errmsg->msg and how much data needs to be >> skipped to get to the new attributes. > > Yes, I know. I'd hope that userspace can remember that per socket - I > don't see a good other way to do this. > > If we insert the TLVs in front of, or instead of (with a TLV containing > it), the request message then at least libnl's debugging will need to > be changed. > > As it is, I can assume that libnl will not set the CAP setting, and > everything works fine even if I don't change libnl, which makes things > easier. > > Do you have any better ideas? NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if one is set the other can not be set. CAP_ACK means abbreviate the response and EXT_ACK means give me more data.
On Mon, 2017-04-10 at 09:35 -0600, David Ahern wrote: > > Do you have any better ideas? > > NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if > one is set the other can not be set. CAP_ACK means abbreviate the > response and EXT_ACK means give me more data. So you mean if I want EXT_ACK I cannot also cap the message? I guess that's doable, but again wouldn't it hurt applications that want to use CAP? I assume every application wants to use EXT_ACK eventually, and those that use CAP right now wouldn't be able to. Another thought: if we add a new flag that indicates "message has been capped", which we introduce in this same patch, then we can disentangle this more easily, right? Adding a new flag for "TLVs present" won't really help, but if you know the message was capped then you know the TLVs start after the inner nlmsghdr and you ignore that header's nlmsg_len. johannes
On Mon, 2017-04-10 at 17:40 +0200, Johannes Berg wrote: > > Another thought: if we add a new flag that indicates "message has > been capped", which we introduce in this same patch, then we can > disentangle this more easily, right? > > Adding a new flag for "TLVs present" won't really help, but if you > know the message was capped then you know the TLVs start after the > inner nlmsghdr and you ignore that header's nlmsg_len. Actually, the flag should be set if (and only if) the message was capped *and* TLVs were requested (or present, doesn't matter.) That way it becomes completely backward compatible and stateless: * on kernels that don't have extack you can ignore the setsockopt failure * checking if TLVs are present becomes flag set || nlh->nlmsg_len > sizeof(*nlh) + sizeof(int) + sizeof(*inner_nlh) + inner_nlh->nlmsg_len * TLV start offset is tlv_start_offs = sizeof(*nlh) + sizeof(int) + sizeof(inner_nlh) if (flag set) tlv_start_offs += inner_nlh->nlmsg_len I need to resend anyway so I'll add that tomorrow. johannes
On Mon, Apr 10, 2017 at 09:35:27AM -0600, David Ahern wrote: > On 4/10/17 9:30 AM, Johannes Berg wrote: > > On Mon, 2017-04-10 at 09:26 -0600, David Ahern wrote: > >> On 4/8/17 2:24 PM, Johannes Berg wrote: > >>> @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, > >>> struct nlmsghdr *nlh, int err) > >>> NLMSG_ERROR, payload, 0); > >>> errmsg = nlmsg_data(rep); > >>> errmsg->error = err; > >>> - memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh- > >>>> nlmsg_len : sizeof(*nlh)); > >>> + memcpy(&errmsg->msg, nlh, > >>> + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len > >>> + : sizeof(*nlh)); > >>> + > >> > >> generically this makes userspace parsing more problematic: the > >> parsing layer may not know if the socket option has been set to > >> precisely know the size of errmsg->msg and how much data needs to be > >> skipped to get to the new attributes. > > > > Yes, I know. I'd hope that userspace can remember that per socket - I > > don't see a good other way to do this. > > > > If we insert the TLVs in front of, or instead of (with a TLV containing > > it), the request message then at least libnl's debugging will need to > > be changed. > > > > As it is, I can assume that libnl will not set the CAP setting, and > > everything works fine even if I don't change libnl, which makes things > > easier. > > > > Do you have any better ideas? > > NETLINK_F_CAP_ACK and NETLINK_F_EXT_ACK should be incompatible -- if one > is set the other can not be set. CAP_ACK means abbreviate the response > and EXT_ACK means give me more data. CAP_ACK means: trim off the payload that the netlink error message is embedding, just like ICMP error does. What is exactly your concern? If the user explicitly requests this via socket option for this socket, then we're expecting they do the right handling for what they're asking for.
On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: > CAP_ACK means: trim off the payload that the netlink error message > is embedding, just like ICMP error does. > > What is exactly your concern? If the user explicitly requests this > via socket option for this socket, then we're expecting they do the > right handling for what they're asking for. I think David's concern was that when you want to parse the ACK in a library (or application), you may not easily know if the application (or library) requested capping. I've addressed this by adding two new flags now, though the CAPPED flag can only be relied on when the TLVS flag is present, but that's the only case where you care anyway. I felt that we had enough space to spend two bits rather than one, in order to not have to rely on any length calculations to see if TLVs are present - as I'd suggested in my email last night. johannes
On 4/11/17 1:02 AM, Johannes Berg wrote: > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: >> CAP_ACK means: trim off the payload that the netlink error message >> is embedding, just like ICMP error does. >> >> What is exactly your concern? If the user explicitly requests this >> via socket option for this socket, then we're expecting they do the >> right handling for what they're asking for. > > I think David's concern was that when you want to parse the ACK in a > library (or application), you may not easily know if the application > (or library) requested capping. exactly.
On Tue, Apr 11, 2017 at 08:25:57AM -0600, David Ahern wrote: > On 4/11/17 1:02 AM, Johannes Berg wrote: > > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: > >> CAP_ACK means: trim off the payload that the netlink error message > >> is embedding, just like ICMP error does. > >> > >> What is exactly your concern? If the user explicitly requests this > >> via socket option for this socket, then we're expecting they do the > >> right handling for what they're asking for. > > > > I think David's concern was that when you want to parse the ACK in a > > library (or application), you may not easily know if the application > > (or library) requested capping. > > exactly. Then, the library needs to be extended to enable this handling to modify the way it needs to handle errors, together with the setsockopt().
From: Pablo Neira Ayuso <pablo@netfilter.org> Date: Tue, 11 Apr 2017 19:31:43 +0200 > On Tue, Apr 11, 2017 at 08:25:57AM -0600, David Ahern wrote: >> On 4/11/17 1:02 AM, Johannes Berg wrote: >> > On Tue, 2017-04-11 at 08:59 +0200, Pablo Neira Ayuso wrote: >> >> CAP_ACK means: trim off the payload that the netlink error message >> >> is embedding, just like ICMP error does. >> >> >> >> What is exactly your concern? If the user explicitly requests this >> >> via socket option for this socket, then we're expecting they do the >> >> right handling for what they're asking for. >> > >> > I think David's concern was that when you want to parse the ACK in a >> > library (or application), you may not easily know if the application >> > (or library) requested capping. >> >> exactly. > > Then, the library needs to be extended to enable this handling to > modify the way it needs to handle errors, together with the > setsockopt(). That's my take on this. If there are libraries where there is a disconnect between the thing that controls the sending of the netlink request from the processing of the netlink response, that really is their problem to work out. If they wish to support extended ACK reports, they will need to sort those details out. If there is sharing of a newlink socket between different libraries, each wanting to operate in a different mode, that isn't all that reasonable to me. Often libraries can't even agree on whether they want to use a socket fd in non-blocking vs. blocking mode. David, if you have a specific case where it's absolutely impossible to resolve this when the library is converted to support extended ACKs, please mention it. Thanks.
On 4/11/17 11:42 AM, David Miller wrote: > David, if you have a specific case where it's absolutely impossible > to resolve this when the library is converted to support extended > ACKs, please mention it. I don't have a specific library in mind. It is more the disjoint nature of a socket option and message parsing, and thinking through the API.
From: David Ahern <dsa@cumulusnetworks.com> Date: Tue, 11 Apr 2017 12:57:59 -0600 > On 4/11/17 11:42 AM, David Miller wrote: >> David, if you have a specific case where it's absolutely impossible >> to resolve this when the library is converted to support extended >> ACKs, please mention it. > > I don't have a specific library in mind. It is more the disjoint nature > of a socket option and message parsing, and thinking through the API. Would you prefer it being passed via a control message at sendmsg() time? That would require more changes to userland programs, and also increase the per-request overhead in the kernel as it parses this CMSG every single time. We have a setsockopt for AF_PACKET that changes the entire layout of the packet mmap ring, nobody complains that libraries are hard to implement because of that right?
On 4/11/17 1:05 PM, David Miller wrote: > From: David Ahern <dsa@cumulusnetworks.com> > Date: Tue, 11 Apr 2017 12:57:59 -0600 > >> On 4/11/17 11:42 AM, David Miller wrote: >>> David, if you have a specific case where it's absolutely impossible >>> to resolve this when the library is converted to support extended >>> ACKs, please mention it. >> >> I don't have a specific library in mind. It is more the disjoint nature >> of a socket option and message parsing, and thinking through the API. > > Would you prefer it being passed via a control message at sendmsg() > time? That would require more changes to userland programs, and also > increase the per-request overhead in the kernel as it parses this CMSG > every single time. Johannes' v4 adds a flag in the response that the parser can use to definitely know the size of the errmsg.msg that was returned. That seems sufficient to me. Did you disagree with that suggestion?
On Tue, 2017-04-11 at 13:42 -0400, David Miller wrote: > > Then, the library needs to be extended to enable this handling to > > modify the way it needs to handle errors, together with the > > setsockopt(). So I'd tend to agree, but * it was easy to solve this, with the flags I added * the flags reduce the amount of parsing dependencies, for example, it is then no longer necessary to check if error == 0 to know if the message was capped or not (when not requested), since the flag indicates it * having the flag that TLVs were set lets us not store if the setsockopt failed or not - this can be useful in the error report case (but not in the success/cookie case) since it avoids extra state that needs to be passed around * libnl, which seems pretty common, allows just passing a single pointer around as state, which is often already used for something else, meaning a bigger refactoring can be required to pass the extra state So yes, in theory we don't need this, but in practice it does make the userland parsing code quite a bit easier. johannes
diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c index a90404a0c5ff..4a44830741c1 100644 --- a/crypto/crypto_user.c +++ b/crypto/crypto_user.c @@ -483,7 +483,8 @@ static const struct crypto_link { [CRYPTO_MSG_DELRNG - CRYPTO_MSG_BASE] = { .doit = crypto_del_rng }, }; -static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct nlattr *attrs[CRYPTOCFGA_MAX+1]; const struct crypto_link *link; diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c index 10469b0088b5..b784055423c8 100644 --- a/drivers/infiniband/core/netlink.c +++ b/drivers/infiniband/core/netlink.c @@ -146,7 +146,8 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh, } EXPORT_SYMBOL(ibnl_put_attr); -static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct ibnl_client *client; int type = nlh->nlmsg_type; @@ -209,7 +210,7 @@ static void ibnl_rcv_reply_skb(struct sk_buff *skb) if (nlh->nlmsg_flags & NLM_F_REQUEST) return; - ibnl_rcv_msg(skb, nlh); + ibnl_rcv_msg(skb, nlh, NULL); msglen = NLMSG_ALIGN(nlh->nlmsg_len); if (msglen > skb->len) diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c index 109802f776ed..50e624fb8307 100644 --- a/drivers/scsi/scsi_netlink.c +++ b/drivers/scsi/scsi_netlink.c @@ -111,7 +111,7 @@ scsi_nl_rcv_msg(struct sk_buff *skb) next_msg: if ((err) || (nlh->nlmsg_flags & NLM_F_ACK)) - netlink_ack(skb, nlh, err); + netlink_ack(skb, nlh, err, NULL); skb_pull(skb, rlen); } diff --git a/include/linux/netlink.h b/include/linux/netlink.h index da14ab61f363..746359c27396 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -62,11 +62,37 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg) return __netlink_kernel_create(net, unit, THIS_MODULE, cfg); } +/** + * struct netlink_ext_ack - netlink extended ACK report struct + * @_msg: message string to report - don't access directly, use + * %NL_SET_ERR_MSG + * @bad_attr: attribute with error + * @missing_attr: number of missing attr (or 0) + */ +struct netlink_ext_ack { + const char *_msg; + const struct nlattr *bad_attr; + u16 missing_attr; +}; + +/* Always use this macro, this allows later putting the + * message into a separate section or such for things + * like translation or listing all possible messages. + * Currently string formatting is not supported (due + * to the lack of an output buffer.) + */ +#define NL_SET_ERR_MSG(extack, msg) do { \ + static const char *_msg = (msg); \ + \ + (extack)->_msg = _msg; \ +} while (0) + extern void netlink_kernel_release(struct sock *sk); extern int __netlink_change_ngroups(struct sock *sk, unsigned int groups); extern int netlink_change_ngroups(struct sock *sk, unsigned int groups); extern void __netlink_clear_multicast_users(struct sock *sk, unsigned int group); -extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err); +extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err, + const struct netlink_ext_ack *extack); extern int netlink_has_listeners(struct sock *sk, unsigned int group); extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 portid, int nonblock); diff --git a/include/net/netlink.h b/include/net/netlink.h index b239fcd33d80..a064ec3e2ee1 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -233,7 +233,8 @@ struct nl_info { }; int netlink_rcv_skb(struct sk_buff *skb, - int (*cb)(struct sk_buff *, struct nlmsghdr *)); + int (*cb)(struct sk_buff *, struct nlmsghdr *, + struct netlink_ext_ack *)); int nlmsg_notify(struct sock *sk, struct sk_buff *skb, u32 portid, unsigned int group, int report, gfp_t flags); diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h index f3946a27bd07..5f1a48b140c8 100644 --- a/include/uapi/linux/netlink.h +++ b/include/uapi/linux/netlink.h @@ -101,6 +101,34 @@ struct nlmsghdr { struct nlmsgerr { int error; struct nlmsghdr msg; + /* + * followed by the message contents unless NETLINK_CAP_ACK was set, + * message length is aligned with NLMSG_ALIGN() + */ + /* + * followed by TLVs defined in enum nlmsgerr_attrs + * if NETLINK_EXT_ACK was set + */ +}; + +/** + * enum nlmsgerr_attrs - nlmsgerr attributes + * @NLMSGERR_ATTR_UNUSED: unused + * @NLMSGERR_ATTR_MSG: error message string (string) + * @NLMSGERR_ATTR_OFFS: error offset in the original message (u32) + * @NLMSGERR_ATTR_ATTR: top-level attribute that caused the error + * (or is missing, u16) + * @NUM_NLMSGERR_ATTRS: number of attributes + * @NLMSGERR_ATTR_MAX: highest attribute number + */ +enum nlmsgerr_attrs { + NLMSGERR_ATTR_UNUSED, + NLMSGERR_ATTR_MSG, + NLMSGERR_ATTR_OFFS, + NLMSGERR_ATTR_ATTR, + + NUM_NLMSGERR_ATTRS, + NLMSGERR_ATTR_MAX = NUM_NLMSGERR_ATTRS - 1 }; #define NETLINK_ADD_MEMBERSHIP 1 @@ -115,6 +143,7 @@ struct nlmsgerr { #define NETLINK_LISTEN_ALL_NSID 8 #define NETLINK_LIST_MEMBERSHIPS 9 #define NETLINK_CAP_ACK 10 +#define NETLINK_EXT_ACK 11 struct nl_pktinfo { __u32 group; diff --git a/kernel/audit.c b/kernel/audit.c index 2f4964cfde0b..d54bf5932374 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -1402,7 +1402,7 @@ static void audit_receive_skb(struct sk_buff *skb) err = audit_receive_msg(skb, nlh); /* if err or if this message says it wants a response */ if (err || (nlh->nlmsg_flags & NLM_F_ACK)) - netlink_ack(skb, nlh, err); + netlink_ack(skb, nlh, err, NULL); nlh = nlmsg_next(nlh, &len); } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index b2bd4c9ee860..9788147241f4 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -4120,7 +4120,8 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb) /* Process one rtnetlink message. */ -static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); rtnl_doit_func doit; diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c index fb9d0e2fd148..217f4e3b82f6 100644 --- a/net/core/sock_diag.c +++ b/net/core/sock_diag.c @@ -238,7 +238,8 @@ static int __sock_diag_cmd(struct sk_buff *skb, struct nlmsghdr *nlh) return err; } -static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { int ret; diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c index 85f2fdc360c2..c8bf5136a72b 100644 --- a/net/decnet/netfilter/dn_rtmsg.c +++ b/net/decnet/netfilter/dn_rtmsg.c @@ -96,7 +96,7 @@ static unsigned int dnrmg_hook(void *priv, } -#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0) +#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err), NULL); return; } while (0) static inline void dnrmg_receive_user_skb(struct sk_buff *skb) { diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c index 1ab30e7d3f99..81dac16933fc 100644 --- a/net/hsr/hsr_netlink.c +++ b/net/hsr/hsr_netlink.c @@ -350,7 +350,7 @@ static int hsr_get_node_status(struct sk_buff *skb_in, struct genl_info *info) return 0; invalid: - netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL); + netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL, NULL); return 0; nla_put_failure: @@ -432,7 +432,7 @@ static int hsr_get_node_list(struct sk_buff *skb_in, struct genl_info *info) return 0; invalid: - netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL); + netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL, NULL); return 0; nla_put_failure: diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index c296f9b606d4..26356bf8cebf 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1305,7 +1305,7 @@ ip_set_dump_start(struct sk_buff *skb, struct netlink_callback *cb) * manually :-( */ if (nlh->nlmsg_flags & NLM_F_ACK) - netlink_ack(cb->skb, nlh, ret); + netlink_ack(cb->skb, nlh, ret, NULL); return ret; } } diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 68eda920160e..181d3bb800e6 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -148,7 +148,8 @@ int nfnetlink_unicast(struct sk_buff *skb, struct net *net, u32 portid, EXPORT_SYMBOL_GPL(nfnetlink_unicast); /* Process one complete nfnetlink message. */ -static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); const struct nfnl_callback *nc; @@ -261,7 +262,7 @@ static void nfnl_err_deliver(struct list_head *err_list, struct sk_buff *skb) struct nfnl_err *nfnl_err, *next; list_for_each_entry_safe(nfnl_err, next, err_list, head) { - netlink_ack(skb, nfnl_err->nlh, nfnl_err->err); + netlink_ack(skb, nfnl_err->nlh, nfnl_err->err, NULL); nfnl_err_del(nfnl_err); } } @@ -284,13 +285,13 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, int err; if (subsys_id >= NFNL_SUBSYS_COUNT) - return netlink_ack(skb, nlh, -EINVAL); + return netlink_ack(skb, nlh, -EINVAL, NULL); replay: status = 0; skb = netlink_skb_clone(oskb, GFP_KERNEL); if (!skb) - return netlink_ack(oskb, nlh, -ENOMEM); + return netlink_ack(oskb, nlh, -ENOMEM, NULL); nfnl_lock(subsys_id); ss = nfnl_dereference_protected(subsys_id); @@ -304,20 +305,20 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, #endif { nfnl_unlock(subsys_id); - netlink_ack(oskb, nlh, -EOPNOTSUPP); + netlink_ack(oskb, nlh, -EOPNOTSUPP, NULL); return kfree_skb(skb); } } if (!ss->commit || !ss->abort) { nfnl_unlock(subsys_id); - netlink_ack(oskb, nlh, -EOPNOTSUPP); + netlink_ack(oskb, nlh, -EOPNOTSUPP, NULL); return kfree_skb(skb); } if (genid && ss->valid_genid && !ss->valid_genid(net, genid)) { nfnl_unlock(subsys_id); - netlink_ack(oskb, nlh, -ERESTART); + netlink_ack(oskb, nlh, -ERESTART, NULL); return kfree_skb(skb); } @@ -407,7 +408,8 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, * pointing to the batch header. */ nfnl_err_reset(&err_list); - netlink_ack(oskb, nlmsg_hdr(oskb), -ENOMEM); + netlink_ack(oskb, nlmsg_hdr(oskb), -ENOMEM, + NULL); status |= NFNL_BATCH_FAILURE; goto done; } @@ -467,7 +469,7 @@ static void nfnetlink_rcv_skb_batch(struct sk_buff *skb, struct nlmsghdr *nlh) err = nla_parse(cda, NFNL_BATCH_MAX, attr, attrlen, nfnl_batch_policy); if (err < 0) { - netlink_ack(skb, nlh, err); + netlink_ack(skb, nlh, err, NULL); return; } if (cda[NFNL_BATCH_GENID]) @@ -493,7 +495,7 @@ static void nfnetlink_rcv(struct sk_buff *skb) return; if (!netlink_net_capable(skb, CAP_NET_ADMIN)) { - netlink_ack(skb, nlh, -EPERM); + netlink_ack(skb, nlh, -EPERM, NULL); return; } diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index fc232441cf23..c74f56a4fcf1 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1652,6 +1652,13 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname, nlk->flags &= ~NETLINK_F_CAP_ACK; err = 0; break; + case NETLINK_EXT_ACK: + if (val) + nlk->flags |= NETLINK_F_EXT_ACK; + else + nlk->flags &= ~NETLINK_F_EXT_ACK; + err = 0; + break; default: err = -ENOPROTOOPT; } @@ -1736,6 +1743,16 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname, return -EFAULT; err = 0; break; + case NETLINK_EXT_ACK: + if (len < sizeof(int)) + return -EINVAL; + len = sizeof(int); + val = nlk->flags & NETLINK_F_EXT_ACK ? 1 : 0; + if (put_user(len, optlen) || + put_user(val, optval)) + return -EFAULT; + err = 0; + break; default: err = -ENOPROTOOPT; } @@ -2267,21 +2284,36 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, } EXPORT_SYMBOL(__netlink_dump_start); -void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err) +void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err, + const struct netlink_ext_ack *extack) { struct sk_buff *skb; struct nlmsghdr *rep; struct nlmsgerr *errmsg; size_t payload = sizeof(*errmsg); + size_t acksize = sizeof(*errmsg); struct netlink_sock *nlk = nlk_sk(NETLINK_CB(in_skb).sk); /* Error messages get the original request appened, unless the user - * requests to cap the error message. + * requests to cap the error message, and get extra error data if + * requested. */ - if (!(nlk->flags & NETLINK_F_CAP_ACK) && err) - payload += nlmsg_len(nlh); + if (err) { + if (!(nlk->flags & NETLINK_F_CAP_ACK)) + payload += nlmsg_len(nlh); + acksize = payload; + if (nlk->flags & NETLINK_F_EXT_ACK && extack) { + if (extack->_msg) + acksize += + nla_total_size(strlen(extack->_msg) + 1); + if (extack->bad_attr) + acksize += nla_total_size(sizeof(u32)); + if (extack->missing_attr) + acksize += nla_total_size(sizeof(u16)); + } + } - skb = nlmsg_new(payload, GFP_KERNEL); + skb = nlmsg_new(acksize, GFP_KERNEL); if (!skb) { struct sock *sk; @@ -2300,14 +2332,35 @@ void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err) NLMSG_ERROR, payload, 0); errmsg = nlmsg_data(rep); errmsg->error = err; - memcpy(&errmsg->msg, nlh, payload > sizeof(*errmsg) ? nlh->nlmsg_len : sizeof(*nlh)); + memcpy(&errmsg->msg, nlh, + !(nlk->flags & NETLINK_F_CAP_ACK) ? nlh->nlmsg_len + : sizeof(*nlh)); + + if (err && nlk->flags & NETLINK_F_EXT_ACK && extack) { + if (extack->_msg) + WARN_ON(nla_put_string(skb, NLMSGERR_ATTR_MSG, + extack->_msg)); + if (extack->bad_attr && + !WARN_ON((u8 *)extack->bad_attr < in_skb->data || + (u8 *)extack->bad_attr >= in_skb->data + + in_skb->len)) + WARN_ON(nla_put_u32(skb, NLMSGERR_ATTR_OFFS, + (u8 *)extack->bad_attr - + in_skb->data)); + if (extack->missing_attr) + WARN_ON(nla_put_u16(skb, NLMSGERR_ATTR_ATTR, + extack->missing_attr)); + } + netlink_unicast(in_skb->sk, skb, NETLINK_CB(in_skb).portid, MSG_DONTWAIT); } EXPORT_SYMBOL(netlink_ack); int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *, - struct nlmsghdr *)) + struct nlmsghdr *, + struct netlink_ext_ack *)) { + struct netlink_ext_ack extack = {}; struct nlmsghdr *nlh; int err; @@ -2328,13 +2381,13 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *, if (nlh->nlmsg_type < NLMSG_MIN_TYPE) goto ack; - err = cb(skb, nlh); + err = cb(skb, nlh, &extack); if (err == -EINTR) goto skip; ack: if (nlh->nlmsg_flags & NLM_F_ACK || err) - netlink_ack(skb, nlh, err); + netlink_ack(skb, nlh, err, &extack); skip: msglen = NLMSG_ALIGN(nlh->nlmsg_len); diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index f792f8d7f982..3490f2430532 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -13,6 +13,7 @@ #define NETLINK_F_RECV_NO_ENOBUFS 0x8 #define NETLINK_F_LISTEN_ALL_NSID 0x10 #define NETLINK_F_CAP_ACK 0x20 +#define NETLINK_F_EXT_ACK 0x40 #define NLGRPSZ(x) (ALIGN(x, sizeof(unsigned long) * 8) / 8) #define NLGRPLONGS(x) (NLGRPSZ(x)/sizeof(unsigned long)) diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 92e0981f7404..57b2e3648bc0 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -605,7 +605,8 @@ static int genl_family_rcv_msg(const struct genl_family *family, return err; } -static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { const struct genl_family *family; int err; diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 40a8aa39220d..1ba8c115a993 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2448,7 +2448,8 @@ static const struct xfrm_link { [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo }, }; -static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) { struct net *net = sock_net(skb->sk); struct nlattr *attrs[XFRMA_MAX+1];