Message ID | 1461013819-23223-1-git-send-email-roopa@cumulusnetworks.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote: > + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) { > + struct rtnl_link_stats64 *sp; > + > + attr = nla_reserve(skb, IFLA_STATS_LINK_64, > + sizeof(struct rtnl_link_stats64)); > + if (!attr) > + goto nla_put_failure; > + > + sp = nla_data(attr); Are you sure we have a guarantee that sp is aligned to u64 fields ? x86 does not care, but some arches would have a potential misalign access here. > + dev_get_stats(dev, sp); > + }
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 18 Apr 2016 14:35:56 -0700 > On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote: > >> + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) { >> + struct rtnl_link_stats64 *sp; >> + >> + attr = nla_reserve(skb, IFLA_STATS_LINK_64, >> + sizeof(struct rtnl_link_stats64)); >> + if (!attr) >> + goto nla_put_failure; >> + >> + sp = nla_data(attr); > > Are you sure we have a guarantee that sp is aligned to u64 fields ? > > x86 does not care, but some arches would have a potential misalign > access here. I'll do some testing on sparc and deal with any fallout.
From: David Miller <davem@davemloft.net> Date: Mon, 18 Apr 2016 20:57:55 -0400 (EDT) > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 18 Apr 2016 14:35:56 -0700 > >> On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote: >> >>> + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) { >>> + struct rtnl_link_stats64 *sp; >>> + >>> + attr = nla_reserve(skb, IFLA_STATS_LINK_64, >>> + sizeof(struct rtnl_link_stats64)); >>> + if (!attr) >>> + goto nla_put_failure; >>> + >>> + sp = nla_data(attr); >> >> Are you sure we have a guarantee that sp is aligned to u64 fields ? >> >> x86 does not care, but some arches would have a potential misalign >> access here. > > I'll do some testing on sparc and deal with any fallout. Just thinking out loud before I start testing, yeah I think you are right. nlmsghdr is 64-bit aligned, but the nlattr header is 32-bit which will thus make the attribute data area not be aligned. I think the time has probably come to have a new netlink attribute format that doesn't have this multi-decade old problem. There are a lot of bits left in nla_type, we can use one to indicate that the nlattr struct is another 4 bytes in length in order to archieve proper alignment of the payload data. +struct nlattr64 { + __u16 nla_len; + __u16 nla_type; + __u32 nla_pad; +}; ... +#define NLA_F_64BIT_ALIGNED (1 << 13) -#define NLA_TYPE_MASK ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER) +#define NLA_TYPE_MASK ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | NLA_F_64BIT_ALIGNED) ... #define NLA64_ALIGNTO 8 #define NLA64_ALIGN(len) (((len) + NLA64_ALIGNTO - 1) & ~(NLA64_ALIGNTO - 1)) #define NLA64_HDRLEN ((int) NLA64_ALIGN(sizeof(struct nlattr64))) We're going to need some new nla64_*() helpers and code added to some of the existing ones to test that new bit. For example, nla_data would now be: static inline void *nla_data(const struct nlattr *nla) { if (nla->nla_type & NLA_F_64BIT_ALIGNED) return (char *) nla + NLA64_HDRLEN; else return (char *) nla + NLA_HDRLEN; } nla_len would be: static inline int nla_len(const struct nlattr *nla) { int hdrlen = NLA_HDRLEN; if (nla->nla_type & NLA_F_64BIT_ALIGNED) hdrlen = NLA64_hdrlen; return nla->nla_len - hdrlen; } etc. etc. And anyways, I get unaligned accesses without Roopa's changes :-/ davem@patience:~$ ip l l [3391066.656729] Kernel unaligned access at TPC[7d6c14] loopback_get_stats64+0x74/0xa0 [3391066.672020] Kernel unaligned access at TPC[7d6c18] loopback_get_stats64+0x78/0xa0 [3391066.687282] Kernel unaligned access at TPC[7d6c1c] loopback_get_stats64+0x7c/0xa0 [3391066.702573] Kernel unaligned access at TPC[7d6c20] loopback_get_stats64+0x80/0xa0 [3391066.717858] Kernel unaligned access at TPC[8609dc] dev_get_stats+0x3c/0xe0
On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote: > > And anyways, I get unaligned accesses without Roopa's changes :-/ > > davem@patience:~$ ip l l > [3391066.656729] Kernel unaligned access at TPC[7d6c14] loopback_get_stats64+0x74/0xa0 > [3391066.672020] Kernel unaligned access at TPC[7d6c18] loopback_get_stats64+0x78/0xa0 > [3391066.687282] Kernel unaligned access at TPC[7d6c1c] loopback_get_stats64+0x7c/0xa0 > [3391066.702573] Kernel unaligned access at TPC[7d6c20] loopback_get_stats64+0x80/0xa0 > [3391066.717858] Kernel unaligned access at TPC[8609dc] dev_get_stats+0x3c/0xe0 Yes, rtnl_fill_stats() probably has the same mistake. commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c Author: Roopa Prabhu <roopa@cumulusnetworks.com> Date: Fri Apr 15 20:36:25 2016 -0700 rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy This patch passes netlink attr data ptr directly to dev_get_stats thus elimiating a stats copy. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
On 4/18/16, 5:57 PM, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 18 Apr 2016 14:35:56 -0700 > >> On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote: >> >>> + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) { >>> + struct rtnl_link_stats64 *sp; >>> + >>> + attr = nla_reserve(skb, IFLA_STATS_LINK_64, >>> + sizeof(struct rtnl_link_stats64)); >>> + if (!attr) >>> + goto nla_put_failure; >>> + >>> + sp = nla_data(attr); >> Are you sure we have a guarantee that sp is aligned to u64 fields ? >> >> x86 does not care, but some arches would have a potential misalign >> access here. > I'll do some testing on sparc and deal with any fallout. Thanks David. I will be happy to respin if you see problems.
On 4/18/16, 7:22 PM, Eric Dumazet wrote: > On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote: > >> And anyways, I get unaligned accesses without Roopa's changes :-/ >> >> davem@patience:~$ ip l l >> [3391066.656729] Kernel unaligned access at TPC[7d6c14] loopback_get_stats64+0x74/0xa0 >> [3391066.672020] Kernel unaligned access at TPC[7d6c18] loopback_get_stats64+0x78/0xa0 >> [3391066.687282] Kernel unaligned access at TPC[7d6c1c] loopback_get_stats64+0x7c/0xa0 >> [3391066.702573] Kernel unaligned access at TPC[7d6c20] loopback_get_stats64+0x80/0xa0 >> [3391066.717858] Kernel unaligned access at TPC[8609dc] dev_get_stats+0x3c/0xe0 > Yes, rtnl_fill_stats() probably has the same mistake. > > commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c > Author: Roopa Prabhu <roopa@cumulusnetworks.com> > Date: Fri Apr 15 20:36:25 2016 -0700 > > rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy > > This patch passes netlink attr data ptr directly to dev_get_stats > thus elimiating a stats copy. > > Suggested-by: David Miller <davem@davemloft.net> > Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > > David, if you revert the one in rtnl_fill_stats, i will take care of the dev_get_stats in RTM_GETSTATS in v6. thanks, Roopa
From: Roopa Prabhu <roopa@cumulusnetworks.com> Date: Mon, 18 Apr 2016 19:40:08 -0700 > On 4/18/16, 7:22 PM, Eric Dumazet wrote: >> On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote: >> >>> And anyways, I get unaligned accesses without Roopa's changes :-/ >>> >>> davem@patience:~$ ip l l >>> [3391066.656729] Kernel unaligned access at TPC[7d6c14] loopback_get_stats64+0x74/0xa0 >>> [3391066.672020] Kernel unaligned access at TPC[7d6c18] loopback_get_stats64+0x78/0xa0 >>> [3391066.687282] Kernel unaligned access at TPC[7d6c1c] loopback_get_stats64+0x7c/0xa0 >>> [3391066.702573] Kernel unaligned access at TPC[7d6c20] loopback_get_stats64+0x80/0xa0 >>> [3391066.717858] Kernel unaligned access at TPC[8609dc] dev_get_stats+0x3c/0xe0 >> Yes, rtnl_fill_stats() probably has the same mistake. >> >> commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c >> Author: Roopa Prabhu <roopa@cumulusnetworks.com> >> Date: Fri Apr 15 20:36:25 2016 -0700 >> >> rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy >> >> This patch passes netlink attr data ptr directly to dev_get_stats >> thus elimiating a stats copy. >> >> Suggested-by: David Miller <davem@davemloft.net> >> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> >> Signed-off-by: David S. Miller <davem@davemloft.net> >> >> >> > David, if you revert the one in rtnl_fill_stats, i will take care of the dev_get_stats in RTM_GETSTATS in v6. Don't need to revert, see my idea with the IFLA_PAD attribute. :-)
From: David Miller <davem@davemloft.net> Date: Mon, 18 Apr 2016 21:48:51 -0400 (EDT) > I think the time has probably come to have a new netlink attribute > format that doesn't have this multi-decade old problem. ... Scratch this whole idea, I think the padding attribute works a lot better and is backwards compatible with every properly written netlink application.
+ selinux maintainers Le 18/04/2016 23:10, Roopa Prabhu a écrit : [snip] > diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c > index 8495b93..1714633 100644 > --- a/security/selinux/nlmsgtab.c > +++ b/security/selinux/nlmsgtab.c > @@ -76,6 +76,8 @@ static struct nlmsg_perm nlmsg_route_perms[] = > { RTM_NEWNSID, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, > { RTM_DELNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, > { RTM_GETNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, > + { RTM_NEWSTATS, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, I would say it's NETLINK_ROUTE_SOCKET__NLMSG_READ, not WRITE. This command is only sent by the kernel, not by the userland.
On Mon, 2016-04-18 at 23:52 -0400, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Mon, 18 Apr 2016 21:48:51 -0400 (EDT) > > > I think the time has probably come to have a new netlink attribute > > format that doesn't have this multi-decade old problem. > ... > > Scratch this whole idea, I think the padding attribute works a lot > better and is backwards compatible with every properly written > netlink application. This talk about attribute flags reminded me about something else. At netconf, we talked about how awkward it can be that one doesn't know if an attribute was accepted by the kernel or simply ignored because it's not supported (older kernel version). I considered (and Emmanuel has started to cook up a patch for it) adding a flag here meaning "reject if not parsed" (or so). Now, I realize that with all the existing netlink commands one can't actually rely on that (since it requires some infrastructure), but new commands in existing families and also entirely new generic netlink families would allow let userspace rely on such a thing. Thoughts? johannes
From: Johannes Berg <johannes@sipsolutions.net> Date: Tue, 19 Apr 2016 12:09:03 +0200 > At netconf, we talked about how awkward it can be that one doesn't know > if an attribute was accepted by the kernel or simply ignored because > it's not supported (older kernel version). > > I considered (and Emmanuel has started to cook up a patch for it) > adding a flag here meaning "reject if not parsed" (or so). > > Now, I realize that with all the existing netlink commands one can't > actually rely on that (since it requires some infrastructure), but new > commands in existing families and also entirely new generic netlink > families would allow let userspace rely on such a thing. I like this nlattr flag idea, it's opt-in and any tool can be updated to use the new facility. I'd be willing the backport the nlattr flag bit change to all stable releases as well.
On Tue, 2016-04-19 at 14:23 -0400, David Miller wrote: > > I like this nlattr flag idea, it's opt-in and any tool can be updated > to use the new facility. Right. > I'd be willing the backport the nlattr flag bit change to all stable > releases as well. I'm not really convinced that helps much - tools still can't really rely on the kernel supporting it. One thing that might work is that a tool might say it only wants to support kernels that have this change (assuming we backport it to enough kernels etc.); in that case the tool could add some absolutely must-have information (like the IFINDEX or whatever, depends on the command) with the new flag, this would get rejected since unpatched kernels wouldn't understand the flag and wouldn't find that must-have information. Nevertheless, I think it's most reliable with new netlink commands that are known to be available only on kernels understand and treating the flag correctly. johannes
On Tue, Apr 19, 2016 at 4:26 AM, Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote: > + selinux maintainers > > Le 18/04/2016 23:10, Roopa Prabhu a écrit : > [snip] >> >> diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c >> index 8495b93..1714633 100644 >> --- a/security/selinux/nlmsgtab.c >> +++ b/security/selinux/nlmsgtab.c >> @@ -76,6 +76,8 @@ static struct nlmsg_perm nlmsg_route_perms[] = >> { RTM_NEWNSID, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, >> { RTM_DELNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, >> { RTM_GETNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, >> + { RTM_NEWSTATS, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, > > I would say it's NETLINK_ROUTE_SOCKET__NLMSG_READ, not WRITE. This command > is only sent by the kernel, not by the userland. From what I could tell from the patch description, it looks like RTM_NEWSTATS only dumps stats to userspace and doesn't alter the state of the kernel, is that correct? If so, then yes, NLMSG__READ is the right SELinux permission. However, if RTM_NEWSTATS does alter the state/configuration of the kernel then we should use NLMSG__WRITE.
On 4/19/16, 12:55 PM, Paul Moore wrote: > On Tue, Apr 19, 2016 at 4:26 AM, Nicolas Dichtel > <nicolas.dichtel@6wind.com> wrote: >> + selinux maintainers >> >> Le 18/04/2016 23:10, Roopa Prabhu a écrit : >> [snip] >>> diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c >>> index 8495b93..1714633 100644 >>> --- a/security/selinux/nlmsgtab.c >>> +++ b/security/selinux/nlmsgtab.c >>> @@ -76,6 +76,8 @@ static struct nlmsg_perm nlmsg_route_perms[] = >>> { RTM_NEWNSID, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, >>> { RTM_DELNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, >>> { RTM_GETNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, >>> + { RTM_NEWSTATS, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, >> I would say it's NETLINK_ROUTE_SOCKET__NLMSG_READ, not WRITE. This command >> is only sent by the kernel, not by the userland. > From what I could tell from the patch description, it looks like > RTM_NEWSTATS only dumps stats to userspace and doesn't alter the state > of the kernel, is that correct? If so, then yes, NLMSG__READ is the > right SELinux permission. However, if RTM_NEWSTATS does alter the > state/configuration of the kernel then we should use NLMSG__WRITE. > okay, will change it to READ in the next version, thanks.
On 4/19/16 1:41 PM, Johannes Berg wrote: > On Tue, 2016-04-19 at 14:23 -0400, David Miller wrote: >> >> I like this nlattr flag idea, it's opt-in and any tool can be updated >> to use the new facility. > > Right. > >> I'd be willing the backport the nlattr flag bit change to all stable >> releases as well. > > I'm not really convinced that helps much - tools still can't really > rely on the kernel supporting it. > > One thing that might work is that a tool might say it only wants to > support kernels that have this change (assuming we backport it to > enough kernels etc.); in that case the tool could add some absolutely > must-have information (like the IFINDEX or whatever, depends on the > command) with the new flag, this would get rejected since unpatched > kernels wouldn't understand the flag and wouldn't find that must-have > information. > > Nevertheless, I think it's most reliable with new netlink commands that > are known to be available only on kernels understand and treating the > flag correctly. The kernel can set a flag in the response that it acknowledges the new attribute/flag. I did that for filtering neigh dumps -- 21fdd092acc7.
On Tue, 2016-04-19 at 19:53 -0600, David Ahern wrote: > > The kernel can set a flag in the response that it acknowledges the > new attribute/flag. I did that for filtering neigh dumps -- > 21fdd092acc7. > Hm, that works, but I think it requires writing extra code, which I was kinda trying to avoid. With the patch that Emmanuel wrote, we can restrict the changes to just nla_parse(). Anyway, I think we just have to document the behaviour very precisely, and userspace can make its own decisions. Essentially, apps will have a number of choices: 1) Use the new attribute flag only with commands known to have been added after the kernel support was added. 2) Use the new attribute flag with some required attribute for existing commands, so that older kernel will not find the required attribute and will reject the operation entirely. May or may not fall back to trying the operation again without the flag. 3) Simply use the new flag and do unexpected things on kernels not supporting the rejection mechanism - not much worse than today in many cases. I guess we'll write a proper commit message and send the patch. johannes
On Wed, 20 Apr 2016 09:32:20 +0200, Johannes Berg wrote: > 2) Use the new attribute flag with some required attribute for > existing commands, so that older kernel will not find the required > attribute and will reject the operation entirely. > May or may not fall back to trying the operation again without the > flag. This is basically what I submitted half a year ago. See: http://thread.gmane.org/gmane.linux.network/382850 Jiri
On Wed, 2016-04-20 at 14:48 +0200, Jiri Benc wrote: > On Wed, 20 Apr 2016 09:32:20 +0200, Johannes Berg wrote: > > > > 2) Use the new attribute flag with some required attribute for > > existing commands, so that older kernel will not find the > > required > > attribute and will reject the operation entirely. > > May or may not fall back to trying the operation again without > > the > > flag. > This is basically what I submitted half a year ago. See: > http://thread.gmane.org/gmane.linux.network/382850 > That looks like a *huge* patchset though - whereas my proposal really required only what Emmanuel sent in this thread. It did make some assumptions, for example that any attribute lower than the "maxtype" argument to nla_parse() was understood. [1] Looks like you have this on a per-message basis. I thought it was better on an attribute basis because that's really where the issue is. You can still detect it with the per-attribute flag approach as I described in (2) - if, for your lwtunnel example, you could specify the flag on the RTA_ENCAP attribute, without which no lwtunnel can be created (if I understand the code correctly.) johannes [1] for example, if I have three attributes: enum attrs {__unused, A, B, C}; and the policy policy = { [A] = { .type = NLA_U32 }, [C] = { .type = NLA_U8 }, } and then do nla_parse(tb, 3, msg, msg_len, &policy) it would assume that "B" is valid. Since this policy is equivalent to the policy with [B] = { .type = NLA_BINARY } (minimum length 0) we could also reject anything that has type=len=0 in the policy, if the NLA_F_NET_MUST_PARSE flag is set in the nla_type. This would likely be the right approach for most netlink families, since they usually don't have holes that they actually care about - I've yet to see any attribute that's not specified at all in the policy but used anyway, normally you want some level of checking, and indicate that by using { .type = NLA_BINARY } - but other things are possible. johannes
On Wed, 20 Apr 2016 15:17:08 +0200, Johannes Berg wrote: > Looks like you have this on a per-message basis. I thought it was > better on an attribute basis because that's really where the issue is. No problem. I'm not that happy with my patchset myself. Just wanted to point it out in case it's useful. Jiri
On Wed, 2016-04-20 at 15:34 +0200, Jiri Benc wrote: > On Wed, 20 Apr 2016 15:17:08 +0200, Johannes Berg wrote: > > > > Looks like you have this on a per-message basis. I thought it was > > better on an attribute basis because that's really where the issue > > is. > No problem. I'm not that happy with my patchset myself. Just wanted > to point it out in case it's useful. Yeah, I looked at it, but I think it ended up a bit too complicated really. It does have slightly more validation in some sense, but I don't really think that justifies the complexity? No matter what, we'll always have to deal with the problem of not having this capability on older kernels. One way to work around it would be to add a new NLM_F_REQUEST2 flag, since the kernel currently requires having NLM_F_REQUEST set, NLM_F_REQUEST2 messages would be rejected by existing kernels. Dunno if it's really worth it though, I suspect that family/command-specific detection will work in practically all cases. johannes
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index bb3a90b..0762f35 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -781,4 +781,27 @@ enum { #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1) +/* STATS section */ + +struct if_stats_msg { + __u8 family; + __u8 pad1; + __u16 pad2; + __u32 ifindex; + __u32 filter_mask; +}; + +/* A stats attribute can be netdev specific or a global stat. + * For netdev stats, lets use the prefix IFLA_STATS_LINK_* + */ +enum { + IFLA_STATS_UNSPEC, + IFLA_STATS_LINK_64, + __IFLA_STATS_MAX, +}; + +#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1) + +#define IFLA_STATS_FILTER_BIT(ATTR) (1 << (ATTR)) + #endif /* _UAPI_LINUX_IF_LINK_H */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index ca764b5..cc885c4 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -139,6 +139,11 @@ enum { RTM_GETNSID = 90, #define RTM_GETNSID RTM_GETNSID + RTM_NEWSTATS = 92, +#define RTM_NEWSTATS RTM_NEWSTATS + RTM_GETSTATS = 94, +#define RTM_GETSTATS RTM_GETSTATS + __RTM_MAX, #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1) }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a75f7e9..fe35102 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3451,6 +3451,153 @@ out: return err; } +static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev, + int type, u32 pid, u32 seq, u32 change, + unsigned int flags, unsigned int filter_mask) +{ + struct if_stats_msg *ifsm; + struct nlmsghdr *nlh; + struct nlattr *attr; + + ASSERT_RTNL(); + + nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifsm), flags); + if (!nlh) + return -EMSGSIZE; + + ifsm = nlmsg_data(nlh); + ifsm->ifindex = dev->ifindex; + ifsm->filter_mask = filter_mask; + + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) { + struct rtnl_link_stats64 *sp; + + attr = nla_reserve(skb, IFLA_STATS_LINK_64, + sizeof(struct rtnl_link_stats64)); + if (!attr) + goto nla_put_failure; + + sp = nla_data(attr); + dev_get_stats(dev, sp); + } + + nlmsg_end(skb, nlh); + + return 0; + +nla_put_failure: + nlmsg_cancel(skb, nlh); + + return -EMSGSIZE; +} + +static const struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = { + [IFLA_STATS_LINK_64] = { .len = sizeof(struct rtnl_link_stats64) }, +}; + +static size_t if_nlmsg_stats_size(const struct net_device *dev, + u32 filter_mask) +{ + size_t size = 0; + + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) + size += nla_total_size(sizeof(struct rtnl_link_stats64)); + + return size; +} + +static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh) +{ + struct net *net = sock_net(skb->sk); + struct if_stats_msg *ifsm; + struct net_device *dev = NULL; + struct sk_buff *nskb; + u32 filter_mask; + int err; + + ifsm = nlmsg_data(nlh); + if (ifsm->ifindex > 0) + dev = __dev_get_by_index(net, ifsm->ifindex); + else + return -EINVAL; + + if (!dev) + return -ENODEV; + + filter_mask = ifsm->filter_mask; + if (!filter_mask) + return -EINVAL; + + nskb = nlmsg_new(if_nlmsg_stats_size(dev, filter_mask), GFP_KERNEL); + if (!nskb) + return -ENOBUFS; + + err = rtnl_fill_statsinfo(nskb, dev, RTM_NEWSTATS, + NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0, + 0, filter_mask); + if (err < 0) { + /* -EMSGSIZE implies BUG in if_nlmsg_stats_size */ + WARN_ON(err == -EMSGSIZE); + kfree_skb(nskb); + } else { + err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid); + } + + return err; +} + +static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct net *net = sock_net(skb->sk); + struct if_stats_msg *ifsm; + int h, s_h; + int idx = 0, s_idx; + struct net_device *dev; + struct hlist_head *head; + unsigned int flags = NLM_F_MULTI; + u32 filter_mask = 0; + int err; + + s_h = cb->args[0]; + s_idx = cb->args[1]; + + cb->seq = net->dev_base_seq; + + ifsm = nlmsg_data(cb->nlh); + filter_mask = ifsm->filter_mask; + if (!filter_mask) + return -EINVAL; + + for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { + idx = 0; + head = &net->dev_index_head[h]; + hlist_for_each_entry(dev, head, index_hlist) { + if (idx < s_idx) + goto cont; + err = rtnl_fill_statsinfo(skb, dev, RTM_NEWSTATS, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, 0, + flags, filter_mask); + /* If we ran out of room on the first message, + * we're in trouble + */ + WARN_ON((err == -EMSGSIZE) && (skb->len == 0)); + + if (err < 0) + goto out; + + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); +cont: + idx++; + } + } +out: + cb->args[1] = idx; + cb->args[0] = h; + + return skb->len; +} + /* Process one rtnetlink message. */ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) @@ -3600,4 +3747,7 @@ void __init rtnetlink_init(void) rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, rtnl_bridge_getlink, NULL); rtnl_register(PF_BRIDGE, RTM_DELLINK, rtnl_bridge_dellink, NULL, NULL); rtnl_register(PF_BRIDGE, RTM_SETLINK, rtnl_bridge_setlink, NULL, NULL); + + rtnl_register(PF_UNSPEC, RTM_GETSTATS, rtnl_stats_get, rtnl_stats_dump, + NULL); } diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c index 8495b93..1714633 100644 --- a/security/selinux/nlmsgtab.c +++ b/security/selinux/nlmsgtab.c @@ -76,6 +76,8 @@ static struct nlmsg_perm nlmsg_route_perms[] = { RTM_NEWNSID, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, { RTM_DELNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, { RTM_GETNSID, NETLINK_ROUTE_SOCKET__NLMSG_READ }, + { RTM_NEWSTATS, NETLINK_ROUTE_SOCKET__NLMSG_WRITE }, + { RTM_GETSTATS, NETLINK_ROUTE_SOCKET__NLMSG_READ }, }; static struct nlmsg_perm nlmsg_tcpdiag_perms[] = @@ -155,7 +157,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm) switch (sclass) { case SECCLASS_NETLINK_ROUTE_SOCKET: /* RTM_MAX always point to RTM_SETxxxx, ie RTM_NEWxxx + 3 */ - BUILD_BUG_ON(RTM_MAX != (RTM_NEWNSID + 3)); + BUILD_BUG_ON(RTM_MAX != (RTM_NEWSTATS + 3)); err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms, sizeof(nlmsg_route_perms)); break;