diff mbox series

rtnetlink: fix data overflow in rtnl_calcit()

Message ID 20201016020238.22445-1-zhudi21@huawei.com
State Changes Requested
Delegated to: David Miller
Headers show
Series rtnetlink: fix data overflow in rtnl_calcit() | expand

Commit Message

zhudi Oct. 16, 2020, 2:02 a.m. UTC
"ip addr show" command execute error when we have a physical
network card with number of VFs larger than 247.

The return value of if_nlmsg_size() in rtnl_calcit() will exceed
range of u16 data type when any network cards has a larger number of
VFs. rtnl_vfinfo_size() will significant increase needed dump size when
the value of num_vfs is larger.

Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
which decides the memory size needed by netlink dump and netlink_dump()
will return -EMSGSIZE because of not enough memory was allocated.

So fix it by promoting  min_dump_alloc data type to u32 to
avoid data overflow and it's also align with the data type of
struct netlink_callback{}.min_dump_alloc which is assigned by
return value of rtnl_calcit()

Signed-off-by: zhudi <zhudi21@huawei.com>
---
 include/linux/netlink.h | 2 +-
 net/core/rtnetlink.c    | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

Comments

Jesse Brandeburg Oct. 16, 2020, 9:36 p.m. UTC | #1
zhudi wrote:

> "ip addr show" command execute error when we have a physical
> network card with number of VFs larger than 247.

Oh man, this bug has been hurting us forever and I've tried to fix it
several times without much luck, so thanks for working on it!

CC: David Ahern <dsahern@gmail.com>

As he's mentioned this bug.
 
> The return value of if_nlmsg_size() in rtnl_calcit() will exceed
> range of u16 data type when any network cards has a larger number of
> VFs. rtnl_vfinfo_size() will significant increase needed dump size when
> the value of num_vfs is larger.
> 
> Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
> which decides the memory size needed by netlink dump and netlink_dump()
> will return -EMSGSIZE because of not enough memory was allocated.
> 
> So fix it by promoting  min_dump_alloc data type to u32 to
> avoid data overflow and it's also align with the data type of
> struct netlink_callback{}.min_dump_alloc which is assigned by
> return value of rtnl_calcit()

I defer to others here on whether this is an acceptable API change.

> Signed-off-by: zhudi <zhudi21@huawei.com>

Kernel documentation says for you to use your real name, please do so,
unless you're a rock star and have officially changed your name to
zhudi.

> ---
>  include/linux/netlink.h | 2 +-
>  net/core/rtnetlink.c    | 8 ++++----
>  2 files changed, 5 insertions(+), 5 deletions(-)

Does it work without any changes to iproute2?


> 
> diff --git a/include/linux/netlink.h b/include/linux/netlink.h
> index e3e49f0e5c13..0a7db41b9e42 100644
> --- a/include/linux/netlink.h
> +++ b/include/linux/netlink.h
> @@ -230,7 +230,7 @@ struct netlink_dump_control {
>  	int (*done)(struct netlink_callback *);
>  	void *data;
>  	struct module *module;
> -	u16 min_dump_alloc;
> +	u32 min_dump_alloc;
>  };

As long as nothing in the API depends on the length of this struct, it
should work.
Vladimir Oltean Oct. 16, 2020, 10:45 p.m. UTC | #2
On Fri, Oct 16, 2020 at 02:36:25PM -0700, Jesse Brandeburg wrote:
> > Signed-off-by: zhudi <zhudi21@huawei.com>
> 
> Kernel documentation says for you to use your real name, please do so,
> unless you're a rock star and have officially changed your name to
> zhudi.

Well, his real name is probably 朱棣, and the pinyin transliteration
system doesn't really insist on separating 朱 (zhu) from 棣 (di), or on
capitalizing any of twose words, so I'm not sure what your point is.
Would you prefer his sign-off to read 朱棣 <zhudi21@huawei.com>?
Jesse Brandeburg Oct. 17, 2020, 12:44 a.m. UTC | #3
Vladimir Oltean wrote:

> On Fri, Oct 16, 2020 at 02:36:25PM -0700, Jesse Brandeburg wrote:
> > > Signed-off-by: zhudi <zhudi21@huawei.com>
> > 
> > Kernel documentation says for you to use your real name, please do so,
> > unless you're a rock star and have officially changed your name to
> > zhudi.

I apologize for seeming off-putting, my goal was to add a little levity
here, but I know that email does a bad job of transmitting my intent,
and I will do better.

> 
> Well, his real name is probably 朱棣, and the pinyin transliteration
> system doesn't really insist on separating 朱 (zhu) from 棣 (di), or on
> capitalizing any of twose words, so I'm not sure what your point is.
> Would you prefer his sign-off to read 朱棣 <zhudi21@huawei.com>?

Ah, thanks Vladimir for explaining the difference. If this is common
parlance for commit messages from our Chinese developers, please forgive
me, I'm trying to balance obvious correctness against typical usage.

I'll adjust my expectations for single word names, thanks!

Jesse
Michal Kubecek Oct. 17, 2020, 12:34 p.m. UTC | #4
On Fri, Oct 16, 2020 at 10:02:38AM +0800, zhudi wrote:
> "ip addr show" command execute error when we have a physical
> network card with number of VFs larger than 247.
> 
> The return value of if_nlmsg_size() in rtnl_calcit() will exceed
> range of u16 data type when any network cards has a larger number of
> VFs. rtnl_vfinfo_size() will significant increase needed dump size when
> the value of num_vfs is larger.
> 
> Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
> which decides the memory size needed by netlink dump and netlink_dump()
> will return -EMSGSIZE because of not enough memory was allocated.
> 
> So fix it by promoting  min_dump_alloc data type to u32 to
> avoid data overflow and it's also align with the data type of
> struct netlink_callback{}.min_dump_alloc which is assigned by
> return value of rtnl_calcit()

Unfortunately this is only part of the problem. For a NIC with so many
VFs (not sure if exactly 247 but it's close to that), IFLA_VFINFO_LIST
nested attribute itself would be over 64KB long which is not possible as
attribute size is u16.

So we should rather fail in such case (except when IFLA_VFINFO_LIST
itself fits into 64KB but the whole netlink message would not) and
provide an alternative way to get information about all VFs.

Michal
Jakub Kicinski Oct. 18, 2020, 6:41 p.m. UTC | #5
On Sat, 17 Oct 2020 14:34:11 +0200 Michal Kubecek wrote:
> On Fri, Oct 16, 2020 at 10:02:38AM +0800, zhudi wrote:
> > "ip addr show" command execute error when we have a physical
> > network card with number of VFs larger than 247.
> > 
> > The return value of if_nlmsg_size() in rtnl_calcit() will exceed
> > range of u16 data type when any network cards has a larger number of
> > VFs. rtnl_vfinfo_size() will significant increase needed dump size when
> > the value of num_vfs is larger.
> > 
> > Eventually we get a wrong value of min_ifinfo_dump_size because of overflow
> > which decides the memory size needed by netlink dump and netlink_dump()
> > will return -EMSGSIZE because of not enough memory was allocated.
> > 
> > So fix it by promoting  min_dump_alloc data type to u32 to
> > avoid data overflow and it's also align with the data type of
> > struct netlink_callback{}.min_dump_alloc which is assigned by
> > return value of rtnl_calcit()  
> 
> Unfortunately this is only part of the problem. For a NIC with so many
> VFs (not sure if exactly 247 but it's close to that), IFLA_VFINFO_LIST
> nested attribute itself would be over 64KB long which is not possible as
> attribute size is u16.
> 
> So we should rather fail in such case (except when IFLA_VFINFO_LIST
> itself fits into 64KB but the whole netlink message would not) and
> provide an alternative way to get information about all VFs.

Right, we should probably move to devlink as much as possible.

zhudi, why not use size_t? Seems like the most natural fit for 
counting size.
zhudi Oct. 19, 2020, 1:35 a.m. UTC | #6
> > "ip addr show" command execute error when we have a physical network
> > card with number of VFs larger than 247.
> 
> Oh man, this bug has been hurting us forever and I've tried to fix it several
> times without much luck, so thanks for working on it!
> 
> CC: David Ahern <dsahern@gmail.com>
> 
> As he's mentioned this bug.
> ...... 
> Kernel documentation says for you to use your real name, please do so,
> unless you're a rock star and have officially changed your name to zhudi.
> 

May be I should use name such as di.zhu  to avoid unnecessary problem :)

> > ---
> >  include/linux/netlink.h | 2 +-
> >  net/core/rtnetlink.c    | 8 ++++----
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> Does it work without any changes to iproute2?

The patch only change the internal data struct currently only used by netlink_dump_start() function.
So I think this patch  has not effects on iproute2 or other packages
diff mbox series

Patch

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index e3e49f0e5c13..0a7db41b9e42 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -230,7 +230,7 @@  struct netlink_dump_control {
 	int (*done)(struct netlink_callback *);
 	void *data;
 	struct module *module;
-	u16 min_dump_alloc;
+	u32 min_dump_alloc;
 };
 
 int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 68e0682450c6..b199137ddcdf 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3709,13 +3709,13 @@  static int rtnl_dellinkprop(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return rtnl_linkprop(RTM_DELLINKPROP, skb, nlh, extack);
 }
 
-static u16 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
+static u32 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct net_device *dev;
 	struct nlattr *tb[IFLA_MAX+1];
 	u32 ext_filter_mask = 0;
-	u16 min_ifinfo_dump_size = 0;
+	u32 min_ifinfo_dump_size = 0;
 	int hdrlen;
 
 	/* Same kernel<->userspace interface hack as in rtnl_dump_ifinfo. */
@@ -3735,7 +3735,7 @@  static u16 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
 	 */
 	rcu_read_lock();
 	for_each_netdev_rcu(net, dev) {
-		min_ifinfo_dump_size = max_t(u16, min_ifinfo_dump_size,
+		min_ifinfo_dump_size = max_t(u32, min_ifinfo_dump_size,
 					     if_nlmsg_size(dev,
 						           ext_filter_mask));
 	}
@@ -5494,7 +5494,7 @@  static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
 		struct sock *rtnl;
 		rtnl_dumpit_func dumpit;
-		u16 min_dump_alloc = 0;
+		u32 min_dump_alloc = 0;
 
 		link = rtnl_get_link(family, type);
 		if (!link || !link->dumpit) {