diff mbox

[RFC] rtnetlink: Add filter for VF info dump requests

Message ID 20120207223346.21320.89370.stgit@gitlad.jf.intel.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Rose, Gregory V Feb. 7, 2012, 10:33 p.m. UTC
Add a 256 bit filter to allow the user to filter which VFs' info will
get displayed during the dump info request.  This is to fix a bug in
which a an info dump request on a device with many VFs would overflow
the recvmsg buffer which is allocated to max(8192, PAGE_SIZE).  A
complimentary patch to the iproute2 ip tool will allow the user to
set/clear individual VF filters.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
---

 include/linux/if_link.h   |    6 ++++++
 include/linux/netdevice.h |    2 ++
 net/core/rtnetlink.c      |   40 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 43 insertions(+), 5 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Feb. 9, 2012, 1:10 a.m. UTC | #1
From: Greg Rose <gregory.v.rose@intel.com>
Date: Tue, 07 Feb 2012 14:33:46 -0800

> Add a 256 bit filter to allow the user to filter which VFs' info will
> get displayed during the dump info request.  This is to fix a bug in
> which a an info dump request on a device with many VFs would overflow
> the recvmsg buffer which is allocated to max(8192, PAGE_SIZE).  A
> complimentary patch to the iproute2 ip tool will allow the user to
> set/clear individual VF filters.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>

This isn't the idea.

I meant we have to add a generic u32 mask attribute to the GETLINK
requests, and it's a mask of "features".  One feature bit would be for
enabling the "VF" part of the dump, another bit would be for the
macvlan source address lists that are being proposed, etc.  Any new
auxiliarry information type we want to add to devide dumps, we have
to add a feature bit for in this mask.

And then you can't just stop there, you have to update all of the
logic related to min_ifinfo_dump_size to take into account the current
request's mask state.

In fact this means you must get rid of the min_info_dump_size global
variable altogether because the size is context dependent upon the
mask configuration of the current dump request.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rose, Gregory V Feb. 9, 2012, 4:25 p.m. UTC | #2
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, February 08, 2012 5:10 PM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
> 
> From: Greg Rose <gregory.v.rose@intel.com>
> Date: Tue, 07 Feb 2012 14:33:46 -0800
> 
> > Add a 256 bit filter to allow the user to filter which VFs' info will
> > get displayed during the dump info request.  This is to fix a bug in
> > which a an info dump request on a device with many VFs would overflow
> > the recvmsg buffer which is allocated to max(8192, PAGE_SIZE).  A
> > complimentary patch to the iproute2 ip tool will allow the user to
> > set/clear individual VF filters.
> >
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> 
> This isn't the idea.
> 
> I meant we have to add a generic u32 mask attribute to the GETLINK
> requests, and it's a mask of "features".  One feature bit would be for
> enabling the "VF" part of the dump, another bit would be for the
> macvlan source address lists that are being proposed, etc.  Any new
> auxiliarry information type we want to add to devide dumps, we have
> to add a feature bit for in this mask.
> 
> And then you can't just stop there, you have to update all of the
> logic related to min_ifinfo_dump_size to take into account the current
> request's mask state.
> 
> In fact this means you must get rid of the min_info_dump_size global
> variable altogether because the size is context dependent upon the
> mask configuration of the current dump request.

That will change the semantics to the 'ip link show' command quite a bit.  But OK, let me see what I can do with that.

- Greg


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 9, 2012, 6:43 p.m. UTC | #3
From: "Rose, Gregory V" <gregory.v.rose@intel.com>
Date: Thu, 9 Feb 2012 16:25:31 +0000

> That will change the semantics to the 'ip link show' command quite a bit.

I think that's rubbish.

You can make "ip link show" do whatever you want, including setting all
feature bits in getlink requests, which will preserve existing behavior.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rose, Gregory V Feb. 9, 2012, 6:45 p.m. UTC | #4
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, February 09, 2012 10:44 AM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
> 
> From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> Date: Thu, 9 Feb 2012 16:25:31 +0000
> 
> > That will change the semantics to the 'ip link show' command quite a
> bit.
> 
> I think that's rubbish.
> 
> You can make "ip link show" do whatever you want, including setting all
> feature bits in getlink requests, which will preserve existing behavior.

But there will be a change to the command line semantics no?  How do I communicate which feature bits to set in getlink requests otherwise?

- Greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 9, 2012, 6:49 p.m. UTC | #5
From: "Rose, Gregory V" <gregory.v.rose@intel.com>
Date: Thu, 9 Feb 2012 18:45:59 +0000

>> -----Original Message-----
>> From: David Miller [mailto:davem@davemloft.net]
>> Sent: Thursday, February 09, 2012 10:44 AM
>> To: Rose, Gregory V
>> Cc: netdev@vger.kernel.org
>> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
>> 
>> From: "Rose, Gregory V" <gregory.v.rose@intel.com>
>> Date: Thu, 9 Feb 2012 16:25:31 +0000
>> 
>> > That will change the semantics to the 'ip link show' command quite a
>> bit.
>> 
>> I think that's rubbish.
>> 
>> You can make "ip link show" do whatever you want, including setting all
>> feature bits in getlink requests, which will preserve existing behavior.
> 
> But there will be a change to the command line semantics no?  How do
> I communicate which feature bits to set in getlink requests
> otherwise?

Make "ip" internally default to "all" feature bits.  I don't really
understand what the consternation is.  Right now, that's what happens,
the kernel dumps all the attributes including "VF" stuff.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rose, Gregory V Feb. 9, 2012, 6:57 p.m. UTC | #6
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, February 09, 2012 10:50 AM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
> 
> From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> Date: Thu, 9 Feb 2012 18:45:59 +0000
> 
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Thursday, February 09, 2012 10:44 AM
> >> To: Rose, Gregory V
> >> Cc: netdev@vger.kernel.org
> >> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump
> requests
> >>
> >> From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> >> Date: Thu, 9 Feb 2012 16:25:31 +0000
> >>
> >> > That will change the semantics to the 'ip link show' command quite a
> >> bit.
> >>
> >> I think that's rubbish.
> >>
> >> You can make "ip link show" do whatever you want, including setting all
> >> feature bits in getlink requests, which will preserve existing
> behavior.
> >
> > But there will be a change to the command line semantics no?  How do
> > I communicate which feature bits to set in getlink requests
> > otherwise?
> 
> Make "ip" internally default to "all" feature bits.  I don't really
> understand what the consternation is.  Right now, that's what happens,
> the kernel dumps all the attributes including "VF" stuff.
> 

I'm sure the consternation on my part is that I had thought you wanted the feature bits to default to none.  You said this at one point:

---------
It seems clear that we must, at this point, turn off VF attributes by default.
---------

I'm sorry if I seem obtuse but I'm completely confused now.

And if we default all the feature bits for VFs, macvlans, etc. to on then it seems to me there has to be some way to turn them off from user space.

- Greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 9, 2012, 7:01 p.m. UTC | #7
From: "Rose, Gregory V" <gregory.v.rose@intel.com>
Date: Thu, 9 Feb 2012 18:57:26 +0000

> I'm sure the consternation on my part is that I had thought you
> wanted the feature bits to default to none.  You said this at one
> point:

In "the kernel", so that things like GLIBC, that don't ask for the
features, no longer break with lots of VFs.

"iproute2" in userland can set all the feature bits, and have code
added to accomodate large dump requests in the future, so you're going
to have to make changes to iproute2 for this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rose, Gregory V Feb. 9, 2012, 7:06 p.m. UTC | #8
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of David Miller
> Sent: Thursday, February 09, 2012 11:02 AM
> To: Rose, Gregory V
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
> 
> From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> Date: Thu, 9 Feb 2012 18:57:26 +0000
> 
> > I'm sure the consternation on my part is that I had thought you
> > wanted the feature bits to default to none.  You said this at one
> > point:
> 
> In "the kernel", so that things like GLIBC, that don't ask for the
> features, no longer break with lots of VFs.
> 
> "iproute2" in userland can set all the feature bits, and have code
> added to accomodate large dump requests in the future, so you're going
> to have to make changes to iproute2 for this.

OK, that's a bit more clear to me then.  And that's the only reason I mentioned that the semantics to 'ip link show' would change.  But now I see what you mean by the rubbish comment.  By default ip link show will set all the feature bits and I'll add option code to iproute2 to allow the user to turn it off.

Does that make sense?

- Greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger Feb. 9, 2012, 7:11 p.m. UTC | #9
On Thu, 9 Feb 2012 19:06:02 +0000
"Rose, Gregory V" <gregory.v.rose@intel.com> wrote:

> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of David Miller
> > Sent: Thursday, February 09, 2012 11:02 AM
> > To: Rose, Gregory V
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
> > 
> > From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> > Date: Thu, 9 Feb 2012 18:57:26 +0000
> > 
> > > I'm sure the consternation on my part is that I had thought you
> > > wanted the feature bits to default to none.  You said this at one
> > > point:
> > 
> > In "the kernel", so that things like GLIBC, that don't ask for the
> > features, no longer break with lots of VFs.
> > 
> > "iproute2" in userland can set all the feature bits, and have code
> > added to accomodate large dump requests in the future, so you're going
> > to have to make changes to iproute2 for this.
> 
> OK, that's a bit more clear to me then.  And that's the only reason I mentioned that the semantics to 'ip link show' would change.  But now I see what you mean by the rubbish comment.  By default ip link show will set all the feature bits and I'll add option code to iproute2 to allow the user to turn it off.
> 
> Does that make sense?

Please don't introduce more flags, either use the existing detail flag or
just get everything. 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rose, Gregory V Feb. 9, 2012, 7:13 p.m. UTC | #10
> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: Thursday, February 09, 2012 11:12 AM
> To: Rose, Gregory V
> Cc: David Miller; netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump requests
> 
> On Thu, 9 Feb 2012 19:06:02 +0000
> "Rose, Gregory V" <gregory.v.rose@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org]
> > > On Behalf Of David Miller
> > > Sent: Thursday, February 09, 2012 11:02 AM
> > > To: Rose, Gregory V
> > > Cc: netdev@vger.kernel.org
> > > Subject: Re: [RFC PATCH] rtnetlink: Add filter for VF info dump
> requests
> > >
> > > From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> > > Date: Thu, 9 Feb 2012 18:57:26 +0000
> > >
> > > > I'm sure the consternation on my part is that I had thought you
> > > > wanted the feature bits to default to none.  You said this at one
> > > > point:
> > >
> > > In "the kernel", so that things like GLIBC, that don't ask for the
> > > features, no longer break with lots of VFs.
> > >
> > > "iproute2" in userland can set all the feature bits, and have code
> > > added to accomodate large dump requests in the future, so you're going
> > > to have to make changes to iproute2 for this.
> >
> > OK, that's a bit more clear to me then.  And that's the only reason I
> mentioned that the semantics to 'ip link show' would change.  But now I
> see what you mean by the rubbish comment.  By default ip link show will
> set all the feature bits and I'll add option code to iproute2 to allow the
> user to turn it off.
> >
> > Does that make sense?
> 
> Please don't introduce more flags, either use the existing detail flag or
> just get everything.

Alright.

- Greg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 9, 2012, 7:17 p.m. UTC | #11
From: "Rose, Gregory V" <gregory.v.rose@intel.com>
Date: Thu, 9 Feb 2012 19:06:02 +0000

> OK, that's a bit more clear to me then.  And that's the only reason
> I mentioned that the semantics to 'ip link show' would change.  But
> now I see what you mean by the rubbish comment.  By default ip link
> show will set all the feature bits and I'll add option code to
> iproute2 to allow the user to turn it off.
> 
> Does that make sense?

Yes, it does.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index c52d4b5..052c240 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -280,6 +280,7 @@  enum {
 	IFLA_VF_VLAN,
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
+	IFLA_VF_INFOFILTER,	/* Filter vfinfo on dumps */
 	__IFLA_VF_MAX,
 };
 
@@ -305,6 +306,11 @@  struct ifla_vf_spoofchk {
 	__u32 vf;
 	__u32 setting;
 };
+
+struct ifla_vf_infofilter {
+	__u32 vf;
+	__u32 filter;
+};
 #ifdef __KERNEL__
 
 /* We don't want this structure exposed to user space */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0eac07c..dd30b5d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1298,6 +1298,8 @@  struct net_device {
 
 	/* group the device belongs to */
 	int group;
+	/* VF info display filter - Number of VFs max is 256 */
+	unsigned long show_vfinfo_filter[BITS_TO_LONGS(256)];
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 39aa20b..4fd78dd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -727,9 +727,13 @@  static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
 static inline int rtnl_vfinfo_size(const struct net_device *dev)
 {
 	if (dev->dev.parent && dev_is_pci(dev->dev.parent)) {
-
-		int num_vfs = dev_num_vf(dev->dev.parent);
+		int j;
+		int num_vfs = 0;
 		size_t size = nla_total_size(sizeof(struct nlattr));
+		for (j = 0; j < 256; j++) {
+			if (test_bit(j, dev->show_vfinfo_filter))
+				num_vfs++;
+		}
 		size += nla_total_size(num_vfs * sizeof(struct nlattr));
 		size += num_vfs *
 			(nla_total_size(sizeof(struct ifla_vf_mac)) +
@@ -876,6 +880,7 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	const struct rtnl_link_stats64 *stats;
 	struct nlattr *attr, *af_spec;
 	struct rtnl_af_ops *af_ops;
+	u32 num_vf_filters_set = 0;
 
 	ASSERT_RTNL();
 	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
@@ -941,10 +946,18 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		goto nla_put_failure;
 	copy_rtnl_link_stats64(nla_data(attr), stats);
 
-	if (dev->dev.parent)
-		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
+	if (dev->dev.parent) {
+		int j;
+		for (j = 0; j < 256; j++) {
+			if (test_bit(j, dev->show_vfinfo_filter))
+				num_vf_filters_set++;
+		}
+		if (num_vf_filters_set)
+			NLA_PUT_U32(skb, IFLA_NUM_VF, num_vf_filters_set);
+	}
 
-	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
+	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent &&
+			num_vf_filters_set) {
 		int i;
 
 		struct nlattr *vfinfo, *vf;
@@ -960,6 +973,9 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			struct ifla_vf_tx_rate vf_tx_rate;
 			struct ifla_vf_spoofchk vf_spoofchk;
 
+			if (!test_bit(i, dev->show_vfinfo_filter))
+				continue;
+
 			/*
 			 * Not all SR-IOV capable drivers support the
 			 * spoofcheck query.  Preset to -1 so the user
@@ -1234,6 +1250,20 @@  static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
 							       ivs->setting);
 			break;
 		}
+		case IFLA_VF_INFOFILTER: {
+			struct ifla_vf_infofilter *ivf;
+			ivf = nla_data(vf);
+			if (ivf->vf < dev_num_vf(dev->dev.parent)) {
+				if (ivf->filter)
+					set_bit(ivf->vf,
+						dev->show_vfinfo_filter);
+			else
+					clear_bit(ivf->vf,
+						dev->show_vfinfo_filter);
+				err = 0;
+			}
+			break;
+		}
 		default:
 			err = -EINVAL;
 			break;