Message ID | 1303834864.3358.58.camel@edumazet-laptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
> -----Original Message----- > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > Sent: Tuesday, April 26, 2011 9:21 AM > To: Rose, Gregory V > Cc: David Miller; netdev@vger.kernel.org; bhutchings@solarflare.com > Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > > > I'm fine with however you folks want to approach this, just give me some > direction. > > I would just try following patch : I'll test it out, it's certainly a lot simpler. Perhaps I was getting a bit too fancy. Ben would want to make sure it works for 127 VFs, my device does 63. - Greg > > > diff --git a/include/linux/netlink.h b/include/linux/netlink.h > index 4c4ac3f..22cac81 100644 > --- a/include/linux/netlink.h > +++ b/include/linux/netlink.h > @@ -210,6 +210,7 @@ int netlink_sendskb(struct sock *sk, struct sk_buff > *skb); > #else > #define NLMSG_GOODSIZE SKB_WITH_OVERHEAD(8192UL) > #endif > +#define NLMSG_DUMPSIZE SKB_WITH_OVERHEAD(8192UL) > > #define NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE - NLMSG_HDRLEN) > > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c > index c8f35b5..cb8d6ac 100644 > --- a/net/netlink/af_netlink.c > +++ b/net/netlink/af_netlink.c > @@ -1669,7 +1669,7 @@ static int netlink_dump(struct sock *sk) > struct nlmsghdr *nlh; > int len, err = -ENOBUFS; > > - skb = sock_rmalloc(sk, NLMSG_GOODSIZE, 0, GFP_KERNEL); > + skb = sock_rmalloc(sk, NLMSG_DUMPSIZE, 0, GFP_KERNEL); > if (!skb) > goto errout; > >
On Tue, 2011-04-26 at 09:24 -0700, Rose, Gregory V wrote: > > -----Original Message----- > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > > Sent: Tuesday, April 26, 2011 9:21 AM > > To: Rose, Gregory V > > Cc: David Miller; netdev@vger.kernel.org; bhutchings@solarflare.com > > Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size > > > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > > > > > I'm fine with however you folks want to approach this, just give me some > > direction. > > > > I would just try following patch : > > I'll test it out, it's certainly a lot simpler. Perhaps I was getting a bit too fancy. > > Ben would want to make sure it works for 127 VFs, my device does 63. I haven't been working on SR-IOV myself so I'll pass this on to a colleague. Thanks, Eric. Ben.
> -----Original Message----- > From: Ben Hutchings [mailto:bhutchings@solarflare.com] > Sent: Tuesday, April 26, 2011 12:07 PM > To: Rose, Gregory V; Eric Dumazet > Cc: David Miller; netdev@vger.kernel.org > Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size > > On Tue, 2011-04-26 at 09:24 -0700, Rose, Gregory V wrote: > > > -----Original Message----- > > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > > > Sent: Tuesday, April 26, 2011 9:21 AM > > > To: Rose, Gregory V > > > Cc: David Miller; netdev@vger.kernel.org; bhutchings@solarflare.com > > > Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message > size > > > > > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > > > > > > > I'm fine with however you folks want to approach this, just give me > some > > > direction. > > > > > > I would just try following patch : > > > > I'll test it out, it's certainly a lot simpler. Perhaps I was getting a > bit too fancy. > > > > Ben would want to make sure it works for 127 VFs, my device does 63. > > I haven't been working on SR-IOV myself so I'll pass this on to a > colleague. Thanks, Eric. Eric's patch works fine for the case of 63 VFs. For most dumps it's allocating quite a bit more memory than necessary but that's not a big issue since it's transient and not held for long. - Greg
On 04/27/2011 04:24 PM, Eric Dumazet wrote: > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > >> I'm fine with however you folks want to approach this, just give me some direction. > > I would just try following patch : > This allows the sfc driver to use 102 VFs, up from the current limit of 45 VFs. It's unfortunate that this patch isn't sufficient to allow all 127 VFs to be used, but whilst we wait for a new netlink api this is an improvement worth having. I'd like to see this patch committed. Thanks Eric, - Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 27 avril 2011 à 16:46 +0100, Steve Hodgson a écrit : > On 04/27/2011 04:24 PM, Eric Dumazet wrote: > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > > > >> I'm fine with however you folks want to approach this, just give me some direction. > > > > I would just try following patch : > > > > This allows the sfc driver to use 102 VFs, up from the current limit of > 45 VFs. > > It's unfortunate that this patch isn't sufficient to allow all 127 VFs > to be used, but whilst we wait for a new netlink api this is an > improvement worth having. > netlink recvmsg() supports MSG_PEEK so user would get the needed size of its buffer before calling the real recvmsg() big blobs could be attached as skb fragments (up to 64Kbytes), but do we really want this... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] > On Behalf Of Eric Dumazet > Sent: Wednesday, April 27, 2011 9:30 AM > To: Steve Hodgson > Cc: Rose, Gregory V; David Miller; netdev@vger.kernel.org; > bhutchings@solarflare.com > Subject: Re: [RFC PATCH] netlink: Increase netlink dump skb message size > > Le mercredi 27 avril 2011 à 16:46 +0100, Steve Hodgson a écrit : > > On 04/27/2011 04:24 PM, Eric Dumazet wrote: > > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > > > > > >> I'm fine with however you folks want to approach this, just give me > some direction. > > > > > > I would just try following patch : > > > > > > > This allows the sfc driver to use 102 VFs, up from the current limit of > > 45 VFs. > > > > It's unfortunate that this patch isn't sufficient to allow all 127 VFs > > to be used, but whilst we wait for a new netlink api this is an > > improvement worth having. > > > > netlink recvmsg() supports MSG_PEEK so user would get the needed size of > its buffer before calling the real recvmsg() > > big blobs could be attached as skb fragments (up to 64Kbytes), but do we > really want this... [Greg Rose] I'm looking into an approach in which we make the get info dump for VFs orthogonal to the set VF info, i.e. like this: To set VF info we would follow the current convention: # ip link set eth(x) vf (n) mac xx:xx:xx:xx:xx:xx # ip link set eth(x) vf (n) vlan (nnnn) To see VF info: # ip link show eth(x) vf (n) would show that VF's mac and vlan and could then be expanded in the future to display more information required for additional features that users are asking for. The IFLA_VF_INFO dump would be moved out of the info dump for the physical function interface and would no longer be nested which would get rid of the need for huge amounts of buffer for info dumps on VFs. The ip link show command for the PF would need to report the number of VFs currently allocated to the PF so that could fed into a script that loops to show each VFs info. I think this approach would fix the problems we're looking at right now. - Greg
Le mercredi 27 avril 2011 à 10:15 -0700, Rose, Gregory V a écrit : > > -----Original Message----- > > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] > > On Behalf Of Eric Dumazet > > Sent: Wednesday, April 27, 2011 9:30 AM > > To: Steve Hodgson > > Cc: Rose, Gregory V; David Miller; netdev@vger.kernel.org; > > bhutchings@solarflare.com > > Subject: Re: [RFC PATCH] netlink: Increase netlink dump skb message size > > > > Le mercredi 27 avril 2011 à 16:46 +0100, Steve Hodgson a écrit : > > > On 04/27/2011 04:24 PM, Eric Dumazet wrote: > > > > Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > > > > > > > >> I'm fine with however you folks want to approach this, just give me > > some direction. > > > > > > > > I would just try following patch : > > > > > > > > > > This allows the sfc driver to use 102 VFs, up from the current limit of > > > 45 VFs. > > > > > > It's unfortunate that this patch isn't sufficient to allow all 127 VFs > > > to be used, but whilst we wait for a new netlink api this is an > > > improvement worth having. > > > > > > > netlink recvmsg() supports MSG_PEEK so user would get the needed size of > > its buffer before calling the real recvmsg() > > > > big blobs could be attached as skb fragments (up to 64Kbytes), but do we > > really want this... > [Greg Rose] > > I'm looking into an approach in which we make the get info dump for VFs orthogonal to the set VF info, i.e. like this: > > To set VF info we would follow the current convention: > > # ip link set eth(x) vf (n) mac xx:xx:xx:xx:xx:xx > # ip link set eth(x) vf (n) vlan (nnnn) > > To see VF info: > > # ip link show eth(x) vf (n) > > would show that VF's mac and vlan and could then be expanded in the future to display more information required for additional features that users are asking for. > > The IFLA_VF_INFO dump would be moved out of the info dump for the physical function interface and would no longer be nested which would get rid of the need for huge amounts of buffer for info dumps on VFs. The ip link show command for the PF would need to report the number of VFs currently allocated to the PF so that could fed into a script that loops to show each VFs info. > > I think this approach would fix the problems we're looking at right now. > Hmm, if you look at "ip link ..." you'll see it dumps everything from kernel and does the filter inside user command. BTW "ip" uses a 16384 bytes buffer, not a 8192 bytes one. $ strace -e trace=recvmsg ip link recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\340\3\0\0\20\0\2\0dR\270M\224s\0\0\0\0\4\3\1\0\0\0I\0\1\0 \0\0\0\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3000 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\364\3\0\0\20\0\2\0dR\270M\224s\0\0\0\0\1\0\4\0\0\0C\24\1 \0\0\0\0\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3024 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\34\4\0\0\20\0\2\0dR\270M\224s\0\0\0\0\1\0\7\0\0\0C\20\1\0 \0\0\0\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 2104 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0dR\270M\224s\0\0\0\0\0\0\7\0\0\0C\20\1\0 \0\0\0\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 20 ... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 27 avril 2011 à 10:39 -0700, Rose, Gregory V a écrit : > Right, but when I look in rtnetlink I see the routine to calculate the > amount of buffer needed for VF info dump is the number of device > parent (PF) VFs * the sizeof various IFLA_VF_INFO items. The more the > VFs the bigger this gets, especially if you want to add more stuff to > IFLA_VF_INFO. So when the kernel dumps this all out it can get bigger > than the NLMSG_GOODSIZE (or DUMPSIZE) pretty quickly. > > > > > BTW "ip" uses a 16384 bytes buffer, not a 8192 bytes one. > > I know, that's why I suffered some confusion about which size to use. > The ip command uses 16K but the NLMSG_GOODSIZE can be as small as 3712 > bytes (depending on page size). Despite the user buffer being 16k if > the size calculated by if_nlmsg_size() in rtnetlink.c is bigger than > NLMSG_GOODSIZE then you don't see the info for more than 40 or so VFs. > More VFs than that and nothing gets displayed. > One solution is to change rtnl_dump_ifinfo() to call rtnl_fill_ifinfo() once time per device (RTM_NEWLINK like now but no more VFINFO inside), then call another function to provide vf/vlan informations (RTM_NEWVF), using cb->args[2] as an index into VF space, so that we can stop if current skb is filled, and next recvmsg() starts at previous saved index. Or implement a new "ip vf ..." dumper and only dumps RTM_NEWVF messages. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > Sent: Wednesday, April 27, 2011 11:06 AM > To: Rose, Gregory V > Cc: Steve Hodgson; David Miller; netdev@vger.kernel.org; > bhutchings@solarflare.com > Subject: RE: [RFC PATCH] netlink: Increase netlink dump skb message size > > Le mercredi 27 avril 2011 à 10:39 -0700, Rose, Gregory V a écrit : > > > Right, but when I look in rtnetlink I see the routine to calculate the > > amount of buffer needed for VF info dump is the number of device > > parent (PF) VFs * the sizeof various IFLA_VF_INFO items. The more the > > VFs the bigger this gets, especially if you want to add more stuff to > > IFLA_VF_INFO. So when the kernel dumps this all out it can get bigger > > than the NLMSG_GOODSIZE (or DUMPSIZE) pretty quickly. > > > > > > > > BTW "ip" uses a 16384 bytes buffer, not a 8192 bytes one. > > > > I know, that's why I suffered some confusion about which size to use. > > The ip command uses 16K but the NLMSG_GOODSIZE can be as small as 3712 > > bytes (depending on page size). Despite the user buffer being 16k if > > the size calculated by if_nlmsg_size() in rtnetlink.c is bigger than > > NLMSG_GOODSIZE then you don't see the info for more than 40 or so VFs. > > More VFs than that and nothing gets displayed. > > > > One solution is to change rtnl_dump_ifinfo() to call rtnl_fill_ifinfo() > once time per device (RTM_NEWLINK like now but no more VFINFO inside), > then call another function to provide vf/vlan informations (RTM_NEWVF), > using cb->args[2] as an index into VF space, so that we can stop if > current skb is filled, and next recvmsg() starts at previous saved > index. > > Or implement a new "ip vf ..." dumper and only dumps RTM_NEWVF messages. I'm a netlink newbie so maybe that is a more obvious solution that I missed. Let me have a look at the code and see. Thanks, - Greg
diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 4c4ac3f..22cac81 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -210,6 +210,7 @@ int netlink_sendskb(struct sock *sk, struct sk_buff *skb); #else #define NLMSG_GOODSIZE SKB_WITH_OVERHEAD(8192UL) #endif +#define NLMSG_DUMPSIZE SKB_WITH_OVERHEAD(8192UL) #define NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE - NLMSG_HDRLEN) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index c8f35b5..cb8d6ac 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1669,7 +1669,7 @@ static int netlink_dump(struct sock *sk) struct nlmsghdr *nlh; int len, err = -ENOBUFS; - skb = sock_rmalloc(sk, NLMSG_GOODSIZE, 0, GFP_KERNEL); + skb = sock_rmalloc(sk, NLMSG_DUMPSIZE, 0, GFP_KERNEL); if (!skb) goto errout;
Le mardi 26 avril 2011 à 09:12 -0700, Rose, Gregory V a écrit : > I'm fine with however you folks want to approach this, just give me some direction. I would just try following patch : -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html