Message ID | 1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2014-10-15 at 11:26 +0800, Pan Jiafei wrote: > In some platform, there are some hardware block provided > to manage buffers to improve performance. So in some case, > it is expected that the packets received by some generic > NIC should be put into such hardware managed buffers > directly, so that such buffer can be released by hardware > or by driver. You repeat 'some' four times. > > This patch provide such general APIs for generic NIC to > use hardware block managed buffers without any modification > for generic NIC drivers. ... > In this patch, the following fields are added to "net_device": > void *hw_skb_priv; > struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length); > void (*free_hw_skb)(struct sk_buff *skb); > so in order to let generic NIC driver to use hardware managed > buffers, the function "alloc_hw_skb" and "free_hw_skb" > provide implementation for allocate and free hardware managed > buffers. "hw_skb_priv" is provided to pass some private data for > these two functions. > > When the socket buffer is allocated by these APIs, "hw_skb_state" > is provided in struct "sk_buff". this argument can indicate > that the buffer is hardware managed buffer, this buffer > should freed by software or by hardware. > > Documentation on how to use this featue can be found at > <file:Documentation/networking/hw_skb.txt>. > > Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com> I am giving a strong NACK, of course. We are not going to grow sk_buff and add yet another conditional in fast path for a very obscure feature like that. Memory management is not going to be done by drivers. The way it should work is that if your hardware has specific needs, rx and tx paths (of the driver) need to make the needed adaptation. Not the other way. We already have complex skb layouts, we do not need a new one. Take a look at how drivers can 'lock' pages already, and build skb sith page frags. It is already there. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Pan Jiafei <Jiafei.Pan@freescale.com> Date: Wed, 15 Oct 2014 11:26:11 +0800 > In some platform, there are some hardware block provided > to manage buffers to improve performance. So in some case, > it is expected that the packets received by some generic > NIC should be put into such hardware managed buffers > directly, so that such buffer can be released by hardware > or by driver. > > This patch provide such general APIs for generic NIC to > use hardware block managed buffers without any modification > for generic NIC drivers. Why are existing interfaces insufficent for your needs? Several ethernet drivers already build SKBs from block buffer pools. They allocate pools of pages which the hardware divides into various powers of 2, then the RX descriptor says what pieces of which pools were used to hold the data for a packet, and then the SKB is constructed with page frags pointing to those locations. It's very cheap, inexpensive, and existing APIs are considered to cover all use cases. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 14 Oct 2014 21:15:03 -0700 > Take a look at how drivers can 'lock' pages already, and build skb > sith page frags. It is already there. +1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 15.10.2014 05:26, Pan Jiafei wrote: > In some platform, there are some hardware block provided > to manage buffers to improve performance. (..) > diff --git a/net/Kconfig b/net/Kconfig > index d6b138e..346e021 100644 > --- a/net/Kconfig > +++ b/net/Kconfig > @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT > with many clients some protection against DoS by a single (spoofed) > flow that greatly exceeds average workload. > > +config USE_HW_SKB > + bool "NIC use hardware managed buffer to build skb" > + depends on INET The feature seems to be valid for network devices in general. Why did you make it depending on INET ?? Regards, Oliver > + ---help--- > + If select this, the third party drivers will use hardware managed > + buffers to allocate SKB without any modification for the driver. > + > + Documentation on how to use this featue can be found at > + <file:Documentation/networking/hw_skb.txt>. > + > menu "Network testing" > > config NET_PKTGEN -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: David Miller [mailto:davem@davemloft.net] > Sent: Wednesday, October 15, 2014 12:25 PM > To: Pan Jiafei-B37022 > Cc: jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; linux- > doc@vger.kernel.org > Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > > From: Pan Jiafei <Jiafei.Pan@freescale.com> > Date: Wed, 15 Oct 2014 11:26:11 +0800 > > > In some platform, there are some hardware block provided > > to manage buffers to improve performance. So in some case, > > it is expected that the packets received by some generic > > NIC should be put into such hardware managed buffers > > directly, so that such buffer can be released by hardware > > or by driver. > > > > This patch provide such general APIs for generic NIC to > > use hardware block managed buffers without any modification > > for generic NIC drivers. > > Why are existing interfaces insufficent for your needs? > > Several ethernet drivers already build SKBs from block > buffer pools. > Yes, for this matter, in order to do this we should modify the Ethernet drivers. For example, driver A want to driver B, C, D.. to support driver A's Hardware block access functions, so we have to modify driver B, C, D... It will be so complex for this matter. But by using my patch, I just modify a Ethernet device (I don't care Which driver it is used) flag in driver A in order to implement this Ethernet device using hardware block access functions provided by Driver A. > They allocate pools of pages which the hardware divides into various > powers of 2, then the RX descriptor says what pieces of which pools > were used to hold the data for a packet, and then the SKB is > constructed with page frags pointing to those locations. > > It's very cheap, inexpensive, and existing APIs are considered to > cover all use cases. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Oliver Hartkopp [mailto:socketcan@hartkopp.net] > Sent: Wednesday, October 15, 2014 12:59 PM > To: Pan Jiafei-B37022; davem@davemloft.net; jkosina@suse.cz > Cc: netdev@vger.kernel.org; Li Yang-Leo-R58472; linux-doc@vger.kernel.org > Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > > On 15.10.2014 05:26, Pan Jiafei wrote: > > In some platform, there are some hardware block provided > > to manage buffers to improve performance. > > (..) [Pan Jiafei] I want to build a general patch to build skb by using hardware buffer manager block. > > > diff --git a/net/Kconfig b/net/Kconfig > > index d6b138e..346e021 100644 > > --- a/net/Kconfig > > +++ b/net/Kconfig > > @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT > > with many clients some protection against DoS by a single (spoofed) > > flow that greatly exceeds average workload. > > > > +config USE_HW_SKB > > + bool "NIC use hardware managed buffer to build skb" > > + depends on INET > > The feature seems to be valid for network devices in general. > Why did you make it depending on INET ?? > > Regards, > Oliver > [Pan Jiafei] Yes, INET dependency should be removed, thanks. > > + ---help--- > > + If select this, the third party drivers will use hardware managed > > + buffers to allocate SKB without any modification for the driver. > > + > > + Documentation on how to use this featue can be found at > > + <file:Documentation/networking/hw_skb.txt>. > > + > > menu "Network testing" > > > > config NET_PKTGEN -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Pan Jiafei > In some platform, there are some hardware block provided > to manage buffers to improve performance. So in some case, > it is expected that the packets received by some generic > NIC should be put into such hardware managed buffers > directly, so that such buffer can be released by hardware > or by driver. This looks like some strange variant of 'buffer loaning'. In general it just doesn't work due to the limited number of such buffers - they soon all become queued waiting for applications to read from sockets. It also isn't at all clear how you expect a 'generic NIC' to actually allocate buffers from your 'special area'. David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-10-15 at 05:43 +0000, Jiafei.Pan@freescale.com wrote: > For my case, there are some shortcoming to use page frags. Firstly, I have to > modify each Ethernet drivers to support it especially I don’t which vendor's > driver the customer will use. Secondly, it is not enough only > build skb by 'lock' pages, the buffer address comes from hardware block and > should be aligned to hardware block. So you align to hardware block. What is the problem with this ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote: > Yes, for this matter, in order to do this we should modify the Ethernet > drivers. For example, driver A want to driver B, C, D.. to support driver > A's Hardware block access functions, so we have to modify driver B, C, D... > It will be so complex for this matter. > > But by using my patch, I just modify a Ethernet device (I don't care > Which driver it is used) flag in driver A in order to implement this > Ethernet device using hardware block access functions provided by > Driver A. We care a lot of all the bugs added by your patches. You have little idea of how many of them were added. We do not want to spend days of work explaining everything or fixing all the details for you. Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see how many spots you missed. You cannot control how skbs are cooked before reaching your driver ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other drivers RX path. This would be absolutely insane. Trying to control how skb are cooked in RX path is absolutely something drivers do, using page frags that are read-only by all the stack. Fix your driver to use existing infra, your suggestion is not going to be accepted. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Since an skb can sit forever in an application queue, you have created an easy way to livelock the system when enough skb's are waiting to be read. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: "Jiafei.Pan@freescale.com" <Jiafei.Pan@freescale.com> Date: Wed, 15 Oct 2014 05:34:24 +0000 > Yes, for this matter, in order to do this we should modify the Ethernet > drivers. For example, driver A want to driver B, C, D.. to support driver > A's Hardware block access functions, so we have to modify driver B, C, D... > It will be so complex for this matter. Experience says otherwise. It's three or four lines of code to attach a page to an SKB frag. The driver needs it's own buffer management and setup code anyways, and no generic facility will replace that. I think your precondition for these changes therefore doesn't really exist. Please, look over all of the drivers that exist already in the tree and build SKBs using page frage from hardware device buffer pools. You have to show us that all of those drivers can make use of your facility. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > Sent: Wednesday, October 15, 2014 5:16 PM > To: Pan Jiafei-B37022 > Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; > linux-doc@vger.kernel.org > Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > > On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote: > > > Yes, for this matter, in order to do this we should modify the Ethernet > > drivers. For example, driver A want to driver B, C, D.. to support driver > > A's Hardware block access functions, so we have to modify driver B, C, D... > > It will be so complex for this matter. > > > > But by using my patch, I just modify a Ethernet device (I don't care > > Which driver it is used) flag in driver A in order to implement this > > Ethernet device using hardware block access functions provided by > > Driver A. > > We care a lot of all the bugs added by your patches. You have little > idea of how many of them were added. We do not want to spend days of > work explaining everything or fixing all the details for you. > > Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see > how many spots you missed. > > You cannot control how skbs are cooked before reaching your driver > ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other > drivers RX path. This would be absolutely insane. > Thanks for your comments and suggestion. In my case, I want to build skb from hardware block specified memory, I only can see two ways, one is modified net card driver replace common skb allocation function with my specially functions, another way is to hack common skb allocation function in which redirect to my specially functions. My patch is just for the second way. Except these two ways, would you please give me some advice for some other ways for my case? Thanks. > Trying to control how skb are cooked in RX path is absolutely something > drivers do, using page frags that are read-only by all the stack. > > Fix your driver to use existing infra, your suggestion is not going to > be accepted. >
> -----Original Message----- > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Wednesday, October 15, 2014 5:33 PM > To: Pan Jiafei-B37022 > Cc: davem@davemloft.net; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo- > R58472; linux-doc@vger.kernel.org > Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > > Since an skb can sit forever in an application queue, you have created > an easy way to livelock the system when enough skb's are waiting to be > read. I think there is no possible to livelock the system, because in my patch The function __netdev_alloc_skb will try to allocate hardware block buffer Firstly if dev->alloc_hw_skb is set, but it will continue allocate normal skb buffer if the hardware block buffer allocation fails. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote: > Thanks for your comments and suggestion. In my case, I want to build skb > from hardware block specified memory, I only can see two ways, one is modified > net card driver replace common skb allocation function with my specially > functions, another way is to hack common skb allocation function in which > redirect to my specially functions. My patch is just for the second way. > Except these two ways, would you please give me some advice for some other > ways for my case? Thanks I suggest you read drivers/net/ethernet numerous examples. No need to change anything in net/* or include/*, really. For a start, look at drivers/net/ethernet/intel/igb/igb_main.c Mentioning 'hack' in your mails simply should hint you are doing something very wrong. What makes you think your hardware is so special ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > Sent: Thursday, October 16, 2014 12:15 PM > To: Pan Jiafei-B37022 > Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; > linux-doc@vger.kernel.org > Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > > On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote: > > > Thanks for your comments and suggestion. In my case, I want to build skb > > from hardware block specified memory, I only can see two ways, one is modified > > net card driver replace common skb allocation function with my specially > > functions, another way is to hack common skb allocation function in which > > redirect to my specially functions. My patch is just for the second way. > > Except these two ways, would you please give me some advice for some other > > ways for my case? Thanks > > I suggest you read drivers/net/ethernet numerous examples. > > No need to change anything in net/* or include/*, really. > > For a start, look at drivers/net/ethernet/intel/igb/igb_main.c > > Mentioning 'hack' in your mails simply should hint you are doing > something very wrong. > > What makes you think your hardware is so special ? > In fact, I am developing a bridge driver, it can bridge between any other the third party net card and my own net card. My target is to let any other the third party net card can directly use my own net card specified buffer, then there will be no memory copy in the whole bridge process. By the way, I don’t see any similar between igb_main.c and my case. And also My bridge also can’t implemented with "skb frag" in order to aim at zero memory copy.
On 10/15/2014 10:15 PM, Jiafei.Pan@freescale.com wrote: >> -----Original Message----- >> From: Eric Dumazet [mailto:eric.dumazet@gmail.com] >> Sent: Thursday, October 16, 2014 12:15 PM >> To: Pan Jiafei-B37022 >> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; >> linux-doc@vger.kernel.org >> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb >> >> On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote: >> >>> Thanks for your comments and suggestion. In my case, I want to build skb >>> from hardware block specified memory, I only can see two ways, one is modified >>> net card driver replace common skb allocation function with my specially >>> functions, another way is to hack common skb allocation function in which >>> redirect to my specially functions. My patch is just for the second way. >>> Except these two ways, would you please give me some advice for some other >>> ways for my case? Thanks >> I suggest you read drivers/net/ethernet numerous examples. >> >> No need to change anything in net/* or include/*, really. >> >> For a start, look at drivers/net/ethernet/intel/igb/igb_main.c >> >> Mentioning 'hack' in your mails simply should hint you are doing >> something very wrong. >> >> What makes you think your hardware is so special ? >> > In fact, I am developing a bridge driver, it can bridge between any other the > third party net card and my own net card. My target is to let any other the > third party net card can directly use my own net card specified buffer, then > there will be no memory copy in the whole bridge process. > By the way, I don’t see any similar between igb_main.c and my case. And also > My bridge also can’t implemented with "skb frag" in order to aim at zero memory > copy. I think the part you are not getting is that is how buffers are essentially handled now. So for example in the case if igb the only part we have copied out is usually the header, or the entire frame in the case of small packets. This has to happen in order to allow for changes to the header for routing and such. Beyond that the frags that are passed are the buffers that igb is still holding onto. So effectively what the other device transmits in a bridging/routing scenario is my own net card specified buffer plus the copied/modified header. For a brief period igb used build_skb but that isn't valid on most systems as memory mapped for a device can be overwritten if the page is unmapped resulting in any changes to the header for routing/bridging purposes being invalidated. Thus we cannot use the buffers for both the skb->data header which may be changed and Rx DMA simultaneously. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Alexander Duyck [mailto:alexander.h.duyck@redhat.com] > Sent: Thursday, October 16, 2014 11:28 PM > To: Pan Jiafei-B37022; Eric Dumazet > Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; > linux-doc@vger.kernel.org > Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > > > On 10/15/2014 10:15 PM, Jiafei.Pan@freescale.com wrote: > >> -----Original Message----- > >> From: Eric Dumazet [mailto:eric.dumazet@gmail.com] > >> Sent: Thursday, October 16, 2014 12:15 PM > >> To: Pan Jiafei-B37022 > >> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; > >> linux-doc@vger.kernel.org > >> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb > >> > >> On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote: > >> > >>> Thanks for your comments and suggestion. In my case, I want to build skb > >>> from hardware block specified memory, I only can see two ways, one is > modified > >>> net card driver replace common skb allocation function with my specially > >>> functions, another way is to hack common skb allocation function in which > >>> redirect to my specially functions. My patch is just for the second way. > >>> Except these two ways, would you please give me some advice for some other > >>> ways for my case? Thanks > >> I suggest you read drivers/net/ethernet numerous examples. > >> > >> No need to change anything in net/* or include/*, really. > >> > >> For a start, look at drivers/net/ethernet/intel/igb/igb_main.c > >> > >> Mentioning 'hack' in your mails simply should hint you are doing > >> something very wrong. > >> > >> What makes you think your hardware is so special ? > >> > > In fact, I am developing a bridge driver, it can bridge between any other the > > third party net card and my own net card. My target is to let any other the > > third party net card can directly use my own net card specified buffer, then > > there will be no memory copy in the whole bridge process. > > By the way, I don’t see any similar between igb_main.c and my case. And also > > My bridge also can’t implemented with "skb frag" in order to aim at zero > memory > > copy. > > I think the part you are not getting is that is how buffers are > essentially handled now. [Pan Jiafei] Hi, Alex, thanks for your comments. I don’t confirm that you have catch my concerns. For example, I want to add igb net card into my bridge, and want to igb net driver allocate skb by using my specified memory address, but I don’t want to modify igb net driver directly, how to do this in my bridge drivers? Thanks, Jiafei. So for example in the case if igb the only > part we have copied out is usually the header, or the entire frame in > the case of small packets. This has to happen in order to allow for > changes to the header for routing and such. Beyond that the frags that > are passed are the buffers that igb is still holding onto. So > effectively what the other device transmits in a bridging/routing > scenario is my own net card specified buffer plus the copied/modified > header. > > For a brief period igb used build_skb but that isn't valid on most > systems as memory mapped for a device can be overwritten if the page is > unmapped resulting in any changes to the header for routing/bridging > purposes being invalidated. Thus we cannot use the buffers for both the > skb->data header which may be changed and Rx DMA simultaneously. > > Thanks, > > Alex
On Fri, 2014-10-17 at 02:35 +0000, Jiafei.Pan@freescale.com wrote: > [Pan Jiafei] Hi, Alex, thanks for your comments. I don’t confirm that > you have catch my concerns. For example, I want to add igb net card > into my bridge, and want to igb net driver allocate skb by using > my specified memory address, but I don’t want to modify igb net driver > directly, how to do this in my bridge drivers? This is exactly our point : We do not want to modify all drivers so that your bridge is happy with them. You'll have to make your bridge using standard infra instead. IGB has no way to know in advance that a particular frame should eventually reach your bridge anyway. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2014 07:05 AM, Eric Dumazet wrote: > On Fri, 2014-10-17 at 02:35 +0000, Jiafei.Pan@freescale.com wrote: > >> [Pan Jiafei] Hi, Alex, thanks for your comments. I don’t confirm that >> you have catch my concerns. For example, I want to add igb net card >> into my bridge, and want to igb net driver allocate skb by using >> my specified memory address, but I don’t want to modify igb net driver >> directly, how to do this in my bridge drivers? > This is exactly our point : We do not want to modify all drivers so that > your bridge is happy with them. > > You'll have to make your bridge using standard infra instead. > > IGB has no way to know in advance that a particular frame should > eventually reach your bridge anyway. Also why is it igb should use your buffers, is there any reason why your device cannot use the receive buffers that are handed off to the stack from igb? It isn't as if there is a copy in the routing or bridging path. The receive buffer is normally handed off to the stack from the ingress device, a few headers might get tweaked, and then that buffer is transmitted by the egress interface. Only in the case of a buffer being handed to multiple egress devices or user space should it ever be copied. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/networking/hw_skb.txt b/Documentation/networking/hw_skb.txt new file mode 100644 index 0000000..256f3fc --- /dev/null +++ b/Documentation/networking/hw_skb.txt @@ -0,0 +1,117 @@ +Document for using hardware managed SKB. + +1. Description + +In some platform, there are some hardware block provided +to manage buffers to improve performance. So in some case, +it is expected that the packets received by some generic +NIC should be put into such hardware managed buffers +directly, so that such buffer can be released by hardware +or by driver. + +2. Related Struct Definition + +Some general APIs are provided for generic NIC to use hardware +block managed buffers without any modification for generic NIC +drivers. + +1)Kernel Configuration Item + + "CONFIG_USE_HW_SKB" + +2)The DEVICE structure + + struct net_device { + ... + #ifdef CONFIG_USE_HW_SKB + void *hw_skb_priv; + struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length); + void (*free_hw_skb)(struct sk_buff *skb); + #endif + ... + } + +"hw_skb_priv" is private data for "alloc_hw_skb" and "free_hw_skb" functions. +"alloc_hw_skb" is for allocating skb by using hardware managed buffer. +"free_hw_skb" is for freeing skb allocated by hardware manager buffer. + +3)struct sk_buff - socket buffer + + struct sk_buff { + ... + #ifdef CONFIG_SKB_USE_HW_BP + __u32 hw_skb_state; + void *hw_skb_priv; + void (*free_hw_skb)(struct sk_buff *skb); + #endif + ... + } + + /* hw_skb_state list */ + enum hw_skb_state { + /* If set, SKB use hardware managed buffer */ + IS_HW_SKB = 1 << 0, + /* If set, and skb can be freed by software by calling + * netdev->free_hw_skb + */ + HW_SKB_SW_FREE = 1 << 1, + }; + +"hw_skb_priv" and "free_hw_skb" are the same with the field in the +struct "net_device" + +After calling "alloc_hw_skb" to allocate skb by using hardware managed +buffers, "hw_skb_priv" and "free_hw_skb" is set in SKB driver: + skb->hw_skb_priv = dev->hw_skb_priv; + skb->free_hw_skb = dev->free_hw_skb; +So that when "struct net_device *dev" is changed after the skb is allocated, +It is be confirmed that this skb can be freed by the method synced +with allocation. + +"hw_skb_state" indicates that the state of SKB. When the skb is allocated +by "alloc_hw_skb" function, the flag of "IS_HW_SKB" is set by +"__netdev_alloc_skb" function in skbuff.c when returned from "alloc_hw_skb". +But in "alloc_hw_skb", "HW_SKB_SW_FREE" must be set if the skb should be +freed by calling "free_hw_skb", otherwise, the skb will never be freed by +any driver until it is freed by hardware block. + +SKB using hardware managed buffer is not recycleable. + +3. How to use this feature + +For example, driver "A" wants the third-party NIC driver "B" to +store the data in some hardware managed buffer then send to "A". + +1) Select "CONFIG_USE_HW_SKB" to enable this feature. + +2) In driver "A", implement the function "alloc_hw_skb" and +"free_hw_skb". For example: + +struct sk_buff *alloc_hw_skb(void *priv, unsigned int length) +{ + buf = alloc_hw_buffer(); + skb = build_skb(buf, ...); + if (skb) + skb->hw_skb_state |= HW_SKB_SW_FREE; + + return skb; +} + +void free_hw_skb(struct sk_buff *skb) +{ + free_hw_buffer(skb->head); +} + +3) In driver "A", get "net_device" handle of net device case using +driver "B". + ... + net_dev_b->hw_skb_priv = priv; + net_dev_b->alloc_hw_skb = alloc_hw_skb; + net_dev_b->free_hw_skb = free_hw_skb; + ... + +4) Then, when driver "B" wants to allocate skb, "alloc_hw_skb" +will be called to allocate hardware manager skb firstly, if +failed, the normal skb will also be allocate, if successed, +the skb will be freed by calling free_hw_skb when "kfree_skb" +is called to free this skb. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 838407a..42b6158 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1689,6 +1689,11 @@ struct net_device { struct lock_class_key *qdisc_tx_busylock; int group; struct pm_qos_request pm_qos_req; +#ifdef CONFIG_USE_HW_SKB + void *hw_skb_priv; + struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length); + void (*free_hw_skb)(struct sk_buff *skb); +#endif }; #define to_net_dev(d) container_of(d, struct net_device, dev) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 776104b..d9afdeb 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -436,6 +436,16 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1, } +/* hw_skb_state list */ +enum hw_skb_state { + /* If set, SKB use hardware managed buffer */ + IS_HW_SKB = 1 << 0, + /* If set, and skb can be freed by software by calling + * netdev->free_hw_skb + */ + HW_SKB_SW_FREE = 1 << 1, +}; + /** * struct sk_buff - socket buffer * @next: Next buffer in list @@ -646,6 +656,12 @@ struct sk_buff { __u16 network_header; __u16 mac_header; +#ifdef CONFIG_USE_HW_SKB + __u32 hw_skb_state; + void *hw_skb_priv; + void (*free_hw_skb)(struct sk_buff *skb); +#endif + __u32 headers_end[0]; /* These elements must be at the end, see alloc_skb() for details. */ diff --git a/net/Kconfig b/net/Kconfig index d6b138e..346e021 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT with many clients some protection against DoS by a single (spoofed) flow that greatly exceeds average workload. +config USE_HW_SKB + bool "NIC use hardware managed buffer to build skb" + depends on INET + ---help--- + If select this, the third party drivers will use hardware managed + buffers to allocate SKB without any modification for the driver. + + Documentation on how to use this featue can be found at + <file:Documentation/networking/hw_skb.txt>. + menu "Network testing" config NET_PKTGEN diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7b3df0d..f8603e5 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -415,6 +415,19 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); +#ifdef CONFIG_USE_HW_SKB + if (dev->alloc_hw_skb) { + skb = dev->alloc_hw_skb(dev->hw_skb_priv, length); + if (likely(skb)) { + skb->hw_skb_state |= IS_HW_SKB; + skb->hw_skb_priv = dev->hw_skb_priv; + skb->free_hw_skb = dev->free_hw_skb; + skb_reserve(skb, NET_SKB_PAD); + skb->dev = dev; + return skb; + } + } +#endif if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) { void *data; @@ -432,6 +445,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE); } + if (likely(skb)) { skb_reserve(skb, NET_SKB_PAD); skb->dev = dev; @@ -483,6 +497,15 @@ static void skb_clone_fraglist(struct sk_buff *skb) static void skb_free_head(struct sk_buff *skb) { +#ifdef CONFIG_USE_HW_SKB + if (skb->hw_skb_state & IS_HW_SKB) { + if (skb->hw_skb_state & HW_SKB_SW_FREE) { + BUG_ON(!skb->free_hw_skb); + skb->free_hw_skb(skb); + } + return; + } +#endif if (skb->head_frag) put_page(virt_to_head_page(skb->head)); else @@ -506,6 +529,10 @@ static void skb_release_data(struct sk_buff *skb) * If skb buf is from userspace, we need to notify the caller * the lower device DMA has done; */ +#ifdef CONFIG_USE_HW_SKB + if (skb->hw_skb_state & IS_HW_SKB) + goto skip_callback; +#endif if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) { struct ubuf_info *uarg; @@ -514,6 +541,7 @@ static void skb_release_data(struct sk_buff *skb) uarg->callback(uarg, true); } +skip_callback: if (shinfo->frag_list) kfree_skb_list(shinfo->frag_list);
In some platform, there are some hardware block provided to manage buffers to improve performance. So in some case, it is expected that the packets received by some generic NIC should be put into such hardware managed buffers directly, so that such buffer can be released by hardware or by driver. This patch provide such general APIs for generic NIC to use hardware block managed buffers without any modification for generic NIC drivers. In this patch, the following fields are added to "net_device": void *hw_skb_priv; struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length); void (*free_hw_skb)(struct sk_buff *skb); so in order to let generic NIC driver to use hardware managed buffers, the function "alloc_hw_skb" and "free_hw_skb" provide implementation for allocate and free hardware managed buffers. "hw_skb_priv" is provided to pass some private data for these two functions. When the socket buffer is allocated by these APIs, "hw_skb_state" is provided in struct "sk_buff". this argument can indicate that the buffer is hardware managed buffer, this buffer should freed by software or by hardware. Documentation on how to use this featue can be found at <file:Documentation/networking/hw_skb.txt>. Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com> --- Documentation/networking/hw_skb.txt | 117 ++++++++++++++++++++++++++++++++++++ include/linux/netdevice.h | 5 ++ include/linux/skbuff.h | 16 +++++ net/Kconfig | 10 +++ net/core/skbuff.c | 28 +++++++++ 5 files changed, 176 insertions(+) create mode 100644 Documentation/networking/hw_skb.txt