diff mbox

net: use hardware buffer pool to allocate skb

Message ID 1413343571-33231-1-git-send-email-Jiafei.Pan@freescale.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Pan Jiafei Oct. 15, 2014, 3:26 a.m. UTC
In some platform, there are some hardware block provided
to manage buffers to improve performance. So in some case,
it is expected that the packets received by some generic
NIC should be put into such hardware managed buffers
directly, so that such buffer can be released by hardware
or by driver.

This patch provide such general APIs for generic NIC to
use hardware block managed buffers without any modification
for generic NIC drivers.
In this patch, the following fields are added to "net_device":
    void *hw_skb_priv;
    struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
    void (*free_hw_skb)(struct sk_buff *skb);
so in order to let generic NIC driver to use hardware managed
buffers, the function "alloc_hw_skb" and "free_hw_skb"
provide implementation for allocate and free hardware managed
buffers. "hw_skb_priv" is provided to pass some private data for
these two functions.

When the socket buffer is allocated by these APIs, "hw_skb_state"
is provided in struct "sk_buff". this argument can indicate
that the buffer is hardware managed buffer, this buffer
should freed by software or by hardware.

Documentation on how to use this featue can be found at
<file:Documentation/networking/hw_skb.txt>.

Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com>
---
 Documentation/networking/hw_skb.txt | 117 ++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h           |   5 ++
 include/linux/skbuff.h              |  16 +++++
 net/Kconfig                         |  10 +++
 net/core/skbuff.c                   |  28 +++++++++
 5 files changed, 176 insertions(+)
 create mode 100644 Documentation/networking/hw_skb.txt

Comments

Eric Dumazet Oct. 15, 2014, 4:15 a.m. UTC | #1
On Wed, 2014-10-15 at 11:26 +0800, Pan Jiafei wrote:
> In some platform, there are some hardware block provided
> to manage buffers to improve performance. So in some case,
> it is expected that the packets received by some generic
> NIC should be put into such hardware managed buffers
> directly, so that such buffer can be released by hardware
> or by driver.

You repeat 'some' four times.

> 
> This patch provide such general APIs for generic NIC to
> use hardware block managed buffers without any modification
> for generic NIC drivers.

...

> In this patch, the following fields are added to "net_device":
>     void *hw_skb_priv;
>     struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
>     void (*free_hw_skb)(struct sk_buff *skb);
> so in order to let generic NIC driver to use hardware managed
> buffers, the function "alloc_hw_skb" and "free_hw_skb"
> provide implementation for allocate and free hardware managed
> buffers. "hw_skb_priv" is provided to pass some private data for
> these two functions.
> 
> When the socket buffer is allocated by these APIs, "hw_skb_state"
> is provided in struct "sk_buff". this argument can indicate
> that the buffer is hardware managed buffer, this buffer
> should freed by software or by hardware.
> 
> Documentation on how to use this featue can be found at
> <file:Documentation/networking/hw_skb.txt>.
> 
> Signed-off-by: Pan Jiafei <Jiafei.Pan@freescale.com>


I am giving a strong NACK, of course.

We are not going to grow sk_buff and add yet another conditional in fast
path for a very obscure feature like that.

Memory management is not going to be done by drivers.

The way it should work is that if your hardware has specific needs, rx
and tx paths (of the driver) need to make the needed adaptation.
Not the other way.

We already have complex skb layouts, we do not need a new one. 

Take a look at how drivers can 'lock' pages already, and build skb sith
page frags. It is already there.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 15, 2014, 4:25 a.m. UTC | #2
From: Pan Jiafei <Jiafei.Pan@freescale.com>
Date: Wed, 15 Oct 2014 11:26:11 +0800

> In some platform, there are some hardware block provided
> to manage buffers to improve performance. So in some case,
> it is expected that the packets received by some generic
> NIC should be put into such hardware managed buffers
> directly, so that such buffer can be released by hardware
> or by driver.
> 
> This patch provide such general APIs for generic NIC to
> use hardware block managed buffers without any modification
> for generic NIC drivers.

Why are existing interfaces insufficent for your needs?

Several ethernet drivers already build SKBs from block
buffer pools.

They allocate pools of pages which the hardware divides into various
powers of 2, then the RX descriptor says what pieces of which pools
were used to hold the data for a packet, and then the SKB is
constructed with page frags pointing to those locations.

It's very cheap, inexpensive, and existing APIs are considered to
cover all use cases.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 15, 2014, 4:26 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 14 Oct 2014 21:15:03 -0700

> Take a look at how drivers can 'lock' pages already, and build skb
> sith page frags. It is already there.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp Oct. 15, 2014, 4:59 a.m. UTC | #4
On 15.10.2014 05:26, Pan Jiafei wrote:
> In some platform, there are some hardware block provided
> to manage buffers to improve performance.

(..)

> diff --git a/net/Kconfig b/net/Kconfig
> index d6b138e..346e021 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT
>   	  with many clients some protection against DoS by a single (spoofed)
>   	  flow that greatly exceeds average workload.
>
> +config USE_HW_SKB
> +	bool "NIC use hardware managed buffer to build skb"
> +	depends on INET

The feature seems to be valid for network devices in general.
Why did you make it depending on INET ??

Regards,
Oliver

> +	---help---
> +	  If select this, the third party drivers will use hardware managed
> +	  buffers to allocate SKB without any modification for the driver.
> +
> +	  Documentation on how to use this featue can be found at
> +	  <file:Documentation/networking/hw_skb.txt>.
> +
>   menu "Network testing"
>
>   config NET_PKTGEN

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pan Jiafei Oct. 15, 2014, 5:34 a.m. UTC | #5
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, October 15, 2014 12:25 PM
> To: Pan Jiafei-B37022
> Cc: jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472; linux-
> doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> From: Pan Jiafei <Jiafei.Pan@freescale.com>
> Date: Wed, 15 Oct 2014 11:26:11 +0800
> 
> > In some platform, there are some hardware block provided
> > to manage buffers to improve performance. So in some case,
> > it is expected that the packets received by some generic
> > NIC should be put into such hardware managed buffers
> > directly, so that such buffer can be released by hardware
> > or by driver.
> >
> > This patch provide such general APIs for generic NIC to
> > use hardware block managed buffers without any modification
> > for generic NIC drivers.
> 
> Why are existing interfaces insufficent for your needs?
> 
> Several ethernet drivers already build SKBs from block
> buffer pools.
> 
Yes, for this matter, in order to do this we should modify the Ethernet
drivers. For example, driver A want to driver B, C, D.. to support driver
A's Hardware block access functions, so we have to modify driver B, C, D...
It will be so complex for this matter.

But by using my patch, I just modify a Ethernet device (I don't care
Which driver it is used) flag in driver A in order to implement this
Ethernet device using hardware block access functions provided by
Driver A.

> They allocate pools of pages which the hardware divides into various
> powers of 2, then the RX descriptor says what pieces of which pools
> were used to hold the data for a packet, and then the SKB is
> constructed with page frags pointing to those locations.
> 
> It's very cheap, inexpensive, and existing APIs are considered to
> cover all use cases.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pan Jiafei Oct. 15, 2014, 5:47 a.m. UTC | #6
> -----Original Message-----
> From: Oliver Hartkopp [mailto:socketcan@hartkopp.net]
> Sent: Wednesday, October 15, 2014 12:59 PM
> To: Pan Jiafei-B37022; davem@davemloft.net; jkosina@suse.cz
> Cc: netdev@vger.kernel.org; Li Yang-Leo-R58472; linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> On 15.10.2014 05:26, Pan Jiafei wrote:
> > In some platform, there are some hardware block provided
> > to manage buffers to improve performance.
> 
> (..)
[Pan Jiafei] I want to build a general patch to build skb
by using hardware buffer manager block.
> 
> > diff --git a/net/Kconfig b/net/Kconfig
> > index d6b138e..346e021 100644
> > --- a/net/Kconfig
> > +++ b/net/Kconfig
> > @@ -291,6 +291,16 @@ config NET_FLOW_LIMIT
> >   	  with many clients some protection against DoS by a single (spoofed)
> >   	  flow that greatly exceeds average workload.
> >
> > +config USE_HW_SKB
> > +	bool "NIC use hardware managed buffer to build skb"
> > +	depends on INET
> 
> The feature seems to be valid for network devices in general.
> Why did you make it depending on INET ??
> 
> Regards,
> Oliver
> 
[Pan Jiafei] Yes, INET dependency should be removed, thanks.
> > +	---help---
> > +	  If select this, the third party drivers will use hardware managed
> > +	  buffers to allocate SKB without any modification for the driver.
> > +
> > +	  Documentation on how to use this featue can be found at
> > +	  <file:Documentation/networking/hw_skb.txt>.
> > +
> >   menu "Network testing"
> >
> >   config NET_PKTGEN

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Oct. 15, 2014, 8:57 a.m. UTC | #7
From: Pan Jiafei
> In some platform, there are some hardware block provided
> to manage buffers to improve performance. So in some case,
> it is expected that the packets received by some generic
> NIC should be put into such hardware managed buffers
> directly, so that such buffer can be released by hardware
> or by driver.

This looks like some strange variant of 'buffer loaning'.
In general it just doesn't work due to the limited number
of such buffers - they soon all become queued waiting for
applications to read from sockets.

It also isn't at all clear how you expect a 'generic NIC'
to actually allocate buffers from your 'special area'.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 15, 2014, 9:15 a.m. UTC | #8
On Wed, 2014-10-15 at 05:43 +0000, Jiafei.Pan@freescale.com wrote:

> For my case, there are some shortcoming to use page frags. Firstly, I have to
> modify each Ethernet drivers to support it especially I don’t which vendor's
> driver the customer will use. Secondly, it is not enough only 
> build skb by 'lock' pages, the buffer address comes from hardware block and
> should be aligned to hardware block.

So you align to hardware block. What is the problem with this ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 15, 2014, 9:15 a.m. UTC | #9
On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote:
 
> Yes, for this matter, in order to do this we should modify the Ethernet
> drivers. For example, driver A want to driver B, C, D.. to support driver
> A's Hardware block access functions, so we have to modify driver B, C, D...
> It will be so complex for this matter.
> 
> But by using my patch, I just modify a Ethernet device (I don't care
> Which driver it is used) flag in driver A in order to implement this
> Ethernet device using hardware block access functions provided by
> Driver A.

We care a lot of all the bugs added by your patches. You have little
idea of how many of them were added. We do not want to spend days of
work explaining everything or fixing all the details for you.

Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see
how many spots you missed.

You cannot control how skbs are cooked before reaching your driver
ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other
drivers RX path. This would be absolutely insane.

Trying to control how skb are cooked in RX path is absolutely something
drivers do, using page frags that are read-only by all the stack.

Fix your driver to use existing infra, your suggestion is not going to
be accepted.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger Oct. 15, 2014, 9:33 a.m. UTC | #10
Since an skb can sit forever in an application queue, you have created
an easy way to livelock the system when enough skb's are waiting to be
read.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 15, 2014, 3:51 p.m. UTC | #11
From: "Jiafei.Pan@freescale.com" <Jiafei.Pan@freescale.com>
Date: Wed, 15 Oct 2014 05:34:24 +0000

> Yes, for this matter, in order to do this we should modify the Ethernet
> drivers. For example, driver A want to driver B, C, D.. to support driver
> A's Hardware block access functions, so we have to modify driver B, C, D...
> It will be so complex for this matter.

Experience says otherwise.  It's three or four lines of code to attach
a page to an SKB frag.

The driver needs it's own buffer management and setup code anyways,
and no generic facility will replace that.

I think your precondition for these changes therefore doesn't really
exist.

Please, look over all of the drivers that exist already in the tree
and build SKBs using page frage from hardware device buffer pools.

You have to show us that all of those drivers can make use of your
facility.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pan Jiafei Oct. 16, 2014, 2:17 a.m. UTC | #12
> -----Original Message-----

> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]

> Sent: Wednesday, October 15, 2014 5:16 PM

> To: Pan Jiafei-B37022

> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;

> linux-doc@vger.kernel.org

> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb

> 

> On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote:

> 

> > Yes, for this matter, in order to do this we should modify the Ethernet

> > drivers. For example, driver A want to driver B, C, D.. to support driver

> > A's Hardware block access functions, so we have to modify driver B, C, D...

> > It will be so complex for this matter.

> >

> > But by using my patch, I just modify a Ethernet device (I don't care

> > Which driver it is used) flag in driver A in order to implement this

> > Ethernet device using hardware block access functions provided by

> > Driver A.

> 

> We care a lot of all the bugs added by your patches. You have little

> idea of how many of them were added. We do not want to spend days of

> work explaining everything or fixing all the details for you.

> 

> Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see

> how many spots you missed.

> 

> You cannot control how skbs are cooked before reaching your driver

> ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other

> drivers RX path. This would be absolutely insane.

> 


Thanks for your comments and suggestion. In my case, I want to build skb
from hardware block specified memory, I only can see two ways, one is modified
net card driver replace common skb allocation function with my specially
functions, another way is to hack common skb allocation function in which
redirect to my specially functions. My patch is just for the second way.
Except these two ways, would you please give me some advice for some other
ways for my case? Thanks.

> Trying to control how skb are cooked in RX path is absolutely something

> drivers do, using page frags that are read-only by all the stack.

> 

> Fix your driver to use existing infra, your suggestion is not going to

> be accepted.

>
Pan Jiafei Oct. 16, 2014, 2:30 a.m. UTC | #13
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, October 15, 2014 5:33 PM
> To: Pan Jiafei-B37022
> Cc: davem@davemloft.net; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-
> R58472; linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> Since an skb can sit forever in an application queue, you have created
> an easy way to livelock the system when enough skb's are waiting to be
> read.

I think there is no possible to livelock the system, because in my patch
The function __netdev_alloc_skb will try to allocate hardware block buffer
Firstly if dev->alloc_hw_skb is set, but it will continue allocate normal
skb buffer if the hardware block buffer allocation fails.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 16, 2014, 4:15 a.m. UTC | #14
On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote:

> Thanks for your comments and suggestion. In my case, I want to build skb
> from hardware block specified memory, I only can see two ways, one is modified
> net card driver replace common skb allocation function with my specially
> functions, another way is to hack common skb allocation function in which
> redirect to my specially functions. My patch is just for the second way.
> Except these two ways, would you please give me some advice for some other
> ways for my case? Thanks

I suggest you read drivers/net/ethernet numerous examples.

No need to change anything  in net/* or include/*, really.

For a start, look at drivers/net/ethernet/intel/igb/igb_main.c

Mentioning 'hack' in your mails simply should hint you are doing
something very wrong.

What makes you think your hardware is so special ?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pan Jiafei Oct. 16, 2014, 5:15 a.m. UTC | #15
> -----Original Message-----

> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]

> Sent: Thursday, October 16, 2014 12:15 PM

> To: Pan Jiafei-B37022

> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;

> linux-doc@vger.kernel.org

> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb

> 

> On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote:

> 

> > Thanks for your comments and suggestion. In my case, I want to build skb

> > from hardware block specified memory, I only can see two ways, one is modified

> > net card driver replace common skb allocation function with my specially

> > functions, another way is to hack common skb allocation function in which

> > redirect to my specially functions. My patch is just for the second way.

> > Except these two ways, would you please give me some advice for some other

> > ways for my case? Thanks

> 

> I suggest you read drivers/net/ethernet numerous examples.

> 

> No need to change anything  in net/* or include/*, really.

> 

> For a start, look at drivers/net/ethernet/intel/igb/igb_main.c

> 

> Mentioning 'hack' in your mails simply should hint you are doing

> something very wrong.

> 

> What makes you think your hardware is so special ?

> 

In fact, I am developing a bridge driver, it can bridge between any other the
third party net card and my own net card. My target is to let any other the
third party net card can directly use my own net card specified buffer, then
there will be no memory copy in the whole bridge process.
By the way, I don’t see any similar between igb_main.c and my case. And also
My bridge also can’t implemented with "skb frag" in order to aim at zero memory
copy.
Alexander Duyck Oct. 16, 2014, 3:28 p.m. UTC | #16
On 10/15/2014 10:15 PM, Jiafei.Pan@freescale.com wrote:
>> -----Original Message-----
>> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>> Sent: Thursday, October 16, 2014 12:15 PM
>> To: Pan Jiafei-B37022
>> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;
>> linux-doc@vger.kernel.org
>> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
>>
>> On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote:
>>
>>> Thanks for your comments and suggestion. In my case, I want to build skb
>>> from hardware block specified memory, I only can see two ways, one is modified
>>> net card driver replace common skb allocation function with my specially
>>> functions, another way is to hack common skb allocation function in which
>>> redirect to my specially functions. My patch is just for the second way.
>>> Except these two ways, would you please give me some advice for some other
>>> ways for my case? Thanks
>> I suggest you read drivers/net/ethernet numerous examples.
>>
>> No need to change anything  in net/* or include/*, really.
>>
>> For a start, look at drivers/net/ethernet/intel/igb/igb_main.c
>>
>> Mentioning 'hack' in your mails simply should hint you are doing
>> something very wrong.
>>
>> What makes you think your hardware is so special ?
>>
> In fact, I am developing a bridge driver, it can bridge between any other the
> third party net card and my own net card. My target is to let any other the
> third party net card can directly use my own net card specified buffer, then
> there will be no memory copy in the whole bridge process.
> By the way, I don’t see any similar between igb_main.c and my case. And also
> My bridge also can’t implemented with "skb frag" in order to aim at zero memory
> copy.

I think the part you are not getting is that is how buffers are 
essentially handled now.  So for example in the case if igb the only 
part we have copied out is usually the header, or the entire frame in 
the case of small packets.  This has to happen in order to allow for 
changes to the header for routing and such.  Beyond that the frags that 
are passed are the buffers that igb is still holding onto.  So 
effectively what the other device transmits in a bridging/routing 
scenario is my own net card specified buffer plus the copied/modified 
header.

For a brief period igb used build_skb but that isn't valid on most 
systems as memory mapped for a device can be overwritten if the page is 
unmapped resulting in any changes to the header for routing/bridging 
purposes being invalidated.  Thus we cannot use the buffers for both the 
skb->data header which may be changed and Rx DMA simultaneously.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pan Jiafei Oct. 17, 2014, 2:35 a.m. UTC | #17
> -----Original Message-----

> From: Alexander Duyck [mailto:alexander.h.duyck@redhat.com]

> Sent: Thursday, October 16, 2014 11:28 PM

> To: Pan Jiafei-B37022; Eric Dumazet

> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;

> linux-doc@vger.kernel.org

> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb

> 

> 

> On 10/15/2014 10:15 PM, Jiafei.Pan@freescale.com wrote:

> >> -----Original Message-----

> >> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]

> >> Sent: Thursday, October 16, 2014 12:15 PM

> >> To: Pan Jiafei-B37022

> >> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;

> >> linux-doc@vger.kernel.org

> >> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb

> >>

> >> On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote:

> >>

> >>> Thanks for your comments and suggestion. In my case, I want to build skb

> >>> from hardware block specified memory, I only can see two ways, one is

> modified

> >>> net card driver replace common skb allocation function with my specially

> >>> functions, another way is to hack common skb allocation function in which

> >>> redirect to my specially functions. My patch is just for the second way.

> >>> Except these two ways, would you please give me some advice for some other

> >>> ways for my case? Thanks

> >> I suggest you read drivers/net/ethernet numerous examples.

> >>

> >> No need to change anything  in net/* or include/*, really.

> >>

> >> For a start, look at drivers/net/ethernet/intel/igb/igb_main.c

> >>

> >> Mentioning 'hack' in your mails simply should hint you are doing

> >> something very wrong.

> >>

> >> What makes you think your hardware is so special ?

> >>

> > In fact, I am developing a bridge driver, it can bridge between any other the

> > third party net card and my own net card. My target is to let any other the

> > third party net card can directly use my own net card specified buffer, then

> > there will be no memory copy in the whole bridge process.

> > By the way, I don’t see any similar between igb_main.c and my case. And also

> > My bridge also can’t implemented with "skb frag" in order to aim at zero

> memory

> > copy.

> 

> I think the part you are not getting is that is how buffers are

> essentially handled now.  


[Pan Jiafei] Hi, Alex, thanks for your comments. I don’t confirm that
you have catch my concerns. For example, I want to add igb net card 
into my bridge, and want to igb net driver allocate skb by using
my specified memory address, but I don’t want to modify igb net driver
directly, how to do this in my bridge drivers?

Thanks,
Jiafei.

So for example in the case if igb the only
> part we have copied out is usually the header, or the entire frame in

> the case of small packets.  This has to happen in order to allow for

> changes to the header for routing and such.  Beyond that the frags that

> are passed are the buffers that igb is still holding onto.  So

> effectively what the other device transmits in a bridging/routing

> scenario is my own net card specified buffer plus the copied/modified

> header.

> 

> For a brief period igb used build_skb but that isn't valid on most

> systems as memory mapped for a device can be overwritten if the page is

> unmapped resulting in any changes to the header for routing/bridging

> purposes being invalidated.  Thus we cannot use the buffers for both the

> skb->data header which may be changed and Rx DMA simultaneously.

> 

> Thanks,

> 

> Alex
Eric Dumazet Oct. 17, 2014, 2:05 p.m. UTC | #18
On Fri, 2014-10-17 at 02:35 +0000, Jiafei.Pan@freescale.com wrote:

> [Pan Jiafei] Hi, Alex, thanks for your comments. I don’t confirm that
> you have catch my concerns. For example, I want to add igb net card 
> into my bridge, and want to igb net driver allocate skb by using
> my specified memory address, but I don’t want to modify igb net driver
> directly, how to do this in my bridge drivers?

This is exactly our point : We do not want to modify all drivers so that
your bridge is happy with them.

You'll have to make your bridge using standard infra instead.

IGB has no way to know in advance that a particular frame should
eventually reach your bridge anyway.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander H Duyck Oct. 17, 2014, 2:12 p.m. UTC | #19
On 10/17/2014 07:05 AM, Eric Dumazet wrote:
> On Fri, 2014-10-17 at 02:35 +0000, Jiafei.Pan@freescale.com wrote:
>
>> [Pan Jiafei] Hi, Alex, thanks for your comments. I don’t confirm that
>> you have catch my concerns. For example, I want to add igb net card 
>> into my bridge, and want to igb net driver allocate skb by using
>> my specified memory address, but I don’t want to modify igb net driver
>> directly, how to do this in my bridge drivers?
> This is exactly our point : We do not want to modify all drivers so that
> your bridge is happy with them.
>
> You'll have to make your bridge using standard infra instead.
>
> IGB has no way to know in advance that a particular frame should
> eventually reach your bridge anyway.

Also why is it igb should use your buffers, is there any reason why your
device cannot use the receive buffers that are handed off to the stack
from igb?  It isn't as if there is a copy in the routing or bridging
path.  The receive buffer is normally handed off to the stack from the
ingress device, a few headers might get tweaked, and then that buffer is
transmitted by the egress interface.  Only in the case of a buffer being
handed to multiple egress devices or user space should it ever be copied.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/networking/hw_skb.txt b/Documentation/networking/hw_skb.txt
new file mode 100644
index 0000000..256f3fc
--- /dev/null
+++ b/Documentation/networking/hw_skb.txt
@@ -0,0 +1,117 @@ 
+Document for using hardware managed SKB.
+
+1. Description
+
+In some platform, there are some hardware block provided
+to manage buffers to improve performance. So in some case,
+it is expected that the packets received by some generic
+NIC should be put into such hardware managed buffers
+directly, so that such buffer can be released by hardware
+or by driver.
+
+2. Related Struct Definition
+
+Some general APIs are provided for generic NIC to use hardware
+block managed buffers without any modification for generic NIC
+drivers.
+
+1)Kernel Configuration Item
+
+	"CONFIG_USE_HW_SKB"
+
+2)The DEVICE structure
+
+	struct net_device {
+		...
+	#ifdef CONFIG_USE_HW_SKB
+		void *hw_skb_priv;
+		struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
+		void (*free_hw_skb)(struct sk_buff *skb);
+	#endif
+		...
+	}
+
+"hw_skb_priv" is private data for "alloc_hw_skb" and "free_hw_skb" functions.
+"alloc_hw_skb" is for allocating skb by using hardware managed buffer.
+"free_hw_skb" is for freeing skb allocated by hardware manager buffer.
+
+3)struct sk_buff - socket buffer
+
+	struct sk_buff {
+		...
+	#ifdef CONFIG_SKB_USE_HW_BP
+		__u32 			hw_skb_state;
+		void			*hw_skb_priv;
+		void 			(*free_hw_skb)(struct sk_buff *skb);
+	#endif
+		...
+	}
+
+	/* hw_skb_state list */
+	enum hw_skb_state {
+		/* If set, SKB use hardware managed buffer */
+		IS_HW_SKB = 1 << 0,
+		/* If set, and skb can be freed by software by calling
+		 * netdev->free_hw_skb
+		 */
+		HW_SKB_SW_FREE = 1 << 1,
+	};
+
+"hw_skb_priv" and "free_hw_skb" are the same with the field in the
+struct "net_device"
+
+After calling "alloc_hw_skb" to allocate skb by using hardware managed
+buffers, "hw_skb_priv" and "free_hw_skb" is set in SKB driver:
+	skb->hw_skb_priv = dev->hw_skb_priv;
+	skb->free_hw_skb = dev->free_hw_skb;
+So that when "struct net_device	*dev" is changed after the skb is allocated,
+It is be confirmed that this skb can be freed by the method synced
+with allocation.
+
+"hw_skb_state" indicates that the state of SKB. When the skb is allocated
+by "alloc_hw_skb" function, the flag of "IS_HW_SKB" is set by
+"__netdev_alloc_skb" function in skbuff.c when returned from "alloc_hw_skb".
+But in "alloc_hw_skb", "HW_SKB_SW_FREE" must be set if the skb should be
+freed by calling "free_hw_skb", otherwise, the skb will never be freed by
+any driver until it is freed by hardware block.
+
+SKB using hardware managed buffer is not recycleable.
+
+3. How to use this feature
+
+For example, driver "A" wants the third-party NIC driver "B" to
+store the data in some hardware managed buffer then send to "A".
+
+1) Select "CONFIG_USE_HW_SKB" to enable this feature.
+
+2) In driver "A", implement the function "alloc_hw_skb" and
+"free_hw_skb". For example:
+
+struct sk_buff *alloc_hw_skb(void *priv, unsigned int length)
+{
+	buf = alloc_hw_buffer();
+	skb = build_skb(buf, ...);
+	if (skb)
+		skb->hw_skb_state |= HW_SKB_SW_FREE;
+
+	return skb;
+}
+
+void free_hw_skb(struct sk_buff *skb)
+{
+	free_hw_buffer(skb->head);
+}
+
+3) In driver "A", get "net_device" handle of net device case using
+driver "B".
+	...
+	net_dev_b->hw_skb_priv = priv;
+	net_dev_b->alloc_hw_skb = alloc_hw_skb;
+	net_dev_b->free_hw_skb = free_hw_skb;
+	...
+
+4) Then, when driver "B" wants to allocate skb, "alloc_hw_skb"
+will be called to allocate hardware manager skb firstly, if
+failed, the normal skb will also be allocate, if successed,
+the skb will be freed by calling free_hw_skb when "kfree_skb"
+is called to free this skb.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 838407a..42b6158 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1689,6 +1689,11 @@  struct net_device {
 	struct lock_class_key *qdisc_tx_busylock;
 	int group;
 	struct pm_qos_request	pm_qos_req;
+#ifdef CONFIG_USE_HW_SKB
+	void *hw_skb_priv;
+	struct sk_buff *(*alloc_hw_skb)(void *hw_skb_priv, unsigned int length);
+	void (*free_hw_skb)(struct sk_buff *skb);
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 776104b..d9afdeb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -436,6 +436,16 @@  static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
 }
 
 
+/* hw_skb_state list */
+enum hw_skb_state {
+	/* If set, SKB use hardware managed buffer */
+	IS_HW_SKB = 1 << 0,
+	/* If set, and skb can be freed by software by calling
+	 * netdev->free_hw_skb
+	 */
+	HW_SKB_SW_FREE = 1 << 1,
+};
+
 /** 
  *	struct sk_buff - socket buffer
  *	@next: Next buffer in list
@@ -646,6 +656,12 @@  struct sk_buff {
 	__u16			network_header;
 	__u16			mac_header;
 
+#ifdef CONFIG_USE_HW_SKB
+	__u32			hw_skb_state;
+	void			*hw_skb_priv;
+	void			(*free_hw_skb)(struct sk_buff *skb);
+#endif
+
 	__u32			headers_end[0];
 
 	/* These elements must be at the end, see alloc_skb() for details.  */
diff --git a/net/Kconfig b/net/Kconfig
index d6b138e..346e021 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -291,6 +291,16 @@  config NET_FLOW_LIMIT
 	  with many clients some protection against DoS by a single (spoofed)
 	  flow that greatly exceeds average workload.
 
+config USE_HW_SKB
+	bool "NIC use hardware managed buffer to build skb"
+	depends on INET
+	---help---
+	  If select this, the third party drivers will use hardware managed
+	  buffers to allocate SKB without any modification for the driver.
+
+	  Documentation on how to use this featue can be found at
+	  <file:Documentation/networking/hw_skb.txt>.
+
 menu "Network testing"
 
 config NET_PKTGEN
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b3df0d..f8603e5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -415,6 +415,19 @@  struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
+#ifdef CONFIG_USE_HW_SKB
+	if (dev->alloc_hw_skb) {
+		skb = dev->alloc_hw_skb(dev->hw_skb_priv, length);
+		if (likely(skb)) {
+			skb->hw_skb_state |= IS_HW_SKB;
+			skb->hw_skb_priv = dev->hw_skb_priv;
+			skb->free_hw_skb = dev->free_hw_skb;
+			skb_reserve(skb, NET_SKB_PAD);
+			skb->dev = dev;
+			return skb;
+		}
+	}
+#endif
 	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
 		void *data;
 
@@ -432,6 +445,7 @@  struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
 				  SKB_ALLOC_RX, NUMA_NO_NODE);
 	}
+
 	if (likely(skb)) {
 		skb_reserve(skb, NET_SKB_PAD);
 		skb->dev = dev;
@@ -483,6 +497,15 @@  static void skb_clone_fraglist(struct sk_buff *skb)
 
 static void skb_free_head(struct sk_buff *skb)
 {
+#ifdef CONFIG_USE_HW_SKB
+	if (skb->hw_skb_state & IS_HW_SKB) {
+		if (skb->hw_skb_state & HW_SKB_SW_FREE) {
+			BUG_ON(!skb->free_hw_skb);
+			skb->free_hw_skb(skb);
+		}
+		return;
+	}
+#endif
 	if (skb->head_frag)
 		put_page(virt_to_head_page(skb->head));
 	else
@@ -506,6 +529,10 @@  static void skb_release_data(struct sk_buff *skb)
 	 * If skb buf is from userspace, we need to notify the caller
 	 * the lower device DMA has done;
 	 */
+#ifdef CONFIG_USE_HW_SKB
+	if (skb->hw_skb_state & IS_HW_SKB)
+		goto skip_callback;
+#endif
 	if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) {
 		struct ubuf_info *uarg;
 
@@ -514,6 +541,7 @@  static void skb_release_data(struct sk_buff *skb)
 			uarg->callback(uarg, true);
 	}
 
+skip_callback:
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);