diff mbox

[0/4] skb paged fragment destructors

Message ID 1324550017.7877.77.camel@zakaz.uk.xensource.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Ian Campbell Dec. 22, 2011, 10:33 a.m. UTC
On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 21 Dec 2011 15:02:18 +0100
> 
> > No idea on this +2 point.
> 
> I think I know, and I believe I instructed Alexey Kuznetsov to do
> this.
> 
> When sendfile() is performed, we might start the SKB with the last few
> bytes of one page, and end the SKB with the first few bytes of another
> page.
> 
> In order to fit a full 64K frame into an SKB in this situation we have
> to accomodate this case.

Thanks David, that makes sense.

However I think you only actually need 1 extra page for that. If the
data in frag[0] starts at $offset then frag[16] will need to have
$offset bytes in it. e.g.
	4096-$offset + 4096*15 + $offset = 65536
which == 17 pages rather than 18.

The following documents the status quo but I could update to switch to +
1 instead if there are no flaws in the above logic...

Ian.

From 0a327232d6df809c4cf294edc8d02f6f88dd5678 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Thu, 22 Dec 2011 10:07:19 +0000
Subject: [PATCH] net: document reason for 2 additional pages in MAX_SKB_FRAGS

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 include/linux/skbuff.h |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

Comments

David Miller Dec. 22, 2011, 6:20 p.m. UTC | #1
From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Thu, 22 Dec 2011 10:33:36 +0000

> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Wed, 21 Dec 2011 15:02:18 +0100
>> 
>> > No idea on this +2 point.
>> 
>> I think I know, and I believe I instructed Alexey Kuznetsov to do
>> this.
>> 
>> When sendfile() is performed, we might start the SKB with the last few
>> bytes of one page, and end the SKB with the first few bytes of another
>> page.
>> 
>> In order to fit a full 64K frame into an SKB in this situation we have
>> to accomodate this case.
> 
> Thanks David, that makes sense.
> 
> However I think you only actually need 1 extra page for that. If the
> data in frag[0] starts at $offset then frag[16] will need to have
> $offset bytes in it. e.g.
> 	4096-$offset + 4096*15 + $offset = 65536
> which == 17 pages rather than 18.
> 
> The following documents the status quo but I could update to switch to +
> 1 instead if there are no flaws in the above logic...

Indeed, you're right.  Please change this to 1 and document it, and we
can put that change into net-next, thanks a lot!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
=?ISO-8859-2?Q?Micha=B3_Miros=B3aw?= Dec. 22, 2011, 6:34 p.m. UTC | #2
2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>:
> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Wed, 21 Dec 2011 15:02:18 +0100
>> > No idea on this +2 point.
>> I think I know, and I believe I instructed Alexey Kuznetsov to do
>> this.
>>
>> When sendfile() is performed, we might start the SKB with the last few
>> bytes of one page, and end the SKB with the first few bytes of another
>> page.
>>
>> In order to fit a full 64K frame into an SKB in this situation we have
>> to accomodate this case.
> Thanks David, that makes sense.
>
> However I think you only actually need 1 extra page for that. If the
> data in frag[0] starts at $offset then frag[16] will need to have
> $offset bytes in it. e.g.
>        4096-$offset + 4096*15 + $offset = 65536
> which == 17 pages rather than 18.
>
> The following documents the status quo but I could update to switch to +
> 1 instead if there are no flaws in the above logic...

Since max IP datagram is 64K-1, adding ethernet and possibly VLAN
headers makes the max size slightly above 64K and then you have
64K/PAGE_SIZE+2 pages appear in worst case.

Best Regards,
Michał Mirosław
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 22, 2011, 6:43 p.m. UTC | #3
From: Michał Mirosław <mirqus@gmail.com>

Date: Thu, 22 Dec 2011 19:34:21 +0100

> 2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>:

>> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote:

>>> From: Eric Dumazet <eric.dumazet@gmail.com>

>>> Date: Wed, 21 Dec 2011 15:02:18 +0100

>>> > No idea on this +2 point.

>>> I think I know, and I believe I instructed Alexey Kuznetsov to do

>>> this.

>>>

>>> When sendfile() is performed, we might start the SKB with the last few

>>> bytes of one page, and end the SKB with the first few bytes of another

>>> page.

>>>

>>> In order to fit a full 64K frame into an SKB in this situation we have

>>> to accomodate this case.

>> Thanks David, that makes sense.

>>

>> However I think you only actually need 1 extra page for that. If the

>> data in frag[0] starts at $offset then frag[16] will need to have

>> $offset bytes in it. e.g.

>>        4096-$offset + 4096*15 + $offset = 65536

>> which == 17 pages rather than 18.

>>

>> The following documents the status quo but I could update to switch to +

>> 1 instead if there are no flaws in the above logic...

> 

> Since max IP datagram is 64K-1, adding ethernet and possibly VLAN

> headers makes the max size slightly above 64K and then you have

> 64K/PAGE_SIZE+2 pages appear in worst case.


Headers go into the linear area, so they are not relevant for these
calculations.
=?ISO-8859-2?Q?Micha=B3_Miros=B3aw?= Dec. 22, 2011, 7:29 p.m. UTC | #4
2011/12/22 David Miller <davem@davemloft.net>:
> From: Michał Mirosław <mirqus@gmail.com>
> Date: Thu, 22 Dec 2011 19:34:21 +0100
>
>> 2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>:
>>> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote:
>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>> Date: Wed, 21 Dec 2011 15:02:18 +0100
>>>> > No idea on this +2 point.
>>>> I think I know, and I believe I instructed Alexey Kuznetsov to do
>>>> this.
>>>>
>>>> When sendfile() is performed, we might start the SKB with the last few
>>>> bytes of one page, and end the SKB with the first few bytes of another
>>>> page.
>>>>
>>>> In order to fit a full 64K frame into an SKB in this situation we have
>>>> to accomodate this case.
>>> Thanks David, that makes sense.
>>>
>>> However I think you only actually need 1 extra page for that. If the
>>> data in frag[0] starts at $offset then frag[16] will need to have
>>> $offset bytes in it. e.g.
>>>        4096-$offset + 4096*15 + $offset = 65536
>>> which == 17 pages rather than 18.
>>>
>>> The following documents the status quo but I could update to switch to +
>>> 1 instead if there are no flaws in the above logic...
>>
>> Since max IP datagram is 64K-1, adding ethernet and possibly VLAN
>> headers makes the max size slightly above 64K and then you have
>> 64K/PAGE_SIZE+2 pages appear in worst case.
>
> Headers go into the linear area, so they are not relevant for these
> calculations.

Does this hold for LRO'ed packets and packets sent via PF_PACKET socket?

Best Regards,
Michał Mirosław
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
=?ISO-8859-2?Q?Micha=B3_Miros=B3aw?= Dec. 23, 2011, 6:10 p.m. UTC | #5
2011/12/22 David Miller <davem@davemloft.net>:
> From: Michał Mirosław <mirqus@gmail.com>
> Date: Thu, 22 Dec 2011 20:29:30 +0100
>
>> 2011/12/22 David Miller <davem@davemloft.net>:
>>> From: Michał Mirosław <mirqus@gmail.com>
>>> Date: Thu, 22 Dec 2011 19:34:21 +0100
>>>
>>>> 2011/12/22 Ian Campbell <Ian.Campbell@citrix.com>:
>>>>> On Wed, 2011-12-21 at 19:28 +0000, David Miller wrote:
>>>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>>>> Date: Wed, 21 Dec 2011 15:02:18 +0100
>>>>>> > No idea on this +2 point.
>>>>>> I think I know, and I believe I instructed Alexey Kuznetsov to do
>>>>>> this.
>>>>>>
>>>>>> When sendfile() is performed, we might start the SKB with the last few
>>>>>> bytes of one page, and end the SKB with the first few bytes of another
>>>>>> page.
>>>>>>
>>>>>> In order to fit a full 64K frame into an SKB in this situation we have
>>>>>> to accomodate this case.
>>>>> Thanks David, that makes sense.
>>>>>
>>>>> However I think you only actually need 1 extra page for that. If the
>>>>> data in frag[0] starts at $offset then frag[16] will need to have
>>>>> $offset bytes in it. e.g.
>>>>>        4096-$offset + 4096*15 + $offset = 65536
>>>>> which == 17 pages rather than 18.
>>>>>
>>>>> The following documents the status quo but I could update to switch to +
>>>>> 1 instead if there are no flaws in the above logic...
>>>>
>>>> Since max IP datagram is 64K-1, adding ethernet and possibly VLAN
>>>> headers makes the max size slightly above 64K and then you have
>>>> 64K/PAGE_SIZE+2 pages appear in worst case.
>>>
>>> Headers go into the linear area, so they are not relevant for these
>>> calculations.
>>
>> Does this hold for LRO'ed packets and packets sent via PF_PACKET socket?
>
> LRO is for receive, not packets we build on transmit, and in any event
> have their headers also pulled into the linear area before the stack
> sees it.

Yes, but the drivers seem to first fill the data to frags[] and only
then move the packets header (I browsed myri10ge).

> For PF_PACKET, in the packet_snd() case it uses a linear SKB and in
> the tpacket_snd() case it uses a linear SKB as well.

I saw skb_fill_page_desc() in af_packet.c (called by
tpacket_fill_skb() which is called by tpacket_snd()), that's why I'm
asking.

Best Regards,
Michał Mirosław
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fe86488..1c894a8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -128,8 +128,12 @@  struct sk_buff_head {
 
 struct sk_buff;
 
-/* To allow 64K frame to be packed as single skb without frag_list. Since
- * GRO uses frags we allocate at least 16 regardless of page size.
+/* To allow 64K frame to be packed as single skb without frag_list we
+ * require 64K/PAGE_SIZE pages plus 2 additional pages to allow for
+ * buffers which do not start on a page boundary.
+ *
+ * Since GRO uses frags we allocate at least 16 regardless of page
+ * size.
  */
 #if (65536/PAGE_SIZE + 2) < 16
 #define MAX_SKB_FRAGS 16UL