diff mbox

xen/netfront: improve truesize tracking

Message ID 1355838711-5473-1-git-send-email-ian.campbell@citrix.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Ian Campbell Dec. 18, 2012, 1:51 p.m. UTC
Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
than that. We have already accounted for this in
NETFRONT_SKB_CB(skb)->pull_to so use that instead.

Fixes WARN_ON from skb_try_coalesce.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: annie li <annie.li@oracle.com>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org # 3.7.x only
---
 drivers/net/xen-netfront.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

Comments

Konrad Rzeszutek Wilk Dec. 18, 2012, 2:10 p.m. UTC | #1
On Tue, Dec 18, 2012 at 01:51:51PM +0000, Ian Campbell wrote:
> Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> than that. We have already accounted for this in
> NETFRONT_SKB_CB(skb)->pull_to so use that instead.
> 
> Fixes WARN_ON from skb_try_coalesce.

This should probably be also on the stable tree for 3.7 at least?

> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
  ^^ - Reported-by: 

> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  ^^ - Acked-by:

> Cc: annie li <annie.li@oracle.com>
> Cc: xen-devel@lists.xensource.com
> Cc: netdev@vger.kernel.org
> Cc: stable@kernel.org # 3.7.x only
> ---
>  drivers/net/xen-netfront.c |   15 +++++----------
>  1 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..b06ef81 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -971,17 +971,12 @@ err:
>  		 * overheads. Here, we add the size of the data pulled
>  		 * in xennet_fill_frags().
>  		 *
> -		 * We also adjust for any unused space in the main
> -		 * data area by subtracting (RX_COPY_THRESHOLD -
> -		 * len). This is especially important with drivers
> -		 * which split incoming packets into header and data,
> -		 * using only 66 bytes of the main data area (see the
> -		 * e1000 driver for example.)  On such systems,
> -		 * without this last adjustement, our achievable
> -		 * receive throughout using the standard receive
> -		 * buffer size was cut by 25%(!!!).
> +		 * We also adjust for the __pskb_pull_tail done in
> +		 * handle_incoming_queue which pulls data from the
> +		 * frags into the head area, which is already
> +		 * accounted in RX_COPY_THRESHOLD.
>  		 */
> -		skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> +		skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
>  		skb->len += skb->data_len;
>  
>  		if (rx->flags & XEN_NETRXF_csum_blank)
> -- 
> 1.7.2.5
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell Dec. 18, 2012, 2:15 p.m. UTC | #2
On Tue, 2012-12-18 at 14:10 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 18, 2012 at 01:51:51PM +0000, Ian Campbell wrote:
> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> > than that. We have already accounted for this in
> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
> > 
> > Fixes WARN_ON from skb_try_coalesce.
> 
> This should probably be also on the stable tree for 3.7 at least?

Yes, hence "Cc: stable@kernel.org # 3.7.x only" below ;-)

> > 
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
>   ^^ - Reported-by: 
> 
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>   ^^ - Acked-by:
> 
> > Cc: annie li <annie.li@oracle.com>
> > Cc: xen-devel@lists.xensource.com
> > Cc: netdev@vger.kernel.org
> > Cc: stable@kernel.org # 3.7.x only
> > ---
> >  drivers/net/xen-netfront.c |   15 +++++----------
> >  1 files changed, 5 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index caa0110..b06ef81 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -971,17 +971,12 @@ err:
> >  		 * overheads. Here, we add the size of the data pulled
> >  		 * in xennet_fill_frags().
> >  		 *
> > -		 * We also adjust for any unused space in the main
> > -		 * data area by subtracting (RX_COPY_THRESHOLD -
> > -		 * len). This is especially important with drivers
> > -		 * which split incoming packets into header and data,
> > -		 * using only 66 bytes of the main data area (see the
> > -		 * e1000 driver for example.)  On such systems,
> > -		 * without this last adjustement, our achievable
> > -		 * receive throughout using the standard receive
> > -		 * buffer size was cut by 25%(!!!).
> > +		 * We also adjust for the __pskb_pull_tail done in
> > +		 * handle_incoming_queue which pulls data from the
> > +		 * frags into the head area, which is already
> > +		 * accounted in RX_COPY_THRESHOLD.
> >  		 */
> > -		skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> > +		skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
> >  		skb->len += skb->data_len;
> >  
> >  		if (rx->flags & XEN_NETRXF_csum_blank)
> > -- 
> > 1.7.2.5
> > 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 18, 2012, 3:12 p.m. UTC | #3
On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
> Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> than that. We have already accounted for this in
> NETFRONT_SKB_CB(skb)->pull_to so use that instead.
> 
> Fixes WARN_ON from skb_try_coalesce.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: annie li <annie.li@oracle.com>
> Cc: xen-devel@lists.xensource.com
> Cc: netdev@vger.kernel.org
> Cc: stable@kernel.org # 3.7.x only
> ---
>  drivers/net/xen-netfront.c |   15 +++++----------
>  1 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index caa0110..b06ef81 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -971,17 +971,12 @@ err:
>  		 * overheads. Here, we add the size of the data pulled
>  		 * in xennet_fill_frags().
>  		 *
> -		 * We also adjust for any unused space in the main
> -		 * data area by subtracting (RX_COPY_THRESHOLD -
> -		 * len). This is especially important with drivers
> -		 * which split incoming packets into header and data,
> -		 * using only 66 bytes of the main data area (see the
> -		 * e1000 driver for example.)  On such systems,
> -		 * without this last adjustement, our achievable
> -		 * receive throughout using the standard receive
> -		 * buffer size was cut by 25%(!!!).
> +		 * We also adjust for the __pskb_pull_tail done in
> +		 * handle_incoming_queue which pulls data from the
> +		 * frags into the head area, which is already
> +		 * accounted in RX_COPY_THRESHOLD.
>  		 */
> -		skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> +		skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
>  		skb->len += skb->data_len;
>  
>  		if (rx->flags & XEN_NETRXF_csum_blank)


But skb truesize is not what you think.

You must account the exact memory used by this skb, not only the used
part of it.

At the very minimum, it should be

skb->truesize += skb->data_len;

But it really should be the allocated size of the fragment.

If its a page, then its a page, even if you use one single byte in it.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell Dec. 18, 2012, 3:26 p.m. UTC | #4
On Tue, 2012-12-18 at 15:12 +0000, Eric Dumazet wrote:
> On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
> > than that. We have already accounted for this in
> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
> > 
> > Fixes WARN_ON from skb_try_coalesce.
> > 
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: annie li <annie.li@oracle.com>
> > Cc: xen-devel@lists.xensource.com
> > Cc: netdev@vger.kernel.org
> > Cc: stable@kernel.org # 3.7.x only
> > ---
> >  drivers/net/xen-netfront.c |   15 +++++----------
> >  1 files changed, 5 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index caa0110..b06ef81 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -971,17 +971,12 @@ err:
> >  		 * overheads. Here, we add the size of the data pulled
> >  		 * in xennet_fill_frags().
> >  		 *
> > -		 * We also adjust for any unused space in the main
> > -		 * data area by subtracting (RX_COPY_THRESHOLD -
> > -		 * len). This is especially important with drivers
> > -		 * which split incoming packets into header and data,
> > -		 * using only 66 bytes of the main data area (see the
> > -		 * e1000 driver for example.)  On such systems,
> > -		 * without this last adjustement, our achievable
> > -		 * receive throughout using the standard receive
> > -		 * buffer size was cut by 25%(!!!).
> > +		 * We also adjust for the __pskb_pull_tail done in
> > +		 * handle_incoming_queue which pulls data from the
> > +		 * frags into the head area, which is already
> > +		 * accounted in RX_COPY_THRESHOLD.
> >  		 */
> > -		skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> > +		skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
> >  		skb->len += skb->data_len;
> >  
> >  		if (rx->flags & XEN_NETRXF_csum_blank)
> 
> 
> But skb truesize is not what you think.

Indeed, it seems I was completely backwards about what it means!

> You must account the exact memory used by this skb, not only the used
> part of it.
> 
> At the very minimum, it should be
> 
> skb->truesize += skb->data_len;
> 
> But it really should be the allocated size of the fragment.
> 
> If its a page, then its a page, even if you use one single byte in it.

So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?

Sander, can you try that change?

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 18, 2012, 4:13 p.m. UTC | #5
On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:

> So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> 

I dont know what are the real frag sizes in your case.

Some drivers allocate a full page for an ethernet frame, others use half
of a page, it really depends.

As the frag ABI doesnt contain real size, its ok in this case to account
the actual frag size.

(skb->data_len in your driver)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell Dec. 18, 2012, 4:22 p.m. UTC | #6
On Tue, 2012-12-18 at 16:13 +0000, Eric Dumazet wrote:
> On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
> 
> > So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> > 
> 
> I dont know what are the real frag sizes in your case.

I think it's a page, see xennet_alloc_rx_buffers and the alloc_page
therein.

> Some drivers allocate a full page for an ethernet frame, others use half
> of a page, it really depends.
> 
> As the frag ABI doesnt contain real size, its ok in this case to account
> the actual frag size.
> 
> (skb->data_len in your driver)

I guess I'm a bit confused by what truesize means again then ;-),
because in that case the original patch is correct although it would
have been less confusing to do:
	skb->truesize += skb->data_len; 
in xennet_poll() and then do the subtraction of
NETFRONT_SKB_CB(skb)->pull_to in handle_incoming_queue() where we
actually do the pull up.

Unless __pskb_pull_tail does that adjustment for us, but if it does I
can't see where.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 18, 2012, 4:35 p.m. UTC | #7
On Tue, 2012-12-18 at 16:22 +0000, Ian Campbell wrote:
> On Tue, 2012-12-18 at 16:13 +0000, Eric Dumazet wrote:
> > On Tue, 2012-12-18 at 15:26 +0000, Ian Campbell wrote:
> > 
> > > So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?
> > > 
> > 
> > I dont know what are the real frag sizes in your case.
> 
> I think it's a page, see xennet_alloc_rx_buffers and the alloc_page
> therein.
> 

If they are order-0 pages, then PAGE_SIZE * nr_frags  is OK.


> > Some drivers allocate a full page for an ethernet frame, others use half
> > of a page, it really depends.
> > 

> > As the frag ABI doesnt contain real size, its ok in this case to account
> > the actual frag size.
> > 
> > (skb->data_len in your driver)
> 
> I guess I'm a bit confused by what truesize means again then ;-),
> because in that case the original patch is correct although it would
> have been less confusing to do:
> 	skb->truesize += skb->data_len; 
> in xennet_poll() and then do the subtraction of
> NETFRONT_SKB_CB(skb)->pull_to in handle_incoming_queue() where we
> actually do the pull up.
> 
> Unless __pskb_pull_tail does that adjustment for us, but if it does I
> can't see where.

Thats because skb frags only contain :

a page pointer.
An offset
A size. (Exact number or used bytes in this frag)

And not the 'originally allocated size. It could be 256, 768, 2048,
4096, 65536 bytes, nobody but the driver really knows.

So when we pull X bytes from a fragment to skb->head, there is no way to
remember what was the original size of the fragment.

Only the driver allocating the frag knows its truesize.

Once skb is given to the stack, we lose this information, and rely on
skb->truesize being an accurate estimation.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sander Eikelenboom Dec. 19, 2012, 11:34 a.m. UTC | #8
Tuesday, December 18, 2012, 4:26:38 PM, you wrote:

> On Tue, 2012-12-18 at 15:12 +0000, Eric Dumazet wrote:
>> On Tue, 2012-12-18 at 13:51 +0000, Ian Campbell wrote:
>> > Using RX_COPY_THRESHOLD is incorrect if the SKB is actually smaller
>> > than that. We have already accounted for this in
>> > NETFRONT_SKB_CB(skb)->pull_to so use that instead.
>> > 
>> > Fixes WARN_ON from skb_try_coalesce.
>> > 
>> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> > Cc: Sander Eikelenboom <linux@eikelenboom.it>
>> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > Cc: annie li <annie.li@oracle.com>
>> > Cc: xen-devel@lists.xensource.com
>> > Cc: netdev@vger.kernel.org
>> > Cc: stable@kernel.org # 3.7.x only
>> > ---
>> >  drivers/net/xen-netfront.c |   15 +++++----------
>> >  1 files changed, 5 insertions(+), 10 deletions(-)
>> > 
>> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> > index caa0110..b06ef81 100644
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -971,17 +971,12 @@ err:
>> >              * overheads. Here, we add the size of the data pulled
>> >              * in xennet_fill_frags().
>> >              *
>> > -            * We also adjust for any unused space in the main
>> > -            * data area by subtracting (RX_COPY_THRESHOLD -
>> > -            * len). This is especially important with drivers
>> > -            * which split incoming packets into header and data,
>> > -            * using only 66 bytes of the main data area (see the
>> > -            * e1000 driver for example.)  On such systems,
>> > -            * without this last adjustement, our achievable
>> > -            * receive throughout using the standard receive
>> > -            * buffer size was cut by 25%(!!!).
>> > +            * We also adjust for the __pskb_pull_tail done in
>> > +            * handle_incoming_queue which pulls data from the
>> > +            * frags into the head area, which is already
>> > +            * accounted in RX_COPY_THRESHOLD.
>> >              */
>> > -           skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> > +           skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
>> >             skb->len += skb->data_len;
>> >  
>> >             if (rx->flags & XEN_NETRXF_csum_blank)
>> 
>> 
>> But skb truesize is not what you think.

> Indeed, it seems I was completely backwards about what it means!

>> You must account the exact memory used by this skb, not only the used
>> part of it.
>> 
>> At the very minimum, it should be
>> 
>> skb->truesize += skb->data_len;
>> 
>> But it really should be the allocated size of the fragment.
>> 
>> If its a page, then its a page, even if you use one single byte in it.

> So actually we want += PAGE_SIZE * skb_shinfo(skb)->nr_frags ?

> Sander, can you try that change?

Hi Ian,

It ran overnight and i haven't seen the warn_once trigger.
(but i also didn't with the previous patch)

--
Sander

> Ian.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 19, 2012, 4:17 p.m. UTC | #9
On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

> Hi Ian,
> 
> It ran overnight and i haven't seen the warn_once trigger.
> (but i also didn't with the previous patch)
> 

As I said, the miminum value to not trigger the warning was what Ian
patch was doing, but it was still a not accurate estimation.

Doing the real accounting might trigger slow transferts, or dropped
packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

So the real question was : If accounting for full pages, is your
applications run as smooth as before, with no huge performance
regression ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..b06ef81 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -971,17 +971,12 @@  err:
 		 * overheads. Here, we add the size of the data pulled
 		 * in xennet_fill_frags().
 		 *
-		 * We also adjust for any unused space in the main
-		 * data area by subtracting (RX_COPY_THRESHOLD -
-		 * len). This is especially important with drivers
-		 * which split incoming packets into header and data,
-		 * using only 66 bytes of the main data area (see the
-		 * e1000 driver for example.)  On such systems,
-		 * without this last adjustement, our achievable
-		 * receive throughout using the standard receive
-		 * buffer size was cut by 25%(!!!).
+		 * We also adjust for the __pskb_pull_tail done in
+		 * handle_incoming_queue which pulls data from the
+		 * frags into the head area, which is already
+		 * accounted in RX_COPY_THRESHOLD.
 		 */
-		skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
+		skb->truesize += skb->data_len - NETFRONT_SKB_CB(skb)->pull_to;
 		skb->len += skb->data_len;
 
 		if (rx->flags & XEN_NETRXF_csum_blank)