diff mbox

xen-netfront: pull on receive skb may need to happen earlier

Message ID 51D6AED902000078000E2EA9@nat28.tlf.novell.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jan Beulich July 5, 2013, 9:32 a.m. UTC
Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure
linear area is big enough on RX") xennet_fill_frags() may end up
filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce
the fragment count subsequently via __pskb_pull_tail(). That's a
result of xennet_get_responses() allowing a maximum of one more slot to
be consumed (and intermediately transformed into a fragment) if the
head slot has a size less than or equal to RX_COPY_THRESHOLD.

Hence we need to adjust xennet_fill_frags() to pull earlier if we
reached the maximum fragment count - due to the described behavior of
xennet_get_responses() this guarantees that at least the first fragment
will get completely consumed, and hence the fragment count reduced.

In order to not needlessly call __pskb_pull_tail() twice, make the
original call conditional upon the pull target not having been reached
yet, and defer the newly added one as much as possible (an alternative
would have been to always call the function right before the call to
xennet_fill_frags(), but that would imply more frequent cases of
needing to call it twice).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@vger.kernel.org (3.6 onwards)




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Wei Liu July 5, 2013, 2:53 p.m. UTC | #1
On Fri, Jul 05, 2013 at 10:32:41AM +0100, Jan Beulich wrote:
> Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure
> linear area is big enough on RX") xennet_fill_frags() may end up
> filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce
> the fragment count subsequently via __pskb_pull_tail(). That's a
> result of xennet_get_responses() allowing a maximum of one more slot to
> be consumed (and intermediately transformed into a fragment) if the
> head slot has a size less than or equal to RX_COPY_THRESHOLD.
> 
> Hence we need to adjust xennet_fill_frags() to pull earlier if we
> reached the maximum fragment count - due to the described behavior of
> xennet_get_responses() this guarantees that at least the first fragment
> will get completely consumed, and hence the fragment count reduced.
> 
> In order to not needlessly call __pskb_pull_tail() twice, make the
> original call conditional upon the pull target not having been reached
> yet, and defer the newly added one as much as possible (an alternative
> would have been to always call the function right before the call to
> xennet_fill_frags(), but that would imply more frequent cases of
> needing to call it twice).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: stable@vger.kernel.org (3.6 onwards)
> 
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct
>  			RING_GET_RESPONSE(&np->rx, ++cons);
>  		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
>  
> +		if (nr_frags == MAX_SKB_FRAGS) {
> +			unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
> +
> +			BUG_ON(pull_to <= skb_headlen(skb));
> +			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));

skb_headlen is in fact "skb->len - skb->data_len". Looking at the
caller code:

    while loop {
        skb_shinfo(skb)->frags[0].page_offset = rx->offset;
	skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
	skb->data_len = rx->status;

	i = xennet_fill_frags(np, skb, &tmpq);

	/*                                                                                                                                             
	 * Truesize is the actual allocation size, even if the                                                                                         
	 * allocation is only partially used.                                                                                                          
	 */
	skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;
	skb->len += skb->data_len;
    }

    handle_incoming_packet();

You seem to be altering the behavior of the original code, because in
your patch the skb->len is incremented before use, while in the original
code (which calls skb_headlen in handle_incoming_packet) the skb->len is
correctly set.

> +			nr_frags = shinfo->nr_frags;
> +		}
> +		BUG_ON(nr_frags >= MAX_SKB_FRAGS);
> +
>  		__skb_fill_page_desc(skb, nr_frags,
>  				     skb_frag_page(nfrag),
>  				     rx->offset, rx->status);
> @@ -929,7 +938,8 @@ static int handle_incoming_queue(struct 
>  	while ((skb = __skb_dequeue(rxq)) != NULL) {
>  		int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
>  
> -		__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
> +		if (pull_to > skb_headlen(skb))
> +			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
>  
>  		/* Ethernet work: Delayed to here as it peeks the header. */
>  		skb->protocol = eth_type_trans(skb, dev);
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 7, 2013, 1:10 a.m. UTC | #2
From: Wei Liu <wei.liu2@citrix.com>
Date: Fri, 5 Jul 2013 15:53:19 +0100

> You seem to be altering the behavior of the original code, because in
> your patch the skb->len is incremented before use, while in the original
> code (which calls skb_headlen in handle_incoming_packet) the skb->len is
> correctly set.

Indeed, I think you're right.

Because __pskb_pull_tail() decrements the number of bytes pulled
from skb->data_len, it won't be accounted for in skb->len
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich July 8, 2013, 9:59 a.m. UTC | #3
>>> On 05.07.13 at 16:53, Wei Liu <wei.liu2@citrix.com> wrote:
> On Fri, Jul 05, 2013 at 10:32:41AM +0100, Jan Beulich wrote:
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct
>>  			RING_GET_RESPONSE(&np->rx, ++cons);
>>  		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
>>  
>> +		if (nr_frags == MAX_SKB_FRAGS) {
>> +			unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
>> +
>> +			BUG_ON(pull_to <= skb_headlen(skb));
>> +			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
> 
> skb_headlen is in fact "skb->len - skb->data_len". Looking at the
> caller code:
> 
>     while loop {
>         skb_shinfo(skb)->frags[0].page_offset = rx->offset;
> 	skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
> 	skb->data_len = rx->status;
> 
> 	i = xennet_fill_frags(np, skb, &tmpq);
> 
> 	/*                                                                           
>                                                                   
> 	 * Truesize is the actual allocation size, even if the                       
>                                                                   
> 	 * allocation is only partially used.                                        
>                                                                   
> 	 */
> 	skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;
> 	skb->len += skb->data_len;
>     }
> 
>     handle_incoming_packet();
> 
> You seem to be altering the behavior of the original code, because in
> your patch the skb->len is incremented before use, while in the original
> code (which calls skb_headlen in handle_incoming_packet) the skb->len is
> correctly set.

Right. So I basically need to keep skb->len up-to-date along with
->data_len. Just handed a patch to Dion with that done; I'll defer
sending a v2 for the upstream code until I know the change works
for our kernel.

Jan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dion Kant July 8, 2013, 12:16 p.m. UTC | #4
On 07/08/2013 11:59 AM, Jan Beulich wrote:
>>>> On 05.07.13 at 16:53, Wei Liu <wei.liu2@citrix.com> wrote:
>> On Fri, Jul 05, 2013 at 10:32:41AM +0100, Jan Beulich wrote:
>>> --- a/drivers/net/xen-netfront.c
>>> +++ b/drivers/net/xen-netfront.c
>>> @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct
>>>  			RING_GET_RESPONSE(&np->rx, ++cons);
>>>  		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
>>>  
>>> +		if (nr_frags == MAX_SKB_FRAGS) {
>>> +			unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
>>> +
>>> +			BUG_ON(pull_to <= skb_headlen(skb));
>>> +			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
>>
>> skb_headlen is in fact "skb->len - skb->data_len". Looking at the
>> caller code:
>>
>>     while loop {
>>         skb_shinfo(skb)->frags[0].page_offset = rx->offset;
>> 	skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
>> 	skb->data_len = rx->status;
>>
>> 	i = xennet_fill_frags(np, skb, &tmpq);
>>
>> 	/*                                                                           
>>                                                                   
>> 	 * Truesize is the actual allocation size, even if the                       
>>                                                                   
>> 	 * allocation is only partially used.                                        
>>                                                                   
>> 	 */
>> 	skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;
>> 	skb->len += skb->data_len;
>>     }
>>
>>     handle_incoming_packet();
>>
>> You seem to be altering the behavior of the original code, because in
>> your patch the skb->len is incremented before use, while in the original
>> code (which calls skb_headlen in handle_incoming_packet) the skb->len is
>> correctly set.
> 
> Right. So I basically need to keep skb->len up-to-date along with
> ->data_len. Just handed a patch to Dion with that done; I'll defer
> sending a v2 for the upstream code until I know the change works
> for our kernel.
> 
> Jan
> 

Jan,

I was wondering about the following.

In netif_poll(struct napi_struct *napi, int budget) the skb->cb is assigend,
but it may be clipped to RX_COPY_THRESHOLD

		NETFRONT_SKB_CB(skb)->pull_to = rx->status;
		if (NETFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD)
		      NETFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD;

How does this modification of NETFRONT_SKB_CB(skb)->pull_to propagates
or is this irrelevant?

Dion

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich July 8, 2013, 12:41 p.m. UTC | #5
>>> On 08.07.13 at 14:16, Dion Kant <g.w.kant@hunenet.nl> wrote:
> I was wondering about the following.
> 
> In netif_poll(struct napi_struct *napi, int budget) the skb->cb is assigend,
> but it may be clipped to RX_COPY_THRESHOLD
> 
> 		NETFRONT_SKB_CB(skb)->pull_to = rx->status;
> 		if (NETFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD)
> 		      NETFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD;
> 
> How does this modification of NETFRONT_SKB_CB(skb)->pull_to propagates
> or is this irrelevant?

I'm not sure I understand what you're asking. pull_to is a private
(to netfront) field used to communicate how much of the skb
needs to be pulled. I don't see how the word "propagate" makes
any sense here, but as said I may simply not be getting what
you're asking about.

Jan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -831,6 +831,15 @@  static RING_IDX xennet_fill_frags(struct
 			RING_GET_RESPONSE(&np->rx, ++cons);
 		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
+		if (nr_frags == MAX_SKB_FRAGS) {
+			unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
+
+			BUG_ON(pull_to <= skb_headlen(skb));
+			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
+			nr_frags = shinfo->nr_frags;
+		}
+		BUG_ON(nr_frags >= MAX_SKB_FRAGS);
+
 		__skb_fill_page_desc(skb, nr_frags,
 				     skb_frag_page(nfrag),
 				     rx->offset, rx->status);
@@ -929,7 +938,8 @@  static int handle_incoming_queue(struct 
 	while ((skb = __skb_dequeue(rxq)) != NULL) {
 		int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
 
-		__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
+		if (pull_to > skb_headlen(skb))
+			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
 
 		/* Ethernet work: Delayed to here as it peeks the header. */
 		skb->protocol = eth_type_trans(skb, dev);