Message ID | 51D6AED902000078000E2EA9@nat28.tlf.novell.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, Jul 05, 2013 at 10:32:41AM +0100, Jan Beulich wrote: > Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure > linear area is big enough on RX") xennet_fill_frags() may end up > filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce > the fragment count subsequently via __pskb_pull_tail(). That's a > result of xennet_get_responses() allowing a maximum of one more slot to > be consumed (and intermediately transformed into a fragment) if the > head slot has a size less than or equal to RX_COPY_THRESHOLD. > > Hence we need to adjust xennet_fill_frags() to pull earlier if we > reached the maximum fragment count - due to the described behavior of > xennet_get_responses() this guarantees that at least the first fragment > will get completely consumed, and hence the fragment count reduced. > > In order to not needlessly call __pskb_pull_tail() twice, make the > original call conditional upon the pull target not having been reached > yet, and defer the newly added one as much as possible (an alternative > would have been to always call the function right before the call to > xennet_fill_frags(), but that would imply more frequent cases of > needing to call it twice). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > Cc: Wei Liu <wei.liu2@citrix.com> > Cc: Ian Campbell <ian.campbell@citrix.com> > Cc: stable@vger.kernel.org (3.6 onwards) > > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct > RING_GET_RESPONSE(&np->rx, ++cons); > skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0]; > > + if (nr_frags == MAX_SKB_FRAGS) { > + unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; > + > + BUG_ON(pull_to <= skb_headlen(skb)); > + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); skb_headlen is in fact "skb->len - skb->data_len". Looking at the caller code: while loop { skb_shinfo(skb)->frags[0].page_offset = rx->offset; skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status); skb->data_len = rx->status; i = xennet_fill_frags(np, skb, &tmpq); /* * Truesize is the actual allocation size, even if the * allocation is only partially used. */ skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; skb->len += skb->data_len; } handle_incoming_packet(); You seem to be altering the behavior of the original code, because in your patch the skb->len is incremented before use, while in the original code (which calls skb_headlen in handle_incoming_packet) the skb->len is correctly set. > + nr_frags = shinfo->nr_frags; > + } > + BUG_ON(nr_frags >= MAX_SKB_FRAGS); > + > __skb_fill_page_desc(skb, nr_frags, > skb_frag_page(nfrag), > rx->offset, rx->status); > @@ -929,7 +938,8 @@ static int handle_incoming_queue(struct > while ((skb = __skb_dequeue(rxq)) != NULL) { > int pull_to = NETFRONT_SKB_CB(skb)->pull_to; > > - __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); > + if (pull_to > skb_headlen(skb)) > + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); > > /* Ethernet work: Delayed to here as it peeks the header. */ > skb->protocol = eth_type_trans(skb, dev); > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Wei Liu <wei.liu2@citrix.com> Date: Fri, 5 Jul 2013 15:53:19 +0100 > You seem to be altering the behavior of the original code, because in > your patch the skb->len is incremented before use, while in the original > code (which calls skb_headlen in handle_incoming_packet) the skb->len is > correctly set. Indeed, I think you're right. Because __pskb_pull_tail() decrements the number of bytes pulled from skb->data_len, it won't be accounted for in skb->len -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> On 05.07.13 at 16:53, Wei Liu <wei.liu2@citrix.com> wrote: > On Fri, Jul 05, 2013 at 10:32:41AM +0100, Jan Beulich wrote: >> --- a/drivers/net/xen-netfront.c >> +++ b/drivers/net/xen-netfront.c >> @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct >> RING_GET_RESPONSE(&np->rx, ++cons); >> skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0]; >> >> + if (nr_frags == MAX_SKB_FRAGS) { >> + unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; >> + >> + BUG_ON(pull_to <= skb_headlen(skb)); >> + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); > > skb_headlen is in fact "skb->len - skb->data_len". Looking at the > caller code: > > while loop { > skb_shinfo(skb)->frags[0].page_offset = rx->offset; > skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status); > skb->data_len = rx->status; > > i = xennet_fill_frags(np, skb, &tmpq); > > /* > > * Truesize is the actual allocation size, even if the > > * allocation is only partially used. > > */ > skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; > skb->len += skb->data_len; > } > > handle_incoming_packet(); > > You seem to be altering the behavior of the original code, because in > your patch the skb->len is incremented before use, while in the original > code (which calls skb_headlen in handle_incoming_packet) the skb->len is > correctly set. Right. So I basically need to keep skb->len up-to-date along with ->data_len. Just handed a patch to Dion with that done; I'll defer sending a v2 for the upstream code until I know the change works for our kernel. Jan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/08/2013 11:59 AM, Jan Beulich wrote: >>>> On 05.07.13 at 16:53, Wei Liu <wei.liu2@citrix.com> wrote: >> On Fri, Jul 05, 2013 at 10:32:41AM +0100, Jan Beulich wrote: >>> --- a/drivers/net/xen-netfront.c >>> +++ b/drivers/net/xen-netfront.c >>> @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct >>> RING_GET_RESPONSE(&np->rx, ++cons); >>> skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0]; >>> >>> + if (nr_frags == MAX_SKB_FRAGS) { >>> + unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; >>> + >>> + BUG_ON(pull_to <= skb_headlen(skb)); >>> + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); >> >> skb_headlen is in fact "skb->len - skb->data_len". Looking at the >> caller code: >> >> while loop { >> skb_shinfo(skb)->frags[0].page_offset = rx->offset; >> skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status); >> skb->data_len = rx->status; >> >> i = xennet_fill_frags(np, skb, &tmpq); >> >> /* >> >> * Truesize is the actual allocation size, even if the >> >> * allocation is only partially used. >> >> */ >> skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; >> skb->len += skb->data_len; >> } >> >> handle_incoming_packet(); >> >> You seem to be altering the behavior of the original code, because in >> your patch the skb->len is incremented before use, while in the original >> code (which calls skb_headlen in handle_incoming_packet) the skb->len is >> correctly set. > > Right. So I basically need to keep skb->len up-to-date along with > ->data_len. Just handed a patch to Dion with that done; I'll defer > sending a v2 for the upstream code until I know the change works > for our kernel. > > Jan > Jan, I was wondering about the following. In netif_poll(struct napi_struct *napi, int budget) the skb->cb is assigend, but it may be clipped to RX_COPY_THRESHOLD NETFRONT_SKB_CB(skb)->pull_to = rx->status; if (NETFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD) NETFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD; How does this modification of NETFRONT_SKB_CB(skb)->pull_to propagates or is this irrelevant? Dion -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> On 08.07.13 at 14:16, Dion Kant <g.w.kant@hunenet.nl> wrote: > I was wondering about the following. > > In netif_poll(struct napi_struct *napi, int budget) the skb->cb is assigend, > but it may be clipped to RX_COPY_THRESHOLD > > NETFRONT_SKB_CB(skb)->pull_to = rx->status; > if (NETFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD) > NETFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD; > > How does this modification of NETFRONT_SKB_CB(skb)->pull_to propagates > or is this irrelevant? I'm not sure I understand what you're asking. pull_to is a private (to netfront) field used to communicate how much of the skb needs to be pulled. I don't see how the word "propagate" makes any sense here, but as said I may simply not be getting what you're asking about. Jan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -831,6 +831,15 @@ static RING_IDX xennet_fill_frags(struct RING_GET_RESPONSE(&np->rx, ++cons); skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0]; + if (nr_frags == MAX_SKB_FRAGS) { + unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; + + BUG_ON(pull_to <= skb_headlen(skb)); + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); + nr_frags = shinfo->nr_frags; + } + BUG_ON(nr_frags >= MAX_SKB_FRAGS); + __skb_fill_page_desc(skb, nr_frags, skb_frag_page(nfrag), rx->offset, rx->status); @@ -929,7 +938,8 @@ static int handle_incoming_queue(struct while ((skb = __skb_dequeue(rxq)) != NULL) { int pull_to = NETFRONT_SKB_CB(skb)->pull_to; - __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); + if (pull_to > skb_headlen(skb)) + __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); /* Ethernet work: Delayed to here as it peeks the header. */ skb->protocol = eth_type_trans(skb, dev);
Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX") xennet_fill_frags() may end up filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce the fragment count subsequently via __pskb_pull_tail(). That's a result of xennet_get_responses() allowing a maximum of one more slot to be consumed (and intermediately transformed into a fragment) if the head slot has a size less than or equal to RX_COPY_THRESHOLD. Hence we need to adjust xennet_fill_frags() to pull earlier if we reached the maximum fragment count - due to the described behavior of xennet_get_responses() this guarantees that at least the first fragment will get completely consumed, and hence the fragment count reduced. In order to not needlessly call __pskb_pull_tail() twice, make the original call conditional upon the pull target not having been reached yet, and defer the newly added one as much as possible (an alternative would have been to always call the function right before the call to xennet_fill_frags(), but that would imply more frequent cases of needing to call it twice). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: stable@vger.kernel.org (3.6 onwards) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html