Patchwork [v2,1/1] xen/netback: correctly calculate required slots of skb.

login
register
mail settings
Submitter annie.li@oracle.com
Date July 10, 2013, 9:15 a.m.
Message ID <1373447711-31303-1-git-send-email-annie.li@oracle.com>
Download mbox | patch
Permalink /patch/257989/
State Superseded
Delegated to: David Miller
Headers show

Comments

annie.li@oracle.com - July 10, 2013, 9:15 a.m.
When counting required slots for skb, netback directly uses DIV_ROUND_UP to get
slots required by header data. This is wrong when offset in the page of header
data is not zero, and is also inconsistent with following calculation for
required slot in netbk_gop_skb.

In netbk_gop_skb, required slots are calculated based on offset and len in page
of header data. It is possible that required slots here is larger than the one
calculated in earlier netbk_count_requests. This inconsistency directly results
in rx_req_cons_peek and xen_netbk_rx_ring_full judgement are wrong.

Then it comes to situation the ring is actually full, but netback thinks it is
not and continues to create responses. This results in response overlaps request
in the ring, then grantcopy gets wrong grant reference and throws out error,
for example "(XEN) grant_table.c:1763:d0 Bad grant reference 2949120", the
grant reference is invalid value here. Netback returns XEN_NETIF_RSP_ERROR(-1)
to netfront when grant copy status is error, then netfront gets rx->status
(the status is -1, not really data size now), and throws out error,
"kernel: net eth1: rx->offset: 0, size: 4294967295". This issue can be reproduced
by doing gzip/gunzip in nfs share with mtu = 9000, the guest would panic after
running such test for a while.

This patch is based on 3.10-rc7.

Signed-off-by: Annie Li <annie.li@oracle.com>
---
 drivers/net/xen-netback/netback.c |   98 ++++++++++++++++++++++++-------------
 1 files changed, 63 insertions(+), 35 deletions(-)
David Miller - July 11, 2013, 2:18 a.m.
From: Annie Li <annie.li@oracle.com>
Date: Wed, 10 Jul 2013 17:15:11 +0800

> When counting required slots for skb, netback directly uses DIV_ROUND_UP to get
> slots required by header data. This is wrong when offset in the page of header
> data is not zero, and is also inconsistent with following calculation for
> required slot in netbk_gop_skb.
> 
> In netbk_gop_skb, required slots are calculated based on offset and len in page
> of header data. It is possible that required slots here is larger than the one
> calculated in earlier netbk_count_requests. This inconsistency directly results
> in rx_req_cons_peek and xen_netbk_rx_ring_full judgement are wrong.
> 
> Then it comes to situation the ring is actually full, but netback thinks it is
> not and continues to create responses. This results in response overlaps request
> in the ring, then grantcopy gets wrong grant reference and throws out error,
> for example "(XEN) grant_table.c:1763:d0 Bad grant reference 2949120", the
> grant reference is invalid value here. Netback returns XEN_NETIF_RSP_ERROR(-1)
> to netfront when grant copy status is error, then netfront gets rx->status
> (the status is -1, not really data size now), and throws out error,
> "kernel: net eth1: rx->offset: 0, size: 4294967295". This issue can be reproduced
> by doing gzip/gunzip in nfs share with mtu = 9000, the guest would panic after
> running such test for a while.
> 
> This patch is based on 3.10-rc7.
> 
> Signed-off-by: Annie Li <annie.li@oracle.com>

This patch looks good to me, but I'd like to see some reviews from other
experts in this area.

In the future I'd really like to see this code either use PAGE_SIZE
everywhere or MAX_BUFFER_OFFSET everywhere, in the buffer chopping
code.

I think using both leads to confusion and makes this code harder to
read.  I prefer MAX_BUFFER_OFFSET because it gives the indication that
what this value represents is the modulus upon which we must chop up
RX buffers in this driver.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
annie.li@oracle.com - July 11, 2013, 2:48 a.m.
On 2013-7-11 10:18, David Miller wrote:
> From: Annie Li <annie.li@oracle.com>
> Date: Wed, 10 Jul 2013 17:15:11 +0800
>
>> When counting required slots for skb, netback directly uses DIV_ROUND_UP to get
>> slots required by header data. This is wrong when offset in the page of header
>> data is not zero, and is also inconsistent with following calculation for
>> required slot in netbk_gop_skb.
>>
>> In netbk_gop_skb, required slots are calculated based on offset and len in page
>> of header data. It is possible that required slots here is larger than the one
>> calculated in earlier netbk_count_requests. This inconsistency directly results
>> in rx_req_cons_peek and xen_netbk_rx_ring_full judgement are wrong.
>>
>> Then it comes to situation the ring is actually full, but netback thinks it is
>> not and continues to create responses. This results in response overlaps request
>> in the ring, then grantcopy gets wrong grant reference and throws out error,
>> for example "(XEN) grant_table.c:1763:d0 Bad grant reference 2949120", the
>> grant reference is invalid value here. Netback returns XEN_NETIF_RSP_ERROR(-1)
>> to netfront when grant copy status is error, then netfront gets rx->status
>> (the status is -1, not really data size now), and throws out error,
>> "kernel: net eth1: rx->offset: 0, size: 4294967295". This issue can be reproduced
>> by doing gzip/gunzip in nfs share with mtu = 9000, the guest would panic after
>> running such test for a while.
>>
>> This patch is based on 3.10-rc7.
>>
>> Signed-off-by: Annie Li <annie.li@oracle.com>
> This patch looks good to me, but I'd like to see some reviews from other
> experts in this area.
>
> In the future I'd really like to see this code either use PAGE_SIZE
> everywhere or MAX_BUFFER_OFFSET everywhere, in the buffer chopping
> code.
>
> I think using both leads to confusion and makes this code harder to
> read.

True, I had the confusion too.

>   I prefer MAX_BUFFER_OFFSET because it gives the indication that
> what this value represents is the modulus upon which we must chop up
> RX buffers in this driver.

Would PAGE_SIZE be more straight? MAX_BUFFER_OFFSET gives an idea of 
offset instead of length.
Anyway, making it consistent is a good idea.

Thanks
Annie
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell - July 11, 2013, 8:11 a.m.
On Wed, 2013-07-10 at 17:15 +0800, Annie Li wrote:
> +static int netbk_count_slots(struct xenvif *vif, struct sk_buff *skb,
> +			    int *copy_off, unsigned long size,
> +			    unsigned long offset, int *head)
>  {
> -	unsigned int count;
> -	int i, copy_off;
> +	unsigned long bytes;
> +	int count = 0;
>  
> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> +	offset &= ~PAGE_MASK;
>  
> -	copy_off = skb_headlen(skb) % PAGE_SIZE;
> +	while (size > 0) {
> +		BUG_ON(offset >= PAGE_SIZE);
> +		BUG_ON(*copy_off > MAX_BUFFER_OFFSET);
>  
> -	if (skb_shinfo(skb)->gso_size)
> -		count++;
> +		bytes = PAGE_SIZE - offset;
>  
> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> -		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> -		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
> -		unsigned long bytes;
> +		if (bytes > size)
> +			bytes = size;
>  
> -		offset &= ~PAGE_MASK;
> +		if (start_new_rx_buffer(*copy_off, bytes, *head)) {
> +			count++;
> +			*copy_off = 0;
> +		}
>  
> -		while (size > 0) {
> -			BUG_ON(offset >= PAGE_SIZE);
> -			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
> +		if (*copy_off + bytes > MAX_BUFFER_OFFSET)
> +			bytes = MAX_BUFFER_OFFSET - *copy_off;
>  
> -			bytes = PAGE_SIZE - offset;
> +		*copy_off += bytes;
>  
> -			if (bytes > size)
> -				bytes = size;
> +		offset += bytes;
> +		size -= bytes;
>  
> -			if (start_new_rx_buffer(copy_off, bytes, 0)) {
> -				count++;
> -				copy_off = 0;
> -			}
> +		/* Next frame */
> +		if (offset == PAGE_SIZE && size)
> +			offset = 0;
> +
> +		if (*head)
> +			count++;

This little bit corresponds to the "/* Leave a gap for the GSO
descriptor. */" in gop_frag_copy?

If so then it would be useful to duplicate the comment, but more
importantly the condition on gop_frag_copy is:
        (head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
why the difference?

> +		*head = 0; /* There must be something in this buffer now. */
> +	}
> +
> +	return count;
> +}


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
annie.li@oracle.com - July 11, 2013, 8:34 a.m.
On 2013-7-11 16:11, Ian Campbell wrote:
> On Wed, 2013-07-10 at 17:15 +0800, Annie Li wrote:
>> +static int netbk_count_slots(struct xenvif *vif, struct sk_buff *skb,
>> +			    int *copy_off, unsigned long size,
>> +			    unsigned long offset, int *head)
>>   {
>> -	unsigned int count;
>> -	int i, copy_off;
>> +	unsigned long bytes;
>> +	int count = 0;
>>   
>> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
>> +	offset &= ~PAGE_MASK;
>>   
>> -	copy_off = skb_headlen(skb) % PAGE_SIZE;
>> +	while (size > 0) {
>> +		BUG_ON(offset >= PAGE_SIZE);
>> +		BUG_ON(*copy_off > MAX_BUFFER_OFFSET);
>>   
>> -	if (skb_shinfo(skb)->gso_size)
>> -		count++;
>> +		bytes = PAGE_SIZE - offset;
>>   
>> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>> -		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>> -		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
>> -		unsigned long bytes;
>> +		if (bytes > size)
>> +			bytes = size;
>>   
>> -		offset &= ~PAGE_MASK;
>> +		if (start_new_rx_buffer(*copy_off, bytes, *head)) {
>> +			count++;
>> +			*copy_off = 0;
>> +		}
>>   
>> -		while (size > 0) {
>> -			BUG_ON(offset >= PAGE_SIZE);
>> -			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
>> +		if (*copy_off + bytes > MAX_BUFFER_OFFSET)
>> +			bytes = MAX_BUFFER_OFFSET - *copy_off;
>>   
>> -			bytes = PAGE_SIZE - offset;
>> +		*copy_off += bytes;
>>   
>> -			if (bytes > size)
>> -				bytes = size;
>> +		offset += bytes;
>> +		size -= bytes;
>>   
>> -			if (start_new_rx_buffer(copy_off, bytes, 0)) {
>> -				count++;
>> -				copy_off = 0;
>> -			}
>> +		/* Next frame */
>> +		if (offset == PAGE_SIZE && size)
>> +			offset = 0;
>> +
>> +		if (*head)
>> +			count++;
> This little bit corresponds to the "/* Leave a gap for the GSO
> descriptor. */" in gop_frag_copy?

No, it does not correspond to this in gop_frag_copy. The code here only 
increase count for the first time. I thought to initialize the count in 
xen_netbk_count_skb_slots with 1 to avoid this. But thinking of the 
extreme case when the header size is zero(not sure whether this case 
could be true), I increase the count here to keep safe in case header 
size is zero.

There is code correspond to that in gop_frag_copy in 
xen_netbk_count_skb_slots, see following,

+	if (skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+		count++;
(The original code does not have gso_prefix, I added it in this patch too based on Wei's suggestion)


>
> If so then it would be useful to duplicate the comment, but more
> importantly the condition on gop_frag_copy is:
>          (head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> why the difference?

Actually it is similar with that in xen_netbk_count_skb_slots, see above 
comments.

Thanks
Annie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell - July 11, 2013, 9:47 a.m.
On Thu, 2013-07-11 at 16:34 +0800, annie li wrote:
> On 2013-7-11 16:11, Ian Campbell wrote:
> > On Wed, 2013-07-10 at 17:15 +0800, Annie Li wrote:
> >> +static int netbk_count_slots(struct xenvif *vif, struct sk_buff *skb,
> >> +			    int *copy_off, unsigned long size,
> >> +			    unsigned long offset, int *head)
> >>   {
> >> -	unsigned int count;
> >> -	int i, copy_off;
> >> +	unsigned long bytes;
> >> +	int count = 0;
> >>   
> >> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> >> +	offset &= ~PAGE_MASK;
> >>   
> >> -	copy_off = skb_headlen(skb) % PAGE_SIZE;
> >> +	while (size > 0) {
> >> +		BUG_ON(offset >= PAGE_SIZE);
> >> +		BUG_ON(*copy_off > MAX_BUFFER_OFFSET);
> >>   
> >> -	if (skb_shinfo(skb)->gso_size)
> >> -		count++;
> >> +		bytes = PAGE_SIZE - offset;
> >>   
> >> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> >> -		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> >> -		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
> >> -		unsigned long bytes;
> >> +		if (bytes > size)
> >> +			bytes = size;
> >>   
> >> -		offset &= ~PAGE_MASK;
> >> +		if (start_new_rx_buffer(*copy_off, bytes, *head)) {
> >> +			count++;
> >> +			*copy_off = 0;
> >> +		}
> >>   
> >> -		while (size > 0) {
> >> -			BUG_ON(offset >= PAGE_SIZE);
> >> -			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
> >> +		if (*copy_off + bytes > MAX_BUFFER_OFFSET)
> >> +			bytes = MAX_BUFFER_OFFSET - *copy_off;
> >>   
> >> -			bytes = PAGE_SIZE - offset;
> >> +		*copy_off += bytes;
> >>   
> >> -			if (bytes > size)
> >> -				bytes = size;
> >> +		offset += bytes;
> >> +		size -= bytes;
> >>   
> >> -			if (start_new_rx_buffer(copy_off, bytes, 0)) {
> >> -				count++;
> >> -				copy_off = 0;
> >> -			}
> >> +		/* Next frame */
> >> +		if (offset == PAGE_SIZE && size)
> >> +			offset = 0;
> >> +
> >> +		if (*head)
> >> +			count++;
> > This little bit corresponds to the "/* Leave a gap for the GSO
> > descriptor. */" in gop_frag_copy?
> 
> No, it does not correspond to this in gop_frag_copy.

So what does it correspond to?

>  The code here only 
> increase count for the first time. I thought to initialize the count in 
> xen_netbk_count_skb_slots with 1 to avoid this. But thinking of the 
> extreme case when the header size is zero(not sure whether this case 
> could be true), I increase the count here to keep safe in case header 
> size is zero.

netfront requires that the first slot always contains some data,
gop_frag_copy will BUG if that's not the case.

> 
> There is code correspond to that in gop_frag_copy in 
> xen_netbk_count_skb_slots, see following,
> 
> +	if (skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> +		count++;
> (The original code does not have gso_prefix, I added it in this patch
> too based on Wei's suggestion)

OK.

It's be nice if the count and copy variants of this logic could be as
similar as possible but they already differ quite a bit. I have
considered whether we could combine the two by adding "dry-run"
functionality to gop_frag_copy but that seems like it would just ugly up
both versions.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell - July 11, 2013, 11:12 a.m.
> 
> > 
> > 
> > > 
> > > > The code here only 
> > > > increase count for the first time. I thought to initialize the
> > > > count in 
> > > > xen_netbk_count_skb_slots with 1 to avoid this. But thinking of
> > > > the 
> > > > extreme case when the header size is zero(not sure whether this
> > > > case 
> > > > could be true), I increase the count here to keep safe in case
> > > > header 
> > > > size is zero.
> > > 
> > > netfront requires that the first slot always contains some data,
> > > gop_frag_copy will BUG if that's not the case.
> > > 
> 
> 
> In gop_frag_copy, we can not go into the while if the size is 0. Which
> BUG_ON do you mean here?

in gop_frag_copy:
		if (start_new_rx_buffer(npo->copy_off, bytes, *head)) {
			/*
			 * Netfront requires there to be some data in the head
			 * buffer.
			 */
			BUG_ON(*head);

Ian


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
annie.li@oracle.com - July 11, 2013, 1:35 p.m.
On 2013-7-11 19:12, Ian Campbell wrote:
>>>>> The code here only
>>>>> increase count for the first time. I thought to initialize the
>>>>> count in
>>>>> xen_netbk_count_skb_slots with 1 to avoid this. But thinking of
>>>>> the
>>>>> extreme case when the header size is zero(not sure whether this
>>>>> case
>>>>> could be true), I increase the count here to keep safe in case
>>>>> header
>>>>> size is zero.
>>>> netfront requires that the first slot always contains some data,
>>>> gop_frag_copy will BUG if that's not the case.
>>>>
>>
>> In gop_frag_copy, we can not go into the while if the size is 0. Which
>> BUG_ON do you mean here?
> in gop_frag_copy:
> 		if (start_new_rx_buffer(npo->copy_off, bytes, *head)) {
> 			/*
> 			 * Netfront requires there to be some data in the head
> 			 * buffer.
> 			 */
> 			BUG_ON(*head);
>
But If there is SKB with zero header size and zero offset, the code will 
not run into the while loop in gop_frag_copy and this if condition. 
BUG_ON will not happen in such situation.

Thanks
Annie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 11, 2013, 7:04 p.m.
From: annie li <annie.li@oracle.com>
Date: Thu, 11 Jul 2013 10:48:58 +0800

> On 2013-7-11 10:18, David Miller wrote:
>>   I prefer MAX_BUFFER_OFFSET because it gives the indication that
>> what this value represents is the modulus upon which we must chop up
>> RX buffers in this driver.
> 
> Would PAGE_SIZE be more straight? MAX_BUFFER_OFFSET gives an idea of
> offset instead of length.
> Anyway, making it consistent is a good idea.

I think the choice is a tossup, and less important than consistency.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - July 11, 2013, 8:03 p.m.
From: Annie Li <annie.li@oracle.com>
Date: Wed, 10 Jul 2013 17:15:11 +0800

> When counting required slots for skb, netback directly uses DIV_ROUND_UP to get
> slots required by header data. This is wrong when offset in the page of header
> data is not zero, and is also inconsistent with following calculation for
> required slot in netbk_gop_skb.
> 
> In netbk_gop_skb, required slots are calculated based on offset and len in page
> of header data. It is possible that required slots here is larger than the one
> calculated in earlier netbk_count_requests. This inconsistency directly results
> in rx_req_cons_peek and xen_netbk_rx_ring_full judgement are wrong.
> 
> Then it comes to situation the ring is actually full, but netback thinks it is
> not and continues to create responses. This results in response overlaps request
> in the ring, then grantcopy gets wrong grant reference and throws out error,
> for example "(XEN) grant_table.c:1763:d0 Bad grant reference 2949120", the
> grant reference is invalid value here. Netback returns XEN_NETIF_RSP_ERROR(-1)
> to netfront when grant copy status is error, then netfront gets rx->status
> (the status is -1, not really data size now), and throws out error,
> "kernel: net eth1: rx->offset: 0, size: 4294967295". This issue can be reproduced
> by doing gzip/gunzip in nfs share with mtu = 9000, the guest would panic after
> running such test for a while.
> 
> This patch is based on 3.10-rc7.
> 
> Signed-off-by: Annie Li <annie.li@oracle.com>

A lot of discussion... will we have another respin of this patch or can I
get an ACK from Ian or someone else?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Liu - July 11, 2013, 9:12 p.m.
On Thu, Jul 11, 2013 at 9:03 PM, David Miller <davem@davemloft.net> wrote:
> From: Annie Li <annie.li@oracle.com>
> Date: Wed, 10 Jul 2013 17:15:11 +0800
>
>> When counting required slots for skb, netback directly uses DIV_ROUND_UP to get
>> slots required by header data. This is wrong when offset in the page of header
>> data is not zero, and is also inconsistent with following calculation for
>> required slot in netbk_gop_skb.
>>
>> In netbk_gop_skb, required slots are calculated based on offset and len in page
>> of header data. It is possible that required slots here is larger than the one
>> calculated in earlier netbk_count_requests. This inconsistency directly results
>> in rx_req_cons_peek and xen_netbk_rx_ring_full judgement are wrong.
>>
>> Then it comes to situation the ring is actually full, but netback thinks it is
>> not and continues to create responses. This results in response overlaps request
>> in the ring, then grantcopy gets wrong grant reference and throws out error,
>> for example "(XEN) grant_table.c:1763:d0 Bad grant reference 2949120", the
>> grant reference is invalid value here. Netback returns XEN_NETIF_RSP_ERROR(-1)
>> to netfront when grant copy status is error, then netfront gets rx->status
>> (the status is -1, not really data size now), and throws out error,
>> "kernel: net eth1: rx->offset: 0, size: 4294967295". This issue can be reproduced
>> by doing gzip/gunzip in nfs share with mtu = 9000, the guest would panic after
>> running such test for a while.
>>
>> This patch is based on 3.10-rc7.
>>
>> Signed-off-by: Annie Li <annie.li@oracle.com>
>
> A lot of discussion... will we have another respin of this patch or can I
> get an ACK from Ian or someone else?
>

Matt also proposed a solution to this issue ([PATCH RFC] xen-netback: calculate
the number of slots required for large MTU vifs -- it's posted on netdev as
well).

We're discussing these two patches at the moment and have not come to a
conclusion on which one to go in. I would really appreciate if you could wait a
little longer. Thanks.


Wei.

> Thanks.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell - July 16, 2013, 9 a.m.
On Thu, 2013-07-11 at 13:03 -0700, David Miller wrote:
> From: Annie Li <annie.li@oracle.com>
> Date: Wed, 10 Jul 2013 17:15:11 +0800
> 
> > When counting required slots for skb, netback directly uses DIV_ROUND_UP to get
> > slots required by header data. This is wrong when offset in the page of header
> > data is not zero, and is also inconsistent with following calculation for
> > required slot in netbk_gop_skb.
> > 
> > In netbk_gop_skb, required slots are calculated based on offset and len in page
> > of header data. It is possible that required slots here is larger than the one
> > calculated in earlier netbk_count_requests. This inconsistency directly results
> > in rx_req_cons_peek and xen_netbk_rx_ring_full judgement are wrong.
> > 
> > Then it comes to situation the ring is actually full, but netback thinks it is
> > not and continues to create responses. This results in response overlaps request
> > in the ring, then grantcopy gets wrong grant reference and throws out error,
> > for example "(XEN) grant_table.c:1763:d0 Bad grant reference 2949120", the
> > grant reference is invalid value here. Netback returns XEN_NETIF_RSP_ERROR(-1)
> > to netfront when grant copy status is error, then netfront gets rx->status
> > (the status is -1, not really data size now), and throws out error,
> > "kernel: net eth1: rx->offset: 0, size: 4294967295". This issue can be reproduced
> > by doing gzip/gunzip in nfs share with mtu = 9000, the guest would panic after
> > running such test for a while.
> > 
> > This patch is based on 3.10-rc7.
> > 
> > Signed-off-by: Annie Li <annie.li@oracle.com>
> 
> A lot of discussion... will we have another respin of this patch or can I
> get an ACK from Ian or someone else?

I was out at a conference last week and I'm heading off again for
another on Saturday. I need to go back through this thread and the
alternative from Matt and figure out what is what. I hope to get to it
in the back half of this week, or perhaps in some airport somewhere :-/.

Sorry for the delay.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 8c20935..d2a9478 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -354,56 +354,84 @@  static bool start_new_rx_buffer(int offset, unsigned long size, int head)
 	return false;
 }
 
-/*
- * Figure out how many ring slots we're going to need to send @skb to
- * the guest. This function is essentially a dry run of
- * netbk_gop_frag_copy.
- */
-unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
+static int netbk_count_slots(struct xenvif *vif, struct sk_buff *skb,
+			    int *copy_off, unsigned long size,
+			    unsigned long offset, int *head)
 {
-	unsigned int count;
-	int i, copy_off;
+	unsigned long bytes;
+	int count = 0;
 
-	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+	offset &= ~PAGE_MASK;
 
-	copy_off = skb_headlen(skb) % PAGE_SIZE;
+	while (size > 0) {
+		BUG_ON(offset >= PAGE_SIZE);
+		BUG_ON(*copy_off > MAX_BUFFER_OFFSET);
 
-	if (skb_shinfo(skb)->gso_size)
-		count++;
+		bytes = PAGE_SIZE - offset;
 
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
-		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
-		unsigned long bytes;
+		if (bytes > size)
+			bytes = size;
 
-		offset &= ~PAGE_MASK;
+		if (start_new_rx_buffer(*copy_off, bytes, *head)) {
+			count++;
+			*copy_off = 0;
+		}
 
-		while (size > 0) {
-			BUG_ON(offset >= PAGE_SIZE);
-			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
+		if (*copy_off + bytes > MAX_BUFFER_OFFSET)
+			bytes = MAX_BUFFER_OFFSET - *copy_off;
 
-			bytes = PAGE_SIZE - offset;
+		*copy_off += bytes;
 
-			if (bytes > size)
-				bytes = size;
+		offset += bytes;
+		size -= bytes;
 
-			if (start_new_rx_buffer(copy_off, bytes, 0)) {
-				count++;
-				copy_off = 0;
-			}
+		/* Next frame */
+		if (offset == PAGE_SIZE && size)
+			offset = 0;
+
+		if (*head)
+			count++;
+		*head = 0; /* There must be something in this buffer now. */
+	}
+
+	return count;
+}
 
-			if (copy_off + bytes > MAX_BUFFER_OFFSET)
-				bytes = MAX_BUFFER_OFFSET - copy_off;
+/*
+ * Figure out how many ring slots we're going to need to send @skb to
+ * the guest. This function is essentially a dry run of
+ * netbk_gop_frag_copy.
+ */
+unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
+{
+	int i, copy_off = 0;
+	int nr_frags = skb_shinfo(skb)->nr_frags;
+	unsigned char *data;
+	int head = 1;
+	unsigned int count = 0;
 
-			copy_off += bytes;
+	if (skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+		count++;
 
-			offset += bytes;
-			size -= bytes;
+	data = skb->data;
+	while (data < skb_tail_pointer(skb)) {
+		unsigned int offset = offset_in_page(data);
+		unsigned int len = PAGE_SIZE - offset;
 
-			if (offset == PAGE_SIZE)
-				offset = 0;
-		}
+		if (data + len > skb_tail_pointer(skb))
+			len = skb_tail_pointer(skb) - data;
+
+		count += netbk_count_slots(vif, skb, &copy_off, len,
+					   offset, &head);
+		data += len;
+	}
+
+	for (i = 0; i < nr_frags; i++) {
+		count += netbk_count_slots(vif, skb, &copy_off,
+				skb_frag_size(&skb_shinfo(skb)->frags[i]),
+				skb_shinfo(skb)->frags[i].page_offset, &head);
 	}
+
 	return count;
 }