Patchwork [Xen-devel] xen/netfront: handle compound page fragments on transmit

login
register
mail settings
Submitter Ian Campbell
Date Nov. 20, 2012, 2:14 p.m.
Message ID <1353420865.13542.44.camel@zakaz.uk.xensource.com>
Download mbox | patch
Permalink /patch/200341/
State Superseded
Delegated to: David Miller
Headers show

Comments

Ian Campbell - Nov. 20, 2012, 2:14 p.m.
On Tue, 2012-11-20 at 13:51 +0000, Jan Beulich wrote:
> >>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:
> >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
> >> > An SKB paged fragment can consist of a compound page with order > 0.
> >> > However the netchannel protocol deals only in PAGE_SIZE frames.
> >> > 
> >> > Handle this in xennet_make_frags by iterating over the frames which
> >> > make up the page.
> >> > 
> >> > This is the netfront equivalent to 6a8ed462f16b for netback.
> >> 
> >> Wouldn't you need to be at least a little more conservative here
> >> with respect to resource use: I realize that get_id_from_freelist()
> >> return values were never checked, and failure of
> >> gnttab_claim_grant_reference() was always dealt with via
> >> BUG_ON(), but considering that netfront_tx_slot_available()
> >> doesn't account for compound page fragments, I think this (lack
> >> of) error handling needs improvement in the course of the
> >> change here (regardless of - I think - someone having said that
> >> usually the sum of all pages referenced from an skb's fragments
> >> would not exceed MAX_SKB_FRAGS - "usually" just isn't enough
> >> imo).
> > 
> > I think it is more than "usually", it is derived from the number of
> > pages needed to contain 64K of data which is the maximum size of the
> > data associated with an skb (AIUI).
> > 
> > Unwinding from failure in xennet_make_frags looks pretty tricky,
> 
> Yes, I agree.
> 
> > but how about this incremental patch:
> 
> Looks good, but can probably be simplified quite a bit:
> 
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
> >  	np->tx.req_prod_pvt = prod;
> >  }
> >  
> > +/*
> > + * Count how many ring slots are required to send the frags of this
> > + * skb. Each frag might be a compound page.
> > + */
> > +static int xennet_count_skb_frag_pages(struct sk_buff *skb)
> > +{
> > +	int i, frags = skb_shinfo(skb)->nr_frags;
> > +	int pages = 0;
> > +
> > +	for (i = 0; i < frags; i++) {
> > +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> > +		unsigned long size = skb_frag_size(frag);
> > +		unsigned long offset = frag->page_offset;
> > +
> > +		/* Skip unused frames from start of page */
> > +		offset &= ~PAGE_MASK;
> > +
> > +		while (size > 0) {
> > +			unsigned long bytes;
> > +
> > +			BUG_ON(offset >= PAGE_SIZE);
> > +
> > +			bytes = PAGE_SIZE - offset;
> > +			if (bytes > size)
> > +				bytes = size;
> > +
> > +			offset += bytes;
> > +			size -= bytes;
> > +
> > +			/* Next frame */
> > +			if (offset == PAGE_SIZE && size) {
> > +				pages++;
> > +				offset = 0;
> > +			}
> > +		}
> 
> Isn't the whole loop equivalent to 
> 
> 		pages = PFN_UP(offset + size);
> 
> (at least as long as size is not zero)?

Er, yes. Wood for the trees etc...

I think using PFN_UP overcounts a bit since the data needed start in the
first frame of a compound frame, but if you keep the 
        /* Skip unused frames from start of page */
        offset &= ~PAGE_MASK;
        
I think that does the right thing

> > @@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	grant_ref_t ref;
> >  	unsigned long mfn;
> >  	int notify;
> > -	int frags = skb_shinfo(skb)->nr_frags;
> > +	int frags;
> >  	unsigned int offset = offset_in_page(data);
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > +	frags = xennet_count_skb_frag_pages(skb) +
> > +		DIV_ROUND_UP(offset + len, PAGE_SIZE);
> >  	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> 
> This condition would now need adjustment, though (because
> "frags" is no longer what its name says).

I think it already wasn't what the name says, since it included the skb
head too. Perhaps "slots" would be a better name?




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich - Nov. 20, 2012, 2:32 p.m.
>>> On 20.11.12 at 15:14, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2012-11-20 at 13:51 +0000, Jan Beulich wrote:
>> >>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:
>> >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote:
>> >> > An SKB paged fragment can consist of a compound page with order > 0.
>> >> > However the netchannel protocol deals only in PAGE_SIZE frames.
>> >> > 
>> >> > Handle this in xennet_make_frags by iterating over the frames which
>> >> > make up the page.
>> >> > 
>> >> > This is the netfront equivalent to 6a8ed462f16b for netback.
>> >> 
>> >> Wouldn't you need to be at least a little more conservative here
>> >> with respect to resource use: I realize that get_id_from_freelist()
>> >> return values were never checked, and failure of
>> >> gnttab_claim_grant_reference() was always dealt with via
>> >> BUG_ON(), but considering that netfront_tx_slot_available()
>> >> doesn't account for compound page fragments, I think this (lack
>> >> of) error handling needs improvement in the course of the
>> >> change here (regardless of - I think - someone having said that
>> >> usually the sum of all pages referenced from an skb's fragments
>> >> would not exceed MAX_SKB_FRAGS - "usually" just isn't enough
>> >> imo).
>> > 
>> > I think it is more than "usually", it is derived from the number of
>> > pages needed to contain 64K of data which is the maximum size of the
>> > data associated with an skb (AIUI).
>> > 
>> > Unwinding from failure in xennet_make_frags looks pretty tricky,
>> 
>> Yes, I agree.
>> 
>> > but how about this incremental patch:
>> 
>> Looks good, but can probably be simplified quite a bit:
>> 
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, 
> struct net_device *dev,
>> >  	np->tx.req_prod_pvt = prod;
>> >  }
>> >  
>> > +/*
>> > + * Count how many ring slots are required to send the frags of this
>> > + * skb. Each frag might be a compound page.
>> > + */
>> > +static int xennet_count_skb_frag_pages(struct sk_buff *skb)
>> > +{
>> > +	int i, frags = skb_shinfo(skb)->nr_frags;
>> > +	int pages = 0;
>> > +
>> > +	for (i = 0; i < frags; i++) {
>> > +		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>> > +		unsigned long size = skb_frag_size(frag);
>> > +		unsigned long offset = frag->page_offset;
>> > +
>> > +		/* Skip unused frames from start of page */
>> > +		offset &= ~PAGE_MASK;
>> > +
>> > +		while (size > 0) {
>> > +			unsigned long bytes;
>> > +
>> > +			BUG_ON(offset >= PAGE_SIZE);
>> > +
>> > +			bytes = PAGE_SIZE - offset;
>> > +			if (bytes > size)
>> > +				bytes = size;
>> > +
>> > +			offset += bytes;
>> > +			size -= bytes;
>> > +
>> > +			/* Next frame */
>> > +			if (offset == PAGE_SIZE && size) {
>> > +				pages++;
>> > +				offset = 0;
>> > +			}
>> > +		}
>> 
>> Isn't the whole loop equivalent to 
>> 
>> 		pages = PFN_UP(offset + size);
>> 
>> (at least as long as size is not zero)?
> 
> Er, yes. Wood for the trees etc...
> 
> I think using PFN_UP overcounts a bit since the data needed start in the
> first frame of a compound frame, but if you keep the 
>         /* Skip unused frames from start of page */
>         offset &= ~PAGE_MASK;
>         
> I think that does the right thing

Right, that's what I said (I only wanted the loop to be replaced, not
what was prior to it).

> @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>  	grant_ref_t ref;
>  	unsigned long mfn;
>  	int notify;
> -	int frags = skb_shinfo(skb)->nr_frags;
> +	int slots;
>  	unsigned int offset = offset_in_page(data);
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> -		       frags);
> +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +		xennet_count_skb_frag_slots(skb);
> +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {

But still - isn't this wrong now (i.e. can't it now validly exceed the
boundary checked for)?

Jan

> +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> +		       slots);
>  		dump_stack();
>  		goto drop;
>  	}

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell - Nov. 20, 2012, 3:06 p.m.
On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, 
> > struct net_device *dev)
> >  	grant_ref_t ref;
> >  	unsigned long mfn;
> >  	int notify;
> > -	int frags = skb_shinfo(skb)->nr_frags;
> > +	int slots;
> >  	unsigned int offset = offset_in_page(data);
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> > -		       frags);
> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +		xennet_count_skb_frag_slots(skb);
> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> 
> But still - isn't this wrong now (i.e. can't it now validly exceed the
> boundary checked for)?

In practice no because of the property that the number of pages backing
the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
the frags.

Ian.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Nov. 20, 2012, 3:28 p.m.
On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:

> In practice no because of the property that the number of pages backing
> the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> the frags.

Yes, but you can make this test trigger with some hacks from userland
(since the frag allocator is per task instead of per socket), so you
should remove the dump_stack() ?

Best way would be to count exact number of slots.

This could be something like 48 slots for a single skb

(if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000
bytes)

MAX_SKB_FRAGS is really number of frags, while your driver needs a count
of 'order-0' 'frames'



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Beulich - Nov. 20, 2012, 3:44 p.m.
>>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:
>> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, 
>> > struct net_device *dev)
>> >  	grant_ref_t ref;
>> >  	unsigned long mfn;
>> >  	int notify;
>> > -	int frags = skb_shinfo(skb)->nr_frags;
>> > +	int slots;
>> >  	unsigned int offset = offset_in_page(data);
>> >  	unsigned int len = skb_headlen(skb);
>> >  	unsigned long flags;
>> >  
>> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
>> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
>> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
>> > -		       frags);
>> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>> > +		xennet_count_skb_frag_slots(skb);
>> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> 
>> But still - isn't this wrong now (i.e. can't it now validly exceed the
>> boundary checked for)?
> 
> In practice no because of the property that the number of pages backing
> the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> the frags.

So are you saying that there is something in the system
preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER
(or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I
didn't find any. I do notice that __netdev_alloc_frag() currently
never gets called with a size larger than PAGE_SIZE, but
considering that the function just recently got made capable of
that, I'm sure respective users will show up rather sooner than
later.

Jan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell - Nov. 20, 2012, 3:54 p.m.
On Tue, 2012-11-20 at 15:28 +0000, Eric Dumazet wrote:
> On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:
> 
> > In practice no because of the property that the number of pages backing
> > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as
> > the frags.
> 
> Yes, but you can make this test trigger with some hacks from userland
> (since the frag allocator is per task instead of per socket), so you
> should remove the dump_stack() ?
> 
> Best way would be to count exact number of slots.
> 
> This could be something like 48 slots for a single skb
> 
> (if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000
> bytes)
> 
> MAX_SKB_FRAGS is really number of frags, while your driver needs a count
> of 'order-0' 'frames'

The use of MAX_SKB_FRAGS is a bit misleading here, it's really the max
number of slots which the other end will be willing to receive as a
single frame (in the Ethernet sense), as defined by the PV protocol. It
happens to be the same as MAX_SKB_FRAGS (or it is at least
MAX_SKB_FRAGS, I'm not too sure).

I'll nuke the dump_stack() though -- it's not clear what sort of useful
context it would contain anyway.

Ian.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index a12b99a..b744875 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -505,6 +505,29 @@  static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 	np->tx.req_prod_pvt = prod;
 }
 
+/*
+ * Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xennet_count_skb_frag_slots(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		pages += PFN_UP(offset + size);
+	}
+
+	return pages;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -517,15 +540,16 @@  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	grant_ref_t ref;
 	unsigned long mfn;
 	int notify;
-	int frags = skb_shinfo(skb)->nr_frags;
+	int slots;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
-	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
-	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
-		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
-		       frags);
+	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+		xennet_count_skb_frag_slots(skb);
+	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
+		       slots);
 		dump_stack();
 		goto drop;
 	}
@@ -533,7 +557,7 @@  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock_irqsave(&np->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
-		     (frags > 1 && !xennet_can_sg(dev)) ||
+		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
 		spin_unlock_irqrestore(&np->tx_lock, flags);
 		goto drop;