
[V2] xen/netfront: handle compound page fragments on transmit

Message ID 1353428844.2590.17.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Nov. 20, 2012, 4:27 p.m. UTC
On Tue, 2012-11-20 at 16:00 +0000, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
> 
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
> 
> This is the netfront equivalent to 6a8ed462f16b for netback.
...

> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> -		       frags);
> -		dump_stack();
> +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +		xennet_count_skb_frag_slots(skb);
> +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> +		       slots);

I think this is wrong.

You should change netfront_tx_slot_available() to stop the queue before
this can happen.

Yes, you don't hit this in your tests, but a driver should not drop a
good packet.

>  		goto drop;
>  	}
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
>  
>  	if (unlikely(!netif_carrier_ok(dev) ||
> -		     (frags > 1 && !xennet_can_sg(dev)) ||
> +		     (slots > 1 && !xennet_can_sg(dev)) ||
>  		     netif_needs_gso(skb, netif_skb_features(skb)))) {
>  		spin_unlock_irqrestore(&np->tx_lock, flags);
>  		goto drop;




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
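
For readers following the slot accounting discussed above, here is a minimal userspace sketch of how the number of PAGE_SIZE frames covered by a set of paged fragments might be counted. It is not the body of xennet_count_skb_frag_slots() from the patch (which is not shown in this thread); `struct frag`, `frag_slots()` and `count_frag_slots()` are hypothetical stand-ins, and the 4 KiB PAGE_SIZE is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL /* assumption: 4 KiB frames */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for a paged fragment: only what the count needs. */
struct frag {
	unsigned long page_offset; /* offset from the start of the (compound) page */
	unsigned long size;        /* bytes of payload in this fragment */
};

/* Number of PAGE_SIZE frames touched by one fragment. */
static unsigned long frag_slots(const struct frag *f)
{
	/* Frames that lie wholly before the data are never sent, so only
	 * the offset within the first frame carrying data matters. */
	unsigned long offset = f->page_offset % PAGE_SIZE;

	assert(f->size > 0);
	return DIV_ROUND_UP(offset + f->size, PAGE_SIZE);
}

/* Sum of the per-fragment frame counts over all fragments. */
static unsigned long count_frag_slots(const struct frag *frags, size_t n)
{
	unsigned long slots = 0;
	size_t i;

	for (i = 0; i < n; i++)
		slots += frag_slots(&frags[i]);
	return slots;
}
```

Under these assumptions a page-aligned 4096-byte fragment costs one slot, the same fragment at offset 1 costs two, and a 4098-byte fragment starting at offset 4095 costs three, which is why the slot total can exceed the fragment count.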

Comments

Ian Campbell Nov. 21, 2012, 12:08 p.m. UTC | #1
On Tue, 2012-11-20 at 16:27 +0000, Eric Dumazet wrote:
> On Tue, 2012-11-20 at 16:00 +0000, Ian Campbell wrote:
> > An SKB paged fragment can consist of a compound page with order > 0.
> > However the netchannel protocol deals only in PAGE_SIZE frames.
> > 
> > Handle this in xennet_make_frags by iterating over the frames which
> > make up the page.
> > 
> > This is the netfront equivalent to 6a8ed462f16b for netback.
> ...
> 
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> > -		       frags);
> > -		dump_stack();
> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +		xennet_count_skb_frag_slots(skb);
> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> > +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> > +		       slots);
> 
> I think this is wrong.
> 
> You should change netfront_tx_slot_available() to stop the queue before
> this can happen.
>
> Yes, you don't hit this in your tests, but a driver should not drop a
> good packet.

The max-frag related limitation comes from the "wire" protocol used
between front and back. As it stands either the frontend or the backend
is more than likely going to drop the sort of pathological skbs you are
worried about.

I agree that this absolutely needs to be fixed in the protocol (and I've
posted a call to arms on this topic on xen-devel) but I'd like to do it
in a coordinated manner as part of a protocol extension (where the front
and backend negotiate the maximum number of order-0 pages per Ethernet
frame they are willing to handle) rather than as a side effect of this
patch.

So right now I don't want to introduce frontends which default to
sending increased numbers of pages into the wild, since that makes
things more complex when we come to extend the protocol.

Perhaps in the short term doing an skb_linearize when we hit this case
would help, that will turn the pathological skb into a much more normal
one. It'll be expensive but it should be rare. That assumes you can
linearize such a large skb, which depends on the ability to allocate
large order pages which isn't a given. Herm, maybe that doesn't work
then.

AFAIK we don't have an existing skb_foo operation which copies an skb,
including (or only) the frags, with the side effect of aligning and
coalescing them. Do we?

Ian.


Eric Dumazet Nov. 21, 2012, 3:13 p.m. UTC | #2
On Wed, 2012-11-21 at 12:08 +0000, Ian Campbell wrote:
> The max-frag related limitation comes from the "wire" protocol used
> between front and back. As it stands either the frontend or the backend
> is more than likely going to drop the sort of pathological skbs you are
> worried about.
> 
> I agree that this absolutely needs to be fixed in the protocol (and I've
> posted a call to arms on this topic on xen-devel) but I'd like to do it
> in a coordinated manner as part of a protocol extension (where the front
> and backend negotiate the maximum number of order-0 pages per Ethernet
> frame they are willing to handle) rather than as a side effect of this
> patch.
> 
> So right now I don't want to introduce frontends which default to
> sending increased numbers of pages into the wild, since that makes
> things more complex when we come to extend the protocol.
> 
> Perhaps in the short term doing an skb_linearize when we hit this case
> would help, that will turn the pathological skb into a much more normal
> one. It'll be expensive but it should be rare. That assumes you can
> linearize such a large skb, which depends on the ability to allocate
> large order pages which isn't a given. Herm, maybe that doesn't work
> then.
> 

First of all, thanks a lot for all this detailed information.

This now makes sense!


> AFAIK we don't have an existing skb_foo operation which copies an skb,
> including (or only) the frags, with the side effect of aligning and
> coalescing them. Do we?
> 

No, we only have the full linearize helper and the skb_try_coalesce()
helper.

The TCP stack uses an internal function to collapse several skbs into
skbs using a single page. I guess we could generalize this and make it
available to other users.

Thanks !
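
To make the "align and coalesce" idea from this exchange concrete, here is a rough userspace model of copying scattered fragments back to back into page-aligned, page-sized frames. It is only a sketch under stated assumptions: `struct src_frag` and `coalesce_frags()` are hypothetical names, not an existing skb helper and not the TCP-internal function mentioned above, and real kernel code would also have to allocate and map pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL /* assumption: 4 KiB frames */

/* Hypothetical fragment descriptor: a pointer to the payload and its length. */
struct src_frag {
	const unsigned char *data;
	size_t size;
};

/*
 * Copy all fragments back to back into an array of PAGE_SIZE frames,
 * starting at offset 0 of the first frame.
 * Returns the number of frames used, or -1 if the frames do not suffice.
 */
static long coalesce_frags(const struct src_frag *frags, size_t n,
			   unsigned char (*frames)[PAGE_SIZE], size_t nframes)
{
	size_t used = 0; /* total bytes written so far */
	size_t i;

	for (i = 0; i < n; i++) {
		size_t done = 0; /* bytes of this fragment already copied */

		while (done < frags[i].size) {
			size_t frame = used / PAGE_SIZE;
			size_t off = used % PAGE_SIZE;
			size_t chunk = frags[i].size - done;

			if (frame >= nframes)
				return -1;
			/* Never write past the end of the current frame. */
			if (chunk > PAGE_SIZE - off)
				chunk = PAGE_SIZE - off;
			memcpy(&frames[frame][off], frags[i].data + done, chunk);
			done += chunk;
			used += chunk;
		}
	}
	return (long)((used + PAGE_SIZE - 1) / PAGE_SIZE);
}
```

For instance, two 200-byte fragments that each straddle a frame boundary would need four slots if sent as they are, but fit in a single frame after this kind of copy. The cost is a copy of the whole payload, so it only looks attractive as a rare fallback.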



Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..cb1e605 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -215,10 +215,13 @@  static void rx_refill_timeout(unsigned long data)
        napi_schedule(&np->napi);
 }
 
+/* Considering a 64KB packet of 16 frags, each frag can be mapped
+ * to 3 order-0 parts in pathological cases
+ */
 static int netfront_tx_slot_available(struct netfront_info *np)
 {
        return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
-               (TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
+               (TX_MAX_TARGET - 3*MAX_SKB_FRAGS - 2);
 }
 
 static void xennet_maybe_wake_tx(struct net_device *dev)
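
The effect of the changed threshold can be checked with a small userspace model of the comparison. This is a sketch only: the values 256 for TX_MAX_TARGET and 17 for MAX_SKB_FRAGS are assumptions chosen for illustration (the real values come from the ring size and the kernel headers), and `tx_slot_available()` with its `slots_per_frag` parameter is a hypothetical helper, not the driver function.

```c
#include <assert.h>

/* Assumed illustrative values, not taken from this thread. */
#define TX_MAX_TARGET 256
#define MAX_SKB_FRAGS 17

/*
 * Model of the availability check: the queue is considered usable while
 * the number of in-flight requests stays below the reserve threshold.
 * slots_per_frag is 1 for the old check and 3 for the patched one.
 */
static int tx_slot_available(unsigned int req_prod_pvt, unsigned int rsp_cons,
			     int slots_per_frag)
{
	/* Unsigned subtraction stays correct when the ring indices wrap. */
	unsigned int in_flight = req_prod_pvt - rsp_cons;
	unsigned int limit = (unsigned int)(TX_MAX_TARGET -
					    slots_per_frag * MAX_SKB_FRAGS - 2);

	return in_flight < limit;
}
```

With these assumed values the old check allows up to 236 requests in flight and the patched one up to 202, so the queue stops earlier and leaves room for a worst-case skb whose fragments each span three frames.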