
[net-next] xen-netfront: try linearizing SKB if it occupies too many slots

Message ID 1400238496-2471-1-git-send-email-wei.liu2@citrix.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Wei Liu May 16, 2014, 11:08 a.m. UTC
Some workloads, such as Redis, can generate SKBs which make use of
compound pages. Netfront doesn't quite like that because it doesn't want
to send packets that occupy excessive slots to the backend, as the backend
might deem them malicious. On the flip side these packets are actually
legitimate; the size check at the beginning of xennet_start_xmit ensures
that the packet size is below 64K.

So we linearize the SKB if it occupies too many slots. If the linearization
fails then the SKB is dropped.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netfront.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

Eric Dumazet May 16, 2014, 1:04 p.m. UTC | #1
On Fri, 2014-05-16 at 12:08 +0100, Wei Liu wrote:
> Some workloads, such as Redis, can generate SKBs which make use of
> compound pages. Netfront doesn't quite like that because it doesn't want
> to send packets that occupy excessive slots to the backend, as the backend
> might deem them malicious. On the flip side these packets are actually
> legitimate; the size check at the beginning of xennet_start_xmit ensures
> that the packet size is below 64K.
> 
> So we linearize the SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)

This is likely to fail on a typical host.

What about adding a smart helper that tries to aggregate consecutive
small fragments into a single frag?

This would be needed for bnx2x as well, for example.
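
A minimal sketch of the kind of helper being suggested, written against the
frag accessors available at the time. The function name, the pairwise-merge
strategy and the GFP_ATOMIC allocation are illustrative assumptions, not an
existing kernel API, and highmem mapping, cloned-skb handling and
skb->truesize accounting are deliberately left out:

/* Sketch only: merge adjacent frags that fit together in one page so
 * the skb consumes fewer ring slots. */
static int skb_coalesce_small_frags(struct sk_buff *skb)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	int i, j;

	for (i = 0; i + 1 < shinfo->nr_frags; i++) {
		skb_frag_t *a = &shinfo->frags[i];
		skb_frag_t *b = &shinfo->frags[i + 1];
		unsigned int len = skb_frag_size(a) + skb_frag_size(b);
		struct page *page;

		/* Only merge neighbours that fit in a single page. */
		if (len > PAGE_SIZE)
			continue;

		page = alloc_page(GFP_ATOMIC);
		if (!page)
			return -ENOMEM;

		/* Copy both fragments into the fresh page... */
		memcpy(page_address(page), skb_frag_address(a),
		       skb_frag_size(a));
		memcpy(page_address(page) + skb_frag_size(a),
		       skb_frag_address(b), skb_frag_size(b));

		/* ...then drop the old pages and install the merged one. */
		skb_frag_unref(skb, i);
		skb_frag_unref(skb, i + 1);
		__skb_frag_set_page(a, page);
		a->page_offset = 0;
		skb_frag_size_set(a, len);

		/* Close the hole left by the consumed fragment. */
		for (j = i + 1; j + 1 < shinfo->nr_frags; j++)
			shinfo->frags[j] = shinfo->frags[j + 1];
		shinfo->nr_frags--;
	}
	return 0;
}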


Wei Liu May 16, 2014, 1:11 p.m. UTC | #2
On Fri, May 16, 2014 at 06:04:34AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 12:08 +0100, Wei Liu wrote:
> > Some workloads, such as Redis, can generate SKBs which make use of
> > compound pages. Netfront doesn't quite like that because it doesn't want
> > to send packets that occupy excessive slots to the backend, as the backend
> > might deem them malicious. On the flip side these packets are actually
> > legitimate; the size check at the beginning of xennet_start_xmit ensures
> > that the packet size is below 64K.
> > 
> > So we linearize the SKB if it occupies too many slots. If the linearization
> > fails then the SKB is dropped.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Cc: David Vrabel <david.vrabel@citrix.com>
> > Cc: Konrad Wilk <konrad.wilk@oracle.com>
> > Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > Cc: Stefan Bader <stefan.bader@canonical.com>
> > Cc: Zoltan Kiss <zoltan.kiss@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> This is likely to fail on a typical host.
> 

It's not that common to trigger this; I have only seen a few reports. In fact
Stefan's report is the first one that comes with a method to reproduce
it.

I tested with redis-benchmark on a guest with 256MB RAM and only saw a
few "failed to linearize" messages; I never saw a single one with a 1GB guest.

> What about adding a smart helper that tries to aggregate consecutive
> small fragments into a single frag?
> 

Ideally that would be the better approach, but I'm afraid I won't be able to
look into it until early / mid June.

Wei.

> This would be needed for bnx2x as well, for example.
> 
Eric Dumazet May 16, 2014, 2:21 p.m. UTC | #3
On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:

> It's not that common to trigger this; I have only seen a few reports. In fact
> Stefan's report is the first one that comes with a method to reproduce
> it.
> 
> I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> few "failed to linearize" messages; I never saw a single one with a 1GB guest.

Well, I am just saying. This is asking for order-5 allocations (a linearized
~64KB skb needs a contiguous buffer just over 64KB, which the allocator rounds
up to 128KB, i.e. 32 contiguous pages), and yes, this is going to fail after a
few days of uptime, no matter what you try.



Wei Liu May 16, 2014, 2:36 p.m. UTC | #4
On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> 
> > It's not that common to trigger this; I have only seen a few reports. In fact
> > Stefan's report is the first one that comes with a method to reproduce
> > it.
> > 
> > I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> > few "failed to linearize" messages; I never saw a single one with a 1GB guest.
> 
> Well, I am just saying. This is asking for order-5 allocations, and yes,
> this is going to fail after a few days of uptime, no matter what you try.
> 

Hmm... I see what you mean -- memory fragmentation leads to allocation
failure. Thanks.

> 
Eric Dumazet May 16, 2014, 3:22 p.m. UTC | #5
On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> > On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> > 
> > > It's not that common to trigger this; I have only seen a few reports. In fact
> > > Stefan's report is the first one that comes with a method to reproduce
> > > it.
> > > 
> > > I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> > > few "failed to linearize" messages; I never saw a single one with a 1GB guest.
> > 
> > Well, I am just saying. This is asking for order-5 allocations, and yes,
> > this is going to fail after a few days of uptime, no matter what you try.
> > 
> 
> Hmm... I see what you mean -- memory fragmentation leads to allocation
> failure. Thanks.

In the meantime, have you tried lowering gso_max_size?

Setting it with netif_set_gso_max_size() to something like 56000 might
avoid the problem.

(Not sure if it is applicable in your case)
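
A minimal sketch of what that suggestion could look like on the netfront side.
Only netif_set_gso_max_size() and the 56000 figure come from this thread; the
call site (somewhere in the frontend's device setup path, alongside the other
netdev feature configuration) is an assumption:

/* Hypothetical placement: cap the GSO size while setting up the frontend's
 * net_device so the stack never builds a TSO packet big enough to overflow
 * the slot limit. */
netif_set_gso_max_size(netdev, 56000);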



Wei Liu May 16, 2014, 3:34 p.m. UTC | #6
On Fri, May 16, 2014 at 08:22:19AM -0700, Eric Dumazet wrote:
> On Fri, 2014-05-16 at 15:36 +0100, Wei Liu wrote:
> > On Fri, May 16, 2014 at 07:21:08AM -0700, Eric Dumazet wrote:
> > > On Fri, 2014-05-16 at 14:11 +0100, Wei Liu wrote:
> > > 
> > > > It's not that common to trigger this; I have only seen a few reports. In fact
> > > > Stefan's report is the first one that comes with a method to reproduce
> > > > it.
> > > > 
> > > > I tested with redis-benchmark on a guest with 256MB RAM and only saw a
> > > > few "failed to linearize" messages; I never saw a single one with a 1GB guest.
> > > 
> > > Well, I am just saying. This is asking for order-5 allocations, and yes,
> > > this is going to fail after a few days of uptime, no matter what you try.
> > > 
> > 
> > Hmm... I see what you mean -- memory fragmentation leads to allocation
> > failure. Thanks.
> 
> In the meantime, have you tried lowering gso_max_size?
> 
> Setting it with netif_set_gso_max_size() to something like 56000 might
> avoid the problem.
> 
> (Not sure if it is applicable in your case)
> 

It works, at least in this Redis testcase. Could you explain a bit where
this 56000 magic number comes from? :-)

Presumably I can derive it from some constant in core network code?

Wei.

> 

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 895355d..b378dcd 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -573,9 +573,20 @@  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (skb_linearize(skb)) {
+			net_alert_ratelimited(
+				"xennet: failed to linearize skb, skb dropped\n");
+			goto drop;
+		}
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
+		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: still too many slots after linearization: %d\n", slots);
+			goto drop;
+		}
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);