diff mbox

macvtap: Limit packet queue length

Message ID 20100722064157.GA25913@gondor.apana.org.au
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu July 22, 2010, 6:41 a.m. UTC
Hi:

macvtap: Limit packet queue length

Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length.  This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, that means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


Cheers,

Comments

Shirley Ma July 22, 2010, 3:59 p.m. UTC | #1
On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
>  {
>         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
>         if (!q)
> -               return -ENOLINK;
> +               goto drop;
> +
> +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
> dev->tx_queue_len)
> +               goto drop;
> 

Do we need to orphan skb here, just like tun?

>         skb_queue_tail(&q->sk.sk_receive_queue, skb);
>         wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN |
> POLLRDNORM | POLLRDBAND);
> -       return 0;
> +       return NET_RX_SUCCESS;
> +
> +drop:

Do we need to increase dropped++ counter here to let user know there are
packets dropped?

> +       kfree_skb(skb);
> +       return NET_RX_DROP;
>  }
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 22, 2010, 4:07 p.m. UTC | #2
On Thu, Jul 22, 2010 at 08:59:58AM -0700, Shirley Ma wrote:
> On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
> >  {
> >         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
> >         if (!q)
> > -               return -ENOLINK;
> > +               goto drop;
> > +
> > +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
> > dev->tx_queue_len)
> > +               goto drop;
> > 
> 
> Do we need to orphan skb here, just like tun?

We could, but that is orthogonal to the problem at hand so feel
free to do that in another patch.

> >         skb_queue_tail(&q->sk.sk_receive_queue, skb);
> >         wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN |
> > POLLRDNORM | POLLRDBAND);
> > -       return 0;
> > +       return NET_RX_SUCCESS;
> > +
> > +drop:
> 
> Do we need to increase dropped++ counter here to let user know there are
> packets dropped?

The caller is supposed to handle this.

Cheers,
David Miller July 22, 2010, 7:58 p.m. UTC | #3
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 23 Jul 2010 00:07:31 +0800

> On Thu, Jul 22, 2010 at 08:59:58AM -0700, Shirley Ma wrote:
>> On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
>> >  {
>> >         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
>> >         if (!q)
>> > -               return -ENOLINK;
>> > +               goto drop;
>> > +
>> > +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
>> > dev->tx_queue_len)
>> > +               goto drop;
>> > 
>> 
>> Do we need to orphan skb here, just like tun?
> 
> We could, but that is orthogonal to the problem at hand so feel
> free to do that in another patch.

These days, the stack pre-orphans all packets sent to ->ndo_start_xmit()
in dev_hard_start_xmit() as long as socket based TX timestamping is not
active for the packet.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann July 23, 2010, 7:28 a.m. UTC | #4
On Thursday 22 July 2010, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Fri, 23 Jul 2010 00:07:31 +0800
> 
> > On Thu, Jul 22, 2010 at 08:59:58AM -0700, Shirley Ma wrote:
> >> On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
> >> >  {
> >> >         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
> >> >         if (!q)
> >> > -               return -ENOLINK;
> >> > +               goto drop;
> >> > +
> >> > +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
> >> > dev->tx_queue_len)
> >> > +               goto drop;
> >> > 
> >> 
> >> Do we need to orphan skb here, just like tun?
> > 
> > We could, but that is orthogonal to the problem at hand so feel
> > free to do that in another patch.
> 
> These days, the stack pre-orphans all packets sent to ->ndo_start_xmit()
> in dev_hard_start_xmit() as long as socket based TX timestamping is not
> active for the packet.

But this is the receive path, not transmit, so a packet coming from an
external NIC never goes through dev_hard_start_xmit.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 23, 2010, 7:58 a.m. UTC | #5
On Fri, Jul 23, 2010 at 09:28:08AM +0200, Arnd Bergmann wrote:
>
> But this is the receive path, not transmit, so a packet coming from an
> external NIC never goes through dev_hard_start_xmit.

In that case skb->sk is probably coming from tuntap, which means
that orphaning the skb is not necessarily a good thing.

Cheers,
diff mbox

Patch

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index a8a94e2..488d3b9 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -180,11 +180,18 @@  static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
 {
 	struct macvtap_queue *q = macvtap_get_queue(dev, skb);
 	if (!q)
-		return -ENOLINK;
+		goto drop;
+
+	if (skb_queue_len(&q->sk.sk_receive_queue) >= dev->tx_queue_len)
+		goto drop;
 
 	skb_queue_tail(&q->sk.sk_receive_queue, skb);
 	wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
-	return 0;
+	return NET_RX_SUCCESS;
+
+drop:
+	kfree_skb(skb);
+	return NET_RX_DROP;
 }
 
 /*