unbounded qemu NetQueues?

Message ID 20130106192356.GB5019@onelab2.iet.unipi.it
State New

Commit Message

Luigi Rizzo Jan. 6, 2013, 7:23 p.m. UTC
Hi,
while testing the tx path in qemu without a network backend connected,
I noticed that qemu_net_queue_send() builds up an unbounded
queue, because the QTAILQ* macros have no notion of a queue length.

As a result, the memory usage of a qemu instance can grow to extremely
large values.

I wonder if it makes sense to implement a hard limit on the size of
NetQueues. The patch below is only a partial implementation,
but probably sufficient for these purposes.

	cheers
	luigi

Comments

Stefan Hajnoczi Jan. 7, 2013, 1:49 p.m. UTC | #1
On Sun, Jan 06, 2013 at 08:23:56PM +0100, Luigi Rizzo wrote:
> Hi,
> while testing the tx path in qemu without a network backend connected,
> I noticed that qemu_net_queue_send() builds up an unbounded
> queue, because the QTAILQ* macros have no notion of a queue length.
> 
> As a result, the memory usage of a qemu instance can grow to extremely
> large values.
> 
> I wonder if it makes sense to implement a hard limit on the size of
> NetQueues. The patch below is only a partial implementation,
> but probably sufficient for these purposes.
> 
> 	cheers
> 	luigi

Hi Luigi,
Good idea: we should bound the queues to prevent rare situations or bugs
from hogging memory.

Ideally we would do away with queues completely and make net clients
hand buffers to each other ahead of time to impose flow control.  I've
thought about this a few times and always reached the conclusion that
it's not quite possible.

The scenario that calls for queuing packets is as follows:

The USB NIC code has a single buffer and it relies on the guest to empty
it before accepting the next packet.  The pcap net client consumes each
packet immediately.  Attach these net clients to a hub.  Since pcap can
always consume but USB NIC is very slow at consuming we have a problem.
This is where we rely on queuing.  Eventually all packets get processed
even by the slow USB NIC and the end result is that pcap and USB NIC
both see the same packets.

I thought about making pcap support built into net.c, but getting rid of
the pcap net client doesn't solve the problem; it just makes it less
extreme.

Any ideas on how to solve this?
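
For reference, the hub fan-out is roughly the following (a simplified
sketch of net/hub.c, not an exact quote):

/* Roughly what net/hub.c does (simplified sketch): every packet is
 * forwarded to all other ports on the hub.  A fast peer (pcap)
 * consumes immediately; a slow peer (the USB NIC) cannot, so
 * qemu_send_packet() falls back to queuing for that port -- and
 * that per-port queue is the one with no bound. */
static ssize_t net_hub_receive(NetHub *hub, NetHubPort *source_port,
                               const uint8_t *buf, size_t len)
{
    NetHubPort *port;

    QLIST_FOREACH(port, &hub->ports, next) {
        if (port == source_port) {
            continue;
        }
        qemu_send_packet(&port->nc, buf, len);
    }
    return len;
}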

> diff -urp qemu-1.3.0-orig/net/queue.c qemu-1.3.0/net/queue.c
> --- qemu-1.3.0-orig/net/queue.c	2012-12-03 20:37:05.000000000 +0100
> +++ qemu-1.3.0/net/queue.c	2013-01-06 19:38:12.000000000 +0100
> @@ -92,6 +92,9 @@ static void qemu_net_queue_append(NetQue

Please also do it for qemu_net_queue_append_iov().

>  {
>      NetPacket *packet;
>  
> +    if (queue->packets.tqh_count > 10000)
> +	return; // XXX drop
> +

sent_cb() must be invoked.  It's best to do this in a QEMUBH in case the
caller is not prepared for reentrancy.
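
One way to do that (a rough sketch; every name here is hypothetical,
and NetQueue would need to grow a dropped list plus a QEMUBH):

/* Hypothetical sketch: on a drop, record the sender and callback and
 * schedule a bottom half, so sent_cb() never runs inside the
 * sender's own send call. */
typedef struct DroppedPacket {
    NetClientState *sender;
    NetPacketSent *sent_cb;
    QTAILQ_ENTRY(DroppedPacket) entry;
} DroppedPacket;

static void net_queue_notify_dropped(void *opaque)
{
    NetQueue *queue = opaque;

    while (!QTAILQ_EMPTY(&queue->dropped)) {
        DroppedPacket *d = QTAILQ_FIRST(&queue->dropped);
        QTAILQ_REMOVE(&queue->dropped, d, entry);
        if (d->sent_cb) {
            d->sent_cb(d->sender, 0);   /* report the drop to the sender */
        }
        g_free(d);
    }
}

/* At queue creation:
 *     queue->dropped_bh = qemu_bh_new(net_queue_notify_dropped, queue);
 * At drop time: append a DroppedPacket to queue->dropped and call
 *     qemu_bh_schedule(queue->dropped_bh);
 */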

>      packet = g_malloc(sizeof(NetPacket) + size);
>      packet->sender = sender;
>      packet->flags = flags;
> diff -urp qemu-1.3.0-orig/qemu-queue.h qemu-1.3.0/qemu-queue.h
> --- qemu-1.3.0-orig/qemu-queue.h	2012-12-03 20:37:05.000000000 +0100
> +++ qemu-1.3.0/qemu-queue.h	2013-01-06 19:34:01.000000000 +0100

Please don't modify qemu-queue.h.  It's a generic queue data structure
used by all of QEMU.  Instead, keep a counter in the NetQueue.
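
A sketch of what that could look like (the limit and the nr_packets
field are illustrative names, not existing code):

/* net/queue.c -- sketch of a counter kept in NetQueue itself. */
#define MAX_QUEUE_SIZE 10000

struct NetQueue {
    void *opaque;
    uint32_t nr_packets;                  /* new: current queue depth */
    QTAILQ_HEAD(packets, NetPacket) packets;
    unsigned delivering:1;
};

static void qemu_net_queue_append(NetQueue *queue,
                                  NetClientState *sender,
                                  unsigned flags,
                                  const uint8_t *buf,
                                  size_t size,
                                  NetPacketSent *sent_cb)
{
    NetPacket *packet;

    if (queue->nr_packets >= MAX_QUEUE_SIZE) {
        return; /* drop; see the sent_cb note above */
    }

    packet = g_malloc(sizeof(NetPacket) + size);
    packet->sender = sender;
    packet->flags = flags;
    packet->size = size;
    packet->sent_cb = sent_cb;
    memcpy(packet->data, buf, size);

    queue->nr_packets++;
    QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
}

/* The matching queue->nr_packets-- goes wherever packets leave the
 * queue (deliver/flush/purge). */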

Stefan
Luigi Rizzo Jan. 7, 2013, 2:27 p.m. UTC | #2
On Mon, Jan 7, 2013 at 2:49 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Sun, Jan 06, 2013 at 08:23:56PM +0100, Luigi Rizzo wrote:
> > Hi,
> > while testing the tx path in qemu without a network backend connected,
> > I noticed that qemu_net_queue_send() builds up an unbounded
> > queue, because the QTAILQ* macros have no notion of a queue length.
> >
> > As a result, the memory usage of a qemu instance can grow to extremely
> > large values.
> >
> > I wonder if it makes sense to implement a hard limit on the size of
> > NetQueues. The patch below is only a partial implementation,
> > but probably sufficient for these purposes.
> >
> >       cheers
> >       luigi
>
> Hi Luigi,
> Good idea, we should bound the queues to prevent rare situations or bugs
> from hogging memory.
>
> Ideally we would do away with queues completely and make net clients
> hand buffers to each other ahead of time to impose flow control.  I've
> thought about this a few times and always reached the conclusion that
> it's not quite possible.
>

Given that implementing flow control on the inbound (host->guest) path
is impossible, I tend to agree that removing the queues is probably
not worth the effort, even if it were possible at all.
...

Your comments below also remind me of a more general issue with the code:

> > diff -urp qemu-1.3.0-orig/net/queue.c qemu-1.3.0/net/queue.c
> > --- qemu-1.3.0-orig/net/queue.c       2012-12-03 20:37:05.000000000 +0100
> > +++ qemu-1.3.0/net/queue.c    2013-01-06 19:38:12.000000000 +0100
> > @@ -92,6 +92,9 @@ static void qemu_net_queue_append(NetQue
>
> Please also do it for qemu_net_queue_append_iov().
>
>
The qemu code has many duplicate functions of the form foo() and foo_iov(),
not only here but also in the backends (e.g. see net/tap.c).
I think it would be good to settle on a single version of the function
and remove or convert the non-iov instances.
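
As an illustration, the flat-buffer variant can usually be reduced to a
one-element iovec wrapper around the _iov version (a sketch against the
net.c functions of that era, untested):

#include <sys/uio.h>

/* Sketch: keep only the iov-based implementation and make the
 * flat-buffer entry point a thin wrapper around it. */
static ssize_t qemu_deliver_packet(NetClientState *sender,
                                   unsigned flags,
                                   const uint8_t *data,
                                   size_t size,
                                   void *opaque)
{
    const struct iovec iov = {
        .iov_base = (void *)data,   /* iov_base is non-const */
        .iov_len  = size,
    };

    return qemu_deliver_packet_iov(sender, flags, &iov, 1, opaque);
}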

> >  {
> >      NetPacket *packet;
> >
> > +    if (queue->packets.tqh_count > 10000)
> > +     return; // XXX drop
> > +
>
> sent_cb() must be invoked.  It's best to do this in a QEMUBH in case the
> caller is not prepared for reentrancy.
>
>
I forgot that, but the use of sent_cb() is interesting:
the only place that actually uses it seems to be net/tap.c,
and the way it uses it only makes sense if the queue has
room!
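
For reference, that usage is roughly the following (a paraphrase of
net/tap.c, not an exact quote): the backend stops reading from the tap
fd when the queue takes over, and the callback re-arms it.

/* Paraphrased from net/tap.c: tap_send() hands each packet to
 * qemu_send_packet_async(); a return of 0 means "queued, wait",
 * so the backend stops reading from the tap fd, and the sent
 * callback re-enables reading once the queued packet has been
 * delivered.  This only works if the queue had room to accept
 * the packet in the first place. */
static void tap_send_completed(NetClientState *nc, ssize_t len)
{
    TAPState *s = DO_UPCAST(TAPState, nc, nc);

    tap_read_poll(s, true);    /* resume reading from the tap device */
}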

> >      packet = g_malloc(sizeof(NetPacket) + size);
> >      packet->sender = sender;
> >      packet->flags = flags;
> > diff -urp qemu-1.3.0-orig/qemu-queue.h qemu-1.3.0/qemu-queue.h
> > --- qemu-1.3.0-orig/qemu-queue.h      2012-12-03 20:37:05.000000000 +0100
> > +++ qemu-1.3.0/qemu-queue.h   2013-01-06 19:34:01.000000000 +0100
>
> Please don't modify qemu-queue.h.  It's a generic queue data structure
> used by all of QEMU.  Instead, keep a counter in the NetQueue.
>
>
Good suggestion; that also makes the change smaller.

cheers
luigi


> Stefan
>
Stefan Hajnoczi Jan. 10, 2013, 12:29 p.m. UTC | #3
On Mon, Jan 07, 2013 at 03:27:30PM +0100, Luigi Rizzo wrote:
> On Mon, Jan 7, 2013 at 2:49 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Sun, Jan 06, 2013 at 08:23:56PM +0100, Luigi Rizzo wrote:
> > > diff -urp qemu-1.3.0-orig/net/queue.c qemu-1.3.0/net/queue.c
> > > --- qemu-1.3.0-orig/net/queue.c       2012-12-03 20:37:05.000000000 +0100
> > > +++ qemu-1.3.0/net/queue.c    2013-01-06 19:38:12.000000000 +0100
> > > @@ -92,6 +92,9 @@ static void qemu_net_queue_append(NetQue
> >
> > Please also do it for qemu_net_queue_append_iov().
> >
> >
> The qemu code has many duplicate functions of the form foo() and foo_iov(),
> not only here but also in the backends (e.g. see net/tap.c).
> I think it would be good to settle on a single version of the function
> and remove or convert the non-iov instances.

Yes.  Patches welcome.

Stefan
Luigi Rizzo Jan. 16, 2013, 9:57 p.m. UTC | #4
On Mon, Jan 7, 2013 at 5:49 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Sun, Jan 06, 2013 at 08:23:56PM +0100, Luigi Rizzo wrote:
> > Hi,
> > while testing the tx path in qemu without a network backend connected,
> > I noticed that qemu_net_queue_send() builds up an unbounded
> > queue, because the QTAILQ* macros have no notion of a queue length.
> >
> > As a result, the memory usage of a qemu instance can grow to extremely
> > large values.
> >
> > I wonder if it makes sense to implement a hard limit on the size of
> > NetQueues. The patch below is only a partial implementation,
> > but probably sufficient for these purposes.
> >
> >       cheers
> >       luigi
>
> Hi Luigi,
> Good idea, we should bound the queues to prevent rare situations or bugs
> from hogging memory.
>

...


> > diff -urp qemu-1.3.0-orig/net/queue.c qemu-1.3.0/net/queue.c
> > --- qemu-1.3.0-orig/net/queue.c       2012-12-03 20:37:05.000000000 +0100
> > +++ qemu-1.3.0/net/queue.c    2013-01-06 19:38:12.000000000 +0100
> > @@ -92,6 +92,9 @@ static void qemu_net_queue_append(NetQue
>
> Please also do it for qemu_net_queue_append_iov().
>
> >  {
> >      NetPacket *packet;
> >
> > +    if (queue->packets.tqh_count > 10000)
> > +     return; // XXX drop
> > +
>
> sent_cb() must be invoked.  It's best to do this in a QEMUBH in case the
> caller is not prepared for reentrancy.
>

Stefan, the semantics of callbacks make it difficult to run them on drops:
they are supposed to run only when the queue has been drained,
and apparently only once per sender, according to the comment
in the header of net/queue.c:

/* The delivery handler may only return zero if it will call
 * qemu_net_queue_flush() when it determines that it is once again able
 * to deliver packets. It must also call qemu_net_queue_purge() in its
 * cleanup path.
 *
 * If a sent callback is provided to send(), the caller must handle a
 * zero return from the delivery handler by not sending any more packets
 * until we have invoked the callback. Only in that case will we queue
 * the packet.
 *
 * If a sent callback isn't provided, we just drop the packet to avoid
 * unbounded queueing.
 */

This seems to suggest that a packet passed to qemu_net_queue_send()
should never be queued unless it has a callback
(hence the existing code is buggy, because it never actually drops packets);
the queue can then hold at most one packet per sender,
so there is no real risk of overflow.
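
In other words, the intended behaviour per that comment would be
something like the sketch below; the actual 1.3 code appends
unconditionally, with or without a callback:

/* Sketch of what the net/queue.c header comment describes; NOT the
 * code as it stands in 1.3, which queues even without a sent_cb. */
ssize_t qemu_net_queue_send(NetQueue *queue, NetClientState *sender,
                            unsigned flags, const uint8_t *data,
                            size_t size, NetPacketSent *sent_cb)
{
    ssize_t ret;

    ret = qemu_net_queue_deliver(queue, sender, flags, data, size);
    if (ret == 0) {
        if (sent_cb) {
            /* Peer promised to flush later; the sender must now wait
             * for sent_cb, so at most one packet per sender is queued. */
            qemu_net_queue_append(queue, sender, flags, data, size, sent_cb);
        }
        /* No callback: drop the packet to avoid unbounded queueing. */
        return 0;
    }

    qemu_net_queue_flush(queue);
    return ret;
}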

cheers
luigi

Patch

diff -urp qemu-1.3.0-orig/net/queue.c qemu-1.3.0/net/queue.c
--- qemu-1.3.0-orig/net/queue.c	2012-12-03 20:37:05.000000000 +0100
+++ qemu-1.3.0/net/queue.c	2013-01-06 19:38:12.000000000 +0100
@@ -92,6 +92,9 @@  static void qemu_net_queue_append(NetQue
 {
     NetPacket *packet;
 
+    if (queue->packets.tqh_count > 10000)
+	return; // XXX drop
+
     packet = g_malloc(sizeof(NetPacket) + size);
     packet->sender = sender;
     packet->flags = flags;
diff -urp qemu-1.3.0-orig/qemu-queue.h qemu-1.3.0/qemu-queue.h
--- qemu-1.3.0-orig/qemu-queue.h	2012-12-03 20:37:05.000000000 +0100
+++ qemu-1.3.0/qemu-queue.h	2013-01-06 19:34:01.000000000 +0100
@@ -320,11 +320,12 @@  struct {                                
 struct name {                                                           \
         qual type *tqh_first;           /* first element */             \
         qual type *qual *tqh_last;      /* addr of last next element */ \
+	int32_t tqh_count;						\
 }
 #define QTAILQ_HEAD(name, type)  Q_TAILQ_HEAD(name, struct type,)
 
 #define QTAILQ_HEAD_INITIALIZER(head)                                   \
-        { NULL, &(head).tqh_first }
+        { NULL, &(head).tqh_first, 0 }
 
 #define Q_TAILQ_ENTRY(type, qual)                                       \
 struct {                                                                \
@@ -339,6 +340,7 @@  struct {                                
 #define QTAILQ_INIT(head) do {                                          \
         (head)->tqh_first = NULL;                                       \
         (head)->tqh_last = &(head)->tqh_first;                          \
+        (head)->tqh_count = 0;						\
 } while (/*CONSTCOND*/0)
 
 #define QTAILQ_INSERT_HEAD(head, elm, field) do {                       \
@@ -348,6 +350,7 @@  struct {                                
         else                                                            \
                 (head)->tqh_last = &(elm)->field.tqe_next;              \
         (head)->tqh_first = (elm);                                      \
+        (head)->tqh_count++;						\
         (elm)->field.tqe_prev = &(head)->tqh_first;                     \
 } while (/*CONSTCOND*/0)
 
@@ -356,6 +359,7 @@  struct {                                
         (elm)->field.tqe_prev = (head)->tqh_last;                       \
         *(head)->tqh_last = (elm);                                      \
         (head)->tqh_last = &(elm)->field.tqe_next;                      \
+        (head)->tqh_count++;						\
 } while (/*CONSTCOND*/0)
 
 #define QTAILQ_INSERT_AFTER(head, listelm, elm, field) do {             \
@@ -381,6 +385,7 @@  struct {                                
                     (elm)->field.tqe_prev;                              \
         else                                                            \
                 (head)->tqh_last = (elm)->field.tqe_prev;               \
+        (head)->tqh_count--;						\
         *(elm)->field.tqe_prev = (elm)->field.tqe_next;                 \
 } while (/*CONSTCOND*/0)