RFC: handling "backend too fast" in virtio-net

Message ID 20130214182157.GA44257@onelab2.iet.unipi.it
State New

Commit Message

Luigi Rizzo Feb. 14, 2013, 6:21 p.m. UTC
Virtio-style network devices (where the producer and consumer chase
each other through a shared memory block) can enter a bad operating
regime when the consumer is too fast.
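
Schematically, the consumer side of the usual notification-suppression
pattern looks like this (pseudo-code, not actual QEMU code; ring_pop(),
ring_empty() and consume() are stand-ins for the virtqueue accessors):

    for (;;) {
        while ((pkt = ring_pop()) != NULL)  /* drain the shared ring */
            consume(pkt);
        enable_notifications();  /* ask the producer to kick us again */
        if (ring_empty())
            break;               /* sleep until the (expensive) kick */
        disable_notifications(); /* race check: more work arrived */
    }

When the consumer outruns the producer, the ring is almost always
empty, so notifications are re-enabled after only a handful of packets
and the producer pays for a kick each time.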

I am hitting this case right now when virtio is used on top of the
netmap/VALE backend that I posted a few weeks ago: the backend is so
fast that the io thread keeps re-enabling notifications every few
packets.  This results, on my test machine, in a throughput of
250-300 Kpps (which is also extremely unstable, oscillating between
200 and 600 Kpps).

I'd like to get some feedback on the following trivial patch, which has
the thread keep spinning for a bounded number of retries when the producer
is slower than the consumer. This gives a relatively stable throughput
between 700 and 800 Kpps (we have something similar in our paravirtualized
e1000 driver, which is slightly faster, at 900-1100 Kpps).

EXISTING LOGIC: reschedule the bh when at least a full burst of packets
   has been flushed. Otherwise enable notifications and do another
   race-prevention check.

NEW LOGIC: when the bh is scheduled for the first time, establish a
   budget (currently 20) of reschedule attempts.
   Every time virtio_net_flush_tx() returns 0 packets [this could
   be changed to 'less than a full burst'] the counter is incremented.
   The bh is rescheduled until the counter reaches the budget,
   at which point we re-enable notifications as above.

I am not 100% happy with this patch because there is a "magic"
constant (the maximum number of retries) which should really be
adaptive; perhaps we should even bound the amount of time the
bh runs, rather than the number of retries.
In practice, having the thread spin (or sleep) for 10-20 us
even without new packets is probably preferable to re-enabling
notifications and requesting a kick.
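
A time-bounded variant could look roughly like this (untested sketch;
tx_spin_until would be a new VirtIONetQueue field, set to now + 10-20 us
wherever the bh is first kicked):

    /* untested sketch: bound the spinning by time rather than by a
     * retry budget.  tx_spin_until is a hypothetical new field, set
     * on the guest kick, e.g.:
     *     q->tx_spin_until = qemu_get_clock_ns(vm_clock) + 20000;
     */
    if (ret >= n->tx_burst ||
        qemu_get_clock_ns(vm_clock) < q->tx_spin_until) {
        qemu_bh_schedule(q->tx_bh);
        q->tx_waiting = 1;
        return;
    }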

Opinions?

cheers
luigi

Comments

Stefan Hajnoczi Feb. 15, 2013, 10:24 a.m. UTC | #1
On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:

CCed Michael Tsirkin

> virtio-style network devices (where the producer and consumer chase
> each other through a shared memory block) can enter into a
> bad operating regime when the consumer is too fast.
> 
> I am hitting this case right now when virtio is used on top of the
> netmap/VALE backend that I posted a few weeks ago: what happens is that
> the backend is so fast that the io thread keeps re-enabling notifications
> every few packets.  This results, on my test machine, in a throughput of
> 250-300Kpps (and extremely unstable, oscillating between 200 and 600Kpps).
> 
> I'd like to get some feedback on the following trivial patch to have
> the thread keep spinning for a bounded amount of time when the producer
> is slower than the consumer. This gives a relatively stable throughput
> between 700 and 800 Kpps (we have something similar in our paravirtualized
> e1000 driver, which is slightly faster at 900-1100 Kpps).

Did you experiment with tx timer instead of bh?  It seems that
hw/virtio-net.c has two tx mitigation strategies - the bh approach that
you've tweaked and a true timer.

It seems you don't really want tx batching but you do want to avoid
guest->host notifications?
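
For reference, the timer strategy defers the flush instead of spinning;
schematically it works like this (a loose paraphrase of the
hw/virtio-net.c code of this period, details omitted):

    /* on a guest kick: suppress further notifications and arm a
     * timer instead of flushing right away */
    virtio_queue_set_notification(q->tx_vq, 0);
    qemu_mod_timer(q->tx_timer,
                   qemu_get_clock_ns(vm_clock) + n->tx_timeout);
    q->tx_waiting = 1;

    /* when the timer fires: flush whatever accumulated in the ring,
     * batching all packets queued during the timeout */
    virtio_queue_set_notification(q->tx_vq, 1);
    q->tx_waiting = 0;
    virtio_net_flush_tx(q);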

> EXISTING LOGIC: reschedule the bh when at least a burst of packets has
>    been received. Otherwise enable notifications and do another
>    race-prevention check.
> 
> NEW LOGIC: when the bh is scheduled the first time, establish a
>    budget (currently 20) of reschedule attempts.
>    Every time virtio_net_flush_tx() returns 0 packets [this could 
>    be changed to 'less than a full burst'] the counter is increased.
>    The bh is rescheduled until the counter reaches the budget,
>    at which point we re-enable notifications as above.
> 
> I am not 100% happy with this patch because there is a "magic"
> constant (the maximum number of retries) which should be really
> adaptive, or perhaps we should even bound the amount of time the
> bh runs, rather than the number of retries.
> In practice, having the thread spin (or sleep) for 10-20us 
> even without new packets is probably preferable to 
> re-enable notifications and request a kick.
> 
> opinions ?
> 
> cheers
> luigi
> 
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 573c669..5389088 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -49,6 +51,7 @@ typedef struct VirtIONet
>      NICState *nic;
>      uint32_t tx_timeout;
>      int32_t tx_burst;
> +    int32_t tx_retries; /* keep spinning a bit on tx */
>      uint32_t has_vnet_hdr;
>      size_t host_hdr_len;
>      size_t guest_hdr_len;
> @@ -1062,7 +1065,9 @@ static void virtio_net_tx_bh(void *opaque)
> 
>      /* If we flush a full burst of packets, assume there are
>       * more coming and immediately reschedule */
> -    if (ret >= n->tx_burst) {
> +    if (ret == 0)
> +       n->tx_retries++;
> +    if (n->tx_retries < 20) {
>          qemu_bh_schedule(q->tx_bh);
>          q->tx_waiting = 1;
>          return;
> @@ -1076,6 +1082,8 @@ static void virtio_net_tx_bh(void *opaque)
>          virtio_queue_set_notification(q->tx_vq, 0);
>          qemu_bh_schedule(q->tx_bh);
>          q->tx_waiting = 1;
> +    } else {
> +       n->tx_retries = 0;
>      }
>  }
> 
> 
>
Luigi Rizzo Feb. 15, 2013, 9:56 p.m. UTC | #2
On Fri, Feb 15, 2013 at 11:24:29AM +0100, Stefan Hajnoczi wrote:
> On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:
> 
> CCed Michael Tsirkin
> 
> > virtio-style network devices (where the producer and consumer chase
> > each other through a shared memory block) can enter into a
> > bad operating regime when the consumer is too fast.
> > 
> > I am hitting this case right now when virtio is used on top of the
> > netmap/VALE backend that I posted a few weeks ago: what happens is that
> > the backend is so fast that the io thread keeps re-enabling notifications
> > every few packets.  This results, on my test machine, in a throughput of
> > 250-300Kpps (and extremely unstable, oscillating between 200 and 600Kpps).
> > 
> > I'd like to get some feedback on the following trivial patch to have
> > the thread keep spinning for a bounded amount of time when the producer
> > is slower than the consumer. This gives a relatively stable throughput
> > between 700 and 800 Kpps (we have something similar in our paravirtualized
> > e1000 driver, which is slightly faster at 900-1100 Kpps).
> 
> Did you experiment with tx timer instead of bh?  It seems that
> hw/virtio-net.c has two tx mitigation strategies - the bh approach that
> you've tweaked and a true timer.
> 
> It seems you don't really want tx batching but you do want to avoid
> guest->host notifications?

I have just tried:

    bh (with my tweaks):       700-800 Kpps (large variance)
    timer (150 us, 256 slots): ~490 Kpps (very stable)

As expected, the throughput in the timer version seems proportional
to ring_size/timeout: e.g. 1 ms and 256 slots give approx. 210 Kpps,
and 1 ms and 128 slots give 108 Kpps. But it saturates around 490 Kpps.
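
Back-of-the-envelope: the timer strategy drains at most ring_size
packets per timeout period, so the expected ceiling is
ring_size/timeout:

    256 slots / 1 ms   = 256 Kpps   (measured ~210 Kpps)
    128 slots / 1 ms   = 128 Kpps   (measured ~108 Kpps)
    256 slots / 150 us ~ 1.7 Mpps   (measured only ~490 Kpps, so at
                                     short timeouts some other per-packet
                                     cost presumably becomes the bottleneck)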

cheers
luigi

> > EXISTING LOGIC: reschedule the bh when at least a burst of packets has
> >    been received. Otherwise enable notifications and do another
> >    race-prevention check.
> > 
> > NEW LOGIC: when the bh is scheduled the first time, establish a
> >    budget (currently 20) of reschedule attempts.
> >    Every time virtio_net_flush_tx() returns 0 packets [this could 
> >    be changed to 'less than a full burst'] the counter is increased.
> >    The bh is rescheduled until the counter reaches the budget,
> >    at which point we re-enable notifications as above.
> > 
> > I am not 100% happy with this patch because there is a "magic"
> > constant (the maximum number of retries) which should be really
> > adaptive, or perhaps we should even bound the amount of time the
> > bh runs, rather than the number of retries.
> > In practice, having the thread spin (or sleep) for 10-20us 
> > even without new packets is probably preferable to 
> > re-enable notifications and request a kick.
> > 
> > opinions ?
> > 
> > cheers
> > luigi
> > 
> > diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> > index 573c669..5389088 100644
> > --- a/hw/virtio-net.c
> > +++ b/hw/virtio-net.c
> > @@ -49,6 +51,7 @@ typedef struct VirtIONet
> >      NICState *nic;
> >      uint32_t tx_timeout;
> >      int32_t tx_burst;
> > +    int32_t tx_retries; /* keep spinning a bit on tx */
> >      uint32_t has_vnet_hdr;
> >      size_t host_hdr_len;
> >      size_t guest_hdr_len;
> > @@ -1062,7 +1065,9 @@ static void virtio_net_tx_bh(void *opaque)
> > 
> >      /* If we flush a full burst of packets, assume there are
> >       * more coming and immediately reschedule */
> > -    if (ret >= n->tx_burst) {
> > +    if (ret == 0)
> > +       n->tx_retries++;
> > +    if (n->tx_retries < 20) {
> >          qemu_bh_schedule(q->tx_bh);
> >          q->tx_waiting = 1;
> >          return;
> > @@ -1076,6 +1082,8 @@ static void virtio_net_tx_bh(void *opaque)
> >          virtio_queue_set_notification(q->tx_vq, 0);
> >          qemu_bh_schedule(q->tx_bh);
> >          q->tx_waiting = 1;
> > +    } else {
> > +       n->tx_retries = 0;
> >      }
> >  }
> > 
> > 
> >
Stefan Hajnoczi Feb. 18, 2013, 9:50 a.m. UTC | #3
On Fri, Feb 15, 2013 at 11:24:29AM +0100, Stefan Hajnoczi wrote:
> On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:
> 
> CCed Michael Tsirkin
> 
> > virtio-style network devices (where the producer and consumer chase
> > each other through a shared memory block) can enter into a
> > bad operating regime when the consumer is too fast.
> > 
> > I am hitting this case right now when virtio is used on top of the
> > netmap/VALE backend that I posted a few weeks ago: what happens is that
> > the backend is so fast that the io thread keeps re-enabling notifications
> > every few packets.  This results, on my test machine, in a throughput of
> > 250-300Kpps (and extremely unstable, oscillating between 200 and 600Kpps).
> > 
> > I'd like to get some feedback on the following trivial patch to have
> > the thread keep spinning for a bounded amount of time when the producer
> > is slower than the consumer. This gives a relatively stable throughput
> > between 700 and 800 Kpps (we have something similar in our paravirtualized
> > e1000 driver, which is slightly faster at 900-1100 Kpps).
> 
> Did you experiment with tx timer instead of bh?  It seems that
> hw/virtio-net.c has two tx mitigation strategies - the bh approach that
> you've tweaked and a true timer.
> 
> It seems you don't really want tx batching but you do want to avoid
> guest->host notifications?

One more thing I forgot: virtio-net does not use ioeventfd by default.
ioeventfd changes the cost of guest->host notifications because the
notification becomes an eventfd signal inside the kernel and kvm.ko then
re-enters the guest.

This means a guest->host notification becomes a light-weight exit and we
don't return from ioctl(KVM_RUN).
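
Concretely, QEMU's wrappers boil down to a KVM_IOEVENTFD ioctl; a
minimal sketch (vm_fd, notify_addr and queue_index are placeholders,
and you'd need <linux/kvm.h> and <sys/eventfd.h>):

    /* ask kvm.ko to complete a guest write of queue_index to the
     * virtio notify register by signalling an eventfd, without
     * exiting to userspace */
    struct kvm_ioeventfd kick = {
        .addr      = notify_addr,  /* PIO addr of VIRTIO_PCI_QUEUE_NOTIFY */
        .len       = 2,
        .datamatch = queue_index,
        .fd        = eventfd(0, 0),
        .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
    };
    ioctl(vm_fd, KVM_IOEVENTFD, &kick);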

Perhaps -device virtio-net-pci,ioeventfd=on will give similar results to
your patch?

Stefan
Luigi Rizzo Feb. 18, 2013, 10:12 a.m. UTC | #4
On Mon, Feb 18, 2013 at 1:50 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, Feb 15, 2013 at 11:24:29AM +0100, Stefan Hajnoczi wrote:
> > On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:
> >
> > CCed Michael Tsirkin
> >
> > > virtio-style network devices (where the producer and consumer chase
> > > each other through a shared memory block) can enter into a
> > > bad operating regime when the consumer is too fast.
> > >
> > > I am hitting this case right now when virtio is used on top of the
> > > netmap/VALE backend that I posted a few weeks ago: what happens is that
> > > the backend is so fast that the io thread keeps re-enabling
> > > notifications every few packets.  This results, on my test machine,
> > > in a throughput of 250-300Kpps (and extremely unstable, oscillating
> > > between 200 and 600Kpps).
> > >
> > > I'd like to get some feedback on the following trivial patch to have
> > > the thread keep spinning for a bounded amount of time when the producer
> > > is slower than the consumer. This gives a relatively stable throughput
> > > between 700 and 800 Kpps (we have something similar in our
> > > paravirtualized e1000 driver, which is slightly faster at
> > > 900-1100 Kpps).
> >
> > Did you experiment with tx timer instead of bh?  It seems that
> > hw/virtio-net.c has two tx mitigation strategies - the bh approach that
> > you've tweaked and a true timer.
> >
> > It seems you don't really want tx batching but you do want to avoid
> > guest->host notifications?
>
> One more thing I forgot: virtio-net does not use ioeventfd by default.
> ioeventfd changes the cost of guest->host notifications because the
> notification becomes an eventfd signal inside the kernel and kvm.ko then
> re-enters the guest.
>
> This means a guest->host notification becomes a light-weight exit and we
> don't return from ioctl(KVM_RUN).
>
> Perhaps -device virtio-net-pci,ioeventfd=on will give similar results to
> your patch?
>

Is ioeventfd the mechanism used by vhost-net?
If so, Giuseppe Lettieri (in Cc) has tried that with
a modified netmap backend and experienced the same
problem -- making the io thread (userspace or kernel)
spin a bit more has a great benefit on throughput.

cheers
luigi
Stefan Hajnoczi Feb. 18, 2013, 2:02 p.m. UTC | #5
On Mon, Feb 18, 2013 at 02:12:13AM -0800, Luigi Rizzo wrote:
> On Mon, Feb 18, 2013 at 1:50 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Fri, Feb 15, 2013 at 11:24:29AM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Feb 14, 2013 at 07:21:57PM +0100, Luigi Rizzo wrote:
> > >
> > > CCed Michael Tsirkin
> > >
> > > > virtio-style network devices (where the producer and consumer chase
> > > > each other through a shared memory block) can enter into a
> > > > bad operating regime when the consumer is too fast.
> > > >
> > > > I am hitting this case right now when virtio is used on top of the
> > > > netmap/VALE backend that I posted a few weeks ago: what happens is that
> > > > the backend is so fast that the io thread keeps re-enabling
> > > > notifications every few packets.  This results, on my test machine,
> > > > in a throughput of 250-300Kpps (and extremely unstable, oscillating
> > > > between 200 and 600Kpps).
> > > >
> > > > I'd like to get some feedback on the following trivial patch to have
> > > > the thread keep spinning for a bounded amount of time when the producer
> > > > is slower than the consumer. This gives a relatively stable throughput
> > > > between 700 and 800 Kpps (we have something similar in our
> > > > paravirtualized e1000 driver, which is slightly faster at
> > > > 900-1100 Kpps).
> > >
> > > Did you experiment with tx timer instead of bh?  It seems that
> > > hw/virtio-net.c has two tx mitigation strategies - the bh approach that
> > > you've tweaked and a true timer.
> > >
> > > It seems you don't really want tx batching but you do want to avoid
> > > guest->host notifications?
> >
> > One more thing I forgot: virtio-net does not use ioeventfd by default.
> > ioeventfd changes the cost of guest->host notifications because the
> > notification becomes an eventfd signal inside the kernel and kvm.ko then
> > re-enters the guest.
> >
> > This means a guest->host notification becomes a light-weight exit and we
> > don't return from ioctl(KVM_RUN).
> >
> > Perhaps -device virtio-net-pci,ioeventfd=on will give similar results to
> > your patch?
> >
> 
> is the ioeventfd the mechanism used by vhostnet ?
> If so, Giuseppe Lettieri (in Cc) has tried that with
> a modified netmap backend and experienced the same
> problem -- making the io thread (user or kernel)
> spin a bit more has great benefit on the throughput.

Yes, ioeventfd is how the vhost_net kernel module is notified by the
guest.

It's easy to test: just use -device virtio-net-pci,ioeventfd=on,...
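
For example, something along these lines (the tap settings are just an
illustration, adjust to your setup):

    qemu-system-x86_64 -enable-kvm -m 1024 \
        -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
        -device virtio-net-pci,netdev=net0,ioeventfd=on \
        guest.img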

Stefan

Patch

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 573c669..5389088 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -49,6 +51,7 @@  typedef struct VirtIONet
     NICState *nic;
     uint32_t tx_timeout;
     int32_t tx_burst;
+    int32_t tx_retries; /* keep spinning a bit on tx */
     uint32_t has_vnet_hdr;
     size_t host_hdr_len;
     size_t guest_hdr_len;
@@ -1062,7 +1065,9 @@  static void virtio_net_tx_bh(void *opaque)

     /* If we flush a full burst of packets, assume there are
      * more coming and immediately reschedule */
-    if (ret >= n->tx_burst) {
+    if (ret == 0)
+       n->tx_retries++;
+    if (n->tx_retries < 20) {
         qemu_bh_schedule(q->tx_bh);
         q->tx_waiting = 1;
         return;
@@ -1076,6 +1082,8 @@  static void virtio_net_tx_bh(void *opaque)
         virtio_queue_set_notification(q->tx_vq, 0);
         qemu_bh_schedule(q->tx_bh);
         q->tx_waiting = 1;
+    } else {
+       n->tx_retries = 0;
     }
 }