[0/2] virtio-pci: fix abort when fail to allocate ioeventfd

Message ID 385557e6-411b-4237-8c95-f6c1dfc858fe@zmail13.collab.prod.int.phx2.redhat.com
State New

Commit Message

Amos Kong March 13, 2012, 2:47 p.m. UTC
----- Original Message -----
> On Tue, Mar 13, 2012 at 11:51 AM, Amos Kong <akong@redhat.com> wrote:
> > On 13/03/12 19:23, Stefan Hajnoczi wrote:
> >>
> >> On Tue, Mar 13, 2012 at 10:42 AM, Amos Kong<akong@redhat.com>
> >>  wrote:
> >>>
> >>> Boot up a guest with 232 virtio-blk disks and qemu will abort
> >>> because it fails to allocate an ioeventfd. This patchset changes
> >>> kvm_has_many_ioeventfds() to check whether a free ioeventfd is
> >>> still available. If not, virtio-pci falls back to userspace and
> >>> does not use ioeventfd for I/O notification.
> >
> >
> > Hi Stefan,
> >
> >
> >> Please explain how it fails with 232 devices.  Where does it abort
> >> and
> >> why?
> >
> >
> > (gdb) bt
> > #0  0x00007ffff48c8885 in raise () from /lib64/libc.so.6
> > #1  0x00007ffff48ca065 in abort () from /lib64/libc.so.6
> > #2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add
> > (section=0x7fffbfbf5610,
> > match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
> > #3  0x00007ffff7e89b3f in kvm_eventfd_add (listener=0x7ffff82ebe80,
> > section=0x7fffbfbf5610, match_data=true, data=0, fd=461) at
> > /home/devel/qemu/kvm-all.c:802
> > #4  0x00007ffff7e9bcf7 in address_space_add_del_ioeventfds
> > (as=0x7ffff8b278a0, fds_new=0x7fffb80106f0, fds_new_nb=201,
> > fds_old=0x7fffb800db20, fds_old_nb=200)
> >    at /home/devel/qemu/memory.c:612
> > #5  0x00007ffff7e9c04f in address_space_update_ioeventfds
> > (as=0x7ffff8b278a0) at /home/devel/qemu/memory.c:645
> > #6  0x00007ffff7e9caa0 in address_space_update_topology
> > (as=0x7ffff8b278a0)
> > at /home/devel/qemu/memory.c:726
> > #7  0x00007ffff7e9cb95 in memory_region_update_topology
> > (mr=0x7fffdeb179b0)
> > at /home/devel/qemu/memory.c:746
> > #8  0x00007ffff7e9e802 in memory_region_add_eventfd
> > (mr=0x7fffdeb179b0,
> > addr=16, size=2, match_data=true, data=0, fd=461) at
> > /home/devel/qemu/memory.c:1220
> > #9  0x00007ffff7d9e832 in virtio_pci_set_host_notifier_internal
> > (proxy=0x7fffdeb175a0, n=0, assign=true) at
> > /home/devel/qemu/hw/virtio-pci.c:175
> > #10 0x00007ffff7d9ea5f in virtio_pci_start_ioeventfd
> > (proxy=0x7fffdeb175a0)
> > at /home/devel/qemu/hw/virtio-pci.c:230
> > #11 0x00007ffff7d9ee51 in virtio_ioport_write
> > (opaque=0x7fffdeb175a0,
> > addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:325
> > #12 0x00007ffff7d9f37b in virtio_pci_config_writeb
> > (opaque=0x7fffdeb175a0,
> > addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:457
> > #13 0x00007ffff7e9ac23 in memory_region_iorange_write
> > (iorange=0x7fffb8005cc0, offset=18, width=1, data=7) at
> > /home/devel/qemu/memory.c:427
> > #14 0x00007ffff7e857e2 in ioport_writeb_thunk
> > (opaque=0x7fffb8005cc0,
> > addr=61970, data=7) at /home/devel/qemu/ioport.c:212
> > #15 0x00007ffff7e85197 in ioport_write (index=0, address=61970,
> > data=7) at
> > /home/devel/qemu/ioport.c:83
> > #16 0x00007ffff7e85d9a in cpu_outb (addr=61970, val=7 '\a') at
> > /home/devel/qemu/ioport.c:289
> > #17 0x00007ffff7e8a70a in kvm_handle_io (port=61970,
> > data=0x7ffff7c11000,
> > direction=1, size=1, count=1) at /home/devel/qemu/kvm-all.c:1123
> > #18 0x00007ffff7e8ad0a in kvm_cpu_exec (env=0x7fffc1688010) at
> > /home/devel/qemu/kvm-all.c:1271
> > #19 0x00007ffff7e595fc in qemu_kvm_cpu_thread_fn
> > (arg=0x7fffc1688010) at
> > /home/devel/qemu/cpus.c:733
> > #20 0x00007ffff63687f1 in start_thread () from
> > /lib64/libpthread.so.0
> > #21 0x00007ffff497b92d in clone () from /lib64/libc.so.6
> > (gdb) frame 2
> > #2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add
> > (section=0x7fffbfbf5610,
> > match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
> > 778             abort();
> > (gdb) l
> > 773         assert(match_data && section->size == 2);
> > 774
> > 775         r = kvm_set_ioeventfd_pio_word(fd,
> > section->offset_within_address_space,
> > 776                                        data, true);
> > 777         if (r < 0) {
> > 778             abort();
> > 779         }
> 
> This is where graceful fallback code needs to be added.  I have added
> Avi because I'm not very familiar with the new memory API and how it
> does ioeventfd.

Hi, Stefan

I considered a fix here (the diff is in the Patch section at the
bottom), but virtio_pci_start_ioeventfd() has no way to know that the
allocation failed.

> Basically we need a way for ioeventfd to fail if we hit rlimit or the
> in-kernel io bus device limit.  Suggestions?



> (The reason I don't like using has_many_ioeventfds() is because it's
> ugly and inefficient to open and then close file descriptors -
> especially in a multithreaded program where file descriptor
> availability can change while we're executing.)

I admit it's too ugly ;)
but I want to keep it for older kernels (where the iobus limit is 6).

Can we remove the check for old kernels (iobus limit is 6)?
The user can disable ioeventfd through the qemu command line:
 virtio-net-pci.ioeventfd=on/off
 virtio-blk-pci.ioeventfd=on/off
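
For context: the check being debated probes the kernel at startup by
opening a handful of eventfds, registering them on the KVM I/O bus, and
seeing whether all of them stick (old kernels cap the bus at 6
devices). Below is a simplified sketch of that probe, loosely modeled
on the kvm-all.c code of this period; error handling is trimmed and
the function name is illustrative.

#include <sys/eventfd.h>
#include <unistd.h>

/* Probe how many ioeventfds the kernel accepts.  If the 7th
 * registration fails, the io bus limit is 6 and ioeventfds are too
 * scarce for virtio devices to claim freely.
 * kvm_set_ioeventfd_pio_word() is the same helper that appears in the
 * backtrace above. */
static int check_many_ioeventfds(void)
{
    int ioeventfds[7];
    int i, ret;

    for (i = 0; i < 7; i++) {
        ioeventfds[i] = eventfd(0, EFD_CLOEXEC);
        if (ioeventfds[i] < 0) {
            break;
        }
        if (kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, true) < 0) {
            close(ioeventfds[i]);
            break;
        }
    }

    ret = (i == 7);    /* all 7 registered: the limit is not 6 */

    while (i-- > 0) {  /* undo the probe */
        kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, false);
        close(ioeventfds[i]);
    }
    return ret;
}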


Amos

Comments

Stefan Hajnoczi March 13, 2012, 4:36 p.m. UTC | #1
On Tue, Mar 13, 2012 at 2:47 PM, Amos Kong <akong@redhat.com> wrote:
> ----- Original Message -----
>> On Tue, Mar 13, 2012 at 11:51 AM, Amos Kong <akong@redhat.com> wrote:
>> > On 13/03/12 19:23, Stefan Hajnoczi wrote:
>> >>
>> >> On Tue, Mar 13, 2012 at 10:42 AM, Amos Kong<akong@redhat.com>
>> >>  wrote:
>> >>>
>> >>> Boot up a guest with 232 virtio-blk disks and qemu will abort
>> >>> because it fails to allocate an ioeventfd. This patchset changes
>> >>> kvm_has_many_ioeventfds() to check whether a free ioeventfd is
>> >>> still available. If not, virtio-pci falls back to userspace and
>> >>> does not use ioeventfd for I/O notification.
>> >
>> >
>> > Hi Stefan,
>> >
>> >
>> >> Please explain how it fails with 232 devices.  Where does it abort
>> >> and
>> >> why?
>> >
>> >
>> > (gdb) bt
>> > #0  0x00007ffff48c8885 in raise () from /lib64/libc.so.6
>> > #1  0x00007ffff48ca065 in abort () from /lib64/libc.so.6
>> > #2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add
>> > (section=0x7fffbfbf5610,
>> > match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
>> > #3  0x00007ffff7e89b3f in kvm_eventfd_add (listener=0x7ffff82ebe80,
>> > section=0x7fffbfbf5610, match_data=true, data=0, fd=461) at
>> > /home/devel/qemu/kvm-all.c:802
>> > #4  0x00007ffff7e9bcf7 in address_space_add_del_ioeventfds
>> > (as=0x7ffff8b278a0, fds_new=0x7fffb80106f0, fds_new_nb=201,
>> > fds_old=0x7fffb800db20, fds_old_nb=200)
>> >    at /home/devel/qemu/memory.c:612
>> > #5  0x00007ffff7e9c04f in address_space_update_ioeventfds
>> > (as=0x7ffff8b278a0) at /home/devel/qemu/memory.c:645
>> > #6  0x00007ffff7e9caa0 in address_space_update_topology
>> > (as=0x7ffff8b278a0)
>> > at /home/devel/qemu/memory.c:726
>> > #7  0x00007ffff7e9cb95 in memory_region_update_topology
>> > (mr=0x7fffdeb179b0)
>> > at /home/devel/qemu/memory.c:746
>> > #8  0x00007ffff7e9e802 in memory_region_add_eventfd
>> > (mr=0x7fffdeb179b0,
>> > addr=16, size=2, match_data=true, data=0, fd=461) at
>> > /home/devel/qemu/memory.c:1220
>> > #9  0x00007ffff7d9e832 in virtio_pci_set_host_notifier_internal
>> > (proxy=0x7fffdeb175a0, n=0, assign=true) at
>> > /home/devel/qemu/hw/virtio-pci.c:175
>> > #10 0x00007ffff7d9ea5f in virtio_pci_start_ioeventfd
>> > (proxy=0x7fffdeb175a0)
>> > at /home/devel/qemu/hw/virtio-pci.c:230
>> > #11 0x00007ffff7d9ee51 in virtio_ioport_write
>> > (opaque=0x7fffdeb175a0,
>> > addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:325
>> > #12 0x00007ffff7d9f37b in virtio_pci_config_writeb
>> > (opaque=0x7fffdeb175a0,
>> > addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:457
>> > #13 0x00007ffff7e9ac23 in memory_region_iorange_write
>> > (iorange=0x7fffb8005cc0, offset=18, width=1, data=7) at
>> > /home/devel/qemu/memory.c:427
>> > #14 0x00007ffff7e857e2 in ioport_writeb_thunk
>> > (opaque=0x7fffb8005cc0,
>> > addr=61970, data=7) at /home/devel/qemu/ioport.c:212
>> > #15 0x00007ffff7e85197 in ioport_write (index=0, address=61970,
>> > data=7) at
>> > /home/devel/qemu/ioport.c:83
>> > #16 0x00007ffff7e85d9a in cpu_outb (addr=61970, val=7 '\a') at
>> > /home/devel/qemu/ioport.c:289
>> > #17 0x00007ffff7e8a70a in kvm_handle_io (port=61970,
>> > data=0x7ffff7c11000,
>> > direction=1, size=1, count=1) at /home/devel/qemu/kvm-all.c:1123
>> > #18 0x00007ffff7e8ad0a in kvm_cpu_exec (env=0x7fffc1688010) at
>> > /home/devel/qemu/kvm-all.c:1271
>> > #19 0x00007ffff7e595fc in qemu_kvm_cpu_thread_fn
>> > (arg=0x7fffc1688010) at
>> > /home/devel/qemu/cpus.c:733
>> > #20 0x00007ffff63687f1 in start_thread () from
>> > /lib64/libpthread.so.0
>> > #21 0x00007ffff497b92d in clone () from /lib64/libc.so.6
>> > (gdb) frame 2
>> > #2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add
>> > (section=0x7fffbfbf5610,
>> > match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
>> > 778             abort();
>> > (gdb) l
>> > 773         assert(match_data && section->size == 2);
>> > 774
>> > 775         r = kvm_set_ioeventfd_pio_word(fd,
>> > section->offset_within_address_space,
>> > 776                                        data, true);
>> > 777         if (r < 0) {
>> > 778             abort();
>> > 779         }
>>
>> This is where graceful fallback code needs to be added.  I have added
>> Avi because I'm not very familiar with the new memory API and how it
>> does ioeventfd.
>
> Hi, Stefan
>
> I considered a fix here, but virtio_pci_start_ioeventfd() has no way to know that the allocation failed.
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 77eadf6..7157e78 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -771,6 +771,8 @@ static void kvm_io_ioeventfd_add(MemoryRegionSection *sectio
>
>     r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
>                                    data, true);
> +    if (r == -ENOSPC)
> +        return;

The caller needs a way to detect the failure.

>     if (r < 0) {
>         abort();
>     }
>
>
>
>> Basically we need a way for ioeventfd to fail if we hit rlimit or the
>> in-kernel io bus device limit.  Suggestions?
>
>
>
>> (The reason I don't like using has_many_ioeventfds() is because it's
>> ugly and inefficient to open and then close file descriptors -
>> especially in a multithreaded program where file descriptor
>> availability can change while we're executing.)
>
> I admit it's too ugly ;)
> but I want to keep it for older kernels (where the iobus limit is 6).
>
> Can we remove the check for old kernels (iobus limit is 6)?
> The user can disable ioeventfd through the qemu command line:
>  virtio-net-pci.ioeventfd=on/off
>  virtio-blk-pci.ioeventfd=on/off

Why do you want to remove the iobus limit 6 check?  The point of that
check is to allow vhost-net to always work since it requires an
ioeventfd.  Userspace virtio devices, on the other hand, can
gracefully fall back to non-ioeventfd.

Stefan
Amos Kong March 14, 2012, 12:30 a.m. UTC | #2
----- Original Message -----
> On Tue, Mar 13, 2012 at 2:47 PM, Amos Kong <akong@redhat.com> wrote:

...

Hi, Stefan

> > diff --git a/kvm-all.c b/kvm-all.c
> > index 77eadf6..7157e78 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -771,6 +771,8 @@ static void
> > kvm_io_ioeventfd_add(MemoryRegionSection *sectio
> >
> >     r = kvm_set_ioeventfd_pio_word(fd,
> >     section->offset_within_address_space,
> >                                    data, true);
> > +    if (r == -ENOSPC)
> > +        return;
> 
> The caller needs a way to detect the failure.

Yes, that needs support from the memory API.
 
> >     if (r < 0) {
> >         abort();
> >     }
> >
> >
> >
> >> Basically we need a way for ioeventfd to fail if we hit rlimit or
> >> the
> >> in-kernel io bus device limit.  Suggestions?
> >
> >
> >
> >> (The reason I don't like using has_many_ioeventfds() is because
> >> it's
> >> ugly and inefficient to open and then close file descriptors -
> >> especially in a multithreaded program where file descriptor
> >> availability can change while we're executing.)
> >
> > I admit it's too ugly ;)
> > but I want to keep it for older kernels (where the iobus limit is 6).
> >
> > Can we remove the check for old kernels (iobus limit is 6)?
> > The user can disable ioeventfd through the qemu command line:
> >  virtio-net-pci.ioeventfd=on/off
> >  virtio-blk-pci.ioeventfd=on/off
> 
> Why do you want to remove the iobus limit 6 check?  The point of that
> check is to allow vhost-net to always work since it requires an
> ioeventfd.


### -device virtio-blk-pci,ioeventfd=off,drive=drive-virtio0-0-0,id=id1 -drive ...
     this block device will not use ioeventfd for I/O notification.

### -device virtio-net-pci,netdev=he,ioeventfd=off -netdev tap,id=he
     this net device will not use ioeventfd

### -device virtio-net-pci,netdev=he,vhost=on -netdev tap,id=he
     this device will take 2 ioeventfds (to service notifications from
     the rx/tx queues)

The ioeventfd parameter is a way for the user to limit ioeventfd usage.


### qemu-kvm -net none -device virtio-blk-pci,ioeventfd=on,drive=drive-virtio0-0-0,id=id1 -drive ...
Some users don't use the network at all but need better disk I/O
performance; we cannot satisfy them if we keep the check for 6 iobus
devices.

The strategy should be decided by the user.

> Userspace virtio devices, on the other hand, can
> gracefully fall back to non-ioeventfd.

That graceful fallback is the expected behavior, not an abort.
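
For reference, hw/virtio-pci.c of this period already contains rollback
logic roughly like the sketch below; the problem is that it never runs,
because kvm-all.c calls abort() before any error can propagate back up.
This is a simplified sketch, not the exact code.

/* Assign a host notifier (ioeventfd) per active queue; on failure,
 * roll back the ones that succeeded and keep using userspace I/O
 * notification.  The assign_error path is only reachable once the
 * lower layers return an error instead of aborting. */
static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy)
{
    int n, r;

    for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
        if (!virtio_queue_get_num(proxy->vdev, n)) {
            continue;
        }
        r = virtio_pci_set_host_notifier_internal(proxy, n, true);
        if (r < 0) {
            goto assign_error;
        }
    }
    proxy->ioeventfd_started = true;
    return;

assign_error:
    /* Unassign the notifiers that did succeed; the device keeps
     * working through ordinary userspace notification. */
    while (--n >= 0) {
        virtio_pci_set_host_notifier_internal(proxy, n, false);
    }
    proxy->ioeventfd_started = false;
}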


Thanks, Amos.
Stefan Hajnoczi March 14, 2012, 8:57 a.m. UTC | #3
On Wed, Mar 14, 2012 at 12:30 AM, Amos Kong <akong@redhat.com> wrote:
> ----- Original Message -----
>> On Tue, Mar 13, 2012 at 2:47 PM, Amos Kong <akong@redhat.com> wrote:
>
> ...
>
> Hi, Stefan
>
>> > diff --git a/kvm-all.c b/kvm-all.c
>> > index 77eadf6..7157e78 100644
>> > --- a/kvm-all.c
>> > +++ b/kvm-all.c
>> > @@ -771,6 +771,8 @@ static void
>> > kvm_io_ioeventfd_add(MemoryRegionSection *sectio
>> >
>> >     r = kvm_set_ioeventfd_pio_word(fd,
>> >     section->offset_within_address_space,
>> >                                    data, true);
>> > +    if (r == -ENOSPC)
>> > +        return;
>>
>> The caller needs a way to detect the failure.
>
> Yes, that needs support from the memory API.
>
>> >     if (r < 0) {
>> >         abort();
>> >     }
>> >
>> >
>> >
>> >> Basically we need a way for ioeventfd to fail if we hit rlimit or
>> >> the
>> >> in-kernel io bus device limit.  Suggestions?
>> >
>> >
>> >
>> >> (The reason I don't like using has_many_ioeventfds() is because
>> >> it's
>> >> ugly and inefficient to open and then close file descriptors -
>> >> especially in a multithreaded program where file descriptor
>> >> availability can change while we're executing.)
>> >
>> > I admit it's too ugly ;)
>> > but I want to keep it for older kernels (where the iobus limit is 6).
>> >
>> > Can we remove the check for old kernels (iobus limit is 6)?
>> > The user can disable ioeventfd through the qemu command line:
>> >  virtio-net-pci.ioeventfd=on/off
>> >  virtio-blk-pci.ioeventfd=on/off
>>
>> Why do you want to remove the iobus limit 6 check?  The point of that
>> check is to allow vhost-net to always work since it requires an
>> ioeventfd.
>
>
> ### -device virtio-blk-pci,ioeventfd=off,drive=drive-virtio0-0-0,id=id1 -drive ...
>     this block device will not use ioeventfd for I/O notification.
>
> ### -device virtio-net-pci,netdev=he,ioeventfd=off -netdev tap,id=he
>     this net device will not use ioeventfd
>
> ### -device virtio-net-pci,netdev=he,vhost=on -netdev tap,id=he
>     this device will take 2 ioeventfds (to service notifications from
>     the rx/tx queues)
>
> The ioeventfd parameter is a way for the user to limit ioeventfd usage.
>
>
> ### qemu-kvm -net none -device virtio-blk-pci,ioeventfd=on,drive=drive-virtio0-0-0,id=id1 -drive ...
> Some users don't use the network at all but need better disk I/O
> performance; we cannot satisfy them if we keep the check for 6 iobus
> devices.
>
> The strategy should be decided by the user.

We're discussing the wrong thing here; the 6-ioeventfd workaround is
unrelated to the problem you are trying to solve.

Graceful fallback for ioeventfd failure previously existed but it
seems the new memory API broke it.  The thing that needs fixing is the
regression caused by the new memory API code.

An error code path needs to be added to the memory API ioeventfd code
so that virtio-pci.c can deal with failure and QEMU no longer aborts.
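
One possible shape for that error path, purely as a sketch: the
int-returning listener callback and the idea that
memory_region_add_eventfd() would propagate the code are hypothetical
here, not the fix that was eventually merged.

/* Hypothetical: report the expected "io bus full" failure to the
 * caller instead of aborting, and keep abort() for genuine bugs. */
static int kvm_io_ioeventfd_add(MemoryRegionSection *section,
                                bool match_data, uint64_t data, int fd)
{
    int r = kvm_set_ioeventfd_pio_word(fd,
                section->offset_within_address_space, data, true);

    if (r == -ENOSPC) {
        return r;       /* recoverable: the caller falls back */
    }
    if (r < 0) {
        abort();        /* anything else is still a real bug */
    }
    return 0;
}

/* memory_region_add_eventfd() would then return this code, letting
 * virtio_pci_set_host_notifier_internal() undo its setup so that
 * virtio_pci_start_ioeventfd() can fall back to userspace
 * notification, as sketched earlier in the thread. */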

Stefan

Patch

diff --git a/kvm-all.c b/kvm-all.c
index 77eadf6..7157e78 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -771,6 +771,8 @@  static void kvm_io_ioeventfd_add(MemoryRegionSection *sectio
 
     r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
                                    data, true);
+    if (r == -ENOSPC)
+        return;
     if (r < 0) {
         abort();
     }