
[0/2] virtio-pci: fix abort when fail to allocate ioeventfd

Message ID 4F6300F7.6080806@redhat.com
State New

Commit Message

Amos Kong March 16, 2012, 8:59 a.m. UTC
On 14/03/12 19:46, Stefan Hajnoczi wrote:
> On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivity <avi@redhat.com> wrote:
>> On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
>>> On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity <avi@redhat.com> wrote:
>>>> On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
>>>>> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity <avi@redhat.com> wrote:
>>>>>> On 03/13/2012 12:42 PM, Amos Kong wrote:
>>>>>>> Booting a guest with 232 virtio-blk disks makes qemu abort because it
>>>>>>> fails to allocate an ioeventfd. This patchset changes
>>>>>>> kvm_has_many_ioeventfds() to check whether a free ioeventfd is still
>>>>>>> available. If not, virtio-pci falls back to userspace and does not use
>>>>>>> ioeventfd for I/O notification.
>>>>>>
>>>>>> How about an alternative way of solving this, within the memory core:
>>>>>> trap those writes in qemu and write to the ioeventfd yourself.  This way
>>>>>> ioeventfds work even without kvm:
>>>>>>
>>>>>>
>>>>>>   core: create eventfd
>>>>>>   core: install handler for memory address that writes to ioeventfd
>>>>>>   kvm (optional): install kernel handler for ioeventfd

Can you give some detail about this? I'm not familiar with the Memory API.


btw, can we fix this problem by replacing the abort() with an error message?
virtio-pci will automatically fall back to userspace.

  }

@@ -775,7 +776,8 @@ static void kvm_io_ioeventfd_add(MemoryRegionSection *section,
     r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
                                     data, true);
      if (r < 0) {
-        abort();
+        fprintf(stderr, "%s: unable to map ioeventfd: %s.\nFallback to "
+                "userspace (slower).\n", __func__, strerror(-r));
      }
  }


>>>>>> even if the third step fails, the ioeventfd still works, it's just slower.
>>>>>
>>>>> That approach will penalize guests with large numbers of disks - they
>>>>> see an extra switch to the vcpu thread instead of kvm.ko -> iothread.
>>>>
>>>> It's only a failure path.  The normal path is expected to have a kvm
>>>> ioeventfd installed.
>>>
>>> It's the normal path when you attach >232 virtio-blk devices to a
>>> guest (or 300 in the future).
>>
>> Well, there's nothing we can do about it.
>>
>> We'll increase the limit of course, but old kernels will remain out
>> there.  The right fix is virtio-scsi anyway.
>>
>>>>>    It
>>>>> seems okay provided we can solve the limit in the kernel once and for
>>>>> all by introducing a more dynamic data structure for in-kernel
>>>>> devices.  That way future kernels will never hit an arbitrary limit
>>>>> below their file descriptor rlimit.
>>>>>
>>>>> Is there some reason why kvm.ko must use a fixed size array?  Would it
>>>>> be possible to use a tree (maybe with a cache for recent lookups)?
>>>>
>>>> It does use bsearch today IIRC.  We'll expand the limit, but there must
>>>> be a limit, and qemu must be prepared to deal with it.
>>>
>>> Shouldn't the limit be the file descriptor rlimit?  If userspace
>>> cannot create more eventfds then it cannot set up more ioeventfds.
>>
>> You can use the same eventfd for multiple ioeventfds.  If you mean to
>> slave kvm's ioeventfd limit to the number of files the process can have,
>> that's a good idea.  Surely an ioeventfd occupies less resources than an
>> open file.
>
> Yes.
>
> Ultimately I guess you're right in that we still need to have an error
> path and virtio-scsi will reduce the pressure on I/O eventfds for
> storage.
>
> Stefan
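
Avi's point above, that one eventfd can back several ioeventfd registrations while kvm.ko's registration slots are the scarce resource, can be illustrated directly against the raw KVM_IOEVENTFD ioctl. The following is a standalone sketch, not code from this thread: the port numbers are arbitrary, and treating -ENOSPC as "kvm.ko is out of slots, fall back to userspace" is an assumption about the kernels of that era.

/* Illustration only: register the same eventfd at two PIO ports via
 * KVM_IOEVENTFD and fall back when the kernel runs out of slots.
 * Build with: gcc -o ioeventfd-demo ioeventfd-demo.c  (needs /dev/kvm) */
#include <errno.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int add_pio_ioeventfd(int vmfd, int efd, uint64_t port, uint64_t val)
{
    struct kvm_ioeventfd io = {
        .datamatch = val,
        .addr      = port,
        .len       = 2,              /* 16-bit PIO write, like a queue notify */
        .fd        = efd,
        .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
    };
    return ioctl(vmfd, KVM_IOEVENTFD, &io);
}

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) {
        perror("open /dev/kvm");
        return 1;
    }
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);
    int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (vm < 0 || efd < 0) {
        perror("setup");
        return 1;
    }

    /* One file descriptor, two in-kernel registrations. */
    if (add_pio_ioeventfd(vm, efd, 0xc000, 0) < 0 ||
        add_pio_ioeventfd(vm, efd, 0xc010, 1) < 0) {
        if (errno == ENOSPC) {
            fprintf(stderr, "kvm.ko out of ioeventfd slots; a device would "
                    "fall back to userspace notification here\n");
        } else {
            perror("KVM_IOEVENTFD");
        }
        return 1;
    }
    printf("registered one eventfd at two ports\n");
    return 0;
}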

Comments

Stefan Hajnoczi March 19, 2012, 8:21 a.m. UTC | #1
On Fri, Mar 16, 2012 at 04:59:35PM +0800, Amos Kong wrote:
> On 14/03/12 19:46, Stefan Hajnoczi wrote:
> >On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivity <avi@redhat.com> wrote:
> >>On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
> >>>On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity <avi@redhat.com> wrote:
> >>>>On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
> >>>>>On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity <avi@redhat.com> wrote:
> >>>>>>On 03/13/2012 12:42 PM, Amos Kong wrote:
> >>>>>>>Booting a guest with 232 virtio-blk disks makes qemu abort because it
> >>>>>>>fails to allocate an ioeventfd. This patchset changes
> >>>>>>>kvm_has_many_ioeventfds() to check whether a free ioeventfd is still
> >>>>>>>available. If not, virtio-pci falls back to userspace and does not use
> >>>>>>>ioeventfd for I/O notification.
> >>>>>>
> >>>>>>How about an alternative way of solving this, within the memory core:
> >>>>>>trap those writes in qemu and write to the ioeventfd yourself.  This way
> >>>>>>ioeventfds work even without kvm:
> >>>>>>
> >>>>>>
> >>>>>>  core: create eventfd
> >>>>>>  core: install handler for memory address that writes to ioeventfd
> >>>>>>  kvm (optional): install kernel handler for ioeventfd
> 
> Can you give some detail about this? I'm not familiar with the Memory API.
> 
> 
> btw, can we fix this problem by replacing the abort() with an error message?
> virtio-pci will automatically fall back to userspace.
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 3c6b4f0..cf23dbf 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -749,7 +749,8 @@ static void kvm_mem_ioeventfd_add(MemoryRegionSection *section,
>      r = kvm_set_ioeventfd_mmio_long(fd, section->offset_within_address_space,
>                                      data, true);
>      if (r < 0) {
> -        abort();
> +        fprintf(stderr, "%s: unable to map ioeventfd: %s.\nFallback to "
> +                "userspace (slower).\n", __func__, strerror(-r));

The challenge is propagating the error code.  If virtio-pci.c doesn't
know that ioeventfd has failed, then it's not possible to fall back to a
userspace handler.

I believe Avi's suggestion is to put the fallback code into the KVM
memory API implementation so that virtio-pci.c doesn't need to know that
ioeventfd failed at all.

Stefan
Avi Kivity March 19, 2012, 10:11 a.m. UTC | #2
On 03/16/2012 10:59 AM, Amos Kong wrote:
>
> Can you give some detail about this? I'm not familiar with the Memory API.

Well there's a huge amount of detail needed here.  The idea is that
memory_region_add_eventfd() will always work, with or without kvm, and
even if kvm is enabled but we run out of ioeventfds.

One way to do this is to implement core_eventfd_add() in exec.c.  This
is unlikely to be easy however.
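
A standalone sketch of the mechanism Avi describes, using only the plain eventfd syscalls; the table, the core_eventfd_add() body and the dispatcher below are invented for illustration and are not QEMU code. The idea is that the core keeps its own (address, data, fd) registrations and, when a guest write is dispatched in userspace, signals the eventfd exactly as kvm.ko would have, so the consumer of the eventfd never notices the difference:

/* Illustration only: userspace "ioeventfd" dispatch.  If kvm.ko cannot
 * take the registration, a core write handler signals the eventfd itself. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

typedef struct {
    uint64_t addr;   /* guest address being watched */
    uint64_t data;   /* value to match, as with ioeventfd datamatch */
    int      fd;     /* the eventfd to signal */
} CoreEventfd;

static CoreEventfd table[16];
static int table_len;

/* What a core_eventfd_add() fallback would do: remember the registration
 * so the userspace dispatch path can serve it. */
static void core_eventfd_add(uint64_t addr, uint64_t data, int fd)
{
    table[table_len++] = (CoreEventfd){ .addr = addr, .data = data, .fd = fd };
}

/* Called from the emulated MMIO/PIO write path instead of kvm.ko. */
static void core_dispatch_write(uint64_t addr, uint64_t data)
{
    for (int i = 0; i < table_len; i++) {
        if (table[i].addr == addr && table[i].data == data) {
            uint64_t one = 1;
            /* The same wakeup the in-kernel handler would have produced. */
            if (write(table[i].fd, &one, sizeof(one)) != sizeof(one)) {
                perror("eventfd write");
            }
            return;
        }
    }
    /* Otherwise fall through to the device's normal write handler. */
}

int main(void)
{
    int fd = eventfd(0, 0);
    uint64_t count;

    if (fd < 0) {
        perror("eventfd");
        return 1;
    }
    core_eventfd_add(0xc000, 0, fd);   /* register a queue-notify address */
    core_dispatch_write(0xc000, 0);    /* simulate the guest's kick */

    if (read(fd, &count, sizeof(count)) == sizeof(count)) {
        printf("eventfd signalled, counter = %" PRIu64 "\n", count);
    }
    close(fd);
    return 0;
}

In QEMU itself this bookkeeping would sit behind the memory core's eventfd_add/eventfd_del listener callbacks (the same hooks the patch touches on the KVM side) rather than a global array, which is part of why it is unlikely to be easy.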

>
> btw, can we fix this problem by replacing the abort() with an error message?
> virtio-pci will automatically fall back to userspace.

But other users will silently break; we would need to audit all other users
of ioeventfd, for example ivshmem.

Patch

diff --git a/kvm-all.c b/kvm-all.c
index 3c6b4f0..cf23dbf 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -749,7 +749,8 @@ static void kvm_mem_ioeventfd_add(MemoryRegionSection *section,
     r = kvm_set_ioeventfd_mmio_long(fd, section->offset_within_address_space,
                                      data, true);
      if (r < 0) {
-        abort();
+        fprintf(stderr, "%s: unable to map ioeventfd: %s.\nFallback to "
+                "userspace (slower).\n", __func__, strerror(-r));
      }