From patchwork Fri Mar 16 08:59:35 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Amos Kong
X-Patchwork-Id: 147163
Message-ID: <4F6300F7.6080806@redhat.com>
Date: Fri, 16 Mar 2012 16:59:35 +0800
From: Amos Kong
To: Stefan Hajnoczi
References:
 <20120313103602.8741.71939.stgit@dhcp-8-167.nay.redhat.com>
 <4F606356.9080003@redhat.com>
 <4F606D84.8000807@redhat.com>
 <4F6076EE.9090005@redhat.com>
Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, kvm@vger.kernel.org,
 mtosatti@redhat.com, qemu-devel@nongnu.org, Avi Kivity
Subject: Re: [Qemu-devel] [PATCH 0/2] virtio-pci: fix abort when fail to
 allocate ioeventfd

On 14/03/12 19:46, Stefan Hajnoczi wrote:
> On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivity wrote:
>> On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
>>> On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity wrote:
>>>> On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
>>>>> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity wrote:
>>>>>> On 03/13/2012 12:42 PM, Amos Kong wrote:
>>>>>>> Boot up guest with 232 virtio-blk disk, qemu will abort for fail to
>>>>>>> allocate ioeventfd. This patchset changes kvm_has_many_ioeventfds(),
>>>>>>> and check if available ioeventfd exists. If not, virtio-pci will
>>>>>>> fallback to userspace, and don't use ioeventfd for io notification.
>>>>>>
>>>>>> How about an alternative way of solving this, within the memory core:
>>>>>> trap those writes in qemu and write to the ioeventfd yourself. This way
>>>>>> ioeventfds work even without kvm:
>>>>>>
>>>>>>
>>>>>> core: create eventfd
>>>>>> core: install handler for memory address that writes to ioeventfd
>>>>>> kvm (optional): install kernel handler for ioeventfd

Can you give some detail about this? I'm not familiar with the Memory API.
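For reference, the three steps quoted above can be sketched roughly as below. This is only an illustration, not QEMU code: it uses the Linux eventfd(2) call directly, struct soft_ioeventfd and the soft_ioeventfd_*() functions are hypothetical names made up for the example, and the optional kernel step is left out, so every notification takes the slower userspace path.

```c
/* Sketch only, Linux-specific. All names here are hypothetical and are
 * NOT part of QEMU's Memory API. */
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

struct soft_ioeventfd {
    int fd;             /* eventfd that the I/O thread would poll */
    uint64_t addr;      /* guest address being watched */
    int kernel_backed;  /* 1 if an in-kernel handler was installed */
};

/* Step 1: create the eventfd. Returns 0 on success, -1 on failure. */
int soft_ioeventfd_init(struct soft_ioeventfd *e, uint64_t addr)
{
    e->fd = eventfd(0, EFD_NONBLOCK);
    e->addr = addr;
    e->kernel_backed = 0;   /* step 3 is skipped in this sketch */
    return e->fd < 0 ? -1 : 0;
}

/* Step 2: handler called when a guest write has been trapped in
 * userspace. Returns 1 if the eventfd was signalled, 0 if the address
 * does not match, -1 on write error. */
int soft_ioeventfd_on_write(struct soft_ioeventfd *e, uint64_t addr)
{
    uint64_t one = 1;

    if (addr != e->addr) {
        return 0;
    }
    return write(e->fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 1 : -1;
}

/* Consumer side: read and reset the pending notification count.
 * Returns 0 on success, -1 if nothing is pending. */
int soft_ioeventfd_drain(struct soft_ioeventfd *e, uint64_t *count)
{
    return read(e->fd, count, sizeof(*count)) == (ssize_t)sizeof(*count)
               ? 0 : -1;
}
```

The consumer only ever sees the eventfd, so it would not need to know whether the notification was written by the kernel or by the trapped-write handler.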
btw, can we fix this problem by replacing abort() with an error message?
virtio-pci will automatically fall back to userspace.

>>>>>> even if the third step fails, the ioeventfd still works, it's just slower.
>>>>>
>>>>> That approach will penalize guests with large numbers of disks - they
>>>>> see an extra switch to vcpu thread instead of kvm.ko -> iothread.
>>>>
>>>> It's only a failure path. The normal path is expected to have a kvm
>>>> ioeventfd installed.
>>>
>>> It's the normal path when you attach >232 virtio-blk devices to a
>>> guest (or 300 in the future).
>>
>> Well, there's nothing we can do about it.
>>
>> We'll increase the limit of course, but old kernels will remain out
>> there. The right fix is virtio-scsi anyway.
>>
>>>>> It
>>>>> seems okay provided we can solve the limit in the kernel once and for
>>>>> all by introducing a more dynamic data structure for in-kernel
>>>>> devices. That way future kernels will never hit an arbitrary limit
>>>>> below their file descriptor rlimit.
>>>>>
>>>>> Is there some reason why kvm.ko must use a fixed size array? Would it
>>>>> be possible to use a tree (maybe with a cache for recent lookups)?
>>>>
>>>> It does use bsearch today IIRC. We'll expand the limit, but there must
>>>> be a limit, and qemu must be prepared to deal with it.
>>>
>>> Shouldn't the limit be the file descriptor rlimit? If userspace
>>> cannot create more eventfds then it cannot set up more ioeventfds.
>>
>> You can use the same eventfd for multiple ioeventfds. If you mean to
>> slave kvm's ioeventfd limit to the number of files the process can have,
>> that's a good idea. Surely an ioeventfd occupies less resources than an
>> open file.
>
> Yes.
>
> Ultimately I guess you're right in that we still need to have an error
> path and virtio-scsi will reduce the pressure on I/O eventfds for
> storage.
>
> Stefan

diff --git a/kvm-all.c b/kvm-all.c
index 3c6b4f0..cf23dbf 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -749,7 +749,8 @@ static void kvm_mem_ioeventfd_add(MemoryRegionSection *section,
     r = kvm_set_ioeventfd_mmio_long(fd, section->offset_within_address_space,
                                     data, true);
     if (r < 0) {
-        abort();
+        fprintf(stderr, "%s: unable to map ioeventfd: %s.\nFallback to "
+                "userspace (slower).\n", __func__, strerror(-r));
     }
 }
 
@@ -775,7 +776,8 @@ static void kvm_io_ioeventfd_add(MemoryRegionSection *section,
     r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
                                    data, true);
     if (r < 0) {
-        abort();
+        fprintf(stderr, "%s: unable to map ioeventfd: %s.\nFallback to "
+                "userspace (slower).\n", __func__, strerror(-r));
     }
 }
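
To make the intended error path concrete, here is a small standalone sketch of "warn and fall back instead of abort()". It is not taken from kvm-all.c: fake_assign_ioeventfd() is a hypothetical stand-in for the kvm_set_ioeventfd_*() calls, the slot counts are made-up parameters, and returning -ENOSPC when the slots run out is an assumption made for the example.

```c
/* Sketch only. fake_assign_ioeventfd() is a hypothetical stand-in for
 * the real registration call; the -ENOSPC error is an assumption. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum notify_path {
    NOTIFY_KERNEL,     /* in-kernel ioeventfd was installed */
    NOTIFY_USERSPACE   /* registration failed, notify via userspace */
};

/* Pretend registration: succeeds while free slots remain. */
static int fake_assign_ioeventfd(int used_slots, int max_slots)
{
    return used_slots >= max_slots ? -ENOSPC : 0;
}

/* Try the fast path; on failure print a warning and report the
 * fallback to the caller instead of aborting the process. */
enum notify_path ioeventfd_add_or_fallback(int used_slots, int max_slots)
{
    int r = fake_assign_ioeventfd(used_slots, max_slots);

    if (r < 0) {
        fprintf(stderr, "%s: unable to map ioeventfd: %s.\n"
                "Falling back to userspace (slower).\n",
                __func__, strerror(-r));
        return NOTIFY_USERSPACE;
    }
    return NOTIFY_KERNEL;
}
```

Unlike the patch above, the sketch returns the chosen path, on the assumption that the caller has to know a fallback happened in order to handle notifications in userspace.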