From patchwork Tue Aug 27 10:06:49 2013
From: Alexander Graf
Date: Tue, 27 Aug 2013 12:06:49 +0200
Message-Id: <6CCA8220-0125-4CB5-A54E-9648D8B409D2@suse.de>
In-Reply-To: <5216E530.4060007@ozlabs.ru>
References: <1375865493-19143-1-git-send-email-aik@ozlabs.ru>
 <1375865493-19143-2-git-send-email-aik@ozlabs.ru>
 <20130819073550.GE17937@redhat.com>
 <5211CCC4.3010606@ozlabs.ru>
 <20130819075434.GA18579@redhat.com>
 <5211D2D9.5050101@ozlabs.ru>
 <20130819090128.GA20139@redhat.com>
 <5216E530.4060007@ozlabs.ru>
To: Alexey Kardashevskiy
Cc: Anthony Liguori, "Michael S. Tsirkin", qemu-devel@nongnu.org,
 Alex Williamson, qemu-ppc@nongnu.org, Paolo Bonzini, Paul Mackerras
Subject: Re: [Qemu-devel] [PATCH 1/2] kvm irqfd: support msimessage to irq
 translation in PHB

On 23.08.2013, at 06:29, Alexey Kardashevskiy wrote:

> On 08/19/2013 07:01 PM, Michael S. Tsirkin wrote:
>> On Mon, Aug 19, 2013 at 06:10:01PM +1000, Alexey Kardashevskiy wrote:
>>> On 08/19/2013 05:54 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Aug 19, 2013 at 05:44:04PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 08/19/2013 05:35 PM, Michael S. Tsirkin wrote:
>>>>>> On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
>>>>>>> On PPC64 systems MSI messages are translated to system IRQs in the
>>>>>>> PCI host bridge. This is already supported for emulated MSI/MSI-X,
>>>>>>> but not for irqfd, where the current QEMU allocates IRQ numbers from
>>>>>>> the irqchip and maps MSIMessages to those IRQs in the host kernel.
>>>>>>>
>>>>>>> The patch extends irqfd support in order to avoid the unnecessary
>>>>>>> mapping and to reuse the one which already exists in the PCI host
>>>>>>> bridge.
>>>>>>>
>>>>>>> Specifically, a map_msi callback is added to PCIBus and
>>>>>>> pci_bus_map_msi() to the PCI API. The latter tries the bus-specific
>>>>>>> map_msi and falls back to the default kvm_irqchip_add_msi_route()
>>>>>>> if none is set.
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy
>>>>>>
>>>>>> The mapping (irq # within data) is really KVM on PPC64 specific,
>>>>>> isn't it?
>>>>>>
>>>>>> So why not implement
>>>>>>   kvm_irqchip_add_msi_route(kvm_state, msg);
>>>>>> to simply return msg.data on PPC64?
>>>>>
>>>>> How exactly? Please give some details. kvm_irqchip_add_msi_route() is
>>>>> implemented in kvm-all.c once for all platforms, so hack it right
>>>>> there?
>>>>
>>>> You can add the map_msi callback in kvm state,
>>>> then just call it if it's set, right?
>>>>
>>>>> I thought we discussed all of this already and decided that this thing
>>>>> should go to the PCI host bridge, and by default the PHB's map_msi()
>>>>> callback should just call kvm_irqchip_add_msi_route(). This is why I
>>>>> touched i386.
>>>>>
>>>>> Have things changed since then?
>>>>
>>>> I think pci_bus_map_msi was there since day 1, it's not like
>>>> we are going back and forth on it, right?
>>>
>>> Sorry, I am not following you. pci_bus_map_msi() is added by this patch.
>>> Where was it from day 1?
>>
>> I'm sorry. I merely meant that it's there in v1 of this patch.
>>
>>>> I always felt it's best to have irqfd isolated to kvm somehow
>>>> rather than have hooks in pci code, I just don't know enough
>>>> about kvm on ppc to figure out the right way to do this.
>>>> With your latest patches I think this is becoming clearer ...
>>>
>>> Well... This encoding (irq# as msg.data) is used in spapr_pci.c in any
>>> case, whether it is KVM or TCG.
>>
>> Not only on TCG. All emulated devices merely do a write to send an MSI;
>> this is exactly what happens with real hardware. If this write happens
>> to land in the MSI region, you get an interrupt. The concept of mapping
>> an MSI to an IRQ normally doesn't belong in devices/PCI core; it's
>> something done by the APIC or the PCI host.
>>
>> For KVM, we have this special hook where devices allocate a route and
>> then Linux can send an interrupt to the guest quickly, bypassing QEMU.
>> It was originally designed for paravirtualization, so it doesn't
>> support strange cases such as the guest programming an MSI to write
>> into guest memory, which is possible with real PCI devices. However,
>> no one seems to do these hacks in practice, so in practice this works
>> for some other cases, such as device assignment.
>>
>> That's why we have kvm_irqchip_add_msi_route in some places in the
>> code - it's so specific devices implemented by Linux can take this
>> shortcut (it does not make sense for devices implemented by QEMU).
>> So the way I see it, it's not a PCI thing at all, it's a KVM-specific
>> implementation, so it seems cleaner if it's isolated there.
>
> The only PCI thing here (at least for spapr) is the way we translate an
> MSIMessage to an IRQ number. Besides that, yeah, there is no good
> reason to poison pci.c or spapr_pci.c with KVM.
>
>> Now, we have this allocate/free API for reasons that have to do with
>> the API of KVM on x86. spapr doesn't need to allocate/free resources,
>> so just shortcut free and do the map when we allocate.
>>
>> Doesn't this sound reasonable?
>
> Yes, pretty much. But it did not help to get this patch accepted in the
> first place, and my vote counts for little here :)
>
>>> I am confused.
>>> 1.5 months ago Anthony suggested (I'll answer that mail to bring the
>>> discussion up) to do this thing as:
>>>
>>>> So what this should look like is:
>>>>
>>>> 1) A PCI bus function to do the MSI -> virq mapping
>>>> 2) On x86 (and e500), this is implemented by calling
>>>>    kvm_irqchip_add_msi_route()
>>>> 3) On pseries, this just returns msi->data
>>>>
>>>> Perhaps (2) can just be the default PCI bus implementation to
>>>> simplify things.
>>>
>>> Now it is not right. Anthony, please help.
>>
>> That's right, you implemented exactly what Anthony suggested. Now that
>> you did, I think I see an even better, more contained way to do this.
>> And it's not much of a change compared to what you have, is it?
>>
>> I'm sorry if this looks like you are being given the run-around;
>> not everyone understands how spapr works, and sometimes it takes a
>> full implementation to make everyone understand the issues.
>>
>> But I agree, let's see what Anthony thinks, maybe I misunderstood how
>> spapr works.
>
> Anthony, Alex (Graf), ping, please?

I think it makes sense to just have this special case be treated as a
special case. Why don't we just add a new global check in kvm.c, similar
to kvm_gsi_routing_enabled(), for kvm_gsi_routing_linear()? Linear
routing should probably disable routing_enabled(), and we add the new
kvm_gsi_routing_linear() check instead. So basically we would end up
with something like:

diff --git a/kvm-all.c b/kvm-all.c
index 716860f..ca3251e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1190,6 +1190,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
     struct kvm_irq_routing_entry kroute = {};
     int virq;
 
+    if (kvm_gsi_routing_linear()) {
+        return msg.data & 0xffff;
+    }
+
     if (!kvm_gsi_routing_enabled()) {
         return -ENOSYS;
     }

I agree that we should isolate kvm specifics to kvm code when we can.


Alex