Message ID: 6CCA8220-0125-4CB5-A54E-9648D8B409D2@suse.de
State: New
On Tue, Aug 27, 2013 at 12:06:49PM +0200, Alexander Graf wrote:
> On 23.08.2013, at 06:29, Alexey Kardashevskiy wrote:
> > On 08/19/2013 07:01 PM, Michael S. Tsirkin wrote:
> >> On Mon, Aug 19, 2013 at 06:10:01PM +1000, Alexey Kardashevskiy wrote:
> >>> On 08/19/2013 05:54 PM, Michael S. Tsirkin wrote:
> >>>> On Mon, Aug 19, 2013 at 05:44:04PM +1000, Alexey Kardashevskiy wrote:
> >>>>> On 08/19/2013 05:35 PM, Michael S. Tsirkin wrote:
> >>>>>> On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
> >>>>>>> On PPC64 systems MSI Messages are translated to system IRQ in a PCI host bridge. This is already supported for emulated MSI/MSIX but not for irqfd where the current QEMU allocates IRQ numbers from irqchip and maps MSIMessages to those IRQ in the host kernel.
> >>>>>>>
> >>>>>>> The patch extends irqfd support in order to avoid unnecessary mapping and reuse the one which already exists in a PCI host bridge.
> >>>>>>>
> >>>>>>> Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi() to PCI API. The latter tries a bus specific map_msi and falls back to the default kvm_irqchip_add_msi_route() if none set.
> >>>>>>>
> >>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>>>
> >>>>>> The mapping (irq # within data) is really KVM on PPC64 specific, isn't it?
> >>>>>>
> >>>>>> So why not implement kvm_irqchip_add_msi_route(kvm_state, msg); to simply return msg.data on PPC64?
> >>>>>
> >>>>> How exactly? Please give some details. kvm_irqchip_add_msi_route() is implemented in kvm-all.c once for all platforms so hack it right there?
> >>>>
> >>>> You can add the map_msi callback in kvm state, then just call it if it's set, right?
> >>>>
> >>>>> I thought we discussed all of this already and decided that this thing should go to PCI host bridge and by default PHB's map_msi() callback should just call kvm_irqchip_add_msi_route(). This is why I touched i386.
> >>>>>
> >>>>> Things have changed since then?
> >>>>
> >>>> I think pci_bus_map_msi was there since day 1, it's not like we are going back and forth on it, right?
> >>>
> >>> Sorry, I am not following you. pci_bus_map_msi() is added by this patch. Where was it from day 1?
> >>
> >> I'm sorry. I merely meant that it's there in v1 of this patch.
> >>
> >>>> I always felt it's best to have irqfd isolated to kvm somehow rather than have hooks in pci code, I just don't know enough about kvm on ppc to figure out the right way to do this. With your latest patches I think this is becoming clearer ...
> >>>
> >>> Well... This encoding (irq# as msg.data) is used in spapr_pci.c in any case, whether it is KVM or TCG.
> >>
> >> Not only on TCG. All emulated devices merely do a write to send an MSI, this is exactly what happens with real hardware. If this happens to land in the MSI region, you get an interrupt. The concept of mapping msi to irq normally doesn't belong in devices/pci core, it's something done by APIC or pci host.
> >>
> >> For KVM, we have this special hook where devices allocate a route and then Linux can send an interrupt to guest quickly bypassing QEMU. It was originally designed for paravirtualization, so it doesn't support strange cases such as guest programming MSI to write into guest memory, which is possible with real PCI devices. However, no one seems to do these hacks in practice, so in practice this works for some other cases, such as device assignment.
> >>
> >> That's why we have kvm_irqchip_add_msi_route in some places in code - it's so specific devices implemented by Linux can take this shortcut (it does not make sense for devices implemented by qemu). So the way I see it, it's not a PCI thing at all, it's a KVM specific implementation, so it seems cleaner if it's isolated there.
> >
> > The only PCI thing here (at least for spapr) is the way we translate MSIMessage to IRQ number. Besides that, yeah, there is no good reason to poison pci.c or spapr_pci.c with KVM.
> >
> >> Now, we have this allocate/free API for reasons that have to do with the API of kvm on x86. spapr doesn't need to allocate/free resources, so just shortcut free and do map when we allocate.
> >>
> >> Doesn't this sound reasonable?
> >
> > Yes, pretty much. But it did not help to get this patch accepted at the first place and my vote costs a little here :)
> >
> >>> I am confused.
> >>> 1.5 month ago Anthony suggested (I'll answer to that mail to bring that discussion up) to do this thing as:
> >>>
> >>>> So what this should look like is:
> >>>>
> >>>> 1) A PCI bus function to do the MSI -> virq mapping
> >>>> 2) On x86 (and e500), this is implemented by calling kvm_irqchip_add_msi_route()
> >>>> 3) On pseries, this just returns msi->data
> >>>>
> >>>> Perhaps (2) can just be the default PCI bus implementation to simplify things.
> >>>
> >>> Now it is not right. Anthony, please help.
> >>
> >> That's right, you implemented exactly what Anthony suggested. Now that you did, I think I see an even better, more contained way to do this. And it's not much of a change as compared to what you have, is it?
> >>
> >> I'm sorry if this looks like you are given a run-around, not everyone understands how spapr works, sometimes it takes a full implementation to make everyone understand the issues.
> >>
> >> But I agree, let's see what Anthony thinks, maybe I misunderstood how spapr works.
> >
> > Anthony, Alex (Graf), ping, please?
>
> I think it makes sense to just have this special case be treated as a special case. Why don't we just add a new global check in kvm.c similar to kvm_gsi_routing_enabled() for kvm_gsi_routing_linear(). This should probably disable routing_enabled(), but we add a new check for kvm_gsi_routing_linear().
>
> So basically we would end up with something like
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 716860f..ca3251e 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1190,6 +1190,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
>      struct kvm_irq_routing_entry kroute = {};
>      int virq;
>
> +    if (kvm_gsi_routing_linear()) {
> +        return msi.data & 0xffff;
> +    }
> +
>      if (!kvm_gsi_routing_enabled()) {
>          return -ENOSYS;
>      }
>
> I agree that we should isolate kvm specifics to kvm code when we can.
>
> Alex

This makes sense to me.
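To make the two approaches being compared concrete: the patch under discussion routes the lookup through the PCI bus, trying a host-bridge-supplied map_msi callback and falling back to kvm_irqchip_add_msi_route(). Below is a minimal, self-contained sketch of that dispatch; the struct layout, the stub names and the return values are simplified stand-ins for illustration, not the actual patch code.

#include <stdint.h>

/* Simplified stand-ins for QEMU's MSIMessage and PCIBus; illustration only. */
typedef struct {
    uint64_t address;
    uint32_t data;
} MSIMessage;

typedef struct PCIBus PCIBus;
struct PCIBus {
    /* Set by a PCI host bridge that can translate an MSI message to a
     * global IRQ number by itself (e.g. spapr). */
    int (*map_msi)(PCIBus *bus, MSIMessage msg);
};

/* Stand-in for kvm_irqchip_add_msi_route(kvm_state, msg): pretend to
 * allocate a new KVM route for this message and return the virq. */
static int kvm_add_msi_route_stub(MSIMessage msg)
{
    (void)msg;
    return 42;
}

/* What a spapr-style host bridge could install: the MSI data already
 * carries the global IRQ number, so no KVM route needs to be allocated. */
static int spapr_like_map_msi(PCIBus *bus, MSIMessage msg)
{
    (void)bus;
    return msg.data & 0xffff;
}

/* The pci_bus_map_msi() behaviour described in the thread: try the
 * bus-specific hook first, fall back to the generic KVM route allocation. */
static int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
{
    if (bus->map_msi) {
        return bus->map_msi(bus, msg);
    }
    return kvm_add_msi_route_stub(msg);
}

Alex's counter-proposal above instead keeps the special case entirely inside kvm-all.c, so the PCI core never needs to know about KVM routes at all.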
On Tue, 2013-08-27 at 13:26 +0300, Michael S. Tsirkin wrote:
> e would end up with something like
> >
> > diff --git a/kvm-all.c b/kvm-all.c
> > index 716860f..ca3251e 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -1190,6 +1190,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
> >      struct kvm_irq_routing_entry kroute = {};
> >      int virq;
> >
> > +    if (kvm_gsi_routing_linear()) {
> > +        return msi.data & 0xffff;
> > +    }
> > +

I haven't followed the whole discussion, Alexey, is this the per-host bridge source number or the global XIRR (XICS interrupt number) ? Because in the latter case, it can be up to 24 bits... (And yes, MSI data is limited to 16).

However maybe we can decide arbitrarily that under qemu/kvm we only support 16-bit XIRRs (it's fine, from a PAPR perspective at least if it keep things simpler).

Cheers,
Ben.
On 27.08.2013, at 12:32, Benjamin Herrenschmidt wrote:

> On Tue, 2013-08-27 at 13:26 +0300, Michael S. Tsirkin wrote:
>> e would end up with something like
>>>
>>> diff --git a/kvm-all.c b/kvm-all.c
>>> index 716860f..ca3251e 100644
>>> --- a/kvm-all.c
>>> +++ b/kvm-all.c
>>> @@ -1190,6 +1190,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
>>>      struct kvm_irq_routing_entry kroute = {};
>>>      int virq;
>>>
>>> +    if (kvm_gsi_routing_linear()) {
>>> +        return msi.data & 0xffff;
>>> +    }
>>> +
>
> I haven't followed the whole discussion, Alexey, is this the per-host bridge source number or the global XIRR (XICS interrupt number) ?

This is a global interrupt number now. One of Alexey's previous patches changed it to fit that scheme, which makes it a lot easier for irqfd and the likes :).

> Because in the latter case, it can be up to 24 bits... (And yes, MSI data is limited to 16).

Oh. We define the MSI data field from QEMU anyways, so maybe we should just not mask at all.

Alex

> However maybe we can decide arbitrarily that under qemu/kvm we only support 16-bit XIRRs (it's fine, from a PAPR perspective at least if it keep things simpler).
>
> Cheers,
> Ben.
On 08/27/2013 08:32 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2013-08-27 at 13:26 +0300, Michael S. Tsirkin wrote:
>> e would end up with something like
>>>
>>> diff --git a/kvm-all.c b/kvm-all.c
>>> index 716860f..ca3251e 100644
>>> --- a/kvm-all.c
>>> +++ b/kvm-all.c
>>> @@ -1190,6 +1190,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
>>>      struct kvm_irq_routing_entry kroute = {};
>>>      int virq;
>>>
>>> +    if (kvm_gsi_routing_linear()) {
>>> +        return msi.data & 0xffff;
>>> +    }
>>> +
>
> I haven't followed the whole discussion, Alexey, is this the per-host bridge source number or the global XIRR (XICS interrupt number) ?

When I started this topic, that was per bridge and this is why I changed it to be global and eventually this spapr-MSIX fix reached agraf/ppc-next (happened last week) so the fix above should just work now.

> Because in the latter case, it can be up to 24 bits... (And yes, MSI data is limited to 16).

Yes, this is the main limitation here.

> However maybe we can decide arbitrarily that under qemu/kvm we only support 16-bit XIRRs (it's fine, from a PAPR perspective at least if it keep things simpler).

I'll keep 0xffff then as everybody is happy with this.
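To spell out the limitation discussed above: the MSI data field is 16 bits wide, while a XICS source number (XIRR) can be up to 24 bits, so the linear msg.data mapping only round-trips while QEMU keeps its IRQ numbers below 0x10000. A tiny stand-alone check with made-up IRQ values:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t irq_small = 0x1234;   /* fits in the 16-bit MSI data field    */
    uint32_t irq_large = 0x12345;  /* a larger 24-bit XICS number would not */

    assert((irq_small & 0xffff) == irq_small);  /* recoverable from the mask */
    assert((irq_large & 0xffff) != irq_large);  /* would be silently truncated */
    return 0;
}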
diff --git a/kvm-all.c b/kvm-all.c
index 716860f..ca3251e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1190,6 +1190,10 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
     struct kvm_irq_routing_entry kroute = {};
     int virq;
 
+    if (kvm_gsi_routing_linear()) {
+        return msg.data & 0xffff;
+    }
+
     if (!kvm_gsi_routing_enabled()) {
         return -ENOSYS;
     }
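Note that kvm_gsi_routing_linear() itself is not defined anywhere at this point; Alex's suggestion implies adding a flag and accessor next to the existing kvm_gsi_routing_allowed / kvm_gsi_routing_enabled() pair, presumably set from the spapr/XICS setup code. A rough sketch of what that could look like; the names and the place where the flag would be set are assumptions, not upstream code:

#include <stdbool.h>

/* Sketch only: modelled on the existing kvm_gsi_routing_allowed flag and
 * its kvm_gsi_routing_enabled() accessor. Where the flag gets set (e.g.
 * by the spapr machine or XICS init code) is an assumption here. */
bool kvm_gsi_routing_linear_allowed;

static inline bool kvm_gsi_routing_linear(void)
{
    return kvm_gsi_routing_linear_allowed;
}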