diff mbox

[17/21] introduce memory_region_get_address() and use it in kvm/ioapic

Message ID 1366705795-24732-18-git-send-email-imammedo@redhat.com
State New
Headers show

Commit Message

Igor Mammedov April 23, 2013, 8:29 a.m. UTC
kvm/ioapic is relying on the fact that SysBus device
maps mmio regions with offset counted from start of system memory.
But if ioapic's region is moved to another sub-region which doesn't
start at the beginning of system memory then using offset isn't correct.

Fix kvm/ioapic by providing and using helper function that returns
absolute region address in respective address space.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
Note:
  next patch "move IOAPIC to ICC bus" converts IOAPICs to ICCDevice
  and breaks SysBus device assumption used by kvm/ioapic.
---
 hw/i386/kvm/ioapic.c  |    2 +-
 include/exec/memory.h |   10 ++++++++++
 memory.c              |   11 +++++++++++
 3 files changed, 22 insertions(+), 1 deletions(-)

Comments

Paolo Bonzini April 23, 2013, 5:02 p.m. UTC | #1
Il 23/04/2013 10:29, Igor Mammedov ha scritto:
> kvm/ioapic is relying on the fact that SysBus device
> maps mmio regions with offset counted from start of system memory.
> But if ioapic's region is moved to another sub-region which doesn't
> start at the beginning of system memory then using offset isn't correct.
> 
> Fix kvm/ioapic by providing and using helper function that returns
> absolute region address in respective address space.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Looks good.

Paolo
Peter Maydell April 23, 2013, 5:06 p.m. UTC | #2
On 23 April 2013 09:29, Igor Mammedov <imammedo@redhat.com> wrote:
>  /*
> + * memory_region_get_address: get current the address of a region
> + *
> + * Returns the absolute address of a region.
> + * May be used on regions that are currently part of a memory hierarchy.
> + *
> + * @mr: the region being queried
> + */
> +hwaddr memory_region_get_address(MemoryRegion *mr);

This doesn't make sense as a memory API in my opinion. There's
no such thing as the "absolute address of a MemoryRegion".

-- PMM
Paolo Bonzini April 23, 2013, 5:14 p.m. UTC | #3
Il 23/04/2013 19:06, Peter Maydell ha scritto:
>> >  /*
>> > + * memory_region_get_address: get current the address of a region
>> > + *
>> > + * Returns the absolute address of a region.
>> > + * May be used on regions that are currently part of a memory hierarchy.
>> > + *
>> > + * @mr: the region being queried
>> > + */
>> > +hwaddr memory_region_get_address(MemoryRegion *mr);
> This doesn't make sense as a memory API in my opinion. There's
> no such thing as the "absolute address of a MemoryRegion".

What about also passing an AddressSpace and stopping at the address
space's root?  Would that make more sense?

Paolo
Peter Maydell April 23, 2013, 5:26 p.m. UTC | #4
On 23 April 2013 18:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 23/04/2013 19:06, Peter Maydell ha scritto:
>>> >  /*
>>> > + * memory_region_get_address: get current the address of a region
>>> > + *
>>> > + * Returns the absolute address of a region.
>>> > + * May be used on regions that are currently part of a memory hierarchy.
>>> > + *
>>> > + * @mr: the region being queried
>>> > + */
>>> > +hwaddr memory_region_get_address(MemoryRegion *mr);
>> This doesn't make sense as a memory API in my opinion. There's
>> no such thing as the "absolute address of a MemoryRegion".
>
> What about also passing an AddressSpace and stopping at the address
> space's root?  Would that make more sense?

That would be an improvement, but it still requires this
device to know what address space it's mapped into, which
is a bit ugly.

-- PMM
Jan Kiszka April 23, 2013, 5:39 p.m. UTC | #5
On 2013-04-23 19:26, Peter Maydell wrote:
> On 23 April 2013 18:14, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 23/04/2013 19:06, Peter Maydell ha scritto:
>>>>>  /*
>>>>> + * memory_region_get_address: get current the address of a region
>>>>> + *
>>>>> + * Returns the absolute address of a region.
>>>>> + * May be used on regions that are currently part of a memory hierarchy.
>>>>> + *
>>>>> + * @mr: the region being queried
>>>>> + */
>>>>> +hwaddr memory_region_get_address(MemoryRegion *mr);
>>> This doesn't make sense as a memory API in my opinion. There's
>>> no such thing as the "absolute address of a MemoryRegion".
>>
>> What about also passing an AddressSpace and stopping at the address
>> space's root?  Would that make more sense?
> 
> That would be an improvement, but it still requires this
> device to know what address space it's mapped into, which
> is a bit ugly.

Unfortunately the KVM API mandates this knowledge (IOAPIC state contains
the absolute mapping address). So we need some channel to explore where
this device is mapped.

Jan
Peter Maydell April 23, 2013, 6 p.m. UTC | #6
On 23 April 2013 18:39, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2013-04-23 19:26, Peter Maydell wrote:
>> That would be an improvement, but it still requires this
>> device to know what address space it's mapped into, which
>> is a bit ugly.
>
> Unfortunately the KVM API mandates this knowledge (IOAPIC state contains
> the absolute mapping address). So we need some channel to explore where
> this device is mapped.

For ARM we defer the mechanics of "tell the kernel where this memory
region is" from the device implementation (hw/intc/arm_gic_kvm.c) to
generic code in target-arm/kvm.c, which I personally think is a
neat way to solve this problem. It looks like the x86 ABI has
tied up one-time config stuff (like the base address) with
generic state save/load though, so that doesn't quite work for
the x86 case.

Anyway, I think I've previously argued on this list against having
a "return address of this MR in this AddressSpace" function in the
API, but I've changed my mind on that: it's kind of equivalent to
the listener API functionality except it's a one-off query.

NB: I don't think we should call it memory_region_get_address()
though -- that implies a non-existent parallel to the current
memory_region_set_address().

thanks
-- PMM
Paolo Bonzini April 23, 2013, 9:02 p.m. UTC | #7
Il 23/04/2013 20:00, Peter Maydell ha scritto:
> For ARM we defer the mechanics of "tell the kernel where this memory
> region is" from the device implementation (hw/intc/arm_gic_kvm.c) to
> generic code in target-arm/kvm.c, which I personally think is a
> neat way to solve this problem.

Yes, it is.

> NB: I don't think we should call it memory_region_get_address()
> though -- that implies a non-existent parallel to the current
> memory_region_set_address().

Hmm, good point.  address_space_get_region_addr()?

Paolo
Peter Maydell April 23, 2013, 9:39 p.m. UTC | #8
On 23 April 2013 22:02, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 23/04/2013 20:00, Peter Maydell ha scritto:
>> NB: I don't think we should call it memory_region_get_address()
>> though -- that implies a non-existent parallel to the current
>> memory_region_set_address().
>
> Hmm, good point.  address_space_get_region_addr()?

Maybe we should make this parallel-ish to memory_region_find()
(which finds an MR within an AS by addr/size)?

Actually, I think I'd favour cleaning up some of the existing
API which has parameters named "address_space" which are
MemoryRegion*s. I think these probably predate the AddressSpace
type and should be updated to use it (and be renamed to fit too).

So how about:
 MemoryRegionSection memory_region_find(MemoryRegion *address_space,
                                       hwaddr addr, uint64_t size);
becomes
 MemoryRegionSection address_space_find_region_by_addr(
                                       AddressSpace *address_space,
                                       hwaddr addr, uint64_t size);
(bit of a mouthful, but never mind)

 void memory_global_sync_dirty_bitmap(MemoryRegion *address_space);
becomes
 void address_space_sync_dirty_bitmap(AddressSpace *address_space);

(in both cases the first thing the implementation does is call
memory_region_to_address_space() to get the AddressSpace* anyway.)

and our new function is
 MemoryRegionSection address_space_find_region(AddressSpace *as,
                                               MemoryRegion *mr);

?

(we don't have to do the tidyup first, but the new function
signature makes more sense viewed in the light of the others)

thanks
-- PMM
Paolo Bonzini April 23, 2013, 9:46 p.m. UTC | #9
Il 23/04/2013 23:39, Peter Maydell ha scritto:
>> >
>> > Hmm, good point.  address_space_get_region_addr()?
> Maybe we should make this parallel-ish to memory_region_find()
> (which finds an MR within an AS by addr/size)?
> 
> Actually, I think I'd favour cleaning up some of the existing
> API which has parameters named "address_space" which are
> MemoryRegion*s. I think these probably predate the AddressSpace
> type and should be updated to use it (and be renamed to fit too).
> 
> So how about:
>  MemoryRegionSection memory_region_find(MemoryRegion *address_space,
>                                        hwaddr addr, uint64_t size);
> becomes
>  MemoryRegionSection address_space_find_region_by_addr(
>                                        AddressSpace *address_space,
>                                        hwaddr addr, uint64_t size);
> (bit of a mouthful, but never mind)
> 
>  void memory_global_sync_dirty_bitmap(MemoryRegion *address_space);
> becomes
>  void address_space_sync_dirty_bitmap(AddressSpace *address_space);
> 
> (in both cases the first thing the implementation does is call
> memory_region_to_address_space() to get the AddressSpace* anyway.)
> 
> and our new function is
>  MemoryRegionSection address_space_find_region(AddressSpace *as,
>                                                MemoryRegion *mr);
> 
> ?
> 
> (we don't have to do the tidyup first, but the new function
> signature makes more sense viewed in the light of the others)

I can look at it for 1.6, since I've already forward ported Avi's IOMMU
patches and added s/DMAContext/AddressSpace/ on top.

What about 1.5 though?

Paolo
Peter Maydell April 23, 2013, 10 p.m. UTC | #10
On 23 April 2013 22:46, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 23/04/2013 23:39, Peter Maydell ha scritto:

>> and our new function is
>>  MemoryRegionSection address_space_find_region(AddressSpace *as,
>>                                                MemoryRegion *mr);
>>
>> ?
>>
>> (we don't have to do the tidyup first, but the new function
>> signature makes more sense viewed in the light of the others)
>
> I can look at it for 1.6, since I've already forward ported Avi's IOMMU
> patches and added s/DMAContext/AddressSpace/ on top.
>
> What about 1.5 though?

Ah, I was going to ask if this was a 1.5 series. If so, maybe
just add and use that new function? Anyway, I don't want to
bikeshed the function too much, as long as it takes an
AddressSpace* to search and lets the caller distinguish
"couldn't find the region" from valid actual addresses.

thanks
-- PMM
Blue Swirl April 25, 2013, 6:37 p.m. UTC | #11
On Tue, Apr 23, 2013 at 8:29 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> kvm/ioapic is relying on the fact that SysBus device
> maps mmio regions with offset counted from start of system memory.
> But if ioapic's region is moved to another sub-region which doesn't
> start at the beginning of system memory then using offset isn't correct.

But base_address in only initialized once, never changed. Does this
patch matter now?

The correct solution would be to track changes to APICBASE register at
PIIX3 chipset level and adjust mapping and KVM base also from there.

>
> Fix kvm/ioapic by providing and using helper function that returns
> absolute region address in respective address space.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> Note:
>   next patch "move IOAPIC to ICC bus" converts IOAPICs to ICCDevice
>   and breaks SysBus device assumption used by kvm/ioapic.
> ---
>  hw/i386/kvm/ioapic.c  |    2 +-
>  include/exec/memory.h |   10 ++++++++++
>  memory.c              |   11 +++++++++++
>  3 files changed, 22 insertions(+), 1 deletions(-)
>
> diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> index a3bd519..b80d41a 100644
> --- a/hw/i386/kvm/ioapic.c
> +++ b/hw/i386/kvm/ioapic.c
> @@ -96,7 +96,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
>
>      kioapic->id = s->id;
>      kioapic->ioregsel = s->ioregsel;
> -    kioapic->base_address = s->busdev.mmio[0].addr;
> +    kioapic->base_address = memory_region_get_address(&s->io_memory);
>      kioapic->irr = s->irr;
>      for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>          kioapic->redirtbl[i].bits = s->ioredtbl[i];
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 9e88320..954f353 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -706,6 +706,16 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
>  void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
>
>  /*
> + * memory_region_get_address: get current the address of a region
> + *
> + * Returns the absolute address of a region.
> + * May be used on regions that are currently part of a memory hierarchy.
> + *
> + * @mr: the region being queried
> + */
> +hwaddr memory_region_get_address(MemoryRegion *mr);
> +
> +/*
>   * memory_region_set_alias_offset: dynamically update a memory alias's offset
>   *
>   * Dynamically updates the offset into the target region that an alias points
> diff --git a/memory.c b/memory.c
> index 75ca281..0651050 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1413,6 +1413,17 @@ void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
>      memory_region_transaction_commit();
>  }
>
> +hwaddr memory_region_get_address(MemoryRegion *mr)
> +{
> +    hwaddr addr = mr->addr;
> +
> +    while (mr->parent) {
> +        mr = mr->parent;
> +        addr += mr->addr;
> +    }
> +    return addr;
> +}
> +
>  void memory_region_set_alias_offset(MemoryRegion *mr, hwaddr offset)
>  {
>      assert(mr->alias);
> --
> 1.7.1
>
Igor Mammedov April 26, 2013, 2:17 p.m. UTC | #12
On Thu, 25 Apr 2013 18:37:19 +0000
Blue Swirl <blauwirbel@gmail.com> wrote:

> On Tue, Apr 23, 2013 at 8:29 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> > kvm/ioapic is relying on the fact that SysBus device
> > maps mmio regions with offset counted from start of system memory.
> > But if ioapic's region is moved to another sub-region which doesn't
> > start at the beginning of system memory then using offset isn't correct.
> 
> But base_address in only initialized once, never changed. Does this
> patch matter now?
this code path is used on only at machine start-up but also on resets and
post migration to initialize IO-APIC. And unfortunately KVM API takes not
only base address but a bunch of other parameters in that ioctl, so splitting
it doesn't look feasible for 1.5. Patch is useful in aspect that it hides
direct access to parent's internals and without dangling SysBusDevice
internals it allows to convert kvm/ioapic to ICCDevice [next patch in series],
leaving code a bit cleaner.

BTW:
there is an improved patch:
http://permalink.gmane.org/gmane.comp.emulators.qemu/208512

that instead of introducing a new function improves current
memory_region_find() API.

> The correct solution would be to track changes to APICBASE register at
> PIIX3 chipset level and adjust mapping and KVM base also from there.
That we would probably need a change in KVM API first to allow set
IO-APIC's base separately.
 
> 
> >
> > Fix kvm/ioapic by providing and using helper function that returns
> > absolute region address in respective address space.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > Note:
> >   next patch "move IOAPIC to ICC bus" converts IOAPICs to ICCDevice
> >   and breaks SysBus device assumption used by kvm/ioapic.
> > ---
> >  hw/i386/kvm/ioapic.c  |    2 +-
> >  include/exec/memory.h |   10 ++++++++++
> >  memory.c              |   11 +++++++++++
> >  3 files changed, 22 insertions(+), 1 deletions(-)
> >
> > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > index a3bd519..b80d41a 100644
> > --- a/hw/i386/kvm/ioapic.c
> > +++ b/hw/i386/kvm/ioapic.c
> > @@ -96,7 +96,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> >
> >      kioapic->id = s->id;
> >      kioapic->ioregsel = s->ioregsel;
> > -    kioapic->base_address = s->busdev.mmio[0].addr;
> > +    kioapic->base_address = memory_region_get_address(&s->io_memory);
> >      kioapic->irr = s->irr;
> >      for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> >          kioapic->redirtbl[i].bits = s->ioredtbl[i];
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 9e88320..954f353 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -706,6 +706,16 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
> >  void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
> >
> >  /*
> > + * memory_region_get_address: get current the address of a region
> > + *
> > + * Returns the absolute address of a region.
> > + * May be used on regions that are currently part of a memory hierarchy.
> > + *
> > + * @mr: the region being queried
> > + */
> > +hwaddr memory_region_get_address(MemoryRegion *mr);
> > +
> > +/*
> >   * memory_region_set_alias_offset: dynamically update a memory alias's offset
> >   *
> >   * Dynamically updates the offset into the target region that an alias points
> > diff --git a/memory.c b/memory.c
> > index 75ca281..0651050 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1413,6 +1413,17 @@ void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
> >      memory_region_transaction_commit();
> >  }
> >
> > +hwaddr memory_region_get_address(MemoryRegion *mr)
> > +{
> > +    hwaddr addr = mr->addr;
> > +
> > +    while (mr->parent) {
> > +        mr = mr->parent;
> > +        addr += mr->addr;
> > +    }
> > +    return addr;
> > +}
> > +
> >  void memory_region_set_alias_offset(MemoryRegion *mr, hwaddr offset)
> >  {
> >      assert(mr->alias);
> > --
> > 1.7.1
> >
Blue Swirl April 26, 2013, 5:35 p.m. UTC | #13
On Fri, Apr 26, 2013 at 2:17 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> On Thu, 25 Apr 2013 18:37:19 +0000
> Blue Swirl <blauwirbel@gmail.com> wrote:
>
>> On Tue, Apr 23, 2013 at 8:29 AM, Igor Mammedov <imammedo@redhat.com> wrote:
>> > kvm/ioapic is relying on the fact that SysBus device
>> > maps mmio regions with offset counted from start of system memory.
>> > But if ioapic's region is moved to another sub-region which doesn't
>> > start at the beginning of system memory then using offset isn't correct.
>>
>> But base_address in only initialized once, never changed. Does this
>> patch matter now?
> this code path is used on only at machine start-up but also on resets and
> post migration to initialize IO-APIC. And unfortunately KVM API takes not
> only base address but a bunch of other parameters in that ioctl, so splitting
> it doesn't look feasible for 1.5. Patch is useful in aspect that it hides
> direct access to parent's internals and without dangling SysBusDevice
> internals it allows to convert kvm/ioapic to ICCDevice [next patch in series],
> leaving code a bit cleaner.

But as the address can't be changed (yet), the entire patch could be simply:
-    kioapic->base_address = s->busdev.mmio[0].addr;
+    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;

Later, when it's possible to change the address via PIIX3 registers,
we can adjust the base and pass that properly to kioapic and on to
KVM.

Resolving the base address every time when kvm_ioapic_put() is called
is also less efficient, assuming of course that the base address
changes less often than the KVM ioctl is used.

>
> BTW:
> there is an improved patch:
> http://permalink.gmane.org/gmane.comp.emulators.qemu/208512
>
> that instead of introducing a new function improves current
> memory_region_find() API.
>
>> The correct solution would be to track changes to APICBASE register at
>> PIIX3 chipset level and adjust mapping and KVM base also from there.
> That we would probably need a change in KVM API first to allow set
> IO-APIC's base separately.
>
>>
>> >
>> > Fix kvm/ioapic by providing and using helper function that returns
>> > absolute region address in respective address space.
>> >
>> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>> > ---
>> > Note:
>> >   next patch "move IOAPIC to ICC bus" converts IOAPICs to ICCDevice
>> >   and breaks SysBus device assumption used by kvm/ioapic.
>> > ---
>> >  hw/i386/kvm/ioapic.c  |    2 +-
>> >  include/exec/memory.h |   10 ++++++++++
>> >  memory.c              |   11 +++++++++++
>> >  3 files changed, 22 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
>> > index a3bd519..b80d41a 100644
>> > --- a/hw/i386/kvm/ioapic.c
>> > +++ b/hw/i386/kvm/ioapic.c
>> > @@ -96,7 +96,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
>> >
>> >      kioapic->id = s->id;
>> >      kioapic->ioregsel = s->ioregsel;
>> > -    kioapic->base_address = s->busdev.mmio[0].addr;
>> > +    kioapic->base_address = memory_region_get_address(&s->io_memory);
>> >      kioapic->irr = s->irr;
>> >      for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>> >          kioapic->redirtbl[i].bits = s->ioredtbl[i];
>> > diff --git a/include/exec/memory.h b/include/exec/memory.h
>> > index 9e88320..954f353 100644
>> > --- a/include/exec/memory.h
>> > +++ b/include/exec/memory.h
>> > @@ -706,6 +706,16 @@ void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
>> >  void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
>> >
>> >  /*
>> > + * memory_region_get_address: get current the address of a region
>> > + *
>> > + * Returns the absolute address of a region.
>> > + * May be used on regions that are currently part of a memory hierarchy.
>> > + *
>> > + * @mr: the region being queried
>> > + */
>> > +hwaddr memory_region_get_address(MemoryRegion *mr);
>> > +
>> > +/*
>> >   * memory_region_set_alias_offset: dynamically update a memory alias's offset
>> >   *
>> >   * Dynamically updates the offset into the target region that an alias points
>> > diff --git a/memory.c b/memory.c
>> > index 75ca281..0651050 100644
>> > --- a/memory.c
>> > +++ b/memory.c
>> > @@ -1413,6 +1413,17 @@ void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
>> >      memory_region_transaction_commit();
>> >  }
>> >
>> > +hwaddr memory_region_get_address(MemoryRegion *mr)
>> > +{
>> > +    hwaddr addr = mr->addr;
>> > +
>> > +    while (mr->parent) {
>> > +        mr = mr->parent;
>> > +        addr += mr->addr;
>> > +    }
>> > +    return addr;
>> > +}
>> > +
>> >  void memory_region_set_alias_offset(MemoryRegion *mr, hwaddr offset)
>> >  {
>> >      assert(mr->alias);
>> > --
>> > 1.7.1
>> >
>
>
> --
> Regards,
>   Igor
Igor Mammedov April 26, 2013, 5:46 p.m. UTC | #14
On Fri, 26 Apr 2013 17:35:41 +0000
Blue Swirl <blauwirbel@gmail.com> wrote:

> On Fri, Apr 26, 2013 at 2:17 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> > On Thu, 25 Apr 2013 18:37:19 +0000
> > Blue Swirl <blauwirbel@gmail.com> wrote:
> >
> >> On Tue, Apr 23, 2013 at 8:29 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> >> > kvm/ioapic is relying on the fact that SysBus device
> >> > maps mmio regions with offset counted from start of system memory.
> >> > But if ioapic's region is moved to another sub-region which doesn't
> >> > start at the beginning of system memory then using offset isn't correct.
> >>
> >> But base_address in only initialized once, never changed. Does this
> >> patch matter now?
> > this code path is used on only at machine start-up but also on resets and
> > post migration to initialize IO-APIC. And unfortunately KVM API takes not
> > only base address but a bunch of other parameters in that ioctl, so splitting
> > it doesn't look feasible for 1.5. Patch is useful in aspect that it hides
> > direct access to parent's internals and without dangling SysBusDevice
> > internals it allows to convert kvm/ioapic to ICCDevice [next patch in series],
> > leaving code a bit cleaner.
> 
> But as the address can't be changed (yet), the entire patch could be simply:
> -    kioapic->base_address = s->busdev.mmio[0].addr;
> +    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;
It's a bit fragile, but that for sure simpler and can work.

Jan, Paolo,
Are you ok with this approach? 

> 
> Later, when it's possible to change the address via PIIX3 registers,
> we can adjust the base and pass that properly to kioapic and on to
> KVM.
> 
> Resolving the base address every time when kvm_ioapic_put() is called
> is also less efficient, assuming of course that the base address
> changes less often than the KVM ioctl is used.
> 
> >
> > BTW:
> > there is an improved patch:
> > http://permalink.gmane.org/gmane.comp.emulators.qemu/208512
> >
> > that instead of introducing a new function improves current
> > memory_region_find() API.
> >
> >> The correct solution would be to track changes to APICBASE register at
> >> PIIX3 chipset level and adjust mapping and KVM base also from there.
> > That we would probably need a change in KVM API first to allow set
> > IO-APIC's base separately.
> >
Paolo Bonzini April 26, 2013, 10:13 p.m. UTC | #15
Il 26/04/2013 19:46, Igor Mammedov ha scritto:
>> > But as the address can't be changed (yet), the entire patch could be simply:
>> > -    kioapic->base_address = s->busdev.mmio[0].addr;
>> > +    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;
> It's a bit fragile, but that for sure simpler and can work.
> 
> Jan, Paolo,
> Are you ok with this approach? 
> 

I think extending memory_region_find is a good idea anyway, and at this
point I don't see a reason to do the above change...

Paolo
Blue Swirl April 27, 2013, 10:09 a.m. UTC | #16
On Fri, Apr 26, 2013 at 10:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 26/04/2013 19:46, Igor Mammedov ha scritto:
>>> > But as the address can't be changed (yet), the entire patch could be simply:
>>> > -    kioapic->base_address = s->busdev.mmio[0].addr;
>>> > +    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;
>> It's a bit fragile, but that for sure simpler and can work.
>>
>> Jan, Paolo,
>> Are you ok with this approach?
>>
>
> I think extending memory_region_find is a good idea anyway, and at this
> point I don't see a reason to do the above change...

The reasoning was in the part that Igor cut off:

"Later, when it's possible to change the address via PIIX3 registers,
we can adjust the base and pass that properly to kioapic and on to
KVM.

Resolving the base address every time when kvm_ioapic_put() is called
is also less efficient, assuming of course that the base address
changes less often than the KVM ioctl is used."

I think the patch is a bit flawed. If the guest maps something else on
top of IOAPIC, like LAPIC (which should be in CPU specific address
spaces, but for now it lives in the global system memory space), the
guest could trigger the abort() by resetting the system.

>
> Paolo
Paolo Bonzini April 27, 2013, 12:12 p.m. UTC | #17
Il 27/04/2013 12:09, Blue Swirl ha scritto:
> On Fri, Apr 26, 2013 at 10:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 26/04/2013 19:46, Igor Mammedov ha scritto:
>>>>> But as the address can't be changed (yet), the entire patch could be simply:
>>>>> -    kioapic->base_address = s->busdev.mmio[0].addr;
>>>>> +    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;
>>> It's a bit fragile, but that for sure simpler and can work.
>>>
>>> Jan, Paolo,
>>> Are you ok with this approach?
>>>
>>
>> I think extending memory_region_find is a good idea anyway, and at this
>> point I don't see a reason to do the above change...
> 
> The reasoning was in the part that Igor cut off:
> 
> "Later, when it's possible to change the address via PIIX3 registers,
> we can adjust the base and pass that properly to kioapic and on to
> KVM.
> 
> Resolving the base address every time when kvm_ioapic_put() is called
> is also less efficient, assuming of course that the base address
> changes less often than the KVM ioctl is used."
> 
> I think the patch is a bit flawed. If the guest maps something else on
> top of IOAPIC, like LAPIC (which should be in CPU specific address
> spaces, but for now it lives in the global system memory space), the
> guest could trigger the abort() by resetting the system.

The questions are, in order of importance:

(1) what privileges would this require in the guest?  Answer: a lot.

(2) is this likely to happen by chance?  Answer: no, not at all.

(3) is there a workaround?  Answer: yes, disable in-kernel irqchip.

Simply setting IO_APIC_DEFAULT_ADDRESS is also flawed in my opinion.
I'm not sure the in-kernel irqchip handles correctly an overlap between
the IOAPIC and LAPIC regions, maybe an abort is predictable after all.

Paolo
Blue Swirl April 27, 2013, 8:57 p.m. UTC | #18
On Sat, Apr 27, 2013 at 12:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 27/04/2013 12:09, Blue Swirl ha scritto:
>> On Fri, Apr 26, 2013 at 10:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 26/04/2013 19:46, Igor Mammedov ha scritto:
>>>>>> But as the address can't be changed (yet), the entire patch could be simply:
>>>>>> -    kioapic->base_address = s->busdev.mmio[0].addr;
>>>>>> +    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;
>>>> It's a bit fragile, but that for sure simpler and can work.
>>>>
>>>> Jan, Paolo,
>>>> Are you ok with this approach?
>>>>
>>>
>>> I think extending memory_region_find is a good idea anyway, and at this
>>> point I don't see a reason to do the above change...
>>
>> The reasoning was in the part that Igor cut off:
>>
>> "Later, when it's possible to change the address via PIIX3 registers,
>> we can adjust the base and pass that properly to kioapic and on to
>> KVM.
>>
>> Resolving the base address every time when kvm_ioapic_put() is called
>> is also less efficient, assuming of course that the base address
>> changes less often than the KVM ioctl is used."
>>
>> I think the patch is a bit flawed. If the guest maps something else on
>> top of IOAPIC, like LAPIC (which should be in CPU specific address
>> spaces, but for now it lives in the global system memory space), the
>> guest could trigger the abort() by resetting the system.
>
> The questions are, in order of importance:
>
> (1) what privileges would this require in the guest?  Answer: a lot.
>
> (2) is this likely to happen by chance?  Answer: no, not at all.
>
> (3) is there a workaround?  Answer: yes, disable in-kernel irqchip.

These questions ask if there is a risk of benevolent guests performing
these activities and I agree that the chances are close to zero.

But the interesting question is to ask if a malevolent guest can bring
down a VM uncontrollably this way and I think it only needs a few
elevated privileges in a guest to do this.

The fix is to avoid abort(), which is a separate issue to whether the
address base should be resolved for each KVM ioctl or not.

>
> Simply setting IO_APIC_DEFAULT_ADDRESS is also flawed in my opinion.
> I'm not sure the in-kernel irqchip handles correctly an overlap between
> the IOAPIC and LAPIC regions, maybe an abort is predictable after all.

At least the guest needs to be stopped. Perhaps we should have a
common function which does this and logs the guest error so we can
start replacing calls to abort() with it.

>
> Paolo
Paolo Bonzini April 29, 2013, 9:49 a.m. UTC | #19
Il 27/04/2013 22:57, Blue Swirl ha scritto:
>> The questions are, in order of importance:
>>
>> (1) what privileges would this require in the guest?  Answer: a lot.
>>
>> (2) is this likely to happen by chance?  Answer: no, not at all.
>>
>> (3) is there a workaround?  Answer: yes, disable in-kernel irqchip.
> 
> These questions ask if there is a risk of benevolent guests performing
> these activities and I agree that the chances are close to zero.
> 
> But the interesting question is to ask if a malevolent guest can bring
> down a VM uncontrollably this way and I think it only needs a few
> elevated privileges in a guest to do this.

If you have them, isn't it simpler to just turn off the VM (using APM or
ACPI)?  Also, killing your guest is not a very interesting thing to do
once you've gotten elevated privileges. ;)

>> Simply setting IO_APIC_DEFAULT_ADDRESS is also flawed in my opinion.
>> I'm not sure the in-kernel irqchip handles correctly an overlap between
>> the IOAPIC and LAPIC regions, maybe an abort is predictable after all.
> 
> At least the guest needs to be stopped. Perhaps we should have a
> common function which does this and logs the guest error so we can
> start replacing calls to abort() with it.

Yes, that's a good idea.  We can reuse the internal error runstate for that.

Paolo
Igor Mammedov April 29, 2013, 9:55 a.m. UTC | #20
On Sat, 27 Apr 2013 20:57:26 +0000
Blue Swirl <blauwirbel@gmail.com> wrote:

> On Sat, Apr 27, 2013 at 12:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 27/04/2013 12:09, Blue Swirl ha scritto:
> >> On Fri, Apr 26, 2013 at 10:13 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>> Il 26/04/2013 19:46, Igor Mammedov ha scritto:
> >>>>>> But as the address can't be changed (yet), the entire patch could be simply:
> >>>>>> -    kioapic->base_address = s->busdev.mmio[0].addr;
> >>>>>> +    kioapic->base_address = IO_APIC_DEFAULT_ADDRESS;
> >>>> It's a bit fragile, but that for sure simpler and can work.
> >>>>
> >>>> Jan, Paolo,
> >>>> Are you ok with this approach?
> >>>>
> >>>
> >>> I think extending memory_region_find is a good idea anyway, and at this
> >>> point I don't see a reason to do the above change...
> >>
> >> The reasoning was in the part that Igor cut off:
> >>
> >> "Later, when it's possible to change the address via PIIX3 registers,
> >> we can adjust the base and pass that properly to kioapic and on to
> >> KVM.
> >>
> >> Resolving the base address every time when kvm_ioapic_put() is called
> >> is also less efficient, assuming of course that the base address
> >> changes less often than the KVM ioctl is used."
> >>
> >> I think the patch is a bit flawed. If the guest maps something else on
> >> top of IOAPIC, like LAPIC (which should be in CPU specific address
> >> spaces, but for now it lives in the global system memory space), the
> >> guest could trigger the abort() by resetting the system.
> >
> > The questions are, in order of importance:
> >
> > (1) what privileges would this require in the guest?  Answer: a lot.
> >
> > (2) is this likely to happen by chance?  Answer: no, not at all.
> >
> > (3) is there a workaround?  Answer: yes, disable in-kernel irqchip.
> 
> These questions ask if there is a risk of benevolent guests performing
> these activities and I agree that the chances are close to zero.
> 
> But the interesting question is to ask if a malevolent guest can bring
> down a VM uncontrollably this way and I think it only needs a few
> elevated privileges in a guest to do this.
> 
> The fix is to avoid abort(), which is a separate issue to whether the
> address base should be resolved for each KVM ioctl or not.
> 
> >
> > Simply setting IO_APIC_DEFAULT_ADDRESS is also flawed in my opinion.
> > I'm not sure the in-kernel irqchip handles correctly an overlap between
> > the IOAPIC and LAPIC regions, maybe an abort is predictable after all.
> 
> At least the guest needs to be stopped. Perhaps we should have a
> common function which does this and logs the guest error so we can
> start replacing calls to abort() with it.
> 
> >
> > Paolo
> 

It looks like discussion got deviated from what patch does to another issue.

this patch doesn't address/change the way how/when base_address should be
set/updated but it has it's benefits as well:
 - removes/cleanups access to private field of parent, which allows to convert
   it to non SysBusDevice
 - extended memory_region_find() opens venue for cleaning-up/re-factoring
   devices that use framebuffer which are forced currently to access
   system_address_space directly.
diff mbox

Patch

diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index a3bd519..b80d41a 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -96,7 +96,7 @@  static void kvm_ioapic_put(IOAPICCommonState *s)
 
     kioapic->id = s->id;
     kioapic->ioregsel = s->ioregsel;
-    kioapic->base_address = s->busdev.mmio[0].addr;
+    kioapic->base_address = memory_region_get_address(&s->io_memory);
     kioapic->irr = s->irr;
     for (i = 0; i < IOAPIC_NUM_PINS; i++) {
         kioapic->redirtbl[i].bits = s->ioredtbl[i];
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9e88320..954f353 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -706,6 +706,16 @@  void memory_region_set_enabled(MemoryRegion *mr, bool enabled);
 void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
 
 /*
+ * memory_region_get_address: get current the address of a region
+ *
+ * Returns the absolute address of a region.
+ * May be used on regions that are currently part of a memory hierarchy.
+ *
+ * @mr: the region being queried
+ */
+hwaddr memory_region_get_address(MemoryRegion *mr);
+
+/*
  * memory_region_set_alias_offset: dynamically update a memory alias's offset
  *
  * Dynamically updates the offset into the target region that an alias points
diff --git a/memory.c b/memory.c
index 75ca281..0651050 100644
--- a/memory.c
+++ b/memory.c
@@ -1413,6 +1413,17 @@  void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
     memory_region_transaction_commit();
 }
 
+hwaddr memory_region_get_address(MemoryRegion *mr)
+{
+    hwaddr addr = mr->addr;
+
+    while (mr->parent) {
+        mr = mr->parent;
+        addr += mr->addr;
+    }
+    return addr;
+}
+
 void memory_region_set_alias_offset(MemoryRegion *mr, hwaddr offset)
 {
     assert(mr->alias);