Patchwork [RFC,ppc-next,3/6] memory: add memory_region_to_address()

Submitter Scott Wood
Date Feb. 14, 2013, 6:31 a.m.
Message ID <1360823521-32306-4-git-send-email-scottwood@freescale.com>
Permalink /patch/220385/
State New

Comments

Scott Wood - Feb. 14, 2013, 6:31 a.m.
This is useful for when a user of the memory region API needs to
communicate the absolute bus address to something outside QEMU
(in particular, KVM).

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 include/exec/memory.h |    9 +++++++++
 memory.c              |   38 ++++++++++++++++++++++++++++++++++----
 2 files changed, 43 insertions(+), 4 deletions(-)
Alexander Graf - March 21, 2013, 8:31 a.m.
On 14.02.2013, at 07:31, Scott Wood wrote:

> This is useful for when a user of the memory region API needs to
> communicate the absolute bus address to something outside QEMU
> (in particular, KVM).
> 
> Signed-off-by: Scott Wood <scottwood@freescale.com>

Peter, how does the VGIC implementation handle this?


Alex

> ---
> include/exec/memory.h |    9 +++++++++
> memory.c              |   38 ++++++++++++++++++++++++++++++++++----
> 2 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 2322732..b800391 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -892,6 +892,15 @@ void *address_space_map(AddressSpace *as, hwaddr addr,
> void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
>                          int is_write, hwaddr access_len);
> 
> +/* memory_region_to_address: Find the full address of the start of the
> + *      given #MemoryRegion, ignoring aliases.  There is no guarantee
> + *      that the #MemoryRegion is actually visible at this address, if
> + *      there are overlapping regions.
> + *
> + * @mr: #MemoryRegion being queried
> + * @asp: if non-NULL, returns the #AddressSpace @mr is mapped in, if any
> + */
> +hwaddr memory_region_to_address(MemoryRegion *mr, AddressSpace **asp);
> 
> #endif
> 
> diff --git a/memory.c b/memory.c
> index cd7d5e0..0099f12 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -453,21 +453,51 @@ const IORangeOps memory_region_iorange_ops = {
>     .destructor = memory_region_iorange_destructor,
> };
> 
> -static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
> +static AddressSpace *memory_region_root_to_address_space(MemoryRegion *mr)
> {
>     AddressSpace *as;
> 
> -    while (mr->parent) {
> -        mr = mr->parent;
> -    }
>     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>         if (mr == as->root) {
>             return as;
>         }
>     }
> +
> +    return NULL;
> +}
> +
> +static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
> +{
> +    AddressSpace *as;
> +
> +    while (mr->parent) {
> +        mr = mr->parent;
> +    }
> +
> +    as = memory_region_root_to_address_space(mr);
> +    if (as) {
> +        return as;
> +    }
> +
>     abort();
> }
> 
> +hwaddr memory_region_to_address(MemoryRegion *mr, AddressSpace **asp)
> +{
> +    hwaddr addr = mr->addr;
> +
> +    while (mr->parent) {
> +        mr = mr->parent;
> +        addr += mr->addr;
> +    }
> +
> +    if (asp) {
> +        *asp = memory_region_root_to_address_space(mr);
> +    }
> +
> +    return addr;
> +}
> +
> /* Render a memory region into the global view.  Ranges in @view obscure
>  * ranges in @mr.
>  */
> -- 
> 1.7.9.5
> 
>
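
For readers skimming the diff, the address computation in
memory_region_to_address() above can be illustrated outside QEMU. The
following is a minimal standalone sketch, not the patch itself: the
Region type and region_to_address() are hypothetical stand-ins that
keep only the parent pointer and the offset within the parent. Like the
patch, the walk ignores aliases and overlapping regions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Hypothetical, simplified stand-in for a memory region: only the two
 * fields that the parent walk needs. */
typedef struct Region {
    struct Region *parent;  /* enclosing container, NULL for a root */
    hwaddr addr;            /* offset within the parent */
} Region;

/* Sum the offsets from the region up to its root. If rootp is non-NULL,
 * the root that was reached is stored there, so the caller can look up
 * which address space (if any) uses it as its root. */
static hwaddr region_to_address(const Region *r, const Region **rootp)
{
    hwaddr addr = r->addr;

    while (r->parent) {
        r = r->parent;
        addr += r->addr;
    }

    if (rootp) {
        *rootp = r;
    }

    return addr;
}
```

With a device at offset 0x1000 inside a container that sits at
0x2c000000 in the root (illustrative numbers only), the walk yields
0x2c001000.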
Peter Maydell - March 21, 2013, 10:53 a.m.
On 21 March 2013 08:31, Alexander Graf <agraf@suse.de> wrote:
> On 14.02.2013, at 07:31, Scott Wood wrote:
>> This is useful for when a user of the memory region API needs to
>> communicate the absolute bus address to something outside QEMU
>> (in particular, KVM).
>>
>> Signed-off-by: Scott Wood <scottwood@freescale.com>
>
> Peter, how does the VGIC implementation handle this?

Check kvm_arm_register_device() in target-arm/kvm.c. Basically
the VGIC device model calls this function to say "tell the kernel
where this MemoryRegion is in the system address space, when it
eventually gets mapped". The code in kvm.c uses the memory system's
Notifier API to get a callback when the region is mapped into
an address space, which it uses to track the offset in the
address space. Finally, we use a machine init notifier so that
just before everything finally starts we can make the KVM ioctls
to say "here is where everything lives".

I think this is a pretty neat way of doing it because it means
neither the interrupt controller device nor the board model
really need to care about the kernel being told where things
are mapped; it's all abstracted out into kvm.c. If your
interrupt controller can be moved around at runtime that's
probably also handlable, but the ARM code just unregisters its
notifiers at machine init because the GIC can't move.

(I think the code assumes the device only gets mapped into
one address space; this could easily be fixed if it's not true
at some point in the future.)
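
The three steps described above (the device registers a region, a
listener callback records where it was mapped, and a machine-init hook
reports everything to the kernel) can be sketched in plain C. This is
only a model of the flow under simplifying assumptions, not the code in
target-arm/kvm.c: Registry, DeviceRecord and the registry_*() functions
are hypothetical names, and the final ioctl is replaced by copying the
mapped records into an output array.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVICES 8
#define ADDR_UNMAPPED UINT64_MAX

/* Hypothetical record: one region whose address the kernel wants. */
typedef struct {
    const void *region;  /* identity of the memory region */
    uint64_t dev_id;     /* kernel-side ID for this region */
    uint64_t addr;       /* last address seen, or ADDR_UNMAPPED */
} DeviceRecord;

typedef struct {
    DeviceRecord devs[MAX_DEVICES];
    size_t count;
} Registry;

/* Step 1: a device model asks for its region to be tracked.
 * Returns 0 on success, -1 if the registry is full. */
static int registry_register(Registry *reg, const void *region,
                             uint64_t dev_id)
{
    if (reg->count == MAX_DEVICES) {
        return -1;
    }
    DeviceRecord *d = &reg->devs[reg->count++];
    d->region = region;
    d->dev_id = dev_id;
    d->addr = ADDR_UNMAPPED;
    return 0;
}

/* Step 2: listener callback, invoked when a region appears in the
 * address space. Regions nobody registered are ignored. */
static void registry_region_added(Registry *reg, const void *region,
                                  uint64_t offset_in_address_space)
{
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->devs[i].region == region) {
            reg->devs[i].addr = offset_in_address_space;
        }
    }
}

/* Step 3: machine-init hook. Copies every mapped record to out (this
 * stands in for the ioctl), empties the registry, and returns the
 * number of records reported. Unmapped records are skipped. */
static size_t registry_flush(Registry *reg, DeviceRecord *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < reg->count; i++) {
        if (reg->devs[i].addr != ADDR_UNMAPPED && n < cap) {
            out[n++] = reg->devs[i];
        }
    }
    reg->count = 0;
    return n;
}
```

The device model only ever calls registry_register(); it never sees
the address, which is the property argued for in this message.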

thanks
-- PMM
Alexander Graf - March 21, 2013, 10:59 a.m.
On 21.03.2013, at 11:53, Peter Maydell wrote:

> On 21 March 2013 08:31, Alexander Graf <agraf@suse.de> wrote:
>> On 14.02.2013, at 07:31, Scott Wood wrote:
>>> This is useful for when a user of the memory region API needs to
>>> communicate the absolute bus address to something outside QEMU
>>> (in particular, KVM).
>>> 
>>> Signed-off-by: Scott Wood <scottwood@freescale.com>
>> 
>> Peter, how does the VGIC implementation handle this?
> 
> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
> the VGIC device model calls this function to say "tell the kernel
> where this MemoryRegion is in the system address space, when it
> eventually gets mapped". The code in kvm.c uses the memory system's
> Notifier API to get a callback when the region is mapped into
> an address space, which it uses to track the offset in the
> address space. Finally, we use a machine init notifier so that
> just before everything finally starts we can make the KVM ioctls
> to say "here is where everything lives".

Same thing here. The question is how the kvm-vgic code in QEMU
finds out where it got mapped to. Scott adds this patch to do
this, but I'd assume you have some other way :)


Alex

> 
> I think this is a pretty neat way of doing it because it means
> neither the interrupt controller device nor the board model
> really need to care about the kernel being told where things
> are mapped; it's all abstracted out into kvm.c. If your
> interrupt controller can be moved around at runtime that's
> probably also handlable, but the ARM code just unregisters its
> notifiers at machine init because the GIC can't move.
> 
> (I think the code assumes the device only gets mapped into
> one address space; this could easily be fixed if it's not true
> at some point in the future.)
> 
> thanks
> -- PMM
Peter Maydell - March 21, 2013, 11:01 a.m.
On 21 March 2013 10:59, Alexander Graf <agraf@suse.de> wrote:
> On 21.03.2013, at 11:53, Peter Maydell wrote:
>> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
>> the VGIC device model calls this function to say "tell the kernel
>> where this MemoryRegion is in the system address space, when it
>> eventually gets mapped". The code in kvm.c uses the memory system's
>> Notifier API to get a callback when the region is mapped into
>> an address space, which it uses to track the offset in the
>> address space. Finally, we use a machine init notifier so that
>> just before everything finally starts we can make the KVM ioctls
>> to say "here is where everything lives".
>
> Same thing here. The question is how the kvm-vgic code in QEMU
> finds out where it got mapped to. Scott adds this patch to do
> this, but I'd assume you have some other way :)

Hmm? The kvm-vgic code in QEMU doesn't need to know where it
lives. We have to tell the kernel so it can map its bits of
registers in at the right place, that's all.

-- PMM
Alexander Graf - March 21, 2013, 11:05 a.m.
On 21.03.2013, at 12:01, Peter Maydell wrote:

> On 21 March 2013 10:59, Alexander Graf <agraf@suse.de> wrote:
>> On 21.03.2013, at 11:53, Peter Maydell wrote:
>>> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
>>> the VGIC device model calls this function to say "tell the kernel
>>> where this MemoryRegion is in the system address space, when it
>>> eventually gets mapped". The code in kvm.c uses the memory system's
>>> Notifier API to get a callback when the region is mapped into
>>> an address space, which it uses to track the offset in the
>>> address space. Finally, we use a machine init notifier so that
>>> just before everything finally starts we can make the KVM ioctls
>>> to say "here is where everything lives".
>> 
>> Same thing here. The question is how the kvm-vgic code in QEMU
>> finds out where it got mapped to. Scott adds this patch to do
>> this, but I'd assume you have some other way :)
> 
> Hmm? The kvm-vgic code in QEMU doesn't need to know where it
> lives. We have to tell the kernel so it can map its bits of
> registers in at the right place, that's all.

The kvm-vgic code in QEMU needs to tell the kernel, no? For
that, it needs to know what to tell the kernel.

This patch adds a function that allows kvm-openpic to fetch its
base flat address from the MemoryListener. I was wondering whether
either this patch is superfluous or you guys had an awkward
MemoryListener handler :)


Alex
Peter Maydell - March 21, 2013, 11:09 a.m.
On 21 March 2013 11:05, Alexander Graf <agraf@suse.de> wrote:
>
> On 21.03.2013, at 12:01, Peter Maydell wrote:
>
>> On 21 March 2013 10:59, Alexander Graf <agraf@suse.de> wrote:
>>> On 21.03.2013, at 11:53, Peter Maydell wrote:
>>>> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
>>>> the VGIC device model calls this function to say "tell the kernel
>>>> where this MemoryRegion is in the system address space, when it
>>>> eventually gets mapped". The code in kvm.c uses the memory system's
>>>> Notifier API to get a callback when the region is mapped into
>>>> an address space, which it uses to track the offset in the
>>>> address space. Finally, we use a machine init notifier so that
>>>> just before everything finally starts we can make the KVM ioctls
>>>> to say "here is where everything lives".
>>>
>>> Same thing here. The question is how the kvm-vgic code in QEMU
>>> finds out where it got mapped to. Scott adds this patch to do
>>> this, but I'd assume you have some other way :)
>>
>> Hmm? The kvm-vgic code in QEMU doesn't need to know where it
>> lives. We have to tell the kernel so it can map its bits of
>> registers in at the right place, that's all.
>
> The kvm-vgic code in QEMU needs to tell the kernel, no? For
> that, it needs to know what to tell the kernel.

No. As I explained earlier, all the kvm-vgic code needs to do
is call kvm_arm_register_device(). That code in kvm.c then takes
care of telling the kernel. hw/kvm/arm_gic.c itself never knows or
needs to know where it's mapped. This is the whole point of the
mechanism involving notifiers.

> This patch adds a function that allows kvm-openpic to fetch its
> base flat address from the MemoryListener.

This sounds to me like the wrong way to do it -- it's board
models that decide where devices are mapped and the device
code itself shouldn't have to know where it has been mapped.

-- PMM
Alexander Graf - March 21, 2013, 11:14 a.m.
On 21.03.2013, at 12:09, Peter Maydell wrote:

> On 21 March 2013 11:05, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 21.03.2013, at 12:01, Peter Maydell wrote:
>> 
>>> On 21 March 2013 10:59, Alexander Graf <agraf@suse.de> wrote:
>>>> On 21.03.2013, at 11:53, Peter Maydell wrote:
>>>>> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
>>>>> the VGIC device model calls this function to say "tell the kernel
>>>>> where this MemoryRegion is in the system address space, when it
>>>>> eventually gets mapped". The code in kvm.c uses the memory system's
>>>>> Notifier API to get a callback when the region is mapped into
>>>>> an address space, which it uses to track the offset in the
>>>>> address space. Finally, we use a machine init notifier so that
>>>>> just before everything finally starts we can make the KVM ioctls
>>>>> to say "here is where everything lives".
>>>> 
>>>> Same thing here. The question is how the kvm-vgic code in QEMU
>>>> finds out where it got mapped to. Scott adds this patch to do
>>>> this, but I'd assume you have some other way :)
>>> 
>>> Hmm? The kvm-vgic code in QEMU doesn't need to know where it
>>> lives. We have to tell the kernel so it can map its bits of
>>> registers in at the right place, that's all.
>> 
>> The kvm-vgic code in QEMU needs to tell the kernel, no? For
>> that, it needs to know what to tell the kernel.
> 
> No. As I explained earlier, all the kvm-vgic code needs to do
> is call kvm_arm_register_device(). That code in kvm.c then takes
> care of telling the kernel. hw/kvm/arm_gic.c itself never knows or
> needs to know where it's mapped. This is the whole point of the
> mechanism involving notifiers.

I fully disagree. Code that talks to the in-kernel device should
live in hw/kvm/<device>.c, not in some random target-XXX/kvm.c file.
Of course the board defines where the device gets mapped to, but
the communication with the in-kernel device bits should really be
contained to the device itself.

So this is the function that gets invoked on ARM:

static void kvm_arm_machine_init_done(Notifier *notifier, void *data)
{
    KVMDevice *kd, *tkd;

    memory_listener_unregister(&devlistener);
    QSLIST_FOREACH_SAFE(kd, &kvm_devices_head, entries, tkd) {
        if (kd->kda.addr != -1) {
            if (kvm_vm_ioctl(kvm_state, KVM_ARM_SET_DEVICE_ADDR,
                             &kd->kda) < 0) {
                fprintf(stderr, "KVM_ARM_SET_DEVICE_ADDRESS failed: %s\n",
                        strerror(errno));
                abort();
            }
        }
        g_free(kd);
    }
}

This only goes one level deep, right? So if you ever have to nest the
VGIC inside of another MemoryRegion, this will break, right?


Alex

> 
>> This patch adds a function that allows kvm-openpic to fetch its
>> base flat address from the MemoryListener.
> 
> This sounds to me like the wrong way to do it -- it's board
> models that decide where devices are mapped and the device
> code itself shouldn't have to know where it has been mapped.
> 
> -- PMM
Peter Maydell - March 21, 2013, 11:22 a.m.
On 21 March 2013 11:14, Alexander Graf <agraf@suse.de> wrote:
>
> On 21.03.2013, at 12:09, Peter Maydell wrote:
>
>> On 21 March 2013 11:05, Alexander Graf <agraf@suse.de> wrote:
>>>
>>> On 21.03.2013, at 12:01, Peter Maydell wrote:
>>>
>>>> On 21 March 2013 10:59, Alexander Graf <agraf@suse.de> wrote:
>>>>> On 21.03.2013, at 11:53, Peter Maydell wrote:
>>>>>> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
>>>>>> the VGIC device model calls this function to say "tell the kernel
>>>>>> where this MemoryRegion is in the system address space, when it
>>>>>> eventually gets mapped". The code in kvm.c uses the memory system's
>>>>>> Notifier API to get a callback when the region is mapped into
>>>>>> an address space, which it uses to track the offset in the
>>>>>> address space. Finally, we use a machine init notifier so that
>>>>>> just before everything finally starts we can make the KVM ioctls
>>>>>> to say "here is where everything lives".
>>>>>
>>>>> Same thing here. The question is how the kvm-vgic code in QEMU
>>>>> finds out where it got mapped to. Scott adds this patch to do
>>>>> this, but I'd assume you have some other way :)
>>>>
>>>> Hmm? The kvm-vgic code in QEMU doesn't need to know where it
>>>> lives. We have to tell the kernel so it can map its bits of
>>>> registers in at the right place, that's all.
>>>
>>> The kvm-vgic code in QEMU needs to tell the kernel, no? For
>>> that, it needs to know what to tell the kernel.
>>
>> No. As I explained earlier, all the kvm-vgic code needs to do
>> is call kvm_arm_register_device(). That code in kvm.c then takes
>> care of telling the kernel. hw/kvm/arm_gic.c itself never knows or
>> needs to know where it's mapped. This is the whole point of the
>> mechanism involving notifiers.
>
> I fully disagree. Code that talks to the in-kernel device should
> live in hw/kvm/<device>.c, not in some random target-XXX/kvm.c file.

The code in kvm.c is entirely generic -- it provides a mechanism
for a device to say "this memory region is kernel ID X and it will
want to know where it lives". The kvm.c code will work for any device
with a memory mapped region, whether it's the GIC or something else.

> Of course the board defines where the device gets mapped to, but
> the communication with the in-kernel device bits should really be
> contained to the device itself.

You're arguing that every device should implement its own set
of notifier functions so it can get called back when its memory
regions are finally mapped, just so it can make a non-device-specific
KVM ioctl? The obvious thing to do is abstract that functionality
out.

> So this is the function that gets invoked on ARM:
>
> static void kvm_arm_machine_init_done(Notifier *notifier, void *data)
> {
>     KVMDevice *kd, *tkd;
>
>     memory_listener_unregister(&devlistener);
>     QSLIST_FOREACH_SAFE(kd, &kvm_devices_head, entries, tkd) {
>         if (kd->kda.addr != -1) {
>             if (kvm_vm_ioctl(kvm_state, KVM_ARM_SET_DEVICE_ADDR,
>                              &kd->kda) < 0) {
>                 fprintf(stderr, "KVM_ARM_SET_DEVICE_ADDRESS failed: %s\n",
>                         strerror(errno));
>                 abort();
>             }
>         }
>         g_free(kd);
>     }
> }
>
> This only goes one level deep, right? So if you ever have to nest the
> VGIC inside of another MemoryRegion, this will break, right?

We already nest the VGIC inside another memory region (the a15mpcore
container), and it works fine. This function is just iterating through
"everything any device asked me to tell the kernel about".

-- PMM
Alexander Graf - March 21, 2013, 11:29 a.m.
On 21.03.2013, at 12:22, Peter Maydell wrote:

> On 21 March 2013 11:14, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 21.03.2013, at 12:09, Peter Maydell wrote:
>> 
>>> On 21 March 2013 11:05, Alexander Graf <agraf@suse.de> wrote:
>>>> 
>>>> On 21.03.2013, at 12:01, Peter Maydell wrote:
>>>> 
>>>>> On 21 March 2013 10:59, Alexander Graf <agraf@suse.de> wrote:
>>>>>> On 21.03.2013, at 11:53, Peter Maydell wrote:
>>>>>>> Check kvm_arm_register_device() in target-arm/kvm.c. Basically
>>>>>>> the VGIC device model calls this function to say "tell the kernel
>>>>>>> where this MemoryRegion is in the system address space, when it
>>>>>>> eventually gets mapped". The code in kvm.c uses the memory system's
>>>>>>> Notifier API to get a callback when the region is mapped into
>>>>>>> an address space, which it uses to track the offset in the
>>>>>>> address space. Finally, we use a machine init notifier so that
>>>>>>> just before everything finally starts we can make the KVM ioctls
>>>>>>> to say "here is where everything lives".
>>>>>> 
>>>>>> Same thing here. The question is how the kvm-vgic code in QEMU
>>>>>> finds out where it got mapped to. Scott adds this patch to do
>>>>>> this, but I'd assume you have some other way :)
>>>>> 
>>>>> Hmm? The kvm-vgic code in QEMU doesn't need to know where it
>>>>> lives. We have to tell the kernel so it can map its bits of
>>>>> registers in at the right place, that's all.
>>>> 
>>>> The kvm-vgic code in QEMU needs to tell the kernel, no? For
>>>> that, it needs to know what to tell the kernel.
>>> 
>>> No. As I explained earlier, all the kvm-vgic code needs to do
>>> is call kvm_arm_register_device(). That code in kvm.c then takes
>>> care of telling the kernel. hw/kvm/arm_gic.c itself never knows or
>>> needs to know where it's mapped. This is the whole point of the
>>> mechanism involving notifiers.
>> 
>> I fully disagree. Code that talks to the in-kernel device should
>> live in hw/kvm/<device>.c, not in some random target-XXX/kvm.c file.
> 
> The code in kvm.c is entirely generic -- it provides a mechanism
> for a device to say "this memory region is kernel ID X and it will
> want to know where it lives". The kvm.c code will work for any device
> with a memory mapped region, whether it's the GIC or something else.
> 
>> Of course the board defines where the device gets mapped to, but
>> the communication with the in-kernel device bits should really be
>> contained to the device itself.
> 
> You're arguing that every device should implement its own set
> of notifier functions so it can get called back when its memory
> regions are finally mapped, just so it can make a non-device-specific
> KVM ioctl? The obvious thing to do is abstract that functionality
> out.

What I'm arguing is that every device should look as if it were a
QEMU device. Devices that happen to live in KVM should still make a
significant effort to expose themselves to the board model as if they
were QEMU devices.

So yes, I think the device model should at least register the memory
listener, because only the device model knows what its memory regions
would map to in KVM's world. Not all devices have a single flat
region. Some have more than one.

Whether we have a helper function in (generic) kvm.c that can call a
(generic) ioctl to set a device's region X is a different matter. I'd
be open to that if it makes sense.

> 
>> So this is the function that gets invoked on ARM:
>> 
>> static void kvm_arm_machine_init_done(Notifier *notifier, void *data)
>> {
>>    KVMDevice *kd, *tkd;
>> 
>>    memory_listener_unregister(&devlistener);
>>    QSLIST_FOREACH_SAFE(kd, &kvm_devices_head, entries, tkd) {
>>        if (kd->kda.addr != -1) {
>>            if (kvm_vm_ioctl(kvm_state, KVM_ARM_SET_DEVICE_ADDR,
>>                             &kd->kda) < 0) {
>>                fprintf(stderr, "KVM_ARM_SET_DEVICE_ADDRESS failed: %s\n",
>>                        strerror(errno));
>>                abort();
>>            }
>>        }
>>        g_free(kd);
>>    }
>> }
>> 
>> This only goes one level deep, right? So if you ever have to nest the
>> VGIC inside of another MemoryRegion, this will break, right?
> 
> We already nest the VGIC inside another memory region (the a15mpcore
> container), and it works fine. This function is just iterating through
> "everything any device asked me to tell the kernel about".

So kda is the real physical offset? I'm having a hard time reading
that code :). According to this function:

static void kvm_arm_devlistener_add(MemoryListener *listener,
                                    MemoryRegionSection *section)
{
    KVMDevice *kd;

    QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
        if (section->mr == kd->mr) {
            kd->kda.addr = section->offset_within_address_space;
        }
    }
}

it's only the offset within its parent region, which would mean it's
broken, no?


Alex
Peter Maydell - March 21, 2013, 11:32 a.m.
On 21 March 2013 11:29, Alexander Graf <agraf@suse.de> wrote:
> On 21.03.2013, at 12:22, Peter Maydell wrote:
>> We already nest the VGIC inside another memory region (the a15mpcore
>> container), and it works fine. This function is just iterating through
>> "everything any device asked me to tell the kernel about".
>
> So kda is the real physical offset? I'm having a hard time reading that code :). According to this function:
>
> static void kvm_arm_devlistener_add(MemoryListener *listener,
>                                     MemoryRegionSection *section)
> {
>     KVMDevice *kd;
>
>     QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
>         if (section->mr == kd->mr) {
>             kd->kda.addr = section->offset_within_address_space;
>         }
>     }
> }
>
> it's only the offset within its parent region, which would mean it's broken, no?

Address spaces are not the same thing as memory regions :-)
The only address space involved here is the system address space.
(As I say, we currently assume we only get mapped into one address
space, but that could be fixed if necessary.)
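
The distinction can be made concrete with a small model of flattening.
A region's own offset is relative to its parent, but when the region
tree is rendered into an address space, each resulting section carries
an offset relative to the root of that address space. The sketch below
is a deliberately simplified, hypothetical model (Node, Section and
flatten() are made-up names; it ignores priorities, overlaps and
aliases), not QEMU's actual rendering code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region tree node. */
typedef struct Node {
    const char *name;
    uint64_t addr;               /* offset within the parent */
    const struct Node *children; /* array of subregions, or NULL */
    size_t nchildren;
} Node;

/* Hypothetical flattened entry, loosely modelled on the field name
 * used in the listener callback quoted in this thread. */
typedef struct {
    const Node *node;
    uint64_t offset_within_address_space;
} Section;

/* Render the tree rooted at n, which lives at absolute address base,
 * into out[]. Containers contribute only their offset; leaves become
 * sections. Returns the running count of leaves visited; entries
 * beyond cap are counted but not stored. */
static size_t flatten(const Node *n, uint64_t base,
                      Section *out, size_t cap, size_t used)
{
    uint64_t abs_addr = base + n->addr;

    if (n->nchildren == 0) {
        if (used < cap) {
            out[used].node = n;
            out[used].offset_within_address_space = abs_addr;
        }
        return used + 1;
    }

    for (size_t i = 0; i < n->nchildren; i++) {
        used = flatten(&n->children[i], abs_addr, out, cap, used);
    }
    return used;
}
```

In this model a leaf at offset 0x1000 inside a container at 0x2c000000
(illustrative numbers) is reported at 0x2c001000, regardless of how
deeply it is nested, which is why a listener that stores the section
offset does not need to walk parents itself.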

-- PMM
Alexander Graf - March 21, 2013, 11:38 a.m.
On 21.03.2013, at 12:32, Peter Maydell wrote:

> On 21 March 2013 11:29, Alexander Graf <agraf@suse.de> wrote:
>> On 21.03.2013, at 12:22, Peter Maydell wrote:
>>> We already nest the VGIC inside another memory region (the a15mpcore
>>> container), and it works fine. This function is just iterating through
>>> "everything any device asked me to tell the kernel about".
>> 
>> So kda is the real physical offset? I'm having a hard time reading that code :). According to this function:
>> 
>> static void kvm_arm_devlistener_add(MemoryListener *listener,
>>                                    MemoryRegionSection *section)
>> {
>>    KVMDevice *kd;
>> 
>>    QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
>>        if (section->mr == kd->mr) {
>>            kd->kda.addr = section->offset_within_address_space;
>>        }
>>    }
>> }
>> 
>> it's only the offset within its parent region, which would mean it's broken, no?
> 
> Address spaces are not the same thing as memory regions :-)
> The only address space involved here is the system address space.
> (As I say, we currently assume we only get mapped into one address
> space, but that could be fixed if necessary.)

Interesting. Oh well, I'll leave that one to Scott to figure out ;).

So what if I want to write an in-kernel IDE PIO accelerator? Or even
better yet: An AHCI accelerator that has one MMIO BAR and another PIO
BAR that can be remapped by the guest at any time?

The distinction on whether a region is handled by KVM really needs
to be done by the device model.


Alex
Peter Maydell - March 21, 2013, 11:44 a.m.
On 21 March 2013 11:38, Alexander Graf <agraf@suse.de> wrote:
>
> On 21.03.2013, at 12:32, Peter Maydell wrote:
>
>> On 21 March 2013 11:29, Alexander Graf <agraf@suse.de> wrote:
>>> On 21.03.2013, at 12:22, Peter Maydell wrote:
>>>> We already nest the VGIC inside another memory region (the a15mpcore
>>>> container), and it works fine. This function is just iterating through
>>>> "everything any device asked me to tell the kernel about".
>>>
>>> So kda is the real physical offset? I'm having a hard time reading that code :). According to this function:
>>>
>>> static void kvm_arm_devlistener_add(MemoryListener *listener,
>>>                                    MemoryRegionSection *section)
>>> {
>>>    KVMDevice *kd;
>>>
>>>    QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
>>>        if (section->mr == kd->mr) {
>>>            kd->kda.addr = section->offset_within_address_space;
>>>        }
>>>    }
>>> }
>>>
>>> it's only the offset within its parent region, which would mean it's broken, no?
>>
>> Address spaces are not the same thing as memory regions :-)
>> The only address space involved here is the system address space.
>> (As I say, we currently assume we only get mapped into one address
>> space, but that could be fixed if necessary.)
>
> Interesting. Oh well, I'll leave that one to Scott to figure out ;).
>
> So what if I want to write an in-kernel IDE PIO accelerator?

Have the QEMU end of that device call (your equivalent of)
kvm_arm_register_device(), and provide a 'reserved' mmio region to
its users; the kernel end implements the standard 'tell me where I live'
ioctl; that's it.

> Or even better yet: An AHCI accelerator that has one MMIO BAR and
> another PIO BAR that can be remapped by the guest at any time?

Guest remappable KVM regions would require enhancements, yes (it's
not like we have an existing mechanism for doing that on any
architecture at the moment). The principle of implementing the
mechanics of this in common code still holds, probably even more
so for the increased complexity.

> The distinction on whether a region is handled by KVM really needs
> to be done by the device model.

It is -- the device model is what calls kvm_arm_register_device().
It's just the mechanics of "how do we tell the kernel the right
address for this region at the point when we know it" that are
handled in kvm.c.

-- PMM
Alexander Graf - March 21, 2013, 11:49 a.m.
On 21.03.2013, at 12:44, Peter Maydell wrote:

> On 21 March 2013 11:38, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 21.03.2013, at 12:32, Peter Maydell wrote:
>> 
>>> On 21 March 2013 11:29, Alexander Graf <agraf@suse.de> wrote:
>>>> On 21.03.2013, at 12:22, Peter Maydell wrote:
>>>>> We already nest the VGIC inside another memory region (the a15mpcore
>>>>> container), and it works fine. This function is just iterating through
>>>>> "everything any device asked me to tell the kernel about".
>>>> 
>>>> So kda is the real physical offset? I'm having a hard time reading that code :). According to this function:
>>>> 
>>>> static void kvm_arm_devlistener_add(MemoryListener *listener,
>>>>                                   MemoryRegionSection *section)
>>>> {
>>>>   KVMDevice *kd;
>>>> 
>>>>   QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
>>>>       if (section->mr == kd->mr) {
>>>>           kd->kda.addr = section->offset_within_address_space;
>>>>       }
>>>>   }
>>>> }
>>>> 
>>>> it's only the offset within its parent region, which would mean it's broken, no?
>>> 
>>> Address spaces are not the same thing as memory regions :-)
>>> The only address space involved here is the system address space.
>>> (As I say, we currently assume we only get mapped into one address
>>> space, but that could be fixed if necessary.)
>> 
>> Interesting. Oh well, I'll leave that one to Scott to figure out ;).
>> 
>> So what if I want to write an in-kernel IDE PIO accelerator?
> 
> Have the QEMU end of that device call (your equivalent of)
> kvm_arm_register_device(), and provide a 'reserved' mmio region to
> its users; the kernel end implements the standard 'tell me where I live'
> ioctl; that's it.
> 
>> Or even better yet: An AHCI accelerator that has one MMIO BAR and
>> another PIO BAR that can be remapped by the guest at any time?
> 
> Guest remappable KVM regions would require enhancements, yes (it's
> not like we have an existing mechanism for doing that on any
> architecture at the moment). The principle of implementing the
> mechanics of this in common code still holds, probably even more
> so for the increased complexity.
> 
>> The distinction on whether a region is handled by KVM really needs
>> to be done by the device model.
> 
> It is -- the device model is what calls kvm_arm_register_device().
> It's just the mechanics of "how do we tell the kernel the right
> address for this region at the point when we know it" that are
> handled in kvm.c.

I think I'm slowly grasping what you're aiming at :). Ok, that works.
You do actually do the listener in the device model, just that you
pass code responsibility over to kvm.c.

That's perfectly valid and sounds like a good model that Scott
probably wants to follow as well :).


Alex
Alexander Graf - March 21, 2013, 11:51 a.m.
On 21.03.2013, at 12:49, Alexander Graf wrote:

> 
> On 21.03.2013, at 12:44, Peter Maydell wrote:
> 
>> On 21 March 2013 11:38, Alexander Graf <agraf@suse.de> wrote:
>>> 
>>> On 21.03.2013, at 12:32, Peter Maydell wrote:
>>> 
>>>> On 21 March 2013 11:29, Alexander Graf <agraf@suse.de> wrote:
>>>>> On 21.03.2013, at 12:22, Peter Maydell wrote:
>>>>>> We already nest the VGIC inside another memory region (the a15mpcore
>>>>>> container), and it works fine. This function is just iterating through
>>>>>> "everything any device asked me to tell the kernel about".
>>>>> 
>>>>> So kda is the real physical offset? I'm having a hard time reading that code :). According to this function:
>>>>> 
>>>>> static void kvm_arm_devlistener_add(MemoryListener *listener,
>>>>>                                  MemoryRegionSection *section)
>>>>> {
>>>>>  KVMDevice *kd;
>>>>> 
>>>>>  QSLIST_FOREACH(kd, &kvm_devices_head, entries) {
>>>>>      if (section->mr == kd->mr) {
>>>>>          kd->kda.addr = section->offset_within_address_space;
>>>>>      }
>>>>>  }
>>>>> }
>>>>> 
>>>>> it's only the offset within its parent region, which would mean it's broken, no?
>>>> 
>>>> Address spaces are not the same thing as memory regions :-)
>>>> The only address space involved here is the system address space.
>>>> (As I say, we currently assume we only get mapped into one address
>>>> space, but that could be fixed if necessary.)
>>> 
>>> Interesting. Oh well, I'll leave that one to Scott to figure out ;).
>>> 
>>> So what if I want to write an in-kernel IDE PIO accelerator?
>> 
>> Have the QEMU end of that device call (your equivalent of)
>> kvm_arm_register_device(), and provide a 'reserved' mmio region to
>> its users; the kernel end implements the standard 'tell me where I live'
>> ioctl; that's it.
>> 
>>> Or even better yet: An AHCI accelerator that has one MMIO BAR and
>>> another PIO BAR that can be remapped by the guest at any time?
>> 
>> Guest remappable KVM regions would require enhancements, yes (it's
>> not like we have an existing mechanism for doing that on any
>> architecture at the moment). The principle of implementing the
>> mechanics of this in common code still holds, probably even more
>> so for the increased complexity.
>> 
>>> The distinction on whether a region is handled by KVM really needs
>>> to be done by the device model.
>> 
>> It is -- the device model is what calls kvm_arm_register_device().
>> It's just the mechanics of "how do we tell the kernel the right
>> address for this region at the point when we know it" that are
>> handled in kvm.c.
> 
> I think I'm slowly grasping what you're aiming at :). Ok, that works. You do actually do the listener in the device model, just that you pass code responsibility over to kvm.c.
> 
> That's perfectly valid and sounds like a good model that Scott probably wants to follow as well :).

s/follow/evaluate/ :).

The currently proposed device API doesn't have a generic notion of device regions. Regions are a per-device property, because a single device can have multiple regions.

However, maybe with a bit of brainstorming we could come up with a reasonably generic scheme.


Alex
Peter Maydell - March 21, 2013, 11:53 a.m.
On 21 March 2013 11:49, Alexander Graf <agraf@suse.de> wrote:
>
> On 21.03.2013, at 12:44, Peter Maydell wrote:
>> It is -- the device model is what calls kvm_arm_register_device().
>> It's just the mechanics of "how do we tell the kernel the right
>> address for this region at the point when we know it" that are
>> handled in kvm.c.
>
> I think I'm slowly grasping what you're aiming at :). Ok, that
> works. You do actually do the listener in the device model, just
> that you pass code responsibility over to kvm.c.
>
> That's perfectly valid and sounds like a good model that Scott
> probably wants to follow as well :).

Yep. We were actually originally going to make the device ioctl
a generic one, not an ARM one, because there really isn't anything
ARM specific about it. We should probably move the code from
target-arm/kvm.c into kvm-all.c with an arch hook to specify
the ioctl to use (same as irq_set_ioctl) if you want to do the
same approach with PPC.

Re multiple regions: yes, the VGIC has several. We just divide
the u32 ID into two halves, one for a device ID and one for
a region ID for that device.
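
As a rough illustration of that split (a sketch only: which half holds the device ID, and the ex_* names, are assumptions made for this example, not the actual KVM ARM ABI constants), the packing could look like:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: upper 16 bits = device ID, lower 16 bits = region ID.
 * The real kernel ABI defines its own shift and mask constants. */
#define EX_DEV_ID_SHIFT   16
#define EX_REGION_ID_MASK 0xffffu

/* Pack a device ID and a per-device region ID into one u32. */
static inline uint32_t ex_make_id(uint16_t dev_id, uint16_t region_id)
{
    return ((uint32_t)dev_id << EX_DEV_ID_SHIFT) | region_id;
}

/* Recover the device half of a packed ID. */
static inline uint16_t ex_dev_id(uint32_t id)
{
    return (uint16_t)(id >> EX_DEV_ID_SHIFT);
}

/* Recover the region half of a packed ID. */
static inline uint16_t ex_region_id(uint32_t id)
{
    return (uint16_t)(id & EX_REGION_ID_MASK);
}
```

With this layout a device with several regions keeps one device ID and only varies the low half.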

-- PMM
Peter Maydell - March 22, 2013, 1:08 p.m.
On 21 March 2013 22:43, Scott Wood <scottwood@freescale.com> wrote:
> What if the update is to a parent memory region, not to the one directly
> associated with the device?
>
> Or does add() get called for all child regions (recursively) in such cases?

The memory API flattens the tree of memory regions down into a flat
view of the address space. These callbacks get called for the
final flattened view (so you'll never see a pure container in the
callback, only leaves). The callbacks happen for every region which
appears in the address space, in linear order. When an update happens
memory.c identifies the changes between the old flat view and the
new one and calls callbacks appropriately. This code isn't the
first use of the memory API listeners, so it's all well-tested code.
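
To make the flattening concrete, here is a toy model (not the memory.c implementation: the Toy* types and toy_flatten() are invented for this sketch, and it ignores overlap, priorities, aliases and clipping). It walks a region tree and records only the leaves, each with its absolute address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHILDREN 4
#define TOY_MAX_RANGES   16

typedef struct ToyRegion {
    const char *name;
    uint64_t addr;                  /* offset within the parent region */
    uint64_t size;
    struct ToyRegion *children[TOY_MAX_CHILDREN];
    size_t nchildren;               /* 0 means this is a leaf */
} ToyRegion;

typedef struct ToyRange {
    const ToyRegion *mr;
    uint64_t start;                 /* absolute address in the address space */
    uint64_t size;
} ToyRange;

typedef struct ToyFlatView {
    ToyRange ranges[TOY_MAX_RANGES];
    size_t n;
} ToyFlatView;

/* Depth-first walk: containers only contribute their offset, leaves are
 * appended to the flat view. Children are assumed to be listed in address
 * order, so the resulting view is in linear order too. */
static void toy_flatten(const ToyRegion *mr, uint64_t base, ToyFlatView *fv)
{
    uint64_t abs_addr = base + mr->addr;

    if (mr->nchildren == 0) {
        assert(fv->n < TOY_MAX_RANGES);
        fv->ranges[fv->n++] = (ToyRange){ mr, abs_addr, mr->size };
        return;
    }
    for (size_t i = 0; i < mr->nchildren; i++) {
        toy_flatten(mr->children[i], abs_addr, fv);
    }
}
```

In this model a leaf nested inside a container ends up in the flat view with the sum of all the offsets on its path, which is the kind of value a listener sees as the offset within the address space; the container itself never appears.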

>> However, maybe with a bit of brainstorming we could come up with a
>> reasonably generic scheme.

> In the kernel API?  Or do you mean a generic scheme within QEMU that encodes
> any reasonably expected mechanism for setting the device address (e.g. assume
> that it is either a 64-bit attribute, or uses the legacy ARM API), or
> perhaps a callback into device code?
>
> The MPIC's memory listener isn't that much code... I'm not sure
> there's a great need for a central KVM registry.

Well, nor is the ARM memory listener, but why have two bits of
code doing the same thing when you could have one?

-- PMM
Peter Maydell - March 23, 2013, 11:24 a.m.
On 22 March 2013 22:05, Scott Wood <scottwood@freescale.com> wrote:
> On 03/22/2013 08:08:57 AM, Peter Maydell wrote:
>> The memory API flattens the tree of memory regions down into a flat
>> view of the address space. These callbacks get called for the
>> final flattened view (so you'll never see a pure container in the
>> callback, only leaves). The callbacks happen for every region which
>> appears in the address space, in linear order. When an update happens
>> memory.c identifies the changes between the old flat view and the
>> new one and calls callbacks appropriately.
>
> OK, so .add and .del will be sufficient to capture any manipulation that
> would affect whether and where the region we care about is mapped?

Yes. Note that if the board (brokenly) maps the region so it is
'hidden' by another region, this manifests as a .del [since it
is no longer accessible]. Also I think if the board maps something
small on top and in the middle of the region you get an add for
each of the partially visible fragments. Personally I'm happy to
not worry about either of these cases on the basis that they would
be board model bugs.
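
A toy sketch of those two cases, assuming a single overlapping region mapped on top (ToySpan and toy_visible_fragments() are made up for illustration and are not part of the memory API): it computes which parts of the lower region stay visible. Zero fragments corresponds to the 'hidden' case, two fragments to the "something small in the middle" case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address interval [start, end). */
typedef struct ToySpan {
    uint64_t start;
    uint64_t end;
} ToySpan;

/* Write the parts of `lower` that remain visible once `upper` is mapped on
 * top of it into `out`, and return how many there are (0, 1 or 2). */
static size_t toy_visible_fragments(ToySpan lower, ToySpan upper,
                                    ToySpan out[2])
{
    size_t n = 0;

    if (upper.end <= lower.start || upper.start >= lower.end) {
        out[n++] = lower;           /* no overlap: fully visible */
        return n;
    }
    if (upper.start > lower.start) {
        out[n++] = (ToySpan){ lower.start, upper.start };   /* left part */
    }
    if (upper.end < lower.end) {
        out[n++] = (ToySpan){ upper.end, lower.end };       /* right part */
    }
    return n;
}
```

A listener that only expects one add per device region would need to decide what to do when it sees two fragments; treating that as a board model bug is the simplest policy.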

>> This code isn't the
>> first use of the memory API listeners, so it's all well-tested code.
>
>
> Sure, I'm not suggesting the code doesn't work -- just trying to understand
> how, so I know I'm using it properly.  The implementation is a bit opaque
> (to me at least), and the listener callbacks aren't documented the way the
> normal API functions are.

Yeah, I guess it would be good to add doc comments for all the fields
in struct MemoryListener describing the semantics of the callbacks.

>> > The MPIC's memory listener isn't that much code... I'm not sure
>> > there's a great need for a central KVM registry.
>>
>> Well, nor is the ARM memory listener, but why have two bits of
>> code doing the same thing when you could have one?
>
> They're not doing quite the same thing, though, and the effort required to
> unify them is non-zero.  The two main issues are the way that the address is
> communicated to KVM, and the ability to change the mapping after the guest
> starts.

Ah, guest-programmable mappings are a real use case and not a hypothetical?
Do we run into synchronisation issues with making sure that QEMU and
the kernel both agree simultaneously about where the mapping is?
Can the mapping be different between different CPU cores? [let's
hope not :-)] Is the mapping controlled by a register within the
mapping itself, or is there some separate non-moving register which
defines the location of the mappable registers?

thanks
-- PMM

Patch

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2322732..b800391 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -892,6 +892,15 @@  void *address_space_map(AddressSpace *as, hwaddr addr,
 void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
                          int is_write, hwaddr access_len);
 
+/* memory_region_to_address: Find the full address of the start of the
+ *      given #MemoryRegion, ignoring aliases.  There is no guarantee
+ *      that the #MemoryRegion is actually visible at this address, if
+ *      there are overlapping regions.
+ *
+ * @mr: #MemoryRegion being queried
+ * @asp: if non-NULL, returns the #AddressSpace @mr is mapped in, if any
+ */
+hwaddr memory_region_to_address(MemoryRegion *mr, AddressSpace **asp);
 
 #endif
 
diff --git a/memory.c b/memory.c
index cd7d5e0..0099f12 100644
--- a/memory.c
+++ b/memory.c
@@ -453,21 +453,51 @@  const IORangeOps memory_region_iorange_ops = {
     .destructor = memory_region_iorange_destructor,
 };
 
-static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
+static AddressSpace *memory_region_root_to_address_space(MemoryRegion *mr)
 {
     AddressSpace *as;
 
-    while (mr->parent) {
-        mr = mr->parent;
-    }
     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
         if (mr == as->root) {
             return as;
         }
     }
+
+    return NULL;
+}
+
+static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
+{
+    AddressSpace *as;
+
+    while (mr->parent) {
+        mr = mr->parent;
+    }
+
+    as = memory_region_root_to_address_space(mr);
+    if (as) {
+        return as;
+    }
+
     abort();
 }
 
+hwaddr memory_region_to_address(MemoryRegion *mr, AddressSpace **asp)
+{
+    hwaddr addr = mr->addr;
+
+    while (mr->parent) {
+        mr = mr->parent;
+        addr += mr->addr;
+    }
+
+    if (asp) {
+        *asp = memory_region_root_to_address_space(mr);
+    }
+
+    return addr;
+}
+
 /* Render a memory region into the global view.  Ranges in @view obscure
  * ranges in @mr.
  */