Patchwork [v5,01/12] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

login
register
mail settings
Submitter Christoffer Dall
Date Jan. 8, 2013, 6:41 p.m.
Message ID <20130108184130.46558.60819.stgit@ubuntu>
Download mbox | patch
Permalink /patch/210476/
State New
Headers show

Comments

Christoffer Dall - Jan. 8, 2013, 6:41 p.m.
On ARM (and possibly other architectures) some bits are specific to the
model being emulated for the guest and user space needs a way to tell
the kernel about those bits.  An example is mmio device base addresses,
where KVM must know the base address for a given device to properly
emulate mmio accesses within a certain address range or directly map a
device with virtualiation extensions into the guest address space.

We try to make this API slightly more generic than for our specific use,
but so far only the VGIC uses this feature.

Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
---
 Documentation/virtual/kvm/api.txt |   37 +++++++++++++++++++++++++++++++++++++
 arch/arm/include/uapi/asm/kvm.h   |   13 +++++++++++++
 arch/arm/kvm/arm.c                |   23 ++++++++++++++++++++++-
 include/uapi/linux/kvm.h          |    8 ++++++++
 4 files changed, 80 insertions(+), 1 deletion(-)
Scott Wood - Jan. 8, 2013, 10:36 p.m.
On 01/08/2013 12:41:30 PM, Christoffer Dall wrote:
> On ARM (and possibly other architectures) some bits are specific to  
> the
> model being emulated for the guest and user space needs a way to tell
> the kernel about those bits.  An example is mmio device base  
> addresses,
> where KVM must know the base address for a given device to properly
> emulate mmio accesses within a certain address range or directly map a
> device with virtualiation extensions into the guest address space.
> 
> We try to make this API slightly more generic than for our specific  
> use,
> but so far only the VGIC uses this feature.
> 
> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
> ---
>  Documentation/virtual/kvm/api.txt |   37  
> +++++++++++++++++++++++++++++++++++++
>  arch/arm/include/uapi/asm/kvm.h   |   13 +++++++++++++
>  arch/arm/kvm/arm.c                |   23 ++++++++++++++++++++++-
>  include/uapi/linux/kvm.h          |    8 ++++++++
>  4 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt  
> b/Documentation/virtual/kvm/api.txt
> index 38066a7a..668956f 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2206,6 +2206,43 @@ This ioctl returns the guest registers that  
> are supported for the
>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
> 
> 
> +4.80 KVM_SET_DEVICE_ADDRESS
> +
> +Capability: KVM_CAP_SET_DEVICE_ADDRESS
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: struct kvm_device_address (in)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENODEV: The device id is unknown
> +  ENXIO:  Device not supported on current system
> +  EEXIST: Address already set
> +  E2BIG:  Address outside guest physical address space
> +
> +struct kvm_device_address {
> +	__u64 id;
> +	__u64 addr;
> +};

What about this is really specific to addresses?  Can't we set other  
device parameters this way?

Sort of like a device equivalent of PPC's one-reg interface.

-Scott
Christoffer Dall - Jan. 8, 2013, 11:17 p.m.
On Tue, Jan 8, 2013 at 5:36 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 01/08/2013 12:41:30 PM, Christoffer Dall wrote:
>>
>> On ARM (and possibly other architectures) some bits are specific to the
>> model being emulated for the guest and user space needs a way to tell
>> the kernel about those bits.  An example is mmio device base addresses,
>> where KVM must know the base address for a given device to properly
>> emulate mmio accesses within a certain address range or directly map a
>> device with virtualiation extensions into the guest address space.
>>
>> We try to make this API slightly more generic than for our specific use,
>> but so far only the VGIC uses this feature.
>>
>> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
>> ---
>>  Documentation/virtual/kvm/api.txt |   37
>> +++++++++++++++++++++++++++++++++++++
>>  arch/arm/include/uapi/asm/kvm.h   |   13 +++++++++++++
>>  arch/arm/kvm/arm.c                |   23 ++++++++++++++++++++++-
>>  include/uapi/linux/kvm.h          |    8 ++++++++
>>  4 files changed, 80 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index 38066a7a..668956f 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2206,6 +2206,43 @@ This ioctl returns the guest registers that are
>> supported for the
>>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>>
>>
>> +4.80 KVM_SET_DEVICE_ADDRESS
>> +
>> +Capability: KVM_CAP_SET_DEVICE_ADDRESS
>> +Architectures: arm
>> +Type: vm ioctl
>> +Parameters: struct kvm_device_address (in)
>> +Returns: 0 on success, -1 on error
>> +Errors:
>> +  ENODEV: The device id is unknown
>> +  ENXIO:  Device not supported on current system
>> +  EEXIST: Address already set
>> +  E2BIG:  Address outside guest physical address space
>> +
>> +struct kvm_device_address {
>> +       __u64 id;
>> +       __u64 addr;
>> +};
>
>
> What about this is really specific to addresses?  Can't we set other device
> parameters this way?
>
> Sort of like a device equivalent of PPC's one-reg interface.
>
This has been discussed a number of times, and one or the other there
is a need for userspace to tell KVM to present memory-mapped devices
at a given address. It was also considered to make this specific to
irqchip initialization, but irqchips are different and a lot of that
code is x86-specific, so that approach was discarded.

This *could* look something like this:

struct kvm_device_param {
    u64 dev_id;
    u64 param_id;
    u64 value;
};

but that has less clear, or at least less specific, semantics.

-Christoffer
Christoffer Dall - Jan. 8, 2013, 11:49 p.m.
On Tue, Jan 8, 2013 at 6:29 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 01/08/2013 05:17:05 PM, Christoffer Dall wrote:
>>
>> On Tue, Jan 8, 2013 at 5:36 PM, Scott Wood <scottwood@freescale.com>
>> wrote:
>> > On 01/08/2013 12:41:30 PM, Christoffer Dall wrote:
>> >> +struct kvm_device_address {
>> >> +       __u64 id;
>> >> +       __u64 addr;
>> >> +};
>> >
>> >
>> > What about this is really specific to addresses?  Can't we set other
>> > device
>> > parameters this way?
>> >
>> > Sort of like a device equivalent of PPC's one-reg interface.
>> >
>> This has been discussed a number of times,
>
>
> Sorry, this patch was just pointed out to me today.  I googled the patch
> title but couldn't find this discussion.
>
>

I believe it was mainly discussed at the KVM Forum in person.

>> and one or the other there
>> is a need for userspace to tell KVM to present memory-mapped devices
>> at a given address. It was also considered to make this specific to
>> irqchip initialization, but irqchips are different and a lot of that
>> code is x86-specific, so that approach was discarded.
>>
>> This *could* look something like this:
>>
>> struct kvm_device_param {
>>     u64 dev_id;
>>     u64 param_id;
>>     u64 value;
>> };
>>
>> but that has less clear, or at least less specific, semantics.
>
>
> Why is it less clear?  You need to have device-specific documentation for
> what "id" means, so why not also an enumeration of "param"s?  Or just keep
> it as is, and rename "address" to "value".  Whether "dev" and "param" are
> combined is orthogonal from whether it's used for non-address things.

less clear in the sense that you have to look at more code to see what
it does. I'm not saying that it's too unclear, at all, I'm actually
fine with it, but to make my point, we can make an ioctl that's called
do_something() that takes a struct with val0, val1, val2, val3, ...

>
> If you leave it as "address", either we'll have it being used for
> non-address things regardless of the name ("Not a typewriter!"), or there'll
> end up being yet more unnecessary ioctls, or device-specific things will end
> up getting shoved into CPU interfaces such as one-reg.  For example, on MPIC
> we need to be able to specify the version of the chip to emulate in addition
> to the address at which it lives.
>
> Also, why is it documented as an "arm" interface?  Shouldn't it be a generic
> interface, with other architectures currently not implementing any IDs?
> What in the kvm_arch_vm_ioctl() wrapper is arm-specific?
>
As I remember the argument for keeping this the point was that there
were other preferred methods for other archs to do this, and that ARM
was the only platform that had this explicit need, but maybe I'm
making this up.

I'll let Peter and Alex respond this as well, and if they're fine with
changing it to what I proposed above, then let's do that, and we can
make it a non arm-specific interface.

-Christoffer
Alexander Graf - Jan. 9, 2013, 10:02 a.m.
Am 09.01.2013 um 00:49 schrieb Christoffer Dall <c.dall@virtualopensystems.com>:

> On Tue, Jan 8, 2013 at 6:29 PM, Scott Wood <scottwood@freescale.com> wrote:
>> On 01/08/2013 05:17:05 PM, Christoffer Dall wrote:
>>> 
>>> On Tue, Jan 8, 2013 at 5:36 PM, Scott Wood <scottwood@freescale.com>
>>> wrote:
>>>> On 01/08/2013 12:41:30 PM, Christoffer Dall wrote:
>>>>> +struct kvm_device_address {
>>>>> +       __u64 id;
>>>>> +       __u64 addr;
>>>>> +};
>>>> 
>>>> 
>>>> What about this is really specific to addresses?  Can't we set other
>>>> device
>>>> parameters this way?
>>>> 
>>>> Sort of like a device equivalent of PPC's one-reg interface.
>>>> 
>>> This has been discussed a number of times,
>> 
>> 
>> Sorry, this patch was just pointed out to me today.  I googled the patch
>> title but couldn't find this discussion.
>> 
>> 
> 
> I believe it was mainly discussed at the KVM Forum in person.
> 
>>> and one or the other there
>>> is a need for userspace to tell KVM to present memory-mapped devices
>>> at a given address. It was also considered to make this specific to
>>> irqchip initialization, but irqchips are different and a lot of that
>>> code is x86-specific, so that approach was discarded.
>>> 
>>> This *could* look something like this:
>>> 
>>> struct kvm_device_param {
>>>    u64 dev_id;
>>>    u64 param_id;
>>>    u64 value;
>>> };
>>> 
>>> but that has less clear, or at least less specific, semantics.
>> 
>> 
>> Why is it less clear?  You need to have device-specific documentation for
>> what "id" means, so why not also an enumeration of "param"s?  Or just keep
>> it as is, and rename "address" to "value".  Whether "dev" and "param" are
>> combined is orthogonal from whether it's used for non-address things.
> 
> less clear in the sense that you have to look at more code to see what
> it does. I'm not saying that it's too unclear, at all, I'm actually
> fine with it, but to make my point, we can make an ioctl that's called
> do_something() that takes a struct with val0, val1, val2, val3, ...
> 
>> 
>> If you leave it as "address", either we'll have it being used for
>> non-address things regardless of the name ("Not a typewriter!"), or there'll
>> end up being yet more unnecessary ioctls, or device-specific things will end
>> up getting shoved into CPU interfaces such as one-reg.  For example, on MPIC
>> we need to be able to specify the version of the chip to emulate in addition
>> to the address at which it lives.
>> 
>> Also, why is it documented as an "arm" interface?  Shouldn't it be a generic
>> interface, with other architectures currently not implementing any IDs?
>> What in the kvm_arch_vm_ioctl() wrapper is arm-specific?
>> 
> As I remember the argument for keeping this the point was that there
> were other preferred methods for other archs to do this, and that ARM
> was the only platform that had this explicit need, but maybe I'm
> making this up.
> 
> I'll let Peter and Alex respond this as well, and if they're fine with
> changing it to what I proposed above, then let's do that, and we can
> make it a non arm-specific interface.

I think we should make thus at least potentially generic. In fact, I wouldn't even mind calling it DEV_REG with the same semantics as ONE_REG, just that it also gets a unique dev id that gets created during in-kernel device creation and that it's a vm ioctl.

That way we wouldn't block our path to create two in-kernel irqchips one day.


Alex
Peter Maydell - Jan. 9, 2013, 2:48 p.m.
On 9 January 2013 10:02, Alexander Graf <agraf@suse.de> wrote:
> I think we should make thus at least potentially generic. In fact, I wouldn't even
> mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
> gets a unique dev id that gets created during in-kernel device creation and that
> it's a vm ioctl.

Well, we might want a DEV_REG, but you might as well just make ONE_REG
OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
device registers. ONE_REG already supports dividing up the ID space so
you can say which device or whatever is being used, because we had
things like GIC registers in mind when we put that API together.

However, this shouldn't be DEV_REG, because this isn't actually setting state
in the irqchip device, it's configuring the kernel's part of the system model
[compare wiring up irq routing, which also has a custom ioctl rather than a
generic one]. As such it definitely needs to happen only before the VM is
actually started and not during VM execution, unlike a DEV_REG which would
presumably be generally valid.

> That way we wouldn't block our path to create two in-kernel irqchips one day.

Er, SET_DEVICE_ADDRESS doesn't block us doing that; that's why it has
an ID parameter.

The discussion at the KVM forum, as I recall it was basically:
 (a) some other ppc irqchips also want to set the base address for where
their memory mapped registers live, so this isn't a totally ARM specific
weirdness
 (b) we didn't need to tangle up and delay the KVM ARM work with a vague
and unspecified desire for general configurability

-- PMM
Alexander Graf - Jan. 9, 2013, 2:58 p.m.
On 09.01.2013, at 15:48, Peter Maydell wrote:

> On 9 January 2013 10:02, Alexander Graf <agraf@suse.de> wrote:
>> I think we should make thus at least potentially generic. In fact, I wouldn't even
>> mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
>> gets a unique dev id that gets created during in-kernel device creation and that
>> it's a vm ioctl.
> 
> Well, we might want a DEV_REG, but you might as well just make ONE_REG
> OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
> device registers. ONE_REG already supports dividing up the ID space so
> you can say which device or whatever is being used, because we had
> things like GIC registers in mind when we put that API together.

ONE_REG's address space today doesn't include a field for the target device, because that's already predetermined by the fd you run it on. Hence my suggestion to add it as separate field, because then we keep the id space mask-free.

> However, this shouldn't be DEV_REG, because this isn't actually setting state
> in the irqchip device, it's configuring the kernel's part of the system model
> [compare wiring up irq routing, which also has a custom ioctl rather than a
> generic one]. As such it definitely needs to happen only before the VM is
> actually started and not during VM execution, unlike a DEV_REG which would
> presumably be generally valid.

The same can be true for ONE_REG/SET_SREGS. On PPC we support setting the PVR (CPU type) via SREGS because ONE_REG only came later. Setting the CPU type only makes sense before you first start using a vcpu. After that point it should be forbidden.

So in a way, the irqchip memory address location is a device register, as it has all the properties that a register on that device would have. It is

  * per device
  * one id for one purpose
  * may or may not be writable during certain times

We would also keep the gates open to get a new register id one day for a second address. Imagine an interrupt controller that splits its address mappings into "PIC base registers" and "MSI target registers", though the MSI registers logically belong to the PIC. Then the ioctl as designed here gets stuck too.

> 
>> That way we wouldn't block our path to create two in-kernel irqchips one day.
> 
> Er, SET_DEVICE_ADDRESS doesn't block us doing that; that's why it has
> an ID parameter.
> 
> The discussion at the KVM forum, as I recall it was basically:
> (a) some other ppc irqchips also want to set the base address for where
> their memory mapped registers live, so this isn't a totally ARM specific
> weirdness

Yes.

> (b) we didn't need to tangle up and delay the KVM ARM work with a vague
> and unspecified desire for general configurability

Yeah, that was the basic idea. Considering that the patch set hasn't been going in for another 2 months after that discussion indicates that this isn't too much of an issue though :).

In fact, I do recall myself pointing out that we need some padding in that ioctl to accomodate for later usages like the ones above. Then we could later rename (or copy name) it and everyone'd be happy.


Alex
Peter Maydell - Jan. 9, 2013, 3:11 p.m.
On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>
> On 09.01.2013, at 15:48, Peter Maydell wrote:
>
>> On 9 January 2013 10:02, Alexander Graf <agraf@suse.de> wrote:
>>> I think we should make thus at least potentially generic. In fact, I wouldn't even
>>> mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
>>> gets a unique dev id that gets created during in-kernel device creation and that
>>> it's a vm ioctl.
>>
>> Well, we might want a DEV_REG, but you might as well just make ONE_REG
>> OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
>> device registers. ONE_REG already supports dividing up the ID space so
>> you can say which device or whatever is being used, because we had
>> things like GIC registers in mind when we put that API together.
>
> ONE_REG's address space today doesn't include a field for the target device,

There's 16 bits of "register group type".

> because that's already predetermined by the fd you run it on. Hence my
> suggestion to add it as separate field, because then we keep the id space
> mask-free.

Er, it already has masks for the different parts of ID space. That's a feature,
not a bug.

>> However, this shouldn't be DEV_REG, because this isn't actually setting state
>> in the irqchip device, it's configuring the kernel's part of the system model
>> [compare wiring up irq routing, which also has a custom ioctl rather than a
>> generic one]. As such it definitely needs to happen only before the VM is
>> actually started and not during VM execution, unlike a DEV_REG which would
>> presumably be generally valid.
>
> The same can be true for ONE_REG/SET_SREGS. On PPC we support setting
> the PVR (CPU type) via SREGS because ONE_REG only came later. Setting
> the CPU type only makes sense before you first start using a vcpu. After that
> point it should be forbidden.
>
> So in a way, the irqchip memory address location is a device register, as it has
> all the properties that a register on that device would have. It is
>
>   * per device
>   * one id for one purpose
>   * may or may not be writable during certain times

There's a distinction between "things you need to do in machine setup"
and "things you need to propagate for migration". It should be possible
to do a migration by doing a complete copy of all ONE_REG (and DEV_REG
if we have it) state from source to destination, so it needs to be always
possible to write them.

> We would also keep the gates open to get a new register id one day for a second
> address. Imagine an interrupt controller that splits its address mappings into
> "PIC base registers" and "MSI target registers", though the MSI registers logically
> belong to the PIC. Then the ioctl as designed here gets stuck too.

This is exactly what ARM's GIC has already. We have several memory regions
(one for the distributor, one for the cpu interface), and map them with several
calls to SET_DEVICE_ADDRESS.

>> (b) we didn't need to tangle up and delay the KVM ARM work with a vague
>> and unspecified desire for general configurability
>
> Yeah, that was the basic idea. Considering that the patch set hasn't been going
> in for another 2 months after that discussion indicates that this isn't too much of
> an issue though :).

We might get there faster if people didn't keep nitpicking the APIs at the
last minute :-)

-- PMM
Christoffer Dall - Jan. 9, 2013, 3:17 p.m.
On Wed, Jan 9, 2013 at 10:11 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>
>> On 09.01.2013, at 15:48, Peter Maydell wrote:
>>
>>> On 9 January 2013 10:02, Alexander Graf <agraf@suse.de> wrote:
>>>> I think we should make thus at least potentially generic. In fact, I wouldn't even
>>>> mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
>>>> gets a unique dev id that gets created during in-kernel device creation and that
>>>> it's a vm ioctl.
>>>
>>> Well, we might want a DEV_REG, but you might as well just make ONE_REG
>>> OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
>>> device registers. ONE_REG already supports dividing up the ID space so
>>> you can say which device or whatever is being used, because we had
>>> things like GIC registers in mind when we put that API together.
>>
>> ONE_REG's address space today doesn't include a field for the target device,
>
> There's 16 bits of "register group type".
>
>> because that's already predetermined by the fd you run it on. Hence my
>> suggestion to add it as separate field, because then we keep the id space
>> mask-free.
>
> Er, it already has masks for the different parts of ID space. That's a feature,
> not a bug.
>
>>> However, this shouldn't be DEV_REG, because this isn't actually setting state
>>> in the irqchip device, it's configuring the kernel's part of the system model
>>> [compare wiring up irq routing, which also has a custom ioctl rather than a
>>> generic one]. As such it definitely needs to happen only before the VM is
>>> actually started and not during VM execution, unlike a DEV_REG which would
>>> presumably be generally valid.
>>
>> The same can be true for ONE_REG/SET_SREGS. On PPC we support setting
>> the PVR (CPU type) via SREGS because ONE_REG only came later. Setting
>> the CPU type only makes sense before you first start using a vcpu. After that
>> point it should be forbidden.
>>
>> So in a way, the irqchip memory address location is a device register, as it has
>> all the properties that a register on that device would have. It is
>>
>>   * per device
>>   * one id for one purpose
>>   * may or may not be writable during certain times
>
> There's a distinction between "things you need to do in machine setup"
> and "things you need to propagate for migration". It should be possible
> to do a migration by doing a complete copy of all ONE_REG (and DEV_REG
> if we have it) state from source to destination, so it needs to be always
> possible to write them.
>
>> We would also keep the gates open to get a new register id one day for a second
>> address. Imagine an interrupt controller that splits its address mappings into
>> "PIC base registers" and "MSI target registers", though the MSI registers logically
>> belong to the PIC. Then the ioctl as designed here gets stuck too.
>
> This is exactly what ARM's GIC has already. We have several memory regions
> (one for the distributor, one for the cpu interface), and map them with several
> calls to SET_DEVICE_ADDRESS.
>
>>> (b) we didn't need to tangle up and delay the KVM ARM work with a vague
>>> and unspecified desire for general configurability
>>
>> Yeah, that was the basic idea. Considering that the patch set hasn't been going
>> in for another 2 months after that discussion indicates that this isn't too much of
>> an issue though :).
>
well, now it's actually going in, and it seems like this is the last
issue. Let's hold our breaths here for a second and let me write up a
new device attribute API and send that out, and if you guys can
possibly live with it, then please consider accepting it.

-Christoffer
Alexander Graf - Jan. 9, 2013, 3:20 p.m.
On 09.01.2013, at 16:11, Peter Maydell wrote:

> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>> 
>> On 09.01.2013, at 15:48, Peter Maydell wrote:
>> 
>>> On 9 January 2013 10:02, Alexander Graf <agraf@suse.de> wrote:
>>>> I think we should make thus at least potentially generic. In fact, I wouldn't even
>>>> mind calling it DEV_REG with the same semantics as ONE_REG, just that it also
>>>> gets a unique dev id that gets created during in-kernel device creation and that
>>>> it's a vm ioctl.
>>> 
>>> Well, we might want a DEV_REG, but you might as well just make ONE_REG
>>> OK as a VM ioctl, since there's no reason not to have not-per-cpu but not
>>> device registers. ONE_REG already supports dividing up the ID space so
>>> you can say which device or whatever is being used, because we had
>>> things like GIC registers in mind when we put that API together.
>> 
>> ONE_REG's address space today doesn't include a field for the target device,
> 
> There's 16 bits of "register group type".

That's "register group", not "device id". It's constant per id. If you have an ARM register, it will always stay an ARM register. It's not going to change. But if you reuse those bits for the device id, they are going to change, because a device id is similar to a handle that should be returned by the device create ioctl.

> 
>> because that's already predetermined by the fd you run it on. Hence my
>> suggestion to add it as separate field, because then we keep the id space
>> mask-free.
> 
> Er, it already has masks for the different parts of ID space. That's a feature,
> not a bug.

All those masks are constant masks and only used to make reading the IDs easier. They never change within a single type of a register.

> 
>>> However, this shouldn't be DEV_REG, because this isn't actually setting state
>>> in the irqchip device, it's configuring the kernel's part of the system model
>>> [compare wiring up irq routing, which also has a custom ioctl rather than a
>>> generic one]. As such it definitely needs to happen only before the VM is
>>> actually started and not during VM execution, unlike a DEV_REG which would
>>> presumably be generally valid.
>> 
>> The same can be true for ONE_REG/SET_SREGS. On PPC we support setting
>> the PVR (CPU type) via SREGS because ONE_REG only came later. Setting
>> the CPU type only makes sense before you first start using a vcpu. After that
>> point it should be forbidden.
>> 
>> So in a way, the irqchip memory address location is a device register, as it has
>> all the properties that a register on that device would have. It is
>> 
>>  * per device
>>  * one id for one purpose
>>  * may or may not be writable during certain times
> 
> There's a distinction between "things you need to do in machine setup"
> and "things you need to propagate for migration". It should be possible
> to do a migration by doing a complete copy of all ONE_REG (and DEV_REG
> if we have it) state from source to destination, so it needs to be always
> possible to write them.

You need to somewhere encode a list of registers you want to synchronize for migration anyways. Don't include this register in that list then.

> 
>> We would also keep the gates open to get a new register id one day for a second
>> address. Imagine an interrupt controller that splits its address mappings into
>> "PIC base registers" and "MSI target registers", though the MSI registers logically
>> belong to the PIC. Then the ioctl as designed here gets stuck too.
> 
> This is exactly what ARM's GIC has already. We have several memory regions
> (one for the distributor, one for the cpu interface), and map them with several
> calls to SET_DEVICE_ADDRESS.

I see.

> 
>>> (b) we didn't need to tangle up and delay the KVM ARM work with a vague
>>> and unspecified desire for general configurability
>> 
>> Yeah, that was the basic idea. Considering that the patch set hasn't been going
>> in for another 2 months after that discussion indicates that this isn't too much of
>> an issue though :).
> 
> We might get there faster if people didn't keep nitpicking the APIs at the
> last minute :-)

I guess we just had too much bad experience with designing horrible APIs in the past on the PPC side ;).


Alex
Marc Zyngier - Jan. 9, 2013, 3:22 p.m.
On Wed, 9 Jan 2013 15:11:39 +0000, Peter Maydell
<peter.maydell@linaro.org>
wrote:
> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>
>> Yeah, that was the basic idea. Considering that the patch set hasn't
>> been going
>> in for another 2 months after that discussion indicates that this isn't
>> too much of
>> an issue though :).
> 
> We might get there faster if people didn't keep nitpicking the APIs at
the
> last minute :-)

Exactly. We're trying hard to get the damned thing merged, and the
permanent API churn quickly becomes a burden.

My understanding was that a consensus was reached 2 months ago. Has
something fundamentally changed? Something that makes this API invalid? If
not, can we please keep this one?

Thanks,

        M.
Alexander Graf - Jan. 9, 2013, 3:28 p.m.
On 09.01.2013, at 16:22, Marc Zyngier wrote:

> On Wed, 9 Jan 2013 15:11:39 +0000, Peter Maydell
> <peter.maydell@linaro.org>
> wrote:
>> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>> 
>>> Yeah, that was the basic idea. Considering that the patch set hasn't
>>> been going
>>> in for another 2 months after that discussion indicates that this isn't
>>> too much of
>>> an issue though :).
>> 
>> We might get there faster if people didn't keep nitpicking the APIs at
> the
>> last minute :-)
> 
> Exactly. We're trying hard to get the damned thing merged, and the
> permanent API churn quickly becomes a burden.

As I said earlier, we have had a lot of experience in creating really bad APIs in the past.

But how about making this one specific? Call it KVM_ARM_SET_VGIC_ADDRESS, keep the rest as it is, resend it, and later we can come up with an actually generic interface.


Alex

> 
> My understanding was that a consensus was reached 2 months ago. Has
> something fundamentally changed? Something that makes this API invalid? If
> not, can we please keep this one?
> 
> Thanks,
> 
>        M.
> -- 
> Who you jivin' with that Cosmik Debris?
Marc Zyngier - Jan. 9, 2013, 3:50 p.m.
On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf <agraf@suse.de> wrote:
> On 09.01.2013, at 16:22, Marc Zyngier wrote:
> 
>> On Wed, 9 Jan 2013 15:11:39 +0000, Peter Maydell
>> <peter.maydell@linaro.org>
>> wrote:
>>> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>>> 
>>>> Yeah, that was the basic idea. Considering that the patch set hasn't
>>>> been going
>>>> in for another 2 months after that discussion indicates that this
isn't
>>>> too much of
>>>> an issue though :).
>>> 
>>> We might get there faster if people didn't keep nitpicking the APIs at
>> the
>>> last minute :-)
>> 
>> Exactly. We're trying hard to get the damned thing merged, and the
>> permanent API churn quickly becomes a burden.
> 
> As I said earlier, we have had a lot of experience in creating really
bad
> APIs in the past.

Is this one really bad? Again, what changed in the meantime that makes you
think this API is wrong?

> But how about making this one specific? Call it
KVM_ARM_SET_VGIC_ADDRESS,
> keep the rest as it is, resend it, and later we can come up with an
> actually generic interface.

Let's pretend you never wrote that, shall we? ;-)

        M.
Alexander Graf - Jan. 9, 2013, 3:56 p.m.
On 09.01.2013, at 16:50, Marc Zyngier wrote:

> On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf <agraf@suse.de> wrote:
>> On 09.01.2013, at 16:22, Marc Zyngier wrote:
>> 
>>> On Wed, 9 Jan 2013 15:11:39 +0000, Peter Maydell
>>> <peter.maydell@linaro.org>
>>> wrote:
>>>> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>>>> 
>>>>> Yeah, that was the basic idea. Considering that the patch set hasn't
>>>>> been going
>>>>> in for another 2 months after that discussion indicates that this
> isn't
>>>>> too much of
>>>>> an issue though :).
>>>> 
>>>> We might get there faster if people didn't keep nitpicking the APIs at
>>> the
>>>> last minute :-)
>>> 
>>> Exactly. We're trying hard to get the damned thing merged, and the
>>> permanent API churn quickly becomes a burden.
>> 
>> As I said earlier, we have had a lot of experience in creating really
> bad
>> APIs in the past.
> 
> Is this one really bad? Again, what changed in the meantime that makes you
> think this API is wrong?

I complained about it 2 months ago already.

>> But how about making this one specific? Call it
> KVM_ARM_SET_VGIC_ADDRESS,
>> keep the rest as it is, resend it, and later we can come up with an
>> actually generic interface.
> 
> Let's pretend you never wrote that, shall we? ;-)

Why? The worst thing that happened to us so far were interfaces that looked generic and extensible and turned out not to be (see SET_SREGS in the ppc code). We have 2 options to circumvent this:

  1) Commonly agree on an interface that actually is extensible and use it throughout the code
  2) Create a very specific interface for a single use case, keep it implemented "forever" and as soon as we need more, implement a more generic one that goes back to 1

Since the ARM patches have been out of tree for too long, I'm happy with going route 2 until we go back to square 1.


Alex
Marc Zyngier - Jan. 9, 2013, 4:12 p.m.
On 09/01/13 15:56, Alexander Graf wrote:
> 
> On 09.01.2013, at 16:50, Marc Zyngier wrote:
> 
>> On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf <agraf@suse.de> wrote:
>>> On 09.01.2013, at 16:22, Marc Zyngier wrote:
>>>
>>>> On Wed, 9 Jan 2013 15:11:39 +0000, Peter Maydell
>>>> <peter.maydell@linaro.org>
>>>> wrote:
>>>>> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>>>>>
>>>>>> Yeah, that was the basic idea. Considering that the patch set hasn't
>>>>>> been going
>>>>>> in for another 2 months after that discussion indicates that this
>> isn't
>>>>>> too much of
>>>>>> an issue though :).
>>>>>
>>>>> We might get there faster if people didn't keep nitpicking the APIs at
>>>> the
>>>>> last minute :-)
>>>>
>>>> Exactly. We're trying hard to get the damned thing merged, and the
>>>> permanent API churn quickly becomes a burden.
>>>
>>> As I said earlier, we have had a lot of experience in creating really
>> bad
>>> APIs in the past.
>>
>> Is this one really bad? Again, what changed in the meantime that makes you
>> think this API is wrong?
> 
> I complained about it 2 months ago already.
> 
>>> But how about making this one specific? Call it
>> KVM_ARM_SET_VGIC_ADDRESS,
>>> keep the rest as it is, resend it, and later we can come up with an
>>> actually generic interface.
>>
>> Let's pretend you never wrote that, shall we? ;-)
> 
> Why? The worst thing that happened to us so far were interfaces that looked generic and extensible and turned out not to be (see SET_SREGS in the ppc code). We have 2 options to circumvent this:
> 
>   1) Commonly agree on an interface that actually is extensible and use it throughout the code
>   2) Create a very specific interface for a single use case, keep it implemented "forever" and as soon as we need more, implement a more generic one that goes back to 1
> 
> Since the ARM patches have been out of tree for too long, I'm happy with going route 2 until we go back to square 1.

I really don't want to see that. Either we keep the API as it is, or we
change it for something that is really better and used by other
architectures. No point in turning it upside down if we're the only one
doing it.

Oh, and as we're aiming for 3.9, it'd better be ready soon...

	M.
Christoffer Dall - Jan. 9, 2013, 4:29 p.m.
On Wed, Jan 9, 2013 at 11:12 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 09/01/13 15:56, Alexander Graf wrote:
>>
>> On 09.01.2013, at 16:50, Marc Zyngier wrote:
>>
>>> On Wed, 9 Jan 2013 16:28:03 +0100, Alexander Graf <agraf@suse.de> wrote:
>>>> On 09.01.2013, at 16:22, Marc Zyngier wrote:
>>>>
>>>>> On Wed, 9 Jan 2013 15:11:39 +0000, Peter Maydell
>>>>> <peter.maydell@linaro.org>
>>>>> wrote:
>>>>>> On 9 January 2013 14:58, Alexander Graf <agraf@suse.de> wrote:
>>>>>>>
>>>>>>> Yeah, that was the basic idea. Considering that the patch set hasn't
>>>>>>> been going
>>>>>>> in for another 2 months after that discussion indicates that this
>>> isn't
>>>>>>> too much of
>>>>>>> an issue though :).
>>>>>>
>>>>>> We might get there faster if people didn't keep nitpicking the APIs at
>>>>> the
>>>>>> last minute :-)
>>>>>
>>>>> Exactly. We're trying hard to get the damned thing merged, and the
>>>>> permanent API churn quickly becomes a burden.
>>>>
>>>> As I said earlier, we have had a lot of experience in creating really
>>> bad
>>>> APIs in the past.
>>>
>>> Is this one really bad? Again, what changed in the meantime that makes you
>>> think this API is wrong?
>>
>> I complained about it 2 months ago already.
>>
>>>> But how about making this one specific? Call it
>>> KVM_ARM_SET_VGIC_ADDRESS,
>>>> keep the rest as it is, resend it, and later we can come up with an
>>>> actually generic interface.
>>>
>>> Let's pretend you never wrote that, shall we? ;-)
>>
>> Why? The worst thing that happened to us so far were interfaces that looked generic and extensible and turned out not to be (see SET_SREGS in the ppc code). We have 2 options to circumvent this:
>>
>>   1) Commonly agree on an interface that actually is extensible and use it throughout the code
>>   2) Create a very specific interface for a single use case, keep it implemented "forever" and as soon as we need more, implement a more generic one that goes back to 1
>>
>> Since the ARM patches have been out of tree for too long, I'm happy with going route 2 until we go back to square 1.
>
> I really don't want to see that. Either we keep the API as it is, or we
> change it for something that is really better and used by other
> architectures. No point in turning it upside down if we're the only one
> doing it.
>
> Oh, and as we're aiming for 3.9, it'd better be ready soon...
>
ok, I renamed it to KVM_ARM_SET_DEVICE_ADDR, using the same binary
format, so user space tools remain compatible.

I'll be happy to help out with a more generic API, once the kvm/arm
patches reach upstream.

Thanks so far!
-Christoffer

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 38066a7a..668956f 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2206,6 +2206,43 @@  This ioctl returns the guest registers that are supported for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
+4.80 KVM_SET_DEVICE_ADDRESS
+
+Capability: KVM_CAP_SET_DEVICE_ADDRESS
+Architectures: arm
+Type: vm ioctl
+Parameters: struct kvm_device_address (in)
+Returns: 0 on success, -1 on error
+Errors:
+  ENODEV: The device id is unknown
+  ENXIO:  Device not supported on current system
+  EEXIST: Address already set
+  E2BIG:  Address outside guest physical address space
+
+struct kvm_device_address {
+	__u64 id;
+	__u64 addr;
+};
+
+Specify a device address in the guest's physical address space where guests
+can access emulated or directly exposed devices, which the host kernel needs
+to know about. The id field is an architecture specific identifier for a
+specific device.
+
+ARM divides the id field into two parts, a device id and an address type id
+specific to the individual device.
+
+  bits:  | 63        ...       32 | 31    ...    16 | 15    ...    0 |
+  field: |        0x00000000      |     device id   |  addr type id  |
+
+ARM currently only require this when using the in-kernel GIC support for the
+hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id.  When
+setting the base address for the guest's mapping of the VGIC virtual CPU
+and distributor interface, the ioctl must be called after calling
+KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs.  Calling
+this ioctl twice for any of the base addresses will return -EEXIST.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 73b9615..09911a7 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -65,6 +65,19 @@  struct kvm_regs {
 #define KVM_ARM_TARGET_CORTEX_A15	0
 #define KVM_ARM_NUM_TARGETS		1
 
+/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */
+#define KVM_DEVICE_TYPE_SHIFT		0
+#define KVM_DEVICE_TYPE_MASK		(0xffff << KVM_DEVICE_TYPE_SHIFT)
+#define KVM_DEVICE_ID_SHIFT		16
+#define KVM_DEVICE_ID_MASK		(0xffff << KVM_DEVICE_ID_SHIFT)
+
+/* Supported device IDs */
+#define KVM_ARM_DEVICE_VGIC_V2		0
+
+/* Supported VGIC address types  */
+#define KVM_VGIC_V2_ADDR_TYPE_DIST	0
+#define KVM_VGIC_V2_ADDR_TYPE_CPU	1
+
 struct kvm_vcpu_init {
 	__u32 target;
 	__u32 features[7];
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f42d828..2f39b04 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -165,6 +165,8 @@  int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_COALESCED_MMIO:
 		r = KVM_COALESCED_MMIO_PAGE_OFFSET;
 		break;
+	case KVM_CAP_SET_DEVICE_ADDR:
+		r = 1;
 	case KVM_CAP_NR_VCPUS:
 		r = num_online_cpus();
 		break;
@@ -805,10 +807,29 @@  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	return -EINVAL;
 }
 
+static int kvm_vm_ioctl_set_device_address(struct kvm *kvm,
+					   struct kvm_device_address *dev_addr)
+{
+	return -ENODEV;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
-	return -EINVAL;
+	struct kvm *kvm = filp->private_data;
+	void __user *argp = (void __user *)arg;
+
+	switch (ioctl) {
+	case KVM_SET_DEVICE_ADDRESS: {
+		struct kvm_device_address dev_addr;
+
+		if (copy_from_user(&dev_addr, argp, sizeof(dev_addr)))
+			return -EFAULT;
+		return kvm_vm_ioctl_set_device_address(kvm, &dev_addr);
+	}
+	default:
+		return -EINVAL;
+	}
 }
 
 static void cpu_init_hyp_mode(void *vector)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dc63665..03248d1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -636,6 +636,7 @@  struct kvm_ppc_smmu_info {
 #define KVM_CAP_IRQFD_RESAMPLE 82
 #define KVM_CAP_PPC_BOOKE_WATCHDOG 83
 #define KVM_CAP_PPC_HTAB_FD 84
+#define KVM_CAP_SET_DEVICE_ADDR 85
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -783,6 +784,11 @@  struct kvm_msi {
 	__u8  pad[16];
 };
 
+struct kvm_device_address {
+	__u64 id;
+	__u64 addr;
+};
+
 /*
  * ioctls for VM fds
  */
@@ -868,6 +874,8 @@  struct kvm_s390_ucas_mapping {
 #define KVM_ALLOCATE_RMA	  _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
 /* Available with KVM_CAP_PPC_HTAB_FD */
 #define KVM_PPC_GET_HTAB_FD	  _IOW(KVMIO,  0xaa, struct kvm_get_htab_fd)
+/* Available with KVM_CAP_SET_DEVICE_ADDR */
+#define KVM_SET_DEVICE_ADDRESS	  _IOW(KVMIO,  0xab, struct kvm_device_address)
 
 /*
  * ioctls for vcpu fds