Patchwork [RFC] KVM: Specify byte order for KVM_EXIT_MMIO

Submitter Christoffer Dall
Date Jan. 23, 2014, 11:46 p.m.
Message ID <1390520766-5275-1-git-send-email-christoffer.dall@linaro.org>
Permalink /patch/313773/
State New

Comments

Christoffer Dall - Jan. 23, 2014, 11:46 p.m.
The KVM API documentation is not clear about the semantics of the data
field on the mmio struct on the kvm_run struct.

This has become problematic when supporting ARM guests on big-endian
host systems with guests of both endianness types, because it is unclear
how the data should be exported to user space.

This should not break with existing implementations as all supported
existing implementations of known user space applications (QEMU and
kvmtools for virtio) only support default endianness of the
architectures on the host side.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/api.txt | 5 +++++
 1 file changed, 5 insertions(+)
Peter Maydell - Jan. 24, 2014, 12:01 a.m.
On 23 January 2014 23:46, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> The KVM API documentation is not clear about the semantics of the data
> field on the mmio struct on the kvm_run struct.
>
> This has become problematic when supporting ARM guests on big-endian
> host systems with guests of both endianness types, because it is unclear
> how the data should be exported to user space.
>
> This should not break with existing implementations as all supported
> existing implementations of known user space applications (QEMU and
> kvmtools for virtio) only support default endianness of the
> architectures on the host side.
>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Alexander Graf <agraf@suse.de>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/api.txt | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 366bf4b..fb7c7e4 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>  by kvm.  The 'data' member contains the written data if 'is_write' is
>  true, and should be filled by application code otherwise.
>
> +The 'data' member byte order is host kernel native endianness, regardless of
> +the endianness of the guest, and represents the the value as it would go on the
> +bus in real hardware.  The host kernel should always be able to do:
> +<type> val = *((<type> *)mmio.data).

I think this would be better phrased as "The host userspace should always",
since this documentation is supposed to be telling userspace what the
kernel's contract with it is, not the kernel keeping notes for itself on
its own implementation. (It also clarifies what the intention is for the
obscure and maybe-we'll-never-implement-this case of an LE host
kernel using a compatibility interface to run the host userspace (QEMU)
as a BE process which sees the same ABI a BE kernel provides,
without actually dragging that red herring explicitly into the documentation.)

On the general semantics I am entirely in agreement with the clarification.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
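
A minimal sketch of what the proposed wording would mean for userspace, assuming the semantics discussed above (data[] holds the value in host-native byte order). The struct mmio_exit type and both helper names are hypothetical stand-ins, not part of the KVM headers; memcpy is used in place of the pointer cast from the patch text only to sidestep alignment and aliasing concerns, and is meant to yield the same value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mmio member of struct kvm_run. */
struct mmio_exit {
    uint64_t phys_addr;
    uint8_t  data[8];
    uint32_t len;
    uint8_t  is_write;
};

/* Return the value held in data[] as a host-native integer of width len. */
static uint64_t mmio_get_value(const struct mmio_exit *m)
{
    uint8_t  v8;
    uint16_t v16;
    uint32_t v32;
    uint64_t v64;

    switch (m->len) {
    case 1: memcpy(&v8,  m->data, sizeof v8);  return v8;
    case 2: memcpy(&v16, m->data, sizeof v16); return v16;
    case 4: memcpy(&v32, m->data, sizeof v32); return v32;
    case 8: memcpy(&v64, m->data, sizeof v64); return v64;
    default: assert(!"unsupported mmio.len"); return 0;
    }
}

/* Fill data[] from a host-native integer, e.g. to answer a guest read. */
static void mmio_set_value(struct mmio_exit *m, uint64_t val)
{
    uint8_t  v8  = (uint8_t)val;
    uint16_t v16 = (uint16_t)val;
    uint32_t v32 = (uint32_t)val;

    switch (m->len) {
    case 1: memcpy(m->data, &v8,  sizeof v8);  break;
    case 2: memcpy(m->data, &v16, sizeof v16); break;
    case 4: memcpy(m->data, &v32, sizeof v32); break;
    case 8: memcpy(m->data, &val, sizeof val); break;
    default: assert(!"unsupported mmio.len");
    }
}
```

Under this reading, a device model never byte-swaps according to guest endianness when moving values in or out of data[]; any guest-specific swapping would have to happen before the value reaches this point.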
Paolo Bonzini - Jan. 24, 2014, 1:09 p.m.
Il 24/01/2014 01:01, Peter Maydell ha scritto:
>> >
>> > +The 'data' member byte order is host kernel native endianness, regardless of
>> > +the endianness of the guest, and represents the the value as it would go on the
>> > +bus in real hardware.  The host kernel should always be able to do:
>> > +<type> val = *((<type> *)mmio.data).
> I think this would be better phrased as "The host userspace should always",
> since this documentation is supposed to be telling userspace what the
> kernel's contract with it is, not the kernel keeping notes for itself on
> its own implementation. (It also clarifies what the intention is for the
> obscure and maybe-we'll-never-implement-this case of an LE host
> kernel using a compatibility interface to run the host userspace (QEMU)
> as a BE process which sees the same ABI a BE kernel provides,
> without actually dragging that red herring explicitly into the documentation.)

I agree, and also the first line should mention userspace.

In PPC I think it's possible or even common to have BE host kernel and 
LE host userspace (or perhaps vice versa is the common one).

Paolo
Alexander Graf - Jan. 24, 2014, 1:13 p.m.
On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 24/01/2014 01:01, Peter Maydell ha scritto:
>>> >
>>> > +The 'data' member byte order is host kernel native endianness, regardless of
>>> > +the endianness of the guest, and represents the the value as it would go on the
>>> > +bus in real hardware.  The host kernel should always be able to do:
>>> > +<type> val = *((<type> *)mmio.data).
>> I think this would be better phrased as "The host userspace should always",
>> since this documentation is supposed to be telling userspace what the
>> kernel's contract with it is, not the kernel keeping notes for itself on
>> its own implementation. (It also clarifies what the intention is for the
>> obscure and maybe-we'll-never-implement-this case of an LE host
>> kernel using a compatibility interface to run the host userspace (QEMU)
>> as a BE process which sees the same ABI a BE kernel provides,
>> without actually dragging that red herring explicitly into the documentation.)
> 
> I agree, and also the first line should mention userspace.
> 
> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one).

It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels.


Alex

Victor Kamensky - Jan. 24, 2014, 3:23 p.m.
On 24 January 2014 05:13, Alexander Graf <agraf@suse.de> wrote:
>
> On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> Il 24/01/2014 01:01, Peter Maydell ha scritto:
>>>> >
>>>> > +The 'data' member byte order is host kernel native endianness, regardless of
>>>> > +the endianness of the guest, and represents the the value as it would go on the
>>>> > +bus in real hardware.

Also, if you describe the data as integers on a real bus, you may want to
clarify the restrictions on mmio.len. Basically, on a 32-bit platform (e.g.
ARMv7) one cannot have mmio.len=8, because one cannot have a 64-bit
value on a 32-bit data bus. Without such a clarification, text like
"the value as it would go on the bus in real hardware" is
confusing for len=8 on emulated CPUs whose real buses are
32-bit.

If ldrd/strd were emulated on ARMv7, one would need to use
mmio twice to pass the required data in either direction using len=4.

Thanks,
Victor

> The host kernel should always be able to do:
>>>> > +<type> val = *((<type> *)mmio.data).
>>> I think this would be better phrased as "The host userspace should always",
>>> since this documentation is supposed to be telling userspace what the
>>> kernel's contract with it is, not the kernel keeping notes for itself on
>>> its own implementation. (It also clarifies what the intention is for the
>>> obscure and maybe-we'll-never-implement-this case of an LE host
>>> kernel using a compatibility interface to run the host userspace (QEMU)
>>> as a BE process which sees the same ABI a BE kernel provides,
>>> without actually dragging that red herring explicitly into the documentation.)
>>
>> I agree, and also the first line should mention userspace.
>>
>> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one).
>
> It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels.
>
>
> Alex
>
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
Paolo Bonzini - Jan. 24, 2014, 3:32 p.m.
Il 24/01/2014 16:23, Victor Kamensky ha scritto:
> Also if you use ints on real bus as description, you may want to clarify
> restrictions on mmio.len. Basically on 32 bit platform (i.e like V7
> ARM) one cannot have mmio.len=8, because one cannot have 64bit
> value on 32bit data bus. Without such clarification introduction of
> text like "the value as it would go on the bus in real hardware" is
> confusing for len=8 for emulated CPUs where real busses are
> 32bit.

This is not necessarily true.  On a 32-bit CPU you can have a 64-bit 
memory bus.  Even x86 32-bit CPUs can do 64-bit MMIO via MMX or SSE or 
double-word compare-and-swap (CMPXCHG8B).

Paolo
Victor Kamensky - Jan. 24, 2014, 4:19 p.m.
On 24 January 2014 07:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/01/2014 16:23, Victor Kamensky ha scritto:
>
>> Also if you use ints on real bus as description, you may want to clarify
>> restrictions on mmio.len. Basically on 32 bit platform (i.e like V7
>> ARM) one cannot have mmio.len=8, because one cannot have 64bit
>> value on 32bit data bus. Without such clarification introduction of
>> text like "the value as it would go on the bus in real hardware" is
>> confusing for len=8 for emulated CPUs where real busses are
>> 32bit.
>
>
> This is not necessarily true.  On a 32-bit CPU you can have a 64-bit memory
> bus.  Even x86 32-bit CPUs can do 64-bit MMIO via MMX or SSE or double-word
> compare-and-swap (CMPXCHG8B).

Sure, that was part of my point :). But the text now talks about "real
hardware" buses, and at any given moment the emulator specifies a particular
type of emulated CPU, and I am quite sure that we can find an
emulated ARMv7 CPU that has just a real 32-bit data bus.

Note that there was no such problem, and nobody cared what the real
data bus widths are, with the definition I argued for. I think
that was the original intent of using '__u8  data[8]': just a byte-array
description of how memory at the given phys_addr looks before
(mmio.is_write = 0) or would look after (mmio.is_write = 1) the
requested memory transaction (or several of them, for that
matter) is emulated via KVM_EXIT_MMIO. When it comes
to device memory reads/writes it is a logical view of memory, of
course. Such a definition works with any "real" data bus size,
starting with a byte.

But, oh well ... nobody understands such a definition. I
have failed to explain it clearly enough.

Thanks,
Victor
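
For illustration only, the byte-array reading argued for above could be sketched as follows. This is not what the patch proposes; struct mmio_exit, struct byte_device and byte_device_mmio() are hypothetical names, and the device is assumed to be a flat window of byte-addressable storage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mmio member of struct kvm_run. */
struct mmio_exit {
    uint64_t phys_addr;
    uint8_t  data[8];
    uint32_t len;
    uint8_t  is_write;
};

/* Hypothetical device: a flat window of byte-addressable storage. */
struct byte_device {
    uint64_t base;
    uint8_t  mem[16];
};

/*
 * Treat data[] purely as the bytes at phys_addr .. phys_addr + len - 1:
 * a write copies them into device memory, a read copies them out.
 * No integer type, and hence no byte order, is involved.
 */
static void byte_device_mmio(struct byte_device *d, struct mmio_exit *m)
{
    uint64_t off;

    assert(m->phys_addr >= d->base);
    off = m->phys_addr - d->base;
    assert(m->len <= sizeof m->data && off + m->len <= sizeof d->mem);

    if (m->is_write)
        memcpy(d->mem + off, m->data, m->len);
    else
        memcpy(m->data, d->mem + off, m->len);
}
```

In this sketch the width of any real data bus never enters the picture, which is the property the byte-array definition is claimed to have.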

> Paolo
Victor Kamensky - Jan. 24, 2014, 4:34 p.m.
On 24 January 2014 07:23, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> On 24 January 2014 05:13, Alexander Graf <agraf@suse.de> wrote:
>>
>> On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>> Il 24/01/2014 01:01, Peter Maydell ha scritto:
>>>>> >
>>>>> > +The 'data' member byte order is host kernel native endianness, regardless of
>>>>> > +the endianness of the guest, and represents the the value as it would go on the
>>>>> > +bus in real hardware.

Another thing about mmio.len needs clarification: please specify which
mmio.len values are allowed. It seems to be implied that len
can only be a power of 2 in the range between 1 and 8. I would rather see
that spelled out. I think code on all sides of KVM_EXIT_MMIO
already makes such an assumption. Now that you talk about integer
values on the "bus in real hardware", power-of-2 sizes are implied. Also note
that for pieces of memory with sizes other than 2, 4, 8, endianness is not
defined.
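
The restriction suggested above (len is a power of 2 between 1 and 8) could be checked with something like the following sketch. The function name is hypothetical, and the rule itself is the assumption stated in this message, not something the KVM documentation spells out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept only 1, 2, 4 and 8: a non-zero value no larger than 8
 * with exactly one bit set.
 */
static bool mmio_len_is_valid(uint32_t len)
{
    return len >= 1 && len <= 8 && (len & (len - 1)) == 0;
}
```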

>
> Also if you use ints on real bus as description, you may want to clarify
> restrictions on mmio.len. Basically on 32 bit platform (i.e like V7
> ARM) one cannot have mmio.len=8, because one cannot have 64bit
> value on 32bit data bus. Without such clarification introduction of
> text like "the value as it would go on the bus in real hardware" is
> confusing for len=8 for emulated CPUs where real busses are
> 32bit.
>
> If ldrd/strd would be emulated on ARMV7 one would need to use
> mmio twice to pass required data in either direction using len=4 ..
>
> Thanks,
> Victor
>
>> The host kernel should always be able to do:
>>>>> > +<type> val = *((<type> *)mmio.data).

Would it be prudent to say that <type> is just an integer type here?

Thanks,
Victor

>>>> I think this would be better phrased as "The host userspace should always",
>>>> since this documentation is supposed to be telling userspace what the
>>>> kernel's contract with it is, not the kernel keeping notes for itself on
>>>> its own implementation. (It also clarifies what the intention is for the
>>>> obscure and maybe-we'll-never-implement-this case of an LE host
>>>> kernel using a compatibility interface to run the host userspace (QEMU)
>>>> as a BE process which sees the same ABI a BE kernel provides,
>>>> without actually dragging that red herring explicitly into the documentation.)
>>>
>>> I agree, and also the first line should mention userspace.
>>>
>>> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one).
>>
>> It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels.
>>
>>
>> Alex
>>
>>
Victor Kamensky - Jan. 24, 2014, 7:17 p.m.
On 24 January 2014 05:13, Alexander Graf <agraf@suse.de> wrote:
>
> On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> Il 24/01/2014 01:01, Peter Maydell ha scritto:
>>>> >
>>>> > +The 'data' member byte order is host kernel native endianness, regardless of
>>>> > +the endianness of the guest, and represents the the value as it would go on the
>>>> > +bus in real hardware.  The host kernel should always be able to do:
>>>> > +<type> val = *((<type> *)mmio.data).
>>> I think this would be better phrased as "The host userspace should always",
>>> since this documentation is supposed to be telling userspace what the
>>> kernel's contract with it is, not the kernel keeping notes for itself on
>>> its own implementation. (It also clarifies what the intention is for the
>>> obscure and maybe-we'll-never-implement-this case of an LE host
>>> kernel using a compatibility interface to run the host userspace (QEMU)
>>> as a BE process which sees the same ABI a BE kernel provides,
>>> without actually dragging that red herring explicitly into the documentation.)
>>
>> I agree, and also the first line should mention userspace.
>>
>> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one).
>
> It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels.

Paolo, Alex, good point about the BE kernel / LE user-land mix!

How can the KVM kernel code that deals with KVM_EXIT_MMIO find
out the endianness of the user process that handles this
KVM_EXIT_MMIO? Do we have a kernel function that can tell
us that? "Hey, what is the current user-land task's endianness? :)".
And in the reverse case, how can user-land code that handles
KVM_EXIT_MMIO find out the endianness of the kernel?
Do we have such a system call? "Hey, kernel, tell me your
endianness :)".

Just a thought: instead of trying to set up the endianness implicitly
through side properties like the emulator's or the kernel's
endianness, shouldn't we just do it explicitly? Adding an 'endianness'
field to the current mmio structure is not an option, but maybe
there is an outside mechanism, like KVM features or a special ioctl
number, through which one can explicitly either set or learn the
endianness of mmio.data[] for a given KVM session.

At least, if we don't want to consider a mixed BE/LE kernel/user-land,
then the document should clarify that a BE/LE kernel/user-land mix is not
supported and that the assumption here is that they always coincide.

Thanks,
Victor
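
As a point of reference for the questions above, a user-land process can at least determine its own byte order at run time, as in the sketch below (the function name is hypothetical). Note that this says nothing about the endianness of the kernel, which is the part the thread leaves open.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Report the byte order of the calling process by looking at the
 * first byte in memory of a known 32-bit value.
 */
static bool process_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, sizeof first);
    return first == 1;
}
```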

>
> Alex
>
>

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 366bf4b..fb7c7e4 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2565,6 +2565,11 @@  executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
+The 'data' member byte order is host kernel native endianness, regardless of
+the endianness of the guest, and represents the value as it would go on the
+bus in real hardware.  The host kernel should always be able to do:
+<type> val = *((<type> *)mmio.data).
+
 NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
       KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
 operations are complete (and guest state is consistent) only after userspace