Message ID | 1390520766-5275-1-git-send-email-christoffer.dall@linaro.org |
---|---|
State | New, archived |
Headers | show |
On 23 January 2014 23:46, Christoffer Dall <christoffer.dall@linaro.org> wrote: > The KVM API documentation is not clear about the semantics of the data > field on the mmio struct on the kvm_run struct. > > This has become problematic when supporting ARM guests on big-endian > host systems with guests of both endianness types, because it is unclear > how the data should be exported to user space. > > This should not break with existing implementations as all supported > existing implementations of known user space applications (QEMU and > kvmtools for virtio) only support default endianness of the > architectures on the host side. > > Cc: Marc Zyngier <marc.zyngier@arm.com> > Cc: Peter Maydell <peter.maydell@linaro.org> > Cc: Alexander Graf <agraf@suse.de> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > --- > Documentation/virtual/kvm/api.txt | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt > index 366bf4b..fb7c7e4 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied > by kvm. The 'data' member contains the written data if 'is_write' is > true, and should be filled by application code otherwise. > > +The 'data' member byte order is host kernel native endianness, regardless of > +the endianness of the guest, and represents the the value as it would go on the > +bus in real hardware. The host kernel should always be able to do: > +<type> val = *((<type> *)mmio.data). I think this would be better phrased as "The host userspace should always", since this documentation is supposed to be telling userspace what the kernel's contract with it is, not the kernel keeping notes for itself on its own implementation. (It also clarifies what the intention is for the obscure and maybe-we'll-never-implement-this case of an LE host kernel using a compatibility interface to run the host userspace (QEMU) as a BE process which sees the same ABI a BE kernel provides, without actually dragging that red herring explicitly into the documentation.) On the general semantics I am entirely in agreement with the clarification. thanks -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 24/01/2014 01:01, Peter Maydell ha scritto: >> > >> > +The 'data' member byte order is host kernel native endianness, regardless of >> > +the endianness of the guest, and represents the the value as it would go on the >> > +bus in real hardware. The host kernel should always be able to do: >> > +<type> val = *((<type> *)mmio.data). > I think this would be better phrased as "The host userspace should always", > since this documentation is supposed to be telling userspace what the > kernel's contract with it is, not the kernel keeping notes for itself on > its own implementation. (It also clarifies what the intention is for the > obscure and maybe-we'll-never-implement-this case of an LE host > kernel using a compatibility interface to run the host userspace (QEMU) > as a BE process which sees the same ABI a BE kernel provides, > without actually dragging that red herring explicitly into the documentation.) I agree, and also the first line should mention userspace. In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one). Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 24/01/2014 01:01, Peter Maydell ha scritto: >>> > >>> > +The 'data' member byte order is host kernel native endianness, regardless of >>> > +the endianness of the guest, and represents the the value as it would go on the >>> > +bus in real hardware. The host kernel should always be able to do: >>> > +<type> val = *((<type> *)mmio.data). >> I think this would be better phrased as "The host userspace should always", >> since this documentation is supposed to be telling userspace what the >> kernel's contract with it is, not the kernel keeping notes for itself on >> its own implementation. (It also clarifies what the intention is for the >> obscure and maybe-we'll-never-implement-this case of an LE host >> kernel using a compatibility interface to run the host userspace (QEMU) >> as a BE process which sees the same ABI a BE kernel provides, >> without actually dragging that red herring explicitly into the documentation.) > > I agree, and also the first line should mention userspace. > > In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one). It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24 January 2014 05:13, Alexander Graf <agraf@suse.de> wrote: > > On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> Il 24/01/2014 01:01, Peter Maydell ha scritto: >>>> > >>>> > +The 'data' member byte order is host kernel native endianness, regardless of >>>> > +the endianness of the guest, and represents the the value as it would go on the >>>> > +bus in real hardware. Also if you use ints on real bus as description, you may want to clarify restrictions on mmio.len. Basically on 32 bit platform (i.e like V7 ARM) one cannot have mmio.len=8, because one cannot have 64bit value on 32bit data bus. Without such clarification introduction of text like "the value as it would go on the bus in real hardware" is confusing for len=8 for emulated CPUs where real busses are 32bit. If ldrd/strd would be emulated on ARMV7 one would need to use mmio twice to pass required data in either direction using len=4 .. Thanks, Victor > The host kernel should always be able to do: >>>> > +<type> val = *((<type> *)mmio.data). >>> I think this would be better phrased as "The host userspace should always", >>> since this documentation is supposed to be telling userspace what the >>> kernel's contract with it is, not the kernel keeping notes for itself on >>> its own implementation. (It also clarifies what the intention is for the >>> obscure and maybe-we'll-never-implement-this case of an LE host >>> kernel using a compatibility interface to run the host userspace (QEMU) >>> as a BE process which sees the same ABI a BE kernel provides, >>> without actually dragging that red herring explicitly into the documentation.) >> >> I agree, and also the first line should mention userspace. >> >> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one). > > It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels. > > > Alex > > > _______________________________________________ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 24/01/2014 16:23, Victor Kamensky ha scritto: > Also if you use ints on real bus as description, you may want to clarify > restrictions on mmio.len. Basically on 32 bit platform (i.e like V7 > ARM) one cannot have mmio.len=8, because one cannot have 64bit > value on 32bit data bus. Without such clarification introduction of > text like "the value as it would go on the bus in real hardware" is > confusing for len=8 for emulated CPUs where real busses are > 32bit. This is not necessarily true. On a 32-bit CPU you can have a 64-bit memory bus. Even x86 32-bit CPUs can do 64-bit MMIO via MMX or SSE or double-word compare-and-swap (CMPXCHG8B). Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24 January 2014 07:32, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 24/01/2014 16:23, Victor Kamensky ha scritto: > >> Also if you use ints on real bus as description, you may want to clarify >> restrictions on mmio.len. Basically on 32 bit platform (i.e like V7 >> ARM) one cannot have mmio.len=8, because one cannot have 64bit >> value on 32bit data bus. Without such clarification introduction of >> text like "the value as it would go on the bus in real hardware" is >> confusing for len=8 for emulated CPUs where real busses are >> 32bit. > > > This is not necessarily true. On a 32-bit CPU you can have a 64-bit memory > bus. Even x86 32-bit CPUs can do 64-bit MMIO via MMX or SSE or double-word > compare-and-swap (CMPXCHG8B). Sure, that was part of my point :). But now text says about "real hardware" buses and in any given moment emulator specify particular type of CPU emulated, and I am quite sure that we can find one emulated ARMV7 CPU that would just have real 32bit data bus. Note there was no such problem and nobody cares what real data buses width are, with definition I argued for. That I think that was original intent of '__u8 data[8]' use - just bytes array description of how memory at given phys_addr looks before (mmiois_write=0) or would look after (mmio.is_write = 1) requested memory transaction (or several of them for that matter) are emulated by KVM_EXIT_MMIO. When if comes to devices memory read/write it is logical view of memory of course. Such definition works with any "real" data buses sizes, starting with byte. But, ooh, well ... nobody understands such definition .. I stand failed to explain it clear enough. Thanks, Victor > Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24 January 2014 07:23, Victor Kamensky <victor.kamensky@linaro.org> wrote: > On 24 January 2014 05:13, Alexander Graf <agraf@suse.de> wrote: >> >> On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote: >> >>> Il 24/01/2014 01:01, Peter Maydell ha scritto: >>>>> > >>>>> > +The 'data' member byte order is host kernel native endianness, regardless of >>>>> > +the endianness of the guest, and represents the the value as it would go on the >>>>> > +bus in real hardware. Other thing with mmio.len need clarification: please specify what mmio.len values are allowed. It seems that it was implied that len could be only power of 2 in range between 1 and 8. I would rather see that it is spelled out. I think code on all sides of KVM_EXIT_MMIO already make such assumption. Now when you talk about integers values on "bus in real hardware" power of 2 sizes implied Also note for memory pieces with sizes other than 2, 4, 8 endianness is not defined. > > Also if you use ints on real bus as description, you may want to clarify > restrictions on mmio.len. Basically on 32 bit platform (i.e like V7 > ARM) one cannot have mmio.len=8, because one cannot have 64bit > value on 32bit data bus. Without such clarification introduction of > text like "the value as it would go on the bus in real hardware" is > confusing for len=8 for emulated CPUs where real busses are > 32bit. > > If ldrd/strd would be emulated on ARMV7 one would need to use > mmio twice to pass required data in either direction using len=4 .. > > Thanks, > Victor > >> The host kernel should always be able to do: >>>>> > +<type> val = *((<type> *)mmio.data). would it be prudent to say that it is just integer <type> here. Thanks, Victor >>>> I think this would be better phrased as "The host userspace should always", >>>> since this documentation is supposed to be telling userspace what the >>>> kernel's contract with it is, not the kernel keeping notes for itself on >>>> its own implementation. (It also clarifies what the intention is for the >>>> obscure and maybe-we'll-never-implement-this case of an LE host >>>> kernel using a compatibility interface to run the host userspace (QEMU) >>>> as a BE process which sees the same ABI a BE kernel provides, >>>> without actually dragging that red herring explicitly into the documentation.) >>> >>> I agree, and also the first line should mention userspace. >>> >>> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one). >> >> It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels. >> >> >> Alex >> >> >> _______________________________________________ >> kvmarm mailing list >> kvmarm@lists.cs.columbia.edu >> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24 January 2014 05:13, Alexander Graf <agraf@suse.de> wrote: > > On 24.01.2014, at 14:09, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> Il 24/01/2014 01:01, Peter Maydell ha scritto: >>>> > >>>> > +The 'data' member byte order is host kernel native endianness, regardless of >>>> > +the endianness of the guest, and represents the the value as it would go on the >>>> > +bus in real hardware. The host kernel should always be able to do: >>>> > +<type> val = *((<type> *)mmio.data). >>> I think this would be better phrased as "The host userspace should always", >>> since this documentation is supposed to be telling userspace what the >>> kernel's contract with it is, not the kernel keeping notes for itself on >>> its own implementation. (It also clarifies what the intention is for the >>> obscure and maybe-we'll-never-implement-this case of an LE host >>> kernel using a compatibility interface to run the host userspace (QEMU) >>> as a BE process which sees the same ABI a BE kernel provides, >>> without actually dragging that red herring explicitly into the documentation.) >> >> I agree, and also the first line should mention userspace. >> >> In PPC I think it's possible or even common to have BE host kernel and LE host userspace (or perhaps vice versa is the common one). > > It was possible on 32bit, but I'm not sure anyone's actively using it :). The thing that was very common (not so much anymore for enterprise distros) is 32-bit user space with 64-bit kernels. Paolo, Alex, good point about BE kernel / LE user-land mix! How KVM kernel code that deals with KVM_MMIO_EXIT can find out what is user process endianity that handles this KVM_MMIO_EXIT? Do we have kernel function that can tell that. "Hey, what is current user-land task endianity? :)". And reverse case, how user-land code that wants to do KVM_MMIO_EXIT can find out what is endianity of kernel? Do we have such system call? "Hey, kernel tell me your endianity :)". Just a thought: should not we instead of trying to implicitly setup endianity by some other side properties like emulator or kernel endianity, let's just do it explicitly. Adding 'endianness' field into current mmio structure is not an option, but maybe there is outside mechanism like KVM features, special ioctl number, through which one can explicitly either set of learn endianity in mmio.data[] for given KVM session. At least, if we don't want to consider mixed BE/LE kernel/user-land, then document should clarify that BE/LE kernel/user-land mix is not supported and assumption here is that they always coincide. Thanks, Victor > > Alex > > > _______________________________________________ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 366bf4b..fb7c7e4 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied by kvm. The 'data' member contains the written data if 'is_write' is true, and should be filled by application code otherwise. +The 'data' member byte order is host kernel native endianness, regardless of +the endianness of the guest, and represents the the value as it would go on the +bus in real hardware. The host kernel should always be able to do: +<type> val = *((<type> *)mmio.data). + NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR, KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace
The KVM API documentation is not clear about the semantics of the data field on the mmio struct on the kvm_run struct. This has become problematic when supporting ARM guests on big-endian host systems with guests of both endianness types, because it is unclear how the data should be exported to user space. This should not break with existing implementations as all supported existing implementations of known user space applications (QEMU and kvmtools for virtio) only support default endianness of the architectures on the host side. Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Alexander Graf <agraf@suse.de> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> --- Documentation/virtual/kvm/api.txt | 5 +++++ 1 file changed, 5 insertions(+)