diff mbox

[v2] KVM: Specify byte order for KVM_EXIT_MMIO

Message ID 1390606782-14724-1-git-send-email-christoffer.dall@linaro.org
State New, archived
Headers show

Commit Message

Christoffer Dall Jan. 24, 2014, 11:39 p.m. UTC
The KVM API documentation is not clear about the semantics of the data
field on the mmio struct on the kvm_run struct.

This has become problematic when supporting ARM guests on big-endian
host systems with guests of both endianness types, because it is unclear
how the data should be exported to user space.

This should not break with existing implementations as all supported
existing implementations of known user space applications (QEMU and
kvmtools for virtio) only support default endianness of the
architectures on the host side.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changes [v1 - v2]:
 - s/host kernel should/host user space should/

 Documentation/virtual/kvm/api.txt | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Scott Wood Jan. 24, 2014, 11:51 p.m. UTC | #1
On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> The KVM API documentation is not clear about the semantics of the data
> field on the mmio struct on the kvm_run struct.
> 
> This has become problematic when supporting ARM guests on big-endian
> host systems with guests of both endianness types, because it is unclear
> how the data should be exported to user space.
> 
> This should not break with existing implementations as all supported
> existing implementations of known user space applications (QEMU and
> kvmtools for virtio) only support default endianness of the
> architectures on the host side.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: Alexander Graf <agraf@suse.de>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
> Changes [v1 - v2]:
>  - s/host kernel should/host user space should/
> 
>  Documentation/virtual/kvm/api.txt | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 366bf4b..6dbd68c 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>  by kvm.  The 'data' member contains the written data if 'is_write' is
>  true, and should be filled by application code otherwise.
>  
> +The 'data' member byte order is host kernel native endianness, regardless of
> +the endianness of the guest, and represents the the value as it would go on the
> +bus in real hardware.  The host user space should always be able to do:
> +<type> val = *((<type> *)mmio.data).

Host userspace should be able to do that with what results?  It would
only produce a directly usable value if host endianness is the same as
the emulated device's endianness.

I'm not sure that "host kernel native endianness" is an accurate way of
describing what currently happens.  Regardless of host or guest
endianness, the guest should be swapping the value as necessary to
ensure that the value that goes on the (real or emulated) bus is the
same.

Plus, if it were really "as it would go on the bus" the value wouldn't
necessarily be left justified within data[], depending on how the bus 
works.

How about a wording like this:

  The 'data' member contains, in its first 'len' bytes, the value as it
  would appear if the guest had accessed memory rather than I/O.        

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Victor Kamensky Jan. 25, 2014, 12:05 a.m. UTC | #2
On 24 January 2014 15:51, Scott Wood <scottwood@freescale.com> wrote:
> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>> The KVM API documentation is not clear about the semantics of the data
>> field on the mmio struct on the kvm_run struct.
>>
>> This has become problematic when supporting ARM guests on big-endian
>> host systems with guests of both endianness types, because it is unclear
>> how the data should be exported to user space.
>>
>> This should not break with existing implementations as all supported
>> existing implementations of known user space applications (QEMU and
>> kvmtools for virtio) only support default endianness of the
>> architectures on the host side.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>> Changes [v1 - v2]:
>>  - s/host kernel should/host user space should/
>>
>>  Documentation/virtual/kvm/api.txt | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 366bf4b..6dbd68c 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>  by kvm.  The 'data' member contains the written data if 'is_write' is
>>  true, and should be filled by application code otherwise.
>>
>> +The 'data' member byte order is host kernel native endianness, regardless of
>> +the endianness of the guest, and represents the the value as it would go on the
>> +bus in real hardware.  The host user space should always be able to do:
>> +<type> val = *((<type> *)mmio.data).
>
> Host userspace should be able to do that with what results?  It would
> only produce a directly usable value if host endianness is the same as
> the emulated device's endianness.
>
> I'm not sure that "host kernel native endianness" is an accurate way of
> describing what currently happens.  Regardless of host or guest
> endianness, the guest should be swapping the value as necessary to
> ensure that the value that goes on the (real or emulated) bus is the
> same.
>
> Plus, if it were really "as it would go on the bus" the value wouldn't
> necessarily be left justified within data[], depending on how the bus
> works.
>
> How about a wording like this:
>
>   The 'data' member contains, in its first 'len' bytes, the value as it
>   would appear if the guest had accessed memory rather than I/O.

Scott, Yes! Thank you so much! Completely agree.
You managed elegantly to express in two lines,
what I failed to communicate with hundreds lines
of my emails in last few days on this and other
email thread.

- Victor

> -Scott
>
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell Jan. 25, 2014, 12:24 a.m. UTC | #3
On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>> The KVM API documentation is not clear about the semantics of the data
>> field on the mmio struct on the kvm_run struct.
>>
>> This has become problematic when supporting ARM guests on big-endian
>> host systems with guests of both endianness types, because it is unclear
>> how the data should be exported to user space.
>>
>> This should not break with existing implementations as all supported
>> existing implementations of known user space applications (QEMU and
>> kvmtools for virtio) only support default endianness of the
>> architectures on the host side.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>> Changes [v1 - v2]:
>>  - s/host kernel should/host user space should/
>>
>>  Documentation/virtual/kvm/api.txt | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 366bf4b..6dbd68c 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>  by kvm.  The 'data' member contains the written data if 'is_write' is
>>  true, and should be filled by application code otherwise.
>>
>> +The 'data' member byte order is host kernel native endianness, regardless of
>> +the endianness of the guest, and represents the the value as it would go on the
>> +bus in real hardware.  The host user space should always be able to do:
>> +<type> val = *((<type> *)mmio.data).
>
> Host userspace should be able to do that with what results?  It would
> only produce a directly usable value if host endianness is the same as
> the emulated device's endianness.

With the result that it gets the value the CPU has sent out on
the bus as the memory transaction. Obviously if what userspace
is emulating is a bus which has a byteswapping bridge or if it's
being helpful to device emulation by providing "here's the value
even though you think you're wired up backwards" then it needs
to byteswap.

> I'm not sure that "host kernel native endianness" is an accurate way of
> describing what currently happens.  Regardless of host or guest
> endianness, the guest should be swapping the value as necessary to
> ensure that the value that goes on the (real or emulated) bus is the
> same.

I don't know why you're bringing the guest in here. Whether
the guest chooses to byteswap or not is IMHO not relevant.
What KVM and userspace need to combine to achieve is that
whatever the guest happens to have done causes the same result
it would on the real hardware. Whether the guest sends out a
write of 0x12345678 because it wrote 0x12345678 directly or
because it started with 0x87654321 and issued a byte-reverse
instruction doesn't matter.

> Plus, if it were really "as it would go on the bus" the value wouldn't
> necessarily be left justified within data[], depending on how the bus
> works.

The point is that the value in data[] is not "as it would go on the bus",
but the value you get out by treating it as a host-native-endianness
value of the relevant size left-justified within data[] is the value as
it would go on the bus.

> How about a wording like this:
>
>   The 'data' member contains, in its first 'len' bytes, the value as it
>   would appear if the guest had accessed memory rather than I/O.

I think this is confusing, because now userspace authors have
to figure out how to get back to "value X of size Y at address Z"
by interpreting this text... Can you write out the equivalent of
Christoffer's text "here's how you get the memory transaction
value" for what you want?

(Also, value as it would appear to who?)

I think your wording implies that the order of bytes in data[] depend
on the guest CPU "usual byte order", ie the order which the CPU
does not do a byte-lane-swap for (LE for ARM, BE for PPC),
and it would mean it would come out differently from
my/Alex/Christoffer's proposal if the host kernel was the opposite
endianness from that "usual" order.

Finally, I think it's a bit confusing in that "as if the guest had
accessed memory" is assigning implicit semantics to memory
in the emulated system, when memory is actually kind of outside
KVM's purview because it's not part of the CPU.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoffer Dall Jan. 25, 2014, 1:56 a.m. UTC | #4
On Sat, Jan 25, 2014 at 12:24:08AM +0000, Peter Maydell wrote:
> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> > On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> >> The KVM API documentation is not clear about the semantics of the data
> >> field on the mmio struct on the kvm_run struct.
> >>
> >> This has become problematic when supporting ARM guests on big-endian
> >> host systems with guests of both endianness types, because it is unclear
> >> how the data should be exported to user space.
> >>
> >> This should not break with existing implementations as all supported
> >> existing implementations of known user space applications (QEMU and
> >> kvmtools for virtio) only support default endianness of the
> >> architectures on the host side.
> >>
> >> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >> Cc: Peter Maydell <peter.maydell@linaro.org>
> >> Cc: Alexander Graf <agraf@suse.de>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> ---
> >> Changes [v1 - v2]:
> >>  - s/host kernel should/host user space should/
> >>
> >>  Documentation/virtual/kvm/api.txt | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 366bf4b..6dbd68c 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
> >>  by kvm.  The 'data' member contains the written data if 'is_write' is
> >>  true, and should be filled by application code otherwise.
> >>
> >> +The 'data' member byte order is host kernel native endianness, regardless of
> >> +the endianness of the guest, and represents the the value as it would go on the
> >> +bus in real hardware.  The host user space should always be able to do:
> >> +<type> val = *((<type> *)mmio.data).
> >
> > Host userspace should be able to do that with what results?  It would
> > only produce a directly usable value if host endianness is the same as
> > the emulated device's endianness.
> 
> With the result that it gets the value the CPU has sent out on
> the bus as the memory transaction. Obviously if what userspace
> is emulating is a bus which has a byteswapping bridge or if it's
> being helpful to device emulation by providing "here's the value
> even though you think you're wired up backwards" then it needs
> to byteswap.
> 

Yes, because KVM emulates the CPU, it's interface should emulate the
external interface of the CPU, not CPU-internal bits.

The value is directly usable to emulate the bus to attached devices in
user space.

> > I'm not sure that "host kernel native endianness" is an accurate way of
> > describing what currently happens.  Regardless of host or guest
> > endianness, the guest should be swapping the value as necessary to
> > ensure that the value that goes on the (real or emulated) bus is the
> > same.
> 
> I don't know why you're bringing the guest in here. Whether
> the guest chooses to byteswap or not is IMHO not relevant.
> What KVM and userspace need to combine to achieve is that
> whatever the guest happens to have done causes the same result
> it would on the real hardware. Whether the guest sends out a
> write of 0x12345678 because it wrote 0x12345678 directly or
> because it started with 0x87654321 and issued a byte-reverse
> instruction doesn't matter.
> 
> > Plus, if it were really "as it would go on the bus" the value wouldn't
> > necessarily be left justified within data[], depending on how the bus
> > works.
> 
> The point is that the value in data[] is not "as it would go on the bus",
> but the value you get out by treating it as a host-native-endianness
> value of the relevant size left-justified within data[] is the value as
> it would go on the bus.
> 
> > How about a wording like this:
> >
> >   The 'data' member contains, in its first 'len' bytes, the value as it
> >   would appear if the guest had accessed memory rather than I/O.
> 
> I think this is confusing, because now userspace authors have
> to figure out how to get back to "value X of size Y at address Z"
> by interpreting this text... Can you write out the equivalent of
> Christoffer's text "here's how you get the memory transaction
> value" for what you want?
> 
> (Also, value as it would appear to who?)
> 
> I think your wording implies that the order of bytes in data[] depend
> on the guest CPU "usual byte order", ie the order which the CPU
> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
> and it would mean it would come out differently from
> my/Alex/Christoffer's proposal if the host kernel was the opposite
> endianness from that "usual" order.
> 
> Finally, I think it's a bit confusing in that "as if the guest had
> accessed memory" is assigning implicit semantics to memory
> in the emulated system, when memory is actually kind of outside
> KVM's purview because it's not part of the CPU.
> 

The "as if the guest had accessed memory", would imply that user space
needs to look at the settings of the CPU (in the case of ARM the E-bit)
to understand which value the CPU intended for the memory transaction.

However, I'm fine with rewording the patch to something like:

  The 'data' member contains, in its first 'len' bytes, the value of the
  memory transaction in host kernel native endianness, regardless of the
  endianness of the guest.  Host user space should always be able to
  directly do the following to emulate the bus and attached peripherals:
  <type> val = *((<type> *)mmio.data)

if people find this more clear.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood Jan. 25, 2014, 1:58 a.m. UTC | #5
On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> > On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 366bf4b..6dbd68c 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
> >>  by kvm.  The 'data' member contains the written data if 'is_write' is
> >>  true, and should be filled by application code otherwise.
> >>
> >> +The 'data' member byte order is host kernel native endianness, regardless of
> >> +the endianness of the guest, and represents the the value as it would go on the
> >> +bus in real hardware.  The host user space should always be able to do:
> >> +<type> val = *((<type> *)mmio.data).
> >
> > Host userspace should be able to do that with what results?  It would
> > only produce a directly usable value if host endianness is the same as
> > the emulated device's endianness.
> 
> With the result that it gets the value the CPU has sent out on
> the bus as the memory transaction.

Doesn't that assume the host kernel endianness is the same as the bus
(or rather, that the host CPU would not swap such an access before it
hits the bus)?

If you take the same hardware and boot a little endian host kernel one
day, and a big endian host kernel the next, the bus doesn't change, and
neither should the bytewise (assuming address invariance) contents of
data[].  How data[] would look when read as a larger integer would of
course change -- but that's due to how you're reading it.

It's clear to say that a value in memory has been stored there in host
endianness when the value is as you would want to see it in a CPU
register, but it's less clear when you talk about it relative to values
on a bus.  It's harder to correlate that to something that is software
visible.

I don't think there's any actual technical difference between your
wording and mine when each wording is properly interpreted, but I
suspect my wording is less likely to be misinterpreted (I could be
wrong).

> Obviously if what userspace
> is emulating is a bus which has a byteswapping bridge or if it's
> being helpful to device emulation by providing "here's the value
> even though you think you're wired up backwards" then it needs
> to byteswap.

Whether the emulated bus has "a byteswapping bridge" doesn't sound like
something that depends on the endianness that the host CPU is currently
running in.

> > How about a wording like this:
> >
> >   The 'data' member contains, in its first 'len' bytes, the value as it
> >   would appear if the guest had accessed memory rather than I/O.
> 
> I think this is confusing, because now userspace authors have
> to figure out how to get back to "value X of size Y at address Z"
> by interpreting this text... Can you write out the equivalent of
> Christoffer's text "here's how you get the memory transaction
> value" for what you want?

Userspace swaps the value if and only if userspace's endianness differs
from the endianness with which the device interprets the data
(regardless of whether said interpretation is considered natural or
swapped relative to the way the bus is documented).  It's similar to how
userspace would handle emulating DMA.

KVM swaps the value if and only if the endianness of the guest access
differs from that of the host, i.e. if it would have done swapping when
emulating an ordinary memory access.

> (Also, value as it would appear to who?)

As it would appear to anyone.  It works because data[] actually is
memory.  Any difference in how data appears based on the reader's
context would already be reflected when the reader performs the load.

> I think your wording implies that the order of bytes in data[] depend
> on the guest CPU "usual byte order", ie the order which the CPU
> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
> and it would mean it would come out differently from
> my/Alex/Christoffer's proposal if the host kernel was the opposite
> endianness from that "usual" order.

It doesn't depend on "usual" anything.  The only thing it implicitly
says about guest byte order is that it's KVM's job to implement any
swapping if the endianness of the guest access is different from the
endianness of the host kernel access (whether it's due to the guest's
mode, the way a page is mapped, the instruction used, etc).

> Finally, I think it's a bit confusing in that "as if the guest had
> accessed memory" is assigning implicit semantics to memory
> in the emulated system, when memory is actually kind of outside
> KVM's purview because it's not part of the CPU.

That's sort of the point.  It defines it in a way that is independent of
the CPU, and thus independent of what endianness the CPU operates in.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood Jan. 25, 2014, 2:04 a.m. UTC | #6
On Fri, 2014-01-24 at 17:56 -0800, Christoffer Dall wrote:
> On Sat, Jan 25, 2014 at 12:24:08AM +0000, Peter Maydell wrote:
> > Finally, I think it's a bit confusing in that "as if the guest had
> > accessed memory" is assigning implicit semantics to memory
> > in the emulated system, when memory is actually kind of outside
> > KVM's purview because it's not part of the CPU.
> > 
> 
> The "as if the guest had accessed memory", would imply that user space
> needs to look at the settings of the CPU (in the case of ARM the E-bit)
> to understand which value the CPU intended for the memory transaction.

No, it doesn't, just as userspace doesn't need to do that to interpret
DMA descriptors.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 25, 2014, 2:15 a.m. UTC | #7
On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:

> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index 366bf4b..6dbd68c 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>> true, and should be filled by application code otherwise.
>>>> 
>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>> +bus in real hardware.  The host user space should always be able to do:
>>>> +<type> val = *((<type> *)mmio.data).
>>> 
>>> Host userspace should be able to do that with what results?  It would
>>> only produce a directly usable value if host endianness is the same as
>>> the emulated device's endianness.
>> 
>> With the result that it gets the value the CPU has sent out on
>> the bus as the memory transaction.
> 
> Doesn't that assume the host kernel endianness is the same as the bus
> (or rather, that the host CPU would not swap such an access before it
> hits the bus)?
> 
> If you take the same hardware and boot a little endian host kernel one
> day, and a big endian host kernel the next, the bus doesn't change, and
> neither should the bytewise (assuming address invariance) contents of
> data[].  How data[] would look when read as a larger integer would of
> course change -- but that's due to how you're reading it.
> 
> It's clear to say that a value in memory has been stored there in host
> endianness when the value is as you would want to see it in a CPU
> register, but it's less clear when you talk about it relative to values
> on a bus.  It's harder to correlate that to something that is software
> visible.
> 
> I don't think there's any actual technical difference between your
> wording and mine when each wording is properly interpreted, but I
> suspect my wording is less likely to be misinterpreted (I could be
> wrong).
> 
>> Obviously if what userspace
>> is emulating is a bus which has a byteswapping bridge or if it's
>> being helpful to device emulation by providing "here's the value
>> even though you think you're wired up backwards" then it needs
>> to byteswap.
> 
> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
> something that depends on the endianness that the host CPU is currently
> running in.
> 
>>> How about a wording like this:
>>> 
>>>  The 'data' member contains, in its first 'len' bytes, the value as it
>>>  would appear if the guest had accessed memory rather than I/O.
>> 
>> I think this is confusing, because now userspace authors have
>> to figure out how to get back to "value X of size Y at address Z"
>> by interpreting this text... Can you write out the equivalent of
>> Christoffer's text "here's how you get the memory transaction
>> value" for what you want?
> 
> Userspace swaps the value if and only if userspace's endianness differs
> from the endianness with which the device interprets the data
> (regardless of whether said interpretation is considered natural or
> swapped relative to the way the bus is documented).  It's similar to how
> userspace would handle emulating DMA.
> 
> KVM swaps the value if and only if the endianness of the guest access
> differs from that of the host, i.e. if it would have done swapping when
> emulating an ordinary memory access.
> 
>> (Also, value as it would appear to who?)
> 
> As it would appear to anyone.  It works because data[] actually is
> memory.  Any difference in how data appears based on the reader's
> context would already be reflected when the reader performs the load.
> 
>> I think your wording implies that the order of bytes in data[] depend
>> on the guest CPU "usual byte order", ie the order which the CPU
>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>> and it would mean it would come out differently from
>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>> endianness from that "usual" order.
> 
> It doesn't depend on "usual" anything.  The only thing it implicitly
> says about guest byte order is that it's KVM's job to implement any
> swapping if the endianness of the guest access is different from the
> endianness of the host kernel access (whether it's due to the guest's
> mode, the way a page is mapped, the instruction used, etc).
> 
>> Finally, I think it's a bit confusing in that "as if the guest had
>> accessed memory" is assigning implicit semantics to memory
>> in the emulated system, when memory is actually kind of outside
>> KVM's purview because it's not part of the CPU.
> 
> That's sort of the point.  It defines it in a way that is independent of
> the CPU, and thus independent of what endianness the CPU operates in.

Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like

your proposal:

  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

-> ldw_p() will give us the correct value to work with

current proposal:

  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }

-> *(uint32_t*)data will give us the correct value to work with


There are pros and cons for both approaches.

Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.

Obviously both approaches would work as long as they're properly defined :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 25, 2014, 2:16 a.m. UTC | #8
On 25.01.2014, at 03:04, Scott Wood <scottwood@freescale.com> wrote:

> On Fri, 2014-01-24 at 17:56 -0800, Christoffer Dall wrote:
>> On Sat, Jan 25, 2014 at 12:24:08AM +0000, Peter Maydell wrote:
>>> Finally, I think it's a bit confusing in that "as if the guest had
>>> accessed memory" is assigning implicit semantics to memory
>>> in the emulated system, when memory is actually kind of outside
>>> KVM's purview because it's not part of the CPU.
>>> 
>> 
>> The "as if the guest had accessed memory", would imply that user space
>> needs to look at the settings of the CPU (in the case of ARM the E-bit)
>> to understand which value the CPU intended for the memory transaction.
> 
> No, it doesn't, just as userspace doesn't need to do that to interpret
> DMA descriptors.

DMA descriptors are handled by specific pieces of hardware that access memory. Guests usually push data to memory in a layout so that it's "natural" to the device they want to talk to. MMIO is more tricky.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoffer Dall Jan. 25, 2014, 2:34 a.m. UTC | #9
On Sat, Jan 25, 2014 at 03:15:35AM +0100, Alexander Graf wrote:
> 
> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
> 
> > On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
> >> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> >>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index 366bf4b..6dbd68c 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
> >>>> by kvm.  The 'data' member contains the written data if 'is_write' is
> >>>> true, and should be filled by application code otherwise.
> >>>> 
> >>>> +The 'data' member byte order is host kernel native endianness, regardless of
> >>>> +the endianness of the guest, and represents the the value as it would go on the
> >>>> +bus in real hardware.  The host user space should always be able to do:
> >>>> +<type> val = *((<type> *)mmio.data).
> >>> 
> >>> Host userspace should be able to do that with what results?  It would
> >>> only produce a directly usable value if host endianness is the same as
> >>> the emulated device's endianness.
> >> 
> >> With the result that it gets the value the CPU has sent out on
> >> the bus as the memory transaction.
> > 
> > Doesn't that assume the host kernel endianness is the same as the bus
> > (or rather, that the host CPU would not swap such an access before it
> > hits the bus)?
> > 
> > If you take the same hardware and boot a little endian host kernel one
> > day, and a big endian host kernel the next, the bus doesn't change, and
> > neither should the bytewise (assuming address invariance) contents of
> > data[].  How data[] would look when read as a larger integer would of
> > course change -- but that's due to how you're reading it.
> > 
> > It's clear to say that a value in memory has been stored there in host
> > endianness when the value is as you would want to see it in a CPU
> > register, but it's less clear when you talk about it relative to values
> > on a bus.  It's harder to correlate that to something that is software
> > visible.
> > 
> > I don't think there's any actual technical difference between your
> > wording and mine when each wording is properly interpreted, but I
> > suspect my wording is less likely to be misinterpreted (I could be
> > wrong).
> > 
> >> Obviously if what userspace
> >> is emulating is a bus which has a byteswapping bridge or if it's
> >> being helpful to device emulation by providing "here's the value
> >> even though you think you're wired up backwards" then it needs
> >> to byteswap.
> > 
> > Whether the emulated bus has "a byteswapping bridge" doesn't sound like
> > something that depends on the endianness that the host CPU is currently
> > running in.
> > 
> >>> How about a wording like this:
> >>> 
> >>>  The 'data' member contains, in its first 'len' bytes, the value as it
> >>>  would appear if the guest had accessed memory rather than I/O.
> >> 
> >> I think this is confusing, because now userspace authors have
> >> to figure out how to get back to "value X of size Y at address Z"
> >> by interpreting this text... Can you write out the equivalent of
> >> Christoffer's text "here's how you get the memory transaction
> >> value" for what you want?
> > 
> > Userspace swaps the value if and only if userspace's endianness differs
> > from the endianness with which the device interprets the data
> > (regardless of whether said interpretation is considered natural or
> > swapped relative to the way the bus is documented).  It's similar to how
> > userspace would handle emulating DMA.
> > 
> > KVM swaps the value if and only if the endianness of the guest access
> > differs from that of the host, i.e. if it would have done swapping when
> > emulating an ordinary memory access.
> > 
> >> (Also, value as it would appear to who?)
> > 
> > As it would appear to anyone.  It works because data[] actually is
> > memory.  Any difference in how data appears based on the reader's
> > context would already be reflected when the reader performs the load.
> > 
> >> I think your wording implies that the order of bytes in data[] depend
> >> on the guest CPU "usual byte order", ie the order which the CPU
> >> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
> >> and it would mean it would come out differently from
> >> my/Alex/Christoffer's proposal if the host kernel was the opposite
> >> endianness from that "usual" order.
> > 
> > It doesn't depend on "usual" anything.  The only thing it implicitly
> > says about guest byte order is that it's KVM's job to implement any
> > swapping if the endianness of the guest access is different from the
> > endianness of the host kernel access (whether it's due to the guest's
> > mode, the way a page is mapped, the instruction used, etc).
> > 
> >> Finally, I think it's a bit confusing in that "as if the guest had
> >> accessed memory" is assigning implicit semantics to memory
> >> in the emulated system, when memory is actually kind of outside
> >> KVM's purview because it's not part of the CPU.
> > 
> > That's sort of the point.  It defines it in a way that is independent of
> > the CPU, and thus independent of what endianness the CPU operates in.
> 
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
> 
> your proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> -> ldw_p() will give us the correct value to work with
> 
> current proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
> 
> -> *(uint32_t*)data will give us the correct value to work with
> 
> 
> There are pros and cons for both approaches.
> 
> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
> 
> Obviously both approaches would work as long as they're properly defined :).
> 
Just to clarify, with approach 2 existing supported QEMU configurations
of BE/LE on both ARM and PPC still work - it is only for future mixed
endian supprt we need to modify QEMU, right?

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Victor Kamensky Jan. 25, 2014, 2:37 a.m. UTC | #10
On 24 January 2014 18:15, Alexander Graf <agraf@suse.de> wrote:
>
> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
>
>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>> index 366bf4b..6dbd68c 100644
>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>> true, and should be filled by application code otherwise.
>>>>>
>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>> +<type> val = *((<type> *)mmio.data).
>>>>
>>>> Host userspace should be able to do that with what results?  It would
>>>> only produce a directly usable value if host endianness is the same as
>>>> the emulated device's endianness.
>>>
>>> With the result that it gets the value the CPU has sent out on
>>> the bus as the memory transaction.
>>
>> Doesn't that assume the host kernel endianness is the same as the bus
>> (or rather, that the host CPU would not swap such an access before it
>> hits the bus)?
>>
>> If you take the same hardware and boot a little endian host kernel one
>> day, and a big endian host kernel the next, the bus doesn't change, and
>> neither should the bytewise (assuming address invariance) contents of
>> data[].  How data[] would look when read as a larger integer would of
>> course change -- but that's due to how you're reading it.
>>
>> It's clear to say that a value in memory has been stored there in host
>> endianness when the value is as you would want to see it in a CPU
>> register, but it's less clear when you talk about it relative to values
>> on a bus.  It's harder to correlate that to something that is software
>> visible.
>>
>> I don't think there's any actual technical difference between your
>> wording and mine when each wording is properly interpreted, but I
>> suspect my wording is less likely to be misinterpreted (I could be
>> wrong).
>>
>>> Obviously if what userspace
>>> is emulating is a bus which has a byteswapping bridge or if it's
>>> being helpful to device emulation by providing "here's the value
>>> even though you think you're wired up backwards" then it needs
>>> to byteswap.
>>
>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>> something that depends on the endianness that the host CPU is currently
>> running in.
>>
>>>> How about a wording like this:
>>>>
>>>>  The 'data' member contains, in its first 'len' bytes, the value as it
>>>>  would appear if the guest had accessed memory rather than I/O.
>>>
>>> I think this is confusing, because now userspace authors have
>>> to figure out how to get back to "value X of size Y at address Z"
>>> by interpreting this text... Can you write out the equivalent of
>>> Christoffer's text "here's how you get the memory transaction
>>> value" for what you want?
>>
>> Userspace swaps the value if and only if userspace's endianness differs
>> from the endianness with which the device interprets the data
>> (regardless of whether said interpretation is considered natural or
>> swapped relative to the way the bus is documented).  It's similar to how
>> userspace would handle emulating DMA.
>>
>> KVM swaps the value if and only if the endianness of the guest access
>> differs from that of the host, i.e. if it would have done swapping when
>> emulating an ordinary memory access.
>>
>>> (Also, value as it would appear to who?)
>>
>> As it would appear to anyone.  It works because data[] actually is
>> memory.  Any difference in how data appears based on the reader's
>> context would already be reflected when the reader performs the load.
>>
>>> I think your wording implies that the order of bytes in data[] depend
>>> on the guest CPU "usual byte order", ie the order which the CPU
>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>> and it would mean it would come out differently from
>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>> endianness from that "usual" order.
>>
>> It doesn't depend on "usual" anything.  The only thing it implicitly
>> says about guest byte order is that it's KVM's job to implement any
>> swapping if the endianness of the guest access is different from the
>> endianness of the host kernel access (whether it's due to the guest's
>> mode, the way a page is mapped, the instruction used, etc).
>>
>>> Finally, I think it's a bit confusing in that "as if the guest had
>>> accessed memory" is assigning implicit semantics to memory
>>> in the emulated system, when memory is actually kind of outside
>>> KVM's purview because it's not part of the CPU.
>>
>> That's sort of the point.  It defines it in a way that is independent of
>> the CPU, and thus independent of what endianness the CPU operates in.
>
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>
> your proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

Actually it is { 0x01, 0x02, 0x03, 0x04 } in all cases.
First of all, it does not depend on host endianity because in
Scott's case it is defined in terms where KVM host is not
even present.

If you just take real h/w (no emulation) and put
memory instead of IO and it is the same driver does the same
access operation on the same device bytes wise it will look
always the same. Because in one case driver just write word,
but when driver operates in opposite endianity it swaps word
first and then write it. Bytes wise it is the same memory
all the time. Device sees the same value in both cases.

Look at writel, readl, readl_relaxed, etc functions... Clean
endian agnostic driver just uses those access functions
and those functions handle byteswap if needed under
the cover.

Thanks,
Victor

> -> ldw_p() will give us the correct value to work with
>
> current proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>
> -> *(uint32_t*)data will give us the correct value to work with
>
>
> There are pros and cons for both approaches.
>
> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>
> Obviously both approaches would work as long as they're properly defined :).
>
>
> Alex
>
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 25, 2014, 9:13 a.m. UTC | #11
> Am 25.01.2014 um 03:34 schrieb Christoffer Dall <christoffer.dall@linaro.org>:
> 
>> On Sat, Jan 25, 2014 at 03:15:35AM +0100, Alexander Graf wrote:
>> 
>>> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
>>> 
>>>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>>>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index 366bf4b..6dbd68c 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>>> true, and should be filled by application code otherwise.
>>>>>> 
>>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>>> +<type> val = *((<type> *)mmio.data).
>>>>> 
>>>>> Host userspace should be able to do that with what results?  It would
>>>>> only produce a directly usable value if host endianness is the same as
>>>>> the emulated device's endianness.
>>>> 
>>>> With the result that it gets the value the CPU has sent out on
>>>> the bus as the memory transaction.
>>> 
>>> Doesn't that assume the host kernel endianness is the same as the bus
>>> (or rather, that the host CPU would not swap such an access before it
>>> hits the bus)?
>>> 
>>> If you take the same hardware and boot a little endian host kernel one
>>> day, and a big endian host kernel the next, the bus doesn't change, and
>>> neither should the bytewise (assuming address invariance) contents of
>>> data[].  How data[] would look when read as a larger integer would of
>>> course change -- but that's due to how you're reading it.
>>> 
>>> It's clear to say that a value in memory has been stored there in host
>>> endianness when the value is as you would want to see it in a CPU
>>> register, but it's less clear when you talk about it relative to values
>>> on a bus.  It's harder to correlate that to something that is software
>>> visible.
>>> 
>>> I don't think there's any actual technical difference between your
>>> wording and mine when each wording is properly interpreted, but I
>>> suspect my wording is less likely to be misinterpreted (I could be
>>> wrong).
>>> 
>>>> Obviously if what userspace
>>>> is emulating is a bus which has a byteswapping bridge or if it's
>>>> being helpful to device emulation by providing "here's the value
>>>> even though you think you're wired up backwards" then it needs
>>>> to byteswap.
>>> 
>>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>>> something that depends on the endianness that the host CPU is currently
>>> running in.
>>> 
>>>>> How about a wording like this:
>>>>> 
>>>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>>>> would appear if the guest had accessed memory rather than I/O.
>>>> 
>>>> I think this is confusing, because now userspace authors have
>>>> to figure out how to get back to "value X of size Y at address Z"
>>>> by interpreting this text... Can you write out the equivalent of
>>>> Christoffer's text "here's how you get the memory transaction
>>>> value" for what you want?
>>> 
>>> Userspace swaps the value if and only if userspace's endianness differs
>>> from the endianness with which the device interprets the data
>>> (regardless of whether said interpretation is considered natural or
>>> swapped relative to the way the bus is documented).  It's similar to how
>>> userspace would handle emulating DMA.
>>> 
>>> KVM swaps the value if and only if the endianness of the guest access
>>> differs from that of the host, i.e. if it would have done swapping when
>>> emulating an ordinary memory access.
>>> 
>>>> (Also, value as it would appear to who?)
>>> 
>>> As it would appear to anyone.  It works because data[] actually is
>>> memory.  Any difference in how data appears based on the reader's
>>> context would already be reflected when the reader performs the load.
>>> 
>>>> I think your wording implies that the order of bytes in data[] depend
>>>> on the guest CPU "usual byte order", ie the order which the CPU
>>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>>> and it would mean it would come out differently from
>>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>>> endianness from that "usual" order.
>>> 
>>> It doesn't depend on "usual" anything.  The only thing it implicitly
>>> says about guest byte order is that it's KVM's job to implement any
>>> swapping if the endianness of the guest access is different from the
>>> endianness of the host kernel access (whether it's due to the guest's
>>> mode, the way a page is mapped, the instruction used, etc).
>>> 
>>>> Finally, I think it's a bit confusing in that "as if the guest had
>>>> accessed memory" is assigning implicit semantics to memory
>>>> in the emulated system, when memory is actually kind of outside
>>>> KVM's purview because it's not part of the CPU.
>>> 
>>> That's sort of the point.  It defines it in a way that is independent of
>>> the CPU, and thus independent of what endianness the CPU operates in.
>> 
>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>> 
>> your proposal:
>> 
>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>> 
>> -> ldw_p() will give us the correct value to work with
>> 
>> current proposal:
>> 
>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>> 
>> -> *(uint32_t*)data will give us the correct value to work with
>> 
>> 
>> There are pros and cons for both approaches.
>> 
>> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
>> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>> 
>> Obviously both approaches would work as long as they're properly defined :).
> Just to clarify, with approach 2 existing supported QEMU configurations
> of BE/LE on both ARM and PPC still work - it is only for future mixed
> endian supprt we need to modify QEMU, right?

Yes, I'm fairly sure we all agree that today's combinations should keep working the way they used to. It's only the new host ones (LE ppc, BE arm) that are under discussion.


Alex

> 
> -Christoffer
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 25, 2014, 9:20 a.m. UTC | #12
> Am 25.01.2014 um 03:37 schrieb Victor Kamensky <victor.kamensky@linaro.org>:
> 
>> On 24 January 2014 18:15, Alexander Graf <agraf@suse.de> wrote:
>> 
>>> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
>>> 
>>>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>>>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index 366bf4b..6dbd68c 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>>> true, and should be filled by application code otherwise.
>>>>>> 
>>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>>> +<type> val = *((<type> *)mmio.data).
>>>>> 
>>>>> Host userspace should be able to do that with what results?  It would
>>>>> only produce a directly usable value if host endianness is the same as
>>>>> the emulated device's endianness.
>>>> 
>>>> With the result that it gets the value the CPU has sent out on
>>>> the bus as the memory transaction.
>>> 
>>> Doesn't that assume the host kernel endianness is the same as the bus
>>> (or rather, that the host CPU would not swap such an access before it
>>> hits the bus)?
>>> 
>>> If you take the same hardware and boot a little endian host kernel one
>>> day, and a big endian host kernel the next, the bus doesn't change, and
>>> neither should the bytewise (assuming address invariance) contents of
>>> data[].  How data[] would look when read as a larger integer would of
>>> course change -- but that's due to how you're reading it.
>>> 
>>> It's clear to say that a value in memory has been stored there in host
>>> endianness when the value is as you would want to see it in a CPU
>>> register, but it's less clear when you talk about it relative to values
>>> on a bus.  It's harder to correlate that to something that is software
>>> visible.
>>> 
>>> I don't think there's any actual technical difference between your
>>> wording and mine when each wording is properly interpreted, but I
>>> suspect my wording is less likely to be misinterpreted (I could be
>>> wrong).
>>> 
>>>> Obviously if what userspace
>>>> is emulating is a bus which has a byteswapping bridge or if it's
>>>> being helpful to device emulation by providing "here's the value
>>>> even though you think you're wired up backwards" then it needs
>>>> to byteswap.
>>> 
>>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>>> something that depends on the endianness that the host CPU is currently
>>> running in.
>>> 
>>>>> How about a wording like this:
>>>>> 
>>>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>>>> would appear if the guest had accessed memory rather than I/O.
>>>> 
>>>> I think this is confusing, because now userspace authors have
>>>> to figure out how to get back to "value X of size Y at address Z"
>>>> by interpreting this text... Can you write out the equivalent of
>>>> Christoffer's text "here's how you get the memory transaction
>>>> value" for what you want?
>>> 
>>> Userspace swaps the value if and only if userspace's endianness differs
>>> from the endianness with which the device interprets the data
>>> (regardless of whether said interpretation is considered natural or
>>> swapped relative to the way the bus is documented).  It's similar to how
>>> userspace would handle emulating DMA.
>>> 
>>> KVM swaps the value if and only if the endianness of the guest access
>>> differs from that of the host, i.e. if it would have done swapping when
>>> emulating an ordinary memory access.
>>> 
>>>> (Also, value as it would appear to who?)
>>> 
>>> As it would appear to anyone.  It works because data[] actually is
>>> memory.  Any difference in how data appears based on the reader's
>>> context would already be reflected when the reader performs the load.
>>> 
>>>> I think your wording implies that the order of bytes in data[] depend
>>>> on the guest CPU "usual byte order", ie the order which the CPU
>>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>>> and it would mean it would come out differently from
>>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>>> endianness from that "usual" order.
>>> 
>>> It doesn't depend on "usual" anything.  The only thing it implicitly
>>> says about guest byte order is that it's KVM's job to implement any
>>> swapping if the endianness of the guest access is different from the
>>> endianness of the host kernel access (whether it's due to the guest's
>>> mode, the way a page is mapped, the instruction used, etc).
>>> 
>>>> Finally, I think it's a bit confusing in that "as if the guest had
>>>> accessed memory" is assigning implicit semantics to memory
>>>> in the emulated system, when memory is actually kind of outside
>>>> KVM's purview because it's not part of the CPU.
>>> 
>>> That's sort of the point.  It defines it in a way that is independent of
>>> the CPU, and thus independent of what endianness the CPU operates in.
>> 
>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>> 
>> your proposal:
>> 
>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> Actually it is { 0x01, 0x02, 0x03, 0x04 } in all cases.
> First of all, it does not depend on host endianity because in
> Scott's case it is defined in terms where KVM host is not
> even present.
> 
> If you just take real h/w (no emulation) and put
> memory instead of IO and it is the same driver does the same
> access operation on the same device bytes wise it will look
> always the same. Because in one case driver just write word,
> but when driver operates in opposite endianity it swaps word
> first and then write it. Bytes wise it is the same memory
> all the time. Device sees the same value in both cases.
> 
> Look at writel, readl, readl_relaxed, etc functions... Clean
> endian agnostic driver just uses those access functions
> and those functions handle byteswap if needed under
> the cover.

You're thinking Linux again. Of course Linux does things on its side to compensate for endian swaps on the bus. It even does it on BE PPC if you access devices through swizzling buses, but we don't care as hypervisor. All we know in kvm is:

  - instruction used for access
  -> originating register (value)
  -> access size (len)
  -> virtual address (which we translate to phys)
  - guest endianness mode

So my example below was trying to show what happens when value=0x01020304, len=4 with different host/guest endian combinations.


Alex

> 
> Thanks,
> Victor
> 
>> -> ldw_p() will give us the correct value to work with
>> 
>> current proposal:
>> 
>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>> 
>> -> *(uint32_t*)data will give us the correct value to work with
>> 
>> 
>> There are pros and cons for both approaches.
>> 
>> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
>> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>> 
>> Obviously both approaches would work as long as they're properly defined :).
>> 
>> 
>> Alex
>> 
>> 
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Victor Kamensky Jan. 25, 2014, 3:36 p.m. UTC | #13
On 25 January 2014 01:20, Alexander Graf <agraf@suse.de> wrote:
>
>
>> Am 25.01.2014 um 03:37 schrieb Victor Kamensky <victor.kamensky@linaro.org>:
>>
>>> On 24 January 2014 18:15, Alexander Graf <agraf@suse.de> wrote:
>>>
>>>> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
>>>>
>>>>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>>>>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>>>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>>> index 366bf4b..6dbd68c 100644
>>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>>>> true, and should be filled by application code otherwise.
>>>>>>>
>>>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>>>> +<type> val = *((<type> *)mmio.data).
>>>>>>
>>>>>> Host userspace should be able to do that with what results?  It would
>>>>>> only produce a directly usable value if host endianness is the same as
>>>>>> the emulated device's endianness.
>>>>>
>>>>> With the result that it gets the value the CPU has sent out on
>>>>> the bus as the memory transaction.
>>>>
>>>> Doesn't that assume the host kernel endianness is the same as the bus
>>>> (or rather, that the host CPU would not swap such an access before it
>>>> hits the bus)?
>>>>
>>>> If you take the same hardware and boot a little endian host kernel one
>>>> day, and a big endian host kernel the next, the bus doesn't change, and
>>>> neither should the bytewise (assuming address invariance) contents of
>>>> data[].  How data[] would look when read as a larger integer would of
>>>> course change -- but that's due to how you're reading it.
>>>>
>>>> It's clear to say that a value in memory has been stored there in host
>>>> endianness when the value is as you would want to see it in a CPU
>>>> register, but it's less clear when you talk about it relative to values
>>>> on a bus.  It's harder to correlate that to something that is software
>>>> visible.
>>>>
>>>> I don't think there's any actual technical difference between your
>>>> wording and mine when each wording is properly interpreted, but I
>>>> suspect my wording is less likely to be misinterpreted (I could be
>>>> wrong).
>>>>
>>>>> Obviously if what userspace
>>>>> is emulating is a bus which has a byteswapping bridge or if it's
>>>>> being helpful to device emulation by providing "here's the value
>>>>> even though you think you're wired up backwards" then it needs
>>>>> to byteswap.
>>>>
>>>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>>>> something that depends on the endianness that the host CPU is currently
>>>> running in.
>>>>
>>>>>> How about a wording like this:
>>>>>>
>>>>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>>>>> would appear if the guest had accessed memory rather than I/O.
>>>>>
>>>>> I think this is confusing, because now userspace authors have
>>>>> to figure out how to get back to "value X of size Y at address Z"
>>>>> by interpreting this text... Can you write out the equivalent of
>>>>> Christoffer's text "here's how you get the memory transaction
>>>>> value" for what you want?
>>>>
>>>> Userspace swaps the value if and only if userspace's endianness differs
>>>> from the endianness with which the device interprets the data
>>>> (regardless of whether said interpretation is considered natural or
>>>> swapped relative to the way the bus is documented).  It's similar to how
>>>> userspace would handle emulating DMA.
>>>>
>>>> KVM swaps the value if and only if the endianness of the guest access
>>>> differs from that of the host, i.e. if it would have done swapping when
>>>> emulating an ordinary memory access.
>>>>
>>>>> (Also, value as it would appear to who?)
>>>>
>>>> As it would appear to anyone.  It works because data[] actually is
>>>> memory.  Any difference in how data appears based on the reader's
>>>> context would already be reflected when the reader performs the load.
>>>>
>>>>> I think your wording implies that the order of bytes in data[] depend
>>>>> on the guest CPU "usual byte order", ie the order which the CPU
>>>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>>>> and it would mean it would come out differently from
>>>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>>>> endianness from that "usual" order.
>>>>
>>>> It doesn't depend on "usual" anything.  The only thing it implicitly
>>>> says about guest byte order is that it's KVM's job to implement any
>>>> swapping if the endianness of the guest access is different from the
>>>> endianness of the host kernel access (whether it's due to the guest's
>>>> mode, the way a page is mapped, the instruction used, etc).
>>>>
>>>>> Finally, I think it's a bit confusing in that "as if the guest had
>>>>> accessed memory" is assigning implicit semantics to memory
>>>>> in the emulated system, when memory is actually kind of outside
>>>>> KVM's purview because it's not part of the CPU.
>>>>
>>>> That's sort of the point.  It defines it in a way that is independent of
>>>> the CPU, and thus independent of what endianness the CPU operates in.
>>>
>>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>>>
>>> your proposal:
>>>
>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> Actually it is { 0x01, 0x02, 0x03, 0x04 } in all cases.
>> First of all, it does not depend on host endianity because in
>> Scott's case it is defined in terms where KVM host is not
>> even present.
>>
>> If you just take real h/w (no emulation) and put
>> memory instead of IO and it is the same driver does the same
>> access operation on the same device bytes wise it will look
>> always the same. Because in one case driver just write word,
>> but when driver operates in opposite endianity it swaps word
>> first and then write it. Bytes wise it is the same memory
>> all the time. Device sees the same value in both cases.
>>
>> Look at writel, readl, readl_relaxed, etc functions... Clean
>> endian agnostic driver just uses those access functions
>> and those functions handle byteswap if needed under
>> the cover.
>
> You're thinking Linux again. Of course Linux does things on
> its side to compensate for endian swaps on the bus.

Anyone who would want to work with the same h/w in
endian agnostic way will have to do those  endian swaps
for one of the cases.

> It even
> does it on BE PPC if you access devices through swizzling
> buses, but we don't care as hypervisor. All we know in kvm is:
>
>   - instruction used for access
>   -> originating register (value)
>   -> access size (len)
>   -> virtual address (which we translate to phys)
>   - guest endianness mode

Do you agree that in above if you just change guest endianness
then device will see different (byteswapped) value so as far as
device concerned it is completely different transactions, different
effect. Why do we consider them? So if you want to see the
same transaction and have the same effect on device, then when
guest switches endianness it will need to add/drop byteswap,
in order to have the same effect.

Thanks,
Victor

> So my example below was trying to show what happens
> when value=0x01020304, len=4 with different host/guest endian
> combinations.
>
>
> Alex
>
>>
>> Thanks,
>> Victor
>>
>>> -> ldw_p() will give us the correct value to work with
>>>
>>> current proposal:
>>>
>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>
>>> -> *(uint32_t*)data will give us the correct value to work with
>>>
>>>
>>> There are pros and cons for both approaches.
>>>
>>> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
>>> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>>>
>>> Obviously both approaches would work as long as they're properly defined :).
>>>
>>>
>>> Alex
>>>
>>>
>>> _______________________________________________
>>> kvmarm mailing list
>>> kvmarm@lists.cs.columbia.edu
>>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 25, 2014, 4:12 p.m. UTC | #14
On 25.01.2014, at 16:36, Victor Kamensky <victor.kamensky@linaro.org> wrote:

> On 25 January 2014 01:20, Alexander Graf <agraf@suse.de> wrote:
> 
>> It even
>> does it on BE PPC if you access devices through swizzling
>> buses, but we don't care as hypervisor. All we know in kvm is:
>> 
>>  - instruction used for access
>>  -> originating register (value)
>>  -> access size (len)
>>  -> virtual address (which we translate to phys)
>>  - guest endianness mode
> 
> Do you agree that in above if you just change guest endianness
> then device will see different (byteswapped) value so as far as
> device concerned it is completely different transactions, different
> effect. Why do we consider them? So if you want to see the
> same transaction and have the same effect on device, then when
> guest switches endianness it will need to add/drop byteswap,
> in order to have the same effect.

Yes. We consider it because KVM has to do the counter-swap manually based on the guest CPU's endianness setting because the only thing KVM knows are the 4 bits I mentioned above.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell Jan. 25, 2014, 4:23 p.m. UTC | #15
On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>
> your proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>
> -> ldw_p() will give us the correct value to work with
>
> current proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>
> -> *(uint32_t*)data will give us the correct value to work with

For completeness, ditto, ARM:
Scott's proposal (you end up with the same values in
the data array as for PPC, so userspace has to know the
"default" endianness so it can do a byteswap if the
process endianness doesn't match it; on QEMU
ldl_p() handles this for us, as you say):

   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

current proposal:

   BE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
   LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

The values in the data array are different than on
PPC, reflecting the fact that the "default" endianness
is different; userspace does

 -> *(uint32_t*)data will give us the correct value to work with

regardless of what the guest CPU arch is.

> There are pros and cons for both approaches.
>
> Pro approach 1 is that it fits the way data[] is read today,
> so no QEMU changes are required. However, it means
> that user space needs to have awareness of the
> "default endianness".
> With approach 2 you don't care about endianness at all
> anymore - you just get a payload that the host process
> can read in.
>
> Obviously both approaches would work as long as they're
> properly defined :).

Agreed with all of that.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 27, 2014, 7:41 a.m. UTC | #16
On 26.01.2014, at 04:46, Victor Kamensky <victor.kamensky@linaro.org> wrote:

> On 25 January 2014 10:31, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>> On Sat, Jan 25, 2014 at 04:23:00PM +0000, Peter Maydell wrote:
>>> On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
>>>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
> 
> Alex, could you please for the clarification purpose, tell
> us what instructions BE guest and LE guest run in below
> cases. Suppose r0 register has device address, and r1
> register has word as 0x01020304. And suppose we have
> regular memory instead of io at r0 address. How do you
> see 'unsigned char data[4]' will look like if 'data'
> address is at r0.
> 
> For this, for a moment, I think, we can forget about KVM,
> just consider BE guest and LE guest running on regular
> h/w.
> 
> Next we will consider cases as below when the same
> BE guest and LE guest runs on top of BE host or LE
> host KVM in different proposals combinations. Effectively
> defining non-virtual examples first and consider how they
> will work under KVM.
> 
> I was thinking that that you consider
> 
> BE guest:
>   stw r1, 0, r0
> just write r1 at r0 address
> 
> LE guest
>   stwbrx r1, 0, r0
> stwbrx - word byteswap r1 value and write it at r0 address

LE guests also use "stw", but with MSR.LE=1.

Of course an OS that tries to access the same device on the same CPU with MSR.LE=1 that it used to access with stw may use stwbrx to swap the endianness of the resulting operation because the CPU will swap it again because of MSR.LE. But that's guest OS logic which we can't rely on in KVM. Maybe the guest was using stwbrx in the BE case and is using stw with LE. Maybe it uses stw in both cases and fetches the bytes it needs to write from a buffer it has prepared.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf Jan. 27, 2014, 7:52 a.m. UTC | #17
On 26.01.2014, at 06:43, Victor Kamensky <victor.kamensky@linaro.org> wrote:

> On 25 January 2014 19:46, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>> On 25 January 2014 10:31, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>> On Sat, Jan 25, 2014 at 04:23:00PM +0000, Peter Maydell wrote:
>>>> On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
>>>>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>> 
>> Alex, could you please for the clarification purpose, tell
>> us what instructions BE guest and LE guest run in below
>> cases. Suppose r0 register has device address, and r1
>> register has word as 0x01020304. And suppose we have
>> regular memory instead of io at r0 address. How do you
>> see 'unsigned char data[4]' will look like if 'data'
>> address is at r0.
>> 
>> For this, for a moment, I think, we can forget about KVM,
>> just consider BE guest and LE guest running on regular
>> h/w.
>> 
>> Next we will consider cases as below when the same
>> BE guest and LE guest runs on top of BE host or LE
>> host KVM in different proposals combinations. Effectively
>> defining non-virtual examples first and consider how they
>> will work under KVM.
>> 
>> I was thinking that that you consider
>> 
>> BE guest:
>>   stw r1, 0, r0
>> just write r1 at r0 address
>> 
>> LE guest
>>   stwbrx r1, 0, r0
>> stwbrx - word byteswap r1 value and write it at r0 address
>> 
>> which would produce at r0 address
>> data[] = {0x01, 0x02, 0x03, 0x04}
>> for both above cases, and have the same effect
>> on device if its register is mapped at r0 address.
>> 
>> But judging by your other response in previous email, I
>> think you have something different in mind. Please
>> describe it.
> 
> Alex, I reread your tables again. I think that just
> 'stw r1, 0, r0' for both LE and BE will match them. As
> far as data[] concerned in BE it will be data[] = {0x01,
> 0x02, 0x03, 0x04}, and in LE it wil lbe data[] = {0x04,
> 0x03, 0x02, 0x01}.
> 
> Do we agree that these are two different memory
> transactions, as far as device concerned?

Not sure I understand what you mean by that. "stw" is "stw" regardless of the MSR.LE setting. The only reason we care about all of this in KVM is that we implement an instruction emulator that needs to do what the CPU does - and that one interprets MSR.LE.

> 
>> Thanks,
>> Victor
>> 
>>>>> your proposal:
>>>>> 
>>>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>>> 
>>>>> -> ldw_p() will give us the correct value to work with
>>>>> 
>>>>> current proposal:
>>>>> 
>>>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>>> 
>>>>> -> *(uint32_t*)data will give us the correct value to work with
>>>> 
>>>> For completeness, ditto, ARM:
>>>> Scott's proposal (you end up with the same values in
>>>> the data array as for PPC, so userspace has to know the
>>>> "default" endianness so it can do a byteswap
> 
> Actually userland need to mange mismatch between
> native internal data of simulated device and device
> endianity. If native endianity mismatches device qemu
> need to do byteswap. Otherwise it will be just copy it
> into mmio.data as bytes array.
> In qemu there is already infrastructure that handles
> that. My ARM BE KVM patches use Scott's mmio.
> data[] definition and I did not do anything in qemu
> to make that work.

Right, that's what I indicated with my original email. Scott's proposal makes current QEMU "just work" while our proposal needs QEMU changes.

> 
>>>> if the
>>>> process endianness doesn't match it; on QEMU
>>>> ldl_p() handles this for us, as you say):
>>>> 
>>>>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> Peter, thank you for adding ARM case. It really
> helps to see the difference between definitions.
> 
> Don't you agree that Scott's definition has advantage
> because its interpretation does not depend on
> CPU type?
> 
> It is always the same for the same 'device access
> operation', guest endianity, host endianity. If one
> look at 'stw r1, 0, r0' in BE guest, and 'stwbrx r1,
> 0, r0' in LE guest, which is the same memory
> transaction as device concerned. Table using
> Scott's definition will always look like
> 
> LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
> LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
> BE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }
> LE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }
> 
> if ARM would work with this device to do the
> same thing, its table will look exactly the same.
> 
>>>> 
>>>> current proposal:
>>>> 
>>>>   BE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>>   LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>> 
>>>> The values in the data array are different than on
>>>> PPC, reflecting the fact that the "default" endianness
>>>> is different; userspace does
>>>> 
>>>> -> *(uint32_t*)data will give us the correct value to work with
>>>> 
>>>> regardless of what the guest CPU arch is.
>>>> 
>>>>> There are pros and cons for both approaches.
>>>>> 
>>>>> Pro approach 1 is that it fits the way data[] is read today,
>>>>> so no QEMU changes are required. However, it means
>>>>> that user space needs to have awareness of the
>>>>> "default endianness".
>>>>> With approach 2 you don't care about endianness at all
>>>>> anymore - you just get a payload that the host process
>>>>> can read in.
>>>>> 
>>>>> Obviously both approaches would work as long as they're
>>>>> properly defined :).
>>>> 
>>>> Agreed with all of that.
>>>> 
>>> Agreed as well.
>>> 
>>> How do we settle on one versus the other?
> 
> Not surprisingly, I vote for Scott's proposal :). ARM BE KVM
> patches previously posted use this definition of mmio.data[].
> 
> Scott's definition interpretation does not depend on CPU type.
> It is much simpler. It does not use notion not very well defined
> like "real bus". it does not use word 'endianness', byteswap,
> cast example, which IMHO makes it more robust.

The more I think about it the more I tend to agree.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell Jan. 27, 2014, 9:42 a.m. UTC | #18
On 27 January 2014 07:52, Alexander Graf <agraf@suse.de> wrote:
>
> On 26.01.2014, at 06:43, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>> Scott's definition interpretation does not depend on CPU type.
>> It is much simpler. It does not use notion not very well defined
>> like "real bus". it does not use word 'endianness', byteswap,
>> cast example, which IMHO makes it more robust.
>
> The more I think about it the more I tend to agree.

Personally I find Scott's definition rather more confusing,
at least in wording, and it does require QEMU to depend
on the CPU type. (It took me a fair bit of thinking to confirm
that it really was different to the proposal in Christoffer's
patch and not just a rewording, for example). Anyway, would
somebody like to formulate a proposed wording change to
the API doc for that version? (For preference, one which
doesn't try to describe it in terms of "as if the guest wrote
to memory", given that that is IMHO ambiguous and
confusing.)

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood Jan. 28, 2014, 1:59 a.m. UTC | #19
On Sat, 2014-01-25 at 03:15 +0100, Alexander Graf wrote:
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
> 
> your proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> -> ldw_p() will give us the correct value to work with

Do you mean ldl_p(), which just does a memcpy()?  How is that different
from *(uint32_t*)data, other than complying with C99 aliasing rules?

What you get from doing ldl_p() is the value as interpreted by the host
endianness.  If the memory region endianness is different from the host
endianness, you need to swap.  adjust_endianness() sort of does this,
but it's using TARGET_WORDS_BIGENDIAN instead of the host endianness
which will break if they are not the same.  It works for TCG because it
passes the MMIO value in target endianness rather than host --
effectively the TARGET_WORDS_BIGENDIAN usages in memory.c and
include/exec/softmmu_template.h cancel each other.

> current proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
> 
> -> *(uint32_t*)data will give us the correct value to work with

*(uint32_t*)data in the "LE guest, LE host" case will get you 0x04030201
-- how is that the correct value to work with?  Did you make this table
with the assumption that big-endian interpretation is inherently "the
correct value" on PPC?

> There are pros and cons for both approaches.
> 
> Pro approach 1 is that it fits the way data[] is read today, so no QEMU
> changes are required. However, it means that user space needs to have
> awareness of the "default endianness".

Normally the endianness is well specified by the particular device/bus
(PCI is little-endian, e500 CCSR is big-endian, etc).

I suppose you could define one endianness as "natural" for a particular
platform, and the other as swapped, but I don't think that's a useful
way to look at it except for things like virtio that are defined as
being native-endian (and I don't think KVM_EXIT_MMIO is the right layer
to try to sort that problem out, as it just introduces confusion for
non-virtio MMIO).

Unfortunately, that appears to currently be how QEMU works, so either
QEMU needs to be fixed in a way that is compatible with currently
working scenarios (e.g. by changing adjust_endianness() to use host
endianness and changing the TCG path to match), or it needs to be
explicitly defined in the API what assumptions are being made regarding
the default endianness on each architecture.  I believe these
assumptions are present but implicit in the current proposal (if Alex's
intepretation of it matches what was intended).

> With approach 2 you don't care about endianness at all anymore - you
> just get a payload that the host process can read in.

You always need to care about endianness -- something has to result in
PCI registers appearing swapped compared to e500 CCSR registers.

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell Jan. 28, 2014, 8:55 a.m. UTC | #20
On 28 January 2014 01:59, Scott Wood <scottwood@freescale.com> wrote:
> On Sat, 2014-01-25 at 03:15 +0100, Alexander Graf wrote:
>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>>
>> your proposal:
>>
>>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> -> ldw_p() will give us the correct value to work with
>
> Do you mean ldl_p(), which just does a memcpy()?  How is that different
> from *(uint32_t*)data, other than complying with C99 aliasing rules?
>
> What you get from doing ldl_p() is the value as interpreted by the host
> endianness.  If the memory region endianness is different from the host
> endianness, you need to swap.  adjust_endianness() sort of does this,
> but it's using TARGET_WORDS_BIGENDIAN instead of the host endianness
> which will break if they are not the same.  It works for TCG because it
> passes the MMIO value in target endianness rather than host --
> effectively the TARGET_WORDS_BIGENDIAN usages in memory.c and
> include/exec/softmmu_template.h cancel each other.
>
>> current proposal:
>>
>>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>
>> -> *(uint32_t*)data will give us the correct value to work with
>
> *(uint32_t*)data in the "LE guest, LE host" case will get you 0x04030201
> -- how is that the correct value to work with?  Did you make this table
> with the assumption that big-endian interpretation is inherently "the
> correct value" on PPC?

So for ARM, if you put the CPU in big endian mode (by setting
CPSR.E) then the architecture defines that what happens here
is that before any access to the bus the CPU does a byte lane
swap of the data. So a register value of 0x04030201 sent
as a 32 bit store appears on the bus as 0x01020304. If the CPU
is in little endian mode no swapping happens and values in
registers go direct to the bus.

The assumption we made here is that PPC is similar except
that it does no swapping on BE, and swaps on LE. Is this wrong?
(I did look through the PPC architecture docs but couldn't find
anything that clearly stated what happened.)

> I suppose you could define one endianness as "natural" for a particular
> platform, and the other as swapped, but I don't think that's a useful
> way to look at it except for things like virtio that are defined as
> being native-endian (and I don't think KVM_EXIT_MMIO is the right layer
> to try to sort that problem out, as it just introduces confusion for
> non-virtio MMIO).

As I say above, for ARM (v7, BE8) considering one endianness as
'natural' is exactly how the architecture works. (We don't need to
care about BE32, fortunately, as it is obsolete: it used to work kind
of similarly except that it was word invariant and worked by XORing
the address with 1 or 3 when the CPU was in BE mode...)

> Unfortunately, that appears to currently be how QEMU works, so either
> QEMU needs to be fixed in a way that is compatible with currently
> working scenarios (e.g. by changing adjust_endianness() to use host
> endianness and changing the TCG path to match), or it needs to be
> explicitly defined in the API what assumptions are being made regarding
> the default endianness on each architecture.  I believe these
> assumptions are present but implicit in the current proposal (if Alex's
> intepretation of it matches what was intended).
>
>> With approach 2 you don't care about endianness at all anymore - you
>> just get a payload that the host process can read in.
>
> You always need to care about endianness -- something has to result in
> PCI registers appearing swapped compared to e500 CCSR registers.

That's the result of some bridge doing a byte lane swap, isn't it?
Obviously swapping bridges need to be modeled (and in QEMU
we currently do that rather badly, by marking the device endpoints
as having an endianness and inferring the existence of a swap)
but that's not related to the initial CPU-to-bus interface.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 366bf4b..6dbd68c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2565,6 +2565,11 @@  executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
+The 'data' member byte order is host kernel native endianness, regardless of
+the endianness of the guest, and represents the the value as it would go on the
+bus in real hardware.  The host user space should always be able to do:
+<type> val = *((<type> *)mmio.data).
+
 NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
       KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
 operations are complete (and guest state is consistent) only after userspace