[v4,1/8] fw_cfg: max access size and region size are the same for MMIO data reg

Message ID 1418399932-7658-2-git-send-email-lersek@redhat.com
State New

Commit Message

Laszlo Ersek Dec. 12, 2014, 3:58 p.m. UTC
Make it clear that the maximum access size to the MMIO data register
determines the full size of the memory region.

Currently the max access size is 1. Ensure that if a larger size were used
in "fw_cfg_data_mem_ops.valid.max_access_size", the memory subsystem would
split the access into byte-sized accesses internally, in increasing address
order.

fw_cfg_data_mem_read() and fw_cfg_data_mem_write() don't care about
"address" or "size"; they just call the sequential fw_cfg_read() and
fw_cfg_write() functions, respectively. Therefore the automatic
splitting is just right. (The endianness of "fw_cfg_data_mem_ops" is
native.)

This patch doesn't change behavior.

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    v4:
    - unchanged
    
    v3:
    - new in v3 [Drew Jones]

 hw/nvram/fw_cfg.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Andrew Jones Dec. 16, 2014, 1:48 p.m. UTC | #1
On Fri, Dec 12, 2014 at 04:58:45PM +0100, Laszlo Ersek wrote:
> Make it clear that the maximum access size to the MMIO data register
> determines the full size of the memory region.
> 
> Currently the max access size is 1. Ensure that if a larger size were used
> in "fw_cfg_data_mem_ops.valid.max_access_size", the memory subsystem would
> split the access into byte-sized accesses internally, in increasing address
> order.
> 
> fw_cfg_data_mem_read() and fw_cfg_data_mem_write() don't care about
> "address" or "size"; they just call the sequential fw_cfg_read() and
> fw_cfg_write() functions, respectively. Therefore the automatic
> splitting is just right. (The endianness of "fw_cfg_data_mem_ops" is
> native.)

This 'is native' caught my eye. Laszlo's
Documentation/devicetree/bindings/arm/fw-cfg.txt patch states that the
selector register is LE and 

"
The data register allows accesses with 8, 16, 32 and 64-bit width.  Accesses
larger than a byte are interpreted as arrays, bundled together only for better
performance. The bytes constituting such a word, in increasing address order,
correspond to the bytes that would have been transferred by byte-wide accesses
in chronological order.
"

I chatted with Laszlo to make sure the host-is-BE case was considered.
It looks like there may be an issue there that Laszlo promised to look
into. Laszlo had already noticed that the selector was defined as native
in qemu as well, but should be LE. Now that we support > 1 byte reads
and writes from the data port, maybe we should look into changing
that as well.

drew

> 
> This patch doesn't change behavior.
> 
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
> 
> Notes:
>     v4:
>     - unchanged
>     
>     v3:
>     - new in v3 [Drew Jones]
> 
>  hw/nvram/fw_cfg.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index c4b78ed..7f6031c 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -30,9 +30,8 @@
>  #include "qemu/error-report.h"
>  #include "qemu/config-file.h"
>  
>  #define FW_CFG_SIZE 2
> -#define FW_CFG_DATA_SIZE 1
>  #define TYPE_FW_CFG "fw_cfg"
>  #define FW_CFG_NAME "fw_cfg"
>  #define FW_CFG_PATH "/machine/" FW_CFG_NAME
>  #define FW_CFG(obj) OBJECT_CHECK(FWCfgState, (obj), TYPE_FW_CFG)
> @@ -323,8 +322,9 @@ static const MemoryRegionOps fw_cfg_data_mem_ops = {
>      .valid = {
>          .min_access_size = 1,
>          .max_access_size = 1,
>      },
> +    .impl.max_access_size = 1,
>  };
>  
>  static const MemoryRegionOps fw_cfg_comb_mem_ops = {
>      .read = fw_cfg_comb_read,
> @@ -608,9 +608,10 @@ static void fw_cfg_initfn(Object *obj)
>      memory_region_init_io(&s->ctl_iomem, OBJECT(s), &fw_cfg_ctl_mem_ops, s,
>                            "fwcfg.ctl", FW_CFG_SIZE);
>      sysbus_init_mmio(sbd, &s->ctl_iomem);
>      memory_region_init_io(&s->data_iomem, OBJECT(s), &fw_cfg_data_mem_ops, s,
> -                          "fwcfg.data", FW_CFG_DATA_SIZE);
> +                          "fwcfg.data",
> +                          fw_cfg_data_mem_ops.valid.max_access_size);
>      sysbus_init_mmio(sbd, &s->data_iomem);
>      /* In case ctl and data overlap: */
>      memory_region_init_io(&s->comb_iomem, OBJECT(s), &fw_cfg_comb_mem_ops, s,
>                            "fwcfg", FW_CFG_SIZE);
> -- 
> 1.8.3.1
> 
>
Laszlo Ersek Dec. 16, 2014, 7 p.m. UTC | #2
On 12/16/14 14:48, Andrew Jones wrote:
> On Fri, Dec 12, 2014 at 04:58:45PM +0100, Laszlo Ersek wrote:
>> Make it clear that the maximum access size to the MMIO data register
>> determines the full size of the memory region.
>>
>> Currently the max access size is 1. Ensure that if a larger size were
>> used in "fw_cfg_data_mem_ops.valid.max_access_size", the memory
>> subsystem would split the access into byte-sized accesses internally,
>> in increasing address order.
>>
>> fw_cfg_data_mem_read() and fw_cfg_data_mem_write() don't care about
>> "address" or "size"; they just call the sequential fw_cfg_read() and
>> fw_cfg_write() functions, respectively. Therefore the automatic
>> splitting is just right. (The endianness of "fw_cfg_data_mem_ops" is
>> native.)
>
> This 'is native' caught my eye. Laszlo's
> Documentation/devicetree/bindings/arm/fw-cfg.txt patch states that the
> selector register is LE and
>
> "
> The data register allows accesses with 8, 16, 32 and 64-bit width.
> Accesses larger than a byte are interpreted as arrays, bundled
> together only for better performance. The bytes constituting such a
> word, in increasing address order, correspond to the bytes that would
> have been transferred by byte-wide accesses in chronological order.
> "
>
> I chatted with Laszlo to make sure the host-is-BE case was considered.
> It looks like there may be an issue there that Laszlo promised to look
> into. Laszlo had already noticed that the selector was defined as
> native in qemu as well, but should be LE. Now that we support > 1 byte
> reads and writes from the data port, maybe we should look into
> changing that as well.

Yes.

The root of this question is what each of

enum device_endian {
    DEVICE_NATIVE_ENDIAN,
    DEVICE_BIG_ENDIAN,
    DEVICE_LITTLE_ENDIAN,
};

means.

Consider the following call tree, which implements the splitting /
combining of an MMIO read:

memory_region_dispatch_read() [memory.c]
  memory_region_dispatch_read1()
    access_with_adjusted_size()
      memory_region_big_endian()
      for each byte in the wide access:
        memory_region_read_accessor()
          fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
            fw_cfg_read()
  adjust_endianness()
    memory_region_wrong_endianness()
    bswapXX()

The function access_with_adjusted_size() always iterates over the MMIO
address range in incrementing address order. However, it can calculate
the shift count for memory_region_read_accessor() in two ways.

When memory_region_big_endian() returns "true", the shift count
decreases as the MMIO address increases.

When memory_region_big_endian() returns "false", the shift count
increases as the MMIO address increases.

In memory_region_read_accessor(), the shift count is used for a logical
(ie. numeric) bitwise left shift (<<). That operator works with numeric
values and hides (ie. accounts for) host endianness.
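
To illustrate, here is a stripped-down sketch of that composition (made-up
helper name, not the actual memory.c code):

#include <stdint.h>
#include <stdbool.h>

/* Compose a "size"-byte MMIO read from byte-wide accessor calls, the way
 * access_with_adjusted_size() does: the address always increases; only
 * the direction of the shift count depends on the device's endianness. */
static uint64_t compose_read(uint64_t (*read_byte)(uint64_t addr),
                             uint64_t addr, unsigned size, bool big_endian)
{
    uint64_t val = 0;
    unsigned i;

    for (i = 0; i < size; i++) {
        unsigned shift = big_endian ? (size - 1 - i) * 8 : i * 8;

        /* numeric shift -- host byte order never enters the picture */
        val |= read_byte(addr + i) << shift;
    }
    return val;
}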

Let's consider
- an array of two bytes, [0xaa, 0xbb],
- a uint16_t access made from the guest,
- and all twelve possible cases.

In the table below, the "host", "guest" and "device_endian" columns are
input variables. The columns to the right are calculated / derived
values. The arrows above the columns show the data dependencies.

After memory_region_dispatch_read1() constructs the value that is
visible in the "host value" column, it is passed to adjust_endianness().
If memory_region_wrong_endianness() returns "true", then the host value
is byte-swapped. The result is then passed to the guest.

              +---------------------------------------------------------------------------------------------------------------+----------+
              |                                                                                                               |          |
            +---- ------+-------------------------------------------------------------------------+                           |          |
            | |         |                                                                         |                           |          |
      +----------------------------------------------------------+---------+                      |                           |          |
      |     | |         |                                        |         |                      |                           |          |
      |   +-----------+-------------------+  +----------------+  |         |  +------------------------+-------------------+  |          |
      |   | | |       | |                 |  |                |  |         |  |                   |    |                   |  |          |
      |   | | |       | |                 v  |                v  |         v  |                   v    |                   v  |          v
 #  host  guest  device_endian  memory_region_big_endian()  host value  host repr.    memory_region_wrong_endianness()  guest repr.   guest value
--  ----  -----  -------------  --------------------------  ----------  ------------  --------------------------------  ------------  -----------
 1  LE    LE     native         0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
 2  LE    LE     BE             1                           0xaabb      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xbbaa
 3  LE    LE     LE             0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa

 4  LE    BE     native         1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
 5  LE    BE     BE             1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
 6  LE    BE     LE             0                           0xbbaa      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xbbaa

 7  BE    LE     native         0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
 8  BE    LE     BE             1                           0xaabb      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xaabb
 9  BE    LE     LE             0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb

10  BE    BE     native         1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
11  BE    BE     BE             1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
12  BE    BE     LE             0                           0xbbaa      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xaabb


Looking at the two rightmost columns, we must conclude:

(a) The "device_endian" field has absolutely no significance wrt. what
    the guest sees. In each triplet of cases, when we go from
    DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN to DEVICE_LITTLE_ENDIAN,
    the guest sees the exact same value.

    I don't understand this result (it makes me doubt device_endian
    makes any sense). What did I do wrong?

    Just to be sure that I was not seeing ghosts, I actually tested this
    result. On an x86_64 host I tested the aarch64 guest firmware
    (using TCG), cycling the "fw_cfg_data_mem_ops.endianness" field
    through all three possible values. That is, I tested cases #1 to #3.

    They *all* worked!

(b) Considering a given host endianness (ie. a group of six cases): when
    the guest goes from little endian to big endian, the *numerical
    value* the guest sees does not change.

    In addition, fixating the guest endianness, and changing the host
    endianness, the *numerical value* that the guest sees (for the
    original host-side array [0xaa, 0xbb]) changes.

    This means that this interface is *value preserving*, not
    representation preserving. In other words: when host and guest
    endiannesses are identical, the *array* is transferred okay (the
    guest array matches the initial host array [0xaa, 0xbb]). When guest
    and host differ in endianness, the guest will see an incorrect
    *array*.

    Which, alas, makes this interface completely unsuitable for the
    purpose at hand (ie. transferring kernel & initrd images in words
    wider than a byte). We couldn't care less as to what numerical value
    the array [0xaa, 0xbb] encodes on the host -- whatever it encodes,
    the guest wants to receive the same *array* (not the same value).
    And the byte order cannot be fixed up in the guest, because it
    depends on the XOR of guest and host endiannesses, and the guest
    doesn't know about the host's endianness.

I apologize for wasting everyone's time, but I think both results are
very non-intuitive. I noticed the use of the bitwise shift in
memory_region_read_accessor() -- which internally accounts for the
host-side byte order -- just today, while discussing this with Drew on
IRC. I had assumed that it would store bytes in the recipient uint64_t
by address, not by numerical order of magnitude.
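
(In other words, I expected something like

    memcpy((uint8_t *)&val + i, &byte, 1);    /* store by address */

rather than

    val |= (uint64_t)byte << (i * 8);         /* store by numeric magnitude */

-- the former would be representation preserving, the latter is value
preserving.)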

Looks like the DMA-like interface is the only way forward. :(

Laszlo
Paolo Bonzini Dec. 16, 2014, 7:49 p.m. UTC | #3
On 16/12/2014 20:00, Laszlo Ersek wrote:
> Yes.
> 
> The root of this question is what each of
> 
> enum device_endian {
>     DEVICE_NATIVE_ENDIAN,
>     DEVICE_BIG_ENDIAN,
>     DEVICE_LITTLE_ENDIAN,
> };

Actually, I think the root of the answer :) is that fw_cfg_read (and
thus fw_cfg_data_mem_read) is not idempotent.  The split/compose stuff
accesses the bytes at offsets 8,9,10,11,12,13,14,15 and composes them
according to the endianness.

In the case of fw_cfg it just retrieves 8 bytes, but in the case of host
big endian it reads them in the "wrong" order for some reason (sorry, I
haven't looked at this thoroughly).


So the solution is:

1) make fw_cfg_data_mem_ops DEVICE_LITTLE_ENDIAN

2) make fw_cfg_data_mem_read and fw_cfg_data_mem_write call fw_cfg_read
and fw_cfg_write SIZE times and build up a value from the lowest byte up.
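
A sketch of what (2) could look like for the read side (reusing the
existing byte-wide fw_cfg_read() helper):

static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
                                     unsigned size)
{
    FWCfgState *s = opaque;
    uint64_t value = 0;
    unsigned i;

    for (i = 0; i < size; i++) {
        /* lowest byte first, matching DEVICE_LITTLE_ENDIAN */
        value |= (uint64_t)fw_cfg_read(s) << (i * 8);
    }
    return value;
}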

Paolo
Laszlo Ersek Dec. 16, 2014, 8:06 p.m. UTC | #4
On 12/16/14 20:49, Paolo Bonzini wrote:
> 
> 
> On 16/12/2014 20:00, Laszlo Ersek wrote:
>> Yes.
>>
>> The root of this question is what each of
>>
>> enum device_endian {
>>     DEVICE_NATIVE_ENDIAN,
>>     DEVICE_BIG_ENDIAN,
>>     DEVICE_LITTLE_ENDIAN,
>> };
> 
> Actually, I think the root of the answer :) is that fw_cfg_read (and
> thus fw_cfg_data_mem_read) is not idempotent.  The split/compose stuff
> accesses the bytes at offsets 8,9,10,11,12,13,14,15 and composes them
> according to the endianness.
> 
> In the case of fw_cfg it just retrieves 8 bytes, but in the case of host
> big endian it reads them in the "wrong" order for some reason (sorry, I
> haven't looked at this thoroughly).

I can't imagine how that would happen; fw_cfg_data_mem_read() ignores
both "addr" and "size", and fw_cfg_read() simply advances the
"cur_offset" member.

> So the solution is:
> 
> 1) make fw_cfg_data_mem_ops DEVICE_LITTLE_ENDIAN
> 
> 2) make fw_cfg_data_mem_read and fw_cfg_data_mem_write call fw_cfg_read
> and fw_cfg_write SIZE times and build up a value from the lowest byte up.

Nonetheless, that's a really nice idea! I got so stuck with the
automatic splitting that I forgot about the possibility of acting on the
"size" parameter in fw_cfg_data_mem_read(). Thanks!

... Another thing that Andrew mentioned but I didn't cover in my other
email -- what about fw_cfg_ctl_mem_ops? It's currently DEVICE_NATIVE_ENDIAN.

You flipped the combined ops to LE in commit 6fdf98f2 (and, apparently,
I reviewed it). Shouldn't we do the same for the standalone selector?

Thanks
Laszlo
Laszlo Ersek Dec. 16, 2014, 8:17 p.m. UTC | #5
On 12/16/14 21:06, Laszlo Ersek wrote:
> On 12/16/14 20:49, Paolo Bonzini wrote:
>>
>>
>> On 16/12/2014 20:00, Laszlo Ersek wrote:
>>> Yes.
>>>
>>> The root of this question is what each of
>>>
>>> enum device_endian {
>>>     DEVICE_NATIVE_ENDIAN,
>>>     DEVICE_BIG_ENDIAN,
>>>     DEVICE_LITTLE_ENDIAN,
>>> };
>>
>> Actually, I think the root of the answer :) is that fw_cfg_read (and
>> thus fw_cfg_data_mem_read) is not idempotent.  The split/compose stuff
>> accesses the bytes at offsets 8,9,10,11,12,13,14,15 and composes them
>> according to the endianness.
>>
>> In the case of fw_cfg it just retrieves 8 bytes, but in the case of host
>> big endian it reads them in the "wrong" order for some reason (sorry, I
>> haven't looked at this thoroughly).
> 
> I can't imagine how that would happen; fw_cfg_data_mem_read() ignores
> both "addr" and "size", and fw_cfg_read() simply advances the
> "cur_offset" member.

Ah okay, I understand your point now; you're probably saying that
access_with_adjusted_size() traverses the offsets in the wrong order.
... I don't see how; the only difference in the access() param list is
the shift count. (I don't know how it should work by design.)

In any case fw_cfg should be able to handle the larger accesses itself.

Thanks
Laszlo
Paolo Bonzini Dec. 16, 2014, 8:40 p.m. UTC | #6
On 16/12/2014 21:06, Laszlo Ersek wrote:
> On 12/16/14 20:49, Paolo Bonzini wrote:
>> fw_cfg_read (and
>> thus fw_cfg_data_mem_read) is not idempotent.  The split/compose stuff
>> accesses the bytes at offsets 8,9,10,11,12,13,14,15 and composes them
>> according to the endianness.
>>
>> In the case of fw_cfg it just retrieves 8 bytes, but in the case of host
>> big endian it reads them in the "wrong" order for some reason (sorry, I
>> haven't looked at this thoroughly).
> 
> I can't imagine how that would happen; fw_cfg_data_mem_read() ignores
> both "addr" and "size", and fw_cfg_read() simply advances the
> "cur_offset" member.

Honestly neither can I.  But still the automatic splitting (which is
even tested by tests/endianness-test.c :)) assumes idempotency of the
components and it's not entirely surprising that it somehow/sometimes
breaks if you don't respect that.

>> So the solution is:
>>
>> 1) make fw_cfg_data_mem_ops DEVICE_LITTLE_ENDIAN
>>
>> 2) make fw_cfg_data_mem_read and fw_cfg_data_mem_write call fw_cfg_read
>> and fw_cfg_write SIZE times and build up a value from the lowest byte up.
> 
> Nonetheless, that's a really nice idea! I got so stuck with the
> automatic splitting that I forgot about the possibility of acting on the
> "size" parameter in fw_cfg_data_mem_read(). Thanks!
> 
> ... Another thing that Andrew mentioned but I didn't cover in my other
> email -- what about fw_cfg_ctl_mem_ops? It's currently DEVICE_NATIVE_ENDIAN.
> 
> You flipped the combined ops to LE in commit 6fdf98f2 (and, apparently,
> I reviewed it). Shouldn't we do the same for the standalone selector?

No.  The standalone selector is used as MMIO, and the BE platforms
expect the platform to be big-endian.  The combined ops are only used on
ISA ports, where the firmware expects them to be little-endian (as
mentioned in the commit message).

That said, the standalone selector is used by BE platforms only, so we
know that the standalone selector is always DEVICE_BIG_ENDIAN.

So if you want, you can make the standalone selector and the standalone
datum BE and swap them in the firmware.  If the suggestion doesn't make
you jump up and down, I understand that. :)

Paolo
Peter Maydell Dec. 16, 2014, 8:41 p.m. UTC | #7
On 16 December 2014 at 19:00, Laszlo Ersek <lersek@redhat.com> wrote:
> The root of this question is what each of
>
> enum device_endian {
>     DEVICE_NATIVE_ENDIAN,
>     DEVICE_BIG_ENDIAN,
>     DEVICE_LITTLE_ENDIAN,
> };
>
> means.

An opening remark: endianness is a horribly confusing topic and
support of more than one endianness is even worse. I may have made
some inadvertent errors in this reply; if you think so please
let me know and I'll have another stab at it.

That said: the device_endian options indicate what a device's
internal idea of its endianness is. This is most relevant if
a device accepts accesses at wider than byte width
(for instance, if you can read a 32-bit value from
address offset 0 and also an 8-bit value from offset 3 then
how do those values line up? If you read a 32-bit value then
which way round is it compared to what the device's io read/write
function return?).

NATIVE_ENDIAN means "same order as the CPU's main data bus's
natural representation". (Note that this is not necessarily
the same as "the endianness the CPU currently has"; on ARM
you can flip the CPU between LE and BE at runtime, which
is basically inserting a byte-swizzling step between data
accesses and the CPU's data bus, which is always LE for ARMv7+.)

Note that RAM accessible to a KVM guest is always NATIVE_ENDIAN
(because the guest vcpu reads it directly with the h/w cpu).

> Consider the following call tree, which implements the splitting /
> combining of an MMIO read:
>
> memory_region_dispatch_read() [memory.c]
>   memory_region_dispatch_read1()
>     access_with_adjusted_size()
>       memory_region_big_endian()
>       for each byte in the wide access:
>         memory_region_read_accessor()
>           fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
>             fw_cfg_read()
>   adjust_endianness()
>     memory_region_wrong_endianness()
>     bswapXX()
>
> The function access_with_adjusted_size() always iterates over the MMIO
> address range in incrementing address order. However, it can calculate
> the shift count for memory_region_read_accessor() in two ways.
>
> When memory_region_big_endian() returns "true", the shift count
> decreases as the MMIO address increases.
>
> When memory_region_big_endian() returns "false", the shift count
> increases as the MMIO address increases.

Yep, because this is how the device has told us that it thinks
of accesses as being put together.

The column in your table "host value" is the 16 bit value read from
the device, ie what we have decided (based on device_endian) that
it would have returned us if it had supported a 16 bit read directly
itself. BE devices compose 16 bit values with the high byte first,
LE devices with the low byte first, and native-endian devices
in the same order as guest-endianness.

> In memory_region_read_accessor(), the shift count is used for a logical
> (ie. numeric) bitwise left shift (<<). That operator works with numeric
> values and hides (ie. accounts for) host endianness.
>
> Let's consider
> - an array of two bytes, [0xaa, 0xbb],
> - a uint16_t access made from the guest,
> - and all twelve possible cases.
>
> In the table below, the "host", "guest" and "device_endian" columns are
> input variables. The columns to the right are calculated / derived
> values. The arrows above the columns show the data dependencies.
>
> After memory_region_dispatch_read1() constructs the value that is
> visible in the "host value" column, it is passed to adjust_endianness().
> If memory_region_wrong_endianness() returns "true", then the host value
> is byte-swapped. The result is then passed to the guest.
>
>               +---------------------------------------------------------------------------------------------------------------+----------+
>               |                                                                                                               |          |
>             +---- ------+-------------------------------------------------------------------------+                           |          |
>             | |         |                                                                         |                           |          |
>       +----------------------------------------------------------+---------+                      |                           |          |
>       |     | |         |                                        |         |                      |                           |          |
>       |   +-----------+-------------------+  +----------------+  |         |  +------------------------+-------------------+  |          |
>       |   | | |       | |                 |  |                |  |         |  |                   |    |                   |  |          |
>       |   | | |       | |                 v  |                v  |         v  |                   v    |                   v  |          v
>  #  host  guest  device_endian  memory_region_big_endian()  host value  host repr.    memory_region_wrong_endianness()  guest repr.   guest value
> --  ----  -----  -------------  --------------------------  ----------  ------------  --------------------------------  ------------  -----------
>  1  LE    LE     native         0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
>  2  LE    LE     BE             1                           0xaabb      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xbbaa
>  3  LE    LE     LE             0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
>
>  4  LE    BE     native         1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
>  5  LE    BE     BE             1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
>  6  LE    BE     LE             0                           0xbbaa      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xbbaa
>
>  7  BE    LE     native         0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
>  8  BE    LE     BE             1                           0xaabb      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xaabb
>  9  BE    LE     LE             0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
>
> 10  BE    BE     native         1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
> 11  BE    BE     BE             1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
> 12  BE    BE     LE             0                           0xbbaa      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xaabb

The column you have labelled 'guest repr' here is the returned data
from io_mem_read, whose API contract is "write the data from the
device into this host C uint16_t (or whatever) such that it is the
value returned by the device read as a native host value". It's
not related to the guest order at all.

So for instance, io_mem_read() is called by cpu_physical_memory_rw(),
which passes it a local variable "val". So now "val" has the
"guest repr" column's bytes in it, and (as a host C variable) the
value:
1,2,3 : 0xbbaa
4,5,6 : 0xaabb
7,8,9 : 0xbbaa
10,11,12 : 0xaabb

As you can see, this depends on the "guest endianness" (which is
kind of the endianness of the bus): a BE guest 16 bit access to
this device would return the 16 bit value 0xaabb, and an LE guest
0xbbaa, and we have exactly those values in our host C variable.
If this is TCG, then we'll copy that 16 bit host value into the
CPUState struct field corresponding to the destination guest
register as-is. (TCG CPUState fields are always in host-C-order.)

However, to pass them up to KVM we have to put them into a
buffer in RAM as per the KVM_EXIT_MMIO ABI. So:
cpu_physical_memory_rw() calls stw_p(buf, val) [which is "store
in target CPU endianness"], so now buf has the bytes:
1,2,3 : [0xaa, 0xbb]
4,5,6 : [0xaa, 0xbb]
7,8,9 : [0xaa, 0xbb]
10,11,12 : [0xaa, 0xbb]

...which is the same for every case.

This buffer is (for KVM) the run->mmio.data buffer, whose semantics
are "the value as it would appear if the VCPU performed a load or store
of the appropriate width directly to the byte array". Which is what we
want -- your device has two bytes in order 0xaa, 0xbb, and we did
a 16 bit load in the guest CPU, so we should get the same answer as if
we did a 16 bit load of RAM containing 0xaa, 0xbb. That will be
0xaabb if the VCPU is big-endian, and 0xbbaa if it is not.

I think the fact that all of these things come out to the same
set of bytes in the mmio.data buffer is the indication that all
QEMU's byte swapping is correct.

> Looking at the two rightmost columns, we must conclude:
>
> (a) The "device_endian" field has absolutely no significance wrt. what
>     the guest sees. In each triplet of cases, when we go from
>     DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN to DEVICE_LITTLE_ENDIAN,
>     the guest sees the exact same value.
>
>     I don't understand this result (it makes me doubt device_endian
>     makes any sense). What did I do wrong?

I think it's because you defined your device as only supporting
byte accesses that you didn't see any difference between the
various device_endian settings. If a device presents itself as
just an array of bytes then it doesn't have an endianness, really.

As Paolo says, if you make your device support wider accesses
directly and build up the value yourself then you'll see that
setting the device_endian to LE vs BE vs native does change
the values you see in the guest (and that you'll need to set it
to LE and interpret the words in the guest as LE to get the
right behaviour).

>     This means that this interface is *value preserving*, not
>     representation preserving. In other words: when host and guest
>     endiannesses are identical, the *array* is transferred okay (the
>     guest array matches the initial host array [0xaa, 0xbb]). When guest
>     and host differ in endianness, the guest will see an incorrect
>     *array*.

Think of a device which supports only byte accesses as being
like a lump of RAM. A big-endian guest CPU which reads 32 bits
at a time will get different values in registers to an LE guest
which does the same.

*However*, if both CPUs just do "read 32 bits; write 32 bits to
RAM" (ie a kind of memcpy but with the source being the mmio
register rather than some other bit of RAM) then you should get
a bytewise-correct copy of the data in RAM.

So I *think* that would be a viable approach: have your QEMU
device code as it is now, and just make sure that if the guest
is doing wider-than-byte accesses it does them as
  do {
     load word from mmio register;
     store word to RAM;
  } while (count);

and doesn't try to interpret the byte order of the values while
they're in the CPU register in the middle of the loop.
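
In guest C code that loop might look something like this (hypothetical
sketch; FW_CFG_DATA_ADDR and the destination buffer are placeholders):

    volatile uint64_t *mmio = (volatile uint64_t *)FW_CFG_DATA_ADDR;
    uint64_t *dst = dest_buf;
    size_t words = nbytes / 8;

    while (words--) {
        *dst++ = *mmio;   /* copy the word, don't interpret its value */
    }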

Paolo's suggestion would work too, if you prefer that.

> I apologize for wasting everyone's time, but I think both results are
> very non-intuitive.

Everything around endianness handling is non-intuitive --
it's the nature of the problem, I'm afraid. (Some of this is
also QEMU's fault for not having good documentation of the
semantics of each of the layers involved in memory accesses.
I have on my todo list the vague idea of trying to write these
all up as a blog post.)

thanks
-- PMM
Peter Maydell Dec. 16, 2014, 9:47 p.m. UTC | #8
On 16 December 2014 at 20:40, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Honestly neither can I.  But still the automatic splitting (which is
> even tested by tests/endianness-test.c :)) assumes idempotency of the
> components and it's not entirely surprising that it somehow/sometimes
> breaks if you don't respect that.

Oh, good point. Yeah, I don't think we want to make any guarantee
about the order in which the N byte accesses hit the device if
we're assembling an N-byte access. Implementing the device as
a proper wide-access-supporting device is definitely the way to go.

-- PMM
Paolo Bonzini Dec. 16, 2014, 9:47 p.m. UTC | #9
On 16/12/2014 21:17, Laszlo Ersek wrote:
>> > I can't imagine how that would happen; fw_cfg_data_mem_read() ignores
>> > both "addr" and "size", and fw_cfg_read() simply advances the
>> > "cur_offset" member.
> Ah okay, I understand your point now; you're probably saying that
> access_with_adjusted_size() traverses the offsets in the wrong order.
> ... I don't see how; the only difference in the access() param list is
> the shift count. (I don't know how it should work by design.)

I think I have figured it out.

Guest endianness affects where those bytes are placed, but not the order
in which they are fetched; and the effects of guest endianness are
always cleaned up by adjust_endianness, so ultimately they do not matter.

Think of how you would implement the uint64_t read in fw_cfg:

File bytes         12 34 56 78 9a bc de f0

fw_cfg_data_mem_read should read

size==4 BE host    0x12345678
size==4 LE host    0x78563412
size==8 BE host    0x123456789abcdef0
size==8 LE host    0xf0debc9a78563412

So the implementation of fw_cfg_data_mem_read must depend on host
endianness.  Instead, memory.c always fills in bytes in the same order,
on the assumption that the reads are idempotent.
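
In code, a composition that yields the values above would have to look
something like this (sketch only; HOST_WORDS_BIGENDIAN is QEMU's
host-endianness define):

    uint64_t value = 0;
    unsigned i;

    for (i = 0; i < size; i++) {
#ifdef HOST_WORDS_BIGENDIAN
        value = (value << 8) | fw_cfg_read(s);         /* first byte highest */
#else
        value |= (uint64_t)fw_cfg_read(s) << (i * 8);  /* first byte lowest */
#endif
    }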

Paolo
Laszlo Ersek Dec. 17, 2014, 4:52 a.m. UTC | #10
On 12/16/14 22:47, Paolo Bonzini wrote:
> On 16/12/2014 21:17, Laszlo Ersek wrote:
>>>> I can't imagine how that would happen; fw_cfg_data_mem_read() ignores
>>>> both "addr" and "size", and fw_cfg_read() simply advances the
>>>> "cur_offset" member.
>> Ah okay, I understand your point now; you're probably saying that
>> access_with_adjusted_size() traverses the offsets in the wrong order.
>> ... I don't see how; the only difference in the access() param list is
>> the shift count. (I don't know how it should work by design.)
> 
> I think I have figured it out.
> 
> Guest endianness affects where those bytes are placed, but not the order
> in which they are fetched; and the effects of guest endianness are
> always cleaned up by adjust_endianness, so ultimately they do not matter.
> 
> Think of how you would implement the uint64_t read in fw_cfg:
> 
> File bytes         12 34 56 78 9a bc de f0
> 
> fw_cfg_data_mem_read should read
> 
> size==4 BE host    0x12345678
> size==4 LE host    0x78563412
> size==8 BE host    0x123456789abcdef0
> size==8 LE host    0xf0debc9a78563412
> 
> So the implementation of fw_cfg_data_mem_read must depend on host
> endianness.  Instead, memory.c always fills in bytes in the same order,
> on the assumption that the reads are idempotent.

I see. Thanks!
Laszlo
Laszlo Ersek Dec. 17, 2014, 5:06 a.m. UTC | #11
On 12/16/14 21:40, Paolo Bonzini wrote:
> On 16/12/2014 21:06, Laszlo Ersek wrote:

>> You flipped the combined ops to LE in commit 6fdf98f2 (and, apparently,
>> I reviewed it). Shouldn't we do the same for the standalone selector?
> 
> No.  The standalone selector is used as MMIO, and the BE platforms
> expect the platform to be big-endian.  The combined ops are only used on
> ISA ports, where the firmware expects them to be little-endian (as
> mentioned in the commit message).
> 
> That said, the standalone selector is used by BE platforms only, so we
> know that the standalone selector is always DEVICE_BIG_ENDIAN.

This series exposes the standalone selector (as MMIO) to ARM guests as
well; and in "Documentation/devicetree/bindings/arm/fw-cfg.txt" for the
kernel I'm saying that the selector is little endian.

Therefore I think that the standalone selector is not only (going to be)
used by BE platforms (or I don't understand your above statement
correctly). But the current (and to-be-preserved) NATIVE_ENDIAN setting
still matches what I say in
"Documentation/devicetree/bindings/arm/fw-cfg.txt", because, Peter said:

> NATIVE_ENDIAN means "same order as the CPU's main data bus's natural
> representation". (Note that this is not necessarily the same as "the
> endianness the CPU currently has"; on ARM you can flip the CPU between
> LE and BE at runtime, which is basically inserting a byte-swizzling
> step between data accesses and the CPU's data bus, which is always LE
> for ARMv7+.)

In other words, the standalone selector is NATIVE_ENDIAN, but in the
description of the *ARM* bindings, we can simply say that it's little
endian.

Is that right?

Thanks
Laszlo

> 
> So if you want, you can make the standalone selector and the standalone
> datum BE and swap them in the firmware.  If the suggestion doesn't make
> you jump up and down, I understand that. :)
> 
> Paolo
>
Laszlo Ersek Dec. 17, 2014, 7:13 a.m. UTC | #12
On 12/16/14 21:41, Peter Maydell wrote:
> [...]

Thanks for taking the time to write this up. My analysis must have
missed at least two important things then:
- the device's read/write function needs to consider address & size, and
  return values that match host byte order. fw_cfg doesn't conform ATM.
- there's one more layer outside the call tree that I checked that can
  perform endianness conversion.

I'll try to implement Paolo's suggestion (ie. support wide accesses in
fw_cfg internally, not relying on splitting/combining by memory.c).
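
For the write side, the counterpart would be something like this (sketch,
mirroring Paolo's read suggestion; fw_cfg_write() is the existing
byte-wide helper):

static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
                                  uint64_t value, unsigned size)
{
    FWCfgState *s = opaque;
    unsigned i;

    for (i = 0; i < size; i++) {
        /* extract the lowest byte first */
        fw_cfg_write(s, (uint8_t)(value >> (i * 8)));
    }
}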

Thanks
Laszlo
Alexander Graf Dec. 17, 2014, 8:28 a.m. UTC | #13
> On 17.12.2014 at 08:13, Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 12/16/14 21:41, Peter Maydell wrote:
>>> On 16 December 2014 at 19:00, Laszlo Ersek <lersek@redhat.com> wrote:
>>> The root of this question is what each of
>>> 
>>> enum device_endian {
>>>    DEVICE_NATIVE_ENDIAN,
>>>    DEVICE_BIG_ENDIAN,
>>>    DEVICE_LITTLE_ENDIAN,
>>> };
>>> 
>>> means.
>> 
>> An opening remark: endianness is a horribly confusing topic and
>> support of more than one endianness is even worse. I may have made
>> some inadvertent errors in this reply; if you think so please
>> let me know and I'll have another stab at it.
>> 
>> That said: the device_endian options indicate what a device's
>> internal idea of its endianness is. This is most relevant if
>> a device accepts accesses at wider than byte width
>> (for instance, if you can read a 32-bit value from
>> address offset 0 and also an 8-bit value from offset 3 then
>> how do those values line up? If you read a 32-bit value then
>> which way round is it compared to what the device's io read/write
>> function return?).
>> 
>> NATIVE_ENDIAN means "same order as the CPU's main data bus's
>> natural representation". (Note that this is not necessarily
>> the same as "the endianness the CPU currently has"; on ARM
>> you can flip the CPU between LE and BE at runtime, which
>> is basically inserting a byte-swizzling step between data
>> accesses and the CPU's data bus, which is always LE for ARMv7+.)
>> 
>> Note that RAM accessible to a KVM guest is always NATIVE_ENDIAN
>> (because the guest vcpu reads it directly with the h/w cpu).
>> 
>>> Consider the following call tree, which implements the splitting /
>>> combining of an MMIO read:
>>> 
>>> memory_region_dispatch_read() [memory.c]
>>>  memory_region_dispatch_read1()
>>>    access_with_adjusted_size()
>>>      memory_region_big_endian()
>>>      for each byte in the wide access:
>>>        memory_region_read_accessor()
>>>          fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
>>>            fw_cfg_read()
>>>  adjust_endianness()
>>>    memory_region_wrong_endianness()
>>>    bswapXX()
>>> 
>>> The function access_with_adjusted_size() always iterates over the MMIO
>>> address range in incrementing address order. However, it can calculate
>>> the shift count for memory_region_read_accessor() in two ways.
>>> 
>>> When memory_region_big_endian() returns "true", the shift count
>>> decreases as the MMIO address increases.
>>> 
>>> When memory_region_big_endian() returns "false", the shift count
>>> increases as the MMIO address increases.
>> 
>> Yep, because this is how the device has told us that it thinks
>> of accesses as being put together.
>> 
>> The column in your table "host value" is the 16 bit value read from
>> the device, ie what we have decided (based on device_endian) that
>> it would have returned us if it had supported a 16 bit read directly
>> itself. BE devices compose 16 bit values with the high byte first,
>> LE devices with the low byte first, and native-endian devices
>> in the same order as guest-endianness.
>> 
>>> In memory_region_read_accessor(), the shift count is used for a logical
>>> (ie. numeric) bitwise left shift (<<). That operator works with numeric
>>> values and hides (ie. accounts for) host endianness.
>>> 
>>> Let's consider
>>> - an array of two bytes, [0xaa, 0xbb],
>>> - a uint16_t access made from the guest,
>>> - and all twelve possible cases.
>>> 
>>> In the table below, the "host", "guest" and "device_endian" columns are
>>> input variables. The columns to the right are calculated / derived
>>> values. The arrows above the columns show the data dependencies.
>>> 
>>> After memory_region_dispatch_read1() constructs the value that is
>>> visible in the "host value" column, it is passed to adjust_endianness().
>>> If memory_region_wrong_endianness() returns "true", then the host value
>>> is byte-swapped. The result is then passed to the guest.
>>> 
>>>              +---------------------------------------------------------------------------------------------------------------+----------+
>>>              |                                                                                                               |          |
>>>            +---- ------+-------------------------------------------------------------------------+                           |          |
>>>            | |         |                                                                         |                           |          |
>>>      +----------------------------------------------------------+---------+                      |                           |          |
>>>      |     | |         |                                        |         |                      |                           |          |
>>>      |   +-----------+-------------------+  +----------------+  |         |  +------------------------+-------------------+  |          |
>>>      |   | | |       | |                 |  |                |  |         |  |                   |    |                   |  |          |
>>>      |   | | |       | |                 v  |                v  |         v  |                   v    |                   v  |          v
>>> #  host  guest  device_endian  memory_region_big_endian()  host value  host repr.    memory_region_wrong_endianness()  guest repr.   guest value
>>> --  ----  -----  -------------  --------------------------  ----------  ------------  --------------------------------  ------------  -----------
>>> 1  LE    LE     native         0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
>>> 2  LE    LE     BE             1                           0xaabb      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xbbaa
>>> 3  LE    LE     LE             0                           0xbbaa      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xbbaa
>>> 
>>> 4  LE    BE     native         1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
>>> 5  LE    BE     BE             1                           0xaabb      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xbbaa
>>> 6  LE    BE     LE             0                           0xbbaa      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xbbaa
>>> 
>>> 7  BE    LE     native         0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
>>> 8  BE    LE     BE             1                           0xaabb      [0xaa, 0xbb]  1                                 [0xbb, 0xaa]  0xaabb
>>> 9  BE    LE     LE             0                           0xbbaa      [0xbb, 0xaa]  0                                 [0xbb, 0xaa]  0xaabb
>>> 
>>> 10  BE    BE     native         1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
>>> 11  BE    BE     BE             1                           0xaabb      [0xaa, 0xbb]  0                                 [0xaa, 0xbb]  0xaabb
>>> 12  BE    BE     LE             0                           0xbbaa      [0xbb, 0xaa]  1                                 [0xaa, 0xbb]  0xaabb
>> 
>> The column you have labelled 'guest repr' here is the returned data
>> from io_mem_read, whose API contract is "write the data from the
>> device into this host C uint16_t (or whatever) such that it is the
>> value returned by the device read as a native host value". It's
>> not related to the guest order at all.
>> 
>> So for instance, io_mem_read() is called by cpu_physical_memory_rw(),
>> which passes it a local variable "val". So now "val" has the
>> "guest repr" column's bytes in it, and (as a host C variable) the
>> value:
>> 1,2,3 : 0xbbaa
>> 4,5,6 : 0xaabb
>> 7,8,9 : 0xbbaa
>> 10,11,12 : 0xaabb
>> 
>> As you can see, this depends on the "guest endianness" (which is
>> kind of the endianness of the bus): a BE guest 16 bit access to
>> this device would return the 16 bit value 0xaabb, and an LE guest
>> 0xbbaa, and we have exactly those values in our host C variable.
>> If this is TCG, then we'll copy that 16 bit host value into the
>> CPUState struct field corresponding to the destination guest
>> register as-is. (TCG CPUState fields are always in host-C-order.)
>> 
>> However, to pass them up to KVM we have to put them into a
>> buffer in RAM as per the KVM_EXIT_MMIO ABI. So:
>> cpu_physical_memory_rw() calls stw_p(buf, val) [which is "store
>> in target CPU endianness"], so now buf has the bytes:
>> 1,2,3 : [0xaa, 0xbb]
>> 4,5,6 : [0xaa, 0xbb]
>> 7,8,9 : [0xaa, 0xbb]
>> 10,11,12 : [0xaa, 0xbb]
>> 
>> ...which is the same for every case.
>> 
>> This buffer is (for KVM) the run->mmio.data buffer, whose semantics
>> are "the value as it would appear if the VCPU performed a load or store
>> of the appropriate width directly to the byte array". Which is what we
>> want -- your device has two bytes in order 0xaa, 0xbb, and we did
>> a 16 bit load in the guest CPU, so we should get the same answer as if
>> we did a 16 bit load of RAM containing 0xaa, 0xbb. That will be
>> 0xaabb if the VCPU is big-endian, and 0xbbaa if it is not.
>> 
>> I think the fact that all of these things come out to the same
>> set of bytes in the mmio.data buffer is the indication that all
>> QEMU's byte swapping is correct.
>> 
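(The store step just described can be sketched with a made-up helper -- this is not QEMU's actual stw_p, which takes the target byte order from the build configuration, but it performs the same job:

  /* Sketch of the KVM_EXIT_MMIO store step: write a 16-bit value into
   * the run->mmio.data byte array in *target* byte order. */
  static void store_u16_target(uint8_t *buf, uint16_t val, int target_is_be)
  {
      if (target_is_be) {
          buf[0] = (uint8_t)(val >> 8);   /* high byte first */
          buf[1] = (uint8_t)(val & 0xff);
      } else {
          buf[0] = (uint8_t)(val & 0xff); /* low byte first */
          buf[1] = (uint8_t)(val >> 8);
      }
  }

A BE guest's val of 0xaabb and an LE guest's val of 0xbbaa both land as [0xaa, 0xbb] -- exactly the invariance noted above.)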
>>> Looking at the two rightmost columns, we must conclude:
>>> 
>>> (a) The "device_endian" field has absolutely no significance wrt. what
>>>    the guest sees. In each triplet of cases, when we go from
>>>    DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN to DEVICE_LITTLE_ENDIAN,
>>>    the guest sees the exact same value.
>>> 
>>>    I don't understand this result (it makes me doubt device_endian
>>>    makes any sense). What did I do wrong?
>> 
>> I think it's because you defined your device as only supporting
>> byte accesses that you didn't see any difference between the
>> various device_endian settings. If a device presents itself as
>> just an array of bytes then it doesn't have an endianness, really.
>> 
>> As Paolo says, if you make your device support wider accesses
>> directly and build up the value yourself then you'll see that
>> setting the device_endian to LE vs BE vs native does change
>> the values you see in the guest (and that you'll need to set it
>> to LE and interpret the words in the guest as LE to get the
>> right behaviour).
>> 
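(A minimal sketch of that approach for fw_cfg, assuming little-endian composition -- the actual series may choose a different order or shape; fw_cfg_read() is the existing sequential one-byte reader, the rest is illustrative:

  static uint64_t fw_cfg_data_mem_read(void *opaque, hwaddr addr,
                                       unsigned size)
  {
      FWCfgState *s = opaque;
      uint64_t value = 0;
      unsigned i;

      for (i = 0; i < size; i++) {
          /* byte at the lowest address -> lowest-order position:
           * little-endian composition */
          value |= (uint64_t)fw_cfg_read(s) << (i * 8);
      }
      return value;
  }

  static const MemoryRegionOps fw_cfg_data_mem_ops = {
      .read = fw_cfg_data_mem_read,
      /* .write omitted in this sketch */
      .endianness = DEVICE_LITTLE_ENDIAN,
      .valid.max_access_size = 8,
      .impl.max_access_size = 8,
  };

With .impl.max_access_size raised to match .valid, the memory core no longer splits the access, so the device composes the wide value itself and the declared endianness actually matters.)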
>>>    This means that this interface is *value preserving*, not
>>>    representation preserving. In other words: when host and guest
>>>    endiannesses are identical, the *array* is transferred okay (the
>>>    guest array matches the initial host array [0xaa, 0xbb]). When guest
>>>    and host differ in endianness, the guest will see an incorrect
>>>    *array*.
>> 
>> Think of a device which supports only byte accesses as being
>> like a lump of RAM. A big-endian guest CPU which reads 32 bits
>> at a time will get different values in registers to an LE guest
>> which does the same.
>> 
>> *However*, if both CPUs just do "read 32 bits; write 32 bits to
>> RAM" (ie a kind of memcpy but with the source being the mmio
>> register rather than some other bit of RAM) then you should get
>> a bytewise-correct copy of the data in RAM.
>> 
>> So I *think* that would be a viable approach: have your QEMU
>> device code as it is now, and just make sure that if the guest
>> is doing wider-than-byte accesses it does them as
>>  do {
>>     load word from mmio register;
>>     store word to RAM;
>>  } while (count);
>> 
>> and doesn't try to interpret the byte order of the values while
>> they're in the CPU register in the middle of the loop.
>> 
>> Paolo's suggestion would work too, if you prefer that.
>> 
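(In guest firmware that loop might look like the following sketch -- names illustrative; the point is that the word value is never reinterpreted in flight:

  /* Guest-side sketch of the copy loop above: words go straight from
   * the MMIO data register into RAM, byte order untouched. */
  static void fw_cfg_read_words(volatile uint32_t *mmio_data,
                                uint32_t *dst, size_t count)
  {
      while (count--) {
          *dst++ = *mmio_data;  /* load from register, store to RAM */
      }
  }
)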
>>> I apologize for wasting everyone's time, but I think both results are
>>> very non-intuitive.
>> 
>> Everything around endianness handling is non-intuitive --
>> it's the nature of the problem, I'm afraid. (Some of this is
>> also QEMU's fault for not having good documentation of the
>> semantics of each of the layers involved in memory accesses.
>> I have on my todo list the vague idea of trying to write these
>> all up as a blog post.)
> 
> Thanks for taking the time to write this up. My analysis must have
> missed at least two important things then:
> - the device's read/write function needs to consider address & size, and
>  return values that match host byte order. fw_cfg doesn't conform ATM.
> - there's one more layer outside the call tree that I checked that can
>  perform endianness conversion.
> 
> I'll try to implement Paolo's suggestion (ie. support wide accesses in
> fw_cfg internally, not relying on splitting/combining by memory.c).

Awesome :). Please define it as device little endian while you're at it. That should give us fewer headaches if we want to support wide access on ppc.


Alex
Laszlo Ersek Dec. 17, 2014, 8:40 a.m. UTC | #14
On 12/17/14 09:28, Alexander Graf wrote:
> 
> 
> 
>> Am 17.12.2014 um 08:13 schrieb Laszlo Ersek <lersek@redhat.com>:
>>
>>> On 12/16/14 21:41, Peter Maydell wrote:
>>>> On 16 December 2014 at 19:00, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> The root of this question is what each of
>>>>
>>>> enum device_endian {
>>>>    DEVICE_NATIVE_ENDIAN,
>>>>    DEVICE_BIG_ENDIAN,
>>>>    DEVICE_LITTLE_ENDIAN,
>>>> };
>>>>
>>>> means.
>>>
>>> An opening remark: endianness is a horribly confusing topic and
>>> support of more than one endianness is even worse. I may have made
>>> some inadvertent errors in this reply; if you think so please
>>> let me know and I'll have another stab at it.
>>>
>>> That said: the device_endian options indicate what a device's
>>> internal idea of its endianness is. This is most relevant if
>>> a device accepts accesses at wider than byte width
>>> (for instance, if you can read a 32-bit value from
>>> address offset 0 and also an 8-bit value from offset 3 then
>>> how do those values line up? If you read a 32-bit value then
>>> which way round is it compared to what the device's io read/write
>>> function return?).
>>>
>>> NATIVE_ENDIAN means "same order as the CPU's main data bus's
>>> natural representation". (Note that this is not necessarily
>>> the same as "the endianness the CPU currently has"; on ARM
>>> you can flip the CPU between LE and BE at runtime, which
>>> is basically inserting a byte-swizzling step between data
>>> accesses and the CPU's data bus, which is always LE for ARMv7+.)
>>>
>>> Note that RAM accessible to a KVM guest is always NATIVE_ENDIAN
>>> (because the guest vcpu reads it directly with the h/w cpu).
>>>
>>>> Consider the following call tree, which implements the splitting /
>>>> combining of an MMIO read:
>>>>
>>>> memory_region_dispatch_read() [memory.c]
>>>>  memory_region_dispatch_read1()
>>>>    access_with_adjusted_size()
>>>>      memory_region_big_endian()
>>>>      for each byte in the wide access:
>>>>        memory_region_read_accessor()
>>>>          fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
>>>>            fw_cfg_read()
>>>>  adjust_endianness()
>>>>    memory_region_wrong_endianness()
>>>>    bswapXX()
>>>>
>>>> The function access_with_adjusted_size() always iterates over the MMIO
>>>> address range in incrementing address order. However, it can calculate
>>>> the shift count for memory_region_read_accessor() in two ways.
>>>>
>>>> When memory_region_big_endian() returns "true", the shift count
>>>> decreases as the MMIO address increases.
>>>>
>>>> When memory_region_big_endian() returns "false", the shift count
>>>> increases as the MMIO address increases.
>>>
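(The shift-count behaviour just described boils down to this simplified sketch -- not the verbatim memory.c code; byte_read() stands in for the per-byte accessor:

  static uint64_t read_with_split(void *opaque, uint64_t addr,
                                  unsigned size, int big_endian)
  {
      uint64_t val = 0;
      unsigned i;

      for (i = 0; i < size; i++) {
          unsigned shift = big_endian ? (size - 1 - i) * 8 : i * 8;
          val |= (uint64_t)byte_read(opaque, addr + i) << shift;
      }
      return val;
  }
)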
>>> Yep, because this is how the device has told us that it thinks
>>> of accesses as being put together.
>>>
>>> The column in your table "host value" is the 16 bit value read from
>>> the device, ie what we have decided (based on device_endian) that
>>> it would have returned us if it had supported a 16 bit read directly
>>> itself. BE devices compose 16 bit values with the high byte first,
>>> LE devices with the low byte first, and native-endian devices
>>> in the same order as guest-endianness.
>>>
>>>> In memory_region_read_accessor(), the shift count is used for a logical
>>>> (ie. numeric) bitwise left shift (<<). That operator works with numeric
>>>> values and hides (ie. accounts for) host endianness.
>>>>
>>>> Let's consider
>>>> - an array of two bytes, [0xaa, 0xbb],
>>>> - a uint16_t access made from the guest,
>>>> - and all twelve possible cases.
>>>>
>>>> In the table below, the "host", "guest" and "device_endian" columns are
>>>> input variables. The columns to the right are calculated / derived
>>>> values. The arrows above the columns show the data dependencies.
>>>>
>>>> After memory_region_dispatch_read1() constructs the value that is
>>>> visible in the "host value" column, it is passed to adjust_endianness().
>>>> If memory_region_wrong_endianness() returns "true", then the host value
>>>> is byte-swapped. The result is then passed to the guest.
>>>>
>>>> [twelve-case endianness table snipped -- identical to the one quoted earlier in this thread]
>>>
>>> [snip: Peter's full analysis, quoted verbatim earlier in this thread]
>>
>> Thanks for taking the time to write this up. My analysis must have
>> missed at least two important things then:
>> - the device's read/write function needs to consider address & size, and
>>  return values that match host byte order. fw_cfg doesn't conform ATM.
>> - there's one more layer outside the call tree that I checked that can
>>  perform endianness conversion.
>>
>> I'll try to implement Paolo's suggestion (ie. support wide accesses in
>> fw_cfg internally, not relying on splitting/combining by memory.c).
> 
> Awesome :). Please define it as device little endian while you're at it. That should give us fewer headaches if we want to support wide access on ppc.

Definitely; that was actually the first part of Paolo's suggestion. :)

Thanks!
Laszlo
Paolo Bonzini Dec. 17, 2014, 9:23 a.m. UTC | #15
On 17/12/2014 06:06, Laszlo Ersek wrote:
>> > That said, the standalone selector is used by BE platforms only, so we
>> > know that the standalone selector is always DEVICE_BIG_ENDIAN.
> This series exposes the standalone selector (as MMIO) to ARM guests as
> well; and in "Documentation/devicetree/bindings/arm/fw-cfg.txt" for the
> kernel I'm saying that the selector is little endian.
> 
> Therefore I think that the standalone selector is not only (going to be)
> used by BE platforms (or I don't understand your above statement
> correctly).

It's not going to be used only by BE platforms.  But so far it is used
by BE platforms only.  If you want to keep everything consistent, you
can make it (and the wide datum) BE and swizzle the data in the firmware.
Otherwise, NATIVE_ENDIAN is fine.

> In other words, the standalone selector is NATIVE_ENDIAN, but in the
> description of the *ARM* bindings, we can simply say that it's little
> endian.

Yes.

Paolo
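(A sketch of what a little-endian selector register would mean in practice -- illustrative only; fw_cfg_select() is the existing key-selection helper:

  /* With a DEVICE_LITTLE_ENDIAN region, the memory core hands the
   * callback the numeric key regardless of guest CPU byte order, so
   * the write handler stays trivial. */
  static void fw_cfg_select_mem_write(void *opaque, hwaddr addr,
                                      uint64_t value, unsigned size)
  {
      fw_cfg_select(opaque, (uint16_t)value);
  }
)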
Alexander Graf Dec. 17, 2014, 9:31 a.m. UTC | #16
On 17.12.14 10:23, Paolo Bonzini wrote:
> 
> 
> On 17/12/2014 06:06, Laszlo Ersek wrote:
>>>> That said, the standalone selector is used by BE platforms only, so we
>>>> know that the standalone selector is always DEVICE_BIG_ENDIAN.
>> This series exposes the standalone selector (as MMIO) to ARM guests as
>> well; and in "Documentation/devicetree/bindings/arm/fw-cfg.txt" for the
>> kernel I'm saying that the selector is little endian.
>>
>> Therefore I think that the standalone selector is not only (going to be)
>> used by BE platforms (or I don't understand your above statement
>> correctly).
> 
> It's not going to be used only by BE platforms.  But so far it is used
> by BE platforms only.  If you want to keep everything consistent, you
> can make it (and the wide datum) BE; and swizzle data in the firmware.
> Otherwise, NATIVE_ENDIAN is fine.

For the sake of keeping myself sane, I would really prefer not to have
NATIVE_ENDIAN. Once you start thinking about little-endian guests on PPC
sharing code with little-endian guests on x86, your head will start to
explode otherwise.

If code becomes simpler by making the mmio accessors BIG_ENDIAN, that's
fine with me as well.


Alex
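(If the wide data register were instead declared DEVICE_BIG_ENDIAN, the composition simply flips -- a sketch, not necessarily what the series ends up doing; fw_cfg_read() is the existing sequential byte reader:

  static uint64_t fw_cfg_data_mem_read_be(void *opaque, hwaddr addr,
                                          unsigned size)
  {
      FWCfgState *s = opaque;
      uint64_t value = 0;

      /* byte at the lowest address lands in the highest-order
       * position: big-endian composition */
      while (size--) {
          value = (value << 8) | fw_cfg_read(s);
      }
      return value;
  }
)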
diff mbox

Patch

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index c4b78ed..7f6031c 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -30,9 +30,8 @@ 
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 
 #define FW_CFG_SIZE 2
-#define FW_CFG_DATA_SIZE 1
 #define TYPE_FW_CFG "fw_cfg"
 #define FW_CFG_NAME "fw_cfg"
 #define FW_CFG_PATH "/machine/" FW_CFG_NAME
 #define FW_CFG(obj) OBJECT_CHECK(FWCfgState, (obj), TYPE_FW_CFG)
@@ -323,8 +322,9 @@  static const MemoryRegionOps fw_cfg_data_mem_ops = {
     .valid = {
         .min_access_size = 1,
         .max_access_size = 1,
     },
+    .impl.max_access_size = 1,
 };
 
 static const MemoryRegionOps fw_cfg_comb_mem_ops = {
     .read = fw_cfg_comb_read,
@@ -608,9 +608,10 @@  static void fw_cfg_initfn(Object *obj)
     memory_region_init_io(&s->ctl_iomem, OBJECT(s), &fw_cfg_ctl_mem_ops, s,
                           "fwcfg.ctl", FW_CFG_SIZE);
     sysbus_init_mmio(sbd, &s->ctl_iomem);
     memory_region_init_io(&s->data_iomem, OBJECT(s), &fw_cfg_data_mem_ops, s,
-                          "fwcfg.data", FW_CFG_DATA_SIZE);
+                          "fwcfg.data",
+                          fw_cfg_data_mem_ops.valid.max_access_size);
     sysbus_init_mmio(sbd, &s->data_iomem);
     /* In case ctl and data overlap: */
     memory_region_init_io(&s->comb_iomem, OBJECT(s), &fw_cfg_comb_mem_ops, s,
                           "fwcfg", FW_CFG_SIZE);