Patchwork [01/15] exec: introduce endianness swapped mmio

login
register
mail settings
Submitter Alexander Graf
Date Nov. 25, 2010, 7:35 a.m.
Message ID <1290670555-12575-2-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/73033/
State New
Headers show

Comments

Alexander Graf - Nov. 25, 2010, 7:35 a.m.
The way we're currently modeling mmio is too simplified. We assume that
every device has the same endianness as the target CPU. In reality,
most devices are little endian (all PCI and ISA ones I'm aware of). Some
are big endian (special system devices) and a very little fraction is
target native endian (fw_cfg).

So instead of assuming every device to be native endianness, let's move
to a model where the device tells us which endianness it's in.

That way we can compile the devices only once and get rid of all the ugly
swap will be done by the underlying layer.

For the same of readability, this patch only introduces the helper framework
but doesn't allow the registering code to set its endianness yet.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 cpu-common.h |    4 ++
 exec.c       |  122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 0 deletions(-)
Andreas Färber - Nov. 26, 2010, 11 p.m.
Am 25.11.2010 um 08:35 schrieb Alexander Graf:

> The way we're currently modeling mmio is too simplified. We assume  
> that
> every device has the same endianness as the target CPU. In reality,
> most devices are little endian (all PCI and ISA ones I'm aware of).  
> Some
> are big endian (special system devices) and a very little fraction is
> target native endian (fw_cfg).
>
> So instead of assuming every device to be native endianness, let's  
> move
> to a model where the device tells us which endianness it's in.
>
> That way we can compile the devices only once and get rid of all the  
> ugly
> swap will be done by the underlying layer.
>
> For the same of readability, this patch only introduces the helper  
> framework
> but doesn't allow the registering code to set its endianness yet.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>

> ---
> cpu-common.h |    4 ++
> exec.c       |  122 +++++++++++++++++++++++++++++++++++++++++++++++++ 
> +++++++++
> 2 files changed, 126 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-common.h b/cpu-common.h
> index a543b5d..839b236 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -20,6 +20,10 @@
>
> #if !defined(CONFIG_USER_ONLY)
>
> +#define DEVICE_NATIVE_ENDIAN  0
> +#define DEVICE_BIG_ENDIAN     1
> +#define DEVICE_LITTLE_ENDIAN  2

I believe some people around here voiced a preference for enums, to  
aid with gdb debugging?

> diff --git a/exec.c b/exec.c
> index db9ff55..f54a360 100644
> --- a/exec.c
> +++ b/exec.c

> @@ -3370,6 +3474,22 @@ static int cpu_register_io_memory_fixed(int  
> io_index,
>     }
>     io_mem_opaque[io_index] = opaque;
>
> +    switch (endian) {
> +    case DEVICE_BIG_ENDIAN:
> +#ifndef TARGET_WORDS_BIGENDIAN
> +        swapendian_init(io_index);
> +#endif
> +        break;

So basically, you just moved the #ifdefs to another place. :)  
Shouldn't this be dependent on the CPU state and determined at  
runtime? Thinking of MSR LE bit on ppc. I guess QEMU doesn't support  
bi-endian ppc today, as does the 970, but it would be nice to keep it  
in mind.

Andreas
Alexander Graf - Nov. 26, 2010, 11:06 p.m.
On 27.11.2010, at 00:00, Andreas Färber wrote:

> Am 25.11.2010 um 08:35 schrieb Alexander Graf:
> 
>> The way we're currently modeling mmio is too simplified. We assume that
>> every device has the same endianness as the target CPU. In reality,
>> most devices are little endian (all PCI and ISA ones I'm aware of). Some
>> are big endian (special system devices) and a very little fraction is
>> target native endian (fw_cfg).
>> 
>> So instead of assuming every device to be native endianness, let's move
>> to a model where the device tells us which endianness it's in.
>> 
>> That way we can compile the devices only once and get rid of all the ugly
>> swap will be done by the underlying layer.
>> 
>> For the same of readability, this patch only introduces the helper framework
>> but doesn't allow the registering code to set its endianness yet.
>> 
>> Signed-off-by: Alexander Graf <agraf@suse.de>
> 
>> ---
>> cpu-common.h |    4 ++
>> exec.c       |  122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 126 insertions(+), 0 deletions(-)
>> 
>> diff --git a/cpu-common.h b/cpu-common.h
>> index a543b5d..839b236 100644
>> --- a/cpu-common.h
>> +++ b/cpu-common.h
>> @@ -20,6 +20,10 @@
>> 
>> #if !defined(CONFIG_USER_ONLY)
>> 
>> +#define DEVICE_NATIVE_ENDIAN  0
>> +#define DEVICE_BIG_ENDIAN     1
>> +#define DEVICE_LITTLE_ENDIAN  2
> 
> I believe some people around here voiced a preference for enums, to aid with gdb debugging?

Yeah, I actually like enums better too :). Good point!

> 
>> diff --git a/exec.c b/exec.c
>> index db9ff55..f54a360 100644
>> --- a/exec.c
>> +++ b/exec.c
> 
>> @@ -3370,6 +3474,22 @@ static int cpu_register_io_memory_fixed(int io_index,
>>    }
>>    io_mem_opaque[io_index] = opaque;
>> 
>> +    switch (endian) {
>> +    case DEVICE_BIG_ENDIAN:
>> +#ifndef TARGET_WORDS_BIGENDIAN
>> +        swapendian_init(io_index);
>> +#endif
>> +        break;
> 
> So basically, you just moved the #ifdefs to another place. :) Shouldn't this be dependent on the CPU state and determined at runtime? Thinking of MSR LE bit on ppc. I guess QEMU doesn't support bi-endian ppc today, as does the 970, but it would be nice to keep it in mind.

MSR_LE is the only case where runtime determination really makes sense. I'm not sure how that should be handled in the end. MSR_LE swizzles every memory access. Maybe it makes sense to just swap things inside of TCG code there.

Either way, I don't want to add any performance penalty to LE-on-LE (the default use case) and runtime checks on every MMIO are just too heavy :(.


Alex
Peter Maydell - Nov. 26, 2010, 11:18 p.m.
On 26 November 2010 23:00, Andreas Färber <andreas.faerber@web.de> wrote:
> Am 25.11.2010 um 08:35 schrieb Alexander Graf:
>> +    switch (endian) {
>> +    case DEVICE_BIG_ENDIAN:
>> +#ifndef TARGET_WORDS_BIGENDIAN
>> +        swapendian_init(io_index);
>> +#endif
>> +        break;
>
> So basically, you just moved the #ifdefs to another place. :) Shouldn't this
> be dependent on the CPU state and determined at runtime? Thinking of MSR LE
> bit on ppc. I guess QEMU doesn't support bi-endian ppc today, as does the
> 970, but it would be nice to keep it in mind.

I was wondering about that (ARM actually has two different 'big endian'
modes, one of which is byte-invariant and one of which is word-invariant,
and which are runtime selectable on at least some cores...) But
I think I agree with Paul Brook that endianness as we're considering it
here is really kind of a bus level property, and TARGET_WORDS_BIGENDIAN
is defining a sort of "system bus endianness". Runtime flippable cores
do the swapping in generated code before they get out to this bit, which
corresponds to how the hardware does it (things get swapped before they
get out to the signals external to the core).

That does mean that in some cases you end up doing a double byteswap,
but my guess would be that the slowdown on doing that for device access
is completely buried by the slowdown on having to do that for RAM access,
so it's not worth worrying about.

-- PMM
Paul Brook - Nov. 26, 2010, 11:47 p.m.
> > diff --git a/exec.c b/exec.c
> > index db9ff55..f54a360 100644
> > --- a/exec.c
> > +++ b/exec.c
> > 
> > @@ -3370,6 +3474,22 @@ static int cpu_register_io_memory_fixed(int
> > io_index,
> > 
> >     }
> >     io_mem_opaque[io_index] = opaque;
> > 
> > +    switch (endian) {
> > +    case DEVICE_BIG_ENDIAN:
> > +#ifndef TARGET_WORDS_BIGENDIAN
> > +        swapendian_init(io_index);
> > +#endif
> > +        break;
> 
> So basically, you just moved the #ifdefs to another place. :)

Many #ifdefs inconsistently scattered through all the device code have been 
replaced by a single #ifdef.

> Shouldn't this be dependent on the CPU state and determined at
> runtime? Thinking of MSR LE bit on ppc. I guess QEMU doesn't support
> bi-endian ppc today, as does the 970, but it would be nice to keep it
> in mind.

Switching endianness of a CPU generally does not effect the endianness of the 
CPU/peripheral busses.  It makes the CPU byteswap accesses before they are 
seen by either memory or devices.

In theory it might be possible to avoid redundant byteswaps if you're really 
clever.  In practice you still have to handle the fact that your devices are a 
different endianness to RAM, so it probably doesn't gain you a whole lot.

Paul
Blue Swirl - Nov. 27, 2010, 10:05 a.m.
On Fri, Nov 26, 2010 at 11:47 PM, Paul Brook <paul@codesourcery.com> wrote:
>> > diff --git a/exec.c b/exec.c
>> > index db9ff55..f54a360 100644
>> > --- a/exec.c
>> > +++ b/exec.c
>> >
>> > @@ -3370,6 +3474,22 @@ static int cpu_register_io_memory_fixed(int
>> > io_index,
>> >
>> >     }
>> >     io_mem_opaque[io_index] = opaque;
>> >
>> > +    switch (endian) {
>> > +    case DEVICE_BIG_ENDIAN:
>> > +#ifndef TARGET_WORDS_BIGENDIAN
>> > +        swapendian_init(io_index);
>> > +#endif
>> > +        break;
>>
>> So basically, you just moved the #ifdefs to another place. :)
>
> Many #ifdefs inconsistently scattered through all the device code have been
> replaced by a single #ifdef.
>
>> Shouldn't this be dependent on the CPU state and determined at
>> runtime? Thinking of MSR LE bit on ppc. I guess QEMU doesn't support
>> bi-endian ppc today, as does the 970, but it would be nice to keep it
>> in mind.
>
> Switching endianness of a CPU generally does not effect the endianness of the
> CPU/peripheral busses.  It makes the CPU byteswap accesses before they are
> seen by either memory or devices.
>
> In theory it might be possible to avoid redundant byteswaps if you're really
> clever.  In practice you still have to handle the fact that your devices are a
> different endianness to RAM, so it probably doesn't gain you a whole lot.

Sparc64 MMU can also perform byte swapping, there is also a byte
swapping CPU mode and byte swapping access instructions. I think only
the instructions are used (for PCI).
Paul Brook - Nov. 27, 2010, 10:24 a.m.
> > Switching endianness of a CPU generally does not effect the endianness of
> > the CPU/peripheral busses.  It makes the CPU byteswap accesses before
> > they are seen by either memory or devices.
> > 
> > In theory it might be possible to avoid redundant byteswaps if you're
> > really clever.  In practice you still have to handle the fact that your
> > devices are a different endianness to RAM, so it probably doesn't gain
> > you a whole lot.
> 
> Sparc64 MMU can also perform byte swapping, there is also a byte
> swapping CPU mode and byte swapping access instructions. I think only
> the instructions are used (for PCI).

Right, but that is (to a large extent) a separate problem from memory mapped 
peripherals.  You still have to handle the case where a single TLB entry 
covers both ram and a cross-endian device.

Paul
Gleb Natapov - Nov. 28, 2010, 8:12 a.m.
On Thu, Nov 25, 2010 at 08:35:41AM +0100, Alexander Graf wrote:
> The way we're currently modeling mmio is too simplified. We assume that
> every device has the same endianness as the target CPU. In reality,
> most devices are little endian (all PCI and ISA ones I'm aware of). Some
> are big endian (special system devices) and a very little fraction is
> target native endian (fw_cfg).
> 
As far as I can see fw_cfg calls cpu_to_le() on some of entries and
doesn't on others. Doesn't this makes it "mixed endian" (aka broken)?

--
			Gleb.
Alexander Graf - Nov. 28, 2010, 11:05 a.m.
On 28.11.2010, at 09:12, Gleb Natapov wrote:

> On Thu, Nov 25, 2010 at 08:35:41AM +0100, Alexander Graf wrote:
>> The way we're currently modeling mmio is too simplified. We assume that
>> every device has the same endianness as the target CPU. In reality,
>> most devices are little endian (all PCI and ISA ones I'm aware of). Some
>> are big endian (special system devices) and a very little fraction is
>> target native endian (fw_cfg).
>> 
> As far as I can see fw_cfg calls cpu_to_le() on some of entries and
> doesn't on others. Doesn't this makes it "mixed endian" (aka broken)?

There are two layers of brokenness:

1) MMIO
2) Internal structures

A device that assumes native target endianness for its own MMIO is usually broken. Devices don't know which CPU they're plugged to (unless built inside of course) :).

That specific issue is that fw_cfg_mem_writew directly takes the selector as is and both openbios and seabios expect it to accept values in native endianness. The way forward to fix this is probably to declare the selector as little endian and just commit a new version of openbios.


Its internal structures are a different story. If the interface spec of fw_cfg (basically the .c file atm) defines some values to be a different endianness from the other, that might be acceptable. It's still unpretty though. I'd way prefer to see everything aligned to a single endianness.


Alex

Patch

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..839b236 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -20,6 +20,10 @@ 
 
 #if !defined(CONFIG_USER_ONLY)
 
+#define DEVICE_NATIVE_ENDIAN  0
+#define DEVICE_BIG_ENDIAN     1
+#define DEVICE_LITTLE_ENDIAN  2
+
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;
 
diff --git a/exec.c b/exec.c
index db9ff55..f54a360 100644
--- a/exec.c
+++ b/exec.c
@@ -3336,6 +3336,109 @@  static int get_free_io_mem_idx(void)
     return -1;
 }
 
+/*
+ * Usually, devices operate in little endian mode. There are devices out
+ * there that operate in big endian too. Each device gets byte swapped
+ * mmio if plugged onto a CPU that does the other endianness.
+ *
+ * CPU          Device           swap?
+ *
+ * little       little           no
+ * little       big              yes
+ * big          little           yes
+ * big          big              no
+ */
+#ifdef TARGET_WORDS_BIGENDIAN
+
+typedef struct SwapEndianContainer {
+    CPUReadMemoryFunc *read[3];
+    CPUWriteMemoryFunc *write[3];
+    void *opaque;
+} SwapEndianContainer;
+
+static uint32_t swapendian_mem_readb (void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val;
+    SwapEndianContainer *c = opaque;
+    val = c->read[0](c->opaque, addr);
+    return val;
+}
+
+static uint32_t swapendian_mem_readw(void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val;
+    SwapEndianContainer *c = opaque;
+    val = bswap16(c->read[1](c->opaque, addr));
+    return val;
+}
+
+static uint32_t swapendian_mem_readl(void *opaque, target_phys_addr_t addr)
+{
+    uint32_t val;
+    SwapEndianContainer *c = opaque;
+    val = bswap32(c->read[2](c->opaque, addr));
+    return val;
+}
+
+static CPUReadMemoryFunc * const swapendian_readfn[3]={
+    swapendian_mem_readb,
+    swapendian_mem_readw,
+    swapendian_mem_readl
+};
+
+static void swapendian_mem_writeb(void *opaque, target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    SwapEndianContainer *c = opaque;
+    c->write[0](c->opaque, addr, val);
+}
+
+static void swapendian_mem_writew(void *opaque, target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    SwapEndianContainer *c = opaque;
+    c->write[1](c->opaque, addr, bswap16(val));
+}
+
+static void swapendian_mem_writel(void *opaque, target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    SwapEndianContainer *c = opaque;
+    c->write[2](c->opaque, addr, bswap32(val));
+}
+
+static CPUWriteMemoryFunc * const swapendian_writefn[3]={
+    swapendian_mem_writeb,
+    swapendian_mem_writew,
+    swapendian_mem_writel
+};
+
+static void swapendian_init(int io_index)
+{
+    SwapEndianContainer *c = qemu_malloc(sizeof(SwapEndianContainer));
+    int i;
+
+    /* Swap mmio for big endian targets */
+    c->opaque = io_mem_opaque[io_index];
+    for (i = 0; i < 3; i++) {
+        c->read[i] = io_mem_read[io_index][i];
+        c->write[i] = io_mem_write[io_index][i];
+
+        io_mem_read[io_index][i] = swapendian_readfn[i];
+        io_mem_write[io_index][i] = swapendian_writefn[i];
+    }
+    io_mem_opaque[io_index] = c;
+}
+
+static void swapendian_del(int io_index)
+{
+    if (io_mem_read[io_index][0] == swapendian_readfn[0]) {
+        qemu_free(io_mem_opaque[io_index]);
+    }
+}
+
+#endif
+
 /* mem_read and mem_write are arrays of functions containing the
    function to access byte (index 0), word (index 1) and dword (index
    2). Functions can be omitted with a NULL function pointer.
@@ -3349,6 +3452,7 @@  static int cpu_register_io_memory_fixed(int io_index,
                                         void *opaque)
 {
     int i;
+    int endian = DEVICE_NATIVE_ENDIAN;
 
     if (io_index <= 0) {
         io_index = get_free_io_mem_idx();
@@ -3370,6 +3474,22 @@  static int cpu_register_io_memory_fixed(int io_index,
     }
     io_mem_opaque[io_index] = opaque;
 
+    switch (endian) {
+    case DEVICE_BIG_ENDIAN:
+#ifndef TARGET_WORDS_BIGENDIAN
+        swapendian_init(io_index);
+#endif
+        break;
+    case DEVICE_LITTLE_ENDIAN:
+#ifdef TARGET_WORDS_BIGENDIAN
+        swapendian_init(io_index);
+#endif
+        break;
+    case DEVICE_NATIVE_ENDIAN:
+    default:
+        break;
+    }
+
     return (io_index << IO_MEM_SHIFT);
 }
 
@@ -3385,6 +3505,8 @@  void cpu_unregister_io_memory(int io_table_address)
     int i;
     int io_index = io_table_address >> IO_MEM_SHIFT;
 
+    swapendian_del(io_index);
+
     for (i=0;i < 3; i++) {
         io_mem_read[io_index][i] = unassigned_mem_read[i];
         io_mem_write[io_index][i] = unassigned_mem_write[i];