diff mbox series

[1/1] tci: eliminate UB due to unaligned reads

Message ID 20180127134908.24095-2-anatoly.trosinenko@gmail.com
State New
Headers show
Series Fix unaligned reads in the tcg/tci.c | expand

Commit Message

Anatoly Trosinenko Jan. 27, 2018, 1:49 p.m. UTC
Use ldl_he_p / ldq_he_p functions instead of a plain memory access
through pointer.

Signed-off-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
---
 tcg/tci.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Stefan Weil Jan. 27, 2018, 4:38 p.m. UTC | #1
Am 27.01.2018 um 14:49 schrieb Anatoly Trosinenko:
> Use ldl_he_p / ldq_he_p functions instead of a plain memory access
> through pointer.
> 
> Signed-off-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
> ---
>  tcg/tci.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)

A better alternative might be aligning the relevant data when generating
the bytecode. See also my comment on alignment in tcg/tci/README.

Stefan
Anatoly Trosinenko Jan. 28, 2018, 6:42 a.m. UTC | #2
My patch is kind of trivial quick fix that just eliminates these unaligned
reads and doesn't seem to require complicated testing supposing my code
properly handles integer promotion (and hope it will not slow the
interpreter down).

Aligning everything, on the other hand, can not only remove the UB but also
speed things up, but if I get it right, requires O(opcode count) manual
work and subsequent less trivial testing that every opcode's argument
layout match on generation and interpretation side (errors should be
significantly localized due to present assertion on operation size,
though). So my patch may be considered as temporary solution.

In fact, I had to similarly make these unaligned reads explicit when
porting QEMU to JavaScript because Emscripten hugely relies on absence of
some kinds of UB such as "implicit" unaligned accesses, and such fix seemed
to resolve this issue for me on "host with special alignment requirement".


2018-01-27 19:38 GMT+03:00 Stefan Weil <sw@weilnetz.de>:

> Am 27.01.2018 um 14:49 schrieb Anatoly Trosinenko:
> > Use ldl_he_p / ldq_he_p functions instead of a plain memory access
> > through pointer.
> >
> > Signed-off-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
> > ---
> >  tcg/tci.c | 16 +++++++++++-----
> >  1 file changed, 11 insertions(+), 5 deletions(-)
>
> A better alternative might be aligning the relevant data when generating
> the bytecode. See also my comment on alignment in tcg/tci/README.
>
> Stefan
>
>
Anatoly Trosinenko Feb. 19, 2018, 2:26 p.m. UTC | #3
Ping.
Patchwork link: http://patchwork.ozlabs.org/patch/866732/
Patchew link: http://patchew.org/QEMU/20180127134908.24095-1-
anatoly.trosinenko@gmail.com/

Original cover letter:
The code in tcg/tci.c reads some data from TCI bytecode through
pointer dereferencing. As far as I know unaligned reads in such a way are
undefined behavior and compiling with -fsanitize=undefined enumerated
them as such at run-time.

I have replaced such reads with invocations of ld{l,q}_he_p.
A comment in include/qemu/bswap.h:310 suggests they should be properly
translated by the compiler. I didn't added signed/unsigned casts
since bswap.h does contain separate signed/unsigned versions
for 16-bit integers but does not for 32- and 64-bit ones, so I supposed
the developers of the bswap.h already arranged everything so
integer promotions don't mess things up. I can add casts in case I'm
not right about it.
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06516.html


2018-01-28 9:42 GMT+03:00 Anatoly Trosinenko <anatoly.trosinenko@gmail.com>:

> My patch is kind of trivial quick fix that just eliminates these unaligned
> reads and doesn't seem to require complicated testing supposing my code
> properly handles integer promotion (and hope it will not slow the
> interpreter down).
>
> Aligning everything, on the other hand, can not only remove the UB but
> also speed things up, but if I get it right, requires O(opcode count)
> manual work and subsequent less trivial testing that every opcode's
> argument layout match on generation and interpretation side (errors should
> be significantly localized due to present assertion on operation size,
> though). So my patch may be considered as temporary solution.
>
> In fact, I had to similarly make these unaligned reads explicit when
> porting QEMU to JavaScript because Emscripten hugely relies on absence of
> some kinds of UB such as "implicit" unaligned accesses, and such fix seemed
> to resolve this issue for me on "host with special alignment requirement".
>
>
> 2018-01-27 19:38 GMT+03:00 Stefan Weil <sw@weilnetz.de>:
>
>> Am 27.01.2018 um 14:49 schrieb Anatoly Trosinenko:
>> > Use ldl_he_p / ldq_he_p functions instead of a plain memory access
>> > through pointer.
>> >
>> > Signed-off-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
>> > ---
>> >  tcg/tci.c | 16 +++++++++++-----
>> >  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> A better alternative might be aligning the relevant data when generating
>> the bytecode. See also my comment on alignment in tcg/tci/README.
>>
>> Stefan
>>
>>
Anatoly Trosinenko March 3, 2018, 8:54 a.m. UTC | #4
Ping.
Patchwork link: http://patchwork.ozlabs.org/patch/866732/
Patchew link: http://patchew.org/QEMU/20180127134908.24095-1-anatoly.
trosinenko@gmail.com/

The code in tcg/tci.c reads some data from TCI bytecode through
pointer dereferencing. As far as I know unaligned reads in such a way are
undefined behavior and compiling with -fsanitize=undefined enumerated
them as such at run-time.

I have replaced such reads with invocations of ld{l,q}_he_p.
A comment in include/qemu/bswap.h:310 suggests they should be properly
translated by the compiler. I didn't added signed/unsigned casts
since bswap.h does contain separate signed/unsigned versions
for 16-bit integers but does not for 32- and 64-bit ones, so I supposed
the developers of the bswap.h already arranged everything so
integer promotions don't mess things up. I can add casts in case I'm
not right about it.
http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06516.html

2018-01-28 9:42 GMT+03:00 Anatoly Trosinenko <anatoly.trosinenko@gmail.com>:

> My patch is kind of trivial quick fix that just eliminates these unaligned
> reads and doesn't seem to require complicated testing supposing my code
> properly handles integer promotion (and hope it will not slow the
> interpreter down).
>
> Aligning everything, on the other hand, can not only remove the UB but
> also speed things up, but if I get it right, requires O(opcode count)
> manual work and subsequent less trivial testing that every opcode's
> argument layout match on generation and interpretation side (errors should
> be significantly localized due to present assertion on operation size,
> though). So my patch may be considered as temporary solution.
>
> In fact, I had to similarly make these unaligned reads explicit when
> porting QEMU to JavaScript because Emscripten hugely relies on absence of
> some kinds of UB such as "implicit" unaligned accesses, and such fix seemed
> to resolve this issue for me on "host with special alignment requirement".
>
>
> 2018-01-27 19:38 GMT+03:00 Stefan Weil <sw@weilnetz.de>:
>
>> Am 27.01.2018 um 14:49 schrieb Anatoly Trosinenko:
>> > Use ldl_he_p / ldq_he_p functions instead of a plain memory access
>> > through pointer.
>> >
>> > Signed-off-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
>> > ---
>> >  tcg/tci.c | 16 +++++++++++-----
>> >  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> A better alternative might be aligning the relevant data when generating
>> the bytecode. See also my comment on alignment in tcg/tci/README.
>>
>> Stefan
>>
>>
>
>
> --
> С уважением,
> Анатолий Тросиненко
> e-mail: anatoly.trosinenko@gmail.com
>
Richard Henderson March 3, 2018, 1:57 p.m. UTC | #5
On 03/03/2018 12:54 AM, Anatoly Trosinenko wrote:
> Ping.
> Patchwork link: http://patchwork.ozlabs.org/patch/866732/
> <http://patchwork.ozlabs.org/patch/866732/>
> Patchew link:
> http://patchew.org/QEMU/20180127134908.24095-1-anatoly.trosinenko@gmail.com/
> <http://patchew.org/QEMU/20180127134908.24095-1-anatoly.trosinenko@gmail.com/>
> 
> The code in tcg/tci.c reads some data from TCI bytecode through
> pointer dereferencing. As far as I know unaligned reads in such a way are
> undefined behavior and compiling with -fsanitize=undefined enumerated
> them as such at run-time.

This is exactly one of the reasons why I have urged for TCI to be abandoned.

While your patch works, it is *enormously* inefficient for hosts that require it.


r~
Anatoly Trosinenko March 3, 2018, 2:07 p.m. UTC | #6
Can rewriting TCI in such a way that every operation is aligned at 4- or
even 8-byte boundary fix the situation or are there some more serious
problems?

2018-03-03 16:57 GMT+03:00 Richard Henderson <rth@twiddle.net>:

> On 03/03/2018 12:54 AM, Anatoly Trosinenko wrote:
> > Ping.
> > Patchwork link: http://patchwork.ozlabs.org/patch/866732/
> > <http://patchwork.ozlabs.org/patch/866732/>
> > Patchew link:
> > http://patchew.org/QEMU/20180127134908.24095-1-
> anatoly.trosinenko@gmail.com/
> > <http://patchew.org/QEMU/20180127134908.24095-1-
> anatoly.trosinenko@gmail.com/>
> >
> > The code in tcg/tci.c reads some data from TCI bytecode through
> > pointer dereferencing. As far as I know unaligned reads in such a way are
> > undefined behavior and compiling with -fsanitize=undefined enumerated
> > them as such at run-time.
>
> This is exactly one of the reasons why I have urged for TCI to be
> abandoned.
>
> While your patch works, it is *enormously* inefficient for hosts that
> require it.
>
>
> r~
>
Richard Henderson March 3, 2018, 2:13 p.m. UTC | #7
On 03/03/2018 06:07 AM, Anatoly Trosinenko wrote:
> Can rewriting TCI in such a way that every operation is aligned at 4- or even
> 8-byte boundary fix the situation or are there some more serious problems?

With the current TCI, there are also problems with calls to helper functions.
The only portable way to do this is to use a library such as libffi.

I once rewrote TCI completely in order to address both problems, but that only
brought questions as to why TCI is useful at all.

So.  Why do you want to use TCI instead of a native TCG backend?


r~
Anatoly Trosinenko March 3, 2018, 2:59 p.m. UTC | #8
> So.  Why do you want to use TCI instead of a native TCG backend?

Frankly speaking, personally I just have a strange experiment on porting
QEMU to JavaScript. :) I used the TCI bytecode as some intermediate
patchable form for rarely executing BBs and for (re)generating asm.js from
it when required. I used a Python script to generate wrappers with exactly
10 arguments around helper functions. In fact, it may be worth for me to
create WebAssembly TCG backend and interpret **that** bytecode if required.

TCI may still be useful for someone else running QEMU on unsupported host,
though.

2018-03-03 17:13 GMT+03:00 Richard Henderson <rth@twiddle.net>:

> On 03/03/2018 06:07 AM, Anatoly Trosinenko wrote:
> > Can rewriting TCI in such a way that every operation is aligned at 4- or
> even
> > 8-byte boundary fix the situation or are there some more serious
> problems?
>
> With the current TCI, there are also problems with calls to helper
> functions.
> The only portable way to do this is to use a library such as libffi.
>
> I once rewrote TCI completely in order to address both problems, but that
> only
> brought questions as to why TCI is useful at all.
>
> So.  Why do you want to use TCI instead of a native TCG backend?
>
>
> r~
>

--
Best regards,
Anatoly
Stefan Weil March 3, 2018, 3:41 p.m. UTC | #9
Am 03.03.2018 um 15:07 schrieb Anatoly Trosinenko:
> Can rewriting TCI in such a way that every operation is aligned at 4- or
> even 8-byte boundary fix the situation or are there some more serious
> problems?

That's my preferred solution. Are there cases which would require 8-byte
alignment?

Richard, the discussion about non-portable calls to helper functions is
not new, but I still have no test case which fails. Do you think of a
special case (architecture, helper function)?

Stefan
Anatoly Trosinenko March 3, 2018, 4:01 p.m. UTC | #10
2018-03-03 18:41 GMT+03:00 Stefan Weil <sw@weilnetz.de>:

> Am 03.03.2018 um 15:07 schrieb Anatoly Trosinenko:
> > Can rewriting TCI in such a way that every operation is aligned at 4- or
> > even 8-byte boundary fix the situation or are there some more serious
> > problems?
>
> That's my preferred solution. Are there cases which would require 8-byte
> alignment?
>

And what if create some function like

uint8_t *align_and_increment(uint8_t **ptr, int pow2) {
  size_t size = 1 << pow2;
  uint8_t *result = (uint8_t*)((((uintptr)*ptr) + size - 1) & ~(size - 1));
  *ptr = result + size;
  return result;
}

and rewrite get / put functions like this:

static uint32_t tci_read_i32(uint8_t **tb_ptr)
{
    uint32_t value = *(uint32_t *)align_and_increment(tb_ptr, 2);
    return value;
}

On one hand, it involves some slightly obscure pointer calculations
(just in one place), on the other hand, no modifications will probably
be required for TCI TCG backend or interpreter loop code (they can
still be useful for **optimizations** of bytecode size, but it should
just work as is).

--
Best regards,
Anatoly
Richard Henderson March 12, 2018, 1:45 p.m. UTC | #11
On 03/03/2018 05:41 PM, Stefan Weil wrote:
> Richard, the discussion about non-portable calls to helper functions is
> not new, but I still have no test case which fails. Do you think of a
> special case (architecture, helper function)?

The two that come to mind immediately are:

i386 with a compiler defaulting to stdcall (which I admit is unlikely; it also
would not work for TCG without modification).

m68k (pointers passed in address registers; integrals passed in data registers).


r~
diff mbox series

Patch

diff --git a/tcg/tci.c b/tcg/tci.c
index 33edca1903..410817cf6a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -159,7 +159,13 @@  static uint64_t tci_uint64(uint32_t high, uint32_t low)
 /* Read constant (native size) from bytecode. */
 static tcg_target_ulong tci_read_i(uint8_t **tb_ptr)
 {
-    tcg_target_ulong value = *(tcg_target_ulong *)(*tb_ptr);
+#if TCG_TARGET_REG_BITS == 32
+    tcg_target_ulong value = ldl_he_p(*tb_ptr);
+#elif TCG_TARGET_REG_BITS == 64
+    tcg_target_ulong value = ldq_he_p(*tb_ptr);
+#else
+#error unsupported
+#endif
     *tb_ptr += sizeof(value);
     return value;
 }
@@ -167,7 +173,7 @@  static tcg_target_ulong tci_read_i(uint8_t **tb_ptr)
 /* Read unsigned constant (32 bit) from bytecode. */
 static uint32_t tci_read_i32(uint8_t **tb_ptr)
 {
-    uint32_t value = *(uint32_t *)(*tb_ptr);
+    uint32_t value = ldl_he_p(*tb_ptr);
     *tb_ptr += sizeof(value);
     return value;
 }
@@ -175,7 +181,7 @@  static uint32_t tci_read_i32(uint8_t **tb_ptr)
 /* Read signed constant (32 bit) from bytecode. */
 static int32_t tci_read_s32(uint8_t **tb_ptr)
 {
-    int32_t value = *(int32_t *)(*tb_ptr);
+    int32_t value = ldl_he_p(*tb_ptr);
     *tb_ptr += sizeof(value);
     return value;
 }
@@ -184,7 +190,7 @@  static int32_t tci_read_s32(uint8_t **tb_ptr)
 /* Read constant (64 bit) from bytecode. */
 static uint64_t tci_read_i64(uint8_t **tb_ptr)
 {
-    uint64_t value = *(uint64_t *)(*tb_ptr);
+    uint64_t value = ldq_he_p(*tb_ptr);
     *tb_ptr += sizeof(value);
     return value;
 }
@@ -1094,7 +1100,7 @@  uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             /* QEMU specific operations. */
 
         case INDEX_op_exit_tb:
-            ret = *(uint64_t *)tb_ptr;
+            ret = ldq_he_p(tb_ptr);
             goto exit;
             break;
         case INDEX_op_goto_tb: