Patchwork [2/9] tcg/optimizer: check types in copy propagation

login
register
mail settings
Submitter Aurelien Jarno
Date Sept. 19, 2012, 8 p.m.
Message ID <1348084823-18277-3-git-send-email-aurelien@aurel32.net>
Download mbox | patch
Permalink /patch/185210/
State New
Headers show

Comments

Aurelien Jarno - Sept. 19, 2012, 8 p.m.
The copy propagation doesn't check the types of the temps during copy
propagation. However TCG is using the mov_i32 for the i64 to i32
conversion and thus the two are not equivalent.

With this patch tcg_opt_gen_mov() doesn't consider two temps with
different types as copies anymore.

So far it seems the optimization was not aggressive enough to trigger
this bug, but it will be triggered later in this series once the copy
propagation is improved.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)
Richard Henderson - Sept. 19, 2012, 9:33 p.m.
On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> The copy propagation doesn't check the types of the temps during copy
> propagation. However TCG is using the mov_i32 for the i64 to i32
> conversion and thus the two are not equivalent.
> 
> With this patch tcg_opt_gen_mov() doesn't consider two temps with
> different types as copies anymore.
> 
> So far it seems the optimization was not aggressive enough to trigger
> this bug, but it will be triggered later in this series once the copy
> propagation is improved.

Exactly where/how does this manifest as problematic?

We do this mov_i32 trick from i64->i32 when we're truncating.
Given that we're not (yet) targeting mips64 and having to
maintain 32-bit sign-extended quantities, I can't see how
that would matter.

We do the i32->i64 trick immediately before a proper extension.

In either case I can't see how plain copy propagation should
matter until some other operation is involved.  So, do we have
some other data propagation bug that is being masked here?


r~
Aurelien Jarno - Sept. 20, 2012, 5:54 a.m.
On Wed, Sep 19, 2012 at 02:33:46PM -0700, Richard Henderson wrote:
> On 09/19/2012 01:00 PM, Aurelien Jarno wrote:
> > The copy propagation doesn't check the types of the temps during copy
> > propagation. However TCG is using the mov_i32 for the i64 to i32
> > conversion and thus the two are not equivalent.
> > 
> > With this patch tcg_opt_gen_mov() doesn't consider two temps with
> > different types as copies anymore.
> > 
> > So far it seems the optimization was not aggressive enough to trigger
> > this bug, but it will be triggered later in this series once the copy
> > propagation is improved.
> 
> Exactly where/how does this manifest as problematic?

The problem arise when a 64-bit value is moved to a 32-bit value, and
later this 64-bit value is reused. With the copy propagation if you
consider them as identical, the 32-bit value might be used instead of
the 64-bit value. This happens for example for the umull instruction on
arm:

| OP:
|  ---- 0xf67e5ea0
|  mov_i32 tmp5,r3
|  mov_i32 tmp6,r8
|  ext32u_i64 tmp7,tmp5
|  ext32u_i64 tmp8,tmp6
|  mul_i64 tmp7,tmp7,tmp8

tmp7 is a 64-bit value

|  mov_i32 tmp6,tmp7

Now transfered to a 32-bit tmp

|  mov_i32 r1,tmp6

and a 32-bit register.

|  movi_i64 tmp8,$0x20
|  shr_i64 tmp7,tmp7,tmp8
|  mov_i32 tmp6,tmp7
|  mov_i32 r3,tmp6
|  goto_tb $0x1
|  movi_i32 pc,$0xf67e5ea4
|  exit_tb $0x7f16948b0299
| 
| OP after optimization and liveness analysis:
|  ---- 0xf67e5ea0
|  nopn $0x2,$0x2
|  nopn $0x2,$0x2
|  ext32u_i64 tmp7,r3
|  ext32u_i64 tmp8,r8
|  mul_i64 tmp7,tmp7,tmp8
|  mov_i32 tmp6,tmp7
|  mov_i32 r1,tmp6
|  movi_i64 tmp8,$0x20
|  shr_i64 tmp7,r1,tmp8

Here, tmp7 is replaced by r1. However r1 only contains the 32-bit low
part of tmp7, thus returning 0.

|  mov_i32 tmp6,tmp7
|  mov_i32 r3,tmp6
|  goto_tb $0x1
|  movi_i32 pc,$0xf67e5ea4
|  exit_tb $0x7f16948b0299
|  end


> We do this mov_i32 trick from i64->i32 when we're truncating.
> Given that we're not (yet) targeting mips64 and having to
> maintain 32-bit sign-extended quantities, I can't see how
> that would matter.

It does matter on architectures using different instructions to work on
the 32 part of a register the registers. Actually in the case a above
with a register shift right, shr_i32 and shr_i64 are always implemented
differently to not shift bits from the 32bit high part to the low part.

> We do the i32->i64 trick immediately before a proper extension.
> 
> In either case I can't see how plain copy propagation should
> matter until some other operation is involved.  So, do we have
> some other data propagation bug that is being masked here?

I think it's a bug of the copy propagation considering these registers
are equivalent. If on x86-64 you replace all accesses to rax by eax,
your code will be smaller, but I am not sure it is going to work 
correctly.

The only latent bug I can see there, is having mov_i32 both being used
to move data between 32-bit temps and between a 64-bit temp and a 32-bit
temp.
Richard Henderson - Sept. 20, 2012, 2 p.m.
On 09/19/2012 10:54 PM, Aurelien Jarno wrote:
> |  mov_i32 r1,tmp6
> |  movi_i64 tmp8,$0x20
> |  shr_i64 tmp7,r1,tmp8
> 
> Here, tmp7 is replaced by r1. However r1 only contains the 32-bit low
> part of tmp7, thus returning 0.

Ok.  Thanks for getting that on the record.


r~

Patch

diff --git a/tcg/optimize.c b/tcg/optimize.c
index ca81198..13b5b15 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -106,12 +106,13 @@  static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
-                            int nb_temps, int nb_globals)
+static void tcg_opt_gen_mov(TCGContext *s, TCGArg *gen_args,
+                            TCGArg dst, TCGArg src)
 {
-        reset_temp(dst, nb_temps, nb_globals);
+        reset_temp(dst, s->nb_temps, s->nb_globals);
         assert(temps[src].state != TCG_TEMP_COPY);
-        if (src >= nb_globals) {
+        /* Only consider temps with the same type (width) as copies. */
+        if (src >= s->nb_globals && s->temps[dst].type == s->temps[src].type) {
             assert(temps[src].state != TCG_TEMP_CONST);
             if (temps[src].state != TCG_TEMP_HAS_COPY) {
                 temps[src].state = TCG_TEMP_HAS_COPY;
@@ -440,8 +441,7 @@  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
-                    tcg_opt_gen_mov(gen_args, args[0], args[1],
-                                    nb_temps, nb_globals);
+                    tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
                     gen_args += 2;
                 }
                 args += 3;
@@ -478,8 +478,7 @@  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 } else {
                     gen_opc_buf[op_index] = op_to_mov(op);
-                    tcg_opt_gen_mov(gen_args, args[0], args[1], nb_temps,
-                                    nb_globals);
+                    tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
                     gen_args += 2;
                 }
                 args += 3;
@@ -503,8 +502,7 @@  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 break;
             }
             if (temps[args[1]].state != TCG_TEMP_CONST) {
-                tcg_opt_gen_mov(gen_args, args[0], args[1],
-                                nb_temps, nb_globals);
+                tcg_opt_gen_mov(s, gen_args, args[0], args[1]);
                 gen_args += 2;
                 args += 2;
                 break;