Patchwork [v3,2/6] Add copy and constant propagation.

Submitter Kirill Batuzov
Date July 7, 2011, 12:37 p.m.
Message ID <80797ddb7efb09eef63b444485bd3f5c9fd328b9.1309865252.git.batuzovk@ispras.ru>
Permalink /patch/103648/
State New

Comments

Kirill Batuzov - July 7, 2011, 12:37 p.m.
Make tcg_constant_folding do copy and constant propagation. This is
preparatory work for the actual constant folding.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 180 insertions(+), 2 deletions(-)
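
For readers unfamiliar with the two terms in the commit message, here is a minimal sketch of copy and constant propagation over a toy straight-line op list. It is not TCG code: the `toy_*` names, the op layout and the fixed-size table are assumptions made only for illustration, and the sketch ignores branches and calls entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy IR, NOT TCG's: every name here is hypothetical. */
enum toy_opc { TOY_MOVI, TOY_MOV, TOY_ADD };

struct toy_op {
    enum toy_opc opc;
    int dst;    /* destination temp index */
    int a, b;   /* source temp indices; for TOY_MOVI, a is the immediate */
};

enum toy_state { TOY_ANY = 0, TOY_CONST, TOY_COPY };

struct toy_info {
    enum toy_state state;
    int32_t val;    /* constant value, or index of the temp being copied */
};

#define TOY_MAX_TEMPS 16

/* Forget what is known about 'dst' and about every copy of it. */
static void toy_reset(struct toy_info *info, int dst)
{
    int i;

    for (i = 0; i < TOY_MAX_TEMPS; i++) {
        if (info[i].state == TOY_COPY && info[i].val == dst) {
            info[i].state = TOY_ANY;
        }
    }
    info[dst].state = TOY_ANY;
}

/* One forward pass over a block without branches or calls. */
static void toy_propagate(struct toy_op *ops, int n)
{
    struct toy_info info[TOY_MAX_TEMPS];
    int i;

    memset(info, 0, sizeof(info));
    for (i = 0; i < n; i++) {
        struct toy_op *op = &ops[i];

        /* Copy propagation: read the original instead of the copy. */
        if (op->opc != TOY_MOVI && info[op->a].state == TOY_COPY) {
            op->a = info[op->a].val;
        }
        if (op->opc == TOY_ADD && info[op->b].state == TOY_COPY) {
            op->b = info[op->b].val;
        }
        /* Constant propagation: mov from a known constant becomes movi. */
        if (op->opc == TOY_MOV && info[op->a].state == TOY_CONST) {
            op->opc = TOY_MOVI;
            op->a = info[op->a].val;
        }
        /* The destination is overwritten, so old facts about it die. */
        toy_reset(info, op->dst);
        if (op->opc == TOY_MOVI) {
            info[op->dst].state = TOY_CONST;
            info[op->dst].val = op->a;
        } else if (op->opc == TOY_MOV && op->a != op->dst) {
            info[op->dst].state = TOY_COPY;
            info[op->dst].val = op->a;
        }
    }
}
```

In this sketch a `mov` whose source is a known constant is rewritten into a `movi`, and later readers of a copied temp are redirected to the temp it was copied from.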
Stefan Weil - Aug. 3, 2011, 7 p.m.
On 07.07.2011 14:37, Kirill Batuzov wrote:
> Make tcg_constant_folding do copy and constant propagation. It is a
> preparational work before actual constant folding.
>
> Signed-off-by: Kirill Batuzov<batuzovk@ispras.ru>
> ---
>   tcg/optimize.c |  182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 files changed, 180 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c7c7da9..f8afe71 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
>    
...

This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
and w32 hosts). Simply running qemu (BIOS only) terminates
with abort(). As the error is easy to reproduce, I don't provide
a stack trace here.

> +static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
> +                            int nb_temps, int nb_globals)
> +{
> +        reset_temp(dst, nb_temps, nb_globals);
> +        assert(temps[src].state != TCG_TEMP_COPY);
> +        if (src >= nb_globals) {
> +            assert(temps[src].state != TCG_TEMP_CONST);
> +            if (temps[src].state != TCG_TEMP_HAS_COPY) {
> +                temps[src].state = TCG_TEMP_HAS_COPY;
> +                temps[src].next_copy = src;
> +                temps[src].prev_copy = src;
> +            }
> +            temps[dst].state = TCG_TEMP_COPY;
> +            temps[dst].val = src;
> +            temps[dst].next_copy = temps[src].next_copy;
> +            temps[dst].prev_copy = src;
> +            temps[temps[dst].next_copy].prev_copy = dst;
> +            temps[src].next_copy = dst;
> +        }
> +        gen_args[0] = dst;
> +        gen_args[1] = src;
> +}
>    

QEMU with a modified tcg_opt_gen_mov() (without the if block) works.

Kind regards,
Stefan Weil
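
The `if` block quoted above links `dst` into a circular doubly linked list of copies, directly after `src`. The sketch below isolates just that list handling so the pointer updates are easier to follow. The `demo_*` names and the fixed-size array are hypothetical; only the order of the four assignments in `demo_insert_after` mirrors the quoted function, and `demo_unlink` mirrors the two unlink assignments in `reset_temp` from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX 8

/* Hypothetical stand-in for the prev_copy/next_copy fields in the patch. */
struct demo_node {
    uint16_t prev_copy;
    uint16_t next_copy;
};

static struct demo_node demo[DEMO_MAX];

/* Make 'n' a one-element circular list (it points at itself). */
static void demo_init(uint16_t n)
{
    demo[n].next_copy = n;
    demo[n].prev_copy = n;
}

/* Insert 'dst' directly after 'src' in src's circular list. */
static void demo_insert_after(uint16_t src, uint16_t dst)
{
    demo[dst].next_copy = demo[src].next_copy;
    demo[dst].prev_copy = src;
    demo[demo[dst].next_copy].prev_copy = dst;
    demo[src].next_copy = dst;
}

/* Unlink 'n' from its list; n's own fields are left stale. */
static void demo_unlink(uint16_t n)
{
    demo[demo[n].next_copy].prev_copy = demo[n].prev_copy;
    demo[demo[n].prev_copy].next_copy = demo[n].next_copy;
}

/* Count list members by walking next_copy until back at 'start'. */
static int demo_count(uint16_t start)
{
    int count = 1;
    uint16_t i;

    for (i = demo[start].next_copy; i != start; i = demo[i].next_copy) {
        count++;
    }
    return count;
}
```

Because the list is circular, a single-element list needs no special case: inserting after a node that points at itself yields a two-element ring.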
Blue Swirl - Aug. 3, 2011, 8:20 p.m.
On Wed, Aug 3, 2011 at 7:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> This patch breaks QEMU on 32-bit hosts (tested on i386 Linux
> and w32 hosts). Simply running qemu (BIOS only) terminates
> with abort(). As the error is easy to reproduce, I don't provide
> a stack trace here.

I can't reproduce this: the i386/Linux and win32 builds of the i386,
Sparc32 and Sparc64 emulators work fine.

Maybe you have a stale build (bug in Makefile dependencies)?

Stefan Weil - Aug. 3, 2011, 8:56 p.m.
On 03.08.2011 22:20, Blue Swirl wrote:
> I can't reproduce this: the i386/Linux and win32 builds of the i386,
> Sparc32 and Sparc64 emulators work fine.
>
> Maybe you have a stale build (bug in Makefile dependencies)?

Sorry, important information was wrong or missing in my report.
It is qemu-system-x86_64, not qemu, that fails to work.

I just tested it once more with a new build:

$ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
/qemu/tcg/tcg.c:1646: tcg fatal error
Aborted

Cheers,
Stefan
Stefan Weil - Aug. 3, 2011, 9:03 p.m.
On 03.08.2011 22:56, Stefan Weil wrote:
> Sorry, important information was wrong or missing in my report.
> It is qemu-system-x86_64, not qemu, that fails to work.
>
> I just tested it once more with a new build:
>
> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
> /qemu/tcg/tcg.c:1646: tcg fatal error
> Aborted
>
> Cheers,
> Stefan

qemu-system-mips64el fails with the same error, so the problem
occurs when running 64 bit emulations on 32 bit hosts.
Blue Swirl - Aug. 4, 2011, 6:42 p.m.
On Wed, Aug 3, 2011 at 9:03 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> Sorry, important information was wrong or missing in my report.
>> It is qemu-system-x86_64, not qemu, that fails to work.
>>
>> I just tested it once more with a new build:
>>
>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>> /qemu/tcg/tcg.c:1646: tcg fatal error
>> Aborted

OK, now it is broken for me too.

>> Cheers,
>> Stefan
>
> qemu-system-mips64el fails with the same error, so the problem
> occurs when running 64 bit emulations on 32 bit hosts.

Not always, Sparc64 still works fine.
Blue Swirl - Aug. 4, 2011, 7:24 p.m.
On Thu, Aug 4, 2011 at 6:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>> Sorry, important information was wrong or missing in my report.
>>> It is qemu-system-x86_64, not qemu, that fails to work.
>>>
>>> I just tested it once more with a new build:
>>>
>>> $ bin/x86_64-softmmu/qemu-system-x86_64 -L pc-bios
>>> /qemu/tcg/tcg.c:1646: tcg fatal error
>>> Aborted
>
> OK, now it is broken for me too.
>
>>> Cheers,
>>> Stefan
>>
>> qemu-system-mips64el fails with the same error, so the problem
>> occurs when running 64 bit emulations on 32 bit hosts.
>
> Not always, Sparc64 still works fine.

x86_64 fails because 'mov_i32 cc_src_0,loc25' is incorrectly optimized
to 'mov_i32 cc_src_0,tmp6' where tmp6 is dead after brcond.

IN:
0x000000000ffeb90a:  shl    %cl,%eax

OP:
 ---- 0xffeb90a
 mov_i32 tmp2,rcx_0
 mov_i32 tmp3,rcx_1
 mov_i32 tmp0,rax_0
 mov_i32 tmp1,rax_1
 movi_i32 tmp20,$0x1f
 and_i32 tmp2,tmp2,tmp20
 movi_i32 tmp3,$0x0
 movi_i32 tmp21,$0xffffffff
 movi_i32 tmp22,$0xffffffff
 add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17

...tmp6 is assigned here...

 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rax_0,tmp0
 movi_i32 rax_1,$0x0
 mov_i32 loc23,tmp0
 mov_i32 loc24,tmp1
 mov_i32 loc25,tmp6

...tmp6 saved to loc25 to survive brcond...

 mov_i32 loc26,tmp7
 movi_i32 tmp21,$0x0
 movi_i32 tmp22,$0x0
 brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
 mov_i32 cc_src_0,loc25

...used here.

 mov_i32 cc_src_1,loc26
 mov_i32 cc_dst_0,loc23
 mov_i32 cc_dst_1,loc24
 movi_i32 cc_op,$0x24
 set_label $0x0
 movi_i32 tmp8,$0xffeb90c
 movi_i32 tmp9,$0x0
 st_i32 tmp8,env,$0x80
 st_i32 tmp9,env,$0x84
 movi_i32 tmp20,$debug
 call tmp20,$0x0,$0

OP after liveness analysis:
 ---- 0xffeb90a
 mov_i32 tmp2,rcx_0
 nopn $0x2,$0x2
 mov_i32 tmp0,rax_0
 mov_i32 tmp1,rax_1
 movi_i32 tmp20,$0x1f
 and_i32 tmp2,tmp2,tmp20
 movi_i32 tmp3,$0x0
 movi_i32 tmp21,$0xffffffff
 movi_i32 tmp22,$0xffffffff
 add2_i32 tmp16,tmp17,tmp2,tmp3,tmp21,tmp22
 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp6,tmp7,tmp0,tmp1,tmp16,tmp17

OK

 movi_i32 tmp20,$0x80bd4e0
 call tmp20,$0x30,$2,tmp0,tmp1,tmp0,tmp1,tmp2,tmp3
 mov_i32 rax_0,tmp0
 movi_i32 rax_1,$0x0
 mov_i32 loc23,tmp0
 mov_i32 loc24,tmp1
 mov_i32 loc25,tmp6

OK, though loc25 is unused after this. Why is it not optimized away?

 mov_i32 loc26,tmp7
 movi_i32 tmp21,$0x0
 movi_i32 tmp22,$0x0
 brcond2_i32 tmp2,tmp3,tmp21,tmp22,eq,$0x0
 mov_i32 cc_src_0,tmp6

Incorrect optimization.

 mov_i32 cc_src_1,tmp7
 mov_i32 cc_dst_0,tmp0
 mov_i32 cc_dst_1,tmp1
 movi_i32 cc_op,$0x24
 set_label $0x0
 movi_i32 tmp8,$0xffeb90c
 movi_i32 tmp9,$0x0
 st_i32 tmp8,env,$0x80
 st_i32 tmp9,env,$0x84
 movi_i32 tmp20,$debug
 call tmp20,$0x0,$0
 end

The corresponding translation code is at target-i386/translate.c:1456
and looks correct.

Maybe the optimizer should treat stack and memory temporaries
differently from register temporaries?
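
One possible reading of the suggestion above, sketched under assumptions: a read of a copy is only redirected to its source when both belong to the same class of temporary, so a local such as loc25 would never be replaced by a plain temp such as tmp6. The three-way classification, the `demo_*` names and the side tables are hypothetical and are not how TCG itself represents this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of temporaries, for illustration only. */
enum demo_class {
    DEMO_GLOBAL,    /* assumed to stay valid across branches */
    DEMO_LOCAL,     /* survives a branch, like loc25 in the dump */
    DEMO_TEMP       /* dead after a branch, like tmp6 in the dump */
};

/* What the optimizer believes about one temp. */
struct demo_copy_info {
    bool is_copy;   /* true if this temp currently mirrors 'orig' */
    int orig;       /* index of the temp it was copied from */
};

/* May a read of a 'copy' class temp be rewritten to an 'orig' class temp? */
static bool demo_can_substitute(enum demo_class copy, enum demo_class orig)
{
    return copy == orig;
}

/* Return the temp that an input argument should actually read. */
static int demo_resolve(const enum demo_class *cls,
                        const struct demo_copy_info *info, int arg)
{
    if (info[arg].is_copy &&
        demo_can_substitute(cls[arg], cls[info[arg].orig])) {
        return info[arg].orig;
    }
    return arg;     /* keep the original argument */
}
```

With this rule the `mov_i32 cc_src_0,loc25` from the dump would keep reading loc25, at the cost of missing some substitutions that would have been safe.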

Patch

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c7c7da9..f8afe71 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,24 +40,196 @@ 
         glue(glue(case INDEX_op_, x), _i32)
 #endif
 
+typedef enum {
+    TCG_TEMP_UNDEF = 0,
+    TCG_TEMP_CONST,
+    TCG_TEMP_COPY,
+    TCG_TEMP_HAS_COPY,
+    TCG_TEMP_ANY
+} tcg_temp_state;
+
+struct tcg_temp_info {
+    tcg_temp_state state;
+    uint16_t prev_copy;
+    uint16_t next_copy;
+    tcg_target_ulong val;
+};
+
+static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+
+/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was the representative of a
+   class of equivalent temps, a new representative is chosen for that
+   class. */
+static void reset_temp(TCGArg temp, int nb_temps, int nb_globals)
+{
+    int i;
+    TCGArg new_base = (TCGArg)-1;
+    if (temps[temp].state == TCG_TEMP_HAS_COPY) {
+        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+            if (i >= nb_globals) {
+                temps[i].state = TCG_TEMP_HAS_COPY;
+                new_base = i;
+                break;
+            }
+        }
+        for (i = temps[temp].next_copy; i != temp; i = temps[i].next_copy) {
+            if (new_base == (TCGArg)-1) {
+                temps[i].state = TCG_TEMP_ANY;
+            } else {
+                temps[i].val = new_base;
+            }
+        }
+        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+    } else if (temps[temp].state == TCG_TEMP_COPY) {
+        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+        new_base = temps[temp].val;
+    }
+    temps[temp].state = TCG_TEMP_ANY;
+    if (new_base != (TCGArg)-1 && temps[new_base].next_copy == new_base) {
+        temps[new_base].state = TCG_TEMP_ANY;
+    }
+}
+
+static int op_bits(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32:
+        return 32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+        return 64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+        tcg_abort();
+    }
+}
+
+static int op_to_movi(int op)
+{
+    switch (op_bits(op)) {
+    case 32:
+        return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case 64:
+        return INDEX_op_movi_i64;
+#endif
+    default:
+        fprintf(stderr, "op_to_movi: unexpected return value of "
+                "function op_bits.\n");
+        tcg_abort();
+    }
+}
+
+static void tcg_opt_gen_mov(TCGArg *gen_args, TCGArg dst, TCGArg src,
+                            int nb_temps, int nb_globals)
+{
+        reset_temp(dst, nb_temps, nb_globals);
+        assert(temps[src].state != TCG_TEMP_COPY);
+        if (src >= nb_globals) {
+            assert(temps[src].state != TCG_TEMP_CONST);
+            if (temps[src].state != TCG_TEMP_HAS_COPY) {
+                temps[src].state = TCG_TEMP_HAS_COPY;
+                temps[src].next_copy = src;
+                temps[src].prev_copy = src;
+            }
+            temps[dst].state = TCG_TEMP_COPY;
+            temps[dst].val = src;
+            temps[dst].next_copy = temps[src].next_copy;
+            temps[dst].prev_copy = src;
+            temps[temps[dst].next_copy].prev_copy = dst;
+            temps[src].next_copy = dst;
+        }
+        gen_args[0] = dst;
+        gen_args[1] = src;
+}
+
+static void tcg_opt_gen_movi(TCGArg *gen_args, TCGArg dst, TCGArg val,
+                             int nb_temps, int nb_globals)
+{
+        reset_temp(dst, nb_temps, nb_globals);
+        temps[dst].state = TCG_TEMP_CONST;
+        temps[dst].val = val;
+        gen_args[0] = dst;
+        gen_args[1] = val;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
 {
-    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    int i, nb_ops, op_index, op, nb_temps, nb_globals, nb_call_args;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    /* The global array TEMPS has an element for each temp.
+       If a temp holds a constant, its value is kept in the element's VAL.
+       If a temp is a copy of another temp, the representative of its
+       equivalence class is kept in the element's VAL.
+       If a temp is neither a copy nor a constant, the element's VAL
+       is unused. */
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+    memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
 
     nb_ops = tcg_opc_ptr - gen_opc_buf;
     gen_args = args;
     for (op_index = 0; op_index < nb_ops; op_index++) {
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
+        /* Do copy propagation */
+        if (!(def->flags & (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS))) {
+            assert(op != INDEX_op_call);
+            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+                if (temps[args[i]].state == TCG_TEMP_COPY) {
+                    args[i] = temps[args[i]].val;
+                }
+            }
+        }
+
+        /* Propagate constants through copy operations and do constant
+           folding.  Constants will be substituted for arguments by the
+           register allocator where needed and possible.  Also detect copies. */
         switch (op) {
+        CASE_OP_32_64(mov):
+            if ((temps[args[1]].state == TCG_TEMP_COPY
+                && temps[args[1]].val == args[0])
+                || args[0] == args[1]) {
+                args += 2;
+                gen_opc_buf[op_index] = INDEX_op_nop;
+                break;
+            }
+            if (temps[args[1]].state != TCG_TEMP_CONST) {
+                tcg_opt_gen_mov(gen_args, args[0], args[1],
+                                nb_temps, nb_globals);
+                gen_args += 2;
+                args += 2;
+                break;
+            }
+            /* Source argument is constant.  Rewrite the operation and
+               let movi case handle it. */
+            op = op_to_movi(op);
+            gen_opc_buf[op_index] = op;
+            args[1] = temps[args[1]].val;
+            /* fallthrough */
+        CASE_OP_32_64(movi):
+            tcg_opt_gen_movi(gen_args, args[0], args[1], nb_temps, nb_globals);
+            gen_args += 2;
+            args += 2;
+            break;
         case INDEX_op_call:
-            i = (args[0] >> 16) + (args[0] & 0xffff) + 3;
+            nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
+            if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
+                for (i = 0; i < nb_globals; i++) {
+                    reset_temp(i, nb_temps, nb_globals);
+                }
+            }
+            for (i = 0; i < (args[0] >> 16); i++) {
+                reset_temp(args[i + 1], nb_temps, nb_globals);
+            }
+            i = nb_call_args + 3;
             while (i) {
                 *gen_args = *args;
                 args++;
@@ -69,6 +241,7 @@  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_jmp:
         case INDEX_op_br:
         CASE_OP_32_64(brcond):
+            memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
             for (i = 0; i < def->nb_args; i++) {
                 *gen_args = *args;
                 args++;
@@ -76,6 +249,11 @@  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             break;
         default:
+            /* Default case: we know nothing about the operation, so no
+               propagation is done.  We only reset the output args.  */
+            for (i = 0; i < def->nb_oargs; i++) {
+                reset_temp(args[i], nb_temps, nb_globals);
+            }
             for (i = 0; i < def->nb_args; i++) {
                 gen_args[i] = args[i];
             }