Patchwork tcg/tcg.c:1892: tcg fatal error

login
register
mail settings
Submitter Blue Swirl
Date April 26, 2011, 6:36 p.m.
Message ID <BANLkTi=Ybivw3H8hKCr2iqXL9uGDBvsozQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/92947/
State New
Headers show

Comments

Blue Swirl - April 26, 2011, 6:36 p.m.
On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>> <laurent.desnogues@gmail.com> wrote:
>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>> > <igor.v.kovalenko@gmail.com> wrote:
>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>> >> <laurent.desnogues@gmail.com> wrote:
>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>> >>>>>>> Do you have public test case?
>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>> >>>>>>> be corruption elsewhere.
>>> >>>>
>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>> >>>>
>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>> >>>>
>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>> >>>> needed only because the bios entry point is 0x20).
>>> >>>>
>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>> >>>> test-wrpr.bin -nographic
>>> >>>> Already up-to-date.
>>> >>>> make[1]: Nothing to be done for `all'.
>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>> >>>> Aborted
>>> >>>
>>> >>> The problem seems to be that wrpr is using a non-local
>>> >>> TCG tmp (cpu_tmp0).
>>> >>
>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>> >> The issue appears to be with save_state() call since adding save_state
>>> >> to %pil case provokes the same tcg abort.
>>> >
>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>> > need to be saved across helper calls.  This results in the
>>> > TCG "optimizer" getting rid of it even though it's later used.
>>> > Look at the log and you'll see what I mean :-)
>>>
>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>
>>
>> The problem is not on the TCG side, but on the target-sparc/translate.c
>> side:
>>
>> |                    case 0x32: /* wrwim, V9 wrpr */
>> |                         {
>> |                             if (!supervisor(dc))
>> |                                 goto priv_insn;
>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>> | #ifdef TARGET_SPARC64
>>
>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>> saved across TCG branches.
>>
>> [...]
>>
>> |                             case 6: // pstate
>> |                                 save_state(dc, cpu_cond);
>> |                                 gen_helper_wrpstate(cpu_tmp0);
>> |                                 dc->npc = DYNAMIC_PC;
>> |                                 break;
>>
>> save_state() calls save_npc(), which in turns might call
>> gen_generic_branch():
>
> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
> At least ldd is another example.
>
>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>> |                                       TCGv r_cond)
>> | {
>> |     int l1, l2;
>> |
>> |     l1 = gen_new_label();
>> |     l2 = gen_new_label();
>> |
>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>> |
>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>> |     tcg_gen_br(l2);
>> |
>> |     gen_set_label(l1);
>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>> |     gen_set_label(l2);
>> | }
>>
>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>
>> The solution is either to rewrite gen_generic_branch() without TCG
>> branches, or to use a TCG temp local instead of a TCG temp.
>
> You mean something like
>
>                            case 6: // pstate
>                                {
>                                    TCGv r_temp;
>
>                                    r_temp = tcg_temp_new();
>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>                                    save_state(dc, cpu_cond);
>                                    gen_helper_wrpstate(r_temp);
>                                    tcg_temp_free(r_temp);
>                                    dc->npc = DYNAMIC_PC;
>                                }
>                                break;
>
> ?
> This fails with the same error message.

Close, but you need to use tcg_temp_local_new(). Does this work?
Igor V. Kovalenko - April 26, 2011, 7:07 p.m.
On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>>> <laurent.desnogues@gmail.com> wrote:
>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>>> > <igor.v.kovalenko@gmail.com> wrote:
>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>>> >> <laurent.desnogues@gmail.com> wrote:
>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>>> >>>>>>> Do you have public test case?
>>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>>> >>>>>>> be corruption elsewhere.
>>>> >>>>
>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>>> >>>>
>>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>>> >>>>
>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>>> >>>> needed only because the bios entry point is 0x20).
>>>> >>>>
>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>>> >>>> test-wrpr.bin -nographic
>>>> >>>> Already up-to-date.
>>>> >>>> make[1]: Nothing to be done for `all'.
>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>>> >>>> Aborted
>>>> >>>
>>>> >>> The problem seems to be that wrpr is using a non-local
>>>> >>> TCG tmp (cpu_tmp0).
>>>> >>
>>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>>> >> The issue appears to be with save_state() call since adding save_state
>>>> >> to %pil case provokes the same tcg abort.
>>>> >
>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>>> > need to be saved across helper calls.  This results in the
>>>> > TCG "optimizer" getting rid of it even though it's later used.
>>>> > Look at the log and you'll see what I mean :-)
>>>>
>>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>>
>>>
>>> The problem is not on the TCG side, but on the target-sparc/translate.c
>>> side:
>>>
>>> |                    case 0x32: /* wrwim, V9 wrpr */
>>> |                         {
>>> |                             if (!supervisor(dc))
>>> |                                 goto priv_insn;
>>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>>> | #ifdef TARGET_SPARC64
>>>
>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>>> saved across TCG branches.
>>>
>>> [...]
>>>
>>> |                             case 6: // pstate
>>> |                                 save_state(dc, cpu_cond);
>>> |                                 gen_helper_wrpstate(cpu_tmp0);
>>> |                                 dc->npc = DYNAMIC_PC;
>>> |                                 break;
>>>
>>> save_state() calls save_npc(), which in turns might call
>>> gen_generic_branch():
>>
>> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
>> At least ldd is another example.
>>
>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>>> |                                       TCGv r_cond)
>>> | {
>>> |     int l1, l2;
>>> |
>>> |     l1 = gen_new_label();
>>> |     l2 = gen_new_label();
>>> |
>>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>>> |
>>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>>> |     tcg_gen_br(l2);
>>> |
>>> |     gen_set_label(l1);
>>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>>> |     gen_set_label(l2);
>>> | }
>>>
>>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>>
>>> The solution is either to rewrite gen_generic_branch() without TCG
>>> branches, or to use a TCG temp local instead of a TCG temp.
>>
>> You mean something like
>>
>>                            case 6: // pstate
>>                                {
>>                                    TCGv r_temp;
>>
>>                                    r_temp = tcg_temp_new();
>>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>>                                    save_state(dc, cpu_cond);
>>                                    gen_helper_wrpstate(r_temp);
>>                                    tcg_temp_free(r_temp);
>>                                    dc->npc = DYNAMIC_PC;
>>                                }
>>                                break;
>>
>> ?
>> This fails with the same error message.
>
> Close, but you need to use tcg_temp_local_new(). Does this work?
>
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index 3c958b2..52fa2f1 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc)
>                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
>                                 break;
>                             case 6: // pstate
> -                                save_state(dc, cpu_cond);
> -                                gen_helper_wrpstate(cpu_tmp0);
> -                                dc->npc = DYNAMIC_PC;
> +                                {
> +                                    TCGv r_tmp = tcg_temp_local_new();
> +
> +                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
> +                                    save_state(dc, cpu_cond);
> +                                    gen_helper_wrpstate(r_tmp);
> +                                    tcg_temp_free_ptr(r_tmp);
> +                                    dc->npc = DYNAMIC_PC;
> +                                }
>                                 break;
>                             case 7: // tl
>                                 save_state(dc, cpu_cond);
>

This one works (I have tested similar change but tcg_temp_free() would
be OK as well)
Could anyone please explain why it still works without being in delay
slot? Simply pointing at optimizer does not work :(
Blue Swirl - April 26, 2011, 8:07 p.m.
On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko
<igor.v.kovalenko@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>>>> <laurent.desnogues@gmail.com> wrote:
>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>>>> > <igor.v.kovalenko@gmail.com> wrote:
>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>>>> >> <laurent.desnogues@gmail.com> wrote:
>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>>>> >>>>>>> Do you have public test case?
>>>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>>>> >>>>>>> be corruption elsewhere.
>>>>> >>>>
>>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>>>> >>>>
>>>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>>>> >>>>
>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>>>> >>>> needed only because the bios entry point is 0x20).
>>>>> >>>>
>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>>>> >>>> test-wrpr.bin -nographic
>>>>> >>>> Already up-to-date.
>>>>> >>>> make[1]: Nothing to be done for `all'.
>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>>>> >>>> Aborted
>>>>> >>>
>>>>> >>> The problem seems to be that wrpr is using a non-local
>>>>> >>> TCG tmp (cpu_tmp0).
>>>>> >>
>>>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>>>> >> The issue appears to be with save_state() call since adding save_state
>>>>> >> to %pil case provokes the same tcg abort.
>>>>> >
>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>>>> > need to be saved across helper calls.  This results in the
>>>>> > TCG "optimizer" getting rid of it even though it's later used.
>>>>> > Look at the log and you'll see what I mean :-)
>>>>>
>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>>>
>>>>
>>>> The problem is not on the TCG side, but on the target-sparc/translate.c
>>>> side:
>>>>
>>>> |                    case 0x32: /* wrwim, V9 wrpr */
>>>> |                         {
>>>> |                             if (!supervisor(dc))
>>>> |                                 goto priv_insn;
>>>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>>>> | #ifdef TARGET_SPARC64
>>>>
>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>>>> saved across TCG branches.
>>>>
>>>> [...]
>>>>
>>>> |                             case 6: // pstate
>>>> |                                 save_state(dc, cpu_cond);
>>>> |                                 gen_helper_wrpstate(cpu_tmp0);
>>>> |                                 dc->npc = DYNAMIC_PC;
>>>> |                                 break;
>>>>
>>>> save_state() calls save_npc(), which in turns might call
>>>> gen_generic_branch():
>>>
>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
>>> At least ldd is another example.
>>>
>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>>>> |                                       TCGv r_cond)
>>>> | {
>>>> |     int l1, l2;
>>>> |
>>>> |     l1 = gen_new_label();
>>>> |     l2 = gen_new_label();
>>>> |
>>>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>>>> |
>>>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>>>> |     tcg_gen_br(l2);
>>>> |
>>>> |     gen_set_label(l1);
>>>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>>>> |     gen_set_label(l2);
>>>> | }
>>>>
>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>>>
>>>> The solution is either to rewrite gen_generic_branch() without TCG
>>>> branches, or to use a TCG temp local instead of a TCG temp.
>>>
>>> You mean something like
>>>
>>>                            case 6: // pstate
>>>                                {
>>>                                    TCGv r_temp;
>>>
>>>                                    r_temp = tcg_temp_new();
>>>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>>>                                    save_state(dc, cpu_cond);
>>>                                    gen_helper_wrpstate(r_temp);
>>>                                    tcg_temp_free(r_temp);
>>>                                    dc->npc = DYNAMIC_PC;
>>>                                }
>>>                                break;
>>>
>>> ?
>>> This fails with the same error message.
>>
>> Close, but you need to use tcg_temp_local_new(). Does this work?
>>
>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
>> index 3c958b2..52fa2f1 100644
>> --- a/target-sparc/translate.c
>> +++ b/target-sparc/translate.c
>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc)
>>                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
>>                                 break;
>>                             case 6: // pstate
>> -                                save_state(dc, cpu_cond);
>> -                                gen_helper_wrpstate(cpu_tmp0);
>> -                                dc->npc = DYNAMIC_PC;
>> +                                {
>> +                                    TCGv r_tmp = tcg_temp_local_new();
>> +
>> +                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
>> +                                    save_state(dc, cpu_cond);
>> +                                    gen_helper_wrpstate(r_tmp);
>> +                                    tcg_temp_free_ptr(r_tmp);
>> +                                    dc->npc = DYNAMIC_PC;
>> +                                }
>>                                 break;
>>                             case 7: // tl
>>                                 save_state(dc, cpu_cond);
>>
>
> This one works (I have tested similar change but tcg_temp_free() would
> be OK as well)

Thanks for spotting, that was not intentional.

> Could anyone please explain why it still works without being in delay
> slot? Simply pointing at optimizer does not work :(

save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC,
which is used for delay slot handling. Otherwise gen_generic_branch()
is not called and thus  brcond does not get used, so the temporary
will not be flushed for non-delay slot cases.
Igor V. Kovalenko - April 26, 2011, 9:35 p.m.
On Wed, Apr 27, 2011 at 12:07 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko
> <igor.v.kovalenko@gmail.com> wrote:
>> On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>>>>> <laurent.desnogues@gmail.com> wrote:
>>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>>>>> > <igor.v.kovalenko@gmail.com> wrote:
>>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>>>>> >> <laurent.desnogues@gmail.com> wrote:
>>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>>>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>>>>> >>>>>>> Do you have public test case?
>>>>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>>>>> >>>>>>> be corruption elsewhere.
>>>>>> >>>>
>>>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>>>>> >>>>
>>>>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>>>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>>>>> >>>>
>>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>>>>> >>>> needed only because the bios entry point is 0x20).
>>>>>> >>>>
>>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>>>>> >>>> test-wrpr.bin -nographic
>>>>>> >>>> Already up-to-date.
>>>>>> >>>> make[1]: Nothing to be done for `all'.
>>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>>>>> >>>> Aborted
>>>>>> >>>
>>>>>> >>> The problem seems to be that wrpr is using a non-local
>>>>>> >>> TCG tmp (cpu_tmp0).
>>>>>> >>
>>>>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>>>>> >> The issue appears to be with save_state() call since adding save_state
>>>>>> >> to %pil case provokes the same tcg abort.
>>>>>> >
>>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>>>>> > need to be saved across helper calls.  This results in the
>>>>>> > TCG "optimizer" getting rid of it even though it's later used.
>>>>>> > Look at the log and you'll see what I mean :-)
>>>>>>
>>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>>>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>>>>
>>>>>
>>>>> The problem is not on the TCG side, but on the target-sparc/translate.c
>>>>> side:
>>>>>
>>>>> |                    case 0x32: /* wrwim, V9 wrpr */
>>>>> |                         {
>>>>> |                             if (!supervisor(dc))
>>>>> |                                 goto priv_insn;
>>>>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>>>>> | #ifdef TARGET_SPARC64
>>>>>
>>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>>>>> saved across TCG branches.
>>>>>
>>>>> [...]
>>>>>
>>>>> |                             case 6: // pstate
>>>>> |                                 save_state(dc, cpu_cond);
>>>>> |                                 gen_helper_wrpstate(cpu_tmp0);
>>>>> |                                 dc->npc = DYNAMIC_PC;
>>>>> |                                 break;
>>>>>
>>>>> save_state() calls save_npc(), which in turns might call
>>>>> gen_generic_branch():
>>>>
>>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
>>>> At least ldd is another example.
>>>>
>>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>>>>> |                                       TCGv r_cond)
>>>>> | {
>>>>> |     int l1, l2;
>>>>> |
>>>>> |     l1 = gen_new_label();
>>>>> |     l2 = gen_new_label();
>>>>> |
>>>>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>>>>> |
>>>>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>>>>> |     tcg_gen_br(l2);
>>>>> |
>>>>> |     gen_set_label(l1);
>>>>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>>>>> |     gen_set_label(l2);
>>>>> | }
>>>>>
>>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>>>>
>>>>> The solution is either to rewrite gen_generic_branch() without TCG
>>>>> branches, or to use a TCG temp local instead of a TCG temp.
>>>>
>>>> You mean something like
>>>>
>>>>                            case 6: // pstate
>>>>                                {
>>>>                                    TCGv r_temp;
>>>>
>>>>                                    r_temp = tcg_temp_new();
>>>>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>>>>                                    save_state(dc, cpu_cond);
>>>>                                    gen_helper_wrpstate(r_temp);
>>>>                                    tcg_temp_free(r_temp);
>>>>                                    dc->npc = DYNAMIC_PC;
>>>>                                }
>>>>                                break;
>>>>
>>>> ?
>>>> This fails with the same error message.
>>>
>>> Close, but you need to use tcg_temp_local_new(). Does this work?
>>>
>>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
>>> index 3c958b2..52fa2f1 100644
>>> --- a/target-sparc/translate.c
>>> +++ b/target-sparc/translate.c
>>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc)
>>>                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
>>>                                 break;
>>>                             case 6: // pstate
>>> -                                save_state(dc, cpu_cond);
>>> -                                gen_helper_wrpstate(cpu_tmp0);
>>> -                                dc->npc = DYNAMIC_PC;
>>> +                                {
>>> +                                    TCGv r_tmp = tcg_temp_local_new();
>>> +
>>> +                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
>>> +                                    save_state(dc, cpu_cond);
>>> +                                    gen_helper_wrpstate(r_tmp);
>>> +                                    tcg_temp_free_ptr(r_tmp);
>>> +                                    dc->npc = DYNAMIC_PC;
>>> +                                }
>>>                                 break;
>>>                             case 7: // tl
>>>                                 save_state(dc, cpu_cond);
>>>
>>
>> This one works (I have tested similar change but tcg_temp_free() would
>> be OK as well)
>
> Thanks for spotting, that was not intentional.
>
>> Could anyone please explain why it still works without being in delay
>> slot? Simply pointing at optimizer does not work :(
>
> save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC,
> which is used for delay slot handling. Otherwise gen_generic_branch()
> is not called and thus  brcond does not get used, so the temporary
> will not be flushed for non-delay slot cases.

Great that explains it all, thanks! I think audit over translate.c
would be needed, alternatively all main switch temps for sparc64 could
be made local as hinted by Laurent - provided there is no significant
speed impact.
Artyom Tarasenko - April 27, 2011, 4:29 p.m.
On Tue, Apr 26, 2011 at 8:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>>> <laurent.desnogues@gmail.com> wrote:
>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>>> > <igor.v.kovalenko@gmail.com> wrote:
>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>>> >> <laurent.desnogues@gmail.com> wrote:
>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>>> >>>>>>> Do you have public test case?
>>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>>> >>>>>>> be corruption elsewhere.
>>>> >>>>
>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>>> >>>>
>>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>>> >>>>
>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>>> >>>> needed only because the bios entry point is 0x20).
>>>> >>>>
>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>>> >>>> test-wrpr.bin -nographic
>>>> >>>> Already up-to-date.
>>>> >>>> make[1]: Nothing to be done for `all'.
>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>>> >>>> Aborted
>>>> >>>
>>>> >>> The problem seems to be that wrpr is using a non-local
>>>> >>> TCG tmp (cpu_tmp0).
>>>> >>
>>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>>> >> The issue appears to be with save_state() call since adding save_state
>>>> >> to %pil case provokes the same tcg abort.
>>>> >
>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>>> > need to be saved across helper calls.  This results in the
>>>> > TCG "optimizer" getting rid of it even though it's later used.
>>>> > Look at the log and you'll see what I mean :-)
>>>>
>>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>>
>>>
>>> The problem is not on the TCG side, but on the target-sparc/translate.c
>>> side:
>>>
>>> |                    case 0x32: /* wrwim, V9 wrpr */
>>> |                         {
>>> |                             if (!supervisor(dc))
>>> |                                 goto priv_insn;
>>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>>> | #ifdef TARGET_SPARC64
>>>
>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>>> saved across TCG branches.
>>>
>>> [...]
>>>
>>> |                             case 6: // pstate
>>> |                                 save_state(dc, cpu_cond);
>>> |                                 gen_helper_wrpstate(cpu_tmp0);
>>> |                                 dc->npc = DYNAMIC_PC;
>>> |                                 break;
>>>
>>> save_state() calls save_npc(), which in turns might call
>>> gen_generic_branch():
>>
>> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
>> At least ldd is another example.
>>
>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>>> |                                       TCGv r_cond)
>>> | {
>>> |     int l1, l2;
>>> |
>>> |     l1 = gen_new_label();
>>> |     l2 = gen_new_label();
>>> |
>>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>>> |
>>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>>> |     tcg_gen_br(l2);
>>> |
>>> |     gen_set_label(l1);
>>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>>> |     gen_set_label(l2);
>>> | }
>>>
>>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>>
>>> The solution is either to rewrite gen_generic_branch() without TCG
>>> branches, or to use a TCG temp local instead of a TCG temp.
>>
>> You mean something like
>>
>>                            case 6: // pstate
>>                                {
>>                                    TCGv r_temp;
>>
>>                                    r_temp = tcg_temp_new();
>>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>>                                    save_state(dc, cpu_cond);
>>                                    gen_helper_wrpstate(r_temp);
>>                                    tcg_temp_free(r_temp);
>>                                    dc->npc = DYNAMIC_PC;
>>                                }
>>                                break;
>>
>> ?
>> This fails with the same error message.
>
> Close, but you need to use tcg_temp_local_new(). Does this work?
>
> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
> index 3c958b2..52fa2f1 100644
> --- a/target-sparc/translate.c
> +++ b/target-sparc/translate.c
> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc)
>                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
>                                 break;
>                             case 6: // pstate
> -                                save_state(dc, cpu_cond);
> -                                gen_helper_wrpstate(cpu_tmp0);
> -                                dc->npc = DYNAMIC_PC;
> +                                {
> +                                    TCGv r_tmp = tcg_temp_local_new();
> +
> +                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
> +                                    save_state(dc, cpu_cond);
> +                                    gen_helper_wrpstate(r_tmp);
> +                                    tcg_temp_free_ptr(r_tmp);
> +                                    dc->npc = DYNAMIC_PC;
> +                                }
>                                 break;
>                             case 7: // tl
>                                 save_state(dc, cpu_cond);
>

Yep. This (with  tcg_temp_free) passes my tests, thanks!
A bonus question: how is immediate case handled?

I don't see any explicit "if (IS_IMM)" and yet it seems to produce the
expected results for the immediate instructions.
Blue Swirl - April 27, 2011, 5:40 p.m.
On Wed, Apr 27, 2011 at 12:35 AM, Igor Kovalenko
<igor.v.kovalenko@gmail.com> wrote:
> On Wed, Apr 27, 2011 at 12:07 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko
>> <igor.v.kovalenko@gmail.com> wrote:
>>> On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>>>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>>>>>> <laurent.desnogues@gmail.com> wrote:
>>>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>>>>>> > <igor.v.kovalenko@gmail.com> wrote:
>>>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>>>>>> >> <laurent.desnogues@gmail.com> wrote:
>>>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>>>>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>>>>>> >>>>>>> Do you have public test case?
>>>>>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>>>>>> >>>>>>> be corruption elsewhere.
>>>>>>> >>>>
>>>>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>>>>>> >>>>
>>>>>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>>>>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>>>>>> >>>>
>>>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>>>>>> >>>> needed only because the bios entry point is 0x20).
>>>>>>> >>>>
>>>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>>>>>> >>>> test-wrpr.bin -nographic
>>>>>>> >>>> Already up-to-date.
>>>>>>> >>>> make[1]: Nothing to be done for `all'.
>>>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>>>>>> >>>> Aborted
>>>>>>> >>>
>>>>>>> >>> The problem seems to be that wrpr is using a non-local
>>>>>>> >>> TCG tmp (cpu_tmp0).
>>>>>>> >>
>>>>>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>>>>>> >> The issue appears to be with save_state() call since adding save_state
>>>>>>> >> to %pil case provokes the same tcg abort.
>>>>>>> >
>>>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>>>>>> > need to be saved across helper calls.  This results in the
>>>>>>> > TCG "optimizer" getting rid of it even though it's later used.
>>>>>>> > Look at the log and you'll see what I mean :-)
>>>>>>>
>>>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>>>>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>>>>>
>>>>>>
>>>>>> The problem is not on the TCG side, but on the target-sparc/translate.c
>>>>>> side:
>>>>>>
>>>>>> |                    case 0x32: /* wrwim, V9 wrpr */
>>>>>> |                         {
>>>>>> |                             if (!supervisor(dc))
>>>>>> |                                 goto priv_insn;
>>>>>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>>>>>> | #ifdef TARGET_SPARC64
>>>>>>
>>>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>>>>>> saved across TCG branches.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> |                             case 6: // pstate
>>>>>> |                                 save_state(dc, cpu_cond);
>>>>>> |                                 gen_helper_wrpstate(cpu_tmp0);
>>>>>> |                                 dc->npc = DYNAMIC_PC;
>>>>>> |                                 break;
>>>>>>
>>>>>> save_state() calls save_npc(), which in turns might call
>>>>>> gen_generic_branch():
>>>>>
>>>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
>>>>> At least ldd is another example.
>>>>>
>>>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>>>>>> |                                       TCGv r_cond)
>>>>>> | {
>>>>>> |     int l1, l2;
>>>>>> |
>>>>>> |     l1 = gen_new_label();
>>>>>> |     l2 = gen_new_label();
>>>>>> |
>>>>>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>>>>>> |
>>>>>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>>>>>> |     tcg_gen_br(l2);
>>>>>> |
>>>>>> |     gen_set_label(l1);
>>>>>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>>>>>> |     gen_set_label(l2);
>>>>>> | }
>>>>>>
>>>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>>>>>
>>>>>> The solution is either to rewrite gen_generic_branch() without TCG
>>>>>> branches, or to use a TCG temp local instead of a TCG temp.
>>>>>
>>>>> You mean something like
>>>>>
>>>>>                            case 6: // pstate
>>>>>                                {
>>>>>                                    TCGv r_temp;
>>>>>
>>>>>                                    r_temp = tcg_temp_new();
>>>>>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>>>>>                                    save_state(dc, cpu_cond);
>>>>>                                    gen_helper_wrpstate(r_temp);
>>>>>                                    tcg_temp_free(r_temp);
>>>>>                                    dc->npc = DYNAMIC_PC;
>>>>>                                }
>>>>>                                break;
>>>>>
>>>>> ?
>>>>> This fails with the same error message.
>>>>
>>>> Close, but you need to use tcg_temp_local_new(). Does this work?
>>>>
>>>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
>>>> index 3c958b2..52fa2f1 100644
>>>> --- a/target-sparc/translate.c
>>>> +++ b/target-sparc/translate.c
>>>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc)
>>>>                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
>>>>                                 break;
>>>>                             case 6: // pstate
>>>> -                                save_state(dc, cpu_cond);
>>>> -                                gen_helper_wrpstate(cpu_tmp0);
>>>> -                                dc->npc = DYNAMIC_PC;
>>>> +                                {
>>>> +                                    TCGv r_tmp = tcg_temp_local_new();
>>>> +
>>>> +                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
>>>> +                                    save_state(dc, cpu_cond);
>>>> +                                    gen_helper_wrpstate(r_tmp);
>>>> +                                    tcg_temp_free_ptr(r_tmp);
>>>> +                                    dc->npc = DYNAMIC_PC;
>>>> +                                }
>>>>                                 break;
>>>>                             case 7: // tl
>>>>                                 save_state(dc, cpu_cond);
>>>>
>>>
>>> This one works (I have tested similar change but tcg_temp_free() would
>>> be OK as well)
>>
>> Thanks for spotting, that was not intentional.
>>
>>> Could anyone please explain why it still works without being in delay
>>> slot? Simply pointing at optimizer does not work :(
>>
>> save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC,
>> which is used for delay slot handling. Otherwise gen_generic_branch()
>> is not called and thus  brcond does not get used, so the temporary
>> will not be flushed for non-delay slot cases.
>
> Great that explains it all, thanks! I think audit over translate.c
> would be needed, alternatively all main switch temps for sparc64 could
> be made local as hinted by Laurent - provided there is no significant
> speed impact.

I think there may be a few other cases. Local temps are allocated from
stack instead of CPU registers, so there could be a speed impact.

It would be better if this case could be detected more reliably
instead, for example enhancing the debugging mode for TCG.
Blue Swirl - April 27, 2011, 5:41 p.m.
On Wed, Apr 27, 2011 at 7:29 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
> On Tue, Apr 26, 2011 at 8:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote:
>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues
>>>>> <laurent.desnogues@gmail.com> wrote:
>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko
>>>>> > <igor.v.kovalenko@gmail.com> wrote:
>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues
>>>>> >> <laurent.desnogues@gmail.com> wrote:
>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote:
>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko
>>>>> >>>> <igor.v.kovalenko@gmail.com> wrote:
>>>>> >>>>>>> Do you have public test case?
>>>>> >>>>>>> It is possible to code this delay slot write test but real issue may
>>>>> >>>>>>> be corruption elsewhere.
>>>>> >>>>
>>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr.
>>>>> >>>>
>>>>> >>>>> In theory there could be multiple issues including compiler induced ones.
>>>>> >>>>> I'd prefer to see some kind of reproducible testcase.
>>>>> >>>>
>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and
>>>>> >>>> needed only because the bios entry point is 0x20).
>>>>> >>>>
>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios
>>>>> >>>> test-wrpr.bin -nographic
>>>>> >>>> Already up-to-date.
>>>>> >>>> make[1]: Nothing to be done for `all'.
>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error
>>>>> >>>> Aborted
>>>>> >>>
>>>>> >>> The problem seems to be that wrpr is using a non-local
>>>>> >>> TCG tmp (cpu_tmp0).
>>>>> >>
>>>>> >> Just tried the test case with write to %pil - seems like write itself is OK.
>>>>> >> The issue appears to be with save_state() call since adding save_state
>>>>> >> to %pil case provokes the same tcg abort.
>>>>> >
>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't
>>>>> > need to be saved across helper calls.  This results in the
>>>>> > TCG "optimizer" getting rid of it even though it's later used.
>>>>> > Look at the log and you'll see what I mean :-)
>>>>>
>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach
>>>>> optimizer working with delay slots? Or do I look in the wrong place.
>>>>>
>>>>
>>>> The problem is not on the TCG side, but on the target-sparc/translate.c
>>>> side:
>>>>
>>>> |                    case 0x32: /* wrwim, V9 wrpr */
>>>> |                         {
>>>> |                             if (!supervisor(dc))
>>>> |                                 goto priv_insn;
>>>> |                             tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2);
>>>> | #ifdef TARGET_SPARC64
>>>>
>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not
>>>> saved across TCG branches.
>>>>
>>>> [...]
>>>>
>>>> |                             case 6: // pstate
>>>> |                                 save_state(dc, cpu_cond);
>>>> |                                 gen_helper_wrpstate(cpu_tmp0);
>>>> |                                 dc->npc = DYNAMIC_PC;
>>>> |                                 break;
>>>>
>>>> save_state() calls save_npc(), which in turns might call
>>>> gen_generic_branch():
>>>
>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0.
>>> At least ldd is another example.
>>>
>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2,
>>>> |                                       TCGv r_cond)
>>>> | {
>>>> |     int l1, l2;
>>>> |
>>>> |     l1 = gen_new_label();
>>>> |     l2 = gen_new_label();
>>>> |
>>>> |     tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
>>>> |
>>>> |     tcg_gen_movi_tl(cpu_npc, npc1);
>>>> |     tcg_gen_br(l2);
>>>> |
>>>> |     gen_set_label(l1);
>>>> |     tcg_gen_movi_tl(cpu_npc, npc2);
>>>> |     gen_set_label(l2);
>>>> | }
>>>>
>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0.
>>>>
>>>> The solution is either to rewrite gen_generic_branch() without TCG
>>>> branches, or to use a TCG temp local instead of a TCG temp.
>>>
>>> You mean something like
>>>
>>>                            case 6: // pstate
>>>                                {
>>>                                    TCGv r_temp;
>>>
>>>                                    r_temp = tcg_temp_new();
>>>                                    tcg_gen_mov_tl(r_temp, cpu_tmp0);
>>>                                    save_state(dc, cpu_cond);
>>>                                    gen_helper_wrpstate(r_temp);
>>>                                    tcg_temp_free(r_temp);
>>>                                    dc->npc = DYNAMIC_PC;
>>>                                }
>>>                                break;
>>>
>>> ?
>>> This fails with the same error message.
>>
>> Close, but you need to use tcg_temp_local_new(). Does this work?
>>
>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c
>> index 3c958b2..52fa2f1 100644
>> --- a/target-sparc/translate.c
>> +++ b/target-sparc/translate.c
>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc)
>>                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
>>                                 break;
>>                             case 6: // pstate
>> -                                save_state(dc, cpu_cond);
>> -                                gen_helper_wrpstate(cpu_tmp0);
>> -                                dc->npc = DYNAMIC_PC;
>> +                                {
>> +                                    TCGv r_tmp = tcg_temp_local_new();
>> +
>> +                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
>> +                                    save_state(dc, cpu_cond);
>> +                                    gen_helper_wrpstate(r_tmp);
>> +                                    tcg_temp_free_ptr(r_tmp);
>> +                                    dc->npc = DYNAMIC_PC;
>> +                                }
>>                                 break;
>>                             case 7: // tl
>>                                 save_state(dc, cpu_cond);
>>
>
> Yep. This (with  tcg_temp_free) passes my tests, thanks!
> A bonus question: how is immediate case handled?
>
> I don't see any explicit "if (IS_IMM)" and yet it seems to produce the
> expected results for the immediate instructions.

Immediates are handled by get_src2().

Patch

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 3c958b2..52fa2f1 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -3505,9 +3505,15 @@  static void disas_sparc_insn(DisasContext * dc)
                                 tcg_gen_mov_tl(cpu_tbr, cpu_tmp0);
                                 break;
                             case 6: // pstate
-                                save_state(dc, cpu_cond);
-                                gen_helper_wrpstate(cpu_tmp0);
-                                dc->npc = DYNAMIC_PC;
+                                {
+                                    TCGv r_tmp = tcg_temp_local_new();
+
+                                    tcg_gen_mov_tl(r_tmp, cpu_tmp0);
+                                    save_state(dc, cpu_cond);
+                                    gen_helper_wrpstate(r_tmp);
+                                    tcg_temp_free_ptr(r_tmp);
+                                    dc->npc = DYNAMIC_PC;
+                                }
                                 break;
                             case 7: // tl
                                 save_state(dc, cpu_cond);