Message ID | BANLkTi=Ybivw3H8hKCr2iqXL9uGDBvsozQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote: > On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote: >>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>> <laurent.desnogues@gmail.com> wrote: >>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>> > <igor.v.kovalenko@gmail.com> wrote: >>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>> >> <laurent.desnogues@gmail.com> wrote: >>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>> >>>> <igor.v.kovalenko@gmail.com> wrote: >>>> >>>>>>> Do you have public test case? >>>> >>>>>>> It is possible to code this delay slot write test but real issue may >>>> >>>>>>> be corruption elsewhere. >>>> >>>> >>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr. >>>> >>>> >>>> >>>>> In theory there could be multiple issues including compiler induced ones. >>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>> >>>> >>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >>>> >>>> needed only because the bios entry point is 0x20). >>>> >>>> >>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>> >>>> test-wrpr.bin -nographic >>>> >>>> Already up-to-date. >>>> >>>> make[1]: Nothing to be done for `all'. >>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>> >>>> Aborted >>>> >>> >>>> >>> The problem seems to be that wrpr is using a non-local >>>> >>> TCG tmp (cpu_tmp0). >>>> >> >>>> >> Just tried the test case with write to %pil - seems like write itself is OK. >>>> >> The issue appears to be with save_state() call since adding save_state >>>> >> to %pil case provokes the same tcg abort. >>>> > >>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>> > need to be saved across helper calls. This results in the >>>> > TCG "optimizer" getting rid of it even though it's later used. >>>> > Look at the log and you'll see what I mean :-) >>>> >>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>> optimizer working with delay slots? Or do I look in the wrong place. >>>> >>> >>> The problem is not on the TCG side, but on the target-sparc/translate.c >>> side: >>> >>> | case 0x32: /* wrwim, V9 wrpr */ >>> | { >>> | if (!supervisor(dc)) >>> | goto priv_insn; >>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); >>> | #ifdef TARGET_SPARC64 >>> >>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>> saved across TCG branches. >>> >>> [...] >>> >>> | case 6: // pstate >>> | save_state(dc, cpu_cond); >>> | gen_helper_wrpstate(cpu_tmp0); >>> | dc->npc = DYNAMIC_PC; >>> | break; >>> >>> save_state() calls save_npc(), which in turns might call >>> gen_generic_branch(): >> >> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >> At least ldd is another example. >> >>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, >>> | TCGv r_cond) >>> | { >>> | int l1, l2; >>> | >>> | l1 = gen_new_label(); >>> | l2 = gen_new_label(); >>> | >>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>> | >>> | tcg_gen_movi_tl(cpu_npc, npc1); >>> | tcg_gen_br(l2); >>> | >>> | gen_set_label(l1); >>> | tcg_gen_movi_tl(cpu_npc, npc2); >>> | gen_set_label(l2); >>> | } >>> >>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>> >>> The solution is either to rewrite gen_generic_branch() without TCG >>> branches, or to use a TCG temp local instead of a TCG temp. >> >> You mean something like >> >> case 6: // pstate >> { >> TCGv r_temp; >> >> r_temp = tcg_temp_new(); >> tcg_gen_mov_tl(r_temp, cpu_tmp0); >> save_state(dc, cpu_cond); >> gen_helper_wrpstate(r_temp); >> tcg_temp_free(r_temp); >> dc->npc = DYNAMIC_PC; >> } >> break; >> >> ? >> This fails with the same error message. > > Close, but you need to use tcg_temp_local_new(). Does this work? > > diff --git a/target-sparc/translate.c b/target-sparc/translate.c > index 3c958b2..52fa2f1 100644 > --- a/target-sparc/translate.c > +++ b/target-sparc/translate.c > @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) > tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); > break; > case 6: // pstate > - save_state(dc, cpu_cond); > - gen_helper_wrpstate(cpu_tmp0); > - dc->npc = DYNAMIC_PC; > + { > + TCGv r_tmp = tcg_temp_local_new(); > + > + tcg_gen_mov_tl(r_tmp, cpu_tmp0); > + save_state(dc, cpu_cond); > + gen_helper_wrpstate(r_tmp); > + tcg_temp_free_ptr(r_tmp); > + dc->npc = DYNAMIC_PC; > + } > break; > case 7: // tl > save_state(dc, cpu_cond); > This one works (I have tested similar change but tcg_temp_free() would be OK as well) Could anyone please explain why it still works without being in delay slot? Simply pointing at optimizer does not work :(
On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote: > On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote: >> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote: >>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>>> <laurent.desnogues@gmail.com> wrote: >>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>>> > <igor.v.kovalenko@gmail.com> wrote: >>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>>> >> <laurent.desnogues@gmail.com> wrote: >>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>>> >>>> <igor.v.kovalenko@gmail.com> wrote: >>>>> >>>>>>> Do you have public test case? >>>>> >>>>>>> It is possible to code this delay slot write test but real issue may >>>>> >>>>>>> be corruption elsewhere. >>>>> >>>> >>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr. >>>>> >>>> >>>>> >>>>> In theory there could be multiple issues including compiler induced ones. >>>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>>> >>>> >>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >>>>> >>>> needed only because the bios entry point is 0x20). >>>>> >>>> >>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>>> >>>> test-wrpr.bin -nographic >>>>> >>>> Already up-to-date. >>>>> >>>> make[1]: Nothing to be done for `all'. >>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>>> >>>> Aborted >>>>> >>> >>>>> >>> The problem seems to be that wrpr is using a non-local >>>>> >>> TCG tmp (cpu_tmp0). >>>>> >> >>>>> >> Just tried the test case with write to %pil - seems like write itself is OK. >>>>> >> The issue appears to be with save_state() call since adding save_state >>>>> >> to %pil case provokes the same tcg abort. >>>>> > >>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>>> > need to be saved across helper calls. This results in the >>>>> > TCG "optimizer" getting rid of it even though it's later used. >>>>> > Look at the log and you'll see what I mean :-) >>>>> >>>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>>> optimizer working with delay slots? Or do I look in the wrong place. >>>>> >>>> >>>> The problem is not on the TCG side, but on the target-sparc/translate.c >>>> side: >>>> >>>> | case 0x32: /* wrwim, V9 wrpr */ >>>> | { >>>> | if (!supervisor(dc)) >>>> | goto priv_insn; >>>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); >>>> | #ifdef TARGET_SPARC64 >>>> >>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>>> saved across TCG branches. >>>> >>>> [...] >>>> >>>> | case 6: // pstate >>>> | save_state(dc, cpu_cond); >>>> | gen_helper_wrpstate(cpu_tmp0); >>>> | dc->npc = DYNAMIC_PC; >>>> | break; >>>> >>>> save_state() calls save_npc(), which in turns might call >>>> gen_generic_branch(): >>> >>> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >>> At least ldd is another example. >>> >>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, >>>> | TCGv r_cond) >>>> | { >>>> | int l1, l2; >>>> | >>>> | l1 = gen_new_label(); >>>> | l2 = gen_new_label(); >>>> | >>>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>>> | >>>> | tcg_gen_movi_tl(cpu_npc, npc1); >>>> | tcg_gen_br(l2); >>>> | >>>> | gen_set_label(l1); >>>> | tcg_gen_movi_tl(cpu_npc, npc2); >>>> | gen_set_label(l2); >>>> | } >>>> >>>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>>> >>>> The solution is either to rewrite gen_generic_branch() without TCG >>>> branches, or to use a TCG temp local instead of a TCG temp. >>> >>> You mean something like >>> >>> case 6: // pstate >>> { >>> TCGv r_temp; >>> >>> r_temp = tcg_temp_new(); >>> tcg_gen_mov_tl(r_temp, cpu_tmp0); >>> save_state(dc, cpu_cond); >>> gen_helper_wrpstate(r_temp); >>> tcg_temp_free(r_temp); >>> dc->npc = DYNAMIC_PC; >>> } >>> break; >>> >>> ? >>> This fails with the same error message. >> >> Close, but you need to use tcg_temp_local_new(). Does this work? >> >> diff --git a/target-sparc/translate.c b/target-sparc/translate.c >> index 3c958b2..52fa2f1 100644 >> --- a/target-sparc/translate.c >> +++ b/target-sparc/translate.c >> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) >> tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); >> break; >> case 6: // pstate >> - save_state(dc, cpu_cond); >> - gen_helper_wrpstate(cpu_tmp0); >> - dc->npc = DYNAMIC_PC; >> + { >> + TCGv r_tmp = tcg_temp_local_new(); >> + >> + tcg_gen_mov_tl(r_tmp, cpu_tmp0); >> + save_state(dc, cpu_cond); >> + gen_helper_wrpstate(r_tmp); >> + tcg_temp_free_ptr(r_tmp); >> + dc->npc = DYNAMIC_PC; >> + } >> break; >> case 7: // tl >> save_state(dc, cpu_cond); >> > > This one works (I have tested similar change but tcg_temp_free() would > be OK as well) Thanks for spotting, that was not intentional. > Could anyone please explain why it still works without being in delay > slot? Simply pointing at optimizer does not work :( save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC, which is used for delay slot handling. Otherwise gen_generic_branch() is not called and thus brcond does not get used, so the temporary will not be flushed for non-delay slot cases.
On Wed, Apr 27, 2011 at 12:07 AM, Blue Swirl <blauwirbel@gmail.com> wrote: > On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko > <igor.v.kovalenko@gmail.com> wrote: >> On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote: >>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote: >>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>>>> <laurent.desnogues@gmail.com> wrote: >>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>>>> > <igor.v.kovalenko@gmail.com> wrote: >>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>>>> >> <laurent.desnogues@gmail.com> wrote: >>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>>>> >>>> <igor.v.kovalenko@gmail.com> wrote: >>>>>> >>>>>>> Do you have public test case? >>>>>> >>>>>>> It is possible to code this delay slot write test but real issue may >>>>>> >>>>>>> be corruption elsewhere. >>>>>> >>>> >>>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr. >>>>>> >>>> >>>>>> >>>>> In theory there could be multiple issues including compiler induced ones. >>>>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>>>> >>>> >>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >>>>>> >>>> needed only because the bios entry point is 0x20). >>>>>> >>>> >>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>>>> >>>> test-wrpr.bin -nographic >>>>>> >>>> Already up-to-date. >>>>>> >>>> make[1]: Nothing to be done for `all'. >>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>>>> >>>> Aborted >>>>>> >>> >>>>>> >>> The problem seems to be that wrpr is using a non-local >>>>>> >>> TCG tmp (cpu_tmp0). >>>>>> >> >>>>>> >> Just tried the test case with write to %pil - seems like write itself is OK. >>>>>> >> The issue appears to be with save_state() call since adding save_state >>>>>> >> to %pil case provokes the same tcg abort. >>>>>> > >>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>>>> > need to be saved across helper calls. This results in the >>>>>> > TCG "optimizer" getting rid of it even though it's later used. >>>>>> > Look at the log and you'll see what I mean :-) >>>>>> >>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>>>> optimizer working with delay slots? Or do I look in the wrong place. >>>>>> >>>>> >>>>> The problem is not on the TCG side, but on the target-sparc/translate.c >>>>> side: >>>>> >>>>> | case 0x32: /* wrwim, V9 wrpr */ >>>>> | { >>>>> | if (!supervisor(dc)) >>>>> | goto priv_insn; >>>>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); >>>>> | #ifdef TARGET_SPARC64 >>>>> >>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>>>> saved across TCG branches. >>>>> >>>>> [...] >>>>> >>>>> | case 6: // pstate >>>>> | save_state(dc, cpu_cond); >>>>> | gen_helper_wrpstate(cpu_tmp0); >>>>> | dc->npc = DYNAMIC_PC; >>>>> | break; >>>>> >>>>> save_state() calls save_npc(), which in turns might call >>>>> gen_generic_branch(): >>>> >>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >>>> At least ldd is another example. >>>> >>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, >>>>> | TCGv r_cond) >>>>> | { >>>>> | int l1, l2; >>>>> | >>>>> | l1 = gen_new_label(); >>>>> | l2 = gen_new_label(); >>>>> | >>>>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>>>> | >>>>> | tcg_gen_movi_tl(cpu_npc, npc1); >>>>> | tcg_gen_br(l2); >>>>> | >>>>> | gen_set_label(l1); >>>>> | tcg_gen_movi_tl(cpu_npc, npc2); >>>>> | gen_set_label(l2); >>>>> | } >>>>> >>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>>>> >>>>> The solution is either to rewrite gen_generic_branch() without TCG >>>>> branches, or to use a TCG temp local instead of a TCG temp. >>>> >>>> You mean something like >>>> >>>> case 6: // pstate >>>> { >>>> TCGv r_temp; >>>> >>>> r_temp = tcg_temp_new(); >>>> tcg_gen_mov_tl(r_temp, cpu_tmp0); >>>> save_state(dc, cpu_cond); >>>> gen_helper_wrpstate(r_temp); >>>> tcg_temp_free(r_temp); >>>> dc->npc = DYNAMIC_PC; >>>> } >>>> break; >>>> >>>> ? >>>> This fails with the same error message. >>> >>> Close, but you need to use tcg_temp_local_new(). Does this work? >>> >>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c >>> index 3c958b2..52fa2f1 100644 >>> --- a/target-sparc/translate.c >>> +++ b/target-sparc/translate.c >>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) >>> tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); >>> break; >>> case 6: // pstate >>> - save_state(dc, cpu_cond); >>> - gen_helper_wrpstate(cpu_tmp0); >>> - dc->npc = DYNAMIC_PC; >>> + { >>> + TCGv r_tmp = tcg_temp_local_new(); >>> + >>> + tcg_gen_mov_tl(r_tmp, cpu_tmp0); >>> + save_state(dc, cpu_cond); >>> + gen_helper_wrpstate(r_tmp); >>> + tcg_temp_free_ptr(r_tmp); >>> + dc->npc = DYNAMIC_PC; >>> + } >>> break; >>> case 7: // tl >>> save_state(dc, cpu_cond); >>> >> >> This one works (I have tested similar change but tcg_temp_free() would >> be OK as well) > > Thanks for spotting, that was not intentional. > >> Could anyone please explain why it still works without being in delay >> slot? Simply pointing at optimizer does not work :( > > save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC, > which is used for delay slot handling. Otherwise gen_generic_branch() > is not called and thus brcond does not get used, so the temporary > will not be flushed for non-delay slot cases. Great that explains it all, thanks! I think audit over translate.c would be needed, alternatively all main switch temps for sparc64 could be made local as hinted by Laurent - provided there is no significant speed impact.
On Tue, Apr 26, 2011 at 8:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote: > On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote: >>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>> <laurent.desnogues@gmail.com> wrote: >>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>> > <igor.v.kovalenko@gmail.com> wrote: >>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>> >> <laurent.desnogues@gmail.com> wrote: >>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>> >>>> <igor.v.kovalenko@gmail.com> wrote: >>>> >>>>>>> Do you have public test case? >>>> >>>>>>> It is possible to code this delay slot write test but real issue may >>>> >>>>>>> be corruption elsewhere. >>>> >>>> >>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr. >>>> >>>> >>>> >>>>> In theory there could be multiple issues including compiler induced ones. >>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>> >>>> >>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >>>> >>>> needed only because the bios entry point is 0x20). >>>> >>>> >>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>> >>>> test-wrpr.bin -nographic >>>> >>>> Already up-to-date. >>>> >>>> make[1]: Nothing to be done for `all'. >>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>> >>>> Aborted >>>> >>> >>>> >>> The problem seems to be that wrpr is using a non-local >>>> >>> TCG tmp (cpu_tmp0). >>>> >> >>>> >> Just tried the test case with write to %pil - seems like write itself is OK. >>>> >> The issue appears to be with save_state() call since adding save_state >>>> >> to %pil case provokes the same tcg abort. >>>> > >>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>> > need to be saved across helper calls. This results in the >>>> > TCG "optimizer" getting rid of it even though it's later used. >>>> > Look at the log and you'll see what I mean :-) >>>> >>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>> optimizer working with delay slots? Or do I look in the wrong place. >>>> >>> >>> The problem is not on the TCG side, but on the target-sparc/translate.c >>> side: >>> >>> | case 0x32: /* wrwim, V9 wrpr */ >>> | { >>> | if (!supervisor(dc)) >>> | goto priv_insn; >>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); >>> | #ifdef TARGET_SPARC64 >>> >>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>> saved across TCG branches. >>> >>> [...] >>> >>> | case 6: // pstate >>> | save_state(dc, cpu_cond); >>> | gen_helper_wrpstate(cpu_tmp0); >>> | dc->npc = DYNAMIC_PC; >>> | break; >>> >>> save_state() calls save_npc(), which in turns might call >>> gen_generic_branch(): >> >> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >> At least ldd is another example. >> >>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, >>> | TCGv r_cond) >>> | { >>> | int l1, l2; >>> | >>> | l1 = gen_new_label(); >>> | l2 = gen_new_label(); >>> | >>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>> | >>> | tcg_gen_movi_tl(cpu_npc, npc1); >>> | tcg_gen_br(l2); >>> | >>> | gen_set_label(l1); >>> | tcg_gen_movi_tl(cpu_npc, npc2); >>> | gen_set_label(l2); >>> | } >>> >>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>> >>> The solution is either to rewrite gen_generic_branch() without TCG >>> branches, or to use a TCG temp local instead of a TCG temp. >> >> You mean something like >> >> case 6: // pstate >> { >> TCGv r_temp; >> >> r_temp = tcg_temp_new(); >> tcg_gen_mov_tl(r_temp, cpu_tmp0); >> save_state(dc, cpu_cond); >> gen_helper_wrpstate(r_temp); >> tcg_temp_free(r_temp); >> dc->npc = DYNAMIC_PC; >> } >> break; >> >> ? >> This fails with the same error message. > > Close, but you need to use tcg_temp_local_new(). Does this work? > > diff --git a/target-sparc/translate.c b/target-sparc/translate.c > index 3c958b2..52fa2f1 100644 > --- a/target-sparc/translate.c > +++ b/target-sparc/translate.c > @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) > tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); > break; > case 6: // pstate > - save_state(dc, cpu_cond); > - gen_helper_wrpstate(cpu_tmp0); > - dc->npc = DYNAMIC_PC; > + { > + TCGv r_tmp = tcg_temp_local_new(); > + > + tcg_gen_mov_tl(r_tmp, cpu_tmp0); > + save_state(dc, cpu_cond); > + gen_helper_wrpstate(r_tmp); > + tcg_temp_free_ptr(r_tmp); > + dc->npc = DYNAMIC_PC; > + } > break; > case 7: // tl > save_state(dc, cpu_cond); > Yep. This (with tcg_temp_free) passes my tests, thanks! A bonus question: how is immediate case handled? I don't see any explicit "if (IS_IMM)" and yet it seems to produce the expected results for the immediate instructions.
On Wed, Apr 27, 2011 at 12:35 AM, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote: > On Wed, Apr 27, 2011 at 12:07 AM, Blue Swirl <blauwirbel@gmail.com> wrote: >> On Tue, Apr 26, 2011 at 10:07 PM, Igor Kovalenko >> <igor.v.kovalenko@gmail.com> wrote: >>> On Tue, Apr 26, 2011 at 10:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote: >>>> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote: >>>>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>>>>> <laurent.desnogues@gmail.com> wrote: >>>>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>>>>> > <igor.v.kovalenko@gmail.com> wrote: >>>>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>>>>> >> <laurent.desnogues@gmail.com> wrote: >>>>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>>>>> >>>> <igor.v.kovalenko@gmail.com> wrote: >>>>>>> >>>>>>> Do you have public test case? >>>>>>> >>>>>>> It is possible to code this delay slot write test but real issue may >>>>>>> >>>>>>> be corruption elsewhere. >>>>>>> >>>> >>>>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr. >>>>>>> >>>> >>>>>>> >>>>> In theory there could be multiple issues including compiler induced ones. >>>>>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>>>>> >>>> >>>>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >>>>>>> >>>> needed only because the bios entry point is 0x20). >>>>>>> >>>> >>>>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>>>>> >>>> test-wrpr.bin -nographic >>>>>>> >>>> Already up-to-date. >>>>>>> >>>> make[1]: Nothing to be done for `all'. >>>>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>>>>> >>>> Aborted >>>>>>> >>> >>>>>>> >>> The problem seems to be that wrpr is using a non-local >>>>>>> >>> TCG tmp (cpu_tmp0). >>>>>>> >> >>>>>>> >> Just tried the test case with write to %pil - seems like write itself is OK. >>>>>>> >> The issue appears to be with save_state() call since adding save_state >>>>>>> >> to %pil case provokes the same tcg abort. >>>>>>> > >>>>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>>>>> > need to be saved across helper calls. This results in the >>>>>>> > TCG "optimizer" getting rid of it even though it's later used. >>>>>>> > Look at the log and you'll see what I mean :-) >>>>>>> >>>>>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>>>>> optimizer working with delay slots? Or do I look in the wrong place. >>>>>>> >>>>>> >>>>>> The problem is not on the TCG side, but on the target-sparc/translate.c >>>>>> side: >>>>>> >>>>>> | case 0x32: /* wrwim, V9 wrpr */ >>>>>> | { >>>>>> | if (!supervisor(dc)) >>>>>> | goto priv_insn; >>>>>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); >>>>>> | #ifdef TARGET_SPARC64 >>>>>> >>>>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>>>>> saved across TCG branches. >>>>>> >>>>>> [...] >>>>>> >>>>>> | case 6: // pstate >>>>>> | save_state(dc, cpu_cond); >>>>>> | gen_helper_wrpstate(cpu_tmp0); >>>>>> | dc->npc = DYNAMIC_PC; >>>>>> | break; >>>>>> >>>>>> save_state() calls save_npc(), which in turns might call >>>>>> gen_generic_branch(): >>>>> >>>>> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >>>>> At least ldd is another example. >>>>> >>>>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, >>>>>> | TCGv r_cond) >>>>>> | { >>>>>> | int l1, l2; >>>>>> | >>>>>> | l1 = gen_new_label(); >>>>>> | l2 = gen_new_label(); >>>>>> | >>>>>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>>>>> | >>>>>> | tcg_gen_movi_tl(cpu_npc, npc1); >>>>>> | tcg_gen_br(l2); >>>>>> | >>>>>> | gen_set_label(l1); >>>>>> | tcg_gen_movi_tl(cpu_npc, npc2); >>>>>> | gen_set_label(l2); >>>>>> | } >>>>>> >>>>>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>>>>> >>>>>> The solution is either to rewrite gen_generic_branch() without TCG >>>>>> branches, or to use a TCG temp local instead of a TCG temp. >>>>> >>>>> You mean something like >>>>> >>>>> case 6: // pstate >>>>> { >>>>> TCGv r_temp; >>>>> >>>>> r_temp = tcg_temp_new(); >>>>> tcg_gen_mov_tl(r_temp, cpu_tmp0); >>>>> save_state(dc, cpu_cond); >>>>> gen_helper_wrpstate(r_temp); >>>>> tcg_temp_free(r_temp); >>>>> dc->npc = DYNAMIC_PC; >>>>> } >>>>> break; >>>>> >>>>> ? >>>>> This fails with the same error message. >>>> >>>> Close, but you need to use tcg_temp_local_new(). Does this work? >>>> >>>> diff --git a/target-sparc/translate.c b/target-sparc/translate.c >>>> index 3c958b2..52fa2f1 100644 >>>> --- a/target-sparc/translate.c >>>> +++ b/target-sparc/translate.c >>>> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) >>>> tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); >>>> break; >>>> case 6: // pstate >>>> - save_state(dc, cpu_cond); >>>> - gen_helper_wrpstate(cpu_tmp0); >>>> - dc->npc = DYNAMIC_PC; >>>> + { >>>> + TCGv r_tmp = tcg_temp_local_new(); >>>> + >>>> + tcg_gen_mov_tl(r_tmp, cpu_tmp0); >>>> + save_state(dc, cpu_cond); >>>> + gen_helper_wrpstate(r_tmp); >>>> + tcg_temp_free_ptr(r_tmp); >>>> + dc->npc = DYNAMIC_PC; >>>> + } >>>> break; >>>> case 7: // tl >>>> save_state(dc, cpu_cond); >>>> >>> >>> This one works (I have tested similar change but tcg_temp_free() would >>> be OK as well) >> >> Thanks for spotting, that was not intentional. >> >>> Could anyone please explain why it still works without being in delay >>> slot? Simply pointing at optimizer does not work :( >> >> save_npc() will call gen_generic_branch() only if dc->npc == JUMP_PC, >> which is used for delay slot handling. Otherwise gen_generic_branch() >> is not called and thus brcond does not get used, so the temporary >> will not be flushed for non-delay slot cases. > > Great that explains it all, thanks! I think audit over translate.c > would be needed, alternatively all main switch temps for sparc64 could > be made local as hinted by Laurent - provided there is no significant > speed impact. I think there may be a few other cases. Local temps are allocated from stack instead of CPU registers, so there could be a speed impact. It would be better if this case could be detected more reliably instead, for example enhancing the debugging mode for TCG.
On Wed, Apr 27, 2011 at 7:29 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: > On Tue, Apr 26, 2011 at 8:36 PM, Blue Swirl <blauwirbel@gmail.com> wrote: >> On Tue, Apr 26, 2011 at 8:02 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>> On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurelien@aurel32.net> wrote: >>>> On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >>>>> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >>>>> <laurent.desnogues@gmail.com> wrote: >>>>> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >>>>> > <igor.v.kovalenko@gmail.com> wrote: >>>>> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >>>>> >> <laurent.desnogues@gmail.com> wrote: >>>>> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4qemu@gmail.com> wrote: >>>>> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >>>>> >>>> <igor.v.kovalenko@gmail.com> wrote: >>>>> >>>>>>> Do you have public test case? >>>>> >>>>>>> It is possible to code this delay slot write test but real issue may >>>>> >>>>>>> be corruption elsewhere. >>>>> >>>> >>>>> >>>> The test case is trivial: it's just the two instructions, branch and wrpr. >>>>> >>>> >>>>> >>>>> In theory there could be multiple issues including compiler induced ones. >>>>> >>>>> I'd prefer to see some kind of reproducible testcase. >>>>> >>>> >>>>> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >>>>> >>>> needed only because the bios entry point is 0x20). >>>>> >>>> >>>>> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >>>>> >>>> test-wrpr.bin -nographic >>>>> >>>> Already up-to-date. >>>>> >>>> make[1]: Nothing to be done for `all'. >>>>> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >>>>> >>>> Aborted >>>>> >>> >>>>> >>> The problem seems to be that wrpr is using a non-local >>>>> >>> TCG tmp (cpu_tmp0). >>>>> >> >>>>> >> Just tried the test case with write to %pil - seems like write itself is OK. >>>>> >> The issue appears to be with save_state() call since adding save_state >>>>> >> to %pil case provokes the same tcg abort. >>>>> > >>>>> > The problem is that cpu_tmp0, not being a local tmp, doesn't >>>>> > need to be saved across helper calls. This results in the >>>>> > TCG "optimizer" getting rid of it even though it's later used. >>>>> > Look at the log and you'll see what I mean :-) >>>>> >>>>> I'm not very comfortable with tcg yet. Would it be possible to teach >>>>> optimizer working with delay slots? Or do I look in the wrong place. >>>>> >>>> >>>> The problem is not on the TCG side, but on the target-sparc/translate.c >>>> side: >>>> >>>> | case 0x32: /* wrwim, V9 wrpr */ >>>> | { >>>> | if (!supervisor(dc)) >>>> | goto priv_insn; >>>> | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); >>>> | #ifdef TARGET_SPARC64 >>>> >>>> Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not >>>> saved across TCG branches. >>>> >>>> [...] >>>> >>>> | case 6: // pstate >>>> | save_state(dc, cpu_cond); >>>> | gen_helper_wrpstate(cpu_tmp0); >>>> | dc->npc = DYNAMIC_PC; >>>> | break; >>>> >>>> save_state() calls save_npc(), which in turns might call >>>> gen_generic_branch(): >>> >>> Hmm. This is not the only instruction using save_state() and cpu_tmp0. >>> At least ldd is another example. >>> >>>> | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, >>>> | TCGv r_cond) >>>> | { >>>> | int l1, l2; >>>> | >>>> | l1 = gen_new_label(); >>>> | l2 = gen_new_label(); >>>> | >>>> | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); >>>> | >>>> | tcg_gen_movi_tl(cpu_npc, npc1); >>>> | tcg_gen_br(l2); >>>> | >>>> | gen_set_label(l1); >>>> | tcg_gen_movi_tl(cpu_npc, npc2); >>>> | gen_set_label(l2); >>>> | } >>>> >>>> And here is the TCG branch, which drop the TCG temp cpu_temp0. >>>> >>>> The solution is either to rewrite gen_generic_branch() without TCG >>>> branches, or to use a TCG temp local instead of a TCG temp. >>> >>> You mean something like >>> >>> case 6: // pstate >>> { >>> TCGv r_temp; >>> >>> r_temp = tcg_temp_new(); >>> tcg_gen_mov_tl(r_temp, cpu_tmp0); >>> save_state(dc, cpu_cond); >>> gen_helper_wrpstate(r_temp); >>> tcg_temp_free(r_temp); >>> dc->npc = DYNAMIC_PC; >>> } >>> break; >>> >>> ? >>> This fails with the same error message. >> >> Close, but you need to use tcg_temp_local_new(). Does this work? >> >> diff --git a/target-sparc/translate.c b/target-sparc/translate.c >> index 3c958b2..52fa2f1 100644 >> --- a/target-sparc/translate.c >> +++ b/target-sparc/translate.c >> @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) >> tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); >> break; >> case 6: // pstate >> - save_state(dc, cpu_cond); >> - gen_helper_wrpstate(cpu_tmp0); >> - dc->npc = DYNAMIC_PC; >> + { >> + TCGv r_tmp = tcg_temp_local_new(); >> + >> + tcg_gen_mov_tl(r_tmp, cpu_tmp0); >> + save_state(dc, cpu_cond); >> + gen_helper_wrpstate(r_tmp); >> + tcg_temp_free_ptr(r_tmp); >> + dc->npc = DYNAMIC_PC; >> + } >> break; >> case 7: // tl >> save_state(dc, cpu_cond); >> > > Yep. This (with tcg_temp_free) passes my tests, thanks! > A bonus question: how is immediate case handled? > > I don't see any explicit "if (IS_IMM)" and yet it seems to produce the > expected results for the immediate instructions. Immediates are handled by get_src2().
diff --git a/target-sparc/translate.c b/target-sparc/translate.c index 3c958b2..52fa2f1 100644 --- a/target-sparc/translate.c +++ b/target-sparc/translate.c @@ -3505,9 +3505,15 @@ static void disas_sparc_insn(DisasContext * dc) tcg_gen_mov_tl(cpu_tbr, cpu_tmp0); break; case 6: // pstate - save_state(dc, cpu_cond); - gen_helper_wrpstate(cpu_tmp0); - dc->npc = DYNAMIC_PC; + { + TCGv r_tmp = tcg_temp_local_new(); + + tcg_gen_mov_tl(r_tmp, cpu_tmp0); + save_state(dc, cpu_cond); + gen_helper_wrpstate(r_tmp); + tcg_temp_free_ptr(r_tmp); + dc->npc = DYNAMIC_PC; + } break; case 7: // tl save_state(dc, cpu_cond);