Message ID | CAFULd4Ya82yPJc3rvgH=OHxP5+pz7-MFovj_nBGWFEyjuGVOgg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Sun, 2012-11-04 at 14:52 +0100, Uros Bizjak wrote: > Hello! > > Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper > has to be placed before function exit. However, when compiling > following test > > --cut here-- > typedef struct objc_class *Class; > typedef struct objc_object > { > Class class_pointer; > } *id; > > typedef const struct objc_selector *SEL; > typedef void * retval_t; > typedef void * arglist_t; > > extern retval_t __objc_forward (id object, SEL sel, arglist_t args); > > double > __objc_double_forward (id rcv, SEL op, ...) > { > void *args, *res; > > args = __builtin_apply_args (); > res = __objc_forward (rcv, op, args); > __builtin_return (res); > } > --cut here-- > > __builtin_return emits a sequence of loads from "argument area" to all > possible function return registers in their widest mode (together with > corresponding USE RTXes), creating: > > (insn 31 30 33 2 (set (reg:DI 0 ax) > (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 33 31 35 2 (set (reg:XF 8 st) > (mem:XF (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 35 33 32 2 (set (reg:OI 21 xmm0) > (mem:OI (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1 > (nil)) > > This sequence breaks assumption in mode-switching.c, that final > function return register load operates in MODE_EXIT mode and triggere > following code: > > for (j = n_entities - 1; j >= 0; j--) > { > int e = entity_map[j]; > int mode = MODE_NEEDED (e, return_copy); > > if (mode != num_modes[e] && mode != MODE_EXIT (e)) > break; > } > > As discussed above, modes of loads, generated from __builtin_apply > have no connection to function return mode. mode-switching.c does > detect __builtin_apply situation and raises maybe_builtin_apply flag, > but doesn't use it to short-circuit wrong check. In proposed patch, we > detect this situation and raise force_late_switch in the same way, as > SH4 does for its "late" fpscr emission. For the testcase above, > following RTX sequence is generated: > > (insn 31 30 33 2 (set (reg:DI 0 ax) > (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) > avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64} > (nil)) > (insn 33 31 35 2 (set (reg:XF 8 st) > (mem:XF (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 16 [0x10])) [0 S16 A8])) > avx-vzeroupper-27.c:23 106 {*movxf_internal} > (nil)) > (insn 35 33 52 2 (set (reg:OI 21 xmm0) > (mem:OI (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 32 [0x20])) [0 S32 A8])) > avx-vzeroupper-27.c:23 60 {*movoi_internal_avx} > (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ]) > (nil))) > (insn 52 35 49 2 (unspec_volatile [ > (const_int 9 [0x9]) > ] UNSPECV_VZEROUPPER) -1 > (nil)) > (note 49 52 32 2 NOTE_INSN_DELETED) > (insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1 > (expr_list:REG_DEAD (reg:DI 0 ax) > (nil))) > (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1 > (expr_list:REG_DEAD (reg:XF 8 st) > (nil))) > (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1 > > where mode switching correctly placed vzeroupper insn, based on > MODE_EXIT mode, determined from final use in (insn 44). > > 2012-11-04 Vladimir Yakovlev <vladimir.b.yakovlev@intel.com> > Uros Bizjak <ubizjak@gmail.com> > > * mode-switching.c (create_pre_exit): Added code for > maybe_builtin_apply case. > > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, > with vzeroupper patch [1] applied. > > I have added SH4 maintainer for possible comments. BTW, there are at least two mode-switching SH PRs: 41933 and 49220. I've tried those test cases with the vzeroupper patch [1] applied. Unfortunately, it doesn't change the situation of the two PRs. Cheers, Oleg
On Sun, Nov 4, 2012 at 3:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote: >> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, >> with vzeroupper patch [1] applied. >> >> I have added SH4 maintainer for possible comments. > > BTW, there are at least two mode-switching SH PRs: 41933 and 49220. > I've tried those test cases with the vzeroupper patch [1] applied. > Unfortunately, it doesn't change the situation of the two PRs. True, this patch fixes very specific case involving __builtin_apply only. Hopefully it doesn't _break_ something on SH. Uros.
On Sun, Nov 4, 2012 at 4:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Sun, Nov 4, 2012 at 3:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote: > >>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, >>> with vzeroupper patch [1] applied. >>> >>> I have added SH4 maintainer for possible comments. >> >> BTW, there are at least two mode-switching SH PRs: 41933 and 49220. >> I've tried those test cases with the vzeroupper patch [1] applied. >> Unfortunately, it doesn't change the situation of the two PRs. > > True, this patch fixes very specific case involving __builtin_apply > only. Hopefully it doesn't _break_ something on SH. FYI, the testcase from PR41993 involving __builtin_return inserts vzeroupper at correct place, even with "-O0 -mavx -vzeroupper", so the ICE seems SH specific. The testcase from PR49220 works OK as well for all compile flags I have tried, although YMM registers are not used, and consequently no vzerouppers are inserted. Uros.
On Sun, Nov 4, 2012 at 4:23 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >>>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, >>>> with vzeroupper patch [1] applied. >>>> >>>> I have added SH4 maintainer for possible comments. >>> >>> BTW, there are at least two mode-switching SH PRs: 41933 and 49220. >>> I've tried those test cases with the vzeroupper patch [1] applied. >>> Unfortunately, it doesn't change the situation of the two PRs. >> >> True, this patch fixes very specific case involving __builtin_apply >> only. Hopefully it doesn't _break_ something on SH. > > FYI, the testcase from PR41993 involving __builtin_return inserts > vzeroupper at correct place, even with "-O0 -mavx -vzeroupper", so the > ICE seems SH specific. The testcase from PR49220 works OK as well for > all compile flags I have tried, although YMM registers are not used, > and consequently no vzerouppers are inserted. Please see PR41993 for follow-up discussion and possible fix. Uros.
Uros Bizjak <ubizjak@gmail.com> wrote: > 2012-11-04 Vladimir Yakovlev <vladimir.b.yakovlev@intel.com> > Uros Bizjak <ubizjak@gmail.com> > > * mode-switching.c (create_pre_exit): Added code for > maybe_builtin_apply case. > > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, > with vzeroupper patch [1] applied. > > I have added SH4 maintainer for possible comments. I've confirmed that there are no new failures with the patch on sh4-unknown-linux-gnu. BTW, it looks that the copyright year of mode-switching.c should be updated. Regards, kaz
> This sequence breaks assumption in mode-switching.c, that final > function return register load operates in MODE_EXIT mode and triggere > following code: > > for (j = n_entities - 1; j >= 0; j--) > { > int e = entity_map[j]; > int mode = MODE_NEEDED (e, return_copy); > > if (mode != num_modes[e] && mode != MODE_EXIT (e)) > break; > } > > As discussed above, modes of loads, generated from __builtin_apply > have no connection to function return mode. mode-switching.c does > detect __builtin_apply situation and raises maybe_builtin_apply flag, > but doesn't use it to short-circuit wrong check. In proposed patch, we > detect this situation and raise force_late_switch in the same way, as > SH4 does for its "late" fpscr emission. If I understand correctly, we need to insert the vzeroupper because the function returns double in SSE registers but we generate an OImode load instead of a DFmode load because of the __builtin_return. So we're in the forced_late_switch case but we fail to recognize the tweaked return value load since the number of registers doesn't match. If so, I'd rather add another special case, like for the SH4, instead of a generic bypass for maybe_builtin_apply, something along the lines of: /* For the x86 with AVX, we might be using a larger load for a value returned in SSE registers and we want to put the final mode switch after this return value copy. */ if (copy_start == ret_start && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)] && copy_num >= nregs && OBJECT_P (SET_SRC (return_copy_pat))) forced_late_switch = 1; > 2012-11-04 Vladimir Yakovlev <vladimir.b.yakovlev@intel.com> > Uros Bizjak <ubizjak@gmail.com> > > * mode-switching.c (create_pre_exit): Added code for > maybe_builtin_apply case. Present tense in ChangeLog.
Index: mode-switching.c =================================================================== --- mode-switching.c (revision 193133) +++ mode-switching.c (working copy) @@ -342,6 +342,16 @@ create_pre_exit (int n_entities, int *entity_map, } if (j >= 0) { + /* __builtin_return emits a sequence of loads to all + function value registers in their widest mode, + which breaks the assumption on the mode of the + return register load. Allow this situation, so the + final mode switch will be emitted after the load. */ + if (maybe_builtin_apply + && targetm.calls.function_value_regno_p + (copy_start)) + forced_late_switch = 1; + /* For the SH4, floating point loads depend on fpscr, thus we might need to put the final mode switch after the return value copy. That is still OK,