Patchwork [middle-end] : Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return

login
register
mail settings
Submitter Uros Bizjak
Date Nov. 4, 2012, 1:52 p.m.
Message ID <CAFULd4Ya82yPJc3rvgH=OHxP5+pz7-MFovj_nBGWFEyjuGVOgg@mail.gmail.com>
Download mbox | patch
Permalink /patch/197042/
State New
Headers show

Comments

Uros Bizjak - Nov. 4, 2012, 1:52 p.m.
Hello!

Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper
has to be placed before function exit. However, when compiling
following test

--cut here--
typedef struct objc_class *Class;
typedef struct objc_object
{
  Class class_pointer;
} *id;

typedef const struct objc_selector *SEL;
typedef void * retval_t;
typedef void * arglist_t;

extern retval_t __objc_forward (id object, SEL sel, arglist_t args);

double
__objc_double_forward (id rcv, SEL op, ...)
{
  void *args, *res;

  args = __builtin_apply_args ();
  res = __objc_forward (rcv, op, args);
  __builtin_return (res);
}
--cut here--

__builtin_return emits a sequence of loads from "argument area" to all
possible function return registers in their widest mode (together with
corresponding USE RTXes), creating:

(insn 31 30 33 2 (set (reg:DI 0 ax)
        (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 33 31 35 2 (set (reg:XF 8 st)
        (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 35 33 32 2 (set (reg:OI 21 xmm0)
        (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
     (nil))

This sequence breaks assumption in mode-switching.c, that final
function return register load operates in MODE_EXIT mode and triggere
following code:

		    for (j = n_entities - 1; j >= 0; j--)
		      {
			int e = entity_map[j];
			int mode = MODE_NEEDED (e, return_copy);

			if (mode != num_modes[e] && mode != MODE_EXIT (e))
			  break;
		      }

As discussed above, modes of loads, generated from __builtin_apply
have no connection to function return mode. mode-switching.c does
detect __builtin_apply situation and raises maybe_builtin_apply flag,
but doesn't use it to short-circuit wrong check. In proposed patch, we
detect this situation and raise force_late_switch in the same way, as
SH4 does for its "late" fpscr emission. For the testcase above,
following RTX sequence is generated:

(insn 31 30 33 2 (set (reg:DI 0 ax)
        (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8]))
avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64}
     (nil))
(insn 33 31 35 2 (set (reg:XF 8 st)
        (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 16 [0x10])) [0 S16 A8]))
avx-vzeroupper-27.c:23 106 {*movxf_internal}
     (nil))
(insn 35 33 52 2 (set (reg:OI 21 xmm0)
        (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
                (const_int 32 [0x20])) [0 S32 A8]))
avx-vzeroupper-27.c:23 60 {*movoi_internal_avx}
     (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ])
        (nil)))
(insn 52 35 49 2 (unspec_volatile [
            (const_int 9 [0x9])
        ] UNSPECV_VZEROUPPER) -1
     (nil))
(note 49 52 32 2 NOTE_INSN_DELETED)
(insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
     (expr_list:REG_DEAD (reg:DI 0 ax)
        (nil)))
(insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
     (expr_list:REG_DEAD (reg:XF 8 st)
        (nil)))
(insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
     (nil))
(insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1

where mode switching correctly placed vzeroupper insn, based on
MODE_EXIT mode, determined from final use in (insn 44).

2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
	    Uros Bizjak  <ubizjak@gmail.com>

        * mode-switching.c (create_pre_exit): Added code for
maybe_builtin_apply case.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
with vzeroupper patch [1] applied.

I have added SH4 maintainer for possible comments.

OK for mainline SVN?

[1] http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00292.html

Uros.

--
Oleg Endo - Nov. 4, 2012, 2:58 p.m.
On Sun, 2012-11-04 at 14:52 +0100, Uros Bizjak wrote:
> Hello!
> 
> Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper
> has to be placed before function exit. However, when compiling
> following test
> 
> --cut here--
> typedef struct objc_class *Class;
> typedef struct objc_object
> {
>   Class class_pointer;
> } *id;
> 
> typedef const struct objc_selector *SEL;
> typedef void * retval_t;
> typedef void * arglist_t;
> 
> extern retval_t __objc_forward (id object, SEL sel, arglist_t args);
> 
> double
> __objc_double_forward (id rcv, SEL op, ...)
> {
>   void *args, *res;
> 
>   args = __builtin_apply_args ();
>   res = __objc_forward (rcv, op, args);
>   __builtin_return (res);
> }
> --cut here--
> 
> __builtin_return emits a sequence of loads from "argument area" to all
> possible function return registers in their widest mode (together with
> corresponding USE RTXes), creating:
> 
> (insn 31 30 33 2 (set (reg:DI 0 ax)
>         (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 33 31 35 2 (set (reg:XF 8 st)
>         (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 35 33 32 2 (set (reg:OI 21 xmm0)
>         (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
>      (nil))
> 
> This sequence breaks assumption in mode-switching.c, that final
> function return register load operates in MODE_EXIT mode and triggere
> following code:
> 
> 		    for (j = n_entities - 1; j >= 0; j--)
> 		      {
> 			int e = entity_map[j];
> 			int mode = MODE_NEEDED (e, return_copy);
> 
> 			if (mode != num_modes[e] && mode != MODE_EXIT (e))
> 			  break;
> 		      }
> 
> As discussed above, modes of loads, generated from __builtin_apply
> have no connection to function return mode. mode-switching.c does
> detect __builtin_apply situation and raises maybe_builtin_apply flag,
> but doesn't use it to short-circuit wrong check. In proposed patch, we
> detect this situation and raise force_late_switch in the same way, as
> SH4 does for its "late" fpscr emission. For the testcase above,
> following RTX sequence is generated:
> 
> (insn 31 30 33 2 (set (reg:DI 0 ax)
>         (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8]))
> avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64}
>      (nil))
> (insn 33 31 35 2 (set (reg:XF 8 st)
>         (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 16 [0x10])) [0 S16 A8]))
> avx-vzeroupper-27.c:23 106 {*movxf_internal}
>      (nil))
> (insn 35 33 52 2 (set (reg:OI 21 xmm0)
>         (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 32 [0x20])) [0 S32 A8]))
> avx-vzeroupper-27.c:23 60 {*movoi_internal_avx}
>      (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ])
>         (nil)))
> (insn 52 35 49 2 (unspec_volatile [
>             (const_int 9 [0x9])
>         ] UNSPECV_VZEROUPPER) -1
>      (nil))
> (note 49 52 32 2 NOTE_INSN_DELETED)
> (insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
>      (expr_list:REG_DEAD (reg:DI 0 ax)
>         (nil)))
> (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
>      (expr_list:REG_DEAD (reg:XF 8 st)
>         (nil)))
> (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1
> 
> where mode switching correctly placed vzeroupper insn, based on
> MODE_EXIT mode, determined from final use in (insn 44).
> 
> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.
> 
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
> with vzeroupper patch [1] applied.
> 
> I have added SH4 maintainer for possible comments.

BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
I've tried those test cases with the vzeroupper patch [1] applied.
Unfortunately, it doesn't change the situation of the two PRs.

Cheers,
Oleg
Uros Bizjak - Nov. 4, 2012, 3:02 p.m.
On Sun, Nov 4, 2012 at 3:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote:

>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>> with vzeroupper patch [1] applied.
>>
>> I have added SH4 maintainer for possible comments.
>
> BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
> I've tried those test cases with the vzeroupper patch [1] applied.
> Unfortunately, it doesn't change the situation of the two PRs.

True, this patch fixes very specific case involving __builtin_apply
only.  Hopefully it doesn't _break_ something on SH.

Uros.
Uros Bizjak - Nov. 4, 2012, 3:23 p.m.
On Sun, Nov 4, 2012 at 4:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Nov 4, 2012 at 3:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote:
>
>>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>>> with vzeroupper patch [1] applied.
>>>
>>> I have added SH4 maintainer for possible comments.
>>
>> BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
>> I've tried those test cases with the vzeroupper patch [1] applied.
>> Unfortunately, it doesn't change the situation of the two PRs.
>
> True, this patch fixes very specific case involving __builtin_apply
> only.  Hopefully it doesn't _break_ something on SH.

FYI, the testcase from PR41993 involving __builtin_return inserts
vzeroupper at correct place, even with "-O0 -mavx -vzeroupper", so the
ICE seems SH specific. The testcase from PR49220 works OK as well for
all compile flags I have tried, although YMM registers are not used,
and consequently no vzerouppers are inserted.

Uros.
Uros Bizjak - Nov. 4, 2012, 6:06 p.m.
On Sun, Nov 4, 2012 at 4:23 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
>>>> with vzeroupper patch [1] applied.
>>>>
>>>> I have added SH4 maintainer for possible comments.
>>>
>>> BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
>>> I've tried those test cases with the vzeroupper patch [1] applied.
>>> Unfortunately, it doesn't change the situation of the two PRs.
>>
>> True, this patch fixes very specific case involving __builtin_apply
>> only.  Hopefully it doesn't _break_ something on SH.
>
> FYI, the testcase from PR41993 involving __builtin_return inserts
> vzeroupper at correct place, even with "-O0 -mavx -vzeroupper", so the
> ICE seems SH specific. The testcase from PR49220 works OK as well for
> all compile flags I have tried, although YMM registers are not used,
> and consequently no vzerouppers are inserted.

Please see PR41993 for follow-up discussion and possible fix.

Uros.
Kaz Kojima - Nov. 5, 2012, 9:10 a.m.
Uros Bizjak <ubizjak@gmail.com> wrote:
> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.
> 
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
> with vzeroupper patch [1] applied.
> 
> I have added SH4 maintainer for possible comments.

I've confirmed that there are no new failures with the patch
on sh4-unknown-linux-gnu.
BTW, it looks that the copyright year of mode-switching.c
should be updated.


Regards,
	kaz
Eric Botcazou - Nov. 5, 2012, 6:05 p.m.
> This sequence breaks assumption in mode-switching.c, that final
> function return register load operates in MODE_EXIT mode and triggere
> following code:
> 
> 		    for (j = n_entities - 1; j >= 0; j--)
> 		      {
> 			int e = entity_map[j];
> 			int mode = MODE_NEEDED (e, return_copy);
> 
> 			if (mode != num_modes[e] && mode != MODE_EXIT (e))
> 			  break;
> 		      }
> 
> As discussed above, modes of loads, generated from __builtin_apply
> have no connection to function return mode. mode-switching.c does
> detect __builtin_apply situation and raises maybe_builtin_apply flag,
> but doesn't use it to short-circuit wrong check. In proposed patch, we
> detect this situation and raise force_late_switch in the same way, as
> SH4 does for its "late" fpscr emission.

If I understand correctly, we need to insert the vzeroupper because the 
function returns double in SSE registers but we generate an OImode load 
instead of a DFmode load because of the __builtin_return.  So we're in the 
forced_late_switch case but we fail to recognize the tweaked return value load 
since the number of registers doesn't match.

If so, I'd rather add another special case, like for the SH4, instead of a 
generic bypass for maybe_builtin_apply, something along the lines of:

	/* For the x86 with AVX, we might be using a larger load for a value
	   returned in SSE registers and we want to put the final mode switch
	   after this return value copy.  */
	if (copy_start == ret_start
	    && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
	    && copy_num >= nregs
	    && OBJECT_P (SET_SRC (return_copy_pat)))
	  forced_late_switch = 1;

> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.

Present tense in ChangeLog.

Patch

Index: mode-switching.c
===================================================================
--- mode-switching.c    (revision 193133)
+++ mode-switching.c    (working copy)
@@ -342,6 +342,16 @@  create_pre_exit (int n_entities, int *entity_map,
                      }
                    if (j >= 0)
                      {
+                       /* __builtin_return emits a sequence of loads to all
+                          function value registers in their widest mode,
+                          which breaks the assumption on the mode of the
+                          return register load. Allow this situation, so the
+                          final mode switch will be emitted after the load.  */
+                       if (maybe_builtin_apply
+                           && targetm.calls.function_value_regno_p
+                               (copy_start))
+                         forced_late_switch = 1;
+
                        /* For the SH4, floating point loads depend on fpscr,
                           thus we might need to put the final mode switch
                           after the return value copy.  That is still OK,