
[middle-end]: Fix mode-switching MODE_EXIT check with __builtin_apply/__builtin_return

Message ID CAFULd4ZtkLtw3rzoPGaAGCiH3z8K7dGdvq8iZqsyN5-ZTUFY+g@mail.gmail.com

Commit Message

Uros Bizjak Nov. 5, 2012, 6:34 p.m. UTC
On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> This sequence breaks the assumption in mode-switching.c that the final
>> function return register load operates in MODE_EXIT mode, and triggers
>> the following code:
>>
>>                   for (j = n_entities - 1; j >= 0; j--)
>>                     {
>>                       int e = entity_map[j];
>>                       int mode = MODE_NEEDED (e, return_copy);
>>
>>                       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>                         break;
>>                     }
>>
>> As discussed above, the modes of the loads generated from __builtin_apply
>> have no connection to the function return mode. mode-switching.c does
>> detect the __builtin_apply situation and sets the maybe_builtin_apply flag,
>> but doesn't use it to short-circuit the wrong check. In the proposed patch,
>> we detect this situation and set forced_late_switch in the same way as
>> SH4 does for its "late" fpscr emission.
>
> If I understand correctly, we need to insert the vzeroupper because the
> function returns double in SSE registers but we generate an OImode load
> instead of a DFmode load because of the __builtin_return.  So we're in the
> forced_late_switch case but we fail to recognize the tweaked return value load
> since the number of registers doesn't match.
>
> If so, I'd rather add another special case, like for the SH4, instead of a
> generic bypass for maybe_builtin_apply, something along the lines of:
>
>         /* For the x86 with AVX, we might be using a larger load for a value
>            returned in SSE registers and we want to put the final mode switch
>            after this return value copy.  */
>         if (copy_start == ret_start
>             && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>             && copy_num >= nregs
>             && OBJECT_P (SET_SRC (return_copy_pat)))
>           forced_late_switch = 1;

Yes, this approach also works.

I assume it is OK to commit the attached patch?

2012-11-05  Eric Botcazou  <ebotcazou@adacore.com>
	    Uros Bizjak  <ubizjak@gmail.com>

	* mode-switching.c (create_pre_exit): Force late switch in the
	__builtin_return case when a value returned in an SSE register
	was loaded using an OImode load.

Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c

Thanks,
Uros.
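
For readers coming to the thread cold: __builtin_apply_args, __builtin_apply
and __builtin_return are normally used together to forward a call without
knowing its signature.  A minimal sketch (not part of the original exchange;
the names target/forward and the 64-byte argument size are made up for
illustration):

double
target (double x)
{
  return x * 2.0;
}

double
forward (double x)
{
  /* Capture the incoming argument registers and pending stack arguments.  */
  void *args = __builtin_apply_args ();

  /* Re-issue the call with the captured arguments; 64 bytes is an assumed
     upper bound on the stack arguments to copy.  */
  void *result = __builtin_apply ((void (*) ()) target, args, 64);

  /* Reload every possible return register from the result block and return.
     This blanket reload is what can introduce, e.g., an OImode load of an
     SSE register under AVX even when the actual return value lives
     elsewhere.  */
  __builtin_return (result);
}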

Comments

Uros Bizjak Nov. 5, 2012, 6:43 p.m. UTC | #1
On Mon, Nov 5, 2012 at 7:34 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>> This sequence breaks the assumption in mode-switching.c that the final
>>> function return register load operates in MODE_EXIT mode, and triggers
>>> the following code:
>>>
>>>                   for (j = n_entities - 1; j >= 0; j--)
>>>                     {
>>>                       int e = entity_map[j];
>>>                       int mode = MODE_NEEDED (e, return_copy);
>>>
>>>                       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>>                         break;
>>>                     }
>>>
>>> As discussed above, the modes of the loads generated from __builtin_apply
>>> have no connection to the function return mode. mode-switching.c does
>>> detect the __builtin_apply situation and sets the maybe_builtin_apply flag,
>>> but doesn't use it to short-circuit the wrong check. In the proposed patch,
>>> we detect this situation and set forced_late_switch in the same way as
>>> SH4 does for its "late" fpscr emission.
>>
>> If I understand correctly, we need to insert the vzeroupper because the
>> function returns double in SSE registers but we generate an OImode load
>> instead of a DFmode load because of the __builtin_return.  So we're in the
>> forced_late_switch case but we fail to recognize the tweaked return value load
>> since the number of registers doesn't match.
>>
>> If so, I'd rather add another special case, like for the SH4, instead of a
>> generic bypass for maybe_builtin_apply, something along the lines of:
>>
>>         /* For the x86 with AVX, we might be using a larger load for a value
>>            returned in SSE registers and we want to put the final mode switch
>>            after this return value copy.  */
>>         if (copy_start == ret_start
>>             && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>>             && copy_num >= nregs
>>             && OBJECT_P (SET_SRC (return_copy_pat)))
>>           forced_late_switch = 1;
>
> Yes, this approach also works.
>
> I assume it is OK to commit the attached patch?
>
> 2012-11-05  Eric Botcazou  <ebotcazou@adacore.com>
>             Uros Bizjak  <ubizjak@gmail.com>
>
>         * mode-switching.c (create_pre_exit): Force late switch in the
>         __builtin_return case when a value returned in an SSE register
>         was loaded using an OImode load.
>
> Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c

Unfortunately the proposed patch fails the testcase from PR41993:

--cut here--
short retframe_short (void *rframe)
{
    __builtin_return (rframe);
}
--cut here--

Uros.
>
> Thanks,
> Uros.
Uros Bizjak Nov. 5, 2012, 7:07 p.m. UTC | #2
On Mon, Nov 5, 2012 at 7:43 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>>> As discussed above, the modes of the loads generated from __builtin_apply
>>>> have no connection to the function return mode. mode-switching.c does
>>>> detect the __builtin_apply situation and sets the maybe_builtin_apply flag,
>>>> but doesn't use it to short-circuit the wrong check. In the proposed patch,
>>>> we detect this situation and set forced_late_switch in the same way as
>>>> SH4 does for its "late" fpscr emission.
>>>
>>> If I understand correctly, we need to insert the vzeroupper because the
>>> function returns double in SSE registers but we generate an OImode load
>>> instead of a DFmode load because of the __builtin_return.  So we're in the
>>> forced_late_switch case but we fail to recognize the tweaked return value load
>>> since the number of registers doesn't match.

Actually, the complication with __builtin_apply/__builtin_return is
that it blindly loads all possible return registers. So, in the PR41993
case, we load an SSE register using an OImode load (which forces the AVX
dirty state), even if we actually return in %eax. Your patch assumes
that the wide load corresponds to the current function's return register,
which is not the case.

Please note that we enter the above code only when the needed mode of
the processed insn is different from MODE_EXIT. So in the x86 case, we
know that this was due to a hard-register load in the "wrong" mode from
__builtin_{apply,return}. My patch also uses an extra condition: the
hard register should be one of the possible return registers, not just
the current function's return register.

Uros.
Eric Botcazou Nov. 5, 2012, 7:29 p.m. UTC | #3
> Unfortunately the proposed patch fails the testcase from PR41993:
> 
> --cut here--
> short retframe_short (void *rframe)
> {
>     __builtin_return (rframe);
> }

OK, so that's not only an issue with the mode of the return register, but with 
the return register itself.  Then the original patch is OK if the comment is 
reworked, something like:

   /* __builtin_return emits a sequence of loads to all return
      registers.  One of them might require another mode than
      MODE_EXIT, even if it is unrelated to the return value,
      so we want to put the final mode switch after it.  */
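
Combined with the check from the original maybe_builtin_apply-based patch, the
reworked hunk would then look roughly like this (a sketch of the intended
shape only, not necessarily the exact committed text; the use of
targetm.calls.function_value_regno_p to test "any possible return register"
is an assumption).  Note that the patch reproduced in the "Patch" section
below is the earlier AVX-specific version discussed above, not this final
form:

			/* __builtin_return emits a sequence of loads to all
			   return registers.  One of them might require
			   another mode than MODE_EXIT, even if it is
			   unrelated to the return value, so we want to put
			   the final mode switch after it.  */
			if (maybe_builtin_apply
			    && targetm.calls.function_value_regno_p
				 (copy_start))
			  forced_late_switch = 1;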

Patch

Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 193174)
+++ mode-switching.c	(working copy)
@@ -1,6 +1,6 @@ 
 /* CPU mode switching
    Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
-   2009, 2010 Free Software Foundation, Inc.
+   2009, 2010, 2012  Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -342,6 +342,17 @@  create_pre_exit (int n_entities, int *entity_map,
 		      }
 		    if (j >= 0)
 		      {
+			/* For the x86 with AVX, we might be using a larger
+			   load for a value returned in SSE registers and we
+			   want to put the final mode switch after this
+			   return value copy.  */
+			if (copy_start == ret_start
+			    && nregs
+			       == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
+			    && copy_num >= nregs
+			    && OBJECT_P (SET_SRC (return_copy_pat)))
+			  forced_late_switch = 1;
+
 			/* For the SH4, floating point loads depend on fpscr,
 			   thus we might need to put the final mode switch
 			   after the return value copy.  That is still OK,