diff mbox

RFA: another patch to fix PR61360

Message ID 54218928.4040709@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov Sept. 23, 2014, 2:52 p.m. UTC
On 09/23/2014 02:07 AM, Uros Bizjak wrote:
> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>   The previous patch to solve PR61360 fixed the problem in IRA (it was
>> easier for me to do as I know the code well)
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>>
>>   Although imo it was an ok fix, Richard expressed concerns with the patch
>> and the practice to have different enable attribute values depending on the
>> current pass.
>>
>>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
>> should be disabled.  Even if the path from general regs to sse regs is slow
>> (usually such slow path is implemented internally by micro-architecture
>> through cache).  "x,r" alternative results in only smaller insns (including
>> number of insns) with probably the same time for the movement.  So "x,r"
>> should be at least no slower, insn cache should have more locality, and less
>> overhead for decoding/translating insns.
>>
>>   Here I propose another solution avoiding to have different enable
>> attribute values.
>>
>>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
>> without -march=amdfam10 (actually the patch results in 2 less failures when
>> -march=amdfam10 were used).
>>
>>   Uros, is i386.md change ok for the trunk?
> I don't think so. This would be a regression, since 4.8 (and later
> versions until Richard's patch) were able to handle this functionality
> just fine. Please also note that there are a couple of other patterns
> with the same problem, that is using ("nonimmediate_operand" "m")
> constraint.
>
> Please see PR 60704, comment 7 [1]. If LRA is able to fixup
> ("nonimmediate_operand" "m") to a memory from a register, then other
> RTL infrastructure should also be updated for this functionality. IMO,
> recog should be fixed/enhanced, so pseudo registers will also satisfy
> ("nonimmediate_operand" "m") constraint before LRA.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7
>
>
Uros, my patch does not result in PR60704 (I tested it before submitting
the patch).

Also I don't understand why you are mentioning only "m" alternative as I
enabled another alternative "x,r" in original pattern (alternative "1"
in other words alternative in the middle).

(define_insn "*float<SWI48:mode><MODEF:mode>2_sse"
  [(set (match_operand:MODEF 0 "register_operand" "=f,x,x")
	(float:MODEF
	  (match_operand:SWI48 1 "nonimmediate_operand" "m,r,m")))]
  "SSE_FLOAT_MODE_P (<MODEF:MODE>mode) && TARGET_SSE_MATH"
  "@
   fild%Z1\t%1
   %vcvtsi2<MODEF:ssemodesuffix><SWI48:rex64suffix>\t{%1, %d0|%d0, %1}
   %vcvtsi2<MODEF:ssemodesuffix><SWI48:rex64suffix>\t{%1, %d0|%d0, %1}"
  [(set_attr "type" "fmov,sseicvt,sseicvt")
   (set_attr "prefix" "orig,maybe_vex,maybe_vex")
   (set_attr "mode" "<MODEF:MODE>")
   (set (attr "prefix_rex")
     (if_then_else
       (and (eq_attr "prefix" "maybe_vex")
	    (match_test "<SWI48:MODE>mode == DImode"))
       (const_string "1")
       (const_string "*")))
   (set_attr "unit" "i387,*,*")
   (set_attr "athlon_decode" "*,double,direct")
   (set_attr "amdfam10_decode" "*,vector,double")
   (set_attr "bdver1_decode" "*,double,direct")
   (set_attr "fp_int_src" "true")
   (set (attr "enabled")
     (cond [(eq_attr "alternative" "0")
              (symbol_ref "TARGET_MIX_SSE_I387
                           && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
                                                <SWI48:MODE>mode)")
            (eq_attr "alternative" "1")
              /* ??? For sched1 we need constrain_operands to be able to
                 select an alternative.  Leave this enabled before RA.  */
              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
                           || optimize_function_for_size_p (cfun)
                           || !(reload_completed
                                || reload_in_progress
                                || lra_in_progress)")
           ]
           (symbol_ref "true")))
   ])

What you are saying would be true if I enabled "x,m" and it was single alternative in insn because constrain_operands rejects such instruction until pseudo is changed by memory.

You are right constrain_operands is not upto LRA possibilities and we should make the following change:


But that is a different story (for insns with single alternative containing only "m").

I guess I should submit such change for recog.c as a separate patch.

Comments

Uros Bizjak Sept. 23, 2014, 3:02 p.m. UTC | #1
On Tue, Sep 23, 2014 at 4:52 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 09/23/2014 02:07 AM, Uros Bizjak wrote:
>> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>   The previous patch to solve PR61360 fixed the problem in IRA (it was
>>> easier for me to do as I know the code well)
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>>>
>>>   Although imo it was an ok fix, Richard expressed concerns with the patch
>>> and the practice to have different enable attribute values depending on the
>>> current pass.
>>>
>>>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
>>> should be disabled.  Even if the path from general regs to sse regs is slow
>>> (usually such slow path is implemented internally by micro-architecture
>>> through cache).  "x,r" alternative results in only smaller insns (including
>>> number of insns) with probably the same time for the movement.  So "x,r"
>>> should be at least no slower, insn cache should have more locality, and less
>>> overhead for decoding/translating insns.
>>>
>>>   Here I propose another solution avoiding to have different enable
>>> attribute values.
>>>
>>>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
>>> without -march=amdfam10 (actually the patch results in 2 less failures when
>>> -march=amdfam10 were used).
>>>
>>>   Uros, is i386.md change ok for the trunk?
>> I don't think so. This would be a regression, since 4.8 (and later
>> versions until Richard's patch) were able to handle this functionality
>> just fine. Please also note that there are a couple of other patterns
>> with the same problem, that is using ("nonimmediate_operand" "m")
>> constraint.
>>
>> Please see PR 60704, comment 7 [1]. If LRA is able to fixup
>> ("nonimmediate_operand" "m") to a memory from a register, then other
>> RTL infrastructure should also be updated for this functionality. IMO,
>> recog should be fixed/enhanced, so pseudo registers will also satisfy
>> ("nonimmediate_operand" "m") constraint before LRA.
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7
>>
>>
> Uros, my patch does not result in PR60704 (I tested it before submitting
> the patch).

No, we didn't understand each other. The fix for PR60704 (enablement
of register alternative before LRA) caused PR61360. As I argued in the
referred comment of PR61360, the original fix was wrong, and should be
reverted. So, the condition should simply read:

>    (set (attr "enabled")
>      (cond [(eq_attr "alternative" "0")
>               (symbol_ref "TARGET_MIX_SSE_I387
>                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
>                                                 <SWI48:MODE>mode)")
>             (eq_attr "alternative" "1")
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>                            || optimize_function_for_size_p (cfun)")
>            ]
>            (symbol_ref "true")))
>
> Also I don't understand why you are mentioning only "m" alternative as I
> enabled another alternative "x,r" in original pattern (alternative "1"
> in other words alternative in the middle).

This part of the comment refers to:

(define_insn "*float<SWI48x:mode><MODEF:mode>2_i387"

(as mention in the #c7 comment of PR60704).

> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>
> Index: recog.c
> ===================================================================
> --- recog.c     (revision 215337)
> +++ recog.c     (working copy)
> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>                                || (strict < 0 && CONSTANT_P (op))
>                                /* During reload, accept a pseudo  */
>                                || (reload_in_progress && REG_P (op)
> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
> +                              /* LRA can put reg value into memory if
> +                                 it is necessary.  */
> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>                     win = 1;
>                   else if (insn_extra_address_constraint (cn)
>                            /* Every address operand can be reloaded to fit.  */
>
> But that is a different story (for insns with single alternative containing only "m").
>
> I guess I should submit such change for recog.c as a separate patch.

I think that the above is the right approach to fix PR60704, so the
current PR60704 fix [1] should be reverted.

[1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989

Uros.
Vladimir Makarov Sept. 23, 2014, 3:22 p.m. UTC | #2
On 09/23/2014 11:02 AM, Uros Bizjak wrote:
> On Tue, Sep 23, 2014 at 4:52 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 09/23/2014 02:07 AM, Uros Bizjak wrote:
>>>
>> Uros, my patch does not result in PR60704 (I tested it before submitting
>> the patch).
> No, we didn't understand each other. The fix for PR60704 (enablement
> of register alternative before LRA) caused PR61360. As I argued in the
> referred comment of PR61360, the original fix was wrong, and should be
> reverted. So, the condition should simply read:
>
>>    (set (attr "enabled")
>>      (cond [(eq_attr "alternative" "0")
>>               (symbol_ref "TARGET_MIX_SSE_I387
>>                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
>>                                                 <SWI48:MODE>mode)")
>>             (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>            ]
>>            (symbol_ref "true")))
>>
>> Also I don't understand why you are mentioning only "m" alternative as I
>> enabled another alternative "x,r" in original pattern (alternative "1"
>> in other words alternative in the middle).
> This part of the comment refers to:
>
> (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387"
>
> (as mention in the #c7 comment of PR60704).

Ok, I see.
>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>
>> Index: recog.c
>> ===================================================================
>> --- recog.c     (revision 215337)
>> +++ recog.c     (working copy)
>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>                                || (strict < 0 && CONSTANT_P (op))
>>                                /* During reload, accept a pseudo  */
>>                                || (reload_in_progress && REG_P (op)
>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>> +                              /* LRA can put reg value into memory if
>> +                                 it is necessary.  */
>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>                     win = 1;
>>                   else if (insn_extra_address_constraint (cn)
>>                            /* Every address operand can be reloaded to fit.  */
>>
>> But that is a different story (for insns with single alternative containing only "m").
>>
>> I guess I should submit such change for recog.c as a separate patch.
> I think that the above is the right approach to fix PR60704, so the
> current PR60704 fix [1] should be reverted.
>
> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>
>
Ok. I can submit patch reverting it + the change in recog.c.

I have still a question: do we really need

(eq_attr "alternative" "1")
              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
                           || optimize_function_for_size_p (cfun)")

As I wrote I'd always enable the alternative.  I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway).  Even it is slow I believe it is still not faster than r->m->x.  What do you think?
Uros Bizjak Sept. 23, 2014, 3:23 p.m. UTC | #3
On Tue, Sep 23, 2014 at 5:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>>   The previous patch to solve PR61360 fixed the problem in IRA (it was
>>>> easier for me to do as I know the code well)
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>>>>
>>>>   Although imo it was an ok fix, Richard expressed concerns with the patch
>>>> and the practice to have different enable attribute values depending on the
>>>> current pass.
>>>>
>>>>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
>>>> should be disabled.  Even if the path from general regs to sse regs is slow
>>>> (usually such slow path is implemented internally by micro-architecture
>>>> through cache).  "x,r" alternative results in only smaller insns (including
>>>> number of insns) with probably the same time for the movement.  So "x,r"
>>>> should be at least no slower, insn cache should have more locality, and less
>>>> overhead for decoding/translating insns.
>>>>
>>>>   Here I propose another solution avoiding to have different enable
>>>> attribute values.
>>>>
>>>>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
>>>> without -march=amdfam10 (actually the patch results in 2 less failures when
>>>> -march=amdfam10 were used).
>>>>
>>>>   Uros, is i386.md change ok for the trunk?
>>> I don't think so. This would be a regression, since 4.8 (and later
>>> versions until Richard's patch) were able to handle this functionality
>>> just fine. Please also note that there are a couple of other patterns
>>> with the same problem, that is using ("nonimmediate_operand" "m")
>>> constraint.
>>>
>>> Please see PR 60704, comment 7 [1]. If LRA is able to fixup
>>> ("nonimmediate_operand" "m") to a memory from a register, then other
>>> RTL infrastructure should also be updated for this functionality. IMO,
>>> recog should be fixed/enhanced, so pseudo registers will also satisfy
>>> ("nonimmediate_operand" "m") constraint before LRA.
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7
>>>
>>>
>> Uros, my patch does not result in PR60704 (I tested it before submitting
>> the patch).
>
> No, we didn't understand each other. The fix for PR60704 (enablement
> of register alternative before LRA) caused PR61360. As I argued in the
> referred comment of PR61360, the original fix was wrong, and should be
> reverted. So, the condition should simply read:
>
>>    (set (attr "enabled")
>>      (cond [(eq_attr "alternative" "0")
>>               (symbol_ref "TARGET_MIX_SSE_I387
>>                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
>>                                                 <SWI48:MODE>mode)")
>>             (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>            ]
>>            (symbol_ref "true")))
>>
>> Also I don't understand why you are mentioning only "m" alternative as I
>> enabled another alternative "x,r" in original pattern (alternative "1"
>> in other words alternative in the middle).
>
> This part of the comment refers to:
>
> (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387"
>
> (as mention in the #c7 comment of PR60704).
>
>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>
>> Index: recog.c
>> ===================================================================
>> --- recog.c     (revision 215337)
>> +++ recog.c     (working copy)
>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>                                || (strict < 0 && CONSTANT_P (op))
>>                                /* During reload, accept a pseudo  */
>>                                || (reload_in_progress && REG_P (op)
>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>> +                              /* LRA can put reg value into memory if
>> +                                 it is necessary.  */
>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>                     win = 1;
>>                   else if (insn_extra_address_constraint (cn)
>>                            /* Every address operand can be reloaded to fit.  */
>>
>> But that is a different story (for insns with single alternative containing only "m").

This is also the situation when the fix for PR60704 is reverted. The
"r" alternative in "*float<SWI48:mode><MODEF:
mode>2_sse" is unconditionally disabled and all remainting alternatives are "m".

Uros.
Uros Bizjak Sept. 23, 2014, 3:33 p.m. UTC | #4
On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>>
>>> Index: recog.c
>>> ===================================================================
>>> --- recog.c     (revision 215337)
>>> +++ recog.c     (working copy)
>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>                                || (strict < 0 && CONSTANT_P (op))
>>>                                /* During reload, accept a pseudo  */
>>>                                || (reload_in_progress && REG_P (op)
>>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>> +                              /* LRA can put reg value into memory if
>>> +                                 it is necessary.  */
>>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>>                     win = 1;
>>>                   else if (insn_extra_address_constraint (cn)
>>>                            /* Every address operand can be reloaded to fit.  */
>>>
>>> But that is a different story (for insns with single alternative containing only "m").
>>>
>>> I guess I should submit such change for recog.c as a separate patch.
>> I think that the above is the right approach to fix PR60704, so the
>> current PR60704 fix [1] should be reverted.
>>
>> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>
>>
> Ok. I can submit patch reverting it + the change in recog.c.
>
> I have still a question: do we really need
>
> (eq_attr "alternative" "1")
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>                            || optimize_function_for_size_p (cfun)")
>
> As I wrote I'd always enable the alternative.  I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway).  Even it is slow I believe it is still not faster than r->m->x.  What do you think?

The "r->x" alternative results in "vector" decoding on amdfam10. This
is AMD-speak for microcoded instructions, and AMD optimization manual
strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
provide more relevant data on the performance impact.

Thanks,
Uros.
Richard Biener Sept. 23, 2014, 4:40 p.m. UTC | #5
On September 23, 2014 5:33:35 PM CEST, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmakarov@redhat.com>
>wrote:
>
>>>> You are right constrain_operands is not upto LRA possibilities and
>we should make the following change:
>>>>
>>>> Index: recog.c
>>>> ===================================================================
>>>> --- recog.c     (revision 215337)
>>>> +++ recog.c     (working copy)
>>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>>                                || (strict < 0 && CONSTANT_P (op))
>>>>                                /* During reload, accept a pseudo 
>*/
>>>>                                || (reload_in_progress && REG_P (op)
>>>> -                                  && REGNO (op) >=
>FIRST_PSEUDO_REGISTER)))
>>>> +                                  && REGNO (op) >=
>FIRST_PSEUDO_REGISTER)
>>>> +                              /* LRA can put reg value into memory
>if
>>>> +                                 it is necessary.  */
>>>> +                              || (strict <= 0 && targetm.lra_p ()
>&& REG_P (op)))
>>>>                     win = 1;
>>>>                   else if (insn_extra_address_constraint (cn)
>>>>                            /* Every address operand can be reloaded
>to fit.  */
>>>>
>>>> But that is a different story (for insns with single alternative
>containing only "m").
>>>>
>>>> I guess I should submit such change for recog.c as a separate
>patch.
>>> I think that the above is the right approach to fix PR60704, so the
>>> current PR60704 fix [1] should be reverted.
>>>
>>> [1]
>https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>>
>>>
>> Ok. I can submit patch reverting it + the change in recog.c.
>>
>> I have still a question: do we really need
>>
>> (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>
>> As I wrote I'd always enable the alternative.  I don't expect
>performance improvement in disabling this alternative when path r->x is
>slow (as I heard it is implemented internally by moving through cache
>anyway).  Even it is slow I believe it is still not faster than
>r->m->x.  What do you think?
>
>The "r->x" alternative results in "vector" decoding on amdfam10. This
>is AMD-speak for microcoded instructions, and AMD optimization manual
>strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
>provide more relevant data on the performance impact.

IIRC a vector decoded instruction merely limits the frontend which can at most decode and dispatch one such insn at a time. So the performance impact depends on the context.

Richard.

>Thanks,
>Uros.
diff mbox

Patch

Index: recog.c
===================================================================
--- recog.c     (revision 215337)
+++ recog.c     (working copy)
@@ -2639,7 +2639,10 @@  constrain_operands (int strict)
                               || (strict < 0 && CONSTANT_P (op))
                               /* During reload, accept a pseudo  */
                               || (reload_in_progress && REG_P (op)
-                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
+                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
+                              /* LRA can put reg value into memory if
+                                 it is necessary.  */
+                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
                    win = 1;
                  else if (insn_extra_address_constraint (cn)
                           /* Every address operand can be reloaded to fit.  */