diff mbox

Reduce stack usage in sha512 (PR target/77308)

Message ID AM4PR0701MB2162BAB5D4B1D2548BC519F1E4D30@AM4PR0701MB2162.eurprd07.prod.outlook.com
State New
Headers show

Commit Message

Bernd Edlinger Oct. 18, 2016, 2:45 p.m. UTC
On 10/18/16 10:36, Christophe Lyon wrote:
>

> I am seeing a lot of regressions since this patch was committed:

> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>

> (you can click on "REGRESSED" to see the list of regressions, "sum"

> and "log" to download

> the corresponding .sum/.log)

>

> Thanks,

>

> Christophe

>


Oh, sorry, I have completely missed that.
Unfortunately I have tested one of the good combinations.

I have not considered the case that in and out could be
the same register.  But that happens here.


This should solve it.

Can you give it a try?



Thanks
Bernd.

Comments

Christophe Lyon Oct. 18, 2016, 3:35 p.m. UTC | #1
On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> On 10/18/16 10:36, Christophe Lyon wrote:
>>
>> I am seeing a lot of regressions since this patch was committed:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>>
>> (you can click on "REGRESSED" to see the list of regressions, "sum"
>> and "log" to download
>> the corresponding .sum/.log)
>>
>> Thanks,
>>
>> Christophe
>>
>
> Oh, sorry, I have completely missed that.
> Unfortunately I have tested one of the good combinations.
>
> I have not considered the case that in and out could be
> the same register.  But that happens here.
>
>
> This should solve it.
>
> Can you give it a try?
>

Sure, I have started the validations, it will take a few hours.

>
>
> Thanks
> Bernd.
Christophe Lyon Oct. 19, 2016, 6:55 a.m. UTC | #2
On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>> On 10/18/16 10:36, Christophe Lyon wrote:
>>>
>>> I am seeing a lot of regressions since this patch was committed:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>>>
>>> (you can click on "REGRESSED" to see the list of regressions, "sum"
>>> and "log" to download
>>> the corresponding .sum/.log)
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>
>> Oh, sorry, I have completely missed that.
>> Unfortunately I have tested one of the good combinations.
>>
>> I have not considered the case that in and out could be
>> the same register.  But that happens here.
>>
>>
>> This should solve it.
>>
>> Can you give it a try?
>>
>
> Sure, I have started the validations, it will take a few hours.
>

It looks OK, thanks.

>>
>>
>> Thanks
>> Bernd.
Kyrill Tkachov Oct. 19, 2016, 8:34 a.m. UTC | #3
On 19/10/16 07:55, Christophe Lyon wrote:
> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>>> On 10/18/16 10:36, Christophe Lyon wrote:
>>>> I am seeing a lot of regressions since this patch was committed:
>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>>>>
>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"
>>>> and "log" to download
>>>> the corresponding .sum/.log)
>>>>
>>>> Thanks,
>>>>
>>>> Christophe
>>>>
>>> Oh, sorry, I have completely missed that.
>>> Unfortunately I have tested one of the good combinations.
>>>
>>> I have not considered the case that in and out could be
>>> the same register.  But that happens here.
>>>
>>>
>>> This should solve it.
>>>
>>> Can you give it a try?
>>>
>> Sure, I have started the validations, it will take a few hours.
>>
> It looks OK, thanks.

Ok. Thanks for testing Christophe.
Sorry for not catching it in review.
Kyrill

>>>
>>> Thanks
>>> Bernd.
Christophe Lyon Oct. 19, 2016, 10:44 a.m. UTC | #4
On 19 October 2016 at 10:34, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>
> On 19/10/16 07:55, Christophe Lyon wrote:
>>
>> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org>
>> wrote:
>>>
>>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de>
>>> wrote:
>>>>
>>>> On 10/18/16 10:36, Christophe Lyon wrote:
>>>>>
>>>>> I am seeing a lot of regressions since this patch was committed:
>>>>>
>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>>>>>
>>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"
>>>>> and "log" to download
>>>>> the corresponding .sum/.log)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Christophe
>>>>>
>>>> Oh, sorry, I have completely missed that.
>>>> Unfortunately I have tested one of the good combinations.
>>>>
>>>> I have not considered the case that in and out could be
>>>> the same register.  But that happens here.
>>>>
>>>>
>>>> This should solve it.
>>>>
>>>> Can you give it a try?
>>>>
>>> Sure, I have started the validations, it will take a few hours.
>>>
>> It looks OK, thanks.
>
>
> Ok. Thanks for testing Christophe.
> Sorry for not catching it in review.
> Kyrill
>

Since it was painful to check that this 2nd patch fixed all the regressions
observed in all the configurations, I ran another validation with
the 2 patches combined, on top of r241272 to check the effect of the two
over the previous reference.

It turns out there is still a failure:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/241272-r241273-combined/report-build-info.html
gcc.c-torture/execute/pr34971.c   -O0  execution test
now fails on arm-none-linux-gnueabihf when using thumb mode, expect
when targeting cortex-a5.

Christophe

>>>>
>>>> Thanks
>>>> Bernd.
>
>
Bernd Edlinger Oct. 19, 2016, 4:06 p.m. UTC | #5
On 10/19/16 12:44, Christophe Lyon wrote:
> On 19 October 2016 at 10:34, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:

>>

>> On 19/10/16 07:55, Christophe Lyon wrote:

>>>

>>> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org>

>>> wrote:

>>>>

>>>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de>

>>>> wrote:

>>>>>

>>>>> On 10/18/16 10:36, Christophe Lyon wrote:

>>>>>>

>>>>>> I am seeing a lot of regressions since this patch was committed:

>>>>>>

>>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

>>>>>>

>>>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"

>>>>>> and "log" to download

>>>>>> the corresponding .sum/.log)

>>>>>>

>>>>>> Thanks,

>>>>>>

>>>>>> Christophe

>>>>>>

>>>>> Oh, sorry, I have completely missed that.

>>>>> Unfortunately I have tested one of the good combinations.

>>>>>

>>>>> I have not considered the case that in and out could be

>>>>> the same register.  But that happens here.

>>>>>

>>>>>

>>>>> This should solve it.

>>>>>

>>>>> Can you give it a try?

>>>>>

>>>> Sure, I have started the validations, it will take a few hours.

>>>>

>>> It looks OK, thanks.

>>

>>

>> Ok. Thanks for testing Christophe.

>> Sorry for not catching it in review.

>> Kyrill

>>

>

> Since it was painful to check that this 2nd patch fixed all the regressions

> observed in all the configurations, I ran another validation with

> the 2 patches combined, on top of r241272 to check the effect of the two

> over the previous reference.

>

> It turns out there is still a failure:

> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/241272-r241273-combined/report-build-info.html

> gcc.c-torture/execute/pr34971.c   -O0  execution test

> now fails on arm-none-linux-gnueabihf when using thumb mode, expect

> when targeting cortex-a5.

>


Thanks Christophe, I looked in the test case,
and I think that is a pre-existing issue.

I can make the test case fail before my patch:

struct foo
{
   unsigned long long b:40;
} x;

extern void abort (void);

void test1(unsigned long long res)
{
   /* Build a rotate expression on a 40 bit argument.  */
   if ((x.b<<9) + (x.b>>31) != res)
     abort ();
}

int main()
{
   x.b = 0x0100000001;
   test1(0x0000000202);
   x.b = 0x0100000000;
   test1(0x0000000002);
   return 0;
}


fails with -mcpu=cortex-a9 -mthumb -mfpu=neon-fp16 -O0
even before my patch.

before reload we have this insn:
(insn 19 18 20 2 (parallel [
             (set (reg:DI 113 [ _4 ])
                 (lshiftrt:DI (reg:DI 112 [ _3 ])
                     (const_int 31 [0x1f])))
             (clobber (scratch:SI))
             (clobber (scratch:SI))
             (clobber (scratch:DI))
             (clobber (reg:CC 100 cc))
         ]) "pr34971.c":11 1232 {lshrdi3_neon}


which is replaced to this insn:
(insn 19 18 20 2 (parallel [
             (set (reg:DI 1 r1 [orig:113 _4 ] [113])
                 (lshiftrt:DI (reg:DI 0 r0 [orig:112 _3 ] [112])
                     (const_int 31 [0x1f])))
             (clobber (scratch:SI))
             (clobber (scratch:SI))
             (clobber (scratch:DI))
             (clobber (reg:CC 100 cc))
         ]) "pr34971.c":11 1232 {lshrdi3_neon}
      (nil))

And now that is not supposed to happen, when this code
is executed for shifts by a constant less than 32:

           emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
           emit_insn (SET (out_down,
                           ORR (REV_LSHIFT (code, in_up, reverse_amount),
                                out_down)));
           emit_insn (SET (out_up, SHIFT (code, in_up, amount)));


out_down may not be the same hard register than in_up for this to work.

I opened a new BZ for this:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041

I am not sure if this is a LRA issue or if it can be handled in the
shift expansion.

Is the second patch good for trunk as it is?


Thanks
Bernd.
Kyrill Tkachov Oct. 19, 2016, 4:46 p.m. UTC | #6
On 19/10/16 17:06, Bernd Edlinger wrote:
> On 10/19/16 12:44, Christophe Lyon wrote:
>> On 19 October 2016 at 10:34, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>>> On 19/10/16 07:55, Christophe Lyon wrote:
>>>> On 18 October 2016 at 17:35, Christophe Lyon <christophe.lyon@linaro.org>
>>>> wrote:
>>>>> On 18 October 2016 at 16:45, Bernd Edlinger <bernd.edlinger@hotmail.de>
>>>>> wrote:
>>>>>> On 10/18/16 10:36, Christophe Lyon wrote:
>>>>>>> I am seeing a lot of regressions since this patch was committed:
>>>>>>>
>>>>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html
>>>>>>>
>>>>>>> (you can click on "REGRESSED" to see the list of regressions, "sum"
>>>>>>> and "log" to download
>>>>>>> the corresponding .sum/.log)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Christophe
>>>>>>>
>>>>>> Oh, sorry, I have completely missed that.
>>>>>> Unfortunately I have tested one of the good combinations.
>>>>>>
>>>>>> I have not considered the case that in and out could be
>>>>>> the same register.  But that happens here.
>>>>>>
>>>>>>
>>>>>> This should solve it.
>>>>>>
>>>>>> Can you give it a try?
>>>>>>
>>>>> Sure, I have started the validations, it will take a few hours.
>>>>>
>>>> It looks OK, thanks.
>>>
>>> Ok. Thanks for testing Christophe.
>>> Sorry for not catching it in review.
>>> Kyrill
>>>
>> Since it was painful to check that this 2nd patch fixed all the regressions
>> observed in all the configurations, I ran another validation with
>> the 2 patches combined, on top of r241272 to check the effect of the two
>> over the previous reference.
>>
>> It turns out there is still a failure:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/241272-r241273-combined/report-build-info.html
>> gcc.c-torture/execute/pr34971.c   -O0  execution test
>> now fails on arm-none-linux-gnueabihf when using thumb mode, expect
>> when targeting cortex-a5.
>>
> Thanks Christophe, I looked in the test case,
> and I think that is a pre-existing issue.
>
> I can make the test case fail before my patch:
>
> struct foo
> {
>     unsigned long long b:40;
> } x;
>
> extern void abort (void);
>
> void test1(unsigned long long res)
> {
>     /* Build a rotate expression on a 40 bit argument.  */
>     if ((x.b<<9) + (x.b>>31) != res)
>       abort ();
> }
>
> int main()
> {
>     x.b = 0x0100000001;
>     test1(0x0000000202);
>     x.b = 0x0100000000;
>     test1(0x0000000002);
>     return 0;
> }
>
>
> fails with -mcpu=cortex-a9 -mthumb -mfpu=neon-fp16 -O0
> even before my patch.
>
> before reload we have this insn:
> (insn 19 18 20 2 (parallel [
>               (set (reg:DI 113 [ _4 ])
>                   (lshiftrt:DI (reg:DI 112 [ _3 ])
>                       (const_int 31 [0x1f])))
>               (clobber (scratch:SI))
>               (clobber (scratch:SI))
>               (clobber (scratch:DI))
>               (clobber (reg:CC 100 cc))
>           ]) "pr34971.c":11 1232 {lshrdi3_neon}
>
>
> which is replaced to this insn:
> (insn 19 18 20 2 (parallel [
>               (set (reg:DI 1 r1 [orig:113 _4 ] [113])
>                   (lshiftrt:DI (reg:DI 0 r0 [orig:112 _3 ] [112])
>                       (const_int 31 [0x1f])))
>               (clobber (scratch:SI))
>               (clobber (scratch:SI))
>               (clobber (scratch:DI))
>               (clobber (reg:CC 100 cc))
>           ]) "pr34971.c":11 1232 {lshrdi3_neon}
>        (nil))
>
> And now that is not supposed to happen, when this code
> is executed for shifts by a constant less than 32:
>
>             emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
>             emit_insn (SET (out_down,
>                             ORR (REV_LSHIFT (code, in_up, reverse_amount),
>                                  out_down)));
>             emit_insn (SET (out_up, SHIFT (code, in_up, amount)));
>
>
> out_down may not be the same hard register than in_up for this to work.
>
> I opened a new BZ for this:
>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041
>
> I am not sure if this is a LRA issue or if it can be handled in the
> shift expansion.
>
> Is the second patch good for trunk as it is?

Thanks for the investigation and the bug report.
The patch is ok.
Kyrill

>
> Thanks
> Bernd.
diff mbox

Patch

2016-10-18  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/arm/arm.c (arm_emit_coreregs_64bit_shift): Clear the result
	register only if "in" and "out" are different registers.

--- gcc/config/arm/arm.c.orig	2016-10-17 11:00:34.045673223 +0200
+++ gcc/config/arm/arm.c	2016-10-18 14:53:06.710101327 +0200
@@ -29218,8 +29218,10 @@  arm_emit_coreregs_64bit_shift (enum rtx_
 
 	  /* Clearing the out register in DImode first avoids lots
 	     of spilling and results in less stack usage.
-	     Later this redundant insn is completely removed.  */
-	  emit_insn (SET (out, const0_rtx));
+	     Later this redundant insn is completely removed.
+	     Do that only if "in" and "out" are different registers.  */
+	  if (REG_P (out) && REG_P (in) && REGNO (out) != REGNO (in))
+	    emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, LSHIFT (code, in_down, amount)));
 	  emit_insn (SET (out_down,
 			  ORR (REV_LSHIFT (code, in_up, reverse_amount),
@@ -29231,11 +29233,14 @@  arm_emit_coreregs_64bit_shift (enum rtx_
 	  /* Shifts by a constant greater than 31.  */
 	  rtx adj_amount = GEN_INT (INTVAL (amount) - 32);
 
-	  emit_insn (SET (out, const0_rtx));
+	  if (REG_P (out) && REG_P (in) && REGNO (out) != REGNO (in))
+	    emit_insn (SET (out, const0_rtx));
 	  emit_insn (SET (out_down, SHIFT (code, in_up, adj_amount)));
 	  if (code == ASHIFTRT)
 	    emit_insn (gen_ashrsi3 (out_up, in_up,
 				    GEN_INT (31)));
+	  else
+	    emit_insn (SET (out_up, const0_rtx));
 	}
     }
   else