diff mbox

Simplification of 1U << (31 - x)

Message ID VI1PR08MB265576266CC24A0553CCB94898B30@VI1PR08MB2655.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Sudakshina Das Aug. 1, 2017, 9:14 a.m. UTC
Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?
I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.



Thanks

Sudi 




From: Wilco Dijkstra
Sent: Thursday, April 13, 2017 1:01 PM
To: Richard Biener; Jakub Jelinek
Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
    
Richard Biener wrote:
> It is IMHO a valid GIMPLE optimization / canonicalization.
>
>        movabsq $-9223372036854775808, %rax
>
> so this should then have been generated as 1<<63?
>
> At some point variable shifts were quite expensive as well..

Yes I don't see a major difference between movabsq and

        movl    $1, %eax
        salq    $63, %rax

on my Sandy Bridge, but if the above is faster then that is what the x64
backend should emit - it's 1 byte smaller as well, so probably better in all
cases.

Wilco

Comments

Richard Biener Aug. 4, 2017, 10:16 a.m. UTC | #1
On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <Sudi.Das@arm.com> wrote:
>
>
>
>
> Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?

Yes, I think so.  Jakub requested code to undo this at RTL expansion
based on target costs, not sure if we really should
require that from you given the user could have written the target
sequence himself.

Few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note :s isn't enough
as with the unsigned case we'd happily ignore it by design).

+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+       && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this with using

          && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+   (if (TYPE_UNSIGNED(type))
+     (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.

> I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.
>
> --- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>      str    x0, [sp, 8]
>      ldr    x0, [sp, 8]
>      mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>      add    sp, sp, 16
>      ret
>      .size    f2, .-f2
> @@ -39,10 +37,8 @@
>      str    x0, [sp, 8]
>      ldr    x0, [sp, 8]
>      mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>      add    sp, sp, 16
>      ret
>      .size    f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>      sub    sp, sp, #16
>      str    w0, [sp, 12]
> -    mov    w1, 31
>      ldr    w0, [sp, 12]
> -    sub    w0, w1, w0
> -    mov    w1, 1
> -    lsl    w0, w1, w0
> +    mov    w1, -2147483648
> +    lsr    w0, w1, w0
>      add    sp, sp, 16
>      ret
>      .size    f4, .-f4
>
>
> Thanks
>
> Sudi
>
>
>
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>>        movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes I don't see a major difference between movabsq and
>
>         movl    $1, %eax
>         salq    $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
Sudakshina Das Sept. 26, 2017, 12:44 p.m. UTC | #2
Still waiting on Jakub's comment on whether there are more things needed at the backend. But I have updated the patch according to Richard's comments.

Thanks
Sudi



From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, August 4, 2017 11:16 AM
To: Sudi Das
Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
    
On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <Sudi.Das@arm.com> wrote:
>
>
>
>
> Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?

Yes, I think so.  Jakub requested code to undo this at RTL expansion
based on target costs, not sure if we really should
require that from you given the user could have written the target
sequence himself.

Few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note :s isn't enough
as with the unsigned case we'd happily ignore it by design).

+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+       && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this with using

          && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+   (if (TYPE_UNSIGNED(type))
+     (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.

> I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.
>
> --- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>      str    x0, [sp, 8]
>      ldr    x0, [sp, 8]
>      mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>      add    sp, sp, 16
>      ret
>      .size    f2, .-f2
> @@ -39,10 +37,8 @@
>      str    x0, [sp, 8]
>      ldr    x0, [sp, 8]
>      mov    w1, w0
> -    mov    w0, 63
> -    sub    w0, w0, w1
> -    mov    x1, 1
> -    lsl    x0, x1, x0
> +    mov    x0, -9223372036854775808
> +    lsr    x0, x0, x1
>      add    sp, sp, 16
>      ret
>      .size    f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>      sub    sp, sp, #16
>      str    w0, [sp, 12]
> -    mov    w1, 31
>      ldr    w0, [sp, 12]
> -    sub    w0, w1, w0
> -    mov    w1, 1
> -    lsl    w0, w1, w0
> +    mov    w1, -2147483648
> +    lsr    w0, w1, w0
>      add    sp, sp, 16
>      ret
>      .size    f4, .-f4
>
>
> Thanks
>
> Sudi
>
>
>
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>>        movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes I don't see a major difference between movabsq and
>
>         movl    $1, %eax
>         salq    $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
diff --git a/gcc/match.pd b/gcc/match.pd
index e9017e4..160c12d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && tree_nop_conversion_p (type, TREE_TYPE (@1)))
    (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+  (if (INTEGRAL_TYPE_P (type)
+       && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+       && single_use (@1))
+   (if (TYPE_UNSIGNED(type))
+     (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 0000000..317ea3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are. */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -"  1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
Jakub Jelinek Sept. 26, 2017, 1:06 p.m. UTC | #3
On Tue, Sep 26, 2017 at 12:44:10PM +0000, Sudi Das wrote:
> 
> Still waiting on Jakub's comment on whether there are more things needed
> at the backend.  But I have updated the patch according to Richard's
> comments.

Well, we don't want to regress performance wise on one of the most important
primary targets.  I don't care that much if the RTL/backend work is done
together with the patch, or as a follow-up during stage1/3, but it should be
done, the testcases I've posted can be used as a basis of a P1 runtime
performance regression.

	Jakub
Wilco Dijkstra Sept. 26, 2017, 1:20 p.m. UTC | #4
Jakub Jelinek wrote:

> Well, we don't want to regress performance wise on one of the most important
> primary targets.  I don't care that much if the RTL/backend work is done
> together with the patch, or as a follow-up during stage1/3, but it should be
> done, the testcases I've posted can be used as a basis of a P1 runtime
> performance regression.

It should be sufficient to file a bug about inefficient 64-bit constant expansions on
x64. I didn't see a significant difference in my benchmarking of it on x64, so I'd say
it's only a performance regression if large benchmarks regress measurably (quite
unlikely).

Wilco
Sudakshina Das Oct. 6, 2017, 5 p.m. UTC | #5
Hi Jakub

As per the discussions, I have a created a bug report for the possible regression this may cause.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82454

Sudi


From: Wilco Dijkstra
Sent: Tuesday, September 26, 2017 2:20 PM
To: Sudi Das; Jakub Jelinek
Cc: Richard Biener; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
    
Jakub Jelinek wrote:

> Well, we don't want to regress performance wise on one of the most important
> primary targets.  I don't care that much if the RTL/backend work is done
> together with the patch, or as a follow-up during stage1/3, but it should be
> done, the testcases I've posted can be used as a basis of a P1 runtime
> performance regression.

It should be sufficient to file a bug about inefficient 64-bit constant expansions on
x64. I didn't see a significant difference in my benchmarking of it on x64, so I'd say
it's only a performance regression if large benchmarks regress measurably (quite
unlikely).

Wilco
Richard Biener Oct. 9, 2017, 11:55 a.m. UTC | #6
On Tue, Sep 26, 2017 at 2:44 PM, Sudi Das <Sudi.Das@arm.com> wrote:
>
> Still waiting on Jakub's comment on whether there are more things needed at the backend. But I have updated the patch according to Richard's comments.

+   (if (TYPE_UNSIGNED(type))

space before '(type)'.

+     (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))

I think your testcase needs a

{ dg-require-effective-target int32plus }

as otherwise it'll fail on AVR.

I think the patch is ok with these changes but obviously we should try
to address
the code-generation issue on x86 at RTL expansion time.  They are sort-of
existing missing optimizations.

Richard.

> Thanks
> Sudi
>
>
>
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, August 4, 2017 11:16 AM
> To: Sudi Das
> Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <Sudi.Das@arm.com> wrote:
>>
>>
>>
>>
>> Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?
>
> Yes, I think so.  Jakub requested code to undo this at RTL expansion
> based on target costs, not sure if we really should
> require that from you given the user could have written the target
> sequence himself.
>
> Few comments about the patch:
>
> +/* Fold (1 << (C - x)) where C = precision(type) - 1
> +   into ((1 << C) >> x). */
> +(simplify
> + (lshift integer_onep@0 (minus INTEGER_CST@1 @2))
>
> I think this warrants a single_use check on the minus (note :s isn't enough
> as with the unsigned case we'd happily ignore it by design).
>
> +  (if (INTEGRAL_TYPE_P (type)
> +       && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
> +       && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))
>
> You can relax this with using
>
>           && wi::eq_p (@1, TYPE_PRECISION (type) - 1)
>
> +   (if (TYPE_UNSIGNED(type))
> +     (rshift (lshift @0 @1) @2)
> +   (with
> +    { tree utype = unsigned_type_for (type); }
> +    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
> +
>
> You can write (convert (rshift ...)), without the :type.
>
> I'm leaving it to Jakub whether you need to write that RTL expansion tweak.
>
> Thanks,
> Richard.
>
>> I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.
>>
>> --- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
>> +++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
>> @@ -24,10 +24,8 @@
>>      str    x0, [sp, 8]
>>      ldr    x0, [sp, 8]
>>      mov    w1, w0
>> -    mov    w0, 63
>> -    sub    w0, w0, w1
>> -    mov    x1, 1
>> -    lsl    x0, x1, x0
>> +    mov    x0, -9223372036854775808
>> +    lsr    x0, x0, x1
>>      add    sp, sp, 16
>>      ret
>>      .size    f2, .-f2
>> @@ -39,10 +37,8 @@
>>      str    x0, [sp, 8]
>>      ldr    x0, [sp, 8]
>>      mov    w1, w0
>> -    mov    w0, 63
>> -    sub    w0, w0, w1
>> -    mov    x1, 1
>> -    lsl    x0, x1, x0
>> +    mov    x0, -9223372036854775808
>> +    lsr    x0, x0, x1
>>      add    sp, sp, 16
>>      ret
>>      .size    f3, .-f3
>> @@ -52,11 +48,9 @@
>>  f4:
>>      sub    sp, sp, #16
>>      str    w0, [sp, 12]
>> -    mov    w1, 31
>>      ldr    w0, [sp, 12]
>> -    sub    w0, w1, w0
>> -    mov    w1, 1
>> -    lsl    w0, w1, w0
>> +    mov    w1, -2147483648
>> +    lsr    w0, w1, w0
>>      add    sp, sp, 16
>>      ret
>>      .size    f4, .-f4
>>
>>
>> Thanks
>>
>> Sudi
>>
>>
>>
>>
>> From: Wilco Dijkstra
>> Sent: Thursday, April 13, 2017 1:01 PM
>> To: Richard Biener; Jakub Jelinek
>> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
>> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>>
>> Richard Biener wrote:
>>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>>
>>>        movabsq $-9223372036854775808, %rax
>>>
>>> so this should then have been generated as 1<<63?
>>>
>>> At some point variable shifts were quite expensive as well..
>>
>> Yes I don't see a major difference between movabsq and
>>
>>         movl    $1, %eax
>>         salq    $63, %rax
>>
>> on my Sandy Bridge, but if the above is faster then that is what the x64
>> backend should emit - it's 1 byte smaller as well, so probably better in all
>> cases.
>>
>> Wilco
>
Wilco Dijkstra Oct. 9, 2017, 1:02 p.m. UTC | #7
Richard Biener wrote:

> I think the patch is ok with these changes but obviously we should try
> to address
> the code-generation issue on x86 at RTL expansion time.  They are sort-of
> existing missing optimizations.

Note the only x64 specific issue is the backend expansion of 64-bit immediates
which could be improved like I suggested. However what we're really missing
is a generic optimization pass that tries to simplify immediates using accurate
target costs. In eg. x >= C or x > C we can use either C or C-1 - on many targets
one option may be a single instruction, while the other might take 2 or even 3.

Wilco
Sudakshina Das Oct. 10, 2017, 10:57 a.m. UTC | #8
Thanks, I have made the changes to the patch.
Also can someone please apply it for me. I do not have commit access.

2017-10-10  Sudakshina Das  <sudi.das@arm.com>

	PR middle-end/80131
	* match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.

2017-10-10  Sudakshina Das  <sudi.das@arm.com>

	PR middle-end/80131
	* testsuite/gcc.dg/pr80131-1.c: New Test.


With regards to the existing missed optimizations needed to the x86 RTL expansion, I think the discussions can take place on the bug report that I created and maybe someone will pick it up.

Thanks
Sudi




From: Wilco Dijkstra
Sent: Monday, October 9, 2017 2:02 PM
To: Richard Biener; Sudi Das
Cc: Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
    
Richard Biener wrote:

> I think the patch is ok with these changes but obviously we should try
> to address
> the code-generation issue on x86 at RTL expansion time.  They are sort-of
> existing missing optimizations.

Note the only x64 specific issue is the backend expansion of 64-bit immediates
which could be improved like I suggested. However what we're really missing
is a generic optimization pass that tries to simplify immediates using accurate
target costs. In eg. x >= C or x > C we can use either C or C-1 - on many targets
one option may be a single instruction, while the other might take 2 or even 3.

Wilco
diff --git a/gcc/match.pd b/gcc/match.pd
index e58a65a..7a25a1b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && tree_nop_conversion_p (type, TREE_TYPE (@1)))
    (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+  (if (INTEGRAL_TYPE_P (type)
+       && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+       && single_use (@1))
+   (if (TYPE_UNSIGNED (type))
+     (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 0000000..0bfe1f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are. */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -"  1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
Wilco Dijkstra Nov. 7, 2017, 12:28 p.m. UTC | #9
Sudi Das wrote:

> Thanks, I have made the changes to the patch.
> Also can someone please apply it for me. I do not have commit access.
>
> 2017-10-10  Sudakshina Das  <sudi.das@arm.com>
>
>        PR middle-end/80131
>        * match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.
>
> 2017-10-10  Sudakshina Das  <sudi.das@arm.com>
>
>        PR middle-end/80131
>        * testsuite/gcc.dg/pr80131-1.c: New Test.
>
>
> With regards to the existing missed optimizations needed to the x86 RTL expansion, 
> I think the discussions can take place on the bug report that I created and maybe someone will pick it up.

I've committed this as r254496.

Wilco
Christophe Lyon Nov. 7, 2017, 1:23 p.m. UTC | #10
Hi Wilco

On 7 November 2017 at 13:28, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Sudi Das wrote:
>
>> Thanks, I have made the changes to the patch.
>> Also can someone please apply it for me. I do not have commit access.
>>
>> 2017-10-10  Sudakshina Das  <sudi.das@arm.com>
>>
>>        PR middle-end/80131
>>        * match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.
>>
>> 2017-10-10  Sudakshina Das  <sudi.das@arm.com>
>>
>>        PR middle-end/80131
>>        * testsuite/gcc.dg/pr80131-1.c: New Test.
>>
>>
>> With regards to the existing missed optimizations needed to the x86 RTL expansion,
>> I think the discussions can take place on the bug report that I created and maybe someone will pick it up.
>
> I've committed this as r254496.

This causes my builds (all arm and aarch64 targets) to fail:
g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I.
-I. -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/.
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../include
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libcpp/include
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc1/./gmp
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gmp
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc1/./mpfr/src
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/mpfr/src
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/mpc/src
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libdecnumber
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libdecnumber/dpd
-I../libdecnumber
-I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libbacktrace
  -o gimple-match.o -MT gimple-match.o -MMD -MP -MF
./.deps/gimple-match.TPo gimple-match.c
In file included from
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:397,
                 from
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:22,
                 from gimple-match.c:4:
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:
In function ‘bool wi::eq_p(const T1&, const T2&) [with T1 =
tree_node*, T2 = int]’:
gimple-match.c:49533:   instantiated from here
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
gimple-match.c:49533:   instantiated from here
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:
In function ‘unsigned int wi::get_binary_precision(const T1&, const
T2&) [with T1 = tree_node*, T2 = int]’:
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1763:
  instantiated from ‘bool wi::eq_p(const T1&, const T2&) [with T1 =
tree_node*, T2 = int]’
gimple-match.c:49533:   instantiated from here
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1696:
error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested
name specifier
make[2]: *** [gimple-match.o] Error 1

Can you have a look?

>
> Wilco
Wilco Dijkstra Nov. 7, 2017, 1:42 p.m. UTC | #11
Christophe Lyon wrote:

> This causes my builds (all arm and aarch64 targets) to fail:

Richard Biener already committed a fix in r254498 (thanks).
It seems constants in match.pd now need wi::to_wide.

Wilco
diff mbox

Patch

--- pr80131-1.s    2017-08-01 10:02:43.243374174 +0100
+++ pr80131-1.s-patched    2017-08-01 10:00:54.776455630 +0100
@@ -24,10 +24,8 @@ 
     str    x0, [sp, 8]
     ldr    x0, [sp, 8]
     mov    w1, w0
-    mov    w0, 63
-    sub    w0, w0, w1
-    mov    x1, 1
-    lsl    x0, x1, x0
+    mov    x0, -9223372036854775808
+    lsr    x0, x0, x1
     add    sp, sp, 16
     ret
     .size    f2, .-f2
@@ -39,10 +37,8 @@ 
     str    x0, [sp, 8]
     ldr    x0, [sp, 8]
     mov    w1, w0
-    mov    w0, 63
-    sub    w0, w0, w1
-    mov    x1, 1
-    lsl    x0, x1, x0
+    mov    x0, -9223372036854775808
+    lsr    x0, x0, x1
     add    sp, sp, 16
     ret
     .size    f3, .-f3
@@ -52,11 +48,9 @@ 
 f4:
     sub    sp, sp, #16
     str    w0, [sp, 12]
-    mov    w1, 31
     ldr    w0, [sp, 12]
-    sub    w0, w1, w0
-    mov    w1, 1
-    lsl    w0, w1, w0
+    mov    w1, -2147483648
+    lsr    w0, w1, w0
     add    sp, sp, 16
     ret
     .size    f4, .-f4