Message ID: VI1PR08MB265576266CC24A0553CCB94898B30@VI1PR08MB2655.eurprd08.prod.outlook.com
From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, August 4, 2017 11:16 AM
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das <Sudi.Das@arm.com> wrote:
> Sorry about the delayed response, but looking at the above discussion, should I conclude that this is a valid tree simplification?

Yes, I think so.  Jakub requested code to undo this at RTL expansion based on target costs; I'm not sure if we really should require that from you, given the user could have written the target sequence himself.

A few comments about the patch:

+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))

I think this warrants a single_use check on the minus (note :s isn't enough, as with the unsigned case we'd happily ignore it by design).

+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+      && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))

You can relax this by using

  && wi::eq_p (@1, TYPE_PRECISION (type) - 1)

+  (if (TYPE_UNSIGNED(type))
+   (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))

You can write (convert (rshift ...)), without the :type.

I'm leaving it to Jakub whether you need to write that RTL expansion tweak.

Thanks,
Richard.

> I am pasting the diff of the assembly that AArch64 generates with the test case that I added.  I see fewer instructions generated with the patch.
> --- pr80131-1.s	2017-08-01 10:02:43.243374174 +0100
> +++ pr80131-1.s-patched	2017-08-01 10:00:54.776455630 +0100
> @@ -24,10 +24,8 @@
>  	str	x0, [sp, 8]
>  	ldr	x0, [sp, 8]
>  	mov	w1, w0
> -	mov	w0, 63
> -	sub	w0, w0, w1
> -	mov	x1, 1
> -	lsl	x0, x1, x0
> +	mov	x0, -9223372036854775808
> +	lsr	x0, x0, x1
>  	add	sp, sp, 16
>  	ret
>  	.size	f2, .-f2
> @@ -39,10 +37,8 @@
>  	str	x0, [sp, 8]
>  	ldr	x0, [sp, 8]
>  	mov	w1, w0
> -	mov	w0, 63
> -	sub	w0, w0, w1
> -	mov	x1, 1
> -	lsl	x0, x1, x0
> +	mov	x0, -9223372036854775808
> +	lsr	x0, x0, x1
>  	add	sp, sp, 16
>  	ret
>  	.size	f3, .-f3
> @@ -52,11 +48,9 @@
>  f4:
>  	sub	sp, sp, #16
>  	str	w0, [sp, 12]
> -	mov	w1, 31
>  	ldr	w0, [sp, 12]
> -	sub	w0, w1, w0
> -	mov	w1, 1
> -	lsl	w0, w1, w0
> +	mov	w1, -2147483648
> +	lsr	w0, w1, w0
>  	add	sp, sp, 16
>  	ret
>  	.size	f4, .-f4
>
> Thanks
> Sudi
>
> From: Wilco Dijkstra
> Sent: Thursday, April 13, 2017 1:01 PM
> To: Richard Biener; Jakub Jelinek
> Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
> Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)
>
> Richard Biener wrote:
>> It is IMHO a valid GIMPLE optimization / canonicalization.
>>
>> movabsq $-9223372036854775808, %rax
>>
>> so this should then have been generated as 1<<63?
>>
>> At some point variable shifts were quite expensive as well..
>
> Yes, I don't see a major difference between movabsq and
>
> movl $1, %eax
> salq $63, %rax
>
> on my Sandy Bridge, but if the above is faster then that is what the x64
> backend should emit - it's 1 byte smaller as well, so probably better in all
> cases.
>
> Wilco
Still waiting on Jakub's comment on whether there are more things needed at the backend.  But I have updated the patch according to Richard's comments.

Thanks
Sudi

From: Richard Biener <richard.guenther@gmail.com>
Sent: Friday, August 4, 2017 11:16 AM
To: Sudi Das
Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

> [...]

diff --git a/gcc/match.pd b/gcc/match.pd
index e9017e4..160c12d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+ (if (INTEGRAL_TYPE_P (type)
+      && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+      && single_use (@1))
+  (if (TYPE_UNSIGNED(type))
+   (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 0000000..317ea3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1.
+   f1 is not simplified but f2, f3 and f4 are.  */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
On Tue, Sep 26, 2017 at 12:44:10PM +0000, Sudi Das wrote:
> Still waiting on Jakub's comment on whether there are more things needed
> at the backend.  But I have updated the patch according to Richard's
> comments.

Well, we don't want to regress performance-wise on one of the most important primary targets.  I don't care that much whether the RTL/backend work is done together with the patch, or as a follow-up during stage1/3, but it should be done; the testcases I've posted can be used as the basis of a P1 runtime performance regression.

	Jakub
Jakub Jelinek wrote:
> Well, we don't want to regress performance wise on one of the most important
> primary targets.  I don't care that much if the RTL/backend work is done
> together with the patch, or as a follow-up during stage1/3, but it should be
> done, the testcases I've posted can be used as a basis of a P1 runtime
> performance regression.

It should be sufficient to file a bug about inefficient 64-bit constant expansions on x64.  I didn't see a significant difference in my benchmarking of it on x64, so I'd say it's only a performance regression if large benchmarks regress measurably (quite unlikely).

Wilco
Hi Jakub

As per the discussion, I have created a bug report for the possible regression this may cause:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82454

Sudi

From: Wilco Dijkstra
Sent: Tuesday, September 26, 2017 2:20 PM
To: Sudi Das; Jakub Jelinek
Cc: Richard Biener; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

> [...]
On Tue, Sep 26, 2017 at 2:44 PM, Sudi Das <Sudi.Das@arm.com> wrote:
> Still waiting on Jakub's comment on whether there are more things needed at the backend.  But I have updated the patch according to Richard's comments.

+  (if (TYPE_UNSIGNED(type))

Space before '(type)'.

+   (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))

I think your testcase needs a { dg-require-effective-target int32plus }, as otherwise it'll fail on AVR.

I think the patch is OK with these changes, but obviously we should try to address the code-generation issue on x86 at RTL expansion time.  They are sort-of existing missing optimizations.

Richard.

> [...]
Richard Biener wrote:
> I think the patch is ok with these changes but obviously we should try
> to address the code-generation issue on x86 at RTL expansion time.
> They are sort-of existing missing optimizations.

Note the only x64-specific issue is the backend expansion of 64-bit immediates, which could be improved as I suggested.  However, what we're really missing is a generic optimization pass that tries to simplify immediates using accurate target costs.  In e.g. x >= C or x > C we can use either C or C-1; on many targets one option may be a single instruction, while the other might take 2 or even 3.

Wilco
Thanks, I have made the changes to the patch.  Also, can someone please apply it for me?  I do not have commit access.

2017-10-10  Sudakshina Das  <sudi.das@arm.com>

	PR middle-end/80131
	* match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.

2017-10-10  Sudakshina Das  <sudi.das@arm.com>

	PR middle-end/80131
	* testsuite/gcc.dg/pr80131-1.c: New Test.

With regards to the existing missed optimizations needed in the x86 RTL expansion, I think the discussion can take place on the bug report that I created, and maybe someone will pick it up.

Thanks
Sudi

From: Wilco Dijkstra
Sent: Monday, October 9, 2017 2:02 PM
To: Richard Biener; Sudi Das
Cc: Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

> [...]

diff --git a/gcc/match.pd b/gcc/match.pd
index e58a65a..7a25a1b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+ (if (INTEGRAL_TYPE_P (type)
+      && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+      && single_use (@1))
+  (if (TYPE_UNSIGNED (type))
+   (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 0000000..0bfe1f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1.
+   f1 is not simplified but f2, f3 and f4 are.  */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
Sudi Das wrote:
> Thanks, I have made the changes to the patch.
> Also can someone please apply it for me. I do not have commit access.
>
> 2017-10-10  Sudakshina Das  <sudi.das@arm.com>
>
> 	PR middle-end/80131
> 	* match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.
>
> 2017-10-10  Sudakshina Das  <sudi.das@arm.com>
>
> 	PR middle-end/80131
> 	* testsuite/gcc.dg/pr80131-1.c: New Test.
>
> With regards to the existing missed optimizations needed to the x86 RTL expansion,
> I think the discussions can take place on the bug report that I created and maybe someone will pick it up.

I've committed this as r254496.

Wilco
Hi Wilco

On 7 November 2017 at 13:28, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Sudi Das wrote:
>> Thanks, I have made the changes to the patch.
>> Also can someone please apply it for me. I do not have commit access.
>> [...]
>
> I've committed this as r254496.

This causes my builds (all arm and aarch64 targets) to fail:

g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions
 -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual
 -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long
 -Wno-variadic-macros -Wno-overlength-strings -fno-common -Wno-unused
 -DHAVE_CONFIG_H -I. -I.
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/.
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../include
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libcpp/include
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc1/./gmp
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gmp
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64-none-linux-gnu/gcc1/./mpfr/src
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/mpfr/src
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/mpc/src
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libdecnumber
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libdecnumber/dpd
 -I../libdecnumber
 -I/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/../libbacktrace
 -o gimple-match.o -MT gimple-match.o -MMD -MP -MF ./.deps/gimple-match.TPo gimple-match.c

In file included from /tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:397,
                 from /tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:22,
                 from gimple-match.c:4:
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h: In function ‘bool wi::eq_p(const T1&, const T2&) [with T1 = tree_node*, T2 = int]’:
gimple-match.c:49533: instantiated from here
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1764: error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested name specifier
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h: In function ‘unsigned int wi::get_binary_precision(const T1&, const T2&) [with T1 = tree_node*, T2 = int]’:
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1763: instantiated from ‘bool wi::eq_p(const T1&, const T2&) [with T1 = tree_node*, T2 = int]’
gimple-match.c:49533: instantiated from here
/tmp/1717606_6.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/wide-int.h:1696: error: incomplete type ‘wi::int_traits<tree_node*>’ used in nested name specifier
make[2]: *** [gimple-match.o] Error 1

Can you have a look?
Christophe Lyon wrote:
> This causes my builds (all arm and aarch64 targets) to fail:
Richard Biener already committed a fix in r254498 (thanks).
It seems constants in match.pd now need wi::to_wide.
Wilco
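[Editorial note: based on that description, the fix presumably converts the captured tree constant explicitly before the wide-int comparison, roughly along these lines.  This is a sketch inferred from the description above, not the exact committed hunk.]

```
 (if (INTEGRAL_TYPE_P (type)
      && wi::eq_p (wi::to_wide (@2), TYPE_PRECISION (type) - 1)
      && single_use (@1))
```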