Message ID | 000401d02370$6167b440$24371cc0$@arm.com |
---|---|
State | New |
Headers | show |
On December 29, 2014 3:04:36 PM CET, Thomas Preud'homme <thomas.preudhomme@arm.com> wrote: >Since 16bit byteswap can be done via an 8 bit rotation (and it is the >canonical form), >the check for an optab only serves to prevent the bswap optimization >for >targets that don't have a 16bit byteswap (but do have a rotation >instruction). See >PR63259 (comments 6 and onwards) for more details. OK, but what about targets without a rotation optab? Is the fallback expansion reasonable in all cases? Richard. >ChangeLog entry is as follows: > >2014-12-24 Thomas Preud'homme thomas.preudhomme@arm.com > >PR tree-optimization/63259 >* tree-ssa-math-opts.c (pass_optimize_bswap::execute): Stop checking >if optab exists for 16bit byteswap. > > >diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c >index 1ed2838..c175e88 100644 >--- a/gcc/tree-ssa-math-opts.c >+++ b/gcc/tree-ssa-math-opts.c >@@ -2350,15 +2350,13 @@ unsigned int > pass_optimize_bswap::execute (function *fun) > { > basic_block bb; >- bool bswap16_p, bswap32_p, bswap64_p; >+ bool bswap32_p, bswap64_p; > bool changed = false; >- tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, >bswap64_type = NULL_TREE; >+ tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; > > if (BITS_PER_UNIT != 8) > return 0; > >- bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16) >- && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing); > bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32) > && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing); > bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64) >@@ -2367,12 +2365,6 @@ pass_optimize_bswap::execute (function *fun) > > /* Determine the argument type of the builtins. The code later on > assumes that the return and argument type are the same. */ >- if (bswap16_p) >- { >- tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); >- bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))); >- } >- > if (bswap32_p) > { > tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32); >@@ -2443,11 +2435,6 @@ pass_optimize_bswap::execute (function *fun) > if (code == LROTATE_EXPR || code == RROTATE_EXPR) > continue; > load_type = uint16_type_node; >- if (bswap16_p) >- { >- fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); >- bswap_type = bswap16_type; >- } > break; > case 32: > load_type = uint32_type_node; >@@ -2469,7 +2456,7 @@ pass_optimize_bswap::execute (function *fun) > continue; > } > >- if (bswap && !fndecl) >+ if (bswap && !fndecl && n.range != 16) > continue; > > if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, >load_type, > >Bootstrapped on x86_64-linux-gnu and aarch64-none-linux-gnu without any >testsuite regression > >Is this ok for trunk? > >Best regards, > >Thomas
> From: Richard Biener [mailto:rguenther@suse.de] > Sent: Monday, December 29, 2014 5:09 PM > > OK, but what about targets without a rotation optab? Is the fallback > expansion reasonable in all cases? To be honest I haven't checked. I thought being a treecode means it can always be expanded, using a sequence of shift and bitwise or if necessary. Isn't there some language that GCC support with rotate operators? Given your question I guess I was wrong assuming this. Is there a list of gimple construct that are necessary supported? What about a list of insn pattern that a backend must necessarily provide? Best regards, Thomas
On Mon, 2014-12-29 at 17:53 +0000, Thomas Preud'homme wrote: > > From: Richard Biener [mailto:rguenther@suse.de] > > Sent: Monday, December 29, 2014 5:09 PM > > > > OK, but what about targets without a rotation optab? Is the fallback > > expansion reasonable in all cases? > > To be honest I haven't checked. I thought being a treecode means it > can always be expanded, using a sequence of shift and bitwise or if > necessary. Isn't there some language that GCC support with rotate > operators? > > Given your question I guess I was wrong assuming this. Is there a list > of gimple construct that are necessary supported? What about a list > of insn pattern that a backend must necessarily provide? > __builtin_bswap16 expansion uses the 'rotlhi3' pattern to do a 16 bit bswap as a fallback when there's no 'bswaphi2' pattern in the backend (like on SH at the moment. I haven't added bswaphi2, as __builtin_bswap16 has been working without it). I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 expands into shift + and + or (as expected). Thus, I don't think the patch will make something worse (than it already is) on some backends. If the bswap detection bails out at the tree level, the expanded ops will be shift + and + or -- as written in the original code. So probably, that will be the same as the fallback expansion for __builtin_bswap16, and we're back to sqrt (1). Cheers, Oleg
On December 29, 2014 7:44:13 PM CET, Oleg Endo <oleg.endo@t-online.de> wrote: >On Mon, 2014-12-29 at 17:53 +0000, Thomas Preud'homme wrote: >> > From: Richard Biener [mailto:rguenther@suse.de] >> > Sent: Monday, December 29, 2014 5:09 PM >> > >> > OK, but what about targets without a rotation optab? Is the >fallback >> > expansion reasonable in all cases? >> >> To be honest I haven't checked. I thought being a treecode means it >> can always be expanded, using a sequence of shift and bitwise or if >> necessary. Isn't there some language that GCC support with rotate >> operators? >> >> Given your question I guess I was wrong assuming this. Is there a >list >> of gimple construct that are necessary supported? What about a list >> of insn pattern that a backend must necessarily provide? >> > >__builtin_bswap16 expansion uses the 'rotlhi3' pattern to do a 16 bit >bswap as a fallback when there's no 'bswaphi2' pattern in the backend >(like on SH at the moment. I haven't added bswaphi2, as >__builtin_bswap16 has been working without it). > >I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 >expands into shift + and + or (as expected). >Thus, I don't think the patch will make something worse (than it >already >is) on some backends. If the bswap detection bails out at the tree >level, the expanded ops will be shift + and + or -- as written in the >original code. So probably, that will be the same as the fallback >expansion for __builtin_bswap16, and we're back to sqrt (1). OK, that is what I was asking - are there cases where using rot is worse (like forcing a libcall or so). Richard. >Cheers, >Oleg
On Tue, 2014-12-30 at 16:22 +0100, Richard Biener wrote: > On December 29, 2014 7:44:13 PM CET, Oleg Endo <oleg.endo@t-online.de> wrote: > >On Mon, 2014-12-29 at 17:53 +0000, Thomas Preud'homme wrote: > >> > From: Richard Biener [mailto:rguenther@suse.de] > >> > Sent: Monday, December 29, 2014 5:09 PM > >> > > >> > OK, but what about targets without a rotation optab? Is the > >fallback > >> > expansion reasonable in all cases? > >> > >> To be honest I haven't checked. I thought being a treecode means it > >> can always be expanded, using a sequence of shift and bitwise or if > >> necessary. Isn't there some language that GCC support with rotate > >> operators? > >> > >> Given your question I guess I was wrong assuming this. Is there a > >list > >> of gimple construct that are necessary supported? What about a list > >> of insn pattern that a backend must necessarily provide? > >> > > > >__builtin_bswap16 expansion uses the 'rotlhi3' pattern to do a 16 bit > >bswap as a fallback when there's no 'bswaphi2' pattern in the backend > >(like on SH at the moment. I haven't added bswaphi2, as > >__builtin_bswap16 has been working without it). > > > >I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 > >expands into shift + and + or (as expected). > >Thus, I don't think the patch will make something worse (than it > >already .L42: > >is) on some backends. If the bswap detection bails out at the tree > >level, the expanded ops will be shift + and + or -- as written in the > >original code. So probably, that will be the same as the fallback > >expansion for __builtin_bswap16, and we're back to sqrt (1). > > OK, that is what I was asking - are there cases where using rot is worse (like forcing a libcall or so). See also comment in expmed.c (expand_shift_1): It is theoretically possible that the target machine might not be able to perform either shift and hence we would be making two libcalls rather than just the one for the shift (similarly if IOR could not be done). We will allow this extremely unlikely lossage to avoid complicating the code below. */ then goto .L42 ;) Cheers, Oleg
> From: Oleg Endo [mailto:oleg.endo@t-online.de] > Sent: Tuesday, December 30, 2014 4:25 PM > > I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 > expands into shift + and + or (as expected). > Thus, I don't think the patch will make something worse (than it > already > > .L42: > > is) on some backends. If the bswap detection bails out at the tree > level, the expanded ops will be shift + and + or -- as written in the > original code. So probably, that will be the same as the fallback > expansion for __builtin_bswap16, and we're back to sqrt (1). > > > OK, that is what I was asking - are there cases where using rot is worse > (like forcing a libcall or so). > > See also comment in expmed.c (expand_shift_1): > It is theoretically possible that the target machine might > not be able to perform either shift and hence we would > be making two libcalls rather than just the one for the > shift (similarly if IOR could not be done). We will allow > this extremely unlikely lossage to avoid complicating the > code below. */ > > then goto .L42 ;) Yes and if the fallback expansion for a rot is actually worse than the original set of stmt it means the fallback expansion should be improved. Now I realize that said like this it sounds more like stage1 material. Would it be fine for next stage1? Best regards, Thomas
On Mon, 2015-01-05 at 14:54 +0000, Thomas Preud'homme wrote: > > From: Oleg Endo [mailto:oleg.endo@t-online.de] > > Sent: Tuesday, December 30, 2014 4:25 PM > > > > I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 > > expands into shift + and + or (as expected). > > Thus, I don't think the patch will make something worse (than it > > already > > > > .L42: > > > > is) on some backends. If the bswap detection bails out at the tree > > level, the expanded ops will be shift + and + or -- as written in the > > original code. So probably, that will be the same as the fallback > > expansion for __builtin_bswap16, and we're back to sqrt (1). > > > > > OK, that is what I was asking - are there cases where using rot is worse > > (like forcing a libcall or so). > > > > See also comment in expmed.c (expand_shift_1): > > It is theoretically possible that the target machine might > > not be able to perform either shift and hence we would > > be making two libcalls rather than just the one for the > > shift (similarly if IOR could not be done). We will allow > > this extremely unlikely lossage to avoid complicating the > > code below. */ > > > > then goto .L42 ;) > > Yes and if the fallback expansion for a rot is actually worse than the > original set of stmt it means the fallback expansion should be improved. I might be missing something, but how can the original shift/and/or statements (which eventually are translated to shift/and/or insns) be any better than the shift/and/or insns that are expanded by the rot fallback? In both cases the insns have to go through the backend expanders, and if those are absent/unsupported libcalls will be expanded. > Now I realize that said like this it sounds more like stage1 material. > > Would it be fine for next stage1? Or maybe just leave everything as it is and I add a bswaphi expander in the SH backend ;) Cheers, Oleg
On January 5, 2015 3:54:40 PM CET, Thomas Preud'homme <thomas.preudhomme@arm.com> wrote: >> From: Oleg Endo [mailto:oleg.endo@t-online.de] >> Sent: Tuesday, December 30, 2014 4:25 PM >> >> I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 >> expands into shift + and + or (as expected). >> Thus, I don't think the patch will make something worse (than it >> already >> >> .L42: >> >> is) on some backends. If the bswap detection bails out at the tree >> level, the expanded ops will be shift + and + or -- as written in the >> original code. So probably, that will be the same as the fallback >> expansion for __builtin_bswap16, and we're back to sqrt (1). >> >> > OK, that is what I was asking - are there cases where using rot is >worse >> (like forcing a libcall or so). >> >> See also comment in expmed.c (expand_shift_1): >> It is theoretically possible that the target machine might >> not be able to perform either shift and hence we would >> be making two libcalls rather than just the one for the >> shift (similarly if IOR could not be done). We will allow >> this extremely unlikely lossage to avoid complicating the >> code below. */ >> >> then goto .L42 ;) > >Yes and if the fallback expansion for a rot is actually worse than the >original set of stmt it means the fallback expansion should be >improved. > >Now I realize that said like this it sounds more like stage1 material. > >Would it be fine for next stage1? No, I think the patch is fine as-is now. Thanks, Richard. >Best regards, > >Thomas
On Mon, Jan 5, 2015 at 1:09 PM, Richard Biener <rguenther@suse.de> wrote: > On January 5, 2015 3:54:40 PM CET, Thomas Preud'homme <thomas.preudhomme@arm.com> wrote: >>> From: Oleg Endo [mailto:oleg.endo@t-online.de] >>> Sent: Tuesday, December 30, 2014 4:25 PM >>> >>> I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16 >>> expands into shift + and + or (as expected). >>> Thus, I don't think the patch will make something worse (than it >>> already >>> >>> .L42: >>> >>> is) on some backends. If the bswap detection bails out at the tree >>> level, the expanded ops will be shift + and + or -- as written in the >>> original code. So probably, that will be the same as the fallback >>> expansion for __builtin_bswap16, and we're back to sqrt (1). >>> >>> > OK, that is what I was asking - are there cases where using rot is >>worse >>> (like forcing a libcall or so). >>> >>> See also comment in expmed.c (expand_shift_1): >>> It is theoretically possible that the target machine might >>> not be able to perform either shift and hence we would >>> be making two libcalls rather than just the one for the >>> shift (similarly if IOR could not be done). We will allow >>> this extremely unlikely lossage to avoid complicating the >>> code below. */ >>> >>> then goto .L42 ;) >> >>Yes and if the fallback expansion for a rot is actually worse than the >>original set of stmt it means the fallback expansion should be >>improved. >> >>Now I realize that said like this it sounds more like stage1 material. >> >>Would it be fine for next stage1? > > No, I think the patch is fine as-is now. Does it cause: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64718
On 12/29/2014 06:04 AM, Thomas Preud'homme wrote: > Since 16bit byteswap can be done via an 8 bit rotation (and it is the canonical form), > the check for an optab only serves to prevent the bswap optimization for > targets that don't have a 16bit byteswap (but do have a rotation instruction). See > PR63259 (comments 6 and onwards) for more details. I question the choice to have rotate be the canonical form. Doesn't this make things more complicated for targets that don't have rotate? Or, equivalently, no 16-bit rotate? That would seem to cover 99% of all 32-bit risc machines. r~
> -----Original Message----- > From: Richard Henderson [mailto:rth@redhat.com] > Sent: Thursday, January 22, 2015 5:24 AM > To: Thomas Preud'homme; gcc-patches@gcc.gnu.org; 'Richard Biener' > Subject: Re: [PATCH] Don't check for optab for 16bit bswap > > On 12/29/2014 06:04 AM, Thomas Preud'homme wrote: > > Since 16bit byteswap can be done via an 8 bit rotation (and it is the > canonical form), > > the check for an optab only serves to prevent the bswap optimization > for > > targets that don't have a 16bit byteswap (but do have a rotation > instruction). See > > PR63259 (comments 6 and onwards) for more details. > > I question the choice to have rotate be the canonical form. > > Doesn't this make things more complicated for targets that > don't have rotate? Or, equivalently, no 16-bit rotate? > That would seem to cover 99% of all 32-bit risc machines. Here is the difference between the two (see [1]): /* Rotation of 16bit values by 8 bits is effectively equivalent to a bswaphi. Note that this is not the case for bigger values. For instance a rotation of 0x01020304 by 16 bits gives 0x03040102 which is different from 0x04030201 (bswapsi). */ if (rotate && CONST_INT_P (op1) && INTVAL (op1) == BITS_PER_UNIT && GET_MODE_SIZE (scalar_mode) == 2 && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing) return expand_unop (HImode, bswap_optab, shifted, NULL_RTX, unsignedp); [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/expmed.c?revision=219718&view=markup#l2208 It doesn't look much more complicated. There was also some change in the bswap pass but some code was both added and removed. As to the bug raised by H.J. Lu, I just wrote a patch and testing is underway. Given this, I don't think it's a problem. That being said, I'm happy to change the canonical representation if there is consensus for that. Best regards, Thomas
On January 21, 2015 10:23:56 PM CET, Richard Henderson <rth@redhat.com> wrote: >On 12/29/2014 06:04 AM, Thomas Preud'homme wrote: >> Since 16bit byteswap can be done via an 8 bit rotation (and it is the >canonical form), >> the check for an optab only serves to prevent the bswap optimization >for >> targets that don't have a 16bit byteswap (but do have a rotation >instruction). See >> PR63259 (comments 6 and onwards) for more details. > >I question the choice to have rotate be the canonical form. > >Doesn't this make things more complicated for targets that >don't have rotate? Or, equivalently, no 16-bit rotate? >That would seem to cover 99% of all 32-bit risc machines. I was asking for the generic expander to consider bswapHI. Rotates are certainly more likely to get combined with sth else. Richard. > >r~
On 01/21/2015 11:52 PM, Richard Biener wrote: > On January 21, 2015 10:23:56 PM CET, Richard Henderson <rth@redhat.com> wrote: >> On 12/29/2014 06:04 AM, Thomas Preud'homme wrote: >>> Since 16bit byteswap can be done via an 8 bit rotation (and it is the >> canonical form), >>> the check for an optab only serves to prevent the bswap optimization >> for >>> targets that don't have a 16bit byteswap (but do have a rotation >> instruction). See >>> PR63259 (comments 6 and onwards) for more details. >> >> I question the choice to have rotate be the canonical form. >> >> Doesn't this make things more complicated for targets that >> don't have rotate? Or, equivalently, no 16-bit rotate? >> That would seem to cover 99% of all 32-bit risc machines. > > I was asking for the generic expander to consider bswapHI. Rotates are > certainly more likely to get combined with sth else. Maybe. Alternately, don't we canonicalize byte-swapping memory ops as (set (reg:HI) (bswap:HI (mem:HI))) (set (mem:HI) (bswap:HI (reg:HI))) Certainly this is true for powerpc. (It appears that both sparc and s390 are missing similar patterns, despite having instructions that can perform this operation...) r~ r~
> From: Richard Henderson [mailto:rth@redhat.com] > Sent: Friday, January 23, 2015 2:43 AM > On 01/21/2015 11:52 PM, Richard Biener wrote: > > > > I was asking for the generic expander to consider bswapHI. Rotates are > > certainly more likely to get combined with sth else. > > Maybe. Alternately, don't we canonicalize byte-swapping memory ops > as > > (set (reg:HI) (bswap:HI (mem:HI))) > > (set (mem:HI) (bswap:HI (reg:HI))) Given the expmed.c bit I pasted, I'd say that's how it would be canonicalized in RTL form yes. I think Richard was referring to combination at GIMPLE level but that expander should expand to bswapHI to not break expectations. Best regards, Thomas
On Thu, 22 Jan 2015, Richard Henderson wrote: > On 01/21/2015 11:52 PM, Richard Biener wrote: > > On January 21, 2015 10:23:56 PM CET, Richard Henderson <rth@redhat.com> wrote: > >> On 12/29/2014 06:04 AM, Thomas Preud'homme wrote: > >>> Since 16bit byteswap can be done via an 8 bit rotation (and it is the > >> canonical form), > >>> the check for an optab only serves to prevent the bswap optimization > >> for > >>> targets that don't have a 16bit byteswap (but do have a rotation > >> instruction). See > >>> PR63259 (comments 6 and onwards) for more details. > >> > >> I question the choice to have rotate be the canonical form. > >> > >> Doesn't this make things more complicated for targets that > >> don't have rotate? Or, equivalently, no 16-bit rotate? > >> That would seem to cover 99% of all 32-bit risc machines. > > > > I was asking for the generic expander to consider bswapHI. Rotates are > > certainly more likely to get combined with sth else. Sorry for being GIMPLE centric here - but I meant canonicalization on GIMPLE. For RTL the expander will hopefully canonicalize it to whatever is canonical on RTL. > Maybe. Alternately, don't we canonicalize byte-swapping memory ops as > > (set (reg:HI) (bswap:HI (mem:HI))) > > (set (mem:HI) (bswap:HI (reg:HI))) > > Certainly this is true for powerpc. > > (It appears that both sparc and s390 are missing similar patterns, > despite having instructions that can perform this operation...) That makes sense, it looks easier to match than a form with rotates. I wonder when CPUs start to make a load-with-byte-shuffle instruction available... Richard.
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 1ed2838..c175e88 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2350,15 +2350,13 @@ unsigned int pass_optimize_bswap::execute (function *fun) { basic_block bb; - bool bswap16_p, bswap32_p, bswap64_p; + bool bswap32_p, bswap64_p; bool changed = false; - tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; + tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; if (BITS_PER_UNIT != 8) return 0; - bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16) - && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing); bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32) && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing); bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64) @@ -2367,12 +2365,6 @@ pass_optimize_bswap::execute (function *fun) /* Determine the argument type of the builtins. The code later on assumes that the return and argument type are the same. */ - if (bswap16_p) - { - tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); - bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))); - } - if (bswap32_p) { tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32); @@ -2443,11 +2435,6 @@ pass_optimize_bswap::execute (function *fun) if (code == LROTATE_EXPR || code == RROTATE_EXPR) continue; load_type = uint16_type_node; - if (bswap16_p) - { - fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); - bswap_type = bswap16_type; - } break; case 32: load_type = uint32_type_node; @@ -2469,7 +2456,7 @@ pass_optimize_bswap::execute (function *fun) continue; } - if (bswap && !fndecl) + if (bswap && !fndecl && n.range != 16) continue; if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,