diff mbox

Don't check for optab for 16bit bswap

Message ID 000401d02370$6167b440$24371cc0$@arm.com
State New
Headers show

Commit Message

Thomas Preud'homme Dec. 29, 2014, 2:04 p.m. UTC
Since 16bit byteswap can be done via an 8 bit rotation (and it is the canonical form),
the check for an optab only serves to prevent the bswap optimization for
targets that don't have a 16bit byteswap (but do have a rotation instruction). See
PR63259 (comments 6 and onwards) for more details.

ChangeLog entry is as follows:

2014-12-24  Thomas Preud'homme  thomas.preudhomme@arm.com

PR tree-optimization/63259
* tree-ssa-math-opts.c (pass_optimize_bswap::execute): Stop checking
if optab exists for 16bit byteswap.



Bootstrapped on x86_64-linux-gnu and aarch64-none-linux-gnu without any
testsuite regression

Is this ok for trunk?

Best regards,

Thomas

Comments

Richard Biener Dec. 29, 2014, 5:09 p.m. UTC | #1
On December 29, 2014 3:04:36 PM CET, Thomas Preud'homme <thomas.preudhomme@arm.com> wrote:
>Since 16bit byteswap can be done via an 8 bit rotation (and it is the
>canonical form),
>the check for an optab only serves to prevent the bswap optimization
>for
>targets that don't have a 16bit byteswap (but do have a rotation
>instruction). See
>PR63259 (comments 6 and onwards) for more details.

OK, but what about targets without a rotation optab?  Is the fallback expansion reasonable in all cases?

Richard.

>ChangeLog entry is as follows:
>
>2014-12-24  Thomas Preud'homme  thomas.preudhomme@arm.com
>
>PR tree-optimization/63259
>* tree-ssa-math-opts.c (pass_optimize_bswap::execute): Stop checking
>if optab exists for 16bit byteswap.
>
>
>diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
>index 1ed2838..c175e88 100644
>--- a/gcc/tree-ssa-math-opts.c
>+++ b/gcc/tree-ssa-math-opts.c
>@@ -2350,15 +2350,13 @@ unsigned int
> pass_optimize_bswap::execute (function *fun)
> {
>   basic_block bb;
>-  bool bswap16_p, bswap32_p, bswap64_p;
>+  bool bswap32_p, bswap64_p;
>   bool changed = false;
>-  tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE,
>bswap64_type = NULL_TREE;
>+  tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE;
> 
>   if (BITS_PER_UNIT != 8)
>     return 0;
> 
>-  bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16)
>-	       && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing);
>   bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
> 	       && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing);
>   bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
>@@ -2367,12 +2365,6 @@ pass_optimize_bswap::execute (function *fun)
> 
>   /* Determine the argument type of the builtins.  The code later on
>      assumes that the return and argument type are the same.  */
>-  if (bswap16_p)
>-    {
>-      tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16);
>-      bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)));
>-    }
>-
>   if (bswap32_p)
>     {
>       tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32);
>@@ -2443,11 +2435,6 @@ pass_optimize_bswap::execute (function *fun)
> 	      if (code == LROTATE_EXPR || code == RROTATE_EXPR)
> 		continue;
> 	      load_type = uint16_type_node;
>-	      if (bswap16_p)
>-		{
>-		  fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16);
>-		  bswap_type = bswap16_type;
>-		}
> 	      break;
> 	    case 32:
> 	      load_type = uint32_type_node;
>@@ -2469,7 +2456,7 @@ pass_optimize_bswap::execute (function *fun)
> 	      continue;
> 	    }
> 
>-	  if (bswap && !fndecl)
>+	  if (bswap && !fndecl && n.range != 16)
> 	    continue;
> 
>	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type,
>load_type,
>
>Bootstrapped on x86_64-linux-gnu and aarch64-none-linux-gnu without any
>testsuite regression
>
>Is this ok for trunk?
>
>Best regards,
>
>Thomas
Thomas Preud'homme Dec. 29, 2014, 5:53 p.m. UTC | #2
> From: Richard Biener [mailto:rguenther@suse.de]
> Sent: Monday, December 29, 2014 5:09 PM
> 
> OK, but what about targets without a rotation optab?  Is the fallback
> expansion reasonable in all cases?

To be honest I haven't checked. I thought being a treecode means it
can always be expanded, using a sequence of shift and bitwise or if
necessary. Isn't there some language that GCC support with rotate
operators?

Given your question I guess I was wrong assuming this. Is there a list
of gimple construct that are necessary supported? What about a list
of insn pattern that a backend must necessarily provide?

Best regards,

Thomas
Oleg Endo Dec. 29, 2014, 6:44 p.m. UTC | #3
On Mon, 2014-12-29 at 17:53 +0000, Thomas Preud'homme wrote:
> > From: Richard Biener [mailto:rguenther@suse.de]
> > Sent: Monday, December 29, 2014 5:09 PM
> > 
> > OK, but what about targets without a rotation optab?  Is the fallback
> > expansion reasonable in all cases?
> 
> To be honest I haven't checked. I thought being a treecode means it
> can always be expanded, using a sequence of shift and bitwise or if
> necessary. Isn't there some language that GCC support with rotate
> operators?
> 
> Given your question I guess I was wrong assuming this. Is there a list
> of gimple construct that are necessary supported? What about a list
> of insn pattern that a backend must necessarily provide?
> 

__builtin_bswap16 expansion uses the 'rotlhi3' pattern to do a 16 bit
bswap as a fallback when there's no 'bswaphi2' pattern in the backend
(like on SH at the moment.  I haven't added bswaphi2, as
__builtin_bswap16 has been working without it).

I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
expands into shift + and + or (as expected).
Thus, I don't think the patch will make something worse (than it already
is) on some backends.  If the bswap detection bails out at the tree
level, the expanded ops will be shift + and + or -- as written in the
original code.  So probably, that will be the same as the fallback
expansion for __builtin_bswap16, and we're back to sqrt (1).

Cheers,
Oleg
Richard Biener Dec. 30, 2014, 3:22 p.m. UTC | #4
On December 29, 2014 7:44:13 PM CET, Oleg Endo <oleg.endo@t-online.de> wrote:
>On Mon, 2014-12-29 at 17:53 +0000, Thomas Preud'homme wrote:
>> > From: Richard Biener [mailto:rguenther@suse.de]
>> > Sent: Monday, December 29, 2014 5:09 PM
>> > 
>> > OK, but what about targets without a rotation optab?  Is the
>fallback
>> > expansion reasonable in all cases?
>> 
>> To be honest I haven't checked. I thought being a treecode means it
>> can always be expanded, using a sequence of shift and bitwise or if
>> necessary. Isn't there some language that GCC support with rotate
>> operators?
>> 
>> Given your question I guess I was wrong assuming this. Is there a
>list
>> of gimple construct that are necessary supported? What about a list
>> of insn pattern that a backend must necessarily provide?
>> 
>
>__builtin_bswap16 expansion uses the 'rotlhi3' pattern to do a 16 bit
>bswap as a fallback when there's no 'bswaphi2' pattern in the backend
>(like on SH at the moment.  I haven't added bswaphi2, as
>__builtin_bswap16 has been working without it).
>
>I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
>expands into shift + and + or (as expected).
>Thus, I don't think the patch will make something worse (than it
>already
>is) on some backends.  If the bswap detection bails out at the tree
>level, the expanded ops will be shift + and + or -- as written in the
>original code.  So probably, that will be the same as the fallback
>expansion for __builtin_bswap16, and we're back to sqrt (1).

OK, that is what I was asking - are there cases where using rot is worse (like forcing a libcall or so).

Richard.

>Cheers,
>Oleg
Oleg Endo Dec. 30, 2014, 4:25 p.m. UTC | #5
On Tue, 2014-12-30 at 16:22 +0100, Richard Biener wrote:
> On December 29, 2014 7:44:13 PM CET, Oleg Endo <oleg.endo@t-online.de> wrote:
> >On Mon, 2014-12-29 at 17:53 +0000, Thomas Preud'homme wrote:
> >> > From: Richard Biener [mailto:rguenther@suse.de]
> >> > Sent: Monday, December 29, 2014 5:09 PM
> >> > 
> >> > OK, but what about targets without a rotation optab?  Is the
> >fallback
> >> > expansion reasonable in all cases?
> >> 
> >> To be honest I haven't checked. I thought being a treecode means it
> >> can always be expanded, using a sequence of shift and bitwise or if
> >> necessary. Isn't there some language that GCC support with rotate
> >> operators?
> >> 
> >> Given your question I guess I was wrong assuming this. Is there a
> >list
> >> of gimple construct that are necessary supported? What about a list
> >> of insn pattern that a backend must necessarily provide?
> >> 
> >
> >__builtin_bswap16 expansion uses the 'rotlhi3' pattern to do a 16 bit
> >bswap as a fallback when there's no 'bswaphi2' pattern in the backend
> >(like on SH at the moment.  I haven't added bswaphi2, as
> >__builtin_bswap16 has been working without it).
> >
> >I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
> >expands into shift + and + or (as expected).
> >Thus, I don't think the patch will make something worse (than it
> >already

.L42:

> >is) on some backends.  If the bswap detection bails out at the tree
> >level, the expanded ops will be shift + and + or -- as written in the
> >original code.  So probably, that will be the same as the fallback
> >expansion for __builtin_bswap16, and we're back to sqrt (1).
> 
> OK, that is what I was asking - are there cases where using rot is worse (like forcing a libcall or so).

See also comment in expmed.c (expand_shift_1):
 It is theoretically possible that the target machine might
 not be able to perform either shift and hence we would
 be making two libcalls rather than just the one for the
 shift (similarly if IOR could not be done).  We will allow
 this extremely unlikely lossage to avoid complicating the
 code below.  */

then goto .L42  ;)

Cheers,
Oleg
Thomas Preud'homme Jan. 5, 2015, 2:54 p.m. UTC | #6
> From: Oleg Endo [mailto:oleg.endo@t-online.de]
> Sent: Tuesday, December 30, 2014 4:25 PM
> 
> I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
> expands into shift + and + or (as expected).
> Thus, I don't think the patch will make something worse (than it
> already
> 
> .L42:
> 
> is) on some backends.  If the bswap detection bails out at the tree
> level, the expanded ops will be shift + and + or -- as written in the
> original code.  So probably, that will be the same as the fallback
> expansion for __builtin_bswap16, and we're back to sqrt (1).
> 
> > OK, that is what I was asking - are there cases where using rot is worse
> (like forcing a libcall or so).
> 
> See also comment in expmed.c (expand_shift_1):
>  It is theoretically possible that the target machine might
>  not be able to perform either shift and hence we would
>  be making two libcalls rather than just the one for the
>  shift (similarly if IOR could not be done).  We will allow
>  this extremely unlikely lossage to avoid complicating the
>  code below.  */
> 
> then goto .L42  ;)

Yes and if the fallback expansion for a rot is actually worse than the
original set of stmt it means the fallback expansion should be improved.

Now I realize that said like this it sounds more like stage1 material.

Would it be fine for next stage1?

Best regards,

Thomas
Oleg Endo Jan. 5, 2015, 8:58 p.m. UTC | #7
On Mon, 2015-01-05 at 14:54 +0000, Thomas Preud'homme wrote:
> > From: Oleg Endo [mailto:oleg.endo@t-online.de]
> > Sent: Tuesday, December 30, 2014 4:25 PM
> > 
> > I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
> > expands into shift + and + or (as expected).
> > Thus, I don't think the patch will make something worse (than it
> > already
> > 
> > .L42:
> > 
> > is) on some backends.  If the bswap detection bails out at the tree
> > level, the expanded ops will be shift + and + or -- as written in the
> > original code.  So probably, that will be the same as the fallback
> > expansion for __builtin_bswap16, and we're back to sqrt (1).
> > 
> > > OK, that is what I was asking - are there cases where using rot is worse
> > (like forcing a libcall or so).
> > 
> > See also comment in expmed.c (expand_shift_1):
> >  It is theoretically possible that the target machine might
> >  not be able to perform either shift and hence we would
> >  be making two libcalls rather than just the one for the
> >  shift (similarly if IOR could not be done).  We will allow
> >  this extremely unlikely lossage to avoid complicating the
> >  code below.  */
> > 
> > then goto .L42  ;)
> 
> Yes and if the fallback expansion for a rot is actually worse than the
> original set of stmt it means the fallback expansion should be improved.

I might be missing something, but how can the original shift/and/or
statements (which eventually are translated to shift/and/or insns) be
any better than the shift/and/or insns that are expanded by the rot
fallback?  In both cases the insns have to go through the backend
expanders, and if those are absent/unsupported libcalls will be
expanded.

> Now I realize that said like this it sounds more like stage1 material.
> 
> Would it be fine for next stage1?

Or maybe just leave everything as it is and I add a bswaphi expander in
the SH backend ;)

Cheers,
Oleg
Richard Biener Jan. 5, 2015, 9:09 p.m. UTC | #8
On January 5, 2015 3:54:40 PM CET, Thomas Preud'homme <thomas.preudhomme@arm.com> wrote:
>> From: Oleg Endo [mailto:oleg.endo@t-online.de]
>> Sent: Tuesday, December 30, 2014 4:25 PM
>> 
>> I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
>> expands into shift + and + or (as expected).
>> Thus, I don't think the patch will make something worse (than it
>> already
>> 
>> .L42:
>> 
>> is) on some backends.  If the bswap detection bails out at the tree
>> level, the expanded ops will be shift + and + or -- as written in the
>> original code.  So probably, that will be the same as the fallback
>> expansion for __builtin_bswap16, and we're back to sqrt (1).
>> 
>> > OK, that is what I was asking - are there cases where using rot is
>worse
>> (like forcing a libcall or so).
>> 
>> See also comment in expmed.c (expand_shift_1):
>>  It is theoretically possible that the target machine might
>>  not be able to perform either shift and hence we would
>>  be making two libcalls rather than just the one for the
>>  shift (similarly if IOR could not be done).  We will allow
>>  this extremely unlikely lossage to avoid complicating the
>>  code below.  */
>> 
>> then goto .L42  ;)
>
>Yes and if the fallback expansion for a rot is actually worse than the
>original set of stmt it means the fallback expansion should be
>improved.
>
>Now I realize that said like this it sounds more like stage1 material.
>
>Would it be fine for next stage1?

No, I think the patch is fine as-is now.

Thanks,
Richard.

>Best regards,
>
>Thomas
H.J. Lu Jan. 21, 2015, 7:16 p.m. UTC | #9
On Mon, Jan 5, 2015 at 1:09 PM, Richard Biener <rguenther@suse.de> wrote:
> On January 5, 2015 3:54:40 PM CET, Thomas Preud'homme <thomas.preudhomme@arm.com> wrote:
>>> From: Oleg Endo [mailto:oleg.endo@t-online.de]
>>> Sent: Tuesday, December 30, 2014 4:25 PM
>>>
>>> I've just tried disabling the 'rotlhi3' pattern and __builtin_bswap16
>>> expands into shift + and + or (as expected).
>>> Thus, I don't think the patch will make something worse (than it
>>> already
>>>
>>> .L42:
>>>
>>> is) on some backends.  If the bswap detection bails out at the tree
>>> level, the expanded ops will be shift + and + or -- as written in the
>>> original code.  So probably, that will be the same as the fallback
>>> expansion for __builtin_bswap16, and we're back to sqrt (1).
>>>
>>> > OK, that is what I was asking - are there cases where using rot is
>>worse
>>> (like forcing a libcall or so).
>>>
>>> See also comment in expmed.c (expand_shift_1):
>>>  It is theoretically possible that the target machine might
>>>  not be able to perform either shift and hence we would
>>>  be making two libcalls rather than just the one for the
>>>  shift (similarly if IOR could not be done).  We will allow
>>>  this extremely unlikely lossage to avoid complicating the
>>>  code below.  */
>>>
>>> then goto .L42  ;)
>>
>>Yes and if the fallback expansion for a rot is actually worse than the
>>original set of stmt it means the fallback expansion should be
>>improved.
>>
>>Now I realize that said like this it sounds more like stage1 material.
>>
>>Would it be fine for next stage1?
>
> No, I think the patch is fine as-is now.

Does it cause:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64718
Richard Henderson Jan. 21, 2015, 9:23 p.m. UTC | #10
On 12/29/2014 06:04 AM, Thomas Preud'homme wrote:
> Since 16bit byteswap can be done via an 8 bit rotation (and it is the canonical form),
> the check for an optab only serves to prevent the bswap optimization for
> targets that don't have a 16bit byteswap (but do have a rotation instruction). See
> PR63259 (comments 6 and onwards) for more details.

I question the choice to have rotate be the canonical form.

Doesn't this make things more complicated for targets that
don't have rotate?  Or, equivalently, no 16-bit rotate?
That would seem to cover 99% of all 32-bit risc machines.


r~
Thomas Preud'homme Jan. 22, 2015, 7:50 a.m. UTC | #11
> -----Original Message-----
> From: Richard Henderson [mailto:rth@redhat.com]
> Sent: Thursday, January 22, 2015 5:24 AM
> To: Thomas Preud'homme; gcc-patches@gcc.gnu.org; 'Richard Biener'
> Subject: Re: [PATCH] Don't check for optab for 16bit bswap
> 
> On 12/29/2014 06:04 AM, Thomas Preud'homme wrote:
> > Since 16bit byteswap can be done via an 8 bit rotation (and it is the
> canonical form),
> > the check for an optab only serves to prevent the bswap optimization
> for
> > targets that don't have a 16bit byteswap (but do have a rotation
> instruction). See
> > PR63259 (comments 6 and onwards) for more details.
> 
> I question the choice to have rotate be the canonical form.
> 
> Doesn't this make things more complicated for targets that
> don't have rotate?  Or, equivalently, no 16-bit rotate?
> That would seem to cover 99% of all 32-bit risc machines.

Here is the difference between the two (see [1]):

/* Rotation of 16bit values by 8 bits is effectively equivalent to a bswaphi.
Note that this is not the case for bigger values. For instance a rotation
of 0x01020304 by 16 bits gives 0x03040102 which is different from
0x04030201 (bswapsi). */
if (rotate
     && CONST_INT_P (op1)
     && INTVAL (op1) == BITS_PER_UNIT
     && GET_MODE_SIZE (scalar_mode) == 2
     && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing)
  return expand_unop (HImode, bswap_optab, shifted, NULL_RTX,
			unsignedp);

[1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/expmed.c?revision=219718&view=markup#l2208

It doesn't look much more complicated. There was also some change in
the bswap pass but some code was both added and removed. As to the
bug raised by H.J. Lu, I just wrote a patch and testing is underway.

Given this, I don't think it's a problem. That being said, I'm happy to
change the canonical representation if there is consensus for that.

Best regards,

Thomas
Richard Biener Jan. 22, 2015, 7:52 a.m. UTC | #12
On January 21, 2015 10:23:56 PM CET, Richard Henderson <rth@redhat.com> wrote:
>On 12/29/2014 06:04 AM, Thomas Preud'homme wrote:
>> Since 16bit byteswap can be done via an 8 bit rotation (and it is the
>canonical form),
>> the check for an optab only serves to prevent the bswap optimization
>for
>> targets that don't have a 16bit byteswap (but do have a rotation
>instruction). See
>> PR63259 (comments 6 and onwards) for more details.
>
>I question the choice to have rotate be the canonical form.
>
>Doesn't this make things more complicated for targets that
>don't have rotate?  Or, equivalently, no 16-bit rotate?
>That would seem to cover 99% of all 32-bit risc machines.

I was asking for the generic expander to consider bswapHI.  Rotates are certainly more likely to get combined with sth else.

Richard.

>
>r~
Richard Henderson Jan. 22, 2015, 6:43 p.m. UTC | #13
On 01/21/2015 11:52 PM, Richard Biener wrote:
> On January 21, 2015 10:23:56 PM CET, Richard Henderson <rth@redhat.com> wrote:
>> On 12/29/2014 06:04 AM, Thomas Preud'homme wrote:
>>> Since 16bit byteswap can be done via an 8 bit rotation (and it is the
>> canonical form),
>>> the check for an optab only serves to prevent the bswap optimization
>> for
>>> targets that don't have a 16bit byteswap (but do have a rotation
>> instruction). See
>>> PR63259 (comments 6 and onwards) for more details.
>>
>> I question the choice to have rotate be the canonical form.
>>
>> Doesn't this make things more complicated for targets that
>> don't have rotate?  Or, equivalently, no 16-bit rotate?
>> That would seem to cover 99% of all 32-bit risc machines.
> 
> I was asking for the generic expander to consider bswapHI.  Rotates are
> certainly more likely to get combined with sth else.

Maybe.  Alternately, don't we canonicalize byte-swapping memory ops as

 (set (reg:HI) (bswap:HI (mem:HI)))

 (set (mem:HI) (bswap:HI (reg:HI)))

Certainly this is true for powerpc.

(It appears that both sparc and s390 are missing similar patterns,
despite having instructions that can perform this operation...)


r~


r~
Thomas Preud'homme Jan. 23, 2015, 1:19 a.m. UTC | #14
> From: Richard Henderson [mailto:rth@redhat.com]
> Sent: Friday, January 23, 2015 2:43 AM
> On 01/21/2015 11:52 PM, Richard Biener wrote:
> >
> > I was asking for the generic expander to consider bswapHI.  Rotates are
> > certainly more likely to get combined with sth else.
> 
> Maybe.  Alternately, don't we canonicalize byte-swapping memory ops
> as
> 
>  (set (reg:HI) (bswap:HI (mem:HI)))
> 
>  (set (mem:HI) (bswap:HI (reg:HI)))

Given the expmed.c bit I pasted, I'd say that's how it would be canonicalized
in RTL form yes. I think Richard was referring to combination at GIMPLE level
but that expander should expand to bswapHI to not break expectations.

Best regards,

Thomas
Richard Biener Jan. 23, 2015, 8:57 a.m. UTC | #15
On Thu, 22 Jan 2015, Richard Henderson wrote:

> On 01/21/2015 11:52 PM, Richard Biener wrote:
> > On January 21, 2015 10:23:56 PM CET, Richard Henderson <rth@redhat.com> wrote:
> >> On 12/29/2014 06:04 AM, Thomas Preud'homme wrote:
> >>> Since 16bit byteswap can be done via an 8 bit rotation (and it is the
> >> canonical form),
> >>> the check for an optab only serves to prevent the bswap optimization
> >> for
> >>> targets that don't have a 16bit byteswap (but do have a rotation
> >> instruction). See
> >>> PR63259 (comments 6 and onwards) for more details.
> >>
> >> I question the choice to have rotate be the canonical form.
> >>
> >> Doesn't this make things more complicated for targets that
> >> don't have rotate?  Or, equivalently, no 16-bit rotate?
> >> That would seem to cover 99% of all 32-bit risc machines.
> > 
> > I was asking for the generic expander to consider bswapHI.  Rotates are
> > certainly more likely to get combined with sth else.

Sorry for being GIMPLE centric here - but I meant canonicalization
on GIMPLE.  For RTL the expander will hopefully canonicalize it to
whatever is canonical on RTL.

> Maybe.  Alternately, don't we canonicalize byte-swapping memory ops as
> 
>  (set (reg:HI) (bswap:HI (mem:HI)))
> 
>  (set (mem:HI) (bswap:HI (reg:HI)))
> 
> Certainly this is true for powerpc.
> 
> (It appears that both sparc and s390 are missing similar patterns,
> despite having instructions that can perform this operation...)

That makes sense, it looks easier to match than a form with rotates.
I wonder when CPUs start to make a load-with-byte-shuffle instruction
available...

Richard.
diff mbox

Patch

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 1ed2838..c175e88 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2350,15 +2350,13 @@  unsigned int
 pass_optimize_bswap::execute (function *fun)
 {
   basic_block bb;
-  bool bswap16_p, bswap32_p, bswap64_p;
+  bool bswap32_p, bswap64_p;
   bool changed = false;
-  tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = NULL_TREE;
+  tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE;
 
   if (BITS_PER_UNIT != 8)
     return 0;
 
-  bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16)
-	       && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing);
   bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
 	       && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing);
   bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
@@ -2367,12 +2365,6 @@  pass_optimize_bswap::execute (function *fun)
 
   /* Determine the argument type of the builtins.  The code later on
      assumes that the return and argument type are the same.  */
-  if (bswap16_p)
-    {
-      tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16);
-      bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)));
-    }
-
   if (bswap32_p)
     {
       tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32);
@@ -2443,11 +2435,6 @@  pass_optimize_bswap::execute (function *fun)
 	      if (code == LROTATE_EXPR || code == RROTATE_EXPR)
 		continue;
 	      load_type = uint16_type_node;
-	      if (bswap16_p)
-		{
-		  fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16);
-		  bswap_type = bswap16_type;
-		}
 	      break;
 	    case 32:
 	      load_type = uint32_type_node;
@@ -2469,7 +2456,7 @@  pass_optimize_bswap::execute (function *fun)
 	      continue;
 	    }
 
-	  if (bswap && !fndecl)
+	  if (bswap && !fndecl && n.range != 16)
 	    continue;
 
 	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,