diff mbox

[SH] PR 53568 - Add support for bswap built-ins

Message ID 1339626083.2198.22.camel@yam-132-YW-E178-FTW
State New
Headers show

Commit Message

Oleg Endo June 13, 2012, 10:21 p.m. UTC
Hello,

The attached patch improves code generated for byte swap expressions
such as ((x & 0xFF) << 8) | ((x >> 8) & 0xFF).
It seems that currently the tree optimizers only detect bswap32 and
bswap64 but not bswap16 patterns.  The patch adds detection for bswap16
patterns by playing along with the combine pass.

Tested with 
make -k -j8 check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m2a-single/-mb,-m4/-ml,
-m4/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,
-m4a-single/-mb}"

and no new failures.
Test cases for this patch and the previous bswap32 patch will follow
shortly.

Cheers,
Oleg

ChangeLog:

	PR target/53568
	* config/sh/sh.md: Add peephole for swapbsi2. 
	(*swapbisi2_and_shl8, *swapbhisi2): New insns and splits.

Comments

Kaz Kojima June 14, 2012, 2:39 a.m. UTC | #1
Oleg Endo <oleg.endo@t-online.de> wrote:
> The attached patch improves code generated for byte swap expressions
> such as ((x & 0xFF) << 8) | ((x >> 8) & 0xFF).
> It seems that currently the tree optimizers only detect bswap32 and
> bswap64 but not bswap16 patterns.  The patch adds detection for bswap16
> patterns by playing along with the combine pass.
> 
> Tested with 
> make -k -j8 check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m2a-single/-mb,-m4/-ml,
> -m4/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,
> -m4a-single/-mb}"
> 
> and no new failures.
> Test cases for this patch and the previous bswap32 patch will follow
> shortly.

The patch looks fine to me.  OK for trunk.

I guess that tree optimizers handle bswap32/64 with its cost
because they are more frequent and critical in the real working
set like network codes than bswap16, though I could be wrong
about it.

Regards,
	kaz
Richard Biener June 14, 2012, 8:17 a.m. UTC | #2
On Thu, Jun 14, 2012 at 4:39 AM, Kaz Kojima <kkojima@rr.iij4u.or.jp> wrote:
> Oleg Endo <oleg.endo@t-online.de> wrote:
>> The attached patch improves code generated for byte swap expressions
>> such as ((x & 0xFF) << 8) | ((x >> 8) & 0xFF).
>> It seems that currently the tree optimizers only detect bswap32 and
>> bswap64 but not bswap16 patterns.  The patch adds detection for bswap16
>> patterns by playing along with the combine pass.
>>
>> Tested with
>> make -k -j8 check RUNTESTFLAGS="--target_board=sh-sim
>> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m2a-single/-mb,-m4/-ml,
>> -m4/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,
>> -m4a-single/-mb}"
>>
>> and no new failures.
>> Test cases for this patch and the previous bswap32 patch will follow
>> shortly.
>
> The patch looks fine to me.  OK for trunk.
>
> I guess that tree optimizers handle bswap32/64 with its cost
> because they are more frequent and critical in the real working
> set like network codes than bswap16, though I could be wrong
> about it.

I think the bswap16 builtin simply post-dates the bswap recognition code.

Richard.

> Regards,
>        kaz
diff mbox

Patch

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 188525)
+++ gcc/config/sh/sh.md	(working copy)
@@ -4561,6 +4561,81 @@ 
   "swap.b	%1,%0"
   [(set_attr "type" "arith")])
 
+;; The *swapbisi2_and_shl8 pattern helps the combine pass simplifying
+;; partial byte swap expressions such as...
+;;   ((x & 0xFF) << 8) | ((x >> 8) & 0xFF).
+;; ...which are currently not handled by the tree optimizers.
+;; The combine pass will not initially try to combine the full expression,
+;; but only some sub-expressions.  In such a case the *swapbisi2_and_shl8
+;; pattern acts as an intermediate pattern that will eventually lead combine
+;; to the swapbsi2 pattern above.
+;; As a side effect this also improves code that does (x & 0xFF) << 8
+;; or (x << 8) & 0xFF00.
+(define_insn_and_split "*swapbisi2_and_shl8"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
+	(ior:SI (and:SI (ashift:SI (match_operand:SI 1 "arith_reg_operand" "r")
+				   (const_int 8))
+			(const_int 65280))
+		(match_operand:SI 2 "arith_reg_operand" "r")))]
+  "TARGET_SH1 && ! reload_in_progress && ! reload_completed"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  rtx tmp0 = gen_reg_rtx (SImode);
+  rtx tmp1 = gen_reg_rtx (SImode);
+
+  emit_insn (gen_zero_extendqisi2 (tmp0, gen_lowpart (QImode, operands[1])));
+  emit_insn (gen_swapbsi2 (tmp1, tmp0));
+  emit_insn (gen_iorsi3 (operands[0], tmp1, operands[2]));
+  DONE;
+})
+
+;; The *swapbhisi2 pattern is, like the *swapbisi2_and_shl8 pattern, another
+;; intermediate pattern that will help the combine pass arriving at swapbsi2.
+(define_insn_and_split "*swapbhisi2"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
+	(ior:SI (and:SI (ashift:SI (match_operand:SI 1 "arith_reg_operand" "r")
+				   (const_int 8))
+			(const_int 65280))
+		(zero_extract:SI (match_dup 1) (const_int 8) (const_int 8))))]
+  "TARGET_SH1 && ! reload_in_progress && ! reload_completed"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (SImode);
+
+  emit_insn (gen_zero_extendhisi2 (tmp, gen_lowpart (HImode, operands[1])));
+  emit_insn (gen_swapbsi2 (operands[0], tmp));
+  DONE;
+})
+
+;; In some cases the swapbsi2 pattern might leave a sequence such as...
+;;   swap.b  r4,r4
+;;   mov     r4,r0
+;;
+;; which can be simplified to...
+;;   swap.b  r4,r0
+(define_peephole2
+  [(set (match_operand:SI 0 "arith_reg_dest" "")
+	(ior:SI (and:SI (match_operand:SI 1 "arith_reg_operand" "")
+			(const_int 4294901760))
+		(ior:SI (and:SI (ashift:SI (match_dup 1) (const_int 8))
+				(const_int 65280))
+			(and:SI (ashiftrt:SI (match_dup 1) (const_int 8))
+				(const_int 255)))))
+   (set (match_operand:SI 2 "arith_reg_dest" "")
+	(match_dup 0))]
+  "TARGET_SH1 && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2)
+	(ior:SI (and:SI (match_operand:SI 1 "arith_reg_operand" "")
+			(const_int 4294901760))
+		(ior:SI (and:SI (ashift:SI (match_dup 1) (const_int 8))
+				(const_int 65280))
+			(and:SI (ashiftrt:SI (match_dup 1) (const_int 8))
+				(const_int 255)))))])
+
 
 ;; -------------------------------------------------------------------------
 ;; Zero extension instructions