
[AArch64] Add a SHA1H pattern

Message ID AM5PR0802MB2610FEC899BA4DC8E9E6AF5C83AD0@AM5PR0802MB2610.eurprd08.prod.outlook.com
State New

Commit Message

Wilco Dijkstra Oct. 28, 2016, 3:54 p.m. UTC
James Greenhalgh wrote:
> On Wed, Oct 26, 2016 at 12:11:44PM +0000, Wilco Dijkstra wrote:
> > Add a SHA1H pattern with a V2SI input.  This avoids unnecessary
> > DUPs when using intrinsics like vsha1h_u32 (vgetq_lane_u32 (x, 0)).
>
> I think this is incorrect for big endian - element 0 of a vec_select in
> big-endian for V4SImode is the high 32-bits (i.e. bits 96-127 of the
> architected register). I think you'd need two patterns, one as below for
> !BYTES_BIG_ENDIAN, and one selecting element 3 for BYTES_BIG_ENDIAN.

Yes that's true, big-endian SIMD works in mysterious ways... Here is the updated
patch (tested on aarch64_be-none-elf too):

Add LE/BE SHA1H patterns with a V4SI input.  This avoids unnecessary
DUPs when using intrinsics like vsha1h_u32 (vgetq_lane_u32 (x, 0)).
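The following is a minimal C sketch of the kind of source this change targets. The function names sha1h_lane0 and sha1h_ref are hypothetical. The intrinsic path is only compiled on AArch64 with the crypto extension enabled (for example -march=armv8-a+crypto). Whether the DUP is actually elided depends on the GCC version and optimization level. The portable reference relies on SHA1H being a fixed left rotate of its 32-bit input by 30 bits, so the result can be checked on any host.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRYPTO)
#include <arm_neon.h>
/* Hypothetical example: feeds lane 0 of a 128-bit vector into SHA1H.
   With the new patterns GCC may use the Q register's low 32 bits
   directly instead of extracting the lane with a DUP first.  */
static uint32_t sha1h_lane0 (uint32x4_t x)
{
  return vsha1h_u32 (vgetq_lane_u32 (x, 0));
}
#endif

/* Hypothetical portable reference: SHA1H computes ROL (e, 30).  */
static uint32_t sha1h_ref (uint32_t e)
{
  return (e << 30) | (e >> 2);
}
```

On a crypto-enabled AArch64 build, sha1h_lane0 (x) should agree with sha1h_ref (vgetq_lane_u32 (x, 0)) on both little- and big-endian targets.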

ChangeLog:
2016-10-28  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64-simd.md (aarch64_crypto_sha1hv4si): New pattern.
	(aarch64_be_crypto_sha1hv4si): New pattern.
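The two patterns differ only in the lane index inside the parallel: the big-endian one selects lane 3 rather than lane 0. Below is a small C model of the numbering described in the review. It assumes that on big-endian targets, RTL lane i of an N-lane vector corresponds to architectural element N-1-i. The helpers arch_lane and arch_low_bit are hypothetical and model only the index arithmetic; they are not part of GCC.

```c
/* Hypothetical model: maps an RTL vec_select lane index to the
   architectural element of an AArch64 SIMD register, assuming lanes
   are numbered in reverse on big-endian targets.  */
static int arch_lane (int rtl_lane, int nunits, int big_endian)
{
  return big_endian ? nunits - 1 - rtl_lane : rtl_lane;
}

/* Lowest bit of an architectural element of elem_bits width.  */
static int arch_low_bit (int arch_elem, int elem_bits)
{
  return arch_elem * elem_bits;
}
```

For V4SImode, arch_lane (0, 4, 0) and arch_lane (3, 4, 1) both give element 0, the 32 bits that SHA1H reads through %s1. arch_lane (0, 4, 1) gives element 3, starting at bit 96, which is why selecting lane 0 is wrong on aarch64_be.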

Comments

James Greenhalgh Oct. 28, 2016, 3:57 p.m. UTC | #1
On Fri, Oct 28, 2016 at 04:54:05PM +0100, Wilco Dijkstra wrote:

Thanks, this respin looks OK to me.

James


Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9ce7f00050913aebd9f83ae9c4ce4ad469dd0d98..89bdcb3f7ed53d092dd95c81fe4a15fb15dc907c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5705,6 +5705,26 @@ 
   [(set_attr "type" "crypto_sha1_fast")]
 )
 
+(define_insn "aarch64_crypto_sha1hv4si"
+  [(set (match_operand:SI 0 "register_operand" "=w")
+	(unspec:SI [(vec_select:SI (match_operand:V4SI 1 "register_operand" "w")
+		     (parallel [(const_int 0)]))]
+	 UNSPEC_SHA1H))]
+  "TARGET_SIMD && TARGET_CRYPTO && !BYTES_BIG_ENDIAN"
+  "sha1h\\t%s0, %s1"
+  [(set_attr "type" "crypto_sha1_fast")]
+)
+
+(define_insn "aarch64_be_crypto_sha1hv4si"
+  [(set (match_operand:SI 0 "register_operand" "=w")
+	(unspec:SI [(vec_select:SI (match_operand:V4SI 1 "register_operand" "w")
+		     (parallel [(const_int 3)]))]
+	 UNSPEC_SHA1H))]
+  "TARGET_SIMD && TARGET_CRYPTO && BYTES_BIG_ENDIAN"
+  "sha1h\\t%s0, %s1"
+  [(set_attr "type" "crypto_sha1_fast")]
+)
+
 (define_insn "aarch64_crypto_sha1su1v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=w")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")