From patchwork Wed Nov 8 13:14:31 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 835803
Date: Wed, 8 Nov 2017 08:14:31 -0500
From: Michael Meissner
To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt
Subject: [PATCH], Generate XXBR{H, W, D} for bswap{16, 32, 64} on PowerPC ISA 3.0 (power9)
Message-Id: <20171108131431.GA11821@ibm-tiger.the-meissners.org>

PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the
GPRs, but it does have vector byte-swap half-word, word, and double-word
operations (XXBRH, XXBRW, XXBRD) that operate on the VSX registers.  The
enclosed patch enables generation of these byte-reversal instructions for
register-to-register operations.  When the operand is in memory, it still
prefers the load with byte reverse (L{H,W,D}BRX) or store with byte reverse
(ST{H,W,D}BRX) instructions over the register sequence.

For 16-bit and 32-bit byte swaps, it typically does the traditional operation
in GPRs, but it generates XXBR{H,W} if the values are in vector registers.

For 64-bit swaps, it now generates XXBRD instead of the 9-instruction GPR
sequence.  I did some timing runs on a prototype power9 system, and it was
slightly faster to do a direct move to the vector unit, XXBRD, and a direct
move back to a GPR than to use the traditional sequence.

I did bootstraps on little endian Power8 and Power9 systems (with the default
cpu set to power8 and power9 respectively).  There were no regressions.  Can I
check this patch into the trunk?

[gcc]
2017-11-08  Michael Meissner

        * config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
        enable generating XXBR{H,W} if the value is in a vector
        register.
        (bswapsi2_reg): Likewise.
        (bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64
        instead of doing the GPR sequence used on previous machines.
        (bswapdi2_xxbrd): Likewise.
        (bswapdi2_reg splitters): Use int_reg_operand instead of
        gpc_reg_operand to not match when XXBRD is generated.

[gcc/testsuite]
2017-11-08  Michael Meissner

        * gcc.target/powerpc/p9-xxbr-3.c: New test.

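For reference, the effect of the three byte swaps discussed above can be
sketched in portable C.  This is only an illustration and is not part of the
patch: the ref_bswap16, ref_bswap32, ref_bswap64, and check_against_builtins
names are hypothetical helpers.  The only assumption is that the compiler
provides the __builtin_bswap{16,32,64} built-ins that the new test uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helpers, not part of the patch.  Each one
   reverses the byte order of its argument using shifts and masks.  */

uint16_t
ref_bswap16 (uint16_t x)
{
  /* Exchange the two bytes of a half-word.  */
  return (uint16_t) ((x >> 8) | (x << 8));
}

uint32_t
ref_bswap32 (uint32_t x)
{
  /* Move each of the four bytes to its mirrored position.  */
  return ((x >> 24) & 0x000000ffu)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | ((x << 24) & 0xff000000u);
}

uint64_t
ref_bswap64 (uint64_t x)
{
  /* Byte-reverse each 32-bit half, then exchange the halves.  */
  return ((uint64_t) ref_bswap32 ((uint32_t) x) << 32)
         | ref_bswap32 ((uint32_t) (x >> 32));
}

void
check_against_builtins (void)
{
  /* The helpers should agree with the compiler built-ins.  */
  assert (ref_bswap16 (0x1234) == __builtin_bswap16 (0x1234));
  assert (ref_bswap32 (0x12345678u) == __builtin_bswap32 (0x12345678u));
  assert (ref_bswap64 (0x0102030405060708ull)
          == __builtin_bswap64 (0x0102030405060708ull));
}
```

Calling check_against_builtins () from a small driver compares the helpers
with the built-ins.  Whichever instruction sequence the compiler selects for a
built-in (GPR sequence, L/ST*BRX, or XXBR*), the value produced is the same.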
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md (revision 254516)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -2432,13 +2432,15 @@ (define_insn "bswap2_store"
   [(set_attr "type" "store")])
 
 (define_insn_and_split "bswaphi2_reg"
-  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r")
+  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r,wo")
         (bswap:HI
-         (match_operand:HI 1 "gpc_reg_operand" "r")))
-   (clobber (match_scratch:SI 2 "=&r"))]
+         (match_operand:HI 1 "gpc_reg_operand" "r,wo")))
+   (clobber (match_scratch:SI 2 "=&r,X"))]
   ""
-  "#"
-  "reload_completed"
+  "@
+   #
+   xxbrh %x0,%x1"
+  "reload_completed && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
         (and:SI (lshiftrt:SI (match_dup 4)
                              (const_int 8))
@@ -2454,18 +2456,20 @@ (define_insn_and_split "bswaphi2_reg"
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "12")
-   (set_attr "type" "*")])
+  [(set_attr "length" "12,4")
+   (set_attr "type" "*,vecperm")])
 
 ;; We are always BITS_BIG_ENDIAN, so the bit positions below in
 ;; zero_extract insns do not change for -mlittle.
 (define_insn_and_split "bswapsi2_reg"
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r")
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,wo")
         (bswap:SI
-         (match_operand:SI 1 "gpc_reg_operand" "r")))]
+         (match_operand:SI 1 "gpc_reg_operand" "r,wo")))]
   ""
-  "#"
-  "reload_completed"
+  "@
+   #
+   xxbrw %x0,%x1"
+  "reload_completed && int_reg_operand (operands[0], SImode)"
   [(set (match_dup 0)                                   ; DABC
         (rotate:SI (match_dup 1)
                    (const_int 24)))
@@ -2481,7 +2485,9 @@ (define_insn_and_split "bswapsi2_reg"
                         (const_int 255))
                 (and:SI (match_dup 0)
                         (const_int -256))))]
-  "")
+  ""
+  [(set_attr "length" "12,4")
+   (set_attr "type" "*,vecperm")])
 
 ;; On systems with LDBRX/STDBRX generate the loads/stores directly, just like
 ;; we do for L{H,W}BRX and ST{H,W}BRX above.  If not, we have to generate more
@@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
     emit_insn (gen_bswapdi2_load (dest, src));
   else if (MEM_P (dest))
     emit_insn (gen_bswapdi2_store (dest, src));
+  else if (TARGET_P9_VECTOR)
+    emit_insn (gen_bswapdi2_xxbrd (dest, src));
   else
     emit_insn (gen_bswapdi2_reg (dest, src));
   DONE;
@@ -2537,6 +2545,13 @@ (define_insn "bswapdi2_store"
   "stdbrx %1,%y0"
   [(set_attr "type" "store")])
 
+(define_insn "bswapdi2_xxbrd"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=wo")
+        (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wo")))]
+  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
+  "xxbrd %x0,%x1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "bswapdi2_reg"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=&r")
         (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "r")))
@@ -2544,7 +2559,8 @@ (define_insn "bswapdi2_reg"
    (clobber (match_scratch:DI 3 "=&r"))]
   "TARGET_POWERPC64 && TARGET_LDBRX"
   "#"
-  [(set_attr "length" "36")])
+  [(set_attr "length" "36")
+   (set_attr "type" "*")])
 
 ;; Non-power7/cell, fall back to use lwbrx/stwbrx
 (define_insn "*bswapdi2_64bit"
@@ -2560,7 +2576,7 @@ (define_insn "*bswapdi2_64bit"
   [(set_attr "length" "16,12,36")])
 
 (define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand" "")
+  [(set (match_operand:DI 0 "int_reg_operand" "")
         (bswap:DI (match_operand:DI 1 "indexed_or_indirect_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
    (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
@@ -2625,7 +2641,7 @@ (define_split
 
 (define_split
   [(set (match_operand:DI 0 "indexed_or_indirect_operand" "")
-        (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
+        (bswap:DI (match_operand:DI 1 "int_reg_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
    (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && !TARGET_LDBRX && reload_completed"
@@ -2687,10 +2703,10 @@ (define_split
 }")
 
 (define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand" "")
-        (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
-   (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
+  [(set (match_operand:DI 0 "int_reg_operand" "")
+        (bswap:DI (match_operand:DI 1 "int_reg_operand" "")))
+   (clobber (match_operand:DI 2 "int_reg_operand" ""))
+   (clobber (match_operand:DI 3 "int_reg_operand" ""))]
   "TARGET_POWERPC64 && reload_completed"
   [(const_int 0)]
   "
@@ -2722,9 +2738,9 @@ (define_insn "bswapdi2_32bit"
   [(set_attr "length" "16,12,36")])
 
 (define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand" "")
+  [(set (match_operand:DI 0 "int_reg_operand" "")
         (bswap:DI (match_operand:DI 1 "indexed_or_indirect_operand" "")))
-   (clobber (match_operand:SI 2 "gpc_reg_operand" ""))]
+   (clobber (match_operand:SI 2 "int_reg_operand" ""))]
   "!TARGET_POWERPC64 && reload_completed"
   [(const_int 0)]
   "
Index: gcc/testsuite/gcc.target/powerpc/p9-xxbr-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-xxbr-3.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/p9-xxbr-3.c (working copy)
@@ -0,0 +1,99 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mpower9-vector -O2" } */
+
+/* Verify that the XXBR{H,W} instructions are generated if the value is
+   forced to be in a vector register, and XXBRD is generated all of the
+   time for register bswap64's.  */
+
+unsigned short
+do_bswap16_mem (unsigned short *p)
+{
+  return __builtin_bswap16 (*p);	/* LHBRX.  */
+}
+
+unsigned short
+do_bswap16_reg (unsigned short a)
+{
+  return __builtin_bswap16 (a);		/* gpr sequences.  */
+}
+
+void
+do_bswap16_store (unsigned short *p, unsigned short a)
+{
+  *p = __builtin_bswap16 (a);		/* STHBRX.  */
+}
+
+unsigned short
+do_bswap16_vect (unsigned short a)
+{
+  __asm__ (" # %x0" : "+v" (a));
+  return __builtin_bswap16 (a);		/* XXBRH.  */
+}
+
+unsigned int
+do_bswap32_mem (unsigned int *p)
+{
+  return __builtin_bswap32 (*p);	/* LWBRX.  */
+}
+
+unsigned int
+do_bswap32_reg (unsigned int a)
+{
+  return __builtin_bswap32 (a);		/* gpr sequences.  */
+}
+
+void
+do_bswap32_store (unsigned int *p, unsigned int a)
+{
+  *p = __builtin_bswap32 (a);		/* STWBRX.  */
+}
+
+unsigned int
+do_bswap32_vect (unsigned int a)
+{
+  __asm__ (" # %x0" : "+v" (a));
+  return __builtin_bswap32 (a);		/* XXBRW.  */
+}
+
+unsigned long
+do_bswap64_mem (unsigned long *p)
+{
+  return __builtin_bswap64 (*p);	/* LDBRX.  */
+}
+
+unsigned long
+do_bswap64_reg (unsigned long a)
+{
+  return __builtin_bswap64 (a);		/* XXBRD.  */
+}
+
+void
+do_bswap64_store (unsigned long *p, unsigned int a)
+{
+  *p = __builtin_bswap64 (a);		/* STDBRX.  */
+}
+
+double
+do_bswap64_double (unsigned long a)
+{
+  return (double) __builtin_bswap64 (a);	/* XXBRD.  */
+}
+
+unsigned long
+do_bswap64_vect (unsigned long a)
+{
+  __asm__ (" # %x0" : "+v" (a));	/* XXBRD.  */
+  return __builtin_bswap64 (a);
+}
+
+/* Make sure the expected byte-reverse instructions are generated.  */
+/* { dg-final { scan-assembler-times "xxbrd" 3 } } */
+/* { dg-final { scan-assembler-times "xxbrh" 1 } } */
+/* { dg-final { scan-assembler-times "xxbrw" 1 } } */
+/* { dg-final { scan-assembler-times "ldbrx" 1 } } */
+/* { dg-final { scan-assembler-times "lhbrx" 1 } } */
+/* { dg-final { scan-assembler-times "lwbrx" 1 } } */
+/* { dg-final { scan-assembler-times "stdbrx" 1 } } */
+/* { dg-final { scan-assembler-times "sthbrx" 1 } } */
+/* { dg-final { scan-assembler-times "stwbrx" 1 } } */