From patchwork Wed Jul 27 21:16:28 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 653478 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3s07983Yq5z9t0G for ; Thu, 28 Jul 2016 07:16:59 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=G8pglRRM; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:references:mime-version:content-type :in-reply-to:message-id; q=dns; s=default; b=i4K5Kfvu/NwlNLIq/Ow AFlyvAC/D/9/9HFnMt8ePI5iRSErp8vSXpBSovN+nXoSdkwMvurW4zt4iDJanVPt u5yIkooTPDb4elFh0lsGKD4+jbdr8FbwWZvbXc+URuU8ly4BMSt6Hl0Vvzf7c/Fk /swOG+xgC793vyGsL7tW9KEI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:references:mime-version:content-type :in-reply-to:message-id; s=default; bh=S5b7vQCcQ8jjucDEamQev+oeT 7I=; b=G8pglRRMUdaD3vNgiJN41T5PbM3XIByfLhzZM93/3uR0tSYTRe0vTiwPU +MBUWQYiIYg+Prtqlp1LM5vKucXS2tliRoue6TXcVWddYx3OWXy40xaPVirOAVo4 Ag8tfus7A4Fb6fw0DV1FVfOQ9kp2hTA7SBmVZt+MErguBKSCXM= Received: (qmail 89013 invoked by alias); 27 Jul 2016 21:16:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 88979 invoked by uid 89); 27 Jul 2016 21:16:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.4 required=5.0 tests=AWL, BAYES_50, KAM_ASCII_DIVIDERS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW autolearn=no version=3.3.2 spammy=King, 978, 2506r, 8994797 X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Wed, 27 Jul 2016 21:16:39 +0000 Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.11/8.16.0.11) with SMTP id u6RL9oGe109464 for ; Wed, 27 Jul 2016 17:16:37 -0400 Received: from e37.co.us.ibm.com (e37.co.us.ibm.com [32.97.110.158]) by mx0a-001b2d01.pphosted.com with ESMTP id 24eatjw6xa-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Wed, 27 Jul 2016 17:16:36 -0400 Received: from localhost by e37.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 27 Jul 2016 15:16:35 -0600 Received: from d03dlp03.boulder.ibm.com (9.17.202.179) by e37.co.us.ibm.com (192.168.1.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Wed, 27 Jul 2016 15:16:31 -0600 X-IBM-Helo: d03dlp03.boulder.ibm.com X-IBM-MailFrom: meissner@ibm-tiger.the-meissners.org Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id CDB9D19D8042; Wed, 27 Jul 2016 15:16:04 -0600 (MDT) Received: from b03ledav002.gho.boulder.ibm.com (b03ledav002.gho.boulder.ibm.com [9.17.130.233]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id u6RLG0G04850128; Wed, 27 Jul 2016 14:16:29 -0700 Received: from b03ledav002.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B609413604F; Wed, 27 Jul 2016 15:16:29 -0600 (MDT) Received: from ibm-tiger.the-meissners.org (unknown [9.32.77.111]) by b03ledav002.gho.boulder.ibm.com (Postfix) with ESMTP id 799C013604E; Wed, 27 Jul 2016 15:16:29 -0600 (MDT) Received: by ibm-tiger.the-meissners.org (Postfix, from userid 500) id C63CE46CF2; Wed, 27 Jul 2016 17:16:28 -0400 (EDT) Date: Wed, 27 Jul 2016 17:16:28 -0400 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , David Edelsohn , Bill Schmidt Subject: Re: [PATCH, 2 of 4], Enhance PowerPC vec_extract support for power8/power9 machines Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , David Edelsohn , Bill Schmidt References: <20160727143221.GA18453@ibm-tiger.the-meissners.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20160727143221.GA18453@ibm-tiger.the-meissners.org> User-Agent: Mutt/1.5.20 (2009-12-10) X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16072721-0024-0000-0000-000014349C07 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00005534; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000177; SDB=6.00736323; UDB=6.00345781; IPR=6.00508918; BA=6.00004625; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00012090; XFM=3.00000011; UTC=2016-07-27 21:16:33 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16072721-0025-0000-0000-000043139B2A Message-Id: <20160727211628.GA21085@ibm-tiger.the-meissners.org> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-07-27_12:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1607270209 X-IsSubscribed: yes Next patch in the vec_extract series. This patch adds support for vec_extract with a variable argument element number for vector double or vector long on 64-bit ISA 2.07 (power8) or ISA 3.0 (power9) systems. It needs 64-bit ISA 2.07 for the direct move support. I have tested this on a little endian 64-bit power8 system and a big endian power7 system with both 32-bit and 64-bit targets. Note, the power7 system uses the existing method of saving the vector and addressing the vector elements in memory because it doesn't have the direct move instructions. There were no regressions in the test, can I install these patches on the trunk? [gcc, patch #2] 2016-07-27 Michael Meissner * config/rs6000/rs6000-protos.h (rs6000_split_vec_extract_var): New declaration. * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add support for vec_extract of vector double or vector long having a variable element number on 64-bit ISA 2.07 systems or newer. * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Likewise. (rs6000_split_vec_extract_var): New function to split a vec_extract built-in function with variable element number. (rtx_is_swappable_p): Variable vec_extracts and shifts are not swappable. * config/rs6000/vsx.md (UNSPEC_VSX_VSLO): New unspecs. (UNSPEC_VSX_EXTRACT): Likewise. (vsx_extract_, VSX_D iterator): Fix constraints to allow direct move instructions to be generated on 64-bit ISA 2.07 systems and newer, and to take advantage of the ISA 3.0 MFVSRLD instruction. (vsx_vslo_): New insn to do VSLO on V2DFmode and V2DImode arguments for vec_extract variable element. (vsx_extract__var, VSX_D iterator): New insn to support vec_extract with variable element on V2DFmode and V2DImode vectors. * config/rs6000/rs6000.h (TARGET_VEXTRACTUB): Remove -mupper-regs-df requirement, since it isn't needed. (VEC_EXTRACT_OPTIMIZE_P): New macro to say whether we can optmize vec_extract on 64-bit ISA 2.07 systems and newer. [gcc/testsuite, patch #2] 2016-07-27 Michael Meissner * gcc.target/powerpc/vec-extract-1.c: New test. Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 238775) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -62,6 +62,7 @@ extern void rs6000_expand_vector_init (r extern void paired_expand_vector_init (rtx, rtx); extern void rs6000_expand_vector_set (rtx, rtx, int); extern void rs6000_expand_vector_extract (rtx, rtx, rtx); +extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx); extern bool altivec_expand_vec_perm_const (rtx op[4]); extern void altivec_expand_vec_perm_le (rtx op[4]); extern bool rs6000_expand_vec_perm_const (rtx op[4]); Index: gcc/config/rs6000/rs6000-c.c =================================================================== --- gcc/config/rs6000/rs6000-c.c (revision 238772) +++ gcc/config/rs6000/rs6000-c.c (working copy) @@ -5105,29 +5105,61 @@ altivec_resolve_overloaded_builtin (loca arg2); } - /* If we can use the VSX xxpermdi instruction, use that for extract. */ + /* See if we can optimize vec_extracts with the current VSX instruction + set. */ mode = TYPE_MODE (arg1_type); - if ((mode == V2DFmode || mode == V2DImode) && VECTOR_MEM_VSX_P (mode) - && TREE_CODE (arg2) == INTEGER_CST - && wi::ltu_p (arg2, 2)) + if (VECTOR_MEM_VSX_P (mode)) + { tree call = NULL_TREE; + int nunits = GET_MODE_NUNITS (mode); - if (mode == V2DFmode) - call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DF]; - else if (mode == V2DImode) - call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DI]; + /* If the second argument is an integer constant, if the value is in + the expected range, generate the built-in code if we can. We need + 64-bit and direct move to extract the small integer vectors. */ + if (TREE_CODE (arg2) == INTEGER_CST && wi::ltu_p (arg2, nunits)) + { + switch (mode) + { + default: + break; + + case V1TImode: + call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V1TI]; + break; + + case V2DFmode: + call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DF]; + break; + + case V2DImode: + call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DI]; + break; + } + } + + /* If the second argument is variable, we can optimize it if we are + generating 64-bit code on a machine with direct move. */ + else if (TREE_CODE (arg2) != INTEGER_CST && VEC_EXTRACT_OPTIMIZE_P) + { + switch (mode) + { + default: + break; + + case V2DFmode: + call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DF]; + break; + + case V2DImode: + call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V2DI]; + break; + } + } if (call) return build_call_expr (call, 2, arg1, arg2); } - else if (mode == V1TImode && VECTOR_MEM_VSX_P (mode) - && TREE_CODE (arg2) == INTEGER_CST - && wi::eq_p (arg2, 0)) - { - tree call = rs6000_builtin_decls[VSX_BUILTIN_VEC_EXT_V1TI]; - return build_call_expr (call, 2, arg1, arg2); - } /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2). */ arg1_inner_type = TREE_TYPE (arg1_type); Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 238775) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6958,8 +6958,31 @@ rs6000_expand_vector_extract (rtx target emit_insn (gen_vsx_extract_v4si (target, vec, elt)); return; } - else - break; + break; + } + } + else if (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (elt) + && VEC_EXTRACT_OPTIMIZE_P) + { + if (GET_MODE (elt) != DImode) + { + rtx tmp = gen_reg_rtx (DImode); + convert_move (tmp, elt, 0); + elt = tmp; + } + + switch (mode) + { + case V2DFmode: + emit_insn (gen_vsx_extract_v2df_var (target, vec, elt)); + return; + + case V2DImode: + emit_insn (gen_vsx_extract_v2di_var (target, vec, elt)); + return; + + default: + gcc_unreachable (); } } @@ -6977,6 +7000,99 @@ rs6000_expand_vector_extract (rtx target emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0)); } +/* Split a variable vec_extract operation into the component instructions. */ + +void +rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr, + rtx tmp_altivec) +{ + machine_mode mode = GET_MODE (src); + machine_mode scalar_mode = GET_MODE (dest); + unsigned scalar_size = GET_MODE_SIZE (scalar_mode); + int byte_shift = exact_log2 (scalar_size); + + gcc_assert (byte_shift >= 0); + + if (REG_P (src) || SUBREG_P (src)) + { + int bit_shift = byte_shift + 3; + rtx element2; + + gcc_assert (REG_P (tmp_gpr) && REG_P (tmp_altivec)); + + /* For little endian, adjust element ordering. For V2DI/V2DF, we can use + an XOR, otherwise we need to subtract. The shift amount is so VSLO + will shift the element into the upper position (adding 3 to convert a + byte shift into a bit shift). */ + if (scalar_size == 8) + { + if (!VECTOR_ELT_ORDER_BIG) + { + emit_insn (gen_xordi3 (tmp_gpr, element, const1_rtx)); + element2 = tmp_gpr; + } + else + element2 = element; + + /* Generate RLDIC directly to shift left 6 bits and retrieve 1 + bit. */ + emit_insn (gen_rtx_SET (tmp_gpr, + gen_rtx_AND (DImode, + gen_rtx_ASHIFT (DImode, + element2, + GEN_INT (6)), + GEN_INT (64)))); + } + else + { + if (!VECTOR_ELT_ORDER_BIG) + { + rtx num_ele_m1 = GEN_INT (GET_MODE_NUNITS (mode) - 1); + + emit_insn (gen_anddi3 (tmp_gpr, element, num_ele_m1)); + emit_insn (gen_subdi3 (tmp_gpr, num_ele_m1, tmp_gpr)); + element2 = tmp_gpr; + } + else + element2 = element; + + emit_insn (gen_ashldi3 (tmp_gpr, element2, GEN_INT (bit_shift))); + } + + /* Get the value into the lower byte of the Altivec register where VSLO + expects it. */ + if (TARGET_P9_VECTOR) + emit_insn (gen_vsx_splat_v2di (tmp_altivec, tmp_gpr)); + else if (can_create_pseudo_p ()) + emit_insn (gen_vsx_concat_v2di (tmp_altivec, tmp_gpr, tmp_gpr)); + else + { + rtx tmp_di = gen_rtx_REG (DImode, REGNO (tmp_altivec)); + emit_move_insn (tmp_di, tmp_gpr); + emit_insn (gen_vsx_concat_v2di (tmp_altivec, tmp_di, tmp_di)); + } + + /* Do the VSLO to get the value into the final location. */ + switch (mode) + { + case V2DFmode: + emit_insn (gen_vsx_vslo_v2df (dest, src, tmp_altivec)); + return; + + case V2DImode: + emit_insn (gen_vsx_vslo_v2di (dest, src, tmp_altivec)); + return; + + default: + gcc_unreachable (); + } + + return; + } + else + gcc_unreachable (); + } + /* Return TRUE if OP is an invalid SUBREG operation on the e500. */ bool @@ -38601,6 +38717,7 @@ rtx_is_swappable_p (rtx op, unsigned int case UNSPEC_VSX_CVDPSPN: case UNSPEC_VSX_CVSPDP: case UNSPEC_VSX_CVSPDPN: + case UNSPEC_VSX_EXTRACT: return 0; case UNSPEC_VSPLT_DIRECT: *special = SH_SPLAT; Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (revision 238775) +++ gcc/config/rs6000/vsx.md (working copy) @@ -309,6 +309,8 @@ (define_c_enum "unspec" UNSPEC_VSX_XVCVDPUXDS UNSPEC_VSX_SIGN_EXTEND UNSPEC_P9_MEMORY + UNSPEC_VSX_VSLO + UNSPEC_VSX_EXTRACT ]) ;; VSX moves @@ -2118,16 +2120,13 @@ (define_insn "vsx_set_" ;; register was picked. Limit the scalar value to FPRs for now. (define_insn "vsx_extract_" - [(set (match_operand: 0 "gpc_reg_operand" - "=d, wm, wo, d") + [(set (match_operand: 0 "gpc_reg_operand" "=d, d, wr, wr") (vec_select: - (match_operand:VSX_D 1 "gpc_reg_operand" - ", , , ") + (match_operand:VSX_D 1 "gpc_reg_operand" ", , wm, wo") (parallel - [(match_operand:QI 2 "const_0_to_1_operand" - "wD, wD, wL, n")])))] + [(match_operand:QI 2 "const_0_to_1_operand" "wD, n, wD, n")])))] "VECTOR_MEM_VSX_P (mode)" { int element = INTVAL (operands[2]); @@ -2205,6 +2204,34 @@ (define_insn "*vsx_extract__store" [(set_attr "type" "fpstore") (set_attr "length" "4")]) +;; Variable V2DI/V2DF extract shift +(define_insn "vsx_vslo_" + [(set (match_operand: 0 "gpc_reg_operand" "=v") + (unspec: [(match_operand:VSX_D 1 "gpc_reg_operand" "v") + (match_operand:V2DI 2 "gpc_reg_operand" "v")] + UNSPEC_VSX_VSLO))] + "VECTOR_MEM_VSX_P (mode) && VEC_EXTRACT_OPTIMIZE_P" + "vslo %0,%1,%2" + [(set_attr "type" "vecperm")]) + +;; Variable V2DI/V2DF extract +(define_insn_and_split "vsx_extract__var" + [(set (match_operand: 0 "gpc_reg_operand" "=v") + (unspec: [(match_operand:VSX_D 1 "input_operand" "v") + (match_operand:DI 2 "gpc_reg_operand" "r")] + UNSPEC_VSX_EXTRACT)) + (clobber (match_scratch:DI 3 "=r")) + (clobber (match_scratch:V2DI 4 "=&v"))] + "VECTOR_MEM_VSX_P (mode) && VEC_EXTRACT_OPTIMIZE_P" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rs6000_split_vec_extract_var (operands[0], operands[1], operands[2], + operands[3], operands[4]); + DONE; +}) + ;; Extract a SF element from V4SF (define_insn_and_split "vsx_extract_v4sf" [(set (match_operand:SF 0 "vsx_register_operand" "=f,f") Index: gcc/config/rs6000/rs6000.h =================================================================== --- gcc/config/rs6000/rs6000.h (revision 238772) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -602,7 +602,6 @@ extern int rs6000_vector_align[]; #define TARGET_DIRECT_MOVE_128 (TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \ && TARGET_POWERPC64) #define TARGET_VEXTRACTUB (TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \ - && TARGET_UPPER_REGS_DF \ && TARGET_UPPER_REGS_DI && TARGET_POWERPC64) /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present @@ -761,6 +760,11 @@ extern int rs6000_vector_align[]; && TARGET_SINGLE_FLOAT \ && TARGET_DOUBLE_FLOAT) +/* Macro to say whether we can optimize vector extracts. */ +#define VEC_EXTRACT_OPTIMIZE_P (TARGET_DIRECT_MOVE \ + && TARGET_POWERPC64 \ + && TARGET_UPPER_REGS_DI) + /* Whether the various reciprocal divide/square root estimate instructions exist, and whether we should automatically generate code for the instruction by default. */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-extract-1.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-1.c (revision 0) @@ -0,0 +1,27 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2 -mupper-regs-df -mupper-regs-di" } */ + +#include + +double +add_double (vector double a, int n) +{ + return vec_extract (a, n) + 1.0; +} + +long +add_long (vector long a, int n) +{ + return vec_extract (a, n) + 1; +} + +/* { dg-final { scan-assembler "vslo" } } */ +/* { dg-final { scan-assembler "mtvsrd" } } */ +/* { dg-final { scan-assembler "mfvsrd" } } */ +/* { dg-final { scan-assembler-not "stxvd2x" } } */ +/* { dg-final { scan-assembler-not "stxvx" } } */ +/* { dg-final { scan-assembler-not "stxv" } } */ +/* { dg-final { scan-assembler-not "ldx" } } */