From patchwork Fri Jul 26 03:20:27 2019
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1137193
Subject: [PATCH V3, rs6000] Support vrotr<mode>3 for int vector types
To: Segher Boessenkool
Cc: GCC Patches, Jakub Jelinek, Richard Biener, richard.sandiford@arm.com,
 Bill Schmidt
References: <20190715085929.GO2125@tucnak>
 <32f89c4f-cd2d-a7bd-16d2-26fed6bb5f56@linux.ibm.com>
 <27be90e6-4beb-5c4c-a163-9b136490d783@linux.ibm.com>
 <20190717134025.GJ20882@gate.crashing.org>
 <83f8448e-3c59-8991-2176-729d87e08a86@linux.ibm.com>
 <20190718194818.GT20882@gate.crashing.org>
 <20190719150647.GZ20882@gate.crashing.org>
 <20190725134958.GR20882@gate.crashing.org>
From: "Kewen.Lin"
Date: Fri, 26 Jul 2019 11:20:27 +0800
In-Reply-To: <20190725134958.GR20882@gate.crashing.org>

Hi Segher,

on 2019/7/25 9:49 PM, Segher Boessenkool wrote:
> Hi Kewen,
>
> On Tue, Jul 23, 2019 at 02:28:28PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1666,6 +1666,60 @@
>>    "vrl<VI_char> %0,%1,%2"
>>    [(set_attr "type" "vecsimple")])
>>
>> +;; Here these vrl<mode>_and are for vrotr<mode>3 expansion.
>> +;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
>> +;; AND to indicate truncation but emit vrl insn.
>> +(define_insn "vrlv2di_and"
>> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
>> +        (and:V2DI
>> +          (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
>> +                       (match_operand:V2DI 2 "register_operand" "v"))
>> +          (const_vector:V2DI [(const_int 63) (const_int 63)])))]
>> +  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
>> +  "vrld %0,%1,%2"
>> +  [(set_attr "type" "vecsimple")])
>
> "vrlv2di_and" is a bit of an unhappy name, we have a "vrlv" instruction.
> Just something like "rotatev2di_something", maybe?
>
> Do we have something similar for non-rotate vector shifts, already?  We
> probably should, so please keep that in mind for naming things.
>
> "vrlv2di_and" sounds like you first do the rotate, and then on what
> that results in you do the and.  And that is what the pattern does,
> too.  But this is wrong: it should mask off all but the lower bits
> of operand 2, instead.
>

Thanks for reviewing!  You are right, the name matches the pattern, but
the pattern is not what we want.  How about the name trunc_vrl<mode>:
first truncate operand 2, then do the vector rotation?  I didn't find any
existing shifts with a similar pattern.  I've updated the name and the
associated pattern in the new patch.
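
To make the difference between the two pattern shapes concrete, here is a
minimal scalar sketch in C for a single 8-bit element.  It is only an
illustration under stated assumptions: rotl8, and_after_rotate,
rotate_by_truncated_count and check_rotl8_examples are hypothetical helper
names, not part of the patch or of GCC.  The first shape masks the rotated
result, which destroys the value; the second masks only the rotate count,
which is the truncation the new pattern is meant to express.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value left by n.  The caller must pass n in 0..7.  */
static uint8_t
rotl8 (uint8_t x, unsigned int n)
{
  return (uint8_t) ((x << n) | (x >> ((8 - n) & 7)));
}

/* Old pattern shape: (and (rotate x n) 7).  The mask is applied to the
   rotated result, so only the low three bits of the value survive.
   The caller must pass n in 0..7.  */
static uint8_t
and_after_rotate (uint8_t x, unsigned int n)
{
  return rotl8 (x, n) & 7;
}

/* New pattern shape: (rotate x (and n 7)).  The mask is applied to the
   rotate count, so any n is accepted and the value is kept intact.  */
static uint8_t
rotate_by_truncated_count (uint8_t x, unsigned int n)
{
  return rotl8 (x, n & 7);
}

/* A few hand-checked values for one element.  */
void
check_rotl8_examples (void)
{
  assert (rotl8 (0xB1, 4) == 0x1B);
  assert (and_after_rotate (0xB1, 4) == 0x03);
  assert (rotate_by_truncated_count (0xB1, 4) == 0x1B);
  /* A count of 12 truncates to 4, so the result is the same.  */
  assert (rotate_by_truncated_count (0xB1, 12) == 0x1B);
}
```

Calling check_rotl8_examples () from any test driver exercises both shapes;
only the second one behaves like a rotate with a truncated count.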

>> +(define_insn "vrlv16qi_and"
>> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
>> +        (and:V16QI
>> +          (rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
>> +                        (match_operand:V16QI 2 "register_operand" "v"))
>> +          (const_vector:V16QI [(const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)
>> +                               (const_int 7) (const_int 7)])))]
>> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
>> +  "vrlb %0,%1,%2"
>> +  [(set_attr "type" "vecsimple")])
>
> All the patterns can be merged into one (using some code_iterator).  That
> can be a later improvement.
>

I guess you mean mode_attr?  I did try to merge them since they look
tedious.  But the mode_attr can't contain either "[" or "(" inside, so it
seems it can't be used for the different const vector mappings.  I would
really appreciate it if you could show me some examples.

>> +;; Return 1 if op is a vector register that operates on integer vectors
>> +;; or if op is a const vector with integer vector modes.
>> +(define_predicate "vint_reg_or_const_vector"
>> +  (match_code "reg,subreg,const_vector")
>
> Hrm, I don't like this name very much.  Why is just vint_operand not
> enough for what you use this for?
>

vint_operand isn't enough: the expander legalizes the const vector into a
vector register, so I'm unable to get at the feeder (the const vector) of
the input register operand.

>> +      rtx imm_vec
>> +        = simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
>
> (The "=" goes on the previous line).

OK, thanks.

>> +      emit_insn (gen_vrl<mode>_and (operands[0], operands[1], rot_count));
>> +    }
>> +  DONE;
>> +})
>
> Why do you have to emit as the "and" form here?  Emitting the "bare"
> rotate should work just as well here?

Yes, the emitted insn is exactly the same.
It follows Jakub's suggestion via
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01159.html
Append one explicit AND to indicate the truncation for the case
!SHIFT_COUNT_TRUNCATED.  (Sorry if the previous pattern was misleading.)

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-options "-O3" } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
>
>> +/* { dg-final { scan-assembler {\mvrld\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlw\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlh\M} } } */
>> +/* { dg-final { scan-assembler {\mvrlb\M} } } */
>
> You need to generate code for whatever cpu introduced those insns,
> if you expect those to be generated ;-)
>
> vsx_ok isn't needed.
>

Thanks for catching that, I've updated it to altivec_ok in the new patch.
I think we can still keep this guard, since those instructions originate
from ISA 2.03?


diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..2b0682ad2ba 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,56 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; These trunc_vrl<mode> patterns are for vrotr<mode>3 expansion.
+;; Since SHIFT_COUNT_TRUNCATED is zero, they wrap the rotate count in an
+;; explicit AND to indicate truncation, but still emit a plain vrl insn.
+(define_insn "trunc_vrlv2di"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+        (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
+          (and:V2DI (match_operand:V2DI 2 "register_operand" "v")
+            (const_vector:V2DI [(const_int 63) (const_int 63)]))))]
+  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
+  "vrld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
+          (and:V4SI (match_operand:V4SI 2 "register_operand" "v")
+            (const_vector:V4SI [(const_int 31) (const_int 31)
+                                (const_int 31) (const_int 31)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vrlw %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv8hi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
+          (and:V8HI (match_operand:V8HI 2 "register_operand" "v")
+            (const_vector:V8HI [(const_int 15) (const_int 15)
+                                (const_int 15) (const_int 15)
+                                (const_int 15) (const_int 15)
+                                (const_int 15) (const_int 15)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
+  "vrlh %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "trunc_vrlv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
+          (and:V16QI (match_operand:V16QI 2 "register_operand" "v")
+            (const_vector:V16QI [(const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)
+                                 (const_int 7) (const_int 7)]))))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+  "vrlb %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
 (define_insn "altivec_vrl<VI_char>mi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
         (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..8c50d09a7bf 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,35 @@
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+        (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+          (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+      unsigned int bits = GET_MODE_PRECISION (inner_mode);
+      rtx mask_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits - 1));
+      rtx imm_vec
+        = simplify_const_unary_operation (NEG, <MODE>mode, operands[2],
+                                          GET_MODE (operands[2]));
+      imm_vec
+        = simplify_const_binary_operation (AND, <MODE>mode, imm_vec, mask_vec);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+      emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+    }
+  else
+    {
+      emit_insn (gen_neg<mode>2 (rot_count, operands[2]));
+      emit_insn (gen_trunc_vrl<mode> (operands[0], operands[1], rot_count));
+    }
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..7461f3b6317
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+      | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+      | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} { target powerpc_p8vector_ok } } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..bdfa1e25d07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+      | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+      | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} { target powerpc_p8vector_ok } } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
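
For reference, the identity that the rotate-right expander in the patch
above relies on can be checked with a scalar C model of one 32-bit element.
This is only a sketch under the assumption that the element width is a
power of two; rotl32, rotr32_reference, rotr32_via_rotl and
check_rotr32_identity are hypothetical helper names, not GCC functions.  The
constant branch of the expander folds the negate and the AND with
(bits - 1) at compile time, and the variable branch emits a negate followed
by a rotate left whose count is truncated.  Both amount to the
rotr32_via_rotl form below.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left, using only the low five bits of the count.  */
static uint32_t
rotl32 (uint32_t x, uint32_t n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* Straightforward rotate right, used as the reference result.  */
static uint32_t
rotr32_reference (uint32_t x, uint32_t n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotate right expressed as rotate left by the negated count, truncated
   to the element width: count = (-n) & (bits - 1).  */
static uint32_t
rotr32_via_rotl (uint32_t x, uint32_t n)
{
  uint32_t count = (0u - n) & 31;
  return rotl32 (x, count);
}

/* Compare both forms for every count in 0..63 on one sample value.  */
void
check_rotr32_identity (void)
{
  for (uint32_t n = 0; n < 64; ++n)
    assert (rotr32_via_rotl (0x12345678u, n)
            == rotr32_reference (0x12345678u, n));
}
```

The same reasoning applies per element to the 8-, 16- and 64-bit cases, with
masks of 7, 15 and 63 respectively, which are the constants used in the
trunc_vrl patterns.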